Comment by altcognito 2 days ago

It might be fun to collect the same data, if for no other reason than to note the changes, with the caveat that it doesn’t represent human output.

Might even change the tool name.

jpjoi 2 days ago

The point was that it’s getting harder and harder to do that as things get locked down or go behind a massive paywall, either to profit off of generative AI or to avoid being used in it. The places where previous versions got their data are impossible to gather from anymore, so the dataset you would collect would be completely different, which might cause weird skewing.

  • oneeyedpigeon 2 days ago

    But that would always be the case. Twitter will not last forever; heck, it may not even be long before an open alternative like Bluesky competes with it. Would be interesting to know what percentage of the original mined data was from Twitter.