Comment by jpjoi 2 days ago

The point was that it’s getting harder and harder to do that as things get locked down or go behind a massive paywall, either to profit from the data or to keep it from being used in generative AI. The places previous versions got their data from are impossible to gather from anymore, so the dataset you would collect would be completely different, which might cause weird skewing.

oneeyedpigeon 2 days ago

But that would always be the case. Twitter will not last forever; heck, it may not even be long before an open alternative like Bluesky competes with it. Would be interesting to know what percentage of the original mined data was from Twitter.