Comment by abdullahkhalids 2 days ago

To play devil's advocate: if you were running a large public forum, and you knew that many companies had started to scrape all the data off your site, were going to cumulatively make billions off that data, and would make some of those billions by polluting your forum with crap content, would you continue running your site in the open?

What is the game theory here? Twitter cooperates and OpenAI defects, and we call that a win?

OmarShehata 2 days ago

Alternatively, build a private, invite-only dataset of specific communities. Scale horizontally instead of having one single central dataset! That's still a huge win for user ownership.

It doesn't have to be one company controlling who gets access; the users can decide this.

  • asdff 2 days ago

    The problem is that the users and their capital are not organized enough to suggest or demand this. On the other side of the table we have the "establishment", which may not be formally organized but has enough shared incentives that the outcomes are no different than if it were formally organized and pooling resources.

    In this sense we have an intractable push and pull going on in just about any community of sufficient size. We have the users, who might want (or should want) privacy, respect and other such benefits, and then we have the people who actually invest in and build these platforms, who are incentivized to deliver things other than what is in the users' best interest. Either you give users enough money to roll their own solutions, or you try to set up a world where the incentives of capital perfectly match the needs of the individual or collective of individuals, which is probably impossible.

numpad0 2 days ago

Do you have a problem with companies making billions? Why not just regulate that, instead of making the "...and cumulatively make billions" part the argument?

If companies making billions isn't the problematic part, the billions can be left out and the real problem can be discussed instead.

bravetraveler 2 days ago

Practically speaking, 'some' of the money is just as good as 'all' of the money in terms of a functioning economy, as opposed to pure capitalism.

People pirate software, yet people still develop and sell it.

Yes, it's not the most profitable thing on its own; it only is for those who find the right combination or feedback loop of 'fit', demand, improvement, and expansion.

If you can make billions off the thing, you can presumably handle some GeoIP filtering or rate limiting... or simply not care. Anything that falls through the cracks is categorically insignificant at your scale.

To justify it, if one must, consider it a trial. As your friendly neighborhood dealer would say: "The first taste is free".
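For readers unfamiliar with the rate limiting mentioned above, here is a minimal token-bucket sketch. It is only an illustration under assumed parameters (one bucket per client IP, roughly 1 request/second with bursts of up to 10); `TokenBucket` and `allow_request` are hypothetical names, and a real site would more likely configure this in its reverse proxy or CDN.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable clock, handy for testing
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1        # spend one token for this request
            return True
        return False                # bucket empty: reject (e.g. HTTP 429)


# Hypothetical per-client table keyed by IP address.
buckets: dict[str, TokenBucket] = {}


def allow_request(client_ip: str, rate: float = 1.0, capacity: float = 10.0) -> bool:
    bucket = buckets.get(client_ip)
    if bucket is None:
        bucket = buckets[client_ip] = TokenBucket(rate, capacity)
    return bucket.allow()
```

A heavy scraper drains its bucket quickly and then gets at most `rate` requests per second, while ordinary readers rarely notice the limit.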

rurp 2 days ago

It's tragic that the LLM craze OpenAI kicked off is threatening to ruin one of the greatest common goods ever invented in the open internet. But hey, at least a handful of giant corporations and investors are making money, so I guess that counts as a win.

KerrAvon 2 days ago

This is a problem for literally every website that isn't completely paywalled. Twitter is not special in this regard.

  • xNeil 2 days ago

    Yes, and Twitter isn't unique in how it's reacting, either. See Reddit hiking its API prices by an absurd amount.

    • ChocMontePy 2 days ago

      Fact Check: Reddit didn't hike API prices by an absurd amount.

      That was the story spread far and wide by the Apollo app developer, and it was believed by the gullible and angry Reddit masses.

      But Reddit actually set a reasonable API price, as evidenced by the fact that a year and a half later there are still five third-party apps running on reasonably priced subscriptions:

      Infinity For Reddit

      Nara For Reddit

      Narwhal 2

      Now for Reddit

      Relay For Reddit

      • skeaker 21 hours ago

        This is cherry-picked to the point of revisionism, no? There used to be dozens and dozens of apps and countless useful bots and other tools that ran off of the API. <1% of those tools surviving the price hike actually further confirms that the price change wasn't reasonable.

  • abdullahkhalids 2 days ago

    To continue playing the devil's advocate:

    1. Experts can weigh in, but I think multi-person conversational data from forums is uniquely valuable and in short supply relative to plain blog posts and news stories on the internet.

    2. The absolute economic value of the entire Twitter corpus is much greater than that of any single boatforum.com-like website. So Twitter has much more incentive to lock itself down than boatforum.com does.

  • miki123211 2 days ago

    Twitter was "special" because they actually had an API.

    No other social media site (besides Reddit) had one. Facebook kind of tried, although theirs was always a lot more limited, and it was shut down when it turned out Cambridge Analytica had used it for widespread election fraud[1].

    Both Twitter and Reddit did basically the same thing at roughly the same time, Twitter just had the misfortune to be under the control of Elon Musk, so the move was perceived as ideological.

    [1] As it later turned out, Cambridge Analytica was basically a "nothing burger", they claimed to be able to accomplish a lot while, in reality, accomplishing very little. The damage to interoperability was already done, though.

    • shadowgovt 2 days ago

      Indeed. CA's impact on election outcomes was likely negligible; Antonio García Martínez's "Chaos Monkeys" makes the case that the Trump campaign's use of CA mainly indicated that Trump's team was spending money to try a bit of everything (online and offline) while the Clinton campaign wasn't. The Clinton campaign's tactical failure was believing it could redirect money to down-ticket races because Clinton vs. Trump seemed such an obvious matchup that it didn't need to spend to win.

      What CA did show was that Facebook's statements about protecting user privacy were fundamentally incompatible with the way its API worked, so Facebook had to shut the API down, because the alternative would have been to just sort of... let it hang in the air that it wasn't hard for a third party to build a system that completely bypassed users' intent in scoping their information.

      (I had the misfortune of trying to write a Facebook app about fifteen years earlier, and that was my takeaway at the time too... "Do people, like, realize that their whole process for protecting against scraping of the social network via third-party app integrations is the honor system?" Turns out people didn't.)

      • miki123211 2 days ago

        What Mastodon is doing seems suitably ironic in this situation.

        For those unaware, Mastodon's APIs are extremely open and it's very easy to scrape, to the point of providing you with a "firehose" of all public posts that an instance sees, both local and federated, with no authentication required. It also has an extreme anti-scraping culture: anybody who admits to running any kind of scraper that is not strictly opt-in, even for benign or scientific purposes, is very quickly shunned and blocked. Most instances also have a "disallow scraping via robots.txt" policy by default.

        The results? I posted a canary token[1] link on a medium-sized, well-federated, well-protected instance which disallows scraping, and it got hit by some shady social media crawler in a fraction of a second. It started getting hit by many other strange crawlers later on, and it still keeps getting visits (mostly from Google now).
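To illustrate why a robots.txt policy is only advisory, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.social` URL and the `ExampleBot` user agent are made up for illustration and are not taken from any real instance. The point is that this check only happens if the crawler chooses to run it; nothing in the protocol stops a crawler from fetching the page anyway.

```python
from urllib.robotparser import RobotFileParser


def crawler_may_fetch(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return what a *polite* crawler would conclude from the given robots.txt."""
    parser = RobotFileParser()
    # Parse the (hypothetical) robots.txt text directly instead of downloading it.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A blanket "no crawling" policy, similar in spirit to the default described above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

print(crawler_may_fetch(BLOCK_ALL, "https://example.social/@alice"))
```

A compliant crawler would skip the page when this returns `False`; an impolite one simply never calls it, which is consistent with the canary token being hit anyway.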

        • shadowgovt 2 days ago

          It's an ecosystem of people who seem insistent on the idea that you can put stuff online behind no authentication wall and expect it to stay exclusively on the servers you believe it should be on.

          And I don't know what to tell them, because that's not how the internet has ever worked.