Comment by Symbiote 10 hours ago

Oh great /s

In a month or two, I can be annoyed when I see some vibe-coded AI startup's script making five million requests a day to work's website with this.

They'll have been ignoring the error responses:

  {"All data is public and available for free download": "https://example.edu/very-large-001.zip"}
This is a message we also write in the first line of every HTML page's source.

Then I will spend more time fighting this shit, and less time improving the public data system.

mandatory 2 hours ago

Feel free to read the README: startups could already pay for this capability through private premium proxy services before thermoptic existed.

Having an open source version allows regular people to do scraping and not just those rich in capital.

Many of the best data services on the internet started with scraping; the README lists many of them.