robinsonb5 2 days ago

Unfortunately, the choice isn't between sites running something like Anubis and sites with free, unencumbered access. The choice is between putting up with Anubis and the sites simply going away.

A web forum I read regularly has been playing whack-a-mole with LLM scrapers for much of this year, including multiple weeks-long periods where the swarm of locusts made the site inaccessible to actual users.

The admins tried all manner of blocks, including ultimately banning entire countries' IP ranges, all to no avail.

The forum's continued existence depends on being able to hold off abusive crawlers. Occasionally seeing the Anubis splash screen for half a second is a small price to pay for keeping it alive.
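
(For the curious: challenges like this work by making the browser burn a little CPU on a proof-of-work before the page is served, which is negligible for one human but adds up fast at swarm scale. A minimal sketch of the idea in Python; the hash construction and difficulty here are illustrative, not Anubis's actual parameters:)

    import hashlib
    import itertools

    # Illustrative difficulty: the top 16 bits of the hash must be zero,
    # so a client grinds through ~65k hashes on average per challenge.
    DIFFICULTY_BITS = 16

    def meets_target(challenge: str, nonce: int) -> bool:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0

    def solve(challenge: str) -> int:
        # Client side: grind nonces until one clears the target.
        return next(n for n in itertools.count() if meets_target(challenge, n))

    # Server side: verifying the submitted nonce costs a single hash.
    nonce = solve("per-session-challenge-token")
    assert meets_target("per-session-challenge-token", nonce)

The asymmetry is the whole trick: one cheap check for the server, thousands of hashes for every page a crawler wants.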

greatgib 2 days ago

[flagged]

  • pushcx 2 days ago

    The scrapers will not attempt to discover and use an efficient representation. They will try to hit every URL they can discover on a site, at an aggregate rate of hundreds of hits per second, spread across enough IPs that each one requests only about once a minute. It's rude to talk down to people for not implementing a technique that you can't get scrapers to adopt, and for matching their investment in performance to their actual needs rather than predicting, years in advance, that traffic would change so dramatically.
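
    To make that arithmetic concrete, here's a toy sketch of why a per-IP rate limiter never even fires against this traffic pattern (the limiter threshold and fleet size are invented for illustration):

        from collections import defaultdict

        PER_IP_LIMIT = 30      # requests allowed per IP per minute before blocking
        FLEET_SIZE = 18_000    # hypothetical fleet: one request per IP per minute

        hits = defaultdict(int)  # naive fixed-window counter, keyed by IP

        def allow(ip: str) -> bool:
            hits[ip] += 1
            return hits[ip] <= PER_IP_LIMIT

        # One 60-second window: every IP in the fleet makes a single request.
        blocked = sum(not allow(f"10.0.{i // 256}.{i % 256}")
                      for i in range(FLEET_SIZE))

        print(f"aggregate load: {FLEET_SIZE / 60:.0f} requests/second")  # 300
        print(f"IPs the limiter caught: {blocked}")                      # 0

    Every individual IP looks politer than a human clicking around, yet the site still eats hundreds of requests per second.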

  • xena 2 days ago

    I challenge you to take a critical look at the performance of software like phpBB and see how even naive scraping brings commonly deployed server CPUs to their knees.
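
    A back-of-envelope sketch makes the point; the per-page cost here is an assumed figure, since real numbers vary wildly with the install and cache setup:

        # Hypothetical figures: ~50 ms of CPU per uncached dynamic page
        # (template rendering plus a handful of database queries).
        CPU_MS_PER_PAGE = 50
        SCRAPER_REQ_PER_SEC = 300
        CORES = 8

        demand = SCRAPER_REQ_PER_SEC * CPU_MS_PER_PAGE / 1000  # CPU-seconds per second
        print(f"CPU demand: {demand:.0f} cores, available: {CORES}")  # 15 vs 8

    At those made-up but plausible numbers, the scraper load alone needs roughly twice the machine before a single human reader is served.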