Comment by greatgib 2 days ago

Just a personal note: when I want to see a page and instead have to face a stupid 3-second nag screen like the Anubis one, I'm very pissed off and pushed even more to bypass the website when possible and get the info I want directly from an LLM or a search engine.

It's kind of a self-fulfilling prophecy: you make the visitor experience worse, and that becomes your own justification for why having an LLM serve the content is wanted and needed.

All of that because, in the current lambda/cloud computing world, it has become very expensive to process even a few requests.

robinsonb5 2 days ago

Unfortunately the choice isn't between sites with something like Anubis and sites with free and unencumbered access. The choice is between putting up with Anubis and the sites simply going away.

A web forum I read regularly has been playing whack-a-mole with LLM scrapers for much of this year, with multiple weeks-long periods where the swarm-of-locusts would make the site inaccessible to actual users.

The admins tried all manner of blocks, including ultimately banning entire countries' IP ranges, all to no avail.

The forum's continued existence depends on being able to hold off abusive crawlers. Having to see half-a-second of the Anubis splashscreen occasionally is a small price to pay for keeping it alive.
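
(For the curious: the splash screen isn't an arbitrary delay; the browser is solving a small proof-of-work puzzle. A minimal sketch of the general scheme, in Go, and not Anubis's actual code, looks roughly like this:)

    // Sketch of an Anubis-style proof-of-work check, illustrative only.
    // The client must find a nonce such that SHA-256(challenge + nonce)
    // starts with `difficulty` zero hex digits.
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strconv"
        "strings"
    )

    func solve(challenge string, difficulty int) (int, string) {
        prefix := strings.Repeat("0", difficulty)
        for nonce := 0; ; nonce++ {
            sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
            hash := hex.EncodeToString(sum[:])
            if strings.HasPrefix(hash, prefix) {
                return nonce, hash
            }
        }
    }

    func main() {
        // A real server would issue a random challenge per visitor;
        // "example-challenge" is a placeholder.
        nonce, hash := solve("example-challenge", 4)
        fmt.Printf("nonce=%d hash=%s\n", nonce, hash)
    }

The server verifies the returned nonce with a single hash, so the cost lands almost entirely on the client: negligible for one human visitor, ruinous at scraper volume.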

  • greatgib 2 days ago

    [flagged]

    • pushcx 2 days ago

      The scrapers will not attempt to discover and use an efficient representation. They will try to hit every URL they can discover on a site, and they'll do it at hundreds of hits per second, from enough IPs that each one only requests about once a minute. It's rude to talk down to people for not implementing a technique that you can't get the scrapers to adopt, and for matching their investment in performance to their actual needs instead of accurately predicting, years in advance, that traffic would change this dramatically.
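
      To make that arithmetic concrete, here's a hypothetical sketch (illustrative numbers, not measurements from any real site) of why a per-IP limiter never fires against a swarm like this:

          // Why per-IP rate limiting fails against distributed scrapers.
          package main

          import "fmt"

          func main() {
              const (
                  perIPLimit = 1.0 / 60.0 // one request per minute per IP: looks polite
                  swarmIPs   = 20000      // assumed size of the scraping swarm
              )
              aggregate := perIPLimit * swarmIPs
              fmt.Printf("aggregate load: %.0f requests/second\n", aggregate) // ~333 req/s
              // No single IP ever exceeds the limit, so a per-IP
              // limiter stays silent while the origin melts.
          }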

    • xena 2 days ago

      I challenge you to take a critical look at the performance of things like phpBB and see how even naive scraping brings commonly deployed server CPUs to their knees.

eqvinox 2 days ago

If you don't feel like understanding that the thing to be pissed off about here is the AI crawlers, we don't feel like understanding your displeasure about the Anubis wall either. The choice is either the Anubis wall or nothing. This isn't theoretical; I've been involved in this decision: we had to either close off the service entirely or put [something like] Anubis in front of it.

> have to face a stupid 3-second nag screen like the Anubis one, I'm very pissed off and pushed even more to bypass the website when possible and get the info I want directly from an LLM or a search engine.

Most (freely accessible) LLMs will take more than 3s to "think". Why are you pissed off about Anubis, but not the slow LLM? And then you have to double check the LLM anyway...

> All of that because, in the current lambda/cloud computing world, it has become very expensive to process even a few requests.

You're making some very arrogant assumptions here. FOSS repos and bugtrackers are generally not lambda/cloud hosted.

  • redwall_hp a day ago

    There are a lot of phpBB/XenForo/Discourse/etc. forums out there too that get slammed hard by these scrapers, and many of them simply shut down rather than eat much higher hosting costs. Which, of course, pushes online communities further into the hands of corporations like Reddit and Facebook.

    Most of them are simply running one of those tools on a VPS or similar, which is perfectly adequate for their community's size, and then it falls over when the LLM companies' botnets DDoS it.

DanOpcode 2 days ago

I agree. I think it gives a bad impression when I need to see the anime Anubis girl before the page loads. Codeberg.org often shows me the nag screen, and it has worsened my impression of their service.