Comment by wraptile
Comment by wraptile 3 days ago
I'm a scraper developer and Anubis would have worked 10 - 20 years ago, but now all broad scrapers run on a real headless browser with full cookie support and costs relatively nothing in compute. I'd be surprised if LLM bots would use anything else given the fact that they have all of this compute and engineers already available.
That being said, one point is very correct here - by far the best effort to resist broad crawlers is a _custom_ anti-bot that could be as simple as "click your mouse 3 times" because handling something custom is very difficult in broad scale. It took the author just few minutes to solve this but for someone like Perplexity it would take hours of engineering and maintenance to implement a solution for each custom implementation which is likely just not worth it.
You can actually see this in real life if you google web scraping services and which targets they claim to bypass - all of them bypass generic anti-bots like Cloudflare, Akamai etc. but struggle with custom and rare stuff like Chinese websites or small forums because scraping market is a market like any other and high value problems are solved first. So becoming a low value problem is a very easy way to avoid confrontation.
> That being said, one point is very correct here - by far the best effort to resist broad crawlers is a _custom_ anti-bot that could be as simple as "click your mouse 3 times" because handling something custom is very difficult in broad scale.
Isn't this what Microsoft is trying to do with their sliding puzzle piece and choose the closest match type systems?
Also, if you come in on a mobile browser it could ask you to lay your phone flat and then shake it up and down for a second or something similar that would be a challenge for a datacenter bot pretending to be a phone.