Comment by frogperson 2 days ago
We need a crowdsourced list like AdGuard, but for bots. I'd love to block all those IPs at the firewall.
So that would be at least: GCP, Azure, Alibaba, AWS, Huawei, AT&T, BT, Cox... it's a long list.
User Agents then? No, because that would be: Chrome and Safari.
It's an uphill battle, because the bot authors do not give a shit. You can now buy bot networks from actual companies, who embed proxies in free phone games. Anthropic was caught hiding behind Browserbase, and neither of the companies seems to see a problem with that.
User agents, not IPs, but: https://github.com/ai-robots-txt/ai.robots.txt
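For what it's worth, matching on user agent could look roughly like the sketch below. The three tokens are just a hand-picked sample of crawlers that announce themselves, not the contents of the linked list, and the function name is made up for illustration. As noted above, this only catches bots that identify honestly; anything claiming to be Chrome or Safari passes straight through.

```python
# Sample tokens only (assumption): a real deployment would load a maintained list.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "CCBot")


def is_declared_ai_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler token.

    Matching is a case-insensitive substring test, so it only catches
    crawlers that announce themselves in the header.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    print(is_declared_ai_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(is_declared_ai_bot("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))
```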
A large portion of those addresses will be valid residential IP addresses running malware on compromised Windows machines.
Block GCP, AWS, Azure, and various datacenter prefixen, and you're pretty much golden. There are scant few legitimate reasons a human being's traffic would originate from those hosts.
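A minimal sketch of the prefix check, assuming you already have a list of datacenter CIDR ranges from somewhere. The prefixes below are reserved documentation ranges used as placeholders, not real cloud ranges, and `is_datacenter_ip` is a hypothetical helper name. In practice you would probably push the list into the firewall itself rather than check per request in application code.

```python
import ipaddress

# Placeholder prefixes (assumption): documentation-only ranges standing in
# for real datacenter ranges obtained from the providers.
DATACENTER_PREFIXES = [
    "203.0.113.0/24",
    "198.51.100.0/24",
    "2001:db8::/32",
]

# Parse once at startup so each lookup is only a membership test.
NETWORKS = [ipaddress.ip_network(prefix) for prefix in DATACENTER_PREFIXES]


def is_datacenter_ip(addr: str) -> bool:
    """Return True if addr falls inside any configured prefix."""
    ip = ipaddress.ip_address(addr)
    # Compare only against networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in NETWORKS)


if __name__ == "__main__":
    for candidate in ("203.0.113.7", "192.0.2.1", "2001:db8::1"):
        print(candidate, is_datacenter_ip(candidate))
```

A linear scan is fine for a handful of prefixes; a list with tens of thousands of entries would want a radix tree or the firewall's own set type instead.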
I am working from a cloud desktop, but I only visit corporate-approved resources from it, and I believe that is the case for most cloud desktop users, as the whole point is to have a clear separation of duties.
The only way you can block these "AI" scrapers is a combination of IP filtering (https://spur.us/) and fingerprinting (https://abrahamjuliot.github.io/creepjs/).
Things like Browserbase are easy to block with this. It's a losing battle though; I personally moved entirely to real environments for https://browser.cash/developers