Comment by mschuster91 13 hours ago
A global tarpit is the solution. It makes sense anyway, even without taking AI crawlers into account. Back when I had to implement that, I went the semi-manual route: parse the access log, and any IP address averaging more than X hits per second on /api gets a -j TARPIT rule in iptables [1].
Not sure how to implement it in the cloud though; I've never had the need for that there yet.
[1] https://gist.github.com/flaviovs/103a0dbf62c67ff371ff75fc62f...
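For anyone who wants to try the semi-manual route described above, here is a rough sketch of the log-parsing half. It is not the script from [1]. It assumes the access log is in the common/combined format that nginx and Apache write by default, and that the TARPIT target (from xtables-addons) is installed. The function names, the threshold of 5 hits per second, and the choice of the INPUT chain are all made up for illustration. It only prints the rules; applying them is left to you.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: common/combined log format, e.g.
# 10.0.0.1 - - [17/Jan/2025:03:06:33 +0000] "GET /api/items HTTP/1.1" 200 12
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?:\S+) (?P<path>\S+)'
)


def offenders(lines, max_hits_per_sec=5.0, prefix="/api"):
    """Return the IPs whose average hit rate on `prefix` exceeds the threshold."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not m.group("path").startswith(prefix):
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        hits[m.group("ip")].append(ts)

    result = []
    for ip, stamps in hits.items():
        # Average over the span between first and last hit, with a floor of
        # one second so a burst inside a single second still counts.
        span = max((max(stamps) - min(stamps)).total_seconds(), 1.0)
        if len(stamps) / span > max_hits_per_sec:
            result.append(ip)
    return sorted(result)


def tarpit_rules(ips):
    """Build iptables commands as strings (hypothetical chain choice: INPUT)."""
    # TARPIT only works on TCP, so the rule has to name the protocol.
    return [f"iptables -A INPUT -s {ip} -p tcp -j TARPIT" for ip in ips]


if __name__ == "__main__":
    import sys

    for rule in tarpit_rules(offenders(sys.stdin)):
        print(rule)
```

Pipe a log into it (for example `python3 tarpit.py < access.log`, where the script name is arbitrary) and review the printed rules before running any of them as root.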
One such tarpit (Nepenthes) was just recently mentioned on Hacker News: https://news.ycombinator.com/item?id=42725147
Their site is down at the moment, but luckily they haven't stopped the Wayback Machine from crawling it: https://web.archive.org/web/20250117030633/https://zadzmo.or...