Comment by jitl
Comment by jitl 10 hours ago
Really? If I’m an unsophisticated blog not using a CDN, and I get a $1000 bill for bandwidth overage or something, I’m gonna google a solve and slap it on there because I don’t want to pay another $1000 for Big Basilisk. I don’t think that’s emotional response, it’s common sense.
Seems like you've made profoundly questionable hosting or design choices for that to happen. Flat rate web hosting exists, and blogs (especially unsophisticated ones) do not require much bandwidth or processing power.
Misbehaving crawlers are a huge problem but bloggers are among the least affected by them. Something like a wiki or a forum is a better example, as they're in a category of websites where each page visit is almost unavoidably rendered on the fly using multiple expensive SQL queries due to the rapidly mutating nature of their datasets.
Git forges, like the one TFA is discussing, are also fairly expensive, especially as crawlers traverse historical states. When the crawler is poorly implemented they'll get stuck doing this basically forever. Detecting and dealing with git hosts is an absolute must for any web crawler due to this.