Comment by pilif 2 days ago

> Unknown websites will get very few crawls per day whereas popular sites millions.

We're hosting some pretty unknown, very domain-specific sites and are getting hammered by Claude and others which, unlike old-school search engine bots, get caught up in the weeds and request the same pages over and over.

They also don't seem to care about the response time of the pages they fetch: when they get caught in the weeds and hit some badly performing edge cases, they don't throttle at all and keep going at 30+ requests per second, even when a page takes more than a second to return.

We can of course handle this and make them go away, e.g. with a rate limit like the sketch below, but in the end this behavior will only hurt them: they will face more and more opposition from webmasters, and they are wasting their own resources.
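(For the curious, a minimal nginx sketch of what "handle this" can look like, assuming the bots are identifiable by User-Agent; the matched tokens, the 10r/s rate, and the upstream address are illustrative, not a recommendation:)

```nginx
# Bucket known AI crawlers by User-Agent; everything else gets an
# empty key, which nginx exempts from this rate limit entirely.
map $http_user_agent $ai_bot {
    default      "";
    ~*claudebot  "ai";
    ~*gptbot     "ai";
}

# All matched bots share one 10 requests/second budget.
limit_req_zone $ai_bot zone=aibots:10m rate=10r/s;

server {
    listen 80;
    location / {
        limit_req zone=aibots burst=20 nodelay;
        proxy_pass http://127.0.0.1:8080;  # your app server
    }
}
```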

For decades, our solution for search engine bots was basically an empty robots.txt and letting the bots deal with our sites. Bots behaved reasonably and intelligently enough that this was a working strategy.
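That is, nothing more than the fully permissive equivalent of an empty file:

```txt
# robots.txt — everything is allowed; crawlers are trusted to behave.
User-agent: *
Disallow:
```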

Now, in light of the current AI bots, which from an outside observer's viewpoint look like they were cobbled together with the least effort possible, this strategy is no longer viable: we would have to resort to a meticulously crafted robots.txt that individually steers each hacked-up AI bot away from the weeds.

Or, you know, we just blanket ban them.
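Assuming they honor robots.txt at all, that's a single rule group (the user-agent tokens below are ones the vendors document; check their pages for the current list):

```txt
# Blanket-ban known AI crawlers.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

# Classic search engines stay welcome.
User-agent: *
Disallow:
```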

Comment by kccqzy a day ago

The fact that AI bots seem like they were cobbled together with the least effort possible might be related to the behavior you're seeing. The people responsible for these bots may have zero experience writing an old-school search engine bot and no idea of the kinds of edge cases they will encounter. They might even be turning to LLMs to write their bot code, which is not exactly a recipe for success.