Comment by unsnap_biceps

Comment by unsnap_biceps 13 hours ago

I believe that a number of AI bots only respect robots.txt entries that explicitly name their static user agent. They ignore wildcard user-agent entries.

That counts as "barely" respecting it, imho.

I found this out after OpenAI was decimating my site and ignoring the wildcard deny-all. I had to add entries specifically for their three bots to get them to stop.
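For anyone who wants to reproduce the workaround, here is a rough sketch in Python. It builds a robots.txt with both the wildcard deny and one explicit deny per bot, then checks it with the standard library's `urllib.robotparser`. The three agent names in `AI_AGENTS` are my assumption about which OpenAI crawlers are meant (the comment above does not name them), and `build_robots_txt` and `is_allowed` are hypothetical helper names, so check the vendor's current documentation before copying this.

```python
from urllib.robotparser import RobotFileParser

# Assumed agent names, not taken from the comment above; verify before use.
AI_AGENTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot"]


def build_robots_txt(agents):
    """Return a robots.txt body: a wildcard deny plus one explicit deny per agent."""
    blocks = ["User-agent: *\nDisallow: /"]
    for name in agents:
        blocks.append(f"User-agent: {name}\nDisallow: /")
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt, agent, url="https://example.com/page"):
    """Ask a spec-following parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(AI_AGENTS)
print(robots_txt)

# A parser that follows the rules is already blocked by the wildcard alone;
# the explicit entries only matter for crawlers that skip the wildcard.
wildcard_only = "User-agent: *\nDisallow: /\n"
print(is_allowed(wildcard_only, "GPTBot"))
print(is_allowed(robots_txt, "GPTBot"))
```

Note that this only shows what a well-behaved parser would conclude. It cannot tell you what a given crawler actually does, which is the whole problem.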

joecool1029 12 hours ago

Even some non-profits ignore it now. The Internet Archive stopped respecting it years ago: https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

SR2Z 11 hours ago

IA actually has technical and moral reasons to ignore robots.txt. Namely, they want to circumvent this stuff because their goal is to archive EVERYTHING.

- prinny_ 9 hours ago
  
  Isn’t this a weak argument? OpenAI could also say their goal is to learn everything, feed it to AI, advance humanity etc etc.
  
  compootr 9 hours ago
  
  OAI is using others' work to resell it in models. IA uses it to preserve the history of the web.
  There is a case to be made about the value of the traffic you'll get from OAI search, though...
  
- amarcheschi 10 hours ago
  
  I also don't think they hit servers repeatedly so much
  
AnonC 7 hours ago

As I recall, this is outdated information. The Internet Archive does respect robots.txt and will remove a site from its archive based on robots.txt. I did this a few years after your linked blog post to get an inconsequential site removed from archive.org.

noman-land 12 hours ago

This is highly annoying and rude. Is there a complete list of all known bots and crawlers?

jsheard 12 hours ago

https://darkvisitors.com/agents
https://github.com/ai-robots-txt/ai.robots.txt
