SR2Z 6 months ago

IA actually has technical and moral reasons to ignore robots.txt. Namely, they want to circumvent this stuff because their goal is to archive EVERYTHING.

  • prinny_ 6 months ago

    Isn’t this a weak argument? OpenAI could also say their goal is to learn everything, feed it to AI, advance humanity etc etc.

    • compootr 6 months ago

      OAI is using others' work to resell it in models. IA uses it to preserve the history of the web.

      there is a case to be made about the value of the traffic you'll get from oai search though...

    • SR2Z 6 months ago

      It does depend a lot on how you feel about IA's integrity :P

  • amarcheschi 6 months ago

    I also don't think they hit servers repeatedly all that much

AnonC 6 months ago

As I recall, this is outdated information. Internet Archive does respect robots.txt and will remove a site from its archive based on robots.txt. I did this a few years after your linked blog post to get an inconsequential site removed from archive.org.