Comment by bogwog 4 days ago

I wonder if the best solution is still just to create link mazes with garbage text like this: https://blog.cloudflare.com/ai-labyrinth/

It won't stop the crawlers immediately, but it might lead to an overhyped and underwhelming LLM release from a big-name company, and force them to reassess their crawling strategy going forward?
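For anyone curious what a link maze boils down to, here's a minimal sketch. It is not Cloudflare's actual implementation; the word list, `maze_page` name, and parameters are all made up for illustration. Each URL path seeds a private RNG, so the same path always returns the same filler text and the same child links, which makes the maze look stable to a crawler while being effectively infinite.

```python
import hashlib
import random

# Hypothetical vocabulary; a real trap would use far more convincing filler text.
WORDS = ["river", "quantum", "ledger", "orchid", "protocol", "velvet",
         "harbor", "matrix", "copper", "signal", "meadow", "vector"]


def maze_page(path: str, links_per_page: int = 5, words_per_page: int = 40) -> dict:
    """Deterministically generate filler text and outgoing links for `path`."""
    # Derive a seed from the path so each URL always renders identically.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(words_per_page))
    # Every child link extends the current path with a random 8-hex-digit token,
    # so following links only ever leads deeper into the maze.
    links = [f"{path.rstrip('/')}/{rng.getrandbits(32):08x}"
             for _ in range(links_per_page)]
    return {"text": text, "links": links}


page = maze_page("/maze")
print(page["links"][:2])
```

Nothing is stored server-side; the page is recomputed from the URL on each request, which is what keeps a trap like this cheap to serve.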

ronsor 3 days ago

That won't work, because garbage data is filtered after the full dataset is collected anyway. Every LLM trainer these days knows that curation is key.
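To make the "filtered after collection" point concrete, here's a rough sketch of two cheap post-crawl heuristics: exact-duplicate removal by hash and a stop-word ratio check that rejects word salad. The thresholds, stop-word list, and `keep_document` name are assumptions for illustration; real curation pipelines layer on much more (near-duplicate detection, model-based quality classifiers, and so on). Fluent machine-generated text would sail through a check like this one.

```python
import hashlib

# Hypothetical stop-word list; natural English prose is full of these, random word salad isn't.
STOPWORDS = {"the", "be", "to", "of", "and", "that", "have", "with", "a", "is", "in", "it"}


def keep_document(text: str, seen_hashes: set,
                  min_words: int = 20, min_stopword_ratio: float = 0.05) -> bool:
    """Return True if `text` passes the heuristics; records kept docs in `seen_hashes`."""
    words = text.lower().split()
    if len(words) < min_words:
        return False  # too short to be useful
    ratio = sum(w in STOPWORDS for w in words) / len(words)
    if ratio < min_stopword_ratio:
        return False  # almost no function words: likely gibberish
    # Exact-duplicate check on whitespace-normalized text.
    digest = hashlib.sha256(" ".join(words).encode()).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True
```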

  • bogwog 3 days ago

    If the "garbage data" is AI generated, it'll be hard or impossible to filter.

creatonez 3 days ago

Crawlers already know how to stop crawling recursive or otherwise excessive/suspicious content. They've dealt with this problem long before LLM-related crawling.
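As an illustration of the kind of guard rails crawlers have long used against traps like infinite calendars, here's a small sketch: a URL depth limit, a check for path segments repeating too often, and a per-host page budget. The function names and limits are hypothetical, not taken from any particular crawler.

```python
from collections import Counter
from urllib.parse import urlparse


def looks_like_trap(url: str, max_depth: int = 8, max_segment_repeats: int = 2) -> bool:
    """Flag URLs whose path is suspiciously deep or repetitive."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True  # e.g. an endlessly nested link maze
    counts = Counter(segments)
    # e.g. /cal/next/next/next from an infinite "next month" link
    return bool(counts) and max(counts.values()) > max_segment_repeats


class HostBudget:
    """Cap how many pages are fetched from any single host."""

    def __init__(self, max_pages_per_host: int = 1000):
        self.max_pages = max_pages_per_host
        self.fetched = Counter()

    def allow(self, url: str) -> bool:
        host = urlparse(url).netloc
        if self.fetched[host] >= self.max_pages:
            return False
        self.fetched[host] += 1
        return True
```

A per-host budget is the blunt backstop: even a maze with no repeating pattern stops costing the crawler anything once the host's quota runs out.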