Comment by ronsor
That won't work, because garbage data is filtered after the full dataset is collected anyway. Every LLM trainer these days knows that curation is key.
If the "garbage data" is AI-generated, it'll be hard or impossible to filter.