Comment by ronsor 3 days ago

That won't work, because garbage data is filtered after the full dataset is collected anyway. Every LLM trainer these days knows that curation is key.

bogwog 3 days ago

If the "garbage data" is AI generated, it'll be hard or impossible to filter.