Comment by pton_xd
That's pretty much the state of things today. Frontier LLMs are already trained on all publicly available human-generated text, and they are already being trained heavily on synthetic data to improve at verifiable tasks, e.g. coding.