Comment by jetrink 2 days ago

That sounds like a reasonable prediction to me if the LLM makers do nothing in response. However, I'd bet coding is the easiest domain to generate synthetic training data for. You could have an LLM generate 100k solutions to 10k programming problems in the target language and throw away the ones that don't pass automated tests. Then have humans grade the solutions that do pass and use the best ones for future training. Repeat until you have a corpus of high-quality code.
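To make the "throw away what doesn't pass the tests" step concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the hardcoded CANDIDATES list stands in for LLM output, and the names passes_tests and filter_candidates are made up. A real pipeline would run candidates in a sandbox with timeouts rather than calling exec directly, and the human grading step is not modeled at all.

```python
from typing import Any, Dict, List, Tuple

# One test case is (positional args, expected return value).
TestCase = Tuple[Tuple[Any, ...], Any]


def passes_tests(source: str, func_name: str, cases: List[TestCase]) -> bool:
    """Return True if `source` defines `func_name` and it passes every case.

    WARNING: exec on untrusted code is unsafe and has no timeout here.
    This is only acceptable for a toy demonstration.
    """
    namespace: Dict[str, Any] = {}
    try:
        exec(source, namespace)          # may raise SyntaxError, NameError, ...
        func = namespace[func_name]      # KeyError if the name is missing
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Any failure to compile, load, or run counts as a failed candidate.
        return False


def filter_candidates(
    candidates: List[str], func_name: str, cases: List[TestCase]
) -> List[str]:
    """Keep only the candidate solutions that pass all automated tests."""
    return [src for src in candidates if passes_tests(src, func_name, cases)]


# Hypothetical stand-in for solutions sampled from an LLM.
CANDIDATES = [
    "def add(a, b):\n    return a + b\n",   # correct
    "def add(a, b):\n    return a - b\n",   # wrong result
    "def add(a, b)\n    return a + b\n",    # syntax error
    "def plus(a, b):\n    return a + b\n",  # wrong function name
]
CASES: List[TestCase] = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

survivors = filter_candidates(CANDIDATES, "add", CASES)
print(f"{len(survivors)} of {len(CANDIDATES)} candidates passed")
```

The survivors are what you would hand to human graders. Note that the filter is only as good as the test cases, so weak tests let wrong solutions through.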