Comment by energy123
I don't know if that's a big blocker now that we have abundant synthetic data from an RL training loop, where language-specific things like syntax can be learned without any human examples. Human code may still be relevant for learning best practices, but even then it's not clear that can't happen via transfer learning from other languages; it might even emerge naturally if the synthetic problems and rewards are designed well enough. It's still very early days (7-8 months since o1-preview), so drawing conclusions about a 2-year time frame from current difficulties would be questionable.
Consider a language designed only FOR an LLM, and a corresponding LLM designed only FOR that language. You'd imagine there'd be dedicated single tokens for common things like "class" or "def" or "import", which would allow a more efficient representation. There's a lot to think about ...
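To make the efficiency point concrete, here's a toy sketch (the vocabularies and the greedy longest-match tokenizer are hypothetical, just for illustration) comparing how many tokens the same snippet costs when keywords are split into sub-word fragments versus given dedicated single tokens:

```python
# Sub-word style vocabulary: keywords get split into fragments (hypothetical).
subword_vocab = {"cl": 0, "ass": 1, "d": 2, "ef": 3, "imp": 4, "ort": 5, " ": 6}

# LLM-oriented vocabulary: each keyword is one dedicated token (hypothetical).
keyword_vocab = {"class": 0, "def": 1, "import": 2, " ": 3}

def count_tokens(text, vocab):
    """Greedy longest-match tokenization; returns the number of tokens used."""
    count = 0
    i = 0
    while i < len(text):
        for size in range(len(text) - i, 0, -1):  # try longest match first
            if text[i:i + size] in vocab:
                count += 1
                i += size
                break
        else:
            raise ValueError(f"untokenizable text at position {i}")
    return count

snippet = "import class def"
print(count_tokens(snippet, subword_vocab))  # → 8 (fragments cost more tokens)
print(count_tokens(snippet, keyword_vocab))  # → 5 (one token per keyword)
```

Fewer tokens per construct means more program fits in the same context window, which is one way a language co-designed with its LLM could pay off.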
It’s just as questionable to declare victory on the strength of a few early wins and assume that time will fix everything.
Lots of people predicted that we wouldn’t have a single human-driven vehicle by now. But many of the issues turned out to be a lot more difficult to solve than previously thought!