Comment by kryptiskt 13 hours ago

I'd think LLMs would be more dependent on compatibility than humans, since they need training data in bulk. Humans can adapt with a book and a list of language changes, and a lot of grumbling about newfangled things. But an LLM isn't going to produce Python++ code without having been trained on a corpus of such code.

johnisgood 13 hours ago

It should work if you feed it the data yourself, or at the very least the documentation. I do this with niche languages and it works more or less, but you have to pay attention to your context length, and of course if you start a new chat, you are back to square one.
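
A minimal sketch of that workflow, assuming a tiktoken-style tokenizer for the counting; the file name, token budget, and task are all placeholders:

  import tiktoken  # pip install tiktoken

  CONTEXT_BUDGET = 8000  # hypothetical token budget reserved for the docs
  enc = tiktoken.get_encoding("cl100k_base")

  # Placeholder path: whatever reference material exists for the language.
  with open("niche_lang_docs.md") as f:
      docs = f.read()

  tokens = enc.encode(docs)
  if len(tokens) > CONTEXT_BUDGET:
      # Naive truncation; chunking the docs and retrieving per query scales better.
      docs = enc.decode(tokens[:CONTEXT_BUDGET])

  prompt = (
      "You are writing code in NicheLang. Reference documentation:\n\n"
      + docs
      + "\n\nTask: write a function that parses a config file."
  )

Re-sending this preamble is also why a fresh chat puts you back to square one: the stuffed context is gone.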

energy123 10 hours ago

I don't know if that's a big blocker now that we have abundant synthetic data from RL training loops, where language-specific things like syntax can be learned without any human examples. Human code may still be relevant for learning best practices, but even then it's not clear that can't happen via transfer learning from other languages; it might even emerge naturally if the synthetic problems and rewards are designed well enough. It's still very early days (7-8 months since the o1 preview), so drawing conclusions about a 2-year time frame from current difficulties would be questionable.
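
To make that concrete, here is a toy version of such a loop: the only reward signal comes from the parser and a machine-generated test, with no human-written reference code anywhere. The policy is stubbed out with random sampling; in a real setup it would be the model being trained, updated toward high-reward outputs.

  import random

  def reward(source: str) -> float:
      # Signal 1: does it parse? Syntax is learnable from this alone.
      try:
          compile(source, "<candidate>", "exec")
      except SyntaxError:
          return 0.0
      # Signal 2: does it satisfy an auto-generated spec?
      env = {}
      try:
          exec(source, env)  # in practice, only ever inside a sandbox
          assert env["add"](2, 3) == 5
          return 1.0
      except Exception:
          return 0.1  # parses but fails the spec

  def sample_candidate() -> str:
      # Stand-in for the policy; a real loop samples from the model.
      return random.choice([
          "def add(a, b): return a + b",
          "def add(a, b): return a - b",
          "def add(a b): return a + b",  # syntax error
      ])

  for step in range(10):
      candidate = sample_candidate()
      r = reward(candidate)
      # ...policy update toward high-reward candidates omitted...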

Consider a language designed only FOR an LLM, and a corresponding LLM designed only FOR that language. You'd imagine there'd be dedicated single tokens for common things like "class" or "def" or "import", which would allow a more efficient representation. There's a lot to think about ...
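
As a toy illustration of the keyword-per-token idea (everything here is made up, and note that real BPE vocabularies often already merge common keywords like "def" into single tokens):

  # Reserve one ID per language keyword; fall back to byte-level tokens.
  KEYWORDS = ["class", "def", "import", "return", "if", "else"]
  KEYWORD_IDS = {kw: i for i, kw in enumerate(KEYWORDS)}
  BYTE_OFFSET = len(KEYWORDS)  # byte tokens start after the keyword block

  def tokenize(source: str) -> list[int]:
      ids = []
      for word in source.split():  # toy: whitespace is lost on decode
          if word in KEYWORD_IDS:
              ids.append(KEYWORD_IDS[word])  # whole keyword -> one token
          else:
              ids.extend(BYTE_OFFSET + b for b in word.encode("utf-8"))
      return ids

  print(tokenize("def main"))  # [1, 115, 103, 111, 116]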

  • jurgenaut23 10 hours ago

    It’s just as questionable to declare victory because of a few early wins and to assume that time will fix everything.

    Lots of people had predicted that we wouldn’t have a single human-driven vehicle by now. But many issues turned out to be a lot more difficult to solve than previously thought!

  • LtWorf 7 hours ago

    How would you debug a programming language made for LLMs? And why not make an LLM that can output gcc intermediate representation directly, then?

    • energy123 7 hours ago

      You wouldn't; this would be a bet that humans won't be in the loop at all. If something needs debugging, the LLM would do the debugging.

      • ModernMech 6 hours ago

        One has to wonder: why would there be any bugs at all if the LLM could fix them? Given Kernighan's Law, does this mean the LLM can't debug the bugs it makes?

        My feeling is that unless you're using a formal language, you're expressing an ambiguous program, and that makes it inherently buggy. How else is the LLM supposed to infer your intended meaning? That means programmers will always be part of the loop, unless you're fine just letting the LLM guess (a small example follows below).

          Kernighan's Law: Debugging is twice as hard as writing the code in the first place.
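
        A tiny illustration of the ambiguity point: the informal spec "round x to the nearest integer" already admits two defensible programs, and nothing in the English tells the LLM which one you meant.

          # Python 3 rounds halves to even ("banker's rounding"):
          print(round(2.5))  # 2

          # Many people actually mean round-half-up:
          from decimal import Decimal, ROUND_HALF_UP
          print(Decimal("2.5").quantize(Decimal("1"), ROUND_HALF_UP))  # 3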