Comment by kryptiskt 13 hours ago

I'd think LLMs would be more dependent on compatibility than humans, since they need training data in bulk. Humans can adapt with a book and a list of language changes, and a lot of grumbling about newfangled things. But an LLM isn't going to produce Python++ code without having been trained on a corpus of such code.

johnisgood 13 hours ago

It should work if you feed it the data yourself, or at the very least the documentation. I do this with niche languages and it works more or less, but you have to pay attention to your context length, and of course if you start a new chat, you are back to square one.
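
A minimal sketch of that workflow, assuming a tiktoken-style tokenizer for the counting; the file name, token budget, and task are all placeholders:

  import tiktoken  # pip install tiktoken

  CONTEXT_BUDGET = 8000  # hypothetical token budget reserved for the docs
  enc = tiktoken.get_encoding("cl100k_base")

  # Placeholder path: whatever reference material exists for the language.
  with open("niche_lang_docs.md") as f:
      docs = f.read()

  tokens = enc.encode(docs)
  if len(tokens) > CONTEXT_BUDGET:
      # Naive truncation; chunking the docs and retrieving per query scales better.
      docs = enc.decode(tokens[:CONTEXT_BUDGET])

  prompt = (
      "You are writing code in NicheLang. Reference documentation:\n\n"
      + docs
      + "\n\nTask: write a function that parses a config file."
  )

Re-sending this preamble is also why a fresh chat puts you back to square one: the stuffed context is gone.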

energy123 10 hours ago

I don't know if that's a big blocker now that we have abundant synthetic data from RL training loops, where language-specific things like syntax can be learned without any human examples. Human code may still be relevant for learning best practices, but even then it's not clear that can't happen via transfer learning from other languages; it might even emerge naturally if the synthetic problems and rewards are designed well enough. It's still very early days (7-8 months since the o1 preview), so drawing conclusions about a 2-year time frame from current difficulties would be questionable.
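
To make that concrete, here is a toy version of such a loop: the only reward signal comes from the parser and a machine-generated test, with no human-written reference code anywhere. The policy is stubbed out with random sampling; in a real setup it would be the model being trained, updated toward high-reward outputs.

  import random

  def reward(source: str) -> float:
      # Signal 1: does it parse? Syntax is learnable from this alone.
      try:
          compile(source, "<candidate>", "exec")
      except SyntaxError:
          return 0.0
      # Signal 2: does it satisfy an auto-generated spec?
      env = {}
      try:
          exec(source, env)  # in practice, only ever inside a sandbox
          assert env["add"](2, 3) == 5
          return 1.0
      except Exception:
          return 0.1  # parses but fails the spec

  def sample_candidate() -> str:
      # Stand-in for the policy; a real loop samples from the model.
      return random.choice([
          "def add(a, b): return a + b",
          "def add(a, b): return a - b",
          "def add(a b): return a + b",  # syntax error
      ])

  for step in range(10):
      candidate = sample_candidate()
      r = reward(candidate)
      # ...policy update toward high-reward candidates omitted...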

Consider a language designed only FOR an LLM, and a corresponding LLM designed only FOR that language. You'd imagine there'd be dedicated single tokens for common things like "class" or "def" or "import", which would allow a more efficient representation. There's a lot to think about ...
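
As a toy illustration of the keyword-per-token idea (everything here is made up, and note that real BPE vocabularies often already merge common keywords like "def" into single tokens):

  # Reserve one ID per language keyword; fall back to byte-level tokens.
  KEYWORDS = ["class", "def", "import", "return", "if", "else"]
  KEYWORD_IDS = {kw: i for i, kw in enumerate(KEYWORDS)}
  BYTE_OFFSET = len(KEYWORDS)  # byte tokens start after the keyword block

  def tokenize(source: str) -> list[int]:
      ids = []
      for word in source.split():  # toy: whitespace is lost on decode
          if word in KEYWORD_IDS:
              ids.append(KEYWORD_IDS[word])  # whole keyword -> one token
          else:
              ids.extend(BYTE_OFFSET + b for b in word.encode("utf-8"))
      return ids

  print(tokenize("def main"))  # [1, 115, 103, 111, 116]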

  • jurgenaut23 10 hours ago

    It’s just as questionable to declare victory because of a few early wins and to assume that time will fix everything.

    Lots of people had predicted that we wouldn’t have a single human-driven vehicle by now. But many issues turned out to be a lot more difficult to solve than previously thought!

  • LtWorf 7 hours ago

    How would you debug a programming language made for LLMs? And why not make an LLM that can output gcc intermediate representation directly, then?

    • energy123 7 hours ago

      You wouldn't; this would be a bet that humans won't be in the loop at all. If something needs debugging, the LLM would do the debugging.

      • ModernMech 6 hours ago

        One has to wonder: why would there be any bugs at all if the LLM could fix them? Given Kernighan's Law, does this mean the LLM can't debug the bugs it makes?

        My feeling is that unless you're using a formal language, you're expressing an ambiguous program, and that makes it inherently buggy. How else is the LLM supposed to infer your intended meaning? That means programmers will always be part of the loop, unless you're fine just letting the LLM guess (a small example follows below).

          Kernighan's Law: Debugging is twice as hard as writing the code in the first place.
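
        A tiny illustration of the ambiguity point: the informal spec "round x to the nearest integer" already admits two defensible programs, and nothing in the English tells the LLM which one you meant.

          # Python 3 rounds halves to even ("banker's rounding"):
          print(round(2.5))  # 2

          # Many people actually mean round-half-up:
          from decimal import Decimal, ROUND_HALF_UP
          print(Decimal("2.5").quantize(Decimal("1"), ROUND_HALF_UP))  # 3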