Comment by grumbel 2 days ago

I don't think the probabilistic prediction is a problem. The problem with current LLMs is that they are limited to doing "System 1" thinking, only giving you a fast, instinctive response to a question. While that works great for a lot of small problems, it completely falls apart on any larger task that requires multiple steps or backtracking. "System 2" thinking is completely missing, as is the ability to just self-iterate on their own output.

Reasoning models are trying to address that now; monologuing in token space still feels more like a hack than a real solution, but it does improve their performance a good bit nonetheless.

In practical terms, all this means is that current LLMs still need a hell of a lot of hand-holding and fail at anything more complex, even if their "System 1" thinking is good enough for the task (e.g. they can write Tetris in 30 seconds no problem, but they can't write SuperMarioBros at all, since that has numerous levels that would blow past the context window size).

fragmede 2 days ago

Give it a filesystem, like you can with Claude computer use, and you can have it make and forget memories to adapt to a limited context window.
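
A minimal sketch of that idea (the directory and helper names are my own illustration, not Claude's actual tool API): let the model stash notes on disk and pull them back in only when needed, so only a small index has to stay in the context window.

    # Hypothetical filesystem-backed "memories" for an agent with a small context window.
    from pathlib import Path

    MEMORY_DIR = Path("agent_memory")  # illustrative location, not a real Claude path
    MEMORY_DIR.mkdir(exist_ok=True)

    def save_memory(name: str, content: str) -> None:
        """Persist a note to disk instead of keeping it in the prompt."""
        (MEMORY_DIR / f"{name}.txt").write_text(content)

    def recall_memory(name: str) -> str | None:
        """Load a note back into context only when it is needed."""
        path = MEMORY_DIR / f"{name}.txt"
        return path.read_text() if path.exists() else None

    def forget_memory(name: str) -> None:
        """Drop a note the agent no longer needs."""
        (MEMORY_DIR / f"{name}.txt").unlink(missing_ok=True)

    def list_memories() -> list[str]:
        """Cheap index the model can keep in-context to decide what to recall."""
        return sorted(p.stem for p in MEMORY_DIR.glob("*.txt"))

    if __name__ == "__main__":
        save_memory("level_1_layout", "Pipes at x=200 and x=450; gap after the second pipe.")
        print(list_memories())                  # ['level_1_layout']
        print(recall_memory("level_1_layout"))  # full note, loaded on demand
        forget_memory("level_1_layout")

The point is just that the bulk of the state lives outside the prompt; the model only ever sees the short list of memory names plus whatever it chooses to recall.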