Comment by HarHarVeryFunny

Yes, it's a bit shocking to realize that all LLMs are doing is predicting the next word (token) based on samples in the training data. But the Transformer is powerful enough to do a fantastic job of that prediction (which you can think of as selecting which training sample(s) to copy from), which is why the LLM, just a dumb function, appears as smart as the human training data it is copying.
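
To make "predicting the next word from training samples" concrete, here's a toy sketch. It is a deliberate simplification: it uses plain bigram counts rather than a Transformer, and the corpus and function names (train_bigram, predict_next, generate) are made up for illustration. It only shows the shape of the idea, not how a real LLM is built.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for every word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_tokens=5):
    """Greedy generation: repeatedly append the predicted next word."""
    out = [start.lower()]
    for _ in range(max_tokens):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


# Hypothetical three-sentence "training set".
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram(corpus)

print(predict_next(model, "sat"))            # on
print(generate(model, "dog", max_tokens=3))  # dog sat on the
```

Note that "dog sat on the" is stitched together from pieces of the training sentences. A real LLM conditions on the whole preceding context rather than one word, and samples from a probability distribution rather than always taking the top choice, but the generation loop has the same outline.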

The Math Olympiad results are impressive, but at the end of the day this is just the same next-word prediction, in this case fine-tuned by additional training on solutions to math problems. That training teaches the LLM which next-word predictions (i.e. outputs) add up to solution steps that lead to correct solutions for the problems in the training data. Due to the logical nature of math, the reasoning/solution steps that worked for the training problems will often work for the new problems it is then tested on (Math Olympiad). But most reasoning outside of logical domains like math and programming isn't so clear-cut, so this approach of training on reasoning examples isn't necessarily going to help LLMs get better at reasoning about more useful real-world problems.