Comment by tsurba
I don’t believe so. I think all of the important factors that each need to be scaled to advance significantly in the LLM paradigm are at or near the end of the steep part of the sigmoid:
1) useful training data available on the internet
2) the number of humans "manually" creating more training data
3) parameter scaling
4) "easy" algorithmic inventions
5) available and buildable compute
"Just" needing a few more algorithmic inventions to keep the graphs exponential is a cop-out. It is already obvious that scaling parameters and compute alone is not enough.
I personally predict that scaling LLMs to solve all physical tasks (e.g. cleaning robots) or all intellectual pursuits (they suck at multiplication) will not work out.
We will get better specialized tools by collecting data from specific, constrained, high-economic-value tasks and automating them, but scaling a (multimodal) LLM to solve everything in a single model will not be economically viable. We will get more natural interfaces for many tasks.
This is how I think right now as an ML researcher; it will be interesting to see how wrong I was in two years.
EDIT: an addition about the latest algorithmic advances:
- DeepSeek-style GRPO requires a ladder of scored problems of progressively increasing, appropriate difficulty to get useful gradients (see the sketch at the end of this comment). For open-ended problems (which most interesting ones are) we have no such ladders, and it doesn’t work. What it is good for, in particular, is learning to generate code for LeetCode-style problems that come with a good number of well-made unit tests.
- Test-time inference just adds an insane amount of extra compute after training to brute-force double-check the sanity of answers (again, see the sketch below).
Neither will keep the graphs exponential.
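To make the GRPO point concrete, here is a minimal toy sketch of the group-relative advantage computation (my own illustration, not DeepSeek’s implementation; the reward values are made up): completions sampled for the same prompt are scored and normalized against each other, so if a problem is far too hard (or too easy) for the current model, every completion gets the same score and the advantages, and hence the gradient, collapse to zero. That is why you need problems at an appropriate difficulty.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: each completion's reward is normalized
    against the other completions sampled for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Problem at an appropriate difficulty: some sampled solutions pass the
# unit tests and some fail -> non-zero advantages, a useful gradient.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))   # roughly [ 1, -1,  1, -1]

# Problem far beyond (or far below) the model: every completion gets the
# same score -> all advantages ~0, nothing to learn from.
print(grpo_advantages([0.0, 0.0, 0.0, 0.0]))   # [0, 0, 0, 0]
```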
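And the test-time compute point, again as a toy: the simplest form is self-consistency / best-of-N sampling, where `noisy_model` below is a made-up stand-in for any stochastic LLM call. Accuracy on some tasks goes up, but only because inference now costs N times as much; the model itself has not learned anything new.

```python
from collections import Counter
import random

def self_consistency(generate, prompt, n=32):
    """Naive test-time scaling: sample the same question n times and
    return the majority answer. Accuracy can improve on some tasks,
    but inference cost grows linearly with n; the model is unchanged."""
    answers = [generate(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Dummy stand-in for a stochastic LLM call: right answer 60% of the time.
def noisy_model(prompt):
    return "42" if random.random() < 0.6 else random.choice(["41", "43", "44"])

print(self_consistency(noisy_model, "what is 6 * 7?", n=32))  # usually "42"
```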