Comment by hedgehog 4 days ago
The context some commenters here seem to be missing is that Marcus is arguing that spending another $100B on pure scaling (more parameters, more data, more compute) is unlikely to repeat the qualitatively massive improvement we saw between, say, 2017 and 2022. We see some evidence that this is true in the shift toward what I'd categorize as system integration approaches: RAG, step-by-step reasoning, function calling, "agents", etc. The theory and engineering are getting steadily better, as evidenced by the rapidly improving capability of models down in the 1-10B parameter range, but we don't see the same radical improvements out of ChatGPT and its peers.
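To make "system integration" concrete, here is a toy sketch of a RAG loop, the simplest of those approaches. Everything in it is a hypothetical stand-in: the word-overlap retriever substitutes for real vector search, and generate is a placeholder for a call to an actual model, not any vendor's API.

    # Toy retrieval-augmented generation (RAG) loop: retrieve relevant
    # text, prepend it to the prompt, then ask the model to answer.
    from collections import Counter

    CORPUS = [
        "Scaling laws relate model loss to parameters, data, and compute.",
        "RAG retrieves documents and prepends them to the model's prompt.",
        "Function calling lets a model invoke external tools mid-generation.",
    ]

    def score(query: str, doc: str) -> int:
        # Crude word-overlap similarity; a real system would use embeddings.
        q, d = Counter(query.lower().split()), Counter(doc.lower().split())
        return sum((q & d).values())

    def retrieve(query: str, k: int = 1) -> list[str]:
        # Return the k corpus documents that best match the query.
        return sorted(CORPUS, key=lambda doc: score(query, doc), reverse=True)[:k]

    def generate(prompt: str) -> str:
        # Placeholder for a real LLM call; it just echoes its input.
        return f"[model answer conditioned on]\n{prompt}"

    def rag_answer(query: str) -> str:
        # Glue code: retrieval output becomes context in the prompt.
        context = "\n".join(retrieve(query))
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        return generate(prompt)

    print(rag_answer("How does RAG improve a model's answers?"))

The point is that the capability gain here comes from the plumbing around the model rather than from making the model itself bigger, which is why the progress on small models is interesting.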
I don't see how that is evidence for the claim. We do all these things because they make existing models work better, but a larger model with RAG etc. is still better than a smaller one with RAG, and everyone keeps working on larger models.