Comment by __MatrixMan__ 2 days ago
Improved benchmarks are undeniably an improvement, but the bottleneck isn't the models anymore; it's the context engineering necessary to harness them. The more time and effort we put into our benchmarking systems, the better we're able to differentiate between models. But when you take an allegedly smart one and try to do something real with it, it behaves like a dumb one again, because you haven't put as much work into the harness for the actual task as you did into the benchmark suite.
The knowledge necessary to do real work with these things is still mostly locked up in the humans who have traditionally done that work.
The systems around the LLM will get built out. But do you think it will take 50 years, like you said before?
I’m thinking 5 years at most.
The key is that the LLMs get smart enough.