Comment by MatrixMan

Comment by __MatrixMan__ 2 days ago

It seems like it's approaching a horizontal asymptote to me, or is at the very least concave down. You might be describing a state 50 years from now.

aurareturn 2 days ago

It seems like progress is accelerating, not slowing down.

ARC AGI 2: https://x.com/poetiq_ai/status/2003546910427361402

METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

__MatrixMan__ 2 days ago

Better benchmark scores are undeniably an improvement, but the bottleneck isn't the models anymore; it's the context engineering necessary to harness them. The more time and effort we put into our benchmarking systems, the better we're able to differentiate between models. But when you take an allegedly smart one and try to do something real with it, it behaves like a dumb one again, because you haven't put as much work into the harness for the actual task as you did into the benchmark suite.
The knowledge necessary to do real work with these things is still mostly locked up in the humans that have traditionally done that work.

- aurareturn a day ago
  
  The systems around the LLM will get built out. But do you think it will take 50 years to build out like you said before?
  I’m thinking 5 years at most.
  The key is that the LLMs get smart enough.
  
  __MatrixMan__ 19 hours ago
  
  The more I think about it, the less likely I think it is that "all code written via LLM" will happen at all.
  I use LLMs to generate systems that interpret code that I use to express my wishes, but I don't think it would be desirable to express those wishes in natural language all of the time.
  
  aurareturn 5 hours ago
  
  That's why people don't think software engineers as a profession will disappear. It'll just change.
  
anthonypasq 2 days ago

Sonnet 3.7 (the first model truly capable of any sort of reasonable agentic coding at all) was released 10 months ago, and Opus 4.5 exists today!

rabf 2 days ago

To add to this: the tooling or `harness` around the models has vastly improved as well. You can get far better results with older or smaller models today than you could 10 months ago.

- theshrike79 17 hours ago
  
  The harnesses are where most progress is made at the moment. There are some definite differences in the major models as to what kind of code they prefer, but I feel the harnesses make the biggest difference.
  Copilot + Sonnet is a complete idiot at times, while Claude Code + Sonnet is pretty good.
  