Comment by anon373839 4 days ago
Much of these gains can be attributed to better tooling and harnesses around the models. Yes, the models also had to be retrained to work with the new tooling, but that doesn’t mean there was a step change in their general “intelligence” or capabilities. And sure enough, I’m seeing the same old flaws as always: frontier models fabricating info not present in the context, having blindness to what is present, getting into loops, failing to follow simple instructions…
> Much of these gains can be attributed to better tooling and harnesses around the models.
This isn't the case.
Take Claude Code and use it with Haiku, Sonnet, and Opus. The harness is identical in all three cases, yet there's a huge difference in the capabilities of the models.
> And sure enough, I’m seeing the same old flaws as always: frontier models fabricating info not present in the context, having blindness to what is present, getting into loops, failing to follow simple instructions…
I don't know which frontier models you are using, but Opus and Codex 5.2 never do these things for me.