Comment by nl

> Much of these gains can be attributed to better tooling and harnesses around the models.

This isn't the case.

Take Claude Code and use it with Haiku, Sonnet and Opus. The harness is the same in each case, yet there's a huge difference in the capabilities of the models.

> And sure enough, I’m seeing the same old flaws as always: frontier models fabricating info not present in the context, having blindness to what is present, getting into loops, failing to follow simple instructions…

I don't know which frontier models you are using, but Opus and Codex 5.2 never do these things for me.