Comment by jumploops 9 hours ago

Interesting, the new model uses a different prompt in Codex CLI that's ~half the size (10KB vs. 23KB) of the previous prompt[0][1].

SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (via internal refactor benchmark 33.9% -> 51.3%).

As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it from scratch, dropping crucial details. My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.

Additionally, they claim the new model is more steerable (both with AGENTS.md and generally).

In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!

[0]https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...

[1]https://github.com/openai/codex/blob/main/codex-rs/core/prom...

(comment reposted from other thread)

faangguyindia an hour ago

I do not trust SWE-bench. Here I am using Gemini 2.5 Pro and one-shotting most features: https://www.reddit.com/r/ChatGPTCoding/comments/1nh7bu1/3_ph...


robotswantdata 8 hours ago

Saw the same behaviour.

What worked was getting it to first write a detailed implementation plan for a “junior contractor”, then attempt it in phases (clearing the task window each time), and telling it to use /tmp to copy files, transform them, and then update the originals.
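A minimal sketch of that copy/transform/write-back step in plain Python, outside any agent. `transform_via_tmp` and the import-rename below are hypothetical; the scratch directory is whatever `tempfile` chooses (usually /tmp on Linux). The point is that the original is never deleted and regenerated: it is only overwritten with an edited copy of itself, and only if the transform succeeds.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def transform_via_tmp(original: Path, transform: Callable[[str], str]) -> None:
    """Hypothetical helper: edit a scratch copy, then overwrite the original."""
    with tempfile.TemporaryDirectory(prefix="refactor-") as scratch:
        copy = Path(scratch) / original.name
        # Start from the complete existing file instead of rewriting it from memory.
        shutil.copy2(original, copy)
        # If transform() raises, we never reach the write-back below,
        # so the original stays untouched and the scratch dir is cleaned up.
        copy.write_text(transform(copy.read_text(encoding="utf-8")), encoding="utf-8")
        shutil.copy2(copy, original)


# Usage (hypothetical module names): point imports at the new package.
# transform_via_tmp(Path("mod.py"),
#                   lambda s: s.replace("internal_lib.utils", "utils_pkg"))
```

Keeping the edit as a small function over the existing text makes the diff reviewable, which is the property that a delete-and-rewrite approach loses.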

Looking forward to trying the new model out on the next refactor!


jumploops 8 hours ago

Yes, regardless of tool, I always create a separate plan doc for larger changes.
Will try adding instructions specific to refactors (e.g. copy/move files, don't rewrite when possible).
I've also found it helpful, especially for certain regressions, to create a new branch for every Codex/CC-assisted task (even if it's part of a larger task). That makes it easier to tell when a regression comes from recent changes (e.g. look at the git diff: it worked previously).
Telling the "agent" to manage git leads to more context pollution than I want, so I manage all commits/branches myself, but I'm sure that will change as the tools improve and they do more RL on full-cycle software dev.
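A minimal sketch of the branch-per-task habit, driving git from Python rather than letting the agent do it. The helper names, the `agent/` prefix, and the `main` base branch are assumptions for illustration; `git switch -c` needs git 2.23 or newer.

```python
import re
import subprocess


def task_branch_name(task: str, prefix: str = "agent") -> str:
    """Hypothetical helper: turn a task description into a branch name."""
    # Lower-case and collapse anything that isn't a letter or digit into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    slug = slug[:40].rstrip("-") or "task"  # keep it short and never empty
    return f"{prefix}/{slug}"


def start_task_branch(task: str, repo: str = ".") -> str:
    """Create and check out a fresh branch before handing the task to the agent."""
    branch = task_branch_name(task)
    subprocess.run(["git", "-C", repo, "switch", "-c", branch], check=True)
    return branch


def changes_since(base: str = "main", repo: str = ".") -> str:
    """Summarize working-tree changes (committed or not) relative to `base`."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--stat", base],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Usage: start_task_branch("Move auth lib to package"), let the agent work,
# then print(changes_since("main")) when something that used to work breaks.
```

Because the branch only ever contains one task's edits, the `--stat` summary is a short list of suspects rather than the whole history of a long session.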
