Addendum to GPT-5 system card: GPT-5-Codex

181 points by wertyk 8 hours ago

I've been extremely impressed (and actually had quite a good time) with GPT-5 and Codex so far. It seems to handle long context well, does a great job researching the code, never leaves things half-done (with long tasks it may leave some steps for later, but it never does 50% of a step and then just randomly mocks a function like Gemini used to), and gives me good suggestions if I'm trying to do something I shouldn't. And the Codex CLI also seems to be getting constant, meaningful updates.


mmaunder 7 hours ago

Agreed. We're hardcore Claude Code users and my CC usage trended down to zero pretty quickly after I started using Codex. The new model updates today are great. Very well done OpenAI team!! CC was an existential threat. You responded and absolutely killed it. Your move Anthropic.

- Jcampuzano2 7 hours ago
  
  To be fair, Anthropic kinda did this to themselves. I consider it as a pretty massive throw on their end in terms of the fairly tight grasp they had on developer sentiment.
  Everyone else slowly caught up and/or surpassed them while they simultaneously had quality control issues and service degradation plaguing their system - ALL while having the most expensive models comparatively in terms of intelligence.
  
  
  mmaunder 7 hours ago
  
  Agreed. I really wish Google would get their act together because I think they have the potential of being faster, cheaper with bigger context windows. They're so great at hardcore science and engineering, but they absolutely suck at products.
  
  
  zamalek 4 hours ago
  
  You're absolutely right!
  
- notfromhere 5 hours ago
  
  GPT-5 writes clean, simple code and listens to instructions. I went from tons of Claude API usage to basically none overnight.
  
  
  ttul 2 hours ago
  
  Agreed. GPT’s coding is so much cleaner. Claude tends to ramble and generate unnecessary scaffolding. GPT’s code is artful and minimalist.
  
- ttul 2 hours ago
  
  This just goes to show how crucial it was for Anthropic and OpenAI to hire first class product leads. You can’t just pay the AI engineers $100M. Models alone don’t generate revenue.
  
  
  dwohnitmok 13 minutes ago
  
  I got the exact opposite lesson. The parent and grandparent comments seem to be talking about dropping one product for another purely on the strength of the model.
  
  
  arthurcolle an hour ago
  
  the model is the product
  
- epolanski 3 hours ago
  
  But how do you use it?
  It's super annoying that it doesn't provide a way to approve edits one by one; instead it either vibe codes on its own or gives me diffs to copy-paste.
  Claude Code has a much saner "normal mode".
  
  
  brianjking an hour ago
  
  Wait, this wasn't what I was experiencing. Did something change in gpt-5-codex or was that your normal experience?
  
robotswantdata 6 hours ago

Agreed, I ditched my Claude Code Max plan for the $200 ChatGPT Pro plan.
Gemini CLI is too inconsistent; it's good for documentation tasks, but don’t let it write code for you.

- icelancer 5 hours ago
  
  Gemini's tool calling being so bad is pretty amazing. Hopefully in the next iteration they fix it, because the model itself is very good.
  
  
  nowittyusername 2 hours ago
  
  This is a recurring theme with Google. Their models are phenomenal but the systems around them are so bad that it degrades the whole experience. Veo3 great model horrible website, and so on...
  
  
  brianjking 44 minutes ago
  
  Their massive increase in token processing since Veo3 and nano banana have been released would say otherwise...
  Or we're all just used to eating things we don't like and smiling.
  
  
  robbrulinski 3 hours ago
  
  That has been my experience as well with every Gemini model, ugh!
  
DanielVZ 2 hours ago

Can someone compare it to Cursor? So far I see people compare it with Claude Code, but I’ve had much more success and cost effectiveness with Cursor than with Claude Code.

EnPissant 7 hours ago

My experience with Codex / Gpt-5:
- The smartest model I have used. Solves problems better than Opus-4.1.
- It can be lazy. With Claude Code / Opus, once given a problem, it will generally work until completion. Codex will often perform only the first few steps and then ask if I want to continue to do the rest. It does this even if I tell it to not stop until completion.
- I have seen severe degradation near max context. For example, I have seen it just repeat the next steps every time I tell it to continue and I have to manually compact.
I'm not sure if the problems are Gpt-5 or Codex. I suspect a better Codex could resolve them.
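
One way to stay clear of the near-max-context behaviour described above is to compact before the window fills up, rather than after the model starts looping. A minimal sketch, assuming you can read a token count from your tool; the function name, the 0.5 default and the 200,000-token window are all hypothetical and are not anything Codex exposes:

```python
def should_compact(used_tokens: int, context_window: int, threshold: float = 0.5) -> bool:
    """Return True once usage reaches the chosen fraction of the context window.

    All names and the 0.5 default are illustrative assumptions.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    return used_tokens >= threshold * context_window


# Hypothetical 200,000-token window: compact at or past 100,000 tokens.
print(should_compact(120_000, 200_000))  # True
print(should_compact(40_000, 200_000))   # False
```

The threshold is a parameter on purpose, since where quality starts to drop seems to vary between sessions.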

- apigalore an hour ago
  
  Yes, this is the one thing stopping me from going to Codex completely. Currently, it's kind of annoying that Codex stops often and asks me what to do, and I just reply "continue". Even though I already gave it a checklist.
  With GPT‑5-Codex they do write: "During testing, we've seen GPT‑5-Codex work independently for more than 7 hours at a time on large, complex tasks, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation." https://openai.com/index/introducing-upgrades-to-codex/
  
- brookst 7 hours ago
  
  Claude seems to have gotten worse for me, with both that kind of laziness and a new pattern where it will write the test, write the code, run the test, and then declare that the test is working perfectly but there are problems in the (new) code that need to be fixed.
  Very frustrating, and happening more often.
  
  
  elliot07 7 hours ago
  
  They for sure nerfed it within the last ~3 weeks. There's a measurable difference in quality.
  
  
  conception 7 hours ago
  
  They actually just had a bug fix and it seems like it recently got a lot better in the last week or so
  
- M4v3R 7 hours ago
  
  Context degradation is a real problem with all frontier LLMs. As a rule of thumb I try to never exceed 50% of available context window when working with either Claude Sonnet 4 or GPT-5 since the quality drops really fast from there.
  
  
  darkteflon 6 hours ago
  
  Agreed, and judicious use of subagents to prevent pollution of the main thread is another good mitigant.
  
  
  EnPissant 7 hours ago
  
  I've never seen that level of extreme degradation (just making a small random change and repeating the same next steps infinitely) on Claude Code. Maybe Claude Code is more aggressive about auto compaction. I don't think Codex even compacts without /compact.
  
- bayesianbot 7 hours ago
  
  I definitely agree with all of those points. I just really prefer it completing steps and asking me if we should continue to next step rather than doing half of the step and telling me it's done. And the context degradation seems quite random - sometimes it hits way earlier, sometimes we go through crazy amount of tokens and it all works out.
  
- tanvach 7 hours ago
  
  I also noticed the laziness compared to Sonnet models but now I feel it’s a good feature. Sonnet models, now I realize, are way too eager to hammer out code with way more likelihood of bugs.
  
mritchie712 7 hours ago

Have you used Claude Code? How does it compare?

- mmaunder 7 hours ago
  
  It's objectively a big improvement over Claude Code. I'm rooting for anthropic, but they better make a big move or this will kill CC.
  
  
  nightshift1 6 hours ago
  
  What are the usage limits like compared to Claude Code? Is it more like 5× or 20×? For twice the price, it would have to be very good.
  
FergusArgyll 6 hours ago

It doesn't seem to have any internal tools it can use. For example, for web search it just runs curl in the terminal. Compared to Gemini CLI that's rough, but it does handle pasting much better... Maybe I'm just using both wrong...

- Tiberium 6 hours ago
  
  It does have web search - it's just not enabled by default. You can enable it with --search or in the config, then it can absolutely search, for example finding manuals/algorithms.
  
- gizmodo59 6 hours ago
  
  Use --search option when you start codex
  
- ollybee 6 hours ago
  
  web search too is off by default
  

simonw 6 hours ago

This should probably be merged with the other GPT-5-Codex thread at https://news.ycombinator.com/item?id=45252301 since nobody in this thread is talking about the system card addendum.


jumploops 6 hours ago

Interesting, the new model uses a different prompt in Codex CLI that's ~half the size (10KB vs. 23KB) of the previous prompt[0][1].

SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (via internal refactor benchmark 33.9% -> 51.3%).

As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it, dropping important details along the way. My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.

Additionally, they claim the new model is more steerable (both with AGENTS.md and generally).

In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!

[0]https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...

[1]https://github.com/openai/codex/blob/main/codex-rs/core/prom...

(comment reposted from other thread)


robotswantdata 5 hours ago

Saw the same behaviour.
What worked was getting it to first write a detailed implementation plan for a “junior contractor”, then attempt it in phases (clearing the task window each time), with instructions to copy files to /tmp, transform them there, and then update the original.
Looking forward to trying the new model out on the next refactor!
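
The copy-then-transform idea can be sketched in a few lines. This is only an illustration of the workflow, not what Codex does internally; the function name and the plain string replacements are hypothetical stand-ins for whatever package-specific changes a refactor needs:

```python
import shutil
import tempfile
from pathlib import Path


def copy_then_transform(src: Path, dest: Path, replacements: dict[str, str]) -> None:
    """Copy src to dest via a scratch directory, applying text replacements.

    Anything not matched by a replacement is carried over unchanged,
    which is the point: nothing gets rewritten from memory.
    """
    with tempfile.TemporaryDirectory() as scratch:
        work = Path(scratch) / src.name
        shutil.copy2(src, work)              # start from an exact copy
        text = work.read_text(encoding="utf-8")
        for old, new in replacements.items():
            text = text.replace(old, new)    # targeted, package-specific edits
        work.write_text(text, encoding="utf-8")
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(work), str(dest))    # only now touch the destination
```

The source file is left in place, so a `git diff` between the old and new locations shows only the intended changes.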

- jumploops 5 hours ago
  
  Yes, regardless of tool, I always create a separate plan doc for larger changes
  Will try adding the instructions specific to refactors (i.e. copy/move files, don't rewrite when possible)
  I've also found it helpful, especially for certain regressions, to basically create a new branch for any Codex/CC assisted task (even if part of a larger task). Makes it easier to identify regressions due to recent changes (i.e. look at git diff, it worked previously)
  Telling the "agent" to manage git leads to more context pollution than I want, so I manage all commits/branches myself, but I'm sure that will change as the tools improve/they do more RL on full-cycle software dev
  

zapnuk 6 hours ago

It would be nice if this model were good enough to update their TypeScript SDK (+agents library) to use, or at least support, zod v4 - they still use v3.

Had to spend quite a long time to figure out a dependency error...


mindwok 5 hours ago

Codex with GPT-5-High is extremely good. Like many I was a bit "meh" about the GPT 5 release, however once I started using it with Codex it became clear there was a substantial improvement in a capability I wasn't really paying attention to, which is tool calling. Or more specifically, when to call a tool. Ask GPT-5-High a question about your codebase and watch the things it looks for, and things it searches for (if you use --search). It has very good taste on how to navigate and solve a problem.


ionwake 6 hours ago

Can someone explain what this all means? Has Codex just been updated to use GPT-5? Or is this just extra info?


simonw 5 hours ago

I posted some notes here that might be useful: https://simonwillison.net/2025/Sep/15/gpt-5-codex/
Even shorter version:
- New coding-specialist model called GPT-5-Codex, coming soon to the API but for now available in their Codex CLI, VS Code and Codex Cloud products
- New code review product (part of Codex Cloud) that can review PRs for you
- New model promises better code review, less pointless comments and can vary its reasoning effort for simple vs complex tasks

- naiv 5 hours ago
  
  The pelican is not so convincing though :)
  So a bit in line with what Theo mentioned in his video, that he was not happy with the UI capabilities.
  
- ionwake 4 hours ago
  
  Amazing thank you
  
amrrs 6 hours ago

It is a new version of GPT-5 that's been primarily optimized for coding. Hence this confusing name - GPT-5-Codex.
This model is available inside all OpenAI Codex products. It is not yet available on the API.
The model is supposed to be better at code reviews and comments than the other GPT-5 variant. It can also think/work for up to 7 hours.

- ionwake 4 hours ago
  
  Amazing cheers
  

withinboredom 7 hours ago

Codex always appears to use spaces, even when the project uses tabs (e.g., a Go file). It's so annoying.


asadm 7 hours ago

this + any coding conventions should ALWAYS be a post process. DO NOT include them in your prompt, you are losing model accuracy over these tiny things.
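
A minimal sketch of what "post process" could look like: take the files the agent touched and build one formatter command per file. The extension-to-formatter mapping is an assumption for illustration (swap in whatever your project already uses), and how you get the list of changed files depends on your tool:

```python
from pathlib import Path

# Map file extensions to formatter commands. The mapping is an assumption
# for illustration; substitute whatever your project already uses.
FORMATTERS: dict[str, list[str]] = {
    ".go": ["gofmt", "-w"],
    ".py": ["black", "--quiet"],
    ".ts": ["prettier", "--write"],
}


def format_commands(changed_files: list[str]) -> list[list[str]]:
    """Build one formatter command per changed file with a known extension."""
    commands = []
    for name in changed_files:
        formatter = FORMATTERS.get(Path(name).suffix)
        if formatter is not None:
            commands.append([*formatter, name])
    return commands


print(format_commands(["cmd/main.go", "README.md", "tools/gen.py"]))
# [['gofmt', '-w', 'cmd/main.go'], ['black', '--quiet', 'tools/gen.py']]
```

Each command can then be handed to `subprocess.run(cmd, check=True)`. Files with no known formatter (the README here) are skipped rather than guessed at.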

- withinboredom 6 hours ago
  
  It helps to actually be able to read the diffs of its proposals/changes in the terminal. The changing from tabs -> spaces on every line it touches generally results in unreadable messes.
  I have a pretty complex project, so I need to keep an eye on it to ensure it doesn't go off the rails and delete all the code to get a build to pass (it wouldn't be the first time).
  
  
  ameliaquining 6 hours ago
  
  I think the idea is that your IDE or whatever should automatically run the project's autoformatter after every AI edit, so that any formatting mistakes the AI makes are fixed before you have to look at them.
  
  
  wahnfrieden 6 hours ago
  
  You are poisoning your context making it focus on an unusual requirement contrary to most of its training data. It’s a formatter task, not an LLM task
  In fact you should convert your code to spaces at least before LLM sees it. It’ll improve your results by looking more like its training data.
  
- scrollaway 3 hours ago
  
  Does codex have a good way of doing post process hooks? For Claude Code hooks I never found a way to run a formatter over only the file that was edited. It’s super annoying as I want to constantly have linting and formatting cleaned up right after the model finishes editing a file…
  
- Der_Einzige 6 hours ago
  
  Stop telling the normies the secrets please! You've just harmed job security quite a bit for a lot of people!
  
dgfitz 6 hours ago

The future is truly here, we finally solved the tab vs spaces debate. The singularity must be right around the corner.

wahnfrieden 7 hours ago

Just use a linter hook to standardize style


WhitneyLand 6 hours ago

Apparently today is the first release with MCP support.

Updates (v0.36) https://github.com/openai/codex/releases


artdigital 2 hours ago

Codex had MCP support for a long long time

- WhitneyLand 2 hours ago
  
  Really, I thought I had checked for it a couple months ago and didn’t see it?
  Commented after I saw this added in today’s release notes: “initial MCP interface and docs”
  

6thbit 6 hours ago

Direct link to the pdf

https://cdn.openai.com/pdf/97cc5669-7a25-4e63-b15f-5fd5bdc4d...


hereme888 5 hours ago

Codex just ate up my remaining turns for the day for a clearly defined patch that should have taken just a few actions. Anyone else experienced that?


bn-l 4 hours ago

Yes. I believe it’s a bug, going by their issues page.

denuoweb 2 hours ago

Yes. "Failed. You've hit your usage limit. Upgrade to Pro (https://openai.com/chatgpt/pricing) or try again in 3 days 10 minutes."
I can't use the IDE codex at all now it seems.


bezzi 4 hours ago

is this model just acting super slow with you guys too?


naiv 4 hours ago

Feels slower than GPT-5. I understood that medium should be a lot faster than high, but for me it's almost the same, so I don't see a reason to prefer medium.


sergiotapia 6 hours ago

I signed up to OpenAI, verified my identity, and added my credit card, bought $10 of credits.

But when I installed Codex and tried to make a simple code bugfix, I got rate limited nearly immediately. As in, after 3 "steps" the agent took.

Are you meant to only use Codex with their $200 "unlimited" plans? Thanks!


wahnfrieden 6 hours ago

Use Plus first

- sergiotapia 6 hours ago
  
  Thank you - so to confirm Codex _requires_ basically the Plus or $200 plans otherwise it just does not work?
  
  
  simonw 6 hours ago
  
  The new GPT-5-Codex model isn't yet available in the API, so if you want to try that model using the Codex CLI tool the only way to do that is with a ChatGPT account (I'm not sure if the free account has it, the $20/month definitely does). You need to then authenticate Codex CLI with ChatGPT.
  OpenAI say API access to that model is coming soon, at which point you'll be able to use it in Codex CLI with an API key and pay for tokens as you go.
  You can also use the Codex CLI tool without using the new GPT-5-Codex model.
  
  
  Tiberium 6 hours ago
  
  You can use Codex CLI with an API key instead of a subscription, but then you won't have access to this new GPT-5 Codex model, since it's not out on the API yet. But normal GPT-5 in Codex is perfectly fine.
  

Difwif 7 hours ago

Is this available to use now in Codex? Should I see a new /model?


andrewmunsell 7 hours ago

Yes, but I had to update the Codex CLI manually via NPM to see it. The VS Code extension auto-updated for me


darkteflon 6 hours ago

Does Codex have token-hiding (cf Anthropic’s “subagents”)?

I was tempted to give Codex a try but a colleague was stung by their pricing. Apparently if you go over your Pro plan allocation, they just quietly and automatically start billing you per-token?


steveklabnik 6 hours ago

I tried Codex with the $20/month plan recently and it did exactly what Claude Code does, stop and tell you “sorry, you’re out of credit, come back in x days.”

- darkteflon 6 hours ago
  
  Thank you, glad to hear it. Sounds like my colleague might have had it misconfigured. I’ll give Codex a try then.
  
  
  embirico 5 hours ago
  
  Hey, I work on Codex—absolutely no way that a user on a Pro plan would somehow silently move to token-based billing. You just hit a limit and have to wait for the reset. (Which also sucks, and which we're also improving early warnings of.)
  
  
  darkteflon 4 hours ago
  
  Thanks for that, appreciate the clarification. I’ll check with my colleague and report back on his experience. Certainly don’t want to misrepresent.
  

tschellenbach 3 hours ago

is it already supported in cursor? don't see it just yet


mindwok 3 hours ago

It's not available via the API yet, so probably not.

toomanyflops 3 hours ago

while not available as a specific model to use in cursor, it is available via openai’s codex extension on vscode/cursor


lvl155 7 hours ago

I think it would be cool to see *nix “emulation” integrated into coding AIs. I don’t think it’s necessary to run these agents inside of container as most people are right now. That’s a lot of overhead.


simonw 6 hours ago

You mean instead of them running the code that they are writing they pretend to run the code and the model shows what it thinks would happen?
I don't like that at all. Actually running the code is the single most effective protection we have against coding mistakes, from both humans and machines.
I think it's absolutely worth the complexity and performance overhead of hooking up a real container environment.
Not to mention you can run a useful code execution container in 100MB of RAM on a single CPU (or slice thereof). Simulating that with an LLM takes at least one GPU and 100GB or more of VRAM.
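
Back-of-the-envelope, using only the two figures above (100MB of RAM per container, 100GB of VRAM as a floor for the simulated version). The function name is hypothetical and the numbers are rough, so treat this as a sketch of the scale rather than a benchmark:

```python
CONTAINER_RAM_MB = 100   # per-container figure quoted above
LLM_VRAM_GB = 100        # lower bound quoted above for LLM simulation


def fleet_ram_gb(agents: int, per_container_mb: int = CONTAINER_RAM_MB) -> float:
    """Total RAM in GB (decimal, 1 GB = 1000 MB) for `agents` containers."""
    return agents * per_container_mb / 1000


for agents in (1, 100, 1000):
    print(agents, fleet_ram_gb(agents))
# 1 0.1
# 100 10.0
# 1000 100.0
```

On those numbers it takes about a thousand real containers before their combined ordinary RAM matches the VRAM floor for simulating even one.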

- lvl155 6 hours ago
  
  I understand your point but I basically find myself running all my agents in barebones containers and they’re basically short-run make-or-kill types. And once we ramp up agent counts, possibly into the thousands, that could add up rapidly. Of course, you would run milestone tests on actual container/envs but I think there might be a need for lighter solutions for rapid agent dev runs.
  
  
  rgo 5 hours ago
  
  There are now many solutions, and full-blown startups, under the "swarm", "agent orchestration" and other similar keywords, for spinning up agents in the cloud. I'm not sure if that's what you mean, but I totally see most of vibe coding being replaced by powerhouse agents, placed locally or in the cloud, picking up tasks and working on them until they're really done.
  
  
  withinboredom 6 hours ago
  
  You do realize that there is virtually no overhead in running containers, right? That's the entire point of their existence. They're just processes, with specific permissions (to generalize it). Your computer can run thousands of processes without sweating.
  
  
  lvl155 3 hours ago
  
  > You do realize that there is virtually no overhead in running containers, right? That's the entire point of their existence.
  No, I didn’t know running containers used “virtually no overhead.” It appears I can run millions of containers without any resource constraint? Is that some sort of cheat code?
  