Comment by tedsanders 11 hours ago

> SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors

SWE-bench is a great eval, but it's very narrow. Two models can have the same SWE-bench scores but very different user experiences.

Here's a nice thread on X about the things that SWE-bench doesn't measure:

https://x.com/brhydon/status/1953648884309536958

dwaltrip 9 hours ago

So annoying that you can't read replies without an account nowadays.

  • Tiberium 9 hours ago

    Use Nitter; the main instance works, but there are a lot of other instances as well.

    https://nitter.net/brhydon/status/1953648884309536958

  • dcre 5 hours ago

    Change the URL from x.com to xcancel.com to see it all.
