Comment by simonw

Comment by simonw 3 days ago

I'd be very interested to hear from anyone who's finding local models that work well for coding agents (Claude Code, Codex CLI, OpenHands etc).

I haven't found a local model that fits on a 64GB Mac or 128GB Spark yet that appears to be good enough to reliably run bash-in-a-loop over multiple turns, but maybe I haven't tried the right combination of models and tools.

embedding-shape 3 days ago

I've had good luck with GPT-OSS-120b (reasoning_effort set to "high") + Codex + llama.cpp all running locally, but I needed to do some local patches to Codex as they don't allow configuring and setting the right values for temperature and top_p for GPT-OSS. Also heavy prompting via AGENTS.md was needed to get it to have similar workflow to GPT-5, it didn't seem to pick up that by itself, so I'm assuming GPT-5 been trained with Codex in mind while GPT-OSS wasn't.

Reply View 3 replies

Xenograph 3 days ago

Would love for you to share the Codex patches you needed to make and the AGENTS.md prompting, if you're open to it.

Reply View | 2 replies
- embedding-shape 3 days ago
  
  Basically just find the place where the inference call happens, add top_k, top_p and temperature to hard-coded numbers (0, 1.0 and 1.0 for GPT-OSS) and you should be good to go. If you really need it, I could dig out patch from it, but it should be really straightforward today, and my patch might be conflicting with the current master of codex, I've diverged for other reasons since I did this.
  
  Reply View | 1 reply
  
  Xenograph 3 days ago
  
  That makes sense, wasn't sure if it was as simple as tweaking those two numbers or not, thanks for sharing!
  If there's any insight you can share about your AGENTS.md prompting, it may also be helpful for others!
  
  Reply View | 0 replies