seanmcdirmid 2 days ago

I invested in a beefy laptop that can run Qwen Coder locally and it works pretty well. I really think local models are the future: you don’t have to worry so much about credits or internet access.

jimmaswell 2 days ago

What are the specs, and how does it compare to Copilot or GPT Codex?

  • seanmcdirmid 2 days ago

    You can check out https://www.reddit.com/r/LocalLLaMA/comments/1piq11p/mac_wit... for a sense of how useful people find it and the specs of the machines they’re running it on. You’ll want some variation of Max- or Ultra-level Apple silicon with around 64GB of RAM or more. Oh, and an HN submission from 9 months ago: https://news.ycombinator.com/item?id=43856489
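
    For a concrete idea of what "running it locally" means, here is a minimal sketch that queries a local Ollama server over its REST API. It assumes the server is running and you've already pulled a Qwen coder model; the model tag and the prompt are just for illustration.

        import requests

        # Minimal sketch: query a local Ollama server over its REST API.
        # Assumes the server is running and a Qwen coder model has been
        # pulled (e.g. "ollama pull qwen2.5-coder:32b"); the tag and the
        # prompt are just for illustration.
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "qwen2.5-coder:32b",
                "prompt": "Write a Python function that reverses a linked list.",
                "stream": False,
            },
            timeout=300,
        )
        print(resp.json()["response"])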

    Copilot comparison:

    Intelligence: Qwen2.5-Coder-32B is widely considered the first open-source model to reach GPT-4o and Claude 3.5 Sonnet levels of coding proficiency. While Copilot (using GPT-4o) remains highly reliable, Qwen often produces more concise code and can outperform cloud models in specific tasks like code repair.

    Latency: Local execution on an M3 Max provides near-zero network latency, resulting in faster "start-to-type" responses than Copilot, which must round-trip to the cloud.
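
    A rough way to check that yourself is to time how long the first streamed token takes to arrive from the local server. A sketch, under the same local Ollama assumptions as above:

        import json
        import time

        import requests

        # Rough sketch: measure "start-to-type" latency as the time until
        # the first streamed token arrives from the local Ollama server.
        # The model tag is illustrative.
        start = time.monotonic()
        with requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "qwen2.5-coder:32b", "prompt": "def quicksort(", "stream": True},
            stream=True,
            timeout=300,
        ) as resp:
            for line in resp.iter_lines():
                if line:
                    print(f"time to first token: {time.monotonic() - start:.2f}s")
                    print(json.loads(line).get("response", ""))
                    break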

    Integration: Copilot is an all-in-one experience that integrates deeply into VS Code. Qwen requires local tooling like Ollama or MLX-LM plus a plugin such as Continue.dev to approximate the same UX.
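
    As an illustration, wiring Continue.dev to a local Ollama model comes down to a config entry roughly like the one this sketch writes out. The exact schema varies by Continue version, so treat the field names as assumptions:

        import json
        import pathlib

        # Sketch: point Continue.dev at a local Ollama model. The exact
        # config schema varies by Continue version; the field names below
        # follow the commonly documented JSON config and may need adjusting.
        config_path = pathlib.Path.home() / ".continue" / "config.json"
        config = {
            "models": [
                {
                    "title": "Qwen2.5 Coder (local)",  # display name, arbitrary
                    "provider": "ollama",
                    "model": "qwen2.5-coder:32b",
                }
            ],
            # A smaller model keeps tab-autocomplete snappy.
            "tabAutocompleteModel": {
                "title": "Qwen2.5 Coder autocomplete",
                "provider": "ollama",
                "model": "qwen2.5-coder:7b",
            },
        }
        config_path.parent.mkdir(exist_ok=True)
        config_path.write_text(json.dumps(config, indent=2))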

    GPT Codex comparison:

    Intelligence & Reasoning: In recent 2025–2026 benchmarks, the Qwen3-Coder series has emerged as the strongest open-source performer, matching the "pass@5" resolution rates of flagship models like GPT-5-High. While OpenAI’s latest GPT-5.1-Codex-Max remains the overall leader in complex, project-wide autonomous engineering, Qwen is frequently cited as the better choice for local, file-specific logic.

    Architecture & Efficiency: OpenAI models like gpt-oss-20b (a Mixture-of-Experts model) are optimized for extreme speed and tool-calling. However, an M3 Max with 64GB is powerful enough to run the Qwen3-Coder-30B or 32B models at full fidelity, which provides superior logic to OpenAI's smaller "mini" or "OSS" models.
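
    For reference, running a quantized build on Apple silicon with mlx-lm looks roughly like this. The mlx-community repo path is an assumption; pick whatever quantization fits your unified memory:

        # Sketch: run a quantized Qwen coder build on Apple silicon with
        # mlx-lm (pip install mlx-lm). The mlx-community repo name is an
        # assumption; choose a quantization that fits your unified memory.
        from mlx_lm import load, generate

        model, tokenizer = load("mlx-community/Qwen2.5-Coder-32B-Instruct-4bit")
        print(generate(model, tokenizer, prompt="Write a binary search in Python.", max_tokens=256))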

    Context Window: Qwen models offer substantial context (up to 128K–256K tokens), which is comparable to OpenAI’s specialized Codex variants. This allows you to process entire modules locally without the high per-token cost of sending that data to OpenAI's servers.
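
    One caveat: local servers usually default to a much smaller window, so you have to ask for the large context explicitly. With Ollama that's a per-call option. A sketch: the 128K value assumes your model build supports it and you have enough memory for the KV cache, and whole_module.py is a placeholder for whatever file you want in context.

        import requests

        # Sketch: request a bigger context window from Ollama per call.
        # num_ctx of 131072 (128K) assumes the model build supports it and
        # that there is enough memory for the KV cache; whole_module.py is
        # a placeholder for whatever file you want in context.
        source = open("whole_module.py").read()
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "qwen2.5-coder:32b",
                "prompt": source + "\n\nSummarize what this module does.",
                "stream": False,
                "options": {"num_ctx": 131072},
            },
            timeout=600,
        )
        print(resp.json()["response"])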