Comment by dust42
On a 64 GB MacBook Pro I use a Q4 quant of Qwen3-Coder-30B-A3B with llama.cpp.
For VSCode I use continue.dev, as it lets me set my own (short) system prompt. I get around 50 tokens/sec generation and 550 t/s prompt processing.
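For anyone who wants to replicate this: continue.dev just needs to point at llama-server's OpenAI-compatible endpoint, e.g. after starting it with something like llama-server -m <your-qwen3-coder-q4>.gguf --port 8080. A minimal sketch of what the editor does under the hood, assuming the default port; the model name, prompt and port here are placeholders, not my exact setup:

    # Sketch: querying a local llama-server via its OpenAI-compatible API.
    # Assumes llama-server is already running on port 8080 with the model loaded.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-needed")
    resp = client.chat.completions.create(
        model="qwen3-coder-30b-a3b",  # with a single loaded model, the name is mostly cosmetic
        messages=[
            {"role": "system", "content": "You are a terse coding assistant."},  # the short custom system prompt
            {"role": "user", "content": "Write a Python function that reverses a string."},
        ],
    )
    print(resp.choices[0].message.content)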
Given small, well-defined tasks, it is as good as any frontier model.
I like the speed, the low latency, and the availability on a plane/train or off-grid.
FIM (fill-in-the-middle) completion with the llama.cpp VSCode plugin is also decent.
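For reference, the plugin drives llama-server's /infill endpoint for these completions. A sketch of the raw call, assuming a FIM-capable model is loaded and the server is on port 8080 (the code snippet contents are made up for illustration):

    # Sketch: fill-in-the-middle via llama-server's /infill endpoint,
    # which is what the VSCode plugin uses for inline completions.
    import requests

    resp = requests.post(
        "http://localhost:8080/infill",
        json={
            "input_prefix": "def add(a, b):\n    ",   # code before the cursor
            "input_suffix": "\n\nprint(add(2, 3))",   # code after the cursor
            "n_predict": 32,                          # cap the completion length
        },
    )
    print(resp.json()["content"])  # the suggested middle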
If I need more intelligence, my personal favourites are Claude and DeepSeek via API.
Would you use a different quant on a 128 GB machine? Could you link the specific download you used on Hugging Face? I find a lot of the options there confusing.