Comment by jhonof

Not OP but for autocomplete I am running Qwen2.5-Coder-7B and I quantized it using Q2_K. I followed this guide:

https://blog.steelph0enix.dev/posts/llama-cpp-guide/#quantiz...

And I get fast enough autcomplete results for it to be useful. I have and NVIDIA 4060 RTX in a laptop with 8 gigs of dedicated memory that I use for it. I still use claude for chat (pair programming) though, and I don't really use agents.