Comment by jauntywundrkind 17 hours ago
Why was Qwen3-coder rejected? Right now it seems like the strongest recommendation. GLM-4.7-Flash seems very promising but is very new.
Gemma3 is also very good. Nanbeige-4 is supposedly incredibly capable. Both are very small. https://huggingface.co/google/gemma-3-4b-it https://huggingface.co/Nanbeige/Nanbeige4-3B-Thinking-2511
Ideally, IMO, you should build little tools (or one multi-tool) for the work you want done. Rather than having the LLM figure out what needs to be done each time, taking a more "code mode" style of development and giving the LLM the ability to call your tool will be far faster and far more consistent, with far lower resource use. Tiny models like FunctionGemma can take simple commands and get the work done, very fast, with very little compute. Anthropic wrote this up, noting that Cloudflare calls it Code Mode. https://blog.google/innovation-and-ai/technology/developers-... https://www.anthropic.com/engineering/code-execution-with-mc...
(Note that while Anthropic suggests MCP for their "code mode" direction, and while writing MCPs is super easy, writing a CLI tool can give just as good results! And it's often easier for humans to work with!)
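To make the CLI-tool idea concrete, here's a minimal sketch of the kind of multi-tool I mean: one executable with subcommands, each doing one well-defined task, emitting JSON the model (or a human) can read back. Everything here is hypothetical, the tool name `worktool` and the `count-words` subcommand are made up for illustration, not from any of the linked posts.

```python
# Hypothetical sketch: a tiny multi-tool CLI that an LLM can invoke
# with a simple command, instead of reasoning through the task itself.
import argparse
import json


def count_words(path: str) -> dict:
    """One concrete task the model delegates: count words in a file."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return {"path": path, "words": len(text.split())}


def main(argv=None):
    parser = argparse.ArgumentParser(prog="worktool")
    sub = parser.add_subparsers(dest="command", required=True)

    # Each subcommand is a small, deterministic unit of work.
    p_count = sub.add_parser("count-words", help="count words in a file")
    p_count.add_argument("path")

    args = parser.parse_args(argv)
    if args.command == "count-words":
        # JSON output is easy for both tiny models and humans to parse.
        print(json.dumps(count_words(args.path)))


if __name__ == "__main__":
    main()
```

The model only has to produce something like `worktool count-words notes.txt`, which even a small function-calling model can do reliably; all the actual logic lives in plain, testable code.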