Comment by breput a day ago

Nemotron-3-Nano-30B-A3B[0][1] is a very impressive local model. It is good with tool calling and works great with llama.cpp/Visual Studio Code/Roo Code for local development.

It doesn't get a ton of attention on /r/LocalLLaMA but it is worth trying out, even if you have a relatively modest machine.

[0] https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...

[1] https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF

bhadass a day ago

Some of NVIDIA's models also tend to have interesting architectures. For example, they use the Mamba architecture rather than relying purely on transformers: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-t...

nextos a day ago

Deep SSMs, including the entire S4 to Mamba saga, are a very interesting alternative to transformers. In some of my genomics use cases, Mamba has been easier to train and scale over large context windows, compared to transformers.
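For readers who have not met SSMs, here is a rough sketch of why they can scale well over long contexts: the model carries a fixed-size state forward one step at a time, so the cost grows linearly with sequence length. This is only the plain diagonal linear recurrence behind S4-style models, not Mamba's selective (input-dependent) variant, and the function name `ssm_scan` and the toy parameters are made up for illustration.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear state-space recurrence over a 1-D input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimensions)
    y_t = sum(c * h_t)
    """
    h = [0.0] * len(a)  # fixed-size state, independent of sequence length
    ys = []
    for x in xs:
        # Update every state dimension from the previous state and the new input.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Read out one scalar output per time step.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# Hypothetical toy parameters: a 2-dimensional state with decay rates 0.5 and 0.9.
a = [0.5, 0.9]
b = [1.0, 1.0]
c = [1.0, 1.0]

# Feed in a single impulse and watch the state decay over the following steps.
ys = ssm_scan([1.0, 0.0, 0.0], a, b, c)
```

Each step touches only the state vector, so memory stays constant however long the input gets, whereas full self-attention compares every position with every other one.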

jychang a day ago

It was good for like, one month. Qwen3 30b dominated for half a year before that, and GLM-4.7 Flash 30b took over the crown soon after Nemotron 3 Nano came out. There was basically no time period for it to shine.

breput a day ago

It is still good, even if not the new hotness. But I understand your point.

It isn't as though GLM-4.7 Flash is significantly better, and honestly, I have had poor experiences with it (and yes, always the latest llama.cpp and the updated GGUFs).

ThrowawayTestr a day ago

Genuinely exciting to be around for this. Reminds me of the time when computers were said to be obsolete by the time you drove them home.

binary132 a day ago

I recently tried GLM-4.7 Flash 30b and didn’t have a good experience with it at all.

breput a day ago

It feels like GLM has either a bit of a fan club or maybe some paid supporters...
  

superjan a day ago

Oh those ghastly model names. https://www.smbc-comics.com/comic/version

deskamess a day ago

Do they have a good multilingual embedding model? Ideally, with a decent context size like 16/32K. I think Qwen has one at 32K. Even the Gemma contexts are pretty small (8K).

binary132 a day ago

I find the Q8 runs a bit more than twice as fast as gpt-120b, since I don’t have to offload as many MoE layers, and it is just about as capable, if not better.
