Comment by lovelydata 3 days ago
llama.cpp + Qwen3-4B running on an older PC with an AMD Radeon GPU (Vulkan backend). Users connect via the web UI. Usually around 30 tokens/sec. Usable.
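For reference, a minimal sketch of how you might verify that rough tokens/sec figure against such a setup, assuming llama-server is running locally and exposing its OpenAI-compatible endpoint (the host, port, prompt, and model name below are placeholders, not details from the comment):

```python
# Rough throughput check against a local llama-server instance.
# Assumes llama-server is listening on localhost:8080 (hypothetical port)
# and serving its OpenAI-compatible /v1/chat/completions endpoint.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed host/port

payload = {
    "model": "qwen3-4b",  # placeholder; llama-server serves whatever model it was started with
    "messages": [
        {"role": "user", "content": "Summarize what llama.cpp is in two sentences."}
    ],
    "max_tokens": 200,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=300)
resp.raise_for_status()
data = resp.json()
elapsed = time.time() - start

completion_tokens = data["usage"]["completion_tokens"]
print(data["choices"][0]["message"]["content"])
print(
    f"{completion_tokens} tokens in {elapsed:.1f}s "
    f"-> {completion_tokens / elapsed:.1f} tok/s (wall clock, includes prompt processing)"
)
```

Note this measures end-to-end wall-clock throughput, so it will read slightly lower than the generation-only rate llama.cpp reports in its own logs.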
Autocomplete, I'd wager, since it's a super tiny model that can barely produce coherent output in many cases.
What do they use it for? It's a very small model.