Comment by lovelydata 3 days ago
llama.cpp + Qwen3-4B running on an older PC with an AMD Radeon GPU (Vulkan backend). Users connect via the web UI. Usually around 30 tokens/sec. Usable.
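For reference, a minimal sketch of how you might verify that rough tokens/sec figure against such a setup, assuming llama-server is running locally and exposing its OpenAI-compatible endpoint (the host, port, prompt, and model name below are placeholders, not details from the comment):

```python
# Rough throughput check against a local llama-server instance.
# Assumes llama-server is listening on localhost:8080 (hypothetical port)
# and serving its OpenAI-compatible /v1/chat/completions endpoint.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed host/port

payload = {
    "model": "qwen3-4b",  # placeholder; llama-server serves whatever model it was started with
    "messages": [
        {"role": "user", "content": "Summarize what llama.cpp is in two sentences."}
    ],
    "max_tokens": 200,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=300)
resp.raise_for_status()
data = resp.json()
elapsed = time.time() - start

completion_tokens = data["usage"]["completion_tokens"]
print(data["choices"][0]["message"]["content"])
print(
    f"{completion_tokens} tokens in {elapsed:.1f}s "
    f"-> {completion_tokens / elapsed:.1f} tok/s (wall clock, includes prompt processing)"
)
```

Note this measures end-to-end wall-clock throughput, so it will read slightly lower than the generation-only rate llama.cpp reports in its own logs.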
Autocomplete, I'd wager, since it's a super tiny model that can barely produce coherent output in many cases.
What do they use it for? It's a very small model.