Comment by refulgentis

Comment by refulgentis 3 days ago

The amount of people who will be using it at 1 token/sec because there's no better option, and have 64 GB of RAM, is vanishingly small.

IMHO it sets the local LLM community back when we lean on extreme quantization & streaming weights from disk to say something is possible*, because when people try it out, it turns out it's an awful experience.

* the implication being, anything is possible in that scenario

selfhoster11 2 days ago

Good. Vanishingly small is still more than zero. Over time, running such models will become easier too, as people slowly upgrade to better hardware. It's not like there aren't options for the compute-constrained either. There are lots of Chinese models in the 3-32B range, and Gemma 3 is particularly good too.

I will also point out that having three API-based providers deploying an impractically-large open-weights model beats the pants of having just one. Back in the day, this was called second-sourcing IIRC. With proprietary models, you're at the mercy of one corporation and their Kafkaesque ToS enforcement.

Reply View 3 replies

refulgentis 2 days ago

You said "Good." then wrote a nice stirring bit about how having a bad experience with a 1T model will force people to try 4B/32B models.
That seems separate from the post it was replying to, about 1T param models.
If it is intended to be a reply, it hand waves about how having a bad experience with it will teach them to buy more expensive hardware.
Is that "Good."?
The post points out that if people are taught they need an expensive computer to get 1 token/second, much less try it and find out it's a horrible experience (let's talk about prefill), it will turn them off against local LLMs unnecessarily.
Is that "Good."?

Reply View | 2 replies
- jimjimwii a day ago
  
  Had you posted this comment in the early 90s about linux instead of local models, it would have made about the same amount of sense but aged just as poorly as this comment will.
  I'll remain here happily using 2.something tokens / second model.
  
  Reply View | 1 reply
  
  apitman 12 hours ago
  
  But local aka desktop Linux is still an awful experience for most people. I use Arch btw
  
  Reply View | 0 replies

homarp 3 days ago

agentic loop can run all night long. It's just a different way to work: prepare your prompt queue, set it up, check result in the morning, adjust. 'local vibe' in 10h instead of 10mn is still better than 10 days of manual side coding.

Reply View 1 reply

hereme888 2 days ago

Right on! Especially if its coding abilities are better than Claude 4 Opus. I spent thousands on my PC in anticipation of this rather than to play fancy video games.
Now, where's that spare SSD...

Reply View | 0 replies