eldenring 2 days ago

The only real benefit is privacy, which 99.9% of people don't care about. Almost all serving metrics (cost, throughput, TTFT) are better with large GPU clusters. Latency is usually hidden by prefill cost.
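
A rough back-of-envelope sketch of that last point, with made-up numbers (the prompt length, prefill throughput, and 100 ms round trip are assumptions, not benchmarks): the extra network hop to a remote cluster is small next to the prefill time, so serving locally buys little on time-to-first-token.

```python
# Toy TTFT estimate: network round trip + prompt prefill time.
# All figures below are assumptions for illustration, not measurements.

def ttft_seconds(prompt_tokens, prefill_tok_per_s, network_rtt_s=0.0):
    return network_rtt_s + prompt_tokens / prefill_tok_per_s

local  = ttft_seconds(8000, prefill_tok_per_s=2000)                      # 4.0 s
remote = ttft_seconds(8000, prefill_tok_per_s=2000, network_rtt_s=0.1)   # 4.1 s
print(f"local {local:.1f}s vs remote {remote:.1f}s")  # the 100 ms RTT is noise
```

And in practice a large cluster prefills much faster than a single local box, which only widens the gap.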

  • cowpig 2 days ago

    More and more people I talk to care about privacy, but not in SF

  • mistercheph a day ago

    and sovereignty. I can go into the woods with a fuzzy approximation of all internet text in my backpack

jameslk 2 days ago

128 GB should be enough for anybody (just kidding). I hope the M5 Max will have higher RAM limits

  • aryonoco 2 days ago

    M5 Max probably won’t, but M5 Ultra probably will

ainch a day ago

As LLMs are productionised/commodified, they're incorporating changes that are enthusiast-unfriendly. Small dense models are great for enthusiasts running inference locally, but for parallel batched inference MoE models are much more efficient.
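
A toy illustration of that trade-off, using made-up model shapes (the expert count, per-token routing, and parameter split are assumptions): an MoE model activates only a small fraction of its weights per token, so a provider batching many concurrent requests keeps every expert busy, while a single local user still has to hold all the experts in RAM.

```python
# Fraction of weights touched per token for a hypothetical MoE model:
# all non-expert layers plus the routed experts.
def moe_active_fraction(n_experts, experts_per_token, expert_share_of_params=0.9):
    return (1 - expert_share_of_params) + expert_share_of_params * experts_per_token / n_experts

dense_active = 1.0                                        # dense: every weight, every token
moe_active = moe_active_fraction(n_experts=64, experts_per_token=4)

print(f"dense: {dense_active:.0%} of weights active per token")   # 100%
print(f"MoE:   {moe_active:.0%} of weights active per token")     # ~16%
# A local user pays RAM for 100% of the MoE weights but compute for only ~16%;
# a batched server amortizes that memory across many concurrent requests.
```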