Comment by eldenring

eldenring 2 days ago

The only real benefit is privacy, which 99.9% of people don't care about. Almost all serving metrics (cost, throughput, TTFT) are better on large GPU clusters. Latency is usually hidden by prefill cost.
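A rough way to see the latency point: model time to first token as network round trip plus queueing plus prefill compute. The extra network hop to a remote cluster is then small next to the prefill speedup a big cluster gets. This is a minimal back-of-envelope sketch. The function name `ttft_seconds` and every throughput and RTT figure below are hypothetical, not measurements.

```python
# Back-of-envelope TTFT model; all throughput and RTT numbers are hypothetical.
def ttft_seconds(prompt_tokens: int, prefill_tok_per_s: float,
                 network_rtt_s: float = 0.0, queue_s: float = 0.0) -> float:
    """Time to first token = network round trip + queueing + prefill compute."""
    return network_rtt_s + queue_s + prompt_tokens / prefill_tok_per_s

PROMPT = 8_000  # tokens of context (hypothetical)

# Hypothetical: single local GPU, no network hop.
local = ttft_seconds(PROMPT, prefill_tok_per_s=2_000)
# Hypothetical: large cluster, 100 ms round trip but 10x the prefill throughput.
remote = ttft_seconds(PROMPT, prefill_tok_per_s=20_000, network_rtt_s=0.1)

print(f"local:  {local:.2f} s")   # local:  4.00 s
print(f"remote: {remote:.2f} s")  # remote: 0.50 s
```

With these made-up numbers the 100 ms network hop is a small fraction of the remote TTFT, and the prefill gap dominates. Real figures depend heavily on model size, batching and hardware.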

cowpig 2 days ago

More and more people I talk to care about privacy, but not in SF

mistercheph a day ago

and sovereignty. I can go into the woods with a fuzzy approximation of all internet text in my backpack