Comment by eldenring
The only real benefit is privacy, which 99.9% of people don't care about. Almost all serving metrics (cost, throughput, TTFT) are better with large GPU clusters. Latency is usually hidden by prefill cost.
And sovereignty. I can go into the woods with a fuzzy approximation of all internet text in my backpack.
More and more people I talk to care about privacy, but not in SF