Comment by rfoo

Comment by rfoo 7 days ago

0 replies

> the fastest completion since it would run locally

We are living in a strange age that local is slower than the cloud. Due to the sheer amount of compute we need to do. Compute takes hundreds of milliseconds (if not seconds) on local hardware, making 100ms of network latency irrelevant.

Even for a 7B model your expensive Mac or 4090 can't beat, for example, a box with 8x A100s running FOSS serving stack (sglang) with TP=8, in latency.