Comment by Karrot_Kream

Comment by Karrot_Kream 9 months ago

2 replies

Why does latency matter for a model that responds in 10s of seconds? Latency to a datacenter is measured in 10s or 100s of milliseconds, which is 3-4 orders of magnitude less.

scottcha 9 months ago

Two reasons that I understand 1. not all these AIs are LLMs and many have much lower latency SLAs than chat and 2. These are just one part of a service architecture and when you have multiple latencies across the stack they tend to have multiplicative effects.

recursivecaveat 9 months ago

If you look at a model with a diverse competitive provider set like llama 3 the latency is 1/4 second, and it will definitely improve at a minimum incrementally if quality is held constant: https://artificialanalysis.ai/models/llama-3-3-instruct-70b/... Remember that as long as you experience the response linearly (very much the case for audio output for eg) then the first-chunk latency is your actual latency, not to stream the entire response.