pzo a day ago

They did explain a little bit:

> We’ll be able to do things like run fast models on the edge, run model pipelines on instantly-booting Workers, stream model inputs and outputs with WebRTC, etc.

The benefit to 3rd-party developers is reduced latency and a more robust AI pipeline. Instead of going back and forth with an HTTPS request at each stage to run inference, you could do it all in one request, e.g. realtime, pipelined STT, text translation, some backend logic, TTS, and back to the user's mobile device. A rough sketch of that shape follows.
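A minimal sketch of that single-request pipeline as a Cloudflare Worker. The Workers AI binding (`env.AI.run`) is real, but the specific model names are just illustrative catalog entries, and the WebRTC streaming path from the quote is omitted:

```ts
// Sketch: STT -> translation chained inside one Worker invocation,
// so intermediate results never leave Cloudflare's network.
// Assumes a Workers AI binding named `AI` configured in wrangler.toml.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // 1. STT: transcribe the uploaded audio without an extra network hop.
    const audio = new Uint8Array(await request.arrayBuffer());
    const stt = await env.AI.run("@cf/openai/whisper", { audio: [...audio] });

    // 2. Translation: feed the transcript straight into the next model.
    const translated = await env.AI.run("@cf/meta/m2m100-1.2b", {
      text: stt.text,
      source_lang: "english",
      target_lang: "french",
    });

    // 3. Backend logic and TTS would chain on in the same way;
    // returning JSON here keeps the sketch short.
    return Response.json({
      transcript: stt.text,
      translation: translated.translated_text,
    });
  },
};
```

The point isn't the specific models: it's that every arrow in the pipeline is a function call inside one Worker rather than a round trip back to the client.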

badmonster a day ago

Does edge inference really solve the latency issue for most use cases? How does cost compare at scale?

  • viraptor a day ago

    Depends on how much the latency matters to you and your customers. Most services realistically won't gain much at all; even the latency of normal web requests is very rarely relevant. Only the business itself can answer that question, though.

    • chrisweekly a day ago

      > "Even the latency of normal web requests is very rarely relevant."

      Hard disagree. Performance is typically the most important feature of any website. User abandonment / bounce rate follows a predictable, steep, nonlinear curve as a function of latency.

      • viraptor 21 hours ago

        I've changed the latency of actual services as well as Core Web Vitals many times and... no. It turns out the line is not that steep. For the 200ms-1s range it's pretty much flat. Sure, you start seeing issues for multi-second requests, but that's terrible processing time. A change like eliminating intercontinental transfer latency is barely visible in e-commerce results.

        There's an old meme about Amazon seeing a difference for every 100ms of latency, and I've never seen it actually reproduced in a controlled way. Even when CF advertises lower latency https://www.cloudflare.com/en-au/learning/performance/more/w... the data it cites is from companies cutting whole seconds. "Walmart found that for every 1 second improvement in page load time, conversions increased by 2%" - that's not steep. When there's a claim about improvements per 100ms, it's still based on averaging multi-second data, as in https://auditzy.com/blog/impact-of-fast-load-times-on-user-e...
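        To make the "not steep" point concrete, here's the linear extrapolation hiding in that kind of claim (the 2%-per-second figure is from the Walmart quote above; the rest is just arithmetic):

        ```ts
        // Linear extrapolation of "2% more conversions per 1s of load-time improvement".
        const upliftPerSecond = 0.02;
        const upliftPer100ms = upliftPerSecond / 10; // 0.002
        console.log(`${(upliftPer100ms * 100).toFixed(1)}% per 100ms`); // "0.2% per 100ms"
        ```

        A 0.2% conversion change per 100ms is well within the noise of most A/B tests, which is why the per-100ms version of the claim is so hard to reproduce in a controlled way.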

        In short - if you have something extremely interactive, I'm sure it matters for the experience. For a typical website loading in under 1s, edge will barely matter. If you have data proving otherwise, I'd genuinely love to see it. For websites loading in over 1s, it's likely much easier to improve the core experience than to split things out onto the edge.