Comment by xmorse

Running models locally is very expensive in terms of memory and scheduling requirements, maybe instead they should host their model on the Cloudflare AI network which is distributed all around the world and can have lower latency