Comment by jasonsb 2 days ago

It's all about the hardware and infrastructure. If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT or Gemini. The Chinese models may benchmark close on paper, but real-world deployment is different. So you either buy your own hardware to run a Chinese model at 150-200tps, or give up and use one of the Big 3.

The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap.

Edit: It looks like Cerebras is offering a very fast GLM 4.6.

irthomasthomas 2 days ago

Gemini 3 = ~70tps https://openrouter.ai/google/gemini-3-pro-preview

Opus 4.5 = ~60-80tps https://openrouter.ai/anthropic/claude-opus-4.5

Kimi-k2-think = ~60-180tps https://openrouter.ai/moonshotai/kimi-k2-thinking

Deepseek-v3.2 = ~30-110tps (only 2 providers rn) https://openrouter.ai/deepseek/deepseek-v3.2

jasonsb 2 days ago

It doesn't work like that. You need to actually use the model and then go to /activity to see the actual speed. I constantly get 150-200tps from the Big 3, while other providers barely hit 50tps even though they advertise much higher speeds. GLM 4.6 via Cerebras is the only one faster than the closed-source models, at over 600tps.
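
For reference, a minimal sketch of the arithmetic behind a tps figure, assuming you have the completion token count and the generation time for each of your own requests. The function names and sample numbers are hypothetical, chosen only for illustration; they are not taken from OpenRouter.

```python
from statistics import mean


def tokens_per_second(completion_tokens: int, generation_seconds: float) -> float:
    """Throughput of one request: generated tokens divided by generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return completion_tokens / generation_seconds


def average_tps(requests: list[tuple[int, float]]) -> float:
    """Mean per-request throughput over (tokens, seconds) pairs."""
    return mean(tokens_per_second(tokens, seconds) for tokens, seconds in requests)


# Hypothetical sample: three of your own requests as (tokens, seconds).
sample = [(900, 5.0), (1200, 8.0), (300, 2.0)]
print(round(average_tps(sample), 1))
```

An average taken over many users and providers can differ a lot from what a single account sees, which is why the per-request numbers are worth checking.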

- irthomasthomas 2 days ago
  
  These aren't advertised speeds, they're the average speeds OpenRouter measures across different providers.
  
observationist 2 days ago

The network effects of using consistently behaving models and maintaining API coverage between updates are valuable, too. Presumably the big labs include their own domains of competence in training, so Claude is likely to stay very good at coding and to behave in similar ways, informed and constrained by their prompt frameworks. That way interactions keep working predictably even after major new releases, and upgrades can be clean.

It'll probably be a few years before all that stuff becomes as smooth as people need, but OAI and Anthropic are already doing a good job on that front.

Each new Chinese model requires a lot of testing and bespoke conformance to every task you want to use it for. There's a lot of activity and shared prompt engineering, and some really competent people doing things out in the open, but it's generally going to take a lot more expert work getting the new Chinese models up to snuff than working with the big US labs. Their product and testing teams do a lot of valuable work.

dworks 2 days ago

Qwen 3 Coder Plus has been braindead this past weekend, but Codex 5.1 has also been acting up. It told me updating UI styling was too much work and I should do it myself. I also see people complaining about Claude every week. I think this is an unsolved problem, and you also have to separate perception from actual performance, which I think is an impossible task.

jodleif 2 days ago

Assuming your hardware premise is right (and let's be honest, nobody really wants to send their data to Chinese providers), can't you use a provider like Cerebras or Groq?

DeathArrow 2 days ago

> If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT or Gemini.

I think GLM 4.6 offered by Cerebras is much faster than any US model.

jasonsb 2 days ago

You're right, I forgot about that one.

kachapopopow 2 days ago

Cerebras offers models at 50x the speed of Sonnet?

baq a day ago

If that's an honest question, the answer is pretty much yes, depending on the model.

- kachapopopow 16 hours ago
  
  the question mark was expressing confusion.
  
csomar 2 days ago

According to OpenRouter, z.ai is 50% faster than Anthropic, which matches my experience. z.ai does have frequent downtime, but so does Claude.
