espadrine 2 days ago

Two aspects to consider:

1. Chinese models typically focus on text. US and EU models also bear the burden of handling images, and often voice and video as well. Supporting all of those adds training costs that are not spent on further reasoning; it ties one hand behind your back in exchange for being more generally useful.

2. The gap seems small because so many benchmarks get saturated so fast. But towards the top, every additional 1% on a benchmark represents a significantly better model.

On the second point, I worked on a leaderboard that both normalizes scores and predicts unknown scores, to help improve comparisons between models on various criteria: https://metabench.organisons.com/ (there's a sketch of the prediction idea at the end of this comment).

You can notice that, while Chinese models are quite good, the gap to the top is still significant.

However, the US models are typically much more expensive at inference time, and Chinese models do have a niche on the Pareto frontier of cheaper but serviceable models (even though US models are eating into that part of the frontier too).
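
For the curious, the prediction part is less magic than it sounds. Here is a minimal sketch of the idea, assuming a plain NumPy low-rank imputation over a normalized model-by-benchmark score matrix (which may differ from what the site actually runs):

    import numpy as np

    # Rows are models, columns are benchmarks; NaN marks unpublished scores.
    scores = np.array([
        [88.0, 72.0, np.nan],
        [85.0, np.nan, 61.0],
        [np.nan, 70.0, 58.0],
    ])

    # Normalize each benchmark to [0, 1] so different criteria are comparable.
    mins, maxs = np.nanmin(scores, axis=0), np.nanmax(scores, axis=0)
    norm = (scores - mins) / (maxs - mins)

    # Iterative low-rank (here rank-1) SVD imputation of the missing scores.
    filled = np.where(np.isnan(norm), np.nanmean(norm, axis=0), norm)
    for _ in range(100):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :1] * s[:1]) @ vt[:1, :]
        filled = np.where(np.isnan(norm), approx, norm)

    print(filled)  # missing entries replaced by rank-1 predictions

The assumption behind any such scheme is that benchmark results are strongly correlated across models, so a few known scores pin down the rest reasonably well.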

  • coliveira 2 days ago

    Nothing you said helps with the issue of valuation. Yes, the US models may be better by a few percentage points, but how can they justify costing so much more, both operationally and in investment? Over the long run this is a business, and you don't make money by being first; you have to be more profitable overall.

    • ben_w 2 days ago

      I think the investment race here is an "all-pay auction"*. Lots of investors have looked at the ultimate prize — basically winning something larger than the entire present world economy forever — and think "yes".

      But even assuming that we're on the right path for that (which we may not be) and assuming that nothing intervenes to stop it (which it might), there may be only one winner, and that winner may not have even entered the game yet.

      * https://en.wikipedia.org/wiki/All-pay_auction

      • coliveira 2 days ago

        > investors have looked at the ultimate prize — basically winning something larger than the entire present world economy

        This is what people like Altman want investors to believe. It looks like any other snake-oil scam, because it doesn't match the reality of what he delivers.

        • saubeidl 2 days ago

          Yeah, this is basically financial malpractice/fraud.

  • jodleif 2 days ago

    1. Have you seen the Qwen offerings? They have great multi-modality, some even SOTA.

    • brabel 2 days ago

      Qwen Image and Image Edit were among the best image models until Nano Banana Pro came along. I have tried several open image models and can confirm that the Chinese models are easily the best, or very close to it, but right now the Google model is even better... we'll see if the Chinese catch up again.

      • BoorishBears 2 days ago

        I'd say Google still hasn't caught up on the smaller model side at all, but we've all been (rightfully) wowed enough by Pro to ignore that for now.

        Nano Banana Pro starts at 15 cents per image at <2K resolution, and is not strictly better than Seedream 4.0; yet the latter does 4K for 3 cents per image.

        Add in the power of fine-tuning on their open-weight models, and I don't know if China actually needs to catch up.

        I finetuned Qwen Image on 200 generations from Seedream 4.0 that were cleaned up with Nano Banana Pro, and got results that were as good as, and more reliable than, either model could achieve on its own (rough outline of the data prep below).
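
        For anyone curious, a minimal sketch of assembling that kind of distillation set. The file names and layout here are made up for illustration, and the actual LoRA training run is a separate step with whatever trainer you prefer:

            import json
            from pathlib import Path

            # Hypothetical layout: prompts.json maps an id to its prompt text;
            # cleaned/ holds the Nano-Banana-Pro-retouched Seedream outputs.
            prompts = json.loads(Path("prompts.json").read_text())

            manifest = []
            for image_path in sorted(Path("cleaned").glob("*.png")):
                pair_id = image_path.stem
                if pair_id in prompts:
                    manifest.append({"image": str(image_path),
                                     "text": prompts[pair_id]})

            # ~200 prompt/image pairs, in the jsonl format most trainers accept.
            with open("train.jsonl", "w") as f:
                for row in manifest:
                    f.write(json.dumps(row) + "\n")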

  • janalsncm a day ago

    > Chinese models typically focus on text

    Not true at all. Qwen has a VLM (Qwen2-VL-Instruct), which is the backbone of Bytedance's TARS computer-use model. Both Alibaba (Qwen) and Bytedance are Chinese.

    Also, DeepSeek got a ton of attention with their OCR paper a month ago, which was an explicit example of using images rather than text.

  • raincole 2 days ago

    > video

    Most of the AI-generated videos we see on social media now are made with Chinese models.

  • culi 2 days ago

    Qwen, Hunyuan, and WAN are three of the major competitors in the vision, text-to-image, and image-to-video spaces. They are quite competitive. Right now WAN is only behind Google's Veo in the image-to-video rankings on LMArena, for example:

    https://lmarena.ai/leaderboard/image-to-video

  • torginus 2 days ago

    Thanks for sharing that!

    The scales are a bit murky here, but if we look at the 'Coding' metric, we see that Kimi K2 outperforms Sonnet 4.5, which I think is still considered the price-performance darling even today?

    I haven't tried these models, but in general there have been lots of cases where a model performs much worse IRL than the benchmarks would suggest (certain Chinese models and GPT-OSS have been guilty of this in the past).

    • espadrine 2 days ago

      Good question. There are two points to consider.

      • For both Kimi K2 and Sonnet, there's a non-thinking and a thinking version. Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently and beats it on all comparable pure-coding benchmarks I know of: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%), and they tie on SciCode at 44.8%. This finding is shared by ArtificialAnalysis: https://artificialanalysis.ai/models/capabilities/coding

      • The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as GitHub Copilot, all of which are increasingly agentic. On the Agentic Coding criterion, Sonnet 4.5 Thinking scores much higher.

      By the way, you can look at the Table tab to see all known and predicted results on benchmarks.

      • pama a day ago

        The table is confusing: it is not clear what is known and what is predicted (or how it is predicted). Why not measure the missing pieces instead of predicting them? Is it too expensive, or is the tooling missing?

  • agumonkey 2 days ago

    Forgive me for bringing politics into it, but are Chinese LLMs more prone to censorship bias than US ones?

    • coliveira 2 days ago

      Being open source, Chinese models are, I believe, less prone to censorship, since US corporations can add censorship in several ways simply by controlling a closed model.

    • skeledrew 2 days ago

      It's not about an LLM being prone to anything; it's more about the way an LLM is fine-tuned (which can be subject to the requirements of those wielding political power).

      • agumonkey 2 days ago

        That's what I meant, though I could have been more precise.

    • erikhorton 2 days ago

      Yes, they are extremely likely to be prone to censorship, based on their training. Try running them locally with something like LM Studio and asking questions the government is uncomfortable with. I originally thought the bias was in the GUI, but it's baked into the model itself.

jasonsb 2 days ago

It's all about the hardware and infrastructure. If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT, or Gemini. The Chinese models may benchmark close on paper, but real-world deployment is different. So you either buy your own hardware in order to run a Chinese model at 150-200 tps, or you give up and use one of the Big 3.

The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap.

Edit: It looks like Cerebras is offering a very fast GLM 4.6

  • irthomasthomas 2 days ago

    • jasonsb 2 days ago

      It doesn't work like that. You need to actually use the model and then go to /activity to see the real speed. I consistently get 150-200 tps from the Big 3, while other providers barely hit 50 tps even though they advertise much higher speeds. GLM 4.6 via Cerebras is the only one faster than the closed-source models, at over 600 tps.
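
      If you want numbers independent of anyone's dashboard, you can time a streamed completion yourself. A rough sketch against any OpenAI-compatible endpoint (the base URL, API key, and model id are placeholders, and counting stream chunks only approximates tokens):

          import time
          from openai import OpenAI  # pip install openai

          client = OpenAI(base_url="https://openrouter.ai/api/v1",
                          api_key="sk-...")

          start, chunks = time.monotonic(), 0
          stream = client.chat.completions.create(
              model="z-ai/glm-4.6",  # placeholder model id
              messages=[{"role": "user",
                         "content": "Write a 500-word essay on tides."}],
              stream=True,
          )
          for chunk in stream:
              if chunk.choices and chunk.choices[0].delta.content:
                  chunks += 1  # roughly one token per chunk on most providers

          print(f"{chunks / (time.monotonic() - start):.0f} chunks/sec (~tps)")

      Run it a few times at different hours; per-provider throughput varies a lot with load, which is part of why advertised and measured numbers disagree.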

      • irthomasthomas 2 days ago

        These aren't advertised speeds; they are the average speeds measured by OpenRouter across different providers.

  • observationist 2 days ago

    The network effects of using consistently behaving models, and of API coverage maintained between updates, are valuable too. Presumably the big labs include their own domains of competence in the training, so Claude is likely to remain very good at coding and to behave in similar ways, informed and constrained by their prompt frameworks, so that interactions will continue to work in predictable ways even after major new releases, and upgrades can be clean.

    It'll probably be a few years before all that stuff becomes as smooth as people need, but OAI and Anthropic are already doing a good job on that front.

    Each new Chinese model requires a lot of testing and bespoke conformance work for every task you want to use it for. There's a lot of activity and shared prompt engineering, and some really competent people doing things out in the open, but it's generally going to take a lot more expert work to get the new Chinese models up to snuff than to work with the big US labs. Their product and testing teams do a lot of valuable work.

    • dworks 2 days ago

      Qwen 3 Coder Plus has been braindead this past weekend, but Codex 5.1 has also been acting up. It told me updating UI styling was too much work and I should do it myself. I also see people complaining about Claude every week. I think this is an unsolved problem, and you also have to separate perception from actual performance, which I think is an impossible task.

  • jodleif 2 days ago

    Assuming your hardware premise is right (and let's be honest, nobody really wants to send their data to Chinese providers), you can use a provider like Cerebras or Groq?

  • DeathArrow 2 days ago

    > If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT, or Gemini.

    I think GLM 4.6 offered by Cerebras is much faster than any US model.

  • kachapopopow 2 days ago

    Cerebras offers models at 50x the speed of Sonnet?

    • baq a day ago

      If that's an honest question, the answer is pretty much yes, depending on the model.

  • csomar 2 days ago

    According to OpenRouter, z.ai is 50% faster than Anthropic, which matches my experience. z.ai does have frequent downtime, but so does Claude.

jazzyjackson 2 days ago

Valuation is not based on what they have done but what they might do. I agree, though, that it's an investment made with very little insight into Chinese research. I guess it's counting on DeepSeek being banned and all computers in America refusing to run open software by the year 2030. /snark

  • jodleif 2 days ago

    > Valuation is not based on what they have done but what they might do

    Exactly what I'm thinking. Chinese models are catching up rapidly, and will soon be on par with the big dogs.

    • ksynwa 2 days ago

      Even if they do continue to lag behind they are a good bet against monopolisation by proprietary vendors.

      • coliveira 2 days ago

        They would be, if corporations were allowed to run these models. I fully expect the US government to prohibit corporations from doing anything useful with Chinese models (full censorship). It's the same game they play with chips.

  • bilbo0s 2 days ago

    >I guess it's counting on DeepSeek being banned

    And the people making the bets are in a position to make sure the banning happens. The US government system being what it is.

    Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet".

    "Bet" imply you don't know the outcome and you have no influence over the outcome. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people?

    • coliveira 2 days ago

      Exactly. "Business investment" these days means that the people involved will have at least some amount of power to determine the winning results.

Bolwin 2 days ago

Third-party providers rarely support prompt caching.

With caching, the expensive US models end up at something like 2x the price (e.g. Sonnet), and are often much cheaper (e.g. GPT-5 mini).
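
A quick back-of-the-envelope of how much caching changes the math. The prices are assumptions based on published Sonnet-class list rates (roughly $3/M input, $15/M output, with cache reads at a tenth of the input rate), so check the current pricing pages:

    # Assumed $/M-token list prices for a Sonnet-class model.
    INPUT, OUTPUT, CACHE_READ = 3.00, 15.00, 0.30

    def request_cost(prompt_toks, output_toks, cached_fraction):
        """Dollar cost of one request, with part of the prompt cached."""
        cached = prompt_toks * cached_fraction
        fresh = prompt_toks - cached
        return (fresh * INPUT + cached * CACHE_READ + output_toks * OUTPUT) / 1e6

    # An agentic-coding turn: 50k-token context, 1k tokens out.
    print(request_cost(50_000, 1_000, 0.9))  # ~$0.044 with 90% cache hits
    print(request_cost(50_000, 1_000, 0.0))  # ~$0.165, almost 4x, uncached

On long, repetitive contexts the cache discount dominates the bill, which is why a list price that looks 5-10x higher can shrink to about 2x in practice.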

If the third-party providers start caching, then the US companies will be completely outpriced.

newyankee 2 days ago

Yet, to be honest, if the US industry had not moved ahead and created the race with FOMO, it would not have been as easy for the Chinese strategy to work either.

The nature of the race may yet change, though, and I am unsure whether the devil is in the details, as in very specific edge cases that will only work with frontier models?

fastball 2 days ago

They're not that close (on things like LMArena), and being cheaper is pretty meaningless when we are not yet at the point where LLMs are good enough for autonomy.

mrinterweb 2 days ago

I would expect one of the motivations for making these model weights open is to undermine the valuations of the other players in the industry. Open models like this must diminish the value proposition of the frontier-focused companies if other companies can compete with similar results at competitive prices.

rprend 2 days ago

People pay for products, not models. OpenAI and Anthropic make products (ChatGPT, Claude Code).

isamuel 2 days ago

There is a great deal of orientalism at work: it is genuinely unthinkable to a lot of American tech dullards that the Chinese could be better at anything requiring what they think of as "intelligence." Aren't they Communist? Backward? Don't they eat weird stuff at wet markets?

It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask of that era the kind of question being asked here, the answers you get are genuinely no better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice."

  • mosselman 2 days ago

    Back when DeepSeek came out and people were tripping over themselves shouting that it was so much better than what was out there, it just wasn't good.

    It might be that this model is super good (I haven't tried it), but to say the Chinese models are better is just not true.

    What I really love, though, is that I can run them (open models) on my own machine. The other day I categorised images locally using Qwen; what a time to be alive.

    Going even further than local hardware, open models make it possible to run on providers of your choice, such as European ones. Which is great!

    So I love everything about the competitive nature of this.

    • CamperBob2 2 days ago

      If you thought DeepSeek "just wasn't good," there's a good chance you were running it wrong.

      For instance, a lot of people thought they were running "DeepSeek" when they were really running some random distillation on Ollama.

      • bjourne 2 days ago

        WDYM? Isn't https://chat.deepseek.com/ the real DeepSeek?

        • CamperBob2 2 days ago

          Good point, I was assuming the GP was running local for some reason. Hard to argue when it's the official providers who are being compared.

          I ran the 1.58-bit Unsloth quant locally at the time it came out, and even at such low precision, it was super rare for it to get something wrong that o1 and GPT-4 got right. I have never actually used a hosted version of the full DeepSeek.

  • stocksinsmocks a day ago

    The early stages of Barbarossa were very successful, and much of the Soviet Air Force, which had been forward-positioned for invasion, was destroyed. Given the Red Army's attitude toward consent, I would keep the praise carefully measured. TV has taught us there are good guys and bad guys, when the reality is closer to just bad guys and bad guys.

  • ecshafer 2 days ago

    I don't think that anyone, much less someone working in tech or engineering in 2025, could still hold the belief that the Chinese are not capable scientists or engineers. I could maybe give a (naive) pass to someone in 1990 thinking China would never build more than junk. But in 2025, their production capacity, their scientific advancement, and the sheer number of us who have worked with extremely talented Chinese colleagues should dispel those notions. I think you are jumping to racism a bit fast here.

    Germany was right in some ways and wrong in others about the Soviet Union's strength. The USSR failed to conquer Finland because of the military purges. German intelligence vastly underestimated the number of tanks and the general preparedness of the Soviet army (Hitler was shocked the Soviets already had 40k tanks). The Lend-Lease Act sent an astronomical amount of goods to the USSR, which allowed them to fully commit to the war and really focus on increasing their weapons production; the numbers on the tractors, food, trains, ammunition, etc. that the US sent to the USSR are staggering.

    • hnfong 2 days ago

      I don't think anyone seriously believes that the Chinese aren't capable; it's more that people believe that, no matter what happens, the USA will still dominate in "high tech" fields. A variant of "American exceptionalism," so to speak.

      This is kinda reflected in the stock market, where the AI stocks are surging to new heights every day, yet their Chinese equivalents are relatively lagging behind in stock price, which suggests that investors are betting heavily on the US companies to "win" this "AI race" (if there are any gains to be made by winning).

      Also, in the past couple of years (or maybe couple of decades), there has been a lot of crap talk about how China has to democratize and free up its markets in order to be competitive with the other first-world countries, together with a bunch of "doomsday" predictions for authoritarianism in China. This narrative has completely lost any credibility, but the sentiment dies slowly...

  • breppp 2 days ago

    Not sure how the entire Nazi comparison plays out, but at the time there were good reasons to imagine the Soviets would fall apart (as they initially did).

    Stalin had just finished purging his entire officer corps, which is not a good omen for war, and the USSR had failed miserably against the Finns, who were not the strongest of nations, while Germany had just steamrolled France, a country that had been much more impressive in WW1 than the Russians (who collapsed against Germany).

  • newyankee 2 days ago

    But didn't the Chinese already surpass the rest of the world in solar, batteries, and EVs, among other things?

    • cyberlimerence 2 days ago

      They did, but the goalposts keep moving, so to speak. We're approximately here: advanced semiconductors, artificial intelligence, reusable rockets, quantum computing, etc. The Chinese will never catch up. /s

  • lukan 2 days ago

    "It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; ..."

    Ideology played a role, but the data they worked with was the Finnish war, which was disastrous for the Soviet side. Hitler later famously said it was all an intentional distraction to make them believe the Soviet army was worth nothing. (The real reasons were more complex, like the previous purges.)

  • gazaim 2 days ago

    These Americans have no comprehension of intelligence being used to benefit humanity instead of being used to fund a CEO's new yacht. I encourage them to visit China to see how far the USA lags behind.

    • astrange 2 days ago

      Lags behind meaning we haven't covered our buildings in LEDs?

      America is mostly suburbs and car sewers but that's because the voters like it that way.

  • littlestymaar 2 days ago

    > It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed

    Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian Civil War experience) during the Moscow-trials purges, the Germans almost succeeded.

    • gazaim 2 days ago

      > Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian Civil War experience) during the Moscow-trials purges, the Germans almost succeeded.

      There were many counter-revolutionaries among the leadership, even among those conducting the purges. Stalin was like "ah fuck we're hella compromised." Many revolutions fail at this step and often end up facing a CIA-backed coup. The USSR was under constant siege and attempted infiltration since its inception.

      • littlestymaar 2 days ago

        > There were many counter-revolutionaries among the leadership

        Well, Stalin was, by far, the biggest counter-revolutionary in the Politburo.

        > Stalin was like "ah fuck we're hella compromised."

        There's no evidence that anything significant was compromised at that point, and there is clear evidence that Stalin was in fact clinically paranoid.

        > Many revolutions fail at this step and often end up facing a CIA-backed coup. The USSR was under constant siege and attempted infiltration since its inception.

        Can we please not recycle 90-year-old Soviet propaganda? That the Moscow trials were irrational self-harm was acknowledged by the USSR leadership as early as the fifties…