Comment by oskarkk 2 days ago

> While xAI's model isn't a substantial leap beyond Deepseek R1, it utilizes 100 times more compute.

I'm not sure if it's close to 100x more. xAI had 100K Nvidia H100s, while this is what SemiAnalysis writes about DeepSeek:

> We believe they have access to around 50,000 Hopper GPUs, which is not the same as 50,000 H100, as some have claimed. There are different variations of the H100 that Nvidia made in compliance to different regulations (H800, H20), with only the H20 being currently available to Chinese model providers today. Note that H800s have the same computational power as H100s, but lower network bandwidth.

> We believe DeepSeek has access to around 10,000 of these H800s and about 10,000 H100s. Furthermore they have orders for many more H20’s, with Nvidia having produced over 1 million of the China specific GPU in the last 9 months. These GPUs are shared between High-Flyer and DeepSeek and geographically distributed to an extent. They are used for trading, inference, training, and research. For more specific detailed analysis, please refer to our Accelerator Model.

> Our analysis shows that the total server CapEx for DeepSeek is ~$1.6B, with a considerable cost of $944M associated with operating such clusters. Similarly, all AI Labs and Hyperscalers have many more GPUs for various tasks including research and training than they commit to an individual training run due to centralization of resources being a challenge. X.AI is unique as an AI lab with all their GPUs in 1 location.

https://semianalysis.com/2025/01/31/deepseek-debates/

I don't know how much slower the GPUs they have are, but if they have 50K of them, that doesn't sound like 100x less compute to me. Also, a company that has N GPUs and trains a model on them for 2 months can achieve the same results as a company that has 2N GPUs and trains for 1 month, so DeepSeek could spend a longer time training to offset having fewer GPUs than its competitors. A rough sketch of that comparison is below.
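As a back-of-the-envelope sketch: total training compute scales roughly with GPU count × per-GPU throughput × training time, so a smaller fleet training longer can narrow the gap. The per-GPU throughput numbers, the DeepSeek fleet split, and the training durations below are assumptions for illustration, not figures from the SemiAnalysis article:

```python
# Rough H100-equivalent GPU-month comparison. Illustrative only:
# the relative throughputs, fleet split, and durations are assumptions.

# Relative per-GPU training throughput (H100 = 1.0).
# SemiAnalysis says H800 matches H100 compute (only lower network bandwidth);
# the H20 value here is a hypothetical placeholder for a much slower part.
throughput = {"H100": 1.0, "H800": 1.0, "H20": 0.15}

def total_compute(fleet: dict, months: float) -> float:
    """H100-equivalent GPU-months: sum(count * relative throughput) * months."""
    return sum(count * throughput[gpu] for gpu, count in fleet.items()) * months

# xAI: ~100K H100s, assumed 1 month of training (duration is hypothetical).
xai = total_compute({"H100": 100_000}, months=1)

# DeepSeek: assumed ~10K H100 + ~10K H800 + some H20s, training for 2 months.
deepseek = total_compute({"H100": 10_000, "H800": 10_000, "H20": 30_000}, months=2)

print(f"xAI:      {xai:,.0f} H100-equivalent GPU-months")
print(f"DeepSeek: {deepseek:,.0f} H100-equivalent GPU-months")
print(f"Ratio:    {xai / deepseek:.1f}x")  # ~2x under these assumptions, nowhere near 100x
```

Under these (made-up) assumptions the gap comes out around 2x, which is why the 100x claim looks hard to justify from GPU counts alone.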

cma 2 days ago

Having 50K of them isn't the same thing as having 50K in one high-bandwidth cluster, right? x.AI has all of theirs so far in one connected cluster, and all homogeneous H100s, right?