Comment by kgwgk 2 days ago

> my estimation is they had more (maybe 20k) A100s, and single-digit thousands of H800s.

Their technical report on DeepSeek-V3 says that it "is trained on a cluster equipped with 2048 NVIDIA H800 GPUs." If they had even high-single-digit thousands of H800s, they would probably have trained on more of them instead of waiting almost two months for the run to finish.
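
A back-of-the-envelope sketch of that arithmetic. The 2048-GPU count is the one quoted from the report; the total of roughly 2.788M H800 GPU-hours is what I recall the same report giving for the full training run, so treat it as an assumption rather than something quoted here. The 8192-GPU cluster is purely hypothetical, and the estimate for it assumes perfectly linear scaling, which real training runs do not achieve.

```python
def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Idealized wall-clock days: total GPU-hours spread evenly over num_gpus."""
    return gpu_hours / num_gpus / 24


# Assumption: total GPU-hours as recalled from the DeepSeek-V3 report.
TOTAL_GPU_HOURS = 2_788_000

# Cluster size quoted from the report.
REPORTED_GPUS = 2048

# Hypothetical larger cluster ("high-single-digit thousands" of H800s).
HYPOTHETICAL_GPUS = 8192

reported_days = wall_clock_days(TOTAL_GPU_HOURS, REPORTED_GPUS)
hypothetical_days = wall_clock_days(TOTAL_GPU_HOURS, HYPOTHETICAL_GPUS)

print(f"{REPORTED_GPUS} GPUs: about {reported_days:.1f} days")
print(f"{HYPOTHETICAL_GPUS} GPUs: about {hypothetical_days:.1f} days (ideal scaling)")
```

Under those assumptions, 2048 GPUs works out to a bit under two months, while four times as many would finish in roughly two weeks. That gap is the point: with that much spare hardware, the longer wait would be hard to explain.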