Comment by segmondy
10 years worth of cash? So all these Chinese labs that came out and did it for less than $1 billion must have 3 heads per developer, right?
We don't really know how much it cost them. There are plenty of reasons to doubt the numbers being passed around, and to question what they weren't counting.
(And even if you do believe the numbers, those labs also aren't licensing the IP they're training on, unlike American firms, who are now paying quite a lot for it.)
Rumor has it that they weren't trained "from scratch" the way US models would be, i.e., Chinese labs benefited from government-"procured" IP (the US $B models) in order to train their $M models. I also understand there to be real innovation in the many-expert MoE architecture on top of that. Would love to hear a more technical take from someone who does more than repeat rumors, though.