Comment by maxloh a day ago

Is the training cost really that high, though?

The Allen Institute (a non-profit) just released the Molmo 2 and Olmo 3 models. They trained these from scratch using public datasets, and the models are competitive with Gemini on several benchmarks [0] [1].

AMD was also able to successfully train an older version of OLMo on their hardware using the published code, data, and recipe [2].

If a non-profit and a chip vendor (training for marketing purposes) can do this, it clearly doesn't require "burning 10 years of cash flow" or a Google-scale TPU farm.

[0]: https://allenai.org/blog/molmo2

[1]: https://allenai.org/blog/olmo3

[2]: https://huggingface.co/amd/AMD-OLMo

lostmsu 20 hours ago

No, it doesn't beat Gemini on any benchmark. It beats Gemma, which isn't SoTA even among open models of that size; that would be Nemotron 3 or GPT-OSS 20B.

turtlesdown11 a day ago

No, of course the training costs aren't that high. Apple's ten years of future free cash flow is greater than a trillion dollars (they are above $100B per year). Obviously, the training costs are a trivial amount compared to that figure.

ufmace 17 hours ago

What I'm wondering: their future cash flow may be massive compared to the cost of any conceivable rational task, but the market for servers and datacenters seems pretty saturated right now. Maybe, for all their available capital, they just can't get enough compute and storage on a reasonable schedule.

bombcar 21 hours ago

I have no idea what AI involves, but "training" sounds like a one-and-done thing. How is the result "stored"? If you have trained up a Gemini, can you "clone" it, and if so, what is needed?
I was under the impression that all these GPUs and such were needed to run the AI, not only to ingest the data.

- DougBTX 19 hours ago
  
  > but how is the result "stored"
  
  Like this: https://huggingface.co/docs/safetensors/index
  
- esafak 20 hours ago
  
  Yes, serving requires infra, too. But you can use infra optimized for serving; Nvidia GPUs are not the only game in town.
  
- tefkah 20 hours ago
  
  Theoretically, it would be much less expensive to just keep running the existing models, but of course none of the current leaders are going to stop training new ones any time soon.
  
  bombcar 18 hours ago
  
  So are we on a hockey stick right now, where each new model is so much better than the previous one that you have to keep training?
  Almost every previous case like this eventually leveled out.
  
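To make DougBTX's answer concrete: a trained model is essentially a set of named arrays of numbers (the weights) written to a file, so "cloning" it just means copying that file, while running it still takes compute for the arithmetic (esafak's point about serving infra). Below is a minimal sketch in Python, assuming a stripped-down, hand-rolled writer and reader that mimic the safetensors layout: an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape, and byte offsets, then the raw data. It handles only float32 and is not the `safetensors` library's API. The tensor names, the toy 2×2 layer, and the helpers `save_tensors`, `load_tensors`, and `forward` are all hypothetical.

```python
import json
import os
import shutil
import struct
import tempfile


def save_tensors(path, tensors):
    """Write {name: (shape, flat list of floats)} in a minimal safetensors-like layout.

    Hypothetical helper: 8-byte little-endian header length, JSON header, raw F32 bytes.
    """
    header = {}
    blobs = []
    offset = 0
    for name, (shape, values) in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            "data_offsets": [offset, offset + len(data)],  # relative to the data section
        }
        blobs.append(data)
        offset += len(data)
    header_bytes = json.dumps(header).encode("utf-8")
    # Pad the JSON with spaces so the data section starts on an 8-byte boundary.
    header_bytes += b" " * (-len(header_bytes) % 8)
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        for blob in blobs:
            f.write(blob)


def load_tensors(path):
    """Read a file written by save_tensors back into {name: (shape, values)}."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data = f.read()
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional metadata entry in the real format
            continue
        start, end = info["data_offsets"]
        count = (end - start) // 4  # 4 bytes per float32
        values = list(struct.unpack(f"<{count}f", data[start:end]))
        tensors[name] = (tuple(info["shape"]), values)
    return tensors


def forward(tensors, x):
    """Run a toy linear layer y = W x + b: serving still means doing this math."""
    (rows, cols), w = tensors["layer0.weight"]
    _, b = tensors["layer0.bias"]
    return [sum(w[r * cols + c] * x[c] for c in range(cols)) + b[r] for r in range(rows)]


# Usage: save toy "trained" weights, clone the file by copying it, and serve from the clone.
weights = {
    "layer0.weight": ((2, 2), [0.5, -1.0, 0.25, 2.0]),
    "layer0.bias": ((2,), [0.0, 1.0]),
}
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "model.safetensors")
    clone = os.path.join(tmp, "clone.safetensors")
    save_tensors(original, weights)
    shutil.copyfile(original, clone)  # "cloning" a trained model is a byte copy
    restored = load_tensors(clone)

print(restored == weights)            # True: these values are exact in float32
print(forward(restored, [2.0, 1.0]))  # [0.0, 3.5]
```

Real checkpoints hold billions of values spread across many tensors and often use other dtypes (e.g. F16 or BF16), but the idea is the same: training produces the numbers once, copying them is cheap, and every request served still has to run the multiplications.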
amelius 20 hours ago

Hiring the right people should also be trivial with that amount of cash.

PunchyHamster 17 hours ago

My prediction is that they might switch once the AI craze simmers down to a more reasonable level.
