Comment by maxloh a day ago
Is the training cost really that high, though?
The Allen Institute (a non-profit) just released the Molmo 2 and Olmo 3 models. They trained them from scratch on public datasets, and they are competitive with Gemini on several benchmarks [0] [1].
AMD also successfully trained an older version of OLMo on their own hardware using the published code, data, and recipe [2].
If a non-profit and a chip vendor (training for marketing purposes) can do this, it clearly doesn't require "burning 10 years of cash flow" or a Google-scale TPU farm.
[0]: https://allenai.org/blog/molmo2
No, it doesn't beat Gemini in any benchmark. It beats Gemma, which isn't SoTA even among open models of that size; that would be Nemotron 3 or GPT-OSS 20B.