Comment by alganet
> we weren’t held back by hardware
Llama 3 8B reportedly took 1.3M GPU-hours to train on H100-80GB GPUs.
Of course, it didn't take 1.3M hours of wall-clock time (~150 years). Many machines with 80GB each were used in parallel.
Let's do some napkin math: roughly 150 machines, with a total of 12TB of VRAM, running for a year.
So, what would be needed to train a 300K-parameter model that runs in 128MB of RAM? Definitely more, much more, than 128MB of RAM.
Llama 3 runs on 16GB of VRAM. Let's imagine that's our Pentium II of today. Training took at least 750 times the memory needed to run it (12TB / 16GB). By the same ratio, you would have needed ~100GB of RAM back then, running for a full year, to get that 300K model.
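The napkin math above can be sketched out explicitly. All inputs here are the comment's own rough figures (reported GPU-hours, approximate VRAM sizes), not measured values:

```python
import math

train_gpu_hours = 1_300_000          # reported Llama 3 8B training budget
hours_per_year = 365 * 24            # 8760

gpus_for_a_year = train_gpu_hours / hours_per_year   # ~148
gpus = math.ceil(gpus_for_a_year / 10) * 10          # round up to 150
total_vram_gb = gpus * 80                            # H100-80GB each -> 12 TB

inference_vram_gb = 16               # rough VRAM needed to *run* Llama 3 8B
train_to_run_ratio = total_vram_gb / inference_vram_gb   # 750x

run_ram_mb = 128                     # hypothetical 300K-param model footprint
train_ram_gb = run_ram_mb * train_to_run_ratio / 1024    # ~94 GB, i.e. "~100GB"

print(gpus, total_vram_gb, train_to_run_ratio, round(train_ram_gb))
```

The exact result is ~94GB, which the comment rounds to ~100GB.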
How many computers with 100GB+ RAM do you think existed in 1997?
Also, that's only RAM. You'd also need raw processing power and massive amounts of training data.
You’re basically arguing that because A380s need millions of liters of fuel and a 4km runway, the Wright Flyer was impossible in 1903. That logic just doesn’t hold. Different goals, different scales, different assumptions. The 300K model shows that even in the 80s, it was both possible and sufficient for narrow but genuinely useful tasks.
We simply weren’t looking, blinded by symbolic programming and expert systems. This could have been a wake-up call, steering AI research in a completely different direction and accelerating progress by decades. That’s the whole point.