Comment by milesvp
Worse is that a lot of these people are acting like Moore's law isn't still in effect. People conflate clock speeds on beefy hardware with Moore's law, and act like it's dead, when transistor density continues to rise and cost per transistor continues to fall at rates similar to what they always have. That means the people racing to build out infrastructure today might just be better off parking that money in a low-interest account and waiting 6 months. That was a valid strategy for animation studios in the late 90s (it was not only cheaper to wait, but the finished renders also happened sooner), and I'd be surprised if it's not a valid strategy today for LLMs. The amount of silicon specialized for this type of processing that is going to be produced is going to be mind-boggling.
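To make the "wait and finish sooner" arithmetic concrete, here is a rough sketch. It assumes hardware throughput per dollar doubles every 18 months and that a job scales perfectly with it; both the doubling period and the job lengths are illustrative assumptions, not figures from the comment above. The function names are made up for this example.

```python
import math

def finish_time(work_months, wait_months, doubling_months=18.0):
    """Months until the job is done if you wait first, then buy hardware.

    work_months: how long the job takes on hardware bought today.
    doubling_months: assumed throughput doubling period (hypothetical).
    """
    speedup = 2 ** (wait_months / doubling_months)
    return wait_months + work_months / speedup

def best_wait(work_months, doubling_months=18.0):
    """Wait that minimizes finish_time, from setting the derivative to zero."""
    ratio = work_months * math.log(2) / doubling_months
    if ratio <= 1.0:
        return 0.0  # job is short enough that waiting never pays off
    return doubling_months * math.log2(ratio)

if __name__ == "__main__":
    for work in (12, 48):
        wait = best_wait(work)
        print(work, round(wait, 1), round(finish_time(work, wait), 1))
```

Under these assumptions, waiting only gets you done sooner when the job would take longer than about doubling_months / ln 2 on today's hardware (roughly 26 months for an 18-month doubling period). For shorter jobs, waiting can still be cheaper, but it does not finish earlier.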
Cost per transistor is increasing, or flat if you stay on a legacy node. They've pretty much squeezed all the cost out of 28nm that can be had, and it's the cheapest per transistor.
“based on the graph presented by Milind Shah from Google at the industry tradeshow IEDM, the cost of 100 million transistors normalized to 28nm is actually flat or even increasing.”
https://www.tomshardware.com/tech-industry/manufacturing/chi...