Comment by alecco 14 hours ago
SemiAnalysis said it last week and AFAIK it wasn't denied.
https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...
This is misleading. They had 4.5, which was a new scaled-up training run. It was a huge model and only served to Pro users, but the biggest models are always used as teacher models for smaller models. That's how you do distillation. It would be stupid not to use the biggest model you have for distillation, and a waste, since they already have the weights.
They would have taken some time to calculate the efficiency gains of pretraining vs. RL, resumed GPT-4.5 training for whatever budget made sense, and then spent the rest on RL.
Sure, they chose not to serve the large base models anymore, for cost reasons.
But I’d guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model size, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized, and make little sense to serve at inference without distillation.
In RL the efficiency calculus is very different, because you have to run inference on the model to draw online samples. So smaller models start to make more sense to scale.
Big model => distill => RL
This makes the most theoretical sense nowadays for spending the training budget efficiently.
So they had already trained a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could return to scaling if the returns justified it.
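To make the distill step concrete: the student is trained to match the teacher's softened output distribution on top of the usual hard-label loss, so the big model only runs forward passes to produce targets. A minimal sketch in PyTorch (the temperature and mixing weight are illustrative defaults, not anyone's actual recipe):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # soft targets: KL between temperature-softened teacher/student distributions
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2  # rescale so the gradient magnitude matches the hard loss
        # hard targets: ordinary next-token cross-entropy
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

Teacher inference is far cheaper than continuing to train the teacher, which is why keeping the 4.5 weights around as a teacher costs very little relative to what they paid for them.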
This is a really great breakdown. With TPUs seemingly more efficient and costing less overall, how does this play for Nvidia? What's to stop them from entering the TPU race with their $5 trillion valuation?
As others mentioned, $5T isn't money available to NVDA. It could leverage that valuation to buy a TPU company in an all-stock deal, though.
The bigger issue is that entering a 'race' implies a race to the bottom.
I've noted this before, but one of NVDA's biggest risks is that its primary customers are also technical, also make hardware, also have money, and clearly see NVDA's margin (70% gross!!, 50%+ profit) as something they want to eliminate. Google was first to get there (not a surprise), but Meta is also working on its own hardware along with Amazon.
This isn't a doom post for NVDA the company, but its stock price is riding a knife's edge. Any margin or growth contraction will not be a good day for their stock or the S&P.
Nvidia has everything they need to build the most advanced GPU chip in the world and mass-produce it.
Everything.
They can easily just do this for more optimized chips.
"easily" in sense of that wouldn't require that much investment. Nvidia knows how to invest and has done this for a long time. Their Ominiverse or robots platform isaac are all epxensive. Nvidia has 10x more software engineers than AMD
Making the hardware is actually the easy part. Everyone and their uncle who had some cash has tried by now: Microsoft, Meta, Tesla, Huawei, Amazon, Intel - the list goes on and on. But Nvidia is not a chip company. Huang himself said they are mostly a software company, and that is how they were able to build a gigantic moat: no one else has even come close on the software side. Google is the only one who has had some success there, because they too have spent tons of money and time on software refinement by now, while all the other chips vanished into obscurity.
Are you saying that Google, Meta, Amazon, etc. can't do software? It's the bread and butter of these companies. The CUDA moat is important for holding off the likes of AMD, but software for hardware like TPUs, aimed at internal use or at other big software makers, is not a big hurdle.
Of course Huang will lean on the software being key because he sees the hardware competition catching up.
Huang said that many years ago, long before ChatGPT or the current AI hype were a thing. In that interview he said their costs for software R&D and support equal or even exceed those on the hardware side. They've also been hiring top SWE talent for almost two decades now. None of the other companies had spent anywhere close to this much time and money on GPU software, at least until LLMs became insanely popular. So I'd be surprised to see them catch up anytime soon.
> What's to stop them from entering the TPU race with their $5 trillion valuation?
Valuation isn’t available money. They'd have to raise more capital to enter the TPU race, in an investment environment that is probably tighter for them now, since the money they have already raised (which that valuation is based on) is already needed as runway for what they are doing today.
That is.... actually a seriously meaty article from a blog I've never heard of. Thanks for the pointer.
This article about them got published just yesterday... https://news.ycombinator.com/item?id=46124883
There's a lot of misleading information in what they publish, plagiarism, and, I believe, some information that wouldn't be possible to get without breaking NDAs.
Dylan Patel joined Dwarkesh recently to interview Satya Nadella: https://www.dwarkesh.com/p/satya-nadella-2
And this is relevant how? That interview is 1.5 hours, not something you just casually drop a link to and say "here, listen to this to even understand what point I was trying to make"
Sorry, this was meant to be a reply to this comment: https://news.ycombinator.com/item?id=46127942
I was trying to make the point that SemiAnalysis is semi-famous.
I have a few lines of "download subtitles with yt-dlp", "remove the VTT crap", and "shove it into an LLM with a summarization prompt and/or my question appended" (roughly the sketch below), but I mostly use Gemini for that now. (And I use it for basically nothing else, oddly enough. They just have a monopoly on access to YouTube transcripts ;)
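For reference, the whole pipeline fits in a short script. A rough sketch in Python (the output filename, subtitle language, and prompt are illustrative assumptions; `llm` here is Simon Willison's CLI tool, which combines piped stdin with the prompt):

    import re
    import subprocess
    import sys

    url = sys.argv[1]

    # 1. download only the auto-generated English subtitles as VTT
    subprocess.run([
        "yt-dlp", "--skip-download", "--write-auto-subs",
        "--sub-langs", "en", "--sub-format", "vtt",
        "-o", "transcript", url,
    ], check=True)

    # 2. remove the VTT crap: header, timestamps, inline cue tags, repeated lines
    lines, seen = [], set()
    for line in open("transcript.en.vtt", encoding="utf-8"):
        line = re.sub(r"<[^>]*>", "", line).strip()
        if not line or "-->" in line or line.startswith(("WEBVTT", "Kind:", "Language:")):
            continue
        if line not in seen:  # auto-subs repeat most lines as they scroll
            seen.add(line)
            lines.append(line)

    # 3. shove it into llm with a summarization prompt
    subprocess.run(["llm", "Summarize this transcript:"],
                   input="\n".join(lines), text=True, check=True)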
The SemiAnalysis article that you linked to stated:
"OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome."
Given the overall quality of the article, that is an uncharacteristically convoluted sentence. At the risk of stating the obvious, "that was broadly deployed" (or not) is contingent on many factors, most of which are not of the GPU vs. TPU technical variety.