RossBencina 10 hours ago

The SemiAnalysis article that you linked to stated:

"OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome."

Given the overall quality of the article, that is an uncharacteristically convoluted sentence. At the risk of stating the obvious, "that was broadly deployed" (or not) is contingent on many factors, most of which are not of the GPU vs. TPU technical variety.

  • alecco 2 hours ago

    My reading between the lines is that OpenAI's "GPT-5" is really a GPT-4 generation model, which is consistent with it being unimpressive. Not the leap forward Altman promised.

  • nbardy 7 hours ago

    This is misleading. They had 4.5, which was a new scaled-up training run. It was a huge model and only served to Pro users, but the biggest models are always used as teacher models for smaller ones. That's how you do distillation. It would be stupid not to use the biggest model you have for distillation, and a waste, since they already have the weights.

    They would have taken some time to calculate the efficiency gains of pretraining vs. RL, resumed GPT-4.5 training for whatever budget made sense, and then spent the rest on RL.

    Sure, they chose not to serve the large base models anymore for cost reasons.

    But I’d guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model scale, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized, and make little sense to serve at inference without distillation.

    In RL the efficiency is very different, because you have to run inference on the model to draw online samples. So smaller models start to make more sense to scale.

    Big model => distill => RL

    Makes the most theoretical sense nowadays for spending the training budget efficiently.

    So they already did train a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could return to scaling if the returns justified it.
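    For what it's worth, here's a minimal sketch of the distillation step in that recipe (PyTorch-style; the temperature and mixing weight are just illustrative, not anything any lab has published):

        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels,
                              temperature=2.0, alpha=0.5):
            # Soft targets: push the student's distribution toward the
            # (frozen) teacher's, both softened by the temperature.
            soft = F.kl_div(
                F.log_softmax(student_logits / temperature, dim=-1),
                F.softmax(teacher_logits / temperature, dim=-1),
                reduction="batchmean",
            ) * (temperature ** 2)
            # Hard targets: ordinary next-token cross-entropy on the data.
            hard = F.cross_entropy(
                student_logits.view(-1, student_logits.size(-1)),
                labels.view(-1),
            )
            return alpha * soft + (1 - alpha) * hard

    The RL stage then runs on the small distilled student, where drawing online samples is cheap.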

    • barrell 7 minutes ago

      My understanding of 4.5 was that it was released long, long after the initial training run finished. It also had an older knowledge cutoff than the newer 4o models.

binkHN 9 hours ago

This is a really great breakdown. With TPUs seemingly more efficient and costing less overall, how does this play for Nvidia? What's to stop them from entering the TPU race with their $5 trillion valuation?

  • matwood 4 hours ago

    As others mentioned, $5T isn't money available to NVDA. It could leverage that valuation to buy a TPU company in an all-stock deal, though.

    The bigger issue is that entering a 'race' implies a race to the bottom.

    I've noted this before, but one of NVDA's biggest risks is that its primary customers are also technical, also make hardware, also have money, and clearly see NVDA's margins (70% gross!, 50%+ profit) as something they want to eliminate. Google was first to get there (not a surprise), but Meta is also working on its own hardware, along with Amazon.

    This isn't a doom post for NVDA the company, but its stock price is riding a knife's edge. Any margin or growth contraction will not be a good day for their stock or the S&P.

    • Glemkloksdjf 29 minutes ago

      Nvidia has everything it needs to build the most advanced GPU chip in the world and mass-produce it.

      Everything.

      They can easily just do this for more optimized chips.

      "easily" in sense of that wouldn't require that much investment. Nvidia knows how to invest and has done this for a long time. Their Ominiverse or robots platform isaac are all epxensive. Nvidia has 10x more software engineers than AMD

    • sigmoid10 4 hours ago

      Making the hardware is actually the easy part. Everyone and their uncle who had some cash has tried by now: Microsoft, Meta, Tesla, Huawei, Amazon, Intel - the list goes on and on. But Nvidia is not a chip company. Huang himself said they are mostly a software company, and that is how they were able to build a gigantic moat, because no one else has even come close on the software side. Google is the only one that has had some success there, because they have also spent tons of money and time on software refinement by now, while all the other chips vanished into obscurity.

      • matwood 3 hours ago

        Are you saying that Google, Meta, Amazon, etc. can't do software? It's the bread and butter of these companies. The CUDA moat is important for holding off the likes of AMD, but hardware like TPUs, built for internal use or for other big software makers, is not a big hurdle.

        Of course Huang will lean on the software being key because he sees the hardware competition catching up.

        • sigmoid10 32 minutes ago

          Huang said that many years ago, long before ChatGPT or the current AI hype were a thing. In that interview he said their costs for software R&D and support were equal to or even bigger than those on the hardware side. They've also been hiring top SWE talent for almost two decades now. None of the other companies have spent even close to this much time and money on GPU software, at least until LLMs became insanely popular. So I'd be surprised to see them catch up anytime soon.

      • sanjayjc 3 hours ago

        Genuine question: given LLMs' inexorable commoditization of software, how soon before NVDA's CUDA moat is breached too? Is CUDA somehow fundamentally different from other kinds of software or firmware?

  • dragonwriter 5 hours ago

    > What's to stop them from entering the TPU race with their $5 trillion valuation?

    Valuation isn’t available money. To enter the TPU race they'd have to raise more money in the current, probably tighter-for-them, investment environment, since the money they have already raised, which that valuation is based on, is already needed to provide runway for what they are already doing.

  • sysguest 6 hours ago

    $5 trillion valuation doesn't mean it has $5 trillion cash in pocket -- so "it depends"

CamperBob2 13 hours ago

That is... actually a seriously meaty article from a blog I've never heard of. Thanks for the pointer.

  • seatac76 12 hours ago

    SemiAnalysis is great; they typically cover semiconductors, but their reporting is top notch.

    • lanstin 10 hours ago

      Wow, that was a good article. So much detail, from the financials to the optical links used to build various data-flow topologies. Makes me less aghast at the $10M salaries for the masters of these techniques.

  • ipnon 3 hours ago

    Dylan Patel founded SemiAnalysis, and he did a great interview with Satya Nadella on Dwarkesh Patel's podcast.

rahimnathwani 11 hours ago

Dylan Patel joined Dwarkesh recently to interview Satya Nadella: https://www.dwarkesh.com/p/satya-nadella-2

  • embedding-shape 10 hours ago

    And this is relevant how? That interview is 1.5 hours, not something you just casually drop a link to and say "here, listen to this to even understand what point I was trying to make".

    • kovezd 9 hours ago

      You can now ask Gemini about a video. Very useful!

      • andai 8 hours ago

        I have a few lines of "download subtitles with yt-dlp", "remove the VTT crap", and "shove it into llm with a summarization prompt and/or my question appended", but I mostly use Gemini for that now. (And I use it for basically nothing else, oddly enough. They just have the monopoly on access to YouTube transcripts ;)
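        For anyone who wants the non-Gemini version, roughly what those few lines look like in Python (the yt-dlp flags are real, but the subtitle filename and the VTT cleanup are rough guesses; the prompt is just printed so you can pipe it into whatever model you like):

            import re, subprocess, sys
            from pathlib import Path

            url = sys.argv[1]
            question = sys.argv[2] if len(sys.argv) > 2 else "Summarize this video."

            # Grab only the auto-generated English subtitles, no video.
            subprocess.run(["yt-dlp", "--skip-download", "--write-auto-subs",
                            "--sub-langs", "en", "--sub-format", "vtt",
                            "-o", "subs", url], check=True)

            # Strip the VTT crap: header, timestamps, inline tags, repeated rolling lines.
            lines = []
            for line in Path("subs.en.vtt").read_text().splitlines():
                if line.startswith("WEBVTT") or "-->" in line or not line.strip():
                    continue
                lines.append(re.sub(r"<[^>]+>", "", line).strip())
            transcript = "\n".join(dict.fromkeys(lines))  # crude order-preserving dedup

            print(f"{question}\n\n{transcript}")  # pipe this into your LLM of choice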
