RossBencina 10 hours ago

The SemiAnalysis article that you linked to stated:

"OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome."

Given the overall quality of the article, that is an uncharacteristically convoluted sentence. At the risk of stating the obvious, "that was broadly deployed" (or not) is contingent on many factors, most of which are not of the GPU vs. TPU technical variety.

  • alecco 2 hours ago

    My reading between the lines is that OpenAI's "GPT-5" is really a GPT-4 generation model, which is consistent with it being unimpressive. Not the leap forward Altman promised.

  • nbardy 7 hours ago

    This is misleading. They had 4.5, which was a new scaled-up training run. It was a huge model and only served to Pro users, but the biggest models are always used as teacher models for smaller ones. That's how you do distillation. It would be stupid not to use the biggest model you have for distillation, and a waste, since they already have the weights.

    They would have taken some time to calculate the efficiency gains of pretraining vs. RL, resumed GPT-4.5 training for whatever budget made sense, and then spent the rest on RL.

    Sure, they chose not to serve the large base models anymore for cost reasons.

    But I’d guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model scale, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized, and make little sense to serve at inference without distillation.

    In RL the efficiency is very different, because you have to run inference on the model to draw online samples. So smaller models start to make more sense to scale.

    Big model => distill => RL

    Makes the most theoretical sense nowadays for spending the training budget efficiently.

    So they already did train a big model, 4.5. Not using it would have been absurd, and they have a known recipe they could return to scaling if the returns justified it.
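    For what it's worth, here's a minimal sketch of the distillation step in that recipe (PyTorch-style; the temperature and mixing weight are just illustrative, not anything any lab has published):

        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels,
                              temperature=2.0, alpha=0.5):
            # Soft targets: push the student's distribution toward the
            # (frozen) teacher's, both softened by the temperature.
            soft = F.kl_div(
                F.log_softmax(student_logits / temperature, dim=-1),
                F.softmax(teacher_logits / temperature, dim=-1),
                reduction="batchmean",
            ) * (temperature ** 2)
            # Hard targets: ordinary next-token cross-entropy on the data.
            hard = F.cross_entropy(
                student_logits.view(-1, student_logits.size(-1)),
                labels.view(-1),
            )
            return alpha * soft + (1 - alpha) * hard

    The RL stage then runs on the small distilled student, where drawing online samples is cheap.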

    • barrell 7 minutes ago

      My understanding of 4.5 was that it was released long, long after the initial training run finished. It also had an older knowledge cutoff than the newer 4o models.

binkHN 9 hours ago

This is a really great breakdown. With TPUs seemingly more efficient and costing less overall, how does this play for Nvidia? What's to stop them from entering the TPU race with their $5 trillion valuation?

  • matwood 4 hours ago

    As others mentioned, $5T isn't money available to NVDA. It could leverage that valuation to buy a TPU company in an all-stock deal, though.

    The bigger issue is that entering a 'race' implies a race to the bottom.

    I've noted this before, but one of NVDA's biggest risks is that its primary customers are also technical, also make hardware, also have money, and clearly see NVDA's margins (70% gross!, 50%+ profit) as something they want to eliminate. Google was first to get there (not a surprise), but Meta is also working on its own hardware, along with Amazon.

    This isn't a doom post for NVDA the company, but its stock price is riding a knife's edge. Any margin or growth contraction will not be a good day for their stock or the S&P.

    • Glemkloksdjf 29 minutes ago

      Nvidia has everything it needs to build the most advanced GPU chip in the world and mass-produce it.

      Everything.

      They can easily just do this for more optimized chips.

      "easily" in sense of that wouldn't require that much investment. Nvidia knows how to invest and has done this for a long time. Their Ominiverse or robots platform isaac are all epxensive. Nvidia has 10x more software engineers than AMD

    • sigmoid10 4 hours ago

      Making the hardware is actually the easy part. Everyone and their uncle who had some cash has tried by now: Microsoft, Meta, Tesla, Huawei, Amazon, Intel - the list goes on and on. But Nvidia is not a chip company. Huang himself said they are mostly a software company, and that is how they were able to build a gigantic moat, because no one else has even come close on the software side. Google is the only one that has had some success there, because they have also spent tons of money and time on software refinement by now, while all the other chips vanished into obscurity.

      • matwood 3 hours ago

        Are you saying that Google, Meta, Amazon, etc. can't do software? It's the bread and butter of these companies. The CUDA moat is important for holding off the likes of AMD, but hardware like TPUs, built for internal use or for other big software makers, is not a big hurdle.

        Of course Huang will lean on the software being key because he sees the hardware competition catching up.

        • sigmoid10 32 minutes ago

          Huang said that many years ago, long before ChatGPT or the current AI hype were a thing. In that interview he said their costs for software R&D and support were equal to or even bigger than those on the hardware side. They've also been hiring top SWE talent for almost two decades now. None of the other companies have spent even close to this much time and money on GPU software, at least until LLMs became insanely popular. So I'd be surprised to see them catch up anytime soon.

      • sanjayjc 3 hours ago

        Genuine question: given LLMs' inexorable commoditization of software, how soon before NVDA's CUDA moat is breached too? Is CUDA somehow fundamentally different from other kinds of software or firmware?

  • dragonwriter 5 hours ago

    > What's to stop them from entering the TPU race with their $5 trillion valuation?

    Valuation isn’t available money. To enter the TPU race they'd have to raise more money in the current, probably tighter-for-them, investment environment, since the money they have already raised, which that valuation is based on, is already needed to provide runway for what they are already doing.

  • sysguest 6 hours ago

    $5 trillion valuation doesn't mean it has $5 trillion cash in pocket -- so "it depends"

CamperBob2 13 hours ago

That is... actually a seriously meaty article from a blog I've never heard of. Thanks for the pointer.

  • seatac76 12 hours ago

    SemiAnalysis is great; they typically cover semiconductors, but their reporting is top notch.

    • lanstin 10 hours ago

      Wow, that was a good article. So much detail, from the financials to the optical links used to build various data-flow topologies. Makes me less aghast at the $10M salaries for the masters of these techniques.

  • ipnon 3 hours ago

    Dylan Patel founded SemiAnalysis, and he did a great interview with Satya Nadella on Dwarkesh Patel's podcast.

rahimnathwani 11 hours ago

Dylan Patel joined Dwarkesh recently to interview Satya Nadella: https://www.dwarkesh.com/p/satya-nadella-2

  • embedding-shape 10 hours ago

    And this is relevant how? That interview is 1.5 hours, not something you just casually drop a link to and say "here, listen to this to even understand what point I was trying to make".

    • kovezd 9 hours ago

      You can now ask Gemini about a video. Very useful!

      • andai 8 hours ago

        I have a few lines of "download subtitles with yt-dlp", "remove the VTT crap", and "shove it into llm with a summarization prompt and/or my question appended", but I mostly use Gemini for that now. (And I use it for basically nothing else, oddly enough. They just have the monopoly on access to YouTube transcripts ;)
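        For anyone who wants the non-Gemini version, roughly what those few lines look like in Python (the yt-dlp flags are real, but the subtitle filename and the VTT cleanup are rough guesses; the prompt is just printed so you can pipe it into whatever model you like):

            import re, subprocess, sys
            from pathlib import Path

            url = sys.argv[1]
            question = sys.argv[2] if len(sys.argv) > 2 else "Summarize this video."

            # Grab only the auto-generated English subtitles, no video.
            subprocess.run(["yt-dlp", "--skip-download", "--write-auto-subs",
                            "--sub-langs", "en", "--sub-format", "vtt",
                            "-o", "subs", url], check=True)

            # Strip the VTT crap: header, timestamps, inline tags, repeated rolling lines.
            lines = []
            for line in Path("subs.en.vtt").read_text().splitlines():
                if line.startswith("WEBVTT") or "-->" in line or not line.strip():
                    continue
                lines.append(re.sub(r"<[^>]+>", "", line).strip())
            transcript = "\n".join(dict.fromkeys(lines))  # crude order-preserving dedup

            print(f"{question}\n\n{transcript}")  # pipe this into your LLM of choice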
