airhangerf15 15 days ago

An H100 is a $20k USD card and has 80GB of vRAM. Imagine a 2U rack server with $100k of these cards in it. Now imagine an entire rack of these things, plus all the other components (CPUs, RAM, passive cooling or water cooling) and you're talking $1 million per rack, not including the costs to run them or the engineers needed to maintain them. Even the "cheaper"

I don't think people realize the size of these compute units.

When the AI bubble pops is when you're likely to be able to realistically run good local models. I imagine some of these $100k servers going for $3k on eBay in 10 years, and a lot of electricians being asked to install new 240v connectors in makeshift server rooms or garages.
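
Rough napkin math on those numbers (cards per server and rack density are assumptions here; list prices vary a lot):

    h100_price_usd = 20_000
    h100_vram_gb = 80

    cards_per_2u = 5          # assumed: ~$100k of cards per 2U box
    servers_per_rack = 10     # assumed density, leaving room for switches and PDUs

    gpu_cost_per_server = cards_per_2u * h100_price_usd           # ~$100k
    gpu_cost_per_rack = servers_per_rack * gpu_cost_per_server    # ~$1M before CPUs, RAM, cooling
    vram_per_rack_gb = servers_per_rack * cards_per_2u * h100_vram_gb

    print(f"${gpu_cost_per_rack:,} in GPUs, {vram_per_rack_gb} GB of VRAM per rack")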

semi-extrinsic 15 days ago

What do you mean 10 years?

You can pick up a DGX-1 on eBay right now for less than $10k. 256 GB vRAM (HBM2, no less), NVLink capability, 512 GB RAM, 40 CPU cores, 8 TB SSD, 100 Gbit HBAs. Equivalent non-Nvidia branded machines are around $6k.

They are heavy, noisy like you would not believe, and a single one just about maxes out a 16A 240V circuit. Which also means it produces 13 000 BTU/hr of waste heat.

  • kj4ips 15 days ago

    Fair warning: the BMCs on those suck so bad, and the firmware bundles are painful, since you need a working nvidia-specific container runtime to apply them, which you might not be able to get up and running because of a firmware bug causing almost all the ram to be presented as nonvolatile.

    • iJohnDoe 14 days ago

      Are there better paths you would suggest? Any hardware people have reported better luck with?

      • kj4ips 14 days ago

        Honestly, unless you *really* need nvlink/ib (meaning that copies and pcie trips are your bottleneck), you may do better with whatever commodity system with sufficient lanes, slots, and CFM is available at a good price.

  • ksherlock 15 days ago

    It's not waste heat if you only run it in the winter.

    • hdgvhicv 15 days ago

      Only if you ignore that both gas furnaces and heat pumps are more efficient than resistive loads.

      • tgma 15 days ago

        Heat pump sure, but how is gas furnace more efficient than resistive load inside the house? Do you mean more economical rather than more efficient (due to gas being much cheaper/unit of energy)?

      • Tade0 15 days ago

        I'm in the market for an oven right now, and the one I'll probably be getting runs on 230V/16A.

        At 90°C you can do sous vide, so basically use that waste heat entirely.

        For such temperatures you'd need a CO2 heat pump, which is still expensive. I don't know about gas, as I don't even have a line to my place.

  • eulgro 15 days ago

    > 13 000 BTU/hr

    In sane units: 3.8 kW

    • andy99 15 days ago

      You mean 1.083 tons of refrigeration

    • Skunkleton 15 days ago

      > In sane units: 3.8 kW

      5.1 Horsepower

      • amy214 15 days ago

        > > In sane units: 3.8 kW

        > 5.1 Horsepower

        0-60 in 1.8 seconds

        • oblio 14 days ago

          Again, in sane units:

          0-100 in 1.92 seconds

      • _kb 15 days ago

        3.8850 poncelet

    • markdown 15 days ago

      How many football fields of power?

    • semi-extrinsic 14 days ago

      The choice of BTU/hr was firmly tongue in cheek for our American friends.

  • quickthrowman 15 days ago

    You’ll need (2) 240V 20A 2P breakers, one for the server and one for the 1-ton mini-split to remove the heat ;)

    • Dylan16807 15 days ago

      Matching AC would only need 1/4 the power, right? If you don't already have a method to remove heat.

      • quickthrowman 15 days ago

        Cooling BTUs already take the coefficient of performance of the vapor-compression cycle into account. 4 W of heat removed for each 1 W of input power is around the max COP for an air-cooled condenser, but adding an evaporative cooling tower can raise that up to ~7.

        I just looked at a spec sheet for a 230V single-phase 12k BTU mini-split, and the minimum circuit ampacity was 3A for the air handler and 12A for the condenser. Add those together for 15A, divide by .8 to get 18.75A, and the next size up is 20A. Minimum circuit ampacity is (roughly) the sum of the full-load amps of the motor(s) inside the piece of equipment times 1.25; it determines the conductor size required to power the equipment.

        So the condensing unit likely draws ~9.5-10A max and the air handler around ~2.4A, and both will have variable speed motors that would probably only need about half of that to remove 12k BTU of heat, so ~5-6A or thereabouts should do it, which is around 1/3rd of the 16A server, or a COP of 3.
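
        A minimal sketch of that sizing arithmetic in code (spec-sheet MCA values from above; the divide-by-0.8 step is the usual continuous-load rule of thumb, not a full NEC calculation):

            def next_breaker_size(amps, standard_sizes=(15, 20, 25, 30, 40, 50)):
                # Smallest standard breaker at or above the required ampacity.
                return next(size for size in standard_sizes if size >= amps)

            air_handler_mca = 3.0    # amps, from the spec sheet above
            condenser_mca = 12.0     # amps

            required = (air_handler_mca + condenser_mca) / 0.8   # 15 A -> 18.75 A
            print(next_breaker_size(required))                   # -> 20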

    • Scoundreller 15 days ago

      Just air freight them from 60 degrees North to 60 degrees South and vice versa every 6 months.

    • kelnos 15 days ago

      Well, get a heat pump with a good COP of 3 or more, and you won't need quite as much power ;)

      • [removed] 15 days ago
        [deleted]
  • xtiansimon 14 days ago

    > “They are heavy, noisy like you would not believe, … produces … waste heat.”

    Haha. I bought a 20-year-old IBM server off eBay for a song. It was fun for a minute. It soon became a doorstop and I sold it as pickup-only on eBay for $20. Beast. Never again will I have one in my home.

    • yencabulator 14 days ago

      That's about the era when my company was an IBM reseller. Once I was kneeling behind 8x1U servers starting up and all the fans went to max speed for 3 seconds. Never put rackmount hardware in a room that is near anything living.

    • guenthert 14 days ago

      Get an AS400. Those were actually expected to be installed in an office, rather than a server room. Might still be perceived as loud at home, but won't be deafening and probably not louder than some gaming rigs.

  • CamperBob2 15 days ago

    Are you talking about the guy in Temecula running two different auctions with some of the same photos (356878140643 and 357146508609, both showing a missing heat sink)? Interesting, but seems sketchy.

    How useful is this Tesla-era hardware on current workloads? If you tried to run the full DeepSeek R1 model on it at (say) 4-bit quantization, any idea what kind of TTFT and TPS figures might be expected?

    • oceanplexian 15 days ago

      I can't speak to the Tesla stuff, but I run an Epyc 7713 with a single 3090, and by creatively splitting the model between the GPU and 8 channels of DDR4 I can do about 9 tokens per second on a q4 quant.

      • CamperBob2 15 days ago

        Impressive. Is that a distillation, or the real thing?

  • nulltype 10 days ago

    > What do you mean 10 years?

    Didn’t the DGX-1 come out 9 years ago?

invaliduser 15 days ago

Even if the AI bubble does not pop, your prediction about those servers being available on eBay in 10 years will likely be true, because some datacenters will simply upgrade their hardware and resell their old ones to third parties.

  • potatolicious 15 days ago

    Would anybody buy the hardware though?

    Sure, datacenters will get rid of the hardware - but only because it's no longer commercially profitable to run them, presumably because compute demands have eclipsed their abilities.

    It's kind of like buying a used GeForce 980Ti in 2025. Would anyone buy them and run them besides out of nostalgia or curiosity? Just the power draw makes them uneconomical to run.

    Much more likely every single H100 that exists today becomes e-waste in a few years. If you have need for H100-level compute you'd be able to buy it in the form of new hardware for way less money and consuming way less power.

    For example, if you actually wanted 980Ti-level compute in a desktop today you can just buy an RTX 5050, which is ~50% faster, consumes half the power, and can be had for $250 brand new. Oh, and is well-supported by modern software stacks.

    • CBarkleyU 15 days ago

      Off topic, but I bought my (still in active use) 980ti literally 9 years ago for that price. I know, I know, inflation and stuff, but I really expected more than 50% bang for my buck after 9 whole years…

    • nucleardog 15 days ago

      > Sure, datacenters will get rid of the hardware - but only because it's no longer commercially profitable to run them, presumably because compute demands have eclipsed their abilities.

      I think the existence of a pretty large secondary market for enterprise servers and such kind of shows that this won't be the case.

      Sure, if you're AWS and what you're selling _is_ raw compute, then couple-of-generations-old hardware may not be sufficiently profitable for you anymore... but there are a lot of other places with different requirements or higher margins where that hardware could still be applied profitably.

      Even if they're only running models a generation or two out of date, there are a lot of use cases today, with today's models, that will continue to work fine going forward.

      And that's assuming it doesn't get replaced for some other reason that only applies when you're trying to sell compute at scale. A small uptick in the failure rate may make a big dent at OpenAI but not for a company that's only running 8 cards in a rack somewhere and has a few spares on hand. A small increase in energy efficiency might offset the capital outlay to upgrade at OpenAI, but not for the company that's only running 8 cards.

      I think there's still plenty of room in the market in places where running inference "at cost" would be profitable that are largely untapped right now because we haven't had a bunch of this hardware hit the market at a lower cost yet.

      • [removed] 15 days ago
        [deleted]
    • nullc 14 days ago

      I have around a thousand Broadwell cores in 4-socket systems that I got for ~nothing from these sorts of sources... pretty useful. (I mean, I guess literally nothing, since I extracted the storage backplanes and sold them for more than the systems cost me.) I try to run tasks during low-power-cost hours on zen3/4 unless it's gonna take weeks just running on those, and if it will, I crank up the rest of the cores.

      And 40 P40 GPUs that cost very little, which are a bit slow, but with 24GB per GPU they're pretty useful for memory-bandwidth-bound tasks (and not horribly noncompetitive in terms of watts per TB/s).

      Given highly variable time-of-day power pricing, it's also pretty useful to just get 2x the computing power (at low cost) and run it during the low-cost periods.

      So I think datacenter scrap is pretty useful.

    • mindslight 15 days ago

      It's interesting to think about scenarios where that hardware would get used only part of the time, like say when the sun is shining and/or when dwelling heat is needed. The biggest sticking point would seem to be all of the capex for connecting them to do something useful. It's a shame that PLX switch chips are so expensive.

    • airhangerf15 15 days ago

    The 5050 doesn't support 32-bit PhysX, so a bunch of games would be missing a ton of stuff. You'd still need the 980 running alongside it for older PhysX games, because Nvidia.

  • belter 15 days ago

    Except their insane electricity demands will still be the same, meaning nobody will buy them. You have plenty of SPARC servers on eBay.

    • cicloid 15 days ago

      There is also a community of users known for not making sane financial decisions and keeping older technologies working in their basements.

      • dijit 15 days ago

        But we are few, and fewer still who will go for high power consumption devices with esoteric cooling requirements that generate a lot of noise.

  • mattmanser 15 days ago

    Someone's take on AI was that we're collectively investing billions in data centers that will be utterly worthless in 10 years.

    Unlike the investments in railways or telephone cables or roads or any other sort of infrastructure, this investment has a very short lifespan.

    Their point was that whatever your take on AI, the present investment in data centres is a ridiculous waste and will always end up as a huge net loss compared to most other investments our societies could spend it on.

    Maybe we'll invent AGI and they'll be proven wrong, as the data centres will pay for themselves many times over, but I suspect they'll ultimately be proved right and it'll all end up as landfill.

    • toast0 15 days ago

      The servers may well be worthless (or at least worth a lot less), but that's been pretty much true for a long time. Not many people want to run on 10-year-old servers (although I pay $30/month for a dedicated server that's a dual Xeon L5640 or something like that, which is about 15 years old).

      The servers will be replaced, the networking equipment will be replaced. The building will still be useful, the fiber that was pulled to internet exchanges/etc will still be useful, the wiring to the electric utility will still be useful (although I've certainly heard stories of datacenters where much of the floor space is unusable, because power density of racks has increased and the power distribution is maxed out)

      • hattmall 15 days ago

        I have a server in my office from 2009 that's still far more economical to run than buying any sort of cloud compute. By at least an order of magnitude.

    • bespokedevelopr 15 days ago

      If it is all a waste and a bubble, I wonder what the long-term impact will be of the infrastructure upgrades around these DCs. A lot of new HV wires and substations are being built out. Cities are expanding around clusters of DCs. Are they setting themselves up for a new rust belt?

      • abeyer 15 days ago

        Or early provisioning for massively expanded electric transit and EV charging infrastructure, perhaps.

      • thenthenthen 13 days ago

        There are a lot of examples of former industrial sites (rust belts) that have been redeveloped into data center sites, because the infra is already partly there and the setting can be favourable politically, environmentally, or geographically. For example, many old industrial sites relied on water for cooling and transportation; this water can now be used to cool data centers. I think you are onto something though, if you depart from the history of these places and extrapolate into the future.

      • hirvi74 15 days ago

        Maybe the DCs could be turned into some mean cloud gaming servers?

    • dortlick 15 days ago

      Sure, but what about the collective investment in smartphones, digital cameras, laptops, even cars? Not much modern technology is useful and practical after 10 years, let alone 20. AI is probably moving a little faster than normal, but technology depreciation is not limited to AI.

    • gscott 14 days ago

      If a coal-powered electric plant is next to the data center, you might be able to get electricity cheap enough to keep it going.

      Datacenters could also go into the business of building personal PCs or workstations using the older NVIDIA cards and selling them.

    • jonplackett 15 days ago

      They probably are right, but a counter-argument could be that people thought going to the moon was pointless and insanely expensive, yet the technology to put stuff in space and have GPS and comms satellites probably paid that back 100x.

      • vl 15 days ago

        Reality is that we don’t know how much of a trope this statement is.

        I think we would get all this technology without going to the moon or Space Shuttle program. GPS, for example, was developed for military applications initially.

      • DaiPlusPlus 15 days ago

        I don’t mean to invalidate your point (about genuine value arising from innovations originating from the Apollo program), but GPS and comms satellites (and heck, the Internet) are all products of nuclear weapons programs rather than civilian space exploration programs (ditto the Space Shuttle, and I could go on…).

        • CamperBob2 15 days ago

          Yes, and no. The people working on GPS paid very close attention to the papers from JPL researchers describing their timing and ranging techniques for both Apollo and deep-space probes. There was more cross-pollination than meets the eye.

      • somenameforme 14 days ago

        It's not that going to the Moon was pointless, but stopping after we'd done little more than plant a flag was. Wernher von Braun was the head architect of the Apollo Program, and the Moon was intended as little more than a stepping stone towards setting up a permanent colony on Mars. Incidentally, this is also the technical and ideological foundation of what would become the Space Shuttle and ISS, which were both also supposed to be little more than small-scale tools on this mission, as opposed to ends in and of themselves.

        Imagine if Columbus verified that the New World existed, planted a flag, came back - and then everything was cancelled. Or similarly for literally any colonization effort ever. That was the one downside of the space race - what we did was completely nonsensical, and made sense only because of the context of it being a 'race' and politicians having no greater vision than beyond the tip of their nose.

    • pbh101 15 days ago

      This isn’t my original take but if it results in more power buildout, especially restarting nuclear in the US, that’s an investment that would have staying power.

    • mensetmanusman 15 days ago

      Utterly? Moore's law per power requirement is dead, and lower-power units can run electric heating for small towns!

  • DecentShoes 14 days ago

    This seems likely. Blizzard even sold off old World of Warcraft servers. You can still get them on eBay.

torginus 15 days ago

My personal sneaking suspicion is that publicly offered models are using way less compute than thought. In modern mixture-of-experts models, you can do top-k routing, where only some experts are evaluated per token, meaning even SOTA models aren't using much more compute than a 70-80b non-MoE model.
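
A minimal sketch of that top-k routing idea (expert count, shapes, and the toy 2-layer FFN experts are made up for illustration; real routers are trained, not random):

    import numpy as np

    def moe_layer(x, gate_w, experts, k=2):
        # x: (d,) token activation; gate_w: (d, n_experts); experts: list of (W1, W2)
        logits = x @ gate_w                        # router score per expert
        top_k = np.argsort(logits)[-k:]            # indices of the k best experts
        weights = np.exp(logits[top_k] - logits[top_k].max())
        weights /= weights.sum()                   # softmax over the selected experts only

        out = np.zeros_like(x)
        for w, idx in zip(weights, top_k):         # only k of n_experts are ever evaluated
            W1, W2 = experts[idx]
            out += w * (np.maximum(x @ W1, 0.0) @ W2)   # toy 2-layer FFN "expert"
        return out

    d, n_experts, k = 64, 16, 2
    rng = np.random.default_rng(0)
    experts = [(rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d)))
               for _ in range(n_experts)]
    y = moe_layer(rng.normal(size=d), rng.normal(size=(d, n_experts)), experts, k)
    # Per-token FFN compute scales with k/n_experts (here 1/8) of the total parameter count.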

ActorNightly 15 days ago

To piggyback on this, at the enterprise level in the modern age, the question is really not "how are we going to serve all these users"; it comes down to the fact that investors believe they will eventually see a return on investment, and will then pay whatever is needed to get the infra.

Even if you didn't have optimizations involved in terms of job scheduling, they would just build as many warehouses as necessary filled with as many racks as necessary to serve the required user base.

RagnarD 14 days ago

An RTX 6000 Pro (NVIDIA Blackwell GPU) has 96GB of VRAM and can be had for around $7700 currently (at least, the lowest price I've found.) It plugs into standard PC motherboard PCIe slots. The Max Q edition has slightly less performance but a max TDP of only 300W.

eitally 15 days ago

What I wonder is what this means for Coreweave, Lambda and the rest, who are essentially just renting out fleets of racks like this. Does it ultimately result in acquisition by a larger player? Severe loss of demand? Can they even sell enough to cover the capex costs?

  • cootsnuck 14 days ago

    It means they're likely going to be left holding a very expensive bag.

  • adw 15 days ago

    These are also depreciating assets.

[removed] 15 days ago
[deleted]
torginus 15 days ago

I wonder if it's feasible to hook up NAND flash with a high bandwidth link necessary for inference.

Each of these NAND chips has hundreds of dies of flash stacked inside, and they are all hooked up to the same data line, so only 1 of them can talk at a time, and they still achieve >1GB/s of bandwidth. If you could hook them up in parallel, you could have 100s of GB/s of bandwidth per chip.

  • potatolicious 15 days ago

    NAND is very, very slow relative to RAM, so you'd pay a huge performance penalty there. But maybe more importantly my impression is that memory contents mutate pretty heavily during inference (you're not just storing the fixed weights), so I'd be pretty concerned about NAND wear. Mutating a single bit on a NAND chip a million times over just results in a large pile of dead NAND chips.

    • torginus 15 days ago

      No, it's not slow - a single NAND chip in an SSD offers >1GB/s of bandwidth - inside the chip there are 100+ dies actually holding the data, but in SSDs only one of them is active when reading/writing.

      You could probably make special NAND chips where all of them can be active at the same time, which means you could get 100GB/s+ of bandwidth out of a single chip.

      This would be useless for data storage scenarios, but very useful when you have huge amounts of static data you need to read quickly.
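
      Quick back-of-the-envelope under those assumptions (die count and per-die bandwidth are rough guesses, not datasheet numbers):

          dies_per_package = 100         # stacked NAND dies in one package (assumed)
          per_die_bw_gbps = 1.0          # GB/s one die can deliver on its own (assumed)

          aggregate_gbps = dies_per_package * per_die_bw_gbps   # ~100 GB/s per package
          h100_hbm_gbps = 3000                                  # H100 HBM is ~3 TB/s

          print(f"{aggregate_gbps:.0f} GB/s per package; "
                f"~{h100_hbm_gbps / aggregate_gbps:.0f} packages to match one H100")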

      • slickytail 15 days ago

        The memory bandwidth on an H100 is 3TB/s, for reference. This number is the limiting factor in the size of modern LLMs. 100GB/s isn't even in the realm of viability.

neko_ranger 15 days ago

Four H100s in a 2U server didn't sound impressive, but that is accurate:

>A typical 1U or 2U server can accommodate 2-4 H100 PCIe GPUs, depending on the chassis design.

>In a 42U rack with 20x 2U servers (allowing space for switches and PDU), you could fit approximately 40-80 H100 PCIe GPUs.

  • michaelt 15 days ago

    Why stop at 80 H100s for a mere 6.4 terabytes of GPU memory?

    Supermicro will sell you a full rack loaded with servers [1] providing 13.4 TB of GPU memory.

    And with 132kW of power draw, you can heat an Olympic-sized swimming pool by 1°C every day with that rack alone. That's almost as much power consumption as 10 mid-sized cars cruising at 50 mph.

    [1] https://www.supermicro.com/en/products/system/gpu/48u/srs-gb...
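
    Sanity check on the swimming-pool figure, assuming a 2,500 m3 Olympic pool and that all 132 kW ends up as heat in the water:

        pool_volume_m3 = 2500              # minimum Olympic pool (50 x 25 x 2 m)
        water_heat_capacity = 4186         # J/(kg*K)
        mass_kg = pool_volume_m3 * 1000    # ~2.5 million kg of water

        rack_power_w = 132_000
        joules_per_day = rack_power_w * 86_400

        delta_t = joules_per_day / (mass_kg * water_heat_capacity)
        print(f"{delta_t:.2f} C per day")  # ~1.09 C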

  • jzymbaluk 15 days ago

    And the big hyperscaler cloud providers are building city-block sized data centers stuffed to the gills with these racks as far as the eye can see

tootie 14 days ago

Yeah, I think the crux of the issue is that ChatGPT is serving a huge number of users, including paid users, and is still running at a massive operating loss. They are spending truckloads of money on GPUs and selling access at a loss.

scarface_74 15 days ago

This isn’t like how Google was able to buy up dark fiber cheaply and use it.

From what I understand, this hardware has a high failure rate over the long term, especially because of the heat it generates.

shusaku 15 days ago

> When the AI bubble pops is when you're likely to be able to realistically run good local models.

After years of “AI is a bubble, and will pop when everyone realizes they’re useless plagiarism parrots” it’s nice to move to the “AI is a bubble, and will pop when it becomes completely open and democratized” phase

  • cootsnuck 14 days ago

    It's not even been 3 years. Give it time. The entire boom and bust of the dot-com bubble took 7 years.