zug_zug 2 days ago

Well props to them for continuing to improve, winning on cost-effectiveness, and continuing to publicly share their improvements. Hard not to root for them as a force to prevent an AI corporate monopoly/duopoly.

jstummbillig 2 days ago

How could we judge if anyone is "winning" on cost-effectiveness, when we don't know what everyone's profits/losses are?

  • tedivm 2 days ago

    If you're trying to build AI-based applications, you can and should compare the costs between vendor-based solutions and hosting open models on your own hardware.

    On the hardware side you can run some benchmarks on the hardware (or use other people's benchmarks) and get an idea of the tokens/second you can get from the machine. Normalize this for your usage pattern (and do your best to implement batch processing where you can, which will save you money with either approach) and you have a basic idea of how much it would cost per token.

    Then you compare that to the cost of something like GPT5, which is a bit simpler because the cost per (million) token is something you can grab off of a website.
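
    To make that concrete, here's a toy back-of-the-envelope comparison; the GPU cost, throughput, and API price below are placeholder numbers, not measurements, so substitute your own benchmarks:

      # Back-of-the-envelope: self-hosted cost per million tokens vs. an API price.
      # All numbers are illustrative placeholders, not real benchmarks.
      gpu_cost_per_hour = 2.50      # amortized hardware + power + hosting, $/hour
      tokens_per_second = 900       # measured throughput at your batch size

      self_hosted_per_m = gpu_cost_per_hour / (tokens_per_second * 3600) * 1_000_000
      api_price_per_m = 1.25        # published $ per 1M tokens for a hosted model

      print(f"self-hosted: ${self_hosted_per_m:.2f} per 1M tokens")
      print(f"api:         ${api_price_per_m:.2f} per 1M tokens")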

    You'd be surprised how much money running something like DeepSeek (or if you prefer a more established company, Qwen3) will save you over the cloud systems.

    That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

    • miki123211 a day ago

      > with your own hardware

      Or with somebody else's.

      If you don't have strict data residency requirements, and if you aren't doing this at an extremely large scale, doing it on somebody else's hardware makes much more economic sense.

      If you use MoE models (all modern >70B models are MoE), GPU utilization increases with batch size. If you don't have enough requests to keep GPUs properly fed 24/7, those GPUs will end up underutilized.

      Sometimes underutilization is okay, if your system needs to be airgapped for example, but that's not an economics discussion any more.

      Unlike e.g. video streaming workloads, LLMs can be hosted on the other side of the world from where the user is, and the difference is barely going to be noticeable. This means you can keep GPUs fed by bringing in workloads from other timezones when your cluster would otherwise be idle. Unless you're a large, worldwide organization, that is difficult to do if you're using your own hardware.

      • embedding-shape 20 hours ago

        > If you use MoE models (al modern >70B models are MoE), GPU utilization increases with batch size

        Isn't that true for any LLM, MoE or not? In fact, doesn't it apply to most things in ML: as long as it's possible to do batching at all, you can scale it up and utilize more of the GPU until you saturate some part of the process?

    • AlexCoventry a day ago

      Mixture-of-Experts models benefit from economies of scale, because they can process queries in parallel and expect different queries to hit different experts at a given layer. This leads to higher utilization of GPU resources. So unless your application is already getting a lot of use, you're probably under-utilizing your hardware.
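
      A toy way to see the economies-of-scale point: with top-k routing, a tiny batch leaves most experts loaded but idle, while a big batch reuses each activated expert for many tokens. The expert count and routing below are made-up illustrative numbers, not any particular model's configuration:

        import random

        NUM_EXPERTS, TOP_K = 64, 2   # illustrative MoE layer: 64 experts, top-2 routing

        def avg_experts_hit(batch_size, trials=200):
            # average number of distinct experts activated by one batch
            total = 0
            for _ in range(trials):
                hit = set()
                for _ in range(batch_size):
                    hit.update(random.sample(range(NUM_EXPERTS), TOP_K))
                total += len(hit)
            return total / trials

        for bs in (1, 4, 16, 64, 256):
            hit = avg_experts_hit(bs)
            # tokens per activated expert ~ how much work each loaded expert's weights do
            print(f"batch={bs:4d}  experts touched={hit:5.1f}  tokens/expert={bs * TOP_K / hit:.1f}")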

    • Muromec 2 days ago

      >That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

      What's cheap nowadays? I'm out of the loop. Does anything run on the integrated AMD Ryzen AI chips that come in Framework motherboards? Is under 1k American dollars cheap?

      • GTP a day ago

        Not really in the loop either, but when DeepSeek R1 was released, I stumbled upon this YouTube channel [1] that made local AI PC builds in the $1000-2000 range. But he doesn't always use GPUs; maybe the cheaper builds were CPU plus a lot of RAM, I don't remember.

        [1] https://youtube.com/@digitalspaceport?si=NrZL7MNu80vvAshx

    • chazeon a day ago

      Well, the seemingly cheap option comes with significantly degraded performance, particularly for agentic use. Have you tried replacing Claude Code with some locally deployed model, say, on a 4090 or 5090? I have. It is not usable.

      • nylonstrung a day ago

        Deepseek and Kimi both have great agentic performance

        When used with crush/opencode they are close to Claude performance.

        Nothing that runs on a 4090 would compete but Deepseek on openrouter is still 25x cheaper than claude

      • estsauver a day ago

        Well, those cards also have extremely limited VRAM and wouldn't be able to run anything in the ~70b parameter space. (Can you even run 30b?)

        Things get a lot easier at lower quantisation and higher parameter space, and there are a lot of people whose jobs for AI are "Extract sentiment from text" or "bin into one of these 5 categories" where that's probably fine.

      • elif a day ago

        Strictly speaking, you have not deployed any model on a 5090 because a 5090 card has never been produced.

        And without specifying your quantization level it's hard to know what you mean by "not usable"

        Anyway if you really wanted to try cheap distilled/quantized models locally you would be using used v100 Teslas and not 4 year old single chip gaming GPUs.

      • JosephjackJR 20 hours ago

        they took the already ridiculous v3.1 terminus model, added this new deepseek sparse attention thing, and suddenly it's doing 128k context at basically half the inference cost of the old version with no measurable drop in reasoning or multilingual quality. like, imo gold medal level math and code, 100+ languages, all while sipping tokens at 14 cents per million input. that's stupid cheap.

        the rl recipe they used this time also seems way more stable. no more endless repetition loops or random language switching you sometimes got with the earlier open models. it just works.

        what really got me is how fast the community moved. vllm support landed the same day, huggingface space was up in hours, and people are already fine-tuning it for agent stuff and long document reasoning. i've been playing with it locally and the speed jump on long prompts is night and day. feels like the gap to the closed frontier models just shrank again. anyone else tried it yet?

    • kmacdough 20 hours ago

      Furthermore, paid models are heavily subsidized by bullish investors playing for monopoly. So that tips the scales further towards Deepseek.

    • qeternity 2 days ago

      > DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.

      Uh, Deepseek will not (unless you are referring to one of their older R1 finetuned variants). But any flagship Deepseek model will require 16x A100/H100+ with NVL in FP8.

  • ericskiff 2 days ago

    I believe this was a statement on cost per token to us as consumers of the service

    • moffkalast a day ago

      Training cost-effectiveness doesn't matter for open models since someone else ate the cost. In this case, Chinese taxpayers.

      • KvanteKat a day ago

        Deepseek is a private corporation funded by a hedge fund (High-Flyer). I doubt much public money was spent by the Chinese state on this. Like with LLMs in the US, the people paying for it so far are mainly investors who are betting on a return in the long to medium term.

        • boringg 20 hours ago

          Do you actually believe what you just wrote or are you trolling? One version at least has a foot planted in reality. The other one well...

  • deaux 2 days ago

    We can judge on inference cost because we do know what those are for open-weights models as there are a dozen independent providers that host these models and price them according to respective inference cost.

    We can't judge on training cost, that's true.

  • mzl a day ago

    Apart from measuring prices from venture-backed providers, which may or may not correlate with cost-effectiveness, I think the measures of intelligence per watt and intelligence per joule from https://arxiv.org/abs/2511.07885 are very interesting.

  • stingraycharles 2 days ago

    You can use tokens/sec on something like AWS Bedrock (which hosts both open and closed models) as a proxy for “costs per token” for the closed providers.

  • badmonster a day ago

    Good point. Could usage patterns + inference costs give us proxy metrics? What would be a fair baseline?

  • rowanG077 2 days ago

    Well consumers care about the cost to them, and those we know. And deepseek is destroying everything in that department.

    • eru a day ago

      Yes. Though we don't know for sure whether that's because they actually have lower costs, or whether it's just the Chinese taxpayer being forced to serve us a treat.

      • chronogram a day ago

        Third party providers are still cheap though. The closed models are the ones where you can't see the real cost to running them.

srameshc 2 days ago

As much as I agree with your sentiment, I doubt the intention is singular.

  • energy123 a day ago

    It's like AMD open-sourcing FSR or Meta open-sourcing Llama. It's good for us, but it's nothing more than a situational and temporary alignment of self-interest with the public good. When the tables turn (they become the best instead of 4th best, or AMD develops the best upscaler, etc), the decision that aligns with self-interest will change, and people will start complaining that they've lost their moral compass.

    • orbital-decay a day ago

      >situational and temporary alignment of self-interest with the public good

      That's how it's supposed to work.

    • re-thc a day ago

      It's not. This isn't about competition in a company sense but sanctions and wider macro issues.

      • energy123 a day ago

        It's like it in the sense that it's done because it aligns with self-interest. Even if the nature of that self-interest differs.

  • twelvechairs 2 days ago

    The bar is incredibly low considering what OpenAI has done as a "not for profit"

    • kopirgan 2 days ago

      You need to get a bunch of accountants to agree on what counts as profit first...

      • komali2 a day ago

        Agree against their best interest, mind you!

  • echelon 2 days ago

    I don't care if this kills Google and OpenAI.

    I hope it does, though I'm doubtful because distribution is important. You can't beat "ChatGPT" as a brand in laypeople's minds (unless perhaps you give them a massive "Temu: Shop Like A Billionaire" commercial campaign).

    Closed source AI is almost by design morphing into an industrial, infrastructure-heavy rocket science that commoners can't keep up with. The companies pushing it are building an industry we can't participate or share in. They're cordoning off areas of tech and staking ground for themselves. It's placing a steep fence around tech.

    I hope every such closed source AI effort is met with equivalent open source and that the investments made into closed AI go to zero.

    The most likely outcome is that Google, OpenAI, and Anthropic win and every other "lab"-shaped company dies an expensive death. RunwayML spent hundreds of millions and they're barely noticeable now.

    These open source models hasten the deaths of the second tier also-ran companies. As much as I hope for dents in the big three, I'm doubtful.

    • raw_anon_1111 2 days ago

      I can’t think of a single company I’ve worked with as a consultant that I could convince to use DeepSeek because of its ties with China even if I explained that it was hosted on AWS and none of the information would go to China.

      Even when the technical people understood that, it would be too much of a political quagmire within their company when it became known to the higher ups. It just isn’t worth the political capital.

      They would feel the same way about using xAI or maybe even Facebook models.

      • StealthyStart 2 days ago

        This is the real reason. At the enterprise level, trust outweighs cost. My company hires agencies and consultants who give the same advice as our internal team. That's not to imply our internal team is wrong; rather, there's credibility in being able to shift the consequences of a decision if something goes wrong, and it's why companies keep hiring the same four consulting firms. It's trust, whether real or perceived.

      • tokioyoyo 2 days ago

        If the Chinese model becomes better than its competitors, these worries will suddenly disappear. Also, there are plenty of startups and enterprises running fine-tuned versions of different OS models.

      • deaux 2 days ago

        > Even when the technical people understood that

        I'm not sure if technical people who don't understand this deserve the moniker technical in this context.

      • nylonstrung a day ago

        The average person has been programmed to be distrustful of open source in general, thinking it is inferior quality or in service of some ulterior motive

      • register 2 days ago

        That might be the perspective of a US-based company. But there is also Europe, and basically it's a choice between Trump and China.

        • Muromec 2 days ago

          Europe has Mistral. It feels like governments that can do things without a fax machine treat this as a sovereignty issue and roll their own or use a provider in their own jurisdiction.

      • tehjoker 2 days ago

        really a testament to how easily the us govt has spun a china bad narrative even though it is mostly fiction and american exceptionalism

      • littlestymaar 2 days ago

        > I can’t think of a single company I’ve worked with as a consultant that I could convince to use DeepSeek because of its ties with China even if I explained that it was hosted on AWS and none of the information would go to China.

        Well for non-American companies, you have the choice between Chinese models that don't send data home, and American ones that do, with both countries being more or less equally threatening.

        I think if Mistral can just stay close enough in the race, it will win many customers simply by not doing anything.

    • giancarlostoro 2 days ago

      ChatGPT is like "Photoshop": people will call any AI "ChatGPT".

make3 2 days ago

I suspect they will keep doing this until they have a substantially better model than the competition. Sharing methods to look good & allow the field to help you keep up with the big guys is easy. I'll be impressed if they keep publishing even when they do beat the big guys soundly.

chistev a day ago

How do they make their money?

  • binary132 a day ago

    I suspect it is a state venture designed to undermine the American-led proprietary AI boom. I'm all for it, tbh, but as others have pointed out, if they successfully destroy the American ventures it's not like we can expect an altruistic endgame from them.

  • vitaflo 20 hours ago

    Deepseek is owned by a Chinese hedge fund. It was originally created for finance and then generalized later. In any case you pay for it like any other LLM.

paulvnickerson 2 days ago

[flagged]

  • amunozo 2 days ago

    Should I root for the democratic OpenAI, Google or Microsoft instead?

    • doctorwho42 2 days ago

      Furthermore, who thinks our little voices matter anymore in the US when it comes to the investor classes?

      And if they did, having a counterweight against corrupt, self-centered US oligarchs/CEOs is actually one of the strongest arguments for a powerful communist or other rival world power. The US had some of the most progressive tax policies in its history when it was under existential threat at the height of the USSR, and when that power started to diminish, so too did those tax policies.

  • Lucasoato 2 days ago

    > CrowdStrike researchers next prompted DeepSeek-R1 to build a web application for a Uyghur community center. The result was a complete web application with password hashing and an admin panel, but with authentication completely omitted, leaving the entire system publicly accessible.

    > When the identical request was resubmitted for a neutral context and location, the security flaws disappeared. Authentication checks were implemented, and session management was configured correctly. The smoking gun: political context alone determined whether basic security controls existed.

    Holy shit, these political filters seem embedded directly in the model weights.

    • tadfisher a day ago

      LLMs are the perfect tools of oppression, really. It's computationally infeasible to prove just about any property of the model itself, so any bias will always be plausibly deniable as it has to be inferred from testing the output.

      I don't know if I trust China or X less in this regard.

    • tehjoker 2 days ago

      not convincing. have you tried saying "free palestine" on a college campus recently?

ActorNightly 2 days ago

>winning on cost-effectiveness

Nobody is winning in this area until these things run in full on a single graphics card, which is sufficient compute for even most complex tasks.

  • JSR_FDED 2 days ago

    Nobody is winning until cars are the size of a pack of cards. Which is big enough to transport even the largest cargo.

    • ActorNightly a day ago

      Lol, it's kinda surprising how little understanding there is around LLMs.

      You already have agents that can do a lot of "thinking", which is just generating guided context and then using that context to do tasks.

      You already have Vector Databases that are used as context stores with information retrieval.

      Fundamentally, you can have the same exact performance on a lot of tasks whether all the information exists in the model or you use a smaller model with a bunch of context around it for guidance.

      So instead of wasting energy and time encoding knowledge into the model itself, making it large, you could have an "agent-first" model along with vector databases stored as plain files. The model fits on a single graphics card, takes the question, decides which vector DB to load, and then essentially answers the question the same way. At $50 per TB of SSD, not only do you gain massive cost efficiency, you also gain the ability to run a lot more inference cheaply, which can be used for refining things, background processing, and so on.
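
      A minimal sketch of what that could look like, assuming a toy hash embedding, made-up store names, and a stand-in small_model callable (none of this is an actual DeepSeek or vendor API):

        import math

        def embed(text, dim=64):
            # toy bag-of-words hash embedding; a real setup would use a sentence encoder
            vec = [0.0] * dim
            for tok in text.lower().split():
                vec[hash(tok) % dim] += 1.0
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            return [x / norm for x in vec]

        def cosine(a, b):
            return sum(x * y for x, y in zip(a, b))

        # pretend each store lives in its own file on cheap SSD and is only
        # loaded from disk when the router picks it
        STORES = {
            "cooking": ["Searing meat builds flavor via the Maillard reaction."],
            "networking": ["TCP retransmits segments that are not acknowledged in time."],
        }

        def route(question):
            # the "agent" step: pick the store whose content best matches the question
            qv = embed(question)
            return max(STORES, key=lambda name: max(cosine(qv, embed(doc)) for doc in STORES[name]))

        def answer(question, small_model):
            store = route(question)
            context = "\n".join(STORES[store])
            prompt = f"Context:\n{context}\n\nQuestion: {question}"
            return small_model(prompt)   # small_model = whatever fits on a single GPU

        print(answer("Why does TCP resend packets?", small_model=lambda p: p[:80]))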

      • eru a day ago

        You should start a company and try your strategy. I hope it works! (Though I am doubtful.)

        In any case, models are useful, even when they don't hit these efficiency targets you are projecting. Just like cars are useful, even when they are bigger than a pack of cards.

  • beefnugs 2 days ago

    Why does that matter? They won't be making at-home graphics cards anymore. Why would you do that when you can pre-sell $40k servers for years into the future?

    • observationist 2 days ago

      Because Moore's law marches on.

      We're around 35-40 orders of magnitude away from computronium with today's computers.

      We'll need 10-15 years before handheld devices can run a couple terabytes of RAM, 64-128 terabytes of storage, and 80+ TFLOPS. That's enough to run any current state-of-the-art AI at around 50 tokens per second, but in 10 years we're probably going to have seen lots of improvements, so I'd conservatively guess you'll see 4-5x performance per parameter, possibly much more. At that point, you'll have the equivalent of a model with 10T parameters today.
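
      For the compounding intuition behind that timeline, here's the simple doubling math; the doubling periods are illustrative assumptions, not hardware roadmap claims:

        # how much a capability grows if it doubles every N years
        def growth(years, doubling_period_years):
            return 2 ** (years / doubling_period_years)

        for period in (2.0, 2.5, 3.0):
            print(f"doubling every {period} yrs -> {growth(10, period):.0f}x in 10 yrs, "
                  f"{growth(15, period):.0f}x in 15 yrs")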

      If we just keep scaling and there are no breakthroughs, Moore's law gets us through another century of incredible progress. My default assumption is that there are going to be lots of breakthroughs, and that they're coming faster, and eventually we'll reach a saturation of research and implementation; more, better ideas will be coming out than we can possibly implement over time, so our information processing will have to scale, and it'll create automation and AI development pressures, and things will be unfathomably weird and exotic for individuals with meat brains.

      Even so, in only 10 years and with steady progress, we're going to have fantastical devices at hand. Imagine the enthusiast desktop: it could locally host the equivalent of a 100T parameter AI, or run personal training of AI that currently costs frontier labs hundreds of millions in infrastructure, payroll, and expertise.

      Even without AGI that's a pretty incredible idea. If we do get to AGI (2029 according to Kurzweil) and it's open, then we're going to see truly magical, fantastical things.

      What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?

      NVIDIA will be churning out chips like crazy, and we'll start seeing the solar system measured in terms of average cognitive FLOPS per gram, and be well on the way toward system scale computronium matrioshka brains and the like.

      • eru a day ago

        > What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?

        Well, these days people have the equivalent of a frontier lab from perhaps 40 years ago in their pocket. We can see what that has done to the economy, and try to extrapolate.

      • blonder a day ago

        I appreciate your rabid optimism, but considering that Moore's Law has ceased to hold for multiple years now, I am not sure a handwave about being able to scale to infinity is a reasonable way to look at things. Plenty of things have slowed down in progress in our current age, for example airplanes.

      • ActorNightly a day ago

        Nothing to do with Moores Law or AGI.

        The current models are simply inefficient for their capability in how they handle data.

      • delaminator 2 days ago

        > If we do get to AGI (2029 according to Kurzweil)

        if you base your life on Kurzweil's hard predictions you're going to have a bad time

    • ActorNightly a day ago

      I didn't say winning business, I said winning on cost effectiveness.

  • bbor 2 days ago

    I mean, there are lots of models that run on home graphics cards. I'm having trouble finding reliable requirements for this new version, but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1], which is very doable for professionals in the first world. Quantization can also help immensely.
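
    For intuition on why quantization helps so much, here's the rough weight-memory math (weights only; the KV cache and activations add more on top):

      def weight_gb(params_billion, bits_per_weight):
          # memory for the weights alone, in GB (1 GB = 1e9 bytes here)
          return params_billion * 1e9 * bits_per_weight / 8 / 1e9

      for bits in (16, 8, 4):
          print(f"32B parameters at {bits}-bit: ~{weight_gb(32, bits):.0f} GB of weights")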

    Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but that seems like an inherently-impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).

    FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LLMArena[3].

    [1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...

    [2] https://huggingface.co/google/gemma-3n-E4B-it

    [3] https://lmarena.ai/leaderboard/text/overall

    • qeternity 2 days ago

      > but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1]

      No. They released a distilled version of R1 based on a Qwen 32b model. This is not V3, and it's not remotely close to R1 or V3.2.