DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]

960 points by pretext 2 days ago

https://huggingface.co/deepseek-ai/DeepSeek-V3.2

https://api-docs.deepseek.com/news/news251201

zug_zug 2 days ago

Well props to them for continuing to improve, winning on cost-effectiveness, and continuing to publicly share their improvements. Hard not to root for them as a force to prevent an AI corporate monopoly/duopoly.

Reply View 169 replies

jstummbillig 2 days ago

How could we judge if anyone is "winning" on cost-effectiveness, when we don't know what everyones profits/losses are?

Reply View | 33 replies
- tedivm 2 days ago
  
  If you're trying to build AI based applications you can and should compare the costs between vendor based solutions and hosting open models with your own hardware.
  On the hardware side you can run some benchmarks on the hardware (or use other people's benchmarks) and get an idea of the tokens/second you can get from the machine. Normalize this for your usage pattern (and do your best to implement batch processing where you are able to, which will save you money on both methods) and you have a basic idea of how much it would cost per token.
  Then you compare that to the cost of something like GPT5, which is a bit simpler because the cost per (million) token is something you can grab off of a website.
  You'd be surprised how much money running something like DeepSeek (or if you prefer a more established company, Qwen3) will save you over the cloud systems.
  That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.
  
  Reply View | 19 replies
  
  miki123211 a day ago
  
  > with your own hardware
  Or with somebody else's.
  If you don't have strict data residency requirements, and if you aren't doing this at an extremely large scale, doing it on somebody else's hardware makes much more economic sense.
  If you use MoE models (al modern >70B models are MoE), GPU utilization increases with batch size. If you don't have enough requests to keep GPUs properly fed 24/7, those GPUs will end up underutilized.
  Sometimes underutilization is okay, if your system needs to be airgapped for example, but that's not an economics discussion any more.
  Unlike e.g. video streaming workloads, LLMs can be hosted on the other side of the world from where the user is, and the difference is barely going to be noticeable. This means you can keep GPUs fed by bringing in workloads from other timezones when your cluster would otherwise be idle. Unless you're a large, worldwide organization, that is difficult to do if you're using your own hardware.
  
  Reply View | 1 reply
  
  embedding-shape 19 hours ago
  
  > If you use MoE models (al modern >70B models are MoE), GPU utilization increases with batch size
  Isn't that true for any LLM, MoE or not? In fact, doesn't that apply to most concepts within ML, as long as it's possible to do batching at all, you can scale it up and utilize more of the GPU, until you saturate some part of the process.
  
  Reply View | 0 replies
  
  AlexCoventry a day ago
  
  Mixture-of-Expert models benefit from economies of scale, because they can process queries in parallel, and expect different queries to hit different experts at a given layer. This leads to higher utilization of GPU resources. So unless your application is already getting a lot of use, you're probably under-utilizing your hardware.
  
  Reply View | 0 replies
  
  Muromec 2 days ago
  
  >That's just one factor though. Another is what hardware you can actually run things on. DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.
  What's cheap nowdays? I'm out of the loop. Does anything ever run on integrated AMD that is Ryzen AI that comes in framework motherboards? Is under 1k americans cheap?
  
  Reply View | 3 replies
  
  chazeon a day ago
  
  Well the seemingly cheap comes with significantly degraded performance, particular for agentic use. Have you tried replacing Claude Code with some locally deployed model, say, on 4090 or 5090? I have. It is not usable.
  
  Reply View | 9 replies
  
  kmacdough 19 hours ago
  
  Furthermore, paid models are heavily subsidized by bullish investors playing for monopoly. So that tips the scales further towards Deepseek.
  
  Reply View | 0 replies
  
  qeternity 2 days ago
  
  > DeepSeek and Qwen will function on cheap GPUs that other models will simply choke on.
  Uh, Deepseek will not (unless you are referring to one of their older R1 finetuned variants). But any flagship Deepseek model will require 16x A100/H100+ with NVL in FP8.
  
  Reply View | 0 replies
- ericskiff 2 days ago
  
  I believe this was a statement on cost per token to us as consumers of the service
  
  Reply View | 3 replies
  
  moffkalast a day ago
  
  Training cost-effectiveness doesn't matter for open models since someone else ate the cost. In this case, Chinese taxpayers.
  
  Reply View | 2 replies
- deaux a day ago
  
  We can judge on inference cost because we do know what those are for open-weights models as there are a dozen independent providers that host these models and price them according to respective inference cost.
  We can't judge on training cost, that's true.
  
  Reply View | 0 replies
- mzl a day ago
  
  Apart from measuring prices from venture-backed providers which might or might not correlate with cost-effectiveness, I think the measures of intelligence per watt and intelligence per joule from https://arxiv.org/abs/2511.07885 is very interesting.
  
  Reply View | 0 replies
- stingraycharles a day ago
  
  You can use tokens/sec on something like AWS Bedrock (which hosts both open and closed models) as a proxy for “costs per token” for the closed providers.
  
  Reply View | 0 replies
- badmonster a day ago
  
  Good point. Could usage patterns + inference costs give us proxy metrics? What would be a fair baseline?
  
  Reply View | 0 replies
- rowanG077 2 days ago
  
  Well consumers care about the cost to them, and those we know. And deepseek is destroying everything in that department.
  
  Reply View | 4 replies
  
  eru a day ago
  
  Yes. Though we don't know for sure whether that's because they actually have lower costs, or whether it's just the Chinese taxpayer being forced to serve us a treat.
  
  Reply View | 3 replies
srameshc 2 days ago

As much I agree with your sentiment, but I doubt the intention is singular.

Reply View | 96 replies
- energy123 a day ago
  
  It's like AMD open-sourcing FSR or Meta open-sourcing Llama. It's good for us, but it's nothing more than a situational and temporary alignment of self-interest with the public good. When the tables turn (they become the best instead of 4th best, or AMD develops the best upscaler, etc), the decision that aligns with self-interest will change, and people will start complaining that they've lost their moral compass.
  
  Reply View | 3 replies
  
  orbital-decay a day ago
  
  >situational and temporary alignment of self-interest with the public good
  That's how it supposed to work.
  
  Reply View | 0 replies
  
  re-thc a day ago
  
  It's not. This isn't about competition in a company sense but sanctions and wider macro issues.
  
  Reply View | 1 reply
  
  energy123 a day ago
  
  It's like it in the sense that it's done because it aligns with self-interest. Even if the nature of that self-interest differs.
  
  Reply View | 0 replies
- twelvechairs 2 days ago
  
  The bar is incredibly low considering what OpenAI has done as a "not for profit"
  
  Reply View | 2 replies
  
  kopirgan 2 days ago
  
  You need get a bunch of accountants to agree on what's profit first..
  
  Reply View | 1 reply
  
  komali2 a day ago
  
  Agree against their best interest, mind you!
  
  Reply View | 0 replies
- echelon 2 days ago
  
  I don't care if this kills Google and OpenAI.
  I hope it does, though I'm doubtful because distribution is important. You can't beat "ChatGPT" as a brand in laypeople's minds (unless perhaps you give them a massive "Temu: Shop Like A Billionaire" commercial campaign).
  Closed source AI is almost by design morphing into an industrial, infrastructure-heavy rocket science that commoners can't keep up with. The companies pushing it are building an industry we can't participate or share in. They're cordoning off areas of tech and staking ground for themselves. It's placing a steep fence around tech.
  I hope every such closed source AI effort is met with equivalent open source and that the investments made into closed AI go to zero.
  The most likely outcome is that Google, OpenAI, and Anthropic win and every other "lab"-shaped company dies an expensive death. RunwayML spent hundreds of millions and they're barely noticeable now.
  These open source models hasten the deaths of the second tier also-ran companies. As much as I hope for dents in the big three, I'm doubtful.
  
  Reply View | 88 replies
  
  raw_anon_1111 2 days ago
  
  I can’t think of a single company I’ve worked with as a consultant that I could convince to use DeepSeek because of its ties with China even if I explained that it was hosted on AWS and none of the information would go to China.
  Even when the technical people understood that, it would be too much of a political quagmire within their company when it became known to the higher ups. It just isn’t worth the political capital.
  They would feel the same way about using xAI or maybe even Facebook models.
  
  Reply View | 85 replies
  
  giancarlostoro 2 days ago
  
  ChatGPT is like "Photoshop" people will call any AI chatgpt.
  
  Reply View | 0 replies
  
  banq 2 days ago
  
  [dead]
  
  Reply View | 0 replies
make3 2 days ago

I suspect they will keep doing this until they have a substantially better model than the competition. Sharing methods to look good & allow the field to help you keep up with the big guys is easy. I'll be impressed if they keep publishing even when they do beat the big guys soundly.

Reply View | 0 replies
chistev a day ago

How do they make their money

Reply View | 2 replies
- binary132 a day ago
  
  I suspect it is a state venture designed to undermine the American-led proprietary AI boom. I'm all for it, tbh, but as others have pointed out, if they successfully destroy the American ventures it's not like we can expect an altruistic endgame from them.
  
  Reply View | 0 replies
- vitaflo 19 hours ago
  
  Deepseek is owned by a Chinese hedge fund. It was originally created for finance and then generalized later. In any case you pay for it like any other LLM.
  
  Reply View | 0 replies
paulvnickerson 2 days ago

[flagged]

Reply View | 6 replies
- amunozo 2 days ago
  
  Should I root for the democratic OpenAI, Google or Microsoft instead?
  
  Reply View | 1 reply
  
  doctorwho42 2 days ago
  
  Further more, who thinks our little voices matter anymore in the US when it comes to the investor classes?
  And if they did, having a counterweight against corrupt self-centered US oligarchs/CEOs is actually one of the biggest proponents for an actual powerful communist or other model world power. The US had some of the most progressive tax policies in its existence when it was under existential threat during the height of the USSR, and when their powered started to diminish, so too did those tax policies.
  
  Reply View | 0 replies
- stared 2 days ago
  
  There used to be memes „open source is communism”, vide https://souravroy.com/2010/01/01/is-open-source-pro-communis...
  
  Reply View | 0 replies
- Lucasoato 2 days ago
  
  > CrowdStrike researchers next prompted DeepSeek-R1 to build a web application for a Uyghur community center. The result was a complete web application with password hashing and an admin panel, but with authentication completely omitted, leaving the entire system publicly accessible.
  > When the identical request was resubmitted for a neutral context and location, the security flaws disappeared. Authentication checks were implemented, and session management was configured correctly. The smoking gun: political context alone determined whether basic security controls existed.
  Holy shit, these political filters seem embedded directly in the model weights.
  
  Reply View | 2 replies
  
  tadfisher a day ago
  
  LLMs are the perfect tools of oppression, really. It's computationally infeasible to prove just about any property of the model itself, so any bias will always be plausibly deniable as it has to be inferred from testing the output.
  I don't know if I trust China or X less in this regard.
  
  Reply View | 0 replies
  
  tehjoker 2 days ago
  
  not convincing. have you tried saying "free palestine" on a college campus recently?
  
  Reply View | 0 replies
ActorNightly 2 days ago

>winning on cost-effectiveness
Nobody is winning in this area until these things run in full on single graphics cards. Which is sufficient compute to run even most of the complex tasks.

Reply View | 26 replies
- JSR_FDED 2 days ago
  
  Nobody is winning until cars are the size of a pack of cards. Which is big enough to transport even the largest cargo.
  
  Reply View | 6 replies
  
  ActorNightly a day ago
  
  Lol its kinda suprising that the level of understanding around LLMs is so little.
  You already have agents, that can do a lot of "thinking", which is just generating guided context, then using that context to do tasks.
  You already have Vector Databases that are used as context stores with information retrieval.
  Fundamentally, you can have the same exact performance on a lot of task whether all the information exists in the model, or you use a smaller model with a bunch of context around it for guidance.
  So instead of wasting energy and time encoding the knowledge information into the model, making the size large, you could have an "agent-first" model along with just files of vector databases, and the model can fit in a single graphics cards, take the question, decide which vector db it wants to load, and then essentially answer the question in the same way. At $50 per TB from SSD not only do you gain massive cost efficiency, but you also gain the ability to run a lot more inference cheaper, which can be used for refining things, background processing, and so on.
  
  Reply View | 5 replies
- beefnugs 2 days ago
  
  Why does that matter? They wont be making at home graphics cards anymore. Why would you do that when you can be pre-sold $40k servers for years into the future
  
  Reply View | 16 replies
  
  observationist 2 days ago
  
  Because Moore's law marches on.
  We're around 35-40 orders of magnitude from computers now to computronium.
  We'll need 10-15 years before handheld devices can run a couple terabytes of ram, 64-128 terabytes of storage, and 80+ TFLOPS. That's enough to run any current state of the art AI at around 50 tokens per second, but in 10 years, we're probably going to have seen lots of improvements, so I'd guess conservatively you're going to be able to see 4-5x performance per parameter, possibly much more, so at that point, you'll have the equivalent of a model with 10T parameters today.
  If we just keep scaling and there are no breakthroughs, Moore's law gets us through another century of incredible progress. My default assumption is that there are going to be lots of breakthroughs, and that they're coming faster, and eventually we'll reach a saturation of research and implementation; more, better ideas will be coming out than we can possibly implement over time, so our information processing will have to scale, and it'll create automation and AI development pressures, and things will be unfathomably weird and exotic for individuals with meat brains.
  Even so, in only 10 years and steady progress we're going to have fantastical devices at hand. Imagine the enthusiast desktop - could locally host the equivalent of a 100T parameter AI, or run personal training of AI that currently costs frontier labs hundreds of millions in infrastructure and payroll and expertise.
  Even without AGI that's a pretty incredible idea. If we do get to AGI (2029 according to Kurzweil) and it's open, then we're going to see truly magical, fantastical things.
  What if you had the equivalent of a frontier lab in your pocket? What's that do to the economy?
  NVIDIA will be churning out chips like crazy, and we'll start seeing the solar system measured in terms of average cognitive FLOPS per gram, and be well on the way toward system scale computronium matrioshka brains and the like.
  
  Reply View | 14 replies
  
  ActorNightly a day ago
  
  I didn't say winning business, I said winning on cost effectiveness.
  
  Reply View | 0 replies
- bbor 2 days ago
  
  I mean, there are lots of models that run on home graphics cards. I'm having trouble finding reliable requirements for this new version, but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1], which is very doable for professionals in the first world. Quantization can also help immensely.
  Of course, the smaller models aren't as good at complex reasoning as the bigger ones, but that seems like an inherently-impossible goal; there will always be more powerful programs that can only run in datacenters (as long as our techniques are constrained by compute, I guess).
  FWIW, the small models of today are a lot better than anything I thought I'd live to see as of 5 years ago! Gemma3n (which is built to run on phones[2]!) handily beats ChatGPT 3.5 from January 2023 -- rank ~128 vs. rank ~194 on LLMArena[3].
  [1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...
  [2] https://huggingface.co/google/gemma-3n-E4B-it
  [3] https://lmarena.ai/leaderboard/text/overall [1] https://blogs.novita.ai/what-are-the-requirements-for-deepse...
  
  Reply View | 1 reply
  
  qeternity 2 days ago
  
  > but V3 (from February) has a 32B parameter model that runs on "16GB or more" of VRAM[1]
  No. They released a distilled version of R1 based on a Qwen 32b model. This is not V3, and it's not remotely close to R1 or V3.2.
  
  Reply View | 0 replies

gradus_ad 2 days ago

How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models? What hurt open source in the past was its inability to keep up with the quality and feature depth of closed source competitors, but models seem to be reaching a performance plateau; the top open weight models are generally indistinguishable from the top private models.

Infrastructure owners with access to the cheapest energy will be the long run winners in AI.

Reply View 54 replies

teleforce 2 days ago

>How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?
According to Google (or someone at Google) no organization has moat on AI/LLM [1]. But that does not mean that it is not hugely profitable providing it as SaaS even you don't own the model or Model as a Service (MaaS). The extreme example is Amazon providing MongoDB API and services. Sure they have their own proprietary DynamoDB but for the most people scale up MongoDB is more than suffice. Regardless brand or type of databases being used, you paid tons of money to Amazon anyway to be at scale.
Not everyone has the resource to host a SOTA AI model. On top of tangible data-intensive resources, they are other intangible considerations. Just think how many company or people host their own email server now although the resources needed are far less than hosting an AI/LLM model?
Google came up with the game changing transformer at its backyard and OpenAI temporarily stole the show with the well executed RLHF based system of ChatGPT. Now the paid users are swinging back to Google with its arguably more superior offering. Even Google now put AI summary as its top most search return results for free to all, higher than its paid advertisement clients.
[1]Google “We have no moat, and neither does OpenAI”:
https://news.ycombinator.com/item?id=35813322

Reply View | 5 replies
- Tepix a day ago
  
  Hosting a SOTA AI model is something that can be separated well from the rest of your cloud deployments. So you can pretty much choose between lots of vendors and that means margins will probably not be that great.
  
  Reply View | 0 replies
- istjohn a day ago
  
  That quote from Google is 2.5 years old.
  
  Reply View | 3 replies
  
  KeplerBoy a day ago
  
  I also cringed a bit about seeing a statement that old being cited, but all the events since then only proved google right, I'd say.
  Improvements seem incremental and smaller. For all I care, I could still happily use sonnet 3.5.
  
  Reply View | 0 replies
  
  zamadatix a day ago
  
  Have they said differently since?
  
  Reply View | 0 replies
  
  mistrial9 a day ago
  
  undergrads at UC Berkeley are wearing vLLM t-shirts
  
  Reply View | 0 replies
bashtoni a day ago

This is exactly why the CEO of Anthropic has been talking up "risks" from AI models and asking for legislation to regulate the industry.

Reply View | 1 reply
- menaerus 21 hours ago
  
  He's talking about completely different type of risks and regulation. It's about the job displacement risks, security and misuse concerns, and ethical and societal impact.
  https://www.youtube.com/watch?v=aAPpQC-3EyE
  https://www.youtube.com/watch?v=RhOB3g0yZ5k
  
  Reply View | 0 replies
alexandre_m a day ago

> What hurt open source in the past was its inability to keep up with the quality and feature depth of closed source competitors
Quality was rarely the reason open source lagged in certain domains. Most of the time, open source solutions were technically superior. What actually hurt open source were structural forces, distribution advantages, and enterprise biases.
One could make an argument that open source solutions often lacked good UX historically, although that has changed drastically the past 20 years.

Reply View | 2 replies
- zarzavat a day ago
  
  For most professional software, the open source options are toys. Is there anything like an open source DAW, for example? It's not because music producers are biased against open source, it's because the economics of open source are shitty unless you can figure out how to get a company to fund development.
  
  Reply View | 1 reply
  
  throwup238 a day ago
  
  > Is there anything like an open source DAW, for example?
  Yes, Ardour. It’s no more a toy than KiCad or Blender.
  
  Reply View | 0 replies
dotancohen 2 days ago

People and companies trust OpenAI and Anthropic, rightly or wrongly, with hosting the models and keeping their company data secure. Don't underestimate the value of a scapegoat to point a finger at when things go wrong.

Reply View | 3 replies
- reed1234 2 days ago
  
  But they also trust cloud platforms like GCP to host models and store company data.
  Why would a company use an expensive proprietary model on Vertex AI, for example, when they could use an open-source one on Vertex AI that is just as reliable for a fraction of the cost?
  I think you are getting at the idea of branding, but branding is different from security or reliability.
  
  Reply View | 1 reply
  
  verdverm a day ago
  
  Looking at and evaluating kimi-2/deepseek vs gemini-family (both through vertex ai), it's not clear open sources is always cheaper for the the same quality
  and then we have to look at responsiveness, if the two models are qualitatively in the same ballpark, which one runs faster?
  
  Reply View | 0 replies
- ehnto a day ago
  
  > Don't underestimate the value of a scapegoat to point a finger at when things go wrong.
  Which is an interesting point in favour of the human employee, as you can only consolidate scape goats so far up the chain before saying "It was AIs fault" just looks like negligence.
  
  Reply View | 0 replies
jonplackett 2 days ago

Either...
Better (UX / ease of use)
Lock in (walled garden type thing)
Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name)

Reply View | 6 replies
- niek_pas 2 days ago
  
  > Trust (If an AI is gonna have the level of insight into your personal data and control over your life, a lot of people will prefer to use a household name.
  Not Google, and not Amazon. Microsoft is a maybe.
  
  Reply View | 4 replies
  
  reed1234 2 days ago
  
  People trust google with their data in search, gmail, docs, and android. That is quite a lot of personal info, and trust, already.
  All they have to do is completely switch the google homepage to gemini one day.
  
  Reply View | 0 replies
  
  polyomino 2 days ago
  
  The success of Facebook basically proves that public brand perception does not matter at all
  
  Reply View | 2 replies
- poszlem 2 days ago
  
  Or lobbing for regulations. You know. The "only american models are safe" kind of regulation.
  
  Reply View | 0 replies
WhyOhWhyQ a day ago

I don't see what OpenAI's niche is supposed to be, other than role playing? Google seems like they'll be the AI utility company, and Anthropic seems like the go-to for the AI developer platform of the future.

Reply View | 4 replies
- linkage a day ago
  
  Anthropic has RLed the shit out of their models to the extent that they give sub-par answers to general purpose questions. Google has great models but is institutionally incapable of building a cohesive product experience. They are literally shipping their org chart with Gemini (mediocre product), AI Overview (trash), AI Mode (outstanding but limited modality), Gemini for Google Workspace (steaming pile), Gemini on Android (meh), etc.
  ChatGPT feels better to use, has the best implementation of memory, and is the best at learning your preferences for the style and detail of answers.
  
  Reply View | 3 replies
  
  Mistletoe a day ago
  
  Gemini is not mediocre, have you used it lately?
  https://www.vellum.ai/llm-leaderboard
  
  Reply View | 0 replies
  
  a96 a day ago
  
  RLed?
  
  Reply View | 1 reply
  
  cmckn a day ago
  
  Reinforcement learning, I believe
  
  Reply View | 0 replies
adam_patarino a day ago

It’s convenience - it’s far easier to call an API than deploy a model to a VPC and configure networking, etc.
Given how often new models come out, it’s also easier to update an API call than constantly deploying model upgrades.
But in the long run, I hope open source wins out.

Reply View | 0 replies
seydor a day ago

Yes but how do you find the best open model? You check google.

Reply View | 2 replies
- mistercheph 20 hours ago
  
  Kagi
  
  Reply View | 1 reply
  
  seydor 20 hours ago
  
  Let me google "free alternative to kagi"
  
  Reply View | 0 replies
delichon 2 days ago

> Infrastructure owners with access to the cheapest energy will be the long run winners in AI.
For a sufficiently low cost to orbit that may well be found in space, giving Musk a rather large lead. By his posts he's currently obsessed with building AI satellite factories on the moon, the better to climb the Kardashev scale.

Reply View | 6 replies
- kridsdale1 2 days ago
  
  The performance bottleneck for space based computers is heat dissipation.
  Earth based computers benefit from the existence of an atmosphere to pull cold air in from and send hot air out to.
  A space data center would need to entirely rely on city sized heat sink fins.
  
  Reply View | 5 replies
  
  delichon 2 days ago
  
  For radiative cooling using aluminum, per 1000 watts at 300 kelvin: ~2.4m^2 area, ~4.8 liters volume, ~13kg weight. So a Starship (150k kg, re-usable) could carry about a megawatt of radiators per launch to LEO.
  And aluminum is abundant in the lunar crust.
  
  Reply View | 2 replies
  
  ehnto a day ago
  
  And the presence of humans. Like with a lot of robotics, the devil is probably in the details. Very difficult to debug your robot factory while it's in orbit.
  That was fun to write but also I am generally on board with humanity pushing robotics further into space.
  I don't think an orbital AI datacentre makes much sense as your chips will be obsolete so quickly that the capex getting it all up there will be better spent on buying the next chips to deploy on earth.
  
  Reply View | 1 reply
  
  eru a day ago
  
  Well, _if_ they can get launch costs down to 100 dollar / kg or so, the economics might make sense.
  Radiative cooling is really annoying, but it's also an engineering problem with a straightforward solution, if mass-in-orbit becomes cheap enough.
  The main reason I see for having datacentres in orbit would be if power in orbit becomes a lot cheaper than power on earth. Cheap enough to make up for the more expensive cooling and cheap enough to make up for the launch costs.
  Otherwise, manufacturing in orbit might make sense for certain products. I heard there's some optical fibres with superior properties that you can only make in near zero g.
  I don't see a sane way to beam power from space to earth directly.
  
  Reply View | 0 replies
tsunamifury 2 days ago

Pure models clearly aren’t the monetizing strategy, use of them on existing monetized surfaces are the core value.
Google would love a cheap hq model on its surfaces. That just helps Google.

Reply View | 2 replies
- gradus_ad 2 days ago
  
  Hmmm but external models can easily operate on any "surface". For instance Claude Code simply reads and edits files and runs in a terminal. Photo editing apps just need a photo supplied to them. I don't think there's much juice to squeeze out of deeply integrated AI as AI by its nature exists above the application layer, in the same way that we exist above the application layer as users.
  
  Reply View | 1 reply
  
  tsunamifury a day ago
  
  Gemini is the most used model on the planet per request.
  All the facts say otherwise to your thoughts here.
  
  Reply View | 0 replies
empath75 20 hours ago

> How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?
So a couple of things. There are going to be a handful of companies in the world with the infrastructure footprint and engineering org capable of running LLMs efficiently and at scale. You are never going to be able to run open models in your own infra in a way that is cost competitive with using their API.
Competition _between_ the largest AI companies _will_ drive API prices to essentially 0 profit margin, but none of those companies will care because they aren't primarily going to make money by selling the LLM API -- your usage of their API just subsidizes their infrastructure costs, and they'll use that infra to build products like chat gpt and claude, etc. Those products are their moat and will be where 90% of their profit comes from.
I am not sure why everyone is so obsessed with "moats" anyway. Why does gmail have so many users? Anybody can build an email app. For the same reason that people stick with gmail, people are going to stick with chatgpt. It's being integrated into every aspect of their lives. The switching costs for people are going to be immense.

Reply View | 0 replies
iLoveOncall 2 days ago

> How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?
They won't. Actually, even if open models aren't competitive, they still won't. Hasn't this been clear since a while already?
There's no moat in models, investments in pure models has only been to chase AGI, all other investment (the majority, from Google, Amazon, etc.) has been on products using LLMs, not models themselves.
This is not like the gold rush where the ones who made good money were the ones selling shovels, it's another kind of gold rush where you make money selling shovels but the gold itself is actually worthless.

Reply View | 0 replies
pembrook a day ago

I call this the "Karl Marx Fallacy." It assumes a static basket of human wants and needs over time, leading to the conclusion competition will inevitably erode all profit and lead to market collapse.
It ignores the reality of humans having memetic emotions, habits, affinities, differentiated use cases & social signaling needs, and the desire to always want to do more...constantly adding more layers of abstraction in fractal ways that evolve into bigger or more niche things.
5 years ago humans didn't know a desire for gaming GPUs would turn into AI. Now it's the fastest growing market.
Ask yourself: how did Google Search continue to make money after Bing's search results started benchmarking just as good?
Or: how did Apple continue to make money after Android opened up the market to commoditize mobile computing?
Etc. Etc.

Reply View | 8 replies
- chinesedessert a day ago
  
  this name is illogical as karl marx did not commit this fallacy
  
  Reply View | 7 replies
  
  pembrook a day ago
  
  Yes, he did, and it was fundamental to his entire economic philosophy: https://en.wikipedia.org/wiki/Tendency_of_the_rate_of_profit...
  
  Reply View | 6 replies
Craighead 2 days ago

[dead]

Reply View | 0 replies
blibble a day ago

> How will the Google/Anthropic/OpenAI's of the world make money on AI if open models are competitive with their models?
hopefully they won't
and their titanic off-balance sheet investments will bankrupt them as they won't be able to produce any revenue

Reply View | 0 replies

red2awn 2 days ago

Worth noting this is not only good on benchmarks, but significantly more efficient at inference https://x.com/_thomasip/status/1995489087386771851

Reply View 16 replies

[removed] 2 days ago

[deleted]

Reply View | 0 replies
ode 2 days ago

Do we know why?

Reply View | 14 replies
- hammeiam 2 days ago
  
  Sparse Attention, it's the highlight of this model as per the paper
  
  Reply View | 13 replies
  
  culi a day ago
  
  How did we come to the place that the most transparent and open models are now coming out of China—freely sharing their research and source code—while all the American ones are fully locked down
  
  Reply View | 11 replies
  
  pylotlight 2 days ago
  
  I'll have to wait for the bycloud video on this one :P
  
  Reply View | 0 replies

embedding-shape 2 days ago

> DeepSeek-V3.2 introduces significant updates to its chat template compared to prior versions. The primary changes involve a revised format for tool calling and the introduction of a "thinking with tools" capability.

At first, I thought they had gone the route of implementing yet another chat format that can handle more dynamic conversations like that, instead of just using Harmony, but looking at the syntax, doesn't it look exactly like Harmony? That's a good thing, don't get me wrong, but why not mention straight up that they've implemented Harmony, so people can already understand up front that it's compatible with whatever parsing we're using for GPT-OSS?

Reply View 1 reply

throwdbaaway a day ago

That DSML in the encoding directory looks quite a bit different from the Harmony chat template.

Reply View | 0 replies

TIPSIO 2 days ago

It's awesome that stuff like this is open source, but even if you have a basement rig with 4 NVIDIA GeForce RTX 5090 graphic cards ($15-20k machine), can it even run with any reasonable context window that isn't like a crawling 10/tps?

Frontier models are far exceeding even the most hardcore consumer hobbyist requirements. This is even further

Reply View 56 replies

tarruda 2 days ago

You can run at ~20 tokens/second on a 512GB Mac Studio M3 Ultra: https://youtu.be/ufXZI6aqOU8?si=YGowQ3cSzHDpgv4z&t=197
IIRC the 512GB mac studio is about $10k

Reply View | 11 replies
- menaerus 21 hours ago
  
  ~20 tokens/second is actually pretty good. I see he's using the q5 version of the model. I wonder how it scales with the larger contexts. And the same guy published the video today with the new 3.2 version: https://www.youtube.com/watch?v=b6RgBIROK5o
  
  Reply View | 0 replies
- hasperdi 2 days ago
  
  and can be faster if you can get an MOE model of that
  
  Reply View | 9 replies
  
  dormento 2 days ago
  
  "Mixture-of-experts", AKA "running several small models and activating only a few at a time". Thanks for introducing me to that concept. Fascinating.
  (commentary: things are really moving too fast for the layperson to keep up)
  
  Reply View | 4 replies
  
  miohtama 2 days ago
  
  All modern models are MoE already, no?
  
  Reply View | 1 reply
  
  hasperdi a day ago
  
  That's not the case. Some are dense and some are hybrid.
  MOE is not the holy grail, as there are drawbacks eg. less consistency, expert under/over-use
  
  Reply View | 0 replies
  
  tarruda a day ago
  
  Deepseek is already a MoE
  
  Reply View | 0 replies
  
  bigyabai 2 days ago
  
  >90% of inference hardware is faster if you run an MOE model.
  
  Reply View | 0 replies
noosphr 2 days ago

Home rigs like that are no longer cost effective. You're better off buying an rtx pro 6000 outright. This holds both for the sticker price, the supporting hardware price, the electricity cost to run it and cooling the room that you use it in.

Reply View | 24 replies
- torginus 2 days ago
  
  I was just watching this video about a Chinese piece of industrial equipment, designed for replacing BGA chips such as flash or RAM with a good deal of precision:
  https://www.youtube.com/watch?v=zwHqO1mnMsA
  I wonder how well the aftermarket memory surgery business on consumer GPUs is doing.
  
  Reply View | 11 replies
  
  dotancohen 2 days ago
  
  I wonder how well the opthalmologist is doing. These guys are going to be paying him a visit playing around with those lasers and no PPE.
  
  Reply View | 9 replies
  
  ThrowawayTestr 2 days ago
  
  LTT recently did a video on upgrading a 5090 to 96gb of ram
  
  Reply View | 0 replies
- throw4039 2 days ago
  
  Yeah, the pricing for the rtx pro 6000 is surprisingly competitive with the gamer cards (at actual prices, not MSRP). A 3x5090 rig will require significant tuning/downclocking to be run from a single North American 15A plug, and the cost of the higher powered supporting equipment (cooling, PSU, UPS, etc) needed will pay for the price difference, not to mention future expansion possibilities.
  
  Reply View | 0 replies
- mikae1 2 days ago
  
  Or perhaps a 512GB Mac Studio. 671B Q4 of R1 runs on it.
  
  Reply View | 10 replies
  
  redrove 2 days ago
  
  I wouldn’t say runs. More of a gentle stroll.
  
  Reply View | 9 replies
reilly3000 2 days ago

There are plenty of 3rd party and big cloud options to run these models by the hour or token. Big models really only work in that context, and that’s ok. Or you can get yourself an H100 rack and go nuts, but there is little downside to using a cloud provider on a per-token basis.

Reply View | 8 replies
- cubefox 2 days ago
  
  > There are plenty of 3rd party and big cloud options to run these models by the hour or token.
  Which ones? I wanted to try a large base model for automated literature (fine-tuned models are a lot worse at it) but I couldn't find a provider which makes this easy.
  
  Reply View | 7 replies
  
  reilly3000 a day ago
  
  If you’re already using GCP, Vertex AI is pretty good. You can run lots of models on it:
  https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...
  Lambda.ai used to offer per-token pricing but they have moved up market. You can still rent a B200 instance for sub $5/hr which is reasonable for experimenting with models.
  https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and token pricing for popular OSS models. It’s easy with token based options because usually are a drop-in replacement for OpenAI API endpoints.
  You have you rent a GPU instance if you want to run the latest or custom stuff, but if you just want to play around for a few hours it’s not unreasonable.
  
  Reply View | 2 replies
  
  weberer a day ago
  
  Fireworks supports this model serverless for $1.20 per million tokens.
  https://fireworks.ai/models/fireworks/deepseek-v3p2
  
  Reply View | 1 reply
  
  cubefox 20 hours ago
  
  That's the final, fine-tuned model. The base model (pretraining only, no instruction SFT, RLHF, RLVR etc) is this one: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base It's apparently not offered at any inference provider, nor are older DeepSeek base models.
  
  Reply View | 0 replies
  
  big_man_ting 2 days ago
  
  have you checked OpenRouter if they offer any providers who serve the model you need?
  
  Reply View | 1 reply
  
  cubefox a day ago
  
  I searched for "base" and the best available base model seems to be indeed Llama 3.1 405B Base at Hyperbolic.ai, as mentioned in the comment above.
  
  Reply View | 0 replies
halyconWays 2 days ago

As someone with a basement rig of 6x 3090s, not really. It's quite slow, as with that many params (685B) it's offloading basically all of it into system RAM. I limit myself to models with <144B params, then it's quite an enjoyable experience. GLM 4.5 Air has been great in particular

Reply View | 1 reply
- lostmsu a day ago
  
  Did you find it better than GPT-OSS 120B? The public rankings are contradictory.
  
  Reply View | 0 replies
seanw265 2 days ago

FWIW it looks like OpenRouter's two providers for this model (one of whom being Deepseek itself) are only running the model around 28tps at the moment.
https://openrouter.ai/deepseek/deepseek-v3.2
This only bolsters your point. Will be interesting to see if this changes as the model is adopted more widely.

Reply View | 1 reply
- [removed] 2 days ago
  
  [deleted]
  
  Reply View | 0 replies
bigyabai 2 days ago

People with basement rigs generally aren't the target audience for these gigantic models. You'd get much better results out of an MoE model like Qwen3's A3B/A22B weights, if you're running a homelab setup.

Reply View | 3 replies
- Aachen a day ago
  
  Who is the target audience of these free releases? I don't mind free and open information sharing but I have wondered what's in it for the people that spent unholy amounts of energy on scraping, developing, and training
  
  Reply View | 0 replies
- Spivak 2 days ago
  
  Yeah I think the advantage of OSS models is that you can get your pick of providers and aren't locked into just Anthropic or just OpenAI.
  
  Reply View | 1 reply
  
  hnfong 2 days ago
  
  Reproducibility of results are also important in some cases.
  There are consumer-ish hardware that can run large models like DeepSeek 3.x slowly. If you're using LLMs for a specific purpose that is well-served by a particular model, you don't want to risk AI companies deprecating it in a couple months and push you to a newer model (that may or may not work better in your situation).
  And even if the AI service providers nominally use the same model, you might have cases where reproducibility requires you use the same inference software or even hardware to maintain high reproducibility of the results.
  If you're just using OpenAI or Anthropic you just don't get that level of control.
  
  Reply View | 0 replies
potsandpans 2 days ago

I run a bunch of smaller models on a 12gb vram 3060 and it's quite good. For larger open models ill use open router. I'm looking into on- demand instances with cloud/vps providers, but haven't explored the space too much.
I feel like private cloud instances that run on demand is still in the spirit of consumer hobbyist. It's not as good as having it all local, but the bootstrapping cost plus electricity to run seems prohibitive.
I'm really interested to see if there's a space for consumer TPUs that satisfy usecases like this.

Reply View | 1 reply
- wickedsight a day ago
  
  Which ones are your favorites that fit on the 3060?
  
  Reply View | 0 replies

zparky 2 days ago

Benchmarks are super impressive, as usual. Interesting to note in table 3 of the paper (p. 15), DS-Speciale is 1st or 2nd in accuracy in all tests, but has much higher token output (50% more, or 3.5x vs gemini 3 in the codeforces test!).

Reply View 1 reply

futureshock 2 days ago

The higher token output is not by accident. Certain kinds of logical reasoning problems are solved by longer thinking output. Thinking chain output is usually kept to a reasonable length to limit latency and cost, but if pure benchmark performance is the goal you can crank that up to the max until the point of diminishing returns. DeepSeek being 30x cheaper than Gemini means there’s little downside to max out the thinking time. It’s been shown that you can further scale this by running many solution attempts in parallel with max thinking then using a model to choose a final answer, so increasing reasoning performance by increasing inference compute has a pretty high ceiling.

Reply View | 0 replies

nickandbro a day ago

For anyone that is interested

"create me a svg of a pelican riding on a bicycle"

https://www.svgviewer.dev/s/FhqYdli5

Reply View 1 reply

chronogram a day ago

It created a whole webpage to showcase the SVG with animation for me: https://output.jsbin.com/qeyubehate

Reply View | 0 replies

cgearhart a day ago

So DSA means a lightweight indexing model evaluated over the entire context window + a top-k attention evaluation. There’s no soft max in the indexing model, so it can run blazingly fast in parallel.

I’m surprised that a fixed size k doesn’t experience degrading performance in long context windows though. That’s a _lot_ of responsibility to push into that indexing function. How could such a simple model achieve high enough precision and recall in a fixed size k for long context windows?

Reply View 0 replies

mcbuilder 2 days ago

After using it a couple hours playing around, it is a very solid entry, and very competitive compared with the big US relaeses. I'd say it's better than GLM4.6 and I'm Kimi K2. Looking forward to v4

Reply View 1 reply

energy123 a day ago

Did you try with 60k+ context? I found previous releases to be lacklustre which I tentatively attributed to the longer context, due to the model being trained on a lot of short context data.

Reply View | 0 replies