singularity2001 2 days ago

Why are there so few 32, 64, 128, 256, or 512 GB models that could run on current consumer hardware? And why is the maximum RAM on the M4 Mac Studio 128 GB??

  • eldenring a day ago

    The only real benefit is privacy, which 99.9% of people don't care about. Almost all serving metrics (cost, throughput, TTFT) are better with large GPU clusters. Latency is usually hidden by prefill cost.

    • cowpig a day ago

      More and more people I talk to care about privacy, but not in SF

    • mistercheph 21 hours ago

      and sovereignty. I can go into the woods with a fuzzy approximation of all internet text in my backpack

  • jameslk 2 days ago

    128 GB should be enough for anybody (just kidding). I hope the M5 Max will have higher RAM limits

    • aryonoco 2 days ago

      M5 Max probably won’t, but M5 Ultra probably will

  • ainch a day ago

    As LLMs are productionised/commodified they're incorporating changes which are enthusiast-unfriendly. Small dense models are great for enthusiasts running inference locally, but for parallel batched inference MoE models are much more efficient.

mark_l_watson a day ago

I used DeepSeek-v3.2 to solve two coding problems by pasting code and directions as one large prompt into a chat interface and it performed very well. VERY WELL!

I am still happy to pay Google because of their ecosystem: the Gemini app, NotebookLM, Colab, gemini-cli, etc. Google’s moat for me is all the tooling and engineering around the models.

That said, my one-year Google AI subscription ends in four months and I might try an alternative, or at least evaluate options. Alibaba Cloud looks like an interesting low-cost alternative to AWS for building systems. I am a retired ‘gentleman scientist’ now and my personal research is inexpensive no matter who I pay for inference compute, but it is fun to spend a small amount of time evaluating alternatives, even though mostly using Google is time efficient.

sfdlkj3jk342a a day ago

What version is actually running on chat.deepseek.com?

It refuses to tell me when asked, only that it's been trained with data up until July 2024, which would make it quite old. I turned off search and asked it for the winner of the 2024 US election, and it said it didn't know, so I guess that confirms it's not a recent model.

  • scottyeager a day ago

    You can read that 3.2 is live on web and app here: https://api-docs.deepseek.com/news/news251201

    The PDF describes how they did "continued pre-training" and then post-training to make 3.2. I guess what's missing is the full pre-training that absorbs most date-sensitive knowledge. That's probably also the reason the versions are still 3.x.

chistev a day ago

I've found it better than ChatGPT lately, at least the free version of GPT.

I don't know, but GPT seems to have regressed a lot, at least the free version.

swframe2 21 hours ago

The AI market is hard to predict because new algorithms can emerge unexpectedly. Refer to this summary of Ilya's opinions for insights into why these new algorithms are needed: https://youtu.be/DcrXHTOxi3I

DeepSeek is a valuable product, but its open-source nature makes it difficult to displace larger competitors. Any advancements can be quickly adopted, and in fact, it may inadvertently strengthen these companies by highlighting weaknesses in their current strategies.

jodleif 2 days ago

I genuinely do not understand the valuations of the US AI industry. The Chinese models are so close and far cheaper.

  • espadrine 2 days ago

    Two aspects to consider:

    1. Chinese models typically focus on text. US and EU models also bear the cross of handling images, and often voice and video. Supporting all of those means additional training cost not spent on further reasoning: tying one hand behind your back in order to be more generally useful.

    2. The gap seems small, because so many benchmarks get saturated so fast. But towards the top, every 1% increase in benchmarks is significantly better.

    On the second point, I worked on a leaderboard that both normalizes scores, and predicts unknown scores to help improve comparisons between models on various criteria: https://metabench.organisons.com/

    You can notice that, while Chinese models are quite good, the gap to the top is still significant.

    However, the US models are typically much more expensive for inference, and Chinese models do have a niche on the Pareto frontier on cheaper but serviceable models (even though US models also eat up the frontier there).

    • coliveira 2 days ago

      Nothing you said helps with the issue of valuation. Yes, the US models may be better by a few percentage points, but how can they justify being so costly, both operationally and in investment? Over the long run this is a business, and you don't make money by being first; you have to be more profitable overall.

      • ben_w 2 days ago

        I think the investment race here is an "all-pay auction"*. Lots of investors have looked at the ultimate prize — basically winning something larger than the entire present world economy forever — and think "yes".

        But even assuming that we're on the right path for that (which we may not be) and assuming that nothing intervenes to stop it (which it might), there may be only one winner, and that winner may not have even entered the game yet.

        * https://en.wikipedia.org/wiki/All-pay_auction

    • jodleif 2 days ago

      1. Have you seen the Qwen offerings? They have great multi-modality, some even SOTA.

      • brabel 2 days ago

        Qwen Image and Image Edit were among the best image models until Nano Banana Pro came along. I have tried some open image models and can confirm the Chinese models are easily the best or very close to the best, but right now the Google model is even better... we'll see if the Chinese catch up again.

    • janalsncm a day ago

      > Chinese models typically focus on text

      Not true at all. Qwen has a VLM (qwen2 vl instruct) which is the backbone of Bytedance’s TARS computer use model. Both Alibaba (Qwen) and Bytedance are Chinese.

      Also DeepSeek got a ton of attention with their OCR paper a month ago which was an explicit example of using images rather than text.

    • raincole 2 days ago

      > video

      Most of the AI-generated videos we see on social media now are made with Chinese models.

    • culi a day ago

      Qwen, Hunyuan, and WAN are three of the major competitors in the vision, text-to-image, and image-to-video spaces. They are quite competitive. Right now WAN is only behind Google's Veo in the image-to-video rankings on LMArena, for example:

      https://lmarena.ai/leaderboard/image-to-video

    • torginus 2 days ago

      Thanks for sharing that!

      The scales are a bit murky here, but if we look at the 'Coding' metric, we see that Kimi K2 outperforms Sonnet 4.5, which I think is still considered the price/performance darling even today?

      I haven't tried these models, but in general there have been lots of cases where a model performs much worse IRL than the benchmarks would suggest (certain Chinese models and GPT-OSS have been guilty of this in the past).

      • espadrine 2 days ago

        Good question. There are two points to consider.

        • For both Kimi K2 and for Sonnet, there's a non-thinking and a thinking version. Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently, and beats it on all comparable pure-coding benchmarks I know: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%), they tie at SciCode at 44.8%. It is a finding shared by ArtificialAnalysis: https://artificialanalysis.ai/models/capabilities/coding

        • The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as Github Copilot, which are increasingly agentic. On the Agentic Coding criteria, Sonnet 4.5 Thinking is much higher.

        By the way, you can look at the Table tab to see all known and predicted results on benchmarks.

        • pama a day ago

          The table is confusing. It is not clear what is known and what is predicted (and how it is predicted). Why not measure the missing pieces instead of predicting—is it too expensive or is the tooling missing?

    • agumonkey 2 days ago

      Forgive me for bringing politics into it, but are Chinese LLMs more prone to censorship bias than US ones?

      • coliveira 2 days ago

        Being open source, the Chinese models are, I believe, less prone to censorship, since US corporations can add censorship in several ways simply by controlling a closed model.

      • skeledrew 2 days ago

        It's not about an LLM being prone to anything, but more about the way an LLM is fine-tuned (which can be subject to the requirements of those wielding political power).

        • agumonkey 2 days ago

          That's what I meant, though I could have been more precise.

      • erikhorton 2 days ago

        Yes, it is extremely likely they are prone to censorship based on the training. Try running them with something like LM Studio locally and ask questions the government is uncomfortable with. I originally thought the bias was in the GUI, but it's baked into the model itself.

  • jasonsb 2 days ago

    It's all about the hardware and infrastructure. If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT or Gemini. The Chinese models may benchmark close on paper, but real-world deployment is different. So you either buy your own hardware in order to run a Chinese model at 150-200 tps or give up and use one of the Big 3.

    The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap.

    Edit: It looks like Cerebras is offering a very fast GLM 4.6

    • irthomasthomas 2 days ago
      • jasonsb 2 days ago

        It doesn't work like that. You need to actually use the model and then go to /activity to see the actual speed. I constantly get 150-200 tps from the Big 3 while other providers barely hit 50 tps even though they advertise much higher speeds. GLM 4.6 via Cerebras is the only one faster than the closed-source models, at over 600 tps.

        • irthomasthomas 2 days ago

          These aren't advertised speeds; they are the average speeds measured by OpenRouter across different providers.

    • observationist 2 days ago

      The network effect of consistently behaving models and API coverage maintained between updates is valuable too. Presumably the big labs include their own domains of competence in the training, so Claude is likely to remain very good at coding and to behave in similar ways, informed and constrained by their prompt frameworks, so that interactions keep working in predictable ways even after major new releases and upgrades can be clean.

      It'll probably be a few years before all that stuff becomes as smooth as people need, but OAI and Anthropic are already doing a good job on that front.

      Each new Chinese model requires a lot of testing and bespoke conformance to every task you want to use it for. There's a lot of activity and shared prompt engineering, and some really competent people doing things out in the open, but it's generally going to take a lot more expert work getting the new Chinese models up to snuff than working with the big US labs. Their product and testing teams do a lot of valuable work.

      • dworks 2 days ago

        Qwen 3 Coder Plus has been braindead this past weekend, but Codex 5.1 has also been acting up. It told me updating UI styling was too much work and I should do it myself. I also see people complaining about Claude every week. I think this is an unsolved problem, and you also have to separate perception from actual performance, which I think is an impossible task.

    • jodleif 2 days ago

      Assuming your hardware premise is right (and let's be honest, nobody really wants to send their data to Chinese providers), you can use a provider like Cerebras or Groq?

    • DeathArrow 2 days ago

      > If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT or Gemini.

      I think GLM 4.6 offered by Cerebras is much faster than any US model.

    • kachapopopow 2 days ago

      Cerebras offers models at 50x the speed of Sonnet?

      • baq a day ago

        if that's an honest question, the answer is pretty much yes, depending on model.

    • csomar 2 days ago

      According to OpenRouter, z.ai is 50% faster than Anthropic; which matches my experience. z.ai does have frequent downtimes but so does Claude.

  • jazzyjackson 2 days ago

    Valuation is not based on what they have done but what they might do. I agree though, it's an investment made with very little insight into Chinese research. I guess it's counting on DeepSeek being banned and all computers in America refusing to run open software by the year 2030 /snark

    • jodleif 2 days ago

      > Valuation is not based on what they have done but what they might do

      Exactly what I’m thinking. The Chinese models are catching up rapidly. Soon to be on par with the big dogs.

      • ksynwa 2 days ago

        Even if they do continue to lag behind they are a good bet against monopolisation by proprietary vendors.

        • coliveira 2 days ago

          They would if corporations were allowed to run these models. I fully expect the US government to prohibit corporations from doing anything useful with Chinese models (full censorship). It's the same game they use with chips.

    • bilbo0s 2 days ago

      > I guess it's counting on DeepSeek being banned

      And the people making the bets are in a position to make sure the banning happens. The US government system being what it is.

      Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet".

      "Bet" imply you don't know the outcome and you have no influence over the outcome. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people?

      • coliveira 2 days ago

        Exactly. "Business investment" these days means that the people involved will have at least some amount of power to determine the winning results.

  • Bolwin 2 days ago

    Third-party providers rarely support caching.

    With caching, the expensive US models end up being only about 2x the price (e.g. Sonnet) and often much cheaper (e.g. GPT-5 mini); see the rough calculation below.

    If the third-party providers start offering caching, the US companies will be completely outpriced.
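    As a rough illustration of how much prompt caching changes the effective price, here is a toy calculation; the $3/Mtok base rate, the 0.1x cache-read multiplier, and the 90% hit rate are illustrative assumptions, not quoted prices:

      # Toy blended-input-price calculation; all numbers are illustrative assumptions.
      def effective_input_price(base_per_mtok, cache_read_multiplier, cache_hit_rate):
          """Blended $/Mtok when a fraction of input tokens are served from the prompt cache."""
          cache_price = base_per_mtok * cache_read_multiplier
          return cache_hit_rate * cache_price + (1 - cache_hit_rate) * base_per_mtok

      # E.g. $3/Mtok list price, cache reads at 0.1x, 90% of an agentic prompt
      # (system prompt + earlier turns) hitting the cache:
      print(effective_input_price(3.00, 0.1, 0.9))  # ~0.57 $/Mtok, roughly 5x below list

    With a high hit rate the blended input price drops to a fraction of the list price, which is why comparing list prices alone understates the gap that caching closes.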

  • newyankee 2 days ago

    Yet, to be honest, if the US industry had not moved ahead and created the race with FOMO, it would not have been as easy for the Chinese strategy to work either.

    The nature of the race may yet change, though, and I am unsure whether the devil is in the details, as in very specific edge cases that will only work with frontier models?

  • fastball 2 days ago

    They're not that close (on things like LMArena) and being cheaper is pretty meaningless when we are not yet at the point where LLMs are good enough for autonomy.

  • mrinterweb 2 days ago

    I would expect one of the motivations for making these model weights open is to undermine the valuations of other players in the industry. Open models like this must diminish the value proposition of the frontier-focused companies if other companies can compete with similar results at competitive prices.

  • rprend 2 days ago

    People pay for products, not models. OpenAI and Anthropic make products (ChatGPT, Claude Code).

  • isamuel 2 days ago

    There is a great deal of orientalism --- it is genuinely unthinkable to a lot of American tech dullards that the Chinese could be better at anything requiring what they think of as "intelligence." Aren't they Communist? Backward? Don't they eat weird stuff at wet markets?

    It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask questions like what you ask of that era, the answers you get are genuinely not better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice."

    • mosselman 2 days ago

      Back when DeepSeek came out and people were tripping over themselves shouting that it was so much better than what was out there, it just wasn’t good.

      It might be that this model is super good, I haven’t tried it, but to say the Chinese models are better is just not true.

      What I really love though is that I can run them (open models) on my own machine. The other day I categorised images locally using Qwen, what a time to be alive.

      Further even than local hardware, open models make it possible to run on providers of choice, such as European ones. Which is great!

      So I love everything about the competitive nature of this.

      • CamperBob2 2 days ago

        If you thought DeepSeek "just wasn't good," there's a good chance you were running it wrong.

        For instance, a lot of people thought they were running "DeepSeek" when they were really running some random distillation on ollama.

    • stocksinsmocks a day ago

      The early stages of Barbarossa were very successful, and much of the Soviet Air Force, which had been forward-positioned for invasion, was destroyed. Given the Red Army’s attitude toward consent, I would keep the praise carefully measured. TV has taught us there are good guys and bad guys when the reality is closer to just bad guys and bad guys.

    • ecshafer 2 days ago

      I don't think that anyone, much less someone working in tech or engineering in 2025, could still hold beliefs about the Chinese not being capable scientists or engineers. I could maybe give a (naive) pass to someone in 1990 thinking China would never build more than junk. But in 2025 their production capacity, their scientific advancement, and just the number of us who have worked with extremely talented Chinese colleagues should dispel those notions. I think you are jumping to racism a bit fast here.

      Germany was right in some ways and wrong in others about the Soviet Union's strength. The USSR failed to conquer Finland because of the military purges. German intelligence vastly underestimated the number of tanks and the general preparedness of the Soviet army (Hitler was shocked the Soviets already had 40k tanks). The Lend-Lease Act sent an astronomical amount of goods to the USSR, which allowed them to fully commit to the war and focus on increasing their weapons production; the numbers on the tractors, food, trains, ammunition, etc. that the US sent to the USSR are staggering.

      • hnfong 2 days ago

        I don't think anyone seriously believes that the Chinese aren't capable, it's more like people believe no matter what happens, USA will still dominate in "high tech" fields. A variant of "American Exceptionalism" so to speak.

        This is kinda reflected in the stock market, where the AI stocks are surging to new heights every day, yet their Chinese equivalents are relatively lagging behind in stock price, which suggests that investors are betting heavily on the US companies to "win" this "AI race" (if there's any gains to be made by winning).

        Also, in the past couple years (or maybe a couple decades), there had also been a lot of crap talk about how China has to democratize and free up their markets in order to be competitive with the other first world countries, together with a bunch of "doomsday" predictions for authoritarianism in China. This narrative has completely lost any credibility, but the sentiment dies slowly...

    • breppp 2 days ago

      Not sure how the entire Nazi comparison plays out, but at the time there were good reasons to imagine the Soviets would fall apart (as they initially did).

      Stalin had just finished purging his entire officer corps, which is not a good omen for war, and the USSR had failed miserably against the Finns, who were not the strongest of nations, while Germany had just steamrolled France, a country that was much more impressive in WW1 than the Russians (who collapsed against Germany).

    • newyankee 2 days ago

      But didn't the Chinese already surpass the rest of the world in solar, batteries, and EVs, among other things?

      • cyberlimerence 2 days ago

        They did, but the goalposts keep moving, so to speak. We're approximately here : advanced semiconductors, artificial intelligence, reusable rockets, quantum computing, etc. Chinese will never catch up. /s

    • lukan 2 days ago

      "It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; ..."

      Ideology played a role, but the data they worked with was the Finnish war, which was disastrous for the Soviet side. Hitler later famously said it was all an intentional distraction to make them believe the Soviet army was worth nothing. (The real reasons were more complex, like the earlier purges.)

    • gazaim 2 days ago

      These Americans have no comprehension of intelligence being used to benefit humanity instead of being used to fund a CEO's new yacht. I encourage them to visit China to see how far the USA lags behind.

      • astrange a day ago

        Lags behind meaning we haven't covered our buildings in LEDs?

        America is mostly suburbs and car sewers but that's because the voters like it that way.

    • littlestymaar 2 days ago

      > It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed

      Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian Civil War experience) during the Moscow trial purges, the Germans almost succeeded.

      • gazaim 2 days ago

        > Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian Civil War experience) during the Moscow trial purges, the Germans almost succeeded.

        There were many counter revolutionaries among the leadership, even those conducting the purges. Stalin was like "ah fuck we're hella compromised." Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.

        • littlestymaar 2 days ago

          > There were many counter revolutionaries among the leadership

          Well, Stalin was, by far, the biggest counter-revolutionary in the Politburo.

          > Stalin was like "ah fuck we're hella compromised."

          There's no evidence that anything significant was compromised at that point, and clear evidence that Stalin was in fact medically paranoid.

          > Many revolutions fail in this step and often end up facing a CIA backed coup. The USSR was under constant siege and attempted infiltration since inception.

          Can we please not recycle 90-year-old Soviet propaganda? The Moscow trials being irrational self-harm was acknowledged by the USSR leadership as early as the fifties…

Havoc 2 days ago

Note the combination of a big frontier-level model and an MIT license.

nickstinemates a day ago

I am waiting for the first truly open model without any of the censorship built in.

I wonder how long it will take and how quickly it will try to get shut down.

  • naeq a day ago

    Most open models have been converted to uncensored versions. Search for the model name with the suffix "abliterated".

spullara 2 days ago

I hate that their model IDs don't change as they change the underlying model. I'm not sure how you can build on that.

  % curl https://api.deepseek.com/models \          
    -H "Authorization: Bearer ${DEEPSEEK_API_KEY}"  
  {"object":"list","data":[{"id":"deepseek-chat","object":"model","owned_by":"deepseek"},{"id":"deepseek-reasoner","object":"model","owned_by":"deepseek"}]}
  • KronisLV 2 days ago

    Oh hey, quality improvement without doing anything!

    (unless/until a new version gets worse for your use case)

  • hnfong 2 days ago

    Agree that having datestamps on model IDs is a good idea, but it's open source: you can download the weights and build on those. In the long run, this is better than the alternative of calling the API of a proprietary model and hoping it doesn't get deprecated.

  • deaux a day ago

    Anthropic has done similar before (changing model behavior on the same dated endpoint).

  • cherioo a day ago

    Allegedly DeepSeek is doing this because they don’t have enough GPUs to serve two models concurrently.

htrp 2 days ago

What is the ballpark VRAM/GPU requirement to run this?

  • rhdunn 2 days ago

    For just the model itself: 4 bytes per parameter at FP32, 2 at FP16/BF16, or 1 at FP8, e.g. ~685 GB at FP8. It will be smaller for quantizations, but I'm not sure how to estimate those.

    For a Mixture of Experts (MoE) model you only need to have the memory size of a given expert. There will be some swapping out as it figures out which expert to use, or to change expert, but once that expert is loaded it won't be swapping memory to perform the calculations.

    You'll also need space for the context window; I'm not sure how to calculate that either.
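    A rough back-of-the-envelope sketch of both pieces (weights and a naive KV cache); the bytes-per-parameter figures are the ones above, while the KV-cache shape numbers are made-up plain multi-head-attention values, so treat that part as an upper bound (MLA-style attention and the new sparse attention compress it heavily):

      def weight_memory_gb(params_billion, bytes_per_param):
          """Raw weight memory: FP32 = 4, FP16/BF16 = 2, FP8 = 1, ~0.5 for 4-bit quants."""
          return params_billion * bytes_per_param

      def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2, batch=1):
          """Naive KV-cache size: 2 (K and V) * layers * heads * head_dim * tokens * bytes."""
          return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value * batch / 1e9

      print(weight_memory_gb(685, 1))            # ~685 GB of weights at FP8

      # Hypothetical attention shape (not DeepSeek's actual MLA layout):
      # 61 layers, 128 KV heads of dim 128, 128K-token context.
      print(kv_cache_gb(61, 128, 128, 128_000))  # ~512 GB, which is why naive KV caches explode

    The context-window term is exactly what compressed-attention schemes attack, so real memory use for the cache is far below this naive figure.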

    • anvuong 2 days ago

      I think your understanding of MoE is wrong. Depending on the settings, each token can actually be routed to multiple experts (the so-called expert-choice architecture). This makes it easier to parallelize the inference (each expert on a different device, for example), but it's not as simple as keeping one expert in memory.

    • petu 2 days ago

      I think your idea of MoE is incorrect. Despite the name, they're not "experts" at anything in particular, and the active experts change more or less on each token, so swapping them into VRAM is not viable; they just get executed on the CPU (llama.cpp).

      • jodleif 2 days ago

        A common pattern is to offload (most of) the expert layers to the CPU. This combination is still quite fast even with slow system RAM, though obviously inferior to keeping everything in VRAM.
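        Rough arithmetic for why this works at all; the parameter counts are the headline figures for DeepSeek-V3-class models, and the bandwidth number is a ballpark assumption:

          # Back-of-the-envelope decode speed with routed experts kept in system RAM.
          active_params = 37e9    # parameters activated per token (small slice of the MoE)
          total_params  = 671e9   # total parameters
          bytes_per_p   = 1       # FP8 weights
          ram_bandwidth = 100e9   # bytes/s, roughly dual-channel DDR5 (assumed)

          # Each decoded token must stream the active weights through the CPU once:
          moe_tok_s   = ram_bandwidth / (active_params * bytes_per_p)  # ~2.7 tok/s
          dense_tok_s = ram_bandwidth / (total_params * bytes_per_p)   # ~0.15 tok/s
          print(f"MoE on CPU: ~{moe_tok_s:.1f} tok/s, dense of same size: ~{dense_tok_s:.2f} tok/s")

        The same numbers show why swapping experts in and out of VRAM per token (as suggested upthread) doesn't pay off: you would move roughly the same tens of gigabytes across the PCIe bus for every token anyway.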

BoorishBears 2 days ago

3.2-Exp came out in September: this is 3.2, along with a special checkpoint (DeepSeek-V3.2-Speciale) for deep reasoning that they're claiming surpasses GPT-5 and matches Gemini 3.0

https://x.com/deepseek_ai/status/1995452641430651132

  • deaux a day ago

    The assumption here is that 3.2 (without suffix) is an evolution of 3.2-Exp rather than the same model, but they don't seem to state explicitly anywhere whether they're actually different or whether they just made the same model GA.

johnxie a day ago

Cool to see open models catching up fast. For builders the real question is simple: which model gives you the tightest loop and the fewest surprises in production? Sometimes that is open, sometimes closed. The rest is noise.

twistedcheeslet 2 days ago

How capable are these models at tool calling?

  • segmondy a day ago

    Depends on whether they are trained for tool calling; this model is an experiment with a new architecture, training methods, etc. It's not designed for tool calling. If you want tool calling, you should look into DeepSeek-V3.1-Terminus.

  • potsandpans 2 days ago

    From some very brief experimentation with DeepSeek about 2 months ago, tool calling is very hit or miss. Claude appears to be the absolute best.

orena 2 days ago

Any results on FrontierMath or ARC?

lalassu 2 days ago

Disclaimer: I have not tested this yet.

I don't want to make big generalizations. But one thing I noticed with Chinese models, especially Kimi, is that they do very well on benchmarks but fail on vibe testing. It feels a bit like over-fitting to the benchmarks and less to the use cases.

I hope it's not the same here.

  • msp26 2 days ago

    K2 Thinking has immaculate vibes. Minimal sycophancy and a pleasant writing style while being occasionally funny.

    If it had vision and was better on long context I'd use it so much more.

  • CuriouslyC a day ago

    This was a bad problem with earlier Chinese (Qwen and Kimi K1 in particular) models, but the original DeepSeek delivered and GLM4.6 delivers. They don't diversify training as much as American labs so you'll find more edge cases and the interaction experience isn't quite as smooth, but the models put in work.

  • vorticalbox 2 days ago

    This used to happen with benchmarks on phones: manufacturers would tweak Android so benchmarks ran faster.

    I guess that's kind of how it is for any system that's trained to do well on benchmarks: it does well on them but is rubbish at everything else.

    • make3 2 days ago

      Yes, they turned off all energy-saving measures when benchmarking software activity was detected, which completely defeated the point of the benchmarks, because your phone is useless if it's very fast but the battery lasts one hour.

  • segmondy a day ago

    Weird, I have gone local for the last 2 years. I use Chinese models 90% of the time: Kimi K2 Thinking, DeepSeek V3.1 Terminus, Qwen3 and GLM 4.6. I'm not vibe testing them but really putting them to use, and they do keep up great.

  • nylonstrung a day ago

    My experience with DeepSeek and Kimi is quite the opposite: smarter than the benchmarks would imply.

    Whereas the benchmark gains seen from new OpenAI, Grok and Claude models don't feel accompanied by a vibe improvement.

  • not_that_d 2 days ago

    What is "Vibe testing"?

    • catigula 2 days ago

      He means capturing things that benchmarks don't. You can use Claude and GPT-5 back to back in a field they score nearly identically on, and you will notice several differences. This is the "vibe".

    • BizarroLand 2 days ago

      I would assume that it is testing how well and appropriately the LLM responds to prompts.

    • [removed] 2 days ago
      [deleted]
  • make3 2 days ago

    I would assume that a huge amount is spent on frontier models just making them nicer to interact with, as it is likely one of the main things that drives user engagement.

  • catigula 2 days ago

    This is why I stopped bothering to check out these models and, funnily enough, Grok.

sidcool a day ago

Can someone kind please ELI5 this paper?

  • HarHarVeryFunny 21 hours ago

    They've developed a sparse attention mechanism (which they document and release source code for) to increase model efficiency with long context, as needed for fast and cost-effective, extensive RL training for reasoning and agentic use.

    They've built a "stable & scalable" RL protocol, i.e. more capable RL training infrastructure.

    They've built a pipeline/process to generate synthetic data for reasoning and agentic training.

    These all combine to build an efficient model with extensive RL post-training for reasoning and agentic use, although they note work is still needed on both the base model (more knowledge) and post-training to match frontier performance.
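    For intuition on the sparse attention piece, here is a toy sketch of the general idea: a cheap low-dimensional "indexer" scores every cached token and full attention runs only over the top-k selected ones, so decode cost scales with k rather than with context length. Shapes and names are made up for illustration; this is not the paper's actual DSA code.

      import numpy as np

      def topk_sparse_attention(q, K, V, q_idx, K_idx, k=2048):
          """Toy top-k sparse attention (illustrative only, not DeepSeek's implementation)."""
          # 1. Cheap indexer: low-dimensional dot products score every cached token.
          scores = K_idx @ q_idx                       # shape (T,)
          keep = np.argsort(scores)[-k:]               # indices of the k best-scoring tokens

          # 2. Ordinary softmax attention, but only over the selected tokens.
          logits = (K[keep] @ q) / np.sqrt(q.shape[-1])
          w = np.exp(logits - logits.max())
          w /= w.sum()
          return w @ V[keep]

      # Hypothetical shapes: 100k cached tokens, head dim 128, a tiny 16-dim indexer.
      T, d, d_idx = 100_000, 128, 16
      rng = np.random.default_rng(0)
      out = topk_sparse_attention(rng.standard_normal(d), rng.standard_normal((T, d)),
                                  rng.standard_normal((T, d)),
                                  rng.standard_normal(d_idx), rng.standard_normal((T, d_idx)))
      print(out.shape)  # (128,) -- attended over only 2048 of the 100k cached tokens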

[removed] 2 days ago
[deleted]
catigula 2 days ago

To push back on the naivety I'm sensing here: I think it's a little silly to see a Chinese Communist Party-backed enterprise as somehow magnanimous and without ulterior, very harmful motives.

  • stared 2 days ago

    Do you think it is out of the goodness of their hearts that corporations support open source? E.g. Microsoft with VS Code and TypeScript, Meta with PyTorch and React, Google with Chromium and Go.

    Yet we (developers, users, human civilization) benefit from that.

    So yes, I cherish it when Chinese companies release open-source LLMs, be it because it fits their business model (the same way as for US companies) or from grants (the same way as a lot of EU-backed projects, e.g. Python, DuckDB, scikit-learn).

  • jascha_eng 2 days ago

    Oh, they need control of the models to be able to censor and to ensure that whatever happens with AI inside the country stays under their control. But the open-source part? I think they do it to mess with US investment and for the typical corporate open-source reasons: community, marketing, etc. But honestly, as a European with no serious domestic competitor, the messing with the US is something I can get behind.

    • catigula 2 days ago

      They're pouring money in to disrupt American AI markets and efforts. They do this in countless other fields. It's a model of massive state funding -> give it away at cut-rate prices -> dominate the market -> reap the rewards.

      It's a very transparent, consistent strategy.

      AI is a little different because it has geopolitical implications.

      • ForceBru 2 days ago

        When it's a competition among individual producers, we call it "a free market" and praise Hal Varian. When it's a competition among countries, it's suddenly threatening to "disrupt American AI markets and efforts". The obvious solution here is to pour money into LLM research too. Massive state funding -> provide SOTA models for free -> dominate the market -> reap the rewards (from the free models).

      • tokioyoyo 2 days ago

        I can’t believe I’m shilling for China in these comments, but how different is it when company A gets blank-check investments from VCs and wink-wink support from the government in the West? And AI labs in China had been getting internal funding within their companies for a while before the LLM era.

    • ptsneves 2 days ago

      This is the rare earth minerals dumping all over again: devalue to such a price as to make the market participants quit, so they can later have a strategic stranglehold on the supply.

      This is using open source in a bit of different spirit than the hacker ethos, and I am not sure how I feel about it.

      It is a kind of cheat on the fair market but at the same time it is also costly to China and its capital costs may become unsustainable before the last players fold.

      • coliveira 2 days ago

        > cheat on the fair market

        Can you really view this as a cheat when the US is throwing a trillion dollars in support of a supposedly "fair market"?

      • embedding-shape 2 days ago

        > This is using open source in a bit of different spirit than the hacker ethos, and I am not sure how I feel about it.

        It's a bit early to have any sort of feelings about it, isn't it? You're speaking in absolutes, but none of this is necessarily 100% true, as we don't know their intentions. And judging a group of individuals' intentions based on what their country seems to want, through the lens of a foreign country, usually doesn't land you on the right interpretation.

      • tokioyoyo 2 days ago

        I mentioned this before as well, but the AI competition within China doesn't care that much about the Western companies. The internal market is huge, and they know winner-takes-all in this space is real.

      • nextaccountic a day ago

        Where exactly is this fair market? Giant US companies love rules and regulations, but only when it benefits them (and they pay dearly for it)

      • Jedd 2 days ago

        > It is a kind of cheat on the fair market ...

        I am very curious about your definition and usage of 'fair' there, and whether you would call the LLM sector as it stands now, but hypothetically absent DeepSeek, say, a 'fair market'. (If not, why not?)

      • csomar a day ago

        Prosecutor, judge and jury? Do you have access to their minds to know their true intentions? This whole “DeepSeek is controlled by the CCP” idea is ridiculous. If you want to know how bad the CCP is at IT, check the government-backed banks.

        The way I see this, some tech teams in China have figured out that training and tuning LLMs is not that expensive after all, and that they can do it at a fraction of the cost. So they are doing it to enter a market previously dominated by US-only players.

      • DiogenesKynikos 2 days ago

        Are you by chance an OpenAI investor?

        We should all be happy about the price of AI coming down.

        • doctorwho42 2 days ago

          But the economy!!! /s

          Seriously though, our leaders are actively throwing everything and the kitchen sink into AI companies, in some vain attempt to become immortal or to own even more of the nation's wealth beyond what they already do, chasing some kind of neo-tech feudalism. Both are unachievable because they rely on a complex system that they clearly don't understand.

      • jsiepkes 2 days ago

        The way we fund the AI bubble in the west could also be described as: "kind of cheat on the fair market". OpenAI has never made a single dime of profit.

        • nylonstrung a day ago

          Yeah, and OpenAI's CPO was artificially commissioned as a Lt. Colonel in the US Army in conjunction with a $200M contract.

          It's absurd to say DeepSeek is CCP-controlled while ignoring the government connection here.

      • josh_p 2 days ago

        Isn’t it already well accepted that the LLM market exists in a bubble with a handful of companies artificially inflating their own values?

        ESH

      • jascha_eng 2 days ago

        Do they actually spend that much, though? I think they are getting similar results with far fewer resources.

        It's also a bit funny that providing free models is probably the most communist thing China has done in a long time.

      • CamperBob2 2 days ago

        Good luck making OpenAI and Google cry uncle. They have the US government on their side. They will not be allowed to fail, and they know it.

        What I appreciate about the Chinese efforts is that they are being forced to get more intelligence from less hardware, and they are not only releasing their work products but documenting the R&D behind them at least as well as our own closed-source companies do.

        A good reason to stir up dumping accusations and anti-China bias would be if they stopped publishing not just the open-source models, but the technical papers that go with them. Until that happens, I think it's better to prefer more charitable explanations for their posture.

      • deaux a day ago

        Ah, so exactly like Uber, Netflix, Microsoft, Amazon, Facebook and so on have done to the rest of the world over the last few decades then?

        Where do you think they learnt this trick? Years lurking on HN and this post's comment section wins #1 on the American Hypocrisy chart. Unbelievable that even in the current US people can't recognize when they're looking in the mirror. But I guess you're disincentivized to do so when most of your net worth stems from exactly those companies and those practices.

    • [removed] 2 days ago
      [deleted]
  • amunozo 2 days ago

    The motive is to destroy American supremacy in AI; it's not that deep. This is much easier to do by open-sourcing the models than by competing directly, and it can have good ramifications for everybody, even if the motive is "bad".

  • tehjoker 2 days ago

    The motive is to prevent US dominance of this space, which is a good thing.

    • mac-attack 16 hours ago

      And the next question is what they have done with power historically, and what they are liable to do in the future with said power. Limiting the scope to AI is shortsighted and doesn't speak to the concerns people have beyond an AI race.

      • tehjoker 16 hours ago

        It's a fair question, but my view of America's influence on world affairs has been dismal. China by contrast has not had a history of invading its neighbors, though I strongly criticize their involvement in the American attack on Cambodia and Vietnam (China supported the Khmer Rouge and briefly invaded Vietnam but was quickly pushed back, a reason Mao is sometimes criticized as having a good early period and a bad late period).

        Meanwhile, America has been causing death and destruction around the world. It's easy to make lists: Vietnam, Iraq, Gaza, Cuba, South and Central America etc etc.

  • [removed] 2 days ago
    [deleted]
nimchimpsky 2 days ago

Pretty amazing that a relatively small Chinese hedge fund can build AI better than almost anyone.

  • Havoc 2 days ago

    Yeah, they've consistently delivered. At the same time there are persistent whispers that they're not quite as small and scruffy as portrayed, either.

    • astrange a day ago

      Anthropic also said their development costs aren't very different.

Foobar8568 2 days ago

At least, there is no doubt where it is from! I asked in English and it answered in Chinese (translated below).

which version are you?

I am the latest version of the DeepSeek model! If you want the specific version number, I suggest you:

    Check the official documentation - the DeepSeek website and docs will have the most accurate version information

    Follow the official announcements - version updates are usually published through official channels

    Check the app store / web version - the interface usually shows the current version
I have all of DeepSeek's latest features, including:

    Strong conversation and reasoning capabilities

    128K context length

    File upload and processing (images, documents, etc.)

    Web search (needs to be enabled manually)

    Completely free to use
If you need the exact version number for technical integration or other specific purposes, it's best to consult the official technical documentation, which will have the most accurate and detailed specifications.

Is there anything else I can help you with?

  • schlauerfox 2 days ago

    It's so strange when it obviously hits a preprogrammed non-answer in these models; how can one ever trust them when there is a babysitter that interferes with the actual answer? I suppose asking it what version it is isn't a valid question in its training data, so it's programmed to say "check the documentation", but it is still definitely suspicious when it gives a non-answer.

wosined 2 days ago

Remember: If it is not peer-reviewed, then it is an ad.

  • Havoc 2 days ago

    Good general approach, but DeepSeek has thus far always delivered. And not just delivered, but under an open license too. "Ad" as the starting assumption seems overly harsh.

  • vessenes 2 days ago

    I mean... true. Also, DeepSeek has good cred so far for delivering roughly what their PR says they are delivering. My prior would be that their papers are generally credible.