storus 9 hours ago

Does Qwen3 allow adjusting context during an LLM call, or does the housekeeping need to be done before/after each call rather than while a single LLM call with multiple tool calls is in progress?

  • segmondy 9 hours ago

    Not applicable... the models just process whatever context you provide to them. Context management happens outside the model and depends on your inference tool/coding agent.
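
    A minimal sketch of what that housekeeping looks like from the client side, assuming a local OpenAI-compatible server and a hypothetical compact() helper (model name and trimming policy are illustrative, not anything Qwen-specific):

        # Sketch: context management lives in the client loop, not the model.
        # Assumes a local OpenAI-compatible server (llama.cpp, vLLM, etc.).
        from openai import OpenAI

        client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
        messages = [{"role": "system", "content": "You are a coding agent."}]

        def compact(history, max_msgs=20):
            # Hypothetical housekeeping: keep the system prompt plus the most
            # recent turns; real agents summarize or drop stale tool output.
            if len(history) <= max_msgs:
                return history
            return history[:1] + history[-(max_msgs - 1):]

        for user_turn in ["refactor foo.py", "now add tests"]:
            messages.append({"role": "user", "content": user_turn})
            messages = compact(messages)  # happens BETWEEN calls, client-side
            resp = client.chat.completions.create(
                model="qwen3-coder",      # placeholder model name
                messages=messages,        # the model only ever sees this list
            )
            messages.append({"role": "assistant",
                             "content": resp.choices[0].message.content})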

    • cyanydeez 7 hours ago

      It's interesting how people can be so into LLMs but don't, at the end of the day, understand that they're just passing "well formatted" text to a text processor; everything else is built around encoding/decoding it into familiar or novel interfaces.

      The instability of the tooling outside of the LLM is what keeps me from building anything on the cloud: you're attaching your knowledge and workflow to a tool that can both change dramatically based on context, cache, and model changes, and arbitrarily raise prices as "adaptable whales" push the cost up.

      It's akin to learning everything about Beanie Babies in the early 1990s, and right when you think you understand the value proposition, suddenly they're all worthless.

      • storus 5 hours ago

        That's why you can use the latest open coding models locally, which have reportedly reached the performance of Sonnet 4.5, so almost SOTA. And then you can use tricks like I mentioned above to directly manipulate GPU RAM for context cleanup when needed, which isn't possible with cloud models unless the provider enables it.

orliesaurus 10 hours ago

How can anyone keep up with all these releases... what's next? Sonnet 5?

  • gessha 10 hours ago

    Tune it out, come back in 6 months, the world is not going to end. In 6 months, you’re going to change your API endpoint and/or your subscription and then spend a day or two adjusting. Off to the races you go.

  • Squarex 10 hours ago

    Well there are rumors sonnet 5 is coming today, so...

  • Havoc 6 hours ago

    Pretty much every lab you can think of has something scheduled for February. Gonna be a wild one.

  • cmrdporcupine 6 hours ago

    This is going to be a crazy month because the Chinese labs are all trying to get their releases out prior to their holidays (Lunar New Year / Spring Festival).

    So we've seen a series of big ones already -- GLM 4.7 Flash, Kimi K2.5, StepFun 3.5, and now this. Still to come is likely a new DeepSeek model, which could be exciting.

    And then I expect the Big 3 (OpenAI/Google/Anthropic) will try to clog the airspace at the same time, to get in front of the potential competition.

  • bigyabai 8 hours ago

    Relatively, it's not that hard. There's like 4-5 "real" AI labs, who altogether manage to announce maybe 3 products max per month.

    Compared to RISC core designs or IC optimization, the pace of AI innovation is slow and easy to follow.

ltbarcly3 3 hours ago

Here's a tip: never name anything "new", "next", "neo", etc. You will have a problem when you try to name the thing that comes after it!

StevenNunez 8 hours ago

Going to try this over Kimi K2.5 locally. It was nice, but just a bit too slow and a resource hog.

endymion-light 11 hours ago

Looks great - I'll try to check it out on my gaming PC.

On a misc note: What's being used to create the screen recordings? It looks so smooth!

ossicones 10 hours ago

What browser use agent are they using here?

  • novaray 7 hours ago

    Yes, the general-purpose version is already supported and should have the identical architecture.

fudged71 9 hours ago

I'm thrilled. Picked up a used M4 Pro 64GB this morning. Excited to test this out

throwaw12 10 hours ago

We are getting there. As a next step, please release something to outperform Opus 4.5 and GPT-5.2 in coding tasks.

  • gordonhart 10 hours ago

    By the time that happens, Opus 5 and GPT-5.5 will be out. At that point will a GPT-5.2 tier open-weights model feel "good enough"? Based on my experience with frontier models, once you get a taste of the latest and greatest it's very hard to go back to a less capable model, even if that less capable model would have been SOTA 9 months ago.

    • cirrusfan 10 hours ago

      I think it depends on what you use it for. Coding, where time is money? You probably want the Good Shit, but you also want decent open-weights models to keep prices sane, rather than sama's $20k/month nonsense. Something like basic sentiment analysis? You can get good results out of a 30B MoE that runs at a good pace on a midrange laptop. Researching things online with many sources and decent results I'd expect to be doable locally by the end of 2026 if you have 128GB of RAM, although it'll take a while to resolve.
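
      A rough sketch of the sentiment case, assuming a llama.cpp-style server on localhost with an OpenAI-compatible API (model name and prompt are illustrative):

          from openai import OpenAI

          client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

          def sentiment(text: str) -> str:
              # One-word classification; a small local MoE handles this fine.
              resp = client.chat.completions.create(
                  model="qwen3-30b-a3b",  # placeholder: any ~30B MoE
                  messages=[
                      {"role": "system",
                       "content": "Reply with exactly one word: "
                                  "positive, negative, or neutral."},
                      {"role": "user", "content": text},
                  ],
                  temperature=0,
              )
              return resp.choices[0].message.content.strip().lower()

          print(sentiment("Great battery life, but the screen scratches easily."))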

      • bwestergard 10 hours ago

        What does it mean for U.S. AI firms if the new equilibrium is devs running open models on local hardware?

        • selectodude 10 hours ago

          OpenAI isn’t cornering the market on DRAM for kicks…

    • yorwba 10 hours ago

      When Alibaba succeeds at producing a GPT-5.2-equivalent model, they won't be releasing the weights. They'll only offer API access, like for the previous models in the Qwen Max series.

      Don't forget that they want to make money in the end. They release small models for free because the publicity is worth more than they could charge for them, but they won't just give away models that are good enough that people would pay significant amounts of money to use them.

    • tosh 10 hours ago

      It feels like the gap between open weight and closed weight models is closing though.

      • theshrike79 10 hours ago

        More like open local models are becoming "good enough".

        I got stuff done with Sonnet 3.7 just fine. It did need a bunch of babysitting, but it was still a net positive for productivity. Now local models are at that level, closing in on the current SOTA.

        When "anyone" can run an Opus 4.5 level model at home, we're going to be getting diminishing returns from closed online-only models.

    • thepasch 9 hours ago

      If an open-weights model is released that's as capable at coding as Opus 4.5, then there's very little reason not to offload the actual writing of code to open-weight subagents running locally and stick strictly to planning with Opus 5. That could get you masses more usage out of your plan (or cut down on API costs).

    • rglullis 10 hours ago

      I'm going in the opposite direction: with each new model, I try harder to optimize my existing workflows by breaking the tasks down, so that I can delegate to the less powerful models and only rely on the newer ones if the results are not acceptable.

    • rubslopes 7 hours ago

      I used to say that Sonnet 4.5 was all I would ever need, but now I exclusively use Opus...

    • littlestymaar 7 hours ago

      > Based on my experience with frontier models, once you get a taste of the latest and greatest it's very hard to go back to a less capable model, even if that less capable model would have been SOTA 9 months ago.

      That's the tyranny of comfort. Same for high-end cars, living in a big place, etc.

      There's a good workaround though: just don't try the luxury in the first place, so you can stay happy with the 9-month delay.

  • Keyframe 9 hours ago

    I'd be happy with something that's close to or the same as Opus 4.5 that I can run locally, at a reasonable (same) speed as the claude CLI, and at a reasonable budget (within $10-30k).

  • segmondy 9 hours ago

    Try Kimi K2.5 and DeepSeek-V3.2-Speciale.

  • IhateAI 9 hours ago

    Just code it yourself, you might surprise yourself :)

valcron1000 10 hours ago

Still nothing to compete with GPT-OSS-20B for local inference with 16GB of VRAM.

dzonga 6 hours ago

The Qwen website doesn't work for me in Safari :(. Had to read the announcement in Chrome.

kylehotchkiss 6 hours ago

Is there any online resource tracking local model capability on, say, a $2000 64GB Mac Mini? I'm getting increasingly excited about the local model space because it offers us a future where we can benefit from LLMs without having to listen to tech CEOs saber-rattle about ridding America of its jobs so they can get the next fundraising round sorted.

syntaxing 10 hours ago

Is the Qwen Next architecture ironed out in llama.cpp?

moron4hire 8 hours ago

My IT department is convinced these "ChInEsE cCcP mOdElS" are going to exfiltrate our entire corporate network of its essential fluids and vita.. erh, I mean data. I've tried explaining to them that it's physically impossible for model weights to make network requests on their own. Also, what happened to their MitM-style, extremely intrusive network monitoring that they insisted we absolutely needed?

cpill 6 hours ago

I wonder if we could have much smaller models if they trained on fewer languages? I.e., Python + YAML + JSON only, or even a single language, with a cluster of models loaded into memory dynamically...?

lysace 5 hours ago

Is it censored according to the wishes of the CCP?

  • mirekrusin 4 hours ago

    Who cares? If you don't like it, you can fine tune.

    • lysace 4 hours ago

      I think a lot of people care. Most decidedly not you.

Soerensen 10 hours ago

The agent orchestration point from vessenes is interesting - using faster, smaller models for routine tasks while reserving frontier models for complex reasoning.

In practice, I've found the economics work like this:

1. Code generation (boilerplate, tests, migrations) - smaller models are fine, and latency matters more than peak capability

2. Architecture decisions, debugging subtle issues - worth the cost of frontier models

3. Refactoring existing code - the model needs to "understand" before changing, so context and reasoning matter more

The 3B active parameters claim is the key unlock here. If this actually runs well on consumer hardware with reasonable context windows, it becomes the obvious choice for category 1 tasks. The question is whether the SWE-Bench numbers hold up for real-world "agent turn" scenarios where you're doing hundreds of small operations.
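
As a concrete (hypothetical) illustration of that split, a router can dispatch by task category; the model names and endpoints below are placeholders, not anything from the announcement:

    from openai import OpenAI

    # Category 1 goes to a cheap local model; 2 and 3 to a frontier API.
    ROUTES = {
        "codegen":      ("qwen3-coder-local", "http://localhost:8080/v1"),
        "architecture": ("frontier-model",    "https://api.example.com/v1"),
        "refactor":     ("frontier-model",    "https://api.example.com/v1"),
    }

    def run(task_type: str, prompt: str) -> str:
        model, base_url = ROUTES[task_type]
        client = OpenAI(base_url=base_url, api_key="...")
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content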

  • cirrusfan 10 hours ago

    I find it really surprising that you're fine with low-end models for coding - I went through a lot of open-weights models, local and "local", and I consistently found the results underwhelming. GLM-4.7 was the smallest model I found to be somewhat reliable, but that's a sizable 350B and stretches the definition of local-as-in-at-home.

    • NitpickLawyer 10 hours ago

      You're replying to a bot, fyi :)

      • CamperBob2 9 hours ago

        If it weren't for the single em-dash (really an en-dash, used as if it were an em-dash), how am I supposed to know that?

        And at the end of the day, does it matter?

        • axus 7 hours ago

          Some people reply for their own happiness, some reply to communicate with another person. The AI won't remember or care about the reply.