Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model

285 points by c4pt0r 2 days ago

GitHub: https://github.com/MoonshotAI/Kimi-K2

I tried Kimi on a few coding problems that Claude was spinning on. It’s good. It’s huge, way too big to be a “local” model (I think you need something like 16 H200s to run it), but it has a slightly different vibe than some of the other models. I liked it. It would definitely be useful in ensemble use cases at the very least.


summarity a day ago

Reasonable speeds are possible with 4-bit quants on two 512GB Mac Studios (MLX TB4 Ring - see https://x.com/awnihannun/status/1943723599971443134) or even a single socket Epyc system with >1TB of RAM (about the same real world memory throughput as the M Ultra). So $20k-ish to play with it.
For real-world speeds though yeah, you'd need serious hardware. This is more of a "deploy your own stamp" model, less a "local" model.
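
As a rough check on the sizing above, here is a back-of-the-envelope sketch of the weight footprint. It assumes roughly 1 trillion total parameters, a figure taken from the Kimi K2 model card rather than from this thread, and the helper name `quantized_weight_gb` is made up for illustration.

```python
def quantized_weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores the KV cache, activations, and runtime overhead.
    """
    return total_params * bits_per_weight / 8 / 1e9


# Assumption: ~1 trillion total parameters (model card figure, not from this thread).
TOTAL_PARAMS = 1.0e12

weights_gb = quantized_weight_gb(TOTAL_PARAMS, bits_per_weight=4)
unified_memory_gb = 2 * 512  # two 512GB Mac Studios, as described above

print(f"4-bit weights: ~{weights_gb:.0f} GB")
print(f"Memory left over: ~{unified_memory_gb - weights_gb:.0f} GB")
```

Real 4-bit formats also store per-group scales, so actual files run somewhat larger than this figure, and the KV cache needs room as well.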

- wongarsu 19 hours ago
  
  Reasonable speeds are possible if you pay someone else to run it. Right now both NovitaAI and Parasail are running it, both available through Openrouter and both promising not to store any data. I'm sure the other big model hosters will follow if there's demand.
  I may not be able to reasonably run it myself, but at least I can choose who I trust to run it and can have inference pricing determined by a competitive market. According to their benchmarks the model is about in a class with Claude 4 Sonnet, yet already costs less than one third of Sonnet's inference pricing.
  
  
  winter_blue 14 hours ago
  
  I’m actually finding Claude 4 Sonnet’s thinking model to be too slow to meet my needs. It literally takes several minutes per query on Cursor.
  So running it locally is the exact opposite of what I’m looking for.
  Rather, I’m willing to pay more, to have it be run on a faster than normal cloud inference machine.
  Anthropic is already too slow.
  Since this model is open source, maybe someone could offer it at a “premium” pay per use price, where the response rate / inference is done a lot faster, with more resources thrown at it.
  
- gpm a day ago
  
  > or even a single socket Epyc system with >1TB of RAM
  How many tokens/second would this likely achieve?
  
  kachapopopow 19 hours ago
  
  around 1 by the time you try to do anything useful with it (>10000 tokens)
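
For context on estimates like the one above, a bandwidth-bound ceiling on decode speed can be sketched as follows. This is only an upper bound under stated assumptions: roughly 32 billion active parameters per token (a model card figure, not from this thread), 4-bit weights, and memory bandwidth values that are hypothetical placeholders rather than measurements of any machine mentioned here. It ignores prompt processing and KV-cache reads, so real throughput at long contexts can land far below it.

```python
def decode_ceiling_tok_s(active_params: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/second if every active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumption: ~32 billion active parameters per token (model card figure).
ACTIVE_PARAMS = 32e9

# Hypothetical sustained memory bandwidths in GB/s, for illustration only.
for bandwidth in (200, 400, 800):
    ceiling = decode_ceiling_tok_s(ACTIVE_PARAMS, 4, bandwidth)
    print(f"{bandwidth} GB/s -> at most ~{ceiling:.1f} tok/s")
```

The gap between a ceiling like this and the reported figure of about 1 token/s is consistent with long prompts dominating the wall-clock time, though the sketch itself does not model that.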
  
  
  neuroelectron a day ago
  
  1
  
- refulgentis a day ago
  
  I write a local LLM client, but sometimes, I hate that local models have enough knobs to turn that people can advocate they're reasonable in any scenario - in yesterday's post re: Kimi k2, multiple people spoke up that you can "just" stream the active expert weights out of 64 GB of RAM, and use the lowest GGUF quant, and then you get something that rounds to 1 token/s, and that is reasonable for use.
  Good on you for not exaggerating.
  I am very curious what exactly they see in that. 2-3 people hopped in to handwave that you just have it do agent stuff overnight and it's well worth it. I can't even begin to imagine, unless you have a metric **-ton of easily solved problems that aren't coding. Even a 90% success rate gets you into "useless" territory quick when one step depends on the other, and you're running it autonomously for hours.
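
The compounding effect described above can be made concrete with a small sketch. It assumes each step succeeds independently with the same probability and that a single failure sinks the whole run, which is a simplification; the function name `chain_success` is made up for illustration.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a dependent chain succeeds.

    Assumes steps are independent and equally reliable.
    """
    return per_step ** steps


for steps in (1, 5, 10, 20, 50):
    print(f"{steps:>2} steps at 90% each: {chain_success(0.9, steps):.1%}")
```

Under these assumptions, a 90% per-step rate leaves about a 35% chance of finishing 10 dependent steps cleanly and about 12% for 20.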
  
  
  segmondy a day ago
  
  I do deepseek at 5tk/sec at home and I'm happy with it. I don't need to do agent stuff to gain from it, I was saving to eventually build out enough to run it at 10tk/sec, but with kimi k2, plan has changed and the savings continue with a goal to run it at 5 tk/sec at home.
  
- spaceman_2020 12 hours ago
  
  This is fairly affordable if you’re a business honestly
  
- tuananh a day ago
  
  looks very much usable for local usage.
  
handzhiev a day ago

I tried it a couple of times in comparison to Claude. Kimi wrote much simpler and more readable code than Claude's over-engineered solutions. It missed a few minor subtle edge cases that Claude took care of though.

airstrike a day ago

Claude what? Sonnet? 3.7? 3.5? Opus? 4?

nathan_compton a day ago

The first question I gave it was a fairly simple recreational math question that I asked it to code up for me, and its answer was outrageously wrong. In fairness, and to my surprise, OpenAI's model also failed at this task, although with some prompting it sort of got it.

moffkalast a day ago

Still pretty good, someone with enough resources could distil it down to a more manageable size for the rest of us.

Xmd5a 9 hours ago
I asked it to give me its opinion on a mail I'm writing. 95% of its content is quotes from famous authors, and the 5% I wrote is actually minimal glue in-between.
All the models I tested, which includes Sonnet 4, DeepSeekR1, 4o and Gemini 2.5 understand this isn't your normal email and what I ask is literary/philosophical criticism, not remarks about conventions, formatting or how to convey my message in a more impactful way.
Quick fix:
- Replace half the commas with periods.
- Let one metaphor bloom and then stop.
- Read it aloud; wherever you inhale, insert a line break.
Before:
> And if the collective imagination is so seized by it, it's because we are already - not in some other world - but in this very life, in a state of socio-, photo-, and videosynthesis.
After:
> The collective imagination is seized.
> We are not in another world.
> We are in this life, in a state of videosynthesis.
Yes, this quote is by Baudrillard. None of the other models fixated on the fact that it’s an email (I only used the word once in the prompt). My gut feeling is that this reflects not so much a lack of intelligence as a difference in model personality. Here's what it replied when I shared Gemini's analysis. The point was to have it understand that:
Gemini, second sentence: It is less a simple email than an intellectual and personal exploration still in the making.
Point deflected as soon as understood:
**Gemini is polite and pedagogical; I’m going to be blunt.**
1. **This isn’t an email, it’s a trip.** ...
2. **The content is good, the format is not.** ...
3. **Gemini is right about one thing:** ...
   * cut 80% of the material,
   * ask **one clear question**
   * put the rest in a separate Google Doc or PDF for posterity.
4. **Brutal summary:** Keep your draft for yourself - it’s valuable. But for the email, write a new one in **three sentences**:
   * thank them,
   * recall the context in one line,
   * ask the question.
You’ve already got Gemini’s answer to stroke your ego - now give me the short version you’re actually going to send.
The solution to sycophancy is not disparagement (misplaced criticism). The classical true/false positive/negative dilemma is at play here. I guess the bot got caught in the crossfire of 1°) its no-bullshit attitude (it can only be an attitude) 2°) preference for delivering blunt criticism over insincere flattery 3°) being a helpful assistant. Remove point 3°), and it could have replied: "I'm not engaging in this nonsense". Preserve it and it will politely suggest that you condense your bullshit text, because shorter explanations are better than long winding rants (it's probably in the prompt).

ozgune a day ago

This is a very impressive general-purpose LLM (in the same class as GPT-4o and the DeepSeek-V3 family). It’s also open source.

I think it hasn’t received much attention because the frontier shifted to reasoning and multi-modal AI models. In accuracy benchmarks, all the top models are reasoning ones:

https://artificialanalysis.ai/

If someone took Kimi k2 and trained a reasoning model with it, I’d be curious how that model performs.


GaggiX a day ago

>If someone took Kimi k2 and trained a reasoning model with it
I imagine that's what they are doing at MoonshotAI right now.

Alifatisk a day ago

Why haven’t Kimi’s current and older models been benchmarked and added to Artificial Analysis yet?


simonw 2 days ago

Pelican on a bicycle result: https://simonwillison.net/2025/Jul/11/kimi-k2/


ebiester 2 days ago

At this point, they have to be training it. At what point will you start using something else?

- simonw 2 days ago
  
  Once I get a picture that genuinely looks like a pelican riding a bicycle!
  
qmmmur 2 days ago

I'm glad we are looking to build nuclear reactors so we can do more of this...

- 1vuio0pswjnm7 37 minutes ago
  
  "I'm glad we are looking to build nuclear reactors so we can do more of this..."
  Does this actually mean "they", not "we"?
  
- sergiotapia a day ago
  
  me too - we must energymaxx. i want a nuclear reactor in my backyard powering everything. I want ac units in every room and my open door garage while i workout.
  
  
  GenerWork a day ago
  
  You're saying this in jest, but I would LOVE to have a nuclear reactor in my backyard that produced enough power to where I could have a minisplit for every room in my house, including the garage so I could work out in there.
  
csomar 2 days ago

Much better than that of Grok 4.

jug a day ago

That's perhaps the best one I've seen yet! For an open weight model, this performance is of course particularly remarkable and impactful.

_alex_ 2 days ago

wow!


exegeist a day ago

Technical strengths aside, I’ve been impressed with how non-robotic Kimi K2 is. Its personality is closer to Anthropic’s best: pleasant, sharp, and eloquent. A small victory over botslop prose.


orbital-decay 10 hours ago

I have a different experience in chatting/creative writing. It tends to overuse certain speech patterns without repeating them verbatim, and is strikingly close to the original R1 writing, without being "chaotic" like R1 - unexpected and overly dramatic sci-fi and horror story turns, "somewhere, X happens" at the end etc.
Interestingly enough, EQ-Bench/Creative Writing Bench doesn't spot this despite clearly having it in their samples. This makes me trust it even less.


ksec 7 hours ago

Kimi K2 is a large language model series developed by the Moonshot AI team.

Moonshot AI [1] (Moonshot; Chinese: 月之暗面; pinyin: Yuè Zhī Ànmiàn) is an artificial intelligence (AI) company based in Beijing, China. As of 2024, it has been dubbed one of China's "AI Tiger" companies by investors with its focus on developing large language models.

I guess everyone is up to date with AI stuff, but this is the first time I have heard of Kimi and Moonshot, and I was wondering where it is from. It wasn't obvious from a quick glance at the comments.