Comment by fzysingularity

Comment by fzysingularity 2 days ago

If I had to guess, the OpenAI open-source model got delayed because Kimi K2 stole their thunder and beat their numbers.

Someone at OpenAI did say it was too big to host at home, so you could be right. They will probably be benchmaxxing right now, searching for a few evals they can beat.

johnb231 a day ago

These are all "too big to host at home". I don't think that is the issue here.
https://github.com/MoonshotAI/Kimi-K2/blob/main/docs/deploy_...
"The smallest deployment unit for Kimi-K2 FP8 weights with 128k seqlen on mainstream H200 or H20 platform is a cluster with 16 GPUs with either Tensor Parallel (TP) or "data parallel + expert parallel" (DP+EP)."
16 GPUs costing ~$30k each. No one is running a ~$500k server at home.
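
The hardware figure above is simple multiplication. A minimal sketch, assuming the comment's rough ~$30k per-GPU estimate (not a quoted vendor price) and counting only the GPUs, not the rest of the server:

```python
# Back-of-the-envelope check of the hardware cost quoted in the comment.
GPUS_IN_SMALLEST_UNIT = 16   # from the Kimi-K2 deployment doc quoted above
PRICE_PER_GPU_USD = 30_000   # the commenter's rough estimate, an assumption


def cluster_hardware_cost(num_gpus: int, price_per_gpu: float) -> float:
    """Cost of the GPUs alone; chassis, networking and power are ignored."""
    return num_gpus * price_per_gpu


total = cluster_hardware_cost(GPUS_IN_SMALLEST_UNIT, PRICE_PER_GPU_USD)
print(f"GPUs alone: ${total:,.0f}")  # GPUs alone: $480,000
```

That lands at about $480k, which is consistent with the "~$500k" rounding in the comment.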

- weitendorf a day ago
  
  For most people, before it makes sense to just buy all the hardware yourself, you probably should be renting GPUs by the hour from the various providers serving that need. On Modal, I think it should cost about $72/hr to serve Kimi K2 https://modal.com/pricing
  Once that's running it can serve the needs of many users/clients simultaneously. It'd be too expensive and underutilized for almost any individual to use regularly, but it's not unreasonable for them to do it in short intervals just to play around with it. And it might actually be reasonable for a small number of students or coworkers to share a $70/hr deployment for ~40hr/week in a lot of cases; in other cases, that $70/hr expense could be shared across a large number of coworkers or product users if they use it somewhat infrequently.
  So maybe you won't host it at home, but it's actually quite feasible to self-host, and is it ever really worth physically hosting anything at home except as a hobby?
  
  apitman 6 hours ago
  
  How does multi-user work, and how many users could it handle concurrently? My only experience is running much smaller models, and they easily peg my GPU at ~90 tokens/s. So maybe I could run 5-10 users at <10t/s? Does software like llama.cpp and ollama handle this?
  
- pxc a day ago
  
  I think what GP means is that because the (hopefully) pending OpenAI release is also "too big to run at home", these two models may be close enough in size that they seem more directly comparable, meaning that it's even more important for OpenAI to outperform Kimi K2 on some key benchmarks.
  
- spaceman_2020 a day ago
  
  The real users for these open source models are businesses that want something on premises for data privacy reasons.
  Not sure if they’ll trust a Chinese model, but dropping $50-100k on a quantized model that replaces, say, 10 paralegals is worth it for a law firm.
  
  MaxPock a day ago
  
  An on-premises, open source Chinese model for my business, or a closed source American model from a company that's a defense contractor. Shouldn’t be too difficult a decision to make.
  
  apitman 6 hours ago
  
  Even if they provide the code/data and not just the weights, aren't you taking their word for it that the weights were trained using that code, and not modified? Or is there some way to verify that?
  
- ls612 a day ago
  
  This is a dumb question I know, but how expensive is model distillation? How much training hardware do you need to take something like this and create a 7B and 12B version for consumer hardware?
  
  johnb231 a day ago
  
  The process involves running the original model. You can rent these big GPUs for ~$10 per hour each, so a 16-GPU cluster is ~$160 per hour for as long as it takes.
  
  qeternity 21 hours ago
  
  You can rent H100s for $1.50/gpu/hr these days.
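
The rental figures quoted in the replies above can be compared with a little arithmetic. This is a rough sketch only: it assumes the 16-GPU count from the deployment doc applies at every quoted per-GPU rate (which may not hold for H100s, since they have less memory than the H200s the doc names), and the group size of 20 users is a hypothetical number chosen purely for illustration.

```python
GPUS = 16  # smallest Kimi-K2 FP8 deployment unit per the doc quoted in the thread


def cluster_rate(per_gpu_hourly: float, gpus: int = GPUS) -> float:
    """Hourly cost of renting the whole cluster at a given per-GPU rate."""
    return per_gpu_hourly * gpus


def per_user_weekly(cluster_hourly: float, hours_per_week: float, users: int) -> float:
    """Weekly cost per person when a shared deployment is split evenly."""
    return cluster_hourly * hours_per_week / users


print(cluster_rate(10.0))             # 160.0 -> the "~$160 per hour" estimate
print(cluster_rate(1.50))             # 24.0  -> at the quoted $1.50/gpu/hr
print(per_user_weekly(72.0, 40, 20))  # 144.0 -> $72/hr, 40 hr/week, 20 hypothetical users
```

At the lower rate, the per-hour cost of running the original model for distillation drops by more than a factor of six, though the total still depends on how many hours the job takes, which nobody in the thread states.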
  

cubefox a day ago

According to the benchmarks, Kimi K2 beats GPT-4.1 in many ways. So to "compete", OpenAI would have to release the GPT-4.1 weights, or a similar model. Which, I guess, they likely won't do.
