johnb231 2 days ago

These are all "too big to host at home". I don't think that is the issue here.

https://github.com/MoonshotAI/Kimi-K2/blob/main/docs/deploy_...

"The smallest deployment unit for Kimi-K2 FP8 weights with 128k seqlen on mainstream H200 or H20 platform is a cluster with 16 GPUs with either Tensor Parallel (TP) or "data parallel + expert parallel" (DP+EP)."

16 GPUs costing ~$30k each. No one is running a ~$500k server at home.

  • weitendorf 2 days ago

    For most people, before it makes sense to buy all the hardware yourself, you should probably rent GPUs by the hour from the various providers serving that need. On Modal, I think it should cost about $72/hr to serve Kimi K2: https://modal.com/pricing

    Once it's running, a deployment like that can serve many users/clients simultaneously. It'd be too expensive and underutilized for almost any individual to use regularly, but it's not unreasonable to spin it up in short intervals just to play around with it. And it might actually be reasonable for a small group of students or coworkers to share a ~$72/hr deployment for ~40hr/week; with infrequent use, that expense could be spread across an even larger number of coworkers or product users.

    So maybe you won't host it at home, but it's actually quite feasible to self-host, and is it ever really worth physically hosting anything at home except as a hobby?
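    The cost-sharing math above is easy to sanity-check. A minimal sketch, assuming the ~$72/hr Modal figure and a 40hr/week "business hours" schedule (the team sizes are just illustrative):

```python
# Back-of-envelope cost sharing for a rented Kimi K2 deployment.
# Assumptions: $72/hr for the whole deployment (Modal's listed pricing,
# per the comment above) and a shared 40hr/week schedule.
HOURLY_RATE = 72.0   # USD per hour for the whole deployment
HOURS_PER_WEEK = 40  # shared "business hours" schedule

weekly_cost = HOURLY_RATE * HOURS_PER_WEEK  # 2880.0 USD/week

for team_size in (5, 10, 20):
    per_user = weekly_cost / team_size
    print(f"{team_size} users -> ${per_user:.0f}/user/week")
```

    At 10 users that works out to roughly the price of a few per-seat SaaS subscriptions, which is why sharing makes the economics plausible.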

    • apitman 12 hours ago

      How does multi-user work, and how many users could it handle concurrently? My only experience is running much smaller models, and they easily peg my GPU at ~90 tokens/s. So maybe I could run 5-10 users at <10t/s? Does software like llama.cpp and ollama handle this?
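      For a rough lower bound: if one stream pegs the GPU at ~90 t/s, naive time-slicing gives each of N users 90/N t/s. In practice, servers that batch requests (vLLM, or llama.cpp's server with parallel slots) usually get higher aggregate throughput than the single-stream number, so this is a pessimistic floor, not a prediction:

```python
# Pessimistic per-user throughput floor under naive time-slicing.
# Assumption: ~90 tokens/s single-stream, as observed in the comment above.
# Continuous batching typically does better than this floor.
SINGLE_STREAM_TPS = 90  # tokens/s with one user

for users in (1, 5, 10):
    floor = SINGLE_STREAM_TPS / users
    print(f"{users} users -> >= {floor:.0f} t/s each (time-slicing floor)")
```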

    • [removed] a day ago
      [deleted]
  • pxc 2 days ago

    I think what GP means is that because the (hopefully) pending OpenAI release is also "too big to run at home", these two models may be close enough in size that they seem more directly comparable, meaning that it's even more important for OpenAI to outperform Kimi K2 on some key benchmarks.

    • [removed] 2 days ago
      [deleted]
  • spaceman_2020 a day ago

    The real users for these open source models are businesses that want something on premises for data privacy reasons

    Not sure if they’ll trust a Chinese model but dropping $50-100k for a quantized model that replaces, say, 10 paralegals is good enough for a law firm

    • MaxPock a day ago

      An on-premise, open-source Chinese model for my business, or a closed-source American model from a company that's a defense contractor? Shouldn't be too difficult a decision to make.

      • apitman 12 hours ago

        Even if they provide the code/data and not just the weights, aren't you taking their word for it that the weights were trained using that code, and not modified? Or is there some way to verify that?

        • MaxPock 5 hours ago

          I don't care. I'm hosting the LLM myself and I can train or modify it the way I like. I'll take this authoritarian's open source any day

  • ls612 2 days ago

    This is a dumb question I know, but how expensive is model distillation? How much training hardware do you need to take something like this and create a 7B and 12B version for consumer hardware?

    • johnb231 2 days ago

      The process involves running the original model to generate training data for the smaller one. You can rent these big GPUs for ~$10 per hour each, so a 16-GPU cluster is ~$160 per hour, for as long as it takes
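      A rough sketch of the rental cost, assuming 16 GPUs at ~$10/GPU-hour as above (the durations are just examples; the real number of hours depends entirely on how many teacher tokens you need to generate):

```python
# Rough rental cost for running the teacher model during distillation.
# Assumptions: 16 GPUs at ~$10/GPU-hour, per the comment above.
GPUS = 16
RATE_PER_GPU_HOUR = 10.0

hourly = GPUS * RATE_PER_GPU_HOUR  # 160.0 USD/hr
for hours in (24, 24 * 7):
    print(f"{hours} h -> ${hourly * hours:,.0f}")
```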