Comment by johnb231 a day ago

These are all "too big to host at home". I don't think that is the issue here.

https://github.com/MoonshotAI/Kimi-K2/blob/main/docs/deploy_...

"The smallest deployment unit for Kimi-K2 FP8 weights with 128k seqlen on mainstream H200 or H20 platform is a cluster with 16 GPUs with either Tensor Parallel (TP) or "data parallel + expert parallel" (DP+EP)."

16 GPUs costing ~$30k each. No one is running a ~$500k server at home.

weitendorf a day ago

For most people, before it makes sense to buy all the hardware yourself, you should probably be renting GPUs by the hour from the various providers that serve that need. On Modal, I think it should cost about $72/hr to serve Kimi K2: https://modal.com/pricing

Once that's running it can serve the needs of many users/clients simultaneously. It'd be too expensive and underutilized for almost any individual to use regularly, but it's not unreasonable for them to do it in short intervals just to play around with it. And it might actually be reasonable for a small number of students or coworkers to share a $70/hr deployment for ~40hr/week in a lot of cases; in other cases, that $70/hr expense could be shared across a large number of coworkers or product users if they use it somewhat infrequently.
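As a rough illustration of the sharing math, here is a minimal Python sketch. It assumes the ~$72/hr figure above is a flat hourly rate billed only while the deployment is up, and it ignores storage, egress, and startup time; the function names and the 10-person team size are hypothetical. It also computes a naive break-even against the ~$480k (16 × $30k) hardware figure from the parent comment, ignoring power, cooling, hosting, depreciation, and resale value.

```python
# Rough cost-sharing sketch. Assumptions (hypothetical, not from any provider's
# pricing API): a flat hourly rate, billed only while the deployment is up.

def weekly_cost(rate_per_hr: float, hours_per_week: float) -> float:
    """Total rental cost for one week of use."""
    return rate_per_hr * hours_per_week


def per_user_weekly_cost(rate_per_hr: float, hours_per_week: float, users: int) -> float:
    """Weekly cost split evenly across `users` people."""
    if users <= 0:
        raise ValueError("users must be positive")
    return weekly_cost(rate_per_hr, hours_per_week) / users


def breakeven_hours(hardware_cost: float, rate_per_hr: float) -> float:
    """Rental hours whose total cost equals the hardware purchase price.

    Ignores power, cooling, hosting, depreciation, and resale value.
    """
    return hardware_cost / rate_per_hr


if __name__ == "__main__":
    rate = 72.0                 # ~$/hr estimate quoted above
    hours = 40.0                # ~40 hr/week of use
    team = 10                   # hypothetical team size
    hardware = 16 * 30_000.0    # 16 GPUs at ~$30k each

    print(f"weekly: ${weekly_cost(rate, hours):,.0f}")
    print(f"per user: ${per_user_weekly_cost(rate, hours, team):,.0f}/week")
    hrs = breakeven_hours(hardware, rate)
    print(f"break-even: {hrs:,.0f} hours (~{hrs / hours:,.0f} weeks at {hours:.0f} hr/week)")
```

Under these assumptions, 10 people sharing ~40 hours a week pay about $288 each per week, and the rental total only reaches the ~$480k purchase price after roughly 6,700 hours (about 167 weeks at 40 hr/week).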

So maybe you won't host it at home, but it's actually quite feasible to self-host. And is it ever really worth physically hosting anything at home, except as a hobby?

  • apitman 6 hours ago

    How does multi-user work, and how many users could it handle concurrently? My only experience is running much smaller models, and they easily peg my GPU at ~90 tokens/s. So maybe I could run 5-10 users at <10t/s? Does software like llama.cpp and ollama handle this?

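apitman's estimate can be written as a back-of-envelope formula. The sketch below assumes the GPU's single-stream decode rate is simply divided evenly among concurrent users (pure time-slicing). The `batching_gain` parameter is a hypothetical knob for how much aggregate throughput rises when requests are batched together; batched decoding usually pushes aggregate tokens/s above the single-stream rate, so the default of 1.0 is a pessimistic bound, not a measured value.

```python
def per_user_tokens_per_sec(single_stream_tps: float, users: int,
                            batching_gain: float = 1.0) -> float:
    """Naive per-user decode speed when `users` share one GPU.

    batching_gain: hypothetical ratio of aggregate throughput with `users`
    concurrent streams to single-stream throughput. 1.0 means no gain
    (pure time-slicing); real batched servers are often higher.
    """
    if users <= 0:
        raise ValueError("users must be positive")
    return single_stream_tps * batching_gain / users


if __name__ == "__main__":
    # ~90 tokens/s single-stream, as in the comment above.
    for n in (1, 5, 10):
        print(n, "users:", per_user_tokens_per_sec(90.0, n), "tok/s each")
```

With pure time-slicing, 5 users get ~18 tok/s each and 10 users get ~9 tok/s each, which lines up with the "<10t/s" guess at the upper end; any real batching gain only improves on that.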
pxc a day ago

I think what GP means is that because the (hopefully) pending OpenAI release is also "too big to run at home", these two models may be close enough in size that they seem more directly comparable, meaning that it's even more important for OpenAI to outperform Kimi K2 on some key benchmarks.

spaceman_2020 a day ago

The real users for these open source models are businesses that want something on premises for data privacy reasons.

Not sure if they’ll trust a Chinese model, but dropping $50-100k on a quantized model that replaces, say, 10 paralegals is good enough for a law firm.

  • MaxPock a day ago

    An on-premise, open-source Chinese model for my business, or a closed-source American model from a company that's a defense contractor. Shouldn’t be too difficult a decision to make.

    • apitman 6 hours ago

      Even if they provide the code/data and not just the weights, aren't you taking their word for it that the weights were trained using that code, and not modified? Or is there some way to verify that?

ls612 a day ago

This is a dumb question, I know, but how expensive is model distillation? How much training hardware do you need to take something like this and create 7B and 12B versions for consumer hardware?

  • johnb231 a day ago

    The process involves running the original model. You can rent these big GPUs for ~$10 per hour each, so with 16 of them that's ~$160 per hour for as long as it takes.

    • qeternity 21 hours ago

      You can rent H100s for $1.50/gpu/hr these days.
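The two quotes differ mainly in per-GPU price. The sketch below assumes the same 16-GPU cluster size as the deployment guide quoted at the top of the thread and a hypothetical 100-hour job; the actual length of a distillation run is unknown, and the sketch does not check whether 16 H100s have enough memory for this model.

```python
def cluster_cost_per_hour(num_gpus: int, price_per_gpu_hr: float) -> float:
    """Hourly cost of renting `num_gpus` GPUs at a flat per-GPU price."""
    return num_gpus * price_per_gpu_hr


def job_cost(num_gpus: int, price_per_gpu_hr: float, hours: float) -> float:
    """Total rental cost of a job that keeps the whole cluster busy for `hours`."""
    return cluster_cost_per_hour(num_gpus, price_per_gpu_hr) * hours


if __name__ == "__main__":
    gpus = 16          # cluster size from the deployment guide
    job_hours = 100.0  # hypothetical job length
    for label, price in (("~$10/GPU-hr", 10.0), ("$1.50/GPU-hr", 1.50)):
        print(f"{label}: ${cluster_cost_per_hour(gpus, price):,.2f}/hr, "
              f"${job_cost(gpus, price, job_hours):,.0f} for {job_hours:.0f} hours")
```

At $1.50/GPU-hr, the same 16-GPU cluster costs about $24/hr instead of ~$160/hr, so any given job length is roughly 6.7× cheaper.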