johnb231 2 days ago

These are all "too big to host at home". I don't think that is the issue here.

https://github.com/MoonshotAI/Kimi-K2/blob/main/docs/deploy_...

"The smallest deployment unit for Kimi-K2 FP8 weights with 128k seqlen on mainstream H200 or H20 platform is a cluster with 16 GPUs with either Tensor Parallel (TP) or "data parallel + expert parallel" (DP+EP)."

16 GPUs costing ~$30k each. No one is running a ~$500k server at home.

  • weitendorf 2 days ago

    For most people, before it makes sense to buy all the hardware yourself, you should probably rent GPUs by the hour from the various providers serving that need. On Modal, I think it should cost about $72/hr to serve Kimi K2: https://modal.com/pricing

    Once it's running, a deployment like that can serve many users/clients simultaneously. It'd be too expensive and underutilized for almost any individual to use regularly, but it's not unreasonable to spin it up in short intervals just to play around with it. And it might actually be reasonable for a small group of students or coworkers to share a ~$72/hr deployment for ~40hr/week; with infrequent use, that expense could be spread across an even larger number of coworkers or product users.

    So maybe you won't host it at home, but it's actually quite feasible to self-host, and is it ever really worth physically hosting anything at home except as a hobby?
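    The cost-sharing math above is easy to sanity-check. A minimal sketch, assuming the ~$72/hr Modal figure and a 40hr/week "business hours" schedule (the team sizes are just illustrative):

```python
# Back-of-envelope cost sharing for a rented Kimi K2 deployment.
# Assumptions: $72/hr for the whole deployment (Modal's listed pricing,
# per the comment above) and a shared 40hr/week schedule.
HOURLY_RATE = 72.0   # USD per hour for the whole deployment
HOURS_PER_WEEK = 40  # shared "business hours" schedule

weekly_cost = HOURLY_RATE * HOURS_PER_WEEK  # 2880.0 USD/week

for team_size in (5, 10, 20):
    per_user = weekly_cost / team_size
    print(f"{team_size} users -> ${per_user:.0f}/user/week")
```

    At 10 users that works out to roughly the price of a few per-seat SaaS subscriptions, which is why sharing makes the economics plausible.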

    • apitman 12 hours ago

      How does multi-user work, and how many users could it handle concurrently? My only experience is running much smaller models, and they easily peg my GPU at ~90 tokens/s. So maybe I could run 5-10 users at <10t/s? Does software like llama.cpp and ollama handle this?
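      For a rough lower bound: if one stream pegs the GPU at ~90 t/s, naive time-slicing gives each of N users 90/N t/s. In practice, servers that batch requests (vLLM, or llama.cpp's server with parallel slots) usually get higher aggregate throughput than the single-stream number, so this is a pessimistic floor, not a prediction:

```python
# Pessimistic per-user throughput floor under naive time-slicing.
# Assumption: ~90 tokens/s single-stream, as observed in the comment above.
# Continuous batching typically does better than this floor.
SINGLE_STREAM_TPS = 90  # tokens/s with one user

for users in (1, 5, 10):
    floor = SINGLE_STREAM_TPS / users
    print(f"{users} users -> >= {floor:.0f} t/s each (time-slicing floor)")
```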

    • [removed] a day ago
      [deleted]
  • pxc 2 days ago

    I think what GP means is that because the (hopefully) pending OpenAI release is also "too big to run at home", these two models may be close enough in size that they seem more directly comparable, meaning that it's even more important for OpenAI to outperform Kimi K2 on some key benchmarks.

    • [removed] 2 days ago
      [deleted]
  • spaceman_2020 a day ago

    The real users for these open source models are businesses that want something on premises for data privacy reasons

    Not sure if they’ll trust a Chinese model but dropping $50-100k for a quantized model that replaces, say, 10 paralegals is good enough for a law firm

    • MaxPock a day ago

      An on-premise, open-source Chinese model for my business, or a closed-source American model from a company that's a defense contractor? Shouldn't be too difficult a decision to make.

      • apitman 12 hours ago

        Even if they provide the code/data and not just the weights, aren't you taking their word for it that the weights were trained using that code, and not modified? Or is there some way to verify that?

        • MaxPock 5 hours ago

          I don't care. I'm hosting the LLM myself and I can train or modify it the way I like. I'll take this authoritarian's open source any day

  • ls612 2 days ago

    This is a dumb question I know, but how expensive is model distillation? How much training hardware do you need to take something like this and create a 7B and 12B version for consumer hardware?

    • johnb231 2 days ago

      The process involves running the original model to generate training data for the smaller one. You can rent these big GPUs for ~$10 per hour each, so a 16-GPU cluster is ~$160 per hour, for as long as it takes
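      A rough sketch of the rental cost, assuming 16 GPUs at ~$10/GPU-hour as above (the durations are just examples; the real number of hours depends entirely on how many teacher tokens you need to generate):

```python
# Rough rental cost for running the teacher model during distillation.
# Assumptions: 16 GPUs at ~$10/GPU-hour, per the comment above.
GPUS = 16
RATE_PER_GPU_HOUR = 10.0

hourly = GPUS * RATE_PER_GPU_HOUR  # 160.0 USD/hr
for hours in (24, 24 * 7):
    print(f"{hours} h -> ${hourly * hours:,.0f}")
```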