Comment by viraptor
How well separated are the experts per domain in a model like that? Specifically, if I'm only interested in programming use, could we strip it down to one or two of them? Or should I assume a much wider spread? (And there would be some overlap anyway from the original root model.)
My experience is that experts are not separated in any intuitive way. I would be very interested (and surprised) if someone manages to prune a majority of experts in a way that preserves model capabilities in a specific domain but not others.
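A rough way to check this yourself is to profile which experts the router actually picks on domain-specific prompts. Here's a minimal sketch, assuming a Mixtral-style Hugging Face checkpoint that exposes router logits; the model id and prompts are placeholders, not something from this thread:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"  # placeholder MoE checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Domain-specific probes (here: programming-flavoured prompts)
prompts = ["def quicksort(arr):", "SELECT name FROM users WHERE"]

counts = None
for p in prompts:
    inputs = tok(p, return_tensors="pt").to(model.device)
    out = model(**inputs, output_router_logits=True)
    # out.router_logits: one (tokens, num_experts) tensor per MoE layer
    for layer_logits in out.router_logits:
        # Mixtral routes each token to its top-2 experts
        chosen = layer_logits.topk(2, dim=-1).indices.flatten()
        layer_counts = torch.bincount(chosen, minlength=layer_logits.shape[-1])
        counts = layer_counts if counts is None else counts + layer_counts

print(counts)  # near-uniform counts would suggest experts aren't cleanly domain-separated
```

If the histogram is close to uniform across experts even on narrow, single-domain prompts, that's evidence against being able to prune most experts without hurting that domain.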
See https://github.com/peteryuqin/Kimi-K2-Mini, a project that keeps only a small portion of the experts and layers while preserving the model's capabilities across multiple domains.