Comment by viraptor

Comment by viraptor 2 days ago

0 replies

Sounds like dumping the routing information from programming questions would answer that... I guess I can do a dump from qwen or deepseek locally. You'd think someone would created that kind of graph already, but I couldn't find one.

What I did find instead is that some MoE models are explicitly domain-routed (MoDEM), but it doesn't apply to deepseek which is just equally load balanced, so it's unlikely to apply to Kimi. On the other hand, https://arxiv.org/html/2505.21079v1 shows modality preferences between experts, even in mostly random training. So maybe there's something there.