Comment by phi-go
Does this have a compute benefit or could one use different specialized LLM architectures / models for the subnetworks?