Comment by lhl 14 days ago

You're right on ratios, but the ratio is actually much worse than 6:1 since both are MoEs. The 20B has 3.6B active parameters, and the 120B has only 5.1B active, only about 40% more (roughly 1.4:1)!
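
For anyone who wants to check the arithmetic, here's a quick sketch. It uses only the figures quoted above, and treats the nominal 20B/120B names as the total sizes, which is an assumption (the exact totals may differ a bit). The variable names are just for illustration.

```python
# Figures as quoted in the comment, in billions of parameters.
# Assumption: the nominal model names (20B, 120B) stand in for total size.
total = {"20B": 20.0, "120B": 120.0}
active = {"20B": 3.6, "120B": 5.1}  # active parameters per token

total_ratio = total["120B"] / total["20B"]
active_ratio = active["120B"] / active["20B"]
extra_active_pct = (active_ratio - 1) * 100

print(f"total ratio:  {total_ratio:.1f}:1")
print(f"active ratio: {active_ratio:.2f}:1")
print(f"extra active: {extra_active_pct:.0f}%")
```

So on total parameters it's 6:1, but on active parameters it's closer to 1.4:1.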