Comment by meatmanek
Generation time is more or less proportional to tokens * model size, so if you can get the same quality result with fewer tokens from the same size of model, then you save time and money.
Generation time is more or less proportional to tokens * model size, so if you can get the same quality result with fewer tokens from the same size of model, then you save time and money.
Thanks. That was not obvious to me either.