Comment by gojomo
Some might prefer the fidelity of this method's 30% savings (70% of the original size) over the lossiness of 4-bit quantization's 75% savings.
And maybe the methods stack, for those willing to accept both costs in exchange for the smallest representation.
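For concreteness, here is a rough back-of-the-envelope sketch of the sizes being compared. The 7B parameter count is a hypothetical example and the 16-bit baseline is an assumption. The "stacked" figure simply assumes the lossless ratio would carry over to already-quantized weights, which nothing here establishes.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9       # hypothetical 7B-parameter model (assumption)
BASELINE_BITS = 16   # assumed 16-bit baseline weights
LOSSLESS_RATIO = 0.70  # lossless method keeps 70% of the size (30% savings)

baseline = model_size_gb(N_PARAMS, BASELINE_BITS)
lossless = baseline * LOSSLESS_RATIO
int4 = model_size_gb(N_PARAMS, 4)
# Speculative: assumes the same lossless ratio applies on top of 4-bit weights.
stacked = int4 * LOSSLESS_RATIO


def savings(size_gb: float) -> float:
    """Fraction of the baseline size that is saved."""
    return 1 - size_gb / baseline


for name, size in [("baseline", baseline), ("lossless", lossless),
                   ("4-bit", int4), ("stacked (speculative)", stacked)]:
    print(f"{name:>22}: {size:5.2f} GB, savings {savings(size):.0%}")
```

Whether the stacked number is reachable depends on how compressible the quantized weights actually are, so treat that last row as an upper-bound guess rather than a result.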
This is only a 30% savings, which is a cool technical feat but hard to see a use case for.