Comment by MisterTea 18 hours ago
This is something I have been curious about: how an LLM achieves compression.
I would like to know what deviations show up in the output, as this almost feels like a game of telephone where each re-compression loses some data, which is then incorrectly reconstructed. Sort of like misremembering a story: as you retell it over time, the details change slightly.
When LLMs predict the next token, they actually produce a probability distribution over all possible next tokens, and the sampler picks one of them, not necessarily the most likely one!
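A minimal sketch of that distinction, with made-up logits over a toy vocabulary (none of these numbers come from a real model):

```python
import math, random

# Hypothetical logits over a tiny vocabulary (stand-ins, not real model output).
logits = {"cat": 2.1, "dog": 1.9, "pizza": -0.5}

# Softmax turns logits into a probability distribution over next tokens.
z = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / z for tok, v in logits.items()}

# Greedy decoding would always take the argmax ("cat" here)...
greedy = max(probs, key=probs.get)

# ...but a typical sampler draws from the distribution, so "dog" or even
# "pizza" can come out on any given step. That per-step randomness is where
# the "telephone" drift in the parent comment comes from.
sampled = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(greedy, sampled)
```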
If instead you run the LLM's prediction and then encode the actual next token of the input text by its position in the cumulative distribution (a number in [0, 1]) using arithmetic coding, you can run the same operation in reverse to achieve lossless compression.
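Here is a toy sketch of the round trip, assuming a stand-in `toy_model` in place of an LLM and exact `Fraction` arithmetic instead of a practical streaming coder. The point is only the structure: the encoder narrows an interval using the model's per-step distribution, and the decoder recovers the tokens by re-running the identical model.

```python
from fractions import Fraction

def encode(symbols, model):
    """Arithmetic-encode `symbols` using per-step distributions from `model`.

    `model(prefix)` must return an ordered dict {symbol: Fraction probability}
    summing to 1; in the real scheme this would be the LLM's (deterministic)
    next-token distribution. The length is transmitted separately here; a real
    coder would use an end-of-sequence token instead.
    """
    low, high = Fraction(0), Fraction(1)
    for i, sym in enumerate(symbols):
        probs = model(symbols[:i])
        width = high - low
        cum = Fraction(0)
        for s, p in probs.items():
            if s == sym:
                # Narrow the interval to this symbol's slice.
                low, high = low + width * cum, low + width * (cum + p)
                break
            cum += p
    # Any number in [low, high) identifies the sequence; use the midpoint.
    return (low + high) / 2

def decode(code, length, model):
    """Recover the symbols by re-running the exact same model."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        probs = model(out)
        width = high - low
        cum = Fraction(0)
        for s, p in probs.items():
            s_low = low + width * cum
            s_high = s_low + width * p
            if s_low <= code < s_high:
                out.append(s)
                low, high = s_low, s_high
                break
            cum += p
    return out

# Toy stand-in for the LLM: a fixed two-symbol "model" whose prediction depends
# on the previous symbol. Any model works, as long as encoder and decoder see
# bit-identical probabilities at every step.
def toy_model(prefix):
    if prefix and prefix[-1] == "a":
        return {"a": Fraction(3, 4), "b": Fraction(1, 4)}
    return {"a": Fraction(1, 2), "b": Fraction(1, 2)}

msg = list("abba")
code = encode(msg, toy_model)
assert decode(code, len(msg), toy_model) == msg
```

The better the model predicts the text, the wider each chosen slice is and the fewer bits the final number needs, which is exactly how good prediction turns into good compression.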
The tricky part is ensuring that your LLM executes absolutely deterministically, because the encoder and decoder must compute exactly the same probability distribution at each step.
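One hedged sketch of what that looks like in PyTorch with a Hugging Face-style causal LM (GPT-2 here just as an example). These flags reduce nondeterminism on a single machine, but bit-identical results across different GPUs, drivers, or library versions are still not guaranteed, so in practice the encoding and decoding sides want matching hardware and software too.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.use_deterministic_algorithms(True)  # fail loudly on nondeterministic kernels
torch.backends.cudnn.benchmark = False    # avoid autotuned kernel selection

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # eval() disables dropout

with torch.no_grad():
    ids = tok("The quick brown", return_tensors="pt").input_ids
    # The distribution the coder needs: P(next token | prefix), computed
    # identically on the encoding and decoding side.
    next_probs = torch.softmax(model(ids).logits[:, -1, :].double(), dim=-1)
```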