Comment by senderista 8 hours ago
Indeed, the KL divergence can be seen as the difference in the average number of bits needed to arithmetically encode a sample from a given distribution when the symbol probabilities come from an approximating distribution versus the original distribution itself, i.e. the extra bits per symbol you pay for using the approximation.
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_diver...
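For concreteness, a minimal sketch (with made-up distributions P and Q) of the identity behind this reading: KL(P || Q) = cross-entropy(P, Q) - entropy(P), the extra bits per symbol when coding against Q instead of P.

```python
import math

# Hypothetical example distributions (not from the original comment).
P = [0.5, 0.25, 0.25]   # true source distribution
Q = [0.4, 0.4, 0.2]     # approximating distribution used to build the code

# Optimal bits/symbol when coding with the true probabilities P.
entropy_P = -sum(p * math.log2(p) for p in P)

# Bits/symbol actually paid when coding samples from P with Q's probabilities.
cross_entropy_PQ = -sum(p * math.log2(q) for p, q in zip(P, Q))

# Direct definition of KL(P || Q) in bits.
kl_direct = sum(p * math.log2(p / q) for p, q in zip(P, Q))

print(cross_entropy_PQ - entropy_P)  # extra bits/symbol from using Q's code
print(kl_direct)                     # same value, up to floating-point error
```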