Comment by eru 14 hours ago

1 reply

Yes.

When you train your neural network to minimise cross-entropy, that's literally the same as making it a better building block for an arithmetic-coding data compressor. See https://en.wikipedia.org/wiki/Arithmetic_coding

See also https://learnandburn.ai/p/an-elegant-equivalence-between-llm...
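A quick way to see the equivalence: the per-token cross-entropy loss (measured in bits) is exactly the code length an ideal arithmetic coder would spend on that token under the model's predicted distribution. A minimal sketch with made-up predictions over a 4-symbol alphabet (the numbers are hypothetical, not from any real model):

```python
import numpy as np

# Hypothetical per-step model predictions (each row a softmax output)
# and the symbols that actually occurred.
preds = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])
symbols = [0, 1, 3]

# Training objective: summed cross-entropy of the observed symbols, in bits.
loss_bits = -sum(np.log2(preds[t, s]) for t, s in enumerate(symbols))

# Arithmetic coding: each symbol shrinks the coder's interval by its
# predicted probability; the final interval width fixes the code length.
width = 1.0
for t, s in enumerate(symbols):
    width *= preds[t, s]
ideal_code_bits = -np.log2(width)

print(loss_bits, ideal_code_bits)  # identical (~4.57 bits), up to a couple
                                   # of bits of coder overhead in practice
```

So lowering the training loss is the same thing as shortening the compressed output.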

senderista 8 hours ago

Indeed, the KL divergence can be seen as the difference between the average number of bits required to arithmetically encode a sample from a given distribution using symbol probabilities from an approximating distribution, and the average number required using the original distribution itself.

https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_diver...
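In symbols: coding samples from P with Q's probabilities costs the cross-entropy H(P, Q) bits per symbol on average, coding with P itself costs the entropy H(P), and D_KL(P || Q) = H(P, Q) - H(P) is the penalty for using the wrong code. A small numerical sketch with made-up distributions P and Q:

```python
import numpy as np

# Hypothetical true distribution P and approximating distribution Q
# over the same 4-symbol alphabet.
P = np.array([0.5, 0.25, 0.125, 0.125])
Q = np.array([0.25, 0.25, 0.25, 0.25])

# Average bits per symbol when arithmetically coding samples from P
# with the true probabilities (entropy) vs. the approximation (cross-entropy).
entropy_P = -np.sum(P * np.log2(P))         # 1.75 bits
cross_entropy_PQ = -np.sum(P * np.log2(Q))  # 2.00 bits

# KL divergence is exactly the extra cost of using the wrong code.
kl_PQ = np.sum(P * np.log2(P / Q))          # 0.25 bits

print(cross_entropy_PQ - entropy_P, kl_PQ)  # both 0.25
```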
