Comment by pkilgore

Comment by pkilgore 11 hours ago

1 reply

To me the most interesting thing here isn't that you can compress something better by removing randomly-distributed semantically-meaningless information. It's why zstd --long does so much better than gzip when you do and the default does worse than gzip.

What lessons can we take from this?

cogman10 8 hours ago

Why it does worse than gzip isn't something that I know. Why --long is so efficient is likely a result of evolution of all things :). A lot of things have common ancestors which means shared genetic patterns across species. --long allows zstd to see a 2gb window of data which means it's likely finding all those genetic similarities across species.

Endogenous retroviruses [1] are interesting bits of genetics that helps link together related species. A virus will inject a bit of it's genetics into the host which can effectively permanently scar the host's DNA and all their offspring's DNA.

[1] https://en.wikipedia.org/wiki/Endogenous_retrovirus