rini17 4 days ago

In general, a good preprocessing step might be to detect punctuation that repeats at fixed intervals, strip it before compression, and restore it after decompression.
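
A minimal Python sketch of that idea. It assumes a single separator character (for example the newline in line-wrapped FASTA) that occurs at exactly every p-th position and nowhere else. If that assumption fails, the text passes through unchanged. This includes a trailing newline after a short last line. The names `find_period`, `strip_periodic` and `restore_periodic` are hypothetical. The only side information kept is the period and the original length.

```python
from typing import Optional, Tuple

Meta = Optional[Tuple[int, int]]  # (period, original length), or None if not periodic

def find_period(text: str, sep: str) -> Optional[int]:
    """Return p if `sep` occurs at exactly indices p-1, 2p-1, 3p-1, ... and nowhere else."""
    positions = [i for i, ch in enumerate(text) if ch == sep]
    if not positions:
        return None
    p = positions[0] + 1
    expected = list(range(p - 1, len(text), p))
    return p if positions == expected else None

def strip_periodic(text: str, sep: str) -> Tuple[str, Meta]:
    """Remove the periodic separator; fall back to the original text if it is irregular."""
    p = find_period(text, sep)
    if p is None:
        return text, None
    # Safe: find_period guarantees sep appears only at the periodic positions.
    return text.replace(sep, ""), (p, len(text))

def restore_periodic(payload: str, sep: str, meta: Meta) -> str:
    """Reinsert the separator after every p-1 payload characters."""
    if meta is None:
        return payload
    p, n = meta
    width = p - 1
    out, pos = [], 0
    for start in range(0, n, p):
        out.append(payload[pos:pos + width])
        pos += width
        if start + p <= n:  # a full period ends with a separator
            out.append(sep)
    return "".join(out)
```

The stripped payload (plus two integers) is what would be handed to the compressor. The design choice is to refuse rather than patch irregular input, which keeps the side information trivial.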

vintermann 19 hours ago

That turns it into specialized compression, of which DNA already has plenty. Many forms of specialized compression even allow string queries directly on the compressed data.
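
One well-known example of querying without decompressing is the FM-index, which counts substring occurrences using the Burrows-Wheeler transform (BWT) plus rank queries. The Python below is a toy sketch. It builds the BWT naively from sorted rotations and answers rank queries by scanning. A real FM-index stores the BWT compressed, with sampled rank tables. The names `bwt` and `count_occurrences` are illustrative, not any library's API. It also assumes `$` does not occur in the text.

```python
from collections import Counter

def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; fine for a demo)."""
    s = text + "$"  # assumed sentinel: absent from text and sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of `pattern` by backward search over the BWT alone."""
    # C[c] = number of characters in the text strictly smaller than c
    counts = Counter(bwt_str)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    def occ(c: str, i: int) -> int:
        # Rank query: occurrences of c in bwt_str[:i]. A real FM-index answers
        # this from a compressed structure instead of scanning.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current range of sorted rotations
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

Usage: `count_occurrences(bwt("GATTACA"), "A")` walks the pattern right to left, narrowing a range of sorted rotations, and never reconstructs the original text.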

bede 3 days ago

Yes, it sounds like 7-Zip/LZMA can do this with custom filters, as can other, more exotic (and slow) statistical compression approaches.
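
For a feel of how LZMA filter chains are composed, here is a sketch using Python's standard `lzma` module. It chains liblzma's built-in Delta filter in front of LZMA2. This is only an illustration of the chaining mechanism, not a claim that Delta helps on sequence data. The `dist` value of 4 is an arbitrary assumption. Python's `lzma` exposes only liblzma's built-in filters, so a truly custom transform (such as stripping periodic punctuation) would have to run as a separate pass before `compress`.

```python
import lzma

# Filter chain: a Delta pre-filter (byte differences at distance `dist`) feeding LZMA2.
# The .xz container records the chain, so decompression needs no filter arguments.
FILTERS = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_LZMA2, "preset": lzma.PRESET_DEFAULT},
]

def compress(data: bytes) -> bytes:
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=FILTERS)

def decompress(blob: bytes) -> bytes:
    # Format and filter chain are auto-detected from the .xz headers.
    return lzma.decompress(blob)
```

With `FORMAT_RAW` instead of `FORMAT_XZ`, the same `filters` list would also have to be passed to the decompressor, since no container header records it.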