Comment by jeroenhd

Comment by jeroenhd 6 months ago

2 replies

Looking at the spec (https://github.com/facebook/zstd/blob/dev/contrib/seekable_f...), I don't see any mention of custom dictionaries like you describe.

The spec does mention:

> While only Checksum_Flag currently exists, there are 7 other bits in this field that can be used for future changes to the format, for example the addition of inline dictionaries.

so I don't think seekable zstd supports these dictionaries just yet.

With multiple inline dictionaries, one could detect when new chunks compress badly with the previous dictionary and train new ones on the fly. Could be useful for compressing formats with headers and mixed data (i.e. game files, which can contain a mix of text + audio + video, or just regular old .tar files I suppose).

ikawe 6 months ago

Custom dictionaries are a feature of vanilla (non-seekable) zstd. As I understand it, all seekable-zstd are valid zstd, so it should be possible?

https://github.com/facebook/zstd?tab=readme-ov-file#the-case...

  • rorosen 6 months ago

    Yes, dictionaries should be totally possible. However, I've never tried them to be honest because I usually only compress big files. They can be set on the (de)compression contexts the same way as with regular zstd.