Comment by stu2010
This is cool, I'd say that the most common tool in this space is bgzip[1]. Have you thought about training a dictionary on the first few chunks of each file and embedding the dictionary in a skippable frame at the start? Likely makes less difference if your chunk size is 2MB, but at smaller chunk sizes that could have significant benefit.
Looking at the spec (https://github.com/facebook/zstd/blob/dev/contrib/seekable_f...), I don't see any mention of custom dictionaries like you describe.
The spec does mention:
> While only Checksum_Flag currently exists, there are 7 other bits in this field that can be used for future changes to the format, for example the addition of inline dictionaries.
so I don't think seekable zstd supports these dictionaries just yet.
With multiple inline dictionaries, one could detect when new chunks compress badly with the previous dictionary and train new ones on the fly. Could be useful for compressing formats with headers and mixed data (i.e. game files, which can contain a mix of text + audio + video, or just regular old .tar files I suppose).