Comment by bede
Yes I'd expect a dict-based approach to do better here. That's probably how it should be done. But --long is compelling for me because using it requires almost no effort, it's still very fast, and yet it can dramatically improve compression ratio.
From what I've read (although I haven't tested and I can't find my source from when I read it), dictionaries aren't very useful when dataset is big, and just by using '--long' you can cover that improvement.
Have any of you tested it?