Comment by shawnz 18 hours ago
Another fun application of combining LLMs with arithmetic coding is steganography. Here's a project I worked on a while back that effectively applies the technique in the opposite direction to construct a steganographic transformation: https://github.com/shawnz/textcoder
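Roughly, the idea is to treat the secret bits as if they were an already arithmetic-coded stream and "decompress" them against a language model's next-token distribution, so the bits steer token selection and the output reads as ordinary text. A minimal conceptual sketch of that direction (not the actual textcoder implementation: it uses GPT-2 as a small stand-in model, plain floats with no renormalization, and only shows embedding, not extraction):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Small stand-in model for illustration only.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def bits_to_fraction(bits):
        # Interpret the secret bit string as a binary fraction in [0, 1).
        return sum(int(b) / 2 ** (i + 1) for i, b in enumerate(bits))

    def embed(bits, prompt="The weather today", n_tokens=20):
        x = bits_to_fraction(bits)
        low, high = 0.0, 1.0  # current arithmetic-coding interval
        ids = tok.encode(prompt, return_tensors="pt")
        for _ in range(n_tokens):
            with torch.no_grad():
                logits = model(ids).logits[0, -1]
            probs = torch.softmax(logits, dim=-1)
            cdf = torch.cumsum(probs, dim=0)
            # Map x into the current interval and pick the token whose
            # probability slice contains it (one arithmetic-decoding step).
            target = (x - low) / (high - low)
            token = int((cdf < target).sum().item())
            token = min(token, probs.numel() - 1)
            # Narrow the interval to that token's slice for the next step.
            lo_p = float(cdf[token - 1]) if token > 0 else 0.0
            hi_p = float(cdf[token])
            low, high = low + (high - low) * lo_p, low + (high - low) * hi_p
            ids = torch.cat([ids, torch.tensor([[token]])], dim=1)
        return tok.decode(ids[0])

    print(embed("1011001110001111"))

Extraction would rerun the same model over the cover text and recover which probability slice each token fell into, hence the original bits; a real implementation also has to handle finite precision and the tokenization ambiguity discussed below.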
Cool! It creates very plausible encodings.
> The Llama tokenizer used in this project sometimes permits multiple possible tokenizations for a given string.
Not having tokens be a prefix code is thoroughly unfortunate. Do the Llama team consider it a bug? I don't see how to rectify the situation without a full retrain, sadly.
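To make the ambiguity concrete, here's a minimal sketch that searches a tokenizer's vocabulary for a piece that can also be written as two smaller pieces, i.e. two distinct token sequences that decode to the same string (assumes the Hugging Face transformers tokenizer API; the checkpoint name is illustrative and any BPE/SentencePiece vocabulary shows the same effect):

    from transformers import AutoTokenizer

    # Illustrative checkpoint; may require access approval on the Hub.
    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    vocab = tok.get_vocab()  # piece string -> token id

    def find_ambiguous_piece():
        # Look for a word-internal piece whose string also splits into
        # two smaller pieces that both exist in the vocabulary.
        for piece, pid in vocab.items():
            if piece.isalpha() and len(piece) >= 2:
                for i in range(1, len(piece)):
                    left, right = piece[:i], piece[i:]
                    if left in vocab and right in vocab:
                        return piece, [pid], [vocab[left], vocab[right]]
        return None

    piece, ids_a, ids_b = find_ambiguous_piece()
    print(piece, ids_a, ids_b)
    print(tok.decode(ids_a) == tok.decode(ids_b))  # same string, different token ids

Since BPE-style vocabularies deliberately contain both merged pieces and their fragments, any scheme that re-tokenizes decoded text has to pin down one canonical tokenization (or carry extra disambiguating information), and that isn't specific to Llama.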