Comment by akoboldfrying 10 hours ago
I find this surprising, but I suppose it must be more efficient overall.
Presumably parsing text into tokens is done in some deterministic way. If it is done by greedily taking the longest-matching prefix that is a token, then when generating text it should be possible to "enrich" tokens that are prefixes of other tokens with additional constraints to force a unique parse: E.g., if "e" is a token but "en" is too, then after generating "e" you must never generate a token that begins with "n". A text generated this way can be deterministically parsed by the greedy parser.
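The scheme described above can be sketched in a few lines. This uses a tiny made-up vocabulary, not any real LLM tokenizer; `greedy_tokenize` takes the longest matching prefix at each step, and `forbidden_next_starts` computes which first characters the next token must avoid so that the greedy parse cannot merge across the boundary:

```python
# Hypothetical toy vocabulary for illustration only.
VOCAB = {"e", "en", "end", "t", "th", "the", "n", "d"}

def greedy_tokenize(text):
    """Greedily take the longest vocabulary prefix at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            if text[i:i + length] in VOCAB:
                tokens.append(text[i:i + length])
                i += length
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return tokens

def forbidden_next_starts(prev_token):
    """First characters the next generated token must avoid: any character
    that extends prev_token toward a longer token (e.g. after "e", forbid
    "n", since "en" and "end" are also tokens)."""
    return {
        longer[len(prev_token)]
        for longer in VOCAB
        if longer != prev_token and longer.startswith(prev_token)
    }
```

For example, the unconstrained sequence `["e", "nd"]` concatenates to `"end"`, which the greedy parser reads back as the single token `["end"]`, a different sequence; forbidding tokens that start with `"n"` after `"e"` rules that out. Since any token that could out-match `prev_token` at its position must begin with `prev_token` itself, checking one character at each boundary is enough to keep the greedy parse aligned with the generated sequence.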
Alternatively, it would suffice to restrict generation to a subset of tokens that forms a prefix code (no token is a prefix of another). This would be simpler, but with lower coding efficiency.
Regarding the first part: that's an interesting idea, although I worry it would bias the outputs in an unrealistic way. Then again, maybe it would only impact scenarios that would have otherwise been unparsable anyway?
Regarding the second part: you'd effectively just be limiting yourself to single-character tokens in that case, which would drastically degrade the LLM's output quality.