Comment by happytoexplain 9 hours ago
I have a love-hate relationship with backwards compatibility. I hate the mess - I love when an entity in a position of power is willing to break things in the name of advancement. But I also love the cleverness - UTF-8, UTF-16, EAN, etc. To be fair, UTF-8 sacrifices almost nothing to achieve backwards compat though.
> To be fair, UTF-8 sacrifices almost nothing to achieve backwards compat though.
It sacrifices the ability to encode more than 21 bits, which I believe was done for compatibility with UTF-16: UTF-16’s awful “surrogate” mechanism can only express code points up to U+10FFFF (2^20 + 2^16 - 1, which fits in 21 bits).
I hope we don’t regret this limitation some day. I’m not aware of any other material reason to disallow UTF-8 sequences that encode larger code points.
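For anyone who wants to check the arithmetic, here is a rough Python sketch of where the ceiling comes from. The helper name `surrogates_to_code_point` is made up for illustration; the constants are the standard UTF-16 surrogate ranges (high: 0xD800 to 0xDBFF, low: 0xDC00 to 0xDFFF).

```python
def surrogates_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point.

    Hypothetical helper for illustration, not a library function.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 payload bits; the result is offset by
    # 0x10000 because the BMP is already covered by single code units.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest possible pair gives the largest code point UTF-16 can express.
max_cp = surrogates_to_code_point(0xDBFF, 0xDFFF)
print(hex(max_cp))           # 0x10ffff
print(max_cp.bit_length())   # 21

# UTF-8 needs four bytes for that same code point.
print(len(chr(max_cp).encode("utf-8")))  # 4
```

So the limit is 20 payload bits plus the 0x10000 offset, which is why the top code point is U+10FFFF rather than 2^21 - 1.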