Comment by zahlman 7 hours ago

They should at least have used a single system for all of this. Instead, we have:

* European-style combining characters, as well as precomposed versions for some arbitrary subset of legal combinations, and nothing preventing you from stacking them arbitrarily (as in Zalgo text) or on illogical base characters (who knows what your font renderer will do if you ask to put a cedilla on a kanji? It might even work!)

* Jamo for Hangul: two or three pseudo-characters representing the parts of a larger syllable block (leading consonant, vowel, optional trailing consonant), which have to be in order (and who knows what you're supposed to do with an invalid jamo sequence)

* Emoji that are produced by applying a "variation selector" to a normal character

* Emoji that are just single characters — including ones that used to be normal characters and were retconned to now require the variation selector to get the original appearance

* Some subset of emoji that can have a skin-tone modifier applied as a direct suffix

* Some other subset of emoji that are formed by combining other emoji, which requires a zero-width-joiner in between (because they'd also be valid separately), which might be rendered as the base components anyway if no joined glyph is available

* National flags that are a pair of abstract "regional indicator" characters spelling out a country code; neither can be said to be the base or the modifier (this lets them say that they never removed or changed the meaning of a "character" while still allowing for countries to change their country codes, national flags or existence status)

* Other flags that use a base flag character, followed by "tag letter" characters that were originally intended for a completely different purpose that never panned out; and also there was temporary disagreement about which base character should be used

* Other other flags that are vendor-specific but basically work like emoji with ZWJ sequences

And surely more that I've forgotten about or not learned about yet.
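
To make the list above concrete, here is a rough Python sketch that builds one example of most of these mechanisms from raw code points, using only the standard `unicodedata` module. The helper names (`regional_flag`, `subdivision_flag`, `describe`) are made up for illustration, and the specific example characters are my own picks, not anything canonical. How any of these render depends entirely on your font and platform.

```python
import unicodedata

# Combining mark vs. precomposed: "e" + COMBINING ACUTE ACCENT has a
# precomposed form, so NFC collapses it to a single code point (U+00E9).
decomposed = "e\u0301"
precomposed = unicodedata.normalize("NFC", decomposed)

# A cedilla on a kanji has no precomposed form; NFC leaves both code points.
kanji_cedilla = unicodedata.normalize("NFC", "\u6f22\u0327")

# Hangul: three conjoining jamo (leading, vowel, trailing) compose under NFC
# into one precomposed syllable.
jamo = "\u1112\u1161\u11ab"
syllable = unicodedata.normalize("NFC", jamo)

# Variation selector: HEAVY BLACK HEART + VARIATION SELECTOR-16 asks for
# emoji presentation of an older "normal" character.
heart_emoji = "\u2764\ufe0f"

# Skin-tone modifier applied as a direct suffix to a base emoji.
thumbs_up_medium = "\U0001F44D\U0001F3FD"

# ZWJ sequence: man + ZWJ + woman + ZWJ + girl, which may render as one
# family glyph or fall back to three separate emoji.
family = "\u200d".join(["\U0001F468", "\U0001F469", "\U0001F467"])


def regional_flag(country_code: str) -> str:
    """Spell a two-letter country code with regional indicator symbols."""
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(chr(base + ord(c) - ord("A")) for c in country_code.upper())


def subdivision_flag(tag: str) -> str:
    """Black flag + lowercase tag letters + CANCEL TAG (e.g. "gbsct")."""
    tags = "".join(chr(0xE0000 + ord(c)) for c in tag.lower())
    return "\U0001F3F4" + tags + "\U000E007F"


def describe(text: str) -> list[str]:
    """List each code point as 'U+XXXX NAME' to show what is really stored."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in text]


if __name__ == "__main__":
    samples = [decomposed, precomposed, jamo, syllable, heart_emoji,
               thumbs_up_medium, family, regional_flag("JP"),
               subdivision_flag("gbsct")]
    for sample in samples:
        print(len(sample), describe(sample))
```

Note that `len()` here counts code points, not what a user would call characters: the family sequence is 5 code points and the tag-sequence flag is 7, even though each is meant to display as a single glyph.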