Comment by danhau
Are you referring to Unicode? Because UTF-8 is simple and relatively straight forward to parse.
Unicode definitely has its faults, but on the whole it‘s great. I‘ll take Unicode w/ UTF-8 any day over the mess of encodings we had before it.
Needless to say, Unicode is not a good fit for every scenario.
I think GP is really talking about extended grapheme clusters (at least the mention of invisible glyph injection makes me think that)
Those really seem hellish to parse, because there seem to be several mutually independent schemes how characters are combined to clusters, depending on what you're dealing with.
E.g. modifier characters, tags, zero-width joiners with magic emoji combinations, etc.
So you need both a copy of the character database and knowledge of the interaction of those various invisible characters.