Comment by xg15
I think GP is really talking about extended grapheme clusters (at least, the mention of invisible glyph injection makes me think so).
Those really seem hellish to parse, because there seem to be several mutually independent schemes for how characters are combined into clusters, depending on what you're dealing with.
E.g. modifier characters, tags, zero-width joiners with magic emoji combinations, etc.
So you need both a copy of the character database and knowledge of how those various invisible characters interact.
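To make the "several independent schemes" point concrete, here's a toy clusterer I'd sketch in Python. It is NOT the real UAX #29 algorithm. `is_extender` and `toy_clusters` are hypothetical names, and the code only handles four mechanisms: combining marks (via `unicodedata.category`), skin-tone modifiers, tag characters (as used in subdivision flags), and ZWJ sequences. It ignores Hangul, regional-indicator flag pairs, CR LF, prepend characters, and the rule that ZWJ only joins Extended_Pictographic characters. Python's `unicodedata` doesn't expose the Grapheme_Cluster_Break or Extended_Pictographic properties at all, so the modifier and tag ranges are hardcoded, which is exactly the "you need a copy of the database" problem.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def is_extender(ch: str) -> bool:
    """Rough guess at whether ch attaches to the preceding cluster (toy rules only)."""
    cp = ord(ch)
    return (
        unicodedata.category(ch) in ("Mn", "Me", "Mc")  # combining marks, incl. variation selectors
        or 0x1F3FB <= cp <= 0x1F3FF  # emoji skin-tone (Fitzpatrick) modifiers, category Sk
        or 0xE0020 <= cp <= 0xE007F  # tag characters, category Cf (e.g. subdivision flags)
        or ch == ZWJ                 # the joiner itself sticks to what came before
    )

def toy_clusters(s: str) -> list[str]:
    """Split s into approximate grapheme clusters using only the toy rules above."""
    clusters: list[str] = []
    for ch in s:
        # Simplification: anything right after a ZWJ is glued on, regardless of what it is.
        if clusters and (is_extender(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters
```

E.g. the family emoji `"\U0001F468\u200d\U0001F469\u200d\U0001F467"` is five code points but comes out as one cluster, and the England flag (U+1F3F4 followed by six tag characters) is seven code points in one cluster. Each of these goes through a different branch of `is_extender`. For real work, a library that implements UAX #29 (e.g. the third-party `regex` module's `\X`) is the safer bet.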