Comment by csande17
IMO if you care about surrogate code points being invalid, you're conceptually in "designing the system around UTF-16" territory -- even if you then send the bytes over the wire as UTF-8, or some more exotic/compressed format. Same as how "potentially ill-formed UTF-16" and WTF-8 share the same underlying model of what a string is.
The Unicode spec itself is designed around UTF-16: the block of code points reserved for surrogates exists only to support UTF-16's surrogate-pair mechanism, and the spec explicitly gives those code points “no interpretation”. [1] An implementation has to choose how to behave if it encounters one of these reserved code points in e.g. a UTF-8 string: Throw an encoding error? Silently drop the character? Substitute the replacement character (U+FFFD)?
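For a concrete illustration, Python's own UTF-8 codec exposes all of those choices via its error handlers (just a quick sketch; ED A0 80 is what U+D800, a lone surrogate, looks like if you naively encode it as UTF-8):

    lone_surrogate = b"\xed\xa0\x80"  # U+D800 naively encoded as UTF-8

    try:
        lone_surrogate.decode("utf-8")  # throw an encoding error (the strict default)
    except UnicodeDecodeError as e:
        print("strict decode failed:", e)

    print(lone_surrogate.decode("utf-8", errors="ignore"))   # silently drop the offending bytes
    print(lone_surrogate.decode("utf-8", errors="replace"))  # substitute U+FFFD replacement characters

    # Or preserve the surrogate code point, which is essentially the
    # WTF-8 / "potentially ill-formed" model:
    print(hex(ord(lone_surrogate.decode("utf-8", errors="surrogatepass"))))  # 0xd800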
[1] https://www.unicode.org/versions/Unicode16.0.0/core-spec/cha...