Comment by o11c
I have had real-world programs broken by blind assumption of "does not deliberately contain controls" (form feed is particularly common for things intended to be paginated, escape is common for things designed for a terminal, etc.) and even "is fully UTF-8" (there are lots of old data files and logs that are never going away).
If you aren't doing something useful with the text, you're best off passing a byte-sequence through unchanged. Unfortunately, Microsoft Windows exists, so sometimes you have to pass `char16_t` sequences through instead.
The worst part about UTF-16 is that invalid UTF-16 is fundamentally different than invalid UTF-8. When translating between them (really: when transforming external data into an internal form for processing), the former can use WTF-8 whereas the latter can use Python-style surrogateescape, but you can't mix these.