Comment by csande17 14 hours ago
Yeah, I feel like the only really defensible choices you can make for string representation in a low-level wire protocol in 2025 are:
- "Unicode Scalars", aka "well-formed UTF-16", aka "the Python string type"
- "Potentially ill-formed UTF-16", aka "WTF-8", aka "the JavaScript string type"
- "Potentially ill-formed UTF-8", aka "an array of bytes", aka "the Go string type"
- Any of the above, plus "no U+0000", if you have to interface with a language/library that was designed before people knew what buffer overflow exploits were
> - "Potentially ill-formed UTF-16", aka "WTF-8", aka "the JavaScript string type"
I thought WTF-8 was just "UTF-8, but without the restriction to not encode unpaired surrogates"? Windows, Java, and JavaScript all use "possibly ill-formed UTF-16" as their string type, not WTF-8.
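To make the distinction concrete, here's a rough Python sketch of my own (not taken from the WTF-8 spec). It assumes that Python's `surrogatepass` error handler is a fair stand-in for how WTF-8 encodes a single unpaired surrogate; it is not a full WTF-8 encoder, since it would not merge a surrogate pair written as two separate code points into one 4-byte sequence.

```python
# A lone (unpaired) high surrogate, written as an escape in a Python str.
lone = "\ud800"

# Strict UTF-8 refuses to encode surrogate code points.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" emits the ordinary 3-byte UTF-8 bit pattern for the
# surrogate, which is the form WTF-8 uses for an unpaired surrogate.
wtf8_like = lone.encode("utf-8", "surrogatepass")

# The same code point as one 16-bit code unit: potentially ill-formed UTF-16.
utf16_units = lone.encode("utf-16-le", "surrogatepass")

print(strict_ok, wtf8_like.hex(), utf16_units.hex())
```

So the same ill-formed string has a 16-bit-unit representation (what JavaScript strings hold) and an 8-bit one (the WTF-8-style bytes), and those are two different encodings of it.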