Comment by mort96 8 hours ago
> - "Potentially ill-formed UTF-16", aka "WTF-8", aka "the JavaScript string type"
I thought WTF-8 was just "UTF-8, but without the restriction against encoding unpaired surrogates"? Windows, Java, and JavaScript all use "possibly ill-formed UTF-16" as their string type, not WTF-8.
Also known as UCS-2: https://www.unicode.org/faq/utf_bom.html#utf16-11
Surrogate pairs were only added with Unicode 2.0 in 1996, at which point Windows NT and Java already existed. The fact that those continue to allow unpaired surrogates is in part due to backwards compatibility.
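
To make the WTF-8 vs. "possibly ill-formed UTF-16" distinction concrete, here's a rough Python sketch. The helper names are made up for illustration, and Python's `surrogatepass` error handler is not a full WTF-8 implementation (it doesn't merge a high/low surrogate pair into one 4-byte sequence the way WTF-8 requires). For a single unpaired surrogate, though, it should produce the same 3 bytes WTF-8 would:

```python
# An unpaired high surrogate: legal in a JavaScript/Java/Windows string,
# but not a Unicode scalar value, so strict UTF-8 cannot represent it.
LONE_HIGH_SURROGATE = "\ud800"


def encode_strict_utf8(text: str) -> bytes:
    """Standard UTF-8: raises UnicodeEncodeError on surrogate code points."""
    return text.encode("utf-8")


def encode_permissive_utf8(text: str) -> bytes:
    """UTF-8 bit layout with the surrogate restriction lifted.

    Hypothetical helper: only matches WTF-8 for *unpaired* surrogates.
    """
    return text.encode("utf-8", "surrogatepass")


try:
    encode_strict_utf8(LONE_HIGH_SURROGATE)
    strict_rejected = False
except UnicodeEncodeError:
    strict_rejected = True

# 8-bit code units, UTF-8 layout (the WTF-8 side of the comparison).
permissive_bytes = encode_permissive_utf8(LONE_HIGH_SURROGATE)

# 16-bit code units: the same lone surrogate as ill-formed UTF-16 (little-endian).
utf16_bytes = LONE_HIGH_SURROGATE.encode("utf-16-le", "surrogatepass")

print(strict_rejected)         # True
print(permissive_bytes.hex())  # eda080
print(utf16_bytes.hex())       # 00d8
```

Same abstract code point, two different encodings: three 8-bit units on the WTF-8 side, one 16-bit unit on the UTF-16 side.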