Comment by zahlman
Comment by zahlman 11 hours ago
I've always taken "WTF-8" to mean that someone had mistakenly interpreted UTF-8 data as being in Latin-1 (or some other code page) and UTF-8 encoded it again.
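To make that failure concrete, here is a minimal Python sketch of the double-encoding mistake. The sample string and variable names are illustrative assumptions, not taken from any real system, and Latin-1 stands in for "some other code page".

```python
# Illustrative sketch of "double UTF-8": UTF-8 bytes misread as Latin-1,
# then encoded as UTF-8 a second time. The sample text is an assumption.
original = "café"

utf8_bytes = original.encode("utf-8")        # correct UTF-8 bytes
misread = utf8_bytes.decode("latin-1")       # wrong guess: each byte becomes one character
double_encoded = misread.encode("utf-8")     # the mistake gets baked in

print(misread)          # cafÃ©
print(double_encoded)   # b'caf\xc3\x83\xc2\xa9'

# Undoing a single layer of the mistake: reverse the two wrong steps.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired)         # café
```

The repair only works when the wrong code page maps every byte to a character (as Latin-1 does) and when you know how many layers were applied.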
GP is right about the original meaning; the author of that page acknowledges hijacking it here: https://news.ycombinator.com/item?id=9611710
When I posted that, I was honestly projecting from my own use. I think I may have independently thought of the term on Stack Overflow prior to koalie's tweet, but it's not the easiest thing (by design) to search for comments there (and that's assuming they don't get deleted, which they usually should).
(On review, it appears that the thread mentions much earlier uses...)
I did the search because I have a similar memory. I'd place it in the early 2000s, before Stack Overflow existed, around when people were first switching from latin1, Windows-1251, and others to UTF-8 on the web. Browsers would often pick the wrong encoding, and IE had a submenu where you could tell it which one to use on the page. WTF-8 was a thing because occasionally none of these options would work: the layers server-side would be misconfigured and cause the double (or more, if it involved user input) encoding. It was also used just in general to complain about UTF-8 breaking everything as it was slowly being introduced.
That thing was occasionally called WTF-8, but not often—it was normally called “double UTF-8” (if given a name at all).
In the last few years, the name has become very popular with Simon Sapin’s definition.
No, WTF-8[1] is a precisely defined format (that isn't that).
If you imagine a format that can encode JavaScript strings containing unpaired surrogates, that's WTF-8. (Well-formed WTF-8 is the same type as a JS string, though with a different representation.)
(Though that would have been a cute name for the UTF-8/latin1/UTF-8 fail.)
[1]: https://simonsapin.github.io/wtf-8/
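A rough way to see the unpaired-surrogate case from Python: strict UTF-8 refuses a lone surrogate, while the `surrogatepass` error handler emits the generalized three-byte form, which for a lone surrogate is the same byte sequence WTF-8 uses. This is only a sketch under that assumption. `surrogatepass` is not a WTF-8 implementation: given a high and low surrogate stored as two separate code points, it would encode them as two three-byte sequences, which well-formed WTF-8 does not allow.

```python
# Sketch: a lone (unpaired) surrogate, as a JavaScript string could contain.
lone = "\ud800"

# Strict UTF-8 rejects surrogate code points.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" lets the surrogate through as a three-byte sequence.
# Assumption: used here only to illustrate the lone-surrogate byte layout.
generalized = lone.encode("utf-8", "surrogatepass")
round_trip = generalized.decode("utf-8", "surrogatepass")

print(strict_ok)      # False
print(generalized)    # b'\xed\xa0\x80'
```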