Comment by Waterluvian 14 hours ago
I’m frustrated by things like Unicode where it’s “good” except… you need to know to exclude some of them. Unicode feels like a wild jungle of complexity. An understandable consequence of trying to formalize so many ways to write language. But it really sucks to have to reason about some characters being special compared to others.
The only sanity I’ve found is to treat Unicode strings as if they’re some proprietary, opaque data format. You can accept them, store them, render them, and compare them with each other for (data, not semantic) equality. But you just don’t ever try to reason about their content. Heck, I’m not even comfortable trying to concatenate them or anything like that.
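To make the "data, not semantic" distinction concrete, here is a minimal Python sketch. It assumes NFC normalization is what you'd mean by "semantic" equality, which is only one of several possible definitions. The variable names are purely illustrative.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Data equality: compares code points, so these two differ.
data_equal = composed == decomposed

# One notion of semantic equality: compare after NFC normalization.
semantic_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# Concatenation is not "safe" either: joining two normalized strings
# can produce a string that is no longer normalized.
joined = "e" + "\u0301"
joined_nfc = unicodedata.normalize("NFC", joined)

print(data_equal, semantic_equal, len(joined), len(joined_nfc))
```

The two strings render the same but only compare equal after normalization, and the concatenated string shrinks from two code points to one under NFC. That second part is the kind of thing that makes me nervous about concatenation.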
Unicode really is an impossibly bottomless well of trivia and bad decisions. As another example, the article's RFC warns against allowing legacy ASCII control characters on the grounds that they can be confusing to display to humans, but says nothing about the Explicit Directional Override characters, which https://www.unicode.org/reports/tr9/#Explicit_Directional_Ov... suggests should "be avoided wherever possible, because of security concerns".
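For anyone who wants to check their own input for these, a rough Python sketch follows. It assumes you only care about the two override characters themselves (LRO, U+202D, and RLO, U+202E), not the wider set of embedding and isolate controls. The function name `find_bidi_overrides` is made up for this example and is not something the RFC or UAX #9 prescribes.

```python
# The two Explicit Directional Override characters defined in UAX #9.
BIDI_OVERRIDES = {
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
}

def find_bidi_overrides(text):
    """Return (index, name) for every override character in text."""
    return [
        (i, BIDI_OVERRIDES[ch])
        for i, ch in enumerate(text)
        if ch in BIDI_OVERRIDES
    ]

# A classic spoofing shape: the RLO makes the tail display reversed.
sample = "invoice_\u202efdp.exe"
hits = find_bidi_overrides(sample)
print(hits)
```

Whether to reject, strip, or just flag such strings is a policy call. This only shows that detection is cheap once you know which code points to look for.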