Comment by Rendello
I (OP) have been working on some Unicode visualization tooling for a while now. The idea started when I had some buggy string-matching code. I was matching case-insensitively against a case-converted copy of the text, then using the resulting match ranges to highlight the original text.
Turns out, sometimes changing case changes not only the number of bytes (in UTF-8), but also the number of encoded characters (code points)! That means ranges found in the converted text don't necessarily line up with the original. This led to my post "UTF-8 characters that behave oddly when the case is changed" [1], which sparked a lot of conversation and taught me a great deal. After that, I started reading Unicode documentation in earnest and building up an idea of what a new tool should show. I'm trying to make clear the things I didn't (and sometimes still don't) understand, so I'd love to know what causes pain in the wild and where the gaps in people's understanding are.
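To make the case-change problem concrete, here's a minimal Python sketch. It is not my original code: `sizes` and `naive_find` are hypothetical helpers written just for illustration, and it assumes Python's built-in `str.lower()` / `str.upper()`, which apply Unicode's full case mappings.

```python
def sizes(s: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))


def naive_find(text: str, needle: str):
    """Buggy approach: search the lowercased text, then reuse those
    offsets on the original text (hypothetical helper for illustration)."""
    start = text.lower().find(needle.lower())
    if start < 0:
        return None
    return text[start:start + len(needle)]


# U+0130 (capital I with dot above) lowercases to "i" + U+0307 (combining dot):
# one code point becomes two, and two UTF-8 bytes become three.
print(sizes("\u0130"), sizes("\u0130".lower()))

# U+00DF (sharp s) uppercases to "SS": one code point becomes two.
print(sizes("\u00df"), sizes("\u00df".upper()))

# U+017F (long s) uppercases to plain "S": two UTF-8 bytes become one.
print(sizes("\u017f"), sizes("\u017f".upper()))

# The offsets found in the lowercased text are shifted by one code point,
# so the slice of the original text misses the start of the match.
print(repr(naive_find("\u0130stanbul is big", "big")))
```

The last line is the kind of bug I hit: the match is found in the converted string, but the range points at the wrong place in the original. One way around it is to keep a mapping from positions in the converted text back to positions in the original, though that's only one option among several.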