Comment by dcrazy a day ago

You’re arguing against a strawman. The advice wasn’t to ignore learning about code points; it’s that if your solution to a problem involves reasoning about code points, you’re probably doing it wrong and are likely to make a mistake.

Trying to handle code points as atomic units fails even in trivial and extremely common cases like diacritics, before you even get to more complicated situations like emoji variants. Solving pretty much any real-world problem involving a Unicode string requires factoring in canonical forms, equivalence classes, collation, and even locale. Many problems can’t even be solved at the _character_ (grapheme) level—text selection, for example, has to be handled at the grapheme _cluster_ level. And even then you need a rich understanding of those graphemes to know whether to break them apart for selection (ligatures like fi) or keep them intact (Hangul jamo).
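
To make the diacritics point concrete, here is a minimal C sketch (C only because it comes up downthread). The two spellings of “é” below are canonically equivalent, yet nothing at the byte or code point level will tell you so:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* One user-perceived character, two canonically equivalent encodings: */
        const char *nfc = "\xC3\xA9";   /* U+00E9 LATIN SMALL LETTER E WITH ACUTE */
        const char *nfd = "e\xCC\x81";  /* U+0065 'e' + U+0301 COMBINING ACUTE ACCENT */

        printf("bytes equal: %s\n", strcmp(nfc, nfd) == 0 ? "yes" : "no"); /* no */
        printf("byte lengths: %zu vs %zu\n", strlen(nfc), strlen(nfd));    /* 2 vs 3 */

        /* Fully decoded they still differ: [U+00E9] vs [U+0065, U+0301].
           Deciding they are "the same" takes normalization (e.g. via a
           library like ICU), not code point arithmetic. */
        return 0;
    }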

Yes, people should learn about code points. Including why they aren’t the level at which they should be interacting with strings.

torstenvl a day ago

> You’re arguing against a strawman.

Ironic.

> The advice wasn’t to ignore learning about code points

I didn't say "learning about."

Look, man. People operate at different levels of abstraction, depending on what they're doing.

If you're doing front-end web dev, sure, don't worry about it. If you're hacking on a text editor in C, then you probably ought to be able to take a string of UTF-8 bytes, decode them into code points, and apply the grapheme clustering algorithm to them, taking into account your heuristics about what the terminal supports. And then probably either printing them to the screen (if it seems like they're supported) or printing out a representation of the code points. So yeah, you kind of have to know.

So don't sit there and presume to tell others what they should or should not reason about, based solely on what you assume their use case is.