Comment by Tade0

Comment by Tade0 8 days ago

Strange that this exists. Polish also has dz (it's the same phoneme), along with dź, dż, sz, cz, all of which use title case in, among other instances, acronyms (e.g. RiGCz), but I'm not aware of any special code points for them. In Polish, dz is definitely always spelled as d-z.

int_19h 7 days ago

Does Polish treat them as distinct letters in their own right for sorting purposes? That is usually when you see digraphs appear in (at least some) national encodings, from whence they end up in Unicode for compatibility reasons.

  • dhosek 7 days ago

    Sorting rules can get really weird, and while some languages treat digraphs as separate letters for sorting (e.g., Czech considers ch a separate letter coming after h), Polish does not.
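
To illustrate the Czech case mentioned above, here is a minimal sketch of a sort key that treats "ch" as one letter placed after "h". The names `ALPHABET`, `RANK`, and `czech_like_key` are made up for this example, and the alphabet is deliberately simplified to ASCII letters plus "ch". Real Czech collation also has rules for letters with diacritics, which this sketch ignores.

```python
import string

# Hypothetical simplified alphabet: ASCII letters with "ch" slotted in after "h".
ALPHABET = list(string.ascii_lowercase)
ALPHABET.insert(ALPHABET.index("h") + 1, "ch")
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def czech_like_key(word):
    """Return a list of letter ranks, reading "ch" as a single letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the simplified alphabet sort last.
            key.append(RANK.get(word[i], len(ALPHABET)))
            i += 1
    return key


words = ["chata", "cibule", "hrad", "ihned", "cesta"]
print(sorted(words))                      # plain code point order
print(sorted(words, key=czech_like_key))  # "chata" moves after "hrad"
```

With plain code point sorting, "chata" lands between "cesta" and "cibule"; with the digraph-aware key it comes after "hrad" and before "ihned".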

advisedwang 7 days ago

Per the article:

> These digraphs owe their existence in Unicode ... to Serbo-Croatian. Serbo-Croatian is written in both Latin script (Croatian) and Cyrillic script (Serbian), and these digraphs permit one-to-one transliteration between them.
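
As a rough sketch of what the quoted passage describes, the snippet below looks at one of these digraphs (dž, U+01C4 through U+01C6) using Python's standard library. It shows the three case forms and a round trip between the Serbian Cyrillic letters џ, љ, њ and their single-code-point Latin counterparts. The helper `case_forms` and the two translation tables are names invented for this example, and the tables cover only the lowercase digraphs, not a full transliteration scheme.

```python
import unicodedata

DZ_UPPER = "\u01C4"  # DŽ, both letters uppercase
DZ_TITLE = "\u01C5"  # Dž, title case form
DZ_LOWER = "\u01C6"  # dž, both letters lowercase


def case_forms(ch):
    """Return the (upper, title, lower) forms of a single character."""
    return ch.upper(), ch.title(), ch.lower()


# Illustrative tables: each lowercase Cyrillic letter maps to one Latin code point.
CYR_TO_LAT = str.maketrans({"\u045F": "\u01C6", "\u0459": "\u01C9", "\u045A": "\u01CC"})
LAT_TO_CYR = str.maketrans({"\u01C6": "\u045F", "\u01C9": "\u0459", "\u01CC": "\u045A"})

for ch in (DZ_UPPER, DZ_TITLE, DZ_LOWER):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Compatibility normalization replaces the digraph with two ordinary letters,
# which is how the "d-z" spelling and the single code point are related.
two_letters = unicodedata.normalize("NFKC", DZ_LOWER)
print(len(DZ_LOWER), len(two_letters))
```

Because each Cyrillic letter maps to exactly one Latin code point, translating in one direction and back returns the original string without needing to guess whether a "d" followed by "ž" was meant as one letter or two.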

  • dhosek 7 days ago

    There are lots of weirdnesses in Unicode that are consequences of enabling lossless round-trip translations to/from legacy encodings. Inconsistencies in how the various descendants of the Brahmic script are encoded are another such consequence.