alexvitkov 8 days ago

Bringing up "monoculture" here is hilarious, as this whole situation is a direct consequence of a people attempting to enforce just that by replacing their native Cyrillic alphabet with the Latin one.

My native language also happens to use a Cyrillic alphabet and has letters that transliterate to multiple Latin letters:

  ш -> sh
  щ -> sht
  я -> ya

Somehow we manage to get by without special sh, sht, and ya Unicode characters, weird.
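
A minimal Python sketch of the kind of plain-string transliteration described above, using just the three mappings listed (illustrative, not a full transliteration standard):

  # map each Cyrillic letter to its multi-letter Latin transliteration
  table = {"ш": "sh", "щ": "sht", "я": "ya"}

  def latinize(text: str) -> str:
      # plain string replacement; no dedicated digraph code points required
      return "".join(table.get(ch, ch) for ch in text)

  print(latinize("ш щ я"))  # sh sht ya
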
int_19h 7 days ago

The native alphabet for most Southern Slavs would be Glagolitic - indeed, Croatians still occasionally used it in religious contexts as late as the 19th century. The Cyrillic alphabet is more or less Glagolitic with its new and distinct letter shapes replaced by Greek ones, so it is in and of itself a product of the same process you are complaining about; it just happened a few centuries earlier than the transition to Latin, so you're accustomed to seeing its outcome as the norm.

I should also note that it's not like Cyrillic doesn't have its share of digraphs - that's what combinations like нь effectively are, since they signify a single phoneme. And, conversely, it's pretty obvious that you can have a Latin-based orthography with no digraphs at all, just diacritics.

This whole situation has to do with legacy encodings and not much else.

  • alexvitkov 7 days ago

    > The native alphabet for most Southern Slavs would be Glagolitic

    That's a bit of an exaggeration; the Glagolitic script was only ever used by scholars, and the earliest Cyrillic writings are separated from the earliest Glagolitic ones by less than 50 years.

    You're right that Cyrillic is indeed much closer to the Greek alphabet than to Glagolitic, despite being named after Cyril. I'm not complaining about the "forsaking of culture"; I just found it interesting that I was being called "mono-cultural" for disagreeing with the existence of a few weird Unicode code points that are themselves a direct result of someone's attempt to move towards a "mono-culture".

    What I'm complaining about, if anything, is overly complex standards. This is just one of probably a hundred different quirks you have to be aware of when working with Unicode text, and this one could have easily been avoided by simply not including a few useless characters.

    • int_19h 7 days ago

      Unicode is supposed to be able to represent basically everything humans have ever written; that's why we have things like https://en.wikipedia.org/wiki/Phaistos_Disc_(Unicode_block) in there, and why it's inevitably so complex. These aren't even particularly weird code points compared to some other scripts like Arabic or traditional Mongolian.

      Correctly supporting the entirety of Unicode has been out of reach for the average app for a very long time now, IMO, so for convoluted stuff like this it's fine to do the best you can for the audience you actually have or anticipate (which usually means deferring as much as you can to libraries). I don't think correctly handling casing for the legacy digraph code points is something many people need in practice, not even speakers of the languages those Unicode digraphs originated from.
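
      A minimal Python sketch of the casing quirk, using the dž digraph (U+01C4–U+01C6) as the example; the other digraph code points come in similar three-way case triples:

        # U+01C6 is the single-code-point lowercase digraph "dž"
        dz = "\u01C6"
        print(hex(ord(dz.upper())))  # 0x1c4 -> "DŽ", the all-caps form
        print(hex(ord(dz.title())))  # 0x1c5 -> "Dž", the titlecase form used at the start of a word
        # upper()/lower() round-trip between U+01C4 and U+01C6 and never
        # produce the titlecase U+01C5, so code that only knows two cases
        # capitalizes these characters incorrectly.
        print(hex(ord(dz.upper().lower())))  # 0x1c6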

      It's still a massive improvement for interop because at least you can be sure that any two apps that need the symbol will use the same encoding for it and will be able to exchange that data, even if nobody truly supports the whole thing.

notpushkin 7 days ago

This exactly. Digraphs should just be deprecated and normalized to two code points.
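
For what it's worth, the digraph code points already carry compatibility decompositions, so NFKC/NFKD normalization maps them to the corresponding letter sequences; it's canonical normalization (NFC/NFD) and plain comparison that keep them distinct. A minimal Python sketch, again using dž (U+01C6) as the example:

  import unicodedata

  dz = "\u01C6"  # single-code-point "dž"
  print(len(unicodedata.normalize("NFC", dz)))   # 1 -- no canonical decomposition
  nfkc = unicodedata.normalize("NFKC", dz)
  print(len(nfkc))                               # 2 -- "d" + "ž"
  print([hex(ord(c)) for c in nfkc])             # ['0x64', '0x17e']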