Comment by layer8

Comment by layer8 13 hours ago

6 replies

One benefit of the suffix convention is that strings sort more usefully that way by default, without requiring special handling for those characters.

Unicode 1.0 also explains: “The convention used by the Unicode standard is consistent with the logical order of other non-spacing marks in Semitic and Indic scripts, the great majority of which follow the base characters with respect to which they are positioned. To avoid the complication of defining and implementing non-spacing marks on both sides of base characters, the Unicode standard specifies that all non-spacing marks must follow their base characters. This convention conforms to the way modern font technology handles the rendering of non-spacing graphical forms, so that mapping from character store to font rendering is simplified.”

kps 13 hours ago

Sorting is a good point.

On the other hand, prefix combining characters would have vastly simplified keyboard handling, since that's exactly what typewriter dead keys are.

  • layer8 13 hours ago

    Keyboard input handling at that level generally isn’t character-based, and instead requires looking at scancodes and modifier keys, and sometimes also distinguishing between keyup and keydown events.

    You generally also don’t want to produce different Unicode sequences depending on whether you have an “é” key you can press or have to use a dead-key “’”.

    • kps 13 hours ago

      Depends on the system. X11/Wayland do it at a higher level where you have `<dead_acute> <e> : eacute` and keysyms are effectively a superset of Unicode with prefix combiners. (This can lead to weirdness since the choice of Compose rules is orthogonal to the choice of keyboard layout.)

      • layer8 12 hours ago

        I guess your conception is that one could then define

            <dead_acute> : <combining_acute_accent>
        
        instead and use it for arbitrary letters. However, that would fail in locales using a non-Unicode encoding such as iso-8859-1 that only contain the combined character. Unless you have the input system post-process the mapped input again to normalize it to e.g. NFC before passing it on to the application, in which case the combination has to be reparsed anyway. So I don’t see what would be gained with regard to ease of parsing.

        If you want to define such a key, you can probably still do it, you’ll just have to press it in the opposite order and use backspace if you want to cancel it.

        The fact that dead keys happen to be prefix is in principle arbitrary, they could as well be suffix. On physical typewriters, suffix was more customary I think, i.e. you’d backspace over the character you want to accent and type the accent on top of it. To obtain just the accent, you combine it with Space either way.

        • moefh an hour ago

          > On physical typewriters, suffix was more customary I think

          Why would anyone type like that? Instead of pressing two keys (the accent key followed by the letter key), you'd need to press four (letter, backspace, accent, space bar) for no reason.

  • dcrazy 12 hours ago

    Not all input methods use dead keys to emit combining characters.