Comment by unbalancedevh 7 days ago

3 replies

> The fact that dz is treated as a single letter in Hungarian means that if you search for “mad”, it should not match “madzag” (which means “string”) because the “dz” in “madzag” is a single letter and not a “d” followed by a “z”, no more than “lav” should match “law” just because the first part of the letter “w” looks like a “v”.

This doesn't seem right. If the individual letters "d" and "z" exist, then it should be possible to have them next to each other in a text file without them necessarily collapsing into a single letter -- especially if they're actually represented as separate characters, which they are in the example. Even if the letter "w" wasn't correctly represented and required actually typing "uu", you wouldn't want the word "vacuum" to be interpreted as having a "w"!

Hunpeter 7 days ago

Yes, I'm Hungarian, and I'm not even mad (pun intended) about "mad" matching "madzag". I find that we ourselves sometimes conflate characters and letters, so many people's first thought would be that "madzag" is six letters. I think most other digraphs, e.g. "sz" or "gy", are considered more tightly bound, so one would be unlikely to say that "szám" (= "number") is four letters rather than three.
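
To make the character-versus-letter distinction concrete, here is a minimal Python sketch. It assumes a naive greedy rule: whenever the characters of a digraph or trigraph appear next to each other, they are treated as one letter. The names `MULTIGRAPHS`, `split_letters`, and `letter_prefix_match` are made up for illustration, and the greedy rule is only an approximation, since compounds such as "község" put a separate "z" and "s" side by side.

```python
# Hungarian multi-character letters, longest first so "dzs" wins over "dz".
MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def split_letters(word):
    """Split a word into Hungarian letters using a naive greedy rule.

    Assumption: adjacent characters that spell a digraph/trigraph always
    form a single letter. Real text needs a dictionary to get this right.
    """
    letters = []
    i = 0
    while i < len(word):
        for graph in MULTIGRAPHS:
            chunk = word[i:i + len(graph)]
            if chunk.lower() == graph:
                letters.append(chunk)
                i += len(graph)
                break
        else:
            # No multigraph starts here, so take a single character.
            letters.append(word[i])
            i += 1
    return letters


def letter_prefix_match(query, word):
    """True if the letters of `query` are a prefix of the letters of `word`."""
    q = split_letters(query.lower())
    w = split_letters(word.lower())
    return w[:len(q)] == q


print(len("madzag"), len(split_letters("madzag")))  # characters vs. letters
print(len("szám"), len(split_letters("szám")))
print("madzag".startswith("mad"), letter_prefix_match("mad", "madzag"))
```

Under that assumption, "madzag" is six characters but five letters, "szám" is four characters but three letters, and "mad" matches "madzag" at the character level but not at the letter level.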

  • d1sxeyes 7 days ago

    Yes, but it’s utter nonsense that you shouldn’t return it as a search result. There’s no “dz” key on a Hungarian keyboard, so you’d need to create that (or an alternative way to type it)… and on top of that it’s not consistent.

    The easiest way is to imagine text being written vertically. In some cases, the digraphs (or trigraphs) will be written together on a single line, and sometimes they’ll be written on separate lines.

    However, more consistently, if you imagine a person’s initials, Csanádi Dzsenifer is CsDzs.
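
The initials rule can be sketched in a few lines of Python. This assumes each name part simply starts with the longest multi-character letter that fits; `MULTIGRAPHS`, `first_letter`, and `initials` are hypothetical names for illustration, not anything from a real library.

```python
# Hungarian multi-character letters, longest first so "dzs" wins over "dz".
MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def first_letter(name):
    """Return the first Hungarian letter of a name, keeping its casing."""
    for graph in MULTIGRAPHS:
        head = name[:len(graph)]
        if head.lower() == graph:
            return head
    # No multigraph at the start, so the first character is the letter.
    return name[:1]


def initials(full_name):
    """Join the first letter of each whitespace-separated name part."""
    return "".join(first_letter(part) for part in full_name.split())


print(initials("Csanádi Dzsenifer"))  # CsDzs
```

A plain "first character of each word" rule would give "CD" here instead, which is the difference the initials example points at.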