Comment by guappa a day ago

19 replies

What if you need to find 5-letter words to play Wordle? Why do you care how many bytes they occupy or how large they are on screen?

xigoi a day ago

In the case of Wordle, you know the exact set of letters you’re going to be using, which easily determines how to compute length.

  • guappa a day ago

    No no, I want to create tomorrow's puzzle.

    • tomsmeding a day ago

      As the parent said:

      > In the case of Wordle, you know the exact set of letters you’re going to be using

      This holds for the generator side too. In fact, you have a fixed word list, and the fixed alphabet tells you what a "letter" is, and thus how to compute length. Because this concerns natural language, this will coincide with grapheme clusters, and with English Wordle, that will in turn correspond to byte length because it won't give you words with é (I think). In different languages the grapheme clusters might be larger than 1 byte (e.g. [1], where they're codepoints).
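As a rough illustration of the point above (a sketch, not anything from the linked material): for a plain ASCII word the byte count and the code point count agree, while a word containing é gives different numbers depending on encoding and normalization form. The helper name `lengths` and the sample words are made up for this example.

```python
import unicodedata

def lengths(word):
    """Measure one word three ways (hypothetical helper for illustration)."""
    nfc = unicodedata.normalize("NFC", word)  # é as one precomposed code point
    nfd = unicodedata.normalize("NFD", word)  # é as e + combining acute accent
    return {
        "utf8_bytes": len(nfc.encode("utf-8")),
        "nfc_code_points": len(nfc),
        "nfd_code_points": len(nfd),
    }

plain = lengths("crane")
accented = lengths("caf\u00e9")  # "café"
print(plain)
print(accented)
```

For "crane" all three measurements are 5. For "café" they disagree, which is why a fixed alphabet (rather than bytes or code points) is the safer definition of "letter" once you leave ASCII.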

taneq a day ago

If you're playing at this level, you need to define:

- letter

- word

- 5 :P

  • guappa a day ago

    Eh, in Macedonian they have some letters that in Russian are just 2 separate letters.

    • CorrectHorseBat a day ago

      In German you have the same, only within one language. ß can be written as ss if it isn't available in a font, and only in 2017 did they add a capital version. So depending on the font and the Unicode version, the number of letters can differ.

      • kbelder a day ago

        "Traditionally, ⟨ß⟩ did not have a capital form, and was capitalized as ⟨SS⟩. Some type designers introduced capitalized variants. In 2017, the Council for German Orthography officially adopted a capital form ⟨ẞ⟩ as an acceptable variant, ending a long debate."

        Thanks, that is interesting!

      • guappa a day ago

        should "ß" == "ss" evaluate as true?
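For what it's worth, here is how Python's built-in string methods answer that question. This is only a sketch of one language's behavior, not a claim about what the "right" answer is; the variable names are made up for the example.

```python
eszett = "\u00df"          # ß, LATIN SMALL LETTER SHARP S
capital_eszett = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S

comparisons = {
    # Plain equality compares code points, so this is False.
    "exact": eszett == "ss",
    # Case folding is meant for caseless matching and maps ß to "ss".
    "casefold": eszett.casefold() == "ss".casefold(),
    # Uppercasing uses the traditional "SS" mapping, so the length changes.
    "upper_is_SS": eszett.upper() == "SS",
    # The capital form lowercases back to ß.
    "capital_lowers_to_eszett": capital_eszett.lower() == eszett,
}
print(comparisons)
```

So the answer depends on which comparison you ask for: exact equality says no, caseless matching says yes.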

    • int_19h 18 hours ago

      That's not really any different from the distinction (or lack thereof) between "ae" and "æ". For that matter, in Russian there is a letter "ы" which is historically a digraph consisting of two separate letters, "ъ" and "i", that just happens to have been treated as a single letter for so long that few people would even recognize it as a digraph. This kind of stuff is all language-specific, which is why for Wordle etc. you always need to be aware of the context, and this context will then unambiguously decide what constitutes a single letter.
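A minimal sketch of "the context decides what a letter is": the same string is split against two different alphabets. The Croatian alphabet is my own choice of example (it treats "dž", "lj" and "nj" as single letters) and does not come from the thread; `split_letters` and the alphabet constants are hypothetical names. Greedy longest-match is an assumption too, and it can misjudge words where two adjacent letters merely look like a digraph.

```python
def split_letters(word, alphabet):
    """Split word into letters of the given alphabet, longest match first."""
    longest = max(len(letter) for letter in alphabet)
    letters = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so digraphs win over single letters.
        for size in range(min(longest, len(word) - i), 0, -1):
            candidate = word[i:i + size]
            if candidate in alphabet:
                letters.append(candidate)
                i += size
                break
        else:
            raise ValueError(f"{word[i]!r} is not in this alphabet")
    return letters

ENGLISH = set("abcdefghijklmnopqrstuvwxyz")
# Croatian Latin alphabet: 27 single characters plus three digraph letters.
CROATIAN = set("abc\u010d\u0107d\u0111efghijklmnoprs\u0161tuvz\u017e") | {"d\u017e", "lj", "nj"}

as_english = split_letters("ljubav", ENGLISH)
as_croatian = split_letters("ljubav", CROATIAN)
print(len(as_english), as_english)
print(len(as_croatian), as_croatian)
```

Under the English alphabet "ljubav" has 6 letters; under the Croatian one it has 5, so it would only qualify for a 5-letter puzzle in the second context.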