shever73 10 months ago

A nice synchronicity here: I was only checking Māori words today because The Guardian's cryptic crossword was set by "Pangakupu" (which means, logically enough, "crossword"). This crossword setter always includes a hidden Māori word or phrase in the puzzle.

mydriasis 10 months ago

I see you've posted about Maori stuff a couple of times. I want to congratulate you, this is really, really great. Thank you for working to preserve a language and culture! You're presenting resources that are tough to find, and that's an amazing thing.

hk__2 10 months ago

I get "Sorry, something went wrong. If this error persists, contact us." every time I type something.

ks2048 10 months ago

I can't type anything in the text area on Firefox. Works in Chrome (macOS).

  • XeO3 10 months ago

    Yeah, that element should be 'textarea' instead of 'div' or at least the 'contenteditable' should be true.

  • joemi 10 months ago

    Also doesn't work in Firefox on Windows but does in Chrome.

stephantul 10 months ago

This is very nice and important. We need more tools for small languages.

timonoko 10 months ago

Is it true that Maaori is crapped by Ænglish spelling? In all other languages a long vowel is just two vowels, not some stupid umlaut on top.

  • timonoko 10 months ago

    Yes, says ChatGPT:

    The Māori word "Māori" can be transcribed into the International Phonetic Alphabet (IPA) as:

    /ˈmaːɔɾi/

    Here’s a breakdown:

      /ˈ/ – indicates primary stress on the first syllable
      /m/ – a voiced bilabial nasal, like the "m" in "man"
      /aː/ – a long open front unrounded vowel, similar to the "a" in "father," but held longer (the macron indicates length)
      /ɔ/ – a mid-open back rounded vowel, like the "o" in "thought"
      /ɾ/ – a tapped or flapped "r," similar to the quick "r" sound in Spanish "pero"
      /i/ – a close front unrounded vowel, like the "ee" in "see"

    This transcription represents the most common pronunciation of the word "Māori."

  • TRiG_Ireland 10 months ago

    It's certainly not an umlaut. Nor yet is it a trema, which is what you probably mean. It's a macron, which is commonly used to mark long vowels.

    • timonoko 10 months ago

      Sort-of. Because in the Anglo world "aa" is "ä". Even ChatGPT thinks that it is OK to use "AA" when making a Finnish morse generator.

      In hindsight Maaori is not so bad. Some American Indian writing systems are just pronunciation guides for Anglos (or French). I tried to study Haida some 30 years ago, but it was too complex and miserable, because there were no actual audio clips available at that time.

      • TRiG_Ireland 10 months ago

        ChatGPT doesn't think, and I fail to see how it is in any way relevant to the discussion.

        Marking a long vowel with a macron has a long heritage, dating back to Ancient Greece at least. Yes, some other writing systems, such as Greenlandic, use a double vowel.

        Finnish seems to use ä, ö and å as independent letters, rather like Swedish and Danish, unlike German, where ä, ö and ü are regarded as a, o and u with a diacritical mark. These do not seem to be symbols which mark vowel length.

        I don't know Māori, but the Wikipedia page gives the alphabetical order for the language and does not list the long vowels separately, so I assume that, as with German or French, they're regarded as the standard letters with a diacritic mark added.

        • peterashford 10 months ago

          They are indeed standard letters with diacritics added - but macrons are the only diacritical marks used for Māori. Some people do use double vowels but it's less common than using macrons.
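A minimal sketch of the "standard letter plus diacritic" point, assuming Python's standard `unicodedata` module. Unicode NFD normalization splits ō into a plain o followed by U+0304 COMBINING MACRON, which makes it easy to drop the macron or rewrite it in the double-vowel style. The helper names `strip_macrons` and `macrons_to_double_vowels` are hypothetical, chosen only for illustration.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON, the mark that NFD separates from the base vowel


def strip_macrons(text: str) -> str:
    """Remove macrons, leaving the base letters (e.g. for a plain-ASCII fallback)."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


def macrons_to_double_vowels(text: str) -> str:
    """Rewrite each macron vowel in the double-vowel style (ā becomes aa)."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == MACRON and out:
            # Repeat the preceding base vowel in lowercase instead of keeping the mark.
            out.append(out[-1].lower())
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


print(strip_macrons("tōreo"))
print(macrons_to_double_vowels("Māori"))
```

Going the other way (double vowels to macrons) is not safe to automate this simply, since two adjacent identical vowels are not guaranteed to represent one long vowel.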

neallindsay 10 months ago

excellent use of a Punycode domain

  • yardstick 10 months ago

    I was going to disagree with you, because most kiwis have no idea how to write the special o (myself included), so they’d end up typing toreo.nz instead.

    Which as it turns out, redirects to xn--treo-l3a.nz anyway.

    Nice!
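For anyone curious how the two spellings of the domain relate, here is a small sketch using Python's built-in `idna` codec (which follows the older IDNA 2003 rules). The function names `to_ascii` and `to_unicode` are hypothetical helpers, not part of any site's code.

```python
def to_ascii(domain: str) -> str:
    """Return the ASCII-compatible (Punycode) form, which is what DNS resolves."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the human-readable form of an ASCII-compatible domain."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("tōreo.nz"))
print(to_unicode("xn--treo-l3a.nz"))
```

The `xn--` prefix marks a label as Punycode: the plain letters (`treo`) come first, and the suffix after the last hyphen (`l3a`) encodes which non-ASCII character to insert and where.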

    • lostlogin 10 months ago

      > kiwis have no idea how to write the special o (myself included)

      I’m in New Zealand too. I work in MRI and have to type ‘TE’ (echo time) regularly, as well as the Māori word ‘te’.

      Whatever secret sauce Apple sprinkles into iOS is actually malignant and it takes about 3 edits to type te/TE whenever I try.

      • nicoburns 10 months ago

        Yeah, Apple's autocorrect implementation is shockingly bad. Android is much better in this regard.

    • mkl 10 months ago

      The ō is an o with a macron. It's pretty easy to install a keyboard layout that supports it: https://kupu.maori.nz/about/macrons-keyboard-setup. Many mobile keyboards support it by default with long presses to pop up an accent/variant chooser.

      • lmm 10 months ago

        > It's pretty easy to install a keyboard layout that supports it

        Only if you don't need anything else from your keyboard layout. I use Dvorak and need to type Japanese, and I think either of those makes it impossible to enter macrons on Windows.

    • EdwardDiego 10 months ago

      I'm a fan of macOS for making it real easy to type vowels with umlauts / macrons etc.

      • samatman 10 months ago

        Unfortunately the macron is the one missing dead-key accent on the US "ABC" layout. It's easy enough to hit the globe key when this comes up, but it annoys me a bit that Opt-y is ¥, and Shift-Opt-Y is Á, which is a duplicate: Opt-e-A will also produce it. I'd be happier if Opt-y was the macron dead key and Shift-Opt-Y took over for ¥: I can go a year without needing the Yen symbol, but it makes sense to have it. I don't think the English layout needs two ways to type Á though, it's excessive.

scanny 10 months ago

Awesome work, love to see the effort on the technical front of bringing a language into broader use!

pabs3 10 months ago

Is the source code of this somewhere?

addaon 10 months ago

Slightly off-topic, but it would be nice if HN interpreted punycode in link descriptions. Especially given that the links go through a redirect, which means that the browser status bar sees them as part of the query and not the domain, so the browser's own interpretation of punycode never gets applied.

  • zahlman 10 months ago

    Seeing the Punycode link is actually a security feature, because it means you aren't tricked into visiting, say, xn--pple-43d.com (apple with a Cyrillic a).

    • smallerize 10 months ago

      There are conventions around that. https://chromium.googlesource.com/chromium/src/+/main/docs/i... Generally, if all the characters are from one script, then it is decoded. There are lots of exceptions detailed there, but it's harder to make a homoglyph attack work using only characters from one script to impersonate another.
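A rough sketch of the single-script idea, and only that: it is not Chromium's or Firefox's actual algorithm, both of which have many more rules. Python's standard library has no Unicode script property, so this approximates a character's script by the first word of its Unicode name (an assumption made for illustration). The names `scripts_in` and `display_form` are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the set of scripts in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isascii() and (ch.isdigit() or ch == "-"):
            continue  # ASCII digits and hyphens are shared across scripts
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def display_form(domain: str) -> str:
    """Decode a Punycode label only when all of its letters share one script."""
    shown = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            decoded = label[4:].encode("ascii").decode("punycode")
            if len(scripts_in(decoded)) == 1:
                label = decoded
        shown.append(label)
    return ".".join(shown)


print(display_form("xn--treo-l3a.nz"))   # all Latin, so it is decoded
print(display_form("xn--pple-43d.com"))  # Cyrillic а plus Latin pple, so it stays encoded
```

A whole-label lookalike written entirely in one script would still pass this check, which is part of why the real algorithms carry extra rules and exception lists.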

      • dmurray 10 months ago

        That's not a convention, it's a specification for how Google Chrome does it.

        And it's not even a full specification. Several of its 13 steps link to other documents that need to be read to implement the spec fully. Step 12 refers to a list of "dangerous patterns" which appears only to exist in the Chromium source. Step 5 refers vaguely to "any characters used in an unusual way".

        It's not OK to say that because Chromium does it, it's some internet standard that random website maintainers should implement.

        • smallerize 10 months ago

          I think you're ignoring the conversation. There is a lot of discussion to be had, and we don't have to say that decoding punycode is a security risk and simply do without. I also said "conventions" specifically to avoid meaning that these are hard-and-fast rules. And Firefox does something pretty similar. https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm

  • lpapez 10 months ago

    You can easily write a Tampermonkey userscript for that. As HN doesn't update the CSS that often, it should be quite a low-maintenance solution.

  • samatman 10 months ago

    Someone always says this when a punycode link shows up.

    I'm glad they don't. What you see? That's the link. It's what the browser sends, it's what DNS resolves: it's the link. Displaying it as Unicode is just a display option, and it's one which opens up all manner of mischief through confusables.

    It's a hacker culture choice, and it's one I appreciate.

    • TRiG_Ireland 10 months ago

      On the other hand, that's a rather anglo-centric viewpoint.

      • samatman 10 months ago

        It is! So kind of you to notice. Perhaps you could also notice that English is the language used on Hacker News.

        I'm quite sure a website centered in a different cultural landscape might choose a different convention. Good for them, I say.

        If URLs start being Unicode, and not an ASCII encoding which is sometimes displayed as Unicode, that would be a different story. But that's not how things are.
