Tō Reo – A Māori Spellchecker
(xn--treo-l3a.nz) 170 points by firstbabylonian 10 months ago
I see you've posted about Maori stuff a couple of times. I want to congratulate you, this is really, really great. Thank you for working to preserve a language and culture! You're presenting resources that are tough to find, and that's an amazing thing.
Wow neat! There's a great collection of Māori-made technology for te reo Māori. I'm thinking also of Te Hiku Media's work building a Māori speech recognition system: https://blogs.nvidia.com/blog/te-hiku-media-maori-speech-ai/
My favourite is PAHU! https://pahu.maori.nz/ - a lorem ipsum generator in te reo Māori.
https://www.maoridictionary.co.nz/ This is the dictionary I use most often
Thanks — there was a cookie-related bug, which should now be resolved.
This is very nice and important. We need more tools for small languages.
Yes, says ChatGPT:
The Māori word "Māori" can be transcribed into the International Phonetic Alphabet (IPA) as:
/ˈmaːɔɾi/
Here’s a breakdown:
/ˈ/ – indicates primary stress on the first syllable
/m/ – a voiced bilabial nasal, like the "m" in "man"
/aː/ – a long open front unrounded vowel, similar to the "a" in "father," but held longer (the macron indicates length)
/ɔ/ – a mid-open back rounded vowel, like the "o" in "thought"
/ɾ/ – a tapped or flapped "r," similar to the quick "r" sound in Spanish "pero"
/i/ – a close front unrounded vowel, like the "ee" in "see"
This transcription represents the most common pronunciation of the word "Māori."
It's certainly not an umlaut. Nor yet is it a trema, which is what you probably mean. It's a macron, which is commonly used to mark long vowels.
Sort of, because in the Anglo world "aa" is "ä". Even ChatGPT thinks that it is OK to use "AA" when making a Finnish Morse generator.
In hindsight, Maaori is not so bad. Some American Indian writing systems are just pronunciation guides for Anglos (or French). I tried to study Haida some 30 years ago, but it was too complex and miserable, because there were no actual audio clips available at that time.
ChatGPT doesn't think, and I fail to see how it is in any way relevant to the discussion.
Marking a long vowel with a macron has a long heritage, dating back to Ancient Greece at least. Yes, some other writing systems, such as Greenlandic, use a double vowel.
Finnish seems to use ä, ö and å as independent letters, rather like Swedish and Danish, unlike German, where ä, ö and ü are regarded as a, o and u with a diacritical mark. These do not seem to be symbols which mark vowel length.
I don't know Māori, but the Wikipedia page gives the alphabetical order for the language and does not list the long vowels separately, so I assume that, as with German or French, they're regarded as the standard letters with a diacritic mark added.
They are indeed standard letters with diacritics added - but macrons are the only diacritical marks used for Māori. Some people do use double vowels but it's less common than using macrons.
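To illustrate the "standard letters with a diacritic" view, here is a minimal Python sketch that sorts words as if the macron were not there. It assumes the macron is the only diacritic that needs folding; `fold_macrons` and the sample word list are hypothetical names chosen for this example, not part of any Māori tooling.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON, the mark left over after decomposing ā, ē, ī, ō, ū

def fold_macrons(text: str) -> str:
    """Return text with macrons removed, for use as a sort or search key."""
    decomposed = unicodedata.normalize("NFD", text)  # "ō" becomes "o" + U+0304
    stripped = "".join(ch for ch in decomposed if ch != MACRON)
    return unicodedata.normalize("NFC", stripped)

# Hypothetical sample strings; a plain sort would push the macron forms to the end.
words = ["tū", "tua", "tā", "taha"]
print(sorted(words, key=fold_macrons))  # ['tā', 'taha', 'tū', 'tua']
```

The same key could be used for searching, so that typing "Maori" still finds "Māori".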
I was going to disagree with you, because most kiwis have no idea how to write the special o (myself included), so they’d end up typing toreo.nz instead.
Which as it turns out, redirects to xn--treo-l3a.nz anyway.
Nice!
> kiwis have no idea how to write the special o (myself included)
I’m in New Zealand too. I work in MRI and have to type ‘TE’ (echo time) regularly, as well as the Māori word ‘te’.
Whatever secret sauce Apple sprinkles into iOS is actually malignant and it takes about 3 edits to type te/TE whenever I try.
The ō is an o with a macron. It's pretty easy to install a keyboard layout that supports it: https://kupu.maori.nz/about/macrons-keyboard-setup. Many mobile keyboards support it by default with long presses to pop up an accent/variant chooser.
I'm a fan of macOS for making it real easy to type vowels with umlauts / macrons etc.
Unfortunately the macron is the one missing dead-key accent on the US "ABC" layout. It's easy enough to hit the globe key when this comes up, but it annoys me a bit that Opt-y is ¥, and Shift-Opt-Y is Á, which is a duplicate: Opt-e-A will also produce it. I'd be happier if Opt-y was the macron dead key and Shift-Opt-Y took over for ¥: I can go a year without needing the Yen symbol, but it makes sense to have it. I don't think the English layout needs two ways to type Á though, it's excessive.
Slightly off-topic, but it would be nice if HN interpreted punycode in link descriptions. Especially given that the links go through a redirect, which means that the browser status bar sees them as part of the query and not the domain, so the browser's own interpretation of punycode never gets applied.
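For anyone curious what "interpreting punycode" involves, a rough Python sketch follows. It uses the standard library's `punycode` codec and only decodes labels that start with `xn--`; `decode_hostname` is a hypothetical helper name, and this says nothing about how HN or any browser actually does it.

```python
def decode_hostname(host: str) -> str:
    """Return a Unicode display form of a hostname, leaving plain ASCII labels alone."""
    labels = []
    for label in host.split("."):
        if label.lower().startswith("xn--"):
            # Drop the "xn--" prefix, then decode the rest as Punycode.
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)

print(decode_hostname("xn--treo-l3a.nz"))  # tōreo.nz
```

Note that this is display-only: the ASCII form is still what gets resolved.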
There are conventions around that. https://chromium.googlesource.com/chromium/src/+/main/docs/i... Generally, if all the characters are from one script, then it is decoded. There are lots of exceptions detailed there, but it's harder to make a homoglyph attack work using only characters from one script to impersonate another.
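A very rough Python sketch of the "all characters from one script" idea is below. It is an assumption-heavy approximation: Python's standard library has no Unicode Script property, so this takes the first word of each character's Unicode name (e.g. `LATIN`, `CYRILLIC`) as a stand-in, and it implements none of the other steps or exceptions in the Chromium document. `scripts_in_label` and `looks_single_script` are hypothetical names.

```python
import unicodedata

def scripts_in_label(label: str) -> set[str]:
    """Approximate the set of scripts in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # skip digits and hyphens, which are shared across scripts
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_single_script(label: str) -> bool:
    """True if every letter in the label appears to come from one script."""
    return len(scripts_in_label(label)) <= 1

print(looks_single_script("tōreo"))       # True: every letter is Latin
print(looks_single_script("\u0430pple"))  # False: Cyrillic "а" followed by Latin "pple"
```

A real implementation would need proper script data and the confusable checks the linked documents describe.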
That's not a convention, it's a specification for how Google Chrome does it.
And it's not even a full specification. Several of its 13 steps link to other documents that need to be read to implement the spec fully. Step 12 refers to a list of "dangerous patterns" which appears only to exist in the Chromium source. Step 5 refers vaguely to "any characters used in an unusual way".
It's not OK to say that because Chromium does it, it's some internet standard that random website maintainers should implement.
I think you're ignoring the conversation. There is a lot of discussion to be had, and we don't have to say that decoding punycode is a security risk and simply do without. I also said "conventions" specifically to avoid meaning that these are hard-and-fast rules. And Firefox does something pretty similar. https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm
Someone always says this when a punycode link shows up.
I'm glad they don't. What you see? That's the link. It's what the browser sends, it's what DNS resolves: it's the link. Displaying it as Unicode is just a display option, and it's one which opens up all manner of mischief through confusables.
It's a hacker culture choice, and it's one I appreciate.
On the other hand, that's a rather anglo-centric viewpoint.
It is! So kind of you to notice. Perhaps you could also notice that English is the language used on Hacker News.
I'm quite sure a website centered in a different cultural landscape might choose a different convention. Good for them, I say.
If URLs start being Unicode, and not an ASCII encoding which is sometimes displayed as Unicode, that would be a different story. But that's not how things are.
A nice synchronicity here, I was only checking Māori words today because The Guardian's cryptic crossword was set by "Pangakupu" (which means, logically enough, "crossword"). This crossword setter always includes a hidden Māori word or phrase in the puzzle.