Comment by pessimizer
Comment by pessimizer 13 hours ago
Almost every English word is French, except for the most important ones.
The food is French, the animal is Anglo Saxon. At least English lacks compound words or whatever German calls those 30-character constructions.
> At least English lacks compound words or whatever German calls those 30-character constructions.
Not entirely true. English, like any other Germanic language, still likes to compound words to produce new meanings. The main difference is that, unlike most other Germanic languages, English usually retains the spaces in writing. But this is just a spelling difference; the underlying process is the same.
Does that mean that "compound word" counts as a single word? And how do I distinguish between "a" "compound" "word" and "a" "compound word"?
Depends on your definition of a word and how it relates to writing. It's not such a simple question, actually.
Let's consider "scheepskapitein" and "ship captain". Both are formed in exactly the same way and mean roughly the same thing, but it's customary in Dutch to spell it without a space, while in English it's considered correct to have a space between the parts. Note that there are no spaces in speech; it's simply a writing convention. So, how many words are there in this example?
"Cattle labeling meat labeling supervision task transfer act" is just as bad as Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz; English just gets to use spaces where German doesn't. The underlying construction is the same. (I definitely got that translation wrong.)
English gets to use a sentence. It can be reworded any number of ways. I did a bit of quick googling and the clearest English I came up with for `Regulation (EC) No 1760/2000` is "Requirements for the Labelling of Minced Beef", which is a lot easier to process than Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz. The reason we split code over lines is the same reason we split sentences into words: it's easier for the brain to parse.
I wonder do German brains work on a much longer context window because of the language?
> I wonder do German brains work on a much longer context window because of the language?
Maybe, but more due to the spelling of numbers and long sentences. Compound words are not an example of this, since Germans can parse these words just fine as separate things. It just means that the lowest "tokenization" in everyday use is not the word, but its subcomponents.
Do English native speakers "tokenize" expressions into words? Do you see it as '(labelling) (of) (minced)' or '(label)l(ing) (of) (minc)(ed)'?
I can't speak for most Germans, but the algorithm I think I use is just greedy from left to right. This is also consistent with how mistokenization in common puns works, so I think this is common.
In primary school we were trained to recognize syllable boundaries. Is that just a German thing, or is this common in other countries? You need to know these for spelling, and once you know them, separating word components is trivial.
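For what it's worth, the greedy left-to-right reading described above can be sketched in a few lines of Python. This is only a toy, not a claim about how anyone's brain works: the tiny `LEXICON` and the function name `greedy_split` are made up for illustration, and "greedy" is assumed here to mean "always take the longest known component that fits, never backtrack".

```python
def greedy_split(word, lexicon):
    """Split a compound left to right, always taking the longest
    lexicon entry that matches at the current position.

    Returns a list of components, or None if the greedy pass gets stuck
    (there is deliberately no backtracking).
    """
    word = word.lower()
    parts = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first, then shrink it.
        for end in range(len(word), pos, -1):
            if word[pos:end] in lexicon:
                parts.append(word[pos:end])
                pos = end
                break
        else:
            return None  # nothing in the lexicon matches here
    return parts


# Hypothetical mini-lexicon, just enough for the two examples below.
LEXICON = {"rind", "fleisch", "etikettierung", "stau", "staub", "becken", "ecken"}

print(greedy_split("Rindfleischetikettierung", LEXICON))
print(greedy_split("Staubecken", LEXICON))
```

With this lexicon the second call comes out as "Staub" + "Ecken" (dust corners) rather than "Stau" + "Becken" (reservoir), which is the kind of mistokenization the puns rely on.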
a) the title of the regulation is not equivalent to the law (unsurprisingly); onestay42's translation is clunky but a lot closer
b) the official title of the law was "Gesetz zur Übertragung der Aufgaben für die Überwachung der Rinderkennzeichnung und Rindfleischetikettierung", so how again is it that English "gets to use a sentence" and German doesn't? German has the choice depending on context, sometimes having one word is convenient.
Usually English will try to come up with a single, Latin-or-Greek-derived word for compound ideas like this, which is another bad habit.
So surgery is full of -ectomies instead of -cut-outs.
Touché