Comment by onestay42 18 hours ago

8 replies

"Cattle labeling meat labeling supervision task transfer act" is just as bad as Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz, English just gets to use spaces where German doesn't. The underlying construction is the same. (I definitely got that translation wrong)

arkensaw 11 hours ago

English gets to use a sentence. It can be reworded any number of ways. I did a bit of quick googling and the clearest English I came up with for `Regulation (EC) No 1760/2000` is "Requirements for the Labelling of Minced Beef" which is a lot easier to process than Rinderkennzeichnungsfleischetikettierungsüberwachungsaufgabenübertragungsgesetz. The reason we split code over lines is the same reason we split sentences into words. Easier for the brain to parse.

I wonder do German brains work on a much longer context window because of the language?

  • 1718627440 8 hours ago

    > I wonder do German brains work on a much longer context window because of the language?

    Maybe, but more due to the spelling of numbers and long sentences. Compound words are not an example of this, since Germans can parse these words just fine as separate parts. It just means that the lowest level of "tokenization" in everyday use is not the word, but the subcomponents of words.

    Do native English speakers "tokenize" expressions into words? Do you see it as '(labelling) (of) (minced)' or '(label)l(ing) (of) (minc)(ed)'?

    I can't speak for most Germans, but the algorithm I think I use is just greedy from left to right. This is also consistent with how mistokenization in common puns works, so I think this is common.

    In primary school we were trained to recognize syllable boundaries. Is that just a German thing, or is it common in other countries? You need to know them for spelling, and once you do, separating word components becomes trivial.
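
    A rough sketch of the "greedy from left to right" reading described above, in Python. It assumes greedy means "take the longest known prefix, never backtrack", which is one plausible interpretation and not something stated in the comment. The function name `greedy_split` and the tiny lexicon are made up for illustration. It also shows how the same strategy produces the pun reading of "Urinstinkt" (intended: Ur + Instinkt).

```python
# Hypothetical mini-lexicon, lowercase; a real reader's vocabulary is far larger.
LEXICON = {"rot", "kraut", "ur", "urin", "instinkt", "stinkt"}


def greedy_split(word, lexicon=LEXICON):
    """Split a compound left to right, always taking the longest known prefix.

    There is no backtracking: if no known component starts at the current
    position, the function gives up and returns None.
    """
    text = word.lower()
    parts = []
    start = 0
    while start < len(text):
        # Try the longest candidate first, then shrink it one letter at a time.
        for end in range(len(text), start, -1):
            if text[start:end] in lexicon:
                parts.append(text[start:end])
                start = end
                break
        else:
            return None  # stuck: nothing in the lexicon matches here
    return parts


print(greedy_split("Rotkraut"))    # ['rot', 'kraut']
print(greedy_split("Urinstinkt"))  # ['urin', 'stinkt'], the pun reading
```

    With this lexicon the longest-prefix rule grabs "urin" before it ever considers "ur", which is the kind of mistokenization the puns rely on.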

  • detaro 11 hours ago

    a) the title of the regulation is not equivalent to the title of the law (unsurprisingly); onestay42's translation is clunky but a lot closer

    b) the official title of the law was "Gesetz zur Übertragung der Aufgaben für die Überwachung der Rinderkennzeichnung und Rindfleischetikettierung", so how again is it that English "gets to use a sentence" and German doesn't? German has the choice depending on context, sometimes having one word is convenient.

    • arkensaw 10 hours ago

      I'm not a German speaker. Why would someone use such a long word as a convenience?

      • 1718627440 9 hours ago

        I am. It is a semantic difference. Single entities get referred to by a single word. If you use a word group to describe it, it means you don't consider it a single "thing", but a "system" described by the relations of single "things".

        The compound word also has a specific meaning that the same words with spaces between them don't. For example "das rote Kraut" – "red herb" and "das Rotkraut" – "red cabbage". Also, suppose "red cabbage" was grown in abnormal conditions, so it doesn't have the color pigments: it is still "red cabbage", but not "red" "cabbage". This is awkward to state in English, but no problem in German.

magarnicle 17 hours ago

Usually English will try to come up with a single, Latin-or-Greek-derived word for compound ideas like this, which is another bad habit.

So surgery is full of -ectomies instead of -cut-outs.

  • 1718627440 8 hours ago

    Medical terms in German also use Latin or Greek, since that is the language of the field, so this is a bad example.

bmacho 12 hours ago

Maybe in speech they are similar, but not in writing. The underlying construction is as different as it can be. English puts " " between words, and German does not.