Comment by godelski a day ago

Calling it addition is hairy here. Do you just mean an operator? If so, I'm with you. But normally people expect addition to have the full abelian group properties, which this certainly doesn't. It's not a ring because it doesn't have the multiplication structure. But it also isn't even a monoid[0] since, as we just discussed, it has neither associativity nor unitality.

There is far less structure here than you are assuming, and that's the underlying problem. There is local structure and so the addition operation will work as expected when operating on close neighbors, but this does greatly limit the utility.

And if you aren't aware of the terms I'm using here I think you should be extra careful. It highlights that you are making assumptions that you weren't aware were even assumptions (an unknown unknown just became a known unknown). I understand that this is an easy mistake to make since most people are not familiar with these concepts (including many in the ML world), but this is also why you need to be careful. Because even those that do are probably not going to drop these terms when discussing with anyone except other experts as there's no expectation that others will understand them.

[0] https://ncatlab.org/nlab/show/monoid

yellowcake0 a day ago

I think you misinterpreted the tone of my original comment as some sort of gotcha. Presumably you're overloading the addition symbol with some other operational meaning in the context of vector embeddings. I'm just calling it addition because you're using a plus sign and I don't know what else to call it; I wasn't referring to addition as it's commonly understood, which is clearly associative.

  • danielmarkbruce a day ago

    You guys are debating this as though embedding models and/or layers work the same way. They don't.

    Vector addition is absolutely associative. The question is more "does it magically line up with what sounds correct in a semantic sense?".
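The associativity point is easy to check directly. A minimal sketch, assuming NumPy; the vectors here are made-up toy values, not real embeddings:

```python
import numpy as np

# Hypothetical toy vectors (values chosen so floating-point
# arithmetic is exact and the comparisons below hold exactly).
a = np.array([3.0, 2.0])
b = np.array([-1.0, 4.0])
c = np.array([0.5, 0.5])

# Component-wise vector addition is associative and commutative,
# regardless of what the vectors "mean" semantically.
assert np.array_equal((a + b) + c, a + (b + c))
assert np.array_equal(a + b, b + a)
print("vector addition: associative and commutative on these inputs")
```

The open question in the thread is not this algebra, but whether the trained embedding function maps words to vectors such that the algebra lines up with semantics.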

    • yellowcake0 a day ago

      I'm just trying to get an idea of what the operation is such that king - man + woman = queen, but it's like pulling teeth.

      • danielmarkbruce 18 hours ago

        It's just plain old addition. There is nothing fancy about the operation. The fancy part is training a model such that it would produce vector representations of words which had this property of conceptually making sense.

        If someone asks: "conceptually, what is king - man + woman?", one might reasonably say "queen". This isn't some well-defined math thing, just sort of a common-sense thing.

        Now, imagine you have a function (let's call it an "embedding model") which turns words into vectors. The function turns king into [3,2], man into [1,1], woman into [1.5, 1.5] and queen into [3.5, 2.5].

        Now for king - man + woman you get [3,2] - [1,1] + [1.5,1.5] = [3.5, 2.5] and hey presto, that's the same as queen [3.5, 2.5].
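That toy arithmetic can be run directly. A minimal sketch, assuming NumPy; the numbers are the illustrative values from this comment, not output of a real embedding model:

```python
import numpy as np

# Hypothetical toy embedding table using the values from the comment above.
emb = {
    "king":  np.array([3.0, 2.0]),
    "man":   np.array([1.0, 1.0]),
    "woman": np.array([1.5, 1.5]),
    "queen": np.array([3.5, 2.5]),
}

# Plain old component-wise vector arithmetic: king - man + woman.
result = emb["king"] - emb["man"] + emb["woman"]
print(result)  # [3.5 2.5]

# Recover the word whose vector is closest to the result
# (Euclidean distance here; cosine similarity is more typical in practice).
nearest = min(emb, key=lambda w: np.linalg.norm(emb[w] - result))
print(nearest)  # queen
```

With real word2vec embeddings the match is almost never exact, so one normally takes the nearest neighbor by cosine similarity, excluding the query words themselves.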

        Now you have to ask: how do you get a function to produce those numbers? If you look at the word2vec paper, you'll see they use a couple of methods to train a model, and if you think about those methods and the data, you'll realize it's not entirely surprising (in retrospect) that you could end up with a function that produced vectors with such properties. And if at the same time you are sort of mind-blown, welcome to the club. It blew Jeff Dean's big brain too.