Comment by jongjong 2 days ago

Interesting. All developers I know who have tinkered with embeddings and vector similarity scoring were instantly hooked. The efficiency of computing the embeddings once and then reusing them as many times as needed, comparing the vectors with a cheap <30-line function, is extremely appealing. Not to mention the indexing capabilities that make it work at scale.
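
For illustration, a minimal sketch of that cheap comparison function, assuming plain Python and embeddings that have already been computed and stored as lists of floats (the vectors and names below are made up):

  import math

  def cosine_similarity(a, b):
      # Assumes both vectors have the same length and non-zero norm.
      dot = sum(x * y for x, y in zip(a, b))
      norm_a = math.sqrt(sum(x * x for x in a))
      norm_b = math.sqrt(sum(x * x for x in b))
      return dot / (norm_a * norm_b)

  # Compute embeddings once (with whatever model you use), then reuse them:
  query_vec = [0.1, 0.3, 0.5]            # hypothetical precomputed embedding
  doc_vecs = {"doc1": [0.1, 0.2, 0.6],   # hypothetical document embeddings
              "doc2": [0.9, 0.1, 0.0]}
  best = max(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]))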

IMO vector embedding is the most important innovation in computing of the last decade. There's something magical about it. These people deserve some kind of prize. The idea that you can reduce almost any intricate concept, including whole paragraphs, to a fixed-size vector that encapsulates its meaning and its proximity to other concepts across a large number of dimensions is pure genius.

_jayhack_ 2 days ago

Vector embedding is not an invention of the last decade. Featurization in ML goes back to the 60s; even deep-learning-based featurization is decades old at a minimum. Like everything else in ML, this became much more useful with data and compute scale.

liampulles 2 days ago

If you take the embedding for king, subtract the embedding for male, add the embedding for female, and look up the closest embedding, you get queen.

The fact that simple vector addition and subtraction can encode concepts like royalty and gender (among all sorts of others) is kind of magic to me.
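
A sketch of how this is usually demonstrated, assuming gensim and one of its small downloadable GloVe models (the choice of model here is mine, not a claim about any particular embedding):

  import gensim.downloader as api

  # Small pretrained GloVe word vectors; downloaded on first use.
  model = api.load("glove-wiki-gigaword-50")

  # king - man + woman, then look up the nearest remaining word.
  result = model.most_similar(positive=["king", "woman"],
                              negative=["man"], topn=1)
  print(result)  # with this model the top hit is typically 'queen'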

  • puttycat 2 days ago

    This was actually shown to not really work in practice.

    • intelkishan 2 days ago

      I have seen this particular example work. You don't get an exact match, but the closest one is indeed queen.

      • godelski 2 days ago

        Yes, but it doesn't generalize very well, even on simple features like gender. If you go look at the embeddings, you'll find that man and woman are neighbors, just as king and queen are [0]. This is a better explanation for the result: you're just taking very small steps in the latent space.

        Here, play around[1]

          mother - parent + man = woman
          father - parent + woman = man
          father - parent + man = woman
          mother - parent + woman = man
          woman - human + man = girl
        
        Or some that should be trivial

          woman - man + man = girl
          man - man + man = woman
          woman - woman + woman = man
          
        Working in very high dimensions is funky stuff. Embedding high-dimensional data into low dimensions results in even funkier stuff.

        [0] https://projector.tensorflow.org/

        [1] https://www.cs.cmu.edu/~dst/WordEmbeddingDemo/
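
        One way to sanity-check the neighbor explanation, reusing the hypothetical gensim/GloVe setup sketched further up the thread, is to look at the raw similarities before doing any arithmetic:

          # Continuing the gensim/GloVe sketch from the earlier comment:
          print(model.similarity("man", "woman"))   # already near neighbors
          print(model.similarity("king", "queen"))  # likewise
          print(model.most_similar(positive=["king", "woman"],
                                   negative=["man"], topn=3))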

      • mirekrusin 2 days ago

        Shouldn't this itself be a part of training?

        Having a set of "king - male + female = queen"-like relations, including more complex phrases, to align the embeddings.

        It seems like a terse, lightweight, information-dense way to capture the essence of knowledge.

ekidd 2 days ago

Vector embeddings are slightly interesting because they come pre-trained with large amounts of data.

But similar ways to reduce huge numbers of dimensions to a much smaller set of "interesting" dimensions have been known for a long time.

Examples include principal component analysis/singular value decomposition, which was the first big breakthrough in face recognition (in the early 90s) and was also used in latent semantic indexing, the Netflix prize, and a large pile of other things. And the underlying technique was invented in 1901.

Dimensionality reduction is cool, and vector embedding is definitely an interesting way to do it (at significant computational cost).
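
For comparison, a minimal sketch of the classical approach, assuming scikit-learn and an arbitrary numeric feature matrix (the shapes below are made up):

  import numpy as np
  from sklearn.decomposition import PCA

  # Hypothetical data: 1000 samples, 512 raw features each.
  X = np.random.rand(1000, 512)

  # Project onto the 16 directions of greatest variance.
  pca = PCA(n_components=16)
  X_reduced = pca.fit_transform(X)             # shape (1000, 16)
  print(pca.explained_variance_ratio_.sum())   # fraction of variance retained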

CuriouslyC 2 days ago

Vector embeddings are so overhyped. They're decent as a secondary signal, but they're expensive to compute and fragile. BM25-based solutions are more robust and have WAY lower latency, at the cost of some accuracy loss versus hybrid solutions. You can get the majority of the lift of hybrid solutions with ingest-time semantic expansion / reverse-HyDE-style input annotation on top of a sparse-embedding BM25, at a fraction of the computational cost.
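
For reference, a minimal BM25 sketch, assuming the rank_bm25 package and naive whitespace tokenization; under this setup, the ingest-time semantic expansion would just add extra terms to each document before indexing:

  from rank_bm25 import BM25Okapi

  corpus = ["vector embeddings are overhyped",
            "bm25 is a robust lexical baseline",
            "hybrid retrieval combines sparse and dense signals"]
  tokenized = [doc.split() for doc in corpus]

  bm25 = BM25Okapi(tokenized)
  scores = bm25.get_scores("sparse retrieval baseline".split())
  print(scores)  # one lexical relevance score per document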

  • jongjong 2 days ago

    But it's much cheaper to compute than inference, and you only have to compute the embedding once for any piece of content and can then reuse it multiple times.

calf 2 days ago

The idea of reducing language to mere bits, in general, sounds like it would violate the Gödel/Turing theorems about computability.