Comment by perfmode

Comment by perfmode 6 months ago

> These vectors are quite long - text-embedding-3-large has up to 3072 dimensions - to the point that we can truncate them at a minimal loss of quality.

Would it be beneficial to use dimensionality reduction instead of truncating? Or does “truncation” mean dimensionality reduction in this context?

sc077y 6 months ago

The embedding is trained with Matryoshka Representation Learning, so the leading dimensions carry most of the meaning. Truncating it compresses the vector while losing as little meaning as possible. In some sense it's like dimensionality reduction.
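
To make "truncating" concrete, here is a minimal sketch in plain Python. It assumes truncation means keeping the leading components and rescaling to unit length so cosine similarity still behaves. The helper names (`truncate_embedding`, `cosine`) and the 8-dimensional toy vectors are made up for illustration; they are not real model output or any library's API.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # Nothing to rescale; return the zero vector unchanged.
        return list(head)
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy vectors standing in for real embeddings (hypothetical values),
# with most of the weight in the leading components.
a = [0.9, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01]
b = [0.8, 0.4, 0.1, 0.2, 0.01, 0.03, 0.02, 0.00]

short_a = truncate_embedding(a, 4)
short_b = truncate_embedding(b, 4)

# Compare similarity before and after dropping half the dimensions.
print(cosine(a, b), cosine(short_a, short_b))
```

The similarity barely moves here only because the toy vectors were built front-loaded, which is the property Matryoshka-style training is meant to encourage.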


marginalia_nu 6 months ago

An argument could be made that truncation is a sort of random projection, though that probably depends on how the embedding was created, and a more textbook random projection is likely to be more robust.
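
A rough sketch of the difference, in plain Python. It assumes a textbook Gaussian random projection (a matrix of N(0, 1) entries scaled by 1/sqrt(k)) and a deliberately contrived vector with all of its weight in the trailing dimensions. The function names and the vector are hypothetical, and a Matryoshka-trained embedding would not look like this; the point is only that truncation depends on where the information sits, while a random projection does not.

```python
import math
import random

def gaussian_projection_matrix(in_dims, out_dims, seed=0):
    """Build an out_dims x in_dims matrix with N(0, 1/out_dims) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dims)]
            for _ in range(out_dims)]

def project(matrix, vec):
    """Multiply the projection matrix by the vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def truncate(vec, dims):
    """Plain truncation: keep the first `dims` components."""
    return vec[:dims]

def norm(vec):
    return math.sqrt(sum(x * x for x in vec))

in_dims, out_dims = 64, 32

# Contrived worst case for truncation: all weight in the last 32 dimensions.
tail_heavy = [0.0] * 32 + [1.0] * 32

matrix = gaussian_projection_matrix(in_dims, out_dims, seed=42)
truncated = truncate(tail_heavy, out_dims)
projected = project(matrix, tail_heavy)

# Truncation discards everything; the projection keeps a nonzero signal.
print(norm(tail_heavy), norm(truncated), norm(projected))
```

On a front-loaded vector the picture reverses: truncation keeps the informative components exactly, while the random projection mixes in the uninformative ones.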
