Comment by nostrebored 13 hours ago
So you’re quantizing and using IVF — what are your recall numbers with actual use cases?
Your graphs are measuring accuracy [1] (I'm assuming precision?), not recall. My impression is that your approach would miss surfacing potentially relevant candidates, because that is the tradeoff IVF makes for memory optimization. I'd expect it to struggle especially with high-dimensional vectors and large datasets.
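For clarity on the metric in question: in ANN benchmarking, recall@k is the fraction of the exact top-k neighbors that the approximate index actually returns. A minimal sketch (the IDs below are made up for illustration):

```python
def recall_at_k(approx_ids, exact_ids):
    """Recall@k: fraction of the true top-k neighbors the ANN index returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Toy example: exact top-5 neighbors vs. what an approximate scan returned
exact = [3, 7, 12, 25, 40]
approx = [3, 7, 12, 25, 99]   # one true neighbor missed

print(recall_at_k(approx, exact))  # -> 0.8
```

Since every candidate an ANN index returns is a real vector from the dataset, "precision" in the classic sense is less informative here; recall@k is the standard quality axis traded off against QPS.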
[1] https://cdn.hashnode.com/res/hashnode/image/upload/v17434120...
It's recall. Thanks for pointing this out; we'll update the diagram.
The core part is a quantization technique called RaBitQ. We can scan over the bit vectors to estimate the real distance between the query and the data. I'm not sure what you mean by "miss" here: as an approximate nearest neighbor index, every index, including HNSW, will miss some potential candidates.
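To illustrate the general idea of estimating distances from bit vectors (this is a SimHash-style toy, not RaBitQ's actual estimator, which uses a random orthogonal transform and an unbiased distance correction): each vector is reduced to sign bits against random hyperplanes, and the Hamming distance between two bit codes estimates the angle between the original vectors.

```python
import math
import random

def sign_bits(vec, hyperplanes):
    """Quantize a float vector to bits: one bit per random hyperplane."""
    return [1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
            for h in hyperplanes]

def estimated_angle(bits_a, bits_b):
    """Fraction of mismatched bits estimates the angle between the vectors."""
    mismatches = sum(a != b for a, b in zip(bits_a, bits_b))
    return math.pi * mismatches / len(bits_a)

random.seed(42)
dim, n_bits = 16, 4096
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

a = [random.gauss(0, 1) for _ in range(dim)]
b = [random.gauss(0, 1) for _ in range(dim)]

est = estimated_angle(sign_bits(a, planes), sign_bits(b, planes))
true = math.acos(sum(x * y for x, y in zip(a, b))
                 / (math.hypot(*a) * math.hypot(*b)))
# est closely tracks the true angle, so bit scans can rank candidates cheaply
```

The point is that a cheap scan over compact bit codes gives a distance estimate good enough to rank candidates, with exact distances recomputed only for the top survivors.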
We do have some benchmark numbers at https://blog.vectorchord.ai/vector-search-over-postgresql-a-.... It varies across datasets, but in most cases it's 2x or more QPS compared to pgvector's HNSW at the same recall.