Comment by nostrebored 9 hours ago
Your graphs are measuring accuracy [1] (I'm assuming precision?), not recall? My impression is that your approach would miss potentially relevant candidates, because that is the tradeoff IVF makes for memory optimization. I'd expect it to struggle especially with high-dimensional vectors and large datasets.
[1] https://cdn.hashnode.com/res/hashnode/image/upload/v17434120...
It's recall. Thanks for pointing this out, we'll update the diagram.
The core part is a quantization technique called RaBitQ. We can scan over the bit vectors to estimate the real distance between the query and the data. I'm not sure what you mean by "miss" here. As with any approximate nearest neighbor index, all indexes, including HNSW, will miss some potential candidates.
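To make the "scan the bit vectors, then measure recall" idea concrete, here is a minimal sketch in plain Python. It is not RaBitQ and not the code from the post: it swaps in a simple sign-bit code with Hamming distance as a stand-in estimator, and every name in it (`sign_code`, `approx_top_k`, the `rerank` parameter) is made up for illustration. It only shows the general shape: a cheap scan over bit codes builds a shortlist, exact distances rerank it, and recall@k is the fraction of the true top-k that survives.

```python
import random


def sign_code(v):
    """Pack the sign of each coordinate into one int (bit i is set if v[i] > 0)."""
    code = 0
    for i, x in enumerate(v):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def sq_dist(a, b):
    """Exact squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, data, k):
    """Ground truth: brute-force top-k by exact distance."""
    return sorted(range(len(data)), key=lambda i: sq_dist(query, data[i]))[:k]


def approx_top_k(query, data, codes, k, rerank):
    """Stage 1: cheap scan over bit codes. Stage 2: exact rerank of the shortlist."""
    q_code = sign_code(query)
    shortlist = sorted(range(len(codes)), key=lambda i: hamming(q_code, codes[i]))[:rerank]
    return sorted(shortlist, key=lambda i: sq_dist(query, data[i]))[:k]


def recall_at_k(approx, exact):
    """Fraction of the true neighbors that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic Gaussian data, an assumption for the demo
    dim, n, k = 64, 2000, 10
    data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    codes = [sign_code(v) for v in data]
    query = [rng.gauss(0, 1) for _ in range(dim)]
    exact = exact_top_k(query, data, k)
    for rerank in (10, 100, 1000, n):
        approx = approx_top_k(query, data, codes, k, rerank)
        print(f"rerank={rerank:5d}  recall@{k}={recall_at_k(approx, exact):.2f}")
```

With `rerank` equal to the dataset size the shortlist contains everything, so recall is 1.0. Smaller shortlists trade recall for speed, which is the "miss" being discussed. Also note that when exactly k results are returned against k true neighbors, precision and recall are the same number.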