stared 4 days ago

It's nice to hear that! And judging from this thread, it is not just the two of us; otherwise, the title wouldn't have resonated with the Hacker News community.

This blog post stemmed from my frustration that people use cosine distance without a second thought. In virtually all tutorials on vector databases, cosine distance is treated as if it were some obvious ground truth.

When questioned about cosine similarity, even seasoned data scientists will start talking about "the curse of dimensionality" or some geometric interpretation, but forget that (more often than not) they are working with a hack.

  • anArbitraryOne 3 days ago

    Your post was much better than my stupid comment, and I like the points you articulated. Cheers.

nejsjsjsbsb 4 days ago

You called it! But it is a pattern as old as the hills in the software industry. "Just add an index". "Put it in the cloud". "Do sprints". One size fits all!

khafra 4 days ago

That was a helpful list, in your second comment downthread. What are your top 3 metrics that perform the best on the greatest number of those features that make cosine distance perform poorly?

  • anArbitraryOne 3 days ago

    Good question. Unfortunately, I'm just a keyboard warrior asshole who bad-mouths things without offering solutions.