Comment by simonw

Comment by simonw 12 hours ago

9 replies

That's not YAGNI backfiring.

The point of YAGNI is that you shouldn't over-engineer up front until you've proven that you need the added complexity.

If you need vector search against 100,000 vectors and you already have PostgreSQL then pgvector is a great YAGNI solution.

10 million vectors that are changing constantly? Do a bit more research into alternative solutions.

But don't go integrating a separate vector database for 100,000 vectors on the assumption that you'll need it later.

Fripplebubby 10 hours ago

I think the tricky thing here is that the specific things I referred to (real time writes and pushing SQL predicates into your similarity search) work fine at small scale in such a way that you might not actually notice that they're going to stop working at scale. When you have 100,000 vectors, you can write these SQL predicates (return the 5 top hits where category = x and feature = y) and they'll work fine up until one day it doesn't work fine anymore because the vector space has gotten large. So, I suppose it is fair to say this isn't YAGNI backfiring, this is me not recognizing the shape of the problem to come and not recognizing that I do, in fact, need it (to me that feels a lot like YAGNI backfiring, because I didn't think I needed it, but suddenly I do)

  • morshu9001 9 hours ago

    If the consequence of being wrong about the scalability is that you just have to migrate later instead of sooner, that's a win for YAGNI. It's only a loss if hitting this limit later causes service disruption or makes the migration way harder than if you'd done it sooner.

    • simonw 9 hours ago

      And honestly, even then YAGNI might still win.

      There's a big opportunity cost involved in optimizing prematurely. 9/10 times you're wasting your time, and you may have found product-market fit faster if you had spent that time trying out other feature ideas instead.

      If you hit a point where you have to do a painful migration because your product is succeeding that's a point to be celebrated in my opinion. You might never have got there if you'd spent more time on optimistic scaling work and less time iterating towards the right set of features.

      • Fripplebubby 9 hours ago

        I think I see this point now. I thought of YAGNI as, "don't ever over-engineer because you get it wrong a lot of the time" but really, "don't over-engineer out of the gate and be thankful if you get a chance to come back and do it right later". That fits my case exactly, and that's what we did (and it wasn't actually that painful to migrate).

      • morshu9001 8 hours ago

        Yeah the "only if" is more like a "necessary, not sufficient." The future migration pain had better be extremely bad to worry about it so far in advance.

        Or it should be a well defined problem. It's easier to determine the right solution after you've already encountered the problem, maybe in a past project. If you're unsure, just keep your options open.

        • simonw 7 hours ago

          A few years ago I coined the term PAGNI for "Probably Are Gonna Need It" to cover things that are worth putting in there from the start because they're relatively cheap to implement early but quite expensive to add later on: https://simonwillison.net/2021/Jul/1/pagnis/

  • hobofan 9 hours ago

    > When you have 100,000 vectors [...] and they'll work fine

    So 95% of use-cases.