Comment by tacoooooooo 13 hours ago
some fair points on the specifics.
> maintenance_work_mem
sure, but the knob existing doesn't solve the operational challenge of safely allocating GBs of RAM on prod for hours-long index builds.
> REINDEX CONCURRENTLY
this is still not free: it takes longer, needs 2-3x disk space, and still impacts performance.
> HNSW vs B+tree
it's not that graph updates are uniquely expensive. vector workloads have different characteristics than traditional OLTP, and pg wasn't originally designed for them
my broader point: these features exist, but using them correctly requires significant Postgres expertise. my thesis isn't "Postgres lacks features"—it's "most teams underestimate the operational complexity." dedicated vector DBs handle this automatically, and are often going to be much cheaper than the dev time put into maintaining pgvector (esp. for a small team)
> sure, but the knob existing doesn't solve the operational challenge of safely allocating GBs of RAM on prod for hours-long index builds.
How does it not? You should know the amount of freeable memory your DB has, and a rough idea of peak requirements. Give the index build some amount below that.
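As a rough sketch of that arithmetic (the function names, the 20% safety margin, and the example numbers are all hypothetical, not anything from Postgres itself): take freeable memory, subtract the expected peak, hold back a margin, and hand the rest to the build.

```python
def index_build_budget_mb(freeable_mb: int, peak_mb: int, safety_percent: int = 20) -> int:
    """Return a maintenance_work_mem budget in MB, or 0 if there is no headroom.

    freeable_mb:    memory the host could give up (hypothetical input, from your monitoring)
    peak_mb:        rough peak requirement of the normal workload (hypothetical input)
    safety_percent: share of the headroom deliberately left unused (assumed default)
    """
    headroom_mb = freeable_mb - peak_mb
    if headroom_mb <= 0:
        return 0
    # Integer math keeps the result exact and rounds down.
    return headroom_mb * (100 - safety_percent) // 100


def set_statement(budget_mb: int) -> str:
    """Format a session-level SET for the connection that runs the index build."""
    return f"SET maintenance_work_mem = '{budget_mb}MB';"


if __name__ == "__main__":
    budget = index_build_budget_mb(freeable_mb=24_000, peak_mb=14_000)
    print(set_statement(budget))
```

Setting it per session rather than globally means only the connection doing the build gets the larger allowance.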
> this is still not free: it takes longer, needs 2-3x disk space, and still impacts performance.
Yes, those are the trade-offs for not locking the table during the entire build. They’re generally considered acceptable.
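The disk-space side can be checked up front. A minimal sketch, assuming you already know the current index size; the 3x factor is just the upper end of the figure quoted above, not a measured number, and the function name is made up:

```python
def has_rebuild_headroom(index_bytes: int, free_bytes: int, factor: int = 3) -> bool:
    """Return True if free disk space covers `factor` times the current index size.

    factor=3 is an assumption taken from the "2-3x" estimate in the thread;
    the new index is built alongside the old one, so the old copy stays on
    disk until the rebuild finishes.
    """
    if index_bytes < 0 or free_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return free_bytes >= index_bytes * factor


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical numbers: a 40 GiB index with 150 GiB free.
    print(has_rebuild_headroom(index_bytes=40 * gib, free_bytes=150 * gib))
```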
> it's "most teams underestimate the operational complexity."
Agreed, which is why I don’t think dev teams should be running DBs if they lack expertise. Managed solutions (for Postgres; no idea on Pinecone et al.) only remove backup and failover complexity; tuning various parameters and understanding the optimizer’s decisions are still wholly on the human. RDBMS are some of the most complicated pieces of software that exist, and it’s absurd that the hyperscalers pretend that they aren’t.