Comment by tacoooooooo 13 hours ago

some fair points on the specifics.

> maintenance_work_mem

sure, but the knob existing doesn't solve the operational challenge of safely allocating GBs of RAM on prod for hours-long index builds.

> REINDEX CONCURRENTLY

this is still not free: it takes longer, needs 2-3x disk space, and still impacts performance.

> HNSW vs B+tree

it's not that graph updates are uniquely expensive. vector workloads have different characteristics than traditional OLTP, and pg wasn't originally designed for them

my broader point: these features exist, but using them correctly requires significant Postgres expertise. my thesis isn't "Postgres lacks features"—it's "most teams underestimate the operational complexity." dedicated vector DBs handle this automatically, and are often going to be much cheaper than the dev time put into maintaining pgvector (esp. for a small team)

sgarland 8 hours ago

> sure, but the knob existing doesn't solve the operational challenge of safely allocating GBs of RAM on prod for hours-long index builds.

How does it not? You should know the amount of freeable memory your DB has, and a rough idea of peak requirements. Give the index build some amount below that.
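
As a rough illustration of that arithmetic, here is a minimal sketch. The function names, the 25% headroom default, and the example figures are all assumptions made for illustration; only `maintenance_work_mem` and the `SET` statement are real Postgres. It takes the freeable memory, subtracts the expected peak from everything else, and leaves some slack.

```python
def index_build_memory_mb(freeable_mb: int, peak_other_mb: int,
                          headroom: float = 0.25) -> int:
    """Return a maintenance_work_mem budget in MB, or 0 if none is safe.

    freeable_mb   -- memory the host can currently give up (assumed measured)
    peak_other_mb -- rough peak needed by everything else (assumed estimate)
    headroom      -- fraction of the remainder left unused (assumed 25%)
    """
    available = freeable_mb - peak_other_mb
    if available <= 0:
        return 0
    # Stay below the remainder rather than claiming all of it.
    return int(available * (1.0 - headroom))


def set_statement(budget_mb: int) -> str:
    """Build a session-level SET so the change applies only to the build session."""
    return f"SET maintenance_work_mem = '{budget_mb}MB';"


if __name__ == "__main__":
    # Hypothetical numbers: 16 GiB freeable, 6 GiB peak for normal traffic.
    budget = index_build_memory_mb(16384, 6144)
    print(set_statement(budget))
```

Running the `SET` in the session that does the build, rather than changing the server-wide default, keeps the larger value from applying to other maintenance work.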

> this is still not free: it takes longer, needs 2-3x disk space, and still impacts performance.

Yes, those are the trade-offs for not locking the table during the entire build. They’re generally considered acceptable.

> it's "most teams underestimate the operational complexity."

Agreed, which is why I don’t think dev teams should be running DBs if they lack expertise. Managed solutions (for Postgres; no idea on Pinecone et al.) only remove backup and failover complexity; tuning various parameters and understanding the optimizer’s decisions are still wholly on the human. RDBMS are some of the most complicated pieces of software that exist, and it’s absurd that the hyperscalers pretend that they aren’t.