Comment by sschnei8
I feel like this is more an example of:
“We filtered first instead of reading an entire table from disk and performing a lookup”
Where both OLAP and OLTP DBMSs would benefit.
To your point, it’s clear certain workloads lend themselves to OLAP and columnar storage much better, but “an endless sequence of misbehavior” seems a bit harsh.
It's not harsh.
Recent example: 1GB of data in total across all tables, and a query takes 20 minutes. Obviously quadratic, cubic, or even worse behavior.
I disable nested loop joins and it drops to 4 seconds. Still slow, but I don't want to spend time figuring out why it can't just read the 1GB, pipeline the computation, and finish in 1 second or less on the beefy NVMe where the files are stored (ignoring that I actually have good indices, so the surface area of the query is probably 10MB and not 1GB).
How can the planner's strategy be slower than downloading 1GB of data and gluing it together in Python?
Something is just off with the level of abstraction, with the query planner relying on opaque statistics. The whole system, outside of its transactional guarantees, just sucks.
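To be concrete, the fix in that first case was a one-line planner override. A minimal sketch (enable_nestloop is a real PostgreSQL setting; the tables and columns are made up for illustration):

    -- tell the planner not to consider nested loop joins for this session
    SET enable_nestloop = off;

    -- hypothetical query shape: a couple of joins over ~1GB of tables, heavily filtered
    SELECT o.id, sum(i.amount)
    FROM orders o
    JOIN order_items i ON i.order_id = o.id
    WHERE o.created_at >= now() - interval '30 days'
    GROUP BY o.id;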
Another example: materializing a CTE reduces execution time from 2 seconds to 50ms, because that somehow hints to the query planner that the result of the CTE is small.
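In PostgreSQL 12+ the explicit way to do that is the MATERIALIZED keyword on the CTE; a minimal sketch with an invented schema:

    -- force the CTE to be computed once and kept as its own (small) result,
    -- instead of being inlined and re-planned inside the outer query
    WITH recent AS MATERIALIZED (
        SELECT user_id
        FROM events
        WHERE created_at >= now() - interval '1 day'
    )
    SELECT u.*
    FROM users u
    JOIN recent r ON r.user_id = u.id;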
So even PostgreSQL is filled with these endless riddles of misbehavior, even though PhDs boast about who knows what in the query optimizer and make an effort to belittle my criticism by repeating the "1 row" vs "aggregate all rows" distinction, as if I'm in elementary school and don't know how to use OLTP and OLAP systems.
Unlike columnar DBs, where I know it's some nice fused group-by/map/reduce execution, I avoid joins like the plague, and there's no query planner, statistics maintenance, indices, or other mumbo-jumbo that does nothing at all most of the time.
Most of my workloads are extremely tiny, I'm familiar with how to structure schemas for both OLTP and OLAP, and I just dislike how most relational databases work.