Comment by sschnei8
I feel like this is more an example of:
“We filtered first instead of reading an entire table from disk and performing a lookup”
Where both OLAP and OLTP DBMSs would benefit.
To your point, it’s clear certain workloads lend themselves to OLAP and columnar storage much better, but “an endless sequence of misbehavior” seems a bit harsh.
It's not harsh.
Recent example: 1GB of data in total across all tables, and a query takes 20 minutes. Obviously quadratic, cubic, or even worse behavior.
I disable nested loop joins and it drops to 4 seconds. Still slow, but I don't want to spend time figuring out why it can't just read the 1GB, pipeline the computation, and finish in 1 second or less on the beefy NVMe where the files are stored (ignoring that I actually have good indices, so the surface area of the query is probably 10MB and not 1GB).
How can the planner's strategy be slower than downloading 1GB of data and gluing it together in Python?
Something is just off with the level of abstraction, with the query planner relying on opaque statistics. The whole system, outside of its transactional guarantees, just sucks.
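To be concrete, the fix in that first case was a one-line planner override. A minimal sketch (enable_nestloop is a real PostgreSQL setting; the tables and columns are made up for illustration):

    -- tell the planner not to consider nested loop joins for this session
    SET enable_nestloop = off;

    -- hypothetical query shape: a couple of joins over ~1GB of tables, heavily filtered
    SELECT o.id, sum(i.amount)
    FROM orders o
    JOIN order_items i ON i.order_id = o.id
    WHERE o.created_at >= now() - interval '30 days'
    GROUP BY o.id;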
Another example: materializing a CTE reduces execution time from 2 seconds to 50ms, because that somehow hints to the query planner that the result of the CTE is small.
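In PostgreSQL 12+ the explicit way to do that is the MATERIALIZED keyword on the CTE; a minimal sketch with an invented schema:

    -- force the CTE to be computed once and kept as its own (small) result,
    -- instead of being inlined and re-planned inside the outer query
    WITH recent AS MATERIALIZED (
        SELECT user_id
        FROM events
        WHERE created_at >= now() - interval '1 day'
    )
    SELECT u.*
    FROM users u
    JOIN recent r ON r.user_id = u.id;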
So even PostgreSQL is filled with these endless riddles of misbehavior, even though PhDs boast about who knows what in the query optimizer and make an effort to belittle my criticism by repeating the "1 row" vs "aggregate all rows" distinction, as if I'm in elementary school and don't know how to use OLTP and OLAP systems.
Unlike columnar DBs, where I know it's some nice fused group-by/map/reduce execution, I avoid joins like the plague, and there's no query planner, statistics maintenance, indices, or other mumbo-jumbo that does nothing at all most of the time.
Most of my workloads are extremely tiny, I'm familiar with how to structure schemas for both OLTP and OLAP, and I just dislike how most relational databases work.