450× Faster Joins with Index Condition Pushdown

jamesblonde 3 hours ago

We call these pushdown joins in RonDB. They only support an equality condition for the index condition. "Joins with index condition pushdown" is a bit of a mouthful.

We also went from like 6 seconds to 50ms. Huge speedup.

Reference

https://docs.rondb.com/rondb_parallel_query/#pushdown-joins


flufluflufluffy an hour ago

I read their website landing page but it’s still kinda confusing: what exactly is Readyset? It all sounds like it’s a cache you can set up in front of MySQL/Postgres. But then this article is talking about implementing joins, which is what the database itself would do, not a cache. But then the blurbs talk about it like it’s a “CDN for your database” that brings your data to the edge. What the heck is it?!


Sesse__ 44 minutes ago

It seems to be some sort of read-only reimplementation of MySQL/Postgres that can ingest their replication streams and materialize views (for caching). Complete with a really primitive optimizer, if the article is to be believed.


ianks 3 hours ago

I love this type of practical optimization for DB queries. I’ve always liked how [rom-rb](https://rom-rb.org/learn/core/5.2/combines/) made the combine pattern easy to use when joins are slow. Nice to see this implemented at the DB layer.


dangoodmanUT an hour ago

Maybe it's not obvious initially, but in retrospect, this handling of joins feels like the obvious way to handle it.

Push down filters to read the least data possible.

Or, know your data and be able to tell the query engine which kind of join strategy you would like (hash vs. pushdown).
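To make the hash vs. pushdown contrast concrete, here is a minimal Python sketch. It is not Readyset's or any database's actual code: the function names, the toy `users`/`orders` tables, and the dict standing in for an existing index on the right side are all assumptions made up for illustration. It only counts how many right-side rows each strategy has to touch.

```python
from collections import defaultdict


def hash_join(left, right, left_pred, left_key, right_key):
    """Hash join: scan the whole right side to build a table, then probe it."""
    table = defaultdict(list)
    right_rows_read = 0
    for r in right:  # every right-side row is read, matching or not
        right_rows_read += 1
        table[r[right_key]].append(r)
    out = [
        (l, r)
        for l in left
        if left_pred(l)
        for r in table.get(l[left_key], [])
    ]
    return out, right_rows_read


def pushdown_join(left, right_index, left_pred, left_key):
    """Pushdown join: filter the left side first, then look up only the
    matching keys in an index over the right side (hypothetical dict)."""
    out = []
    right_rows_read = 0
    for l in left:
        if not left_pred(l):
            continue
        matches = right_index.get(l[left_key], [])
        right_rows_read += len(matches)  # only matching rows are read
        out.extend((l, r) for r in matches)
    return out, right_rows_read


# Toy data (made up): 100 users, 1000 orders, one user passes the filter.
users = [{"id": i, "country": "NO" if i == 7 else "US"} for i in range(100)]
orders = [{"user_id": i % 100, "amount": i} for i in range(1000)]

# Stand-in for an index on orders.user_id that already exists.
orders_by_user = defaultdict(list)
for o in orders:
    orders_by_user[o["user_id"]].append(o)


def is_norwegian(user):
    return user["country"] == "NO"


hash_out, hash_read = hash_join(users, orders, is_norwegian, "id", "user_id")
push_out, push_read = pushdown_join(users, orders_by_user, is_norwegian, "id")
print(len(hash_out), hash_read)
print(len(push_out), push_read)
```

On this toy data both joins return the same 10 pairs, but the hash join reads all 1000 order rows while the pushdown join reads 10. The gap shrinks as the left-side filter becomes less selective, which is why choosing between the two is normally left to a cost-based optimizer.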


SoftTalker an hour ago

Decades ago we used to provide hints in queries based on "knowing the data" but modern optimizers have a lot better statistics on indexes, and the need to tell the query optimizer what to do should be rare.


bdcravens 3 hours ago

What database engine is this in? You reference your product, but I assume this is in MySQL/MariaDB?

https://dev.mysql.com/doc/refman/9.4/en/index-condition-push...


Sesse__ 3 hours ago

This isn't really the same as MySQL's ICP; it seems more like what MySQL would call a “ref” or “eq_ref” lookup, i.e. a simple lookup on an indexed value on the right side of a nested-loop join. It's bread and butter for basically any database optimizer.
ICP in MySQL (which can be built on top of ref/eq_ref, but isn't part of the normal index lookup per se) is a fairly weird concept where the storage engine is told to evaluate certain predicates on its own, without returning the row to the executor. This is to a) reduce the number of round-trips (function calls) from the executor down into the storage engine, and b) avoid the extra storage round-trip that InnoDB's secondary indexes need to return the row (secondary indexes don't point at the row you want; they contain the primary key, and then you have to look up the actual row from the PK), so if you can reject the row early, you can skip the main row lookup.
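A rough Python sketch of point b), under loud assumptions: the `primary` dict, the `secondary` list of `(zipcode, lastname, pk)` tuples, and the `lookup` function are invented for illustration and are not how InnoDB is implemented. The query modeled is "zipcode equals X and lastname contains Y", with a secondary index covering both columns. The sketch counts how many primary-key lookups happen with and without evaluating the `lastname` predicate on the index entry first.

```python
# Toy "clustered index": primary key -> full row (made-up data).
primary = {
    1: {"zipcode": "95054", "lastname": "Petrunia", "city": "Santa Clara"},
    2: {"zipcode": "95054", "lastname": "Smith", "city": "Santa Clara"},
    3: {"zipcode": "95054", "lastname": "Jones", "city": "Santa Clara"},
    4: {"zipcode": "10001", "lastname": "Petrunia", "city": "New York"},
}

# Toy secondary index on (zipcode, lastname); each entry carries the
# primary key rather than a pointer to the row.
secondary = sorted((r["zipcode"], r["lastname"], pk) for pk, r in primary.items())


def lookup(zipcode, lastname_part, use_icp):
    """Return matching rows and the number of primary-key lookups made."""
    pk_lookups = 0
    result = []
    for zc, lastname, pk in secondary:
        if zc != zipcode:
            continue  # a real B-tree would seek straight to this key range
        if use_icp and lastname_part not in lastname:
            # Pushed-down condition: rejected using index columns only,
            # so the full row is never fetched.
            continue
        row = primary[pk]  # the extra round-trip via the primary key
        pk_lookups += 1
        if lastname_part in row["lastname"]:  # executor re-checks the row
            result.append(row)
    return result, pk_lookups


rows_plain, lookups_plain = lookup("95054", "etrunia", use_icp=False)
rows_icp, lookups_icp = lookup("95054", "etrunia", use_icp=True)
print(lookups_plain, lookups_icp)
```

Both calls return the same single row; without the pushed-down condition all three `95054` entries trigger a primary-key lookup, with it only one does.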

LtdJorge 2 hours ago

Seems like they are caching MySQL with their own layer built on RocksDB.


vjerancrnjak 2 hours ago

Another example of row-based DBs somehow being insanely slow compared to column-based ones.

It's just an endless sequence of misbehavior, and we wave it off with "rows work well for specific lookups, columns for aggregations," yet here it is all the other stuff that is unreasonably slow.


tharkun__ 2 hours ago

It's an example. But not of that.
It's an example of old things being new again maybe. Or reinventing the wheel because the wheel wasn't known to them.
Yes, I know, nobody wants to pay that tax or make that guy richer, but databases like Oracle have had JPPD (join predicate pushdown) for a long time. It's just something the database does, and the optimizer chooses whether or not to do it depending on whether it's the best plan.

- rotis an hour ago
  
  Exactly. This is a basic optimization technique and all the dinosaur-era databases should have it. But if you build a new database product you have to implement these techniques from scratch. There is no way to shortcut that. Reminds me of CockroachDB and them building a query optimizer[1]. They started with a rule-based one and then switched to a cost-based one, a feature that older databases already had.
  [1] https://www.cockroachlabs.com/blog/building-cost-based-sql-o...
  
sschnei8 2 hours ago

I feel like this is more an example of:
“We filtered first instead of reading an entire table from disk and performing a lookup”
Both OLAP and OLTP DBMSs would benefit from that.
To your point, it’s clear certain workloads lend themselves to OLAP and columnar storage much better, but “an endless sequence of misbehavior” seems a bit harsh.


marceloaltmann 4 days ago

Straddled joins were still a bottleneck in Readyset even after switching to hash joins. By integrating Index Condition Pushdown into the execution path, we eliminated the inefficiency and achieved up to 450× speedups.


LtdJorge 2 hours ago

Why downvote?

- airstrike 2 hours ago
  
  Reads like an ad written by an LLM, is my guess.
  It could just be that they translated from their original language to English and got that as a byproduct. Many such cases.
  
  
  Sesse__ 37 minutes ago
  
  It also does not add anything interesting to the discussion. Like, why add a bland summary of the article?
  