flufluflufluffy an hour ago

I read their website landing page but it's still kinda confusing: what exactly is Readyset? It sounds like a cache you can set up in front of MySQL/Postgres. But then this article talks about implementing joins, which is what the database itself would do, not a cache. And then the blurbs describe it as a "CDN for your database" that brings your data to the edge. What the heck is it?!

  • Sesse__ 44 minutes ago

    It seems to be some sort of read-only reimplementation of MySQL/Postgres that can ingest their replication streams and materialize views (for caching). Complete with a really primitive optimizer, if the article is to be believed.
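A toy sketch of the "ingest the replication stream and materialize a view" idea described above. Everything here is an assumption for illustration: the event format, the `CountByKeyView` class and the single `COUNT(*) ... GROUP BY` view are hypothetical and are not Readyset's actual design or API.

```python
from collections import defaultdict


class CountByKeyView:
    """Hypothetical materialized view: SELECT key, COUNT(*) ... GROUP BY key,
    kept up to date from a stream of row-level change events."""

    def __init__(self, key_column):
        self.key_column = key_column
        self.counts = defaultdict(int)

    def apply(self, event):
        # event is a hypothetical ("insert" | "delete", row_dict) pair,
        # standing in for a decoded replication/binlog record.
        op, row = event
        key = row[self.key_column]
        if op == "insert":
            self.counts[key] += 1
        elif op == "delete":
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]
        else:
            raise ValueError(f"unsupported op: {op}")

    def read(self, key):
        # Served from the view; no query goes to the upstream database.
        return self.counts.get(key, 0)


stream = [
    ("insert", {"id": 1, "story_id": 10}),
    ("insert", {"id": 2, "story_id": 10}),
    ("insert", {"id": 3, "story_id": 11}),
    ("delete", {"id": 2, "story_id": 10}),
]
view = CountByKeyView("story_id")
for ev in stream:
    view.apply(ev)
print(view.read(10), view.read(11), view.read(12))  # 1 1 0
```

In this sketch an UPDATE would arrive as a delete of the old row followed by an insert of the new one.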

dangoodmanUT an hour ago

Maybe it's not obvious initially, but in retrospect, this way of handling joins feels like the obvious approach.

Push down filters to read the least data possible.

Or, know your data and be able to tell the query engine which join strategy you want (hash join vs. pushdown).
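A minimal sketch of the two strategies mentioned above on hypothetical toy tables: a hash join that reads the whole inner table to build its hash table, versus pushing each outer row's join key down as an index lookup (the nested-loop "ref" lookup discussed further down). The tables, the prebuilt `orders_by_user` dict standing in for a secondary index, and the row counters are all made up for illustration.

```python
# Hypothetical toy tables: users(id, name) and orders(id, user_id, total).
users = [{"id": i, "name": f"user{i}"} for i in range(1, 1001)]
orders = [{"id": i, "user_id": (i % 1000) + 1, "total": i} for i in range(1, 5001)]

# Stand-in for a secondary index on orders.user_id, built ahead of time.
orders_by_user = {}
for o in orders:
    orders_by_user.setdefault(o["user_id"], []).append(o)


def hash_join(left_rows, orders):
    # Read every order to build the hash table, then probe it with the left rows.
    table = {}
    for o in orders:
        table.setdefault(o["user_id"], []).append(o)
    out = [(u["name"], o["total"]) for u in left_rows for o in table.get(u["id"], [])]
    return out, len(orders)  # rows read from orders


def pushed_down_join(left_rows, index):
    # Push each left row's join key down as an index lookup on orders.user_id.
    out, rows_read = [], 0
    for u in left_rows:
        matches = index.get(u["id"], [])
        rows_read += len(matches)
        out.extend((u["name"], o["total"]) for o in matches)
    return out, rows_read


# WHERE users.id = 42: the filter leaves a single left row.
left = [u for u in users if u["id"] == 42]
hj, hj_read = hash_join(left, orders)
pj, pj_read = pushed_down_join(left, orders_by_user)
print(hj == pj, hj_read, pj_read)  # True 5000 5
```

With an unfiltered left side (all 1,000 users), the pushed-down version does 1,000 index lookups instead of one sequential scan, which is where a hash join usually wins; picking between the two is the "know your data" part.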

  • SoftTalker an hour ago

    Decades ago we used to provide hints in queries based on "knowing the data", but modern optimizers have much better statistics on indexes, and the need to tell the query optimizer what to do should be rare.

bdcravens 3 hours ago

What database engine is this in? You reference your product, but I assume this is in MySQL/MariaDB?

https://dev.mysql.com/doc/refman/9.4/en/index-condition-push...

  • Sesse__ 3 hours ago

    This isn't really the same as MySQL's ICP; it seems more like what MySQL would call a “ref” or “eq_ref” lookup, i.e. a simple lookup on an indexed value on the right side of a nested-loop join. It's bread and butter for basically any database optimizer.

    ICP in MySQL (which can be built on top of ref/eq_ref, but isn't part of the normal index lookup per se) is a fairly weird concept where the storage engine is told to evaluate certain predicates on its own, without returning the row to the optimizer. This is a) to reduce the number of round-trips (function calls) from the executor down into the storage engine, and b) because InnoDB's secondary indexes need an extra storage round-trip to return the row (secondary indexes don't point at the row you want; they contain the primary key, and you then have to look up the actual row by PK), so if you can reject the row early, you can skip the main row lookup.
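A toy model of point b) above, assuming an InnoDB-like layout: a clustered index maps the primary key to the full row, and a secondary index on (zipcode, lastname) stores only the indexed columns plus the PK. The query shape is loosely modeled on the example in the MySQL ICP docs; the data, the function names and the lookup counter are hypothetical, and the linear pass over `secondary` stands in for an index range scan.

```python
# Hypothetical clustered index: primary key -> full row.
clustered = {
    1: {"id": 1, "zipcode": "95054", "lastname": "Petrunia", "address": "a"},
    2: {"id": 2, "zipcode": "95054", "lastname": "Smith", "address": "b"},
    3: {"id": 3, "zipcode": "95054", "lastname": "Jones", "address": "c"},
    4: {"id": 4, "zipcode": "10001", "lastname": "Petrunia", "address": "d"},
}
# Hypothetical secondary index on (zipcode, lastname): entries carry the PK, not the row.
secondary = sorted((r["zipcode"], r["lastname"], pk) for pk, r in clustered.items())


def lastname_matches(name):
    return "etrunia" in name  # stands in for lastname LIKE '%etrunia%'


def query(zipcode, use_icp):
    pk_lookups, out = 0, []
    for zc, lastname, pk in secondary:
        if zc != zipcode:  # the part of the WHERE clause the index range can use
            continue
        if use_icp and not lastname_matches(lastname):
            continue  # ICP: reject using only the index entry, skipping the PK lookup
        pk_lookups += 1
        row = clustered[pk]  # the extra round-trip back to the clustered index
        # In this sketch the full predicate is re-checked on the fetched row in both modes.
        if lastname_matches(row["lastname"]):
            out.append(row["id"])
    return out, pk_lookups


print(query("95054", use_icp=False), query("95054", use_icp=True))  # ([1], 3) ([1], 1)
```

Both modes return the same rows; ICP only changes how many main-row lookups are needed to get there.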

  • LtdJorge 2 hours ago

    Seems like they are caching MySQL with their own layer built on RocksDB.

vjerancrnjak 2 hours ago

Another example of row-based DBs somehow being insanely slow compared to column-based ones.

It's an endless sequence of misbehavior, and we keep waving it off with "rows are good for specific lookups, columns for aggregations", yet here is all the other stuff that is unreasonably slow.

  • tharkun__ 2 hours ago

    It's an example. But not of that.

    It's an example of old things being new again maybe. Or reinventing the wheel because the wheel wasn't known to them.

    Yes, I know, nobody wants to pay that tax or make that guy richer, but databases like Oracle have had JPPD (join predicate pushdown) for a long time. It's just something the database does, and the optimizer decides whether to use it depending on whether it's the cheapest plan.
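A rough sketch of what join predicate pushdown buys, using a grouped derived table (per-customer order totals) joined to a filtered outer table. The tables, the index dict and the row counters are hypothetical, and this is a plain-Python illustration of the idea, not how Oracle implements JPPD.

```python
from collections import defaultdict

# Hypothetical tables: 1,000 customers (10 in "NO"), 20,000 orders.
customers = [{"id": i, "country": "NO" if i % 100 == 0 else "US"} for i in range(1, 1001)]
orders = [{"customer_id": (i % 1000) + 1, "amount": 10} for i in range(20_000)]

# Hypothetical index on orders.customer_id.
orders_by_customer = defaultdict(list)
for o in orders:
    orders_by_customer[o["customer_id"]].append(o)


def without_jppd():
    # Materialize the whole derived table (SUM(amount) GROUP BY customer_id)
    # over every order, then join it to the filtered customers.
    totals, rows_read = defaultdict(int), 0
    for o in orders:
        rows_read += 1
        totals[o["customer_id"]] += o["amount"]
    result = {c["id"]: totals[c["id"]] for c in customers if c["country"] == "NO"}
    return result, rows_read


def with_jppd():
    # Push the join predicate (customer_id = c.id) into the derived table,
    # so it is evaluated once per qualifying outer row via the index.
    result, rows_read = {}, 0
    for c in customers:
        if c["country"] != "NO":
            continue
        matches = orders_by_customer.get(c["id"], [])
        rows_read += len(matches)
        result[c["id"]] = sum(o["amount"] for o in matches)
    return result, rows_read


a, ra = without_jppd()
b, rb = with_jppd()
print(len(a), ra, rb, a == b)  # 10 20000 200 True
```

Whether the pushed-down form is cheaper depends on how many outer rows survive the filter, which is why the optimizer treats it as a costed choice rather than always applying it.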

    • rotis an hour ago

      Exactly. This is a basic optimization technique, and all the dinosaur-era databases should have it. But if you build a new database product, you have to implement these techniques from scratch; there is no shortcut. It reminds me of CockroachDB building a query optimizer[1]. They started with a rule-based one and then switched to a cost-based one, a feature that older databases already had.

      [1] https://www.cockroachlabs.com/blog/building-cost-based-sql-o...
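A minimal sketch of the rule-based vs. cost-based difference, applied to the hash-join-or-index-lookup choice discussed earlier in the thread. The `TableStats` fields, the cost formulas and the unit costs are invented for illustration; real optimizers (CockroachDB's included) use far richer models.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    rows: int           # estimated row count of the inner table
    distinct_keys: int  # estimated distinct values in its join column


# Made-up unit costs, for illustration only.
SEQ_ROW_COST = 1.0      # reading one row in a scan, or one hash probe
INDEX_PROBE_COST = 4.0  # one index descent


def hash_join_cost(outer_rows, inner):
    # Scan the inner table once to build the hash table, then probe once per outer row.
    return inner.rows * SEQ_ROW_COST + outer_rows * SEQ_ROW_COST


def index_nl_cost(outer_rows, inner):
    # One index descent per outer row, plus reading the matching inner rows.
    matches_per_key = inner.rows / inner.distinct_keys
    return outer_rows * (INDEX_PROBE_COST + matches_per_key * SEQ_ROW_COST)


def choose_rule_based(has_index):
    # Fixed rule: use the index whenever one exists, regardless of row counts.
    return "index_nested_loop" if has_index else "hash_join"


def choose_cost_based(outer_rows, inner):
    # Compare estimated costs, which depend on the statistics.
    if index_nl_cost(outer_rows, inner) < hash_join_cost(outer_rows, inner):
        return "index_nested_loop"
    return "hash_join"


inner = TableStats(rows=1_000_000, distinct_keys=100_000)
for outer_rows in (10, 500_000):
    print(outer_rows, choose_rule_based(True), choose_cost_based(outer_rows, inner))
# 10 index_nested_loop index_nested_loop
# 500000 index_nested_loop hash_join
```

The rule never changes its answer, while the cost-based choice flips once the outer side is large enough that per-row index probes cost more than a single scan.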

  • sschnei8 2 hours ago

    I feel like this is more an example of:

    “We filtered first instead of reading an entire table from disk and performing a lookup”

    Where both OLAP and OLTP dbms would benefit.

    To your point, it's clear certain workloads lend themselves much better to OLAP and columnar storage, but "an endless sequence of misbehavior" seems a bit harsh.

marceloaltmann 4 days ago

Straddled joins were still a bottleneck in Readyset even after switching to hash joins. By integrating Index Condition Pushdown into the execution path, we eliminated the inefficiency and achieved up to 450× speedups.

  • LtdJorge 2 hours ago

    Why downvote?

    • airstrike 2 hours ago

      Reads like an ad written by an LLM, is my guess.

      It could just be that they translated from their original language to English and got that as a byproduct. Many such cases.

      • Sesse__ 37 minutes ago

        It also does not add anything interesting to the discussion. Like, why add a bland summary of the article?