Comment by rnikander

Comment by rnikander a day ago

7 replies

Some Clojure fans once told me they thought datalog was better than SQL and it was a shame that the relational DBs all used SQL. I never dug into it enough to find out why they thought that way.

kragen 17 hours ago

Basically Datalog is much less verbose than SQL, imposes much lighter tariffs on factoring out views, and supports transitive closure enormously better. I started http://canonical.org/~kragen/binary-relations off with a simple nonrecursive query for which the SQL translation (given below) is already criminal and whose properly factored SQL solution merits the death penalty.

Recent additions to ANSI SQL have added the capacity for recursion, so it's no longer completely impossible. But they have three big disadvantages:

1. They accidentally made SQL Turing-complete. Datalog queries, by contrast, are guaranteed to terminate.

2. They're still extremely clumsy to use.

3. Because of #1, they're often not implemented fully, so they are hard to rely on.

  • twoodfin 27 minutes ago

    Curious: How would you usefully & naturally add recursion to SQL without making it Turing-complete?

  • ulrikrasmussen 12 hours ago

    Yes, #1 basically means that they screwed up the design from the get go, since it is impossible to reap the actual benefits of Datalog when the language you evaluate is not, in fact, Datalog. Recursive queries have the ability to perform arbitrary computation in projections, so for starters any top-down evaluation strategy or hybrid evaluation such as magic sets is ruled out.

jitl 19 hours ago

I struggle to understand the Clojure/Datomic dialect, but I agree generally. I recommend Percival for playing around with Datalog in a friendly notebook environment online: https://percival.ink/

Although there’s no “ANSI SQL” equivalent standard across Datalog implementations, once you get a hang of the core idea it’s not too hard to understand another Datalog.

I started a Percival fork that compiles the Datalog to SQLite, if you want to check out how the two can express the same thing: https://percival.jake.tl/ (unfinished when it comes to aggregates and more advanced joins but the basic forms work okay). Logica is a much more serious / complete Datalog->SQL compiler written by a Google researcher that compiles to BigTable, DuckDB, and a few other SQL dialects (https://logica.dev/).

One area Datalog is an order of magnitude easier is when working with recursive queries / rules; this is possible in SQL but feels a bit like drinking playdough through a straw. Frank’s Materialize.com has a “WITH MUTUALLY RECURSIVE” SQL form (https://materialize.com/blog/recursion-in-materialize/) that’s much nicer than the ancient ANSI SQL recursive approach, we’re evaluating it for page load queries & data sync at Notion.

Feldera has a similar form for recursive views as well (https://www.feldera.com/blog/recursive-sql-queries-in-felder...). I like that Feldera lets you make each “rule” or subview its own statement rather than needing to pack everything into a single huge statement. Main downside I found when testing Feldera is that their SQL dialect has a bunch of limitations inherited from Apache Calcite, the Materialize SQL dialect tries very hard to be PostgresSQL compatible.

  • ben_pfaff 17 hours ago

    > Main downside I found when testing Feldera is that their SQL dialect has a bunch of limitations inherited from Apache Calcite

    At Feldera, we're adding features to our SQL over time, by contributing them upstream to Calcite, making it better for everyone. Mihai Budiu, who is the author of the Feldera SQL compiler, is a Calcite committer.

    • jitl 14 hours ago

      Thanks for contributing. I see Mihai implemented the UUID type in Calcite (https://issues.apache.org/jira/browse/CALCITE-6738) back in January which is one of the issues I hit, so for sure my experience with Feldera is 6 months out of date and y'all move pretty quick.

      Most of what I mean is places where Feldera/Calcite has slightly different syntax from Postgres for things. For example, Postgres syntax for cast to bigint is `some_expresion::bigint` although Postgres also supports ANSI SQL `CAST(some_expression AS bigint)`, most examples I find in the wild and in my own Postgres SQL use the Postgres special syntax. JSON syntax also differs; Feldera uses its own pretty elegant `VARIANT` type and `some_expression[key_expression]` to access properties, where Postgres calls this `json` or `jsonb`, and uses `some_expression->key_expression` to access properties. In those cases it's not like Feldera is wrong or lacks some support, but it's a bit harder to work with for me because I'm so used to Postgres syntax and I need to do some regex replace whenever I bring a query from Postgres over to Feldera.

      Definitely not a deal-breaker, I am a Feldera enjoyer, but it does add some friction.

      • lsuresh 5 hours ago

        Thanks for the kind words. :) We hear you on the dialect differences.

        An interesting case of a user dealing with this problem: they use LLMs to mass migrate SparkSQL code over to Feldera (it's often json-related constructs as you also ran into). They then verify that both their original warehouse and Feldera compute the same results for the same inputs to ensure correctness.