philzook 4 days ago

Nice!

I'll note there is a really shallow version of naive datalog that I rather like, if you're willing to compromise on syntax and on nonlinear variable use.

   edge = {(1,2), (2,3)}
   path = set()
   for i in range(10):
       # path(x,y) :- edge(x,y).
       path |= edge
       # path(x,z) :- edge(x,y), path(y,z).
       path |= {(x,z) for x,y in edge for (y1,z) in path if y == y1}

Similarly, it's pretty easy to hand-write SQL in the same style and get a lot of functionality and performance out of stock database engines. https://www.philipzucker.com/tiny-sqlite-datalog/
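
A rough sketch of the style (not the exact code from the post): each rule becomes an INSERT OR IGNORE ... SELECT, a UNIQUE constraint gives you set semantics, and you rerun the inserts until no new rows show up. Roughly:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE edge (x INTEGER, y INTEGER);
        CREATE TABLE path (x INTEGER, y INTEGER, UNIQUE (x, y));
        INSERT INTO edge VALUES (1, 2), (2, 3);
    """)

    while True:
        before = con.total_changes
        # path(x,y) :- edge(x,y).
        con.execute("INSERT OR IGNORE INTO path SELECT x, y FROM edge")
        # path(x,z) :- edge(x,y), path(y,z).
        con.execute("""INSERT OR IGNORE INTO path
                       SELECT edge.x, path.y FROM edge JOIN path ON edge.y = path.x""")
        if con.total_changes == before:  # no new facts: fixed point reached
            break

    print(con.execute("SELECT * FROM path ORDER BY x, y").fetchall())
    # [(1, 2), (1, 3), (2, 3)]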

I recently wrote a small datalog along these lines that goes from the Z3 AST to sqlite: https://github.com/philzook58/knuckledragger/blob/main/kdrag...

richard_shelton 3 days ago

Or you can use Python AST + Z3 :) Here is a toy implementation:

https://github.com/true-grue/python-dsls/blob/main/datalog/d...

  • philzook 3 days ago

    Love it! I was trying to use Python as a DSL frontend to Z3 in a different way: https://github.com/philzook58/knuckledragger/blob/ecac7a568a... (it probably has to be done syntactically rather than by overloading in order to make `if`, `and`, etc. work). Still not sure if it is ultimately a good idea. I think it's really neat how you're abusing `,` and `:` in these examples.
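
    The core problem with pure overloading, in a toy snippet with nothing Z3-specific: you can overload `&` to build an AST, but `and` and `if` go through `__bool__`, which demands a concrete answer, so a syntactic rewrite is the only way to capture them.

        class Term:
            def __init__(self, name):
                self.name = name
            def __and__(self, other):
                # `p & q` can be overloaded to build an AST node
                return Term(f"(and {self.name} {other.name})")
            def __bool__(self):
                # `p and q` and `if p:` call bool(p), so overloading never sees them
                raise TypeError("symbolic term used in a boolean context")

        p, q = Term("p"), Term("q")
        print((p & q).name)      # (and p q)
        try:
            p and q              # Python forces bool(p) before `and` can be intercepted
        except TypeError as e:
            print(e)             # symbolic term used in a boolean context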

    • richard_shelton 2 days ago

      In some parallel world, Python is the perfect tool for language-oriented programming. Any decorated function is compiled by its own DSL compiler into the internal representation of the corresponding solver, and functions communicate with each other using rich Python data structures :)

sirwhinesalot 3 days ago

If I ever get around to writing my own at least somewhat serious Datalog engine, I definitely want to add a "translate to SQL" capability. Your work looks like the perfect inspiration, thanks!

(Also added a link to your article on what you can do with Datalog, excellent stuff, couldn't have written it better myself)

  • philzook 3 days ago

    Thanks! I wouldn't say it was so much written as vomited out in a year of enthusiasm, but I'm glad it has some value.

ulrikrasmussen 3 days ago

Thank you! I have been searching for something like this but for some reason couldn't find your work.

I am currently implementing a Datalog-to-PostgreSQL query engine at work, as we want to experiment with modeling authorization rules in Datalog and then running authorization queries directly in the database. Since I want to minimize round trips to the database, I use a different approach from yours and translate Datalog programs to recursive CTEs. These are a bit limited, so we have to restrict ourselves to linearly recursive Datalog programs, but for the purpose of modeling authorization rules that seems to be enough (e.g. you can still model things such as "permissions propagate from groups to group members").
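
To give the flavor (an illustrative toy schema, not our actual one, and SQLite here instead of PostgreSQL for brevity): the rules perm(U,P) :- grants(U,P) and perm(U,P) :- member(U,G), perm(G,P) are linearly recursive, one recursive subgoal each, so they map directly onto a single recursive CTE:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE member (child TEXT, parent TEXT);  -- a user or group belongs to a group
        CREATE TABLE grants (who TEXT, p TEXT);         -- a permission granted to a user or group
        INSERT INTO member VALUES ('alice', 'devs'), ('devs', 'staff');
        INSERT INTO grants VALUES ('staff', 'read'), ('devs', 'deploy');
    """)

    rows = con.execute("""
        WITH RECURSIVE perm(who, p) AS (
            SELECT who, p FROM grants            -- perm(U,P) :- grants(U,P).
            UNION
            SELECT member.child, perm.p          -- perm(U,P) :- member(U,G), perm(G,P).
            FROM member JOIN perm ON member.parent = perm.who
        )
        SELECT * FROM perm WHERE who = 'alice' ORDER BY p
    """).fetchall()
    print(rows)  # [('alice', 'deploy'), ('alice', 'read')]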

  • philzook 3 days ago

    My suspicion is that, if you can get away with it, recursive CTEs would be more performant than doing the datalog iteration query by query. AFAIK, for general rules the latter is the only option though. I had a hacky scheme in that post to do seminaive using timestamps, but it was ugly. I recently came across this new feature in duckdb (https://duckdb.org/2025/05/23/using-key.html) and was wondering if there is some way to use it to make a nice datalog. Anything that adds expressiveness to recursive CTEs is a possibility in that direction.

    • ulrikrasmussen a day ago

      Yes, I think so too, if only for the reduced number of round trips to the database. The evaluation strategy that PostgreSQL employs for CTEs is essentially a degenerate form of seminaive evaluation, where it only has to consider the delta for one subgoal in each rule because it is restricted to linearly recursive programs with no mutual recursion. The absence of mutual recursion also means that PostgreSQL can simply stratify the CTE and compute a fixed point using the degenerate seminaive algorithm for each stratum. Once a stratum is done, its IDB can be treated as ground facts in the next, and so on.
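
      A sketch of the seminaive idea for the path example from the top comment, in plain Python: only the facts derived in the previous round (the delta) are joined against edge, with a slightly longer chain here so the deltas are visible.

          edge = {(1, 2), (2, 3), (3, 4)}
          path = set(edge)        # path(x,y) :- edge(x,y).
          delta = set(edge)       # facts derived in the previous round
          while delta:
              # path(x,z) :- edge(x,y), path(y,z), joining edge only against the delta
              new = {(x, z) for (x, y) in edge for (y1, z) in delta if y == y1}
              delta = new - path  # keep only genuinely new facts
              path |= delta
          print(sorted(path))
          # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]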

      I am really unhappy about the design of CTEs; frankly, I think it is a clunky hack designed by someone who was not aware of all the research on Datalog evaluation, about which Ullman and others had already written excellent textbooks at the time. CTEs are simultaneously too powerful (Turing-completeness in a declarative query language is a sure sign that the designers screwed up) and too restricted, because being stuck in the Turing tarpit rules out so many optimization techniques.

      I didn't know about DuckDB's approach, but it seems to be a way to mark your relations as moded and exploit that during the evaluation. It appears to improve performance, but unfortunately the design it builds on is inherently flawed. I wish SQL would get a proper pure Datalog fragment in the future.

  • whitten 3 days ago

    What does CTE stand for, and how do I research it?

    • burakemir 3 days ago

      Common Table Expression, a SQL feature for writing more expressive queries. CTEs are introduced using WITH ...
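
      A minimal example, the path program from the top comment written as a recursive CTE (sketched from Python against an in-memory SQLite database):

          import sqlite3

          con = sqlite3.connect(":memory:")
          con.execute("CREATE TABLE edge (x, y)")
          con.executemany("INSERT INTO edge VALUES (?, ?)", [(1, 2), (2, 3)])
          print(con.execute("""
              WITH RECURSIVE path(x, y) AS (
                  SELECT x, y FROM edge          -- path(x,y) :- edge(x,y).
                  UNION
                  SELECT edge.x, path.y          -- path(x,z) :- edge(x,y), path(y,z).
                  FROM edge JOIN path ON edge.y = path.x
              )
              SELECT * FROM path ORDER BY x, y
          """).fetchall())
          # [(1, 2), (1, 3), (2, 3)]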