Comment by kragen

Comment by kragen 6 months ago

4 replies

This sounds very interesting! I'll have to take a look.

I am always worried about posting comments like mine because often people get defensive when I try to engage, as I see it, on substance. Responses like yours make it all worthwhile!

deosjr 6 months ago

I appreciate it; this kind of exchange is exactly why I read HackerNews. If you have any good sources on extending Datalog to N-ary relations, I'd love to know. Just had a look at the implementation I based mine on and it exclusively talks about triples: https://www.instantdb.com/essays/datalogjs

Coming from Prolog I'd like to get closer to the original if possible :)

  • thesz 6 months ago

    They use triples as triplets can represent any n-tuple facts.

    E.g., if you have a fact id=(a,b,c,d), you can record triples (id, 1, a), (id, 2, b), (id, 3, c) and (id, 4, d) and reconstruct original fact.

    Look at it as columnar storage in databases.

    Then, if your query only needs a third value from a 4-tuple facts, you can get only those, ignoring first, second and fourth values. This is what columnar storage engines do.

    In fact, I read that one of most efficient datalog engines use relational query execution under the hood.

    Take a look here: https://github.com/philzook58/awesome-egraphs

    The paper you'll most probably find interesting is "Better Together: Unifying Datalog and Equality Saturation," but there are many others interesting things there.

    • deosjr 6 months ago

      Cheers, this is super useful. I will have to do some reading. Being able to build up n-ary predicates using triples that way makes a lot of sense.

  • kragen 6 months ago

    Datalog always supports N-ary relations! It's not an extension.

    The Wikipedia article recommends https://search.worldcat.org/title/30546436 "Foundations of Databases" by Abiteboul, Hull, and Vianu, from 01995, and https://archive.org/details/logicdatabases0000symp/page/n5/m... "Logic and Data Bases [sic]" by Gallaire and Minker from 01978. Some poking at Google Scholar also turns up, in rough order of how promising they look (without having read them that I can recall):

    https://dl.acm.org/doi/pdf/10.1145/6012.15399 "Magic Sets and Other Strange Ways to Implement Logic Programs", Bancilhon, Maier, Sagiv, & Ullman (yes, that Ullman), 01985 (15 pp.)

    https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d... "What You Always Wanted to Know About Datalog (And Never Dared to Ask)", Ceri, Gottlob, and Tanca, 01989 (21 pp.)

    https://web.cecs.pdx.edu/harry/earley/datalog.pdf "Optimizations to Earley Deduction for DATALOG Programs", Porter, 01985 (12 pp.)

    https://dl.acm.org/doi/pdf/10.1145/308386.308420 "Optimizing Existential Datalog Queries", Ramakrishhnan, Beers, & Krishnamurthy, 01988 (14 pp.)

    https://dl.acm.org/doi/pdf/10.1145/298514.298542 "On the Expressive Power of Datalog: Tools and a Case Study", Kolaitis & Vardi, 01990

    https://dl.acm.org/doi/pdf/10.1145/93605.98724 "A Framework for the Parallel Processing of Datalog Queries", Ganguly, Silberschatz, & Tsur, 01990

    https://deepblue.lib.umich.edu/bitstream/handle/2027.42/3116... "Datalog vs First-Order Logic", Ajtai & Gurevich, 01993

    https://lat.inf.tu-dresden.de/teaching/ss2014/Seminar/Papers... "Equivalence of Datalog Queries is Undecidable", Shmueli, 01993

    It also turned up "Portability of Syntax and Semantics in Datalog" which turned out to be an unrelated NLP AI system called Datalog.

    Bancilhon, Maier, Sagiv, & Ullman give as their reference for Datalog "Maier and Warren [1985]", which turns out to be "D. Maier and D. S. Warren [1985]. Introduction to Logic Programming, unpublished memorandum, Oregon Graduate Center," which I can't find a copy of easily. But given that Maier is a shared author we can probably trust their summary of what Datalog is.

    Ceri, Gottlob, and Tanca reference "[120], [15], [16]," which are respectively:

    J. D. Ullman, “Implementation of logic query languages for databases,” ACM Trans. Database Syst., vol. 10, no. 3, 1985

    F. Bancilhon and R. Ramakrishnan, “An amateur’s introduction to recursive query processing,” in Proc. ACM-SIGMOD Conf., May 1986.

    -, “Performance evaluation of data intensive logic programs,” in Foundations of Deductive Databases and Logic Programming, J. Minker, Ed. Washington, DC, 1986.

    The Ullman paper is https://dl.acm.org/doi/pdf/10.1145/3979.3980, "Implementation of Logical Query Languages for Databases", ACM Transactions on Database Systems, Vol. 10, No. 3, September 1985, Pages 289-321 (33 pp.). Ceri, Gottlob, and Tanca screwed up the title. https://dl.acm.org/doi/10.1145/971699.320000 probably isn't it; the journal name, volume number, and issue number don't match, although it's the right author and year. That seems to be the oldest published Datalog paper, although the word "Datalog" hadn't been invented yet and doesn't appear in the paper.

    I think I'm going to read the Bancilhon, Maier, Sagiv, & Ullman paper first, because it's shorter and has a more readable-sounding title, and then maybe Ceri, Gottlob, and Tanca, and then maybe Ullman, and then maybe a relevant chapter or two of Gallaire and Minker.