Comment by cryptonector

Comment by cryptonector 3 days ago

0 replies

I've been thinking about building a graph DB in PG with a DB schema that uses CRDT techniques. Every time I think about it the biggest problem ends up being about unique/primary keys, and then I end up with an idea that looks a lot like an Active Directory as for UNIQUE/PRIMARY KEY keys:

- every object has an ID from a single object ID namespace allocated in large chunks to all the participating servers, and

- first UNIQUE/PRIMARY KEY key creation wins / competing ones get renamed (to "Copy of <original>" or similar) or deleted.

I'm a fan of EAV schema design for graph DBs, so that's what my schema would use, at least for inter-object relations, though maybe also for object attributes (but that's tricky to do in PG since there is no ANY type for columns, even though the JSON and JSONB support essentially gives one an approximation of ANY). Why EAV? Because graph traversal (for transitive closure or reachability closure computations) is very simple to express generically in SQL when using EAV schemas, using a RECURSIVE CTE).

Yeah, so you have to re-implement FOREIGN KEY functionality, but you use the same sort of SQL that the RDBMS would generate internally for FKs, so no problem.

EAV also ends up creating a second layer of schema: for the application whose metaschema you define in the selected RDBMS' language (typically SQL). But then another thing you get trivially out of EAV schemas is class inheritance for your higher-level schema, and, yeah, OOP _is_ bad, but for a graph DB it's actually quite handy and convenient, and it need not bleed into client applications. SQL RDBMSes often don't really do inheritance well -- at least PG doesn't.

It'd be nice to be able to create first-class CRDT types in PG for columns that are not UNIQUE/PRIMARY KEY columns... but IIUC while you can create TYPEs and operators, you can't limit the operators to the type's monoid operators. The CRDTs for the non-key columns would have to be exported to the application schema layer, which is what I'd want anyways. The typical CRDTs for non-key columns would be ones like the one in TFA, especially last-write-wins types.

All of this is by way of answering your question by mentioning PG's first-class TYPEs and operators: CRDTs _can_ be built into the RDBMS, or the RDBMS can let you implement them yourself. But a question I have is: what is the application in the RDBMS context? Is it... the SQL tables and views? The tables, views, triggers, and the rest of the SQL schema? Is it the code that generates SQL statements and sends them to the RDBMS? Is it both those things (schema, clients)? The answer will depend on how much of the business logic you end up implementing in the SQL schema.