Comment by aftbit 2 days ago

As far as I can tell, this has nothing to do with CAP theorem or distributed systems. It's just being used as an analogy.

> [CAP theorem] states that any distributed storage system can provide only two of these three guarantees: Consistency, Availability and Partition safety.

> In the realm of graph databases, we observe a similar “two out three” situation. You can either have scalable systems that are not fully open source or you can have open source systems designed for small graphs. Details below.

(the article follows)

> This is one solution to the CAP theorem for graphs. We can store a billion scale graph using this method in parquet files and use a free, cheap and open source solution to traverse them, perform joins without storage costs that are prohibitively high.

adsharma 2 days ago

That's right - it was a fun 2 out of 3 analogy.

The real question the blog post raises is whether next-generation graph databases should pursue a local-only embedded strategy or build on top of object storage, as many non-graph and vector embedded databases are doing.

Specifically, DuckLake (which uses a system catalog for metadata instead of JSON/YAML) is interesting. I became aware of Apache GraphAr (incubating) after writing the blog post, but it seems to be designed for data interchange between graph databases rather than as primary storage.

aftbit 2 days ago

I only mentioned it because I clicked it wondering if someone had found a way to "cheat" CAP for graph databases. When I saw that it was being used as an analogy and not literally, I figured I'd comment.
I still don't quite get the analogy. What are the 2 out of 3 that you can have? The second paragraph I quoted gives a classic 1 out of 2 dilemma - either scalable _or_ open-source.

- adsharma a day ago
  
  DuckDB is scalable (it can handle TPC-H at 1 TB) and open source, but doesn't support graphs natively. It supports some graph queries on SQL-native columnar storage.
  With the proposed solution, you'll be able to query larger graphs on an open source, graph-native engine, thus beating the "CAP theorem for graphs".
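
For readers wondering what "graph queries on SQL-native storage" can look like, here is a minimal sketch. It is an illustration only, not the blog post's method: it uses Python's built-in `sqlite3` as a stand-in for DuckDB (an assumption made so the example runs without extra installs), and the `edges` table, the `reachable` function, and the sample data are all hypothetical. The traversal is written as a recursive CTE over a plain edge table.

```python
import sqlite3


def reachable(edges, start):
    """Return the set of nodes reachable from `start`, including `start` itself.

    `edges` is a list of (src, dst) integer pairs. The traversal is a
    recursive CTE; UNION (rather than UNION ALL) drops duplicate rows,
    which is what lets the recursion terminate on cyclic graphs.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    con.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    rows = con.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT e.dst FROM edges AS e JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach
        """,
        (start,),
    ).fetchall()
    con.close()
    return {row[0] for row in rows}


# Hypothetical sample graph: a 3-cycle (1 -> 2 -> 3 -> 1) plus a separate edge 4 -> 5.
EDGES = [(1, 2), (2, 3), (3, 1), (4, 5)]
```

Calling `reachable(EDGES, 1)` walks the cycle once and stops. This works for simple reachability, but every hop is a join against the edge table, which is part of why a graph-native engine is attractive for deeper or more complex traversals.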
  