Comment by snicker7

Comment by snicker7 17 hours ago

2 replies

This API is very similar to DynamoDB, which is basically a hash table of B-trees.
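To make the "hash table of B-trees" picture concrete, here is a toy sketch. It is not DynamoDB's actual implementation: the class and method names are made up for illustration, and a sorted list with `bisect` stands in for the B-tree.

```python
import bisect
from collections import defaultdict


class ToyPartitionedStore:
    """Hash table keyed by partition key; each partition keeps items sorted by sort key.

    Hypothetical illustration only; a sorted list stands in for a B-tree.
    """

    def __init__(self):
        # partition key -> parallel lists of sorted sort keys and their values
        self._keys = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, partition, sort_key, value):
        keys, values = self._keys[partition], self._values[partition]
        i = bisect.bisect_left(keys, sort_key)
        if i < len(keys) and keys[i] == sort_key:
            values[i] = value  # overwrite the existing item
        else:
            keys.insert(i, sort_key)
            values.insert(i, value)

    def range(self, partition, lo, hi):
        """Return (sort_key, value) pairs with lo <= sort_key < hi in one partition."""
        keys = self._keys.get(partition, [])
        values = self._values.get(partition, [])
        start = bisect.bisect_left(keys, lo)
        end = bisect.bisect_left(keys, hi)
        return list(zip(keys[start:end], values[start:end]))
```

The outer lookup is a hash lookup on the partition key; the inner structure only supports ordered access within that one partition, which is why cross-partition queries need extra round trips.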

My experience is that this architecture can lead to very chatty applications if you have a rich data model (e.g., a graph).

jolynch 14 hours ago

(post author)

It is indeed similar to DynamoDB as well as the original Cassandra Thrift API! This is intentional since those are both targeted backends and we need to be able to migrate customers between Cassandra Thrift, Cassandra CQL and DynamoDB. One of the most important things we use this abstraction for is seamless migration [1] as use cases and offerings evolve. Rather than think of KeyValue as the only database you ever need, think of it like your language's Map interface, and depending on the problem you are solving you need different implementations of that interface (different backing databases).
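The Map-interface analogy can be sketched roughly as follows. This is only an illustration: `KeyValueStore`, `InMemoryBackend`, and `MigratingStore` are hypothetical names, and the dual-write-with-fallback wrapper is one common migration pattern assumed here, not necessarily the mechanism described in [1].

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """Hypothetical minimal interface; callers depend on this, not on a backend."""

    @abstractmethod
    def put(self, key, item): ...

    @abstractmethod
    def get(self, key): ...


class InMemoryBackend(KeyValueStore):
    """Stand-in for a real backing database."""

    def __init__(self):
        self._data = {}

    def put(self, key, item):
        self._data[key] = item

    def get(self, key):
        return self._data.get(key)


class MigratingStore(KeyValueStore):
    """Assumed pattern: write to both backends, read the new one first."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def put(self, key, item):
        self.old.put(key, item)
        self.new.put(key, item)

    def get(self, key):
        value = self.new.get(key)
        # Fall back to the old backend for keys not yet copied over.
        return value if value is not None else self.old.get(key)
```

Because the caller only sees `KeyValueStore`, swapping `InMemoryBackend` for another implementation (or wrapping two of them in `MigratingStore`) needs no change on the client side.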

Graphs are indeed a challenge (and Relational is completely out of scope), but the high-scale Netflix graph abstraction is actually built atop KV, just like a Graph library might be built on top of a language's built-in Map type.
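As a rough sketch of a graph layered on a map-like store (and of where the chattiness comes from), the example below keeps one adjacency map per node and counts lookups during a traversal. `CountingKV` and `neighbors_within` are hypothetical names, and a dict stands in for the remote store; this is not the Netflix graph abstraction itself.

```python
from collections import deque


class CountingKV:
    """Dict-backed stand-in for a remote KV store that counts lookups."""

    def __init__(self):
        self._data = {}
        self.reads = 0

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        self.reads += 1  # each call would be a network round trip
        return self._data.get(key, {})


def neighbors_within(kv, start, hops):
    """Breadth-first traversal; one KV read per node that gets expanded."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # at the hop limit: do not expand further
        for neighbor in kv.get(node):  # adjacency map: neighbor -> edge attributes
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {start}
```

Every expanded node costs one `get`, so the number of round trips grows with the size of the neighborhood rather than with the number of queries the application thinks it is making.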

[1] https://www.youtube.com/watch?v=3bjnm1SXLlo

pradn 5 hours ago

Graphs are inherently "chatty" because there are more shapes in which you could store them. The same goes for querying: it's similar to having more "degrees of freedom".

Even storing a graph in memory, you're going to have to load a lot more cache lines to traverse/query its structure. For remote graphs, this translates into more network calls.

The smart thing Netflix did here is finding the minimal abstraction that supports their online querying needs. Turns out that they need a few things above a bare KV store:

1) Idempotency keys allow multiple reads/writes without reordering issues. You can use them to do request hedging, which greatly helps w/ tail latency, at the cost of higher resource usage.

2) KV, with the value being a map. A little more structure, which can use the backing store's native structure.

3) Passing client/server parameters back and forth in a handshake. This allows clear request policy propagation, so the whole path behaves the way the client op wants it to.

4) Filtering/selection - to reduce the set of items returned, on the server side. So the network + client don't have to bear the extra burden.
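To illustrate point 1, here is a minimal sketch of why an idempotency key makes a hedged (duplicated) write safe. It assumes the server simply remembers which keys it has already applied; `IdempotentStore` and `hedged_increment` are hypothetical names, and the second attempt is issued sequentially here, whereas a real hedge would be sent concurrently after a short delay.

```python
import uuid


class IdempotentStore:
    """Toy server that applies each idempotency key at most once (assumed design)."""

    def __init__(self):
        self.counters = {}
        self._applied = {}  # idempotency key -> result of the first application

    def increment(self, key, amount, idempotency_key):
        if idempotency_key in self._applied:
            # Duplicate of a request already applied: return the saved result.
            return self._applied[idempotency_key]
        self.counters[key] = self.counters.get(key, 0) + amount
        self._applied[idempotency_key] = self.counters[key]
        return self.counters[key]


def hedged_increment(store, key, amount):
    """Send the same logical request twice under one idempotency key."""
    token = str(uuid.uuid4())
    first = store.increment(key, amount, token)
    second = store.increment(key, amount, token)  # the hedge / retry
    return first, second
```

Both attempts return the same result and the counter moves only once, so the client can take whichever response arrives first. The cost is the extra request, which matches the "higher resource usage" trade-off above.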

The summary is: "minimal viable structure", "maximal chances to hedge requests / reduce data movement".
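The "reduce data movement" half (point 4) can be sketched as a read that filters and limits on the server before anything is returned. `get_items`, its `predicate` and `limit` parameters, and the dict-of-maps store are all assumptions made for illustration, not the actual API.

```python
def get_items(store, key, predicate=None, limit=None):
    """Server-side read: apply the filter and limit before returning items.

    `store` maps a key to a map of item name -> value (point 2 above).
    """
    items = store.get(key, {})
    selected = {}
    for name in sorted(items):  # iterate in item-name order
        if predicate is not None and not predicate(name, items[name]):
            continue
        if limit is not None and len(selected) >= limit:
            break
        selected[name] = items[name]
    return selected
```

Only the selected items would cross the network, so neither the network nor the client pays for the rows that were filtered out.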