ericmcer 12 hours ago

Can anyone explain why Netflix is considered to have such high-tier engineering? Just from a super high-level view, they store and serve ~5,000 videos saved at a few different qualities (4?), so let's say a total of 20,000 videos. Those files only change when specific privileged users update them.

Compare that with YouTube, where ~5,000 videos are uploaded and processed into different formats/qualities every minute, and can be added by anyone with an email. It seems like Netflix has a fairly trivial problem when compared with video sharing or content sharing sites.

  • jolynch 12 hours ago

    My experience has been that the talent density is the main difference. Netflix tackles huge problems with a small number of engineers. I think one angle of complexity you may be missing is efficiency - both in engineering cost and infrastructure cost.

    Also YouTube has _excellent_ engineering (e.g. Vitess in the data space), and they are building atop an excellent infrastructure (e.g. Borg and the godly Google network). It's worth noting though that the whole Netflix infrastructure team is probably smaller than a small to medium satellite org at Google.

  • thecosmicfrog 10 hours ago

    As soon as a streaming service starts having availability issues, it will garner a reputation very quickly and lose customers just as quickly. Being able to serve N amount of content reliably and consistently (even if less than M amount) is still a strong demonstration of good engineering practice in my opinion.

    On that point, I can't honestly recall a time I had Netflix streaming issues that weren't because of a problem on my side. Maybe I've just been lucky though, so ymmv.

  • NBJack 12 hours ago

    Hype for the engineering culture? Helps attract the right talent. It is a relatively small team that is...ah, heavily motivated to come up with good solutions around the clock. And they maintain an excellent tech blog.

    Don't get me wrong; serving the level of traffic they handle isn't easy to scale or do cost-effectively around the globe. They are also considered by some to be pioneers in chaos engineering, and made headlines years ago by running a competition to find the "best" suggestion algorithm.

  • loire280 12 hours ago

    You're probably right, but Netflix does a good job building their engineering brand by writing up and sharing their technical work publicly.

  • ianbutler 10 hours ago

    Netflix still has to serve 20k videos to 300 million people. That's about 750 million hours of streamed content. Serving that content is challenging.

    Then they have their ad network on top of it. Then they have their analytics apparatus. Then they probably have a whole suite of tools for content producers. Then they probably have a bunch of janky tools for things that didn't exist as products 15 years ago.

    Seems reasonable to me if you put in a little more thought about the problem and scale.

  • dangus 4 hours ago

    On top of that, their competition didn’t need any of that technical adeptness to catch up in the span of a decade or so.

    There is now zero value to the technology advantage of Netflix. Perhaps it’s impressive that they managed to become a new major studio because of that early success, but we could argue that the incumbent studios’ inability to snuff them out is more of a failure of their leadership than anything impressive about Netflix itself. Heck, the incumbents gave Netflix their place in the market by licensing content to them in the first place.

    So why did Netflix need to build this “pro sports team-like” team of highly paid technologists where they actively fire/lay off low performers again? Netflix was bragging all over the internet about how their culture is so different and better.

    I think ideas like this are something engineers should keep in mind in their careers. You can have the technical advantage, but the money and the business environment win in the end. If you’re in an oligopoly market like Netflix, it doesn’t matter that you had a 5-10 year lead and the best technology: Disney and Time Warner and everyone else already had content production, and Apple and Amazon have unlimited money.

snicker7 15 hours ago

This API is very similar to DynamoDB, which is basically a hash table of B-trees.

My experience is that this architecture can lead to very chatty applications if you have a rich data model (eg a graph).
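
To make the "chatty" point concrete, here is a minimal sketch, assuming a hypothetical in-memory stand-in for a remote store (`CountingKV` is made up for illustration, not a real client). With adjacency lists stored as item keys under each node, every hop of a traversal becomes its own lookup:

```python
from collections import deque


class CountingKV:
    """Hypothetical stand-in for a remote key-value store; counts round trips."""

    def __init__(self, data):
        self._data = data          # key -> {item key -> value}
        self.round_trips = 0

    def get_items(self, key):
        self.round_trips += 1      # every call models one network request
        return dict(sorted(self._data.get(key, {}).items()))


def reachable(store, start):
    """Breadth-first traversal: one store call per visited node."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in store.get_items(node):   # neighbors are the item keys
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Tiny example graph: a -> b, a -> c, b -> d, c -> d
edges = {"a": {"b": b"", "c": b""}, "b": {"d": b""}, "c": {"d": b""}, "d": {}}
store = CountingKV(edges)
nodes = reachable(store, "a")
```

Visiting four nodes costs four separate requests here, and the count grows with the number of nodes touched rather than with the number of queries the application issues.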

  • jolynch 13 hours ago

    (post author)

    It is indeed similar to DynamoDB as well as the original Cassandra Thrift API! This is intentional since those are both targeted backends and we need to be able to migrate customers between Cassandra Thrift, Cassandra CQL and DynamoDB. One of the most important things we use this abstraction for is seamless migration [1] as use cases and offerings evolve. Rather than think of KeyValue as the only database you ever need, think of it like your language's Map interface, and depending on the problem you are solving you need different implementations of that interface (different backing databases).

    Graphs are indeed a challenge (and Relational is completely out of scope), but the high-scale Netflix graph abstraction is actually built atop KV, just like a Graph library might be built on top of a language's built-in Map type.

    [1] https://www.youtube.com/watch?v=3bjnm1SXLlo
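
    The Map-interface analogy can be sketched roughly as follows. Every name here (`KeyValueStore`, `DictBackend`, `SortedListBackend`, `migrate`) is hypothetical and only illustrates the idea, not the actual KeyValue API: callers code against one interface while the backing implementation is swapped underneath.

```python
from abc import ABC, abstractmethod
from bisect import insort


class KeyValueStore(ABC):
    """Hypothetical interface: record id -> sorted map of item key -> value."""

    @abstractmethod
    def put(self, record_id: str, item_key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, record_id: str) -> list[tuple[bytes, bytes]]: ...

    @abstractmethod
    def record_ids(self) -> list[str]: ...


class DictBackend(KeyValueStore):
    """Stand-in for one backing database: sorts items on read."""

    def __init__(self):
        self._records = {}

    def put(self, record_id, item_key, value):
        self._records.setdefault(record_id, {})[item_key] = value

    def get(self, record_id):
        return sorted(self._records.get(record_id, {}).items())

    def record_ids(self):
        return list(self._records)


class SortedListBackend(KeyValueStore):
    """Stand-in for a different backing database: keeps items sorted on write."""

    def __init__(self):
        self._records = {}

    def put(self, record_id, item_key, value):
        items = self._records.setdefault(record_id, [])
        items[:] = [(k, v) for k, v in items if k != item_key]  # overwrite old value
        insort(items, (item_key, value))

    def get(self, record_id):
        return list(self._records.get(record_id, []))

    def record_ids(self):
        return list(self._records)


def migrate(source: KeyValueStore, target: KeyValueStore) -> None:
    """Naive one-shot copy; a live migration would need far more care."""
    for record_id in source.record_ids():
        for item_key, value in source.get(record_id):
            target.put(record_id, item_key, value)


old, new = DictBackend(), SortedListBackend()
old.put("user:1", b"b", b"2")
old.put("user:1", b"a", b"1")
migrate(old, new)
```

    After `migrate`, code that only knows about `KeyValueStore` reads the same sorted items from either backend.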

  • pradn 3 hours ago

    Graphs are inherently "chatty" because there are more shapes in which you could store them. The same goes for querying. Similar to "degrees of freedom".

    Even storing a graph in memory, you're going to have to load a lot more cache lines to traverse/query its structure. For remote graphs, this translates into more network calls.

    The smart thing Netflix did here is finding the minimal abstraction that supports their online querying needs. Turns out that they need a few things above a bare KV store:

    1) Idempotency keys allow multiple reads/writes without reordering issues. You can use them to do request hedging, which greatly helps w/ tail latency, at the cost of higher resource usage.

    2) KV, with the value being a map. A little more structure, which can use the backing store's native structure.

    3) Passing client/server parameters back and forth in a handshake. This allows clear request policy propagation, so the whole path behaves the way the client op wants it to.

    4) Filtering/selection - to reduce the set of items returned, on the server side. So the network + client don't have to bear the extra burden.

    The summary is: "minimal viable structure", "maximal chances to hedge requests / reduce data movement".
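
    A minimal sketch of points 1 and 4, assuming a hypothetical in-memory server (`IdempotentStore`, `hedged_write` and the token format are made up for illustration). Real hedging would send the second copy only after a delay and to another replica; here the duplicate is simply sent back to back to show why the token makes that safe:

```python
import uuid


class IdempotentStore:
    """Hypothetical server: applies each idempotency token at most once."""

    def __init__(self):
        self.items = {}            # item key -> counter value
        self.applied = set()       # tokens that were already applied
        self.requests_seen = 0

    def add(self, token: str, key: str, delta: int) -> int:
        self.requests_seen += 1
        if token not in self.applied:      # duplicate deliveries become no-ops
            self.applied.add(token)
            self.items[key] = self.items.get(key, 0) + delta
        return self.items[key]

    def select(self, prefix: str) -> dict:
        """Server-side filtering: only matching items are returned."""
        return {k: v for k, v in self.items.items() if k.startswith(prefix)}


def hedged_write(send, token, key, delta, copies=2):
    """Send the same request `copies` times; safe only because of the token."""
    responses = [send(token, key, delta) for _ in range(copies)]
    return responses[0]


store = IdempotentStore()
result = hedged_write(store.add, uuid.uuid4().hex, "views:ep1", 1)
store.add(uuid.uuid4().hex, "likes:ep1", 5)
filtered = store.select("views:")
```

    The store sees three requests but increments `views:ep1` only once, and `select` hands back just the matching item instead of the whole map.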

jerf 13 hours ago

For anyone looking for a TL;DR, I'd suggest starting at https://netflixtechblog.com/introducing-netflixs-key-value-d... , which HN is truncating so you can't see it but I've directly linked to a later section in the post with a #. Up to that point it's basically "a networked HashMap<String, SortedMap<Bytes, Bytes>>". But the ability to return partial results based on a timeout with a pagination token is somewhat unusual and the next section called "Signaling" is at least worth a look.
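
As a rough illustration of the "partial results based on a timeout with a pagination token" idea, here is a sketch under my own assumptions: `read_page`, its parameters, and using the last returned key as the token are all hypothetical, not the post's actual API. The sorted list stands in for the inner `SortedMap` of one record.

```python
import time


def read_page(items, page_token=None, budget_s=0.010, clock=time.monotonic):
    """Return (results, next_token); stop early once the time budget is spent.

    `items` is a sorted list of (key, value) pairs for one record.
    `next_token` is the last key returned, or None when the scan is complete.
    """
    deadline = clock() + budget_s
    results = []
    for key, value in items:
        if page_token is not None and key <= page_token:
            continue                        # already returned on an earlier page
        if results and clock() >= deadline:
            return results, results[-1][0]  # partial page plus a resume token
        results.append((key, value))        # always make progress: >= 1 item per page
    return results, None


def ticking_clock():
    """Deterministic fake clock for the demo: advances one 'second' per call."""
    t = 0

    def now():
        nonlocal t
        t += 1
        return t

    return now


items = [(b"a", b"1"), (b"b", b"2"), (b"c", b"3"), (b"d", b"4")]
page1, token = read_page(items, budget_s=2, clock=ticking_clock())
page2, done = read_page(items, page_token=token, budget_s=10, clock=ticking_clock())
```

With the fake clock, the first call runs out of budget after two items and returns a token; the second call resumes after that key and finishes the scan. A real store would seek to the token rather than skip linearly.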