Comment by geoctl
Is it? I honestly kinda believe that etcd is probably the weakest point in vanilla k8s. It is simply unsuitable for write-heavy environments and causes lots of consistency problems under heavy write loads; it's generally slow, it has value size constraints, it offers very primitive querying, etc. Why not replace etcd altogether with something like Postgres + Redis/NATS?
that touches on what I consider the dichotomy of k8s: it's a really scalable system with a dense array of features, and it still makes it easy to spin up a cluster on your laptop and interact with the full API locally, just like in prod. but paradoxically, most shops will never need the vast majority of k8s features, and by the time they scale to where they do need a ton of distributed init features, they're extremely close to the point where they'd be better served by a bespoke in-house system, conceived from scratch, that solves problems very specific to the business in question. if you have many thousands of k8s nodes, you're probably in the gray area of whether using k8s is worth it, because the k8s pull/watch control loop will never be as fast as a centralized push control plane. and naturally, at scale that problem will only compound.
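To make the push vs. pull/watch distinction concrete, here's a toy sketch. Everything in it (`Node`, `Store`, `push_deploy`, `watch_deploy`, the message counting) is hypothetical and made up for illustration; it is not how the API server, etcd, or the kubelet actually work, and it ignores batching, caching, retries, and failure handling. It only shows the structural difference: in a push model the controller talks to each node directly, while in a watch model every change goes through a shared store that then notifies the node's agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical worker node that just records the spec it was told to run."""
    name: str
    actual: str = ""

    def apply(self, spec: str) -> None:
        self.actual = spec


class Store:
    """Toy stand-in for a shared state store with per-key watches (an assumption)."""

    def __init__(self) -> None:
        self.desired: Dict[str, str] = {}
        self.watchers: Dict[str, Callable[[str], None]] = {}
        self.messages = 0  # counts every message that crosses a component boundary

    def watch(self, key: str, callback: Callable[[str], None]) -> None:
        self.watchers[key] = callback

    def put(self, key: str, spec: str) -> None:
        self.messages += 1  # writer -> store
        self.desired[key] = spec
        callback = self.watchers.get(key)
        if callback is not None:
            self.messages += 1  # store -> watcher
            callback(spec)


def push_deploy(nodes: List[Node], spec: str) -> int:
    """Centralized push: the controller calls each node directly."""
    messages = 0
    for node in nodes:
        node.apply(spec)
        messages += 1  # controller -> node
    return messages


def watch_deploy(store: Store, nodes: List[Node], spec: str) -> int:
    """Pull/watch: nodes subscribe to their key; the controller only writes desired state."""
    for node in nodes:
        store.watch(node.name, node.apply)
    for node in nodes:
        store.put(node.name, spec)
    return store.messages


if __name__ == "__main__":
    pushed = [Node(f"node-{i}") for i in range(3)]
    watched = [Node(f"node-{i}") for i in range(3)]
    print("push messages: ", push_deploy(pushed, "v2"))
    print("watch messages:", watch_deploy(Store(), watched, "v2"))
```

In this toy model the watch path costs one extra message per node because of the indirection through the store. What the toy leaves out is what you get in exchange: the store is a single source of truth that agents can reconcile against after a crash or a missed message, which a fire-and-forget push doesn't give you for free.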