Comment by yowlingcat 3 days ago
I see a lot of value in spinning up microservices where the database is global across all services (and not inside the service), but I struggle more to see the value of separate core transactional databases for separate services, unless and until two parts of the organization are almost two separate companies that cannot operate as a single org. You lose data integrity, joining ability, one coherent state of the world, etc.
The main time I can see this making sense is when the data access patterns differ so much in scale and frequency that they're optimizing for different things and causing resource contention. Even then, my question would become: do you really need a separate instance of the same kind of DB inside the service, or do you need another global replica or a new instance of a different kind of DB (for example, ClickHouse if you've been running Postgres and now need efficient OLAP on large columnar data)?
Once you get to this scale, I can see the idea of cell-based architecture [1] making sense -- but even at this point, you're really looking at a multi-dimensionally sharded global persistence store where each cell is functionally isolated for a single slice of routing space. This makes me question the value of microservices with state bound to the service writ large and I can't really think of a good use case for it.
[1] https://docs.aws.amazon.com/wellarchitected/latest/reducing-...
> I see a lot of value in spinning up microservices where the database is global across all services (and not inside the service)
The issue with this is schema evolution. As a very simple example, let's say you have a User table, and many microservices accessing this table. Now you want to add an "IsDeleted" column to implement soft deletion; how do you do that? First you need to add the actual column to the database, then you need to go update every single service which queries that table and ensure that it's filtering out IsDeleted=True, deploy all those services, and only then can you actually start using the column. If you must update services in lockstep like this, you've built a distributed monolith, which is all of the complexity of microservices with none of the benefits.
A proper service-oriented way to deal with this is to have a single service that controls the User table and exposes a `GetUsers` API. This way, only one database and its associated service need to be updated to support IsDeleted. Because of API stability guarantees (another important property of good SOA), other services will continue to get only non-deleted users when using this API, without any updates on their end.
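To make that concrete, here's a minimal Python sketch, assuming an in-memory SQLite database stands in for the User table's real store. `UserService`, `get_users`, `soft_delete`, and `active_user_names` are hypothetical names for illustration, not anyone's actual API. The point is only that the `is_deleted` filter lives in one place and callers never see the column.

```python
import sqlite3


class UserService:
    """Hypothetical owner of the users table; the only code that queries it."""

    def __init__(self, conn):
        self.conn = conn
        # The soft-delete column is an internal detail of this service.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "id INTEGER PRIMARY KEY, "
            "name TEXT NOT NULL, "
            "is_deleted INTEGER NOT NULL DEFAULT 0)"
        )

    def add_user(self, name):
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def soft_delete(self, user_id):
        self.conn.execute(
            "UPDATE users SET is_deleted = 1 WHERE id = ?", (user_id,)
        )

    def get_users(self):
        # Stable contract: return only live users, exposing only id and name.
        rows = self.conn.execute(
            "SELECT id, name FROM users WHERE is_deleted = 0 ORDER BY id"
        )
        return [{"id": r[0], "name": r[1]} for r in rows]


def active_user_names(service):
    # Stands in for a caller in another service; it knows nothing about
    # is_deleted and needs no change when soft deletion is introduced.
    return [u["name"] for u in service.get_users()]


conn = sqlite3.connect(":memory:")
service = UserService(conn)
alice = service.add_user("alice")
bob = service.add_user("bob")
service.soft_delete(bob)
names = active_user_names(service)
```

In a real deployment `get_users` would sit behind a network API rather than a method call, but the shape is the same: one owner, one filter, unchanged callers.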
> You lose data integrity, joining ability, one coherent state of the world, etc.
You do lose this! And it's one of the tradeoffs, and why understanding your domain is so important for doing SOA well. For subsets of the domain where data integrity is important, it should all be in one database, and controlled by one service. For most domains, though, a lot of features don't have strict integrity requirements. As a concrete though slightly simplified example, I work with IoT time-series data, and one feature of our platform is using some ML algorithms to predict future values based on historical trends. The prediction calculation and the storage of its results are done in a separate service, with the results being linked back via a "foreign key" to the device ID in the primary database. Now, if that device is deleted from the primary database, what happens? You have a bunch of orphaned rows in the prediction service's database. But how big of a deal is this actually? We never "walk back" from any individual prediction record to the device via the ID in the row; queries are always some variant of "give me the predictions for device ID 123". So the only real consequence is a bit of database bloat, which can be resolved via regularly scheduled orphan checking processes if it's a concern.
It's definitely a mindset shift if you're used to an "everything in one RDBMS linked by foreign keys" strategy, but I've seen this successfully deployed at many companies (AWS, among others).
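A scheduled orphan check like that could look roughly like the Python sketch below. It assumes two separate in-memory SQLite databases standing in for the primary database and the prediction service's database; the table names, column names, and the `find_orphaned_device_ids` and `purge_orphans` helpers are all made up for illustration, not our actual schema.

```python
import sqlite3


def find_orphaned_device_ids(primary, predictions):
    # No cross-database join is possible, so compare the two ID sets in code.
    known = {row[0] for row in primary.execute("SELECT id FROM devices")}
    referenced = {
        row[0]
        for row in predictions.execute(
            "SELECT DISTINCT device_id FROM predictions"
        )
    }
    return referenced - known


def purge_orphans(primary, predictions):
    # Delete prediction rows whose device no longer exists in the primary DB.
    orphans = find_orphaned_device_ids(primary, predictions)
    predictions.executemany(
        "DELETE FROM predictions WHERE device_id = ?",
        [(device_id,) for device_id in sorted(orphans)],
    )
    predictions.commit()
    return orphans


# Hypothetical schemas for the two independent databases.
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY)")
primary.executemany("INSERT INTO devices (id) VALUES (?)", [(123,), (456,)])

predictions = sqlite3.connect(":memory:")
predictions.execute(
    "CREATE TABLE predictions (device_id INTEGER NOT NULL, value REAL NOT NULL)"
)
predictions.executemany(
    "INSERT INTO predictions (device_id, value) VALUES (?, ?)",
    [(123, 1.5), (123, 1.7), (456, 9.0)],
)

# Device 456 is deleted in the primary DB; its prediction row is now orphaned.
primary.execute("DELETE FROM devices WHERE id = 456")
removed = purge_orphans(primary, predictions)
remaining = predictions.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]
```

Since the two reads aren't in one transaction, a real job would also need to avoid purging predictions for a device created between them, for example by skipping recently written rows.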