Comment by everfrustrated 5 hours ago

How is Azure still having faults that affect multiple regions? Clearly their region definition is bollocks.

ragall 2 hours ago

All three hyperscalers have vulnerabilities in their control planes: each is either a single point of failure, like AWS with us-east-1, or global, meaning that a faulty release can take it down entirely. They also take AZ resilience to mean that existing compute will continue to work as before, while allocation of new resources might fail in multi-AZ or multi-region ways.

It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.
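
To put rough numbers on the "enough slack" part, here is a minimal sketch of the sizing arithmetic. The function name, parameters, and traffic figures are made up for illustration, and it assumes load spreads evenly across the surviving AZs:

```python
import math

def static_instances_per_az(peak_rps, rps_per_instance, az_count, az_failures_tolerated=1):
    """Instances to pre-provision in each AZ so that the surviving AZs
    can carry peak load without requesting any new capacity."""
    if az_failures_tolerated >= az_count:
        raise ValueError("at least one AZ must survive")
    # Total instances needed to serve peak traffic.
    needed_total = math.ceil(peak_rps / rps_per_instance)
    # Only the surviving AZs can be counted on during an outage.
    surviving_azs = az_count - az_failures_tolerated
    return math.ceil(needed_total / surviving_azs)

# Hypothetical workload: 12,000 req/s peak, 500 req/s per instance, 3 AZs.
per_az = static_instances_per_az(peak_rps=12_000, rps_per_instance=500, az_count=3)
print(per_az, per_az * 3)  # 12 per AZ, 36 in total
```

With these made-up numbers, 24 instances would cover peak load, but you run 36 all the time, so the slack costs about 50% extra in exchange for never needing the control plane during a single-AZ failure.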

tbrownaw 2 hours ago

> It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.
That sounds oddly similar to owning hardware.

- ragall 2 hours ago
  
  In a way. It means that you can get new capacity most of the time, but the transition windows in which a service gets resized (or mutated in general) have to be minimised and carefully controlled by ops.
  
everfrustrated 2 hours ago

This outage report describes what appears to be a VM control plane failure (it mentions stop operations not working) across multiple regions.
AWS has never had this type of outage in 20 years. Yet Azure has them constantly.
This is a total failure of engineering and has nothing to do with capacity. Azure is a joke of a cloud.

- mirashii 2 hours ago
  
  AWS had an outage that blocked all EC2 operations just a few months ago: https://aws.amazon.com/message/101925/
  
  everfrustrated 27 minutes ago
  
  This was the largest AWS outage in a long long time and was still constrained to a single AWS region.
  Which is my point.
  The same fault on Azure would be a global (all-regions) fault.
  
- ragall 2 hours ago
  
  I do agree that Azure seems to be a lot worse: its control plane(s) seem to be much more centralized than those of the other two.
  