Comment by xnorswap
Comment by xnorswap 4 days ago
33 minutes from impact to status page for a complete outage is a joke.
Actually, one of the inventors of k8s was the project lead for Copilot in the Azure portal, and deployed it over a year ago.
I've only used Azure, and to me it seems fine-ish. Some things are rather overcomplicated and it's far from perfect, but I assumed the other providers were similarly complicated and imperfect.
Can't say I've experienced many bugs in there either. It definitely is overpriced but I assume they all are?
> In Microsoft's defense, Azure has always been a complete joke. It's extremely developer unfriendly, buggy and overpriced.
Don't forget extremely insecure. There is a quarterly critical cross-tenant CVE with trivial exploitation for them, and it has been like that for years.
Given how much time I spent on my first real multi-tenant project, dealing with the consequences of architecture decisions meant to prevent these sorts of issues, I can see clearly the temptation to avoid dealing with them.
But what we do when things are easy is not who we are. That's a fiction. It's how we show up when we are in the shit that matters. It's discipline that tells you to voluntarily go into all of the multi-tenant mitigations instead of waiting for your boss to notice and move the goalposts you should have moved on your own.
Yeah, Windows Phone's first releases were decent. I have actually developed apps for Windows using the UWP framework, but sadly there weren't enough users on their platform.
That’s some very carefully chosen phrasing.
I think if you really wanted to do on call right to avoid gaps you’d want no more than 6 hours on primary per day per shift, and you want six, not four, shifts per day. So you’re only alone for four hours in the middle of your shift and have plenty of time to hand off.
It's 19 minutes until active engagement by staff. And planned rolling restarts can trigger alerts if you don't set thresholds of time instead of just thresholds of count.
It would be nice though if alert systems made it easy to wire up CD to turn down sensitivity during observed actions. Sort of like how the immune system turns down a bit while you're eating.
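To make the count-vs-time distinction concrete, here is a rough Python sketch. Everything in it is hypothetical (the `DurationAlert` class, its field names, and the `deploying` flag that a CD pipeline would set); it is not any real alerting system's API, just one way the idea could look. The alert fires only when the unhealthy count has stayed over the limit for a sustained period, and the limit is relaxed while a deploy is in progress.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DurationAlert:
    """Hypothetical alert rule: count threshold plus a time threshold."""
    max_unhealthy: int         # normal count threshold
    hold_seconds: float        # breach must persist this long before firing
    deploy_max_unhealthy: int  # relaxed count threshold while CD is rolling
    _breach_started: Optional[float] = None

    def observe(self, now: float, unhealthy: int, deploying: bool = False) -> bool:
        """Record one sample taken at time `now`; return True if the alert fires."""
        limit = self.deploy_max_unhealthy if deploying else self.max_unhealthy
        if unhealthy <= limit:
            # Back under the limit: forget any breach in progress.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = now
        # Fire only once the breach has lasted long enough.
        return now - self._breach_started >= self.hold_seconds


alert = DurationAlert(max_unhealthy=1, hold_seconds=300, deploy_max_unhealthy=3)
print(alert.observe(0, 2))                    # brief blip, does not fire
print(alert.observe(60, 0))                   # recovered, breach timer resets
print(alert.observe(100, 3, deploying=True))  # rolling restart, tolerated
```

A count-only rule would have paged on the very first sample. With the time threshold, a restart that takes two nodes out for a minute never fires, while the same count sustained for five minutes does.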
Unfortunately, that is also typical. I've seen it take longer than that for AWS to update their status page.
The reason is probably that changes to the status page require executive approval, because false positives could lead to bad publicity and potentially to having to reimburse customers for failing to meet SLAs.
I've been on bridges where people _forgot_ to send comms for dozens of minutes. Too many inexperienced people around these days.
In Microsoft's defense, Azure has always been a complete joke. It's extremely developer unfriendly, buggy and overpriced.