Tell HN: Azure outage
881 points by tartieret 5 days ago
Azure is down for us, we can't even access the azure portal. Are other experiencing this? Our services are located in Canada/Central and US-East 2
881 points by tartieret 5 days ago
Azure is down for us, we can't even access the azure portal. Are other experiencing this? Our services are located in Canada/Central and US-East 2
Is voting there a one day only event? If not, I feel the solution to that particular problem is quite clear. There’s a million things that could go wrong causing you to miss something when you try to do it in a narrow time range (today after work before polls close)
If it’s a multi day event, it’s probably that way for a reason. Partially the same as the solution to above.
In europe, voting typically happens in one day, where everyone physically goes to their designated voting place and puts papers in a transparent box. You can stay there and wait for the count at the end of the day if you want to. Tom Scott has a very good video about why we don't want electronic/mail voting: https://www.youtube.com/watch?v=w3_0x6oaDmI
Electronic voting and mail voting are very different things though.
In Italy we typically vote for two days, usually Sunday and Monday or Saturday and Sunday.
Europe, expect in the middle of it in Switzerland where at least I know nobody that actually goes to the voting place. We do it by mail.
UK is a one day affair with voting booths typically open like 6 am to 10 pm
We do mail voting from embassies or consulates when abroad.
Voting seems like one of the few problems that blockchain is actually the solution for.
If India can have voters vote and tally all the votes in one day, then so can everyone else. It’s the best way to avoid fraud and people going with whoever is ahead. I am sympathetic with emergency protocols for deadly pandemics, but for all else, in-person on a given day.
> If India can have voters vote and tally all the votes in one day, then so can everyone else.
In most countries, in the elections you vote or the member of parliament you want. Presidential elections, and city council elections are held separately, but are also equally simple. But in one election you cast your vote for one person, and that's it.
With this kind of elections, many countries manage to hold the elections on paper ballots, count them all by hand, and publish results by midnight.
But on an American ballot, you vote for, for example:
- US president
- US senator
- US member of congress
- state governor
- state senator
- state member of congress
- several votes for several different state judge positions
- several other state officer positions
- several votes for several local county officers
- local sheriff
- local school board member
- several yes/no votes for several proposed laws, whether they should be passed or not
I don't think it would be possible to calculate all these 20 or 40 votes, if calculated by hand. That's why they use voting machines in America.Voting in India is staggered over multiple phases over multiple days/weeks. Only the vote count happens on a single day at the end.
If it's not a national holiday where the vast majority of people don't have to work, and if there aren't polling places reasonably near every voting age citizen, it's voter suppression.
In particular India has a law that no one shall be made to walk more than 2km to vote. The Indian military will literally deploy a voting booth into the jungle so that a single caretaker of an old temple can vote.
Washington State having full vote-by-mail (there is technically a layer of in-person voting as a fallback for those who need it for accessibility reasons or who missed the registration deadline) has spoiled me rotten, I couldn't imagine having to go back to synchronous on-site voting on a single day like I did in Illinois. Awful. Being able to fill my ballot at my leisure, at home, where I can have all the research material open, and drive it to a ballot drop box whenever is convenient in a 2-3 week window before 20:00 on election night, is a game-changer for democracy. Of course this also means that people who serve to benefit from disenfranchising voters and making it more difficult to vote, absolutely hate our system and continually attack it for one reason or another.
As a Dutchman, I have to go vote in person on a specific day. But to be honest: I really don't mind doing so. If you live in a town or city, there'll usually be multiple voting locations you can choose from within 10 minutes walking distance. I've never experienced waiting times more than a couple of minutes. Opening times are pretty good, from 7:30 til 21:00. The people there are friendly. What's not to like? (Except for some of the candidates maybe, but that's a whole different story. :-))
Finance is increasingly reliant on it too, my bank moved their entire system to AWS. The amount of power being handed over to these cloud companies in exchange for “convenience” is astonishing.
Here in Belgium voting is usually done during the weekend, although it shouldn't matter because voting is a civic duty (unless you have a good reason you have to go vote or you'll be fined), so those who work during the weekend have a valid reason to come in late or leave early.
In the US, where I assume a lot of the griping comes from, election day is not a national holiday, nor is it on a weekend (in fact, by law it is defined as "the Tuesday next after the first Monday in November"), and even though it is acknowledged as an important civic duty, only about half of the states have laws on the books that require employers provide time off to vote. There are no federal laws to that effect, so it's left entirely to states to decide.
In Australia there are so many places to vote, it is almost popping out to get milk level if convenience. Just detour your dog walk slightly. Always at the weekend.
From mandatory voting to preferential voting, australia seems to have figured out what this works best for democracy.
In washington we have a 100% mail-in voting system (for all intents and purposes). I can put my ballot back in the mail or drop at any number of drop-boxes throughout the city (less dropboxes in rural areas i'm sure). I think there are some allowances for in-person voting but I don't think they are often used.
There is a ballot tracking system as well, I can see and be notified as my ballot moves through the counting system. It's pretty cool.
I actually just got back from dropping off my local elections ballot 15m ago, quick bike trip maybe a mile or so away and back.
Of course, because it makes it easy for people to vote, the republicans want to do away with it. If you have to stand in line for several hours (which seems to be very normal in most cities) and potentially miss work to do it that's going to all but guarantee that working people and the less motivated will not vote.
So yes in places that only do in person voting, national or state holiday.
In Australia there are so many places to vote, it is almost popping out to get milk level if convenience. (At least in urbia and suburbia) Just detour your dog walk slightly. Always at the weekend.
In the US getting milk involves driving multiple miles, finding parking, walking to the store, finding a shopping cart, finding the grocery department, navigating the aisles to the dairy section, finding the milk, waiting in line to check out, returning the cart if you’re courteous, and driving back. Could take an hour or so.
i'm not sure this is an easily solvable problem. i remember reading an article arguing that your cloud provider is part of your tech stack and it's close to impossible/a huge PITA to make a non-trivial service provider-agnostic. they'd have to run their own openstack in different datacenters, which would be costly and have their own points of failure.
I run non trivial services on EC2, using that service as a VPS. My deploy script works just as well on provisioned Digital Ocean services and on docker containers using docker-compose.
I do need a human to provision a few servers and configure e.g. load balancing and when to spin up additional servers under load. But that is far less of a PITA than having my systems tied to a specific provider or down whenever a cloud precipitates.
It's absolutely doable if you design for it.
The moment you choose to use S3 instead of hosting your own object store, though, you either use AWS because S3 and IAM already have you or spend more time on the care and feeding of your storage system as opposed to actually doing the thing you customers are paying you to do.
It's not impossible, just complicated and difficult for any moderately complex architecture.
can't believe it's 2025 and some still need to go to some place to vote. I can vote since I can remember(at least 20 years) by mail for anything, we also vote multiple times a year(4-6 times), we just get 1 Month before the things to vote by mail and then mail in back votes. Hope we can soon vote online to get rid of the paper overhead.
Is that you? The same guy with the comment "hahahhahaha"[1] on "Women dating safety app 'Tea' breached, users' IDs posted to 4chan"[2]?
Here is an article in English:
https://nltimes.nl/2025/10/29/ns-hit-microsoft-cloud-outage-...
It should be noted that the article isn't complete: while the travel planner and ticket machines were the first to fail, trains were cancelled soon after; it took a few hours before everything restarted.
Based on what the conductors said, I would speculate that the train drivers digital schedule was not operative, so they didn't know where to go next.
Thanks! Perhaps one of the sites that track train delays can give a statistic?
This list doesn't have anything that looks relevant: https://www.rijdendetreinen.nl/en/disruptions/archive?date_b...
The day does not appear as an outlier in the monthly statistics: https://www.rijdendetreinen.nl/en/statistics/2025/10
I don't find a detailed statistic on the overall delays, but the per-station statistics for Amsterdam Centraal say 5% of trains were cancelled and 17% were delayed by 5 minutes or more (mostly by 10 minutes): https://www.rijdendetreinen.nl/en/train-archive/2025-10-29/a...
It is a great age to live in where we have transportation infrastructure beyond foot paths.
The Flemish bus company (de Lijn) uses Azure and I couldn't activate my ticket when I came home after training a couple of hours ago. I should probably start using physical tickets again, because at least those work properly. It's just stupid that there's so much stuff being moved to digital only (often even only being accessible through an Android or iOS app, despite the parent companies of those two being utterly atrocious) when the physical alternatives are more reliable.
You can get way more 9s if you roll your own procedures - and each step has a way of handling failure of the other steps.
Old trains had paper tickets, the locomotive was its own power source, the conductor had a flashlight, and the conductor could sell tickets for cash.
And if everything else failed, the conductor would just let you ride for free.
Now everything's so interconnected that any one part failing brings everything to a halt.
Google cloud run or cloudflare workers it is.
Personally I am thinking more and more about hetzner, yes I know its not an apples to orange comparison. But its honestly so good
Someone had created a video where they showed the underlying hardware etc., I am wondering if there is something like https://vpspricetracker.com/ but with geek-benchmarks as well.
This video was affiliated with scalahosting but still I don't think that there was too much bias of them and they showed at around 3:37 a graph comparison with prices https://www.youtube.com/watch?v=9dvuBH2Pc1g
Now it shows how contabo has better hardware but I am pretty sure that there might be some other issues, and honestly I feel a sense of trust with hetzner I am not sure about others.
Either hetzner or self hosting stuff personally or just having a very cheap vps and going to hetzner if need be but hetzner already is pretty cheap or I might use some free service that I know of are good as well.
One of recent (4 months ago) Cloudflare outages (I think it was even workers) was caused by Google Cloud being down and Cloudflare hosting an essential service there
It was Workers KV (an optional storage add-on to Workers), and we fixed it, it no longer depends on GCP:
https://blog.cloudflare.com/rearchitecting-workers-kv-for-re...
Hm it seemed that they hosted a critical service for cloudflare kv on google itself, but I wonder about the update.
Personally I just trust cloudflare more than google, given how their focus is on security whereas google feels googly...
I have heard some good things about google cloud run and the google's interface feels the best out of AWS,Azure,GCloud but I still would just prefer cloudflare/hetzner iirc
Another question: Has there ever been a list of all major cloud outages, like I am interested how many times google cloud and all cloud providers went majorly down I guess y'know? is there a website/git project that tracks this?
For some reason an Azure outage does not faze me in the same way that an AWS outage does.
I have never had much confidence in Azure as a cloud provider. The vertical integration of all the things for a Microsoft shop was initially very compelling. I was ready to fight that battle. But, this fantasy was quickly ruined by poor execution on Microsoft's part. They were able to convince me to move back to AWS by simply making it difficult to provision compute resources. Their quota system & availability issues are a nightmare to deal with compared to EC2.
At this point I'd rather use GCP over Azure and I have zero seconds of experience with it. The number of things Microsoft gets right in 2025 can be counted single-handedly. The things they do get right are quite good, but everything else tends to be extremely awful.
Many years back was the first time I used Azure, evaluating it for a client.
I remember I at one point had expanded enough menus that it covered the entirety of the screen.
Never before have I felt so lost in a cloud product.
The "Blades" experience [0] where instead of navigating between pages it just kept opening things to the side and expanding horizontally?
Yeah, that had some fun ideas but was way more confusing than it needed to be. But also that was quite a few years back now. The Portal ditched that experience relatively quickly. Just long enough to leave a lot of awful first impressions, but not long enough for it to be much more than a distant memory at this point, several redesigns later.
[0] The name "Blades" for that came from the early years of the Xbox 360, maybe not the best UX to emulate for a complex control panel/portal.
Azure to me has always suffered from a belief that “UI innovations can solve UX complexity if you just try hard enough.”
Like, AWS, and GCP to a lesser extent, has a principled approach where simple click-ops goals are simple. You can access the richer metadata/IAM object model at any time, but the wizards you see are dumb enough to make easy things easy.
With Azure, those blades allow tremendously complex “you need to build an X Container and a Container Bucket to be able to add an X” flows to coexist on the same page. While this exposes the true complexity, and looks cool/works well for power users, it is exceedingly unintuitive. Inline documentation doesn’t solve this problem.
I sometimes wonder if this is by design: like QuickBooks, there’s an entire economy of consultants who need to be Certified and thus will promote your product for their own benefit! Making the interface friendly to them and daunting to mere mortals is a feature, not a bug.
But in Azure’s case it’s hard to tell how much this is intentional.
Count your blessings. You could have to use Azure SSO through Oracle Cloud..... ; ;
AWS' UI is similarly messy, and to this day. They regularly remove useful data from the UI, or change stuff like the default sort order of database snapshots from last created to initial instance created date.
I never understood why a clear and consistent UI and improved UX isn't more of a priority for the big three cloud providers. Even though you talk mostly via platform SDK's, I would consider better UI especially initially, a good way to bind new customers and pick your platform over others.
I guess with their bottom line they don't need it (or cynically, you don't want to learn and invest in another cloud if you did it once).
It’s more than just the UI itself (which is horrible), it’s the whole thing that is very hostile to new users even if they’re experienced. It’s such an incoherent mess. The UI, the product names, the entire product line itself, with seemingly overlapping or competing products… and now it’s AI this and AI that. If you don’t know exactly what you’re looking for, good luck finding it. It’s like they’re deliberately trying to make things as confusing as possible.
For some reason this applies to all AWS, GCP and Azure. Seems like the result of dozens of acquisitions.
Amazon: here's two buttons, some check boxes and a random popup.
MSFT : Hold my beer...
> At this point I'd rather use GCP over Azure and I have zero seconds of experience with it.
TBH, GCP is very good! More people should use it.
I know for some people the prospect of losing their Google Cloud access due to an automated terms of service violation on some completely unrelated service is worrisome.
https://cloud.google.com/resource-manager/docs/project-suspe...
I'd hope you can create a Google Cloud account under a completely different email address, but I do as little business with Google as I can get away with, so I have no idea.
ACI is the corresponding service in our favorite cloud.
have used both. I prefer Azure. Much easier to use. found GCP unintuitive.
The problem is that in some industries, Microsoft is the only option. Many of these regulated industries are just now transitioning from the data center to the cloud, and they've barely managed to get approval for that with all of the Microsoft history in their organization. AWS or GCloud are complete non-starters.
I moved a 100% MS shop to AWS circa 2015. We ran our DCs on EC2 instances just as if they were on prem. At some point we installed the AAD connector and bridged some stuff to Azure for office/mail/etc., but it was all effectively in AWS. We were selling software to banks so we had a lot of due diligence to suffer. AWS Artifact did much of the heavy lifting for us. We started with Amazon's compliance documentation and provided our own feedback on top where needed.
I feel like compliance is the entire point of using these cloud providers. You get a huge head start. Maintaining something like PCI-DSS when you own the real estate is a much bigger headache than if it's hosted in a provider who is already compliant up through the physical/hardware/networking layers. Getting application-layer checkboxes ticked off is trivial compared to "oops we forgot to hire an armed security team". I just took a look and there are currently 316 certifications and attestations listed under my account.
I've found that lift and shifting to EC2 is also generally cheaper than the equivalent VMs on Azure.
Microsoft really wants you to use their PaaS offerings, and so things on Azure are priced accordingly. A Microsoft shop just wanting to lift-and-shift, Azure isn't the best choice unless the org has that "nobody ever got fired for buying Microsoft" attitude.
What Amazon, Azure, and Google are showing with their platform crashes amid layoffs, while they supports governments that are Oppressing's Citizens and Ignoring the Law, is that they do not care about anything other than the bottom line.
They think they have the market captured, but I think what their dwindling quality and ethics are really going to drive is adoption of self hosting, distributed computing frameworks. Nerds are the ones who drove adoption of these platforms, and we can eventually end if we put in the work.
Seriously with container technology, and a bit more work / adoption on distributed compute systems and file storage (IPFS,FileCoin) there is a future where we dont have to use big brothers compute platform. Fuck these guys.
These were my thoughts exactly. I may have my tinfoil hat on, but outages these close together between the largest cloud providers amid social unrest, my wonder is the government / tech companies implementing some update that adds additional spyware / blackout functionality.
I really hope this pushes the internet back to how it used to be, self hosted, privacy, anonymity. I truly hope that's where we're headed, but the masses seem to just want to stay comfortable as long as their show is on TV
GitHub doesn't use Azure yet, but has just published their migration path to azure a few days ago.
I would link to that article, but that one does seem down ;)
https://www.githubstatus.com/incidents/4jxdz4m769gy
> They're stating they're working with the Azure teams, so I suspect this is related.
At least some bits of it do. I was writing something to pull logs the other day and the redirect was to an azure bucket. It also returned a 401 with the valid temporary authed redirect in the header. I was a bit worried I'd found a massive security hole but it appears after some testing it just returned the wrong status code.
Currently standing in a half closed supermarket because the tills are down and they cant take payments
IIRC, the grocery chain I worked for used to have an offline mode to move customers out the door.
Chick-fil-a has this.
One of the tech people there was on HN a few years ago describing their system. Credit card approval slows down the line, so the cards are automatically "approved" at the terminal, and the transaction is added to a queue.
The loss from fraudulent transactions turns out to be less than the loss from customers choosing another restaurant because of the speed of the lines.
The POS I work on also has this feature. Line busters take the order and payment but we have a toggle where you can immediately “approve” and queue it up. If the payment fails then the person handing you your food will see it on the order and ask you for alternative payment. It helps prevent loss and speeds up the line overall.
Yea, good old store and forward. We implemented it in our PoS system. Now, we do non PCI integrations so we arn't in PCI scope, but depending on the processor, it can come with some limitations. Like, you can do store and forward, but only up to X number of transactions. I think for one integration, it's 500-ish store wide (it uses a local gateway that store and forwards to the processors gateway). The other integration we have, its 250, but store and forward on device, per device.
In many places it's also possibly just a left over feature from older times. I worked at a major UK supermarket in the mid-00s, and their checkout system had this feature. But it was like that because that's how it was originally built, it wasn't a 'feature' they added.
Credit card information would be recorded by the POS, synced to a mini-server in the back office (using store-and-forward to handle network issues) and then in a batch process overnight, sent to HQ where the payment was processed.
It wasn't until chip-and-PIN was rolled out that they started supporting "online" (i.e. processed then and there) card transactions, and even then the old method still worked if there was a network issues or power failure (all POSes has their own UPS).
The only real risk at the time was that someone tried to pay with a cancelled credit card - the bank would always honour the payment otherwise. But that was pretty uncommon back then, as you'd have to phone your bank to do it, not just press a button in an app.
I was shopping at a mall with a visa vanilla card once. I got it as a gift and didn't know the limit. No matter what I bought the card kept going -- and I never got a balance of what was on the card. Eventually, later that day it stopped. I called customer support and asked how much was left on the balance. They told me they had no idea my balance - but everything I bought was mine.
What I gather from this is to always try a dead card first just in case the store is in offline mode
They still capture the name on the card, so I would be careful about trying this, unless you can make use of a prepaid card.
There's a Family Dollar by my house that is down at least 2 full days per month because of bad inet connectivity. I live close enough that with a small tower on my roof i can get line of sight to theirs. I've thought about offering them a backup link off my home inet if they give me 50% of sales whenever its in use. It would be a pretty good deal for them, better some sales when their inet is down vs none.
It's Family Dollar, margin has to be almost nothing and sales per day is probably < $1k. That's why I said 50% of sales and not profit.
I go there daily because it's a nice 30min round trip walk and I wfh. I go up there to get a diet coke or something else just to get out of the house. It amazes me when i see a handwritten sign on the door "closed, system is down". I've gotten to know the cashiers so I asked and it's because the internet connection goes down all the time. That store has to one of the most poorly run things i've ever seen yet it stays in business somehow.
it's retail. the margin is 30-50% for sure.
EDIT: their last quarterly was 36%. they lost $3.7bn in 24Q4 -- the christmas quarter. sold to PE in Q1.
You'd think any SeriousBusiness would have a backup way to take customers' money. This is the one thing you always want to be able to do: accept payment. If they made it so they can't do that, they deserve the hit to their revenue. People should just walk out of the store with the goods if they're not being charged.
Why doesn't someone in the store at least have one of those manual kachunk-kachunk carbon copy card readers in the back that they can resuscitate for a few days until the technology is turned back on? Did they throw them all away?
I think a lot of payment terminals have an option to record transactions offline and upload them later, but apparently it's not enabled by default - probably because it increases your risk that someone pays with a bad card.
After having my credit card locked by stupid fraud prevention algorithms on my honeymoon, I had a long chat with them before going overseas a second time.
And that was the day Visa had a full on outage. We would walk into one shop, try to buy stuff, get declined, then go into the next and get accepted because they were running in offline mode.
Got a nice big bill from my cellphone carrier for making the call to visa to ask them wtf as well.
The kachunk-kachunk credit card machines need raised digits on the cards, and I don't think most banks have been issuing those for years at this point. Mine have been smooth for at least 10 years.
If they used standalone merchant terminals, then those typically use the local LAN which can rollover to cellular or PoT in the event of a network outage. The store can process a card transaction with the merchant terminal and then reconcile with the end of day chit. This article from 2008 describes their PoS https://www.retailtouchpoints.com/topics/store-operations/ca...
These stores appear everywhere, even in areas with high income. You'd be surprised, but often people with those high incomes shop for certain products at very low rates, and that's how they keep their savings. A good example is garbage bags. Most people don't care too much about the quality of their garbage bags, unless they rip on the way to the bin.
Pretty sure it'd be a lot better deal for them to have no sales than to pay out 50% of sales on stuff with single digit margins.
I remember last mechanical cash registers in my country in 90s and when these got replaced by early electronic ones with blue vacuum fluorescent tubes. Then everything got smaller and smaller. Now I'm pestered to "add the item to the cart" by software.
Last week I couldn't pay for flowers for grandma's grave because smartphone-sized card terminal refused to work - it stuck on charging-booting loop so I had to get cash. Tho my partner thinks she actually wanted to get cash without a receipt for herself excluding taxes
In Germany many stores still accept cash and some even only accept cash and we are ridiculed for this... Seems like one of the rare instances where this is useful :D
It's sad the number of stores I've seen where they just shut down when they can't use the checkout machines; the clerks aren't allowed to do math even if they could.
Whereas the smaller, owner-run stores have more leeway; the local tiny grocery "sold" all freezer/refrigerator food for cheap/free during a power failure. The big Walmart closed and threw everything away the next day.
The odd thing is that the US has been teaching math using the “in your head” heuristic (New Math) for almost 20 years and yet young adults cannot make change without the machine to save their lives.
God help me if I hand someone $25 for a $14.75 total. I’m getting small bills back.
I wonder what they teach in Germany.
Mind-boggling that any retailer would not have the capability to at least run the checkout stations offline.
You can, but it's all about risk mitigation. Most processors have some form of store and forward (and it can have limitations like only X number of transactions). Some even have controls to limit the amount you can store-and-forward (for instance, only charges under $50). But ultimately, it's still risk mitigation. You can store-and-forward, but you're trusting that the card/account has the funds. If it doesn't, you loose and ain't shit you can do about it. If you can't tolerate any risk, you don't turn on store and forward systems and then you can't process cards offline.
Its not the we are not capable. Its, is the business willing to assume the risk?
I knew an old guy in the '00s who specialized in cobal/fortran for working on tiller software. Guess he retired and they couldn't maintain it
Anyone remember Bob's number?? Bob?! Oh the humanity! We're all gonna be canned!
Currently standing in a half closed supermarket because the tills are down and they cant take payments
There's a fairly large supermarket near me that has both kinds of outages.
Occasionally it can't take cards because the (fiber? cable?) internet is down, so it's cash only.
Occasionally it can't take cash because the safe has its own cellular connection, and the cell tower is down.
I was at Frank's Pizza in downtown Houston a few weeks ago and they were giving slices of pizza away because the POS terminal died, and nobody knew enough math to take cash. I tried to give them a $10 and told them to keep the change, but "keep the change" is an unknown phrase these days. They simply couldn't wrap their brains around it. But hey, free pizza!
why tf would a supermarket depend on Azure? Payment processing isn't their thing
It is interesting to see the differential across different tenants in different geographies:
- on a US tenant I am unable to access login.microsoftonline.com and the login flow stalls on any SSO authentication attempt.
- on a European tenant, probably germany-west, I am able to login and access the Azure portal.
SSO and 365 are working fine for us, but admin portals for Azure/365 are down. Our workloads in Azure don't seem to be impacted.
I am still stunned people choose to do this, considering major Office 365 outages are basically a weekly thing now.
We are very dependent on Azure and Microsoft Authentication and Microsoft 365 and haven’t had weekly or even monthly issues. I can think of maybe three issues this year.
I’ve been migrating our services off of Azure slowly for the past couple of years. The last internet facing things remaining are a static assets bucket and an analytics VM running Matomo. Working with Front Door has been an abysmal experience, and today was the push I needed to finally migrate our assets to Cloudflare.
I feel pretty justified in my previous decisions to move away from Azure. Using it feels like building on quicksand…
> I feel pretty justified in my previous decisions to move away from Azure
I felt this way about AWS last week
We’re 100% on Azure but so far there’s no impact for us.
Luckily, we moved off Azure Front Door about a year ago. We’d had three major incidents tied to Front Door and stopped treating it as a reliable CDN.
They weren’t global outages, more like issues triggered by new deployments. In one case, our homepage suddenly showed a huge Microsoft banner about a “post-quantum encryption algorithm” or something along those lines.
Kinda wild that a company that big can be so shaky on a CDN, which should be rock solid.
We battled https://learn.microsoft.com/en-us/answers/questions/1331370/... for over a year, and finally decided to move off since there was no any resolution. Unfortunately our API servers were still behind AFD so they were affected by today's stuff...
Can't download VSCode :D
Error: visual-studio-code: Download failed on Cask 'visual-studio-code' with message: Download failed: https://update.code.visualstudio.com/1.105.1/darwin-arm64/st...
I have had intermittent issues with winget today. I use UniGetUI for a front-end, and anything tied to Microsoft has failed for me. Judging by the logs, it's mostly retrieving the listing of versions (I assume similar to what 'apt-get update' does, I'm fairly new to using winget for Windows package management).
Also cant do anything right now with the repo's we have in Azure Devops, how lovely...
So that's why I can't check in for my Alaska Airlines flight... https://news.microsoft.com/source/features/digital-transform...
"BREAKING: Alaska Airlines' website, app impacted amid Microsoft Azure outage"
I am unable to load this article...presumably for related reasons
microsoft.com and some subdomains (answers.microsoft.com) has no A and AAA records. They screwed up big time.
That specific subdomain has issues with propagation: https://dnschecker.org/#A/answers.microsoft.com (only four resolvers return records)
The root zone and www. do not: https://dnschecker.org/#A/microsoft.com (all resolvers return records)
And querying https://www.microsoft.com/ results in HTTP 200 on the root document, but the page elements return errors (a 504 on the .css/.js documents, a 404 on some fonts, Name Not Resolved on scripts.clarity.ms, Connection Timed Out on wcpstatic.microsoft.com and mem.gfx.ms). That many different kinds of errors is actually kind of impressive.
I'm gonna say this was a networking/routing issue. The CDN stayed up, but everything else non-CDN became unroutable, and different requests traveled through different paths/services, but each eventually hit the bad network path, and that's what created all the different responses. Could also have been a bad deploy or a service stopped running and there's different things trying to access that service in different ways, leading to the weird responses... but that wouldn't explain the failed DNS propagation.
2027: the year of migrating from your own metal to a managed provider
2028: the year of migrating from a managed provider to the cloud
2029: the year of migrating from the cloud to your own metal in a rack
People keep thinking the solution to their problems is to do something new (that they don't fully understand).
TIL it's called Nirvana Fallacy
The upside of keep moving it acts as a hardener and chaos monkey. You shake out any crufty service no one knows how to build let alone deploy.
TIL it's called Nirvana Fallacy
We used to call it "The grass is always greener on the other side of the fence."
At least YOLD is possible. Is there capacity in the world for everyone to ditch clouds.
Instead of cyber security awareness month, we should rename it to cloud availability awareness month.
For me the same. It's very confusing that status page [1] is green
That status page is never red. Absolutely useless.
> There are currently no active events. Use Azure Service Health to view other issues that may be impacting your services.
Links to a page on Azure Portal which is down...
Surely more vibecoding will fix this problem. Time to fire more staff
Yikes, http://schemas.xmlsoap.org/soap/encoding/ is running on Azure and it's down. So any SOAP/WSDL api's are dead in the water.
HTTPSConnectionPool(host='schemas.xmlsoap.org', port=443): Max retries exceeded with url: /soap/encoding/ (Caused by SSLError(CertificateError("hostname 'schemas.xmlsoap.org' doesn't match '*.azureedge.net'")))
A service we rely on that isn't even running on Azure is inaccessible due to this issue. For an asset that probably never changes. Wild for that to be the SPOF.160k+ results on GitHub: https://github.com/search?q=http%3A%2F%2Fschemas.xmlsoap.org...
Yeah just took down the prod site for one of our clients since we host the front-end out of their CDN. Just got wrapped up panic hosting it somewhere else for the past hour, very quickly reminds you about the pain of cookies...
Pretty much all Azure services seem to be down. Their status page says it's only the portal since 16:00. It would be nice if these mega-companies could update their status page when they take down a large fraction of the Internet and thousands of services that use them.
All of our Azure workloads are up, but we don't use Azure Front Door. That seems to be the only impacted product, apart from the management portal.
We're using Application Gateway for ingress, that seems to be effected.
It still surprises me how much essential services like public transport are completely reliant on cloud providers, and don't seem to have backups in place.
Here in The Netherlands, almost all trains were first delayed significantly, and then cancelled for a few hours because of this, which had real impact because today is also the day we got to vote for the next parlement (I know some who can't get home in time before the polls close, and they left for work before they opened).