Comment by yibers

Comment by yibers 15 hours ago

Ass covering-wise, you are probably better off going down with everyone else on us-east-1. The not so fun alternative: being targeted during an RCA explaining why you chose some random zone no one ever heard of.

rconti 14 hours ago

Places nobody's ever heard of like "Ohio" or "Oregon"?

Yeah, I'm not worried about being targeted in an RCA and pointedly asked why I chose a region with way better uptime than `us-tirefire-1`.

What _is_ worth considering is whether your more carefully considered region will perform better during an actual outage where some critical AWS resource goes down in Virginia, taking your region with it anyway.


xingped 13 hours ago

IIRC, some AWS services are solely deployed on and/or entirely dependent on us-east-1. I don't recall which ones, but I very distinctly remember this coming up once.

- cj 12 hours ago
  
  AWS IAM has caused multiple cross-region outages.
  
- nothrabannosir 12 hours ago
  
  CloudFront certificates
  
- technicalape 9 hours ago
  
  Everything new basically, like the AI services.
  
- paulddraper 8 hours ago
  
  IAM and Route53 have dependencies on us-east-1.
  AWS Organizations/Account management is us-east-1.
  And if you want a CDN with a custom hostname and want TLS…you have to use us-east-1.
  
  
  TonyCoffman 4 hours ago
  
  The Route53 control plane is in us-east-1, with an optional temporary auto-failover to us-west-2 during outages. The data plane for public zones is globally distributed and highly resilient, with a 100% SLA. It continues to serve DNS records during regular control plane outages in us-east-1, but access to make changes is lost during outages.
  CloudFront CDN has a similar setup. The SSL certificate and key have to be hosted in us-east-1 for control plane operations but once deployed, the public data plane is globally or regionally dispersed. There is no auto failover for the cert dependency yet. The SLA is only three 9s. Also depends on Route53.
  The elephant in the room for hyperscalers is the potential for rogue employees or a cyber attack on a control plane. Considering the high stakes and economic criticality of these platforms, both are inevitable and both have likely already happened.
  
  
  paulddraper 21 minutes ago
  
  > It continues to serve DNS records during regular control plane outages in us-east-1, but access to make changes is lost during outages.
  Which is crazy, because a common failover mechanism is…DNS.
  If there were anything to run/control in a distributed fashion, it would be DNS.
  
- nexus-uw 10 hours ago
  
  IAM
  

kristianc 13 hours ago

I find it funny that we see complaints about why software quality has got worse alongside people advocating to choose objectively risky AWS regions for career risk and blame minimisation reasons.


goalieca 13 hours ago

This was always the case. The OG saying was “no one got fired for buying IBM”. Then it was changed to Microsoft. And so on.

throwawaysleep 12 hours ago

They are for the same reason. How do customers react to either? If us-east-1 fails, nobody complains. If Microsoft uses a browser to render components on Windows and eats all of your RAM, nobody complains.

- bigstrat2003 10 hours ago
  
  Oh, people complain. The companies responsible have just gotten to the point where they are so entrenched that they don't need to care at all about customer complaints.
  
  
  zx8080 10 hours ago
  
  The value now is not really money from customers, but a company's share price or valuation. That, together with the hard push for subscriptions from every single app and service, has devalued customer experience and feedback, because not many will go through the hell of the unsubscribing process, even after an outage or a serious issue like stolen private data.
  There's just not much motivation left to build better systems.
  
  
  zx8080 10 hours ago
  
  It all stinks of 'monopoly'.
  

g947o an hour ago

> explaining why you chose some random zone no one ever heard of

Is this from real experience of something that actually happened, or just imagined?

The only things that matter in a decision are:

* Services that are available in the region

* (if relevant and critical) Latency to other services

* SLAs for the region

Everything else is irrelevant.

If you think AWS is so bad that their SLAs are not trustworthy, that's a different problem to solve.
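
The three criteria in the comment above can be read as a simple filter. The following is only a rough sketch of that idea: the `Region` type, the `eligible_regions` function, the region names, and every number are hypothetical placeholders rather than real AWS data, and it simplifies SLAs to one figure per region even though AWS publishes them per service.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical record describing one candidate region."""
    name: str
    services: frozenset[str]  # services offered in the region
    latency_ms: float         # measured latency to the dependency you care about
    sla_percent: float        # SLA figure for the service you rely on


def eligible_regions(regions, required_services, max_latency_ms, min_sla_percent):
    """Keep regions that meet all three criteria, best SLA first, then lowest latency."""
    required = frozenset(required_services)
    matches = [
        r for r in regions
        if required <= r.services              # every required service is available
        and r.latency_ms <= max_latency_ms     # latency is acceptable
        and r.sla_percent >= min_sla_percent   # SLA is high enough
    ]
    return sorted(matches, key=lambda r: (-r.sla_percent, r.latency_ms))


if __name__ == "__main__":
    # Placeholder values for illustration only, not real AWS figures.
    candidates = [
        Region("region-a", frozenset({"compute", "queue", "ml"}), 12.0, 99.99),
        Region("region-b", frozenset({"compute", "queue"}), 20.0, 99.99),
        Region("region-c", frozenset({"compute", "queue", "ml"}), 95.0, 99.9),
    ]
    for region in eligible_regions(candidates, {"compute", "queue"}, 50.0, 99.95):
        print(region.name)
```

Note that nothing in this filter accounts for cross-region dependencies such as the us-east-1 control planes discussed elsewhere in the thread; that would have to be a separate input.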


jordanb 11 hours ago

ISTR major resource unavailability in US-East-2 during one of the big US-East-1 outages because people were trying to fail over. Then a week later there was a US-East-2 outage that didn't make the news.

So if you tried to be "smart" and set up in Ohio, you got crushed by the thundering herd coming out of Virginia and then bitten again because AWS barely cares about your region and neither does anyone else.

The truth is Amazon doesn't have any real backup for Virginia. They don't have the capacity anywhere else and the whole geographic distribution scheme is a chimera.


Fhch6HQ 11 hours ago

This is an interesting point. As recently as mid-2023 us-east-2 was 3 campuses with a 5 building design capacity at each. I know they've expanded by multiples since, but us-east-1 would still dwarf them.
Makes one wonder, does us-west-2 have the capacity to take on this surge?

- redditor98654 35 minutes ago
  
  us-west-2 is indeed very large, but will still not be able to take a full failover from us-east-1
  

nothrabannosir 12 hours ago

> being targeted during an RCA explaining why you chose some random zone no one ever heard of.

“Duh, because there’s an AZ in us-east-1 where you can’t configure EBS volumes for attachment to fargate launch type ECS tasks, of course. Everybody knows that…”


riffic 14 hours ago

how about following the well-architected framework and building something with a suitable level of 9s where you can justify your decisions during a blameless postmortem (please stamp your buzzword bingo card for a prize.)
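
For a sense of what "a suitable level of 9s" means in practice, here is a minimal sketch that converts a number of nines into a yearly downtime budget. The function names are made up for illustration, and it assumes a 365-day year, whereas real SLAs are usually measured per monthly billing cycle.

```python
# Sketch: translate "N nines" of availability into a yearly downtime budget.
# Assumes a 365-day year; this is an illustration, not how any provider computes SLAs.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability_from_nines(nines: int) -> float:
    """Return availability as a fraction, e.g. 3 nines -> 0.999."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Minutes of allowed downtime per year at the given number of nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return MINUTES_PER_YEAR * 10 ** (-nines)


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nines: {downtime_minutes_per_year(n):.1f} min/year")
```

Under these assumptions, three nines allows roughly 8.76 hours of downtime per year, while five nines allows a little over 5 minutes.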


paradox460 14 hours ago

We vibe code everything in flavor-of-the-month Node frameworks, tyvm, because Elixir is too hard to hire for (or some equally inane excuse)

- transcriptase 8 hours ago
  
  I look forward to the eventual launch of a new and improved version of your app using Electron.
  What’s the point in having 64 GB of DDR5 and 16 cores @ 4.2 GHz if not to be able to have a couple of Electron apps sitting at idle yet somehow still using the equivalent computational resources of the most powerful supercomputer on earth in the mid 1990s?
  
- DANmode 14 hours ago
  
  I agree with your post conceptually.
  However: Don’t underestimate community support (in the areas you’re likely to want it) when comparing development stacks.
  

throwawaysleep 14 hours ago

This to me was the real lesson of the outage. A us-east-1 outage is treated like bad weather. A regional outage can be blamed on the dev. us-east-1 is too big to get blamed, which is why it should be the region of choice for an employee.


Esophagus4 12 hours ago

Bizarre way of making decisions.
us-east-2 is objectively a better region to pick if you want US east, yet you feel safer picking use1 because “I’m safer making a worse decision that everyone understands is worse, as long as everyone else does it as well.”

- nemomarx 12 hours ago
  
  It's about risk profile. The question isn't "which region goes down the least" but "how often will I be blamed for an outage."
  If you never get blamed for a us-east-1 outage, that's better than us-east-2, if that could get you blamed 0.5% of the time when it goes down and us-east-1 isn't down, etc.
  
  
  Esophagus4 2 hours ago
  
  But use1 is down 4x more than use2 (AWS closely guards the numbers and won’t release them, but that is what I’ve seen from 3rd party analysis). Don’t you want your customers to say, “wow, half the internet was down today but XYZ service was up with no issues! I love them.”
  I can’t tell if it’s you thinking this way, or if your company is set up to incentivize this. But either way, I think it’s suboptimal.
  That’s not about “risk profile” of the business or making the right decision for the customer, that’s about risk profile of saving your own tail in the organizational gamesmanship sense. Which is a shame, tbh. For both the customer and for people making tech decisions.
  I fully appreciate that some companies may encourage this behavior, and we all need a job so we have to work somewhere, but this type of thinking objectively leads to worse technology decisions and I hope I never have to work for a company that encourages this.
  Edit: addressing blame when things go wrong… don’t you think it would be a better story to tell your boss that you did the right thing for the customer, rather than “I did this because everyone else does it, even though most of us agree it’s worse for the customer in general”. I would assume I’d get more blame for the 2nd decision than the 1st.
  
- naet 7 hours ago
  
  If my cloud provider goes down and my site is offline, my customers and my boss will be upset with me and demand I fix it as fast as possible. They will not care what caused it.
  If my cloud provider goes down and also takes down Spotify, Snapchat, Venmo, Reddit, and a ton of other major services that my customers and my boss use daily, they will be much more understanding that there is a third party issue that we can more or less wait out.
  Every provider has outages. US-east-2 will sometimes go down. If I'm not going to make a system that can fail over from one provider to another (which is a lot of work and can be expensive, and really won't be actively used often), it might be better to just use the popular one and go with the group.
  
  
  Esophagus4 2 hours ago
  
  us-east-2 goes down far, far less frequently than us-east-1. AWS doesn’t publicly release the outage numbers (they hold them very close to the chest) but some people have compiled the stats on their own if you poke around.
  The regions provide the same functionality, so I see genuinely no downside or additional work to picking us-east-2 over us-east-1.
  It seems like one of those no brainer decisions to me. I take pride in being up when everyone else is down. 5 9s or bust, baby!
  
- TheNewsIsHere 11 hours ago
  
  I also don’t understand this.
  US-East-2 staying up isn’t my responsibility. If I need my own failover, I’m going to select a different region anyway.
  And it’s not like US-East-2 isn’t already huge and growing. It’s effectively becoming another US-East-1.
  
dontdoxxme 13 hours ago

Why aren't you using IBM cloud?

- throwawaysleep 13 hours ago
  
  If IBM still had a good reputation, I probably would.
  
  
  skissane 11 hours ago
  
  I’ve seen people go with IBM Cloud because their salespeople were willing to discount more heavily than AWS/GCP/Azure were. Tier 2 players can be hungrier for your business than tier 1 are. And here I’m talking about completely mainstream workloads (Linux, K8S, etc)
  Separately from that, if you are trying to move certain types of non-mainstream IBM workloads to cloud (AIX, IBM i, z/OS) then IBM is tier 1 in that case
  

thejosh 14 hours ago

Bandwidth cost is also another major reason.
