Comment by 8cvor6j844qw_d6

I'll be interested in the incident writeup since DNS is mentioned. It would be interesting if it turns out to be similar to what happened at AWS.

Insanity 5 days ago

It's pretty unlikely. AWS published a public 'RCA' (https://aws.amazon.com/message/101925/): a race condition in a DNS 'record allocator' caused all DNS records for DDB to be wiped out.

I'm simplifying a bit, but I don't think it's likely that Azure has a similar race condition wiping out DNS records on _one_ system that then propagates to all others. The similarity might just end at "it was DNS".

parliament32 5 days ago

That RCA was fun. A distributed system with members that don't know about each other, don't bother with leader elections, and basically all stomp all over each other updating the records. It "worked fine" until one of the members had slightly increased latency and everything cascade-failed down from there. I'm sure there was missing (internal) context but it did not sound like a well-architected system at all.
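
To make the "stomp all over each other" failure mode concrete, here is a toy sketch. It is not AWS's actual design: `RecordStore`, `blind_write`, `guarded_write`, the plan version numbers, and the IPs are all made up for illustration. It only shows how a delayed writer with an older plan can clobber a newer one when nothing checks versions, and how a simple "newer only" check would reject it.

```python
from dataclasses import dataclass, field


@dataclass
class RecordStore:
    """Hypothetical stand-in for a DNS record set; `version` is the plan applied."""
    version: int = 0
    ips: list = field(default_factory=list)

    def blind_write(self, version, ips):
        # Last writer wins: nothing checks whether the incoming plan is newer.
        self.version, self.ips = version, list(ips)
        return True

    def guarded_write(self, version, ips):
        # Refuse any plan that is not strictly newer than the one already applied.
        if version <= self.version:
            return False
        self.version, self.ips = version, list(ips)
        return True


def run(write_name):
    store = RecordStore()
    write = getattr(store, write_name)
    write(2, ["10.0.0.2"])  # fast writer applies the newer plan first
    write(1, ["10.0.0.1"])  # slow writer shows up late with an older plan
    return store.version, store.ips


print(run("blind_write"))    # the stale plan overwrites the newer one
print(run("guarded_write"))  # the stale plan is rejected
```

In a real distributed setup the check and the write would have to be atomic (a conditional update on the backing store), which this single-process sketch does not attempt to model.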

- nijave 5 days ago
  
  >slightly increased latency
  They didn't provide any details on latency. It could have been delayed an hour or a day and no one noticed.
  
- RajT88 5 days ago
  
  Needs STONITH
  
kyrra 5 days ago

https://isitdns.com/

cdr420 5 days ago

It's always DNS

- tempest_ 5 days ago
  
  It is a coin flip, heads DNS, tails BGP
  
  r_lee 5 days ago
  
  THIS is the real deal. Some say it's always DNS, but many times it's some routing fuckup with BGP. The two most cursed three-letter-acronym technologies out there.
  
  chasd00 5 days ago
  
  When a service goes down, it's DNS; when an entire nation or group of nations vanishes, it's BGP.
  
layer8 5 days ago

DNS has both naming and cache invalidation, so no surprise it’s among the hardest things to get right. ;)

dotancohen 5 days ago

That's three of the hardest problems in CS ))
