Comment by infaloda
More importantly: `15:45 UTC on 29 October 2025 – Customer impact began.
16:04 UTC on 29 October 2025 – Investigation commenced following monitoring alerts being triggered.` A 19-minute delay in alerting is a joke.
That’s some very carefully chosen phrasing.
I think if you really wanted to do on call right to avoid gaps you’d want no more than 6 hours on primary per day per shift, and you want six, not four, shifts per day. So you’re only alone for four hours in the middle of your shift and have plenty of time to hand off.
It's 19 minutes until active engagement by staff. And planned rolling restarts can trigger alerts if you don't set thresholds of time instead of just thresholds of count.
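To make the time-versus-count distinction concrete, here is a minimal sketch. `DurationAlert` and `hold_seconds` are hypothetical names for illustration, not any real alerting product's API. The idea is that the alert fires only once the unhealthy condition has held continuously for a set time, so a rolling restart that briefly takes instances down does not page anyone.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DurationAlert:
    """Hypothetical rule: fire only after a continuous breach of hold_seconds."""

    hold_seconds: float
    # Timestamp at which the current continuous breach started, if any.
    _breach_started: Optional[float] = field(default=None, init=False, repr=False)

    def observe(self, now: float, unhealthy: bool) -> bool:
        """Record one sample; return True if the alert should be firing."""
        if not unhealthy:
            # Any healthy sample resets the clock, which absorbs short blips.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = now
        return now - self._breach_started >= self.hold_seconds


alert = DurationAlert(hold_seconds=300)

# Rolling restart: unhealthy for 60 s, then healthy again. No page.
restart_fired = [alert.observe(t, bad) for t, bad in [(0, True), (60, True), (90, False)]]

# Real outage: unhealthy from t=100 onward. Fires once 300 s have elapsed.
outage_fired = [alert.observe(t, True) for t in (100, 250, 400)]
```

The trade-off is the one the thread is arguing about: whatever you pick for `hold_seconds` gets added directly to your time-to-alert.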
It would be nice though if alert systems made it easy to wire up CD to turn down sensitivity during observed actions. Sort of like how the immune system turns down a bit while you're eating.
10 minutes to alert, to avoid flapping false positives, plus a 10-minute response window for first responders. Or a 5-minute window before failing over to backup alerts, and 4 minutes to wake up, have coffee, and open the appropriate windows.
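For what it's worth, the arithmetic lines up with the quoted timeline. A quick sketch follows; the stage names are made up for illustration and the per-stage minutes are only the guesses from the comment above, not anything stated in the postmortem.

```python
from datetime import datetime, timedelta, timezone

# Timestamps as quoted from the incident timeline.
impact_began = datetime(2025, 10, 29, 15, 45, tzinfo=timezone.utc)
investigation_began = datetime(2025, 10, 29, 16, 4, tzinfo=timezone.utc)
observed_gap = investigation_began - impact_began

# Two speculative breakdowns of that gap (assumed values, in minutes).
simple_budget = {"alert_hold": 10, "responder_window": 10}
failover_budget = {"alert_hold": 10, "failover_to_backup": 5, "wake_up_and_open_windows": 4}


def total(budget: dict[str, int]) -> timedelta:
    """Sum a budget of per-stage minutes into a single timedelta."""
    return timedelta(minutes=sum(budget.values()))


for name, budget in [("simple", simple_budget), ("failover", failover_budget)]:
    print(f"{name}: {total(budget)} vs observed {observed_gap}")
```

The second breakdown sums to exactly the observed gap, and the first overshoots it by a minute, so a 19-minute gap is about what you'd expect from settings like these.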