Comment by hirako2000 4 days ago
It's a centralized vs. decentralized vs. distributed system question.
Since down detectors exist to detect failures of centralized (and decentralized) systems, the idea would be to at least get that part right: use a distributed system to detect outages.
You basically run detectors that heartbeat each other. Just a few suffice.
Once you start to see clusters of detectors go silent, you can assume things are falling apart, which is fine so long as a few remain.
Self-healing also helps make the web of nodes resilient to inevitable infrastructure failures.
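
To make the heartbeat idea concrete, here is a rough Python sketch of how one detector might track its peers and notice when a whole cluster goes quiet. Everything in it is an assumption for illustration: the `Detector` class, the 15-second timeout, the 50% threshold, and the peer-to-region mapping are made up, and there is no real networking, just timestamps passed in by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Detector:
    """Hypothetical detector node that remembers when each peer last pinged it."""
    name: str
    timeout: float = 15.0  # assumed: seconds of silence before a peer counts as down
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, peer: str, now: float) -> None:
        # Called whenever a heartbeat from `peer` arrives.
        self.last_seen[peer] = now

    def silent_peers(self, now: float) -> set:
        # Peers whose last heartbeat is older than the timeout.
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


def silent_regions(detector: Detector, regions: dict, now: float,
                   threshold: float = 0.5) -> set:
    """Return regions where at least `threshold` of the known peers are silent.

    `regions` maps peer name -> region label (an assumed grouping).
    """
    quiet = detector.silent_peers(now)
    total, silent = {}, {}
    for peer in detector.last_seen:
        region = regions[peer]
        total[region] = total.get(region, 0) + 1
        if peer in quiet:
            silent[region] = silent.get(region, 0) + 1
    return {r for r, n in total.items() if silent.get(r, 0) / n >= threshold}


if __name__ == "__main__":
    regions = {"d2": "eu", "d3": "eu", "d4": "us"}
    d1 = Detector("d1")
    for peer in regions:
        d1.record_heartbeat(peer, now=0.0)
    d1.record_heartbeat("d4", now=20.0)  # only d4 keeps pinging
    print(silent_regions(d1, regions, now=25.0))
```

In a real deployment each detector would run this check against its own view and compare notes with the others, so that one detector losing its own connectivity is not mistaken for an outage elsewhere.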