Comment by hirako2000

Comment by hirako2000 4 days ago

0 replies

It's a centralization vs decentralisation vs distributed system question.

Since down detectors serve to detect failures of centralized (and decentralized systems) the idea would be to at least get that right: a distributed system to detect outages.

You basically run detectors that heartbeat each others. Just a few suffice.

Once you start to see clusters of detectors go silent, you can assume things are falling apart, which is fine so long as a few remain.

Self healing also helps to make the web of nodes resilient to inevitable infrastructure failures.