Comment by convolvatron

Comment by convolvatron 19 hours ago

this is likely wrong. the issue with partitions is that we can no longer communicate at all, and thus we can't end up in the same state. If we have poor performance, that's certainly something worth putting machinery in to adapt to, but it's not at all in the same class as 'I can't talk to you and I don't know what you're doing at all' from a correctness standpoint.

edit: yeah ok, since failure detection is by necessity driven by timers, then sure. the tradeoff we're making is between the interval during which we're unable to make progress and the upheaval caused by announcing a failure.
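
To make the timer tradeoff concrete, here is a rough sketch of a timeout-based failure detector. Nothing in it comes from a real system: the class name `TimeoutFailureDetector`, its methods, and the timeout values are all made up for illustration, and time is passed in explicitly so the behavior is easy to follow.

```python
class TimeoutFailureDetector:
    """Suspect a peer once no heartbeat has arrived for `timeout` seconds.

    Hypothetical illustration only, not any particular system's detector.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}  # peer id -> time of the most recent heartbeat

    def heartbeat(self, peer, now):
        # Record that we heard from `peer` at time `now`.
        self.last_heard[peer] = now

    def suspected(self, peer, now):
        last = self.last_heard.get(peer)
        if last is None:
            return True  # never heard from this peer at all
        return now - last > self.timeout


# A slow (not dead) peer "b" sends heartbeats at t=0 and t=5.
eager = TimeoutFailureDetector(timeout=2.0)     # short stall, more false alarms
patient = TimeoutFailureDetector(timeout=10.0)  # fewer false alarms, longer stall

for detector in (eager, patient):
    detector.heartbeat("b", now=0.0)

# At t=3 the eager detector has already announced a failure that the
# heartbeat at t=5 will contradict; the patient one is still waiting.
eager_at_3 = eager.suspected("b", now=3.0)
patient_at_3 = patient.suspected("b", now=3.0)

for detector in (eager, patient):
    detector.heartbeat("b", now=5.0)
```

With a short timeout the slow peer gets declared failed and then reappears (the upheaval); with a long one, a genuinely dead peer would leave us unable to make progress for the whole timeout.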

anonymars 18 hours ago

Yeah, I glossed over a few steps. There's likely a latency threshold beyond which you should abort, and at that point it effectively is a partition (after all, that's what TCP is doing under the hood: it retransmits an unacknowledged segment and eventually gives up).

One should be so lucky as to have an operation fail immediately, rather than lumber on until it times out (holding resources hostage all the while)!
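
A minimal sketch of the "latency threshold beyond which you should abort" idea, assuming a blocking operation we can run in a worker thread. `call_with_deadline` and `PeerUnreachable` are hypothetical names, and the deadlines are arbitrary. Note that the caller stops waiting but the abandoned operation is not cancelled, so it can still hold resources until it finishes on its own.

```python
import concurrent.futures
import time


class PeerUnreachable(Exception):
    """Raised when an operation exceeds its deadline (hypothetical name)."""


def call_with_deadline(op, deadline_s):
    """Run `op()` and treat 'slower than deadline_s' the same as 'unreachable'."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(op)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        # Past the threshold we give up and report it like a partition.
        raise PeerUnreachable(f"no response within {deadline_s}s") from None
    finally:
        # Do not block on the worker; a timed-out op keeps running in the
        # background until it returns by itself.
        pool.shutdown(wait=False)


def slow_peer():
    time.sleep(0.3)  # stands in for a peer that is slow rather than dead
    return "late reply"


fast_result = call_with_deadline(lambda: "reply", deadline_s=1.0)

try:
    call_with_deadline(slow_peer, deadline_s=0.05)
    slow_outcome = "completed"
except PeerUnreachable:
    slow_outcome = "treated as partition"
```

From the caller's side the slow peer and a partitioned peer are indistinguishable here, which is the point being made above.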