Comment by jchw

Comment by jchw 10 months ago

2 replies

Honestly, I really do find the traditional nomenclature to be a little pointless. It seems like the classic saying assumes that it's somehow okay to assume infinite time for re-delivery, but not infinite memory for memoization for some reason. On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot. I'd go as far as to say that in practice you really can't guarantee a message will be delivered and processed, because you will have finite bounds on time. The absolute best you can do is guarantee that it either was definitely processed once or probably was not, and handle it accordingly. (I formerly wrote "definitely" for the latter, thinking you could do this with two-phase commit, and then realized after walking away from the computer that you absolutely can't guarantee that, of course. Distributed systems are such a pain to reason about.)

Do I misunderstand?
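
To make the finite-memory point concrete, here is a minimal Python sketch of memoization with a staleness bound. Everything in it (`BoundedDeduplicator`, `max_age`, the string results) is made up for illustration, and it assumes `now` never goes backwards and that senders stamp each message with a trustworthy `sent_at`.

```python
from collections import OrderedDict


class BoundedDeduplicator:
    """Hypothetical dedup table that only remembers ids for max_age time units."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.seen = OrderedDict()  # message id -> time we first processed it

    def accept(self, msg_id, sent_at, now):
        cutoff = now - self.max_age
        # Forget ids processed before the cutoff, so memory stays bounded.
        # Insertion order equals time order because `now` is assumed monotonic.
        while self.seen and next(iter(self.seen.values())) < cutoff:
            self.seen.popitem(last=False)
        # Anything sent before the cutoff is refused outright, so a forgotten
        # id can never sneak back in and be processed a second time.
        if sent_at < cutoff:
            return "rejected_stale"
        if msg_id in self.seen:
            return "duplicate"
        self.seen[msg_id] = now
        return "processed"
```

The trade is explicit: the table stays small, and in exchange a sufficiently late message is dropped rather than processed, which is the "finite bounds on time" part.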

jhanschoo 10 months ago

> On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot.

My understanding is that these happen IRL all the time in the guise of healing a network split or rebooting crashed nodes or bringing new uninitialized servers into the system. Of course, IRL you usually translate the result to needing a different strategy to bring these systems up to speed beyond a certain threshold. But these thresholds and strategies and changing the number of nodes in the system are application-dependent, so the fiction of unbounded messages/memory/time helps focus the formal analysis and result.

In the context of, say, a distributed KV store, it cautions you that unless you have said other strategy, you will end up with an inconsistent system or failure state if your message buffers are more space-constrained than required.

Izkata 10 months ago

> Honestly, I really do find the traditional nomenclature to be a little pointless. It seems like the classic saying assumes that it's somehow okay to assume infinite time for re-delivery, but not infinite memory for memoization for some reason.

This is exactly where the argument is coming from. The same people who will say "you can get at most once or at least once, but not only once" don't realize they're doing the exact same thing as the "you can get only once" people, when they criticize the conflation of delivery and processing. They'll argue "delivery" and "processing" have to be kept separate because of the memory/storage/bandwidth/etc. that the retries use up, which is why "only once delivery" can't exist and they actually mean "only once processing", but if you keep that reasoning in mind, there's also no such thing as "at least once delivery" - you'll run out of something at some point (or even just hit your retry limit) and have to drop the retries, resulting in no delivery.

The people saying you can get "only once delivery" by using "at least once"+idempotency are working under the other group's definitions, then getting annoyed when the definitions are changed so this implementation of "only once" isn't allowed.
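
For anyone who wants to see the "at least once"+idempotency setup spelled out, here is a rough Python sketch. All the names (`send_with_retries`, `IdempotentReceiver`, `lossy_link`) are invented for this example, and the "network" is just a function that loses a fixed number of acks, so treat it as an illustration of the argument rather than a real protocol.

```python
def send_with_retries(deliver, msg_id, payload, max_attempts):
    """Resend until an ack comes back or the retry budget is exhausted."""
    for _ in range(max_attempts):
        if deliver(msg_id, payload):
            return True   # ack seen: the receiver definitely processed it
    return False          # no ack: we gave up, and we can't know what happened


class IdempotentReceiver:
    """Processes each message id once, however many copies arrive."""

    def __init__(self):
        self.processed = {}     # message id -> payload (unbounded in this sketch)
        self.side_effects = 0   # counts how often the real work was done

    def handle(self, msg_id, payload):
        if msg_id not in self.processed:
            self.processed[msg_id] = payload
            self.side_effects += 1
        return True  # always ack, including for duplicates


def lossy_link(receiver, acks_to_drop):
    """Hypothetical channel: delivers every message but loses the first N acks."""
    remaining = [acks_to_drop]

    def deliver(msg_id, payload):
        ack = receiver.handle(msg_id, payload)
        if remaining[0] > 0:
            remaining[0] -= 1
            return False  # the ack was lost on the way back
        return ack

    return deliver
```

With two lost acks and a budget of five attempts, the sender gets its ack and the receiver did the work once. With five lost acks and a budget of three, the sender gives up and reports failure even though the receiver did process the message once, which is the retry-limit case where "at least once" stops being a guarantee from the sender's side.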