Comment by jongjong

Comment by jongjong 5 hours ago

Yes, of course you can have exactly-once delivery if you clearly define who or what the receiver is. For example, you can easily deduplicate messages on the receiving end based on UUIDs created by the sender. Since the duplicates are copies of the exact same message, it doesn't matter which one is 'the original'.
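
A minimal sketch of the receiver-side deduplication described above. The `Message` and `DedupingReceiver` names are hypothetical, and the sketch assumes the sender generates one UUID per logical message and reuses it on every retransmission:

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    # Assigned once by the sender; every retransmission reuses the same ID.
    msg_id: uuid.UUID
    payload: str


@dataclass
class DedupingReceiver:
    seen: set = field(default_factory=set)         # IDs already delivered
    delivered: list = field(default_factory=list)  # what the application saw

    def on_message(self, msg: Message) -> bool:
        """Hand msg to the application unless its ID was seen before."""
        if msg.msg_id in self.seen:
            return False  # duplicate: identical copy, safe to drop
        self.seen.add(msg.msg_id)
        self.delivered.append(msg.payload)
        return True


msg = Message(uuid.uuid4(), "transfer $10")
rx = DedupingReceiver()
for copy in (msg, msg, msg):  # the network delivered three copies
    rx.on_message(copy)
print(rx.delivered)  # ['transfer $10']
```

Note that `seen` grows without bound in this sketch.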

It only makes sense to worry about this if you're concerned about wasting bandwidth when the network is unstable (since the same message may sometimes traverse the network multiple times), but that's not generally something engineers need to worry about.

It's ironic how some people use this to try to talk down to 'junior' developers.

Anyone can memorise hearsay about distributed systems, but few can speak from experience.

qaq 5 hours ago

You can't in the general case, because you don't have infinite memory for dedupe.

  • lisper 5 hours ago

    You apparently missed this:

    "Just for the sake of completeness I should point out that removing duplicates at the receiver is a pretty extreme oversimplification of what you would do in practice to provide exactly-once delivery. A complete solution would almost certainly be an example of Greenspun's Tenth Law applied to the TCP protocol rather than Common Lisp."
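
The TCP comparison in the quote can be made concrete. Rather than remembering every UUID, a TCP-style receiver numbers each sender's messages and only remembers the next sequence number it expects. This is a swapped-in technique, not something the thread prescribes: a minimal go-back-N style sketch, assuming hypothetical per-sender sequence numbers starting at 0 and a sender that retransmits anything not cumulatively ACKed:

```python
class SequenceReceiver:
    """Per-sender dedupe with O(1) state per sender (go-back-N style)."""

    def __init__(self) -> None:
        self.next_expected: dict[str, int] = {}  # sender id -> next seq number
        self.delivered: list[tuple[str, str]] = []

    def on_message(self, sender: str, seq: int, payload: str) -> int:
        expected = self.next_expected.get(sender, 0)
        if seq == expected:
            self.delivered.append((sender, payload))
            self.next_expected[sender] = expected + 1
        # seq < expected: duplicate of something already delivered, drop it.
        # seq > expected: a gap, drop it and wait for the retransmission.
        # Either way, return a cumulative ACK: "I have everything below this".
        return self.next_expected.get(sender, 0)


rx = SequenceReceiver()
acks = [
    rx.on_message("a", 0, "x"),  # delivered
    rx.on_message("a", 0, "x"),  # duplicate, dropped
    rx.on_message("a", 2, "z"),  # gap, dropped until seq 1 arrives
    rx.on_message("a", 1, "y"),  # delivered
]
print(acks)          # [1, 1, 1, 2]
print(rx.delivered)  # [('a', 'x'), ('a', 'y')]
```

The state is now one integer per sender instead of one entry per message, but it has to survive receiver restarts and sender reconnects, which is roughly where the extra machinery the quote alludes to starts piling up.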

  • jongjong 5 hours ago

    Good point, but you don't need infinite memory. You can set expiries to discard old entries, for example. Expiries and timeouts can be defined as part of the protocol. You can impose reasonable per-socket constraints on the buffer size for spam prevention. There is a lot of wiggle room.
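
One way to read the expiry and buffer-cap suggestion, as a sketch: remember each message ID for a fixed TTL and cap how many IDs a single connection may hold. `ExpiringDeduper`, `ttl` and `max_entries` are hypothetical names, and refusing new IDs when the cap is hit (rather than evicting early, which would reopen the duplicate window) is an assumed design choice:

```python
import time
from collections import OrderedDict


class ExpiringDeduper:
    """Remembers message IDs for `ttl` seconds, holding at most `max_entries`."""

    def __init__(self, ttl, max_entries, clock=time.monotonic):
        self.ttl = ttl
        self.max_entries = max_entries
        self.clock = clock
        self._seen = OrderedDict()  # msg_id -> time first seen, oldest first

    def _evict_expired(self, now):
        # With a monotonic clock, insertion order equals time order,
        # so expired entries are always at the front.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.ttl:
                break
            self._seen.popitem(last=False)

    def accept(self, msg_id):
        """Return True if msg_id is new and should be delivered."""
        now = self.clock()
        self._evict_expired(now)
        if msg_id in self._seen:
            return False  # duplicate within the TTL window
        if len(self._seen) >= self.max_entries:
            # Per-connection cap: refuse (e.g. close the socket) rather than
            # forget IDs early and let duplicates through.
            raise OverflowError("dedupe buffer full for this connection")
        self._seen[msg_id] = now
        return True


# Usage with a fake clock so the example is deterministic.
now = [0.0]
d = ExpiringDeduper(ttl=60, max_entries=2, clock=lambda: now[0])
print(d.accept("a"), d.accept("a"), d.accept("b"))  # True False True
now[0] = 61.0
print(d.accept("a"))  # True: the first "a" expired, so it is treated as new
```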

    On the sender side, you can require an ACK response for each message UUID and rebroadcast only if the ACK is not received within a certain timeframe.
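
A sketch of the sender side, assuming a hypothetical `transmit(msg_id, payload)` call that returns `True` only if an ACK for that UUID arrives within the timeout. The UUID is generated once, so the receiver can deduplicate every retransmission. The simulated lossy link (it drops messages and ACKs at random) is illustrative only:

```python
import random
import uuid


def send_with_retries(payload, transmit, max_attempts=5):
    """Send payload, retransmitting under the same UUID until it is ACKed.

    transmit(msg_id, payload) is a hypothetical transport call: it returns
    True if an ACK for msg_id arrived within the (assumed) timeout.
    """
    msg_id = str(uuid.uuid4())  # generated once, reused on every retry
    for attempt in range(1, max_attempts + 1):
        if transmit(msg_id, payload):
            return msg_id, attempt
    raise TimeoutError(f"no ACK for {msg_id} after {max_attempts} attempts")


# Simulated lossy link: messages and ACKs are each dropped 30% of the time.
rng = random.Random(42)
seen, delivered = set(), []


def lossy_transmit(msg_id, payload):
    if rng.random() < 0.3:
        return False                 # message lost; receiver never saw it
    if msg_id not in seen:           # receiver dedupes by UUID
        seen.add(msg_id)
        delivered.append(payload)
    return rng.random() >= 0.3       # False here means the ACK was lost


gave_up = 0
for i in range(100):
    try:
        send_with_retries(f"msg-{i}", lossy_transmit)
    except TimeoutError:
        gave_up += 1

# Retries never cause a second delivery, but a message the sender gave up
# on may or may not have been delivered: the sender cannot tell which.
print(len(delivered), "delivered,", gave_up, "gave up")
```

`max_attempts` is what stops the loop from retrying forever; whether to bound it, and how that interacts with how long the receiver remembers IDs, is the trade-off the replies below dig into.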

    You don't necessarily need a sophisticated algorithm to get a good practical solution which solves real problems.

    • mtndew4brkfst 5 hours ago

      If the lack of a successful ACK means you retry indefinitely until you do get an ACK, then whichever side handles deduplication needs to store all observed messages indefinitely, or you may deliver the same message more than once without correctly deduplicating it.

      If you instead store them for only N days, but an ongoing retry loop finally succeeds after N+1 days, you get duplicated delivery.
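
The N+1 case can be reproduced in a few lines. This sketch assumes a hypothetical receiver that forgets IDs after a retention window of `RETENTION_DAYS` (the N above) and counts how many times a message is handed to the application:

```python
RETENTION_DAYS = 7  # hypothetical N: how long the receiver remembers an ID


class RetainingReceiver:
    def __init__(self, retention_days: int) -> None:
        self.retention_days = retention_days
        self.seen_on: dict[str, int] = {}  # msg_id -> day first seen
        self.deliveries = 0

    def receive(self, msg_id: str, day: int) -> None:
        # Forget IDs that fell out of the retention window.
        self.seen_on = {m: d for m, d in self.seen_on.items()
                        if day - d < self.retention_days}
        if msg_id not in self.seen_on:
            self.seen_on[msg_id] = day
            self.deliveries += 1


rx = RetainingReceiver(RETENTION_DAYS)
rx.receive("m1", day=0)                   # original arrives; its ACK is lost
rx.receive("m1", day=3)                   # retry within N days: deduplicated
rx.receive("m1", day=RETENTION_DAYS + 1)  # retry on day N+1: ID was forgotten
print(rx.deliveries)  # 2
```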

      If a retry loop gives up after a finite elapsed time, which sets a ceiling on how long you must retain observations, then you do not in fact achieve at-least-once delivery unless you add a qualifying statement that weakens the guarantee.

      One of those constraints must be compromised on. That doesn't mean a practically useful implementation can't exist; it just means a theoretically ideal one can't, not without infinite memory.

      Many systems and real-world processes would actually suffer negative consequences if a weeks-old payload were delivered as new, so hardly anyone balks at this implication. But again, you have to redefine at least one of the claims (or acquire infinite memory) to have a coherent system that fulfills them.

    • two_handfuls 5 hours ago

      Right, but all this assumes timing guarantees that are not present in the original impossibility result.

nhumrich 4 hours ago

You seem to be missing the point, though. Wanting "exactly-once delivery" in practice is like saying, "I shouldn't need to worry about dedupes at the application level." That's everyone's dream, but the seniors are saying, "Yes, you have to handle it at the application level; there is no way around that."
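
One common way to handle it at the application level, sketched with Python's built-in `sqlite3` module: record the message ID and apply the side effect in the same transaction, so a redelivered message hits a primary-key conflict and becomes a no-op. The schema, the `handle_deposit` function, and the account name are hypothetical:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    db.execute("INSERT INTO accounts VALUES ('alice', 0)")
    db.commit()
    return db


def handle_deposit(db: sqlite3.Connection, msg_id: str, amount: int) -> bool:
    """Apply a deposit at most once per msg_id; return False for duplicates."""
    try:
        with db:  # one transaction: commit on success, roll back on exception
            db.execute("INSERT INTO processed (msg_id) VALUES (?)", (msg_id,))
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'alice'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: this msg_id was already applied.
        return False


db = make_db()
print(handle_deposit(db, "m-1", 10))  # True
print(handle_deposit(db, "m-1", 10))  # False: redelivery is a no-op
print(handle_deposit(db, "m-2", 5))   # True
```

Because the ID and the balance change commit or roll back together, a crash between them can't leave one without the other. The `processed` table still grows, so the retention question from earlier in the thread doesn't go away.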