Comment by tacitusarc 5 hours ago

9 replies

I appreciate the insights here, but I am struggling to understand how “exactly once” can equate to “eliminate duplicates”. Let’s say someone arrived at my house and cut my grass, and I failed to confirm they had done so, so the company sent someone over to cut my grass again, maybe multiple times. It seems silly to claim my grass was cut exactly once, despite it consistently remaining at the same height. Obviously it was cut multiple times, just not with much effect after the first. The point of exactly-once is that the server and client don’t need to expend pointless effort on duplicates… right?

valzam 5 hours ago

In particular, what exactly-once delivery implies is that I do not have to worry about duplicates in my processing logic. I can build a `count += 1` and it will always be exactly correct.

The notion that there is no distinction between exactly once delivery and exactly once processing is very odd to me. In practice my processing needs to accommodate duplicates to be correct. If I had exactly once delivery my processing could be much simpler. If I could get exactly once delivery for free I would always choose it in a heartbeat.
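
To make the `count += 1` point concrete, here is a minimal Python sketch, assuming every message carries a unique ID. The `(msg_id, payload)` shape and both function names are made up for illustration; nothing here comes from the article. Under at-least-once delivery the naive counter overcounts, while the deduplicating one stays correct at the cost of remembering every ID it has seen.

```python
def naive_count(deliveries):
    """Counts every delivery; only correct if delivery is truly exactly-once."""
    count = 0
    for _msg_id, _payload in deliveries:
        count += 1
    return count


def dedup_count(deliveries):
    """Counts each message ID once, so redelivered duplicates have no effect."""
    seen = set()  # assumption: IDs are unique per logical message
    count = 0
    for msg_id, _payload in deliveries:
        if msg_id in seen:
            continue  # duplicate: drop it without counting
        seen.add(msg_id)
        count += 1
    return count


# Three logical messages; "b" is redelivered twice by an at-least-once transport.
deliveries = [("a", 1), ("b", 2), ("b", 2), ("c", 3), ("b", 2)]
print(naive_count(deliveries))  # 5
print(dedup_count(deliveries))  # 3
```

Whether that `seen` set lives in my code or in a queue library is exactly the boundary question the replies argue about.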

  • jchw 5 hours ago

    The point is that it doesn't matter exactly where the deduplication happens. It could happen in your own processing code, or in something upstream of it, like a queue library of some kind. That's pretty much what the entire article is saying: it's hard to meaningfully distinguish which part is actually delivery and which is processing. For example, most people would consider the guarantees provided by the TCP stack to be part of delivery and not processing, but your TCP stack has to do a lot of processing work to maintain the logical stream of bytes.

    • warkdarrior 4 hours ago

      > The point is that it doesn't matter exactly where the deduplication happens.

      Actually the point is that once deduplication is done at some layer, the layers above it will have to re-achieve exactly-once delivery.

      "Yes, the TCP layer did deliver this message only once, but the receiving software crashed right after, so now the sender has to send it again."

      • jchw 4 hours ago

        Hmmm. Maybe this is the reason why the processing vs delivery distinction matters. Because my thought is, well of course: To fix that you only send the acknowledgement after processing succeeds.

        But then again, once you do that, the processing code that is being wrapped really doesn't have to care about being idempotent anymore, as it is being handled a layer up. At that point, all it needs to care about is being atomic.

        I'm not sure if it practically matters either way. I'd rather have my processing code be both atomic and idempotent regardless just to make things easier to reason about, as long as it's not too much of a burden. I've always been a fan of concepts like idempotency tokens.
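
A rough sketch of "atomic plus idempotency token, acknowledge only after processing", using Python's built-in `sqlite3` as a stand-in for whatever transactional store is actually in play. The table names, the `handle` function, and the token strings are all hypothetical; the only assumption is that the sender attaches the same token to every redelivery of a message.

```python
import sqlite3


def make_db():
    """In-memory database with one account row and a table of applied tokens."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE applied (token TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    db.execute("INSERT INTO account VALUES (1, 0)")
    db.commit()
    return db


def handle(db, token, amount):
    """Apply a deposit at most once per token; return True to acknowledge.

    The token insert and the balance update share one transaction (atomic),
    and a repeated token changes nothing (idempotent).
    """
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute("INSERT INTO applied VALUES (?)", (token,))
            db.execute(
                "UPDATE account SET balance = balance + ? WHERE id = 1",
                (amount,),
            )
    except sqlite3.IntegrityError:
        pass  # token already recorded: duplicate delivery, nothing to redo
    return True  # acknowledge only after the transaction has committed


def balance(db):
    return db.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


db = make_db()
handle(db, "tok-1", 10)  # first delivery
handle(db, "tok-1", 10)  # redelivery, e.g. the first ack was lost
handle(db, "tok-2", 5)
print(balance(db))  # 15
```

If the receiver crashes before the commit, nothing was applied and no ack went out, so the sender's retry is safe. If it crashes after the commit but before the ack, the retry hits the primary-key conflict and is dropped.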

theamk 5 hours ago

We are talking about the network stack, so there are no actions, just data hand-off to the actual application code.

Someone arrives at your house, gives you a package, says "this is order 123". You thank them, they leave, but then they are hit by a car before they can report this. You unpack the package and use it.

The next day, someone else arrives at your house, gives you a package, says "this is order 123". You thank them, they leave. You know you've already received order 123, so you throw the package away without even taking it into the house.

This happens a few more times, but you don't care, your trash can is big.

Done! You now have "exactly once delivery".

Now, some might argue this is "exactly once processing" and you should only count what the delivery person does, but this depends on where you draw the boundary. I draw it at "I am taking the package into the house", and I've only ever taken one package there, so it was exactly-once for me.

The key part here is cost. I am assuming that opening the package and using its contents is hard and takes a long time, while answering the door and throwing the package away is easy. This is definitely the case with a modern networking stack, which retransmits stuff all the time, and where the loss rate is very low.
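
The order-123 story can be written down as a toy Python simulation. The `House` class, `deliver_until_reported`, and the `lost_reports` parameter are invented for this sketch and do not model any real network stack; the point is only to show the two counters diverging.

```python
class House:
    """Receiver that draws the delivery boundary at 'taken into the house'."""

    def __init__(self):
        self.received_orders = set()
        self.door_answers = 0  # cheap: happens on every attempt
        self.unpacked = []     # expensive: should happen once per order

    def answer_door(self, order_id, package):
        self.door_answers += 1
        if order_id not in self.received_orders:
            self.received_orders.add(order_id)
            self.unpacked.append(package)
        # A duplicate goes straight in the trash; either way we say thanks.
        return "thanks"


def deliver_until_reported(house, order_id, package, lost_reports):
    """The company resends until one courier's report makes it back.

    `lost_reports` is how many couriers deliver but never report in.
    Returns the number of couriers sent.
    """
    attempts = 0
    while True:
        attempts += 1
        house.answer_door(order_id, package)
        if attempts > lost_reports:
            return attempts  # this courier's report arrived; stop resending


house = House()
sent = deliver_until_reported(house, 123, "widget", lost_reports=3)
print(sent, house.door_answers, house.unpacked)  # 4 4 ['widget']
```

Four couriers were sent and the door was answered four times, but only one package made it inside. Which of those numbers you call "delivery" is the whole argument.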

  • tacitusarc 4 hours ago

    As this is a semantic debate over the definition of delivery, I asked my very non-technical wife if she thought in the scenario you described, the package was delivered exactly once. She said obviously not, and this discussion is very stupid, and I should stop participating in it. So there’s that.

    • Jach 2 hours ago

      Smart wife. My take on the whole thing is that it's not wise to reason from non-technical metaphors around packages or lawn mowing when the reality is electronic systems. I don't know if it's any wiser but what I like to do is work my way up from the basics. What does delivery mean? Start with two wires, one for signal and one for common ground. (Or just one wire, and pretend you can use earth-return reliably.) If that isn't enough to resolve what terms should mean, consider them with differential signaling. If that still isn't enough to get it, consider them with relay nodes. If at some point "delivery" has changed definitions to suddenly forbid something that previously wasn't forbidden, maybe you've made a mistake.

    • jbergens 3 hours ago

      I don't think the example was perfect which explains your wife's reaction.

      Think of it more like the first delivery guy/girl left his/her car outside and wrote 123 on it. Then walked back.

      The next one sees the car with a sign saying 123 and won't even ring the doorbell or leave a package. Now you haven't gotten the package twice, it has not been delivered twice.

      Sure, you can complain that there's a car outside your home, but in a digital system you won't even see it. It would also cost the delivery firm a car for every package, but that is not your problem and, again, in the digital world the cost is a lot less than a car.

      There is an argument that the street would fill up with delivery vans and there would be no more room for new deliveries to you or your neighbors, but that is a limitation you could talk about. You probably can't handle an infinite number of packages delivered at the same time either, and you won't wait an infinite amount of time for any specific package.

      Try this version with your wife.

schobi 5 hours ago

Same understanding: On the receiver side, we are going to drop duplicates (by processing, or by having no effect on the grass cutting any more). Thus, the end user is then seeing only one effect, one message delivered. The effect of delivery "message received" or "grass is cut" is achieved.

But still, the sender might need to send more than once (until confirmation). In terms of cost at the sender ("sending multiple packages" or "sending more grass cutters"), this is still the "send one or more" scenario.

Sorry to fuel the fire... it is about the definition of "delivery"