Comment by bhaney 6 hours ago

In case you're someone who actually knows anything about distributed systems and you're not looking forward to slogging through this long article filled with claims like "I have a PhD in AI so I know what I'm talking about" to find where the author made their mistake, let me save you the time. It's the typical conflation of exactly-once delivery with exactly-once processing, which the author acknowledges and then chooses to ignore because they're basically the same for practical purposes, as if this somehow changes the reality of the delivery itself and the restrictions on its guarantees.

Yes, everyone knows you can layer an idempotency mechanism on top of at-least-once delivery to achieve exactly-once processing (so long as you're willing to tie up memory/storage for an infinite amount of time). But this does not equate to exactly-once delivery, and you know that.
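For anyone who hasn't seen the layering spelled out, here is a minimal sketch of it. The names (`IdempotentConsumer`, `deliver`, the message IDs) are made up for illustration and do not come from the article or any particular broker. The transport is free to redeliver, and the consumer drops anything whose ID it has already handled. Note that the set of seen IDs only ever grows, which is the memory/storage cost mentioned above.

```python
from typing import Callable, List, Set


class IdempotentConsumer:
    """Hypothetical consumer that turns at-least-once delivery into
    exactly-once processing by remembering every message ID it has handled."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # grows without bound in this sketch

    def deliver(self, message_id: str, payload: str) -> bool:
        """Called once per delivery attempt. Returns True if the payload was processed."""
        if message_id in self._seen:
            return False  # duplicate delivery: acknowledge it, do no work
        self._handler(payload)
        # Assumption: in a real system the side effect and this bookkeeping
        # would have to commit atomically (e.g. in one database transaction).
        self._seen.add(message_id)
        return True


processed: List[str] = []
consumer = IdempotentConsumer(processed.append)

# Simulate a transport that retries: "m1" is delivered three times.
deliveries = [("m1", "debit 10"), ("m1", "debit 10"), ("m2", "credit 5"), ("m1", "debit 10")]
outcomes = [consumer.deliver(message_id, payload) for message_id, payload in deliveries]
```

The message was still delivered three times; only the processing happened once. That is the whole distinction.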

Groxx 5 hours ago

Yea... this is kinda inflammatory but I honestly have to agree with it.

The post largely summarizes as "you can have exactly-once delivery if you re-define it to be at-least-once processing with idempotency".

Those are different things.

In fact, that's the entire point behind saying that it's impossible.

You can't design a system that is exactly-once at any level, so don't even bother trying. If someone wants you to guarantee something will happen, you can point to that impossibility to say "you need to retry, it's not optional, and anyone who tells you otherwise is lying to you". That has happened to me multiple times in my career; it's a thing charlatans keep trying to sell to businesses, and businesses eat it up because it sounds so much magically simpler than what their engineers keep telling them needs to be done.

Because it is magic. It doesn't exist.

  • sam_lowry_ 3 hours ago

    Gosh, only after this comment did I understand why so many programmers litter their code with retries, even though they seem superfluous.

  • philipswood 5 hours ago

    So if you build systems using messaging middleware you get to choose upfront whether you are going to use XA with exactly-once semantics and pay the performance penalty for that or whether you will implement your business logic idempotently on processing instead.

    You can do this across a whole landscape of vendors.

    So this whole class of "that's impossible" responses sounds to me (as an ex-bricklayer) like "obviously you can't stack bricks next to each other perfectly straight - so building walls is clearly impossible".

    So it feels a bit jarring when several vendors allow you to do this impossible thing.

    Google for JMS and exactly-once and you can find documentation with several products on exactly how to do this.

    One example: https://www.atomikos.com/Blog/TheSimpleSecretOfExactlyOnceDe...

    • Groxx 5 hours ago

      >In this post we will show how [exactly once delivery] can be done, plus how simple it is with Atomikos. ...

      >The producer should do its processing and send its message as part of a JTA/XA transaction. This ensures a message is sent if and only if the transaction commits. Any failures will result in rollback of the JTA/XA transaction - and no message will be sent. This means that failures can be safely retried until they succeed, without sending the resulting message more than once.

      This is at-least-once (retries) with probably-idempotency: a transaction.

      As I said: charlatans.

      They're selling you "exactly once" in the headline, but clearly state that it is not exactly once in the legalese below. This is less "buyer beware" and more "blatantly false advertising", in exactly the same way as a free energy machine. The abstraction they're offering may indeed be useful, but "exactly once" is literally lying, and I wouldn't trust them to be honest elsewhere eit

  • EGreg 5 hours ago

    This is sorta like the argument about the CAP impossibility theorem, while in practice consensus algorithms work 99.999% of the time. Or like Shannon’s information theory showing that no lossless scheme can compress every input, while many compression algorithms work well on actual data.

    This seems to me the same. In practical applications, you can indeed have the at-least-once delivery with an idempotency / backpressure system, and work 99% of the time, and be unavailable 1% of the time.

    • Groxx 5 hours ago

      Yep, and for practical applications (i.e. stuff that exists in this universe) that is absolutely good enough. You just have to choose which tradeoffs you can stomach best. With a fancy enough system, those tradeoffs can be driven shockingly low.

      But if someone tries to sell you a database that is 100% available and has perfect consistency, you can laugh and walk away. They're a flat-earther trying to sell you a bridge: they're either trying to trick you or they have no idea what they're talking about. Either way you don't want to be involved with them.

jchw 5 hours ago

Honestly, I really do find the traditional nomenclature to be a little pointless. It seems like the classic saying assumes that it's somehow okay to assume infinite time for re-delivery, but not infinite memory for memoization for some reason. On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot. I'd go as far as to say that in practice you really can't guarantee a message will be delivered and processed, because you will have finite bounds on time. The absolute best you can do is guarantee that it either was definitely processed once or probably was not, and handle it accordingly. (I formerly wrote "definitely" for the latter, thinking you could do this with two-phase commit, and then realized after walking away from the computer that you absolutely can't guarantee that, of course. Distributed systems are such a pain to reason about.)

Do I misunderstand?
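To make the finite-bounds point concrete, here is a rough sketch of deduplication with a staleness cutoff. Everything in it (`WindowedDeduplicator`, `accept`, the 60-second window) is a hypothetical illustration, and it assumes the sender's timestamp can be trusted and that clocks are reasonably in sync. Messages older than the window are refused outright, so their IDs can be forgotten and memory stays bounded.

```python
from typing import Dict


class WindowedDeduplicator:
    """Hypothetical dedup table that only remembers IDs inside a time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._seen: Dict[str, float] = {}  # message_id -> original send time

    def accept(self, message_id: str, sent_at: float, now: float) -> str:
        # Forget IDs whose messages have aged out of the window. This is safe
        # because any later redelivery of them is rejected as stale below.
        self._seen = {m: t for m, t in self._seen.items() if now - t <= self.window}
        if now - sent_at > self.window:
            return "stale"      # too old: refuse rather than risk reprocessing
        if message_id in self._seen:
            return "duplicate"  # already handled inside the window
        self._seen[message_id] = sent_at
        return "process"


dedup = WindowedDeduplicator(window_seconds=60)
results = [
    dedup.accept("a", sent_at=0, now=10),    # first delivery
    dedup.accept("a", sent_at=0, now=20),    # retry inside the window
    dedup.accept("a", sent_at=0, now=100),   # retry after the window has passed
    dedup.accept("b", sent_at=90, now=100),  # fresh message
]
```

The trade is visible in the third call: a late retry of a message that was never processed would also be refused, so the sender has to treat "stale" as "probably not processed" and handle it.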

  • jhanschoo 3 hours ago

    > On the other hand, in real life there aren't unlimited numbers of messages and you rarely want to accept infinitely stale messages either, so it's a bit moot.

    My understanding is that these happen IRL all the time in the guise of healing a network split, rebooting crashed nodes, or bringing new uninitialized servers into the system. Of course, IRL you usually translate the result to needing a different strategy to bring these systems up to speed beyond a certain threshold. But these thresholds and strategies and changing the number of nodes in the system are application-dependent, so the fiction of unbounded messages/memory/time helps focus the formal analysis and result.

    In the context of, say, a distributed KV store, it cautions you that unless you have said other strategy, you will end up with an inconsistent system or failure state if your message buffers are more space-constrained than required.

Spivak 5 hours ago

This article isn't for you then; this article is for people who have casually heard that exactly-once delivery is impossible and take it to mean exactly-once processing is impossible. When someone talks about at-least-once and at-most-once in the context of well-known queueing systems, they will say delivery but will mean processing, because as you say, they're the same in practice.

You typically hide the processing bit so that from the perspective of your application code it really is exactly-once.

  • bhaney 5 hours ago

    > this article is for people who have casually heard that exactly once delivery is impossible and take it to mean exactly once processing is impossible

    Those people would be better served by approximately two sentences clarifying that exactly-once processing is a different thing that can be achieved with at-least-once delivery and idempotency, rather than 20+ rambling paragraphs of redefining formal terms.

  • Ferret7446 5 hours ago

    I think the words "delivery" and "processing" are taught around middle school. There's probably no need to have an article for it.
