Comment by chc4

Comment by chc4 9 hours ago

4 replies

> Exactly-Once Event Processing

This sounds...impossible? If you have some step in your workflow, either you 1) record it as completed when you start, but then you can crash halfway through and when you restore the workflow it now isn't processed 2) record it as completed after you're done, but then you can crash in-between completing and recording and when you restore you run the step twice.

#2 sounds like the obvious right thing to do, and what I assume is happening, but is not exactly once and you'd need to still be careful that all of your steps are idempotent.

KraftyOne 9 hours ago

The specific claim is that workflows are started exactly-once in response to an event. This is possible because starting a workflow is a database transaction, so we can guarantee that exactly one workflow is started per (for example) Kafka message.

For step processing, what you say is true--steps are restarted if they crash mid-execution, so they should be idempotent.

  • reillyse 7 hours ago

    "Exactly-Once Event Processing" is the headline claim - I actually missed the workflow starting bit. So what happens if the workflow fails? Does it get restarted (and so we have twice-started) or does the entire workflow just fail ? Which is probably better described as "at-most once event processing"

    • qianli_cs 6 hours ago

      I think a clearer way to think about this is "at least once" message delivery plus idempotent workflow execution is effectively exactly-once event processing.

      The DBOS workflow execution itself is idempotent (assume each step is idempotent). When DBOS starts a workflow, the "start" (workflow inputs) is durably logged first. If the app crashes, on restart, DBOS reloads from Postgres and resumes from the last completed step. Steps are checkpointed so they don't re-run once recorded.

    • bjornsing 6 hours ago

      "Exactly-Once Event Processing" is possible if (all!) the processing results go into a transactional database along with the stream position marker in a single transaction. That’s probably the mechanism they are relying on.