Comment by bjornsing
> Yes, in any durability framework there's still the possibility that a process crashes mid-step, in which case you have no choice but to restart the step.
Golem [1] is an interesting counterexample to this. They run your code in a WASM runtime and essentially checkpoint execution state at every interaction with the outside world.
But it seems they are having trouble selling into the workflow orchestration market. Perhaps due to the preconception above? Or are there other drawbacks with this model that I’m not aware of?
1. https://www.golem.cloud/post/durable-execution-is-not-just-f...
That still fundamentally suffers the same idempotency problem as any other system. When interacting with the outside world, you, the developer, need to be idempotent and enforce it.
For example, if you call an API (the outside world) to charge the user’s credit card, and the WASM host fails and the process is restarted, you’ll need to be careful to not charge again. This can happen after the request is issued, but before the response is received/processed.
This is no different than any other workflow library or service.
The WASM idea is interesting, and maybe lets you be more granular in how you checkpoint (eg for complex business logic that is self-contained but expensive to repeat). The biggest win is probably for general preemption or resource management, but those are generally wins for the provider not the user. Also, this requires compiling your application into WASM, which restricts which languages/libraries/etc you can use.