Comment by qianli_cs

Comment by qianli_cs 6 hours ago

1 reply

I think one potential concern with "checkpoint execution state at every interaction with the outside world" is the size of the checkpoints. Allowing users to control the granularity by explicitly specifying the scope of each step seems like a more flexible model. For example, you can group multiple external interactions into a single step and only checkpoint the final result, avoiding the overhead of saving intermediate data. If you want finer granularity, you can instead declare each external interaction as its own step.

Plus, if the crash happens in the outside world (where you have no control), then checkpointing at finer granularity won't help.

bjornsing 2 hours ago

Sure you get more control with explicit state management. But it’s also more work, and more difficult work. You can do a lot of writes to NVMe for one developer salary.