DBOS: Durable Workflow Orchestration with Go and PostgreSQL
(github.com) | 52 points by Bogdanp 4 days ago
Yes, in any durability framework there's still the possibility that a process crashes mid-step, in which case you have no choice but to restart the step.
Where DBOS really shines (vs. Temporal and other workflow systems) is a radically simpler operational model--it's just a library you can install in your app instead of a big heavyweight cluster you have to rearchitect your app to work with. This blog post goes into more detail: https://www.dbos.dev/blog/durable-execution-coding-compariso...
> Yes, in any durability framework there's still the possibility that a process crashes mid-step, in which case you have no choice but to restart the step.
Golem [1] is an interesting counterexample to this. They run your code in a WASM runtime and essentially checkpoint execution state at every interaction with the outside world.
But it seems they are having trouble selling into the workflow orchestration market. Perhaps due to the preconception above? Or are there other drawbacks with this model that I’m not aware of?
1. https://www.golem.cloud/post/durable-execution-is-not-just-f...
I think one potential concern with "checkpoint execution state at every interaction with the outside world" is the size of the checkpoints. Allowing users to control the granularity by explicitly specifying the scope of each step seems like a more flexible model. For example, you can group multiple external interactions into a single step and only checkpoint the final result, avoiding the overhead of saving intermediate data. If you want finer granularity, you can instead declare each external interaction as its own step.
Plus, if the crash happens in the outside world (where you have no control), then checkpointing at finer granularity won't help.
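The granularity tradeoff described above can be sketched with a toy checkpointing helper. The `runStep` function and in-memory map here are illustrative stand-ins, not the actual DBOS API (which checkpoints step outputs to Postgres):

```go
package main

import "fmt"

// checkpoints simulates a durable step-output store (Postgres in DBOS).
var checkpoints = map[string]string{}

// runStep is a hypothetical helper: if the step's output is already
// checkpointed, return it without re-executing; otherwise run the
// function and record its final result.
func runStep(name string, fn func() string) string {
	if out, ok := checkpoints[name]; ok {
		return out // step already completed; skip re-execution
	}
	out := fn()
	checkpoints[name] = out // checkpoint only the final result
	return out
}

func fetchA() string { return "a" }
func fetchB() string { return "b" }

func main() {
	// Coarse granularity: two external calls grouped into one step,
	// so only the combined result is checkpointed.
	combined := runStep("fetch-both", func() string {
		return fetchA() + fetchB()
	})
	fmt.Println(combined) // ab

	// Fine granularity: each external call is its own step, so each
	// intermediate result is checkpointed separately.
	a := runStep("fetch-a", fetchA)
	b := runStep("fetch-b", fetchB)
	fmt.Println(a, b) // a b
}
```

The point is that the step boundary, not the runtime, decides how much state gets persisted and when.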
Oh I see. It seems Nextflow is a strong contender in the serverless orchestrator market (serverless sounds better than embedded).
From what I can tell though, NF just runs a single workflow at a time, no queue or database. It relies on filesystem caching for "durability". That's changing recently with some optional add-ons.
> Exactly-Once Event Processing
This sounds...impossible? For any step in your workflow, either you 1) record it as completed when you start, but then you can crash halfway through, and when you restore, the step is marked done even though it never actually ran, or 2) record it as completed after you're done, but then you can crash between completing and recording, and on restore you run the step twice.
#2 sounds like the obvious right thing to do, and what I assume is happening, but it is not exactly once, and you'd still need to be careful that all of your steps are idempotent.
The specific claim is that workflows are started exactly-once in response to an event. This is possible because starting a workflow is a database transaction, so we can guarantee that exactly one workflow is started per (for example) Kafka message.
For step processing, what you say is true--steps are restarted if they crash mid-execution, so they should be idempotent.
"Exactly-Once Event Processing" is the headline claim - I actually missed the workflow starting bit. So what happens if the workflow fails? Does it get restarted (and so we have twice-started) or does the entire workflow just fail ? Which is probably better described as "at-most once event processing"
I think a clearer way to think about this is "at least once" message delivery plus idempotent workflow execution is effectively exactly-once event processing.
The DBOS workflow execution itself is idempotent (assuming each step is idempotent). When DBOS starts a workflow, the "start" (workflow inputs) is durably logged first. If the app crashes, on restart DBOS reloads state from Postgres and resumes from the last completed step. Steps are checkpointed so they don't re-run once recorded.
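The crash-and-resume behavior described here can be illustrated with a toy model, where in-memory maps stand in for the Postgres checkpoint store; `step` and `workflow` are illustrative, not the DBOS API:

```go
package main

import "fmt"

var completed = map[string]int{} // durable step outputs (Postgres in DBOS)
var execCount = map[string]int{} // how many times each step body actually ran

// step returns the checkpointed output if the step already completed,
// otherwise runs the body and records its result.
func step(name string, fn func() int) int {
	if v, ok := completed[name]; ok {
		return v
	}
	execCount[name]++
	v := fn()
	completed[name] = v
	return v
}

// workflow runs two steps; crashAfterFirst simulates a process crash
// between step 1 and step 2.
func workflow(crashAfterFirst bool) (int, bool) {
	x := step("step1", func() int { return 10 })
	if crashAfterFirst {
		return 0, false // "crash": step2 never runs, but step1 is checkpointed
	}
	y := step("step2", func() int { return x * 2 })
	return y, true
}

func main() {
	workflow(true)                  // first attempt crashes mid-workflow
	result, ok := workflow(false)   // recovery resumes from the checkpoint
	fmt.Println(result, ok)         // 20 true
	fmt.Println(execCount["step1"]) // 1: step1 ran exactly once across both attempts
}
```

Under this model, each step executes at least once, but its *recorded effect* on the workflow is exactly once, which is why the individual step bodies still need to be idempotent.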
We may be a small startup, but we're growing fast with no shortage of production users who love our tech: https://www.dbos.dev/customer-stories
Not really.
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu... - Not even 3 full pages' worth over the past 5 years, though the first page is entirely from this year. It's maybe 2-3 a month on average this year, and a lot are dupes.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... - Nim, for comparison, which doesn't really make a dent in the programming world but shows up a lot. The first 15 pages cover the same time period.
Thanks for posting! I am one of the authors, happy to answer any questions!
How does DBOS scale in a cluster? With Temporal or Dapr Workflows, applications register the workflow types or activities they support, and the workflow orchestration framework balances work across applications. How does this work in the library approach?
Also, how is DBOS handling workflow versioning?
Looking forward to your Java implementation. Thanks.
Good questions!
DBOS naturally scales to distributed environments, with many processes/servers per application and many applications running together. The key idea is to use database concurrency control to coordinate multiple processes. [1]
When a DBOS workflow starts, it’s tagged with the version of the application process that launched it. This way, you can safely change workflow code without breaking existing ones. They'll continue running on the older version. As a result, rolling updates become easy and safe. [2]
[1] https://docs.dbos.dev/architecture#using-dbos-in-a-distribut...
[2] https://docs.dbos.dev/architecture#application-and-workflow-...
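A rough sketch of the version-pinned recovery described in [2]; the types and function here are illustrative of the idea, not DBOS's internals:

```go
package main

import "fmt"

// workflowRecord models a pending workflow tagged with the code
// version of the process that launched it.
type workflowRecord struct {
	id      string
	version string
}

var pending = []workflowRecord{
	{"wf-1", "v1"},
	{"wf-2", "v2"},
}

// recoverFor sketches version-pinned recovery: a process only resumes
// workflows tagged with its own code version, so v1 workflows keep
// running on v1 processes during a rolling update.
func recoverFor(processVersion string) []string {
	var ids []string
	for _, wf := range pending {
		if wf.version == processVersion {
			ids = append(ids, wf.id)
		}
	}
	return ids
}

func main() {
	fmt.Println(recoverFor("v1")) // [wf-1]
	fmt.Println(recoverFor("v2")) // [wf-2]
}
```

During a rolling update, old and new processes can run side by side; each recovers only the workflows its code version can correctly execute.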
There's a clear text password in one of your GitHub Action workflows: https://github.com/dbos-inc/dbos-transact-golang/blob/main/....
That password is only used by the GHA to start a local Postgres Docker container (https://github.com/dbos-inc/dbos-transact-golang/blob/main/c...), which is not accessible from outside.
Great project! Love the library+db approach. Some questions:
1. How much work is it to add bindings for new languages?
2. I know you provide Conductor as a service. What are my options for workflow recovery if I don't have outbound network access?
3. Considering this came out of https://dbos-project.github.io/, do you guys have plans beyond durable workflows?
1. We also have support for Python and TypeScript, with Java coming soon: https://github.com/dbos-inc
2. There are built-in APIs for managing workflow recovery, documented here: https://docs.dbos.dev/production/self-hosting/workflow-recov...
3. We'll see! :)
Elixir? Or does Oban hew close enough, that it’s not worth it?
Yeah, queue priority is natively supported: https://docs.dbos.dev/golang/tutorials/queue-tutorial#priori...
The durability guarantees are similar--each workflow step is checkpointed, so if a workflow fails, it can recover from the last completed step.
The big difference, like that blog post (https://www.dbos.dev/blog/durable-execution-coding-compariso...) describes, is the operational model. DBOS is a library you can install into your app, whereas Temporal et al. require you to rearchitect your app to run on their workers and external orchestrator.
There are DBOS libraries in multiple languages--Python, TS, and Go so far, with Java coming soon: https://github.com/dbos-inc
No Rust yet, but we'll see!
Sounds exactly like how Temporal markets itself. I find that the burden of creating idempotent sub-steps in the workflow falls on the developer, regardless of checkpoints and state management at the workflow level.