Comment by fifilura
The thing with data pipelines is they have a linear execution. You start from the top and work your way down.
Notebooks do that, and even leave a trace while doing it. Table outputs, plots, etc.
It is not like a Python backend that listens for events and handles them as they come, sometimes even in parallel.
For data flow, the code has an inherent direction.
> Notebooks do that, and even leave a trace while doing it.
Perhaps the largest critique of notebooks is that they don't enforce linear execution of cells. Every data scientist I know has been bitten by this at least once (not realizing a cell's output is stale because an upstream cell changed after it last ran).
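A minimal sketch of that hazard, assuming cells share one global namespace the way Jupyter's kernel does (the cell contents here are made up for illustration):

```python
# Hypothetical notebook: three cells sharing one global namespace.
namespace = {}

cell_1 = "rate = 0.05"
cell_2 = "fee = rate * 100"    # depends on cell_1
cell_3 = "total = 1000 + fee"  # depends on cell_2

# First pass: top-down execution, so everything is consistent.
for cell in (cell_1, cell_2, cell_3):
    exec(cell, namespace)
assert namespace["total"] == 1005.0

# Later, the analyst edits cell_1 and re-runs cells 1 and 3,
# forgetting cell_2. `fee` is now stale, so `total` is silently wrong.
exec("rate = 0.10", namespace)  # edited cell_1
exec(cell_3, namespace)         # re-run cell_3 only
assert namespace["total"] == 1005.0  # still the old value, not 1010.0
```

Nothing errors out, which is exactly the problem: the stale value looks like a valid result.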
Sure, you could solve this by automating the entire notebook to ensure top-down execution order, but then why in the world are you using a notebook at all? There is no case I can think of where this would be remotely better than just pulling the code out into shared libraries.
I've worked on a wide range of data science teams in my career, and by far the most productive ones are those with large shared libraries and a process in place for getting code out of notebooks and into a proper production pipeline.
Normally I'm the person defending notebooks, since there's a growing number of people who outright don't want to see them used, ever. But they do have their place, as notebooks. I can't believe I'm getting downvoted for suggesting one shouldn't build complex workflows in notebooks.