fifilura 2 days ago

The thing with data pipelines is that they have a linear execution: you start from the top and work your way down.

Notebooks do that, and even leave a trace while doing it. Table outputs, plots, etc.

It is not like a Python backend that listens to events and handles them as they come, sometimes even in parallel.

For data flow, the code has an inherent direction.
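
A toy sketch of that direction, with made-up data and steps:

    # A toy linear pipeline: each step consumes the previous step's output,
    # so the script reads (and runs) strictly top to bottom.
    rows = [{"user": "a", "amount": 3}, {"user": "b", "amount": -1}]   # load
    valid = [r for r in rows if r["amount"] > 0]                       # clean
    total = sum(r["amount"] for r in valid)                            # aggregate
    print(total)                                                       # report
    # An event-driven backend has no such order: handlers fire whenever
    # (and in whatever order) events arrive, sometimes concurrently.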

crystal_revenge 2 days ago

> Notebooks do that, and even leave a trace while doing it.

Perhaps the biggest criticism of notebooks is that they don't enforce a linear execution order for cells. Every data scientist I know has been bitten by this at least once (not realizing a cell is stale and should have been re-run).
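
A minimal sketch of that failure mode (the cells and values here are made up):

    # Cell 1
    data = [0.2, 0.6, 0.8]
    threshold = 0.5

    # Cell 2
    filtered = [x for x in data if x > threshold]   # -> [0.6, 0.8]

    # Cell 1 is later edited to threshold = 0.9 but Cell 2 is never re-run,
    # so `filtered` silently keeps the result computed with the old threshold.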

Sure, you could solve this by automating the entire notebook to ensure top-down execution order, but then why in the world are you using a notebook like that? There is no case I can think of where this would be remotely better than just pulling the code out into shared libraries.
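
As a rough sketch of what pulling the code out can look like (the module, function, and column names here are hypothetical):

    # shared library: mylib/cleaning.py
    import pandas as pd

    def drop_outliers(df: pd.DataFrame, col: str, z: float = 3.0) -> pd.DataFrame:
        """Drop rows more than `z` standard deviations from the column mean."""
        mean, std = df[col].mean(), df[col].std()
        return df[(df[col] - mean).abs() <= z * std]

    # notebook cell: a thin, re-runnable wrapper around the tested library code
    # from mylib.cleaning import drop_outliers
    # clean = drop_outliers(raw, "revenue")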

I've worked on a wide range of data science teams in my career, and by far the most productive ones are those with large shared libraries and a process in place for getting code out of notebooks and into a proper production pipeline.

Normally I'm the person defending notebooks, since there's a growing number of people who outright never want to see them used. But they do have their place, as notebooks. I can't believe I'm getting downvoted for suggesting one shouldn't build complex workflows in notebooks.