Comment by crystal_revenge 2 days ago
Notebooks are great as notebooks, but it's very well established, even in the DS community, that they are a terrible way to write maintainable, sharable, scalable code.
It's not about preference; it's objectively a terrible idea to build complex workflows with notebooks.
The "scoff" was in my head; what actually came out of my mouth was an offer to help them understand how to create reusable Python modules to organize their code.
The answer is to help these teams build an understanding of how to properly translate their notebook work into reusable packages. There is really no need for data scientists to follow terrible practices, and I've worked on plenty of teams that have successfully onboarded DS as functioning software engineers. You just need a process and a culture that insist notebooks cannot be the last stage of a project.
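A minimal sketch of what that translation can look like, assuming a notebook whose cells clean some rows and then compute a derived column. The module name `features.py`, the function names, and the `value` field are all hypothetical, chosen only for illustration:

```python
# features.py: hypothetical module extracted from notebook cells.
# Each former cell becomes a small function with explicit inputs and outputs.
from statistics import mean


def clean_rows(rows):
    """Drop rows whose 'value' field is missing."""
    return [r for r in rows if r.get("value") is not None]


def add_deviation(rows):
    """Return new rows with each value's deviation from the mean."""
    avg = mean(r["value"] for r in rows)
    return [{**r, "deviation": r["value"] - avg} for r in rows]


def run(rows):
    """Compose the steps in the order the notebook ran them."""
    return add_deviation(clean_rows(rows))
```

The notebook can then do `from features import run` and keep only the exploration and plotting, while the logic lives somewhere it can be imported, tested, and reviewed.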
The thing with data pipelines is they have a linear execution. You start from the top and work your way down.
Notebooks do that, and even leave a trace while doing it: table outputs, plots, etc.
It is not like a Python backend that listens for events and handles them as they come, sometimes even in parallel.
For data flow, the code has an inherent direction.
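A rough sketch of that top-to-bottom shape in plain Python, assuming each step is a function that takes the data and returns the next version of it. The names `run_pipeline`, `double`, and `drop_small` are hypothetical; the recorded trace stands in for the cell outputs a notebook would show:

```python
def run_pipeline(data, steps):
    """Run steps strictly in order, recording each intermediate result."""
    trace = []
    for step in steps:
        data = step(data)
        trace.append((step.__name__, data))
    return data, trace


def double(values):
    """Multiply every value by two."""
    return [v * 2 for v in values]


def drop_small(values):
    """Keep only values of at least four."""
    return [v for v in values if v >= 4]


result, trace = run_pipeline([1, 2, 3], [double, drop_small])
for name, output in trace:
    print(name, output)
```

Each step runs exactly once, in the order listed, which is the same direction a notebook follows when executed from the first cell to the last.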