Comment by __MatrixMan__ 2 days ago

I sort of suspected that adding parameters was not the end of the story. My experience with this was just "make it work with papermill", so the notebooks I tested with were nice and self-contained.

Packaging dependencies and handling parameters do seem like separate problems, though, so I'm not sure papermill is to blame for the fact that most notebooks aren't ready to be handled like a black box even once they're parameter-ready. Something like jupyenv is needed as well.
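
(For the record, "make it work with papermill" meant little more than the following; the notebook name and parameter values are placeholders, not from any real project.)

    import papermill as pm

    # papermill copies the notebook, injects a cell that overrides the
    # defaults in the cell tagged "parameters", and executes the copy.
    pm.execute_notebook(
        "analysis.ipynb",           # hypothetical input notebook
        "analysis_output.ipynb",    # executed copy, with outputs saved
        parameters={"start_date": "2024-01-01", "sample_size": 1000},
    )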

crabbone a day ago

Jupyter is not the end of the story here. There are plenty of "extensions", and they generally go down one of two paths: kernels and magics.

It's not very common for users to add Jupyter magics ad hoc, but a magic typically creates a huge dependency on the environment, so no jupyenv is going to help (e.g. all the workload-manager-related magics for launching jobs on Slurm / OpenPBS).
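
To make that concrete: a magic of the kind I mean is usually just a thin wrapper around a site-specific CLI. A sketch (the magic name and flags are illustrative, not taken from any particular extension):

    import subprocess
    from IPython.core.magic import register_line_magic

    # Must run inside an IPython/Jupyter session: register_line_magic
    # needs an active shell to attach the magic to.
    @register_line_magic
    def sbatch(line):
        """Submit the rest of the line as a Slurm job (illustrative only)."""
        # Works only where the sbatch binary, the cluster configuration and
        # the user's credentials already exist; no Python environment
        # manager can package that part for you.
        result = subprocess.run(
            ["sbatch", "--wrap", line],
            capture_output=True, text=True, check=True,
        )
        print(result.stdout.strip())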

Kernels... well, they can do all sorts of things, beyond your wildest dreams and imagination. And, unlike magics, they are readily available for the end user to mess with. There are also, of course, a bunch of pre-packaged ones, supplied by all sorts of vendors who want to promote their tech this way: say, running Jupyter on Kubernetes with Ceph volumes exposed to the notebook. There's no easy way to turn that into a "module" / "black box" that can be combined with some other Python code; it needs a ton of infra code behind it if it's meant to be at all stand-alone.
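
For a rough idea of why: a kernel is registered through a kernel.json spec, and the vendor-supplied ones typically point at a launcher that assumes a whole platform behind it. A sketch (the launcher, its flags and the paths are invented for illustration; only the overall shape of the spec is real):

    import json, pathlib

    spec = {
        "display_name": "Python (k8s + Ceph)",
        "language": "python",
        "argv": [
            "/opt/vendor/bin/launch-remote-kernel",    # hypothetical launcher
            "--namespace", "research",                 # spawns the kernel pod in k8s
            "--volume", "cephfs:shared:/data",         # mounts the shared storage
            "--connection-file", "{connection_file}",  # standard Jupyter handoff
        ],
        "env": {"KUBECONFIG": "/etc/jupyter/kubeconfig"},
    }

    # Install it as a user kernelspec (the usual per-user location on Linux).
    kernel_dir = pathlib.Path.home() / ".local/share/jupyter/kernels/vendor-k8s"
    kernel_dir.mkdir(parents=True, exist_ok=True)
    (kernel_dir / "kernel.json").write_text(json.dumps(spec, indent=2))

Nothing in that argv means anything outside the cluster it was written for, which is exactly the problem.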

  • __MatrixMan__ 7 hours ago

    Are we talking about the same https://github.com/tweag/jupyenv ?

    It encapsulates the kernel, which encapsulates pretty much everything for the notebook, right? I haven't worked with Slurm or OpenPBS, but I think that if you let nix build the images your tasks run in, you're covered for pretty much everything except things that only exist at runtime, like database connections. Not a perfect black box, but close.

    • crabbone 25 minutes ago

      > It encapsulates the kernel, which encapsulates pretty much everything for the notebook, right?

      Not even close to everything. In the real world, the environment of a notebook consists of a bunch of things provided by whoever set up the lab, i.e. storage and tools.

      Typical examples include setting up Lustre or Ceph so that it's accessible from a notebook (which would potentially also involve authentication).

      And, in terms of tools: a workload manager, perhaps integrated with Jupyter to schedule notebook execution on available nodes, but also there simply to run workloads. Plus a bunch of stuff written by this or that research group. Just this week I had to install and configure a tool for arterial spin labeling, but the same would be true of any kind of research: there's plenty of stuff that researchers rely on, central to their research, that isn't directly related to Jupyter.

      By and large, Jupyter is just a front-end to whatever system the research actually happens in; it's not the system itself.