andrewstuart2 21 hours ago

I'm always torn when I see anything mentioning running an init system in a container. On one hand, I guess it's good that it's designed with that use case in mind. Mainly, though, I've just seen too many overly complicated things attempted (on greenfield even) inside a single container when they should have instead been designed for kubernetes/cloud/whatever-they-run-on directly and more properly decoupled.

It's probably just one of those "people are going to do it anyway" things. But I'm not sure if it's better to "do it better" and risk spreading the problem, or leave people with older solutions that fail harder.

bityard 20 hours ago

Yes, application containers should stick to the Unix philosophy of, "do one thing and do it well." But if the thing in your docker container forks for _any_ reason, you should have a real init on PID 1.

  • benreesman 12 hours ago

    There's nothing inherently wrong with containers in the abstract: virtualization is a critical tool in computer science (some might say it's difficult to define computer science without a virtual machine). There's not even anything wrong with this "less than a new kernel, more than a new libc" neighborhood.

    The broken, ugly, malignant thing is this one godawful implementation Docker and its attic-dwelling Quasimodo cousin docker-compose.yml

    It's trivial to slot namespaces (or jails, if you also like the finer things in BSD) into a sane init system, process ID regime, network interface regime: it's an exercise in choosing good defaults for all the unshare-adjacent parameters.

    But a whole generation of SWEs memorized docker jank instead of Unix, and so now people are emotionally invested in it. You run compose to run docker to get Alpine and a node built on musl.

    You can just link node to musl. And if you want a chroot or a new tuntap scope? man unshare.
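
    In that spirit, a minimal sketch of the unshare route, in Python for concreteness (assumes Linux and Python 3.12+, which added os.unshare; folding CLONE_NEWUSER in is what lets an unprivileged user own the new namespace):

      # Sketch: a throwaway UTS (hostname) namespace, no Docker involved.
      import os
      import socket

      os.unshare(os.CLONE_NEWUSER | os.CLONE_NEWUTS)
      socket.sethostname("sandbox")   # visible only inside the namespace
      print(socket.gethostname())     # -> sandbox; the host is untouched

    Mount, PID, and network namespaces take a little more ceremony (a fork after CLONE_NEWPID, for instance), but it's the same handful of flags Docker is driving underneath.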

  • RulerOf 14 hours ago

    > you should have a real init on PID 1

    Got a handy list of those? My colleagues use supervisord and it kinda bugs me. Would love to know if it makes the list.

  • pas 20 hours ago

    is there any issue besides the potential zombies? also, why can't the real pid1 do it? it sees all the processes after all.

    • MyOutfitIsVague 19 hours ago

      Mostly just zombies and signal handlers.

      And your software can do it, if it's written with the assumption that it will be pid1, but most non-init software isn't. And rather than write your software to do so, it's easier to just reach for something like tini that does it already with very little overhead.

      I'd recommend reading the tini readme[0] and its linked discussion for full detail.

      [0]: https://github.com/krallin/tini
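
      To get a feel for what it does, here's a rough sketch of a reaping, signal-forwarding init in Python. To be clear, this is not tini's implementation (which is C and handles more edge cases), just the core idea:

        # Sketch of a tini-style init: run one child, forward termination
        # signals to it, and reap everything that gets reparented to us.
        import os
        import signal
        import sys

        child = os.fork()
        if child == 0:
            os.execvp(sys.argv[1], sys.argv[1:])   # the real workload

        # As PID 1 we get no default signal dispositions, so install
        # handlers that just pass SIGTERM/SIGINT along to the child.
        signal.signal(signal.SIGTERM, lambda s, f: os.kill(child, s))
        signal.signal(signal.SIGINT, lambda s, f: os.kill(child, s))

        while True:
            try:
                pid, status = os.wait()   # also collects orphans: no zombies
            except ChildProcessError:
                break                     # nothing left to reap
            if pid == child:
                sys.exit(os.waitstatus_to_exitcode(status))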

    • dathery 18 hours ago

      The main other problem is that the kernel doesn't register default signal handlers for signals like SIGTERM if the process is PID 1. So if your process doesn't register its own signal handlers, it's hard to kill (you have to use SIGKILL). I'm sure anyone who has used Docker a lot has run into containers that seem to just ignore signals -- this is the usual reason why.
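
      Concretely, a sketch of the fix on the application side: register a handler so SIGTERM means something again when the process is PID 1 (otherwise `docker stop` waits out its grace period and falls back to SIGKILL):

        import signal
        import sys
        import time

        def shut_down(signum, frame):
            print("got SIGTERM, exiting cleanly")
            sys.exit(0)

        # As PID 1, a signal with no registered handler is discarded,
        # so without this line the process is unkillable short of SIGKILL.
        signal.signal(signal.SIGTERM, shut_down)

        while True:
            time.sleep(60)   # stand-in for the app's real work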

      > also, why can't the real pid1 do it? it sees all the processes after all.

      How would the real PID 1 know if it _should_ reap the zombie? It's normal to have some zombie processes -- they're just processes whose exit statuses haven't been reaped yet. If you force-reaped a zombie you could break a program that just hasn't yet gotten around to checking the status of a subprocess it spawned.
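
      You can watch that lifecycle directly; a zombie is just an unreaped exit status, and it vanishes the moment the parent collects it (sketch, assumes Linux and the ps CLI):

        import os
        import subprocess
        import time

        pid = os.fork()
        if pid == 0:
            os._exit(0)       # child exits immediately...

        time.sleep(1)         # ...but we haven't wait()ed on it yet
        subprocess.run(["ps", "-o", "pid,stat", "-p", str(pid)])  # STAT: Z
        os.waitpid(pid, 0)    # reap the exit status
        subprocess.run(["ps", "-o", "pid,stat", "-p", str(pid)])  # gone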

      • immibis 14 hours ago

        Processes only reap their direct children. Init is special because orphaned processes are reparented to init, which then has to reap them.
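
        A quick sketch to watch that reparenting happen (on a systemd desktop the new parent may be a subreaper such as the user session manager rather than literal PID 1):

          import os
          import time

          pid = os.fork()
          if pid == 0:
              # Child: spawn a grandchild, then exit so it's orphaned.
              if os.fork() == 0:
                  time.sleep(1)   # our original parent is gone by now
                  print("grandchild reparented to pid", os.getppid())
                  os._exit(0)
              os._exit(0)

          os.waitpid(pid, 0)      # we reap only our direct child
          time.sleep(2)           # give the orphan time to print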

mikepurvis 20 hours ago

From my experience in the robotics space, a lot of containers start life as "this used to be a bare metal thing and then we moved it into a container", and with a lot of unstructured RPC going on between processes, there's little benefit in breaking up the processes into separate containers.

Supervisor, runit, systemd, even a tmux session are all popular options for how to run a bunch of stuff in a monolithic "app" container.

  • palata 20 hours ago

    My experience in the robotics space is that containers are a way to not know how to put a system together properly. It's the quick equivalent of "I install it on my Ubuntu, then I clone my whole system into a .iso and I call that a distribution". Most of the time it's distributed without any consideration for the open source licences that are part of it.

    • mikepurvis 19 hours ago

      I've always advocated against containers as a means of deploying software to robots simply because to my mind it doesn't make sense— robots are full of bare-metal concerns, whether it's udev rules, device drivers, network config, special kernel or bootloader setup, never mind managing the container runtime itself including startup, updating, credentials, and all the rest of it. It's always felt to me like by the time you put in place mechanisms to handle all that crap outside the container, you might as well just be building a custom bare metal image and shipping that— have A/B partitions so you copy an update from the network to the other partition, use grub chainloading, wipe hands on pants.

      The concern regarding license-adherence is orthogonal to all that but certainly valid. I think with the ROS ecosystem in particular there is a lot of "lol everything is BSD/Apache2 so we don't even have to think about it", without understanding that these licenses still have an attribution requirement.

      • westurner 17 hours ago

        For workstations with GPUs and various kernel modules, rpm-ostree + GRUB + Native Containers for the rootfs and /usr and flatpaks etc on a different partition works well enough.

        ostree+grub could be much better at handling failover, the way switches and rovers do; that in turn needs disk space for at least two separate A/B flash slots, plus badblocks and a separate /root quota. ("support configuring host to retain more than two deployments" https://github.com/coreos/rpm-ostree/issues/577#issuecomment... )

        Theoretically there's a disk space advantage to container layers.

        Native Containers are bare-metal host images packaged as OCI Images, which can be stored in OCI Container Registries (or Artifact Registries, since they're packages too). GitHub, GitLab, Gitea, GCP, and AWS all host OCI Container/Artifact Registries.

        From https://news.ycombinator.com/item?id=44401634 re bootc-image-builder and Native Containers and ublue-os/image-template, ublue-os/akmods, ublue-os/toolboxes w/ "quadlets and systemd" (and tini is already built-in to Docker and Podman) though ublue/bazzite has too many patches for a robot:

        > ostree native containers are bootable host images that can also be built and signed with a SLSA provenance attestation; https://coreos.github.io/rpm-ostree/container/

        SBOM tools can scan hosts, VMs, and containers to identify software versions and licenses for citation and attribution. (CC-BY-SA requires Attribution if the derivative work is distributed. AGPL applies to hosted but not necessarily distributed derivative works. There's choosealicense.com , which has a table of open source license requirements in an Appendix: https://choosealicense.com/appendix/ )

        BibTeX doesn't support schema.org/SoftwareApplication or subproperties of schema:identifier for e.g. the DOI URN of the primary schema.org/ScholarlyArticle and its :funder(s).

        ...

        ROS on devices, ROS in development and simulation environments;

        Conda-forge and RoboStack host ROS (Robot Operating System) as conda packages.

        RoboStack/ros-noetic is ROS as conda packages: https://github.com/RoboStack/ros-noetic

        gz-sim is the new version of gazebosim, a simulator for ROS development: https://github.com/conda-forge/gz-sim-feedstock

        From https://news.ycombinator.com/item?id=44372666 :

        > mujoco_menagerie has Mujoco MJCF XML models of various robots.

        Mujoco ROS-compatibility: https://github.com/google-deepmind/mujoco/discussions/990

        Moveit2: https://github.com/moveit/moveit2 :

        > Combine Gazebo, ROS Control, and MoveIt for a powerful robotics development platform.

        RoboStack has moveit2 as conda packages with clearly-indicated patches for Lin/Mac/Win: ros-noetic-moveit-ros-visualization.patch: https://github.com/RoboStack/ros-noetic/blob/main/patch/ros-...

        ...

        Devcontainer.json has been helpful for switching between projects lately.

        devcontainer.json can reference a local container/image:name or a path to a ../Dockerfile. I personally prefer to build a named image with a Makefile, but vscode Remote Containers (the devcontainers extension) can build from a Dockerfile and, if the devcontainer build succeeds, start code-server in the devcontainer and restart vscode as a client of the code-server running in the container. That way all of the tools for developing the software can be reproducibly installed in a container isolated from the host system.

        It looks like it's bootc or bootc-image-builder for building native container images?

        bootc-image-builder: https://github.com/osbuild/bootc-image-builder

  • yjftsjthsd-h 20 hours ago

    > Supervisor, runit, systemd, even a tmux session are all popular options for how to run a bunch of stuff in a monolithic "app" container.

    Did docker+systemd get fixed at some point? I would be surprised to hear that it was popular, given the hoops you had to jump through the last time I looked at it.

    • mikepurvis 20 hours ago

      It's only really fixed in podman, with the special `--systemd=always` flag. Docker afaik still requires manually disabling certain services that will conflict with the host and then running the whole thing as privileged; basically, a mess.

  • sho_hn 20 hours ago

    tmux?! Please share your war stories.

    • mikepurvis 19 hours ago

      Not my favoured approach, but for early stage systems where proper off-board observability/alerting is not yet in place, tmux can function as a kind of ssh-accessible dashboard displaying the stdout of key running processes, and also allowing some measure of inline recovery: if a process has crashed, you can up-arrow and relaunch it in the same environment it crashed out of.

      Obviously not an approach that scales, but I think it can also work decently well as a dev environment, where you want to run "stock" for most of the components in the system, and just be syncing in an updated workspace and restarting the one bit being actively developed on. Being able to do this without having to reason about a whole tree of interlinked startup units or whatever does lower the barrier to entry somewhat.

      • PhilipRoman 13 hours ago

        One advantage is that if the process has some sort of console on its stdin, you can do admin work easily. With init systems you now have to configure named pipes, worry about them blocking, have output in a separate place, etc.

simonw 18 hours ago

I've used several hosting providers that charge by the container: Fly.io, Render, and Google Cloud Run.

I often find myself wanting to run more than one process in a container for pricing reasons.
