Comment by staticassertion 3 days ago

4 replies

One of the most annoying parts of being in a container is that you can't sandbox yourself further within that container. Normal approaches like namespaces, mounts, chroot, etc., are all incompatible with running in a container. Therefore, if you want to go further than what a container provides, Landlock is a powerful solution.

Further, while "whole process" sandboxing like containerizing is very effective under some conditions, having more fine-grained access control and the ability to reduce permissions over time is incredible.

Consider that I may need to open a file in my program. The file path will be provided by an env var `CONFIG_PATH`. My program now has to have total file system read permissions if it is going to support reading arbitrary configuration file paths, even though it only has to read one file.

I can instead set my program up to read that file one time and then never again, or I can set things up to only ever need to read that single file and no others, etc. I can incrementally reduce permissions, and that's really cool. You can't do that with a container - containers get what they get.
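
As a rough sketch of the "read that file one time and then never again" idea, assuming Linux 5.13+ with Landlock enabled: the Python below reads a config file, then enforces a Landlock ruleset that handles file reads and grants no exceptions, so any later attempt to open a file for reading fails. The helper names (`read_config`, `deny_future_file_reads`, `demo`) are made up for illustration, and the raw syscalls go through `ctypes` only to keep the example dependency-free; the numeric constants are taken from the kernel UAPI headers. Treat it as an illustration of the ordering, not a hardened implementation.

```python
import ctypes
import os
import struct
import sys

# Values from the Linux UAPI headers (linux/landlock.h, linux/prctl.h).
SYS_LANDLOCK_CREATE_RULESET = 444
SYS_LANDLOCK_RESTRICT_SELF = 446
PR_SET_NO_NEW_PRIVS = 38
LANDLOCK_ACCESS_FS_READ_FILE = 1 << 2


def read_config(path):
    """Read the whole config file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def deny_future_file_reads():
    """Try to forbid every later open-for-read in the calling thread.

    Returns True if the ruleset was enforced, False if Landlock is not
    usable here (non-Linux, old kernel, disabled, or blocked by seccomp).
    """
    if not sys.platform.startswith("linux"):
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    # struct landlock_ruleset_attr, first field only: handled_access_fs (u64).
    attr = ctypes.create_string_buffer(
        struct.pack("Q", LANDLOCK_ACCESS_FS_READ_FILE))
    ruleset_fd = libc.syscall(ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
                              attr, ctypes.c_size_t(8), ctypes.c_uint(0))
    if ruleset_fd < 0:
        return False
    try:
        # Unprivileged callers must set no_new_privs before restricting.
        if libc.prctl(ctypes.c_int(PR_SET_NO_NEW_PRIVS), ctypes.c_ulong(1),
                      ctypes.c_ulong(0), ctypes.c_ulong(0),
                      ctypes.c_ulong(0)) != 0:
            return False
        # No rules were added, so nothing is readable after this call.
        rc = libc.syscall(ctypes.c_long(SYS_LANDLOCK_RESTRICT_SELF),
                          ctypes.c_int(ruleset_fd), ctypes.c_uint(0))
        return rc == 0
    finally:
        os.close(ruleset_fd)


def demo(path):
    """Fork a child that reads `path`, locks itself down, then retries.

    Returns (landlock_enforced, second_read_blocked) as seen by the child.
    """
    pid = os.fork()
    if pid == 0:
        code = 4  # stays 4 only if something unexpected goes wrong
        try:
            read_config(path)  # allowed: no restrictions yet
            enforced = deny_future_file_reads()
            try:
                read_config(path)
                blocked = False
            except PermissionError:
                blocked = True
            code = (1 if enforced else 0) | (2 if blocked else 0)
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status)
    return bool(code & 1), bool(code & 2)
```

In a real program you'd skip the fork and just call `read_config(os.environ["CONFIG_PATH"])` followed by `deny_future_file_reads()` early in `main`; the fork in `demo` is only there so the experiment doesn't lock down the process running it. The restriction is one-way and inherited by children, which is the whole point: the process can give up access without any privileges, but can't get it back.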

chuckadams 2 days ago

Both cgroups and namespaces are hierarchical, so you certainly can subdivide the sandbox. That is, if you're a decent C programmer and can navigate some dense kernel documentation. You can also run Docker in Docker, but it requires a privileged root container, and even the creator of that feature suggests just bind-mounting the docker socket instead.

I have a nagging feeling Plan9 probably had a solution for all this 30 years ago.

  • staticassertion 2 days ago

    > Both cgroups and namespaces are hierarchical, so you certainly can subdivide the sandbox.

    This is true, you can enter a namespace while in another namespace, but creating a namespace is a privileged operation.

    Docker in Docker does use socket bind mounting already afaik, and it's a trivial privesc because the Docker daemon runs as root, and the ability to talk to the socket means you can run `docker run --privileged --user root -it image_name bash` and get a shell as the host root user.

    The solution is to allow unprivileged users to drop privileges, which is how macOS and Windows work. On Windows you have integrity levels, tokens, etc., all of which you can drop without privileges. On macOS you have Seatbelt.

    Linux almost had this with unprivileged user namespaces, but that's not viable because 30 years of a "root -> kernel privesc isn't a security issue" attitude proved to be problematic.

    • chuckadams 2 days ago

      Docker-in-Docker is a different thing than bind mounting the socket. The former runs a new docker daemon in a container, the latter just talks to the host's socket. Anyway, https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-d... tells it straight from the horse's mouth. It appears you may not even need privileged containers to pull it off nowadays, but the author still lists several more footguns.

      Landlock is an all right start at unprivileged restrictions, but yeah, doesn't seem anywhere near as nice as pledge() and unveil().

      • staticassertion 2 hours ago

        Thanks, I'd misremembered that it just required --privileged. I suspect that will continue to be a requirement since unprivileged user namespaces are not viable.