Comment by yalogin 4 days ago
As a noob in this space, why is this needed when every job already runs inside a VM or a container? Again, I'm a noob, so please bear with me.
On your desktop/laptop, most tasks probably don't run inside VMs or containers. Perhaps some applications use Flatpak or snaps or similar, but the default state for many currently popular Linux distributions is "no sandboxing of any kind".
Linux holds a negligible share of the overall desktop OS market, but it is marginally more popular among tech-savvy people, who have plenty of disposable income. That makes the platform steadily more interesting to malware authors and distributors despite its relatively low usage.
It's a way for legitimate apps to add an extra layer of protection for the system against bad inputs or compromised dependencies, and it's very easy to use (see https://github.com/landlock-lsm/go-landlock). As an app developer, adding Landlock to your app takes very little effort.
Another benefit is that it makes fine-grained control of resources over the application lifecycle easier. Maybe on initialization the app needs credentials to fetch some data, and later on the app doesn't need them. Landlock allows the app to remove its own access to those credentials.
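A minimal sketch of that "use the credentials, then give them up" pattern in Python, calling the raw Landlock syscalls through ctypes instead of a wrapper such as go-landlock. Assumptions: Linux on x86_64 or aarch64 (where `landlock_create_ruleset` and `landlock_restrict_self` are syscalls 444 and 446), a libc whose `syscall()` and `prctl()` are reachable via `ctypes.CDLL(None)`, and only the Landlock ABI v1 filesystem rights (bits 0 to 12) being handled. `drop_all_fs_access` and `load_token_then_lock_down` are hypothetical helper names. The ruleset handles every v1 right but contains no rules, so once it is enforced the process can no longer open the credentials file (or any other path) for those accesses.

```python
import ctypes
import os

# Assumption: x86_64/aarch64 Linux, where these syscall numbers apply.
SYS_LANDLOCK_CREATE_RULESET = 444
SYS_LANDLOCK_RESTRICT_SELF = 446
PR_SET_NO_NEW_PRIVS = 38

# Landlock ABI v1 defines its filesystem access rights as bits 0..12.
ACCESS_FS_V1_ALL = (1 << 13) - 1

libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long


def _raise_errno(what: str) -> None:
    err = ctypes.get_errno()
    raise OSError(err, f"{what}: {os.strerror(err)}")


def drop_all_fs_access() -> bool:
    """Deny every ABI v1 filesystem access to this process and its future
    children. Returns False, changing nothing, if the kernel lacks Landlock."""
    # struct landlock_ruleset_attr starts with __u64 handled_access_fs; passing
    # only that field (8 bytes) is accepted by every Landlock ABI version.
    handled = ctypes.c_uint64(ACCESS_FS_V1_ALL)
    ruleset_fd = libc.syscall(ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
                              ctypes.byref(handled),
                              ctypes.c_long(ctypes.sizeof(handled)),
                              ctypes.c_long(0))
    if ruleset_fd < 0:
        return False  # e.g. ENOSYS or EOPNOTSUPP: Landlock not available
    try:
        # no_new_privs is what lets an unprivileged process enforce a ruleset.
        if libc.prctl(ctypes.c_int(PR_SET_NO_NEW_PRIVS), ctypes.c_ulong(1),
                      ctypes.c_ulong(0), ctypes.c_ulong(0),
                      ctypes.c_ulong(0)) != 0:
            _raise_errno("prctl(PR_SET_NO_NEW_PRIVS)")
        # No rules were added, so every handled access is now denied.
        if libc.syscall(ctypes.c_long(SYS_LANDLOCK_RESTRICT_SELF),
                        ctypes.c_long(ruleset_fd), ctypes.c_long(0)) != 0:
            _raise_errno("landlock_restrict_self")
    finally:
        os.close(ruleset_fd)
    return True


def load_token_then_lock_down(cred_path: str) -> "tuple[str, bool]":
    """Read a secret during initialization, then give up filesystem access.

    Returns the secret and whether Landlock actually enforced the lockdown.
    """
    with open(cred_path) as f:
        token = f.read().strip()
    return token, drop_all_fs_access()
```

Two caveats worth keeping in mind: Landlock checks access when a path is opened, so descriptors that were already open keep working (close them too if the point is to give the data up), and `landlock_restrict_self()` restricts the calling thread and whatever it spawns afterwards, so a multi-threaded program should lock down before starting other threads. The restriction cannot be lifted later.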
One of the most annoying parts of being in a container is that you can't sandbox yourself further within that container. Normal approaches like namespaces, mounts, chroot, etc., are all incompatible with running in a container. Therefore, if you want to go further than what a container provides, Landlock is a powerful solution.
Further, while "whole process" sandboxing like containerizing is very effective under some conditions, having more fine grained access and the ability to reduce permissions over time is incredible.
Consider that I may need to open a file in my program. The file path will be provided by an env var `CONFIG_PATH`. My program now has to have total file system read permissions if it is going to support reading arbitrary configuration file paths, even though it only has to read one file.
I can instead set my program up to read that file one time and then never again, or I can set things up to only ever need to read that single file and no others, etc. I can incrementally reduce permissions, and that's really cool. You can't do that with a container - containers get what they get.
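A sketch of the "only ever read that single file" option, assuming x86_64/aarch64 Linux (syscalls 444 to 446), Landlock ABI v1 rights, and ctypes access to libc; `allow_only_reading` is a hypothetical helper. It adds one `LANDLOCK_RULE_PATH_BENEATH` rule whose parent descriptor is an `O_PATH` handle on the file itself, which Landlock accepts as long as only file-level rights such as `LANDLOCK_ACCESS_FS_READ_FILE` are granted.

```python
import ctypes
import os
import struct

# Assumption: x86_64/aarch64 Linux, where these syscall numbers apply.
SYS_LANDLOCK_CREATE_RULESET = 444
SYS_LANDLOCK_ADD_RULE = 445
SYS_LANDLOCK_RESTRICT_SELF = 446
PR_SET_NO_NEW_PRIVS = 38
LANDLOCK_RULE_PATH_BENEATH = 1
ACCESS_FS_READ_FILE = 1 << 2
ACCESS_FS_V1_ALL = (1 << 13) - 1  # every Landlock ABI v1 filesystem right

libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long


def _check(ret: int) -> int:
    """Turn a -1 return from libc into an OSError carrying errno."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def allow_only_reading(path: str) -> None:
    """Restrict this process so the file at `path` is the only thing it can
    open, and only for reading. Raises OSError if Landlock is unavailable."""
    handled = ctypes.c_uint64(ACCESS_FS_V1_ALL)
    ruleset_fd = _check(libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET), ctypes.byref(handled),
        ctypes.c_long(ctypes.sizeof(handled)), ctypes.c_long(0)))
    try:
        # O_PATH yields a handle to the file without opening it for I/O.
        file_fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
        try:
            # struct landlock_path_beneath_attr is packed: __u64 allowed_access
            # followed by __s32 parent_fd, 12 bytes in native byte order.
            rule = (ctypes.c_char * 12).from_buffer_copy(
                struct.pack("=Qi", ACCESS_FS_READ_FILE, file_fd))
            _check(libc.syscall(
                ctypes.c_long(SYS_LANDLOCK_ADD_RULE), ctypes.c_long(ruleset_fd),
                ctypes.c_long(LANDLOCK_RULE_PATH_BENEATH), rule,
                ctypes.c_long(0)))
        finally:
            os.close(file_fd)  # safe once the rule has been added
        _check(libc.prctl(ctypes.c_int(PR_SET_NO_NEW_PRIVS), ctypes.c_ulong(1),
                          ctypes.c_ulong(0), ctypes.c_ulong(0),
                          ctypes.c_ulong(0)))
        _check(libc.syscall(ctypes.c_long(SYS_LANDLOCK_RESTRICT_SELF),
                            ctypes.c_long(ruleset_fd), ctypes.c_long(0)))
    finally:
        os.close(ruleset_fd)
```

Usage would be `allow_only_reading(os.environ["CONFIG_PATH"])` early in `main`. Afterwards, opening that file for reading still works, while opening any other file, writing anywhere, or listing a directory fails with `PermissionError`. Landlock policies are allowlists: a rule can grant access beneath a path, but there is no rule that subtracts access from an already-allowed directory.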
Both cgroups and namespaces are hierarchical, so you certainly can subdivide the sandbox. That is, if you're a decent C programmer and can navigate some dense kernel documentation. You can also run Docker in Docker, but it requires a privileged root container, and even the creator of that feature suggests just bind-mounting the Docker socket instead.
I have a nagging feeling Plan 9 probably had a solution for all this 30 years ago.
> Both cgroups and namespaces are hierarchical, so you certainly can subdivide the sandbox.
This is true: you can enter a namespace while in another namespace, but creating a namespace is a privileged operation.
Docker in Docker does use socket bind mounting already AFAIK, and it's a trivial privesc: Docker runs as root, and the ability to talk to the socket means you can run `docker run -it --privileged --user root image_name bash` and get a shell as the host root user.
The solution is to allow unprivileged users to drop privileges, which is how macOS and Windows work. On Windows you have integrity levels, tokens, etc., all of which you can drop without privileges. On macOS you have Seatbelt.
Linux almost had this with unprivileged user namespaces, but that isn't viable: they let unprivileged users reach kernel code that used to be root-only, and 30 years of a "root -> kernel privesc isn't a security issue" attitude proved to be problematic.
Docker-in-Docker is a different thing from bind-mounting the socket: the former runs a new Docker daemon in a container, the latter just talks to the host's socket. Anyway, https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-d... tells it straight from the horse's mouth. It appears you may not even need privileged containers to pull it off nowadays, but the author still lists several more footguns.
Landlock is an all-right start at unprivileged restrictions, but yeah, it doesn't seem anywhere near as nice as OpenBSD's pledge() and unveil().
Thanks, I'd misremembered that it just required --privileged. I suspect that will continue to be a requirement since unprivileged user namespaces are not viable.
Containers are NOT security wrappers. They are a convenience that lets lazy people avoid dependency hell.
VMs can be security wrappers, but if you expose all of $HOME to a VM, then there really isn't much security happening in terms of your data.
This lets application developers harden their own applications; it doesn't require the end user to do anything (like putting it in a VM).
The opposite is true. Containerization systems were built into operating systems as security features. The whole “Linux packaging is a hellscape of self-induced problems, so let’s duct tape a squashfs onto the side of this new security isolation system and call it a deployment primitive” use case we now call “containers” came later, and it is a fairly inelegant and wasteful way to avoid solving the packaging hellscape problem. It’s valuable to many! But it’s definitely a square peg in the round hole (security isolation layer) of setns and chroot and friends.
You can harden containers, security-wise, almost as much as a VM (but generally none of that comes by default); the big thing a VM gives you that you can't get otherwise is a new kernel instance. In a VM you have to break 2 kernels to totally own a machine.
In a container, unless the container software gives you a separate kernel (which it likely doesn't), you just have to break 1 kernel.
Not the case; there's a fascinating history here.
The technologies that enabled containerization (namespaces, chroot, and cgroups, and their predecessors on BSD/Solaris) were created specifically for security and resource isolation.
The people who came up with "containers" as we know them today found a clever hack: combining those security-oriented tools with a filesystem-in-a-box and packaging system allowed people to package entire OS userlands and run them pretty deterministically in multiple places. The security isolation properties of namespaces/cgroups/chroot also happened to provide increased determinism.
And I'm not criticizing that; containers are a very clever hack that solved a problem a lot of people have. I use them every day.
That said, the fact that containers became so ubiquitous in the first place speaks to a completely self-induced problem that we didn't need to have in the software engineering community. That problem is, unfortunately, human/incentive-related in nature, so containers are probably the best we're going to get. Problem is, they're not that good.
I complained about the root problems here a while ago; it's easier to link than to rehash it here: https://news.ycombinator.com/item?id=44069483
Drew deVault also explained it much more thoroughly and better than I could: https://drewdevault.com/2021/09/27/Let-distros-do-their-job....
> It provides a simple, developer-friendly way to add defense-in-depth to applications.
Defense in depth. Lock your valuables inside a safe, inside your locked house. Why lock them in a safe when your house is already locked? Because if someone breaks into your house, you want additional defense "just in case". So, just in case I wrote some shitty code and my server got hacked, lock the valuables in a safe anyway so that the thief can't steal the expensive silverware (prod credentials).
Yes, but basically nobody uses either of those things. Some vendors like Red Hat enable some of it by default, but when people have trouble getting software to work, the first thing they're told to try is turning all that stuff off.
Which means in the real world, the likelihood of that stuff being on and secure is fairly low, but not zero.
With Landlock, pledge/unveil and similar tech, the developers of the software write and configure it, so it's on by default and probably can't be turned off (or at least not easily).
You need to be root to set those up. These are typically admin-driven policies, not dev-driven. Landlock is unprivileged, meaning that a program can set its own policy up without root.
This is massive, since most ways of dropping privileges on Linux require already having significant permissions (i.e., root).
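To make the "no root needed" point concrete, a minimal sketch assuming x86_64/aarch64 Linux and ctypes access to libc (`landlock_abi_version` and `set_no_new_privs` are hypothetical helper names). Any user can ask the kernel which Landlock ABI version it supports, and any user can set `no_new_privs`, which is the precondition `landlock_restrict_self()` places on a caller that lacks `CAP_SYS_ADMIN`.

```python
import ctypes

# Assumption: x86_64/aarch64 Linux, where this syscall number applies.
SYS_LANDLOCK_CREATE_RULESET = 444
LANDLOCK_CREATE_RULESET_VERSION = 1 << 0
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long


def landlock_abi_version() -> int:
    """Return the highest Landlock ABI version the kernel supports, or 0.

    With a NULL attribute, size 0 and the VERSION flag, the syscall only
    reports the version; no capabilities are needed.
    """
    ret = libc.syscall(ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET), None,
                       ctypes.c_long(0),
                       ctypes.c_long(LANDLOCK_CREATE_RULESET_VERSION))
    return ret if ret > 0 else 0


def _prctl(option: int, arg2: int = 0) -> int:
    # glibc's prctl() reads its variadic arguments as unsigned long.
    return libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                      ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))


def set_no_new_privs() -> bool:
    """Irrevocably promise that execve() will never grant this process or its
    children extra privileges (setuid bits, file capabilities). Works
    without root; returns True once the flag reads back as set."""
    if _prctl(PR_SET_NO_NEW_PRIVS, 1) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "prctl(PR_SET_NO_NEW_PRIVS) failed")
    return _prctl(PR_GET_NO_NEW_PRIVS) == 1
```

On a kernel with Landlock enabled the version is 1 or higher; a best-effort wrapper could use it to decide which access rights it is able to enforce, and `set_no_new_privs()` would be called right before `landlock_restrict_self()`.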
Landlock isn't really an alternative to containers. You can use it as another layer of security, within or outside a container.
It could even be paired with a chroot to make a container runtime. It's more like a building block for process restrictions.
I think it's a reasonable question. The answer is that not everything does indeed run in a VM or a container: lots of things (notably on developer machines) run directly in a host user context, where they have access to all kinds of global state that they don't really need (developer credentials, browser state, etc.).
But also: even within a container (which isn't itself a sandbox) or a VM, you still have concentric circles of trust and/or privilege. If you're installing arbitrary dependencies from the Internet, for example, you probably want a basic initial defense of preventing those dependencies from exfiltrating your secrets at build time.
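A minimal sketch of that build-time defense, assuming x86_64/aarch64 Linux, Landlock ABI v1 rights, and a build step that can be expressed as a Python callable; `run_confined` and the directory layout are hypothetical. The step runs in a forked child that may do anything beneath its work directory and nothing outside it, so a compromised dependency cannot read, say, `~/.ssh`. Landlock domains are inherited across fork() and execve(), so a real build tool exec'd from that child would be confined the same way, although its allowlist would then also need read and execute access to the toolchain's own directories.

```python
import ctypes
import errno
import os
import struct

# Assumption: x86_64/aarch64 Linux, where these syscall numbers apply.
SYS_LANDLOCK_CREATE_RULESET = 444
SYS_LANDLOCK_ADD_RULE = 445
SYS_LANDLOCK_RESTRICT_SELF = 446
PR_SET_NO_NEW_PRIVS = 38
LANDLOCK_RULE_PATH_BENEATH = 1
ACCESS_FS_V1_ALL = (1 << 13) - 1  # every Landlock ABI v1 filesystem right

libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long


def _check(ret: int) -> int:
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def _confine_to(workdir: str) -> None:
    """Allow all v1 filesystem access beneath workdir and none elsewhere."""
    handled = ctypes.c_uint64(ACCESS_FS_V1_ALL)
    ruleset_fd = _check(libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET), ctypes.byref(handled),
        ctypes.c_long(ctypes.sizeof(handled)), ctypes.c_long(0)))
    try:
        dir_fd = os.open(workdir, os.O_PATH | os.O_DIRECTORY | os.O_CLOEXEC)
        try:
            # Packed struct landlock_path_beneath_attr: __u64 + __s32.
            rule = (ctypes.c_char * 12).from_buffer_copy(
                struct.pack("=Qi", ACCESS_FS_V1_ALL, dir_fd))
            _check(libc.syscall(
                ctypes.c_long(SYS_LANDLOCK_ADD_RULE), ctypes.c_long(ruleset_fd),
                ctypes.c_long(LANDLOCK_RULE_PATH_BENEATH), rule,
                ctypes.c_long(0)))
        finally:
            os.close(dir_fd)
        _check(libc.prctl(ctypes.c_int(PR_SET_NO_NEW_PRIVS), ctypes.c_ulong(1),
                          ctypes.c_ulong(0), ctypes.c_ulong(0),
                          ctypes.c_ulong(0)))
        _check(libc.syscall(ctypes.c_long(SYS_LANDLOCK_RESTRICT_SELF),
                            ctypes.c_long(ruleset_fd), ctypes.c_long(0)))
    finally:
        os.close(ruleset_fd)


def run_confined(step, workdir: str) -> int:
    """Run step() in a forked child confined to workdir; the parent is never
    restricted. Returns 0 if step() returned, 1 if it raised, 2 if Landlock
    is unavailable, 3 if sandbox setup failed for another reason."""
    pid = os.fork()
    if pid == 0:
        try:
            _confine_to(workdir)
        except OSError as e:
            unavailable = e.errno in (errno.ENOSYS, errno.EOPNOTSUPP,
                                      errno.EPERM)
            os._exit(2 if unavailable else 3)
        try:
            step()
        except BaseException:
            os._exit(1)
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

The design choice here is to sandbox the untrusted step rather than the whole build driver: the parent keeps its normal access, and only the child that runs third-party code loses sight of everything outside the work directory.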