Comment by nunez

Comment by nunez 6 days ago

9 replies

I'd like more clarity on this:

> The advantage over docker here is that (when using Flakes) Nix builds are completely reproducible. Docker containers may be isolated, but surprisingly they are not deterministic out of the box. With some work you can make docker deterministic, but thats what you need, its much easier to use Nix.

as the whole purpose of the Dockerfile is to create a reproducible environment.

thln666 4 days ago

The whole purpose of the Dockerfile is not to create a reproducible environment. The purpose of a Dockerfile is to run a bunch of commands inside of a container and save the output. Those commands may or may not produce the same output every time they're run.

For example, if you have a debian base container that you run `apt install nginx` in, what version you actually get depends on a lot of different things including what the current version of nginx is inside of the remote repositories you're installing from _when the docker build command is executed_, not when the Dockerfile is written.

So, if you do "docker build ." today, and then the same thing 6 months from now, you will probably not get the same thing. Thus, Dockerfiles are not reproducible without a lot of extra work.

Nix flakes are not like that - they tag _exact_ versions of every input in the flake.lock, so a build 6 months from now will give you the _exact same system_ as you have today, given the same input. This is the same as like an npm lock file or a fully-specified python requirements.txt (where you have each package with an ==<version>).

So, you definitely can make Dockerfiles reproducible, but again, the Dockerfile itself is not made to do that.

Hope that helps your understanding here!

  • JamesSwift 3 days ago

    > For example, if you have a debian base container that you run `apt install nginx` in, what version you actually get depends on a lot of different things including what the current version of nginx is inside of the remote repositories you're installing from _when the docker build command is executed_, not when the Dockerfile is written.

    Its even worse. Its not the current version when the command is executed, its _the current version taking the layer cache into account_, which is a classic docker gotcha in needing to do single line `apt-get update && apt-get install` to sidestep. The layer cache really makes it hard to reason about.

aidenn0 4 days ago

Do any of your Dockerfiles make e.g. apt calls? If so, then they will get a different version of software installed when built on different days, because that will depend on the state of the package servers.

A more trivial example of non-deterministic would be that you can write a Dockerfile that uses curl to fetch data from random.org; the functions nix provides for fetching from URLs require you to specify the sha-sum of the data you fetch.

Nix flakes make it hard for you to inject anything into your dependencies that hasn't been hashed to confirm its identity. It in many cases still isn't 100% deterministic (consider e.g. a multithreaded build system where orderings can influence the output), but it's a big improvement.

JamesSwift 6 days ago

Its reproducible at a superficial level. Tags are mutable, so someone can push a different “3.1” between build 1 and 2, which results in a different build. You can also be fuzzy with tags, so if you say “from nginx:3” as your base (or nginx:latest) then build 1 and 2 can change because of a new tagged build upstream.

Then theres the million app-level changes that can creep in, eg copying local source is non-deterministic, apt-update, git clone, etc. Nix requires you to be fully explicit about the hash of the content you expect in each of those cases and so if you build it twice it is actually the same build.

theossuary 6 days ago

I guess you could consider a docker image a "reproducible environment," but it's certainly not a reproducible build; running docker build twice on the same directory isn't guaranteed to give you the same image. You could put in the work to make it a reproducible build, but it doesn't do anything to help you achieve that. Nix defaults to reproducible builds, and requires flags for "impure" non-reproducible builds. It does this by requiring all dependencies be managed by nix, and all sources be copied into the nix store.

jonotime 6 days ago

Author here.

The idea with nix flakes is it has a lock file which should guarantee the same build. This is like package-lock.json or pdm.lock which contains dependency checksums for every package.

Docker works more like your standard package manager. If you ask for mysql 5, today you may get mysql 5.1, but next week you may get mysql 5.2. So it does not come with a guarantee.

edude03 4 days ago

Containers are only reproducible at run time not at build time. Once you build a container and pull it down by its sha256, you'll get the same environment each time. However if your Dockerfile does any IO (curl, apt get, pip install etc) you're quite likely to get different images on different machines.

soraminazuki 4 days ago

Docker images are just as reproducible as binary blobs, which is essentially what they are.

  • antonvs 4 days ago

    Binary blobs can be easily reproducible, depending on how they’re built. By comparison, the “easily” part doesn’t apply to any non-trivial Docker image.

    The issue is that you have to lock down all your dependencies, including local data, repos, and registries. Most people, and even most companies, don’t have the resources to achieve that, so they simply don’t do it.

    Further, Docker doesn’t provide any significant mechanisms to help ensure reproducibility in the face of these issues, so you can’t say that Docker supports reproducibility.