Comment by zatkin
272 layers in a single image seems really unusual, is that just due to my lack of experience with containers? I've never seen an image with more than maybe a few dozen in my career...
Well, as described...
> Here's how the disaster unfolded:
> 1. A user's container is under a brute-force attack, and /var/log/btmp grows to 11GB.
> 2. The user performs a commit, creating a new image layer.
> 3. A single new failed login is appended to /var/log/btmp.
> 4. Because of CoW, OverlayFS doesn't just write the new line. It copies the entire 11GB file into the new, upper layer.
> 5. This process repeated 271 times.
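Here's a toy model of that copy-up step, just as a sketch (plain Python, no Docker; the 11GB file and 271 repeats are the article's figures, while the dict-of-layers representation and function names are made up for illustration):

```python
# Toy model of the copy-up behaviour quoted above. This is not real
# OverlayFS, just an illustration; the 11GB figure and the 271 repeats
# are the article's, everything else is invented for the sketch.

FILE = "/var/log/btmp"
layers = [{FILE: 11_000_000_000}]  # base layer holds the 11GB file

def commit_then_append(layers, path, appended_bytes):
    """'docker commit' freezes the existing layers read-only, so the
    next write can't modify the file in place: copy-on-write copies
    the whole file into the fresh upper layer, plus the new bytes."""
    current = max(layer.get(path, 0) for layer in layers)
    layers.append({path: current + appended_bytes})

for _ in range(271):                       # "This process repeated 271 times"
    commit_then_append(layers, FILE, 100)  # ~one failed-login record

total_gb = sum(s for layer in layers for s in layer.values()) / 1e9
print(f"{len(layers)} layers, ~{total_gb:.0f} GB if every layer copied the file")
# -> 272 layers, ~2992 GB
```

The point is that the cost of each commit is the full current size of the file, not the size of the append.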
So the user is creating hundreds of layers for unclear reasons. The article calls this "exponential growth", but for that to be the case the commits would have to be triggered in proportion to the number of existing layers, which seems unlikely. Assuming the commits happen for reasons unrelated to the size of the existing image, the growth is quadratic† in the number of layers (it's hard to characterize as a function of time or anything else), and it would be nice to know why there were so many layers.
† Note that while the growth is technically quadratic, I don't think the quadratic term is what actually hurt them. They say the problem occurred when one 11GB file got copied into each of 272 image layers. That would require 2,992 GB, but they also say the image exhibiting the problem was only 800GB.
I suspect the answer is that only some of the layers modified (and therefore copied) the log file: probably about 72 of them, since 800GB / 11GB ≈ 72. That makes the growth more like linear (still technically slightly superlinear, but probably not quadratic) in the number of failed SSH login attempts, and it means ~75% of the layers aren't contributing to the problem at all.
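A quick sanity check on that arithmetic (the 11GB, 272-layer, and 800GB figures are the article's; the rest is just division):

```python
file_gb, n_layers, image_gb = 11, 272, 800  # figures from the article

print(n_layers * file_gb)   # 2992 GB -- needed if every layer copied the file
print(image_gb // file_gb)  # 72 -- layers that could have copied it within 800GB
```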
I can't think of anything that would justify that many layers. If I had that much complexity, I would split up the container or start writing bash scripts.
Container automation looks simple, but developers with systems experience know the real complexity of operating systems and of running applications.
People who know JavaScript but don't know how a file system works can build and deploy containers; they just copy and paste things until it runs. Container automation makes brute-force iteration a viable strategy. It was a lot harder to run a Linux server, which forced you to learn something or use a platform as a service instead.