A while back I wrote a post about selecting a base Docker image for Node.js. In that post, I talked about the size difference between the default Node.js build and the smaller “slim” and “alpine” builds.
The difference can be significant: 650MB for the full image, vs 50MB for the Alpine Linux version.
However, there’s a note at the bottom of the description for the full “node:<version>” image, in the readme for the Node.js images on Dockerhub, that had me a bit confused (emphasis mine):
This tag is based off of buildpack-deps. buildpack-deps is designed for the average user of docker who has many images on their system. It, by design, has a large number of extremely common Debian packages. This reduces the number of packages that images that derive from it need to install, thus reducing the overall size of all images on your system.
… wait, what?
You’re saying installing a large number of packages – with massive file size – will reduce the overall size of images on my system?!
Surprisingly (to me, at least) it’s true.
Docker Image Layers
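When you build a Docker image from a Dockerfile, each instruction builds on the result of the previous one. The instructions that change the filesystem, such as `RUN`, `COPY`, and `ADD`, each create a new image layer.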
While these layers collectively make up a single image, each one is stored individually, with its own ID, so that it can be cached and re-used whenever possible.
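To make this concrete, here is a minimal sketch of a Dockerfile for a hypothetical Node.js app. The file layout (`package.json` and an `index.js` entry point in the build context) is an assumption for illustration. The comments note which instructions add filesystem content as a new layer.

```dockerfile
# Base: re-uses the existing layers of the tagged node image.
FROM node:6.9.5-alpine

# Sets the working directory for the instructions that follow,
# creating /app if it does not exist.
WORKDIR /app

# COPY adds files, so it produces a new layer.
COPY package.json .

# RUN changes the filesystem (node_modules), so it produces another layer.
RUN npm install

# Another COPY, another layer: the rest of the app source.
COPY . .

# CMD only records image metadata; it adds no files.
CMD ["node", "index.js"]
```

Running `docker history` on the resulting image lists its layers along with the instruction that created each one. Splitting `COPY package.json` from `COPY . .` is a common choice: if only the app source changes, Docker can re-use the cached `npm install` layer on the next build.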
By tagging (naming) a Docker image, you can easily refer to a complete image – one that is built out of many layers. You can do a lot of things with a Docker image that has a tag, including creating new images `FROM` it in a new Dockerfile.
When you combine Docker’s cache with tagged images, you get a very efficient re-use of large binary objects.
Using the same tagged image in multiple Dockerfile `FROM` instructions will not re-create the base image every time. It will use the one existing image that your system already has (or download it if it doesn’t have it yet).
Dockerhub: Public Image Cache
While Docker does a great job of caching images and image layers on your local system, it also provides a global, public repository of images, called Dockerhub.
This is where you’ll find the Node.js Docker images, among thousands of others, for public use.
And when you think of Dockerhub as little more than a public cache of images – which can be easily downloaded to your system, to be cached and used and re-used locally – the way a 650MB Node.js image can save space begins to reveal itself.
Saving Space With A Larger Base
Let’s say you have 4 Node.js applications that all build “FROM node:6.9.5-alpine”. Each of these applications uses a module from npm that requires native build tools. To install that module, you have to add the build tools to your Docker image.
Generally, this will balloon your Docker image from 50MB to around 200MB before you even install your project into the image.
But worse yet, none of the images built from these four Dockerfiles will re-use the installed tool set. Each of them will add another 200MB of used hard drive space to your system, because each of them individually installs the build tools.
With 4 applications and images, you now have 800MB+ of hard drive space used up.
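Here is a sketch of what one of those four Dockerfiles might look like. It assumes the native module is compiled with node-gyp, which needs Python, `make`, and a C++ compiler; the package names are Alpine’s, and the app layout is hypothetical. On newer Alpine-based images you would typically install `python3` rather than `python`.

```dockerfile
FROM node:6.9.5-alpine

# Build tools for native npm modules. Every app that starts from the
# bare alpine image has to add these in its own Dockerfile.
RUN apk add --no-cache python make g++

WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "index.js"]
```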
If you were to switch to the full version of the “node:6.9.5” image, however, you would save approximately 550MB of drive space by not duplicating the build tool installation.
Yes, you need to have one copy of the full image and all of its layers, taking up 250MB of space when you build from it.
However, you only need one copy of the 250MB image.
When you specify `FROM node:6.9.5` in multiple Dockerfiles on the same machine, that one image is re-used.
This is how a 650MB image can save space, compared to a 50MB image. Re-use.
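For comparison, here is a sketch of the same hypothetical app built from the full image. It assumes that the Debian packages inherited from buildpack-deps already cover what the native module needs, so there is no build-tool step. With the same `FROM` line in all four apps, they share every base layer.

```dockerfile
# Same FROM line in all four apps: the full image is stored once and shared.
FROM node:6.9.5

# No build-tool install step. This assumes the packages inherited from
# buildpack-deps include the compiler toolchain the module needs.
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "index.js"]
```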
Download vs Build
When it comes to caching, there is very little difference between downloading an image and building one locally.
If you build your own image and call it “my-image”, for example, you can then re-use the “my-image” as a base. Where that base image comes from is almost irrelevant, as long as Docker knows how to access it.
You can build “my-image” directly on your system and then re-use it on that system.
You can upload “my-image” to a private Docker repository, and re-use it from there.
You can upload “my-image” to the public Dockerhub, as well.
Wherever “my-image” lives, specifying that as the base image of another Dockerfile will ensure it is re-used and not re-created.
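In a Dockerfile, the only thing that changes between those options is the image reference on the `FROM` line. In the sketch below, the registry host, port, and Dockerhub account name are hypothetical placeholders, and only one `FROM` line should be active.

```dockerfile
# Built locally and tagged "my-image" on this machine:
FROM my-image

# Alternatives (use one of these instead of the line above):
# Pushed to a private registry; host and port are placeholders.
# FROM registry.example.com:5000/my-image
# Pushed to Dockerhub under a placeholder account name.
# FROM some-account/my-image
```

A name with no registry host, such as `my-image` or `some-account/my-image`, refers to Dockerhub by default whenever Docker has to pull it.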
Alpine? Or Full Node.js Image?
The question remains: should you use the Alpine image or the full image, or the “slim” image?
My guide to choosing a Docker image for Node.js recommends the “-alpine” variant as a starting point, and I’ll stick with that recommendation.
However, once you start adding build tools and other common libraries, you have another choice to make.
Is it worth the extra space of building multiple apps with duplicated layers? Or should you use the full Node.js image?
Or, as a third option, should you build your own version of the Node.js “-alpine” image, with the build tools you need, and re-use that as the base image for your apps?
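A minimal sketch of that third option uses two separate Dockerfiles. One is for a shared base image, tagged with the hypothetical name `my-node-base`. The other is for an app built `FROM` it. The package list (Python, `make`, and `g++` for node-gyp) is an assumption about what your native modules need.

```dockerfile
# ---- File 1, base/Dockerfile: build once with `docker build -t my-node-base base/` ----
FROM node:6.9.5-alpine
RUN apk add --no-cache python make g++

# ---- File 2, app/Dockerfile: each app starts from the shared base ----
FROM my-node-base
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "index.js"]
```

Because every app references the same `my-node-base` tag, the build-tool layer exists once on the machine. The base can also be pushed to a private registry or to Dockerhub like any other image.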
These are questions no one can answer but you and your team, for your specific project.