7 quick steps to improve your Dockerfile
We are living in a weird time. Docker has been around for almost 10 years, and asking someone whether they use it can feel borderline offensive. On the other hand, many organizations still do not use it, for reasons ranging from technical restrictions and security concerns to a simple lack of knowledge. But the most likely scenario is that Docker images are being used, just very inefficiently. What does that mean? Let me give you some examples, along with suggestions on how to fix them.
1. Reinventing the wheel
Many times, especially when migrating from non-Docker environments, a plain operating-system image is used as a base without even checking whether there is already an official, ready-to-use Docker image that will both decrease build time and move the maintenance burden outside our organization. For example, instead of starting from the debian:bullseye image and installing Node.js during the build, consider using the official node image:

```dockerfile
# don't do this
FROM debian:bullseye
RUN apt update && apt install -y nodejs npm
COPY . .
RUN npm install && echo "success!"

# do this
FROM node:bullseye
COPY . .
RUN npm install && echo "great success!"
```
2. Not using “slim”/Alpine base images
Usually, we do not need a full-fledged OS image with all of its utilities and programs. We may only use a few of them, which is why "slim" images can be a better option. An even bigger improvement can come from an Alpine base image, which is perfect for apps that ship as a single binary. Alpine's APK repositories are limited compared to the ones used by Ubuntu or even Debian, so if you need to install something, check the repos first. Otherwise, you will have to bring the dependency in manually.
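To put rough numbers on this, here is a sketch comparing the three kinds of base images (the tags are real official images; the sizes are approximate and vary by version, and the curl install is just a placeholder workload):

```dockerfile
# Full Debian base: on the order of 120 MB before you add anything
FROM debian:bullseye

# Slim variant: same distro with docs, locales, and other extras stripped (~80 MB)
FROM debian:bullseye-slim

# Alpine: a minimal musl-based distro (~5 MB); note it uses apk, not apt
FROM alpine:3.17
RUN apk add --no-cache curl
```

Keep in mind that Alpine uses musl libc instead of glibc, so binaries built against glibc may not run on it without extra work.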
3. Creating unnecessary layers
I have seen Dockerfiles containing dozens of RUN and several COPY statements. This is an easy trap to fall into when we don't know how Docker images work. Every step that adds, removes, or modifies something in the image file system creates a new, incremental layer. This adds to the total size of our image, and, more importantly, makes builds slower than they need to be. Especially during development, we don't want to wait for a build longer than necessary. The solution is simple: merge these instructions. Instead of writing several RUN statements, write one and chain the commands with &&:

```dockerfile
# don't do this:
FROM node:slim
RUN apt update
RUN apt install -y postgresql
RUN apt install -y redis
(...)

# do this instead:
FROM node:slim
RUN apt update && apt install -y postgresql redis
(...)
```
We can do the same with COPY instructions (note that when COPY has multiple sources, the destination must be a directory ending with a slash):

```dockerfile
# instead of this:
(...)
COPY src/* .
COPY assets/* .
COPY other/* .
(...)

# do this:
(...)
COPY src/* assets/* other/* ./
(...)
```
TL;DR - remember that every COPY, RUN, and ADD instruction will create a new image layer which increases the total image size.
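As a quick sanity check, `docker history` lists every layer of a built image together with its size, so you can see the effect of merging instructions (the image name here is just a placeholder):

```shell
# Each COPY, RUN, and ADD instruction appears as its own layer with its size;
# fewer instructions in the Dockerfile means fewer rows here
docker history my-app:latest
```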
4. Copying all files from the local machine to the image
In our earlier examples you can see that we have used:

```dockerfile
COPY . .
```
While this is an easy way to move all the source code from the current directory into the image, it may also carry over redundant files that only increase the image size. An elegant way to solve this is to add a .dockerignore file. If you are familiar with the .gitignore file format, you will feel right at home. It's very simple:

```
# Comments starting with '#' are ignored
# Add the path of a file you don't want to copy
.gitignore
# Exclude a whole directory
temp/*
# You can also use wildcards
*.md
# Re-include files you need that were caught by a wildcard
!IMPORTANT.md
```
5. Keeping package manager cache inside the image
This one is very easy to forget about but, in my opinion, is one of the most important ones. If we install something using a package manager like apt, apk, etc., we need to make sure we clear its cache; otherwise we end up shipping files we will never use. Depending on which base image we use, there are a few ways to take care of this:
```dockerfile
# On Alpine-based images it's very simple - just add the --no-cache parameter:
FROM alpine
RUN apk add --no-cache curl
(...)

# On Debian-based images, remove the files from the /var/lib/apt/lists directory
# in the same RUN instruction as the install, so the cache never lands in a layer:
FROM debian:stable-slim
RUN apt update && apt install -y curl && rm -rf /var/lib/apt/lists/*
(...)
```
6. Using a single image for building and deployment
Early in the development process, we can get away with having one image that we build and immediately deploy on a dev environment (which often is the developer’s machine). Over time, however, we add more libraries, tools, and other dependencies that can make our image bloated. To deal with this, we can use a technique called multi-stage building. It allows us to build artifacts in one container and move them to another, clean container. We can leave our source code behind and only add things that are required for our app to run:
```dockerfile
# First, create an image used for compiling and give it an alias
FROM golang:1.19-alpine AS builder
WORKDIR /build
COPY go.mod go.sum ./
RUN go mod download
COPY *.go ./
RUN go build -o go-app

# Now use a fresh container and add only one new layer
FROM alpine:latest
COPY --from=builder /build/go-app /bin/go-app
ENTRYPOINT ["/bin/go-app"]
```
As a result, we built a small, lightweight container that contains only what we wanted - our Go binary.
```
test ➤ docker image ls gotest:gotest
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
gotest       gotest   b93962d2f028   About a minute ago   8.87MB
```
7. Ignoring caching
This is a point that is very often overlooked. The Docker image layer structure allows layers that have not changed since the last build to be reused from cache. However, once one instruction's result changes, every instruction after it must be rebuilt. If you frequently change the application you are developing, you do not want to re-download all your packages on every build. That is why the least frequently changing instructions should go at the top of your Dockerfile, while the ones related to your app belong at the bottom.
```dockerfile
# avoid doing this
FROM node:slim
COPY . .
RUN yarn install
RUN apt update && apt install -y curl && rm -rf /var/lib/apt/lists/*

# do this instead to allow caching of the curl installation
FROM node:slim
RUN apt update && apt install -y curl && rm -rf /var/lib/apt/lists/*
COPY . .
RUN yarn install
```
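You can take this one step further with the same trick: copy only the dependency manifests before installing, so the yarn install layer stays cached as long as package.json and yarn.lock are unchanged (a common pattern; the file names assume a standard Yarn project):

```dockerfile
FROM node:slim
RUN apt update && apt install -y curl && rm -rf /var/lib/apt/lists/*
# Copy only the manifests first, so editing source files
# does not invalidate the dependency-install layer
COPY package.json yarn.lock ./
RUN yarn install
# Source changes now only rebuild the layers from here down
COPY . .
```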
If you follow these tips, you will end up with a decent Dockerfile that performs at least acceptably. There are more ways to improve it, often dependent on specific languages and/or tools. I am sure you will have more ideas that could be added to this list, so please let me know if that is the case.
reviewed by Łukasz Lenart