How to Port a Sample Application to Chainguard Images
Porting Key Points
- Chainguard’s distroless Images have no shell or package manager by default. This is great for security, but sometimes you need these things, especially in builder images. For those cases we have -dev images (such as cgr.dev/chainguard/python:latest-dev) which do include a shell and package manager.
- Chainguard Images typically don’t run as root, so a USER root statement may be required before installing software.
- The -dev images and wolfi-base/chainguard-base use BusyBox by default, so any groupadd or useradd commands will need to be ported to addgroup and adduser.
- The free Developer tier of Images provides :latest and :latest-dev versions. Our paid Production Images offer tags for major and minor versions.
- We use apk tooling, so apt install commands will become apk add.
- Chainguard Images are based on glibc and our packages cannot be mixed with Alpine packages.
- In some cases, the entrypoint in Chainguard Images can be different from equivalent images based on other distros, which can lead to unexpected behavior. You should always check the image’s specific documentation to understand how the entrypoint works.
- When needed, Chainguard recommends using a base image like chainguard-base or a -dev image to install an application’s OS-level dependencies.
- Although -dev images are still more secure than most popular container images based on other distros, for increased security in production environments we recommend combining them with a distroless variant in a multi-stage build (see the sketch after this list).
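As a minimal sketch of that last point (the package names, paths, and index.js entrypoint are illustrative placeholders, not a prescription for any particular application), a multi-stage build pairing a -dev builder with a distroless runtime might look like this:
FROM cgr.dev/chainguard/node:latest-dev as builder
# The -dev image has a shell and apk, so build-time packages can be installed here
USER root
RUN apk update && apk add build-base
WORKDIR /work
COPY package.json ./
RUN npm install --omit=dev
COPY . .

FROM cgr.dev/chainguard/node:latest
WORKDIR /app
# Only the application and its installed dependencies reach the distroless runtime image
COPY --from=builder /work /app
# index.js is a placeholder for your application's entrypoint
ENTRYPOINT ["node", "index.js"]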
The Sample Application
The application in question is identidock. It was written for the book Using Docker about ten years ago, which shows that software of this age can still be migrated to new container images while realizing the benefits of a no-to-low CVE count. The application itself creates identicons for a user name, similar to what GitHub generates for users with no avatar. It was designed at the time to demonstrate a “microservices” approach, and as such it’s made up of three services:
- The main identidock service, which takes the requests and talks to the dnmonster service and the Redis cache
- A Node.js application (dnmonster) which creates the identicons
- Redis, which is used as a simple cache
The services are put together as shown in the below diagram. The user only talks to the identidock service. The identidock service will first check the cache to see if it has already created an identicon for the input and, if not, requests a new identicon from the dnmonster service. The identicon is then returned to the user and saved to the cache if required.
The book walked through using various orchestrators to deploy the application, some of which have since fallen out of usage (anyone remember fleet?). For the sake of this tutorial, we’ll use Docker Compose, which is arguably the simplest surviving orchestrator covered in the book.
The first task was to get the 10-year-old application building and running again. As it was a simple example application, this was thankfully straightforward and mainly required bumping versions of dependencies and a couple of cases of replacing unmaintained libraries. For a larger project, this may well have been a major effort. The original code can be found on the using-docker GitHub repository and the updated working version (prior to moving to Chainguard Images) can be found on the v1 branch of the repository for this tutorial.
In order to follow along with the tutorial, please clone the code and switch to the v1 branch:
git clone https://github.com/chainguard-dev/identidock-cg.git
cd identidock-cg
git switch v1
Updating the Node.js Microservice
To begin, we’ll update the heart of the application – the dnmonster service. dnmonster is based on monsterid.js by Kevin Gaudin. The dnmonster container hosts an API which returns an identicon based on the input it’s given.
docker run -d -p 8080:8080 amouat/dnmonster
curl --output ./monster.png 'localhost:8080/monster/wolfi?size=100'
In this example, we give dnmonster the input “wolfi”, for which it will produce the following image:
The version of the code on the v1 branch already contains a few updates from the original code. As well as bumping versions, the codebase was moved from the old and sporadically maintained restify module to the more modern express module.
The Dockerfile for this version of the dnmonster service can be found in the dnmonster folder:
FROM node
RUN apt-get update && apt-get install -yy --no-install-recommends \
libcairo2-dev libjpeg62-turbo-dev libpango1.0-dev libgif-dev \
librsvg2-dev build-essential g++
#Create non-root user
RUN groupadd -r dnmonster && useradd -r -g dnmonster dnmonster
RUN install -d -o dnmonster -g dnmonster /home/dnmonster
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY package.json /usr/src/app/
RUN npm install
COPY ./src /usr/src/app
RUN chown -R dnmonster:dnmonster /usr/src/app
USER dnmonster
EXPOSE 8080
CMD [ "npm", "start" ]
The image can be built with:
cd dnmonster
docker build --pull -t dnmonster .
Looking at this image:
docker images dnmonster
REPOSITORY TAG IMAGE ID CREATED SIZE
dnmonster latest 1eb74ae9d0f0 9 seconds ago 1.22GB
We can also run the Grype scanning tool to investigate if there are any known vulnerabilities present in the image:
grype docker:dnmonster
✔ Vulnerability DB [no update available]
✔ Loaded image dnmonster:latest
✔ Parsed image sha256:20b235c7f120abdb223516c9c2649aac1e6e3dde8301ae7394c6b4a433537b51
✔ Cataloged contents 5703441c038c01715db710653b028e4689ec2ad96bf471bad9a06e0c4cf5775e
├── ✔ Packages [788 packages]
├── ✔ File digests [18,974 files]
├── ✔ File metadata [18,974 locations]
└── ✔ Executables [1,343 executables]
✔ Scanned for vulnerabilities [970 vulnerability matches]
├── by severity: 5 critical, 70 high, 154 medium, 32 low, 450 negligible (259 unknown)
└── by status: 11 fixed, 959 not-fixed, 0 ignored
This tells us that (at the time of writing) the image is 1.22 GB in size and has 970 known vulnerabilities.
The first step in moving to Chainguard Images is to try swapping in the new image name and checking if anything breaks. In this case, we’ll begin with the developer variant of the Node image. Change the first line of the Dockerfile from:
FROM node
To:
FROM cgr.dev/chainguard/node:latest-dev
Unlike the cgr.dev/chainguard/node:latest image, the :latest-dev version includes a shell and package manager, which we will need for some of the build steps. In general, it’s better to use the more minimal :latest version where possible in order to keep the size down and reduce the tooling available to attackers. Often the :latest-dev image can be used as a build stage in a multi-stage build, with a more minimal image such as :latest used in the final production image.
If you try building this image, you’ll find that it breaks in several places. The image needs to install various libraries so that it can compile the node-canvas dependency, and this looks a bit different in Debian than it does in Wolfi (the OS powering Chainguard Images). In Wolfi, we first need to switch to the root user to install software and we use apk add instead of apt-get. We then need to figure out the Wolfi equivalents of the various Debian packages, which may not always have a one-to-one correspondence. There are tools to help here – you can consult our migration guides and use apk tools (like apk search libjpeg), but searching the Wolfi GitHub repository for package names will often provide you with what you’re looking for.
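For example, one quick way to search for candidate package names (assuming you have Docker available locally; the exact output will change as Wolfi packages are updated) is to run apk search inside a wolfi-base container:
docker run --rm cgr.dev/chainguard/wolfi-base sh -c "apk update && apk search libjpeg"
This lists packages such as libjpeg-turbo-dev, the Wolfi counterpart of the libjpeg62-turbo-dev package used in the original Debian-based Dockerfile.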
Make these changes by replacing the RUN apt-get … line with the RUN apk update && apk add … line shown below, and adding a USER root line. The start of the Dockerfile should look like this:
FROM cgr.dev/chainguard/node:latest-dev
USER root
RUN apk update && apk add \
cairo-dev libjpeg-turbo-dev pango-dev giflib-dev \
librsvg-dev glib-dev harfbuzz-dev fribidi-dev expat-dev libxft-dev
The next change we need to make is to the RUN groupadd … line. Chainguard Images use BusyBox by default, which means groupadd needs to become addgroup (and useradd becomes adduser). Rewrite the line so that it looks like this:
RUN addgroup dnmonster && adduser -D -G dnmonster dnmonster
Finally, the default entrypoint for the Chainguard image is /usr/bin/node. If we leave the CMD as it is, it will be interpreted as an argument to node, which isn’t what we want. The Docker official image uses an entrypoint script to interpret commands, but this isn’t available in the cgr.dev/chainguard/node:latest-dev image. The easiest fix is to change the CMD command to ENTRYPOINT, which will override the /usr/bin/node command:
ENTRYPOINT [ "npm", "start" ]
Once you’ve made all these changes, you should have a Dockerfile that looks like:
FROM cgr.dev/chainguard/node:latest-dev
USER root
RUN apk update && apk add \
cairo-dev libjpeg-turbo-dev pango-dev giflib-dev \
librsvg-dev glib-dev harfbuzz-dev fribidi-dev expat-dev libxft-dev
#Create non-root user
RUN addgroup dnmonster && adduser -D -G dnmonster dnmonster
RUN install -d -o dnmonster -g dnmonster /home/dnmonster
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY package.json /usr/src/app/
RUN npm install
COPY ./src /usr/src/app
RUN chown -R dnmonster:dnmonster /usr/src/app
USER dnmonster
EXPOSE 8080
ENTRYPOINT [ "npm", "start" ]
At this point, we have a version of dnmonster that works and is equivalent to the previous version. We can build this image:
docker build --pull -t dnmonster-cg .
...
And investigate it again:
docker images dnmonster-cg
REPOSITORY TAG IMAGE ID CREATED SIZE
dnmonster-cg latest d4bb3a473a90 36 seconds ago 880MB
grype docker:dnmonster-cg
✔ Vulnerability DB [no update available]
✔ Loaded image dnmonster-cg:latest
✔ Parsed image sha256:d4bb3a473a9086d3e3d17e781b3ea0ad84031cb4082cbf433e9561ee349e5e86
✔ Cataloged contents d32762222ca3091f6f6a5009fe407da33f96cecec2d29368b271b2e0d52f8ff0
├── ✔ Packages [506 packages]
├── ✔ File digests [14,262 files]
├── ✔ File metadata [14,262 locations]
└── ✔ Executables [714 executables]
✔ Scanned for vulnerabilities [0 vulnerability matches]
├── by severity: 0 critical, 0 high, 0 medium, 0 low, 0 negligible
└── by status: 0 fixed, 0 not-fixed, 0 ignored
No vulnerabilities found
So the image is now significantly smaller at 880MB, but more importantly, we’ve eliminated all 970 vulnerabilities.
But we can still do more. In particular, although 880MB is significantly smaller than the previous version, it’s still a large image. To get the size down, we can use a multi-stage build where the built assets are copied into a minimal production image, which doesn’t include build tooling or dependencies required during development.
Ideally, we would use the cgr.dev/chainguard/node:latest image for this, but we also need to install the dependencies for node-canvas, which means we need an image with apk tools. Normally it’s recommended to use a :latest-dev image for this, but in Node’s case, the :latest-dev image is relatively large due to the inclusion of build tooling, such as C compilers, that can be required by Node modules. Instead, we’re going to use the wolfi-base image and install nodejs as a package.
To do this, replace the Dockerfile with the following:
FROM cgr.dev/chainguard/node:latest-dev as build
USER root
RUN apk update && apk add \
cairo-dev libjpeg-turbo-dev pango-dev giflib-dev \
librsvg-dev glib-dev harfbuzz-dev fribidi-dev expat-dev libxft-dev
#Create non-root user
RUN addgroup dnmonster && adduser -D -G dnmonster dnmonster
RUN install -d -o dnmonster -g dnmonster /home/dnmonster
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY package.json /usr/src/app/
RUN npm install
COPY ./src /usr/src/app
RUN chown -R dnmonster:dnmonster /usr/src/app
USER dnmonster
EXPOSE 8080
ENTRYPOINT [ "npm", "start" ]
FROM cgr.dev/chainguard/wolfi-base
RUN apk update && apk add nodejs \
cairo-dev libjpeg-turbo-dev pango-dev giflib-dev \
librsvg-dev glib-dev harfbuzz-dev fribidi-dev expat-dev libxft-dev
WORKDIR /app
COPY --from=build /usr/src/app /app
EXPOSE 8080
ENTRYPOINT [ "node", "server.js" ]
We’ve added an as build statement to the first FROM line and added a second build stage that starts on the line FROM cgr.dev/chainguard/wolfi-base. The second stage installs the required dependencies (including Node.js) before copying the build artifacts from the first stage. We also changed the entrypoint to execute Node directly, as the image no longer contains npm.
Build and investigate the image:
docker build --pull -t dnmonster-multi .
…
docker images dnmonster-multi
REPOSITORY TAG IMAGE ID CREATED SIZE
dnmonster-multi latest 9b2e79c87572 18 minutes ago 336MB
grype docker:dnmonster-multi
✔ Vulnerability DB [no update available]
✔ Loaded image dnmonster-multi:latest
✔ Parsed image sha256:9b2e79c87572902c3dc1577e55b56b27406b521cae6b231cb15d9b347eb0b94c
✔ Cataloged contents 88258c2837dbe88bd4490fc672d170581fbcdbe0dcc34c696d2a14032ad36395
├── ✔ Packages [256 packages]
├── ✔ File digests [9,357 files]
├── ✔ File metadata [9,357 locations]
└── ✔ Executables [599 executables]
✔ Scanned for vulnerabilities [0 vulnerability matches]
├── by severity: 0 critical, 0 high, 0 medium, 0 low, 0 negligible
└── by status: 0 fixed, 0 not-fixed, 0 ignored
No vulnerabilities found
This results in an image that is now only 336MB in size and still has 0 CVEs.
We’re most of the way now, but there are still a couple of finishing touches to make. The first is to remove the dnmonster user. The wolfi-base image already defines a nonroot user, so we can make the build a little less complicated by using that user directly. The second is to add in a process manager. We have node running as the root process (PID 1) in the container, which isn’t ideal as it doesn’t handle some of the responsibilities that come with running as PID 1, such as forwarding signals to subprocesses. You can see this most clearly when you try to stop the container – it takes several seconds as the process doesn’t respond to the SIGTERM signal sent by Docker and has to be hard killed with SIGKILL. To fix this, we can add tini, a small init for containers.
The tini binary will run as PID 1, launch node as a subprocess, and take care of PID 1 responsibilities. Now, the final Dockerfile looks like this:
FROM cgr.dev/chainguard/node:latest-dev as build
USER root
RUN apk update && apk add \
tini cairo-dev libjpeg-turbo-dev pango-dev giflib-dev \
librsvg-dev glib-dev harfbuzz-dev fribidi-dev expat-dev libxft-dev
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
ENV NODE_ENV production
COPY package.json /usr/src/app/
RUN npm install
COPY ./src /usr/src/app
FROM cgr.dev/chainguard/wolfi-base
RUN apk update && apk add tini nodejs \
cairo-dev libjpeg-turbo-dev pango-dev giflib-dev \
librsvg-dev glib-dev harfbuzz-dev fribidi-dev expat-dev libxft-dev
WORKDIR /app
COPY --from=build /usr/src/app /app
EXPOSE 8080
ENTRYPOINT ["tini", "--" ]
CMD [ "node", "server.js" ]
This version is also available in the main branch of the repository.
Build it:
docker build --pull -t dnmonster-final .
…
And run it to prove it still works:
docker run -d -p 8080:8080 dnmonster-final
...
curl --output ./monster.png 'localhost:8080/monster/wolfi?size=100'
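As an optional sanity check that tini is now forwarding signals, you can time how long it takes to stop the container; with tini as PID 1 this should return almost immediately rather than waiting for Docker’s default 10-second grace period before a SIGKILL (the container ID below is a placeholder):
time docker stop <container-id>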
Note: If you receive a “port is already allocated” error, be sure to clean up the previous container. Check what containers are running with docker container ls and remove it with docker rm -f <container-name>.
There are still more tweaks that could be made. Bret Fisher has some excellent resources on building Node.js containers in this GitHub repo. But for the purposes of this example app, we’ve made excellent progress.
Updating the Python Microservice
The next service we will look at updating is Identidock, the main entrypoint for the application. Identidock is responsible for looking up requests in the cache and falling back to calling the dnmonster service if they’re not present.
Again, the version of the code on the v1 branch already contains a few updates from the original code, but in this case all that was needed was to bump various libraries to newer versions. The Dockerfile for the v1 version can be found in the identidock folder and looks like:
FROM python
RUN groupadd -r uwsgi && useradd -r -g uwsgi uwsgi
RUN pip install Flask==3.0.2 uWSGI requests==2.31.0 redis==5.0.3
WORKDIR /app
USER uwsgi
COPY app /app
COPY cmd.sh /
EXPOSE 9090 9191
CMD ["/cmd.sh"]
The image can be built with the following, assuming the current directory is the root of the repo:
cd identidock
docker build --pull -t identidock .
…
Take a look at the image:
docker images identidock
REPOSITORY TAG IMAGE ID CREATED SIZE
identidock latest 46d3857f790b 8 minutes ago 1.05GB
Scan for vulnerabilities:
grype docker:identidock
✔ Vulnerability DB [no update available]
✔ Loaded image identidock:latest
✔ Parsed image sha256:46d3857f790b295a14829e709d8483f98bd3c1f304bc1f40f87b16a62300b76b
✔ Cataloged contents 09873b2d93203eb19c56257a597702b060b803851d152c624407f8aca30ab018
├── ✔ Packages [455 packages]
├── ✔ File digests [19,267 files]
├── ✔ File metadata [19,267 locations]
└── ✔ Executables [1,437 executables]
✔ Scanned for vulnerabilities [980 vulnerability matches]
├── by severity: 5 critical, 66 high, 155 medium, 32 low, 463 negligible (259 unknown)
└── by status: 11 fixed, 969 not-fixed, 0 ignored
At the time of writing, this image is 1.05GB with 980 vulnerabilities (5 critical) according to Grype.
Again, as a first step, we will try switching directly to the Chainguard Image. To do this, edit the Dockerfile so the first line reads:
FROM cgr.dev/chainguard/python:latest-dev
Before building the image, we need to update the groupadd syntax to use the addgroup format. As Chainguard Images don’t run as root by default for security reasons, we also need to change to the root user for this command to work. Replace the RUN groupadd line with the following lines:
USER root
RUN addgroup uwsgi && adduser -D -G uwsgi uwsgi
The image now builds, but there are issues due to differences in the image entrypoint. If you run the image, you will get a confusing error message such as:
`File "/cmd.sh", line 4`
if [ "$ENV" = 'DEV' ]; then
^^^^^^
SyntaxError: cannot assign to literal here. Maybe you meant '==' instead of '='?
This is caused by the Chainguard Image using /usr/bin/python as the entrypoint, which means the cmd.sh entrypoint script is interpreted by Python instead of the shell. Fixing this can be as easy as changing the entrypoint, but let’s take a look at the script first.
This is the file cmd.sh in the identidock directory:
#!/bin/bash
set -e
if [ "$ENV" = 'DEV' ]; then
echo "Running Development Server"
exec python "/app/identidock.py"
elif [ "$ENV" = 'UNIT' ]; then
echo "Running Unit Tests"
exec python "tests.py"
else
echo "Running Production Server"
exec uwsgi --http 0.0.0.0:9090 --wsgi-file /app/identidock.py \
--callable app --stats 0.0.0.0:9191
fi
This script decides how to run the application depending on how the ENV environment variable is set. The idea here is to allow us to use the same image in development, testing, and production. This approach is no longer recommended as it leads to development tooling being present in the production environment. Even though the development tooling isn’t run in production, it is still bloating the image and is potentially exploitable by attackers. Therefore, we will use a different approach and break the Dockerfile into separate development and production images.
Let’s skip to the final Dockerfile for our image and walk through the changes made. These changes address multiple issues, beyond just having multiple images, and are based on the Chainguard Academy guide to Python images.
Replace the Dockerfile with this one (this is also available on the “main” branch):
FROM cgr.dev/chainguard/python:latest-dev as dev
ENV LANG=C.UTF-8
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PATH="/app/venv/bin:$PATH"
WORKDIR /app
RUN python -m venv /app/venv
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY app /app
EXPOSE 5000
ENTRYPOINT ["python"]
CMD ["identidock.py"]
FROM cgr.dev/chainguard/python
WORKDIR /app
ENV PYTHONUNBUFFERED=1
ENV PATH="/app/venv/bin:$PATH"
COPY app/identidock.py ./
COPY --from=dev /app/venv /app/venv
EXPOSE 9090
ENTRYPOINT [ "gunicorn", "-b", "0.0.0.0:9090", "identidock:app" ]
And add the file requirements.txt to the current directory (identidock) with the following contents:
Flask==3.0.2
requests==2.31.0
redis==5.0.3
gunicorn==21.2.0
The first thing to notice is that we have a multistage build now. If you want the development image rather than the production one, you can specify it during docker build:
docker build --pull --target dev -t identidock:dev .
Otherwise, you will get the production image only.
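For example, you could give the development image a quick try like this (a sketch that assumes the development server binds to 0.0.0.0 on port 5000, as in the original application; the app will still need the dnmonster and redis services to be running before it can serve identicons):
docker run -d -p 5000:5000 identidock:dev
curl localhost:5000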
There are several more environment variables defined. These prevent the creation of Python bytecode files and the buffering of output. For more detail on why this is useful, see the blog PYTHONDONTWRITEBYTECODE and PYTHONUNBUFFERED Explained.
The installation of pip modules has moved to the requirements.txt file. The main thinking here is that we don’t need to update the Dockerfile each time a dependency is updated or changed.
The development server runs on port 5000, while the production server runs on port 9090. We could edit this so they both run on the same port, but this approach reduces the chance of accidentally running the development server in production. The development server is started directly from the entrypoint, so we are no longer dependent on an entrypoint script, simplifying our architecture.
To get a minimal, clean production install, we are using a Python virtual environment (venv) in the development image to isolate all dependencies, which are then copied over to the production image. Finally, the production image has been changed to use gunicorn as uwsgi has entered “maintenance mode”.
Build the final image:
docker build --pull -t identidock-cg .
And take a look at it:
docker images identidock-cg
REPOSITORY TAG IMAGE ID CREATED SIZE
identidock-cg latest 30a424aa8a51 15 seconds ago 93.6MB
Run a scan with grype:
grype docker:identidock-cg
✔ Vulnerability DB [updated]
✔ Loaded image identidock-cg:latest
✔ Parsed image sha256:30a424aa8a51548491ea3c152383ce2c16a801859e9e8e252b147162ac0b637e
✔ Cataloged contents 40024c9c378c73d52c9e62f32fcf6d0d328d8d14a66a328df8d6fc0278a4a2e8
├── ✔ Packages [46 packages]
├── ✔ File digests [2,020 files]
├── ✔ File metadata [2,020 locations]
└── ✔ Executables [141 executables]
✔ Scanned for vulnerabilities [0 vulnerability matches]
├── by severity: 0 critical, 0 high, 0 medium, 0 low, 0 negligible
└── by status: 0 fixed, 0 not-fixed, 0 ignored
The result of all these changes is that the production image is only 93.6 MB (down from 1.05GB, so an enormous saving of just under a GB) and has 0 CVEs (down from 980). This is a huge improvement!
Further information on using Chainguard Images with Python can be found in our Getting Started guide.
Replacing the Redis Image and Updating the Docker Compose File
Updating Redis is straightforward. We’re not making any changes to the application image, so all we need to do is directly update the reference to redis:7 in the Docker Compose file to cgr.dev/chainguard/redis. The new image requires no extra configuration, but we go from a 139 MB image with 137 vulnerabilities to a 23 MB image with 0 CVEs (again according to Grype).
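If you want to check those numbers for yourself (sizes and CVE counts will drift over time as the image is rebuilt), you can pull and scan the image directly:
docker pull cgr.dev/chainguard/redis
docker images cgr.dev/chainguard/redis
grype cgr.dev/chainguard/redis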
To update Compose, in the top-level directory of the repo, replace the content of the docker-compose.yml file with the following:
name: identidock
services:
  frontend:
    build: identidock
    ports:
      - "9090:9090"
  dnmonster:
    build: dnmonster
  redis:
    image: cgr.dev/chainguard/redis
If you now run docker compose up --build, you should have a working application that can be reached on port 9090.
There are some differences between this version and the original. The environment variable used for switching between image variants has been removed, and the ports have changed to match the port that gunicorn is configured to listen on.
This Compose file doesn’t contain support for a development workflow currently – ideally we would be able to quickly iterate on our code without building a new image. The original file used volumes to achieve this, but this isn’t something we want to do with the production image. One solution is to have a separate development Compose file, which will build the development image and use a volume to mount code at runtime for immediate feedback. New versions of Docker also support Compose Watch which can be a more efficient and granular solution than volume mounts. See What is Docker Compose Watch and what problem does it solve? for an introductory tutorial on using Compose Watch.
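As a rough sketch of what a development-oriented Compose file using Compose Watch could look like (only the frontend service is shown, and the target stage, paths, and sync action are illustrative rather than part of the repository):
services:
  frontend:
    build:
      context: identidock
      target: dev
    ports:
      - "5000:5000"
    develop:
      watch:
        - action: sync+restart
          path: ./identidock/app
          target: /app
Running docker compose watch (or docker compose up --watch) would then sync source changes into the running container and restart it, without requiring a full image rebuild.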
Conclusion
Porting our application to Chainguard Images was relatively straightforward. There were some gotchas around differences from other images, such as different entrypoint settings and package names. The largest part of the puzzle was moving from single-image builds to multi-stage builds that take advantage of the minimal Chainguard runtime images for Python and Node.js. Once all this was done, we ended up with a much smaller set of images and a drastically reduced number of CVEs.
Cleaning Up
To clean up the resources used in this guide, first stop any containers started by Compose:
docker compose down
Then check docker ps to see if any containers are still running and stop them with docker stop <id>.
You can then remove any images you’ve built:
docker rmi -f dnmonster dnmonster-cg dnmonster-multi identidock identidock:dev identidock-cg