Building Minimal Docker Containers for Python Applications
This post was last updated September 19th, 2019.
A best practice when creating Docker containers is keeping the image size to a minimum. The fewer bytes you have to shunt over the network or store on disk, the better. Keeping the size down generally means it is faster to build and deploy your container.
Each container should contain the application code, language-specific dependencies, OS dependencies and that’s it. Any more is a waste and a potential security issue. If you have tools like gcc inside a container that is deployed to production, then an attacker with shell access can easily build tools to access other internal systems. Having layers of security minimises the damage one attack can cause.
Python in Docker
I was recently working on a Python webserver. The requirements.txt looked something like:
Flask>=1.1.1,<1.2
flask-restplus>=0.13,<0.14
Flask-SSLify>=0.1.5,<0.2
Flask-Admin>=1.5.3,<1.6
gunicorn>=19,<20
Fat image
If you search Google you will find examples of Dockerfiles that look like:
FROM python:3.7
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["gunicorn", "-w 4", "main:app"]
This container image weighs in at 958MB!!
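If you want to check the size yourself, you can build the image and ask Docker to list it (the myapp tag below is just an example name):

docker build -t myapp .
docker images myapp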
If you’re like me, then you’re scratching your head and wondering, “This is just a simple Python web app, why is it that big?” Let’s find a way to reduce that.
Alpine
Minimalism is important, but too small can be harmful as well. You could build every container from scratch, but that means you have no shell or basic tools like cat and find unless you provide them yourself. That very quickly becomes tedious and distracts from getting code in front of customers as fast as possible (one of our mantras). I have found that a pragmatic balance is using a base image such as Alpine. At the time of writing, the latest Alpine image (v3.10) weighs in at 5.58MB, which is very respectable. You also get a minimal POSIX environment with which to build your application.
FROM python:3.7-alpine
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["gunicorn", "-w 4", "main:app"]
Building this container results in an image size of 139MB. Of this, the base image is 98.7MB (at time of writing). That means that our app is responsible for the additional 40.3MB.
It is important to note that Alpine uses musl instead of glibc, which means that some prebuilt Python wheels (built against glibc) will not install and pip has to compile those packages from source instead.
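When pip does have to compile a package from source on Alpine, the build will fail unless a compiler and the relevant headers are available. A common workaround, sketched below with example package names (the exact apk packages depend on which of your dependencies need compiling), is to install the build tools for the duration of the pip install and remove them in the same layer so they never ship to production:

# Install temporary build dependencies, install the Python packages, then remove the toolchain
RUN apk add --no-cache --virtual .build-deps gcc musl-dev \
 && pip install -r requirements.txt \
 && apk del .build-deps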
Development
Those with keen eyes and Docker experience will see an issue with the Dockerfile above. Every time we make a change to our source code and rebuild the container, the dependencies will be re-downloaded and re-installed. This is not good — it takes too much time to do iterative development. Let’s rewrite the Dockerfile to take advantage of layer caching.
Layer caching
FROM python:3.7-alpine
COPY requirements.txt /
RUN pip install -r /requirements.txt
COPY src/ /app
WORKDIR /app
CMD ["gunicorn", "-w 4", "main:app"]
Rewriting our Dockerfile this way makes use of Docker’s layer caching and skips installing Python requirements if the requirements.txt file does not change.
This makes our build fast, but it has no impact on the overall image size.
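If you are curious where the bytes actually go, you can inspect the size of each layer (again assuming the image is tagged myapp):

docker history myapp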
Cached dependencies
If you look closely at the output of the Docker build above, you should see something along the lines of:
Building wheels for collected packages: Flask-SSLify, Flask-Admin, MarkupSafe, pyrsistent
Building wheel for Flask-SSLify (setup.py): started
Building wheel for Flask-SSLify (setup.py): finished with status 'done'
Created wheel for Flask-SSLify: filename=Flask_SSLify-0.1.5-cp37-none-any.whl size=2439 sha256=97d9f3687a0ead6056c0d5472e506cf01c5bbfa7e688964c96072653aa581ede
Stored in directory: /root/.cache/pip/wheels/f6/be/7c/b262753258e34b3f07ec47973038f199c34678985b9614a50d
Building wheel for Flask-Admin (setup.py): started
Building wheel for Flask-Admin (setup.py): finished with status 'done'
Created wheel for Flask-Admin: filename=Flask_Admin-1.5.3-cp37-none-any.whl size=1853777 sha256=04068d272f06c802ff0288fa5727ea01a90f5a8ce8d9f545f945c9e24207fc31
Stored in directory: /root/.cache/pip/wheels/6f/ca/26/3dcc4b3286ed103ef9328b856221a9881188653c5d38ac73db
Building wheel for MarkupSafe (setup.py): started
Building wheel for MarkupSafe (setup.py): finished with status 'done'
Created wheel for MarkupSafe: filename=MarkupSafe-1.1.1-cp37-none-any.whl size=12629 sha256=32853345d5291f8c97218a4ca0474098a69680961306192205366a277fc1141e
Stored in directory: /root/.cache/pip/wheels/f2/aa/04/0edf07a1b8a5f5f1aed7580fffb69ce8972edc16a505916a77
Building wheel for pyrsistent (setup.py): started
Building wheel for pyrsistent (setup.py): finished with status 'done'
Created wheel for pyrsistent: filename=pyrsistent-0.15.4-cp37-cp37m-linux_x86_64.whl size=56384 sha256=c960e45578b3a33a35094111af9445ebb96287038841438c83e01c9cc1df63d4
Stored in directory: /root/.cache/pip/wheels/bb/46/00/6d471ef0b813e3621f0abe6cb723c20d529d39a061de3f7c51
Successfully built Flask-SSLify Flask-Admin MarkupSafe pyrsistent
While pip install was running, it also stored a copy of the dependencies it downloaded in /root/.cache. This cache is useful when doing local development outside of Docker, but inside the image it is wasted space that the application will never touch. This directory takes up 8.3MB of our 40.3MB ‘app’ image. Let’s eliminate it by taking advantage of another Docker feature: multistage builds.
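You can confirm how much space the cache occupies by running du against that directory inside the image built above (assuming it is tagged myapp; Alpine’s busybox du supports -sh):

docker run --rm myapp du -sh /root/.cache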
Multistage builds
Docker 17.05 added support for multistage builds. This means that dependencies can be built in one image and then imported into another. Rewriting our Dockerfile to use multistage builds now looks like:
FROM python:3.7-alpine as base

FROM base as builder
RUN mkdir /install
WORKDIR /install
COPY requirements.txt /requirements.txt
RUN pip install --install-option="--prefix=/install" -r /requirements.txt

FROM base
COPY --from=builder /install /usr/local
COPY src /app
WORKDIR /app
CMD ["gunicorn", "-w 4", "main:app"]
This Docker container is 125MB, with the compiled Python dependencies weighing in at 31.0MB (I worked this out by running du -h /install from within the builder container).
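If you want to reproduce that measurement, one way (a sketch, with myapp-builder as an example tag) is to build only the builder stage and run du inside it:

docker build --target builder -t myapp-builder .
docker run --rm myapp-builder du -sh /install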
125MB is a significant improvement over the 958MB that we started with!
Further down the rabbit hole
It is definitely possible to reduce the image size further, for example by installing Python from Alpine’s own package repository instead of building on the python base image, and by removing extraneous files like docs and tests at container build time. I was able to get the image size to less than 70MB. But is it worth it? If you’re doing many deploys per day to the same set of VMs, then there is a high chance that the majority of the layers that make up the image will already be cached on disk, meaning that there are diminishing returns as the image size approaches zero. Additionally, the resulting Dockerfile was complex and not simple to grok (simplicity is king). Pragmatism is important: letting someone else shoulder the maintenance of the base image allows you to focus on the business problem.
Conclusion
Docker is a powerful tool that allows us to bundle up our application alongside its language and OS dependencies. This is extremely valuable when we roll the image out to production, as it guarantees that the image we tested is the image that runs in production.
The Docker build system allows us to create images that are very large if written naively but also small, lightweight, and cacheable if done correctly.
Real Kinetic helps companies get the most value out of containers and improve their cloud architecture. Contact us to learn more.