Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away ― Antoine de Saint-Exupéry, Airman's Odyssey

Sharing the Python dependencies meme is always funny until something bad related to Python dependencies happens to you and the meme becomes reality. One month ago (December 2023), the Alpine Linux image, one of the most popular Docker images for building applications with over 1 billion downloads, released version 3.19. This new version added support for Python's PEP 668, which, unfortunately, broke many pipelines.

Figure 1: Dependency madness in Python, from https://xkcd.com/1987/

PEP 668

PEP 668, "Marking Python base environments as 'externally managed'", lets a distro tell tools like pip not to install packages into the system Python. Several Linux distros have started to adopt it in their latest versions, as Alpine did in December. Users who keep using the "latest" tag in their images will encounter an error like this:

Figure 2: Standard PEP 668 error

The error above was generated by a standard Dockerfile with common commands:

FROM alpine:latest
RUN apk add --update py-pip
RUN pip install fastapi

So what happened? If your pipeline was working well and nothing changed recently, how did it break? These kinds of problems are hard to debug because nothing in your own repository changed; the root cause lives in the base image. Let's dig deeper.

Virtualenvs

Virtualenv (like the standard-library venv module) is a tool for creating isolated Python environments, each with its own set of installed packages; in Python, it is the recommended way to manage external dependencies. However, some developers using Docker skip virtualenvs and install packages directly into the system Python, as in the Dockerfile above.
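A minimal sketch of what an isolated environment means in practice, using only the standard-library venv module (an alternative to the third-party virtualenv tool). The helper names create_env and runs_in_venv are hypothetical, chosen for this example, and pip is deliberately not installed into the environment to keep it fast and offline.

```python
import os
import subprocess
import tempfile
import venv


def create_env(env_dir: str) -> str:
    """Create a virtual environment and return the path of its interpreter."""
    # with_pip=False keeps the sketch fast and offline; real projects usually want pip.
    # Symlinking the base interpreter is the usual choice on POSIX systems.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)


def runs_in_venv(python: str) -> bool:
    """Ask the given interpreter whether it considers itself inside a venv."""
    out = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = create_env(os.path.join(tmp, "demo-venv"))
        print(python, runs_in_venv(python))
```

The comparison sys.prefix != sys.base_prefix used here is the same one pip relies on later in this article to detect whether it is running inside a virtual environment.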

The issues

There are two main issues when a package is installed without a virtualenv. The first is an operating-system problem and the second is a versioning problem.

The operating-system issue stems from how distros depend on Python libraries for their own tools to work correctly (see Figure 3). For instance, on Ubuntu the update-manager package, the GUI for installing system updates, depends on python3-yaml. That YAML package is installed with apt as a system package (under /usr/lib/python3/dist-packages).

Figure 3: Python lib structure on Ubuntu

Consider a user who runs pip as root, like this:

sudo pip install pyyaml

This is really bad practice, because the YAML package is installed into a system-wide location shared with the distro's own packages (on Ubuntu, /usr/local/lib/python3.x/dist-packages, which comes before the apt-managed copy on the import path). At best, system tools silently start using a version they were never tested with; at worst, the system breaks due to incompatibilities or some files get corrupted.

The second issue is dependency incompatibility. Running pip without sudo installs packages into the user site directory (~/.local/lib/python3.x/site-packages), which does not break the system but is still considered bad practice: every project run by that user shares a single set of packages. For example, requests depends on urllib3; if another installed package needs a different urllib3 version, upgrading either one can leave the other with an incompatible dependency.
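To see which installed packages share a dependency such as urllib3 in the current environment, a sketch based on the standard-library importlib.metadata module can list every distribution that declares it as a requirement. The helpers below are hypothetical and use a simplified regex instead of a full PEP 508 parser; packages that vendor their own copy of a library (pip does this with urllib3) will not appear.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """PEP 503 name normalization, e.g. 'Foo_Bar' -> 'foo-bar'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req: str) -> str:
    """Extract the project name from a requirement like 'urllib3<3,>=1.21.1'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return normalize(match.group(1)) if match else ""


def dependents_of(target: str) -> dict:
    """Return {installed package: requirement} for packages that require target."""
    wanted = normalize(target)
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        for req in dist.requires or []:
            # Skip optional dependencies that only apply to an "extra".
            if re.search(r"extra\s*==", req):
                continue
            if requirement_name(req) == wanted:
                found[name] = req
    return found


if __name__ == "__main__":
    # Every package listed here shares the same installed copy of urllib3.
    for pkg, req in dependents_of("urllib3").items():
        print(f"{pkg}: {req}")
```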

The current state

Both issues are well known, but until recently there was no strong mechanism to prevent users from causing them. PEP 668 changed that. It works in two parts:

  1. The distro adds a marker file named EXTERNALLY-MANAGED to the Python standard library directory, for example /usr/lib/python3.x/EXTERNALLY-MANAGED. Debian's file looks like this:
[externally-managed]
Error=To install Python packages system-wide, try apt install
 python3-xyz, where xyz is the package you are trying to
 install.

 If you wish to install a non-Debian-packaged Python package,
 create a virtual environment using python3 -m venv path/to/venv.
 Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
 sure you have python3-full installed.

 If you wish to install a non-Debian packaged Python application,
 it may be easiest to use pipx install xyz, which will manage a
 virtual environment for you. Make sure you have pipx installed.

 See /usr/share/doc/python3.9/README.venv for more information.

This file tells Python-level installers such as pip that the environment is managed by the system package manager, so they should not install anything there; the Error text is the message shown to the user.

  2. pip performs the following checks before installing:

  • It checks that the install is running outside a virtual environment; inside a venv, sys.prefix points to the venv, so the condition is:
sys.prefix == sys.base_prefix
  • It checks whether an EXTERNALLY-MANAGED file exists in the standard library directory of the default install scheme, which is obtained with:
sysconfig.get_path("stdlib", sysconfig.get_default_scheme())

As you can see, the mechanism involves both sides: the distro has to ship the marker file and pip has to look for it. pip added this check in version 23.0; however, not all distros ship the EXTERNALLY-MANAGED file yet, which is probably why some Docker images still work. With this context in mind, let's return to the Dockerfile that raised the exception.

FROM alpine:latest
RUN apk add --update py-pip
RUN pip install fastapi

This Dockerfile uses the "latest" tag, so each build pulls the newest Alpine release, which since December (3.19) includes the EXTERNALLY-MANAGED file in the Python system folder. Since each RUN command is executed as root and outside any virtualenv, RUN pip install fastapi tries to install FastAPI and its dependencies into the Python system folder; pip finds the marker file and aborts with the exception.

Solutions

There are three options to make it work again:

  • The first option is a workaround and might still break things inside the image. However, if it hasn't caused issues in the image before, it will probably keep working. So, if all your pipelines are on fire and you need a fix ASAP, it can serve as a temporary solution while you work on the final one. The workaround is to install the dependencies while ignoring the error, using the --break-system-packages flag.
FROM alpine:latest
RUN apk add --update py-pip
RUN pip install fastapi --break-system-packages
  • The second option is to avoid the bad practice of using the “latest” tag and pin an older version. Eventually you will need to update and find a proper solution, but at least it will work in the meantime. Version 3.18.6 does not ship the EXTERNALLY-MANAGED file yet.
FROM alpine:3.18.6
RUN apk add --update py-pip
RUN pip install fastapi
  • The third option is the best approach in terms of maintainability. It pins a specific version number instead of the “latest” tag, avoiding unexpected behavior from pulling in new releases. Additionally, it follows best practices by using a virtualenv, as recommended by PEP 668.
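FROM alpine:3.19
# Location of the virtual environment inside the image
ENV MY_ENV=/opt/my-venv
RUN apk add --update py-pip
# Create the venv and upgrade the venv's own pip and setuptools (not the system ones)
RUN python3 -m venv $MY_ENV \
	&& $MY_ENV/bin/pip install -U pip setuptools
# Put the venv first on PATH so "python" and "pip" resolve to it
ENV PATH="${MY_ENV}/bin:$PATH"
RUN pip install fastapi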

Finally, I would recommend using tools such as Poetry, Hatch, or whichever Python dependency manager you like; they provide features that pip does not have, such as managing the virtualenv for you.

Conclusion

Fixing these types of problems is not easy. The best takeaway is to avoid these weird bugs as much as you can, and adopting good practices from the start is the best way to do that. Nevertheless, it is a cyclical process: each time you run into a situation like this, share it with your team and make sure it is fixed with the best approach for the future.