Preparing a consistent Python environment
Building QEMU is a complex task, split across several programs.
the configure
script finds the host and cross compilers that are needed
to build emulators and firmware; Meson prepares the build environment
for the emulators; finally, Make and Ninja actually perform the build,
and in some cases they run tests as well.
In addition to compiling C code, many build steps run tools and scripts which are mostly written in the Python language. These include processing the emulator configuration, code generators for tracepoints and QAPI, extensions for the Sphinx documentation tool, and the Avocado testing framework. The Meson build system itself is written in Python, too.
Some of these tools are run through the python3
executable, while others
are invoked directly as sphinx-build
or meson
, and this can create
inconsistencies. For example, QEMU’s configure
script checks for a
minimum version of Python and rejects too-old interpreters. However,
what would happen if code run by Sphinx used a different version?
This situation has been largely hypothetical until recently; QEMU’s
Python code is already tested with a wide range of versions of the
interpreter, and it would not be a huge issue if Sphinx used a different
version of Python as long as both of them were supported. This will
change in version 8.1 of QEMU, which will bump the minimum supported
version of Python from 3.6 to 3.8. While all the distros that QEMU
supports have a recent-enough interpreter, the default on RHEL8 and
SLES15 is still version 3.6, and that is what all binaries in /usr/bin
use unconditionally.
As of QEMU 8.0, even if configure
is told to use /usr/bin/python3.8
for the build, QEMU’s custom Sphinx extensions would still run under
Python 3.6. configure does separately check that Sphinx is executing
with a new enough Python version, but it would be nice if there were
a more generic way to prepare a consistent Python environment.
This post will explain how QEMU 8.1 will ensure that a single interpreter is used for the whole of the build process. Getting there will require some familiarity with Python packaging, so let’s start with virtual environments.
Virtual environments
It is surprisingly hard to find what Python interpreter a given script
will use. You can try to parse the first line of the script, which will
be something like #! /usr/bin/python3
, but there is no guarantee of
success. For example, on some version of Homebrew /usr/bin/meson
will be a wrapper script like:
#!/bin/bash
PYTHONPATH="/usr/local/Cellar/meson/0.55.0/lib/python3.8/site-packages" \
exec "/usr/local/Cellar/meson/0.55.0/libexec/bin/meson" "$@"
The file with the Python shebang line will be hidden somewhere in
/usr/local/Cellar
. Therefore, performing some kind of check on the
files in /usr/bin
is ruled out. QEMU needs to set up a consistent
environment on its own.
If a user who is building QEMU wanted to do so, the simplest way would
be to use Python virtual environments. A virtual environment takes an
existing Python installation but gives it a local set of Python packages.
It also has its own bin
directory; place it at the beginning of your
PATH
and you will be able to control the Python interpreter for scripts
that begin with #! /usr/bin/env python3
.
Furthermore, when packages are installed into the virtual environment
with pip
, they always refer to the Python interpreter that was used to
create the environment. Virtual environments mostly solve the consistency
problem at the cost of an extra pip install
step to put QEMU’s build
dependencies into the environment.
Unfortunately, this extra step has a substantial downside. Even though
the virtual environment can optionally refer to the base installation’s
installed packages, pip
will always install packages from scratch
into the virtual environment. For all Linux distributions except RHEL8
and SLES15 this is unnecessary, and users would be happy to build QEMU
using the versions of Meson and Sphinx included in the distribution.
Even worse, pip install
will access the Python package index (PyPI)
over the Internet, which is often impossible on build machines that
are sealed from the outside world. Automated installation of PyPI
dependencies may actually be a welcome feature, but it must also remain
strictly optional.
In other words, the ideal solution would use a non-isolated virtual
environment, to be able to use system packages provided by Linux
distributions; but it would also ensure that scripts (sphinx-build
,
meson
, avocado
) are placed into bin
just like pip install
does.
Distribution packages
When it comes to packages, Python surely makes an effort to be confusing.
The fundamental unit for importing code into a Python program is called
a package; for example os
and sys
are two examples of a package.
However, a program or library that is distributed on PyPI consists
of many such “import packages”: that’s because while pip
is usually
said to be a “package installer” for Python, more precisely it installs
“distribution packages”.
To add to the confusion, the term “distribution package” is often shortened to either “package” or “distribution”. And finally, the metadata of the distribution package remains available even after installation, so “distributions” include things that are already installed (and are not being distributed anywhere).
All this matters because distribution metadata will be the key to
building the perfect virtual environment. If you look at the content
of bin/meson
in a virtual environment, after installing the package
with pip
, this is what you find:
#!/home/pbonzini/my-venv/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
from mesonbuild.mesonmain import main
if __name__ == '__main__':
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
sys.exit(main())
This looks a lot like automatically generated code, and in fact it is;
the only parts that vary are the from mesonbuild.mesonmain import main
import, and the invocation of the main()
function on the last line.
pip
creates this invocation script based on the setup.cfg
file
in Meson’s source code, more specifically based on the following stanza:
[options.entry_points]
console_scripts =
meson = mesonbuild.mesonmain:main
Similar declarations exist in Sphinx, Avocado and so on, and accessing their
content is easy via importlib.metadata
(available in Python 3.8+):
$ python3
>>> from importlib.metadata import distribution
>>> distribution('meson').entry_points
[EntryPoint(name='meson', value='mesonbuild.mesonmain:main', group='console_scripts')]
importlib
looks up the metadata in the running Python interpreter’s
search path; if Meson is installed under another interpreter’s site-packages
directory, it will not be found:
$ python3.8
>>> from importlib.metadata import distribution
>>> distribution('meson').entry_points
Traceback (most recent call last):
...
importlib.metadata.PackageNotFoundError: meson
So finally we have a plan! configure
can build a non-isolated virtual
environment, use importlib
to check that the required packages exist
in the base installation, and create scripts in bin
that point to the
right Python interpreter. Then, it can optionally use pip install
to
install the missing packages.
While this process includes a certain amount of
specialized logic, Python provides a customizable venv
module to create virtual
environments. The custom steps can be performed by subclassing
venv.EnvBuilder
.
This will provide the same experience as QEMU 8.0, except that there will
be no need for the --meson
and --sphinx-build
options to the
configure
script. The path to the Python interpreter is enough to
set up all Python programs used during the build.
There is only one thing left to fix…
Nesting virtual environments
Remember how we started with a user that creates her own virtual
environment before building QEMU? Well, this would not work
anymore, because virtual environments cannot be nested. As soon
as configure
creates its own virtual environment, the packages
installed by the user are not available anymore.
Fortunately, the “appearance” of a nested virtual environment is easy
to emulate. Detecting whether python3
runs in a virtual environment
is as easy as checking sys.prefix != sys.base_prefix
; if it is,
we need to retrieve the parent virtual environments site-packages
directory:
>>> import sysconfig
>>> sysconfig.get_path('purelib')
'/home/pbonzini/my-venv/lib/python3.11/site-packages'
and write it to a .pth
file in the lib
directory of the new virtual
environment. The following demo shows how a distribution package in the
parent virtual environment will be available in the child as well:
A small detail is that configure
’s new virtual environment should
mirror the isolation setting of the parent. An isolated venv can be
detected because sys.base_prefix in site.PREFIXES
is false.
Conclusion
Right now, QEMU only makes a minimal attempt at ensuring consistency
of the Python environment; Meson is always run using the interpreter
that was passed to the configure script with --python
or $PYTHON
,
but that’s it. Once the above technique will be implemented in QEMU 8.1,
there will be no difference in the build experience, but configuration
will be easier and a wider set of invalid build environments will
be detected. We will merge these checks before dropping support for
Python 3.6, so that users on older enterprise distributions will have
a smooth transition.