As part of a partnership with the eBPF Foundation, Bootlin engineers Bastien Curutchet and Alexis Lothoré are working with the kernel community to improve several aspects of eBPF support in the kernel. This post is the first of a series highlighting this effort. If you need to catch up on the eBPF technology, take a look at our “Linux Debugging, tracing and profiling” training course, which has recently been updated with eBPF basics!
eBPF testing in the kernel
While eBPF has been supported in the Linux kernel for many years, it keeps receiving a lot of new features and fixes (for example, the 6.14 release cycle brought more than 300 new commits to the eBPF subsystem). As a consequence, and as for any other part of the kernel, it is critical for both developers and maintainers to be able to check that new changes do not introduce regressions. This need is even more pressing as some parts of the eBPF subsystem, like the verifier or the architecture-specific JIT modules, can be very complex, even for experienced developers. To address this, the eBPF subsystem is covered by a set of “selftests” living directly in the kernel source tree. Those selftests are composed of multiple parts, the most important ones being:
- eBPF programs designed to exercise some specific parts of the subsystem
- userspace programs, either in C or bash, that manipulate the eBPF programs and actually run the stimuli and the corresponding checks
- Makefiles to build all of those components
- a vmtest.sh script which facilitates running the tests: it is able to load a kernel configuration tailored for the tests, build the kernel, build the tests, fetch a root filesystem, start a virtual machine with the freshly built kernel, and then run the tests. This script is even capable of handling different architectures for the emulated platform.
So one can very easily start hacking the kernel source code and check that the eBPF core still works as expected with just a few commands:
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git linux
$ cd linux
$ # Add some code to add a new feature in the eBPF core
$ ./tools/testing/selftests/bpf/vmtest.sh # Ensure that we did not break anything
Automated testing for upstream contributions
While those tests form a strong basis for detecting regressions, their real strength lies in the continuous integration automation set up between the eBPF mailing list, the Patchwork framework used by maintainers to handle contributions, and a dedicated GitHub repository able to run specific GitHub Actions. Together, they allow systematic testing of any code sent by developers for integration into the eBPF subsystem. The basic workflow is the following:
- a developer implements a new feature or a fix, and wants to have it integrated in the upstream kernel
- he prepares his commits and sends them to bpf@vger.kernel.org for review
- once the series is posted, the GitHub testing repository automatically picks up the series, rebases it on top of the current eBPF integration tree, and opens a pull request with it.
- this pull request then automatically runs a matrix of tests on the submitted code. Those tests replicate the behavior described above for vmtest.sh, for different architectures (currently, the automation supports x86, arm64 and s390) and different compiler flavors and versions.
- once all tests have been executed, the series status is updated on Patchwork to allow maintainers to quickly check the overall impact of the series on the existing codebase. The contributor also receives the CI run status by mail.
This whole automation allows maintainers and developers to detect regressions as early as possible (before the code is integrated), without having to manually re-test all eBPF features, which would obviously be a huge overhead in the contribution process. With great power comes great responsibility though: contributors sending new code upstream are expected to provide the tests covering the code they want to integrate, to make sure that it keeps being tested whenever it is impacted by future modifications.
It is also worth noting that any individual can manually trigger a CI run on his code, without having to send a series to the mailing list. The only requirement is to have a GitHub account; then he can:
- Fork the official testing repository on his GitHub account
- Push the code to be tested to the fork, on a dedicated branch (let’s call it code_under_test)
- Manually open a pull request, asking to integrate the commits from the code_under_test branch into the bpf-next_base branch
This pull request will not be monitored by the eBPF maintainers, as the official way to contribute code is through the mailing lists. However, it will trigger all the CI automation, similarly to the pull requests automatically created for the series sent to the mailing list.
The generic test runners
Unfortunately, the CI automation is not able to run all the tests present in the tools/testing/selftests/bpf directory of the kernel source tree. The entry point to run tests in CI is a set of generic “runners”, which are nothing more than generic programs able to receive specific commands and configuration to:
- list all supported tests
- run a specific list of tests, or filter out some specific tests
- configure tests parallelization
- generate test logs
- etc
The main test runner for eBPF tests is test_progs, but there are also more specialized runners for more specific topics, like test_maps, test_verifier or veristat. If you take a look at the logs generated by the CI automation, you will see for example that vmtest actually runs test_progs and monitors its results.
The direct consequence of this organization is that any test, to have a chance to run in CI, must be integrated into one of the test runners. This is generally an easy task when creating a new test, thanks to all the hard work already handled by the test runners and the corresponding makefiles. But for historical or technical reasons, many tests are not integrated this way: they are generally made of custom scripts and standalone binaries, ingesting their own set of specific inputs and producing their own test output format.
Those tests, while needed for the features they cover, are not as useful as they could be, since they have to be run manually to check that the targeted features work as expected. This is where Bootlin engineers have jumped in, at the request of the eBPF Foundation, to sort out those standalone tests: each of them must be properly converted and integrated into the generic test runners so that it can be executed automatically on series sent upstream.
Increasing the automated testing coverage
Converting those tests involved pretty much the same process each time. Most of them must be integrated into the test_progs runner, which is a C program. The makefile for test_progs automatically recognizes functions with a specific pattern in their name, making it easy to create new tests. It is also able to automatically build the needed eBPF programs and generate the corresponding libbpf skeletons to be used by the userspace part of the test. To convert a test to the test_progs runner, we must then:
- put the eBPF program(s) needed by the test in the progs directory
- create the actual test (the userspace part) in the prog_tests directory
- it is a C file with at least one function prefixed by test_ (or serial_test_ for tests which can not run in parallel), which makes it automatically integrated into the final test_progs binary
- it is able to use some test_progs helpers, as well as some network helpers, for example to configure a specific network setup
- move the content from the “standalone” test into the new test file (a minimal sketch of the result is shown right after this list)
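To make this more concrete, here is roughly what such a converted test could look like. This is only a hypothetical sketch, not a test from the kernel tree: the test_example name, its hook point and the checked value are invented for the illustration; only the general structure (a skeleton-based eBPF object, the ASSERT_* helpers and the test_ prefix) follows the usual test_progs conventions. The eBPF side, placed in the progs directory, could be a tiny program counting how many times its hook is hit:

// SPDX-License-Identifier: GPL-2.0
/* Hypothetical progs/test_example.c */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

int hits; /* lives in .bss, visible to userspace through the skeleton */

SEC("tp/syscalls/sys_enter_getpid")
int count_getpid(void *ctx)
{
	__sync_fetch_and_add(&hits, 1);
	return 0;
}

char _license[] SEC("license") = "GPL";

The corresponding userspace part, dropped into the prog_tests directory, then opens the generated skeleton, attaches the program, triggers the hook and checks the result:

// SPDX-License-Identifier: GPL-2.0
/* Hypothetical prog_tests/example.c */
#include <unistd.h>
#include <sys/syscall.h>
#include <test_progs.h>
#include "test_example.skel.h" /* skeleton generated from progs/test_example.c */

/* The test_ prefix is enough for the test_progs machinery to pick this
 * function up and expose it as an individual test */
void test_example(void)
{
	struct test_example *skel;

	/* Open and load the eBPF object through its generated skeleton */
	skel = test_example__open_and_load();
	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
		return;

	/* Attach the program to its tracepoint */
	if (!ASSERT_OK(test_example__attach(skel), "skel_attach"))
		goto cleanup;

	/* Trigger the code path under test... */
	syscall(SYS_getpid);

	/* ...and check the result exposed by the eBPF side */
	ASSERT_GT(skel->bss->hits, 0, "hits");

cleanup:
	test_example__destroy(skel);
}

Once both files are in place, the test_progs makefiles take care of compiling the eBPF object, generating the test_example.skel.h skeleton and linking the new function into the test_progs binary.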
While the overall process looks quite simple, this task obviously came with some challenges and peculiarities for each test:
- some tests rely on a specific hardware setup, which is not available in the CI environment: the tests executed by test_progs run in a virtual machine emulated by QEMU. For example, when test_progs tests depend on network interfaces, they generally use the loopback interface or a pair of virtual interfaces. Even with those solutions, and despite the conversion to test_progs, developers need to keep a version of some tests that can run on real hardware.
- some tests do not validate the code under test with a pure “functional” approach, but by gathering metrics (e.g. the network bandwidth measured on an interface) and comparing them to a threshold. This approach makes it difficult to get robust and repeatable results in CI, so the way those tests are validated has to be changed.
- the standalone tests are generally not designed to run in parallel with other tests (which is the case for tests in test_progs, to keep the execution time reasonable), so their default behavior often leads to conflicts and race conditions with other tests: for example, a test may blindly take full control of a resource (e.g. a specific network interface) which must be shared between all the tests. This kind of test then needs some work to be properly isolated, for example with network namespaces, as sketched right after this list.
- particular attention must be paid to CI performance when adding new tests: the time needed to run the CI tests must remain reasonable so that results come back quickly enough, so we must be careful about the overhead introduced by each converted test.
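As an illustration of the isolation point above, a converted test can create and enter a dedicated network namespace so that the virtual interfaces it sets up do not clash with other tests running in parallel. The snippet below is only a simplified, hypothetical sketch (the test, namespace and interface names are invented); it assumes the open_netns()/close_netns() helpers provided by the selftests’ network_helpers.h to switch in and out of the namespace:

// SPDX-License-Identifier: GPL-2.0
/* Hypothetical prog_tests/isolated_example.c */
#include <stdlib.h>
#include <test_progs.h>
#include "network_helpers.h"

#define NS_TEST "ns_isolated_example"

void test_isolated_example(void)
{
	struct nstoken *nstoken = NULL;

	/* Create a dedicated namespace and enter it: the interfaces created
	 * below will not be visible to other tests running in parallel */
	if (!ASSERT_OK(system("ip netns add " NS_TEST), "netns_add"))
		return;
	nstoken = open_netns(NS_TEST);
	if (!ASSERT_OK_PTR(nstoken, "open_netns"))
		goto cleanup;

	/* The new namespace starts empty: create the virtual interfaces that
	 * the original standalone script expected to find on the host */
	if (!ASSERT_OK(system("ip link add veth0 type veth peer name veth1"),
		       "create_veth_pair"))
		goto cleanup;

	/* ... load and attach the eBPF program(s) to veth0/veth1 and
	 * exercise the datapath as in any other test_progs test ... */

cleanup:
	/* Switch back to the initial namespace and delete the test one,
	 * which also removes the veth pair created inside it */
	if (nstoken)
		close_netns(nstoken);
	ASSERT_OK(system("ip netns del " NS_TEST), "netns_del");
}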
Some of those issues led to more head-scratching than others, but thanks to community feedback and some targeted refactoring, they have all been tackled, sometimes even leading to general improvements in the test runners.
Before the beginning of Bootlin’s work on the project, the kernel source tree contained around 34 shell scripts for tests (ignoring those performing benchmarking rather than functional testing, and vmtest.sh). At the time of writing this post, this number has been reduced to 22. This effort has been covered by 14 different series (not counting the various revisions), bringing around 76 new commits:
- [PATCH bpf-next v3 0/3] selftests/bpf: add coverage for xdp_features in test_progs
- [PATCH bpf-next v4 0/3] selftests/bpf: convert test_dev_cgroup to test_progs
- [PATCH bpf-next v4 0/4] selftests/bpf: convert three other cgroup tests to test_progs
- [PATCH v4 0/2] selftests/bpf: convert test_xdp_veth to test_progs framework
- [PATCH bpf-next v3 00/14] selftests/bpf: migrate test_flow_dissector.sh to test_progs
- [PATCH bpf-next v2 0/6] selftests/bpf: integrate test_tcp_check_syncookie.sh into test_progs
- [PATCH bpf-next v2 0/4] selftests/bpf: tc_links/tc_opts: Unserialize test
- [PATCH bpf-next v5 0/6] selftests/bpf: Migrate test_xdp_redirect_multi.sh to test_progs
- [PATCH bpf-next] selftests/bpf: Remove with_addr.sh and with_tunnels.sh
- [PATCH bpf-next v2 0/3] selftests: bpf: Migrate test_xdp_redirect.sh to test_progs
- [PATCH bpf-next 0/2] selftests: bpf: Migrate test_xdp_meta.sh to test_progs
- [PATCH] selftests/bpf: Move test_lwt_ip_encap to test_progs
- [PATCH bpf-next v2 00/10] selftests/bpf: Migrate test_tunnel.sh to test_progs
- [PATCH] selftests/bpf: Move test_lwt_ip_encap to test_progs
Among all the remaining scripts, only a few may still be candidates for a direct conversion into the test_progs runner. Indeed, there are multiple scripts which are not really designed for automatic testing of the eBPF subsystem:
- some of them are not really testing the kernel eBPF features but rather external tools (for example, bpftool)
- some others do not really expose tests but rather some tooling, which may already be used by the test_progs runner
- some scripts in the testing directory are really dedicated to benchmarking, and so they are not relevant for a systematic run in CI
- etc.
Further analysis and discussions with the community must be undertaken to really define what should be done with those remaining scripts.
A more relevant metric is the number of new tests integrated in test_progs:
- on tag v6.11, an arbitrary test_progs execution (x86_64 with GCC) ran 551 tests, spawning 4103 subtests
- at the time of writing this post, this same test configuration runs 606 tests in CI, spawning 4436 subtests.
Those numbers also include the new tests that other developers have contributed during this time span, but they still reflect well the progress made on the selftests so far. This naturally increases the number of eBPF features covered by the automated testing, granting more confidence in any code merged into the subsystem. The effort is still ongoing, but it is definitely going in the right direction, thanks to the eBPF Foundation’s support!
If you enjoyed this post and want to learn more about our work on eBPF, make sure not to miss the next post in this series, in which we will dive into more architecture-specific features and low-level details!