The Yocto Project is an open source umbrella project which gathers all the tools needed to build full Linux distributions for a wide variety of devices. As interest in Yocto has grown since its early days, its size and number of use cases have increased accordingly. This growth quickly created the need for automated testing, so that developers can keep introducing new features to the project while making sure they do not break any existing part. Bootlin engineer Alexis Lothoré has recently been involved in the Continuous Integration infrastructure of the Yocto Project and has brought improvements that allow Yocto maintainers to detect regressions earlier.

A primer on Yocto automatic testing

Yocto tests

The Yocto Project embeds a wide variety of tests:

- oe-selftest: these tests ensure that the build system works as expected. They are essentially Python scripts which either test some internal tools or run full builds with specific configurations and check that those builds succeed
- runtime: these tests run on images generated with Yocto, either on a real target or on virtual machines thanks to QEMU
- sdk tests: these tests ensure that SDKs generated with Yocto embed all needed tools (e.g. all toolchain components) and are able to build components for the target
- ptests: many Yocto recipes rely on open source projects managed by other communities, which ship their own tests in their codebases. Yocto defines a thin wrapping layer to add to the corresponding recipes in order to run those tests and parse their results

Yocto also defines a general test results format and a way to store them. The test results are managed with the resulttool script, which can be found in the OpenEmbedded core layer and provides the following features:

- aggregate test results from multiple test runs/test cases
- store test results in the test results repository
  - The Yocto Project has a repository which contains a long history of tests with their results.
  - All commits in the repository are tagged, and both commits and tags bear the poky revision with which the tests have been built and run, making it easy to match test results with the corresponding revisions
- print a human-readable summary of test results
- parse two provided test results files and display regressions (tests that were passing and are not passing anymore)
- take two provided poky revisions, search for the corresponding results in the test results repository (see above), compare the results found and display regressions
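The notion of regression used in the last two items (a test that was passing and is not passing anymore) can be pictured with a minimal sketch. This is not resulttool's actual code: the function name, the flat name-to-status dictionaries and the status strings are assumptions made for illustration only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical status string, used only for this illustration.
PASS = "PASS"


def find_regressions(
    base: Dict[str, str], target: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return tests that passed in `base` but do not pass in `target`."""
    regressions = {}
    for name, base_status in base.items():
        if base_status != PASS:
            continue  # only a previously passing test can regress
        target_status = target.get(name)  # None if absent from target
        if target_status != PASS:
            regressions[name] = (base_status, target_status)
    return regressions


# Made-up results for two revisions.
base = {"test_a": "PASS", "test_b": "PASS", "test_c": "FAIL", "test_d": "PASS"}
target = {"test_a": "PASS", "test_b": "FAIL", "test_c": "FAIL"}

for name, (before, after) in sorted(find_regressions(base, target).items()):
    print(f"{name}: {before} => {after}")
```

With this made-up data, test_b is reported as "PASS => FAIL" and test_d as "PASS => None", since test_d has no result at all in the target set. test_c was already failing in the base set, so it is not counted as a regression.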

Any developer can modify and run the tests, and even use resulttool to check for regressions caused by their modifications. But the most important use case for this tooling is the automatic regression reporting configured in the Yocto Autobuilder.

Yocto Autobuilder

The Yocto Project has its own CI infrastructure to automatically build, test and deliver a large matrix of build configurations. This infrastructure is composed of the following elements:

- at its core, the infrastructure relies on Buildbot, a generic job scheduling engine
- Yocto defines basic Buildbot elements (like workers, schedulers, services, builders, etc.) in its yocto-autobuilder2 repository
- finally, all Yocto-specific configuration (repositories, build configurations and commands, etc.) is defined in yocto-autobuilder-helper. This repository also contains some tools used by the infrastructure, for example for release management

With those layers, the Yocto CI infrastructure defines a wide matrix of build/test configurations, with multiple pairs of branches and hardware targets:

Figure: A brief overview of the build matrix of the Yocto Autobuilder

Anyone interested in this infrastructure can take a look at it since it is publicly accessible!

An example of a general pipeline is the “a-full” pipeline, which is the “all-inclusive” job that is run before any release. Its sequence can be summarized with the following steps:

- write a local configuration based on input parameters (target branches, who is starting the build and why, whether it is a release, etc.). This configuration is then used by all subsequent steps
- clone all repositories needed to run builds (mostly Yocto layers and associated tools)
- build and test all configured targets
- depending on the build type, collect all artifacts (binaries, logs, test results, etc.) and publish them on a public web server
- if the build is a release, notify the Yocto community to start a QA cycle

Improving the regression reports on release

When Yocto maintainers trigger a build on the autobuilder, a regression report is generated with resulttool. If the build is a release, the report is also published alongside all other artifacts. However, until recently, the regression reports suffered from several issues which made them too unreliable to properly detect regressions:

- by looking at the regression report alone, it was hard to tell when a regression had been introduced (no indication of the before/after revisions), and it was suspected that the selected before/after revisions were not very relevant for generating the regression report
- regression reports were inconsistent: sometimes the report was flooded with false positives, sometimes it was completely silent on known regressions

This is where Bootlin engineer Alexis Lothoré had the opportunity to bring improvements based on the inputs of Yocto lead architect and maintainer Richard Purdie.

A first area of improvement was the selection of the before/after versions used for the regression comparison. The script in charge of executing resulttool to generate regression reports is called send-qa-email and can be found in the yocto-autobuilder-helper repository. Improving this point was a bit challenging since Yocto has multiple version formats depending on the configuration and release cycle, but it led to a better selection of the “before” reference by guessing the previous version. With this update, send-qa-email is now able to properly select the revisions, like those shown below:

Current version	Reference selected for regression comparison
master	Previous tag on master
4.2_M2	4.2_M1
4.2_M2.rc3	4.2_M1
4.0.8.rc1	4.0.7
yocto-4.2	yocto-4.1
kirkstone	Previous tag on kirkstone
etc	etc
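The explicit rows of this table can be reproduced with a short sketch of such "previous version" guessing. This is not the logic implemented in send-qa-email: the function name and the regular expressions are assumptions, and only the version formats listed above are handled. Branch names such as master or kirkstone, which require looking up the previous tag in the repository, and any other format are left unresolved.

```python
import re
from typing import Optional


def guess_reference(version: str) -> Optional[str]:
    """Guess the reference version for the formats listed in the table."""
    # A release candidate is compared like the version it leads to:
    # 4.2_M2.rc3 is handled as 4.2_M2, 4.0.8.rc1 as 4.0.8.
    version = re.sub(r"\.rc\d+$", "", version)

    # Milestone release: 4.2_M2 -> 4.2_M1
    m = re.fullmatch(r"(\d+\.\d+)_M(\d+)", version)
    if m and int(m.group(2)) > 1:
        return f"{m.group(1)}_M{int(m.group(2)) - 1}"

    # Point release: 4.0.8 -> 4.0.7
    m = re.fullmatch(r"(\d+\.\d+)\.(\d+)", version)
    if m and int(m.group(2)) > 1:
        return f"{m.group(1)}.{int(m.group(2)) - 1}"

    # Major release tag: yocto-4.2 -> yocto-4.1
    m = re.fullmatch(r"yocto-(\d+)\.(\d+)", version)
    if m and int(m.group(2)) > 0:
        return f"yocto-{m.group(1)}.{int(m.group(2)) - 1}"

    # Branch names and first-of-series versions need repository
    # knowledge (previous tag, previous series): not guessed here.
    return None


for current in ("4.2_M2", "4.2_M2.rc3", "4.0.8.rc1", "yocto-4.2", "master"):
    print(current, "->", guess_reference(current))
```

The sketch deliberately returns None whenever the answer cannot be derived from the version string alone, for example for the first milestone of a cycle, where the table above gives no rule.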

With reference revisions now selected more accurately, the next improvement needed was reducing the “noise” in the reports (which, incidentally, more or less exploded with the new base/target selection…). As stated above, many regression reports were affected by false positives and false negatives. Besides making the reports almost unreadable, this noise gave them worrisome sizes for mere text reports (“5.5GB of regressions on latest release?!”). The noise had multiple root causes, requiring different kinds of fixes:

- many test results lack the metadata needed to ensure that they are compared only to relevant tests. One major example of this issue is the oeselftest category: the oeselftest tests are executed for multiple configurations (configured machines, included/excluded test cases, included/excluded test tags, etc.), so each run covers only a specific subset of all available tests. When different oeselftest subsets are compared, many regressions are raised since the base and target results do not match. Some of the fixes were quite simple (ensuring that tests are compared only if the target MACHINE matches), while other cases required bringing in a new metadata tagging mechanism
- some stored test results have a broken name in the test results repository. Because of those incorrect names, resulttool is not able to match base and target tests, so it raises many false positives with the transition “PASS => None” (“None” meaning that the test result is not found on the “target” revision, because of a broken name in the “base” results). Those broken test names therefore have to be detected and “hotfixed” in order to compute the regression report
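The first kind of fix, comparing results only when their metadata matches, can be pictured with the sketch below. It is only an illustration: the dictionary layout, the metadata key names and the helper names are assumptions and do not claim to mirror resulttool's internals.

```python
from typing import Dict, List, Sequence, Tuple

Run = Dict[str, dict]

# Assumed metadata keys; the real set used for matching may differ.
MATCH_KEYS = ("TEST_TYPE", "MACHINE")


def is_comparable(base: Run, target: Run, keys: Sequence[str] = MATCH_KEYS) -> bool:
    """Two runs are comparable only if all selected metadata are equal."""
    b, t = base["configuration"], target["configuration"]
    return all(b.get(k) == t.get(k) for k in keys)


def pair_runs(base_runs: List[Run], target_runs: List[Run]) -> List[Tuple[Run, Run]]:
    """Keep only the (base, target) pairs that are worth comparing."""
    return [(b, t) for b in base_runs for t in target_runs if is_comparable(b, t)]


# Made-up runs: two machines on the base side, one on the target side.
base_runs = [
    {"configuration": {"TEST_TYPE": "oeselftest", "MACHINE": "qemux86-64"},
     "result": {"test_a": "PASS"}},
    {"configuration": {"TEST_TYPE": "oeselftest", "MACHINE": "qemuarm64"},
     "result": {"test_a": "PASS"}},
]
target_runs = [
    {"configuration": {"TEST_TYPE": "oeselftest", "MACHINE": "qemux86-64"},
     "result": {"test_a": "FAIL"}},
]

pairs = pair_runs(base_runs, target_runs)
for b, t in pairs:
    print(b["configuration"]["MACHINE"], "compared with", t["configuration"]["MACHINE"])
```

Without such a filter, the qemuarm64 base run would also be compared against the qemux86-64 target run, and any test present in only one of them would show up as a spurious regression. Adding more keys to the matching set is one way to picture the extra metadata tagging mentioned above.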

The challenge here is that while test results generation must be fixed for new test results, the project cannot afford to simply ignore the test results history affected by those “bugs”. Most of the fixes are therefore about working around those issues in existing test results so that they remain usable for future regression reports.

Bootlin contributions

In order to implement those improvements, we contributed a number of changes to both the openembedded-core and yocto-autobuilder-helper repositories.

More specifically, our contributions to the OpenEmbedded Core repository were:

And to the Yocto Autobuilder Helper repository:

Current state of regression reports

With those specific overhauls, regression reports are becoming readable and are starting to provide valuable information. They have even started to give pointers to some real regressions! If you are interested in going further on this topic, you can check the available regression reports in the release directory of the project (please note that this directory is a temporary place for releases during validation, and at the time of this writing there was no pending release with an updated regression report).

But the work on this topic is far from complete, if we can even talk about “completion”. There are still many improvements that would benefit the project:

- generating reports against multiple “base” revisions, to speed up the regression investigation step. For example, we would probably like to know, when generating a new milestone release, when a newly detected regression has been introduced (was it during the last milestone release? Or before the major release cycle?)
- improving test results generation more generally

The last point in fact covers many minor issues (like for example this one) that affect the project. Some of those minor issues led to the regression reporting rework. Those issues are good entry points to start contributing to the Yocto Project (they are listed on the project Bugzilla), so do not be afraid to jump in: the project is always happy to welcome newcomers wishing to help on those issues!

Author: Alexis Lothoré

Alexis has worked at Bootlin as an embedded Linux engineer since 2023. He has packaged full Linux distributions for a variety of devices, mostly IoT devices.