Releasing Snagboot: a cross-vendor recovery tool for embedded platforms

Recovering and reflashing a bricked board can be a tedious process. It often involves flashing an SD card to bring your device back up, and it gets worse if the board does not have an SD card slot to begin with. Thankfully, most embedded platforms include some form of recovery via USB or UART, which usually involves sending a boot image to the platform’s ROM code. A few tools exist that leverage this functionality to offer quick recovery and reflashing via USB, such as STM32CubeProgrammer, SAM-BA or UUU. However, these tools are all vendor-specific, which means that developers working on various kinds of platforms have to switch between different tools and learn how to use each one.

To address this issue, Bootlin is happy to release today a new recovery and reflashing tool, called Snagboot, which is intended as a generic and open-source replacement for the vendor-specific tools mentioned earlier. It is composed of two parts:

  • snagrecover, which uses vendor-specific ROM code mechanisms to initialize external RAM and run your bootloader (typically U-Boot), without modifying any non-volatile memories.
  • snagflash, which communicates with your bootloader over USB to flash system images to non-volatile memories, using either DFU, USB Mass Storage or fastboot.

Snagboot currently supports about 50 different SoC models, from six different SoC families:

  • STMicroelectronics STM32MP1
  • Microchip SAMA5
  • NXP i.MX6/7/8
  • Texas Instruments AM335x
  • Allwinner Sunxi
  • Texas Instruments AM62x

You can get it from PyPI or browse the sources on GitHub. Our extensive user guide gives all the details on how to set up and use Snagboot on supported platforms. We hope that this tool will be useful to embedded software engineers and that it will continue to grow as support for more SoCs/platforms is added! If you’re familiar with a certain SoC family’s boot process, don’t hesitate to contribute to the project by adding support for your platform!

Yocto Project 4.2 released – Bootlin contributions inside

The Yocto Project has published its new release: 4.2, also known as “Mickledore”.

It features improved Rust support, BitBake engine improvements, support for Linux 6.1 (the latest Long Term Support kernel), new QEMU features, testing improvements and of course many other new features and package updates. See the release notes for all details.

Bootlin has actively contributed to this release, as seen in the number of commits, in particular through our work maintaining the documentation, improving regression detection and on Autobuilder SWAT.


Bootlin at Embedded Open Source Summit 2023 in Prague, June 28-30

In the Embedded Linux ecosystem, the Embedded Linux Conference is the most important event, covering all topics related to the usage of Linux in embedded systems, and probably gathering the largest audience of embedded Linux developers and maintainers.

After several years of being folded into the much larger Open Source Summit, mixed with conferences on largely unrelated topics, the Embedded Linux Conference is this year grouped only with other embedded-related conferences under an umbrella event called the Embedded Open Source Summit.

Like every year, Bootlin will have a strong presence at the event: no fewer than 14 engineers of our team will be at the conference, which is almost our entire team. At Bootlin, we strongly believe that participating in conferences is a key aspect of an engineer’s job, in order to stay up-to-date with the latest developments in our field, but also to make or strengthen connections with other members of the embedded Linux community.

Overall, Alexandre Belloni, Kamel Bouhara, Luca Ceresoli, Maxime Chevallier, Hervé Codina, Jérémie Dautheribes, Paul Kocialkowski, Théo Lebrun, Alexis Lothoré, Köry Maincent, Michael Opdenacker, Thomas Perrot and Thomas Petazzoni will participate in the conference.

In addition, 3 of our talks have been accepted at the conference and are visible in the schedule.

Finally, it is worth mentioning that Bootlin has already started contributing to the conference: as a member of the Embedded Linux Conference program committee, Bootlin CEO Thomas Petazzoni has already reviewed and participated in the selection of talks that made it to the schedule of this year’s conference.

We look forward to seeing you all in Prague!

Linux 6.3 released, Bootlin contributions inside

Linux 6.3 was released yesterday, right on schedule. As usual, see the LWN.net articles that covered the 6.3 merge window (part 1 and part 2) as well as the KernelNewbies page.

For this release, Bootlin engineers contributed a total of 66 commits, with the following highlights:

  • Alexandre Belloni, as the RTC subsystem maintainer, contributed a number of patches to RTC drivers: adding support for ACPI-based probing to two RTC drivers, and converting a number of RTC drivers to use the fwnode API to retrieve IRQ flags
  • Alexis Lothoré contributed a fix for a regression in the FPGA subsystem
  • Clément Léger contributed a fix for a reference count issue found while testing Device Tree overlays and also contributed a minor cleanup to the pcs-rzn1-miic driver he contributed some time ago.
  • Hervé Codina contributed a full new driver to support the USB Device controller found in Renesas RZ/N1 processors: renesas_usbf, together with the corresponding Device Tree binding description, Device Tree files updates, as well as an update to the Renesas clock driver
  • Hervé Codina also contributed two new audio codec drivers: one for the Renesas IDT821034 codec and one for the Infineon PEB2466 codec
  • Miquèl Raynal contributed a significant number of updates to the IEEE 802.15.4 stack of the Linux kernel, most notably implementing passive scanning support as well as beaconing support.
  • Paul Kocialkowski contributed a number of fixes for the Allwinner sun6i-csi camera interface driver, the Allwinner MIPI CSI2 bridge driver as well as the Allwinner sun6i-isp ISP driver, following previous contributions he made on all those multimedia drivers

Here are the details of our contributions, commit by commit:

Yocto: sharing the sstate cache and download directories

When developing projects based on Yocto Project / OpenEmbedded, a quite common practice is to have multiple build environments in different directories: one per product, or one for each development branch, or for other scenarios. Each build environment could have different layers, a different configuration, or just use a different version of the source code.

With default settings, different build directories result in duplicated storage for the downloaded source code and build artifacts, as well as duplicated time spent downloading the sources and building everything. This can be troublesome for large projects.

Fortunately, the bitbake build engine can share both the downloaded source code and the intermediate build results across multiple build directories, saving build time and disk space.


Continuous integration in Yocto: improving the regressions detection

The Yocto Project is an open source umbrella project which gathers all the tools needed to build full Linux distributions for a wide variety of devices. As interest in Yocto has grown since its first steps, its size and number of use cases have increased accordingly. This growth quickly introduced the need for automated testing, so that developers can keep adding new features to the project while making sure not to break any existing part. Bootlin engineer Alexis Lothoré has recently been involved in the Continuous Integration infrastructure of the Yocto Project and has brought improvements that allow Yocto maintainers to detect regressions earlier.


Testing audio: the beauty of sine-waves

As part of a recent project involving advanced sound cards, Bootlin engineer Miquèl Raynal had to find a way to automate audio hardware loopback testing. He had on hand a PCI audio device with many external interfaces, each of them featuring an XLR connector. The connectors were wired to analog and digital inputs and outputs. In a company full of sound engineers, playing heavy music through amplifiers and loudspeakers is probably the norm, but to keep his colleagues’ ears from bleeding during his ALSA/DMA debug sessions, he decided to anticipate all human issues and spare himself any whining from his neighbors.


A Tegra20 parallel camera capture driver heading for the mainline Linux kernel

Over the past year Bootlin engineer Luca Ceresoli has been working to add a device driver for the parallel camera interface of the NVIDIA Tegra20 System on Chip into the mainline Linux kernel.

The main challenge faced during this work has been the lack of documentation. The work was therefore based on a driver from an NVIDIA BSP, forked from a 3.1 kernel (which was released back in 2011!). The old driver code base needed a huge rework and was largely rewritten, and not only because of the changes in 10+ years of kernel development.

The mainline kernel already has a driver for CSI capture on Tegra210, albeit in staging. The two hardware components have some common functionality, thus to avoid code duplication Luca augmented the existing driver and generalized the code implementing common areas instead of adding a new driver. This posed the additional challenge of not breaking functionality on another SoC, based on a different architecture and using a different video bus… all without access to such other hardware!

Luca just resent version 4 of the patch series implementing this.

If you have a device using Tegra20 parallel capture or Tegra210 CSI video capture, this is a great opportunity to test the code and report your findings! And in case you don’t have the hardware, you’d still be very welcome to review the patches.

Finally, if you have access to the Tegra20 documentation, we’d love to know: the driver could possibly be improved with good knowledge of the hardware.

Fixing reboot in ZynqMP PMU Firmware

Thanks to community contributions, our engineer Luca Ceresoli has recently published a fix to the zynqmp-pmufw-builder repository that allows building a fully working PMU Firmware binary. Rebooting had previously been broken for a long time.


Boot time: choose your kernel loading address carefully

When the compressed and uncompressed kernel images overlap

At least on ARM32, there seem to be many working addresses where the compressed kernel can be loaded in RAM. For example, one can load the compressed kernel at offset 0x1000000 (16 MB) from the start of RAM, and the Device Tree Blob (DTB) at offset 0x2000000 (32 MB). Whatever the loading address, the kernel is then decompressed at offset 0x8000 from the start of RAM, as explained in the famous How the ARM32 Linux kernel decompresses article from Linus Walleij.

There is a potential issue with the loading address of the compressed kernel, as explained in the article too. If the compressed kernel is loaded too close to the beginning of RAM, where the kernel must be decompressed, there will be an overlap between the two. The decompressed kernel will overwrite the compressed one, potentially breaking the decompression process.

Overlapping compressed and decompressed kernel

As you see in the above diagram, when this happens, the bootstrap code in the compressed kernel will first copy the compressed image to a location that’s far enough to guarantee that the decompressed kernel won’t overlap it. However, this extra step in the boot process has a cost.
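The overlap condition can be modeled as a simple interval intersection. The following is a minimal sketch, not the decompressor's actual logic (which also has to account for its own code and data): the RAM start address, the image sizes and the function names are all hypothetical values chosen for illustration, and only the 0x8000 decompression offset comes from the text above.

```python
# Simplified model of the overlap condition. All sizes and addresses below
# are hypothetical; real values depend on your board and kernel build.

TEXT_OFFSET = 0x8000  # decompression offset from the start of RAM


def regions_overlap(start_a, size_a, start_b, size_b):
    """Return True if [start_a, start_a+size_a) intersects [start_b, start_b+size_b)."""
    return start_a < start_b + size_b and start_b < start_a + size_a


def needs_relocation(ram_start, load_addr, zimage_size, decompressed_size):
    """True if the decompressed kernel would overwrite the compressed image."""
    return regions_overlap(load_addr, zimage_size,
                           ram_start + TEXT_OFFSET, decompressed_size)


RAM_START = 0x80000000                 # hypothetical start of RAM
ZIMAGE_SIZE = 8 * 1024 * 1024          # hypothetical 8 MiB compressed image
DECOMPRESSED_SIZE = 20 * 1024 * 1024   # hypothetical 20 MiB decompressed kernel

# Loaded at the very start of RAM: the two regions intersect.
print(needs_relocation(RAM_START, RAM_START, ZIMAGE_SIZE, DECOMPRESSED_SIZE))
# Loaded 32 MiB further: the decompressed kernel ends before the image starts.
print(needs_relocation(RAM_START, RAM_START + 0x2000000, ZIMAGE_SIZE, DECOMPRESSED_SIZE))
```

With these made-up sizes, the first call reports an overlap and the second does not, which mirrors the two situations discussed in this article.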

Measuring boot time impact

In the context of updating our materials for our upcoming Embedded Linux Boot Time Optimization course in June, we measured this additional time on the STM32MP157A-DK1 Discovery Kit from STMicroelectronics, with a dual-core ARM Cortex-A7 CPU running at 650 MHz.

Initially, in our Embedded Linux System Development course, we were booting the DK1 board as follows:

ext4load mmc 0:4 0xc0000000 zImage; ext4load mmc 0:4 0xc4000000 dtb; bootz 0xc0000000 - 0xc4000000

0xc0000000 is exactly the beginning of RAM! We are therefore in the overlap situation.

We used grabserial from Tim Bird to measure the time between Starting kernel in U-Boot and the moment the decompressed kernel starts executing (Booting Linux on physical CPU 0x0):

...
[4.451996 0.000124] Starting kernel ...
[0.001838 0.001838] 
[2.439980 2.438142] [    0.000000] Booting Linux on physical CPU 0x0
...

On a series of 5 identical tests, we obtained an average time of 2,440 ms, with a standard deviation of 0.4 ms.
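To repeat such a measurement over several runs, the interesting timestamp can be extracted from the captured log with a few lines of Python. This is only a sketch: it assumes each line has the "[time delta] message" shape shown above and that the first timestamp restarts from zero after the Starting kernel line, as the sample output suggests. The helper name time_to_marker is ours, not part of grabserial.

```python
import re

# Matches lines like "[2.439980 2.438142] text": two timestamps, then the message.
LINE_RE = re.compile(r"^\[(\d+\.\d+)\s+(\d+\.\d+)\]\s?(.*)$")


def time_to_marker(log_lines, marker):
    """Return the first timestamp (in seconds) of the first line containing marker."""
    for line in log_lines:
        match = LINE_RE.match(line)
        if match and marker in match.group(3):
            return float(match.group(1))
    return None  # marker not found in the log


log = [
    "[4.451996 0.000124] Starting kernel ...",
    "[0.001838 0.001838] ",
    "[2.439980 2.438142] [    0.000000] Booting Linux on physical CPU 0x0",
]

elapsed = time_to_marker(log, "Booting Linux on physical CPU")
print(f"{elapsed * 1000:.0f} ms")  # prints "2440 ms" for the sample above
```

Feeding the logs of several boots through this helper and averaging the results is one way to obtain figures like the ones quoted here.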

Then, we measured the optimum case, in which the compressed kernel is loaded far enough from the beginning of RAM so that no overlap is possible:

No overlap between compressed and decompressed kernel

Here we chose to load the kernel at 0xc2000000:

ext4load mmc 0:4 0xc2000000 zImage; ext4load mmc 0:4 0xc4000000 dtb; bootz 0xc2000000 - 0xc4000000

On a series of 5 identical tests, we obtained an average time of 2,333 ms, with a standard deviation of 0.7 ms.

The new average is 107 ms smaller, a reduction you are likely to consider worthwhile if you have experience with boot time reduction projects.
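For reference, the saving can also be expressed relative to the measured interval. The short calculation below only reuses the two averages quoted above; the variable names are ours.

```python
overlap_ms = 2440      # average with the kernel loaded at the start of RAM
no_overlap_ms = 2333   # average with the kernel loaded at 0xc2000000

saved_ms = overlap_ms - no_overlap_ms
saved_pct = 100 * saved_ms / overlap_ms

print(f"saved {saved_ms} ms ({saved_pct:.1f}% of the measured interval)")
# prints "saved 107 ms (4.4% of the measured interval)"
```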

What to remember

In your embedded projects, if you are using a compressed kernel, make sure it is loaded far enough from the beginning of RAM, leaving enough space for the decompressed kernel to fit in between. Otherwise, your system will still be able to boot, but depending on the speed of your CPU and storage, it will be slower, from a few tens to a few hundreds of milliseconds.

We checked the How to optimize the boot time page on the STM32 wiki, and it recommends optimum loading addresses: 0xc2000000 for the kernel and 0xc4000000 for the device tree. This way, the upper limit for the decompressed kernel is 32 MB, which is more than enough.
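The 32 MB figure follows directly from the addresses. As a quick check, here is the arithmetic in Python, using the start of RAM on the DK1 board (0xc0000000) and the 0x8000 decompression offset mentioned earlier; the variable names are ours.

```python
MIB = 1024 * 1024

RAM_START = 0xC0000000    # start of RAM on the DK1 board
KERNEL_ADDR = 0xC2000000  # recommended compressed kernel load address
DTB_ADDR = 0xC4000000     # recommended device tree load address
TEXT_OFFSET = 0x8000      # where the kernel is decompressed, relative to RAM start

kernel_offset_mib = (KERNEL_ADDR - RAM_START) // MIB
dtb_gap_mib = (DTB_ADDR - KERNEL_ADDR) // MIB
room_bytes = KERNEL_ADDR - (RAM_START + TEXT_OFFSET)

print(kernel_offset_mib)  # 32: distance from the start of RAM to the zImage
print(dtb_gap_mib)        # 32: space available for the zImage before the DTB
print(f"{room_bytes / MIB:.2f} MiB available for the decompressed kernel")
```

Strictly speaking, the room for the decompressed kernel is 32 MiB minus the 0x8000 offset, which does not change the conclusion.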

If you are directly using an uncompressed kernel, which is rarer, you should also make sure that it is loaded at an optimum location, so that there is no need to move it before starting it.