Systemd, read-only rootfs and overlay file system over /etc

Systemd is a popular init system, used to bootstrap user space and manage user processes. It now replaces several traditional Linux utilities with its own components for log management, networking, time management, etc. There is even a bootloader component now. Systemd is ubiquitous nowadays in desktop/server Linux distributions, and is also commonly used on embedded devices to benefit from features such as parallel startup of services, monitoring of services, and more.

In a recent project that uses Buildroot as its build system, we have used systemd with the storage consisting of a read-only root filesystem (SquashFS) and an overlay file system (OverlayFS) mounted on /etc. While doing this, we faced two issues with the use of OverlayFS on /etc:

  • /etc/machine-id file management. This file is created by systemd during the first boot. If the root filesystem is read-only, systemd generates a temporary one in /run, bind mounts it over /etc/machine-id and waits until it has read-write access to create the real file (see more details). In that case, the machine-id file is re-generated at each boot (because /run is a tmpfs), which means that the machine identification changes at each boot, which is not necessarily desirable. On the other hand, we don’t want the machine-id file to be part of the SquashFS filesystem, because the SquashFS filesystem is identical on all devices, while the /etc/machine-id file is unique per device. So ideally, we would like this machine-id file to be stored in our OverlayFS, generated during the first boot. The issue is that systemd reads the machine-id file very early, before we get the chance to mount the OverlayFS.
  • We wanted to be able to add or modify systemd services using the OverlayFS. Systemd parses the service files at early init and executes them according to their order and dependencies. The units mounting the filesystems from /etc/fstab, like all other units, are only started after this parsing, which is too late. We could think of running daemon-reload from a custom service once mounting was complete, but this is not really a stable solution, as Lennart Poettering commented in a short e-mail thread about this issue.

The solution suggested by Lennart, and elsewhere on the wider Internet, is to mount the OverlayFS from an initramfs, which allows it to be set up before systemd even starts. However, we use Buildroot, and an initramfs adds complexity by requiring a separate configuration to manage multiple images. This was overkill in our case, just for setting up the overlay. The solution we eventually chose was to create an init_overlay.sh script, which is started as init before systemd, by adding init=/sbin/init_overlay.sh to the kernel command line:

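#!/bin/sh
# Mount the kernel API filesystems (systemd copes with them being already mounted)
mount -t proc -o nosuid,nodev,noexec none /proc
mount -t sysfs -o nosuid,nodev,noexec none /sys
# Mount the read-write data partition that holds the overlay's upper layer
mount /dev/mmcblk0p2 /mnt/data
# Overlay /etc: the read-only /etc is the lower layer, changes go to /mnt/data/etc
mount -t overlay overlay -o lowerdir=/etc,upperdir=/mnt/data/etc,workdir=/mnt/data/.etc-work /etc
# Replace this shell with systemd, which keeps running as PID 1
exec /sbin/init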
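For reference, a slightly more defensive variant of this script could look like the sketch below. It is only a sketch under our own assumptions, not the script used in the project: it creates the upper and work directories if the data partition is still empty, forwards the arguments the kernel passed to init on to /sbin/init, and still hands over to systemd if the overlay cannot be set up, since a PID 1 that exits makes the kernel panic. The overlay_opts helper and the PID 1/script-name guard are hypothetical additions; the device and paths are those of the script above.

```sh
#!/bin/sh
# Hypothetical, more defensive variant of init_overlay.sh: a sketch, not the
# script used in the project. The device and paths are the ones from the text.
DATA_DEV=/dev/mmcblk0p2
DATA_MNT=/mnt/data

# Build the OverlayFS option string. The work directory must be an empty
# directory on the same filesystem as the upper directory.
overlay_opts () {
  echo "lowerdir=$1,upperdir=$2,workdir=$3"
}

setup_overlay () {
  mount -t proc -o nosuid,nodev,noexec none /proc
  mount -t sysfs -o nosuid,nodev,noexec none /sys
  mount "$DATA_DEV" "$DATA_MNT" || return 1
  # On a freshly formatted data partition, these directories do not exist yet
  mkdir -p "$DATA_MNT/etc" "$DATA_MNT/.etc-work" || return 1
  mount -t overlay overlay \
    -o "$(overlay_opts /etc "$DATA_MNT/etc" "$DATA_MNT/.etc-work")" /etc
}

# Only act when started by the kernel as PID 1 under the installed name, so
# that running or sourcing this file by hand has no side effects.
if [ "$$" -eq 1 ] && [ "${0##*/}" = "init_overlay.sh" ]; then
  # If the overlay cannot be set up, still start systemd (with the read-only
  # /etc) instead of letting PID 1 exit, which would make the kernel panic.
  if ! setup_overlay; then
    echo "init_overlay.sh: overlay setup failed, booting without it" >&2
  fi
  # Replace this shell with systemd, forwarding the arguments from the kernel
  exec /sbin/init "$@"
fi
```

Falling back to a boot without the overlay is a design choice: the device stays reachable for debugging, at the cost of a transient machine-id and no customized services until the data partition is fixed.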

Hopefully, this will be useful to others. Of course, we’re also curious to hear whether others have faced the same issue, and how they solved it. Let us know in the comments.

Snagboot: Designing a USB recovery process for AM335x SoCs

A few months ago, Bootlin released Snagboot, an open-source and generic replacement for the vendor-specific, sometimes proprietary, tools used to recover and reflash embedded platforms. This has led us to design recovery processes over USB for several different SoC families.

Our goal for each recovery process was the following: be able to upload U-Boot in external RAM and run it without modifying any non-volatile memories. Implementing this for many different platforms was challenging, as each vendor used different protocols, bootloader binaries, and methods to boot from recovery mode. Moreover, it was critical that the recovery tool be as user-friendly as possible, not requiring any complex configuration or vendor-specific workflows. This blog post describes the strangest recovery process we had to support so far: the one provided over USB by the Texas Instruments AM335x SoC.

Initializing AM335x platforms

When booted, each SoC has a specific sequence of actions it performs to load and run a target operating system or bare-metal program. This sequence typically starts with a ROM code, stored in a non-volatile internal memory. The main job of a ROM code is to search for a first-stage bootloader in various external memories and load it to internal RAM. In the case of AM335x platforms, this initialization sequence is described in the TI reference manual.

AM335x initialization procedure

As we can see, there is nothing too outlandish here. The ROM code checks each device in its boot sequence and attempts to boot from it. What is particularly interesting to us here is the Boot from peripheral device part. Indeed, our ultimate goal is to send U-Boot to the SoC over a USB connection. So we will now dig a little further into this peripheral boot feature. The reference manual states that the AM335x ROM code is capable of booting from three types of peripheral interfaces: EMAC (Ethernet), USB and UART. Considering what we said earlier, what really interests us here is the USB boot feature. The USB boot procedure is described in more detail in the reference manual. And this is where things get a little strange.

Most ROM codes we’ve encountered use fairly simple vendor protocols to communicate over USB. You’ll typically find some memory read/write operations, some run operations, and maybe a few vendor-specific commands. The AM335x ROM code, however, uses network protocols to boot over USB! Specifically, the ROM code exposes an RNDIS class device which will be registered as an Ethernet interface by the host-side rndis_host driver. The ROM code will then broadcast BOOTP requests. A BOOTP server on the network should respond to this and supply the SoC with an IP address and the address of a TFTP server. Finally, the ROM code will download the first stage firmware from this TFTP server. To summarize, here is the expected USB boot procedure for AM335x SoCs:

AM335x boot sequence

This poses a number of issues. Remember, our goal is to boot the SoC using snagboot, a user-friendly and easily configurable CLI tool. This means we can’t expect the user to perform any complicated network configuration to be able to use the tool! So these are the main challenges associated with recovering AM335x SoCs:

  1. We need a BOOTP and TFTP server to respond to the ROM code. These servers need IP addresses, which means our tool has to obtain IPs every time it runs.
  2. BOOTP and TFTP servers use ports 67 and 69, which are privileged. However, we don’t want users to have to run snagboot as root.
  3. The ROM code requires an IP address, which means that snagboot has to supply a valid IP address to it every time it runs the recovery.
  4. If another BOOTP server is present on the user’s network during recovery, it could try to answer the ROM code, interfering with snagboot’s operation.

Designing a user-friendly recovery process

To circumvent these challenges, we made use of a number of nice Linux features. Firstly, we can see that the common theme in all these issues is interference with the user’s network. We have to work with local routers to get IP addresses, and we have to ensure that other BOOTP servers will not race us to respond to the board. To address this need, we’ve made use of network namespaces, which are a way of partitioning network resources on the system. When a process runs in a separate network namespace, it will not share network interfaces, routing rules, or firewall rules with the rest of the system.

This is very interesting to us, as it means that we can effectively create a sandbox environment where we can interact with the AM335x ROM code without touching the user’s local network! We can set whatever strange routing and firewall rules we want, and they will be automatically destroyed when we delete the namespace! The general sequence for our recovery process is:

  1. Move the ROM code’s virtual Ethernet interface to a new “snagbootnet” namespace
  2. Set up firewall rules to redirect ports 67 and 69 to unprivileged ports 9067 and 9069, which spares us from running as root.
  3. Set up routing rules to assign whatever IPs we want to the ROM interface and the servers generated by snagboot.
  4. Run snagrecover, which will serve a U-Boot SPL image to the ROM code
  5. Repeat the same process to serve a U-Boot image to SPL (SPL uses essentially the same boot process as the ROM code)
# These iptables rules allow snagboot to use unprivileged ports 9067 and 9069
# as proxies for privileged ports 67 and 69
# Namespace created in step 1
NETNS_NAME=snagbootnet
ip netns exec "$NETNS_NAME" iptables -t nat -A PREROUTING \
   -p udp --dport 67 -j DNAT --to-destination :9067
ip netns exec "$NETNS_NAME" iptables -t nat -A PREROUTING \
   -p udp --dport 69 -j DNAT --to-destination :9069
ip netns exec "$NETNS_NAME" iptables -t nat -A POSTROUTING \
   -p udp --sport 9067 -j MASQUERADE --to-ports 67
ip netns exec "$NETNS_NAME" iptables -t nat -A POSTROUTING \
   -p udp --sport 9069 -j MASQUERADE --to-ports 69
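The iptables rules cover step 2. Steps 1 and 3 (creating the namespace, moving the RNDIS interface into it and giving it an address) could look roughly like the sketch below, which relies on the standard iproute2 ip netns and ip link/ip addr commands. The address, the function names and the DRY_RUN wrapper (which prints the commands instead of running them) are hypothetical; this is not snagboot’s actual setup script.

```sh
#!/bin/sh
# Sketch of steps 1 and 3: namespace creation, interface move and address
# assignment. Not snagboot's actual setup script.
NETNS_NAME=snagbootnet
HOST_IP=192.168.0.1/24   # hypothetical address for snagboot's servers

# Hypothetical wrapper: print commands instead of running them if DRY_RUN is set
run () {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

create_netns () {
  run ip netns add "$NETNS_NAME"
  run ip netns exec "$NETNS_NAME" ip link set dev lo up
}

# $1: name of the RNDIS interface exposed by the ROM code (or by SPL)
move_iface () {
  run ip link set dev "$1" netns "$NETNS_NAME"
  run ip netns exec "$NETNS_NAME" ip addr add "$HOST_IP" dev "$1"
  run ip netns exec "$NETNS_NAME" ip link set dev "$1" up
}

# Deleting the namespace also drops the iptables rules created inside it
delete_netns () {
  run ip netns delete "$NETNS_NAME"
}
```

For example, running move_iface usb0 with DRY_RUN=1 only prints the three ip commands; the real run requires root, which is why this part belongs in the helper script rather than in snagrecover itself.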

The network namespace and network configuration can be done by a wrapper script that the user runs before running snagboot normally. However, this method raises another challenge. When U-Boot SPL runs, it exposes a new RNDIS interface, which is registered by the host system and brought up in the default network namespace. This means that we cannot access SPL’s virtual Ethernet interface from inside our custom network namespace! Thus, we must use one final trick to automatically move SPL’s interface into our namespace when it is brought up. The namespace setup script runs a polling subprocess in the background. This subprocess regularly checks /sys/class/net for new interfaces matching certain USB vendor/product IDs, and automatically moves them to our namespace once detected.

poll_interface () {
  # check for network interfaces with device nodes matching our ROM code
  # and SPL RNDIS gadget addresses
  # (/dev/null is passed so that grep never falls back to reading stdin
  # when no uevent file matches)
  USBNETFILES=$(grep -l "DEVTYPE=usb_interface" /dev/null /sys/class/net/*/device/uevent 2>/dev/null)
  ROMNETFILE=$(grep -l "PRODUCT=$ROMUSB" /dev/null $USBNETFILES)
  SPLNETFILE=$(grep -l "PRODUCT=$SPLUSB" /dev/null $USBNETFILES)
  # the interface name is field 5 of /sys/class/net/<iface>/device/uevent
  if [ -e "$ROMNETFILE" ]; then
    config_interface "$(echo "$ROMNETFILE" | cut -d '/' -f 5)"
  fi
  if [ -e "$SPLNETFILE" ]; then
    config_interface "$(echo "$SPLNETFILE" | cut -d '/' -f 5)"
  fi
}
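The background loop that calls this check is not shown above. A minimal sketch of such a poller, assuming the same uevent-based matching, could scan a directory laid out like /sys/class/net and move every matching interface into the namespace with ip link set dev IFACE netns. The function names are hypothetical and, unlike config_interface, the sketch only moves the interface and leaves addressing out. The sysfs directory is a parameter only so that the matching can be tried on a fake tree. PRODUCT values in USB uevent files use the vendor/product/bcdDevice format, in lowercase hexadecimal without leading zeros.

```sh
#!/bin/sh
# Hypothetical sketch of the background poller; not snagboot's actual script.
NETNS_NAME=snagbootnet

# Print the names of network interfaces under $1 (normally /sys/class/net)
# whose parent USB interface reports a PRODUCT of the form "$2/<bcdDevice>",
# with $2 being "vid/pid".
find_usb_netdevs () {
  for uevent in "$1"/*/device/uevent; do
    # If the glob matched nothing, it is left unexpanded: skip it
    [ -r "$uevent" ] || continue
    grep -qx "DEVTYPE=usb_interface" "$uevent" || continue
    grep -q "^PRODUCT=$2/" "$uevent" || continue
    # <dir>/<iface>/device/uevent -> <iface>
    ifdir=${uevent%/device/uevent}
    echo "${ifdir##*/}"
  done
}

# Once per second, move every matching interface into the namespace.
# Arguments: "vid/pid" values of the ROM code and SPL RNDIS gadgets.
poll_loop () {
  while :; do
    for product in "$@"; do
      for iface in $(find_usb_netdevs /sys/class/net "$product"); do
        ip link set dev "$iface" netns "$NETNS_NAME"
      done
    done
    sleep 1
  done
}

# Usage (from the setup script, as root): poll_loop "$ROMUSB" "$SPLUSB" &
```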

You can check out the full setup script by running snagrecover --am335x-setup if you are interested.

With this, we have a complete recovery process for AM335x! From the user’s point of view, the only significant difference from other SoC recoveries is an additional helper script that needs to be run before snagrecover. Designing the AM335x support for Snagboot was a very interesting technical problem, with a solution that illustrates the flexibility offered by Linux systems.

Welcome to Romain Gantois and Louis Chauvet

We are pleased to welcome two additional engineers to our team based in Toulouse, France: Romain Gantois and Louis Chauvet.

Romain Gantois graduated from ISEP and completed his final internship at Bootlin, during which he developed and published Snagboot, the generic and open-source board recovery and reflashing tool, and worked on an upstream Linux kernel driver for a Qualcomm Ethernet switch (patches will be submitted soon!). Following this internship, Romain is joining our team as a full-time embedded Linux and Linux kernel engineer.

Louis Chauvet graduated from INSA Toulouse. He completed his final internship abroad, during which he worked on Rust development, in particular Linux kernel drivers written in Rust. Louis is also joining us as a full-time embedded Linux and Linux kernel engineer.

Both Romain and Louis are experienced Linux users and developers, with a solid education in low-level and embedded systems development. They will help us address more embedded Linux projects from our customers on a wide variety of topics, and are already benefiting from our training courses and the interaction with our senior engineers to quickly gain even more knowledge and experience.

Once again, welcome Romain and Louis!

Feedback from ELCE 2023: selection of talks #3

As we reported in a previous blog post, almost the entire Bootlin engineering team was at the Embedded Linux Conference Europe in Prague in June. In order to share with our readers more about what happened at this conference, we have asked all engineers at Bootlin to select one talk they found interesting and useful and share a short summary of it. We will share this feedback in a series of blog posts: first post, second post, this one being the third of the series.

rtla timerlat: Debugging Real-time Linux Scheduling Latency

Talk by Daniel Bristot de Oliveira, chosen by Bootlin engineer Maxime Chevallier.

Talks related to real-time Linux debugging are pretty common at ELCE; I gave one myself in 2017 and I’ve been attending most of them since then. Besides a headache, what I got from attending all these talks is that this topic is complex and time-consuming, and that there are many different methodologies one can use to find the cause of these elusive problems.

Users who aren’t very familiar with the inner workings of the Linux kernel can ask for help on mailing lists, and the reply usually asks for a trace. This is where things get complicated: the Linux kernel tracer is very powerful, but can drown users in a flood of trace events from which it is difficult to extract the relevant data.

Hopefully, Daniel’s talk is going to make this kind of talk less common, as the tool he wrote and presented, rtla, makes it easy to gather important information about the cause of undesired latencies. By using cleverly placed tracepoints, in-kernel testing tools (timerlat and osnoise) and an automated trace analyzer, rtla can not only detect latencies as cyclictest would, but also tell you what caused them. If it’s a blocking problem, rtla tells you which process is blocking your task. If it’s interference, rtla will tell you which task or interrupt caused the latency, and can even detect if the hardware itself is the culprit.

For developers, this tool is also a perfect way to gather user feedback and bug reports that are small, precise and easily reproducible.

I therefore strongly recommend checking out Daniel’s talk and his dedicated blog article.

Slides: PDF
Video: Youtube

Zbus – the Lightweight and Flexible Zephyr Message Bus

Talk by Rodrigo Peixoto, chosen by Bootlin engineer Thomas Perrot

Zbus is a new message bus for Zephyr that allows threads to easily communicate with many others. This bus allows implementing several bus topologies:

    • one-to-one
    • one-to-many
    • many-to-many

In addition, it can be used on very constrained systems.

In this talk, Rodrigo explained in detail how Zbus works, through a few examples. A thread can read from or publish to bus channels, and when a message is published to a channel:

      • The listeners’ callbacks are executed
      • A notification is pushed to the subscribers’ queues
      • The subscribers then run in priority order

The bus is managed by a dispatcher called the Virtual Distributed Event Dispatcher (VDED), which is robust to priority inversion.

We found Zbus to be a very interesting feature because, before it, there was no easy way to implement one-to-many and many-to-many topologies, or even one-to-one communication, without having to deal with priority inversion and to use FIFOs, LIFOs, pipes, etc.

Slides: PDF
Video: Youtube

Linux Power ! (from the Perspective of a PMIC Vendor)

Talk by Matti Vaittinen, chosen by Bootlin engineer Kamel Bouhara.

PMICs (Power Management Integrated Circuits) are a key component of low-power embedded systems, as they often handle the complexity of controlling the various supply voltages required by SoCs. In his talk, Matti Vaittinen started by depicting the various devices that can be embedded in a PMIC: watchdogs, RTCs and GPIOs are examples of such extra functionality. He reminded us why such devices are best handled by the Linux MFD subsystem, to take advantage of existing code. However, the main subsystem used to implement support for a PMIC is the regulator subsystem, and the talk gave us a good understanding of how it works: the concept of provider/consumer, how to register multiple regulators for a PMIC and how to handle specific events. The talk also focused on error detection, and on how over-current errors are reported in three categories:

      • PROTECTION: hardware-level errors, reported when a protection limit is reached
      • ERROR: unrecoverable errors that don’t directly involve a hardware shutdown
      • WARNING: the system is still recoverable, but specific action needs to be taken

Some PMICs also provide IRQs to notify errors or events and the kernel provides a helper function to handle such notifications and map them to specific actions depending on their severity.

Overall, we found this talk interesting to better understand the features provided by PMICs, and how these features are supported by Linux.

Slides: PDF
Video: Youtube

Linux 6.5 released, Bootlin contributions

Linux 6.5 was released yesterday, with as usual over 10,000 commits from a large number of contributors. We recommend reading LWN.net articles on the merge window (part 1, part 2), but also the CNX Software page that focuses on embedded-related improvements.

Bootlin contributed 76 commits to this kernel release, putting us as the #26 contributing company. This time around, our main contributions have been:

  • The large stack of patches from Luca Ceresoli on the NVIDIA Tegra camera interface driver finally landed: it adds support for the Tegra20 parallel camera interface to the existing driver, which required a lot of changes to a driver that so far only supported Tegra210 CSI. This work allows one of our customers, who was stuck on an old NVIDIA vendor kernel, to move to an upstream Linux kernel.
  • Hervé Codina contributed a driver for the Renesas X9250 potentiometer, in the IIO subsystem. This will be followed in Linux 6.6 by a glue driver that allows exposing an IIO device as an auxiliary device in the ALSA subsystem, allowing this potentiometer to be used in audio applications.
  • Alexis Lothoré contributed support for the Marvell MV88E6361 Ethernet switch into the existing mv88e6xxx DSA driver
  • Maxime Chevallier contributed a new regmap-based MDIO driver, which required some changes in the regmap code. This allows the Altera TSE driver to use the existing Lynx PCS driver, and drop the custom Altera TSE PCS driver. Finally, the stmmac Ethernet driver is modified to be able to use the Lynx PCS driver as well. Quite an adventure to finally get proper PCS support with stmmac
  • Miquèl Raynal contributed improvements to the 802.15.4 stack, especially related to scanning support.
  • Miquèl Raynal contributed fixes to the sja1000 CAN driver (to avoid overrun stalls on Renesas processors), to the SPI subsystem (to avoid false timeouts for long transfers), to the DMA engine driver for Xilinx XDMA IP, and a few more.
  • Miquèl Raynal also continued his effort of improving the Device Tree bindings for MTD NAND controllers
  • Luca Ceresoli added sound card support to the MSC SM2-MB-EP1 carrier board, which hosts an i.MX8MP SoM, and he also fixed the timings for one of the panels supported by the simple-panel driver

Here are the details of all our changes that went into Linux 6.5:

Bootlin toolchains 2023.08 released

We are happy to announce that we have just published a new update of our freely available toolchains at toolchains.bootlin.com, version 2023.08.

For the record, we provide pre-built cross-compilation toolchains that work on x86-64 Linux machines, targeting 43 different CPU architecture variants, with support for all 3 major C libraries: glibc, musl and uClibc-ng. For each toolchain, we provide two versions: a stable one that uses the second-to-last GCC/binutils/GDB versions, and a bleeding-edge one that uses the very latest GCC/binutils/GDB versions.

In this 2023.08 release, we have:

  • Updated the bleeding-edge toolchains to gcc 13.2, binutils 2.41, gdb 13.2, kernel headers 5.10, glibc 2.37, musl 1.2.4 or uclibc-ng 1.0.43
  • Updated the stable toolchains to gcc 12.3, binutils 2.40, gdb 12.1, kernel headers 4.14, glibc 2.37, musl 1.2.4 or uclibc-ng 1.0.43
  • Marked the sparcv8 toolchain as obsolete as sparc support in GCC has been broken for several releases, and the last working version of GCC for sparc has been dropped from Buildroot

A special thanks to Romain Naour from Smile who helped investigate and resolve some of the issues encountered in the preparation of those 2023.08 toolchains.

If you encounter any issue when using these toolchains, or need support for a specific feature or architecture variant, let us know through the issue tracker. We hope these toolchains will continue to be useful to the community.

Bootlin collaborates with DENT to upstream ONIE NVMEM support in Linux

The DENT project is a Linux Foundation project that aims to use the Linux kernel, Switchdev, and other Linux-based projects as the basis for building a new standardized network operating system without abstractions or overhead.

Recently, Bootlin collaborated with the DENT project to work on a specific topic: extending the Linux kernel NVMEM subsystem to support the ONIE TLV storage format, which is used on ONIE-compliant network equipment to store various information about the device in an EEPROM: serial number, model, MAC addresses, and more.

This work, led by Bootlin engineer Miquèl Raynal, has now landed in Linux 6.4 as the drivers/nvmem/layouts/onie-tlv.c driver, together with the underlying new NVMEM layout infrastructure, which Miquèl helped to upstream in collaboration with Michael Walle.

We have written and published a longer blog post on the DENT website to explain the motivation for this effort and the results.

Back from the Embedded Linux Conference Europe 2023

From June 28 to June 30, Bootlin participated in the Embedded Linux Conference Europe, which was organized as part of the new and larger Embedded Open Source Summit.

In addition, the day before the conference, on June 27, our team had a great team-building event: visiting Prague, having lunch in a traditional restaurant, enjoying a boat tour on the Vltava river, and spending the evening with a traditional dinner and folklore music. As our team is distributed, conferences are a great opportunity to meet each other, and for several members of our team, Prague was their first in-person meeting.

With 14 Bootlin engineers at the conference, almost our entire engineering team participated. Indeed, we have a policy at Bootlin to offer all our engineers, regardless of their seniority level, the chance to attend 2 technical conferences each year.


Linux 6.4 released, Bootlin contributions inside

Linux 6.4 was released on June 25, just before the start of the Embedded Open Source Summit in Prague. As usual, there are lots of changes in Linux 6.4, and we recommend reading the LWN coverage of the merge window (part 1, part 2). Sadly, the usual KernelNewbies page hasn’t received much attention; contributions are probably welcome to revive this useful resource.

With 59 commits from Bootlin engineers, Bootlin is ranked as the #28 contributing company by number of commits for this 6.4 release, according to contribution statistics. Our main contributions have been:

  • Alexis Lothoré and Clément Léger contributed a few fixes to the Renesas RZ/N1 A5PSW Ethernet switch driver
  • Hervé Codina contributed a number of new drivers needed to support complex audio setups on some relatively old Freescale PowerPC 32-bit platforms: a driver for the Time Slot Assigner (TSA), a driver for the QUICC Multichannel Controller (QMC), and an ALSA driver that provides audio support over QMC. We have more contributions coming in this area, most notably to support HDLC network traffic over QMC.
  • Kamel Bouhara added support for the TI TAS5733 audio codec in the existing tas571x driver
  • Luca Ceresoli improved the fsl-ldb driver, used on NXP i.MX8MP and i.MX93 for the built-in DPI-to-LVDS encoder. Luca’s improvement allows using LVDS channel 1 only, while the driver initially supported using either LVDS channel 0 alone, or LVDS channels 0 and 1 combined.
  • Maxime Chevallier contributed an improvement to the regmap code, which allows upshifting register addresses before performing operations
  • Maxime Chevallier also contributed some small fixes to the phylink code related to previous work on QUSGMII support
  • Miquèl Raynal contributed support for Read-While-Write in the MTD SPI-NOR subsystem. This allows performing read operations while erase/program operations are ongoing, which helps to reduce read latencies. Of course, this only works on SPI NOR chips that support this feature.
  • Miquèl Raynal contributed several improvements to the NVMEM subsystem. First, a brand new NVMEM driver capable of parsing the ONIE TLV information, as defined by the ONIE spec used on network equipment. Second, he contributed changes that allow NVMEM layout drivers to be compiled as kernel modules rather than being built-in

And the full details of our contributions:

New training course: Embedded Linux Audio

We are very happy to announce the availability of a new training course in our portfolio: Embedded Linux audio.

Over the past years, Bootlin has helped more and more of its customers with numerous aspects of audio on embedded Linux systems: development of Linux kernel drivers for audio components, description of audio hardware in Device Tree, support of unusual audio hardware setups, integration of user-space audio frameworks and servers such as PipeWire, and more. We have seen an interest from our customers and the broader community in getting trained on those topics, so we have built a brand new training course covering the following:

  • Digital Audio Representation
  • Audio hardware
  • Linux kernel ASoC subsystem
  • Linux kernel helpers for audio
  • Audio routing
  • More kernel audio components
  • Audio troubleshooting and debugging
  • User-space configuration for audio hardware
  • User-space configuration for audio controls
  • User-space APIs to play and capture audio
  • PipeWire
  • GStreamer

The detailed agenda of the course is available for on-line sessions (4 half-days of 4 hours each) and on-site sessions (2 days). As usual with Bootlin, our training materials will be published for free under an open-source license in the next few weeks.

This course has been developed and is taught by Bootlin expert Alexandre Belloni.

We have a first public on-line session scheduled on September 11-14 2023, with a possible extra session on September 15. Sessions take place from 2 PM to 6 PM UTC+2 on each day. Seats are offered at 619 EUR per participant, with a discount at 519 EUR per participant under conditions. You can book your seat now; beware that only 12 seats are available.

This new training course is the 9th training course we offer in our portfolio, with all courses centered around embedded Linux development. We aim at developing more of those specific courses in the next few years, to continue to help engineers working on embedded Linux grow their skills and expertise.