Supporting a misbehaving NAND ECC engine

Over the years, Bootlin has built significant expertise in U-Boot and Linux support for flash memory devices. Thanks to this expertise, we were recently in charge of rewriting and upstreaming a driver for the Arasan NAND controller, which is used in a number of Xilinx Zynq SoCs. Supporting this NAND controller turned out to raise some interesting challenges because of the peculiarities of its ECC engine. In this blog post, we would like to give some background about ECC issues with NAND flash devices, and then dive into the specific issues that we encountered with the Arasan NAND controller, and how we solved them.

Ensuring data integrity

NAND flash memories are known to be intrinsically rather unstable: over time, external conditions or repetitive access to a NAND device may result in the data being corrupted. This is particularly true with newer chips, where the number of corruptions usually increases with density, requiring even stronger corrections. To mitigate this, Error Correcting Codes are typically used to detect and correct such corruptions, and since the calculations related to ECC detection and correction are quite intensive, NAND controllers often embed a dedicated engine, the ECC engine, to offload those operations from the CPU.

An ECC engine typically acts as a DMA master, moving and correcting data and calculating syndromes on the fly between the controller FIFOs and the user buffer. The engine correction is characterized by two inputs: the size of the data chunks on which the correction applies and the strength of the correction. Old SLC (Single Level Cell) NAND chips typically require a strength of 1 symbol over 4096 (1 bit/512 bytes) while new ones may require much more: 8, 16 or even 24 symbols.

In the write path, the ECC engine reads a user buffer and computes a code for each chunk of data. NAND pages being longer than officially advertised, there is a persistent Out-Of-Band (OOB) area which may be used to store these codes. When reading data, the ECC engine is fed with the data coming from the NAND bus, including the OOB area. Chunk by chunk, the engine will do some math and correct the data if needed, and then report the number of corrected symbols. If the number of errors is higher than the chosen strength, the engine is not capable of any correction and returns an error.

The Arasan ECC engine

As explained in our introduction, as part of our work on upstreaming the Arasan NAND controller driver, we discovered that this NAND controller IP has a specific behavior in terms of how it reports ECC results: the hardware ECC engine never reports errors. Whether the data has been successfully corrected or is uncorrectable, the engine behaves the same. From a software point of view, this is a critical flaw and fully relying on such hardware was not an option.

To overcome this limitation, we investigated different solutions, which we detail in the sections below.

Suppose there will never be any uncorrectable error

Let’s be honest, this hypothesis is highly unreliable. Besides, it would imply that we do not differentiate between written and erased pages, and users would receive unclean buffers (with bitflips), which would not work with upper layers such as UBI/UBIFS, which expect clean data.

Keep a history of bitflips of every page

This way, during a read, it would be possible to compare the evolution of the number of bitflips. If it suddenly drops significantly, the engine is lying and we are facing an error. Unfortunately, it is not a reliable solution either, because we would have to either trigger a write operation every time a read happens (slowing down the I/Os a lot and wearing out the storage device very quickly) or lose the tracking after every power cycle, which would make this solution very fragile.

Add a CRC16

This CRC16 could live in the OOB area and help to manually verify the data integrity after the engine’s correction by checking the data against the checksum. This could be acceptable, even if not perfect in terms of collisions. However, it would not work with existing data, and there are already many downstream users of the vendor driver.
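To make this option more concrete, here is a minimal sketch of such an integrity check. The CRC variant (CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF) and the function names are assumptions chosen for illustration. The idea above only calls for "a CRC16", and this option is not the one we retained.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no bit
 * reflection, no final XOR. The choice of variant is an assumption made
 * for this sketch.
 */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
	uint16_t crc = 0xFFFF;

	for (size_t i = 0; i < len; i++) {
		/* Feed the next byte into the top of the register. */
		crc ^= (uint16_t)((uint16_t)data[i] << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x1021);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}

/*
 * Hypothetical helper: return 1 when the (supposedly corrected) data
 * still matches the CRC that was stored in the OOB area at write time.
 */
int page_crc_matches(const uint8_t *data, size_t len, uint16_t stored_crc)
{
	return crc16_ccitt(data, len) == stored_crc;
}
```

With only 16 bits of checksum, a corrupted buffer still has a small chance of matching the stored value, which is the collision concern mentioned above.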

Use a bitwise XOR between raw and corrected data

By doing a bitwise XOR between raw and corrected data, and comparing the result with the number of bitflips reported by the engine, we could detect if the engine is lying about the number of corrected bitflips. This solution has actually been implemented and tested. It involves extra I/Os as the page must be read twice: first with correction and then again without correction. Hence, the NAND bus throughput becomes a limiting factor. In addition, when there are too many bitflips, the engine still tries to correct data and creates bitflips by itself. The result is that, with just a XOR, we cannot discriminate a working correction from a failure. The following figure shows the issue.
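The comparison described in this section can be sketched in a few lines of C. This is only an illustration of the principle, not the code that was tested on the Arasan controller: the function names and the way the reported count is passed in are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits that differ between the raw and the corrected buffers. */
unsigned int count_bitflips(const uint8_t *raw, const uint8_t *corrected,
			    size_t len)
{
	unsigned int flips = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t diff = raw[i] ^ corrected[i];

		/* Clear the lowest set bit until no difference is left. */
		while (diff) {
			diff &= (uint8_t)(diff - 1);
			flips++;
		}
	}
	return flips;
}

/*
 * Hypothetical check: does the number of bits changed by the engine
 * match the number of corrections it reported?
 */
int engine_report_matches(const uint8_t *raw, const uint8_t *corrected,
			  size_t len, unsigned int reported)
{
	return count_bitflips(raw, corrected, len) == reported;
}
```

As explained above, a match is not a proof of success: past the correction strength, the engine flips bits on its own, so the count can look plausible while the data is wrong.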

Rely on the hardware only in the write path

Using the hardware engine in the write path is fine (and possibly the quickest solution). Instead of trying to work around the flaws of the read path, we can do the math in software to derive the syndrome in the read path and compare it with the one in the OOB section. If it does not match, it means we are facing an uncorrectable error. This is the solution that we finally chose. Of course, if we want to compare software and hardware calculated ECC bytes, we must find a way to reproduce the hardware calculations, and this is what we are going to explore in the next sections.

Reversing a hardware BCH ECC engine

There is already a BCH library in the Linux kernel on which we could rely to compute BCH codes. What needed to be identified, though, were the initial BCH parameters. In particular:

The BCH primary polynomial, from which is derived the generator polynomial. The latter is then used for the computation of BCH codes.
The range of data on which the derivation would apply.

There are several thousand possible primary polynomials with a form like x^3 + x^2 + 1. In order to represent these polynomials more easily in software, we use integers or binary arrays. In both cases, each bit represents the coefficient of the power of x corresponding to its position. The above example could be represented by 0b1101 or 0xD.
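As a quick illustration of this representation, the sketch below builds the integer form of a polynomial from its exponents and reads its degree back. Both helper names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Build the integer form of a binary polynomial from the list of its
 * exponents: bit i is set when the x^i term is present.
 * Every exponent must be lower than 32.
 */
uint32_t poly_from_exponents(const unsigned int *exponents, size_t count)
{
	uint32_t poly = 0;

	for (size_t i = 0; i < count; i++)
		poly |= UINT32_C(1) << exponents[i];
	return poly;
}

/* Degree of the polynomial: position of its highest set bit (-1 for 0). */
int poly_degree(uint32_t poly)
{
	int degree = -1;

	while (poly) {
		poly >>= 1;
		degree++;
	}
	return degree;
}
```

Passing the exponents 3, 2 and 0 (for x^3 + x^2 + 1) gives 0xD, the value quoted above.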

For a given desired BCH code (i.e. the ECC chunk size and hence its corresponding Galois Field order), there is a limited range of possible primary polynomials which can be used. Given eccsize being the amount of data to protect, expressed in bits, the Galois Field order is the smallest integer m such that: 2^m > eccsize. Knowing m, one can check these tables to see examples of polynomials which could match (non exhaustive). As the Arasan ECC engine supports two possible ECC chunk sizes of 512 and 1024 bytes, we had to look at the tables for m = 13 and m = 14.
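For readers without such tables at hand, candidates can also be checked by brute force. The sketch below is a hypothetical helper, not what GNU Octave or the kernel does internally: it relies on the property that a polynomial of degree m is usable here (primitive, in the usual terminology) when the powers of x, reduced modulo that polynomial, only come back to 1 after exactly 2^m - 1 steps.

```c
#include <stdint.h>

/*
 * Return 1 if `poly` (integer form, degree exactly m, 1 <= m <= 30)
 * generates the whole multiplicative group of GF(2^m), 0 otherwise.
 */
int is_primitive(uint32_t poly, unsigned int m)
{
	uint32_t top, order, state = 1;	/* state starts at x^0 */

	if (m < 1 || m > 30)
		return 0;
	top = UINT32_C(1) << m;
	order = top - 1;		/* 2^m - 1 */

	/* Degree must be exactly m and the constant term must be set. */
	if ((poly >> m) != 1 || !(poly & 1))
		return 0;

	for (uint32_t i = 1; i <= order; i++) {
		state <<= 1;		/* multiply by x */
		if (state & top)
			state ^= poly;	/* reduce modulo poly */
		if (state == 1)
			return i == order;
	}
	return 0;
}
```

For example, is_primitive(0xD, 3) accepts x^3 + x^2 + 1, while is_primitive(0x9, 3) rejects x^3 + 1 because x^3 already equals 1. Looping over all odd values between 2^m and 2^(m+1) gives a complete candidate list for a given m.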

Given the required strength t, the number of needed parity bits p is: p = t x m.

The total amount of manipulated data (ECC chunk, parity bits, possible padding) n, also called BCH codeword in papers, is: n = 2^m - 1.

Given the size of the codeword n and the number of parity bits p, it is then possible to derive the maximum message length k with: k = n - p.

The theory of BCH also shows that if (n, k) is a valid BCH code, then (n - x, k - x) will also be valid. In our situation this is very interesting. Indeed, we want to protect eccsize number of symbols, but we currently cover k within n. In other words we could use the translation factor x being: x = k - eccsize. If the ECC engine was also protecting some part of the OOB area, x should have been extended a little bit to match the extra range.
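The formulas above are easy to check with a few lines of C. The structure and function names below are only illustrative. The one assumption worth stating is that eccsize is converted to bits before being compared with 2^m, which is what the values printed by our Octave program imply.

```c
/* Parameters of a shortened BCH code, all lengths expressed in bits. */
struct bch_params {
	unsigned int m;	/* Galois Field order */
	unsigned int n;	/* codeword length: 2^m - 1 */
	unsigned int p;	/* parity bits: t * m */
	unsigned int k;	/* maximum message length: n - p */
	unsigned int x;	/* translation factor: k - eccsize */
};

/*
 * Derive the parameters from the ECC chunk size (in bytes) and the
 * strength t. The caller is expected to pass a size/strength pair for
 * which k >= eccsize, otherwise x would wrap around.
 */
struct bch_params bch_derive_params(unsigned int eccsize_bytes, unsigned int t)
{
	const unsigned int eccsize = eccsize_bytes * 8;
	struct bch_params prm;

	/* Smallest m such that 2^m > eccsize. */
	prm.m = 1;
	while ((1u << prm.m) <= eccsize)
		prm.m++;

	prm.n = (1u << prm.m) - 1;
	prm.p = t * prm.m;
	prm.k = prm.n - prm.p;
	prm.x = prm.k - eccsize;
	return prm;
}
```

Calling bch_derive_params(1024, 24) yields m = 14, n = 16383, p = 336, k = 16047 and x = 7855, and a 512-byte chunk gives m = 13.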

With all this theory in mind, we used GNU Octave to brute force the BCH polynomials used by the Arasan ECC engine with the following logic:

Write a NAND page with an eccsize-long ECC step full of zeros, and another one full of ones: this is our known set of inputs.
Extract each BCH code of p bits produced by the hardware: this is our known set of outputs.

For each possible primary polynomial with the Galois Field order m, we derive a generator polynomial, use it to encode both input buffers thanks to a regular BCH derivation, and compare the output syndromes with the expected output buffers.

Because the GNU Octave program was not tricky to write, we first tried to match the output of the Linux software BCH engine. Since Linux uses by default the primary polynomial which comes first in GNU Octave’s list for the desired field order, it was quite easy to verify that the algorithm worked.

As unfortunate as it sounds, running this test with the hardware data did not give any match. Looking more in depth, we realized that visually, there was something like a matching pattern between the output of the Arasan engine and the output of the Linux software BCH engine. In fact, both syndromes were identical, except that the bits were swapped at byte level by the hardware. This observation was made possible because the input buffers have the same values no matter the bit ordering. By extension, we also figured out that swapping the bits in the input buffer was necessary as well.
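Swapping the bits at byte level means reversing the bit order inside each byte, while the bytes themselves stay in place. A minimal sketch could look like this. The helper names are ours and this is not the code that went into the kernel BCH library.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order inside one byte (bit 0 <-> bit 7, and so on). */
uint8_t swap_bits(uint8_t byte)
{
	/* Swap nibbles, then bit pairs, then neighbouring bits. */
	byte = (uint8_t)(((byte & 0xF0) >> 4) | ((byte & 0x0F) << 4));
	byte = (uint8_t)(((byte & 0xCC) >> 2) | ((byte & 0x33) << 2));
	byte = (uint8_t)(((byte & 0xAA) >> 1) | ((byte & 0x55) << 1));
	return byte;
}

/* Apply the swap in place to every byte of a buffer. */
void swap_bits_in_buffer(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = swap_bits(buf[i]);
}
```

Buffers full of 0x00 or 0xFF are left unchanged by this operation, which is why our all-zeros and all-ones inputs made the pattern visible on the output side first.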

The primary polynomial for an eccsize of 512 bytes being already found, we ran the program again with eccsize being 1024 bytes:

eccsize = 1024
eccstrength = 24
m = 14
n = 16383
p = 336
k = 16047
x = 7855
Trying primary polynomial #1: 0x402b
Trying primary polynomial #2: 0x4039
Trying primary polynomial #3: 0x4053
Trying primary polynomial #4: 0x405f
Trying primary polynomial #5: 0x407b
[...]
Trying primary polynomial #44: 0x43c9
Trying primary polynomial #45: 0x43eb
Trying primary polynomial #46: 0x43ed
Trying primary polynomial #47: 0x440b
Trying primary polynomial #48: 0x4443
Primary polynomial found! 0x4443

Final solution

With the two possible primary polynomials in hand, we could finish the support for this ECC engine.

At first, we tried a “mixed-mode” solution: read and correct the data with the hardware engine, then re-read the data in raw mode, calculate the syndrome over the raw data, derive the number of roots of the syndrome, which represents the number of bitflips, and compare it with the hardware engine’s output. As finding the location of the syndrome’s roots (i.e. the bitflip offsets) is very time consuming for the machine, we decided not to do it in order to save some time. This approach worked, but doing the I/Os twice slowed down the read speed considerably, much more than expected.

The final approach has been to actually get rid of any hardware computation in the read path, delegating all the work to Linux BCH logic, which indeed worked noticeably faster.

The overall work is now in the upstream Linux kernel:

Bit-swapping support in the Linux kernel BCH library: lib/bch: Allow easy bit swapping
The Arasan NAND controller driver, first without hardware ECC support: mtd: rawnand: arasan: Add new Arasan NAND controller
The addition of hardware ECC support to the Arasan NAND controller driver:
mtd: rawnand: arasan: Support the hardware BCH ECC engine

If you’re interested in more details on ECC for flash devices, and their support in Linux, we will be giving a talk precisely on this topic at the upcoming Embedded Linux Conference!

Bootlin contributes SquashFS support to U-Boot

SquashFS is a very popular read-only compressed root filesystem, widely used in embedded systems. It has been supported in the Linux kernel for many years, but so far the U-Boot bootloader did not have support for SquashFS, so it was not possible to load a kernel image or a Device Tree Blob from a SquashFS filesystem in U-Boot.

Between February 2020 and August 2020, João Marcos Costa, from the ENSICAEN engineering school, worked at Bootlin as an intern. João’s internship goal was specifically to implement and contribute to U-Boot the support for the SquashFS filesystem. We are happy to announce that João’s effort is now complete, as the support for SquashFS is now in upstream U-Boot. It can be found in fs/squashfs/ in the U-Boot source code.

More specifically, João’s contributions have been:

fs/squashfs: new filesystem, this is the core of the contribution, the SquashFS filesystem driver itself
fs/squashfs: add filesystem commands, which adds the sqfsls and sqfsload commands in U-Boot
include/u-boot, lib/zlib: add sources for zlib decompression and fs/squashfs: add support for zlib decompression, which add support for zlib decompression in the SquashFS driver
test/py: Add tests for the SquashFS commands, which extends the U-Boot test suite to also test the SquashFS filesystem support

In addition to those contributions already merged, João has also submitted for inclusion the support for LZO and ZSTD decompression.

Practically speaking, this SquashFS support works very much like the support for other filesystems. At build time, you need to enable the CONFIG_FS_SQUASHFS option for the SquashFS driver itself, and CONFIG_CMD_SQUASHFS for the SquashFS U-Boot commands. Once enabled, in U-Boot, you get:

=> sqfsls     
sqfsls - List files in directory. Default: root (/).
 
Usage:
sqfsls <interface> [<dev[:part]>] [directory]
    - list files from 'dev' on 'interface' in 'directory'
 
=> sqfsload 
sqfsload - load binary file from a SquashFS filesystem
 
Usage:
sqfsload <interface> [<dev[:part]> [<addr> [<filename> [bytes [pos]]]]]
    - Load binary file 'filename' from 'dev' on 'interface'
      to address 'addr' from SquashFS filesystem.
      'pos' gives the file position to start loading from.
      If 'pos' is omitted, 0 is used. 'pos' requires 'bytes'.
      'bytes' gives the size to load. If 'bytes' is 0 or omitted,
      the load stops on end of file.
      If either 'pos' or 'bytes' are not aligned to
      ARCH_DMA_MINALIGN then a misaligned buffer warning will
      be printed and performance will suffer for the load.

sqfsls is used to list files. Here is the list of files from a typical Linux root filesystem:

=> sqfsls mmc 0:1
            bin/
            boot/
            dev/
            etc/
            lib/
    <SYM>   lib32
    <SYM>   linuxrc
            media/
            mnt/
            opt/
            proc/
            root/
            run/
            sbin/
            sys/
            tmp/
            usr/
            var/
 
2 file(s), 16 dir(s)

And then you can use sqfsload to load files, which we illustrate here by loading a Linux kernel image and Device Tree blob, and booting this kernel:

=> sqfsload mmc 0:1 $kernel_addr_r /boot/zImage
6160384 bytes read in 433 ms (13.6 MiB/s)
=> sqfsload mmc 0:1 0x81000000 /boot/am335x-boneblack.dtb
40817 bytes read in 11 ms (3.5 MiB/s)
=> setenv bootargs console=ttyO0,115200n8
=> bootz $kernel_addr_r - 0x81000000
## Flattened Device Tree blob at 81000000
   Booting using the fdt blob at 0x81000000
   Loading Device Tree to 8fff3000, end 8fffff70 ... OK
 
Starting kernel ...
 
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.19.79 (joaomcosta@joaomcosta-Latitude-E7470) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05)) #1 SMP Fri May 29 18:26:39 CEST 2020
[    0.000000] CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=10c5387d

Of course, the SquashFS driver is still fresh, and there is a chance that more extensive and widespread testing will uncover a few bugs or limitations, which we’re sure the broader U-Boot community will help address. Overall, we’re really happy to have contributed this new functionality to U-Boot: it will be useful for our projects, and we hope it will be useful to many others in the embedded Linux community!

Linux 5.8 released: Bootlin contributions

Linux 5.8 was released recently. See our usual resources for a good coverage of the highlights of this new release: KernelNewbies page, LWN.net article on the first part of the merge window, LWN.net article on the second part of the merge window.

On our side, we contributed a total of 155 commits to Linux 5.8, which makes Bootlin the 19th contributing company by number of commits according to Linux Kernel Patch Statistic. The highlights of our contributions are:

Miquèl Raynal contributed a completely new NAND controller driver: the arasan-nand-controller driver, used on Xilinx platforms.
In the MTD subsystem, Miquèl Raynal, as one of the co-maintainers, made a substantial number of contributions: cleanups in the nandsim driver, drop of the nand_release() API, support in the NAND core for the specificities of the arasan-nand-controller driver in terms of ECC handling (we will soon publish a blog post on this topic!)
On the support of Atmel/Microchip platforms
- Alexandre Belloni migrated the SAMA5D3, AT91SAM9N12, AT91RM9200 and AT91SAM9G45 Device Tree files to use the new clock DT bindings
- Grégory Clement modified the atmel_usba_udc USB device controller driver to no longer require describing all USB endpoints in the Device Tree, since they are always the same for a given SoC.
Grégory Clement contributed a number of improvements and fixes for the n_gsm line discipline driver, which allows multiplexing a UART used to communicate with a GSM modem. These improvements and fixes allowed the n_gsm driver to be fully stable for one of our customers.
In the RTC subsystem, Alexandre Belloni (maintainer of that subsystem) did a number of small improvements to various RTC drivers.
Antoine Ténart has done a number of improvements in the support for Microchip/Microsemi networking products: improvements to the mscc-miim MDIO driver, improvements to the MSCC Ocelot Ethernet switch driver, improvements to the MSCC Ethernet PHY Driver.

Also, several Bootlin engineers are maintainers of various areas of the Linux kernel:

Miquèl Raynal, as the NAND maintainer and MTD co-maintainer, reviewed and merged 57 patches from other contributors
Alexandre Belloni, as the RTC maintainer and Microchip platform support co-maintainer, reviewed and merged 54 patches from other contributors
Grégory Clement, as the Marvell EBU platform support co-maintainer, reviewed and merged 13 patches from other contributors

Here is the complete list of our contributions:

Linux 5.7 released, Bootlin contributions

We’re late to the party as Linux 5.8 is going to be released in a few weeks, but we never published about our contribution to the current Linux stable release, Linux 5.7, so here is our usual summary! For an overview of the major changes in 5.7, KernelNewbies has a nice summary, as well as LWN, in two parts: part 1 and part 2.

Bootlin contributed 92 commits to this release, a small number of contributions compared to past releases, but nevertheless with some significant work:

Antoine Ténart contributed support for offloading the MACsec encryption/decryption to a PHY in the networking stack, as well as the corresponding offloading support for some specific Microchip/Vitesse Ethernet PHYs. See our blog post for more details about this feature.
Alexandre Belloni continued converting the Atmel/Microchip platforms to the new clock representation, with this time AT91SAM9G45, SAMA5D3, AT91SAM9N12 and AT91RM9200.
Alexandre Belloni, as the RTC subsystem maintainer, again did a lot of cleanup and improvements in multiple RTC drivers.
Kamel Bouhara contributed support for I2C recovery for the Atmel/Microchip platforms.

In terms of maintainers activity: Miquèl Raynal, as the MTD co-maintainer, merged 62 patches from other contributors, Alexandre Belloni, the RTC maintainer and Atmel/Microchip platform co-maintainer merged 49 patches from other contributors, while Grégory Clement, as the Marvell EBU platforms co-maintainer, merged 11 patches from other contributors.

Here is the detail of our contributions for 5.7:

Measured boot with a TPM 2.0 in U-Boot

A Trusted Platform Module, in short TPM, is a small piece of hardware designed to provide various security functionalities. It offers numerous features, such as storing secrets, ‘measuring’ boot, and may act as an external cryptographic engine. The Trusted Computing Group (TCG) delivers a document called TPM Interface Specifications (TIS) which describes the architecture of such devices and how they are supposed to behave as well as various details around the concepts.

These TPM chips are either compliant with the first specification (up to 1.2) or the second specification (2.0+). The TPM2.0 specification is not backward compatible and this is the one this post is about. If you need more details, there are many documents available at https://trustedcomputinggroup.org/.

Figure: Trusted Platform Module connected over SPI to a Marvell EspressoBin platform

Among the functions listed above, this blog post will focus on the measured boot functionality.

Measured boot principles

Measuring boot is a way to inform the last software stage if someone tampered with the platform. It is impossible to know what has been corrupted exactly, but knowing someone has is already enough to not reveal secrets. Indeed, TPMs offer a small secure locker where users can store keys, passwords, authentication tokens, etc. These secrets are not exposed anywhere (unlike with any standard storage media) and TPMs have the capability to release these secrets only under specific conditions. Here is how it works.

Starting from a root of trust (typically the SoC Boot ROM), each software stage during the boot process (BL1, BL2, BL31, BL33/U-Boot, Linux) is supposed to do some measurements and store them in a safe place. A measurement is just a digest (let’s say, a SHA256) of a memory region. Usually each stage will ‘digest’ the next one. Each digest is then sent to the TPM, which will merge this measurement with the previous ones.

The hardware feature used to store and merge these measurements is called Platform Configuration Registers (PCR). At power-up, a PCR is set to a known value (either 0x00s or 0xFFs, usually). Sending a digest to the TPM is called extending a PCR because the chosen register will extend its value with the one received with the following logic:

PCR[x] := sha256(PCR[x] | digest)

This way, a PCR can only evolve in one direction and never go back unless the platform is reset.
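The chaining property can be illustrated with a toy model. To keep the example short and self-contained, the sketch below replaces SHA-256 with a 64-bit FNV-1a hash, which is NOT cryptographically secure and is not what a TPM uses. Only the structure "new PCR = hash(old PCR followed by digest)" is representative, and all names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for SHA-256: 64-bit FNV-1a. For illustration only. */
static uint64_t fnv1a64(const uint8_t *data, size_t len)
{
	uint64_t hash = UINT64_C(0xcbf29ce484222325);

	for (size_t i = 0; i < len; i++) {
		hash ^= data[i];
		hash *= UINT64_C(0x100000001b3);
	}
	return hash;
}

/*
 * Toy extend operation: the new PCR value is the hash of the old PCR
 * value concatenated with the incoming digest.
 */
uint64_t pcr_extend(uint64_t pcr, uint64_t digest)
{
	uint8_t buf[16];

	for (int i = 0; i < 8; i++) {
		buf[i] = (uint8_t)(pcr >> (8 * i));
		buf[8 + i] = (uint8_t)(digest >> (8 * i));
	}
	return fnv1a64(buf, sizeof(buf));
}
```

Extending with digest A then digest B does not give the same value as extending with B then A, and there is no operation to "un-extend": the only way to reach a given PCR value is to replay the same digests in the same order from the reset value.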

In a typical measured boot flow, a TPM can be configured to disclose a secret only under a certain PCR state. Each software stage will be in charge of extending a set of PCRs with digests of the next software stage. Once in Linux, user software may ask the TPM to deliver its secrets but the only way to get them is having all PCRs matching a known pattern. This can only be obtained by extending the PCRs in the right order, with the right digests.

Linux support for TPM devices

A solid TPM 2.0 stack has been around for Linux for quite some time, in the form of the tpm2-tss and tpm2-tools projects. More specifically, a daemon called resourcemgr is provided by the tpm2-tss project. For people coming from the TPM 1.2 world, this used to be called trousers. One can find some commands ready to be used in the tpm2-tools repository, useful for testing purposes.

From the Linux kernel perspective, there are device drivers for at least SPI chips (one can have a look there at files called tpm2*.c and tpm_tis*.c for implementation details).

Bootlin’s contribution: U-Boot support for TPM 2.0

Back when we worked on this topic in 2018, there was no support for TPM 2.0 in U-Boot, but one of our customers needed this support. So we implemented, contributed and upstreamed to U-Boot support for TPM 2.0. Our series of 32 patches adding TPM 2.0 support was merged, with:

SPI TPMs compliant with the TCG TIS v2.0
Commands for U-Boot shell to do minimal operations (detailed below)
A test framework for regression detection
A sandbox TPM driver emulating a fake TPM

In detail, our commits related to TPM support in U-Boot:

Details of U-Boot commands

Available commands for v2.0 TPMs in U-Boot are currently:

STARTUP
SELF TEST
CLEAR
PCR EXTEND
PCR READ
GET CAPABILITY
DICTIONARY ATTACK LOCK RESET
DICTIONARY ATTACK CHANGE PARAMETERS
HIERARCHY CHANGE AUTH

With this set of functions, minimal handling is possible with the following sequence.

First, the TPM stack in U-Boot must be initialized with:

> tpm init

Then, the STARTUP command must be sent.

> tpm startup TPM2_SU_CLEAR

To enable full TPM capabilities, one must request to continue the self tests (or do them all again).

> tpm self_test full
> tpm self_test continue

This is enough to pursue measured boot as one just needs to extend the PCR as needed, giving 1/ the PCR number and 2/ the address where the digest is stored:

> tpm pcr_extend 0 0x4000000

Reading of the extended value is of course possible with:

> tpm pcr_read 0 0x4000000

Managing passwords is about preventing some commands from being sent without previous authentication. This is also possible with the minimum set of commands recently committed, and there are two ways of implementing it. One is quite complicated and features the use of a token together with cryptographic derivations at each exchange. Another solution, less invasive, is to use a single password. Changing passwords was previously done with a single TAKE OWNERSHIP command, while today a CLEAR must precede a CHANGE AUTH. Each of them may act upon different hierarchies. Hierarchies are some kind of authority level and do not act upon the same commands. As an example, let’s use the LOCKOUT hierarchy: the locking mechanism blocking the TPM for a given amount of time after a number of failed authentications, to mitigate dictionary attacks.

> tpm clear TPM2_RH_LOCKOUT [<pw>]
> tpm change_auth TPM2_RH_LOCKOUT <new_pw> [<old_pw>]

Drawback of this implementation: as opposed to the token/hash solution, there is no protection against packet replay.

Please note that a CLEAR does much more than resetting passwords, it entirely resets the whole TPM configuration.

Finally, Dictionary Attack Mitigation (DAM) parameters can also be changed. It is possible to reset the failure counter (aka. the maximum number of attempts before lockout) as well as to disable the lockout entirely. It is possible to check the parameters have been correctly applied.

> tpm dam_reset [<pw>]
> tpm dam_parameters 0xffff 1 0 [<pw>]
> tpm get_capability 0x0006 0x020e 0x4000000 4

In the above example, the DAM parameters are reset, then the maximum number of tries before lockout is set to 0xffff, the delay before decrementing the failure counter is set to 1 and the lockout is entirely disabled. These parameters are for testing purposes. The third command is explained in the specification but basically retrieves 4 values starting at capability 0x6, property index 0x20e. It will display first the failure counter, followed by the three parameters previously changed.

Limitation

Although TPMs are meant to be black boxes, U-Boot’s current support is too light to really protect against replay attacks, as one could spoof the bus and resend the exact same packets after taking ownership of the platform in order to get these secrets out. Additional developments are needed in U-Boot to protect against these attacks. Additionally, even with this extra security level, all the above logic is only safe when used in the context of a secure boot environment.

Conclusion

Thanks to this work from Bootlin, U-Boot has basic support for TPM 2.0 devices connected over SPI. Do not hesitate to contact us if you need support or help around TPM 2.0 support, either in U-Boot or Linux.

Configuring ALSA controls from an application

A common task when handling audio on Linux is the need to modify the configuration of the sound card, for example, adjusting the output volume or selecting the capture channels. On an embedded system, it can be enough to simply set the controls once using alsamixer or amixer and then save the configuration with alsactl store. This saves the driver state to the configuration file which, by default, is /var/lib/alsa/asound.state. Once done, this file can be included in the build system and shipped with the root filesystem. Most distributions already include a script that will invoke alsactl at boot time to restore the settings. If it is not the case, then it is simply a matter of calling alsactl restore.

However, defining a static configuration may not be enough. For example, some codecs have advanced routing features allowing to route the audio channels to different outputs and the application may want to decide at runtime where the audio is going.

Instead of invoking amixer using system(3), even if it is not straightforward, it is possible to directly use the alsa-lib API to set controls.

Let’s start with some required includes:

#include <stdio.h>
#include <alsa/asoundlib.h>

alsa/asoundlib.h is the header that is of interest here as it is where the ALSA API lies. Then we define an id lookup function, which is actually the tricky part. Each control has a unique identifier and to be able to manipulate controls, it is necessary to find this unique identifier. In our sample application, we will be using the control name to do the lookup.

int lookup_id(snd_ctl_elem_id_t *id, snd_ctl_t *handle)
{
	int err;
	snd_ctl_elem_info_t *info;
	snd_ctl_elem_info_alloca(&info);

	snd_ctl_elem_info_set_id(info, id);
	if ((err = snd_ctl_elem_info(handle, info)) < 0) {
		fprintf(stderr, "Cannot find the given element from card\n");
		return err;
	}
	snd_ctl_elem_info_get_id(info, id);

	return 0;
}

This function allocates a snd_ctl_elem_info_t and sets its current id to the one passed as the first argument. At this point, the id only includes the control interface type and its name but not its unique id. The snd_ctl_elem_info() function looks up the element on the sound card whose handle has been passed as the second argument. Then snd_ctl_elem_info_get_id() updates the id with the now completely filled id.

Then the controls can be modified as follows:

int main(int argc, char *argv[])
{
	int err;
	snd_ctl_t *handle;
	snd_ctl_elem_id_t *id;
	snd_ctl_elem_value_t *value;
	snd_ctl_elem_id_alloca(&id);
	snd_ctl_elem_value_alloca(&value);

This declares and allocates the necessary variables. Allocations are done using alloca so it is not necessary to free them as long as the function exits at some point.

	if ((err = snd_ctl_open(&handle, "hw:0", 0)) < 0) {
		fprintf(stderr, "Card open error: %s\n", snd_strerror(err));
		return err;
	}

Get a handle on the sound card, in this case, hw:0 which is the first sound card in the system.

	snd_ctl_elem_id_set_interface(id, SND_CTL_ELEM_IFACE_MIXER);
	snd_ctl_elem_id_set_name(id, "Headphone Playback Volume");
	if ((err = lookup_id(id, handle)) < 0)
		return err;

This sets the interface type and name of the control we want to modify and then calls the lookup function.

	snd_ctl_elem_value_set_id(value, id);
	snd_ctl_elem_value_set_integer(value, 0, 55);
	snd_ctl_elem_value_set_integer(value, 1, 77);

	if ((err = snd_ctl_elem_write(handle, value)) < 0) {
		fprintf(stderr, "Control element write error: %s\n",
			snd_strerror(err));
		return err;
	}

Now, this changes the value of the control. snd_ctl_elem_value_set_id() sets the id of the control to be changed then snd_ctl_elem_value_set_integer() sets the actual value. There are multiple calls because this control has multiple members (in this case, left and right channels). Finally, snd_ctl_elem_write() commits the value.

Note that snd_ctl_elem_value_set_integer() is called directly because we know this control is an integer but it is actually possible to query what kind of value should be used using snd_ctl_elem_info_get_type() on the snd_ctl_elem_info_t. The scale of the integer is also device specific and can be retrieved with the snd_ctl_elem_info_get_min(), snd_ctl_elem_info_get_max() and snd_ctl_elem_info_get_step() helpers.

	snd_ctl_elem_id_clear(id);
	snd_ctl_elem_id_set_interface(id, SND_CTL_ELEM_IFACE_MIXER);
	snd_ctl_elem_id_set_name(id, "Headphone Playback Switch");
	if ((err = lookup_id(id, handle)) < 0)
		return err;

	snd_ctl_elem_value_clear(value);
	snd_ctl_elem_value_set_id(value, id);
	snd_ctl_elem_value_set_boolean(value, 1, 1);

	if ((err = snd_ctl_elem_write(handle, value)) < 0) {
		fprintf(stderr, "Control element write error: %s\n",
			snd_strerror(err));
		return err;
	}

This unmutes the right channel of Headphone playback; this time the control is a boolean. The other common kind of element is SND_CTL_ELEM_TYPE_ENUMERATED for enumerated contents. This is used for channel muxing or selecting de-emphasis values, for example. snd_ctl_elem_value_set_enumerated() has to be used to set the selected item.

	return 0;
}

This concludes this simple example, which should be enough to get you started writing smarter applications that don't rely on an external program to configure the sound card controls.

New feature highlights in Elixir Cross Referencer v2.0 and v2.1

The 2.1 release of the Elixir Cross Referencer is now live on https://elixir.bootlin.com/.

Development of new features has accelerated in recent months, thanks to contributions from Tamir Carmeli (GitHub), Chris White (GitHub) and Maxime Chrétien (GitHub), who was hired at Bootlin as an intern. I am going to describe the most important new features from these contributors, but the three of them also made many smaller contributions to many aspects of Elixir.

So, here are the important new features you can now find in Elixir…

Support for symbol documentation

Thanks to Chris White, when you search for a function, you can now see where it is documented, at least when it is done in the Linux kernel way, extracting documentation from comments in the sources.

This way, when documentation is available, you can immediately know the meaning and expected values of the parameters of a given function and its return value.

Support for Kconfig symbols

Maxime Chrétien has extended Elixir to support kernel configuration parameters. Actually, he contributed a new parser to the universal-ctags project to do so. This way, you can explore C sources and Kconfig files and find the declarations and uses of kernel parameters.

Now, every time we mention a kernel configuration parameter in our free training materials, we can provide an Elixir link to them. Here is an example for CONFIG_SQUASHFS. Don’t hesitate to use such links in your documents and e-mails about the Linux kernel!

Note that you also have Kconfig symbol links in defconfig files, allowing you to understand non-default kernel configuration settings for a given SoC family or board. See this example.

Support for Device Tree aliases

Maxime Chrétien also extended Elixir to support Device Tree labels. This way, when you explore a Device Tree source file and see a reference (phandle) to such a label, you can easily find where it’s defined and what the default properties of the corresponding node are.

Following these extensions to Elixir to support new scopes for symbols, we extended the interface so that you can search for symbols either in a specific context (C, Kconfig or DT) or in all contexts. In most cases a single context will suffice, but we also offer a mode that searches all contexts at the same time.

Support for Device Tree compatible strings

v2.1 of the Elixir Cross Referencer also adds support for Device Tree compatible strings, also contributed by Maxime Chrétien. When browsing Device Tree files, you can instantly find which drivers can be bound to the corresponding devices, which properties such drivers require from such devices (as specified in the Device Tree bindings), and other Device Tree files using the same compatible string.

Symbol auto-completion in the search dialog

Elixir Cross Referencer v2.1 also features symbol search autocompletion, another capability implemented by Maxime Chrétien. This makes it easy to find Linux kernel function names while programming!

Pygments support for Device Tree source files

In addition to this improvement for Device Tree indexing, Maxime has also contributed a new lexer to the Pygments project, which is used by Elixir for HTML syntax highlighting for all types of files.

REST API

Thanks to Tamir Carmeli, it’s now possible to access the Elixir database through a new REST API, instead of going through its web interface. This way, you can make Elixir queries from data processing scripts, for example.

Testing infrastructure

Chris White has implemented an extensive testing infrastructure to quickly detect regressions before the corresponding changes are applied to production servers. Tamir Carmeli also contributed a test system for the REST API. Thanks to this, each new commit is tested on Travis CI.

Parallel build for the Elixir database

Maxime Chrétien has managed to multithread the indexing work. While Maxime is still exploring further options, this has already divided indexing time by a factor of approximately two.

Limitations

The main limitation of the Elixir Cross Referencer is that it doesn’t try to match any context. For example, the actual implementation of a symbol may depend on the value of a configuration option. When browsing a source file, Elixir also always links to all possibilities for each symbol (there can be multiple unrelated instances of the same symbol across the kernel sources) instead of narrowing the search to the definition corresponding to the currently browsed file. Elixir leaves it up to the human user to find out which result matches the context of origin.

This is particularly true for Device Tree symbols that have unrelated occurrences everywhere in the source tree, such as i2c0. In a distant future, we may be able to restrict the search to the context of an originating file.

Contribute

If you have new ideas for extending the Elixir Cross Referencer to support more features and use cases, please share them on the project’s bug tracker. If they are feasible without compromising the relative simplicity and scalability of our engine, we will be happy to implement them!

Practical usage of timer counters in Linux, illustrated on Microchip platforms

Virtually all micro-controllers and micro-processors provide some form of timer counters. In the context of Linux, they are always used for kernel timers, but they can also sometimes be used for PWMs, or input capture devices able to measure external signals such as rotary encoders. In this blog post, we would like to illustrate how Linux can take advantage of such timer counters, by taking the example of the Microchip Timer Counter Block, and depict how its various features fit into existing Linux kernel subsystems.

Hardware overview

On Microchip ARM processors, the TCB (Timer Counter Block) module is a set of three independent 16-bit or 32-bit channels, as illustrated in this simplified block diagram:

Microchip TCB

The exact number of TCB modules depends on which Microchip processor you’re using; this Microchip brochure gives the details. Most products have 6 or 9 timer counter channels available, which are grouped into two or three TCB modules, each having 3 channels.

Each TC channel can independently select a clock source for its counter:

Internal Clock: sourced from either the system bus clock (often the highest rated one, with pre-defined divisors) or the slow clock (crystal oscillator). On the Microchip SAMA5D2 and SAM9X60 SoC series, there is also a programmable generic clock source (GCLK) specific to each peripheral.
External Clock: based on the three available external input pins: TCLK0, TCLK1 or TCLK2.

The clock source choice should obviously be made depending on the accuracy required by the application.

The module offers many functions, organized into three different modes:

The input capture mode is useful to capture input signals (e.g. to measure a signal period) through one of the six input pins (TIOAx/TIOBx) connected to each TC module. Each pin can act as a trigger source for the counter, and two latch registers, RA and RB, can be loaded and compared with a third register, RC. This mode is highly configurable, with lots of features to fine-tune the capture (subsampling, clock inversion, interrupts, etc.).
The waveform mode provides the core function of TCs, as all channels can be used as three independent free-running counters. It is also the mode used to generate PWM signals, which gives an extra pool of PWMs.
The quadrature mode is only supported on the first TC module, TCB0, and requires two (or three) channels: channel 0 decodes the speed or position on TIOA0/TIOB0, and channel 1 (with the TIOB1 input) can be configured to store the number of revolutions. Finally, if speed measurement is configured, channel 2 defines the speed time base. Note that this mode is only available on Microchip SAMA5 and SAM9x60 family SoCs.

Software overview

On the software side in the Linux kernel, the different functionalities offered by the Microchip TCBs will be handled by three different subsystems, which we cover in the following sections.

Clocksource subsystem

This subsystem is the core target of any TC module as it allows the kernel to keep track of the time passing (clocksource) and program timer interrupts (clockevents). The Microchip TCB has its upstream implementation in drivers/clocksource/timer-atmel-tcb.c that uses the waveform mode to provide both clock source and clock events. The older Microchip platforms have only 16-bit timer counters, in which case two channels are needed to implement the clocksource support. Newer Microchip platforms have 32-bit timer counters, and in this case only one channel is needed to implement clocksource. In both cases, only one channel is necessary to implement clock events.
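
To illustrate why counter width matters for a clocksource, here is a small sketch that computes how long a free-running counter takes to wrap, and how two 16-bit halves can form one 32-bit value. It assumes that, in the 16-bit case, one channel counts the overflows of the other; the function names are hypothetical and this is not the timer-atmel-tcb code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds before a free-running counter of the given width wraps around */
double counter_wrap_seconds(unsigned int bits, double clock_hz)
{
	assert(bits < 64 && clock_hz > 0.0);

	/* A counter of 'bits' bits takes 2^bits ticks to wrap */
	return (double)(UINT64_C(1) << bits) / clock_hz;
}

/*
 * Build a 32-bit cycle count from two 16-bit counter values, assuming
 * 'upper' counts the overflows of 'lower' (hypothetical helper).
 */
uint32_t combine_16bit_halves(uint16_t upper, uint16_t lower)
{
	return ((uint32_t)upper << 16) | lower;
}
```

With a hypothetical 5 MHz input clock, a single 16-bit counter would wrap after about 13 ms, while a 32-bit count lasts roughly 859 seconds, which is far more comfortable for timekeeping.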

In the timer-atmel-tcb driver:

The clocksource is registered using a struct clocksource structure which mainly provides a ->read() callback to read the current cycle count
The clockevents device is registered using a struct tc_clkevt_device structure, which provides callbacks to set the date of the next timer event (->set_next_event()) and to change the mode of the timer (->set_state_shutdown(), ->set_state_periodic(), ->set_state_oneshot()).

From a user-space point of view, the clocksource and clockevents subsystems are not directly visible, but they are of course used whenever one uses time or timer related functions. The available clockevents are visible in /sys/bus/clockevents and the available clocksources are visible in /sys/bus/clocksource. The file /proc/timer_list also gives a lot of information about the timers that are pending, and the available timer devices on the platform.

PWM subsystem

This subsystem is useful for many applications (fan control, LEDs, beepers, etc.), and provides both an in-kernel API for other kernel drivers to use and a user-space API in /sys/class/pwm, documented at https://www.kernel.org/doc/html/latest/driver-api/pwm.html.

As far as PWM functionality is concerned, the Microchip TCB module is supported by the driver at drivers/pwm/pwm-atmel-tcb.c, which also uses the waveform mode. In this mode, both pins of each channel (TIOAx/TIOBx) can be used to output PWM signals, which provides up to 6 PWM outputs per TCB. At a high level, this PWM driver registers a struct pwm_ops structure that provides pointers to the callbacks used to set up and configure PWM outputs.

The current driver implementation has the drawback of using an entire TCB module as a PWM chip: it is not possible to use 1 channel of a TCB module for PWM, and the other channels of the same TCB module for other functionality. On platforms that have only two TCB modules, this means that the first TCB module is typically used for the clockevents/clocksource functionality described previously, and therefore only the second TCB module can be used for PWM.

We are however working on lifting this limitation: Bootlin engineer Alexandre Belloni has a patch series at https://github.com/alexandrebelloni/linux/commits/at91-tcb to address this. We aim at submitting this patch series in the near future.

Thanks to the changes of this patch series, we will be able to use PWM channels as follows:

Configuring a 100 kHz PWM signal on TIOAx:

# echo 0 > /sys/class/pwm/pwmchip0/export
# echo 10000 > /sys/class/pwm/pwmchip0/pwm0/period
# echo 1000 > /sys/class/pwm/pwmchip0/pwm0/duty_cycle
# echo 1 > /sys/class/pwm/pwmchip0/pwm0/enable

Configuring a 100 kHz PWM signal on TIOBx:

# echo 1 > /sys/class/pwm/pwmchip0/export
# echo 10000 > /sys/class/pwm/pwmchip0/pwm1/period
# echo 1000 > /sys/class/pwm/pwmchip0/pwm1/duty_cycle
# echo 1 > /sys/class/pwm/pwmchip0/pwm1/enable

One must note that both PWM signals of the same channel will share the same period even though we set it twice here as it is required by the PWM framework. The Microchip TCB takes the period from the RC register and RA/RB respectively for TIOAx/TIOBx duty cycles.
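
The period and duty_cycle values written to sysfs are expressed in nanoseconds, which is why 10000 corresponds to 100 kHz. The following sketch derives both values from a frequency and a duty cycle percentage. The structure and function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Values to write to the sysfs "period" and "duty_cycle" files */
struct pwm_settings {
	uint64_t period_ns;
	uint64_t duty_cycle_ns;
};

/* Convert a frequency in Hz and a duty cycle in percent to nanoseconds */
struct pwm_settings pwm_from_freq(uint64_t freq_hz, unsigned int duty_percent)
{
	struct pwm_settings s;

	assert(freq_hz > 0 && duty_percent <= 100);

	s.period_ns = NSEC_PER_SEC / freq_hz;
	s.duty_cycle_ns = s.period_ns * duty_percent / 100;

	return s;
}
```

Calling pwm_from_freq(100000, 10) gives a period of 10000 ns and a duty cycle of 1000 ns, the values used in the commands above.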

Counter subsystem

The Linux kernel counter subsystem, located in drivers/counter/, is much newer than the clocksource, clockevents and PWM subsystems described previously. It was only added to the Linux kernel in 2019, and so far it contains only 5 drivers. This subsystem abstracts a timer counter as three entities: a Count that stores the value incremented or decremented from a measured input Signal, and a Synapse that provides an edge-based trigger source.

This subsystem was therefore very relevant to expose the input capture and quadrature decoder modes of the Microchip TCB module, and we recently submitted a patch series that implements a counter driver for the Microchip TCB module. The driver instantiates and registers a struct counter_device structure, with a variety of sub-structures and callbacks that allow the core counter subsystem to use the Microchip TCB module and expose its input capture and quadrature decoder features to user-space.

The current user-space interface of the counter subsystem works over sysfs and is documented at https://www.kernel.org/doc/html/latest/driver-api/generic-counter.html. For example, to read the position of a rotary encoder connected to a TCB module configured as a quadrature decoder, one would do:

# cd /sys/bus/counter/devices/counter0/count0/                    
# echo "quadrature x4" > function                                 
# cat count
0
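
To give an idea of what the "quadrature x4" function means, here is a software sketch of a x4 decoder: the count changes on every valid transition of either the A or the B signal. On the TCB this decoding is done by the hardware; the code below is only an illustration with hypothetical names, and the direction considered positive is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Decoder state: previous 2-bit (A,B) sample and accumulated position */
struct qdec {
	unsigned int prev;
	long count;
};

/* Feed one new sample of the A and B signals into the decoder */
void qdec_update(struct qdec *q, unsigned int a, unsigned int b)
{
	/*
	 * Indexed by (previous state << 2) | current state, where a state
	 * is (A << 1) | B. The sequence 00 -> 01 -> 11 -> 10 -> 00 counts
	 * up, the reverse sequence counts down, and invalid jumps (both
	 * signals changing at once) are ignored.
	 */
	static const int delta[16] = {
		 0, +1, -1,  0,
		-1,  0,  0, +1,
		+1,  0,  0, -1,
		 0, -1, +1,  0,
	};
	unsigned int cur = ((a & 1) << 1) | (b & 1);

	assert(q != NULL);

	q->count += delta[(q->prev << 2) | cur];
	q->prev = cur;
}
```

One full cycle of the two signals therefore moves the count by four, which is where the x4 name comes from.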

However, when the device connected to the TCB is a rotary encoder, it would be much more useful to have it exposed to user-space as a standard input device so that all existing graphical libraries and frameworks can automatically make use of it. Rotary encoders connected to GPIOs can already be exposed to user-space as input devices using the rotary_encoder driver. Our goal was to achieve the same, but with a rotary encoder connected to a quadrature decoder handled by the counter subsystem. To this end, we submitted a second patch series, which:

Extends the counter subsystem with an in-kernel API, so that counter devices can not only be used from user-space using sysfs, but also from other kernel subsystems. This is very much like the IIO in-kernel API, which is used in a variety of other kernel subsystems that need access to IIO devices.
Adds a new rotary-encoder-counter driver, which implements an input device based on a counter device configured in quadrature decoder mode.

Thanks to this driver, we get an input device for our rotary encoder, which can for example be tested using evtest to decode the input events that occur when rotating the rotary encoder:

# evtest /dev/input/event1                                        
Input driver version is 1.0.1                                     
Input device ID: bus 0x19 vendor 0x0 product 0x0 version 0x0      
Input device name: "rotary@0"                                     
Supported events:                                                 
Event type 0 (EV_SYN)                                           
Event type 2 (EV_REL)                                           
  Event code 0 (REL_X)                                          
Properties:                                                       
Testing ... (interrupt to exit)                                   
Event: time 1325392910.906948, type 2 (EV_REL), code 0 (REL_X), value 2
Event: time 1325392910.906948, -------------- SYN_REPORT ------------
Event: time 1325392911.416973, type 2 (EV_REL), code 0 (REL_X), value 1
Event: time 1325392911.416973, -------------- SYN_REPORT ------------
Event: time 1325392913.456956, type 2 (EV_REL), code 0 (REL_X), value 2
Event: time 1325392913.456956, -------------- SYN_REPORT ------------
Event: time 1325392916.006937, type 2 (EV_REL), code 0 (REL_X), value 1
Event: time 1325392916.006937, -------------- SYN_REPORT ------------
Event: time 1325392919.066977, type 2 (EV_REL), code 0 (REL_X), value 1
Event: time 1325392919.066977, -------------- SYN_REPORT ------------
Event: time 1325392919.576988, type 2 (EV_REL), code 0 (REL_X), value 2
Event: time 1325392919.576988, -------------- SYN_REPORT ------------

Device Tree

From a Device Tree point of view, the representation is a bit more complicated than for many other hardware blocks, due to the multiple features offered by timer counters. First of all, in the .dtsi file describing the system-on-chip, we have a node that describes each TCB module. For example, for the Microchip SAMA5D2 system-on-chip, which has two TCB modules, we have in arch/arm/boot/dts/sama5d2.dtsi:

tcb0: timer@f800c000 {
	compatible = "atmel,at91sam9x5-tcb", "simple-mfd", "syscon";
	#address-cells = <1>;
	#size-cells = <0>;
	reg = <0xf800c000 0x100>;
	interrupts = <35 IRQ_TYPE_LEVEL_HIGH 0>;
	clocks = <&pmc PMC_TYPE_PERIPHERAL 35>, <&clk32k>;
	clock-names = "t0_clk", "slow_clk";
};

tcb1: timer@f8010000 {
	compatible = "atmel,at91sam9x5-tcb", "simple-mfd", "syscon";
	#address-cells = <1>;
	#size-cells = <0>;
	reg = <0xf8010000 0x100>;
	interrupts = <36 IRQ_TYPE_LEVEL_HIGH 0>;
	clocks = <&pmc PMC_TYPE_PERIPHERAL 36>, <&clk32k>;
	clock-names = "t0_clk", "slow_clk";
};

This however does not define how each TCB module and each channel is going to be used. This happens at the board level, by adding sub-nodes to the appropriate TCB module node.

First, each board needs to at least define which TCB module and channels should be used for the clocksource/clockevents. For example, arch/arm/boot/dts/at91-sama5d2_xplained.dts has:

tcb0: timer@f800c000 {
	timer0: timer@0 {
		compatible = "atmel,tcb-timer";
		reg = <0>;
	};

	timer1: timer@1 {
		compatible = "atmel,tcb-timer";
		reg = <1>;
	};
};

As can be seen in this example, the timer@0 and timer@1 nodes are sub-nodes of the timer@f800c000 node. The SAMA5D2 has 32-bit timer counters, so only one channel is needed for the clocksource, and another channel is needed for clock events. Older platforms such as AT91SAM9260 would need:

tcb0: timer@fffa0000 {
	timer@0 {
		compatible = "atmel,tcb-timer";
		reg = <0>, <1>;
	};

	timer@2 {
		compatible = "atmel,tcb-timer";
		reg = <2>;
	};
};

Here, the first instance of atmel,tcb-timer uses two channels: on AT91SAM9260, each channel is only 16-bit, so we need two channels for clocksource. This is why we have reg = <0>, <1> in the first sub-node.

Now, to use some TCB channels as PWMs, with the new patch series proposed by Alexandre, one would for example use:

&tcb1 {
	tcb1_pwm0: pwm@0 {
		compatible = "atmel,tcb-pwm";
		#pwm-cells = <3>;
		reg = <0>;
		pinctrl-names = "default";
		pinctrl-0 = <&pinctrl_tcb1_tioa0 &pinctrl_tcb1_tiob0>;
	};

	tcb1_pwm1: pwm@1 {
		compatible = "atmel,tcb-pwm";
		#pwm-cells = <3>;
		reg = <1>;
		pinctrl-names = "default";
		pinctrl-0 = <&pinctrl_tcb1_tioa1>;
	};
};

This uses the first two channels of TCB1 as PWMs, providing two separate PWM devices visible to user-space and to other kernel drivers.

Otherwise, to use a TCB as a quadrature decoder, one would use the following piece of Device Tree. Note that we must use the TCB0 module as it is the only one that supports quadrature decoding. This means that the atmel,tcb-timer nodes for clocksource/clockevents support have to use TCB1.

&tcb0 {
	qdec: counter@0 {
		compatible = "atmel,tcb-capture";
		reg = <0>, <1>;
		pinctrl-names = "default";
		pinctrl-0 = <&pinctrl_qdec_default>;
	};
};

A quadrature decoder needs two channels, hence the reg = <0>, <1>.

And if in addition you would like to set up an input device for the rotary encoder connected to the quadrature decoder, you can add:

rotary@0 {
	compatible = "rotary-encoder-counter";
	counter = <&qdec>;
	qdec-mode = <7>;
	poll-interval = <50>;
};

Note that this is not a sub-node of the TCB node: the rotary encoder needs to be described at the top level of the Device Tree, and refers to the TCB channels used as quadrature decoder by means of the counter = <&qdec>; phandle.

Of course, these different capabilities can be combined. For example, you could use the first two channels of TCB0 to implement a quadrature decoder using the counter subsystem, and the third channel of the same TCB module for a PWM. TCB1 is used for clocksource/clockevents. In this case, the Device Tree would look like this:

&tcb0 {
	counter@0 {
		compatible = "atmel,tcb-capture";
		reg = <0>, <1>;
		pinctrl-names = "default";
		pinctrl-0 = <&pinctrl_qdec_default>;
	};

	pwm@2 {
		compatible = "atmel,tcb-pwm";
		#pwm-cells = <3>;
		reg = <2>;
		pinctrl-names = "default";
		pinctrl-0 = <&pinctrl_tcb1_tioa1>;
	};
};

&tcb1 {
	timer@0 {
		compatible = "atmel,tcb-timer";
		reg = <0>, <1>;
	};

	timer@2 {
		compatible = "atmel,tcb-timer";
		reg = <2>;
	};
};

Conclusion

We hope that this blog post was useful to understand how Linux handles timer counters, and which Linux kernel subsystems are involved. Even though we used the Microchip TCB to illustrate our discussion, the concepts all apply to the timer counters of other platforms that offer similar features.

Audio multi-channel routing and mixing using alsalib

Recently, one of our customers designing an embedded Linux system with specific audio needs had a use case where they had a sound card with more than one audio channel, and they needed to separate individual channels so that they could be used by different applications. This is a fairly common use case, so we would like to share in this blog post how we achieved this, for both input and output audio channels.

The most common use case would be separating a 4 or 8-channel sound card into multiple stereo PCM devices. For this, alsa-lib, the userspace API interface to the ALSA drivers, provides PCM plugins. Those plugins are configured through configuration files that are usually known to be /etc/asound.conf or $(HOME)/.asoundrc. However, through the configuration of /usr/share/alsa/alsa.conf, it is also possible, and in fact recommended, to use a card-specific configuration, named /usr/share/alsa/cards/<card_name>.conf.

The syntax of this configuration is documented in the alsa-lib configuration documentation, and the most interesting part of the documentation for our purpose is the pcm plugin documentation.

Audio inputs

For example, let’s say we have a 4-channel input sound card, which we want to split in 2 mono inputs and one stereo input, as follows:

Audio input example

In the ALSA configuration file, we start by defining the input pcm:

pcm_slave.ins {
	pcm "hw:0,1"
	rate 44100
	channels 4
}

pcm "hw:0,1" refers to the second device of the first sound card present in the system. In our case, this is the capture device. rate and channels specify the parameters of the stream we want to set up for the device. This is not strictly necessary, but it enables automatic sample rate or size conversion if desired.

Then we can split the inputs:

pcm.mic0 {
	type dsnoop
	ipc_key 12342
	slave ins
	bindings.0 0
}

pcm.mic1 {
	type plug
	slave.pcm {
		type dsnoop
		ipc_key 12342
		slave ins
		bindings.0 1
	}
}

pcm.mic2 {
	type dsnoop
	ipc_key 12342
	slave ins
	bindings.0 2
	bindings.1 3
}

mic0 is of type dsnoop, which is the plugin that splits capture PCMs. The ipc_key is an integer that has to be unique: it is used internally to share buffers. slave indicates the underlying PCM that will be split; it refers to the PCM device we have defined before, with the name ins. Finally, bindings is an array mapping the PCM channels to its slave channels. This is why mic0 and mic1, which are mono inputs, both only use bindings.0, while mic2 being stereo has both bindings.0 and bindings.1. Overall, mic0 will have channel 0 of our input PCM, mic1 will have channel 1 of our input PCM, and mic2 will have channels 2 and 3 of our input PCM.
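
Conceptually, the bindings array selects which slave channels end up in each frame seen by the application. The sketch below models this on interleaved 16-bit samples. It is a simplified illustration with a hypothetical function name, not the way alsa-lib implements dsnoop internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy the slave channels listed in 'bindings' from an interleaved input
 * buffer to an interleaved output buffer: bindings[c] is the slave channel
 * feeding output channel c, as in "bindings.c <slave channel>".
 */
void extract_channels(const int16_t *in, size_t frames,
		      unsigned int in_channels,
		      const unsigned int *bindings,
		      unsigned int out_channels, int16_t *out)
{
	for (size_t f = 0; f < frames; f++) {
		for (unsigned int c = 0; c < out_channels; c++) {
			assert(bindings[c] < in_channels);
			out[f * out_channels + c] =
				in[f * in_channels + bindings[c]];
		}
	}
}
```

With the 4-channel input of this example, passing the bindings { 2, 3 } would model mic2, while { 1 } would model the dsnoop part of mic1.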

The final interesting thing in this example is the difference between mic0 and mic1. While mic0 and mic2 will not do any conversion on their stream and pass it as is to the slave pcm, mic1 is using the automatic conversion plugin, plug. So whatever type of stream will be requested by the application, what is provided by the sound card will be converted to the correct format and rate. This conversion is done in software and so runs on the CPU, which is usually something that should be avoided on an embedded system.

Also, note that the channel splitting happens at the dsnoop level. Doing it at an upper level would mean that the 4 channels would be copied before being split. For example the following configuration would be a mistake:

pcm.dsnoop {
    type dsnoop
    ipc_key 512
    slave {
        pcm "hw:0,0"
        rate 44100
    }
}

pcm.mic0 {
    type plug
    slave dsnoop
    ttable.0.0 1
}

pcm.mic1 {
    type plug
    slave dsnoop
    ttable.0.1 1
}

Audio outputs

For this example, let’s say we have a 6-channel output that we want to split in 2 mono outputs and 2 stereo outputs:

Audio output example

As before, let’s define the slave PCM for convenience:

pcm_slave.outs {
	pcm "hw:0,0"
	rate 44100
	channels 6
}

Now, for the split:

pcm.out0 {
	type dshare
	ipc_key 4242
	slave outs
	bindings.0 0
}

pcm.out1 {
	type plug
	slave.pcm {
		type dshare
		ipc_key 4242
		slave outs
		bindings.0 1
	}
}

pcm.out2 {
	type dshare
	ipc_key 4242
	slave outs
	bindings.0 2
	bindings.1 3
}

pcm.out3 {
	type dmix
	ipc_key 4242
	slave outs
	bindings.0 4
	bindings.1 5
}

out0 is of type dshare. While usually dmix is presented as the reverse of dsnoop, dshare is more efficient as it simply gives exclusive access to channels instead of potentially software mixing multiple streams into one. Again, the difference can be significant in terms of CPU utilization in the embedded space. Then, nothing new compared to the audio input example before:

out1 is allowing sample format and rate conversion
out2 is stereo
out3 is stereo and allows multiple concurrent users that will be mixed together as it is of type dmix

A common mistake here would be to use the route plugin on top of dmix to split the streams: this would first transform the mono or stereo stream into 6-channel streams and then mix them all together. All these operations would be costly in CPU utilization while dshare is basically free.
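
To make the cost difference more concrete, here is a simplified sketch of what software mixing of two 16-bit streams involves: every sample has to be added and clipped on the CPU, whereas giving each stream its own channels requires no arithmetic at all. This is only an illustration with hypothetical function names; the actual dmix implementation in alsa-lib is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate a 32-bit sum to the signed 16-bit sample range */
static int16_t clip_s16(int32_t v)
{
	if (v > INT16_MAX)
		return INT16_MAX;
	if (v < INT16_MIN)
		return INT16_MIN;
	return (int16_t)v;
}

/* Mix two sample buffers of the same length into 'out' */
void mix_s16(const int16_t *a, const int16_t *b, size_t samples,
	     int16_t *out)
{
	assert(a && b && out);

	for (size_t i = 0; i < samples; i++)
		out[i] = clip_s16((int32_t)a[i] + b[i]);
}
```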

Duplicating streams

Another common use case is trying to copy the same PCM stream to multiple outputs. For example, we have a mono stream, which we want to duplicate into a stereo stream, and then feed this stereo stream to specific channels of a hardware device. This can be achieved using the following configuration snippet:

pcm.out4 {
	type route;
	slave.pcm {
		type dshare
		ipc_key 4242
		slave outs
		bindings.0 0
		bindings.1 5
	}
	ttable.0.0 1;
	ttable.0.1 1;
}

The route plugin duplicates the mono stream into a stereo stream, using the ttable property. Then, the dshare plugin is used to get the first channel of this stereo stream and send it to the hardware first channel (bindings.0 0), while sending the second channel of the stereo stream to the hardware sixth channel (bindings.1 5).
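
The ttable can be seen as a matrix of coefficients, where ttable.I.O is the gain applied from input channel I to output channel O. The following sketch applies such a matrix to interleaved floating-point frames. It is a conceptual model with a hypothetical function name, not the route plugin's actual code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Apply a transfer table to interleaved frames. ttable is a flat array of
 * in_ch * out_ch coefficients: ttable[i * out_ch + o] is the gain from
 * input channel i to output channel o.
 */
void route_apply(const float *in, size_t frames,
		 unsigned int in_ch, unsigned int out_ch,
		 const float *ttable, float *out)
{
	assert(in && ttable && out);

	for (size_t f = 0; f < frames; f++) {
		for (unsigned int o = 0; o < out_ch; o++) {
			float acc = 0.0f;

			for (unsigned int i = 0; i < in_ch; i++)
				acc += in[f * in_ch + i] * ttable[i * out_ch + o];
			out[f * out_ch + o] = acc;
		}
	}
}
```

With one input channel, two output channels and a table of { 1, 1 }, which mirrors ttable.0.0 1 and ttable.0.1 1, each mono sample is simply copied to both output channels.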

Conclusion

When properly used, the dsnoop, dshare and dmix plugins can be very efficient. In our case, simply rewriting the alsalib configuration on an i.MX6 based system with a 16-channel sound card dropped the CPU utilization from 97% to 1-3%, leaving plenty of CPU time to run further audio processing and other applications.

Bootlin toolchains updated, edition 2020.02

Bootlin provides a large number of ready-to-use pre-built cross-compilation toolchains at toolchains.bootlin.com. We announced the service in June 2017, and released multiple versions of the toolchains up to 2018.11.

After a long pause, we are happy to announce that we have released a new set of toolchains, built using Buildroot 2020.02, and therefore labelled as 2020.02, even though they have been published in April. They are available for 38 CPU architectures or architecture variants, supporting the glibc, uclibc-ng and musl C libraries when possible.

For each toolchain, we offer two variants: one called stable which uses “proven” versions of gcc, binutils and gdb, and one called bleeding edge which uses the latest version of gcc, binutils and gdb.

Overall, these 2020.02 toolchains use:

gcc 8.4.0 for stable, 9.3.0 for bleeding edge
binutils 2.32 for stable, 2.33.1 for bleeding edge
gdb 8.2.1 for stable, 8.3 for bleeding edge
linux headers 4.4.215 for stable, 4.19.107 for bleeding edge
glibc 2.30
uclibc-ng 1.0.32
musl 1.1.24

In total, that’s 154 different toolchains that we are providing! If you are using these toolchains and face any issue, or want to request some additional change or feature, do not hesitate to contact us through the corresponding Github project. Also, I’d like to thank Romain Naour, from Smile, for his contributions to this project.