Bootlin contributions to Linux 5.13

After finally publishing about our Linux 5.12 contributions and even though Linux 5.14 was just released yesterday, it’s hopefully still time to talk about our contributions to Linux 5.13. Check out the LWN articles about the merge window to get the bigger picture about this release: part 1 and part 2.

In terms of Bootlin contributions, this was a much more quiet release than Linux 5.12, with just 28 contributions. The main highlights are:

  • The usual round of RTC subsystem updates from its maintainer Alexandre Belloni
  • A large amount of improvements in the MTD subsystem by its co-maintainer Miquèl Raynal, continuing his effort to improve the ECC handling in the MTD subsystem. See Miquèl’s talk at ELCE 2020 for more details on this effort: slides and video.
  • A small fix for an annoying regression in the musb USB gadget controller driver.

Even though we contributed just 28 commits to this release, as maintainers, some of us also reviewed and merged code from other contributors: Miquèl Raynal as the MTD co-maintainer merged 63 patches, Alexandre Belloni merged 22 patches, and Grégory Clement 6 patches.

Here are the details of our contributions to Linux 5.13:

Bootlin contributions to Linux 5.12

Yes, Linux 5.13 was released yesterday, but we never published the blog post detailing our contributions to Linux 5.12, so let’s do this now! First of all the usual links to the excellent articles on the 5.12 merge window: part 1 and part 2. also published an article with Linux 5.12 development statistics, and two Bootlin engineers made their way to the statistics: Alexandre Belloni in the list of top contributors by number of changesets, with 69 commits, and Paul Kocialkowski in the list of top contributors by number of changed lines, with over 6000 lines changed.

Here are the highlights of our contributions:

  • Addition of a new driver for the Silvaco I3C master controller. This was contributed by Miquèl Raynal, who became the maintainer for this driver. Bootlin has pioneered support for I3C in Linux, by introducing the complete drivers/i3c subsystem a few years ago, together with the first controller driver, for a Cadence IP, see our blog post from 2018.
  • Addition of two new camera sensor drivers, one for the Omnivision OV5648 and another for the Omnivision OV8865. These were contributed by Paul Kocialkowski.
  • Implementation of mqprio support in the Marvell Ethernet controller driver mvneta, see this commit. As explained in the tc-mqprio man page, the MQPRIO qdisc is a simple queuing discipline that allows mapping traffic flows to hardware queue ranges using priorities and a configurable priority to traffic class mapping. This was contributed by Maxime Chevallier
  • Improvements in the IIO driver for the ms58xx family of sensors, contributed by Alexandre Belloni.
  • The final removal of the atmel_tclib code, which has been replaced by proper drivers for the TCB timers on Atmel/Microchip ARM platforms over the past few releases, also by Alexandre Belloni.
  • As usual, a large amount of fixes and improvements in the RTC subsystem, by its maintainer Alexandre Belloni.

Here is the detailed list of our contributions to this release:

Bootlin contributions to Linux 5.11

Linux 5.11 was released quite some time ago now, but it’s never too late to have a look at Bootlin contributions to this release. As usual, we recommend reading the LWN articles on the 5.11 merge window: part 1 and part2. Also of interest is the Kernelnewbies page for 5.11.

Here are the main highlights of our contributions:

  • Alexandre Belloni, as the maintainer of the RTC subsystem, continued making numerous improvements and fixes to RTC drivers
  • On the support for Microchip ARM platforms, Alexandre Belloni switched the PWM atmel-tcb driver to a new Device Tree binding and added SAMA5D2 support, he did some improvements to the IIO driver for the Microchip ADC, and continued to remove platform_data support from Microchip drivers as all platforms are now converted to the Device Tree.
  • Alexandre Belloni contributed a new Simple Audio Mux driver for the ALSA subsystem, which can be used to control simple audio multiplexers driven using GPIOs, that allows to select which of their input line is connected to the output line.
  • Grégory Clement added support for several new MIPS platforms from Microchip: Luton, Serval and Jaguar2. All those platforms include a MIPS core, a few peripherals and more importantly an Ethernet switch. For now the support only includes the base platform support, but we are working on the switchdev driver for the Ethernet switch.
  • Miquèl Raynal, maintainer of the NAND subsystem and co-maintainer of the MTD subsystem, contributed numerous changes to the ECC support in the MTD subsystem, making it more generic so that it can be used not just for parallel NAND flashes, but also SPI NAND flashes. For more details, see the talk from Miquèl Raynal on this topic.

In addition to those 95 patches that we authored and contributed, several Bootlin engineers being maintainers of different subsystems of the Linux kernel reviewed and merged patches from other contributors:

  • Miquèl Raynal, as the NAND maintainer and MTD co-maintainer, reviewed and merged 67 patches from other contributors
  • Alexandre Belloni, as the RTC, I3C and Microchip ARM/MIPS platforms maintainer, reviewed and merged 47 patches from other contributors
  • Grégory Clement, as the Marvell EBU platform co-maintainer, reviewed and merged 33 patches from other contributors

Here is the detailed list of our contributions to Linux 5.11:

Large Page Support for NAS systems on 32 bit ARM

The need for large page support on 32 bit ARM

Storage space has become more and more affordable to a point that it is now possible to have multiple hard drives of dozens of terabytes in a single consumer-grade device. With a few 10 TiB hard drives and thanks to RAID technology, storage capacities that exceed 16 or 32 TiB can easily be reached and at a relatively low cost.

However, a number of consumer NAS systems used in the field today are still based on 32 bit ARM processors. The problem is that, with Linux on a 32 bit system, it’s only possible to address up to 16 TiB of storage space. This is still true even with the ext4 filesystem, even though it uses 64 bit pointers.

We were lucky to have a customer contracting us to update older Large Page Support patches to a recent version of the Linux kernel. This set of patches are one way of overcoming this 16 TiB limitation for ARM 32-bit systems. Since updating this patch series was a non trivial task, we are happy to share the results of our efforts with the community, both through this blog post and through a patch series we posted to the Linux ARM kernel mailing list: ARM: Add support for large kernel page (from 8K to 64K).

How Large Page Support works

The 16 TiB limitation comes from the use of page->index which is a pgoff_t offset type corresponding to unsigned long. This limits us to a 32-bit page offsets, so with 4 KiB physical pages, we end up with a maximum of 16 TiB. A way to address this limitation is to use larger physical pages. We can reach 32 TiB with 8 KiB pages, 64 TiB with 16 KiB pages and up to 256 TiB with 64 KiB pages.

Before going further, the ARM32 Page Tables article from Linus Walleij is a good reference to understand how the Linux kernel deals with ARM32 page tables. In our case, we are only going to cover the non LPAE case. As explained there, the way the Linux kernel sees the page tables actually doesn’t match reality. First, the kernel deals with 4 levels of page tables while on hardware there are only 2 levels. In addition, while the ARM32 hardware stores only 256 PTEs in Page Tables, taking up only 1 KB, Linux optimizes things by storing in each 4 KB page two sets of 256 PTEs, and two sets of shadow PTEs that are used to store additional metadata needed by Linux about each page (such as the dirty and accessed/young bits). So, there is already some magic between what is presented to the Linux virtual memory management subsystem, and what is really programmed into the hardware page tables. To support large pages, the idea is to go further in this direction by emulating larger physical pages.

Our series (and especially patch 5: ARM: Add large kernel page support) proposes to pretend to have larger hardware pages. The ARM 32-bit architecture only supports 4 KiB or 64 KiB page sizes, but we would like to support intermediate values of 8 KiB, 16 KiB and 32 KiB as well. So what we do to support 8 KiB pages is that we tell Linux the hardware has 8 KiB pages, but in fact we simply use two consecutive 4 KiB pages at the hardware level that we manipulate and configure simultaneously. To support 16 KiB pages, we use 4 consecutive 4 KiB pages, for 32 KiB pages, we use 8 consecutive pages, etc. So really, we “emulate” having larger page sizes by grouping 2, 4 or 8 pages together. Adding this feature only required a few changes in the code, mainly dealing with ranges of pages every time we were dealing with a single page. Actually, most of the code in the series is about making it possible to modify the hard coded value of the hardware page size and fixing the assumptions associated to such a fixed value.

In addition to this emulated mechanism that we provide for 8 KiB, 16 KiB, 32 KiB and 64 KiB pages, we also added support for using real hardware 64 KiB pages as part of this patch series.

Overall the number of changes is very limited (271 lines added, 13 lines removed), and allows to use much larger storage devices. Here is the diffstat of the full patch series:

 arch/arm/include/asm/elf.h                  |  2 +-
 arch/arm/include/asm/fixmap.h               |  3 +-
 arch/arm/include/asm/page.h                 | 12 ++++
 arch/arm/include/asm/pgtable-2level-hwdef.h |  8 +++
 arch/arm/include/asm/pgtable-2level.h       |  6 +-
 arch/arm/include/asm/pgtable.h              |  4 ++
 arch/arm/include/asm/shmparam.h             |  4 ++
 arch/arm/include/asm/tlbflush.h             | 21 +++++-
 arch/arm/kernel/entry-common.S              | 13 ++++
 arch/arm/kernel/traps.c                     | 10 +++
 arch/arm/mm/Kconfig                         | 72 +++++++++++++++++++++
 arch/arm/mm/fault.c                         | 19 ++++++
 arch/arm/mm/mmu.c                           | 22 ++++++-
 arch/arm/mm/pgd.c                           |  2 +
 arch/arm/mm/proc-v7-2level.S                | 72 ++++++++++++++++++++-
 arch/arm/mm/tlb-v7.S                        | 14 +++-
 16 files changed, 271 insertions(+), 13 deletions(-)

This patch series is running in production now on some NAS devices from a very popular NAS brand.

Limitations and alternatives

The submission of our patch series is recent but this feature has actually been running for years on many NAS systems in the field. Our new series is based on the original patchset, with the purpose of submitting it to the mainline kernel community. However, there is little chance that it will ever be merged into the mainline kernel.

The main drawback of this approach are large pages themselves: as each file in the page cache uses at least one page, the memory wasted increases as the size of the pages increases. For this reason, Linus Torvalds was against similar series proposed in the past.

To show how much memory is wasted, Arnd Bergmann ran some numbers to measure the page cache overhead for a typical set of files (Linux 5.7 kernel sources) for 5 different page sizes:

Page size (KiB) 4 8 16 32 64
page cache usage (MiB) 1,023.26 1,209.54 1,628.39 2,557.31 4,550.88
factor over 4K pages 1.00x 1.18x 1.59x 2.50x 4.45x

We can see that while a factor of 1.18 is acceptable for 8 KiB pages, a 4.45 multiplier looks excessive with 64 KiB pages.

Actually, to make it possible to address large volumes on 32 bit ARM, another solution was pointed out during the review of our series. Instead of using larger pages which have an impact on the entire system, an alternative is to modify the way the filesystem addresses the memory by using 64 bits pgoff_t offsets. This has already been implemented in vendor kernels running in some NAS systems, but this has never been submitted to mainline developers.

Videos and slides from Bootlin talks at Embedded Linux Conference Europe 2020

The Embedded Linux Conference Europe took place online last week. While we definitely missed the experience of an in-person event, we strongly participated to this conference with no less than 7 talks on various topics showing Bootlin expertise in different fields: Linux kernel development in networking, multimedia and storage, but also build systems and tooling. We’re happy to be publishing now the slides and videos of our talks.

From the camera sensor to the user: the journey of a video frame, Maxime Chevallier

Download the slides: PDF, source.

OpenEmbedded and Yocto Project best practices, Alexandre Belloni

Download the slides: PDF, source.

Supporting hardware-accelerated video encoding with mainline Linux, Paul Kocialkowski

Download the slides: PDF, source.

Building embedded Debian/Ubuntu systems with ELBE, Köry Maincent

Download the slides: PDF, source.

Understand ECC support for NAND flash devices in Linux, Miquèl Raynal

Download the slides: PDF, source.

Using Visual Studio Code for Embedded Linux Development, Michael Opdenacker

Download the slides: PDF, source.

Precision Time Protocol (PTP) and packet timestamping in Linux, Antoine Ténart

Download the slides: PDF, source.