New training course: embedded Linux boot time optimization

For many embedded products, the issue of how much time it takes from power-on to the application being fully usable by the end-user is an important challenge. Bootlin has been providing its expertise and experience in this area to its customers for many years through numerous boot time optimization projects, and we have shared this knowledge through a number of talks at several conferences over the past years.

We are now happy to announce that we have a new training course Embedded Linux boot time optimization, open for public registration. This training course was already given to selected Bootlin customers and is now available for everyone.

The training course will be led by Michael Opdenacker, Bootlin’s founder, and author of several publications on the topic of boot time optimization. The course is organized over 4 sessions of 4 hours, with a significant fraction of the time spent on practical demonstrations that show, on a real-life example, the techniques to measure and reduce the boot time of an embedded Linux system.

As usual with Bootlin, the training materials are fully available: Agenda, Slides and Practical lab instructions.

Our first course open for public registration will take place from April 6th to April 9th, 2021, from 14:00 to 18:00 UTC+2 (Paris time) on each day. The session cost is 519 EUR if you take advantage of the early bird price available until March 9th. Otherwise, the regular rate is 619 EUR. You can register now for this course on Eventbrite.

Also, if you’re interested in organizing a dedicated session for your company, do not hesitate to contact us.

Online training courses in March/April 2021

Our online training courses of January are now all completed, and were again successful. So we’re happy to announce the next dates for our public training courses in March and April, for all our courses: Embedded Linux system development, Linux kernel driver development, Yocto and OpenEmbedded system development, Buildroot system development and Linux graphics stack.

It is worth mentioning that we now have an Early Bird price, which is valid up to 1 month before the course, so register early if you’re interested!

Type | Dates | Time | Duration | Cost and registration
Embedded Linux (agenda) | Mar. 8, 9, 10, 11, 12, 15, 16, 2021 (+ extra session on Mar. 17 if needed) | 14:00 – 18:00 (Paris, UTC+1) | 28 h | Early: 829 EUR*, Regular: 929 EUR* (Register)
Linux kernel (agenda) | Mar. 9, 10, 11, 12, 16, 17, 18, 2021 | 13:30 – 17:30 (Paris time, UTC+1) | 28 h | Early: 829 EUR*, Regular: 929 EUR* (Register)
Buildroot (agenda) | Mar. 8, 9, 10, 11, 2021 | 14:00 – 18:00 (Paris time, UTC+1) | 16 h | Early: 519 EUR*, Regular: 619 EUR* (Register)
Yocto Project (agenda) | Mar. 22, 23, 24, 25, 2021 (+ extra session on Mar. 26 if needed) | 14:00 – 18:00 (Paris time, UTC+1) | 16 h | Early: 519 EUR*, Regular: 619 EUR* (Register)
Linux Graphics (agenda) | Apr. 6, 7, 8, 9, 2021 | 14:00 – 18:00 (Paris time, UTC+2) | 16 h | Early: 519 EUR*, Regular: 619 EUR* (Register)

Free “Device Tree 101” webinar, on February 9, 2021

In partnership with ST, we are organizing on February 9, 2021, a free webinar entitled “Device Tree 101”.

The Device Tree was adopted for the ARM 32-bit Linux kernel support almost a decade ago, and since then, its usage has expanded to many other CPU architectures in Linux, as well as to bootloaders such as U-Boot or Barebox. Even though Device Tree is no longer a new mechanism, developers coming into the embedded Linux world often struggle to understand what Device Trees are, what their syntax is, how they interact with the Linux kernel device drivers, what Device Tree bindings are, and more. This webinar will offer a deep dive into the Device Tree, to jump start new developers in using this description language that is now ubiquitous in the vast majority of embedded Linux projects. This webinar will be illustrated with numerous examples applicable to the STM32MP1 MPU platforms, which make extensive use of the Device Tree.

This webinar will take place on February 9, 2021, and is offered at two different times during the day: at 10 AM CET (UTC+1) and 5 PM CET (UTC+1). The duration of the webinar is 1 hour and 30 minutes. Registration is free at https://www.eventbrite.com/e/135964923747. The webinar itself will be hosted as a YouTube Live stream, which will allow participants to ask questions in the chat during the webinar.

The trainer for this webinar is Thomas Petazzoni, Bootlin’s CTO. Thomas is the author of the popular “Device Tree for Dummies” talk given in 2014, which helped numerous embedded Linux developers get started with the Device Tree. Thomas has contributed over 900 patches to the official Linux kernel, mainly around ARM hardware platform support. He is also the co-maintainer of the Buildroot open-source project.

Bootlin at FOSDEM 2021: two talks, member of Embedded program committee

Like all conferences in these times, FOSDEM will take place as an online, virtual event. For all the FOSDEM regular attendees, it will certainly be a very different experience, and for sure, we will all miss the chocolate, waffles, beer, mussels as well as the rainy, muddy, snowy, foggy and cold weather that characterize Brussels in early February. But nevertheless, knowledge sharing and discussions must go on, and FOSDEM will take place! As usual, FOSDEM takes place the first weekend of February, on February 6-7, and the event is completely free, with no registration required.

This time around, Bootlin is once again contributing to FOSDEM:

Make sure to check out the rest of the Embedded Devroom schedule, as well as the overall FOSDEM schedule.

Bootlin toolchains integration in Buildroot

Since 2017, Bootlin has been freely providing ready-to-use pre-built cross-compilation toolchains at https://toolchains.bootlin.com/. We now provide over 150 toolchains, for a wide range of CPU architectures, covering the glibc, uClibc-ng and musl C libraries, with up-to-date gcc, binutils, gdb and C library support.

We recently contributed an improvement to Buildroot that allows those toolchains to very easily be used in Buildroot configurations: the Bootlin toolchains are now all known by Buildroot as existing external toolchains, next to toolchains from other vendors such as ARM, Synopsys and others.

If you are building a Buildroot system for a CPU architecture variant that has a matching toolchain available from toolchains.bootlin.com, then Bootlin toolchains will naturally show up in the Toolchain sub-menu, when the selected Toolchain type is External toolchain. For example, if the selected CPU architecture is ARM little endian Cortex-A9, with VFP, you will see:

[Screenshot: Bootlin toolchain selection]

Once Bootlin toolchains is selected, a new sub-option Bootlin toolchain variant appears, which lets you choose between the different toolchains applicable to the selected CPU architecture:

[Screenshot: Bootlin toolchain choice]

This hopefully should make Bootlin toolchains easier to use for Buildroot users.

Internally, this support for Bootlin toolchains in Buildroot is generated and updated using the support/scripts/gen-bootlin-toolchains script. In addition to making the toolchains available to the user, it also generates Buildroot test cases for each toolchain, so that each of those configurations is tested by Buildroot continuous integration; see support/testing/tests/toolchain/test_external_bootlin.py.

Linux 5.10, Bootlin contributions

Linux 5.10 was released a few weeks ago, and while 5.11-rc2 is already out, there is still time to look at what Bootlin contributed to the 5.10 kernel. As usual, for a broad overview of the major changes in 5.10, we recommend reading the LWN articles: 5.10 merge window part 1, the rest of the 5.10 merge window, or the 5.10 KernelNewbies page.

Overall, Bootlin contributed 78 patches to this kernel release, in the following areas:

  • Alexandre Belloni made a number of improvements to the support of Microchip ARM platforms: device tree updates, code cleanups, etc.
  • Alexandre Belloni added a new rv3032 RTC driver and made some improvements to the r9701 RTC driver.
  • Miquèl Raynal implemented a significant rework of how ECC engines are handled in the MTD subsystems, so that ECC engines can be used not just for parallel NANDs but also for SPI NANDs. See also the talk that Miquèl gave at the Embedded Linux Conference Europe on this topic: slides and video.
  • Miquèl Raynal contributed a few improvements to the tlv320aic32x4 audio codec driver.
  • Paul Kocialkowski made some small improvements, one in the OV5640 camera sensor driver, and one in the Rockchip DRM driver.
  • Thomas Petazzoni implemented a performance improvement in the max310x driver, used for SPI-connected UART controllers.

In addition to these code contributions, we also contribute by having several of our engineers be maintainers of a few subsystems of the Linux kernel. As part of this:

  • Miquèl Raynal reviewed and merged 47 patches touching the MTD subsystem he co-maintains with other kernel developers.
  • Alexandre Belloni reviewed and merged 42 patches touching either the Microchip ARM or MIPS platforms, or the RTC subsystem.
  • Grégory Clement reviewed and merged 2 patches touching the Marvell ARM/ARM64 platform support.

Here is the complete list of our commits to 5.10.

Bootlin welcomes Thomas Perrot in its team

Since December 1st, 2020, we’re happy to have an additional engineer in our team, Thomas Perrot, who joined our office in Toulouse, France.

Thomas brings 6+ years of experience working on embedded Linux systems, during which he worked at Intel on Android platforms, and then at Sigfox on the base stations for Sigfox’s radio network. Thomas has experience working with Linux on x86-64, ARM and ARM64 platforms, with a wide range of skills: bootloader development, Linux kernel and driver development, Yocto integration, OTA updates. Thomas was also deeply involved in the strong security aspects of Sigfox base stations, with secure boot and measured boot, TPM, integrity measurement, etc. At Sigfox, Thomas was involved in all steps of the product life-cycle, from the design phase all the way to the in-field deployment, update and maintenance. Last but not least, Thomas is a Linux technologist and a free software enthusiast, who hacks on some open source hardware projects in his free time. See also Thomas’s page on our website and his LinkedIn profile.

Thomas Perrot is joining our growing team of engineers in Toulouse, which already included Paul Kocialkowski, Miquèl Raynal, Köry Maincent, Maxime Chevallier and Thomas Petazzoni.

On-going Bootlin contributions to the Video4Linux subsystem: camera, camera sensors, video encoding

Over the past years, we have been more and more involved in projects that have significant multimedia requirements. As part of this trend, 2020 has led us to work on a number of contributions to the Video4Linux subsystem of the Linux kernel, with new drivers for camera interfaces, camera sensors, video decoders, and even HW-accelerated video encoding. In this blog post, we summarize our contributions and their status on the following topics:

  • Rockchip PX30, RK1808, RK3128 and RK3288 camera interface driver
  • Allwinner A31, V3s/V3/S3 and A83T MIPI CSI-2 support for the camera interface driver
  • Omnivision OV8865 camera sensor driver
  • Omnivision OV5648 camera sensor driver
  • TW9900 PAL/NTSC video decoder driver
  • Rockchip HW-accelerated H264 video encoding

Rockchip camera interface

The Rockchip ARM processors are known to have very good support in the upstream Linux kernel. However, one area where the support was lacking is the camera interface used by those SoCs. And it turns out that Bootlin engineer Maxime Chevallier has worked precisely on this topic throughout 2020: the development and upstreaming of the rkvip driver, a Video4Linux driver for the Rockchip camera interface. While the work was done and tested on a Rockchip PX30 platform, the same camera interface is used on RK1808, RK3128 and RK3288.

Several iterations of the driver have been posted on the linux-media mailing list, with the latest iteration, version 5, posted on December 29, 2020:

Maxime Chevallier (3):
  media: dt-bindings: media: Document Rockchip VIP bindings
  media: rockchip: Introduce driver for Rockhip's camera interface
  arm64: dts: rockchip: Add the camera interface description of the PX30

 .../bindings/media/rockchip-vip.yaml          |  101 ++
 arch/arm64/boot/dts/rockchip/px30.dtsi        |   12 +
 drivers/media/platform/Kconfig                |   15 +
 drivers/media/platform/Makefile               |    1 +
 drivers/media/platform/rockchip/vip/Makefile  |    3 +
 drivers/media/platform/rockchip/vip/capture.c | 1146 +++++++++++++++++
 drivers/media/platform/rockchip/vip/dev.c     |  331 +++++
 drivers/media/platform/rockchip/vip/dev.h     |  203 +++
 drivers/media/platform/rockchip/vip/regs.h    |  260 ++++
 9 files changed, 2072 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/rockchip-vip.yaml
 create mode 100644 drivers/media/platform/rockchip/vip/Makefile
 create mode 100644 drivers/media/platform/rockchip/vip/capture.c
 create mode 100644 drivers/media/platform/rockchip/vip/dev.c
 create mode 100644 drivers/media/platform/rockchip/vip/dev.h
 create mode 100644 drivers/media/platform/rockchip/vip/regs.h

We’re hoping to get this driver merged soon, as we have now addressed the feedback that was received through the 5 iterations the patch series has gone through. It should be noted that for now it only supports the parallel BT656 interface, as this is what we needed for our current project. We are definitely able to extend it to support MIPI CSI2 as well if you’re interested!

It should be noted that as a result of this work, Maxime Chevallier also prepared and delivered a talk From a video sensor to your display which was given at the Embedded Linux Conference Europe 2020. See the slides and video.

Allwinner MIPI CSI2 camera interface

As part of an internship in 2020 and then a customer project, Bootlin intern Kévin L’Hôpital and Bootlin engineer Paul Kocialkowski worked on extending the Allwinner camera interface support with support for MIPI CSI2 cameras. In fact, this addition was done to two Allwinner camera interface drivers: the sun6i driver which is used on Allwinner A31 and V3s/V3/S3, and the sun8i-a83t, which is used on the Allwinner A83T.

Through a fairly long 15-patch series, support for MIPI CSI2 is added to both camera interface controllers. We have tested both with Omnivision sensors, which are described below.

The series is currently in its third iteration, which was posted by Paul Kocialkowski on December 11, 2020 on the linux-media mailing list:


Paul Kocialkowski (15):
  docs: phy: Add a part about PHY mode and submode
  phy: Distinguish between Rx and Tx for MIPI D-PHY with submodes
  phy: allwinner: phy-sun6i-mipi-dphy: Support D-PHY Rx mode for MIPI
    CSI-2
  media: sun6i-csi: Use common V4L2 format info for storage bpp
  media: sun6i-csi: Only configure the interface data width for parallel
  dt-bindings: media: sun6i-a31-csi: Add MIPI CSI-2 input port
  media: sun6i-csi: Add support for MIPI CSI-2 bridge input
  dt-bindings: media: Add A31 MIPI CSI-2 bindings documentation
  media: sunxi: Add support for the A31 MIPI CSI-2 controller
  ARM: dts: sun8i: v3s: Add nodes for MIPI CSI-2 support
  MAINTAINERS: Add entry for the Allwinner A31 MIPI CSI-2 bridge
  dt-bindings: media: Add A83T MIPI CSI-2 bindings documentation
  media: sunxi: Add support for the A83T MIPI CSI-2 controller
  ARM: dts: sun8i: a83t: Add MIPI CSI-2 controller node
  MAINTAINERS: Add entry for the Allwinner A83T MIPI CSI-2 bridge

 .../media/allwinner,sun6i-a31-csi.yaml        |  88 ++-
 .../media/allwinner,sun6i-a31-mipi-csi2.yaml  | 149 ++++
 .../media/allwinner,sun8i-a83t-mipi-csi2.yaml | 147 ++++
 Documentation/driver-api/phy/phy.rst          |  18 +
 MAINTAINERS                                   |  16 +
 arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts  |   2 +-
 arch/arm/boot/dts/sun8i-a83t.dtsi             |  26 +
 arch/arm/boot/dts/sun8i-v3s.dtsi              |  67 ++
 drivers/media/platform/sunxi/Kconfig          |   2 +
 drivers/media/platform/sunxi/Makefile         |   2 +
 .../platform/sunxi/sun6i-csi/sun6i_csi.c      | 165 +++--
 .../platform/sunxi/sun6i-csi/sun6i_csi.h      |  58 +-
 .../platform/sunxi/sun6i-csi/sun6i_video.c    |  53 +-
 .../platform/sunxi/sun6i-csi/sun6i_video.h    |   7 +-
 .../platform/sunxi/sun6i-mipi-csi2/Kconfig    |  12 +
 .../platform/sunxi/sun6i-mipi-csi2/Makefile   |   4 +
 .../sunxi/sun6i-mipi-csi2/sun6i_mipi_csi2.c   | 590 ++++++++++++++++
 .../sunxi/sun6i-mipi-csi2/sun6i_mipi_csi2.h   | 117 ++++
 .../sunxi/sun8i-a83t-mipi-csi2/Kconfig        |  11 +
 .../sunxi/sun8i-a83t-mipi-csi2/Makefile       |   4 +
 .../sun8i-a83t-mipi-csi2/sun8i_a83t_dphy.c    |  92 +++
 .../sun8i-a83t-mipi-csi2/sun8i_a83t_dphy.h    |  39 ++
 .../sun8i_a83t_mipi_csi2.c                    | 657 ++++++++++++++++++
 .../sun8i_a83t_mipi_csi2.h                    | 197 ++++++
 drivers/phy/allwinner/phy-sun6i-mipi-dphy.c   | 164 ++++-
 drivers/staging/media/rkisp1/rkisp1-isp.c     |   3 +-
 include/linux/phy/phy-mipi-dphy.h             |  13 +
 27 files changed, 2581 insertions(+), 122 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/media/allwinner,sun6i-a31-mipi-csi2.yaml
 create mode 100644 Documentation/devicetree/bindings/media/allwinner,sun8i-a83t-mipi-csi2.yaml
 create mode 100644 drivers/media/platform/sunxi/sun6i-mipi-csi2/Kconfig
 create mode 100644 drivers/media/platform/sunxi/sun6i-mipi-csi2/Makefile
 create mode 100644 drivers/media/platform/sunxi/sun6i-mipi-csi2/sun6i_mipi_csi2.c
 create mode 100644 drivers/media/platform/sunxi/sun6i-mipi-csi2/sun6i_mipi_csi2.h
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/Kconfig
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/Makefile
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/sun8i_a83t_dphy.c
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/sun8i_a83t_dphy.h
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/sun8i_a83t_mipi_csi2.c
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/sun8i_a83t_mipi_csi2.h

Here as well, the patch series has gone through a number of iterations, with significant reshaping to take into account the comments and feedback of other kernel developers and maintainers, so we hope to be near the point where it can be merged.

Omnivision OV8865 camera sensor driver

As part of his internship at Bootlin in 2020, Kévin L’Hôpital implemented a driver for the OV8865 camera sensor, connected over MIPI CSI2 to an Allwinner A83T platform. This OV8865 driver was then taken over by Bootlin engineer Paul Kocialkowski, who did additional rework and polishing.

We are currently at the 4th iteration of this driver, which was posted on December 11, 2020, and it has now been accepted and submitted to the V4L maintainer in a pull request.


Kévin L'hôpital (1):
  ARM: dts: sun8i: a83t: bananapi-m3: Enable MIPI CSI-2 with OV8865

Paul Kocialkowski (2):
  dt-bindings: media: i2c: Add OV8865 bindings documentation
  media: i2c: Add support for the OV8865 image sensor

 .../bindings/media/i2c/ovti,ov8865.yaml       |  124 +
 arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts  |  102 +
 drivers/media/i2c/Kconfig                     |   13 +
 drivers/media/i2c/Makefile                    |    1 +
 drivers/media/i2c/ov8865.c                    | 2981 +++++++++++++++++
 5 files changed, 3221 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/i2c/ovti,ov8865.yaml
 create mode 100644 drivers/media/i2c/ov8865.c

Omnivision OV5648 camera sensor driver

In addition to the work done by Bootlin intern Kévin L’Hôpital on OV8865 with Allwinner A83T, Paul Kocialkowski worked on OV5648 with Allwinner V3s, also connected over MIPI CSI2. This work resulted in a driver for the OV5648 camera sensor, which Paul has submitted to the linux-media mailing list.

This driver is now in its 5th iteration, posted on December 11, 2020, and it has now been accepted and submitted to the V4L maintainer in a pull request.


Paul Kocialkowski (2):
  dt-bindings: media: i2c: Add OV5648 bindings documentation
  media: i2c: Add support for the OV5648 image sensor

 .../bindings/media/i2c/ovti,ov5648.yaml       |  115 +
 drivers/media/i2c/Kconfig                     |   13 +
 drivers/media/i2c/Makefile                    |    1 +
 drivers/media/i2c/ov5648.c                    | 2638 +++++++++++++++++
 4 files changed, 2767 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/i2c/ovti,ov5648.yaml
 create mode 100644 drivers/media/i2c/ov5648.c

TW9900 PAL/NTSC video decoder driver

In addition to working on the Rockchip camera interface driver, Maxime Chevallier has also worked on a driver for the TW9900 PAL/NTSC video decoder. This chip from Renesas takes as input an analog PAL or NTSC signal, digitizes it and outputs it on a parallel BT656 interface, which in our case was connected to a Rockchip PX30 platform.

Maxime posted the third iteration of the patch series adding this driver on December 22, 2020 on the linux-media mailing list.

Maxime Chevallier (3):
  dt-bindings: vendor-prefixes: Add techwell vendor prefix
  media: dt-bindings: media: i2c: Add bindings for TW9900
  media: i2c: Introduce a driver for the Techwell TW9900 decoder

 .../devicetree/bindings/media/i2c/tw9900.yaml |  60 ++
 .../devicetree/bindings/vendor-prefixes.yaml  |   2 +
 MAINTAINERS                                   |   6 +
 drivers/media/i2c/Kconfig                     |  11 +
 drivers/media/i2c/Makefile                    |   1 +
 drivers/media/i2c/tw9900.c                    | 617 ++++++++++++++++++
 6 files changed, 697 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/i2c/tw9900.yaml
 create mode 100644 drivers/media/i2c/tw9900.c

Rockchip HW-accelerated H264 video encoding

In 2018, thanks to the success of the crowd-funding campaign we ran back then, Bootlin engineer Paul Kocialkowski pioneered support for stateless video decoders in the Linux kernel, with a first driver supporting MPEG2, H264 and H265 HW-accelerated video decoding on Allwinner platforms.

In 2020, Paul was tasked to work on HW-accelerated H264 video encoding for Rockchip platforms, which also use a stateless video encoder. Of course, Paul took the same approach of going towards an upstream-acceptable solution rather than relying on out-of-tree and vendor-specific solutions provided by Rockchip.

Paul has been able to implement a working solution for one of our customers, and while the result is not yet in a shape where it can be submitted upstream, Paul presented his results at the Embedded Linux Conference Europe 2020: see the slides and video. The kernel code is available at https://github.com/bootlin/linux/tree/hantro/h264-encoding while the user-space code is available at https://github.com/bootlin/v4l2-hantro-h264-encoder.

As explained in Paul’s talk, this is not fully ready for upstream, as lots of discussions are needed on the user-space APIs, especially around the topic of rate control.

If you are interested in having this work fully available in the upstream Linux kernel, please contact us. We are looking for additional funding and support to push this completely upstream.

Conclusion

As can be seen from the numerous topics covered in this blog post, Bootlin has significant experience with the Video4Linux subsystem, and is able to implement support for new hardware, extend the Video4Linux subsystem if needed, and contribute drivers and changes to the official Linux kernel.

Bootlin training courses for beginning of 2021

It’s the beginning of 2021, and Bootlin’s offering of online training courses continues. We have dates available for our 5 training courses, at an affordable cost, and with the same quality characteristics of all Bootlin courses: trainers with proven in-field experience, fully open-source training materials and worldwide recognized training contents.

Here are the dates of our upcoming sessions:

See our training page for more details about all our training courses!

Large Page Support for NAS systems on 32 bit ARM

The need for large page support on 32 bit ARM

Storage space has become more and more affordable to a point that it is now possible to have multiple hard drives of dozens of terabytes in a single consumer-grade device. With a few 10 TiB hard drives and thanks to RAID technology, storage capacities that exceed 16 or 32 TiB can easily be reached and at a relatively low cost.

However, a number of consumer NAS systems used in the field today are still based on 32 bit ARM processors. The problem is that, with Linux on a 32 bit system, it’s only possible to address up to 16 TiB of storage space. This is still true even with the ext4 filesystem, even though it uses 64 bit pointers.

We were lucky to have a customer contracting us to update older Large Page Support patches to a recent version of the Linux kernel. This set of patches is one way of overcoming this 16 TiB limitation for ARM 32-bit systems. Since updating this patch series was a non-trivial task, we are happy to share the results of our efforts with the community, both through this blog post and through a patch series we posted to the Linux ARM kernel mailing list: ARM: Add support for large kernel page (from 8K to 64K).

How Large Page Support works

The 16 TiB limitation comes from the use of page->index, which is a pgoff_t offset type corresponding to unsigned long. This limits us to 32-bit page offsets, so with 4 KiB physical pages, we end up with a maximum of 16 TiB. A way to address this limitation is to use larger physical pages. We can reach 32 TiB with 8 KiB pages, 64 TiB with 16 KiB pages and up to 256 TiB with 64 KiB pages.
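The figures above follow from multiplying the number of distinct 32-bit page indexes by the page size. Here is a minimal sketch of that arithmetic in Python; the function and constant names are ours, chosen for illustration only, and are not taken from the kernel:

```python
# Largest offset range reachable through a 32-bit page index, per page size.
# Names are illustrative; only the arithmetic comes from the text above.
PAGE_INDEX_BITS = 32  # pgoff_t is an unsigned long, 32 bits wide on 32-bit ARM

KIB = 1024
TIB = 1024 ** 4


def max_addressable_bytes(page_size_bytes, index_bits=PAGE_INDEX_BITS):
    """Number of distinct page indexes multiplied by the page size."""
    return (2 ** index_bits) * page_size_bytes


for page_kib in (4, 8, 16, 32, 64):
    limit = max_addressable_bytes(page_kib * KIB)
    print(f"{page_kib:2d} KiB pages -> {limit // TIB} TiB")
```

Each doubling of the page size doubles the limit, which is why 64 KiB pages give 16 times the 16 TiB reachable with 4 KiB pages.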

Before going further, the ARM32 Page Tables article from Linus Walleij is a good reference to understand how the Linux kernel deals with ARM32 page tables. In our case, we are only going to cover the non LPAE case. As explained there, the way the Linux kernel sees the page tables actually doesn’t match reality. First, the kernel deals with 4 levels of page tables while on hardware there are only 2 levels. In addition, while the ARM32 hardware stores only 256 PTEs in Page Tables, taking up only 1 KB, Linux optimizes things by storing in each 4 KB page two sets of 256 PTEs, and two sets of shadow PTEs that are used to store additional metadata needed by Linux about each page (such as the dirty and accessed/young bits). So, there is already some magic between what is presented to the Linux virtual memory management subsystem, and what is really programmed into the hardware page tables. To support large pages, the idea is to go further in this direction by emulating larger physical pages.
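The sizes quoted above can be checked with a few lines of arithmetic. The sketch below assumes 4-byte entries, as in the ARM32 non-LPAE page table format; the variable names are ours and purely illustrative:

```python
# Size bookkeeping for the non-LPAE ARM32 second-level page tables.
PTE_SIZE = 4             # bytes per page table entry (32-bit descriptors)
PTES_PER_HW_TABLE = 256  # entries in one hardware second-level table
HW_PAGE_SIZE = 4 * 1024  # bytes mapped by one entry

# One hardware table: 256 entries of 4 bytes each.
hw_table_bytes = PTES_PER_HW_TABLE * PTE_SIZE

# Linux packs two hardware tables plus two shadow tables in one page.
linux_pte_page_bytes = 2 * hw_table_bytes + 2 * hw_table_bytes

# Address range covered by the two hardware tables stored in that page.
covered_bytes = 2 * PTES_PER_HW_TABLE * HW_PAGE_SIZE

print(hw_table_bytes, linux_pte_page_bytes, covered_bytes // (1024 * 1024))
```

So one hardware table takes 1 KiB, the four tables together fill exactly one 4 KiB page, and that page describes 2 MiB of address space.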

Our series (and especially patch 5: ARM: Add large kernel page support) proposes to pretend to have larger hardware pages. The ARM 32-bit architecture only supports 4 KiB or 64 KiB page sizes, but we would like to support intermediate values of 8 KiB, 16 KiB and 32 KiB as well. So what we do to support 8 KiB pages is that we tell Linux the hardware has 8 KiB pages, but in fact we simply use two consecutive 4 KiB pages at the hardware level that we manipulate and configure simultaneously. To support 16 KiB pages, we use 4 consecutive 4 KiB pages; for 32 KiB pages, we use 8 consecutive pages, etc. So really, we “emulate” having larger page sizes by grouping 2, 4 or 8 pages together. Adding this feature only required a few changes in the code, mainly dealing with ranges of pages every time we were dealing with a single page. Actually, most of the code in the series is about making it possible to modify the hard-coded value of the hardware page size and fixing the assumptions associated with such a fixed value.
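To make the grouping idea concrete, here is a toy model in Python. It is not the kernel code from the series: the dictionary standing in for the hardware page table, the function names and the flags value are all hypothetical, and only the principle (one emulated page is backed by 2, 4 or 8 consecutive 4 KiB hardware entries that are always updated together) comes from the description above.

```python
# Toy model of an emulated large page backed by consecutive 4 KiB entries.
# All names are hypothetical; this is not how the kernel stores page tables.
HW_PAGE_SIZE = 4 * 1024  # page size actually programmed into the hardware


def hw_pages_for(emulated_index, emulated_page_size):
    """Return the hardware page numbers backing one emulated page."""
    pages_per_group = emulated_page_size // HW_PAGE_SIZE
    first = emulated_index * pages_per_group
    return list(range(first, first + pages_per_group))


def set_emulated_pte(hw_table, emulated_index, emulated_page_size,
                     phys_base, flags):
    """Write one hardware entry per 4 KiB page of the group, same flags."""
    group = hw_pages_for(emulated_index, emulated_page_size)
    for i, hw_page in enumerate(group):
        hw_table[hw_page] = (phys_base + i * HW_PAGE_SIZE, flags)


# Map emulated 16 KiB page number 3 to an arbitrary physical address.
table = {}
set_emulated_pte(table, 3, 16 * 1024, 0x80000000, "rw")
print(sorted(table))
```

In this model, a single update of an emulated 16 KiB page touches four consecutive hardware entries, which mirrors the "ranges of pages instead of a single page" change mentioned above.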

In addition to this emulated mechanism that we provide for 8 KiB, 16 KiB, 32 KiB and 64 KiB pages, we also added support for using real hardware 64 KiB pages as part of this patch series.

Overall, the number of changes is very limited (271 lines added, 13 lines removed), and makes it possible to use much larger storage devices. Here is the diffstat of the full patch series:

 arch/arm/include/asm/elf.h                  |  2 +-
 arch/arm/include/asm/fixmap.h               |  3 +-
 arch/arm/include/asm/page.h                 | 12 ++++
 arch/arm/include/asm/pgtable-2level-hwdef.h |  8 +++
 arch/arm/include/asm/pgtable-2level.h       |  6 +-
 arch/arm/include/asm/pgtable.h              |  4 ++
 arch/arm/include/asm/shmparam.h             |  4 ++
 arch/arm/include/asm/tlbflush.h             | 21 +++++-
 arch/arm/kernel/entry-common.S              | 13 ++++
 arch/arm/kernel/traps.c                     | 10 +++
 arch/arm/mm/Kconfig                         | 72 +++++++++++++++++++++
 arch/arm/mm/fault.c                         | 19 ++++++
 arch/arm/mm/mmu.c                           | 22 ++++++-
 arch/arm/mm/pgd.c                           |  2 +
 arch/arm/mm/proc-v7-2level.S                | 72 ++++++++++++++++++++-
 arch/arm/mm/tlb-v7.S                        | 14 +++-
 16 files changed, 271 insertions(+), 13 deletions(-)

This patch series is running in production now on some NAS devices from a very popular NAS brand.

Limitations and alternatives

The submission of our patch series is recent but this feature has actually been running for years on many NAS systems in the field. Our new series is based on the original patchset, with the purpose of submitting it to the mainline kernel community. However, there is little chance that it will ever be merged into the mainline kernel.

The main drawback of this approach is the large pages themselves: as each file in the page cache uses at least one page, the memory wasted increases as the size of the pages increases. For this reason, Linus Torvalds was against similar series proposed in the past.

To show how much memory is wasted, Arnd Bergmann ran some numbers to measure the page cache overhead for a typical set of files (Linux 5.7 kernel sources) for 5 different page sizes:

Page size (KiB)        | 4        | 8        | 16       | 32       | 64
Page cache usage (MiB) | 1,023.26 | 1,209.54 | 1,628.39 | 2,557.31 | 4,550.88
Factor over 4K pages   | 1.00x    | 1.18x    | 1.59x    | 2.50x    | 4.45x

We can see that while a factor of 1.18 is acceptable for 8 KiB pages, a 4.45 multiplier looks excessive with 64 KiB pages.
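The overhead comes from rounding every cached file up to a whole number of pages. The following Python sketch illustrates the effect; the rounding model is a simplification, the three file sizes are made up for the example, and only the measured usage figures are taken from the table above:

```python
import math

# Measured page cache usage in MiB per page size in KiB (from the table).
measured_mib = {4: 1023.26, 8: 1209.54, 16: 1628.39, 32: 2557.31, 64: 4550.88}

# Recompute the "factor over 4K pages" row from the measurements.
factors = {size: round(usage / measured_mib[4], 2)
           for size, usage in measured_mib.items()}
print(factors)


def page_cache_bytes(file_sizes, page_size):
    """Simplified model: each file occupies whole pages, at least one."""
    return sum(max(1, math.ceil(size / page_size)) * page_size
               for size in file_sizes)


# Hypothetical file sizes in bytes, for illustration only.
sample_files = [100, 5000, 70000]
small = page_cache_bytes(sample_files, 4 * 1024)
large = page_cache_bytes(sample_files, 64 * 1024)
print(small, large)
```

Small files dominate the waste: in this model a 100-byte file costs 4 KiB with 4 KiB pages but a full 64 KiB with 64 KiB pages, which is why a source tree made of many small files is hit so hard.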

Actually, to make it possible to address large volumes on 32 bit ARM, another solution was pointed out during the review of our series. Instead of using larger pages, which have an impact on the entire system, an alternative is to modify the way the filesystem addresses the memory by using 64-bit pgoff_t offsets. This has already been implemented in vendor kernels running in some NAS systems, but it has never been submitted to mainline developers.