Boot time: choose your kernel loading address carefully

When the compressed and uncompressed kernel images overlap

At least on ARM32, there seems to be many working addresses where the compressed kernel can be loaded in RAM. For example, one can load the compressed kernel at offset 0x1000000 (16 MB) from the start of RAM, and the Device Tree Blog (DTB) at offset 0x2000000 (32 MB). Whatever this loading address, the kernel is then decompressed at offset 0x8000 from the start of RAM, as explained this the famous How the ARM32 Linux kernel decompresses article from Linus Walleij.

There is a potential issue with the loading address of the compressed kernel, as explained in the article too. If the compressed kernel is loaded too close to the beginning of RAM, where the kernel must be decompressed, there will be an overlap between the two. The decompressed kernel will overwrite the compressed one, potentially breaking the decompression process.

Overlapping compressed and decompressed kernel

As you see in the above diagram, when this happens, the bootstrap code in the compressed kernel will first copy the compressed image to a location that’s far enough to guarantee that the decompressed kernel won’t overlap it. However, this extra step in the boot process has a cost.

Measuring boot time impact

In the context of updating our materials for our upcoming Embedded Linux Boot Time Optimization course in June, we measured this additional time on the STM32MP157A-DK1 Discovery Kit from STMicroelectronics, with a dual-core ARM Cortex-A7 CPU running at 650 MHz.

Initially, in our Embedded Linux System Development course, we were booting the DK1 board as follows:

ext4load mmc 0:4 0xc0000000 zImage; ext4load mmc 0:4 0xc4000000 dtb; bootz 0xc0000000 - 0xc4000000

0xc0000000 is exactly the beginning of RAM! We are therefore in the overlap situation.

We used grabserial from Tim Bird to measure the time between Starting kernel in U-Boot and when the compressed kernel starts executing (Booting Linux on physical CPU 0x0):

...
[4.451996 0.000124] Starting kernel ...
[0.001838 0.001838] 
[2.439980 2.438142] [    0.000000] Booting Linux on physical CPU 0x0
...

On a series of 5 identical tests, we obtained an average time of 2,440 ms, with a standard deviation of 0.4 ms.

Then, we measured the optimum case, in which the compressed kernel is loaded far enough from the beginning of RAM so that no overlap is possible:

No overlap between compressed and decompressed kernel

Here we chose to load the kernel at 0xc2000000:

ext4load mmc 0:4 0xc2000000 zImage; ext4load mmc 0:4 0xc4000000 dtb; bootz 0xc2000000 - 0xc4000000

On a series of 5 identical tests, we obtained an average time of 2,333 ms, with a standard deviation of 0.7 ms.

The new average is 107 ms smaller, which you are likely to consider as a worthy reduction, if you have experience with boot time reduction projects.

What to remember

In your embedded projects, if you are using a compressed kernel, make sure it is loaded far enough from the beginning of RAM, leaving enough space for the decompressed kernel to fit in between. Otherwise, your system will still be able to boot, but depending on the speed of your CPU and storage, it will be slower, from a few tens to a few hundreds of milliseconds.

We checked the How to optimize the boot time page on the STM32 wiki, and it recommends optimum loading addresses: 0xc2000000 for the kernel and 0xc4000000 for the device tree. This way, the upper limit for the decompressed kernel is 32 MB, which is more than enough.

If you are directly using an uncompressed kernel, which is more rare, you should also make sure that it is loaded at an optimum location, so that there is no need to move it before starting it.

Large Page Support for NAS systems on 32 bit ARM

The need for large page support on 32 bit ARM

Storage space has become more and more affordable to a point that it is now possible to have multiple hard drives of dozens of terabytes in a single consumer-grade device. With a few 10 TiB hard drives and thanks to RAID technology, storage capacities that exceed 16 or 32 TiB can easily be reached and at a relatively low cost.

However, a number of consumer NAS systems used in the field today are still based on 32 bit ARM processors. The problem is that, with Linux on a 32 bit system, it’s only possible to address up to 16 TiB of storage space. This is still true even with the ext4 filesystem, even though it uses 64 bit pointers.

We were lucky to have a customer contracting us to update older Large Page Support patches to a recent version of the Linux kernel. This set of patches are one way of overcoming this 16 TiB limitation for ARM 32-bit systems. Since updating this patch series was a non trivial task, we are happy to share the results of our efforts with the community, both through this blog post and through a patch series we posted to the Linux ARM kernel mailing list: ARM: Add support for large kernel page (from 8K to 64K).

How Large Page Support works

The 16 TiB limitation comes from the use of page->index which is a pgoff_t offset type corresponding to unsigned long. This limits us to a 32-bit page offsets, so with 4 KiB physical pages, we end up with a maximum of 16 TiB. A way to address this limitation is to use larger physical pages. We can reach 32 TiB with 8 KiB pages, 64 TiB with 16 KiB pages and up to 256 TiB with 64 KiB pages.

Before going further, the ARM32 Page Tables article from Linus Walleij is a good reference to understand how the Linux kernel deals with ARM32 page tables. In our case, we are only going to cover the non LPAE case. As explained there, the way the Linux kernel sees the page tables actually doesn’t match reality. First, the kernel deals with 4 levels of page tables while on hardware there are only 2 levels. In addition, while the ARM32 hardware stores only 256 PTEs in Page Tables, taking up only 1 KB, Linux optimizes things by storing in each 4 KB page two sets of 256 PTEs, and two sets of shadow PTEs that are used to store additional metadata needed by Linux about each page (such as the dirty and accessed/young bits). So, there is already some magic between what is presented to the Linux virtual memory management subsystem, and what is really programmed into the hardware page tables. To support large pages, the idea is to go further in this direction by emulating larger physical pages.

Our series (and especially patch 5: ARM: Add large kernel page support) proposes to pretend to have larger hardware pages. The ARM 32-bit architecture only supports 4 KiB or 64 KiB page sizes, but we would like to support intermediate values of 8 KiB, 16 KiB and 32 KiB as well. So what we do to support 8 KiB pages is that we tell Linux the hardware has 8 KiB pages, but in fact we simply use two consecutive 4 KiB pages at the hardware level that we manipulate and configure simultaneously. To support 16 KiB pages, we use 4 consecutive 4 KiB pages, for 32 KiB pages, we use 8 consecutive pages, etc. So really, we “emulate” having larger page sizes by grouping 2, 4 or 8 pages together. Adding this feature only required a few changes in the code, mainly dealing with ranges of pages every time we were dealing with a single page. Actually, most of the code in the series is about making it possible to modify the hard coded value of the hardware page size and fixing the assumptions associated to such a fixed value.

In addition to this emulated mechanism that we provide for 8 KiB, 16 KiB, 32 KiB and 64 KiB pages, we also added support for using real hardware 64 KiB pages as part of this patch series.

Overall the number of changes is very limited (271 lines added, 13 lines removed), and allows to use much larger storage devices. Here is the diffstat of the full patch series:

 arch/arm/include/asm/elf.h                  |  2 +-
 arch/arm/include/asm/fixmap.h               |  3 +-
 arch/arm/include/asm/page.h                 | 12 ++++
 arch/arm/include/asm/pgtable-2level-hwdef.h |  8 +++
 arch/arm/include/asm/pgtable-2level.h       |  6 +-
 arch/arm/include/asm/pgtable.h              |  4 ++
 arch/arm/include/asm/shmparam.h             |  4 ++
 arch/arm/include/asm/tlbflush.h             | 21 +++++-
 arch/arm/kernel/entry-common.S              | 13 ++++
 arch/arm/kernel/traps.c                     | 10 +++
 arch/arm/mm/Kconfig                         | 72 +++++++++++++++++++++
 arch/arm/mm/fault.c                         | 19 ++++++
 arch/arm/mm/mmu.c                           | 22 ++++++-
 arch/arm/mm/pgd.c                           |  2 +
 arch/arm/mm/proc-v7-2level.S                | 72 ++++++++++++++++++++-
 arch/arm/mm/tlb-v7.S                        | 14 +++-
 16 files changed, 271 insertions(+), 13 deletions(-)

This patch series is running in production now on some NAS devices from a very popular NAS brand.

Limitations and alternatives

The submission of our patch series is recent but this feature has actually been running for years on many NAS systems in the field. Our new series is based on the original patchset, with the purpose of submitting it to the mainline kernel community. However, there is little chance that it will ever be merged into the mainline kernel.

The main drawback of this approach are large pages themselves: as each file in the page cache uses at least one page, the memory wasted increases as the size of the pages increases. For this reason, Linus Torvalds was against similar series proposed in the past.

To show how much memory is wasted, Arnd Bergmann ran some numbers to measure the page cache overhead for a typical set of files (Linux 5.7 kernel sources) for 5 different page sizes:

Page size (KiB) 4 8 16 32 64
page cache usage (MiB) 1,023.26 1,209.54 1,628.39 2,557.31 4,550.88
factor over 4K pages 1.00x 1.18x 1.59x 2.50x 4.45x

We can see that while a factor of 1.18 is acceptable for 8 KiB pages, a 4.45 multiplier looks excessive with 64 KiB pages.

Actually, to make it possible to address large volumes on 32 bit ARM, another solution was pointed out during the review of our series. Instead of using larger pages which have an impact on the entire system, an alternative is to modify the way the filesystem addresses the memory by using 64 bits pgoff_t offsets. This has already been implemented in vendor kernels running in some NAS systems, but this has never been submitted to mainline developers.

Covid-19: Bootlin proposes online sessions for all its courses

Tux working on embedded Linux on a couchLike most of us, due to the Covid-19 epidemic, you may be forced to work from home. To take advantage from this time confined at home, we are now proposing all our training courses as online seminars. You can then benefit from the contents and quality of Bootlin training sessions, without leaving the comfort and safety of your home. During our online seminars, our instructors will alternate between presentations and practical demonstrations, executing the instructions of our practical labs.

At any time, participants will be able to ask questions.

We can propose such remote training both through public online sessions, open to individual registration, as well as dedicated online sessions, for participants from the same company.

Public online sessions

We’re trying to propose time slots that should be manageable from Europe, Middle East, Africa and at least for the East Coast of North America. All such sessions will be taught in English. As usual with all our sessions, all our training materials (lectures and lab instructions) are freely available from the pages describing our courses.

Our Embedded Linux and Linux kernel courses are delivered over 7 half days of 4 hours each, while our Yocto Project, Buildroot and Linux Graphics courses are delivered over 4 half days. For our embedded Linux and Yocto Project courses, we propose an additional date in case some extra time is needed to complete the agenda.

Here are all the available sessions. If the situation lasts longer, we will create new sessions as needed:

Type Dates Time Duration Expected trainer Cost and registration
Embedded Linux (agenda) Sep. 28, 29, 30, Oct. 1, 2, 5, 6 2020. 17:00 – 21:00 (Paris), 8:00 – 12:00 (San Francisco) 28 h Michael Opdenacker 829 EUR + VAT* (register)
Embedded Linux (agenda) Nov. 2, 3, 4, 5, 6, 9, 10, 12, 2020. 14:00 – 18:00 (Paris), 8:00 – 12:00 (New York) 28 h Michael Opdenacker 829 EUR + VAT* (register)
Linux kernel (agenda) Nov. 16, 17, 18, 19, 23, 24, 25, 26 14:00 – 18:00 (Paris time) 28 h Alexandre Belloni 829 EUR + VAT* (register)
Yocto Project (agenda) Nov. 30, Dec. 1, 2, 3, 4, 2020 14:00 – 18:00 (Paris time) 16 h Maxime Chevallier 519 EUR + VAT* (register)
Buildroot (agenda) Dec. 7, 8, 9, 10, and 11, 2020 14:00 – 18:00 (Paris time) 16 h Thomas Petazzoni 519 EUR + VAT* (register)
Linux Graphics (agenda) Dec. 1, 2, 3, 4, 2020 14:00 – 18:00 (Paris time) 16 h Paul Kocialkowski 519 EUR + VAT* (register

* VAT: applies to businesses in France and to individuals from all countries. Businesses in the European Union won’t be charged VAT only if they provide valid company information and VAT number to Evenbrite at registration time. For businesses in other countries, we should be able to grant them a VAT refund, provided they send us a proof of company incorporation before the end of the session.

Each public session will be confirmed once there are at least 6 participants. If the minimum number of participants is not reached, Bootlin will propose new dates or a full refund (including Eventbrite fees) if no new date works for the participant.

We guarantee that the maximum number of participants will be 12.

Dedicated online sessions

If you have enough people to train, such dedicated sessions can be a worthy alternative to public ones:

  • Flexible dates and daily durations, corresponding to the availability of your teams.
  • Confidentiality: freedom to ask questions that are related to your company’s projects and plans.
  • If time left, possibility to have knowledge sharing time with the instructor, that could go beyond the scope of the training course.
  • Language: possibility to have a session in French instead of in English.

Online seminar details

Each session will be given through Jitsi Meet, a Free Software solution that we are trying to promote. As a backup solution, we will also be able to Google Hangouts Meet. Each participant should have her or his own connection and computer (with webcam and microphone) and if possible headsets, to avoid echo issues between audio input and output. This is probably the best solution to allow each participant to ask questions and write comments in the chat window. We also support people connecting from the same conference room with suitable equipment.

Each participant is asked to connect 15 minutes before the session starts, to make sure her or his setup works (instructions will be sent before the event).

How to register

For online public sessions, use the EventBrite links in the above list of sessions to register one or several individuals.

To register an entire group (for dedicated sessions), please contact training@bootlin.com and tell us the type of session you are interested in. We will then send you a registration form to collect all the details we need to send you a quote.

You can also ask all your questions by calling +33 484 258 097.

Questions and answers

Q : Should I order hardware in advance, our hardware included in the training cost?
R : No, practical labs are replaced by technical demonstrations, so you will be able to follow the course without any hardware. However, you can still order the hardware by checking the “Shopping list” pages of presentation materials for each session. This way, between each session, you will be able to replay by yourself the labs demonstrated by your trainer, ask all your questions, and get help between sessions through our dedicated Matrix channel to reach your goals.

Q: Why just demos instead of practicing with real hardware?
A: We are not ready to support a sufficient number of participants doing practical labs remotely with real hardware. This is more complicated and time consuming than in real life. Hence, what we we’re proposing is to replace practical labs with practical demonstrations shown by the instructor. The instructor will go through the normal practical labs with the standard hardware that we’re using.

Q: Would it be possible to run practical labs on the QEMU emulator?
R: Yes, it’s coming. In the embedded Linux course, we are already offering instructions to run most practical labs on QEMU between the sessions, before the practical demos performed by the trainer. We should also be able to propose such instructions for our Yocto Project and Buildroot training courses in the next months. Such work is likely to take more time for our Linux kernel course, practical labs being closer to the hardware that we use.

Q: Why proposing half days instead of full days?
A: From our experience, it’s very difficult to stay focused on a new technical topic for an entire day without having periodic moments when you are active (which happens in our public and on-site sessions, in which we interleave lectures and practical labs). Hence, we believe that daily slots of 4 hours (with a small break in the middle) is a good solution, also leaving extra time for following up your normal work.

Back from ELCE 2017: slides and videos

Bootlin participated to the Embedded Linux Conference Europe last week in Prague. With 7 engineers attending, 4 talks, one BoF and a poster at the technical showcase, we had a strong presence to this major conference of the embedded Linux ecosystem. All of us had a great time at this event, attending interesting talks and meeting numerous open-source developers.

Bootlin team at the Embedded Linux Conference Europe 2017
Bootlin team at the Embedded Linux Conference Europe 2017. Top, from left to right: Maxime Ripard, Grégory Clement, Boris Brezillon, Quentin Schulz. Bottom, from left to right: Miquèl Raynal, Thomas Petazzoni, Michael Opdenacker.

In this first blog post about ELCE, we want to share the slides and videos of the talks we have given during the conference.

SD/eMMC: New Speed Modes and Their Support in Linux – Gregory Clement

Grégory ClementSince the introduction of the original “default”(DS) and “high speed”(HS) modes, the SD card standard has evolved by introducing new speed modes, such as SDR12, SDR25, SDR50, SDR104, etc. The same happened to the eMMC standard, with the introduction of new high speed modes named DDR52, HS200, HS400, etc. The Linux kernel has obviously evolved to support these new speed modes, both in the MMC core and through the addition of new drivers.

This talk will start by introducing the SD and eMMC standards and how they work at the hardware level, with a specific focus on the new speed modes. With this hardware background in place, we will then detail how these standards are supported by Linux, see what is still missing, and what we can expect to see in the future.

Slides [PDF], Slides [LaTeX source]

An Overview of the Linux Kernel Crypto Subsystem – Boris Brezillon

Boris BrezillonThe Linux kernel has long provided cryptographic support for in-kernel users (like the network or storage stacks) and has been pushed to open these cryptographic capabilities to user-space along the way.

But what is exactly inside this subsystem, and how can it be used by kernel users? What is the official userspace interface exposing these features and what are non-upstream alternatives? When should we use a HW engine compared to a purely software based implementation? What’s inside a crypto engine driver and what precautions should be taken when developing one?

These are some of the questions we’ll answer throughout this talk, after having given a short introduction to cryptographic algorithms.

Slides [PDF], Slides [LaTeX source]

Buildroot: What’s New? – Thomas Petazzoni

Thomas PetazzoniBuildroot is a popular and easy to use embedded Linux build system. Within minutes, it is capable of generating lightweight and customized Linux systems, including the cross-compilation toolchain, kernel and bootloader images, as well as a wide variety of userspace libraries and programs.

Since our last “What’s new” talk at ELC 2014, three and half years have passed, and Buildroot has continued to evolve significantly.

After a short introduction about Buildroot, this talk will go through the numerous new features and improvements that have appeared over the last years, and show how they can be useful for developers, users and contributors.

Slides [PDF], Slides [LaTeX source]

Porting U-Boot and Linux on New ARM Boards: A Step-by-Step Guide – Quentin Schulz

May it be because of a lack of documentation or because we don’t know where to look or where to start, it is not always easy to get started with U-Boot or Linux, and know how to port them to a new ARM platform.

Based on experience porting modern versions of U-Boot and Linux on a custom Freescale/NXP i.MX6 platform, this talk will offer a step-by-step guide through the porting process. From board files to Device Trees, through Kconfig, device model, defconfigs, and tips and tricks, join this talk to discover how to get U-Boot and Linux up and running on your brand new ARM platform!

Slides [PDF], Slides [LaTeX source]

BoF: Embedded Linux Size – Michael Opdenacker

This “Birds of a Feather” session will start by a quick update on available resources and recent efforts to reduce the size of the Linux kernel and the filesystem it uses.

An ARM based system running the mainline kernel with about 3 MB of RAM will also be demonstrated.

If you are interested in the size topic, please join this BoF and share your experience, the resources you have found and your ideas for further size reduction techniques!

Slides [PDF], Slides [LaTeX source]

Mali OpenGL support on Allwinner platforms with mainline Linux

As most people know, getting GPU-based 3D acceleration to work on ARM platforms has always been difficult, due to the closed nature of the support for such GPUs. Most vendors provide closed-source binary-only OpenGL implementations in the form of binary blobs, whose quality depend on the vendor.

This situation is getting better and better through vendor-funded initiatives like for the Broadcom VC4 and VC5, or through reverse engineering projects like Nouveau on Tegra SoCs, Etnaviv on Vivante GPUs, Freedreno on Qualcomm’s. However there are still GPUs where you do not have the option to use a free software stack: PowerVR from Imagination Technologies and Mali from ARM (even though there is some progress on the reverse engineering effort).

Allwinner SoCs are using either a Mali GPU from ARM or a PowerVR from Imagination Technologies, and therefore, support for OpenGL on those platforms using a mainline Linux kernel has always been a problem. This is also further complicated by the fact that Allwinner is mostly interested in Android, which uses a different C library that avoids its use in traditional glibc-based systems (or through the use of libhybris).

However, we are happy to announce that Allwinner gave us clearance to publish the userspace binary blobs that allows to get OpenGL supported on Allwinner platforms that use a Mali GPU from ARM, using a recent mainline Linux kernel. Of course, those are closed source binary blobs and not a nice fully open-source solution, but it nonetheless allows everyone to have OpenGL support working, while taking advantage of all the benefits of a recent mainline Linux kernel. We have successfully used those binary blobs on customer projects involving the Allwinner A33 SoCs, and they should work on all Allwinner SoCs using the Mali GPU.

In order to get GPU support to work on your Allwinner platform, you will need:

  • The kernel-side driver, available on Maxime Ripard’s Github repository. This is essentially the Mali kernel-side driver from ARM, plus a number of build and bug fixes to make it work with recent mainline Linux kernels.
  • The Device Tree description of the GPU. We introduced Device Tree bindings for Mali GPUs in the mainline kernel a while ago, so that Device Trees can describe such GPUs. Such description has been added for the Allwinner A23 and A33 SoCs as part of this commit.
  • The userspace blob, which is available on Bootlin GitHub repository. It currently provides the r6p2 version of the driver, with support for both fbdev and X11 systems. Hopefully, we’ll gain access to newer versions in the future, with additional features (such as GBM support).

If you want to use it in your system, the first step is to have the GPU definition in your device tree if it’s not already there. Then, you need to compile the kernel module:

git clone https://github.com/mripard/sunxi-mali.git
cd sunxi-mali
export CROSS_COMPILE=$TOOLCHAIN_PREFIX
export KDIR=$KERNEL_BUILD_DIR
export INSTALL_MOD_PATH=$TARGET_DIR
./build.sh -r r6p2 -b
./build.sh -r r6p2 -i

It should install the mali.ko Linux kernel module into the target filesystem.

Now, you can copy the OpenGL userspace blobs that match your setup, most likely the fbdev or X11-dma-buf variant. For example, for fbdev:

git clone https://github.com/bootlin/mali-blobs.git
cd mali-blobs
cp -a r6p2/fbdev/lib/lib_fb_dev/lib* $TARGET_DIR/usr/lib

You should be all set. Of course, you will have to link your OpenGL applications or libraries against those user-space blobs. You can check that everything works using OpenGL test programs such as es2_gears for example.

Support for Device Tree overlays in U-Boot and libfdt

C.H.I.PWe have been working for almost two years now on the C.H.I.P platform from Nextthing Co.. One of the characteristics of this platform is that it provides an expansion headers, which allows to connect expansion boards also called DIPs in the CHIP community.

In a manner similar to what is done for the BeagleBone capes, it quickly became clear that we should be using Device Tree overlays to describe the hardware available on those expansion boards. Thanks to the feedback from the Beagleboard community (especially David Anders, Pantelis Antoniou and Matt Porter), we designed a very nice mechanism for run-time detection of the DIPs connected to the platform, based on an EEPROM available in each DIP and connected through the 1-wire bus. This EEPROM allows the system running on the CHIP to detect which DIPs are connected to the system at boot time. Our engineer Antoine Ténart worked on a prototype Linux driver to detect the connected DIPs and load the associated Device Tree overlay. Antoine’s work was even presented at the Embedded Linux Conference, in April 2016: one can see the slides and video of Antoine’s talk.

However, it turned out that this Linux driver had a few limitations. Because the driver relies on Device Tree overlays stored as files in the root filesystem, such overlays can only be loaded fairly late in the boot process. This wasn’t working very well with storage devices or for DRM that doesn’t allow hotplug of some components. Therefore, this solution wasn’t working well for the display-related DIPs provided for the CHIP: the VGA and HDMI DIP.

The answer to that was to apply those Device Tree overlays earlier, in the bootloader, so that Linux wouldn’t have to deal with them. Since we’re using U-Boot on the CHIP, we made a first implementation that we submitted back in April. The review process took its place, it was eventually merged and appeared in U-Boot 2016.09.

List of relevant commits in U-Boot:

However, the U-Boot community also requested that the changes should also be merged in the upstream libfdt, which is hosted as part of dtc, the device tree compiler.

Following this suggestion, Bootlin engineer Maxime Ripard has been working on merging those changes in the upstream libfdt. He sent a number of iterations, which received very good feedback from dtc maintainer David Gibson. And it finally came to a conclusion early October, when David merged the seventh iteration of those patches in the dtc repository. It should therefore hopefully be part of the next dtc/libfdt release.

List of relevant commits in the Device Tree compiler:

Since the libfdt is used by a number of other projects (like Barebox, or even Linux itself), all of them will gain the ability to apply device tree overlays when they will upgrade their version. People from the BeagleBone and the Raspberry Pi communities have already expressed interest in using this work, so hopefully, this will turn into something that will be available on all the major ARM platforms.

Linux 4.8 released, Bootlin contributions

Adelie PenguinLinux 4.8 has been released on Sunday by Linus Torvalds, with numerous new features and improvements that have been described in details on LWN: part 1, part 2 and part 3. KernelNewbies also has an updated page on the 4.8 release. We contributed a total of 153 patches to this release. LWN also published some statistics about this development cycle.

Our most significant contributions:

  • Boris Brezillon improved the Rockchip PWM driver to avoid glitches basing that work on his previous improvement to the PWM subsystem already merged in the kernel. He also fixed a few issues and shortcomings in the pwm regulator driver. This is finishing his work on the Rockchip based Chromebook platforms where a PWM is used for a regulator.
  • While working on the driver for the sii902x HDMI transceiver, Boris Brezillon did a cleanup of many DRM drivers. Those drivers were open coding the encoder selection. This is now done in the core DRM subsystem.
  • On the support of Atmel platforms
    • Alexandre Belloni cleaned up the existing board device trees, removing unused clock definitions and starting to remove warnings when compiling with the Device Tree Compiler (dtc).
  • On the support of Allwinner platforms
    • Maxime Ripard contributed a brand new infrastructure, named sunxi-ng, to manage the clocks of the Allwinner platforms, fixing shortcomings of the Device Tree representation used by the existing implementation. He moved the support of the Allwinner H3 clocks to this new infrastructure.
    • Maxime also developed a driver for the Allwinner A10 Digital Audio controller, bringing audio support to this platform.
    • Boris Brezillon improved the Allwinner NAND controller driver to support DMA assisted operations, which brings a very nice speed-up to throughput on platforms using NAND flashes as the storage, which is the case of Nextthing’s C.H.I.P.
    • Quentin Schulz added support for the Allwinner R16 EVB (Parrot) board.
  • On the support of Marvell platforms
    • Grégory Clément added multiple clock definitions for the Armada 37xx series of SoCs.
    • He also corrected a few issues with the I/O coherency on some Marvell SoCs
    • Romain Perier worked on the Marvell CESA cryptography driver, bringing significant performance improvements, especially for dmcrypt usage. This driver is used on numerous Marvell platforms: Orion, Kirkwood, Armada 370, XP, 375 and 38x.
    • Thomas Petazzoni submitted a driver for the Aardvark PCI host controller present in the Armada 3700, enabling PCI support for this platform.
    • Thomas also added a driver for the new XOR engine found in the Armada 7K and Armada 8K families

Here are in details, the different contributions we made to this release:

Bootlin contributions to Linux 4.5

Adelie PenguinLinus Torvalds just released Linux 4.5, for which the major new features have been described by LWN.net in three articles: part 1, part 2 and part 3. On a total of 12080 commits, Bootlin contributed 121 patches, almost exactly 1% of the total. Due to its large number of contribution by patch number, Bootlin engineer Boris Brezillon appears in the statistics of top-contributors for the 4.5 kernel in the LWN.net statistics article.

This time around, our important contributions were:

  • Addition of a driver for the Microcrystal rv1805 RTC, by Alexandre Belloni.
  • A huge number of patches touching all NAND controller drivers and the MTD subsystem, from Boris Brezillon. They are the first step of a more general rework of how NAND controllers and NAND chips are handled in the Linux kernel. As Boris explains in the cover letter, his series aims at clarifying the relationship between the mtd and nand_chip structures and hiding NAND framework internals to NAND. […]. This allows removal of some of the boilerplate code done in all NAND controller drivers, but most importantly, it unifies a bit the way NAND chip structures are instantiated.
  • On the support for the Marvell ARM processors:
    • In the mvneta networking driver (used on Armada 370, XP, 38x and soon on Armada 3700): addition of naive RSS support with per-CPU queues, configure XPS support, numerous fixes for potential race conditions.
    • Fix in the Marvell CESA driver
    • Misc improvements to the mv_xor driver for the Marvell XOR engines.
    • After four years of development the 32-bits Marvell EBU platform support is now pretty mature and the majority of patches for this platform now are improvements of existing drivers or bug fixes rather than new hardware support. Of course, the support for the 64-bits Marvell EBU platform has just started, and will require a significant number of patches and contributions to be fully supported upstream, which is an on-going effort.
  • On the support for the Atmel ARM processors:
    • Addition of the support for the L+G VInCo platform.
    • Improvement to the macb network driver to reset the PHY using a GPIO.
    • Fix Ethernet PHY issues on Atmel SAMA5D4
  • On the support for Allwinner ARM processors:
    • Implement audio capture in the sun4i audio driver.
    • Add the support for a special pin controller available on Allwinner A80.

The complete list of our contributions:

“Porting Linux on ARM” seminar road show in France

CaptronicIn December 2015, Bootlin engineer Alexandre Belloni gave a half-day seminar “Porting Linux on ARM” in Toulouse (France) in partnership with french organization Captronic. We published the materials used for the seminar shortly after the event.

We are happy to announce that this seminar will be given in four different cities in France over the next few months:

  • In Montpellier, on April 14th from 2 PM to 6 PM. See this page for details.
  • In Clermont-Ferrand, on April 27th from 2 PM to 6 PM. See this page for details.
  • In Brive, on April 28th from 9 AM to 1 PM. See this page for details.
  • Near Chambéry, on May 25th from 9:30 AM to 5/30 PM. See this page for details.
  • Near Bordeaux, on June 2nd from 2 PM to 6 PM. See this page for details.
  • Near Nancy, on June 16th from 2 PM to 6 PM. See this page for details.

The seminar is delivered in French, and the event is free after registration. The speaker, Alexandre Belloni, has worked on porting botloaders and the Linux kernel on a number of ARM platforms (Atmel, Freescale, Texas Instruments and more) and is the Linux kernel co-maintainer for the RTC subsystem and the support of the Atmel ARM processors.

Factory flashing with U-Boot and fastboot on Freescale i.MX6

Introduction

For one of our customers building a product based on i.MX6 with a fairly low-volume, we had to design a mechanism to perform the factory flashing of each product. The goal is to be able to take a freshly produced device from the state of a brick to a state where it has a working embedded Linux system flashed on it. This specific product is using an eMMC as its main storage, and our solution only needs a USB connection with the platform, which makes it a lot simpler than solutions based on network (TFTP, NFS, etc.).

In order to achieve this goal, we have combined the imx-usb-loader tool with the fastboot support in U-Boot and some scripting. Thanks to this combination of a tool, running a single script is sufficient to perform the factory flashing, or even restore an already flashed device back to a known state.

The overall flow of our solution, executed by a shell script, is:

  1. imx-usb-loader pushes over USB a U-Boot bootloader into the i.MX6 RAM, and runs it;
  2. This U-Boot automatically enters fastboot mode;
  3. Using the fastboot protocol and its support in U-Boot, we send and flash each part of the system: partition table, bootloader, bootloader environment and root filesystem (which contains the kernel image).
The SECO uQ7 i.MX6 platform used for our project.
The SECO uQ7 i.MX6 platform used for our project.

imx-usb-loader

imx-usb-loader is a tool written by Boundary Devices that leverages the Serial Download Procotol (SDP) available in Freescale i.MX5/i.MX6 processors. Implemented in the ROM code of the Freescale SoCs, this protocol allows to send some code over USB or UART to a Freescale processor, even on a platform that has nothing flashed (no bootloader, no operating system). It is therefore a very handy tool to recover i.MX6 platforms, or as an initial step for factory flashing: you can send a U-Boot image over USB and have it run on your platform.

This tool already existed, we only created a package for it in the Buildroot build system, since Buildroot is used for this particular project.

Fastboot

Fastboot is a protocol originally created for Android, which is used primarily to modify the flash filesystem via a USB connection from a host computer. Most Android systems run a bootloader that implements the fastboot protocol, and therefore can be reflashed from a host computer running the corresponding fastboot tool. It sounded like a good candidate for the second step of our factory flashing process, to actually flash the different parts of our system.

Setting up fastboot on the device side

The well known U-Boot bootloader has limited support for this protocol:

The fastboot documentation in U-Boot can be found in the source code, in the doc/README.android-fastboot file. A description of the available fastboot options in U-Boot can be found in this documentation as well as examples. This gives us the device side of the protocol.

In order to make fastboot work in U-Boot, we modified the board configuration file to add the following configuration options:

#define CONFIG_CMD_FASTBOOT
#define CONFIG_USB_FASTBOOT_BUF_ADDR       CONFIG_SYS_LOAD_ADDR
#define CONFIG_USB_FASTBOOT_BUF_SIZE          0x10000000
#define CONFIG_FASTBOOT_FLASH
#define CONFIG_FASTBOOT_FLASH_MMC_DEV    0

Other options have to be selected, depending on the platform to fullfil the fastboot dependencies, such as USB Gadget support, GPT partition support, partitions UUID support or the USB download gadget. They aren’t explicitly defined anywhere, but have to be enabled for the build to succeed.

You can find the patch enabling fastboot on the Seco MX6Q uQ7 here: 0002-secomx6quq7-enable-fastboot.patch.

U-Boot enters the fastboot mode on demand: it has to be explicitly started from the U-Boot command line:

U-Boot> fastboot

From now on, U-Boot waits over USB for the host computer to send fastboot commands.

Using fastboot on the host computer side

Fastboot needs a user-space program on the host computer side to talk to the board. This tool can be found in the Android SDK and is often available through packages in many Linux distributions. However, to make things easier and like we did for imx-usb-loader, we sent a patch to add the Android tools such as fastboot and adb to the Buildroot build system. As of this writing, our patch is still waiting to be applied by the Buildroot maintainers.

Thanks to this, we can use the fastboot tool to list the available fastboot devices connected:

# fastboot devices

Flashing eMMC partitions

For its flashing feature, fastboot identifies the different parts of the system by names. U-Boot maps those names to the name of GPT partitions, so your eMMC normally requires to be partitioned using a GPT partition table and not an old MBR partition table. For example, provided your eMMC has a GPT partition called rootfs, you can do:

# fastboot flash rootfs rootfs.ext4

To reflash the contents of the rootfs partition with the rootfs.ext4 image.

However, while using GPT partitioning is fine in most cases, i.MX6 has a constraint that the bootloader needs to be at a specific location on the eMMC that conflicts with the location of the GPT partition table.

To work around this problem, we patched U-Boot to allow the fastboot flash command to use an absolute offset in the eMMC instead of a partition name. Instead of displaying an error if a partition does not exists, fastboot tries to use the name as an absolute offset. This allowed us to use MBR partitions and to flash at defined offset our images, including U-Boot. For example, to flash U-Boot, we use:

# fastboot flash 0x400 u-boot.imx

The patch adding this work around in U-Boot can be found at 0001-fastboot-allow-to-flash-at-a-given-address.patch. We are working on implementing a better solution that can potentially be accepted upstream.

Automatically starting fastboot

The fastboot command must be explicitly called from the U-Boot prompt in order to enter fastboot mode. This is an issue for our use case, because the flashing process can’t be fully automated and required a human interaction. Using imx-usb-loader, we want to send a U-Boot image that automatically enters fastmode mode.

To achieve this, we modified the U-Boot configuration, to start the fastboot command at boot time:

#define CONFIG_BOOTCOMMAND "fastboot"
#define CONFIG_BOOTDELAY 0

Of course, this configuration is only used for the U-Boot sent using imx-usb-loader. The final U-Boot flashed on the device will not have the same configuration. To distinguish the two images, we named the U-Boot image dedicated to fastboot uboot_DO_NOT_TOUCH.

Putting it all together

We wrote a shell script to automatically launch the modified U-Boot image on the board, and then flash the different images on the eMMC (U-Boot and the root filesystem). We also added an option to flash an MBR partition table as well as flashing a zeroed file to wipe the U-Boot environment. In our project, Buildroot is being used, so our tool makes some assumptions about the location of the tools and image files.

Our script can be found here: flash.sh. To flash the entire system:

# ./flash.sh -a

To flash only certain parts, like the bootloader:

# ./flash.sh -b 

By default, our script expects the Buildroot output directory to be in buildroot/output, but this can be overridden using the BUILDROOT environment variable.

Conclusion

By assembling existing tools and mechanisms, we have been able to quickly create a factory flashing process for i.MX6 platforms that is really simple and efficient. It is worth mentioning that we have re-used the same idea for the factory flashing process of the C.H.I.P computer. On the C.H.I.P, instead of using imx-usb-loader, we have used FEL based booting: the C.H.I.P indeed uses an Allwinner ARM processor, providing a different recovery mechanism than the one available on i.MX6.