Allwinner VPU support in mainline Linux end of the year status update

The end of 2018 is approaching, which is the right time to provide an update on where we stand with the Allwinner VPU support in Linux. The promise of our Kickstarter campaign was to deliver all funded stretch goals by the end of 2018. As we’ll explain below, all the goals have been met, and only the upstreaming process is not completely finished for some features.

With the Linux 4.20 release happening just a few days ago, the core of the Cedrus driver and the media Requests API that supports it are finally available through Linus Torvalds’ tree! They provide the required interface for hardware-accelerated MPEG-2 decoding on the A13, A20, A33 and H3 Allwinner SoCs.

Cedrus driver in staging

Discussions regarding the details of the stateless video decoding interface offered by the V4L2 API are still taking place, which explains why our driver was integrated as a staging media driver. It was also decided to hide the associated interface for MPEG-2 decoding from the public kernel headers for now. This way, the interface is not considered stable and can keep evolving as discussions are still taking place. Because the kernel has a policy of never breaking backward compatibility, these interfaces will hardly evolve once they are declared stable, so it is crucial to ensure they are ready when that happens. This is especially true given the level of complexity associated with video decoding and the various pitfalls paving that road.

Status of H264 and H265 decoding support

The code for H264 and H265 decoding is already developed and functional, has already been submitted for upstream inclusion but it hasn’t been merged yet. However, with the decoding interfaces considered unstable for now, it should become easier to bring-in H.264 and H.265 decoding support to mainline. Support for these two codecs in the API and our driver is still under review and we will continue sending new iterations for them, with the suggested changes and fixes as well as adaptations for the still-evolving decoding interface. More specifically, our series need to be updated to the latest proposal for identifying reference frames, which is now based on the timestamps of the buffers.

Platform support: H5, A64 and A10 on the horizon

Regarding the platforms supported by our driver, the patches for H5 and A64 support were accepted since our last update and they will make it to Linux 4.21! We are also looking to send out support for the A10, which was missing from our previous series (but shouldn’t cause any particular trouble).

DRM driver improvements

The adaptations for the DRM driver allowing to display the raw decoded frames from the VPU on older platforms were also finalized, with the first half of the series already queued for 4.21. Subsequent changes in the series are still waiting for more reviews as they affect common parts of the framework.

Conclusion

Overall, we are slowly closing down on this project and hoping to get the remaining pieces of our work integrated in mainline Linux as soon as possible, as there seems to be very few obstacles left in the way.

Linux 4.20 released, Bootlin contributions

The Linux 4.20 kernel has been released just before Christmas, on December 23. As usual, LWN had a very nice coverage of the most significant features and improvements provided by this new release as part of the merge window articles: part 1 and part 2.

Bootlin once again contributed to this Linux 4.20 release, with a total of 216 non-merge commits, which puts us the 13th contributing company by number of commits and the 8th contributing company by number of lines changed, according to LWN statistics for the 4.20 release.

Significant Bootlin contributions

For Linux 4.20, the most significant contributions from Bootlin have been:

On the support for Marvell platforms
- Antoine Ténart reworked how the “software thread” mechanism is handled in the mvpp2 network driver, used on Marvell Armada 375 and 7K/8K.
- Maxime Chevallier enabled XPS support in the same mvpp2 driver
- Maxime Chevallier added logic to support 2.5G speed in the mvneta driver, but this logic is not enabled yet, as we don’t have the necessary COMPHY driver for Armada 38x to really allow using 2.5G speed.
- Miquèl Raynal added suspend/resume support for the Armada 37xx clock driver. This is part of a larger work to enable suspend/resume on the Armada 37xx platform.
- Miquèl Raynal improved the Marvell ICU driver to support SEI interrupts. The ICU is the Interrupt Collector Unit, found on Marvell Armada 7K/8K. It turns wired interrupts from a part of the chip called the “Communication Processor” (CP) into message interrupts so that they can be notified to the other part of the chip called the “Application Processor” (AP), which contains the CPU core and GIC.
- Thomas Petazzoni introduced some common code in the PCI subsystem to emulate a PCI root port bridge, and converted the pci-mvebu and pci-aardvark drivers to use it. pci-mvebu already had such an emulation logic, but since it became also needed for pci-aardvark, we turn it into a piece of common code so that it can be shared by both drivers (and perhaps others in the future). pci-mvebu is used on Marvell Armada XP, 370, 375, 38x, while pci-aardvark is used on Marvell Armada 37xx.
On the support for Allwinner platforms
- Paul Kocialkowski contributed the Cedrus VPU driver, with for now just MPEG2 decoding support. This work was done thanks to the successful crowd-funding campaign we did in February/March 2018.
- Paul Kocialkowski contributed a few fixes to the sun4i DRM display driver.
On the support for Microchip/Microsemi MIPS platforms
- Alexandre Belloni added support for the Microsemi Jaguar2 in the Designware SPI controller driver
- Alexandre Belloni added support for the Microsemi Ocelot in the Designware I2C controller driver
- Quentin Schulz contributed a new driver in the PHY subsystem to configure the SERDES muxing on Microsemi Ocelot platforms
- Quentin Schulz contributed a new Device Tree description for the Microsemi Ocelot PCB120 platform
On the support Microchip/Atmel ARM platforms
- Alexandre Belloni got the first step of the clock drivers rework merged. The purpose of this rework is to move from a Device Tree representation with one Device Tree node per clock to a much simpler representation with only one Device Tree node for the whole clock controller. The SAMA5D2, SAMA5D4, AT91SAM9260, AT91SAM9x5 at AT91SAM9RL clock drivers have already been converted. Other platforms and DT changes will arrive in future releases.
In the support for RaspberryPi platforms
- Boris Brezillon contributed a few minor fixes for the vc4 DRM driver.
In the RTC subsystem
- As the subsystem maintainer, Alexandre Belloni as usual did a number of cleanups and improvements in numerous drivers, especially to use more modern APIs where possible.
In the network subsystem
- Quentin Schulz contributed extensive support for the Microsemi VSC8574 and VSC8584 Ethernet PHYs.
In the MTD subsystem
- Subsystem maintainer Boris Brezillon did a lot of changes to pass a nand_chip object to numerous NAND subsystem hooks, and fixed all the drivers accordingly.
- Boris Brezillon also deprecated a significant number of NAND subsystem hooks, by introducing a nand_legacy structure with legacy hooks. This should encourage developers to move their drivers to the more modern MTD/NAND APIs.

Bootlin maintainers activity

A number of Bootlin engineers are maintainers in the Linux kernel, so their work is not visible as patch authors, but rather as developers reviewing and merging patches contributed by others. As part of this maintainer work:

Maxime Ripard reviewed and merged 47 patches from other developers, as the Allwinner platform co-maintainer
Boris Brezillon reviewed and merged 36 patches from other developers, as the MTD subsystem co-maintainer
Alexandre Belloni reviewed and merged 64 patches from other developers, as the RTC subsystem maintainer and the Microchip/Atmel platform co-maintainer
Grégory Clement reviewed and merged 25 patches from other developers, as the Marvell platform co-maintainer
Miquèl Raynal reviewed and merged 71 patches from other developers, as the NAND subsystem co-maintainer

Detailed list of Bootlin contributions

MIPI I3C support is now in upstream Linux

MIPI I3C specification published Back in August 2017, we wrote in a blog post about the first iteration of the Linux kernel subsystem we proposed to support the brand new MIPI I3C bus. Almost a year and half later, there are some really good news:

The Linux I3C support now has its own Git repository on kernel.org
The Linux I3C community has its mailing list.
Besides the first I3C controller driver we wrote for the Cadence I3C Master, Synopsys has contributed a second driver for their I3C master IP: i3c: master: Add driver for Synopsys DesignWare IP. This definitely helped show that there is interest in I3C beyond our contributions, and also helped validate that the subsystem was working fine for a different I3C controller.
The Linux I3C tree has been part of linux-next since November 6, 2018: linux-next: Tree for Nov 6.
The Linux I3C subsystem has received a Reviewed-by: Arnd Bergmann and later on a Acked-by: Greg Kroah-Hartman, which were two key approvals to move forward with the merging of I3C support.
I3C subsystem maintainer Boris Brezillon has therefore sent a pull request to get this subsystem merged in the upcoming 4.21 (or 5.0 ?) Linux kernel: [GIT PULL] i3c: Initial pull request, and this pull request has been merged by Linus Torvalds, so the I3C subsystem is now visible in Linus Git tree: drivers/i3c. The merge commit has been done on December 25, so it arrived as a very nice Christmas present!

It has been a long process, but we are proud and happy to have pioneered the support for I3C in the Linux kernel, from the design of the subsystem based on the I3C specifications all the way to its merging in the upstream Linux kernel and the creation of a small (but hopefully growing) community of developers around it.

Real-Time Summit 2018

This year, Bootlin engineer Maxime Chevallier attended the Real-Time Summit, which took place after ELCE in Edinburgh. In a similar fashion to the Linux Media Summit, this was a 1-day conference dedicated to Real-Time topics in the Linux ecosystem, more specifically the Linux Kernel.

The future of Preempt-RT

Real-Time (or should we say, deterministic) behavior in the Linux kernel has been pursued for a long time, the most famous effort being the Preempt-RT patch. As Steven Rostedt announced during his talk at ELCE 2018, the Preempt-RT patch is close to being fully merged in mainline Linux, we can expect to see this happen in 2019.

Some of the maintainers of the Preempt-RT patch were present at the Real-Time summit, including Thomas Gleixner who lead the discussion throughout the day.

This was the occasion to discuss the remaining points to be addressed for Preempt-RT to make it into mainline Linux :

Printk : As Steven Rostedt explained at ELCE 2017, printk is not very real-time friendly. The main issue was worked around, but John Ogness presented his current work of fully redesigning printk’s behaviour.
Thomas Gleixner talked about the current state of softirq handling, which is also a critical point for determinism. They work by “stealing” some irq context time, falling back to ksoftirqd when necessary. This is particularly problematic for networking drivers that heavily rely on softirq.
Peter Zijlstra exposed the different scheduler related issues that needs to be addressed, focusing on SCHED_DEADLINE.

Modeling and analyzing the kernel behavior

All the talks weren’t about the Preempt-RT match merging effort. Daniel Bristot de Oliveira presented his ongoing academic work on modeling the Linux task model. The idea here is to build a formal model that doesn’t take shortcuts or idealize the way tasks are handled in the kernel, so that this can be used as a basis for academic research on topics such as scheduling.

One of the main arguments is that there’s a gap in terms of language and methodology used between kernel developers and the academic world. Daniel explained how he managed to build a huge state-machine representing the task model, and how he uses it now to verify that tasks behave how they should by running trace events in the state machine.

This talk sparked a lot if interesting discussions, for example Peter Zijlstra suggested to compile the state machine into eBPF code and run it live in the kernel.

Julia Lawall was present in the room, and improvised a talk inspired by Daniel’s presentation. She presented DSAC, a static analysis tool dedicated to finding Sleeping in Atomic Context bugs. Julia is involved in the development and use of the coccinelle tool, and explained that it is quickly limited when trying to find that categories of bugs, where sleeping calls can be deeply nested in a call stack protected by spinlocks. Using LLVM, DSAC can analyze complex scenarios with multiple level of nesting and indirect calls to detect SAC bugs. After analyzing the v4.17 kernel sources for only a few hours, the tool was able to detect more than 1000 bugs, 220 of which were confirmed.

The overall technical level of the different talks was high, leading to passionate discussions and suggestions on every topic that was brought during the day.

Feedback from Bootlin at the Linux Plumbers Conference 2018

The Linux Plumbers Conference (LPC) was held a few weeks ago in Vancouver, BC. As always there were several tracks where contributors gave a presentation of on-going or future work, and discussed it with the audience, on specific topics such as thermal, containers, real time, device tree and many more. For the first time at LPC a 2-day networking track took place. As we work on a diversity of networking projects at Bootlin we decided to attend.

Networking track at LPC. Photo @linuxplumbers.

The hot topic of the last couple of years in conferences in the network subsystem is XDP, so the conference was not exception. We saw a handful of talks and discussions about the on-going work and support of XDP within the kernel. XDP provides a programmable network data path (using eBPF) in the Linux kernel to process bare metal packets at the lowest point in the network stack. Packets are processed directly in the drivers’ Rx queues, before any allocation happen (such as socket buffers). Facebook is one well known heavy user of this technology (every packet toward Facebook is processed by XDP) and its engineers gave feedback about how they use XDP and the issues they faced. Other projects and companies are currently evaluating and starting to use XDP as well: we also saw presentations about XDP/eBPF in Open vSwitch, DPDK or kTLS.

While XDP/eBPF was featured in most of the discussions, other interesting topics where brought up. Andrew Lunn gave a presentation about the current need to go beyond 1G copper PHYs for many Linux enabled embedded devices. This was very interesting for us as we used and worked on the technologies used within the Linux kernel to address this, such as Phylink and the SFP bus (we used those when enabling 10G interfaces in the Marvell MacchiatoBin board).

Another presentation caught our attention as the topic was related to what we do at Bootlin. Jesse Brandeburg from Intel talked about the networking hardware offloads and their APIs. He exposed a brief history of the offloads supported by NICs and then showed some issues with the current APIs, where some use cases or behaviors are not clearly defined and sometimes overlap. This is a feeling we share as we experienced it while implementing some of those hardware networking offloads. Jesse’s idea was to open a discussion to come up with better solutions within the next years, as NICs offloading continue to grow.

The Linux Plumbers Conference was very pleasant and well organized. We had the chance to attend the networking track, seeing lots of great cutting-edge topics being discussed; as well as other interesting tracks.

We’d like to thank the conference and track organizers, we had a great time! Videos, slides and papers are now available on the official website or on Youtube.

Buildroot 2018.11 released, Bootlin contributions inside

Buildroot logo The 2018.11 release of Buildroot was published a few days ago. As is well-known, Bootlin is a strong contributor to this project, and this blog post proposes a summary of the new features provided by 2018.11, and highlights the contributions made by Bootlin.

What’s new in 2018.11 ?

From a CPU architecture support point of view, by far the most important addition is support for the RISC-V 64 architecture. For now, only the 64-bit version of the architecture is supported, but the patches for the 32-bit version have been posted already, and will hopefully be merged for the next release. It is worth mentioning that we have already used the RISC-V 64 support in Buildroot to provide a pre-built toolchain for this architecture on our toolchains.bootlin.com site.
In the toolchain support area, the most important change is that glibc was upgraded to version 2.28. This caused a number of build issues with various packages, which were detected by the project autobuilders and fixed. musl was bumped to version 1.1.20, and the ARM (formerly Linaro) pre-built toolchains for ARM and AArch64 were updated.
The support for hardening flags, i.e flags passed to gcc to improve the “security” of programs, has been changed, and is now done directly as part of the compiler wrapper that Buildroot has. Indeed, when using Buildroot, arm-linux-gcc is not directly the usual gcc compiler, but a small wrapper program that Buildroot uses to make sure we always pass the appropriate flags when calling the cross-compiler. Passing those hardening flags through the wrapper allowed to solve a number of build issues. Options such as BR2_RELRO_PARTIAL, BR2_RELRO_FULL, BR2_SSP_REGULAR, BR2_SSP_STRONG and BR2_SSP_ALL should therefore work better now.
In terms of filesystem images, while Buildroot already supports the most popular filesystem types, two additional filesystems are now supported: btrfs and f2fs.
A number of new default configuration for various boards have been added: Amarula a64-relic, Bananapi m2 ultra, Embest riotboard, Hardkernel Odroid XU-4, QEMU riscv64-virt.
A number of packages have been added: bird, bluez5_utils-headers, btrfs-progs, checksec, davici, duktape, ell, haproxy, libclc, libcorrect, libopencl, libopenresolv, nss-myhostname, riscv-pk, sedutil, spandsp, tini, waffle, xapian, not counting the dozens of new Perl and Python modules that have been added.

Bootlin contributions

By far and large, the most significant contribution from Bootlin to Buildroot is the activity of Thomas Petazzoni as a co-maintainer for the project. Out of the 1366 commits made between the 2018.08 and 2018.11 release, Thomas authored 126 commits, but more importantly reviewed and merged 883 patches from other developers, or in other words 64% of the commits that have been made.

As part of the 127 commits Bootlin contributed, we:

Fixed a large number of build issues reported by the autobuilder, or by the CI testing the build of our defconfigs.
Introduced a make check-package target to more easily use Buildroot check-package tool to verify the coding style of packages.
Updated the musl C library to 1.1.20.
Updated the Solidrun MacchiatoBin defconfigs (board powered by a Marvell Armada 8K processor) to use the most recent kernel, U-Boot and ATF versions.
Started adding a virtual package for opencl.
Fixed a number of missing dependencies on host-pkgconf (i.e pkg-config) which were detected by our work on per-package directories, that we will discuss in a future blog post.

Here is the detailed list of our contributions to this release:

Bootlin at the Linux Media Summit 2018

This year’s edition of the Linux Media Summit happened a month ago, in Edinburgh, right after the Embedded Linux Conference. Since we were already at the ELCE, and that we’ve been more and more involved in the media community thanks to our work on the Allwinner CSI driver and more importantly the Cedrus driver, it was natural for us to attend.

The media summit is usually a meeting to discuss the hot topics, so the whole day was a mix and match of various status updates and discussions on the future needs and developments around the Video4Linux2 framework.

Linux Media Summit attendees

Most of the discussion was about how to improve the contributor’s experience and improve the maintenance. The DRM subsystem was used as an example, since the number of patches are in the same order of magnitude, and a number of v4l2 contributors are also contributing to DRM drivers. Part of the improvement of both the maintenance and contribution experience will also come through some CI work, so there was a lot of discussions on how to improve the already existing tools (such as v4l2-compliance) but also how to setup some automatic tooling to run those tests as early as possible.

A good part of the day was also spent on dealing with the current developments, such as the Request API we’ve used in the Cedrus driver, and how to integrate that API into popular multimedia frameworks like gstreamer or ffmpeg. It looks like our libva implementation was well received, so it will probably be made standard and hosted on linuxtv.org in the near future. Other developments discussed were fault tolerant v4l2, in order to deal with video pipelines where one or several components might not work anymore, and storing the v4l2 controls state in a persistent way.

It was overall a very productive day, and it’s always nice to meet people you interact with over mailing list and IRC on a regular basis. If you want more information, you can read the extensive report.

Bootlin adds SPI NAND support to U-Boot

Bootlin is proud to announce that it has contributed SPI NAND support to the U-Boot bootloader, which is part of the recently released U-Boot 2018.11. Thanks to this effort, one can now use SPI NAND memories from U-Boot, a feature that had been missing for a long time.

State of the art: Linux support

A few months ago, Bootlin engineer Boris Brezillon added SPI-NAND support in the Linux kernel, based on an initial contribution from Peter Pan. As Boris explained in a previous blog post, adding SPI NAND support in Linux required adding a new spi-mem layer, that allows SPI NOR and SPI NAND drivers to leverage regular SPI controller drivers, but also to allow those SPI controller drivers to expose optimized operations for flash memory access. The spi-mem layer was added to the SPI subsystem by a first series of patches, while the SPI NAND support itself was added to the MTD subsystem as part of another patch series.

Moving to U-Boot

Since accessing flash memories from the bootloader is often necessary, Bootlin engineer Miquèl Raynal took the challenge of adding SPI NAND support in U-Boot. Miquèl did this by porting the SPI-mem and SPI-NAND subsystems from Linux to U-Boot. The first challenge when porting the SPI-mem and SPI-NAND code from Linux to U-Boot was that the U-Boot MTD stack hadn’t been synchronized with the one of Linux for quite some time. Thus a number of changes in the Linux MTD subsystem had to be ported to U-Boot as well, which was a fairly time-consuming effort. The SPI NAND code has been imported in drivers/mtd/nand/spi, while the spi-mem layer is in drivers/spi/spi-mem.c.

Once the core code was ready, we had to find a way to let the user interact with the SPI NAND devices. Until now, U-Boot had a separate set of commands for each type of flash memory (nand for parallel NAND, erase/cp for parallel NOR, sf for SPI NOR), and it indeed seemed like adding yet another command was the way to go. Instead, we introduced a new mtd that can be used to access all flash memory devices, regardless of their specific type. We will discuss this mtd in more details in another blog post.

However, such a move to a generic mtd command forced us to do a lot more cleanup than expected, as we ended up reworking the MTD partition handling, and even making deep changes in the ubi command. This was more complicated than anticipated because of the SPI NOR support in U-Boot: it is not very well integrated with MTD subsystem, in the sense that there is a duplication of information between the SPI NOR and MTD subsystems, and when the duplicated information is no longer consistent, really bad things happen. As an example, any call to sf probe was doing a reset of the MTD device structure using memset, causing all other state information contained in this structure to be lost. Since the SPI NAND support relies on the MTD subsystem (much more than the current SPI NOR support), we had to mitigate those issues. Long term, a proper rework of the SPI NOR support in U-Boot is definitely needed.

Some of those issues are present in the 2018.11 release and were discovered by U-Boot users who started testing the new mtd command. We have contributed a patch series addressing them, which hopefully should be merged soon.

Now that those difficulties are hopefully behind us, the U-Boot SPI-NAND support looks pretty stable, and we have quite a few SPI-NAND manufacturer drivers in U-Boot mainline, with Gigadevice, Macronix, Micron and Winbond supported so far. We’re happy to have contributed this new significant feature, as it finally allows to use this popular type of flash memory in U-Boot.

Updated cross-compilation toolchains, RISC-V 64 bit toolchain added

RISC-V 64 toolchain We have just published an updated version of the cross-compilation toolchains available at toolchains.bootlin.com.

The significant changes are:

A RISC-V 64 bit toolchain is now provided, following the addition of support for this architecture to the Buildroot project.
The stable toolchains are now using gcc 7.3.0 (instead of 6.4.0), gdb 7.12.1 (instead of 7.11.1), kernel headers 4.1.52 (instead of 4.1.49), glibc 2.27 (instead of 2.26), musl 1.1.19 (instead of 1.1.18) and uclibc 1.0.30 (instead of 1.0.28). We are still using a 7.x gdb version because the 8.x versions need C++11 support, which requires a recent enough host compiler, which in turn requires using a more modern distribution. Thus, those toolchains would be unusable with older distributions as they would require a recent glibc version on the host. Currently, our stable toolchains are still built within an old Debian Squeeze system, for maximum compatibility with old distributions.
The bleeding-edge toolchains are still using gcc 8.2.0, but gdb is now 8.1.1 (instead of 8.1), kernel headers 4.14.80 (instead of 4.14.57), glibc 2.28 (instead of 2.27), musl 1.1.20 (instead of 1.1.19).

We will continue to update those toolchains with more recent versions of gcc, binutils, gdb and the different C libraries, and add support for more architectures. Do not hesitate to ask for additional features or report any issue encountered when using those toolchains in our bug tracker.

Allwinner VPU support in mainline Linux November status update

Since our previous update back in September, we continued the work to reach the goals set by our crowdfunding campaign and made a number of steps forward. First, we are happy to announce that the core of the Cedrus driver was approved by the linux-media maintainers! It followed the final version of the media request API (the required piece of media framework plumbing necessary for our driver).

Both the API and our driver were merged in time for Linux 4.20, that is currently at the release candidate stage and will be released in a few weeks. The core of the Cedrus driver that is now in Linus’ tree supports hardware-accelerated video decoding for the MPEG-2 codec. We have even already seen contributions from the community, including minor fixes and improvements!

We have also been following-up on the other features covered by our crowdfunding campaign and made good progress on bringing them forward:

The series bringing H.264 decoding to our driver was updated for a second revision on November 15, rebased atop the upcoming Linux release and including a number of fixes as well as documentation;
H.265 decoding support followed with a second version sent on November 23, based on the updated H.264 series and bringing various minor improvements over the first iteration;
The patch series for the display engine DRM driver that adds support for the tiled YUV format used by the VPU was also updated, significantly reworked and submitted again on November 23;
Finally, we submitted a patch series adding support for the A64 and H5 Allwinner SoCs in the Cedrus VPU driver on November 15.

A64 and H5 boards provided by our sponsors Olimex and Libretech, now supported in Cedrus

With these patch series well on their way, we are closer than ever to delivering the remaining goals of the crowdfunding campaign!