Back in June last year, we launched a new service that provides pre-compiled and ready-to-use cross compilation toolchains for a large number of CPU architectures and C library configurations. They are available from toolchains.bootlin.com.
We have recently updated those toolchains, with the following improvements:
Our stable family of toolchains has been updated in terms of component versions: we’re using gcc 6.3.0, binutils 2.29.1, gdb 7.11.1, kernel headers 4.1, glibc 2.26, musl 1.1.18 and uClibc 1.0.28.
Our bleeding-edge family of toolchains has also been updated in terms of component versions: we’re using gcc 7.3.0, binutils 2.30, gdb 8.0.1, kernel headers 4.9.80, glibc 2.27, musl 1.1.18 and uClibc 1.0.28.
The tarballs now have a nicer-looking version number, and the version number is also included in the name of the directory created when extracting the tarball.
Qemu testing of the PowerPC64 little-endian configuration was added
Mid-March of this year, 8 engineers from Bootlin attended the Embedded Linux Conference North-America in Portland, Oregon. We had a strong presence at this conference with 5 talks, one BoF and two E-ALE tutorial sessions.
In this first blog post about ELC 2018, we want to share the slides and videos of the talks we gave during the conference.
Buildroot: What’s new? – Thomas Petazzoni
Buildroot is a popular and easy to use embedded Linux build system. Within minutes, it is capable of generating lightweight and customized Linux systems, including the cross-compilation toolchain, kernel and bootloader images, as well as a wide variety of userspace libraries and programs.
After a short introduction about Buildroot, this talk will go through the numerous new features and improvements that have appeared in the last few years, and show how they can be useful for developers, users and contributors.
This talk is an updated version of the one given at ELCE 2017.
NAND flash chips are almost everywhere: sometimes they are hidden in eMMCs, sometimes they are just parallel NAND chips under the orders of your favorite NAND controller. Each NAND vendor follows its own rules. Each SoC vendor creates its own preferred abstraction for interacting with these chips.
Handling all of that requires some abstraction, and that is currently being enhanced in Linux! A new interface, called exec_op, is showing up. It has been designed to match the most diverse situations. It should ease the support of advanced controllers as well as the implementation of vendor-specific NAND flash features.
This talk will start with some basics about NAND memories, especially their weaknesses and how we get rid of them. It will also show how the interaction between NAND chips and controllers has been standardized over the years and how it is planned to drive NAND controllers within Linux.
Secure Boot from A to Z – Quentin Schulz & Mylène Josserand
Based on our complementary experience in building a secure system on an i.MX6 custom board, we’ll present how to build a complete chain-of-trust for a platform.
This talk will introduce each and every link of the chain-of-trust from the boot ROM to filesystem, as well as the bootloader and kernel with real life examples.
We’ll go through everything needed from the signing of binaries (U-Boot and kernel) to the secured automation of kernel booting within the bootloader, the use of dm-verity and switchroot for securing the filesystem, and more.
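To give an idea of what dm-verity brings to the filesystem link of the chain, here is a minimal sketch of block verification against a hash tree. It is only an illustration: the block size, the tree layout and the function names are assumptions made for this example, and it does not reproduce the actual dm-verity on-disk format (no salt, no superblock). The key point it shows is that only the root hash needs to be trusted, for instance because it is passed by an already verified kernel or bootloader.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_hash_tree(blocks):
    """Return the list of tree levels: leaf digests first, root level last."""
    level = [sha256(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of up to two concatenated child digests.
        level = [sha256(b"".join(level[i:i + 2]))
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(index, block, levels, trusted_root):
    """Recompute the path from one data block up to the root and compare.

    'levels' is read from untrusted storage; only 'trusted_root' is trusted.
    """
    digest = sha256(block)
    for level in levels[:-1]:
        pair_start = index - (index % 2)
        pair = list(level[pair_start:pair_start + 2])
        # Use the digest we computed ourselves, not the stored one.
        pair[index - pair_start] = digest
        digest = sha256(b"".join(pair))
        index //= 2
    return digest == trusted_root
```

With such a scheme, a modified block is detected when it is read, without having to hash the whole filesystem at boot time.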
I + I2C = I3C: What’s in this Additional ‘I’? – Boris Brezillon
The MIPI Alliance recently released version 1 of the I3C (pronounced ‘eye-three-see’) bus specification, which is supposed to be an improvement over the long-standing I2C and SPI protocols. Compared to I2C/SPI, I3C provides a higher data rate, lower power consumption and additional features such as dynamic address assignment, hot-join and in-band interrupts. For the last year or so, Bootlin has been working with Cadence Design Systems on supporting this new kind of bus in Linux.
With this talk we would like to introduce this new bus and the concepts it brings to the table. We will also detail how we plan to expose the new features provided by the I3C protocol in Linux and go through possible future improvements of the I3C framework that has already been submitted for review on the Linux kernel mailing list.
Introduction to Linux Kernel Driver Programming: i2c drivers – Michael Opdenacker
For people new to Linux kernel driver programming, writing a driver for an I2C device is a relatively easy way to start. This presentation will start by explaining the Device Model, the mechanism that the Linux kernel offers to bind drivers to devices. Even though the way to detect or describe devices can depend on the bus or CPU architecture, the infrastructure binding devices with drivers is universal and therefore applies to all types of device drivers in the Linux kernel. You will see how the driver uses one of the frameworks offered by the Linux kernel to expose device data to user space in a generic way. Once again, this type of mechanism is used everywhere in the Linux kernel.
Michael presented this topic as part of the E-ALE track. We’ll update this blog article once the recording is available to embed the video.
This “Birds of a Feather” session will start with a quick update on available resources, patches and recent work to reduce the size of user-space and of the Linux kernel (in particular the efforts from Nicolas Pitre).
An ARM based system running the mainline kernel with about 3 MB of RAM will also be demonstrated.
If you are interested in the size topic, please join this BoF and share your experience, the resources you have found and your ideas for further size reduction techniques! This BoF will build upon the one run at the latest Embedded Linux Conference in Europe.
We’ll update this blog article once the recording is available to embed the video.
Getting Started with Buildroot – Thomas Petazzoni
Need to create simple and optimized Linux systems for your embedded devices? Tired of complicated tools? You should try Buildroot!
In this tutorial, we will first introduce Buildroot, a popular embedded Linux build system, that allows you to build your own cross-compilation toolchain, Linux kernel and bootloader images, as well as root filesystem with your selection of user-space libraries and applications, all from an easy-to-use “menuconfig” interface.
Thomas presented this topic as part of the E-ALE track. We’ll update this blog article once the recording is available to embed the video.
Ethernet Switch Support in the Linux Kernel – Alexandre Belloni
Hardware Ethernet switches are appearing on more SoC families and can take care of many network functionalities such as VLAN tagging, IGMP snooping and link aggregation. Linux is able to offload network processing to those switches using the switchdev and DSA APIs.
This talk will introduce the Ethernet switches and their typical features, the Linux switchdev and DSA APIs and their differences. It will also give an overview of sample implementations and how to use the features from userspace.
Following up on the work started last week, I finished implementing initial support for displaying the NV12-based tiled format (that we shall call MB32-tiled NV12). The frame, which was dumped from the VPU, is now correctly displayed on the screen (after adapting scaling coefficients that needed specific tweaking for this use case).
The result can be seen in the following picture, where our Big Buck Bunny has the right coloring:
Scaling is also supported for the tiled format, so the frame can be shown in full screen without resorting to software scaling.
A series of patches supporting these features was sent for review on the dri-devel mailing list, where it already got some feedback from Maxime Ripard (who maintains the sun4i DRM driver impacted by these patches) as well as other members of the community! There is already enough material to craft a second version and send it again for review.
Significant time was spent figuring out the DRM, KMS, DRI and X11 graphics pipeline (as well as specific details of the inner workings of display hardware) and how to properly integrate the overlay DRM plane with all this. We are evaluating all our options here before spending time on a specific implementation. Of course, we are trying to keep things as generic as possible and avoid introducing platform-specific code in userspace, but there are also challenges to overcome in this regard. On the Wayland side, things are looking much brighter as compositors such as Weston have support for managing hardware planes directly, so there should be less work required.
Finally, I started working on dmabuf support, which I am testing with GStreamer’s kmssink, an element that allows outputting directly to a hardware plane. Once this work is ready, we’ll be able to get an idea of the performance of the VPU when it is not limited by software-based untiling and compositing. Stay tuned for updates in this direction!
After the initial submission of the Sunxi-Cedrus driver last week, I spent most of this week looking into the sun4i DRM (Direct Rendering Manager) driver. The driver is in charge of handling the display pipeline on Allwinner SoCs. Tight integration of the VPU and the display pipeline is required in order to achieve decent video playback performance. That is because the output format of the VPU is a 32×32 tiled format based on NV12, a YUV420 semi-planar format, with one plane for the Y component (luminance) and one plane for the interleaved UV components (chrominance). While NV12 is a standard format for video output, the tiling is rather specific to the VPU, so the frames have to be untiled before they can be used. This operation, when done in software, is rather slow. Moreover, software-based compositing of the decoded frames is also a bottleneck that impacts the overall performance.
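To make the cost of software untiling more concrete, here is a minimal sketch of what untiling a single plane involves. It assumes a simplified layout in which each 32×32 tile is stored contiguously, tiles follow each other in row-major order, and the plane dimensions are multiples of 32. The exact layout produced by the VPU may differ, so this should be read as an illustration of the principle rather than as a description of the hardware format.

```python
TILE = 32  # tile width and height, in bytes/pixels for an 8-bit plane


def untile_plane(tiled: bytes, width: int, height: int) -> bytes:
    """Convert a tiled 8-bit plane into a linear (row-major) plane.

    Assumed layout: tiles stored one after another in row-major order,
    each tile holding 32 rows of 32 bytes.
    """
    if width % TILE or height % TILE:
        raise ValueError("this sketch only handles dimensions aligned on 32")
    if len(tiled) != width * height:
        raise ValueError("buffer size does not match the plane dimensions")

    tiles_per_row = width // TILE
    linear = bytearray(width * height)
    for y in range(height):
        for tx in range(tiles_per_row):
            tile_index = (y // TILE) * tiles_per_row + tx
            # Offset of row (y % TILE) inside the source tile.
            src = tile_index * TILE * TILE + (y % TILE) * TILE
            dst = y * width + tx * TILE
            linear[dst:dst + TILE] = tiled[src:src + TILE]
    return bytes(linear)
```

Every byte of every frame has to be copied this way by the CPU, which is why offloading the operation to dedicated hardware is attractive.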
In order to circumvent these issues, we will be using the display engine itself to untile the VPU output frames and show the untiled frames directly in a dedicated hardware plane, that is then composed with the primary plane. This requires several features and especially support for the display engine’s frontend, that has the required components to untile and decode the frames. Partial support for the frontend was recently contributed by Maxime Ripard and is on its way to landing in the mainline Linux kernel, providing a base for my VPU-related work. Maxime’s patches allow scaling hardware planes (among other things), a feature that will be very useful for scaling videos to the screen size in hardware rather than software (which is another major bottleneck for performance).
Support for untiling the VPU frames is approaching completion (luminance is correctly decoded while chrominance is not yet correctly handled).
Once the frames are properly shown on screen, it’ll be time to make sure that dmabuf works as expected, which will allow us to send buffers from the VPU to the display engine without any copy, thus improving performance.
We should be making good progress on this topic over the upcoming week and start contributing patches to the sun4i DRM driver, so stay tuned for our next status update!
According to Linux Kernel Patch statistics, Bootlin contributed 150 patches to this release, making it the 16th contributing company by number of commits.
The main highlights of our contributions are:
In the RTC subsystem, Alexandre Belloni made a number of improvements to various drivers, mainly making them use the nvmem subsystem where appropriate, and use the recently introduced rtc_register_device() API.
In the MTD subsystem, both Boris Brezillon and Miquèl Raynal made a number of contributions, mainly fixes.
For Marvell platforms
Antoine Ténart contributed a few fixes to the inside-secure crypto accelerator driver, used on Marvell Armada 3700 and Armada 7K/8K
Antoine Ténart also contributed fixes and improvements to the mvpp2 network driver, used for the Ethernet controller on the Marvell Armada 7K/8K. His improvements include preparation work to support Receive Side Scaling (RSS).
Antoine Ténart enabled more networking ports and features in some Armada 7K/8K boards, especially SFP ports on Armada 7040 DB and Armada 8040 DB.
Boris Brezillon contributed a few fixes to the Marvell CESA crypto accelerator driver, used on the older Orion, Kirkwood, Armada 370/XP/38x processors. He migrated the driver to use the skcipher interface of the Linux kernel crypto framework.
Grégory Clement enabled NAND support on Armada 7K, and contributed a number of fixes around MMC support for some Marvell boards.
Thomas Petazzoni contributed a few minor Device Tree enhancements for Marvell platforms: fixing MPP muxing on an older Kirkwood platform, enabling more PCIe ports on Armada 8040 DB, etc.
Miquèl Raynal contributed support for more advanced statistics in the mvpp2 network driver.
Miquèl Raynal added support for the extended UART for the Marvell Armada 3720 processor, both in the UART driver and in the Device Tree.
For the RaspberryPi platform, Boris Brezillon contributed a few fixes to the vc4 display driver, and added support for the new DRM_IOCTL_VC4_GEM_MADVISE ioctl, which userspace applications can use to mark inactive buffers as purgeable, so that the kernel can reclaim them when allocations start to fail.
For Allwinner platforms
Mylène Josserand contributed a fix for the Allwinner A83 clock driver, fixing I2C bus clocks.
Quentin Schulz contributed a few fixes to the sun4i-gpadc-iio.c driver, which is used for the ADCs on several Allwinner processors.
Maxime Ripard made a number of fixes to the sun8i-codec driver, fixing clock issues, left/right channels inversion, etc.
Maxime Ripard made a number of improvements to the sun4i DRM display driver.
Maxime Ripard improved the support for the A83 processor (described the UART1 controller, the MMC1 controller, added support for display clocks) and added the Device Tree for a new A83 device.
Maxime Ripard also did a number of cleanups and misc improvements in a significant number of Device Tree files for Allwinner platforms.
Thomas Petazzoni made a few fixes to the sh_eth network driver, used on several Renesas SuperH platforms, as part of a recent project Bootlin did on SuperH 4.
Bootlin engineers are not only contributors, but also maintainers of various subsystems in the Linux kernel, which means they are involved in the process of reviewing, discussing and merging patches contributed to those subsystems:
Maxime Ripard, as the Allwinner platform co-maintainer, merged 108 patches from other contributors
Boris Brezillon, as the MTD/NAND maintainer, merged 34 patches from other contributors
Alexandre Belloni, as the RTC maintainer and Atmel platform co-maintainer, merged 50 patches from other contributors
Grégory Clement, as the Marvell EBU co-maintainer, merged 24 patches from other contributors
Here is the commit-by-commit detail of our contributions to 4.15:
Just over a week ago, I started my internship focused on adding upstream Linux kernel support for the Allwinner VPU at Bootlin’s Toulouse office. The team has been super-friendly and very helpful in getting me settled, and I’m definitely happy about moving to Toulouse for the occasion!
This first week of work was focused on studying and rebasing the work done by Florent Revest a year and a half ago. As a main development target, I went for an A33-based board, the SinA33 from Sinlinx. Florent’s patches for the sunxi-cedrus driver were rebased against the latest release candidate version of Linus’ tree, v4.16-rc4.
The driver was then adapted to use the latest version of the V4L2 request API, a crucial piece of plumbing needed to provide coherency between setting specific controls for the media stream and the input/output buffers that these controls are related to. A few bugs needed fixing along the way, in order to avoid memory corruptions (use-after-free) and to properly schedule the VPU to run when a request is submitted. With these fixes the driver was ready, so it was sent for review on the linux-media mailing list. On the userspace side, the cedrus-specific libva was also updated to use the latest version of the request API.
The next step in the pipeline is to use a common buffer for the VPU’s decoded frame and the display controller’s plane, using dmabuf. This should bring a significant performance improvement and eventually allow for hardware-based scaling when decoding videos through the standard DRM/KMS interfaces. However, this requires adding support for the specific format used by the VPU (a multiplanar NV12 format with 32×32 tiles) into the display controller code.
Over the last months, Bootlin engineers Boris Brezillon and Miquèl Raynal have been working on rewriting the NAND controller driver used on a large number of Marvell SoCs. This NAND controller driver had grown very complicated, and Miquèl’s adventure in this rework led him to contribute a new interface to the NAND framework, in order to simplify implementing NAND controller drivers for complex NAND controllers. In this blog post, Miquèl summarizes the original issue, and how it is solved by the ->exec_op() interface he has contributed.
The NAND framework is the layer between the generic MTD layer and the NAND controller drivers. Its purpose is to handle MTD requests and transform them into understandable NAND operations the controller will have to send to the NAND chip.
For general information about NANDs, the reader is invited to read the ONFI specification (Open NAND Flash Interface) which defines the most common NAND operations.
Interacting with a NAND chip
Raw NANDs (so-called “parallel NANDs”) are slave devices waiting for instructions from the controller. An operation is a sequence of instructions usually referred to as “command” (CMD), “address” (ADDR) and “data” (DATA_IN/DATA_OUT) cycles, and sometimes wait periods (WAITRDY). Some everyday operations any NAND enthusiast should know by heart are, for instance:
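As an illustration, such operations can be modeled as plain lists of instructions. The sketch below describes three of them using the opcodes defined by ONFI (FFh for reset, 90h for read ID, 00h/30h for read page). The `Instr` class and the function names are made up for this example, and the five address cycles of the page read (two for the column, three for the row) are an assumption that holds for many large-page chips but not for all of them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    """One instruction of a NAND operation (hypothetical model)."""
    kind: str                      # "CMD", "ADDR", "DATA_IN", "DATA_OUT", "WAITRDY"
    value: Optional[bytes] = None  # opcode or address cycles, when relevant
    length: int = 0                # number of data bytes, for DATA_* only


def reset_op() -> List[Instr]:
    # RESET: one command cycle, then wait for the chip to be ready.
    return [Instr("CMD", b"\xff"), Instr("WAITRDY")]


def read_id_op(length: int) -> List[Instr]:
    # READ ID: command, one address cycle, then ID bytes read from the chip.
    return [Instr("CMD", b"\x90"), Instr("ADDR", b"\x00"),
            Instr("DATA_IN", length=length)]


def read_page_op(column: int, page: int, length: int) -> List[Instr]:
    # READ PAGE: 00h, address cycles, 30h, wait, then data read from the chip.
    addr = column.to_bytes(2, "little") + page.to_bytes(3, "little")
    return [Instr("CMD", b"\x00"), Instr("ADDR", addr), Instr("CMD", b"\x30"),
            Instr("WAITRDY"), Instr("DATA_IN", length=length)]
```

Here DATA_IN follows the controller's point of view: data flows from the NAND chip into the controller.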
How it was handled in the Linux kernel
Today, a majority of NAND controller drivers implement the ->cmd_ctrl() hook. It was meant to be a very small function, designed to just send command and address cycles independently, usually embedding some very controller-specific logic. This hook was supposed to be called by a higher-level function from the NAND core, ->cmdfunc(). In addition to calling ->cmd_ctrl() to send command and address cycles, the core would also call ->read|write_byte|word|buf() hooks to actually move data between the NAND controller and the memory (the DATA parts in the diagram above).
This approach worked very well with simple NAND controllers, which are just able to send command and address cycles one at a time to the NAND chip, without any extra intelligence. However, NAND controllers have become more and more complex and can now handle higher-level operations, usually to provide higher performance. For example, a NAND controller may provide an operation that does all of the command and address cycles of a read-page operation in one go. Some controllers even support only those higher-level operations, and are not able to simply do the basic operation of sending one command cycle or one data cycle. To handle such controllers, their drivers were overriding the ->cmdfunc() hook directly, circumventing the generic NAND core implementation of ->cmdfunc(). This is a first drawback: it is no longer possible to easily add logic to the NAND core to support new NAND operations, because some drivers override the ->cmdfunc() logic. Worse, ->cmdfunc() doesn’t provide some information such as the length of the data transfer, which some controllers actually need in order to run the desired operation. NAND controller drivers started to have complicated state machines just to work around the NAND framework limitations.
Some driver-specific implementations of this hook started diverging from the original one, giving maintainers a lot of pain to maintain the whole subsystem, specifically when they needed to introduce additional vendor-specific operations support. These implementations were not only diverse but also incomplete, sometimes buggy and most importantly, developers had to guess the data that would probably be moved by the core after that, which is clearly a symptom that the framework was not fitting the user needs anymore.
The ->exec_op() era
The NAND subsystem maintainers decided to switch to a new approach, based on a new hook called ->exec_op(), implemented by NAND controller drivers and called by the generic NAND core. The logic behind that name is to provide every controller with a generic interface that can easily be extended and that exposes the overall NAND operation to be performed. This way, the driver can optimize depending on the controller capabilities without needing a complex state machine, as ->cmdfunc() implementations did.
All major NAND generic raw operations like reset, reading the NAND ID, selecting a set of timings, reading/writing data and so on found their place into small internal functions named nand_[operation]_op().
From the NAND controller driver point of view, an array of instructions is received for each operation. The controller driver then needs to parse these instructions, decide whether it can handle the overall operation, split the operation if needed, and execute what is requested.
Using the ->exec_op() interface is as simple as declaring an array of the controller capabilities, each entry of this array having a callback function that knows the overall operation and actually handles all the logic. The NAND core was enhanced with a proper parser that drivers may use to handle the callback selection logic.
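The principle of this capability matching can be sketched in a few lines. This is a deliberately simplified model written for this post, not the kernel implementation: the pattern representation and the function names are invented for the example, and instructions are reduced to their kind. Each capability is a pattern of mandatory or optional instruction kinds together with a handler. The operation is split into chunks that each match a pattern, and handlers are only called once the whole operation is known to be supported.

```python
from typing import Callable, List, Sequence, Tuple

PatternElem = Tuple[str, bool]  # (instruction kind, is it optional?)
Handler = Callable[[List[str]], None]
Capability = Tuple[Sequence[PatternElem], Handler]


def match_prefix(pattern: Sequence[PatternElem], kinds: Sequence[str]) -> int:
    """Return how many leading instructions the pattern covers (0 if none)."""
    consumed = 0
    for kind, optional in pattern:
        if consumed < len(kinds) and kinds[consumed] == kind:
            consumed += 1
        elif not optional:
            return 0
    return consumed


def plan(caps: Sequence[Capability], kinds: Sequence[str]):
    """Split an operation into (handler, chunk) steps, or raise if unsupported."""
    steps = []
    pos = 0
    while pos < len(kinds):
        for pattern, handler in caps:
            n = match_prefix(pattern, kinds[pos:])
            if n:
                steps.append((handler, list(kinds[pos:pos + n])))
                pos += n
                break
        else:
            raise NotImplementedError(f"no capability matches instruction {pos}")
    return steps


def exec_op(caps: Sequence[Capability], kinds: Sequence[str]) -> None:
    # Plan first, so that nothing is sent to the chip if a part is unsupported.
    for handler, chunk in plan(caps, kinds):
        handler(chunk)
```

For example, a controller able to issue "command, address, command, wait" in one go, but which needs a separate step for the data transfer, would declare two patterns, and a page read would be split into two handler calls.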
The ->exec_op() interface in the NAND core has been accepted and merged upstream, and will be part of Linux 4.16. The first driver converted to this new interface was obviously the NAND controller driver used on Marvell platforms, pxa3xx_nand. It has been rewritten as marvell_nand, and will also be part of Linux 4.16. Even though the new driver is longer (by lines of code) than the previous one, it supports additional features (such as raw read and write operations), allows the NAND core to pass custom commands to the NAND chip, and has a logic that is a lot less complicated.
Miquèl has also worked on converting the fsmc_nand driver to ->exec_op(), but this work hasn’t been merged yet. In the community, Stefan Agner has taken on the task to convert the vf610_nfc driver to this new approach.
Bootlin is proud to have contributed such enhancements to the Linux kernel, and hopes to see other developers contribute to this subsystem in the near future, by migrating their favorite NAND controller driver to ->exec_op()!
Back in 2012, Bootlin engineer Maxime Ripard pioneered the support for Allwinner processors in the official Linux kernel. Today, thanks to the contributions of numerous developers around the world and our involvement, there is very good support for a large number of Allwinner processors in the Linux kernel, to the point where actual Allwinner-based products are shipping with the mainline kernel.
Despite this major effort, there is one area that has remained unsupported in the mainline kernel: the video decoding and encoding engine, which accelerates in hardware the decoding and encoding of popular codecs such as MPEG2, MPEG4 or H264. Last summer, we successfully implemented a prototype, supporting MPEG2 decoding and partially MPEG4 decoding.
Today, we are launching a crowdfunding campaign to fund the remainder of the development: finishing MPEG4 decoding support, implementing H264 decoding, optimizing the rendering of video frames in cooperation with the display driver, and upstreaming the driver. We also have additional goals of supporting H265, encoding support, and additional Allwinner SoCs.
In the vendor-provided kernel, this video decoding/encoding unit is supported by a kernel driver that uses a non-standard user-space API, in conjunction with a binary-only userspace blob. Fortunately, a number of people have done an enormous reverse engineering effort, which we have leveraged for our existing prototype, and which we intend to use to continue the development of this upstream driver. Both Maxime Ripard and our intern Paul Kocialkowski will be working on this crowdfunded project.
This is our first crowdfunding campaign to fund upstream Linux kernel development, and we are interested in seeing how much interest there is in such a financing model. Help us make this a success by spreading the word!
The FOSDEM conference will take place next week-end in Brussels, Belgium. As the biggest open-source conference event in Europe, featuring a number of talks related to embedded systems and generally low-level development, Bootlin never misses this event!
Finally, Bootlin is also sponsoring the participation of Thomas Petazzoni in the Buildroot Developers Meeting, which is a 2-day event dedicated to the development of the Buildroot embedded Linux build system. With 14 attendees, this event will have the largest number of participants it has ever had. We take this opportunity to thank Google and Mind, who are sponsoring the event by providing the meeting room, lunch and social event for the attendees.
Beyond participating in the event, Maxime and Thomas also presented briefly on two topics:
Maxime Ripard brought up the topic of handling foreign DT bindings (see slides). Currently, the Device Tree bindings documentation is stored in the Linux kernel source tree, in Documentation/devicetree/bindings/. However, in theory, bindings are not operating-system specific, and indeed the same bindings are used in other projects: U-Boot, Barebox, FreeBSD, Zephyr, and probably more. Maxime raised the question of what these projects should do when they create new bindings or extend existing ones? Should they contribute a patch to Linux? Should we have a separate repository for DT bindings? A bit of discussion followed, but without getting to a real conclusion.
Thomas Petazzoni presented on the topic of avoiding duplication in Device Tree representations (see slides). Recent Marvell Armada processors have a hardware layout where a block containing multiple IPs is duplicated several times in the SoC. In the currently available Armada 8040 there are two copies of the CP110 hardware block, and the Linux kernel carries a separate description for each. While very similar, those descriptions have subtle differences that make it non-trivial to de-duplicate. However, future SoCs will not have just 2 copies of the same hardware block, but 4 copies or potentially more. In such a situation, duplicating the Device Tree description is no longer reasonable. Thomas presented a solution based on the C pre-processor, and commented on other options, such as a script to generate DTs, or improvements in the DT compiler itself. A discussion around those options followed, and while tooling improvements were considered to be the long-term solution, in the short term the solution based on the C pre-processor was deemed acceptable upstream.
For Bootlin, participating in such events is very important, as it allows us to expose to kernel developers the issues we are facing in some of our projects, and to get direct feedback from the developers on how to move forward on those topics. We definitely intend to continue participating in similar events in the future, for topics of interest to Bootlin.
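To illustrate the "script to generate DTs" option mentioned above (not the C pre-processor solution that was presented), here is a minimal sketch that instantiates one template per copy of a hardware block. Everything in it is hypothetical: the node names, the labels, the unit addresses and the properties are placeholders chosen for the example and do not describe the real CP110 block.

```python
from string import Template

# Hypothetical description of one copy of a duplicated hardware block.
# ${name} prefixes the labels so that each copy gets unique ones.
BLOCK_TEMPLATE = Template("""\
${name}: ${name}@${base} {
\tcompatible = "simple-bus";

\t${name}_eth0: ethernet@0 {
\t\tstatus = "disabled";
\t};
};
""")


def render_block(index: int, base: int) -> str:
    """Render the description of copy number 'index' mapped at 'base'."""
    return BLOCK_TEMPLATE.substitute(name=f"cp{index}", base=f"{base:x}")


def render_all(bases) -> str:
    """Render one description per base address, in order."""
    return "\n".join(render_block(i, base) for i, base in enumerate(bases))
```

The pre-processor approach follows the same idea: the block is described once, and only the parts that differ between copies, such as the label prefix and the base address, are parameters.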