Feedback from Bootlin at the Linux Plumbers Conference 2018

The Linux Plumbers Conference (LPC) was held a few weeks ago in Vancouver, BC. As always there were several tracks where contributors gave a presentation of on-going or future work, and discussed it with the audience, on specific topics such as thermal, containers, real time, device tree and many more. For the first time at LPC a 2-day networking track took place. As we work on a diversity of networking projects at Bootlin we decided to attend.

Networking track at LPC. Photo @linuxplumbers.

The hot topic of the last couple of years in conferences in the network subsystem is XDP, so the conference was not exception. We saw a handful of talks and discussions about the on-going work and support of XDP within the kernel. XDP provides a programmable network data path (using eBPF) in the Linux kernel to process bare metal packets at the lowest point in the network stack. Packets are processed directly in the drivers’ Rx queues, before any allocation happen (such as socket buffers). Facebook is one well known heavy user of this technology (every packet toward Facebook is processed by XDP) and its engineers gave feedback about how they use XDP and the issues they faced. Other projects and companies are currently evaluating and starting to use XDP as well: we also saw presentations about XDP/eBPF in Open vSwitch, DPDK or kTLS.

While XDP/eBPF was featured in most of the discussions, other interesting topics where brought up. Andrew Lunn gave a presentation about the current need to go beyond 1G copper PHYs for many Linux enabled embedded devices. This was very interesting for us as we used and worked on the technologies used within the Linux kernel to address this, such as Phylink and the SFP bus (we used those when enabling 10G interfaces in the Marvell MacchiatoBin board).

Another presentation caught our attention as the topic was related to what we do at Bootlin. Jesse Brandeburg from Intel talked about the networking hardware offloads and their APIs. He exposed a brief history of the offloads supported by NICs and then showed some issues with the current APIs, where some use cases or behaviors are not clearly defined and sometimes overlap. This is a feeling we share as we experienced it while implementing some of those hardware networking offloads. Jesse’s idea was to open a discussion to come up with better solutions within the next years, as NICs offloading continue to grow.

The Linux Plumbers Conference was very pleasant and well organized. We had the chance to attend the networking track, seeing lots of great cutting-edge topics being discussed; as well as other interesting tracks.

We’d like to thank the conference and track organizers, we had a great time! Videos, slides and papers are now available on the official website or on Youtube.

Buildroot 2018.11 released, Bootlin contributions inside

Buildroot logo The 2018.11 release of Buildroot was published a few days ago. As is well-known, Bootlin is a strong contributor to this project, and this blog post proposes a summary of the new features provided by 2018.11, and highlights the contributions made by Bootlin.

What’s new in 2018.11 ?

From a CPU architecture support point of view, by far the most important addition is support for the RISC-V 64 architecture. For now, only the 64-bit version of the architecture is supported, but the patches for the 32-bit version have been posted already, and will hopefully be merged for the next release. It is worth mentioning that we have already used the RISC-V 64 support in Buildroot to provide a pre-built toolchain for this architecture on our toolchains.bootlin.com site.
In the toolchain support area, the most important change is that glibc was upgraded to version 2.28. This caused a number of build issues with various packages, which were detected by the project autobuilders and fixed. musl was bumped to version 1.1.20, and the ARM (formerly Linaro) pre-built toolchains for ARM and AArch64 were updated.
The support for hardening flags, i.e flags passed to gcc to improve the “security” of programs, has been changed, and is now done directly as part of the compiler wrapper that Buildroot has. Indeed, when using Buildroot, arm-linux-gcc is not directly the usual gcc compiler, but a small wrapper program that Buildroot uses to make sure we always pass the appropriate flags when calling the cross-compiler. Passing those hardening flags through the wrapper allowed to solve a number of build issues. Options such as BR2_RELRO_PARTIAL, BR2_RELRO_FULL, BR2_SSP_REGULAR, BR2_SSP_STRONG and BR2_SSP_ALL should therefore work better now.
In terms of filesystem images, while Buildroot already supports the most popular filesystem types, two additional filesystems are now supported: btrfs and f2fs.
A number of new default configuration for various boards have been added: Amarula a64-relic, Bananapi m2 ultra, Embest riotboard, Hardkernel Odroid XU-4, QEMU riscv64-virt.
A number of packages have been added: bird, bluez5_utils-headers, btrfs-progs, checksec, davici, duktape, ell, haproxy, libclc, libcorrect, libopencl, libopenresolv, nss-myhostname, riscv-pk, sedutil, spandsp, tini, waffle, xapian, not counting the dozens of new Perl and Python modules that have been added.

Bootlin contributions

By far and large, the most significant contribution from Bootlin to Buildroot is the activity of Thomas Petazzoni as a co-maintainer for the project. Out of the 1366 commits made between the 2018.08 and 2018.11 release, Thomas authored 126 commits, but more importantly reviewed and merged 883 patches from other developers, or in other words 64% of the commits that have been made.

As part of the 127 commits Bootlin contributed, we:

Fixed a large number of build issues reported by the autobuilder, or by the CI testing the build of our defconfigs.
Introduced a make check-package target to more easily use Buildroot check-package tool to verify the coding style of packages.
Updated the musl C library to 1.1.20.
Updated the Solidrun MacchiatoBin defconfigs (board powered by a Marvell Armada 8K processor) to use the most recent kernel, U-Boot and ATF versions.
Started adding a virtual package for opencl.
Fixed a number of missing dependencies on host-pkgconf (i.e pkg-config) which were detected by our work on per-package directories, that we will discuss in a future blog post.

Here is the detailed list of our contributions to this release:

Bootlin at the Linux Media Summit 2018

This year’s edition of the Linux Media Summit happened a month ago, in Edinburgh, right after the Embedded Linux Conference. Since we were already at the ELCE, and that we’ve been more and more involved in the media community thanks to our work on the Allwinner CSI driver and more importantly the Cedrus driver, it was natural for us to attend.

The media summit is usually a meeting to discuss the hot topics, so the whole day was a mix and match of various status updates and discussions on the future needs and developments around the Video4Linux2 framework.

Linux Media Summit attendees

Most of the discussion was about how to improve the contributor’s experience and improve the maintenance. The DRM subsystem was used as an example, since the number of patches are in the same order of magnitude, and a number of v4l2 contributors are also contributing to DRM drivers. Part of the improvement of both the maintenance and contribution experience will also come through some CI work, so there was a lot of discussions on how to improve the already existing tools (such as v4l2-compliance) but also how to setup some automatic tooling to run those tests as early as possible.

A good part of the day was also spent on dealing with the current developments, such as the Request API we’ve used in the Cedrus driver, and how to integrate that API into popular multimedia frameworks like gstreamer or ffmpeg. It looks like our libva implementation was well received, so it will probably be made standard and hosted on linuxtv.org in the near future. Other developments discussed were fault tolerant v4l2, in order to deal with video pipelines where one or several components might not work anymore, and storing the v4l2 controls state in a persistent way.

It was overall a very productive day, and it’s always nice to meet people you interact with over mailing list and IRC on a regular basis. If you want more information, you can read the extensive report.

Bootlin adds SPI NAND support to U-Boot

Bootlin is proud to announce that it has contributed SPI NAND support to the U-Boot bootloader, which is part of the recently released U-Boot 2018.11. Thanks to this effort, one can now use SPI NAND memories from U-Boot, a feature that had been missing for a long time.

State of the art: Linux support

A few months ago, Bootlin engineer Boris Brezillon added SPI-NAND support in the Linux kernel, based on an initial contribution from Peter Pan. As Boris explained in a previous blog post, adding SPI NAND support in Linux required adding a new spi-mem layer, that allows SPI NOR and SPI NAND drivers to leverage regular SPI controller drivers, but also to allow those SPI controller drivers to expose optimized operations for flash memory access. The spi-mem layer was added to the SPI subsystem by a first series of patches, while the SPI NAND support itself was added to the MTD subsystem as part of another patch series.

Moving to U-Boot

Since accessing flash memories from the bootloader is often necessary, Bootlin engineer Miquèl Raynal took the challenge of adding SPI NAND support in U-Boot. Miquèl did this by porting the SPI-mem and SPI-NAND subsystems from Linux to U-Boot. The first challenge when porting the SPI-mem and SPI-NAND code from Linux to U-Boot was that the U-Boot MTD stack hadn’t been synchronized with the one of Linux for quite some time. Thus a number of changes in the Linux MTD subsystem had to be ported to U-Boot as well, which was a fairly time-consuming effort. The SPI NAND code has been imported in drivers/mtd/nand/spi, while the spi-mem layer is in drivers/spi/spi-mem.c.

Once the core code was ready, we had to find a way to let the user interact with the SPI NAND devices. Until now, U-Boot had a separate set of commands for each type of flash memory (nand for parallel NAND, erase/cp for parallel NOR, sf for SPI NOR), and it indeed seemed like adding yet another command was the way to go. Instead, we introduced a new mtd that can be used to access all flash memory devices, regardless of their specific type. We will discuss this mtd in more details in another blog post.

However, such a move to a generic mtd command forced us to do a lot more cleanup than expected, as we ended up reworking the MTD partition handling, and even making deep changes in the ubi command. This was more complicated than anticipated because of the SPI NOR support in U-Boot: it is not very well integrated with MTD subsystem, in the sense that there is a duplication of information between the SPI NOR and MTD subsystems, and when the duplicated information is no longer consistent, really bad things happen. As an example, any call to sf probe was doing a reset of the MTD device structure using memset, causing all other state information contained in this structure to be lost. Since the SPI NAND support relies on the MTD subsystem (much more than the current SPI NOR support), we had to mitigate those issues. Long term, a proper rework of the SPI NOR support in U-Boot is definitely needed.

Some of those issues are present in the 2018.11 release and were discovered by U-Boot users who started testing the new mtd command. We have contributed a patch series addressing them, which hopefully should be merged soon.

Now that those difficulties are hopefully behind us, the U-Boot SPI-NAND support looks pretty stable, and we have quite a few SPI-NAND manufacturer drivers in U-Boot mainline, with Gigadevice, Macronix, Micron and Winbond supported so far. We’re happy to have contributed this new significant feature, as it finally allows to use this popular type of flash memory in U-Boot.

Updated cross-compilation toolchains, RISC-V 64 bit toolchain added

RISC-V 64 toolchain We have just published an updated version of the cross-compilation toolchains available at toolchains.bootlin.com.

The significant changes are:

A RISC-V 64 bit toolchain is now provided, following the addition of support for this architecture to the Buildroot project.
The stable toolchains are now using gcc 7.3.0 (instead of 6.4.0), gdb 7.12.1 (instead of 7.11.1), kernel headers 4.1.52 (instead of 4.1.49), glibc 2.27 (instead of 2.26), musl 1.1.19 (instead of 1.1.18) and uclibc 1.0.30 (instead of 1.0.28). We are still using a 7.x gdb version because the 8.x versions need C++11 support, which requires a recent enough host compiler, which in turn requires using a more modern distribution. Thus, those toolchains would be unusable with older distributions as they would require a recent glibc version on the host. Currently, our stable toolchains are still built within an old Debian Squeeze system, for maximum compatibility with old distributions.
The bleeding-edge toolchains are still using gcc 8.2.0, but gdb is now 8.1.1 (instead of 8.1), kernel headers 4.14.80 (instead of 4.14.57), glibc 2.28 (instead of 2.27), musl 1.1.20 (instead of 1.1.19).

We will continue to update those toolchains with more recent versions of gcc, binutils, gdb and the different C libraries, and add support for more architectures. Do not hesitate to ask for additional features or report any issue encountered when using those toolchains in our bug tracker.

Allwinner VPU support in mainline Linux November status update

Since our previous update back in September, we continued the work to reach the goals set by our crowdfunding campaign and made a number of steps forward. First, we are happy to announce that the core of the Cedrus driver was approved by the linux-media maintainers! It followed the final version of the media request API (the required piece of media framework plumbing necessary for our driver).

Both the API and our driver were merged in time for Linux 4.20, that is currently at the release candidate stage and will be released in a few weeks. The core of the Cedrus driver that is now in Linus’ tree supports hardware-accelerated video decoding for the MPEG-2 codec. We have even already seen contributions from the community, including minor fixes and improvements!

We have also been following-up on the other features covered by our crowdfunding campaign and made good progress on bringing them forward:

The series bringing H.264 decoding to our driver was updated for a second revision on November 15, rebased atop the upcoming Linux release and including a number of fixes as well as documentation;
H.265 decoding support followed with a second version sent on November 23, based on the updated H.264 series and bringing various minor improvements over the first iteration;
The patch series for the display engine DRM driver that adds support for the tiled YUV format used by the VPU was also updated, significantly reworked and submitted again on November 23;
Finally, we submitted a patch series adding support for the A64 and H5 Allwinner SoCs in the Cedrus VPU driver on November 15.

A64 and H5 boards provided by our sponsors Olimex and Libretech, now supported in Cedrus

With these patch series well on their way, we are closer than ever to delivering the remaining goals of the crowdfunding campaign!

New embedded Linux engineer job opening in 2019 in Lyon, France

Penguin works Bootlin is going to move to a new and bigger office in Lyon, France, by the end of 2018. Our team in Lyon will therefore be able to welcome a new engineer in 2019.

Here are a few details about the job:

Job description: embedded Linux and kernel engineer
Profile: for this new position, meant to strengthen our small team in Lyon (currently two people), we are looking for someone with already valuable experience and autonomy in embedded Linux and kernel development. The positions that will follow should be open to junior engineers.
Lyon is a beautiful and vibrant city, the second largest urban area in France, which two rivers instead of one! Our office is within 5 minutes of a subway station, and is also easy to access from more residential areas in the south of Lyon.

If you are interested, please send a resume to jobs@bootlin.com, letting us know about your interests and ideas for the job.

Linux 4.19 released, Bootlin contributions

With the 4.19 released last week by Greg Kroah-Hartman (and not Linus), it’s time to have a look at our contributions for this release.

As always, LWN.net did an interesting coverage of this release cycle merge window, highlighting the most important changes: the first half of the 4.19 merge window and the rest of the 4.19 merge window. For 4.19 only, Bootlin contributed a total of 295 patches, which puts us at the 10th place in the ranking of most contributing companies according to KPS.

Also according to LWN statistics, Bootlin’s engineer Boris Brezillon is the 16th most active developer in terms of commits for this release with a total of 85.

The main highlights of our contributions are:

For Marvell platforms:
- Antoine Ténart made a few fixes and improvements to the inside-secure cryptographic engine driver and contributed support for SHA512, SHA384 and eip197d. This crypto engine is used on Marvell Armada 3700 and 7K/8K,
- Maxime Chevallier added support for Receive Side Scaling (RSS), in the PPv2 Ethernet controller driver found on Marvell 7K and 8K platforms,
- Gregory Clement added support for Adaptative Voltage Scaling (AVS) used by DVFS and more specifically by CPUfreq on Armada 37xx platforms,
- Miquèl Raynal added support for suspend/resume in the pinctrl driver for Marvell Armada 37xx family,
- Miquèl Raynal added support for multi-channel sensor in the thermal driver for Marvell SoCs,
- Thomas Petazzoni fixed a few bugs found in the PCI mvebu controller driver,
In the MTD subsystem:
- Boris Brezillon contributed the support for the MX35LF1GE4AB SPI NAND chip,
- Miquèl Raynal added the support for the MX35LF2GE4AB SPI NAND chip,
- Miquèl Raynal converted NAND drivers to the added nand_scan() hook,
- Miquèl Raynal and Boris Brezillon continued their cleanup of the NAND subsystem,
For Microsemi Ocelot platforms:
- Alexandre Belloni added link aggregation hardware offload support to the Ethernet switch driver that was contributed in Linux 4.18,
- Antoine Ténart contributed VLAN support also in the Ethernet switch driver that was contributed in Linux 4.18,
- Alexandre Belloni added support for the SPI controllers,
- Quentin Schulz added support for the interrupt controller on GPIO pins,
For Allwinner platforms:
- Boris Brezillon fixed an overflow in the dotclock driver of the Allwinner DRM driver,
- Paul Kocialkowski fixed an issue in the DRM driver on sun8i SoC family where planes untouched by an atomic commit are not displayed,
- Maxime Ripard added support for the C1 SRAM region on sun5i, sun7i and sun8i SoC families, which was needed to support the VPU (coming in future Linux kernel versions)
Maxime Chevallier improved PIO transfers on the SPI imx driver to support non-multiple of 8 bits_per_word and dynamic_burst transfers,

Bootlin engineers are not only contributors, but also maintainers of various subsystems in the Linux kernel, which means they are involved in the process of reviewing, discussing and merging patches contributed to those subsystems:

Maxime Ripard, as the Allwinner platform co-maintainer, merged 93 patches from other contributors
Boris Brezillon, as the MTD/NAND maintainer, merged 38 patches from other contributors
Miquèl Raynal, as the NAND co-maintainer, merged 110 patches from other contributors
Alexandre Belloni, as the RTC and Microsemi maintainer and Atmel platform co-maintainer, merged 38 patches from other contributors
Grégory Clement, as the Marvell EBU co-maintainer, merged 16 patches from other contributors

Here is the commit by commit detail of our contributions to 4.19:

Using the Cortex-M4 MCU on the i.MX6 SoloX from Linux

Introduction

The NXP i.MX6 SoloX System on Chip has two different CPU cores (i.e. Assymetric Multi Processing), a Cortex-A9 and a Cortex-M4. The Cortex-M4 MCU allows running an hard real-time OS while still having access to all the SoC peripherals.

This post is about running an application on the Cortex-M4, loading it from the Linux userspace. The i.MX 6SoloX SABRE Development Board is used for this demonstration.
It doesn’t describe in details how to build the BSP but the meta-freescale Yocto Project layer has been used. The kernel is a vendor kernel derivative, linux-fslc.

Building the cortex M4 binary

We will be running FreeRTOS on the Cortex-M4. The BSP is available from the NXP website (it may require registration).

Uncompress it:

$ tar xf FreeRTOS_BSP_1.0.1_iMX6SX.tar.gz

A baremetal toolchain is needed to compile for Cortex-M4. One is available from ARM.

Download it and uncompress it:

$ wget https://armkeil.blob.core.windows.net/developer/Files/downloads/gnu-rm/7-2018q2/gcc-arm-none-eabi-7-2018-q2-update-linux.tar.bz2
$ tar xf gcc-arm-none-eabi-7-2018-q2-update-linux.tar.bz2

The provided examples will have the Cortex-M4 output debug on one of the SoC UART. Unfortunately, the hardware setup code will forcefully try to use the 24MHz oscillator as the clock parent for the UART and because Linux is using a 80MHz clock, running the examples as-is would result in a non functioning Linux console.

The following patch solves this issue:

diff -burp a/examples/imx6sx_sdb_m4/board.c b/examples/imx6sx_sdb_m4/board.c
--- a/examples/imx6sx_sdb_m4/board.c
+++ b/examples/imx6sx_sdb_m4/board.c
@@ -69,10 +69,12 @@ void dbg_uart_init(void)
     /* Set debug uart for M4 core domain access only */
     RDC_SetPdapAccess(RDC, BOARD_DEBUG_UART_RDC_PDAP, 3 << (BOARD_DOMAIN_ID * 2), false, false);
 
+#if 0
     /* Select board debug clock derived from OSC clock(24M) */
     CCM_SetRootMux(CCM, ccmRootUartClkSel, ccmRootmuxUartClkOsc24m);
     /* Set relevant divider = 1. */
     CCM_SetRootDivider(CCM, ccmRootUartClkPodf, 0);
+#endif
     /* Enable debug uart clock */
     CCM_ControlGate(CCM, ccmCcgrGateUartClk, ccmClockNeededAll);
     CCM_ControlGate(CCM, ccmCcgrGateUartSerialClk, ccmClockNeededAll);
@@ -80,7 +82,7 @@ void dbg_uart_init(void)
     /* Configure the pin IOMUX */
     configure_uart_pins(BOARD_DEBUG_UART_BASEADDR);
 
-    DbgConsole_Init(BOARD_DEBUG_UART_BASEADDR, 24000000, 115200);
+    DbgConsole_Init(BOARD_DEBUG_UART_BASEADDR, 80000000, 115200);
 }
 
 /*FUNCTION*---------------------------------------------------------------------

After applying that, let's build some examples.

$ cd FreeRTOS_BSP_1.0.1_iMX6SX/examples/imx6sx_sdb_m4/demo_apps/hello_world/armgcc
$ ARMGCC_DIR=~/gcc-arm-none-eabi-7-2018-q2-update ./build_release.sh
...
[100%] Linking C executable release/hello_world.elf
[100%] Built target hello_world
$ cd ../../rpmsg/pingpong_freertos/armgcc/
$ ARMGCC_DIR=~/gcc-arm-none-eabi-7-2018-q2-update ./build_release.sh
...
[100%] Linking C executable release/rpmsg_pingpong_freertos_example.elf
[100%] Built target rpmsg_pingpong_freertos_example
$ cd ../../str_echo_freertos/armgcc/i
$ ARMGCC_DIR=~/gcc-arm-none-eabi-7-2018-q2-update ./build_release.sh
...
[100%] Linking C executable release/rpmsg_str_echo_freertos_example.elf
[100%] Built target rpmsg_str_echo_freertos_example

The generated binaries are in FreeRTOS_BSP_1.0.1_iMX6SX/examples/imx6sx_sdb_m4/demo_apps/:

rpmsg/pingpong_freertos/armgcc/release/rpmsg_pingpong_freertos_example.bin
rpmsg/str_echo_freertos/armgcc/release/rpmsg_str_echo_freertos_example.bin
hello_world/armgcc/release/hello_world.bin

Copy them to your root filessystem.

Loading the M4 binary

NXP provides a userspace application allowing to load the binary on the Cortex-M4 from Linux: imx-m4fwloader.

Unfortunately, this doesn't work out of the box because the code in the vendor kernel is expecting the Cortex-M4 to be started by u-boot to initialize communication channels between the Cortex-A9 and the Cortex-M4, specifically the shared memory area containing the clocks status and the Cortex-M4 clock.

The following hack ensures the initialization is done and the Cortex-M4 clock is left enabled after boot:

diff --git a/arch/arm/mach-imx/src.c b/arch/arm/mach-imx/src.c
index c53b6da411b9..9859751b5109 100644
--- a/arch/arm/mach-imx/src.c
+++ b/arch/arm/mach-imx/src.c
@@ -194,6 +194,7 @@ void __init imx_src_init(void)
 		m4_is_enabled = true;
 	else
 		m4_is_enabled = false;
+	m4_is_enabled = true;
 
 	val &= ~(1 << BP_SRC_SCR_WARM_RESET_ENABLE);
 	writel_relaxed(val, src_base + SRC_SCR);

Don't forget to use the -m4 version of the device tree (i.e imx6sx-sdb-m4.dtb), else the kernel will crash with the following dump:

Unable to handle kernel NULL pointer dereference at virtual address 00000018
pgd = 80004000
[00000018] *pgd=00000000
Internal error: Oops: 805 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.67-fslc-02224-g953c6e30c970-dirty #6
Hardware name: Freescale i.MX6 SoloX (Device Tree)
task: a80bc000 task.stack: a8120000
PC is at imx_amp_power_init+0x98/0xdc
LR is at 0xa8003b00
pc : [<810462ec>]    lr : []    psr: 80000013
sp : a8121ec0  ip : 812241cc  fp : a8121ed4
r10: 8109083c  r9 : 00000008  r8 : 00000000
r7 : 81090838  r6 : 811c9000  r5 : ffffe000  r4 : 81223d78
r3 : 00000001  r2 : 0000001c  r1 : 00000001  r0 : 00000000
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c53c7d  Table: 8000404a  DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0xa8120210)
Stack: (0xa8121ec0 to 0xa8122000)
1ec0: 81046254 ffffe000 a8121f4c a8121ed8 80101c7c 81046260 81000638 8045f190
1ee0: abfff9ad 80b35338 a8121f00 a8121ef8 801522e0 81000628 a8121f34 80d4aefc
1f00: 80d4a754 80d57578 00000007 00000007 00000000 80e59e78 80d96100 00000000
1f20: 81090818 80e59e78 811c9000 80e59e78 810c3c70 811c9000 81090838 811c9000
1f40: a8121f94 a8121f50 81000ea0 80101c34 00000007 00000007 00000000 8100061c
1f60: 8100061c 00000130 dc911044 00000000 80ab005c 00000000 00000000 00000000
1f80: 00000000 00000000 a8121fac a8121f98 80ab0074 81000d40 00000000 80ab005c
1fa0: 00000000 a8121fb0 80108378 80ab0068 00000000 00000000 00000000 00000000
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 6ff3be9f f9fde415
[<810462ec>] (imx_amp_power_init) from [<80101c7c>] (do_one_initcall+0x54/0x17c)
[<80101c7c>] (do_one_initcall) from [<81000ea0>] (kernel_init_freeable+0x16c/0x200)
[<81000ea0>] (kernel_init_freeable) from [<80ab0074>] (kernel_init+0x18/0x120)
[<80ab0074>] (kernel_init) from [<80108378>] (ret_from_fork+0x14/0x3c)
Code: 0a000005 e79ce103 e2833001 e794e10e (e5421004) 
---[ end trace 0b1d1e3108025c69 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Running the examples

With all of that in place, we can now load the first example, a simple hello world on the Cortex-M4:

On the Cortex-A9:

# ./m4fwloader hello_world.bin 0x7f8000

Output from the Cortex-M4:

Hello World!

0x7f8000 is the address of the TCM. That is the memory from where the Cortex-M4 is running the code. The OCRAM or the regular DDR can also be used.

The first RPMSG example allows sending strings from the Cortex-A9 to the Cortex-M4 using a tty character device:

On the Cortex-A9:

# ./m4fwloader rpmsg_str_echo_freertos_example.bin 0x7f8000
virtio_rpmsg_bus virtio0: creating channel rpmsg-openamp-demo-channel addr 0x0
# insmod ./imx_rpmsg_tty.ko
imx_rpmsg_tty virtio0.rpmsg-openamp-demo-channel.-1.0: new channel: 0x400 -> 0x0!
Install rpmsg tty driver!
# echo test > /dev/ttyRPMSG
#

Output from the Cortex-M4:

RPMSG String Echo FreeRTOS RTOS API Demo...
RPMSG Init as Remote
Name service handshake is done, M4 has setup a rpmsg channel [0 ---> 1024]
Get Message From Master Side : "test" [len : 4]
Get New Line From Master Side

The other RPMSG example is a ping pong between the Cortex-A9 and the
Cortex-M4:

On the Cortex-A9:

# ./m4fwloader rpmsg_pingpong_freertos_example.bin 0x7f8000
virtio_rpmsg_bus virtio0: creating channel rpmsg-openamp-demo-channel addr 0x0
# insmod ./imx_rpmsg_pingpong.ko
imx_rpmsg_pingpong virtio0.rpmsg-openamp-demo-channel.-1.0: new channel: 0x400 -> 0x0!
# get 1 (src: 0x0)
get 3 (src: 0x0)
get 5 (src: 0x0)
get 7 (src: 0x0)
get 9 (src: 0x0)
get 11 (src: 0x0)
get 13 (src: 0x0)
get 15 (src: 0x0)
get 17 (src: 0x0)
get 19 (src: 0x0)
get 21 (src: 0x0)
get 23 (src: 0x0)
get 25 (src: 0x0)
get 27 (src: 0x0)
get 29 (src: 0x0)
get 31 (src: 0x0)
get 33 (src: 0x0)
get 35 (src: 0x0)
get 37 (src: 0x0)
get 39 (src: 0x0)

Output from the Cortex-M4:

RPMSG PingPong FreeRTOS RTOS API Demo...
RPMSG Init as Remote
Name service handshake is done, M4 has setup a rpmsg channel [0 ---> 1024]
Get Data From Master Side : 0
Get Data From Master Side : 2
Get Data From Master Side : 4
Get Data From Master Side : 6
Get Data From Master Side : 8
Get Data From Master Side : 10
Get Data From Master Side : 12
Get Data From Master Side : 14
Get Data From Master Side : 16
Get Data From Master Side : 18
Get Data From Master Side : 20
Get Data From Master Side : 22
Get Data From Master Side : 24
Get Data From Master Side : 26
Get Data From Master Side : 28
Get Data From Master Side : 30
Get Data From Master Side : 32
Get Data From Master Side : 34
Get Data From Master Side : 36
Get Data From Master Side : 38
Get Data From Master Side : 40

Conclusion

While loading and reloading a Cortex-M4 firmware from Linux doesn't work out of the box, it is possible to make that work without too many modifications.

We usually prefer working with an upstream kernel. Upstream, there is a remoteproc driver, drivers/remoteproc/imx_rproc.c which is a much cleaner and generic way of loading a firmware on the Cortex-M4.

Bootlin at the XDC 2018 Conference

This year’s edition of the X.org Developpers Conference (XDC) happened two weeks ago in A Coruña, Spain. While its name suggests that it might be focused solely on the X.org display server, this conference is actually targeted at the whole Linux graphics stack, including alternative stacks like Android’s or Wayland servers. Following our involvment in the Linux DRM subsystem, and to deepen our understanding and involvement in the graphics stack, Bootlin sent one engineer, Maxime Ripard, maintainer of the sun4i DRM driver.

There’s been a lot of interesting talks during those three days, as you can see in the conference schedule, but we especially liked a few of those:

Jens Owens, Pierre-Loup Griffais – Open Source Driver Development Funding Hooking up the Money Hose – Slides

The opening talk was made by Jens Owens, from Google, and Pierre-Loup Griffais, from Valve. They provided some interesting feedback and insights from two companies with a quite central position in the gaming industry. They also advocated for open source drivers, and the way they were actually helping the game developpers.

Haneen Mohamed, Rodrigo Siqueira: VKMS – Slides

Haneen Mohamed and Rodrigo Siqueira were on stage to talk about their work as part of the Google Summer of Code and Outreachy programs to work on a Virtual KMS driver. This driver is still in its early stage, but got merged and while basic at the moment, holds a lot of promises to use it as a KMS backend for testing the KMS API.

Jerome Glisse – getting rid of get_user_page() in favor of HMM – Slides

Jerome Glisse, from Red Hat, came to describe his current work on the memory management subsystem of the Linux kernel. He’s working on dealing with the constraints that the systems using GPUs to offload computations have when it comes to allocating memory in the most efficient way.

Overall, this was a great description about the get_user_page interface pitfalls when used in that context, and from the work that he has been doing for the past years to overcome them.

Lyude Paul, Alyssa Rosenzweig – Introducing Panfrost – Slides

In that talk, Lyude Paul and Alyssa Rosenzweig were showing the work they did on Panfrost, the reverse-engineering effort around the Mali-T GPUs from ARM (which was then expanded to the Mali-G GPUs). They discussed the result of their findings, explained the architecture of the GPU and then talked about the current state of their work. The final part of the talk was a quick demo of their work on a Rockchip SoC. It provided a great overview of the current state of the driver, and there’s a lot of hope for an open-source driver for that GPU that is quite widely used on ARM.

Elie Tournier – What’s new in the virtual world? – Slides

Elie Tournier, from Collabora, gave a talk about his work and the current state of virgl, which is a virtual 3D GPU meant to be used within qemu virtual machines, while remaining independant of the host GPU. While we were aware of the existence of that driver for quite some time, this talk provided a great overview of the features that are provided by virgl, and what you can and cannot do with it.

Conclusion

After XDC in 2016 in Helsinki, this was our second time attending that conference. Just like the first time, we really enjoyed the single track format where you can meet all the attendees and have side discussions pretty easily. Once again, the talks were great, and lead us to think about interesting developments we could do on our various projects.