Back from ELCE 2018: our selection of talks

The Embedded Linux Conference Europe edition 2018 took place a few weeks ago in Edinburgh, Scotland, and no less than 9 engineers from Bootlin attended the conference. While our previous blog post shared the videos and slides of our talks, tutorials and demos, in this blog post we would like to highlight a selection of talks that Bootlin engineers found interesting. We asked each of the 9 engineers who attended the event to pick one talk they liked, and make a small write-up about it. Of course, many other talks were interesting and what makes a talk interesting is very subjective!

Getting Your Patches in Mainline Linux: What Not To Do (and a Few Things You Could Try Instead), by Marc Zyngier

Talk selected by Maxime Ripard

Marc gave a talk on a subject that is often debated, and still confusing to newcomers: how to contribute. He started by presenting the various actors involved in a contribution: a contributor, a maintainer and a reviewer. He also took the time to explain the objectives that each of them has, something that is often overlooked by the other parties and by conference talks on this subject. He then went on to explain and document the good practices that can be used to contribute to most subsystems. Overall, this was a great overview, and we definitely recommend it to people willing to start contributing.



Real Time is Coming to Linux; What Does that Mean to You?, by Steven Rostedt

Talk selected by Michael Opdenacker

In this talk about PREEMPT_RT, the speaker, who’s a long-time contributor to this feature, approached the subject from a new angle, taking for granted that PREEMPT_RT is in mainline Linux. That’s not quite true yet, but it could happen before the next Embedded Linux Conference, in August next year. One sign that this is on the verge of being true is that its authors no longer call it a patch set, but just PREEMPT_RT. Rostedt also added that Linux can now be called a Deterministic Operating System (aka DOS!).

So, Rostedt first explains what PREEMPT_RT is about and how it addresses the challenges of users who are determined to be deterministic (that’s my pun here, not Steven’s).

In doing so, Steven recalled the “priority inversion” issue, best known for having happened on Mars on the Pathfinder robot. A high-priority, critical system process got starved by a lower-priority one because an even lower-priority process was holding the lock the high-priority process was waiting for, causing some system services to be unavailable. This caused a watchdog to kick in and reboot the system endlessly. Such an issue is addressed by “priority inheritance”, which allows a lock-holding process to inherit the priority of the highest-priority process waiting for the lock. Priority inheritance is now supported in kernel locks thanks to PREEMPT_RT.
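The scenario above can be sketched with a toy, tick-based scheduler. This is only an illustration written for this post, not code from the talk or from the kernel: the task names, priorities and durations are made up, and the model assumes a single CPU, a single lock, and tasks that hold the lock for all of their work.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int      # higher number = more urgent
    start: int         # tick at which the task becomes ready
    work: int          # ticks of CPU time still needed
    needs_lock: bool   # True if all of its work is done holding the lock
    done_at: int = -1  # tick at which the task finished (-1 = not yet)


def simulate(priority_inheritance: bool) -> dict:
    """Run a hypothetical three-task workload, return completion ticks."""
    tasks = [
        Task("low", 1, start=0, work=3, needs_lock=True),
        Task("high", 3, start=1, work=1, needs_lock=True),
        Task("medium", 2, start=1, work=5, needs_lock=False),
    ]
    owner = None  # task currently holding the lock
    tick = 0
    while any(t.done_at < 0 for t in tasks):
        ready = [t for t in tasks if t.done_at < 0 and t.start <= tick]
        # A task is blocked when it needs a lock owned by someone else.
        waiters = [t for t in ready
                   if t.needs_lock and owner is not None and owner is not t]
        runnable = [t for t in ready if t not in waiters]

        def effective(t: Task) -> int:
            # With inheritance, the owner runs at its best waiter's priority.
            if priority_inheritance and t is owner and waiters:
                return max(t.priority, max(w.priority for w in waiters))
            return t.priority

        current = max(runnable, key=effective)
        if current.needs_lock:
            owner = current
        current.work -= 1
        tick += 1
        if current.work == 0:
            current.done_at = tick
            if owner is current:
                owner = None
    return {t.name: t.done_at for t in tasks}


if __name__ == "__main__":
    print("without inheritance:", simulate(False))
    print("with inheritance:   ", simulate(True))
```

In this toy model the high-priority task finishes at tick 9 without inheritance, because the medium task keeps preempting the lock holder, and at tick 4 with inheritance.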

By the way, I learned that there are now 5 preemption models in the kernel, instead of four originally with PREEMPT_RT. There is now a “Basic RT” option in which you have all the PREEMPT_RT features except the sleeping spinlocks, which is useful for debugging such features.

So now that PREEMPT_RT is almost in mainline, what should kernel developers do? The main thing is to stop adding non-determinism to Linux. For example, Rostedt strongly advised against rw_locks and semaphores on multiple CPUs: they are horrible for cache lines and do not scale. You should use RCU mechanisms instead.

As a kernel developer, you shouldn’t use preempt_disable() either, unless you know it is done for a very short amount of time. Similarly, if you find code that uses local_irq_save(), that’s most likely a bug. Instead, people should use spin_lock_irqsave() and spin_lock_irq(), which disable interrupts only when PREEMPT_RT is not enabled.

Rostedt ended his talk by answering a question about what will remain of the PREEMPT_RT patch set. Even when the most important parts of PREEMPT_RT are in mainline, some changesets are likely to remain for some time, just to address cases that don’t have a solution yet. 99.9% of the users will be able to do without it. That’s what a mainline solution means: no patches to apply.



Uh-oh, It’s I/O Ordering! by Will Deacon

Talk selected by Miquèl Raynal

Will gave his second talk at an ELCE about I/O ordering, 6 years after the first talk on that subject. He started with an introduction to memory consistency models (in 5 minutes!) to show the audience how a very simple program, run on two CPUs, could produce very strange results due to store buffering. Because his claim was a bit hard to believe for such a simple program, he proved to us he was right by actually running it on his laptop. While this kind of tricky behavior applies to memory, the same odd situation may happen with I/O! After a theoretical explanation, he gave a few examples (mostly taken from the mainline Linux kernel) of good and bad code sections and explained why. If you are a device driver writer, this talk should be of interest! The examples are real use cases that you might encounter someday (if not already), and knowing how to work around the most generic caveats with the right memory barrier, or even with a dummy read to enforce ordering, is something you will want to master to avoid strange random bugs.
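The store-buffering effect mentioned above can be illustrated with a small enumeration. This is not Will's program, just a toy model written for this post: thread 0 stores to x and then loads y, thread 1 stores to y and then loads x, and a "store" event here means the moment the store becomes visible in memory. Allowing a store to become visible after the same thread's following load is the simplifying assumption that stands in for a store buffer.

```python
from itertools import permutations


def outcomes(allow_store_buffer: bool) -> set:
    """Return every (r0, r1) pair the two-thread toy program can observe."""
    events = [(0, "store"), (0, "load"), (1, "store"), (1, "load")]
    results = set()
    for order in permutations(events):
        if not allow_store_buffer:
            # Sequential consistency: a thread's store is visible to
            # everyone before that thread performs its load.
            if any(order.index((t, "store")) > order.index((t, "load"))
                   for t in (0, 1)):
                continue
        memory = {"x": 0, "y": 0}
        regs = {}
        for thread, kind in order:
            own, other = ("x", "y") if thread == 0 else ("y", "x")
            if kind == "store":
                memory[own] = 1               # store reaches memory
            else:
                regs[thread] = memory[other]  # load the other variable
        results.add((regs[0], regs[1]))
    return results


if __name__ == "__main__":
    print("sequentially consistent:", sorted(outcomes(False)))
    print("with store buffering:   ", sorted(outcomes(True)))
```

Under the sequentially consistent rule the outcome (0, 0) never appears, while the relaxed rule allows it, which is the kind of surprising result described in the talk.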





The Power Supply Subsystem, by Sebastian Reichel

Talk selected by Quentin Schulz

Sebastian started the talk by presenting what this subsystem is used for and its history, which he knows in great depth since he took over the maintainership of the power supply subsystem in the Linux kernel in 2014. While it’s not the subsystem with the hardest concepts to grasp, Sebastian explained that his talk aimed at providing an accessible approach to the subsystem for people who are trying to get started in the Linux kernel or in this specific subsystem. Having contributed a few patches and drivers to this subsystem in my early days as a kernel developer, I can say that I wish I had seen his talk earlier, to speed up my understanding of the power supply subsystem. Going through the slides, he presented a very simple example of a dummy driver, Device Tree nodes and how to configure what’s exposed to sysfs. Sebastian also said a few words about Open-Circuit Voltage in batteries, which is useful for getting more precise values of the battery capacity depending on its age and temperature, and about the ongoing work on supporting this in the kernel. He concluded with the future plans for the subsystem, which are mainly related to batteries, their fuel gauges and chargers.





The End of Time, 19 Years to Go, by Arnd Bergmann

Talk selected by Alexandre Belloni

Arnd gave an update on the status of the effort to get 32-bit kernels to handle the 32-bit time_t overflow, which will happen in January 2038. He started by explaining why this is necessary. It boils down to the huge number of 32-bit products that are still being introduced on the market, some of them having a very long service life. Arnd said this work has been ongoing since 2014, when John Stultz switched the internal timekeeping code to a 64-bit second counter. The device drivers then needed fixing. This was done by addressing them individually, changing:

  • time* to ktime_t
  • time* to jiffies
  • time_t to time64_t
  • timespec/timeval to timespec64
  • CLOCK_REALTIME to CLOCK_MONOTONIC
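To make the January 2038 deadline concrete, here is a small sketch of the arithmetic behind the overflow of a signed 32-bit second counter. The helper names are made up for this illustration; only the Python standard library is used.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def wrap_int32(value: int) -> int:
    """Emulate two's complement wrap-around of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    print("last representable second:", to_utc(INT32_MAX))
    print("one second later (32-bit):", to_utc(wrap_int32(INT32_MAX + 1)))
    print("one second later (64-bit):", to_utc(INT32_MAX + 1))
```

The last representable second is 2038-01-19 03:14:07 UTC; one second later, a wrapped 32-bit counter reads as 1901-12-13 20:45:52 UTC, while a 64-bit counter simply moves on to 03:14:08.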

The driver userspace interface also needed to be changed. Some IOCTLs were easy to change because they already use different numbers depending on the size of the argument they take. The other IOCTLs had to be redefined. It gets worse, Arnd said, explaining how the read, write and mmap callbacks are getting fixed.

While the VFS layer got fixed earlier this year, some filesystems are still work in progress and other ones are not fixable because they use a 32-bit time on disk. The only way is to move away from those.

Arnd then went over the biggest remaining part of the work, the system calls. The 32-bit compat syscalls mechanism is reused and a __kernel_timespec type has been introduced to handle time at the boundary. He then listed the affected system calls and their current status.

He ended by talking about userspace and the plan to handle the issue in glibc. Finally, he mentioned what distributions will have to do.





On this Rock I will Build my System – Why Open-Source Firmware Matters, by Lucas Stach

Talk selected by Grégory Clement

Lucas started by presenting what we used to have in the embedded world: a minimalist firmware which acts only as a bootloader, with no interaction with the kernel.

Then he showed why virtualization created a need to have CPU power management in a single place. This was defined by PSCI, whose purpose was to have the bare-metal and the virtualized kernel see the same interface. What should have been a simple and delimited interface then became more and more complex due to hardware constraints. Indeed, in many SoCs multiple devices or CPUs can share the same register. Besides, an interface such as the I2C bus used by a PMIC can also be shared. This led to either moving the entire register inside the firmware or having lock mechanisms between the kernel and the firmware. In conclusion, the kernel implementation became easier, but at the expense of a complex firmware.

The sad news is that most firmwares are not copyleft, which can lead to closed source binaries, making debugging very difficult for the kernel. Even if the firmware remains open source, having the hardware management split in two parts makes debugging more complex. However, there is nothing we can do about it, because there are valid reasons to have a firmware. The only thing we should be vigilant about is the openness of the firmware source.





Handling Security Flaws in an Open Source Project, by Jeremy Allison

Talk selected by Antoine Ténart

Samba is a well-known re-implementation of the SMB protocol and as such is used in several consumer devices, such as NAS devices. As open source software is used more and more in new products, correctly handling security flaws and their fixes is becoming an important topic.

Jeremy Allison, one of the core developers of Samba, gave a talk about how Samba is dealing with security issues and what questions other projects should ask themselves to handle those the right way. He talked about the process to put in place to take security seriously, how to respond to vulnerability reporters and to security issues, and how to notify downstream vendors so that products in the wild are patched before the CVE is made public.

Jeremy Allison also presented three examples of security flaws in Samba. He described how they were handled at the time, the difficulties the Samba developers encountered, and gave a postmortem.

Security is important and we found this talk to be a must-see for open source maintainers and developers, as it gave good insight into how to properly handle security vulnerabilities in a project. One of the key points was how to coordinate the security responses to avoid putting users at risk.




Improve Linux User-Space Core Libraries with Restartable Sequences, by Mathieu Desnoyers

Talk selected by Maxime Chevallier

Following-up on the good LWN coverage of the restartable sequences, Mathieu Desnoyers gave an interesting talk on the current userspace support, and some feedback regarding the shortcomings of the current implementation.

Restartable sequences make it possible to implement lockless per-CPU sections of code that are automatically aborted (or restarted) whenever migration, preemption or signal delivery occurs before the final “commit” operation is done.

This is useful to read some performance counters from userspace with a minimal overhead since there’s no lock involved to protect the critical section.

Mathieu explained that these critical sections need to be written in assembly code, but thanks to the librseq and its set of macros, users shouldn’t have to worry about this.

Mathieu then presented some of the shortcomings of rseqs, one of them being that they can’t be single-stepped in a debugger (since a signal interrupts the sequence, causing it to abort). To solve these shortcomings, Mathieu gave a quick glimpse of a possible new system call, cpu_opv(), that would allow users to execute a limited sequence of instructions with preemption and migration disabled.

Power Debugging with JTAG, by Patrick Titiano & Alexandre Bailon, BayLibre

Talk selected by Thomas Petazzoni

In this talk, BayLibre engineers Patrick Titiano and Alexandre Bailon introduced libSoCCA (SoC Continuous Analyzer), a Python library that makes it possible to watch over JTAG what a SoC is doing.

This library allows remote access to the registers of a SoC through JTAG, and uses the SoC interconnect debug port rather than the CPU debug port. Non-intrusive observation of what the SoC is doing is thus possible, even when the CPU is idle or in a low-power state.

libSoCCA uses SVD (System View Description) files, which are XML files that describe all the registers of the SoC, their bitfields and possible values. This format is not specific to libSoCCA, since it is already used by Keil, and apparently some SoC vendors provide such SVD files for their SoCs. Unfortunately, not all vendors do this, and creating such SVD files from the SoC datasheet is a very long and boring process. In addition, the speakers pointed out that the SVD file format lacks an include directive, which would be very useful to share register definitions between SoCs.
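To give an idea of what such a register description enables, here is a minimal sketch that parses a tiny SVD-like snippet and decodes a raw register value into named bitfields. It does not use libSoCCA and does not reflect its API: the device, peripheral, register and field names, as well as the addresses, are invented for this example, and only a handful of the elements of the SVD format are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed SVD-style description (names are made up).
SVD_SNIPPET = """
<device>
  <name>EXAMPLE_SOC</name>
  <peripherals>
    <peripheral>
      <name>UART1</name>
      <baseAddress>0x02020000</baseAddress>
      <registers>
        <register>
          <name>CTRL</name>
          <addressOffset>0x80</addressOffset>
          <fields>
            <field><name>EN</name><bitOffset>0</bitOffset><bitWidth>1</bitWidth></field>
            <field><name>MODE</name><bitOffset>4</bitOffset><bitWidth>2</bitWidth></field>
          </fields>
        </register>
      </registers>
    </peripheral>
  </peripherals>
</device>
"""


def load_registers(svd_text: str) -> dict:
    """Map 'PERIPHERAL.REGISTER' to (absolute address, {field: (offset, width)})."""
    root = ET.fromstring(svd_text.strip())
    registers = {}
    for periph in root.iter("peripheral"):
        base = int(periph.findtext("baseAddress"), 0)
        for reg in periph.iter("register"):
            fields = {
                f.findtext("name"): (int(f.findtext("bitOffset"), 0),
                                     int(f.findtext("bitWidth"), 0))
                for f in reg.iter("field")
            }
            key = f"{periph.findtext('name')}.{reg.findtext('name')}"
            address = base + int(reg.findtext("addressOffset"), 0)
            registers[key] = (address, fields)
    return registers


def decode(value: int, fields: dict) -> dict:
    """Split a raw register value into its named bitfields."""
    return {name: (value >> offset) & ((1 << width) - 1)
            for name, (offset, width) in fields.items()}


if __name__ == "__main__":
    address, fields = load_registers(SVD_SNIPPET)["UART1.CTRL"]
    print(hex(address), decode(0x21, fields))
```

A tool that reads registers over JTAG only has to fetch the raw value at the computed address; the description file supplies the names.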

With the information provided by the SVD files and a connection to the target over JTAG that uses OpenOCD, libSoCCA is then used to implement a number of different tools:

  • PMUGraph, which shows power management statistics of the device. Compared to solutions such as perf or powertop, this solution has the advantage of being non-intrusive.
  • memtool, which provides a way of manipulating registers without having to manually fiddle with register offsets and bitfields. It could be summarized as a remote devmem that knows your SoC registers. This kind of feature can be found in proprietary JTAG tools, and was lacking in the open-source world.
  • clocktool (development not started yet), which shows the state of the SoC clocks remotely, a bit like clk_summary in debugfs, but which works even when the SoC is idle or in a low power state, which is precisely a moment where getting clock status may be useful for debugging.

Overall, we found libSoCCA very interesting as it opens up lots of possibilities. It would be useful to have a better file format than SVD to describe SoC registers though, and it would also be nice to have an on-target variant of memtool.





New embedded Linux engineer job opening in 2019 in Lyon, France

Bootlin is going to move to a new and bigger office in Lyon, France, by the end of 2018. Our team in Lyon will therefore be able to welcome a new engineer in 2019.

Here are a few details about the job:

  • Job description: embedded Linux and kernel engineer
  • Profile: for this new position, meant to strengthen our small team in Lyon (currently two people), we are looking for someone with already valuable experience and autonomy in embedded Linux and kernel development. The positions that will follow should be open to junior engineers.
  • Lyon is a beautiful and vibrant city, the second largest urban area in France, with two rivers instead of one! Our office is within 5 minutes of a subway station, and is also easy to access from more residential areas in the south of Lyon.

If you are interested, please send a resume to jobs@bootlin.com, letting us know about your interests and ideas for the job.

Back from ELCE 2018: talks, tutorials and demos from Bootlin

The Embedded Linux Conference Europe edition 2018 took place last week in Edinburgh, Scotland, and no less than 9 engineers from Bootlin attended the conference. In this blog post, we would like to share the slides, materials and videos of the talks, tutorials and demos we gave during this conference.

Talk: Supporting Hardware Codecs in a Linux system – Maxime Ripard

This talk was given by Bootlin engineer Maxime Ripard, who has worked since spring 2018 with Paul Kocialkowski on adding support in the upstream Linux kernel for hardware-accelerated video decoding on Allwinner platforms. This project was the topic of the successful crowd-funding campaign we launched in February 2018, and for which we regularly posted updates on our blog.

[PDF] [Sources]


Talk: Networking: From the Ethernet MAC to the Link Partner – Maxime Chevallier & Antoine Ténart

This talk was given by Bootlin engineers Maxime Chevallier and Antoine Ténart, who shared their knowledge and experience working on enabling network hardware in Linux, trying to clarify how Ethernet MAC and PHYs interact, how PHYs communicate with their link partner, what are the protocols involved, etc.



[PDF] [Sources]

Talk: SPI Memory support in Linux and U-Boot – Miquèl Raynal

This talk was given by Miquèl Raynal, who has worked with Bootlin engineer Boris Brezillon on adding support for SPI NAND in U-Boot and Linux, as well as improving in general the support for SPI flash memory, see our previous blog post.



[PDF] [Sources]

Tutorial: Introduction to Linux kernel driver programming – Michael Opdenacker

This tutorial was given by Bootlin CEO and founder Michael Opdenacker, as part of the Embedded Apprentice Linux Engineer track.


[PDF] [Sources]

The video will be published later, as it was not recorded by the Linux Foundation, but by the E-ALE track organizers.

Tutorial: Getting started with Buildroot – Thomas Petazzoni

This tutorial was given by Bootlin CTO Thomas Petazzoni, as part of the Embedded Apprentice Linux Engineer track.



[PDF] [Sources]
The video will be published later, as it was not recorded by the Linux Foundation, but by the E-ALE track organizers.

Demo: Hardware Video Codec Support on Allwinner SoCs – Maxime Ripard

In this demonstration, Maxime Ripard was showing the upstream Linux kernel support for the Allwinner VPU, which provides hardware-accelerated video decoding for MPEG2, H264 and H265 within the Kodi media player on Allwinner platforms.



[PDF] [Sources]

Demo: Upstream Linux kernel support for Microsemi switches – Alexandre Belloni

Alexandre Belloni showing upstream Linux kernel support for Microsemi Ethernet switches at the Embedded Linux Conference Europe 2018

In this demonstration, Bootlin engineer Alexandre Belloni showed the upstream Linux kernel support for the VSC7513 and VSC7514 Microsemi Ethernet switches, which we presented in a previous blog post. Thanks to this support in upstream Linux, the different ports of the switch are seen as regular Linux network interfaces, and standard Linux user-space tools can be used to bridge the ports, set up VLAN filtering, and more. This makes such switches a lot easier to use than with vendor-specific SDKs.



[Poster] [Sources]

Linux 4.19 released, Bootlin contributions

Drawing from Mylène Josserand,
based on a picture from Samuel Blanc under CC-BY-SA

With the 4.19 released last week by Greg Kroah-Hartman (and not Linus), it’s time to have a look at our contributions for this release.

As always, LWN.net did an interesting coverage of this release cycle merge window, highlighting the most important changes: the first half of the 4.19 merge window and the rest of the 4.19 merge window. For 4.19 only, Bootlin contributed a total of 295 patches, which puts us at the 10th place in the ranking of most contributing companies according to KPS.

Also according to LWN statistics, Bootlin’s engineer Boris Brezillon is the 16th most active developer in terms of commits for this release with a total of 85.

The main highlights of our contributions are:

Bootlin engineers are not only contributors, but also maintainers of various subsystems in the Linux kernel, which means they are involved in the process of reviewing, discussing and merging patches contributed to those subsystems:

  • Maxime Ripard, as the Allwinner platform co-maintainer, merged 93 patches from other contributors
  • Boris Brezillon, as the MTD/NAND maintainer, merged 38 patches from other contributors
  • Miquèl Raynal, as the NAND co-maintainer, merged 110 patches from other contributors
  • Alexandre Belloni, as the RTC and Microsemi maintainer and Atmel platform co-maintainer, merged 38 patches from other contributors
  • Grégory Clement, as the Marvell EBU co-maintainer, merged 16 patches from other contributors

Here is the commit by commit detail of our contributions to 4.19:

Using the Cortex-M4 MCU on the i.MX6 SoloX from Linux

Introduction

The NXP i.MX6 SoloX System on Chip has two different CPU cores (i.e. Asymmetric Multi Processing), a Cortex-A9 and a Cortex-M4. The Cortex-M4 MCU allows running a hard real-time OS while still having access to all the SoC peripherals.

This post is about running an application on the Cortex-M4, loading it from the Linux userspace. The i.MX 6SoloX SABRE Development Board is used for this demonstration.
It doesn’t describe in detail how to build the BSP, but the meta-freescale Yocto Project layer has been used. The kernel is a vendor kernel derivative, linux-fslc.

Building the Cortex-M4 binary

We will be running FreeRTOS on the Cortex-M4. The BSP is available from the NXP website (it may require registration).

Uncompress it:

$ tar xf FreeRTOS_BSP_1.0.1_iMX6SX.tar.gz

A baremetal toolchain is needed to compile for Cortex-M4. One is available from ARM.

Download it and uncompress it:

$ wget https://armkeil.blob.core.windows.net/developer/Files/downloads/gnu-rm/7-2018q2/gcc-arm-none-eabi-7-2018-q2-update-linux.tar.bz2
$ tar xf gcc-arm-none-eabi-7-2018-q2-update-linux.tar.bz2

The provided examples make the Cortex-M4 output debug messages on one of the SoC UARTs. Unfortunately, the hardware setup code forcefully tries to use the 24MHz oscillator as the clock parent for the UART, and because Linux is using an 80MHz clock, running the examples as-is would result in a non-functioning Linux console.
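As a rough illustration of why the console breaks, the sketch below assumes (this is a simplification, not something taken from the BSP or the kernel sources) that the UART divider programmed by Linux stays unchanged when the clock parent is switched, so that the line rate scales linearly with the input clock. The function name is made up for this example.

```python
def actual_baud(configured_baud: int, assumed_clock_hz: int,
                real_clock_hz: int) -> float:
    """Baud rate really produced when the divider was computed for
    assumed_clock_hz but the UART is fed with real_clock_hz."""
    return configured_baud * real_clock_hz / assumed_clock_hz


if __name__ == "__main__":
    # Linux set up 115200 baud for an 80MHz clock; the example code then
    # switches the UART clock parent to the 24MHz oscillator.
    print(actual_baud(115200, 80_000_000, 24_000_000))
```

Under that assumption the console would end up running at 34560 baud instead of 115200, which a terminal configured for 115200 cannot decode.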

The following patch solves this issue:

diff -burp a/examples/imx6sx_sdb_m4/board.c b/examples/imx6sx_sdb_m4/board.c
--- a/examples/imx6sx_sdb_m4/board.c
+++ b/examples/imx6sx_sdb_m4/board.c
@@ -69,10 +69,12 @@ void dbg_uart_init(void)
     /* Set debug uart for M4 core domain access only */
     RDC_SetPdapAccess(RDC, BOARD_DEBUG_UART_RDC_PDAP, 3 << (BOARD_DOMAIN_ID * 2), false, false);
 
+#if 0
     /* Select board debug clock derived from OSC clock(24M) */
     CCM_SetRootMux(CCM, ccmRootUartClkSel, ccmRootmuxUartClkOsc24m);
     /* Set relevant divider = 1. */
     CCM_SetRootDivider(CCM, ccmRootUartClkPodf, 0);
+#endif
     /* Enable debug uart clock */
     CCM_ControlGate(CCM, ccmCcgrGateUartClk, ccmClockNeededAll);
     CCM_ControlGate(CCM, ccmCcgrGateUartSerialClk, ccmClockNeededAll);
@@ -80,7 +82,7 @@ void dbg_uart_init(void)
     /* Configure the pin IOMUX */
     configure_uart_pins(BOARD_DEBUG_UART_BASEADDR);
 
-    DbgConsole_Init(BOARD_DEBUG_UART_BASEADDR, 24000000, 115200);
+    DbgConsole_Init(BOARD_DEBUG_UART_BASEADDR, 80000000, 115200);
 }
 
 /*FUNCTION*---------------------------------------------------------------------

After applying that, let's build some examples.

$ cd FreeRTOS_BSP_1.0.1_iMX6SX/examples/imx6sx_sdb_m4/demo_apps/hello_world/armgcc
$ ARMGCC_DIR=~/gcc-arm-none-eabi-7-2018-q2-update ./build_release.sh
...
[100%] Linking C executable release/hello_world.elf
[100%] Built target hello_world
$ cd ../../rpmsg/pingpong_freertos/armgcc/
$ ARMGCC_DIR=~/gcc-arm-none-eabi-7-2018-q2-update ./build_release.sh
...
[100%] Linking C executable release/rpmsg_pingpong_freertos_example.elf
[100%] Built target rpmsg_pingpong_freertos_example
$ cd ../../str_echo_freertos/armgcc/
$ ARMGCC_DIR=~/gcc-arm-none-eabi-7-2018-q2-update ./build_release.sh
...
[100%] Linking C executable release/rpmsg_str_echo_freertos_example.elf
[100%] Built target rpmsg_str_echo_freertos_example

The generated binaries are in FreeRTOS_BSP_1.0.1_iMX6SX/examples/imx6sx_sdb_m4/demo_apps/:

  • rpmsg/pingpong_freertos/armgcc/release/rpmsg_pingpong_freertos_example.bin
  • rpmsg/str_echo_freertos/armgcc/release/rpmsg_str_echo_freertos_example.bin
  • hello_world/armgcc/release/hello_world.bin

Copy them to your root filesystem.

Loading the M4 binary

NXP provides a userspace application that loads the binary on the Cortex-M4 from Linux: imx-m4fwloader.

Unfortunately, this doesn't work out of the box, because the code in the vendor kernel expects the Cortex-M4 to have been started by U-Boot in order to initialize the communication channels between the Cortex-A9 and the Cortex-M4, specifically the shared memory area containing the clocks status and the Cortex-M4 clock.

The following hack ensures the initialization is done and the Cortex-M4 clock is left enabled after boot:

diff --git a/arch/arm/mach-imx/src.c b/arch/arm/mach-imx/src.c
index c53b6da411b9..9859751b5109 100644
--- a/arch/arm/mach-imx/src.c
+++ b/arch/arm/mach-imx/src.c
@@ -194,6 +194,7 @@ void __init imx_src_init(void)
 		m4_is_enabled = true;
 	else
 		m4_is_enabled = false;
+	m4_is_enabled = true;
 
 	val &= ~(1 << BP_SRC_SCR_WARM_RESET_ENABLE);
 	writel_relaxed(val, src_base + SRC_SCR);

Don't forget to use the -m4 version of the device tree (i.e. imx6sx-sdb-m4.dtb), otherwise the kernel will crash with the following dump:

Unable to handle kernel NULL pointer dereference at virtual address 00000018
pgd = 80004000
[00000018] *pgd=00000000
Internal error: Oops: 805 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.67-fslc-02224-g953c6e30c970-dirty #6
Hardware name: Freescale i.MX6 SoloX (Device Tree)
task: a80bc000 task.stack: a8120000
PC is at imx_amp_power_init+0x98/0xdc
LR is at 0xa8003b00
pc : [<810462ec>]    lr : [<a8003b00>]    psr: 80000013
sp : a8121ec0  ip : 812241cc  fp : a8121ed4
r10: 8109083c  r9 : 00000008  r8 : 00000000
r7 : 81090838  r6 : 811c9000  r5 : ffffe000  r4 : 81223d78
r3 : 00000001  r2 : 0000001c  r1 : 00000001  r0 : 00000000
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c53c7d  Table: 8000404a  DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0xa8120210)
Stack: (0xa8121ec0 to 0xa8122000)
1ec0: 81046254 ffffe000 a8121f4c a8121ed8 80101c7c 81046260 81000638 8045f190
1ee0: abfff9ad 80b35338 a8121f00 a8121ef8 801522e0 81000628 a8121f34 80d4aefc
1f00: 80d4a754 80d57578 00000007 00000007 00000000 80e59e78 80d96100 00000000
1f20: 81090818 80e59e78 811c9000 80e59e78 810c3c70 811c9000 81090838 811c9000
1f40: a8121f94 a8121f50 81000ea0 80101c34 00000007 00000007 00000000 8100061c
1f60: 8100061c 00000130 dc911044 00000000 80ab005c 00000000 00000000 00000000
1f80: 00000000 00000000 a8121fac a8121f98 80ab0074 81000d40 00000000 80ab005c
1fa0: 00000000 a8121fb0 80108378 80ab0068 00000000 00000000 00000000 00000000
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 6ff3be9f f9fde415
[<810462ec>] (imx_amp_power_init) from [<80101c7c>] (do_one_initcall+0x54/0x17c)
[<80101c7c>] (do_one_initcall) from [<81000ea0>] (kernel_init_freeable+0x16c/0x200)
[<81000ea0>] (kernel_init_freeable) from [<80ab0074>] (kernel_init+0x18/0x120)
[<80ab0074>] (kernel_init) from [<80108378>] (ret_from_fork+0x14/0x3c)
Code: 0a000005 e79ce103 e2833001 e794e10e (e5421004) 
---[ end trace 0b1d1e3108025c69 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Running the examples

With all of that in place, we can now load the first example, a simple hello world on the Cortex-M4:

On the Cortex-A9:

# ./m4fwloader hello_world.bin 0x7f8000

Output from the Cortex-M4:

Hello World!

0x7f8000 is the address of the TCM. That is the memory from where the Cortex-M4 is running the code. The OCRAM or the regular DDR can also be used.

The first RPMSG example allows sending strings from the Cortex-A9 to the Cortex-M4 using a tty character device:

On the Cortex-A9:

# ./m4fwloader rpmsg_str_echo_freertos_example.bin 0x7f8000
virtio_rpmsg_bus virtio0: creating channel rpmsg-openamp-demo-channel addr 0x0
# insmod ./imx_rpmsg_tty.ko
imx_rpmsg_tty virtio0.rpmsg-openamp-demo-channel.-1.0: new channel: 0x400 -> 0x0!
Install rpmsg tty driver!
# echo test > /dev/ttyRPMSG
#

Output from the Cortex-M4:

RPMSG String Echo FreeRTOS RTOS API Demo...
RPMSG Init as Remote
Name service handshake is done, M4 has setup a rpmsg channel [0 ---> 1024]
Get Message From Master Side : "test" [len : 4]
Get New Line From Master Side

The other RPMSG example is a ping pong between the Cortex-A9 and the Cortex-M4:

On the Cortex-A9:

# ./m4fwloader rpmsg_pingpong_freertos_example.bin 0x7f8000
virtio_rpmsg_bus virtio0: creating channel rpmsg-openamp-demo-channel addr 0x0
# insmod ./imx_rpmsg_pingpong.ko
imx_rpmsg_pingpong virtio0.rpmsg-openamp-demo-channel.-1.0: new channel: 0x400 -> 0x0!
# get 1 (src: 0x0)
get 3 (src: 0x0)
get 5 (src: 0x0)
get 7 (src: 0x0)
get 9 (src: 0x0)
get 11 (src: 0x0)
get 13 (src: 0x0)
get 15 (src: 0x0)
get 17 (src: 0x0)
get 19 (src: 0x0)
get 21 (src: 0x0)
get 23 (src: 0x0)
get 25 (src: 0x0)
get 27 (src: 0x0)
get 29 (src: 0x0)
get 31 (src: 0x0)
get 33 (src: 0x0)
get 35 (src: 0x0)
get 37 (src: 0x0)
get 39 (src: 0x0)

Output from the Cortex-M4:

RPMSG PingPong FreeRTOS RTOS API Demo...
RPMSG Init as Remote
Name service handshake is done, M4 has setup a rpmsg channel [0 ---> 1024]
Get Data From Master Side : 0
Get Data From Master Side : 2
Get Data From Master Side : 4
Get Data From Master Side : 6
Get Data From Master Side : 8
Get Data From Master Side : 10
Get Data From Master Side : 12
Get Data From Master Side : 14
Get Data From Master Side : 16
Get Data From Master Side : 18
Get Data From Master Side : 20
Get Data From Master Side : 22
Get Data From Master Side : 24
Get Data From Master Side : 26
Get Data From Master Side : 28
Get Data From Master Side : 30
Get Data From Master Side : 32
Get Data From Master Side : 34
Get Data From Master Side : 36
Get Data From Master Side : 38
Get Data From Master Side : 40

Conclusion

While loading and reloading a Cortex-M4 firmware from Linux doesn't work out of the box, it is possible to make that work without too many modifications.

We usually prefer working with an upstream kernel. Upstream, there is a remoteproc driver, drivers/remoteproc/imx_rproc.c, which is a much cleaner and more generic way of loading a firmware on the Cortex-M4.

Bootlin at the XDC 2018 Conference

This year’s edition of the X.org Developer’s Conference (XDC) happened two weeks ago in A Coruña, Spain. While its name suggests that it might be focused solely on the X.org display server, this conference is actually targeted at the whole Linux graphics stack, including alternative stacks like Android’s or Wayland servers. Following our involvement in the Linux DRM subsystem, and to deepen our understanding and involvement in the graphics stack, Bootlin sent one engineer, Maxime Ripard, maintainer of the sun4i DRM driver.

There were a lot of interesting talks during those three days, as you can see in the conference schedule, but we especially liked a few of those:

Jens Owens, Pierre-Loup Griffais – Open Source Driver Development Funding Hooking up the Money Hose – Slides

The opening talk was given by Jens Owens, from Google, and Pierre-Loup Griffais, from Valve. They provided some interesting feedback and insights from two companies with a quite central position in the gaming industry. They also advocated for open source drivers, and explained how these actually help game developers.

Haneen Mohamed, Rodrigo Siqueira: VKMS – Slides

Haneen Mohamed and Rodrigo Siqueira were on stage to talk about their work on a Virtual KMS driver, done as part of the Google Summer of Code and Outreachy programs. This driver is still in its early stages, but it has been merged and, while basic at the moment, it holds a lot of promise as a KMS backend for testing the KMS API.

Jerome Glisse – getting rid of get_user_page() in favor of HMM – Slides

Jerome Glisse, from Red Hat, came to describe his current work on the memory management subsystem of the Linux kernel. He’s working on the constraints faced by systems that use GPUs to offload computations when it comes to allocating memory in the most efficient way.

Overall, this was a great description of the pitfalls of the get_user_page interface when used in that context, and of the work that he has been doing for the past years to overcome them.

Lyude Paul, Alyssa Rosenzweig – Introducing Panfrost – Slides

In that talk, Lyude Paul and Alyssa Rosenzweig presented the work they did on Panfrost, the reverse-engineering effort around the Mali-T GPUs from ARM (which was then expanded to the Mali-G GPUs). They discussed their findings, explained the architecture of the GPU and then talked about the current state of their work. The final part of the talk was a quick demo of their work on a Rockchip SoC. It provided a great overview of the current state of the driver, and there’s a lot of hope for an open-source driver for that GPU, which is quite widely used on ARM.

Elie Tournier – What’s new in the virtual world? – Slides

Elie Tournier, from Collabora, gave a talk about his work and the current state of virgl, which is a virtual 3D GPU meant to be used within qemu virtual machines, while remaining independent of the host GPU. While we had been aware of the existence of that driver for quite some time, this talk provided a great overview of the features that are provided by virgl, and what you can and cannot do with it.

Conclusion

After XDC in 2016 in Helsinki, this was our second time attending that conference. Just like the first time, we really enjoyed the single track format where you can meet all the attendees and have side discussions pretty easily. Once again, the talks were great, and led us to think about interesting developments we could do on our various projects.

Bootlin back from Kernel Recipes!

As announced previously, we participated in the Kernel Recipes conference in September in Paris. Three people from Bootlin attended the event: Grégory Clement who gave a talk about SD/eMMC, Boris Brezillon and Mylène Josserand.
Unfortunately, we were not able to attend the Embedded Recipes conference but we hope to catch up next year!

Overview of SD/eMMC, their high speed modes and Linux support, by Grégory Clément

Here is the video of Grégory’s presentation:

You can find the slides on our website.

KernelShark 1.0; What’s new and what’s coming, by Steven Rostedt – VMware

On the first day, one of the most enjoyable talks was “KernelShark 1.0; What’s new and what’s coming”. One reason is the speaker himself, Steven Rostedt, who is very experienced in presenting. He always knows his subject very well and makes a few jokes during the talk: all of this leads to a very pleasant talk.

From my point of view, the talk itself presents two interesting subjects: the process of developing a tool’s front end (with trace-cmd being the example) and then a presentation of this GUI.

Talk chosen by Grégory

Atomic explosion: evolution and use of relaxed concurrency primitives, by Will Deacon – ARM

On the second day, Will Deacon talked about an interesting topic: “Atomic explosion: evolution and use of relaxed concurrency primitives”. As usual with Will, the technical level is high and watching the video a second time is recommended to really put the multiple pieces of information together.

Besides the explanations of the atomic operations and their meaning from the point of view of the CPUs, Will also presented his new API and explained how and when we should use it.

Talk chosen by Grégory

Coccinelle: 10 Years of Automated Evolution in the Linux Kernel, by Julia Lawall – INRIA/LIP6

Happy birthday, Coccinelle!
For 10 years, this project has been helping kernel developers track bugs and clean up the kernel. For the occasion, Julia gave a retrospective and a “what’s new” of this project.

Initially used only by Coccinelle developers, it was quickly adopted by the whole kernel community. It was interesting to hear the history, feedback and also updates on this project, which is used more and more now.

Talk chosen by Mylène

Meltdown and Spectre: seeing through the magician’s tricks, by Paolo Bonzini – Red Hat

Paolo Bonzini gave a great presentation about Meltdown and Spectre with a detailed description of the different mechanisms exploited by these two issues: branch prediction, memory mapping, paging, etc.
It was a great overview and well explained.

Talk chosen by Mylène

Final words

As usual at Kernel Recipes, Frank Tizzoni is in the room to draw sketches of attendees and speakers! Have a look at all the sketches! Some of them are really funny 🙂

It was the first time we attended Kernel Recipes, and this conference is as good as the feedback we had received from people who attended the previous editions.
The highlights are the high quality of the talks, the interaction between the speakers and the audience, and also the social events around it.

Boris and Grégory

It was the second time that I attended Kernel Recipes and I am still convinced that this conference is really nice.
The talks, the audience, the format (limited to 100 people) and all social events are great!
My only regret is that I was not able to attend Embedded Recipes to enjoy the ambiance around these two conferences a bit more.
I hope to register in time next year! 😉

Mylène

Bootlin at the ALPSS 2018 conference

The second edition of the Alpine Linux Persistent Storage Summit (ALPSS) happened two weeks ago in the Lizumerhütte Alpine lodge. Close to Innsbruck, Austria, the lodge resides in an amazingly beautiful valley and is completely cut off from the rest of the world in winter. This year’s edition was marked by the absence of data network access, which intensified the feeling of isolation and stimulated the exchanges between attendees. To strengthen the representation of MTD developers at this event, Bootlin sent two of its engineers: Boris Brezillon and Miquèl Raynal, respectively MTD and NAND maintainers in the Linux kernel.

Cow with a beautiful view over the Alps
Picture taken while climbing to the lodge. Author: Hans Holmberg, 2018 (CC-BY-SA)

NVMe, open-channel and zoned namespaces

While almost all of the ~30 attendees work on storage technologies based on NAND flash, a majority work in high-performance domains, where the issue is not power cuts but latency and throughput. Far beyond our embedded world, people are working hard on the parallelization and standardization of high-speed interfaces (SCSI, NVMe). In the end, we all have to make the software deal with the NAND-specific constraints of the underlying storage device.

Disclaimer: This is a short summary (not exhaustive) of the “high-performance” world talks as we could understand them. This is probably not 100% accurate as the topics discussed are, currently, out of our domain of expertise. Corrections are welcome.

Matias Bjørling (Western Digital) and Christoph Hellwig presented new NVMe commands to manage NVMe zones. While zones need write order to be preserved, the Linux multi-queue block I/O queueing mechanism (blk-mq) cannot enforce this. Bart van Assche (Google) and Damien Le Moal (Western Digital) proposed a draft to reorder writes at the blk-mq layer. While this solution was not very well received, it opened the discussion on how the issue should be addressed. Bart van Assche also presented his work on a copy offload mechanism in Linux, which could for instance be used to quickly copy entire zones. His work could also be useful to Stephen Bates, who works on PCIe peer-to-peer and talked about how he wants to enable, for example, DMA between SSDs. Still on the topic of DMA and performance, Idan Burstein (Mellanox) presented the cutting-edge features he worked on to improve Remote DMA (RDMA) performance.
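To illustrate why the write order matters for zones, here is a toy model as we understand the constraint: a zone only accepts a write that starts exactly at its current write pointer. The `Zone` class and the `submit` helper are hypothetical names made up for this sketch; this is neither the NVMe command set nor blk-mq code.

```python
class Zone:
    """Toy model of a sequential-write zone (not an NVMe implementation)."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.write_pointer = start  # next block that may be written

    def write(self, lba, nblocks):
        # A write is only accepted if it begins exactly at the write pointer.
        if lba != self.write_pointer:
            raise ValueError(
                f"write at {lba} rejected, expected {self.write_pointer}"
            )
        if lba + nblocks > self.start + self.size:
            raise ValueError("write crosses the end of the zone")
        self.write_pointer += nblocks


def submit(zone, requests):
    """Apply (lba, nblocks) requests in order; return how many succeeded."""
    done = 0
    for lba, nblocks in requests:
        try:
            zone.write(lba, nblocks)
            done += 1
        except ValueError:
            break
    return done


# Same three requests, issued in order and then with two of them swapped.
in_order = submit(Zone(0, 64), [(0, 8), (8, 8), (16, 8)])
reordered = submit(Zone(0, 64), [(0, 8), (16, 8), (8, 8)])
```

In this model all three in-order writes succeed, while the reordered sequence fails at the second request. This is the kind of failure that can occur when a layer that does not guarantee ordering sits between the submitter and the device.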

MTD was also at the party

Probably the easiest part to understand for us embedded people.

Boris and Miquèl presenting about memories. Author: Brian Pawlowski, 2018 (CC-BY)

Boris Brezillon and Miquèl Raynal gave a talk on their recent work on support for SPI memories in Linux (and U-Boot, but this will be covered in more detail at ELCE in October). Boris wrote a new SPI-NAND layer that converts MTD requests into SPI exchanges and passes the flow of commands to the (also brand new) SPI-mem layer, which standardizes how the SPI-NAND and SPI-NOR stacks talk to SPI controller drivers. Cleanup work is still needed on the SPI-NOR side, as well as new features like direct mapping and XIP (which was discussed after the talk), support for more chips and the conversion of more SPI controllers to SPI-mem. The slides are available online, see also our previous blog post on this topic.



Richard Weinberger (from Sigma Star GmbH, and co-maintainer of MTD and UBI/UBIFS) updated us about the level of power-cut testing available to challenge the MTD stack. Tracing makes it possible to get closer to the failing sequence, but one big problem is replaying the sequence and reproducing the issue. Tracking down untested code paths is very important to keep UBI/UBIFS as reliable as possible: reliability is generally what matters most when using SPI/parallel NAND devices.

Richard’s co-worker David Gstir also works on UBI/UBIFS, but on the authentication side. Bringing filesystem authentication to UBIFS could have been simple, but during his introduction he disqualified most of the alternatives he had (dm-verity, fs-verity, …). Fun fact about fs-verity: authentication would have worked on the file’s contents, but not on the inodes themselves. Hence, the file’s content could not be changed, but the file itself could still be moved. So, a brand new solution has been implemented for UBIFS, and upstreaming is ongoing.
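The point about authenticating contents but not inodes can be shown with a small toy example. It only illustrates the general idea, using an HMAC over the file data with or without its path; it is not how fs-verity or the UBIFS authentication work, and the key, paths and function names below are made up for the illustration.

```python
import hashlib
import hmac

KEY = b"hypothetical-secret-key"  # made up for this illustration


def tag_content(data):
    """Authenticate the file data only."""
    return hmac.new(KEY, data, hashlib.sha256).digest()


def tag_content_and_path(path, data):
    """Authenticate the data together with the location of the file."""
    msg = path.encode() + b"\0" + data
    return hmac.new(KEY, msg, hashlib.sha256).digest()


data = b"#!/bin/sh\necho trusted\n"
content_tag = tag_content(data)
full_tag = tag_content_and_path("/usr/bin/tool", data)

# An attacker moves the unchanged file to another location.
moved_path = "/etc/init.d/tool"

# The content-only tag still verifies, so the move goes unnoticed.
content_only_ok = hmac.compare_digest(content_tag, tag_content(data))
# A tag that also covers the path no longer matches after the move.
with_path_ok = hmac.compare_digest(
    full_tag, tag_content_and_path(moved_path, data)
)
```

Here `content_only_ok` ends up true and `with_path_ok` false: protecting the data alone says nothing about where the file sits in the filesystem tree.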

Original ideas presented

Benchmarking on real hardware was not well suited to Damien Le Moal’s experiments. He hacked QEMU to make CPU latency tunable, so that he could easily compare the latency of in-memory data processing paths. This is still a work in progress.

Johannes Thumshirn (SUSE Labs), as a side project, started reverse-engineering APFS, Apple’s new filesystem. The firm promised two years ago to release the implementation of its filesystem so that computers running Windows or Linux could mount it. So far nothing has happened. That is why, without even a Mac at hand, he started spending nights hex-dumping structures from a filesystem image he got, reverse-engineering the content with the help of research papers already published. The first results are there: he can now ls and cat random files!

And after talks and hiking: time for BoFs

View from the lodge of a lake and the mountains
View from the lodge. Author: Brian Pawlowski, 2018 (CC-BY)

A bit before the official BoF time, MTD folks gathered around Hans Holmberg (CNEX Labs) to listen carefully to how pblk works. pblk is a “Physical block device” FTL for SSDs supporting open-channel, and it could give ideas to some of them. Why not an entirely open-source SSD running Linux with its own FTL?

Finally, among all the interesting discussions that happened, we could mention the need for a generic NVMe-oF (NVMe over Fabric) discovery protocol raised by Hannes Reinecke (SUSE Labs), and the possible evolution of the MTD stack to integrate an I/O scheduler providing much better (and parallelized) performance, presented by Boris Brezillon.

Conclusion

All attendees agreed that this conference format is really pleasant, with the surroundings contributing a lot to the general well-being and to the success of this year’s edition of the ALPSS. We will definitely try to make it next year!

Allwinner VPU support in mainline Linux status update (week 37)

Even though the bulk of the development on the Allwinner VPU support is done, we are still working on completing the upstreaming of the kernel driver, and some progress has been made recently on this topic:

  • On September 10, core Video4Linux developer Hans Verkuil sent a pull request to Video4Linux maintainer Mauro Carvalho Chehab to get the Cedrus driver merged. This means we’re getting closer and closer to having the driver merged. Unfortunately, some last minute issues were found in the patch series, so this pull request wasn’t merged.
  • On September 13, Bootlin engineer Maxime Ripard sent a new iteration of the Cedrus driver, version 10, which addresses those issues.
  • In addition, as the Allwinner platform maintainer, Maxime Ripard has merged the patches adding the Device Tree description of the Allwinner VPU, which reduces the Cedrus patch series to just 5 patches. They are now in the branch sunxi/dt-for-4.20, which should be part of the upcoming 4.20 Linux release.
T-Shirt for Allwinner VPU campaign supporters

In addition to this progress on the Linux kernel driver upstreaming process, we also moved forward with delivering the perks to the companies and individuals who supported our campaign:

  • A CREDITS file has been added to the libva-v4l2-request base, thanking all our backers who pledged more than 16 EUR.
  • The T-Shirts for the backers who pledged more than 128 EUR have been sent to those in the EU. We are also working on sending the T-Shirts to those outside the EU, but it takes a bit more time due to the need for customs declarations. Don’t hesitate to take a picture of yourself with the T-Shirt, and post it on Twitter with the hashtag #VPULinuxDriverSupporter.

Bootlin at the Linux Plumbers 2018 conference

Last year, a number of Bootlin engineers attended the Linux Plumbers conference. This year again, Bootlin will participate in the event, with engineer Antoine Ténart traveling to Vancouver, Canada on November 13-15 for this conference.

Linux Plumbers 2018

We are particularly interested in attending the new Networking Track added to Linux Plumbers for the first time, but there will certainly be useful discussions as well in the BPF micro-conference, the Real-time micro-conference or the Power Management and Energy-awareness micro-conference.

If you’re attending this conference, don’t hesitate to get in touch with Antoine and meet during the event!