Bootlin at the XDC 2018 Conference

This year’s edition of the X.org Developers Conference (XDC) took place two weeks ago in A Coruña, Spain. While its name suggests that it might be focused solely on the X.org display server, this conference actually targets the whole Linux graphics stack, including alternative stacks like Android’s or Wayland servers. Following our involvement in the Linux DRM subsystem, and to deepen our understanding of the graphics stack, Bootlin sent one engineer, Maxime Ripard, maintainer of the sun4i DRM driver.

There were many interesting talks during those three days, as you can see in the conference schedule, but we especially liked a few of them:

Jens Owen, Pierre-Loup Griffais – Open Source Driver Development Funding: Hooking up the Money Hose – Slides

The opening talk was given by Jens Owen, from Google, and Pierre-Loup Griffais, from Valve. They provided interesting feedback and insights from two companies with a central position in the gaming industry. They also advocated for open source drivers, and described how these were actually helping game developers.

Haneen Mohammed, Rodrigo Siqueira: VKMS – Slides

Haneen Mohammed and Rodrigo Siqueira were on stage to talk about their work on a Virtual KMS driver, done as part of the Google Summer of Code and Outreachy programs. This driver is still at an early stage, but it has been merged and, while basic at the moment, it holds a lot of promise as a KMS backend for testing the KMS API.

Jerome Glisse – getting rid of get_user_page() in favor of HMM – Slides

Jerome Glisse, from Red Hat, came to describe his current work on the memory management subsystem of the Linux kernel. He is working on the constraints that systems using GPUs to offload computations face when it comes to allocating memory in the most efficient way.

Overall, this was a great description of the pitfalls of the get_user_page interface when used in that context, and of the work he has been doing over the past years to overcome them.

Lyude Paul, Alyssa Rosenzweig – Introducing Panfrost – Slides

In that talk, Lyude Paul and Alyssa Rosenzweig presented their work on Panfrost, the reverse-engineering effort around the Mali-T GPUs from ARM (which was later expanded to the Mali-G GPUs). They discussed their findings, explained the architecture of the GPU and then talked about the current state of their work. The final part of the talk was a quick demo on a Rockchip SoC. It provided a great overview of the current state of the driver, and there is a lot of hope for an open-source driver for this GPU, which is quite widely used on ARM platforms.

Elie Tournier – What’s new in the virtual world? – Slides

Elie Tournier, from Collabora, gave a talk about his work and the current state of virgl, which is a virtual 3D GPU meant to be used within QEMU virtual machines, while remaining independent of the host GPU. While we had been aware of the existence of that driver for quite some time, this talk provided a great overview of the features that are provided by virgl, and what you can and cannot do with it.

Conclusion

After XDC 2016 in Helsinki, this was our second time attending that conference. Just like the first time, we really enjoyed the single track format where you can meet all the attendees and have side discussions pretty easily. Once again, the talks were great, and led us to think about interesting developments we could do on our various projects.

Bootlin back from Kernel Recipes!

As announced previously, we participated in the Kernel Recipes conference in September in Paris. Three people from Bootlin attended the event: Grégory Clément, who gave a talk about SD/eMMC, Boris Brezillon and Mylène Josserand.
Unfortunately, we were not able to attend the Embedded Recipes conference, but we hope to catch up next year!

Overview of SD/eMMC, their high speed modes and Linux support, by Grégory Clément

Here is the video of Grégory’s presentation:

You can find the slides on our website.

KernelShark 1.0; What’s new and what’s coming, by Steven Rostedt – VMware

On the first day, one of the most enjoyable talks was “KernelShark 1.0; What’s new and what’s coming”. One reason is the speaker himself, Steven Rostedt, who is a very experienced presenter. He always knows his subject very well and makes a few jokes during the talk: all of this makes for a very pleasant talk.

From my point of view, the talk itself presents two interesting subjects: the process of developing a tool’s front end (with trace-cmd being the example) and then a presentation of this GUI.

Talk chosen by Grégory

Atomic explosion: evolution and use of relaxed concurrency primitives, by Will Deacon – ARM

On the second day, Will Deacon talked about an interesting topic: “Atomic explosion: evolution and use of relaxed concurrency primitives”. As usual with Will, the technical level is high, and watching the video a second time is recommended to really put the multiple pieces of information together.

Besides the explanations of atomic operations and their meaning from the point of view of the CPUs, Will also presented his new API, and how and when we should use it.

Talk chosen by Grégory

Coccinelle: 10 Years of Automated Evolution in the Linux Kernel, by Julia Lawall – INRIA/LIP6

Happy birthday, Coccinelle!
This project has been helping kernel developers track bugs and clean up the kernel for 10 years. For this occasion, Julia gave a retrospective and a “what’s new” of the project.

Initially used only by Coccinelle developers, it was quickly adopted by the whole kernel community. It was interesting to hear the history, feedback and also updates on this project, which is now used more and more.

Talk chosen by Mylène

Meltdown and Spectre: seeing through the magician’s tricks, by Paolo Bonzini – Red Hat

Paolo Bonzini gave a great presentation about Meltdown and Spectre, with a detailed description of the different mechanisms these two issues take advantage of: branch prediction, memory mapping, paging, etc.
It was a great overview and well explained.

Talk chosen by Mylène

The end word

As usual at Kernel Recipes, Frank Tizzoni is in the room to draw sketches of attendees and speakers! Have a look at all the sketches! Some of them are really funny 🙂

It was the first time we attended Kernel Recipes, and this conference is as good as the feedback we had received from people who attended previous editions.
The major points are the high quality of the talks and the interaction between the speakers and the audience, but also the social events around it.

Boris and Grégory

This was the second time I attended Kernel Recipes and I am still convinced that this conference is really nice.
The talks, the audience, the format (limited to 100 people) and all the social events are great!
My only regret is that I was not able to attend Embedded Recipes and enjoy the atmosphere around these two conferences a bit more.
I hope to register in time next year! 😉

Mylène

Bootlin at the ALPSS 2018 conference

The second edition of the Alpine Linux Persistence and Storage Summit (ALPSS) took place two weeks ago in the Lizumerhütte Alpine lodge. Close to Innsbruck, Austria, the lodge sits in an amazingly beautiful valley and is completely cut off from the rest of the world in winter. This year’s edition was marked by the absence of data network access, which intensified the feeling of isolation and stimulated exchanges between attendees. To strengthen the representation of MTD developers at this event, Bootlin sent two of its engineers: Boris Brezillon and Miquèl Raynal, respectively MTD and NAND maintainers in the Linux kernel.

Cow with a beautiful view over the Alps
Picture taken while climbing to the lodge. Author: Hans Holmberg, 2018 (CC-BY-SA)

NVMe, open-channel and zoned namespaces

While almost all of the ~30 attendees work on storage technologies based on NAND flash, a majority work in high-performance domains, where power cuts are not the issue but latency and throughput are. Far beyond our embedded world, people are working hard on the parallelization and the standardization of high-speed interfaces (SCSI, NVMe). In the end, we all have to make the software deal with the NAND-specific constraints of the underlying storage device.

Disclaimer: This is a short summary (not exhaustive) of the “high-performance” world talks as we could understand them. This is probably not 100% accurate as the topics discussed are, currently, out of our domain of expertise. Corrections are welcome.

Matias Bjørling (Western Digital) and Christoph Hellwig presented new NVMe commands to manage NVMe zones. While zones require write order to be preserved, the Linux multi-queue block I/O queueing mechanism (blk-mq) cannot enforce this. Bart van Assche (Google) and Damien Le Moal (Western Digital) proposed a draft to reorder writes at the blk-mq layer. While this solution was not very well received, it opened the discussion on how the issue should be addressed. Bart van Assche also presented his work on a copy offload mechanism in Linux, which could for instance be used to quickly copy entire zones. His work could also be useful to Stephen Bates, who works on PCIe peer-to-peer and talked about how he wants to, for example, enable DMA between SSDs. Still on the topic of DMA and performance, Idan Burstein (Mellanox) presented the cutting-edge features he worked on to improve Remote DMA (RDMA) performance.

MTD was also present at the party

Probably the easiest part to understand for us, embedded people.

Boris and Miquèl presenting about memories. Author: Brian Pawlowski, 2018 (CC-BY)

Boris Brezillon and Miquèl Raynal gave a talk on their recent work on support for SPI memories in Linux (and U-Boot, but this will be covered in more detail at ELCE in October). Boris wrote a new SPI-NAND layer, which converts MTD requests into SPI exchanges and hands the flow of commands to the (also brand new) SPI-mem layer, whose purpose is to standardize how the SPI-NAND and SPI-NOR stacks talk to SPI controller drivers. Cleanup work is still needed on the SPI-NOR side, as well as the addition of new features like direct mapping and XIP (which was discussed after the talk), support for more chips, and the conversion of more SPI controllers to SPI-mem. The slides are available online; see also our previous blog post on this topic.



Richard Weinberger (from Sigma Star GmbH, and co-maintainer of MTD and UBI/UBIFS) updated us on the level of power-cut testing available to challenge the MTD stack. Tracing makes it possible to get closer to the failing sequence, but one big problem is replaying the sequence to reproduce the issue. Tracking down untested code paths is very important to keep UBI/UBIFS as reliable as possible: reliability is generally what matters most when using SPI/parallel NAND devices.

Richard’s co-worker David Gstir also works on UBI/UBIFS, but on the authentication side. Bringing filesystem authentication to UBIFS could have been simple, but during his introduction he ruled out most of the existing alternatives (dm-verity, fs-verity, …). Fun fact about fs-verity: authentication would have worked on the file’s contents, but not on the inodes themselves. Hence, the file’s content could not be changed, but the file itself could still be moved. So a brand new solution has been implemented for UBIFS, and upstreaming is ongoing.

Original ideas presented

Benchmarking on real hardware was not well suited to Damien Le Moal’s experiments. He hacked QEMU to make CPU latency tunable, so that he could easily compare the latency of in-memory data processing paths. This is still a work in progress.

Johannes Thumshirn (SUSE Labs), as a side project, started reverse-engineering APFS, Apple’s new filesystem. The firm promised two years ago to release the implementation of its filesystem so that computers running Windows or Linux could mount it. So far nothing has happened, which is why, without even a Mac in hand, he started spending nights hex-dumping structures from a filesystem image he got, reverse-engineering the content with the help of research papers already produced. The first results are there: he can now ls and cat random files!

And after talks and hiking: time for BOFs

View from the lodge of a lake and the mountains
View from the lodge. Author: Brian Pawlowski, 2018 (CC-BY)

Shortly before the official BOF time, MTD folks gathered around Hans Holmberg (CNEX Labs) to listen carefully to how pblk works. pblk is a “Physical block device” FTL for open-channel SSDs, and it could give ideas to some of them. Why not an entirely open-source SSD running Linux with its own FTL?

Finally, among all the interesting discussions that happened, we could mention the need for a generic NVMe-oF (NVMe over Fabric) discovery protocol, raised by Hannes Reinecke (SUSE Labs), and the possible evolution of the MTD stack to integrate an I/O scheduler providing much better (and parallelized) performance, presented by Boris Brezillon.

Conclusion

All attendees agreed that this conference format is really pleasant, the surroundings contributing a lot to the general well-being and to the success of this year’s edition of ALPSS. We will definitely try to make it next year!

Allwinner VPU support in mainline Linux status update (week 37)

Even though the bulk of the development on the Allwinner VPU support is done, we are still working on completing the upstreaming of the kernel driver, and some progress has been made recently on this topic:

  • On September 10, core Video4Linux developer Hans Verkuil sent a pull request to Video4Linux maintainer Mauro Carvalho Chehab to get the Cedrus driver merged. This means we’re getting closer and closer to having the driver merged. Unfortunately, some last minute issues were found in the patch series, so this pull request wasn’t merged.
  • On September 13, Bootlin engineer Maxime Ripard sent a new iteration of the Cedrus driver, version 10, which addresses those issues.
  • In addition, as the Allwinner platform maintainer, Maxime Ripard has merged the patches adding the Device Tree description of the Allwinner VPU, which reduces the Cedrus patch series to just 5 patches. They are now in the branch sunxi/dt-for-4.20, which should be part of the upcoming 4.20 Linux release.
T-Shirt for Allwinner VPU campaign supporters

In addition to this progress on the Linux kernel driver upstreaming process, we also moved forward with delivering the perks to the companies and individuals who supported our campaign:

  • A CREDITS file has been added to the libva-v4l2-request base, thanking all our backers who pledged more than 16 EUR.
  • The T-Shirts for the backers who pledged more than 128 EUR have been sent to those in the EU. We are also working on sending the T-Shirts to those outside the EU, but it takes a bit more time due to the need for customs declarations. Don’t hesitate to take a picture of yourself with the T-Shirt, and post it on Twitter with the hashtag #VPULinuxDriverSupporter.

Bootlin at the Linux Plumbers 2018 conference

Last year, a number of Bootlin engineers attended the Linux Plumbers conference. This year again, Bootlin will participate in the event, with engineer Antoine Ténart traveling to Vancouver, Canada on November 13-15 for this conference.

Linux Plumbers 2018

We are particularly interested in attending the new Networking Track added to Linux Plumbers for the first time, but there will certainly be useful discussions as well in the BPF micro-conference, the Real-time micro-conference or the Power Management and Energy-awareness micro-conference.

If you’re attending this conference, don’t hesitate to get in touch with Antoine and meet during the event!

Final weekly status update for Allwinner VPU support in mainline Linux (week 35)

The end of August has arrived, bringing an end to Paul’s engineering internship at Bootlin, focused on bringing mainline Linux support for the VPU found on Allwinner platforms. Over the past six months, we have worked hard to reach the goals announced in the project’s crowdfunding campaign and we were able to deliver most of the main goals last month.

Since last month’s delivery, we have made great progress on supporting the H265 codec, one of the stretch goals that were funded during the campaign. A dedicated patch series introducing support for it was submitted to the linux-media mailing list earlier this week, as well as a new iteration of the base Cedrus VPU driver. As the Request API is on the verge of being integrated into the Linux kernel, our VPU driver should follow pretty soon.

Reaching the end of the funding: a status on where we stand

We have now exhausted the budget that was provided through the crowdfunding campaign: both Maxime Ripard’s time (who worked mainly on the H264 decoding and helping with DRM topics) and Paul’s internship are over, and therefore the remaining work will be done on a best-effort basis, without direct funding. This will therefore be the last weekly update, but we will be publishing updates once in a while when interesting progress is made.

Here is a quick summary of our current status, compared to what was promised during our Kickstarter campaign:

  • Making sure that the codec works on the older Allwinner SoCs that are still widely used: A10, A13, A20, A33, R8 and R16. This goal is fully met;
  • Polishing the existing MPEG2 decoding support to make it fully production ready. This goal is fully met;
  • Implementing H264 video decoding. This goal is fully met with base H264 decoding support implemented. However, a number of more advanced H264 features have not been implemented, and therefore additional improvements could be made;
  • Modifying the Allwinner display driver in order to be able to directly display the decoded frames instead of converting and copying those frames. This goal is fully met;
  • Providing a user-space library easy to integrate in the popular open-source video players. This goal is partially met. We do provide a user-space library that offers a VA-API implementation, however the integration with popular video players turned out to be a lot more challenging than expected, and we only offer Kodi integration at this point. See below for details;
  • Upstreaming those changes to the official Linux kernel. This goal is in progress, on both the VPU driver side and DRM improvements side;
  • Supporting the newer Allwinner SoCs (H3, H5, A64). This goal is partially met, since H3 is supported, but not yet H5 and A64;
  • H265 video decoding support. This goal is fully met with base H265 decoding support implemented. Like H264, a number of more advanced features have not been implemented, so there is room for more work.

The most challenging topic: integration with open source video players

The major pitfalls that we encountered are related to integrating our accelerated video decoding pipeline with multimedia players. They will require extra work out of the scope of the VPU campaign to reach a production-ready state.

We considered a number of options for integrating with a desktop environment under Xorg, which was especially tricky for the oldest Allwinner platforms where the VPU outputs a tiled YUV format. The chain of required operations includes untiling, colorspace conversion (from YUV to RGB), scaling and composition.

  • We first resorted to the main CPU for all the required operations (including NEON-backed untiling routines), which becomes unbearably slow as soon as scaling is involved in the process.
  • We tried to bring in the GPU for accelerating the untiling, colorspace conversion, scaling and composition operations involved. Although we wrote a shader-based untiler, the Mali blobs did not allow for importing the raw frame data on a byte-by-byte basis. This made GPU acceleration unusable for our use case in practice. Bringing in the GPU for the final composition step only (which should be possible with GBM-enabled blobs) could however bring some speedup.
  • Another lead is to use the Xv extension of the X11 API, which fits the bill for using the Display Engine hardware to accelerate these operations, but this interface is quite old now and increasingly deprecated. It also only allows sub-optimal use cases, with one video at a time.
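
To make the untiling and colorspace conversion steps more concrete, here is a minimal Python sketch of both operations done on the CPU. It is only an illustration: the tile layout (row-major tiles of a caller-supplied size) and the BT.601 limited-range coefficients are assumptions chosen for the example, not the actual format produced by the Allwinner VPU, and the function names are hypothetical.

```python
def untile(tiled, width, height, tile_w, tile_h):
    """Convert a plane stored tile after tile into row-major order.

    Assumption: tiles are stored left to right, top to bottom, each tile
    is itself row-major, and the plane size is a multiple of the tile size.
    """
    if width % tile_w or height % tile_h:
        raise ValueError("plane dimensions must be multiples of the tile size")
    linear = bytearray(width * height)
    tiles_per_row = width // tile_w
    for index, value in enumerate(tiled):
        # Which tile are we in, and where inside that tile?
        tile, offset = divmod(index, tile_w * tile_h)
        tile_y, tile_x = divmod(tile, tiles_per_row)
        in_y, in_x = divmod(offset, tile_w)
        x = tile_x * tile_w + in_x
        y = tile_y * tile_h + in_y
        linear[y * width + x] = value
    return bytes(linear)


def yuv_to_rgb(y, u, v):
    """Convert one pixel from YUV to RGB (assumed BT.601, limited range)."""
    c, d, e = y - 16, u - 128, v - 128

    def clamp(value):
        return max(0, min(255, int(round(value))))

    return (
        clamp(1.164 * c + 1.596 * e),
        clamp(1.164 * c - 0.392 * d - 0.813 * e),
        clamp(1.164 * c + 2.017 * d),
    )
```

For example, `untile(bytes([0, 1, 4, 5, 2, 3, 6, 7]), 4, 2, 2, 2)` rebuilds a 4x2 plane from two 2x2 tiles. Running such per-pixel loops on the CPU for every frame gives an idea of why hardware help is attractive for these steps.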

We also investigated the situation for media players that can run without a display server, which removes the need for the composition step and allows using the Display Engine hardware directly, through the DRM interface.

  • We succeeded at bringing up support for the Kodi mediacenter, by adding the required bits to implement a zero-copy pipeline.
  • We worked on getting GStreamer to correctly pipe VAAPI-based decoding to the DRM-enabled kmssink without going through the GPU, but did not end up with any functional result, so significant work remains in that area.

Going further: what will happen now?

Here are the topics that we intend to continue work on in this best-effort mode and complete by the end of 2018, as promised in our crowdfunding campaign:

  • Ensure the base Cedrus Linux kernel driver gets merged;
  • Ensure the H264 decoding support in the Cedrus driver gets merged;
  • Ensure the H265 decoding support in the Cedrus driver gets merged;
  • Ensure the DRM driver improvements get merged;
  • Enable VPU support on H5 and A64.

Here are other topics that we do not intend to work on without additional funding. Individuals who want to see some progress on those topics are invited to contribute and join the effort of improving Allwinner VPU support in upstream Linux. Companies interested in those features can also contact us.

  • Additional H264 and H265 decoding features: interlaced video support (H264 and H265), quantization matrices (H265), 10-bit (H265), 4K resolution (H265);
  • Other codecs beyond MPEG2, H264 and H265, such as VP8;
  • Encoding support;
  • Additional work on GStreamer integration or X.org integration.

Thanks

Once again, we would like to thank all the individuals and companies who participated in our crowdfunding campaign, and made this project possible. We are very happy to see that despite the uncertainties involved in all software development projects, we have been able to deliver the vast majority of the goals, within the expected time frame, while delivering weekly updates of our progress. It was a new experience for Bootlin, and we hope to renew this experience for other Linux kernel upstream developments in the future!

Upstream Linux support for Microsemi Ethernet Switch

Microsemi VSC7513 Block Diagram
Starting last year, we have been working on the Microsemi VSC7513 and VSC7514 MIPS processors.

They have a 500 MHz MIPS 24KEc CPU and the usual DDR, UART, I2C and SPI controllers. More interestingly, they also have an 8 or 10-port Gigabit Ethernet switch, which allows common network bridging operations to be offloaded to the hardware. As is usual for this kind of product, the vendor-provided SDK (called WebStaX) used to configure the switch runs in userspace and uses a custom in-kernel UIO driver to talk to the hardware.

However, this has now changed as we submitted support for the platform and the switch to the upstream Linux kernel:

The whole driver, based on the switchdev Linux kernel subsystem, is about 5700 lines long.

Microsemi VSC7514EV

Thanks to this work, it is now possible to use standard Linux user-space tools to configure the switch. For example, the following commands bridge two switch ports and offload the bridging to the hardware:

ip link add name br0 type bridge
ip link set dev sw0p0 master br0
ip link set dev sw0p1 master br0

To achieve hardware offloading, the driver needs to:

  • configure port forwarding, i.e. to what port the frames coming from a particular port should be forwarded;
  • handle the MAC table: this table is the one used to know which machine is connected on which port. Also, the broadcast and multicast MAC addresses have to be installed;
  • handle STP port state: whether the port is allowed to forward frames or learn new MAC addresses.
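
As a rough illustration of what these three duties amount to, here is a minimal Python model of a learning bridge. It is a sketch under simplifying assumptions, not the driver's or the hardware's logic: the `LearningBridge` class is hypothetical, the STP state is reduced to a single forwarding flag per port, and multicast handling is left out.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Toy software model of a bridge with a MAC table (illustrative only)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on
        # Simplified STP state: True means the port may learn and forward.
        self.forwarding = {port: True for port in self.ports}

    def receive(self, in_port, src, dst):
        """Return the list of ports a received frame should be sent out on."""
        if not self.forwarding[in_port]:
            return []
        # Learn which port the sender is connected to.
        self.mac_table[src] = in_port
        out_port = None if dst == BROADCAST else self.mac_table.get(dst)
        if out_port is None:
            # Broadcast or unknown destination: flood to all other ports.
            candidates = [p for p in self.ports if p != in_port]
        elif out_port == in_port:
            # Destination is on the same segment: nothing to forward.
            candidates = []
        else:
            candidates = [out_port]
        return [p for p in candidates if self.forwarding[p]]
```

With ports `sw0p0`, `sw0p1` and `sw0p2`, a first frame towards an unknown address is flooded to the two other ports, while later frames towards an address that has been learned go to a single port. Offloading means the switch performs this lookup itself, without involving the CPU for each frame.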

VLANs are configured using ip and bridge:

ip link set dev br0 type bridge vlan_filtering 1
bridge vlan add dev sw0p0 vid 1 pvid untagged
bridge vlan add dev sw0p1 vid 1
bridge vlan add dev sw0p0 vid 30
bridge vlan add dev sw0p1 vid 30

Here, the driver configures the VIDs on each port and what to do about them (tag, untag, forward).
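
The tag, untag and forward decisions can be modelled in a few lines of Python. This is only a sketch mirroring the per-port state set by the commands above; the `VlanPort` class and the helper functions are hypothetical names, and the real driver programs equivalent rules into switch registers instead.

```python
class VlanPort:
    """Per-port VLAN state: membership, PVID and untagged set."""

    def __init__(self):
        self.pvid = None        # VID assigned to untagged ingress frames
        self.members = set()    # VIDs allowed on this port
        self.untagged = set()   # VIDs sent without a tag on egress

    def add(self, vid, pvid=False, untagged=False):
        self.members.add(vid)
        if pvid:
            self.pvid = vid
        if untagged:
            self.untagged.add(vid)


def ingress_vid(port, tag):
    """Return the VID a received frame belongs to, or None if it is dropped."""
    vid = tag if tag is not None else port.pvid
    if vid is None or vid not in port.members:
        return None
    return vid


def egress_action(port, vid):
    """Return 'tagged', 'untagged', or None if the port is not in the VLAN."""
    if vid not in port.members:
        return None
    return "untagged" if vid in port.untagged else "tagged"


# Same configuration as the bridge commands shown above.
sw0p0, sw0p1 = VlanPort(), VlanPort()
sw0p0.add(1, pvid=True, untagged=True)
sw0p1.add(1)
sw0p0.add(30)
sw0p1.add(30)
```

In this model, an untagged frame arriving on sw0p0 is classified into VLAN 1 and leaves sw0p1 tagged, while VLAN 30 frames are tagged on both ports.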

Configuring link aggregation is also done with ip:

ip link add name aggr0 type bond
ip link set dev eth_yellow master aggr0
ip link set dev eth_blue master aggr0

The driver has to configure the aggregated ports and the balancing mode. It also has to ensure the switch will forward the control frames (LACPDUs) to the CPU so Linux can know the state of the links.
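
To give an idea of what a balancing mode does, here is a minimal Python sketch that picks a member port from a hash of the source and destination MAC addresses. The hash used here (XOR of the two addresses modulo the number of active links) is an assumption made for the illustration, not the algorithm implemented by the switch, and the function names are hypothetical.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into an integer."""
    return int(mac.replace(":", ""), 16)


def select_member(members, link_up, src_mac, dst_mac):
    """Pick the aggregated port used for a given flow, or None if all are down.

    Only links reported as up are candidates, which is why the link state
    (learned through the control frames) matters.
    """
    active = [m for m in members if link_up.get(m, False)]
    if not active:
        return None
    flow_hash = mac_to_int(src_mac) ^ mac_to_int(dst_mac)
    return active[flow_hash % len(active)]
```

Because the choice depends only on the addresses, all frames of a given flow use the same link and stay in order, while different flows can spread over the aggregated ports. When a link goes down, its flows are moved to the remaining links.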

IGMP snooping is a simple feature where the switch is able to push new multicast addresses to the CPU so Linux can install the MACs in the table and avoid having to forward the multicast frames on all the switch ports. In our case, it is simply enabled using a single register when multicasting is enabled on the bridge.

The switch supports more features that remain to be worked on: PTP timestamping, QoS and packet filtering, to name a few. We have already implemented PTP support, and we will be submitting this additional feature upstream in the near future.

To learn more about the inner workings of switchdev, you can refer to Alexandre Belloni’s ELC talk:




If you’re interested in upstream Linux kernel support for other Ethernet switches, do not hesitate to contact us!

Allwinner VPU support in mainline Linux status update (week 34)

This week has seen great advancements in H265 support, following up on the work conducted during the past weeks. The first item to debug was support for bi-directional predictive frames (AKA B frames) which was broken last week. This required some adaptation in our standalone test tool v4l2-request-test in order to display the decoded frames in the right order. With bi-directional prediction, the display order no longer matches the decoding order, in which the coded frames are stored in the bitstream.
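
The difference between decoding order and display order can be shown with a small Python sketch. It assumes that each decoded frame carries its display position, named `poc` here after the picture order count used by H265. The `Frame` type and the `display_order` function are hypothetical and are not taken from v4l2-request-test.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    kind: str  # "I", "P" or "B"
    poc: int   # display position of the frame (assumed to be known)


def display_order(decoded):
    """Reorder frames from decoding order to display order."""
    return sorted(decoded, key=lambda frame: frame.poc)


# The P frame is decoded before the two B frames that reference it,
# even though it is displayed after them.
decoding_order = [Frame("I", 0), Frame("P", 3), Frame("B", 1), Frame("B", 2)]
shown = display_order(decoding_order)
```

Here `shown` lists the frames as I, B, B, P, whereas they were decoded as I, P, B, B. A real player reorders frames on the fly with a small buffer instead of sorting a complete list, but the principle is the same.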

With the images displayed in the right order, the debugging process was a matter of comparing the configuration register values written by our driver with the reference provided by libvdpau-sunxi, but it was not enough. A specific buffer has to be provided for each frame for the decoder to store extra meta-data related to bi-directional frame prediction. With the buffer set, the situation vastly improved and only minor issues had to be resolved.

This led to properly decoding our reference H265 video that contains I, P and B frames! A few more videos were also tested to spot possible bugs and were eventually decoded correctly too. Of course, due to the many possible combinations of H265 features, it is possible that we are still missing some corner cases, but the bulk of H265 support is well in place at this point.

We moved on to adding support for H265 in libva-v4l2-request, which allows the integration of the codec with media players such as VLC and Kodi. We hit a few hiccups during the bring-up:

Hiccups when integrating H265 with VAAPI

But we managed to fix the integration of H265 to behave properly:

H265 decoding working properly with VAAPI

So H265 is now integrated in our pipeline and we are ready to submit the patches introducing its support for the Cedrus driver, which should come around next week.

Bootlin at the X.org Developer Conference

Bootlin engineer Maxime Ripard will be attending the X.org Developer Conference 2018, from September 26 to September 28 in A Coruña, Spain. This conference is the main event to discuss Linux graphics and display related topics and meet the Linux kernel and userspace developers working in this field.

At Bootlin, Maxime has been involved over the last few years in a number of display related developments:

  • He is the initial author and the maintainer of the DRM display controller driver for the Allwinner processors, to which he has progressively added numerous features over the years, including parallel RGB support, HDMI support, DSI support and TV-out support, for many different Allwinner platforms.
  • He has worked on enabling OpenGL support on Allwinner platforms using the open-source kernel driver and the closed-source binary blob provided by ARM, making OpenGL work using a mainline and upstream Linux kernel on Allwinner hardware. As part of this, Maxime designed and upstreamed a Device Tree binding to describe the Mali GPU and maintains sunxi-mali, a fork of the ARM-provided kernel driver for Mali, modified to work with the upstream Linux kernel.
  • Maxime has been involved in setting up automated testing of the RaspberryPi display subsystem, using the Chamelium platform and the intel-gpu-tools test suite. See our blog post on this topic.
  • As part of Bootlin’s work on the Linux support for the Allwinner VPU (funded by our crowdfunding campaign earlier this year), Maxime got involved in issues related to feeding the output of the VPU into the display pipeline found on Allwinner platforms.

Bootlin at the Alpine Linux Persistence and Storage Summit

A group of Linux kernel developers is organizing the Alpine Linux Persistence and Storage Summit on September 11-14, a meeting of kernel developers to discuss the hot topics in Linux storage and file systems, such as persistent memory, NVMe, multi-pathing, raw or open channel flash and I/O scheduling.

Bootlin engineers Boris Brezillon, who is the co-maintainer of the MTD subsystem in the Linux kernel, and Miquèl Raynal, who is the co-maintainer of the NAND subsystem in the Linux kernel, will be attending this event. Through this participation, Bootlin is supporting the work done by its engineers acting as Linux kernel maintainers: they will have the chance to meet other kernel developers and discuss the current issues and future of storage-related subsystems. After the event, we will be reporting on our blog about the discussions that took place.