Feedback from the Netdev 2.1 conference

At Bootlin, we regularly work on networking topics as part of our Linux kernel contributions and thus we decided to attend our very first Netdev conference this year in Montreal. With the recent evolution of the network subsystem and its drivers capabilities, the conference was a very good opportunity to stay up-to-date, thanks to lots of interesting sessions.

Eric Dumazet presenting “Busypolling next generation”

The speakers and the Netdev committee did an impressive job by offering such a great schedule and the recorded talks are already available on the Netdev Youtube channel. We particularly liked a few of those talks.

Distributed Switch Architecture – slidesvideo

Andrew Lunn, Viven Didelot and Florian Fainelli presented DSA, the Distributed Switch Architecture, by giving an overview of what DSA is and by then presenting its design. They completed their talk by discussing the future of this subsystem.

DSA in one slide

The goal of the DSA subsystem is to support Ethernet switches connected to the CPU through an Ethernet controller. The distributed part comes from the possibility to have multiple switches connected together through dedicated ports. DSA was introduced nearly 10 years ago but was mostly quiet and only recently came back to life thanks to contributions made by the authors of this talk, its maintainers.

The main idea of DSA is to reuse the available internal representations and tools to describe and configure the switches. Ports are represented as Linux network interfaces to allow the userspace to configure them using common tools, the Linux bridging concept is used for interface bridging and the Linux bonding concept for port trunks. A switch handled by DSA is not seen as a special device with its own control interface but rather as an hardware accelerator for specific networking capabilities.

DSA has its own data plane where the switch ports are slave interfaces and the Ethernet controller connected to the SoC a master one. Tagging protocols are used to direct the frames to a specific port when coming from the SoC, as well as when received by the switch. For example, the RX path has an extra check after netif_receive_skb() so that if DSA is used, the frame can be tagged and reinjected into the network stack RX flow.

Finally, they talked about the relationship between DSA and Switchdev, and cross-chip configuration for interconnected switches. They also exposed the upcoming changes in DSA as well as long term goals.

Memory bottlenecks – slides

As part of the network performances workshop, Jesper Dangaard Brouer presented memory bottlenecks in the allocators caused by specific network workloads, and how to deal with them. The SLAB/SLUB baseline performances are found to be too slow, particularly when using XDP. A way from a driver to solve this issue is to implement a custom page recycling mechanism and that’s what all high-speed drivers do. He then displayed some data to show why this mechanism is needed when targeting the 10G network budget.

Jesper is working on a generic solution called page pool and sent a first RFC at the end of 2016. As mentioned in the cover letter, it’s still not ready for inclusion and was only sent for early reviews. He also made a small overview of his implementation.

DDOS countermeasures with XDP – slides #1slides #2 – video #1video #2

These two talks were given by Gilberto Bertin from Cloudflare and Martin Lau from Facebook. While they were not talking about device driver implementation or improvements in the network stack directly related to what we do at Bootlin, it was nice to see how XDP is used in production.

XDP, the eXpress Data Path, provides a programmable data path at the lowest point of the network stack by processing RX packets directly out of the drivers’ RX ring queues. It’s quite new and is an answer to lots of userspace based solutions such as DPDK. Gilberto andMartin showed excellent results, confirming the usefulness of XDP.

From a driver point of view, some changes are required to support it. RX hooks must be added as well as some API changes and the driver’s memory model often needs to be updated. So far, in v4.10, only a few drivers are supporting XDP.

XDP MythBusters – slides – video

David S. Miller, the maintainer of the Linux networking stack and drivers, did an interesting keynote about XDP and eBPF. The eXpress Data Path clearly was the hot topic of this Netdev 2.1 conference with lots of talks related to the concept and David did a good overview of what XDP is, its purposes, advantages and limitations. He also quickly covered eBPF, the extended Berkeley Packet Filters, which is used in XDP to filter packets.

This presentation was a comprehensive introduction to the concepts introduced by XDP and its different use cases.

Conclusion

Netdev 2.1 was an excellent experience for us. The conference was well organized, the single track format allowed us to see every session on the schedule, and meeting with attendees and speakers was easy. The content was highly technical and an excellent opportunity to stay up-to-date with the latest changes of the networking subsystem in the kernel. The conference hosted both talks about in-kernel topics and their use in userspace, which we think is a very good approach to not focus only on the kernel side but also to be aware of the users needs and their use cases.

Bootlin at the Netdev 2.1 conference

Netdev 2.1 is the fourth edition of the technical conference on Linux networking. This conference is driven by the community and focus on both the kernel networking subsystems (device drivers, net stack, protocols) and their use in user-space.

This edition will be held in Montreal, Canada, April 6 to 8, and the schedule has been posted recently, featuring amongst other things a talk giving an overview and the current status display of the Distributed Switch Architecture (DSA) or a workshop about how to enable drivers to cope with heavy workloads, to improve performances.

At Bootlin, we regularly work on networking related topics, especially as part of our Linux kernel contribution for the support of Marvell or Annapurna Labs ARM SoCs. Therefore, we decided to attend our first Netdev conference to stay up-to-date with the network subsystem and network drivers capabilities, and to learn from the community latest developments.

Our engineer Antoine Ténart will be representing Bootlin at this event. We’re looking forward to being there!

Bootlin at the Embedded Linux Conference 2017

Last month, five engineers from Bootlin participated to the Embedded Linux Conference in Portlan, Oregon. It was once again a great conference to learn new things about embedded Linux and the Linux kernel, and to meet developers from the open-source community.

Bootlin team at work at ELC 2017, with Maxime Ripard, Antoine Ténart, Mylène Josserand and Quentin Schulz

Bootlin talks

Bootlin CEO Michael Opdenacker gave a talk on Embedded Linux Size Reduction techniques, for which the slides and video are available:

Bootlin engineer Quentin Schulz gave a talk on Power Management Integrated Circuits: Keep the Power in Your Hands, the slides and video are also available:

Bootlin selection of talks

Of course, the slides from many other talks are progressively being uploaded, and the Linux Foundation published the video recordings in a record time: they are all already available on Youtube!

Below, each Bootlin engineer who attended the conference has selected one talk he/she has liked, and gives a quick summary of the talk, hopefully to encourage you watch the corresponding video recording.

Using SWupdate to Upgrade your system, Gabriel Huau

Talk selected by Mylène Josserand.

Gabriel Huau from Witekio did a great talk at ELC about SWUpdate, a tool created by Denx to update your system. The talk gives an overview of this tool, how it is working and how to use it. Updating your system is very important for embedded devices to fix some bugs/security fixes or add new features, but in an industrial context, it is sometimes difficult to perform an update: devices not easily accessible, large number of devices and variants, etc. A tool that can update the system automatically or even Over The Air (OTA) can be very useful. SWUpdate is one of them.

SWUpdate allows to update different parts of an embedded system such as the bootloader, the kernel, the device tree, the root file system and also the application data.
It handles different image types: UBI, MTD, Raw, Custom LUA, u-boot environment and even your custom one. It includes a notifier to be able to receive feedback about the updating process which can be useful in some cases. SWUPdate uses different local and OTA/remote interfaces such as USB, SD card, HTTP, etc. It is based on a simple update image format to indicate which images must be updated.

Many customizations can be done with this tool as it is provided with the classic menuconfig configuration tool. One great thing is that this tool is supported by Yocto Project and Buildroot so it can be easily tested.

Do not hesitate to have a look to his slides, the video of his talk or directly test SWUpdate!

GCC/Clang Optimizations for embedded Linux, Khem Raj

Talk selected by Michael Opdenacker.

Khem Raj from Comcast is a frequent speaker at the Embedded Linux Conference, and one of his strong fields of expertise is C compilers, especially LLVM/Clang and Gcc. His talk at this conference can interest anyone developing code in the C language, to know about optimizations that the compilers can use to improve the performance or size of generated binaries. See the video and slides.

Khem Raj slide about compiler optimization optionsOne noteworthy optimization is Clang’s -Oz (Gcc doesn’t have it), which goes even beyond -Os, by disabling loop vectorization. Note that Clang already performs better than Gcc in terms of code size (according to our own measurements). On the topic of bundle optimizations such as -O2 or -Os, Khem added that specific optimizations can be disabled in both compilers through the -fno- command line option preceding the name of a given optimization. The name of each optimization in a given bundle can be found through the -fverbose-asm command line option.

Another new optimization option is -Og, which is different from the traditional -g option. It still allows to produce code that can be debugged, but in a way that provides a reasonable level of runtime performance.

On the performance side, he also recalled the Feedback-Directed Optimizations (FDO), already covered in earlier Embedded Linux Conferences, which can be used to feed the compiler with profiler statistics about code branches. The compiler can use such information to optimize branches which are the more frequent at run-time.

Khem’s last advise was not to optimize too early, and first make sure you do your debugging and profiling work first, as heavily optimized code can be very difficult to debug. Therefore, optimizations are for well-proven code only.

Note that Khem also gave a similar talk in the IoT track for the conference, which was more focused on bare-metal code optimization code and portability: “Optimizing C for microcontrollers” (slides, video).

A Journey through Upstream Atomic KMS to Achieve DP Compliance, Manasi Navare

Talk selected by Quentin Schulz.

This talk was about the journey of a new comer in the mainline kernel community to fix the DisplayPort support in Intel i915 DRM driver. It first presented what happens from the moment we plug a cable in a monitor until we actually see an image, then where the driver is in the kernel: in the DRM subsystem, between the hardware (an Intel Integrated Graphics device) and the libdrm userspace library on which userspace applications such as the X server rely.

The bug to fix was that case when the driver would fail after updating to the requested resolution for a DP link. The other existing drivers usually fail before updating the resolution, so Manasi had to add a way to tell the userspace the DP link failed after updating the resolution. Such addition would be useless without applications using this new information, therefore she had to work with their developers to make the applications behave correctly when reading this important information.

With a working set of patches, she thought she had done most of the work with only the upstreaming left and didn’t know it would take her many versions to make it upstream. She wished to have sent a first version of a driver for review earlier to save time over the whole development plus upstreaming process. She also had to make sure the changes in the userspace applications will be ready when the driver will be upstreamed.

The talk was a good introduction on how DisplayPort works and an excellent example on why involving the community even in early stages of the development process may be a good idea to quicken the overall driver development process by avoiding complete rewriting of some code parts when upstreaming is under way.

See also the video and slides of the talk.

Timekeeping in the Linux Kernel, Stephen Boyd

Talk selected by Maxime Ripard.

Stephen did a great talk about one thing that is often overlooked, and really shouldn’t: Timekeeping. He started by explaining the various timekeeping mechanisms, both in hardware and how Linux use them. That meant covering the counters, timers, the tick, the jiffies, and the various POSIX clocks, and detailing the various frameworks using them. He also explained the various bugs that might be encountered when having a too naive counter implementation for example, or using the wrong POSIX clock from an application.

See also the video and slides of the talk.

Android Things, Karim Yaghmour

Talk selected by Antoine Ténart

Karim did a very good introduction to Android Things. His talk was a great overview of what this new OS from Google targeting embedded devices is, and where it comes from. He started by showing the history of Android, and he explained what this system brought to the embedded market. He then switched to the birth of Android Things; a reboot of Google’s strategy for connected devices. He finally gave an in depth explanation of the internals of this new OS, by comparing Android Things and Android, with lots of examples and demos.

Android Things replaces Brillo / Weave, and unlike its predecessor is built reusing available tools and services. It’s in fact a lightweight version of Android, with many services removed and a few additions like the PIO API to drive GPIO, I2C, PWM or UART controllers. A few services were replaced as well, most notably the launcher. The result is a not so big, but not so small, system that can run on headless devices to control various sensors; with an Android API for application developers.

See also the video and slides of the talk.

Bootlin at Linux.conf.au, January 2017

Linux.conf.au, which takes place every year in January in Australia or New Zealand, is a major event of the Linux community. Bootlin already participated to this event three years ago, and will participate again to this year’s edition, which will take place from January 16 to January 20 2017 in Hobart, Tasmania.

Linux Conf Australia 2017

This time, Bootlin CTO Thomas Petazzoni will give a talk titled A tour of the ARM architecture and its Linux support, in which he will share with LCA attendees what is the ARM architecture, how its Linux support is working, what the numerous variants of ARM processors and boards mean, what is the Device Tree, the ARM specific bootloaders, and more.

Linux.conf.au also features a number of other kernel related talks, such as the Kernel Report from Jonathan Corbet, Linux Kernel memory ordering: help arrives at last from Paul E. McKenney. The list of conferences is very impressive, and the event also features a number of miniconfs, including one on the Linux kernel.

If some of our readers located in Australia, New Zealand or neighboring countries plan on attending the conference, do not hesitate to drop us a mail so that we can meet during the event!

Slides and videos from the Embedded Linux Conference Europe 2016

Last month, the entire Bootlin engineering team attended the Embedded Linux Conference Europe in Berlin. The slides and videos of the talks have been posted, including the ones from the seven talks given by Bootlin engineers:

  • Alexandre Belloni presented on ASoC: Supporting Audio on an Embedded Board, slides and video.
  • Boris Brezillon presented on Modernizing the NAND framework, the big picture, slides and video.
  • Boris Brezillon, together with Richard Weinberger from sigma star, presented on Running UBI/UBIFS on MLC NAND, slides and video.
  • Grégory Clement presented on Your newer ARM64 SoC Linux check list, slides and video.
  • Thomas Petazzoni presented on Anatomy of cross-compilation toolchains, slides and video.
  • Maxime Ripard presented on Supporting the camera interface on the C.H.I.P, slides and video.
  • Quentin Schulz and Antoine Ténart presented on Building a board farm: continuous integration and remote control, slides and video.

Bootlin at the X.org Developer Conference 2016

The X.org Foundation hosts every year around september the X.org Developer Conference, which, unlike its name states, is not limited to X.org developers, but gathers all the Linux graphics stack developers, including X.org, Mesa, wayland, and other graphics stacks like ChromeOS, Android or Tizen.

This year’s edition was held last week in the University of Haaga-Helia, in Helsinki. At Bootlin, we’ve had more and more developments on the graphic stack recently through the work we do on Atmel and NextThing Co’s C.H.I.P., so it made sense to attend.

XDC 2016 conference

There’s been a lot of very interesting talks during those three days, as can be seen in the conference schedule, but we especially liked a few of those:

DRM HWComposer – SlidesVideo

The opening talk was made by two Google engineers from the ChromeOS team, Sean Paul and Zach Reizner. They talked about the work they did on the drm_hwcomposer they wrote for the Pixel C, on Android.

The hwcomposer is one of the HAL in Android that interfaces between Surface Flinger, the display manager, and the underlying display driver. It aims at providing hardware composition features, so that Android can leverage the capacities of the display engine to perform compositions (through planes and sprites), without having to use the CPU or the GPU to do this work.

The drm_hwcomposer started out as yet another hwcomposer library implementation for the tegra-drm driver in Linux. While they implemented it, it turned into some generic enough implementation that should be useful for all the DRM drivers out there, and they even introduced some particularly nice features, to split the final screen content into several planes based on the actual displayed content rather than on windows like it’s usually done.

Their work also helped to point out a few flaws in the hwcomposer API, that will eventually be fixed in a new revision of that API.

ARC++ SlidesVideo

The next talk was once again from a ChromeOS engineer, David Reveman, who came to show his work on ARC++, the component in ChromeOS that allows to run Android applications. He was obviously mostly talking about the display side.

In order to achieve that, he had to implement an hwcomposer that would just act as a proxy between SurfaceFlinger and Wayland that is used on the ChromeOS side. The GL rendering is still direct though, and each Android application will talk directly to the GPU, as usual. Only the composition will be forwarded to the ChromeOS side.

In order to minimize that composition process, whenever possible, ARC++ tries to back each application with an overlay so that the composition would happen directly in hardware.

This also led to some interesting challenges, especially since some of the assumptions of both systems are in contradiction. For example, any application can be resized in ChromeOS, while it’s not really a thing in Android where all the applications run full screen.

HDR Displays in Linux – SlidesVideo

The next talk we found interesting was Andy Ritger from nVidia explaining how the HDR displays were supposed to be handled in Linux.

He first started by explaining what HDR is exactly. While the HDR is just about having a wider range of luminance than on a regular display, you often also get a wider gamut with HDR capable displays. This means that on those screens you can display a wider range of colors, and with a better range and precision in their intensity. And
while the applications have been able to generate HDR content for more than 10 years, the rest of the display stack wasn’t really ready, meaning that you had convert the HDR colors to colors that your monitor was able to display, using a technique called tone mapping.

He then explained than the standard, non-HDR colorspace, sRGB, is not a linear colorspace. This means than by doubling the encoded luminance of a color, you will not get a color twice brighter on your display. This was meant this way because the human eye is much more sensitive to the various shades of colors when they are dark than when they are bright. Which essentially means that the darker the color is, the more precision you want to get.

However, the luminance “resolution” on the HDR display is so good that you actually don’t need that anymore, and you can have a linear colorspace, which is in our case SCRGB.

But drawing blindly in all your applications in SCRGB is obviously not a good solution either. You have to make sure that your screen supports it (which is exposed through its EDIDs), but also that you actually tell your screeen to switch to it (through the infoframes). And that requires some support in the kernel drivers.

The Anatomy of a Vulkan Driver – SlidesVideo

This talk by Jason Ekstrand was some kind of a war story of the bring up Intel did of a Vulkan implementation on their GPU.

He first started by saying that it was actually a not so long project, especially when you consider that they wrote it from scratch, since it took roughly 3 full-time engineers 8 months to come up with a fully compliant and open source stack.

He then explained why Vulkan was needed. While OpenGL did amazingly well to cope with the hardware evolutions, it was still designed over 20 years ago, This proved to have some core characteristics that are not really relevant any more, and are holding the application developers back. For example, he mentioned that at its core, OpenGL is based on a singleton-based state machine, that obviously doesn’t scale well anymore on our SMP systems. He also mentioned that it was too abstracted, and people just wanted a lower level API, or that you might want to render things off screen without X or any context.

This was fixed in Vulkan by effectively removing the state machine, which allows it to scale, push things like the error checking or the synchronization directly to the applications, making the implementation much simpler and less layered which also simplifies the development and debugging.

He then went on to discuss how we could share the code that was still shared between the two implementations, like implementing OpenGL on top of Vulkan (which was discarded), having some kind of lighter intermediate language in Mesa to replace Gallium or just sharing through a library the common bits and making both the OpenGL and Vulkan libraries use that.

Motivating preemptive GPU scheduling for real-time systems – SlidesVideo

The last talk that we want to mention is the talk on preemptive scheduling by Roy Spliet, from the University of Cambridge.

More and more industries, and especially the automotive industry, offload some computations to the GPU for example to implement computer vision. This is then used in a car to implement the autonomous driving to make the car recognize signs or stay in its lane. And obviously, this kind of computations are supposed to be handled in a real time
system, since you probably don’t want your shiny user interface for the heating to make your car crash in the car before it because its rendering was taking too long.

He first started to explain what real time means, and what the usual metrics are, which should to no surprise to people used to “CPU based” real time systems: latency, deadline, execution time, and so on.

He then showed a bunch of benchmarks he used to test his preemptive scheduler, in a workload that was basically running OpenArena while running some computations, on various nouveau based platforms (both desktop-grade GPUs, and embedded SoCs).

This led to some expected conclusions, like the fact that a preemptive scheduler is indeed adding some overhead, but is on average worth it, while some have been quite interesting. He was for example observing some worst case latencies that were quite rare (0.3%), but were actually interferences from the display engine filling up its empty FIFOs, and creating some contention on the memory bus.

Conclusion

Overall, this has been a great experience. The organisation was flawless, and the one-track-only format allows you to meet easily both the speakers and attendees. The content was also highly technical, as you might expect, which made us learn a lot and led us to think about some interesting developments we could do on our various projects in the future, such as NextThing Co’s CHIP.

Bootlin at the Kernel Recipes conference

Kernel RecipesThe 2016 edition of the Kernel Recipes conference will take place from September 28th to 30th in Paris. With talks from kernel developers Jonathan Corbet, Greg Kroah-Hartmann, Daniel Vetter, Laurent Pinchart, Tejun Heo, Steven Rosdedt, Kevin Hilman, Hans Verkuil and many others, the schedule looks definitely very appealing, and indeed the event is now full.

Thomas Petazzoni, Bootlin CTO, will be attending this event. If you’re interested in discussing business or career opportunities with Bootlin, this event will be a great place to meet together.

Bootlin at the X Developer Conference

The next X.org Developer Conference will take place on September 21 to September 23 in Helsinki, Finland. This is a major event for Linux developers working in the graphics/display areas, not only at the X.org level, but also at the kernel level, in Mesa, and other related projects.

Bootlin engineer Maxime Ripard will be attending this conference, with 80+ other engineers from Intel, Google, NVidia, Texas Instruments, AMD, RedHat, etc.

Maxime is the author of the DRM/KMS driver in the upstream Linux kernel for the Allwinner SoCs, which provides display support for numerous Allwinner platforms, especially Nextthing’s CHIP (with parallel LCD support, HDMI support, VGA support and composite video support). Maxime has also worked on making the 3D acceleration work on this platform with a mainline kernel, by adapting the Mali kernel driver. Most recently, Maxime has been involved in Video4Linux development, writing a driver for the camera interface of Allwinner SoCs, and supervising Florent Revest work on the Allwinner VPU that we published a few days ago.

Bootlin at the Embedded Linux Conference Europe

The next Embedded Linux Conference Europe will take place from October 11 to October 13 in Berlin, Germany. As usual, the entire Bootlin engineering team will participate, which means this time 10 participants from Bootlin!

Embedded Linux Conference Europe 2016

The schedule for the conference has been published recently, and a number of our talk proposals have been accepted, so we will present on the following topics:

Like every year, we’re looking forward to attending this conference, and meeting all the nice folks of the Embedded Linux community!

“Understanding D-Bus” talk at the Toulouse Embedded Linux Meetup

A few months ago, in May, Bootlin engineer Mylène Josserand presented a talk titled Understanding D-Bus at the Toulouse Embedded Linux and Android meetup.

In this talk, Mylène shared her experience working with D-Bus, especially in conjunction with the OFono and Connman projects, to support modem and 3G connections on embedded Linux systems.

Understanding D-Bus

We are now publishing the slides of Mylène’s talk, they are available in PDF format.