Publication of Linux graphics training materials

Back in June 2019, we announced the availability of a new training course, Displaying and rendering graphics with Linux. At the time of this announcement, the training materials were not available though.

Since then, Bootlin engineer Paul Kocialkowski has been very busy preparing those training materials, and has successfully delivered the first edition of this course to one of our customers in Spain early September. After taking the time to polish those training materials following this first course, we are now very happy to publish and share this 200+ slides deck, covering a wide range of graphics related topics:

  • Image and color representation
  • Basic drawing
  • Basic and advanced operations
  • Hardware aspects overview
  • Hardware for display
  • Hardware for rendering
  • Memory aspects
  • Performance aspects
  • Software aspects overview
  • Kernel components in Linux
  • Userspace components with Linux

See also the detailed agenda of this training course. The LaTeX source code for all our training materials, including this graphics training, is available in a Git repository. It is worth mentioning that this training only consists of slides and demos, and does not include practical labs done by the participants, in order to keep the training logistics manageable and the duration reasonably short (2 days).

Here are a few slides showing various aspects of this training course:

Graphics training

Graphics training

Graphics training

Graphics training

Graphics training

By publishing this training materials right after our first course, and under the Creative Commons CC-BY-SA license, Bootlin sticks to its commitment of publishing all its training materials under a free documentation license, to better spread the knowledge in the entire embedded Linux community.

We are available to deliver this Displaying and rendering graphics with Linux course anywhere in the world, at your location. Contact us for more details.

Bootlin contributes Linux DRM driver for LogicBricks logiCVC-ML IP

LogicBricks is a vendor of numerous IP blocks, ranging from display controllers, audio controllers, 3D accelerators and many other specialized IP blocks. Most of these IP blocks are designed to work with the Xilinx Zynq 7000 system-on-chip, which includes an FPGA area. And indeed, because the Zynq 7000 does not have a display controller, one of Bootlin customers has selected the LogicBricks logiCVC-ML IP to provide display support for their Zynq 7000 design.

logiCVC-ML

LogiBricks provide one driver based on the framebuffer subsystem and another one based on the DRM subsystem, but none of these drivers are in the upstream Linux kernel. Bootlin engineer Paul Kocialkowski worked on a clean DRM driver for this IP block, and submitted the first version to the upstream Linux kernel. We already received some useful comments on the Device Tree binding for this IP block, which is pretty elaborate due to the number of aspects/features that can be tuned at IP synthesis time, and we will of course take into account those comments and send new iterations of the patch series until it gets merged.

In the e-mail containing the driver patch itself, Paul gives a summary of the IP features that are supported and tested, and those that re either untested or unsupported:

Introduces a driver for the LogiCVC display controller, a programmable
logic controller optimized for use in Xilinx Zynq-7000 SoCs and other
Xilinx FPGAs. The controller is mostly configured at logic synthesis
time so only a subset of configuration is left for the driver to
handle.

The following features are implemented and tested:
- LVDS 4-bit interface;
- RGB565 pixel formats;
- Multiple layers and hardware composition;
- Layer-wide alpha mode;

The following features are implemented but untested:
- Other RGB pixel formats;
- Layer framebuffer configuration for version 4;
- Lowest-layer used as background color;
- Per-pixel alpha mode.

The following features are not implemented:
- YUV pixel formats;
- DVI, LVDS 3-bit, ITU656 and camera link interfaces;
- External parallel input for layer;
- Color-keying;
- LUT-based alpha modes.

Additional implementation-specific notes:
- Panels are only enabled after the first page flip to avoid flashing a
  white screen.
- Depth used in context of the LogiCVC driver only counts color components
  to match the definition of the synthesis parameters.

Support is implemented for both version 3 and 4 of the controller.

With version 3, framebuffers are stored in a dedicated contiguous
memory area, with a base address hardcoded for each layer. This requires
using a dedicated CMA pool registered at the base address and tweaking a
few offset-related registers to try to use any buffer allocated from
the pool. This is done on a best-effort basis to have the hardware cope
with the DRM framebuffer allocation model and there is no guarantee
that each buffer allocated by GEM CMA can be used for any layer.
In particular, buffers allocated below the base address for a layer are
guaranteed not to be configurable for that layer. See the implementation of
logicvc_layer_buffer_find_setup for specifics.

Version 4 allows configuring each buffer address directly, which
guarantees that any buffer can be configured.

More Improvements to Raspberry Pi Display Testing

Raspberry Pi Display Support and IGT

We have been working with Raspberry Pi for quite some time, especially on areas related to the display side. Our work is part of a larger ongoing effort to move away from using the VC4 firmware for display operations and use native Linux drivers instead, which interact with the hardware directly. This transition is a long process, which requires bringing the native drivers to a point where they are efficient and reliable enough to cover most use cases of Raspberry Pi users.

Continuous Integration (CI) plays an important role in that process, since it allows detecting regressions early in the development cycle. This is why we have been tasked with improving testing in IGT GPU Tools, the test suite for the DRM subsystem of the kernel (which handles display). We already presented the work we conducted for testing various pixel formats with IGT on the Raspberry Pi’s VC4 last year. Since then, we have continued the work on IGT and brought it even further.

Improving YUV and Adding Tiled Pixel Formats Support

We continued the work on pixel formats by generalizing support for YUV buffers and reworking the format conversion helpers to support most of the common YUV formats instead of a reduced number of them. This lead to numerous commits that were merged in IGT:

In the meantime, we have also added support for testing specific tiling modes for display buffers. Tiling modes indicate that the pixel data is laid out in a different fashion than the usual line-after-line linear raster order. It provides more efficient data access to the hardware and yields better performance. They are used by the GPU (T tiling) or the VPU (SAND tiling). This required introducing a few changes to IGT as well as adding helpers for converting to the tiling modes, which was done in the following commits:

DRM Planes Support

The display engine hardware used on the Raspberry Pi allows displaying multiple framebuffers on-screen, in addition to the primary one (where the user interface lives). This feature is especially useful to display video streams directly, without having to perform the composition step with the CPU or GPU. The display engine offers features such as colorspace conversion (for converting YUV to RGB) and scaling, which are usually quite intensive tasks. In the Linux kernel’s DRM subsystem, this ability of the display engine hardware is exposed through DRM planes.

Displaying multiple DRM planes

We worked on adding support for testing DRM planes with the Chamelium board, with a fuzzing test that selects randomized attributes for the planes. Our work lead to the introduction of a new test in IGT:

Dealing with Imperfect Outputs

With the Chamelium, there are two major ways of finding out whether the captured display is correct or not:

  • Comparing the captured frame’s CRC with a CRC calculated from the reference frame;
  • Comparing the pixels in the captured and reference frames.

While the first method is the fastest one (because the captured frame’s CRC is calculated by the Chamelium board directly), it can only work if the framebuffer and the reference are guaranteed to be pixel-perfect. Since HDMI is a digital interface, this is generally the case. But as soon as scaling or colorspace conversion is involved, the algorithms used by the hardware do not result in the exact same pixels as performing the operation on the reference with the CPU.

Because of this issue, we had to come up with a specific checking method that excludes areas where there are such differences. Since our display pattern resembles a colorful checkerboard with solid-filled areas, most of the differences are only noticeable at the edges of each color block. As a result, we introduced a checking method that excludes the checkerboard edges from the comparison.

Detecting the edges (in blue) of a multi-plane pattern

This method turned out to provide good results and very few incorrect results after some tweaking. It was contributed to IGT with commit:

Underrun Detection

We also worked on implementing display pipeline underrun detection in the kernel’s VC4 DRM driver. Underruns occur when too much pixel data is provided (e.g. because of too many DRM planes enabled) and the hardware can’t keep up. In addition, a bandwidth filter was also added to reject configurations that would likely lead to an underrun. This lead to a few commits that were already merged upstream:

We prepared tests in IGT to ensure that the underruns are correctly reported, that the bandwidth protection does its job and that both are consistent. This test was submitted for review with patch:

Bootlin at the X.org Developer Conference 2016

The X.org Foundation hosts every year around september the X.org Developer Conference, which, unlike its name states, is not limited to X.org developers, but gathers all the Linux graphics stack developers, including X.org, Mesa, wayland, and other graphics stacks like ChromeOS, Android or Tizen.

This year’s edition was held last week in the University of Haaga-Helia, in Helsinki. At Bootlin, we’ve had more and more developments on the graphic stack recently through the work we do on Atmel and NextThing Co’s C.H.I.P., so it made sense to attend.

XDC 2016 conference

There’s been a lot of very interesting talks during those three days, as can be seen in the conference schedule, but we especially liked a few of those:

DRM HWComposer – SlidesVideo

The opening talk was made by two Google engineers from the ChromeOS team, Sean Paul and Zach Reizner. They talked about the work they did on the drm_hwcomposer they wrote for the Pixel C, on Android.

The hwcomposer is one of the HAL in Android that interfaces between Surface Flinger, the display manager, and the underlying display driver. It aims at providing hardware composition features, so that Android can leverage the capacities of the display engine to perform compositions (through planes and sprites), without having to use the CPU or the GPU to do this work.

The drm_hwcomposer started out as yet another hwcomposer library implementation for the tegra-drm driver in Linux. While they implemented it, it turned into some generic enough implementation that should be useful for all the DRM drivers out there, and they even introduced some particularly nice features, to split the final screen content into several planes based on the actual displayed content rather than on windows like it’s usually done.

Their work also helped to point out a few flaws in the hwcomposer API, that will eventually be fixed in a new revision of that API.

ARC++ SlidesVideo

The next talk was once again from a ChromeOS engineer, David Reveman, who came to show his work on ARC++, the component in ChromeOS that allows to run Android applications. He was obviously mostly talking about the display side.

In order to achieve that, he had to implement an hwcomposer that would just act as a proxy between SurfaceFlinger and Wayland that is used on the ChromeOS side. The GL rendering is still direct though, and each Android application will talk directly to the GPU, as usual. Only the composition will be forwarded to the ChromeOS side.

In order to minimize that composition process, whenever possible, ARC++ tries to back each application with an overlay so that the composition would happen directly in hardware.

This also led to some interesting challenges, especially since some of the assumptions of both systems are in contradiction. For example, any application can be resized in ChromeOS, while it’s not really a thing in Android where all the applications run full screen.

HDR Displays in Linux – SlidesVideo

The next talk we found interesting was Andy Ritger from nVidia explaining how the HDR displays were supposed to be handled in Linux.

He first started by explaining what HDR is exactly. While the HDR is just about having a wider range of luminance than on a regular display, you often also get a wider gamut with HDR capable displays. This means that on those screens you can display a wider range of colors, and with a better range and precision in their intensity. And
while the applications have been able to generate HDR content for more than 10 years, the rest of the display stack wasn’t really ready, meaning that you had convert the HDR colors to colors that your monitor was able to display, using a technique called tone mapping.

He then explained than the standard, non-HDR colorspace, sRGB, is not a linear colorspace. This means than by doubling the encoded luminance of a color, you will not get a color twice brighter on your display. This was meant this way because the human eye is much more sensitive to the various shades of colors when they are dark than when they are bright. Which essentially means that the darker the color is, the more precision you want to get.

However, the luminance “resolution” on the HDR display is so good that you actually don’t need that anymore, and you can have a linear colorspace, which is in our case SCRGB.

But drawing blindly in all your applications in SCRGB is obviously not a good solution either. You have to make sure that your screen supports it (which is exposed through its EDIDs), but also that you actually tell your screeen to switch to it (through the infoframes). And that requires some support in the kernel drivers.

The Anatomy of a Vulkan Driver – SlidesVideo

This talk by Jason Ekstrand was some kind of a war story of the bring up Intel did of a Vulkan implementation on their GPU.

He first started by saying that it was actually a not so long project, especially when you consider that they wrote it from scratch, since it took roughly 3 full-time engineers 8 months to come up with a fully compliant and open source stack.

He then explained why Vulkan was needed. While OpenGL did amazingly well to cope with the hardware evolutions, it was still designed over 20 years ago, This proved to have some core characteristics that are not really relevant any more, and are holding the application developers back. For example, he mentioned that at its core, OpenGL is based on a singleton-based state machine, that obviously doesn’t scale well anymore on our SMP systems. He also mentioned that it was too abstracted, and people just wanted a lower level API, or that you might want to render things off screen without X or any context.

This was fixed in Vulkan by effectively removing the state machine, which allows it to scale, push things like the error checking or the synchronization directly to the applications, making the implementation much simpler and less layered which also simplifies the development and debugging.

He then went on to discuss how we could share the code that was still shared between the two implementations, like implementing OpenGL on top of Vulkan (which was discarded), having some kind of lighter intermediate language in Mesa to replace Gallium or just sharing through a library the common bits and making both the OpenGL and Vulkan libraries use that.

Motivating preemptive GPU scheduling for real-time systems – SlidesVideo

The last talk that we want to mention is the talk on preemptive scheduling by Roy Spliet, from the University of Cambridge.

More and more industries, and especially the automotive industry, offload some computations to the GPU for example to implement computer vision. This is then used in a car to implement the autonomous driving to make the car recognize signs or stay in its lane. And obviously, this kind of computations are supposed to be handled in a real time
system, since you probably don’t want your shiny user interface for the heating to make your car crash in the car before it because its rendering was taking too long.

He first started to explain what real time means, and what the usual metrics are, which should to no surprise to people used to “CPU based” real time systems: latency, deadline, execution time, and so on.

He then showed a bunch of benchmarks he used to test his preemptive scheduler, in a workload that was basically running OpenArena while running some computations, on various nouveau based platforms (both desktop-grade GPUs, and embedded SoCs).

This led to some expected conclusions, like the fact that a preemptive scheduler is indeed adding some overhead, but is on average worth it, while some have been quite interesting. He was for example observing some worst case latencies that were quite rare (0.3%), but were actually interferences from the display engine filling up its empty FIFOs, and creating some contention on the memory bus.

Conclusion

Overall, this has been a great experience. The organisation was flawless, and the one-track-only format allows you to meet easily both the speakers and attendees. The content was also highly technical, as you might expect, which made us learn a lot and led us to think about some interesting developments we could do on our various projects in the future, such as NextThing Co’s CHIP.