Delivery of Allwinner VPU driver main goals

With a few weeks of delay, we are proud to announce the delivery of the main goals of our crowdfunding campaign dedicated to adding upstream Linux support for the Allwinner video decoding hardware.

After several months of hard work by Bootlin engineer Maxime Ripard and intern Paul Kocialkowski, we now have a working demo of Kodi running with our VPU driver on top of a mainline 4.18-rc kernel. Both MPEG2 and H264 are supported, with a fully-optimized pipeline between the VPU and the display side that does not involve any buffer copy or extra transformation that the hardware cannot offload. These results were possible thanks to the previous efforts carried out by the linux-sunxi community, and especially the libvdpau-sunxi project.

The Cedrus VPU driver running on the A33 and H3

Here are the main goals defined in our crowdfunding campaign, which we promised to deliver by the end of June 2018, and their status in our delivery:

Making sure that the codec works on the older Allwinner SoCs: A10, A13, A20, A33, R8 and R16. This goal is fully met, with more features than planned: the Cedrus driver was brought up on the A10, A13, A20, A33 and H3. We therefore included H3 support in this delivery, even though it was originally only part of one of the stretch goals. The R8 is the same as an A13 and the R16 is the same as an A33, so they are supported as well.
Polishing the existing MPEG2 decoding support to make it fully production ready. This goal is fully met: we have done much more testing of the MPEG2 decoding, and both the Linux kernel code and the user-space code supporting MPEG2 have been significantly improved and cleaned up.
Implementing H264 video decoding, since H264 is by far one of the most popular video codecs. This goal is fully met: H264 decoding support has been added to both the Linux kernel driver and the user-space library, including high-profile H264 support. However, the H264 support is still very recent and we expect that additional debugging and improvements will be needed.
Modifying the Allwinner display driver in order to be able to directly display the decoded frames instead of converting and copying those frames. This goal is fully met: the Allwinner DRM driver has received a number of patches to ensure we can use one of the several planes to directly display the video frames in the format provided by the VPU. Support for hardware scaling has also been fixed to work properly. Those patches have already been contributed to the upstream Linux kernel. The work on the A20 and A33 display driver was done by Bootlin, while the work on the H3 was done by other developers of the community.
Providing a user-space library easy to integrate in the popular open-source video players. This goal is partially met: while we are providing a libva-v4l2-request user-space library that can in theory be used by all libva-capable video players, the actual integration with video players is for now only working completely with Kodi. We have started efforts to make it work with both VLC and GStreamer, but the work has not been completed due to various challenges detailed below. This area was definitely much more challenging than we initially expected.
Upstreaming those changes to the official Linux kernel. This goal is almost met: we have posted 5 iterations of the Cedrus Linux kernel driver, each time using new versions of the Request API patches, helping improve this API along the way. While our patches have not been merged yet, because the Request API itself hasn’t been merged, they have received significant review from the V4L developers, and we believe our patches are not far from being merged.

All in all, despite the numerous challenges encountered over the last few months, we are happy to see that we have been able to deliver most of the goals completely, and we are not too far off for the few goals that haven’t yet been fully met. As we will discuss below, we will continue to work in the next months on completing those unfinished steps, and on the stretch goals that received enough funding.

Reaching this level of support was not a straightforward journey, as our road was paved with various obstacles that are presented below.

Media Request API

In order to add support for the VPU found on Allwinner platforms, some internal plumbing is necessary in the Video4Linux2 (V4L2) framework, the video framework in Linux. While V4L2 gained support for a specific class of VPUs, so-called “stateful” (where the video bitstream is passed directly to the hardware controller) thanks to the Memory2Memory API, this is not sufficient for our hardware. Indeed, Allwinner platforms come with a “stateless” VPU, where the video needs to be parsed beforehand to extract the frame data and its associated metadata, which are then passed to the hardware. V4L2 lacked an API for synchronizing the frame data and associated metadata, although one had been in development for a long time under the name Request API.

Our work on Cedrus helped revive interest in this API, whose development accelerated over the past months thanks to the commitment of individuals such as Alexandre Courbot, Hans Verkuil and Sakari Ailus. We had the opportunity to report various issues and suggest fixes during its development, and these were integrated so that all the required bits for our driver are now in. The API is finally mature and appears to be quite stable, so there is no known blocker left for its integration in the kernel.
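To make the synchronization problem more concrete, here is a small toy model in Python. It is only an illustration of the idea of a request (one frame's data grouped with the metadata that belongs to it) and not the kernel API: the `Request` and `StatelessDecoderModel` classes, the control names and the output strings are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Request:
    """Hypothetical stand-in for a request: one frame's data plus its metadata."""
    slice_data: Optional[bytes] = None
    controls: Dict[str, int] = field(default_factory=dict)


class StatelessDecoderModel:
    """Toy decoder that only accepts complete requests and runs them in order."""

    # Illustrative metadata names, not actual V4L2 control identifiers.
    REQUIRED_CONTROLS = ("width", "height", "picture_type")

    def __init__(self) -> None:
        self.queue: List[Request] = []

    def queue_request(self, request: Request) -> None:
        # A stateless decoder cannot parse the bitstream itself, so a request
        # lacking either the frame data or its metadata is refused up front.
        if request.slice_data is None:
            raise ValueError("request has no frame data")
        missing = [n for n in self.REQUIRED_CONTROLS if n not in request.controls]
        if missing:
            raise ValueError(f"request is missing metadata: {missing}")
        self.queue.append(request)

    def run(self) -> List[str]:
        # Each frame is processed with exactly the metadata queued alongside it.
        results = []
        while self.queue:
            req = self.queue.pop(0)
            c = req.controls
            results.append(
                f"{c['width']}x{c['height']} type={c['picture_type']} "
                f"bytes={len(req.slice_data)}"
            )
        return results
```

Because the metadata travels inside the same object as the frame data, it cannot be applied to the wrong frame when several frames are queued ahead of time, which is the property the text above says V4L2 was missing.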

Cedrus V4L2 Driver

The first version of the Cedrus driver originally developed in 2016 by Florent Revest as part of an internship at Bootlin was based on an old version of the Request API. We therefore started by porting it to the latest version of the API and kept publishing new revisions as development of the Request API happened. We also received useful feedback from the community in the process. Here are the different iterations of the Cedrus driver that have been sent as part of this crowdfunded effort:

version 1, March 9, 2018
version 2, April 19, 2018
version 3, May 7, 2018
version 4, June 18, 2018
version 5, July 10, 2018

In addition to those patch series adding the driver itself, an additional patch series was sent to bring H264 support.

The development of the driver itself was not the most cumbersome part of the process, although it brought some challenges. For instance, we had to rework buffer management after discovering a limitation in the hardware, where the luminance and chrominance planes of our destination buffers need to be kept close in memory. We also had to bring in a workqueue (later replaced by a threaded IRQ) for the needs of the M2M API, which comes with performance drawbacks, although this issue is in the process of being resolved.
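As an illustration of what keeping both planes close can mean, the sketch below computes the layout of an NV12 frame (a full-resolution luminance plane followed by an interleaved, vertically subsampled chrominance plane) inside a single contiguous allocation. This is one possible approach and not necessarily what the driver does; the function name, the `Nv12Layout` type and the 32-byte stride alignment are assumptions made for the example.

```python
from typing import NamedTuple


class Nv12Layout(NamedTuple):
    luma_offset: int
    luma_size: int
    chroma_offset: int
    chroma_size: int
    total_size: int


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def nv12_single_allocation(width: int, height: int,
                           stride_alignment: int = 32) -> Nv12Layout:
    """Place the chroma plane right after the luma plane in one buffer.

    The stride alignment is an assumed value for illustration only.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    stride = align_up(width, stride_alignment)
    luma_size = stride * height
    # NV12 chroma: interleaved Cb/Cr samples, same bytes per row as luma,
    # but only half as many rows (rounded up for odd heights).
    chroma_size = stride * ((height + 1) // 2)
    return Nv12Layout(
        luma_offset=0,
        luma_size=luma_size,
        chroma_offset=luma_size,
        chroma_size=chroma_size,
        total_size=luma_size + chroma_size,
    )
```

For example, `nv12_single_allocation(720, 576)` rounds the stride up to 736 bytes and places the chrominance plane at offset 423936, immediately after the luminance plane.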

Standalone Testing

In order to test the VPU driver in a fully-controlled environment, we developed a standalone testing tool: v4l2-request-test (formerly cedrus-frame-test) that implements all the V4L2 userspace APIs needed for our VPU, including M2M and the Request API. This tool includes frame data and metadata dumps from actual videos, with the ability to decode these frames one-by-one. The tool was tremendously helpful for debugging the driver as well as adding support for H264. Since the userspace APIs involved properly abstract the hardware, this tool can be used to bring up and develop other VPU drivers that rely on the V4L2 Request API!

VAAPI Backend

In order to provide integration with actual video players, we developed libva-v4l2-request (formerly libva-cedrus): a VAAPI backend that supports the V4L2 M2M and Request APIs. It currently supports both MPEG2 and H264 and will be extended as support for new formats is added. Just like v4l2-request-test, libva-v4l2-request aims at using the kernel APIs involved in a generic way, that should suit other Request API-based VPU drivers.

In the long run, it is likely that players will integrate direct support for the Request API (for instance, through ffmpeg). In the meantime, this allows interfacing with media players through two major interfaces: buffer derivation where the destination frames are copied (and converted to a regular image format when the VPU cannot do it on its own) or dma-buf, without any copy.

Zero-copy Pipeline Integration with EGL (Mali GPUs): VLC and GStreamer

In order to reach the best performance we can achieve, we focused on pipelines where no buffer copy is involved, on popular players: VLC and GStreamer. Since the X.org display server does not easily permit piping the VPU output to a dedicated plane on the Display Engine side, we investigated the use of the GPU. GPU support on Allwinner platforms still requires proprietary blobs at this point, such as the ones recently made available by Bootlin. We hope that the Lima project will soon bring a fully free alternative that will be integrated with both upstream kernel and upstream userspace components.

We did not have much luck when dealing with the tiled VPU output format, which the GPU cannot handle directly. Although we wrote a GPU shader for untiling (which works properly with regular GL implementations), the Mali GPU blobs did not behave as expected when it came to importing the tiled output frame. There is a chance that platforms that can output a regular image format (A33 and onwards) will be able to pipe the VPU output to the GPU for accelerated scaling and colorspace conversion, but we have not tested this option at this point.
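For readers unfamiliar with tiled formats, the following CPU-side sketch shows what untiling a single plane involves. It is not the GPU shader mentioned above: it assumes a layout where tiles are stored one after another from left to right and top to bottom, with the rows of each tile stored linearly inside it. The function name and the default 32x32 tile size are assumptions for the example, and the plane dimensions are required to be multiples of the tile size to keep it short.

```python
def untile_plane(tiled: bytes, width: int, height: int,
                 tile_w: int = 32, tile_h: int = 32) -> bytes:
    """Convert a tiled plane (one byte per sample) to a linear, row-major plane."""
    if width % tile_w or height % tile_h:
        raise ValueError("plane dimensions must be multiples of the tile size")
    if len(tiled) != width * height:
        raise ValueError("unexpected buffer size")

    tiles_per_row = width // tile_w
    linear = bytearray(width * height)

    for y in range(height):
        # Which row of tiles this line belongs to, and which row inside the tile.
        tile_y, in_y = divmod(y, tile_h)
        for tile_x in range(tiles_per_row):
            tile_index = tile_y * tiles_per_row + tile_x
            # Start of the wanted row inside the source tile.
            src = tile_index * tile_w * tile_h + in_y * tile_w
            # Position of that same row segment in the linear output.
            dst = y * width + tile_x * tile_w
            linear[dst:dst + tile_w] = tiled[src:src + tile_w]

    return bytes(linear)
```

A shader would perform the same address computation in the other direction, computing for each output pixel where to sample in the tiled buffer.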

Zero-copy Pipeline Integration with DRM (Display Engine): GStreamer and Kodi

Although involving the GPU in the pipeline was not a realistic possibility with the tiled VPU output format, various players support a direct DRM video output, that uses the Display Engine directly to pipe the video. Alas, it means that no window composition is possible, so this cannot be integrated with desktop environments. Instead, the players run standalone in their own virtual terminal.

We initially looked at using GStreamer this way but soon decided to prioritize Kodi (formerly XBMC), the popular mediacenter application. It was a struggle to integrate our pipeline (through libva-v4l2-request, via ffmpeg) in Kodi, although DRM video output support was there already. We eventually managed to get a usable result out of it, although there are areas left to improve!

LibreELEC Image Release with Kodi

In order to showcase the delivery of our main VPU crowdfunding campaign goals, we cooked a release of LibreELEC that supports the A20, A33 and H3 SoCs! It consists of a LibreELEC root filesystem (excluding the kernel and boot software) that works in conjunction with our latest linux-cedrus kernel tree.

Source code is of course available through our repositories, marked with the release-2018-07 tag.
Instructions to deploy the software on a compatible board are available on the linux-sunxi community wiki!

Remaining Tasks

We have tackled many of the tasks on our plate at this point, but there are still items that need to be worked on:

posting new series of the Cedrus driver and H264 support until it is merged;
supporting H265 in our driver and userspace components;
supporting the ARM64 SoCs that come with version 2 of the Display Engine design, namely the H5 and A64;
contributing to the integration of our code in upstream Kodi and LibreELEC;
integrating a dma-buf and DRM pipeline with GStreamer.

Thanks

We would like to thank all the individuals and companies who have supported this project by participating in our crowdfunding campaign. We also thank the linux-sunxi community members who did the initial reverse engineering of the Cedrus VPU and who worked with us during the development of this driver, as well as the members of the V4L2 community who worked on the Request API and reviewed our patches.

Author: Paul Kocialkowski

Paul is a kernel and embedded Linux engineer at Bootlin, which he joined in 2018.

16 thoughts on “Delivery of Allwinner VPU driver main goals”

skxo says:

July 22, 2018 at 7:39 am

Nice work! Congrats

igraltist says:

July 22, 2018 at 1:55 pm

Thanks for this great work.

Ning says:

July 23, 2018 at 3:32 am

great work. thanks Bootlin and the team.

Kevin says:

July 23, 2018 at 10:42 am

Awesome work, congratulations to all folks involved!

Sam says:

July 23, 2018 at 12:46 pm

Nice. Looking forward to testing it.

For your next project could I suggest looking at video acceleration in Firefox in order to take more advantage of this work. https://bugzilla.mozilla.org/show_bug.cgi?id=563206 . There is already some money pledged as a bounty https://www.bountysource.com/issues/55506502-add-va-api-hardware-decoding-support-on-linux

1. Thomas Petazzoni says:
  
  September 7, 2018 at 10:35 am
  
  @Sam: thanks for your suggestion. However, Firefox development is a bit out of our core expertise, so we’re probably not the best team to work on such a topic. There are quite certainly other developers with existing Firefox experience who would be more suitable for such a work. However, a pledge of USD 410 is far from being sufficient considering the amount of engineering work required.
  
Tom says:

July 23, 2018 at 6:59 pm

Bravo! Thanks for all your hard work.

I noticed that you did not meet the kickstarter goal for H264 encoding. Is there any chance of supporting this in the future? It would be a great help for capture card & camera uses.

1. Thomas Petazzoni says:
  
  September 7, 2018 at 10:30 am
  
  @Tom: indeed, the funding was not sufficient for H264 encoding. We hope that some companies will be interested by this feature and will contract us to do this development. Otherwise, we may start another crowdfunding specifically for this feature. But first, we would like to complete all the features we had promised in the current Kickstarter campaign.
  
SK says:

July 24, 2018 at 6:55 am

Good job, guys.
Are you going to dig deeper into this for the next phase (VLC is kind of important as a media player):
“There is a chance that platforms that can output a regular image format (A33 and onwards) will be able to deal with piping the VPU and the GPU for accelerated scaling and colorspace conversion, but we did not test this option at this point.”

zille says:

July 24, 2018 at 8:08 am

Great job! Thank you!
Is there a plan to write a driver for hardware deinterlacer?

1. Thomas Petazzoni says:
  
  September 7, 2018 at 11:18 am
  
  @zille: as explained in our blog post concluding the main development period (https://bootlin.com/blog/final-weekly-status-update-allwinner-vpu-support/), we did not implement support for interlaced video, and we do not plan to work on it at this point. We don’t think the effort is very significant though.
  
Anon Y. Mouse says:

July 26, 2018 at 10:13 pm

It seems like its wishing time in comments section 😉

What about another kickstarter for hdmi audio?

Jack Ching says:

September 6, 2018 at 10:19 am

Greate work, thanks for the bootlin and team.

What is the plan working for Soc H5?

1. Thomas Petazzoni says:
  
  September 7, 2018 at 10:25 am
  
  @Jack Ching: we plan to work on H5 and A64 by the end of 2018, as promised by our Kickstarter campaign. See also our blog post at https://bootlin.com/blog/final-weekly-status-update-allwinner-vpu-support/ that brings a status of the work and what we plan to do next.
  
JM says:

February 6, 2020 at 2:57 pm

Very nice work !!!!
If I’m working with the Allwinner A20, is there a common method or solution of working with only one HW video encoder for both the cameras simultaneously ?

1. Paul Kocialkowski says:
  
  February 10, 2020 at 10:20 am
  
  @JM: We have not yet looked at the encoder side in-depth, but it is most likely possible to use it for encoding multiple streams simultaneously in theory. I don’t know if the proprietary libraries from Allwinner allow this though. But a mainline implementation using the V4L2 interface could definitely support this if it is allowed by the hardware!