Allwinner VPU support in mainline Linux status update (week 12)

Following up on the work started last week, I finished implementing initial support for displaying the NV12-based tiled format (that we shall call MB32-tiled NV12). The frame, that was dumped from the VPU, is now correctly displayed on the screen (after adapting scaling coefficients that needed specific tweaking for this use case).

The result can be shown in the following picture, where our Big Buck Bunny has the right coloring:

Scaling is also supported for the tiled format, so the frame can be shown in full screen without resorting to software scaling.

A series of patches supporting these features was sent for review on the dri-devel mailing list, where it already got some feedback from Maxime Ripard (who maintains the sun4i DRM driver impacted by these patches) as well as other members of the community! There is already enough material to craft a second version and send it again for review.

Significant time was spent figuring out the DRM, KMS, DRI and X11 graphics pipeline (as well as specific details of the inner workings of display hardware) and how to properly integrate the overlay DRM plane with all this. We are evaluating all our options here before spending time on a specific implementation. Of course, we are trying to keep things as generic as possible and avoid introducing platform-specific code in userspace, but there are also challenges to overcome in this regard. On the Wayland side, things are looking much brighter as compositors such as Weston have support for managing hardware planes directly, so there should be less work required.

Finally, I started working on dmabuf support, that I am testing with gstreamer‘s kmssink, that allows outputting directly in a hardware plane. Once this work is ready, we’ll be able to get an idea of the performance of the VPU when it is not limited by software-based untiling and compositing. Stay tuned for updates in this direction!

Allwinner VPU support in mainline Linux status update (week 11)

After the initial submission of the Sunxi-Cedrus driver last week, I spent most of this week looking into the sun4i DRM (Direct Rendering Manager) driver. The driver is in charge of handling the display pipeline on Allwinner SoCs. Tight integration of the VPU and the display pipeline is required in order to achieve decent video playback performance. That is because the output format of the VPU is a 32×32 tiled format based on NV12, a YUV420 semi-planar format, with one plane for the Y component (luminance) and one plane for the interleaved UV components (chrominance). While NV12 is a standard format for video output, the tiling is rather specific to the VPU, so the frames have to be untiled before they can be used. This operation, when done in software, is rather slow. Moreover, software-based compositing of the decoded frames is also a bottleneck that impacts the overall performance.

In order to circumvent these issues, we will be using the display engine itself to untile the VPU output frames and show the untiled frames directly in a dedicated hardware plane, that is then composed with the primary plane. This requires several features and especially support for the display engine’s frontend, that has the required components to untile and decode the frames. Partial support for the frontend was recently contributed by Maxime Ripard and is on its way to landing in the mainline Linux kernel, providing a base for my VPU-related work. Maxime’s patches allow scaling hardware planes (among other things), a feature that will be very useful for scaling videos to the screen size in hardware rather than software (which is another major bottleneck for performance).

Support for untiling the VPU frames is approaching completion (luminance is correctly decoded while chrominance is not yet correctly handled).

Decoding the MB32 tiled format with sun4i-drm

Once the frames are properly shown on screen, it’ll be time to make sure that dmabuf works as expected, which will allow us to send buffers from the VPU to the display engine without any copy, thus improving performance.

We should be making good progress on this topic over the upcoming week and start contributing patches to the sun4i DRM driver, so stay tuned for our next status update!

Allwinner VPU support in mainline Linux status update (week 10)

Just over a week ago, I started my internship focused on adding upstream Linux kernel support for the Allwinner VPU at Bootlin’s Toulouse office. The team has been super-friendly and very helpful to help me get settled and I’m definitely happy about moving to Toulouse for the occasion!

This first week of work was focused on studying and rebasing the work done by Florent Revest a year and a half ago. As a main development target, I went for an A33-based board, the SinA33 from Sinlinx. Florent’s patches for the sunxi-cedrus driver were rebased against the latest release candidate version of Linus’ tree, v4.16-rc4.

VPU decoding with Cedrus on the Sinlinx A33

The driver was then adapted to use the latest version of the V4L2 request API, a crucial piece of plumbing needed to provide coherency between setting specific controls for the media stream and the input/output buffers that these controls are related to. A few bugs needed fixing along the way, in order to avoid memory corruptions (use-after-free) and to properly schedule the VPU to run when a request is submitted. With these fixes the driver was ready, so it was sent for review on the linux-media mailing list. On the userspace side, the cedrus-specific libva was also updated to use the latest version of the request API.

The next step in the pipeline is to use a common buffer for the VPU’s decoded frame and the display controller’s plane, using dmabuf. This should bring a significant performance improvement and eventually allow for hardware-based scaling when decoding videos through the standard DRM/KMS interfaces. However, this requires adding support for the specific format used by the VPU (a multiplanar NV12 format with 32×32 tiles) into the display controller code.