Allwinner VPU support in mainline Linux status update (week 11)

After the initial submission of the Sunxi-Cedrus driver last week, I spent most of this week looking into the sun4i DRM (Direct Rendering Manager) driver. The driver is in charge of handling the display pipeline on Allwinner SoCs. Tight integration of the VPU and the display pipeline is required in order to achieve decent video playback performance. That is because the output format of the VPU is a 32×32 tiled format based on NV12, a YUV420 semi-planar format, with one plane for the Y component (luminance) and one plane for the interleaved UV components (chrominance). While NV12 is a standard format for video output, the tiling is rather specific to the VPU, so the frames have to be untiled before they can be used. This operation, when done in software, is rather slow. Moreover, software-based compositing of the decoded frames is also a bottleneck that impacts the overall performance.

In order to circumvent these issues, we will be using the display engine itself to untile the VPU output frames and show the untiled frames directly in a dedicated hardware plane, that is then composed with the primary plane. This requires several features and especially support for the display engine’s frontend, that has the required components to untile and decode the frames. Partial support for the frontend was recently contributed by Maxime Ripard and is on its way to landing in the mainline Linux kernel, providing a base for my VPU-related work. Maxime’s patches allow scaling hardware planes (among other things), a feature that will be very useful for scaling videos to the screen size in hardware rather than software (which is another major bottleneck for performance).

Support for untiling the VPU frames is approaching completion (luminance is correctly decoded while chrominance is not yet correctly handled).

Decoding the MB32 tiled format with sun4i-drm

Once the frames are properly shown on screen, it’ll be time to make sure that dmabuf works as expected, which will allow us to send buffers from the VPU to the display engine without any copy, thus improving performance.

We should be making good progress on this topic over the upcoming week and start contributing patches to the sun4i DRM driver, so stay tuned for our next status update!

Allwinner VPU support in mainline Linux status update (week 10)

Just over a week ago, I started my internship focused on adding upstream Linux kernel support for the Allwinner VPU at Bootlin’s Toulouse office. The team has been super-friendly and very helpful to help me get settled and I’m definitely happy about moving to Toulouse for the occasion!

This first week of work was focused on studying and rebasing the work done by Florent Revest a year and a half ago. As a main development target, I went for an A33-based board, the SinA33 from Sinlinx. Florent’s patches for the sunxi-cedrus driver were rebased against the latest release candidate version of Linus’ tree, v4.16-rc4.

VPU decoding with Cedrus on the Sinlinx A33

The driver was then adapted to use the latest version of the V4L2 request API, a crucial piece of plumbing needed to provide coherency between setting specific controls for the media stream and the input/output buffers that these controls are related to. A few bugs needed fixing along the way, in order to avoid memory corruptions (use-after-free) and to properly schedule the VPU to run when a request is submitted. With these fixes the driver was ready, so it was sent for review on the linux-media mailing list. On the userspace side, the cedrus-specific libva was also updated to use the latest version of the request API.

The next step in the pipeline is to use a common buffer for the VPU’s decoded frame and the display controller’s plane, using dmabuf. This should bring a significant performance improvement and eventually allow for hardware-based scaling when decoding videos through the standard DRM/KMS interfaces. However, this requires adding support for the specific format used by the VPU (a multiplanar NV12 format with 32×32 tiles) into the display controller code.