Allwinner VPU support in mainline Linux status update (week 12)

Following up on the work started last week, I finished implementing initial support for displaying the NV12-based tiled format (which we shall call MB32-tiled NV12). The frame dumped from the VPU is now displayed correctly on screen, after the scaling coefficients were tweaked specifically for this use case.

The result can be seen in the following picture, where our Big Buck Bunny has the right coloring:

Scaling is also supported for the tiled format, so the frame can be shown in full screen without resorting to software scaling.

A series of patches supporting these features was sent for review on the dri-devel mailing list, where it has already received feedback from Maxime Ripard (who maintains the sun4i DRM driver impacted by these patches) as well as from other members of the community! There is already enough material to craft a second version and send it for review again.

Significant time was spent figuring out the DRM, KMS, DRI and X11 graphics pipeline (as well as specific details of the inner workings of display hardware) and how to properly integrate the overlay DRM plane into it. We are evaluating all our options here before committing time to a specific implementation. Of course, we are trying to keep things as generic as possible and to avoid introducing platform-specific code in userspace, but there are challenges to overcome in this regard. On the Wayland side, things look much brighter: compositors such as Weston support managing hardware planes directly, so less work should be required.

Finally, I started working on dmabuf support, which I am testing with GStreamer’s kmssink, a sink that can output directly to a hardware plane. Once this work is ready, we’ll be able to get an idea of the performance of the VPU when it is not limited by software-based untiling and compositing. Stay tuned for updates in this direction!

Allwinner VPU support in mainline Linux status update (week 11)

After the initial submission of the Sunxi-Cedrus driver last week, I spent most of this week looking into the sun4i DRM (Direct Rendering Manager) driver. The driver is in charge of handling the display pipeline on Allwinner SoCs. Tight integration of the VPU and the display pipeline is required in order to achieve decent video playback performance. That is because the output format of the VPU is a 32×32 tiled format based on NV12, a YUV420 semi-planar format, with one plane for the Y component (luminance) and one plane for the interleaved UV components (chrominance). While NV12 is a standard format for video output, the tiling is rather specific to the VPU, so the frames have to be untiled before they can be used. This operation, when done in software, is rather slow. Moreover, software-based compositing of the decoded frames is also a bottleneck that impacts the overall performance.
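To illustrate why untiling in software is costly, here is a minimal shell sketch of the address arithmetic involved. It assumes the layout commonly described for this VPU output: each plane is split into 32×32-byte tiles, each tile is stored contiguously row by row (1024 bytes), tiles follow each other in raster order, and the line width is padded to a multiple of 32 bytes. The helper names `mb32_offset` and `mb32_untile` are hypothetical, and the untiler only handles widths and heights that are already multiples of 32.

```shell
#!/bin/sh
# Hypothetical helpers for the assumed MB32 layout: 32x32-byte tiles, each
# stored contiguously (32 rows of 32 bytes), tiles in raster order, and the
# line width padded to a multiple of 32 bytes.

# Print the byte offset of byte column x, line y in a tiled plane $3 bytes wide.
mb32_offset() {
    local x=$1 y=$2 width=$3
    local tiles_per_row=$(( (width + 31) / 32 ))
    local tile=$(( (y / 32) * tiles_per_row + x / 32 ))
    echo $(( tile * 1024 + (y % 32) * 32 + x % 32 ))
}

# Naive software untiling of one plane into a linear buffer.
# Assumes width and height are multiples of 32. It spawns one dd process
# per 32-byte tile row, so it is deliberately simple and very slow.
mb32_untile() {
    local src=$1 dst=$2 width=$3 height=$4
    local x y off
    : > "$dst"
    y=0
    while [ "$y" -lt "$height" ]; do
        x=0
        while [ "$x" -lt "$width" ]; do
            off=$(mb32_offset "$x" "$y" "$width")
            # Append the 32-byte tile row found at $off (always 32-aligned).
            dd if="$src" bs=32 skip=$(( off / 32 )) count=1 2>/dev/null >> "$dst"
            x=$(( x + 32 ))
        done
        y=$(( y + 1 ))
    done
}
```

For instance, untiling a 1920-byte-wide luma plane padded to 1088 lines this way means 60 × 1088 = 65,280 dd invocations, which illustrates why this work is better left to the display hardware.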

In order to circumvent these issues, we will be using the display engine itself to untile the VPU output frames and show the untiled frames directly in a dedicated hardware plane, which is then composited with the primary plane. This requires several features, especially support for the display engine’s frontend, which has the required components to untile and decode the frames. Partial support for the frontend was recently contributed by Maxime Ripard and is on its way into the mainline Linux kernel, providing a base for my VPU-related work. Maxime’s patches allow scaling hardware planes (among other things), a feature that will be very useful for scaling videos to the screen size in hardware rather than in software (another major performance bottleneck).

Support for untiling the VPU frames is approaching completion (luminance is correctly decoded while chrominance is not yet correctly handled).

Decoding the MB32 tiled format with sun4i-drm

Once the frames are properly shown on screen, it’ll be time to make sure that dmabuf works as expected, which will allow us to send buffers from the VPU to the display engine without any copy, thus improving performance.

We expect to make good progress on this topic over the coming week and to start contributing patches to the sun4i DRM driver, so stay tuned for our next status update!

Allwinner VPU support in mainline Linux status update (week 10)

Just over a week ago, I started my internship at Bootlin’s Toulouse office, focused on adding upstream Linux kernel support for the Allwinner VPU. The team has been super-friendly and very helpful in getting me settled, and I’m definitely happy about moving to Toulouse for the occasion!

This first week of work was focused on studying and rebasing the work done by Florent Revest a year and a half ago. As a main development target, I went for an A33-based board, the SinA33 from Sinlinx. Florent’s patches for the sunxi-cedrus driver were rebased against the latest release candidate version of Linus’ tree, v4.16-rc4.

VPU decoding with Cedrus on the Sinlinx A33

The driver was then adapted to use the latest version of the V4L2 request API, a crucial piece of plumbing that keeps the controls set for the media stream coherent with the input/output buffers they relate to. A few bugs needed fixing along the way, in order to avoid memory corruption (use-after-free) and to properly schedule the VPU to run when a request is submitted. With these fixes in place, the driver was sent for review on the linux-media mailing list. On the userspace side, the cedrus-specific libva was also updated to use the latest version of the request API.

The next step in the pipeline is to use a common buffer, shared through dmabuf, for the VPU’s decoded frame and the display controller’s plane. This should bring a significant performance improvement and eventually allow for hardware-based scaling when decoding videos through the standard DRM/KMS interfaces. However, this requires adding support for the specific format used by the VPU (a multiplanar NV12 format with 32×32 tiles) to the display controller code.
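Sharing a buffer between the VPU and the display controller means that both sides must agree on its size and plane layout. As a rough sketch, assuming the luma plane is padded to whole 32×32-byte tiles and the interleaved chroma plane (half the lines, same byte width as luma in NV12) is padded the same way, the plane sizes could be computed as follows. The helper names and the exact padding rules are assumptions for illustration, not taken from the driver.

```shell
#!/bin/sh
# Hypothetical helpers: print the luma and chroma plane sizes (in bytes)
# of an MB32-tiled NV12 frame, assuming every plane is padded to whole
# 32x32-byte tiles.
align32() {
    echo $(( ($1 + 31) / 32 * 32 ))
}

mb32_nv12_sizes() {
    local width=$1 height=$2
    local stride luma_lines chroma_lines
    stride=$(align32 "$width")
    luma_lines=$(align32 "$height")
    # NV12 chroma: half the lines; interleaved UV pairs keep the byte width.
    chroma_lines=$(align32 $(( (height + 1) / 2 )))
    echo "$(( stride * luma_lines )) $(( stride * chroma_lines ))"
}
```

With these assumptions, `mb32_nv12_sizes 1920 1080` prints `2088960 1044480`: the 1080 luma lines are padded to 1088, and the 540 chroma lines to 544.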

Crowdfunding campaign for upstream Linux kernel driver for Allwinner VPU

Back in 2012, Bootlin engineer Maxime Ripard pioneered the support for Allwinner processors in the official Linux kernel. Today, thanks to the contributions of numerous developers around the world and our involvement, there is very good support for a large number of Allwinner processors in the Linux kernel, to the point where actual Allwinner-based products are shipping with the mainline kernel.

Despite this major effort, one area has remained unsupported in the mainline kernel: the video decoding and encoding engine, which provides hardware acceleration for decoding and encoding popular codecs such as MPEG2, MPEG4 or H264. Last summer, we successfully implemented a prototype supporting MPEG2 decoding and, partially, MPEG4 decoding.

Today, we are launching a crowdfunding campaign to fund the remainder of the development: finishing MPEG4 decoding support, implementing H264 decoding, optimizing the rendering of video frames in cooperation with the display driver, and upstreaming the driver. We also have additional goals: H265 support, encoding support, and support for additional Allwinner SoCs.

In the vendor-provided kernel, this video decoding/encoding unit is supported by a kernel driver that uses a non-standard userspace API, in conjunction with a binary-only userspace blob. Fortunately, a number of people have put an enormous effort into reverse engineering it, which we have leveraged for our existing prototype and intend to use to continue the development of this upstream driver. Both Maxime Ripard and our intern Paul Kocialkowski will be working on this crowdfunded project.

This is our first crowdfunding campaign to fund upstream Linux kernel development, and we are interested in seeing how much interest there is in such a financing model. Help us make this a success by spreading the word!

Mali OpenGL support on Allwinner platforms with mainline Linux

As most people know, getting GPU-based 3D acceleration to work on ARM platforms has always been difficult, due to the closed nature of the support for such GPUs. Most vendors provide closed-source OpenGL implementations in the form of binary blobs, whose quality depends on the vendor.

This situation is getting better and better thanks to vendor-funded initiatives, such as the ones for the Broadcom VC4 and VC5, or to reverse engineering projects such as Nouveau on Tegra SoCs, Etnaviv on Vivante GPUs and Freedreno on Qualcomm’s Adreno GPUs. However, there are still GPUs for which you do not have the option to use a free software stack: PowerVR from Imagination Technologies and Mali from ARM (even though there is some progress on the reverse engineering effort).

Allwinner SoCs use either a Mali GPU from ARM or a PowerVR GPU from Imagination Technologies, so OpenGL support on those platforms with a mainline Linux kernel has always been a problem. This is further complicated by the fact that Allwinner is mostly interested in Android, which uses a different C library (Bionic), so its blobs cannot be used on traditional glibc-based systems, except through libhybris.

However, we are happy to announce that Allwinner gave us clearance to publish the userspace binary blobs that enable OpenGL support, with a recent mainline Linux kernel, on Allwinner platforms that use a Mali GPU from ARM. Of course, these are closed-source binary blobs and not a nice fully open-source solution, but they nonetheless allow everyone to have working OpenGL support while taking advantage of all the benefits of a recent mainline Linux kernel. We have successfully used these binary blobs on customer projects involving the Allwinner A33 SoC, and they should work on all Allwinner SoCs using a Mali GPU.

In order to get GPU support to work on your Allwinner platform, you will need:

  • The kernel-side driver, available on Maxime Ripard’s GitHub repository. This is essentially the Mali kernel-side driver from ARM, plus a number of build and bug fixes to make it work with recent mainline Linux kernels.
  • The Device Tree description of the GPU. We introduced Device Tree bindings for Mali GPUs in the mainline kernel a while ago, so that Device Trees can describe such GPUs. Such a description has been added for the Allwinner A23 and A33 SoCs as part of this commit.
  • The userspace blob, which is available on Bootlin’s GitHub repository. It currently provides the r6p2 version of the driver, with support for both fbdev and X11 systems. Hopefully, we’ll gain access to newer versions in the future, with additional features (such as GBM support).

If you want to use it in your system, the first step is to make sure your Device Tree contains the GPU description, adding it if it’s not already there. Then, you need to compile the kernel module:

git clone https://github.com/mripard/sunxi-mali.git
cd sunxi-mali
export CROSS_COMPILE=$TOOLCHAIN_PREFIX
export KDIR=$KERNEL_BUILD_DIR
export INSTALL_MOD_PATH=$TARGET_DIR
./build.sh -r r6p2 -b
./build.sh -r r6p2 -i

It should install the mali.ko Linux kernel module into the target filesystem.
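To double-check that the install step worked, you can look for the module under the target’s module tree. This is a small sketch: the helper name `find_mali_module` is hypothetical, and it searches the tree rather than assuming the exact subdirectory the module ends up in.

```shell
#!/bin/sh
# Hypothetical helper: print the path(s) of mali.ko installed under the
# given target root (the INSTALL_MOD_PATH used above); return 1 if none.
find_mali_module() {
    local root=$1
    local found
    found=$(find "$root/lib/modules" -name mali.ko 2>/dev/null)
    [ -n "$found" ] || return 1
    echo "$found"
}
```

For example, `find_mali_module "$TARGET_DIR"` should print one path ending in `mali.ko`. On the target itself, the module can then be loaded with `modprobe mali`, provided `depmod` has been run for that kernel.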

Now, you can copy the OpenGL userspace blobs that match your setup, most likely the fbdev or X11-dma-buf variant. For example, for fbdev:

git clone https://github.com/bootlin/mali-blobs.git
cd mali-blobs
cp -a r6p2/fbdev/lib/lib_fb_dev/lib* $TARGET_DIR/usr/lib

You should be all set. Of course, you will have to link your OpenGL applications or libraries against these userspace blobs. You can check that everything works using OpenGL test programs such as es2_gears.
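When cross-compiling an application against the blobs, the linker only needs to find them in the target’s library directory. The following sketch assumes the blobs expose the usual Khronos library names (`libEGL.so` and `libGLESv2.so`); these names are an assumption here, so check them against your copy of the repository. The helper name `mali_gles2_ldflags` is hypothetical, and headers are not covered.

```shell
#!/bin/sh
# Hypothetical helper: print linker flags for an OpenGL ES 2.0 application
# built against the Mali blobs copied into $1/usr/lib. Fails if one of the
# (assumed) library names is missing.
mali_gles2_ldflags() {
    local libdir="$1/usr/lib"
    local lib
    for lib in libEGL.so libGLESv2.so; do
        if [ ! -e "$libdir/$lib" ]; then
            echo "missing $libdir/$lib" >&2
            return 1
        fi
    done
    echo "-L$libdir -lEGL -lGLESv2"
}
```

A cross-compilation could then look like `${CROSS_COMPILE}gcc -o myapp myapp.c $(mali_gles2_ldflags "$TARGET_DIR")`, where `myapp.c` stands for your own (hypothetical) application.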