Initial Allwinner V3 ISP support in mainline Linux

Introduction

Several months ago, Bootlin announced ongoing work on MIPI CSI-2 support for the Allwinner A31/V3 and A83T platforms in mainline Linux, as well as support for the Omnivision OV8865 and OV5648 image sensors. This effort has been a success and while the sensor patches were already integrated in mainline Linux since, the MIPI CSI-2 controller patches are on their way towards inclusion.

Since then we had the opportunity to take things further and start tackling the next steps for advanced camera support in mainline Linux on Allwinner SoCs!

With MIPI CSI-2 support and proper sensor drivers available in V4L2, we were able to capture raw bayer data provided by the sensors. But this data does not constitute the final picture that can be displayed or encoded into a file: a number of enhancement and transformation steps are required to achieve a visually-pleasing result that users typically expect.

These steps are quite calculation-intensive and it does not make sense to implement them with a software pipeline, especially with rates of 25, 30 or even 60 frames per second that are typically expected for video recording.

An open-source and upstream driver for the Allwinner ISP

Allwinner SoCs that support MIPI CSI-2 also include an Image Signal Processor hardware unit, a dedicated accelerator for enhancing and transforming raw data received from sensors.

Allwinner ISP features — Features of the ISP as described in the Allwinner V3 datasheet.

Since support for this ISP was implemented using a non-free blob in Allwinner SDKs, this area remained unsupported in mainline Linux… Until now!

Thanks to some help from Allwinner, we were able to implement a proper V4L2 driver for the Allwinner ISP found in the Allwinner V3, completely open-source, with no binary blob involved. This work was recently submitted upstream, with a first revision totaling more than 8000 new lines of code, which comes together with a significant rework of the Allwinner camera interface driver to make it usable with or without the ISP, and including the new MIPI CSI-2 support which we had submitted previously. We are very happy to keep contributing to advancing fully open-source Allwinner SoCs support in mainline Linux and help tackle some of the remaining areas there!

[PATCH 00/22] Allwinner A31/A83T MIPI CSI-2 Support and A31 ISP Support

Our currently proposed driver for the Allwinner ISP only supports a limited set of features: debayering with coefficients and 2D noise filtering. These features were sufficient for our use case, and allowed to offload the computationally intensive debayering process to a dedicated hardware accelerator.

As the driver for now relies on a specific user-space API that does not yet cover all aspects of the ISP, the driver was submitted to the Linux kernel staging area and will probably stay there until all ISP features are properly described.

Our work on this advanced camera support, including the ISP driver, has been described in the talk we have given earlier this week at the Embedded Linux Conference, for which the slides are already available.

Additional features and future work

The Allwinner ISP supports a lot more features beyond just debayering and noise filtering. For example, it supports statistics to implement 3A algorithms (auto-focus, auto-exposition and auto-white-balance) which are necessary to avoid manual configuration of scene-specific parameters. These could typically be implemented in libcamera, the community free software project that supports complex image capture pipelines and ISPs.

As a result Bootlin would be very interested to continue this work and bring this driver to a more advanced state. So if you have a project that could help move this topic forward, do not hesitate to contact us about it!

On-going Bootlin contributions to the Video4Linux subsystem: camera, camera sensors, video encoding

Over the past years, we have been more and more involved in projects that have significant multimedia requirements. As part of this trend, 2020 has lead us to work on a number of contributions to the Video4Linux subsystem of the Linux kernel, with new drivers for camera interfaces, camera sensors, video decoders, and even HW-accelerated video encoding. In this blog post, we propose to summarize our contributions and their status on the following topics:

Rockchip PX30, RK1808, RK3128 and RK3288 camera interface driver
Allwinner A31, V3s/V3/S3 and A83T MIPI CSI-2 support for the camera interface driver
Omnivision OV8865 camera sensor driver
Omnivision OV5648 camera sensor driver
TW9900 PAL/NTSC video decoder driver
Rockchip HW-accelerated H264 video encoding

Rockchip camera interface

The Rockchip ARM processors are known to have very good support in the upstream Linux kernel. However, one area where the support was lacking is in the support of the camera interface used by those SoCs. And it turns out that Bootlin engineer Maxime Chevallier has worked precisely on this topic throughout 2020: the development and upstreaming of the rkvip driver, a Video4Linux driver for the Rockchip camera interface. While the work was done and tested on a Rockchip PX30 platform, the same camera interface is used on RK1808, RK3128 and RK3288.

Several iterations of the driver have been posted on the linux-media mailing list, with the latest iteration, version 5, posted on December 29, 2020:

Maxime Chevallier (3):
  media: dt-bindings: media: Document Rockchip VIP bindings
  media: rockchip: Introduce driver for Rockhip's camera interface
  arm64: dts: rockchip: Add the camera interface description of the PX30

 .../bindings/media/rockchip-vip.yaml          |  101 ++
 arch/arm64/boot/dts/rockchip/px30.dtsi        |   12 +
 drivers/media/platform/Kconfig                |   15 +
 drivers/media/platform/Makefile               |    1 +
 drivers/media/platform/rockchip/vip/Makefile  |    3 +
 drivers/media/platform/rockchip/vip/capture.c | 1146 +++++++++++++++++
 drivers/media/platform/rockchip/vip/dev.c     |  331 +++++
 drivers/media/platform/rockchip/vip/dev.h     |  203 +++
 drivers/media/platform/rockchip/vip/regs.h    |  260 ++++
 9 files changed, 2072 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/rockchip-vip.yaml
 create mode 100644 drivers/media/platform/rockchip/vip/Makefile
 create mode 100644 drivers/media/platform/rockchip/vip/capture.c
 create mode 100644 drivers/media/platform/rockchip/vip/dev.c
 create mode 100644 drivers/media/platform/rockchip/vip/dev.h
 create mode 100644 drivers/media/platform/rockchip/vip/regs.h

We’re hoping to get this driver merged soon, as we have now addressed the feedback that was received through the 5 iterations the patch series as gone through. It should be noted that for now it only supports the parallel BT656 interface as this is what we needed for our current project, we are definitely able to extend it to support MIPI CSI2 as well if you’re interested!

It should be noted that as a result of this work, Maxime Chevallier also prepared and delivered a talk From a video sensor to your display which was given at the Embedded Linux Conference Europe 2020. See the slides and video.

Allwinner MIPI CSI2 camera interface

As part of an internship in 2020 and then a customer project, Bootlin intern Kévin L’Hôpital and Bootlin engineer Paul Kocialkowski worked on extending the Allwinnera camera interface support with support for MIPI CSI2 cameras. In fact, this addition was done to two Allwinner camera interface drivers: the sun6i driver which is used on Allwinner A31 and V3s/V3/S3, and the sun8i-a83t, which is used on the Allwinner A83T.

Through a fairly long 15 patches patch series, support for MIPI CSI2 is added to both camera interface controllers. We have tested both with Omnivision sensors, which are described below.

The series is currently in its third iteration, which was posted by Paul Kocialkowski on December 11, 2020 on the linux-media mailing list:


Paul Kocialkowski (15):
  docs: phy: Add a part about PHY mode and submode
  phy: Distinguish between Rx and Tx for MIPI D-PHY with submodes
  phy: allwinner: phy-sun6i-mipi-dphy: Support D-PHY Rx mode for MIPI
    CSI-2
  media: sun6i-csi: Use common V4L2 format info for storage bpp
  media: sun6i-csi: Only configure the interface data width for parallel
  dt-bindings: media: sun6i-a31-csi: Add MIPI CSI-2 input port
  media: sun6i-csi: Add support for MIPI CSI-2 bridge input
  dt-bindings: media: Add A31 MIPI CSI-2 bindings documentation
  media: sunxi: Add support for the A31 MIPI CSI-2 controller
  ARM: dts: sun8i: v3s: Add nodes for MIPI CSI-2 support
  MAINTAINERS: Add entry for the Allwinner A31 MIPI CSI-2 bridge
  dt-bindings: media: Add A83T MIPI CSI-2 bindings documentation
  media: sunxi: Add support for the A83T MIPI CSI-2 controller
  ARM: dts: sun8i: a83t: Add MIPI CSI-2 controller node
  MAINTAINERS: Add entry for the Allwinner A83T MIPI CSI-2 bridge

 .../media/allwinner,sun6i-a31-csi.yaml        |  88 ++-
 .../media/allwinner,sun6i-a31-mipi-csi2.yaml  | 149 ++++
 .../media/allwinner,sun8i-a83t-mipi-csi2.yaml | 147 ++++
 Documentation/driver-api/phy/phy.rst          |  18 +
 MAINTAINERS                                   |  16 +
 arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts  |   2 +-
 arch/arm/boot/dts/sun8i-a83t.dtsi             |  26 +
 arch/arm/boot/dts/sun8i-v3s.dtsi              |  67 ++
 drivers/media/platform/sunxi/Kconfig          |   2 +
 drivers/media/platform/sunxi/Makefile         |   2 +
 .../platform/sunxi/sun6i-csi/sun6i_csi.c      | 165 +++--
 .../platform/sunxi/sun6i-csi/sun6i_csi.h      |  58 +-
 .../platform/sunxi/sun6i-csi/sun6i_video.c    |  53 +-
 .../platform/sunxi/sun6i-csi/sun6i_video.h    |   7 +-
 .../platform/sunxi/sun6i-mipi-csi2/Kconfig    |  12 +
 .../platform/sunxi/sun6i-mipi-csi2/Makefile   |   4 +
 .../sunxi/sun6i-mipi-csi2/sun6i_mipi_csi2.c   | 590 ++++++++++++++++
 .../sunxi/sun6i-mipi-csi2/sun6i_mipi_csi2.h   | 117 ++++
 .../sunxi/sun8i-a83t-mipi-csi2/Kconfig        |  11 +
 .../sunxi/sun8i-a83t-mipi-csi2/Makefile       |   4 +
 .../sun8i-a83t-mipi-csi2/sun8i_a83t_dphy.c    |  92 +++
 .../sun8i-a83t-mipi-csi2/sun8i_a83t_dphy.h    |  39 ++
 .../sun8i_a83t_mipi_csi2.c                    | 657 ++++++++++++++++++
 .../sun8i_a83t_mipi_csi2.h                    | 197 ++++++
 drivers/phy/allwinner/phy-sun6i-mipi-dphy.c   | 164 ++++-
 drivers/staging/media/rkisp1/rkisp1-isp.c     |   3 +-
 include/linux/phy/phy-mipi-dphy.h             |  13 +
 27 files changed, 2581 insertions(+), 122 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/media/allwinner,sun6i-a31-mipi-csi2.yaml
 create mode 100644 Documentation/devicetree/bindings/media/allwinner,sun8i-a83t-mipi-csi2.yaml
 create mode 100644 drivers/media/platform/sunxi/sun6i-mipi-csi2/Kconfig
 create mode 100644 drivers/media/platform/sunxi/sun6i-mipi-csi2/Makefile
 create mode 100644 drivers/media/platform/sunxi/sun6i-mipi-csi2/sun6i_mipi_csi2.c
 create mode 100644 drivers/media/platform/sunxi/sun6i-mipi-csi2/sun6i_mipi_csi2.h
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/Kconfig
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/Makefile
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/sun8i_a83t_dphy.c
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/sun8i_a83t_dphy.h
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/sun8i_a83t_mipi_csi2.c
 create mode 100644 drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/sun8i_a83t_mipi_csi2.h

Here as well, the patch series has gone through a number of iterations, with significant reshaping to take into account the comments and feedback of other kernel developers and maintainers, so we hope to be near the point where it can be merged.

Omnivision OV8865 camera sensor driver

As part of his internship at Bootlin in 2020, Kévin L’Hôpital implemented a driver for the OV8865 camera sensor, connected over MIPI CSI2 to an Allwinner A83T platform. This OV8865 was then taken by Bootlin engineer Paul Kocialkowski, who did additional rework and polishing.

We are currently at the 4th iteration of this driver, which has been posted on December 11, 2020, and it has now been accepted and submitted to the V4L maintainer in a pull request.


Kévin L'hôpital (1):
  ARM: dts: sun8i: a83t: bananapi-m3: Enable MIPI CSI-2 with OV8865

Paul Kocialkowski (2):
  dt-bindings: media: i2c: Add OV8865 bindings documentation
  media: i2c: Add support for the OV8865 image sensor

 .../bindings/media/i2c/ovti,ov8865.yaml       |  124 +
 arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts  |  102 +
 drivers/media/i2c/Kconfig                     |   13 +
 drivers/media/i2c/Makefile                    |    1 +
 drivers/media/i2c/ov8865.c                    | 2981 +++++++++++++++++
 5 files changed, 3221 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/i2c/ovti,ov8865.yaml
 create mode 100644 drivers/media/i2c/ov8865.c

Omnivision OV5648 camera sensor driver

In addition to the work done by Bootlin intern Kévin L’Hôpital on OV8865 with Allwinner A83T, Paul Kocialkowski worked on OV5648 with Allwinner V3s, also connected over MIPI CSI2. This work results in a driver for the OV5648 camera sensor, which Paul has submitted to the linux-media mailing list.

This driver is now in is 5th iteration, posted on December 11, 2020, and it has now been accepted and submitted to the V4L maintainer in a pull request.


Paul Kocialkowski (2):
  dt-bindings: media: i2c: Add OV5648 bindings documentation
  media: i2c: Add support for the OV5648 image sensor

 .../bindings/media/i2c/ovti,ov5648.yaml       |  115 +
 drivers/media/i2c/Kconfig                     |   13 +
 drivers/media/i2c/Makefile                    |    1 +
 drivers/media/i2c/ov5648.c                    | 2638 +++++++++++++++++
 4 files changed, 2767 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/i2c/ovti,ov5648.yaml
 create mode 100644 drivers/media/i2c/ov5648.c

TW9900 PAL/NTSC video decoder driver

In addition to working on the Rockchip camera interface driver, Maxime Chevallier has also worked on a driver for the TW9900 PAL/NTSC video decoder. This chip from Renesas, takes as input an analog PAL or NTSC signal, digitizes it and outputs it on a parallel BT656 interface, which in our case was connected to a Rockchip PX30 platform.

Maxime posted the third iteration of the patch series adding this driver on December 22, 2020 on the linux-media mailing list.

Maxime Chevallier (3):
  dt-bindings: vendor-prefixes: Add techwell vendor prefix
  media: dt-bindings: media: i2c: Add bindings for TW9900
  media: i2c: Introduce a driver for the Techwell TW9900 decoder

 .../devicetree/bindings/media/i2c/tw9900.yaml |  60 ++
 .../devicetree/bindings/vendor-prefixes.yaml  |   2 +
 MAINTAINERS                                   |   6 +
 drivers/media/i2c/Kconfig                     |  11 +
 drivers/media/i2c/Makefile                    |   1 +
 drivers/media/i2c/tw9900.c                    | 617 ++++++++++++++++++
 6 files changed, 697 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/i2c/tw9900.yaml
 create mode 100644 drivers/media/i2c/tw9900.c

Rockchip HW-accelerated H264 video encoding

In 2018 and thanks to success of the crowd-funding campaign we ran back then, Bootlin engineer Paul Kocialkowski pioneered support for stateless video decoders in the Linux kernel, with a first driver supporting MPEG2, H264 and H265 HW-accelerated video decoding on Allwinner platforms.

In 2020, Paul was tasked to work on HW-accelerated H264 video encoding for Rockchip platforms, which also use a stateless video encoder. Of course, Paul took the same approach of going towards an upstream-acceptable solution rather than relying on out-of-tree and vendor-specific solutions provided by Rockchip.

Paul has been able to implement a working solution for one of our customers, and while the result is not yet in a shape where it can be submitted upstream, Paul has presented its result at the Embedded Linux Conference Europe 2020: the slides and video. The kernel code is available at https://github.com/bootlin/linux/tree/hantro/h264-encoding while the user-space code is available at https://github.com/bootlin/v4l2-hantro-h264-encoder.

As explained in Paul’s talk, this is not fully ready for upstream, as lots of discussions are needed on the user-space APIs, especially around the topic of rate control.

If you are interested in having this work fully available in the upstream Linux kernel, please contact us. We are looking for additional funding and support to push this completely upstream.

Conclusion

As can be seen from the numerous topics covered in this blog post, Bootlin has significant experience with the Video4Linux subsystem, and is able to both implement support for new hardware, extend the Video4Linux subsystem if needed, and contribute drivers and changes to the official Linux kernel.

Wrapping up the Allwinner VPU crowdfunded Linux driver work

Back in early 2018, Bootlin started a crowd-funding campaign to fund the development of an upstream Linux kernel driver for the VPU found in Allwinner processors. Thanks to the support from over 400 contributors, companies and individuals, we have been able to bring support for hardware-accelerated video decoding in the mainline Linux kernel for Allwinner platforms.

From April 2018 to end of 2019, Paul Kocialkowski and Maxime Ripard at Bootlin worked hard on developing the driver and getting it accepted upstream, as well as developing the corresponding user-space components. We regularly published the progress of our work on this blog.

As of the end of 2019, we can say that all the goals defined in the Kickstarter have been completed:

We have an upstream Linux kernel for the Allwinner VPU, in drivers/staging/media/sunxi/cedrus, which supports MPEG2 decoding (since Linux 4.19), H264 decoding (since Linux 5.2) and H265 decoding (will be in the upcoming Linux 5.5)
We have a user-space VA-API implementation called libva-v4l2-request, and which allows to use any Linux kernel video codec based on the request API.
We have enabled the Linux kernel driver on all platforms we listed in our Kickstarter campaign: A13/A10S/A20/A33/H3 (since Linux 4.19), A64/H5 (since Linux 4.20), A10 (since Linux 5.0) and H6 (since Linux 5.1, contributed by Jernej Skrabec)

This means that the effort that was funded by the Kickstarter campaign is now over, and from now on, we are operating in maintenance mode regarding the cedrus driver: we are currently not actively working on developing new features for the driver anymore.

Of course, there are plenty of additional features that can be added to the driver: support for H264 encoding, support for high-profile H264 decoding, support for other video codecs. Bootlin is obviously available to develop those additional features for customers, do not hesitate to contact us if you are interested.

Overall, we found this experience of funding upstream Linux kernel development through crowd-funding very interesting and we’re happy to have been successful at delivering what was promised in our campaign. Looking at the bigger picture, the Linux userspace API for video decoding with stateless hardware codecs in V4L2 has been maturing for a while and is getting closer and closer to being finalized and declared a stable kernel API: this project has been key in the introduction of this API, as cedrus was the first driver merged to require and use it. Additional drivers are appearing for other stateless decoding engines, such as the Hantro G1 (found in Rockchip, i.MX and Microchip platforms) or the rkvdec engine. We are of course also interested in working on support for these VPUs, as we have gained significant familiarity with all things related to hardware video decoding during the cedrus adventure.

Allwinner VPU support in mainline Linux end of the year status update

The end of 2018 is approaching, which is the right time to provide an update on where we stand with the Allwinner VPU support in Linux. The promise of our Kickstarter campaign was to deliver all funded stretch goals by the end of 2018. As we’ll explain below, all the goals have been met, and only the upstreaming process is not completely finished for some features.

With the Linux 4.20 release happening just a few days ago, the core of the Cedrus driver and the media Requests API that supports it are finally available through Linus Torvalds’ tree! They provide the required interface for hardware-accelerated MPEG-2 decoding on the A13, A20, A33 and H3 Allwinner SoCs.

Cedrus driver in staging

Discussions regarding the details of the stateless video decoding interface offered by the V4L2 API are still taking place, which explains why our driver was integrated as a staging media driver. It was also decided to hide the associated interface for MPEG-2 decoding from the public kernel headers for now. This way, the interface is not considered stable and can keep evolving as discussions are still taking place. Because the kernel has a policy of never breaking backward compatibility, these interfaces will hardly evolve once they are declared stable, so it is crucial to ensure they are ready when that happens. This is especially true given the level of complexity associated with video decoding and the various pitfalls paving that road.

Status of H264 and H265 decoding support

The code for H264 and H265 decoding is already developed and functional, has already been submitted for upstream inclusion but it hasn’t been merged yet. However, with the decoding interfaces considered unstable for now, it should become easier to bring-in H.264 and H.265 decoding support to mainline. Support for these two codecs in the API and our driver is still under review and we will continue sending new iterations for them, with the suggested changes and fixes as well as adaptations for the still-evolving decoding interface. More specifically, our series need to be updated to the latest proposal for identifying reference frames, which is now based on the timestamps of the buffers.

Platform support: H5, A64 and A10 on the horizon

Regarding the platforms supported by our driver, the patches for H5 and A64 support were accepted since our last update and they will make it to Linux 4.21! We are also looking to send out support for the A10, which was missing from our previous series (but shouldn’t cause any particular trouble).

DRM driver improvements

The adaptations for the DRM driver allowing to display the raw decoded frames from the VPU on older platforms were also finalized, with the first half of the series already queued for 4.21. Subsequent changes in the series are still waiting for more reviews as they affect common parts of the framework.

Conclusion

Overall, we are slowly closing down on this project and hoping to get the remaining pieces of our work integrated in mainline Linux as soon as possible, as there seems to be very few obstacles left in the way.

Allwinner VPU support in mainline Linux November status update

Since our previous update back in September, we continued the work to reach the goals set by our crowdfunding campaign and made a number of steps forward. First, we are happy to announce that the core of the Cedrus driver was approved by the linux-media maintainers! It followed the final version of the media request API (the required piece of media framework plumbing necessary for our driver).

Both the API and our driver were merged in time for Linux 4.20, that is currently at the release candidate stage and will be released in a few weeks. The core of the Cedrus driver that is now in Linus’ tree supports hardware-accelerated video decoding for the MPEG-2 codec. We have even already seen contributions from the community, including minor fixes and improvements!

We have also been following-up on the other features covered by our crowdfunding campaign and made good progress on bringing them forward:

The series bringing H.264 decoding to our driver was updated for a second revision on November 15, rebased atop the upcoming Linux release and including a number of fixes as well as documentation;
H.265 decoding support followed with a second version sent on November 23, based on the updated H.264 series and bringing various minor improvements over the first iteration;
The patch series for the display engine DRM driver that adds support for the tiled YUV format used by the VPU was also updated, significantly reworked and submitted again on November 23;
Finally, we submitted a patch series adding support for the A64 and H5 Allwinner SoCs in the Cedrus VPU driver on November 15.

A64 and H5 boards provided by our sponsors Olimex and Libretech, now supported in Cedrus

With these patch series well on their way, we are closer than ever to delivering the remaining goals of the crowdfunding campaign!

Allwinner VPU support in mainline Linux status update (week 37)

Even though the bulk of the development on the Allwinner VPU support is done, we are still working on completing the upstreaming of the kernel driver, and some progress has been made recently on this topic:

On September 10, core Video4Linux developer Hans Verkuil sent a pull request to Video4Linux maintainer Mauro Carvalho Chehab to get the Cedrus driver merged. This means we’re getting closer and closer to have the driver merged. Unfortunately, some last minute issues were found in the patch series, so this pull request wasn’t merged.
On September 13, Bootlin engineer Maxime Ripard sent a new iteration of the Cedrus driver, version 10, which addresses those issues.
In addition, as the Allwinner platform maintainer, Maxime Ripard has merged the patches adding the Device Tree description of the Allwinner VPU, which reduces the Cedrus patch series to just 5 patches. They are now in the branch sunxi/dt-for-4.20, which should be part of the upcoming 4.20 Linux release.

T-Shirt for Allwinner VPU campaign supporters

In addition to this progress on the Linux kernel driver upstreaming process, we also moved forward with delivering the perks to the companies and individuals who supported our campaign:

A CREDITS file has been added to the libva-v4l2-request base, thanking all our backers who pleged more than 16 EUR.
The T-Shirts for the backers who pledged more than 128 EUR have been sent to those in the EU. We are also working on sending the t-shirts to those outside the EU, but it takes a bit more time due to the need for customs declarations. Don’t hesitate to take a picture of you with the T-Shirt, and post it on Twitter with the hashtag #VPULinuxDriverSupporter.

Final weekly status update for Allwinner VPU support in mainline Linux (week 35)

The end of August has arrived, bringing an end to Paul’s engineering internship at Bootlin, focused on bringing mainline Linux support for the VPU found on Allwinner platforms. Over the past six months, we have worked hard to reach the goals announced in the project’s crowdfunding campaign and we were able to deliver most of the main goals last month.

Since last month delivery, we made great progress on supporting the H265 codec, one of the stretch goals that were funded during the campaign. A dedicated patch series introducing support for it was submitted to the linux-media mailing list earlier this week, as well as a new iteration of the base Cedrus VPU driver. As the Request API is on the verge of integrating the Linux kernel, our VPU driver should follow pretty soon.

Reaching the end of the funding: a status on where we stand

We have now exhausted the budget that was provided through the crowdfunding campaign: both Maxime Ripard’s time (who worked mainly on the H264 decoding and helping with DRM topics) and Paul’s internship are over, and therefore the remaining work will be done on a best-effort basis, without direct funding. This will therefore be the last weekly update, but we will be publishing updates once in a while when interesting progress is made.

Here is a quick summary of our current status, compared to what was promised during our Kickstarter campaign:

Making sure that the codec works on the older Allwinner SoCs that are still widely used: A10, A13, A20, A33, R8 and R16. This goal is fully met;
Polishing the existing MPEG2 decoding support to make it fully production ready. This goal is fully met;
Implementing H264 video decoding. This goal is fully met with base H264 decoding support implemented. However, a number of more advanced H264 features have not been implemented, and therefore additional improvements could be made;
Modifying the Allwinner display driver in order to be able to directly display the decoded frames instead of converting and copying those frames. This goal is fully met.
Providing a user-space library easy to integrate in the popular open-source video players. This goal is partially met. We do provide a user-space library that offers a VA-API implementation, however the integration with popular video players turned out to be a lot more challenging than expected, and we only offer Kodi integration at this point. See below for details;
Upstreaming those changes to the official Linux kernel. This goal is in progress, on both the VPU driver side and DRM improvements side;
Supporting the newer Allwinner SoCs (H3, H5, A64). This goal is partially met, since H3 is supported, but not yet H5 and A64;
H265 video decoding support. This goal is fully met with base H265 decoding support implemented. Like H264, a number of more advanced features have not been implemented, so there is room for more work.

The most challenging topic: integration with open source video players

The major pitfalls that we encountered are related to integrating our accelerated video decoding pipeline with multimedia players. They will require extra work out of the scope of the VPU campaign to reach a production-ready state.

We considered a number of options for integrating with a desktop environment under Xorg, which was especially tricky for the oldest Allwinner platforms where the VPU outputs a tiled YUV format. The chain of required operations includes untiling, colorspace conversion (from YUV to RGB), scaling and composition.

We first resorted to the main CPU for all the required operations (including NEON-backed untiling routines), which becomes unbearably slow as soon as scaling is involved in the process.
We tried to bring-in the GPU for accelerating the untiling, colorspace conversion, scaling and composition operations involved. Although we wrote a shader-based untiler, the Mali blobs did not allow for importing the raw frame data on a byte-by-byte basis. This made GPU acceleration unusable for our use case in practice. Bringing-in the GPU for the final composition step only (that should be possible with GBM-enabled blobs) could however bring some speedup.
Another lead is to use the Xv extension of the X11 API, that fits the bill for using the Display Engine hardware to accelerate these operations, but this interface is quite old now and increasingly deprecated. It also only allows sub-optimal use cases, with one video at a time.

We also investigated the situation for media players that can run without a display server, which removes the need for the composition step and allows using the Display Engine hardware directly, through the DRM interface.

We succeeded at bringing up support for the Kodi mediacenter, by adding the required bits to implement a zero-copy pipeline.
We worked on getting GStreamer to correctly pipe VAAPI-based decoding to the DRM-enabled kmssink without going through the GPU, but did not end up with any functional result, so significant work remains in that area.

Going further: what will happen now ?

Here are the topics that we intend to continue work on in this best-effort mode and complete by the end of 2018, as promised in our crowdfunding campaign:

Ensure the base Cedrus Linux kernel driver gets merged;
Ensure the H264 decoding support in the Cedrus driver gets merged;
Ensure the H265 decoding support in the Cedrus driver gets merged;
Ensure the DRM driver improvements get merged;
Enable VPU support on H5 and A64.

Here are other topics that we do not intend to work on without additional funding. Individuals who want to see some progress on those topics are invited to contribute and join the effort of improving Allwinner VPU support in upstream Linux. Companies interested in those features can also contact us.

Additional H264 and H265 decoding features: interlaced video support (H264 and H265), quantization matrices (H265), 10-bit (H265), 4K resolution (H265);
Other codecs beyond MPEG2, H264 and H265, such as VP8;
Encoding support;
Additional work on GStreamer integration or X.org integration.

Thanks

Once again, we would like to thank all the individuals and companies who participated to our crowdfunding campaign, and made this project possible. We are very happy to see that despite the uncertainties involved in all software development projects, we have been to deliver the vast majority of the goals, within the expected time frame, while delivering weekly updates of our progress. It was a new experience for Bootlin, and we hope to renew this experience for other Linux kernel upstream developments in the future!

Allwinner VPU support in mainline Linux status update (week 34)

This week has seen great advancements in H265 support, following up on the work conducted during the past weeks. The first item to debug was support for bi-directional predictive frames (AKA B frames) which was broken last week. This required some adaptation in our standalone test tool v4l2-request-test in order to display the decoded frames in the right order. With bi-directional prediction, the display order no longer matches the decoding order, in which the coded frames are stored in the bitstream.

With the images displayed in the right order, the debugging process was a matter of comparing the configuration register values written by our driver with the reference provided by libvdpau-sunxi, but it was not enough. A specific buffer has to be provided for each frame for the decoder to store extra meta-data related to bi-directional frame prediction. With the buffer set, the situation vastly improved and only minor issues had to be resolved.

This lead to properly decoding our reference H265 video that contains I, P and B frames! A few more videos were also tested to spot possible bugs and were eventually decoded correctly too. Of course, due to the many possible combinations of H265 features, it is possible that we are still missing some corner cases, but the bulk of H265 support is well in place at this point.

We moved on to adding support for H265 in libva-v4l2-request, which allows the integration of the codec with media players such as VLC and Kodi. We hit a few hiccups during the bringup :

Hiccups when integrating H265 with VAAPI

But we managed to fix the integration of H265 to behave properly :

H265 decoding working properly with VAAPI

So H265 is now integrated in our pipeline and we are ready to submit the patches introducing its support for the Cedrus driver, which should come around next week.

Allwinner VPU support in mainline Linux status update (week 33)

The first task that was tackled this week was solving the bit offset issue encountered last week. We found out that ffmpeg provides VAAPI with a byte-aligned value after rounding it up from an internal offset it keeps in bits. When trying to use the internal value in bits, our VPU would succeed at decoding the H265 frame. After looking at the values for a few distinct frames, it became clear that the offset matched the beginning of a Golomb-coded compressed sequence, starting with a 1 bit and followed by zeros, as a prefix code. Detecting this pattern appears to work reliably for the H265 videos we could test.

This paved the way for properly decoding intra-coded (I) H265 frames without any hardcoded value left in the code. With that in place, it was only a small stretch to decode a few seconds of video made of I frames!

Of course, intra-coded frames are rare in H265 videos since they do not use any temporal compression technique and are thus larger in size. Predicted frames (using references from already-decoded frames) compose the vast majority of H265 videos. Prediction takes places either for forward prediction (P frames) or both forward and backward prediction (B frames). Supporting these prediction modes requires significant driver-side work, especially to handle the metadata (such as prediction weight coefficients) associated to each frame in the reference lists and the lists on their own. On the framework side, V4L2 controls also had to be introduced to bring the required plumbing for these features.

As of today, we successfully implemented support for P frames while B frames are still work in progress. To illustrate our progress, the same video can be seen decoded in v4l2-request-test (at nominal and half speed), with the two prediction modes :

With I and P frames, the video is decoded correctly:

Some more work seems to be required for B frames:

Next week will be the opportunity to move forward on B frames decoding!

Allwinner VPU support in mainline Linux status update (week 32)

This week started with the preparation of a new revision of the Cedrus VPU driver, after significant feedback was received on the version posted two weeks ago. Thanks to the careful testing carried out by community member Jernej Škrabec, a number of decoding issues were discovered in version 6 of the driver. This includes issues related to MPEG2 decoding but also to the use of the VPU untiling block, that affects all codecs indifferently.

The latest version of the driver (that was posted on Thursday) brings-in the required fixes for properly supporting the videos for which decoding errors were spotted.

Some updates were also included on the MPEG2 controls side, in order to bring them closer to the raw bitstream parameters. Some parameters (that are not exposed by VAAPI) were also added, making the V4L2 controls broader than what is strictly required for our VPU.

Regarding H265, progress was slow this week due to a mismatch between values provided by VAAPI and what our VPU expects. More specifically, VAAPI provides a byte-aligned value for the offset to the coded video data in the slice (which also includes a header with metadata) while our VPU expects a bit-aligned value that does not match the value provided by VAAPI. We are hard at work to figure out a solution to this issue, but it is not straightforward. In addition, the reference libvdpau-sunxi code does not set that offset explicitly, as it is reached after parsing the header through the VPU itself. In our case, the parsing is done in userspace so the use case differs.