Bringing NV-DDR support to parallel NAND flashes in Linux

We have recently contributed support for NV-DDR interfaces to parallel NAND flashes in the Linux kernel, which brings performance improvements for a number of NAND flash devices. In this article, we will detail what are the ONFI specifications, the historical SDR interface, then the introduction of faster interfaces in the ONFI specification, and finally our work to support such interfaces in the Linux kernel.

ONFI specifications

Even though specifications came after the introduction of NAND devices on the market, the Open NAND Flash Interface (ONFI) specification is nowadays a de-facto specification which many NAND chip support (even non-ONFI ones). For instance, in the Linux kernel, we assume that any NAND flash device will by default, after a reset command, at least support the slowest set of ONFI timings. Other specifications exist, like the Joint Electron Device Engineering Council (JEDEC), but as it is a bit less common in the parallel NAND flashes world, we will focus on the ONFI details in this blog post.

The early days of the SDR interface

At the time of the first ONFI specification back in 2006, there was only a single interface detailed: the asynchronous data interface. Also known as Single Data Rate or SDR interface in modern language, it defines the timings sequence that should be respected in order for any NAND controller to be able to deal with almost any kind of NAND device. As an asynchronous interface, in this interface, the data bus has no clock signal. Instead, it features a specific set of signals which are asserted by the controller to signal read data latch and write data latch: Read Enable (RE#) and Write Enable (WE#).

The data interface can work in 6 different timing modes, from 0 to 5. 0 is the slowest mode and the default one at boot time with a theoretical data rate of about 10MiB/s (assuming an 8-bit bus). Mode 4 and 5 are the fastest, they leverage the ability of Extended Data Output (EDO) to latch data on both RE#/WE# edges and may reach a theoretical data rate of 50MiB/s.

The introduction of faster interfaces

Shortly after, at the beginning of 2008, the ONFI consortium released the second version of the ONFI specification and included a new interface: the source synchronous data interface. This interface is backward compatible with the asynchronous interface and allows the host to switch from one interface to the other if this is needed. In the particular case of the source synchronous interface, a clock (CLK) signal is replacing the legacy WE# signal and indicates when the commands and address should be latched. The direction of the transfers is handled by the Write/Read signal (W/R#) in place of RE# signal. Finally, a data strobe (DQS) signal is being introduced and indicates when the data should be latched. As both edges of the DQS signal advertise for a data latch, the source synchronous interface is also called Double Data Rate (DDR) interface even though this naming was only introduced in the version 3.0 of the specification, in 2011.

The exact terms that are used in more recent specifications are NV-DDR (Non-Volatile DDR), NV-DDR2 and NV-DDR3 which are backward compatible improvements of the NV-DDR interface. For instance, the first NV-DDR specification has a range of theoretical rates from 40MiB/s to 200MiB/s.

Support in the Linux kernel

While the addition of the MTD/NAND subsystem in the Linux kernel predates the Git era and is now over 20 years old, Linux users have always been limited to use the asynchronous interface (SDR modes). At Bootlin, we recently started an effort to bring support for the NV-DDR interface to the Linux kernel MTD/NAND subsystem, and this involved the following changes:

Introducing an API to propose timings to the host controller driver, so that it might either accept or refuse them (only SDR mode 0 cannot be refused) and be aware of all timings that this choice involves so that the host controller registers will be configured properly.
Adding the possibility for NAND chip drivers to tweak the timings if the parameter page is not present or inaccurate.
Adding the core logic to ask the NAND chip to change its data interface through the use of GET_FEATURE and SET_FEATURE calls, as well as verifying that this operation worked correctly and handling the fallback in case of error.

We recently reached a final step in this effort as the last missing parts will be part of the next Linux kernel release (v5.14). This final series aiming at bringing NV-DDR support to Linux carries the following changes:

Adding the necessary bits to parse the parameter page of the NAND device in order to know which NV-DDR modes the chips support.
Providing the reference implementation of all NV-DDR timing modes and various helpers to manage them.
Adding the necessary infrastructure and helpers to the host controller drivers in order to allow them to distinguish between SDR and NV-DDR, as well as advertise which mode they are willing to support based on the controller’s constraints.
Updating the existing logic to take into account the existence of NV-DDR timings and select them when appropriate. This part is a bit trickier as the core must gracefully fallback to SDR modes under certain conditions.

Overall, thanks to the major cleanups which happened in the NAND subsystem in the last three years, it was pretty straightforward to add support for these new timings.

Future work

It is worth mentioning that accelerating the overall throughput on the data bus without a deeper rework of the MTD core than just enabling faster timings is very limiting: data reads must respect a tR delay before starting and writes are considered effective only after a tPROG delay. Both are significantly high in practice: respectively about 25-45us and 200-600us, compared to the time needed to store/fetch the data through the I/O bus: a few dozens of micro-seconds.

To fully leverage the power of NV-DDR timings the NAND and MTD cores should be partially rewritten to bring parallel multi-die support and cached operations. Such features would allow to optimize the use of the I/O bus in order to mitigate the performances impact of tR and tPROG during massive I/O operations. This is precisely one of the tricks used by SSD drives to exhibit very fast I/Os while using multiple NAND chips behind. There is therefore interesting additional work to do in the Linux kernel MTD subsystem to fully benefit from NV-DDR interfaces.