LZO kernel compression

As Michael stated in his review of the interesting features in Linux 2.6.30, new compression options have been recently added to the kernel. We therefore decided to have a look at those compression methods, from a compression ratio and decompression speed point of view.

This comparison will be based on “self-extractible kernels”, that is, kernel images containing bootstrap code allowing them to extract a compressed image. As underlined in the previous article, this approach is not used on all architectures. Blackfin, notably, chose a different path and compresses the whole kernel image, without including bootstrap code. While this has the clear advantage of making compression much simpler with respect to kernel code, it forces decompression out to the bootloader code.

Each of those methods has its advantages. Indeed, the Blackfin approach relies on the bootloader to provide the necessary functions, so that may be a problem to do things like bypassing u-boot to reduce the boot time. On the other hand, implementing it only once in the bootloader (as architecture-independent code) makes it unnecessary to write the low-level bootstrap code for each architecture in the kernel, which is surely interesting on virtually all architectures, the notable exceptions being x86 and ARM.

Gzip (also known as Zlib or inflate) has been the traditional (and, as a matter of fact, only) method used to compress kernels. Consequently, we’ll use it as the reference in the following tests. Our test environment is as follows:

ARM9 AT91SAM9263 CPU, 200MHz, using the mainline arch/arm/configs/usb-a9263.config.

This comparison includes figures for LZO, a new kernel image compression method that I have contributed to the Linux sources, and which hopefully will make its way into the mainline kernel. LZO support in the kernel is only new for kernel decompression, as it is already used by JFFS2 and UBIFS. LZO is a stream-oriented algorithm, and although its compression ratio is lower than that of gzip, decompression is lightning-fast, as we will soon find out.

So, here are the figures, average on 20 boots with each compression method:

Uncompressed 3.24Mo 200%
LZO 1.76Mo 0.552s 70% 109%
Gzip 1.62Mo 0.775s 100% 100%
LZMA 1.22Mo 5.011s 646% 75%

Bzip2 has not been tested here: the low-level bootstrap file, head.S, only allocates 64Kb for use by malloc() on ARM. Some quick tests showed that the kernel would not extract with less than 3.5Mib of malloc() space. That would require to modify head.S so that malloc can use more memory, which we will not do here. However, given that enough memory is usable on the system, one could well use bzip2. All the other algorithms performed the extraction using the standard 64Kib malloc space.

From the results, we can clearly see that LZMA is nearly unsuitable for our system, and should be considered only if the space constraints for storing the kernel are so tight that we can’t afford to use more space that was is strictly necessary.

LZO looks like a good candidate when it comes to speeding up the boot process, at the expense of some (almost neglectable) extra space. Gzip is close to LZO when it comes to size, although extraction is not as fast. That means that unless you’re hitting corner cases, like only having enough space for a Gzip compressed image but not for one made with LZO, choosing the latter is probably a safe bet.

Besides, the LZO-compressed kernel size is about 54% the size of the uncompressed kernel. As the kernel load time varies linearly with its size, load time for an uncompressed kernel doubles. While 0.55s are won because there’s no need to run a decompression algorithm, you spend twice as much time loading the kernel. This time is not negligible at all compared to the decompression time. Indeed, loading the uncompressed image takes roughly 0.8s. That means that at the cost of slowing down the boot process by 0.15s (compared to an uncompressed kernel), one gets a kernel image which is roughly twice as small. Rather nice, isn’t it?

Bootlin at ELCE 2009

Grenoble

As usual, we won’t miss this year’s edition of the Embedded Linux Conference Europe, which has always been a great source of information and encounters for embedded Linux developers.

Here are details about our involvement this year.

  • I am part of the organization committee, in particular the coordinator for the Technical Showcase.
  • Taking advantage of his stay in Grenoble, my colleague Thomas Petazzoni will make an embedded Linux presentation on Tuesday, Oct. 13 at 7:30 pm, at GUILDE, the local Linux user group.
  • Thomas and I will be present at the Embedded Systems Exhibition on Wednesday, Oct. 14, sharing a booth with our partner CALAO Systems. The exhibition entry is free of charge, and this will be an excellent opportunity to meet us and have enough time to talk about your topics of interest.
  • Thomas will lead the Buildroot BOF with Peter Korsgaard, Buildroot’s maintainer, at 5:35 pm on Thursday, Oct. 15. This informal session will allow users and developers to meet and exchange ideas.
  • I will be the leader of the Small Business BOF on Thursday 15 at 6:35 pm, an informal session for small embedded Linux companies interested in sharing experience and best practices, and of course to know each other better.
  • I will make a presentation on boot time reduction techniques, at 3:40 on Friday, Oct. 16.
  • Albin Tonnerre, who was an intern at Bootlin this summer, will participate to the Technical Showcase at 12:00 am on Friday, Oct. 16, showing the benefits of LZO decompression on kernel boot time. During his internship, Albin made very nice contributions to boot time reduction, power management on AT91 and to U-boot board support.
  • Thomas Petazzoni will also participate to the Technical Showcase at the same time, showing Buidroot’s new features.
  • We will videotape the conferences we go to and will release the videos later on our website.
  • Thomas organizes a Buildroot developer day on Saturday, Oct. 17, allowing developers to meet and code together. Bootlin will offer lunch to the participants, and the room will be offered by CALAO Systems. There are no more seats left for space reasons.

Hope to see you in Grenoble!

Crosstool-NG 1.5.0 released, with new features!

Crosstool-NG, the successor of Crosstool, is a tool that automates the complicated process of building a cross-compilation toolchain. It provides a nice menuconfig interface to fine tune the configuration of the toolchain, before creating the toolchain automatically by retrieving, extracting, patching, configuring, compiling and installing the different components in the right order, with the right arguments and configuration.

Yann E. Morin, the maintainer of Crosstool-NG has just announced the release of Crosstool-NG 1.5.0, with the following new features:

  • Support for gcc 4.4
  • Experimental support for canadian-cross toolchains. Canadian-cross toolchains are toolchains that are built on machine A, to run on machine B and generate code for machine C, while usual cross-compilation toolchain are built on machine A, to run on machine A and generate code for machine B
  • Experimental support for AVR32, with support for operation as MMU-less, thanks to the newlib C library

In addition to these important features, bugs have been fixed and improvements have been made. Yann also switched the development of Crosstool-NG from Subversion to Mercurial, in order to ease community participation in the improvement of Crosstool-NG.

By the way, if you are interested in Crosstool-NG, don’t miss Yann E. Morin’s talk at the Embedded Linux Conference Europe 2009. See the program for details.

CALAO boards supported in mainline U-Boot

I’m happy to announce that a couple days ago, support for the CALAO SBC35-A9G20, TNY-A9260 and TNY-A9G20 boards made its way into the U-Boot git repository. Sadly, it’s not possible to boot from an MMC/SD card with the SBC35 yet, but it’s something I’m currently working on.

Support for all these cards will be available in the next U-Boot release, due in November.

Buildroot simplified for users!

Buildroot logoYesterday, a set of patches I’ve authored that aims at simplifying Buildroot for users has been merged into the official version of the project, and will therefore be part of the next stable release (scheduled for November, according to our 3 months release cycle). This work is probably my major contribution to Buildroot, outside of external toolchain support and various fixes here and there. Here are quick details about the improvements brought by these patches :

  • Remove the “project” feature. The project feature removal was the main point of this patch set. This feature, that allows to compile a system for different, but very similar platforms, without recompiling everything from scratch, was rarely used and introduced a lot of complexity in the usage of Buildroot for newcomers. Who hasn’t been confused by this project_build_arm directory? This thing is gone now.
  • Remove the BOARD/LOCAL feature, which duplicates another way of adding support for new targets in Buildroot. This is the kind of feature that has been added at the time Buildroot was basically unmaintained, when nobody was able to say « Hey, but you’re just trying to-reimplement something that already exists »
  • Move all output directories in an output directory. By default, when Buildroot is compiled, it generates several directories in the middle of its source code. Now, with this patch, everything is grouped into an output directory, unless out-of-tree compilation is used, of course (with O=)
  • Remove TOPDIR_PREFIX and TOPDIR_SUFFIX since the same effect could already be done using out-of-tree compilation with O=. Another duplicated feature that should never have reached the tree.
  • Rename the output directories. Now that everything is properly stored in an output directory, it was time to rename the subdirectories to make them more meaningful. So now, we have build where all packages are built, images that contains the final binary images of the root filesystem and the kernel, staging which contains the staging directory (all packages installed with their development headers and libraries), target that contains the root filesystem for the target (without the device files), host that contains the installation of the host tools that Buildroot requires for its execution, stamps that contains the stamp files used by Buildroot to keep track of the compilation progress. Therefore, all directories such as build_ARCH or toolchain_build_ARCH have disappeared.
  • Major documentation update, to of course make sure that our documentation is up-to-date with the latest changes.

Getting all these changes mainlined is really a nice thing. I also have tons of other ideas to improve Buildroot infrastructure, and I’m sure the coming Buildroot Developer Day will be a great opportunity to discuss these.

Faster boot: starting Linux directly from AT91bootstrap

Reducing start-up time looks like one of the most discussed topics nowadays, for both embedded and desktop systems. Typically, the boot process consists of three steps: AT91SAM9263 CPU

  • First-stage bootloader
  • Second-stage bootloader
  • Linux kernel

The first-stage bootloader is often a tiny piece of code whose sole purpose is to bring the hardware in a state where it is able to execute more elaborate programs. On our testing board (CALAO TNY-A9260), it’s a piece of code the CPU stores in internal SRAM and its size is limited to 4Kib, which is a very small amount of space indeed. The second-stage bootloader often provides more advanced features, like downloading the kernel from the network, looking at the contents of the memory, and so on. On our board, this second-stage bootloader is the famous U-Boot.

One way of achieving a faster boot is to simply bypass the second-stage bootloader, and directly boot Linux from the first-stage bootloader. This first-stage bootloader here is AT91bootstrap, which is an open-source bootloader developed by Atmel for their AT91 ARM-based SoCs. While this approach is somewhat static, it’s suitable for production use when the needs are simple (like simply loading a kernel from NAND flash and booting it), and allows to effectively reduce the boot time by not loading U-Boot at all. On our testing board, that saves about 2s.

As we have the source, it’s rather easy to modify AT91bootstrap to suit our needs. To make things easier, we’ll boot using an existing U-Boot uImage. The only requirement is that it should be an uncompressed uImage, like the one automatically generated by make uImage when building the kernel (there’s not much point using such compressed uImage files on ARM anyway, as it is possible to build self-extractible compressed kernels on this platform).

Looking at the (shortened) main.c, the code that actually boots the kernel looks like this:

int main(void)
{
/* ================== 1st step: Hardware Initialization ================= */
/* Performs the hardware initialization */
hw_init();

/* Load from Nandflash in RAM */
load_nandflash(IMG_ADDRESS, IMG_SIZE, JUMP_ADDR);

/* Jump to the Image Address */
return JUMP_ADDR;
}

In the original source code, load_nandflash actually loads the second-stage bootloader, and then jumps directly to JUMP_ADDR (this value can be found in U-Boot as TEXT_BASE, in the board-specific file config.mk. This is the base address from which the program will be executed). Now, if we want to load the kernel directly instead of a second-level bootloader, we need to know a handful of values:

  • the kernel image address (we will reuse IMG_ADDRESS here, but one could
    imagine reading the actual image address from a fixed location in NAND)
  • the kernel size
  • the kernel load address
  • the kernel entry point

The last three values can be extracted from the uImage header. We will not hard-code the kernel size as it was previously the case (using IMG_SIZE), as this would lead to set a maximum size for the image and would force us to copy more data than necessary. All those values are stored as 32 bits bigendian in the header. Looking at the struct image_header declaration from image.h in the uboot-mkimage sources, we can see that the header structure is like this:

typedef struct image_header {
uint32_t    ih_magic;    /* Image Header Magic Number    */
uint32_t    ih_hcrc;    /* Image Header CRC Checksum    */
uint32_t    ih_time;    /* Image Creation Timestamp    */
uint32_t    ih_size;    /* Image Data Size        */
uint32_t    ih_load;    /* Data     Load  Address        */
uint32_t    ih_ep;        /* Entry Point Address        */
uint32_t    ih_dcrc;    /* Image Data CRC Checksum    */
uint8_t        ih_os;        /* Operating System        */
uint8_t        ih_arch;    /* CPU architecture        */
uint8_t        ih_type;    /* Image Type            */
uint8_t        ih_comp;    /* Compression Type        */
uint8_t        ih_name[IH_NMLEN];    /* Image Name        */
} image_header_t;

It’s quite easy to determine where the values we’re looking for actually are in the uImage header.

  • ih_size is the fourth member, hence we can find it at offset 12
  • ih_load and ih_ep are right after ih_size, and therefore can be found at offset 16 and 20.

A first call to load_nandflash is necessary to get those values. As the data we need are contained within the first 32 bytes, that’s all we need to load at first. However, some space is required in memory to actually store the data. The first-stage bootloader is running in internal SRAM, so we can pick any location we want in SDRAM. For the sake of simplicity, we’ll choose PHYS_SDRAM_BASEhere, which we define to the base address of the on-board SDRAM in the CPU address space. Then, a second call will be necessary to load the entire kernel image at the right load address.

Then all we need to do is:

#define be32_to_cpu(a) ((a)[0] << 24 | (a)[1] << 16 | (a)[2] << 8 | (a)[3])
#define PHYS_SDRAM_BASE 0x20000000

int main(void)
{
unsigned char *tmp;
unsigned long jump_addr;
unsigned long load_addr;
unsigned long size;

hw_init();

load_nandflash(IMG_ADDRESS, 0x20, PHYS_SDRAM_BASE);

/* Setup tmp so that we can read the kernel size */
tmp = PHYS_SDRAM_BASE + 12;
size = be32_to_cpu(tmp);

/* Now, load address */
tmp += 4;
load_addr = be32_to_cpu(tmp);

/* And finally, entry point */
tmp += 4;
jump_addr = be32_to_cpu(tmp);

/* Load the actual kernel */
load_nandflash(IMG_ADDRESS, size, load_addr - 0x40);

return jump_addr;
}

Note that the second call to load_nandflash could in theory be replaced by:

load_nandflash(IMG_ADDRESS + 0x40, size + 0x40, load_addr);

However, this will not work. What happens is that load_nandflash starts reading at an address aligned on a page boundary, so even when passing IMG_ADDRESS+0x40 as a first argument, reading will start at IMG_ADDRESS, leading to a failure (writes have to aligned on a page boundary, so it is safe to assume that IMG_ADDRESS is actually correctly aligned).

The above piece of code will silently fail if anything goes wrong, and does no checking at all – indeed, the binary size is very limited and we can’t afford to put more code than what is strictly necessary to boot the kernel.

First Buildroot Developer Day after ELCE, on October 17th in Grenoble, France

Buildroot logoThe first Buildroot Developer Day will take place on Saturday, October 17th in Grenoble, France, just the day after Embedded Linux Conference Europe. This Developer Day aims at allowing Buildroot developers to meet and exchange ideas on the project and its future.

As the number of places is limited, interested candidates are invited to send an e-mail to Peter Korsgaard (jacmet at uclibc dot org) and Thomas Petazzoni (thomas dot petazzoni at free-electrons dot com).

Peter Korsgaard and I are organizing this Developer Day thanks to the sponsorship of Calao Systems (offering the location for the meeting) and Bootlin (offering free lunch to the participants).

See also the announcement of the mailing list and on Buildroot website.

Embedded Linux Conference Europe 2009 program

GrenobleAs we announced a few months ago, Embedded Linux Conference 2009 will take place in Grenoble, France on October 15th and 16th. The list of sessions has just been published online, and as usual, it is a very exciting list of in-depth technical talks.

Because of my involvement in Buildroot, I’m particularly interested in the numerous talks about embedded Linux build systems: Florian Fainelli will talk about OpenWRT (which is not only dedicated to Wifi routers, but is a generic embedded Linux build system, built as a fork of Buildroot), Gordon Hecker on e2factory (a build system I’ve never heard of until now, but conferences are great to discover new tools and projects), Cedric Hombourger on OpenEmbedded (I will be particularly happy to meet Cedric again since I had the chance to work with him six years ago), Marcin Jusziewicz on OpenEmbedded again, Robert Schwebel on PTXdist and finally Alex de Vries on what seems to be a more generic talk about build systems.

Of course, besides build systems, a lot of other topics will be covered. I’ve noted things such as the talk on Canola by Gustavo Barbieri, the boot time presentation by Grégory Clément, the device tree talks by Wolfram Sang and Vitaly Wool, the talk by Alessandro Rubini in order to meet one of the author of Linux Device Drivers, the power management and clock management talks also.

Bootlin will obviously be present during this conference :

  • Michael Opdenacker, my colleague, will give a talk entitled Update on boot time reduction techniques
  • Michael, again, will be the chair of a Small Business BOF, which should allow small companies offering services around embedded Linux to meet and exchange their ideas and experience
  • Finally, I will be the co-chair with Peter Korsgaard of a BOF on Buildroot
  • We will share a booth with our partner Calao Systems during the Minalogic Embedded Systems Exhibition. This will be another opportunity to meet.

I hope to see you in October at ELCE !

Add an introduction sequence to a Theora video

FilmTo produce new HD videos from the Embedded Linux Conference, we needed a new script to add an introduction sequence to the videos we processed.

To make all this automatic, and to suppress the need to go through interactive tools (such as kdenlive) to produce a title with a fade-in / fade-out effect, I created a new theora-intro script. The process got much more complex than I expected, but fortunately, I got a lot of valuable advise and help from the Theora mailing list, and could hide all this complexity inside the script. That’s what scripts are for!

A nice feature. compared to the interactive video editors, is that no extra video processing pass is needed. Thanks to this, even a big video file can be processed within a few minutes. Another useful feature is that the script preserves the Ogg metadata (artist, title, location, copyright…).

So, if you wish to add an introduction sequence to your own videos, whatever their initial format, first convert them to Theora using ffmpeg2theora. Then, create a title image for your video, and then run theora-intro on it.

You can find all useful details on our theora-intro page.