New training materials: boot time reduction workshop

We are happy to release new training materials that we have developed in 2013 with funding from Atmel Corporation.

The materials correspond to a 1-day embedded Linux boot time reduction workshop. In addition to boot time reduction theory, consolidating some of our experience from our embedded Linux boot time reduction projects, the workshop allows participants to practice with the most common techniques. This is done on SAMA5D3x Evaluation Kits from Atmel.

The system to optimize is a video demo from Atmel. We reduce the time to start a GStreamer based video player. During the practical labs, you will practice with techniques to:

  • Measure the various steps of the boot process
  • Analyze time spent starting system services, using bootchartd
  • Simplify your init scripts
  • Trace application startup with strace
  • Find kernel functions taking the most time during the boot process
  • Reduce kernel size and boot time
  • Replace U-Boot by the Barebox bootloader, and save a lot of time
    thanks to the activation of the data cache.

Creative commonsAs usual, our training materials are available under the terms of the Creative Commons Attribution-ShareAlike 3.0 license. This essentially means that you are free to download, distribute and even modify them, provided you mention us as the original authors and that you share these documents under the same conditions.

Special thanks to Atmel for allowing us to share these new materials under this license!

Here are the documents at last:

The first public session of this workshop will be announced in the next weeks.
Don’t hesitate to contact us if you are interested in organizing a session on your site.

Starting Linux directly from AT91bootstrap3

Here is an update for our previous article on booting linux directly from AT91bootstrap. On newer ATMEL platforms, you will have to use AT91bootstrap 3. It now has a convenient way to be configured to boot directly to Linux.

You can check it out from github:

git clone git://github.com/linux4sam/at91bootstrap.git

That version of AT91bootstrap is using the same configuration mechanism as the Linux kernel. You will find default configurations, named in the form:
<board_name><storage>_<boot_strategy>_defconfig

  • board_name can be: at91sam9260ek, at91sam9261ek, at91sam9263ek, at91sam9g10ek, at91sam9g20ek, at91sam9m10g45ek, at91sam9n12ek, at91sam9rlek, at91sam9x5ek, at91sam9xeek or at91sama5d3xek
  • storage can be:
    • df for DataFlash
    • nf for NAND flash
    • sd for SD card
  • our main interest will be in boot_strategy which can be:
    • uboot: start u-boot or any other bootloader
    • linux: boot Linux directly, passing a kernel command line
    • linux_dt: boot Linux directly, using a Device Tree
    • android: boot Linux directly, in an Android configuration

Let’s take for example the latest evaluation boards from ATMEL, the SAMA5D3x-EK. If you are booting from NAND flash:

make at91sama5d3xeknf_linux_dt_defconfig
make

You’ll end up with a file named at91sama5d3xek-nandflashboot-linux-dt-3.5.4.bin in the binaries/ folder. This is your first stage bootloader. It has the same storage layout as used in the u-boot strategy so you can flash it and it will work.

As a last note, I’ll had that less is not always faster. On our benchmarks, booting the SAMA5D31-EK using AT91bootstrap, then Barebox was faster than just using AT91bootstrap. The main reason is that barebox is actually enabling the caches and decompresses the kernel(see below, the kernel is also enaling the caches before decompressing itself) before booting.

Linux on ARM: xz kernel decompression benchmarks

I recently managed to find time to clean up and submit my patches for xz kernel compression support on ARM, which I started working on back in November, during my flight to Linaro Connect. However, it was too late as Russell King, the ARM Linux maintainer, alreadyaccepted a similar patch, about 3 weeks before my submission. The lesson I learned was that checking a git tree is not always sufficient. I should have checked the mailing list archives too.

The good news is that xz kernel compression support should be available in Linux 3.4 in a few months from now. xz is a compression format based on the LZMA2 compression algorithm. It can be considered as the successor of lzma, and achieves even better compression ratios!

Before submitting my patches, I ran a few benchmarks on my own implementation. As the decompressing code is the same, the results should be the same as if I had used the patches that are going upstream.

Benchmark methodology

For both boards I tested, I used the same pre 3.3 Linux kernel from Linus Torvalds’ mainline git tree. I also used the U-boot bootloader in both cases.

I used the very useful grabserial script from Tim Bird. This utility reads messages coming out of the serial line, and adds timestamps to each line it receives. This allow to measure time from the earliest power on stages, and doesn’t slow down the target system by adding instrumentation to it.

Our benchmarks just measure the time for the bootloader to copy the kernel to RAM, and then the time taken by the kernel to uncompress itself.

  • Loading time is measured between “reading uImage” and “OK” (right before “Starting kernel”) in the bootloader messages.
  • Compression time measured between “Uncompressing Linux” and “done”:
    ~/bin/grabserial -v -d /dev/ttyUSB0 -e 15 -t -m "Uncompressing Linux" -i "done," > booting-lzo.log

Benchmarks on OMAP4 Panda

The Panda board has a fast dual Cortex A9 CPU (OMAP 4430) running at 1 GHz. The standard way to boot this board is from an MMC/SD card. Unfortunately, the MMC/SD interface of the board is rather slow.

In this case, we have a fast CPU, but with rather slow storage. Therefore, the time taken to copy the kernel from storage to RAM is expected to have a significant impact on boot time.

This case typically represents todays multimedia and mobile devices such as phones, media players and tablets.

Compression Size Loading time Uncompressing time Total time
gzip 3355768 2.213376 0.501500 2.714876
lzma 2488144 1.647410 1.399552 3.046962
xz 2366192 1.566978 1.299516 2.866494
lzo 3697840 2.471497 0.160596 2.632093
None 6965644 4.626749 0 4.626749

Results on Calao Systems USB-A9263 (AT91)

The USB-A9263 board from Calao Systems has a cheaper and much slower AT91SAM9263 CPU running at 200 MHz.

Here we are booting from NAND flash, which is the fastest way to boot a kernel on this board. Note that we are using the nboot command from U-boot, which guarantees that we just copy the number of bytes specified in the uImage header.

In this case, we have a slow CPU with slow storage. Therefore, we expect both the kernel size and the decompression algorithm to have a major impact on boot time.

This case is a typical example of industrial systems (AT91SAM9263 is still very popular in such applications, as we can see from customer requests), booting from NAND storage operating with a 200 to 400 MHz CPU.

Compression Size Loading time Uncompressing time Total time
gzip 2386936 5.843289 0.935495 6.778784
lzma 1794344 4.465542 6.513644 10.979186
xz 1725360 4.308605 4.816191 9.124796
lzo 2608624 6.351539 0.447336 6.798875
None 4647908 11.080560 0 11.080560

Lessons learned

Here’s what we learned from these benchmarks:

  • lzo is still the best solution for minimum boot time. Remember, lzo kernel compression was merged by Free Electrons.
  • xz is always better than lzma, both in terms of image size. Therefore, there’s no reason to stick to lzma compression if you used it.
  • Because of their heavy CPU usage, lzma and xz remain pretty bad in terms of boot time, on most types of storage devices. On systems with a fast CPU, and very slow storage though, xz should be the best solution
  • On systems with a fast CPU, like the Panda board, boot time with xz is actually pretty close to lzo, and therefore can be a very interesting compromise between kernel size and boot time.
  • Using a kernel image without compression is rarely a worthy solution, except in systems with a very slow CPU. This is the case of CPUs emulated on an FPGA (typically during chip development, before silicon is available). In this particular case, copying to memory is directly done by the emulator, and we just need CPU cycles to start the kernel.

Embedded Linux boot time reduction presentation for GENIVI

GENIVI LogoI was invited to speak at the GENIVI All Members Meeting that took place on May 3-6 in Dublin, Ireland. This was a very interesting opportunity to meet new people in the In Vehicle Infotainment (IVI) industry and community.

In addition to the friendly social event at the Guiness Brewery, there was also a very interesting technical showcase of products and software using the GENIVI stack. I could observe that Freescale and ARM chips in general dominate this market. I also wore my Linaro shirt and had interesting discussions with several people about partnership opportunities between GENIVI and Linaro.

I gave a presentation about reducing boot time in embedded Linux systems. The slides are available in PDF and ODF formats, and as usual, are released with a Creative Commons Attribution – Share Alike 3.0 license. Here is the description of the talk:

Cheap Linux boot time reduction techniques

By Michael Opdenacker, Free Electrons

More and more feature rich Linux devices are put in the hands of consumers, and the average consumer shouldn’t even notice that they run Linux. To make the OS invisible, the system should boot in a flash.

Multiple boot time reduction techniques are now available, and can be used at the end of a development project, without incurring redesign costs. This presentation will guide embedded Linux system developers through the most effective ones. For each technique, we will detail how to use it and will report the exact savings achieved on a real embedded board.

Author’s biography

Michael Opdenacker is the founder of Free Electrons (http://free-electrons.com), a company offering development, consulting and training services to embedded Linux system developers worldwide. He is always looking for innovative techniques to share with customers and with the community.

Michael is also the Community Manager for Linaro (http://linaro.org), a not-for-profit engineering organization working on software foundations for Linux on ARM, to reduce fragmentation between ARM chip vendors, increase product performance and reduce time to market. Linaro currently employs more than 100 of the most active developers in the ARM and embedded Linux community.

I was pleased to have a good number of participants, and to get many questions during and after the talk.

Though GENIVI is about Free and Open Source Software, it is unfortunately not very open to the community yet. You have to become a member to access its specifications, wiki and other technical resources. While collecting membership fees makes sense to operate such an organization, and is acceptable for system makers, it makes it difficult for embedded Linux community developers to get involved. I hope that GENIVI will become more open to the wider embedded Linux community in the future.

Linux 2.6.33 features for embedded systems

Interesting features for embedded Linux system developers

Penguin workerLinux 2.6.33 was out on Feb. 24, 2010, and to incite you to try this new kernel in your embedded Linux products, here are features you could be interested in.

The first news is the availability of the LZO algorithm for kernel and initramfs compression. Linux 2.6.30 already introduced LZMA and BZIP2 compression options, which could significantly reduce the size of the kernel and initramfs images, but at the cost of much increased decompression time. LZO compression is a nice alternative. Though its compression rate is not as good as that of ZLIB (10 to 15% larger files), decompression time is much faster than with other algorithms. See our benchmarks. We reduced boot time by 200 ms on our at91 arm system, and the savings could even increase with bigger kernels.

This feature was implemented by my colleague Albin Tonnerre. It is currently available on x86 and arm (commit, commit, commit, commit), and according to Russell King, the arm maintainer, it should become the default compression option on this platform. This compressor can also be used on mips, thanks to Wu Zhangjin (commit).

For systems lacking RAM resources, a new useful feature is Compcache, which allows to swap application memory to a compressed cache in RAM. In practise, this technique increases the amount of RAM that applications can use. This could allow your embedded system or your netbook to run applications or environments it couldn’t execute before. This technique can also be a worthy alternative to on-disk swap in servers or desktops which do need a swap partition, as access performance is much improved. See this LWN.net article for details.

This new kernel also carries lots of improvements on embedded platforms, especially on the popular TI OMAP platform. In particular, we noticed early support to the IGEPv2 board, a very attractive platform based on the TI OMAP 3530 processor, much better than the Beagle Board for a very similar price. We have started to use it in customer projects, and we hope to contribute to its full support in the mainline kernel.

Another interesting feature of Linux 2.6.33 is the improvements in the capabilities of the perf tool. In particular, perf probe allows to insert Kprobes probes through the command line. Instead of SystemTap, which relied on kernel modules, perf probe now relies on a sysfs interface to pass probes to the kernel. This means that you no longer need a compiler and kernel headers to produce your probes. This made it difficult to port SystemTap to embedded platforms. The arm architecture doesn’t have performance counters in the mainline kernel yet (other architectures do), but patches are available. This carries the promise to be able to use probe tools like SystemTap at last on embedded architectures, all the more if SystemTap gets ported to this new infrastructure.

Other noticeable improvements in this release are the ability to mount ext3 and ext2 filesystems with just an ext4 driver, a lightweight RCU implementation, as well as the ability to change the default blinking cursor that is shown at boot time.

Unfortunately, each kernel release doesn’t only carry good news. Android patches got dropped from this release, because of a lack of interest from Google to maintain them. These are sad news and a threat for Android users who may end up without the ability to use newer kernel features and releases. Let’s hope that Google will once more realize the value of converging with the mainline Linux community. I hope that key contributors that this company employs (Andrew Morton in particular) will help to solve this issue.

As usual, this was just a selection. You will probably find many other interesting features on the Linux Changes page for Linux 2.6.33.

LZO kernel compression

As Michael stated in his review of the interesting features in Linux 2.6.30, new compression options have been recently added to the kernel. We therefore decided to have a look at those compression methods, from a compression ratio and decompression speed point of view.

This comparison will be based on “self-extractible kernels”, that is, kernel images containing bootstrap code allowing them to extract a compressed image. As underlined in the previous article, this approach is not used on all architectures. Blackfin, notably, chose a different path and compresses the whole kernel image, without including bootstrap code. While this has the clear advantage of making compression much simpler with respect to kernel code, it forces decompression out to the bootloader code.

Each of those methods has its advantages. Indeed, the Blackfin approach relies on the bootloader to provide the necessary functions, so that may be a problem to do things like bypassing u-boot to reduce the boot time. On the other hand, implementing it only once in the bootloader (as architecture-independent code) makes it unnecessary to write the low-level bootstrap code for each architecture in the kernel, which is surely interesting on virtually all architectures, the notable exceptions being x86 and ARM.

Gzip (also known as Zlib or inflate) has been the traditional (and, as a matter of fact, only) method used to compress kernels. Consequently, we’ll use it as the reference in the following tests. Our test environment is as follows:

ARM9 AT91SAM9263 CPU, 200MHz, using the mainline arch/arm/configs/usb-a9263.config.

This comparison includes figures for LZO, a new kernel image compression method that I have contributed to the Linux sources, and which hopefully will make its way into the mainline kernel. LZO support in the kernel is only new for kernel decompression, as it is already used by JFFS2 and UBIFS. LZO is a stream-oriented algorithm, and although its compression ratio is lower than that of gzip, decompression is lightning-fast, as we will soon find out.

So, here are the figures, average on 20 boots with each compression method:

Uncompressed 3.24Mo 200%
LZO 1.76Mo 0.552s 70% 109%
Gzip 1.62Mo 0.775s 100% 100%
LZMA 1.22Mo 5.011s 646% 75%

Bzip2 has not been tested here: the low-level bootstrap file, head.S, only allocates 64Kb for use by malloc() on ARM. Some quick tests showed that the kernel would not extract with less than 3.5Mib of malloc() space. That would require to modify head.S so that malloc can use more memory, which we will not do here. However, given that enough memory is usable on the system, one could well use bzip2. All the other algorithms performed the extraction using the standard 64Kib malloc space.

From the results, we can clearly see that LZMA is nearly unsuitable for our system, and should be considered only if the space constraints for storing the kernel are so tight that we can’t afford to use more space that was is strictly necessary.

LZO looks like a good candidate when it comes to speeding up the boot process, at the expense of some (almost neglectable) extra space. Gzip is close to LZO when it comes to size, although extraction is not as fast. That means that unless you’re hitting corner cases, like only having enough space for a Gzip compressed image but not for one made with LZO, choosing the latter is probably a safe bet.

Besides, the LZO-compressed kernel size is about 54% the size of the uncompressed kernel. As the kernel load time varies linearly with its size, load time for an uncompressed kernel doubles. While 0.55s are won because there’s no need to run a decompression algorithm, you spend twice as much time loading the kernel. This time is not negligible at all compared to the decompression time. Indeed, loading the uncompressed image takes roughly 0.8s. That means that at the cost of slowing down the boot process by 0.15s (compared to an uncompressed kernel), one gets a kernel image which is roughly twice as small. Rather nice, isn’t it?

Free Electrons at ELCE 2009

Grenoble

As usual, we won’t miss this year’s edition of the Embedded Linux Conference Europe, which has always been a great source of information and encounters for embedded Linux developers.

Here are details about our involvement this year.

  • I am part of the organization committee, in particular the coordinator for the Technical Showcase.
  • Taking advantage of his stay in Grenoble, my colleague Thomas Petazzoni will make an embedded Linux presentation on Tuesday, Oct. 13 at 7:30 pm, at GUILDE, the local Linux user group.
  • Thomas and I will be present at the Embedded Systems Exhibition on Wednesday, Oct. 14, sharing a booth with our partner CALAO Systems. The exhibition entry is free of charge, and this will be an excellent opportunity to meet us and have enough time to talk about your topics of interest.
  • Thomas will lead the Buildroot BOF with Peter Korsgaard, Buildroot’s maintainer, at 5:35 pm on Thursday, Oct. 15. This informal session will allow users and developers to meet and exchange ideas.
  • I will be the leader of the Small Business BOF on Thursday 15 at 6:35 pm, an informal session for small embedded Linux companies interested in sharing experience and best practices, and of course to know each other better.
  • I will make a presentation on boot time reduction techniques, at 3:40 on Friday, Oct. 16.
  • Albin Tonnerre, who was an intern at Free Electrons this summer, will participate to the Technical Showcase at 12:00 am on Friday, Oct. 16, showing the benefits of LZO decompression on kernel boot time. During his internship, Albin made very nice contributions to boot time reduction, power management on AT91 and to U-boot board support.
  • Thomas Petazzoni will also participate to the Technical Showcase at the same time, showing Buidroot’s new features.
  • We will videotape the conferences we go to and will release the videos later on our website.
  • Thomas organizes a Buildroot developer day on Saturday, Oct. 17, allowing developers to meet and code together. Free Electrons will offer lunch to the participants, and the room will be offered by CALAO Systems. There are no more seats left for space reasons.

Hope to see you in Grenoble!