New training course: embedded Linux boot time optimization

For many embedded products, the time it takes from power-on until the application is fully usable by the end user is an important challenge. Bootlin has been providing its expertise and experience in this area to its customers for many years through numerous boot time optimization projects, and we have shared this knowledge through talks at several conferences over the past years.

We are now happy to announce a new training course, Embedded Linux boot time optimization, which is open for public registration. This course has already been given to selected Bootlin customers and is now available to everyone.

The training course will be led by Michael Opdenacker, Bootlin’s founder, and author of several publications on the topic of boot time optimization. The course is organized over 4 sessions of 4 hours, with a significant fraction of the time spent on practical demonstrations that show, on a real-life example, the techniques to measure and reduce the boot time of an embedded Linux system.

As usual with Bootlin, the training materials are fully available: Agenda, Slides and Practical lab instructions.

Our first course open for public registration will take place from April 6th to April 9th, 2021, from 14:00 to 18:00 UTC+2 (Paris time) on each day. The session cost is 519 EUR if you take advantage of the early bird price available until March 9th. Otherwise, the regular rate is 619 EUR. You can register now for this course on Eventbrite.

Also, if you’re interested in organizing a dedicated session for your company, do not hesitate to contact us.

New training materials: boot time reduction workshop

We are happy to release new training materials that we developed in 2013 with funding from Atmel Corporation.

The materials correspond to a 1-day embedded Linux boot time reduction workshop. In addition to boot time reduction theory, consolidating some of our experience from our embedded Linux boot time reduction projects, the workshop allows participants to practice with the most common techniques. This is done on SAMA5D3x Evaluation Kits from Atmel.

The system to optimize is a video demo from Atmel. We reduce the time to start a GStreamer based video player. During the practical labs, you will practice with techniques to:

  • Measure the various steps of the boot process
  • Analyze time spent starting system services, using bootchartd
  • Simplify your init scripts
  • Trace application startup with strace
  • Find kernel functions taking the most time during the boot process
  • Reduce kernel size and boot time
  • Replace U-Boot with the Barebox bootloader, and save a lot of time
    thanks to the activation of the data cache.

As usual, our training materials are available under the terms of the Creative Commons Attribution-ShareAlike 3.0 license. This essentially means that you are free to download, distribute and even modify them, provided you mention us as the original authors and that you share these documents under the same conditions.

Special thanks to Atmel for allowing us to share these new materials under this license!

Here are the documents at last:

The first public session of this workshop will be announced in the coming weeks.
Don’t hesitate to contact us if you are interested in organizing a session on your site.

Starting Linux directly from AT91bootstrap3

Here is an update to our previous article on booting Linux directly from AT91bootstrap. On newer Atmel platforms, you will have to use AT91bootstrap 3, which now offers a convenient way to be configured to boot directly to Linux.

You can check it out from GitHub:

git clone git://github.com/linux4sam/at91bootstrap.git

That version of AT91bootstrap uses the same configuration mechanism as the Linux kernel. You will find default configurations, named in the form:
<board_name><storage>_<boot_strategy>_defconfig

  • board_name can be: at91sam9260ek, at91sam9261ek, at91sam9263ek, at91sam9g10ek, at91sam9g20ek, at91sam9m10g45ek, at91sam9n12ek, at91sam9rlek, at91sam9x5ek, at91sam9xeek or at91sama5d3xek
  • storage can be:
    • df for DataFlash
    • nf for NAND flash
    • sd for SD card
  • our main interest will be in boot_strategy which can be:
    • uboot: start U-Boot or any other bootloader
    • linux: boot Linux directly, passing a kernel command line
    • linux_dt: boot Linux directly, using a Device Tree
    • android: boot Linux directly, in an Android configuration
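The naming scheme above can be sketched as a small helper. This is only an illustration: the function name defconfig_name is hypothetical, the field values are copied from the list above, and nothing here checks that a given combination actually exists as a file in the AT91bootstrap tree.

```python
# Field values copied from the article; the helper itself is hypothetical
# and does not check that a combination exists in the AT91bootstrap tree.
BOARDS = {
    "at91sam9260ek", "at91sam9261ek", "at91sam9263ek", "at91sam9g10ek",
    "at91sam9g20ek", "at91sam9m10g45ek", "at91sam9n12ek", "at91sam9rlek",
    "at91sam9x5ek", "at91sam9xeek", "at91sama5d3xek",
}
STORAGES = {"df": "DataFlash", "nf": "NAND flash", "sd": "SD card"}
STRATEGIES = {"uboot", "linux", "linux_dt", "android"}


def defconfig_name(board, storage, strategy):
    """Return <board_name><storage>_<boot_strategy>_defconfig."""
    if board not in BOARDS:
        raise ValueError(f"unknown board: {board}")
    if storage not in STORAGES:
        raise ValueError(f"unknown storage: {storage}")
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown boot strategy: {strategy}")
    return f"{board}{storage}_{strategy}_defconfig"


# SAMA5D3x-EK, NAND flash, direct Linux boot with a Device Tree.
print(defconfig_name("at91sama5d3xek", "nf", "linux_dt"))
```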

Let’s take for example the latest evaluation boards from Atmel, the SAMA5D3x-EK. If you are booting from NAND flash:

make at91sama5d3xeknf_linux_dt_defconfig
make

You’ll end up with a file named at91sama5d3xek-nandflashboot-linux-dt-3.5.4.bin in the binaries/ folder. This is your first stage bootloader. It has the same storage layout as used in the uboot strategy, so you can flash it and it will work.

As a last note, I’ll add that less is not always faster. In our benchmarks, booting the SAMA5D31-EK using AT91bootstrap and then Barebox was faster than using AT91bootstrap alone. The main reason is that Barebox enables the caches and decompresses the kernel itself before booting (see below: the kernel also enables the caches before decompressing itself).

Linux on ARM: xz kernel decompression benchmarks

I recently managed to find time to clean up and submit my patches for xz kernel compression support on ARM, which I had started working on back in November, during my flight to Linaro Connect. However, it was too late, as Russell King, the ARM Linux maintainer, had already accepted a similar patch about 3 weeks before my submission. The lesson I learned was that checking a git tree is not always sufficient. I should have checked the mailing list archives too.

The good news is that xz kernel compression support should be available in Linux 3.4 in a few months from now. xz is a compression format based on the LZMA2 compression algorithm. It can be considered as the successor of lzma, and achieves even better compression ratios!

Before submitting my patches, I ran a few benchmarks on my own implementation. As the decompressing code is the same, the results should be the same as if I had used the patches that are going upstream.

Benchmark methodology

For both boards I tested, I used the same pre-3.3 Linux kernel from Linus Torvalds’ mainline git tree. I also used the U-Boot bootloader in both cases.

I used the very useful grabserial script from Tim Bird. This utility reads messages coming out of the serial line, and adds timestamps to each line it receives. This allows measuring time from the earliest power-on stages, and doesn’t slow down the target system by adding instrumentation to it.

Our benchmarks just measure the time for the bootloader to copy the kernel to RAM, and then the time taken by the kernel to uncompress itself.

  • Loading time is measured between “reading uImage” and “OK” (right before “Starting kernel”) in the bootloader messages.
  • Decompression time is measured between “Uncompressing Linux” and “done”:
    ~/bin/grabserial -v -d /dev/ttyUSB0 -e 15 -t -m "Uncompressing Linux" -i "done," > booting-lzo.log
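To make the measurement principle concrete, here is a minimal sketch of timing the interval between two patterns in a timestamped character stream. It is not grabserial’s code: the function elapsed_between and the sample timestamps are invented for illustration. Patterns are matched on the accumulated stream rather than per line, since the end pattern may arrive on the same line as the start pattern.

```python
def elapsed_between(chunks, start, end):
    """Seconds between the arrival of `start` and the later arrival of `end`.

    `chunks` is an iterable of (time_in_seconds, text) pairs in arrival
    order, as a serial reader could produce them. Returns None if either
    pattern never shows up.
    """
    buf = ""
    t_start = None
    for t, text in chunks:
        buf += text
        if t_start is None:
            idx = buf.find(start)
            if idx != -1:
                t_start = t
                # Only look for `end` in what follows the start pattern.
                buf = buf[idx + len(start):]
        if t_start is not None and end in buf:
            return t - t_start
    return None


# Synthetic timestamps, for illustration only (not measured values).
SAMPLE = [
    (0.0, "U-Boot banner\n"),
    (1.5, "Uncompressing Linux"),
    (1.75, "..."),
    (2.0, " done, booting the kernel.\n"),
]
print(elapsed_between(SAMPLE, "Uncompressing Linux", "done,"))
```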

Benchmarks on OMAP4 Panda

The Panda board has a fast dual Cortex A9 CPU (OMAP 4430) running at 1 GHz. The standard way to boot this board is from an MMC/SD card. Unfortunately, the MMC/SD interface of the board is rather slow.

In this case, we have a fast CPU, but with rather slow storage. Therefore, the time taken to copy the kernel from storage to RAM is expected to have a significant impact on boot time.

This case typically represents today’s multimedia and mobile devices such as phones, media players and tablets.

Compression   Size (bytes)   Loading time (s)   Uncompressing time (s)   Total time (s)
gzip          3355768        2.213376           0.501500                 2.714876
lzma          2488144        1.647410           1.399552                 3.046962
xz            2366192        1.566978           1.299516                 2.866494
lzo           3697840        2.471497           0.160596                 2.632093
None          6965644        4.626749           0                        4.626749
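As a quick cross-check, the total time in each row is simply the loading time plus the uncompressing time. The sketch below recomputes the totals from the Panda figures and ranks the compressors, assuming the times are in seconds as grabserial reports them. The names PANDA and total_time are only illustrative.

```python
# Panda board figures copied from the table above:
# (size in bytes, loading time, uncompressing time), times assumed in seconds.
PANDA = {
    "gzip": (3355768, 2.213376, 0.501500),
    "lzma": (2488144, 1.647410, 1.399552),
    "xz":   (2366192, 1.566978, 1.299516),
    "lzo":  (3697840, 2.471497, 0.160596),
    "none": (6965644, 4.626749, 0.0),
}


def total_time(entry):
    """Loading time plus uncompressing time for one table row."""
    _size, load, uncompress = entry
    return load + uncompress


# Fastest total first.
ranking = sorted(PANDA, key=lambda name: total_time(PANDA[name]))
for name in ranking:
    print(f"{name:5s} {total_time(PANDA[name]):.6f} s")
```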

Results on Calao Systems USB-A9263 (AT91)

The USB-A9263 board from Calao Systems has a cheaper and much slower AT91SAM9263 CPU running at 200 MHz.

Here we are booting from NAND flash, which is the fastest way to boot a kernel on this board. Note that we are using the nboot command from U-Boot, which guarantees that we just copy the number of bytes specified in the uImage header.

In this case, we have a slow CPU with slow storage. Therefore, we expect both the kernel size and the decompression algorithm to have a major impact on boot time.

This case is a typical example of industrial systems, which boot from NAND storage and run on a 200 to 400 MHz CPU (the AT91SAM9263 is still very popular in such applications, as we can see from customer requests).

Compression   Size (bytes)   Loading time (s)   Uncompressing time (s)   Total time (s)
gzip          2386936        5.843289           0.935495                 6.778784
lzma          1794344        4.465542           6.513644                 10.979186
xz            1725360        4.308605           4.816191                 9.124796
lzo           2608624        6.351539           0.447336                 6.798875
None          4647908        11.080560          0                        11.080560
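One way to read these figures is to divide each image size by its loading time, which gives the effective NAND read throughput seen by the bootloader. The sketch below does this for the AT91 rows, assuming the sizes are in bytes and the times in seconds. The names AT91 and throughput are only illustrative.

```python
# USB-A9263 figures copied from the table above:
# (size in bytes, loading time), times assumed in seconds.
AT91 = {
    "gzip": (2386936, 5.843289),
    "lzma": (1794344, 4.465542),
    "xz":   (1725360, 4.308605),
    "lzo":  (2608624, 6.351539),
    "none": (4647908, 11.080560),
}


def throughput(size_bytes, load_seconds):
    """Effective storage-to-RAM copy rate in bytes per second."""
    return size_bytes / load_seconds


rates = {name: throughput(*row) for name, row in AT91.items()}
for name, rate in rates.items():
    print(f"{name:5s} {rate / 1000:.0f} kB/s")
```

The rate stays within a few percent of 0.4 MB/s for every image, which suggests that loading time on this board is roughly proportional to image size.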

Lessons learned

Here’s what we learned from these benchmarks:

  • lzo is still the best solution for minimum boot time. Remember, lzo kernel compression was merged by Bootlin.
  • xz is always better than lzma, both in terms of image size and decompression time. Therefore, there’s no reason to stick with lzma compression if you were using it.
  • Because of their heavy CPU usage, lzma and xz remain pretty bad in terms of boot time on most types of storage devices. On systems with a fast CPU and very slow storage, though, xz should be the best solution.
  • On systems with a fast CPU, like the Panda board, boot time with xz is actually pretty close to lzo, and therefore can be a very interesting compromise between kernel size and boot time.
  • Using a kernel image without compression is rarely a worthy solution, except in systems with a very slow CPU. This is the case of CPUs emulated on an FPGA (typically during chip development, before silicon is available). In this particular case, copying to memory is directly done by the emulator, and we just need CPU cycles to start the kernel.
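The trade-off behind these lessons can be sketched with a very simple model: total time = image size / storage throughput + decompression time. This is an assumption, not something measured in the article: it ignores fixed overheads and treats decompression time as independent of storage. Under that model, a smaller but slower-to-decompress image A beats a larger image B only when the throughput is below (size_B - size_A) / (decomp_A - decomp_B). The function name breakeven_throughput is hypothetical.

```python
def breakeven_throughput(size_a, decomp_a, size_b, decomp_b):
    """Storage throughput (bytes/s) below which image A boots faster than B.

    Assumed model: total = size / throughput + decompression time.
    A must be the smaller image and the slower one to decompress.
    """
    if size_a >= size_b or decomp_a <= decomp_b:
        raise ValueError("expects A smaller but slower to decompress than B")
    return (size_b - size_a) / (decomp_a - decomp_b)


# Panda figures from the article: xz (A) versus lzo (B).
limit = breakeven_throughput(2366192, 1.299516, 3697840, 0.160596)
# Effective throughput observed on the Panda: lzo size / lzo loading time.
measured = 3697840 / 2.471497
print(f"xz beats lzo below {limit / 1e6:.2f} MB/s; "
      f"measured {measured / 1e6:.2f} MB/s")
```

With the Panda numbers, the break-even point comes out a little under 1.2 MB/s while the measured throughput is about 1.5 MB/s, which is consistent with lzo winning there by a small margin and with xz becoming the better choice on slower storage.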

Embedded Linux boot time reduction presentation for GENIVI

I was invited to speak at the GENIVI All Members Meeting that took place on May 3-6 in Dublin, Ireland. This was a very interesting opportunity to meet new people in the In-Vehicle Infotainment (IVI) industry and community.

In addition to the friendly social event at the Guinness Brewery, there was also a very interesting technical showcase of products and software using the GENIVI stack. I could observe that Freescale and ARM chips in general dominate this market. I also wore my Linaro shirt and had interesting discussions with several people about partnership opportunities between GENIVI and Linaro.

I gave a presentation about reducing boot time in embedded Linux systems. The slides are available in PDF and ODF formats, and as usual, are released with a Creative Commons Attribution – Share Alike 3.0 license. Here is the description of the talk:

Cheap Linux boot time reduction techniques

By Michael Opdenacker, Bootlin

More and more feature rich Linux devices are put in the hands of consumers, and the average consumer shouldn’t even notice that they run Linux. To make the OS invisible, the system should boot in a flash.

Multiple boot time reduction techniques are now available, and can be used at the end of a development project, without incurring redesign costs. This presentation will guide embedded Linux system developers through the most effective ones. For each technique, we will detail how to use it and will report the exact savings achieved on a real embedded board.

Author’s biography

Michael Opdenacker is the founder of Bootlin (https://bootlin.com), a company offering development, consulting and training services to embedded Linux system developers worldwide. He is always looking for innovative techniques to share with customers and with the community.

Michael is also the Community Manager for Linaro (http://linaro.org), a not-for-profit engineering organization working on software foundations for Linux on ARM, to reduce fragmentation between ARM chip vendors, increase product performance and reduce time to market. Linaro currently employs more than 100 of the most active developers in the ARM and embedded Linux community.

I was pleased to have a good number of participants, and to get many questions during and after the talk.

Though GENIVI is about Free and Open Source Software, it is unfortunately not very open to the community yet. You have to become a member to access its specifications, wiki and other technical resources. While collecting membership fees makes sense to operate such an organization, and is acceptable for system makers, it makes it difficult for embedded Linux community developers to get involved. I hope that GENIVI will become more open to the wider embedded Linux community in the future.