size – Bootlin

LZO kernel compression

As Michael stated in his review of the interesting features in Linux 2.6.30, new compression options have been recently added to the kernel. We therefore decided to have a look at those compression methods, from a compression ratio and decompression speed point of view.

This comparison will be based on “self-extractible kernels”, that is, kernel images containing bootstrap code allowing them to extract a compressed image. As underlined in the previous article, this approach is not used on all architectures. Blackfin, notably, chose a different path and compresses the whole kernel image, without including bootstrap code. While this has the clear advantage of making compression much simpler with respect to kernel code, it forces decompression out to the bootloader code.

Each of those methods has its advantages. Indeed, the Blackfin approach relies on the bootloader to provide the necessary functions, so that may be a problem to do things like bypassing u-boot to reduce the boot time. On the other hand, implementing it only once in the bootloader (as architecture-independent code) makes it unnecessary to write the low-level bootstrap code for each architecture in the kernel, which is surely interesting on virtually all architectures, the notable exceptions being x86 and ARM.

Gzip (also known as Zlib or inflate) has been the traditional (and, as a matter of fact, only) method used to compress kernels. Consequently, we’ll use it as the reference in the following tests. Our test environment is as follows:

ARM9 AT91SAM9263 CPU, 200MHz, using the mainline arch/arm/configs/usb-a9263.config.

This comparison includes figures for LZO, a new kernel image compression method that I have contributed to the Linux sources, and which hopefully will make its way into the mainline kernel. LZO support in the kernel is only new for kernel decompression, as it is already used by JFFS2 and UBIFS. LZO is a stream-oriented algorithm, and although its compression ratio is lower than that of gzip, decompression is lightning-fast, as we will soon find out.

So, here are the figures, average on 20 boots with each compression method:

Uncompressed	3.24Mo	–	–	200%
LZO	1.76Mo	0.552s	70%	109%
Gzip	1.62Mo	0.775s	100%	100%
LZMA	1.22Mo	5.011s	646%	75%

Bzip2 has not been tested here: the low-level bootstrap file, head.S, only allocates 64Kb for use by malloc() on ARM. Some quick tests showed that the kernel would not extract with less than 3.5Mib of malloc() space. That would require to modify head.S so that malloc can use more memory, which we will not do here. However, given that enough memory is usable on the system, one could well use bzip2. All the other algorithms performed the extraction using the standard 64Kib malloc space.

From the results, we can clearly see that LZMA is nearly unsuitable for our system, and should be considered only if the space constraints for storing the kernel are so tight that we can’t afford to use more space that was is strictly necessary.

LZO looks like a good candidate when it comes to speeding up the boot process, at the expense of some (almost neglectable) extra space. Gzip is close to LZO when it comes to size, although extraction is not as fast. That means that unless you’re hitting corner cases, like only having enough space for a Gzip compressed image but not for one made with LZO, choosing the latter is probably a safe bet.

Besides, the LZO-compressed kernel size is about 54% the size of the uncompressed kernel. As the kernel load time varies linearly with its size, load time for an uncompressed kernel doubles. While 0.55s are won because there’s no need to run a decompression algorithm, you spend twice as much time loading the kernel. This time is not negligible at all compared to the decompression time. Indeed, loading the uncompressed image takes roughly 0.8s. That means that at the cost of slowing down the boot process by 0.15s (compared to an uncompressed kernel), one gets a kernel image which is roughly twice as small. Rather nice, isn’t it?

Linux 2.6.30 – New features for embedded systems

Interesting features in Linux 2.6.30 for embedded system developers

Linux 2.6.30 has been released almost 1 month ago and it’s high time to write a little about it. Like every new kernel release, it contains interesting features for embedded system developers.

The first feature that stands out is support for fastboot. Today, most devices are initialized in a sequential way. As scanning and probing the hardware often requires waiting for the devices to be ready, a significant amount of cpu time is wasted in delay loops. The fastboot infrastructure allows to run device initialization routines in parallel, keeping the cpu fully busy and thus reducing boot time in a significant way. Fasboot can be enabled by passing the fastboot parameter in the kernel command line. However, unless your embedded system uses PC hardware, don’t be surprised if you don’t get any boot time reduction yet. If you look at the code, you will see that the async_schedule function is only used by 4 drivers so far, mainly for disk drives. You can see that board support code and most drivers still need to be converted to this new infrastructure. Let’s hope this will progress in future kernel releases, bringing significant boot time savings to everyone.

Linux 2.6.30 also features the inclusion of Tomoyo Linux, a lightweight Mandatory Access Control system developed by NTT. According to presentations I saw quite a long time ago, Tomoyo can be used as an alternative to SELinux, and it just consumes a very reasonable amount of RAM and storage space. It should interest people making embedded devices which are always connected to the network, and need strong security.

Another nice feature is support for kernel images compressed with bzip2 and lzma, and not just with zlib as it’s been the case of ages. The bzip2 and lzma compressors allow to reduce the size of a compressed kernel in a significant way. This should appeal to everyone interested in saving a few hundreds of kilobytes of storage space. Just beware that decompressing these formats requires more CPU resources, so there may be a price to pay in terms of boot time if you have a slow cpu, like in many embedded systems. On the other hand, if you have a fast cpu and rather slow I/O, like in a PC, you may also see a reduction in boot time. In this case, you would mainly save I/O time copying a smaller kernel to RAM, and with a fast cpu, the extra decompression cost wouldn’t be too high.

However, if you take a closer look at this new feature, you will find that it is only supported on x86 and blackfin. Alain Knaff, the author of the original patches, did summit a patch for the arm architecture, but it didn’t make it this time. Upon my request, Alain posted an update to this arm patch. Unfortunately, decompressing the kernel no longer works after applying this patch. There seems to be something wrong with the decompression code… Stay tuned on the LKML to follow up this issue. Note that the blackfin maintainers took another approach, apparently. They didn’t include any decompression code on this architecture. Instead, they relied on the bootloader to take care of decompression. While this is simpler, at least from the kernel code point of view, this is not a satisfactory solution. It would be best if the arm kernel bootstrap code took care of this task, which would then work with any board and any bootloader.

Another interesting feature is the inclusion of the Microblaze architecture, a soft core cpu on Xilinx FPGAs. This MMU-less core has been supported for quite a long time by uClinux, and it’s good news that it is now supported in the mainline kernel. This guarantees that this cpu will indeed be maintained for a long time, and could thus be a good choice in your designs.

Other noteworthy features are support for threaded interrupt handlers (which shows that work to merge the real-time preempt patches is progressing), ftrace support in 32 and 64 bit powerpc, new tracers, and of course, several new embedded boards and many new device drivers.

As usual, full details can be found on the Linux Kernel Newbies website.

clink

Compacts directories by replacing duplicate files by symbolic links

clink is a simple Python script that replaces duplicate files in Unix filesystems by symbolic links.

clink saves space. It works particularly well with automatically generated directory structures, such as compiling toolchains.
clink uses relative links, making it possible to move processed directory structures
clink is fast. It reads each file only once and its runtime is mainly the time taken to read files.
clink is light. It consumes very little RAM. No problem to run it on huge filesystems!
clink is easy to use. Just download the script and run it!
clink is free. It is released under the terms of the GNU General Public License.

Usage

usage: clink [options] [files or directories]

Compacts folders by replacing identical files by symbolic links

options:
  --version      show program's version number and exit
  -h, --help     show this help message and exit
  -d, --dry-run  just reports identical files, doesn't make any change.

Screenshot

Downloads

Stable version:

clink-1.1.1,
signature
(Jun. 14, 2006)
Older releases:

clink-1.1,
signature
(Sep. 6, 2005)

clink-1.0,
signature
(Aug. 25, 2005)
ChangeLog

Here is the OpenPGP key used to generate the signatures.

How it works

clink reads all the files one by one, and computes their SHA (20 bytes) and MD5 (16 bytes) checksums. The trick to easily find identical files is a dictionary of files lists indexed by their SHA checksum.

All the files with the same SHA checksum are not immediately considered as identical. Their MD5 checksums and sizes are also compared then. There is an extremely low probability that files meeting all these 3 criteria at once are different. You are much more likely to face file corruption because of a hardware failure on your computer!

Hard links to the same contents are treated as regular files. Keeping one instance and replacing the others by symbolic links is harmless. Files implemented by symbolic links also have the advantage of not having their contents duplicated in tar archives.

Limitations and possible improvements

File permissions: clink just keeps one copy of duplicate files. The permissions of this file may be less strict than those of other duplicates. If permissions matter, enforce them by yourself after running clink.
Directory structure: even when entire directories are identical, clink just creates links between files. This is not fully optimal in this case, but it keeps clink simple.

Similar tools or alternatives

dupmerge2: replaces identical files by hardlinks.
finddup: finds identical files.

Demo: tiny qemu arm system with a DirectFB interface

A tiny embedded Linux system running on the qemu arm emulator, with a DirectFB interface, everything in 2.1 MB (including the kernel)!

Overview

This demo embedded Linux system has the following features:

Very easy to run demo, just 1 file to download and 1 command line to type!
Runs on qemu (easy to get for most GNU/Linux distributions), emulating an ARM Versatile PB board.
Available through a single file (kernel and root filesystem), sizing only 2.1 MB!
DirectFB graphical user interface.
Demonstrates the capabilities of qemu, the Linux kernel, BusyBox, DirectFB, and
shows the benefits of system size and boot time reduction techniques as advertised and supported by the CE Linux Forum.
License: GNU GPL for root filesystem scripts. Each software component has its own license.

How to run the demo

Make sure the qemu emulator is installed on your GNU/Linux distribution. The demo works with qemu 0.8.2 and beyond, but it may also work with earlier versions.
Download the vmlinuz-qemu-arm-2.6.20
binary.

Run the below command:

qemu-system-arm -M versatilepb -m 16 -kernel vmlinuz-qemu-arm-2.6.20 -append "clocksource=pit quiet rw"

When you reach the console prompt, you can try regular Unix commands but also the graphical demo:
```
run_demo
```

FAQ / Troubleshooting

Q: I get Could not initialize SDL - exiting when I try to run qemu.

That’s a qemu issue (qemu used the SDL library). Check that you can start graphical applications from your terminal (try xeyes or xterm for example). You may also need to check that you have name servers listed in /etc/resolv.conf. Anyway, you will find solutions for this issue on the Internet.

Screenshots

How to rebuild this demo

All the files needed to rebuild this demo are available here:

You can rebuild or upgrade the (Vanilla) kernel by using the given kernel configuration file.
The configuration file expects to find an initramfs source directory in ../rootfs, which
you can create by extracting the contents of the rootfs.tar.7z archive.
Of course, you can make changes to this root filesystem!

Tools and optimization techniques used in this demo

Software and development tools

The demo was built using Scratchbox, a fantastic development tool that makes cross-compiling transparent!
The demo includes BusyBox 1.4.1, an toolbox implementing most UNIX commands in a few hundreds of KB. In our case, BusyBox includes the most common commands (like a vi implementation), and only sizes 192 KB!
The root filesystem is shipped within the Linux kernel image, using the initramfs technique, which makes the kernel simpler and saves a dramatic amount of RAM (compared to an init ramdisk).
The demo is interfaced by DirectFB example programs (version 0.9.25, with DirectFB 1.0.0-rc4), which demonstrate the amazing capabilities of this library, created to meet the needs of embedded systems.

Size optimization techniques

The below optimization techniques were used to reduce the filesystem size from 74 MB to 3.3 MB (before compression in the Linux kernel image):

Removing development files: C headers and manual pages copied when installing tools and libraries, .a library files, gdbserver, strace, /usr/lib/libfakeroot, /usr/local/lib/pkgconfig…
Files not used by the demo programs: libstdc++, and any library or resource file.
Stripping and even super stripping (see sstrip) executables and libraries.
Reducing the kernel size using CONFIG_EMBEDDED switches, mainly from the
Linux Tiny project.

Techniques to reduce boot time

We used the below techniques to reduce boot time:

Disabled console output (quiet boot option, printk support was disabled anyway), which saves time scrolling the framebuffer console.
Use the Preset Loops per Jiffy technique to disable delay loop calculation, by feeding the kernel with a value measured in an earlier boot (lpj setting, which you may update according to the speed of your own workstation).

All these optimization techniques and other ones we haven’t tried yet are described either on the elinux.org Wiki or in our embedded Linux optimizations presentation.

Future work

We plan to implement a generic tool which would apply some of these techniques in an automatic way, to shrink an existing GNU/Linux or embedded Linux root filesystem without any loss in functionality. More in the next weeks or months!