Faster boot: starting Linux directly from AT91bootstrap

Reducing start-up time looks like one of the most discussed topics nowadays, for both embedded and desktop systems. Typically, the boot process consists of three steps: AT91SAM9263 CPU

  • First-stage bootloader
  • Second-stage bootloader
  • Linux kernel

The first-stage bootloader is often a tiny piece of code whose sole purpose is to bring the hardware in a state where it is able to execute more elaborate programs. On our testing board (CALAO TNY-A9260), it’s a piece of code the CPU stores in internal SRAM and its size is limited to 4Kib, which is a very small amount of space indeed. The second-stage bootloader often provides more advanced features, like downloading the kernel from the network, looking at the contents of the memory, and so on. On our board, this second-stage bootloader is the famous U-Boot.

One way of achieving a faster boot is to simply bypass the second-stage bootloader, and directly boot Linux from the first-stage bootloader. This first-stage bootloader here is AT91bootstrap, which is an open-source bootloader developed by Atmel for their AT91 ARM-based SoCs. While this approach is somewhat static, it’s suitable for production use when the needs are simple (like simply loading a kernel from NAND flash and booting it), and allows to effectively reduce the boot time by not loading U-Boot at all. On our testing board, that saves about 2s.

As we have the source, it’s rather easy to modify AT91bootstrap to suit our needs. To make things easier, we’ll boot using an existing U-Boot uImage. The only requirement is that it should be an uncompressed uImage, like the one automatically generated by make uImage when building the kernel (there’s not much point using such compressed uImage files on ARM anyway, as it is possible to build self-extractible compressed kernels on this platform).

Looking at the (shortened) main.c, the code that actually boots the kernel looks like this:

int main(void)
{
/* ================== 1st step: Hardware Initialization ================= */
/* Performs the hardware initialization */
hw_init();

/* Load from Nandflash in RAM */
load_nandflash(IMG_ADDRESS, IMG_SIZE, JUMP_ADDR);

/* Jump to the Image Address */
return JUMP_ADDR;
}

In the original source code, load_nandflash actually loads the second-stage bootloader, and then jumps directly to JUMP_ADDR (this value can be found in U-Boot as TEXT_BASE, in the board-specific file config.mk. This is the base address from which the program will be executed). Now, if we want to load the kernel directly instead of a second-level bootloader, we need to know a handful of values:

  • the kernel image address (we will reuse IMG_ADDRESS here, but one could
    imagine reading the actual image address from a fixed location in NAND)
  • the kernel size
  • the kernel load address
  • the kernel entry point

The last three values can be extracted from the uImage header. We will not hard-code the kernel size as it was previously the case (using IMG_SIZE), as this would lead to set a maximum size for the image and would force us to copy more data than necessary. All those values are stored as 32 bits bigendian in the header. Looking at the struct image_header declaration from image.h in the uboot-mkimage sources, we can see that the header structure is like this:

typedef struct image_header {
uint32_t    ih_magic;    /* Image Header Magic Number    */
uint32_t    ih_hcrc;    /* Image Header CRC Checksum    */
uint32_t    ih_time;    /* Image Creation Timestamp    */
uint32_t    ih_size;    /* Image Data Size        */
uint32_t    ih_load;    /* Data     Load  Address        */
uint32_t    ih_ep;        /* Entry Point Address        */
uint32_t    ih_dcrc;    /* Image Data CRC Checksum    */
uint8_t        ih_os;        /* Operating System        */
uint8_t        ih_arch;    /* CPU architecture        */
uint8_t        ih_type;    /* Image Type            */
uint8_t        ih_comp;    /* Compression Type        */
uint8_t        ih_name[IH_NMLEN];    /* Image Name        */
} image_header_t;

It’s quite easy to determine where the values we’re looking for actually are in the uImage header.

  • ih_size is the fourth member, hence we can find it at offset 12
  • ih_load and ih_ep are right after ih_size, and therefore can be found at offset 16 and 20.

A first call to load_nandflash is necessary to get those values. As the data we need are contained within the first 32 bytes, that’s all we need to load at first. However, some space is required in memory to actually store the data. The first-stage bootloader is running in internal SRAM, so we can pick any location we want in SDRAM. For the sake of simplicity, we’ll choose PHYS_SDRAM_BASEhere, which we define to the base address of the on-board SDRAM in the CPU address space. Then, a second call will be necessary to load the entire kernel image at the right load address.

Then all we need to do is:

#define be32_to_cpu(a) ((a)[0] << 24 | (a)[1] << 16 | (a)[2] << 8 | (a)[3])
#define PHYS_SDRAM_BASE 0x20000000

int main(void)
{
unsigned char *tmp;
unsigned long jump_addr;
unsigned long load_addr;
unsigned long size;

hw_init();

load_nandflash(IMG_ADDRESS, 0x20, PHYS_SDRAM_BASE);

/* Setup tmp so that we can read the kernel size */
tmp = PHYS_SDRAM_BASE + 12;
size = be32_to_cpu(tmp);

/* Now, load address */
tmp += 4;
load_addr = be32_to_cpu(tmp);

/* And finally, entry point */
tmp += 4;
jump_addr = be32_to_cpu(tmp);

/* Load the actual kernel */
load_nandflash(IMG_ADDRESS, size, load_addr - 0x40);

return jump_addr;
}

Note that the second call to load_nandflash could in theory be replaced by:

load_nandflash(IMG_ADDRESS + 0x40, size + 0x40, load_addr);

However, this will not work. What happens is that load_nandflash starts reading at an address aligned on a page boundary, so even when passing IMG_ADDRESS+0x40 as a first argument, reading will start at IMG_ADDRESS, leading to a failure (writes have to aligned on a page boundary, so it is safe to assume that IMG_ADDRESS is actually correctly aligned).

The above piece of code will silently fail if anything goes wrong, and does no checking at all – indeed, the binary size is very limited and we can’t afford to put more code than what is strictly necessary to boot the kernel.

ELC 2009 videos

My Colleague Thomas and I had the privilege to participate to the 2009 edition of the Embedded Linux Conference, which took place in San Francisco, on April 6-8. In spite of the weak economy, this event was once again a success. It attracted major developers from the embedded Linux community, as well as participants from all over the word.

Following the tradition, we are proud to release new videos about this event. These videos were shot by Satoru Ueda and Tim Bird (Sony), and by Thomas Petazzoni and Michael Opdenacker (Bootlin). For the first time, we used an HD camcorder to shoot some of the videos. A higher resolution allows to read the slides projected on the screen. As usual, the videos are released with a Creative Commons Attribution – ShareAlike 3.0 license.

Thomas and I found the following talks particularly interesting:

  • Ubiquitous Linux, by Dirk Hohndel
  • Embedded Linux and Mainline Kernel, by David Woodhouse
  • What are Interrupt Threads and How Do They Work?, by Reece Pollack
  • Visualizing Process Memory, by Matt Mackall
  • KProbes and Systemtap Status, by Tim Bird
  • Deploying LTTng on Exotic Embedded Architectures, by Mathieu Desnoyers
  • Embedded Linux on FPGAs for fun and profit, by Dr John Williams (Petalogix)
  • Linux on Embedded PowerPC porting guide, by Grant Likely
  • Understanding and writing an LLVM Compiler Backend, by Bruno Cardoso Lopes

You may be interested in watching the presentations we made and the BOFs we led:

  • Building Embedded Linux Systems with Buildroot, by Thomas Petazzoni. In these last months, Thomas has made big contributions to this build system.
  • Build tools BOF, by Thomas Petazzoni
  • Update on filesystems for flash storage, by Michael Opdenacker
  • System Size BOF, by Michael Opdenacker

Of course, lots of other talks were very interesting. See the whole list by yourself:

Linux 2.6.30 – New features for embedded systems

Interesting features in Linux 2.6.30 for embedded system developers

Linux 2.6.30 has been released almost 1 month ago and it’s high time to write a little about it. Like every new kernel release, it contains interesting features for embedded system developers.

The first feature that stands out is support for fastboot. Today, most devices are initialized in a sequential way. As scanning and probing the hardware often requires waiting for the devices to be ready, a significant amount of cpu time is wasted in delay loops. The fastboot infrastructure allows to run device initialization routines in parallel, keeping the cpu fully busy and thus reducing boot time in a significant way. Fasboot can be enabled by passing the fastboot parameter in the kernel command line. However, unless your embedded system uses PC hardware, don’t be surprised if you don’t get any boot time reduction yet. If you look at the code, you will see that the async_schedule function is only used by 4 drivers so far, mainly for disk drives. You can see that board support code and most drivers still need to be converted to this new infrastructure. Let’s hope this will progress in future kernel releases, bringing significant boot time savings to everyone.

Tomoyo LinuxLinux 2.6.30 also features the inclusion of Tomoyo Linux, a lightweight Mandatory Access Control system developed by NTT. According to presentations I saw quite a long time ago, Tomoyo can be used as an alternative to SELinux, and it just consumes a very reasonable amount of RAM and storage space. It should interest people making embedded devices which are always connected to the network, and need strong security.

Another nice feature is support for kernel images compressed with bzip2 and lzma, and not just with zlib as it’s been the case of ages. The bzip2 and lzma compressors allow to reduce the size of a compressed kernel in a significant way. This should appeal to everyone interested in saving a few hundreds of kilobytes of storage space. Just beware that decompressing these formats requires more CPU resources, so there may be a price to pay in terms of boot time if you have a slow cpu, like in many embedded systems. On the other hand, if you have a fast cpu and rather slow I/O, like in a PC, you may also see a reduction in boot time. In this case, you would mainly save I/O time copying a smaller kernel to RAM, and with a fast cpu, the extra decompression cost wouldn’t be too high.

However, if you take a closer look at this new feature, you will find that it is only supported on x86 and blackfin. Alain Knaff, the author of the original patches, did summit a patch for the arm architecture, but it didn’t make it this time. Upon my request, Alain posted an update to this arm patch. Unfortunately, decompressing the kernel no longer works after applying this patch. There seems to be something wrong with the decompression code… Stay tuned on the LKML to follow up this issue. Note that the blackfin maintainers took another approach, apparently. They didn’t include any decompression code on this architecture. Instead, they relied on the bootloader to take care of decompression. While this is simpler, at least from the kernel code point of view, this is not a satisfactory solution. It would be best if the arm kernel bootstrap code took care of this task, which would then work with any board and any bootloader.

Another interesting feature is the inclusion of the Microblaze architecture, a soft core cpu on Xilinx FPGAs. This MMU-less core has been supported for quite a long time by uClinux, and it’s good news that it is now supported in the mainline kernel. This guarantees that this cpu will indeed be maintained for a long time, and could thus be a good choice in your designs.

Other noteworthy features are support for threaded interrupt handlers (which shows that work to merge the real-time preempt patches is progressing), ftrace support in 32 and 64 bit powerpc, new tracers, and of course, several new embedded boards and many new device drivers.

As usual, full details can be found on the Linux Kernel Newbies website.

Linux kernel 2.6.29 – New features for embedded users

Tuz Linux logoThe 2.6.29 version of the Linux kernel has just been released by Linus Torvalds. Like all kernel releases, this new version offers a number of interesting new features.

For embedded users, the most important new feature is certainly the inclusion of Squashfs, a read-only compressed filesystem. This filesystem is very well-suited to store the immutable parts of an embedded system (applications, libraries, static data, etc.), and replaces the old cramfs filesystem which had some strong limitations (file size, filesystem size and limited compression).

Squashfs is a block filesystem, but since it is read-only, you can also use it on flash partitions, through the mtdblock driver. It’s fine as you just write the filesystem image once. Don’t hesitate to try it to get the best performance out of your flash partitions. Ideally, you should even use it on top of UBI, which would transparently allow the read-only parts of your filesystem to participate to wear-leveling. See sections about filesystems in our embedded Linux training materials for details.

This new release also adds Fastboot support, at least for scsi probes and libata port scan. This is a step forward to reducing boot time, which is often critical in embedded systems.

2.6.29 also allows stripping of generated symbols under CONFIG_KALLSYMS_ALL, saving up to 200 KB for kernels built with this feature. This is nice for embedded systems with very little RAM, which still need this feature during development.

Another new feature of this kernel is the support for the Samsung S3C64XX CPUs. Of course, a lot of new boards and devices are supported, such as the framebuffer of the i.MX 31 CPU, or the SMSC LAN911x and LAN912x Ethernet controllers.

The other major features of this new kernel, not necessarily interesting for embedded systems are the inclusion of the btrfs filesystem, the inclusion of the kernel modesetting code, support for WiMAX, scalability improvements (support of 4096 CPUs!), filesystem freezing capability, filesystem improvements and many new drivers.

For more details on the 2.6.29 improvements, the best resource is certainly the human-readable changelog written by the KernelNewbies.org community.

USB-Ethernet device for Linux

Useful device when you work with an embedded development board

For our Embedded Linux training sessions, I was looking for a USB to Ethernet device. Since Linux supported devices are often difficult to find, I’m glad to share my investigations here.

When you use an embedded development board, you must connect it to your computer with an Ethernet cable, for example to transfer a new kernel image to U-boot through tftp, or to make your board boot on a directory on your workstation, exported with NFS.

You could connect both the board and computer to your local network, which would still allow your computer to connect to the Internet while you work with the board. However, you may create conflicts on your local network if you don’t use DHCP to assign an IP address to your board (if your DHCP server even accepts this new device on the network). In a training environment, you are also likely to run out of Ethernet outlets in the training room if you have to connect 8 such boards. Hence, a direct connection between the board and your workstation’s Ethernet port is often the most convenient solution.

If you can’t use WIFI to keep your computer connected to the outside world, a good solution is to add an extra Ethernet port to your computer by using an USB-to-Ethernet device.

My colleague Thomas and I started looking for such devices that would be supported by Linux. Here are a few that we found:

  • D-Link DUB-E100. Supported by the USB_NET_AX8817X driver. However, this product is bulky and quite heavy (at least 100 grams).
  • TRENDnet TU2-E100. Supported by the same driver, but still bulk (August 2015 update: now replaced by a more recent version, now almost as small as the Apple one, and supported out of the box in Linux. See the comment about this device.)
  • Linksys USB 200m. Supported by the same Linux driver and has a much more acceptable size, but customer reviews complain that its connector can break easily.
  • Apple USB Ethernet Adapter. This should be working out of the box in Linux. At least the MB442Z/A or MC704ZM/A references did, but Apple now sells a new reference that might have a different chipset. It is beautiful, small and light. Support for this device (at least the references I mentioned) was added to Linux 2.6.26 through the same driver. You should be able to use it in recent distros.

Apple USB to EthernetSo, I recommend the Apple device. I event posted a comment on the Apple Store, titled “Perfect for Linux”! I hope the Apple droids won’t censor it. Don’t hesitate to buy it, so that we can confirm that the latest reference is supported too.

I can’t tell whether this could happen with Apple. This was the first Apple device I ever bought…

clink

Compacts directories by replacing duplicate files by symbolic links

clink is a simple Python script that replaces duplicate files in Unix filesystems by symbolic links.

  • clink saves space. It works particularly well with automatically generated directory structures, such as compiling toolchains.
  • clink uses relative links, making it possible to move processed directory structures
  • clink is fast. It reads each file only once and its runtime is mainly the time taken to read files.
  • clink is light. It consumes very little RAM. No problem to run it on huge filesystems!
  • clink is easy to use. Just download the script and run it!
  • clink is free. It is released under the terms of the GNU General Public License.
clink logo

Usage

usage: clink [options] [files or directories]

Compacts folders by replacing identical files by symbolic links

options:
  --version      show program's version number and exit
  -h, --help     show this help message and exit
  -d, --dry-run  just reports identical files, doesn't make any change.

Screenshot

clink screenshot

Downloads

Here is the OpenPGP key used to generate the signatures.

How it works

clink reads all the files one by one, and computes their SHA (20 bytes) and MD5 (16 bytes) checksums. The trick to easily find identical files is a dictionary of files lists indexed by their SHA checksum.

All the files with the same SHA checksum are not immediately considered as identical. Their MD5 checksums and sizes are also compared then. There is an extremely low probability that files meeting all these 3 criteria at once are different. You are much more likely to face file corruption because of a hardware failure on your computer!

Hard links to the same contents are treated as regular files. Keeping one instance and replacing the others by symbolic links is harmless. Files implemented by symbolic links also have the advantage of not having their contents duplicated in tar archives.

Limitations and possible improvements

  • File permissions: clink just keeps one copy of duplicate files. The permissions of this file may be less strict than those of other duplicates. If permissions matter, enforce them by yourself after running clink.
  • Directory structure: even when entire directories are identical, clink just creates links between files. This is not fully optimal in this case, but it keeps clink simple.

Similar tools or alternatives

  • dupmerge2: replaces identical files by hardlinks.
  • finddup: finds identical files.

Demo: tiny qemu arm system with a DirectFB interface

A tiny embedded Linux system running on the qemu arm emulator, with a DirectFB interface, everything in 2.1 MB (including the kernel)!

Overview

This demo embedded Linux system has the following features:

  • Very easy to run demo, just 1 file to download and 1 command line to type!
  • Runs on qemu (easy to get for most GNU/Linux distributions), emulating an ARM Versatile PB board.
  • Available through a single file (kernel and root filesystem), sizing only 2.1 MB!
  • DirectFB graphical user interface.
  • Demonstrates the capabilities of qemu, the Linux kernel, BusyBox, DirectFB, and
    shows the benefits of system size and boot time reduction techniques as advertised and supported by the CE Linux Forum.
  • License: GNU GPL for root filesystem scripts. Each software component has its own license.

How to run the demo

  • Make sure the qemu emulator is installed on your GNU/Linux distribution. The demo works with qemu 0.8.2 and beyond, but it may also work with earlier versions.
  • Download the vmlinuz-qemu-arm-2.6.20
    binary.
  • Run the below command:
    qemu-system-arm -M versatilepb -m 16 -kernel vmlinuz-qemu-arm-2.6.20 -append "clocksource=pit quiet rw"
  • When you reach the console prompt, you can try regular Unix commands but also the graphical demo:
    run_demo
    

FAQ / Troubleshooting

  • Q: I get Could not initialize SDL - exiting when I try to run qemu.

    That’s a qemu issue (qemu used the SDL library). Check that you can start graphical applications from your terminal (try xeyes or xterm for example). You may also need to check that you have name servers listed in /etc/resolv.conf. Anyway, you will find solutions for this issue on the Internet.

Screenshots

console screenshot df_andi program screenshot
df_dok program screenshot df_dok2 program screenshot
df_neo program screenshot df_input program screenshot

How to rebuild this demo

All the files needed to rebuild this demo are available here:

  • You can rebuild or upgrade the (Vanilla) kernel by using the given kernel configuration file.
  • The configuration file expects to find an initramfs source directory in ../rootfs, which
    you can create by extracting the contents of the rootfs.tar.7z archive.
  • Of course, you can make changes to this root filesystem!

Tools and optimization techniques used in this demo

Software and development tools

  • The demo was built using Scratchbox, a fantastic development tool that makes cross-compiling transparent!
  • The demo includes BusyBox 1.4.1, an toolbox implementing most UNIX commands in a few hundreds of KB. In our case, BusyBox includes the most common commands (like a vi implementation), and only sizes 192 KB!
  • The root filesystem is shipped within the Linux kernel image, using the initramfs technique, which makes the kernel simpler and saves a dramatic amount of RAM (compared to an init ramdisk).
  • The demo is interfaced by DirectFB example programs (version 0.9.25, with DirectFB 1.0.0-rc4), which demonstrate the amazing capabilities of this library, created to meet the needs of embedded systems.

Size optimization techniques

The below optimization techniques were used to reduce the filesystem size from 74 MB to 3.3 MB (before compression in the Linux kernel image):

  • Removing development files: C headers and manual pages copied when installing tools and libraries, .a library files, gdbserver, strace, /usr/lib/libfakeroot, /usr/local/lib/pkgconfig
  • Files not used by the demo programs: libstdc++, and any library or resource file.
  • Stripping and even super stripping (see sstrip) executables and libraries.
  • Reducing the kernel size using CONFIG_EMBEDDED switches, mainly from the
    Linux Tiny project.

Techniques to reduce boot time

We used the below techniques to reduce boot time:

  • Disabled console output (quiet boot option, printk support was disabled anyway), which saves time scrolling the framebuffer console.
  • Use the Preset Loops per Jiffy technique to disable delay loop calculation, by feeding the kernel with a value measured in an earlier boot (lpj setting, which you may update according to the speed of your own workstation).

All these optimization techniques and other ones we haven’t tried yet are described either on the elinux.org Wiki or in our embedded Linux optimizations presentation.

Future work

We plan to implement a generic tool which would apply some of these techniques in an automatic way, to shrink an existing GNU/Linux or embedded Linux root filesystem without any loss in functionality. More in the next weeks or months!

Linux USB drivers

Learning how to write USB device drivers for Linux

Bootlin is proud to release a new set of training slides from its embedded Linux training materials. These new ones cover writing USB device drivers for Linux.

Like everything we create, these new materials are released to the user and developer community under a free license. They can be freely downloaded, copied, distributed or even modified according to the terms of the Creative Commons Attribution-ShareAlike 2.5 license.

Embedded Linux and Ecology

Embedded Linux contributions to the Linux Ecology HOWTO.

Bootlin has contributed major updates to the Linux Ecology HOWTO, a Linux Documentation Project document that gathers ideas and techniques for using Linux in an environmentally friendly way.

In particular, Bootlin took advantage of its experience with embedded Linux system development to add new techniques which can reduce power consumption or make it possible to extend the lifetime of old systems with limited resources.

Bootlin also contributed an overview presentation on this HOWTO. The latest HOWTO version with our updates (waiting for the next official release) can also be found on the same page.