Embedded Linux optimizations

This presentation is a collection of ideas and resources for optimizing the Linux kernel and applications for speed, size, RAM, power and cost. Most of them are gathered and supported by the CE Linux Forum projects. Interested embedded system developers are invited to contribute benchmarks, testing, code and more ideas to these projects.

This document is used in our training sessions. It is available under the Creative Commons BY-SA license (see details and other documents).

It is available in several formats:

Thanks to people who helped, sent corrections or suggestions: Tim Bird, Robert P.J. Day

Ogg/Theora video mini howto

How to make your own Ogg/Theora videos

Here is how we created the free conference videos we are sharing with you.

Our goal is to show you that it is very easy and pretty cheap to create Ogg/Theora videos using only Free Software tools. It would be great if more people shared their experience, in particular when they attend interesting presentations!

License


Copyright 2006-2008, Bootlin.
This mini-howto is released under the terms of the Creative Commons Attribution-ShareAlike 2.5 license.

Requirements

A mini-DV camcorder.

Such a device costs approximately 500 US dollars / euros. Mini-DV tapes cost about 5 US dollars / euros.

Note that other devices may be used, such as DVD or hard disk camcorders.

Hard disk camcorders are not a very good solution, because video is stored with a high compression rate (MPEG-4 format). You will not get the best results when encoding from MPEG-4 to Theora, because you will be starting from low bitrate input compressed with another codec.

A DVD camcorder is fine (MPEG-2 compression), because the input quality is much better. The best is still DV input, which has very little compression and lets you get the best out of the Theora codec.

Camcorder accessories

A tripod is a must-have. Without one, your image will not be very stable (even with image stabilization), and above all, you will be exhausted after one hour.

An external microphone is nice to have, but not mandatory at all. You still get pretty good quality audio with the built-in one. So, if you are satisfied with the audio that you get, you do not have to buy such a microphone. However, the best solution for top quality audio is to connect your audio input to the room audio system (if any, and if the speaker is using a microphone).

Computer connectivity

You need a GNU/Linux computer with FireWire input (aka IEEE 1394 or iLink). If you have a notebook with a PCMCIA adaptor, your best option is to get a FireWire PCMCIA card which doesn't need any special driver. This usually means that it is compliant with the 1394 OHCI standard, which is fully supported by Linux. Note that recent distributions (at least Fedora Core) automatically load the right drivers when such a card is plugged in.

It may also be possible to use a USB connection to get the video from the camcorder. We just do not know how yet. Any resources are welcome!

Storage

DV files are huge (roughly 15 GB per hour). As our workflow involves intermediate processing steps, intermediate files of similar size will be created. Hence, you will need at least 30 GB of free space to process one video. It's much better to have 100 GB or more to store and process several videos in a row. For notebook owners, external hard drives (typically high-speed USB 2.0) are your friends.


Shooting the video

Before or right after filming, make sure that you ask the speaker(s) for permission to publish the video! Make sure you mention the license that you are going to use.

Video capture

Connect the camera to the computer with the FireWire cable.

If you are using a PCMCIA FireWire adaptor, all the needed modules should have been loaded automatically when you inserted the card.

If you have a legacy FireWire input, you may have to do a few things by hand (logged as root):

modprobe dv1394
chmod a+rw /dev/dv1394/0

You will now use dvgrab to get the video through the FireWire link and save it to a file. This tool is shipped by most distributions.

dvgrab --size 0 --format raw <output-file-prefix>

Note: --size 0 means that the output is not split into multiple smaller files when it exceeds a given size.

Now that you’re done, let’s assume that you created a video.dv file.

Video trimming

When you reuse tapes, it's hard to avoid capturing video frames from previous recordings at the beginning or at the end. Before compressing, you first have to trim out these unwanted frames.

This is pretty easy to do with the kino tool, available in all recent distros.

Make sure you export the trimmed video in DV format, to avoid losing quality.

We will soon post kino usage screenshots on this page, to get you started faster with kino.

Quick Ogg/Theora generation

That’s very easy to do. Get the latest version of the ffmpeg2theora package.

ffmpeg2theora -o video.ogv video.dv

You’re done!

You can use the -v and -a parameters to control video and audio quality. The defaults (5 and 2) should be fine for average quality requirements. With -v 7, we already get very good video quality, but the output file size is roughly double. As far as audio quality is concerned, keep in mind the source quality. Unless your audio input is high quality (audio in directly connected to the conference room sound system), there is no need for high bitrate audio compression (-a setting greater than 4).
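
For example, for higher quality output from the same video.dv file as above:

ffmpeg2theora -v 7 -a 3 -o video.ogv video.dv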

Deinterlacing?

If the output video quality is poor, it could be because your video needs deinterlacing. In particular, this happens when you record your video in long play mode. Interlaced video is very easy to identify: you just need to find a sequence with motion (camcorder or character motion). Pause the video and interlaced lines will show up.

So, if your source video is interlaced, use the --deinterlace parameter of ffmpeg2theora:

ffmpeg2theora --deinterlace -o video.ogv video.dv

Denoising the video

Look carefully at the generated Ogg/Theora video. Do you see MPEG-like squares moving on surfaces which shouldn’t change at all (walls, sky, board, etc.)?

If this happens, this means that your original video contained noise. This is very frequent with digital camcorders, in particular in low light conditions (when you amplify a weak signal, noise gets more significant). Such noise, though it is not obvious on the source video, can get amplified in the compression process.

Hence, it's best to remove noise before compressing, so that pixels in still surfaces do not change at all in the source video. Follow the below instructions and compare the output Ogg/Theora video size. You will find that the output file is smaller than what you got by just running ffmpeg2theora on the raw DV video.

Fortunately, ffmpeg2theora now supports denoising filters: we contracted Jan Gerber, its developer, to add such filters to his tool. First make sure you have at least version 0.20 (otherwise, download the latest version).

The implementation is based on ffmpeg / mplayer‘s postproc library. Available filter settings are detailed by ffmpeg2theora --pp help, or can be found by looking for tmpnoise in mplayer’s manual page. Filter settings are not easy to choose, however. For your convenience, here are the settings we chose after multiple experiments: --pp de,tn:256:512:1024. At least with our videos, they produce good quality output without significant side effects.
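
For example, to encode with these denoising settings:

ffmpeg2theora --pp de,tn:256:512:1024 -o video.ogv video.dv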

Ogg/Theora video with metatags

It’s possible and useful to add metainformation (title, author, location, license) to the ogv video files.

This can be done thanks to ffmpeg2theora parameters:

ffmpeg2theora -a 3 -v 7 --pp de,tn:256:512:1024 \
--artist "Michael Opdenacker" --title "Fosdem 2006" \
--date "February 2006" --location "ULB, Brussels, Belgium" \
--organization "Bootlin (https://bootlin.com)" \
--copyright "Copyright 2006, Michael Opdenacker" \
--license "Creative Commons Attribution-ShareAlike 2.5" \
-o video.ogv video.dv

If you need to mass encode several videos in a script, it is now possible to add the metatags by hand after encoding. This can be done with the TagTheora tool.
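
For instance, a simple batch-encoding loop could look like the below sketch, reusing the quality and denoising settings discussed above (adapt them to your needs):

# encode every DV file in the current directory to Ogg/Theora
for f in *.dv
do
    ffmpeg2theora -a 3 -v 7 --pp de,tn:256:512:1024 -o "${f%.dv}.ogv" "$f"
done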

Going further

Run ffmpeg2theora --help for details about more possibilities like live encoding and streaming.

Thanks

  • To Diego Rondini, for letting us know about TagTheora

RAID + Xen on Ubuntu Edgy

Using Xen on Ubuntu 6.10 (Edgy), using RAID storage.

License


Copyright 2006, Bootlin.
This mini-howto is released under the terms of the Creative Commons Attribution-ShareAlike 2.5 license.

Credits

Thanks to:

  • Sébastien Chaumat, for making me want to try Xen.
  • Eric-Olivier Lamey, for sending feedback and making useful suggestions.

Introduction

In this document, we share our experience using Xen on Ubuntu 6.10 (Edgy), using RAID storage.

Ubuntu 6.10 was used because it was the first Ubuntu version with Xen support. In earlier Ubuntu versions (in particular 6.06 LTS), you had to install Xen from source, and make manual C library tweaks (for TLS support issues). The advantage of packages is that you can easily learn about and deploy security updates!

Another reason for using version 6.10 is that it uses the Linux kernel version supported by the latest Xen version (3.0.3 when we installed it). With Ubuntu 6.06, we would have needed to upgrade the kernel version, or to use an earlier Xen version.

Kernel configuration files are provided for the Via C7 based Dedibox servers available in France. Of course, these instructions should be useful for anyone trying to use Xen, whatever the server hardware, and even if RAID storage is not used.

We wanted to share our experience because we spent a significant amount of time looking for correct kernel configuration settings, bootloader settings (in particular for RAID), as well as Xen network and tuning settings. We hope that this document will save some of your time, in particular if you have a Dedibox server!

Note that this HOWTO may not give enough details for inexperienced system administrators, who are unlikely to fiddle with Xen and RAID anyway.

Ubuntu 6.10 installation

For Dedibox users, we chose the below partition settings in the Dedibox installation interface:

  • 1st partition: /boot, 256 MB, RAID1, ext3
  • 2nd partition: /, 4096 MB, RAID1, ext3
  • 3rd partition: /xen, 146225 MB, RAID1, ext3
  • 4th partition: Linux swap, 2048 MB (2 separate partitions on sda and sdb)

Note that when using Xen, the swap space will only be used by Domain0, which is not supposed to run any services except an ssh server. The 2048 MB size is definitely much more than needed; 512 MB may be more than enough. We kept 2048 MB in case we decide to stop using Xen and run all our services on a single, real server.

By default, the Dedibox or Edgy installation only took the first swap partition into account. However, Linux fully supports several swap partitions (max 2 GB per partition). If these partitions are on different disks, as in our case, this is even better for performance, as Linux can access both partitions in parallel.

To make the second swap partition work, we added it to the /etc/fstab file, ran mkswap /dev/sdb4, and then rebooted to check that everything was correctly set up.
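
For reference, here is the line we added to /etc/fstab for the second swap partition:

/dev/sdb4       none            swap    sw              0       0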

For Dedibox users, note that the server doesn’t seem to boot if you do not choose a separate boot partition.

Adding packages

Uncomment all universe lines in /etc/apt/sources.list.

Install the packages we are going to need in the next sections:

apt-get update
apt-get install xen-hypervisor-3.0-i386 xen-source-2.6.17 xen-tools xen-utils-3.0 libc6-xen
apt-get install build-essential libncurses5-dev ccache

Kernel compiling

Of course, you may choose to use a generic Linux kernel provided by Ubuntu (such as xen-image-xen0-2.6.17-6-server-xen0). Follow the below instructions if you want to tune your kernel according to your exact hardware.

cd /usr/src
tar jxf xen-source-2.6.17.tar.bz2
cd xen-source

To speed up recompiling, you can add ccache support in the kernel Makefile. Change the lines defining HOSTCC and CC:

HOSTCC          = ccache gcc
CC              = ccache $(CROSS_COMPILE)gcc

Dedibox users can use our custom configuration file. We derived it from the Ubuntu Xen kernel configuration and applied the settings used in the official Dedibox kernel configurations. You may still check that you have all the features you need, as we removed the features we do not use at the moment (such as NFS, ReiserFS, FAT…).

Compile your kernel:

make
make install
make modules_install

Bootloader configuration

The Grub bootloader configuration file needed to be updated to be able to load the Xen hypervisor kernel. Here’s what we added to our /boot/grub/menu.lst file before the ## ## End Default Options ## line:

title XEN/2.6.17-bootlin
root (hd0,0)
kernel /xen-3.0-i386.gz
module /vmlinuz-2.6.17.11-ubuntu1-xen0-dedibox-bootlin1 root=/dev/md1 md=1,/dev/sda2,/dev/sdb2 ro quiet splash

You can see that Grub loads files from the first raw partition (not using RAID), while Linux directly boots from the RAID device. In this case, file paths are relative to the /boot partition. You will have to adjust these paths in case /boot is part of the / partition.

Note that there is no clear documentation on the minimum amount of memory needed for dom0, the privileged Xen domain, from which you are going to control the standard Xen domains. We found that many sites use 128 MB, but others seem to be working fine with 64 MB. Just try by yourself if physical RAM is scarce!
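
For example, to give 128 MB to dom0, you can append the dom0_mem parameter to the hypervisor line of the Grub entry shown above (the value is in KB when no unit suffix is given):

kernel /xen-3.0-i386.gz dom0_mem=131072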

Testing dom0

You are now ready to test dom0!

Just reboot and hope that your new kernel boots well. In case you administer a remote server and this doesn't work, debugging is tricky (believe us!), because you have no access to the system console. If this happens to you, we advise you to start from a working configuration, like ours or the default Ubuntu kernel, and apply your changes little by little.

Once you get a working shell, you can run top and check that you are no longer running on your regular server: in particular, you will only see the amount of RAM that you gave to dom0 in the Grub configuration file. You can also run uname -r to check that you are running your new kernel, and xm info to get more information about the Xen hypervisor running on your machine.
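
In short:

uname -r    # should report your custom Xen dom0 kernel
xm info     # hypervisor version, total memory, and more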

Configuring regular Xen domains

To configure networking between dom0 and regular domains, we decided to use regular routing and NAT. Xen uses bridging by default, but we are less familiar with this method.

Set this in the /etc/xen/xend-config.sxp file, by making sure that bridge settings are commented out, and by uncommenting the below 2 lines:

(network-script network-nat)
(vif-script     vif-nat)

We also commented out domain migration settings, as we are not using them (yet).

Creating a new domU domain

Create a 1 GB (for example) sparse file for dom1 system files and format it:

dd if=/dev/zero of=/xen/dom1.img bs=1024k seek=1024 count=0
mkfs.ext3 -F /xen/dom1.img

Sparse files are special files containing holes that read as zeros. No space is used on real storage until blocks are actually written. In a few words, they only consume the size of their actual contents.
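
You can check this on the image we just created: ls -l reports the apparent size, while du reports the actual disk usage:

ls -l /xen/dom1.img    # apparent size: 1 GB
du -h /xen/dom1.img    # actual space used: almost nothing yet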

Create a swap partition image file:

dd if=/dev/zero of=/xen/dom1-swap.img bs=128M count=1
mkswap /xen/dom1-swap.img

We also created a special 8 GB filesystem for data files for dom1 (not belonging to the operating system):

dd if=/dev/zero of=/xen/dom1-data.img bs=1024k seek=8192 count=0
mkfs.ext3 -F /xen/dom1-data.img

Populate the root filesystem for dom1:

mkdir /mnt/dom1
mount -o loop /xen/dom1.img /mnt/dom1
debootstrap edgy /mnt/dom1

Copy the kernel modules too:

rsync -a /lib/modules/2.6.17.11-ubuntu1-xen0-dedibox-bootlin1/ /mnt/dom1/lib/modules/2.6.17.11-ubuntu1-xen0-dedibox-bootlin1/

Copy the /etc/apt/sources.list and /etc/resolv.conf files too.
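
For example:

cp /etc/apt/sources.list /mnt/dom1/etc/apt/
cp /etc/resolv.conf /mnt/dom1/etc/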

Configuring domU

Declare the new domain in a /etc/xen/dom1.cfg file:

kernel = "/boot/vmlinuz-2.6.17.11-ubuntu1-xen0-dedibox-bootlin1"
memory = 96
name = "dom1"
vcpus = 1
disk = [ 'file:/xen/dom1.img,ioemu:hda1,w','file:/xen/dom1-data.img,ioemu:hda2,w','file:/xen/dom1-swap.img,ioemu:hda3,w' ]
root = "/dev/hda1 ro"

vif = [ 'ip=10.0.0.1' ]
dhcp = "off"
hostname = "dom1.bootlin.com"
ip = "10.0.0.1"
netmask = "255.0.0.0"
gateway = "10.0.0.254"

Of course, replace dom1 by a meaningful name!

Here, you can see that we assign 10.0.0.x to domx domains.

You can also see that we are using the same kernel as the one we use for dom0. This is not required at all, and we will soon propose a slightly lighter domU kernel without things which are not needed in unprivileged domains (no RAID, no netfilter…).

Booting and configuring domU

Start the new virtual machine and access a console:

xm create /etc/xen/dom1.cfg
xm console dom1

We gave you both commands, as the second one is useful to access the console of an already running domain. However, you can create a domain and access its console in a single command, using the -c option:

xm create -c /etc/xen/dom1.cfg

You are connected to your new virtual system, with minimum Ubuntu server packages. There are still a few things to adjust though:

Set the root password, otherwise there is no password!

passwd

Fill up the /etc/fstab file as follows:

# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/hda1       /       ext3    defaults,errors=remount-ro   0  1
/dev/hda2       /data   ext3    defaults        0       2
/dev/hda3       none    swap    sw      0       0

Fill up the /etc/network/interfaces file as follows:

# Used by ifup(8) and ifdown(8). See the interfaces(5) manpage or
# /usr/share/doc/ifupdown/examples for more information.

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
        address 10.0.0.1
        netmask 255.0.0.0
        broadcast 10.0.0.255
        gateway 10.0.0.128

Edit the /etc/hosts file as follows:

127.0.0.1       localhost localhost.localdomain
127.0.0.1       dom1

Fill up the /etc/hostname file as follows:

dom1

Exit the dom1 console by typing exit followed by Ctrl ], and reboot dom1:

xm reboot dom1

Back to the dom1 console, check networking with dom0 and with the outside world:

ping 10.0.0.128
ping bootlin.com

Install extra packages:

apt-get install libc6-xen deborphan psutils wget rsync openssh-client

Remove packages we will not need in the Ubuntu distribution:

apt-get remove --purge wireless-tools wpasupplicant pcmciautils libusb-0.1-4 alsa-base alsa-utils dhcp3-common dmidecode linux-sound-base x11-common eject libconsole aptitude groff-base

Keep your current configuration as a reference starting point for other domU domains you will create:

cp /xen/dom1.img /xen/domu.img.ref

Now we are going to create NAT (Network Address Translation) rules to forward incoming Internet packets to the right server on your virtual local network. Write your own rules in /etc/network/if-up.d/iptables in dom0 (make sure you make this file executable!). Here’s an example for an http server and a BitTorrent seed server.

#!/bin/sh

### Port Forwarding ###
iptables -A PREROUTING -t nat -p tcp -i eth0 --dport 80 -j DNAT --to 10.0.0.1
iptables -A PREROUTING -t nat -p tcp -i eth0 --dport 6881:6889 -j DNAT --to 10.0.0.2

Now, make sure your domain is started when your server is started, by adding a link in /etc/xen/auto/:

cd /etc/xen/auto/
ln -s ../dom1.cfg .

Ubuntu Edgy fixes

tty fixes

If you only use xm console to connect to dom1, and do not plan to use ssh, you will notice that there are 5 getty processes running all the time, waiting for a terminal connection on /dev/tty2 to /dev/tty6, while you just use /dev/tty0.

This not only wastes some CPU cycles and a few MB of RAM: these getty processes keep failing and get respawned by the upstart init process. This causes a lot of writes to the /var/log/daemon.log file, which could eventually fill up your root filesystem.

Here’s a quick fix for this:

rm /etc/event.d/tty2
rm /etc/event.d/tty3
rm /etc/event.d/tty4
rm /etc/event.d/tty5
rm /etc/event.d/tty6

Note that you may need to run this again each time the initscripts package is updated.

Configuring CPU sharing

With Xen, it's not only possible to set the amount of physical RAM used by each domain. You can also set a kind of CPU priority for more critical domains needing fast response times.

Of course, you could also set the vcpus setting to values greater than one, but this has several drawbacks:

  • You would need an SMP enabled kernel.
  • This method only allows discrete settings, and you would need quite a big number of vcpus to get a 45% / 65% CPU share, for example.
  • There is no easy way to add more CPU power to a domain without rebooting it (you may try Linux CPU hotplugging capabilities, though).

Fortunately, the "xm sched-credit" command lets you change the weight and cap settings of each domain. See the Credit based CPU scheduler page for details about this command.

What's good is that you can change these settings on the fly, according to actual server load, without having to restart the corresponding domain. Unfortunately, Xen 3.0.3 doesn't let you configure initial weight and cap settings at domain creation time through the domain configuration file (such a feature should be available in Xen 3.0.4, as a patch has been committed to the development version). Until then, here's an example implementation:

Create a /etc/init.d/xen-sched-credits file with execute permissions:

#!/bin/sh
# Sets initial weight and cap for xen domains
# In Xen 3.0.4, it should be possible to set
# these values in the domain configuration files

# dom1 domain
xm sched-credit -d dom1 -w 64
xm sched-credit -d dom1 -c 25

# dom2 domain
xm sched-credit -d dom2 -w 256
xm sched-credit -d dom2 -c 50

Add this file to init runlevel 2 in dom0:

cd /etc/rc2.d/
ln -s ../init.d/xen-sched-credits S99xen-sched-credits

Useful tips

Making changes in a domain

What's good with Xen, as opposed to a real server, is that you don't need the domain to be running to make changes, even to upgrade packages!

mkdir /mnt/dom1
mount -o loop /xen/dom1.img /mnt/dom1
chroot /mnt/dom1
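
From the chroot, you can then run standard commands, for example to upgrade the domain's packages, before cleaning up:

apt-get update          # inside the chroot
apt-get upgrade
exit                    # leave the chroot
umount /mnt/dom1        # back in dom0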

Further tasks

Congratulations! You are now ready to create more domains, and fine tune Xen according to your exact requirements.

Have fun! We hope that this HOWTO saved some of your time, anticipated some of your questions, and made you feel like sharing what you learn, in your turn, too.

Useful links

On-line resources that we used to configure Xen:

LXR websites

Important update: we are proposing a more modern engine to index big projects like the Linux kernel. Check out Elixir.

LXR (Linux Cross Reference) is a great source indexing tool, particularly useful for big projects like the Linux kernel.

Here are Internet sites that make it easy to explore the Linux sources thanks to the LXR engine:

Please let us know if you know about other LXR sites!

You can also browse Linus Torvalds’ GIT tree.

COOol

Checks LibreOffice / OpenOffice.org documents for broken links

cOOol is a simple Python script that looks for broken hyperlinks in LibreOffice / OpenOffice.org documents.

  • cOOol only supports documents in the OpenDocument format.
  • cOOol is fast: it doesn’t start LibreOffice / OpenOffice.org and runs link checks in parallel threads.
  • cOOol supports most kinds of hyperlinks, including links within the documents.
  • cOOol is easy to use. Just download the script and run it!
  • cOOol is free. It is released under the terms of the GNU General Public License.

Here is why an automatic link checker for your documents is useful:

  • External references can be a very valuable part of your documents. Broken links reduce their usefulness as well as the impression they make. They also give the feeling that your documents are outdated and older than they are.
  • Web sites evolve frequently. Having an automated way of detecting obsolete links is essential to keeping your documents up to date.
  • You may be much more familiar with your target websites than your readers. They may not be able to find a new location by themselves. You’d better be aware of the change and do this for them!
  • When you rename a page (for example), LibreOffice and OpenOffice.org don’t update all the references to it.

Usage

Usage: coool [options] [OpenOffice.org document files]

Checks OpenOffice.org documents for broken Links

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -v, --verbose         display progress information
  -d, --debug           display debug information
  -t MAX_THREADS, --max-threads=MAX_THREADS
                        set the maximum number of parallel threads to create
  -r MAX_REQUESTS_PER_HOST, --max-requests-per-host=MAX_REQUESTS_PER_HOST
                        set the maximum number of parallel requests per host
  -x EXCLUDE_HOSTS, --exclude-hosts=EXCLUDE_HOSTS
                        ignore urls which host name belongs to the given list

When a broken link is found, open the document in OpenOffice.org and use the search facility to look for the link text.

Configuration file

Rather than configuring cOOol from the command line, it is possible to
define the same settings in a ~/.cooolrc file.

Example:

# Configuration file for cOOol

verbose = True
exclude_hosts = "lxr.bootlin.com www.example.com"
max_threads = 200

You can see that configuration file settings have the same name as long
options, except that dash (-) characters are replaced by
underscores (_).

Usage through a proxy

cOOol can be used through a proxy. The Python classes it uses rely on standard Unix environment variables for proxy definition, as in the below bash example:

export http_proxy="proxy.server.com:8080"
export ftp_proxy="proxy.server.com:8080"

Prerequisites

You first need to install the configparse module.

Downloads

cOOol can be found in our odf-tools git tree.

Screenshot

cOOol screenshot

Implementation

cOOol parses the xml components of each document file, looking for hyperlinks.

It would have been cleaner and safer to use the OpenOffice.org API to explore the documents. However, there are also benefits in a standalone Python implementation:

  • No need to start OpenOffice.org and load documents in memory. This saves a lot of time and RAM!
  • No need to have an OpenOffice.org install. Nice if you need to implement a validation server using cOOol.
  • Last but not least, no need to understand OpenOffice.org’s API and the internal structure of documents! By the way, that’s what makes exchange formats like XML attractive. However, we would be delighted if somebody could come up with a simpler and safer implementation based on the API, that could be run within OpenOffice.org user interface!

Testing cOOol

We are using 2 documents to make sure that cOOol finds all the kinds of broken links it is supposed to support:

Limitations and possible improvements

  • cOOol doesn’t check for e-mail links. It could at least check that the corresponding domain is valid.
  • cOOol doesn’t give you page numbers for broken links. You have to open the document and use the search facilities to locate each link.
  • cOOol still crashes on some documents with Unicode strings (for example with Chinese text).
  • cOOol has trouble with link text containing quotes, as in what’s new. The text it outputs is truncated.

clink

Compacts directories by replacing duplicate files by symbolic links

clink is a simple Python script that replaces duplicate files in Unix filesystems by symbolic links.

  • clink saves space. It works particularly well with automatically generated directory structures, such as cross-compiling toolchains.
  • clink uses relative links, making it possible to move processed directory structures.
  • clink is fast. It reads each file only once and its runtime is mainly the time taken to read files.
  • clink is light. It consumes very little RAM. No problem to run it on huge filesystems!
  • clink is easy to use. Just download the script and run it!
  • clink is free. It is released under the terms of the GNU General Public License.

Usage

usage: clink [options] [files or directories]

Compacts folders by replacing identical files by symbolic links

options:
  --version      show program's version number and exit
  -h, --help     show this help message and exit
  -d, --dry-run  just reports identical files, doesn't make any change.

Screenshot

clink screenshot

Downloads

Here is the OpenPGP key used to generate the signatures.

How it works

clink reads all the files one by one, and computes their SHA-1 (20 bytes) and MD5 (16 bytes) checksums. The trick to easily find identical files is a dictionary of file lists indexed by their SHA-1 checksum.

Files with the same SHA-1 checksum are not immediately considered identical: their MD5 checksums and sizes are then also compared. There is an extremely low probability that files meeting all 3 criteria at once are actually different. You are much more likely to face file corruption because of a hardware failure on your computer!

Hard links to the same contents are treated as regular files. Keeping one instance and replacing the others by symbolic links is harmless. Files implemented by symbolic links also have the advantage of not having their contents duplicated in tar archives.
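
clink itself is a Python script, but the grouping idea can be approximated in plain shell (GNU coreutils assumed):

# list groups of files sharing the same SHA-1 checksum (the first 40 characters
# of sha1sum output); files in the same group are duplicate candidates,
# which clink then confirms by also comparing MD5 checksums and sizes
find . -type f -print0 | xargs -0 sha1sum | sort | uniq -w 40 --all-repeated=separate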

Limitations and possible improvements

  • File permissions: clink just keeps one copy of duplicate files. The permissions of this file may be less strict than those of other duplicates. If permissions matter, enforce them by yourself after running clink.
  • Directory structure: even when entire directories are identical, clink just creates links between files. This is not fully optimal in this case, but it keeps clink simple.

Similar tools or alternatives

  • dupmerge2: replaces identical files by hardlinks.
  • finddup: finds identical files.

Demo: tiny qemu arm system with a DirectFB interface

A tiny embedded Linux system running on the qemu arm emulator, with a DirectFB interface, everything in 2.1 MB (including the kernel)!

Overview

This demo embedded Linux system has the following features:

  • Very easy to run demo, just 1 file to download and 1 command line to type!
  • Runs on qemu (easy to get for most GNU/Linux distributions), emulating an ARM Versatile PB board.
  • Available as a single file (kernel and root filesystem), weighing only 2.1 MB!
  • DirectFB graphical user interface.
  • Demonstrates the capabilities of qemu, the Linux kernel, BusyBox, DirectFB, and
    shows the benefits of system size and boot time reduction techniques as advertised and supported by the CE Linux Forum.
  • License: GNU GPL for root filesystem scripts. Each software component has its own license.

How to run the demo

  • Make sure the qemu emulator is installed on your GNU/Linux distribution. The demo works with qemu 0.8.2 and beyond, but it may also work with earlier versions.
  • Download the vmlinuz-qemu-arm-2.6.20
    binary.
  • Run the below command:
    qemu-system-arm -M versatilepb -m 16 -kernel vmlinuz-qemu-arm-2.6.20 -append "clocksource=pit quiet rw"
  • When you reach the console prompt, you can try regular Unix commands but also the graphical demo:
    run_demo
    

FAQ / Troubleshooting

  • Q: I get Could not initialize SDL - exiting when I try to run qemu.

    That's a qemu issue (qemu uses the SDL library). Check that you can start graphical applications from your terminal (try xeyes or xterm for example). You may also need to check that you have name servers listed in /etc/resolv.conf. Anyway, you will find solutions for this issue on the Internet.

Screenshots

Screenshots: console, and the df_andi, df_dok, df_dok2, df_neo and df_input programs.

How to rebuild this demo

All the files needed to rebuild this demo are available here:

  • You can rebuild or upgrade the (Vanilla) kernel by using the given kernel configuration file.
  • The configuration file expects to find an initramfs source directory in ../rootfs, which
    you can create by extracting the contents of the rootfs.tar.7z archive.
  • Of course, you can make changes to this root filesystem!

Tools and optimization techniques used in this demo

Software and development tools

  • The demo was built using Scratchbox, a fantastic development tool that makes cross-compiling transparent!
  • The demo includes BusyBox 1.4.1, a toolbox implementing most UNIX commands in a few hundred KB. In our case, BusyBox includes the most common commands (like a vi implementation), and weighs only 192 KB!
  • The root filesystem is shipped within the Linux kernel image, using the initramfs technique, which makes the kernel simpler and saves a dramatic amount of RAM (compared to an init ramdisk).
  • The demo is interfaced by DirectFB example programs (version 0.9.25, with DirectFB 1.0.0-rc4), which demonstrate the amazing capabilities of this library, created to meet the needs of embedded systems.

Size optimization techniques

The below optimization techniques were used to reduce the filesystem size from 74 MB to 3.3 MB (before compression in the Linux kernel image):

  • Removing development files: C headers and manual pages copied when installing tools and libraries, .a library files, gdbserver, strace, /usr/lib/libfakeroot, /usr/local/lib/pkgconfig
  • Removing files not used by the demo programs: libstdc++ and any other unused library or resource file.
  • Stripping and even super stripping (see sstrip) executables and libraries.
  • Reducing the kernel size using CONFIG_EMBEDDED switches, mainly from the
    Linux Tiny project.

Techniques to reduce boot time

We used the below techniques to reduce boot time:

  • Disabled console output (quiet boot option; printk support was disabled anyway), which saves the time spent scrolling the framebuffer console.
  • Used the Preset Loops per Jiffy technique to disable delay loop calibration, by feeding the kernel a value measured during an earlier boot (the lpj setting, which you may update according to the speed of your own workstation; see the example below).
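
For example, assuming a verbose boot printed lpj=499712 during delay loop calibration (this value is hypothetical and machine dependent), you could preset it as follows:

qemu-system-arm -M versatilepb -m 16 -kernel vmlinuz-qemu-arm-2.6.20 \
    -append "clocksource=pit quiet rw lpj=499712"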

All these optimization techniques and other ones we haven’t tried yet are described either on the elinux.org Wiki or in our embedded Linux optimizations presentation.

Future work

We plan to implement a generic tool which would apply some of these techniques automatically, to shrink an existing GNU/Linux or embedded Linux root filesystem without any loss of functionality. More in the coming weeks or months!

OLS 2008 videos

29 videos from the Linux Symposium in Ottawa

We are pleased to release 29 videos that we took at the Linux Symposium in Ottawa, Canada, in July 2008:

  • Keynote: The Kernel: 10 Years in Review, by Matthew Wilcox (Intel)
    video (57 minutes, 175M)
  • Talk: Tux on the Air: State of Linux Wireless Networking, by John W. Linville (Red Hat)
    paper, video (52 minutes, 168M)
  • Talk: Suspend to RAM in Linux: State of the Union, by Len Brown and Rafael Wysocki (Intel)
    paper, video (52 minutes, 163M)
  • Talk: Real Time vs Real Fast: How To Choose?, by Paul E. McKenney (IBM)
    paper, video (45 minutes, 166M)
  • Tutorial: ftrace: latency tracer, by Steven Rostedt (Red Hat) video (98 minutes, 772M)
  • BOF: Embedded Linux, by Tim R. Bird (Sony)
    video (42 minutes, 200M)
  • BOF: Embedded Microcontroller Linux, by Michael Durrant (Arcturus Networks)
    video (42 minutes, 243M)
  • Talk: Energy-aware task and interrupt management, by Vaidyanathan Srinivasan (IBM)
    paper, video (52 minutes, 182M)
  • Talk: Application Testing Under Realtime Linux, by Luis Claudio R. Gonçalves (Red Hat)
    paper, slides, video (54 minutes, 297M)
  • Talk: Application Framework for Your Mobile Device, by Shreyas Srinivasan (Geodesic Information Systems)
    paper, video (25 minutes, 146M)
  • Keynote: The Making of OpenMoko Neo, by Werner Almesberger (OpenMoko)
    video (94 minutes, 463M)
  • BOF: U-Boot by Wolfgang Denk (Denx)
    video (54 minutes, 362M)
  • BOF: Linux Compiler, by Rob Landley (Impact Linux)
    video (100 minutes, 765M)
  • Tutorial: Practical Guide to Using Git, by James Bottomley (Hansen Partnership)
    video (61 minutes, 357M)
  • Talk: Advanced XIP File System, by Jared Hulbert (Numonyx)
    paper, video (49 minutes, 160M)
  • Talk: SELinux for Consumer Electronic Devices, by Yuichi Nakamura (Hitachi)
    paper, video (31 minutes, 113M)
  • Talk: Around the Linux File System World in 45 Minutes, by Steve French (IBM)
    paper, slides, video (49 minutes, 298M)
  • BOF: Linux The Easy Way with LTIB, by Stuart Hughes (Freescale)
    slides, video (25 minutes, 144M)
  • Keynote: The Joy of Synchronicity: Coordinating the Releases of Upstream and Distributions, by Mark Shuttleworth (Canonical)
    slides, video (76 minutes, 458M)
  • Talk: Smack in Embedded Computing, by Casey Schaufler
    paper, video (59 minutes, 211M)
  • Talk: Bazillions of Pages: The Future of Memory Management, by Christoph H. Lameter (SGI)
    paper, video (49 minutes, 258M)
  • Tutorial: Writing application fault handlers, by Gilad Ben-Yossef (Codefidence)
    video (49 minutes, 275M)
  • Talk: Linux, Open Source and System Bringup Tools, by Tim Hockin (Google)
    paper, video (51 minutes, 229M)
  • Talk: DCCP Reached Mobiles, by Leandro Melo Sales (Federal University of Campina Grande)
    paper, video (42 minutes, 193M)
  • Talk: Building a robust Linux kernel, by Subrata Modak (IBM)
    paper, slides, video (51 minutes, 249M)
  • CELF BOF presentation: Best of recent CELF Conferences, by Tim Bird (Sony)
    slides, video (10 minutes, 88M)
  • CELF BOF presentation: Developing Embedded Linux with Target Control, by Tim Bird (Sony)
    slides, video (17 minutes, 145M)
  • CELF BOF presentation: Embedded Building Tools – An Audience Survey, by Michael Opdenacker (Bootlin)
    slides, video (17 minutes, 127M)
  • CELF BOF presentation: GCC Tips and Tricks Highlights, by Gene Sally
    video (14 minutes, 62M)

See also all the papers, and a report from the CELF BOF.

We could only shoot the presentations we attended. You can see that our main interests are embedded systems and the Linux kernel.

ELC 2008 report

Table of contents

  • Introduction
  • Day 1
  • Day 2
  • Day 3
  • Other talks
  • Conclusion

Introduction

The fourth edition of the Embedded Linux Conference, organized every year in Silicon Valley by the CE Linux Forum, took place from April 15th to 17th, 2008. This year, for the first time, the conference was held at the Computer History Museum, which happened to be a very nice venue for such an event. The museum also has various exhibits about computer history, such as Visible Storage, an exhibit featuring many samples from the museum collection, ranging from the first computers to the first Google cluster, going through Cray supercomputers.

The conference's program was very promising: three keynotes from famous speakers (Henry Kingman, Andrew Morton and Tim Bird) and fifty sessions, either talks, tutorials or bird-of-a-feather sessions, covering a wide range of subjects of interest to any embedded Linux developer: power management, debugging techniques, system size reduction, flash filesystems, embedded distributions, realtime, graphics and video, security, etc.

This report has been written by Thomas Petazzoni, from Bootlin. It only covers the talks he could actually attend: there were three simultaneous tracks at the Embedded Linux Conference, and sometimes several interesting talks were happening at the same time, which was frustrating for attendees who wished they could be in several places at once. For them, and for everyone who could not attend the conference, Bootlin also provides video recordings of 19 talks given during ELC. The links to the videos are given below in the report. The following report makes extensive use of the content of the slides used by the speakers during their talks.

Day 1

Keynote: Tux in Lights, Henry Kingman

Link to the video (44 minutes, 139 megabytes) and the slides.

The first day of the conference was opened by a keynote from Henry Kingman entitled Tux in Lights. Henry Kingman is famous for being the editor behind the well-known Linux Devices website, and this year he was in charge of opening the Embedded Linux Conference. He started his talk with an introduction about the importance of such meetings: he emphasized the fact that many free software developers work together all year long without having the chance to meet in person. In that respect, conferences such as ELC are important opportunities to see each other, he said.

Kingman then continued his presentation with slides containing the results of the latest Linux Devices survey on the use of embedded Linux, which Jake Edge already reported on Linux Weekly News in his article ELC: Trends in embedded Linux.

Adventures in Real-Time Performance Tuning, Frank Rowand

Link to the video (50 minutes, 251 megabytes) and the slides.

In this talk, Frank Rowand presented what was involved in setting up the real time version of the Linux kernel (linux-rt) on a MIPS platform, using the TX4937 processor. He started by reminding the audience that real time doesn't mean fast response time, but determinism, and that deadlines could be seconds, milliseconds or microseconds, for example.

Then, he summed up what can affect IRQ latency in the Linux kernel: disabled interrupts, execution of top halves, softirqs, scheduler execution, and finally the context switch. An important aspect of Linux RT is tuning this IRQ latency to make it 1) deterministic and 2) low. So, code disabling interrupts in the kernel should be avoided as much as possible, and Frank's talk focused on finding and fixing issues in such pieces of code.

The roadmap of his adventure was basically:

  • Add some RT pieces for MIPS and the tx4937 processor
  • Add MIPS support to RT instrumentation. Instrumentation is an essential tool to find RT-related issues, he said.
  • Tuning.
  • Implement “lite” irq disabled instrumentation, because the existing instrumentation tools' overhead was too high in his opinion.
  • Tuning.

He then started to talk about the latency tracer, which was recently submitted by Ingo Molnar for mainline inclusion. Currently only available in -rt, this tracer has recently been improved in several areas in 2.6.24-rt2: cleaned up code, user/kernel interface based on debugfs instead of /proc, simultaneous tracing of IRQ off and preempt off latencies, and simultaneous histogram and trace. However, he used the previous version, 2.6.24-rt1, for the experiments reported in his talk.

His first experiments with the tracer led to the discovery of several issues:

  • Latencies up to 5.7 seconds were showing up in /proc/latency_hist/interrupt_off_latency/CPU0. Using /proc/latency_trace, he discovered the culprit: r4k_wait_irqoff(), a MIPS-specific function called when the CPU is idle. That function was disabling interrupts before going into idle using the wait MIPS instruction. The quick fix was to use the nowait kernel option, which disables the use of CPU idle specific instructions. Of course, one must be aware of the consequences of using such an option from a power management perspective. The real fix would be to stop latency tracing in cpu_idle(), as is done on x86. Even with that fix, he still had some large maximum latencies.
  • CPUs have timestamp registers that are very accurate, and 64 or 32 bits wide. These registers are incremented at each cycle, and on MIPS, 32 bit counters are used, which means that they overflow after a few seconds. In his case, the counter was rolling over in around six seconds (very close to the maximum reported latency of 5.7 seconds!). In fact, it turned out that the latency tracer code didn't handle clock rollover properly. He fixed that by using the same algorithms used for jiffies in include/linux/jiffies.h. This fix removed the bogus maximum reported latencies, and he was now down to a 6.7 millisecond maximum latency.
  • The remaining problems were due to the fact that the timer comparison and capture code was not properly handling switches between raw and non-raw clock sources. So in kernel/latency_trace.c, he had to look for such switches, and at each of them, delete timestamps from the current event in the other mode.

He then showed some nice and pretty graphs (visible in the video), showing the improvements made by each fix. Once the very ugly latencies were fixed, the next thing to do was to fix whatever disables preemption for the longest time and whatever disables interrupts for the longest time. In his talk, he focused on the second part: IRQ disabled time.

He presented in more detail the main tool used for this debugging work: the latency tracer. He described the contents of a latency trace output, which might be kernel-hacker-readable, but not necessarily human-readable at first sight. He highlighted the fact that the function trace that one can get with the latency tracer is not a list of all functions executed: trace points are only inserted at “interesting” locations in various subsystems. Thus, one has to interpolate what's happening between the locations provided by the trace, he said. He also mentioned the usefulness of the data fields available for each line of trace: they are not documented in any way and are specific to each trace point, but end up being very useful in understanding what's happening. They contain information such as the time for timer related functions, or the PID and priority for scheduling related functions.

The first problem he found, with latencies of 164 microseconds, occurred when handling the timer interrupt, in hrtimer_interrupt(). Several calls to try_to_wake_up() were made, causing a long time with interrupts disabled (between handle_int(), the low level interrupt handling function on MIPS that disables interrupts, and schedule(), which re-enables interrupts). In fact, the timer code was waking up the tasks whose timers had expired, which is an O(n) algorithm that depends on the number of timers in the system. He has no fix yet, except the workaround of not using too many timers at the same time.

The second problem he found is that the interrupt top half handling followed by preempt_schedule_irq() is a long path executing with interrupts disabled. A possible workaround is to remove or rate limit non-realtime related interrupts, which in his case were caused by the network card, due to having the root filesystem mounted over NFS. What he tried, as a quick and dirty hack, was to re-enable and then immediately disable interrupts again in resume_kernel, the return from interrupt function. It is a bad hack, as it allows nested interrupts to occur, which could cause the stack to overflow. However, he found that it improved the latencies, and presented results confirming that.

As a final piece of advice, he said not to lose sight of the most important metric, meeting the real time application deadline, while trying to tune the components that cause latency. He mentioned LatencyTOP as a promising tool, but also suggested drawing on the experts' knowledge, through the web and mailing lists. He mentioned a few recent topics of discussion on linux-rt-users, to show the type of discussions occurring on this mailing list.

To conclude the talk, he showed and discussed real-time results measured by Alexander Bauer (and presented at the 9th Real Time Linux Workshop), as well as his own.

In the end, this talk happened to be highly technical, but very interesting for people who want to discover how the latency tracer can be used, and the kind of problems one can face when setting up and using such an instrumentation tool.

Kernel size report and Bloatwatch update, Matt Mackall

Link to the video (49 minutes, 146 megabytes).

Matt Mackall founded the Linux Tiny project in 2003, and is the author of SLOB, a more space-efficient alternative to SLAB, the kernel's memory allocator, as well as of other significant improvements towards reducing the code size of the Linux kernel. He naturally gave an update on the size of the kernel, and announced a new version of his bloat-tracking tool, Bloatwatch.

To start with, Matt Mackall explained why all that attention is paid to size. He said that it of course matters for embedded people, because memory and storage are expensive relative to the price of an embedded device: a smaller kernel means a cheaper device, and more room for applications. But Matt also said that the rest of the world now cares about code size, because even if memory and storage are cheap, the speed ratio between CPU cache and memory keeps increasing, which means that smaller code fits better in cache lines, improving performance. Matt Mackall is certainly right with this statement, but the issue is that code size reduction is usually focused on hot paths, not on overall code size.

According to Mackall, the reasons for the kernel growth are many: new features, improved correctness, robustness, genericity and diagnostics. He then gave an absolutely impressive report on the amount of changes that occurred last year. In April 2007, Linux 2.6.21 was the stable version; it had 21,615 files and 8.24 million lines of code. In April 2008, at the time of the conference, Linux 2.6.25-rc8 was the latest available version (probably very close to the final 2.6.25), and it had 23,811 files and 9.21 million lines of code. 37,033 changesets were committed to the kernel by around 2,400 different contributors, changing 18,165 files (almost all of the files in the kernel were touched!), adding 2.24 million lines and removing 1.25 million lines. Matt concluded: "a lot has happened".

He then mentioned a few noticeable changes in 2007, concerning the subjects he cares about: SLUB, another alternative to SLAB, becoming the default allocator; SLUB and SLOB seeing their efficiency improved; greater attention paid to cache footprint issues; increased usage of automated testing; pagemap and PSS to monitor userspace (work that was merged in 2.6.25 and allows one to understand userspace memory consumption precisely); and the revival of the Linux Tiny project, now maintained by Michael Opdenacker.

Mackall then entered the core of the subject: kernel code size. With all the architectures, drivers and configuration options, it's difficult to measure the kernel code size increase (or decrease), so Matt proposed a simple metric: measure the size of an allnoconfig configuration for the x86 architecture. The allnoconfig kernel Makefile target creates a minimalistic configuration, with no networking, no filesystems, no drivers, only the core kernel features. Matt then showed a graph of the kernel size in that configuration, from 2.6.13 (released two and a half years ago) to now. "We can see a pretty steady and obvious increase", he said, and the increase was indeed obvious on the graph. Most of the growth is due to code increase; the data part of the kernel hasn't grown in the last years.

The graph showed an increase of 28% in kernel size over the last two and a half years. Over the last year, between 2.6.21 and 2.6.25-rc8, the size of the same allnoconfig kernel increased from 1.06 megabytes to 1.21 megabytes, a 14% increase. He said that he made some experiments with more realistic kernel configurations and, ignoring variations in configuration options across kernels, the kernel size increase was pretty much the same, so he thinks the allnoconfig metric is good enough.
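
This metric is easy to reproduce on any kernel source tree:

make allnoconfig    # minimal configuration: no networking, filesystems or drivers
make                # build the kernel
size vmlinux        # report code (text) and data segment sizes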

He then gave some nice numbers about the size increase: it currently grows at a rate of 400 bytes per day, or 4 bytes per change (one or two instructions). The average function size is around 140 bytes, so he concluded that we would need to take three functions out of the kernel every day just to keep the core from growing!

To keep the kernel small, his biggest advice is to review the code before it goes in. He insisted on having new functionality placed under configuration options because, as he said: "I don't need process namespaces on my phone". More generally, he said that the kernel community currently lacks code reviewers. He proposed to continue working on inlining and code duplication elimination: code inlining used to be popular in the kernel community, but it is no longer useful on modern architectures. The biggest issue is that a lot of functions are defined in header files, and are then included in thousands of C files, so that they are instantiated in every object file. Finally, Matt thinks that there is a need to automate size measurement to find the worst offenders in existing code… This made a perfect transition to the next topic of his talk: Bloatwatch 2.0.

Two years ago, at the same conference, he presented Bloatwatch 1.0. The new version is rewritten from scratch, with many improvements:

  • easy to customize for your kernel configuration so that everybody can run Bloatwatch on his specific configuration
  • statistics for both built-in and modular code
  • delve down into individual object files
  • improved filtering of symbols
  • greatly cleaned-up code

One can get Bloatwatch from its Mercurial repository, using 

hg clone http://selenic.com/repo/bloatwatch

or grab the tarball, at http://selenic.com/repo/bloatwatch/archive/tip.tar.gz.

Matt then went on to give a demo of Bloatwatch. On one hand, Bloatwatch is a set of scripts that compile a kernel according to a configuration and fill a database with the results. On the other hand, it is a Web application that lets you navigate through the results, generate nice and fancy graphs, and compare sizes between kernel versions, for the whole kernel, or for any subsystem, object file or even function.

He said that building the whole database for allnoconfig for several years of stable kernels takes a few hours on a normal laptop, and doing the same with defconfig takes about a day. This means that rebuilding the database for a given configuration is something anyone can do pretty easily.

In a few seconds, he demonstrated how to find the specific source of a bloat case. He pointed out the sysctl_check.c file, which appeared last year, and which weighs 25 kilobytes of code. And thanks to the link to the kernel's revision control system, he was able to find the description of the original patches in a few seconds, which gave an insight into the purpose of the change. In fact, it turned out that all that code does binary checking on sysctl arguments, something you probably don't need on your phone, he said. So it's probably a good candidate for a configuration option.

Bloatwatch appears to be a great tool for measuring kernel size increases and analyzing their causes. Now, some effort should probably be made to communicate such information to the kernel developer community, in one way or another.

Every Microamp is sacred – A dynamic voltage and current control interface for the Linux Kernel, Liam Girdwood

Link to the video (35 minutes, 71 megabytes) and the slides.

Liam Girdwood works for a company called Wolfson Microelectronics, and discussed the creation of a kernel API for voltage and current regulator control. Before going into the kernel framework itself, he started by providing an introduction to regulator based systems, assuming that everyone was not necessarily familiar with such systems, which indeed was true.

Power consumption in semiconductors has two components: static and dynamic. The static part is smaller than the dynamic one when the device is active, but is the bigger source of power consumption when the device is inactive. The dynamic part corresponds to the activity of the device: signals switching, analog circuits changing state, etc. Power consumption grows linearly with the frequency, and with the square of the voltage. See this Wikipedia page on power optimization for more information. Liam concluded that general introduction by saying that regulators can be used to save both static and dynamic power.

Then, he went on to present the global picture of a regulator. It is a piece of hardware that takes an input power source (a battery, line, USB or another regulator), and outputs power (to a device or another regulator). This piece of hardware is controlled by software, so that we can control what the output power will be. It is possible to instruct the regulator to generate a 1.8V output when the input source is 5V, or to limit the current to 20mA, for example. The whole purpose of the regulator framework is to provide a generic software framework for controlling this kind of device.

After that, he introduced the abstraction of power domains. A power domain is a set of devices and regulators that get their input power from a regulator, from a switch or from another power domain, so that power domains can be chained together. Power constraints can also be applied to power domains to protect the hardware.

Then, in order to get into more concrete examples, he started describing the system architecture of one of their Internet Tablets. It has the usual components : CPU, memory, NOR flash, audio codec, touchscreen, LCD controller, USB, Wifi and other peripherals. Then, after showing this block diagram, he presented the same block diagram, with all the regulators. Each device can be controlled by one or several power regulators. The whole purpose of the regulator framework is to control all these regulators, and so he went on with a discussion about the framework itself.

The general goal of the regulator framework is to « provide a standard kernel interface to control voltage and current regulators ». It should allow systems to dynamically control regulator output power in order to save watts, with the ultimate goal of prolonging battery life, of course. The kernel framework to control all that is divided into four interfaces :

  • consumer interface for device drivers
  • regulator driver interface for regulator drivers
  • machine interface for board configuration
  • sysfs interface for userspace

The consumers are the clients of the regulators, i.e. the drivers controlling a device that gets its current from a regulator. The consumers are constrained by the power domain in which they are : they cannot request more than the limits that have been set for their power domain. They defined two types of consumers : the static ones (which just want to enable or disable the power source), and the dynamic ones (which want to change the voltage or the current limit).

The consumer API is very similar to the clock API, he said. Basically, a device driver starts to access a regulator using :

regulator = regulator_get(dev, "Vcc");

where dev is the device and "Vcc" a string identifying the particular regulator we would like to control. It returns a reference to a regulator, which should at some point be released, using :

regulator_put(regulator);

Then, the API to enable or disable is as simple as :

int regulator_enable(regulator);
int regulator_disable(regulator);
int regulator_force_disable(regulator);

regulator_enable() keeps track of the number of times the regulator has been enabled, so that the regulator will actually be disabled only after the corresponding number of calls to regulator_disable(). regulator_force_disable(), as its name says, makes it possible to disable a regulator even if the reference count is non-zero. A status API is also available, in the form of the int regulator_is_enabled(regulator) function.

Then, the voltage API looks like :

int regulator_set_voltage(regulator, int min_uV, int max_uV);

After checking the constraints, the specified regulator will provide power with a voltage inside the boundaries requested by the consumer, between min_uV (minimal voltage in micro-volts) and max_uV. The regulator will actually choose the minimum value that it can provide and that is in the range requested by the consumer. The voltage actually chosen by the regulator can be fetched using int regulator_get_voltage(regulator).

The current limit API is similar :

int regulator_set_current_limit(regulator, int min_uA, int max_uA);
int regulator_get_current_limit(regulator);
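
To put these calls together, here is a minimal sketch of what consumer code could look like, assuming a device dev wired to a "Vcc" supply, a rail expected between 1.7V and 1.9V (made-up values), and the usual kernel convention that a failed regulator_get() returns an ERR_PTR() value :

struct regulator *reg;
int ret;

/* look up the regulator feeding this device's "Vcc" supply pin */
reg = regulator_get(dev, "Vcc");
if (IS_ERR(reg))
 return PTR_ERR(reg);

/* ask for any voltage between 1.7V and 1.9V, in micro-volts */
ret = regulator_set_voltage(reg, 1700000, 1900000);
if (!ret)
 ret = regulator_enable(reg);

/* ... the device is now powered and can be used ... */

regulator_disable(reg);
regulator_put(reg);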

Regulators are not 100% efficient, their efficiency varies depending on the load, and they often offer several modes to increase their efficiency. He gave the example of a regulator with two modes : a normal mode, pretty inefficient for low current values but covering the full range of current values, and an idle mode, more efficient for low current values, but which cannot provide more current than a given limit (smaller than the one in normal mode). So, for example, with a consumer drawing 10 mA, the efficiency would be 70% in normal mode, consuming 13 mA, and 90% in idle mode, consuming 11 mA, thus saving 2 mA. There is an API to set the optimum mode for a given current value :

regulator_set_mode();
regulator_get_mode();
regulator_set_optimum_mode();

Regulators can also notify software of events, such as failure or excess temperature :

regulator_register_notifier();
regulator_unregister_notifier();

That is all for the API one can use in device drivers to handle regulators.

Then, he switched to the topic of writing a regulator driver. The API is very similar to other kernel APIs. Regulator drivers must first be registered with the framework before consumers can use them :

struct regulator_dev *regulator_register(struct regulator_desc *desc, void *data);
void regulator_unregister(struct regulator_dev *rdev);

Events can be propagated to consumers thanks to the notifier call chain mechanism. Every consumer that registered a callback using regulator_register_notifier() will be notified when the following function is called by a regulator driver :

int regulator_notifier_call_chain(struct regulator_dev *rdev, unsigned long event, void *data);

The regulator_desc structure must give some information about the regulator (name, type, IRQ, etc.), but most importantly, must contain a pointer to a regulator_ops structure. It is pretty much a 1:1 mapping of the consumer interface :

struct regulator_ops {
 /* get/set regulator voltage */
 int (*set_voltage)(struct regulator_cdev *, int uV);
 int (*get_voltage)(struct regulator_cdev *);

 /* get/set regulator current */
 int (*set_current)(struct regulator_cdev *, int uA);
 int (*get_current)(struct regulator_cdev *);

 /* enable/disable regulator */
 int (*enable)(struct regulator_cdev *);
 int (*disable)(struct regulator_cdev *);
 int (*is_enabled)(struct regulator_cdev *);

 /* get/set regulator operating mode (defined in regulator.h) */
 int (*set_mode)(struct regulator_cdev *, unsigned int mode);
 unsigned int (*get_mode)(struct regulator_cdev *);

 /* get most efficient regulator operating mode for load */
 unsigned int (*get_optimum_mode)(struct regulator_cdev *, int input_uV,
                                  int output_uV, int load_uA);
};
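
As a rough sketch (your editor’s illustration, not code from the talk), a regulator driver would fill such a structure with callbacks touching its own hardware, describe itself in a regulator_desc, and register the whole thing. The .ops field name and my_driver_data are assumptions for illustration :

static int my_ldo_enable(struct regulator_cdev *rcdev)
{
 /* poke the PMIC register that switches the LDO on */
 return 0;
}

static struct regulator_ops my_ldo_ops = {
 .enable = my_ldo_enable,
 /* ... plus the other callbacks the hardware supports ... */
};

static struct regulator_desc my_ldo_desc = {
 .name = "LDO1",     /* the name consumers and machine code will use */
 .ops = &my_ldo_ops, /* assumed field name for the ops pointer */
};

/* typically called from the PMIC driver probe function : */
struct regulator_dev *rdev = regulator_register(&my_ldo_desc, my_driver_data);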

After this short description of the regulator driver interface, he described the machine driver interface. It is basically used to glue the regulator drivers with their consumers for a specific machine configuration. It describes the power domains : « regulator 1 supplies consumers x, y and z », power domain suppliers : « regulator 1 is supplied by default (Line/Battery/USB) » or « regulator 1 is supplied by regulator 2 » and power domain constraints : « regulator 1 output must be between 1.6V and 1.8V ».

To give a concrete example, he proposed to take a NAND flash chip whose power is supplied by the LDO1 regulator. To attach the regulator to the “Vcc” supply pin of the NAND, we use the following call :

regulator_set_device_supply("LDO1", dev, "Vcc");

This will associate the regulator named LDO1 (as given in the regulator_desc structure) with the Vcc input of a given device. That device driver is then able to use regulator_get() to get a reference to its regulator and control it.

Then, the machine driver can specify constraints on power domains, using the regulation_constraints structure, which can be associated with a given regulator using regulator_set_platform_constraints().

Finally, the machine driver is also responsible for mapping regulators to regulators, when one regulator is supplied by another regulator. This is done using the regulator_set_supply() function, which takes the names of two regulators as arguments : the supplier regulator and the consumer regulator. Of course, it is up to the machine-specific code to glue everything together properly.
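
Putting the three machine-interface calls together, board setup code could look like the following sketch. The regulation_constraints field names, the hypothetical nand_device and the DCDC2 supplier are assumptions for illustration, based on the description above :

/* LDO1 output must stay between 1.6V and 1.8V */
static struct regulation_constraints ldo1_constraints = {
 .min_uV = 1600000, /* assumed field names */
 .max_uV = 1800000,
};

/* the NAND flash Vcc pin is supplied by LDO1 */
regulator_set_device_supply("LDO1", &nand_device->dev, "Vcc");
regulator_set_platform_constraints("LDO1", &ldo1_constraints);

/* LDO1 itself gets its input power from DCDC2 (supplier first, then consumer) */
regulator_set_supply("DCDC2", "LDO1");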

Then, he described the sysfs interface, which exports regulator and consumer information to userspace. It is currently read-only, and Liam doesn’t see at the moment any good reason to switch it to read-write. One can access information such as voltage, current limit, state, operating mode and constraints, which could be used to provide more power usage information to PowerTOP, for example.

After this API description, he gave some real world examples. First, cpufreq, which scales the CPU frequency to meet processing demands. He said that voltage can also be scaled with frequency : increased with frequency to improve performance and stability, or decreased with frequency to save power. This can be done with the regulator_set_voltage() API. In cpuidle, one can imagine changing the operating mode of the regulator that supplies current to the CPU, in order to switch to a more efficient mode.

He then gave the example of LCD backlights, which usually consume a lot of power. Power can be reduced whenever brightness can be reduced. This can be achieved using the regulator_set_current_limit() API, particularly for backlights using white LEDs, whose brightness can be changed by changing the current.

In the audio world as well, improvements can be made. Audio hardware consumes analog power even when there is no audio activity : power can be saved by switching off the regulators supplying the audio hardware. One might also think of switching off the components that are not in use. He gave the example of the FM tuner when you’re listening to MP3s, or the speaker amplifier that can be turned off when headphones are used. The same goes for NAND and NOR flash, which consume more power during I/O than when they are idle, so it is possible to switch the operating mode of their regulators to take advantage of the more efficient mode for low current values. He pointed out that flash chips have power consumption information in their datasheets, and that this information could be used in the flash driver to properly call regulator_set_optimum_mode() and select the best possible mode.

The status of this work is that the code is working on several machines, and supports several devices : Freescale MC13783, Wolfson WM8350 and WM8400. They are working with the -mm kernel by providing patches to Andrew Morton, and they have already posted the code on the Linux Kernel Mailing List.

Using Real-Time Linux, Klaas van Gend

Link to the video (53 minutes, 263 megabytes) and the slides.

This talk by Klaas van Gend, Senior Solutions Architect at Montavista Europe, was subtitled Common pitfalls, tips and tricks. He presented the real-time version of the Linux kernel, clarified various misconceptions about real-time, and gave some advice.

He started by presenting both faces of Klaas : Klaas-the-Geek, who started programming at 13, first encountered Linux in 1993 and has been a software engineer since 1998, and Klaas-the-Sales-Guy, who joined Montavista as an FAE in 2004 and is in charge of the UK, Benelux and Israel territory.

Originally, Linux was designed to be fair, like the other Unixes : the CPU has to be shared properly between all processes, with fair scheduling. However, in real-time systems, you usually don’t care about fairness. So a lot has to be done to give real-time capabilities to the Linux kernel, and this work has been going on for a long time in the -rt version of the Linux kernel, maintained as a separate patch. His slide also mentioned some progress made in the mainline kernel : originally, only userspace code was preemptible, then Robert Love added preemption to the kernel, and Ingo Molnar added voluntary preemption. The O(1) scheduler, which decides which task should run next in constant time, was also mentioned.

He then went on with a definition of real-time : « OK, we have a deadline and if we don’t answer within the deadline… Sorry we don’t care anymore ». As an example, he said : « if the airbag doesn’t blow in time or is only half-way blown, too bad : you’re dead ». In contrast, he said, if after a mouse click the system only reacts after half a second, that’s annoying, but it works. His words were illustrated by a nice slide showing that the acceptability of the response time decreases only slowly for a consumer/user interface, but falls abruptly for a classic real-time system.

Here is the main assumption in Real Time Linux : the highest-priority task should go first, « always », he said. This means that everything should be preemptible and that nothing should keep higher-priority things from executing. He said that lots of things had to be changed in the kernel to implement this assumption, and one of the first targets was the spinlock code.

The original Linux UP spinlock basically disables interrupts : nothing else can interrupt your code during a critical section, which is not real-time-friendly at all. In addition, the original SMP spinlock busy-waits for another CPU to release the lock, which is not always performance-friendly. To move towards real-time, something had to be done with spinlocks : introduce sleeping spinlocks, so that instead of busy-waiting, threads waiting for the lock go to sleep, and no interrupts are disabled. Spinlocks are thus turned into mutexes.

Another problem is priority inversion, a fairly classical problem in the synchronization and scheduling literature, which can lead to a situation where a high-priority process cannot run because it is blocked by a low-priority process. We have three processes : A, B and C. A has the highest priority, B a medium priority and C a low priority. C holds a lock Q. After some time, task A needs that lock Q, but it is still held by C, so A cannot run. Because B is runnable and its priority is higher than that of C, it will run, and run, and run, and the lock will never be released, or only when B is done executing its code. The solution to this problem is known as priority inheritance. In our case, the priority inheritance mechanism would raise the priority of C to the priority of A when A needs the lock held by C, so that C can run instead of B, release the lock, and allow A to get it. Work on priority inheritance was done inside the linux-rt tree, and was finally merged into mainline in 2.6.18.

The next problem discussed by Klaas came from the named semaphores mechanism. These are semaphores that appear on the filesystem, so that they can be used by several unrelated processes (processes with no parent-child relationship and not sharing the same address space). The problem with named semaphores is that when a process holding the semaphore dies, the semaphore is not automatically released, and any other process trying to get the semaphore will be stuck… until the system is rebooted. A solution to this is called robust mutexes, which are automatically released when the process holding them dies. It was merged in 2.6.17, and covered by Linux Weekly News.

Then, Klaas quickly covered the topic of priority queues. Traditionally, the Linux kernel handled mutex queues in FIFO order : the first waiting process gets the mutex when it is released by another process. However, on a real-time system, you want the mutex to be assigned to the waiting process with the highest priority. This is not fair to the other processes, but as explained by the speaker at the beginning of his talk, real-time and fairness are not necessarily compatible. The solution to this problem is called priority queues : processes are ordered by priority in waiting lists. This is implemented by the rtmutex code in the Linux kernel (see kernel/rtmutex.c), which is used by the futex facility available to userspace (see futex(7) for more details), itself used by glibc to implement mutexes. The rtmutex code relies on the plist library in the kernel (see lib/plist.c).

Klaas van Gend then discussed the issues of the standard IRQ handling mechanism in Linux. In the regular kernel, IRQs and tasklets take priority over any task in the system, even the highest-priority ones. This means that the execution of a high-priority task can be delayed for an unbounded amount of time by any IRQ coming from the hardware, even interrupts we don’t care about. The solution to this problem, only available in the linux-rt tree as of today, is called threaded interrupts. The idea is to move interrupt handlers to threads, so that they become entities known to the scheduler. Once known to the scheduler, these entities can be scheduled (i.e. delayed) and assigned priorities. To illustrate the need for such a feature, Klaas gave the example of a customer who builds a big printer. On this printer, the high-priority task is to push data to the engine, otherwise the user will get white bands on paper. This process should not be disturbed by any other process, such as getting new printing jobs. He wanted to highlight the fact that threaded interrupts are actually in use and are useful.

Klaas then concluded : « essentially, those are the basic mechanisms in use to make Linux realtime. Does it help ? Yes it does. », and switched to the Results section of his presentation. He started with measurements of interrupt latency, comparing different preemption modes (none, desktop and RT) on an IXP425 platform running a 2.6.18 kernel. With preempt none, the minimum latency is 4 microseconds, the average 6 and the maximum 9797 microseconds. With preempt desktop, the minimum latency is 5, the average 10 and the maximum 2679. With preempt RT, the minimum is 6, the average 7 and the maximum 349 microseconds. With a higher-end processor (Freescale 8349 mITX), the results are better : a maximum latency of 3968 microseconds with preempt none, 1604 with preempt desktop and 53 microseconds with preempt RT. He also said that with an Intel Core 2 Duo, they managed to lower the maximum latency down to 30 microseconds.

After this short results section, the speaker switched to the final part of his talk, entitled Common mistakes and myths. The first myth is that people confuse speed and determinism. He cited quotes such as « I need real time because my system needs to be fast » or « I want to have the best performance Linux can do ». But he said « NO ! » : real time does not mean highest throughput, it means more predictability. He even said that efficiency and responsiveness are inversely related. For example, the real-time preemption code adds some overhead (spinlocks are replaced by mutexes, which are much more heavyweight; priority inheritance increases task switching and worst-case execution time, etc.). He cited benchmarks that measured a 20% decrease in the network throughput of a -RT kernel compared to a regular kernel.

He then went on with a list of mistakes :

  • Forgetting to recompile. When switching to -rt, all kernel files need to be recompiled because of the complex internal changes involved in the switch to -rt. However, the userspace ABI doesn’t change, so you don’t have to recompile the glibc or the userspace applications. But if you use third-party modules, you’ll have to recompile them too. Another drawback of third-party binary kernel modules !
  • Forgetting to enable robustness and priority-inheritance in userspace. Userspace mutexes do not automatically have the robustness and priority-inheritance properties. They must be enabled by doing
    pthread_mutex_t mutex;
    pthread_mutexattr_t mutex_attr;
    
    pthread_mutexattr_init(&mutex_attr);
    pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT);
    pthread_mutexattr_setrobust_np(&mutex_attr, PTHREAD_MUTEX_ROBUST_NP);
    
    pthread_mutex_init(&mutex, &mutex_attr);
      
  • « Running at prio 99 froze my system ». If a process running at top priority runs forever, the system will freeze. So an infinite-loop process will lock your system, even if you call sched_yield() : sched_yield() simply yields the CPU to the highest-priority runnable process… you !

He then gave some advice on how to design the system. One should not give the highest priority, or even a realtime priority, to all the processes in the system, otherwise the system is no longer real-time. The realtime tasks should also be carefully designed to run in a fairly limited time, so that the rest of the system can still execute. If you have a collection of realtime processes, their execution time must of course match the timing requirements that you have. He suggested having only one or two high-priority tasks in the system, otherwise things become very complicated to design.

The first myth he wanted to fight is that real time is hard. He said that it is not as hard as many tend to say or think. The second myth he wanted to fight is the rumor that real-time is only pushed by the embedded community : it is also strongly pushed by the audio community, and can be useful for games as well.

He went back to a mistake made by a customer. Even after switching to a real-time enabled 2.6 kernel, that customer still failed to receive some bytes from serial ports on his Geode x86-like board. It turned out that this was caused by calls to the BIOS used for VGA buffer scrolling and VGA resolution switching. These calls can disable interrupts for an unbounded amount of time, not controlled by the Linux kernel. He also mentioned the problem of printk() on a serial port : printk() on a serial line can block, waiting for the transmission buffer of the UART to be empty after transmitting the bytes to the other end, and this can take an unbounded amount of time. So he suggested disabling printk() completely when one has real-time issues.

After this suggestion, he switched to another topic : the relation between RT and SMP concerns when doing driver development. He said that both RT and SMP have similar requirements : in RT any process can be preempted at any time, which is very similar to multi-processor issues, where the same code can run simultaneously on different cores. All requirements for SMP-safety also apply to RT, and RT and SMP share the same advanced locking, he said. He also mentioned that the deadlock detection code introduced by the -rt people already led to fixing many SMP bugs in the kernel.

Then he discussed the problem of swapping in the context of a real-time system. What happens if your real-time task’s code or data gets swapped out to disk because of memory pressure ? The latencies would be horrible. The solution he mentioned is the mlockall() system call :

mlockall(MCL_CURRENT | MCL_FUTURE);

But he warned that this should only be done for small processes, because all memory pages of the process will be locked into memory : code, data and libraries.
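
As an illustration (your editor’s minimal sketch, not code from the talk), a real-time task would typically combine mlockall() with a POSIX real-time scheduling policy such as SCHED_FIFO :

#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
 struct sched_param param;

 /* lock all current and future pages into RAM to avoid page faults */
 if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
  perror("mlockall");
  return 1;
 }

 /* run with a real-time FIFO priority (not 99 : leave headroom) */
 memset(&param, 0, sizeof(param));
 param.sched_priority = 50;
 if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
  perror("sched_setscheduler");
  return 1;
 }

 /* ... time-critical work goes here ... */
 return 0;
}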

To complete his talk, he highlighted the fact that the Linux Real Time kernel comes with no warranty. Even though it has been thoroughly tested over the years by the kernel community and by companies like Montavista, the Linux kernel has several million lines of code, and nobody can prove that it will work correctly in all situations. One has to verify that it works well for one’s particular use cases.

To conclude, he recalled that Linux used to be fair, which was bad for real-time. Montavista has worked on improving RT behaviour since 1999, but true real time only appeared in Linux in 2004, with interrupt latencies below 50 microseconds on some platforms. However, the real-time patch is still being merged into the mainline kernel, and real-time system design has its challenges… just like programming in COBOL, he said. He ended with a famous quote from Linus Torvalds : « Controlling a laser with Linux is crazy, but everyone in this room is crazy in his own way. So if you want Linux to control an industrial welding laser, I have no problem with your using PREEMPT_RT ». And Klaas made the transition to the questions part with a funny Windows Blue Screen of Death.

The audience had questions about the interaction between memory allocation and real time, about predictions on the merging of the remaining -rt features into the mainline kernel, and about the interaction between real-time and I/O scheduling; Matt Mackall gave some interesting insights on the last two topics.

In the end, this presentation wasn’t about anything really new, but gave a well-presented overview of the features needed in the Linux kernel to answer the needs of real-time users, as well as a good summary of the first pitfalls one could face in creating a real time system.

Power management quality of service and how you could use it in your embedded application, Mark Gross

Link to the video (57 minutes, 401 megabytes) and the slides.

Mark Gross, who works at the Open Source Technology Center of Intel, gave a talk about power management quality of service (PM_QOS), a new kernel infrastructure that has been merged in 2.6.25 (see the commit and the interface documentation).

The first problem motivating Mark’s work lies in the current power management architecture, in which the implementation of the power management policy is abstracted away from the drivers (which know the hardware best) into a centralized policy manager, creating a dual point of maintenance of device power/performance knowledge : some in the driver, some in the policy manager. In his opinion, it « removes all hope of good abstractions or stable and useful PM API’s ».

That’s the reason why PM QoS was created. The goal is to provide a coordination mechanism between the hardware providing a power managed resource and users with performance needs. It’s implemented as a new kernel infrastructure to facilitate the communication of latency and throughput needs among devices, system and users. Automatic power management is then possible at the driver level, with coordinated device throttling given the QoS expectations on that device.

He then presented areas where PM QoS would be useful in the kernel : first in the cpu-idle infrastructure, to take DMA latency requirements into account when switching to deeper C-states. He also mentioned issues with the ipw2100 driver and sound drivers when C-state latencies are large.

PM QoS first implements a list of parameters in pm_qos_params.c, which are currently just : cpu_dma_latency, network latency and network throughput. These are exported both to the kernel and to userspace. For each parameter, PM QoS maintains a list of pm_qos requests, an aggregated performance requirement, and a notification tree. Inside the kernel, it provides an API to register for notifications of performance request and target changes. To userspace, it provides an interface for requesting QoS.

When an element is added to or changed inside the request list of a given parameter, the corresponding aggregate value is recomputed. If it changed, all drivers registered for notification on that parameter are notified.

From the userspace point of view, PM QoS appears as a set of character device files, one for each PM QoS parameter. When an application opens one of these files, a PM QoS request with a default value is registered. The application can later change the value by writing to the device file. Closing the device file removes the request in the kernel, so that if the application crashes, the cleanup is done automatically. Mark then showed a simple Python program using this userspace interface :

#!/usr/bin/python
import struct, time

# each PM QoS parameter is exposed as a character device file
DEV_NODE = "/dev/network_latency"

# opening the file registers a request with a default value
pmqos_dev = open(DEV_NODE, 'w')

# writing a binary integer updates our requirement
latency = 2000
data = struct.pack('=i', latency)
pmqos_dev.write(data)
pmqos_dev.flush()

# keep the file open : closing it would drop the request
while True:
  time.sleep(1.0)

Mark Gross then described the in-kernel API. A driver can poll the current value for a parameter using :

int pm_qos_requirement(int qos);

but of course, most drivers will probably be more interested in the parameter notification mechanism. They can subscribe (and unsubscribe) to a notification chain using :

int pm_qos_add_notifier(int qos, struct notifier_block *notifier);
int pm_qos_remove_notifier(...);

To create new PM QoS parameters, one will have to modify the pm_qos_init() code in kernel/pm_qos_params.c.

After describing the consumer side of the API, he described the producer side, which allows a driver to instruct other device drivers to respect certain latency or throughput requirements (just like the userspace API presented previously). This API is a set of three functions : pm_qos_add_requirement(int qos, char *name, s32 value) to add a requirement to a parameter list, pm_qos_update_requirement(int qos, char *name, s32 value) to update it, and pm_qos_remove_requirement(int qos, char *name) to remove it.
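
As an illustration of both sides of the in-kernel API, here is a minimal sketch of a module that registers a notifier and adds its own requirement. The header path and the PM_QOS_CPU_DMA_LATENCY constant match the 2.6.25 code to the best of your editor’s knowledge, and the 100-microsecond value is made up :

#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/pm_qos_params.h>

static int my_qos_notify(struct notifier_block *nb,
                         unsigned long value, void *data)
{
 /* 'value' is the new aggregate target for the parameter */
 printk(KERN_INFO "new CPU/DMA latency target: %lu\n", value);
 return NOTIFY_OK;
}

static struct notifier_block my_qos_nb = {
 .notifier_call = my_qos_notify,
};

static int __init my_qos_init(void)
{
 /* be told whenever the aggregate requirement changes... */
 pm_qos_add_notifier(PM_QOS_CPU_DMA_LATENCY, &my_qos_nb);
 /* ... and register a requirement of our own */
 pm_qos_add_requirement(PM_QOS_CPU_DMA_LATENCY, "my_driver", 100);
 return 0;
}

static void __exit my_qos_exit(void)
{
 pm_qos_remove_requirement(PM_QOS_CPU_DMA_LATENCY, "my_driver");
 pm_qos_remove_notifier(PM_QOS_CPU_DMA_LATENCY, &my_qos_nb);
}

module_init(my_qos_init);
module_exit(my_qos_exit);
MODULE_LICENSE("GPL");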

At the end of the presentation, he gave the example of using PM QoS within the iwl4965 wireless adapter driver, which he is working on with one of the iwl4965 developers. The chipset has six high-level power configurations affecting the powering of the antenna, how quickly it puts the radio to sleep, and for how long between AP beacons. Therefore, it looks like a good application of the PM QoS network latency parameter, he said.

At the moment, power management for this device is device-specific, through sysfs. Thanks to PM QoS, the driver could simply register itself for pm_qos notifications of changes to network latencies requirements, and switch to the corresponding power management levels when needed. All other network device drivers could do the same, so that sane user mode policy managers could be written without knowing the exact power management details of each and every network adapter. Mark Gross then described some details of the implementation of PM QoS inside the iwl4965 driver.

Mark sees a lot of possibilities with such a coherent userspace interface. Networked shooter games could set the network latency requirement to zero to disable power management. A Web browser could set it to two seconds, an instant-messaging client to 0.5 seconds, a user mode policy manager could adjust it when the laptop goes to battery power or switches back to AC power, etc.

In the end, the talk was fairly short, but very interesting and completely on-topic : a developer invents a new API to solve a problem, and tries to make it known, so that other developers can use this API in their drivers or applications, and so that he can get feedback from the community. This is exactly what happened during the long questions and answers session that followed the talk (discussion of the current API, its usage, etc.).

Leveraging Free and Open Source Software in a product development environment, Matt Porter

Link to the video (45 minutes, 220 megabytes) and the slides.

In this talk, Matt Porter, who works for Embedded Alley, wanted to explain how one can leverage Free and Open Source Software in the development of a new product. Everyone knows that GNU toolchains exist, that we have the Linux kernel and standard basic root filesystems. But then, what else is there, wondered Matt Porter ?

In order to make his talk more concrete, he proposed to discuss a case study, going through the following steps : define the application requirements, break down the requirements by software component, identify software components fully or partially available as FOSS, and finally integrate and extend the FOSS components with value-added software to meet the application requirements.

His case study was the development of a Digital Photo Frame (DPF), one of these small devices that display pictures, play music, connect wirelessly, and look nice and shiny on the dining room table. The requirements for such a device are clear and concise, he said, making it a good example for his presentation.

His hardware platform is an ARM SoC (with DSP, PCM audio playback, LCD controller, MMC/SD controller, NAND controller), an 800×600 LCD screen, a couple of navigation buttons, an MMC/SD slot, NAND flash and speakers. The user requirements for the DPF device were :

  • Display to the LCD
  • Detect SD card insertion, notify application of SD card presence, and have the application catalog the photo files present on the card
  • Provide a modern 3D GUI and transitions, navigation via buttons, configuration for slideshows, transition types, etc.
  • Audio playback of MP3, playlist handling, ID3 tag display
  • Support JPEG resize and rotation to support arbitrary-sized JPEG files, dithering support for 16 bits display

Based on these requirements, he established the list of software components needed :

  • Firmware
  • OS Kernel
  • I/O drivers
  • Base userspace framework/applications
  • Media event handler
  • JPEG library (running on ARM or DSP)
  • MP3 and supporting audio libraries
  • OpenGL ES library for 3D interface
  • Main application

He quickly covered the obvious components : U-Boot for the firmware, Linux as the kernel, leveraging the SD/MMC, framebuffer, input and ALSA subsystems of the kernel as I/O drivers, use Busybox as the base userspace framework and use OpenEmbedded as the build system.

For media event handling, he used udev, which receives events from the kernel when the SD card is inserted or removed, creates device nodes according to a set of rules, and then forwards the event to the HAL daemon. HAL, which stands for Hardware Abstraction Layer, is a daemon handling hardware interaction : it knows how to handle the hardware, and can send events over D-Bus to notify other applications, such as the main DPF application. D-Bus, also used in their product, is an IPC framework implementing a system-wide bus through which applications can communicate with each other. In their case, HAL and their application use D-Bus to communicate : the application subscribes to HAL events for the SD card and is notified when something happens.
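
As a rough sketch of the application side (your editor’s illustration; the actual DPF code was not shown), here is how a program can listen for HAL’s DeviceAdded signals on the system bus using libdbus :

/* build with : gcc hal-listen.c `pkg-config --cflags --libs dbus-1` */
#include <dbus/dbus.h>
#include <stdio.h>

int main(void)
{
 DBusError err;
 DBusConnection *conn;
 DBusMessage *msg;

 dbus_error_init(&err);
 conn = dbus_bus_get(DBUS_BUS_SYSTEM, &err);
 if (!conn) {
  fprintf(stderr, "D-Bus : %s\n", err.message);
  return 1;
 }

 /* ask the bus to deliver HAL's device hotplug signals to us */
 dbus_bus_add_match(conn,
  "type='signal',interface='org.freedesktop.Hal.Manager'", &err);

 while (dbus_connection_read_write(conn, -1)) {
  while ((msg = dbus_connection_pop_message(conn)) != NULL) {
   if (dbus_message_is_signal(msg,
       "org.freedesktop.Hal.Manager", "DeviceAdded"))
    printf("device added\n"); /* e.g. the SD card showed up */
   dbus_message_unref(msg);
  }
 }
 return 0;
}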

The next subject was JPEG picture handling. For JPEG decoding, they used the libjpeg library, and for resize and rotation, they used jpegtran. Dithering was not supported in libjpeg or jpegtran, and instead of writing their own code, they borrowed some code from the FIM image viewer (FIM stands for Fbi IMproved, which is a framebuffer based image viewer).

To support MP3 playing, they used libmad, which runs on ARM and supports MP3 audio decoding for playback. They also used libid3 to handle the ID3 tags and be able to display them on the screen, and libm3u to handle media playlists.

Then, he covered a more specific and technical subject : using DSP acceleration. Using the DSP available in hardware to accelerate JPEG and MP3 processing looks like an interesting option. First, one needs a DSP bridge, and he mentioned openomap.org as a good starting point for that topic. He also mentioned using libelf to process ELF DSP binaries, which allows for pre-runtime patching of symbols and cross calls from DSP to ARM. He said that the general purpose libraries such as libjpeg, jpegtran, FIM and libmad can be ported to run portions of their code on a DSP.

For the 3D graphic interface, they decided to use Vincent, an OpenGL ES 1.1 compliant implementation. Nokia ported the code to Linux/X11, and it has been easily modified to run on top of the Linux framebuffer. It can also be extended in various ways to support a hardware accelerated cursor, floating/fixed point conversions, use GPU acceleration, etc.

Matt said that a complete GUI can be implemented in low-level OpenGL ES. Font rendering can be done using the freetype library, and it makes it possible to have an interface with a 3D desktop look. It also makes 3D photo transitions possible : photos are loaded as textures, and transitions are then managed as polygon animations together with camera view changes. He also mentioned the fact that higher-level libraries such as Clutter can be used on top of OpenGL ES to provide higher-level interface building tools.

Finally, he described the main DPF application, which integrates all the FOSS components : managing media events, using the JPEG library to decode and render photos, handling Linux input events and driving the OpenGL ES based GUI, managing user-selected configuration, and displaying the photo slideshow using selected transitions.

To conclude, he said that « good research is the key to maximizing FOSS use ». He warned, however, that many components will require extensions and/or optimization, but that smart use of FOSS where possible saves time and money, and speeds up time to market.

Demonstrations

At the end of the first day, some companies and projects were invited to demonstrate some of their work in the hall next to the main conference room. Your editor found some of these demonstrations particularly interesting.

One person from Fujitsu was demonstrating Google Android on real hardware. They ported Android from the QEMU environment provided in Google’s SDK to real environments : the Freescale i.MX31 PDK development board, and the Sophia Systems Sandgate3-P, a device which looks like a mix of a phone and a remote controller.

Engineers from Lineo Solutions were demonstrating their work around memory management, and the management of out-of-memory situations. They explored in-kernel memory mechanisms and userspace notification mechanisms through a signal. The latter sounded particularly interesting, as it allows applications to be notified of memory pressure inside the kernel. An application could then free some memory used for temporary caches, for example, in order to help the system recover from the bad situation.

Richard Woodruff, from Texas Instruments, was demonstrating the power management improvements they made to the Linux kernel in order to decrease the power consumption of their OMAP3 platform. They have been able to get very impressive results.

One Hitachi engineer was demonstrating the use of SELinux in Android. SELinux was used to create two operating modes in Android : the private mode and the business mode. In private mode, only personal applications and data are available. In business mode, only business applications and data are available. And the isolation between these two worlds is enforced by SELinux.

Another Hitachi engineer was demonstrating the use of SystemTap in an embedded system. SystemTap was not designed with cross-compiling and host/target separation in mind. So they improved SystemTap to make it more easily usable in embedded situations : the kernel module generated by SystemTap can be cross-compiled, then loaded on a remote target, and the results can be gathered on the host. These improvements will soon be published.

York Sun, from Freescale Semiconductor, was demonstrating a new CPU with interesting framebuffer capabilities. The framebuffer controller is able to overlay several layers in hardware, which is very useful in things such as navigation systems. York Sun gave more details about Linux support for such a framebuffer controller in the talk entitled Adding framebuffer support for Freescale SoCs.

Day 2

Keynote: The relationship between kernel.org development and the use of Linux for embedded applications, Andrew Morton

Link to the video (55 minutes, 240 megabytes) and the slides.

The second day started with a talk given by a famous special guest : Andrew Morton. After an introduction by conference organizer Tim Bird, Andrew started his talk, entitled The relationship between kernel.org development and the use of Linux for embedded applications.

His talk was already the subject of several reports, one on Linux Devices, and another one on LWN, by Jake Edge.

Andrew Morton’s talk was not technical at all; it rather discussed how embedded companies could participate more in mainline kernel development, what their interest in doing so would be, and how this can be mutually beneficial to both the companies and the kernel community.

Linux Tiny, Thomas Petazzoni

Link to the video (32 minutes, 140 megabytes) and the slides. Thanks to Jean Pihet, Montavista for recording the talk.

Making a full and complete report of your editor’s own talk wouldn’t be very interesting, so let’s let other people do that. To sum up, the talk discussed the following topics :

  • Why is the kernel size important ?
  • Demonstration of the fact that the kernel size is growing, in a significant way over the years
  • History, goal and current status of the Linux Tiny project
  • Future work on this project

UME, Ubuntu Mobile and Embedded, David Mandala

Link to the video (30 minutes, 145 megabytes) and the slides.

David Mandala gave a not very technical talk about UME, Ubuntu Mobile and Embedded. He first described the type of devices targeted by UME, called MIDs, for Mobile Internet Devices. He described them as « consumer centric devices » and « task oriented devices », offering a simple and rich experience with an intuitive UI and an “invisible” Linux OS.

He then described Ubuntu Mobile & Embedded as a completely new product based on Ubuntu core technology. It incorporates open source components from maemo.org, adds new mobile applications developed by Intel, and adapts existing open source applications to mobile devices. The challenges for UME are mainly that applications can’t fit on small screens and that applications are designed for keyboard and mouse, not fingers and touch screens. The big focus of UME is on these two problems, not on other embedded-related issues such as system size, boot time, memory consumption, porting to other architectures, etc. This point was raised by Rob Landley at the end of the talk, and it seems that at the moment, these topics are not on the radar of the UME project.

David Mandala listed the differences between UME and the standard Ubuntu desktop : GNOME Mobile (Hildon) is used instead of the standard GNOME desktop, applications are optimized to fit 4.5″ to 7″ touch LCDs, power consumption is optimized (with a reference to the LPIA acronym, which seems to stand for Low Power on Intel Architecture), and drivers for WiFi, WiMax, 3G and Bluetooth are built in. The size of the system will be around 500 megabytes; it targets devices with more than 2 gigabytes of Flash. Not something we can call resource-constrained.

The global architecture of Ubuntu Mobile is similar to a normal Linux desktop : the kernel with its drivers, X11 with Cairo, Pango, OpenGL, a networking layer, basic frameworks like Gtk, HAL, D-Bus, Gstreamer, and then applications for PIM, e-mail, web browsing, instant messaging, etc. David Mandala also mentioned the problem of proprietary applications with redistribution restrictions, such as Flash players and video codecs.

Mandala mentioned the Moblin website, « a place for specific Intel software for MIDs ». The projects focus on things such as an image creator, a power policy manager and a web browser. Ubuntu Mobile integrates applications and solutions from standard Ubuntu, from Moblin and from GNOME Mobile.

Canonical’s representative then talked about the community they are building around Ubuntu Mobile. It works pretty much like the standard Ubuntu community : a transparent community process, a MID/Mobile track at the Ubuntu Developer Summit, a code of conduct, open and transparent community councils and boards, use of launchpad.net, etc. Canonical will dedicate a three-person team to the mobile community, and they will engage with upstream communities to work with them on improving mobile solutions.

He closed his talk with some useful pointers for interested people : the Mobile and Embedded project on the Ubuntu Wiki (no longer exists, see this Wikipedia page), and the #ubuntu-mobile IRC channel on Freenode.

It is also worth noting that LWN published a short report of this talk.

Hacking an existing phone for phase change memory, Justin Treon

Link to the video (28 minutes, 159 megabytes) and the slides.

In this talk, Justin Treon, from Numonyx, explained how he hacked into a phone running Linux and modified it to use « phase change memory ». He first explained how he managed to get serial and JTAG access to work, then how he reduced the amount of SDRAM from 48 megabytes to 32 megabytes (with phase change memory, less RAM is needed to run the same system and set of applications).

Phase change memory, or PCM for short, is a type of non-volatile memory that combines the advantages of the existing types of memory without their drawbacks. PCM allows execution in place, like NOR flash; it is fast to write, like NAND flash; and it doesn’t require erasing and can be modified on a bit-by-bit basis, like RAM. Using PCM greatly simplifies the software stack (no need for a Flash Translation Layer, erase handling, block management or garbage collection), and improves system performance, he said. PCM is backward compatible with Flash : it supports traditional erase and write commands, but also offers new commands like « Bit-Alterable Write One Word » and « Bit-Alterable Buffer Write », with which block erasing is not needed anymore.

Justin Treon then explained how he hacked the Flash code of the Linux kernel to support PCM. His modifications are very hacky at the moment (direct hack of mtdblock), but he wants to improve them in the future.

Shifting sands: lessons learned from Linux on FPGA, Grant Likely

Link to the video (47 minutes, 261 megabytes) and the slides.

Grant Likely, who works for Secret Lab Technologies Ltd., is an experienced kernel developer of the PowerPC port : he works on the device tree and gave a talk entitled A symphony of flavors: using the device tree to describe embedded hardware, which your editor unfortunately couldn’t attend due to the three simultaneous tracks of ELC. However, in this talk, Grant wanted to share his experience with running Linux on FPGAs.

At the beginning of his talk, he first gave some context on running Linux on FPGAs. He presented the typical architecture of a System-on-Chip (SoC) : in a single chip, one has a CPU, an interrupt controller, a memory controller and several peripheral controllers, such as an Ethernet MAC, UART, GPIOs and other external buses. On his diagram, all of that fit inside a big gray box representing the chip, and was connected to external boxes (DDR2 RAM, Ethernet PHY, serial transceiver, etc.).

Then, he showed an “FPGA system”. The big gray box is now completely empty : one has to implement everything inside it, using hardware description languages such as VHDL and Verilog. This is nice, because it’s very flexible : the full chip can be programmed in a completely custom way.

However, people using FPGAs soon discovered that they were often implementing the same blocks, and that they could benefit from having these blocks directly in hardware. This makes these blocks faster and reduces their consumption of programmable gates. Grant presented the architecture of an FPGA with a higher cool factor, the Virtex 4FX FPGA system. It’s an FPGA in which one or two PowerPC processor blocks, two or four Ethernet MACs, and between 0 and 24 RocketIO serial transceivers are implemented in hardware. These are fixed; they cannot be changed. But they are available in the same chip as the normal FPGA fabric, which can be programmed for custom applications. The rest of Grant’s talk focused on running Linux on the PowerPC, not on the FPGA itself, because, as he said, he is not an FPGA engineer.

Grant then presented the status of Virtex FPGA Linux support. There is basic support in mainline for serial ports, for the ML300/403 framebuffer and for the SystemACE device. Extra drivers are available in the public git tree of Xilinx : Ethernet devices, DMA, I2C, GPIO, Microblaze support. At the time of the talk, this git tree was up to date with 2.6.24-rc8, but some rewrite work was needed before merging into mainline.

The first lesson learned that the speaker wanted to share with the audience was summarized as Don’t make developers’ lives hard. As his slides said, hardware engineers don’t like to compile kernels, and software engineers don’t like to synthesize bitstreams. He explained that when doing development on an FPGA, the peripheral addresses and configurations can be changed at any time by the hardware people. Initially, the synthesis process generated a file with address definitions, which could then be included by the Linux kernel so that the drivers were compiled properly. This meant that any time a hardware engineer wanted to run Linux on a modified FPGA, he had to recompile the kernel (which he doesn’t like to do). This is where the device tree comes into play. It is basically a file that one can give to the kernel at boot time (and not at compile time), and that lists the configuration of the various peripherals of the system. This file can, for example, be generated by the synthesis process of the hardware engineers, so that they don’t have to mess with kernel compilation anymore. And software engineers don’t have to mess with bitstream synthesis anymore.

The second lesson learned was to get the drivers into mainline, and Grant referred to Andrew Morton’s talk in the morning, which said just the same thing. Grant Likely said « You’re not doing anything that novel anyway. No : you’re really not », trying to fight the usual “intellectual property protection” argument.

The third lesson learned is that with FPGAs, « hardware is the new software », so one should follow software best practices : revision control, automated builds and peer review. Grant even suggests letting the software people have a look at the hardware design, and vice-versa.

The fourth lesson is that when you have a problem, it « really might be a hardware bug ». Grant advises talking to your hardware engineer immediately when you have problems, because he is able to probe any signal inside the FPGA design.

The fifth lesson is to not spend all the budget on “boring” stuff, such as getting PCI, USB, Ethernet or serial working. Grant Likely cited Matt Mackall, who said : « if your vendor isn’t pushing stuff to mainline, go beat them up ». You should be spending your time on the interesting stuff, such as developing the custom application logic you want to put inside the FPGA. So Grant suggests choosing the platform carefully at the design stage, by making sure that Linux support is correct and that viable device drivers are available. He mentioned the experience of the Cypress c67x00 USB driver that he had to develop : it took three months for something that was not directly interesting for the project. But the piece of hardware was there, in their design, and he had no choice other than to develop the driver for it.

The sixth lesson learned is a very classical one in software engineering : make things work first before you try to optimize them (to make them faster, smaller, or more clever).

The seventh lesson is to prepare for dynamic hardware in the kernel. When working with FPGAs, one should expect things to change, at a much faster rate than with SoCs.

The next lesson was entitled « User space sucks ». Grant thinks that cross-compiling kernels is easy, but that cross-compiling userspace is hard. So he suggests getting the userspace problem solved early.

After that, the questions and answers session started. The first question voiced concerns about the increase in boot time caused by the use of the device tree. Grant said that using the device tree is mandatory, but that one could still limit its use to only a few things. However, he was a bit skeptical that the device tree is actually responsible for increasing boot time. He suggested making some measurements, because he has always found the kernel decompression step to be the biggest time consumer in the kernel booting process.

During the discussion, he used the term « SystemACE », and one person in the audience asked what SystemACE was, so Grant explained what it actually is. SystemACE is a separate chip, next to the FPGA. On one side, it has a Compact Flash interface; on the other side, it has an 8-bit bus connection and a JTAG connection to the FPGA. When the board is powered up, the SystemACE chip reads data files from the Compact Flash and pushes the data stream to the FPGA to configure it. After boot, the SystemACE chip can also be used as an interface to the FPGA to read more information from the Compact Flash. The SystemACE mechanism is also documented on the Xilinx website. It is not the only solution for configuring the FPGA at boot time; CPLDs are also commonly used. The data file used by SystemACE is actually a list of JTAG commands, so one can use it to push the bitstream to the FPGA, but also, for example, to load the kernel to memory (which is slow because of the limited JTAG speed, Grant said). Following a question from the audience, Grant suggested having standard Flash connected to the FPGA : SystemACE is used to load a simple loader, which then runs on the FPGA and loads the kernel from Flash.

The next question was how to handle, from the Linux perspective, the flow of data into and out of the FPGA, considering that this flow of data is usually very high-speed, but that the CPU doesn’t need to touch it. Grant then offered an interesting view of a hardware architecture that can transfer large amounts of data from a high-speed source to the DDR, using the MPMC, the Multi Port Memory Controller, which is available on Xilinx chips. Grant then offered very technical and precise recommendations on how to handle that from the Linux perspective. Your editor suggests that anyone interested in the details look at the video of the talk.

Using a JTAG for Linux driver debugging, Mike Anderson

Link to the video (113 minutes, 694 megabytes) and the slides.

During this two-hour tutorial, Mike Anderson first described the development of a simple character device driver, and then the debugging of the Linux kernel and Linux kernel modules using JTAG. He described what JTAG devices are, what kind of hardware and software you need, how you can use them with gdb, and how you configure them. This tutorial is a very good introduction to the use of JTAG devices for those who have never or only rarely used that kind of hardware debugging technology.

However, as this talk is a tutorial with lots of live demonstrations, it’s probably not worth making a full report of it. Your editor rather suggests that the reader directly look at the video. Mike Anderson speaks very clearly, with a loud voice, making his tutorial very easy to understand, even for non-native English speakers.

Social event

Day 2 ended with the usual social event for such conferences. It started with a nice barbecue in the garden next to the Computer History Museum building. The conference attendees were able to prolong their discussions around tables, with lots of meat, wine and beer. After the barbecue, the attendees were invited to the Mountain View Laser Quest, on the other side of the street, to have some fun fighting with laser guns. Laser Quest employees were a bit puzzled by the nicknames chosen by the participants : fbflush, sbin init, dev zero, kill -9 or rm -rf /. Such social events are always a nice addition to a conference, in that they allow attendees to make more contacts.

Day 3

Appropriate Community Practices: Social and Technical advice, Deepak Saxena

Link to the video (44 minutes, 139 megabytes).

Your editor thought that sharing the video of such a talk with the community would be very interesting, and Kevin Hilman, a colleague of Deepak Saxena at Montavista, kindly agreed to record the talk. Thanks !

During that talk, your editor attended the Adding framebuffer support for Freescale SoCs talk, detailed below.

Adding framebuffer support for Freescale SoCs, York Sun

This talk followed the demonstration made by York Sun of a new Freescale CPU with impressive framebuffer capabilities, the MPC8610. It is a high-performance chip with interesting controllers, but the one discussed during this talk was the LCD controller. This controller can blend up to three planes in real time, and handle transparency between the planes. Inside each plane, several non-overlapping windows can also be created to render different applications, videos or pictures. The chip can display at 1280×1024, 60 Hz, with a color depth of up to 24 bits.

York Sun described the implementation of the Linux framebuffer driver for such a chip. He decided to export several framebuffer devices to userspace, one for each plane : /dev/fb0, /dev/fb1 and /dev/fb2. The first is the main plane, while the other two are the secondary planes. An application can render to any of these planes, and the hardware magically does the blending in real time. However, there are still some differences between the primary plane and the secondary planes : system-wide configuration can only be made on the primary plane, for example.

This talk was interesting because it illustrated how new hardware capabilities create new challenges for Linux device drivers. The existing frameworks constantly have to be redesigned and refactored to take new hardware capabilities into account.

Back-tracing in MIPS-based Linux systems, Jong-Sung Kim

Link to the video (54 minutes, 160 megabytes) and the slides.

In this talk, Jong-Sung Kim, with a very soft voice that made the talk difficult to follow, reported on his work on MIPS back-tracing, which happens to be a complex topic.

Of course, everyone knows that back-tracing is very useful for debugging. However, back-tracing facilities such as gcc’s __builtin_return_address() or glibc’s backtrace(3) and backtrace_symbols(3) are not available on MIPS. Jong-Sung then described a typical real-world MIPS stack frame in order to explain why back-tracing on MIPS is difficult: there is no easy way to get the address of the caller’s stack frame.

So, the only solution, detailed by Jong-Sung, is binary code scanning. The speaker presented typical function prologue and epilogue code, and the back-tracing procedure that can be used on MIPS (both in English and in C). The procedure scans the prologue of the function to extract the stack frame size from the instructions themselves, and deduces the location of the caller’s stack frame from that.
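
To make the idea more concrete, here is a minimal sketch of such a prologue scanner for the classic 32-bit MIPS (o32) prologue, where the frame is set up with addiu sp,sp,-N and the return address saved with sw ra,off(sp). This is only an illustration of the technique described by the speaker, not his implementation: a real back-tracer must handle many more instruction forms, and finding the start address of the function is itself part of the problem.

/* Sketch of MIPS (o32) prologue scanning, for illustration only:
 * a real back-tracer must handle many more instruction forms. */
#include <stdint.h>

struct frame_info {
    int stack_size;   /* bytes reserved by "addiu sp, sp, -N" */
    int ra_offset;    /* offset of saved ra, from "sw ra, off(sp)" */
};

/* Scan the prologue from the function start "code" up to "pc". */
static int scan_prologue(const uint32_t *code, const uint32_t *pc,
                         struct frame_info *fi)
{
    fi->stack_size = -1;
    fi->ra_offset = -1;
    for (const uint32_t *insn = code; insn < pc; insn++) {
        /* addiu sp, sp, imm  ->  0x27bdXXXX */
        if ((*insn & 0xffff0000) == 0x27bd0000)
            fi->stack_size = -(int16_t)(*insn & 0xffff);
        /* sw ra, imm(sp)  ->  0xafbfXXXX */
        else if ((*insn & 0xffff0000) == 0xafbf0000)
            fi->ra_offset = (int16_t)(*insn & 0xffff);
    }
    return (fi->stack_size > 0 && fi->ra_offset >= 0) ? 0 : -1;
}

/* Given the current sp, recover the return address and caller's sp. */
static void unwind_one(uintptr_t *sp, uintptr_t *ra,
                       const struct frame_info *fi)
{
    *ra = *(uintptr_t *)(*sp + fi->ra_offset); /* saved return address */
    *sp = *sp + fi->stack_size;                /* caller's stack frame */
}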

Then, Jong-Sung presented the challenges posed by back-tracing inside a signal handler. The execution context of a signal handler is a bit special, and the back-tracing procedure has to be adapted to handle this case properly.

However, these procedures are not perfect: leaf functions (which do not save registers) and assembly-coded or highly-optimized functions can have atypical prologue code that defeats the proposed back-tracing procedures. The speaker then demonstrated some of his back-tracing procedures, and said that he is currently working on releasing these functions as an open source library or inside the MIPS port of the C library.

After this presentation, the question and answer session started, in which the speaker didn’t really participate, probably due to language barriers. It turned out that four different people in the audience had already made four different implementations of back-tracing procedures for MIPS. Some of them claimed to have better implementations than the one proposed by Jong-Sung (for some obscure reasons that your editor couldn’t catch), but none of these implementations are currently released or available as part of the C library.

After the long discussion on what to do about back-tracing on MIPS, the speaker went back to his presentation with an interesting appendix on a crash report system used in LG Electronics products. The goal of this system is to guarantee that information about system crashes cannot be lost. The system includes a watchdog which, on expiration, triggers an in-kernel procedure that stores the contents of the circular log buffer used for the console, along with other debug information, into NVRAM for later debugging. Jong-Sung then explained the implementation details of this solution and showed an example of its use.

DirectFB internals: things you need to know to write your DirectFB gfxdriver, Takanari Hayama

Link to the video (57 minutes, 160 megabytes).

Takanari started the talk by giving a short presentation of DirectFB. It is a lightweight graphics library with a small footprint (less than 700 kilobytes on SH4, the architecture of interest for the presenter). It doesn’t have any client/server model like X11 has. DirectFB offers an abstraction layer for hardware graphics acceleration: anything not supported by the hardware will still be supported in software. DirectFB also has multi-process support, a built-in window manager and more.

The first embedded chip supported by the mainline version of DirectFB was the Renesas SH7722. The speaker then detailed the architecture used to support this device in DirectFB. At the lowest level, one finds the hardware: the video memory and the video hardware accelerator. On top of that, they run an unmodified Linux kernel, to which they add a kernel module that handles the video hardware accelerator. It receives commands from a userspace part, integrated as a module in DirectFB and called the gfxdriver, which is specific to each video hardware. They also use the already existing devmem system module of DirectFB, which allows accessing video memory through the /dev/mem character device. So all they had to implement was a kernel module and a userspace gfxdriver; all the rest was existing code.

Takanari then presented important terms in DirectFB terminology. From his slides:

  • Layers represent independent graphic buffers. Most embedded devices have more than one layer; they get layered and displayed with appropriate alpha blending by hardware
  • A surface is a reserved memory region that holds pixel data. Drawing and blitting operations in DirectFB are performed from/to surfaces. Memory for surfaces can be allocated from video memory or system memory depending on system constraints
  • The primary surface is a special surface that represents the frame buffer of a particular layer. If the primary surface is single-buffered, any operation on the primary surface is directly visible on the screen

DirectFB is composed of several modules that can be extended in an object-oriented way: system modules, graphics drivers, graphics devices, screens and layers. A system module should implement the functions of the CoreSystemFuncs structure defined in core_system.h and should be declared with DFB_CORE_SYSTEM(). A graphics driver should implement the GraphicsDriverFuncs structure defined in graphic_driver.h and be declared using DFB_GRAPHICS_DRIVER(). A graphics device object should implement the methods of the GraphicsDeviceFuncs structure (in gfxcard.h) and be registered inside the driver_init_driver() method of GraphicsDriverFuncs. Screen objects should implement the ScreenFuncs methods (in screens.h) and be registered with dfb_screens_register(). Finally, layer objects should implement the DisplayLayerFuncs methods (in layers.h) and be registered using dfb_layers_register(). At this point of the talk, things were a bit confusing to your editor, who is only a beginner in DirectFB internal concepts. Fortunately, during the rest of the talk, Takanari explained the role of each module and how to actually implement them.

He first talked about the system module. A system module provides access to the hardware resources (framebuffer and hardware management). As of DirectFB 1.1.0, several system modules are available: fbdev (the default, which uses the kernel framebuffer interface through the /dev/fb* devices), osx, sdl, vnc, x11 and devmem. The system module to use can be specified in the directfbrc file, using system=devmem for example.

For embedded systems, the devmem system module, merged in DirectFB 1.0.1, is particularly interesting. It uses /dev/mem to access the graphics hardware and the framebuffer. According to the speaker, it is a convenient solution for devices using memory-mapped I/O and memory shared between the CPU and the graphics accelerator, and most embedded devices seem to fall into this category. When using devmem, one must specify additional parameters: video-phys, the physical address of the beginning of the video memory; video-length, its length; mmio-phys, the physical address of the beginning of the MMIO area used to control the graphics hardware; mmio-length, its length; and accelerator, an ID used by the DirectFB core to select the graphics driver. These values have to be specified in the directfbrc configuration file.
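
As an illustration, a directfbrc for a hypothetical devmem-based board could look like the following. All addresses, lengths and the accelerator ID are made-up values; they entirely depend on your hardware.

# hypothetical values, for illustration only
system=devmem
video-phys=0x8c800000
video-length=0x00800000
mmio-phys=0xfe940000
mmio-length=0x00010000
accelerator=50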

Takanari described how DirectFB matches systems and gfxdrivers. The DirectFB core calls the driver_probe() method implemented in each gfxdriver to ask each driver whether it supports a particular piece of detected hardware. If the hardware is supported by the gfxdriver, the driver_probe() method should return a non-zero value. When using devmem, there is a special case: the value passed to driver_probe() is the value given through the accelerator parameter. So one has to make sure that they match.

Then, Takanari moved to the core of the talk: writing the graphics driver. It consists of several components: the graphics driver module, the graphics device module, the screen module (optional for fbdev, but mandatory for devmem) and the layer module (also optional for fbdev, but mandatory for devmem). To get your graphics accelerator to work, this is the code you must write, he said. One can use devmem, so that there is no need to write a kernel framebuffer driver usable through fbdev.

First, the graphics driver. At the beginning of the file, give a name to your graphics driver, using DFB_GRAPHICS_DRIVER(yourname). After that, DirectFB expects to find six functions: driver_probe() (detailed earlier), driver_get_info() to get meta-information about the driver, and driver_init_driver() and driver_close_driver() to initialize and close the driver. driver_init_driver() is responsible for acquiring all the hardware resources (setting up the mappings, etc.) and then for registering screens and layers. It is also responsible for setting pointers to acceleration functions in a GraphicsDeviceFuncs structure. Finally, you must define driver_init_device() and driver_close_device() to initialize and close the device. In driver_init_device(), you must for example set the device capabilities, so that the DirectFB core knows what the device can do in hardware, and what it will have to do in software.
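
To make this more concrete, here is a hedged skeleton of such a driver file. The six entry points are the ones listed above, but their exact signatures should be checked against the headers of the DirectFB version you target; mydriver and MYACCEL_ID are made-up names, with MYACCEL_ID chosen to match the accelerator value of the directfbrc example earlier.

/* Hedged skeleton of a gfxdriver source file; exact signatures must
 * be checked against src/core/gfxcard.h in your DirectFB version. */

#include <directfb.h>
#include <core/gfxcard.h>
#include <core/screens.h>
#include <core/layers.h>

DFB_GRAPHICS_DRIVER(mydriver)   /* "mydriver" is a made-up name */

#define MYACCEL_ID 50   /* must match accelerator=50 in directfbrc */

static int
driver_probe(CoreGraphicsDevice *device)
{
    /* With devmem, this value comes from the accelerator parameter. */
    return dfb_gfxcard_get_accelerator(device) == MYACCEL_ID;
}

static void
driver_get_info(CoreGraphicsDevice *device, GraphicsDriverInfo *info)
{
    /* Fill in the driver name, vendor and version. */
}

static DFBResult
driver_init_driver(CoreGraphicsDevice *device, GraphicsDeviceFuncs *funcs,
                   void *driver_data, void *device_data, CoreDFB *core)
{
    /* 1. Acquire the hardware resources (set up the mappings, etc.).
       2. Hook the acceleration entry points into *funcs:
          funcs->CheckState, funcs->SetState, funcs->FillRectangle...
          (see the CheckState()/SetState() sketch below).
       3. Register screens and layers with dfb_screens_register()
          and dfb_layers_register(). */
    return DFB_OK;
}

static DFBResult
driver_close_driver(CoreGraphicsDevice *device, void *driver_data)
{
    /* Release everything acquired in driver_init_driver(). */
    return DFB_OK;
}

static DFBResult
driver_init_device(CoreGraphicsDevice *device,
                   GraphicsDeviceInfo *device_info,
                   void *driver_data, void *device_data)
{
    /* Set the device capabilities here, so that the core knows what
       the hardware can do and what must be done in software. */
    return DFB_OK;
}

static DFBResult
driver_close_device(CoreGraphicsDevice *device,
                    void *driver_data, void *device_data)
{
    return DFB_OK;
}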

Takanari then detailed again the initialization steps of a gfxdriver. From his slides :

  1. DirectFB calls driver_probe() in each gfxdriver on the system with a graphics device identifier to find the corresponding gfxdriver for the device
  2. If driver_probe() returns non-zero, then DirectFB calls driver_init_driver(). In this function, the driver should register graphics device functions, screens and layers.
  3. The DirectFB core then calls driver_init_device(), in which the driver should set the capabilities supported by the device in a GraphicsDeviceInfo structure.

The GraphicsDeviceFuncs structure lists the functions supported by the driver, which are set during driver_init_driver() (see src/core/gfxcard.h for the definition of this structure and many other important structures). The driver developer doesn’t have to set all the functions, only the ones for which a specific implementation is wanted. According to the speaker, the most interesting and important ones are: reset/sync of the graphics accelerator (EngineReset(), EngineSync()), check/set state of the graphics accelerator (CheckState(), SetState()) and blitting/drawing functions (Blit(), StretchBlit(), FillRectangle(), DrawLine(), etc.). In total, there are twenty-two functions that can be set through this GraphicsDeviceFuncs structure.

The acceleration process was then described by Takanari. DirectFB starts by calling the CheckState() method to ask the driver whether it can execute a specific operation with a specific state. If not, DirectFB falls back to software rendering. When the hardware supports the operation, DirectFB calls SetState(), which gives the driver the opportunity to program the hardware for the execution of the given operation in the given state. Once done, DirectFB finally calls the appropriate drawing/blitting function, such as Blit(). As Takanari explained, thanks to this modular approach to acceleration support, one can start with a very basic driver with no acceleration, and incrementally add support for the acceleration of the different operations, one by one.
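
Continuing the skeleton above, here is a sketch of what this could look like for a hypothetical device that only accelerates rectangle filling. The structure and field names follow DirectFB 1.1-era headers and should be double-checked against src/core/gfxcard.h; the hardware programming itself is left as comments.

/* Sketch for a hypothetical device accelerating only FillRectangle();
 * field names should be double-checked against src/core/gfxcard.h. */

static void
mydrv_check_state(void *drv, void *dev,
                  CardState *state, DFBAccelerationMask accel)
{
    /* Refuse blitting operations and anything but plain rectangle
       filling without drawing effects. */
    if (DFB_BLITTING_FUNCTION(accel) || accel != DFXL_FILLRECTANGLE)
        return;
    if (state->drawingflags != DSDRAW_NOFX)
        return;

    /* Tell the core this operation can be done in hardware. */
    state->accel |= DFXL_FILLRECTANGLE;
}

static void
mydrv_set_state(void *drv, void *dev, GraphicsDeviceFuncs *funcs,
                CardState *state, DFBAccelerationMask accel)
{
    /* Program the destination surface, color and clip registers from
       "state", before the drawing function gets called. */
}

static bool
mydrv_fill_rectangle(void *drv, void *dev, DFBRectangle *rect)
{
    /* Write rect->x, rect->y, rect->w and rect->h to the accelerator
       registers and kick the operation. Return true on success. */
    return true;
}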

The speaker then mentioned the possibility of queuing draw/blit commands, if the graphics accelerator supports this. If so, you can queue as many draw/blit operations as you can, and then kick the hardware. This is implemented through the EmitCommands() operation, and an example of this is visible in the sh7722 gfxdriver.

Now that the graphics driver and device modules have been covered, Takanari went on with the screen module. A screen represents an output device, such as an LCD. For a fixed-size screen, the minimum functions that the driver developer has to define are InitScreen() and GetScreenSize(). The screen operations must be listed in a ScreenFuncs structure (see src/core/screens.h for the definition), and registered from the driver_init_driver() operation using the dfb_screens_register() function.

Then, layers. They represent independent graphics buffers, and they are merged by the hardware when they get displayed on the screen (usually with alpha blending). Layers are required for changing the size, the pixel format, the buffering mode and the color-lookup table (CLUT), and for flipping buffers. The layer operations must be implemented and listed in a DisplayLayerFuncs structure (see src/core/layers.h for the definition), and registered from the driver_init_driver() operation using the dfb_layers_register() function. The important display layer operations are: LayerDataSize() (returns the size of the layer data to be stored in shared memory), RegionDataSize() (returns the size of the region data to be stored in shared memory), InitLayer() (initializes the layer), TestRegion() (checks whether the given parameters are supported), SetRegion() (programs the hardware with the given parameters), RemoveRegion() (removes the region) and FlipRegion() (flips the frame buffer).

To finish the talk, Takanari gave detailed information about surface allocation. DirectFB 1.0 used a single one-dimensional linear surface allocator. Using the DSCAPS_VIDEOONLY flag, you could request a surface in the form of a contiguous memory block, and with DSCAPS_SYSTEMONLY, the surface is allocated using malloc(). For embedded graphics accelerators, Takanari said that you probably have to use physically contiguous memory, so that DSCAPS_VIDEOONLY is the only solution. The only way to customize surface allocation was through the Layer Driver API, mostly for primary surfaces. Takanari then gave examples of cases where custom surface allocation is needed, and explained how this could be done in DirectFB 1.0. Then, he introduced a new concept added in DirectFB 1.1: surface pools, which greatly simplify surface allocation.

OpenEmbedded for product development, Matthew Locke

Link to the video (49 minutes, 141 megabytes) and the slides.

Matthew Locke works for Embedded Alley, so he is a colleague of Matt Porter, who gave a talk on the first day of the conference about leveraging existing free and open source software in embedded projects. Matthew’s talk was a very good presentation of OpenEmbedded, with step-by-step details on how to use it.

Matthew started his talk with a bit of background on OpenEmbedded. It was started inside the OpenZaurus project in order to make it easier to build applications for Zaurus PDAs. After some time, the build tool was rewritten to separate it from the meta-data describing how to build the various applications. The build tool, named bitbake, is based on concepts found in Gentoo’s portage tool. OpenEmbedded is now used by many open source projects: handhelds.org, Linksys routers, Motorola phones, MythTV hardware and, more recently, OpenMoko.

The speaker then defined OpenEmbedded as a « self contained cross build system for embedded devices ». It contains a collection of recipes describing how to build thousands of packages, including bootloaders, kernels, libraries and applications. It targets more than 60 machines and provides over 40 package/machine configurations, which are basically custom distributions for each device. OpenEmbedded doesn’t contain the source code of the applications; it fetches it from tarballs or SVN thanks to instructions in the meta-data. At the end, OpenEmbedded outputs individual packages and filesystem images (jffs2, ext3, etc.).

The building philosophy of OpenEmbedded is to build from scratch. By default, it builds the latest version of all components by downloading their source code and possibly applying patches to them.

OpenEmbedded is implemented using bitbake, whose role is to parse the recipes and configurations, to create a database describing how to fetch, configure, build and install each package, to determine the dependencies between packages, and to build them in the correct order, in parallel when possible. It uses the IPK packaging format for individual packages.

To set up OpenEmbedded, one should first decide which meta-data version to use (the latest one or a stable snapshot), then install bitbake. The speaker suggests setting up a pristine OpenEmbedded directory, and not making changes directly inside this set of reference meta-data. There is a mechanism called overlay that allows keeping your modifications separate from the OpenEmbedded code. The speaker also suggests setting up an internal mirror of the upstream software you are using, so that you are sure of the versions you will be using and there won’t be any surprises such as upstream server downtime. Switching to an internal mirror is very simple in OpenEmbedded.

bitbake parses all the configuration and recipe files found in the directories listed in the BBPATH environment variable. So one can set up an overlay directory that contains specific configuration files and internal package meta-data, and that may override any pristine meta-data. BBPATH should then include two directories: the one with the pristine OpenEmbedded meta-data, and the one with the overlay information. The overlay directory should contain two subdirectories: conf/, with custom and overridden configuration files, and packages/, with internal and overridden package files. This way, you can for example override the default Busybox configuration provided in the pristine OpenEmbedded meta-data with your specific Busybox configuration.
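
For instance, with a hypothetical overlay stored in /work/overlay and the pristine meta-data in /work/openembedded (both paths are made up), the setup could look like:

export BBPATH=/work/overlay:/work/openembedded

/work/overlay/conf/        custom and overridden configuration files
/work/overlay/packages/    internal and overridden package meta-data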

The configuration files define how the build environment is set up, package versions, information, global inheritance, target boards, final image configuration, etc. There are four types of configuration files, and Matthew went through all of them one by one to give an idea of what they are useful for.

The first configuration file is the distro configuration file. It defines the toolchain and package versions, package configurations and high-level settings, such as whether udev should be used or not, and the final image format. Matthew detailed a specific example of such a configuration file, which simply defines a set of variables.

The second configuration file is the machine configuration file, defining board-specific versions and features: architecture, compiler options, kernel version, package providers and board-specific things. Again, the speaker gave an example in which one could see the specification of the architecture, of the udev and kernel versions, and of a list of features that should be included (alsa, host-usb, gadget-usb, mtd, wifi, etc.). Each feature triggers the building and inclusion of various packages into the final image.
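
As a hedged sketch, such a machine file could contain variables like the following. The versions are made up, and DEVEL_FEATURES is the custom feature list that the task file example further below tests with base_contains(); exact variable names vary between OpenEmbedded versions.

TARGET_ARCH = "arm"
PREFERRED_VERSION_linux = "2.6.24"
PREFERRED_VERSION_udev = "118"
DEVEL_FEATURES = "alsa host-usb gadget-usb mtd wifi"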

The next type of configuration file is the recipe file, with the .bb extension. Recipes contain the information necessary to build a package, in the form of functions: do_fetch(), do_stage(), do_configure(), do_compile() and do_install(). There are four types of bb files. First, classes, which define common steps for a class of packages (for example, all kernel packages share the same build procedure). Then packages, which usually inherit classes and add or override package-specific settings and steps. The next type of bb files is tasks, which define collections of packages to be built. The last type, images, allows creating filesystem images out of tasks.
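
As an illustration, a minimal recipe for a hypothetical autotools-based application, stored in a file such as myapp_1.0.bb, could look like this. The name, URL and license are made up; the inherited autotools class provides the default do_configure(), do_compile() and do_install() steps.

DESCRIPTION = "Hypothetical example application"
LICENSE = "GPL"
PR = "r0"

SRC_URI = "http://www.example.com/releases/myapp-${PV}.tar.gz"

inherit autotools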

Matthew then went into more detail about tasks. They allow dividing packages into logical groups, enabling developers to work on separate building blocks, or separating production and development. Typically, one could define four tasks: base (with the base system, kernel, glibc, busybox, etc.), core (with the core open source middleware components), apps (with the product applications) and UI (with the interface-specific components). Of course, this is completely flexible.

A task file for base could look like:

RDEPENDS = "\
 ${@base_contains("DEVEL_FEATURES", "alsa", "${ALSA_PKGS}","",d)} \
 base-files base-passwd busybox-devel \
 kernel kernel-modules \
 initscripts sysvinit udev \
 ${@base_contains("DEVEL_FEATURES", "mtd", "mtd-utils", "", d)} \
 ${@base_contains("DEVEL_FEATURES", "wifi", "wireless-tools", "", d)} \
 dropbear \
"

It gives a list of packages, with some packages being included only if some features have been enabled in the machine configuration file.

The speaker then gave some advice on how to use OpenEmbedded in the development of a commercial product. First, he suggested making local copies of the open source components, as tarballs on a local server. He also thinks that locking down the versions of the components in a Bill of Materials configuration file is a good idea. This configuration file, using the PREFERRED_VERSION_<pkgname> = <version> directive, makes sure that the system is built with the correct package versions. Then, he suggested creating meta-data for the internal components as well, using parallel building to reduce build time, and creating and distributing a ready-to-use build environment (SDK).
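
For example, such a Bill of Materials configuration file could simply contain lines like these (package names and versions are made up):

PREFERRED_VERSION_busybox = "1.2.1"
PREFERRED_VERSION_dropbear = "0.49"
PREFERRED_VERSION_linux = "2.6.24"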

Once the base system is ready to be built with OpenEmbedded, the custom applications still have to be developed. The speaker gave some tips on how to do that with OpenEmbedded. There are basically two ways. The first one is to use OpenEmbedded directly during application development: create a bb recipe for the application, keep the revision control system updated with the changes you want to test, build using bitbake <packagename>, and integrate the application into the system by adding the package to the appropriate package file. This is a powerful method, but it has drawbacks: your application is completely recompiled every time, and you must commit your changes to the SCM to be able to test them. The second method is to export the SDK from OpenEmbedded: set up OpenEmbedded to export the toolchain and libraries to an environment that is independent from OpenEmbedded, then build your applications the usual way from your local sources, and integrate them into OpenEmbedded only when they are ready.

Finally, Matthew described the result of an OpenEmbedded build. It creates several directories, the most interesting ones being:

  • conf, the build-specific configuration files
  • deploy, the images (in the images/ subdirectory) and the individual packages (in the ipk/ subdirectory)
  • staging, which contains the intermediate install of libraries and headers
  • work, which is the build directory
  • cross, which contains the host tools for the target
  • rootfs, which contains the expanded root filesystem generated by OpenEmbedded

Matthew concluded his talk by describing OpenEmbedded as a « very powerful meta-data system », with many advantages : layered design that eases customization, easy support of commercial software development, many supported packages, high flexibility, and a large community using and maintaining it. However, he admitted that the learning curve is quite steep, and that finding a version of meta-data that “just works” can be a challenge.

The resources pointed to by the speaker are the OpenEmbedded official website, the Bitbake manual and the OpenMoko Wiki, which contain a lot of information on how to build a complete software stack with OpenEmbedded.

In the end, your editor found this talk particularly interesting. He was fond of Buildroot, another tool with similar capabilities, but discovered that OpenEmbedded has real advantages over Buildroot, and that it is probably worth spending some time testing and playing with it.

Disko, an application framework for digital media devices, Guido Madaus

Link to the video (27 minutes, 190 megabytes).

In this talk, Guido Madaus and his colleague, two German developers, presented Disko, a framework for developing multimedia applications for digital media devices. The project started under the name MorphineTV, an application for set-top boxes. After some time, the developers figured out that a more generic framework, allowing others to create custom applications and plug-ins, would be interesting, and the result of this effort is the Disko framework. It allows creating GUI applications for multimedia devices using a simple XML language. The applications are then fully skinnable and themeable. Following the conference, LinuxDevices published an article about Disko, Linux gains lightweight media-oriented graphics stack, with the slides of the presentation.

Keynote: the status of embedded Linux and CELF plenary, Tim Bird

Link to the video (49 minutes, 112 megabytes) and the slides.

In his conference-closing keynote, Tim Bird started by giving a list of kernel highlights: things that happened over the last year and that will probably happen in the coming months, with of course a focus on embedded-related features. For 2.6.24, he highlighted:

  • Kernel markers introduced in 2.6.24, with maybe LTTng (Linux Trace Toolkit) coming soon
  • Removal of the security module framework
  • The power management quality of service work (PM QoS), covered by a talk during the conference

2.6.25 was released right during the conference, so that Tim wasn’t even aware of it when starting the talk. But of course, the well-informed audience noticed and told Tim Bird the good news. He highlighted the following features:

  • Kpagemap, Matt Mackall’s patches for fine-grained memory measurement
  • The latency measurement API, which is the foundation of LatencyTOP
  • Smack, the simple mandatory access control security module. Tim said that we need to see whether it makes sense to use it in embedded systems

On the radar, Tim sees two interesting things. First, the latency trace system, based on gcc’s mcount feature, which supports multiple tracers and builds on the latency tracer available in the RT tree. And also mem_notify, which allows processes to avoid the OOM killer by responding to events and voluntarily shrinking their memory usage (see this LWN article).

He then mentioned the Technology Watch List maintained by the CE Linux Forum. It is a list of technologies that they are interested in and are watching, like the Kernel Weather Forecast, but with an embedded focus. The page is available at http://elinux.org/Technology_Watch_List on the elinux.org Wiki. The list of technologies is very impressive, and Tim went quickly through it with a few slides.

The first preoccupation is kernel size. Tim of course mentioned the Linux-Tiny work, which is now maintained by Michael Opdenacker from Free-Electrons; the CE Linux Forum is contracting Bootlin for this work. He mentioned the patches being mainlined by your editor, who also works for Free-Electrons, and suggested seeing http://elinux.org/Linux_Tiny_Patch_Details for more information about the patches. Then Tim mentioned the kpagemap work done by Matt Mackall, funded by CELF and merged in 2.6.25. Matt Mackall also released Bloatwatch 2.0, a tool to track kernel size regressions, which was covered in his Kernel Size Report talk during the conference.

Tim described kpagemap, which allows getting details about every allocated page in the system, and introduces new metrics to measure the memory consumption of userspace applications. The existing metric, RSS, is not convenient because it counts the memory consumed by shared pages in full for every process mapping these pages. The new metrics, PSS (Proportional Set Size) and USS (Unique Set Size), should give a better idea of memory consumption. For shared pages, PSS divides the memory consumption by the number of processes actually mapping these pages; USS simply doesn’t count shared pages at all. Tim gave some interesting pointers: an ELC presentation (no longer available on-line), an LWN article and a visualization tool.
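
As a quick, made-up illustration of these metrics: consider a process mapping 100 pages of its own, plus 60 pages shared with two other processes (three mappers in total). Its RSS is 160 pages, counting the shared pages in full; its PSS is 100 + 60/3 = 120 pages; and its USS is 100 pages, ignoring the shared pages entirely.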

The next topic was filesystems. First, SquashFS, the famous compressed read-only filesystem with better compression than CramFS. It is actively maintained (last release in February 2008), but still not mainlined. The main developer, Phillip Lougher, injured his hand and cannot currently work much on the project, so some help would be appreciated to get that filesystem merged into the kernel tree. The second filesystem, AXFS, was covered by a talk by Jared Hulbert during the conference. It is an advanced XIP filesystem that can profile applications and then use XIP only on some blocks. It allows fine-grained control over how much flash vs. RAM is used for an application set. A mainline merge was attempted in the summer of 2007, and the main developer said that he would try again soon. The third filesystem, LogFS, was also covered by a talk, from Jörn Engel, during ELC. It is a flash filesystem that solves the scalability problems of JFFS2. It reduces memory consumption and mount time compared to JFFS2, but still has some outstanding problems to be solved. Tim mentioned that CELF gives financial support to the work on this filesystem. The last filesystem, UBIFS, is built on top of UBI, a new flash layer merged into the kernel. The inclusion of UBIFS into mainline has recently been requested by Nokia. See this whitepaper and this LWN article for more information about UBIFS.

Tim Bird is also very interested in tracing solutions. On the LTTng front, the markers infrastructure was merged in 2.6.24, and the next thing to merge is the core of LTTng. The markers are an infrastructure for static instrumentation, so they do not compete with Kprobes, which allows dynamic instrumentation. The goal of kernel markers is to have a very low overhead when not in use, thanks to the use of immediate values. Tim Bird also mentioned SystemTap and the work done by Lineo engineers to adapt it to a cross-compiled environment (see our demonstrations report), Kernel Function Trace, which is now maintained by Nicholas McGuire, and printk-times architecture support.

The next topic was security, with a discussion of TOMOYO Linux and AppArmor, both of which are still out of the mainline version, and Smack, which is now part of 2.6.25. He also mentioned the work on embedded SELinux, and the talks given by Yuichi Nakamura and KaiGai Kohei during the conference. SELinux requires a filesystem with extended attributes support, and usually comes with enormous configurations. People were able to reduce the configuration size to only 700 kilobytes, which makes SELinux usable in some embedded contexts.

On the power management front, PowerTOP was mentioned, but also PM QoS, merged in 2.6.24, and the work of Wolfson Microelectronics on voltage and current regulators.

In the real-time area, the high-resolution timers were merged in 2.6.21, but some work is still needed on some architectures. Large pieces of linux-rt still remain to be merged: threaded interrupts, sleeping spinlocks and the latency tracer, amongst others. As said previously, work to mainline the latency tracer is currently being done.

Tim said that this list should of course be updated with the progress of the different projects. The fact that the page is on the Embedded Linux Wiki will probably help.

Then, some other topics of interest to CELF members were briefly presented : bootup time, system size, licensing, graphics (with an interest in GstOpenMAX), middleware (discussion about DLNA), mobile phone stack wars (with Android, LIMO and the new ARM Ultra-Mobile PC initiative).

At the end of the talk, Tim Bird gave a quick presentation of the CE Linux Forum, which is focused on the advancement of Linux as an open source platform for consumer electronics devices. It was founded in June 2003 and now has about 50 member companies, including Panasonic, Sony, Hitachi, Toshiba, Sharp, Philips, Samsung, NEC, IBM, etc. Interestingly, more than half of the CELF members are in Asia, around a third in the US and around ten percent in Europe. Consumer electronics, semiconductor and software players are almost equally represented in the CELF. The CELF does technical work through workgroups, contract work, conferences, technical output and special projects. There are many workgroups in CELF, about audio, video and graphics, boot technologies, digital television, memory management, power management, real time, security, system size, etc. Tim had some slides about them, but skipped them to keep the presentation shorter.

Tim highlighted the contract work done by Matt Mackall, Matt Locke, Bill Traynor, Michael Opdenacker, Nicholas McGuire and Jörn Engel. CELF is ready to fund other projects of interest for Linux on embedded devices. CELF also organizes or is present at several conferences: the Embedded Linux Conference, the Ottawa Linux Symposium, regional Jamborees, ELC-Europe and Japanese trade shows. The next ELC-Europe conference will take place on November 6th and 7th in Ede, The Netherlands.

Before closing, Tim again said that elinux.org is open to contributions, and that it should become a central place of information on the use of Linux in embedded devices. Finally, Tim said that « Linux is destined to dominate the embedded market, so let’s have fun doing it! ».

After the talk, a game was organized, at the end of which one person could win a Nokia Internet tablet. The first stage of the game consisted in solving Tango puzzles. After that stage, three people were selected for the second stage: Jörn Engel, Matt Mackall and Liam Girdwood. During the second stage, they played a wheel-of-fortune game, in which they had to guess phrases such as Linus Torvalds or Tux The Penguin. In the end, Matt Mackall won that second stage and went back home with a shiny new Nokia Internet tablet. Cool!

Other talks

Of course, many other interesting talks took place at the Embedded Linux Conference, on a wide variety of topics. Here is a list of the other talks, along with links to their corresponding slides when available on the conference website:

  • Building blocks for embedded power management, by Kevin Hilman. Kevin also gave this presentation at FOSDEM in February 2008 in Brussels, and Bootlin recorded the talk, so that a video of it is available (56 minutes, 183 megabytes). LWN also covered this talk.
  • How to analyze your Linux’s behavior with TOMOYO Linux, by Kentaro Takeda. Slides are available.
  • How GCC works, an embedded engineer’s perspective, by Gene Sally. See the GCC tips page on the Embedded Linux Wiki, and the LWN report for this talk.
  • Avoiding web application flaws in embedded devices, by Jake Edge. Slides are available.
  • Compressed swap solution for embedded Linux, by Alexander Belyakov. This talk was canceled at the last minute, but the slides and documents are available.
  • Development of Embedded SELinux, by Yuichi Nakamura. The slides are available, and LWN made a report of this talk.
  • AXFS: Architecture and results, by Jared Hulbert. Slides are available. Your editor had the opportunity to read a whitepaper (web page no longer available) about various XIP technologies, in which AXFS is evaluated and looks very interesting. More information is available on the Embedded Linux Wiki or on the AXFS official website on Sourceforge.
  • Recent security features and issues in embedded systems, by KaiGai Kohei. Slides are available.
  • Avoiding OOM on embedded Linux, by YoungJun Jang. Slides are available.
  • Real-time virtualization solutions for Linux, a comparison of strategies, by Nicholas McGuire.
  • Instant startup for application using reduced relocation time and rearranging functions, by Min-Chan Kim. Slides are available.
  • A symphony of flavors: using the device tree to describe embedded hardware, by Grant Likely. Slides are available.
  • GPE Phone Edition, an open source software stack for Linux mobile phones? by Nils Faerber.
  • Trouble shooting for blocking problem, by Seo Hee.
  • Compiling full desktop distributions for ARM: the handhelds rebuild project, by Andrew Christian. Slides are available.
  • Enhancements to USB gadget framework, by Conrad Roeber. Slides are available.
  • Development of mobile Linux open platform, by Jyunji Kondo. Slides are available.
  • Learning kernel hacking from clever people, by Hugh Blemings.
  • Maemo mobile Linux platform, current status and future directions, by Kate Alhola.
  • Linux system power management on OMAP3430, by Richard Woodruff. Slides are available.
  • Status of LogFS, by Jörn Engel. LWN made a short report of this talk.
  • Embedded Linux development with Eclipse, by JT Thomas. Slides are available.
  • OpenMoko, by Michael Shiloh.
  • Filesystem support on multi-level cell (MLC) flash in open source, by Kyungmin Park. Slides are available.
  • GStreamer on embedded, latest development and features, by Christian Shaller. Slides are available.
  • Cross-compiling tutorial, by Rob Landley.
  • GStreamer and OpenMAX IL: Plug-and-Play, by Felipe Contreras.
  • APCS, ARM Procedure Call Standard, Tutorial, by Seo Hee.
  • Episodes of LKST for embedded Linux systems, by Hiroshisa Iijima. Slides are available.
  • Using UIO on an embedded platform, by Katsuya Matsubara. Slides are available.

Conclusion

This was your editor’s first Embedded Linux Conference. The contents of the conference were very good: highly technical, and varied enough for all attendees to find valuable information matching their interests. The conference organization was also absolutely perfect: nice venue, free lunch every day, fun social event, etc. The demonstrations session was also very interesting, and one could wish for even more projects to be present next year.

Congratulations to the organization team, and thanks to Tim Bird and the CE Linux Forum for setting up such a great conference every year!

As a suggestion for next year, your editor would recommend setting up a proper video recording team, with a connection to the room audio system. This would allow recording all the talks in high quality, so that participants could watch the talks they missed, and it would also benefit people who couldn’t come to the conference for various reasons.

Conference videos and report

27 free videos from the ELC and FOSDEM 2008 conferences. Extensive technical report from ELC 2008.

After participating in the Embedded Linux Conference (ELC) in Mountain View and in FOSDEM in Brussels, we are pleased to release the videos that we managed to shoot.

These videos should be useful to anyone interested in the multiple topics covered by these very interesting conferences, either to people who couldn’t join them, or to single-core participants who couldn’t attend more than one presentation at once. These videos are also interesting opportunities to see and hear key community members like Andrew Morton, Keith Packard, Henry Kingman, Tim Bird and many others!

While we have been releasing free technical videos for a few years now, ELC is the first conference for which we are also offering an extensive report, written by Thomas Petazzoni, one of our kernel and embedded system developers. This report tries to sum up the most interesting things learned at this conference, at least from the presentations Thomas could attend. This way, you shouldn’t have to view all the videos to identify the most interesting talks.

In agreement with the speakers, these videos and the report are released under the terms of the Creative Commons Attribution-ShareAlike 3.0 license.

We hope that sharing this knowledge will attract new contributors and users, and will bring our community one step closer to world domination…

Embedded Linux Conference, Mountain View, Apr. 2008

Don’t miss our detailed report on the presentations below!

  • Keynote: The Relationship Between kernel.org Development and the Use of Linux for Embedded Applications, by Andrew Morton (Google):
    video, slides (55 minutes, 240 MB)
  • UME – Ubuntu Mobile and Embedded, by David Mandala (Canonical):
    video, slides (30 minutes, 145 MB)
  • Appropriate Community Practices: Social and Technical Advice, by Deepak Saxena (MontaVista):
    video (thanks to Kevin Hilman, MontaVista) (44 minutes, 139 MB)
  • Adventures In Real-Time Performance Tuning, by Frank Rowand:
    video, slides (50 minutes, 251 MB)
  • Shifting Sands: Lessons Learned from Linux on an FPGA, by Grant Likely:
    video, slides (44 minutes, 262 MB)
  • Disko – An Application Framework for Digital Media Devices, by Guido Madaus:
    video (27 minutes, 190 MB)
  • Keynote: Tux in Lights, by Henry Kingman (LinuxDevices.com):
    video, slides (44 minutes, 139 MB)
  • Back-tracing in MIPS-based Linux Systems, by Jong-Sung Kim (LG Electronics):
    video, slides (54 minutes, 160 MB)
  • Making a Phone Call With Phase Change Memory, by Justin Treon (Numonyx):
    video, slides (28 minutes, 159 MB)
  • Building Blocks for Embedded Power Management, by Kevin Hilman (MontaVista):
    We couldn’t film his presentation, but we already shot a similar presentation he gave at FOSDEM 2008: video (56 minutes, 183 MB)
  • Using Real-Time Linux, by Klaas van Gend (MontaVista):
    video, slides (53 minutes, 263 MB)
  • Every Microamp is Sacred – A Dynamic Voltage and Current Control Interface for the Linux Kernel, by Liam Girdwood (Wolfson Microelectronics):
    video, slides (35 minutes, 71 MB)
  • Power Management Quality of Service and How You Could Use it in Your Embedded Application, by Mark Gross (Intel):
    video, slides (57 minutes, 401 MB)
  • OpenEmbedded for product development, by Matt Locke (Embedded Alley):
    video, slides (49 minutes, 141 MB)
  • Kernel Size Report, and Bloatwatch Update, by Matt Mackall (Selenic Consulting):
    video (49 minutes, 146 MB)
  • Leveraging Free and Open Source Software in a Product Development Environment, by Matt Porter (Embedded Alley):
    video, slides (45 minutes, 220 MB)
  • Using a JTAG for Linux Driver Debugging, by Mike Anderson (PTR Group):
    video, slides (113 minutes, 694 MB)
  • DirectFB Internals – Things You Need to Know to Write Your DirectFB gfxdriver, by Takanari Hayama:
    video (43 minutes, 200 MB)
  • Linux Tiny – Penguin Weight Watchers, by Thomas Petazzoni (Bootlin):
    video (thanks to Jean Pihet, MontaVista), slides (32 minutes, 140 MB)
  • Keynote: Status of Embedded Linux and CELF Plenary Meeting, by Tim Bird (Sony):
    video, slides (49 minutes, 112 MB)

Slides are collected on http://www.celinux.org/elc08_presentations/.

FOSDEM, Brussels, Feb. 2008

  • Modest, email client for embedded systems, by Dirk-Jan Binnema (Nokia):
    video (34 minutes, 121 MB)
  • Design a Linux robot companion with 8 bits microcontrollers, by David Bourgeois:
    video (54 minutes, 211 MB)
  • Linux on the PS3, by Olivier Grisel:
    video (47 minutes, 272 MB)
  • Xen for Secure Isolation on ARM11, by Jean Pihet (MontaVista):
    video (41 minutes, 207 MB)
  • Building blocks for Embedded Power Management, by Kevin Hilman (MontaVista):
    video (56 minutes, 183 MB)
  • Emdebian Update: Rootfs, GPE and tdebs, by Neil Williams:
    video (47 minutes, 226 MB)
  • pjsip: lightweight portable SIP stack, by Perry Ismangil:
    video (55 minutes, 194 MB)

Additional video

  • Roadmap to recovery – pain and redemption in X driver development, by Keith Packard:
    video (44 minutes, 168 MB)