clink

Compacts directories by replacing duplicate files by symbolic links

clink is a simple Python script that replaces duplicate files in Unix filesystems by symbolic links.

  • clink saves space. It works particularly well with automatically generated directory structures, such as compiling toolchains.
  • clink uses relative links, making it possible to move processed directory structures
  • clink is fast. It reads each file only once and its runtime is mainly the time taken to read files.
  • clink is light. It consumes very little RAM. No problem to run it on huge filesystems!
  • clink is easy to use. Just download the script and run it!
  • clink is free. It is released under the terms of the GNU General Public License.
clink logo

Usage

usage: clink [options] [files or directories]

Compacts folders by replacing identical files by symbolic links

options:
  --version      show program's version number and exit
  -h, --help     show this help message and exit
  -d, --dry-run  just reports identical files, doesn't make any change.

Screenshot

clink screenshot

Downloads

Here is the OpenPGP key used to generate the signatures.

How it works

clink reads all the files one by one, and computes their SHA (20 bytes) and MD5 (16 bytes) checksums. The trick to easily find identical files is a dictionary of files lists indexed by their SHA checksum.

All the files with the same SHA checksum are not immediately considered as identical. Their MD5 checksums and sizes are also compared then. There is an extremely low probability that files meeting all these 3 criteria at once are different. You are much more likely to face file corruption because of a hardware failure on your computer!

Hard links to the same contents are treated as regular files. Keeping one instance and replacing the others by symbolic links is harmless. Files implemented by symbolic links also have the advantage of not having their contents duplicated in tar archives.

Limitations and possible improvements

  • File permissions: clink just keeps one copy of duplicate files. The permissions of this file may be less strict than those of other duplicates. If permissions matter, enforce them by yourself after running clink.
  • Directory structure: even when entire directories are identical, clink just creates links between files. This is not fully optimal in this case, but it keeps clink simple.

Similar tools or alternatives

  • dupmerge2: replaces identical files by hardlinks.
  • finddup: finds identical files.

Demo: tiny qemu arm system with a DirectFB interface

A tiny embedded Linux system running on the qemu arm emulator, with a DirectFB interface, everything in 2.1 MB (including the kernel)!

Overview

This demo embedded Linux system has the following features:

  • Very easy to run demo, just 1 file to download and 1 command line to type!
  • Runs on qemu (easy to get for most GNU/Linux distributions), emulating an ARM Versatile PB board.
  • Available through a single file (kernel and root filesystem), sizing only 2.1 MB!
  • DirectFB graphical user interface.
  • Demonstrates the capabilities of qemu, the Linux kernel, BusyBox, DirectFB, and
    shows the benefits of system size and boot time reduction techniques as advertised and supported by the CE Linux Forum.
  • License: GNU GPL for root filesystem scripts. Each software component has its own license.

How to run the demo

  • Make sure the qemu emulator is installed on your GNU/Linux distribution. The demo works with qemu 0.8.2 and beyond, but it may also work with earlier versions.
  • Download the vmlinuz-qemu-arm-2.6.20
    binary.
  • Run the below command:
    qemu-system-arm -M versatilepb -m 16 -kernel vmlinuz-qemu-arm-2.6.20 -append "clocksource=pit quiet rw"
  • When you reach the console prompt, you can try regular Unix commands but also the graphical demo:
    run_demo
    

FAQ / Troubleshooting

  • Q: I get Could not initialize SDL - exiting when I try to run qemu.

    That’s a qemu issue (qemu used the SDL library). Check that you can start graphical applications from your terminal (try xeyes or xterm for example). You may also need to check that you have name servers listed in /etc/resolv.conf. Anyway, you will find solutions for this issue on the Internet.

Screenshots

console screenshot df_andi program screenshot
df_dok program screenshot df_dok2 program screenshot
df_neo program screenshot df_input program screenshot

How to rebuild this demo

All the files needed to rebuild this demo are available here:

  • You can rebuild or upgrade the (Vanilla) kernel by using the given kernel configuration file.
  • The configuration file expects to find an initramfs source directory in ../rootfs, which
    you can create by extracting the contents of the rootfs.tar.7z archive.
  • Of course, you can make changes to this root filesystem!

Tools and optimization techniques used in this demo

Software and development tools

  • The demo was built using Scratchbox, a fantastic development tool that makes cross-compiling transparent!
  • The demo includes BusyBox 1.4.1, an toolbox implementing most UNIX commands in a few hundreds of KB. In our case, BusyBox includes the most common commands (like a vi implementation), and only sizes 192 KB!
  • The root filesystem is shipped within the Linux kernel image, using the initramfs technique, which makes the kernel simpler and saves a dramatic amount of RAM (compared to an init ramdisk).
  • The demo is interfaced by DirectFB example programs (version 0.9.25, with DirectFB 1.0.0-rc4), which demonstrate the amazing capabilities of this library, created to meet the needs of embedded systems.

Size optimization techniques

The below optimization techniques were used to reduce the filesystem size from 74 MB to 3.3 MB (before compression in the Linux kernel image):

  • Removing development files: C headers and manual pages copied when installing tools and libraries, .a library files, gdbserver, strace, /usr/lib/libfakeroot, /usr/local/lib/pkgconfig
  • Files not used by the demo programs: libstdc++, and any library or resource file.
  • Stripping and even super stripping (see sstrip) executables and libraries.
  • Reducing the kernel size using CONFIG_EMBEDDED switches, mainly from the
    Linux Tiny project.

Techniques to reduce boot time

We used the below techniques to reduce boot time:

  • Disabled console output (quiet boot option, printk support was disabled anyway), which saves time scrolling the framebuffer console.
  • Use the Preset Loops per Jiffy technique to disable delay loop calculation, by feeding the kernel with a value measured in an earlier boot (lpj setting, which you may update according to the speed of your own workstation).

All these optimization techniques and other ones we haven’t tried yet are described either on the elinux.org Wiki or in our embedded Linux optimizations presentation.

Future work

We plan to implement a generic tool which would apply some of these techniques in an automatic way, to shrink an existing GNU/Linux or embedded Linux root filesystem without any loss in functionality. More in the next weeks or months!

Linux USB drivers

Learning how to write USB device drivers for Linux

Bootlin is proud to release a new set of training slides from its embedded Linux training materials. These new ones cover writing USB device drivers for Linux.

Like everything we create, these new materials are released to the user and developer community under a free license. They can be freely downloaded, copied, distributed or even modified according to the terms of the Creative Commons Attribution-ShareAlike 2.5 license.

Embedded Linux and Ecology

Embedded Linux contributions to the Linux Ecology HOWTO.

Bootlin has contributed major updates to the Linux Ecology HOWTO, a Linux Documentation Project document that gathers ideas and techniques for using Linux in an environmentally friendly way.

In particular, Bootlin took advantage of its experience with embedded Linux system development to add new techniques which can reduce power consumption or make it possible to extend the lifetime of old systems with limited resources.

Bootlin also contributed an overview presentation on this HOWTO. The latest HOWTO version with our updates (waiting for the next official release) can also be found on the same page.