Reducing start-up time is one of the most discussed topics nowadays, for both embedded and desktop systems. Typically, the boot process consists of three steps:
- First-stage bootloader
- Second-stage bootloader
- Linux kernel
The first-stage bootloader is often a tiny piece of code whose sole purpose is to bring the hardware into a state where it is able to execute more elaborate programs. On our testing board (CALAO TNY-A9260), it’s a piece of code the CPU loads into internal SRAM, and its size is limited to 4 KiB, which is a very small amount of space indeed. The second-stage bootloader often provides more advanced features, like downloading the kernel from the network, looking at the contents of the memory, and so on. On our board, this second-stage bootloader is the famous U-Boot.
One way of achieving a faster boot is to simply bypass the second-stage bootloader, and directly boot Linux from the first-stage bootloader. The first-stage bootloader here is AT91bootstrap, an open-source bootloader developed by Atmel for their AT91 ARM-based SoCs. While this approach is somewhat static, it’s suitable for production use when the needs are simple (like simply loading a kernel from NAND flash and booting it), and it effectively reduces the boot time by not loading U-Boot at all. On our testing board, that saves about 2s.
As we have the source, it’s rather easy to modify AT91bootstrap to suit our needs. To make things easier, we’ll boot using an existing U-Boot uImage. The only requirement is that it should be an uncompressed uImage, like the one automatically generated by make uImage when building the kernel (there’s not much point in using compressed uImage files on ARM anyway, as it is possible to build self-extracting compressed kernels on this platform).
Looking at the (shortened) main.c, the code that actually boots the kernel looks like this:
int main(void)
{
    /* ================== 1st step: Hardware Initialization ================= */
    /* Performs the hardware initialization */
    hw_init();
    /* Load from Nandflash in RAM */
    load_nandflash(IMG_ADDRESS, IMG_SIZE, JUMP_ADDR);
    /* Jump to the Image Address */
    return JUMP_ADDR;
}
In the original source code, load_nandflash actually loads the second-stage bootloader, and execution then jumps directly to JUMP_ADDR. This value can be found in U-Boot as TEXT_BASE, in the board-specific file config.mk; it is the base address from which the program will be executed. Now, if we want to load the kernel directly instead of a second-stage bootloader, we need to know a handful of values:
- the kernel image address (we will reuse IMG_ADDRESS here, but one could imagine reading the actual image address from a fixed location in NAND)
- the kernel size
- the kernel load address
- the kernel entry point
The last three values can be extracted from the uImage header. We will not hard-code the kernel size as was previously the case (using IMG_SIZE), as this would impose a maximum size on the image and would force us to copy more data than necessary. All these values are stored as 32-bit big-endian integers in the header. Looking at the struct image_header declaration from image.h in the uboot-mkimage sources, we can see that the header structure is like this:
typedef struct image_header {
    uint32_t ih_magic;          /* Image Header Magic Number */
    uint32_t ih_hcrc;           /* Image Header CRC Checksum */
    uint32_t ih_time;           /* Image Creation Timestamp  */
    uint32_t ih_size;           /* Image Data Size           */
    uint32_t ih_load;           /* Data Load Address         */
    uint32_t ih_ep;             /* Entry Point Address       */
    uint32_t ih_dcrc;           /* Image Data CRC Checksum   */
    uint8_t  ih_os;             /* Operating System          */
    uint8_t  ih_arch;           /* CPU architecture          */
    uint8_t  ih_type;           /* Image Type                */
    uint8_t  ih_comp;           /* Compression Type          */
    uint8_t  ih_name[IH_NMLEN]; /* Image Name                */
} image_header_t;
It’s quite easy to determine where the values we’re looking for actually are in the uImage header.
- ih_size is the fourth member, hence we can find it at offset 12
- ih_load and ih_ep are right after ih_size, and therefore can be found at offsets 16 and 20.
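The offset arithmetic above can be checked on a development machine with a minimal sketch like the following. It is not part of AT91bootstrap: the names read_be32, parse_uimage_header and struct uimage_info are made up for illustration. The magic value 0x27051956 is IH_MAGIC from image.h, and checking it is an optional extra the bootloader code in this article does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UIMAGE_MAGIC 0x27051956u /* IH_MAGIC in image.h */

/* Hypothetical container for the three values we need. */
struct uimage_info {
    uint32_t size;  /* ih_size, offset 12 */
    uint32_t load;  /* ih_load, offset 16 */
    uint32_t entry; /* ih_ep,   offset 20 */
};

/* Read a 32-bit big-endian value from a byte buffer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the magic number does not match. */
static int parse_uimage_header(const unsigned char *hdr,
                               struct uimage_info *out)
{
    assert(hdr != NULL && out != NULL);
    if (read_be32(hdr) != UIMAGE_MAGIC)
        return -1;
    out->size = read_be32(hdr + 12);
    out->load = read_be32(hdr + 16);
    out->entry = read_be32(hdr + 20);
    return 0;
}
```

Feeding this function the first 24 bytes (or more) of a uImage file should yield the same size, load address and entry point that mkimage -l reports for that file.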
A first call to load_nandflash is necessary to get those values. As the data we need is contained within the first 32 bytes, that’s all we need to load at first. However, some space is required in memory to actually store the data. The first-stage bootloader is running in internal SRAM, so we can pick any location we want in SDRAM. For the sake of simplicity, we’ll choose PHYS_SDRAM_BASE here, which we define as the base address of the on-board SDRAM in the CPU address space. Then, a second call will be necessary to load the entire kernel image at the right load address.
Then all we need to do is:
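/* Read a 32-bit big-endian value from the byte pointer a.
   The casts avoid shifting a promoted int into its sign bit. */
#define be32_to_cpu(a) (((unsigned long)(a)[0] << 24) | \
                        ((unsigned long)(a)[1] << 16) | \
                        ((unsigned long)(a)[2] << 8) | \
                        (unsigned long)(a)[3])
#define PHYS_SDRAM_BASE 0x20000000
int main(void)
{
    unsigned char *tmp;
    unsigned long jump_addr;
    unsigned long load_addr;
    unsigned long size;
    hw_init();
    /* Load the first 32 bytes of the uImage header into SDRAM */
    load_nandflash(IMG_ADDRESS, 0x20, PHYS_SDRAM_BASE);
    /* Setup tmp so that we can read the kernel size */
    tmp = (unsigned char *)PHYS_SDRAM_BASE + 12;
    size = be32_to_cpu(tmp);
    /* Now, load address */
    tmp += 4;
    load_addr = be32_to_cpu(tmp);
    /* And finally, entry point */
    tmp += 4;
    jump_addr = be32_to_cpu(tmp);
    /* Load the actual kernel. The 0x40-byte header is copied just below
       load_addr, so the kernel data itself starts at load_addr. ih_size
       only counts the data, hence the extra 0x40 bytes for the header. */
    load_nandflash(IMG_ADDRESS, size + 0x40, load_addr - 0x40);
    return jump_addr;
}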
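The 0x40 used in the second load_nandflash call is simply the size of the uImage header. As a small host-side sketch of that arithmetic (copy_destination is a hypothetical helper name, and IH_NMLEN is taken to be 32 as in image.h):

```c
#include <assert.h>
#include <stdint.h>

#define IH_NMLEN 32 /* image name length, as in image.h */

typedef struct image_header {
    uint32_t ih_magic;
    uint32_t ih_hcrc;
    uint32_t ih_time;
    uint32_t ih_size;
    uint32_t ih_load;
    uint32_t ih_ep;
    uint32_t ih_dcrc;
    uint8_t  ih_os;
    uint8_t  ih_arch;
    uint8_t  ih_type;
    uint8_t  ih_comp;
    uint8_t  ih_name[IH_NMLEN];
} image_header_t;

/* Seven 32-bit fields + four bytes + 32-byte name = 64 bytes (0x40). */
_Static_assert(sizeof(image_header_t) == 0x40,
               "uImage header must be 64 bytes");

/* Hypothetical helper: where the copy of the whole uImage must start
   so that the data following the header lands exactly at load_addr. */
static unsigned long copy_destination(unsigned long load_addr)
{
    assert(load_addr >= sizeof(image_header_t));
    return load_addr - sizeof(image_header_t);
}
```

For a load address of 0x20008000, for instance, the copy starts at 0x20007FC0, the header occupies the 64 bytes below the load address, and the first byte of kernel data ends up at 0x20008000.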
Note that the second call to load_nandflash could in theory be replaced by:
load_nandflash(IMG_ADDRESS + 0x40, size + 0x40, load_addr);
However, this will not work. load_nandflash starts reading at an address aligned on a page boundary, so even when passing IMG_ADDRESS+0x40 as the first argument, reading will start at IMG_ADDRESS, leading to a failure (writes have to be aligned on a page boundary, so it is safe to assume that IMG_ADDRESS is actually correctly aligned).
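To illustrate the alignment effect described above, here is a minimal sketch of rounding an address down to a page boundary. It only mimics the described behaviour and is not taken from load_nandflash itself; page_align_down is a hypothetical helper, and the 2048-byte page size is an assumption that depends on the NAND chip.

```c
#include <assert.h>

/* Assumed NAND page size; the real value depends on the flash chip. */
#define NAND_PAGE_SIZE 2048ul

/* Hypothetical helper: round an address down to the start of its page. */
static unsigned long page_align_down(unsigned long addr,
                                     unsigned long page_size)
{
    /* The mask trick only works for power-of-two page sizes. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}
```

With a page-aligned image address such as 0x20000 (an example value), page_align_down(0x20000 + 0x40, NAND_PAGE_SIZE) gives 0x20000 again, so the read would still begin with the header rather than with the kernel data.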
The above piece of code will silently fail if anything goes wrong, and does no checking at all – indeed, the binary size is very limited and we can’t afford to put more code than what is strictly necessary to boot the kernel.

This has been a very useful tip for me!
It saved me about 8s on boot, making my users much happier =)
The only problem I had was that the Ethernet connection did not work (Linux displayed it as “up”, but not “running”). Some initialization done by U-Boot is required before the Linux kernel can use it (in the function at91sam9260ek_macb_hw_init()).
I have isolated and replicated the required initialization for AT91Bootstrap in the code below.
Please note that reading the MAC address from flash and setting its registers (SA1L and SA1H) is not required: Linux will create a new random MAC address if you don’t.
#define AT91_RSTC_KEY 0xA5000000
#define AT91_RSTC_MR_ERSTL_MASK 0x0000FF00
#define AT91_RSTC_MR_ERSTL(x) (((x) & 0xf) << 8)
#define AT91_RSTC_CR_EXTRST 0x00000008
#define AT91_RSTC_SR_NRSTL 0x00010000
static void mac_init(void)
{
    /* Read the MAC address from NAND (JUMP_ADDR is used as scratch space) */
    unsigned long *MAC = (unsigned long *)JUMP_ADDR;
    unsigned long erstl;
    load_nandflash(CONFMAC_ADDRESS, CONFMAC_SIZE, (unsigned long)MAC);
    /* Enable the EMAC clock and store the MAC address in its registers */
    writel(1 << AT91C_ID_EMAC, AT91C_PMC_PCER);
    writel(MAC[0], AT91C_EMACB_SA1L);
    writel(MAC[1], AT91C_EMACB_SA1H);
    /* Reset the MACB controller. Copied from the U-Boot function
       at91sam9260ek_macb_hw_init() */
    writel(AT91C_PIO_PA14 |
           AT91C_PIO_PA15 |
           AT91C_PIO_PA17 |
           AT91C_PIO_PA25 |
           AT91C_PIO_PA26 |
           AT91C_PIO_PA28,
           AT91C_PIOA_PPUDR);
    /* Save the current external reset length so it can be restored */
    erstl = readl(AT91C_RSTC_RMR) & AT91_RSTC_MR_ERSTL_MASK;
    writel(AT91_RSTC_KEY | AT91_RSTC_MR_ERSTL(13) | AT91C_RSTC_URSTEN, AT91C_RSTC_RMR);
    writel(AT91_RSTC_KEY | AT91_RSTC_CR_EXTRST, AT91C_RSTC_RCR);
    /* Wait until the NRST line is high again */
    while (!(readl(AT91C_RSTC_RSR) & AT91_RSTC_SR_NRSTL))
        ;
    writel(AT91_RSTC_KEY | erstl | AT91C_RSTC_URSTEN, AT91C_RSTC_RMR);
}
I hope this tip will be useful for someone else.
Paulo
Hi Paulo,
Many thanks for sharing this tip!!!
Michael.
Hi,
I am trying to implement this for the at91sam9g45. I can see the kernel being decompressed correctly by the self-decompressing image, but when control should then switch to the Linux kernel, it halts: the kernel does not come up and I cannot see any of the kernel boot messages.
Any ideas?
Regards,
-Jalal
Hi,
I have implemented this fastboot code in at91bootstrap-1.16, and it successfully boots linux-2.6.25.
Now I have upgraded my kernel to linux-3.3.7.
Since then I have a problem: Linux is not booting.
The only output I get is “Start AT91Bootstrap..”
I have checked all the parameters (IMG_ADDRESS, load_addr, jump_addr, etc.) and they are the same. But there is a difference in IMG_SIZE:
size of previous image (linux-2.6.25) = 1.1MB
size of new image (linux-3.3.7) = 1.5MB
Any ideas?
Please reply.
Thanks in Advance
-Vikram
Hello,
I am trying to load Linux directly from AT91Bootstrap on a SAM9G25 board. However, I am getting an INVALID MAGIC NUMBER error.
Any suggestions?
Thanks,
Aman
We wrote an update for newer Atmel AT91 platforms. AT91Bootstrap now officially supports booting Linux directly, without having to hack its code. See our new blog post.