As kernel developers, we often find ourselves writing device drivers, pieces of code that are typically registered using module_init() in the Linux kernel. But have you ever paused to wonder just how late in the boot process this happens? What exactly takes place between the moment we see the famous "Starting kernel..." message and the point where drivers are finally registered and devices probed?
If you’re curious about the intricate steps that occur before the system even reaches a working init process, you’re in the right place. Join us as we explore the fascinating journey of the Linux kernel boot sequence, step by step.
Throughout this article, you’ll find clickable links to our Elixir source code browser. We encourage you to dive in and follow along!
Where it all starts
The very first steps are entirely architecture-specific and written in assembly. If we take the example of an ARM 32-bit processor, the kernel is compressed but is prepended with a small uncompressed part that is responsible for uncompressing the rest of the binary at the correct location in RAM. The whole file is named zImage, and the entry point comes from a file named arch/arm/boot/compressed/head.S. If you want more details about this decompression step, you can read the excellent How the ARM32 Linux kernel decompresses article from Linus Walleij.
Once the kernel code is in place (here we are talking about the .text area), we jump to it and start executing the “real” part of the kernel; this happens in another assembly file, arch/arm/kernel/head.S. At this stage, the MMU is still disabled, and only a few very basic checks are performed, mostly regarding the MMU and the state of the caches. The very first page tables are then created, just enough to get the kernel running, which means mapping the kernel code and enabling the MMU. A bit more MMU initialization is needed in order to run real code. This is done in another file that is included at the end of the head.S file. Here as well, if you’re interested in further details about the ARM32-specific initialization, you can read the How the ARM32 kernel starts article, also from Linus Walleij.
The magic point is right there, just a few lines later: we branch to start_kernel(), which is the first C function. It is also the first piece of generic code, shared by all architectures.
The feeling when going through this immense function is that there is a lot to initialize, and in the correct order. All the functions linked below are called, directly or indirectly, from here, and this article guides the reader through the main steps. Reading the code is probably the best way to understand it fully, but let’s start by extracting the most important pieces.
Among the important function calls, there is of course the printing of the Linux banner, quickly followed by setup_arch(), probably one of the most important functions here, which brings us back into architecture-specific code.
Back to architecture-specific code
In ARM’s setup_arch() implementation, we perform the identification of the machine (through the DTB or tags) and the retrieval of the cmdline.
To access basic hardware and perform early memory allocations, it’s also necessary to create some early fixed mappings. These back early I/O mappings through a fixed-size table of slots that can be claimed on demand. At this stage, the usual memory allocators and mapping functions (such as kmalloc() or ioremap()) are unavailable. Instead, the kernel offers early alternatives such as memblock_alloc() and early_ioremap(), which are more limited.
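The slot idea can be modeled in plain userspace C. This is only a minimal sketch under stated assumptions, not the kernel's implementation: `slot_map()`, `slot_unmap()`, `NR_SLOTS`, `SLOT_SIZE` and `SLOT_BASE` are hypothetical names and values chosen for illustration, and no real mapping is created.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS  4            /* hypothetical: small, fixed number of slots */
#define SLOT_SIZE 0x1000u      /* hypothetical: one 4 KiB page per slot */
#define SLOT_BASE 0xffc00000u  /* hypothetical virtual base of the slot area */

struct slot {
    uint32_t phys;  /* physical address recorded for this slot */
    int in_use;     /* non-zero while the slot is claimed */
};

static struct slot slots[NR_SLOTS];

/* Claim the first free slot for phys; return its virtual address, or 0 if full. */
uint32_t slot_map(uint32_t phys)
{
    for (size_t i = 0; i < NR_SLOTS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].phys = phys;
            return SLOT_BASE + (uint32_t)i * SLOT_SIZE;
        }
    }
    return 0; /* every slot is taken: this is the "more limited" part */
}

/* Release the slot backing virt; return 0 on success, -1 on a bad address. */
int slot_unmap(uint32_t virt)
{
    if (virt < SLOT_BASE)
        return -1;

    uint32_t off = virt - SLOT_BASE;
    size_t idx = off / SLOT_SIZE;

    if (off % SLOT_SIZE != 0 || idx >= NR_SLOTS || !slots[idx].in_use)
        return -1;

    slots[idx].in_use = 0;
    return 0;
}
```

Because the table has a fixed size, a caller that forgets to release its slot eventually makes every later request fail, which is why such early mappings are meant to be short-lived.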
The cmdline then gets parsed.
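To give an idea of what this parsing involves, here is a simplified userspace sketch that splits a command line into `key=value` pairs. It is an illustration only: `parse_cmdline()` and `struct param` are hypothetical names, and unlike the kernel's own parser this version does not handle quoted values.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 16

struct param {
    char *key;
    char *val; /* NULL when the parameter has no "=value" part */
};

/*
 * Split a writable command line in place into key/value pairs.
 * Returns the number of parameters stored in out (at most max).
 */
int parse_cmdline(char *cmdline, struct param *out, int max)
{
    int n = 0;
    char *p = cmdline;

    while (n < max) {
        p += strspn(p, " ");              /* skip separating spaces */
        if (*p == '\0')
            break;

        char *end = p + strcspn(p, " ");  /* end of the current token */
        char *next = (*end != '\0') ? end + 1 : end;
        *end = '\0';

        char *eq = strchr(p, '=');
        out[n].key = p;
        if (eq) {
            *eq = '\0';                   /* cut the token at the '=' */
            out[n].val = eq + 1;
        } else {
            out[n].val = NULL;            /* plain flag such as "quiet" */
        }
        n++;
        p = next;
    }
    return n;
}
```

Given a buffer containing `console=ttyS0,115200 root=/dev/mmcblk0p2 rw`, the function stores three entries, the last one being a flag without a value.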
Next comes a more advanced MMU configuration, with the introduction of page table management, the creation of a few kernel mappings and the reservation of certain memory areas which will be used for future kernel mappings as well as the Contiguous Memory Allocator (CMA). Finally, the MMU is configured in the state that we all know when running kernel or even userspace code, i.e. with a kernel mapping at the top and a user addressable space at the bottom. This is also when the early memtest runs, if required. If you’re interested in even more details on how the memory is set up on ARM 32-bit, again we recommend articles from Linus Walleij: ARM32 Page Tables, Setting Up the ARM32 Architecture, part 1, Setting Up the ARM32 Architecture, part 2.
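The "kernel at the top, userspace at the bottom" layout boils down to simple address arithmetic for the linearly mapped part of RAM. The sketch below assumes a 3G/1G split (`PAGE_OFFSET` at 0xC0000000) and a hypothetical RAM start at 0x80000000; both values depend on the kernel configuration and the platform, and the helper names are made up for this example.

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u /* assumed 3G/1G split: kernel linear map starts here */
#define PHYS_OFFSET 0x80000000u /* hypothetical physical address where RAM starts */

/* Non-zero if virt lies in the top part of the address space, at or above the split. */
int above_split(uint32_t virt)
{
    return virt >= PAGE_OFFSET;
}

/* Translate a physical RAM address to its address in the kernel linear map. */
uint32_t lowmem_phys_to_virt(uint32_t phys)
{
    return phys - PHYS_OFFSET + PAGE_OFFSET;
}

/* Inverse translation, valid only for addresses inside the linear map. */
uint32_t lowmem_virt_to_phys(uint32_t virt)
{
    return virt - PAGE_OFFSET + PHYS_OFFSET;
}
```

With these assumed values, physical address 0x80008000 appears at virtual address 0xC0008000 once the final mappings are in place.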
In the ARM case, the device tree then gets unflattened, the PSCI interface is started and platform-specific SMP initialization is performed. To end the architecture-specific section, a machine-specific callback runs, if one is provided.
Generic init steps
Back in our init/main.c file, numerous steps remain to be done: dumping the cmdline, initializing an early random generator pool, enabling the kernel memory allocators (the page allocator, the SLAB allocator, kmalloc() and vmalloc()) together with the main memory sanitizers and tracers (ftrace, kfence, kmemleak, kasan), and starting ftrace and trace_printk(). Then come the scheduler and various components of a modern kernel, such as radix trees, maple trees, workqueues and RCU, as well as, later on, more sanitizers and tracers, such as lockdep and perf.
So far we have been running without IRQs, which obviously need to be taken care of at some point. Setting them up also allows the configuration of softirqs and hrtimers, which in turn enable timekeeping.
Only now is the system console initialized!
Among the remaining steps, a few are probably worth pointing out: the calibration of the delay loop, which is used to estimate the number of CPU cycles to waste during udelay() calls; a final call for “late” architecture-specific CPU initializations and checks; the preparation of the first fork() system call with, e.g., the creation of a SLAB cache for storing thread descriptors (struct task_struct); the VFS (Virtual File System) initialization; and the creation of procfs.
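The result of the delay loop calibration is a number of busy-loop iterations per jiffy. The arithmetic that turns it into a udelay() loop count can be sketched as follows. This is a simplified model: the tick rate `HZ` is assumed to be 100, the function names are hypothetical, and real architectures use fixed-point math or a hardware timer rather than this 64-bit division.

```c
#include <stdint.h>

#define HZ 100u /* assumed tick rate, i.e. jiffies per second */

/* Number of busy-loop iterations needed to wait for usecs microseconds. */
uint64_t udelay_loops(uint64_t loops_per_jiffy, uint64_t usecs)
{
    /* loops per second = loops_per_jiffy * HZ, then scale to microseconds */
    return (loops_per_jiffy * HZ * usecs) / 1000000u;
}

/* Integer part of the BogoMIPS value printed in the boot log. */
uint64_t bogomips_whole(uint64_t loops_per_jiffy)
{
    return loops_per_jiffy / (500000u / HZ);
}
```

For example, with a calibrated value of 4980736 loops per jiffy, a 10 microsecond delay costs 4980 iterations and the boot log would report 996 BogoMIPS (integer part).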
If the kernel needs to pause the boot procedure to wait for a debugger to connect to kgdb, it does so just before the VFS init.
This long init sequence ends with the following comment:
/* Do the rest non-__init'ed, we're now alive */
This means it is time to initialize “the rest”, in another kernel thread spawned for that purpose to avoid issues when starting the init process. The CPU is set up, memory management is operational and the scheduler is ready: we can now deal with devices.
Enabling hardware
It is finally time to enable the driver model, by instantiating devtmpfs, creating as many ksets and kobjects as needed [1] [2] [3], populating the most basic sysfs entries and registering the platform bus.
While we are populating virtual filesystems, procfs also gets populated with some already available values, such as the list of registered interrupts.
One of the last missing pieces is now about to be tackled: the registration of all the drivers. Their registration order is primarily based on the initcall level at which they have been registered. Initcalls are nothing more than an array of 8 levels ordering “the rest” of the kernel initialization. Inside a given initcall level, the order of execution depends on the order of registration, which itself depends directly on the order in which the object files have been linked together (this is dictated by the various Makefiles). So, back to our initcalls: they are all executed in order. This is when the function we registered with module_init() executes!
For instance, one of the callbacks registered at the arch_initcall level (number 3 in a 0-7 range) is the one that populates the devices based on the content of the device tree!
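The ordering rules of initcalls can be modeled with a small userspace program. This is only a sketch: the real kernel collects initcalls in dedicated linker sections rather than in a C array, and every function name below is hypothetical. The order of the `table` entries stands in for the link order, and each callback records one letter so the resulting sequence can be inspected.

```c
#include <stddef.h>

enum initcall_level {
    LEVEL_PURE, LEVEL_CORE, LEVEL_POSTCORE, LEVEL_ARCH,
    LEVEL_SUBSYS, LEVEL_FS, LEVEL_DEVICE, LEVEL_LATE,
    NR_LEVELS /* 8 levels, numbered 0 to 7 */
};

typedef int (*initcall_t)(void);

struct initcall_entry {
    enum initcall_level level;
    initcall_t fn;
};

static char trace[16];   /* records the order in which callbacks ran */
static size_t trace_len;

static void mark(char c)
{
    if (trace_len < sizeof(trace) - 1)
        trace[trace_len++] = c;
}

/* Hypothetical callbacks, each leaving one letter in the trace. */
static int populate_devices(void) { mark('A'); return 0; }
static int bus_init(void)         { mark('S'); return 0; }
static int driver_a_init(void)    { mark('1'); return 0; }
static int driver_b_init(void)    { mark('2'); return 0; }
static int late_cleanup(void)     { mark('L'); return 0; }

/* The order of the entries stands in for the link order. */
static const struct initcall_entry table[] = {
    { LEVEL_DEVICE, driver_a_init },
    { LEVEL_LATE,   late_cleanup },
    { LEVEL_ARCH,   populate_devices },
    { LEVEL_DEVICE, driver_b_init },
    { LEVEL_SUBSYS, bus_init },
};

/* Run every callback level by level, keeping table order inside a level. */
const char *run_initcalls(void)
{
    trace_len = 0;
    for (int level = 0; level < NR_LEVELS; level++)
        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
            if (table[i].level == (enum initcall_level)level)
                table[i].fn();
    trace[trace_len] = '\0';
    return trace;
}
```

Calling `run_initcalls()` returns the string "AS12L": the level decides first, then the position in the table. In the kernel, module_init() in a built-in driver maps to the device level (number 6), which is why drivers run after the arch-level callback that populates devices.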
Starting the userspace
Once all bus drivers, host controller drivers and device drivers have been registered, kunit tests are run, right before handing over to the init process, as soon as we find a suitable one.
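The search for a suitable init can be sketched as a simple fallback list. The candidate paths below are the ones the kernel falls back to when nothing else is specified; `pick_init()` and the `can_exec` predicate are hypothetical stand-ins for the kernel actually trying to execute each program, and the initramfs and build-time default cases are left out for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: non-zero if path could be started as init. */
typedef int (*can_exec_fn)(const char *path);

static const char *const fallback_inits[] = {
    "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
};

/*
 * Return the program that would become init, or NULL when none works.
 * requested models an explicit init= argument: if it is set and fails,
 * there is no fallback (the kernel panics in that case).
 */
const char *pick_init(const char *requested, can_exec_fn can_exec)
{
    if (requested)
        return can_exec(requested) ? requested : NULL;

    for (size_t i = 0; i < sizeof(fallback_inits) / sizeof(fallback_inits[0]); i++)
        if (can_exec(fallback_inits[i]))
            return fallback_inits[i];

    return NULL; /* "No working init found" */
}

/* Example predicates: a root filesystem with only a shell, and an empty one. */
int only_shell_exists(const char *path) { return strcmp(path, "/bin/sh") == 0; }
int nothing_exists(const char *path)    { (void)path; return 0; }
```

On a root filesystem that only contains a shell, `pick_init(NULL, only_shell_exists)` ends up selecting `/bin/sh`, which matches the familiar behavior of minimal embedded systems dropping straight into a shell.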
So yes, there are so many intermediate steps that it is easy to get lost, especially since one step is often split into its own early init, init, configuration and late init, all of them spread across immense functions. We nonetheless hope this article helps demystify these early steps, which always happen before you get a chance to see your first printk()!