Systemd, read-only rootfs and overlay file system over /etc

Systemd is a popular init system, used to bootstrap user space and manage user processes. Over time it has replaced several traditional Linux utilities with its own components for log management, networking, time management, etc. There is even a bootloader component now (systemd-boot). Systemd is ubiquitous in desktop and server Linux distributions, and is also commonly used on embedded devices to benefit from features such as parallel startup of services, service monitoring, and more.

In a recent project that uses Buildroot as its build system, we used systemd with storage consisting of a read-only root filesystem (SquashFS) and an overlay file system (OverlayFS) mounted on /etc. While doing this, we faced two issues with the use of OverlayFS on /etc:

  • /etc/machine-id file management. This file is created by systemd during the first boot. If the root filesystem is read-only, systemd generates a transient ID in /run, bind-mounts it over /etc/machine-id, and waits until /etc becomes writable to commit it (see more details). In that case, the machine-id file is regenerated at each boot (because /run is a tmpfs), so the machine identification changes at every boot, which is not necessarily desirable. On the other hand, we don’t want the machine-id file to be part of the SquashFS filesystem, because the SquashFS image is identical on all devices, while /etc/machine-id is unique per device. Ideally, we would like the machine-id file to be stored in our OverlayFS and generated during the first boot. The issue is that systemd reads the machine-id file very early, before we get the chance to mount the OverlayFS.
  • We wanted to be able to add or modify systemd services using the OverlayFS. Systemd parses the service files at early init and starts them according to their ordering and dependencies. The services that mount the filesystems listed in /etc/fstab, like all other services, are started after this parsing, which is too late. We could think of running systemctl daemon-reload from a custom service once mounting is complete, but this is not really a stable solution, as
    Lennart Poettering commented in a short e-mail thread about this issue.
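
The fallback described in the first point can be observed on a running system: when systemd falls back to a transient ID, /etc/machine-id appears as a mount point of its own, bind-mounted from /run. Here is a minimal POSIX shell sketch to check for that, assuming the standard /proc/self/mountinfo layout in which the fifth field is the mount point; the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if /etc/machine-id is itself a
# mount point, which is how a transient, /run-backed machine ID shows up.
# The optional argument lets you point it at a saved copy of mountinfo.
machine_id_is_transient() {
    mountinfo="${1:-/proc/self/mountinfo}"
    # In mountinfo, field 5 is the mount point of each entry.
    awk '$5 == "/etc/machine-id" { found = 1 } END { exit (found ? 0 : 1) }' "$mountinfo"
}
# Example: machine_id_is_transient && echo "machine-id will not survive a reboot"
```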

The solution suggested by Lennart, and elsewhere on the wider Internet, is to mount the OverlayFS from an initramfs, which allows it to be set up before systemd even starts. However, we use Buildroot, and using an initramfs adds complexity by requiring a separate configuration to manage multiple images. That was overkill in our case, just for setting up the overlay. The solution we eventually chose was to create an init_overlay.sh script that runs as init before systemd, by adding init=/sbin/init_overlay.sh to the kernel command line:

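#!/bin/sh
# /dev is already populated by the kernel (CONFIG_DEVTMPFS_MOUNT=y, no initramfs).
mount -t proc -o nosuid,nodev,noexec none /proc
mount -t sysfs -o nosuid,nodev,noexec none /sys
# Writable data partition; /mnt/data must already exist in the SquashFS image.
mount /dev/mmcblk0p2 /mnt/data
# upperdir and workdir must exist, on the same writable filesystem.
mkdir -p /mnt/data/etc /mnt/data/.etc-work
mount -t overlay overlay -o lowerdir=/etc,upperdir=/mnt/data/etc,workdir=/mnt/data/.etc-work /etc
# Replace this script with systemd (/sbin/init), keeping PID 1.
exec /sbin/init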
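
Because /etc is already overlaid when systemd starts, unit files and the "wants" symlinks that systemctl enable creates can simply live in the upper directory on the data partition, and systemd sees them during its initial parsing. On a running device, a plain systemctl enable writes through the overlay into that directory. For preparing the data partition offline (for instance during provisioning), the hypothetical helper below writes the same kind of symlink directly into an upper directory. It is a sketch that assumes the unit is installed under /usr/lib/systemd/system and declares WantedBy=multi-user.target.

```sh
#!/bin/sh
# Hypothetical helper: mimic the symlink `systemctl enable` would create,
# but write it directly into an OverlayFS upper directory.
# Assumptions: the unit lives in /usr/lib/systemd/system on the target and
# is wanted by multi-user.target.
enable_unit_in_upper() {
    upper="$1"  # overlay upper directory, e.g. /mnt/data/etc
    unit="$2"   # unit name, e.g. myapp.service (hypothetical)
    wants="$upper/systemd/system/multi-user.target.wants"
    mkdir -p "$wants" || return 1
    # The absolute target is resolved on the device at boot, so the link may
    # dangle on the machine running this helper; that is expected.
    ln -sf "/usr/lib/systemd/system/$unit" "$wants/$unit"
}
# Example: enable_unit_in_upper /mnt/data/etc myapp.service
```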

Hopefully, this will be useful to others. Of course, we’re also curious to hear whether others have faced the same issue, and how they solved it. Let us know in the comments.

Author: Köry Maincent

Köry Maincent is an embedded Linux and kernel engineer at Bootlin, which he joined in 2020.

18 thoughts on “Systemd, read-only rootfs and overlay file system over /etc”

  1. Thanks for sharing. We also use OverlayFS and mount the file systems in an initramfs. I guess we stumbled on the correct solution because we run system update functionality from an initramfs, so it was natural to set up file systems, etc. there anyway. Mounting the file systems in the initramfs gives us more control over error recovery, etc.

    Documentation:

    https://github.com/YoeDistro/yoe-distro/blob/master/docs/updater.md

    Our initramfs init is also a shell script:

    https://github.com/YoeDistro/yoe-distro/blob/master/sources/meta-yoe/recipes-support/updater/files/updater.installer

    1. Thanks for sharing your solution. Indeed, if you have more to do, an initramfs is suitable, but I did not want to use one and increase boot time only to mount a simple overlay file system.

  2. How does /dev/ get set up? Is there some missing snippet in the quoted script?

    For the first problem, if you have a locally configurable bootloader, you can have it set systemd.machine_id= on the kernel command line, and then it will be set early on boot. For the second problem, these days adding new services late at boot doesn’t require a reload. In your overlay have a late-services.target or so unit that pulls them in, and in your squashfs have a late service, ordered after the overlay setup, that simply starts it with systemctl start.

    1. By the kernel (CONFIG_DEVTMPFS_MOUNT), which works here because there is no initramfs.

      1. Yes, I know it was an option to let the bootloader create a random ID, save it, and pass it to systemd. IMHO, I am not sure it is cleaner. I also had an issue with it, but I can’t recall what it was.

      2. What about enabling/disabling a service? IIRC, it creates/removes symlinks in the filesystem, but the overlay is mounted after these links are parsed, so systemd won’t know the enabled state of these services.
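
Regarding the systemd.machine_id= suggestion above: systemd expects a 32-character hexadecimal value, the same format as /etc/machine-id. The sketch below shows how a provisioning step could derive such a value from a random UUID generated by the kernel and format the kernel argument. How the bootloader would store the ID and append the argument is not shown, and the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print a kernel command-line argument carrying a
# machine ID in the 32-hex-digit format used by /etc/machine-id.
# If no ID is given, a random one is derived from a kernel-generated UUID.
machine_id_arg() {
    id="${1:-$(tr -d '-' < /proc/sys/kernel/random/uuid)}"
    case "$id" in
        *[!0-9a-f]*) return 1 ;;  # reject anything that is not lowercase hex
    esac
    [ "${#id}" -eq 32 ] || return 1
    printf 'systemd.machine_id=%s\n' "$id"
}
# Example: machine_id_arg   (prints systemd.machine_id= followed by 32 hex digits)
```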

  3. Thanks for sharing.
    We are trying to expose the machine-id as a file in sysfs from a kernel module, with /etc/machine-id as a symlink pointing to that file.

    1. This could be an interesting solution.
      Where would it be saved, and how would you tell your module to look for the machine ID in a specific memory or location?

  4. Thanks for your blog post! You are using the “overlay” keyword twice in your mount command. What is the second one for?

    1. The syntax is mount -t type device dir.
      The first one is the filesystem type (overlay); the second one is the device argument, here also called overlay.
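
Since OverlayFS has no backing block device, its device argument is a free-form string that is simply recorded as the mount source, and it appears as the first field of /proc/mounts. A more descriptive name (for example etc-overlay) is therefore easier to spot. The sketch below prints the source and filesystem type of whatever is mounted on a given directory, assuming the usual /proc/mounts layout (source, mount point, type, options, ...); the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print "<source> <fstype>" for the topmost mount on a
# directory, read from /proc/mounts (or from the file given as 2nd argument).
mount_source() {
    mp="$1"
    mounts="${2:-/proc/mounts}"
    # Later lines are mounted on top of earlier ones, so the last match wins.
    awk -v mp="$mp" '$2 == mp { src = $1; type = $3 } END { if (src == "") exit 1; print src, type }' "$mounts"
}
# Example: mount_source /etc   (prints "overlay overlay" with the post's script)
```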

  5. When I remove a file from the lower dir (“rm /etc/os-release” just for example), a character device is created in the upper dir to white out the removed file, as expected. When listing the content of the /etc directory afterwards (ls /etc), /etc/os-release is no longer listed in the output, as expected, but I do get an error message: “ls: /etc/os-release: No such file or directory”. Are you seeing this as well?

    1. Do you remove the file after mounting the overlayfs?
      If that’s the case, the overlayfs indeed adds this character device to mark the file as removed.
      You won’t be able to access the file as long as the overlayfs is mounted, and it will behave as if the file did not exist. So yes, this “No such file or directory” message is expected.
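
The whiteout entries discussed above are character devices with device number 0/0 in the upper directory. The sketch below lists them, for instance to see which files from the read-only image have been deleted through the overlay. It assumes GNU or BusyBox find and stat (the %t and %T formats print major and minor numbers in hex), and the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: list OverlayFS whiteouts, i.e. character devices with
# device number 0/0, found under an upper directory such as /mnt/data/etc.
# Assumes GNU or BusyBox stat, whose %t and %T print major and minor in hex.
list_whiteouts() {
    upper="$1"
    find "$upper" -type c | while IFS= read -r f; do
        if [ "$(stat -c '%t:%T' "$f")" = "0:0" ]; then
            printf '%s\n' "$f"
        fi
    done
}
# Example: list_whiteouts /mnt/data/etc
```

Opaque directories (a directory deleted and then recreated through the overlay) are marked with an extended attribute rather than a whiteout device, so this sketch does not list them.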

  6. From a security perspective, it’s absolutely not a good idea to have /etc mounted read-write!
    Instead, only the /etc subfolders that need to be modified should have an overlayfs set up, with the correct permissions.

    1. In my case, we wanted to use the full /etc directory as security was not a concern, but you are right, I should have mentioned it in the blog post.

    2. I completely agree. This is a point that people apparently keep on forgetting, so I think it’s important to hammer on it over and over again. I’ve seen systems that have rootfs verification turned on but that do have a writeable /etc. This completely defeats the purpose of rootfs verification, because *anything* can be done by adding an init script in /etc/init.d or a systemd unit in /etc/systemd/system.

      1. Yes, I would like a better approach from systemd, but the only way I see is indeed to create a symlink in the read-only rootfs and then mount the memory destination in the init script before systemd starts.
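
Following the suggestion above to overlay only the /etc subfolders that need to be modified, the init script could mount one overlay per subdirectory instead of one over all of /etc. The sketch below prints (dry run) the commands for a list of subdirectories, each with its own upper and work directories on the data partition, since a work directory must be on the same filesystem as its upper directory and should not be shared between overlay mounts. The directory layout (etc-upper/, etc-work/), the subdirectory names and the function name are hypothetical; paths are assumed to contain no spaces, and each subdirectory must already exist in the read-only image to serve as a mount point.

```sh
#!/bin/sh
# Hypothetical helper: print the commands that overlay selected /etc
# subdirectories instead of the whole /etc. Review the output, then pipe
# it to sh (from the init script) to actually run it.
overlay_etc_subdirs() {
    data="$1"; shift   # writable partition mount point, e.g. /mnt/data
    for sub in "$@"; do
        upper="$data/etc-upper/$sub"
        work="$data/etc-work/$sub"   # one work dir per mount, same fs as upper
        printf 'mkdir -p %s %s\n' "$upper" "$work"
        printf 'mount -t overlay overlay -o lowerdir=/etc/%s,upperdir=%s,workdir=%s /etc/%s\n' \
            "$sub" "$upper" "$work" "$sub"
    done
}
# Example: overlay_etc_subdirs /mnt/data network myapp | sh
```

Keeping /etc/systemd and /etc/init.d out of that list means new services cannot be added from the writable partition, which addresses the concern raised in the reply about rootfs verification.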
