As we reported in a previous blog post, almost the entire Bootlin engineering team was at the Embedded Linux Conference Europe in Prague in June. To share more about what happened at this conference, we asked all engineers at Bootlin to select one talk they found interesting and useful and to write a short summary of it. We will share this feedback in a series of blog posts, this one being the first.
Preparing Linux Real-Time Kernel and Tuning Robotics Platform with Modern ARM64 SoC
Talk by Krzysztof Kozlowski, chosen by Bootlin engineer Alexis Lothoré
During his talk, Krzysztof offered a nice walkthrough of enabling a real-time Linux kernel. After reminding us of the basic steps of building a real-time kernel (applying the PREEMPT-RT patches and enabling the proper kernel configuration options), he showed us why the job (and so, the talk!) is not done yet, even once we manage to boot our freshly built kernel: a lot of code may now behave incorrectly with PREEMPT-RT, because of fundamental changes in kernel behavior. Many kernel design assumptions are now challenged and need careful attention: for example, spinlocks are now able to sleep, and memory allocations are not fully atomic anymore, even with the GFP_ATOMIC flag. Fortunately, the kernel offers infrastructure to detect those new issues, and Krzysztof gave a few examples of the errors to expect when those checks are enabled and how to fix them. Finally, he showed how to measure the system performance compared to a vanilla kernel, and how to tune it based on the observed results. After a few iterations, he managed to get satisfying results with the RT kernel, while also showing that choosing an RT kernel is not a “black or white” option: a fine-tuned vanilla kernel can offer satisfying real-time performance too, depending on the requirements.
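To give an idea of what "measuring" means here, the following is a minimal sketch of a wake-up latency measurement: a task asks to be woken at regular deadlines and records how late each wake-up actually is. It is written in the spirit of cyclictest from the rt-tests suite, which is an assumption on our side since this summary does not name the tool used in the talk. The function name and its parameters are hypothetical, and Python itself adds jitter, so the numbers are only illustrative.

```python
import time


def measure_wakeup_latency(interval_s=0.001, loops=200):
    """Sleep until periodic deadlines and record how late each wake-up is.

    Returns a dict with min/avg/max latency in microseconds.
    Illustrative only: a real measurement would use a real-time
    scheduling policy and a dedicated tool.
    """
    latencies_us = []
    next_wakeup = time.monotonic() + interval_s
    for _ in range(loops):
        delay = next_wakeup - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        now = time.monotonic()
        # Latency = how long after the requested deadline we actually ran.
        latencies_us.append((now - next_wakeup) * 1e6)
        next_wakeup += interval_s
    return {
        "samples": len(latencies_us),
        "min_us": min(latencies_us),
        "avg_us": sum(latencies_us) / len(latencies_us),
        "max_us": max(latencies_us),
    }


if __name__ == "__main__":
    print(measure_wakeup_latency())
```

When comparing kernels, the maximum latency is usually the figure that matters: a low average says little about whether a deadline can be missed.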
Krzysztof’s talk was very enjoyable thanks to the clear path it offered: it can be used as a nice guide/walkthrough to configure and integrate a real-time kernel for your own use case, and it gives a lot of valuable details and hints about issues you will likely encounter while doing so.
Subsystems with Object Lifetime Issues (in the Embedded Case)
Talk by Wolfram Sang (Sang Engineering / Renesas), chosen by Bootlin engineer Hervé Codina
As we had already run into object lifetime issues ourselves, we were interested in Wolfram’s talk on this topic. Reference counting issues are not new: they were already pointed out by Laurent Pinchart at Linux Plumbers 2022 (slides, video) and by Bartosz Golaszewski at FOSDEM 2023 (slides, video).
Wolfram started by presenting reference counting and the kernel struct kref available in struct device. He showed the relationship between logical and physical devices and explained where the issues can happen, in particular when two objects (struct device) are mixed: one object can be released while it shouldn’t be. In other words, the resources allocated for the object are freed while its reference counter is not zero.
Wolfram told us about the wrong pattern and its usage in the current kernel code.
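The problem can be illustrated with a toy model. This is not kernel code and not the example from the talk: the class and function names below are hypothetical, and the "freed" flag merely stands in for memory being released. The sketch contrasts freeing an object when the driver goes away with dropping a reference and letting the last user trigger the release.

```python
class RefCounted:
    """Toy stand-in for a reference-counted object (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1   # the creator holds the first reference
        self.freed = False

    def get(self):
        if self.freed:
            raise RuntimeError(f"use-after-free on {self.name}")
        self.refcount += 1
        return self

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True   # release happens only with the last reference

    def use(self):
        if self.freed:
            raise RuntimeError(f"use-after-free on {self.name}")
        return f"{self.name} ok"


def buggy_unbind(obj):
    # Wrong pattern: the memory goes away with the driver,
    # whatever the reference counter says.
    obj.freed = True


def correct_unbind(obj):
    # Right pattern: the driver only drops its own reference.
    obj.put()


if __name__ == "__main__":
    dev = RefCounted("dev0")
    dev.get()            # e.g. user space keeps the device open
    correct_unbind(dev)  # driver goes away
    print(dev.use())     # still valid: one reference remains
    dev.put()            # last user closes the device, object is released
```

With buggy_unbind() in place of correct_unbind(), the later use() raises, which is the toy equivalent of the crashes shown during the talk.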
He explained the issue in a very understandable way and demonstrated it on several subsystems:
- TTY: works OK
- MTD and SPI: kernel crash
- I2C: didn’t crash, but blocked until the user-space applications using the device were terminated
Finally, he presented a solution to the issue involving *_register() in order to avoid having the struct device released while its reference counter is not zero.
This talk was very instructive in highlighting this wrong pattern used in the kernel, and these reference counting issues rang a bell about problems we encountered on some projects at Bootlin. Wolfram also gave his talk in a funny and interactive way, which made for a really pleasant session.
The Resurrection of Ureadahead and Speeding up the Boot Process and Preloading Applications
Talk by Steven Rostedt (Google), chosen by Bootlin engineer Théo Lebrun.
Ureadahead was a Canonical project that did two things:
- From time to time, it traced a boot to record which disk pages were accessed. It did that using kernel traces and the mincore(2) syscall.
- It was run early at boot to call readahead(2) on the pages that were requested during previous boots. The goal is to keep the disk busy and fill the page cache with what will most likely be requested.
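The second step can be sketched in a few lines. This is only an illustration of the idea, not ureadahead's code: the (path, offset, length) list format and the prefetch() function are hypothetical, and since the Python standard library does not expose readahead(2), the sketch uses os.posix_fadvise() with POSIX_FADV_WILLNEED as a stand-in to ask the kernel to populate the page cache.

```python
import os


def prefetch(pack):
    """Ask the kernel to load the listed file ranges into the page cache.

    `pack` is a list of (path, offset, length) tuples, a hypothetical
    format standing in for whatever a previous boot trace recorded.
    Returns the number of ranges successfully requested.
    """
    requested = 0
    for path, offset, length in pack:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # the file may have disappeared since the trace
        try:
            # Hint only: the kernel starts reading, we do not wait for it.
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
            requested += 1
        except OSError:
            pass
        finally:
            os.close(fd)
    return requested


if __name__ == "__main__":
    print(prefetch([("/etc/hostname", 0, 4096)]))
```

Because the call is only a hint, the tool can issue many requests quickly and let the kernel schedule the I/O while the rest of the boot proceeds.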
The talk started by explaining the tool itself and its history. It continued with how Steven Rostedt refactored the tool so that it no longer requires kernel patches, as the Canonical version (now unmaintained) did. The refactoring also got rid of a few pain points of the previous version. The tool is now used as part of ChromeOS, but could be used by all Linux distros. Gains are in the vicinity of 16% for boot time up to the login screen, on a ChromeOS laptop with an SSD.
Steven ended his talk by asking for help from the community; he has no time to provide support for ureadahead and strongly believes that it can be improved and extended to bring more boot-time benefits, possibly in the embedded field.
A few aspects were mentioned in questions that have to be accounted for in an embedded setting:
- Does the inode + offset pair work well with compressed root filesystems?
- Could an ahead-of-time approach be used so that the block description file is created at build time?
- Could the root filesystem pages be ordered to suit the reading needed?
Overall, an interesting topic with a lot more potential to dig into! We particularly liked that the implementation could be a minimal userspace application that uses existing user-space interfaces.
Efficient and Practical Capturing of Crash Data on Embedded Systems
Talk by John Ogness, chosen by Bootlin engineer Luca Ceresoli.
In this talk John Ogness described the minicoredumper project, of which he is the author.
He began with a preamble on the core dump feature, a standard function that the kernel has provided for a very long time. If enabled via the proc filesystem, it instructs the kernel to dump into a file the whole virtual memory space of processes that crash. This is a very useful feature for post-mortem debugging. However, it has shortcomings, most notably the huge size of the files generated for complex programs and the limited number of available configuration settings.
Then he presented the minicoredumper project, which provides many improvements over the standard core dump. It is split into three components: the minicoredumper program, libminicoredumper and live dumps.
minicoredumper is a userspace application that builds on top of the existing kernel core dump feature, using its ability to redirect the dump data to a handler process. This way, when a program crashes, the kernel runs minicoredumper and sends the whole dump to it. Based on this, minicoredumper saves a “reduced” version of the core dump, whose content is configured via a JSON file. This allows users to select what to save: the stack for all threads or only the crashed one, the heap, specific ELF sections and specific symbols that are relevant for debugging. Additionally, the content can be compressed and sparse tar files can be used to dramatically reduce storage space.
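The redirection mechanism relies on /proc/sys/kernel/core_pattern: when its content starts with a "|" character, the kernel pipes the core dump to the standard input of the named program instead of writing a file. The sketch below only inspects such a setting; the helper names and the handler path used in the example are hypothetical and say nothing about how minicoredumper itself is installed.

```python
def parse_core_pattern(pattern):
    """Tell whether a core_pattern value writes a file or pipes to a handler."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # The kernel splits the command line on spaces; no shell is involved.
        argv = pattern[1:].split()
        if not argv:
            return {"mode": "pipe", "handler": None, "args": []}
        return {"mode": "pipe", "handler": argv[0], "args": argv[1:]}
    return {"mode": "file", "template": pattern}


def read_core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Read and parse the current setting (Linux only)."""
    with open(path) as f:
        return parse_core_pattern(f.read())


if __name__ == "__main__":
    # Hypothetical handler path; %p and %e expand to the PID and the
    # executable name of the crashing process.
    print(parse_core_pattern("|/usr/local/bin/my-core-handler %p %e"))
```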
Additionally, minicoredumper saves lots of information about the process found in /proc/, such as the list of open file descriptors and the memory map. These are saved as text files so they can be inspected without even using gdb.
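As an illustration of the kind of information involved, here is a small sketch that copies the memory map and the open file descriptor list of a process into plain text files. It is not minicoredumper's implementation: the function name and the output file names are hypothetical, and it only relies on the standard /proc/<pid>/maps and /proc/<pid>/fd entries.

```python
import os


def snapshot_proc(pid, out_dir):
    """Save the memory map and open file descriptors of `pid` as text files.

    Output file names (maps.txt, fds.txt) are hypothetical.
    Returns the list of files written. Linux only.
    """
    os.makedirs(out_dir, exist_ok=True)

    # The memory map is already text: copy it as-is.
    with open(f"/proc/{pid}/maps") as src:
        maps = src.read()
    with open(os.path.join(out_dir, "maps.txt"), "w") as dst:
        dst.write(maps)

    # Each entry in /proc/<pid>/fd is a symlink to the opened object.
    fd_dir = f"/proc/{pid}/fd"
    with open(os.path.join(out_dir, "fds.txt"), "w") as dst:
        for name in sorted(os.listdir(fd_dir), key=int):
            try:
                target = os.readlink(os.path.join(fd_dir, name))
            except OSError:
                continue  # descriptor closed in the meantime
            dst.write(f"{name} -> {target}\n")

    return ["maps.txt", "fds.txt"]


if __name__ == "__main__":
    print(snapshot_proc(os.getpid(), "/tmp/proc-snapshot"))
```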
Then John described libminicoredumper, a library that can be linked into a program that needs to control what gets dumped and how. It has a very simple API to register and unregister at runtime the data that should be dumped. A program can also register a function that produces a file with arbitrary content, which makes it possible to save data generated on the fly in a form that is not present as-is in memory, e.g. a text representation of a complex data structure.
The last component of the project is the live dumps feature, which additionally allows saving information about other running processes that may hold useful information about the crash. This works using libminicoredumper and a dedicated daemon, minicoredumper_regd. When a crash is detected, libminicoredumper instructs minicoredumper_regd to save data about the previously registered PIDs, then dumps the data about the crashed application. However, this cannot be done synchronously, which can reduce its usefulness, and the non-crashed processes are stalled for a few milliseconds while their data is saved.
Overall we found the minicoredumper project very interesting for embedded systems and John gave a very good overview of what it does and how.