When developing projects based on Yocto Project / OpenEmbedded, a common practice is to have multiple build environments in different directories: one per product, one per development branch, or other similar scenarios. Each build environment could have different layers, a different configuration, or simply use a different version of the source code.
With default settings, different build directories result in duplicated storage for the downloaded source code and build artifacts, as well as duplicated time spent downloading the sources and building everything. This can be troublesome for large projects.
Fortunately, the bitbake build engine can share both the downloaded source code and the intermediate build results across multiple build directories, saving build time and disk space.
Sharing the downloaded sources
The first thing you may want to share is the download directory, which stores all the source code downloaded from each URL set in the SRC_URI variable of each recipe. This is usually either a compressed tar archive or a git repository to be cloned.
In openembedded-core, the default location of the download directory is ${TOPDIR}/downloads, where TOPDIR is your build directory. Thus each of your Yocto build directories gets its own download directory, duplicating both download time and disk space.
The good news is that this path is not set in stone: it is simply the content of the DL_DIR variable. Using a single directory, shared across all your projects, is as easy as setting this line in your conf/local.conf file:
DL_DIR="${HOME}/data/bitbake.downloads"
You can choose whatever path you prefer here, but using HOME rather than TOPDIR makes the path identical for all your builds.
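To check that the new setting is actually effective, you can ask bitbake to print its expanded configuration and filter out the variable (a quick sanity check; run it from an initialized build environment):

```
$ bitbake -e | grep '^DL_DIR='
```

The printed value should now point into your home directory instead of the build directory.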
Sharing the sstate cache
A powerful feature of the Yocto build system is the shared state cache, usually called sstate cache for brevity. After having built a recipe successfully, bitbake stores the output results (without all the intermediate files) into the sstate cache. When the exact same recipe needs to be built again, instead of running through the expensive tasks of the actual build, bitbake will simply extract the resulting binaries it had previously stored.
The sstate cache saves a huge amount of time when building big projects. It also saves disk space, as it does not need to extract all the source code and to produce all the intermediate artifacts: only the final package is extracted.
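You can see the cache at work by rebuilding something you have already built (zstd is just an example recipe here). On the second run, most tasks are restored from the sstate cache as quick *_setscene variants instead of being executed for real:

```
$ bitbake zstd     # first build: tasks actually run
$ rm -rf tmp       # discard all build output
$ bitbake zstd     # most task outputs come back from the sstate cache
```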
However, just like the download directory, the directory storing the sstate cache is by default relative to the build directory: ${TOPDIR}/sstate-cache. Unsurprisingly, it too can be changed, this time via the SSTATE_DIR variable, for example:
SSTATE_DIR="${HOME}/data/bitbake.sstate"
If you have multiple directories building a similar distribution (e.g. two different branches of the same project), this allows reusing the output of most of your recipes. It is useful even across projects targeting totally different CPU architectures, such as an ARM project and a RISC-V project, because native packages are also saved in the sstate cache. And even when nothing can be shared, it will not add any noticeable performance penalty to your builds.
Is it safe?
Sharing the download directory and sstate cache directory might look dangerous. What if I’m running two independent builds together and they both try to download the source code for the same recipe? What if bitbake takes an sstate cache entry for the same recipe, but built from different settings?
Short answer: it is safe.
Slightly longer answer: it is safe unless you clean your sstate cache or download directory.
The reason it is safe is that bitbake implements the mechanisms needed to make it safe. For the sstate cache especially, each entry is saved along with a hash, computed from all the variables and functions that make up the recipe. If you have ever run bitbake -e <recipename>, you have an idea of what those variables and functions are. This means that if you rebuild the same recipe with even slightly different settings (source code version, applied patches, compiler optimization flags, permissions of installed files, whatever) the hash value will change, and a different sstate cache entry will be created.
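If you are curious why a given task's hash changed between two builds, bitbake ships a helper that compares the signatures it recorded. A hypothetical session (the recipe and task names are just examples) could look like:

```
$ bitbake-diffsigs -t zstd do_configure
```

The output lists the variables and dependencies whose values differ between the two signatures.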
However, if you clean your sstate cache or download directory, you may run into build failures, because of how bitbake keeps track of the tasks it has already completed. The documentation for the do_cleanall and do_cleansstate tasks explains the rationale. Note that do_cleanall can produce build errors even without a shared download directory.
So the general advice is to not use do_cleanall and do_cleansstate at all. And even if you have used them, or removed files from your sstate cache or download directory by other means, don't worry: you will not end up with an incorrect sstate entry being used without noticing, or other incorrect build results. The worst problem you will face is a build error, which you can recover from by cleaning the affected recipes with bitbake -c clean <recipe>. Removing the entire tmp directory is another option.
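For example, if a build fails on a single recipe after some manual cleaning, a recovery session (zstd is just an example recipe name) can be as simple as:

```
$ bitbake -c clean zstd
$ bitbake zstd
```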
Automatically sharing across all your projects
So, is everything perfect now? Well, not yet. Adding the above two lines to the local.conf of every project requires you to remember to do it, without any copy-paste mistakes while setting the variables. And if your local.conf files are generated automatically by a CI script or by a tool such as kas, this can be tricky to get right.
The good news is that there is a solution to this problem too: setting DL_DIR and SSTATE_DIR in your shell environment, so that they are defined every time you use bitbake. For Bash, this means exporting the variables in your ~/.bashrc. You can easily test it and… find that it does not work. This is because bitbake discards most of the variables from the calling shell environment before running the build, in order to minimize pollution from local settings and make the build as reproducible as possible across different workstations.
Don’t despair, however: there is a way to explicitly let a variable from the shell environment into the bitbake global environment. Bitbake uses the BB_ENV_PASSTHROUGH variable to keep the list of variables it should not discard, and removes all the remaining ones. This is an internal variable that you should not modify directly, but you can specify an additional set of variables in BB_ENV_PASSTHROUGH_ADDITIONS, and this one is a shell variable that bitbake does take from the external environment.
Thus a clean way to set your DL_DIR and SSTATE_DIR globally is to add these three lines to your shell init file (~/.bashrc for Bash):
export BB_ENV_PASSTHROUGH_ADDITIONS="DL_DIR SSTATE_DIR"
export DL_DIR="${HOME}/data/bitbake.downloads"
export SSTATE_DIR="${HOME}/data/bitbake.sstate"
That’s all you need. With this, all of your builds will share downloaded archives and build results, saving disk space and time for downloading and building.
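You can sanity-check the exports without invoking bitbake at all: a child process, which is how bitbake will see your environment, should observe all three variables. A minimal sketch using a plain child shell:

```shell
# Export the three variables, as in the ~/.bashrc snippet above.
export BB_ENV_PASSTHROUGH_ADDITIONS="DL_DIR SSTATE_DIR"
export DL_DIR="${HOME}/data/bitbake.downloads"
export SSTATE_DIR="${HOME}/data/bitbake.sstate"

# A child shell stands in for bitbake here: it inherits the exports.
sh -c 'echo "child sees DL_DIR=$DL_DIR and SSTATE_DIR=$SSTATE_DIR"'
```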
After doing this, remember to remove the downloads and sstate-cache subdirectories of all your existing build directories. They will not be used anymore, so they are just wasting disk space!
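Here is a small sketch of that cleanup, shown against a throwaway directory layout so it is safe to run as-is; in real life you would point BUILDS at the parent of your actual build directories (an assumption about your layout):

```shell
# Create a throwaway layout mimicking two existing build directories.
BUILDS="$(mktemp -d)"
mkdir -p "$BUILDS/project-a/downloads" "$BUILDS/project-a/sstate-cache"
mkdir -p "$BUILDS/project-b/downloads" "$BUILDS/project-b/sstate-cache"

# The actual cleanup: drop the per-build caches that are no longer used.
for d in "$BUILDS"/*/; do
    rm -rf "${d}downloads" "${d}sstate-cache"
done
```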
Curious about the time savings you can get? An entire project that takes an hour to build from scratch the first time can easily take less than a minute when rebuilt from scratch using a pre-populated sstate cache!
Cheers! I’ve always avoided sharing the sstate cache between projects; I’ll start doing it now. Besides the personal PC use case, this can be a great optimization for CI too.
Thanks, this post is very useful and will help me save space on my hard disk
Thanks for sharing, this is useful!
“Short answer is: it is safe”
I don’t think it’s OK to state here that it’s safe to share sstate in general.
I agree if you make sure there is only one build running at a time.
But our servers run multiple instances of different branches with different states of the project. Here I would say sharing downloads is OK, but sstate must be separated per running instance.
It is safe, unless you are using a very old release.
The yocto autobuilders share their sstate and do a huge amount of builds, running on many builders across several different distributions, and building dunfell, kirkstone, mickledore, nanbield and master, all in parallel. This is definitely tested at a large scale!
sharing downloads is not OK, as one workspace may remove downloads with clean or cleanall while another workspace thinks they are still present (due to sstate remembering that do_fetch is complete)
I’ve come across a case where it isn’t safe, if DL_DIR is shared between two workspaces:
A $ ./scx bitbake zstd -c fetch
B $ ./scx bitbake zstd -c cleanall
A $ ./scx bitbake zstd -c unpack
ERROR: zstd-1.4.5-r0 do_unpack: Unpack failure for URL: 'git://github.com/facebook/zstd.git;nobranch=1'. No up to date source found: clone directory not available or not up to date: /share/dl_dir/sources/git2/github.com.facebook.zstd.git; shallow clone not enabled
What it really means is: file not found: /share/dl_dir/sources/git2/github.com.facebook.zstd.git
Hi Sam,
thanks for reporting your test. I tested with two clones of https://github.com/bootlin/simplest-yocto-setup/ and without the “./scx” (whatever it is), and I confirm this sequence does fail:
dir1$ bitbake zstd -c fetch
dir2$ bitbake zstd -c cleanall
dir1$ bitbake zstd -c unpack
The reason this does not happen on the yocto autobuilders, which do use a shared sstate cache for multiple concurrent builds, is that they simply never delete any downloaded files.
Weekend time now. Next week I’m going to test on current master and, if it also fails, consider what to do (perhaps open a bug report).
Luca
I confirm this is happening also on master.
This is actually unavoidable due to the structure of the tasks, and there is no easy/obvious fix. So I can say do_cleanall should just not be used with a shared DL_DIR. This is even documented in The Yocto Project Test Environment Manual (https://docs.yoctoproject.org/test-manual/intro.html?highlight=cleanall#considerations-when-writing-tests).
However it is not mentioned in the do_cleanall task documentation, so I have sent a patch to add this:
https://docs.yoctoproject.org/test-manual/intro.html?highlight=cleanall#considerations-when-writing-tests
I am going to update the blog post as soon as my documentation patch is merged (or rejected).
I also note that this can happen without sharing download dirs — if any two recipes use the same SRC_URI.
In fact it can occur with a single recipe having implicit variants, e.g. zstd and zstd-native
cleanall on one will leave the other thinking the source is available when it is not
I think the general solution is perhaps for do_fetch sstate to include references to the downloaded sources so that these can be checked when deciding whether or not do_fetch needs re-running
Of course the general solution is not guaranteed where there is concurrent action to remove the source between setscene and recipe invocation but it at least could allow self-repair on retry
Hello Sam,
after a few iterations my documentation patches have been accepted, so now do_cleanall and do_cleansstate are officially discouraged: https://docs.yoctoproject.org/ref-manual/tasks.html#do-cleanall
I have just updated the blog post accordingly.
Thanks for your inputs!
Luca
Hi Luca.
I agree that a ‘do_cleanall’ should _almost_ never be necessary. However, that does not fix the issue that a shared download folder is a bad idea. You can create your own source mirror, as documented in https://wiki.yoctoproject.org/wiki/How_do_I#Q:_How_do_I_create_my_own_source_download_mirror_.3F.
I think you should rather point to that documentation or reiterate what is written there, because, in short: it is not safe. Only under certain conditions.
Cheers,
Flo
Hello Flo,
I agree on the sentence “Only under certain conditions”.
However those conditions are met by the most common use cases for developers, i.e. mainly that do_cleanall is not normally used. Quoting the official documentation:
“You should never use the do_cleanall task in a normal scenario”
(https://docs.yoctoproject.org/ref-manual/tasks.html#do-cleanall)
On the other hand sharing the DL_DIR is super simple (3 lines in .bashrc) and works great in “normal conditions”. Whether this covers 40% or 90% of the users I really cannot say, but surely a relevant amount based on my experience.
When one is outside of those “normal conditions”, I can think about two main options.
First option: don’t do anything. There will be one download directory per build dir, and disk space and network bandwidth usage will be duplicated (both can be very cheap nowadays). Do your builds, live happy and safe.
Second option: set up a local mirror as you suggest. This is more complex in terms of configuration, has a performance impact due to BB_GENERATE_MIRROR_TARBALLS, requires “maintenance” to ensure the mirror has all the required sources and finally involves disk space duplication (unless hardlinks are used, I haven’t checked that). But it is safe _and_ avoids duplicating network usage.
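For reference, a minimal sketch of that second option in conf/local.conf, based on the own-mirrors class (the mirror path is a placeholder; see the wiki page linked above for the full procedure):

```
INHERIT += "own-mirrors"
SOURCE_MIRROR_URL = "file:///path/to/my/source-mirror"
BB_GENERATE_MIRROR_TARBALLS = "1"
```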
So, three kind of user conditions/needs, three solutions. It’s good to have alternatives and to know them so anybody can pick the one that is best for them.
And thanks for posting the link to the download mirror instructions, they will be useful for anyone who wants to do that!
Luca
In addition to my other reply here awaiting moderation, I think this is a general problem that affects any task that has side-effects that are not part of sstate.
DL_DIR downloads is just one, because presence is implied by sstate but sstate does not verify the presence of the file
For me, spdx is another (I haven’t checked master yet), but we have the case where the final image fails because spdx output for recipes is missing. The do_compile or do_build is not re-run because it can be recovered from sstate, but this does not dump the spdx files in the spdx dir.
Another is the well-known case of images in deploy_dir: we all know that deleting deployed images won’t cause the image recipe to re-run, but that is also of particular interest for recipes that dockerize images and have an image in the deploy dir as a SRC_URI
So I wonder if the very general solution is for tasks to log side-effect files with the bitbake system which can then tell if those files are absent or changed to invalidate the sstate and force the task to be re-run. The side effect files are not stored with sstate, just their names and checksums.
Do those shared folders need to be open for writing or can they be read only?
Hello Arik,
these folders definitely have to be writable.
Consider they are a storage area that bitbake uses. It stores files when fetching sources, and it reads them later on when fetching them again. Similarly it saves sstate cache entries when building a recipe, and it reads them back before rebuilding it.
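That said, if your setup really calls for a read-only cache (for example an NFS share populated by a CI job), bitbake can additionally consume sstate objects from read-only locations via the SSTATE_MIRRORS variable; a minimal sketch, where the directory is a placeholder and PATH is a literal token of the mirror syntax:

```
SSTATE_MIRRORS = "file://.* file:///nfs/yocto/sstate/PATH"
```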