Yocto: sharing the sstate cache and download directories

When developing projects based on Yocto Project / OpenEmbedded, a common practice is to have multiple build environments in different directories: one per product, one per development branch, or for other scenarios. Each build environment could have different layers, a different configuration, or just use a different version of the source code.

With default settings, different build directories result in duplicated storage for the downloaded source code and build artifacts, as well as duplicated time for downloading the sources and building everything. This can be troublesome for large projects.

Fortunately, the bitbake build engine can share both the downloaded source code and the intermediate build results across multiple build directories, saving build time and disk space.

Sharing the downloaded sources

The first thing you may want to share is the download directory, which stores all the source code downloaded from each URL set in the SRC_URI variable of each recipe. This is usually either a compressed tar archive or a git repository to be cloned.

In openembedded-core, by default the location of the download directory is ${TOPDIR}/downloads, where TOPDIR is your build directory. Thus, each directory where you do Yocto development will have its own download directory, duplicating download time and disk space used.

The good news is that the above path is not set in stone: it is the content of the DL_DIR variable. Having a single directory, shared across all your projects, can be done easily by adding this line to your conf/local.conf file:

DL_DIR="${HOME}/data/bitbake.downloads"

You can choose the path you prefer here, but using HOME and not TOPDIR will make the path identical for all your builds.

Sharing the sstate cache

A powerful feature of the Yocto build system is the shared state cache, usually called sstate cache for brevity. After having built a recipe successfully, bitbake stores the output results (without all the intermediate files) into the sstate cache. When the exact same recipe needs to be built again, instead of running through the expensive tasks of the actual build, bitbake will simply extract the resulting binaries it had previously stored.

The sstate cache saves a huge amount of time when building big projects. It also saves disk space, as it does not need to extract all the source code and to produce all the intermediate artifacts: only the final package is extracted.

However, just like the download directory, the directory storing the sstate cache is by default relative to the build directory: ${TOPDIR}/sstate-cache. Unsurprisingly, it can be changed in the same way, by setting the SSTATE_DIR variable, for example:

SSTATE_DIR="${HOME}/data/bitbake.sstate"

If you have multiple directories building a similar distribution (e.g. two different branches of the same project), this can reuse the output products for most of your recipes. But it is useful even if you have projects involving totally different target CPU architectures, such as an ARM project and a RISC-V project, because native packages are also saved into the sstate cache. And even in case nothing can be shared, sharing will not impose any noticeable performance penalty on your builds.
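Both settings can be added by hand, but a small helper can make the edit repeatable. The following is only a sketch: set_conf_var is a hypothetical function name, not part of bitbake, and it assumes a plain conf/local.conf where the variable is assigned at the start of a line.

```shell
# Hypothetical helper (not part of bitbake): append NAME = "VALUE" to a
# conf file unless NAME is already assigned somewhere in that file.
set_conf_var() {
    conf_file=$1
    name=$2
    value=$3
    # Matches assignments such as NAME = "...", NAME ?= "..." or NAME="..."
    if grep -Eq "^[[:space:]]*${name}[[:space:]]*[?:]*=" "$conf_file" 2>/dev/null; then
        echo "$name is already set in $conf_file, leaving it alone" >&2
        return 0
    fi
    printf '%s = "%s"\n' "$name" "$value" >> "$conf_file"
}

# Usage, from a build directory. The single quotes keep ${HOME} literal,
# so that bitbake expands it rather than the shell:
#   set_conf_var conf/local.conf DL_DIR '${HOME}/data/bitbake.downloads'
#   set_conf_var conf/local.conf SSTATE_DIR '${HOME}/data/bitbake.sstate'
```

Running the helper twice leaves a single assignment in the file, which avoids the duplicated or conflicting lines that copy-pasting tends to produce.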

Is it safe?

Sharing the download directory and sstate cache directory might look dangerous. What if I’m running two independent builds together and they both try to download the source code for the same recipe? What if bitbake takes an sstate cache entry for the same recipe, but built from different settings?

Short answer: it is safe.

Slightly longer answer: it is safe unless you clean your sstate cache or download directory.

The reason it is safe is that Bitbake implements the mechanisms needed to make it safe. For the sstate especially, each cache entry is saved along with a hash, which is computed from all the variables and functions that are in the recipe. If you have ever run bitbake -e <recipename>, you have an idea of what those variables and functions are. This means that if you rebuild the same recipe with even slightly different settings (source code version, applied patches, compiler optimization flags, permissions of installed files, whatever) the hash value will change, and a different sstate cache entry will be created.
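The idea can be illustrated with a toy example. This is not bitbake's real signature algorithm, only a sketch of the principle: toy_signature is a hypothetical function that hashes a list of NAME=VALUE settings, and the variable values are made up for the illustration.

```shell
# Toy illustration only: this is NOT how bitbake computes its signatures.
# Hash a set of NAME=VALUE lines, sorted so that their order does not matter.
toy_signature() {
    printf '%s\n' "$@" | LC_ALL=C sort | sha256sum | cut -d' ' -f1
}

# Same recipe version, different compiler optimization flags.
sig_o2=$(toy_signature 'PV=1.4.5' 'CFLAGS=-O2')
sig_o3=$(toy_signature 'PV=1.4.5' 'CFLAGS=-O3')

if [ "$sig_o2" != "$sig_o3" ]; then
    echo "different settings give different hashes, hence different cache entries"
fi
```

Because the hash identifies the cache entry, two builds with different settings never pick up each other's output, while two builds with identical settings can share one entry.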

However, if you clean your sstate cache or download directory you may run into build failures, because of how bitbake keeps track of the tasks it has already completed. The documentation for the do_cleanall and do_cleansstate tasks explains the rationale for this. Note that do_cleanall can produce build errors even without a shared download directory.

So the general advice is to not use do_cleanall and do_cleansstate at all. Even if you have used them, or removed files from your sstate cache or download directory by any other means, don’t worry: you will not end up with an incorrect sstate entry being used without noticing, or other incorrect build results. The worst problem you will face is a build error that you can recover from by cleaning the affected recipes with bitbake -c clean <recipe>. Removing the entire tmp directory is another option.
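The recovery step can be wrapped in a small shell function. This is a minimal sketch, to be run from an initialized build environment; rebuild_recipe and BITBAKE_CMD are hypothetical names introduced here, and BITBAKE_CMD exists only so the function can be dry-run without bitbake.

```shell
# Sketch: clean one recipe, then build it again.
# BITBAKE_CMD is a hypothetical override used for dry runs; when it is
# unset, the real bitbake command is called.
rebuild_recipe() {
    recipe=$1
    ${BITBAKE_CMD:-bitbake} -c clean "$recipe" &&
        ${BITBAKE_CMD:-bitbake} "$recipe"
}

# Usage:
#   rebuild_recipe zstd                   # really clean and rebuild
#   BITBAKE_CMD=echo rebuild_recipe zstd  # only print the arguments
```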

Automatically sharing across all your projects

So, everything is perfect now? Well, not yet. Adding the above two lines to the local.conf of every project requires you to remember to do it, and to avoid copy-paste errors while setting the variables. And if your local.conf files are automatically generated by a CI script or by a tool such as kas, this can be tricky to get right.

The good news is that there is a solution to this problem too: setting DL_DIR and SSTATE_DIR in your shell environment, so that every time you use bitbake they will be set. For Bash, this means exporting variables in your ~/.bashrc. You can easily test it and… find that it does not work. This is because bitbake discards most of the variables from the calling shell environment before running the build, in order to minimize pollution from local settings and make the build as reproducible as possible across different workstations.

Don’t despair, however: there is a way to explicitly let a variable from the shell environment into the bitbake global environment. Bitbake uses the BB_ENV_PASSTHROUGH variable to keep a list of variables it should not discard, and removes all the remaining ones. This is an internal variable that you should not modify directly, but you can specify an additional set of variables in BB_ENV_PASSTHROUGH_ADDITIONS, a shell variable that bitbake reads from the external environment.

Thus a clean way to add your DL_DIR and SSTATE_DIR globally is to add these three lines to your shell init file (~/.bashrc for Bash):

export BB_ENV_PASSTHROUGH_ADDITIONS="DL_DIR SSTATE_DIR"
export DL_DIR="${HOME}/data/bitbake.downloads"
export SSTATE_DIR="${HOME}/data/bitbake.sstate"

That’s all you need. With this, all of your builds will share downloaded archives and build results, saving disk space and time for downloading and building.
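If your shell init file, or another script you source, already sets BB_ENV_PASSTHROUGH_ADDITIONS, a plain assignment would overwrite that value. A possible variant, sketched here under that assumption, appends to the existing list instead; passthrough_add is a hypothetical helper name, not something provided by bitbake.

```shell
# Hypothetical helper: add names to BB_ENV_PASSTHROUGH_ADDITIONS while
# keeping names that are already listed and without creating duplicates.
passthrough_add() {
    for name in "$@"; do
        case " ${BB_ENV_PASSTHROUGH_ADDITIONS:-} " in
            *" $name "*)
                ;;  # already listed, nothing to do
            *)
                BB_ENV_PASSTHROUGH_ADDITIONS="${BB_ENV_PASSTHROUGH_ADDITIONS:+$BB_ENV_PASSTHROUGH_ADDITIONS }$name"
                ;;
        esac
    done
    export BB_ENV_PASSTHROUGH_ADDITIONS
}

passthrough_add DL_DIR SSTATE_DIR
export DL_DIR="${HOME}/data/bitbake.downloads"
export SSTATE_DIR="${HOME}/data/bitbake.sstate"
```

To check that the values reach bitbake, you can run bitbake -e from a build directory and search its output for the lines starting with DL_DIR= and SSTATE_DIR=.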

After doing this, remember to remove the downloads and sstate-cache subdirs of all your existing build directories. They will not be used anymore, so they are just wasting disk space!
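Before deleting anything, it can help to see what is left over and how much space it takes. The sketch below assumes that all your build directories sit directly under one parent directory; list_leftovers is a hypothetical helper name, and it only lists, it does not remove anything.

```shell
# List leftover per-build download and sstate directories with their sizes.
# Assumption: every build directory is a direct child of the directory
# given as first argument.
list_leftovers() {
    parent=$1
    for dir in "$parent"/*/downloads "$parent"/*/sstate-cache; do
        # Unmatched glob patterns stay literal, so skip what is not a directory.
        [ -d "$dir" ] && du -sh "$dir"
    done
    return 0
}

# Usage, with a hypothetical parent directory:
#   list_leftovers "$HOME/yocto"
```

Once you have confirmed that a build no longer points at a listed directory, it can be removed with rm -rf.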

Curious about the time saving you can get? An entire project that takes an hour to build from scratch the first time can easily take less than a minute to rebuild from scratch with a pre-populated sstate cache!

12 thoughts on “Yocto: sharing the sstate cache and download directories”

  1. Cheers! I’ve always avoided sharing the sstate cache between projects; I’ll start now. Besides the single-PC case, that can be a great optimization for CI too.

  2. “Short answer is: it is safe”

    I don’t think it’s OK to state here that it’s safe to share sstate in general.
    I agree if you take care that there is always only one build running at a time.

    But our servers run multiple instances of different branches with different states of the project. Here I would say sharing downloads is OK, but sstate must be separated per running instance.

    1. It is safe, unless you are using a very old release.

      The yocto autobuilders share their sstate and do a huge amount of builds, running on many builders having several different distributions, and building dunfell, kirkstone, mickledore, nanbield and master, all in parallel. This is definitely largely tested!

    2. Sharing downloads is not OK, as one workspace may remove downloads with clean or cleanall while another workspace thinks they are still present (due to sstate remembering that do_fetch is complete).

  3. I’ve come across a case where it isn’t safe, if DL_DIR is shared between two workspaces:

    A $ ./scx bitbake zstd -c fetch
    B $ ./scx bitbake zstd -c cleanall
    A $ ./scx bitbake zstd -c unpack

    ERROR: zstd-1.4.5-r0 do_unpack: Unpack failure for URL: ‘git://github.com/facebook/zstd.git;nobranch=1’. No up to date source found: clone directory not available or not up to date: /share/dl_dir/sources/git2/github.com.facebook.zstd.git; shallow clone not enabled

    What it really means is: file not found: /share/dl_dir/sources/git2/github.com.facebook.zstd.git

    1. Hi Sam,

      thanks for reporting your test. I tested with two clones of https://github.com/bootlin/simplest-yocto-setup/ and without the “./scx” (whatever it is), and I confirm this sequence does fail:

      dir1$ bitbake zstd -c fetch
      dir2$ bitbake zstd -c cleanall
      dir1$ bitbake zstd -c unpack

      The reason this does not happen on the yocto autobuilders, which do use a shared sstate cache for multiple concurrent builds, is that they just do not delete any downloaded files.

      Week end time now. Next week I’m going to test on current master and, if it also fails, consider what to do (perhaps open a bug report).

      Luca

      1. I confirm this is happening also on master.

        This is actually unavoidable due to the structure of the tasks, and there is no easy/obvious fix. So I can say do_cleanall should just not be used with a shared DL_DIR. This is even documented in The Yocto Project Test Environment Manual (https://docs.yoctoproject.org/test-manual/intro.html?highlight=cleanall#considerations-when-writing-tests).

        However it is not mentioned in the do_cleanall task, so I have sent a patch to add this:
        https://docs.yoctoproject.org/test-manual/intro.html?highlight=cleanall#considerations-when-writing-tests

        I am going to update the blog post as soon as my patch to the documentation is merged (or rejected).

        1. I also note that this can happen without sharing download dirs — if any two recipes use the same SRC_URI.

          In fact it can occur with a single recipe having implicit variants, e.g. zstd and zstd-native

          cleanall on one, will leave the other thinking the source is available when it is not

          I think the general solution is perhaps for do_fetch sstate to include references to the downloaded sources so that these can be checked when deciding whether or not do_fetch needs re-running

          Of course the general solution is not guaranteed where there is concurrent action to remove the source between setscene and recipe invocation but it at least could allow self-repair on retry

        2. In addition to my other reply here awaiting moderation, I think this is a general problem that affects any task that has side-effects that are not part of sstate.

          DL_DIR downloads is just one, because presence is implied by sstate but sstate does not verify the presence of the file

          For me, spdx is another (I haven’t checked master yet) but we have the case where the final image fails because spdx output for recipes is missing. The do_compile or do_build is not re-run because it can be recovered from sstate but this does not dump the spdx files in the spdx dir.

          Another is the well-known case of images in deploy_dir: we all know that deleting deploy images won’t cause the image recipe to re-run, but that is also of particular interest for recipes that dockerize an image and have an image in the deploy dir as a SRC_URI

          So I wonder if the very general solution is for tasks to log side-effect files with the bitbake system which can then tell if those files are absent or changed to invalidate the sstate and force the task to be re-run. The side effect files are not stored with sstate, just their names and checksums.
