ccache has zero cache hits in GitLab CI, even when sources don't change and the cache is persisted.
Furthermore, the cache increases in size every time a build runs, which means it is being rebuilt over and over.
The issue was the default way ccache checks whether the compiler is the same: via its timestamp. Since each GitLab CI job runs in a brand-new Docker container, the compiler's timestamp is always different, so ccache never gets a hit and keeps adding new entries to the cache.
To fix this, set CCACHE_COMPILERCHECK to content instead of the default mtime.
From the ccache documentation:
compiler_check (CCACHE_COMPILERCHECK)
By default, ccache includes the modification time (“mtime”) and size of the compiler in the hash to ensure that results retrieved from the cache are accurate. This setting can be used to select another strategy. Possible values are:
content
Hash the content of the compiler binary. This makes ccache very slightly slower compared to the mtime setting, but makes it cope better with compiler upgrades during a build bootstrapping process.
mtime
Hash the compiler’s mtime and size, which is fast. This is the default.
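For example, exporting the variable before the build is enough; where exactly it goes (job variables, before_script, a wrapper script) depends on your pipeline layout:

# Hash the compiler binary's content rather than its mtime/size,
# so a fresh Docker container no longer invalidates the cache.
export CCACHE_COMPILERCHECK=content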
Related
I have a Yocto build based on Poky that inherits reproducible_build. This essentially sets BUILD_REPRODUCIBLE_BINARIES to "1", and REPRODUCIBLE_TIMESTAMP_ROOTFS to "1520598896", which is 12:34:56 on 9th March 2018 UTC.
In this build, I have a /www/index.html file that is created in the final image with an mtime automatically set to this same date. I'm using a third-party web server that uses the file's mtime to set the ETag for caching purposes. Unfortunately, because every build has the same timestamp, the server responds to the web client's If-None-Match HTTP request header with a 304 Not Modified response. This causes the client to show the index.html from the previous build, unless the user does a force-refresh (Ctrl+F5). What I'd like is for the new file to be downloaded and displayed to the user.
I would prefer not to disable reproducible builds for the entire image just because of one file, so I'm looking for alternatives.
Is it possible to direct bitbake to skip the effect of BUILD_REPRODUCIBLE_BINARIES for a single file when creating the final image? Ideally I'd like this file to have an mtime equal to the time at which it was actually built, or perhaps even to specify it programmatically (e.g. the time my pipeline was created).
Could you set REPRODUCIBLE_TIMESTAMP_ROOTFS to match your index file or otherwise set it to a time which works for your application?
As suggested by Richard Purdie, one workaround is to set REPRODUCIBLE_TIMESTAMP_ROOTFS to a value that uniquely represents the origination time of the build, such as the CI pipeline creation time. This would solve my immediate problem, but it does seem to defeat the whole point of BUILD_REPRODUCIBLE_BINARIES, and I'm not sure yet what the side effects may be.
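As a rough sketch of that workaround (assuming a GitLab pipeline and that the CI script can write conf/auto.conf before invoking bitbake), the ISO 8601 pipeline time can be converted to epoch seconds like this:

# Convert CI_PIPELINE_CREATED_AT (ISO 8601) to epoch seconds and override the rootfs timestamp
echo "REPRODUCIBLE_TIMESTAMP_ROOTFS = \"$(date -d "$CI_PIPELINE_CREATED_AT" +%s)\"" >> conf/auto.conf
bitbake core-image-minimal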
An alternative I tried was to create a core-image-minimal.bbappend for my image recipe that creates a new task that sets the file's mtime appropriately. It is important that this runs after the function reproducible_final_image_task(), which sets the mtime for all rootfs files, otherwise the effect will be overwritten.
This recipe is tailored for both local and GitLab-CI builds:
# Due to BUILD_REPRODUCIBLE_BINARIES (from 'inherit reproducible_build'), all files in the rootfs have the same
# mtime (REPRODUCIBLE_TIMESTAMP_ROOTFS). This causes a caching problem for web-served files like /www/index.html that change between builds.
# Local builds get the current time (UTC), pipeline builds get the value of CI_PIPELINE_CREATED_AT:
DEFAULT_BUILD_DATE ??= "${@time.strftime('%Y-%m-%dT%H:%M:%S', time.gmtime())}"
BUILD_DATE := "${@d.getVar('BB_ORIGENV').getVar('CI_PIPELINE_CREATED_AT') or '${DEFAULT_BUILD_DATE}'}"
do_fix_www_index_html_mtime() {
    touch --time=mtime --date "${BUILD_DATE}" ${IMAGE_ROOTFS}/www/index.html
}
# If CI_PIPELINE_CREATED_AT is not set in the environment, these variables will differ in value
# each time this recipe is parsed. This will result in the error:
# "The metadata is not deterministic and this needs to be fixed."
# Therefore, exclude them from the task's hash:
do_fix_www_index_html_mtime[vardepsexclude] = "DEFAULT_BUILD_DATE BUILD_DATE"
# Ensure that this task runs after the function 'reproducible_final_image_task', as this
# sets all files' mtimes in the rootfs to the value of REPRODUCIBLE_TIMESTAMP_ROOTFS.
addtask fix_www_index_html_mtime after do_image before do_image_complete
Note that using the ROOTFS_POSTPROCESS_COMMAND mechanism to invoke such a function is insufficient, as it seems to run each function before reproducible_final_image_task(), which is run by the do_image_complete task.
There is a command cargo cache which can be used to clean stuff from .cargo without just deleting the whole folder regularly (and thus having to re-download/build packages).
But either I don't get cargo cache --help right or this tool does not implement "remove all dependencies I don't need anymore based on all Cargo.toml which can be found".
Does it?
cargo cache takes some arguments which seem to provide a flexible way to get rid of stuff you don't need anymore. Unfortunately, on closer look, it doesn't seem to help me at all.
There is clean-unref but that seems to only work on a given Cargo.toml (what if you have a nested project with a couple of those?)
There is trim which removes older items until a max cache size limit is reached. But that way it keeps stuff I don't need and potentially removes stuff I still need. Correct?
There is --autoclean but that seems to remove everything?
There is --keep-duplicate-crates <N>, but that too seems to work only if you provide concrete directories.
There is --remove-if-older-than - but how do I know how old the oldest crates are I need? And it takes only "HH:MM:SS" or a date?!
And if you have a shared cache folder for different branches, with possibly different Cargo.toml files, none of these approaches seem to work at all.
Is there a way to say: "Please remove all crates that haven't been used in the last days"?
I'm close to just logging access to .cargo using inotify and removing unneeded content manually, but that doesn't feel right.
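To illustrate what I mean by removing things manually: something along these lines, assuming the filesystem records access times at all (atime/relatime), which is exactly the part I'd rather not rely on:

# Hypothetical manual cleanup: drop registry sources and .crate files
# that haven't been accessed in the last 30 days.
find ~/.cargo/registry/src ~/.cargo/registry/cache \
    -mindepth 2 -maxdepth 2 -atime +30 -exec rm -rf {} +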
This is an order of operations question.
Suppose I declare a list of requirements:
required := $(patsubst %.foo,%.bar, $(shell find * -name '*.foo'))
And a rule to make those requirements:
$(required):
	./foo.py $@
Finally, I invoke the work with:
make foo -j 10
Suppose further the job is taking days and days (up to a week on this slow desktop computer).
In order to speed things up, I'd like to generate a list of commands and do some of the work on the much faster laptop. I can't do all of the work on the laptop because, for whatever reason, it can't stay up for hours and hours without discharging and suspending (if I had to guess, probably due to thermal throttling):
make -n foo > outstanding_jobs
cat outstanding_jobs | sort -r | sponge outstanding_jobs
scp slow_box:outstanding_jobs fast_laptop:outstanding_jobs
ssh fast_laptop
head -n 200 outstanding_jobs | parallel -j 12
scp *.bar slow_box:.
The question is:
If I put *.bar in the directory where the original make job was run, will make still try to do that job on the slow box?
OR do I have to halt the job on the slow box and re-invoke make to "get credit" in the make recipe for the new work that I've synced over onto the slow box?
NOTE: substantially revised.
Before it starts building anything, make constructs a dependency graph to guide it, based on an analysis of the requested goal(s), the applicable build rules, and, to some extent, the files already present. It then walks the graph, starting from the goal nodes, to determine which are out of date with respect to their prerequisites and update them.
Although it does not necessarily evaluate the whole graph before running any recipes, once it decides that a given target needs to be updated, make is committed to updating it. In particular, once make decides that some direct or indirect prerequisite of T is out of date, it is committed to (re)building T, too, regardless of any subsequent action on T by another process.
So, ...
If I put *.bar in the directory where the original make job was run,
will make still try to do that job on the slow box?
Adding files to the build directory after make starts building things will not necessarily affect which targets the ongoing make run will attempt to build, nor which recipes it uses to build them. The nearer a target is to a root of the dependency graph, the less likely it is that the approach described will affect whether make performs a rebuild, especially if you're running a parallel make.
It's possible that you would see some time savings, but you must also consider the possibility that you end up with an inconsistent build.
OR do I have to halt the job on the slow box and re-invoke make to "get credit" in the make recipe for the new work that I've synced over onto the slow box?
If the possibility of an inconsistent build can be discounted, then that is probably a viable option. A new make run will take the then-existing files into account. Depending on the defined rules and the applicable timestamps, it is still possible that some targets would be rebuilt that did not really need to be, but unless the makefile engages in unusual shenanigans, chances are good that at least most of the built files imported from the helper machine will be accepted and used without rebuilding.
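For what it's worth, the second option looks roughly like this in practice (a sketch; the paths are illustrative and it assumes the pattern rule from the question):

# on the slow box: stop the running make, import the results, then restart
scp 'fast_laptop:workdir/*.bar' .
touch *.bar        # ensure the imported targets are newer than their .foo prerequisites
make foo -j 10     # make now treats those .bar files as up to date and skips their recipes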
When linking executables (more than 200) in a large project, I get a link rate of 0.5 executables per second, even if I ran the link stage a minute before. vmstat shows a disk read rate of more than 20 MB/s.
But if I pre-cache the build directory once using "tar cf /dev/null build-dir", I get a consistent link rate of 4.8 executables per second and the disk read rate is basically zero.
Why doesn't Linux cache the object files and/or ".so" files when they are read by the GNU linker, but does so when they are read by tar? There is plenty of RAM (16 GB). The kernel version is 4.4.146, on CentOS 7.5.
It looks like an incorrect setting of vm.vfs_cache_pressure = 1000 was causing this misbehaviour. Setting it to 70 fixed the problem and restored good cache performance.
And the documentation explicitly recommends against increasing the value beyond 100. Unfortunately, the Internet is full of examples with insane values like 1000.
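For reference, the setting can be changed at runtime and persisted with standard sysctl usage (nothing here is specific to this workload):

# apply immediately (as root)
sysctl -w vm.vfs_cache_pressure=70
# persist across reboots
echo 'vm.vfs_cache_pressure = 70' >> /etc/sysctl.conf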
I apologize if this question has already been asked; it's not easy to search for.
make has been designed with the assumption that the Makefile is kinda god-like: all-knowing about the future of your project, never needing any modification besides adding new source files. Which is obviously not true.
I used to make all my targets in a Makefile depend on the Makefile itself. So that if I change anything in the Makefile, the whole project is rebuilt.
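Concretely, that looked something like this (a minimal sketch):

# every object depends on the Makefile itself, so any edit to it forces a full rebuild
%.o: %.c Makefile
	$(CC) $(CFLAGS) -c $< -o $@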
This has two main limitations:
It rebuilds too often. Adding a linker option or a new source file rebuilds everything.
It won't rebuild if I pass a variable on the command line, like make CFLAGS=-O3.
I see a few ways of doing it correctly, but none of them seems satisfactory at first glance.
Make every target depend on a file that contains the content of the recipe.
Generate the whole rule with its recipe into a file destined to be included from the Makefile.
Conditionally add a dependency to the targets to force them being rebuilt whenever necessary.
Use the eval function to generate the rules.
But all these solutions need an uncommon way of writing the recipes: either putting the whole rule as a string in a variable, or wrapping the recipes in a function that does some magic.
What I'm looking for is a solution to write the rules in a way as straightforward as possible. With as little additional junk as possible. How do people usually do this?
I have projects that compile for multiple platforms. When building a single project that had previously been compiled for a different architecture, one can force a rebuild manually. However, when compiling all projects for OpenWRT, manual cleanup is unmanageable.
My solution was to create a marker identifying the platform. If missing, everything will recompile.
ARCH ?= $(shell uname -m)
CROSS ?= $(shell uname -s).$(ARCH)
# marker for the last built architecture
BUILT_MARKER := out/$(CROSS).built
$(BUILT_MARKER) :
	@-rm -f out/*.built
	@touch $(BUILT_MARKER)
build: $(BUILT_MARKER)
# TODO: add your build commands here
If your flags are too long, you may reduce them to a checksum.
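For example (a sketch of the checksum idea; adjust the variable list to whatever actually affects your build):

# fold the relevant flags into the marker name, so a change in flags produces a new marker
FLAG_SUM := $(shell printf '%s' '$(CROSS) $(CFLAGS) $(LDFLAGS)' | md5sum | cut -d' ' -f1)
BUILT_MARKER := out/$(CROSS).$(FLAG_SUM).built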
"make has been designed with the assumption that the Makefile is kinda god-like. It is all-knowing about the future of your project and will never need any modification beside adding new source files."
I disagree. make was designed in a time when having your source tree sitting in a hierarchical file system was about all you needed to know about software configuration management, and it took this idea to its logical consequence, namely that all that is, is a file (with a timestamp). So, having linker options, locator tables, compiler flags and everything else but the kitchen sink in a file, and putting the dependencies thereof also in a file, will yield a consistent, complete and error-free build environment as far as make is concerned.
This means that passing data to a process (which is nothing else than saying that this process depends on that data) has to be done via a file - command line arguments as make variables are an abuse of make's capabilities and lead to erroneous results. make clean is the technical remedy for a systemic misbehaviour. It wouldn't be necessary had the software engineer designed the make process properly and correctly.
The problem is that a clean build process is hard to design and maintain. BUT: in a modern software process, transient/volatile build parameters such as make all CFLAGS=-O3 never have a place anyway, as they wreck all good foundations of configuration management.
The only thing that can be criticised about make may be that it isn't the be-all and end-all solution to software building. I question whether a program with that task would have reached even one percent of make's popularity.
TL;DR
Place your compiler/linker/locator options into separate files (at a central, prominent, easy-to-maintain, logical location), decide on the level of control through the granularity of that information (e.g. compiler flags in one file, linker flags in another), and put down the true dependencies for all files. Voilà: you will have exactly the necessary amount of compilation and a correct build.
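A minimal sketch of what that looks like in practice (file and variable names are just examples):

# compiler and linker options live in files; targets depend on those files
OBJS    := $(patsubst %.c,%.o,$(wildcard *.c))
CFLAGS  := $(shell cat config/cflags.txt)
LDFLAGS := $(shell cat config/ldflags.txt)

# objects are rebuilt when the compiler-flags file changes
%.o: %.c config/cflags.txt
	$(CC) $(CFLAGS) -c $< -o $@

# the link step is redone when the linker-flags file changes
app: $(OBJS) config/ldflags.txt
	$(CC) $(OBJS) $(LDFLAGS) -o $@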