Skipping cache generation, cache already exists for key - caching

I am using CircleCI (version: 2.1) for continuous deployment, with caching of the installed dependencies. From the save_cache documentation:
Generates and stores a cache of a file or directory of files such as dependencies or source code in our object storage. Later jobs can restore this cache.
Current scenario:
See the simplified caching step below in .circleci/config.yml file:
steps:
  - node/with-cache:
      steps:
        - checkout
        - run: npm install
        - save_cache:
            key: dependencies
            paths: node_modules
The problem appears once a new package is added to the project and package.json changes. At that point CircleCI shows this message for the Saving Cache step:
Skipping cache generation, cache already exists for key: dependencies
Found one created at 2020-05-23 19:29:29 +0000 UTC
Then, when the cache is restored, the build step obviously does not find the newly added package:
./src/index.tsx
Cannot find module: 'package-name'. Make sure this package is installed.
Questions:
Is there any way to check package.json changes in the pipeline? Ideally I would install the dependencies only in those cases, so the cache can be purged and updated.
Maybe I did not see something in the documentation. Any help is appreciated, thank you!

The problem is that the cache key you used is "dependencies", a plain string. This key never changes, and CircleCI does not overwrite an existing cache for a key, so you will always restore the same cache.
You need to use a cache key that changes, preferably based on package-lock.json. Please read the section on cache keys in the CircleCI docs for more information: https://circleci.com/docs/2.0/caching/#using-keys-and-templates
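A minimal sketch of such a key, assuming package-lock.json is committed to the repository. The dependencies- prefix is only an illustrative name, and the plain steps below stand in for the node/with-cache wrapper from the question. The checksum template makes the key change whenever the lock file changes, so a fresh cache is saved after a package is added:

```yaml
steps:
  - checkout
  # Restore a cache only if one exists for the current lock file contents.
  - restore_cache:
      keys:
        - dependencies-{{ checksum "package-lock.json" }}
  - run: npm install
  # A changed lock file yields a new key, so this step is no longer skipped.
  - save_cache:
      key: dependencies-{{ checksum "package-lock.json" }}
      paths:
        - node_modules
```

When the lock file is unchanged, the key is the same, the save step is skipped as before, and the restored node_modules is still valid.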

Related

How to deal with yarn's `.pnp.js` merge conflicts?

Using Yarn 2's new Plug'n'Play (PnP) creates a long .pnp.js file. I get a bunch of merge conflicts while pulling, and these are not auto-fixed (unlike yarn.lock).
How are these conflicts solved? I'd rather not go through them manually, as it's not clear which change to accept.
Example conflict
["virtual:844e49f9c8ad85b5809b347eb507fe8bfdc2d527102f53e0b4f78076a2ad5ea2556763170701137a2cafdc51d5a36d82e448010e65742a300748e0bc70028101#npm:11.2.7", {
  "packageLocation": "./.yarn/$$virtual/@testing-library-react-virtual-2e67fd5293/0/cache/@testing-library-react-npm-11.2.7-3a0469c756-389c9f3e83.zip/node_modules/@testing-library/react/",
  "packageDependencies": [
    ["@testing-library/react", "virtual:844e49f9c8ad85b5809b347eb507fe8bfdc2d527102f53e0b4f78076a2ad5ea2556763170701137a2cafdc51d5a36d82e448010e65742a300748e0bc70028101#npm:11.2.7"],
    ["@babel/runtime", "npm:7.13.10"],
<<<<<<< HEAD
    ["@testing-library/dom", "npm:7.30.4"],
    ["@types/react", "npm:17.0.3"],
    ["@types/react-dom", "npm:17.0.3"],
=======
    ["@testing-library/dom", "npm:7.31.0"],
    ["@types/react", "npm:17.0.8"],
    ["@types/react-dom", "npm:17.0.5"],
>>>>>>> d2bb5d9e537f9647e9757656de230e56282e0b15
    ["react", "npm:17.0.2"],
I would assume you can delete the file containing the merge conflicts.
Next, run yarn install, which will generate this file again.
Or just run yarn install, which will overwrite the .pnp.cjs file and fix the merge conflicts (if any) in the yarn.lock file for you.
From the docs:
The generated .pnp.cjs file can be committed to your repository
as part of the Zero-Installs effort, removing the need to run yarn install in the first place.
As you can read, this file can, but need not, be committed. However, if you commit it, you can use all your dependencies immediately after cloning the repo, switching branches, and so on, without needing to run yarn install every time.
Note that the same does not apply to the yarn.lock file, which you should never delete.

Is there a way to have GitLab Cache be consumed without being written to?

I have a gitlab job that downloads a bunch of dependencies and stuffs them in a cache (if necessary), then I have a bunch of jobs that use that cache. I notice at the end of the consuming jobs, they spend a bunch of time creating a new cache, even though they made no changes to it.
Is it possible to have them act only as consumers? Read-only?
cache:
  paths:
    - assets/
configure:
  stage: .pre
  script:
    - conda env update --prefix ./assets/env/base -f ./environment.yml;
    - source activate ./assets/env/base
    - bash ./download.sh
parse1:
  stage: build
  script:
    - source activate ./assets/env/base;
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build
parse2:
  stage: build
  script:
    - source activate ./assets/env/base;
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build
In the very detailed .gitlab-ci.yml documentation is a reference to a cache setting called policy. GitLab caches have the concept of push (aka write) and pull (aka read). By default it is set to pull-push (read at the beginning and write at the end).
If you know the job does not alter the cached files, you can skip the upload step by setting policy: pull in the job specification. Typically, this would be twinned with an ordinary cache job at an earlier stage to ensure the cache is updated from time to time:
.gitlab-ci.yml > cache:policy
Which pretty much describes this situation: the job configure updates the cache, and the parse jobs do not alter the cache.
In the consuming jobs, add:
cache:
  paths:
    - assets/
  policy: pull
For clarity, it probably wouldn't hurt to make that explicit in the global setting:
cache:
  paths:
    - assets/
  policy: pull-push
TL;DR: Override the cache setting with no paths element.
You probably have to add a key element to your global cache configuration too. I have actually never used a cache without a key element.
See the GitLab cache documentation.
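One way to combine a keyed global cache with read-only consumers is a YAML anchor, sketched below. The key name assets-cache is a hypothetical choice, and the scripts are shortened from the question; only the cache sections matter here:

```yaml
# Global definition: jobs that do not override it read and write the cache.
cache: &global_cache
  key: assets-cache        # hypothetical key name
  paths:
    - assets/
  policy: pull-push

configure:
  stage: .pre
  script:
    - bash ./download.sh

parse1:
  stage: build
  cache:
    <<: *global_cache      # reuse key and paths from the global definition
    policy: pull           # download only, skip the upload at the end
  script:
    - ./build.sh -b test -s 2
```

The merge key keeps the key and paths in one place, so the consuming job only states what differs, namely the policy.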

GitLab CI cache with multiple paths seems to skip a path

I am configuring a GitLab CI pipeline in which two jobs in the install stage pull dependencies into cached locations. A job in a later stage then tries to access these locations, but only one seems to exist.
I've built the CI configuration according to the Python example provided by GitLab.
My .gitlab-ci.yml file looks like this.
---
cache:
  paths:
    - foo-1
    - foo-2
stages:
  - install
  - test
install_foo-1_dependencies:
  stage: install
  script:
    - pull foo-1 dependencies
install_foo-2_dependencies:
  stage: install
  script:
    - pull foo-2 dependencies
  tags:
    - ansible-f5-runner
test_dependencies:
  stage: test
  script:
    - ls foo-1
    - ls foo-2
The output of install_foo-1_dependencies and install_foo-2_dependencies clearly shows the cache being created. However, when you look at the output of test_dependencies, it seems that only foo-1 is present in the cache.
install_foo-1_dependencies output:
Fetching changes...
Removing foo-1/
Checking cache for default-5...
Successfully extracted cache
Creating cache default-5...
....
foo-1: found 1000 matching files
Created cache
install_foo-2_dependencies output:
Fetching changes...
Removing installed-roles/
Checking cache for default-5...
Successfully extracted cache
Creating cache default-5...
....
foo-2: found 1000 matching files
Created cache
Output for test_dependencies
Fetching changes...
Removing foo-1/
Checking cache for default-5...
....
Successfully extracted cache
$ ls foo-1
files
$ ls foo-2
ls: cannot access foo-2: No such file or directory
You need to ensure the same runner is used for each stage of this pipeline. From the docs:
Tip: Using the same Runner for your pipeline, is the most simple and efficient way to cache files in one stage or pipeline, and pass this cache to subsequent stages or pipelines in a guaranteed manner.
It's not apparent from your .gitlab-ci.yml file that you're ensuring the same runner picks up each stage. Again from these docs, to ensure that one runner is used, you should use one or a mix of the following:
Tag your Runners and use the tag on jobs that share their cache.
Use sticky Runners that will be only available to a particular project.
Use a key that fits your workflow (e.g., different caches on each branch).
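The tagging advice could look like the sketch below, which reuses the ansible-f5-runner tag from the question on every job that shares the cache. The script lines are the placeholders from the question, and whether one tag maps to exactly one runner depends on how the runners are registered, which is an assumption here:

```yaml
cache:
  paths:
    - foo-1
    - foo-2

install_foo-1_dependencies:
  stage: install
  script:
    - pull foo-1 dependencies
  tags:
    - ansible-f5-runner    # same tag on every job that shares the cache

install_foo-2_dependencies:
  stage: install
  script:
    - pull foo-2 dependencies
  tags:
    - ansible-f5-runner

test_dependencies:
  stage: test
  script:
    - ls foo-1
    - ls foo-2
  tags:
    - ansible-f5-runner
```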

Is it possible to rebuild only updated files in Gitlab CI?

I'm using this script for my Gitlab CI build stage (only relevant part is shown):
cache:
  key: "$CI_BUILD_REF"
  paths:
    - bin/
    - build/
build:
  image: <my_build_image>
  stage: build
  script:
    - "make PLATFORM='x86_64-linux-gnu' BUILD='release' JOBS=8 all"
  only:
    - master
    - tags
    - merge-requests
  artifacts:
    untracked: true
    paths:
      - bin/x86_64-linux-gnu/release
I thought that if I added the bin and build dirs to the cache, make wouldn't rebuild the whole project every time (just as it behaves locally), but it seems that the CI runner overwrites my src dir every time, so the timestamps on the files are updated too and make thinks each file has changed. I thought about including the src dir in the cache, but it's part of the repo and I'm not sure this is correct. So, what is the best way to rebuild a GitLab CI project using previously built binaries?
I see you are using $CI_BUILD_REF as a cache key; although this variable is deprecated, it seems to work and provides the commit's SHA1.
Is that really what you intended, to create separate caches per commit (not even per branch)?
So for any new commit there wouldn't be a cache anyways?
I'd probably even use a static cache key in order to maximize caching (while using minimal cache storage), or maybe per branch.
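As a sketch of the per-branch variant, the predefined variable CI_COMMIT_REF_SLUG can serve as the key, so all commits on one branch share a cache. A static key would simply be a fixed string instead; the name build-cache in the comment is a hypothetical example:

```yaml
cache:
  # One cache per branch or tag instead of one per commit.
  # For a single shared cache, use a fixed string such as "build-cache".
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - bin/
    - build/
```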
Maybe also the Git checkouts and/or branch switches touch the source files too often.
I have implemented a similar strategy in one of my projects, but there I have a distinct "cached" folder to where I /rsync/ the files from the checkout.
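A rough sketch of that rsync approach follows, under several assumptions: rsync is available in the build image, the Makefile works when invoked from the copied tree, and the directory name cached is arbitrary. Using -c without -t makes rsync compare file contents and leave unchanged files, including their old modification times, untouched:

```yaml
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - cached/

build:
  stage: build
  script:
    - mkdir -p cached
    # -r: recurse, -c: compare by checksum. No -t, so files whose content
    # is unchanged keep the timestamps restored from the cache.
    - rsync -rc --exclude=/cached --exclude=/.git ./ cached/
    # Build inside the cached copy so make sees stable timestamps.
    - make -C cached all
```

Files deleted from the repository are not removed from cached/ in this sketch; adding --delete would also remove the build outputs, so it is left out on purpose.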
The shared runners of Gitlab.com do seem to leave the file modification time intact when using a cache, and even on the main checkout.
I've put up a sample project with a CI job that demonstrates the fact at https://gitlab.com/hannibal218bc/test-build-cache-xtimes/-/jobs/1022404894 :
the job stats the directory's contents
creates a cached directory if it does not yet exist
copies the README.md file
"compiles" the file to a README.built file.
As you can see in the output, the modification timestamp of README.built is the run time of the previous job:
$ cd cached
$ stat README.* || true
File: README.built
Size: 146 Blocks: 16 IO Block: 4096 regular file
Device: 809h/2057d Inode: 2101510 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2021-02-10 23:06:13.000000000 +0000
Modify: 2021-02-10 23:02:39.000000000 +0000 <<< timestamp from previous job
Change: 2021-02-10 23:06:13.000000000 +0000

Cache gradle dependencies, Travis CI

I'm trying to cache the dependencies for a private Travis CI repository, does Travis have some mechanism specific for gradle, or do I have to cache specific directories?
.travis.yml:
language: groovy
jdk:
  - openjdk7
env:
  - TERM=dumb
before_install:
  - cd application
  - chmod +x gradlew
script:
  - ./gradlew build
Relevant parts of last working build:
Downloading https://services.gradle.org/distributions/gradle-2.1-bin.zip
......................................................................................................................................................................................
Unzipping /home/travis/.gradle/wrapper/dists/gradle-2.1-bin/2pk0g2l49n2sbne636fhtlet6a/gradle-2.1-bin.zip to /home/travis/.gradle/wrapper/dists/gradle-2.1-bin/2pk0g2l49n2sbne636fhtlet6a
Set executable permissions for: /home/travis/.gradle/wrapper/dists/gradle-2.1-bin/2pk0g2l49n2sbne636fhtlet6a/gradle-2.1/bin/gradle
Download https://jcenter.bintray.com/com/mycila/xmltool/xmltool/3.3/xmltool-3.3.pom
...
Would adding:
cache:
  directories:
    - $HOME/.gradle
work? Or perhaps:
cache:
  directories:
    - $HOME/.gradle/caches/modules-2/files-2.1
Add this to your .travis.yml:
before_cache:
  - rm -f $HOME/.gradle/caches/modules-2/modules-2.lock
  - rm -fr $HOME/.gradle/caches/*/plugin-resolution/
cache:
  directories:
    - $HOME/.gradle/caches/
    - $HOME/.gradle/wrapper/
It is documented in Travis documentation at https://docs.travis-ci.com/user/languages/java/#projects-using-gradle
You'll have to cache at least ~/.gradle/wrapper and ~/.gradle/caches, but I'd probably start out with ~/.gradle. (If necessary, the location of the latter can be changed by setting the GRADLE_USER_HOME environment variable). When upgrading to a newer Gradle version, the cache structure may change, so it might make sense to invalidate the cache from time to time.
PS: Please don't double-post here and on the Gradle forums (either is fine).
You should probably add sudo: false to your .travis.yml, because caching is otherwise not available for public repositories. It will prevent you from using sudo, setuid, and setgid, but it enables the caching mechanism.
However, I have found that caching $HOME/.gradle/caches is not a very good option, because the file $HOME/.gradle/caches/modules-2/modules-2.lock changes on every build, so Travis repacks the cache every time and uploads it in full. For me that is slower than downloading all my dependencies. So it may be better to specify something other than $HOME/.gradle/caches.
I just added the following folders:
- $HOME/.gradle/wrapper
- $HOME/.gradle/native
- $HOME/.gradle/daemon
- $HOME/.gradle/caches/jars-1
- $HOME/.gradle/caches/2.3
Adding the .gradle/caches will create a new cache file every build.
Don't forget to change 2.3 to your gradle version.
You just have to add the lines below into your .travis.yml :
before_cache:
  - rm -f $HOME/.gradle/caches/modules-2/modules-2.lock
cache:
  directories:
    - $HOME/.gradle/caches/
    - $HOME/.gradle/wrapper/
You can obtain more information here.
As of version 3.5.1, the simplest and most effective way is to cache just the caches/modules-2 and wrapper directories. Caching the whole caches directory adds too many files and causes a greater delay. You still need to delete the modules-2.lock file.
before_cache:
  - rm -rf $HOME/.gradle/caches/modules-2/modules-2.lock
cache:
  directories:
    - $HOME/.gradle/caches/modules-2
    - $HOME/.gradle/wrapper/
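Put together with the configuration from the question, a complete .travis.yml might look like the sketch below. It only combines pieces already shown in this thread and has not been verified against a particular Gradle or Travis version:

```yaml
language: groovy
jdk:
  - openjdk7
env:
  - TERM=dumb
before_install:
  - cd application
  - chmod +x gradlew
script:
  - ./gradlew build
# Remove the lock file that changes on every build, so an otherwise
# unchanged cache is not repacked and uploaded again.
before_cache:
  - rm -f $HOME/.gradle/caches/modules-2/modules-2.lock
cache:
  directories:
    - $HOME/.gradle/caches/modules-2
    - $HOME/.gradle/wrapper/
```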
