Cache gradle dependencies, Travis CI - gradle

I'm trying to cache the dependencies for a private Travis CI repository. Does Travis have a mechanism specific to Gradle, or do I have to cache specific directories?
.travis.yml:
language: groovy
jdk:
- openjdk7
env:
- TERM=dumb
before_install:
- cd application
- chmod +x gradlew
script:
- ./gradlew build
Relevant parts of the last working build:
Downloading https://services.gradle.org/distributions/gradle-2.1-bin.zip
......................................................................................................................................................................................
Unzipping /home/travis/.gradle/wrapper/dists/gradle-2.1-bin/2pk0g2l49n2sbne636fhtlet6a/gradle-2.1-bin.zip to /home/travis/.gradle/wrapper/dists/gradle-2.1-bin/2pk0g2l49n2sbne636fhtlet6a
Set executable permissions for: /home/travis/.gradle/wrapper/dists/gradle-2.1-bin/2pk0g2l49n2sbne636fhtlet6a/gradle-2.1/bin/gradle
Download https://jcenter.bintray.com/com/mycila/xmltool/xmltool/3.3/xmltool-3.3.pom
...
Would adding:
cache:
  directories:
    - $HOME/.gradle
work? Or perhaps:
cache:
  directories:
    - $HOME/.gradle/caches/modules-2/files-2.1

Add this to your .travis.yml:
before_cache:
  - rm -f $HOME/.gradle/caches/modules-2/modules-2.lock
  - rm -fr $HOME/.gradle/caches/*/plugin-resolution/
cache:
  directories:
    - $HOME/.gradle/caches/
    - $HOME/.gradle/wrapper/
It is documented in the Travis documentation at https://docs.travis-ci.com/user/languages/java/#projects-using-gradle
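The before_cache step above can be mirrored in a short Python sketch, for example to check locally which files would be dropped before the cache is archived. The function name clean_gradle_cache and the throwaway directory layout in the demo (including the 4.10 version folder) are illustrative assumptions, not part of Travis CI; the two deletions correspond to the two rm commands.

```python
import glob
import os
import shutil
import tempfile


def clean_gradle_cache(gradle_home: str) -> list:
    """Delete files that change on every build, mirroring the before_cache step.

    Returns the list of paths that were removed.
    """
    removed = []
    # Equivalent of: rm -f $HOME/.gradle/caches/modules-2/modules-2.lock
    lock = os.path.join(gradle_home, "caches", "modules-2", "modules-2.lock")
    if os.path.isfile(lock):
        os.remove(lock)
        removed.append(lock)
    # Equivalent of: rm -fr $HOME/.gradle/caches/*/plugin-resolution/
    pattern = os.path.join(gradle_home, "caches", "*", "plugin-resolution")
    for path in sorted(glob.glob(pattern)):
        shutil.rmtree(path, ignore_errors=True)
        removed.append(path)
    return removed


if __name__ == "__main__":
    # Build a throwaway layout resembling ~/.gradle and clean it.
    with tempfile.TemporaryDirectory() as home:
        os.makedirs(os.path.join(home, "caches", "modules-2", "files-2.1"))
        open(os.path.join(home, "caches", "modules-2", "modules-2.lock"), "w").close()
        os.makedirs(os.path.join(home, "caches", "4.10", "plugin-resolution"))
        for path in clean_gradle_cache(home):
            print("removed", os.path.relpath(path, home))
```

The downloaded artifacts under files-2.1 are left alone, which is the point: only the files that change on every run are removed.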

You'll have to cache at least ~/.gradle/wrapper and ~/.gradle/caches, but I'd probably start out with ~/.gradle. (If necessary, the location of the latter can be changed by setting the GRADLE_USER_HOME environment variable). When upgrading to a newer Gradle version, the cache structure may change, so it might make sense to invalidate the cache from time to time.
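The GRADLE_USER_HOME fallback can be sketched as a small helper that resolves the Gradle user home and returns the two directories to cache. gradle_user_home and cache_directories are hypothetical names, and the sketch assumes only the environment variable is used (Gradle also accepts a --gradle-user-home command-line option, which it ignores).

```python
import os
from typing import List, Mapping, Optional


def gradle_user_home(env: Optional[Mapping[str, str]] = None) -> str:
    """Return $GRADLE_USER_HOME if it is set and non-empty, otherwise ~/.gradle."""
    env = os.environ if env is None else env
    custom = env.get("GRADLE_USER_HOME")
    if custom:
        return custom
    return os.path.join(os.path.expanduser("~"), ".gradle")


def cache_directories(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """The minimum set to cache: wrapper/ and caches/ under the Gradle user home."""
    home = gradle_user_home(env)
    return [os.path.join(home, "wrapper"), os.path.join(home, "caches")]


if __name__ == "__main__":
    for directory in cache_directories():
        print(directory)
```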
PS: Please don't double-post here and on the Gradle forums (either is fine).

You should probably add sudo: false to your .travis.yml, because at the time caching for public repositories was only available on the container-based infrastructure. It prevents you from using sudo, setuid and setgid, but it enables the caching mechanism!
But I have found that caching $HOME/.gradle/caches is not a very good option, because the file $HOME/.gradle/caches/modules-2/modules-2.lock changes on every build, so Travis repacks the cache and uploads it in full every time. For me that is slower than downloading all my dependencies, so it may be better to cache something more specific than $HOME/.gradle/caches.
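To see why a lock file that changes on every build forces a full re-upload, here is a simplified model. It assumes, as a simplification, that the CI compares a checksum of the cached directory before and after the build and uploads only when the two differ; the MD5-over-paths-and-contents scheme is an illustration, not Travis CI's exact algorithm. The exclude parameter models deleting the lock file in before_cache.

```python
import hashlib
import os
import tempfile


def fingerprint(root: str, exclude: frozenset = frozenset()) -> str:
    """MD5 over the relative path and contents of every file under root.

    Relative paths listed in `exclude` are skipped, which models deleting
    those files in before_cache.
    """
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a deterministic order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if rel in exclude:
                continue
            digest.update(rel.encode())
            digest.update(b"\0")  # separate the path from the contents
            with open(full, "rb") as fh:
                digest.update(fh.read())
    return digest.hexdigest()


LOCK = os.path.join("modules-2", "modules-2.lock")

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as caches:
        os.makedirs(os.path.join(caches, "modules-2", "files-2.1"))
        with open(os.path.join(caches, "modules-2", "files-2.1", "lib.jar"), "wb") as fh:
            fh.write(b"jar bytes")  # dependencies: unchanged between builds
        for build in (1, 2):
            with open(os.path.join(caches, LOCK), "w") as fh:
                fh.write(f"build {build}")  # the lock file changes every build
            print(build, fingerprint(caches), fingerprint(caches, frozenset({LOCK})))
```

With the lock file included, the fingerprint differs between the two builds even though no dependency changed; with it excluded, the fingerprint stays the same.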

I just added the following folders:
- $HOME/.gradle/wrapper
- $HOME/.gradle/native
- $HOME/.gradle/daemon
- $HOME/.gradle/caches/jars-1
- $HOME/.gradle/caches/2.3
Caching the whole .gradle/caches directory will create a new cache archive on every build.
Don't forget to change 2.3 to your Gradle version.

You just have to add the lines below to your .travis.yml:
before_cache:
  - rm -f $HOME/.gradle/caches/modules-2/modules-2.lock
cache:
  directories:
    - $HOME/.gradle/caches/
    - $HOME/.gradle/wrapper/
You can obtain more information here.

As of version 3.5.1, the simplest and most effective way is to cache just the caches/modules-2 and wrapper directories. Caching the whole caches directory adds too many files and causes a longer delay. You still need to delete the modules-2.lock file.
before_cache:
  - rm -rf $HOME/.gradle/caches/modules-2/modules-2.lock
cache:
  directories:
    - $HOME/.gradle/caches/modules-2
    - $HOME/.gradle/wrapper/

Related

Is there a way to have GitLab Cache be consumed without being written to?

I have a gitlab job that downloads a bunch of dependencies and stuffs them in a cache (if necessary), then I have a bunch of jobs that use that cache. I notice at the end of the consuming jobs, they spend a bunch of time creating a new cache, even though they made no changes to it.
Is it possible to have them act only as consumers? Read-only?
cache:
  paths:
    - assets/
configure:
  stage: .pre
  script:
    - conda env update --prefix ./assets/env/base -f ./environment.yml;
    - source activate ./assets/env/base
    - bash ./download.sh
parse1:
  stage: build
  script:
    - source activate ./assets/env/base;
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build
parse2:
  stage: build
  script:
    - source activate ./assets/env/base;
    - ./build.sh -b test -s 2
  artifacts:
    paths:
      - build
In the very detailed .gitlab-ci.yml documentation there is a reference to a cache setting called policy. GitLab caches have the concept of push (aka write) and pull (aka read). By default the policy is pull-push (read at the beginning and write at the end).
If you know the job does not alter the cached files, you can skip the upload step by setting policy: pull in the job specification. Typically, this would be twinned with an ordinary cache job at an earlier stage to ensure the cache is updated from time to time:
.gitlab-ci.yml > cache:policy
Which pretty much describes this situation: the job configure updates the cache, and the parse jobs do not alter the cache.
In the consuming jobs, add:
cache:
  paths:
    - assets/
  policy: pull
For clarity, it probably wouldn't hurt to make that explicit in the global setting:
cache:
  paths:
    - assets/
  policy: pull-push
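The three cache:policy values can be summarized as a small lookup table. This is a minimal sketch of the semantics described above (pull-push downloads at the start and uploads at the end, pull only downloads, push only uploads); CacheSteps and cache_steps are hypothetical names used for illustration, not a GitLab API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheSteps:
    download_at_start: bool
    upload_at_end: bool


# (download at start, upload at end) for each cache:policy value.
POLICIES = {
    "pull-push": CacheSteps(True, True),  # the default
    "pull": CacheSteps(True, False),
    "push": CacheSteps(False, True),
}


def cache_steps(policy: str = "pull-push") -> CacheSteps:
    """Look up which cache transfers a job performs for a given policy."""
    try:
        return POLICIES[policy]
    except KeyError:
        raise ValueError(f"unknown cache policy: {policy!r}") from None


# The jobs from the question: configure keeps the default, the parse jobs only read.
JOBS = {"configure": "pull-push", "parse1": "pull", "parse2": "pull"}

if __name__ == "__main__":
    for job, policy in JOBS.items():
        steps = cache_steps(policy)
        print(f"{job}: download={steps.download_at_start} upload={steps.upload_at_end}")
```

With this setup only configure pays for the upload at the end of the job.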
TL;DR: overwrite the cache with no paths element.
You probably have to add a key element to your global cache configuration too; I have actually never used it without a key element.
See the cache documentation here

Skipping cache generation, cache already exists for key

I'm using CircleCI (version: 2.1) for continuous deployment, caching the installed dependencies based on the save_cache documentation:
Generates and stores a cache of a file or directory of files such as dependencies or source code in our object storage. Later jobs can restore this cache.
Current scenario:
See the simplified caching step below in .circleci/config.yml file:
steps:
  - node/with-cache:
      steps:
        - checkout
        - run: npm install
        - save_cache:
            key: dependencies
            paths: node_modules
The problem appears once I add a new package to the project, so the package.json file changes. At the same time, CircleCI shows this message for the Saving Cache step:
Skipping cache generation, cache already exists for key: dependencies
Found one created at 2020-05-23 19:29:29 +0000 UTC
Then, when the cache is restored, the build step obviously does not find the newly added package:
./src/index.tsx
Cannot find module: 'package-name'. Make sure this package is installed.
Questions:
Is there any way to check package.json changes in the pipeline? Ideally I would install the dependencies only in those cases, so the cache can be purged and updated.
Maybe I did not see something in the documentation. Any help is appreciated, thank you!
The problem is the cache key you used is "dependencies", a plain string. This key never changes, so you will always use the same exact cache.
You need to use a cache key that changes, preferably one based on package-lock.json. Please read the section on cache keys in the CircleCI docs for more information: https://circleci.com/docs/2.0/caching/#using-keys-and-templates
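A rough model of the behavior in the question: save_cache never overwrites an existing key, so a constant key is written once and then skipped forever, while a key that embeds a checksum of package-lock.json (in config, something like dependencies-{{ checksum "package-lock.json" }}) changes whenever the lock file does. The checksum here is a base64-encoded SHA-256, which is how the CircleCI docs describe the checksum template; the CacheStore class and the sample lock file contents are made up for illustration.

```python
import base64
import hashlib
from typing import Dict, Optional


def checksum(content: bytes) -> str:
    """Base64-encoded SHA-256 of a file's contents."""
    return base64.b64encode(hashlib.sha256(content).digest()).decode("ascii")


class CacheStore:
    """Keys are immutable: saving under an existing key is skipped, like save_cache."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def save(self, key: str, payload: str) -> bool:
        if key in self._entries:
            return False  # "Skipping cache generation, cache already exists for key"
        self._entries[key] = payload
        return True

    def restore(self, key: str) -> Optional[str]:
        return self._entries.get(key)


# Made-up lock file contents before and after adding a package.
LOCK_V1 = b'{"dependencies": {"react": {"version": "16.13.1"}}}'
LOCK_V2 = b'{"dependencies": {"package-name": {"version": "1.0.0"}, "react": {"version": "16.13.1"}}}'

if __name__ == "__main__":
    static, keyed = CacheStore(), CacheStore()
    for lock in (LOCK_V1, LOCK_V2):
        installed = f"node_modules for {hashlib.sha256(lock).hexdigest()[:8]}"
        static.save("dependencies", installed)
        key = "dependencies-" + checksum(lock)
        keyed.save(key, installed)
        print("static key restores:", static.restore("dependencies"))
        print("checksum key restores:", keyed.restore(key))
```

On the second iteration the static key still restores the first node_modules, which is exactly the stale cache from the question.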

How can I run dependency install job only when it's not cached or package.json changed in gitlab ci?

I have a monorepo in gitlab with angular frontend and nestjs backend. I have package.json for each of them and 1 in the root. My pipeline consists of multiple stages like these:
stages:
- build
- verify
- test
- deploy
And I have a job in the .pre stage which installs dependencies. I would like to cache them between jobs and also between branches, and reinstall them if any package-lock.json changed, but also if there are currently no cached node_modules.
I have a job that looks like this:
prepare:
  stage: .pre
  script:
    - npm run ci-deps # runs npm ci in each folder
  cache:
    key: $CI_PROJECT_ID
    paths:
      - node_modules/
      - frontend/node_modules/
      - backend/node_modules/
  only:
    changes:
      - '**/package-lock.json'
The problem with this is that if the cache was somehow cleared, or if my first push didn't include changes to package-lock.json, this job won't run at all, and everything else will fail because it requires node_modules. If I remove changes:, the job runs in every pipeline. I can still share the result between jobs, but every further commit and push spends almost 2 minutes installing all the dependencies, even though nothing about them changed... Am I missing something? How can I cache them so that dependencies are reinstalled only if the cache is outdated or doesn't exist?
rules:exists is evaluated before the cache is pulled down, so it was not a workable solution for me.
In GitLab v12.5 we can now use cache:key:files.
If we combine that with part of Blind Despair's conditional logic, we get a nicely working solution:
prepare:
  stage: .pre
  image: node:12
  script:
    - if [[ ! -d node_modules ]];
      then
        npm ci;
      fi
  cache:
    key:
      files:
        - package-lock.json
      prefix: nm-$CI_PROJECT_NAME
    paths:
      - node_modules/
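The combination works because a changed lock file produces a new cache key, a new key means a cache miss, and a cache miss means node_modules is missing so npm ci runs. The simulation below makes that chain explicit. As a simplification it derives the key from a hash of the lock file's contents; GitLab documents cache:key:files as using the SHAs of the most recent commits that changed the listed files, so treat the hashing detail as an assumption while the hit/miss pattern stays the same. The prefix nm-myproject is a hypothetical project name.

```python
import hashlib
from typing import Dict


def cache_key(prefix: str, lock_content: bytes) -> str:
    # Simplification: GitLab derives cache:key:files keys from commit SHAs,
    # not from file contents; here a content hash stands in for that.
    return f"{prefix}-{hashlib.sha1(lock_content).hexdigest()}"


def run_prepare(cache: Dict[str, str], prefix: str, lock_content: bytes) -> bool:
    """Simulate one prepare job; return True if `npm ci` had to run."""
    key = cache_key(prefix, lock_content)
    if key in cache:
        # Cache hit: node_modules/ is restored, so `[[ ! -d node_modules ]]` is false.
        return False
    # Cache miss: no node_modules/, so npm ci runs and the result is uploaded.
    cache[key] = "node_modules/"
    return True


if __name__ == "__main__":
    cache: Dict[str, str] = {}
    for lock in (b"v1", b"v1", b"v2", b"v2"):
        print(lock.decode(), "npm ci ran:", run_prepare(cache, "nm-myproject", lock))
```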
We can then use this in subsequent build jobs
# let's keep it DRY with templates
.use_cached_node_modules: &use_cached_node_modules
  cache:
    key:
      files:
        - package-lock.json
      prefix: nm-$CI_PROJECT_NAME
    paths:
      - node_modules/
    policy: pull # don't push unnecessarily
build:
  <<: *use_cached_node_modules
  stage: build
  image: node:12
  script:
    - npm run build
We use this successfully across multiple branches with a shared cache.
In the end I figured that I could do this without relying on gitlab ci features, but do my own checks like so:
prepare:
  stage: .pre
  image: node:12
  script:
    - if [[ ! -d node_modules ]] || git diff --name-only HEAD~1 HEAD | grep -qx "package.json";
      then
        npm ci;
      fi
    - if [[ ! -d frontend/node_modules ]] || git diff --name-only HEAD~1 HEAD | grep -qx "frontend/package.json";
      then
        npm run ci-deps:frontend;
      fi
    - if [[ ! -d backend/node_modules ]] || git diff --name-only HEAD~1 HEAD | grep -qx "backend/package.json";
      then
        npm run ci-deps:backend;
      fi
  cache:
    key: '$CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR'
    paths:
      - node_modules/
      - frontend/node_modules
      - backend/node_modules
The good thing about this is that it only installs dependencies for a specific part of the project if that part doesn't have node_modules yet or if its package.json changed. However, this will probably be wrong if I push multiple commits and package.json changed in one other than the last. In that case I can still clear the cache and rerun the pipeline manually, but I will try to further improve my script and update my answer.
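The per-folder decision in the script can be expressed as a small pure function, which makes it easy to check against different git diff outputs. This is a sketch under stated assumptions: changed_paths stands for the lines printed by git diff --name-only, existing_dirs for the node_modules folders restored from the cache, and the PROJECTS mapping simply reuses the npm scripts named in the answer. It compares whole paths, so a change to frontend/package.json does not also trigger the root install.

```python
from typing import Iterable, List, Set

# Sub-project folder -> install command, reusing the npm scripts from the answer.
PROJECTS = {
    "": "npm ci",
    "frontend": "npm run ci-deps:frontend",
    "backend": "npm run ci-deps:backend",
}


def commands_to_run(changed_paths: Iterable[str], existing_dirs: Set[str]) -> List[str]:
    """Return the install commands needed for this pipeline.

    changed_paths: lines printed by `git diff --name-only`.
    existing_dirs: node_modules folders present after the cache was restored.
    """
    changed = {p.strip() for p in changed_paths if p.strip()}
    commands = []
    for folder, command in PROJECTS.items():
        manifest = f"{folder}/package.json" if folder else "package.json"
        modules = f"{folder}/node_modules" if folder else "node_modules"
        # Whole-path comparison: "frontend/package.json" does not match "package.json".
        if modules not in existing_dirs or manifest in changed:
            commands.append(command)
    return commands


if __name__ == "__main__":
    restored = {"node_modules", "frontend/node_modules", "backend/node_modules"}
    print(commands_to_run(["frontend/package.json", "README.md"], restored))
```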
I had the same problem, and I was able to solve it by using the rules keyword instead of only/except. With it you can declare more complex cases, using if, exists and changes, for example. Also:
Rules can't be used in combination with only/except because it is a replacement for that functionality. If you attempt to do this, the linter returns a key may not be used with rules error.
-- https://docs.gitlab.com/ee/ci/yaml/#rules
All the more reason to switch to rules. Here's my solution, which executes npm ci:
if the package-lock.json file was modified
OR
if the node_modules folder does not exist (in case of new branches or cache cleaning):
npm-ci:
  image: node:lts
  cache:
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - node_modules/
  script:
    - npm ci
  rules:
    - changes:
        - package-lock.json
    - exists:
        - node_modules
      when: never
Hope it helps!
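GitLab documents rules as being evaluated in order until the first match, and a job for which no rule matches is not added to the pipeline. A minimal model of that evaluation (evaluate_rules and the context dictionary are illustrative, not a GitLab API) shows how the two rules above combine. Under this model, a pipeline where package-lock.json is unchanged and node_modules is absent matches neither rule, so the job is skipped; a final catch-all entry such as - when: on_success would be needed to cover that case.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Context = Dict[str, Set[str]]
Rule = Tuple[Callable[[Context], bool], str]  # (condition, when)


def evaluate_rules(rules: List[Rule], context: Context) -> Optional[str]:
    """Return the `when` of the first matching rule, or None if the job is not added."""
    for condition, when in rules:
        if condition(context):
            return when
    return None


# The two rules from the answer, in order.
NPM_CI_RULES: List[Rule] = [
    (lambda ctx: "package-lock.json" in ctx["changed"], "on_success"),  # changes:
    (lambda ctx: "node_modules" in ctx["existing"], "never"),  # exists: ... when: never
]

if __name__ == "__main__":
    cases = {
        "lock file changed": {"changed": {"package-lock.json"}, "existing": set()},
        "node_modules present": {"changed": set(), "existing": {"node_modules"}},
        "neither": {"changed": set(), "existing": set()},
    }
    for name, ctx in cases.items():
        print(name, "->", evaluate_rules(NPM_CI_RULES, ctx))
```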

GitLab CI cache with multiple paths seems to skip a path

I'm configuring GitLab CI with 2 jobs in the install stage that pull dependencies into cached locations. A job in a later stage then tries to access these locations, but only one of them seems to exist.
I've built the CI configuration according to the Python example provided by GitLab.
My .gitlab-ci.yml file looks like this.
---
cache:
  paths:
    - foo-1
    - foo-2
stages:
  - install
  - test
install_foo-1_dependencies:
  stage: install
  script:
    - pull foo-1 dependencies
install_foo-2_dependencies:
  stage: install
  script:
    - pull foo-2 dependencies
  tags:
    - ansible-f5-runner
test_dependencies:
  stage: test
  script:
    - ls foo-1
    - ls foo-2
The output of install_foo-1_dependencies and install_foo-2_dependencies clearly shows the cache being created. However, the output of test_dependencies suggests that only the foo-1 cache exists.
install_foo-1_dependencies output:
Fetching changes...
Removing foo-1/
Checking cache for default-5...
Successfully extracted cache
Creating cache default-5...
....
foo-1: found 1000 matching files
Created cache
install_foo-2_dependencies output:
Fetching changes...
Removing installed-roles/
Checking cache for default-5...
Successfully extracted cache
Creating cache default-5...
....
foo-2: found 1000 matching files
Created cache
Output for test_dependencies
Fetching changes...
Removing foo-1/
Checking cache for default-5...
....
Successfully extracted cache
$ ls foo-1
files
$ ls foo-2
ls: cannot access foo-2: No such file or directory
You need to ensure the same runner is used for each stage of this pipeline. From the docs:
Tip: Using the same Runner for your pipeline, is the most simple and efficient way to cache files in one stage or pipeline, and pass this cache to subsequent stages or pipelines in a guaranteed manner.
It's not apparent from your .gitlab-ci.yml file that you're ensuring the same runner picks up each stage. Again from these docs, to ensure that one runner is used, you should use one or a mix of the following:
Tag your Runners and use the tag on jobs that share their cache.
Use sticky Runners that will be only available to a particular project.
Use a key that fits your workflow (e.g., different caches on each branch).
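The effect described in the answer can be reproduced with a toy model in which every runner keeps its own local cache (no shared or distributed cache configured). The runner names are hypothetical: runner-a stands for whichever runner picks up the untagged jobs, and the foo-2 job is tagged ansible-f5-runner as in the question. The second scenario assumes the jobs run one after another on a single runner.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set

CACHE_PATHS = {"foo-1", "foo-2"}  # cache:paths from the question


class Runner:
    """A runner that stores caches locally, with no shared cache between runners."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.caches: DefaultDict[str, Set[str]] = defaultdict(set)

    def run(self, key: str, creates: Iterable[str] = ()) -> Set[str]:
        """Extract the cache, run the script, upload the cache.

        Returns the paths present in the workspace while the script runs.
        """
        workspace = set(self.caches[key])  # "Successfully extracted cache"
        workspace |= set(creates)  # what the job's script produces
        self.caches[key] = workspace & CACHE_PATHS  # "Creating cache default-5..."
        return workspace


if __name__ == "__main__":
    # Scenario from the question: the tagged job lands on a different runner.
    runner_a, f5 = Runner("runner-a"), Runner("ansible-f5-runner")
    runner_a.run("default-5", creates={"foo-1"})
    f5.run("default-5", creates={"foo-2"})
    print("test job sees:", sorted(runner_a.run("default-5")))  # only foo-1

    # Same runner, jobs executed one after another.
    single = Runner("runner-a")
    single.run("default-5", creates={"foo-1"})
    single.run("default-5", creates={"foo-2"})
    print("test job sees:", sorted(single.run("default-5")))
```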

Is it possible to rebuild only updated files in Gitlab CI?

I'm using this script for my Gitlab CI build stage (only relevant part is shown):
cache:
  key: "$CI_BUILD_REF"
  paths:
    - bin/
    - build/
build:
  image: <my_build_image>
  stage: build
  script:
    - "make PLATFORM='x86_64-linux-gnu' BUILD='release' JOBS=8 all"
  only:
    - master
    - tags
    - merge-requests
  artifacts:
    untracked: true
    paths:
      - bin/x86_64-linux-gnu/release
I thought that if I added the bin and build dirs to the cache, make wouldn't rebuild the whole project every time (just as it behaves locally), but it seems that the CI runner overwrites my src dir every time, so the files' timestamps are updated too and make thinks every file has changed. I thought about including the src dir in the cache, but it's already in the repo and I'm not sure that is correct. So what is the best way to rebuild a GitLab CI project using previously built binaries?
I see you are using $CI_BUILD_REF as a cache key; although this variable is deprecated, it seems to work and provides the commit's SHA1.
Is that really what you intended, to create separate caches per commit (not even per branch)?
So for any new commit there wouldn't be a cache anyway?
I'd probably even use a static cache key in order to maximize caching (while using minimal cache storage), or maybe per branch.
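The effect of the key choice on cache hits can be sketched with a short simulation over a made-up build history. A per-commit key (like $CI_BUILD_REF) never hits, a per-branch key (for example $CI_COMMIT_REF_SLUG) hits on repeated builds of the same branch, and a static key hits on every build after the first. hit_rate and HISTORY are illustrative only, and the model assumes caches are never evicted.

```python
from typing import Callable, List, Tuple

Build = Tuple[str, str]  # (branch, commit SHA)


def hit_rate(history: List[Build], key_for: Callable[[Build], str]) -> float:
    """Fraction of builds that find a cache saved by an earlier build."""
    saved = set()
    hits = 0
    for build in history:
        key = key_for(build)
        if key in saved:
            hits += 1
        saved.add(key)  # every build saves its cache under its key
    return hits / len(history)


def per_commit(build: Build) -> str:
    return build[1]  # like key: "$CI_BUILD_REF"


def per_branch(build: Build) -> str:
    return build[0]  # like key: "$CI_COMMIT_REF_SLUG"


def static(build: Build) -> str:
    return "build-cache"  # one shared key


# Made-up build history in chronological order.
HISTORY = [("master", "a1"), ("master", "b2"), ("feature", "c3"), ("master", "d4"), ("feature", "e5")]

if __name__ == "__main__":
    for strategy in (per_commit, per_branch, static):
        print(strategy.__name__, hit_rate(HISTORY, strategy))
```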
Maybe also the Git checkouts and/or branch switches touch the source files too often.
I have implemented a similar strategy in one of my projects, but there I have a distinct "cached" folder into which I rsync the files from the checkout.
The shared runners of Gitlab.com do seem to leave the file modification time intact when using a cache, and even on the main checkout.
I've put up a sample project with a CI job that demonstrates this at https://gitlab.com/hannibal218bc/test-build-cache-xtimes/-/jobs/1022404894:
the job stats the directory's contents
creates a cached directory if it does not yet exist
copies the README.md file
"compiles" the file to a README.built file.
As you can see in the output, the modification timestamp of README.built is the run time of the previous job:
$ cd cached
$ stat README.* || true
File: README.built
Size: 146 Blocks: 16 IO Block: 4096 regular file
Device: 809h/2057d Inode: 2101510 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2021-02-10 23:06:13.000000000 +0000
Modify: 2021-02-10 23:02:39.000000000 +0000 <<< timestamp from previous job
Change: 2021-02-10 23:06:13.000000000 +0000
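make decides what to rebuild by comparing modification times: a target is rebuilt if it is missing or older than any of its prerequisites. The sketch below models that rule together with the copy-into-a-cached-folder idea from the answer. It assumes the copy step only overwrites files whose contents differ (the answer does not say which rsync options it uses), so unchanged sources keep their old timestamps and the cached README.built stays up to date. needs_rebuild and sync_if_changed are hypothetical names.

```python
import filecmp
import os
import shutil
import tempfile
from typing import List


def needs_rebuild(target: str, prerequisites: List[str]) -> bool:
    """make's rule: rebuild if the target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def sync_if_changed(src: str, dst: str) -> bool:
    """Copy src to dst only if their contents differ; return True if copied.

    An unchanged dst keeps its old mtime; a copied dst gets the current time.
    """
    if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
        return False
    shutil.copyfile(src, dst)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as job:
        src = os.path.join(job, "README.md")  # file from the fresh checkout
        cached = os.path.join(job, "cached")  # folder restored from the cache
        os.mkdir(cached)
        dst, built = os.path.join(cached, "README.md"), os.path.join(cached, "README.built")
        with open(src, "w") as fh:
            fh.write("hello")
        sync_if_changed(src, dst)
        with open(built, "w") as fh:  # "compile" step
            fh.write("built")
        os.utime(dst, (1_000, 1_000))
        os.utime(built, (2_000, 2_000))
        os.utime(src, (3_000, 3_000))  # next job: new checkout, same content, newer mtime
        print("copied:", sync_if_changed(src, dst))
        print("rebuild from cached copy:", needs_rebuild(built, [dst]))
        print("rebuild from checkout:", needs_rebuild(built, [src]))
```

Building from the cached copy avoids the rebuild, while building straight from the fresh checkout would trigger it, which matches the symptom described in the question.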
