Gitlab pipeline: How to recache node modules only when dependency changed? - continuous-integration

I am working on the performance tuning for Gitlab pipeline using cache.
This is a nodejs project using npm for the dependency management. I have put the node_modules folder into cache for subsequent stages with following setting:
build:
  stage: build
  only:
    - develop
  script:
    - npm install
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - node_modules/
Is the cache available to pipelines triggered later, or is it only accessible within a single pipeline?
If it can be shared across pipelines, can I rebuild the node_modules cache only when package.json changes?

First, define the cache at the global level. This ensures that all jobs share the same cache.
Second, you can use cache:key:files, introduced in GitLab 12.5, to recreate the cache only when package.json changes.
cache:
  key:
    files:
      - package.json
  paths:
    - node_modules/
build:
  stage: build
  only:
    - develop
  script:
    - npm install
Further information:
https://docs.gitlab.com/ee/ci/yaml/#cachekeyfiles
Additional hints:
You might want to check on package-lock.json instead of package.json.
I recommend reading the cache mismatch chapter in the documentation to make sure you don't run into common problems where the cache might not be restored.
Instead of always running npm install, you can skip this step when the node_modules folder was restored from the cache. The following shell check runs npm install only if the node_modules folder doesn't exist.
build:
  stage: build
  only:
    - develop
  script:
    - if [ ! -d "node_modules" ]; then npm install; fi
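The hints above can be combined into one configuration. The following is a minimal sketch, not a definitive setup: it assumes package-lock.json is committed to the repository and that keying the cache on it suits your project.

```yaml
# Sketch: global cache keyed on the lockfile, so every job shares it
# and it is rebuilt only when package-lock.json changes.
cache:
  key:
    files:
      - package-lock.json   # assumption: the lockfile is committed
  paths:
    - node_modules/

build:
  stage: build
  only:
    - develop
  script:
    # node_modules/ is present only if the cache was restored,
    # so npm install runs only on a cache miss.
    - if [ ! -d "node_modules" ]; then npm install; fi
```

With this layout, a change to package-lock.json produces a new cache key, the restore finds nothing, and the conditional falls through to a fresh npm install.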

Related

Clearing build cache in angular

In angular 13 default build caching was introduced.
I want to use it in my CD/CI gitlab pipelines but I can't find any information about when the cache should be cleared.
For every merge request, I want to build my app and run some tests.
Is it safe to use the same cached directory for each MR, no matter what was changed?
If not, what should be the key for the cache?
I didn't find anything about it in angular docs.
You could follow the best practices described in the GitLab documentation; GitLab runners provide their own cache management.
See "Good caching practices"
test-job:
  stage: build
  cache:
    - key:
        files:
          - Gemfile.lock
      paths:
        - vendor/ruby
    - key:
        files:
          - yarn.lock
      paths:
        - .yarn-cache/
  script:
    - bundle install --path=vendor
    - yarn install --cache-folder .yarn-cache
    - echo Run tests...
In this example, you indicate to yarn where the cache folder (from the runner) is.
In your case (Angular cache, as requested in issue 21545), if you use a GitLab runner cache, you can clear said cache when you want.
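Applied to the Angular build cache, the same pattern might look like the sketch below. It rests on several assumptions that are not stated above: that the project keeps the Angular CLI's default cache directory (.angular/cache), that the lockfile is a reasonable cache key, and that the disk cache has to be enabled explicitly because the CLI enables it only for local environments by default. The job name build-app is hypothetical.

```yaml
# Sketch only: persist the Angular CLI disk cache between pipelines.
build-app:
  stage: build
  cache:
    key:
      files:
        - package-lock.json   # assumption: new key whenever dependencies change
    paths:
      - .angular/cache/       # assumption: default Angular CLI cache location
  script:
    - npm ci
    # Assumption: turn the disk cache on for CI runs as well as local ones.
    - npx ng config cli.cache.environment all
    - npx ng build
```

Keying on the lockfile means a dependency upgrade starts from an empty Angular cache, while ordinary source changes reuse the existing one.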

Cypress binary is missing and Gitlab CI pipeline

I'm trying to integrate cypress testing into gitlab pipeline.
I've tried about 10 different configurations, which all fail. I've included what I think are the relevant portions of the gitlab.yml file, as well as the screenshot of the error on GitLab.
Thanks for any help
variables:
  GIT_SUBMODULE_STRATEGY: recursive
cache:
  paths:
    - src/ui/node_modules/
    - /root/.cache/Cypress/ # added this, also have tried src/ui/cypress/
build_ui:
  image: node:16.14.2
  stage: build
  script:
    - cd src/ui
    - yarn install --pure-lockfile --prefer-offline --cache-folder .yarn
ui_test:
  image: node:16.14.2
  stage: test
  needs: [build_ui]
  script:
    - cd src/ui
    - yarn run runCypressHeadless
Each job gets its own separate environment. Therefore, you need to install your dependencies in each job. Add your yarn install command to the ui_test job.
The reason why your cache: did not restore to the job from the previous stage is because caches are per job by default (e.g. caches are restored from previous pipelines that ran the same job). If you want subsequent jobs in the same pipeline to use the cache, set the cache:key: to something like $CI_COMMIT_SHA or use cache:key:files: to use a file key, like your lockfile(s).
Also, you can only cache paths in the workspace. So you won't be able to cache/restore /root/.cache/... -- instead you should change the cache location to somewhere in the workspace.
For additional reference, see: caching in GitLab CI and caching NodeJS dependencies.
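Putting these three points together, a corrected configuration could look roughly like the sketch below. It is an illustration under assumptions: the lockfile is taken to live at src/ui/yarn.lock, and the Cypress binary is relocated into the workspace with the CYPRESS_CACHE_FOLDER variable so that it falls under a cacheable path.

```yaml
variables:
  GIT_SUBMODULE_STRATEGY: recursive
  # Assumption: store the Cypress binary inside the workspace so it can be cached.
  CYPRESS_CACHE_FOLDER: "$CI_PROJECT_DIR/cache/Cypress"

cache:
  key:
    files:
      - src/ui/yarn.lock      # assumption: lockfile location
  paths:
    - src/ui/node_modules/
    - cache/Cypress/          # workspace-relative, unlike /root/.cache/Cypress/

build_ui:
  image: node:16.14.2
  stage: build
  script:
    - cd src/ui
    - yarn install --pure-lockfile --prefer-offline --cache-folder .yarn

ui_test:
  image: node:16.14.2
  stage: test
  needs: [build_ui]
  script:
    - cd src/ui
    # Install again: each job starts in a fresh environment.
    - yarn install --pure-lockfile --prefer-offline --cache-folder .yarn
    - yarn run runCypressHeadless
```

When the cache is restored, the second yarn install has little left to do; when it is not, the job still gets a complete install, including the Cypress binary.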

Gitlab CI : how to cache node_modules from a prebuilt image?

The situation is this:
I'm running Cypress tests in a Gitlab CI (launched by vue-cli). To speed up the execution, I built a Docker image that contains the necessary dependencies.
How can I cache node_modules from the prebuilt image to use it in the test job ?
Currently I'm using an awful (but working) solution:
testsE2e:
  image: path/to/prebuiltImg
  stage: tests
  script:
    - ln -s /node_modules/ /builds/path/to/prebuiltImg/node_modules
    - yarn test:e2e
    - yarn test:e2e:report
But I think there must be a cleaner way using the Gitlab CI cache.
I've been testing:
cacheE2eDeps:
  image: path/to/prebuiltImg
  stage: dependencies
  cache:
    key: e2eDeps
    paths:
      - node_modules/
  script:
    - find / -name node_modules # check that node_modules files are there
    - echo "Caching e2e test dependencies"
testsE2e:
  image: path/to/prebuiltImg
  stage: tests
  cache:
    key: e2eDeps
  script:
    - yarn test:e2e
    - yarn test:e2e:report
But the job cacheE2eDeps displays a "WARNING: node_modules/: no matching files" error.
How can I do this successfully? The Gitlab documentation doesn't really talk about caching from a prebuilt image...
The Dockerfile used to build the image :
FROM cypress/browsers:node13.8.0-chrome81-ff75
COPY . .
RUN yarn install
There is no documentation for caching data from prebuilt images, because it’s simply not done. The dependencies are already available in the image, so why cache them in the first place? It would only lead to unnecessary data duplication.
Also, you seem to be under the impression that the cache should be used to share data between jobs, but its primary use case is sharing data between different runs of the same job. Sharing data between jobs should be done using artifacts.
In your case, you can use the cache instead of a prebuilt image, like so:
variables:
  CYPRESS_CACHE_FOLDER: "$CI_PROJECT_DIR/cache/Cypress"
testsE2e:
  image: cypress/browsers:node13.8.0-chrome81-ff75
  stage: tests
  cache:
    key: "e2eDeps"
    paths:
      - node_modules/
      - cache/Cypress/
  script:
    - yarn install
    - yarn test:e2e
    - yarn test:e2e:report
The first time the above job is run, it’ll install dependencies from scratch, but the next time it’ll fetch them from the runner cache. The caveat is that unless all runners that run this job share cache, each time you run it on a new runner it’ll install the dependencies from scratch.
Here’s the documentation about using yarn with GitLab CI.
Edit:
To elaborate on using cache vs artifacts: artifacts are meant both for storing job output (e.g. to download it manually later) and for passing the results of one job to another job in a subsequent stage, while the cache is meant to speed up job execution by preserving files that the job would otherwise need to download from the internet. See the GitLab documentation for details.
The contents of the node_modules directory clearly fit into the second category (cache).
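To make the distinction concrete, here is a minimal sketch that uses both mechanisms side by side. The job names, the yarn build script, and the dist/ output directory are hypothetical and stand in for whatever your project actually produces.

```yaml
stages:
  - build
  - tests

build:
  stage: build
  cache:                    # cache: speeds up later runs of this same job
    key: "deps"
    paths:
      - node_modules/
  script:
    - yarn install
    - yarn build            # assumption: a "build" script that writes to dist/
  artifacts:                # artifacts: hand the build output to later stages
    paths:
      - dist/
    expire_in: 1 hour

verify:
  stage: tests
  script:
    # dist/ is downloaded automatically from the build job's artifacts.
    - ls dist/
```

The cache may or may not be there on a given run, so the job must still work without it; artifacts from an earlier stage are fetched for the later job.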

AWS CodeBuild does not work with Yarn Workspaces

I'm using Yarn Workspaces in my repository and AWS CodeBuild to build my packages. When a build starts, CodeBuild takes 60 seconds to install all packages, and I'd like to avoid this by caching the node_modules folder.
When I add:
cache:
  paths:
    - 'node_modules/**/*'
to my buildspec file and enable LOCAL_CUSTOM_CACHE, I receive this error:
error An unexpected error occurred: "EEXIST: file already exists, mkdir '/codebuild/output/src637134264/src/git-codecommit.us-east-2.amazonaws.com/v1/repos/MY_REPOSITORY/node_modules/@packages/configs'".
Is there a way to remove this error configuring AWS CodeBuild or Yarn?
My buildspec file:
version: 0.2
phases:
  install:
    commands:
      - npm install -g yarn
      - git config --global credential.helper '!aws codecommit credential-helper $@'
      - git config --global credential.UseHttpPath true
      - yarn
  pre_build:
    commands:
      - git rev-parse HEAD
      - git pull origin master
  build:
    commands:
      - yarn run build
      - yarn run deploy
  post_build:
    commands:
      - echo 'Finished.'
cache:
  paths:
    - 'node_modules/**/*'
Thank you!
Update 1:
Yarn was trying to create the folder /codebuild/output/src637134264/src/git-codecommit.us-east-2.amazonaws.com/v1/repos/MY_REPOSITORY/node_modules/@packages/configs when running the yarn command in the install phase. This folder is one of my repository packages, called @packages/config. When I run yarn on my computer, Yarn creates folders linking my packages as described here. An example of my node_modules structure on my computer:
node_modules/
|-- ...
|-- @packages/
|   |-- configs/
|   |-- myPackageA/
|   |-- myPackageB/
|-- ...
I was having the exact same issue ("EEXIST: file already exists, mkdir"). I ended up using an S3 cache and it worked pretty well. Note: for some reason the first upload to S3 took far too long (about 10 minutes); the others went fine.
Before:
[5/5] Building fresh packages...
--
Done in 60.28s.
After:
[5/5] Building fresh packages...
--
Done in 6.64s.
If you already have your project configured, you can edit the cache settings under Project -> Edit -> Artifacts -> Additional configuration.
My buildspec.yml is as follows:
version: 0.2
phases:
  install:
    runtime-versions:
      nodejs: 14
  build:
    commands:
      - yarn config set cache-folder /root/.yarn-cache
      - yarn install --frozen-lockfile
      - ...other build commands go here
cache:
  paths:
    - '/root/.yarn-cache/**/*'
    - 'node_modules/**/*'
    # This third entry is only if you're using monorepos (under the packages folder)
    # - 'packages/**/node_modules/**/*'
If you use NPM you'd do something similar, with slightly different commands:
version: 0.2
phases:
  install:
    runtime-versions:
      nodejs: 14
  build:
    commands:
      - npm config -g set prefer-offline true
      - npm config -g set cache /root/.npm
      - npm ci
      - ...other build commands go here
cache:
  paths:
    # Must match the npm cache directory configured above
    - '/root/.npm/**/*'
    - 'node_modules/**/*'
    # This third entry is only if you're using monorepos (under the packages folder)
    # - 'packages/**/node_modules/**/*'
Kudos to: https://mechanicalrock.github.io/2019/02/03/monorepos-aws-codebuild.html

How to have a "cache per package.json" file in GitLab CI?

I have a Vue web application that is built, tested and deployed using GitLab CI.
GitLab CI has a "Cache" feature where specific products of a Job can be cached so that future runs of the Job in the same Pipeline can be avoided and the cached products be used instead.
I'd like to improve my workflow's performance by caching the node_modules directory so it can be shared across Pipelines.
GitLab Docs suggests using ${CI_COMMIT_REF_SLUG} as the cache key to achieve this. However, this means "caching per-branch" and I would like to improve on that.
I would like to have a cache "per package.json". That is, only if there is a change in the contents of package.json will the cache key change and npm install will be run.
I was thinking of using a hash of the contents of the package.json file as the cache key. Is this possible with GitLab CI? If so, how?
This is possible as of GitLab Runner v12.5:
cache:
  key:
    files:
      - Gemfile.lock
      - package-lock.json # or yarn.lock
  paths:
    - vendor/ruby
    - node_modules
This means the cache key is a SHA checksum computed from the most recent commits (up to two, if two files are listed) that changed the given files. Whenever one of these files changes, a new cache key is computed and a new cache is created. Any future job runs that use the same Gemfile.lock and package-lock.json with cache:key:files will use the new cache instead of rebuilding the dependencies.
More info: https://docs.gitlab.com/ee/ci/yaml/#cachekeyfiles
Also make sure to always use the --frozen-lockfile flag (or npm ci) in your CI jobs. Plain npm install or yarn install / yarn commands can rewrite the lock file, and you usually wouldn't notice until you install packages again. This makes your build artifacts and caches inconsistent.
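For npm, a lockfile-keyed cache combined with npm ci might look like the sketch below. Because npm ci deletes any existing node_modules folder before installing, this variant caches npm's download cache (a .npm/ directory inside the workspace) rather than node_modules itself. The job name and the .npm/ location are illustrative choices, not part of the answer above.

```yaml
install:
  image: node:latest
  cache:
    key:
      files:
        - package-lock.json   # new cache whenever the lockfile changes
    paths:
      - .npm/                 # assumption: npm download cache kept in the workspace
  script:
    # npm ci installs exactly what the lockfile specifies and never rewrites it;
    # --cache points npm at the cached directory, --prefer-offline uses it first.
    - npm ci --cache .npm --prefer-offline
```

The install still runs on every job, but most packages come from the local cache instead of the registry.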
To get that behavior, use the only:changes parameter with a static cache key.
Example:
install:
  image: node:latest
  script:
    - npm install
  cache:
    untracked: true
    key: npm # static name, can use any branch, any commit, etc.
    paths:
      - node_modules
  only: # only execute this job when there's a change in package.json
    changes:
      - package.json
If needed, read these pages to set up caching properly on your runners:
https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching
https://docs.gitlab.com/ee/ci/caching/
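Other jobs can then read the cache written by the install job by using the same static key. The sketch below is a hypothetical consumer job; it assumes a test script exists in package.json and that the install job has already run at least once, so the cache exists.

```yaml
# Hypothetical consumer job: reuses the cache written by the install job.
test:
  image: node:latest
  cache:
    key: npm            # must match the key used by the install job
    paths:
      - node_modules
    policy: pull        # download the cache, never upload it
  script:
    - npm test          # assumption: a "test" script is defined in package.json
```

Setting policy: pull keeps this job from re-uploading node_modules at the end, so only the install job ever writes to the cache.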
