Jekyll 4.0.0 doesn't build using cache in CI - ruby

Hello, I just upgraded my website to Jekyll 4.0.0 and builds are starting to take a long time, sometimes up to 10 minutes. But when I use an incremental build locally, it produces the compiled site in a few seconds. So I tried to cache all the Jekyll-related cache folders I could find. I'm using CircleCI; this is my config.yml:
- save_cache:
    key: site-cache-260320
    paths:
      - _site
      - .jekyll-cache
      - .jekyll-metadata
      - .sass-cache
This restores the cache folders to the repo when the CircleCI job starts. But it doesn't seem like they get reused in the compilation process. The job always takes almost 10 minutes to compile.
Am I missing a cache folder? Is there a Jekyll option I need to use? If I could get my website build/deploys down to a few seconds that would be life changing. Thanks!

The CircleCI documentation on caching also mentions:
CircleCI restores caches in the order of keys listed in the restore_cache step. Each cache key is namespaced to the project, and retrieval is prefix-matched. The cache will be restored from the first matching key.
steps:
  - restore_cache:
      keys:
So make sure to configure a restore_cache step to go along with your save_cache step; saving the cache alone does not bring it back into the next job.
Also keep an eye on the cache size, since restoring a very large cache takes time too.
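A restore_cache step that pairs with the save_cache step from the question might look like the sketch below (the key name site-cache-260320 is taken from the question; with a fully static key the cache is only ever saved once, so in practice you would usually put a checksum or date in the key):

```yaml
steps:
  - restore_cache:
      keys:
        # Exact key first, then a prefix fallback so an older
        # cache is still restored when the exact key is missing.
        - site-cache-260320
        - site-cache-
  # ... build steps (e.g. bundle exec jekyll build) ...
  - save_cache:
      key: site-cache-260320
      paths:
        - _site
        - .jekyll-cache
        - .jekyll-metadata
        - .sass-cache
```

Because retrieval is prefix-matched, the second key restores the most recent cache whose key starts with site-cache- even after you change the suffix.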

Related

Clearing build cache in angular

In Angular 13, default build caching was introduced.
I want to use it in my GitLab CI/CD pipelines, but I can't find any information about when the cache should be cleared.
For every merge request, I want to build my app and run some tests.
Is it safe to use the same cached directory for each MR, no matter what was changed?
If not, what should the key for the cache be?
I didn't find anything about it in the Angular docs.
You could follow the best practices illustrated in the GitLab documentation; GitLab runners do have cache management.
See "Good caching practices":
test-job:
  stage: build
  cache:
    - key:
        files:
          - Gemfile.lock
      paths:
        - vendor/ruby
    - key:
        files:
          - yarn.lock
      paths:
        - .yarn-cache/
  script:
    - bundle install --path=vendor
    - yarn install --cache-folder .yarn-cache
    - echo Run tests...
In this example, you tell yarn where the cache folder (from the runner) is.
In your case (the Angular cache, as requested in issue 21545), if you use a GitLab runner cache, you can clear that cache whenever you want.

Gitlab Runner Cache gets corrupted

I have a GitLab CI/CD pipeline with multiple jobs running in parallel; each job executes mvn test package.
Because there are a lot of dependencies, I'm using GitLab's caching feature to store the .m2 folder:
cache:
  key: "$CI_PROJECT_NAME"
  paths:
    - .m2/repository
I'm using the CI_PROJECT_NAME as I want the cache to be available to all jobs in all branches.
It mostly works: in many jobs I see the build succeed, followed by a message that the cache was either created or is already up to date:
Creating cache my-project-name...
.m2/repository: found 10142 matching files and directories
Archive is up to date!
Created cache
But in some jobs, Maven suddenly fails:
355804 [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:4.5.2:compile (default) on project spark: wrap: scala.reflect.internal.FatalError: Error accessing /builds/Kxs9HrJp/4/analytics/my-project-name/.m2/repository/org/apache/spark/spark-catalyst_2.12/3.1.1/spark-catalyst_2.12-3.1.1.jar: zip END header not found -> [Help 1]
It seems that the cache was somehow corrupted. If I execute the same job again, it now consistently fails. If I clear the runner cache through the UI, the same job runs successfully again until it fails for another file at some point.
I have a feeling that the concurrent runs are the problem, but I don't know why.
Each job downloads the current state of the cache at the beginning.
Even if it's not up to date, maven will simply download the missing libraries.
If two or more jobs try to update / upload the cache "at the same time", it's OK for the last one to win and overwrite the others' cache.
Any idea what's happening here?
I think it may be related to concurrent workers (a read racing with a write, perhaps). If you had seen the error just once, I would suspect a connection error between the runner and the cache location, but since you're seeing it more often, the problem is probably concurrency.
Try changing your key to be more specific, per branch or commit hash, and try again:
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - .m2/repository
Or use a distributed cache location such as S3 with versioning enabled.
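Another hedged variation, if you want to keep the cache shared across branches but avoid parallel jobs uploading to the same key at the same time, is one cache per job name (the job name below is illustrative):

```yaml
# Sketch: $CI_JOB_NAME in the key gives each parallel job its own
# cache archive, so concurrent uploads never clobber each other.
test-module-a:
  script:
    - mvn test package
  cache:
    key: "$CI_PROJECT_NAME-$CI_JOB_NAME"
    paths:
      - .m2/repository
```

The trade-off is that each job has to warm its own cache the first time it runs, instead of reusing one shared archive.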

Gitlab CI : how to cache node_modules from a prebuilt image?

The situation is this:
I'm running Cypress tests in a Gitlab CI (launched by vue-cli). To speed up the execution, I built a Docker image that contains the necessary dependencies.
How can I cache node_modules from the prebuilt image to use it in the test job ?
Currently I'm using an awful (but working) solution:
testsE2e:
  image: path/to/prebuiltImg
  stage: tests
  script:
    - ln -s /node_modules/ /builds/path/to/prebuiltImg/node_modules
    - yarn test:e2e
    - yarn test:e2e:report
But I think there must be a cleaner way using the Gitlab CI cache.
I've been testing:
cacheE2eDeps:
  image: path/to/prebuiltImg
  stage: dependencies
  cache:
    key: e2eDeps
    paths:
      - node_modules/
  script:
    - find / -name node_modules # check that node_modules files are there
    - echo "Caching e2e test dependencies"

testsE2e:
  image: path/to/prebuiltImg
  stage: tests
  cache:
    key: e2eDeps
  script:
    - yarn test:e2e
    - yarn test:e2e:report
But the job cacheE2eDeps displays a "WARNING: node_modules/: no matching files" error.
How can I do this successfully? The GitLab documentation doesn't really cover caching from a prebuilt image...
The Dockerfile used to build the image:
FROM cypress/browsers:node13.8.0-chrome81-ff75
COPY . .
RUN yarn install
There is no documentation for caching data from prebuilt images, because it's simply not done. The dependencies are already available in the image, so why cache them in the first place? It would only lead to unnecessary data duplication.
Also, you seem to operate under the impression that the cache should be used to share data between jobs, but its primary use case is sharing data between different runs of the same job. Sharing data between jobs should be done using artifacts.
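As a sketch of that jobs-passing-data pattern (the job names and the dist/ path here are made up for illustration), artifacts look like this:

```yaml
# Pass build output from one job to a later stage with artifacts,
# rather than with cache.
build:
  stage: build
  script:
    - yarn build
  artifacts:
    paths:
      - dist/
    expire_in: 1 week

test:
  stage: test
  script:
    # dist/ from the build job is downloaded automatically before
    # this script runs.
    - yarn test
```

Jobs in later stages download the artifacts of earlier jobs by default, which is exactly the job-to-job handoff that cache does not guarantee.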
In your case you can use cache instead of prebuilt image, like so:
variables:
  CYPRESS_CACHE_FOLDER: "$CI_PROJECT_DIR/cache/Cypress"

testsE2e:
  image: cypress/browsers:node13.8.0-chrome81-ff75
  stage: tests
  cache:
    key: "e2eDeps"
    paths:
      - node_modules/
      - cache/Cypress/
  script:
    - yarn install
    - yarn test:e2e
    - yarn test:e2e:report
The first time the above job is run, it’ll install dependencies from scratch, but the next time it’ll fetch them from the runner cache. The caveat is that unless all runners that run this job share cache, each time you run it on a new runner it’ll install the dependencies from scratch.
Here’s the documentation about using yarn with GitLab CI.
Edit:
To elaborate on cache vs. artifacts: artifacts are meant both for storing job output (e.g. to download it manually later) and for passing the results of one job to another job in a subsequent stage, while cache is meant to speed up job execution by preserving files the job would otherwise have to download from the internet. See the GitLab documentation for details.
The contents of the node_modules directory obviously fall into the second category.

How to have a "cache per package.json" file in GitLab CI?

I have a Vue web application that is built, tested and deployed using GitLab CI.
GitLab CI has a "Cache" feature where specific products of a Job can be cached so that future runs of the Job in the same Pipeline can be avoided and the cached products be used instead.
I'd like to improve my workflow's performance by caching the node_modules directory so it can be shared across Pipelines.
GitLab Docs suggests using ${CI_COMMIT_REF_SLUG} as the cache key to achieve this. However, this means "caching per-branch" and I would like to improve on that.
I would like to have a cache "per package.json". That is, only if there is a change in the contents of package.json will the cache key change and npm install will be run.
I was thinking of using a hash of the contents of the package.json file as the cache key. Is this possible with GitLab CI? If so, how?
This is now possible as of GitLab Runner v12.5:
cache:
  key:
    files:
      - Gemfile.lock
      - package-lock.json # or yarn.lock
  paths:
    - vendor/ruby
    - node_modules
It means the cache key will be a SHA checksum computed from the most recent commits (up to two, if two files are listed) that changed the given files. Whenever one of these files changes, a new cache key is computed and a new cache is created. Any future job run using the same Gemfile.lock and package-lock.json with cache:key:files will use the new cache instead of rebuilding the dependencies.
More info: https://docs.gitlab.com/ee/ci/yaml/#cachekeyfiles
Also make sure to always use the --frozen-lockfile flag in your CI jobs (or npm ci). A regular npm install or yarn install / yarn command generates a new lock file, and you usually wouldn't notice until you install packages again. That makes your build artifacts and caches inconsistent.
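For example, a minimal install job following that advice might look like this (the job name is arbitrary):

```yaml
install:
  image: node:latest
  script:
    # npm ci fails if package-lock.json is out of sync with
    # package.json, instead of silently regenerating it.
    - npm ci
    # yarn equivalent:
    # - yarn install --frozen-lockfile
```

This way the lock file committed to the repository is the single source of truth for what ends up in the cache.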
For that behavior, use the only:changes parameter with a static cache name.
Example:
install:
  image: node:latest
  script:
    - npm install
  cache:
    untracked: true
    key: npm # static name; works across any branch, any commit, etc.
    paths:
      - node_modules
  only: # only execute this job when there's a change in package.json
    changes:
      - package.json
If you need it, read these to set up caching properly on your runners:
https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching https://docs.gitlab.com/ee/ci/caching/

Is it possible to cache GitLab build artifacts only for a day?

I recently encountered a GitLab pipeline issue where my node_modules weren't being updated with newer versions of a library (particularly my own internal fork of a project, which uses the git+url syntax). I suspect that, since the git+url doesn't have a version number in it, it's tricky to hash the package file and detect that there is a change...
My workaround was to try to put a date entry in the cache key of my .gitlab-ci.yml file, so that the cache is invalidated every 24 hours. However, there is no CI variable listed that contains a date, and it doesn't seem that you can access OS variables everywhere in the YAML file. Is there a neat trick I can use?
I tried:
cache:
  key: "$(date +%F)" # or see: https://gitlab.msu.edu/help/ci/variables/README.md
  paths:
    - node_modules

before_script:
  - echo Gitlab job started $(date)
This doesn't seem to work; I think it just uses the key string verbatim, although notice that the $(date) in the script does expand.
Anyone have any neat ideas? For now I'm just putting in a manual string and bumping a digit when I want the cache to be invalidated (although that is a bit error-prone).
At this time there is no way to set the cache expiration time for CI jobs. If the cache is using too much disk space and you're using the Docker executor, you can explore a tool such as https://gitlab.com/gitlab-org/gitlab-runner-docker-cleanup which will keep X amount of disk space free on the runner at any given time by expiring older cache.
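One way to make the manual-string workaround from the question slightly less error-prone is to move the version into a project-level CI/CD variable (CACHE_VERSION below is a hypothetical variable you define in the project settings, not a built-in):

```yaml
# Bump CACHE_VERSION in the project's CI/CD variable settings
# whenever the cache should be discarded; no commit required.
cache:
  key: "node-modules-$CACHE_VERSION"
  paths:
    - node_modules
```

Unlike $(date +%F), predefined and project variables are expanded in cache:key, so changing the variable immediately points jobs at a fresh cache.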
