CircleCI - how to download a dependency only if not cached

In brief
When using CircleCI to run unit tests, I have to download & install LibreOffice. How can we do the download only if it's not cached?
P.S. Also posted on CircleCI support here.
Details
LibreOffice download & install often takes 30+ seconds to complete.
So I intend to cache it using the save_cache feature here.
My Google search leads me to when steps here, though the guide there is not clear about how to check whether a cache key exists.
I put the marker HOW_TO in the .circleci/config.yml snippet below:
- restore_cache:
    key: cache-libreoffice
- when:
    condition: <<HOW_TO check if cache key exists >>
    steps:
      - run:
          name: install atlas Pipfile
          command: |
            mkdir -p /tmp/libreoffice
            cd /tmp/libreoffice
            wget https://download.documentfoundation.org/libreoffice/stable/6.3.3/deb/x86_64/LibreOffice_6.3.3_Linux_x86-64_deb.tar.gz
- save_cache:
    key: cache-libreoffice
    paths: # the paths to save to the cache entry
      - /tmp/libreoffice

As official CircleCI support gave few helpful hints for this question, my workaround is to use a custom Docker image and have those dependencies installed there.
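A common pattern that avoids the conditional entirely is to let restore_cache put the files back and have the run step itself skip the download when the archive is already present. A minimal sketch built from the question's snippet (URL and paths taken from above):

- restore_cache:
    keys:
      - cache-libreoffice
- run:
    name: download LibreOffice if not cached
    command: |
      # On a cache hit, restore_cache has already re-created /tmp/libreoffice,
      # so wget only runs on a cache miss.
      if [ ! -f /tmp/libreoffice/LibreOffice_6.3.3_Linux_x86-64_deb.tar.gz ]; then
        mkdir -p /tmp/libreoffice
        cd /tmp/libreoffice
        wget https://download.documentfoundation.org/libreoffice/stable/6.3.3/deb/x86_64/LibreOffice_6.3.3_Linux_x86-64_deb.tar.gz
      fi
- save_cache:
    key: cache-libreoffice
    paths:
      - /tmp/libreoffice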

Related

Share a file between two workflows in CircleCI

In our repo, build and deploy are two different workflows.
In build we call lerna to check for changed packages and save the output to a file in the current workspace.
check_changes:
  working_directory: ~/project
  executor: node
  steps:
    - checkout
    - attach_workspace:
        at: ~/project
    - run:
        command: npx lerna changed > changed.tmp
    - persist_to_workspace:
        root: ./
        paths:
          - changed.tmp
I'd like to pass the exact same file from the build workflow to the deploy workflow and access it in another job. How do I do that?
read_changes:
  working_directory: ~/project
  executor: node
  steps:
    - checkout
    - attach_workspace:
        at: ~/project
    - run:
        command: |
          echo 'Reading changed.tmp file'
          cat changed.tmp
According to this blog post:
Unlike caching, workspaces are not shared between runs as they no longer exist once a workflow is complete.
So it seems that caching would be the only option.
But according to the CircleCI documentation, my case doesn't fit their cache definitions:
Use the cache to store data that makes your job faster, but, in the case of a cache miss or zero cache restore, the job still runs successfully. For example, you might cache NPM package directories (known as node_modules).
I think you can totally use caching here. Make sure you choose your key template(s) wisely.
The caveat to keep in mind is that (unlike the job level, where you can use the requires key) there's no *native* way to sequentially execute workflows. You could consider using an orb for that though; for example, the roopakv/swissknife orb.
So you'll need to make sure that the job (that needs the file) in the deploy workflow, doesn't move to the restore_cache step until the save_cache in the other job has happened.
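A minimal sketch of that idea, with a cache key derived from the pipeline's revision so each commit gets its own entry (job names are taken from the question; the key template is an assumption):

In the build workflow:

check_changes:
  working_directory: ~/project
  executor: node
  steps:
    - checkout
    - run:
        command: npx lerna changed > changed.tmp
    - save_cache:
        key: changed-{{ .Revision }}   # one cache entry per commit
        paths:
          - changed.tmp

In the deploy workflow:

read_changes:
  working_directory: ~/project
  executor: node
  steps:
    - checkout
    - restore_cache:
        keys:
          - changed-{{ .Revision }}    # same template, so it matches the entry saved above
    - run:
        command: cat changed.tmp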

Clearing build cache in Angular

In Angular 13, default build caching was introduced.
I want to use it in my CI/CD GitLab pipelines, but I can't find any information about when the cache should be cleared.
For every merge request, I want to build my app and run some tests.
Is it safe to use the same cached directory for each MR, no matter what was changed?
If not, what should be the key for the cache?
I didn't find anything about it in the Angular docs.
You could follow some best practices as illustrated in the GitLab documentation, whose runners do have cache management.
See "Good caching practices"
test-job:
  stage: build
  cache:
    - key:
        files:
          - Gemfile.lock
      paths:
        - vendor/ruby
    - key:
        files:
          - yarn.lock
      paths:
        - .yarn-cache/
  script:
    - bundle install --path=vendor
    - yarn install --cache-folder .yarn-cache
    - echo Run tests...
In this example, you tell yarn where the cache folder (from the runner) is.
In your case (Angular cache, as requested in issue 21545), if you use a GitLab runner cache, you can clear said cache when you want.
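Applying the same pattern to the Angular build cache (which lives under .angular/cache by default since Angular 13), a sketch could look like this; keying on the branch slug is just one reasonable choice:

build-job:
  stage: build
  cache:
    key: angular-$CI_COMMIT_REF_SLUG   # one cache per branch; pick the key policy that suits you
    paths:
      - .angular/cache/                # Angular 13+ persistent build cache location
  script:
    - npm ci
    - npx ng build

The CLI invalidates stale entries itself, so reusing the directory across merge requests is generally considered safe, and if anything ever looks wrong you can still clear the runner cache from the GitLab UI as noted above.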

How do I run a local script using GitHub Actions?

Hello, I am using Kedro (a pipeline tool) and want to use GitHub Actions to trigger a Kedro command (kedro run) whenever I push to my GitHub repo.
Since I have all the data in my local repo, I thought it would make sense to run the kedro command on my local machine.
So my question is: is there a way to trigger a local action using GitHub Actions? Using self-hosted runners, perhaps?
You can run the Kedro pipeline directly on a GitHub-provided runner using the steps below. I added a script I used previously to run kedro lint on every push.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.7.9
        uses: actions/setup-python@v2
        with:
          python-version: 3.7.9
      - uses: actions/cache@v2
        with:
          path: ${{ env.pythonLocation }}
          key: ${{ env.pythonLocation }}-${{ hashFiles('src/requirements.txt') }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r src/requirements.txt
      - name: Run Kedro Pipeline
        run: |
          kedro run
That said, I'm also wondering why you would need to run it on every push. Given that the compute provided by GitHub Actions is likely resource-constrained, it might not be the best place to run the pipeline. You would also need to keep your data within your repo for this approach to work.
Hi @magical_unicorn, I think @avan-sh's answer is correct - what I would also add is that we encourage you not to commit data to VCS / Git. There are some technical limitations such as file size, but more importantly it's not great security practice when working with teams.
Whilst not necessary on all projects - it might be good practice to explore using some sort of cloud storage for your data, decoupling it from the version control system.
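To literally run on your own machine, as the question suggests, you can register that machine as a self-hosted runner (Settings > Actions > Runners in the repo) and target it with runs-on. A minimal sketch, assuming the runner is registered and Kedro is installed there:

name: kedro-run-local
on: push
jobs:
  run-pipeline:
    runs-on: self-hosted        # targets your registered machine instead of a GitHub-hosted VM
    steps:
      - uses: actions/checkout@v2
      - name: Run Kedro Pipeline
        run: kedro run          # data stored on that machine is reachable from here

Keep in mind GitHub's warning that self-hosted runners should only be used with private repositories, since forks of a public repo could otherwise run arbitrary code on your machine.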

GitLab CI: how to cache node_modules from a prebuilt image?

The situation is this:
I'm running Cypress tests in GitLab CI (launched by vue-cli). To speed up the execution, I built a Docker image that contains the necessary dependencies.
How can I cache node_modules from the prebuilt image to use it in the test job?
Currently I'm using an awful (but working) solution:
testsE2e:
  image: path/to/prebuiltImg
  stage: tests
  script:
    - ln -s /node_modules/ /builds/path/to/prebuiltImg/node_modules
    - yarn test:e2e
    - yarn test:e2e:report
But I think there must be a cleaner way using the GitLab CI cache.
I've been testing:
cacheE2eDeps:
  image: path/to/prebuiltImg
  stage: dependencies
  cache:
    key: e2eDeps
    paths:
      - node_modules/
  script:
    - find / -name node_modules # check that node_modules files are there
    - echo "Caching e2e test dependencies"
testsE2e:
  image: path/to/prebuiltImg
  stage: tests
  cache:
    key: e2eDeps
  script:
    - yarn test:e2e
    - yarn test:e2e:report
But the cacheE2eDeps job displays a "WARNING: node_modules/: no matching files" error.
How can I do this successfully? The GitLab documentation doesn't really talk about caching from a prebuilt image...
The Dockerfile used to build the image:
FROM cypress/browsers:node13.8.0-chrome81-ff75
COPY . .
RUN yarn install
There is no documentation for caching data from prebuilt images, because it's simply not done. The dependencies are already available in the image, so why cache them in the first place? It would only lead to unnecessary data duplication.
Also, you seem to operate under the impression that the cache should be used to share data between jobs, but its primary use case is sharing data between different runs of the same job. Sharing data between jobs should be done using artifacts.
In your case you can use cache instead of prebuilt image, like so:
variables:
  CYPRESS_CACHE_FOLDER: "$CI_PROJECT_DIR/cache/Cypress"

testsE2e:
  image: cypress/browsers:node13.8.0-chrome81-ff75
  stage: tests
  cache:
    key: "e2eDeps"
    paths:
      - node_modules/
      - cache/Cypress/
  script:
    - yarn install
    - yarn test:e2e
    - yarn test:e2e:report
The first time the above job is run, it'll install the dependencies from scratch, but the next time it'll fetch them from the runner cache. The caveat is that unless all runners that run this job share a cache, the dependencies will be installed from scratch each time the job lands on a new runner.
Here’s the documentation about using yarn with GitLab CI.
Edit:
To elaborate on using cache vs. artifacts: artifacts are meant both for storing job output (e.g. to manually download it later) and for passing the results of one job to another job in a subsequent stage, while cache is meant to speed up job execution by preserving files that the job would otherwise need to download from the internet. See the GitLab documentation for details.
The contents of the node_modules directory obviously fit into the second category.
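For completeness, a minimal sketch of the artifacts approach mentioned above, in case you do need to hand a build output from one job to another (job and path names are illustrative):

build:
  image: cypress/browsers:node13.8.0-chrome81-ff75
  stage: build
  script:
    - yarn install
    - yarn build
  artifacts:
    paths:
      - dist/          # handed to jobs in later stages automatically

testsE2e:
  image: cypress/browsers:node13.8.0-chrome81-ff75
  stage: tests
  script:
    - yarn test:e2e    # dist/ from the build job is present here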

Jekyll 4.0.0 doesn't build using cache in CI

Hello, I just upgraded my website to Jekyll 4.0.0 and it's starting to take a long time to compile - up to 10 minutes sometimes. But when I use the incremental build locally, it's able to produce the compiled version in a few seconds. So I tried to cache all the Jekyll-related caches I could find. I'm using CircleCI; this is my config.yml:
- save_cache:
    key: site-cache-260320
    paths:
      - _site
      - .jekyll-cache
      - .jekyll-metadata
      - .sass-cache
This restores the cache folders to the repo when the CircleCI job starts. But it doesn't seem like they get reused in the compilation process. The job always takes almost 10 minutes to compile.
Am I missing a cache folder? Is there a Jekyll option I need to use? If I could get my website build/deploys down to a few seconds that would be life changing. Thanks!
The CircleCI documentation on caching also mentions:
CircleCI restores caches in the order of keys listed in the restore_cache step. Each cache key is namespaced to the project, and retrieval is prefix-matched. The cache will be restored from the first matching key.
steps:
  - restore_cache:
      keys:
So make sure to configure your restore_cache step, to go along with your save_cache step.
Also, be aware of the cache size.
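Concretely, a matched pair for the question's key could look like the sketch below; the restore step is the part missing from the snippet in the question (whether you also want --incremental on the Jekyll build is an assumption worth verifying, since .jekyll-metadata is only consulted for incremental builds):

- restore_cache:
    keys:
      - site-cache-260320      # must match (or prefix-match) the save_cache key
- run: bundle exec jekyll build --incremental
- save_cache:
    key: site-cache-260320
    paths:
      - _site
      - .jekyll-cache
      - .jekyll-metadata
      - .sass-cache

Note that with a fixed key like this, CircleCI never updates the cache after the first save; including a checksum or date component in the key is the usual way around that.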
