Github action not caching node_modules across jobs - caching

I have a Github action on pull_request, and am attempting to cache node_modules between jobs. I can only see cache working on a job re-run, but not when opening new PRs, even when there have been no changes to my node_modules.
Is this by design?
Each time it runs in a new PR, I see this output in the job:
Run actions/cache#v2
with:
path: ~/.npm
key: Linux-build-cache-node-modules-83fc6365b85dd061416fccd5993d48c2003c388bb2184fd57f28d1041d9d261e
restore-keys: Linux-build-cache-node-modules-
Linux-build-
Linux-
env:
cache-name: cache-node-modules
Cache not found for input keys: Linux-build-cache-node-modules-83fc6365b85dd061416fccd5993d48c2003c388bb2184fd57f28d1041d9d261e, Linux-build-cache-node-modules-, Linux-build-, Linux-
Yet, re-running the job will enjoy the cache hit:
Run actions/cache#v2
Received 54525952 of 109978126 (49.6%), 51.8 MBs/sec
Received 109978126 of 109978126 (100.0%), 67.8 MBs/sec
Cache Size: ~105 MB (109978126 B)
/usr/bin/tar --use-compress-program zstd -d -xf /home/runner/work/_temp/0bdf436d-9d4d-4397-b8ed-5910ff503949/cache.tzst -P -C /home/runner/work/argutopia/argutopia
Cache restored successfully
Cache restored from key: Linux-build-cache-node-modules-83fc6365b85dd061416fccd5993d48c2003c388bb2184fd57f28d1041d9d261e
Here is my action yml for reference:
name: PR CI
on:
pull_request:
branches: [main, develop]
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
node-version: [14.x]
steps:
- uses: actions/checkout#v2
with:
fetch-depth: 0
- name: Use Node.js ${{ matrix.node-version }}
uses: actions/setup-node#v1
with:
node-version: ${{ matrix.node-version }}
- name: Cache node modules
uses: actions/cache#v2
id: cache
env:
cache-name: cache-node-modules
with:
path: ~/.npm
key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-build-${{ env.cache-name }}-
${{ runner.os }}-build-
${{ runner.os }}-
- name: Install Dependencies
if: steps.cache.outputs.cache-hit != 'true'
run: |
echo '::debug::deps cache miss - installing deps'
npm install

Caching ~/.npm wont allow you to skip running npm install. It will save the global npm cache on this machine so that running npm install doesn't incur as much download time, but you'll then still need to build the local node_modules for your project.
If you want to be able to skip your install, you can cache the project node_modules folder as well and skip install on cache hits there, though this could present some issues down the road.

Use advanced alternative caching technique from this article
- name: Cache dependencies
id: cache
uses: actions/cache#v3
with:
path: ./node_modules
key: modules-${{ hashFiles('package-lock.json') }}
- name: Install dependencies
if: steps.cache.outputs.cache-hit != 'true'
run: npm ci --ignore-scripts

Related

Only re-run the failed job in a GitHub Actions matrix

How can I re-run only a failed job in a GitHub Actions matrix?
I only want to run the failed jobs and not all of them. My CI matrix takes a lot of time to run, and I don't want to waste energy re-running the whole matrix because one of them failed.
So this button is not my solution:
Re-run a specific job has not been implemented yet, but it is on the roadmap.
However, reading this ISSUE about the subject, there is a workaround from this post, which stores the last run status in cache and then skips jobs based on that status.
How they used it:
- name: Set default run status
run: echo "::set-output name=last_run_status::default" > last_run_status
- name: Restore last run status
id: last_run
uses: actions/cache#v2
with:
path: |
last_run_status
key: ${{ github.run_id }}-${{ matrix.os }}-${{ matrix.node-version }}-${{ matrix.webpack }}-${{ steps.date.outputs.date }}
restore-keys: |
${{ github.run_id }}-${{ matrix.os }}-${{ matrix.node-version }}-${{ matrix.webpack }}-
- name: Set last run status
id: last_run_status
run: cat last_run_status
- name: Checkout ref
uses: actions/checkout#v2
with:
ref: ${{ github.event.workflow_dispatch.ref }}
- name: Use Node.js ${{ matrix.node-version }}
if: steps.last_run_status.outputs.last_run_status != 'success'
uses: actions/setup-node#v1
with:
node-version: ${{ matrix.node-version }}
Here is the original workflow file.
GitHub finally made it possible to do this. There is now a "Re-run failed jobs" button:

"The cypress npm package is installed, but the Cypress binary is missing." in GitHub Actions

I'm receiving the following error when running a Cypress e2e test runner on GitHub Actions:
The cypress npm package is installed, but the Cypress binary is missing.
We expected the binary to be installed here: /home/runner/.cache/Cypress/8.5.0/Cypress/Cypress
Reasons it may be missing:
- You're caching 'node_modules' but are not caching this path: /home/runner/.cache/Cypress
- You ran 'npm install' at an earlier build step but did not persist: /home/runner/.cache/Cypress
My .github/workflow/tests.yml is set up as follows:
name: celestia/tests
on:
pull_request:
branches:
- main
- master
jobs:
unit:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
node: [14]
env:
NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
NPM_TOKEN: ${{ secrets.GITHUB_TOKEN }}
steps:
- name: Checkout ๐Ÿ›Ž
uses: actions/checkout#master
# Setup .npmrc file to publish to GitHub Packages
- name: Setup node env ๐Ÿ— and .npmrc file to publish to GitHub Packages
uses: actions/setup-node#v2.1.2
with:
node-version: ${{ matrix.node }}
registry-url: 'https://npm.pkg.github.com'
# Defaults to the user or organization that owns the workflow file:
scope: '#observerly'
- name: Cache node_modules ๐Ÿ“ฆ
uses: actions/cache#v2
with:
path: ~/.npm
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-
- name: Install project dependencies ๐Ÿ‘จ๐Ÿปโ€๐Ÿ’ป
run: npm ci --ignore-scripts
- name: Run jest unit tests ๐Ÿงช
run: npm run test:unit
e2e:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
node: [14]
env:
NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
NPM_TOKEN: ${{ secrets.GITHUB_TOKEN }}
steps:
- name: Checkout ๐Ÿ›Ž
uses: actions/checkout#master
# Setup .npmrc file to publish to GitHub Packages
- name: Setup node env ๐Ÿ— and .npmrc file to publish to GitHub Packages
uses: actions/setup-node#v2.1.2
with:
node-version: ${{ matrix.node }}
registry-url: 'https://npm.pkg.github.com'
# Defaults to the user or organization that owns the workflow file:
scope: '#observerly'
- name: Install project dependencies ๐Ÿ‘จ๐Ÿปโ€๐Ÿ’ป
run: npm ci --ignore-scripts
- name: Run e2e cypress tests ๐Ÿงช
run: npm run test:e2e:headless
How can I side-step this particular issue?

Set up github actions to run cypress tests on localhost from 2 different repositories

I have my front-end (repUI) and cypress tests (repTest) in separate repositories., I am trying to run the tests by checkout both from repTest. Once the repUI is set up (install and build), the localhost:8000 is ready for tests to run.
So far the following workflow is designed:
name: Cypress E2E Tests on Dev
on:
workflow_dispatch:
push:
branches-ignore: [ main ]
schedule:
- cron: '0 0 * * 1-5'
jobs:
setup:
runs-on: ubuntu-20.04
strategy:
matrix:
node-version: [14.x]
steps:
- name: Use Node.js ${{ matrix.node-version }}
uses: actions/setup-node#v1
with:
node-version: ${{ matrix.node-version }}
# Checkout UI repo
- name: Checkout voy-frontend repo
uses: actions/checkout#v2
with:
repository: orgName/xyz-frontend
path: xyz-frontend
ssh-key: ${{ secrets.GIT_SSH_KEY }}
# Install voy-f dependancies
- name: Install dependancies for application
run: |
cd voy-frontend
pwd
npm install
- name: Get npm cache directory
id: npm-cache-dir
run: |
echo "::set-output name=dir::$(npm config get cache)"
- uses: actions/cache#v2
id: npm-cache # use this to check for `cache-hit` ==> if: steps.npm-cache.outputs.cache-hit != 'true'
with:
path: ${{ steps.npm-cache-dir.outputs.dir }}
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-
# Run application
- name: Run application
run: |
pwd
cd voy-frontend
npm run serve
cypress-test:
name: Test
needs: setup
runs-on: ubuntu-latest
steps:
# Checkout test automation repo
- name: git checkout
uses: actions/checkout#v2
with:
ref: main
# Install cypress dependancies
- name: Cypress install
uses: cypress-io/github-action#v2
# Run cypress tests
- name: Cypress run - localhost
run: npm run cy:r -- --env TENANT=${{ secrets.TENANT }},CLIENT_ID=${{ secrets.CLIENT_ID }},CLIENT_SECRET=${{ secrets.CLIENT_SECRET }},ORG_SCOPE=${{ secrets.ORG_SCOPE }},G_SCOPE=${{ secrets.G_SCOPE }} --spec "cypress/integration/specs/Search.feature" --headless --browser chrome
- uses: actions/upload-artifact#v2
if: failure()
with:
name: cypress-videos
path: |
cypress/reports
cypress/videos
cypress/screenshots
retention-days: 1
The problem is if I run the build command npm run serve for repUI, the applications successfully getting started but it is not proceeding further instead it just waits. If I comment out the npm run serve, all other jobs are running as it should.
Here is the screenshot:
For building, it only took ~60 seconds, and then it waits until I cancel it. Same case if I trigger the job again and always.
What needs to be done in order to proceed to the cypress-test job once the build is ready.?

Github Action Cache question with containers

I got a quick question and I can't find docs on it (maybe it is impossible)
How can I retrieve what has been built from a container (like this) :
prod-dependencies:
name: Production Dependencies
runs-on: ubuntu-20.04
container: elixir:1.11-alpine
env:
MIX_ENV: prod
steps:
- name: Checkout
uses: actions/checkout#v2
with:
fetch-depth: 0
- name: Retrieve Cached Dependencies
uses: actions/cache#v2
id: mix-prod-cache
with:
path: |
${{ github.workspace }}/deps
${{ github.workspace }}/_build
key: PROD-${{ hashFiles('mix.lock') }}
- name: Install Dependencies
if: steps.mix-prod-cache.outputs.cache-hit != 'true'
run: |
apk add build-base rust cargo
mix do local.hex --force, local.rebar --force, deps.get --only prod, deps.compile
I need to build on Alpine since I am deploying on Alpine and some C libs are different for the Erlang VM to work.
So here it is the place where I build dependencies and I want to put them on cache for the subsequent jobs but the cache is never populated. According to doc GitHub mount a volume with containers to retrieve artifacts but I must miss use it.
Thanks a lot for your help.
Here is a sample Github workflow for an Elixir app with a Postgres database that utilizes the action/cache#v2. Note that it defines 2 separate steps for restoring the deps and _build directories. In this workflow, the mix deps.get and mix compile steps are always run: they should execute very quickly if the cache is restored.
Adapted from this workflow
name: Test
on:
pull_request:
branches:
- develop
paths-ignore:
- 'docs/**'
- '*.md'
jobs:
test:
name: Lint and test
runs-on: ubuntu-18.04
env:
MIX_ENV: test
services:
db:
image: postgres:11
ports: ['5432:5432']
env:
POSTGRES_PASSWORD: postgres
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout#v2
- uses: erlef/setup-elixir#v1
with:
otp-version: '23.1.2'
elixir-version: '1.11.2'
- name: Configure SSH for private Repos
uses: webfactory/ssh-agent#v0.4.1
with:
ssh-private-key: ${{ secrets.HEADLESS_PRIV }}
- name: Restore the deps cache
uses: actions/cache#v2
id: deps-cache-restore
with:
path: deps
key: ${{ runner.os }}-deps-mixlockhash-${{ hashFiles(format('{0}{1}', github.workspace, '/mix.lock')) }}
restore-keys: |
${{ runner.os }}-deps-
- name: Restore the _build cache
uses: actions/cache#v2
id: build-cache-restore
with:
path: _build
key: ${{ runner.os }}-build-mixlockhash-${{ hashFiles(format('{0}{1}', github.workspace, '/mix.lock')) }}
restore-keys: |
${{ runner.os }}-build-
- name: Get/update Mix dependencies
run: |
mix local.hex --force
mix local.rebar
mix deps.get
- name: Compile
run: mix compile
- name: Create database
run: mix ecto.create
- name: Migrate database
run: mix ecto.migrate
- name: Test
run: mix test

Parallelism in CI/CD Pipelines like GitHub Actions

Hello there and thank you for reading my question, its my first one here.
I am working with CI/CD pipelines for a year now and I think they are pretty nice and convinient for developing Websites and Stuff. But in the last months I have more and more problems creating fast, efficient and smart pipelines without redundant dependency installs or similar. So I want to use as less computation ressources as possible while still have fast builds. I want to parallelize steps and use theire artifacts in another final step. For example the following GitHub Actions workflow:
My goal with this workflow is to just build a VueJS Single Page App and deploy it to the IBM Cloud. For that I need to install the npm dependencies and build the Vue App and also install the IBM Cloud CLI. After these two steps are finished the builded App should be pushed to the IBM Cloud.
I could just simply run all steps sequentially like this:
name: Deploy
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout#v2
- name: Use Node.js 10.X
uses: actions/setup-node#v1
with:
node-version: '10.x'
- name: Cache Node Modules
uses: actions/cache#v2
env:
cache-name: cache-node-modules
with:
# npm cache files are stored in `~/.npm` on Linux/macOS
path: ~/.npm
key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-build-${{ env.cache-name }}-
${{ runner.os }}-build-
${{ runner.os }}-
- name: Install Dependencies
run: npm ci
- name: Build Page
run: npm run build
- name: Install IBM Cloud CLI
run: curl -fsSL https://clis.cloud.ibm.com/install/linux | sh
shell: bash
- name: Install Cloud Foundry CLI
run: ibmcloud cf install
shell: bash
- name: Authenticate with IBM Cloud CLI
run: ibmcloud login --apikey "${{ secrets.IBM_CLOUD_API_KEY }}" --no-region -g Default
shell: bash
- name: Target a Cloud Foundry org and space
run: ibmcloud target --cf-api "${{ secrets.IBM_CLOUD_CF_API }}" -o "${{ secrets.IBM_CLOUD_CF_ORG }}" -s "${{ secrets.IBM_CLOUD_CF_SPACE }}"
shell: bash
- name: Deploy to Cloud Foundry
run: ibmcloud cf push
shell: bash
But in my opinion this is very ugly and can be improved. So I tried to split the job into 3 parts: build, predeploy and deploy. The build job installs and builds the Vue App. The Predeploy job install the IBM CLI. These two jobs doesn't depend on each other so they can be parallized. But the last job, deploy, depends on both so I added the needs: [build, predeploy] value to it. So I have the following workflow to archive this:
### This will not work!
name: Deploy
on:
push:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout#v2
- name: Use Node.js 10.X
uses: actions/setup-node#v1
with:
node-version: '10.x'
- name: Cache Node Modules
uses: actions/cache#v2
env:
cache-name: cache-node-modules
with:
# npm cache files are stored in `~/.npm` on Linux/macOS
path: ~/.npm
key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-build-${{ env.cache-name }}-
${{ runner.os }}-build-
${{ runner.os }}-
- name: Install Dependencies
run: npm ci
- name: Build Page
run: npm run build
predeploy:
runs-on: ubuntu-latest
defaults:
run:
shell: bash
steps:
- name: Install IBM Cloud CLI
run: curl -fsSL https://clis.cloud.ibm.com/install/linux | sh
- name: Install Cloud Foundry CLI
run: ibmcloud cf install
- name: Authenticate with IBM Cloud CLI
run: ibmcloud login --apikey "${{ secrets.IBM_CLOUD_API_KEY }}" --no-region -g Default
- name: Target a Cloud Foundry org and space
run: ibmcloud target --cf-api "${{ secrets.IBM_CLOUD_CF_API }}" -o "${{ secrets.IBM_CLOUD_CF_ORG }}" -s "${{ secrets.IBM_CLOUD_CF_SPACE }}"
deploy:
needs: [build, predeploy]
runs-on: ubuntu-latest
steps:
- name: Deploy to Cloud Foundry
# Error: 'ibmcloud: command not found'
run: ibmcloud cf push
shell: bash
Which looks on the GUI like:
[![My GitHub Workflow on the GUI][1]][1]
But this workflow will error since the last job doesn't share the same environment as the other jobs. I am aware that I could use the up/download Artifact feature of GitHub Actions but this seems to me like using a lot of resources. But I dont want to use a lot of ressources for my pipeline, I dont need a lot of different virtual environments or build matrixes. (I know they are very good for large projects, but they seem a little overkill for my little site)
So here are my two final Questions:
Why is parallelism in CI/CD often complication and not straight forward?
How can I improve my current pipeline with parallelism and without redundant executions?
I am glad about every helpful advice or link. Thank you. :)
[1]: https://i.stack.imgur.com/qEqLs.png
I think your original workflow was already pretty efficient. As you mentioned, different jobs are executed on different runners and sometime the additional complexity and effort put into the synchronization/logic between workflows outweighs the benefits of parallelism. In your case I don't think it would make much sense to run your jobs in parallel.
For your first question, I don't think it's an issue specific to CI/CD pipelines. I am getting a bit out of scope here but you have similar issues in any code that does work in parallel or as a matter of fact in any work in general that is done in parallel anywhere. Being factories, teams, code, CI pipelines, as soon as the work is split up, there will be some sort of mechanism to manage the allocation of work and track its progress. Which will make it more complex.
Why GH workflows might seem less straightforward than other systems seem to be a better question and I think it comes does to how long it has been around. It's a pretty recent addition to github and as new features are progressively being added it gets easier and easier to work with.
Regarding other optimizations for your workflow, I would recommend trying to avoid redoing the same work every time the workflow run if it's not needed. You already do this with the cache action for npm. But you could, for example build a docker image, or even better an action ,with your IBM CLI in it and remove the pre-deploy stage entirely. Simply having:
name: Deploy
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout#v2
- name: Use Node.js 10.X
uses: actions/setup-node#v1
with:
node-version: '10.x'
- name: Cache Node Modules
uses: actions/cache#v2
env:
cache-name: cache-node-modules
with:
# npm cache files are stored in `~/.npm` on Linux/macOS
path: ~/.npm
key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-build-${{ env.cache-name }}-
${{ runner.os }}-build-
${{ runner.os }}-
- name: Install Dependencies
run: npm ci
- name: Build Page
run: npm run build
- name: Deploy to Cloud Foundry
uses: my-action:v1
with:
api-key: ${{ secrets.IBM_CLOUD_API_KEY }}
cf-api: ${{ secrets.IBM_CLOUD_CF_API }}
cf-org: ${{ secrets.IBM_CLOUD_CF_ORG }}
cf-space: ${{ secrets.IBM_CLOUD_CF_SPACE }}

Resources