Hello I am using kedro (a pipeline tool) and want to use github actions to trigger a kedro command (kedro run) whenever I make a push to my github repo.
Since I have all the data in my local repo, I thought it would make sense to run the kedro command on my local machine.
So my question is, is there a way to trigger a local action using github actions? Using self-hosted runners perhaps?
You can run the kedro pipeline directly in gh provided runner using the steps below. I added a script I used previously in here to run kedro lint with every push.
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout#v2
- name: Set up Python 3.7.9
uses: actions/setup-python#v2
with:
python-version: 3.7.9
- uses: actions/cache#v2
with:
path: ${{ env.pythonLocation }}
key: ${{ env.pythonLocation }}-${{ hashFiles('src/requirements.txt') }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r src/requirements.txt
- name: Run Kedro Pipeline
run: |
kedro run
That said, I'm also wondering why you would need to run it on every push. Given compute provided by github actions is likely resource constrained running there might not be the best place. You would also need to keep your data within your repo for this approach to work.
Hi #magical_unicorn I think #avan-sh's answer is correct - what I would also add is that we encourage you to not commit data to VCS / Git. There are some technical limitations such as filesize, but more importantly it's not great security practice when working with teams.
Whilst not necessary on all projects - it might be good practice to explore using some sort of cloud storage for your data, decoupling it from the version control system.
Related
I execute some tests in GitHub Actions using Testcontainers.
Testcontainers pulls the images which are used in my tests. Unfortunately the images are pulled again at every build.
How can I cache the images in GitHub Actions?
There's no official support from GitHub Actions (yet) to support caching pulled Docker images (see this and this issue).
What you can do is to pull the Docker images, save them as a .tar archive and store them in a folder for the GitHub Actions cache action to pick it up.
A sample workflow can look like the following:
build-java-project:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout#v2
- run: mkdir -p ~/image-cache
- id: image-cache
uses: actions/cache#v1
with:
path: ~/image-cache
key: image-cache-${{ runner.os }}
- if: steps.image-cache.outputs.cache-hit != 'true'
run: |
docker pull postgres:13
docker save -o ~/image-cache/postgres.tar alpine
- if: steps.image-cache.outputs.cache-hit == 'true'
run: docker load -i ~/image-cache/postgres.tar
- name: 'Run tests'
run: ./mvnw verify
While this is a little bit noisy, you'd need to adjust the pipeline every time you're depending on a new Docker image for your tests. Also be aware of how to do cache invalidation as I guess if you plan to use a :latest tag, the solution above won't recognize changes to the image.
The current GitHub Actions cache size is 10 GB which should be enough for a mid-size project relying on 5-10 Docker images for testing.
There's also the Docker GitHub Cache API but I'm not sure how well this integrates with Testcontainers.
Today I am dealing with the topic of Github Actions. I am not familiar with the topic of CI.
At GitHub I want to create an action. For the time being I use the boilplate of GitHub. I don't understand what ubuntu-latest jobs: build: runs-on: ubuntu-latest means. In another tutorial I saw self-hosted. On the server I want to deploy is also ubuntu, but that has nothing to do with it, right?
Thank you very much for an answer, feedback, comments and ideas.
GitHub workflow yml
name: CI
# Controls when the workflow will run
on:
# Triggers the workflow on push or pull request events but only for the master branch
push:
branches: [ master ]
pull_request:
branches: [ master ]
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
# This workflow contains a single job called "build"
build:
# The type of runner that the job will run on
runs-on: ubuntu-latest
# Steps represent a sequence of tasks that will be executed as part of the job
steps:
# Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
- uses: actions/checkout#v2
# Runs a single command using the runners shell
- name: Run a one-line script
run: echo Hello, world!
# Runs a set of commands using the runners shell
- name: Run a multi-line script
run: |
echo Add other actions to build,
echo test, and deploy your project.
The runner is the application that runs a job and its steps from a GitHub Actions workflow.
It is used by GitHub Actions in the hosted virtual environments, or you can self-host the runner in your own environment.
Basically, GitHub-hosted runners offer a quicker, simpler way to run your workflows, while self-hosted runners are a highly configurable way to run workflows in your own custom environment.
Quoting the Github documentation:
GitHub-hosted runners:
- Receive automatic updates for the operating system, preinstalled packages and tools, and the self-hosted runner application.
- Are managed and maintained by GitHub.
- Provide a clean instance for every job execution.
- Use free minutes on your GitHub plan, with per-minute rates applied after surpassing the free minutes.
Self-hosted runners:
- Receive automatic updates for the self-hosted runner application only. You are responsible for updating the operating system and all other software.
- Can use cloud services or local machines that you already pay for.
- Are customizable to your hardware, operating system, software, and security requirements.
- Don't need to have a clean instance for every job execution.
Are free to use with GitHub Actions, but you are responsible for the cost of maintaining your runner machines.
You can also see on the link shared above the following table showing the available Github hosted runners with their associated label (such as ubuntu-latest):
So when you informed ubuntu-latest on your workflow, you asked Github to provide a runner to execute all the steps contained in your job implementation (it is not related to the server you wish to deploy, but to the pipeline that will perform the deploy operation (in your case)).
I have a workflow in GitHub that will execute a shell script, and inside this script I need to use gsutil
In my workflow yml-file I have the following steps:
name: Dummy Script
on:
workflow_dispatch:
jobs:
build:
runs-on: ubuntu-latest
environment: alfa
env:
_PROJECT_ID: my-project
steps:
- uses: actions/checkout#v2
- name: Set up Cloud SDK for ${{env._PROJECT_ID}}
uses: google-github-actions/setup-gcloud#master
with:
project_id: ${{env._PROJECT_ID}}
service_account_key: ${{ secrets.SA_ALFA }}
export_default_credentials: true
- run: gcloud projects list
- name: Run script.sh
run: |
path="${GITHUB_WORKSPACE}/script.sh"
chmod +x $path
sudo $path
shell: bash
And the script looks like:
#!/bin/bash
apt-get update -y
gcloud projects list
The 2nd step in yml (run: gcloud projects list) works as expected, listing the projects SA_USER have access to.
But when running the script in step 3, I get the following output:
WARNING: Could not open the configuration file: [/root/.config/gcloud/configurations/config_default].
ERROR: (gcloud.projects.list) You do not currently have an active account selected.
Please run:
$ gcloud auth login
to obtain new credentials.
If you have already logged in with a different account:
$ gcloud config set account ACCOUNT
to select an already authenticated account to use.
Error: Process completed with exit code 1.
So my question is:
How can I run a shell script file and pass on the authentication I have for my service account so I can run gcloud commands from a script file?
Due to reasons, it's a requirement that the script file should be able to run locally on developers computers, and from GitHub.
The problem seemed to be that the environment variables were not inherited when running with sudo. There are many ways to work around this, but I was able to confirm that it would run with sudo -E. Of course, if you don't need to run with sudo, you should remove it, but I guess it's necessary.
(The reproduction code was easy for me to reproduce it. Thanks)
I have a GitHub action that contains some npm and gulp commands and finally runs a Powershell file. I want to publish this GitHub action on the marketplace so that my team can use it. I can't find a solution to this problem anywhere. I checked the publish Github actions docs, there is no related document.
How do I invoke this action externally?
For instance, How do I convert this simple action so that it can be published to the marketplace?
Sample yml code
# This is a basic workflow to help you get started with Actions
name: CI
on:
push:
branches: [ master ]
# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
# This workflow contains a single job called "build"
build:
# The type of runner that the job will run on
runs-on: windows-latest
# Steps represent a sequence of tasks that will be executed as part of the job
steps:
- uses: actions/checkout#v2
- name: Use Node.js
uses: actions/setup-node#v1
with:
node-version: '10.x'
- name: Install dependencies
run: |
npm install
Thank you
The yaml you've posted here is a workflow, not an action. An action is the code behind the things like uses: actions/checkout#v2 (usually JavaScript, can be Dockerized too). If you're only writing YAML, you're just writing a workflow that invokes actions.
If you want to make your own action, check out the docs.
I'm trying to set up a workflow in CircleCI for my React project.
What I want to achieve is to get a job to build the stuff and another one to deploy the master branch to Firebase hosting.
This is what I have so far after several configurations:
witmy: &witmy
docker:
- image: circleci/node:7.10
version: 2
jobs:
build:
<<: *witmy
steps:
- checkout
- restore_cache:
keys:
- v1-dependencies-{{ checksum "package.json" }}
- v1-dependencies-
- run: yarn install
- save_cache:
paths:
- node_modules
key: v1-dependencies-{{ checksum "package.json" }}
- run:
name: Build app in production mode
command: |
yarn build
- persist_to_workspace:
root: .
deploy:
<<: *witmy
steps:
- attach_workspace:
at: .
- run:
name: Deploy Master to Firebase
command: ./node_modules/.bin/firebase deploy --token=MY_TOKEN
workflows:
version: 2
build-and-deploy:
jobs:
- build
- deploy:
requires:
- build
filters:
branches:
only: master
The build job always success, but with the deploy I have this error:
#!/bin/bash -eo pipefail
./node_modules/.bin/firebase deploy --token=MYTOKEN
/bin/bash: ./node_modules/.bin/firebase: No such file or directory
Exited with code 1
So, what I understand is that the deploy job is not running in the same place the build was, right?
I'm not sure how to fix that. I've read some examples they provide and tried several things, but it doesn't work. I've also read the documentation but I think it's not very clear how to configure everything... maybe I'm too dumb.
I hope you guys can help me out on this one.
Cheers!!
EDITED TO ADD MY CURRENT CONFIG USING WORKSPACES
I've added Workspaces... but still I'm not able to get it working, after a loooot of tries I'm getting this error:
Persisting to Workspace
The specified paths did not match any files in /home/circleci/project
And also it's a real pain to commit and push to CircleCI every single change to the config file when I want to test it... :/
Thanks!
disclaimer: I'm a CircleCI Developer Advocate
Each job is its own running Docker container (or VM). So the problem here is that nothing in node_modules exists in your deploy job. There's 2 ways to solve this:
Install Firebase and anything else you might need, on the fly, just like you do in the build job.
Utilize CircleCI Workspaces to carry over your node_modules directory from the build job to the deploy job.
In my opinion, option 2 is likely your best bet because it's more efficient.