Failing to activate conda in Azure DevOps pipeline - YAML

I am testing an Azure DevOps pipeline for a Python project built with conda:
jobs:
- job: pre_build_setup
  displayName: Pre Build Setup
  pool:
    vmImage: 'ubuntu-18.04'
  steps:
  - bash: echo "##vso[task.prependpath]$CONDA/bin"
    displayName: Add conda to PATH
- job: build_environment
  displayName: Build Environment
  dependsOn: pre_build_setup
  steps:
  - script: conda env create --file environment.yml --name build_env
    displayName: Create Anaconda environment
  - script: conda env list
    displayName: environment installation verification
- job: unit_tests
  displayName: Unit Tests
  dependsOn: build_environment
  strategy:
    maxParallel: 2
  steps:
  - bash: conda activate build_env
The last step, - bash: conda activate build_env, fails with the following error:
Script contents:
conda activate build_env
========================== Starting Command Output ===========================
/bin/bash --noprofile --norc /home/vsts/work/_temp/d5af1b5c-9135-4984-ab16-72b82c91c329.sh
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
- fish
- tcsh
- xonsh
- zsh
- powershell
See 'conda init --help' for more information and options.
IMPORTANT: You may need to close and restart your shell after running 'conda init'.
##[error]Bash exited with code '1'.
Finishing: Bash
How can I activate conda? It seems the path is wrong, so conda cannot be found.

CommandNotFoundError: Your shell has not been properly configured to
use 'conda activate'.
The issue here is that your script runs in a sub-shell, but conda has not been initialized in that sub-shell.
You need to change your activation script to:
steps:
- task: Bash@3
  inputs:
    targetType: 'inline'
    script: |
      eval "$(conda shell.bash hook)"
      conda activate build_env
  displayName: Activate
In addition, do not split adding conda to PATH, creating the environment, and activating the environment into separate jobs.
In Azure DevOps pipelines, the agent job is the basic unit of execution, and each agent job has its own independent running environment.
In this scenario you are using a Microsoft-hosted agent to run your scripts.
When an agent job starts, the pool assigns a VM to it, and that VM is recycled once the job finishes. When the next agent job starts, a completely new VM is assigned.
dependsOn can only share files and pass variables between jobs; it cannot keep the same VM alive for the next job.
You can probably guess the problem you will hit next: even if the activate script succeeds, you will face another error, Could not find conda environment: build_env. That is because the activate script runs on a brand-new VM; the VM the previous build_environment job used has already been recycled by the system.
So, do not split creating the environment and activating it into two agent jobs:
- job: build_environment
  displayName: Build Environment
  dependsOn: pre_build_setup
  steps:
  - script: conda env create --file environment.yml --name build_env
    displayName: Create Anaconda environment
  - script: conda env list
    displayName: environment installation verification
  - task: Bash@3
    inputs:
      targetType: 'inline'
      script: |
        eval "$(conda shell.bash hook)"
        conda activate build_env
    displayName: Activate
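For completeness, a minimal sketch of what a fully merged pipeline could look like, with the Add conda to PATH step pulled into the same job. The job name build_and_test and the pytest command are illustrative assumptions, not from the original pipeline:

jobs:
- job: build_and_test
  displayName: Build Environment And Run Tests
  pool:
    vmImage: 'ubuntu-18.04'
  steps:
  - bash: echo "##vso[task.prependpath]$CONDA/bin"
    displayName: Add conda to PATH
  - script: conda env create --file environment.yml --name build_env
    displayName: Create Anaconda environment
  - task: Bash@3
    inputs:
      targetType: 'inline'
      script: |
        eval "$(conda shell.bash hook)"
        conda activate build_env
        # run the unit tests inside the activated environment, e.g.:
        # pytest tests/
    displayName: Activate and test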

There's one more approach proposed by Microsoft which seems to be more robust.
In every step where you want the environment to be activated, you should run
source $CONDA/bin/activate <myEnv>
or just
source activate <myEnv>
if you've already added $CONDA/bin to the PATH variable. You may check the link above to find examples for Ubuntu, macOS and Windows.
In your case it would look as follows:
steps:
- task: Bash@3
  inputs:
    targetType: 'inline'
    script: source activate build_env
  displayName: Activate
Important note: as of now, if you pass the name of the environment to the activate script, the environment must be created in the same job. However, if you use a prefix (i.e. the path to the environment directory), it does not matter.
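For illustration, a minimal sketch of the prefix-based variant, assuming $CONDA/bin is already on the PATH and using a hypothetical ./conda_env directory:

steps:
- script: conda env create --file environment.yml --prefix ./conda_env
  displayName: Create Anaconda environment by prefix
- task: Bash@3
  inputs:
    targetType: 'inline'
    # activate by path instead of by name
    script: source activate ./conda_env
  displayName: Activate by prefix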

Related

Gitlab CI/CD fail with "bash: line 132: go: command not found"

We have installed GitLab on our own server. We want to use the GitLab CI/CD pipeline to build and release our software, and I am working on a POC for that. I have created a project with the following .gitlab-ci.yml:
variables:
  GOOS: linux
  GOARCH: amd64
stages:
  - test
  - build
  - deb-build
run_tests:
  stage: test
  image: golang:latest
  before_script:
    - go mod tidy
  script:
    - go test ./...
build_binary:
  stage: build
  image: golang:latest
  artifacts:
    untracked: true
  script:
    - GOOS=$GOOS GOARCH=$GOARCH go build -o newer .
build deb:
  stage: deb-build
  image: ubuntu:latest
  before_script:
    - mkdir -p deb-build/usr/local/bin/
    - chmod -R 0755 deb-build/*
    - mkdir build
  script:
    - cp newer deb-build/usr/local/bin/
    - dpkg-deb --build deb-build release-1.1.1.deb
    - mv release-1.1.1.deb build
  artifacts:
    paths:
      - build/*
TL;DR: I have updated the .gitlab-ci.yml above; the logs of the failing and passing runs are in the gists linked below.
What I have noticed: the error persists when I use the shared runner (GJ7z2Aym). If I register my own runner (i.e. a specific runner) with
gitlab-runner register --non-interactive --url "https://gitlab.sboxdc.com/" --registration-token "<register_token>" --description "" --executor "docker" --docker-image "docker:latest"
the build passes without any problem.
Failed case.
https://gist.github.com/meetme2meat/0676c2ee8b78b3683c236d06247a8a4d
One that Passed
https://gist.github.com/meetme2meat/058e2656595a428a28fcd91ba68874e8
The failing job uses a runner with a shell executor, which was probably set up when you configured your GitLab instance. This can be seen in the logs:
Preparing the "shell" executor
Using Shell executor...
The shell executor ignores your job's image: config. It runs the job script directly on the machine hosting the runner and tries to find the go binary on that machine (failing in your case). It's a bit like running go commands on an Ubuntu machine that doesn't have go installed.
Your successful job uses a runner with a docker executor, which runs your job's script in a golang:latest image as you requested. It's like running docker run golang:latest sh -c '[your script]'. This can be seen in the job's logs:
Preparing the "docker" executor
Using Docker executor with image golang:latest ...
Pulling docker image golang:latest ...
Using docker image sha256:05e[...]
golang:latest with digest golang@sha256:04f[...]
What you can do:
Make sure you configure a runner with a docker executor. Your config.toml would then look like:
[[runners]]
  # ...
  executor = "docker"
  [runners.docker]
    # ...
It seems you already did this by registering your own runner.
Configure your jobs to use this runner via job tags. You can set the tag docker_executor on your Docker runner (when registering or via the GitLab UI) and set up something like:
build_binary:
  stage: build
  # Tags a runner must have to run this job
  tags:
    - docker_executor
  image: golang:latest
  artifacts:
    untracked: true
  script:
    - GOOS=$GOOS GOARCH=$GOARCH go build -o newer .
See Runner registration and Docker executor for details.
Since you have used image: golang:latest, go should be on the $PATH.
You need to check at which stage it fails: run_tests or build_binary.
Add echo $PATH to your script steps to check which $PATH is actually being used (a minimal example is sketched below).
Also check whether the error comes from a missing git, which Go uses to access remote module repositories. See this answer as an example.
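For instance, a minimal debugging variant of the test job; the extra which checks are illustrative additions, not part of the original pipeline:

run_tests:
  stage: test
  image: golang:latest
  before_script:
    # print the PATH and confirm go and git are available inside the executor
    - echo $PATH
    - which go || echo "go not found"
    - which git || echo "git not found"
    - go mod tidy
  script:
    - go test ./...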
From your gists, the default GitLab runner uses a shell executor (which knows nothing about Go)
Instead, the second one uses a Docker executor, based on the Go image.
Registering the (Docker) runner is therefore the right way to ensure the expected executor.

Permission error when trying to restore cached Conda environment in GitHub Actions

I am trying to write a GitHub Actions workflow to cache my Conda environment, such that in subsequent runs, the CI/CD can skip the protracted Conda dependency resolving step and just restore what it needs from cache. (Note: I only care about caching the environment; I'm not interested in caching the Conda or pip package caches.)
From reading the documentation for actions/cache and conda-incubator/setup-miniconda, this is what I have come up with (running on GitHub-hosted Ubuntu):
- name: Setup Conda
  uses: conda-incubator/setup-miniconda@v2
  with:
    activate-environment: MY_ENV
    auto-activate-base: false
- name: Establish Conda environment directory
  id: conda_env
  run: |
    conda info --json | jq -r '"::set-output name=dir::\(.envs_dirs[0])"'
- name: Setup Conda environment caching
  id: cache
  uses: actions/cache@v3
  env:
    # Increase this to manually invalidate the cache
    CACHE_NUMBER: 0
  with:
    path: ${{ steps.conda_env.outputs.dir }}
    key: ${{ runner.os }}-conda-${{ hashFiles('environment.yml', 'requirements.txt') }}-${{ env.CACHE_NUMBER }}
- name: Update Conda environment
  run: conda env update -n MY_ENV -f environment.yml
  if: steps.cache.outputs.cache-hit != 'true'
In the first run through, this appears to work just fine: It sets up the cache directory correctly to where conda-incubator/setup-miniconda is keeping its environments (by default this seems to be /usr/share/miniconda/envs); Conda does the dependency resolution and installation of packages; then, at the end of the pipeline, actions/cache archives and uploads about 2GiB of data.
In subsequent runs, the cache key is correctly identified and it starts downloading. However, the unpacking fails with the following permission errors:
/usr/bin/tar: ../../../../../usr/share/miniconda/envs: Cannot utime: Operation not permitted
/usr/bin/tar: ../../../../../usr/share/miniconda/envs: Cannot change mode to rwxr-xr-x: Operation not permitted
/usr/bin/tar: Exiting with failure status due to previous errors
Warning: Tar failed with error: The process '/usr/bin/tar' failed with exit code 2
/usr/share/miniconda/envs is definitely writeable; that's where those 2GiB of data were written to in the first run. However, only root can change things like the atime and file permissions. I presume I cannot get actions/cache to run its untar step as root. How, therefore, does one get around this?

GitHub -> GCP, use gcloud commands inside shell script

I have a workflow in GitHub that will execute a shell script, and inside this script I need to use gsutil
In my workflow yml-file I have the following steps:
name: Dummy Script
on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    environment: alfa
    env:
      _PROJECT_ID: my-project
    steps:
      - uses: actions/checkout@v2
      - name: Set up Cloud SDK for ${{env._PROJECT_ID}}
        uses: google-github-actions/setup-gcloud@master
        with:
          project_id: ${{env._PROJECT_ID}}
          service_account_key: ${{ secrets.SA_ALFA }}
          export_default_credentials: true
      - run: gcloud projects list
      - name: Run script.sh
        run: |
          path="${GITHUB_WORKSPACE}/script.sh"
          chmod +x $path
          sudo $path
        shell: bash
And the script looks like:
#!/bin/bash
apt-get update -y
gcloud projects list
The second step in the yml (run: gcloud projects list) works as expected, listing the projects SA_USER has access to.
But when running the script in step 3, I get the following output:
WARNING: Could not open the configuration file: [/root/.config/gcloud/configurations/config_default].
ERROR: (gcloud.projects.list) You do not currently have an active account selected.
Please run:
$ gcloud auth login
to obtain new credentials.
If you have already logged in with a different account:
$ gcloud config set account ACCOUNT
to select an already authenticated account to use.
Error: Process completed with exit code 1.
So my question is:
How can I run a shell script file and pass on the authentication I have for my service account so I can run gcloud commands from a script file?
Due to reasons, it is a requirement that the script file can run both locally on developers' computers and from GitHub.
The problem seems to be that the environment variables are not inherited when running with sudo. There are many ways to work around this, but I was able to confirm that it runs with sudo -E. Of course, if you don't need sudo, you should remove it, but I assume it is necessary here.
(The reproduction code made this easy for me to reproduce. Thanks.)
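For reference, a minimal sketch of the adjusted step, assuming only the sudo invocation needs to change:

      - name: Run script.sh
        run: |
          path="${GITHUB_WORKSPACE}/script.sh"
          chmod +x $path
          # -E preserves the environment, including the credentials exported by setup-gcloud
          sudo -E $path
        shell: bash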

How do you execute a script after a job is cancelled in GitLab?

I am looking for a way to clean up the runner after a job has been cancelled in GitLab. We often have to cancel running jobs because the runner sometimes gets stuck in the test pipeline, and I can imagine a couple of other scenarios where you would want to cancel a job and run a clean-up script afterwards. I am looking for something like after_script, but only for the case when a job was cancelled.
I checked the GitLab keyword reference but could not find what I need.
The following part of my gitlab-ci.yaml shows the test stage, which I would like to shut down gracefully by calling docker-compose down when the job is cancelled.
I am using a single gitlab-runner. Also, I don't use dind.
test_stage:
  stage: test
  only:
    - master
    - tags
    - merge_requests
  image: registry.gitlab.com/xxxxxxxxxxxxxxxxxxx
  variables:
    HEADLESS: "true"
  script:
    - docker login -u="xxxx" -p="${QUAY_IO_PASSWORD}" quay.io
    - npm install
    - npm run test
    - npm install wait-on
    - cd example
    - docker-compose up --force-recreate --abort-on-container-exit --build traefik frontend &
    - cd ..
    - apt install -y iproute2
    - export DOCKER_HOST_IP=$( /sbin/ip route|awk '/default/ { print $3 }' )
    - echo "Waiting for ${DOCKER_HOST_IP}/xxxt"
    - ./node_modules/.bin/wait-on "http://${DOCKER_HOST_IP}/xxx" && export BASE_URL=http://${DOCKER_HOST_IP} && npx codeceptjs run-workers 2
    - cd example
    - docker-compose down
  after_script:
    - cd example && docker-compose down
  artifacts:
    when: always
    paths:
      - /builds/project/tests/output/
  retry:
    max: 2
    when: always
  tags: [custom-runner]
Unfortunately this is not currently possible in GitLab. There have been several tickets opened in their repos, with this one being the most up-to-date.
As of the day that I'm posting this (September 27, 2022), there have been at least 14 missed deliverables for this. GitLab continues to say it's coming, but has never delivered it in the six years that this ticket has been open.
There are mitigations for automatic job cancellation, but unfortunately they will not help in your case.
Based on your use case, I can think of two different solutions:
Create a wrapper script that detects when parts of your test job are hanging
Set a timeout on the pipeline (in GitLab go to Settings -> CI/CD -> General Pipelines -> Timeout); a job-level timeout, sketched below, is another option
Neither of these solutions is as robust as a native GitLab feature, but they can at least prevent a job from hanging forever and clogging up everything else in the pipeline.
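As a sketch of that second option, GitLab also accepts a timeout keyword directly on a job; the 30-minute value below is an arbitrary assumption:

test_stage:
  stage: test
  # Fail the job automatically if it runs longer than this,
  # so a stuck test run cannot block the runner indefinitely
  timeout: 30 minutes
  script:
    - npm run test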

Gitlab CI/CD runner : mvn command not found

Maven is properly installed on my gitlab-runner server. Running mvn clean directly in my repo works, but when I run my pipeline from the GitLab UI I get this error:
bash: line 60: mvn: command not found
ERROR: Job failed: exit status 1
I tried to fix the problem by adding a before_script section to the .gitlab-ci.yml file:
before_script:
  - export MAVEN_HOME=/usr/local/apache-maven
I also added the line:
environment = ["MAVEN_HOME=/usr/local/apache-maven"]
to the config.toml file.
The problem still persists. My executor is shell.
Any advice?
I managed to fix the problem using this workaround:
script:
  - $MAVEN_HOME/bin/mvn clean
Just use the Maven Docker image; add one of the following as the first line:
image: maven:latest, image: maven:3-jdk-10, or image: maven:3-jdk-9
refer: https://docs.gitlab.com/ee/ci/examples/artifactory_and_gitlab/
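For instance, a minimal .gitlab-ci.yml sketch using the Maven image; the job name and Maven goals here are illustrative:

image: maven:latest

build_job:
  stage: build
  script:
    # mvn is already on the PATH inside the maven image, so no shell-level setup is needed
    - mvn clean package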
For anyone experiencing similar issues, it might be a good idea to restart the gitlab-runner (.\gitlab-runner.exe restart), especially after fiddling with environment variables.
There is an easier way:
Make the changes in ~/.bash_profile, not ~/.bashrc.
According to this document:
.bashrc is more commonly used for non-login shells
And this document says:
For certain executors, the runner passes the --login flag as shown above, which also loads the shell profile.
So it should not be ~/.bashrc. You can also try ~/.profile, which can hold the same configuration and is also read by other shells.
In my scenario I did the following:
1. Set the gitlab-runner user's password:
passwd gitlab-runner
2. Log in as gitlab-runner:
su - gitlab-runner
3. Make the changes in .bash_profile, adding Maven to the PATH:
$ export M2_HOME=/usr/local/apache-maven/apache-maven-3.3.9
$ export M2=$M2_HOME/bin
$ export PATH=$M2:$PATH
Put these commands in $HOME/.bash_profile (not $HOME/.bashrc, as explained above).
I hope you have already figured this out. I ran into the same problem when I set up CI on my server.
I use shell as the executor for my runner.
Here are the steps to diagnose it:
1. Check the user on the runner server.
If you installed Maven on the runner server successfully, it may only be available for root. You can check the real user running the CI process:
job1:
  stage: test
  script: whoami
In my case, it printed gitlab-runner, not root.
2. su to the real user and check mvn again.
This time it printed the same error as the GitLab CI UI.
3. Install Maven for the real user (see the sketch below) and run the pipeline again.
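A minimal sketch of what step 3 could look like on the runner host, assuming the same Maven install path as in the earlier answer and a bash login shell for the gitlab-runner user:

# run as the gitlab-runner user (after su - gitlab-runner)
echo 'export M2_HOME=/usr/local/apache-maven/apache-maven-3.3.9' >> ~/.bash_profile
echo 'export PATH=$M2_HOME/bin:$PATH' >> ~/.bash_profile
source ~/.bash_profile
mvn -version   # should now resolve mvn from the PATH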
You can also add the following to your .gitlab-ci.yml:
before_script:
  - export PATH=$PATH:/opt/apache-maven-3.8.1/bin
