I have been using Antora 2.3 for some time now, and as the documentation site grew I had to add more source repositories to my playbook.yaml file.
The playbook currently lists 50-60 Git repositories, each around 15 MB in size.
Since then, site generation crashes while cloning the repositories listed under content.sources[] of playbook.yaml in parallel (i.e. the Antora command exits in the Linux shell without printing any error).
I tried allocating more memory, but ran into the same issue:
node --max-old-space-size=16384 `which antora` --cache-dir=./.cache/antora --generator custom-generate-site playbook.yaml --stacktrace
Context
I want to run a bash script during the building stage of my CI.
So far, the macOS build works fine and the Unix build is in progress, but I cannot execute the scripts in my Windows build stage.
Runner
We run a local GitLab runner on Windows 10 Home, where WSL is configured and Bash for Windows is installed and working:
Bash executing in Windows powershell
Gitlab CI
Here is a small example that highlights the issue.
gitlab-ci.yml
stages:
  - test
  - build

build-test-win:
  stage: build
  tags:
    - runner-qt-windows
  script:
    - ./test.sh
test.sh
#!/bin/bash
echo "test OK"
Job
Running with gitlab-runner 13.4.1 (e95f89a0)
on runner qt on windows 8KwtBu6r
Resolving secrets 00:00
Preparing the "shell" executor 00:00
Using Shell executor...
Preparing environment 00:01
Running on DESKTOP-5LUC498...
Getting source from Git repository
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in C:/Gitlab-Ci/builds/8KwtBu6r/0/<company>/projects/player-desktop/.git/
Checking out f8de4545 as 70-pld-demo-player-ecran-player...
Removing .qmake.stash
Removing Makefile
Removing app/
Removing business/
Removing <company>player/
git-lfs/2.11.0 (GitHub; windows amd64; go 1.14.2; git 48b28d97)
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:02
$ ./test.sh
Cleaning up file based variables 00:01
Job succeeded
Issue
As you can see, the echo message "test OK" is not visible in the job output.
Nothing seems to be executed but no error is shown and running the script on the Windows device directly works fine.
In case you are wondering, this is a Qt application built via qmake, make and deployed using windeployqt in a bash script (where the issue is).
Any tips or help would be appreciated.
Edit: The deploy script contains about 30 lines, which would make the gitlab-ci yaml file hard to read if the commands were put directly in the yaml instead of in an external shell script executed during the CI.
Executing the script from the Windows env
This may happen because GitLab opens a new window to execute bash, so stdout is not captured.
You can try file-system-based methods to check the execution result, such as echoing to a file. Artifacts can be specified with wildcards, for example **/*.zip.
I also tested this on my Windows machine. First, if I run ./test.sh in PowerShell, a dialog prompts me to select which program to open it with; the default is Git Bash. That means some executable is probably already associated with .sh files on your machine (it is worth finding out which one).
I also tried in powershell:
bash -c "mnt/c/test.sh"
and it prints test OK as expected, without opening a new window.
So I suggest you try bash -c "some/path/test.sh" on your gitlab.
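The two suggestions above (calling bash explicitly and checking the result through a file) can be combined in a minimal sketch like the one below. It assumes bash is on the PATH; the temporary directory and the file names test.sh and test-output.log are placeholders for illustration, and the script path is passed to bash directly, which is a variant of the bash -c call.

```shell
#!/bin/bash
# Sketch: run a script explicitly through bash and keep a copy of its output.
# All paths and file names below are illustrative placeholders.

workdir="$(mktemp -d)"

# Stand-in for the real deploy script.
cat > "$workdir/test.sh" <<'EOF'
#!/bin/bash
echo "test OK"
EOF

# Call bash explicitly instead of relying on the .sh file association,
# and redirect stdout and stderr to a log file that can be inspected later.
bash "$workdir/test.sh" > "$workdir/test-output.log" 2>&1

# Print the captured output so it also shows up in the job log.
cat "$workdir/test-output.log"
```

If the job log still stays empty, the log file can be declared as an artifact so that the output is at least available for download.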
Motivation
My main goal is this: Within a pipeline, I would like to reuse as much as possible (i.e. not build the conda environment multiple times, if all jobs share the same environment).
In my project, I use conda as dependency manager and GitLab CI/CD for continuous integration. For the sake of simplicity, let's say I have a build job and a test job. The most straightforward approach would be to create the conda environment from the environment.yml in every job and then do the actual work. This adds an overhead of several minutes to each job. It also seems wasteful to me, since I would like to build the environment once in the build job and then reuse it in my test job (especially when creating multiple jobs for different tests).
Research Results
The first thing I need to do is to set the CONDA_ENVS_PATH to somewhere in my project directory.
I've looked at GitLab's caching mechanism, but found that it only helps for the same job in repeated runs of the same pipeline, not for different jobs of the same run within a pipeline.
I've also looked at GitLab's artifacts mechanism, but found that, due to the upload and download of the artifacts, they don't reduce run time significantly (basically I only save time by not downloading many small packages and not having to compile them again, but lose time by compressing and decompressing them).
I've also tried to make use of the GIT_CLEAN_FLAGS by setting them to none in my test job. That way, the conda environment is not deleted when getting the latest data from git. This does cause a serious speedup in my pipelines, but it does not work all the time. Some jobs fail, not finding the conda environment. A simple rerun does magically work however. Of course, in a CI / CD setting, this nondeterminism is not practical.
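One way to make the GIT_CLEAN_FLAGS approach less fragile might be to rebuild the environment whenever it is missing or its definition has changed, rather than assuming it is there. The following is a minimal sketch under several assumptions: the helper name ensure_env and the stamp file .env-hash are made up for illustration, sha256sum is available on the runner, and the create command is passed in by the caller.

```shell
#!/bin/bash
# Sketch: rebuild an environment directory only when it is missing
# or its definition file has changed since the last build.
# Usage: ensure_env ENV_FILE ENV_DIR CREATE_CMD [ARGS...]
# (ensure_env and the .env-hash stamp file are hypothetical names.)
ensure_env() {
  local env_file="$1" env_dir="$2"
  shift 2
  local stamp="$env_dir/.env-hash"
  local current

  # sha256sum prints "<hash>  <file>"; keep only the hash.
  current="$(sha256sum "$env_file" | cut -d' ' -f1)"

  if [ -d "$env_dir" ] && [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$current" ]; then
    echo "environment is up to date, reusing $env_dir"
    return 0
  fi

  # Environment missing or definition changed: rebuild from scratch.
  rm -rf "$env_dir"
  "$@" || return 1
  mkdir -p "$env_dir"
  echo "$current" > "$stamp"
}

# Possible call in a job script (the env path is a placeholder):
#   ensure_env environment.yml "$CI_PROJECT_DIR/.conda/env" \
#     conda env create -f environment.yml -p "$CI_PROJECT_DIR/.conda/env"
```

With this, a job that lands on a runner without the environment pays the creation cost once instead of failing, at the price of not knowing in advance how long a given job will take.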
As a workaround to the original question, we've come up with an intermediate solution. We introduce a docker image, holding our custom environment, through a minimal Dockerfile and a couple of changes to our .gitlab-ci.yml (see below for an example). By only executing the job that builds our custom docker image when the dockerfile or environment changed, we save valuable time on each run. At the same time, we keep the full flexibility in our environment definition and can adjust it exactly how we usually would: by changing the environment.yml.
Question
All solutions tried so far are not really satisfactory. Thus my question is: How can my test job reuse the same conda environment as my build job in gitlab-ci?
In case someone else would like to use a similar setup: Here is my current approach:
# Dockerfile
FROM continuumio/miniconda3:latest
COPY environment.yml .
RUN conda env create -f environment.yml
ENTRYPOINT [""]
# .gitlab-ci.yml
# Use the latest version of this project's docker file
# This will be the default image for all jobs unless specified otherwise
image: $CI_REGISTRY_IMAGE:latest

# Change cache directories to be inside the project directory since we can
# only cache local items.
variables:
  PRE_COMMIT_HOME: "${CI_PROJECT_DIR}/.cache/pre-commit"

stages:
  - build
  - test

# Make conda environment available to all jobs
# This expects the conda environment to have the same name as the gitlab project path
# Avoid dashes and other non-alphabetical characters
default:
  before_script:
    - source activate "${CI_PROJECT_NAME}"

# Build the docker image including the correct conda environment for subsequent jobs
# This assumes a docker image registry being configured for your gitlab instance
dockerimage:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  rules:
    - changes:
        - Dockerfile
        - environment.yml
  before_script: [ ]
  script:
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:latest

# Run pytest
pytest:
  stage: test
  script:
    - conda install pytest-cov
    - pytest tests --cov=src
Edit: I replaced the example code for using GIT_CLEAN_FLAGS with our most recent approach: using a custom docker image.
Disclaimer: I saw this and this question, but they are both dated, don't have a satisfying answer and I only found them after writing this question, so I hope my additional question increases discoverability of the topic.
I would like to verify that an rpm is available from Nexus 3 after it is uploaded.
When an rpm is uploaded to Nexus 3, the following events happen (looking at the logs):
Scheduling rebuild of yum metadata to start in 60 seconds
Rebuilding yum metadata for repository rpm
...
Finished rebuilding yum metadata for repository rpm
This takes a while. In my CI pipeline I would like to check periodically until the artifact is available to be installed.
The pipeline builds the rpm, it uploads it to Nexus 3 and then checks every 10 seconds whether the rpm is available. In order to check the availability of the rpm I'm performing the following command:
yum clean all && yum --disablerepo="*" --enablerepo="the-repo-I-care-about" list --showduplicates | grep <name_of_artifact> | grep <expected_version>
The /etc/yum.conf contains:
cachedir=/var/cache/yum/$basearch/$releasever
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
exactarch=1
obsoletes=1
gpgcheck=1
plugins=1
installonly_limit=5
distroverpkg=centos-release
http_caching=none
The /etc/yum.repos.d/repo-i-care-about.repo contains:
[repo-i-care-about]
name=Repo I care about
enabled=1
gpgcheck=0
baseurl=https://somewhere.com
metadata_expire=5
mirrorlist_expire=5
http_caching=none
The problem I'm experiencing is that the list response seems to return stale information.
The metadata rebuild takes about 70 seconds (the initial 60-second delay is configurable and I will tweak it eventually), and I check every 10 seconds. The response from the yum repo sometimes looks as if it is cached somewhere; when that happens, the same search on another box with the same repo settings returns the expected artefact version.
The fact that on another machine I get the expected result on the first attempt given the specific list command and the fact that the machine where I check every 10 seconds seems to never receive the expected result (even after several minutes since the artefact is available on a different box) makes me think that the response gets cached.
I would like to avoid waiting 90 seconds or so before making the first list request (to make sure that the very first time I perform the list command the artefact is most likely ready and I don't cache the result), especially because the initial delay of the scheduling of the metadata might change (from 60 seconds we might change to a lower value).
The flakiness of this check improved after I added http_caching=none to yum.conf and to the repo definition, but that did not make the problem go away reliably.
Are there any other caching-related settings I am supposed to configure in order to get more reliable results from the list command? At this point I really don't care how long the list command takes, as long as it does not return stale information.
Looks like deleting the /var/cache/yum/* folders is making the check more reliable. Still, it feels like I'm missing some settings to achieve what I need in a neater way.
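The periodic check described above could be wrapped in a small retry helper. This is only a sketch: poll_until and rpm_available are hypothetical names, REPO, NAME and VERSION are placeholders to be set by the pipeline, and clearing /var/cache/yum/* before each attempt simply mirrors the observation above rather than a documented fix.

```shell
#!/bin/bash
# Sketch: retry a check command until it succeeds or the attempts run out.
# Usage: poll_until MAX_ATTEMPTS INTERVAL_SECONDS CHECK_CMD [ARGS...]
poll_until() {
  local max_attempts="$1" interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      echo "check succeeded on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed"
    # Only wait if another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical check: drop the local yum cache, then look for the
# expected version in the repository listing.
rpm_available() {
  rm -rf /var/cache/yum/*
  yum --disablerepo="*" --enablerepo="$REPO" list --showduplicates \
    | grep "$NAME" | grep -q "$VERSION"
}

# Possible call: up to 18 attempts, 10 seconds apart.
#   poll_until 18 10 rpm_available
```

Keeping the retry logic separate from the check makes it easy to change the interval or the number of attempts when the metadata rebuild delay is tuned.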
I want to share a file between two jobs and modify it if there are changed files. The Python script compares the cache.json file with the changes and sometimes modifies the cache file.
.gitlab-ci.yaml:
image: ubuntu

stages:
  - test

cache:
  key: one-cache
  paths:
    - cache.json

job1:
  stage: test
  script:
    # - touch cache.json
    - cat cache.json
    - python3 modify_json_file.py
    - cat cache.json
The problem is that the cache.json file does not exist at the next job run. I get the error message: cat: cache.json: No such file or directory. I also added the touch command once, but that doesn't change anything for the next run without the touch command.
Am I doing something wrong, or do I misunderstand how the cache in GitLab works?
I think you need artifacts and not cache.
From cache vs artifact:
cache - Use for temporary storage for project dependencies. Not useful for keeping intermediate build results, like jar or apk files. Cache was designed to be used to speed up invocations of subsequent runs of a given job, by keeping things like dependencies (e.g., npm packages, Go vendor packages, etc.) so they don't have to be re-fetched from the public internet. While the cache can be abused to pass intermediate build results between stages, there may be cases where artifacts are a better fit.
artifacts - Use for stage results that will be passed between stages. Artifacts were designed to upload some compiled/generated bits of the build, and they can be fetched by any number of concurrent Runners. They are guaranteed to be available and are there to pass data between jobs. They are also exposed to be downloaded from the UI.
I recently encountered a GitLab pipeline issue where my node_modules weren't being updated with newer versions of a library (in particular my own internal fork of a project, which uses the git+url syntax). I suspect that, as the git+url doesn't have a version number in it, it's tricky to hash the package file and detect that there is a change...
My workaround was to try and put a $date entry in the cache entry of my .gitlab-ci.yml file, so that the cache is lost every 24 hours. However there is no CI variable listed which contains a date, and it doesn't seem that you can access OS variables everywhere in the yaml file. Is there a neat trick I can use?
I tried:
cache:
  key: "$(date +%F)" # or see: https://gitlab.msu.edu/help/ci/variables/README.md
  paths:
    - node_modules

before_script:
  - echo Gitlab job started $(date)
This doesn't seem to work. I think it just uses the key string verbatim, although note that the $(date) in the script's echo command is expanded.
Does anyone have any neat ideas? For now, I am just using a manual string, and will add a digit when I want to force the cache to be discarded (although this is a bit error-prone).
At this time there is no way to set the cache expiration time for CI jobs. If the cache is using too much disk space and you're using the Docker executor, you can explore a tool such as https://gitlab.com/gitlab-org/gitlab-runner-docker-cleanup which will keep X amount of disk space free on the runner at any given time by expiring older cache.
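Since the cache key cannot run shell commands, one possible workaround is to expire the cached directory from inside the job script instead. The sketch below is an assumption-laden illustration, not a GitLab feature: expire_daily_cache and the stamp file name are made up, and the stamp file would have to be listed under cache:paths next to node_modules so that it survives between runs.

```shell
#!/bin/bash
# Sketch: empty a cached directory when its date stamp is not today's date.
# Usage: expire_daily_cache CACHE_DIR STAMP_FILE [TODAY]
# (expire_daily_cache and the stamp file are hypothetical names.)
expire_daily_cache() {
  local cache_dir="$1" stamp="$2"
  # TODAY defaults to the current date in YYYY-MM-DD form.
  local today="${3:-$(date +%F)}"

  if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$today" ]; then
    echo "cache in $cache_dir is from today, keeping it"
    return 0
  fi

  # Stamp missing or stale: start from an empty directory and a fresh stamp.
  rm -rf "$cache_dir"
  mkdir -p "$cache_dir"
  echo "$today" > "$stamp"
}

# Possible call at the start of before_script:
#   expire_daily_cache node_modules .node_modules-date
```

The first job of each day then reinstalls everything, while later jobs on the same day keep using the restored cache.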