Caching Maven dependencies in Gitlab-CI correctly - maven

I have configured and working following setup
gitlab-ci, which uses docker-machine runner and uploads cache to S3
maven build with configured caching
caching correctly loads and uploads on each job
But the problem is, that every time I run mvn install, something in the local maven repository changes (I assume it updates pom metadata) and gitlab runner keeps uploading new versions of the cache, on every single build.
It is still faster and more reliable to use this "busted" cache, than to download the deps from internet every time, but the upload can take a long time and I would like to shave off this extra time.
How can I modify my build to force maven, to generate cacheable local repository?
Simplified version of my .gitlab-ci.yml:
variables:
# we have a custom java+maven image, that uses this ENV variable,
# to auto-configure path where to put the local maven repository
MAVEN_LOCAL_REPOSITORY: $CI_PROJECT_DIR/.cache/maven
job-build:
stage: build
image: internal-gitlab/java/maven:3.6-jdk8-alpine
script:
- mvn -B clean package
cache:
key: backend-dependencies
paths:
- .cache/

You have a constant as a cache key. Maybe a more fine grained cache would help.
See the link here

In short - prepare your own maven image with required dependencies and use it instead of internal-gitlab/java/maven:3.6-jdk8-alpine.
Some details:
First of all, you need to create a maven docker image where all (or most of) required for your project dependencies are presented. Publish it to your registry (gitlab has one) and use it instead of internal-gitlab/java/maven:3.6-jdk8-alpine.
To create such an image I usually create an additional job in CI triggered manually. You need to trigger it at initial stage and when project dependencies are heavily modified.
Working sample can be found here:
https://gitlab.com/alexej.vlasov/syncer/blob/master/.gitlab-ci.yml
- this project is using the prepared image and also it has a job to prepare this image.
https://gitlab.com/alexej.vlasov/maven/blob/master/Dockerfile
- dockerfile to run maven and download dependencies once.
The pros:
don't need to download dependencies each time - they are inside a
docker image (and docker layers are cached on the runners)
don't need to upload artifacts when job is finished

Related

How to cache OWASP dependecy check NVD database on CI

OWASP dependency check it's a great way of automating vulnerability discovery in our projects, though when running it as part of a CI pipeline per project it adds 3-4 minutes just to download the NVD database.
How can we cache this DB when running it with maven / gradle on a CI pipeline?
After a bit of research I found the way!
Basically, the files containing the NVM db are called: nvdcve-1.1-[YYYY].json.gz i.e. nvdcve-1.1-2022.json.gz which are later added to a Lucene index.
When running Dependency-Check with the Gradle plugin the files are created on:
$GRADLE_USER_HOME/.gradle/dependency-check-data/7.0/nvdcache/
When running it with Maven they are created on:
$MAVEN_HOME/.m2/repository/org/owasp/dependency-check-data/7.0/nvdcache/
So to cache this the DB on Gitlab CI you just have to add the following to your .gitlab-ci.yaml (Gradle):
before_script:
- export GRADLE_USER_HOME=`pwd`/.gradle
cache:
key: "$CI_PROJECT_NAME"
paths:
- .gradle/dependency-check-data
The first CI job run will create the cache and the consecutive (from same or different pipelines) will fetch it!

How I can access to a file saved in Gitlab CI/CD artifact in the next stages and increase it's version

I have a spring boot application and I deployed it in the first stage and now I have a jar file.
my question is how to access this jar file in the next stage and how I can run it.
second question is HOW can I increase its version number? for example jar file name is spring.0.0.1.jar and I want to increase version number after every push. is this possible?
First Question:
You can save the jar file as an artifact and access it in the second stages jobs' script area. For example
FirstJob:
stage: FirstStage
scrip:
- <your commands here>
artifacs:
paths:
- ./artifacts/myOutput.jar
Now your "myOutput.jar" is accessible in the artifacts folder for all following jobs. See here: https://docs.gitlab.com/ee/ci/pipelines/job_artifacts.html
Second Question:
As far as I know there is no way of handing down variables between pipelines in GitLab CI so this would not be possible if I'm right. Since artifacts don't get added to the repository, previous artifacts are not available to following pipelines either. Yet, if I had to come up with a solution on the spot, I'd try:
Saving the version number somewhere accessible to every pipeline (cloud, repo)
git push every artifact to actually add it to the repository, then check file name and increment version number
using the GitLab CI release option. The CI can create a release object for you, maybe this could help as well. See here: https://docs.gitlab.com/ee/ci/yaml/README.html#release

Gitlab Runner configuration to ignore folder builded on server

I'm new with Gitlab CI. Every time Gitlab CI run, it replace old folder on server. I have small problem when I want to reduce time Gradle build for project which include DL4J (very big size and take time to build). So I want it keep build folder from last version. I follow this to reduce time build by gradle.
Question: Is that possible to skip some folder by config of gitlab ci to keep it exist. This is my gitlab ci
stages:
- build
something_run:
stage: build
script:
- gradle build
- systemctl restart myproject
tags:
- ml
only:
- master
When it run, gradle will build project and time to build quite long. So I want next time CI run it will not delete last build version.
Take a look at cache (https://docs.gitlab.com/ee/ci/yaml/#cache)
cache is used to specify a list of files and directories which should be cached between jobs.
GitLab CI/CD provides a caching mechanism that can be used to save time when your jobs are running.
See also https://docs.gitlab.com/ee/ci/caching/index.html

Common maven repository for GitLab CI

I hope someone can help me with a simple setup of maven CI scripts for GitLab.
I tried to search stackoverflow and google, which results in several questions and answers, but either they seem to be completely different or not that I understand them.
I have a simple setup of two projects. project B depends on project A (= pom packaging).
I have in the runner configuration /etc/gitlab-runner/config.toml the line with the volumes added
[[runners]]
...
[runners.docker]
...
volumes = ["/cache", "/.m2"]
...
my .gitlab-ci.yml for both projects look like this
image: maven:3.6.1-jdk-12
cache:
paths:
- /.m2/repository
- target/
variables:
MAVEN_OPTS: "-Dmaven.repo.local=/.m2/repository"
maven_job:
script:
- mvn clean install
with this - the first project builds correctly and I can see that the caching is working, as it does not download all maven related plugins for building the project, when executed again and again.
It also states
[INFO] Installing /builds/end2end/projectA/pom.xml to /.m2/repository/de/end2end/projectA/0.4.4-SNAPSHOT/projectA-0.4.4-SNAPSHOT.pom
It reports though at the end
WARNING: /.m2/repository: not supported: outside build directory
WARNING: /.m2/repository/classworlds: not supported: outside build directory
WARNING: /.m2/repository/classworlds/classworlds: not supported: outside build directory
WARNING: /.m2/repository/classworlds/classworlds/1.1-alpha-2: not supported: outside build directory
WARNING: /.m2/repository/classworlds/classworlds/1.1-alpha-2/_remote.repositories: not supported: outside build directory
[...]
When executing projectB, the job fails with the info, that it cannot find projectA.
So - what is wrong with the configuration of the runner / .gitlab-ci.yml files ?
I tried
cache:
paths:
- .m2/repository
which removes the warnings, but then the projectA gets in its local .m2 installed
[INFO] Installing /builds/end2end/projectA/pom.xml to /builds/end2end/projectAt/.m2/repository/de/end2end/projectA/0.4.4-SNAPSHOT/projectA-0.4.4-SNAPSHOT.pom
and projectB fails with the same error as above.
In fact, as described in gitlab doc, you use the dynamic storage so the volume is shared between subsequent runs of the same concurrent job for one project. I you want to share data between projects you must use the host-bound storage.
For the warning, the cache is only for working directory, so absolute path like /.m2/repository is not supported. In your case, you don't have to use cache for maven repository because you use a volume.

Docker multi-stage build and mounting/sharing from previous stage

I would like to use multi-stage builds to avoid downloading all the Maven dependencies required by my Java project every time I build the app.
I am thinking of resolving the Maven dependencies in a first stage, then building the app in a second stage which would require access to the dependencies downloaded in the previous stage.
If I understood well multi-stage builds I could copy files created in the first stage to the second stage, but ideally I would like to be able to "mount" or "share" the folder from the first stage where the dependencies live instead of copying the files, is it possible? Or is there a better way to achieve this?
Thanks.
EDIT:
This was the first stage I was thinking about
FROM some-image-with-maven AS maven-repo
WORKDIR /workspace/
COPY pom.xml .
RUN mvn -B -f pom.xml dependency:resolve
But since the pom file will be different most of the times (because I would like to share this stage across projects), the following step that resolves dependencies will download all of them again (instead of using a cached layer).
You can only copy stuff from the first stage if you are not using volumes. When using volumes, you can share data between stages which are basically separate container instances.
Since missing to clean up volumes is often not handled properly I suggest to keep to the copy strategy. There is no real benefit using bind-mount to share data over the copy approach.
I don't believe there's a way to do this currently. To share from one build stage to the next, the only option is to COPY files from one stage's directory to the current stage.
To use the first stage as a build cache and avoid copying all the dependencies, I'd run your build in that first stage. Or you can make a second intermediate stage that is FROM stage1name if you want additional separation between the stages. The output of your build can then be copied to the final layer, avoiding the need to copy all the build dependencies.
Answering from the future...
If using buildkit or compatible (most people probably are by now), you can mount a previous stage with a bind mount. Something like this would accomplish what the original post was asking:
FROM someimage as build
COPY pom.xml .
RUN mvn -Dmaven.repo.local=/.m2_repository -B -f pom.xml dependency:resolve
FROM runtimeimage
COPY pom.xml .
COPY src/ ./src/
RUN --mount=type=bind,from=build,source=/.m2_repository,target=/.m2_repository \
mvn package
But more to the point, there is also a cache mount that you can use instead and you would incur the cost of downloading all the deps on first run, but subsequent would be able to find those deps in the cache:
FROM runtimeimage
COPY pom.xml .
COPY src/ ./src/
RUN --mount=type=cache,target=/.m2_repository,sharing=locked \
mvn package

Resources