OWASP Dependency-Check is a great way to automate vulnerability discovery in our projects, but when it runs as part of a per-project CI pipeline it adds 3-4 minutes just to download the NVD database.
How can we cache this DB when running it with Maven / Gradle on a CI pipeline?
After a bit of research I found the way!
Basically, the files containing the NVD database are named nvdcve-1.1-[YYYY].json.gz (e.g. nvdcve-1.1-2022.json.gz) and are later added to a Lucene index.
When running Dependency-Check with the Gradle plugin, the files are created in (GRADLE_USER_HOME defaults to ~/.gradle):
$GRADLE_USER_HOME/dependency-check-data/7.0/nvdcache/
When running it with Maven, they are created inside the local Maven repository:
~/.m2/repository/org/owasp/dependency-check-data/7.0/nvdcache/
So to cache the DB on GitLab CI you just have to add the following to your .gitlab-ci.yml (Gradle):
before_script:
  - export GRADLE_USER_HOME=`pwd`/.gradle
cache:
  key: "$CI_PROJECT_NAME"
  paths:
    - .gradle/dependency-check-data
The first CI job run creates the cache, and subsequent runs (from the same or different pipelines) fetch it!
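For Maven, a similar setup might look like the sketch below. It assumes the local repository is redirected into the project directory with -Dmaven.repo.local (GitLab only caches paths inside the build directory), and that the data directory sits under org/owasp/dependency-check-data in that repository. The job name and image are made up for illustration.

```yaml
# Sketch only: a Maven counterpart of the Gradle snippet, not a tested config.
variables:
  # Assumption: keep the local repository inside the project directory
  # so that its contents can be listed under cache:paths.
  MAVEN_OPTS: "-Dmaven.repo.local=$CI_PROJECT_DIR/.m2/repository"

cache:
  key: "$CI_PROJECT_NAME"
  paths:
    # Only the Dependency-Check data, not the whole repository
    - .m2/repository/org/owasp/dependency-check-data

dependency_check:        # hypothetical job name
  image: maven:latest    # assumption: any image with Maven and a JDK
  script:
    - mvn org.owasp:dependency-check-maven:check
```

Caching only the dependency-check-data directory keeps the cache archive small; adding .m2/repository as a whole would also cache regular dependencies, at the cost of a larger upload.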
Related
I have a GitLab pipeline with a package step that runs a Maven build on the tag event, and a release step that uploads the jar to the GitLab generic package registry using curl and the GitLab release-cli.
What I expect is that a cache of .m2 is loaded into the package step so that mvn clean package can do its thing, and that only the created jar and test results are then archived.
The release step should begin clean, with no git clone and no cache, containing only the jar and test results.
Instead, 'find .' shows that the release step contains everything, including:
- Git directory (.git)
- Full checked out repository
- .m2 cache
- target (fully built as the Package step produced)
The GitLab cache documentation (https://docs.gitlab.com/ee/ci/caching/) states:
Archive (artifacts): use the 'dependencies' keyword to control which jobs fetch the artifacts
Disable cache: use 'cache: []'
Why is GitLab putting so much content into the release job? The release job fails at times because it finds multiple jar files from previous tags (i.e. the clean and the archiving are holding on to past versions).
.gitlab-ci.yml:
variables:
  MAVEN_CLI_OPTS: "-s $CI_PROJECT_DIR/.m2/settings.xml"
  MAVEN_VERSION_PLUGIN_VERSION: 2.11.0
  MAVEN_ARTIFACT_NAME: test-component
  GIT_CLEAN_FLAGS: -ffd
  PACKAGE_REGISTRY_URL: "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/${MAVEN_ARTIFACT_NAME}"

cache:
  key: primary
  paths:
    - .m2/repository

stages:
  - package
  - release

package:
  stage: package
  image: maven:latest
  script:
    - mvn ${MAVEN_CLI_OPTS} clean package
  artifacts:
    paths:
      - target/*.jar
      - target/surefire-reports
  only:
    - tags
    - merge_requests
    - branches
  except:
    - main

release:
  stage: release
  image: alpine:latest
  cache: []
  variables:
    GIT_STRATEGY: none
  dependencies:
    - package
  script:
    - |
      apk add curl gitlab-release-cli
      find .
      JAR_NAME=`basename target/${MAVEN_ARTIFACT_NAME}-${CI_COMMIT_TAG}.jar`
      curl --header "JOB-TOKEN: ${CI_JOB_TOKEN}" --upload-file target/${JAR_NAME} ${PACKAGE_REGISTRY_URL}/${CI_COMMIT_TAG}/${JAR_NAME}
      release-cli create --name "Release $CI_COMMIT_TAG" --description "$TAG_MESSAGE" --tag-name ${CI_COMMIT_TAG} --assets-link "{\"name\":\"jar\",\"url\":\"${PACKAGE_REGISTRY_URL}/${CI_COMMIT_TAG}/${JAR_NAME}\"}"
  only:
    - tags
See the GitLab docs on GIT_STRATEGY:
A Git strategy of none also re-uses the local working copy, but skips all Git operations normally done by GitLab. GitLab Runner pre-clone scripts are also skipped, if present. This strategy could mean you need to add fetch and checkout commands to your .gitlab-ci.yml script.
It can be used for jobs that operate exclusively on artifacts, like a deployment job. Git repository data may be present, but it’s likely out of date. You should only rely on files brought into the local working copy from cache or artifacts.
So the GitLab documentation is pretty clear that you should expect that the Git repository may still be present. When you want to work exclusively with artifacts, you can create a new temporary directory and reference the path to the artifacts explicitly, rather than relying on a totally clean working directory.
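As a rough sketch of that idea, the release job could copy just the jar it expects into a fresh directory and work from there. This reuses the job and variable names from the question; the WORK_DIR variable is introduced here for illustration only, and the upload and release commands are left out.

```yaml
# Sketch only: stage the expected artifact in a temporary directory.
release:
  stage: release
  image: alpine:latest
  cache: []
  variables:
    GIT_STRATEGY: none
  dependencies:
    - package
  script:
    # WORK_DIR is a hypothetical name; mktemp -d creates an empty directory
    - WORK_DIR="$(mktemp -d)"
    # Reference the jar for this tag explicitly instead of globbing target/
    - cp "target/${MAVEN_ARTIFACT_NAME}-${CI_COMMIT_TAG}.jar" "$WORK_DIR/"
    - cd "$WORK_DIR"
    # Only the copied jar should be listed here, whatever else is left in target/
    - ls -l
  only:
    - tags
```

Naming the jar by its tag avoids picking up jars left over from earlier tags, which is the failure described in the question.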
I'm new to GitLab CI. Every time GitLab CI runs, it replaces the old folder on the server. I have a small problem: I want to reduce the Gradle build time for a project that includes DL4J (which is very large and takes a long time to build), so I want it to keep the build folder from the last version. I followed this to reduce the Gradle build time.
Question: Is it possible to configure GitLab CI to skip some folders so that they are kept? This is my GitLab CI config:
stages:
  - build

something_run:
  stage: build
  script:
    - gradle build
    - systemctl restart myproject
  tags:
    - ml
  only:
    - master
When it runs, Gradle builds the project, and the build takes quite a long time. So I want the next CI run not to delete the last build.
Take a look at cache (https://docs.gitlab.com/ee/ci/yaml/#cache)
cache is used to specify a list of files and directories which should be cached between jobs.
GitLab CI/CD provides a caching mechanism that can be used to save time when your jobs are running.
See also https://docs.gitlab.com/ee/ci/caching/index.html
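Applied to the job from the question, that could look roughly like the sketch below. The cached paths are assumptions: build/ is Gradle's default output directory, and .gradle/ is the project-level Gradle directory; adjust them to whatever the project actually produces.

```yaml
# Sketch only: keep Gradle outputs between runs of the same branch.
stages:
  - build

something_run:
  stage: build
  cache:
    # One cache per branch; CI_COMMIT_REF_SLUG is a predefined GitLab variable
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - build/     # assumption: default Gradle build output
      - .gradle/   # assumption: project-level Gradle state
  script:
    - gradle build
    - systemctl restart myproject
  tags:
    - ml
  only:
    - master
```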
I hope someone can help me with a simple setup of maven CI scripts for GitLab.
I searched Stack Overflow and Google, which turned up several questions and answers, but they either seem to be about something completely different or I don't understand them.
I have a simple setup of two projects: project B depends on project A (= pom packaging).
In the runner configuration /etc/gitlab-runner/config.toml I have added the volumes line:
[[runners]]
  ...
  [runners.docker]
    ...
    volumes = ["/cache", "/.m2"]
    ...
My .gitlab-ci.yml for both projects looks like this:
image: maven:3.6.1-jdk-12

cache:
  paths:
    - /.m2/repository
    - target/

variables:
  MAVEN_OPTS: "-Dmaven.repo.local=/.m2/repository"

maven_job:
  script:
    - mvn clean install
With this, the first project builds correctly and I can see that caching works, as it does not download all the Maven plugins needed to build the project when executed again and again.
It also states
[INFO] Installing /builds/end2end/projectA/pom.xml to /.m2/repository/de/end2end/projectA/0.4.4-SNAPSHOT/projectA-0.4.4-SNAPSHOT.pom
At the end, though, it reports
WARNING: /.m2/repository: not supported: outside build directory
WARNING: /.m2/repository/classworlds: not supported: outside build directory
WARNING: /.m2/repository/classworlds/classworlds: not supported: outside build directory
WARNING: /.m2/repository/classworlds/classworlds/1.1-alpha-2: not supported: outside build directory
WARNING: /.m2/repository/classworlds/classworlds/1.1-alpha-2/_remote.repositories: not supported: outside build directory
[...]
When executing projectB, the job fails with the message that it cannot find projectA.
So, what is wrong with the configuration of the runner / .gitlab-ci.yml files?
I tried
cache:
  paths:
    - .m2/repository
which removes the warnings, but then projectA gets installed in its own local .m2:
[INFO] Installing /builds/end2end/projectA/pom.xml to /builds/end2end/projectAt/.m2/repository/de/end2end/projectA/0.4.4-SNAPSHOT/projectA-0.4.4-SNAPSHOT.pom
and projectB fails with the same error as above.
In fact, as described in the GitLab docs, you are using dynamic storage, so the volume is shared only between subsequent runs of the same concurrent job for one project. If you want to share data between projects, you must use host-bound storage.
As for the warning, the cache only works for paths inside the working directory, so an absolute path like /.m2/repository is not supported. In your case, you don't need to cache the Maven repository because you use a volume.
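With a host-bound volume in place, the .gitlab-ci.yml from the question could drop the repository from the cache and keep only the Maven option. This is a sketch that assumes the runner's config.toml maps some host directory to /.m2 (a "host-path:/.m2" style entry in volumes; the host path itself is up to you and not shown here).

```yaml
# Sketch only: rely on the runner volume for the Maven repository.
image: maven:3.6.1-jdk-12

variables:
  # /.m2 is assumed to be a host-bound volume shared by both projects
  MAVEN_OPTS: "-Dmaven.repo.local=/.m2/repository"

cache:
  paths:
    # Only paths inside the project directory can be cached
    - target/

maven_job:
  script:
    - mvn clean install
```

Because projectA's install then lands in the shared /.m2/repository, a later projectB job on the same runner host should be able to resolve it from there.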
I have the following setup configured and working:
- gitlab-ci, which uses a docker-machine runner and uploads the cache to S3
- Maven build with configured caching
- caching correctly loads and uploads on each job
But the problem is that every time I run mvn install, something in the local Maven repository changes (I assume it updates POM metadata), and the GitLab runner keeps uploading a new version of the cache on every single build.
It is still faster and more reliable to use this "busted" cache than to download the deps from the internet every time, but the upload can take a long time and I would like to shave off this extra time.
How can I modify my build to force Maven to generate a cacheable local repository?
Simplified version of my .gitlab-ci.yml:
variables:
  # we have a custom java+maven image, that uses this ENV variable,
  # to auto-configure path where to put the local maven repository
  MAVEN_LOCAL_REPOSITORY: $CI_PROJECT_DIR/.cache/maven

job-build:
  stage: build
  image: internal-gitlab/java/maven:3.6-jdk8-alpine
  script:
    - mvn -B clean package
  cache:
    key: backend-dependencies
    paths:
      - .cache/
You have a constant as a cache key. Maybe a more fine-grained cache key would help.
See the link here
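One way to make the key more fine-grained is GitLab's cache:key:files, which derives the key from the listed files so that a new cache is started only when they change. A minimal sketch, assuming the dependencies are declared in a single pom.xml at the project root:

```yaml
# Sketch only: key the cache on the file that declares the dependencies.
job-build:
  stage: build
  image: internal-gitlab/java/maven:3.6-jdk8-alpine
  script:
    - mvn -B clean package
  cache:
    key:
      files:
        - pom.xml   # assumption: a single root pom.xml
    paths:
      - .cache/
```

If the upload itself is the bottleneck, GitLab also has a cache:policy setting; with policy: pull a job downloads the cache without re-uploading it. Whether that fits depends on how often the dependencies change.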
In short: prepare your own Maven image with the required dependencies and use it instead of internal-gitlab/java/maven:3.6-jdk8-alpine.
Some details:
First of all, you need to create a Maven Docker image that already contains all (or most) of the dependencies your project requires. Publish it to your registry (GitLab has one) and use it instead of internal-gitlab/java/maven:3.6-jdk8-alpine.
To create such an image I usually add an extra, manually triggered job to CI. You need to trigger it initially and whenever the project dependencies change heavily.
Working sample can be found here:
https://gitlab.com/alexej.vlasov/syncer/blob/master/.gitlab-ci.yml
- this project is using the prepared image and also it has a job to prepare this image.
https://gitlab.com/alexej.vlasov/maven/blob/master/Dockerfile
- dockerfile to run maven and download dependencies once.
The pros:
- don't need to download dependencies each time - they are inside a docker image (and docker layers are cached on the runners)
- don't need to upload artifacts when job is finished
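The manually triggered image job could be sketched roughly as follows. This is not taken from the linked projects: it assumes a Dockerfile at the repository root, a runner that allows Docker-in-Docker, and the GitLab container registry; the job name and image tag are made up.

```yaml
# Sketch only: build and publish a Maven image with dependencies baked in.
build-maven-image:          # hypothetical job name
  stage: build
  image: docker:latest
  services:
    - docker:dind           # assumption: the runner permits Docker-in-Docker
  when: manual              # run on demand, e.g. after big dependency changes
  script:
    # CI_REGISTRY* variables are predefined by GitLab when the registry is enabled
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE/maven-deps:latest" .
    - docker push "$CI_REGISTRY_IMAGE/maven-deps:latest"
```

The regular build job would then use $CI_REGISTRY_IMAGE/maven-deps:latest as its image.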
Between 1 and 2 minutes of each of my AWS CodeBuild builds are spent downloading dependencies from Maven Central.
Short of building a pre-provisioned Docker container, is there any way to cache these between builds?
CodeBuild now provides a cache feature you can use to pre-load your dependencies.
Unsigned's answer is good but a tad outdated. As of February 2019, CodeBuild allows caching both in an S3 bucket and locally. You can now specify the cache at 3 different layers of a build:
- Docker Layer Caching
- Git Layer Caching (cache the last build and then only build from git diff)
- Custom caching - specified within the cache: portion of your buildspec.yml file. Personally, I cache my node_modules/ here and then cache at the Git Layer.
Source: https://aws.amazon.com/blogs/devops/improve-build-performance-and-save-time-using-local-caching-in-aws-codebuild/
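For the Maven case from the question, the custom cache section of buildspec.yml could look like the sketch below. It assumes the build runs as root, so the local repository is /root/.m2, and that caching (S3 or local custom cache) is enabled on the CodeBuild project itself.

```yaml
# Sketch only: cache the local Maven repository between CodeBuild runs.
version: 0.2

phases:
  build:
    commands:
      - mvn -B package

cache:
  paths:
    # assumption: builds run as root, so ~/.m2 resolves to /root/.m2
    - '/root/.m2/**/*'
```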