Docker multi-stage build and mounting/sharing from previous stage - maven

I would like to use multi-stage builds to avoid downloading all the Maven dependencies required by my Java project every time I build the app.
I am thinking of resolving the Maven dependencies in a first stage, then building the app in a second stage which would require access to the dependencies downloaded in the previous stage.
If I understood multi-stage builds correctly, I can copy files created in the first stage into the second stage. Ideally, though, I would like to "mount" or "share" the folder from the first stage where the dependencies live instead of copying the files. Is that possible? Or is there a better way to achieve this?
Thanks.
EDIT:
This was the first stage I was thinking about
FROM some-image-with-maven AS maven-repo
WORKDIR /workspace/
COPY pom.xml .
RUN mvn -B -f pom.xml dependency:resolve
But since the pom file will be different most of the time (because I would like to share this stage across projects), the step that resolves dependencies will download all of them again instead of using a cached layer.

You can only copy stuff from the first stage if you are not using volumes. With volumes you could share data between stages, which are basically separate container instances.
Since cleaning up volumes is easy to forget and often not handled properly, I suggest sticking to the copy strategy. There is no real benefit to sharing data via a bind mount over the copy approach.

I don't believe there's a way to do this currently. To share from one build stage to the next, the only option is to COPY files from one stage's directory to the current stage.
To use the first stage as a build cache and avoid copying all the dependencies, I'd run your build in that first stage. Or you can make a second intermediate stage that is FROM stage1name if you want additional separation between the stages. The output of your build can then be copied into the final stage, avoiding the need to copy all the build dependencies.
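For example, a minimal sketch of that layout (the image tags, the /workspace path and the app.jar artifact name are only placeholders):
FROM maven:3-jdk-8 AS build
WORKDIR /workspace
# resolving dependencies first keeps this layer cached as long as pom.xml is unchanged
COPY pom.xml .
RUN mvn -B dependency:go-offline
COPY src/ ./src/
RUN mvn -B package

FROM openjdk:8-jre
# only the built artifact is copied; the Maven repository stays behind in the build stage
COPY --from=build /workspace/target/app.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]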

Answering from the future...
If you are using BuildKit or a compatible builder (most people probably are by now), you can mount a previous stage with a bind mount. Something like this would accomplish what the original post was asking for:
FROM someimage as build
COPY pom.xml .
RUN mvn -Dmaven.repo.local=/.m2_repository -B -f pom.xml dependency:resolve
FROM runtimeimage
COPY pom.xml .
COPY src/ ./src/
RUN --mount=type=bind,from=build,source=/.m2_repository,target=/.m2_repository \
    mvn -Dmaven.repo.local=/.m2_repository package
But more to the point, there is also a cache mount that you can use instead. You would incur the cost of downloading all the deps on the first run, but subsequent builds would be able to find those deps in the cache:
FROM runtimeimage
COPY pom.xml .
COPY src/ ./src/
RUN --mount=type=cache,target=/.m2_repository,sharing=locked \
    mvn -Dmaven.repo.local=/.m2_repository package

Related

Pre-compiling Golang project dependencies to cache

In short, my current use-case involves dynamically creating a Golang plugin inside a Docker container. The compilation involves some new input from the user (which is why it is not compiled beforehand), but the dependencies are static, and won't change.
Currently, the full compilation is done from scratch inside the Docker container (though go mod download is used to reduce the time a bit). I noticed that the go build command ends up compiling a lot of the dependencies, which adds a non-trivial amount of time to the plugin compilation and affects the usability of my application.
Is there a Go supported method or command to read through the go.mod file and populate the GOCACHE directory? With such a command, I would run it in my Dockerfile itself, causing the Docker image to contain the cache with all the compiled build dependencies.
What I've tried:
go mod download: This only downloads the dependencies; it does not compile them.
I do have this working with a temporary workaround: I created a barebones main.go that imports all the dependencies, and run go build within my Dockerfile to populate the cache. As mentioned, this does solve my problem, but it feels like a bit of a hack. Additionally, if the dependencies change in the future, it requires someone to change this as well, which isn't ideal.
A lot of the answers I saw online for this involve CI/CD. With CI/CD, the container just has a directory mounted from the host, which contains a cache that is persisted between runs. This does not solve my immediate problem, which is building the container itself.
Since the issues (1, 2) in the golang repository are still open, all we can do is "hack", I think. So we can do something like this for dependency caching and pre-compilation as a separate Docker layer:
FROM golang:1.19-buster as builder

COPY ./go.* /src/
WORKDIR /src

# download and pre-compile dependencies in a separate, cacheable layer
RUN set -x \
    # cache go dependencies
    && go mod download \
    # pre-compile common dependencies
    && mkdir /tmp/gobin \
    && for p in $(go list -m -f '{{if and (not .Indirect) (not .Main)}}{{.Path}}/...@{{.Version}}{{end}}' all); do \
        GOBIN=/tmp/gobin go install $p; \
    done \
    && rm -r /tmp/gobin

COPY . /src
RUN go build ...
Without this trick, the build (docker buildx build --platform linux/amd64,linux/arm64 ...) takes about 9 minutes; with it, ~6 minutes (a ~30% saving). But the pre-compilation step itself becomes ~40% longer.

Caching Maven dependencies in Gitlab-CI correctly

I have the following setup configured and working:
- gitlab-ci, which uses a docker-machine runner and uploads the cache to S3
- a maven build with caching configured
- the cache correctly loads and uploads on each job
But the problem is that every time I run mvn install, something in the local Maven repository changes (I assume it updates pom metadata), and the gitlab runner keeps uploading a new version of the cache on every single build.
It is still faster and more reliable to use this "busted" cache than to download the deps from the internet every time, but the upload can take a long time and I would like to shave off this extra time.
How can I modify my build to force Maven to generate a cacheable local repository?
Simplified version of my .gitlab-ci.yml:
variables:
  # we have a custom java+maven image, that uses this ENV variable,
  # to auto-configure path where to put the local maven repository
  MAVEN_LOCAL_REPOSITORY: $CI_PROJECT_DIR/.cache/maven

job-build:
  stage: build
  image: internal-gitlab/java/maven:3.6-jdk8-alpine
  script:
    - mvn -B clean package
  cache:
    key: backend-dependencies
    paths:
      - .cache/
You have a constant as a cache key; maybe a more fine-grained cache key would help.
In short - prepare your own Maven image with the required dependencies and use it instead of internal-gitlab/java/maven:3.6-jdk8-alpine.
Some details:
First of all, you need to create a Maven Docker image in which all (or most of) the dependencies required by your project are present. Publish it to your registry (GitLab has one) and use it instead of internal-gitlab/java/maven:3.6-jdk8-alpine.
To create such an image I usually add an extra, manually triggered job to CI. You need to trigger it initially and whenever the project dependencies change significantly.
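For illustration, a minimal sketch of such a Dockerfile (the maven:3.6-jdk-8-alpine tag and the /opt/m2 path are just examples here; the real, working Dockerfile is linked below):
FROM maven:3.6-jdk-8-alpine
COPY pom.xml /tmp/pom.xml
# bake the project dependencies into a custom local repository inside the image
# (avoid /root/.m2 - older official Maven images declare it as a VOLUME, see the next question below)
RUN mvn -B -f /tmp/pom.xml -Dmaven.repo.local=/opt/m2 dependency:go-offline
The CI job then has to point Maven at the same path (for example mvn -Dmaven.repo.local=/opt/m2 ..., or via the MAVEN_LOCAL_REPOSITORY mechanism from the question) so the pre-downloaded artifacts are actually picked up.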
A working sample can be found here:
- https://gitlab.com/alexej.vlasov/syncer/blob/master/.gitlab-ci.yml - this project uses the prepared image and also has a job that prepares the image.
- https://gitlab.com/alexej.vlasov/maven/blob/master/Dockerfile - the Dockerfile that runs Maven and downloads the dependencies once.
The pros:
- no need to download dependencies each time - they are inside a Docker image (and Docker layers are cached on the runners)
- no need to upload artifacts when the job is finished

Caching Jar dependencies for Maven-based Docker builds

I'm building a Docker image from this Dockerfile:
FROM maven:3.3.3-jdk-8
MAINTAINER Mickael BARON
ADD pom.xml /work/pom.xml
WORKDIR /work
RUN mvn dependency:go-offline --fail-never
ADD ["src", "/work/src"]
RUN ["mvn", "package"]
With this Dockerfile, I force the dependencies to be downloaded before packaging my Java project. Thus, I don't have to redownload the dependencies every time I change a file in my src directory.
But there is a problem, and this problem depends on the version of Maven (the base image). In fact, the dependencies are downloaded but they are not persisted into the ~/.m2 directory of the container - it's empty. Thus, when I change some source file, all the dependencies are redownloaded.
However, I noticed that if I change the version of Maven from the base image (for example FROM maven:3.2.5-jdk-8), it works.
Very strange, isn't it?
There is now a section in the image's README regarding this topic:
https://github.com/carlossg/docker-maven#packaging-a-local-repository-with-the-image
The $MAVEN_CONFIG dir (default to /root/.m2) is configured as a volume so anything copied there in a Dockerfile at build time is lost. For that the dir /usr/share/maven/ref/ is created, and anything in there will be copied on container startup to $MAVEN_CONFIG.
To create a pre-packaged repository, create a pom.xml with the dependencies you need and use this in your Dockerfile. /usr/share/maven/ref/settings-docker.xml is a settings file that changes the local repository to /usr/share/maven/ref/repository, but you can use your own settings file as long as it uses /usr/share/maven/ref/repository as local repo.
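Applied to the Dockerfile from the question, that would look roughly like this (a sketch based on the description above; the settings file path is the one the README documents):
FROM maven:3.3.3-jdk-8
ADD pom.xml /work/pom.xml
WORKDIR /work
# use the settings file that points the local repo at /usr/share/maven/ref/repository,
# which is outside the /root/.m2 volume and therefore survives between build steps
RUN mvn -B -s /usr/share/maven/ref/settings-docker.xml dependency:go-offline --fail-never
ADD ["src", "/work/src"]
# the packaging step must use the same settings so it finds the pre-downloaded dependencies
RUN ["mvn", "-s", "/usr/share/maven/ref/settings-docker.xml", "package"]
On container startup the image's entrypoint then copies /usr/share/maven/ref/repository into $MAVEN_CONFIG, as described above.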
I'm afraid it's because of this VOLUME instruction they've added:
https://github.com/carlossg/docker-maven/blame/8ab542b907e69c5269942bcc0915d8dffcc7e9fa/jdk-8/Dockerfile#L11
It makes /root/.m2 a volume, and thus any changes made to that folder by build steps are not carried over to the following build containers.

How to make TeamCity only clean up certain files

Is it possible to make TeamCity only clean up certain files upon fetching files from my git repo? I modify one file as a build step, and thus always need a clean version of that file. However, it's really unnecessary to fetch the whole repo every time because usually only a few files are modified (thus, I'd rather not use the 'Clean all files before build' option).
Thanks!
To clarify, let's say I have the following structure:
- index.html
- js/script.js
- js/plugins.js
I always want to check out index.html (regardless of whether any change has happened). The files in the js folder I only want to replace whenever updates to them have actually happened.
If you are using TeamCity 6.5 or above you can use the Build Files Cleaner (Swabra) Build Feature. Once you have added it to your build steps and run a clean build, it will clean any new unversioned files generated during the build, either before the new build starts or at the end of the current build.
I personally prefer to run it before the new build starts as it allows you to look at any of the output when trying to work out why something went wrong.
Basically it makes sure that there is nothing in the build agent's work folder that was not pulled from the repository before each build.

How to access test artifacts from Jenkins if test fails

I have a Maven project which performs a number of time consuming tests as part of the integration-test Maven cycle. I'm using Jenkins as the CI server.
During the integration test a number of files are produced in the target folder. For example, an "actual" BMP file is produced and compared to an "expected" BMP file. If the test fails, I need to look at the files in the target folder to determine how to deal with the error. Maybe the actual BMP looks fine and so it should be promoted to the new expected BMP. On the other hand, it may reveal a problem that requires a code fix.
The thing is I don't have any way to get access to these files, other than to ssh into the CI server and manually scp the files over to my own machine for closer inspection. It would be extremely helpful if I could access these files from the Jenkins web interface.
I tried using the build-helper-maven-plugin to attach the relevant files as Maven artifacts, but the problem is that there is no suitable phase in Maven that executes after an integration-test, if any test fails.
What can I do? Can I use the "Copy Artifact" plugin for this?
1) The files in the target folder can be accessed using a link such as /ws/projectname/target/filename...
2) Rather than typing the URL each time, the SideBar plugin can be used to add a link to the file to Jenkins' left-hand menu, making it easily accessible.
You need to copy your files into your workspace in a build step and archive them from there - Jenkins lets you specify artifacts only relative to the workspace.
I usually create a directory keyed by the BUILD_ID in the workspace, so that artifacts from different builds do not get mixed up in case I do not clean the workspace and archive from there (specifying ${BUILD_ID}/**/* in the archiving step).
In case your build fails before it reaches the copying step and therefore never does the copy, take a look at this question.
