I wrote a pipeline to build my Java application with Maven. I have feature branches and a master branch in my Git repository, so I have to separate the Maven goals package and deploy. Therefore I created two jobs in my pipeline. The last job needs the job results from the first job.
I know that I have to cache the job results, but I don't want to
expose the job results to the GitLab UI
expose them to the next run of the pipeline
I tried the following solutions, without success.
Using cache
I followed How to deploy Maven projects to Artifactory with GitLab CI/CD:
Caching the .m2/repository folder (where all the Maven files are stored), and the target folder (where our application will be created), is useful for speeding up the process by running all Maven phases in a sequential order, therefore, executing mvn test will automatically run mvn compile if necessary.
but this solution shares job results between pipelines, see Cache dependencies in GitLab CI/CD:
If caching is enabled, it’s shared between pipelines and jobs at the project level by default, starting from GitLab 9.0. Caches are not shared across projects.
and also it should not be used for caching in the same pipeline, see Cache vs artifacts:
Don’t use caching for passing artifacts between stages, as it is designed to store runtime dependencies needed to compile the project:
cache: For storing project dependencies
Caches are used to speed up runs of a given job in subsequent pipelines, by storing downloaded dependencies so that they don’t have to be fetched from the internet again (like npm packages, Go vendor packages, etc.) While the cache could be configured to pass intermediate build results between stages, this should be done with artifacts instead.
artifacts: Use for stage results that will be passed between stages.
Artifacts are files generated by a job which are stored and uploaded, and can then be fetched and used by jobs in later stages of the same pipeline. This data will not be available in different pipelines, but is available to be downloaded from the UI.
Using artifacts
This solution exposes the job results in the GitLab UI, see artifacts:
The artifacts will be sent to GitLab after the job finishes and will be available for download in the GitLab UI.
and there is no way to expire the artifacts right when the pipeline finishes, see artifacts:expire_in:
The value of expire_in is an elapsed time in seconds, unless a unit is provided.
Is there any way to cache job results only for the currently running pipeline?
There is no way to send build artifacts between jobs in GitLab that keeps them only as long as the pipeline is running. This is how GitLab has designed their CI solution.
The recommended way to send build artifacts between jobs in GitLab is to use artifacts. This feature always uploads the files to the GitLab instance (which they call the coordinator in this case). These files are available through the GitLab UI, as you write. For most cases this is a complete waste of space, but in rare cases it is very useful, as you can download the artifacts and check why your pipeline broke.
The artifacts are available for download by project members that are at least Reporters, but can be viewed by everybody if public pipelines is enabled. You can read more about permissions here.
To avoid filling up your hard disk or quotas, you should use expire_in. You could set it to just a few hours if you really don't want to waste space. I would not recommend that, though: if a job that depends on these artifacts fails and you retry it after the artifacts have expired, you will have to restart the whole pipeline. I usually set this to one week for intermediate build artifacts, as that often fits my needs.
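As a sketch of that setup (the job name and paths below are illustrative, not taken from the question):

```yaml
# Hedged sketch: keep intermediate build artifacts for one week.
# Job name and paths are illustrative.
package:
  stage: build
  script:
    - mvn package
  artifacts:
    paths:
      - target/*.jar
    expire_in: 1 week
```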
If you want to use caches for keeping build artifacts, maybe because your build artifacts are huge and you need to optimize it, it should be possible to use CI_PIPELINE_ID as the key of the cache (I haven't tested this):
cache:
  key: ${CI_PIPELINE_ID}
The files in the cache should be stored where your runner is installed. If you make sure that all jobs that need these build artifacts are executed by runners that have access to this cache, it should work.
You could also try some of the other predefined environment variables as the key of your cache.
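Putting the pieces together, a pipeline-scoped cache might look like this (untested, as noted above; job names and paths are illustrative):

```yaml
# Hedged sketch: cache keyed on CI_PIPELINE_ID so it is shared only
# within one pipeline. Job names and paths are illustrative.
stages:
  - build
  - deploy

package:
  stage: build
  script:
    - mvn package
  cache:
    key: ${CI_PIPELINE_ID}
    paths:
      - target/

deploy:
  stage: deploy
  script:
    - mvn deploy
  cache:
    key: ${CI_PIPELINE_ID}
    paths:
      - target/
    policy: pull  # only read the cache, don't re-upload it
```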
Related
I need to pass the folders generated in one pipeline to the next pipeline in Gitlab CI. What are the possible ways?
Is it possible through just Artifacts?
Can we only achieve it through cache?
If by Cache, is there any expiry that we can set in cache?
My actual question (no answers so far) was:
Carry artifacts of Gitlab pages between pipelines/jobs
There is a simple distinction:
Cache is used between multiple runs of the same job in different pipelines and also on the same runner (unless you have configured a shared cache storage)
Artifacts are used to pass files between different jobs within a single pipeline
Jobs may specify an artifacts:expire_in keyword to control the lifespan of their artifacts (see https://docs.gitlab.com/ee/ci/yaml/README.html#artifactsexpire_in ).
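A minimal illustration of the distinction (job name and paths are illustrative):

```yaml
# Hedged sketch: cache for dependencies (reused across pipelines),
# artifacts for build output passed to later stages of the same pipeline.
build:
  stage: build
  script:
    - mvn package
  cache:
    paths:
      - .m2/repository/
  artifacts:
    paths:
      - target/
    expire_in: 1 day
```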
I'm currently having a frustrating issue.
I have a setup of GitLab CI on a VPS server, which is working completely fine, I have my pipelines running without a problem.
The issue comes after having to redo a pipeline. Each time, GitLab deletes the whole folder where the build is and builds it again to deploy it. My problem is that I have an "uploads" folder that stores all user content that was uploaded, and each time I redo a pipeline everything gets deleted from this folder. I obviously need this content, because it's the purpose of the app.
I have tried the GitLab CI cache - no luck. I have also tried making a new folder that isn't in the repository; it deletes that too.
Running my first job looks like so (job log screenshot omitted):
As you can see, there are a lot of lines that say "Removing ..."
In order to persist a folder with local files while integrating CI pipelines, the best approach is to use Docker data persistence: you can delete everything from the last build while keeping local files inside your application between builds, while maintaining the ability to start from scratch every time you start a new pipeline.
Bind-mount volumes
Volumes managed by Docker
GitLab's CI/CD Documentation provides a short briefing on how to persist storage between jobs when using Docker to build your applications.
I'd also like to point out that if you're using GitLab Runner through SSH, they explicitly state they do not support caching between builds when using this functionality. Even when using the standard Shell executor, they highly discourage saving data to the Builds folder. So it can be argued that the best-practice approach is to use a bind-mount volume to your host and isolate the application from the user-uploaded data.
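As a concrete illustration of the bind-mount option, the uploads directory can live on the host, outside anything the pipeline deletes; every path and name below is hypothetical:

```shell
# Hedged sketch: bind-mount a host directory into the container so user
# uploads survive rebuilds. Names and paths are hypothetical.
docker run -d \
  --name myapp \
  -v /srv/myapp/uploads:/var/www/app/uploads \
  myapp:latest
```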
I have 3 applications which are integrated with Jenkins.
Now I want to perform the following tasks on them in Jenkins:
After detecting a change in SCM, build the application and deploy the artifact (jar) to my local Nexus repository.
Do the static code analysis.
Deploy the application to UAT server.
Till now, I have been successful in achieving all these requirements.
Problems:
I found that
I don't need to do static code analysis for every SCM change, as it takes around 15 minutes. It will be enough if I perform this action once or twice daily (periodically).
I don't need to update the war on UAT for each SCM change, but only based on the SCM change log (i.e. if the change log contains the '#deploy' keyword, then upload).
My not-so-good solution:
For the time being, I have created 3 different jobs for one project to fulfill the above requirements, which is obviously not the correct thing to do.
So my question is: how can I run a specific Maven goal based on certain conditions in Jenkins?
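The keyword condition described above could be sketched in shell; the '#deploy' keyword comes from the question, while the helper name `wants_deploy` and the goal choices are hypothetical:

```shell
#!/bin/sh
# Hedged sketch: pick the Maven goal from the SCM change log.
# The '#deploy' keyword is from the question; helper name is hypothetical.

# Succeeds (exit 0) when the given change log contains '#deploy'.
wants_deploy() {
  printf '%s' "$1" | grep -q '#deploy'
}

CHANGELOG=$(git log -1 --pretty=%B 2>/dev/null || true)
if wants_deploy "$CHANGELOG"; then
  echo "mvn deploy"   # change log asked for a UAT deployment
else
  echo "mvn verify"   # regular build, no UAT upload
fi
```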
We have a Subversion repository set up in this manner:
http://svn.vegicorp.net/svn/toast/api/trunk
http://svn.vegicorp.net/svn/toast/api/1.0
http://svn.vegicorp.net/svn/toast/data/trunk
http://svn.vegicorp.net/svn/toast/data/branches/1.2
http://svn.vegicorp.net/svn/toast/data/branches/1.3
I've setup a Jenkins Multi-Pipeline build for the entire toast project including all sub-projects -- each sub-project is a jarfile. What I want is for Jenkins to fire off a new build each time any file is changed in one of the toast projects. That project should rebuild. This way, if we create a new sub-project in toast or a new branch in one of the toast sub-projects, Jenkins will automatically create a new build for that.
Here's my Jenkins Multi-Branch setup:
Branch Sources
Subversion
Project Repository Base: http://svn.vegicorp.net/svn/toast
Credentials: builder/*****
Include Branches: */trunk, */branches/*
Exclude Branches: */private
Property Strategy: All branches get the same properties
Build Configuration
Mode: By Jenkinsfile
Build Triggers (None selected)
Trigger builds remotely (e.g., from scripts)
Build periodically
Build when another project is promoted
Maven Dependency Update Trigger
Periodically if not otherwise run
Note that the list of Build Triggers does not include Poll SCM. Changes in the repository do not trigger any build. Jenkinsfiles are located at the root of each sub-project. If I force a reindex, all changed sub-projects get built and all new branches are found. I originally checked Periodically and reindexed every minute to pick up changes, but that's klutzy and it seems to cause Jenkins to consume memory.
Triggering a build on an SCM change should be pretty basic, but I don't see a configuration parameter for this like I do with standard jobs. I also can't seem to go into sub-projects and set those to trigger builds either.
There must be something really, really simple that I am missing.
Configuration:
Jenkins 2.19
Pipeline 2.3
Pipeline API: 2.3
Pipeline Groovy: 2.17
Pipeline Job: 2.6
Pipeline REST API Plugin: 2.0
Pipeline Shared Groovy Libraries: 2.3
Pipeline: Stage View Plugin: 1.7
Pipeline: Supporting APIs 2.2
SCM API Plugin: 1.2
I finally found the answer. I found an entry in Jenkins' Jira database that mentioned this exact issue. The issue is called "SCM polling is not being performed in multibranch pipeline with Mercurial SCM". Other users chimed in too.
The answer was that Jenkins Multi-branch projects don't need to poll the SCM because indexing the branches does that for you:
Branch projects (the children) do not poll in isolation. Rather, the multibranch project (the parent folder) subsumes that function as part of branch indexing. If there are new heads on existing branches, new branch project builds will be triggered. You need merely check the box Periodically if not otherwise run in the folder configuration.
So, I need to set up reindexing of the branches. I'm not happy with this solution because it seems rather clumsy. I can add post-commit and post-push hooks in SVN and Git to trigger builds when a change takes place, and then reindex on a periodic basis (say once per hour). The problem is that this means configuring those hooks and then keeping them up to date. Each project needs its own POST action, which means updating the repository server every time a project changes. With polling, I didn't have to worry about hook maintenance.
You never mentioned setting up a webhook for your repository, so this may be the problem (or part of it).
Jenkins by itself can't just know when changes to a repository have been made. The repository needs to be configured to broadcast when changes are made. A webhook defines a URL that the repository can POST various bits of information to. Point it to a URL that Jenkins can read, and that allows Jenkins to respond to specific types of information it receives.
For example, if you were using github, you could have Jenkins listen on a url such as https://my-jenkins.com/github-webhook/. Github could be configured to send a POST as soon as a PR is opened, or a merge is performed. This POST not only symbolizes that the action was performed, but will also contain information about the action, such as a SHA, branch name, user performing the action... etc.
Both Jenkins and SVN should be capable of defining the URL they each respectively POST to and listen on.
My knowledge lies more specifically with git. But this may be a good place to start for SVN webhooks: http://help.projectlocker.com/knowledge_base/topics/how-do-i-use-subversion-webhooks
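For SVN specifically, such a hook is typically a small shell script on the repository server. A hedged sketch follows: the Jenkins URL is hypothetical, and the exact endpoint path depends on your Jenkins Subversion plugin version, so verify it against that plugin's documentation:

```shell
#!/bin/sh
# Hypothetical SVN post-commit hook: tell Jenkins that a revision landed.
# The server URL and endpoint path are assumptions; check your Jenkins
# Subversion plugin's documentation for the exact form.
REPOS="$1"
REV="$2"
UUID=$(svnlook uuid "$REPOS")
svnlook changed --revision "$REV" "$REPOS" | \
  curl -s --data-binary @- \
  "https://my-jenkins.com/subversion/${UUID}/notifyCommit?rev=${REV}"
```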
Maybe you need something under version control in the base directory. Try putting a test file here http://svn.vegicorp.net/svn/toast/test.txt. That may make the poll SCM option show up.
Hudson provides the option to have a Maven build job utilize a private local repository, or use the common one from the Maven installation, i.e. one shared with other build jobs. I have the sense that our builds should use private local repositories to ensure that they are clean builds. However, this causes performance issues, particularly with respect to the bandwidth of downloading all dependencies for each job -- we also have the jobs configured to start with a clean workspace, which seems to nuke the private Maven repo along with the rest of the build space.
For daily, continuous integration builds, what are the pros and cons of choosing whether or not to use a private local maven repository for each build job? Is it a big deal to share a local repo with other jobs?
Interpreting the Jenkins documentation, you would use a private Maven repository if:
You end up having builds incorrectly succeed just because you have all the dependencies in your local repository, despite the fact that none of the repositories in the POM might have them.
You have problems with concurrent Maven processes trying to use the same local repository.
Furthermore:
When using this option, consider setting up a Maven artifact manager so that you don't have to hit remote Maven repositories too often.
Also, you could explore your SCM's clean option (rather than a workspace clean) to avoid this repository getting nuked.
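A middle ground worth noting: `-Dmaven.repo.local` is a standard Maven property, so a job can keep a private repository inside its own workspace without touching the shared one (the path below is illustrative):

```shell
# Hedged sketch: per-job local repository inside the Jenkins workspace.
# The -Dmaven.repo.local property is standard Maven; the path is illustrative.
mvn -Dmaven.repo.local="$WORKSPACE/.repository" clean package
```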
I believe Sonatype recommends using a local Nexus instance, even though their own research shows (State of the Software Supply Chain report 2015) that less than 5% of traffic to Maven Central comes from such repositories.
To get back to the question, assuming you have a local Nexus instance and high bandwidth connectivity (tens of Gbps at least) between your build server (e.g. Jenkins) and Nexus, then I can see few drawbacks to using a private local repo, in fact I would call the decrease in build performance a reasonable trade-off.
The above said, what exactly are we trading off? We are accepting a small performance penalty on the downside, and on the upside we know with 100% certainty that independent, clean builds against our local Nexus instance as a proxy work.
The latter is important. Consider the scenario where the local repo on the build server (probably in the jenkins user's home directory) has an artefact that is not cached in Nexus (not improbable if you started off your builds against Maven Central). This out-of-sync scenario is suboptimal because your cache TTL settings in Nexus can mean that builds fail if Nexus' upstream connectivity to Central is down temporarily.
Finally, to add more to the benefits side of the trade-off: I spent hours today getting an artefact into the shared Jenkins user's .m2/repository. Earlier in the day, upstream connectivity to Central was up and down for hours (a mysterious issue in an enterprise context). In the end I deleted the entire shared Jenkins user .m2/repository so it would all be retrieved from the local Nexus.
It's worth considering having some builds use a local .m2/repository (in the jenkins user home directory) as well as builds using private local repositories (fast and less fast builds). In my case, however, I may opt for private local repositories only in the first instance - I may be able to accept the penalty if I optimise the build by focusing on low-hanging fruit (e.g. splitting up the multi-module build).