How can I make one single `.gradle` cache for multiple projects? - gradle

We are trying to use a single .gradle cache among our multiple build workers (in Jenkins) by creating .gradle on an NFS mount that is shared with all the workers.
Now when we run multiple projects using Gradle builds, they fail with the following errors:
Timeout waiting to lock artifact cache (/common/user/.gradle/caches/modules-2). It is currently in use by another Gradle instance.
Owner PID: 1XXXX
Our PID: 1XXXX
Owner Operation: resolve configuration ':classpath'
Our operation: resolve configuration ':classpath'
Lock file: /common/user/.gradle/caches/modules-2/modules-2.lock
What is the suggested method to share the .gradle cache among multiple users? This model works fine for the Maven .m2 cache.
We cannot have a .gradle for each worker, as it takes a lot of space to store the jars in each cache.

Because of the locking mechanism Gradle uses for its dependency cache, you can't have multiple instances write to the same cache directory.
However, you can create a shared, read-only dependency cache that can be used by multiple Gradle instances. You can find instructions in the docs. The basic mechanism is to create a folder that's pre-populated with the dependencies you think your builds will need, then set the GRADLE_RO_DEP_CACHE environment variable to point to that folder.
This cache, unlike the classical dependency cache, is accessed without locking, making it possible for multiple builds to read from the cache concurrently.
Because this cache is read-only, you would need to add dependencies to it beforehand. The builds themselves can't write their dependencies back to the read-only shared cache. The cache needs to follow the folder structure that Gradle expects, though, which isn't something that can really be set up by hand. In practice the way to get a working shared cache is to copy the dependency cache that was created by an existing Gradle instance.
The read-only cache should be sourced from a Gradle dependency cache that already contains some of the required dependencies. [...] In a CI environment, it’s a good idea to have one build which "seeds" a Gradle dependency cache, which is then copied to a different directory. This directory can then be used as the read-only cache for other builds.
The shared cache doesn't need to contain all of the dependencies, though. Any that are missing will be fetched by each individual build as normal, as if the shared cache wasn't there.
https://docs.gradle.org/current/userguide/dependency_resolution.html#sub:shared-readonly-cache

Using "ascii" graphics in the gradle manual isn't very instructive, but there they say:
run a regular gradle build.
now go into, on windows, %USERPROFILE%.gradle\caches, where you find a folder named 'modules-2'
grab the modules-2 folder, as is, move it into a directory accessible to all your builds, so that you have <mygradle_ro_cache>\modules-2...
delete any .lock or gc.* files from <mygradle_ro_cache>\modules-2\
set the env variable GRADLE_RO_DEP_CACHE to <mygradle_ro_cache>
Done.
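On a Linux worker the same steps could look roughly like the shell sketch below; the /common/gradle-ro-cache path is only an example stand-in for your shared NFS location.

# run one regular "seeding" build so the dependency cache gets populated
./gradlew build

# copy the modules-2 folder, as is, to the shared location
mkdir -p /common/gradle-ro-cache
cp -R ~/.gradle/caches/modules-2 /common/gradle-ro-cache/

# delete any .lock or gc.* files from the copy
find /common/gradle-ro-cache/modules-2 \( -name '*.lock' -o -name 'gc.*' \) -delete

# point every build at the read-only cache
export GRADLE_RO_DEP_CACHE=/common/gradle-ro-cache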

Related

How to clear Module database in MarkLogic

What is the gradle task to clear all modules in a MarkLogic database?
I have tried mlClearDatabase, but it didn't work.
mlClearDatabase will clear the content database.
The task that you are looking for to clear the modules database is:
mlClearModulesDatabase - if the application exists, clear its modules database; otherwise do nothing
If you are clearing the modules in order to ensure that you are deploying to a fresh modules database, then you might want to use mlReloadModules, which will invoke mlClearModules and then mlLoadModules.
https://github.com/marklogic-community/ml-gradle/wiki/Task-reference#modules
mlClearModulesDatabase
Gradle doesn't guarantee complete Modules database cleanup if another app server has a dependency on that Modules database.
mlClearDatabase -Pdatabase={db-name} -Pconfirm=true
Gradle will clear the named database in force mode; that is why -Pconfirm=true is used. If other app servers have a dependency on the cleared Modules database, your application will fail.
It is very true that mlReloadModules is the right way to deploy/redeploy modules.
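For example, assuming a standard ml-gradle project (the database name below is just a placeholder):

# clear only the current application's modules database
./gradlew mlClearModulesDatabase

# force-clear an arbitrary database by name
./gradlew mlClearDatabase -Pdatabase=my-app-modules -Pconfirm=true

# clear the modules database and then load the modules again
./gradlew mlReloadModules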

How to cache job results only for running pipeline?

I wrote a pipeline to build my Java application with Maven. I have feature branches and a master branch in my Git repository, so I have to separate the Maven goals package and deploy. Therefore I created two jobs in my pipeline. The last job needs the job results from the first job.
I know that I have to cache the job results, but I don't want to
expose the job results to the GitLab UI
expose them to the next run of the pipeline
I tried the following solutions without success.
Using cache
I followed How to deploy Maven projects to Artifactory with GitLab CI/CD:
Caching the .m2/repository folder (where all the Maven files are stored), and the target folder (where our application will be created), is useful for speeding up the process by running all Maven phases in a sequential order, therefore, executing mvn test will automatically run mvn compile if necessary.
but this solution shares job results between pipelines, see Cache dependencies in GitLab CI/CD:
If caching is enabled, it’s shared between pipelines and jobs at the project level by default, starting from GitLab 9.0. Caches are not shared across projects.
and also it should not be used for caching in the same pipeline, see Cache vs artifacts:
Don’t use caching for passing artifacts between stages, as it is designed to store runtime dependencies needed to compile the project:
cache: For storing project dependencies
Caches are used to speed up runs of a given job in subsequent pipelines, by storing downloaded dependencies so that they don’t have to be fetched from the internet again (like npm packages, Go vendor packages, etc.) While the cache could be configured to pass intermediate build results between stages, this should be done with artifacts instead.
artifacts: Use for stage results that will be passed between stages.
Artifacts are files generated by a job which are stored and uploaded, and can then be fetched and used by jobs in later stages of the same pipeline. This data will not be available in different pipelines, but is available to be downloaded from the UI.
Using artifacts
This solution is exposing the job results to the GitLab UI, see artifacts:
The artifacts will be sent to GitLab after the job finishes and will be available for download in the GitLab UI.
and there is no way to expire the cache after finishing the pipeline, see artifacts:expire_in:
The value of expire_in is an elapsed time in seconds, unless a unit is provided.
Is there any way to cache job results only for the running pipeline?
There is no way to send build artifacts between jobs in GitLab that only keeps them as long as the pipeline is running. This is how GitLab has designed their CI solution.
The recommended way to send build artifacts between jobs in GitLab is to use artifacts. This feature always uploads the files to the GitLab instance, which they call the coordinator in this case. These files are available through the GitLab UI, as you write. For most cases this is a complete waste of space, but in rare cases it is very useful, as you can download the artifacts and check why your pipeline broke.
The artifacts are available for download by project members that are at least Reporters, but can be viewed by everybody if public pipelines is enabled. You can read more about permissions here.
To not fill up your hard disk or quotas, you should use expire_in. You could set it to just a few hours if you really don't want to waste space. I would not recommend this, though: if a job that depends on these artifacts fails and you retry it after the artifacts have expired, you will have to restart the whole pipeline. I usually set this to one week for intermediate build artifacts, as that often fits my needs.
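As a sketch, the packaging job could then look something like this (the job name and cached path are just examples for a Maven build):

package:
  stage: build
  script:
    - mvn package
  artifacts:
    paths:
      - target/
    expire_in: 1 week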
If you want to use caches for keeping build artifacts, maybe because your build artifacts are huge and you need to optimize it, it should be possible to use CI_PIPELINE_ID as the key of the cache (I haven't tested this):
cache:
  key: ${CI_PIPELINE_ID}
The files in the cache should be stored where your runner is installed. If you make sure that all jobs that need these build artifacts are executed by runners that have access to this cache, it should work.
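Equally untested, a fuller per-pipeline cache definition might look like this (the cached path is just an example):

cache:
  key: ${CI_PIPELINE_ID}
  paths:
    - target/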
You could also try some of the other predefined environment variables as the key for your cache.

Can I keep Maven local repository on another machine and use it in my project?

Where are Maven and the pom.xml file kept in a real project if the code is on GitHub? I mean, can I keep my local repository somewhere on another machine and use it in my project? If yes, how?
Local repositories are not meant for sharing. They are also not "thread-safe" in any way, so accessing them simultaneously from two different builds might break things.
They are populated by the artifacts Maven downloads from Maven Central and other repositories, and also by the stuff you build yourself. As they are more or less a form of cache, there is no need to share them.
If you need a repository that is used from different machines or by different users, set up a Nexus/Artifactory server.
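For instance, once such a server is running, each machine's ~/.m2/settings.xml can point to it as a mirror (the URL below is only a placeholder):

<settings>
  <mirrors>
    <mirror>
      <id>company-repo</id>
      <mirrorOf>*</mirrorOf>
      <url>https://nexus.example.com/repository/maven-public/</url>
    </mirror>
  </mirrors>
</settings>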

spring cloud data flow running out of disk space

We have a large number of tasks (~30) kicked off by SCDF on PCF; however, we are running into disk space issues with SCDF. The issue appears to be due to SCDF downloading artifacts each time a task is invoked.
The artifacts in our case are downloaded from a REST endpoint https://service/{artifact-name-version.jar} (which in turn serves them from an S3 repository).
Every time a task is invoked, it appears that SCDF downloads the artifact (to the ~tmp/spring-cloud-deployer directory) and verifies the sha1 hash to make sure it's the latest before it launches the task on PCF.
The downloaded artifacts never get cleaned up
It's not desirable to download artifacts each time and fill up disk space in ~tmp/ of the SCDF instance on PCF.
Is there a way to tell SCDF not to download artifact if it already exists ?
Also, can someone please explain the mechanism of artifact download, comparing sha1 hash and launching tasks (and various options around it)
Thanks !
SCDF downloads the artifacts on the server side for the following reasons:
1) Metadata (application properties) retrieval - if you have an explicit metadata resource, then only that is downloaded.
2) The corresponding deployer (local, CF) eventually downloads the artifact before it sends the deployment/launch request.
The hash value is used for unique temp file creation when the artifact is downloaded.
Is there a way to tell SCDF not to download artifact if it already exists?
The HTTP-based (or any explicit URL-based, other than maven or docker) artifacts are always downloaded, because the resource at a specific URL can be replaced with some other resource and we don't want to use the cache in this case.
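Following from that, if you can also publish the artifact to a Maven repository, registering it with a maven:// URI instead of the https:// URL should let the server resolve it through its local Maven cache on subsequent launches. A hypothetical example from the Data Flow shell (the coordinates and names are placeholders):

dataflow:> app register --name my-task --type task --uri maven://com.example:my-task:1.0.0
dataflow:> task create my-task-def --definition "my-task"
dataflow:> task launch my-task-def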
Also, we recently deprecated the cache cleanup mechanism, as it wasn't being used effectively.
If your use case (with this specific disk space limitation that can't handle caching multiple artifacts) requires this cache cleanup feature, please create a GitHub request here.
We were also considering the removal of HTTP-based artifacts after they are deployed/launched. Looks like it is worth revisiting that now.

Howto handle infinispan cache creating and deployment

We have an Infinispan cluster serving as a cache server for our applications. Every time we need a new cache, we have to edit the config files and redeploy the cluster, which is problematic. For obvious reasons, we don't want to redeploy the cache cluster.
We can add the new cache definition through the web interface, or the CLI. But that has the downside of not recording this configuration in a repo. Ideally I want to be able to add cache definitions in a way that is persistent in my code repo, so that in case of a disaster I can simply redeploy the cache cluster.
We looked into creating cache definitions through the source code, at application startup, but that doesn't seem to be possible.
Does anyone have an idea about the best practices for this issue?
After some R&D, this is what we found:
Programmatic creation of the caches is possible through the JCache implementation in Infinispan, but we could not find a way to configure it properly. The end result is just an empty cache definition with no properties.
What we ended up doing is to create caches using the JBoss CLI: use a script to create the cache definitions, and commit that script to the version control system. This way you can recreate your cache server at any time by rerunning that script. The downside of this approach is that you need jboss-cli installed on your deploying machine - probably CI - which is very inconvenient. We decided to do this step manually for the time being.
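Purely as an illustration (the exact resource paths vary between Infinispan/JDG server versions, so treat this as a hypothetical sketch rather than the exact script we used): the versioned script create-caches.cli might contain a line along the lines of

/subsystem=infinispan/cache-container=clustered/replicated-cache=my-cache:add()

and would be run against the server with

bin/jboss-cli.sh --connect --file=create-caches.cli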
