Getting Leiningen to cache packages

In a ClojureScript project I'd like Leiningen to be less reliant on the internet connection during our CI builds. I was hoping to get it to cache packages on a network disc (using the :local-repo setting to create a "shared cache") and then add it as a repository in such a way that it fetches from there first, and only from Clojars and other external sites when it can't find it in the "shared cache".
I read this, removed my ~/.m2 folder, and added the following to my project.clj:
:profiles {:local-cache
           {:local-repo "/shared/disc/clojars-cache"
            :repositories {"local" {:uri "file:///shared/disc/clojars-cache"
                                    :releases {:checksum :ignore}}}}}
The initial build with lein with-profile +local-cache cljsbuild does indeed populate the cache, but
my ~/.m2/repository folder is recreated and filled with stuff (though it seems to be only the Clojure artifacts needed by Leiningen itself), and
after removing ~/.m2, subsequent rebuilds don't seem to use the local repository at all but instead download from Clojars anyway.
Clearly I'm missing something... or maybe I'm going about this in the completely wrong way.
In short, how can I get Leiningen to
create a cache of packages on a network disc, and
get it to prefer this cache as the source of packages (over external sources like Clojars)?

Leiningen already prefers ~/.m2 by default. It will only go to Clojars if it doesn't already have a copy of the requested JAR stored locally in ~/.m2. The exception to this rule is SNAPSHOT versions, where it will go out to the network once a day (by default) to check whether the SNAPSHOT version it has is the latest.
You can set the :offline? key to true if you don't want Leiningen to go to the network at all.
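For example, a minimal sketch of turning offline mode on in project.clj (the project name and dependency are placeholders):

(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]]
  ;; never go to the network during dependency resolution
  :offline? true)

The same effect can be had per invocation with lein -o <task>.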
Answering your questions:
How can I get Leiningen to create a cache of packages on a network disk?
Leiningen already creates a cache of packages in ~/.m2. You could symlink that directory to your network disk, or use :local-repo as you are now, although it sounds like :local-repo isn't working for you?
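For example, a rough sketch of the symlink approach (the shared path is just an example):

mv ~/.m2 ~/.m2.bak                   # keep the old cache around, just in case
ln -s /shared/disc/m2-cache ~/.m2    # every machine now resolves from the shared cache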
How can I get Leiningen to prefer this cache over external sources?
Leiningen already does this. It sounds like :local-repo either isn't working, hasn't been configured correctly, or that directory isn't writable by Leiningen?
Stepping back to look at the wider problem, you want to prevent unnecessary network traffic in your CI builds. Leiningen already caches every dependency by default. You haven't said which CI tool you're using, but they should all have the ability to cache the ~/.m2 folder between runs. Depending on the tool, you'll have to download your deps either once per project or once per machine. I'd recommend sticking with that rather than trying to share deps over the network, as that could lead to hard-to-debug test failures.
If that's not going to work for you, can you provide more details about your setup, and why you'd like Leiningen to be less reliant on the network in your CI builds?
UPDATE: After seeing that GitLab CI is being used, it looks like you need to add a caching config:
cache:
  paths:
    - ~/.m2
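Note that some runners only cache paths inside the project directory. If that turns out to be the case for you, a common variant (directory name is an assumption) is to point :local-repo at a folder inside the checkout, e.g. :local-repo ".m2/repository" in the profile, and cache that instead:

cache:
  paths:
    - .m2/repository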

Related

How can I pare down the size of my .gradle directory and still work offline?

I would like to enable development of a Kotlin project on a machine that, for security reasons, does not have a network connection. Both the source and target machines are Windows. I would like to transport the .gradle directory from my home directory, with all the build dependencies cached, onto this offline machine, but I discovered that the .gradle directory is enormous, like 3-4 GB. I am also required to do a virus scan which, again, is for security reasons, and it takes forever.
I would like to figure out if there are any files/directories in my .gradle which, perhaps, were downloaded as intermediary steps to set up Gradle but are no longer needed. Are there any of significant size that I could delete and an offline build would still work?

How to avoid Gradle wrapper downloading distro when running in Gradle docker image?

My project is built with gradlew. GitLab CI builds the project in a docker runner with an official Gradle image (see https://hub.docker.com/_/gradle).
Now even though Gradle is already installed in the container, the wrapper will still download the distro every time. This makes up the majority of the build time.
How can I "tell" the wrapper about the already installed distro, so that it will not redownload it (assuming of course the versions match)?
Of course the alternative is to use gradle instead of gradlew in CI and rely on the Docker image to have the correct distro, but I'd like to avoid this if possible, as I would then have to manually keep .gitlab-ci.yml and the wrapper config in sync.
I don't think you can instruct the wrapper to use a local version of Gradle that was installed manually.
The only approach I can think of to prevent downloading the distribution on every build, that doesn't involve additional steps when upgrading Gradle, is to cache the Gradle home folder (e.g. /home/gradle/.gradle). This should be possible even if it resides in a Docker container.
I don't know the details of how GitLab supports caching, but it probably only makes sense if the cache is stored locally (either on the same machine or on a cache server with high network bandwidth). If it has to be uploaded to and downloaded from something like an S3 bucket on every build, that would likely take as much time as downloading it from services.gradle.org. But if you can make this work, you will not only cache the Gradle distribution but also the build dependencies, which should speed up the build further.
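If you can make caching work, a rough .gitlab-ci.yml sketch would be (paths and the variable value are assumptions; runners typically only cache files inside the project directory, hence the redirected Gradle home):

variables:
  GRADLE_USER_HOME: "$CI_PROJECT_DIR/.gradle"   # make the Gradle home live inside the checkout

cache:
  paths:
    - .gradle/wrapper   # downloaded wrapper distributions
    - .gradle/caches    # resolved build dependencies

build:
  image: gradle
  script:
    - ./gradlew build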

Download maven2 repository for offline use

We are developing offline due to limited internet resources and would like to do, once every several months, a whole grab of an external repository (e.g. repo1.maven.org/maven2 - disk space isn't an issue).
Today I'm using a simple POM that contains a lot of the common dependencies that we use. I've set my local Maven to use a mirror that proxies through a local Nexus repository in order to cache locally, and this is how I'm grabbing artifacts for offline use - but that isn't very effective.
I'm now looking for a command line tool that allows me to run searches on Maven repositories, so that I can write a script that grabs them all into my local Nexus installation. I would like to hear if such a tool exists, or if there is another way to achieve this.
Thanks
Not a whole solution (yet) but I'm using httrack to grab the whole content of repo1.maven.org/maven2 - That is already better than nothing :)
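Roughly like this (the output directory is just an example, and a full grab will take a very long time):

httrack "https://repo1.maven.org/maven2/" -O ./maven2-mirror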
In general, there is a goal in the Maven Dependency Plugin called go-offline.
It grabs all of the project's dependencies and stores them in the local .m2 repository.
You can find more information here.
If you want to run Maven and tell it to behave as if the network does not exist, you can run it with the "-o" option (offline mode). Then, if a dependency is not installed locally, Maven won't even try to go to the network and fetch it - it will fail the build instead.
Conversely, if you want to force Maven to check for and bring in new versions (otherwise they should already be in your repo), you can use the "-U" option.
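Put together, a typical sequence looks like this (goal and flag names as documented by Maven; your actual build goals may differ):

mvn dependency:go-offline   # resolve and download everything the build will need
mvn -o package              # later: build without touching the network
mvn -U package              # or: force a re-check of remote repositories for newer versions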
I'm not really sure I get the point of the general search-and-download use case. Usually people install Nexus or Artifactory once per network, so that each dependency is downloaded only once. On local development machines people usually just work with the filesystem and don't maintain tools like this.
Now if you want to copy the whole repository from the internet (for copying it later to some other network or something), you can just use a crawler like Apache Nutch, for example, or craft your own script that recursively downloads all the files.
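As a sketch of the script approach (standard wget options; expect this to take a very long time for a repository of this size):

wget --recursive --no-parent --no-host-directories --reject "index.html*" https://repo1.maven.org/maven2/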

$GOPATH directory and disk space

After a year or two of developing with Go, I had a look at my $GOPATH directory size and was surprised to see it had grown to 4 GB.
I understand that disk space is supposed to be cheap (and 4 GB is not that much on this 128 GB laptop SSD), but still.
So my question is: is there some good practice for managing the size of the $GOPATH directory?
Would restarting from scratch be a good idea? (Although that could be very time-consuming.)
Your statement that deleting the GOPATH would be time-consuming sounds like you are managing the dependencies of your Go projects with plain go get .... In my environment I do two things to keep the GOPATH ephemeral.
Don't use plain go get for dependency management. Go 1.6 introduced the /vendor directory. Together with a dependency management tool like Glide, every dependency of a given project lies within the project directory.
This means that if you don't need a dependency anymore, you can wipe the vendor directory of the project and download the dependencies again. That way you only have dependencies on your disk which you actually need.
Also, if you stop working on a project and delete it from your disk, the dependencies also get deleted.
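With Glide, that workflow is roughly (command names as per Glide's documentation):

glide init       # inspect the project's imports and create glide.yaml
glide install    # fetch the pinned dependencies into ./vendor
rm -rf vendor/   # safe to wipe at any time; rerun glide install to restore it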
You can specify multiple paths in the GOPATH. Like the PATH environment variable, you can separate them with a colon. The interesting thing is that Go uses the first path when downloading with go get. On my machine the GOPATH looks like $HOME/.gopath:$HOME/projects. So if you put all your actual projects into the second directory, you get a clear separation between your projects and their dependencies. Thus you can wipe the first directory from time to time without having to fear cloning every one of your projects again.
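In shell form, that setup is simply (paths mirror the ones above):

export GOPATH="$HOME/.gopath:$HOME/projects"   # go get downloads into the first entry
rm -rf "$HOME/.gopath"                         # wipes cached dependencies, leaves your projects alone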
These two things don't reduce the disk usage of your Go projects and their dependencies as such, but you get a better overview of what's actually needed, and you can delete the dependency directory whenever you like.

How to debug the performance of a badly set up build machine?

We have to set up new build environments regularly, and the process seems not so simple. Today I got a new build machine, and the first Maven build was so slow that I wanted to clarify why the performance was so bad. But how to do that?
Our context is:
We use multiple build machines, each project gets its own.
Each build machine has a similar setup, so that projects can start immediately and don't have to configure a lot.
We have the following tools preconfigured:
Hudson (currently 2.1.1, but will change)
Artifactory 2.3.3.1
Sonar
Hudson, Artifactory and Sonar have their own Tomcat configured
Maven 2.2.1 and Maven 3.0.3 (with no user configuration, only the installation has a settings.xml)
Ant 1.7.1 and Ant 1.8.2 (not relevant here)
Subversion 1.6 client
All tools should work together, especially the repository chain should be:
Build machine Maven repository
Build machine Artifactory
Central company Artifactory (is working as mirror and cache for the world)
Maven central (and other repository)
So when the Maven build needs to resolve a dependency, it is first looked up in the local Maven repo, from there in the local Artifactory repo, then in the central Artifactory repo, and only then on the internet.
We normally have to use proxies to connect to the internet; we don't need them within our intranet.
The first build (a Maven Hello World) took around 45 minutes. In that time all bootstrapping was happening, but I would have thought that by using our chain of repositories (where the central repository is well filled) the build would be much faster. So I think the focus of the debugging will be the network; the local build is not the problem. So the configuration and interaction of Maven and Artifactory is under consideration.
How do you debug such an environment? I have access to the build machine (with sudo) and to the central repository, but I do not know how to start, what to check, or where to look. So what is your experience, what are the tips and tricks you would like to share?
Here are a few things I have done up to now. If you have additional advice, you are welcome!
I suspected the chain of repositories to be the source of evil, so I addressed that first. The reasons are:
The real build on the local machine (of a hello world program) may differ in milliseconds, but not minutes.
Network makes a difference, so attack that first.
The chain of repositories only matters if something is not found locally. Here are the steps to ensure that that is the case:
For Maven, delete the contents of the local cache. If the local cache is filled, you don't know whether a resource is found in the local cache or elsewhere. (Do that at least at the end, once everything else is working again.)
For Artifactory, find that cache as well, and clean it by deleting its contents. It is only a cache, so it will be filled anew.
If you use a browser for measuring the lookup, ensure that what you ask for is not already in the browser's cache.
Otherwise use a tool like wget to request the resource (see the sketch after this list).
Try to minimize the sources of failure. So try to divide the long path of your lookup into smaller segments that you control.
Don't use Maven for doing the lookup; start with the Artifactory repository only, and only later bring Maven in.
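A minimal way to time a single lookup from the build machine (placeholders kept as in the tests below; --no-proxy is the wget option mentioned in the second test):

time wget -O /dev/null "https://<my-project-artifactory>/repo/<my-pom>"              # through the default proxy
time wget --no-proxy -O /dev/null "https://<my-project-artifactory>/repo/<my-pom>"   # bypassing the proxy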
This led to the following tests I wanted to do. Every time I ensured that the previous prerequisites were met:
Ask for https://<my-project-artifactory>/repo/<my-pom>. Expectation:
The local lookup will fail, so Artifactory has to find the resource in a remote repository, the central company Artifactory.
Possible effects could come from the proxy or from the Artifactory lookup.
Result: Lookup for a simple POM needed ~ 30 seconds. That is too much.
Remove the proxy. With wget, there is an option --no-proxy which does just that. Expectation:
Faster lookup.
Result: No change at all, so proxy was not the reason.
Ask for https://<my-project-artifactory>/libs-snapshots-company/<my-pom>. So change the virtual repository to a real remote repository. Expectation:
Artifactory knows where to do the lookup, so it will be much faster.
Result: POM was found immediately, so the 30 seconds are Artifactory doing lookup. But what could be the reason for that?
Removed all remote and virtual repositories in Artifactory (left only our company's ones and the cached Maven Central), but asked again for https://<my-project-artifactory>/repo/<my-pom>. Expectation:
Artifactory will find the repository much faster.
Result: POM came in an instant, not measurable.
I was then courageous and just started the build (with an empty local cache). The build then needed 5 seconds (instead of the 15 minutes it took the same morning).
So I think I have now better understood what can go wrong, but a lot of questions remain. Please add your ideas as answers; you will get reputation for them!
