Nexus - Proxy repository with no storage

I have two instances of Sonatype Nexus.
One of them is installed on a processing-focused machine, which in practice means it has almost no disk space. So I use this instance to hold only our lightweight dependencies, such as small jars used by a lot of projects.
The other machine has enough disk space, but it is outside our network. I want to use it to keep the bigger dependencies, the ones we don't use frequently.
That said, I want to create a proxy repository in the first Nexus installation pointing to the installation that holds the biggest artifacts. But I don't want it to store the dependencies downloaded from the other instance, because that would cause a lot of headaches when it comes to disk space. Do you guys have any idea how to do that? To clarify: I need a proxy repository, pointing to another Nexus installation, that doesn't keep local copies of the artifacts when they're requested. Please, help.

Related

Download maven2 repository for offline use

We are developing offline due to limited internet resources, and would like to run, once every several months, a whole grab of an external repository (e.g. repo1.maven.org/maven2; disk space isn't an issue).
Today I'm using a simple POM that contains a lot of the common dependencies we use. I've set my local Maven to use a mirror that proxies through a local Nexus repository to cache locally, and this is how I'm grabbing for offline use - but that isn't very effective.
I'm now looking for a command-line tool that allows me to run searches on Maven repositories, so that I can write a script that grabs them all into my local Nexus installation, and I would like to hear if there is one, or if there is another way to achieve that.
Thanks
Not a whole solution (yet) but I'm using httrack to grab the whole content of repo1.maven.org/maven2 - That is already better than nothing :)
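For reference, the invocation is roughly this (a sketch; the filter pattern keeps the crawl inside the repository, and mirroring all of Central takes a very long time and a lot of disk):

    httrack "https://repo1.maven.org/maven2/" -O ./maven2-mirror "+repo1.maven.org/maven2/*"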
In general, there is a goal in the Maven dependency plugin called "go-offline".
It allows you to grab all the project dependencies and store them in the local .m2 repo.
You can find more information here.
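For example, running it from the project root pre-fetches everything the build needs into the local cache:

    mvn dependency:go-offline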
If you want to run Maven and tell it to behave as if the network does not exist, you can run it with the "-o" option (offline mode). If a dependency is not installed locally, Maven won't even try to go to the network to fetch it, but will fail the build instead.
Conversely, if you want to force Maven to check for and bring in new versions (otherwise they should already be in your repo), you can use the "-U" option.
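For example (both are standard Maven flags):

    # offline build: fail instead of touching the network
    mvn -o clean install
    # force Maven to check remote repositories for updates
    mvn -U clean install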
I'm not really sure I get the point of the general search-and-download use case. Usually people install Nexus or Artifactory once per network, so that each dependency is downloaded only once. On local development machines people usually just work with the filesystem and don't maintain tools like this.
Now if you want to copy the whole repository from the internet (for carrying it over to some other network or similar), you can use a crawler like Apache Nutch, or craft your own script that recursively downloads all the files.
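Such a crafted script can be as simple as a recursive wget over the tree (a sketch; the subtree path is just an example):

    # mirror a subtree recursively, without ascending to the parent directory
    wget --mirror --no-parent --no-host-directories -P ./mirror https://repo1.maven.org/maven2/org/apache/commons/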

What are the consequences of always using Maven Snapshots?

I work with a small team that manages a large number of very small applications (~100 portlets). Each portlet has its own git repository. In some code I was reviewing today, someone made a small edit and then bumped the pom.xml version from 1.88-SNAPSHOT to 1.89-SNAPSHOT. I added a comment asking if this is the best way to do releases, but I don't really know the negative consequences of doing this.
Why not do this? I know snapshots are not supposed to be releases, but why not? What are the consequences of using only snapshots? I know Maven does not cache snapshots the same way as non-snapshots, so it may download the artifact every time, but let's pretend caching doesn't matter. From a release-management perspective, why is using a SNAPSHOT version every time and just bumping the number a bad idea?
UPDATE:
Each of these projects results in a war file that will never be available on a maven repo outside of our team, so there are no downstream users.
The main reason for not wanting to do this is that the whole Maven ecosystem relies on a specific definition of what a snapshot version is, and that definition is not the one in your question: a snapshot is only supposed to represent a version currently in active development, not a stable version. The consequence is that a lot of the tools built around Maven assume this definition by default:
The maven-release-plugin will not let you prepare a release with a snapshot version as the released version (see the sketch after this list). So you'll need to resort to tagging by hand in your version control, or write your own scripts. This also means that the users of those libraries won't be able to use this plugin with its default configuration; they'll need to set allowTimestampedSnapshots.
The versions-maven-plugin, which can be used to automatically update to the latest release version, won't work properly either, so your users won't be able to use it without configuration pain.
Repository managers like Artifactory or Nexus come built in with a clear distinction between repositories hosting snapshot dependencies and repositories hosting release dependencies. For example, if you use a shared company-wide Nexus, it could be configured to purge old snapshots, which would break things for you... Imagine someone depends on 1.88-SNAPSHOT and it is completely removed: you'd have to go back in time and redeploy it, until the next removal... Also, certain Artifactory internal repositories can be configured not to accept any snapshots, so you won't be able to deploy there; the users would be forced, again, to add more repository configuration to point at repositories that do allow snapshots, which they may not want to do.
Maven is about convention over configuration, meaning that all Maven projects should try to share the same semantics (directory layout, versioning...). New developers joining your project will be confused and lose time trying to understand why it is built the way it is.
In the end, doing this will just cause more pain for your users and won't simplify a single thing for you. You could probably make it somewhat work, but when something breaks (because of company policy, or some other future change), don't act surprised...
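To illustrate the first point, a conventional release with the maven-release-plugin looks like the following (version numbers are hypothetical); with an everything-stays-SNAPSHOT scheme these commands have no release version to work with:

    # tag the release, set the version to 1.89 and bump the POM to 1.90-SNAPSHOT
    mvn release:prepare -DreleaseVersion=1.89 -DdevelopmentVersion=1.90-SNAPSHOT
    # check out the tag and deploy the release artifacts
    mvn release:perform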
Tunaki gave a lot of reasonable points on why this breaks Maven best practices, and I fully support that view. But even if you don't care about the "conventions of other companies", there are reasons:
If you are not doing CI (and considering every build as a potential release), you need to distinguish between versions that should go to production and those that are just for testing. If everything is a SNAPSHOT, this is hard to do.
If someone (accidentally) deploys a second 1.88-SNAPSHOT, it becomes the new 1.88-SNAPSHOT, hiding the old one (which is still available under a concrete timestamp, but that is messy). Release versions cannot be deployed twice.
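For illustration, with unique snapshots each deployment is stored under a timestamped name (hypothetical artifact and dates below), and only the repository metadata decides which file 1.88-SNAPSHOT currently resolves to:

    myapp-1.88-20160105.101010-1.war
    myapp-1.88-20160106.093000-2.war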

How to get artifactory to update the maven-metadata.xml for a virtual repo?

Long time reader, first time asker...
I have a standalone network (no internet access). It has an Artifactory server with virtual libs-snapshot and libs-release repos. Under libs-snapshot there are 4 local snapshot repos. The reason for this is that we get a dump of all the Artifactory repos from somewhere else (non-connected) and import it into this network, but we have to modify a subset of the snapshot artifacts there. So we created another local snapshot repo, call it mine-snapshot-local (Maven 2 repo, set as unique, max artifacts=1?), and added it to the top of the libs-snapshot virtual. In theory this allows us to modify the handful of artifacts we need to, deploy them to our own repo, and have local developers pick those up, while we still have access to the 99% of other artifacts from the periodic dump from the other non-connected system. In addition, we can import the drops from the other network, which are concurrently being modified, on a wholesale basis without touching our standalone network repo (mine-snapshot-local). I guess we're "branching" Artifactory repos...
I realize we could probably just deploy straight into one of the imported repos, but the next time we get a dump from the other network, all those custom modified artifacts would go away... so I'd really like to get this method to work if possible.
From my local Eclipse, the Maven plugin deploys artifacts explicitly, and without error, to the mine-snapshot-local repo. The issue I'm seeing is that the maven-metadata.xml for the virtual libs-snapshot is not being updated. The timestamp of that file is updated, and if I browse with a web browser to libs-snapshot/whatever_package, I can see my newly deployed artifacts with newer timestamps than the existing snapshots. But the maven-metadata.xml file still points to the "older" snapshot.
maven-metadata.xml is successfully updated in the mine-snapshot-local repo, but it is as if Artifactory is not merging all the metadata files together correctly for the virtual repo. Or, more likely, I have misconfigured something that causes it to ignore our top-layer local repo somehow (but then why would the snapshot jar/pom still show up there?).
We are using Artifactory 2.6.1 (and do not have the option to upgrade).
I've tried a number of things: setting the snapshot repos to unique, non-unique, deployer, limiting the number of snapshots, etc. None of it seems to make much of a difference.
The one thing I do see as a possible issue is the build number assigned to a snapshot. For example, in the imported repo an artifact might have a timestamp that is a week old but a build number of 4355. In my new repo, when I deploy, I get a much newer timestamp, but the build number is 1 (or something much, much smaller than 4355).
Am I barking up the wrong tree by trying to have multiple local snapshot repos like this? It seems like this should be ok, but maybe not.
You are using a very, very old version of Artifactory, and it could be that you are suffering from an issue that has long since been fixed. The normal behavior is that if you have 4 Maven repositories and you update/deploy new artifacts into one of them, the virtual repository should aggregate the metadata from all of the listed repositories.
Just to verify: you mentioned that you are deploying from Eclipse - are you referring to P2? If so, a side note: Artifactory will not calculate metadata for P2 artifacts.
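One quick check is to fetch the metadata the virtual repo actually serves and compare it with the one in mine-snapshot-local (hypothetical host and artifact path):

    curl https://my-artifactory/artifactory/libs-snapshot/com/example/myapp/1.0-SNAPSHOT/maven-metadata.xml
    curl https://my-artifactory/artifactory/mine-snapshot-local/com/example/myapp/1.0-SNAPSHOT/maven-metadata.xml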

Moving Nexus Repository, Backup Only Certain Artifacts?

I have a Sonatype Nexus repository on an older machine, and I have purchased a newer server which will become my new repository host. In the installation of Nexus on the older machine I have an extensive collection of artifacts, the vast majority of which are now obsolete and can be safely removed from Nexus.
I know it is possible for me to move all of the artifacts from the old installation into the new installation by simply copying the sonatype-work directory to the new box. My question is this: If I want to prune the artifacts in that directory down to only what I need right now (probably about 20% of the repository contents) what steps would I have to take other than deleting the unwanted artifacts? For example, would I need to force Nexus to rebuild indexes? Thanks for the help!
You could just install the new Nexus and proxy off the old one via a proxy repo, in addition to Central and the other repos. Then you run this for a while, and only the things not found in the other public repositories you configure will be proxied from the old Nexus instance.
At a later stage you could run a scheduled task on the old instance that removes old items.
When you are satisfied you got everything you need, you do one last backup and then take the old Nexus instance offline.
Of course the other option is to just not worry and migrate it all. In the end you really only have to migrate what you actually deployed (so probably the releases and 3rd-party repos).
The easiest option, by the way, is to just copy the whole sonatype-work folder over to the new machine, fire it up with a new Nexus install there and flick the switch.
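A minimal sketch of that copy, assuming default paths and the same Nexus version on both machines (stop Nexus on the old box first so the copy is consistent):

    rsync -a /opt/sonatype-work/ newhost:/opt/sonatype-work/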

How to debug the performance of a wrong setup of a build machine?

We have to set up new build environments regularly, and the process is not so simple. Today I got a new build machine, and the first Maven build was so slow that I wanted to find out why the performance was so bad. But how do you do that?
Our context is:
We use multiple build machines, each project gets its own.
Each build machine has a similar setup, so that projects can start immediately and don't have to configure a lot.
We have the following tools preconfigured:
Hudson (currently 2.1.1, but will change)
Artifactory 2.3.3.1
Sonar
Hudson, Artifactory and Sonar have their own Tomcat configured
Maven 2.2.1 and Maven 3.0.3 (with no user configuration, only the installation has a settings.xml)
Ant 1.7.1 and Ant 1.8.2 (not relevant here)
Subversion 1.6 client
All tools should work together; in particular, the repository chain should be:
Build machine Maven repository
Build machine Artifactory
Central company Artifactory (is working as mirror and cache for the world)
Maven central (and other repository)
So when a Maven build needs to resolve a dependency, it is looked up first in the local Maven repo, from there in the local Artifactory repo, then in the central Artifactory repo, and only then on the internet.
We normally have to use proxies to connect to the internet; we don't need one inside our intranet.
The first build (a Maven Hello World) took around 45 minutes. In that time all the bootstrapping happened, but I would have thought that with our chain of repositories (where the central repository is well filled) the build would be much faster. So I think the focus of the debugging should be the network; the local build itself is not the problem. That puts the configuration and interaction of Maven and Artifactory under consideration.
How do you debug such an environment? I have access to the build machine (with sudo) and to the central repository, but I don't know where to start, what to check, or where to look. So what is your experience, what are the tips and tricks you would like to share?
Here are a few things I have done up to now. If you have additional advice, you are welcome!
I suspected the chain of repositories to be the source of evil, so I addressed that first. The reasons are:
The actual local build (of a hello world program) can account for milliseconds of difference, but not minutes.
Network makes a difference, so attack that first.
The chain of repositories only comes into play if something is not found locally. Here are the steps to ensure that that is the case:
For Maven, delete the contents of the local cache. If the local cache is filled, you don't know whether a resource is found in the local cache or elsewhere. (Do that at least at the end, once everything else is working again.)
For Artifactory, find that cache as well, and clean it by deleting its contents. It is only a cache, so it will be filled anew.
If you use a browser for measuring the lookup, ensure that what you ask for is not in the browser's cache.
Otherwise use a tool like wget to ask for a resource (see the sketch after this list).
Try to minimize the sources of failure, so divide the long distance of your lookup into smaller segments that you control.
Don't use Maven for doing the lookup; start with the Artifactory repository only, and only later bring Maven back in.
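A minimal sketch of such a clean-cache, single-resource lookup (hypothetical host and artifact path):

    # empty the local Maven cache so nothing resolves locally
    rm -rf ~/.m2/repository
    # time one artifact lookup, bypassing any HTTP proxy
    time wget --no-proxy https://my-project-artifactory/repo/junit/junit/4.8.2/junit-4.8.2.pom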
This led to the following tests. Each time I ensured that the previous prerequisites were met:
Ask for https://<my-project-artifactory>/repo/<my-pom>. Expectation:
The local lookup will fail, so the resource has to be found in a remote repository, the central company Artifactory.
Possible effects could come from the proxy or from the Artifactory lookup.
Result: the lookup for a simple POM needed ~30 seconds. That is too much.
Remove the proxy. With wget, there is an option --no-proxy which does just that. Expectation:
Faster lookup.
Result: no change at all, so the proxy was not the reason.
Ask for https://<my-project-artifactory>/libs-snapshots-company/<my-pom>, i.e. replace the virtual repository with a real remote repository. Expectation:
Artifactory knows where to do the lookup, so it will be much faster.
Result: the POM was found immediately, so the 30 seconds are spent by Artifactory searching across its repositories. But what could be the reason for that?
Removed all remote and virtual repositories in Artifactory (leaving only our company's ones and the cached Maven Central), but asked again for https://<my-project-artifactory>/repo/<my-pom>. Expectation:
Artifactory will find the resource much faster.
Result: the POM came in an instant, not measurable.
I was then courageous and just started the build (with an empty local cache). The build now needed 5 seconds (instead of 15 minutes the same morning).
So while I now understand better what can go wrong, a lot of questions remain. Please add your ideas as answers, you will get reputation for them!
