Guidelines when splitting artifact repositories - maven

I am looking for an article which describes a set of guidelines to follow when creating repositories in an artifact repository manager.
I know that:
You need to keep snapshots in snapshot repositories.
You need to keep releases in release repositories.
Third-party artifacts should be in a separate repository (the same goes for forked/patched
versions of third-party libraries).
It's generally a good idea to prefix the names with int-* and ext-*.
Usually different product lines end up having their own repositories as sometimes their artifacts don't depend on each other.
I've been trying to find an article on this to illustrate to a client how this artifact separation abstraction is done by other companies and organizations using repositories.
Many thanks in advance!

I am not aware of the existence of such an article, but as @tieTYT mentioned, you can look at the Artifactory default repositories. They reflect years of experience in binary management, continuous integration and delivery.
Those practices still apply even if you use Nexus (and you can observe them without installing Artifactory, by looking at the JFrog public Artifactory instance at http://repo.jfrog.org).
For your convenience, here are the defaults (important usage emphasised):
Local Repositories:
libs-snapshot-local: Deploy your local snapshots here
libs-release-local: Deploy your local releases here
ext-snapshot-local: Deploy 3rd-party snapshots that aren't available in remote repos here
ext-release-local: Deploy 3rd-party releases that aren't available in remote repos here
plugins-snapshot-local: Deploy your plugin (usually Maven) snapshots here
plugins-release-local: Deploy your plugin (usually Maven) releases here
Remote Repositories:
jcenter: proxy of http://jcenter.bintray.com. Normally, that's the only remote repo you'll need. It includes whatever exists in Maven Central plus all other major Maven repositories.
Virtual Repositories:
remote-repos: aggregation of all the remote repositories
libs-release: this is the resolution repository for release builds. It includes remote-repos, libs-release-local and ext-release-local
libs-snapshot: this is the resolution repository for snapshot builds. It includes remote-repos, libs-snapshot-local and ext-snapshot-local
repo: this is a special virtual repository that aggregates everything. Generally, do not use it if you ever plan to build a release pipeline using a binary repository.
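The resolution behaviour of a virtual repository such as libs-release can be pictured with a small Python model: it asks its members in order and returns the first hit, with the local repositories ahead of remote-repos (Artifactory searches local repositories first). This is a hypothetical sketch of the idea, not Artifactory's implementation; the class names and repository contents are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Repository:
    """A local or remote repository, reduced to the set of paths it can serve."""
    name: str
    contents: Set[str] = field(default_factory=set)

    def find(self, path: str) -> Optional[str]:
        return self.name if path in self.contents else None


@dataclass
class VirtualRepository:
    """Aggregates other repositories (local, remote or virtual) and resolves in order."""
    name: str
    members: List[object]

    def find(self, path: str) -> Optional[str]:
        # Ask each member in order; the first one that can serve the path wins.
        for member in self.members:
            hit = member.find(path)
            if hit is not None:
                return hit
        return None


# Hypothetical contents, only to exercise the lookup order.
libs_release_local = Repository("libs-release-local", {"com/acme/app/1.0/app-1.0.jar"})
ext_release_local = Repository("ext-release-local", {"com/vendor/driver/2.1/driver-2.1.jar"})
jcenter = Repository("jcenter", {"junit/junit/4.12/junit-4.12.jar"})
remote_repos = VirtualRepository("remote-repos", [jcenter])

# Local repositories are listed before the aggregated remotes.
libs_release = VirtualRepository("libs-release", [libs_release_local, ext_release_local, remote_repos])
```

With this model, `libs_release.find("junit/junit/4.12/junit-4.12.jar")` falls through both local repositories and is answered by `jcenter` via the nested remote-repos virtual.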
I'll be glad to advise on specific questions.

As is the case with many questions about best practices, the answer is: It depends.
Technically there are only two distinctions that are required:
Snapshot vs release repo
Hosted vs proxy repository
Snapshot vs. release repositories as a distinction is required since the Maven repository format, and therefore Maven and other build tools, differentiate how they work with the metadata and what they do during upload.
For proxy repositories, you just have to add as many as you need to proxy. This will depend on what components you require, and the proxies for snapshot and release repos will be separate.
For hosted repositories you also have to have separate snapshot and release repos. Beyond that, it is all up for grabs. Having a separate third-party repo, as preconfigured in Nexus (and Artifactory), and other setups are certainly useful, but not really necessary. You can have all those distinctions sorted out by internal metadata where required.
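Maven itself makes the snapshot/release split at deploy time: a version ending in -SNAPSHOT goes to the snapshotRepository declared in distributionManagement, anything else goes to the release repository. The Python sketch below approximates that routing decision; the timestamp pattern mirrors the form of deployed snapshot versions (e.g. 1.0-20130101.123456-1), and the repository names are placeholders.

```python
import re

# Timestamped snapshot versions as written by a snapshot deploy, e.g. 1.0-20130101.123456-1.
TIMESTAMPED_SNAPSHOT = re.compile(r"^(.*)-(\d{8}\.\d{6})-(\d+)$")


def is_snapshot(version: str) -> bool:
    """Approximates Maven's snapshot check: a SNAPSHOT suffix or a timestamped snapshot."""
    return version.endswith("SNAPSHOT") or TIMESTAMPED_SNAPSHOT.match(version) is not None


def target_repository(version: str, snapshot_repo: str, release_repo: str) -> str:
    """Pick the hosted repository a deploy of this version should go to."""
    return snapshot_repo if is_snapshot(version) else release_repo
```

For example, `target_repository("2.3.1", "libs-snapshot-local", "libs-release-local")` picks the release repository, while `"2.3.1-SNAPSHOT"` would be routed to the snapshot one.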
Along the same lines you can have one release repo for everyone or one for each team or whatever. You can still apply access rights within those repositories to separate access and so on in Nexus with repository targets. I assume Artifactory and Archiva can do something similar. The question here mostly boils down to ease of administration, backups, security setup and access for users.
Naming conventions like you mentioned can help if you want to have separate repositories, but technically none of this is necessary.
Other things I have seen are e.g. migration repos that are used to migrate legacy project libraries into a repo but become frozen after the migration is done, separate repos per team, separate repos per project and so on. Another aspect is separate repos for different levels of approval (e.g. check out the problems with that at http://blog.sonatype.com/people/2013/10/golden-repository/).
In the end, however, this all really hinges on usability and metadata and is not required. Ultimately, in most cases these repositories will be grouped together and accessed via one group, which flattens out the whole separation. And access rights still carry through into the group, so everything can still be controlled as you like. So it turns out to be a matter of taste how you want to slice, dice and manage it.
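The point that a group flattens the layout while access rights still carry through can be illustrated with a toy Python model: the group searches its members in order but skips any member the user may not read. This is a hypothetical sketch, not the actual Nexus permission engine; the repository names, paths and users are invented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HostedRepo:
    name: str
    artifacts: frozenset  # artifact paths stored in this repo (hypothetical data)
    readers: frozenset    # users allowed to read from this repo


def resolve_through_group(members: Iterable[HostedRepo], path: str, user: str) -> Optional[str]:
    """Resolve `path` for `user` through a group: members are searched in order,
    but a member is skipped when the user lacks read access to it."""
    for repo in members:
        if user not in repo.readers:
            continue  # access rights still apply inside the group
        if path in repo.artifacts:
            return repo.name
    return None


team_a = HostedRepo("team-a-releases", frozenset({"com/acme/a/1.0/a-1.0.jar"}), frozenset({"alice"}))
team_b = HostedRepo("team-b-releases", frozenset({"com/acme/b/1.0/b-1.0.jar"}), frozenset({"alice", "bob"}))
public_group = [team_a, team_b]
```

Both users see a single group, yet `bob` cannot resolve team A's artifact through it, which is the "slice and dice by permissions, not by URL" idea described above.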
PS: I am referring to the Maven repositories and format. Once you add a whole bunch of other formats into the mix and wrappers around them exposing them in other formats, everything gets more complicated, but the ideas behind things stay similar.

Related

Can I keep Maven local repository on another machine and use it in my project?

Where are Maven and the pom.xml file kept in a real-time project if the code is on GitHub? I mean, can I keep my local repository on another machine and use it in my project? If yes, how?
Local repositories are not meant for sharing. They are also not "thread-safe" in any way, so accessing them simultaneously from two different builds might break things.
They are populated by the artifacts Maven downloads from Maven Central and other repositories, and also by the stuff you build yourself. As they are more or less a form of cache, there is no need to share them.
If you need a repository that is used from different machines or by different users, set up a Nexus/Artifactory server.

Parent and child repositories in nexus

We use Sonatype Nexus. We want to have a common repository that holds all the artifacts in our organization, plus child repositories for each project in the organization. The goal is to have all artifacts physically located in the common repository, with some kind of links (I don't know the correct Nexus term for them) from the project repositories to the common repository, so the project repositories will not physically hold any artifacts, just links to them.
Why do we need this? To separate artifacts by the project that uses them, without having cloned artifacts in each project repo.
I've analysed proxy, virtual types of repos and went through nexus documentation. Is this actually doable in nexus?
As this post explains, there are really only two ways to design your repository layout: one repo per project/team, or a single repo for the entire organization partitioned by group id: https://support.sonatype.com/hc/en-us/articles/213465778-What-approach-should-I-use-to-restrict-access-to-artifacts-in-Nexus-
Use one repo per project, then group them so they can all be referenced from one single group URL. Access control can be done at the repo level, so only this project can upload to this repo.
Use one single repo for the entire organization, partitioned by group id; e.g. it will look like org.yourcompany.projectname.artifactid. Then you can define the repository target .*projectname.* to control access to this partition.
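Partitioning by group id works because the Maven repository layout turns each dot of the groupId into a directory, so every artifact of one project lives under a common path prefix that a regex-based repository target can match. The Python sketch below shows the layout and a prefix pattern; the exact pattern syntax your Nexus version expects is an assumption here, so check its documentation before copying it.

```python
import re


def artifact_path(group_id: str, artifact_id: str, version: str, extension: str = "jar") -> str:
    """Path of an artifact in the standard Maven 2 repository layout."""
    return "/{}/{}/{}/{}-{}.{}".format(
        group_id.replace(".", "/"), artifact_id, version, artifact_id, version, extension
    )


# Hypothetical repository-target style pattern covering everything under one project's groupId.
PROJECT_TARGET = re.compile(r"/org/yourcompany/projectname/.*")


def in_partition(path: str) -> bool:
    """True when the path belongs to the project's partition of the shared repo."""
    return PROJECT_TARGET.fullmatch(path) is not None
```

For instance, `artifact_path("org.yourcompany.projectname", "core", "1.0")` yields `/org/yourcompany/projectname/core/1.0/core-1.0.jar`, which falls inside the partition, while the same artifact under `org.yourcompany.other` does not.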
I think you actually want a Group, not a repository. A repo represents a single root directory on a disk or is a proxy of a single repo elsewhere accessible by http/s. These things are not nestable.
However, groups hold many repositories. You could create a group for each project team, containing only the repos they need, each of which could be separated by whatever criteria you want.
For example, you could have a repo that holds Java DAO libs for SQL databases, another repo that holds Java DAO libs for NoSQL databases, yet another that contains SOAP APIs, and another that holds REST APIs. You could then create two groups, say 'modern' and 'old-school', and assign the appropriate repos to each. You could give access to the 'old-school' group to your 'serious' Java devs, and 'modern' to your Android script-kiddies.
I'm not suggesting that this is a particularly good breakdown -- it's just an example.
At one place where I worked we had internationally separated teams, and each had access to their own libs and central, so we had a group per country. In another place we had mobile dev and server dev, each requiring their own groups.

How to set a proxy to a public Maven repository in read-only mode

I want to set up a development environment that allows reusing some artifacts from public Maven repositories like Maven Central and Codehaus. Specifically, I like the concept of transitive dependencies.
In our company, our production network cannot export any data outside, but we can push data inside. We already have some gateways to copy files from the outside into our network. Therefore, I could use these to copy the required packages manually, but we would miss the power of Maven. In our case, the perfect solution would be to get data from public repositories while being forbidden to deploy to the external repos.
So I would like to have your expert view on this problem.
We can use various means, as long as it is guaranteed that no data can be exported outside our network:
External packages are created on a disk area that is read-only from production servers.
Some HTTP requests are filtered.
Using a repository manager, such as Nexus.
In the repository management guide, Nexus talks about this possibility (http://books.sonatype.com/nexus-book/reference/confignx-sect-manage-repo.html). I would like a confirmation from you guys about how secure it is. Specifically, this has to be updated only by the IT manager.
Regards,
Loïc.
This is completely feasible and a common setup with Nexus. Here are the steps roughly.
Lock all developer machines and the CI server inside the network, disallowing direct access to outside servers
Setup Nexus to proxy external repositories like Central as desired
Allow Nexus to reach those external repositories via the proxy
Configure developers and CI server machines to access Nexus to get the dependencies (and transitive dependencies) as desired
Optionally you can also
Configure CI servers to deploy any internal packages to Nexus
Configure deployment tools to get components for deployment from Nexus
Also note this can be done with different repository formats and toolchains. The most common one is Maven, but Nexus also supports npm, NuGet, RubyGems, sites, YUM and others.
And if you want to make some of your packages in Nexus available to the outside, you can configure this as well, with several options.
Also note that a proxy repository is by definition read-only in terms of direct deployments to it. That's what a hosted repository is for.
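Why a proxy fits the "pull in, never push out" requirement can be seen in a toy Python model: on a cache miss it fetches from upstream and keeps the result, repeated requests are served from the cache, and deploy attempts are refused outright. The class and the upstream callable are hypothetical illustrations, not the Nexus API.

```python
from typing import Callable, Dict, Optional


class ProxyRepository:
    """Read-only cache in front of an upstream repository (toy model)."""

    def __init__(self, name: str, fetch_upstream: Callable[[str], Optional[bytes]]):
        self.name = name
        self._fetch_upstream = fetch_upstream  # only this object talks to the outside
        self._cache: Dict[str, bytes] = {}

    def get(self, path: str) -> Optional[bytes]:
        # Serve from the cache first; only go upstream on a miss.
        if path not in self._cache:
            data = self._fetch_upstream(path)
            if data is None:
                return None
            self._cache[path] = data
        return self._cache[path]

    def deploy(self, path: str, data: bytes) -> None:
        # Proxies never accept uploads; deployments belong in a hosted repository.
        raise PermissionError(f"{self.name} is a proxy repository and is read-only")
```

In this model only the proxy's upstream fetch ever reaches the outside network, and it only ever reads; internal builds that need to publish would deploy to a separate hosted repository.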

Starting to use Artifactory

In the company where I work, we are starting to use Artifactory as our repository management tool, and I'm now reading its user guide. We started the configuration by creating a virtual repository and a few local and remote repositories. In the user guide I found the following:
Prevent disclosing sensitive business information derived from your artifact queries to whomever can intercept the queries, including the owners of the remote repository itself.
I saw that this could be avoided through the exclude pattern functionality on the virtual repository. Can you give us some suggestions about this? What kinds of requests should we avoid?
You should avoid requests for internal artifacts being sent to remote repositories (directly or via virtuals). This can happen when projects depend on internal libraries, or within multi-module projects where modules depend on each other. When working with virtual repositories, Artifactory will always search for such artifacts in local repositories first. However, if someone asks for a wrong version or makes a typo in the artifact name, the artifact will not be found in a local repository and Artifactory will try to look for it in the remote repositories configured in this virtual.
To avoid exposing sensitive business information as described above, we strongly recommend the following best practices:
The list of remote repositories used in an organization should be managed under a single virtual repository to which all requests are directed
All internal artifacts should be specified in the Excludes Pattern field of the virtual repository (or alternatively, of each remote repository) using wildcard characters to encapsulate the widest possible specification of internal artifacts.
Assuming all of your projects/modules are using some kind of namespace, for example com.mycompany, you can configure an exclusion pattern for artifacts under this namespace: com/mycompany/**.
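The effect of an exclude pattern such as com/mycompany/** can be sketched in Python: translate the Ant-style pattern into a regex and refuse to forward any matching path to a remote. This is a simplified matcher written for illustration, not Artifactory's code; in particular it does not treat com/mycompany/** as also matching the bare com/mycompany directory.

```python
import re
from typing import Iterable


def ant_pattern_to_regex(pattern: str):
    """Translate a simplified Ant-style path pattern (**, *, ?) into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # exactly one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")


def may_query_remote(path: str, excludes: Iterable[str]) -> bool:
    """False when the path matches an exclude pattern, i.e. the request must not leave the network."""
    return not any(ant_pattern_to_regex(p).match(path) for p in excludes)
```

Because the check works on the namespace rather than on exact coordinates, a request with a typo such as `com/mycompany/ap/1.0/ap-1.0.pom` is still kept away from the remote repositories.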
For more information take a look at avoiding security risks with an excludes pattern

What are the drawbacks to using a private local repository for Hudson jobs?

Hudson provides the option to have a Maven build job use a private local repository, or to use the common one from the Maven installation, i.e. one shared with other build jobs. I have the sense that our builds should use private local repositories to ensure that they are clean builds. However, this causes performance issues, particularly with respect to the bandwidth of downloading all dependencies for each job. We also have the jobs configured to start with a clean "workspace", which seems to nuke the private Maven repo along with the rest of the build space.
For daily, continuous integration builds, what are the pros and cons of choosing whether or not to use a private local maven repository for each build job? Is it a big deal to share a local repo with other jobs?
Interpreting the Jenkins documentation, you would use a private Maven repository if:
You end up having builds incorrectly succeed just because you have all the dependencies in your local repository, despite the fact that none of the repositories in the POM might have them.
You have problems with concurrent Maven processes trying to use the same local repository.
Furthermore
When using this option, consider setting up a Maven artifact manager so that you don't have to hit remote Maven repositories too often.
Also, you could explore your SCM's clean option (rather than a workspace clean) to avoid this repository getting nuked.
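Maven's standard maven.repo.local system property is what points a build at a different local repository. The Python sketch below builds an mvn command line that gives each job its own repository outside the workspace, so a workspace wipe does not delete it and jobs do not share one directory. The cache root, directory names and default goals are hypothetical choices for illustration, not what Jenkins itself does.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def maven_command(job_name: str, cache_root: PurePosixPath,
                  goals: Sequence[str] = ("clean", "install")) -> List[str]:
    """Build an mvn command line that uses a per-job local repository.

    The repository lives under cache_root (a hypothetical location) rather than in the
    workspace, so a "clean workspace" step leaves the downloaded dependencies in place.
    """
    local_repo = cache_root / job_name / "repository"
    return ["mvn", f"-Dmaven.repo.local={local_repo}", *goals]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Note that two concurrent runs of the same job would still share a directory, which is the concurrency problem mentioned above.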
I believe Sonatype recommends using a local Nexus instance, even though their own research (State of the Software Supply Chain report 2015) shows that less than 5% of traffic to Maven Central comes from such repositories.
To get back to the question: assuming you have a local Nexus instance and high-bandwidth connectivity (tens of Gbps at least) between your build server (e.g. Jenkins) and Nexus, I can see few drawbacks to using a private local repo; in fact, I would call the decrease in build performance a reasonable trade-off.
That said, what exactly are we trading off? On the downside, we accept a small performance penalty; on the upside, we know with 100% certainty that independent, clean builds against our local Nexus instance as a proxy work.
The latter is important: consider the scenario where the local repo on the build server (probably in the jenkins user's home directory) has an artefact that is not cached in Nexus (this is not improbable if you started off your builds against Maven Central). This out-of-sync state is suboptimal because, depending on your cache TTL settings in Nexus, builds can fail if Nexus' upstream connectivity to Central is temporarily down.
Finally, to add more to the benefits side of the trade-off: today I spent hours getting an artefact into the shared Jenkins user's .m2/repository. Earlier in the day, upstream connectivity to Central was up and down locally for hours (a mysterious issue in an enterprise context). In the end I deleted the entire shared jenkins user .m2/repository so that everything would be retrieved from the local Nexus.
It's worth considering having builds using a local .m2/repository (in jenkins user home directory) as well as builds using private local repositories (fast and less fast builds). In my case however I may opt for private local repositories only in the first instance - I may be able to accept the penalty if I optimise the build by focussing on low hanging fruit (e.g. split up multi module build).
