How does the indexing of Maven artifact repositories work?

I would like to understand how indexing works for artifact repositories like Nexus and Artifactory. What benefit does it provide, and what logic is used when resolving artifacts?
My understanding is that the Lucene indexes contain information about which artifacts are present in a given proxied repository or group. Once these indexes have been downloaded, you can easily check whether a remote repository contains the artifact you're looking for, and try to resolve it from the repositories that have it. Is this the only use? Is the index also queried for local resolutions (since each repository does have an index)? How does this actually work?

Artifactory doesn't use indexes for searching. We believe that indexes are a thing of the past, from when machines were slow and couldn't handle large searches on the server side. Here is a partial list of reasons why search indexes are bad:
The client needs to download huge index files before it can search.
The indexes are updated too rarely to reflect frequent changes.
A system based on search indexes requires a special client to perform the search.
The client is tightly coupled to the index format.
Nowadays, when servers like Artifactory can provide real-time searching, exposed via a UI for humans and an API for tools like IDEs, the indexes are obsolete and are supported in Artifactory only for compatibility with tools like m2eclipse.
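To make that concrete, here is a minimal Java sketch of such a real-time search against Artifactory's REST API, using the GAVC search endpoint (/api/search/gavc). The base URL, coordinates and repository name are placeholders, not anything stated in the answer above:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ArtifactorySearchSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical Artifactory base URL; replace with your own instance.
            String baseUrl = "https://artifactory.example.com/artifactory";

            // GAVC search: find artifacts by groupId/artifactId across selected repositories.
            URI uri = URI.create(baseUrl
                    + "/api/search/gavc?g=org.apache.commons&a=commons-lang3&repos=libs-release");

            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("Accept", "application/json")
                    .GET()
                    .build();

            // The response is a JSON document listing the URIs of matching artifacts.
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }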

Repository indexing is all about searching. The Maven Eclipse plugin documentation describes the functionality:
http://books.sonatype.com/m2eclipse-book/reference/repository-sect-repo-view.html#d5e1169
Maintaining a server-side index also makes client operations more efficient: server-side repository managers can use the index to enable search interfaces and REST APIs for retrieving artifacts (Sonatype Nexus doesn't need a database).

As Mark already said, the Maven index is all about searching, either server side (where search is exposed over the UI or REST) or client side, as M2E does (a typical example is code completion in the POM editor, where context hints use the index to offer groupIds, artifactIds and versions while you add dependencies).
Nexus does NOT use the index to fulfil its main function of serving up and/or proxying artifacts, although it DOES maintain the index on the fly. Again, indexes are not used for "resolution" or anything else, except for the search UI and for publishing to downstream clients (like M2E).
For an example of "client side" usage of Maven Indexer, you can look at the examples here.
HTH,
~t~

Related

Can I create a new Jfrog Artifactory package type plugin?

I want to store RTL modules (mostly VHDL files, .vhd) in Artifactory, and be able to trace the dependencies of those packages with Xray and the other JFrog services.
I already have a pretty clean "package" format; I just want Artifactory to parse the metadata files that are part of that package on upload, the same way it parses the control files in a Debian package.
Is this possible? And where would I start?
Since you mentioned Xray, it's important to note that Xray only supports certain file types (and these have to reside within a supported repository type). I'm not sure what you mean by "tracking dependencies", but I should note that Xray is mostly good at scanning code components and identifying vulnerabilities.
To simply track dependencies (i.e. methodically know which dependencies are associated with a certain package), you can use the various Build Info integrations. Read about this here:
https://www.jfrog.com/confluence/display/RTF/Build+Integration
If you associate your files with a certain build info object (a metadata object that stores build-related information), you'll be able to track build artifacts and dependencies in the Artifactory UI and even query for them using the Artifactory Query Language (AQL). There are various options (CI plugins) depending on which CI server you are using, but in general all of the JFrog CI plugins serve the same purpose: uploading your content to Artifactory and keeping track of build metadata, such as build dependencies.
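As a small illustration of the AQL part, here is a hedged Java sketch that posts an AQL query to Artifactory's /api/search/aql endpoint; the instance URL, credentials and repository name are placeholders:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class AqlSearchSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical instance and credentials; AQL normally requires an authenticated user.
            String baseUrl = "https://artifactory.example.com/artifactory";
            String auth = Base64.getEncoder()
                    .encodeToString("user:password".getBytes(StandardCharsets.UTF_8));

            // The AQL query text is sent as a plain-text POST body.
            String aql = "items.find({\"repo\":\"libs-release-local\"}).include(\"name\",\"path\",\"created\")";

            HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/search/aql"))
                    .header("Content-Type", "text/plain")
                    .header("Authorization", "Basic " + auth)
                    .POST(HttpRequest.BodyPublishers.ofString(aql))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // JSON list of matching items
        }
    }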
As for your actual question, you didn't elaborate on your end goal, but you should be able to achieve this using a User Plugin. User Plugins can be used to extend Artifactory's built-in capabilities and add your own business logic to procedures in Artifactory (such as the deployment of a file). You can read more about this here:
https://www.jfrog.com/confluence/display/RTF/User+Plugins
There are many examples on our public Github repository that will probably help:
https://github.com/JFrogDev/artifactory-user-plugins

How do I configure routing in Nexus OSS 3

Sonatype Nexus 2 has "routing" capabilities, so that I can configure my requests for internal artifacts to only be served by certain (internal) repositories.
I've got a version of Nexus 3 running but I don't see any way to implement this capability. There is something called "content selectors" which might be the new mechanism, but there is absolutely no documentation of it, so I can't use it. This is a pretty important security requirement.
Am I missing something? How do I route requests in Nexus 3?
Nexus 3.17 is out, and it is the first version to support routing rules. They work differently than in Nexus 2.x, but meet the same need: things are now rule-centered instead of repository-centered. I found the Nexus 3 approach easier to understand.
This documentation page shows the new routing rules.
And for future readers:
Content selectors: privileges within Nexus. Useful if you want to restrict a user to certain paths.
Routing rules: which repos are queried for which patterns. Useful if you only want certain paths to be looked up from certain repos (see the REST sketch after this list).
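If you want to script the rule rather than create it in the UI, Nexus 3 also exposes routing rules over REST. The sketch below assumes the /service/rest/v1/routing-rules endpoint and its JSON fields (name, description, mode, matchers) as they exist in recent 3.x versions; the server URL, credentials and matcher pattern are placeholders, so verify against your version's API reference:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class CreateRoutingRuleSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder Nexus 3 base URL and admin credentials.
            String baseUrl = "https://nexus.example.com";
            String auth = Base64.getEncoder()
                    .encodeToString("admin:admin123".getBytes(StandardCharsets.UTF_8));

            // A BLOCK rule: requests whose path matches any matcher regex are rejected,
            // so internal coordinates are never forwarded to the proxied remote.
            String body = "{"
                    + "\"name\": \"block-internal\","
                    + "\"description\": \"Never ask remotes for internal artifacts\","
                    + "\"mode\": \"BLOCK\","
                    + "\"matchers\": [\".*com/mycompany/.*\"]"
                    + "}";

            HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/service/rest/v1/routing-rules"))
                    .header("Content-Type", "application/json")
                    .header("Authorization", "Basic " + auth)
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode()); // expect a 2xx status on success
        }
    }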
From a support question I asked of the Nexus team: this feature is not yet in Nexus 3. They are working on a simpler design, since the feedback on the Nexus 2 feature was that it was confusing.
This question is quite old; I hope this answer helps to document the new implementation of Nexus 2 "routing" via the Nexus 3 "content selector".
That's correct: the Sonatype Nexus 2 "routing" capabilities have been replaced by "content selectors" in Nexus 3, based on JEXL queries.
Some notes are now available in Chapter 4 of Nexus Repository Manager 3.1 Documentation.
Basically you have to create a new selector from
Server Administration and Configuration -> Repository Content Selectors.
Define the JEXL query for your scope. For example, the query below searches for all paths beginning with com/mycompany in maven2 repositories:
format == "maven2" && path =^ "com/mycompany/"
You can test your query using the "Preview" button.
After that you go on more or less as in Nexus 2.
Server Administration and Configuration -> Security -> Privileges -> Create privilege
Give it a name and a description, select your content selector, then select the repositories to which the privilege applies and the actions (a comma-separated list), e.g.
read,browse
Next create or modify a Role
Server Administration and Configuration -> Security -> Roles
granting it the privilege you just configured.
Finally assign the role to the users you need.
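For completeness, the content selector itself can also be created by script rather than through the UI. This is only a sketch, assuming the /service/rest/v1/security/content-selectors REST endpoint available in recent Nexus 3.x releases (verify it in your server's API browser); the server URL and credentials are placeholders, and the expression is the same JEXL query as above:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class CreateContentSelectorSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder Nexus 3 base URL and admin credentials.
            String baseUrl = "https://nexus.example.com";
            String auth = Base64.getEncoder()
                    .encodeToString("admin:admin123".getBytes(StandardCharsets.UTF_8));

            // The same JEXL expression used in the UI walkthrough above.
            String body = "{"
                    + "\"name\": \"mycompany-maven\","
                    + "\"description\": \"Internal artifacts in maven2 repositories\","
                    + "\"expression\": \"format == \\\"maven2\\\" && path =^ \\\"com/mycompany/\\\"\""
                    + "}";

            HttpRequest request = HttpRequest.newBuilder(
                            URI.create(baseUrl + "/service/rest/v1/security/content-selectors"))
                    .header("Content-Type", "application/json")
                    .header("Authorization", "Basic " + auth)
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode()); // expect a 2xx status on success
        }
    }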

Starting to use Artifactory

In the company where I am working we are starting to use Artifactory as a repository management tool, so I'm reading its user guide. We started the configuration by creating a virtual repository and a few local and remote repositories. In the user guide I found the following:
Prevent disclosing sensitive business information derived from your artifact queries to whomever can intercept the queries, including the
owners of the remote repository itself.
I saw that this could be addressed through the exclude pattern functionality on the virtual repository. Can you give us some suggestions about this? What kinds of requests should we avoid making?
You should avoid requests for internal artifacts being sent to remote repositories (directly or via virtual repositories). This can happen when projects depend on internal libraries, or within multi-module projects where modules depend on each other. When working with virtual repositories, Artifactory will always search for such artifacts in local repositories first. However, if someone asks for a wrong version or has a typo in the artifact name, the artifact will not be found in any local repository and Artifactory will try to look for it in the remote repositories configured in that virtual repository.
To avoid exposing sensitive business information as described above, we strongly recommend the following best practices:
The list of remote repositories used in an organization should be managed under a single virtual repository to which all requests are directed.
All internal artifacts should be specified in the Excludes Pattern field of the virtual repository (or, alternatively, of each remote repository) using wildcard characters to encapsulate the widest possible specification of internal artifacts.
Assuming all of your projects/modules use some kind of namespace, for example com.mycompany, you can configure an exclusion pattern for artifacts under that namespace: com/mycompany/** (see the sketch below).
For more information, take a look at avoiding security risks with an excludes pattern.
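If you'd rather set this up by script, the excludes pattern is part of the repository configuration JSON and can be updated through Artifactory's REST API. A minimal sketch, assuming a virtual repository named libs-release, admin credentials, and the com/mycompany/** pattern from above (adjust all of these to your setup):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class SetExcludesPatternSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder instance, credentials and virtual repository name.
            String baseUrl = "https://artifactory.example.com/artifactory";
            String auth = Base64.getEncoder()
                    .encodeToString("admin:password".getBytes(StandardCharsets.UTF_8));

            // Exclude every artifact under the internal namespace from this repository,
            // so requests for com.mycompany artifacts are never forwarded to remote repos.
            String body = "{\"excludesPattern\": \"com/mycompany/**\"}";

            // POST to /api/repositories/{repoKey} updates an existing repository's configuration.
            HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/repositories/libs-release"))
                    .header("Content-Type", "application/json")
                    .header("Authorization", "Basic " + auth)
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }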

Guidelines when splitting artifact repositories

I am looking for an article which describes a set of guidelines to follow when creating repositories in an artifact repository manager.
I know that:
You need to keep snapshots in snapshot repositories.
You need to keep releases in release repositories.
Third-party artifacts should be in a separate repository (the same goes for forked/patched versions of third-party libraries).
It's generally a good idea to prefix the names with int-* and ext-*.
Usually different product lines end up having their own repositories as sometimes their artifacts don't depend on each other.
I've been trying to find an article on this to show a client how other companies and organizations that use repository managers handle this kind of artifact separation.
Many thanks in advance!
I am not aware of such an article, but as @tieTYT mentioned, you can look at Artifactory's default repositories. They reflect years of experience in binary management, continuous integration and delivery.
Those practices still apply even if you use Nexus (and you can observe them without installing Artifactory by looking at JFrog's public Artifactory instance, http://repo.jfrog.org).
For your convenience, here are the defaults (important usage emphasised):
Local Repositories:
libs-snapshot-local: Deploy your local snapshots here
libs-release-local: Deploy your local releases here
ext-snapshot-local: Deploy here any 3rd-party snapshots which aren't available in remote repositories
ext-release-local: Deploy here any 3rd-party releases which aren't available in remote repositories
plugins-snapshot-local: Deploy your plugin (usually Maven) snapshots here
plugins-release-local: Deploy your plugin (usually Maven) releases here
Remote Repositories:
jcenter: a proxy of http://jcenter.bintray.com. Normally that's the only remote repo you'll need; it includes everything in Maven Central plus all the other major Maven repositories
Virtual Repositories:
remote-repos: an aggregation of all the remote repositories
libs-release: the resolution repository for release builds (see the sketch after this list). It includes remote-repos, libs-release-local and ext-release-local
libs-snapshot: the resolution repository for snapshot builds. It includes remote-repos, libs-snapshot-local and ext-snapshot-local
repo: a special virtual repository that aggregates everything. Generally, do not use it if you ever plan on building a release pipeline around the binary repository.
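To make the "resolution repository" idea concrete: resolving through the libs-release virtual is just an HTTP GET of the artifact's Maven path against that one repository, and Artifactory decides internally whether the bytes come from a local repository, a remote cache, or the remote itself. A minimal Java sketch, with the instance URL and coordinates as placeholders:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.file.Path;

    public class ResolveThroughVirtualSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder Artifactory instance; libs-release is the default virtual for release resolution.
            String baseUrl = "https://artifactory.example.com/artifactory";

            // Standard Maven layout: groupId (with slashes) / artifactId / version / artifactId-version.jar
            String path = "/libs-release/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar";

            HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();

            // Download the artifact; the virtual repo aggregates local, ext and remote repositories.
            HttpResponse<Path> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofFile(Path.of("commons-lang3-3.12.0.jar")));
            System.out.println("HTTP " + response.statusCode() + " -> " + response.body());
        }
    }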
I'll be glad to advise on specific questions.
As is the case with many questions about best practices, the answer is: It depends.
Technically there are only two distinctions that are required:
Snapshot vs release repo
Hosted vs proxy repository
The snapshot vs. release distinction is required because the Maven repository format, and therefore Maven and other build tools, differ in how they work with the metadata and in what they do during upload.
For proxy repositories you just have to add as many as you need to proxy. This will depend on which components you require, and there will be separate proxies for snapshot and release repos.
For hosted repositories you also have to have separate snapshot and release repos. Beyond that, it is all up for grabs. Having a separate third-party repo, as preconfigured in Nexus (and Artifactory), and other such setups is certainly useful but not really necessary. You can sort all those distinctions out with internal metadata where required.
Along the same lines you can have one release repo for everyone, one per team, or whatever. You can still apply access rights within those repositories to separate access and so on; in Nexus this is done with repository targets. I assume Artifactory and Archiva can do something similar. The question here mostly boils down to ease of administration, backups, security setup and access for users.
Naming conventions like the ones you mentioned can help if you want separate repositories, but technically none of this is necessary.
Other things I have seen include migration repos that are used to move legacy project libraries into a repository and are frozen once the migration is done, separate repos per team, separate repos per project, and so on. Another aspect is separate repos for different levels of approval (e.g. see the problems with that at http://blog.sonatype.com/people/2013/10/golden-repository/).
In the end, however, this all hinges on usability and metadata and is not required. Ultimately these repositories will in most cases be grouped together and accessed via one group, which flattens out the whole separation. Access rights still carry through into the group, so everything can still be controlled as you like. So it turns out to be a matter of taste in how you want to slice, dice and manage it.
PS: I am referring to the Maven repositories and format. Once you add a whole bunch of other formats into the mix and wrappers around them exposing them in other formats, everything gets more complicated, but the ideas behind things stay similar.

List available artifacts from repo with gradle, ivy or other

I'm looking for a way to programmatically list all available artifacts for a given repo URL, group and artifact. The repo is Maven-based.
I know about maven-metadata.xml, but the repo in use doesn't provide classifier details, which are crucial for me.
The solution may be based on Ivy, Gradle or other compatible tools. If anybody has an idea please let me know :)
I hope to find a code sample that will allow me to browse the repo in an easy and friendly way.
Use the search features of your Maven repository manager.
If you're using Nexus, it supports searches of its Lucene index. For example, the following URL returns all the artifacts matching the string "log4j":
https://repository.sonatype.org/service/local/lucene/search?q=log4j
The response is verbose but includes information like classifiers (which is what you're looking for).
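For instance, here is a minimal Java sketch that calls that same endpoint and prints the raw response; the only assumption beyond the URL above is that the endpoint will honour an Accept: application/json header (it serves XML by default), and parsing the classifier fields out of the response is left to you:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class NexusLuceneSearchSketch {
        public static void main(String[] args) throws Exception {
            // Search the Lucene index of Sonatype's public Nexus instance for "log4j".
            URI uri = URI.create("https://repository.sonatype.org/service/local/lucene/search?q=log4j");

            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("Accept", "application/json") // assumption: JSON via content negotiation
                    .GET()
                    .build();

            // Each hit carries groupId, artifactId, version and the available classifiers.
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }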
maven-metadata.xml only has module information, and a classifier belongs to an artifact (not a module). Gradle is probably not a good fit here. I'd consider a low-level approach with some GET requests and HTML parsing. In case the repository is backed by a repository manager such as Artifactory or Nexus, their REST APIs might also be an option.
Thanks guys for all the hints. Yesterday I managed to solve the problem using the Artifactory REST search API and parsing the incoming JSON responses. Thanks once again.
