Dependency downloaded from Maven Central is free from Malware? - maven

I am new to Maven and am trying to use it for Android builds. I have a concern, prompted in part by Jason Van Zyl's interview here.
My Question
How can we ensure that a dependency downloaded from Maven Central is free from malware and has not been corrupted?

You can never be sure an artifact is free from malware; the possibility always exists.
Maven Central hosts open source projects, so the source code is always available from somewhere. If you need to be sure an artifact is malware free, and also sure of compatible licensing terms, you should download the source and build it yourself rather than use Maven Central.
I am sure that if an artifact did have malware in it, the people running Maven Central would have a policy for being contacted to investigate such things and deal with them.
...
Regarding corruption: Maven uses hash digests in many places to detect corrupted data, and both your Maven client and your Maven repository can be configured to always validate them. Files on the server usually also have a companion *.md5 or *.sha1 URL holding the checksum of the data.
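To make the checksum idea concrete, here is a minimal Java sketch of what such a validation amounts to: hash the downloaded bytes and compare the result with the text of the companion *.sha1 file. The class and method names are made up for illustration, and the assumption that the checksum file may carry a file name after the digest is mine; this is not how the Maven client itself is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch (hypothetical class): compare bytes against a published SHA-1. */
class ChecksumCheck {

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * True if the artifact bytes hash to the digest in the checksum file.
     * Assumption: the file holds the hex digest, optionally followed by
     * whitespace and a file name, so only the first token is compared.
     */
    static boolean matches(byte[] artifact, String sha1FileContent) {
        String expected = sha1FileContent.trim().split("\\s+")[0];
        return sha1Hex(artifact).equalsIgnoreCase(expected);
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real check would read the downloaded JAR instead.
        byte[] artifact = "abc".getBytes(StandardCharsets.UTF_8);
        String published = "a9993e364706816aba3e25717850c26c9cd0d89d  demo.jar";
        System.out.println(matches(artifact, published));
    }
}
```

Note that a checksum fetched from the same server as the artifact only tells you the download was not corrupted; it says nothing about whether the content is trustworthy.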
Also JARs themselves have intrinsic checksums. They are based on ZIP files and these do have a checksum scheme that should detect most corruption. The ZIP directory is always at the end of the file so short/truncated files will also be detected.
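As a rough illustration of the built-in ZIP checksums, the Java sketch below reads every entry of an archive through java.util.zip.ZipInputStream, which compares each entry's CRC-32 with the recorded value once the entry has been read to the end. The class and helper names are hypothetical. One caveat: this streaming reader does not look at the directory at the end of the file, so it only notices truncation that cuts into entry data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Illustrative sketch (hypothetical class): detect damaged entries in a ZIP/JAR. */
class ZipCrcCheck {

    /** Reads every entry fully; a CRC mismatch or broken data surfaces as an exception. */
    static boolean entriesIntact(byte[] zipBytes) {
        byte[] buffer = new byte[8192];
        int entries = 0;
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            while (in.getNextEntry() != null) {
                while (in.read(buffer) != -1) {
                    // Discard the data; reading to the end triggers the CRC comparison.
                }
                entries++;
            }
        } catch (IOException | IllegalArgumentException e) {
            // Bad CRC, undecodable data, premature end of stream, or a malformed header.
            return false;
        }
        // An input with no readable entries is not treated as a valid archive here.
        return entries > 0;
    }

    /** Helper for the demo: builds a one-entry ZIP in memory. */
    static byte[] zipOf(String name, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] jar = zipOf("a.txt", "hello maven".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact: " + entriesIntact(jar));
    }
}
```

Like the hash digests above, this catches accidental damage only; anyone who modifies a JAR deliberately can simply write matching CRC values.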
Obviously these mechanisms are not 100% reliable, but they may be considered 99.99% reliable.
...
As a software producer putting things up on Maven Central, I would urge you to always SIGN your JARs. Signing is a mechanism that allows each independent software producer to sign the original JAR they produce and then distribute it via any mechanism across the internet. Any user can (theoretically) download it from any source and verify that it has not been tampered with.
Unfortunately, Maven Central does not have a policy to ensure source code is available alongside binaries, or a policy enforcing JAR signing. So from a security standpoint, Maven Central is useful to get things going with your local development, but if you do care about security, do not use it.
You need to implement your own security policy (or pay someone else to implement it on your behalf).
To manage your secured environment you might wish to take a look at one of the Maven repositories you can run on your local network, such as Sonatype Nexus (it comes in an open-source, free edition with most features enabled).
...
NB: I have not yet read the link you provided; I will do so now.

Suggesting that you build software yourself instead of downloading it from the Central repository is an amazingly bad idea. Uploads to Central are very closely guarded, and in your settings.xml you can enable strict checksum checks for every download Maven does.
If the checksum matches, you have an exact copy of the artifacts in Central, which are very closely monitored during upload and do not change once uploaded.
In addition, you would typically use Central only indirectly, via a company-controlled repository manager like Nexus. If you need more security, license and audit tooling and reporting, you would look at tools like Sonatype Nexus Professional and Sonatype Insight or similar products.

Related

Why use a maven repository manager

I've been using Maven for a year to manage my projects' dependencies, but I recently came to know that there is a concept called a Maven Repository Manager.
I would like to ask what a Maven Repository Manager is and what the purpose of using one is.
A "Maven Repository Manager" is basically a server that stores copies of all of your libraries so that they can be downloaded when a project is built. When you use Maven, you are already using a repository called "Maven Central." See here: https://maven.apache.org/repository-management.html
When you are working with a large project or corporation, they may host an alternative to Maven Central, like Sonatype Nexus. There are two reasons why they do.
First, a big corporation might have libraries that are intended only for internal use that are used across a large number of projects. For example, if you worked at Amazon, you might have libraries for completing credit card transactions. That shouldn't necessarily be shared with the rest of the world, so you don't want to put it in Central; you need to put it someplace private.
Second, it reduces bandwidth. If every developer at Amazon only used Maven Central, then that would be lots of network traffic. A repository acts as a "proxy" to Central. It searches internally for a library, and then if it doesn't find it, it downloads it from Central and then saves it for the next time someone asks for it.
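The proxy behaviour described above (look internally first, go to Central only on a miss, then keep the result) can be sketched in a few lines of Java. Everything here is hypothetical and heavily simplified: the "upstream" is just a function standing in for Central, and the cache is an in-memory map rather than disk storage as a real repository manager would use.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative sketch (hypothetical class): a repository manager as a caching proxy. */
class CachingProxy {
    private final Map<String, byte[]> cache = new HashMap<>();
    private final Function<String, byte[]> upstream; // stands in for Maven Central
    private int upstreamRequests = 0;

    CachingProxy(Function<String, byte[]> upstream) {
        this.upstream = upstream;
    }

    /** Serves an artifact from the local cache, fetching from upstream only on a miss. */
    byte[] get(String coordinates) {
        byte[] cached = cache.get(coordinates);
        if (cached != null) {
            return cached; // no traffic to Central
        }
        upstreamRequests++;
        byte[] fetched = upstream.apply(coordinates);
        if (fetched != null) {
            cache.put(coordinates, fetched); // saved for the next developer who asks
        }
        return fetched;
    }

    int upstreamRequests() {
        return upstreamRequests;
    }

    public static void main(String[] args) {
        // The coordinates below are made up for the demo.
        CachingProxy proxy = new CachingProxy(key -> key.getBytes(StandardCharsets.UTF_8));
        proxy.get("com.example:payments:1.0");
        proxy.get("com.example:payments:1.0");
        System.out.println("upstream requests: " + proxy.upstreamRequests());
    }
}
```

However many developers request the same artifact, Central is contacted once; every later request is answered from the internal copy.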
To solve problems like:
how do you get your binary to your server in the first place?
which version do you want?
when you deploy a new version how do you revert to an old version?
which employees can access which binaries?
And so forth.

Maven - a set-up query

Given a group of developers, each one has the following requirements on their respective (local) Windows machines:
Through IDEs like Eclipse, STS etc., run Spring, Hibernate etc. projects
Quickly build, deploy, run, change if required, rebuild and redeploy (everything, preferably via IDEs) the projects available on Github
There are the following constraints/objectives:
The individual developer machines have restricted or no Internet access
The developers must take the required jars from a single location which will store jars required across the team
Whenever required, a developer must be able to pull updated jars from the central location onto his local environment and continue to run the projects seamlessly
Within the IDE, build a Github project and run it (locally)
Attached is the image to give a clear idea of the work environment which I'm envisaging!
I have started reading about Maven but am quite overwhelmed - how should I proceed?
You should use an internal Maven repository. There are applications such as Nexus and Artifactory (those are probably the close numbers 1 and 2 in the business, just my opinion). You can set it up to use your proxy server.
It will be able to serve as a proxy for your Maven clients, and keep a copy of the artifacts that are downloaded. They will even allow you control over what kind of artifacts your developers pull in (although they may not always appreciate that).
It will also be able to store and serve your own artifacts that your developers can deploy (release) to it.
Maven is great at dependency management, and that is what most organizations start using it for. But as your process matures, there is also the opportunity for version/release management using Maven. Developers will build SNAPSHOT versions for themselves, or share these with the team through the repository. When they release their artifacts, they make a final version of the artifact available in the repository.
Maven has great support in IDEs. I use Eclipse a lot myself, and it has m2e to work with Maven.
Apache itself on 'Why do I need a Repository Manager?'
New tools require adaptation, sometimes culture shifts. And in the case of Maven, where many organizations come from scripted builds, it may require a paradigm shift. It sounds a bit as if you're at risk of being overwhelmed some more in the future. I think it will be worth your effort, but you may want to get some experienced help to get you on track.
More of a personal note: done with those proxy servers, alright!
As suggested by Sander Verhagen in another answer, what you should do is use a repository proxy. Nexus and Artifactory are the best-known ones. Here I will briefly describe the steps you need for what you are looking for:
Set up Nexus on machine M
In Nexus, set up a proxy repository for Central (this should be available out of the box), and for any other repositories that you want your developers to access. You may need to add HTTP proxy settings when you are configuring the proxy repository.
(Optional, but recommended) Set up a repository group which includes all the public repository proxies. Assume the URL of this group is http://M/nexus/groups/public
On each developer's machine, update ~/.m2/settings.xml and set http://M/nexus/groups/public as the mirror of central. If you created other internal hosted repositories in Nexus, you may add them in settings.xml as well.
That's all. You can use Maven as normal. Dependencies will now be fetched from Nexus in M.

What's the purpose of an artifact repository?

Wherever you read about continuous delivery or continuous integration, it's recommended to use an artifact repository to store the artifacts, even though Jenkins already stores them for each build.
So why is it recommended to use an artifact repository? Is there a smooth way to work with the artifacts of the Jenkins builds, e.g. to use these artifacts for deployment?
An artifact repository and continuous integration tools serve two different purposes and one cannot be substituted with the other. Check this video from Artifactory, one of the providers of artifact repositories, about why one should use an artifact repository.
Jenkins stores the artifacts as plain files without versioning while artifacts in an artifact repository can be version controlled. So you have a lot more flexibility in retrieving artifacts and governing them. Read this very good article on why we need them. Surely not all of those things are supported by continuous integration tools like Jenkins.
Moreover, you can also look at the Artifactory plugin for Jenkins which integrates the two.
An artifact repository is needed, but the artifact repo is a conceptual piece and not always a distinct tool. With Jenkins you should have MD5 signatures and (I think) a way of downloading the files you want (web service call, right?) from your remote server. Certainly, if you're doing something simple like using the Jenkins build pipeline plugin, it should be able to access the right versions of the files smoothly.
Alternatively, if you are using a separate deployment tool, the better ones bundle an artifact repository.
Regardless, you want what the ITIL folks call a Definitive Media/Software Library. Definitive in that the bits are secure, trusted, and official. And a library in that they can be easily looked up and accessed. When working with an artifact repository, you need to make sure it's adequately secure, it is backed up, and it is accessible for your deployments (including to production). If you look at Jenkins and it meets your criteria in those categories, consider yourself done. If it's lacking, and I wouldn't be surprised if it was, then you need either a dedicated tool like the Maven repos, or something bundled with the deploy tooling.
For more of my rambling on the subject, there's a recorded webcast. The slides for that are up on Slideshare.
I haven't kept up to date with Jenkins; we still use a version of the CI from when it was originally called Hudson.
In your projects' POMs you should normally point to your own artifact repository, where you can fetch and deploy your own (company) projects.
When you use an artifact repository with your CI server, the server can deploy successfully built snapshots and releases to it, making them available to other developers.

Does it still make sense to use Maven when dependent jars are checked in with source code?

We check all of our source code's dependent third-party JARs into source control along with our source code. When needed, we manually download updates to third party JARs and replace those JARs that are in source control with the newer versions. We haven't felt the need to use Maven yet as this process seems simple enough for us. But are we missing something of great value by not using Maven? Or does our scenario not warrant using Maven?
"JARs don't change much", I hear this all the time.....
Storing jars in the SCM is simple in the beginning of the project. Over time the number of jars gets larger and larger.... Wait 2 or 3 years and nobody remembers where the jars came from, what their licensing terms were and most commonly what versions are being used (important to know when analysing security vulnerabilities).....
The best article I've read recently making the case for a repository manager is:
http://www.sonatype.com/people/2012/07/wait-you-dont-have-a-repository-manager/
A little irreverent, but it does make a valid point about the kind of technical inertia one encounters all the time.
Switching a project team from ANT to Maven can be scary.... Maven works quite differently, so I find it is best deployed with greenfield or adventurous project teams. For the old-school ANT users, I recommend using the Apache Ivy plugin. Ivy allows such teams to outsource the management of their dependencies but keep the build technology they're comfortable with.
Ultimately the biggest benefit of using Maven is not dependency management. It's the standardized build process. I've seen several failed attempts to create a "standard" ANT build process. The problem is that every build engineer has his own opinion on what the standard should be.... Maven's approach of forcing users to write build plugins may appear restrictive in the beginning, but just like the iPhone, eventually developers discover "there's a Maven plugin for that" :-)
When it comes to dependency management Maven really can be quite valuable. As Mark O'Connor suggests, running a local repository manager would likely be better than checking the artifacts into source control.
There are many tools (like m2e in Eclipse) that can help with dependency management and provide valuable feedback on which modules or dependencies require which other dependencies. Maven will also make sure to pick a single version of a dependency even if different modules depend on different versions of a given library. That will help prevent duplicate versions of the same jar showing up in your deployed project, as long as they have the same group and artifact id.
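For what it's worth, the version Maven settles on is not necessarily the newest one. Maven's documentation describes the rule as "nearest definition": the occurrence closest to your project in the dependency tree wins, and on a tie the one declared first wins. Below is a small Java sketch of that rule only; the class, the flat list of occurrences, and the sample versions are all invented for illustration, and real resolution also has to deal with scopes, exclusions and dependencyManagement.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch (hypothetical class): Maven-style "nearest definition" mediation. */
class NearestWins {

    /** One occurrence of a dependency in the tree; depth 1 means declared directly in the POM. */
    static final class Occurrence {
        final String groupAndArtifact;
        final String version;
        final int depth;

        Occurrence(String groupAndArtifact, String version, int depth) {
            this.groupAndArtifact = groupAndArtifact;
            this.version = version;
            this.depth = depth;
        }
    }

    /** Picks one version per group:artifact; the input must be in declaration order. */
    static Map<String, String> mediate(List<Occurrence> inDeclarationOrder) {
        Map<String, Occurrence> chosen = new LinkedHashMap<>();
        for (Occurrence o : inDeclarationOrder) {
            Occurrence current = chosen.get(o.groupAndArtifact);
            // Strictly nearer wins; on equal depth the earlier declaration is kept.
            if (current == null || o.depth < current.depth) {
                chosen.put(o.groupAndArtifact, o);
            }
        }
        Map<String, String> versions = new LinkedHashMap<>();
        chosen.forEach((key, occurrence) -> versions.put(key, occurrence.version));
        return versions;
    }

    public static void main(String[] args) {
        // Made-up tree: the same library reached at three different places.
        List<Occurrence> tree = Arrays.asList(
                new Occurrence("com.example:util", "3.0", 3),
                new Occurrence("com.example:util", "1.0", 2),
                new Occurrence("com.example:util", "2.0", 2));
        System.out.println(mediate(tree));
    }
}
```

In the made-up tree above, the occurrence at depth 3 loses to the two at depth 2, and of those two the first declared is kept, even though it is the oldest version.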
Even for a very simple project I don't think I would resort to checking dependencies into the source control system.
It's not only about third-party libraries; it matters mostly if you have multiple repositories. In our case, we had four repositories with lots of inter- and intra-dependencies.
Actually, I started this answer and then had to go for 15 minutes to talk to a colleague about a problem that happened after someone forgot to update the .jar of one project in the other's lib directory.
And it looks more professional :)

Practices and patterns for maven, using and distributing jars that have click-through license requirements?

First, a not-so-brief backgrounder (...my apologies in advance for the LONG question... skip to paragraph #6 to get to the actual question :-) ...Long-time maven users know the pains of the old (missing) "sun jars" issue. Workarounds included locally installing (.m2/repo) or running a repo like nexus, artifactory, et al.... all just to work around this issue of not having all our artifacts available in public repositories due to the need to click "agree" to a license.
The sun jars issue is largely a thing of the past, but the problem might get worse in the not-so-distant future, as corporations begin to embrace the use of open-source while at the same time allowing free downloads of their software (try-before-you-buy; free to use, but with restrictions). All their jars will require you to "agree" to the license before downloading. If you're maintaining your own internal repo, you have to write pom's that declare the dependencies of these commercial jars... because the vendor didn't. It might not even be obvious that the commercial jars are using open-source jars (nor which version they are using) when the vendor re-jars all the classes into one big uber-jar (bad), which you then have to manually repackage to avoid having duplicate implementations of spring, log4j, apache camel, jdbc drivers, (etc) in your application.
I'm hoping that this old anti-pattern will soon come to an end: manually downloading jars, manually re-packaging & uploading to a local repo; or -- worse -- manually installing in {user}/.m2/repository (each developer's environment is potentially different; I have to set up / debug each QA person's build environment...).
It's also a problem for my end-users who use the (commercial) software I write, as distributed by my owner (er, my company): my corp won't publish pom's or jars to a public maven repository due to the absence of any click-through license agreement. (Any company could & should publish pom's w/ the dependencies declared (but still require the jars to be manually downloaded from the corp website), but this would take a lot of teaching/explaining to middle management. Not many understand pom's or dependency management (or software development))...
Long story short, it's not a software problem; it's not a maven problem. It is/was a lawyer-made problem, as of yet with no programmer-made solution (that I'm aware of). And I'm curious to know what the status quo is after all these years...
...So, my question, then, is this:
Does anyone know of usage patterns or even solutions to the issue of building maven (or ivy) projects with "free" jars that require you to click "I agree" to a license? This is something that would have to be implemented in the maven client as well as the maven repo. (Maybe this exists today?)
Building should be as simple as running "mvn install" once and agreeing to a license, after which all builds work locally just fine until you delete your local ~/.m2/repository or upgrade your dependency version. This would therefore also work on your CI server: after one initial build, where you hypothetically could click through via the hudson/jenkins user-interface.
(...If there's a one-line / one-link answer to this overly verbose question (e.g., "just install the maven-world-peace-plugin"), well, my sincerest apologies for the long-winded "question"...!)

Resources