Package a multiple-entry jar using Maven for a Hadoop project

I'm new to Maven. I want to package my Hadoop project and its dependencies into a jar, and then use it like this:
hadoop jar project.jar com.abc.def.SomeClass1 -params ...
hadoop jar project.jar com.abc.def.AnotherClass -params ...
I want this jar to have multiple entry points (different Hadoop jobs).
How can I do that?
Thanks!

There are two ways to create a jar with dependencies:
Hadoop supports jars in a jar format, meaning that your jar can contain a lib folder of jars that will be added to the classpath at job submission and during map/reduce task execution.
You can unpack the jar dependencies and repack them with your classes into a single monolithic jar.
The first requires you to write a Maven assembly descriptor file, and in practice it is more hassle than it's worth. The second also uses Maven assemblies but relies on a built-in descriptor. To use the second, just add the following to the project -> build -> plugins section of your pom:
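<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<!-- Bind assembly:single to the package phase so that mvn package builds the fat jar -->
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>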
Now when you run mvn package you'll get two jars in your target folder:
${project.artifactId}-${project.version}.jar - which contains just the classes and resources of your project
${project.artifactId}-${project.version}-jar-with-dependencies.jar - which contains your classes/resources plus everything in your dependency tree with compile or runtime scope, unpacked and repacked into a single jar
For multiple entry points you don't need to do anything special; just make sure the jar manifest doesn't define a Main-Class entry. If it does, hadoop jar runs that class and passes the class name you typed as an ordinary program argument. This only matters if you configure a manifest explicitly; the default manifest doesn't name a Main-Class, so you should be fine.
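For comparison, the first approach (dependency jars nested under lib/) needs a custom assembly descriptor. A minimal sketch, assuming it is saved as src/assembly/hadoop-job.xml (a hypothetical path) and that the Hadoop artifacts themselves are declared with provided scope, so the runtime-scoped dependency set leaves them out:

```xml
<!-- Hypothetical descriptor: src/assembly/hadoop-job.xml (namespace declaration omitted for brevity) -->
<assembly>
  <!-- Used as the classifier: <artifactId>-<version>-job.jar -->
  <id>job</id>
  <formats>
    <format>jar</format>
  </formats>
  <!-- Put entries at the jar root rather than under an <artifactId>-<version>/ folder -->
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <!-- Dependency jars stay packed and go into lib/, which hadoop jar adds to the classpath -->
      <useProjectArtifact>false</useProjectArtifact>
      <outputDirectory>lib</outputDirectory>
      <!-- runtime = compile + runtime scopes; provided (e.g. Hadoop) is excluded -->
      <scope>runtime</scope>
    </dependencySet>
  </dependencySets>
  <fileSets>
    <fileSet>
      <!-- The project's own compiled classes and resources, at the jar root -->
      <directory>${project.build.outputDirectory}</directory>
      <outputDirectory>/</outputDirectory>
    </fileSet>
  </fileSets>
</assembly>
```

The plugin would then reference it with <descriptors><descriptor>src/assembly/hadoop-job.xml</descriptor></descriptors> in place of <descriptorRefs>, producing target/<artifactId>-<version>-job.jar.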

Related

IntelliJ artifact tool doesn't create a correct executable Spark jar

I created a Spark Maven project in IntelliJ IDEA 2018 and tried to export an executable jar file of my main class. When I submit it to a YARN cluster, it fails with The main class not found! even though MANIFEST.MF includes it:
Manifest-Version: 1.0
Main-Class: Test
I did the same with other processing engines like Apache Flink, and IntelliJ could create an executable jar file that runs successfully on the cluster.
So in the Spark case I always have to use maven-assembly-plugin and build the jar file with the command: mvn clean compile assembly:single
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>Test</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
I guess it's because of the format of Spark's dependencies. I faced the same problem when creating a (non-executable) jar file from my own class that uses Spark dependencies. For example, adding the spark-sql dependency to a Maven project also pulls in other dependencies such as spark-catalyst. Is there any way to export an executable Spark jar file using IntelliJ IDEA?
maven-shade-plugin can be an alternative option for creating an uber jar. Here is the detailed pom.xml.
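A minimal sketch of the shade alternative, bound to the package phase. The plugin version is an assumption (pin whichever you actually use), Test is the main class from the question, and the Spark artifacts are assumed to be declared with provided scope so the cluster's copies are used instead of bundled ones:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <filter>
            <!-- Drop signature files from signed dependencies; they are invalid in a merged jar -->
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <!-- Merge META-INF/services files instead of keeping only the first one -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <!-- Write Main-Class into the shaded jar's manifest -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>Test</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After mvn package, the shaded jar takes the normal artifact name in target/, and the unshaded jar is kept as original-<name>.jar.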

Make a jar that has a classifier by default

We have a project called core-services. This builds three jars:
core-services-client: Contains all client classes
core-services-server: Contains all server and client classes
core-services-test: Contains all junit classes
Right now, I build the core-services-server jar by default, and then use assemblies to build the client and test jars. If a developer wants to use the client or test jars, they must specify a classifier. However, when they want to depend upon the server jar, they don't specify a classifier.
This leads developers to use the server jar when they really should be using the client jar. I'd like all three jars to require a classifier when used as a dependency. However, I can't do this when specifying the project:
<groupId>com.vegicorp</groupId>
<artifactId>core-services</artifactId>
<version>1.0.0</version>
<classifier>server</classifier>
<packaging>jar</packaging>
I know I can use <finalName> to call the default jar core-services-server, but I want to make sure that if a developer depends upon the core-services, they must say whether they want the server, the client, or the testing classes. If I merely rename it, they will get the server jar by default.
How can I specify that the default jar has a classifier of server?
I figured it out. I can put the default classifier into the maven-jar-plugin configuration in my pom.xml:
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<!-- All other configuration information is in the parent pom -->
<configuration>
<classifier>server</classifier>
</configuration>
</plugin>
When I run mvn deploy, I get the warning "No primary artifact to deploy, deploying attached artifacts instead", and all three jars are deployed with classifiers.
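On the consuming side, a dependency then has to name the jar it wants. A sketch using the coordinates from the question; a declaration without a classifier should fail to resolve, since no unclassified jar was deployed:

```xml
<dependency>
  <groupId>com.vegicorp</groupId>
  <artifactId>core-services</artifactId>
  <version>1.0.0</version>
  <!-- Required: picks core-services-1.0.0-client.jar; use server for the server jar -->
  <classifier>client</classifier>
</dependency>
```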

How can the production jar specify its own dependencies when added to other project as a dependency?

If the question title doesn't make it clear, let me explain in more detail. Suppose the production jar of one of my Maven applications needs to be used in another Maven web application of mine. Adding that jar as a Maven dependency of the second application doesn't bring in its transitive dependencies. Also, the jar itself is an application.
One way is to look at the POM of the first application and add those dependencies to the POM of the other application. But then, how do jars from Maven Central bring in their own transitive dependencies when added to a project?
In other words, if I add the commons-io Maven dependency to my project, its transitive dependencies are added automatically. But when I add myjar.jar as a Maven dependency (scope -> system), its transitive dependencies are not added.
I think I should develop my first application with some other archetype that can be used in such a case. Please advise me how to proceed.
Sorry for this newbie question. I'm new to Maven and have started using NetBeans' embedded Maven to create applications. I really like the way Maven simplifies the job.
Edit: It seems I should explain in more detail, so here it is.
Suppose I wrote a program that uses A.jar, B.jar and C.jar, and my production output is X.jar (which doesn't contain the other jars, as per Maven's default build). A, B and C are in the Maven Central repository and were added as dependencies of my project. The project builds X.jar.
Now I write another application in which I add X.jar as a system dependency. What I want is for A.jar, B.jar and C.jar to be added to that project automatically, since they are transitive dependencies of X.jar.
I hope I've explained it clearly this time. Please forgive my writing style if it wasn't clear earlier.
One solution is to build X.jar with all its dependencies inside it, using something like this:
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addClasspath>true</addClasspath>
<mainClass>com.nitinsurana.mlmmaven.Start</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-my-jar-with-dependencies</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
But I'm looking for something that automatically adds transitive dependencies of a system dependency.
The system scope is not supposed to be used for actual jar dependencies that will be packaged with another application. Quoting from the official documentation:
Dependencies with the scope system are always available and are not looked up in repository. They are usually used to tell Maven about dependencies which are provided by the JDK or the VM. Thus, system dependencies are especially useful for resolving dependencies on artifacts which are now provided by the JDK, but where available as separate downloads earlier. Typical example are the JDBC standard extensions or the Java Authentication and Authorization Service (JAAS).
You should use the default compile scope.
As others have suggested, use the (default) compile scope and add <exclusions> for transitive dependencies you don't want / need.
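A sketch of that advice, assuming X has been installed or deployed under the hypothetical coordinates com.example:x:1.0.0 and that C is a transitive dependency you want to drop (also hypothetical coordinates):

```xml
<dependency>
  <!-- Hypothetical coordinates for X.jar -->
  <groupId>com.example</groupId>
  <artifactId>x</artifactId>
  <version>1.0.0</version>
  <!-- No <scope> and no <systemPath>: default compile scope, so Maven reads X's pom and pulls in A, B and C -->
  <exclusions>
    <!-- Drop an unwanted transitive dependency (hypothetical coordinates for C) -->
    <exclusion>
      <groupId>com.example</groupId>
      <artifactId>c</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

This only works if X's pom is in a repository Maven can reach (for example the local one after mvn install); a system-scoped jar has no pom for Maven to read.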
See: Maven > Optional Dependencies and Dependency Exclusions
I went through the link provided by @Sean, and it seems that what I want is not possible.
Should I vote to delete this question?
Anyway, the answer is IT CAN'T BE DONE, and here's why:
Project-A -> Project-B
The diagram above says that Project-A depends on Project-B. When A declares B as an optional dependency in its POM, this relationship remains unchanged. Its just like a normal build where Project-B will be added in its classpath.
Project-X -> Project-A
But when another project(Project-X) declares Project-A as a dependency in its POM, the optional dependency takes effect. You'll notice that Project-B is not included in the classpath of Project-X; you will need to declare it directly in your POM in order for B to be included in X's classpath.
Taken from Official Documentation
So, your X module is mavenized? Then you can install it into your local repository with mvn clean install and use it in other projects with compile scope, getting all its transitive dependencies. This works as long as you do everything on your own machine. As soon as you want to share the code with others or set up a CI build, X and its pom need to be available to others. The best way to do this is to run your own artifact repository (such as Artifactory) that is accessible from all other machines. You deploy X there (mvn deploy) and use it with compile scope as usual; you just need to add the new repository to your pom.
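A sketch of the two pom changes involved, with a hypothetical repository id and URL. Credentials for deploying go into settings.xml under a <server> whose <id> matches the repository id:

```xml
<!-- In X's pom.xml: where mvn deploy uploads X.jar together with its pom -->
<distributionManagement>
  <repository>
    <id>company-releases</id>
    <url>https://repo.example.com/releases</url>
  </repository>
</distributionManagement>

<!-- In the consuming project's pom.xml: where Maven looks for X (and its pom) -->
<repositories>
  <repository>
    <id>company-releases</id>
    <url>https://repo.example.com/releases</url>
  </repository>
</repositories>
```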

How to expose WEB-INF/lib inside a war using Maven

We are using a third-party war in our web app (war). To communicate with that war, we have created a bridge module (jar). The intention is to keep our web app from communicating directly with the external war and have it go through the bridge module instead.
All the 3 modules (2 wars and 1 jar) are inside an ear file which is deployed in JBoss.
ear
- war1 (our web app)
- war2 (external web app)
- bridge jar
Note that the bridge jar uses some APIs (packaged as jars) that are present in the WEB-INF/lib directory of the external war.
When JBoss starts up, we get java.lang.NoClassDefFoundError errors because the bridge jar can't find the APIs in the external war's WEB-INF/lib.
We do not want to place all external jars directly under ear as it will mean the external jars are not confined only within its war.
Is there a way to access the jars present inside WEB-INF/lib of the external war from the bridge jar? Can we achieve this using maven build process, or is there a better approach to this?
We recently had a similar problem with our jars not being able to see other jars. We resolved it by generating a MANIFEST.MF with the maven-ejb-plugin defined in the pom.xml of the "bridge jar".
There are two ways to do this:
a) If the bridge jar's pom.xml already defines war1 and war2 as dependencies, use the maven-ejb-plugin with:
<configuration>
<ejbVersion>3.0</ejbVersion>
<archive>
<manifest>
<addClasspath>true</addClasspath>
</manifest>
</archive>
</configuration>
This should autogenerate a MANIFEST.MF with a Class-Path entry matching all dependencies defined in the pom.
b) Otherwise, write your own MANIFEST.MF with the entries you need and point to it like so:
<configuration>
<ejbVersion>3.0</ejbVersion>
<archive>
<manifestFile>src/main/resources/META-INF/MANIFEST.MF</manifestFile>
</archive>
</configuration>
Since your jars are in WEB-INF/lib of the external war (war2), I think you should go for option b) with a manifest containing direct entries such as:
Class-Path: WEB-INF/lib/some-external.jar
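If the bridge module is packaged as a plain jar rather than an EJB, maven-jar-plugin accepts the same <archive> configuration. A sketch that writes the Class-Path entry from the pom instead of from a hand-written MANIFEST.MF; the jar name is the placeholder from the answer above:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <!-- Each child element becomes a header in the generated MANIFEST.MF -->
      <manifestEntries>
        <Class-Path>WEB-INF/lib/some-external.jar</Class-Path>
      </manifestEntries>
    </archive>
  </configuration>
</plugin>
```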

Maven: Use assembly as resulting artefact - appendAssemblyId == false?

I have a multimodule project.
The last module, "assemble", is intended to put a few modules' jars together into a single big jar that I can use for distribution.
This module does nothing else, so I did this:
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<appendAssemblyId>false</appendAssemblyId>
...
</configuration>
</plugin>
</plugins>
I want this behavior to be able to simply run the resulting jar from IDE (NetBeans 7.0).
Maven does exactly what I want, but says this:
[assembly:single]
Reading assembly descriptor: src/assembly/assembly.xml
Building jar: /mnt/ssd1/_projekty/JBoss/bots/JawaBot/2.0/assemble/target/JawaBot-assemble-2.0.0-SNAPSHOT.jar
Configuration options: 'appendAssemblyId' is set to false, and 'classifier' is missing.
Instead of attaching the assembly file: /mnt/ssd1/_projekty/JBoss/bots/JawaBot/2.0/assemble/target/JawaBot-assemble-2.0.0-SNAPSHOT.jar, it will become the file for main project artifact.
NOTE: If multiple descriptors or descriptor-formats are provided for this project, the value of this file will be non-deterministic!
Replacing pre-existing project main-artifact file: /mnt/ssd1/_projekty/JBoss/bots/JawaBot/2.0/assemble/target/JawaBot-assemble-2.0.0-SNAPSHOT.jar
with assembly file: /mnt/ssd1/_projekty/JBoss/bots/JawaBot/2.0/assemble/target/JawaBot-assemble-2.0.0-SNAPSHOT.jar
This message suggests it's not the recommended way to achieve my goal.
Is there a better way?
The assembly plugin will create an artifact with a 'classifier' consisting of the assembly ID from the descriptor. It won't create a main artifact AFAICT.
You might be happier with the maven-shade-plugin, combined with configuring the maven-jar-plugin to set the Main-Class in the manifest so that java -jar works. The shade plugin can produce a main artifact.
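A sketch of that setup for the "assemble" module, assuming the modules to combine are declared as its dependencies. The main class name is hypothetical, and plugin versions are omitted here (in practice they should be pinned):

```xml
<plugins>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <configuration>
      <archive>
        <manifest>
          <!-- Hypothetical entry point; replace with the real main class -->
          <mainClass>org.example.jawabot.Main</mainClass>
        </manifest>
      </archive>
    </configuration>
  </plugin>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
      <execution>
        <!-- Merge this module's jar and its dependencies during package -->
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
</plugins>
```

With the default settings, the shaded jar replaces JawaBot-assemble-2.0.0-SNAPSHOT.jar as the main artifact without any warning about a missing classifier, and the unshaded jar is kept as original-JawaBot-assemble-2.0.0-SNAPSHOT.jar.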
