IntelliJ artifact tool doesn't create a correct executable Spark jar - maven

I created a Spark Maven project in IntelliJ IDEA 2018 and tried to export an executable jar file of my main class. When I submit it to the YARN cluster, it fails with "The main class was not found!", even though the MANIFEST.MF includes it:
Manifest-Version: 1.0
Main-Class: Test
I did the same with other processing engines like Apache Flink, and IntelliJ could create an executable jar file that successfully runs on the cluster.
So in the Spark case I always have to use the maven-assembly-plugin and export the jar file with the command mvn clean compile assembly:single:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <mainClass>Test</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>
I guess it's because of the format of the Spark dependencies. I faced the same problem when creating a (non-executable) jar file from a class of mine that uses Spark dependencies. For example, adding the spark-sql dependency to a Maven project results in pulling in other transitive dependencies like spark-catalyst. Is there any way to export an executable Spark jar file using IntelliJ IDEA?

The maven-shade-plugin can be an alternative option for creating an uber jar; a detailed pom.xml snippet follows.
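A minimal sketch of such a shade configuration, binding the shade goal to the package phase. The plugin version is illustrative, the main class Test comes from the question, and the signature-file excludes are an added assumption that is commonly needed so that signed dependencies don't break the repacked jar:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- strip signature files so the repacked jar is not rejected as tampered -->
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <!-- write Main-Class: Test into the merged manifest -->
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>Test</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
After a plain mvn clean package, the shaded jar in target/ can then be submitted with spark-submit --class Test --master yarn <jar>.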

Related

Remove timestamp in maven tycho build

I have a multi-module Eclipse RCP application that we build with Maven Tycho. The build completes successfully.
In the build folder I have the usual plugins folder, which contains all the plugins (both jar and directory packaging) in the project.
The plugin file names contain a timestamp. Is there any way to remove the timestamp from the plugins while building? Currently a plugin is named plugin.name_1.0.0.20200211.jar, but I want it to be plugin.name_1.0.0.jar.
Adding a format tag did the trick for me. A POM snippet is attached below.
<plugin>
  <groupId>org.eclipse.tycho</groupId>
  <artifactId>tycho-packaging-plugin</artifactId>
  <version>${tycho-version}</version>
  <configuration>
    <!-- format is a SimpleDateFormat pattern for the version qualifier;
         two single quotes ('') yield an empty qualifier, dropping the timestamp -->
    <format>''</format>
  </configuration>
</plugin>

How to add a jar dependency to an Xtext Maven build

What is the correct way to use a Maven jar file in my Xtext DSL project?
What I have tried is this:
use the maven-dependency-plugin in the pom.xml file of the *.dsl project to download the .jar file from a Maven repository into the ./lib/ directory. This is done as early as possible in the build process: in the Maven validate phase
in MANIFEST.MF: add the jar to the classpath: e.g. Bundle-ClassPath: ., lib/value-2.5.6-annotations.jar
in build.properties: add it to the bin.includes
The problem is that the build only works when I call mvn install twice.
The first time, the .jar file is downloaded to the lib directory as expected (early in the build process), but then the build fails because it cannot resolve the types in my jar file.
When I then run mvn install again (the .jar file now already exists in the lib directory before the build), it works fine.
Any ideas how to resolve this?
Short answer
Currently it does not work as expected, because of bugs in Tycho:
#353889: Defer target&dependency resolution to the normal build
#393978: maven-dependency-plugin:copy-dependencies goal does not work reliably with Tycho projects - "error copying ....jar.jar"
Long answer
Here is what I did to make it work (for now) in the xxx.dsl project:
pom.xml file
I use the maven-dependency-plugin to download the jar file in the maven validate phase (as early in the build as possible) to the lib directory.
Note, that I use stripVersion=true so that the file in the lib dir is called value-annotations.jar (and not value-2.5.6-annotations.jar). If I ever want to update the version in the future, I only need to update it in one place in the pom.xml file.
The jar file must also be specified as a dependency, because otherwise the users of the dsl plugin cannot build the project: i.e. the generateXtext task of the xtext-gradle-plugin will fail because it cannot find the classes in the jar file.
Relevant pom.xml code:
<project ...>
  <properties>
    <xtextVersion>2.13.0</xtextVersion>
    <immutablesVersion>2.5.6</immutablesVersion>
    ...
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>3.0.2</version>
        <executions>
          <execution>
            <id>copy-libraries</id>
            <phase>validate</phase>
            <goals>
              <goal>copy</goal>
            </goals>
            <configuration>
              <artifactItems>
                <artifactItem>
                  <groupId>org.immutables</groupId>
                  <artifactId>value</artifactId>
                  <version>${immutablesVersion}</version>
                  <classifier>annotations</classifier>
                  <outputDirectory>lib</outputDirectory>
                </artifactItem>
              </artifactItems>
              <!-- stripVersion: copy as value-annotations.jar instead of value-2.5.6-annotations.jar -->
              <stripVersion>true</stripVersion>
            </configuration>
          </execution>
        </executions>
      </plugin>
      ...
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.immutables</groupId>
      <artifactId>value</artifactId>
      <version>${immutablesVersion}</version>
      <classifier>annotations</classifier>
    </dependency>
  </dependencies>
</project>
META-INF/MANIFEST.MF file
Add the jar file to the Bundle-ClassPath so that we can use it, e.g. in the DslJvmModelInferrer.xtend
Add the package of the jar file to Export-Package, so that its classes can be accessed by the xxx.dsl.tests project
Relevant parts of MANIFEST.MF:
Bundle-ClassPath: ., lib/value-annotations.jar
Export-Package: xxx.xtext,
...
xxx.xtext.validation,
org.immutables.value
build.properties file
Add the jar file to the bin.includes so that it will be copied to the generated jar file (in the target directory):
bin.includes=model/generated/,\
.,\
META-INF/,\
lib/value-annotations.jar,\
plugin.xml
Build
Now the build works in Eclipse.
On the command line (and in my continuous integration server script), I must execute Maven twice (because of the bugs mentioned above):
mvn verify (to download the jars)
mvn install

Spark Maven and Jar Development Workflow with local and remote server

So I have a very basic question about how to work most effectively with a local Spark environment alongside a remote server deployment; despite all the various pieces of info about this, I still don't find any of them very clear.
I have my IntelliJ environment and the dependencies I need within my POM, so that I can compile, run, and test locally within IntelliJ. Then I want to test and run against a remote server by copying over my packaged jar file via scp and running spark-submit there.
But I don't need any of the Maven dependencies inside my jar, since spark-submit will just use the software on the server anyway; really I just need a jar file with my classes, and keeping it lightweight for the scp would be best. Not sure if I'm misunderstanding this, but now I just need to figure out how to exclude every dependency from being added to the jar during packaging. What is the right way to do that?
Update:
So I managed to create a jar with and without dependencies using the configuration below, and I could just upload the one without dependencies to the server after building. But how can I build only the jar file without dependencies, rather than waiting for the larger jar with everything, which I don't need anyway?
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>3.0.0</version>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
Two things here.
The provided dependency scope will allow you to work locally while preventing any server-provided libraries from being packaged.
Maven doesn't package external libraries unless you create an uber/shaded jar, so a plain mvn package already produces only the thin jar you want to scp.
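As a minimal sketch, assuming Spark 2.x with Scala 2.11 (the artifact and version below are illustrative), the server-provided dependencies would be declared like this:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.4.0</version>
  <!-- provided: on the compile/test classpath locally, but not packaged -->
  <scope>provided</scope>
</dependency>
Note that for running locally inside IntelliJ you may need to enable the run-configuration option that includes provided-scope dependencies on the classpath.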
An example of a good Spark POM is provided by Databricks.
Also worth mentioning: Maven can copy a local file to a remote server using SSH.
See the Maven Wagon SSH plugin.
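A hedged sketch of such an upload step with the wagon-maven-plugin (the server id my-spark-server, the target path, and the versions are assumptions; the SSH credentials for the serverId would live in settings.xml):
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>wagon-maven-plugin</artifactId>
  <version>2.0.0</version>
  <dependencies>
    <!-- SSH/SCP wagon provider -->
    <dependency>
      <groupId>org.apache.maven.wagon</groupId>
      <artifactId>wagon-ssh</artifactId>
      <version>3.3.4</version>
    </dependency>
  </dependencies>
  <executions>
    <execution>
      <id>upload-jar</id>
      <phase>deploy</phase>
      <goals>
        <goal>upload-single</goal>
      </goals>
      <configuration>
        <serverId>my-spark-server</serverId>
        <fromFile>${project.build.directory}/${project.build.finalName}.jar</fromFile>
        <url>scp://my-spark-server/home/user/jobs</url>
      </configuration>
    </execution>
  </executions>
</plugin>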

How to bundle my Maven-based Gatling load test into one JAR?

I created a Gatling load test using the highcharts archetype. I decided against just downloading the latest Gatling ZIP file and creating a simulation within the extracted folder, since I rely on a number of dependencies in public and private Maven repositories.
I want to
bundle my simulation and all its dependencies into a single JAR,
distribute the JAR to multiple load generators in EC2/GCE, and
start the test on all remote load generators.
Maven's assembly plugin looks like an obvious candidate to solve #1. So I added the following to my pom.xml:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <mainClass>io.gatling.app.Gatling</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>
With this configuration, running a JAR file created with mvn clean package assembly:single results in the following NoSuchFileException:
$ java -jar target/myapp-0.1-SNAPSHOT-jar-with-dependencies.jar
Exception in thread "main" java.nio.file.NoSuchFileException: ./target/test-classes
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:407)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at io.gatling.core.util.PathHelper$RichPath$.deepListAux$1(PathHelper.scala:99)
at io.gatling.core.util.PathHelper$RichPath$.deepList$extension(PathHelper.scala:105)
at io.gatling.core.util.PathHelper$RichPath$.deepFiles$extension(PathHelper.scala)
at io.gatling.app.classloader.SimulationClassLoader.simulationClasses(SimulationClassLoader.scala:55)
at io.gatling.app.Gatling.loadSimulations(Gatling.scala:92)
at io.gatling.app.Gatling.start(Gatling.scala:70)
at io.gatling.app.Gatling$.fromArgs(Gatling.scala:59)
at io.gatling.app.Gatling$.main(Gatling.scala:44)
at io.gatling.app.Gatling.main(Gatling.scala)
Is this how I should bundle up my Maven-based Gatling project?
Or have I misconfigured Gatling's Maven plugin for the point at which the JAR file is created?
Update 1:
Creating the target/test-classes directory gets around the NoSuchFileException. However, Gatling then doesn't find any of my simulations: none of the *.scala files were added to the JAR generated by the assembly plugin.

Package a multiple-entry jar using Maven for a Hadoop project

I'm new to Maven. I want to package a jar of my Hadoop project together with its dependencies, and then use it like:
hadoop jar project.jar com.abc.def.SomeClass1 -params ...
hadoop jar project.jar com.abc.def.AnotherClass -params ...
And I want to have multiple entry points for this jar (different hadoop jobs).
How could I do it?
Thanks!
There are two ways to create a jar with dependencies:
Hadoop supports a jar-in-jar format, meaning that your jar can contain a lib folder of jars that will be added to the classpath at job submission and map/reduce task execution.
You can unpack the jar dependencies and re-pack them with your classes into a single monolithic jar.
The first requires you to create a Maven assembly descriptor file, but in reality it's more hassle than it's worth. The second also uses Maven assemblies but utilizes a built-in descriptor. To use it, just add the following to the project -> build -> plugins section of your POM:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.4</version>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>
Now when you run mvn package you'll get two jars in your target folder:
${project.artifactId}-${project.version}.jar - contains just the classes and resources of your project
${project.artifactId}-${project.version}-jar-with-dependencies.jar - contains your classes and resources plus everything from your dependency tree with compile scope, unpacked and repacked into a single jar
For multiple entry points you don't need to do anything specific; just make sure you don't define a Main-Class entry in the jar manifest (that only matters if you explicitly configure a manifest; the default doesn't name a Main-Class, so you should be good). An example invocation is shown below.
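With the jar-with-dependencies artifact, the invocations from the question then look like this (the jar name is illustrative; the class names are the question's placeholders):
hadoop jar target/project-1.0-jar-with-dependencies.jar com.abc.def.SomeClass1 -params ...
hadoop jar target/project-1.0-jar-with-dependencies.jar com.abc.def.AnotherClass -params ...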
