Spark Maven and Jar Development Workflow with local and remote server

I have a fairly basic question about how to work most effectively with a local Spark environment alongside a remote server deployment, and despite all of the various pieces of info about this, I still don't find any of them very clear.
I have my IntelliJ environment with the dependencies I need in my POM so that I can compile, run, and test locally within IntelliJ. I then want to test and run against a remote server by copying my packaged jar file over via scp and running spark-submit there.
But I don't need any of the Maven dependencies from my POM packaged into the jar, since spark-submit will just use the software on the server anyway. Really, I just need a jar file with my classes, and keeping it lightweight for the scp would be best. I may be misunderstanding this, but now I just need to figure out how to exclude every dependency from being added to the jar during packaging. What is the right way to do that?
Update:
So I managed to create a jar with and without dependencies using the configuration below, and I could just upload the dependency-free one to the server after building. But how can I build only the jar without dependencies, rather than also waiting for the larger jar with everything, which I don't need anyway?
<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>3.0.0</version>
    <configuration>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Two things here.
The provided dependency scope will let you compile and run locally while preventing any server-provided libraries from being packaged (see the sketch below).
Maven doesn't package external libraries into your jar unless you create an uber or shaded jar, so the plain jar produced by mvn package already contains only your classes. In other words, drop the assembly plugin and keep only the default jar.
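For example, marking a Spark dependency as provided might look like this (a minimal sketch; the artifact and version here are assumptions, so use whatever matches the Spark installation on your server):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
    <scope>provided</scope>
</dependency>
With that, you can still compile and test against the dependency locally, while spark-submit on the server supplies its own copy at runtime.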
An example of a good Spark POM is provided by Databricks
Also worth mentioning: Maven can copy a local file to a remote server using SSH; see the Maven Wagon SSH plugin.
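If you go that route, Wagon's SSH support is typically pulled in as a build extension, roughly like this (the version is an assumption; check Maven Central for the latest):
<build>
    <extensions>
        <extension>
            <groupId>org.apache.maven.wagon</groupId>
            <artifactId>wagon-ssh</artifactId>
            <version>3.5.3</version>
        </extension>
    </extensions>
</build>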

Related

Why can't my local environment find generated Proto classes?

I have a project that is set up to compile protobufs specified in my resources directory. To that end, I am using the xolstice plugin, with the following configuration:
<plugin>
    <groupId>org.xolstice.maven.plugins</groupId>
    <artifactId>protobuf-maven-plugin</artifactId>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
                <goal>compile-custom</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <protocArtifact>com.google.protobuf:protoc:${protobuf.version}:exe:${os.detected.classifier}</protocArtifact>
        <pluginId>grpc-java</pluginId>
        <pluginArtifact>io.grpc:protoc-gen-grpc-java:${grpc.version}:exe:${os.detected.classifier}</pluginArtifact>
    </configuration>
</plugin>
This is roughly the configuration described here. The protos in question include a few object models as well as a gRPC service which I register for use.
The .jar is packaged easily enough, with a maven-jar-plugin I've inherited from our common root pom. The configuration for that is:
<plugin>
    <artifactId>maven-jar-plugin</artifactId>
    <version>3.2.0</version>
    <configuration>
        <archive>
            <manifest>
                <addDefaultImplementationEntries>true</addDefaultImplementationEntries>
                <addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
            </manifest>
        </archive>
    </configuration>
</plugin>
When I run my project in IntelliJ, everything seems to work fine - I can observe that the required protos are generated correctly and I don't have any issues. However, when I run the .jar with java -jar target/service.jar, I run into the following issue:
[Byte Buddy] ERROR com.artistchooser2.handlers.ChooseArtistsHandler [jdk.internal.loader.ClassLoaders$AppClassLoader@5c29bfd, unnamed module @776b83cc, Thread[main,5,main], loaded=false]
java.lang.IllegalStateException: Cannot resolve type description for com.artistChooser2.v1.ChooseArtistsServiceGrpc$ChooseArtistsServiceImplBase
at net.bytebuddy.pool.TypePool$Resolution$Illegal.resolve(TypePool.java:161)
at net.bytebuddy.pool.TypePool$Default$WithLazyResolution$LazyTypeDescription.delegate(TypePool.java:1038)
The class which should have been generated by the protocol compilation step seems nowhere to be found. Interestingly, however, I can easily check whether the packaging is actually broken by running: jar -tvf target/service.jar | grep 'ChooseArtistsServiceGrpc$ChooseArtistsServiceImplBase'. If I run that, I can observe that the class IS actually available and correctly packaged in the .jar. I can also verify that easily enough in IntelliJ by perusing everything within the .jar.
I noticed this issue because I was setting up a test that runs my service in a Docker image and verifies that it starts up correctly, as it would in production. Interestingly, however, although I am locally unable to get mvn verify to run successfully, my build server (which I have confirmed is running mvn verify) runs to completion without issue.
I've checked all of the usual suspects - it has nothing to do with the maven build profile that is used on the build server, maven versions are the same on the build server and locally, I've even tried clearing the .m2/repository in case there was something fishy there.
So I guess my question is whether anyone has any further leads? Is there something else I should be looking into, some sort of environment variable, or anything else that might cause the above exception locally but not on a build server?
So I'm still not entirely sure how exactly this issue manifested - in my .proto spec, I had:
option java_package = "com.artistChooser2.v1";
Interestingly, on my local machine, when I found the compiled protos in the packaged .jar, they were appearing under com.artistchooser2.v1. Note the lowercase 'c'. But the test I was running was still looking in the upper 'C' package.
For some reason, on the build server they were compiled into the correct location, but not in my local jar. I'm running a Mac, while the build server is a Linux box, so the environment seems a likely culprit: macOS's default filesystem is case-insensitive while Linux's is case-sensitive, which would explain package paths differing only in case getting merged locally but kept distinct on the server.
Either way, the solution was to alter the package name to what my machine was expecting (the lowercase 'c'), which is probably a better package name anyway.
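That is, the fix amounted to changing the option shown above to the all-lowercase package:
option java_package = "com.artistchooser2.v1";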

Always run proguard-maven-plugin before install phase

What I am trying to do is obfuscate certain packages in a multi-module application before it gets installed to my local repository, so that the final package will be an EAR file which contains obfuscated jars.
I tried to obfuscate the jars during the EAR building process, without success. Now I want to build the EAR from already obfuscated jars instead of obfuscating them during the EAR build.
So I've got the following plugin configuration:
<plugin>
    <groupId>com.github.wvengen</groupId>
    <artifactId>proguard-maven-plugin</artifactId>
    <version>2.0.11</version>
    <dependencies>
        <dependency>
            <groupId>net.sf.proguard</groupId>
            <artifactId>proguard-base</artifactId>
            <version>${version.proguard}</version>
        </dependency>
    </dependencies>
    <executions>
        <execution>
            <phase>process-classes</phase>
            <goals>
                <goal>proguard</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        ...
    </configuration>
</plugin>
So there are two problems for me:
ProGuard always runs after the install phase, so the EAR build always gets the non-obfuscated jars.
I always have to add proguard:proguard to the Maven command, which of course fails in a multi-module project where some modules don't have to be obfuscated.
So my questions:
How can I obfuscate the package before it gets installed?
How can I make plugins like this one run by default, without adding <plugin>:<goal> to the Maven call?
Thanks.
It seems that for the ProGuard plugin to work, JAR files are needed. Perhaps you can achieve this by attaching the proguard plugin's proguard goal to the package phase (rather than the process-classes phase) of the default Maven build lifecycle, as proposed here by Alexey Shmalko. It's not clear to me whether you are using the maven-shade-plugin, but if you are, place the proguard plugin configuration in your pom.xml after that of the maven-shade-plugin (this is because both plugins attach to the same phase: package).
My expectation is that since the package phase runs before the install phase, this should give you the effect you are looking for.
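Binding the goal to a lifecycle phase this way also answers your second question: an execution attached to a phase runs automatically as part of a plain mvn install, with no proguard:proguard needed on the command line. A minimal sketch of the changed execution (the rest of the plugin configuration stays as in your snippet above):
<executions>
    <execution>
        <phase>package</phase>
        <goals>
            <goal>proguard</goal>
        </goals>
    </execution>
</executions>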

Incomplete WAR automatically uploaded by Maven

To ease up the deployment process of my Java EE application, I instructed Maven to automatically copy the resulting WAR file to the application server.
pom.xml:
<plugin>
    <artifactId>exec-maven-plugin</artifactId>
    <groupId>org.codehaus.mojo</groupId>
    <executions>
        <execution><!-- Run our version calculation script -->
            <id>Copy to Application Server</id>
            <phase>generate-sources</phase>
            <goals>
                <goal>exec</goal>
            </goals>
            <configuration>
                <executable>${basedir}/copy-to-appserver.sh</executable>
            </configuration>
        </execution>
    </executions>
</plugin>
copy-to-appserver.sh:
scp /home/user/.m2/repository/com/wolf/apix/1.0/apix-1.0.war user@srv-web:/opt/wildfly-8.2.0.Final/standalone/deployments/apix.war
Unfortunately, this fails! The WAR is successfully transmitted to the application server, but it is a mix of old and new code. My assumption is that Maven sends it while the WAR is still being assembled, because when I run the copy script copy-to-appserver.sh manually after the build, everything works fine on the application server.
My question is, what do I have to change, so that Maven only accesses the WAR file when its creation / manipulation is complete?
Your plugin is being executed prematurely, in the generate-sources phase, which runs long before the WAR is assembled.
Run it in the last phase by changing the phase to deploy:
<phase>deploy</phase>
In addition to running the plugin in the correct phase, as suggested by 6ton, you might also want to consider using the Maven WildFly plugin, which is specifically designed to solve your problem. That way, you can get rid of that nasty, nasty script.
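A rough sketch of what that could look like (the version, hostname, and port here are assumptions to adapt to your setup):
<plugin>
    <groupId>org.wildfly.plugins</groupId>
    <artifactId>wildfly-maven-plugin</artifactId>
    <version>1.0.2.Final</version>
    <configuration>
        <hostname>srv-web</hostname>
        <port>9990</port>
    </configuration>
</plugin>
Deploying is then just mvn wildfly:deploy, and the plugin talks to the running server's management interface instead of copying files into the deployments directory.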
I would recommend separating the build process from the deployment process, because the deploy lifecycle phase is intended for uploading artifacts to a remote repository.

Automatically publish JavaDoc as a functioning website on remote machine

Is there a way of automatically publishing javadocs as a live, browsable website after uploading them to a Nexus Maven repository? I have some packages which are under constant development, and I'd like the docs for them to be available for browsing by other team members straight after updated code is uploaded to our remote repository.
Is there any ready-made solution for doing that, or would I have to write, say, a shell script (executed by Maven after successful deployment of the code to the remote repo) which would copy the docs to a remote location on a web server?
I know that Nexus Professional allows viewing javadocs out of the box, but even for 10 users it is a bit pricey, so I'd appreciate a different solution :-)
I'm using Eclipse on Windows + Maven 2.
Thanks!
If your project is open-source and is released to Maven Central then your javadoc will be available at javadoc.io automatically.
The url is of the form: http://www.javadoc.io/doc/[groupId]/[artifactId]. For example, I have my javadoc for my project at www.javadoc.io/doc/me.ramswaroop.jbot/jbot
Nexus Open Source as well as Nexus Professional support the site repository format, which allows you to host Maven-produced sites. If you set up publishing of a Maven site that includes Javadoc as part of your build, you can have the docs accessible there.
The other thing you can do is just publish javadoc and source artifacts as part of your build; Eclipse will then be able to download them from Nexus automatically, so you won't even need a website with the javadoc on it for your developers (see the sketch at the end of this answer).
Of course other people might want the site though.
Documentation on all that is available in the Nexus book as usual.
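For the second suggestion, attaching javadoc and source jars to the build typically looks like this (a minimal sketch; versions omitted for brevity):
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-javadoc-plugin</artifactId>
    <executions>
        <execution>
            <id>attach-javadocs</id>
            <goals>
                <goal>jar</goal>
            </goals>
        </execution>
    </executions>
</plugin>
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-source-plugin</artifactId>
    <executions>
        <execution>
            <id>attach-sources</id>
            <goals>
                <goal>jar-no-fork</goal>
            </goals>
        </execution>
    </executions>
</plugin>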
A bit late for a reply, but someone else might need it in the future. I didn't try the solution proposed above; instead I used one utilizing the Maven Wagon plugin. Basically, I first upload the jar containing the javadoc to the web server using the sftp protocol, and then unzip it there using an ssh command.
So the parts of my parent POM which configure the plugin look as follows:
<pluginManagement>
    <plugins>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>wagon-maven-plugin</artifactId>
            <executions>
                <execution>
                    <id>upload-javadoc</id>
                    <phase>deploy</phase>
                    <goals>
                        <goal>upload</goal>
                    </goals>
                    <configuration>
                        <serverId>my_id</serverId>
                        <fromDir>${local.dir}</fromDir>
                        <includes>${javaDoc.file}</includes>
                        <excludes>pom.xml</excludes>
                        <url>${remote.url}</url>
                        <toDir>${remote.dir}</toDir>
                    </configuration>
                </execution>
                <execution>
                    <id>execute-test-commands</id>
                    <phase>deploy</phase>
                    <goals>
                        <goal>sshexec</goal>
                    </goals>
                    <configuration>
                        <serverId>my_id</serverId>
                        <url>${remote.url}</url>
                        <commands>
                            <command>mkdir -p ${www.dir}</command>
                            <command>unzip -o ${remote.dir}/${javaDoc.file} -d ${www.dir} > latest_unzip_log.txt</command>
                        </commands>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</pluginManagement>
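For reference, the serverId used above refers to a server entry carrying the SSH credentials in settings.xml, along these lines (the id matches the POM; the username and key path are placeholders):
<server>
    <id>my_id</id>
    <username>deploy-user</username>
    <privateKey>${user.home}/.ssh/id_rsa</privateKey>
</server>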
And then just call the plugin in child pom:
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-source-plugin</artifactId>
        </plugin>
    </plugins>
</build>
So that's my solution, but I guess Manfred's is much simpler.
P.S.
Having the Javadoc available only in Eclipse (which I do anyway) was not the goal, as the purpose was for the Javadoc to be available to other project members through a website.
Another option for hosting javadocs is https://docshoster.org/. It can pick them up from Maven automatically, and there is an API for uploading directly as well.

How do I write a maven plugin which actually runs?

The instructions here seem very clear:
http://maven.apache.org/guides/plugin/guide-java-plugin-development.html
However, the first problem I run into is that the dependencies are wrong. I also needed to reference the maven-plugin-annotations dependency.
Then, when I attempt to run I get the "No plugin descriptor found at META-INF/maven/plugin.xml" error. I haven't figured out what to do about that.
I've found lots of pages referencing the maven-plugin-plugin, but I can't figure out how to add it to the pom so that it actually does anything which allows my own plugin to run.
Is there an updated version of the plugin development instructions which actually mentions the need to use maven-plugin-plugin?
If I can't get this to work I'm just going to go back to using exec-maven-plugin. It's uglier, but at least it works easily.
There are actually several terrific resources from Sonatype for learning how to write plugins:
Maven the Complete Reference: Writing Plugins
Maven Cookbook: Creating an Ant Maven Plugin
Maven Cookbook: Writing Plugins in Groovy
If I recall correctly, you need to configure the maven-plugin-plugin this way to avoid the "No plugin descriptor found..." issue.
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-plugin-plugin</artifactId>
    <version>3.2</version>
    <configuration>
        <!-- see http://jira.codehaus.org/browse/MNG-5346 -->
        <skipErrorNoDescriptorsFound>true</skipErrorNoDescriptorsFound>
    </configuration>
    <executions>
        <execution>
            <id>mojo-descriptor</id>
            <goals>
                <goal>descriptor</goal>
            </goals>
        </execution>
    </executions>
</plugin>
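One more common cause of the same "No plugin descriptor found" error is worth checking: the plugin project itself must declare maven-plugin packaging, otherwise the descriptor is never generated at all:
<packaging>maven-plugin</packaging>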
I forked a simple GitHub project called maven-wrapper (port of the Gradle wrapper) to make it a Maven plugin.
"It should be easy" for you to figure out pieces that you may eventually be missing:
Maven wrapper plugin(Mojo)
Maven Wrapper full POM
