Spark jobserver ERROR ClassNotFoundException - JDBC

I have been trying out Spark using spark-shell. All my data is in SQL.
I used to include external jars using the --jars flag, like: /bin/spark-shell --jars /path/to/mysql-connector-java-5.1.23-bin.jar --master spark://sparkmaster.com:7077
I also included it in the classpath by changing the bin/compute-classpath.sh file.
I was running successfully with this configuration.
Now when I run a standalone job through the jobserver, I get the following error message:
result: {
  "message": "com.mysql.jdbc.Driver",
  "errorClass": "java.lang.ClassNotFoundException",
  "stack": [.......]
}
I have included the jar file in my local.conf file as below.
context-settings{
.....
dependent-jar-uris = ["file:///absolute/path/to/the/jarfile"]
......
}

All of your dependencies should be included in your spark-jobserver application JAR (e.g. create an "uber-jar"), or be included on the classpath of the Spark executors. I recommend configuring the classpath, as it's faster and requires less disk space, since the third-party library dependencies don't need to be copied to each worker whenever your application runs.
Here are the steps to configure the worker (executor) classpath on Spark 1.3.1:
Copy the third-party JAR(s) to each of your Spark workers and the Spark master
Place the JAR(s) in the same directory on each host (e.g. /home/ec2-user/lib)
Add the following line to the Spark /root/spark/conf/spark-defaults.conf file on the Spark master:
spark.executor.extraClassPath /root/ephemeral-hdfs/conf:/home/ec2-user/lib/name-of-your-jar-file.jar
Here's an example of my own modifications to use the Stanford NLP library:
spark.executor.extraClassPath /root/ephemeral-hdfs/conf:/home/ec2-user/lib/stanford-corenlp-3.4.1.jar:/home/ec2-user/lib/stanford-corenlp-3.4.1-models.jar
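For the MySQL connector from the question, the corresponding line would presumably be the following (assuming you copied the connector jar to the same example directory, /home/ec2-user/lib):
spark.executor.extraClassPath /home/ec2-user/lib/mysql-connector-java-5.1.23-bin.jar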

You might not have /path/to/mysql-connector-java-5.1.23-bin.jar on your workers.
You can either copy the required dependency to all Spark workers, or
bundle the submitted jar with the required dependencies.
I use Maven for building the jar. The scope of the dependencies must be runtime.
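For the bundling approach, a minimal sketch of the Maven dependency (coordinates and version inferred from the connector jar named in the question) would be:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.23</version>
    <scope>runtime</scope>
</dependency>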

For posting the application jar:
curl --data-binary @/PATH/jobs_jar_2.10-1.0.jar 192.168.0.115:8090/jars/job_to_be_registered
For posting the dependency jar:
curl -d "" 'http://192.168.0.115:8090/contexts/new_context?dependent-jar-uris=file:///path/dependent.jar'
This works for jobserver 1.6.1
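The job can then be run against the context that holds the dependency. A sketch of that call using the spark-jobserver POST /jobs endpoint (the classPath value is a placeholder for your own job class):
curl -d "" '192.168.0.115:8090/jobs?appName=job_to_be_registered&classPath=com.example.YourJob&context=new_context'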

Related

How can I submit an Apache Storm topology to a Storm cluster?

I'm following this tutorial: https://learn.microsoft.com/en-us/azure/hdinsight/storm/apache-storm-develop-java-topology
What I've done so far is:
Maven setup
vi *.java files (in the src/main/java/com/microsoft/example directory):
RandomSentenceSpout.java
SplitSentence.java
WordCount.java
WordCountTopology.java
mvn compile
jar cf storm.jar *.class (in the target/classes/com/microsoft/example directory)
RandomSentenceSpout.class SplitSentence.class WordCount.class WordCountTopology.class
The above 4 class files were used to make the storm.jar file.
Then, I tried
storm jar ./storm.jar com.microsoft.example.WordCountTopology WordCountTopology
and
storm jar ./storm.jar WordCountTopology
but both of these failed, saying:
Error: Could not find or load main class com.microsoft.example.WordCountTopology
or
Error: Could not find or load main class WordCountTopology
According to the documentation:
Syntax: storm jar topology-jar-path class ...
Runs the main method of class with the specified arguments. The storm
jars and configs in ~/.storm are put on the classpath. The process is
configured so that StormSubmitter will upload the jar at
topology-jar-path when the topology is submitted.
I cannot find what to fix.
How can I resolve this?
I think your jar file does not contain class WordCountTopology. You can check it with jar tf storm.jar | grep WordCountTopology.
It looks like your jar does not contain a manifest file, which holds the information about the main class.
Try including a manifest file, or run the jar command below to add one with a Main-Class entry:
jar cvfe storm.jar mainClassNameWithoutDotClassExtn *.class
Hope this works!
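Note also that class files must sit under their package path inside the jar for the JVM to find them. Since the jar in the question was created from inside target/classes/com/microsoft/example, the com/microsoft/example directory entries are missing. A minimal sketch of rebuilding it from the classes root (paths assumed from the question):
cd target/classes
jar cvfe ../../storm.jar com.microsoft.example.WordCountTopology com/
jar tf ../../storm.jar | grep WordCountTopology   # should now list com/microsoft/example/WordCountTopology.class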

Neo4j-Ogm with Spring Boot: Classpath scanning doesn't find DomainEntities when deployed as runnable jar

I am in the process of migrating an existing app from Spring-Data-Neo4j 3.x to 4.1 using neo4j-ogm 2.0.4.
After overcoming some obstacles, it is now running fine when launched directly from the IDE.
However, it doesn't find any DomainEntities when I run it via a Spring Boot runnable jar:
(ClassPathScanner.java:132) Classpath elements:
(ClassPathScanner.java:134) D:\Programme\Project\myProject.jar
(DomainInfo.java:108) Starting Post-processing phase
(DomainInfo.java:74) Building annotation class map
(DomainInfo.java:87) Building interface class map for 0 classes
(DomainInfo.java:136) Checking for @Transient classes....
(DomainInfo.java:155) Registering converters and deregistering transient fields and methods....
(DomainInfo.java:159) Post-processing complete
(DomainInfo.java:69) 0 classes loaded in 40179 milliseconds
The executable jar is built using the Spring Boot Gradle Plugin, which allows making the jar executable:
springBoot {
executable = true
}
I attached to the app via remote debugging when the jar starts and found that org.neo4j.ogm.scanner.ClassPathScanner#scan only receives my jar as a classPathElement. According to the code, it should then be scanned as a zip/jar file. However, when classPathElement.isFile() is executed, it evaluates to false and the jar is skipped.
Why is that the case? Is an executable jar not a file?
What steps can I take to get this running? I could probably use some other deployment mechanism, but I found this one fairly simple and well working.
I did some additional investigation and it turned out that this was not related to the runnable jar at all. It was actually caused by having a space in the path to the jar file.
I think that is a perfectly valid case and I am not sure why it doesn't work. In my case it was OK, though, to simply rename the respective folder and remove the space.
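A likely explanation (my assumption; the answer above does not confirm it) is that the classpath element is derived from a URL in which the space is percent-encoded as %20, so the resulting File path does not exist on disk and isFile() returns false. A minimal Java sketch of the effect, using a hypothetical path:
import java.io.File;
import java.net.URLDecoder;

public class SpaceInPathDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical jar path containing a space, as it might appear URL-encoded on the classpath
        String encoded = "D:/Programme/My%20Project/myProject.jar";
        System.out.println(new File(encoded).isFile());   // false: the %20 is taken literally
        String decoded = URLDecoder.decode(encoded, "UTF-8");
        System.out.println(new File(decoded).isFile());   // true, provided the jar actually exists there
    }
}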

Hadoop data join package

I am new to Hadoop. While exploring the Hadoop data join package, I ran the command mentioned below:
hadoop jar /home/biadmin/DataJoin.jar com.datajoin.DataJoin
/user/biadmin/Datajoin/customers.txt
/user/biadmin/Datajoin/orders.txt
/user/biadmin/Datajoin/outpu1
I am getting the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.hadoop.contrib.utils.join.DataJoinMapperBase
at java.lang.ClassLoader.defineClassImpl(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:364)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:154)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:777)
at java.net.URLClassLoader.access$400(URLClassLoader.java:96)
You need to add the hadoop-datajoin jar to the classpath while running the job. Use the -libjars option to add extra jars to the classpath. Your command will look like the one below; provide the correct path to the jar directory, or download the jars if needed.
hadoop jar /home/biadmin/DataJoin.jar com.datajoin.DataJoin
-libjars <path>/hadoop-datajoin.jar
/user/biadmin/Datajoin/customers.txt
/user/biadmin/Datajoin/orders.txt
/user/biadmin/Datajoin/outpu1
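If the error still occurs at submission time, note that -libjars only ships the extra jars to the map and reduce tasks; the local client JVM that runs the main class needs the jar as well. A sketch of putting it on the client classpath too (the jar location is an assumption):
export HADOOP_CLASSPATH=<path>/hadoop-datajoin.jar:$HADOOP_CLASSPATH
hadoop jar /home/biadmin/DataJoin.jar com.datajoin.DataJoin -libjars <path>/hadoop-datajoin.jar /user/biadmin/Datajoin/customers.txt /user/biadmin/Datajoin/orders.txt /user/biadmin/Datajoin/outpu1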

Running Spark job using YARN giving error: com.google.common.util.concurrent.Futures.withFallback

I am trying to run a Spark job using YARN, but I am getting the below error:
java.lang.NoSuchMethodError: com.google.common.util.concurrent.Futures.withFallback(Lcom/google/common/util/concurrent/ListenableFuture;Lcom/google/common/util/concurrent/FutureFallback;Ljava/util/concurrent/Executor;)Lcom/google/common/util/concurrent/ListenableFuture;
at com.datastax.driver.core.Connection.initAsync(Connection.java:176)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:721)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:248)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:194)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:82)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1307)
at com.datastax.driver.core.Cluster.init(Cluster.java:159)
at com.datastax.driver.core.Cluster.connect(Cluster.java:249)
at com.figmd.processor.ProblemDataloader$ParseJson.call(ProblemDataloader.java:46)
at com.figmd.processor.ProblemDataloader$ParseJson.call(ProblemDataloader.java:34)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:140)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:140)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Cluster details:
Spark 1.2.1, Hadoop 2.7.1
I have provided the classpath using spark.driver.extraClassPath. The hadoop user has access to that classpath as well, but I think YARN is not picking up the JARs on that classpath.
I am not able to get to the root cause of it. Any help will be appreciated.
Thanks.
I faced the same problem, and the solution was to shade Guava to avoid the classpath collision.
If you're using sbt assembly to build your jar, you can just add this to your build.sbt:
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("com.google.**" -> "shadeio.#1").inAll
)
I wrote a blog post which describes how I arrived at this solution: Making Hadoop 2.6 + Spark-Cassandra Driver Play Nice Together.
Hope it helps!
The issue is related to a Guava version mismatch.
withFallback was added in Guava 14, so it looks like you have Guava < 14 on your classpath.
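A quick way to verify which Guava jar is actually loaded at runtime is to print the code source of the Futures class; a minimal sketch you could run from the driver or inside a task:
public class GuavaCheck {
    public static void main(String[] args) {
        // Prints the jar that Guava's Futures class was loaded from
        System.out.println(com.google.common.util.concurrent.Futures.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}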
Adding to @Arjones' answer, if you are using Gradle + the Gradle Shadow plugin, you can add this to your build.gradle to relocate or rename the Guava classes:
shadowJar {
relocate 'com.google.common', 'com.example.com.google.common'
}
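With the Shadow plugin applied, building the relocated fat jar is then just a matter of running the shadowJar task (the exact output jar name depends on your project settings):
./gradlew shadowJar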

Unable to execute jar file (with dependencies) which is built from Gradle

I am very new to Gradle. I was trying to build a Java file which depends on another jar file. It builds properly, but when I try to execute it, it gives a "NoClassDefFoundError".
My build.gradle file is:
apply plugin : 'java'
jar {
manifest {
attributes 'Main-Class': 'Hey'
}
}
dependencies
{
compile files('lib/BuildBasicJavaProject.jar') ------line A
}
If I remove the above line A, the project does not even build.
If I keep line A, it builds properly and produces the jar file, but when I execute it using
java -jar jarfilename.jar
it gives me a NoClassDefFoundError.
Where do I need to specify the dependencies' path while running the jar file?
Maybe it's a basic question, but I have already wasted two days on it. I tried:
1) giving the absolute path of the dependency file
2) adding the following line:
runtime files('lib/BuildBasicJavaProject.jar')
But I did not succeed.
Thanks in advance
First, welcome to the Gradle world.
Your Gradle script seems to be correct. When one jar depends on another at compile time, as in your case, you define a compile-time dependency, like you did. If you also need that jar at run time, you would need a runtime dependency; but Gradle automatically makes all compile-time dependencies available at run time too, so you do not need to specify them explicitly.
So why is your code not working?
The classpath (-cp) option is ignored when using the -jar option, so you cannot specify the dependent jar with -cp in that case. Instead, if you are on Windows, you have to write:
java -cp myJar.jar;.\lib\BuildBasicJavaProject.jar Hey
or use colons (:) and forward slashes (/) on Linux.
Here Hey is the fully qualified name of your main class, which has to be defined in the manifest.
So if your class Hey is in the package com.alabala.dev, its fully qualified name is com.alabala.dev.Hey, and you have to tell Gradle:
mainClassName = "com.alabala.dev.Hey"
Now Gradle puts it in the manifest, and when you load the jar in the JVM, it will know that to start it, it has to execute com.alabala.dev.Hey.
What is cp and why do you have to specify it? Put simply, cp is the classpath: the directories and archives the JVM searches when it wants to load something. This has nothing to do with Gradle; it is plain Java.
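Alternatively, if you want java -jar jarfilename.jar to keep working without -cp, a minimal sketch (assuming you copy BuildBasicJavaProject.jar into a lib/ folder next to the jar you run) is to add a Class-Path entry to the manifest in build.gradle:
jar {
    manifest {
        attributes 'Main-Class': 'Hey',
                   'Class-Path': 'lib/BuildBasicJavaProject.jar'
    }
}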
You'll want to specify the dependency jar(s) as part of the classpath when you are executing your jar.
Something along these lines:
java -cp myJar.jar:./lib/BuildBasicJavaProject.jar my.package.MyMainClass
Bear in mind that classpath delimiters differ between platforms (: is for *nix-based systems).
