Getting java.lang.ClassNotFoundException when using spark-submit - spark-streaming

I am getting the following error when using spark-submit in local mode:
java.lang.ClassNotFoundException: com.mytwitter.spark.TwitterStreaming
The error occurs when I run this command in the terminal:
spark-submit --class com.mytwitter.spark.TwitterStreaming /home/hadoop/workspace/Twitter/target/Twitter-0.0.1-SNAPSHOT-jar-with-dependencies.jar
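A quick sanity check, as a minimal sketch (the package layout here is assumed, not taken from the question): the value passed to --class must exactly match the package and object name compiled into the jar.
// File: src/main/scala/com/mytwitter/spark/TwitterStreaming.scala (assumed path)
package com.mytwitter.spark

object TwitterStreaming {
  def main(args: Array[String]): Unit = {
    // Twitter streaming logic would go here
  }
}
If the object lives in a different package, or the class never made it into the assembly, spark-submit fails with exactly this ClassNotFoundException; jar tf Twitter-0.0.1-SNAPSHOT-jar-with-dependencies.jar | grep TwitterStreaming is a quick way to confirm the class is actually inside the jar.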

Related

spark-submit to a docker container

I created a Spark cluster using this repository and its documentation.
Now I'm trying to use spark-submit to run a job inside the Spark master's Docker container, so the command I use is something like this:
/path/bin/spark-submit --class uk.ac.ncl.NGS_SparkGATK.Pipeline \
--master spark://spark-master:7077 NGS-SparkGATK.jar HelloWorld
The problem is that I receive Failed to connect to master spark-master:7077.
I tried every combination: container IP, container ID, container name, localhost, 0.0.0.0, 127.0.0.1, but I always receive the same error.
However, if I use --master local[*], the application works.
What am I missing?
The problem was that I had to use the hostname in spark://spark-master:7077.
So inside the Spark master it is something like this:
SPARK_MASTER_HOST=`hostname`
/path/bin/spark-submit --class uk.ac.ncl.NGS_SparkGATK.Pipeline \
--master spark://$SPARK_MASTER_HOST:7077 NGS-SparkGATK.jar HelloWorld
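For completeness, a minimal sketch (assuming Spark 2.x's SparkSession API; the question does not show the Pipeline code) that leaves the master unset in the application, so the spark://$SPARK_MASTER_HOST:7077 value given to --master is used rather than being overridden by a hard-coded value:
import org.apache.spark.sql.SparkSession

object Pipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NGS-SparkGATK")   // no .master(...) here; spark-submit --master decides
      .getOrCreate()
    println(s"Running against master: ${spark.sparkContext.master}")
    spark.stop()
  }
}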

Running pysparkling-water using Livy spark failed

I have been able to run the ChicagoCrimeDemo.py script using spark-submit successfully (spark-submit --master=yarn-client --py-files /opt/sparkling-water-1.6.10/py/build/dist/h2o_pysparkling_1.6-1.6.10-py2.7.egg /opt/sparkling-water-1.6.10/py/examples/scripts/ChicagoCrimeDemo.py).
However, when I try to execute the same script using Livy (Spark), I get the following error:

Spark submit with master as yarn-client (windows) gives Error "Could not find or load main class"

I have installed Hadoop 2.7.1 with Spark 1.4.1 on Windows 8.1.
When I execute the command below,
cd spark
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client lib/spark-examples*.jar 10
I get the error below in the JobHistoryServer log:
Error: Could not find or load main class '-Dspark.externalBlockStore.folderName=spark-262c4697-ef0c-4042-af0c-8106b08574fb'
I did further debugging (along with searching the net) and got hold of the container cmd script, where the sections below (other lines are omitted) appear:
...
#set CLASSPATH=C:/tmp/hadoop-xyz/nm-local-dir/usercache/xyz/appcache/application_1487502025818_0003/container_1487502025818_0003_02_000001/classpath-3207656532274684591.jar
...
#call %JAVA_HOME%/bin/java -server -Xmx512m -Djava.io.tmpdir=%PWD%/tmp '-Dspark.fileserver.uri=http://192.168.1.2:34814' '-Dspark.app.name=Spark shell' '-Dspark.driver.port=34810' '-Dspark.repl.class.uri=http://192.168.1.2:34785' '-Dspark.driver.host=192.168.1.2' '-Dspark.externalBlockStore.folderName=spark-dd9f3f84-6cf4-4ff8-b0f6-7ff84daf74bc' '-Dspark.master=yarn-client' '-Dspark.driver.appUIAddress=http://192.168.1.2:4040' '-Dspark.jars=' '-Dspark.executor.id=driver' -Dspark.yarn.app.container.log.dir=/dep/logs/userlogs/application_1487502025818_0003/container_1487502025818_0003_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '192.168.1.2:34810' --executor-memory 1024m --executor-cores 1 --num-executors 2 1> /dep/logs/userlogs/application_1487502025818_0003/container_1487502025818_0003_02_000001/stdout 2> /dep/logs/userlogs/application_1487502025818_0003/container_1487502025818_0003_02_000001/stderr
I checked the relevant files for CLASSPATH and they look OK. The main class org.apache.spark.deploy.yarn.ExecutorLauncher is available in the Spark assembly jar, which is part of the container classpath jar.
So, what could be the issue here? I searched the net and found many discussions, but they are mostly for Unix variants, not for Windows. I am wondering whether spark-submit really works on Windows in yarn-client mode (standalone cluster mode works) without any special setup.
BTW, if I run the above java command from the cmd.exe command prompt, I get the same error, because all the command-line arguments are quoted with single quotes instead of double quotes (switching them to double quotes makes the command work!). So is this a bug?
Note that spark-shell also fails (in yarn mode), but the yarn jar ... command works.
Looks like it was a defect in the earlier versions. With the latest Hadoop 2.7.3 and Spark 2.1.0 it works correctly, although I could not find any reference for the fix.

Spark not launching on Windows Yarn

I ran into the following issue: Spark cannot launch on Windows YARN.
15/06/05 06:31:34 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:114)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
When I drill down into YARN, I get the error:
Error: Could not find or load main class '-Dspark.driver.memory=2G'
After investigating this issue, the root cause is: in the YARN command generation part, single quotes (') are added around some of the Java options, but Windows does not handle these quotes correctly.
This is similar to https://issues.apache.org/jira/browse/SPARK-5754.
Does anyone know how to work around this quoting logic on a Windows YARN cluster?
Spark Submit Command:
%SPARK_HOME%\bin\spark-submit.cmd --jars ... ^
--class ....^
--master yarn-client ^
--driver-memory 10G ^
--executor-memory 20G ^
--executor-cores 6 ^
--num-executors 10 ^
QuasarNRT.jar 10 6
-Tao
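A rough illustration only (this is not Spark's launcher code, just a sketch of the argument handling described above): a Unix shell strips the single quotes, so the JVM sees a token starting with - and treats it as an option, while cmd.exe passes the quotes through literally, so the first token no longer starts with - and java takes it for the main class name, producing Could not find or load main class '-Dspark.driver.memory=2G'.
object QuoteDemo {
  // Mimics how the java launcher decides: options start with "-", anything else
  // is taken as the main class name.
  def classify(arg: String): String =
    if (arg.startsWith("-")) s"JVM option: $arg" else s"main class candidate: $arg"

  def main(args: Array[String]): Unit = {
    println(classify("-Dspark.driver.memory=2G"))   // what java sees after a Unix shell
    println(classify("'-Dspark.driver.memory=2G'")) // what java sees from cmd.exe
  }
}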

Spark-submit not working when application jar is in hdfs

I'm trying to run a Spark application using bin/spark-submit. When I reference my application jar in my local filesystem, it works. However, when I copy my application jar to a directory in HDFS, I get the following exception:
Warning: Skip remote jar hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar.
java.lang.ClassNotFoundException: com.example.SimpleApp
Here's the command:
$ ./bin/spark-submit --class com.example.SimpleApp --master local hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar
I'm using Hadoop version 2.6.0 and Spark version 1.2.1.
The only way it worked for me was when using
--master yarn-cluster
To make a jar stored in HDFS accessible to the Spark job, you have to run the job in cluster mode.
$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--class <main_class> \
--master yarn-cluster \
hdfs://myhost:8020/user/root/myjar.jar
Also, there is a Spark JIRA for client mode, which is not supported yet:
SPARK-10643: Support HDFS application download in client mode spark submit
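A hedged sketch of another client-mode workaround (not taken from the answers here; it assumes the Hadoop FileSystem API is on the classpath and uses illustrative paths): copy the jar out of HDFS to a local path first, then submit the local copy.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object FetchJar {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")
    val fs = FileSystem.get(conf)
    // Copy the application jar from HDFS onto the driver host's local filesystem
    fs.copyToLocalFile(
      new Path("/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar"),
      new Path("/tmp/simple-project-1.0-SNAPSHOT.jar"))
    fs.close()
    // Then: ./bin/spark-submit --class com.example.SimpleApp --master local /tmp/simple-project-1.0-SNAPSHOT.jar
  }
}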
There is a workaround: you could mount the directory in HDFS (which contains your application jar) as a local directory.
I did the same (with Azure blob storage, but it should be similar for HDFS).
Example mount command for Azure (wasb):
sudo mount -t cifs //{storageAccountName}.file.core.windows.net/{directoryName} {local directory path} -o vers=3.0,username={storageAccountName},password={storageAccountKey},dir_mode=0777,file_mode=0777
Now, in your spark-submit command, provide the path from the command above:
$ ./bin/spark-submit --class com.example.SimpleApp --master local {local directory path}/simple-project-1.0-SNAPSHOT.jar
spark-submit --master spark://kssr-virtual-machine:7077 --deploy-mode client --executor-memory 1g hdfs://localhost:9000/user/wordcount.py
It works for me; I am using Hadoop 3.3.1 & Spark 3.2.1, and I am able to read the file from HDFS.
Yes, it has to be a local file. I think that's simply the answer.