Running spark-submit with --master yarn-cluster: issue with spark-assembly - hadoop

I am running Spark 1.1.0, HDP 2.1, on a kerberized cluster. I can successfully run spark-submit using --master yarn-client and the results are properly written to HDFS, however, the job doesn't show up on the Hadoop All Applications page. I want to run spark-submit using --master yarn-cluster but I continue to get this error:
appDiagnostics: Application application_1417686359838_0012 failed 2 times due to AM Container
for appattempt_1417686359838_0012_000002 exited with exitCode: -1000 due to: File does not
exist: hdfs://<HOST>/user/<username>/.sparkStaging/application_<numbers>_<more numbers>/spark-assembly-1.1.0-hadoop2.4.0.jar
.Failing this attempt.. Failing the application.
I've provisioned my account with access to the cluster. I've configured yarn-site.xml. I've cleared .sparkStaging. I've tried including --jars [path to my spark assembly in spark/lib]. I've found this question that is very similar, yet unanswered. I can't tell if this is a 2.1 issue, spark 1.1.0, kerberized cluster, configurations, or what. Any help would be much appreciated.

This is probably because you left sparkConf.setMaster("local[n]") in the code.

Related

how to switch between cluster types in Apache Spark

I'm trying to switch cluster manager from standalone to 'YARN' in Apache Spark that I've installed for learning.
I read following thread to understand which cluster type should be chosen
However, I'd like to know the steps/syntax to change the cluster type.
Ex: from Standalone to YARN or from YARN to Standalone.
In spark there is one function name as --master that can helps you to execute your script on yarn Cluster mode or standalone mode.
Run the application on local mode or standalone used this with spark-submit command
--master Local[*]
or
--master spark://192.168.10.01:7077 \
--deploy-mode cluster \
Run on a YARN cluster
--master yarn
--deploy-mode cluster
For more information kindly visit this link.
https://spark.apache.org/docs/latest/submitting-applications.html
If you are not running through command line then you can directly set this master on SparkConf object.
sparkConf.setMaster(http://path/to/master/url:port) in cluster mode
or
sparkConf.setMaster(local[*]) in client/local mode

Can't see Yarn Job when doing Spark-Submit on Yarn Cluster

I am using spark-submit for my job with the command below:
spark-submit script_test.py --master yarn --deploy-mode cluster
spark-submit script_test.py --master yarn-cluster --deploy-mode cluster
The job is working fine. I can see it under the Spark History Server UI. However, I cannot see it under the RessourceManager UI ( YARN).
I have the feeling that my job is not sent to the cluster but it is running only in one node. However, I see nothing wrong on the way I use the Spark-submit command.
Am-i wrong? How can I check it? Or send the job to yarn cluster?
When you are using --master yarn means that in some place you have configured the yarn-site with hosts, ports, and so on.
Maybe the machine where you are using the spark-submit doesn't know where is the Yarn master.
You could check your hadoop/yarn/spark config files, specially the yarn-site.xml to check if the host of the Resource Manager is correct or not.
Those files are in different folders depending on which distribution of Hadoop you are using. In HDP I guess they are in /etc/hadoop/conf
Hope it helps.

Spark cluster mode. Deployment of application jar

I want to run a maven project in spark cluster mode. I have the application jar file. I also have one master and 6 workers in working condition. But when I execute the jar application, the work is not getting distributed among the workers. The following is the command I gave from the spark directory.
./bin/spark-submit --class org.deeplearning4j.mlp.MnistMLPExample --master spark://115.145.173.152:7077 --driver-memory 10g /home/hadoop/Niki/mnist/target/dl4j-spark-0.7-SNAPSHOT-bin.jar.
If I add another parameter --deploy-mode cluster, Then its throwing exception as follows:
Exception in thread "main" com.beust.jcommander.ParameterException: Unknown option: --deploy-mode
Can anyone help me out. Thanks a lot
Hi Nikitha yes you need jar file in all worker nodes because spark transformations and actions will execute on worker nodes and if they use this path they search file in there local path so distribute it on all worker nodes also Can you please tell why you use this jar file path in spark code.
You are running spark in standalone mode. There is no cluster/client mode in standalone. It is relvent in yarn only.

What is the master URL in EC2 spark cluster

I have a spark cluster launched using spark-ec2 script.
(EDIT: after login into the master), I can run spark jobs locally on the master node as :
spark-submit --class myApp --master local myApp.jar
But I can't seem to run the job in the cluster mode:
../spark/bin/spark-submit --class myApp --master spark://54.111.111.111:7077 --deploy-mode cluster myApp.jar
The ip address of the master is obtained from the AWS console.
I get the following errors:
WARN RestSubmissionClient: Unable to connect to server
Warning: Master endpoint spark://54.111.111.111:7077 was not a REST server. Falling back to legacy submission gateway instead.
Error connecting to master (akka.tcp://sparkMaster#54.111.111.111:7077).
Cause was: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster#54.177.156.236:7077
No master is available, exiting.
How to submit to a EC2 spark cluster ?
When you run with --master local you are also not connecting to the master. You are executing Spark operations in the same JVM as the application. (See docs.)
Your application code may be wrong too. So first just try to run spark-shell on the master node. /root/spark/bin/spark-shell is configured to connect to the EC2 Spark master when started without flags. If that works, you can try spark-shell --master spark://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:7077 on your laptop. Be sure to use the external IP or hostname of the master machine.
If that works too, try running your application in client mode (without --deploy-mode cluster). Hopefully in the course of trying all these, you will figure out what was wrong with your original approach. Good luck!
This is nothing to do with EC2, I had similar error on my server. I was able to resolve it by overwriting spark-env.sh SPARK_MASTER_IP.

Spark Standalone Mode: Worker not starting properly in cloudera

I am new to the spark, After installing the spark using parcels available in the cloudera manager.
I have configured the files as shown in the below link from cloudera enterprise:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
After this setup, I have started all the nodes in the spark by running /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-all.sh. But I couldn't run the worker nodes as I got the specified error below.
[root#localhost sbin]# sh start-all.sh
org.apache.spark.deploy.master.Master running as process 32405. Stop it first.
root#localhost.localdomain's password:
localhost.localdomain: starting org.apache.spark.deploy.worker.Worker, logging to /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain: failed to launch org.apache.spark.deploy.worker.Worker:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
localhost.localdomain: at gnu.java.lang.MainThread.run(libgcj.so.10)
localhost.localdomain: full log in /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain:starting org.apac
When I run jps command, I got:
23367 Jps
28053 QuorumPeerMain
28218 SecondaryNameNode
32405 Master
28148 DataNode
7852 Main
28159 NameNode
I couldn't run the worker node properly. Actually I thought to install a standalone spark where the master and worker work on a single machine. In slaves file of spark directory, I given the address as "localhost.localdomin" which is my host name. I am not aware of this settings file. Please any one cloud help me out with this installation process. Actually I couldn't run the worker nodes. But I can start the master node.
Thanks & Regards,
bips
Please notice error info below:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
I met the same error when I installed and started Spark master/workers on CentOS 6.2 x86_64 after making sure that libgcj.x86_64 and libgcj.i686 had been installed on my server, finally I solved it. Below is my solution, wish it can help you.
It seem as if your JAVA_HOME environment parameter didn't set correctly.
Maybe, your JAVA_HOME links to system embedded java, e.g. java version "1.5.0".
Spark needs java version >= 1.6.0. If you are using java 1.5.0 to start Spark, you will see this error info.
Try to export JAVA_HOME="your java home path", then start Spark again.

Resources