I have created a "Hello World" Spark application that works well locally through the Eclipse IDE.
I would like to deploy this application remotely from my local machine to the Cloudera machine running in VirtualBox, using spark-submit.
The command line used for that is:
C:\Users\S-LAMARTI\Desktop\AXA\Workspaces\AXA\helloworld\target>%SPARK_HOME%/spark-submit --class com.saadlamarti.helloworld.App --master spark://192.168.56.102:7077 --deploy-mode cluster helloworld-0.0.1-SNAPSHOT.jar
Unfortunately, the application doesn't work, and I get this error message:
15/10/12 12:20:40 WARN RestSubmissionClient: Unable to connect to server spark://192.168.56.102:7077.
Warning: Master endpoint spark://192.168.56.102:7077 was not a REST server. Falling back to legacy submission gateway instead.
Does anyone have any idea why it is not working?
Remove the argument --deploy-mode cluster and try again.
Check the master web UI at master:8080; there you will see two URLs: one is the client submission URL and the other is the REST URL for cluster mode.
Find your REST URL; if you set the argument --deploy-mode cluster, you must set --master to that REST URL.
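For example, a minimal sketch of the corrected submission, assuming the REST URL shown on the master UI uses the default standalone REST port 6066 (use whatever REST URL your UI actually lists):
%SPARK_HOME%/spark-submit --class com.saadlamarti.helloworld.App --master spark://192.168.56.102:6066 --deploy-mode cluster helloworld-0.0.1-SNAPSHOT.jar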
The Problem: As expected, OS users are able to launch and own a Spark Streaming application. However, when we try to run a job where the owner of the application is not an OS user, Spark Streaming returns an error saying that the user was not found, as you can see in the output from the spark-submit command:
main : run as user is 'user_name'
main : requested yarn user is 'user_name'
User 'user_name' not found
I have already seen this error in some other forums and the recommendation was to create the OS user, but unfortunately that is not an option here. In Storm applications a Kerberos-only user can be used in combination with an OS user, but this does not seem to be the case in Spark.
What I have tried so far: The closest I could get was to use two OS users, where one has read access to the keytab file of the second. I ran the application from the first to 'impersonate' the second, and the second appears as the owner. No errors appear as long as both are OS users, but it does fail when I use a Kerberos-only user as the second. Below you can see the submitted spark-streaming command (by the way, both are also HDFS users, otherwise launching would not be possible at all):
spark-submit --master yarn --deploy-mode cluster \
  --keytab /etc/security/keytabs/user_name.keytab \
  --principal kerberosOnlyUser@LOCAL \
  --files ./spark_jaas.conf#spark_jaas.conf,./user_name_copy.keytab#user_name_copy.keytab \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=./spark_jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./spark_jaas.conf" \
  --driver-java-options "-Djava.security.auth.login.config=./spark_jaas.conf" \
  --conf spark.yarn.submit.waitAppCompletion=true --class ...
I also tried the alternative with the --proxy-user option, but the same error was returned.
Is it really not possible to use a Kerberos-only user in Spark? Or is there a workaround?
The environment is:
Spark 2.3.0 on YARN.
Hadoop 2.7.3.
Thanks a lot for your help!
I want to run a Maven project in Spark cluster mode. I have the application jar file. I also have one master and six workers in working condition. But when I execute the application jar, the work is not getting distributed among the workers. The following is the command I ran from the Spark directory:
./bin/spark-submit --class org.deeplearning4j.mlp.MnistMLPExample --master spark://115.145.173.152:7077 --driver-memory 10g /home/hadoop/Niki/mnist/target/dl4j-spark-0.7-SNAPSHOT-bin.jar
If I add another parameter, --deploy-mode cluster, then it throws the following exception:
Exception in thread "main" com.beust.jcommander.ParameterException: Unknown option: --deploy-mode
Can anyone help me out? Thanks a lot.
Hi Nikitha, yes, you need the jar file on all worker nodes, because Spark transformations and actions execute on the worker nodes, and if they use this path they will look for the file on their local filesystem. So distribute it to all worker nodes as well. Also, can you please tell us why you use this jar file path in your Spark code?
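For example, a minimal sketch of distributing the jar, assuming hypothetical worker hostnames worker1 through worker6 and that the same target directory exists on every node:
for w in worker1 worker2 worker3 worker4 worker5 worker6; do
  scp /home/hadoop/Niki/mnist/target/dl4j-spark-0.7-SNAPSHOT-bin.jar $w:/home/hadoop/Niki/mnist/target/
done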
You are running Spark in standalone mode. There is no cluster/client deploy mode in standalone; it is relevant in YARN only.
Currently I am using a Cloudera Hadoop single-node cluster (Kerberos enabled).
In client mode I use the following commands:
kinit
spark-submit --master yarn-client --proxy-user cloudera examples/src/main/python/pi.py
This works fine. In cluster mode I use the following command (no kinit done and no TGT present in the cache):
spark-submit --principal <myprinc> --keytab <KT location> --master yarn-cluster examples/src/main/python/pi.py
This also works fine. But when I use the following command in cluster mode (no kinit done and no TGT present in the cache):
spark-submit --principal <myprinc> --keytab <KT location> --master yarn-cluster --proxy-user <proxy-user> examples/src/main/python/pi.py
it throws the following error:
<proxy-user> tries to renew a token with renewer <myprinc>
I guess that in cluster mode spark-submit does not look for a TGT on the client machine; it transfers the keytab file to the cluster and then starts the Spark job. So why does specifying the --proxy-user option look for a TGT when submitting in yarn-cluster mode? Am I doing something wrong?
Spark doesn't allow submitting a keytab and principal together with --proxy-user. The feature description in the official documentation for YARN mode (second paragraph) states specifically that you need a keytab and principal when you are running long-running jobs; this is what enables the application to keep working without any security issue.
Imagine if every user logging into your application could proxy to your keytab.
I had to do what Hive does to run spark-submit: basically kinit before submitting my application and then provide a proxy user. So here is how I solved it:
kinit (with -k -t, i.e. using the keytab and principal)
spark-submit with --proxy-user
That is the best approach, so no, you are not doing anything wrong.
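Concretely, a minimal sketch using the placeholders from the question, with no --principal or --keytab passed to spark-submit itself:
kinit -k -t <KT location> <myprinc>
spark-submit --master yarn-cluster --proxy-user <proxy-user> examples/src/main/python/pi.py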
I have a Spark cluster launched using the spark-ec2 script.
(EDIT: after logging into the master), I can run Spark jobs locally on the master node as:
spark-submit --class myApp --master local myApp.jar
But I can't seem to run the job in cluster mode:
../spark/bin/spark-submit --class myApp --master spark://54.111.111.111:7077 --deploy-mode cluster myApp.jar
The IP address of the master is taken from the AWS console.
I get the following errors:
WARN RestSubmissionClient: Unable to connect to server
Warning: Master endpoint spark://54.111.111.111:7077 was not a REST server. Falling back to legacy submission gateway instead.
Error connecting to master (akka.tcp://sparkMaster@54.111.111.111:7077).
Cause was: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@54.177.156.236:7077
No master is available, exiting.
How do I submit to an EC2 Spark cluster?
When you run with --master local you are also not connecting to the master. You are executing Spark operations in the same JVM as the application. (See docs.)
Your application code may be wrong too. So first just try to run spark-shell on the master node. /root/spark/bin/spark-shell is configured to connect to the EC2 Spark master when started without flags. If that works, you can try spark-shell --master spark://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:7077 on your laptop. Be sure to use the external IP or hostname of the master machine.
If that works too, try running your application in client mode (without --deploy-mode cluster). Hopefully in the course of trying all these, you will figure out what was wrong with your original approach. Good luck!
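For example, a minimal client-mode sketch from your laptop, assuming the master's external hostname from the spark-shell test above (substitute your own):
../spark/bin/spark-submit --class myApp --master spark://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:7077 myApp.jar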
This has nothing to do with EC2; I had a similar error on my own server. I was able to resolve it by overriding SPARK_MASTER_IP in spark-env.sh.
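A minimal sketch of that fix, with an illustrative address (use whichever address your workers and clients actually reach the master on):
# in conf/spark-env.sh on the master node
export SPARK_MASTER_IP=54.111.111.111
# restart the master so it picks up the new address
../spark/sbin/stop-master.sh && ../spark/sbin/start-master.sh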
I am running Spark 1.1.0, HDP 2.1, on a kerberized cluster. I can successfully run spark-submit using --master yarn-client and the results are properly written to HDFS; however, the job doesn't show up on the Hadoop All Applications page. I want to run spark-submit using --master yarn-cluster, but I continue to get this error:
appDiagnostics: Application application_1417686359838_0012 failed 2 times due to AM Container for appattempt_1417686359838_0012_000002 exited with exitCode: -1000 due to: File does not exist: hdfs://<HOST>/user/<username>/.sparkStaging/application_<numbers>_<more numbers>/spark-assembly-1.1.0-hadoop2.4.0.jar
.Failing this attempt.. Failing the application.
I've provisioned my account with access to the cluster. I've configured yarn-site.xml. I've cleared .sparkStaging. I've tried including --jars [path to my Spark assembly in spark/lib]. I've found a very similar question, yet it is unanswered. I can't tell if this is an HDP 2.1 issue, a Spark 1.1.0 issue, the kerberized cluster, configuration, or something else. Any help would be much appreciated.
This is probably because you left sparkConf.setMaster("local[n]") in the code.
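In other words, let spark-submit control the master rather than hard-coding it in the SparkConf. A minimal sketch, assuming your application no longer calls setMaster and that <your main class> and your-app.jar stand in for your actual class and jar:
spark-submit --class <your main class> --master yarn-cluster your-app.jar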