Spark is running but YARN can't find it - Hadoop

I have a problem with my YARN cluster.
I run the HDFS NameNode, an HDFS DataNode, and YARN on localhost, and then also start a Spark master and a Spark worker on localhost, like this:
$ jps
5809 Main
53730 ResourceManager
53540 SecondaryNameNode
53125 NameNode
56710 Master
54009 NodeManager
56809 Worker
53308 DataNode
56911 Jps
I can see that the Spark worker is linked to the Spark master through http://127.0.0.1:8080:
(image: Spark web UI showing the worker registered with the master)
But in the YARN web UI at http://127.0.0.1:8088, there is nothing on the "Nodes of the cluster" page:
(image: YARN web UI with an empty node list)
My conf/spark-env.sh is
export SCALA_HOME="/opt/scala-2.11.8/"
export JAVA_HOME="/opt/jdk1.8.0_101/"
export HADOOP_HOME="/opt/hadoop-2.7.3/"
export HADOOP_CONF_DIR="/opt/hadoop-2.7.3/etc/hadoop/"
export SPARK_MASTER_IP=127.0.0.1
export SPARK_LOCAL_DIRS="/opt/spark-2.0.0-bin-hadoop2.7/"
export SPARK_DRIVER_MEMORY=1G
And conf/spark-defaults.conf is
spark.master spark://127.0.0.1:7077
spark.yarn.submit.waitAppCompletion false
spark.yarn.access.namenodes hdfs://127.0.0.1:8032
And yarn-site.xml is
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>127.0.0.1</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
</configuration>
When I submit an application with
spark-submit --master yarn --deploy-mode cluster test.py
I get output like this:
16/10/12 16:19:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/12 16:19:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
16/10/12 16:19:30 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/10/12 16:19:30 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/10/12 16:19:30 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/10/12 16:19:30 INFO yarn.Client: Setting up container launch context for our AM
16/10/12 16:19:30 INFO yarn.Client: Setting up the launch environment for our AM container
16/10/12 16:19:30 INFO yarn.Client: Preparing resources for our AM container
16/10/12 16:19:31 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/10/12 16:19:32 INFO yarn.Client: Uploading resource file:/opt/spark-2.0.0-bin-hadoop2.7/spark-3cdb2435-d6a0-4ce0-a54a-f2849d5f4909/__spark_libs__2140674596658903486.zip -> hdfs://127.0.0.1:9000/user/fuxiuyin/.sparkStaging/application_1476256306830_0002/__spark_libs__2140674596658903486.zip
16/10/12 16:19:33 INFO yarn.Client: Uploading resource file:/home/fuxiuyin/PycharmProjects/spark-test/test.py -> hdfs://127.0.0.1:9000/user/fuxiuyin/.sparkStaging/application_1476256306830_0002/test.py
16/10/12 16:19:33 INFO yarn.Client: Uploading resource file:/opt/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip -> hdfs://127.0.0.1:9000/user/fuxiuyin/.sparkStaging/application_1476256306830_0002/pyspark.zip
16/10/12 16:19:33 INFO yarn.Client: Uploading resource file:/opt/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip -> hdfs://127.0.0.1:9000/user/fuxiuyin/.sparkStaging/application_1476256306830_0002/py4j-0.10.1-src.zip
16/10/12 16:19:33 INFO yarn.Client: Uploading resource file:/opt/spark-2.0.0-bin-hadoop2.7/spark-3cdb2435-d6a0-4ce0-a54a-f2849d5f4909/__spark_conf__3570291475444079549.zip -> hdfs://127.0.0.1:9000/user/fuxiuyin/.sparkStaging/application_1476256306830_0002/__spark_conf__.zip
16/10/12 16:19:33 INFO spark.SecurityManager: Changing view acls to: fuxiuyin
16/10/12 16:19:33 INFO spark.SecurityManager: Changing modify acls to: fuxiuyin
16/10/12 16:19:33 INFO spark.SecurityManager: Changing view acls groups to:
16/10/12 16:19:33 INFO spark.SecurityManager: Changing modify acls groups to:
16/10/12 16:19:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(fuxiuyin); groups with view permissions: Set(); users with modify permissions: Set(fuxiuyin); groups with modify permissions: Set()
16/10/12 16:19:33 INFO yarn.Client: Submitting application application_1476256306830_0002 to ResourceManager
16/10/12 16:19:33 INFO impl.YarnClientImpl: Submitted application application_1476256306830_0002
16/10/12 16:19:33 INFO yarn.Client: Application report for application_1476256306830_0002 (state: ACCEPTED)
16/10/12 16:19:33 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1476260373944
final status: UNDEFINED
tracking URL: http://localhost:8088/proxy/application_1476256306830_0002/
user: fuxiuyin
16/10/12 16:19:33 INFO util.ShutdownHookManager: Shutdown hook called
16/10/12 16:19:33 INFO util.ShutdownHookManager: Deleting directory /opt/spark-2.0.0-bin-hadoop2.7/spark-3cdb2435-d6a0-4ce0-a54a-f2849d5f4909
The submission succeeds, but in the YARN web UI the app never runs; it stays in ACCEPTED.
It looks like no Spark node ever runs this app.
Can anyone tell me what's wrong?
Thanks~

You can specify only one type of cluster manager:
YARN (cluster or client mode)
Spark standalone
Mesos
You have started a Spark standalone master and you're connecting to that cluster manager. If you want to run Spark on YARN, you should specify the YARN master - just --master yarn.
Edit:
Please add the logs and the spark-submit command. Please also post how you are launching YARN. If the first attempt was wrong, it means you have a configuration problem.
Edit number 2: It seems that YARN doesn't have enough resources to process your application. Check your config, e.g. whether increasing yarn.nodemanager.resource.memory-mb helps. You can also go to the Spark web UI - http://application-master-ip:4040 - and see information from the Spark context.
Also, you can check whether you can deploy the application to Spark Standalone (which you are also starting), just by setting --master spark://... as in your configuration. Then you will know whether the problem is with YARN or with Spark.
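For example, using the master URL from your spark-defaults.conf, a standalone submission (bypassing YARN entirely) would look roughly like this sketch:
spark-submit --master spark://127.0.0.1:7077 test.py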
BTW, you can skip running Spark Standalone entirely if you're submitting to YARN :) The memory used by the standalone workers can then be used by YARN.
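If you go with YARN only, a rough sketch of stopping the standalone daemons to free their memory (assuming SPARK_HOME points at your Spark install):
$SPARK_HOME/sbin/stop-all.sh    # stops the standalone master and all workers listed in conf/slaves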

:) Thanks everyone, and sorry to have wasted your time. When I checked the resources at http://localhost:8088/ I noticed the problem.
I just stopped the servers and deleted the tmp and logs directories. Then it worked.
Thank you again
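For reference, the cleanup described above amounts to roughly the following sketch; the paths are assumptions, and deleting hadoop.tmp.dir destroys HDFS data, so re-formatting the NameNode is only safe on a throwaway cluster:
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/stop-dfs.sh
rm -rf $HADOOP_HOME/logs/*        # assumed log directory
rm -rf /tmp/hadoop-$USER          # assumed default hadoop.tmp.dir
hdfs namenode -format             # only if the NameNode data lived under the removed tmp dir
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh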

Related

Submit a Spark job on a Yarn cluster from a remote client

I want to submit a Spark job on a remote YARN cluster using the spark-submit command. My client is a Windows machine and the cluster is composed of a master and 4 slaves. I copied the Hadoop config files from my cluster to the remote machine, namely core-site.xml and yarn-site.xml and set the HADOOP_CONF_DIR variable in spark-env.sh to point to them.
However, when I submit a job using the following command:
spark-submit --jars hdfs:///user/kmansour/elevation/geotrellis-1.2.1-assembly.jar \
--class tutorial.CalculateFlowDirection hdfs:///user/kmansour/elevation/demo_2.11-0.2.0.jar hdfs:///user/kmansour/elevation/TIF/DTM_1m_19_E_17_108_*.tif \
--deploy-mode cluster \
--master yarn
I get stuck with:
INFO yarn.Client: Application report for application_1519070657292_0088 (state: ACCEPTED)
Until I get this:
diagnostics: Application application_1519070657292_0088 failed 2 times due to AM Container for appattempt_1519070657292_0088_000002 exited with exitCode: 10
For more detailed output, check application tracking page:http://node1:8088/cluster/app/application_1519070657292_0088Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1519070657292_0088_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
at org.apache.hadoop.util.Shell.run(Shell.java:482)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
When I check the application tracking page, I get this on stderr:
18/03/13 14:48:05 INFO util.SignalUtils: Registered signal handler for TERM
18/03/13 14:48:05 INFO util.SignalUtils: Registered signal handler for HUP
18/03/13 14:48:05 INFO util.SignalUtils: Registered signal handler for INT
18/03/13 14:48:06 INFO yarn.ApplicationMaster: Preparing Local resources
18/03/13 14:48:08 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1519070657292_0088_000002
18/03/13 14:48:08 INFO spark.SecurityManager: Changing view acls to: kmansour
18/03/13 14:48:08 INFO spark.SecurityManager: Changing modify acls to: kmansour
18/03/13 14:48:08 INFO spark.SecurityManager: Changing view acls groups to:
18/03/13 14:48:08 INFO spark.SecurityManager: Changing modify acls groups to:
18/03/13 14:48:08 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kmansour); groups with view permissions: Set(); users with modify permissions: Set(kmansour); groups with modify permissions: Set()
18/03/13 14:48:08 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
18/03/13 14:50:15 ERROR yarn.ApplicationMaster: Failed to connect to driver at 132.156.9.98:50687, retrying ...
18/03/13 14:50:15 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:577)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:433)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:256)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:785)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
18/03/13 14:50:15 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
18/03/13 14:50:16 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
18/03/13 14:50:16 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://132.156.9.142:8020/user/kmansour/.sparkStaging/application_1519070657292_0088
18/03/13 14:50:16 INFO util.ShutdownHookManager: Shutdown hook called
The IP address of my master node is 132.156.9.142 and the IP address of my client is 132.156.9.98. The log shows me that the application master is attempting to connect to the driver on the client when I explicitly stated --deploy-mode cluster.
Shouldn't the driver be on a node in the cluster?
This is the content of my config files:
spark-defaults.conf:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://132.156.9.142:8020/events
spark.history.fs.logDirectory hdfs://132.156.9.142:8020/events
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.cores 2
spark.driver.memory 5g
spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.executor.instances 4
spark.executor.cores 2
spark.executor.memory 6g
spark.yarn.am.memory 2g
spark.yarn.jars hdfs://node1:8020/jars/*.jar
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>7168</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>5</value>
</property>
</configuration>
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://132.156.9.142:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>C:\Users\kmansour\Documents\hadoop-2.7.4\tmp</value>
</property>
</configuration>
I am very new to all this and perhaps my reasoning is flawed; any input or suggestions would help.
You need to change the order of the parameters passed to spark-submit. With your current command:
spark-submit --jars hdfs:///user/kmansour/elevation/geotrellis-1.2.1-assembly.jar \
--class tutorial.CalculateFlowDirection hdfs:///user/kmansour/elevation/demo_2.11-0.2.0.jar hdfs:///user/kmansour/elevation/TIF/DTM_1m_19_E_17_108_*.tif \
--deploy-mode cluster \
--master yarn
Spark is launched in its default mode (probably yarn-client), and your --deploy-mode and --master are passed as application parameters, because they come after the JAR file location. Change it to:
spark-submit --jars hdfs:///user/kmansour/elevation/geotrellis-1.2.1-assembly.jar \
--deploy-mode cluster \
--master yarn \
--class tutorial.CalculateFlowDirection hdfs:///user/kmansour/elevation/demo_2.11-0.2.0.jar hdfs:///user/kmansour/elevation/TIF/DTM_1m_19_E_17_108_*.tif
and you will get true yarn-cluster mode.
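As a general rule, everything that appears after the application JAR (or Python file) on the spark-submit command line is treated as an argument to the application itself, so the shape to follow is:
spark-submit [options such as --master, --deploy-mode, --class, --conf, --jars] <application-jar or .py file> [application arguments]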

Spark 1.6.2 & YARN: diagnostics: Application failed 2 times due to AM Container for exited with exitCode: -1

I have a cluster of 2 machines and am trying to submit a Spark job with the YARN cluster manager.
vanilla Spark 1.6.2 built against Hadoop 2.6.2
vanilla Hadoop 2.7.2
I can successfully run MapReduce jobs and Spark jobs with the standalone cluster manager, but when I run with YARN I get an error.
Any suggestions on how to get it to work?
How do I enable more verbose logging? The error message is absolutely unclear.
Why are no log files created under hadoop/logs/userlogs/applicationXXX?
Rhetorical question: IMO, Hadoop logging & diagnostics aren't very good. Why is that? Hadoop seems to be an established product.
Below is the output:
mike#mp-desktop ~/opt/hadoop $ spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster ~/prg/scala/spark-examples_2.11-1.0.jar 10
16/07/09 08:59:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/09 08:59:01 INFO client.RMProxy: Connecting to ResourceManager at mp-desktop/192.168.1.60:8050
16/07/09 08:59:01 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
16/07/09 08:59:01 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/07/09 08:59:01 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/07/09 08:59:01 INFO yarn.Client: Setting up container launch context for our AM
16/07/09 08:59:01 INFO yarn.Client: Setting up the launch environment for our AM container
16/07/09 08:59:01 INFO yarn.Client: Preparing resources for our AM container
16/07/09 08:59:02 INFO yarn.Client: Uploading resource file:/home/mike/opt/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/spark-assembly-1.6.2-hadoop2.6.0.jar
16/07/09 08:59:06 INFO yarn.Client: Uploading resource file:/home/mike/prg/scala/spark-examples_2.11-1.0.jar -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/spark-examples_2.11-1.0.jar
16/07/09 08:59:06 INFO yarn.Client: Uploading resource file:/tmp/spark-2ee6dfd6-e9d3-4ca4-9e98-5ce9e75dc757/__spark_conf__7114661171911035574.zip -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/__spark_conf__7114661171911035574.zip
16/07/09 08:59:06 INFO spark.SecurityManager: Changing view acls to: mike
16/07/09 08:59:06 INFO spark.SecurityManager: Changing modify acls to: mike
16/07/09 08:59:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mike); users with modify permissions: Set(mike)
16/07/09 08:59:07 INFO yarn.Client: Submitting application 1 to ResourceManager
16/07/09 08:59:07 INFO impl.YarnClientImpl: Submitted application application_1468043888852_0001
16/07/09 08:59:08 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:08 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1468043947113
final status: UNDEFINED
tracking URL: http://mp-desktop:8088/proxy/application_1468043888852_0001/
user: mike
16/07/09 08:59:09 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:10 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:11 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:12 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:13 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:14 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:15 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:16 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:17 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:18 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:19 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:20 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:21 INFO yarn.Client: Application report for application_1468043888852_0001 (state: FAILED)
16/07/09 08:59:21 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1468043888852_0001 failed 2 times due to AM Container for appattempt_1468043888852_0001_000002 exited with exitCode: -1
For more detailed output, check application tracking page:http://mp-desktop:8088/cluster/app/application_1468043888852_0001Then, click on links to logs of each attempt.
Diagnostics: File /home/mike/hadoopstorage/nm-local-dir/usercache/mike/appcache/application_1468043888852_0001/container_1468043888852_0001_02_000001 does not exist
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1468043947113
final status: FAILED
tracking URL: http://mp-desktop:8088/cluster/app/application_1468043888852_0001
user: mike
16/07/09 08:59:21 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1468043888852_0001
Exception in thread "main" org.apache.spark.SparkException: Application application_1468043888852_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/07/09 08:59:21 INFO util.ShutdownHookManager: Shutdown hook called
16/07/09 08:59:21 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2ee6dfd6-e9d3-4ca4-9e98-5ce9e75dc757
Thanks!
The error message I had was similar:
16/07/15 13:55:53 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:54 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:55 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:56 INFO Client: Application report for application_1468583505911_0002 (state: FAILED)
16/07/15 13:55:56 INFO Client:
client token: N/A
diagnostics: Application application_1468583505911_0002 failed 2 times due to AM Container for appattempt_1468583505911_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://<redacted>:8088/cluster/app/application_1468583505911_0002Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://<redacted>:8020/user/root/.sparkStaging/application_1468583505911_0002/__spark_conf__4995486282135454270.zip
java.io.FileNotFoundException: File does not exist: hdfs://<redacted>:8020/user/root/.sparkStaging/application_1468583505911_0002/__spark_conf__4995486282135454270.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
Try running with YARN in client mode instead of cluster mode, which prints the driver program's log to your shell:
spark-submit --class myClass --master yarn /path/to/myClass.jar
The log output showed that myClass was failing immediately because I had the incorrect number of args (the class expected more than 1 arg).
The class failed with my custom exit code (42) and printed "Usage" info to the log, allowing me to fix the actual problem.
When I ran with --master yarn-cluster, this output was not visible to me and I could not see the "Usage" information mentioned above. Instead, all I had was the vague "File does not exist" issue shown above.
Specifying the correct number of arguments to myClass resolved the issue.
At this point I've assumed that my Spark job failed so quickly that it started cleaning up the .sparkStaging files it had copied before YARN had checked for them.
You have probably solved your problem already, but I faced the same issue this morning with Spark 2.1 in YARN cluster mode and found this post. I had the same error as you did, and my problem was the SparkConf object, which needed:
from pyspark import SparkConf

conf = (SparkConf()
        .setMaster("yarn")          # I had this value set to "local"
        .setAppName("My app Name"))
So when I changed this and did my spark-submit (--master yarn --deploy-mode cluster), everything worked.
I solved this problem with the following command; I use CDH 5.14.1 and Spark 1.6:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
Here is the link: https://spark.apache.org/docs/1.6.0/running-on-yarn.html

Spark HDFS Exception in createBlockOutputStream while uploading resource file

I'm trying to run my JAR on the cluster with yarn-cluster, but I'm getting an exception after a while. The last INFO before it fails is "Uploading resource". I've checked all the security groups and did an hdfs ls successfully, but I'm still getting the error.
./bin/spark-submit --class MyMainClass --master yarn-cluster /tmp/myjar-1.0.jar myjarparameter
16/01/21 16:13:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/21 16:13:52 INFO client.RMProxy: Connecting to ResourceManager at yarn.myserver.com/publicip:publicport
16/01/21 16:13:53 INFO yarn.Client: Requesting a new application from cluster with 10 NodeManagers
16/01/21 16:13:53 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (13312 MB per container)
16/01/21 16:13:53 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/01/21 16:13:53 INFO yarn.Client: Setting up container launch context for our AM
16/01/21 16:13:53 INFO yarn.Client: Preparing resources for our AM container
16/01/21 16:13:54 INFO yarn.Client: Uploading resource file:/opt/spark-1.2.0-bin-hadoop2.3/lib/spark-assembly-1.2.0-hadoop2.3.0.jar -> hdfs://hdfs.myserver.com/user/henrique/.sparkStaging/application_1452514285349_6427/spark-assembly-1.2.0-hadoop2.3.0.jar
16/01/21 16:14:55 INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/PRIVATE_IP:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1341)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1167)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1122)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:522)
16/01/21 16:14:55 INFO hdfs.DFSClient: Abandoning BP-26920217-10.140.213.58-1440247331237:blk_1132201932_58466886
16/01/21 16:14:55 INFO hdfs.DFSClient: Excluding datanode 10.164.16.207:50010
16/01/21 16:15:55 INFO hdfs.DFSClient: Exception in createBlockOutputStream
./bin/hadoop fs -ls /user/henrique/.sparkStaging/
drwx------- henrique supergroup 0 2016-01-20 18:36 user/henrique/.sparkStaging/application_1452514285349_5868
drwx------ henrique supergroup 0 2016-01-21 16:13 user/henrique/.sparkStaging/application_1452514285349_6427
drwx------ henrique supergroup 0 2016-01-21 17:06 user/henrique/.sparkStaging/application_1452514285349_6443
SOLVED! Hadoop was trying to connect to private IPs. The problem was solved by adding this config to hdfs-site.xml:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
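If you want to double-check that the client actually picks up the new setting, a hedged sketch (hdfs getconf reads the effective client configuration; dfsadmin -report may require HDFS superuser rights):
hdfs getconf -confKey dfs.client.use.datanode.hostname
hdfs dfsadmin -report    # the per-DataNode Name/Hostname lines show how DataNodes are advertised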

Spark 1.3.0: Running Pi example on YARN fails

I have Hadoop 2.6.0.2.2.0.0-2041 with Hive 0.14.0.2.2.0.0-2041
After building Spark with the command:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests package
I try to run the Pi example on YARN with the following command:
export HADOOP_CONF_DIR=/etc/hadoop/conf
/var/home2/test/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--executor-memory 3G \
--num-executors 50 \
hdfs:///user/test/jars/spark-examples-1.3.0-hadoop2.4.0.jar \
1000
I get the exception "application_1427875242006_0029 failed 2 times due to AM Container for appattempt_1427875242006_0029_000002 exited with exitCode: 1", which in fact is "Diagnostics: Exception from container-launch." (please see the log below).
The application tracking URL reveals the following messages:
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all
and also:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I have Hadoop working fine on 4 nodes but am completely at a loss as to how to make Spark work on YARN.
Should I set the spark.yarn.access.namenodes Spark configuration property? My application does not need to access any NameNodes directly, but maybe this would solve the problem?
Please advise where to look; any ideas would be of great help, thank you!
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/04/06 10:53:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/06 10:53:42 INFO impl.TimelineClientImpl: Timeline service address: http://etl-hdp-yarn.foo.bar.com:8188/ws/v1/timeline/
15/04/06 10:53:42 INFO client.RMProxy: Connecting to ResourceManager at etl-hdp-yarn.foo.bar.com/192.168.0.16:8050
15/04/06 10:53:42 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
15/04/06 10:53:42 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
15/04/06 10:53:42 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/04/06 10:53:42 INFO yarn.Client: Setting up container launch context for our AM
15/04/06 10:53:42 INFO yarn.Client: Preparing resources for our AM container
15/04/06 10:53:43 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/04/06 10:53:43 INFO yarn.Client: Uploading resource file:/var/home2/test/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.6.0.jar -> hdfs://etl-hdp-nn1.foo.bar.com:8020/user/test/.sparkStaging/application_1427875242006_0029/spark-assembly-1.3.0-hadoop2.6.0.jar
15/04/06 10:53:44 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs:/user/test/jars/spark-examples-1.3.0-hadoop2.4.0.jar
15/04/06 10:53:44 INFO yarn.Client: Setting up the launch environment for our AM container
15/04/06 10:53:44 INFO spark.SecurityManager: Changing view acls to: test
15/04/06 10:53:44 INFO spark.SecurityManager: Changing modify acls to: test
15/04/06 10:53:44 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test); users with modify permissions: Set(test)
15/04/06 10:53:44 INFO yarn.Client: Submitting application 29 to ResourceManager
15/04/06 10:53:44 INFO impl.YarnClientImpl: Submitted application application_1427875242006_0029
15/04/06 10:53:45 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:45 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1428317623905
final status: UNDEFINED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0029/
user: test
15/04/06 10:53:46 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:47 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:48 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:49 INFO yarn.Client: Application report for application_1427875242006_0029 (state: FAILED)
15/04/06 10:53:49 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1427875242006_0029 failed 2 times due to AM Container for appattempt_1427875242006_0029_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0029/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1427875242006_0029_02_000001
Exit code: 1
Exception message: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0029/container_1427875242006_0029_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0029/container_1427875242006_0029_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1428317623905
final status: FAILED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/cluster/app/application_1427875242006_0029
user: test
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
If you are using Spark with HDP, then you have to do the following things.
Add these entries to your $SPARK_HOME/conf/spark-defaults.conf:
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)
Create a java-opts file in $SPARK_HOME/conf and add the installed HDP version to that file, like:
-Dhdp.version=2.2.0.0-2041 (your installed HDP version)
To find out the HDP version, run the command hdp-select status hadoop-client on the cluster.
This is a bug in the HDP-Spark integration.
In your spark-defaults.conf, add the following lines:
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
This should help address the issue.
I think your Hadoop classpath is not set up correctly:
lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
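As a hedged alternative to editing spark-defaults.conf, the same hdp.version system property from the answers above can be passed at submit time via --conf (the version string is whatever hdp-select reports on your cluster):
spark-submit --master yarn-cluster \
  --conf "spark.driver.extraJavaOptions=-Dhdp.version=2.2.0.0-2041" \
  --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=2.2.0.0-2041" \
  --class org.apache.spark.examples.SparkPi \
  hdfs:///user/test/jars/spark-examples-1.3.0-hadoop2.4.0.jar 1000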

Spark 1.3.0 on YARN: Application failed 2 times due to AM Container

When running the Spark 1.3.0 Pi example on YARN (Hadoop 2.6.0.2.2.0.0-2041) with the following script:
# Run on a YARN cluster
export HADOOP_CONF_DIR=/etc/hadoop/conf
/var/home2/test/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--executor-memory 3G \
--num-executors 50 \
/var/home2/test/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar \
1000
It fails with "Application failed 2 times due to AM Container" message (please see below). As far as I understand, all neccessary information to run Spark application in YARN mode is provided in this launch script. What else should be configured to run on YARN. What is missing? Other reasons for YARN launch to fail?
[test#etl-hdp-mgmt pi]$ ./run-pi.sh
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/04/01 12:59:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/01 12:59:58 INFO client.RMProxy: Connecting to ResourceManager at etl-hdp-yarn.foo.bar.com/192.168.0.16:8050
15/04/01 12:59:58 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
15/04/01 12:59:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
15/04/01 12:59:58 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/04/01 12:59:58 INFO yarn.Client: Setting up container launch context for our AM
15/04/01 12:59:58 INFO yarn.Client: Preparing resources for our AM container
15/04/01 12:59:59 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/04/01 12:59:59 INFO yarn.Client: Uploading resource file:/var/home2/test/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar -> hdfs://foo.bar.com:8020/user/test/.sparkStaging/application_1427875242006_0010/spark-assembly-1.3.0-hadoop2.4.0.jar
15/04/01 13:00:01 INFO yarn.Client: Uploading resource file:/var/home2/test/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar -> hdfs://foo.bar.com:8020/user/test/.sparkStaging/application_1427875242006_0010/spark-examples-1.3.0-hadoop2.4.0.jar
15/04/01 13:00:02 INFO yarn.Client: Setting up the launch environment for our AM container
15/04/01 13:00:03 INFO spark.SecurityManager: Changing view acls to: test
15/04/01 13:00:03 INFO spark.SecurityManager: Changing modify acls to: test
15/04/01 13:00:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test); users with modify permissions: Set(test)
15/04/01 13:00:03 INFO yarn.Client: Submitting application 10 to ResourceManager
15/04/01 13:00:03 INFO impl.YarnClientImpl: Submitted application application_1427875242006_0010
15/04/01 13:00:04 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:04 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1427893202566
final status: UNDEFINED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0010/
user: test
15/04/01 13:00:05 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:06 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:07 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:08 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:09 INFO yarn.Client: Application report for application_1427875242006_0010 (state: FAILED)
15/04/01 13:00:09 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1427875242006_0010 failed 2 times due to AM Container for appattempt_1427875242006_0010_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0010/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1427875242006_0010_02_000001
Exit code: 1
Exception message: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0010/container_1427875242006_0010_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0010/container_1427875242006_0010_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1427893202566
final status: FAILED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/cluster/app/application_1427875242006_0010
user: test
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Run
yarn logs -applicationId application_1427875242006_0010 > /tmp/application_1427875242006_0010
Logs there should indicate why it failed.
"Failed 2 times" happens because when you run in yarn cluster mode, driver runs in AM whose retry is 2 by default.
So your driver is retried twice.
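If it helps while debugging, you can reduce the retry count so a failure surfaces only once; this is a sketch, the Spark-side property may not exist in every version, and the YARN-side equivalent is yarn.resourcemanager.am.max-attempts in yarn-site.xml:
spark-submit --master yarn-cluster \
  --conf spark.yarn.maxAppAttempts=1 \
  --class org.apache.spark.examples.SparkPi \
  /var/home2/test/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar 1000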
I totally agree with @SeanOwen. Follow the Spark building documentation.
You need to compile Spark for YARN using the correct configuration for your Hadoop cluster (version, Hive support, etc.).
Then the problem won't persist!
This is a problem with Spark communicating with the ApplicationMaster.
The RM and NM talk to each other over RPC, so the problem could be that launch_container.sh is not running correctly. Check that the NM is communicating with the RM while the job is being submitted.
Try adding this to your yarn-site.xml:
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>1200</value>
</property>
This ensures that the launch_container.sh from the NM error does not get deleted (it will remain for 20 minutes; increase 1200 to a higher number if needed). Now you can try running that launch_container.sh script manually from the container directory and see where it bails out.
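A rough sketch of locating the kept script afterwards, using the NodeManager local dir that appears in your diagnostics (adjust the path to whatever yarn.nodemanager.local-dirs is on your node):
find /mnt/hdfs01/hadoop/yarn/local/usercache -name launch_container.sh 2>/dev/null
# then step through the script you find with: bash -x /path/to/launch_container.sh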
Hope this will help you.
I also faced a similar problem. Actually, you do not need to mention --master yarn-cluster when you are running your self-contained application in the cluster.
This problem has been solved on the Cloudera forum; see https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Issue-running-spark-application-in-Yarn-cluster-mode/td-p/44570
