Spark: Unknown/unsupported param error when setting conf.yarn.jar - hadoop

I have a small application that runs fine on my YARN-based Spark cluster when I submit it with spark-submit like this:
~/spark-1.4.0-bin-hadoop2.4$ bin/spark-submit --class MyClass --master yarn-cluster --queue testing myApp.jar hdfs://nameservice1/user/XXX/README.md_count
However, I would like to avoid uploading the spark-assembly.jar file each time, so I set the spark.yarn.jar configuration parameter:
~/spark-1.4.0-bin-hadoop2.4$ bin/spark-submit --class MyClass --master yarn-cluster --queue testing --conf "spark.yarn.jar=hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar" myApp.jar hdfs://nameservice1/user/XXX/README.md_count
This seems to be fine at first:
15/07/08 13:57:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/08 13:57:18 INFO yarn.Client: Requesting a new application from cluster with 24 NodeManagers
15/07/08 13:57:18 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/07/08 13:57:18 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/07/08 13:57:18 INFO yarn.Client: Setting up container launch context for our AM
15/07/08 13:57:18 INFO yarn.Client: Preparing resources for our AM container
15/07/08 13:57:18 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar
[...]
However, it fails eventually:
15/07/08 13:57:18 INFO yarn.Client: Submitting application 670 to ResourceManager
15/07/08 13:57:18 INFO impl.YarnClientImpl: Submitted application application_1434986503384_0670
15/07/08 13:57:19 INFO yarn.Client: Application report for application_1434986503384_0670 (state: ACCEPTED)
15/07/08 13:57:19 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: testing
start time: 1436356638869
final status: UNDEFINED
tracking URL: http://node-00a/cluster/app/application_1434986503384_0670
user: XXX
15/07/08 13:57:20 INFO yarn.Client: Application report for application_1434986503384_0670 (state: ACCEPTED)
15/07/08 13:57:21 INFO yarn.Client: Application report for application_1434986503384_0670 (state: ACCEPTED)
15/07/08 13:57:23 INFO yarn.Client: Application report for application_1434986503384_0670 (state: FAILED)
15/07/08 13:57:23 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1434986503384_0670 failed 2 times due to AM Container for appattempt_1434986503384_0670_000002 exited with exitCode: 1 due to: Exception from container-launch.
Container id: container_1434986503384_0670_02_000001
Exit code: 1
[...]
In the YARN log, I find the following error message indicating incorrect parameter usage:
Container: container_1434986503384_0670_01_000001 on node-01b_8041
===================================================================================================
LogType:stderr
Log Upload Time:Mi Jul 08 13:57:22 +0200 2015
LogLength:764
Log Contents:
Unknown/unsupported param List(--arg, hdfs://nameservice1/user/XXX/README.md_count, --executor-memory, 1024m, --executor-cores, 1, --num-executors, 2)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required)
--class CLASS_NAME Name of your application's main class (required)
--args ARGS Arguments to be passed to your application's main class.
Mutliple invocations are possible, each will be passed in order.
--num-executors NUM Number of executors to start (Default: 2)
--executor-cores NUM Number of cores for the executors (Default: 1)
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G)
End of LogType:stderr
Since the same application runs fine when the local assembly file is uploaded at submission time, the problem seems to come down to the assembly file. Could the one on the cluster be a wrong/different version? How could I validate that? What else might be the cause? Is the warning WARN util.NativeCodeLoader: ... possibly related?
The same happens when I set the (deprecated) environment variable SPARK_JAR instead of setting spark.yarn.jar.

Asking the obvious question here: are you sure the spark-assembly.jar on HDFS is the same one as you have locally? If not, can you try uploading your local spark-assembly to your home directory on HDFS and try again?
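One way to check, sketched below under the assumption that the local assembly sits in the usual lib/ directory of the 1.4.0 binary distribution (the exact local path is an assumption), is to compare the MD5 of the local jar with the MD5 of the jar contents on HDFS:
# md5 of the assembly the client uploads when spark.yarn.jar is not set (local path is an assumption)
md5sum ~/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar
# md5 of the assembly referenced by spark.yarn.jar
hdfs dfs -cat hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar | md5sum
# if the checksums differ, overwrite the HDFS copy with the local one
hdfs dfs -put -f ~/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar
If they differ, the usage message in the YARN log (it expects --args, while the 1.4.0 client passes --arg) would be consistent with the HDFS copy being an older Spark assembly.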

Related

Yarn Job doesn't go past "state: ACCEPTED"

Thank you in advance for any help.
I am running a YARN job using the provided Hadoop example. The job never completes and stays in the "ACCEPTED" state. Looking at what is being printed out, it seems like the job is waiting to run while the client continuously polls for its status.
Example job (from Hadoop 2.6.0):
spark-submit --master yarn-client --driver-memory 4g --executor-memory 2g --executor-cores 4 --class org.apache.spark.examples.SparkPi /home/john/spark/spark-1.6.1-bin-hadoop2.6/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100
Output:
....
....
disabled; ui acls disabled; users with view permissions: Set(john); users with modify permissions: Set(jogn)
16/07/27 17:36:09 INFO yarn.Client: Submitting application 1 to ResourceManager
16/07/27 17:36:09 INFO impl.YarnClientImpl: Submitted application application_1469665943738_0001
16/07/27 17:36:10 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:10 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1469666169333
final status: UNDEFINED
tracking URL: http://cpt-bdx021:8088/proxy/application_1469665943738_0001/
user: john
16/07/27 17:36:11 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:12 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:13 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:14 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:15 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:16 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:17 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:18 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:19 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:20 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:21 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
16/07/27 17:36:22 INFO yarn.Client: Application report for application_1469665943738_0001 (state: ACCEPTED)
...........
...........
...........
UPDATE: It looks like the job was submitted to the ResourceManager (hence "ACCEPTED"), but the ResourceManager sees no nodes or Hadoop workers to actually hand the job to:
$ jps
jps
12404 Jps
12211 NameNode
12315 DataNode
11743 ApplicationHistoryServer
11876 ResourceManager
11542 NodeManager
$ yarn node -list
16/07/27 23:07:53 INFO client.RMProxy: Connecting to ResourceManager at /192.168.0.5.55:8032
Total Nodes:0
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
UPDATE (2): I am using the default etc/container-executor.cfg file:
yarn.nodemanager.linux-container-executor.group=#configured value of yarn.nodemanager.linux-container-executor.group
banned.users=#comma separated list of users who can not run applications
min.user.id=1000#Prevent other super-users
allowed.system.users=##comma separated list of system users who CAN run applications
Also, as an aside, I want to mention that I do not have a hadoop user or a hadoop user group. I am using the default account with which I logged on to the system, if that matters. Thanks!
UPDATE (3): NodeManager log
org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at 192.168.0.5.55:8031
2016-07-28 00:23:26,083 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []
2016-07-28 00:23:26,087 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2016-07-28 00:23:26,233 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -160570002
2016-07-28 00:23:26,236 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -1876215653
2016-07-28 00:23:26,237 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as 192.168.0.5.55:53034 with total resource of <memory:8192, vCores:8>
2016-07-28 00:23:26,237 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
The reason your job never completes is that it never goes from state ACCEPTED to state RUNNING. A scheduler decides which applications get resources and thereby move to state RUNNING.
There are two schedulers available: the fair scheduler and the capacity scheduler. You can find details in the Hadoop YARN documentation. If you could provide your yarn-site.xml, capacity-scheduler.xml and fair-scheduler.xml files I could give you better help :).
The most common possibility is that the queue you are sending your job to does not have the resources available that you are requesting.
Typical problems may be:
Resource requirements (memory and/or cores). You're asking for more memory/cores than the cluster is able to allocate. This may be because the cluster is nearly full, or because your settings are inconsistent. More details on this page.
Disk space. Check disk space on the nodes; there is a health check that may stop your application from running:
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
In a multi-tenant / multi-queue environment, if there are hard per-queue resource limits, your application may be hitting those. You may want to increase those settings, or test in another queue with more resources. A few quick sanity checks are sketched below.
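As a first pass on the points above, the commands below are a sketch; the 90% default for the disk-utilization threshold and the standard port 8088 for the ResourceManager web UI are assumptions about an otherwise unmodified configuration:
# list every node the ResourceManager knows about, including UNHEALTHY/LOST ones;
# "Total Nodes:0" means no NodeManager is usable, so nothing can leave ACCEPTED
yarn node -list -all
# check disk usage on each NodeManager host; above the threshold (90% by default) the
# disk health checker marks the local dirs bad and the node stops accepting containers
df -h
# per-queue capacity and current usage are visible on the scheduler page of the RM web UI,
# e.g. http://<resourcemanager-host>:8088/cluster/scheduler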

Spark 1.6.2 & YARN: diagnostics: Application failed 2 times due to AM Container for exited with exitCode: -1

I have a cluster of 2 machines and am trying to submit a Spark job with the YARN cluster manager.
vanilla Spark 1.6.2 built against hadoop 2.6.2
vanilla Hadoop 2.7.2
I can successfully run MapReduce jobs, and Spark jobs with the standalone cluster manager. But when I run with YARN, I get an error.
Any suggestions on how to get it to work?
How do I enable more verbose logging? The error message is absolutely unclear.
Why are no log files created under hadoop/logs/userlogs/applicationXXX?
Rhetorical question: IMO, Hadoop logging and diagnostics aren't very good. Why is that? Hadoop seems to be an established product.
Below is the output:
mike#mp-desktop ~/opt/hadoop $ spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster ~/prg/scala/spark-examples_2.11-1.0.jar 10
16/07/09 08:59:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/09 08:59:01 INFO client.RMProxy: Connecting to ResourceManager at mp-desktop/192.168.1.60:8050
16/07/09 08:59:01 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
16/07/09 08:59:01 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/07/09 08:59:01 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/07/09 08:59:01 INFO yarn.Client: Setting up container launch context for our AM
16/07/09 08:59:01 INFO yarn.Client: Setting up the launch environment for our AM container
16/07/09 08:59:01 INFO yarn.Client: Preparing resources for our AM container
16/07/09 08:59:02 INFO yarn.Client: Uploading resource file:/home/mike/opt/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/spark-assembly-1.6.2-hadoop2.6.0.jar
16/07/09 08:59:06 INFO yarn.Client: Uploading resource file:/home/mike/prg/scala/spark-examples_2.11-1.0.jar -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/spark-examples_2.11-1.0.jar
16/07/09 08:59:06 INFO yarn.Client: Uploading resource file:/tmp/spark-2ee6dfd6-e9d3-4ca4-9e98-5ce9e75dc757/__spark_conf__7114661171911035574.zip -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/__spark_conf__7114661171911035574.zip
16/07/09 08:59:06 INFO spark.SecurityManager: Changing view acls to: mike
16/07/09 08:59:06 INFO spark.SecurityManager: Changing modify acls to: mike
16/07/09 08:59:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mike); users with modify permissions: Set(mike)
16/07/09 08:59:07 INFO yarn.Client: Submitting application 1 to ResourceManager
16/07/09 08:59:07 INFO impl.YarnClientImpl: Submitted application application_1468043888852_0001
16/07/09 08:59:08 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:08 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1468043947113
final status: UNDEFINED
tracking URL: http://mp-desktop:8088/proxy/application_1468043888852_0001/
user: mike
16/07/09 08:59:09 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:10 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:11 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:12 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:13 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:14 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:15 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:16 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:17 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:18 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:19 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:20 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:21 INFO yarn.Client: Application report for application_1468043888852_0001 (state: FAILED)
16/07/09 08:59:21 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1468043888852_0001 failed 2 times due to AM Container for appattempt_1468043888852_0001_000002 exited with exitCode: -1
For more detailed output, check application tracking page:http://mp-desktop:8088/cluster/app/application_1468043888852_0001Then, click on links to logs of each attempt.
Diagnostics: File /home/mike/hadoopstorage/nm-local-dir/usercache/mike/appcache/application_1468043888852_0001/container_1468043888852_0001_02_000001 does not exist
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1468043947113
final status: FAILED
tracking URL: http://mp-desktop:8088/cluster/app/application_1468043888852_0001
user: mike
16/07/09 08:59:21 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1468043888852_0001
Exception in thread "main" org.apache.spark.SparkException: Application application_1468043888852_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/07/09 08:59:21 INFO util.ShutdownHookManager: Shutdown hook called
16/07/09 08:59:21 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2ee6dfd6-e9d3-4ca4-9e98-5ce9e75dc757
Thanks!
The error message I had was similar:
16/07/15 13:55:53 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:54 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:55 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:56 INFO Client: Application report for application_1468583505911_0002 (state: FAILED)
16/07/15 13:55:56 INFO Client:
client token: N/A
diagnostics: Application application_1468583505911_0002 failed 2 times due to AM Container for appattempt_1468583505911_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://<redacted>:8088/cluster/app/application_1468583505911_0002Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://<redacted>:8020/user/root/.sparkStaging/application_1468583505911_0002/__spark_conf__4995486282135454270.zip
java.io.FileNotFoundException: File does not exist: hdfs://<redacted>:8020/user/root/.sparkStaging/application_1468583505911_0002/__spark_conf__4995486282135454270.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
Try running with YARN in client mode instead of cluster mode, which prints the driver program's log to your shell:
spark-submit --class myClass --master yarn /path/to/myClass.jar
The log output showed that myClass was failing immediately because I had the incorrect number of args (the class expected more than 1 arg).
The class had failed with my custom exit code (42) and printed "Usage" info to the log, allowing me to fix the actual problem.
When I ran with --master yarn-cluster, this output was not visible to me and I could not see the "Usage" information mentioned above. Instead, all I had was the vague "File does not exist" issue shown above.
Specifying the correct number of arguments to myClass resolved the issue.
At this point I've assumed that my Spark job failed so quickly that it started cleaning up the .sparkStaging files it had copied before YARN had checked for them.
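If you need to stay in cluster mode, you can usually still recover the driver/AM output after the fact, assuming log aggregation (yarn.log-aggregation-enable) is turned on for the cluster:
# pull the aggregated container logs for the failed application from the question
yarn logs -applicationId application_1468043888852_0001 > /tmp/application_1468043888852_0001.log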
You have probably solved your problem already, but I faced the same issue this morning with Spark 2.1 in yarn-cluster mode and I found this post. I had the same error as you did, and my problem was the SparkConf object, which needed:
from pyspark import SparkConf  # import added; this snippet is PySpark

conf = (SparkConf()
        .setMaster("yarn")        # I had this value as "local"
        .setAppName("My app Name"))
So when I changed this and did my spark-submit (--master yarn --deploy-mode cluster), everything worked correctly.
I solved this problem with the following command; I use CDH 5.14.1 and Spark 1.6:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
Here is the link: https://spark.apache.org/docs/1.6.0/running-on-yarn.html

running a spark submit job as cluster deploy mode fails but passes with client

EDIT: By removing the 'setMaster' conf setting in the app, I'm able to run yarn-cluster successfully. If anyone could help with the Spark master as cluster deploy, that'd be fantastic.
I'm trying to set up Spark on a local test machine so that I can read from an S3 bucket and then write back to it.
Running the jar/application in client mode works fine, in that it goes off to the bucket, creates a file, and comes back again.
However, I need this to work in cluster mode so that it more closely resembles our prod environment, yet it's constantly failing, with no real sensible messages in the logs that I can see and little feedback to go on.
Any help is greatly appreciated - I'm very new to spark/hadoop so may have overlooked something obvious.
I also tried running with yarn-cluster as the master, but that failed for a different reason (saying it couldn't find the s3Native classes, which I pass in as jars).
This is on a Windows environment.
The command I'm running:
c:\>spark-submit --jars="C:\Spark\hadoop\share\hadoop\common\lib\hadoop-aws-2.7.1.jar,C:\Spark\hadoop\share\hadoop\common\lib\aws-java-sdk-1.7.4.jar" --verbose --deploy-mode cluster --master spark://127.0.0.1:7077 --class FileInputRename c:\sparkSubmit\sparkSubmit_NoJarSetInConf.jar "s3://bucket/jar/fileInputRename.txt"
The output from this on the console is:
Using properties file: C:\Spark\bin\..\conf\spark-defaults.conf
Parsed arguments:
master spark://127.0.0.1:7077
deployMode cluster
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile C:\Spark\bin\..\conf\spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass FileInputRename
primaryResource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
name FileInputRename
childArgs [s3://SessionCam-Steve/jar/fileInputRename.txt]
jars file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file C:\Spark\bin\..\conf\spark-defaults.conf:
Running Spark using the REST application submission protocol.
Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
FileInputRename
s3://SessionCam-Steve/jar/fileInputRename.txt
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> false
spark.app.name -> FileInputRename
spark.jars -> file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar,file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
spark.submit.deployMode -> cluster
spark.master -> spark://127.0.0.1:7077
Classpath elements:
16/03/24 12:01:56 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://127.0.0.1:7077.
After a few more seconds it shows the C:\ prompt and nothing else. The logs on port 8080:
Application ID Name Cores Memory per Node Submitted Time User State Duration
app-20160324120221-0016 FileInputRename 1 1024.0 MB 2016/03/24 12:02:21 Administrator FINISHED 3 s
where the error message only shows:
16/03/24 12:02:24 INFO spark.SecurityManager: Changing view acls to: Administrator
16/03/24 12:02:24 INFO spark.SecurityManager: Changing modify acls to: Administrator
16/03/24 12:02:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
If I run with yarn-cluster as the master, so that this is my command:
c:>spark-submit --jars="C:\Spark\hadoop\share\hadoop\common\lib\hadoop-aws-2.7.1.jar,C:\Spark\hadoop\share\hadoop\common\lib\aws-java-sdk-1.7.4.jar" --verbose --master yarn-cluster --class FileInputRename c:\sparkSubmit\sparkSubmit_NoJarSetInConf.jar "s3://SessionCam-Steve/jar/fileInputRename.txt"
The output and exception:
Using properties file: C:\Spark\bin\..\conf\spark-defaults.conf
Parsed arguments:
master yarn-cluster
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile C:\Spark\bin\..\conf\spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass FileInputRename
primaryResource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
name FileInputRename
childArgs [s3://SessionCam-Steve/jar/fileInputRename.txt]
jars file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file C:\Spark\bin\..\conf\spark-defaults.conf:
Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
FileInputRename
--addJars
file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
--jar
file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
--class
FileInputRename
--arg
s3://SessionCam-Steve/jar/fileInputRename.txt
System properties:
SPARK_SUBMIT -> true
spark.app.name -> FileInputRename
spark.submit.deployMode -> cluster
spark.master -> yarn-cluster
Classpath elements:
16/03/24 12:05:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/03/24 12:05:23 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/03/24 12:05:23 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/03/24 12:05:23 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/03/24 12:05:23 INFO yarn.Client: Setting up container launch context for our AM
16/03/24 12:05:23 INFO yarn.Client: Setting up the launch environment for our AM container
16/03/24 12:05:23 INFO yarn.Client: Preparing resources for our AM container
16/03/24 12:05:24 WARN : Your hostname, WIN-EU4MXZ2GSIW resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:a94:1d11%14, but we couldn't find any external IP address!
16/03/24 12:05:25 INFO yarn.Client: Uploading resource file:/C:/Spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/spark-assembly-1.6.1-had
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/sparkSubmit_NoJarSetInConf.j
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/hadoop-aws-2.
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/aws-java-sd
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/temp/2/spark-12375b13-dac4-42b8-9ff6-19b0f895c5d1/__spark_conf__7363738392648975127.zip -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_14
16/03/24 12:05:28 INFO spark.SecurityManager: Changing view acls to: Administrator
16/03/24 12:05:28 INFO spark.SecurityManager: Changing modify acls to: Administrator
16/03/24 12:05:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
16/03/24 12:05:28 INFO yarn.Client: Submitting application 4 to ResourceManager
16/03/24 12:05:29 INFO impl.YarnClientImpl: Submitted application application_1458817514983_0004
16/03/24 12:05:30 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:30 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458821128787
final status: UNDEFINED
tracking URL: http://WIN-EU4MXZ2GSIW:8088/proxy/application_1458817514983_0004/
user: Administrator
16/03/24 12:05:31 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:32 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:33 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:34 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:35 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:36 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:37 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:38 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:39 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:40 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:41 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:42 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:43 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:44 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:45 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:46 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:47 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:48 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:49 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:50 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:51 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:52 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:53 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:54 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:55 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:57 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:58 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:59 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:00 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:01 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:02 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:03 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:04 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:05 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:06 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:07 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:08 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:09 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:10 INFO yarn.Client: Application report for application_1458817514983_0004 (state: FAILED)
16/03/24 12:06:10 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1458817514983_0004 failed 2 times due to AM Container for appattempt_1458817514983_0004_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://WIN-EU4MXZ2GSIW:8088/cluster/app/application_1458817514983_0004Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1458817514983_0004_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: 1 file(s) moved.
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458821128787
final status: FAILED
tracking URL: http://WIN-EU4MXZ2GSIW:8088/cluster/app/application_1458817514983_0004
user: Administrator
Exception in thread "main" org.apache.spark.SparkException: Application application_1458817514983_0004 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/24 12:06:10 INFO util.ShutdownHookManager: Shutdown hook called
16/03/24 12:06:10 INFO util.ShutdownHookManager: Deleting directory C:\temp\2\spark-12375b13-dac4-42b8-9ff6-19b0f895c5d1
This creates two application IDs in the GUI:
Application ID Name Cores Memory per Node Submitted Time User State Duration
app-20160324120600-0018 FileInputRename 2 1024.0 MB 2016/03/24 12:06:00 Administrator FINISHED 9 s
app-20160324120543-0017 FileInputRename 2 1024.0 MB 2016/03/24 12:05:43 Administrator FINISHED 8 s
both of which have this as the exception:
16/03/24 12:05:49 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:84)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
... 16 more
It'd be fantastic and a huge relief if I could get one of these working; thank you in advance for any help.
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
You have a problem with the classpath. In cluster mode the "application" is executed on one of the nodes, which probably has its own classpath, so either:
hadoop-aws-2.7.1.jar is for some reason not present at all (despite the fact that you provide it with --jars; check whether it is present on all workers at the provided path),
or there is another hadoop-aws jar on the classpath with a different version (personally I think it is the second variant).
Try removing the --jars=... argument and see if it helps.
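A rough way to check for the second variant, assuming the Windows install paths from this question (the exact directories are an assumption), is to look on each machine for every hadoop-aws / aws-java-sdk jar that could shadow the 2.7.1 one passed via --jars:
REM list all candidate jars under the Spark/Hadoop install (adjust the root directory per machine)
dir /s /b C:\Spark\*hadoop-aws*.jar
dir /s /b C:\Spark\*aws-java-sdk*.jar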
If you already have your YARN cluster installed and configured, you just need to set the HADOOP_CONF_DIR environment variable to point to your client-side Hadoop configuration. You don't need to specify those AWS jars, since they are already provided by your Hadoop cluster; if you provide them again they could conflict with the ones already there. Refer to the Spark documentation on submitting applications. Also, I would suggest using s3n as the protocol to read from and write to S3.
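A minimal sketch of that suggestion for this Windows setup; the location of the Hadoop client configuration (C:\Spark\hadoop\etc\hadoop) is an assumption, and the jar and bucket paths are taken from the question:
REM point Spark at the client-side Hadoop/YARN configuration (path is an assumption)
set HADOOP_CONF_DIR=C:\Spark\hadoop\etc\hadoop
REM submit against YARN without --jars, using the s3n:// protocol suggested above
spark-submit --verbose --master yarn --deploy-mode cluster --class FileInputRename c:\sparkSubmit\sparkSubmit_NoJarSetInConf.jar "s3n://SessionCam-Steve/jar/fileInputRename.txt"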

Spark 1.3.0: Running Pi example on YARN fails

I have Hadoop 2.6.0.2.2.0.0-2041 with Hive 0.14.0.2.2.0.0-2041
After building Spark with the command:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests package
I try to run the Pi example on YARN with the following command:
export HADOOP_CONF_DIR=/etc/hadoop/conf
/var/home2/test/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--executor-memory 3G \
--num-executors 50 \
hdfs:///user/test/jars/spark-examples-1.3.0-hadoop2.4.0.jar \
1000
I get the exception application_1427875242006_0029 failed 2 times due to AM Container for appattempt_1427875242006_0029_000002 exited with exitCode: 1, which in fact is Diagnostics: Exception from container-launch (please see the log below).
The application tracking URL reveals the following messages:
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all
and also:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I have Hadoop working fine on 4 nodes and am completely at a loss as to how to make Spark work on YARN.
Should I set the spark.yarn.access.namenodes Spark configuration property? My application does not need to access any NameNodes directly, but maybe this would solve the problem?
Please advise where to look; any ideas would be of great help, thank you!
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/04/06 10:53:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/06 10:53:42 INFO impl.TimelineClientImpl: Timeline service address: http://etl-hdp-yarn.foo.bar.com:8188/ws/v1/timeline/
15/04/06 10:53:42 INFO client.RMProxy: Connecting to ResourceManager at etl-hdp-yarn.foo.bar.com/192.168.0.16:8050
15/04/06 10:53:42 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
15/04/06 10:53:42 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
15/04/06 10:53:42 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/04/06 10:53:42 INFO yarn.Client: Setting up container launch context for our AM
15/04/06 10:53:42 INFO yarn.Client: Preparing resources for our AM container
15/04/06 10:53:43 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/04/06 10:53:43 INFO yarn.Client: Uploading resource file:/var/home2/test/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.6.0.jar -> hdfs://etl-hdp-nn1.foo.bar.com:8020/user/test/.sparkStaging/application_1427875242006_0029/spark-assembly-1.3.0-hadoop2.6.0.jar
15/04/06 10:53:44 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs:/user/test/jars/spark-examples-1.3.0-hadoop2.4.0.jar
15/04/06 10:53:44 INFO yarn.Client: Setting up the launch environment for our AM container
15/04/06 10:53:44 INFO spark.SecurityManager: Changing view acls to: test
15/04/06 10:53:44 INFO spark.SecurityManager: Changing modify acls to: test
15/04/06 10:53:44 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test); users with modify permissions: Set(test)
15/04/06 10:53:44 INFO yarn.Client: Submitting application 29 to ResourceManager
15/04/06 10:53:44 INFO impl.YarnClientImpl: Submitted application application_1427875242006_0029
15/04/06 10:53:45 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:45 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1428317623905
final status: UNDEFINED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0029/
user: test
15/04/06 10:53:46 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:47 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:48 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:49 INFO yarn.Client: Application report for application_1427875242006_0029 (state: FAILED)
15/04/06 10:53:49 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1427875242006_0029 failed 2 times due to AM Container for appattempt_1427875242006_0029_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0029/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1427875242006_0029_02_000001
Exit code: 1
Exception message: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0029/container_1427875242006_0029_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0029/container_1427875242006_0029_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1428317623905
final status: FAILED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/cluster/app/application_1427875242006_0029
user: test
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
If you are using Spark with HDP, then you have to do the following things.
Add these entries to your $SPARK_HOME/conf/spark-defaults.conf:
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)
Create a java-opts file in $SPARK_HOME/conf and add the installed HDP version to that file, like:
-Dhdp.version=2.2.0.0-2041 (your installed HDP version)
To find out the HDP version, run the command hdp-select status hadoop-client on the cluster.
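Put together, the steps above might look like the sketch below; 2.2.0.0-2041 is the HDP version from this question, so substitute whatever hdp-select status hadoop-client reports on your cluster:
# append the hdp.version system property for the driver and the YARN AM
echo "spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041" >> $SPARK_HOME/conf/spark-defaults.conf
echo "spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041" >> $SPARK_HOME/conf/spark-defaults.conf
# create the java-opts file described above with the same property
echo "-Dhdp.version=2.2.0.0-2041" > $SPARK_HOME/conf/java-opts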
This is a bug in the HDP - Spark Integration
In your spark-defaults.conf add the following lines
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
This should help address the issue
I think your Hadoop classpath is not set up correctly; note the error:
lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

Spark 1.3.0 on YARN: Application failed 2 times due to AM Container

When running Spark 1.3.0 Pi example on YARN (Hadoop 2.6.0.2.2.0.0-2041) with the following script:
# Run on a YARN cluster
export HADOOP_CONF_DIR=/etc/hadoop/conf
/var/home2/test/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--executor-memory 3G \
--num-executors 50 \
/var/home2/test/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar \
1000
It fails with an "Application failed 2 times due to AM Container" message (please see below). As far as I understand, all necessary information to run a Spark application in YARN mode is provided in this launch script. What else should be configured to run on YARN? What is missing? What other reasons could make the YARN launch fail?
[test@etl-hdp-mgmt pi]$ ./run-pi.sh
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/04/01 12:59:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/01 12:59:58 INFO client.RMProxy: Connecting to ResourceManager at etl-hdp-yarn.foo.bar.com/192.168.0.16:8050
15/04/01 12:59:58 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
15/04/01 12:59:58 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
15/04/01 12:59:58 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/04/01 12:59:58 INFO yarn.Client: Setting up container launch context for our AM
15/04/01 12:59:58 INFO yarn.Client: Preparing resources for our AM container
15/04/01 12:59:59 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/04/01 12:59:59 INFO yarn.Client: Uploading resource file:/var/home2/test/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar -> hdfs://foo.bar.com:8020/user/test/.sparkStaging/application_1427875242006_0010/spark-assembly-1.3.0-hadoop2.4.0.jar
15/04/01 13:00:01 INFO yarn.Client: Uploading resource file:/var/home2/test/spark/lib/spark-examples-1.3.0-hadoop2.4.0.jar -> hdfs://foo.bar.com:8020/user/test/.sparkStaging/application_1427875242006_0010/spark-examples-1.3.0-hadoop2.4.0.jar
15/04/01 13:00:02 INFO yarn.Client: Setting up the launch environment for our AM container
15/04/01 13:00:03 INFO spark.SecurityManager: Changing view acls to: test
15/04/01 13:00:03 INFO spark.SecurityManager: Changing modify acls to: test
15/04/01 13:00:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test); users with modify permissions: Set(test)
15/04/01 13:00:03 INFO yarn.Client: Submitting application 10 to ResourceManager
15/04/01 13:00:03 INFO impl.YarnClientImpl: Submitted application application_1427875242006_0010
15/04/01 13:00:04 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:04 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1427893202566
final status: UNDEFINED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0010/
user: test
15/04/01 13:00:05 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:06 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:07 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:08 INFO yarn.Client: Application report for application_1427875242006_0010 (state: ACCEPTED)
15/04/01 13:00:09 INFO yarn.Client: Application report for application_1427875242006_0010 (state: FAILED)
15/04/01 13:00:09 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1427875242006_0010 failed 2 times due to AM Container for appattempt_1427875242006_0010_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0010/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1427875242006_0010_02_000001
Exit code: 1
Exception message: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0010/container_1427875242006_0010_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0010/container_1427875242006_0010_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1427893202566
final status: FAILED
tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/cluster/app/application_1427875242006_0010
user: test
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Run
yarn logs -applicationId application_1427875242006_0010 > /tmp/application_1427875242006_0010
The logs there should indicate why it failed.
"Failed 2 times" happens because when you run in yarn-cluster mode, the driver runs in the AM, whose retry count is 2 by default.
So your driver is retried twice.
I totally agree with @SeanOwen. Follow the Spark building documentation.
You need to compile Spark for YARN using the correct configuration for your Hadoop cluster (version, Hive support, etc.).
The problem won't persist then!
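For reference, a build invocation along the lines of the one in the question, but pinned to the cluster's exact Hadoop version, might look like the sketch below; the -Phadoop-2.4 profile is the one the question already uses, and resolving the vendor-specific 2.6.0.2.2.0.0-2041 artifacts may require adding the Hortonworks Maven repository:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0.2.2.0.0-2041 -Phive -Phive-thriftserver -DskipTests clean package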
This is a problem with Spark communicating with the ApplicationMaster.
The RM and NM talk to each other over RPC, so the problem could be that launch_container.sh is not running correctly. Check that the NM is communicating with the RM while submitting the job.
Try adding this to your yarn-site.xml:
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>1200</value>
</property>
This will ensure that the launch_container.sh from the NM error seen above does not get deleted (it will remain around for 20 minutes; increase 1200 to a higher number if needed). Now, what you can do is try to run that launch_container.sh script manually from the container dir and see where it bails out, as sketched below.
Hope this will help you.
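A rough sketch of running the script manually, using the container path from the error message in the question (the exact directory will differ per attempt):
# locate the retained launch script under the NodeManager local dirs
find /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache -name launch_container.sh
# run it with tracing to see exactly where it bails out
bash -x /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0010/container_1427875242006_0010_02_000001/launch_container.sh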
I also faced a similar problem. Actually, you do not need to mention --master yarn-cluster when you are running your self-contained application in the cluster.
This problem has been solved on the Cloudera forum; refer to https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Issue-running-spark-application-in-Yarn-cluster-mode/td-p/44570
