RandomForest via Spark causing Heap Space error - apache-spark-mllib

HDP-2.5.0.0 using Ambari 2.4.0.1, Spark 2.0.1.
I am having a Scala code that reads a 108MB csv file and uses the RandomForest.
I run the following command :
/usr/hdp/current/spark2-client/bin/spark-submit --class samples.FuelModel --master yarn --deploy-mode cluster --driver-memory 8g spark-assembly-1.0.jar
At least one container is killed, probably, due to OutOfMemory as the heap space overflows, the console output is :
16/12/22 09:07:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/22 09:07:25 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/12/22 09:07:26 INFO TimelineClientImpl: Timeline service address: http://l4326pp.sss.com:8188/ws/v1/timeline/
16/12/22 09:07:26 INFO AHSProxy: Connecting to Application History server at l4326pp.sss.com/138.106.33.132:10200
16/12/22 09:07:26 INFO Client: Requesting a new application from cluster with 4 NodeManagers
16/12/22 09:07:26 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (204800 MB per container)
16/12/22 09:07:26 INFO Client: Will allocate AM container, with 9011 MB memory including 819 MB overhead
16/12/22 09:07:26 INFO Client: Setting up container launch context for our AM
16/12/22 09:07:26 INFO Client: Setting up the launch environment for our AM container
16/12/22 09:07:26 INFO Client: Preparing resources for our AM container
16/12/22 09:07:27 INFO YarnSparkHadoopUtil: getting token for namenode: hdfs://prodhadoop/user/ojoqcu/.sparkStaging/application_1481607361601_8315
16/12/22 09:07:27 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 79178 for ojoqcu on ha-hdfs:prodhadoop
16/12/22 09:07:28 INFO metastore: Trying to connect to metastore with URI thrift://l4327pp.sss.com:9083
16/12/22 09:07:29 INFO metastore: Connected to metastore.
16/12/22 09:07:29 INFO YarnSparkHadoopUtil: HBase class not found java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
16/12/22 09:07:29 INFO Client: Source and destination file systems are the same. Not copying hdfs:/lib/spark2_2.0.1.tar.gz
16/12/22 09:07:29 INFO Client: Uploading resource file:/localhome/ojoqcu/code/debug/Rikard/spark-assembly-1.0.jar -> hdfs://prodhadoop/user/ojoqcu/.sparkStaging/application_1481607361601_8315/spark-assembly-1.0.jar
16/12/22 09:07:29 INFO Client: Uploading resource file:/tmp/spark-ff9db580-00db-476e-9086-377c60bc7e2a/__spark_conf__1706674327523194508.zip -> hdfs://prodhadoop/user/ojoqcu/.sparkStaging/application_1481607361601_8315/__spark_conf__.zip
16/12/22 09:07:29 WARN Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
16/12/22 09:07:29 INFO SecurityManager: Changing view acls to: ojoqcu
16/12/22 09:07:29 INFO SecurityManager: Changing modify acls to: ojoqcu
16/12/22 09:07:29 INFO SecurityManager: Changing view acls groups to:
16/12/22 09:07:29 INFO SecurityManager: Changing modify acls groups to:
16/12/22 09:07:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ojoqcu); groups with view permissions: Set(); users with modify permissions: Set(ojoqcu); groups with modify permissions: Set()
16/12/22 09:07:29 INFO Client: Submitting application application_1481607361601_8315 to ResourceManager
16/12/22 09:07:30 INFO YarnClientImpl: Submitted application application_1481607361601_8315
16/12/22 09:07:31 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:07:31 INFO Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: dataScientist
start time: 1482394049862
final status: UNDEFINED
tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
user: ojoqcu
16/12/22 09:07:32 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:07:33 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:07:34 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:07:35 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:35 INFO Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: 138.106.33.145
ApplicationMaster RPC port: 0
queue: dataScientist
start time: 1482394049862
final status: UNDEFINED
tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
user: ojoqcu
16/12/22 09:07:36 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:37 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:38 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:39 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:40 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:41 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:42 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:43 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:44 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:45 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:46 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:47 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:48 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:49 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:50 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:51 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:52 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:53 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:54 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:55 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:56 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:57 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:58 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:07:59 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:00 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:01 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:02 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:03 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:04 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:05 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:06 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:07 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:08 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:09 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:10 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:11 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:12 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:13 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:14 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:15 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:16 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:17 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:18 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:19 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:20 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:21 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:22 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:23 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:24 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:25 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:26 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:27 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:28 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:29 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:30 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:31 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:32 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:33 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:34 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:35 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:36 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:37 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:38 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:39 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:40 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:41 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:42 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:43 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:44 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:45 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:46 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:47 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:48 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:49 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:50 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:51 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:52 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:52 INFO Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: dataScientist
start time: 1482394049862
final status: UNDEFINED
tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
user: ojoqcu
16/12/22 09:08:53 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:54 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:55 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:56 INFO Client: Application report for application_1481607361601_8315 (state: ACCEPTED)
16/12/22 09:08:57 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:57 INFO Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: 138.106.33.144
ApplicationMaster RPC port: 0
queue: dataScientist
start time: 1482394049862
final status: UNDEFINED
tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
user: ojoqcu
16/12/22 09:08:58 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:08:59 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:00 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:01 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:02 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:03 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:04 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:05 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:06 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:07 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:08 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:09 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:10 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:11 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:12 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:13 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:14 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:15 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:16 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:17 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:18 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:19 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:20 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:21 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:22 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:23 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:24 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:25 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:26 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:27 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:28 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:29 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:30 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:31 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:32 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:33 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:34 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:35 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:36 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:37 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:38 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:39 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:40 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:41 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:42 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:43 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:44 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:45 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:46 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:47 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:48 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:49 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:50 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:51 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:52 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:53 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:54 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:55 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:56 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:57 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:58 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:09:59 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:00 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:01 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:02 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:03 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:04 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:05 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:06 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:07 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:08 INFO Client: Application report for application_1481607361601_8315 (state: RUNNING)
16/12/22 09:10:09 INFO Client: Application report for application_1481607361601_8315 (state: FINISHED)
16/12/22 09:10:09 INFO Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 28.0 failed 4 times, most recent failure: Lost task 1.3 in stage 28.0 (TID 59, l4328pp.sss.com): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container marked as failed: container_e63_1481607361601_8315_02_000005 on host: l4328pp.sss.com. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
Driver stacktrace:
ApplicationMaster host: 138.106.33.144
ApplicationMaster RPC port: 0
queue: dataScientist
start time: 1482394049862
final status: FAILED
tracking URL: http://l4327pp.sss.com:8088/proxy/application_1481607361601_8315/
user: ojoqcu
Exception in thread "main" org.apache.spark.SparkException: Application application_1481607361601_8315 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/12/22 09:10:09 INFO ShutdownHookManager: Shutdown hook called
16/12/22 09:10:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-ff9db580-00db-476e-9086-377c60bc7e2a
The YARN log from one of the nodes executing the jobs :
2016-12-22 09:10:09,095 WARN runtime.DefaultLinuxContainerRuntime (DefaultLinuxContainerRuntime.java:launchContainer(107)) - Launch container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=143:
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:175)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:103)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=143:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more
Attached are the screenshots from the SparkUI.

The command that worked :
*Most conservative, takes the least cluster
/usr/hdp/current/spark2-client/bin/spark-submit --class samples.FuelModel --master yarn --deploy-mode cluster --executor-memory 4g --driver-memory 8g spark-assembly-1.0.jar
The time taken is too long(40+ minutes) to be acceptable, I tried increasing the no. of cores and executors but it didn't help much. I guess the overhead exceeds the performance benefit due to the processing a small file(< 200MB) on 2 nodes. I would be glad if any one can provide any pointers.

Related

Endless INFO Client: Application report for application_xx (state: ACCEPTED) messages for Spark submit on Yarn

When I submit Spark application using Hadoop with Yarn in cluster mode.
Yarn client State stucks in Accepted state and it never change to Running. I am Using Centos 7 Hadoop Cluster which has 1 Master 2 Slaves
I login to openstack with floating IP(which we externally associate) This IP is different from IP address we get we do ifconfig in system.
Below are the logs:
18/01/21 16:34:19 INFO yarn.Client: Uploading resource file:/usr/local/spark/examples/jars/spark-examples_2.11-2.0.1.jar -> hdfs://192.168.198.10:8020/user/cloud-user/.sparkStaging/application_1516548465362_0014/spark-examples_2.11-2.0.1.jar
18/01/21 16:34:19 INFO yarn.Client: Uploading resource file:/tmp/spark-f37b5cec-a81f-46c3-9b5e-6ce7854c6dd4/__spark_conf__2008488553335511154.zip -> hdfs://192.168.198.10:8020/user/cloud-user/.sparkStaging/application_1516548465362_0014/__spark_conf__.zip
18/01/21 16:34:19 INFO spark.SecurityManager: Changing view acls to: cloud-user
18/01/21 16:34:19 INFO spark.SecurityManager: Changing modify acls to: cloud-user
18/01/21 16:34:19 INFO spark.SecurityManager: Changing view acls groups to:
18/01/21 16:34:19 INFO spark.SecurityManager: Changing modify acls groups to:
18/01/21 16:34:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloud-user); groups with view permissions: Set(); users with modify permissions: Set(cloud-user); groups with modify permissions: Set()
18/01/21 16:34:19 INFO yarn.Client: Submitting application application_1516548465362_0014 to ResourceManager
18/01/21 16:34:19 INFO impl.YarnClientImpl: Submitted application application_1516548465362_0014
18/01/21 16:34:20 INFO yarn.Client: Application report for application_1516548465362_0014 (state: ACCEPTED)
18/01/21 16:34:20 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1516552459599
tracking URL: http://master.abc.com:8088/proxy/application_1516548465362_0014/
user: cloud-user
18/01/21 16:34:21 INFO yarn.Client: Application report for application_1516548465362_0014 (state: ACCEPTED)
18/01/21 16:34:22 INFO yarn.Client: Application report for application_1516548465362_0014 (state: ACCEPTED)
18/01/21 16:34:23 INFO yarn.Client: Application report for application_1516548465362_0014 (state: ACCEPTED)
18/01/21 16:34:24 INFO yarn.Client: Application report for application_1516548465362_0014 (state: ACCEPTED)
18/01/21 16:34:25 INFO yarn.Client: Application report for application_1516548465362_0014 (state: ACCEPTED)
18/01/21 16:34:26 INFO yarn.Client: Application report for application_1516548465362_0014 (state: ACCEPTED)
18/01/21 16:34:27 yarn.Client: Application report for application_1516548465362_0014 (state: ACCEPTED)
Tried all options which people have suggested but nothing work.I see node has enough space but not sure why this is not working. Any help is appreciated. Thanks
Unresolved datanode registration: hostname cannot be resolved (ip=192.168.198.11, hostname=192.168.198.11)
I don't think the hostname should be an IP, but you need to update /etc/hosts on each machine to tell it where the slaves and masters are, or you need either static IPs or use a DNS server to resolve address for machines that float on the network

Spark 1.6.2 & YARN: diagnostics: Application failed 2 times due to AM Container for exited with exitCode: -1

I have a cluster of 2 machines and am trying to submit a spark job with YARN cluster manager.
vanilla Spark 1.6.2 built aginst hadoop 2.6.2
vanilla Hadoop 2.7.2
I can successfully run map-reduce jobs and spark jobs with standalone cluster manager. But when I run it with YARN, I got an error.
Any suggestions how to get it to work?
How do I enable more verbose logging? The error message is absolutely unclear
Why no log files are created under hadoop/logs/userlogs/applicationXXX?
Rhetorical question: IMO: hadoop logging & diagnostic isn't very well. Why is that? Hadoop seems to be an established product.
Below is the output:
mike#mp-desktop ~/opt/hadoop $ spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster ~/prg/scala/spark-examples_2.11-1.0.jar 10
16/07/09 08:59:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/09 08:59:01 INFO client.RMProxy: Connecting to ResourceManager at mp-desktop/192.168.1.60:8050
16/07/09 08:59:01 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
16/07/09 08:59:01 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/07/09 08:59:01 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/07/09 08:59:01 INFO yarn.Client: Setting up container launch context for our AM
16/07/09 08:59:01 INFO yarn.Client: Setting up the launch environment for our AM container
16/07/09 08:59:01 INFO yarn.Client: Preparing resources for our AM container
16/07/09 08:59:02 INFO yarn.Client: Uploading resource file:/home/mike/opt/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/spark-assembly-1.6.2-hadoop2.6.0.jar
16/07/09 08:59:06 INFO yarn.Client: Uploading resource file:/home/mike/prg/scala/spark-examples_2.11-1.0.jar -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/spark-examples_2.11-1.0.jar
16/07/09 08:59:06 INFO yarn.Client: Uploading resource file:/tmp/spark-2ee6dfd6-e9d3-4ca4-9e98-5ce9e75dc757/__spark_conf__7114661171911035574.zip -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/__spark_conf__7114661171911035574.zip
16/07/09 08:59:06 INFO spark.SecurityManager: Changing view acls to: mike
16/07/09 08:59:06 INFO spark.SecurityManager: Changing modify acls to: mike
16/07/09 08:59:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mike); users with modify permissions: Set(mike)
16/07/09 08:59:07 INFO yarn.Client: Submitting application 1 to ResourceManager
16/07/09 08:59:07 INFO impl.YarnClientImpl: Submitted application application_1468043888852_0001
16/07/09 08:59:08 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:08 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1468043947113
final status: UNDEFINED
tracking URL: http://mp-desktop:8088/proxy/application_1468043888852_0001/
user: mike
16/07/09 08:59:09 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:10 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:11 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:12 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:13 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:14 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:15 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:16 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:17 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:18 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:19 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:20 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:21 INFO yarn.Client: Application report for application_1468043888852_0001 (state: FAILED)
16/07/09 08:59:21 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1468043888852_0001 failed 2 times due to AM Container for appattempt_1468043888852_0001_000002 exited with exitCode: -1
For more detailed output, check application tracking page:http://mp-desktop:8088/cluster/app/application_1468043888852_0001Then, click on links to logs of each attempt.
Diagnostics: File /home/mike/hadoopstorage/nm-local-dir/usercache/mike/appcache/application_1468043888852_0001/container_1468043888852_0001_02_000001 does not exist
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1468043947113
final status: FAILED
tracking URL: http://mp-desktop:8088/cluster/app/application_1468043888852_0001
user: mike
16/07/09 08:59:21 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1468043888852_0001
Exception in thread "main" org.apache.spark.SparkException: Application application_1468043888852_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/07/09 08:59:21 INFO util.ShutdownHookManager: Shutdown hook called
16/07/09 08:59:21 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2ee6dfd6-e9d3-4ca4-9e98-5ce9e75dc757
Thanks!
The error message I had was similar:
16/07/15 13:55:53 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:54 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:55 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:56 INFO Client: Application report for application_1468583505911_0002 (state: FAILED)
16/07/15 13:55:56 INFO Client:
client token: N/A
diagnostics: Application application_1468583505911_0002 failed 2 times due to AM Container for appattempt_1468583505911_0002_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://<redacted>:8088/cluster/app/application_1468583505911_0002Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://<redacted>:8020/user/root/.sparkStaging/application_1468583505911_0002/__spark_conf__4995486282135454270.zip
java.io.FileNotFoundException: File does not exist: hdfs://<redacted>:8020/user/root/.sparkStaging/application_1468583505911_0002/__spark_conf__4995486282135454270.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
Try running with YARN in client mode instead of cluster mode which prints out the driver program log to your shell:
spark-submit --class myClass --master yarn /path/to/myClass.jar
The log output showed that myClass was failing immediately because I had the incorrect number of args (the class expected more than 1 arg).
The class had failed with my custom exit code (42) and prints "Usage" info to the log, allowing me to fix the actual problem.
When I ran with --master yarn-cluster, this output was not visible to me and I could not see the "Usage" information mentioned above. Instead, all I had was the vague "File does not exist" issue shown above.
Specifying the correct number of arguments to myClass resolved the issue.
At this point I've assumed that my Spark job failed so quickly that it started cleaning up the .sparkStaging files it had copied before YARN had checked for them.
Probably you have solved your problem but I faced the same issue this morning with Spark 2.1 in yarn cluster and I found this post. I had the same error as you did and my problem was spark conference object which needed:
conf = (SparkConf()
.setMaster("yarn") #I had this value as local
.setAppName("My app Name")
So when I changed this and did my spark submit ( --master yarn --deploy-mode cluster ) everything worked right.
I solved this problem by the following statement and I use CDH5.14.1 and spark 1.6
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
--queue thequeue \
lib/spark-examples*.jar \
10
Here is the link:https://spark.apache.org/docs/1.6.0/running-on-yarn.html

running a spark submit job as cluster deploy mode fails but passes with client

EDITI: by removing the conf setting in the app for 'setMaster' I'm able to run yarn-cluster successfully - if anyone coudl help with spark master as cluster deploy - that'd be fantastic
I'm trying to set up spark on a local testmachine so that I can read from an s3 bucket and then write back to it.
Running the jar/application using client works fine, well, fine in that it goes off to the bucket and creates a file and comes back again.
However I need this to work in cluster mode so that it more closely resembles our prod environment yet it's constantly failing - no real sensible messages in the logs that I can see and little feedback to go on.
Any help is greatly appreciated - I'm very new to spark/hadoop so may have overlooked something obvious.
I also tried running with yarn-cluster as the master but that failed for a different reason (saying it couldn't find the s3Native classes - which I pass in as jars)
This is one a windows ev
The command I'm running:
c:\>spark-submit --jars="C:\Spark\hadoop\share\hadoop\common\lib\hadoop-aws-2.7.1.jar,C:\Spark\hadoop\share\hadoop\common\lib\aws-java-sdk-1.7.4.jar" --verbose --deploy-mode cluster --master spark://127.0.0.1:7077 --class FileInputRename c:\sparkSubmit\sparkSubmit_NoJarSetInConf.jar "s3://bucket/jar/fileInputRename.txt"
The output from this on the console is:
Using properties file: C:\Spark\bin\..\conf\spark-defaults.conf
Parsed arguments:
master spark://127.0.0.1:7077
deployMode cluster
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile C:\Spark\bin\..\conf\spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass FileInputRename
primaryResource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
name FileInputRename
childArgs [s3://SessionCam-Steve/jar/fileInputRename.txt]
jars file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file C:\Spark\bin\..\conf\spark-defaults.conf:
Running Spark using the REST application submission protocol.
Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
FileInputRename
s3://SessionCam-Steve/jar/fileInputRename.txt
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> false
spark.app.name -> FileInputRename
spark.jars -> file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar,file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
spark.submit.deployMode -> cluster
spark.master -> spark://127.0.0.1:7077
Classpath elements:
16/03/24 12:01:56 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://127.0.0.1:7077.
After a few more seconds it shows the c prompt and nothing else. The logs on 8080 :
Application ID Name Cores Memory per Node Submitted Time User State Duration
app-20160324120221-0016 FileInputRename 1 1024.0 MB 2016/03/24 12:02:21 Administrator FINISHED 3 s
where the error message only shows:
16/03/24 12:02:24 INFO spark.SecurityManager: Changing view acls to: Administrator
16/03/24 12:02:24 INFO spark.SecurityManager: Changing modify acls to: Administrator
16/03/24 12:02:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
If I run the yarn-cluster as main so that this is my command:
c:>spark-submit --jars="C:\Spark\hadoop\share\hadoop\common\lib\hadoop-aws-2.7.1.jar,C:\Spark\hadoop\share\hadoop\common\lib\aws-java-sdk-1.7.4.jar" --verbose --master yarn-cluster --class FileInputRename c:\sparkSubmit\sparkSubmit_NoJarSetInConf.jar "s3://SessionCam-Steve/jar/fileInputRename.txt"
The output and exception:
Using properties file: C:\Spark\bin\..\conf\spark-defaults.conf
Parsed arguments:
master yarn-cluster
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile C:\Spark\bin\..\conf\spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass FileInputRename
primaryResource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
name FileInputRename
childArgs [s3://SessionCam-Steve/jar/fileInputRename.txt]
jars file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file C:\Spark\bin\..\conf\spark-defaults.conf:
Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
FileInputRename
--addJars
file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
--jar
file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
--class
FileInputRename
--arg
s3://SessionCam-Steve/jar/fileInputRename.txt
System properties:
SPARK_SUBMIT -> true
spark.app.name -> FileInputRename
spark.submit.deployMode -> cluster
spark.master -> yarn-cluster
Classpath elements:
16/03/24 12:05:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/03/24 12:05:23 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/03/24 12:05:23 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/03/24 12:05:23 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/03/24 12:05:23 INFO yarn.Client: Setting up container launch context for our AM
16/03/24 12:05:23 INFO yarn.Client: Setting up the launch environment for our AM container
16/03/24 12:05:23 INFO yarn.Client: Preparing resources for our AM container
16/03/24 12:05:24 WARN : Your hostname, WIN-EU4MXZ2GSIW resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:a94:1d11%14, but we couldn't find any external IP address!
16/03/24 12:05:25 INFO yarn.Client: Uploading resource file:/C:/Spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/spark-assembly-1.6.1-had
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/sparkSubmit_NoJarSetInConf.j
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/hadoop-aws-2.
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/aws-java-sd
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/temp/2/spark-12375b13-dac4-42b8-9ff6-19b0f895c5d1/__spark_conf__7363738392648975127.zip -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_14
16/03/24 12:05:28 INFO spark.SecurityManager: Changing view acls to: Administrator
16/03/24 12:05:28 INFO spark.SecurityManager: Changing modify acls to: Administrator
16/03/24 12:05:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
16/03/24 12:05:28 INFO yarn.Client: Submitting application 4 to ResourceManager
16/03/24 12:05:29 INFO impl.YarnClientImpl: Submitted application application_1458817514983_0004
16/03/24 12:05:30 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:30 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458821128787
final status: UNDEFINED
tracking URL: http://WIN-EU4MXZ2GSIW:8088/proxy/application_1458817514983_0004/
user: Administrator
16/03/24 12:05:31 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:32 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:33 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:34 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:35 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:36 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:37 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:38 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:39 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:40 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:41 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:42 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:43 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:44 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:45 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:46 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:47 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:48 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:49 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:50 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:51 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:52 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:53 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:54 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:55 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:57 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:58 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:59 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:00 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:01 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:02 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:03 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:04 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:05 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:06 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:07 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:08 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:09 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:10 INFO yarn.Client: Application report for application_1458817514983_0004 (state: FAILED)
16/03/24 12:06:10 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1458817514983_0004 failed 2 times due to AM Container for appattempt_1458817514983_0004_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://WIN-EU4MXZ2GSIW:8088/cluster/app/application_1458817514983_0004Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1458817514983_0004_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: 1 file(s) moved.
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458821128787
final status: FAILED
tracking URL: http://WIN-EU4MXZ2GSIW:8088/cluster/app/application_1458817514983_0004
user: Administrator
Exception in thread "main" org.apache.spark.SparkException: Application application_1458817514983_0004 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/24 12:06:10 INFO util.ShutdownHookManager: Shutdown hook called
16/03/24 12:06:10 INFO util.ShutdownHookManager: Deleting directory C:\temp\2\spark-12375b13-dac4-42b8-9ff6-19b0f895c5d1
this creates two application ids in the gui:
Application ID Name Cores Memory per Node Submitted Time User State Duration
app-20160324120600-0018 FileInputRename 2 1024.0 MB 2016/03/24 12:06:00 Administrator FINISHED 9 s
app-20160324120543-0017 FileInputRename 2 1024.0 MB 2016/03/24 12:05:43 Administrator FINISHED 8 s
both of which have this as the exception:
16/03/24 12:05:49 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:84)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
... 16 more
it'd be fantastic and a huge relief if I could get one of these working - thank you in advance for any help.
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
you have problem with classpath. in cluster mode the "application" is executed on one of nodes which probably has their own classpath, so
Either
hadoop-aws-2.7.1.jar for some reason is not present at all(despite the fact that you provide it with --jars - check if it present on all workers at provided path)
or there is another hadoop-aws jar in classpath with another version(personally I think it would be 2nd variant).
Try remove --jars=... argument and see if it helps.
If you have already your yarn cluster installed and configured you just need to set HADOOP_CONF_DIR environment variable to point to your client hadoop configuration. You don't need specify those aws jars since they are already provided by your hadoop cluster. So if you provide them again they could conflict with the ones already there. Reference the documentation here for the spark submit. Also I would suggest to use s3n as protocol to read and write from S3.

How to redirect Spark output to console?

I am running spark jobs on a CHD cluster and all the logs are stored into a history server as text files. Is there a way to get those outputs to print on the console? All I see is
15/10/21 15:47:09 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:10 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:11 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:12 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:13 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:14 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:15 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:16 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:17 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:18 INFO yarn.Client: Application report for application_1445455790310_0014 (state: ACCEPTED)
15/10/21 15:47:19 INFO yarn.Client: Application report for application_1445455790310_0014 (state: RUNNING)
Spark logging system works with log4j.
According to the official documentation, Spark provides a configuration file named log4j.properties.template, in which you can specify different logging properties.
The file is under the folder conf in the main Spark directory.
In order to let Spark detect and use this configuration file, it is necessary to rename it and remove the .template out of it.
The default template looks something like the following:
# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
...
In this example, the default logging method is set to console. Although I have not tested it, you should be able to see the output out of the box with this example template.

Spark applications stuck at ACCEPTED state

I installed a fresh instance of Cloudera 5.4 on a single Ubuntu 14.04 server and want to run one of spark applications.
This is the command:
sudo -uhdfs spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode cluster --master yarn /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/spark-examples-1.3.0-cdh5.4.5-hadoop2.6.0-cdh5.4.5.jar
This is the output:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/avro-tools-1.7.6-cdh5.4.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/08/29 12:07:56 INFO RMProxy: Connecting to ResourceManager at chd2.moneyball.guru/104.131.78.0:8032
15/08/29 12:07:56 INFO Client: Requesting a new application from cluster with 1 NodeManagers
15/08/29 12:07:56 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (1750 MB per container)
15/08/29 12:07:56 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/08/29 12:07:56 INFO Client: Setting up container launch context for our AM
15/08/29 12:07:56 INFO Client: Preparing resources for our AM container
15/08/29 12:07:57 INFO Client: Uploading resource file:/opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/jars/spark-examples-1.3.0-cdh5.4.5-hadoop2.6.0-cdh5.4.5.jar -> hdfs://chd2.moneyball.guru:8020/user/hdfs/.sparkStaging/application_1440861466017_0007/spark-examples-1.3.0-cdh5.4.5-hadoop2.6.0-cdh5.4.5.jar
15/08/29 12:07:57 INFO Client: Setting up the launch environment for our AM container
15/08/29 12:07:57 INFO SecurityManager: Changing view acls to: hdfs
15/08/29 12:07:57 INFO SecurityManager: Changing modify acls to: hdfs
15/08/29 12:07:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
15/08/29 12:07:57 INFO Client: Submitting application 7 to ResourceManager
15/08/29 12:07:57 INFO YarnClientImpl: Submitted application application_1440861466017_0007
15/08/29 12:07:58 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:07:58 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdfs
start time: 1440864477580
final status: UNDEFINED
tracking URL: http://chd2.moneyball.guru:8088/proxy/application_1440861466017_0007/
user: hdfs
15/08/29 12:07:59 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:00 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:01 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:02 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:03 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:04 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:05 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:06 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED)
15/08/29 12:08:07 INFO Client: Application report for application_1440861466017_0007 (state: ACCEPTED
.....
It will show the last line in a loop.
Can you help please? Let me know if you need anything else.
I increased yarn.nodemanager.resource.memory-mb. Everything is ok now
This can happen when Yarn's slots are occupied by other jobs and the cluster is at its capacity. The job gets stuck in the ACCEPTED state waiting for its turn to run. Can you check from Yarn Resource Manager UI to see if anything else is running on the cluster which might be slowing this app down? The RM UI can be accessed by going to http://104.131.78.0:8088, assuming that your RM Address is still 104.131.78.0 as shown in your logs. You should be able to see 1) if any other application is running on your cluster, and 2) navigate to the Spark UI running on http://ApplicationMasterAddress:4040 for further analysis.
I ran into a similar issue on Spark 1.5.2, and was able to fix this by using a Scala object to contain my main function, instead of a Scala class

Resources