Remotely connect to spark on yarn cluster in client mode - hadoop

I have a remote Spark-on-YARN cluster. When I use RStudio Server (the web version) hosted on that cluster, I can connect in client mode like this:
sc <- SparkR::sparkR.init(master = "yarn-client")
However, if I try to connect to that Spark cluster in the same way from RStudio on my local machine, I get errors:
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master
...
ERROR Utils: Uncaught exception in thread nioEventLoopGroup-2-2
java.lang.NullPointerException
...
ERROR RBackendHandler: createSparkContext on org.apache.spark.api.r.RRDD failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
The Hadoop application tracking page shows a more detailed error message:
User: blueivy
Name: SparkR
Application Type: SPARK
Application Tags:
State: FAILED
FinalStatus: FAILED
Started: 27-Oct-2015 11:07:09
Elapsed: 4mins, 39sec
Tracking URL: History
Diagnostics:
Application application_1445628650748_0027 failed 2 times due to AM Container for appattempt_1445628650748_0027_000002 exited with exitCode: 10
For more detailed output, check application tracking page:http://master:8088/proxy/application_1445628650748_0027/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1445628650748_0027_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:618)
at java.lang.Thread.run(Thread.java:785)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
My local machine has the same configuration and environment for Hadoop and Spark as the remote cluster: Spark 1.5.1, Hadoop 2.6.0 and Ubuntu 14.04. Can anyone help me find my mistake?

Related

java.net.ConnectException error when running yarn

I get an error when running a job on YARN. HDFS and YARN both start up fine, jps shows everything as normal, pseudo-distributed mode on HDFS works perfectly, and I have triple- and quadruple-checked my configuration files. Whenever I attempt to run a job on YARN, however, this happens:
INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From serverA/IPaddress to serverB:30170 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 6 failover attempts. Trying to failover after sleeping for 44428ms.
Yarn then attempts to connect over and over again until I forcefully quit the process. Any ideas why this is happening?
Can you see the YARN web UI?
How did you start hdfs and yarn?
You can try ./sbin/start-all.sh

Mapreduce job failed because of container failed

A MapReduce job failed because a container could not be started, with the log below.
15/03/21 20:18:25 INFO mapreduce.Job: Job job_1426295876693_0015 failed with state FAILED due to: Application application_1426295876693_0015 failed 2 times due to Error launching appattempt_1426295876693_0015_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1426996344559 found 1426969281613
It means that the nodes in your cluster are not synced to the same system time. Install NTP and keep the clocks on all nodes synchronized. That should fix your issue.
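To see how large the mismatch is, the two epoch-millisecond values in the error can be subtracted. The snippet below is a minimal sketch, assuming the second value ("found") is the token's expiry timestamp; the function name is made up for illustration. The result says how far past the expiry the reporting node's clock reads, not the exact skew between hosts.

```python
import re

# Matches the two epoch-millisecond values in the YARN message.
_TOKEN_RE = re.compile(r"current time is (\d+) found (\d+)")


def token_expiry_gap_ms(message: str) -> int:
    """Return how many milliseconds the reported current time is past the 'found' value."""
    match = _TOKEN_RE.search(message)
    if match is None:
        raise ValueError("no 'current time is ... found ...' pair in message")
    current_ms, found_ms = (int(group) for group in match.groups())
    return current_ms - found_ms


message = "This token is expired. current time is 1426996344559 found 1426969281613"
gap_ms = token_expiry_gap_ms(message)
# Convert milliseconds to hours for a human-readable figure.
print(f"{gap_ms} ms, about {gap_ms / 3_600_000:.1f} hours")
```

A gap of several hours, as here, is far larger than normal scheduling delay, which is consistent with unsynchronized clocks.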

Issue running Spark job on YARN cluster

I want to run my spark Job in Hadoop YARN cluster mode, and I am using the following command:
spark-submit --master yarn-cluster \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  --class com.dc.analysis.jobs.AggregationJob \
  sparkanalitic.jar param1 param2 param3
I am getting the error below. Please suggest what is going wrong and whether the command is correct. I am using CDH 5.3.1.
Diagnostics: Application application_1424284032717_0066 failed 2 times due
to AM Container for appattempt_1424284032717_0066_000002 exited with
exitCode: 15 due to: Exception from container-launch.
Container id: container_1424284032717_0066_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15
.Failing this attempt.. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdfs
start time: 1424699723648
final status: FAILED
tracking URL: http://myhostname:8088/cluster/app/application_1424284032717_0066
user: hdfs
2015-02-23 19:26:04 DEBUG Client - stopping client from cache: org.apache.hadoop.ipc.Client@4085f1ac
2015-02-23 19:26:04 DEBUG Utils - Shutdown hook called
2015-02-23 19:26:05 DEBUG Utils - Shutdown hook called
Any help would be greatly appreciated.
It can mean a lot of things. In our case, we got a similar error message because of an unsupported Java class version, and we fixed the problem by deleting the referenced Java class from our project.
Use this command to see the detailed error message:
yarn logs -applicationId application_1424284032717_0066
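If you only have the diagnostics text at hand, the application ID can be pulled out of it and turned into that command. This is a small sketch; the helper name is hypothetical, and it only builds the argument list rather than running anything.

```python
import re
from typing import List

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
_APP_ID_RE = re.compile(r"application_\d+_\d+")


def yarn_logs_command(diagnostics: str) -> List[str]:
    """Build the `yarn logs` argument list for the first application ID found."""
    match = _APP_ID_RE.search(diagnostics)
    if match is None:
        raise ValueError("no application ID found in diagnostics")
    return ["yarn", "logs", "-applicationId", match.group(0)]


diagnostics = (
    "Diagnostics: Application application_1424284032717_0066 failed 2 times due "
    "to AM Container for appattempt_1424284032717_0066_000002 exited with "
    "exitCode: 15 due to: Exception from container-launch."
)
print(" ".join(yarn_logs_command(diagnostics)))
```

On a host that has the yarn CLI on its PATH, the returned list can be passed to subprocess.run.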
You should remove .setMaster("local") from the code.
The command looks correct.
What I've come across is that the "exit code 15" normally indicates a TableNotFound Exception. That usually means there's an error in the code you're submitting.
You can check this by visiting the tracking URL.
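As a rough aid, the exit code can be extracted from the diagnostics and looked up in a small hint table. Everything in the table is an assumption on my part: the meanings for 10 and 15 are my reading of the exit codes in Spark's YARN ApplicationMaster and may differ between versions, and 127 is the usual shell convention for "command not found". The names below are made up for illustration, and the container logs remain the real source of truth.

```python
import re
from typing import Optional

# Hypothetical hint table; the meanings are assumptions, see the text above.
EXIT_CODE_HINTS = {
    10: "Spark ApplicationMaster: uncaught exception",
    15: "Spark ApplicationMaster: the submitted user class threw an exception",
    127: "shell: command not found (for example, a missing java binary)",
}

# Accepts both "exitCode: 15" and "exitCode=15" as they appear in YARN output.
_EXIT_CODE_RE = re.compile(r"exitCode\s*[:=]\s*(\d+)")


def exit_code(diagnostics: str) -> Optional[int]:
    """Return the first exit code mentioned in the diagnostics, if any."""
    match = _EXIT_CODE_RE.search(diagnostics)
    return int(match.group(1)) if match else None


def hint(diagnostics: str) -> str:
    """Return a short, non-authoritative hint for the exit code in the diagnostics."""
    code = exit_code(diagnostics)
    if code is None:
        return "no exit code found"
    return EXIT_CODE_HINTS.get(code, f"exit code {code}: read the container logs")


print(hint("AM Container for appattempt_1424284032717_0066_000002 exited with exitCode: 15"))
```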
For me, the exit code issue was solved by placing hive-site.xml in the spark/conf directory.
Remove the line "spark.master":"local[*]" from the Spark configuration file if you are running the Spark jobs on a cluster.
If you are running on a local PC, keep it.

Submit Job in Spark using Yarn Cluster

I am unable to submit the job to a YARN cluster. The job runs fine with the yarn-client option. When I submit it with yarn-cluster, only this log line appears, repeated many times:
Application report for application_1421828570504_0002 (state: ACCEPTED)
and then it fails with the following exception:
diagnostics: Application application_1421828570504_0002 failed 10 times due to AM Container for appattempt_1421828570504_0002_000010 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
You should have a look at the logs of your application:
> yarn logs --applicationId application_1421828570504_0002
This will yield some debug information of the actual run within the spark containers.
Since it is running locally but not on the cluster my wild guess would be a missing SparkContext definition. Have a look at my answer to this question for a fix.

Storm-YARN : Application container fails to launch

I am running a Storm (Trident) topology that reads Avro from Kafka and writes the records to HBase.
The topology runs as expected in LocalCluster mode, but when using StormSubmitter I'm facing the issues below.
In distributed Hadoop mode I get the error below [1] while launching the YARN application.
In Hadoop local mode (one box only), YARN spawns the Nimbus server and storm-ui, but no supervisors are running to execute the spouts/bolts in the topology. I guess the reason might be insufficient memory (4G to run the topology plus HBase, HDFS, Kafka, ZooKeeper, etc.).
Can you help me understand the reason for this container failure? There are no errors or info messages in the application logs.
[1] The YARN container fails to launch with the error below when running:
storm-yarn launch /homext/storm-yarn.yml --queue default -appname storm-yarn-demo --stormZip /tmp/storm-0.9.zip
Application application_1415038356032_0304 failed 2 times due to AM Container for appattempt_1415038356032_0304_000002 exited with exitCode: 127 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 127
.Failing this attempt.. Failing the application.
This log is insufficient to diagnose the problem. All it says is that the container failed to launch. You should look at the container output. Check ${yarn.nodemanager.log-dirs} on the nodes: there will be an application folder (application_1415038356032_0304), and inside it a container folder for each attempt (...1415038356032_0304_000002) containing the stderr, stdout and syslog of that attempt. Read those and you'll likely identify the problem.
If these don't exist, look in ${yarn.nodemanager.local-dirs}, where you'll find the container launch script (I think it is called launch_container.sh) for this app/container attempt. It contains the actual command used to launch the container. Try running that command from the shell prompt and see what you get.
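The directory layout described above can be walked with a few lines of code. This is a sketch under the assumption that the logs sit at <log dir>/<application id>/<container id>/ with files named stderr, stdout and syslog; the function name and the example path in the comment are hypothetical, and the real log directory is whatever yarn.nodemanager.log-dirs is set to on each node.

```python
from pathlib import Path
from typing import Dict, List

# File names the answer above says to look for in each container folder.
LOG_NAMES = ("stderr", "stdout", "syslog")


def container_logs(log_dir: str, application_id: str) -> Dict[str, List[Path]]:
    """Map each container folder of an application to the log files present in it."""
    app_dir = Path(log_dir) / application_id
    found: Dict[str, List[Path]] = {}
    if not app_dir.is_dir():
        # Nothing on this node (or logs were already aggregated or removed).
        return found
    for container_dir in sorted(p for p in app_dir.iterdir() if p.is_dir()):
        found[container_dir.name] = [
            container_dir / name
            for name in LOG_NAMES
            if (container_dir / name).is_file()
        ]
    return found


# Example call (the log directory here is a made-up placeholder):
# container_logs("/path/to/yarn/log-dir", "application_1415038356032_0304")
```

An empty result on one node is not conclusive, since each NodeManager only holds the logs of containers that ran on it.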
If it fails at an early stage then the logs can be found in HDFS under:
/tmp/logs/<user>/logs/
This should give enough information to diagnose the problem.
In my case I found a log file:
/tmp/logs/hdfs/logs/application_1426618997634_0004/vagrant-cdh-node4_8041
With some errors like:
/bin/bash: /usr/lib/jvm/java-7-oracle/bin/java: No such file or directory
And fixing the JAVA_HOME environment variable did the trick.
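A quick way to check for this kind of problem is to verify that JAVA_HOME points at a real java binary. The sketch below uses a hypothetical helper name and only inspects the environment it is given, so it would need to be run on each NodeManager host with the same environment the containers see.

```python
import os
from typing import Mapping, Optional


def java_binary_problem(env: Mapping[str, str]) -> Optional[str]:
    """Return a description of what is wrong with JAVA_HOME, or None if it looks usable."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    java = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java):
        return f"{java}: No such file"
    if not os.access(java, os.X_OK):
        return f"{java}: not executable"
    return None


print(java_binary_problem(os.environ) or "JAVA_HOME looks usable")
```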

Resources