Spark Shell stuck in YARN Accepted state - hadoop

Running Spark 1.3.1 on YARN on EMR. When I run spark-shell, everything looks normal until I start seeing messages like INFO yarn.Client: Application report for application_1439330624449_1561 (state: ACCEPTED). These messages are generated endlessly, once per second. Meanwhile, I am unable to use the Spark shell.
I don't understand why this is happening.

Seeing (near) endless ACCEPTED messages from YARN has always been a sure sign that there were not enough cluster resources to allocate for my Spark jobs / shell. YARN will keep trying to schedule your Spark application, but will eventually time out if enough resources do not become available within a certain amount of time.
Are you providing any command line options to spark-shell that override the defaults? When I ask for too many executors/cores/memory, YARN will accept my request but never transition to a RUNNING ApplicationMaster.
Try running spark-shell with no options (other than perhaps --master yarn) and see if it gets past ACCEPTED.
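For example, a minimal sketch (the resource values in the second command are illustrative placeholders, not a recommendation):
spark-shell --master yarn
and, if that reaches RUNNING, add resources back gradually until you find the request YARN can no longer satisfy:
spark-shell --master yarn --num-executors 2 --executor-cores 1 --executor-memory 1g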

I realized there were a couple of streaming jobs I had killed in the terminal, but I guess they were somehow still running. I was able to find them in the UI that shows all running applications on YARN (I wasn't able to execute Hive queries either). Once I killed those jobs using the command below, the spark-shell started as usual.
yarn application -kill application_1428487296152_25597
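If you are not sure which applications are still holding resources, the same list is available from the command line (Hadoop 2.x YARN CLI):
yarn application -list -appStates RUNNING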

I guess that YARN does not have enough resources to run the jobs.
Please check
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html
to calculate how many resources you can provide to YARN.
Check the number of cores and the amount of RAM, which are controlled by the following variables:
yarn.nodemanager.resource.cpu-vcores
yarn.nodemanager.resource.memory-mb
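As a rough sketch, these are set in yarn-site.xml on each NodeManager; the values below are placeholders for an 8-vcore node with memory held back for the OS and daemons, not a recommendation:
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value>
</property>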

Related

Spark YARN mode: how to get applicationId from spark-submit

When I submit a Spark job using spark-submit with master yarn and deploy-mode cluster, it doesn't print/return any applicationId, and once the job is completed I have to manually check the MapReduce jobHistory or the Spark HistoryServer to get the job details.
My cluster is used by many users and it takes a lot of time to spot my job in jobHistory/HistoryServer.
Is there any way to configure spark-submit to return the applicationId?
Note: I found many similar questions, but their solutions retrieve the applicationId within the driver code using sparkContext.applicationId, and in the case of master yarn and deploy-mode cluster the driver also runs as part of the YARN application, so any logs or sysout are printed to the remote host's logs.
Here are the approaches that I used to achieve this:
Save the applicationId to an HDFS file (suggested by #zhangtong in a comment).
Send an email alert with the applicationId from the driver.
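Another option, a rough sketch assuming default client logging (in yarn-cluster mode the client prints application reports like the ones above to the console) and a hypothetical myjob.jar, is to capture the ID from the spark-submit output:
APP_ID=$(spark-submit --master yarn --deploy-mode cluster myjob.jar 2>&1 | grep -oE 'application_[0-9]+_[0-9]+' | head -n 1)
echo "$APP_ID"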

How to change the queue of a currently running Hadoop job?

My Hadoop job has been running for over 10 hours, but since I put it in the wrong queue, its containers keep getting killed by the scheduler.
How do I change the queue of a currently running Hadoop job without restarting it?
Thank you
If you are running YARN, you can change the current job's queue with:
yarn application -movetoqueue <app_id> -queue <queue_name>
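For example, with a placeholder application ID and a hypothetical target queue named prod:
yarn application -movetoqueue application_1234567890123_0042 -queue prod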

Get status when running a job without hadoop

When I run a Hadoop job with the hadoop command it prints a lot of output. Among other things, it shows the relative progress of the job ("map: 30%, reduce: 0%" and so on). But when I run a job without that command it does not print anything, not even errors. Is there a way to get that level of logging without it? That is, without running [hadoop_folder]/bin/hadoop jar <my_jar> <indexer> <args>....
You can get this information from the ApplicationMaster (assuming you use YARN and not MR1, where you would get it from the JobTracker). There is usually a web UI where you can find this information. Details will depend on your Hadoop installation / distribution.
In the case of Hadoop v1, check the JobTracker web UI; in the case of Hadoop v2, check the ApplicationMaster web UI.
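If you prefer the command line to the web UI, a rough sketch on Hadoop v2 (both IDs are placeholders; the yarn client reports overall progress and the mapred client reports map/reduce completion):
yarn application -status application_1234567890123_0042
mapred job -status job_1234567890123_0042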

Cloudera Hue running WordCount

I have successfully installed and started up the CDH5 manager and agent. However, whenever I try running the MR hello world job, i.e. WordCount, it runs up to 33%, stays there for a long time, and doesn't proceed.
Any clues as to where it might be going wrong?
FYI, when trying to run in the terminal it works fine.
It is recommended to switch Hue to use the CherryPy server instead of Spawning. In the hue.ini or the Hue Safety Valve in CM, enter:
[desktop]
use_cherrypy_server = true
These issues may be due to Beeswax crashing or being very slow and blocking all the requests, as the Spawning server is not perfectly greenified.
Hue can use Oozie to submit jobs, and that requires one more MR task. Usually the problem is that YARN apps ask for too much memory in your cluster (so decrease their default resources in the YARN config), or it is gotcha #5: http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
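As a sketch of the memory-tuning suggestion above (these are the standard MR2 properties, typically set in mapred-site.xml; the 1024 MB values are illustrative placeholders, not a recommendation):
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>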

Stopping a Hadoop 2.x container

Can someone tell me how to kill a container? I see that nodes are still running containers even after the application has finished, and I want to know the command to kill them. Because of this issue, my subsequent applications stay in the ACCEPTED state.
Thanks
hadoop job -list
This gives you the running jobs with their JobIDs.
To kill a job:
hadoop job -kill JobID
If the YARN application is finished and some containers are still running, I'd say this is a bug somewhere. Is this an MR app? I don't think there are any commands to kill containers, and anyway those should be handled by the NodeManager. The ResourceManager and NodeManager should kill all containers when the application is finished.
You didn't provide any info on what this app is, the Hadoop version, operating system, etc. Having said that, I once had a problem on my Ubuntu hosts, which hit the HADOOP-9752 bug that prevented the NodeManager from killing a container.
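As a last resort, a rough sketch for cleaning up orphaned container processes directly on a NodeManager host (assumes a Linux host and that the leftover JVMs still carry the container ID on their command line; the PID is a placeholder taken from the ps output):
ps aux | grep '[c]ontainer_'
kill -9 <pid>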
