Can someone tell me how to kill a container? I see that nodes are still running containers even after the application has finished, and I want to know the command to kill them. Because of this issue, my subsequent applications stay in the ACCEPTED state.
Thanks
hadoop job -list
This gives you the running jobs along with their JobIDs.
To kill a job:
hadoop job -kill <JobID>
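For example, assuming a placeholder JobID taken from the -list output (substitute your own):
hadoop job -list
hadoop job -kill job_1428487296152_25597
On newer Hadoop versions the same subcommands are also available as mapred job -list and mapred job -kill.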
If the YARN application has finished and some containers are still running, I'd say this is a bug somewhere. Is this an MR app? I don't think there is any command to kill individual containers, and in any case those should be handled by the NodeManager. The ResourceManager and NodeManager should kill all containers when the application is finished.
You didn't provide any info on what this app is, the Hadoop version, the operating system, etc. Having said that, I once had a problem on my Ubuntu hosts, which were affected by the HADOOP-9752 bug that prevented the NodeManager from killing a container.
Related
My Hadoop job has been running for over 10 hours, but since I submitted it to the wrong queue, its containers keep getting killed by the scheduler.
How do I change the queue of a currently running Hadoop job without restarting it?
Thank you
If you are running on YARN, you can move a running application to another queue with:
yarn application -movetoqueue <app_id> -queue <queue_name>
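A minimal sketch, assuming a target queue named "priority" (both the application id and the queue name below are placeholders):
yarn application -movetoqueue application_1428487296152_25597 -queue priority
Whether the move is allowed depends on the scheduler and the queue ACLs configured on your cluster.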
I want to set up a series of Spark steps on an EMR Spark cluster and terminate the current step if it's taking too long. However, when I SSH into the master node and run hadoop job -list, the master node seems to believe that there are no jobs running. I don't want to terminate the cluster, because doing so would force me to pay for a whole new hour of whatever cluster I'm running. Can anyone please help me terminate a Spark step in EMR without terminating the entire cluster?
That's easy:
yarn application -kill [application id]
You can list your running applications with:
yarn application -list
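A typical sequence on the master node, with a placeholder id copied from the -list output:
yarn application -list
yarn application -kill application_1439330624449_0012
Note that a Spark step submitted through EMR runs as a YARN application, not a MapReduce job, which is why hadoop job -list shows nothing.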
You can kill the application from the ResourceManager web UI (in the links at the top right, under cluster status).
In the ResourceManager, click on the application you want to kill; on the application page there is a small "kill" link (top left) that you can click to kill the application.
Obviously you can also SSH in, but I think this way is faster and easier for some users.
I'm running Spark applications on YARN. When I kill a job using:
yarn application -kill application_XYZ
I cannot get to the Spark job GUI of the killed application from the Hadoop GUI (ResourceManager). When I open the Spark history server directly and try to display the application under "Incomplete applications", it works. When a job is completed (not killed), the logs can be displayed this way: Hadoop GUI -> Spark history server. I'm using the YARN log aggregation service to aggregate logs. I can also access the application logs using:
yarn logs -applicationId application_XYZ
Have you experienced the same behaviour when you kill a Spark application? Is there anything wrong with killing an application this way?
There is nothing wrong with killing the application like that. And yes, the Hadoop UI does not show the output of killed jobs, but as you mentioned, you can see it in the logs on the master.
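For example, with log aggregation enabled you can dump the aggregated logs of the killed application to a file for inspection (the application id is a placeholder):
yarn logs -applicationId application_1439330624449_1561 > app_logs.txt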
I'm running Spark 1.3.1 on YARN on EMR. When I run the spark-shell, everything looks normal until I start seeing messages like INFO yarn.Client: Application report for application_1439330624449_1561 (state: ACCEPTED). These messages are generated endlessly, once per second. Meanwhile, I am unable to use the Spark shell.
I don't understand why this is happening.
Seeing (near) endless ACCEPTED messages from YARN has always been a sure sign that there were not enough cluster resources to allocate for my Spark jobs/shell. YARN will continue trying to schedule your Spark application, but will eventually time out if enough resources do not become available within a certain amount of time.
Are you providing any command-line options to spark-shell that override the provided defaults? When I ask for too many executors/cores/memory, YARN will accept my request but never transition to a RUNNING ApplicationMaster.
Try running spark-shell with no options (other than perhaps --master yarn) and see if it gets past ACCEPTED.
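For example, start with the bare minimum and only add resource options back one at a time; the executor values below are illustrative, not recommendations:
spark-shell --master yarn
spark-shell --master yarn --num-executors 2 --executor-cores 1 --executor-memory 1g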
I realized there were a couple of streaming jobs I had killed in the terminal, but I guess they were somehow still running. I was able to find them in the UI showing all running applications on YARN (I wasn't able to execute Hive queries either). Once I killed those jobs using the command below, the spark-shell started as usual.
yarn application -kill application_1428487296152_25597
I guess that YARN does not have enough resources to run the jobs.
Please check
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html
for guidance on calculating how many resources you can provide to YARN.
Also check the number of cores and the amount of RAM, which are controlled by the following properties:
yarn.nodemanager.resource.cpu-vcores
yarn.nodemanager.resource.memory-mb
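As a rough sketch, these are set in yarn-site.xml on each NodeManager; the values below are only example numbers for a node offering 8 vcores and 16 GB to YARN, not a recommendation:
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>
</property>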
Hadoop YARN launches instances of YarnChild in a child JVM to execute the actual tasks. Those tasks communicate with their ApplicationMaster (AM) through the umbilical interface.
My question is: what happens if the AM dies and the ResourceManager (RM) fails to bring it back up (say, due to some code defect in the AM)? In such a case, the child tasks would (a) notice the absence of the AM via the missed heartbeat and then (b) go to the RM to get the new AM location, which in this case they will not get. So what happens to these orphaned tasks? I have a scenario where I would like to terminate them. Is that the default behavior, and does their NodeManager (NM) terminate them?
From Hadoop: The Definitive Guide, Chapter 6, Failures, "Failures in YARN":
After a crash, a new ResourceManager instance is brought up (by an admin), and it recovers from the saved state. The state consists of the node managers in the system, as well as the running applications. Tasks are not part of the ResourceManager's state, since they are managed by the application.
Also, it is said that the ResourceManager is designed to be able to recover from crashes.
All child tasks related to that particular ApplicationMaster would be in a halted state. The Hadoop admin should either restart the ApplicationMaster or kill the application. The NodeManager does not terminate the failed ApplicationMaster.
If you want to kill an application, you can use the yarn application -kill <application_id> command. It will kill all running and queued jobs under the application.
If you want to kill a particular task in YARN, you can use hadoop job -kill-task <task-attempt-id>.
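A quick sketch of both, with placeholder IDs (take the application id from yarn application -list and the task attempt id from the job's task listing):
yarn application -kill application_1428487296152_25597
hadoop job -kill-task attempt_1428487296152_25597_m_000003_0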