How to kill a mapred job started by Hive? - hadoop

I'm working with CDH 5.1 now. It runs normal Hadoop jobs via YARN, but Hive still works with mapred. Sometimes a big query hangs for a long time and I want to kill it.
I can find the big job in the JobTracker web console, but it doesn't provide a button to kill it.
Another option is to kill it from the command line. However, I couldn't find any running job from the command line.
I have tried two commands:
yarn application -list
mapred job -list
How can I kill a big query like this?

You can get the job ID from the Hive CLI when you run a job, or from the web UI. You can also list the job IDs using the application ID from the ResourceManager. Ideally, you should get everything from
mapred job -list
or
hadoop job -list
Using the job ID, you can kill it with the command below.
hadoop job -kill <job_id>
Another alternative would be to kill the application using
yarn application -kill <application_id>
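Putting the two together, a minimal end-to-end sketch (the IDs below are hypothetical; substitute whatever your listing actually shows):
# Find the stuck query among the running YARN applications
yarn application -list -appStates RUNNING
# Suppose the listing shows application_1400000000000_0001 for the stuck query
yarn application -kill application_1400000000000_0001
# Or, using the MapReduce job ID reported by the Hive CLI or by "mapred job -list"
mapred job -kill job_1400000000000_0001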

Related

How to kill a Hive query without knowing the application ID?

My HiveServer2 lists a few running jobs, so I can find the various query_ids.
But there is no YARN application information on the YARN 8088 pages.
My question is how to kill the running job.
If you are using YARN as the resource manager, you can find all running jobs by running the following in a shell:
yarn application -list -appStates ALL
You can change ALL to RUNNING etc. depending on what application state you are interested in seeing.
An alternative command to the above to see running applications is:
mapred job -list
In order to kill a specific application/job with YARN, you can run:
yarn application -kill <application_id>
Otherwise:
mapred job -kill <job_id>
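For the "without knowing the application ID" part, one option is to grep the listing for something that identifies your query, such as your user name or a table name from the query. A rough sketch, assuming the string hive_user appears on the right line (the pattern is hypothetical):
# Grab the first column (Application-Id) of the matching line
app_id=$(yarn application -list -appStates RUNNING 2>/dev/null | grep 'hive_user' | awk '{print $1}')
yarn application -kill "$app_id"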

How to get the job ID of a specific running Hadoop job

I need to get the ID of a specific Hadoop job.
In my case, I launch a Sqoop command remotely, and I want to verify the job status with this command:
hadoop job -status job_id | grep -w 'state'
I can get this information from the GUI, but I want to do this from the command line.
Can anyone help?
You can use the YARN REST APIs, via your browser or with curl from the command line. They will list all currently running and previously run jobs, including Sqoop jobs and the MapReduce jobs that Sqoop generates and executes. Use the UI first: if you have it up and running, just point your browser to http://<host>:8088/cluster (not sure if the port is the same on all Hadoop distributions; I believe 8088 is the default on Apache). Alternatively, you can use the yarn commands directly, e.g., yarn application -list.
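For scripting, a minimal sketch against the ResourceManager REST API; the host is a placeholder and the application ID is hypothetical:
# List running applications as JSON
curl -s 'http://<host>:8088/ws/v1/cluster/apps?states=RUNNING'
# Fetch the status of one application once you know its ID
curl -s 'http://<host>:8088/ws/v1/cluster/apps/application_1400000000000_0001'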

Do we need to put the NameNode in safe mode before restarting the JobTracker?

I have a Hadoop cluster running Cloudera's CDH3, the equivalent of Apache Hadoop 0.20.2. I want to restart the JobTracker because there are some jobs which are not getting killed. I tried killing them from the command line; the command executes successfully, but the jobs are still in "Job Cleanup: Pending" status. Anyway, I want to restart the JobTracker and see if that cleans up the jobs. I know the command to restart the JobTracker, but I am not sure whether I need to put the NameNode in safe mode before I restart it.
You can try to kill the unwanted jobs using hadoop job -kill <Job-ID> and check the command status with echo "$?". If that doesn't work, a restart is the only option.
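For example (the job ID below is hypothetical):
hadoop job -kill job_201301010000_0001
echo "$?"   # 0 means the kill command itself returned successfully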
The Hadoop JobTracker and NameNode are independent components, so there is no need to put the NameNode in safe mode before a JobTracker restart. You can restart the JobTracker process alone (and the TaskTrackers if required).

Find running job priority

How can I find the priority of a job running in Hadoop?
I tried Hadoop commands like hadoop job, yarn container, mapred job, etc., but couldn't find how to get the priority of a running job.
You can use the getJobPriority() method in your MapReduce code.
Use:
hadoop job -list
...it will show you information about all running jobs, including their priority.
hadoop job -list all
...will show you information about all jobs (running, succeeded, failed), including their priority.
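If you need just the priority in a script, a rough sketch, assuming the classic listing format where Priority is the fifth column (column positions can differ between Hadoop versions):
# Print job ID and priority, skipping the two header lines of the listing
hadoop job -list | awk 'NR > 2 {print $1, $5}'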

CDH4.4: Restarting HDFS and MapReduce from shell

I'm trying to automate stopping, formatting, and starting the HDFS and MapReduce services on a Cloudera Hadoop 4.4 cluster, using a bash script.
It's easy to kill the HDFS and MapReduce processes using "pkill -U hdfs && pkill -U mapred", but how can I start those processes again without using the Cloudera Manager GUI?
Well, apparently CM has a pretty sweet API.
Check it out here:
http://cloudera.github.io/cm_api/
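For example, a rough sketch of starting services through that API; the host, credentials, API version, and service names (hdfs1, mapreduce1) are all assumptions, so check the docs above for your CM version:
# Start the HDFS service on cluster "cluster1" via the CM REST API
curl -u admin:admin -X POST 'http://cm-host:7180/api/v5/clusters/cluster1/services/hdfs1/commands/start'
# Likewise for MapReduce
curl -u admin:admin -X POST 'http://cm-host:7180/api/v5/clusters/cluster1/services/mapreduce1/commands/start'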
