Kill Hive queries without exiting from the Hive shell - shell

Is there any way to kill a Hive query without exiting from the Hive shell? For example, I wrongly ran a SELECT statement against a table that has millions of rows of data, and I just wanted to stop it without exiting the shell. If I press CTRL+Z, it drops out of the shell.

You have two options:
1. Press Ctrl+C and wait till the command terminates; it will not exit from the Hive CLI. Press Ctrl+C a second time and the session will terminate immediately, exiting to the shell.
2. From another shell, run
yarn application -kill <Application ID> or
mapred job -kill <JOB_ID>

First, look for the Job ID with:
hadoop job -list
And then kill it by ID:
hadoop job -kill <JOB_ID>

Go with the second option:
yarn application -kill <Application ID>. Get the application ID from another session.
This is the only way I think you would be able to kill the current query. I use it via Beeline on the Hortonworks platform.
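For example, a minimal sketch of that second session, assuming the Hive query shows up as a RUNNING YARN application (the application ID below is a placeholder):
# from a second terminal, list running applications and note the one backing the Hive query
yarn application -list -appStates RUNNING | grep -i hive
# kill it; the Hive CLI in the first terminal should report the failure without exiting
yarn application -kill application_1520000000000_0001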

Related

HBase export task mysteriously stopped logging to output file

I recently attempted to do an export of a table from an HBase instance using a 10 data node Hadoop cluster. The command line looked like the following:
nohup hbase org.apache.hadoop.hbase.mapreduce.Export documents /export/documents 10 > ~/documents_export.out &
As you can see, I ran the process under nohup so it wouldn't die prematurely when my SSH session closed, and I put the whole thing in the background. To capture the output, I redirected it to a file.
As expected, the process started to run and in fact ran for several hours before the output to the file mysteriously stopped. It stopped at about 31% through the map phase of the MapReduce job being run. However, per Hadoop, the MapReduce job itself was still going and in fact ran to completion by the next morning.
So, my question is: why did output stop going to my log file? My best guess is that the parent HBase process I invoked exited normally once it was done with the initial setup for the MapReduce job involved in the export.
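One way to keep following the export after the launcher's stdout goes quiet is to ask the cluster about the job directly instead of relying on the redirected file; a minimal sketch, assuming the export is visible as a YARN application and log aggregation is enabled (the application ID is a placeholder):
# find the export's application ID
yarn application -list -appStates RUNNING
# check its progress and final state independently of the nohup'd launcher
yarn application -status application_1520000000000_0002
# after it finishes, pull the aggregated task logs
yarn logs -applicationId application_1520000000000_0002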

EMR kill PIG script

Is there a way of killing a running Pig script, not only the current Hadoop job?
As you know, a Pig script is translated into a DAG of Hadoop jobs. Assume everything runs smoothly up to some point in this graph but, for some reason, I want to stop the execution of this script/DAG. Is there an EMR command to do that?
I tried killing the current Hadoop job, and while the execution of the Pig script then shows as CANCELLED, the cluster/master node is left in a weird state which makes all subsequent Pig scripts fail instantly.
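A rough sketch of stopping the whole DAG, assuming you can SSH to the master node: kill the Pig client process first, so it stops submitting the next job in the graph, then clean up whichever MapReduce job is still running (the PID and job ID below are placeholders):
# find the Pig client process driving the script
ps -ef | grep -i pig
# kill it so no further jobs in the DAG get submitted
kill <pig_pid>
# then kill the MapReduce job from the DAG that is currently running
mapred job -list
mapred job -kill <JOB_ID>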

How to Kill Hadoop fs -copyToLocal task

I ran the following command on my local filesystem:
hadoop fs -copyToLocal <HDFS Path>
But in the middle of the task (after issuing the command in the terminal and before it completes), I want to cancel the copy. How can I do this?
Also, is -copyToLocal executed as an MR job internally? Can someone point me to a reference?
Thanks.
It uses the FileSystem API to stream and copy the file to the local filesystem. There is no MR job involved.
You could find the process on the machine and kill it; it is usually a JVM process that gets invoked.
If you are using nohup and/or & to run the copy, you can find it by searching for CopyToLocal in the output of ps -eaf; if you ran the command in the foreground, you can use Ctrl+C to kill it (Ctrl+Z only suspends it).
In both cases the temporary files it created remain on disk, so after killing the process you have to clear that temp data before running the same copy again.
It will not create any MR job.
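Putting the answers above together, a minimal sketch of cancelling a backgrounded copy and cleaning up after it (the paths and PID are placeholders; as far as I know, FsShell copies into a temporary "<name>._COPYING_" file that only gets renamed on success):
# find the JVM that is running the copy
ps -eaf | grep -i copyToLocal
# kill it, using the PID from the previous command
kill <pid>
# remove the partial local file that is left behind
rm /local/dest/path/myfile._COPYING_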

How to interrupt PIG from DUMP -ing a huge file/variable in grunt mode?

How do we interrupt the Pig DUMP command (EDIT: when it has completed the MapReduce jobs and is now just displaying the result on the grunt shell) without exiting the grunt shell?
Sometimes, if we DUMP a HUGE file by mistake, it goes on forever!
I know we can use CTRL+C to stop it, but that also quits the grunt shell and then we have to write all the commands again.
We can execute the following command in the grunt shell:
kill jobid
We can find the job’s ID by looking at Hadoop’s JobTracker GUI, which lists all jobs currently running on the cluster. Note that this command kills a particular MapReduce job. If the Pig job contains other MapReduce jobs that do not depend on the killed MapReduce job, these jobs will still continue. If you want to kill all of the MapReduce jobs associated with a particular Pig job, it is best to terminate the process running Pig using CTRL+C, and then use this command to kill any MapReduce jobs that are still running.
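A minimal usage sketch of that kill command, with a placeholder job ID (the ID can also be read from the output of mapred job -list rather than the JobTracker GUI):
grunt> kill job_201301010000_0001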

how to kill hadoop jobs

I want to kill all my Hadoop jobs automatically when my code encounters an unhandled exception. I am wondering what the best practice is for doing this?
Thanks
Depending on the version, do:
version <2.3.0
Kill a hadoop job:
hadoop job -kill $jobId
You can get a list of all jobId's doing:
hadoop job -list
version >=2.3.0
Kill a hadoop job:
yarn application -kill $ApplicationId
You can get a list of all ApplicationId's doing:
yarn application -list
Use of the following commands is deprecated:
hadoop job -list
hadoop job -kill $jobId
consider using
mapred job -list
mapred job -kill $jobId
Run list to show all the jobs, then use the jobID/applicationID in the appropriate command.
Kill mapred jobs:
mapred job -list
mapred job -kill <jobId>
Kill yarn jobs:
yarn application -list
yarn application -kill <ApplicationId>
An unhandled exception will (assuming it's repeatable, like bad data, as opposed to read errors from a particular data node) eventually fail the job anyway.
You can configure the maximum number of times a particular map or reduce task can fail before the entire job fails through the following properties:
mapred.map.max.attempts - The maximum number of attempts per map task. In other words, the framework will try to execute a map task this many times before giving up on it.
mapred.reduce.max.attempts - Same as above, but for reduce tasks.
If you want the job to fail at the first task failure, set these values from their default of 4 to 1.
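A hedged example of what that could look like when submitting the job from the command line, assuming the driver goes through ToolRunner so that -D options are picked up, and using the newer property names mapreduce.map.maxattempts / mapreduce.reduce.maxattempts (the jar, class, and paths are placeholders):
# fail the whole job on the first task failure instead of retrying 4 times
hadoop jar my-job.jar com.example.MyDriver \
  -D mapreduce.map.maxattempts=1 \
  -D mapreduce.reduce.maxattempts=1 \
  /input /output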
Simply forcefully kill the process ID; the Hadoop job will also be killed automatically. Use this command:
kill -9 <process_id>
e.g., process ID 4040 (namenode):
username#hostname:~$ kill -9 4040
Use the below commands to kill all jobs running on YARN.
For accepted jobs, use the below command.
for x in $(yarn application -list -appStates ACCEPTED | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done
For running jobs, use the below command.
for x in $(yarn application -list -appStates RUNNING | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done
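If you want to cover both states in one pass, -appStates also accepts a comma-separated list, so the two loops above can be folded into one:
for x in $(yarn application -list -appStates ACCEPTED,RUNNING | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done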
