How to interrupt Pig from DUMP-ing a huge file/variable in grunt mode? - hadoop

How do we interrupt the Pig DUMP command (EDIT: when it has completed the MapReduce jobs and is now just displaying the result on the grunt shell) without exiting the grunt shell?
Sometimes, if we DUMP a huge file by mistake, it goes on forever!
I know we can use CTRL+C to stop it, but that also quits the grunt shell and then we have to write all the commands again.

We can execute the following command in the grunt shell:
kill jobid
We can find the job’s ID by looking at Hadoop’s JobTracker GUI, which lists all jobs currently running on the cluster. Note that this command kills a particular MapReduce job. If the Pig job contains other MapReduce jobs that do not depend on the killed MapReduce job, these jobs will still continue. If you want to kill all of the MapReduce jobs associated with a particular Pig job, it is best to terminate the process running Pig using CTRL+C, and then use this command to kill any MapReduce jobs that are still running.
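A minimal sketch of that workflow (the job ID below is a placeholder; take the real one from the JobTracker GUI or from hadoop job -list in another shell):
grunt> kill job_201301021234_0042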

Related

HBase export task mysteriously stopped logging to output file

I recently attempted to do an export of a table from an HBase instance using a 10 data node Hadoop cluster. The command line looked like the following:
nohup hbase org.apache.hadoop.hbase.mapreduce.Export documents /export/documents 10 > ~/documents_export.out &
As you can see, I ran the process under nohup so it wouldn't die prematurely when my SSH session closed, and I put the whole thing in the background. To capture the output, I redirected it to a file.
As expected, the process started to run, and in fact it ran for several hours before the output to that file mysteriously stopped. It stopped at about 31% through the map phase of the MapReduce job being run. However, per Hadoop, the MapReduce job itself was still going and in fact ran to completion by the next morning.
So, my question is why did output stop going to my log file? My best guess is that the parent HBase process I invoked exited normally when it was done with the initial setup for the mapreduce job involved in the export.
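One way to check on the job independently of the client process is to query it or pull its logs directly. A hedged sketch only: the job and application IDs below are placeholders, and the yarn command applies only if the cluster runs YARN:
hadoop job -status job_201305221234_0001
yarn logs -applicationId application_1369000000000_0001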

Kill hive queries without exiting from hive shell

Is there any way we can kill a Hive query without exiting from the Hive shell? For example, I mistakenly ran a SELECT statement on a table that has millions of rows of data, and I just wanted to stop it without exiting the shell. If I press CTRL+Z, it drops out of the shell.
You have two options:
press Ctrl+C and wait till the command terminates; it will not exit from the Hive CLI. Press Ctrl+C a second time and the session will terminate immediately, exiting to the shell
from another shell run
yarn application -kill <Application ID> or
mapred job -kill <JOB_ID>
First, look up the job ID:
hadoop job -list
And then kill it by ID:
hadoop job -kill <JOB_ID>
Go with the second option: yarn application -kill <Application ID>. Get the application ID from another session.
This is the only way I think you would be able to kill the current query. I use it via Beeline on the Hortonworks framework.
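A minimal end-to-end sketch of that second option (the application ID below is a placeholder; take the real one from the list output in another shell):
yarn application -list -appStates RUNNING
yarn application -kill application_1400000000000_0007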

LSF - BSUB Running a script if the job is killed

I'm working with LSF, running bsub commands.
I'm using the -Ep switch to run a post-exec script. This works great until the job is killed or hits a memory limit, run limit, etc.
Is there any way for the job to detect that it is running out of resources and then run the script? Or to force it to run the script even if it has been killed?
I guess my other option is submitting a second job with a dependency on that job, which will run the "post exec" script when it finishes (see the sketch below).
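A rough sketch of that dependency approach, with a placeholder job ID of 12345 standing in for the main job's ID:
bsub -w "ended(12345)" ./post_exec.sh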
Any thoughts?
Kind Regards,
TheBigPeeler
From the documentation, you should be seeing the behaviour that you want.
A post-execution command runs after the job finishes, regardless of the exit state of the job. Once a post-execution command is associated with a job, that command runs even if the job fails. You cannot configure the post-execution command to run only under certain conditions.
I thought that maybe the interaction with JOB_INCLUDE_POSTEXEC (lsb.params) could account for the difference, but in my test the post-exec still runs in both cases. I used the run limit (bsub -W) to trigger the job kill.
Is it possible that the post exec is running, but exits early?
What version of LSF are you using? (What's the output of mbatchd -V and sbatchd -V)
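A minimal sketch of that kind of test submission (the script paths and the one-minute run limit are placeholders, not the original commands):
bsub -W 1 -Ep "/path/to/post_exec.sh" ./long_running_job.sh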

EMR kill PIG script

Is there a way of killing a running Pig script, not only the current Hadoop job?
As you know, a Pig script is translated into a DAG of Hadoop jobs. Assume everything runs smoothly up to some point in this graph but, for some reason, I want to stop the execution of this script/"DAG". Is there an EMR command to do that?
I tried to kill the current Hadoop job, and it looks like the execution of the Pig script is CANCELLED, but the cluster/master node is left in a weird state which makes all subsequent Pig scripts fail instantly.
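One hedged sketch, an assumption rather than something the thread confirms: kill the Pig client process driving the script on the master node, then clean up any MapReduce jobs it left behind. The PID and job ID below are placeholders:
ps -ef | grep '[p]ig'      # find the Pig client process on the master node
kill <pig_client_pid>
hadoop job -list           # then kill any MapReduce jobs still running
hadoop job -kill <job_id>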

Why does scheduling Spark jobs through cron fail (while the same command works when executed on terminal)?

I am trying to schedule a Spark job using cron.
I have made a shell script and it executes well from the terminal.
However, when I execute the script using cron, it gives me an insufficient memory to start JVM thread error.
Every time I start the script from the terminal there is no issue. The issue only appears when the script is started by cron.
Could you suggest something?
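A hedged sketch of one setup to try, assuming the difference comes from cron's minimal environment; every path, schedule, and memory value below is a placeholder:
#!/bin/bash
# run_spark_job.sh: load the environment cron does not provide, then submit
source /home/user/.bash_profile
"$SPARK_HOME"/bin/spark-submit --master yarn --driver-memory 2g --executor-memory 2g /home/user/jobs/my_job.py

# crontab entry: run the wrapper and capture its output for debugging
0 2 * * * /home/user/run_spark_job.sh >> /home/user/spark_cron.log 2>&1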
