I have seen the following behavior several times and cannot figure out why it happens.
I have a bash script like this:
echo "execute UidGenerator"
hadoop jar foo.jar com.xyz.platform.UidGenerator input_path output_path
echo "execute UidAggregator"
hadoop jar foo.jar com.xyz.platform.UidAggregator input_path output_path
The UidAggregator should not start until UidGenerator finishes. However, I saw the following logs, where the second job started while the first was still running.
16/08/22 07:46:42 INFO mapred.JobClient: map 100% reduce 68%
16/08/22 07:47:29 INFO mapred.JobClient: map 100% reduce 69%
16/08/22 07:49:01 INFO mapred.JobClient: map 100% reduce 70%
execute UidAggregator
The shell command that follows a 'hadoop jar' command starts while the job is still running (the JobTracker shows the job running normally).
Has anyone seen this behavior? I thought a bash script would not execute the next command until the previous one exits.
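Ordering can be made explicit in plain shell, independently of Hadoop: `&&` only runs the next command when the previous one exits with status 0. A minimal sketch, with `step_one`/`step_two` as stand-ins for the two `hadoop jar` invocations above so the control flow can be demonstrated without a cluster:

```shell
# Stand-ins for the two 'hadoop jar' invocations from the script above.
step_one() { echo "UidGenerator done"; }
step_two() { echo "UidAggregator done"; }

# '&&' guarantees step_two only starts after step_one exits with status 0.
step_one && step_two

# The real script would be:
#   hadoop jar foo.jar com.xyz.platform.UidGenerator input_path output_path && \
#   hadoop jar foo.jar com.xyz.platform.UidAggregator input_path output_path
```

Note that if the client process detaches or backgrounds itself, the shell sees it as exited and `&&` cannot help; the script would then have to poll the JobTracker instead.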
Related
I recently attempted to do an export of a table from an HBase instance using a 10 data node Hadoop cluster. The command line looked like the following:
nohup hbase org.apache.hadoop.hbase.mapreduce.Export documents /export/documents 10 > ~/documents_export.out &
As you can see, I nohup the process so it wouldn't prematurely die when my SSH session closed, and I put the whole thing in the background. To capture the output, I directed it to a file.
As expected, the process started to run, and in fact ran for several hours before output to the file mysteriously stopped, at about 31% through the map phase of the MapReduce job. However, per Hadoop, the MapReduce job itself was still going and in fact ran to completion by the next morning.
So, my question is why did output stop going to my log file? My best guess is that the parent HBase process I invoked exited normally when it was done with the initial setup for the mapreduce job involved in the export.
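Whatever the cause, one thing worth ruling out is stderr not being captured: `>` alone redirects only stdout, and MapReduce clients commonly write progress lines to stderr. The redirection behavior itself can be shown with plain shell, no HBase needed:

```shell
# A stand-in process that writes to both streams, like a job client does.
demo() { echo "setup output on stdout"; echo "progress on stderr" >&2; }

demo > /tmp/stdout_only.log 2>/dev/null   # stderr is lost
demo > /tmp/both_streams.log 2>&1         # stderr is captured as well

# The original command, with 2>&1 added so both streams reach the file:
#   nohup hbase org.apache.hadoop.hbase.mapreduce.Export documents \
#     /export/documents 10 > ~/documents_export.out 2>&1 &
```

This is only a hedge, not a confirmed diagnosis; if the missing lines were on stderr they would never have appeared in the file at all, so it is worth checking against the parent-process theory above.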
I want to write a shell script which submits a MapReduce job with the command hadoop jar <example.jar> <main-class>. How can I get the ID of the job submitted by that command in the shell script, right after the command is invoked?
I know that the command hadoop job -list can display all jobs' IDs, but in that case I can't tell which job is submitted by the shell script.
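One common approach, assuming the classic JobClient log format (which prints a line like `Running job: job_...`), is to capture the client's output and grep the first job id out of it. A sketch, simulated with a canned log line so it runs without a cluster:

```shell
# Canned JobClient line standing in for real 'hadoop jar ... 2>&1' output.
sample_log='16/08/22 07:46:01 INFO mapred.JobClient: Running job: job_201608220746_0001'

# Extract the first token that matches the job_<cluster-ts>_<seq> pattern.
job_id=$(printf '%s\n' "$sample_log" | grep -o 'job_[0-9]*_[0-9]*' | head -n 1)
echo "$job_id"

# Live version (assumption: the log format above; 2>&1 because the client
# writes progress to stderr):
#   hadoop jar example.jar MainClass 2>&1 | tee job.log
#   job_id=$(grep -o 'job_[0-9]*_[0-9]*' job.log | head -n 1)
```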
How do we interrupt pig dump command (EDIT: when it has completed the MapReduce jobs and is now just displaying the result on grunt shell) without exiting the grunt shell?
Sometimes, if we dump a HUGE file by mistake, it goes on forever!
I know we can use CTRL+C to stop it but it also quits the grunt shell and then we have to write all the commands again.
We can execute the following command in the grunt shell
kill jobid
We can find the job’s ID by looking at Hadoop’s JobTracker GUI, which lists all jobs currently running on the cluster. Note that this command kills a particular MapReduce job. If the Pig job contains other MapReduce jobs that do not depend on the killed MapReduce job, these jobs will still continue. If you want to kill all of the MapReduce jobs associated with a particular Pig job, it is best to terminate the process running Pig using CTRL+C, and then use this command to kill any MapReduce jobs that are still running.
I want to kill all my hadoop jobs automatically when my code encounters an unhandled exception. I am wondering what is the best practice to do it?
Thanks
Depending on the version, do:
version <2.3.0
Kill a hadoop job:
hadoop job -kill $jobId
You can get a list of all jobId's doing:
hadoop job -list
version >=2.3.0
Kill a hadoop job:
yarn application -kill $ApplicationId
You can get a list of all ApplicationId's doing:
yarn application -list
Use of the following commands is deprecated:
hadoop job -list
hadoop job -kill $jobId
Consider using instead:
mapred job -list
mapred job -kill $jobId
Run list to show all the jobs, then use the jobID/applicationID in the appropriate command.
Kill mapred jobs:
mapred job -list
mapred job -kill <jobId>
Kill yarn jobs:
yarn application -list
yarn application -kill <ApplicationId>
An unhandled exception will (assuming it's repeatable like bad data as opposed to read errors from a particular data node) eventually fail the job anyway.
You can configure the maximum number of times a particular map or reduce task can fail before the entire job fails through the following properties:
mapred.map.max.attempts - The maximum number of attempts per map task. In other words, the framework will try to execute a map task this many times before giving up on it.
mapred.reduce.max.attempts - Same as above, but for reduce tasks
If you want to fail the job at the first failure, set these values from their default of 4 to 1.
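These properties can also be set per job from the command line via the generic `-D` option, assuming the driver class parses generic options (e.g. via ToolRunner); on newer releases the equivalent names are `mapreduce.map.maxattempts` and `mapreduce.reduce.maxattempts`. A configuration sketch (jar, class, and paths are placeholders):

```shell
hadoop jar example.jar MainClass \
    -Dmapred.map.max.attempts=1 \
    -Dmapred.reduce.max.attempts=1 \
    input_path output_path
```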
Simply kill the client process forcefully; the hadoop job will also be killed automatically. Use this command:
kill -9 <process_id>
e.g., process ID 4040 (namenode):
username#hostname:~$ kill -9 4040
Use the commands below to kill all jobs running on YARN.
For accepted jobs:
for x in $(yarn application -list -appStates ACCEPTED | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done
For running jobs:
for x in $(yarn application -list -appStates RUNNING | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done
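To make the cleanup automatic, as the original question asked, a loop like the one above can be wrapped in a function and attached to the submitting command with `||`, so it runs only when the job client exits non-zero (which it does when the driver dies on an unhandled exception). A minimal sketch with a stub in place of `hadoop jar`, and the real cleanup body shown in comments (assumption: YARN >= 2.3):

```shell
kill_all_jobs() {
  echo "killing outstanding jobs"
  # Real body (assumption: YARN >= 2.3):
  #   for x in $(yarn application -list -appStates RUNNING | awk 'NR > 2 { print $1 }'); do
  #     yarn application -kill "$x"
  #   done
}

submit() { false; }   # stand-in for a failing 'hadoop jar ...' invocation

# '||' runs the cleanup only when the submit step exits with non-zero status.
submit || kill_all_jobs
```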
Scenario:
When I run the query below on the Hive CLI, I get these errors:
Query:
$ bin/hive -e "insert overwrite table pokes select a.* from invites a where a.ds='2008-08-15';"
Error is like this:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201111291547_0013, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201111291547_0013
Kill Command = C:\cygwin\home\Bhavesh.Shah\hadoop-0.20.2/bin/hadoop job
-Dmapred.job.tracker=localhost:9101 -kill job_201111291547_0013
2011-12-01 14:00:52,380 Stage-1 map = 0%, reduce = 0%
2011-12-01 14:01:19,518 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201111291547_0013 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Question:
So my question is: how do I stop a job? In this case the job is job_201111291547_0013.
Please help me out so that I can fix these errors and move on.
Thanks.
You can stop a job by running hadoop job -kill <job_id>.
hadoop job -kill is deprecated now.
Use mapred job -kill instead.
The log traces of the launched job also provide the command to kill it; you can use that to kill the job. That, however, gives a warning that hadoop job -kill is deprecated. You can instead use
mapred job -kill
One more option is to try the WebHCat API from a browser or the command line, using a utility like curl. Here's the WebHCat API to delete a Hive job.
Also note that the linked page says:
The job is not immediately deleted, therefore the information returned may not reflect deletion, as in our example. Use GET jobs/:jobid to monitor the job and confirm that it is eventually deleted.
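That DELETE call can be issued with curl; an illustrative sketch, not runnable without a live WebHCat (Templeton) server — the hostname, user, and job id are placeholders, the port 50111 is the WebHCat default, and `user.name` is required in unsecured setups:

```shell
curl -s -X DELETE \
  'http://webhcat-host:50111/templeton/v1/jobs/job_201111291547_0013?user.name=hive'
```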