Does the hadoop job -list command show jobs in a state other than 1?

I know that the hadoop job -list command lists currently running jobs, i.e. jobs whose state is 1 (Running). But does it also list jobs that have failed? I mean, can I get output something like this:
1 jobs currently running
JobId State StartTime UserName
job_200808111901_0001 3 1218506470390 abc
job_200808111901_0002 2 1218506470390 xyz
Note that the states of the above jobs are 3 (Failed) and 2 (Succeeded).
I am very new to Hadoop, so please pardon me if this is too simple a question. I tried to Google it, but all the examples listed only jobs with state 1.

Just add all and you will get what you desire.
Execute something like: hadoop job -list all.
An example output would look something like the following:
$ hadoop job -list all
0 jobs submitted
States are:
Running : 1 Succeded : 2 Failed : 3 Prep : 4
JobId State StartTime UserName Priority SchedulingInfo
For more details about the hadoop commands, and especially hadoop job, read here.
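To answer the original question directly, you can filter that listing for a particular state. A minimal sketch, assuming the column layout shown above (JobId first, numeric state second); the exact columns can vary between Hadoop versions:
# show only failed jobs (state 3) from the full listing
hadoop job -list all | awk '$1 ~ /^job_/ && $2 == 3 { print }'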

Related

How to dynamically choose PBS queues during job submission

I run a lot of small computing jobs on a remote cluster where job submission is managed by PBS. Normally, in a PBS (bash) script, I specify the queue I would like to submit the job to with the command
#PBS -q <queue_name>
The queue that I need to choose depends on the load on each specific queue. Every time before I submit a job, I check this with the following command in the terminal:
qstat -q
which gives me output that looks like the following:
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
queue1 -- -- 03:00:00 -- 0 2 -- E R
queue2 -- -- 06:00:00 -- 8 6 -- E R
I would like the job script to select the queue automatically, based on two constraints:
The selected queue must have a walltime greater than the job time specified. The job time is specified with #PBS -l walltime=02:30:00.
The queue must have the fewest jobs in the Que column shown in the output above.
I'm having trouble identifying which tools I need to use in the terminal to automate the queue selection.
You could wrap your qsub submission in another script that runs qstat -q, parses the output, and then selects the queue based on the requested walltime and how many queued jobs each queue has. The script could then submit the job, adding -q <name of desired queue> to the qsub command, as in the sketch below.
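A minimal sketch of such a wrapper, assuming the qstat -q column layout shown above (walltime in column 4, queued-job count in column 7) and HH:MM:SS walltimes; the job script name and the column positions are assumptions you would adjust for your site:
#!/bin/bash
# Hypothetical wrapper: pick the queue with the fewest queued jobs whose walltime
# limit covers the requested job time, then submit with qsub.
job_walltime="02:30:00"
job_script="myjob.pbs"

to_seconds() {                        # convert HH:MM:SS to seconds
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
need=$(to_seconds "$job_walltime")

# Keep only data rows (walltime field looks like a time); print queue, walltime, queued count.
best_queue=$(qstat -q |
    awk '$4 ~ /^[0-9]+:[0-9]+:[0-9]+$/ { print $1, $4, $7 }' |
    while read -r name wall que; do
        if [ "$(to_seconds "$wall")" -ge "$need" ]; then
            echo "$que $name"
        fi
    done | sort -n | head -1 | awk '{ print $2 }')

if [ -n "$best_queue" ]; then
    qsub -q "$best_queue" "$job_script"
else
    echo "No queue satisfies walltime $job_walltime" >&2
    exit 1
fi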
However, it seems that you are manually trying to do some of what a scheduler - with appropriate policies - does for you. Why do you need to dynamically switch queues? A better setup would be for the queues to essentially categorize the jobs - like you are already doing with walltime - and then allowing the scheduler to run the jobs appropriately. Any setup where a user needs to carefully select the queue seems a little suspect to me.

Meaning of hadoop job status

I run the command hadoop job -list all to show all the submitted jobs, and it lists the state meanings as: Running : 1 Succeded : 2 Failed : 3 Prep : 4.
But now I have a job whose status is 5. The list command output looks like this: job_201209101415_429766 5 1358332807055 user NORMAL NA
Does anybody know what it means?
Thank you!
Job status 5 means Killed. Please check this web link:
http://hadoop.apache.org/docs/r0.20.2/api/constant-values.html#org.apache.hadoop.mapred.JobStatus.KILLED

how to kill hadoop jobs

I want to kill all my hadoop jobs automatically when my code encounters an unhandled exception. I am wondering what is the best practice to do it?
Thanks
Depending on the version, do:
version <2.3.0
Kill a hadoop job:
hadoop job -kill $jobId
You can get a list of all jobId's doing:
hadoop job -list
version >=2.3.0
Kill a hadoop job:
yarn application -kill $ApplicationId
You can get a list of all ApplicationId's doing:
yarn application -list
Use of the following commands is deprecated:
hadoop job -list
hadoop job -kill $jobId
Consider using instead:
mapred job -list
mapred job -kill $jobId
Run list to show all the jobs, then use the jobID/applicationID in the appropriate command.
Kill mapred jobs:
mapred job -list
mapred job -kill <jobId>
Kill yarn jobs:
yarn application -list
yarn application -kill <ApplicationId>
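To tie this back to the original question of killing jobs automatically when the code hits an unhandled exception, one possible sketch is a driver wrapper that cleans up on failure. The job-name filter, jar name, and class name below are placeholders, not part of any answer above:
#!/bin/bash
# Hypothetical wrapper: if the driver exits with an error (e.g. an unhandled
# exception), kill every YARN application matching our job name prefix.
set -e
cleanup() {
    for app in $(yarn application -list -appStates ACCEPTED,RUNNING |
                 awk '/my-job-prefix/ { print $1 }'); do
        yarn application -kill "$app"
    done
}
trap cleanup ERR

hadoop jar my-job.jar com.example.MyDriver /input /output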
An unhandled exception will (assuming it's repeatable like bad data as opposed to read errors from a particular data node) eventually fail the job anyway.
You can configure the maximum number of times a particular map or reduce task can fail before the entire job fails through the following properties:
mapred.map.max.attempts - The maximum number of attempts per map task. In other words, the framework will try to execute a map task this many times before giving up on it.
mapred.reduce.max.attempts - Same as above, but for reduce tasks.
If you want the job to fail out at the first task failure, change these values from their default of 4 to 1.
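A hedged example of passing these properties at submission time, assuming a driver that implements Tool (so it parses -D options); the jar name, class name, and paths are placeholders:
# fail the whole job on the first failed map or reduce task attempt
hadoop jar my-job.jar com.example.MyDriver \
    -Dmapred.map.max.attempts=1 \
    -Dmapred.reduce.max.attempts=1 \
    /input/path /output/path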
Simply kill the process ID forcefully; the hadoop job will also be killed automatically. Use this command:
kill -9 <process_id>
e.g. process ID 4040 (namenode):
username#hostname:~$ kill -9 4040
Use the commands below to kill all jobs running on YARN.
For accepted jobs, use:
for x in $(yarn application -list -appStates ACCEPTED | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done
For running jobs, use:
for x in $(yarn application -list -appStates RUNNING | awk 'NR > 2 { print $1 }'); do yarn application -kill $x; done

DATASTAGE: how to run multiple job instances in parallel using DSJOB

I have a question.
I want to run multiple instances of the same job in parallel from within a script: I have a loop in which I invoke the jobs with dsjob, without the "-wait" and "-jobstatus" options.
I want the jobs to complete before the script terminates, but I don't know how to verify whether a job instance has terminated.
I thought of using the wait command, but it is not appropriate.
Thanks in advance
First, make sure the job compile option "Allow Multiple Instance" is selected.
Second:
#!/bin/bash
. /home/dsadm/.bash_profile          # load the DataStage environment (DSHOME etc.)
INVOCATION=(1 2 3 4 5)               # invocation ids to run
cd $DSHOME/bin
for id in "${INVOCATION[@]}"
do
    ./dsjob -run -mode NORMAL -wait test demo.$id
done
project -- test
job -- demo
$id -- invocation id
The first two lines in the shell script guarantee that the environment path works.
Run the jobs as you describe, without -wait, and then loop around running dsjob -jobinfo, parsing the output for a job status of 1 or 2. When all jobs return one of these statuses, they are all finished.
You might find, though, that you check the status of the job before it actually starts running and you might pick up an old status. You might be able to fix this by first resetting the job instance and waiting for a status of "Not running", prior to running the job.
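A minimal sketch of such a polling loop, reusing the project, job, and invocation ids from the script above; the grep pattern and the "(1)"/"(2)" status markers are assumptions about the dsjob -jobinfo output format and may need adjusting:
#!/bin/bash
# Hypothetical polling loop: start all instances without -wait, then poll until
# each one reports finished (1) or finished with warnings (2).
INVOCATION=(1 2 3 4 5)
for id in "${INVOCATION[@]}"; do
    ./dsjob -run -mode NORMAL test demo.$id
done
for id in "${INVOCATION[@]}"; do
    until ./dsjob -jobinfo test demo.$id | grep -Eiq 'job status.*\((1|2)\)'; do
        sleep 10                      # wait before checking this instance again
    done
done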
Invoke the jobs in a loop without the -wait or -jobstatus option.
After your loop, check the job status with the dsjob command.
Example - dsjob -jobinfo projectname jobname.invocationid
You can code one more loop for this as well and use the sleep command inside it.
Write your further logic based on the status of the jobs.
However, it is better to create a Job Sequence to invoke this multi-instance job simultaneously with the help of different invocation ids.
Create a sequence job if these are in the same process.
Create different sequences, or directly create different scripts to trigger these jobs simultaneously with invocation ids, and schedule them at the same time.
The best option is to create a standard, generalized script where everything is created or gets its value from the input command-line parameters.
Example - log files based on jobname + invocation-id.
Then schedule the same script for different parameters or invocations.

How to stop a particular job while running Hive queries on Hadoop?

Scenario:
When I enter the query on the Hive CLI, I get the errors below:
Query:
$ bin/hive -e "insert overwrite table pokes select a.* from invites a where a.ds='2008-08-15';"
The error is like this:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201111291547_0013, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201111291547_0013
Kill Command = C:\cygwin\home\Bhavesh.Shah\hadoop-0.20.2/bin/hadoop job
-Dmapred.job.tracker=localhost:9101 -kill job_201111291547_0013
2011-12-01 14:00:52,380 Stage-1 map = 0%, reduce = 0%
2011-12-01 14:01:19,518 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201111291547_0013 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
Question:
So my question is: how do I stop a job? In this case the job is job_201111291547_0013.
Please help me out so that I can remove these errors and try the next query.
Thanks.
You can stop a job by running hadoop job -kill <job_id>.
hadoop job -kill is deprecated now.
Use mapred job -kill instead.
The log trace of the launched job also provides the command to kill the job. You can use that to kill it.
That, however, gives a warning that hadoop job -kill is deprecated. You can instead use
mapred job -kill
One more option is to use the WebHCat API from a browser or the command line, with a utility like curl. Here is the WebHCat API call to delete a Hive job.
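A hedged example with curl, assuming WebHCat is listening on its default port 50111; the hostname, user name, and job id are placeholders:
# ask WebHCat (Templeton) to kill the job; the response echoes the job's last known state
curl -s -X DELETE "http://webhcat-host:50111/templeton/v1/jobs/job_201111291547_0013?user.name=hiveuser"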
Also note that the link says that
The job is not immediately deleted, therefore the information returned may not reflect deletion, as in our example. Use GET jobs/:jobid to monitor the job and confirm that it is eventually deleted.
