I am running a few MapReduce jobs, and one job takes over all the reducer capacity. Is there a way to kill one or two reducer tasks to free up the cluster?
I could go directly to one of the TaskTracker servers and kill the Java process manually, but I am wondering if there is a cleaner way to do this?
You can kill a task attempt with:
hadoop job -kill-task [task_attempt_id]
To get the task attempt ID, you need to drill one level deeper into the task (by clicking on the task hyperlink in the JobTracker UI).
First find the job ID:
hadoop job -list
Now, kill the job:
hadoop job -kill <job_ID_goes_here>
hadoop job -kill-task [attempt-id], where the attempt ID can be obtained from the JobTracker UI.
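If you would rather stay on the command line, the attempt IDs can also be listed without the UI. A rough sketch (hedged: the -list-attempt-ids option and its task-type/task-state arguments may differ slightly between Hadoop versions):
hadoop job -list
hadoop job -list-attempt-ids <job_ID_goes_here> reduce running
hadoop job -kill-task <task_attempt_id>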
I have a requirement where I need to monitor Hadoop jobs (Hive/MapReduce, Spark) that run for a long time in the cluster, say 3 hours or more. I know I can view all these jobs in the UI, but I need to check every hour or every 30 minutes and send an email/alert if a job has been running for more than 3 hours. Is there a way to do this?
My environment is HDP 2.6
Thanks in Advance....
You can look into Oozie. Oozie allows you to configure alerts if a job exceeds its expected run-time.
In order to use this feature you'd have to submit your job as an Oozie workflow.
http://oozie.apache.org/docs/4.2.0/DG_Overview.html
https://oozie.apache.org/docs/4.3.0/DG_SLAMonitoring.html#SLA_Definition_in_Workflow
As tk421 mentions, Oozie is the "right" way to do this in the context of Hadoop.
However, if you do not require all that overhead, something simple like an on-demand watchdog timer (e.g. wdt.io) might be sufficient. Basically the workflow is: send the start signal, start the job, and send an end signal when the job completes. If the second signal does not arrive within the allotted amount of time, an email/SMS alert is dispatched.
This method would work for non-Hadoop workflows as well.
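If neither Oozie nor an external watchdog is an option, a simple cron-driven script against the YARN CLI can approximate the periodic check described in the question. This is only a sketch under a few assumptions: the cluster runs YARN (as in HDP 2.6), `yarn application -status` reports Start-Time in epoch milliseconds, and a working `mail` command exists on the host; the 3-hour threshold and the alert address are placeholders.
#!/bin/bash
# Alert if any running YARN application has exceeded the allowed runtime.
MAX_SECONDS=$((3 * 60 * 60))            # 3-hour threshold
NOW_MS=$(( $(date +%s) * 1000 ))
for app in $(yarn application -list -appStates RUNNING 2>/dev/null | awk '/^application_/ {print $1}'); do
  # Start-Time is printed in epoch milliseconds by `yarn application -status`.
  start_ms=$(yarn application -status "$app" 2>/dev/null | awk '/Start-Time/ {print $NF}')
  [ -z "$start_ms" ] && continue        # application may have finished in the meantime
  elapsed=$(( (NOW_MS - start_ms) / 1000 ))
  if [ "$elapsed" -gt "$MAX_SECONDS" ]; then
    echo "Application $app has been running for $elapsed seconds" \
      | mail -s "Long-running Hadoop job: $app" ops@example.com   # placeholder address
  fi
done
Run it from cron every 30 minutes, e.g. */30 * * * * /path/to/check_long_jobs.sh.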
I am not sure if this is something that has been fixed in newer releases of Hadoop, but I'm currently locked into running Hadoop 0.20 (legacy code).
Here's the issue: when I launch a Hadoop job, there is a "Job setup" task that needs to run first. It seems to me that Hadoop randomly picks whether this task runs in a map slot or a reduce slot.
We have more map-task capacity configured than reduce-task capacity, so whenever I get unlucky and the setup task lands in a reduce slot, it takes forever for my job to even start running. Any ideas how to overcome this?
A Hadoop job first completes all of your map tasks. Once all the map tasks are completed, the data is shuffled and sorted across the network, and only then do your reduce tasks start processing. So I guess there is possibly some other reason for this delay.
I'm new to Hadoop and I would like to know what happens when the "single point of failure" JobTracker node goes down while map tasks are either running or writing their output. Would the JobTracker start all the map tasks over again?
The JobTracker is a single point of failure, meaning that if it goes down you won't be able to submit any additional MapReduce jobs, and existing jobs will be killed.
When you restart your JobTracker, you would need to resubmit the whole job again.
After a job has completed, how can I find out how many nodes the job actually ran on, how many map tasks it used, and how many reduce tasks?
Thanks....
You could use the JobTracker UI for this. It runs on port 50030 by default, and the URL would look like http://myhost:50030/.
Once you go there, you can see how many mappers and reducers were used by your job. You can explore further by clicking on the job link itself.
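If you prefer the command line over the UI, the job client can print the same numbers as counters once the job is done. A hedged example (the exact counter names vary a little between Hadoop versions):
hadoop job -status <job_ID_goes_here>
# Look for the "Launched map tasks" and "Launched reduce tasks" counters in the output.
Note that the counters tell you how many tasks ran, not which nodes they ran on; for the node breakdown you still need to drill into the task pages in the JobTracker UI.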
This is how Hadoop currently works: If a reducer fails (throws a NullPointerException for example), Hadoop will reschedule another reducer to do the task of the reducer that failed.
Is it possible to configure Hadoop not to reschedule failed reducers, i.e. if any reducer fails, Hadoop merely reports the failure and does nothing else?
Of course, the reducers that did not fail will continue to completion.
You can set the mapred.reduce.max.attempts property using the Configuration class, or in job.xml.
Setting it to 1 (a single attempt, so no retries) should solve your problem.
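For example, if your driver uses ToolRunner/GenericOptionsParser, the same property can be passed on the command line instead of being hard-coded (a sketch only; my-job.jar and the class name are placeholders, and on newer releases the property is named mapreduce.reduce.maxattempts):
hadoop jar my-job.jar com.example.MyJob -D mapred.reduce.max.attempts=1 <input_dir> <output_dir>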
If you set the configuration not to reschedule failed tasks, then as soon as the first one fails your JobTracker will fail the job and kill the currently running tasks. So what you want to do is pretty much impossible.