Hadoop: Do not re-schedule a failed reducer

This is how Hadoop currently works: if a reducer fails (throws a NullPointerException, for example), Hadoop will schedule another attempt to do the work of the reducer that failed.
Is it possible to configure Hadoop not to reschedule failed reducers, i.e. if any reducer fails, Hadoop merely reports the failure and does nothing else?
Of course, the reducers that did not fail would continue to completion.

You can set the mapred.reduce.max.attempts property, either through the Configuration class in code or in the job configuration (job.xml).
Since the property is the maximum number of attempts per task, setting it to 1 (a single attempt, no retries) should solve your problem.
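As a minimal sketch, assuming the classic pre-YARN (Hadoop 1.x) property name; note that the value is the maximum number of attempts per reduce task, so 1 means one attempt and no retries:

```xml
<!-- job.xml / mapred-site.xml: fail the reduce task on its first failure -->
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>1</value>
</property>
```

The same value can be set in code with conf.setInt("mapred.reduce.max.attempts", 1) before submitting the job.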

If you configure the job not to reschedule failed tasks, then as soon as the first task fails the JobTracker will fail the whole job and kill the currently running tasks. So what you want to do is pretty much impossible.

Hadoop 0.20: "job setup" task

I am not sure if this is something that has been fixed for newer releases of Hadoop, but I'm currently locked into running Hadoop 0.20 (legacy code).
Here's the issue: when I launch a Hadoop job, there is a "job setup" task that needs to run first. It seems to me that Hadoop randomly picks this task to be either a map task or a reduce task.
We have more capacity configured for map tasks than for reduce tasks, so whenever I get unlucky and the setup task lands in a reduce slot, it takes forever for my job to even start running. Any ideas how to overcome this?
A Hadoop job first completes all of your map tasks. Only once all the map tasks are complete does the intermediate data go across the network for shuffling and sorting, and only then do your reduce tasks start processing. So I guess there could possibly be some other reason for this delay.

"single point of failure" job tracker node goes down and Map jobs are either running or writing the output

I'm new to Hadoop and I would like to know what happens when the "single point of failure" JobTracker node goes down while map jobs are either running or writing their output. Would the JobTracker start all map jobs over again?
The JobTracker is a single point of failure, meaning that if it goes down you won't be able to submit any additional map/reduce jobs, and existing jobs will be killed.
When you restart your JobTracker, you will need to resubmit the whole job again.

How to check that MapReduce is running in parallel?

I submitted a MapReduce job and checked the log.
In the log I see that there are many mappers, each mapper processes one split, and the processing details of each mapper are logged sequentially in time.
However, I would like to check whether my job is running in parallel, and I want to see how many mappers are running concurrently.
I don't know where to find this information.
Please help me, thanks!
Use the following JobTracker web UI and drill down to the executing MapReduce job:
http://<Jobtracker-HostName>:50030/

How to fake task reporting in a Hadoop job?

I am using Hadoop 1.0.3 to run some data-crunching jobs. My reducer does not write to HDFS; instead, I make my reducer write the results directly to MongoDB. Recently I have started to face a problem: my jobs sometimes time out and restart, and the message I get from the Hadoop console is "Task attempt_201301241103_0003_m_000001_0 failed to report status for 601 seconds". So I think the problem lies with my approach, which is to write to MongoDB instead of HDFS. I want to fake the Hadoop job status report. How can I do that? Please help.
Also, I have observed that my reducer always remains at 0% and only the map phase shows a steady increase in %. As soon as the job completes, the reducer jumps to 100% all of a sudden.
Thankyou,
Regards,
Mohsin
The message you are seeing on the console is from a map phase; notice the "m" in the attempt ID. To keep sending progress, you can call context.progress() in the map method.
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/StatusReporter.html
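The general pattern can be sketched standalone. The progress callback below stands in for Hadoop's context.progress(); everything else (the function name, the record loop) is illustrative, not the MongoDB connector's API:

```python
def process_records(records, report_progress, every=1000):
    """Process records while pinging a progress callback periodically.

    In a real Hadoop map or reduce method the callback would be
    context.progress(); it is injected here so the pattern can run
    standalone.
    """
    done = 0
    for record in records:
        # ... slow per-record work (e.g. a write to MongoDB) goes here ...
        done += 1
        if done % every == 0:
            report_progress()  # tell the framework the task is still alive
    return done

pings = []
processed = process_records(range(5000), lambda: pings.append(1))
# 5000 records with a ping every 1000 records -> 5 progress reports
```

As long as the callback fires more often than the task timeout (mapred.task.timeout, 600 seconds by default), the attempt will not be killed for failing to report status.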

How to debug a hung hadoop map-reduce job

I run an MR job; the map phase runs successfully, but the reduce phase climbs to 33% and hangs (for about an hour) with status "reduce > sort".
How can I debug it?
It may have nothing to do with your case, but I had this happen when iptables (the firewall) was misconfigured on one node. Whenever that node was assigned a reducer, the reduce phase would hang at 33%. Check the error logs to make sure the connections are working, especially if you have recently added new nodes and/or configured them manually.
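One detail that helps interpret the 33%: in Hadoop 1.x a reduce task's progress is reported as three equal thirds (copy/shuffle, then sort, then the actual reduce() calls), so a hang right at 33% with status "reduce > sort" means the task is stuck at the copy/sort boundary, consistent with blocked shuffle fetches. A tiny sketch of that mapping:

```python
def reduce_phase(progress_pct):
    """Classify a Hadoop 1.x reduce task's reported progress into its phase.

    Reduce progress is split into three equal thirds: copy (shuffle)
    up to ~33%, sort up to ~66%, then the user's reduce() calls.
    """
    if progress_pct < 100.0 / 3:
        return "copy"
    if progress_pct < 200.0 / 3:
        return "sort"
    return "reduce"
```

So when debugging a stall, the percentage alone tells you which phase to look at: 33% points at the shuffle, not at your reducer code.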
