Is the cleanup() method called for failed map tasks? - hadoop

Is the cleanup() method called for failed map tasks? If so, how does it ensure 'atomicity'?
In my case, I am preparing some statistics in the mapper and writing them to a DB in the cleanup() method. If a mapper fails in the middle of processing its input split, will cleanup() write the partially processed data to the DB? That would result in incorrect statistics, as the subsequent mapper attempt will write the same data again.

Depending on when your mapper fails, cleanup() may or may not be called. For example, if your mapper fails in the map() method, cleanup() will not be invoked; but if it fails inside cleanup() itself, then cleanup() has obviously already been entered.
If a mapper fails, Hadoop will usually relaunch the task on another machine, so you need to make sure that running your mappers or reducers several times always produces the same result; otherwise it will be hard to debug.
For your situation, you can set up some Counters to collect the statistics and read them after the job succeeds. If a mapper attempt fails, its partial counters are dropped, so the counters of a successful job are guaranteed to be correct.
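For concreteness, here is a minimal sketch of that approach (class names, counter names, and the driver snippet are assumptions, not your actual code): count in the mapper via Counters and write to the DB only from the driver, after the job has succeeded.

// Hypothetical mapper that accumulates statistics in Counters instead of
// writing partial results to a DB in cleanup().
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StatsMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    // Illustrative counter names.
    enum Stats { RECORDS_SEEN, RECORDS_MATCHED }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.getCounter(Stats.RECORDS_SEEN).increment(1);
        if (value.toString().contains("ERROR")) {      // stand-in for your real logic
            context.getCounter(Stats.RECORDS_MATCHED).increment(1);
        }
        // emit output as usual ...
    }
}

// In the driver, only after the job has succeeded:
//   if (job.waitForCompletion(true)) {
//       long seen = job.getCounters()
//                      .findCounter(StatsMapper.Stats.RECORDS_SEEN).getValue();
//       // write the aggregated statistics to the DB here
//   }

Because counters from failed attempts are dropped, the totals you read from the successful job are never double-counted.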

Related

Mapred Task Timeout

I have written a map-only job where data is written from one HBase table to another after some processing. In the setup() method of my mapper, however, I am loading data from a file, and this takes more time than my mapred.task.timeout configuration allows.
I read the explanation given here. My questions are:
1) Will there be no communication between the task and the task tracker in the middle of the setup phase?
2) How do I update the status string?
The job won't time out as long as there is progress.
Progress reporting is important, as Hadoop will not fail a task that’s making progress. All of the following operations constitute progress:
• Reading an input record (in a mapper or reducer)
• Writing an output record (in a mapper or reducer)
• Setting the status description on a reporter (using Reporter’s setStatus() method)
• Incrementing a counter (using Reporter’s incrCounter() method)
• Calling Reporter’s progress() method
So if you keep doing any of these at a reasonable interval, the job won't be killed.
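With the new (org.apache.hadoop.mapreduce) API, that means calling the equivalent methods on the mapper's Context. A minimal sketch, assuming your side file is a local text file called lookup.dat (the file name and loading logic are illustrative):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SlowSetupMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException {
        int lines = 0;
        // "lookup.dat" is an assumed file name for illustration.
        try (BufferedReader in = new BufferedReader(new FileReader("lookup.dat"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // load the line into an in-memory structure ...
                lines++;
                if (lines % 100000 == 0) {
                    // Both calls count as progress, so the task is not killed by
                    // mapred.task.timeout even though setup() runs for a long time.
                    context.setStatus("loaded " + lines + " lookup lines");
                    context.progress();
                }
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // normal per-record processing ...
    }
}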

Hadoop mapper task detailed execution time

For a certain Hadoop MapReduce mapper task, I already have the task's total execution time. In general, a mapper has three steps: (1) read input from HDFS or another source such as Amazon S3; (2) process the input data; (3) write the intermediate result to local disk. Now I am wondering whether it's possible to know the time spent in each step.
My purpose is to measure (1) how long it takes for a mapper to read its input from HDFS or S3, which simply indicates how fast a mapper can read and is more of an I/O measure, and (2) how long it takes the mapper to process that data, which is more a measure of the task's computing capability.
Does anyone have an idea of how to obtain these results?
Thanks.
Just implement a read-only mapper that does not emit anything. This will then give an indication of how long it takes for each split to be read (but not processed).
As a further step, you can define a variable that is passed to the job at runtime (via the job properties) and lets you do just one of the following, e.g. by parsing the variable against an Enum and switching on its value (see the sketch after this list):
just read
just read and process (but not write/emit anything)
do it all
This of course assumes that you have access to the mapper code.
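A sketch of what that could look like, assuming a job property named profile.mode (the property name, class names, and processing logic are all illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ProfilingMapper extends Mapper<LongWritable, Text, Text, Text> {

    enum Mode { READ_ONLY, READ_AND_PROCESS, FULL }

    private Mode mode;

    @Override
    protected void setup(Context context) {
        // e.g. pass -D profile.mode=READ_ONLY when submitting the job
        mode = Mode.valueOf(context.getConfiguration().get("profile.mode", "FULL"));
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (mode == Mode.READ_ONLY) {
            return;                          // time spent = reading the split only
        }
        Text processed = process(value);     // stand-in for the real processing
        if (mode == Mode.READ_AND_PROCESS) {
            return;                          // time spent = read + process, no write
        }
        context.write(new Text("key"), processed);   // full read + process + write
    }

    private Text process(Text value) {
        return value;                        // placeholder for real work
    }
}

Comparing the job execution times of the three modes then gives a rough breakdown of read, process, and write costs.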

What happens to the Reducer when the Map operation sends no key-value output in MapReduce

A map operation generally takes key-value pairs as input and returns key-value pairs as output. If the map returns no key-value output, how will the Reducer process that?
Any help on this would be appreciated.
I am not sure about Java MapReduce, but in Hadoop Streaming, if the mappers do not produce any output, the reducers will not be run.
You can test this by creating two small Python scripts:
A mapper that simply consumes the input without producing anything:
#!/usr/bin/python
import sys
for line in sys.stdin:
    pass  # read and discard every input line, emit nothing
A reducer that crashes as soon as it is started:
#!/usr/bin/python
import sys
sys.exit("some error message")
If you launch the job, it will complete without any error, because the reducer is never started.
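If you want to run the same experiment with the Java API, a mapper that consumes its input and emits nothing is equally small (a sketch, not tied to any particular job configuration); whether reduce() is ever invoked is then easy to check from the "Reduce input records" counter:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SilentMapper
        extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) {
        // Intentionally emit nothing.
    }
}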

When do the results from a mapper task get deleted from disk?

When do the outputs for a mapper task get deleted from the local filesystem? Do they persist until the entire job completes or do they get deleted at an earlier time than that?
In addition to the map and reduce tasks, two further tasks are created: a job setup task and a job cleanup task. These are run by tasktrackers and are used to run code to set up the job before any map tasks run, and to clean up after all the reduce tasks are complete. The OutputCommitter that is configured for the job determines the code to be run; by default this is a FileOutputCommitter. For the job setup task it will create the final output directory for the job and the temporary working space for the task output, and for the job cleanup task it will delete the temporary working space for the task output.
Have a look at OutputCommitter.
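If you want to see exactly when those hooks fire, a minimal sketch (assuming a Hadoop version whose OutputCommitter exposes setupJob()/commitJob(); the class name is made up) is to wrap FileOutputCommitter, log around the calls, and return it from a custom OutputFormat's getOutputCommitter():

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

public class LoggingOutputCommitter extends FileOutputCommitter {

    public LoggingOutputCommitter(Path outputPath, TaskAttemptContext context)
            throws IOException {
        super(outputPath, context);
    }

    @Override
    public void setupJob(JobContext context) throws IOException {
        // FileOutputCommitter creates the final output directory and the
        // temporary working space here.
        System.err.println("setupJob: creating output dir and temporary space");
        super.setupJob(context);
    }

    @Override
    public void commitJob(JobContext context) throws IOException {
        // FileOutputCommitter promotes task output and deletes the temporary
        // working space here, i.e. at job completion.
        System.err.println("commitJob: deleting temporary task output");
        super.commitJob(context);
    }
}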
If your hadoop.tmp.dir is set to a default location (say, /tmp/), it will most likely be subject to tmpwatch and whatever default settings your OS ships with. I would suggest poking around in /etc/cron.d/, /etc/cron.daily/, /etc/cron.weekly/, etc., to see exactly what your OS defaults look like.
One thing to keep in mind about tmpwatch is that, by default, it keys on access time, not modification time (i.e., files that have not been 'touched' since X are considered 'stale' and subject to removal). However, it's common practice with Hadoop to mount filesystems with the noatime and nodiratime flags, which means access times will not get updated and thus skews tmpwatch's behavior.
Otherwise, Hadoop will purge task attempt logs older than 24 hours (after task completion), by default. While a few years old, this writeup has some great info on the default behaviors. Take a look in particular at the sections that refer to mapreduce.job.userlog.retain.hours.
EDIT: responding to OP's comment, which clears up my misunderstanding of the question:
As far as the intermediate output of map tasks which is spilled to disk, used by any combiners, and copied to any reducers, the Hadoop Definitive Guide has this to say:
Tasktrackers do not delete map outputs from disk as soon as the first
reducer has retrieved them, as the reducer may fail. Instead, they
wait until they are told to delete them by the jobtracker, which is
after the job has completed.
Source
I've also +1'd #mgs answer below, as they have linked the source code that controls this and described the Job cleanup task.
So, yes, the map output data is deleted immediately after the job completes, successfully or not, and no sooner.
"Tasktrackers do not delete map outputs from disk as soon as the first reducer has retrieved them, as the reducer may fail. Instead, they wait until they are told to delete them by the jobtracker, which is after the job has completed"
Hadoop: The Definitive Guide ( Section 6.4)

Query regarding shuffling in MapReduce

How does the node running the mapper know that it has to send some key-value output to node A (running one reducer) and some to node B (running another reducer)?
Is there a list of reducer nodes maintained somewhere by the JobTracker?
If so, how does it choose a node to run the reducer?
A Mapper doesn't really know where to send the data; it focuses on two things:
Writing the data to disk. Initially the map output is buffered in memory, and once it hits a certain threshold it gets spilled to disk. Right before going to disk, the data is partitioned by taking a hash of the output key, which determines which Reducer it will be sent to (see the partitioner sketch at the end of this section).
Notifying its parent task tracker once the map task is done, which in turn notifies the job tracker. So the job tracker has the complete mapping between map outputs and task trackers.
From there, when a Reducer starts, it will keep asking the job tracker for the map outputs corresponding to its partition until it has retrieved them all. Whenever a map output is available, the reduce task starts copying it and gradually merges as it copies.
If this is still unclear, I would advise looking at the reference book on Hadoop (Hadoop: The Definitive Guide), which has a whole chapter describing this part, including a diagram that helps visualize what happens in the shuffle step.
The mappers do not send the data to the reducers; rather, the reducers pull the data from the task trackers where successful map tasks ran.
The Job Tracker, when allocating a reducer task to a task tracker, knows where the successful map tasks ran, and can compile a list of task tracker and map attempt task results to pull.
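To make the hash-partitioning step mentioned above concrete, this is roughly what the default HashPartitioner does (a sketch of the built-in behaviour, written out for illustration):

import org.apache.hadoop.mapreduce.Partitioner;

public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Every mapper evaluates the same function, so all records with the
        // same key land in the same reducer partition, wherever that reducer
        // ends up running.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}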
