Hadoop shuffle/merge time: average vs. total

Hadoop outputs the following statistics:
average map time
average reduce time
average shuffle time
average merge time
The total map and reduce time can be obtained by multiplying the number of completed maps/reduces by these averages. But how can the total shuffle/merge time be obtained? Or: how is the average shuffle time calculated?

Average Map Time = Total time taken by all Map tasks / Count of Map tasks
Average Reduce Time = Total time taken by all Reduce tasks / Count of Reduce tasks
Average Merge Time = Average of (attempt.sortFinishTime - attempt.shuffleFinishTime)
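To make the totals concrete, here is a minimal sketch (in Java, not the actual JobHistoryServer code) of how those averages relate to the totals once you have the per-attempt timestamps. The ReduceAttempt holder is hypothetical, and measuring shuffle time as shuffleFinishTime - startTime is an assumption modelled on the merge-time formula above; the totals then follow either by summing, or by multiplying each average by the number of successful reduce attempts.

```java
// Sketch only: relate per-attempt timestamps to the averages and totals.
import java.util.List;

class ReduceAttempt {          // hypothetical holder for one successful reduce attempt
    long startTime;            // attempt launch time (ms)
    long shuffleFinishTime;    // shuffle done (ms)
    long sortFinishTime;       // merge/sort done (ms)
    long finishTime;           // reduce done (ms)
}

class ShuffleMergeTotals {
    static void print(List<ReduceAttempt> attempts) {
        long totalShuffle = 0, totalMerge = 0, totalReduce = 0;
        for (ReduceAttempt a : attempts) {
            totalShuffle += a.shuffleFinishTime - a.startTime;     // assumed definition of shuffle time
            totalMerge   += a.sortFinishTime - a.shuffleFinishTime;
            totalReduce  += a.finishTime - a.sortFinishTime;
        }
        int n = attempts.size();
        // The UI shows the averages; multiplying an average back by the number
        // of (successful) reduce attempts recovers the totals computed above.
        System.out.printf("avg shuffle=%d ms, total shuffle=%d ms%n", totalShuffle / n, totalShuffle);
        System.out.printf("avg merge=%d ms,   total merge=%d ms%n",   totalMerge / n,   totalMerge);
        System.out.printf("avg reduce=%d ms,  total reduce=%d ms%n",  totalReduce / n,  totalReduce);
    }
}
```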
In the Shuffle phase, the intermediate data generated by the Map tasks is directed to the right reducers. The Shuffle phase assigns keys to reducers and sends all values of a particular key to the assigned reducer.
Sorting also happens in this phase, before the values are handed to the Reducer.
The Shuffle phase involves transfer of data across the network from the Map nodes.
From the Apache documentation:
Shuffle
Input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
Sort
The framework groups Reducer inputs by keys (since different mappers may have output the same key) in this stage.
The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
The Hadoop framework executes these two phases: shuffling & sorting.

Related

Why does the time of a Hadoop job decrease significantly when reducers reach a certain number

I am testing the scalability of a MapReduce-based algorithm with an increasing number of reducers. It generally looks fine (the time decreases as the number of reducers increases). But the job time always drops significantly when the reducers reach a certain number (30 in my Hadoop cluster) instead of decreasing gradually. What are the possible causes?
Something about my Hadoop job:
(1) Light map phase. Only a few hundred lines of input, but each line generates around five thousand key-value pairs. The whole map phase won't take more than 2 minutes.
(2) Heavy reduce phase. Each key in the reduce function matches 1-2 thousand values, and the algorithm in the reduce phase is very compute-intensive. Generally the reduce phase takes around 30 minutes to finish.
Time performance plot:
It should be because of the high number of key-value pairs. At a specific number of reducers they get distributed evenly across the reducers, which results in all reducers finishing their work at almost the same time. Otherwise, the job keeps waiting for one or two heavily loaded reducers to finish their work.
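One way to sanity-check the even-distribution hypothesis above: the default HashPartitioner sends a key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReducers, so you can simulate how a sample of your keys spreads for different reducer counts. A rough sketch, with placeholder keys standing in for real ones:

```java
// Hedged sketch: simulate per-reducer load with the default HashPartitioner
// formula for several reducer counts. Replace the placeholder keys with
// (a sample of) your real keys.
import java.util.Arrays;

public class PartitionSkewCheck {
    public static void main(String[] args) {
        String[] sampleKeys = new String[10000];
        for (int i = 0; i < sampleKeys.length; i++) sampleKeys[i] = "key-" + i;  // placeholder keys

        for (int reducers : new int[]{10, 20, 30, 40}) {
            int[] load = new int[reducers];
            for (String k : sampleKeys) {
                load[(k.hashCode() & Integer.MAX_VALUE) % reducers]++;
            }
            System.out.println(reducers + " reducers -> per-reducer load: "
                    + Arrays.toString(load));
        }
    }
}
```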
IMHO it could be that, with a sufficient number of reducers available, the network I/O (to transfer the intermediate results) to each reducer decreases.
Since network I/O is usually the bottleneck in most MapReduce programs, this decrease in the required network I/O gives a significant improvement.

Why is the Average Merge Time in Hadoop YARN (2.7.1) zero? And how can a precise value of this time be obtained?

I have a question about the Average Merge Time in Hadoop YARN 2.7.1.
I ran a WordCount example on a cluster of 7 nodes with a 1.5 GB text file.
As you can see in the following screenshot, the job has 12 map tasks and one reduce task, yet the Average Merge Time is zero.
Does this mean that sorting or merging the 12 map outputs took zero time?
Please guide me.
No.
Average Merge Time is avg(sortFinishTime - shuffleFinishTime).
The reducer receives input from multiple mappers during the shuffle. Once the input is received, it is appended into a single file at the reducer (locally) and sorted. Once sorted, the reduce phase starts.
Average Merge Time is therefore the average of the time spent after the shuffle phase ends and before the reduce phase starts.
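If you want the precise per-attempt numbers rather than the rounded average shown in the UI, one option is the MapReduce HistoryServer REST API. The sketch below just dumps the task-attempt JSON for a reduce task; the endpoint path and the field names mentioned in the comments (shuffleFinishTime, mergeFinishTime, elapsedShuffleTime, elapsedMergeTime) are taken from the Hadoop 2.x HistoryServer REST documentation, so verify them against your version:

```java
// Hedged sketch: fetch the reduce task attempt details from the MapReduce
// HistoryServer REST API and inspect the per-attempt shuffle/merge times.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class MergeTimeProbe {
    public static void main(String[] args) throws Exception {
        // e.g. args[0] = "http://historyserver:19888", args[1] = job id, args[2] = reduce task id
        String url = args[0] + "/ws/v1/history/mapreduce/jobs/" + args[1]
                   + "/tasks/" + args[2] + "/attempts";
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream()))) {
            // The JSON for a reduce attempt should contain shuffleFinishTime and
            // mergeFinishTime; mergeFinishTime - shuffleFinishTime is the merge
            // time that the UI averages. Here we just dump the raw JSON.
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```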

Meaning of map time or reduce time in JobHistoryServer

I want to know the exact meaning of the notations in the picture below. This picture comes from the Job History Server web UI. I definitely know the meaning of Elapsed, but I am not sure about the others. Where can I find a clear definition of them? Or does anyone know what they mean?
What I want to know is the map time, reduce time, shuffle time and merge time separately. The sum of the four times should be very similar (or equal) to the elapsed time. But the 'Average' keyword confuses me.
There are 396 maps and 1 reduce.
As you probably already know, there are three phases to a MapReduce job:
Map is the 1st phase, where each Map task is provided with an input split, which is a small portion of the total input data. The Map tasks process data from the input split & output intermediate data which needs to go to the reducers.
Shuffle phase is the next step, where the intermediate data that was generated by Map tasks is directed to the correct reducers. Reducers usually handle a subset of the total number of keys generated by the Map task. The Shuffle phase assigns keys to reducers & sends all values pertaining to a key to the assigned reducer. Sorting (or Merging) is also a part of this phase, where values of a given key are sorted and sent to the reducer. As you may realize, the shuffle phase involves transfer of data across the network from Map -> Reduce tasks.
Reduce is the last step of the MapReduce Job. The Reduce tasks process all values pertaining to a key & output their results to the desired location (HDFS/Hive/Hbase).
Now coming to the average times, you said there were 396 map tasks. Each Map task is essentially doing exactly the same processing job, but on different chunks of data. So the Average Map time is basically the average of time taken by all 396 map tasks to complete.
Average Map Time = Total time taken by all Map tasks / Number of Map tasks
Similarly,
Average Reduce Time = Total time taken by all Reduce tasks / Number of Reduce tasks
Now, why is the average time significant? It is because, most, if not all your map tasks & reduce tasks would be running in parallel (depending on your cluster capacity/ no. of slots per node, etc.). So calculating the average time of all map tasks & reduce tasks will give you good insight into the completion time of the Map or Reduce phase as a whole.
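A small worked example of that point (the 100 parallel map slots and the 60-second average are assumptions, not values from the screenshot): the total task time and the elapsed wall-clock time of the map phase can differ by orders of magnitude when tasks run in parallel waves.

```java
// Sketch: total task time vs. approximate elapsed time for the map phase.
public class ElapsedVsAverage {
    public static void main(String[] args) {
        int mapTasks = 396;
        double avgMapSeconds = 60.0;   // assumed average map time
        int concurrentSlots = 100;     // assumed cluster capacity

        double totalTaskTime = mapTasks * avgMapSeconds;                  // 23,760 s of work
        double waves = Math.ceil((double) mapTasks / concurrentSlots);    // 4 waves
        double approxElapsed = waves * avgMapSeconds;                     // ~240 s wall clock

        System.out.printf("total task time = %.0f s, approx elapsed = %.0f s%n",
                totalTaskTime, approxElapsed);
    }
}
```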
Another observation from your screenshot is that your Shuffle phase took 40 minutes. There can be several reasons for this.
You have 396 map tasks, each generating intermediate data. The shuffle phase had to pass all this data across the network to just 1 reducer, causing a lot of network traffic & hence increasing transfer time. Maybe you can optimize performance by increasing the number of reducers.
The network itself has very low bandwidth, and cannot efficiently handle large amounts of data transfer. In this case, consider deploying a combiner, which will effectively reduce the amount of data flowing through your network between the map and reduce phases.
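A hedged sketch of those two suggestions using the standard org.apache.hadoop.mapreduce.Job API; the reducer class, the reducer count and the omitted mapper/input/output setup are placeholders to adapt to your job:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class ShuffleTuning {

    // A sum reducer is safe to reuse as a combiner because addition is
    // associative and commutative (same idea as the classic WordCount).
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static Job configure(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "shuffle tuning sketch");
        // (mapper / input / output setup omitted - this only shows the two knobs)

        // 1) More reducers: spread the shuffle traffic instead of sending
        //    all map outputs to a single reduce task.
        job.setNumReduceTasks(8);                 // value is an assumption, tune for your cluster

        // 2) Combiner: pre-aggregate map output locally so far less data
        //    crosses the network during the shuffle.
        job.setReducerClass(SumReducer.class);
        job.setCombinerClass(SumReducer.class);
        return job;
    }
}
```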
There are also some hidden costs of execution such as job setup time, time required by job tracker to contact task trackers & assign map/reduce tasks, time taken by slave nodes to send heartbeat signals to JobTracker, time taken by NameNode to assign storage block & create Input splits, etc. which all go into the total elapsed time.
Hope this helps.

How can map and reduce run in parallel

I am a beginner with Hadoop, and when I ran a Hadoop job I noticed the progress log showing map 80% reduce 25%. My understanding of MapReduce is that mappers produce a bunch of intermediate values. After the mappers produce their output, there is a shuffle/sort of the intermediate pairs, and these values are sent to the reduce job. Can someone please explain how map and reduce can work in parallel?
The outputs from the mappers have to be copied to the appropriate reducer nodes. This is called the shuffle process. That can start even before all the mappers have finished, since the decision of which key goes to which reducer is dependent only on the output key from the mapper. So the 25% progress you see is due to the shuffle phase.
After shuffle, there is a sort phase and then the reduce phase. Sort and reduce cannot happen unless all mappers have completed. Since shuffle can happen before the mappers finish, you can see a maximum of 33.33% reduce completion before the mappers have finished. This is because the default apache implementation considers shuffle, sort and reduce each to take an equal 33.33% of the time.
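A tiny sketch of how that works out arithmetically, assuming the equal one-third weighting described above (this mirrors the reported behaviour, not the framework's exact code):

```java
// Reduce progress as reported in the job log: copy (shuffle), sort and reduce
// each weighted as one third, so a reduce task that has only copied part of
// its input already reports non-zero progress.
public class ReduceProgress {
    static double reduceProgress(double shuffle, double sort, double reduce) {
        // each argument is the completion fraction (0.0 - 1.0) of that phase
        return (shuffle + sort + reduce) / 3.0;
    }

    public static void main(String[] args) {
        // shuffle 75% done while mappers are still running, sort/reduce not started:
        System.out.printf("reduce %.0f%%%n", 100 * reduceProgress(0.75, 0.0, 0.0)); // ~25%
    }
}
```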

Comparison of the time of a MapReduce job with and without a reducer

In my Hadoop job, when I set the number of reducers to 0, the mapping phase is dramatically faster than when the number of reducers is not 0. At the beginning of the mapping phase no reducer is running yet, so I don't understand why the mapping time increases so dramatically.
If you have not configured a reducer, the map output will not be sorted before written to disk.
The reason is that Hadoop uses an external sort algorithm, which means that the map tasks sort their task output [1]. Then the reducer just merges the sorted map output segments together.
In case there is no reducer, there is no need to group the data on the key, and thus no need to sort.
[1] Addition for possible nit-pickers: A map task starts to sort once its output buffer is filled up. This sorted segment is spilled to disk and merged at the end of the map task with all other spilled segments until a single sorted file emerges. Sending a single file (maybe even compressed) is much more efficient for bandwidth usage / transfer performance. On the reducer side, the sorted files will then be merged again. The very last merge pass is directly streamed into the reduce method.
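For reference, a minimal map-only job sketch (placeholder pass-through mapper, standard org.apache.hadoop.mapreduce API): with job.setNumReduceTasks(0) the map output is written straight to the output directory, so the spill-and-sort machinery described in [1] never has to run.

```java
// Map-only job: zero reducers, so map output bypasses the sort/merge path.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {

    // Placeholder mapper: just passes input lines through unchanged.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws java.io.IOException, InterruptedException {
            ctx.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only sketch");
        job.setJarByClass(MapOnlyJob.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0);                     // map-only: no sort, no shuffle
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```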
