How can map and reduce run in parallel - Hadoop

I am a beginner with Hadoop, and when I run a Hadoop job I notice that the progress log shows something like map 80% reduce 25%. My understanding of MapReduce is that the mappers produce a bunch of intermediate values. After the mappers produce their output, the intermediate pairs are shuffled and sorted, and these values are sent to the reduce job. Can someone please explain how map and reduce can work in parallel?

The outputs from the mappers have to be copied to the appropriate reducer nodes. This is called the shuffle process. That can start even before all the mappers have finished, since the decision of which key goes to which reducer is dependent only on the output key from the mapper. So the 25% progress you see is due to the shuffle phase.
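To make the "depends only on the key" point concrete, here is a minimal sketch of the formula used by Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`. It assumes plain `String` keys and uses `String.hashCode()` as a stand-in; Hadoop's `Text` key type computes its hash differently, so real reducer indices will not match these. The class name `PartitionDemo` is made up for illustration.

```java
import java.util.List;

// Hypothetical demo class. partitionFor() follows the formula of Hadoop's default
// HashPartitioner, but with String keys instead of Hadoop's Writable key types.
class PartitionDemo {
    static int partitionFor(String key, int numReduceTasks) {
        // Clearing the sign bit keeps the index non-negative even for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2;
        // The target reducer is a pure function of the key, so each pair's destination
        // is known the moment it is emitted, without looking at any other mapper's output.
        for (String key : List.of("is", "was", "and", "my")) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```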
After shuffle, there is a sort phase and then the reduce phase. Sort and reduce cannot happen until all mappers have completed. Since shuffle can happen before the mappers finish, you can see a maximum of 33.33% reduce completion before the mappers have finished. This is because the default Apache implementation counts shuffle, sort and reduce as an equal 33.33% each of the reduce progress.
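A minimal sketch of the equal-thirds arithmetic described above. It assumes each sub-phase reports its own completion as a fraction between 0 and 1; the class and method names are hypothetical, and Hadoop's real progress bookkeeping is more detailed than this.

```java
// Hypothetical model of the reduce-task progress figure: copy (shuffle), sort and
// reduce are each weighted as one third of the total.
class ReduceProgress {
    static double overall(double shuffleDone, double sortDone, double reduceDone) {
        return (shuffleDone + sortDone + reduceDone) / 3.0;
    }

    public static void main(String[] args) {
        // Three quarters of the map outputs copied, sort and reduce not started yet.
        System.out.printf("%.2f%%%n", 100 * overall(0.75, 0.0, 0.0));
        // Everything copied: the figure tops out at one third until sort begins,
        // and sort cannot begin until the last mapper has finished.
        System.out.printf("%.2f%%%n", 100 * overall(1.0, 0.0, 0.0));
    }
}
```

This prints 25.00% and 33.33%, matching the 25% in the question and the 33.33% ceiling mentioned in the answer.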

Related

Will there be shuffle and sort in a map-only task?

Does the shuffle and sort phase come before the end of the map task, or does it come after the map task has generated its output, so that there is no looking back to the map task anymore? The 'map-only task' case is where I get confused.
If there is no shuffle and sort in a map-only task, can someone explain how the data is written into the final output files?
When you have a map-only task, there is no shuffling at all, which means that the mappers write the final output directly to HDFS.
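In a real job, a map-only run is typically requested by setting the number of reduce tasks to zero (for example `job.setNumReduceTasks(0)` on an `org.apache.hadoop.mapreduce.Job`), and each mapper's output then appears as its own `part-m-NNNNN` file. The sketch below does not use Hadoop at all: it is a hypothetical in-memory simulation, with a made-up upper-casing map function and a `Map` standing in for HDFS, showing that nothing is partitioned, sorted or moved between tasks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a map-only job (zero reducers): no partitioning,
// no sort and no shuffle; each mapper writes its own output file directly.
class MapOnlyDemo {
    // Made-up map function used only for illustration.
    static String map(String record) {
        return record.toUpperCase();
    }

    // Returns a stand-in for HDFS: output file name -> records written to that file.
    static Map<String, List<String>> run(List<List<String>> inputSplits) {
        Map<String, List<String>> outputFiles = new LinkedHashMap<>();
        for (int task = 0; task < inputSplits.size(); task++) {
            List<String> written = new ArrayList<>();
            for (String record : inputSplits.get(task)) {
                // Records are written in the order the mapper emits them, unsorted.
                written.add(map(record));
            }
            // One output file per map task, named in Hadoop's part-m-NNNNN style.
            outputFiles.put(String.format("part-m-%05d", task), written);
        }
        return outputFiles;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(List.of("b", "a"), List.of("c"));
        run(splits).forEach((file, records) -> System.out.println(file + ": " + records));
    }
}
```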
On the other hand, when you have a full MapReduce program, with mappers and reducers, shuffling can indeed start before the reduce phase starts.
Quoting this very nice answer on SO:
First of all, shuffling is the process of transferring data from the mappers to the reducers, so I think it is obvious that it is necessary for the reducers, since otherwise they wouldn't be able to have any input (or input from every mapper). Shuffling can start even before the map phase has finished, to save some time. That's why you can see a reduce status greater than 0% (but less than 33%) when the map status is not yet 100%.
I hope this answer has clarified your confusion.

Reducer doesn't start, yet the MapReduce job shows reduce progress

If reducers do not start before all mappers finish, then why does the progress of a MapReduce job show something like Map(50%) Reduce(10%)? Why is the reducers' progress percentage displayed when the mappers have not finished yet?
It is because of the mapreduce.job.reduce.slowstart.completedmaps property, whose default value is 0.05.
It means that the reducers will be launched as soon as at least 5% of the total mappers have completed execution.
The dispatched reducers then stay in the copy phase until all mappers have completed.
If you wish to start reducers only after all mappers have completed, set this property to 1.0 in the job configuration.
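The threshold check can be sketched as below. This is a simplified, hypothetical model: it only compares the completed-map fraction with the configured value, while the real scheduler has additional logic for ramping reducers up and down that is ignored here. In an actual job the value would be set on the job's `Configuration`, for example `conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 1.0f)`.

```java
// Hypothetical, simplified model of the slow-start rule: reduce tasks are launched
// only once the fraction of completed map tasks reaches the configured threshold.
class SlowStart {
    static boolean shouldLaunchReducers(int completedMaps, int totalMaps, float slowstart) {
        if (totalMaps == 0) {
            return true; // assumption: with no map tasks there is nothing to wait for
        }
        return (float) completedMaps / totalMaps >= slowstart;
    }

    public static void main(String[] args) {
        int totalMaps = 100;
        // Threshold of 0.05: reducers launch once 5 of 100 maps are done.
        System.out.println(shouldLaunchReducers(4, totalMaps, 0.05f));  // false
        System.out.println(shouldLaunchReducers(5, totalMaps, 0.05f));  // true
        // Threshold of 1.0: reducers wait for every map task.
        System.out.println(shouldLaunchReducers(99, totalMaps, 1.0f));  // false
        System.out.println(shouldLaunchReducers(100, totalMaps, 1.0f)); // true
    }
}
```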
Reducers start copying intermediate key-value pairs from the mappers as soon as they are available. The progress calculation also takes into account the data transfer done by the reduce process, so the reduce progress starts showing up as soon as any intermediate key-value pair from a mapper is available to be transferred to a reducer. Although the reducer progress is updated, the programmer-defined reduce method is still called only after all the mappers have finished.

Why do map and reduce run at the same time?

I am a newbie to Hadoop. I remember learning somewhere that in Hadoop, all map functions have to complete before reduce functions can start.
But I just got the following printout when I ran a MapReduce program:
map(15%), reduce(5%)
map(20%), reduce(7%)
map(30%), reduce(10%)
map(38%), reduce(17%)
map(40%), reduce(25%)
Why do they run in parallel?
Before the actual reduce phase starts, shuffle, sort and merge take place as the mappers keep completing. That is what this percentage signifies; it is not the actual reduce phase. This happens in parallel to reduce the overhead that would otherwise be incurred if the framework waited for all the mappers to complete before doing the shuffling, sorting and merging.
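The merging can start early because every map output segment a reducer fetches is already sorted by key, so segments can be combined with a k-way merge. A minimal sketch with plain `String` keys and a `PriorityQueue`; the class name is hypothetical, and Hadoop's own merger works on spill files and in-memory buffers, so this only illustrates the idea. The final merge that feeds `reduce()` still needs every segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: k-way merge of map-output segments that are each sorted by key.
class SegmentMerge {
    static List<String> merge(List<List<String>> sortedSegments) {
        // Heap entries are {segment index, position within that segment},
        // ordered by the key currently at that position.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> sortedSegments.get(a[0]).get(a[1])
                        .compareTo(sortedSegments.get(b[0]).get(b[1])));
        for (int s = 0; s < sortedSegments.size(); s++) {
            if (!sortedSegments.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> segment = sortedSegments.get(top[0]);
            merged.add(segment.get(top[1]));
            // Advance within the segment the smallest key came from.
            if (top[1] + 1 < segment.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> fromMap1 = List.of("and", "is", "was");
        List<String> fromMap2 = List.of("is", "my", "was");
        System.out.println(merge(List.of(fromMap1, fromMap2)));
    }
}
```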

How to start sort and reduce in Hadoop before the shuffle completes for all mappers?

I understand from "When do reduce tasks start in Hadoop?" that the reduce task in Hadoop contains three steps: shuffle, sort and reduce, where the sort (and after that the reduce) can only start once all the mappers are done. Is there a way to start the sort and reduce every time a mapper finishes?
For example, let's say we have only one job with mappers mapperA and mapperB, and 2 reducers. What I want to do is:
mapperA finishes
the shuffle copies the appropriate partitions of mapperA's output to, say, reducers 1 and 2
reducers 1 and 2 start sorting and reducing, and generate some intermediate output
now mapperB finishes
the shuffle copies the appropriate partitions of mapperB's output to reducers 1 and 2
sort and reduce on reducers 1 and 2 start again, and each reducer merges the new output with the old one
Is this possible? Thanks
You can't with the current implementation. However, people have "hacked" the Hadoop code to do what you want to do.
In the MapReduce model, you need to wait for all mappers to finish, since the keys need to be grouped and sorted; plus, you may have some speculative mappers running and you do not know yet which of the duplicate mappers will finish first.
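One way to see why the grouping has to be complete: for many reduce functions, reducing each mapper's values separately and then combining the partial results gives a different answer. A minimal sketch using the median as a hypothetical reduce function, with made-up values for a single key (a sum, as in word count, would happen to survive this; the median does not).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: the median is not associative, so running it on partial
// groups (one per mapper) and then on the partial results gives the wrong answer.
class PartialReduceDemo {
    // Assumes a non-empty list.
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        if (n % 2 == 1) {
            return sorted.get(n / 2);
        }
        return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    public static void main(String[] args) {
        List<Double> fromMapperA = List.of(1.0, 2.0, 3.0);
        List<Double> fromMapperB = List.of(10.0, 20.0);

        // Correct: wait for both mappers, group all values for the key, then reduce once.
        List<Double> all = new ArrayList<>(fromMapperA);
        all.addAll(fromMapperB);
        System.out.println("median of complete group: " + median(all));

        // Incorrect: reduce each mapper's values as they arrive, then reduce the results.
        double partial = median(List.of(median(fromMapperA), median(fromMapperB)));
        System.out.println("median of partial medians: " + partial);
    }
}
```

It prints 3.0 for the complete group but 8.5 for the median of the partial medians.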
However, as the "Breaking the MapReduce Stage Barrier" paper indicates, for some applications it may make sense not to wait for all of the output of the mappers. If you want to implement this sort of behavior (most likely for research purposes), then you should take a look at the org.apache.hadoop.mapred.ReduceTask.ReduceCopier class, which implements ShuffleConsumerPlugin.
EDIT: Finally, as @teo points out in this related SO question, the ReduceCopier.fetchOutputs() method is the one that holds the reduce task from running until all map outputs are copied (through the while loop in line 2026 of Hadoop release 1.0.4).
You can configure this using the slowstart property, which denotes the fraction of your mappers that need to be finished before the reducers are launched and start copying. Its shipped default is 0.05 (5%), though it is often raised to around 0.9 - 0.95 (90-95%); you can also override it to 0 if you want:
`mapreduce.job.reduce.slowstart.completedmaps` (called `mapred.reduce.slowstart.completed.maps` in older releases)
Starting the sort process before all mappers finish is sort of a Hadoop anti-pattern (if I may put it that way!), in that the reducers cannot know that there is no more data to receive until all mappers finish. You, the invoker, may know that based on your definition of keys, partitioner, etc., but the reducers don't.

When do reduce tasks start in Hadoop?

In Hadoop when do reduce tasks start? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typically used?
The reduce phase has 3 steps: shuffle, sort, reduce. Shuffle is where the data is collected by the reducer from each mapper. This can happen while mappers are generating data, since it is only a data transfer. On the other hand, sort and reduce can only start once all the mappers are done. You can tell which one MapReduce is doing by looking at the reducer completion percentage: 0-33% means it's doing shuffle, 34-66% is sort, 67-100% is reduce. This is why your reducers will sometimes seem "stuck" at 33%: they're waiting for the mappers to finish.
Reducers start shuffling based on a threshold of percentage of mappers that have finished. You can change the parameter to get reducers to start sooner or later.
Why is starting the reducers early a good thing? Because it spreads out the data transfer from the mappers to the reducers over time, which is a good thing if your network is the bottleneck.
Why is starting the reducers early a bad thing? Because they "hog up" reduce slots while only copying data and waiting for mappers to finish. Another job that starts later that will actually use the reduce slots now can't use them.
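The cost side of this trade-off can be estimated with simple arithmetic. A rough sketch, assuming map tasks finish at a steady rate over the whole map phase, so that a reducer launched at slow-start fraction `f` holds its slot for about `(1 - f)` of the map phase while it can only copy and wait. The class name and the numbers are hypothetical.

```java
// Hypothetical back-of-the-envelope estimate: slot-minutes held by reducers that can
// only copy data while the remaining map tasks are still running.
class SlotHogging {
    // Assumes map tasks complete at a steady rate over mapPhaseMinutes.
    static double waitingMinutesPerReducer(double slowstart, double mapPhaseMinutes) {
        return (1.0 - slowstart) * mapPhaseMinutes;
    }

    static double slotMinutesHeld(int reducers, double slowstart, double mapPhaseMinutes) {
        return reducers * waitingMinutesPerReducer(slowstart, mapPhaseMinutes);
    }

    public static void main(String[] args) {
        int reducers = 10;
        double mapPhaseMinutes = 60.0;
        for (double slowstart : new double[] {0.05, 0.5, 0.9, 1.0}) {
            System.out.printf("slowstart=%.2f -> %.1f reducer slot-minutes held before the maps finish%n",
                    slowstart, slotMinutesHeld(reducers, slowstart, mapPhaseMinutes));
        }
    }
}
```

With 10 reducers and a 60-minute map phase this gives roughly 570, 300, 60 and 0 slot-minutes. The benefit of starting early (copying overlapped with the map phase) is not modeled here.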
You can customize when the reducers start up by changing the default value of mapred.reduce.slowstart.completed.maps in mapred-site.xml. A value of 1.00 will wait for all the mappers to finish before starting the reducers. A value of 0.0 will start the reducers right away. A value of 0.5 will start the reducers when half of the mappers are complete. You can also change mapred.reduce.slowstart.completed.maps on a job-by-job basis. In newer versions of Hadoop (at least 2.4.1) the parameter is called mapreduce.job.reduce.slowstart.completedmaps (thanks user yegor256).
Typically, I like to keep mapred.reduce.slowstart.completed.maps above 0.9 if the system ever has multiple jobs running at once. This way the job doesn't hog up reducers when they aren't doing anything but copying data. If you only ever have one job running at a time, a value of 0.1 would probably be appropriate.
The reduce phase can start long before a reducer is called. As soon as "a" mapper finishes its task, the generated data undergoes some sorting and shuffling (which includes calls to the combiner and partitioner). The reducer "phase" kicks in the moment this post-mapper data processing starts. As this processing is done, you will see progress in the reducers' percentage. However, none of the reducers have actually been called yet. Depending on the number of processors available/used, the nature of the data and the number of expected reducers, you may want to change the parameter as described by @Donald-miner above.
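The combiner mentioned above runs on the map side and pre-aggregates a mapper's output before it is shuffled. A minimal sketch for word count, assuming the mapper emitted one `(word, 1)` pair per occurrence; the class is hypothetical. In a real job a combiner class is registered with `job.setCombinerClass(...)`, and because Hadoop may run it zero, one or several times, it must be safe to apply repeatedly, which summing is.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical word-count combiner: collapses one mapper's (word, 1) pairs into
// (word, partialCount) pairs so fewer records have to be shuffled to the reducers.
class CombinerDemo {
    static Map<String, Integer> combine(List<String> wordsEmittedByOneMapper) {
        Map<String, Integer> partialCounts = new TreeMap<>();
        for (String word : wordsEmittedByOneMapper) {
            // Each emitted word stands for a (word, 1) pair; add the 1 to the running total.
            partialCounts.merge(word, 1, Integer::sum);
        }
        return partialCounts;
    }

    public static void main(String[] args) {
        List<String> emitted = List.of("is", "was", "is", "and", "is");
        Map<String, Integer> combined = combine(emitted);
        System.out.println(emitted.size() + " pairs before combining, "
                + combined.size() + " after: " + combined);
    }
}
```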
As far as I understand, the reduce phase starts with the map phase and keeps consuming records from the maps. However, since there is a sort and shuffle phase after the map phase, all the outputs have to be sorted and sent to the reducer. So logically you can imagine that the reduce phase starts only after the map phase, but for performance reasons the reducers are actually initialized along with the mappers.
The percentage shown for the reduce phase is actually about the amount of data copied from the map outputs to the reducers' input directories.
When does this copying start? That is a configuration you can set, as Donald showed above. Once all the data is copied to the reducers (i.e. 100% reduce), that's when the reducers start working, and hence they might freeze at "100% reduce" if your reducer code is I/O- or CPU-intensive.
Reduce starts only after all the mappers have finished their tasks. A reducer has to communicate with all the mappers, so it has to wait until the last mapper has finished its task. However, a mapper starts transferring data the moment it has completed its task.
Consider a WordCount example to better understand how a MapReduce job works. Suppose we have a large file, say a novel, and our task is to find the number of times each word occurs in the file. Since the file is large, it might be divided into different blocks and replicated on different worker nodes. The word count job is composed of map and reduce tasks. The map task takes each block as input and produces intermediate key-value pairs. In this example, since we are counting the number of occurrences of words, the mapper processing a block would produce intermediate results of the form (word1,count1), (word2,count2), etc. The intermediate results of all the mappers are passed through a shuffle phase, which reorders the intermediate results.
Assume that our map output from different mappers is of the following form:
Map 1:
(is,24)
(was,32)
(and,12)
Map 2:
(my,12)
(is,23)
(was,30)
The map outputs are partitioned and sorted in such a manner that pairs with the same key are given to the same reducer. Here it means that all the pairs for is, was, etc. go to the same reducer. It is the reducer that produces the final output, which in this case would be:
(and,12)(is,47)(my,12)(was,62)
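The walk-through above can be reproduced as a small single-process simulation: group the two mappers' pairs by key (the shuffle's job) and sum each group (the reducer's job). This is a hypothetical sketch, not Hadoop code; a `TreeMap` stands in for the sorted, grouped input a reducer receives, and a single reducer is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of the WordCount example above,
// assuming a single reducer receives every key.
class WordCountDemo {
    // The two map outputs from the example.
    static List<List<Map.Entry<String, Integer>>> sampleMapOutputs() {
        List<Map.Entry<String, Integer>> map1 =
                List.of(Map.entry("is", 24), Map.entry("was", 32), Map.entry("and", 12));
        List<Map.Entry<String, Integer>> map2 =
                List.of(Map.entry("my", 12), Map.entry("is", 23), Map.entry("was", 30));
        return List.of(map1, map2);
    }

    // Shuffle + sort: collect every value for a key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> oneMapper : mapOutputs) {
            for (Map.Entry<String, Integer> pair : oneMapper) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce: called once per key, with all of that key's values.
    static int reduce(List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    static String run(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapOutputs).entrySet()) {
            out.append('(').append(group.getKey()).append(',')
               .append(reduce(group.getValue())).append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(sampleMapOutputs())); // prints (and,12)(is,47)(my,12)(was,62)
    }
}
```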
Reducer tasks start only after the completion of all the mappers.
But the data transfer happens after each map task finishes.
Actually, it is a pull operation.
That means the reducer keeps asking every map task whether it has some data to retrieve. If it finds that a mapper has completed its task, the reducer pulls the intermediate data.
The intermediate data from a mapper is stored on disk.
And the data transfer from mapper to reducer happens over the network (data locality is not preserved in the reduce phase).
Once the mappers have finished their tasks, the reducer starts its job of reducing the data; together these make up the MapReduce job.
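The pull model described above can be sketched as a toy simulation: in each round the reducer looks at which map tasks have finished, fetches output only from finished tasks it has not fetched yet, and calls the user's reduce step only after every map output is in. The per-round lists of finished tasks are hypothetical inputs. (In Hadoop itself, reducers learn which maps have finished from map-completion events reported by the framework, rather than by querying each map task directly.)

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the pull-based shuffle for one reducer.
class PullShuffleDemo {
    // finishedByRound.get(r) lists the map tasks known to be finished at poll round r
    // (a task stays in the list once it has finished).
    static List<String> simulate(int totalMaps, List<List<Integer>> finishedByRound) {
        List<String> log = new ArrayList<>();
        Set<Integer> fetched = new LinkedHashSet<>();
        for (List<Integer> finished : finishedByRound) {
            for (int mapTask : finished) {
                // Set.add returns false if this map's output was already pulled.
                if (fetched.add(mapTask)) {
                    log.add("fetch output of map " + mapTask);
                }
            }
            if (fetched.size() == totalMaps) {
                // Only now is every key's group complete, so reduce() may run.
                log.add("all " + totalMaps + " outputs fetched: merge, then call reduce()");
                return log;
            }
        }
        log.add("still waiting: " + fetched.size() + " of " + totalMaps + " outputs fetched");
        return log;
    }

    public static void main(String[] args) {
        List<List<Integer>> rounds = List.of(List.of(1), List.of(1, 0), List.of(1, 0, 2));
        simulate(3, rounds).forEach(System.out::println);
    }
}
```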

Resources