Reducers have not started, yet the MapReduce job shows reduce progress - hadoop

If reducers do not start before all mappers finish, why does the progress of a MapReduce job show something like Map(50%) Reduce(10%)? Why is a reducer progress percentage displayed when the mappers have not finished yet?

It is because of the mapreduce.job.reduce.slowstart.completedmaps property, whose default value is 0.05.
It means that the reduce phase will be started as soon as at least 5% of the total mappers have completed execution.
The dispatched reducers will then stay in the copy phase until all mappers have completed.
If you wish to start reducers only after all mappers have completed, set the property to 1.0 in the job configuration.
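To make the threshold concrete, here is a minimal sketch of how a slow-start fraction translates into a number of completed map tasks. The class and method names are hypothetical, and rounding up is an assumption made for illustration; this models the rule described above and is not Hadoop's scheduler code.

```java
// Sketch only: models the slow-start rule, not Hadoop's actual scheduler code.
class SlowStart {
    // Number of map tasks that must complete before reducers may be scheduled.
    // Rounding up is an assumption made for this illustration.
    static int mapsNeededBeforeReducers(double slowstartFraction, int totalMaps) {
        return (int) Math.ceil(slowstartFraction * totalMaps);
    }

    public static void main(String[] args) {
        int totalMaps = 100;
        // Default fraction mentioned above.
        System.out.println(mapsNeededBeforeReducers(0.05, totalMaps));
        // Reducers wait for every mapper.
        System.out.println(mapsNeededBeforeReducers(1.0, totalMaps));
    }
}
```

With a fraction of 1.0 the result equals the total number of maps, which matches the "start reducers only after all mappers have completed" setting.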

Reducers start copying intermediate key-value pairs from the mappers as soon as they are available. The progress calculation also takes into account the data transfer done by the reduce process, so the reduce progress starts showing up as soon as any intermediate key-value pair from a mapper is available to be transferred to a reducer. Although the reducer progress is updated, the programmer-defined reduce method is called only after all the mappers have finished.

Related

MapReduce - reduce running while map is not finished

I've implemented a simple WordCount application in Hadoop. On my cluster, I have one namenode and 4 datanodes. The replication factor is set to 4.
I have put many lorem ipsum files into the filesystem.
While running the WordCount application, I see the reducer working even though the mappers aren't finished yet.
2021-10-29 14:53:31,044 INFO mapreduce.Job: map 70% reduce 23%
How does this work?
Many tutorial pages state (one page for example):
"A reducer cannot start while a mapper is still in progress"
https://www.talend.com/resources/what-is-mapreduce/
How can the reducers work if the result set of mapping isn't completed?
Once data is emitted by a mapper, it undergoes two steps:
It is shuffled - this is the process of sending data to the correct reducer depending on its key and the partitioner logic.
It is sorted - this happens on the reducer itself.
So even though data is still being emitted by the mapper, reducer tasks are being created and are sorting data as it arrives. You're correct in that they won't actually start processing values until all mapping has finished.
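The reason shuffling can begin early is that the target reducer depends only on the key. A minimal sketch, using the same hash-and-modulo formula as Hadoop's default HashPartitioner but applied to plain Java strings (an assumption for illustration; real jobs hash Writable keys such as Text, so the actual partition numbers will differ):

```java
import java.util.List;

// Sketch only: shows that the reducer for a record is a pure function of its key.
class PartitionSketch {
    // Mask the sign bit, then take the remainder by the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4; // hypothetical number of reduce tasks
        for (String word : List.of("is", "was", "and", "my", "is")) {
            System.out.println(word + " -> reducer " + partitionFor(word, reducers));
        }
    }
}
```

Both occurrences of "is" land on the same reducer regardless of which mapper emitted them, so a finished mapper's output can be routed without waiting for the others.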

Which method stops reducers from starting the actual reduce phase in hadoop yarn?

I am new to Hadoop YARN and want reducers to start the actual reducing process before the completion of all the maps. I tried to find the class where the reducers are invoked but could not find it. Can anyone help me with this?
Before all the mappers have completed, the reducers can only start collecting the mappers' output. This is called the shuffle phase.
However, they cannot start the sorting and reduce phases, since they need to have ALL the map output records before starting to work on them. The reason is simple:
Imagine the wordcount example, where you want to count the frequency of a word. In the reduce phase, if you emit a value (the frequency) for a key (the word) before getting the output of all the mappers (i.e., some counts are missing for this word), then you may give the wrong frequency for that word.
You can change the time when the reducers start collecting (not reducing) the mappers' outputs by setting the mapreduce.job.reduce.slowstart.completedmaps property to 1, meaning that the reducers will only start when ALL the mappers are complete: conf.set("mapreduce.job.reduce.slowstart.completedmaps", "1.00"); In the old API this property used to be (based on this link):
mapred.reduce.slowstart.completed.maps

Hadoop combiner execution on reducers

I have a long running MapReduce job with some mappers taking considerably more time than others.
Checking the stats on the web interface, I saw that my combiner also kicked in on the reducers (which were mostly idle, as just 2 mappers were still running).
Although it seems reasonable to not waste time and do some pre-aggregation until all mappers have finished, I cannot find any documentation for this behaviour. Can anyone confirm that this is indeed a feature of Hadoop or just displayed wrong on the web interface?
The combiner starts when a reasonable amount of data has been emitted by the mapper. Note that a combiner typically runs as an aggregation of a mapper's output on the map side, although Hadoop may also apply it while merging fetched map outputs on the reduce side. More details can be found here.
Also, the reducers can start gathering (only) the data that are emitted by the mappers, before all the mappers have finished. That is called the shuffling phase of the reducer. You can change the time when the reducers will start gathering data, by changing the mapred.reduce.slowstart.completed.maps property (or mapreduce.job.reduce.slowstart.completedmaps in newer versions). More details on this SO post.

How can map and reduce run in parallel

I am a beginner with Hadoop, and when running a Hadoop job I noticed the progress log showing map 80% reduce 25%. My understanding of MapReduce is that mappers produce a bunch of intermediate values. After the mappers produce their output, there is a shuffle/sort of the intermediate pairs, and these values are sent to the reduce job. Can someone please explain to me how map and reduce can work in parallel?
The outputs from the mappers have to be copied to the appropriate reducer nodes. This is called the shuffle process. That can start even before all the mappers have finished, since the decision of which key goes to which reducer is dependent only on the output key from the mapper. So the 25% progress you see is due to the shuffle phase.
After shuffle, there is a sort phase and then the reduce phase. Sort and reduce cannot happen until all mappers have completed. Since shuffle can happen before the mappers finish, you can see a maximum of 33.33% reduce completion before the mappers have finished. This is because the default Apache implementation considers shuffle, sort and reduce to each account for an equal 33.33% of the progress.
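The arithmetic behind the 33.33% ceiling can be sketched as an equally weighted average of the three phases. The class and method names are hypothetical; this illustrates the weighting described in the answer and is not taken from Hadoop's source.

```java
// Sketch only: equal weighting of the three reduce-side phases.
class ReduceProgress {
    // Each argument is the completed fraction of one phase, in the range [0, 1].
    static double overall(double copy, double sort, double reduce) {
        return (copy + sort + reduce) / 3.0;
    }

    public static void main(String[] args) {
        // Shuffle three quarters done, sort and reduce not started.
        System.out.printf("%.2f%n", overall(0.75, 0.0, 0.0));
        // Shuffle complete, still waiting for the last mappers.
        System.out.printf("%.4f%n", overall(1.0, 0.0, 0.0));
    }
}
```

Under this weighting, a log line such as "reduce 25%" corresponds to a shuffle that is roughly three quarters complete.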

When do reduce tasks start in Hadoop?

In Hadoop when do reduce tasks start? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typically used?
The reduce phase has 3 steps: shuffle, sort, reduce. Shuffle is where the data is collected by the reducer from each mapper. This can happen while mappers are generating data, since it is only a data transfer. On the other hand, sort and reduce can only start once all the mappers are done. You can tell which one MapReduce is doing by looking at the reducer completion percentage: 0-33% means it is doing shuffle, 34-66% is sort, 67%-100% is reduce. This is why your reducers will sometimes seem "stuck" at 33%: they are waiting for mappers to finish.
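As a rough reading aid, the percentage ranges above can be turned into a small lookup. The class name and the exact boundary handling (thirds of the 0 to 1 range) are assumptions for illustration; the reported percentage is only an approximation of what the reducer is doing.

```java
// Sketch only: maps a reported reduce progress fraction to the phase it suggests.
class ReducePhase {
    // progress is the reduce completion as a fraction in [0, 1].
    static String phaseOf(double progress) {
        if (progress <= 1.0 / 3.0) {
            return "shuffle";
        } else if (progress <= 2.0 / 3.0) {
            return "sort";
        }
        return "reduce";
    }

    public static void main(String[] args) {
        System.out.println(phaseOf(0.23)); // e.g. "map 70% reduce 23%" in a job log
        System.out.println(phaseOf(0.50));
        System.out.println(phaseOf(0.90));
    }
}
```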
Reducers start shuffling based on a threshold of percentage of mappers that have finished. You can change the parameter to get reducers to start sooner or later.
Why is starting the reducers early a good thing? Because it spreads out the data transfer from the mappers to the reducers over time, which is a good thing if your network is the bottleneck.
Why is starting the reducers early a bad thing? Because they "hog up" reduce slots while only copying data and waiting for mappers to finish. Another job that starts later that will actually use the reduce slots now can't use them.
You can customize when the reducers start up by changing the default value of mapred.reduce.slowstart.completed.maps in mapred-site.xml. A value of 1.00 will wait for all the mappers to finish before starting the reducers. A value of 0.0 will start the reducers right away. A value of 0.5 will start the reducers when half of the mappers are complete. You can also change mapred.reduce.slowstart.completed.maps on a job-by-job basis. In new versions of Hadoop (at least 2.4.1) the parameter is called mapreduce.job.reduce.slowstart.completedmaps (thanks user yegor256).
Typically, I like to keep mapred.reduce.slowstart.completed.maps above 0.9 if the system ever has multiple jobs running at once. This way the job doesn't hog up reducers when they aren't doing anything but copying data. If you only ever have one job running at a time, doing 0.1 would probably be appropriate.
The reduce phase can start long before a reducer is called. As soon as a mapper finishes its work, the generated data undergoes sorting and shuffling (which includes calls to the combiner and partitioner). The reducer "phase" kicks in the moment this post-mapper data processing starts. As this processing proceeds, you will see progress in the reducer percentage. However, none of the reducers has been called yet. Depending on the number of processors available/used, the nature of the data, and the number of expected reducers, you may want to change the parameter as described by @Donald-miner above.
As far as I understand, the reduce phase starts along with the map phase and keeps consuming records from the maps. However, since there is a sort and shuffle phase after the map phase, all the outputs have to be sorted and sent to the reducer. So logically you can imagine that the reduce phase starts only after the map phase, but in practice, for performance reasons, the reducers are initialized together with the mappers.
The percentage shown for the reduce phase is actually about the amount of data copied from the map output to the reducers' input directories.
When does this copying start? It is a configuration setting, as Donald showed above. Once all the data is copied to the reducers (i.e. 100% reduce), that's when the reducers start working, and hence the job might freeze at "100% reduce" if your reducer code is I/O or CPU intensive.
Reduce starts only after all the mappers have finished their tasks. A reducer has to communicate with all the mappers, so it has to wait until the last mapper has finished its task. However, a mapper starts transferring data the moment it has completed its task.
Consider a WordCount example in order to better understand how a MapReduce task works. Suppose we have a large file, say a novel, and our task is to find the number of times each word occurs in the file. Since the file is large, it might be divided into different blocks and replicated on different worker nodes. The word count job is composed of map and reduce tasks. The map task takes each block as input and produces intermediate key-value pairs. In this example, since we are counting the number of occurrences of words, the mapper processing a block would produce intermediate results of the form (word1,count1), (word2,count2), etc. The intermediate results of all the mappers are passed through a shuffle phase, which reorders them.
Assume that our map output from different mappers is of the following form:
Map 1:-
(is,24)
(was,32)
(and,12)
Map 2:-
(my,12)
(is,23)
(was,30)
The map outputs are sorted in such a manner that the same key values are given to the same reducer. Here it would mean that the keys corresponding to is, was, etc. go to the same reducer. It is the reducer which produces the final output, which in this case would be:-
(and,12)(is,47)(my,12)(was,62)
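The final output above can be reproduced with a minimal plain-Java sketch that sums the per-mapper counts by key. It uses ordinary collections rather than the Hadoop API, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: sums per-mapper word counts by key, with keys kept in sorted order.
class WordCountMerge {
    static Map<String, Integer> reduceAll(List<Map<String, Integer>> mapOutputs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> output : mapOutputs) {
            for (Map.Entry<String, Integer> entry : output.entrySet()) {
                // Add this mapper's count to the running total for the word.
                totals.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> map1 = Map.of("is", 24, "was", 32, "and", 12);
        Map<String, Integer> map2 = Map.of("my", 12, "is", 23, "was", 30);
        System.out.println(reduceAll(List.of(map1, map2))); // {and=12, is=47, my=12, was=62}
    }
}
```

If only the first mapper's output were reduced, the count for "is" would be 24 instead of 47, which is why the reduce function has to wait for every mapper.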
Reducer tasks start only after the completion of all the mappers.
But the data transfer happens after each map task.
Actually, it is a pull operation.
That means the reducer repeatedly asks every map task whether it has data to retrieve. If it finds that a mapper has completed its task, the reducer pulls the intermediate data.
The intermediate data from the mapper is stored on disk.
The data transfer from mapper to reducer happens over the network (data locality is not preserved in the reduce phase).
When the mappers finish their tasks, the reducer starts its job of reducing the data; this is a MapReduce job.

Resources