Recent Posts

Wednesday, 14 March 2018

Hadoop Reducer

Hadoop Reducer 
     In Hadoop, Reducer takes the output of the Mapper (intermediate key-value pair) process each of them to generate the output. The output of the reducer is the final output, which is stored in HDFS. Usually, in the Hadoop Reducer, we do aggregation or summation sort of computation.
 
What is Hadoop Reducer?
     The Hadoop Reducer process the output of the mapper. After processing the data, it produces a new set of output. At last HDFS stores this output data. Hadoop Reducer takes a set of an intermediate key-value pair produced by the mapper as the input and runs a Reducer function on each of them. One can aggregate, filter, and combine this data (key, value) in a number of ways for a wide range of processing. Reducer first processes the intermediate values for particular key generated by the map function and then generates the output (zero or more key-value pair).
     One-one mapping takes place between keys and reducers. Reducers run in parallel since they are independent of one another. The user decides the number of reducers. By default number of reducers is 1.

Phases of Reducer in MapReduce
     As you can see in the diagram at the top, there are 3 phases of Reducer in Hadoop MapReduce. Let’s discuss each of them one by one-
1. Shuffle Phase
     In this phase, the sorted output from the mapper is the input to the Reducer. In Shuffle phase, with the help of HTTP, the framework fetches the relevant partition of the output of all the mappers.

2. Sort Phase
     In this phase, the input from different mappers is again sorted based on the similar keys in different Mappers. The shuffle and sort phases occur concurrently.

3. Reduce Phase
     In this phase, after shuffling and sorting, reduce task aggregates the key-value pairs. The OutputCollector.collect() method, writes the output of the reduce task to the Filesystem. Reducer output is not sorted.

How many Reducers in Hadoop?
     With the help of Job.setNumreduceTasks(int) the user set the number of reducers for the job. The right number of reducers are 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of the maximum container per node>).
     With 0.95, all reducers immediately launch and start transferring map outputs as the maps finish. With 1.75, the first round of reducers is finished by the faster nodes and second wave of reducers is launched doing a much better job of load balancing.

Increasing the number of reducers
1. Increases the Framework overhead.
2. Increases load balancing.
3. Lowers the cost of failures.

How Map and Reduce work together 

Previous Tutorial : Hadoop Mapper 
 

1 comment:

  1. Thanks a lot very much for the high quality and results-oriented help. I won’t think twice to endorse your blog post to anybody who wants and needs support about this area.
    big-data-hadoop-training-institute-in-bangalore
    hadoop-training-institute-in-chennai

    ReplyDelete