Recent Posts

Sunday, 8 April 2018

Hadoop Partitioner Tutorial

Hadoop Partitioner
     Partitioning of the keys of the intermediate map output is controlled by the Partitioner. By hash function, key (or a subset of the key) is used to derive the partition. According to the key-value each mapper output is partitioned and records having the same key value go into the same partition (within each mapper), and then each partition is sent to a reducer. Partition class determines which partition a given (key, value) pair will go. Partition phase takes place after map phase and before reduce phase. 
     MapReduce job takes an input data set and produces the list of the key-value pair which is the result of map phase in which input data is split and each task processes the split and each map, output the list of key-value pairs. Then, the output from the map phase is sent to reduce task which processes the user-defined reduce function on map outputs. But before reduce phase, partitioning of the map output take place on the basis of the key and sorted.
     This partitioning specifies that all the values for each key are grouped together and make sure that all the values of a single key go to the same reducer, thus allows even distribution of the map output over the reducer. Partitioner in Hadoop MapReduce redirects the mapper output to the reducer by determining which reducer is responsible for the particular key.
     The total number of Partitioners that run in Hadoop is equal to the number of reducers i.e. Partitioner will divide the data according to the number of reducers which is set by JobConf.setNumReduceTasks() method. Thus, the data from single partitioner is processed by a single reducer. And partitioner is created only when there are multiple reducers.
     The Default Hadoop partitioner in Hadoop MapReduce is Hash Partitioner which computes a hash value for the key and assigns the partition based on this result. If in data input one key appears more than any other key. In such case, we use two mechanisms to send data to partitions.
  • The key appearing more will be sent to one partition.
  • All the other key will be sent to partitions according to their hashCode().
     But if hashCode() method does not uniformly distribute other keys data over partition range, then data will not be evenly sent to reducers. Poor partitioning of data means that some reducers will have more data input than other i.e. they will have more work to do than other reducers. So, the entire job will wait for one reducer to finish its extra-large share of the load.

How to overcome poor partitioning in MapReduce?
     To overcome poor partitioner in Hadoop MapReduce, we can create Custom partitioner, which allows sharing workload uniformly across different reducers. 

Next Tutorial : Hadoop Combiner Tutorial

Previous Tutorial : Hadoop Recordreader Tutorial 
 

11 comments:

  1. Aside from these aptitudes, the information researcher has business sharpness, relational abilities, advancement, and critical thinking capacity. data science course in pune

    ReplyDelete
  2. I feel very grateful that I read this. It is very helpful and very informative and I really learned a lot from it.
    data science course

    ReplyDelete
  3. Cool stuff you have and you keep Business Analytics Course In Pune overhaul every one of us

    ReplyDelete
  4. Attend The Machine Learning course in Bangalore From ExcelR. Practical Machine Learning course in Bangalore Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Machine Learning course in Bangalore.
    Machine Learning course in Bangalore

    ReplyDelete
  5. Thank you so much for helping me out to find the Data analytics course in Mumbai Organisations and introducing reputed stalwarts in the industry dealing with data analyzing & assorting it in a structured and precise manner. Keep up the good work. Looking forward to view more from you.

    ReplyDelete
  6. Such a very useful article. I have learn some new information.thanks for sharing.
    data scientist course in mumbai

    ReplyDelete
  7. Awesome blog. I enjoyed reading your articles. This is truly a great read for me. I have bookmarked it and I am looking forward to reading new articles. Keep up the good work!
    Data Analytics Course in Mumbai

    ReplyDelete
  8. Nice Blog...Very interesting to read this article. I have learn some new information.thanks for sharing.
    ExcelR Mumbai

    ReplyDelete