Monday, 15 January 2018

Rack Awareness in Hadoop HDFS

     A rack is a collection of machines that are physically located in one place (such as a data center) and connected through a traditional network design with top-of-rack switching. In Hadoop, a rack is a physical collection of slave machines put together at a single location for data storage. A single location can contain multiple racks.
     In a large Hadoop cluster, to reduce network traffic while reading or writing HDFS files, the NameNode chooses DataNodes that are on the same rack as, or on a rack near, the client making the read/write request. The NameNode obtains rack information by maintaining the rack ID of each DataNode. This concept of choosing closer DataNodes based on rack information is called Rack Awareness in Hadoop.
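The idea of "closer" DataNodes can be sketched with a small distance function. This is a simplified illustration, not Hadoop's own code: it assumes a two-level topology in which every location is written as `/rack/node`, and the function names and the values 0, 2, and 4 are chosen here for the example.

```python
def rack_of(location):
    """Return the rack part of a '/rack/node' location string."""
    return location.rsplit("/", 1)[0]


def distance(a, b):
    """Illustrative distance: 0 = same node, 2 = same rack, 4 = different rack."""
    if a == b:
        return 0
    if rack_of(a) == rack_of(b):
        return 2
    return 4


def closest_first(reader, replicas):
    """Order replica locations so the one nearest to the reader comes first."""
    # sorted() is stable, so replicas at equal distance keep their given order.
    return sorted(replicas, key=lambda node: distance(reader, node))


# Hypothetical cluster: the reader sits on rack1, the replicas are spread out.
replicas = ["/rack2/node5", "/rack1/node2", "/rack2/node6"]
print(closest_first("/rack1/node1", replicas))
```

With these made-up locations, the replica on `/rack1` is listed first because it shares a rack with the reader.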
     Rack awareness means having knowledge of the cluster topology or, more specifically, of how the DataNodes are distributed across the racks of a Hadoop cluster. A default Hadoop installation assumes that all DataNodes belong to the same rack.
     [Figure: sample representation of replication with rack awareness]
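One common way to give Hadoop the rack of each DataNode is a topology script, referenced by the `net.topology.script.file.name` property in `core-site.xml`. Hadoop calls the script with one or more addresses as arguments and reads one rack path per address from its output. The following is a minimal sketch of such a script; the IP addresses and rack names in the table are invented for illustration, and a real cluster would load them from its own inventory.

```python
#!/usr/bin/env python3
import sys

# Rack reported for any host that is not in the table below.
DEFAULT_RACK = "/default-rack"

# Hypothetical host-to-rack table (addresses and rack names are assumptions).
RACK_BY_HOST = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack path per argument passed to the script.
    print("\n".join(resolve(sys.argv[1:])))
```

Unknown hosts fall back to `/default-rack`, which matches the single-rack assumption of a default installation.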
     When a client is ready to load a file into the cluster, the content of the file is divided into blocks (128 MB each by default). The client then consults the NameNode and gets the addresses of the DataNodes that will hold the replicas of every block (three replicas by default). When placing the replicas on DataNodes, the key rule followed is "for every block of data, two copies will exist in one rack, and the third copy in a different rack". This rule is called the "Replica Placement Policy".
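The block split and the placement rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the cluster dictionary, and the choice to put the single copy on the writer's rack and the pair on one other rack are made for this example, and real HDFS also weighs factors such as node load and free space.

```python
import random

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted above


def block_count(file_size_bytes):
    """Number of blocks needed for a file; the last block may be partly filled."""
    return (file_size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE


def place_replicas(nodes_by_rack, writer_rack, rng=random):
    """Pick three nodes: one on the writer's rack and two on one other rack."""
    remote_racks = [rack for rack, nodes in nodes_by_rack.items()
                    if rack != writer_rack and len(nodes) >= 2]
    if not nodes_by_rack.get(writer_rack) or not remote_racks:
        raise ValueError("need a node on the writer's rack and two on another rack")
    first = rng.choice(nodes_by_rack[writer_rack])
    remote = rng.choice(remote_racks)
    # sample() returns two distinct nodes from the chosen remote rack.
    second, third = rng.sample(nodes_by_rack[remote], 2)
    return [(writer_rack, first), (remote, second), (remote, third)]


# Hypothetical two-rack cluster.
cluster = {"/rack1": ["node1", "node2"], "/rack2": ["node3", "node4"]}
print(block_count(300 * 1024 * 1024))   # a 300 MB file needs 3 blocks
print(place_replicas(cluster, "/rack1"))
```

Every block ends up with two replicas sharing one rack and a third on a different rack, so losing either rack still leaves at least one copy.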

Why Rack Awareness?
     In Big Data Hadoop, rack awareness is required for the reasons below:
1. To improve data high availability and reliability.
2. To improve the performance of the cluster.
3. To make better use of network bandwidth.
4. To avoid losing data if an entire rack fails, even though the chance of a rack failure is far lower than that of a node failure.
5. To keep bulk data within the rack when possible.
6. To take advantage of the assumption that in-rack communication has higher bandwidth and lower latency.

Rack Awareness is important to improve:
1. Data high availability and reliability.
2. The performance of the cluster.
3. Network bandwidth usage.
