An HDFS cluster has two types of nodes operating in a master-worker pattern: a NameNode (the master) and a number of DataNodes (the workers, also called slave nodes). The master node manages the file system metadata and coordinates data storage across the distributed Hadoop cluster. The actual data in HDFS is stored on the slave nodes, and data is also processed on them.
1. Master node
The master node is also called the NameNode. As the name suggests, this node manages all the slave nodes and assigns work to them. The master is the centerpiece of HDFS: it stores the HDFS metadata, that is, all the information about the files stored in HDFS. This includes the blocks that make up each file and where across the cluster those blocks are kept. The master is also the most critical part of HDFS. If all the masters crash or go down, the HDFS cluster is considered down and becomes unusable. Two files, the ‘namespace image’ and the ‘edit log’, are used to store the metadata.
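To make the roles of the namespace image and the edit log more concrete, here is a minimal sketch in Python. It is a toy model, not Hadoop code: the class ToyNameNode, its methods, and the replay function are hypothetical names chosen for illustration. The sketch assumes the namespace image is a checkpoint of the file-to-block mapping and the edit log is the list of changes made since that checkpoint, so replaying the log on top of the image rebuilds the current metadata. Block locations are kept in a separate map that is filled in as DataNodes report their blocks.

```python
import copy


class ToyNameNode:
    """Toy model of NameNode metadata. Hypothetical names, not a Hadoop API."""

    def __init__(self, namespace_image=None):
        # Namespace image: checkpoint of the metadata (path -> list of block IDs).
        self.namespace = copy.deepcopy(namespace_image) if namespace_image else {}
        # Edit log: changes recorded since the checkpoint was taken.
        self.edit_log = []
        # Block locations: block ID -> set of DataNode names holding a replica.
        self.block_locations = {}

    def add_file(self, path, block_ids):
        # Record the change in the edit log, then apply it to the namespace.
        self.edit_log.append(("add", path, list(block_ids)))
        self.namespace[path] = list(block_ids)

    def delete_file(self, path):
        self.edit_log.append(("delete", path, None))
        self.namespace.pop(path, None)

    def report_block(self, block_id, datanode):
        # A DataNode tells the master that it holds a replica of this block.
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        # Answer the client question: which nodes hold each block of this file?
        return [
            (block_id, sorted(self.block_locations.get(block_id, set())))
            for block_id in self.namespace[path]
        ]


def replay(namespace_image, edit_log):
    """Rebuild the namespace from a checkpoint plus the logged edits."""
    namespace = copy.deepcopy(namespace_image)
    for op, path, block_ids in edit_log:
        if op == "add":
            namespace[path] = list(block_ids)
        elif op == "delete":
            namespace.pop(path, None)
    return namespace
```

A client in this model would call locate("/logs/a.txt") to learn which nodes hold each block, and a restarting master would call replay(image, edit_log) to recover its namespace.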
2. Slave node
The slave node is also called the DataNode. DataNodes are the slaves: they are deployed on each machine and provide the actual storage. They are the actual worker nodes, responsible for serving read and write requests from clients. They also perform block creation, deletion, and replication upon instruction from the NameNode. They can be deployed on commodity hardware. If any slave node goes down, the NameNode automatically replicates the blocks that were stored on that DataNode to other nodes in the cluster.
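The recovery step described above can be sketched as a small planning function. This is an illustration under stated assumptions, not the NameNode's actual algorithm: the function re_replicate and its parameters are hypothetical, the target replication factor defaults to 3, and new targets are simply picked in alphabetical order, whereas real HDFS block placement also takes rack topology into account.

```python
def re_replicate(block_locations, live_nodes, failed_node, replication=3):
    """Plan new replicas after a DataNode is lost (hypothetical helper).

    block_locations: dict mapping block ID -> set of DataNode names.
    live_nodes: set of DataNode names that are still reachable.
    Returns a dict mapping each affected block ID to its new target nodes.
    The sets inside block_locations are updated in place.
    """
    plan = {}
    for block_id, holders in block_locations.items():
        if failed_node not in holders:
            continue  # this block lost no replica
        holders.discard(failed_node)
        if not holders:
            # No surviving replica to copy from, so nothing can be scheduled.
            plan[block_id] = []
            continue
        # Candidate targets: live nodes that do not already hold the block.
        candidates = sorted(
            node for node in live_nodes
            if node not in holders and node != failed_node
        )
        missing = max(replication - len(holders), 0)
        targets = candidates[:missing]
        holders.update(targets)
        plan[block_id] = targets
    return plan
```

For example, if "dn2" held a replica of a block that is also stored on "dn1" and "dn3", the plan names one additional live node for that block so it returns to three replicas.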