Tuesday, 17 April 2018

Hadoop Counters

     Hadoop counters provide a way to measure the progress of a MapReduce job or the number of operations that occur within it. They are a useful channel for gathering statistics about the job, whether for quality control or for application-level statistics, and they also help with problem diagnosis.
     The Counters class holds a job's global counters, which are defined either by the MapReduce framework or by applications. Each counter is named by an enum constant and holds a long value. Counters are collected into groups, each comprising the counters from a particular enum class.

Counters help us validate that:
1. The correct number of bytes were read and written.
2. The correct number of tasks were launched and ran successfully.
3. The amount of CPU and memory consumed is appropriate for our job and cluster nodes.

Types of Hadoop MapReduce Counters
     There are two types of MapReduce counters:
1. Built-In Counters
2. User-Defined Counters (Custom Counters)

1. Built-In Counters
     Apache Hadoop maintains built-in counters for every job, and these report various metrics. For example, there are counters for the number of bytes and records, which allow us to confirm that the expected amount of input was consumed and the expected amount of output was produced.
     The built-in counters are divided into several groups. Each group contains either task counters (which are updated as a task progresses) or job counters (which are updated as a job progresses).

a) MapReduce Task Counter
     Task counters collect specific information about tasks during their execution, such as the number of records read and written.
     For example, the MAP_INPUT_RECORDS counter is a task counter. It counts the input records read by each map task.
     Task counters are maintained by each task attempt and periodically sent to the application master so that they can be globally aggregated.
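
The global aggregation step can be pictured with a small JDK-only sketch. It is a simplified model and not Hadoop's own code: the class CounterAggregationSketch and its aggregate method are hypothetical names, the per-task values are made up for illustration, and the real reporting between task attempts and the application master is more involved than a single sum.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model (not Hadoop code): each task attempt reports its own
// counter values, and a central component adds them into job-wide totals.
final class CounterAggregationSketch {

    // Sums per-task counter values into totals keyed by counter name.
    public static Map<String, Long> aggregate(List<Map<String, Long>> perTaskCounters) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> task : perTaskCounters) {
            for (Map.Entry<String, Long> entry : task.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Illustrative values only; they do not come from a real job.
        Map<String, Long> task0 = new TreeMap<>();
        task0.put("MAP_INPUT_RECORDS", 1200L);
        task0.put("MAP_OUTPUT_RECORDS", 1150L);

        Map<String, Long> task1 = new TreeMap<>();
        task1.put("MAP_INPUT_RECORDS", 800L);
        task1.put("MAP_OUTPUT_RECORDS", 790L);

        System.out.println(aggregate(Arrays.asList(task0, task1)));
    }
}
```

With the two sample tasks above, the totals come to 2000 input records and 1940 output records.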

b) File System Counters
     These counters gather information such as the number of bytes read and written by the file system. The name and description of the file system counters are as follows:
FileSystem bytes read – The number of bytes read by the filesystem.
FileSystem bytes written – The number of bytes written to the filesystem.

c) FileInputFormat Counters
     These counters gather information about the number of bytes read by map tasks via FileInputFormat.

d) FileOutputFormat counters
     These counters gather information about the number of bytes written by map tasks (for map-only jobs) or reduce tasks via FileOutputFormat.

e) Job Counters in MapReduce
     Job counters measure job-level statistics, not values that change while a task is running. For example, TOTAL_LAUNCHED_MAPS counts the number of map tasks that were launched over the course of a job. Job counters are maintained by the application master, so they do not need to be sent across the network, unlike all other counters, including user-defined ones.

2. User-Defined Counters or Custom Counters in Hadoop MapReduce
     In addition to the built-in counters, Hadoop MapReduce lets user code define its own set of counters and increment them as desired in the mapper or reducer. In Java, counters are defined with an enum. A job may define an arbitrary number of enums, each with an arbitrary number of fields. The name of the enum is the group name, and the enum's fields are the counter names.
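
The enum-to-counter mapping can be tried out without a cluster. The sketch below is a JDK-only stand-in: RecordQuality, classify and the three-field record rule are hypothetical, and the EnumMap only plays the role of the framework's counter storage. In a real mapper the increment would instead go through the task context, along the lines of context.getCounter(RecordQuality.VALID).increment(1).

```java
import java.util.EnumMap;
import java.util.Map;

// JDK-only stand-in for enum-based counters; not Hadoop code.
final class RecordQualityCounters {

    // Hypothetical counter enum: the enum acts as the group,
    // and each constant is one counter in that group.
    public enum RecordQuality { VALID, MALFORMED, EMPTY }

    // Plays the role of the framework's counter storage.
    private final Map<RecordQuality, Long> counts = new EnumMap<>(RecordQuality.class);

    public void increment(RecordQuality counter, long amount) {
        counts.merge(counter, amount, Long::sum);
    }

    public long get(RecordQuality counter) {
        return counts.getOrDefault(counter, 0L);
    }

    // Classifies one input line the way a map() method might.
    // Assumption for this sketch: a valid record has exactly three comma-separated fields.
    public static RecordQuality classify(String line) {
        if (line == null || line.trim().isEmpty()) {
            return RecordQuality.EMPTY;
        }
        return line.split(",").length == 3 ? RecordQuality.VALID : RecordQuality.MALFORMED;
    }

    public static void main(String[] args) {
        RecordQualityCounters counters = new RecordQualityCounters();
        String[] input = {"a,b,c", "", "x,y", "1,2,3"};
        for (String line : input) {
            counters.increment(classify(line), 1);
        }
        for (RecordQuality c : RecordQuality.values()) {
            System.out.println(RecordQuality.class.getSimpleName() + "." + c + " = " + counters.get(c));
        }
    }
}
```

For the four sample lines, this reports 2 valid records, 1 malformed record and 1 empty record.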

a) Dynamic Counters in Hadoop
     The fields of a Java enum are defined at compile time, so we cannot create new counters at run time using enums. Instead, we use dynamic counters, which are not defined at compile time: they are identified by a group name and a counter name given as strings, so new ones can be created while the job runs.
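
A minimal JDK-only sketch of the idea follows. It is not Hadoop code: DynamicCounterSketch, the "HttpStatus" group and the sample status codes are hypothetical. In a real task, the equivalent call would be context.getCounter(groupName, counterName).increment(1), where both names are strings.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// JDK-only stand-in for dynamic counters; not Hadoop code.
final class DynamicCounterSketch {

    // group name -> (counter name -> value). Both names are plain strings,
    // so a counter nobody declared in advance can appear at run time.
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    public void increment(String group, String name, long amount) {
        groups.computeIfAbsent(group, g -> new TreeMap<>()).merge(name, amount, Long::sum);
    }

    // Returns 0 for a counter that was never incremented.
    public long get(String group, String name) {
        Map<String, Long> counters = groups.getOrDefault(group, Collections.emptyMap());
        return counters.getOrDefault(name, 0L);
    }

    public static void main(String[] args) {
        DynamicCounterSketch counters = new DynamicCounterSketch();
        // The set of status codes is only known once the data is read.
        String[] statusCodes = {"200", "404", "200", "500"};
        for (String code : statusCodes) {
            counters.increment("HttpStatus", code, 1);
        }
        System.out.println("HttpStatus.200 = " + counters.get("HttpStatus", "200"));
        System.out.println("HttpStatus.404 = " + counters.get("HttpStatus", "404"));
    }
}
```

The trade-off is that string names are not checked by the compiler, so a typo silently creates a new counter, which an enum would have rejected.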
