Hadoop–Ganglia Integration using Hadoop Metrics2 Framework
In our previous post here, we detailed why Ganglia is a good tool for monitoring clusters. However, when monitoring a Hadoop cluster you often need more information about CPU, disk, memory, and nodal network statistics than the generic Ganglia config can provide. For those who need more finely tuned monitoring, Hadoop supports a framework for recording internal statistics and then for posting them to an external source, either to a file or to Ganglia. In fact, Hadoop now supports an implementation of the Metrics2 Framework for Ganglia. In this post we’ll discuss Hadoop Metrics2 Framework’s design and how it enables Ganglia metrics.
The Hadoop Metrics2 Framework provisions multiple metrics output plugins for use in parallel. It allows dynamic reconfiguration of metrics plugins without having to restart the server, and it exports metrics via Java Management Extensions (JMX).
The Hadoop Metrics2 Framework consists of three major components:
1. The metric source is used to generate metrics.
2. The metric sink is used to consume the metrics produced by the metric sources.
3. The metric system is used to periodically poll metric sources and to pass the metric records to sink.
Implementing and Configuring Components
A metric source class must implement the following interface:
A metric sink must implement this interface:
The basic syntax to configure metric system components is:
Here’s a sample job tracker configuration for sinking a file:
Metrics can be filtered on source, context, records, tags, and metrics themselves. Here is a filtering example :
This will filter out all the metrics within the context “foo”.
Metrics are collected from the following:
5. Task (Map and Reduce Tasks)
Here is an example of a Hadoop Metrics2 configuration for Ganglia:
#define the sink class:
#GangliaSink30 for Ganglia 3.0
#GangliaSink31 for Ganglia 3.1
#Define the period
# Support sparse updates to save cpu/bandwidth usage
# for dense updates the metrics cache is updated everytime #the metric is published which can be cpu/network intensive
#Define the slope if any for specific metrics
#Define dmax value to indicate for how a long a particular #value must be retained
#Define the server locations for various hadoop daemons
Ganglia slopes can take four values:
1. zero – the metric value always remains the same.
2. positive – the metric value can only be increased.
3. negative – the metric value can only be decreased.
4. both – the metric value can be either increased or decreased.
dmax: If a certain metric value has not changed after dmax time, then Ganglia will consider that metric no longer monitored and the graph for that metric will not be shown.