author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
commit be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97
tree 9754ff1ca740f6346cf8483ec915d4054bc5da2d /docs/guides/monitor-hadoop-cluster.md
parent Initial commit.
Adding upstream version 1.44.3. (upstream/1.44.3, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'docs/guides/monitor-hadoop-cluster.md')
-rw-r--r-- docs/guides/monitor-hadoop-cluster.md | 191
1 files changed, 191 insertions, 0 deletions
diff --git a/docs/guides/monitor-hadoop-cluster.md b/docs/guides/monitor-hadoop-cluster.md
new file mode 100644
index 00000000..1ddac85e
--- /dev/null
+++ b/docs/guides/monitor-hadoop-cluster.md
@@ -0,0 +1,191 @@
+<!--
+title: "Monitor a Hadoop cluster with Netdata"
+sidebar_label: "Monitor a Hadoop cluster with Netdata"
+custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor-hadoop-cluster.md
+learn_status: "Published"
+learn_topic_type: "Tasks"
+learn_rel_path: "Miscellaneous"
+-->
+
+# Monitor a Hadoop cluster with Netdata
+
+Hadoop, an [Apache project](https://hadoop.apache.org/), is a framework for processing large data sets across a
+distributed cluster of systems.
+
+Although Hadoop is designed to be highly available and fault tolerant, those who operate a Hadoop cluster
+will want to monitor the health and performance of their [Hadoop Distributed File System
+(HDFS)](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) and [Zookeeper](https://zookeeper.apache.org/)
+implementations.
+
+Netdata comes with built-in and pre-configured support for monitoring both HDFS and Zookeeper.
+
+This guide assumes you have a Hadoop cluster, with HDFS and Zookeeper, running already. If you don't, please follow
+the [official Hadoop
+instructions](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html) or an
+alternative, like the guide available from
+[DigitalOcean](https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-in-stand-alone-mode-on-ubuntu-18-04).
+
+For more specifics on the collection modules used in this guide, read the respective pages in our documentation:
+
+- [HDFS](https://github.com/netdata/go.d.plugin/blob/master/modules/hdfs/README.md)
+- [Zookeeper](https://github.com/netdata/go.d.plugin/blob/master/modules/zookeeper/README.md)
+
+## Set up your HDFS and Zookeeper installations
+
+As with all data sources, Netdata can auto-detect HDFS and Zookeeper nodes if you installed them using the standard
+installation procedure.
+
+For Netdata to collect HDFS metrics, it needs to be able to access the node's `/jmx` endpoint. You can test whether a
+JMX endpoint is accessible by running `curl HDFS-IP:PORT/jmx`. For a NameNode, you should see output similar to the
+following:
+
+```json
+{
+  "beans" : [ {
+    "name" : "Hadoop:service=NameNode,name=JvmMetrics",
+    "modelerType" : "JvmMetrics",
+    "MemNonHeapUsedM" : 65.67851,
+    "MemNonHeapCommittedM" : 67.3125,
+    "MemNonHeapMaxM" : -1.0,
+    "MemHeapUsedM" : 154.46341,
+    "MemHeapCommittedM" : 215.0,
+    "MemHeapMaxM" : 843.0,
+    "MemMaxM" : 843.0,
+    "GcCount" : 15,
+    "GcTimeMillis" : 305,
+    "GcNumWarnThresholdExceeded" : 0,
+    "GcNumInfoThresholdExceeded" : 0,
+    "GcTotalExtraSleepTime" : 92,
+    "ThreadsNew" : 0,
+    "ThreadsRunnable" : 6,
+    "ThreadsBlocked" : 0,
+    "ThreadsWaiting" : 7,
+    "ThreadsTimedWaiting" : 34,
+    "ThreadsTerminated" : 0,
+    "LogFatal" : 0,
+    "LogError" : 0,
+    "LogWarn" : 2,
+    "LogInfo" : 348
+  },
+  { ... }
+  ]
+}
+```
+
+The JSON result for a DataNode's `/jmx` endpoint is slightly different:
+
+```json
+{
+  "beans" : [ {
+    "name" : "Hadoop:service=DataNode,name=DataNodeActivity-dev-slave-01.dev.local-9866",
+    "modelerType" : "DataNodeActivity-dev-slave-01.dev.local-9866",
+    "tag.SessionId" : null,
+    "tag.Context" : "dfs",
+    "tag.Hostname" : "dev-slave-01.dev.local",
+    "BytesWritten" : 500960407,
+    "TotalWriteTime" : 463,
+    "BytesRead" : 80689178,
+    "TotalReadTime" : 41203,
+    "BlocksWritten" : 16,
+    "BlocksRead" : 16,
+    "BlocksReplicated" : 4,
+    ...
+  },
+  { ... }
+  ]
+}
+```
+
+If Netdata can't access the `/jmx` endpoint for either a NameNode or DataNode, it will not be able to auto-detect and
+collect metrics from your HDFS implementation.
+
+Zookeeper auto-detection relies on an accessible client port and an allow-listed `mntr` command. For more details on
+`mntr`, see Zookeeper's documentation on [cluster
+options](https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_clusterOptions) and [Zookeeper
+commands](https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands).
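+
+Allow-listing is done in Zookeeper's own configuration, not in Netdata. The fragment below is a minimal sketch,
+assuming Zookeeper 3.5.3 or later (where four-letter-word commands must be allow-listed explicitly) and a
+configuration file at `conf/zoo.cfg`; the file location is an assumption and depends on how Zookeeper was installed.
+
+```properties
+# conf/zoo.cfg (assumed location; adjust to your installation)
+# Permit the `mntr` four-letter-word command on the client port.
+# Append other commands as a comma-separated list if you rely on them.
+4lw.commands.whitelist=mntr
+```
+
+A change to this setting most likely requires restarting the Zookeeper server before it takes effect.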
+
+## Configure the HDFS and Zookeeper modules
+
+To configure Netdata's HDFS module, navigate to your Netdata configuration directory (typically `/etc/netdata/`) and
+use `edit-config` to initialize and edit your HDFS configuration file.
+
+```bash
+cd /etc/netdata/
+sudo ./edit-config go.d/hdfs.conf
+```
+
+At the bottom of the file, you will see two example jobs, both of which are commented out:
+
+```yaml
+# [ JOBS ]
+#jobs:
+#  - name: namenode
+#    url: http://127.0.0.1:9870/jmx
+#
+#  - name: datanode
+#    url: http://127.0.0.1:9864/jmx
+```
+
+Uncomment these lines and edit the `url` values to match your setup. This is also the time to add any other
+configuration options, which are described in the `hdfs.conf` file itself. Most production implementations will
+require TLS certificates.
+
+The result for a simple HDFS setup, running entirely on `localhost` and without certificate authentication, might look
+like this:
+
+```yaml
+# [ JOBS ]
+jobs:
+  - name: namenode
+    url: http://127.0.0.1:9870/jmx
+
+  - name: datanode
+    url: http://127.0.0.1:9864/jmx
+```
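+
+For a cluster that serves `/jmx` over HTTPS, a job might carry extra TLS options. The following is only a sketch, not
+a verified configuration: the hostname and file paths are hypothetical, port `9871` assumes the default NameNode HTTPS
+port in Hadoop 3, and the `tls_*` option names are assumptions that should be checked against the comments in your
+own `hdfs.conf`.
+
+```yaml
+jobs:
+  - name: namenode_tls
+    url: https://namenode.example.com:9871/jmx  # hypothetical host; port 9871 is an assumption
+    tls_ca: /path/to/ca.pem                     # hypothetical path: CA used to verify the server
+    tls_cert: /path/to/client.pem               # hypothetical path: client certificate, if required
+    tls_key: /path/to/client.key                # hypothetical path: client key, if required
+```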
+
+At this point, Netdata should be configured to collect metrics from your HDFS servers. Let's move on to Zookeeper.
+
+Next, use `edit-config` again to initialize and edit your `zookeeper.conf` file.
+
+```bash
+cd /etc/netdata/
+sudo ./edit-config go.d/zookeeper.conf
+```
+
+As with the `hdfs.conf` file, head to the bottom, uncomment the example jobs, and adjust the `address` values to match
+your setup. Again, you may need to add other configuration options, like TLS certificates.
+
+```yaml
+jobs:
+  - name    : local
+    address : 127.0.0.1:2181
+
+  - name    : remote
+    address : 203.0.113.10:2182
+```
+
+Finally, [restart Netdata](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md).
+
+```sh
+sudo systemctl restart netdata
+```
+
+Upon restart, Netdata should recognize your HDFS/Zookeeper servers, enable the HDFS and Zookeeper modules, and begin
+showing real-time metrics for both in your Netdata dashboard. 🎉
+
+## Configure HDFS and Zookeeper alerts
+
+The Netdata community helped us create sane defaults for alerts related to both HDFS and Zookeeper. You may want to
+investigate these to ensure they work well with your Hadoop implementation.
+
+- [HDFS alerts](https://raw.githubusercontent.com/netdata/netdata/master/health/health.d/hdfs.conf)
+
+You can also access/edit these files directly with `edit-config`:
+
+```bash
+sudo /etc/netdata/edit-config health.d/hdfs.conf
+sudo /etc/netdata/edit-config health.d/zookeeper.conf
+```
+
+For more information about editing the defaults or writing new alert entities, see our
+[health monitoring documentation](https://github.com/netdata/netdata/blob/master/health/README.md).