<!--startmeta
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/hdfs/README.md"
meta_yaml: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/hdfs/metadata.yaml"
sidebar_label: "Hadoop Distributed File System (HDFS)"
learn_status: "Published"
learn_rel_path: "Collecting Metrics/Storage, Mount Points and Filesystems"
most_popular: True
message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
endmeta-->

# Hadoop Distributed File System (HDFS)


<img src="https://netdata.cloud/img/hadoop.svg" width="150"/>


Plugin: go.d.plugin
Module: hdfs

<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />

## Overview

This collector monitors HDFS nodes.

Netdata accesses HDFS metrics over `Java Management Extensions` (JMX) through the web interface of an HDFS daemon.


This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.


### Default Behavior

#### Auto-Detection

This integration doesn't support auto-detection.

#### Limits

The default configuration for this integration does not impose any limits on data collection.

#### Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.
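Since the collector scrapes the daemon's built-in JMX JSON servlet, you can check that the endpoint is reachable before configuring Netdata. A minimal probe, assuming a NameNode on the default web port (the `qry` parameter of Hadoop's JMX servlet narrows the output to matching MBeans; the bean name shown is just an example):

```bash
# Query the NameNode JMX servlet directly; it returns metrics as JSON.
# Drop the qry parameter to dump every available MBean.
curl "http://127.0.0.1:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState"
```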
## Metrics

Metrics grouped by *scope*.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.


### Per Hadoop Distributed File System (HDFS) instance

These metrics refer to the entire monitored application.

This scope has no labels.

Metrics:

| Metric | Dimensions | Unit | DataNode | NameNode |
|:------|:----------|:----|:---:|:---:|
| hdfs.heap_memory | committed, used | MiB | • | • |
| hdfs.gc_count_total | gc | events/s | • | • |
| hdfs.gc_time_total | ms | ms | • | • |
| hdfs.gc_threshold | info, warn | events/s | • | • |
| hdfs.threads | new, runnable, blocked, waiting, timed_waiting, terminated | num | • | • |
| hdfs.logs_total | info, error, warn, fatal | logs/s | • | • |
| hdfs.rpc_bandwidth | received, sent | kilobits/s | • | • |
| hdfs.rpc_calls | calls | calls/s | • | • |
| hdfs.open_connections | open | connections | • | • |
| hdfs.call_queue_length | length | num | • | • |
| hdfs.avg_queue_time | time | ms | • | • |
| hdfs.avg_processing_time | time | ms | • | • |
| hdfs.capacity | remaining, used | KiB | | • |
| hdfs.used_capacity | dfs, non_dfs | KiB | | • |
| hdfs.load | load | load | | • |
| hdfs.volume_failures_total | failures | events/s | | • |
| hdfs.files_total | files | num | | • |
| hdfs.blocks_total | blocks | num | | • |
| hdfs.blocks | corrupt, missing, under_replicated | num | | • |
| hdfs.data_nodes | live, dead, stale | num | | • |
| hdfs.datanode_capacity | remaining, used | KiB | • | |
| hdfs.datanode_used_capacity | dfs, non_dfs | KiB | • | |
| hdfs.datanode_failed_volumes | failed volumes | num | • | |
| hdfs.datanode_bandwidth | reads, writes | KiB/s | • | |


## Alerts

The following alerts are available:

| Alert name | On metric | Description |
|:------------|:----------|:------------|
| [ hdfs_capacity_usage ](https://github.com/netdata/netdata/blob/master/src/health/health.d/hdfs.conf) | hdfs.capacity | summary datanodes space capacity utilization |
| [ hdfs_missing_blocks ](https://github.com/netdata/netdata/blob/master/src/health/health.d/hdfs.conf) | hdfs.blocks | number of missing blocks |
| [ hdfs_stale_nodes ](https://github.com/netdata/netdata/blob/master/src/health/health.d/hdfs.conf) | hdfs.data_nodes | number of datanodes marked stale due to delayed heartbeat |
| [ hdfs_dead_nodes ](https://github.com/netdata/netdata/blob/master/src/health/health.d/hdfs.conf) | hdfs.data_nodes | number of datanodes which are currently dead |
| [ hdfs_num_failed_volumes ](https://github.com/netdata/netdata/blob/master/src/health/health.d/hdfs.conf) | hdfs.num_failed_volumes | number of failed volumes |


## Setup

### Prerequisites

No action required.

### Configuration

#### File

The configuration file name for this integration is `go.d/hdfs.conf`.

You can edit the configuration file using the `edit-config` script from the
Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/hdfs.conf
```

#### Options

The following options can be defined globally: update_every, autodetection_retry (see the sketch after the table below).

<details open><summary>Config options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
| update_every | Data collection frequency. | 1 | no |
| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
| url | Server URL. | http://127.0.0.1:9870/jmx | yes |
| timeout | HTTP request timeout. | 1 | no |
| username | Username for basic HTTP authentication. | | no |
| password | Password for basic HTTP authentication. | | no |
| proxy_url | Proxy URL. | | no |
| proxy_username | Username for proxy basic HTTP authentication. | | no |
| proxy_password | Password for proxy basic HTTP authentication. | | no |
| method | HTTP request method. | GET | no |
| body | HTTP request body. | | no |
| headers | HTTP request headers. | | no |
| not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
| tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | no | no |
| tls_ca | Certification authority that the client uses when verifying the server's certificates. | | no |
| tls_cert | Client TLS certificate. | | no |
| tls_key | Client TLS key. | | no |

</details>
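When set at the top level of `go.d/hdfs.conf`, `update_every` and `autodetection_retry` act as defaults for every job. A minimal sketch (the values shown are illustrative, not recommendations):

```yaml
# Top-level values apply to all jobs unless a job overrides them.
update_every: 5          # collect once every 5 seconds
autodetection_retry: 30  # retry failed detection every 30 seconds

jobs:
  - name: local
    url: http://127.0.0.1:9870/jmx
```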
#### Examples

##### Basic

A basic example configuration.

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9870/jmx

```

##### HTTP authentication

Basic HTTP authentication.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9870/jmx
    username: username
    password: password

```
</details>

##### HTTPS with self-signed certificate

Do not validate server certificate chain and hostname.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: https://127.0.0.1:9870/jmx
    tls_skip_verify: yes

```
</details>

##### Multi-instance

> **Note**: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

<details open><summary>Config</summary>

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9870/jmx

  - name: remote
    url: http://192.0.2.1:9870/jmx

```
</details>
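One scenario the examples above don't cover is collecting through an HTTP proxy. A minimal sketch built from the documented `proxy_*` options (the proxy address and credentials are placeholders):

```yaml
jobs:
  - name: local
    url: http://127.0.0.1:9870/jmx
    proxy_url: http://127.0.0.1:3128  # placeholder proxy address
    proxy_username: proxyuser         # placeholder credentials
    proxy_password: proxypass
```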
## Troubleshooting

### Debug Mode

**Important**: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the `hdfs` collector, run the `go.d.plugin` with the debug option enabled. The output
should give you clues as to why the collector isn't working.

- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.

  ```bash
  cd /usr/libexec/netdata/plugins.d/
  ```

- Switch to the `netdata` user.

  ```bash
  sudo -u netdata -s
  ```

- Run the `go.d.plugin` to debug the collector:

  ```bash
  ./go.d.plugin -d -m hdfs
  ```

### Getting Logs

If you're encountering problems with the `hdfs` collector, follow these steps to retrieve logs and identify potential issues:

- **Run the command** specific to your system (systemd, non-systemd, or Docker container).
- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

#### System with systemd

Use the following command to view logs generated since the last Netdata service restart:

```bash
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep hdfs
```

#### System without systemd

Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for the collector's name:

```bash
grep hdfs /var/log/netdata/collector.log
```

**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.

#### Docker Container

If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:

```bash
docker logs netdata 2>&1 | grep hdfs
```