path: root/src/go/collectors/go.d.plugin/modules/nvidia_smi/integrations/nvidia_gpu.md
author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-08-26 08:15:24 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-08-26 08:15:35 +0000
commit: f09848204fa5283d21ea43e262ee41aa578e1808
tree: c62385d7adf209fa6a798635954d887f718fb3fb /src/go/collectors/go.d.plugin/modules/nvidia_smi/integrations/nvidia_gpu.md
parent: Releasing debian version 1.46.3-2.
Merging upstream version 1.47.0.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/go/collectors/go.d.plugin/modules/nvidia_smi/integrations/nvidia_gpu.md')
-rw-r--r-- src/go/collectors/go.d.plugin/modules/nvidia_smi/integrations/nvidia_gpu.md | 217
1 files changed, 0 insertions, 217 deletions
diff --git a/src/go/collectors/go.d.plugin/modules/nvidia_smi/integrations/nvidia_gpu.md b/src/go/collectors/go.d.plugin/modules/nvidia_smi/integrations/nvidia_gpu.md
deleted file mode 100644
index 28016cfbd..000000000
--- a/src/go/collectors/go.d.plugin/modules/nvidia_smi/integrations/nvidia_gpu.md
+++ /dev/null
@@ -1,217 +0,0 @@
-<!--startmeta
-custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/go/collectors/go.d.plugin/modules/nvidia_smi/README.md"
-meta_yaml: "https://github.com/netdata/netdata/edit/master/src/go/collectors/go.d.plugin/modules/nvidia_smi/metadata.yaml"
-sidebar_label: "Nvidia GPU"
-learn_status: "Published"
-learn_rel_path: "Collecting Metrics/Hardware Devices and Sensors"
-most_popular: False
-message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
-endmeta-->
-
-# Nvidia GPU
-
-
-<img src="https://netdata.cloud/img/nvidia.svg" width="150"/>
-
-
-Plugin: go.d.plugin
-Module: nvidia_smi
-
-<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
-
-## Overview
-
-This collector monitors GPU performance metrics using
-the [nvidia-smi](https://developer.nvidia.com/nvidia-system-management-interface) CLI tool.
-
-> **Warning**: under development, [loop mode](https://github.com/netdata/netdata/issues/14522) not implemented yet.
-
-
-
-
-This collector is supported on all platforms.
-
-This collector supports collecting metrics from multiple instances of this integration, including remote instances.
-
-
-### Default Behavior
-
-#### Auto-Detection
-
-This integration doesn't support auto-detection.
-
-#### Limits
-
-The default configuration for this integration does not impose any limits on data collection.
-
-#### Performance Impact
-
-The default configuration for this integration is not expected to impose a significant performance impact on the system.
-
-
-## Metrics
-
-Metrics grouped by *scope*.
-
-The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
-
-
-
-### Per gpu
-
-These metrics refer to the GPU.
-
-Labels:
-
-| Label | Description |
-|:-----------|:----------------|
-| uuid | GPU id (e.g. 00000000:00:04.0) |
-| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |
-
-Metrics:
-
-| Metric | Dimensions | Unit | XML | CSV |
-|:------|:----------|:----|:---:|:---:|
-| nvidia_smi.gpu_pcie_bandwidth_usage | rx, tx | B/s | • | |
-| nvidia_smi.gpu_pcie_bandwidth_utilization | rx, tx | % | • | |
-| nvidia_smi.gpu_fan_speed_perc | fan_speed | % | • | • |
-| nvidia_smi.gpu_utilization | gpu | % | • | • |
-| nvidia_smi.gpu_memory_utilization | memory | % | • | • |
-| nvidia_smi.gpu_decoder_utilization | decoder | % | • | |
-| nvidia_smi.gpu_encoder_utilization | encoder | % | • | |
-| nvidia_smi.gpu_frame_buffer_memory_usage | free, used, reserved | B | • | • |
-| nvidia_smi.gpu_bar1_memory_usage | free, used | B | • | |
-| nvidia_smi.gpu_temperature | temperature | Celsius | • | • |
-| nvidia_smi.gpu_voltage | voltage | V | • | |
-| nvidia_smi.gpu_clock_freq | graphics, video, sm, mem | MHz | • | • |
-| nvidia_smi.gpu_power_draw | power_draw | Watts | • | • |
-| nvidia_smi.gpu_performance_state | P0-P15 | state | • | • |
-| nvidia_smi.gpu_mig_mode_current_status | enabled, disabled | status | • | |
-| nvidia_smi.gpu_mig_devices_count | mig | devices | • | |
-
-### Per mig
-
-These metrics refer to the Multi-Instance GPU (MIG).
-
-Labels:
-
-| Label | Description |
-|:-----------|:----------------|
-| uuid | GPU id (e.g. 00000000:00:04.0) |
-| product_name | GPU product name (e.g. NVIDIA A100-SXM4-40GB) |
-| gpu_instance_id | GPU instance id (e.g. 1) |
-
-Metrics:
-
-| Metric | Dimensions | Unit | XML | CSV |
-|:------|:----------|:----|:---:|:---:|
-| nvidia_smi.gpu_mig_frame_buffer_memory_usage | free, used, reserved | B | • | |
-| nvidia_smi.gpu_mig_bar1_memory_usage | free, used | B | • | |
-
-
-
-## Alerts
-
-There are no alerts configured by default for this integration.
-
-
-## Setup
-
-### Prerequisites
-
-#### Enable in go.d.conf.
-
-This collector is disabled by default. You need to explicitly enable it in the `go.d.conf` file.
-
-
-
-### Configuration
-
-#### File
-
-The configuration file name for this integration is `go.d/nvidia_smi.conf`.
-
-
-You can edit the configuration file using the `edit-config` script from the
-Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).
-
-```bash
-cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
-sudo ./edit-config go.d/nvidia_smi.conf
-```
-#### Options
-
-The following options can be defined globally: update_every, autodetection_retry.
-
-
-<details open><summary>Config options</summary>
-
-| Name | Description | Default | Required |
-|:----|:-----------|:-------|:--------:|
-| update_every | Data collection frequency in seconds. | 10 | no |
-| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
-| binary_path | Path to nvidia_smi binary. The default is "nvidia_smi" and the executable is looked for in the directories specified in the PATH environment variable. | nvidia_smi | no |
-| timeout | nvidia_smi binary execution timeout in seconds. | 2 | no |
-| use_csv_format | Format used when requesting GPU information. XML is used if set to 'no'. | no | no |
-
-</details>
-
-#### Examples
-
-##### CSV format
-
-Use CSV format when requesting GPU information.
-
-<details open><summary>Config</summary>
-
-```yaml
-jobs:
-  - name: nvidia_smi
-    use_csv_format: yes
-
-```
-</details>
-
-##### Custom binary path
-
-The executable is not in the directories specified in the PATH environment variable.
-
-<details open><summary>Config</summary>
-
-```yaml
-jobs:
-  - name: nvidia_smi
-    binary_path: /usr/local/sbin/nvidia_smi
-
-```
-</details>
-
-
-
-## Troubleshooting
-
-### Debug Mode
-
-To troubleshoot issues with the `nvidia_smi` collector, run the `go.d.plugin` with the debug option enabled. The output
-should give you clues as to why the collector isn't working.
-
-- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
-  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.
-
-  ```bash
-  cd /usr/libexec/netdata/plugins.d/
-  ```
-
-- Switch to the `netdata` user.
-
-  ```bash
-  sudo -u netdata -s
-  ```
-
-- Run the `go.d.plugin` to debug the collector:
-
-  ```bash
-  ./go.d.plugin -d -m nvidia_smi
-  ```
-
-
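
The removed page states that the collector is disabled by default and must be enabled in `go.d.conf`, but it does not show the entry itself. A minimal sketch of that step, assuming the usual `modules:` toggle map in `go.d.conf` (the file can be opened with `sudo ./edit-config go.d.conf` from the Netdata config directory, in the same way as the `edit-config` call shown in the removed text):

```yaml
# go.d.conf (sketch; assumes the standard per-module toggle map)
modules:
  nvidia_smi: yes   # disabled by default, so it has to be switched on explicitly
```

The Netdata Agent generally needs a restart before a newly enabled collector starts running.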