diff options
Diffstat (limited to 'libnetdata/log/README.md')
-rw-r--r-- | libnetdata/log/README.md | 196 |
1 files changed, 193 insertions, 3 deletions
diff --git a/libnetdata/log/README.md b/libnetdata/log/README.md index f811bb4b..d9ed6437 100644 --- a/libnetdata/log/README.md +++ b/libnetdata/log/README.md @@ -7,8 +7,198 @@ learn_topic_type: "Tasks" learn_rel_path: "Developers/libnetdata" --> -# Log +# Netdata Logging -The netdata log library supports debug, info, error and fatal error logging. -By default we have an access log, an error log and a collectors log. +This document describes how Netdata generates its own logs, not how Netdata manages and queries logs databases. + +## Log sources + +Netdata supports the following log sources: + +1. **daemon**, logs generated by Netdata daemon. +2. **collector**, logs generated by Netdata collectors, including internal and external ones. +3. **access**, API requests received by Netdata +4. **health**, all alert transitions and notifications + +## Log outputs + +For each log source, Netdata supports the following output methods: + +- **off**, to disable this log source +- **journal**, to send the logs to systemd-journal. +- **syslog**, to send the logs to syslog. +- **system**, to send the output to `stderr` or `stdout` depending on the log source. +- **stdout**, to write the logs to Netdata's `stdout`. +- **stderr**, to write the logs to Netdata's `stderr`. +- **filename**, to send the logs to a file. + +For `daemon` and `collector` the default is `journal` when systemd-journal is available. +To decide if systemd-journal is available, Netdata checks: + +1. `stderr` is connected to systemd-journald +2. `/run/systemd/journal/socket` exists +3. `/host/run/systemd/journal/socket` exists (`/host` is configurable in containers) + +If any of the above is detected, Netdata will select `journal` for `daemon` and `collector` sources. + +All other sources default to a file. + +## Log formats + +| Format | Description | +|---------|--------------------------------------------------------------------------------------------------------| +| journal | journald-specific log format. Automatically selected when logging to systemd-journal. | +| logfmt | logs data as a series of key/value pairs. The default when logging to any output other than `journal`. | +| json | logs data in JSON format. | + +## Log levels + +Each time Netdata logs, it assigns a priority to the log. It can be one of this (in order of importance): + +| Level | Description | +|-----------|----------------------------------------------------------------------------------------| +| emergency | a fatal condition, Netdata will most likely exit immediately after. | +| alert | a very important issue that may affect how Netdata operates. | +| critical | a very important issue the user should know which, Netdata thinks it can survive. | +| error | an error condition indicating that Netdata is trying to do something, but it fails. | +| warning | something unexpected has happened that may or may not affect the operation of Netdata. | +| notice | something that does not affect the operation of Netdata, but the user should notice. | +| info | the default log level about information the user should know. | +| debug | these are more verbose logs that can be ignored. | + +## Logs Configuration + +In `netdata.conf`, there are the following settings: + +``` +[logs] + # logs to trigger flood protection = 1000 + # logs flood protection period = 60 + # facility = daemon + # level = info + # daemon = journal + # collector = journal + # access = /var/log/netdata/access.log + # health = /var/log/netdata/health.log +``` + +- `logs to trigger flood protection` and `logs flood protection period` enable logs flood protection for `daemon` and `collector` sources. It can also be configured per log source. +- `facility` is used only when Netdata logs to syslog. +- `level` defines the minimum [log level](#log-levels) of logs that will be logged. This setting is applied only to `daemon` and `collector` sources. It can also be configured per source. + +### Configuring log sources + +Each for the sources (`daemon`, `collector`, `access`, `health`), accepts the following: + +``` +source = {FORMAT},level={LEVEL},protection={LOG}/{PERIOD}@{OUTPUT} +``` + +Where: + +- `{FORMAT}`, is one of the [log formats](#log-formats), +- `{LEVEL}`, is the minimum [log level](#log-levels) to be logged, +- `{LOGS}` is the number of `logs to trigger flood protection` configured per output, +- `{PERIOD}` is the equivalent of `logs flood protection period` configured per output, +- `{OUTPUT}` is one of the `[log outputs](#log-outputs), + +All parameters can be omitted, except `{OUTPUT}`. If `{OUTPUT}` is the only given parameter, `@` can be omitted. + +### Logs rotation + +Netdata comes with `logrotate` configuration to rotate its log files periodically. + +The default is usually found in `/etc/logrotate.d/netdata`. + +Sending a `SIGHUP` to Netdata, will instruct it to re-open all its log files. + +## Log Fields + +Netdata exposes the following fields to its logs: + +| journal | logfmt | json | Description | +|:--------------------------------------:|:------------------------------:|:------------------------------:|:---------------------------------------------------------------------------------------------------------:| +| `_SOURCE_REALTIME_TIMESTAMP` | `time` | `time` | the timestamp of the event | +| `SYSLOG_IDENTIFIER` | `comm` | `comm` | the program logging the event | +| `ND_LOG_SOURCE` | `source` | `source` | one of the [log sources](#log-sources) | +| `PRIORITY`<br/>numeric | `level`<br/>text | `level`<br/>numeric | one of the [log levels](#log-levels) | +| `ERRNO` | `errno` | `errno` | the numeric value of `errno` | +| `INVOCATION_ID` | - | - | a unique UUID of the Netdata session, reset on every Netdata restart, inherited by systemd when available | +| `CODE_LINE` | - | - | the line number of of the source code logging this event | +| `CODE_FILE` | - | - | the filename of the source code logging this event | +| `CODE_FUNCTION` | - | - | the function name of the source code logging this event | +| `TID` | `tid` | `tid` | the thread id of the thread logging this event | +| `THREAD_TAG` | `thread` | `thread` | the name of the thread logging this event | +| `MESSAGE_ID` | `msg_id` | `msg_id` | see [message IDs](#message-ids) | +| `ND_MODULE` | `module` | `module` | the Netdata module logging this event | +| `ND_NIDL_NODE` | `node` | `node` | the hostname of the node the event is related to | +| `ND_NIDL_INSTANCE` | `instance` | `instance` | the instance of the node the event is related to | +| `ND_NIDL_CONTEXT` | `context` | `context` | the context the event is related to (this is usually the chart name, as shown on netdata dashboards | +| `ND_NIDL_DIMENSION` | `dimension` | `dimension` | the dimension the event is related to | +| `ND_SRC_TRANSPORT` | `src_transport` | `src_transport` | when the event happened during a request, this is the request transport | +| `ND_SRC_IP` | `src_ip` | `src_ip` | when the event happened during an inbound request, this is the IP the request came from | +| `ND_SRC_PORT` | `src_port` | `src_port` | when the event happened during an inbound request, this is the port the request came from | +| `ND_SRC_CAPABILITIES` | `src_capabilities` | `src_capabilities` | when the request came from a child, this is the communication capabilities of the child | +| `ND_DST_TRANSPORT` | `dst_transport` | `dst_transport` | when the event happened during an outbound request, this is the outbound request transport | +| `ND_DST_IP` | `dst_ip` | `dst_ip` | when the event happened during an outbound request, this is the IP the request destination | +| `ND_DST_PORT` | `dst_port` | `dst_port` | when the event happened during an outbound request, this is the port the request destination | +| `ND_DST_CAPABILITIES` | `dst_capabilities` | `dst_capabilities` | when the request goes to a parent, this is the communication capabilities of the parent | +| `ND_REQUEST_METHOD` | `req_method` | `req_method` | when the event happened during an inbound request, this is the method the request was received | +| `ND_RESPONSE_CODE` | `code` | `code` | when responding to a request, this this the response code | +| `ND_CONNECTION_ID` | `conn` | `conn` | when there is a connection id for an inbound connection, this is the connection id | +| `ND_TRANSACTION_ID` | `transaction` | `transaction` | the transaction id (UUID) of all API requests | +| `ND_RESPONSE_SENT_BYTES` | `sent_bytes` | `sent_bytes` | the bytes we sent to API responses | +| `ND_RESPONSE_SIZE_BYTES` | `size_bytes` | `size_bytes` | the uncompressed bytes of the API responses | +| `ND_RESPONSE_PREP_TIME_USEC` | `prep_ut` | `prep_ut` | the time needed to prepare a response | +| `ND_RESPONSE_SENT_TIME_USEC` | `sent_ut` | `sent_ut` | the time needed to send a response | +| `ND_RESPONSE_TOTAL_TIME_USEC` | `total_ut` | `total_ut` | the total time needed to complete a response | +| `ND_ALERT_ID` | `alert_id` | `alert_id` | the alert id this event is related to | +| `ND_ALERT_EVENT_ID` | `alert_event_id` | `alert_event_id` | a sequential number of the alert transition (per host) | +| `ND_ALERT_UNIQUE_ID` | `alert_unique_id` | `alert_unique_id` | a sequential number of the alert transition (per alert) | +| `ND_ALERT_TRANSITION_ID` | `alert_transition_id` | `alert_transition_id` | the unique UUID of this alert transition | +| `ND_ALERT_CONFIG` | `alert_config` | `alert_config` | the alert configuration hash (UUID) | +| `ND_ALERT_NAME` | `alert` | `alert` | the alert name | +| `ND_ALERT_CLASS` | `alert_class` | `alert_class` | the alert classification | +| `ND_ALERT_COMPONENT` | `alert_component` | `alert_component` | the alert component | +| `ND_ALERT_TYPE` | `alert_type` | `alert_type` | the alert type | +| `ND_ALERT_EXEC` | `alert_exec` | `alert_exec` | the alert notification program | +| `ND_ALERT_RECIPIENT` | `alert_recipient` | `alert_recipient` | the alert recipient(s) | +| `ND_ALERT_VALUE` | `alert_value` | `alert_value` | the current alert value | +| `ND_ALERT_VALUE_OLD` | `alert_value_old` | `alert_value_old` | the previous alert value | +| `ND_ALERT_STATUS` | `alert_status` | `alert_status` | the current alert status | +| `ND_ALERT_STATUS_OLD` | `alert_value_old` | `alert_value_old` | the previous alert value | +| `ND_ALERT_UNITS` | `alert_units` | `alert_units` | the units of the alert | +| `ND_ALERT_SUMMARY` | `alert_summary` | `alert_summary` | the summary text of the alert | +| `ND_ALERT_INFO` | `alert_info` | `alert_info` | the info text of the alert | +| `ND_ALERT_DURATION` | `alert_duration` | `alert_duration` | the duration the alert was in its previous state | +| `ND_ALERT_NOTIFICATION_TIMESTAMP_USEC` | `alert_notification_timestamp` | `alert_notification_timestamp` | the timestamp the notification delivery is scheduled | +| `ND_REQUEST` | `request` | `request` | the full request during which the event happened | +| `MESSAGE` | `msg` | `msg` | the event message | + + +### Message IDs + +Netdata assigns specific message IDs to certain events: + +- `ed4cdb8f1beb4ad3b57cb3cae2d162fa` when a Netdata child connects to this Netdata +- `6e2e3839067648968b646045dbf28d66` when this Netdata connects to a Netdata parent +- `9ce0cb58ab8b44df82c4bf1ad9ee22de` when alerts change state +- `6db0018e83e34320ae2a659d78019fb7` when notifications are sent + +You can view these events using the Netdata systemd-journal.plugin at the `MESSAGE_ID` filter, +or using `journalctl` like this: + +```bash +# query children connection +journalctl MESSAGE_ID=ed4cdb8f1beb4ad3b57cb3cae2d162fa + +# query parent connection +journalctl MESSAGE_ID=6e2e3839067648968b646045dbf28d66 + +# query alert transitions +journalctl MESSAGE_ID=9ce0cb58ab8b44df82c4bf1ad9ee22de + +# query alert notifications +journalctl MESSAGE_ID=6db0018e83e34320ae2a659d78019fb7 +``` |