Diffstat (limited to 'libnetdata/log/README.md')
-rw-r--r-- libnetdata/log/README.md 196
1 files changed, 193 insertions, 3 deletions
diff --git a/libnetdata/log/README.md b/libnetdata/log/README.md
index f811bb4b3..d9ed64374 100644
--- a/libnetdata/log/README.md
+++ b/libnetdata/log/README.md
@@ -7,8 +7,198 @@ learn_topic_type: "Tasks"
learn_rel_path: "Developers/libnetdata"
-->
-# Log
+# Netdata Logging
-The netdata log library supports debug, info, error and fatal error logging.
-By default we have an access log, an error log and a collectors log.
+This document describes how Netdata generates its own logs, not how Netdata manages and queries log databases.
+
+## Log sources
+
+Netdata supports the following log sources:
+
+1. **daemon**, logs generated by the Netdata daemon.
+2. **collector**, logs generated by Netdata collectors, including internal and external ones.
+3. **access**, API requests received by Netdata.
+4. **health**, all alert transitions and notifications.
+
+## Log outputs
+
+For each log source, Netdata supports the following output methods:
+
+- **off**, to disable this log source.
+- **journal**, to send the logs to systemd-journal.
+- **syslog**, to send the logs to syslog.
+- **system**, to send the output to `stderr` or `stdout` depending on the log source.
+- **stdout**, to write the logs to Netdata's `stdout`.
+- **stderr**, to write the logs to Netdata's `stderr`.
+- **filename**, to send the logs to a file.
+
+For `daemon` and `collector` the default is `journal` when systemd-journal is available.
+To decide if systemd-journal is available, Netdata checks:
+
+1. `stderr` is connected to systemd-journald
+2. `/run/systemd/journal/socket` exists
+3. `/host/run/systemd/journal/socket` exists (`/host` is configurable in containers)
+
+If any of the above is detected, Netdata will select `journal` for `daemon` and `collector` sources.
+
+All other sources default to a file.
+
+## Log formats
+
+| Format | Description |
+|---------|--------------------------------------------------------------------------------------------------------|
+| journal | journald-specific log format. Automatically selected when logging to systemd-journal. |
+| logfmt | logs data as a series of key/value pairs. The default when logging to any output other than `journal`. |
+| json | logs data in JSON format. |
+
+## Log levels
+
+Each time Netdata logs, it assigns a priority to the log. It can be one of the following (in decreasing order of importance):
+
+| Level | Description |
+|-----------|----------------------------------------------------------------------------------------|
+| emergency | a fatal condition, Netdata will most likely exit immediately after. |
+| alert | a very important issue that may affect how Netdata operates. |
+| critical | a very important issue the user should know about, which Netdata thinks it can survive. |
+| error | an error condition indicating that Netdata is trying to do something, but it fails. |
+| warning | something unexpected has happened that may or may not affect the operation of Netdata. |
+| notice | something that does not affect the operation of Netdata, but the user should notice. |
+| info | the default log level about information the user should know. |
+| debug | these are more verbose logs that can be ignored. |
+
+## Logs Configuration
+
+In `netdata.conf`, there are the following settings:
+
+```
+[logs]
+ # logs to trigger flood protection = 1000
+ # logs flood protection period = 60
+ # facility = daemon
+ # level = info
+ # daemon = journal
+ # collector = journal
+ # access = /var/log/netdata/access.log
+ # health = /var/log/netdata/health.log
+```
+
+- `logs to trigger flood protection` and `logs flood protection period` enable log flood protection for the `daemon` and `collector` sources. They can also be configured per log source.
+- `facility` is used only when Netdata logs to syslog.
+- `level` defines the minimum [log level](#log-levels) of logs that will be logged. This setting is applied only to `daemon` and `collector` sources. It can also be configured per source.
+
+### Configuring log sources
+
+Each of the sources (`daemon`, `collector`, `access`, `health`) accepts the following:
+
+```
+source = {FORMAT},level={LEVEL},protection={LOGS}/{PERIOD}@{OUTPUT}
+```
+
+Where:
+
+- `{FORMAT}` is one of the [log formats](#log-formats),
+- `{LEVEL}` is the minimum [log level](#log-levels) to be logged,
+- `{LOGS}` is the number of `logs to trigger flood protection`, configured per output,
+- `{PERIOD}` is the equivalent of `logs flood protection period`, configured per output,
+- `{OUTPUT}` is one of the [log outputs](#log-outputs).
+
+All parameters can be omitted, except `{OUTPUT}`. If `{OUTPUT}` is the only given parameter, `@` can be omitted.
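+
+As an illustration of the syntax above, a `[logs]` section could look like the following sketch. The values are examples only, not recommended settings, and the `collector.log` path is hypothetical:
+
+```
+[logs]
+ # only {OUTPUT} is given, so `@` is omitted
+ daemon = journal
+ health = off
+ # format and output only
+ access = json@/var/log/netdata/access.log
+ # all parameters; the file path is a hypothetical example
+ collector = logfmt,level=warning,protection=1000/60@/var/log/netdata/collector.log
+```
+
+With these example values, the `collector` line would log entries of level `warning` and above in `logfmt` to a file, with flood protection triggered after 1000 logs per 60-unit period (the same numbers as the global defaults shown earlier).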
+
+### Logs rotation
+
+Netdata comes with a `logrotate` configuration to rotate its log files periodically.
+
+The default is usually found in `/etc/logrotate.d/netdata`.
+
+Sending a `SIGHUP` to Netdata will instruct it to re-open all its log files.
+
+## Log Fields
+
+Netdata exposes the following fields in its logs:
+
+| journal | logfmt | json | Description |
+|:--------------------------------------:|:------------------------------:|:------------------------------:|:---------------------------------------------------------------------------------------------------------:|
+| `_SOURCE_REALTIME_TIMESTAMP` | `time` | `time` | the timestamp of the event |
+| `SYSLOG_IDENTIFIER` | `comm` | `comm` | the program logging the event |
+| `ND_LOG_SOURCE` | `source` | `source` | one of the [log sources](#log-sources) |
+| `PRIORITY`<br/>numeric | `level`<br/>text | `level`<br/>numeric | one of the [log levels](#log-levels) |
+| `ERRNO` | `errno` | `errno` | the numeric value of `errno` |
+| `INVOCATION_ID` | - | - | a unique UUID of the Netdata session, reset on every Netdata restart, inherited from systemd when available |
+| `CODE_LINE` | - | - | the line number of the source code logging this event |
+| `CODE_FILE` | - | - | the filename of the source code logging this event |
+| `CODE_FUNCTION` | - | - | the function name of the source code logging this event |
+| `TID` | `tid` | `tid` | the thread id of the thread logging this event |
+| `THREAD_TAG` | `thread` | `thread` | the name of the thread logging this event |
+| `MESSAGE_ID` | `msg_id` | `msg_id` | see [message IDs](#message-ids) |
+| `ND_MODULE` | `module` | `module` | the Netdata module logging this event |
+| `ND_NIDL_NODE` | `node` | `node` | the hostname of the node the event is related to |
+| `ND_NIDL_INSTANCE` | `instance` | `instance` | the instance of the node the event is related to |
+| `ND_NIDL_CONTEXT` | `context` | `context` | the context the event is related to (this is usually the chart name, as shown on Netdata dashboards) |
+| `ND_NIDL_DIMENSION` | `dimension` | `dimension` | the dimension the event is related to |
+| `ND_SRC_TRANSPORT` | `src_transport` | `src_transport` | when the event happened during a request, this is the request transport |
+| `ND_SRC_IP` | `src_ip` | `src_ip` | when the event happened during an inbound request, this is the IP the request came from |
+| `ND_SRC_PORT` | `src_port` | `src_port` | when the event happened during an inbound request, this is the port the request came from |
+| `ND_SRC_CAPABILITIES` | `src_capabilities` | `src_capabilities` | when the request came from a child, this is the communication capabilities of the child |
+| `ND_DST_TRANSPORT` | `dst_transport` | `dst_transport` | when the event happened during an outbound request, this is the outbound request transport |
+| `ND_DST_IP` | `dst_ip` | `dst_ip` | when the event happened during an outbound request, this is the IP of the request destination |
+| `ND_DST_PORT` | `dst_port` | `dst_port` | when the event happened during an outbound request, this is the port of the request destination |
+| `ND_DST_CAPABILITIES` | `dst_capabilities` | `dst_capabilities` | when the request goes to a parent, this is the communication capabilities of the parent |
+| `ND_REQUEST_METHOD` | `req_method` | `req_method` | when the event happened during an inbound request, this is the method with which the request was received |
+| `ND_RESPONSE_CODE` | `code` | `code` | when responding to a request, this is the response code |
+| `ND_CONNECTION_ID` | `conn` | `conn` | when there is a connection id for an inbound connection, this is the connection id |
+| `ND_TRANSACTION_ID` | `transaction` | `transaction` | the transaction id (UUID) of all API requests |
+| `ND_RESPONSE_SENT_BYTES` | `sent_bytes` | `sent_bytes` | the bytes sent in API responses |
+| `ND_RESPONSE_SIZE_BYTES` | `size_bytes` | `size_bytes` | the uncompressed bytes of the API responses |
+| `ND_RESPONSE_PREP_TIME_USEC` | `prep_ut` | `prep_ut` | the time needed to prepare a response |
+| `ND_RESPONSE_SENT_TIME_USEC` | `sent_ut` | `sent_ut` | the time needed to send a response |
+| `ND_RESPONSE_TOTAL_TIME_USEC` | `total_ut` | `total_ut` | the total time needed to complete a response |
+| `ND_ALERT_ID` | `alert_id` | `alert_id` | the alert id this event is related to |
+| `ND_ALERT_EVENT_ID` | `alert_event_id` | `alert_event_id` | a sequential number of the alert transition (per host) |
+| `ND_ALERT_UNIQUE_ID` | `alert_unique_id` | `alert_unique_id` | a sequential number of the alert transition (per alert) |
+| `ND_ALERT_TRANSITION_ID` | `alert_transition_id` | `alert_transition_id` | the unique UUID of this alert transition |
+| `ND_ALERT_CONFIG` | `alert_config` | `alert_config` | the alert configuration hash (UUID) |
+| `ND_ALERT_NAME` | `alert` | `alert` | the alert name |
+| `ND_ALERT_CLASS` | `alert_class` | `alert_class` | the alert classification |
+| `ND_ALERT_COMPONENT` | `alert_component` | `alert_component` | the alert component |
+| `ND_ALERT_TYPE` | `alert_type` | `alert_type` | the alert type |
+| `ND_ALERT_EXEC` | `alert_exec` | `alert_exec` | the alert notification program |
+| `ND_ALERT_RECIPIENT` | `alert_recipient` | `alert_recipient` | the alert recipient(s) |
+| `ND_ALERT_VALUE` | `alert_value` | `alert_value` | the current alert value |
+| `ND_ALERT_VALUE_OLD` | `alert_value_old` | `alert_value_old` | the previous alert value |
+| `ND_ALERT_STATUS` | `alert_status` | `alert_status` | the current alert status |
+| `ND_ALERT_STATUS_OLD` | `alert_status_old` | `alert_status_old` | the previous alert status |
+| `ND_ALERT_UNITS` | `alert_units` | `alert_units` | the units of the alert |
+| `ND_ALERT_SUMMARY` | `alert_summary` | `alert_summary` | the summary text of the alert |
+| `ND_ALERT_INFO` | `alert_info` | `alert_info` | the info text of the alert |
+| `ND_ALERT_DURATION` | `alert_duration` | `alert_duration` | the duration the alert was in its previous state |
+| `ND_ALERT_NOTIFICATION_TIMESTAMP_USEC` | `alert_notification_timestamp` | `alert_notification_timestamp` | the timestamp the notification delivery is scheduled |
+| `ND_REQUEST` | `request` | `request` | the full request during which the event happened |
+| `MESSAGE` | `msg` | `msg` | the event message |
+
+
+### Message IDs
+
+Netdata assigns specific message IDs to certain events:
+
+- `ed4cdb8f1beb4ad3b57cb3cae2d162fa` when a Netdata child connects to this Netdata
+- `6e2e3839067648968b646045dbf28d66` when this Netdata connects to a Netdata parent
+- `9ce0cb58ab8b44df82c4bf1ad9ee22de` when alerts change state
+- `6db0018e83e34320ae2a659d78019fb7` when notifications are sent
+
+You can view these events in the Netdata systemd-journal.plugin by using the `MESSAGE_ID` filter,
+or with `journalctl`, like this:
+
+```bash
+# query child connections
+journalctl MESSAGE_ID=ed4cdb8f1beb4ad3b57cb3cae2d162fa
+
+# query parent connections
+journalctl MESSAGE_ID=6e2e3839067648968b646045dbf28d66
+
+# query alert transitions
+journalctl MESSAGE_ID=9ce0cb58ab8b44df82c4bf1ad9ee22de
+
+# query alert notifications
+journalctl MESSAGE_ID=6db0018e83e34320ae2a659d78019fb7
+```