author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2023-10-17 09:30:20 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2023-10-17 09:30:20 +0000 |
commit | 386ccdd61e8256c8b21ee27ee2fc12438fc5ca98 (patch) | |
tree | c9fbcacdb01f029f46133a5ba7ecd610c2bcb041 /collectors | |
parent | Adding upstream version 1.42.4. (diff) | |
Adding upstream version 1.43.0. (ref: upstream/1.43.0)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'collectors')
319 files changed, 34841 insertions, 11794 deletions
diff --git a/collectors/COLLECTORS.md b/collectors/COLLECTORS.md
index aa56ac702..ea0776abc 100644
--- a/collectors/COLLECTORS.md
+++ b/collectors/COLLECTORS.md
@@ -41,641 +41,1156 @@ If you don't see the app/service you'd like to monitor in this list:
 in [Go](https://github.com/netdata/go.d.plugin/blob/master/README.md#how-to-develop-a-collector) or [Python](https://github.com/netdata/netdata/blob/master/docs/guides/python-collector.md)
-## Available Collectors
-
-- [Monitor anything with Netdata](#monitor-anything-with-netdata)
- - [Add your application to Netdata](#add-your-application-to-netdata)
- - [Available Collectors](#available-collectors)
- - [Service and application collectors](#service-and-application-collectors)
- - [Generic](#generic)
- - [APM (application performance monitoring)](#apm-application-performance-monitoring)
- - [Containers and VMs](#containers-and-vms)
- - [Data stores](#data-stores)
- - [Distributed computing](#distributed-computing)
- - [Email](#email)
- - [Kubernetes](#kubernetes)
- - [Logs](#logs)
- - [Messaging](#messaging)
- - [Network](#network)
- - [Provisioning](#provisioning)
- - [Remote devices](#remote-devices)
- - [Search](#search)
- - [Storage](#storage)
- - [Web](#web)
- - [System collectors](#system-collectors)
- - [Applications](#applications)
- - [Disks and filesystems](#disks-and-filesystems)
- - [eBPF](#ebpf)
- - [Hardware](#hardware)
- - [Memory](#memory)
- - [Networks](#networks)
- - [Operating systems](#operating-systems)
- - [Processes](#processes)
- - [Resources](#resources)
- - [Users](#users)
- - [Netdata collectors](#netdata-collectors)
- - [Orchestrators](#orchestrators)
- - [Third-party collectors](#third-party-collectors)
- - [Etc](#etc)
-
-## Service and application collectors
-
-The Netdata Agent auto-detects and collects metrics from all of the services and applications below. You can also
-configure any of these collectors according to your setup and infrastructure.
-
-### Generic
-
-- [Prometheus endpoints](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/README.md): Gathers
- metrics from any number of Prometheus endpoints, with support to autodetect more than 600 services and applications.
-- [Pandas](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/pandas/README.md): A Python
- collector that gathers
- metrics from a [pandas](https://pandas.pydata.org/) dataframe. Pandas is a high level data processing library in
- Python that can read various formats of data from local files or web endpoints. Custom processing and transformation
- logic can also be expressed as part of the collector configuration.
-
-### APM (application performance monitoring)
-
-- [Go applications](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/go_expvar/README.md):
- Monitor any Go application that exposes its
- metrics with the `expvar` package from the Go standard library.
-- [Java Spring Boot 2 applications](https://github.com/netdata/go.d.plugin/blob/master/modules/springboot2/README.md):
- Monitor running Java Spring Boot 2 applications that expose their metrics with the use of the Spring Boot Actuator.
-- [statsd](https://github.com/netdata/netdata/blob/master/collectors/statsd.plugin/README.md): Implement a high
- performance `statsd` server for Netdata.
-- [phpDaemon](https://github.com/netdata/go.d.plugin/blob/master/modules/phpdaemon/README.md): Collect worker
- statistics (total, active, idle), and uptime for web and network applications.
-- [uWSGI](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/uwsgi/README.md): Monitor
- performance metrics exposed by the uWSGI Stats
- Server.
+## Available Data Collection Integrations
+<!-- AUTOGENERATED PART BY integrations/gen_doc_collector_page.py SCRIPT, DO NOT EDIT MANUALLY -->
+### APM
+
+- [Alamos FE2 server](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/alamos_fe2_server.md)
+
+- [Apache Airflow](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/apache_airflow.md)
+
+- [Apache Flink](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/apache_flink.md)
+
+- [Audisto](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/audisto.md)
+
+- [Dependency-Track](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/dependency-track.md)
+
+- [Go applications (EXPVAR)](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/go_expvar/integrations/go_applications_expvar.md)
+
+- [Google Pagespeed](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/google_pagespeed.md)
+
+- [IBM AIX systems Njmon](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ibm_aix_systems_njmon.md)
+
+- [JMX](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/jmx.md)
+
+- [Java Spring-boot 2 applications](https://github.com/netdata/go.d.plugin/blob/master/modules/springboot2/integrations/java_spring-boot_2_applications.md)
+
+- [NRPE daemon](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/nrpe_daemon.md)
+
+- [Sentry](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sentry.md)
+
+- [Sysload](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sysload.md)
+
+- [VSCode](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/vscode.md)
+
+- [YOURLS URL Shortener](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/yourls_url_shortener.md)
+
+- [bpftrace variables](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/bpftrace_variables.md)
+
+- [gpsd](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/gpsd.md)
+
+- [jolokia](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/jolokia.md)
+
+- [phpDaemon](https://github.com/netdata/go.d.plugin/blob/master/modules/phpdaemon/integrations/phpdaemon.md)
+
+### Authentication and Authorization
+
+- [Fail2ban](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/fail2ban/integrations/fail2ban.md)
+
+- [FreeRADIUS](https://github.com/netdata/go.d.plugin/blob/master/modules/freeradius/integrations/freeradius.md)
+
+- [HashiCorp Vault secrets](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/hashicorp_vault_secrets.md)
+
+- [LDAP](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ldap.md)
+
+- [OpenLDAP (community)](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/openldap_community.md)
+
+- [OpenLDAP](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/openldap/integrations/openldap.md)
+
+- [RADIUS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/radius.md)
+
+- [SSH](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ssh.md)
+
+- [TACACS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/tacacs.md)
+
+### Blockchain Servers
+
+- [Chia](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/chia.md)
+
+- [Crypto exchanges](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/crypto_exchanges.md)
+
+- [Cryptowatch](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cryptowatch.md)
+
+- [Energi Core Wallet](https://github.com/netdata/go.d.plugin/blob/master/modules/energid/integrations/energi_core_wallet.md)
+
+- [Go-ethereum](https://github.com/netdata/go.d.plugin/blob/master/modules/geth/integrations/go-ethereum.md)
+
+- [Helium miner (validator)](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/helium_miner_validator.md)
+
+- [IOTA full node](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/iota_full_node.md)
+
+- [Sia](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sia.md)
+
+### CICD Platforms
+
+- [Concourse](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/concourse.md)
+
+- [GitLab Runner](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/gitlab_runner.md)
+
+- [Jenkins](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/jenkins.md)
+
+- [Puppet](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/puppet/integrations/puppet.md)
+
+### Cloud Provider Managed
+
+- [AWS EC2 Compute instances](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aws_ec2_compute_instances.md)
+
+- [AWS EC2 Spot Instance](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aws_ec2_spot_instance.md)
+
+- [AWS ECS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aws_ecs.md)
+
+- [AWS Health events](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aws_health_events.md)
+
+- [AWS Quota](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aws_quota.md)
+
+- [AWS S3 buckets](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aws_s3_buckets.md)
+
+- [AWS SQS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aws_sqs.md)
+
+- [AWS instance health](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aws_instance_health.md)
+
+- [Akamai Global Traffic Management](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/akamai_global_traffic_management.md)
+
+- [Akami Cloudmonitor](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/akami_cloudmonitor.md)
+
+- [Alibaba Cloud](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/alibaba_cloud.md)
+
+- [ArvanCloud CDN](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/arvancloud_cdn.md)
+
+- [Azure AD App passwords](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/azure_ad_app_passwords.md)
+
+- [Azure Elastic Pool SQL](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/azure_elastic_pool_sql.md)
+
+- [Azure Resources](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/azure_resources.md)
+
+- [Azure SQL](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/azure_sql.md)
+
+- [Azure Service Bus](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/azure_service_bus.md)
+
+- [Azure application](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/azure_application.md)
+
+- [BigQuery](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/bigquery.md)
+
+- [CloudWatch](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cloudwatch.md)
+
+- [Dell EMC ECS cluster](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/dell_emc_ecs_cluster.md)
+
+- [DigitalOcean](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/digitalocean.md)
+
+- [GCP GCE](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/gcp_gce.md)
+
+- [GCP Quota](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/gcp_quota.md)
+
+- [Google Cloud Platform](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/google_cloud_platform.md)
+
+- [Google Stackdriver](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/google_stackdriver.md)
+
+- [Linode](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/linode.md)
+
+- [Lustre metadata](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/lustre_metadata.md)
+
+- [Nextcloud servers](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/nextcloud_servers.md)
+
+- [OpenStack](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/openstack.md)
+
+- [Zerto](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/zerto.md)
 ### Containers and VMs
-- [Docker containers](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/README.md): Monitor the
- health and performance of individual Docker containers using the cgroups collector plugin.
-- [DockerD](https://github.com/netdata/go.d.plugin/blob/master/modules/docker/README.md): Collect container health
- statistics.
-- [Docker Engine](https://github.com/netdata/go.d.plugin/blob/master/modules/docker_engine/README.md): Collect
- runtime statistics from the `docker` daemon using the `metrics-address` feature.
-- [Docker Hub](https://github.com/netdata/go.d.plugin/blob/master/modules/dockerhub/README.md): Collect statistics
- about Docker repositories, such as pulls, starts, status, time since last update, and more.
-- [Libvirt](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/README.md): Monitor the health and
- performance of individual Libvirt containers
- using the cgroups collector plugin.
-- [LXC](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/README.md): Monitor the health and
- performance of individual LXC containers using
- the cgroups collector plugin.
-- [LXD](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/README.md): Monitor the health and
- performance of individual LXD containers using
- the cgroups collector plugin.
-- [systemd-nspawn](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/README.md): Monitor the
- health and performance of individual
- systemd-nspawn containers using the cgroups collector plugin.
-- [vCenter Server Appliance](https://github.com/netdata/go.d.plugin/blob/master/modules/vcsa/README.md): Monitor
- appliance system, components, and software update health statuses via the Health API.
-- [vSphere](https://github.com/netdata/go.d.plugin/blob/master/modules/vsphere/README.md): Collect host and virtual
- machine performance metrics.
-- [Xen/XCP-ng](https://github.com/netdata/netdata/blob/master/collectors/xenstat.plugin/README.md): Collect XenServer
- and XCP-ng metrics using `libxenstat`.
-
-### Data stores
-
-- [CockroachDB](https://github.com/netdata/go.d.plugin/blob/master/modules/cockroachdb/README.md): Monitor various
- database components using `_status/vars` endpoint.
-- [Consul](https://github.com/netdata/go.d.plugin/blob/master/modules/consul/README.md): Capture service and unbound
- checks status (passing, warning, critical, maintenance).
-- [Couchbase](https://github.com/netdata/go.d.plugin/blob/master/modules/couchbase/README.md): Gather per-bucket
- metrics from any number of instances of the distributed JSON document database.
-- [CouchDB](https://github.com/netdata/go.d.plugin/blob/master/modules/couchdb/README.md): Monitor database health and
- performance metrics
- (reads/writes, HTTP traffic, replication status, etc).
-- [MongoDB](https://github.com/netdata/go.d.plugin/blob/master/modules/mongodb/README.md): Collect server, database,
- replication and sharding performance and health metrics.
-- [MySQL](https://github.com/netdata/go.d.plugin/blob/master/modules/mysql/README.md): Collect database global,
- replication and per user statistics.
-- [OracleDB](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/oracledb/README.md): Monitor
- database performance and health metrics.
-- [Pika](https://github.com/netdata/go.d.plugin/blob/master/modules/pika/README.md): Gather metric, such as clients,
- memory usage, queries, and more from the Redis interface-compatible database.
-- [Postgres](https://github.com/netdata/go.d.plugin/blob/master/modules/postgres/README.md): Collect database health
- and performance metrics.
-- [ProxySQL](https://github.com/netdata/go.d.plugin/blob/master/modules/proxysql/README.md): Monitor database backend
- and frontend performance metrics.
-- [Redis](https://github.com/netdata/go.d.plugin/blob/master/modules/redis/README.md): Monitor status from any
- number of database instances by reading the server's response to the `INFO ALL` command.
-- [RethinkDB](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/rethinkdbs/README.md): Collect
- database server and cluster statistics.
-- [Riak KV](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/riakkv/README.md): Collect
- database stats from the `/stats` endpoint.
-- [Zookeeper](https://github.com/netdata/go.d.plugin/blob/master/modules/zookeeper/README.md): Monitor application
- health metrics reading the server's response to the `mntr` command.
-- [Memcached](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/memcached/README.md): Collect
- memory-caching system performance metrics.
-
-### Distributed computing
-
-- [BOINC](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/boinc/README.md): Monitor the total
- number of tasks, open tasks, and task
- states for the distributed computing client.
-- [Gearman](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/gearman/README.md): Collect
- application summary (queued, running) and per-job
- worker statistics (queued, idle, running).
-
-### Email
-
-- [Dovecot](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/dovecot/README.md): Collect email
- server performance metrics by reading the
- server's response to the `EXPORT global` command.
-- [EXIM](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/exim/README.md): Uses the `exim` tool
- to monitor the queue length of a
- mail/message transfer agent (MTA).
-- [Postfix](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/postfix/README.md): Uses
- the `postqueue` tool to monitor the queue length of a
- mail/message transfer agent (MTA).
+- [Containers](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/integrations/containers.md)
-### Kubernetes
+- [Docker Engine](https://github.com/netdata/go.d.plugin/blob/master/modules/docker_engine/integrations/docker_engine.md)
+
+- [Docker Hub repository](https://github.com/netdata/go.d.plugin/blob/master/modules/dockerhub/integrations/docker_hub_repository.md)
+
+- [Docker](https://github.com/netdata/go.d.plugin/blob/master/modules/docker/integrations/docker.md)
+
+- [LXC Containers](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/integrations/lxc_containers.md)
+
+- [Libvirt Containers](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/integrations/libvirt_containers.md)
+
+- [NSX-T](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/nsx-t.md)
+
+- [Podman](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/podman.md)
+
+- [Proxmox Containers](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/integrations/proxmox_containers.md)
+
+- [Proxmox VE](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/proxmox_ve.md)
+
+- [VMware vCenter Server](https://github.com/netdata/go.d.plugin/blob/master/modules/vsphere/integrations/vmware_vcenter_server.md)
+
+- [Virtual Machines](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/integrations/virtual_machines.md)
+
+- [Xen/XCP-ng](https://github.com/netdata/netdata/blob/master/collectors/xenstat.plugin/integrations/xen-xcp-ng.md)
+
+- [cAdvisor](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cadvisor.md)
+
+- [oVirt Containers](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/integrations/ovirt_containers.md)
+
+- [vCenter Server Appliance](https://github.com/netdata/go.d.plugin/blob/master/modules/vcsa/integrations/vcenter_server_appliance.md)
+
+### Databases
+
+- [4D Server](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/4d_server.md)
+
+- [AWS RDS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aws_rds.md)
+
+- [Cassandra](https://github.com/netdata/go.d.plugin/blob/master/modules/cassandra/integrations/cassandra.md)
+
+- [ClickHouse](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/clickhouse.md)
+
+- [ClusterControl CMON](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/clustercontrol_cmon.md)
+
+- [CockroachDB](https://github.com/netdata/go.d.plugin/blob/master/modules/cockroachdb/integrations/cockroachdb.md)
+
+- [CouchDB](https://github.com/netdata/go.d.plugin/blob/master/modules/couchdb/integrations/couchdb.md)
+
+- [Couchbase](https://github.com/netdata/go.d.plugin/blob/master/modules/couchbase/integrations/couchbase.md)
+
+- [HANA](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/hana.md)
+
+- [Hasura GraphQL Server](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/hasura_graphql_server.md)
+
+- [InfluxDB](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/influxdb.md)
+
+- [Machbase](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/machbase.md)
+
+- [MariaDB](https://github.com/netdata/go.d.plugin/blob/master/modules/mysql/integrations/mariadb.md)
+
+- [Memcached (community)](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/memcached_community.md)
+
+- [Memcached](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/memcached/integrations/memcached.md)
+
+- [MongoDB](https://github.com/netdata/go.d.plugin/blob/master/modules/mongodb/integrations/mongodb.md)
+
+- [MySQL](https://github.com/netdata/go.d.plugin/blob/master/modules/mysql/integrations/mysql.md)
+
+- [ODBC](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/odbc.md)
+
+- [Oracle DB (community)](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/oracle_db_community.md)
+
+- [Oracle DB](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/oracledb/integrations/oracle_db.md)
+
+- [Patroni](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/patroni.md)
+
+- [Percona MySQL](https://github.com/netdata/go.d.plugin/blob/master/modules/mysql/integrations/percona_mysql.md)
+
+- [PgBouncer](https://github.com/netdata/go.d.plugin/blob/master/modules/pgbouncer/integrations/pgbouncer.md)
+
+- [Pgpool-II](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/pgpool-ii.md)
+
+- [Pika](https://github.com/netdata/go.d.plugin/blob/master/modules/pika/integrations/pika.md)
+
+- [PostgreSQL](https://github.com/netdata/go.d.plugin/blob/master/modules/postgres/integrations/postgresql.md)
+
+- [ProxySQL](https://github.com/netdata/go.d.plugin/blob/master/modules/proxysql/integrations/proxysql.md)
+
+- [Redis](https://github.com/netdata/go.d.plugin/blob/master/modules/redis/integrations/redis.md)
+
+- [RethinkDB](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/rethinkdbs/integrations/rethinkdb.md)
+
+- [RiakKV](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/riakkv/integrations/riakkv.md)
+
+- [SQL Database agnostic](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sql_database_agnostic.md)
+
+- [Vertica](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/vertica.md)
+
+- [Warp10](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/warp10.md)
+
+- [pgBackRest](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/pgbackrest.md)
+
+### Distributed Computing Systems
+
+- [BOINC](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/boinc/integrations/boinc.md)
+
+- [Gearman](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/gearman/integrations/gearman.md)
+
+### DNS and DHCP Servers
+
+- [Akamai Edge DNS Traffic](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/akamai_edge_dns_traffic.md)
+
+- [CoreDNS](https://github.com/netdata/go.d.plugin/blob/master/modules/coredns/integrations/coredns.md)
+
+- [DNS query](https://github.com/netdata/go.d.plugin/blob/master/modules/dnsquery/integrations/dns_query.md)
+
+- [DNSBL](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/dnsbl.md)
+
+- [DNSdist](https://github.com/netdata/go.d.plugin/blob/master/modules/dnsdist/integrations/dnsdist.md)
+
+- [Dnsmasq DHCP](https://github.com/netdata/go.d.plugin/blob/master/modules/dnsmasq_dhcp/integrations/dnsmasq_dhcp.md)
+
+- [Dnsmasq](https://github.com/netdata/go.d.plugin/blob/master/modules/dnsmasq/integrations/dnsmasq.md)
+
+- [ISC Bind (RNDC)](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/bind_rndc/integrations/isc_bind_rndc.md)
+
+- [ISC DHCP](https://github.com/netdata/go.d.plugin/blob/master/modules/isc_dhcpd/integrations/isc_dhcp.md)
+
+- [Name Server Daemon](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/nsd/integrations/name_server_daemon.md)
+
+- [NextDNS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/nextdns.md)
+
+- [Pi-hole](https://github.com/netdata/go.d.plugin/blob/master/modules/pihole/integrations/pi-hole.md)
+
+- [PowerDNS Authoritative Server](https://github.com/netdata/go.d.plugin/blob/master/modules/powerdns/integrations/powerdns_authoritative_server.md)
+
+- [PowerDNS Recursor](https://github.com/netdata/go.d.plugin/blob/master/modules/powerdns_recursor/integrations/powerdns_recursor.md)
-- [Kubelet](https://github.com/netdata/go.d.plugin/blob/master/modules/k8s_kubelet/README.md): Monitor one or more
- instances of the Kubelet agent and collects metrics on number of pods/containers running, volume of Docker
- operations, and more.
-- [kube-proxy](https://github.com/netdata/go.d.plugin/blob/master/modules/k8s_kubeproxy/README.md): Collect
- metrics, such as syncing proxy rules and REST client requests, from one or more instances of `kube-proxy`.
-- [Service discovery](https://github.com/netdata/agent-service-discovery/blob/master/README.md): Find what services are running on a
- cluster's pods, converts that into configuration files, and exports them so they can be monitored by Netdata.
-
-### Logs
-
-- [Fluentd](https://github.com/netdata/go.d.plugin/blob/master/modules/fluentd/README.md): Gather application
- plugins metrics from an endpoint provided by `in_monitor plugin`.
-- [Logstash](https://github.com/netdata/go.d.plugin/blob/master/modules/logstash/README.md): Monitor JVM threads,
- memory usage, garbage collection statistics, and more.
-- [OpenVPN status logs](https://github.com/netdata/go.d.plugin/blob/master/modules/openvpn_status_log/README.md): Parse
- server log files and provide summary (client, traffic) metrics.
-- [Squid web server logs](https://github.com/netdata/go.d.plugin/blob/master/modules/squidlog/README.md): Tail Squid
- access logs to return the volume of requests, types of requests, bandwidth, and much more.
-- [Web server logs (Go version for Apache, NGINX)](https://github.com/netdata/go.d.plugin/blob/master/modules/weblog/README.md): Tail access logs and provide
- very detailed web server performance statistics. This module is able to parse 200k+ rows in less than half a second.
-- [Web server logs (Apache, NGINX)](https://github.com/netdata/go.d.plugin/blob/master/modules/weblog/README.md): Tail
- access log
- file and collect web server/caching proxy metrics.
-
-### Messaging
-
-- [ActiveMQ](https://github.com/netdata/go.d.plugin/blob/master/modules/activemq/README.md): Collect message broker
- queues and topics statistics using the ActiveMQ Console API.
-- [Beanstalk](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/beanstalk/README.md): Collect
- server and tube-level statistics, such as CPU
- usage, jobs rates, commands, and more.
-- [Pulsar](https://github.com/netdata/go.d.plugin/blob/master/modules/pulsar/README.md): Collect summary,
- namespaces, and topics performance statistics.
-- [RabbitMQ](https://github.com/netdata/go.d.plugin/blob/master/modules/rabbitmq/README.md): Collect message
- broker overview, system and per virtual host metrics.
-- [VerneMQ](https://github.com/netdata/go.d.plugin/blob/master/modules/vernemq/README.md): Monitor MQTT broker
- health and performance metrics. It collects all available info for both MQTTv3 and v5 communication
-
-### Network
-
-- [Bind 9](https://github.com/netdata/go.d.plugin/blob/master/modules/bind/README.md): Collect nameserver summary
- performance statistics via a web interface (`statistics-channels` feature).
-- [Chrony](https://github.com/netdata/go.d.plugin/blob/master/modules/chrony/README.md): Monitor the precision and
- statistics of a local `chronyd` server.
-- [CoreDNS](https://github.com/netdata/go.d.plugin/blob/master/modules/coredns/README.md): Measure DNS query round
- trip time.
-- [Dnsmasq](https://github.com/netdata/go.d.plugin/blob/master/modules/dnsmasq_dhcp/README.md): Automatically
- detects all configured `Dnsmasq` DHCP ranges and Monitor their utilization.
-- [DNSdist](https://github.com/netdata/go.d.plugin/blob/master/modules/dnsdist/README.md): Collect
- load-balancer performance and health metrics.
-- [Dnsmasq DNS Forwarder](https://github.com/netdata/go.d.plugin/blob/master/modules/dnsmasq/README.md): Gather
- queries, entries, operations, and events for the lightweight DNS forwarder.
-- [DNS Query Time](https://github.com/netdata/go.d.plugin/blob/master/modules/dnsquery/README.md): Monitor the round
- trip time for DNS queries in milliseconds.
-- [Freeradius](https://github.com/netdata/go.d.plugin/blob/master/modules/freeradius/README.md): Collect
- server authentication and accounting statistics from the `status server`.
-- [Libreswan](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/libreswan/README.md): Collect
- bytes-in, bytes-out, and uptime metrics.
-- [Icecast](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/icecast/README.md): Monitor the
- number of listeners for active sources.
-- [ISC Bind (RDNC)](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/bind_rndc/README.md):
- Collect nameserver summary performance
- statistics using the `rndc` tool.
-- [ISC DHCP](https://github.com/netdata/go.d.plugin/blob/master/modules/isc_dhcpd/README.md): Reads a
- `dhcpd.leases` file and collects metrics on total active leases, pool active leases, and pool utilization.
-- [OpenLDAP](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/openldap/README.md): Provides
- statistics information from the OpenLDAP
- (`slapd`) server.
-- [NSD](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/nsd/README.md): Monitor nameserver
- performance metrics using the `nsd-control`
- tool.
-- [NTP daemon](https://github.com/netdata/go.d.plugin/blob/master/modules/ntpd/README.md): Monitor the system variables
- of the local `ntpd` daemon (optionally including variables of the polled peers) using the NTP Control Message Protocol
- via a UDP socket.
-- [OpenSIPS](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/opensips/README.md): Collect
- server health and performance metrics using the
- `opensipsctl` tool.
-- [OpenVPN](https://github.com/netdata/go.d.plugin/blob/master/modules/openvpn/README.md): Gather server summary - (client, traffic) and per user metrics (traffic, connection time) stats using `management-interface`. -- [Pi-hole](https://github.com/netdata/go.d.plugin/blob/master/modules/pihole/README.md): Monitor basic (DNS - queries, clients, blocklist) and extended (top clients, top permitted, and blocked domains) statistics using the PHP - API. -- [PowerDNS Authoritative Server](https://github.com/netdata/go.d.plugin/blob/master/modules/powerdns/README.md): - Monitor one or more instances of the nameserver software to collect questions, events, and latency metrics. -- [PowerDNS Recursor](https://github.com/netdata/go.d.plugin/blob/master/modules/powerdns/README.md#recursor): - Gather incoming/outgoing questions, drops, timeouts, and cache usage from any number of DNS recursor instances. -- [RetroShare](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/retroshare/README.md): Monitor - application bandwidth, peers, and DHT - metrics. -- [Tor](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/tor/README.md): Capture traffic usage - statistics using the Tor control port. -- [Unbound](https://github.com/netdata/go.d.plugin/blob/master/modules/unbound/README.md): Collect DNS resolver - summary and extended system and per thread metrics via the `remote-control` interface. - -### Provisioning - -- [Puppet](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/puppet/README.md): Monitor the - status of Puppet Server and Puppet DB. - -### Remote devices - -- [AM2320](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/am2320/README.md): Monitor sensor - temperature and humidity. -- [Access point](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/ap/README.md): Monitor - client, traffic and signal metrics using the `aw` - tool. 
-- [APC UPS](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/apcupsd/README.md): Capture status - information using the `apcaccess` tool. -- [Energi Core](https://github.com/netdata/go.d.plugin/blob/master/modules/energid/README.md): Monitor - blockchain indexes, memory usage, network usage, and transactions of wallet instances. -- [UPS/PDU](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/nut/README.md): Read the status of - UPS/PDU devices using the `upsc` tool. -- [SNMP devices](https://github.com/netdata/go.d.plugin/blob/master/modules/snmp/README.md): Gather data using the SNMP - protocol. -- [1-Wire sensors](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/w1sensor/README.md): - Monitor sensor temperature. - -### Search - -- [Elasticsearch](https://github.com/netdata/go.d.plugin/blob/master/modules/elasticsearch/README.md): Collect - dozens of metrics on search engine performance from local nodes and local indices. Includes cluster health and - statistics. -- [Solr](https://github.com/netdata/go.d.plugin/blob/master/modules/solr/README.md): Collect application search - requests, search errors, update requests, and update errors statistics. - -### Storage - -- [Ceph](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/ceph/README.md): Monitor the Ceph - cluster usage and server data consumption. -- [HDFS](https://github.com/netdata/go.d.plugin/blob/master/modules/hdfs/README.md): Monitor health and performance - metrics for filesystem datanodes and namenodes. -- [IPFS](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/ipfs/README.md): Collect file system - bandwidth, peers, and repo metrics. -- [Scaleio](https://github.com/netdata/go.d.plugin/blob/master/modules/scaleio/README.md): Monitor storage system, - storage pools, and SDCS health and performance metrics via VxFlex OS Gateway API. 
-- [Samba](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/samba/README.md): Collect file - sharing metrics using the `smbstatus` tool. - -### Web - -- [Apache](https://github.com/netdata/go.d.plugin/blob/master/modules/apache/README.md): Collect Apache web - server performance metrics via the `server-status?auto` endpoint. -- [HAProxy](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/haproxy/README.md): Collect - frontend, backend, and health metrics. -- [HTTP endpoints](https://github.com/netdata/go.d.plugin/blob/master/modules/httpcheck/README.md): Monitor - any HTTP endpoint's availability and response time. -- [Lighttpd](https://github.com/netdata/go.d.plugin/blob/master/modules/lighttpd/README.md): Collect web server - performance metrics using the `server-status?auto` endpoint. -- [Litespeed](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/litespeed/README.md): Collect - web server data (network, connection, - requests, cache) by reading `.rtreport*` files. -- [Nginx](https://github.com/netdata/go.d.plugin/blob/master/modules/nginx/README.md): Monitor web server - status information by gathering metrics via `ngx_http_stub_status_module`. -- [Nginx VTS](https://github.com/netdata/go.d.plugin/blob/master/modules/nginxvts/README.md): Gathers metrics from - any Nginx deployment with the _virtual host traffic status module_ enabled, including metrics on uptime, memory - usage, and cache, and more. -- [PHP-FPM](https://github.com/netdata/go.d.plugin/blob/master/modules/phpfpm/README.md): Collect application - summary and processes health metrics by scraping the status page (`/status?full`). -- [TCP endpoints](https://github.com/netdata/go.d.plugin/blob/master/modules/portcheck/README.md): Monitor any - TCP endpoint's availability and response time. 
-- [Spigot Minecraft servers](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/spigotmc/README.md): - Monitor average ticket rate and number - of users. -- [Squid](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/squid/README.md): Monitor client and - server bandwidth/requests by gathering - data from the Cache Manager component. -- [Tengine](https://github.com/netdata/go.d.plugin/blob/master/modules/tengine/README.md): Monitor web server - statistics using information provided by `ngx_http_reqstat_module`. -- [Tomcat](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/tomcat/README.md): Collect web - server performance metrics from the Manager App - (`/manager/status?XML=true`). -- [Traefik](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/traefik/README.md): Uses Traefik's - Health API to provide statistics. -- [Varnish](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/varnish/README.md): Provides HTTP - accelerator global, backends (VBE), and - disks (SMF) statistics using the `varnishstat` tool. -- [x509 check](https://github.com/netdata/go.d.plugin/blob/master/modules/x509check/README.md): Monitor certificate - expiration time. -- [Whois domain expiry](https://github.com/netdata/go.d.plugin/blob/master/modules/whoisquery/README.md): Checks the - remaining time until a given domain is expired. - -## System collectors - -The Netdata Agent can collect these system- and hardware-level metrics using a variety of collectors, some of which -(such as `proc.plugin`) collect multiple types of metrics simultaneously. - -### Applications - -- [Fail2ban](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/fail2ban/README.md): Parses - configuration files to detect all jails, then - uses log files to report ban rates and volume of banned IPs. 
-- [Monit](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/monit/README.md): Monitor statuses - of targets (service-checks) using the XML - stats interface. -- [Windows](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/README.md): Collect CPU, memory, - network, disk, OS, system, and log-in metrics scraping [windows_exporter](https://github.com/prometheus-community/windows_exporter). - -### Disks and filesystems - -- [BCACHE](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor BCACHE statistics - with the `proc.plugin` collector. -- [Block devices](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Gather metrics about - the health and performance of block - devices using the `proc.plugin` collector. -- [Btrfs](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitors Btrfs filesystems - with the `proc.plugin` collector. -- [Device mapper](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Gather metrics about - the Linux device mapper with the proc - collector. -- [Disk space](https://github.com/netdata/netdata/blob/master/collectors/diskspace.plugin/README.md): Collect disk space - usage metrics on Linux mount points. -- [Clock synchronization](https://github.com/netdata/netdata/blob/master/collectors/timex.plugin/README.md): Collect the - system clock synchronization status on Linux. -- [Files and directories](https://github.com/netdata/go.d.plugin/blob/master/modules/filecheck/README.md): Gather - metrics about the existence, modification time, and size of files or directories. -- [ioping.plugin](https://github.com/netdata/netdata/blob/master/collectors/ioping.plugin/README.md): Measure disk - read/write latency. 
-- [NFS file servers and clients](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): - Gather operations, utilization, and space usage - using the `proc.plugin` collector. -- [RAID arrays](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Collect health, disk - status, operation status, and more with the `proc.plugin` collector. -- [Veritas Volume Manager](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Gather - metrics about the Veritas Volume Manager (VVM). -- [ZFS](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor bandwidth and - utilization of ZFS disks/partitions using the proc - collector. +- [Unbound](https://github.com/netdata/go.d.plugin/blob/master/modules/unbound/integrations/unbound.md) ### eBPF -- [Files](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md): Provides information about - how often a system calls kernel - functions related to file descriptors using the eBPF collector. -- [Virtual file system (VFS)](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md): Monitor - IO, errors, deleted objects, and - more for kernel virtual file systems (VFS) using the eBPF collector. -- [Processes](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md): Monitor threads, task - exits, and errors using the eBPF collector. - -### Hardware - -- [Adaptec RAID](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/adaptec_raid/README.md): - Monitor logical and physical devices health - metrics using the `arcconf` tool. -- [CUPS](https://github.com/netdata/netdata/blob/master/collectors/cups.plugin/README.md): Monitor CUPS. 
-- [FreeIPMI](https://github.com/netdata/netdata/blob/master/collectors/freeipmi.plugin/README.md): - Uses `libipmimonitoring-dev` or `libipmimonitoring-devel` to - monitor the number of sensors, temperatures, voltages, currents, and more. -- [Hard drive temperature](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/hddtemp/README.md): - Monitor the temperature of storage - devices. -- [HP Smart Storage Arrays](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/hpssa/README.md): - Monitor controller, cache module, logical - and physical drive state, and temperature using the `ssacli` tool. -- [MegaRAID controllers](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/megacli/README.md): - Collect adapter, physical drives, and - battery stats using the `megacli` tool. -- [NVIDIA GPU](https://github.com/netdata/go.d.plugin/blob/master/modules/nvidia_smi/README.md): Monitor - performance metrics (memory usage, fan - speed, pcie bandwidth utilization, temperature, and more) using the `nvidia-smi` tool. -- [Sensors](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/sensors/README.md): Reads system - sensors information (temperature, voltage, - electric current, power, and more) from `/sys/devices/`. -- [S.M.A.R.T](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/smartd_log/README.md): Reads - SMART Disk Monitoring daemon logs. - -### Memory - -- [Available memory](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Tracks changes in - available RAM using the `proc.plugin` collector. -- [Committed memory](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor committed - memory using the `proc.plugin` collector. 
-- [Huge pages](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Gather metrics about - huge pages in Linux and FreeBSD with the - `proc.plugin` collector. -- [KSM](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Measure the amount of merging, - savings, and effectiveness using the - `proc.plugin` collector. -- [Numa](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Gather metrics on the number - of non-uniform memory access (NUMA) events - every second using the `proc.plugin` collector. -- [Page faults](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Collect the number of - memory page faults per second using the - `proc.plugin` collector. -- [RAM](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Collect metrics on system RAM, - available RAM, and more using the - `proc.plugin` collector. -- [SLAB](https://github.com/netdata/netdata/blob/master/collectors/slabinfo.plugin/README.md): Collect kernel SLAB - details on Linux systems. -- [swap](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor the amount of free - and used swap at every second using the - `proc.plugin` collector. -- [Writeback memory](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Collect how much - memory is actively being written to disk at - every second using the `proc.plugin` collector. - -### Networks - -- [Access points](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/ap/README.md): Visualizes - data related to access points. -- [Ping](https://github.com/netdata/go.d.plugin/blob/master/modules/ping/README.md): Measure network latency, jitter and - packet loss between the monitored node - and any number of remote network end points. 
-- [Netfilter](https://github.com/netdata/netdata/blob/master/collectors/nfacct.plugin/README.md): Collect netfilter - firewall, connection tracker, and accounting - metrics using `libmnl` and `libnetfilter_acct`. -- [Network stack](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor the - networking stack for errors, TCP connection aborts, - bandwidth, and more. -- [Network QoS](https://github.com/netdata/netdata/blob/master/collectors/tc.plugin/README.md): Collect traffic QoS - metrics (`tc`) of Linux network interfaces. -- [SYNPROXY](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor entries uses, SYN - packets received, TCP cookies, and more. - -### Operating systems - -- [freebsd.plugin](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/README.md): Collect resource - usage and performance data on FreeBSD systems. -- [macOS](https://github.com/netdata/netdata/blob/master/collectors/macos.plugin/README.md): Collect resource usage and - performance data on macOS systems. - -### Processes - -- [Applications](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/README.md): Gather CPU, disk, - memory, network, eBPF, and other metrics per - application using the `apps.plugin` collector. -- [systemd](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/README.md): Monitor the CPU and - memory usage of systemd services using the - `cgroups.plugin` collector. -- [systemd unit states](https://github.com/netdata/go.d.plugin/blob/master/modules/systemdunits/README.md): See the - state (active, inactive, activating, deactivating, failed) of various systemd unit types. -- [System processes](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Collect metrics - on system load and total processes running - using `/proc/loadavg` and the `proc.plugin` collector. 
-- [Uptime](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor the uptime of a - system using the `proc.plugin` collector. - -### Resources - -- [CPU frequency](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor CPU - frequency, as set by the `cpufreq` kernel module, - using the `proc.plugin` collector. -- [CPU idle](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Measure CPU idle every - second using the `proc.plugin` collector. -- [CPU performance](https://github.com/netdata/netdata/blob/master/collectors/perf.plugin/README.md): Collect CPU - performance metrics using performance monitoring - units (PMU). -- [CPU throttling](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Gather metrics - about thermal throttling using the `/proc/stat` - module and the `proc.plugin` collector. -- [CPU utilization](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Capture CPU - utilization, both system-wide and per-core, using - the `/proc/stat` module and the `proc.plugin` collector. -- [Entropy](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor the available - entropy on a system using the `proc.plugin` - collector. -- [Interprocess Communication (IPC)](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): - Monitor IPC semaphores and shared memory - using the `proc.plugin` collector. -- [Interrupts](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Monitor interrupts per - second using the `proc.plugin` collector. -- [IdleJitter](https://github.com/netdata/netdata/blob/master/collectors/idlejitter.plugin/README.md): Measure CPU - latency and jitter on all operating systems. 
-- [SoftIRQs](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Collect metrics on - SoftIRQs, both system-wide and per-core, using the - `proc.plugin` collector. -- [SoftNet](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md): Capture SoftNet events per - second, both system-wide and per-core, - using the `proc.plugin` collector. - -### Users - -- [systemd-logind](https://github.com/netdata/go.d.plugin/blob/master/modules/logind/README.md): Monitor active - sessions, users, and seats tracked - by `systemd-logind` or `elogind`. -- [User/group usage](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/README.md): Gather CPU, disk, - memory, network, and other metrics per user - and user group using the `apps.plugin` collector. - -## Netdata collectors - -These collectors are recursive in nature, in that they monitor some function of the Netdata Agent itself. Some -collectors are described only in code and associated charts in Netdata dashboards. - -- [ACLK (code only)](https://github.com/netdata/netdata/blob/master/aclk/legacy/aclk_stats.c): View whether a Netdata - Agent is connected to Netdata Cloud via the [ACLK](https://github.com/netdata/netdata/blob/master/aclk/README.md), the - volume of queries, process times, and more. -- [Alarms](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/alarms/README.md): This collector - creates an - **Alarms** menu with one line plot showing the alarm states of a Netdata Agent over time. -- [Anomalies](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/anomalies/README.md): This - collector uses the - Python PyOD library to perform unsupervised anomaly detection on your Netdata charts and/or dimensions. 
-- [Exporting (code only)](https://github.com/netdata/netdata/blob/master/exporting/send_internal_metrics.c): Gather - metrics on CPU utilization for - the [exporting engine](https://github.com/netdata/netdata/blob/master/exporting/README.md), and specific metrics for - each enabled - exporting connector. -- [Global statistics (code only)](https://github.com/netdata/netdata/blob/master/daemon/global_statistics.c): See - metrics on the CPU utilization, network traffic, volume of web clients, API responses, database engine usage, and - more. - -## Orchestrators - -Plugin orchestrators organize and run many of the above collectors. - -If you're interested in developing a new collector that you'd like to contribute to Netdata, we highly recommend using -the `go.d.plugin`. - -- [go.d.plugin](https://github.com/netdata/go.d.plugin): An orchestrator for data collection modules written in `go`. -- [python.d.plugin](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md): An - orchestrator for data collection modules written in `python` v2/v3. -- [charts.d.plugin](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/README.md): An - orchestrator for data collection modules written in `bash` v4+. - -## Third-party collectors - -These collectors are developed and maintained by third parties and, unlike the other collectors, are not installed by -default. To use a third-party collector, visit their GitHub/documentation page and follow their installation procedures. - -<details> -<summary>Typical third party Python collector installation instructions</summary> - -In general the below steps should be sufficient to use a third party collector. - -1. Download collector code file - into [folder expected by Netdata](https://github.com/netdata/netdata/blob/master/collectors/plugins.d/README.md#environment-variables). -2. 
Download default collector configuration file - into [folder expected by Netdata](https://github.com/netdata/netdata/blob/master/collectors/plugins.d/README.md#environment-variables). -3. [Edit configuration file](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure#configure-a-collector) - from step 2 if required. -4. [Enable collector](https://github.com/netdata/netdata/blob/master/docs/collect/enable-configure#enable-a-collector-or-its-orchestrator). -5. [Restart Netdata](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) - -For example below are the steps to enable -the [Python ClickHouse collector](https://github.com/netdata/community/tree/main/collectors/python.d.plugin/clickhouse). - -```bash -# download python collector script to /usr/libexec/netdata/python.d/ -$ sudo wget https://raw.githubusercontent.com/netdata/community/main/collectors/python.d.plugin/clickhouse/clickhouse.chart.py -O /usr/libexec/netdata/python.d/clickhouse.chart.py - -# (optional) download default .conf to /etc/netdata/python.d/ -$ sudo wget https://raw.githubusercontent.com/netdata/community/main/collectors/python.d.plugin/clickhouse/clickhouse.conf -O /etc/netdata/python.d/clickhouse.conf - -# enable collector by adding line a new line with "clickhouse: yes" to /etc/netdata/python.d.conf file -# this will append to the file if it already exists or create it if not -$ sudo echo "clickhouse: yes" >> /etc/netdata/python.d.conf - -# (optional) edit clickhouse.conf if needed -$ sudo vi /etc/netdata/python.d/clickhouse.conf - -# restart netdata -# see docs for more information: https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md -$ sudo systemctl restart netdata -``` - -</details> - -- [CyberPower UPS](https://github.com/HawtDogFlvrWtr/netdata_cyberpwrups_plugin): Polls CyberPower UPS data using - PowerPanel® Personal Linux. 
-- [Logged-in users](https://github.com/veksh/netdata-numsessions): Collect the number of currently logged-on users. -- [nextcloud](https://github.com/arnowelzel/netdata-nextcloud): Monitor Nextcloud servers. -- [nim-netdata-plugin](https://github.com/FedericoCeratto/nim-netdata-plugin): A helper to create native Netdata - plugins using Nim. -- [Nvidia GPUs](https://github.com/coraxx/netdata_nv_plugin): Monitor Nvidia GPUs. -- [Teamspeak 3](https://github.com/coraxx/netdata_ts3_plugin): Pulls active users and bandwidth from TeamSpeak 3 - servers. -- [SSH](https://github.com/Yaser-Amiri/netdata-ssh-module): Monitor failed authentication requests of an SSH server. -- [ClickHouse](https://github.com/netdata/community/tree/main/collectors/python.d.plugin/clickhouse): - Monitor [ClickHouse](https://clickhouse.com/) database. -- [Ethtool](https://github.com/ghanapunq/netdata_ethtool_plugin): Monitor network interfaces with ethtool. -- [netdata-needrestart](https://github.com/nodiscc/netdata-needrestart) - Check/graph the number of processes/services/kernels that should be restarted after upgrading packages. -- [netdata-debsecan](https://github.com/nodiscc/netdata-debsecan) - Check/graph the number of CVEs in currently installed packages. -- [netdata-logcount](https://github.com/nodiscc/netdata-logcount) - Check/graph the number of syslog messages, by level over time. -- [netdata-apt](https://github.com/nodiscc/netdata-apt) - Check/graph and alert on the number of upgradeable packages, and available distribution upgrades. -- [diskquota](https://github.com/netdata/community/tree/main/collectors/python.d.plugin/diskquota) - Monitors the defined quotas on one or more filesystems depending on configuration. - -## Etc - -- [charts.d example](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/example/README.md): An - example `charts.d` collector. 
-- [python.d example](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/example/README.md): An
- example `python.d` collector.
-- [go.d example](https://github.com/netdata/go.d.plugin/blob/master/modules/example/README.md): An
- example `go.d` collector.
+- [eBPF Cachestat](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_cachestat.md)
+
+- [eBPF DCstat](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_dcstat.md)
+
+- [eBPF Disk](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_disk.md)
+
+- [eBPF Filedescriptor](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_filedescriptor.md)
+
+- [eBPF Filesystem](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_filesystem.md)
+
+- [eBPF Hardirq](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_hardirq.md)
+
+- [eBPF MDflush](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_mdflush.md)
+
+- [eBPF Mount](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_mount.md)
+
+- [eBPF OOMkill](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_oomkill.md)
+
+- [eBPF Process](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_process.md)
+
+- [eBPF Processes](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_processes.md)
+
+- [eBPF SHM](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_shm.md)
+
+- [eBPF SWAP](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_swap.md)
+
+- [eBPF Socket](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_socket.md)
+
+- [eBPF
SoftIRQ](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_softirq.md)
+
+- [eBPF Sync](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_sync.md)
+
+- [eBPF VFS](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/integrations/ebpf_vfs.md)
+
+### FreeBSD
+
+- [FreeBSD NFS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/freebsd_nfs.md)
+
+- [FreeBSD RCTL/RACCT](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/freebsd_rctl-racct.md)
+
+- [dev.cpu.0.freq](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/dev.cpu.0.freq.md)
+
+- [dev.cpu.temperature](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/dev.cpu.temperature.md)
+
+- [devstat](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/devstat.md)
+
+- [getifaddrs](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/getifaddrs.md)
+
+- [getmntinfo](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/getmntinfo.md)
+
+- [hw.intrcnt](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/hw.intrcnt.md)
+
+- [ipfw](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/ipfw.md)
+
+- [kern.cp_time](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/kern.cp_time.md)
+
+- [kern.ipc.msq](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/kern.ipc.msq.md)
+
+- [kern.ipc.sem](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/kern.ipc.sem.md)
+
+- [kern.ipc.shm](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/kern.ipc.shm.md)
+
+-
[net.inet.icmp.stats](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/net.inet.icmp.stats.md)
+
+- [net.inet.ip.stats](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/net.inet.ip.stats.md)
+
+- [net.inet.tcp.states](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/net.inet.tcp.states.md)
+
+- [net.inet.tcp.stats](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/net.inet.tcp.stats.md)
+
+- [net.inet.udp.stats](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/net.inet.udp.stats.md)
+
+- [net.inet6.icmp6.stats](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/net.inet6.icmp6.stats.md)
+
+- [net.inet6.ip6.stats](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/net.inet6.ip6.stats.md)
+
+- [net.isr](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/net.isr.md)
+
+- [system.ram](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/system.ram.md)
+
+- [uptime](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/uptime.md)
+
+- [vm.loadavg](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/vm.loadavg.md)
+
+- [vm.stats.sys.v_intr](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/vm.stats.sys.v_intr.md)
+
+- [vm.stats.sys.v_soft](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/vm.stats.sys.v_soft.md)
+
+- [vm.stats.sys.v_swtch](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/vm.stats.sys.v_swtch.md)
+
+- [vm.stats.vm.v_pgfaults](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/vm.stats.vm.v_pgfaults.md)
+
+-
[vm.stats.vm.v_swappgs](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/vm.stats.vm.v_swappgs.md) + +- [vm.swap_info](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/vm.swap_info.md) + +- [vm.vmtotal](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/vm.vmtotal.md) + +- [zfs](https://github.com/netdata/netdata/blob/master/collectors/freebsd.plugin/integrations/zfs.md) + +### FTP Servers + +- [ProFTPD](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/proftpd.md) + +### Gaming + +- [BungeeCord](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/bungeecord.md) + +- [CS:GO](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cs:go.md) + +- [Minecraft](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/minecraft.md) + +- [OpenRCT2](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/openrct2.md) + +- [SpigotMC](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/spigotmc/integrations/spigotmc.md) + +- [Steam](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/steam.md) + +### Generic Data Collection + +- [Custom Exporter](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/custom_exporter.md) + +- [Excel spreadsheet](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/excel_spreadsheet.md) + +- [Generic Command Line Output](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/generic_command_line_output.md) + +- [JetBrains Floating License Server](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/jetbrains_floating_license_server.md) + +- 
[OpenWeatherMap](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/openweathermap.md) + +- [Pandas](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/pandas/integrations/pandas.md) + +- [Prometheus endpoint](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/prometheus_endpoint.md) + +- [SNMP devices](https://github.com/netdata/go.d.plugin/blob/master/modules/snmp/integrations/snmp_devices.md) + +- [Shell command](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/shell_command.md) + +- [Tankerkoenig API](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/tankerkoenig_api.md) + +- [TwinCAT ADS Web Service](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/twincat_ads_web_service.md) + +### Hardware Devices and Sensors + +- [1-Wire Sensors](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/w1sensor/integrations/1-wire_sensors.md) + +- [AM2320](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/am2320/integrations/am2320.md) + +- [AMD CPU & GPU](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/amd_cpu_&_gpu.md) + +- [AMD GPU](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/amd_gpu.md) + +- [ARM HWCPipe](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/arm_hwcpipe.md) + +- [CUPS](https://github.com/netdata/netdata/blob/master/collectors/cups.plugin/integrations/cups.md) + +- [HDD temperature](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/hddtemp/integrations/hdd_temperature.md) + +- [HP iLO](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/hp_ilo.md) + +- [IBM CryptoExpress (CEX) 
cards](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ibm_cryptoexpress_cex_cards.md) + +- [IBM Z Hardware Management Console](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ibm_z_hardware_management_console.md) + +- [IPMI (By SoundCloud)](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ipmi_by_soundcloud.md) + +- [Intelligent Platform Management Interface (IPMI)](https://github.com/netdata/netdata/blob/master/collectors/freeipmi.plugin/integrations/intelligent_platform_management_interface_ipmi.md) + +- [Linux Sensors (lm-sensors)](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/sensors/integrations/linux_sensors_lm-sensors.md) + +- [Linux Sensors (sysfs)](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/sensors/integrations/linux_sensors_sysfs.md) + +- [NVML](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/nvml.md) + +- [Nvidia GPU](https://github.com/netdata/go.d.plugin/blob/master/modules/nvidia_smi/integrations/nvidia_gpu.md) + +- [Raritan PDU](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/raritan_pdu.md) + +- [S.M.A.R.T.](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/smartd_log/integrations/s.m.a.r.t..md) + +- [ServerTech](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/servertech.md) + +- [Siemens S7 PLC](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/siemens_s7_plc.md) + +- [T-Rex NVIDIA GPU Miner](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/t-rex_nvidia_gpu_miner.md) + +### IoT Devices + +- [Airthings Waveplus air sensor](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/airthings_waveplus_air_sensor.md) + +- [Bobcat Miner 
300](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/bobcat_miner_300.md) + +- [Christ Elektronik CLM5IP power panel](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/christ_elektronik_clm5ip_power_panel.md) + +- [CraftBeerPi](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/craftbeerpi.md) + +- [Dutch Electricity Smart Meter](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/dutch_electricity_smart_meter.md) + +- [Elgato Key Light devices.](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/elgato_key_light_devices..md) + +- [Energomera smart power meters](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/energomera_smart_power_meters.md) + +- [Helium hotspot](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/helium_hotspot.md) + +- [Homebridge](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/homebridge.md) + +- [Homey](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/homey.md) + +- [Jarvis Standing Desk](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/jarvis_standing_desk.md) + +- [MP707 USB thermometer](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/mp707_usb_thermometer.md) + +- [Modbus protocol](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/modbus_protocol.md) + +- [Monnit Sensors MQTT](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/monnit_sensors_mqtt.md) + +- [Nature Remo E lite devices](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/nature_remo_e_lite_devices.md) + +- [Netatmo 
sensors](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/netatmo_sensors.md) + +- [OpenHAB](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/openhab.md) + +- [Personal Weather Station](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/personal_weather_station.md) + +- [Philips Hue](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/philips_hue.md) + +- [Pimoroni Enviro+](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/pimoroni_enviro+.md) + +- [Powerpal devices](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/powerpal_devices.md) + +- [Radio Thermostat](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/radio_thermostat.md) + +- [SMA Inverters](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sma_inverters.md) + +- [Salicru EQX inverter](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/salicru_eqx_inverter.md) + +- [Sense Energy](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sense_energy.md) + +- [Shelly humidity sensor](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/shelly_humidity_sensor.md) + +- [Smart meters SML](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/smart_meters_sml.md) + +- [Solar logging stick](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/solar_logging_stick.md) + +- [SolarEdge inverters](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/solaredge_inverters.md) + +- [Solis Ginlong 5G inverters](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/solis_ginlong_5g_inverters.md) + +- [Sunspec Solar 
Energy](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sunspec_solar_energy.md) + +- [TP-Link P110](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/tp-link_p110.md) + +- [Tado smart heating solution](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/tado_smart_heating_solution.md) + +- [Tesla Powerwall](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/tesla_powerwall.md) + +- [Tesla Wall Connector](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/tesla_wall_connector.md) + +- [Tesla vehicle](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/tesla_vehicle.md) + +- [Xiaomi Mi Flora](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/xiaomi_mi_flora.md) + +- [iqAir AirVisual air quality monitors](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/iqair_airvisual_air_quality_monitors.md) + +### Kubernetes + +- [Cilium Agent](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cilium_agent.md) + +- [Cilium Operator](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cilium_operator.md) + +- [Cilium Proxy](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cilium_proxy.md) + +- [Kubelet](https://github.com/netdata/go.d.plugin/blob/master/modules/k8s_kubelet/integrations/kubelet.md) + +- [Kubeproxy](https://github.com/netdata/go.d.plugin/blob/master/modules/k8s_kubeproxy/integrations/kubeproxy.md) + +- [Kubernetes Cluster Cloud Cost](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/kubernetes_cluster_cloud_cost.md) + +- [Kubernetes Cluster State](https://github.com/netdata/go.d.plugin/blob/master/modules/k8s_state/integrations/kubernetes_cluster_state.md) + +- [Kubernetes 
Containers](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/integrations/kubernetes_containers.md) + +- [Rancher](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/rancher.md) + +### Linux Systems + +- [CPU performance](https://github.com/netdata/netdata/blob/master/collectors/perf.plugin/integrations/cpu_performance.md) + +- [Disk space](https://github.com/netdata/netdata/blob/master/collectors/diskspace.plugin/integrations/disk_space.md) + +- [Files and directories](https://github.com/netdata/go.d.plugin/blob/master/modules/filecheck/integrations/files_and_directories.md) + +- [OpenRC](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/openrc.md) + +#### CPU + +- [Interrupts](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/interrupts.md) + +- [SoftIRQ statistics](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/softirq_statistics.md) + +#### Disk + +- [Disk Statistics](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/disk_statistics.md) + +- [MD RAID](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/md_raid.md) + +##### BTRFS + +- [BTRFS](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/btrfs.md) + +##### NFS + +- [NFS Client](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/nfs_client.md) + +- [NFS Server](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/nfs_server.md) + +##### ZFS + +- [ZFS Adaptive Replacement Cache](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/zfs_adaptive_replacement_cache.md) + +- [ZFS Pools](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/zfs_pools.md) + +#### Firewall + +- 
[Conntrack](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/conntrack.md) + +- [Netfilter](https://github.com/netdata/netdata/blob/master/collectors/nfacct.plugin/integrations/netfilter.md) + +- [Synproxy](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/synproxy.md) + +- [nftables](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/nftables.md) + +#### IPC + +- [Inter Process Communication](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/inter_process_communication.md) + +#### Kernel + +- [Linux kernel SLAB allocator statistics](https://github.com/netdata/netdata/blob/master/collectors/slabinfo.plugin/integrations/linux_kernel_slab_allocator_statistics.md) + +- [Power Capping](https://github.com/netdata/netdata/blob/master/collectors/debugfs.plugin/integrations/power_capping.md) + +#### Memory + +- [Kernel Same-Page Merging](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/kernel_same-page_merging.md) + +- [Linux ZSwap](https://github.com/netdata/netdata/blob/master/collectors/debugfs.plugin/integrations/linux_zswap.md) + +- [Memory Statistics](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/memory_statistics.md) + +- [Memory Usage](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/memory_usage.md) + +- [Memory modules (DIMMs)](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/memory_modules_dimms.md) + +- [Non-Uniform Memory Access](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/non-uniform_memory_access.md) + +- [Page types](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/page_types.md) + +- [System Memory 
Fragmentation](https://github.com/netdata/netdata/blob/master/collectors/debugfs.plugin/integrations/system_memory_fragmentation.md) + +- [ZRAM](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/zram.md) + +#### Network + +- [Access Points](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/ap/integrations/access_points.md) + +- [IP Virtual Server](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/ip_virtual_server.md) + +- [IPv6 Socket Statistics](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/ipv6_socket_statistics.md) + +- [InfiniBand](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/infiniband.md) + +- [Network interfaces](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/network_interfaces.md) + +- [Network statistics](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/network_statistics.md) + +- [SCTP Statistics](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/sctp_statistics.md) + +- [Socket statistics](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/socket_statistics.md) + +- [Softnet Statistics](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/softnet_statistics.md) + +- [Wireless network interfaces](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/wireless_network_interfaces.md) + +- [tc QoS classes](https://github.com/netdata/netdata/blob/master/collectors/tc.plugin/integrations/tc_qos_classes.md) + +#### Power Supply + +- [Power Supply](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/power_supply.md) + +#### Pressure + +- [Pressure Stall Information](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/pressure_stall_information.md) + 
+#### System + +- [Entropy](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/entropy.md) + +- [System Load Average](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/system_load_average.md) + +- [System Uptime](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/system_uptime.md) + +- [System statistics](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/integrations/system_statistics.md) + +### Logs Servers + +- [AuthLog](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/authlog.md) + +- [Fluentd](https://github.com/netdata/go.d.plugin/blob/master/modules/fluentd/integrations/fluentd.md) + +- [Graylog Server](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/graylog_server.md) + +- [Logstash](https://github.com/netdata/go.d.plugin/blob/master/modules/logstash/integrations/logstash.md) + +- [journald](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/journald.md) + +- [loki](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/loki.md) + +- [mtail](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/mtail.md) + +### macOS Systems + +- [Apple Time Machine](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/apple_time_machine.md) + +- [macOS](https://github.com/netdata/netdata/blob/master/collectors/macos.plugin/integrations/macos.md) + +### Mail Servers + +- [DMARC](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/dmarc.md) + +- [Dovecot](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/dovecot/integrations/dovecot.md) + +- [Exim](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/exim/integrations/exim.md) + +- 
[Halon](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/halon.md) + +- [Maildir](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/maildir.md) + +- [Postfix](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/postfix/integrations/postfix.md) + +### Media Services + +- [Discourse](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/discourse.md) + +- [Icecast](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/icecast/integrations/icecast.md) + +- [OBS Studio](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/obs_studio.md) + +- [RetroShare](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/retroshare/integrations/retroshare.md) + +- [SABnzbd](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sabnzbd.md) + +- [Stream](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/stream.md) + +- [Twitch](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/twitch.md) + +- [Zulip](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/zulip.md) + +### Message Brokers + +- [ActiveMQ](https://github.com/netdata/go.d.plugin/blob/master/modules/activemq/integrations/activemq.md) + +- [Apache Pulsar](https://github.com/netdata/go.d.plugin/blob/master/modules/pulsar/integrations/apache_pulsar.md) + +- [Beanstalk](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/beanstalk/integrations/beanstalk.md) + +- [IBM MQ](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ibm_mq.md) + +- [Kafka Connect](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/kafka_connect.md) + +- [Kafka 
ZooKeeper](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/kafka_zookeeper.md) + +- [Kafka](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/kafka.md) + +- [MQTT Blackbox](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/mqtt_blackbox.md) + +- [RabbitMQ](https://github.com/netdata/go.d.plugin/blob/master/modules/rabbitmq/integrations/rabbitmq.md) + +- [Redis Queue](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/redis_queue.md) + +- [VerneMQ](https://github.com/netdata/go.d.plugin/blob/master/modules/vernemq/integrations/vernemq.md) + +- [XMPP Server](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/xmpp_server.md) + +- [mosquitto](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/mosquitto.md) + +### Networking Stack and Network Interfaces + +- [8430FT modem](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/8430ft_modem.md) + +- [A10 ACOS network devices](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/a10_acos_network_devices.md) + +- [Andrews & Arnold line status](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/andrews_&_arnold_line_status.md) + +- [Aruba devices](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/aruba_devices.md) + +- [Bird Routing Daemon](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/bird_routing_daemon.md) + +- [Checkpoint device](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/checkpoint_device.md) + +- [Cisco ACI](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cisco_aci.md) + +- [Citrix NetScaler](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/citrix_netscaler.md) 
+ +- [DDWRT Routers](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ddwrt_routers.md) + +- [FRRouting](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/frrouting.md) + +- [Fortigate firewall](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/fortigate_firewall.md) + +- [Freifunk network](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/freifunk_network.md) + +- [Fritzbox network devices](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/fritzbox_network_devices.md) + +- [Hitron CGN series CPE](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/hitron_cgn_series_cpe.md) + +- [Hitron CODA Cable Modem](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/hitron_coda_cable_modem.md) + +- [Huawei devices](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/huawei_devices.md) + +- [Keepalived](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/keepalived.md) + +- [Meraki dashboard](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/meraki_dashboard.md) + +- [MikroTik devices](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/mikrotik_devices.md) + +- [Mikrotik RouterOS devices](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/mikrotik_routeros_devices.md) + +- [NetFlow](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/netflow.md) + +- [NetMeter](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/netmeter.md) + +- [Open vSwitch](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/open_vswitch.md) + +- [OpenROADM 
devices](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/openroadm_devices.md) + +- [RIPE Atlas](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ripe_atlas.md) + +- [SONiC NOS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sonic_nos.md) + +- [SmartRG 808AC Cable Modem](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/smartrg_808ac_cable_modem.md) + +- [Starlink (SpaceX)](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/starlink_spacex.md) + +- [Traceroute](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/traceroute.md) + +- [Ubiquiti UFiber OLT](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ubiquiti_ufiber_olt.md) + +- [Zyxel GS1200-8](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/zyxel_gs1200-8.md) + +### Incident Management + +- [OTRS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/otrs.md) + +- [StatusPage](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/statuspage.md) + +### Observability + +- [Collectd](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/collectd.md) + +- [Dynatrace](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/dynatrace.md) + +- [Grafana](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/grafana.md) + +- [Hubble](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/hubble.md) + +- [Naemon](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/naemon.md) + +- [Nagios](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/nagios.md) + +- [New 
Relic](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/new_relic.md) + +### Other + +- [Example collector](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/example/integrations/example_collector.md) + +- [GitHub API rate limit](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/github_api_rate_limit.md) + +- [GitHub repository](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/github_repository.md) + +- [Netdata Agent alarms](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/alarms/integrations/netdata_agent_alarms.md) + +- [python.d changefinder](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/changefinder/integrations/python.d_changefinder.md) + +- [python.d zscores](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/zscores/integrations/python.d_zscores.md) + +### Processes and System Services + +- [Applications](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/integrations/applications.md) + +- [Supervisor](https://github.com/netdata/go.d.plugin/blob/master/modules/supervisord/integrations/supervisor.md) + +- [User Groups](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/integrations/user_groups.md) + +- [Users](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/integrations/users.md) + +### Provisioning Systems + +- [BOSH](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/bosh.md) + +- [Cloud Foundry Firehose](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cloud_foundry_firehose.md) + +- [Cloud Foundry](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cloud_foundry.md) + +- [Spacelift](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/spacelift.md) + +### Search Engines + 
+- [Elasticsearch](https://github.com/netdata/go.d.plugin/blob/master/modules/elasticsearch/integrations/elasticsearch.md)
+
+- [Meilisearch](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/meilisearch.md)
+
+- [OpenSearch](https://github.com/netdata/go.d.plugin/blob/master/modules/elasticsearch/integrations/opensearch.md)
+
+- [Solr](https://github.com/netdata/go.d.plugin/blob/master/modules/solr/integrations/solr.md)
+
+- [Sphinx](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/sphinx.md)
+
+### Security Systems
+
+- [Certificate Transparency](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/certificate_transparency.md)
+
+- [ClamAV daemon](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/clamav_daemon.md)
+
+- [Clamscan results](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/clamscan_results.md)
+
+- [Crowdsec](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/crowdsec.md)
+
+- [Honeypot](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/honeypot.md)
+
+- [Lynis audit reports](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/lynis_audit_reports.md)
+
+- [OpenVAS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/openvas.md)
+
+- [SSL Certificate](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ssl_certificate.md)
+
+- [Suricata](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/suricata.md)
+
+- [Vault PKI](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/vault_pki.md)
+
+### Service Discovery / Registry
+
+- [Consul](https://github.com/netdata/go.d.plugin/blob/master/modules/consul/integrations/consul.md)
+
+- [Kafka Consumer Lag](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/kafka_consumer_lag.md)
+
+- [ZooKeeper](https://github.com/netdata/go.d.plugin/blob/master/modules/zookeeper/integrations/zookeeper.md)
+
+- [etcd](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/etcd.md)
+
+### Storage, Mount Points and Filesystems
+
+- [AdaptecRAID](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/adaptec_raid/integrations/adaptecraid.md)
+
+- [Altaro Backup](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/altaro_backup.md)
+
+- [Borg backup](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/borg_backup.md)
+
+- [CVMFS clients](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cvmfs_clients.md)
+
+- [Ceph](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/ceph/integrations/ceph.md)
+
+- [Dell EMC Isilon cluster](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/dell_emc_isilon_cluster.md)
+
+- [Dell EMC ScaleIO](https://github.com/netdata/go.d.plugin/blob/master/modules/scaleio/integrations/dell_emc_scaleio.md)
+
+- [Dell EMC XtremIO cluster](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/dell_emc_xtremio_cluster.md)
+
+- [Dell PowerMax](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/dell_powermax.md)
+
+- [EOS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/eos.md)
+
+- [Generic storage enclosure tool](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/generic_storage_enclosure_tool.md)
+
+- [HDSentinel](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/hdsentinel.md)
+
+- [HP Smart Storage Arrays](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/hpssa/integrations/hp_smart_storage_arrays.md)
+
+- [Hadoop Distributed File System (HDFS)](https://github.com/netdata/go.d.plugin/blob/master/modules/hdfs/integrations/hadoop_distributed_file_system_hdfs.md)
+
+- [IBM Spectrum Virtualize](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ibm_spectrum_virtualize.md)
+
+- [IBM Spectrum](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/ibm_spectrum.md)
+
+- [IPFS](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/ipfs/integrations/ipfs.md)
+
+- [Lagerist Disk latency](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/lagerist_disk_latency.md)
+
+- [MegaCLI](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/megacli/integrations/megacli.md)
+
+- [MogileFS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/mogilefs.md)
+
+- [NVMe devices](https://github.com/netdata/go.d.plugin/blob/master/modules/nvme/integrations/nvme_devices.md)
+
+- [NetApp Solidfire](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/netapp_solidfire.md)
+
+- [Netapp ONTAP API](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/netapp_ontap_api.md)
+
+- [Samba](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/samba/integrations/samba.md)
+
+- [Starwind VSAN VSphere Edition](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/starwind_vsan_vsphere_edition.md)
+
+- [Storidge](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/storidge.md)
+
+- [Synology ActiveBackup](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/synology_activebackup.md)
+
+### Synthetic Checks
+
+- [Blackbox](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/blackbox.md)
+
+- [Domain expiration date](https://github.com/netdata/go.d.plugin/blob/master/modules/whoisquery/integrations/domain_expiration_date.md)
+
+- [HTTP Endpoints](https://github.com/netdata/go.d.plugin/blob/master/modules/httpcheck/integrations/http_endpoints.md)
+
+- [IOPing](https://github.com/netdata/netdata/blob/master/collectors/ioping.plugin/integrations/ioping.md)
+
+- [Idle OS Jitter](https://github.com/netdata/netdata/blob/master/collectors/idlejitter.plugin/integrations/idle_os_jitter.md)
+
+- [Monit](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/monit/integrations/monit.md)
+
+- [Ping](https://github.com/netdata/go.d.plugin/blob/master/modules/ping/integrations/ping.md)
+
+- [Pingdom](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/pingdom.md)
+
+- [Site 24x7](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/site_24x7.md)
+
+- [TCP Endpoints](https://github.com/netdata/go.d.plugin/blob/master/modules/portcheck/integrations/tcp_endpoints.md)
+
+- [Uptimerobot](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/uptimerobot.md)
+
+- [X.509 certificate](https://github.com/netdata/go.d.plugin/blob/master/modules/x509check/integrations/x.509_certificate.md)
+
+### System Clock and NTP
+
+- [Chrony](https://github.com/netdata/go.d.plugin/blob/master/modules/chrony/integrations/chrony.md)
+
+- [NTPd](https://github.com/netdata/go.d.plugin/blob/master/modules/ntpd/integrations/ntpd.md)
+
+- [Timex](https://github.com/netdata/netdata/blob/master/collectors/timex.plugin/integrations/timex.md)
+
+### Systemd
+
+- [Systemd Services](https://github.com/netdata/netdata/blob/master/collectors/cgroups.plugin/integrations/systemd_services.md)
+
+- [Systemd Units](https://github.com/netdata/go.d.plugin/blob/master/modules/systemdunits/integrations/systemd_units.md)
+
+- [systemd-logind users](https://github.com/netdata/go.d.plugin/blob/master/modules/logind/integrations/systemd-logind_users.md)
+
+### Task Queues
+
+- [Celery](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/celery.md)
+
+- [Mesos](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/mesos.md)
+
+- [Slurm](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/slurm.md)
+
+### Telephony Servers
+
+- [GTP](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/gtp.md)
+
+- [Kannel](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/kannel.md)
+
+- [OpenSIPS](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/opensips/integrations/opensips.md)
+
+### UPS
+
+- [APC UPS](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/apcupsd/integrations/apc_ups.md)
+
+- [Eaton UPS](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/eaton_ups.md)
+
+- [Network UPS Tools (NUT)](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/nut/integrations/network_ups_tools_nut.md)
+
+- [UPS (NUT)](https://github.com/netdata/go.d.plugin/blob/master/modules/upsd/integrations/ups_nut.md)
+
+### VPNs
+
+- [Fastd](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/fastd.md)
+
+- [Libreswan](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/libreswan/integrations/libreswan.md)
+
+- [OpenVPN status log](https://github.com/netdata/go.d.plugin/blob/master/modules/openvpn_status_log/integrations/openvpn_status_log.md)
+
+- [OpenVPN](https://github.com/netdata/go.d.plugin/blob/master/modules/openvpn/integrations/openvpn.md)
+
+- [SoftEther VPN Server](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/softether_vpn_server.md)
+
+- [Speedify CLI](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/speedify_cli.md)
+
+- [Tor](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/tor/integrations/tor.md)
+
+- [WireGuard](https://github.com/netdata/go.d.plugin/blob/master/modules/wireguard/integrations/wireguard.md)
+
+- [strongSwan](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/strongswan.md)
+
+### Web Servers and Web Proxies
+
+- [APIcast](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/apicast.md)
+
+- [Apache](https://github.com/netdata/go.d.plugin/blob/master/modules/apache/integrations/apache.md)
+
+- [Clash](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/clash.md)
+
+- [Cloudflare PCAP](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/cloudflare_pcap.md)
+
+- [Envoy](https://github.com/netdata/go.d.plugin/blob/master/modules/envoy/integrations/envoy.md)
+
+- [Gobetween](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/gobetween.md)
+
+- [HAProxy](https://github.com/netdata/go.d.plugin/blob/master/modules/haproxy/integrations/haproxy.md)
+
+- [HHVM](https://github.com/netdata/go.d.plugin/blob/master/modules/prometheus/integrations/hhvm.md)
+
+- [HTTPD](https://github.com/netdata/go.d.plugin/blob/master/modules/apache/integrations/httpd.md)
+
+- [Lighttpd](https://github.com/netdata/go.d.plugin/blob/master/modules/lighttpd/integrations/lighttpd.md)
+
+- [Litespeed](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/litespeed/integrations/litespeed.md)
+
+- [NGINX Plus](https://github.com/netdata/go.d.plugin/blob/master/modules/nginxplus/integrations/nginx_plus.md)
+
+- [NGINX VTS](https://github.com/netdata/go.d.plugin/blob/master/modules/nginxvts/integrations/nginx_vts.md)
+
+- [NGINX](https://github.com/netdata/go.d.plugin/blob/master/modules/nginx/integrations/nginx.md)
+
+- [PHP-FPM](https://github.com/netdata/go.d.plugin/blob/master/modules/phpfpm/integrations/php-fpm.md)
+
+- [Squid log files](https://github.com/netdata/go.d.plugin/blob/master/modules/squidlog/integrations/squid_log_files.md)
+
+- [Squid](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/squid/integrations/squid.md)
+
+- [Tengine](https://github.com/netdata/go.d.plugin/blob/master/modules/tengine/integrations/tengine.md)
+
+- [Tomcat](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/tomcat/integrations/tomcat.md)
+
+- [Traefik](https://github.com/netdata/go.d.plugin/blob/master/modules/traefik/integrations/traefik.md)
+
+- [Varnish](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/varnish/integrations/varnish.md)
+
+- [Web server log files](https://github.com/netdata/go.d.plugin/blob/master/modules/weblog/integrations/web_server_log_files.md)
+
+- [uWSGI](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/uwsgi/integrations/uwsgi.md)
+
+### Windows Systems
+
+- [Active Directory](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/integrations/active_directory.md)
+
+- [HyperV](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/integrations/hyperv.md)
+
+- [MS Exchange](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/integrations/ms_exchange.md)
+
+- [MS SQL Server](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/integrations/ms_sql_server.md)
+
+- [NET Framework](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/integrations/net_framework.md)
+
+- [Windows](https://github.com/netdata/go.d.plugin/blob/master/modules/windows/integrations/windows.md)
diff --git a/collectors/all.h b/collectors/all.h
index 22b75aaaa..ec4ac00eb 100644
--- a/collectors/all.h
+++ b/collectors/all.h
@@ -266,65 +266,76 @@
 // IP STACK
-#define NETDATA_CHART_PRIO_IP_ERRORS 4100
-#define NETDATA_CHART_PRIO_IP_TCP_CONNABORTS 4210
-#define NETDATA_CHART_PRIO_IP_TCP_SYN_QUEUE 4215
-#define NETDATA_CHART_PRIO_IP_TCP_ACCEPT_QUEUE 4216
-#define NETDATA_CHART_PRIO_IP_TCP_REORDERS 4220
-#define NETDATA_CHART_PRIO_IP_TCP_OFO 4250
-#define NETDATA_CHART_PRIO_IP_TCP_SYNCOOKIES 4260
-#define NETDATA_CHART_PRIO_IP_TCP_MEM 4290
-#define NETDATA_CHART_PRIO_IP_BCAST 4500
-#define NETDATA_CHART_PRIO_IP_BCAST_PACKETS 4510
-#define NETDATA_CHART_PRIO_IP_MCAST 4600
-#define NETDATA_CHART_PRIO_IP_MCAST_PACKETS 4610
-#define NETDATA_CHART_PRIO_IP_ECN 4700
+#define NETDATA_CHART_PRIO_IP_TCP_PACKETS 4200
+#define NETDATA_CHART_PRIO_IP_TCP_ERRORS 4210
+#define NETDATA_CHART_PRIO_IP_TCP_ESTABLISHED_CONNS 4220
+#define NETDATA_CHART_PRIO_IP_TCP_OPENS 4220
+#define NETDATA_CHART_PRIO_IP_TCP_HANDSHAKE 4230
+#define NETDATA_CHART_PRIO_IP_TCP_CONNABORTS 4240
+#define NETDATA_CHART_PRIO_IP_TCP_SYN_QUEUE 4250
+#define NETDATA_CHART_PRIO_IP_TCP_ACCEPT_QUEUE 4260
+#define NETDATA_CHART_PRIO_IP_TCP_REORDERS 4270
+#define NETDATA_CHART_PRIO_IP_TCP_OFO 4280
+#define NETDATA_CHART_PRIO_IP_TCP_SYNCOOKIES 4290
+#define NETDATA_CHART_PRIO_IP_TCP_MEM_PRESSURE 4300
+#define NETDATA_CHART_PRIO_IP_SOCKETS 4310
 // IPv4
-#define NETDATA_CHART_PRIO_IPV4_SOCKETS 5100
-#define NETDATA_CHART_PRIO_IPV4_PACKETS 5130
-#define NETDATA_CHART_PRIO_IPV4_ERRORS 5150
-#define NETDATA_CHART_PRIO_IPV4_ICMP 5170
-#define NETDATA_CHART_PRIO_IPV4_TCP 5200
-#define NETDATA_CHART_PRIO_IPV4_TCP_SOCKETS 5201
-#define NETDATA_CHART_PRIO_IPV4_TCP_MEM 5290
-#define NETDATA_CHART_PRIO_IPV4_UDP 5300
-#define NETDATA_CHART_PRIO_IPV4_UDP_MEM 5390
-#define NETDATA_CHART_PRIO_IPV4_UDPLITE 5400
+#define NETDATA_CHART_PRIO_IPV4_PACKETS 5000
+#define NETDATA_CHART_PRIO_IPV4_ERRORS 5050
+#define NETDATA_CHART_PRIO_IPV4_BCAST 5100
+#define NETDATA_CHART_PRIO_IPV4_BCAST_PACKETS 5105
+#define NETDATA_CHART_PRIO_IPV4_MCAST 5150
+#define NETDATA_CHART_PRIO_IPV4_MCAST_PACKETS 5155
+#define NETDATA_CHART_PRIO_IPV4_TCP_SOCKETS 5180
+#define NETDATA_CHART_PRIO_IPV4_TCP_SOCKETS_MEM 5185
+#define NETDATA_CHART_PRIO_IPV4_ICMP_PACKETS 5200
+#define NETDATA_CHART_PRIO_IPV4_ICMP_MESSAGES 5205
+#define NETDATA_CHART_PRIO_IPV4_ICMP_ERRORS 5210
+#define NETDATA_CHART_PRIO_IPV4_UDP_PACKETS 5250
+#define NETDATA_CHART_PRIO_IPV4_UDP_ERRORS 5255
+#define NETDATA_CHART_PRIO_IPV4_UDP_SOCKETS 5260
+#define NETDATA_CHART_PRIO_IPV4_UDP_SOCKETS_MEM 5265
+#define NETDATA_CHART_PRIO_IPV4_UDPLITE_PACKETS 5300
+#define NETDATA_CHART_PRIO_IPV4_UDPLITE_ERRORS 5305
+#define NETDATA_CHART_PRIO_IPV4_UDPLITE_SOCKETS 5310
+#define NETDATA_CHART_PRIO_IPV4_ECN 5350
+#define NETDATA_CHART_PRIO_IPV4_FRAGMENTS_IN 5400
+#define NETDATA_CHART_PRIO_IPV4_FRAGMENTS_OUT 5405
+#define NETDATA_CHART_PRIO_IPV4_FRAGMENTS_SOCKETS 5410
+#define NETDATA_CHART_PRIO_IPV4_FRAGMENTS_SOCKETS_MEM 5415
 #define NETDATA_CHART_PRIO_IPV4_RAW 5450
-#define NETDATA_CHART_PRIO_IPV4_FRAGMENTS 5460
-#define NETDATA_CHART_PRIO_IPV4_FRAGMENTS_MEM 5470
 // IPv6
-
-#define NETDATA_CHART_PRIO_IPV6_PACKETS 6200
-#define NETDATA_CHART_PRIO_IPV6_ECT 6210
-#define NETDATA_CHART_PRIO_IPV6_ERRORS 6300
-#define NETDATA_CHART_PRIO_IPV6_FRAGMENTS 6400
-#define NETDATA_CHART_PRIO_IPV6_FRAGSOUT 6401
-#define NETDATA_CHART_PRIO_IPV6_FRAGSIN 6402
-#define NETDATA_CHART_PRIO_IPV6_TCP 6500
-#define NETDATA_CHART_PRIO_IPV6_UDP 6600
-#define NETDATA_CHART_PRIO_IPV6_UDP_PACKETS 6601
-#define NETDATA_CHART_PRIO_IPV6_UDP_ERRORS 6610
-#define NETDATA_CHART_PRIO_IPV6_UDPLITE 6700
-#define NETDATA_CHART_PRIO_IPV6_UDPLITE_PACKETS 6701
-#define NETDATA_CHART_PRIO_IPV6_UDPLITE_ERRORS 6710
-#define NETDATA_CHART_PRIO_IPV6_RAW 6800
-#define NETDATA_CHART_PRIO_IPV6_BCAST 6840
-#define NETDATA_CHART_PRIO_IPV6_MCAST 6850
-#define NETDATA_CHART_PRIO_IPV6_MCAST_PACKETS 6851
-#define NETDATA_CHART_PRIO_IPV6_ICMP 6900
-#define NETDATA_CHART_PRIO_IPV6_ICMP_REDIR 6910
-#define NETDATA_CHART_PRIO_IPV6_ICMP_ERRORS 6920
-#define NETDATA_CHART_PRIO_IPV6_ICMP_ECHOS 6930
-#define NETDATA_CHART_PRIO_IPV6_ICMP_GROUPMEMB 6940
-#define NETDATA_CHART_PRIO_IPV6_ICMP_ROUTER 6950
-#define NETDATA_CHART_PRIO_IPV6_ICMP_NEIGHBOR 6960
-#define NETDATA_CHART_PRIO_IPV6_ICMP_LDV2 6970
-#define NETDATA_CHART_PRIO_IPV6_ICMP_TYPES 6980
-
+#define NETDATA_CHART_PRIO_IPV6_PACKETS 6000
+#define NETDATA_CHART_PRIO_IPV6_ERRORS 6005
+#define NETDATA_CHART_PRIO_IPV6_BCAST 6050
+#define NETDATA_CHART_PRIO_IPV6_MCAST 6100
+#define NETDATA_CHART_PRIO_IPV6_MCAST_PACKETS 6105
+#define NETDATA_CHART_PRIO_IPV6_TCP_SOCKETS 6140
+#define NETDATA_CHART_PRIO_IPV6_ICMP 6150
+#define NETDATA_CHART_PRIO_IPV6_ICMP_REDIR 6155
+#define NETDATA_CHART_PRIO_IPV6_ICMP_ERRORS 6160
+#define NETDATA_CHART_PRIO_IPV6_ICMP_ECHOS 6165
+#define NETDATA_CHART_PRIO_IPV6_ICMP_GROUPMEMB 6170
+#define NETDATA_CHART_PRIO_IPV6_ICMP_ROUTER 6180
+#define NETDATA_CHART_PRIO_IPV6_ICMP_NEIGHBOR 6185
+#define NETDATA_CHART_PRIO_IPV6_ICMP_LDV2 6190
+#define NETDATA_CHART_PRIO_IPV6_ICMP_TYPES 6195
+#define NETDATA_CHART_PRIO_IPV6_UDP 6200
+#define NETDATA_CHART_PRIO_IPV6_UDP_PACKETS 6205
+#define NETDATA_CHART_PRIO_IPV6_UDP_ERRORS 6210
+#define NETDATA_CHART_PRIO_IPV6_UDP_SOCKETS 6215
+#define NETDATA_CHART_PRIO_IPV6_UDPLITE 6250
+#define NETDATA_CHART_PRIO_IPV6_UDPLITE_PACKETS 6255
+#define NETDATA_CHART_PRIO_IPV6_UDPLITE_ERRORS 6260
+#define NETDATA_CHART_PRIO_IPV6_UDPLITE_SOCKETS 6265
+#define NETDATA_CHART_PRIO_IPV6_ECT 6300
+#define NETDATA_CHART_PRIO_IPV6_FRAGSIN 6350
+#define NETDATA_CHART_PRIO_IPV6_FRAGSOUT 6355
+#define NETDATA_CHART_PRIO_IPV6_FRAGMENTS_SOCKETS 6360
+#define NETDATA_CHART_PRIO_IPV6_RAW_SOCKETS 6400
 // Network interfaces
@@ -403,7 +414,8 @@
 // [ml] charts
 #define ML_CHART_PRIO_DIMENSIONS 39181
 #define ML_CHART_PRIO_ANOMALY_RATE 39182
-#define ML_CHART_PRIO_DETECTOR_EVENTS 39183
+#define ML_CHART_PRIO_TYPE_ANOMALY_RATE 39183
+#define ML_CHART_PRIO_DETECTOR_EVENTS 39184
 // [netdata.ml] charts
 #define NETDATA_ML_CHART_RUNNING 890001
diff --git a/collectors/apps.plugin/apps_groups.conf b/collectors/apps.plugin/apps_groups.conf
index 659bd0f03..9e9d83436 100644
--- a/collectors/apps.plugin/apps_groups.conf
+++ b/collectors/apps.plugin/apps_groups.conf
@@ -83,11 +83,12 @@ xenstat.plugin: xenstat.plugin
 perf.plugin: perf.plugin
 charts.d.plugin: *charts.d.plugin*
 python.d.plugin: *python.d.plugin*
+systemd-journal.plugin:*systemd-journal.plugin*
 tc-qos-helper: *tc-qos-helper.sh*
 fping: fping
 ioping: ioping
 go.d.plugin: *go.d.plugin*
-slabinfo.plugin: slabinfo.plugin
+slabinfo.plugin: *slabinfo.plugin*
 ebpf.plugin: *ebpf.plugin*
 debugfs.plugin: *debugfs.plugin*
@@ -136,7 +137,7 @@ modem: ModemManager
 netmanager: NetworkManager nm* systemd-networkd networkctl netplan connmand wicked* avahi-autoipd networkd-dispatcher
 firewall: firewalld ufw nft
 tor: tor
-bluetooth: bluetooth bluez bluedevil obexd
+bluetooth: bluetooth bluetoothd bluez bluedevil obexd
 # -----------------------------------------------------------------------------
 # high availability and balancers
@@ -159,7 +160,7 @@ chat: irssi *vines* *prosody* murmurd
 # -----------------------------------------------------------------------------
 # monitoring
-logs: ulogd* syslog* rsyslog* logrotate systemd-journald rotatelogs sysklogd metalog
+logs: ulogd* syslog* rsyslog* logrotate *systemd-journal* rotatelogs sysklogd metalog
 nms: snmpd vnstatd smokeping zabbix* munin* mon openhpid tailon nrpe
 monit: monit
 splunk: splunkd
@@ -209,7 +210,7 @@ proxmox-ve: pve* spiceproxy
 # -----------------------------------------------------------------------------
 # containers & virtual machines
-containers: lxc* docker* balena*
+containers: lxc* docker* balena* containerd
 VMs: vbox* VBox* qemu* kvm*
 libvirt: virtlogd virtqemud virtstoraged virtnetworkd virtlockd virtinterfaced
 libvirt: virtnodedevd virtproxyd virtsecretd libvirtd
@@ -238,7 +239,7 @@ dhcp: *dhcp* dhclient
 # -----------------------------------------------------------------------------
 # name servers and clients
-dns: named unbound nsd pdns_server knotd gdnsd yadifad dnsmasq systemd-resolve* pihole* avahi-daemon avahi-dnsconfd
+dns: named unbound nsd pdns_server knotd gdnsd yadifad dnsmasq *systemd-resolve* pihole* avahi-daemon avahi-dnsconfd
 dnsdist: dnsdist
 # -----------------------------------------------------------------------------
@@ -271,7 +272,7 @@ backup: rsync lsyncd bacula* borg rclone
 # -----------------------------------------------------------------------------
 # cron
-cron: cron* atd anacron systemd-cron* incrond
+cron: cron* atd anacron *systemd-cron* incrond
 # -----------------------------------------------------------------------------
 # UPS
@@ -319,7 +320,7 @@ airflow: *airflow*
 # -----------------------------------------------------------------------------
 # GUI
-X: X Xorg xinit xdm Xwayland xsettingsd
+X: X Xorg xinit xdm Xwayland xsettingsd touchegg
 wayland: swaylock swayidle waypipe wayvnc
 kde: *kdeinit* kdm sddm plasmashell startplasma-* kwin* kwallet* krunner kactivitymanager*
 gnome: gnome-* gdm gconf* mutter
@@ -353,11 +354,11 @@ kswapd: kswapd
 zswap: zswap
 kcompactd: kcompactd
-system: systemd-* udisks* udevd* *udevd ipv6_addrconf dbus-* rtkit*
+system: systemd* udisks* udevd* *udevd ipv6_addrconf dbus-* rtkit*
 system: mdadm acpid uuidd upowerd elogind* eudev mdev lvmpolld dmeventd
 system: accounts-daemon rngd haveged rasdaemon irqbalance start-stop-daemon
 system: supervise-daemon openrc* init runit runsvdir runsv auditd lsmd
-system: abrt* nscd rtkit-daemon gpg-agent usbguard*
+system: abrt* nscd rtkit-daemon gpg-agent usbguard* boltd geoclue
 kernel: kworker kthreadd kauditd lockd khelper kdevtmpfs khungtaskd rpciod
 kernel: fsnotify_mark kthrotld deferwq scsi_* kdmflush oom_reaper kdevtempfs
diff --git a/collectors/apps.plugin/apps_plugin.c b/collectors/apps.plugin/apps_plugin.c
index d25ae3f9b..152038968 100644
--- a/collectors/apps.plugin/apps_plugin.c
+++ b/collectors/apps.plugin/apps_plugin.c
@@ -265,10 +265,12 @@ struct target {
     uint32_t idhash;
     char name[MAX_NAME + 1];
-
+    char clean_name[MAX_NAME + 1]; // sanitized name used in chart id (need to replace at least dots)
     uid_t uid;
     gid_t gid;
+    bool is_other;
+
     kernel_uint_t minflt;
     kernel_uint_t cminflt;
     kernel_uint_t majflt;
@@ -782,7 +784,8 @@ static struct target *get_users_target(uid_t uid) {
         snprintfz(w->name, MAX_NAME, "%s", pw->pw_name);
     }
-    netdata_fix_chart_name(w->name);
+    strncpyz(w->clean_name, w->name, MAX_NAME);
+    netdata_fix_chart_name(w->clean_name);
     w->uid = uid;
@@ -830,7 +833,8 @@ struct target *get_groups_target(gid_t gid)
         snprintfz(w->name, MAX_NAME, "%s", gr->gr_name);
     }
-    netdata_fix_chart_name(w->name);
+    strncpyz(w->clean_name, w->name, MAX_NAME);
+    netdata_fix_chart_name(w->clean_name);
     w->gid = gid;
@@ -899,6 +903,14 @@ static struct target *get_apps_groups_target(const char *id, struct target *targ
     else
         // copy the id
         strncpyz(w->name, nid, MAX_NAME);
+
+    // dots are used to distinguish chart type and id in streaming, so we should replace them
+    strncpyz(w->clean_name, w->name, MAX_NAME);
+    netdata_fix_chart_name(w->clean_name);
+    for (char *d = w->clean_name; *d; d++) {
+        if (*d == '.')
+            *d = '_';
+    }
     strncpyz(w->compare, nid, MAX_COMPARE_NAME);
     size_t len = strlen(w->compare);
@@ -997,6 +1009,7 @@ static int read_apps_groups_conf(const char *path, const char *file)
     apps_groups_default_target = get_apps_groups_target("p+!o@w#e$i^r&7*5(-i)l-o_", NULL, "other"); // match nothing
     if(!apps_groups_default_target)
         fatal("Cannot create default target");
+    apps_groups_default_target->is_other = true;
     // allow the user to override group 'other'
     if(apps_groups_default_target->target)
@@ -1457,17 +1470,17 @@ cleanup:
     netdata_log_info(
         "FDS_LIMITS: PID %d (%s) is using "
         "%0.2f %% of its fds limits, "
-        "open fds = %llu ("
-        "files = %llu, "
-        "pipes = %llu, "
-        "sockets = %llu, "
-        "inotifies = %llu, "
-        "eventfds = %llu, "
-        "timerfds = %llu, "
-        "signalfds = %llu, "
-        "eventpolls = %llu "
-        "other = %llu "
-        "), open fds limit = %llu, "
+        "open fds = %"PRIu64 "("
+        "files = %"PRIu64 ", "
+        "pipes = %"PRIu64 ", "
+        "sockets = %"PRIu64", "
+        "inotifies = %"PRIu64", "
+        "eventfds = %"PRIu64", "
+        "timerfds = %"PRIu64", "
+        "signalfds = %"PRIu64", "
+        "eventpolls = %"PRIu64" "
+        "other = %"PRIu64" "
+        "), open fds limit = %"PRIu64", "
         "%s, "
         "original line [%s]",
         p->pid, p->comm, p->openfds_limits_percent, all_fds,
@@ -2460,7 +2473,7 @@ static inline int debug_print_process_and_parents(struct pid_stat *p, usec_t tim
     for(i = 0; i < indent ;i++) buffer[i] = ' ';
     buffer[i] = '\0';
-    fprintf(stderr, " %s %s%s (%d %s %llu"
+    fprintf(stderr, " %s %s%s (%d %s %"PRIu64""
         , buffer
         , prefix
         , p->comm
@@ -3431,8 +3444,8 @@ static void calculate_netdata_statistics(void) {
 // ----------------------------------------------------------------------------
 // update chart dimensions
-static inline void send_BEGIN(const char *type, const char *id, usec_t usec) {
-    fprintf(stdout, "BEGIN %s.%s %llu\n", type, id, usec);
+static inline void send_BEGIN(const char *type, const char *name,const char *metric, usec_t usec) {
+    fprintf(stdout, "BEGIN %s.%s_%s %" PRIu64 "\n", type, name, metric, usec);
 }
 static inline void send_SET(const char *name, kernel_uint_t value) {
@@ -3440,7 +3453,7 @@ static inline void send_SET(const char *name, kernel_uint_t value) {
 }
 static inline void send_END(void) {
-    fprintf(stdout, "END\n");
+    fprintf(stdout, "END\n\n");
 }
 void send_resource_usage_to_netdata(usec_t dt) {
@@ -3518,11 +3531,11 @@ void send_resource_usage_to_netdata(usec_t dt) {
     }
     fprintf(stdout,
-        "BEGIN netdata.apps_cpu %llu\n"
-        "SET user = %llu\n"
-        "SET system = %llu\n"
+        "BEGIN netdata.apps_cpu %"PRIu64"\n"
+        "SET user = %"PRIu64"\n"
+        "SET system = %"PRIu64"\n"
         "END\n"
-        "BEGIN netdata.apps_sizes %llu\n"
+        "BEGIN netdata.apps_sizes %"PRIu64"\n"
         "SET calls = %zu\n"
         "SET files = %zu\n"
         "SET filenames = %zu\n"
@@ -3549,7 +3562,7 @@ void send_resource_usage_to_netdata(usec_t dt) {
     );
     fprintf(stdout,
-        "BEGIN netdata.apps_fix %llu\n"
+        "BEGIN netdata.apps_fix %"PRIu64"\n"
         "SET utime = %u\n"
         "SET stime = %u\n"
         "SET gtime = %u\n"
@@ -3566,7 +3579,7 @@ void send_resource_usage_to_netdata(usec_t dt) {
     if(include_exited_childs)
         fprintf(stdout,
-            "BEGIN netdata.apps_children_fix %llu\n"
+            "BEGIN netdata.apps_children_fix %"PRIu64"\n"
             "SET cutime = %u\n"
             "SET cstime = %u\n"
             "SET cgtime = %u\n"
@@ -3736,249 +3749,104 @@ static void normalize_utilization(struct target *root) {
 static void send_collected_data_to_netdata(struct target *root, const char *type, usec_t dt) {
     struct target *w;
-    send_BEGIN(type, "cpu", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, (kernel_uint_t)(w->utime * utime_fix_ratio) + (kernel_uint_t)(w->stime * stime_fix_ratio) + (kernel_uint_t)(w->gtime * gtime_fix_ratio) + (include_exited_childs?((kernel_uint_t)(w->cutime * cutime_fix_ratio) + (kernel_uint_t)(w->cstime * cstime_fix_ratio) + (kernel_uint_t)(w->cgtime * cgtime_fix_ratio)):0ULL));
-    }
-    send_END();
-
-    send_BEGIN(type, "cpu_user", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, (kernel_uint_t)(w->utime * utime_fix_ratio) + (include_exited_childs?((kernel_uint_t)(w->cutime * cutime_fix_ratio)):0ULL));
-    }
-    send_END();
-
-    send_BEGIN(type, "cpu_system", dt);
     for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, (kernel_uint_t)(w->stime * stime_fix_ratio) + (include_exited_childs?((kernel_uint_t)(w->cstime * cstime_fix_ratio)):0ULL));
-    }
-    send_END();
+        if (unlikely(!w->exposed && !w->is_other))
+            continue;
-    if(show_guest_time) {
-        send_BEGIN(type, "cpu_guest", dt);
-        for (w = root; w ; w = w->next) {
-            if(unlikely(w->exposed && w->processes))
-                send_SET(w->name, (kernel_uint_t)(w->gtime * gtime_fix_ratio) + (include_exited_childs?((kernel_uint_t)(w->cgtime * cgtime_fix_ratio)):0ULL));
-        }
+        send_BEGIN(type, w->clean_name, "processes", dt);
+        send_SET("processes", w->processes);
         send_END();
-    }
-
-#ifndef __FreeBSD__
-    send_BEGIN(type, "voluntary_ctxt_switches", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, w->status_voluntary_ctxt_switches);
-    }
-    send_END();
-
-    send_BEGIN(type, "involuntary_ctxt_switches", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, w->status_nonvoluntary_ctxt_switches);
-    }
-    send_END();
-#endif
-
-    send_BEGIN(type, "threads", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed))
-            send_SET(w->name, w->num_threads);
-    }
-    send_END();
-    send_BEGIN(type, "processes", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed))
-            send_SET(w->name, w->processes);
-    }
-    send_END();
-
-#ifndef __FreeBSD__
-    send_BEGIN(type, "uptime", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, (global_uptime > w->starttime)?(global_uptime - w->starttime):0);
-    }
-    send_END();
-
-    if (enable_detailed_uptime_charts) {
-        send_BEGIN(type, "uptime_min", dt);
-        for (w = root; w ; w = w->next) {
-            if(unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->uptime_min);
-        }
+        send_BEGIN(type, w->clean_name, "threads", dt);
+        send_SET("threads", w->num_threads);
         send_END();
-        send_BEGIN(type, "uptime_avg", dt);
-        for (w = root; w ; w = w->next) {
-            if(unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->uptime_sum / w->processes);
-        }
-        send_END();
+        if (unlikely(!w->processes && !w->is_other))
+            continue;
-        send_BEGIN(type, "uptime_max", dt);
-        for (w = root; w ; w = w->next) {
-            if(unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->uptime_max);
-        }
+        send_BEGIN(type, w->clean_name, "cpu_utilization", dt);
+        send_SET("user", (kernel_uint_t)(w->utime * utime_fix_ratio) + (include_exited_childs ? ((kernel_uint_t)(w->cutime * cutime_fix_ratio)) : 0ULL));
+        send_SET("system", (kernel_uint_t)(w->stime * stime_fix_ratio) + (include_exited_childs ? ((kernel_uint_t)(w->cstime * cstime_fix_ratio)) : 0ULL));
         send_END();
-    }
-#endif
-
-    send_BEGIN(type, "mem", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, (w->status_vmrss > w->status_vmshared)?(w->status_vmrss - w->status_vmshared):0ULL);
-    }
-    send_END();
-
-    send_BEGIN(type, "rss", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, w->status_vmrss);
-    }
-    send_END();
-
-    send_BEGIN(type, "vmem", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, w->status_vmsize);
-    }
-    send_END();
-
-#ifndef __FreeBSD__
-    send_BEGIN(type, "swap", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, w->status_vmswap);
-    }
-    send_END();
-#endif
-
-    send_BEGIN(type, "minor_faults", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, (kernel_uint_t)(w->minflt * minflt_fix_ratio) + (include_exited_childs?((kernel_uint_t)(w->cminflt * cminflt_fix_ratio)):0ULL));
-    }
-    send_END();
-
-    send_BEGIN(type, "major_faults", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, (kernel_uint_t)(w->majflt * majflt_fix_ratio) + (include_exited_childs?((kernel_uint_t)(w->cmajflt * cmajflt_fix_ratio)):0ULL));
-    }
-    send_END();
 #ifndef __FreeBSD__
-    send_BEGIN(type, "lreads", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, w->io_logical_bytes_read);
-    }
-    send_END();
-
-    send_BEGIN(type, "lwrites", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, w->io_logical_bytes_written);
-    }
-    send_END();
-#endif
-
-    send_BEGIN(type, "preads", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, w->io_storage_bytes_read);
-    }
-    send_END();
-
-    send_BEGIN(type, "pwrites", dt);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed && w->processes))
-            send_SET(w->name, w->io_storage_bytes_written);
-    }
-    send_END();
-
-    if(enable_file_charts) {
-        send_BEGIN(type, "fds_open_limit", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->max_open_files_percent * 100.0);
+        if (enable_guest_charts) {
+            send_BEGIN(type, w->clean_name, "cpu_guest_utilization", dt);
+            send_SET("guest", (kernel_uint_t)(w->gtime * gtime_fix_ratio) + (include_exited_childs ? ((kernel_uint_t)(w->cgtime * cgtime_fix_ratio)) : 0ULL));
+            send_END();
         }
-        send_END();
-        send_BEGIN(type, "fds_open", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, pid_openfds_sum(w));
-        }
+        send_BEGIN(type, w->clean_name, "cpu_context_switches", dt);
+        send_SET("voluntary", w->status_voluntary_ctxt_switches);
+        send_SET("involuntary", w->status_nonvoluntary_ctxt_switches);
         send_END();
-        send_BEGIN(type, "fds_files", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->openfds.files);
-        }
+        send_BEGIN(type, w->clean_name, "mem_private_usage", dt);
+        send_SET("mem", (w->status_vmrss > w->status_vmshared)?(w->status_vmrss - w->status_vmshared) : 0ULL);
         send_END();
+#endif
-        send_BEGIN(type, "fds_sockets", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->openfds.sockets);
-        }
+        send_BEGIN(type, w->clean_name, "mem_usage", dt);
+        send_SET("rss", w->status_vmrss);
         send_END();
-        send_BEGIN(type, "fds_pipes", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->openfds.pipes);
-        }
+        send_BEGIN(type, w->clean_name, "vmem_usage", dt);
+        send_SET("vmem", w->status_vmsize);
         send_END();
-        send_BEGIN(type, "fds_inotifies", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->openfds.inotifies);
-        }
+        send_BEGIN(type, w->clean_name, "mem_page_faults", dt);
+        send_SET("minor", (kernel_uint_t)(w->minflt * minflt_fix_ratio) + (include_exited_childs ? ((kernel_uint_t)(w->cminflt * cminflt_fix_ratio)) : 0ULL));
+        send_SET("major", (kernel_uint_t)(w->majflt * majflt_fix_ratio) + (include_exited_childs ? ((kernel_uint_t)(w->cmajflt * cmajflt_fix_ratio)) : 0ULL));
         send_END();
-        send_BEGIN(type, "fds_eventfds", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->openfds.eventfds);
-        }
+#ifndef __FreeBSD__
+        send_BEGIN(type, w->clean_name, "swap_usage", dt);
+        send_SET("swap", w->status_vmswap);
         send_END();
+#endif
-        send_BEGIN(type, "fds_timerfds", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->openfds.timerfds);
-        }
+#ifndef __FreeBSD__
+        send_BEGIN(type, w->clean_name, "uptime", dt);
+        send_SET("uptime", (global_uptime > w->starttime) ? (global_uptime - w->starttime) : 0);
         send_END();
-        send_BEGIN(type, "fds_signalfds", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->openfds.signalfds);
+        if (enable_detailed_uptime_charts) {
+            send_BEGIN(type, w->clean_name, "uptime_summary", dt);
+            send_SET("min", w->uptime_min);
+            send_SET("avg", w->processes > 0 ? w->uptime_sum / w->processes : 0);
+            send_SET("max", w->uptime_max);
+            send_END();
         }
-        send_END();
+#endif
-        send_BEGIN(type, "fds_eventpolls", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->openfds.eventpolls);
-        }
+        send_BEGIN(type, w->clean_name, "disk_physical_io", dt);
+        send_SET("reads", w->io_storage_bytes_read);
+        send_SET("writes", w->io_storage_bytes_written);
         send_END();
-        send_BEGIN(type, "fds_other", dt);
-        for (w = root; w; w = w->next) {
-            if (unlikely(w->exposed && w->processes))
-                send_SET(w->name, w->openfds.other);
-        }
+#ifndef __FreeBSD__
+        send_BEGIN(type, w->clean_name, "disk_logical_io", dt);
+        send_SET("reads", w->io_logical_bytes_read);
+        send_SET("writes", w->io_logical_bytes_written);
         send_END();
+#endif
+        if (enable_file_charts) {
+            send_BEGIN(type, w->clean_name, "fds_open_limit", dt);
+            send_SET("limit", w->max_open_files_percent * 100.0);
+            send_END();
+
+            send_BEGIN(type, w->clean_name, "fds_open", dt);
+            send_SET("files", w->openfds.files);
+            send_SET("sockets", w->openfds.sockets);
+            send_SET("pipes", w->openfds.sockets);
+            send_SET("inotifies", w->openfds.inotifies);
+            send_SET("event", w->openfds.eventfds);
+            send_SET("timer", w->openfds.timerfds);
+            send_SET("signal", w->openfds.signalfds);
+            send_SET("eventpolls", w->openfds.eventpolls);
+            send_SET("other", w->openfds.other);
+            send_END();
+        }
     }
 }
@@ -3986,312 +3854,146 @@ static void send_collected_data_to_netdata(struct target *root, const char *type
 // ----------------------------------------------------------------------------
 // generate the charts
-static void send_charts_updates_to_netdata(struct target *root, const char *type, const char *title)
+static void send_charts_updates_to_netdata(struct target *root, const char *type, const char *lbl_name, const char *title)
 {
     struct target *w;
-    int newly_added = 0;
-
-    for(w = root ; w ; w = w->next) {
-        if (w->target) continue;
-        if(unlikely(w->processes && (debug_enabled || w->debug_enabled))) {
-            struct pid_on_target *pid_on_target;
-
-            fprintf(stderr, "apps.plugin: target '%s' has aggregated %u process%s:", w->name, w->processes, (w->processes == 1)?"":"es");
-
-            for(pid_on_target = w->root_pid; pid_on_target; pid_on_target = pid_on_target->next) {
-                fprintf(stderr, " %d", pid_on_target->pid);
+    if (debug_enabled) {
+        for (w = root; w; w = w->next) {
+            if (unlikely(w->debug_enabled && !w->target && w->processes)) {
+                struct pid_on_target *pid_on_target;
+                fprintf(stderr, "apps.plugin: target '%s' has aggregated %u process(es):", w->name, w->processes);
+                for (pid_on_target = w->root_pid; pid_on_target; pid_on_target = pid_on_target->next) {
+                    fprintf(stderr, " %d", pid_on_target->pid);
+                }
+                fputc('\n', stderr);
             }
-
-            fputc('\n', stderr);
-        }
-
-        if (!w->exposed && w->processes) {
-            newly_added++;
-            w->exposed = 1;
-            if (debug_enabled || w->debug_enabled)
-                debug_log_int("%s just added - regenerating charts.", w->name);
         }
     }
-    // nothing more to show
-    if(!newly_added && show_guest_time == show_guest_time_old) return;
-
-    // we have something new to show
-    // update the charts
-    fprintf(stdout, "CHART %s.cpu '' '%s CPU Time (100%% = 1 core)' 'percentage' cpu %s.cpu stacked 20001 %d\n", type, title, type, update_every);
-    for (w = root; w ; w = w->next) {
-        if(unlikely(w->exposed))
-            fprintf(stdout, "DIMENSION %s '' absolute 1 %llu %s\n", w->name, time_factor * RATES_DETAIL / 100, w->hidden ?
"hidden" : ""); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.mem '' '%s Real Memory (w/o shared)' 'MiB' mem %s.mem stacked 20003 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute %ld %ld\n", w->name, 1L, 1024L); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.rss '' '%s Resident Set Size (w/shared)' 'MiB' mem %s.rss stacked 20004 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute %ld %ld\n", w->name, 1L, 1024L); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.vmem '' '%s Virtual Memory Size' 'MiB' mem %s.vmem stacked 20005 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute %ld %ld\n", w->name, 1L, 1024L); - } - APPS_PLUGIN_FUNCTIONS(); + for (w = root; w; w = w->next) { + if (likely(w->exposed || (!w->processes && !w->is_other))) + continue; - fprintf(stdout, "CHART %s.threads '' '%s Threads' 'threads' processes %s.threads stacked 20006 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); + w->exposed = 1; - fprintf(stdout, "CHART %s.processes '' '%s Processes' 'processes' processes %s.processes stacked 20007 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); + fprintf(stdout, "CHART %s.%s_cpu_utilization '' '%s CPU utilization (100%% = 1 core)' 'percentage' cpu %s.cpu_utilization stacked 20001 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + 
fprintf(stdout, "DIMENSION user '' absolute 1 %llu\n", time_factor * RATES_DETAIL / 100LLU); + fprintf(stdout, "DIMENSION system '' absolute 1 %llu\n", time_factor * RATES_DETAIL / 100LLU); #ifndef __FreeBSD__ - fprintf(stdout, "CHART %s.uptime '' '%s Carried Over Uptime' 'seconds' processes %s.uptime line 20008 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - if (enable_detailed_uptime_charts) { - fprintf(stdout, "CHART %s.uptime_min '' '%s Minimum Uptime' 'seconds' processes %s.uptime_min line 20009 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.uptime_avg '' '%s Average Uptime' 'seconds' processes %s.uptime_avg line 20010 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.uptime_max '' '%s Maximum Uptime' 'seconds' processes %s.uptime_max line 20011 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - } + if (enable_guest_charts) { + fprintf(stdout, "CHART %s.%s_cpu_guest_utilization '' '%s CPU guest utlization (100%% = 1 core)' 'percentage' cpu %s.cpu_guest_utilization line 20005 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION guest '' absolute 1 %llu\n", time_factor * RATES_DETAIL / 100LLU); + } + + fprintf(stdout, "CHART %s.%s_cpu_context_switches '' '%s CPU context switches' 
'switches/s' cpu %s.cpu_context_switches stacked 20010 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION voluntary '' absolute 1 %llu\n", RATES_DETAIL); + fprintf(stdout, "DIMENSION involuntary '' absolute 1 %llu\n", RATES_DETAIL); + + fprintf(stdout, "CHART %s.%s_mem_private_usage '' '%s memory usage without shared' 'MiB' mem %s.mem_private_usage area 20050 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION mem '' absolute %ld %ld\n", 1L, 1024L); #endif - fprintf(stdout, "CHART %s.cpu_user '' '%s CPU User Time (100%% = 1 core)' 'percentage' cpu %s.cpu_user stacked 20020 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, time_factor * RATES_DETAIL / 100LLU); - } - APPS_PLUGIN_FUNCTIONS(); + fprintf(stdout, "CHART %s.%s_mem_usage '' '%s memory RSS usage' 'MiB' mem %s.mem_usage area 20055 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION rss '' absolute %ld %ld\n", 1L, 1024L); - fprintf(stdout, "CHART %s.cpu_system '' '%s CPU System Time (100%% = 1 core)' 'percentage' cpu %s.cpu_system stacked 20021 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, time_factor * RATES_DETAIL / 100LLU); - } - APPS_PLUGIN_FUNCTIONS(); + fprintf(stdout, "CHART %s.%s_mem_page_faults '' '%s memory page faults' 'pgfaults/s' mem %s.mem_page_faults stacked 20060 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", 
lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION major '' absolute 1 %llu\n", RATES_DETAIL); + fprintf(stdout, "DIMENSION minor '' absolute 1 %llu\n", RATES_DETAIL); - if(show_guest_time) { - fprintf(stdout, "CHART %s.cpu_guest '' '%s CPU Guest Time (100%% = 1 core)' 'percentage' cpu %s.cpu_guest stacked 20022 %d\n", type, title, type, update_every); - for (w = root; w; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, time_factor * RATES_DETAIL / 100LLU); - } - APPS_PLUGIN_FUNCTIONS(); - } + fprintf(stdout, "CHART %s.%s_vmem_usage '' '%s virtual memory size' 'MiB' mem %s.vmem_usage line 20065 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION vmem '' absolute %ld %ld\n", 1L, 1024L); #ifndef __FreeBSD__ - fprintf(stdout, "CHART %s.voluntary_ctxt_switches '' '%s Voluntary Context Switches' 'switches/s' cpu %s.voluntary_ctxt_switches stacked 20023 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.involuntary_ctxt_switches '' '%s Involuntary Context Switches' 'switches/s' cpu %s.involuntary_ctxt_switches stacked 20024 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); + fprintf(stdout, "CHART %s.%s_swap_usage '' '%s swap usage' 'MiB' mem %s.swap_usage area 20065 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION swap '' absolute %ld %ld\n", 1L, 1024L); #endif 
#ifndef __FreeBSD__ - fprintf(stdout, "CHART %s.swap '' '%s Swap Memory' 'MiB' swap %s.swap stacked 20011 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute %ld %ld\n", w->name, 1L, 1024L); - } - APPS_PLUGIN_FUNCTIONS(); -#endif - - fprintf(stdout, "CHART %s.major_faults '' '%s Major Page Faults (swap read)' 'page faults/s' swap %s.major_faults stacked 20012 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.minor_faults '' '%s Minor Page Faults' 'page faults/s' mem %s.minor_faults stacked 20011 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); - -#ifdef __FreeBSD__ - // FIXME: same metric name as in Linux but different units. 
- fprintf(stdout, "CHART %s.preads '' '%s Disk Reads' 'blocks/s' disk %s.preads stacked 20002 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.pwrites '' '%s Disk Writes' 'blocks/s' disk %s.pwrites stacked 20002 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); + fprintf(stdout, "CHART %s.%s_disk_physical_io '' '%s disk physical IO' 'KiB/s' disk %s.disk_physical_io area 20100 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION reads '' absolute 1 %llu\n", 1024LLU * RATES_DETAIL); + fprintf(stdout, "DIMENSION writes '' absolute -1 %llu\n", 1024LLU * RATES_DETAIL); + + fprintf(stdout, "CHART %s.%s_disk_logical_io '' '%s disk logical IO' 'KiB/s' disk %s.disk_logical_io area 20105 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION reads '' absolute 1 %llu\n", 1024LLU * RATES_DETAIL); + fprintf(stdout, "DIMENSION writes '' absolute -1 %llu\n", 1024LLU * RATES_DETAIL); #else - fprintf(stdout, "CHART %s.preads '' '%s Disk Reads' 'KiB/s' disk %s.preads stacked 20002 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, 1024LLU * RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.pwrites '' '%s Disk Writes' 'KiB/s' disk %s.pwrites stacked 20002 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - 
if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, 1024LLU * RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.lreads '' '%s Disk Logical Reads' 'KiB/s' disk %s.lreads stacked 20042 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, 1024LLU * RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.lwrites '' '%s I/O Logical Writes' 'KiB/s' disk %s.lwrites stacked 20042 %d\n", type, title, type, update_every); - for (w = root; w ; w = w->next) { - if(unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 %llu\n", w->name, 1024LLU * RATES_DETAIL); - } - APPS_PLUGIN_FUNCTIONS(); + fprintf(stdout, "CHART %s.%s_disk_physical_io '' '%s disk physical IO' 'blocks/s' disk %s.disk_physical_block_io area 20100 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION reads '' absolute 1 %llu\n", RATES_DETAIL); + fprintf(stdout, "DIMENSION writes '' absolute -1 %llu\n", RATES_DETAIL); #endif - if(enable_file_charts) { - fprintf(stdout, "CHART %s.fds_open_limit '' '%s Open File Descriptors Limit' '%%' fds %s.fds_open_limit line 20050 %d\n", type, - title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 100\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.fds_open '' '%s Open File Descriptors' 'fds' fds %s.fds_open stacked 20051 %d\n", type, - title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.fds_files '' '%s Open Files' 'fds' fds %s.fds_files stacked 20052 %d\n", type, - 
title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.fds_sockets '' '%s Open Sockets' 'fds' fds %s.fds_sockets stacked 20053 %d\n", - type, title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.fds_pipes '' '%s Pipes' 'fds' fds %s.fds_pipes stacked 20054 %d\n", type, - title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.fds_inotifies '' '%s iNotify File Descriptors' 'fds' fds %s.fds_inotifies stacked 20055 %d\n", type, - title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.fds_eventfds '' '%s Event File Descriptors' 'fds' fds %s.fds_eventfds stacked 20056 %d\n", type, - title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.fds_timerfds '' '%s Timer File Descriptors' 'fds' fds %s.fds_timerfds stacked 20057 %d\n", type, - title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.fds_signalfds '' '%s Signal File Descriptors' 'fds' fds %s.fds_signalfds stacked 20058 %d\n", type, - title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", 
w->name); - } - APPS_PLUGIN_FUNCTIONS(); - - fprintf(stdout, "CHART %s.fds_eventpolls '' '%s Event Poll File Descriptors' 'fds' fds %s.fds_eventpolls stacked 20059 %d\n", type, - title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); + fprintf(stdout, "CHART %s.%s_processes '' '%s processes' 'processes' processes %s.processes line 20150 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION processes '' absolute 1 1\n"); + + fprintf(stdout, "CHART %s.%s_threads '' '%s threads' 'threads' processes %s.threads line 20155 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION threads '' absolute 1 1\n"); + + if (enable_file_charts) { + fprintf(stdout, "CHART %s.%s_fds_open_limit '' '%s open file descriptors limit' '%%' fds %s.fds_open_limit line 20200 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION limit '' absolute 1 100\n"); + + fprintf(stdout, "CHART %s.%s_fds_open '' '%s open files descriptors' 'fds' fds %s.fds_open stacked 20210 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION files '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION sockets '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION pipes '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION inotifies '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION event '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION timer '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION 
signal '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION eventpolls '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION other '' absolute 1 1\n"); + } - fprintf(stdout, "CHART %s.fds_other '' '%s Other File Descriptors' 'fds' fds %s.fds_other stacked 20060 %d\n", type, - title, type, update_every); - for (w = root; w; w = w->next) { - if (unlikely(w->exposed)) - fprintf(stdout, "DIMENSION %s '' absolute 1 1\n", w->name); - } - APPS_PLUGIN_FUNCTIONS(); +#ifndef __FreeBSD__ + fprintf(stdout, "CHART %s.%s_uptime '' '%s uptime' 'seconds' uptime %s.uptime line 20250 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION uptime '' absolute 1 1\n"); + + if (enable_detailed_uptime_charts) { + fprintf(stdout, "CHART %s.%s_uptime_summary '' '%s uptime summary' 'seconds' uptime %s.uptime_summary area 20255 %d\n", type, w->clean_name, title, type, update_every); + fprintf(stdout, "CLABEL '%s' '%s' 0\n", lbl_name, w->name); + fprintf(stdout, "CLABEL_COMMIT\n"); + fprintf(stdout, "DIMENSION min '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION avg '' absolute 1 1\n"); + fprintf(stdout, "DIMENSION max '' absolute 1 1\n"); + } +#endif } } - #ifndef __FreeBSD__ static void send_proc_states_count(usec_t dt) { @@ -4310,7 +4012,7 @@ static void send_proc_states_count(usec_t dt) } // send process state count - send_BEGIN("system", "processes_state", dt); + fprintf(stdout, "BEGIN system.processes_state %" PRIu64 "\n", dt); for (proc_state i = PROC_STATUS_RUNNING; i < PROC_STATUS_END; i++) { send_SET(proc_states[i], proc_state_count[i]); } @@ -4575,7 +4277,7 @@ static int check_capabilities() { } #endif -static netdata_mutex_t mutex = NETDATA_MUTEX_INITIALIZER; +static netdata_mutex_t apps_and_stdout_mutex = NETDATA_MUTEX_INITIALIZER; #define PROCESS_FILTER_CATEGORY "category:" #define PROCESS_FILTER_USER "user:" @@ -4629,8 +4331,8 @@ static void 
get_MemTotal(void) { } static void apps_plugin_function_processes_help(const char *transaction) { - pluginsd_function_result_begin_to_stdout(transaction, HTTP_RESP_OK, "text/plain", now_realtime_sec() + 3600); - fprintf(stdout, "%s", + BUFFER *wb = buffer_create(0, NULL); + buffer_sprintf(wb, "%s", "apps.plugin / processes\n" "\n" "Function `processes` presents all the currently running processes of the system.\n" @@ -4660,7 +4362,9 @@ static void apps_plugin_function_processes_help(const char *transaction) { "\n" "Filters can be combined. Each filter can be given only one time.\n" ); - pluginsd_function_result_end_to_stdout(); + + pluginsd_function_result_to_stdout(transaction, HTTP_RESP_OK, "text/plain", now_realtime_sec() + 3600, wb); + buffer_free(wb); } #define add_value_field_llu_with_max(wb, key, value) do { \ @@ -4675,7 +4379,7 @@ static void apps_plugin_function_processes_help(const char *transaction) { buffer_json_add_array_item_double(wb, _tmp); \ } while(0) -static void function_processes(const char *transaction, char *function __maybe_unused, char *line_buffer __maybe_unused, int line_max __maybe_unused, int timeout __maybe_unused) { +static void function_processes(const char *transaction, char *function __maybe_unused, int timeout __maybe_unused, bool *cancelled __maybe_unused) { struct pid_stat *p; char *words[PLUGINSD_MAX_WORDS] = { NULL }; @@ -4696,21 +4400,24 @@ static void function_processes(const char *transaction, char *function __maybe_u if(!category && strncmp(keyword, PROCESS_FILTER_CATEGORY, strlen(PROCESS_FILTER_CATEGORY)) == 0) { category = find_target_by_name(apps_groups_root_target, &keyword[strlen(PROCESS_FILTER_CATEGORY)]); if(!category) { - pluginsd_function_json_error(transaction, HTTP_RESP_BAD_REQUEST, "No category with that name found."); + pluginsd_function_json_error_to_stdout(transaction, HTTP_RESP_BAD_REQUEST, + "No category with that name found."); return; } } else if(!user && strncmp(keyword, PROCESS_FILTER_USER, 
strlen(PROCESS_FILTER_USER)) == 0) { user = find_target_by_name(users_root_target, &keyword[strlen(PROCESS_FILTER_USER)]); if(!user) { - pluginsd_function_json_error(transaction, HTTP_RESP_BAD_REQUEST, "No user with that name found."); + pluginsd_function_json_error_to_stdout(transaction, HTTP_RESP_BAD_REQUEST, + "No user with that name found."); return; } } else if(strncmp(keyword, PROCESS_FILTER_GROUP, strlen(PROCESS_FILTER_GROUP)) == 0) { group = find_target_by_name(groups_root_target, &keyword[strlen(PROCESS_FILTER_GROUP)]); if(!group) { - pluginsd_function_json_error(transaction, HTTP_RESP_BAD_REQUEST, "No group with that name found."); + pluginsd_function_json_error_to_stdout(transaction, HTTP_RESP_BAD_REQUEST, + "No group with that name found."); return; } } @@ -4736,13 +4443,12 @@ static void function_processes(const char *transaction, char *function __maybe_u else { char msg[PLUGINSD_LINE_MAX]; snprintfz(msg, PLUGINSD_LINE_MAX, "Invalid parameter '%s'", keyword); - pluginsd_function_json_error(transaction, HTTP_RESP_BAD_REQUEST, msg); + pluginsd_function_json_error_to_stdout(transaction, HTTP_RESP_BAD_REQUEST, msg); return; } } time_t expires = now_realtime_sec() + update_every; - pluginsd_function_result_begin_to_stdout(transaction, HTTP_RESP_OK, "application/json", expires); unsigned int cpu_divisor = time_factor * RATES_DETAIL / 100; unsigned int memory_divisor = 1024; @@ -5096,13 +4802,13 @@ static void function_processes(const char *transaction, char *function __maybe_u RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 2, "KiB/s", LReads_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); buffer_rrdf_table_add_field(wb, field_id++, "LWrites", "Logical I/O Writes", RRDF_FIELD_TYPE_BAR_WITH_INTEGER, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 2, "KiB/s", LWrites_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, 
RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); #endif // I/O calls @@ -5110,12 +4816,12 @@ static void function_processes(const char *transaction, char *function __maybe_u RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 2, "calls/s", RCalls_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); buffer_rrdf_table_add_field(wb, field_id++, "WCalls", "I/O Write Calls", RRDF_FIELD_TYPE_BAR_WITH_INTEGER, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 2, "calls/s", WCalls_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); // minor page faults buffer_rrdf_table_add_field(wb, field_id++, "MinFlt", "Minor Page Faults/s", RRDF_FIELD_TYPE_BAR_WITH_INTEGER, @@ -5153,7 +4859,7 @@ static void function_processes(const char *transaction, char *function __maybe_u RRDF_FIELD_TYPE_BAR_WITH_INTEGER, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 2, "pgflts/s", TMajFlt_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); // open file descriptors buffer_rrdf_table_add_field(wb, field_id++, "FDsLimitPercent", "Percentage of Open Descriptors vs Limits", @@ -5165,24 +4871,24 @@ static void function_processes(const char *transaction, char *function __maybe_u RRDF_FIELD_TYPE_BAR_WITH_INTEGER, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 0, "fds", FDs_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); buffer_rrdf_table_add_field(wb, field_id++, "Files", "Open Files", RRDF_FIELD_TYPE_BAR_WITH_INTEGER, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 0, "fds", Files_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, 
RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); buffer_rrdf_table_add_field(wb, field_id++, "Pipes", "Open Pipes", RRDF_FIELD_TYPE_BAR_WITH_INTEGER, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 0, "fds", Pipes_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); buffer_rrdf_table_add_field(wb, field_id++, "Sockets", "Open Sockets", RRDF_FIELD_TYPE_BAR_WITH_INTEGER, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 0, "fds", Sockets_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); buffer_rrdf_table_add_field(wb, field_id++, "iNotiFDs", "Open iNotify Descriptors", RRDF_FIELD_TYPE_BAR_WITH_INTEGER, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 0, "fds", iNotiFDs_max, RRDF_FIELD_SORT_DESCENDING, @@ -5219,12 +4925,12 @@ static void function_processes(const char *transaction, char *function __maybe_u RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 0, "processes", Processes_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); buffer_rrdf_table_add_field(wb, field_id++, "Threads", "Threads", RRDF_FIELD_TYPE_BAR_WITH_INTEGER, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_NUMBER, 0, "threads", Threads_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_SUM, RRDF_FIELD_FILTER_RANGE, - RRDF_FIELD_OPTS_VISIBLE, NULL); + RRDF_FIELD_OPTS_NONE, NULL); buffer_rrdf_table_add_field(wb, field_id++, "Uptime", "Uptime in seconds", RRDF_FIELD_TYPE_DURATION, RRDF_FIELD_VISUAL_BAR, RRDF_FIELD_TRANSFORM_DURATION_S, 2, "seconds", Uptime_max, RRDF_FIELD_SORT_DESCENDING, NULL, RRDF_FIELD_SUMMARY_MAX, @@ -5520,69 +5226,13 @@ static void function_processes(const char *transaction, char *function __maybe_u 
buffer_json_member_add_time_t(wb, "expires", expires); buffer_json_finalize(wb); - fwrite(buffer_tostring(wb), buffer_strlen(wb), 1, stdout); - buffer_free(wb); + pluginsd_function_result_to_stdout(transaction, HTTP_RESP_OK, "application/json", expires, wb); - pluginsd_function_result_end_to_stdout(); + buffer_free(wb); } static bool apps_plugin_exit = false; -static void *reader_main(void *arg __maybe_unused) { - char buffer[PLUGINSD_LINE_MAX + 1]; - - char *s = NULL; - while(!apps_plugin_exit && (s = fgets(buffer, PLUGINSD_LINE_MAX, stdin))) { - - char *words[PLUGINSD_MAX_WORDS] = { NULL }; - size_t num_words = quoted_strings_splitter_pluginsd(buffer, words, PLUGINSD_MAX_WORDS); - - const char *keyword = get_word(words, num_words, 0); - - if(keyword && strcmp(keyword, PLUGINSD_KEYWORD_FUNCTION) == 0) { - char *transaction = get_word(words, num_words, 1); - char *timeout_s = get_word(words, num_words, 2); - char *function = get_word(words, num_words, 3); - - if(!transaction || !*transaction || !timeout_s || !*timeout_s || !function || !*function) { - netdata_log_error("Received incomplete %s (transaction = '%s', timeout = '%s', function = '%s'). 
Ignoring it.", - keyword, - transaction?transaction:"(unset)", - timeout_s?timeout_s:"(unset)", - function?function:"(unset)"); - } - else { - int timeout = str2i(timeout_s); - if(timeout <= 0) timeout = PLUGINS_FUNCTIONS_TIMEOUT_DEFAULT; - -// internal_error(true, "Received function '%s', transaction '%s', timeout %d", function, transaction, timeout); - - netdata_mutex_lock(&mutex); - - if(strncmp(function, "processes", strlen("processes")) == 0) - function_processes(transaction, function, buffer, PLUGINSD_LINE_MAX + 1, timeout); - else - pluginsd_function_json_error(transaction, HTTP_RESP_NOT_FOUND, "No function with this name found in apps.plugin."); - - fflush(stdout); - netdata_mutex_unlock(&mutex); - -// internal_error(true, "Done with function '%s', transaction '%s', timeout %d", function, transaction, timeout); - } - } - else - netdata_log_error("Received unknown command: %s", keyword?keyword:"(unset)"); - } - - if(!s || feof(stdin) || ferror(stdin)) { - apps_plugin_exit = true; - netdata_log_error("Received error on stdin."); - } - - exit(1); - return NULL; -} - int main(int argc, char **argv) { // debug_flags = D_PROCFILE; stderror = stderr; @@ -5601,6 +5251,8 @@ int main(int argc, char **argv) { error_log_errors_per_period = 100; error_log_throttle_period = 3600; + log_set_global_severity_for_external_plugins(); + bool send_resource_usage = true; { const char *s = getenv("NETDATA_INTERNALS_MONITORING"); @@ -5686,10 +5338,17 @@ int main(int argc, char **argv) { all_pids = callocz(sizeof(struct pid_stat *), (size_t) pid_max + 1); - netdata_thread_t reader_thread; - netdata_thread_create(&reader_thread, "APPS_READER", NETDATA_THREAD_OPTION_DONT_LOG, reader_main, NULL); - netdata_mutex_lock(&mutex); + // ------------------------------------------------------------------------ + // the event loop for functions + + struct functions_evloop_globals *wg = + functions_evloop_init(1, "APPS", &apps_and_stdout_mutex, &apps_plugin_exit); + + 
functions_evloop_add_function(wg, "processes", function_processes, PLUGINS_FUNCTIONS_TIMEOUT_DEFAULT); + + // ------------------------------------------------------------------------ + netdata_mutex_lock(&apps_and_stdout_mutex); APPS_PLUGIN_GLOBAL_FUNCTIONS(); usec_t step = update_every * USEC_PER_SEC; @@ -5697,7 +5356,7 @@ int main(int argc, char **argv) { heartbeat_t hb; heartbeat_init(&hb); for(; !apps_plugin_exit ; global_iterations_counter++) { - netdata_mutex_unlock(&mutex); + netdata_mutex_unlock(&apps_and_stdout_mutex); #ifdef NETDATA_PROFILING #warning "compiling for profiling" @@ -5708,17 +5367,15 @@ int main(int argc, char **argv) { #else usec_t dt = heartbeat_next(&hb, step); #endif - netdata_mutex_lock(&mutex); + netdata_mutex_lock(&apps_and_stdout_mutex); struct pollfd pollfd = { .fd = fileno(stdout), .events = POLLERR }; if (unlikely(poll(&pollfd, 1, 0) < 0)) { - netdata_mutex_unlock(&mutex); - netdata_thread_cancel(reader_thread); + netdata_mutex_unlock(&apps_and_stdout_mutex); fatal("Cannot check if a pipe is available"); } if (unlikely(pollfd.revents & POLLERR)) { - netdata_mutex_unlock(&mutex); - netdata_thread_cancel(reader_thread); + netdata_mutex_unlock(&apps_and_stdout_mutex); fatal("Received error on read pipe."); } @@ -5728,8 +5385,7 @@ int main(int argc, char **argv) { if(!collect_data_for_all_processes()) { netdata_log_error("Cannot collect /proc data for running processes. 
Disabling apps.plugin..."); printf("DISABLE\n"); - netdata_mutex_unlock(&mutex); - netdata_thread_cancel(reader_thread); + netdata_mutex_unlock(&apps_and_stdout_mutex); exit(1); } @@ -5743,21 +5399,18 @@ int main(int argc, char **argv) { send_proc_states_count(dt); #endif - // this is smart enough to show only newly added apps, when needed - send_charts_updates_to_netdata(apps_groups_root_target, "apps", "Apps"); - if(likely(enable_users_charts)) - send_charts_updates_to_netdata(users_root_target, "users", "Users"); + send_charts_updates_to_netdata(apps_groups_root_target, "app", "app_group", "Apps"); + send_collected_data_to_netdata(apps_groups_root_target, "app", dt); - if(likely(enable_groups_charts)) - send_charts_updates_to_netdata(groups_root_target, "groups", "User Groups"); - - send_collected_data_to_netdata(apps_groups_root_target, "apps", dt); - - if(likely(enable_users_charts)) - send_collected_data_to_netdata(users_root_target, "users", dt); + if (enable_users_charts) { + send_charts_updates_to_netdata(users_root_target, "user", "user", "Users"); + send_collected_data_to_netdata(users_root_target, "user", dt); + } - if(likely(enable_groups_charts)) - send_collected_data_to_netdata(groups_root_target, "groups", dt); + if (enable_groups_charts) { + send_charts_updates_to_netdata(groups_root_target, "usergroup", "user_group", "User Groups"); + send_collected_data_to_netdata(groups_root_target, "usergroup", dt); + } fflush(stdout); @@ -5765,5 +5418,5 @@ int main(int argc, char **argv) { debug_log("done Loop No %zu", global_iterations_counter); } - netdata_mutex_unlock(&mutex); + netdata_mutex_unlock(&apps_and_stdout_mutex); } diff --git a/collectors/apps.plugin/integrations/applications.md b/collectors/apps.plugin/integrations/applications.md new file mode 100644 index 000000000..f4bbc8733 --- /dev/null +++ b/collectors/apps.plugin/integrations/applications.md @@ -0,0 +1,113 @@ +<!--startmeta +custom_edit_url: 
"https://github.com/netdata/netdata/edit/master/collectors/apps.plugin/integrations/applications.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/apps.plugin/metadata.yaml" +sidebar_label: "Applications" +learn_status: "Published" +learn_rel_path: "Data Collection/Processes and System Services" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Applications + + +<img src="https://netdata.cloud/img/applications.svg" width="150"/> + + +Plugin: apps.plugin +Module: apps + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Applications for optimal software performance and resource usage. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per applications group + +These metrics refer to the application group. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| app_group | The name of the group defined in the configuration. 
| + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| app_group.cpu_utilization | user, system | percentage | +| app_group.cpu_guest_utilization | guest | percentage | +| app_group.cpu_context_switches | voluntary, involuntary | switches/s | +| app_group.mem_usage | rss | MiB | +| app_group.mem_private_usage | mem | MiB | +| app_group.vmem_usage | vmem | MiB | +| app_group.mem_page_faults | minor, major | pgfaults/s | +| app_group.swap_usage | swap | MiB | +| app_group.disk_physical_io | reads, writes | KiB/s | +| app_group.disk_logical_io | reads, writes | KiB/s | +| app_group.processes | processes | processes | +| app_group.threads | threads | threads | +| app_group.fds_open_limit | limit | percentage | +| app_group.fds_open | files, sockets, pipes, inotifies, event, timer, signal, eventpolls, other | fds | +| app_group.uptime | uptime | seconds | +| app_group.uptime_summary | min, avg, max | seconds | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/apps.plugin/integrations/user_groups.md b/collectors/apps.plugin/integrations/user_groups.md new file mode 100644 index 000000000..6f56d7be6 --- /dev/null +++ b/collectors/apps.plugin/integrations/user_groups.md @@ -0,0 +1,113 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/apps.plugin/integrations/user_groups.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/apps.plugin/metadata.yaml" +sidebar_label: "User Groups" +learn_status: "Published" +learn_rel_path: "Data Collection/Processes and System Services" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# User Groups + + +<img src="https://netdata.cloud/img/user.svg" width="150"/> + + +Plugin: apps.plugin +Module: groups + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration monitors resource utilization on a user groups context. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per user group + +These metrics refer to the user group. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| user_group | The name of the user group. 
| + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| usergroup.cpu_utilization | user, system | percentage | +| usergroup.cpu_guest_utilization | guest | percentage | +| usergroup.cpu_context_switches | voluntary, involuntary | switches/s | +| usergroup.mem_usage | rss | MiB | +| usergroup.mem_private_usage | mem | MiB | +| usergroup.vmem_usage | vmem | MiB | +| usergroup.mem_page_faults | minor, major | pgfaults/s | +| usergroup.swap_usage | swap | MiB | +| usergroup.disk_physical_io | reads, writes | KiB/s | +| usergroup.disk_logical_io | reads, writes | KiB/s | +| usergroup.processes | processes | processes | +| usergroup.threads | threads | threads | +| usergroup.fds_open_limit | limit | percentage | +| usergroup.fds_open | files, sockets, pipes, inotifies, event, timer, signal, eventpolls, other | fds | +| usergroup.uptime | uptime | seconds | +| usergroup.uptime_summary | min, avg, max | seconds | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/apps.plugin/integrations/users.md b/collectors/apps.plugin/integrations/users.md new file mode 100644 index 000000000..f325f05f6 --- /dev/null +++ b/collectors/apps.plugin/integrations/users.md @@ -0,0 +1,113 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/apps.plugin/integrations/users.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/apps.plugin/metadata.yaml" +sidebar_label: "Users" +learn_status: "Published" +learn_rel_path: "Data Collection/Processes and System Services" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Users + + +<img src="https://netdata.cloud/img/users.svg" width="150"/> + + +Plugin: apps.plugin +Module: users + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration monitors resource utilization on a user context. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per user + +These metrics refer to the user. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| user | The name of the user. 
| + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| user.cpu_utilization | user, system | percentage | +| user.cpu_guest_utilization | guest | percentage | +| user.cpu_context_switches | voluntary, involuntary | switches/s | +| user.mem_usage | rss | MiB | +| user.mem_private_usage | mem | MiB | +| user.vmem_usage | vmem | MiB | +| user.mem_page_faults | minor, major | pgfaults/s | +| user.swap_usage | swap | MiB | +| user.disk_physical_io | reads, writes | KiB/s | +| user.disk_logical_io | reads, writes | KiB/s | +| user.processes | processes | processes | +| user.threads | threads | threads | +| user.fds_open_limit | limit | percentage | +| user.fds_open | files, sockets, pipes, inotifies, event, timer, signal, eventpolls, other | fds | +| user.uptime | uptime | seconds | +| user.uptime_summary | min, avg, max | seconds | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/apps.plugin/metadata.yaml b/collectors/apps.plugin/metadata.yaml index 9794a5ea2..f24160ba7 100644 --- a/collectors/apps.plugin/metadata.yaml +++ b/collectors/apps.plugin/metadata.yaml @@ -67,160 +67,123 @@ modules: description: "" availability: [] scopes: - - name: global - description: "" - labels: [] + - name: applications group + description: These metrics refer to the application group. + labels: + - name: app_group + description: The name of the group defined in the configuration. 
metrics: - - name: apps.cpu - description: Apps CPU Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.cpu_user - description: Apps CPU User Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.cpu_system - description: Apps CPU System Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.cpu_guest - description: Apps CPU Guest Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.mem - description: Apps Real Memory (w/o shared) - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.rss - description: Apps Resident Set Size (w/shared) - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.vmem - description: Apps Virtual Memory Size - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.swap - description: Apps Swap Memory - unit: "MiB" + - name: app_group.cpu_utilization + description: Apps CPU utilization (100% = 1 core) + unit: percentage chart_type: stacked dimensions: - - name: a dimension per app group - - name: apps.major_faults - description: Apps Major Page Faults (swap read) - unit: "page faults/s" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.minor_faults - description: Apps Minor Page Faults (swap read) - unit: "page faults/s" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.preads - description: Apps Disk Reads - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.pwrites - description: Apps Disk Writes - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per app group - - 
name: apps.lreads - description: Apps Disk Logical Reads - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.lwrites - description: Apps I/O Logical Writes - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.threads - description: Apps Threads - unit: "threads" - chart_type: stacked + - name: user + - name: system + - name: app_group.cpu_guest_utilization + description: Apps CPU guest utilization (100% = 1 core) + unit: percentage + chart_type: line dimensions: - - name: a dimension per app group - - name: apps.processes - description: Apps Processes - unit: "processes" + - name: guest + - name: app_group.cpu_context_switches + description: Apps CPU context switches + unit: switches/s chart_type: stacked dimensions: - - name: a dimension per app group - - name: apps.voluntary_ctxt_switches - description: Apps Voluntary Context Switches - unit: "processes" - chart_type: stacked + - name: voluntary + - name: involuntary + - name: app_group.mem_usage + description: Apps memory RSS usage + unit: MiB + chart_type: line dimensions: - - name: a dimension per app group - - name: apps.involuntary_ctxt_switches - description: Apps Involuntary Context Switches - unit: "processes" + - name: rss + - name: app_group.mem_private_usage + description: Apps memory usage without shared + unit: MiB chart_type: stacked dimensions: - - name: a dimension per app group - - name: apps.uptime - description: Apps Carried Over Uptime - unit: "seconds" + - name: mem + - name: app_group.vmem_usage + description: Apps virtual memory size + unit: MiB chart_type: line dimensions: - - name: a dimension per app group - - name: apps.uptime_min - description: Apps Minimum Uptime - unit: "seconds" + - name: vmem + - name: app_group.mem_page_faults + description: Apps memory page faults + unit: pgfaults/s + chart_type: stacked + dimensions: + - name: minor + - name: major + - name: app_group.swap_usage 
+ description: Apps swap usage + unit: MiB + chart_type: area + dimensions: + - name: swap + - name: app_group.disk_physical_io + description: Apps disk physical IO + unit: KiB/s + chart_type: area + dimensions: + - name: reads + - name: writes + - name: app_group.disk_logical_io + description: Apps disk logical IO + unit: KiB/s + chart_type: area + dimensions: + - name: reads + - name: writes + - name: app_group.processes + description: Apps processes + unit: processes chart_type: line dimensions: - - name: a dimension per app group - - name: apps.uptime_avg - description: Apps Average Uptime - unit: "seconds" + - name: processes + - name: app_group.threads + description: Apps threads + unit: threads chart_type: line dimensions: - - name: a dimension per app group - - name: apps.uptime_max - description: Apps Maximum Uptime - unit: "seconds" + - name: threads + - name: app_group.fds_open_limit + description: Apps open file descriptors limit + unit: percentage chart_type: line dimensions: - - name: a dimension per app group - - name: apps.files - description: Apps Open Files - unit: "open files" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: apps.sockets - description: Apps Open Sockets - unit: "open sockets" - chart_type: stacked + - name: limit + - name: app_group.fds_open + description: Apps open file descriptors + unit: fds + chart_type: stacked + dimensions: + - name: files + - name: sockets + - name: pipes + - name: inotifies + - name: event + - name: timer + - name: signal + - name: eventpolls + - name: other + - name: app_group.uptime + description: Apps uptime + unit: seconds + chart_type: line dimensions: - - name: a dimension per app group - - name: apps.pipes - description: Apps Open Pipes - unit: "open pipes" - chart_type: stacked + - name: uptime + - name: app_group.uptime_summary + description: Apps uptime summary + unit: seconds + chart_type: area dimensions: - - name: a dimension per app group + - name: min + - 
name: avg + - name: max - meta: plugin_name: apps.plugin module_name: groups @@ -289,160 +252,123 @@ modules: description: "" availability: [] scopes: - - name: global - description: "" - labels: [] + - name: user group + description: These metrics refer to the user group. + labels: + - name: user_group + description: The name of the user group. metrics: - - name: groups.cpu - description: User Groups CPU Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.cpu_user - description: User Groups CPU User Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.cpu_system - description: User Groups CPU System Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.cpu_guest - description: User Groups CPU Guest Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.mem - description: User Groups Real Memory (w/o shared) - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.rss - description: User Groups Resident Set Size (w/shared) - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.vmem - description: User Groups Virtual Memory Size - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.swap - description: User Groups Swap Memory - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.major_faults - description: User Groups Major Page Faults (swap read) - unit: "page faults/s" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.minor_faults - description: User Groups Page Faults (swap read) - unit: "page faults/s" - chart_type: stacked 
- dimensions: - - name: a dimension per user group - - name: groups.preads - description: User Groups Disk Reads - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.pwrites - description: User Groups Disk Writes - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.lreads - description: User Groups Disk Logical Reads - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.lwrites - description: User Groups I/O Logical Writes - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.threads - description: User Groups Threads - unit: "threads" + - name: usergroup.cpu_utilization + description: User Groups CPU utilization (100% = 1 core) + unit: percentage chart_type: stacked dimensions: - - name: a dimension per user group - - name: groups.processes - description: User Groups Processes - unit: "processes" - chart_type: stacked - dimensions: - - name: a dimension per user group - - name: groups.voluntary_ctxt_switches - description: User Groups Voluntary Context Switches - unit: "processes" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: groups.involuntary_ctxt_switches - description: User Groups Involuntary Context Switches - unit: "processes" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: groups.uptime - description: User Groups Carried Over Uptime - unit: "seconds" + - name: user + - name: system + - name: usergroup.cpu_guest_utilization + description: User Groups CPU guest utilization (100% = 1 core) + unit: percentage chart_type: line dimensions: - - name: a dimension per user group - - name: groups.uptime_min - description: User Groups Minimum Uptime - unit: "seconds" + - name: guest + - name: usergroup.cpu_context_switches + description: User Groups CPU context switches + unit: switches/s + 
chart_type: stacked + dimensions: + - name: voluntary + - name: involuntary + - name: usergroup.mem_usage + description: User Groups memory RSS usage + unit: MiB + chart_type: area + dimensions: + - name: rss + - name: usergroup.mem_private_usage + description: User Groups memory usage without shared + unit: MiB + chart_type: area + dimensions: + - name: mem + - name: usergroup.vmem_usage + description: User Groups virtual memory size + unit: MiB chart_type: line dimensions: - - name: a dimension per user group - - name: groups.uptime_avg - description: User Groups Average Uptime - unit: "seconds" + - name: vmem + - name: usergroup.mem_page_faults + description: User Groups memory page faults + unit: pgfaults/s + chart_type: stacked + dimensions: + - name: minor + - name: major + - name: usergroup.swap_usage + description: User Groups swap usage + unit: MiB + chart_type: area + dimensions: + - name: swap + - name: usergroup.disk_physical_io + description: User Groups disk physical IO + unit: KiB/s + chart_type: area + dimensions: + - name: reads + - name: writes + - name: usergroup.disk_logical_io + description: User Groups disk logical IO + unit: KiB/s + chart_type: area + dimensions: + - name: reads + - name: writes + - name: usergroup.processes + description: User Groups processes + unit: processes chart_type: line dimensions: - - name: a dimension per user group - - name: groups.uptime_max - description: User Groups Maximum Uptime - unit: "seconds" + - name: processes + - name: usergroup.threads + description: User Groups threads + unit: threads chart_type: line dimensions: - - name: a dimension per user group - - name: groups.files - description: User Groups Open Files - unit: "open files" - chart_type: stacked + - name: threads + - name: usergroup.fds_open_limit + description: User Groups open file descriptors limit + unit: percentage + chart_type: line dimensions: - - name: a dimension per user group - - name: groups.sockets - description: User Groups Open 
Sockets - unit: "open sockets" - chart_type: stacked + - name: limit + - name: usergroup.fds_open + description: User Groups open file descriptors + unit: fds + chart_type: stacked + dimensions: + - name: files + - name: sockets + - name: pipes + - name: inotifies + - name: event + - name: timer + - name: signal + - name: eventpolls + - name: other + - name: usergroup.uptime + description: User Groups uptime + unit: seconds + chart_type: line dimensions: - - name: a dimension per user group - - name: groups.pipes - description: User Groups Open Pipes - unit: "open pipes" - chart_type: stacked + - name: uptime + - name: usergroup.uptime_summary + description: User Groups uptime summary + unit: seconds + chart_type: area dimensions: - - name: a dimension per user group + - name: min + - name: avg + - name: max - meta: plugin_name: apps.plugin module_name: users @@ -509,157 +435,120 @@ modules: description: "" availability: [] scopes: - - name: global - description: "" - labels: [] + - name: user + description: These metrics refer to the user. + labels: + - name: user + description: The name of the user. 
metrics: - - name: users.cpu - description: Users CPU Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.cpu_user - description: Users CPU User Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.cpu_system - description: Users CPU System Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.cpu_guest - description: Users CPU Guest Time (100% = 1 core) - unit: "percentage" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.mem - description: Users Real Memory (w/o shared) - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.rss - description: Users Resident Set Size (w/shared) - unit: "MiB" + - name: user.cpu_utilization + description: User CPU utilization (100% = 1 core) + unit: percentage chart_type: stacked dimensions: - - name: a dimension per user - - name: users.vmem - description: Users Virtual Memory Size - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.swap - description: Users Swap Memory - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.major_faults - description: Users Major Page Faults (swap read) - unit: "page faults/s" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.minor_faults - description: Users Page Faults (swap read) - unit: "page faults/s" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.preads - description: Users Disk Reads - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.pwrites - description: Users Disk Writes - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.lreads - description: Users Disk 
Logical Reads - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.lwrites - description: Users I/O Logical Writes - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.threads - description: Users Threads - unit: "threads" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.processes - description: Users Processes - unit: "processes" - chart_type: stacked - dimensions: - - name: a dimension per user - - name: users.voluntary_ctxt_switches - description: Users Voluntary Context Switches - unit: "processes" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: users.involuntary_ctxt_switches - description: Users Involuntary Context Switches - unit: "processes" - chart_type: stacked - dimensions: - - name: a dimension per app group - - name: users.uptime - description: Users Carried Over Uptime - unit: "seconds" + - name: user + - name: system + - name: user.cpu_guest_utilization + description: User CPU guest utilization (100% = 1 core) + unit: percentage chart_type: line dimensions: - - name: a dimension per user - - name: users.uptime_min - description: Users Minimum Uptime - unit: "seconds" + - name: guest + - name: user.cpu_context_switches + description: User CPU context switches + unit: switches/s + chart_type: stacked + dimensions: + - name: voluntary + - name: involuntary + - name: user.mem_usage + description: User memory RSS usage + unit: MiB + chart_type: area + dimensions: + - name: rss + - name: user.mem_private_usage + description: User memory usage without shared + unit: MiB + chart_type: area + dimensions: + - name: mem + - name: user.vmem_usage + description: User virtual memory size + unit: MiB chart_type: line dimensions: - - name: a dimension per user - - name: users.uptime_avg - description: Users Average Uptime - unit: "seconds" + - name: vmem + - name: user.mem_page_faults + description: User 
memory page faults + unit: pgfaults/s + chart_type: stacked + dimensions: + - name: minor + - name: major + - name: user.swap_usage + description: User swap usage + unit: MiB + chart_type: area + dimensions: + - name: swap + - name: user.disk_physical_io + description: User disk physical IO + unit: KiB/s + chart_type: area + dimensions: + - name: reads + - name: writes + - name: user.disk_logical_io + description: User disk logical IO + unit: KiB/s + chart_type: area + dimensions: + - name: reads + - name: writes + - name: user.processes + description: User processes + unit: processes chart_type: line dimensions: - - name: a dimension per user - - name: users.uptime_max - description: Users Maximum Uptime - unit: "seconds" + - name: processes + - name: user.threads + description: User threads + unit: threads chart_type: line dimensions: - - name: a dimension per user - - name: users.files - description: Users Open Files - unit: "open files" - chart_type: stacked + - name: threads + - name: user.fds_open_limit + description: User open file descriptors limit + unit: percentage + chart_type: line dimensions: - - name: a dimension per user - - name: users.sockets - description: Users Open Sockets - unit: "open sockets" - chart_type: stacked + - name: limit + - name: user.fds_open + description: User open file descriptors + unit: fds + chart_type: stacked + dimensions: + - name: files + - name: sockets + - name: pipes + - name: inotifies + - name: event + - name: timer + - name: signal + - name: eventpolls + - name: other + - name: user.uptime + description: User uptime + unit: seconds + chart_type: line dimensions: - - name: a dimension per user - - name: users.pipes - description: Users Open Pipes - unit: "open pipes" - chart_type: stacked + - name: uptime + - name: user.uptime_summary + description: User uptime summary + unit: seconds + chart_type: area dimensions: - - name: a dimension per user + - name: min + - name: avg + - name: max diff --git 
a/collectors/cgroups.plugin/README.md b/collectors/cgroups.plugin/README.md
index 2e4fff230..ba6a20e5e 100644
--- a/collectors/cgroups.plugin/README.md
+++ b/collectors/cgroups.plugin/README.md
@@ -139,10 +139,10 @@ chart instead of `auto` to enable it permanently. For example:
 You can also set the `enable zero metrics` option to `yes` in the `[global]` section which enables charts with zero metrics for all internal Netdata plugins.
-### Alarms
+### Alerts
-CPU and memory limits are watched and used to rise alarms. Memory usage for every cgroup is checked against `ram`
-and `ram+swap` limits. CPU usage for every cgroup is checked against `cpuset.cpus` and `cpu.cfs_period_us` + `cpu.cfs_quota_us` pair assigned for the cgroup. Configuration for the alarms is available in `health.d/cgroups.conf`
+CPU and memory limits are watched and used to rise alerts. Memory usage for every cgroup is checked against `ram`
+and `ram+swap` limits. CPU usage for every cgroup is checked against `cpuset.cpus` and `cpu.cfs_period_us` + `cpu.cfs_quota_us` pair assigned for the cgroup. Configuration for the alerts is available in `health.d/cgroups.conf`
 file.
 ## Monitoring systemd services
@@ -264,7 +264,7 @@ Network interfaces and cgroups (containers) are self-cleaned. When a network int
 a few errors in error.log complaining about files it cannot find, but immediately:
 1. It will detect this is a removed container or network interface
-2. It will freeze/pause all alarms for them
+2. It will freeze/pause all alerts for them
 3. It will mark their charts as obsolete
 4. Obsolete charts are not be offered on new dashboard sessions (so hit F5 and the charts are gone)
 5. Existing dashboard sessions will continue to see them, but of course they will not refresh
diff --git a/collectors/cgroups.plugin/cgroup-name.sh b/collectors/cgroups.plugin/cgroup-name.sh
index 6edd9d9f0..c0f3d0cb6 100755
--- a/collectors/cgroups.plugin/cgroup-name.sh
+++ b/collectors/cgroups.plugin/cgroup-name.sh
@@ -16,6 +16,21 @@ export LC_ALL=C
 PROGRAM_NAME="$(basename "${0}")"
+LOG_LEVEL_ERR=1
+LOG_LEVEL_WARN=2
+LOG_LEVEL_INFO=3
+LOG_LEVEL="$LOG_LEVEL_INFO"
+
+set_log_severity_level() {
+  case ${NETDATA_LOG_SEVERITY_LEVEL,,} in
+    "info") LOG_LEVEL="$LOG_LEVEL_INFO";;
+    "warn" | "warning") LOG_LEVEL="$LOG_LEVEL_WARN";;
+    "err" | "error") LOG_LEVEL="$LOG_LEVEL_ERR";;
+  esac
+}
+
+set_log_severity_level
+
 logdate() {
   date "+%Y-%m-%d %H:%M:%S"
 }
@@ -28,18 +43,21 @@ log() {
 }
+info() {
+  [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_INFO" -gt "$LOG_LEVEL" ]] && return
+  log INFO "${@}"
+}
+
 warning() {
+  [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_WARN" -gt "$LOG_LEVEL" ]] && return
   log WARNING "${@}"
 }
 error() {
+  [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_ERR" -gt "$LOG_LEVEL" ]] && return
   log ERROR "${@}"
 }
-info() {
-  log INFO "${@}"
-}
-
 fatal() {
   log FATAL "${@}"
   exit 1
diff --git a/collectors/cgroups.plugin/cgroup-network-helper.sh b/collectors/cgroups.plugin/cgroup-network-helper.sh
index 783332f73..008bc987f 100755
--- a/collectors/cgroups.plugin/cgroup-network-helper.sh
+++ b/collectors/cgroups.plugin/cgroup-network-helper.sh
@@ -31,6 +31,21 @@ export LC_ALL=C
 PROGRAM_NAME="$(basename "${0}")"
+LOG_LEVEL_ERR=1
+LOG_LEVEL_WARN=2
+LOG_LEVEL_INFO=3
+LOG_LEVEL="$LOG_LEVEL_INFO"
+
+set_log_severity_level() {
+  case ${NETDATA_LOG_SEVERITY_LEVEL,,} in
+    "info") LOG_LEVEL="$LOG_LEVEL_INFO";;
+    "warn" | "warning") LOG_LEVEL="$LOG_LEVEL_WARN";;
+    "err" | "error") LOG_LEVEL="$LOG_LEVEL_ERR";;
+  esac
+}
+
+set_log_severity_level
+
 logdate() {
   date "+%Y-%m-%d %H:%M:%S"
 }
@@ -43,18 +58,21 @@ log() {
 }
+info() {
+  [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_INFO" -gt "$LOG_LEVEL" ]] && return
+  log INFO "${@}"
+}
+
 warning() {
+  [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_WARN" -gt "$LOG_LEVEL" ]] && return
   log WARNING "${@}"
 }
 error() {
+  [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_ERR" -gt "$LOG_LEVEL" ]] && return
   log ERROR "${@}"
 }
-info() {
-  log INFO "${@}"
-}
-
 fatal() {
   log FATAL "${@}"
   exit 1
diff --git a/collectors/cgroups.plugin/cgroup-network.c b/collectors/cgroups.plugin/cgroup-network.c
index a490df394..b00f246bb 100644
--- a/collectors/cgroups.plugin/cgroup-network.c
+++ b/collectors/cgroups.plugin/cgroup-network.c
@@ -11,9 +11,11 @@
 #endif
 char environment_variable2[FILENAME_MAX + 50] = "";
+char environment_variable3[FILENAME_MAX + 50] = "";
 char *environment[] = {
         "PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin",
         environment_variable2,
+        environment_variable3,
         NULL
 };
@@ -671,6 +673,10 @@ int main(int argc, char **argv) {
     // the first environment variable is a fixed PATH=
     snprintfz(environment_variable2, sizeof(environment_variable2) - 1, "NETDATA_HOST_PREFIX=%s", netdata_configured_host_prefix);
+    char *s = getenv("NETDATA_LOG_SEVERITY_LEVEL");
+    if (s)
+        snprintfz(environment_variable3, sizeof(environment_variable3) - 1, "NETDATA_LOG_SEVERITY_LEVEL=%s", s);
+
     // ------------------------------------------------------------------------
     if(argc == 2 && (!strcmp(argv[1], "version") || !strcmp(argv[1], "-version") || !strcmp(argv[1], "--version") || !strcmp(argv[1], "-v") || !strcmp(argv[1], "-V"))) {
@@ -680,6 +686,8 @@ int main(int argc, char **argv) {
     if(argc != 3)
         usage();
+
+    log_set_global_severity_for_external_plugins();
     int arg = 1;
     int helper = 1;
diff --git a/collectors/cgroups.plugin/integrations/containers.md b/collectors/cgroups.plugin/integrations/containers.md
new file mode 100644
index 000000000..6dec9ce2b
--- /dev/null
+++ b/collectors/cgroups.plugin/integrations/containers.md
@@ -0,0 +1,166 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/integrations/containers.md"
+meta_yaml: 
"https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/metadata.yaml" +sidebar_label: "Containers" +learn_status: "Published" +learn_rel_path: "Data Collection/Containers and VMs" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Containers + + +<img src="https://netdata.cloud/img/container.svg" width="150"/> + + +Plugin: cgroups.plugin +Module: /sys/fs/cgroup + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Containers for performance, resource usage, and health status. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per cgroup + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| container_name | TBD | +| image | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.cpu_limit | used | percentage | +| cgroup.cpu | user, system | percentage | +| cgroup.cpu_per_core | a dimension per core | percentage | +| cgroup.throttled | throttled | percentage | +| cgroup.throttled_duration | duration | ms | +| cgroup.cpu_shares | shares | shares | +| cgroup.mem | cache, rss, swap, rss_huge, mapped_file | MiB | +| cgroup.writeback | dirty, writeback | MiB | +| cgroup.mem_activity | in, out | MiB/s | +| cgroup.pgfaults | pgfault, swap | MiB/s | +| cgroup.mem_usage | ram, swap | MiB | +| cgroup.mem_usage_limit | available, used | MiB | +| cgroup.mem_utilization | utilization | percentage | +| cgroup.mem_failcnt | failures | count | +| cgroup.io | read, write | KiB/s | +| cgroup.serviced_ops | read, write | operations/s | +| cgroup.throttle_io | read, write | KiB/s | +| cgroup.throttle_serviced_ops | read, write | operations/s | +| cgroup.queued_ops | read, write | operations | +| cgroup.merged_ops | read, write | operations/s | +| cgroup.cpu_some_pressure | some10, some60, some300 | percentage | +| cgroup.cpu_some_pressure_stall_time | time | ms | +| cgroup.cpu_full_pressure | some10, some60, some300 | percentage | +| cgroup.cpu_full_pressure_stall_time | time | ms | +| cgroup.memory_some_pressure | some10, some60, some300 | percentage | +| cgroup.memory_some_pressure_stall_time | time | ms | +| cgroup.memory_full_pressure | some10, some60, some300 | percentage | +| cgroup.memory_full_pressure_stall_time | time | ms | +| cgroup.io_some_pressure | some10, some60, some300 | percentage | +| cgroup.io_some_pressure_stall_time | time | ms | +| cgroup.io_full_pressure | some10, some60, some300 | percentage | +| cgroup.io_full_pressure_stall_time | time | ms | + +### Per cgroup network device + + + +Labels: + +| Label | 
Description | +|:-----------|:----------------| +| container_name | TBD | +| image | TBD | +| device | TBD | +| interface_type | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.net_net | received, sent | kilobits/s | +| cgroup.net_packets | received, sent, multicast | pps | +| cgroup.net_errors | inbound, outbound | errors/s | +| cgroup.net_drops | inbound, outbound | errors/s | +| cgroup.net_fifo | receive, transmit | errors/s | +| cgroup.net_compressed | receive, sent | pps | +| cgroup.net_events | frames, collisions, carrier | events/s | +| cgroup.net_operstate | up, down, notpresent, lowerlayerdown, testing, dormant, unknown | state | +| cgroup.net_carrier | up, down | state | +| cgroup.net_mtu | mtu | octets | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ cgroup_10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.cpu_limit | average cgroup CPU utilization over the last 10 minutes | +| [ cgroup_ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.mem_usage | cgroup memory utilization | +| [ cgroup_1m_received_packets_rate ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | average number of packets received by the network interface ${label:device} over the last minute | +| [ cgroup_10s_received_packets_storm ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. 
+ +#### Examples +There are no configuration examples. + + diff --git a/collectors/cgroups.plugin/integrations/kubernetes_containers.md b/collectors/cgroups.plugin/integrations/kubernetes_containers.md new file mode 100644 index 000000000..4bfa55c6d --- /dev/null +++ b/collectors/cgroups.plugin/integrations/kubernetes_containers.md @@ -0,0 +1,184 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/integrations/kubernetes_containers.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/metadata.yaml" +sidebar_label: "Kubernetes Containers" +learn_status: "Published" +learn_rel_path: "Data Collection/Kubernetes" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Kubernetes Containers + + +<img src="https://netdata.cloud/img/kubernetes.svg" width="150"/> + + +Plugin: cgroups.plugin +Module: /sys/fs/cgroup + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Containers for performance, resource usage, and health status. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
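Editorial note (not part of the patch): the sentence above says an instance is identified by its set of labels. A minimal sketch of that idea follows, assuming one value per metric for brevity. It is not Netdata's internal implementation; the label values, the numbers, and the helper names `instance_key` and `group_by_instance` are hypothetical, and only the label and metric names are taken from the tables in this file.

```python
from collections import defaultdict

# Hypothetical samples: (metric name, labels, value).
# Label values and numbers are invented for illustration only.
samples = [
    ("k8s.cgroup.cpu",
     {"k8s_namespace": "default", "k8s_pod_name": "web-0", "k8s_container_name": "nginx"},
     12.5),
    ("k8s.cgroup.mem_usage",
     {"k8s_namespace": "default", "k8s_pod_name": "web-0", "k8s_container_name": "nginx"},
     210.0),
    ("k8s.cgroup.cpu",
     {"k8s_namespace": "default", "k8s_pod_name": "db-0", "k8s_container_name": "postgres"},
     40.0),
]


def instance_key(labels):
    """Return a hashable identity for a label set, independent of key order."""
    return tuple(sorted(labels.items()))


def group_by_instance(rows):
    """Map each instance (label set) to the metrics reported for it."""
    instances = defaultdict(dict)
    for metric, labels, value in rows:
        instances[instance_key(labels)][metric] = value
    return dict(instances)


instances = group_by_instance(samples)
for key, metrics in instances.items():
    print(dict(key), metrics)
```

Sorting the label pairs before building the key means two samples with the same labels in a different order still land in the same instance.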
+ + + +### Per k8s cgroup + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| k8s_namespace | TBD | +| k8s_pod_name | TBD | +| k8s_pod_uid | TBD | +| k8s_controller_kind | TBD | +| k8s_controller_name | TBD | +| k8s_node_name | TBD | +| k8s_container_name | TBD | +| k8s_container_id | TBD | +| k8s_kind | TBD | +| k8s_qos_class | TBD | +| k8s_cluster_id | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| k8s.cgroup.cpu_limit | used | percentage | +| k8s.cgroup.cpu | user, system | percentage | +| k8s.cgroup.cpu_per_core | a dimension per core | percentage | +| k8s.cgroup.throttled | throttled | percentage | +| k8s.cgroup.throttled_duration | duration | ms | +| k8s.cgroup.cpu_shares | shares | shares | +| k8s.cgroup.mem | cache, rss, swap, rss_huge, mapped_file | MiB | +| k8s.cgroup.writeback | dirty, writeback | MiB | +| k8s.cgroup.mem_activity | in, out | MiB/s | +| k8s.cgroup.pgfaults | pgfault, swap | MiB/s | +| k8s.cgroup.mem_usage | ram, swap | MiB | +| k8s.cgroup.mem_usage_limit | available, used | MiB | +| k8s.cgroup.mem_utilization | utilization | percentage | +| k8s.cgroup.mem_failcnt | failures | count | +| k8s.cgroup.io | read, write | KiB/s | +| k8s.cgroup.serviced_ops | read, write | operations/s | +| k8s.cgroup.throttle_io | read, write | KiB/s | +| k8s.cgroup.throttle_serviced_ops | read, write | operations/s | +| k8s.cgroup.queued_ops | read, write | operations | +| k8s.cgroup.merged_ops | read, write | operations/s | +| k8s.cgroup.cpu_some_pressure | some10, some60, some300 | percentage | +| k8s.cgroup.cpu_some_pressure_stall_time | time | ms | +| k8s.cgroup.cpu_full_pressure | some10, some60, some300 | percentage | +| k8s.cgroup.cpu_full_pressure_stall_time | time | ms | +| k8s.cgroup.memory_some_pressure | some10, some60, some300 | percentage | +| k8s.cgroup.memory_some_pressure_stall_time | time | ms | +| k8s.cgroup.memory_full_pressure | some10, some60, some300 | percentage | +| 
k8s.cgroup.memory_full_pressure_stall_time | time | ms | +| k8s.cgroup.io_some_pressure | some10, some60, some300 | percentage | +| k8s.cgroup.io_some_pressure_stall_time | time | ms | +| k8s.cgroup.io_full_pressure | some10, some60, some300 | percentage | +| k8s.cgroup.io_full_pressure_stall_time | time | ms | + +### Per k8s cgroup network device + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| device | TBD | +| interface_type | TBD | +| k8s_namespace | TBD | +| k8s_pod_name | TBD | +| k8s_pod_uid | TBD | +| k8s_controller_kind | TBD | +| k8s_controller_name | TBD | +| k8s_node_name | TBD | +| k8s_container_name | TBD | +| k8s_container_id | TBD | +| k8s_kind | TBD | +| k8s_qos_class | TBD | +| k8s_cluster_id | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| k8s.cgroup.net_net | received, sent | kilobits/s | +| k8s.cgroup.net_packets | received, sent, multicast | pps | +| k8s.cgroup.net_errors | inbound, outbound | errors/s | +| k8s.cgroup.net_drops | inbound, outbound | errors/s | +| k8s.cgroup.net_fifo | receive, transmit | errors/s | +| k8s.cgroup.net_compressed | receive, sent | pps | +| k8s.cgroup.net_events | frames, collisions, carrier | events/s | +| k8s.cgroup.net_operstate | up, down, notpresent, lowerlayerdown, testing, dormant, unknown | state | +| k8s.cgroup.net_carrier | up, down | state | +| k8s.cgroup.net_mtu | mtu | octets | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ k8s_cgroup_10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | k8s.cgroup.cpu_limit | average cgroup CPU utilization over the last 10 minutes | +| [ k8s_cgroup_ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | k8s.cgroup.mem_usage | cgroup memory utilization | +| [ k8s_cgroup_1m_received_packets_rate 
](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | k8s.cgroup.net_packets | average number of packets received by the network interface ${label:device} over the last minute | +| [ k8s_cgroup_10s_received_packets_storm ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | k8s.cgroup.net_packets | ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/cgroups.plugin/integrations/libvirt_containers.md b/collectors/cgroups.plugin/integrations/libvirt_containers.md new file mode 100644 index 000000000..af0310b10 --- /dev/null +++ b/collectors/cgroups.plugin/integrations/libvirt_containers.md @@ -0,0 +1,166 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/integrations/libvirt_containers.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/metadata.yaml" +sidebar_label: "Libvirt Containers" +learn_status: "Published" +learn_rel_path: "Data Collection/Containers and VMs" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Libvirt Containers + + +<img src="https://netdata.cloud/img/libvirt.png" width="150"/> + + +Plugin: cgroups.plugin +Module: /sys/fs/cgroup + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Libvirt for performance, resource usage, and health status. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. 
+ + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per cgroup + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| container_name | TBD | +| image | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.cpu_limit | used | percentage | +| cgroup.cpu | user, system | percentage | +| cgroup.cpu_per_core | a dimension per core | percentage | +| cgroup.throttled | throttled | percentage | +| cgroup.throttled_duration | duration | ms | +| cgroup.cpu_shares | shares | shares | +| cgroup.mem | cache, rss, swap, rss_huge, mapped_file | MiB | +| cgroup.writeback | dirty, writeback | MiB | +| cgroup.mem_activity | in, out | MiB/s | +| cgroup.pgfaults | pgfault, swap | MiB/s | +| cgroup.mem_usage | ram, swap | MiB | +| cgroup.mem_usage_limit | available, used | MiB | +| cgroup.mem_utilization | utilization | percentage | +| cgroup.mem_failcnt | failures | count | +| cgroup.io | read, write | KiB/s | +| cgroup.serviced_ops | read, write | operations/s | +| cgroup.throttle_io | read, write | KiB/s | +| cgroup.throttle_serviced_ops | read, write | operations/s | +| cgroup.queued_ops | read, write | operations | +| cgroup.merged_ops | read, write | operations/s | +| cgroup.cpu_some_pressure | some10, some60, some300 | percentage | +| cgroup.cpu_some_pressure_stall_time | time | ms | +| cgroup.cpu_full_pressure | some10, some60, some300 | percentage | +| cgroup.cpu_full_pressure_stall_time | time | ms | +| cgroup.memory_some_pressure | 
some10, some60, some300 | percentage | +| cgroup.memory_some_pressure_stall_time | time | ms | +| cgroup.memory_full_pressure | some10, some60, some300 | percentage | +| cgroup.memory_full_pressure_stall_time | time | ms | +| cgroup.io_some_pressure | some10, some60, some300 | percentage | +| cgroup.io_some_pressure_stall_time | time | ms | +| cgroup.io_full_pressure | some10, some60, some300 | percentage | +| cgroup.io_full_pressure_stall_time | time | ms | + +### Per cgroup network device + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| container_name | TBD | +| image | TBD | +| device | TBD | +| interface_type | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.net_net | received, sent | kilobits/s | +| cgroup.net_packets | received, sent, multicast | pps | +| cgroup.net_errors | inbound, outbound | errors/s | +| cgroup.net_drops | inbound, outbound | errors/s | +| cgroup.net_fifo | receive, transmit | errors/s | +| cgroup.net_compressed | receive, sent | pps | +| cgroup.net_events | frames, collisions, carrier | events/s | +| cgroup.net_operstate | up, down, notpresent, lowerlayerdown, testing, dormant, unknown | state | +| cgroup.net_carrier | up, down | state | +| cgroup.net_mtu | mtu | octets | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ cgroup_10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.cpu_limit | average cgroup CPU utilization over the last 10 minutes | +| [ cgroup_ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.mem_usage | cgroup memory utilization | +| [ cgroup_1m_received_packets_rate ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | average number of packets received by the network interface ${label:device} over the 
last minute | +| [ cgroup_10s_received_packets_storm ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/cgroups.plugin/integrations/lxc_containers.md b/collectors/cgroups.plugin/integrations/lxc_containers.md new file mode 100644 index 000000000..becc9ae17 --- /dev/null +++ b/collectors/cgroups.plugin/integrations/lxc_containers.md @@ -0,0 +1,166 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/integrations/lxc_containers.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/metadata.yaml" +sidebar_label: "LXC Containers" +learn_status: "Published" +learn_rel_path: "Data Collection/Containers and VMs" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# LXC Containers + + +<img src="https://netdata.cloud/img/lxc.png" width="150"/> + + +Plugin: cgroups.plugin +Module: /sys/fs/cgroup + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor LXC Containers for performance, resource usage, and health status. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. 
+ +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per cgroup + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| container_name | TBD | +| image | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.cpu_limit | used | percentage | +| cgroup.cpu | user, system | percentage | +| cgroup.cpu_per_core | a dimension per core | percentage | +| cgroup.throttled | throttled | percentage | +| cgroup.throttled_duration | duration | ms | +| cgroup.cpu_shares | shares | shares | +| cgroup.mem | cache, rss, swap, rss_huge, mapped_file | MiB | +| cgroup.writeback | dirty, writeback | MiB | +| cgroup.mem_activity | in, out | MiB/s | +| cgroup.pgfaults | pgfault, swap | MiB/s | +| cgroup.mem_usage | ram, swap | MiB | +| cgroup.mem_usage_limit | available, used | MiB | +| cgroup.mem_utilization | utilization | percentage | +| cgroup.mem_failcnt | failures | count | +| cgroup.io | read, write | KiB/s | +| cgroup.serviced_ops | read, write | operations/s | +| cgroup.throttle_io | read, write | KiB/s | +| cgroup.throttle_serviced_ops | read, write | operations/s | +| cgroup.queued_ops | read, write | operations | +| cgroup.merged_ops | read, write | operations/s | +| cgroup.cpu_some_pressure | some10, some60, some300 | percentage | +| cgroup.cpu_some_pressure_stall_time | time | ms | +| cgroup.cpu_full_pressure | some10, some60, some300 | percentage | +| cgroup.cpu_full_pressure_stall_time | time | ms | +| cgroup.memory_some_pressure | some10, some60, some300 | percentage | +| cgroup.memory_some_pressure_stall_time | time | ms | +| cgroup.memory_full_pressure | some10, some60, some300 | percentage | +| cgroup.memory_full_pressure_stall_time | time | 
ms | +| cgroup.io_some_pressure | some10, some60, some300 | percentage | +| cgroup.io_some_pressure_stall_time | time | ms | +| cgroup.io_full_pressure | some10, some60, some300 | percentage | +| cgroup.io_full_pressure_stall_time | time | ms | + +### Per cgroup network device + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| container_name | TBD | +| image | TBD | +| device | TBD | +| interface_type | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.net_net | received, sent | kilobits/s | +| cgroup.net_packets | received, sent, multicast | pps | +| cgroup.net_errors | inbound, outbound | errors/s | +| cgroup.net_drops | inbound, outbound | errors/s | +| cgroup.net_fifo | receive, transmit | errors/s | +| cgroup.net_compressed | receive, sent | pps | +| cgroup.net_events | frames, collisions, carrier | events/s | +| cgroup.net_operstate | up, down, notpresent, lowerlayerdown, testing, dormant, unknown | state | +| cgroup.net_carrier | up, down | state | +| cgroup.net_mtu | mtu | octets | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ cgroup_10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.cpu_limit | average cgroup CPU utilization over the last 10 minutes | +| [ cgroup_ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.mem_usage | cgroup memory utilization | +| [ cgroup_1m_received_packets_rate ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | average number of packets received by the network interface ${label:device} over the last minute | +| [ cgroup_10s_received_packets_storm ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | ratio of average number of received packets for the network 
interface ${label:device} over the last 10 seconds, compared to the rate over the last minute | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/cgroups.plugin/integrations/ovirt_containers.md b/collectors/cgroups.plugin/integrations/ovirt_containers.md new file mode 100644 index 000000000..c9f6d74b7 --- /dev/null +++ b/collectors/cgroups.plugin/integrations/ovirt_containers.md @@ -0,0 +1,166 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/integrations/ovirt_containers.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/metadata.yaml" +sidebar_label: "oVirt Containers" +learn_status: "Published" +learn_rel_path: "Data Collection/Containers and VMs" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# oVirt Containers + + +<img src="https://netdata.cloud/img/ovirt.svg" width="150"/> + + +Plugin: cgroups.plugin +Module: /sys/fs/cgroup + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor oVirt for performance, resource usage, and health status. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. 
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per cgroup
+
+
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| container_name | TBD |
+| image | TBD |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.cpu_limit | used | percentage |
+| cgroup.cpu | user, system | percentage |
+| cgroup.cpu_per_core | a dimension per core | percentage |
+| cgroup.throttled | throttled | percentage |
+| cgroup.throttled_duration | duration | ms |
+| cgroup.cpu_shares | shares | shares |
+| cgroup.mem | cache, rss, swap, rss_huge, mapped_file | MiB |
+| cgroup.writeback | dirty, writeback | MiB |
+| cgroup.mem_activity | in, out | MiB/s |
+| cgroup.pgfaults | pgfault, swap | MiB/s |
+| cgroup.mem_usage | ram, swap | MiB |
+| cgroup.mem_usage_limit | available, used | MiB |
+| cgroup.mem_utilization | utilization | percentage |
+| cgroup.mem_failcnt | failures | count |
+| cgroup.io | read, write | KiB/s |
+| cgroup.serviced_ops | read, write | operations/s |
+| cgroup.throttle_io | read, write | KiB/s |
+| cgroup.throttle_serviced_ops | read, write | operations/s |
+| cgroup.queued_ops | read, write | operations |
+| cgroup.merged_ops | read, write | operations/s |
+| cgroup.cpu_some_pressure | some10, some60, some300 | percentage |
+| cgroup.cpu_some_pressure_stall_time | time | ms |
+| cgroup.cpu_full_pressure | some10, some60, some300 | percentage |
+| cgroup.cpu_full_pressure_stall_time | time | ms |
+| cgroup.memory_some_pressure | some10, some60, some300 | percentage |
+| cgroup.memory_some_pressure_stall_time | time | ms |
+| cgroup.memory_full_pressure | some10, some60, some300 | percentage |
+| cgroup.memory_full_pressure_stall_time | time | ms |
+| cgroup.io_some_pressure | some10, some60, some300 | percentage |
+| cgroup.io_some_pressure_stall_time | time | ms |
+| cgroup.io_full_pressure | some10, some60, some300 | percentage |
+| cgroup.io_full_pressure_stall_time | time | ms |
+
+### Per cgroup network device
+
+
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| container_name | TBD |
+| image | TBD |
+| device | TBD |
+| interface_type | TBD |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.net_net | received, sent | kilobits/s |
+| cgroup.net_packets | received, sent, multicast | pps |
+| cgroup.net_errors | inbound, outbound | errors/s |
+| cgroup.net_drops | inbound, outbound | errors/s |
+| cgroup.net_fifo | receive, transmit | errors/s |
+| cgroup.net_compressed | receive, sent | pps |
+| cgroup.net_events | frames, collisions, carrier | events/s |
+| cgroup.net_operstate | up, down, notpresent, lowerlayerdown, testing, dormant, unknown | state |
+| cgroup.net_carrier | up, down | state |
+| cgroup.net_mtu | mtu | octets |
+
+
+
+## Alerts
+
+
+The following alerts are available:
+
+| Alert name | On metric | Description |
+|:------------|:----------|:------------|
+| [ cgroup_10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.cpu_limit | average cgroup CPU utilization over the last 10 minutes |
+| [ cgroup_ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.mem_usage | cgroup memory utilization |
+| [ cgroup_1m_received_packets_rate ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | average number of packets received by the network interface ${label:device} over the last minute |
+| [ cgroup_10s_received_packets_storm ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute |
+
+
+## Setup
+
+### Prerequisites
+
+No action required.
+
+### Configuration
+
+#### File
+
+There is no configuration file.
+#### Options
+
+
+
+There are no configuration options.
+
+#### Examples
+There are no configuration examples.
+
+
diff --git a/collectors/cgroups.plugin/integrations/proxmox_containers.md b/collectors/cgroups.plugin/integrations/proxmox_containers.md
new file mode 100644
index 000000000..2caad5eac
--- /dev/null
+++ b/collectors/cgroups.plugin/integrations/proxmox_containers.md
@@ -0,0 +1,166 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/integrations/proxmox_containers.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/metadata.yaml"
+sidebar_label: "Proxmox Containers"
+learn_status: "Published"
+learn_rel_path: "Data Collection/Containers and VMs"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# Proxmox Containers
+
+
+<img src="https://netdata.cloud/img/proxmox.png" width="150"/>
+
+
+Plugin: cgroups.plugin
+Module: /sys/fs/cgroup
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor Proxmox for performance, resource usage, and health status.
+
+
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+This integration doesn't support auto-detection.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per cgroup
+
+
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| container_name | TBD |
+| image | TBD |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.cpu_limit | used | percentage |
+| cgroup.cpu | user, system | percentage |
+| cgroup.cpu_per_core | a dimension per core | percentage |
+| cgroup.throttled | throttled | percentage |
+| cgroup.throttled_duration | duration | ms |
+| cgroup.cpu_shares | shares | shares |
+| cgroup.mem | cache, rss, swap, rss_huge, mapped_file | MiB |
+| cgroup.writeback | dirty, writeback | MiB |
+| cgroup.mem_activity | in, out | MiB/s |
+| cgroup.pgfaults | pgfault, swap | MiB/s |
+| cgroup.mem_usage | ram, swap | MiB |
+| cgroup.mem_usage_limit | available, used | MiB |
+| cgroup.mem_utilization | utilization | percentage |
+| cgroup.mem_failcnt | failures | count |
+| cgroup.io | read, write | KiB/s |
+| cgroup.serviced_ops | read, write | operations/s |
+| cgroup.throttle_io | read, write | KiB/s |
+| cgroup.throttle_serviced_ops | read, write | operations/s |
+| cgroup.queued_ops | read, write | operations |
+| cgroup.merged_ops | read, write | operations/s |
+| cgroup.cpu_some_pressure | some10, some60, some300 | percentage |
+| cgroup.cpu_some_pressure_stall_time | time | ms |
+| cgroup.cpu_full_pressure | some10, some60, some300 | percentage |
+| cgroup.cpu_full_pressure_stall_time | time | ms |
+| cgroup.memory_some_pressure | some10, some60, some300 | percentage |
+| cgroup.memory_some_pressure_stall_time | time | ms |
+| cgroup.memory_full_pressure | some10, some60, some300 | percentage |
+| cgroup.memory_full_pressure_stall_time | time | ms |
+| cgroup.io_some_pressure | some10, some60, some300 | percentage |
+| cgroup.io_some_pressure_stall_time | time | ms |
+| cgroup.io_full_pressure | some10, some60, some300 | percentage |
+| cgroup.io_full_pressure_stall_time | time | ms |
+
+### Per cgroup network device
+
+
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| container_name | TBD |
+| image | TBD |
+| device | TBD |
+| interface_type | TBD |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.net_net | received, sent | kilobits/s |
+| cgroup.net_packets | received, sent, multicast | pps |
+| cgroup.net_errors | inbound, outbound | errors/s |
+| cgroup.net_drops | inbound, outbound | errors/s |
+| cgroup.net_fifo | receive, transmit | errors/s |
+| cgroup.net_compressed | receive, sent | pps |
+| cgroup.net_events | frames, collisions, carrier | events/s |
+| cgroup.net_operstate | up, down, notpresent, lowerlayerdown, testing, dormant, unknown | state |
+| cgroup.net_carrier | up, down | state |
+| cgroup.net_mtu | mtu | octets |
+
+
+
+## Alerts
+
+
+The following alerts are available:
+
+| Alert name | On metric | Description |
+|:------------|:----------|:------------|
+| [ cgroup_10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.cpu_limit | average cgroup CPU utilization over the last 10 minutes |
+| [ cgroup_ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.mem_usage | cgroup memory utilization |
+| [ cgroup_1m_received_packets_rate ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | average number of packets received by the network interface ${label:device} over the last minute |
+| [ cgroup_10s_received_packets_storm ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute |
+
+
+## Setup
+
+### Prerequisites
+
+No action required.
+
+### Configuration
+
+#### File
+
+There is no configuration file.
+#### Options
+
+
+
+There are no configuration options.
+
+#### Examples
+There are no configuration examples.
+
+
diff --git a/collectors/cgroups.plugin/integrations/systemd_services.md b/collectors/cgroups.plugin/integrations/systemd_services.md
new file mode 100644
index 000000000..b71060050
--- /dev/null
+++ b/collectors/cgroups.plugin/integrations/systemd_services.md
@@ -0,0 +1,110 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/integrations/systemd_services.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/metadata.yaml"
+sidebar_label: "Systemd Services"
+learn_status: "Published"
+learn_rel_path: "Data Collection/Systemd"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# Systemd Services
+
+
+<img src="https://netdata.cloud/img/systemd.svg" width="150"/>
+
+
+Plugin: cgroups.plugin
+Module: /sys/fs/cgroup
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor Containers for performance, resource usage, and health status.
+
+
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+This integration doesn't support auto-detection.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per systemd service
+
+
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| service_name | Service name |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| systemd.service.cpu.utilization | user, system | percentage |
+| systemd.service.memory.usage | ram, swap | MiB |
+| systemd.service.memory.failcnt | fail | failures/s |
+| systemd.service.memory.ram.usage | rss, cache, mapped_file, rss_huge | MiB |
+| systemd.service.memory.writeback | writeback, dirty | MiB |
+| systemd.service.memory.paging.faults | minor, major | MiB/s |
+| systemd.service.memory.paging.io | in, out | MiB/s |
+| systemd.service.disk.io | read, write | KiB/s |
+| systemd.service.disk.iops | read, write | operations/s |
+| systemd.service.disk.throttle.io | read, write | KiB/s |
+| systemd.service.disk.throttle.iops | read, write | operations/s |
+| systemd.service.disk.queued_iops | read, write | operations/s |
+| systemd.service.disk.merged_iops | read, write | operations/s |
+
+
+
+## Alerts
+
+There are no alerts configured by default for this integration.
+
+
+## Setup
+
+### Prerequisites
+
+No action required.
+
+### Configuration
+
+#### File
+
+There is no configuration file.
+#### Options
+
+
+
+There are no configuration options.
+
+#### Examples
+There are no configuration examples.
+
+
diff --git a/collectors/cgroups.plugin/integrations/virtual_machines.md b/collectors/cgroups.plugin/integrations/virtual_machines.md
new file mode 100644
index 000000000..3bb79c128
--- /dev/null
+++ b/collectors/cgroups.plugin/integrations/virtual_machines.md
@@ -0,0 +1,166 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/integrations/virtual_machines.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/cgroups.plugin/metadata.yaml"
+sidebar_label: "Virtual Machines"
+learn_status: "Published"
+learn_rel_path: "Data Collection/Containers and VMs"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# Virtual Machines
+
+
+<img src="https://netdata.cloud/img/container.svg" width="150"/>
+
+
+Plugin: cgroups.plugin
+Module: /sys/fs/cgroup
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor Virtual Machines for performance, resource usage, and health status.
+
+
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+This integration doesn't support auto-detection.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per cgroup
+
+
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| container_name | TBD |
+| image | TBD |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.cpu_limit | used | percentage |
+| cgroup.cpu | user, system | percentage |
+| cgroup.cpu_per_core | a dimension per core | percentage |
+| cgroup.throttled | throttled | percentage |
+| cgroup.throttled_duration | duration | ms |
+| cgroup.cpu_shares | shares | shares |
+| cgroup.mem | cache, rss, swap, rss_huge, mapped_file | MiB |
+| cgroup.writeback | dirty, writeback | MiB |
+| cgroup.mem_activity | in, out | MiB/s |
+| cgroup.pgfaults | pgfault, swap | MiB/s |
+| cgroup.mem_usage | ram, swap | MiB |
+| cgroup.mem_usage_limit | available, used | MiB |
+| cgroup.mem_utilization | utilization | percentage |
+| cgroup.mem_failcnt | failures | count |
+| cgroup.io | read, write | KiB/s |
+| cgroup.serviced_ops | read, write | operations/s |
+| cgroup.throttle_io | read, write | KiB/s |
+| cgroup.throttle_serviced_ops | read, write | operations/s |
+| cgroup.queued_ops | read, write | operations |
+| cgroup.merged_ops | read, write | operations/s |
+| cgroup.cpu_some_pressure | some10, some60, some300 | percentage |
+| cgroup.cpu_some_pressure_stall_time | time | ms |
+| cgroup.cpu_full_pressure | some10, some60, some300 | percentage |
+| cgroup.cpu_full_pressure_stall_time | time | ms |
+| cgroup.memory_some_pressure | some10, some60, some300 | percentage |
+| cgroup.memory_some_pressure_stall_time | time | ms |
+| cgroup.memory_full_pressure | some10, some60, some300 | percentage |
+| cgroup.memory_full_pressure_stall_time | time | ms |
+| cgroup.io_some_pressure | some10, some60, some300 | percentage |
+| cgroup.io_some_pressure_stall_time | time | ms |
+| cgroup.io_full_pressure | some10, some60, some300 | percentage |
+| cgroup.io_full_pressure_stall_time | time | ms |
+
+### Per cgroup network device
+
+
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| container_name | TBD |
+| image | TBD |
+| device | TBD |
+| interface_type | TBD |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.net_net | received, sent | kilobits/s |
+| cgroup.net_packets | received, sent, multicast | pps |
+| cgroup.net_errors | inbound, outbound | errors/s |
+| cgroup.net_drops | inbound, outbound | errors/s |
+| cgroup.net_fifo | receive, transmit | errors/s |
+| cgroup.net_compressed | receive, sent | pps |
+| cgroup.net_events | frames, collisions, carrier | events/s |
+| cgroup.net_operstate | up, down, notpresent, lowerlayerdown, testing, dormant, unknown | state |
+| cgroup.net_carrier | up, down | state |
+| cgroup.net_mtu | mtu | octets |
+
+
+
+## Alerts
+
+
+The following alerts are available:
+
+| Alert name | On metric | Description |
+|:------------|:----------|:------------|
+| [ cgroup_10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.cpu_limit | average cgroup CPU utilization over the last 10 minutes |
+| [ cgroup_ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.mem_usage | cgroup memory utilization |
+| [ cgroup_1m_received_packets_rate ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | average number of packets received by the network interface ${label:device} over the last minute |
+| [ cgroup_10s_received_packets_storm ](https://github.com/netdata/netdata/blob/master/health/health.d/cgroups.conf) | cgroup.net_packets | ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute |
+
+
+## Setup
+
+### Prerequisites
+
+No action required.
+
+### Configuration
+
+#### File
+
+There is no configuration file.
+#### Options
+
+
+
+There are no configuration options.
+ +#### Examples +There are no configuration examples. + + diff --git a/collectors/cgroups.plugin/metadata.yaml b/collectors/cgroups.plugin/metadata.yaml index b342d30a3..ec6228ea2 100644 --- a/collectors/cgroups.plugin/metadata.yaml +++ b/collectors/cgroups.plugin/metadata.yaml @@ -406,7 +406,7 @@ modules: link: https://kubernetes.io/ icon_filename: kubernetes.svg categories: - - data-collection.containers-and-vms + #- data-collection.containers-and-vms - data-collection.kubernetes keywords: - k8s @@ -821,154 +821,104 @@ modules: description: "" availability: [] scopes: - - name: global + - name: systemd service description: "" - labels: [] + labels: + - name: service_name + description: Service name metrics: - - name: services.cpu + - name: systemd.service.cpu.utilization description: Systemd Services CPU utilization (100% = 1 core) - unit: "percentage" + unit: percentage chart_type: stacked dimensions: - - name: a dimension per systemd service - - name: services.mem_usage + - name: user + - name: system + - name: systemd.service.memory.usage description: Systemd Services Used Memory - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.mem_rss - description: Systemd Services RSS Memory - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.mem_mapped - description: Systemd Services Mapped Memory - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.mem_cache - description: Systemd Services Cache Memory - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.mem_writeback - description: Systemd Services Writeback Memory - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.mem_pgfault - description: Systemd Services Memory Minor Page Faults - unit: "MiB/s" - chart_type: stacked - 
dimensions: - - name: a dimension per systemd service - - name: services.mem_pgmajfault - description: Systemd Services Memory Major Page Faults - unit: "MiB/s" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.mem_pgpgin - description: Systemd Services Memory Charging Activity - unit: "MiB/s" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.mem_pgpgout - description: Systemd Services Memory Uncharging Activity - unit: "MiB/s" + unit: MiB chart_type: stacked dimensions: - - name: a dimension per systemd service - - name: services.mem_failcnt + - name: ram + - name: swap + - name: systemd.service.memory.failcnt description: Systemd Services Memory Limit Failures - unit: "failures" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.swap_usage - description: Systemd Services Swap Memory Used - unit: "MiB" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.io_read - description: Systemd Services Disk Read Bandwidth - unit: "KiB/s" - chart_type: stacked - dimensions: - - name: a dimension per systemd service - - name: services.io_write - description: Systemd Services Disk Write Bandwidth - unit: "KiB/s" - chart_type: stacked + unit: failures/s + chart_type: line dimensions: - - name: a dimension per systemd service - - name: services.io_ops_read - description: Systemd Services Disk Read Operations - unit: "operations/s" + - name: fail + - name: systemd.service.memory.ram.usage + description: Systemd Services Memory + unit: MiB chart_type: stacked dimensions: - - name: a dimension per systemd service - - name: services.io_ops_write - description: Systemd Services Disk Write Operations - unit: "operations/s" + - name: rss + - name: cache + - name: mapped_file + - name: rss_huge + - name: systemd.service.memory.writeback + description: Systemd Services Writeback Memory + unit: MiB 
chart_type: stacked dimensions: - - name: a dimension per systemd service - - name: services.throttle_io_read - description: Systemd Services Throttle Disk Read Bandwidth - unit: "KiB/s" - chart_type: stacked + - name: writeback + - name: dirty + - name: systemd.service.memory.paging.faults + description: Systemd Services Memory Minor and Major Page Faults + unit: MiB/s + chart_type: area dimensions: - - name: a dimension per systemd service - - name: services.services.throttle_io_write - description: Systemd Services Throttle Disk Write Bandwidth - unit: "KiB/s" - chart_type: stacked + - name: minor + - name: major + - name: systemd.service.memory.paging.io + description: Systemd Services Memory Paging IO + unit: MiB/s + chart_type: area dimensions: - - name: a dimension per systemd service - - name: services.throttle_io_ops_read - description: Systemd Services Throttle Disk Read Operations - unit: "operations/s" - chart_type: stacked + - name: in + - name: out + - name: systemd.service.disk.io + description: Systemd Services Disk Read/Write Bandwidth + unit: KiB/s + chart_type: area dimensions: - - name: a dimension per systemd service - - name: throttle_io_ops_write - description: Systemd Services Throttle Disk Write Operations - unit: "operations/s" - chart_type: stacked + - name: read + - name: write + - name: systemd.service.disk.iops + description: Systemd Services Disk Read/Write Operations + unit: operations/s + chart_type: line dimensions: - - name: a dimension per systemd service - - name: services.queued_io_ops_read - description: Systemd Services Queued Disk Read Operations - unit: "operations/s" - chart_type: stacked + - name: read + - name: write + - name: systemd.service.disk.throttle.io + description: Systemd Services Throttle Disk Read/Write Bandwidth + unit: KiB/s + chart_type: area dimensions: - - name: a dimension per systemd service - - name: services.queued_io_ops_write - description: Systemd Services Queued Disk Write Operations - unit: 
"operations/s" - chart_type: stacked + - name: read + - name: write + - name: systemd.service.disk.throttle.iops + description: Systemd Services Throttle Disk Read/Write Operations + unit: operations/s + chart_type: line dimensions: - - name: a dimension per systemd service - - name: services.merged_io_ops_read - description: Systemd Services Merged Disk Read Operations - unit: "operations/s" - chart_type: stacked + - name: read + - name: write + - name: systemd.service.disk.queued_iops + description: Systemd Services Queued Disk Read/Write Operations + unit: operations/s + chart_type: line dimensions: - - name: a dimension per systemd service - - name: services.merged_io_ops_write - description: Systemd Services Merged Disk Write Operations - unit: "operations/s" - chart_type: stacked + - name: read + - name: write + - name: systemd.service.disk.merged_iops + description: Systemd Services Merged Disk Read/Write Operations + unit: operations/s + chart_type: line dimensions: - - name: a dimension per systemd service + - name: read + - name: write - <<: *module meta: <<: *meta diff --git a/collectors/cgroups.plugin/sys_fs_cgroup.c b/collectors/cgroups.plugin/sys_fs_cgroup.c index 9c7488c82..3bb8e7d3e 100644 --- a/collectors/cgroups.plugin/sys_fs_cgroup.c +++ b/collectors/cgroups.plugin/sys_fs_cgroup.c @@ -38,6 +38,7 @@ // cgroup globals static char cgroup_chart_id_prefix[] = "cgroup_"; +static char services_chart_id_prefix[] = "systemd_"; static int is_inside_k8s = 0; @@ -796,17 +797,20 @@ struct cgroup { char enabled; // enabled in the config char pending_renames; - char *intermediate_id; // TODO: remove it when the renaming script is fixed char *id; uint32_t hash; + char *intermediate_id; // TODO: remove it when the renaming script is fixed + char *chart_id; - uint32_t hash_chart; + uint32_t hash_chart_id; - char *chart_title; + // 'cgroup_name' label value. 
+ // By default this is the *id (path), later changed to the resolved name (cgroup-name.sh) or systemd service name. + char *name; - DICTIONARY *chart_labels; + RRDLABELS *chart_labels; int container_orchestrator; @@ -878,35 +882,6 @@ struct cgroup { unsigned long long memoryswap_limit; const RRDSETVAR_ACQUIRED *chart_var_memoryswap_limit; - // services - RRDDIM *rd_cpu; - RRDDIM *rd_mem_usage; - RRDDIM *rd_mem_failcnt; - RRDDIM *rd_swap_usage; - - RRDDIM *rd_mem_detailed_cache; - RRDDIM *rd_mem_detailed_rss; - RRDDIM *rd_mem_detailed_mapped; - RRDDIM *rd_mem_detailed_writeback; - RRDDIM *rd_mem_detailed_pgpgin; - RRDDIM *rd_mem_detailed_pgpgout; - RRDDIM *rd_mem_detailed_pgfault; - RRDDIM *rd_mem_detailed_pgmajfault; - - RRDDIM *rd_io_service_bytes_read; - RRDDIM *rd_io_serviced_read; - RRDDIM *rd_throttle_io_read; - RRDDIM *rd_throttle_io_serviced_read; - RRDDIM *rd_io_queued_read; - RRDDIM *rd_io_merged_read; - - RRDDIM *rd_io_service_bytes_write; - RRDDIM *rd_io_serviced_write; - RRDDIM *rd_throttle_io_write; - RRDDIM *rd_throttle_io_serviced_write; - RRDDIM *rd_io_queued_write; - RRDDIM *rd_io_merged_write; - struct cgroup *next; struct cgroup *discovered_next; @@ -1667,7 +1642,7 @@ static inline void read_all_discovered_cgroups(struct cgroup *root) { #define CGROUP_NETWORK_INTERFACE_MAX_LINE 2048 static inline void read_cgroup_network_interfaces(struct cgroup *cg) { - netdata_log_debug(D_CGROUP, "looking for the network interfaces of cgroup '%s' with chart id '%s' and title '%s'", cg->id, cg->chart_id, cg->chart_title); + netdata_log_debug(D_CGROUP, "looking for the network interfaces of cgroup '%s' with chart id '%s'", cg->id, cg->chart_id); pid_t cgroup_pid; char cgroup_identifier[CGROUP_NETWORK_INTERFACE_MAX_LINE + 1]; @@ -1747,17 +1722,6 @@ static inline void free_cgroup_network_interfaces(struct cgroup *cg) { #define CGROUP_CHARTID_LINE_MAX 1024 -static inline char *cgroup_title_strdupz(const char *s) { - if(!s || !*s) s = "/"; - - if(*s == '/' && s[1] 
!= '\0') s++; - - char *r = strdupz(s); - netdata_fix_chart_name(r); - - return r; -} - static inline char *cgroup_chart_id_strdupz(const char *s) { if(!s || !*s) s = "/"; @@ -1781,14 +1745,14 @@ static inline void substitute_dots_in_id(char *s) { // ---------------------------------------------------------------------------- // parse k8s labels -char *cgroup_parse_resolved_name_and_labels(DICTIONARY *labels, char *data) { +char *cgroup_parse_resolved_name_and_labels(RRDLABELS *labels, char *data) { // the first word, up to the first space is the name char *name = strsep_skip_consecutive_separators(&data, " "); // the rest are key=value pairs separated by comma while(data) { char *pair = strsep_skip_consecutive_separators(&data, ","); - rrdlabels_add_pair(labels, pair, RRDLABEL_SRC_AUTO| RRDLABEL_SRC_K8S); + rrdlabels_add_pair(labels, pair, RRDLABEL_SRC_AUTO | RRDLABEL_SRC_K8S); } return name; @@ -1866,7 +1830,7 @@ static inline void cgroup_free(struct cgroup *cg) { freez(cg->id); freez(cg->intermediate_id); freez(cg->chart_id); - freez(cg->chart_title); + freez(cg->name); rrdlabels_destroy(cg->chart_labels); @@ -1883,7 +1847,7 @@ static inline void discovery_rename_cgroup(struct cgroup *cg) { } cg->pending_renames--; - netdata_log_debug(D_CGROUP, "looking for the name of cgroup '%s' with chart id '%s' and title '%s'", cg->id, cg->chart_id, cg->chart_title); + netdata_log_debug(D_CGROUP, "looking for the name of cgroup '%s' with chart id '%s'", cg->id, cg->chart_id); netdata_log_debug(D_CGROUP, "executing command %s \"%s\" for cgroup '%s'", cgroups_rename_script, cg->intermediate_id, cg->chart_id); pid_t cgroup_pid; @@ -1927,14 +1891,14 @@ static inline void discovery_rename_cgroup(struct cgroup *cg) { name = cgroup_parse_resolved_name_and_labels(cg->chart_labels, new_name); rrdlabels_remove_all_unmarked(cg->chart_labels); - freez(cg->chart_title); - cg->chart_title = cgroup_title_strdupz(name); + freez(cg->name); + cg->name = strdupz(name); freez(cg->chart_id); 
cg->chart_id = cgroup_chart_id_strdupz(name); substitute_dots_in_id(cg->chart_id); - cg->hash_chart = simple_hash(cg->chart_id); + cg->hash_chart_id = simple_hash(cg->chart_id); } static void is_cgroup_procs_exist(netdata_ebpf_cgroup_shm_body_t *out, char *id) { @@ -1992,21 +1956,31 @@ static inline void convert_cgroup_to_systemd_service(struct cgroup *cg) { s[len] = '\0'; } - freez(cg->chart_title); - cg->chart_title = cgroup_title_strdupz(s); + freez(cg->name); + cg->name = strdupz(s); + + freez(cg->chart_id); + cg->chart_id = cgroup_chart_id_strdupz(s); + substitute_dots_in_id(cg->chart_id); + cg->hash_chart_id = simple_hash(cg->chart_id); } static inline struct cgroup *discovery_cgroup_add(const char *id) { netdata_log_debug(D_CGROUP, "adding to list, cgroup with id '%s'", id); struct cgroup *cg = callocz(1, sizeof(struct cgroup)); + cg->id = strdupz(id); cg->hash = simple_hash(cg->id); - cg->chart_title = cgroup_title_strdupz(id); + + cg->name = strdupz(id); + cg->intermediate_id = cgroup_chart_id_strdupz(id); + cg->chart_id = cgroup_chart_id_strdupz(id); substitute_dots_in_id(cg->chart_id); - cg->hash_chart = simple_hash(cg->chart_id); + cg->hash_chart_id = simple_hash(cg->chart_id); + if (cgroup_use_unified_cgroups) { cg->options |= CGROUP_OPTIONS_IS_UNIFIED; } @@ -2500,8 +2474,10 @@ static inline void discovery_cleanup_all_cgroups() { // enable the first duplicate cgroup { struct cgroup *t; - for(t = discovered_cgroup_root; t ; t = t->discovered_next) { - if(t != cg && t->available && !t->enabled && t->options & CGROUP_OPTIONS_DISABLED_DUPLICATE && t->hash_chart == cg->hash_chart && !strcmp(t->chart_id, cg->chart_id)) { + for (t = discovered_cgroup_root; t; t = t->discovered_next) { + if (t != cg && t->available && !t->enabled && t->options & CGROUP_OPTIONS_DISABLED_DUPLICATE && + (is_cgroup_systemd_service(t) == is_cgroup_systemd_service(cg)) && + t->hash_chart_id == cg->hash_chart_id && !strcmp(t->chart_id, cg->chart_id)) { netdata_log_debug(D_CGROUP, 
"Enabling duplicate of cgroup '%s' with id '%s', because the original with id '%s' stopped.", t->chart_id, t->id, cg->id); t->enabled = 1; t->options &= ~CGROUP_OPTIONS_DISABLED_DUPLICATE; @@ -2553,8 +2529,8 @@ static inline void discovery_share_cgroups_with_ebpf() { for (cg = cgroup_root, count = 0; cg; cg = cg->next, count++) { netdata_ebpf_cgroup_shm_body_t *ptr = &shm_cgroup_ebpf.body[count]; - char *prefix = (is_cgroup_systemd_service(cg)) ? "" : "cgroup_"; - snprintfz(ptr->name, CGROUP_EBPF_NAME_SHARED_LENGTH - 1, "%s%s", prefix, cg->chart_title); + char *prefix = (is_cgroup_systemd_service(cg)) ? services_chart_id_prefix : cgroup_chart_id_prefix; + snprintfz(ptr->name, CGROUP_EBPF_NAME_SHARED_LENGTH - 1, "%s%s", prefix, cg->chart_id); ptr->hash = simple_hash(ptr->name); ptr->options = cg->options; ptr->enabled = cg->enabled; @@ -2658,13 +2634,13 @@ static inline void discovery_process_first_time_seen_cgroup(struct cgroup *cg) { } if (cgroup_enable_systemd_services && matches_systemd_services_cgroups(cg->id)) { - netdata_log_debug(D_CGROUP, "cgroup '%s' (name '%s') matches 'cgroups to match as systemd services'", cg->id, cg->chart_title); + netdata_log_debug(D_CGROUP, "cgroup '%s' (name '%s') matches 'cgroups to match as systemd services'", cg->id, cg->chart_id); convert_cgroup_to_systemd_service(cg); return; } if (matches_enabled_cgroup_renames(cg->id)) { - netdata_log_debug(D_CGROUP, "cgroup '%s' (name '%s') matches 'run script to rename cgroups matching', will try to rename it", cg->id, cg->chart_title); + netdata_log_debug(D_CGROUP, "cgroup '%s' (name '%s') matches 'run script to rename cgroups matching', will try to rename it", cg->id, cg->chart_id); if (is_inside_k8s && k8s_is_container(cg->id)) { // it may take up to a minute for the K8s API to return data for the container // tested on AWS K8s cluster with 100% CPU utilization @@ -2676,15 +2652,20 @@ static inline void discovery_process_first_time_seen_cgroup(struct cgroup *cg) { } static int 
discovery_is_cgroup_duplicate(struct cgroup *cg) { - // https://github.com/netdata/netdata/issues/797#issuecomment-241248884 - struct cgroup *c; - for (c = discovered_cgroup_root; c; c = c->discovered_next) { - if (c != cg && c->enabled && c->hash_chart == cg->hash_chart && !strcmp(c->chart_id, cg->chart_id)) { - collector_error("CGROUP: chart id '%s' already exists with id '%s' and is enabled and available. Disabling cgroup with id '%s'.", cg->chart_id, c->id, cg->id); - return 1; - } - } - return 0; + // https://github.com/netdata/netdata/issues/797#issuecomment-241248884 + struct cgroup *c; + for (c = discovered_cgroup_root; c; c = c->discovered_next) { + if (c != cg && c->enabled && (is_cgroup_systemd_service(c) == is_cgroup_systemd_service(cg)) && + c->hash_chart_id == cg->hash_chart_id && !strcmp(c->chart_id, cg->chart_id)) { + collector_error( + "CGROUP: chart id '%s' already exists with id '%s' and is enabled and available. Disabling cgroup with id '%s'.", + cg->chart_id, + c->id, + cg->id); + return 1; + } + } + return 0; } static inline void discovery_process_cgroup(struct cgroup *cg) { @@ -2720,17 +2701,25 @@ static inline void discovery_process_cgroup(struct cgroup *cg) { } if (is_cgroup_systemd_service(cg)) { + if (discovery_is_cgroup_duplicate(cg)) { + cg->enabled = 0; + cg->options |= CGROUP_OPTIONS_DISABLED_DUPLICATE; + return; + } + if (!cg->chart_labels) + cg->chart_labels = rrdlabels_create(); + rrdlabels_add(cg->chart_labels, "service_name", cg->name, RRDLABEL_SRC_AUTO); cg->enabled = 1; return; } - if (!(cg->enabled = matches_enabled_cgroup_names(cg->chart_title))) { - netdata_log_debug(D_CGROUP, "cgroup '%s' (name '%s') disabled by 'enable by default cgroups names matching'", cg->id, cg->chart_title); + if (!(cg->enabled = matches_enabled_cgroup_names(cg->name))) { + netdata_log_debug(D_CGROUP, "cgroup '%s' (name '%s') disabled by 'enable by default cgroups names matching'", cg->id, cg->name); return; } if (!(cg->enabled = 
matches_enabled_cgroup_paths(cg->id))) { - netdata_log_debug(D_CGROUP, "cgroup '%s' (name '%s') disabled by 'enable by default cgroups matching'", cg->id, cg->chart_title); + netdata_log_debug(D_CGROUP, "cgroup '%s' (name '%s') disabled by 'enable by default cgroups matching'", cg->id, cg->name); return; } @@ -2744,10 +2733,9 @@ static inline void discovery_process_cgroup(struct cgroup *cg) { cg->chart_labels = rrdlabels_create(); if (!k8s_is_kubepod(cg)) { - rrdlabels_add(cg->chart_labels, "cgroup_name", cg->chart_id, RRDLABEL_SRC_AUTO); - if (!dictionary_get(cg->chart_labels, "image")) { + rrdlabels_add(cg->chart_labels, "cgroup_name", cg->name, RRDLABEL_SRC_AUTO); + if (!rrdlabels_exist(cg->chart_labels, "image")) rrdlabels_add(cg->chart_labels, "image", "", RRDLABEL_SRC_AUTO); - } } worker_is_busy(WORKER_DISCOVERY_PROCESS_NETWORK); @@ -2801,6 +2789,19 @@ static void cgroup_discovery_cleanup(void *ptr) { service_exits(); } +static inline char *cgroup_chart_type(char *buffer, struct cgroup *cg) { + if(buffer[0]) return buffer; + + if (cg->chart_id[0] == '\0' || (cg->chart_id[0] == '/' && cg->chart_id[1] == '\0')) + strncpy(buffer, "cgroup_root", RRD_ID_LENGTH_MAX); + else if (is_cgroup_systemd_service(cg)) + snprintfz(buffer, RRD_ID_LENGTH_MAX, "%s%s", services_chart_id_prefix, cg->chart_id); + else + snprintfz(buffer, RRD_ID_LENGTH_MAX, "%s%s", cgroup_chart_id_prefix, cg->chart_id); + + return buffer; +} + void cgroup_discovery_worker(void *ptr) { UNUSED(ptr); @@ -2850,709 +2851,376 @@ void cgroup_discovery_worker(void *ptr) #define CHART_TITLE_MAX 300 void update_systemd_services_charts( - int update_every - , int do_cpu - , int do_mem_usage - , int do_mem_detailed - , int do_mem_failcnt - , int do_swap_usage - , int do_io - , int do_io_ops - , int do_throttle_io - , int do_throttle_ops - , int do_queued_ops - , int do_merged_ops -) { - static RRDSET - *st_cpu = NULL, - *st_mem_usage = NULL, - *st_mem_failcnt = NULL, - *st_swap_usage = NULL, - - 
*st_mem_detailed_cache = NULL, - *st_mem_detailed_rss = NULL, - *st_mem_detailed_mapped = NULL, - *st_mem_detailed_writeback = NULL, - *st_mem_detailed_pgfault = NULL, - *st_mem_detailed_pgmajfault = NULL, - *st_mem_detailed_pgpgin = NULL, - *st_mem_detailed_pgpgout = NULL, - - *st_io_read = NULL, - *st_io_serviced_read = NULL, - *st_throttle_io_read = NULL, - *st_throttle_ops_read = NULL, - *st_queued_ops_read = NULL, - *st_merged_ops_read = NULL, - - *st_io_write = NULL, - *st_io_serviced_write = NULL, - *st_throttle_io_write = NULL, - *st_throttle_ops_write = NULL, - *st_queued_ops_write = NULL, - *st_merged_ops_write = NULL; - - // create the charts - - if (unlikely(do_cpu && !st_cpu)) { - char title[CHART_TITLE_MAX + 1]; - snprintfz(title, CHART_TITLE_MAX, "Systemd Services CPU utilization (100%% = 1 core)"); - - st_cpu = rrdset_create_localhost( - "services" - , "cpu" - , NULL - , "cpu" - , "services.cpu" - , title - , "percentage" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if (unlikely(do_mem_usage && !st_mem_usage)) { - st_mem_usage = rrdset_create_localhost( - "services" - , "mem_usage" - , NULL - , "mem" - , "services.mem_usage" - , "Systemd Services Used Memory" - , "MiB" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 10 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(likely(do_mem_detailed)) { - if(unlikely(!st_mem_detailed_rss)) { - st_mem_detailed_rss = rrdset_create_localhost( - "services" - , "mem_rss" - , NULL - , "mem" - , "services.mem_rss" - , "Systemd Services RSS Memory" - , "MiB" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 20 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_mem_detailed_mapped)) { - st_mem_detailed_mapped = rrdset_create_localhost( - "services" - , "mem_mapped" - , NULL 
- , "mem" - , "services.mem_mapped" - , "Systemd Services Mapped Memory" - , "MiB" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 30 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_mem_detailed_cache)) { - st_mem_detailed_cache = rrdset_create_localhost( - "services" - , "mem_cache" - , NULL - , "mem" - , "services.mem_cache" - , "Systemd Services Cache Memory" - , "MiB" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 40 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_mem_detailed_writeback)) { - st_mem_detailed_writeback = rrdset_create_localhost( - "services" - , "mem_writeback" - , NULL - , "mem" - , "services.mem_writeback" - , "Systemd Services Writeback Memory" - , "MiB" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 50 - , update_every - , RRDSET_TYPE_STACKED - ); - - } - - if(unlikely(!st_mem_detailed_pgfault)) { - st_mem_detailed_pgfault = rrdset_create_localhost( - "services" - , "mem_pgfault" - , NULL - , "mem" - , "services.mem_pgfault" - , "Systemd Services Memory Minor Page Faults" - , "MiB/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 60 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_mem_detailed_pgmajfault)) { - st_mem_detailed_pgmajfault = rrdset_create_localhost( - "services" - , "mem_pgmajfault" - , NULL - , "mem" - , "services.mem_pgmajfault" - , "Systemd Services Memory Major Page Faults" - , "MiB/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 70 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_mem_detailed_pgpgin)) { - st_mem_detailed_pgpgin = rrdset_create_localhost( - "services" - , "mem_pgpgin" - , NULL - , "mem" - , "services.mem_pgpgin" - , "Systemd Services Memory 
Charging Activity" - , "MiB/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 80 - , update_every - , RRDSET_TYPE_STACKED - ); - - } - - if(unlikely(!st_mem_detailed_pgpgout)) { - st_mem_detailed_pgpgout = rrdset_create_localhost( - "services" - , "mem_pgpgout" - , NULL - , "mem" - , "services.mem_pgpgout" - , "Systemd Services Memory Uncharging Activity" - , "MiB/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 90 - , update_every - , RRDSET_TYPE_STACKED - ); - } - } - - if(unlikely(do_mem_failcnt && !st_mem_failcnt)) { - st_mem_failcnt = rrdset_create_localhost( - "services" - , "mem_failcnt" - , NULL - , "mem" - , "services.mem_failcnt" - , "Systemd Services Memory Limit Failures" - , "failures" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 110 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if (do_swap_usage && !st_swap_usage) { - st_swap_usage = rrdset_create_localhost( - "services" - , "swap_usage" - , NULL - , "swap" - , "services.swap_usage" - , "Systemd Services Swap Memory Used" - , "MiB" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 100 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(likely(do_io)) { - if(unlikely(!st_io_read)) { - st_io_read = rrdset_create_localhost( - "services" - , "io_read" - , NULL - , "disk" - , "services.io_read" - , "Systemd Services Disk Read Bandwidth" - , "KiB/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 120 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_io_write)) { - st_io_write = rrdset_create_localhost( - "services" - , "io_write" - , NULL - , "disk" - , "services.io_write" - , "Systemd Services Disk Write Bandwidth" - , "KiB/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , 
NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 130 - , update_every - , RRDSET_TYPE_STACKED - ); - } - } - - if(likely(do_io_ops)) { - if(unlikely(!st_io_serviced_read)) { - st_io_serviced_read = rrdset_create_localhost( - "services" - , "io_ops_read" - , NULL - , "disk" - , "services.io_ops_read" - , "Systemd Services Disk Read Operations" - , "operations/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 140 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_io_serviced_write)) { - st_io_serviced_write = rrdset_create_localhost( - "services" - , "io_ops_write" - , NULL - , "disk" - , "services.io_ops_write" - , "Systemd Services Disk Write Operations" - , "operations/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 150 - , update_every - , RRDSET_TYPE_STACKED - ); - } - } - - if(likely(do_throttle_io)) { - if(unlikely(!st_throttle_io_read)) { - - st_throttle_io_read = rrdset_create_localhost( - "services" - , "throttle_io_read" - , NULL - , "disk" - , "services.throttle_io_read" - , "Systemd Services Throttle Disk Read Bandwidth" - , "KiB/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 160 - , update_every - , RRDSET_TYPE_STACKED - ); - - } - - if(unlikely(!st_throttle_io_write)) { - st_throttle_io_write = rrdset_create_localhost( - "services" - , "throttle_io_write" - , NULL - , "disk" - , "services.throttle_io_write" - , "Systemd Services Throttle Disk Write Bandwidth" - , "KiB/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 170 - , update_every - , RRDSET_TYPE_STACKED - ); - } - } - - if(likely(do_throttle_ops)) { - if(unlikely(!st_throttle_ops_read)) { - st_throttle_ops_read = rrdset_create_localhost( - "services" - , "throttle_io_ops_read" - , NULL - , "disk" - , "services.throttle_io_ops_read" - , "Systemd Services 
Throttle Disk Read Operations" - , "operations/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 180 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_throttle_ops_write)) { - st_throttle_ops_write = rrdset_create_localhost( - "services" - , "throttle_io_ops_write" - , NULL - , "disk" - , "services.throttle_io_ops_write" - , "Systemd Services Throttle Disk Write Operations" - , "operations/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 190 - , update_every - , RRDSET_TYPE_STACKED - ); - } - } - - if(likely(do_queued_ops)) { - if(unlikely(!st_queued_ops_read)) { - st_queued_ops_read = rrdset_create_localhost( - "services" - , "queued_io_ops_read" - , NULL - , "disk" - , "services.queued_io_ops_read" - , "Systemd Services Queued Disk Read Operations" - , "operations/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 200 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_queued_ops_write)) { - - st_queued_ops_write = rrdset_create_localhost( - "services" - , "queued_io_ops_write" - , NULL - , "disk" - , "services.queued_io_ops_write" - , "Systemd Services Queued Disk Write Operations" - , "operations/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 210 - , update_every - , RRDSET_TYPE_STACKED - ); - } - } - - if(likely(do_merged_ops)) { - if(unlikely(!st_merged_ops_read)) { - st_merged_ops_read = rrdset_create_localhost( - "services" - , "merged_io_ops_read" - , NULL - , "disk" - , "services.merged_io_ops_read" - , "Systemd Services Merged Disk Read Operations" - , "operations/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 220 - , update_every - , RRDSET_TYPE_STACKED - ); - } - - if(unlikely(!st_merged_ops_write)) { - st_merged_ops_write = 
rrdset_create_localhost( - "services" - , "merged_io_ops_write" - , NULL - , "disk" - , "services.merged_io_ops_write" - , "Systemd Services Merged Disk Write Operations" - , "operations/s" - , PLUGIN_CGROUPS_NAME - , PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME - , NETDATA_CHART_PRIO_CGROUPS_SYSTEMD + 230 - , update_every - , RRDSET_TYPE_STACKED - ); - } - } - + int update_every, + int do_cpu, + int do_mem_usage, + int do_mem_detailed, + int do_mem_failcnt, + int do_swap_usage, + int do_io, + int do_io_ops, + int do_throttle_io, + int do_throttle_ops, + int do_queued_ops, + int do_merged_ops) +{ // update the values struct cgroup *cg; - for(cg = cgroup_root; cg ; cg = cg->next) { - if(unlikely(!cg->enabled || cg->pending_renames || !is_cgroup_systemd_service(cg))) - continue; + int systemd_cgroup_chart_priority = NETDATA_CHART_PRIO_CGROUPS_SYSTEMD; + char type[RRD_ID_LENGTH_MAX + 1]; - if(likely(do_cpu && cg->cpuacct_stat.updated)) { - if(unlikely(!cg->rd_cpu)){ + for (cg = cgroup_root; cg; cg = cg->next) { + if (unlikely(!cg->enabled || cg->pending_renames || !is_cgroup_systemd_service(cg))) + continue; + type[0] = '\0'; + if (likely(do_cpu && cg->cpuacct_stat.updated)) { + if (unlikely(!cg->st_cpu)) { + cg->st_cpu = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "cpu_utilization", + NULL, + "cpu", + "systemd.service.cpu.utilization", + "Systemd Services CPU utilization (100%% = 1 core)", + "percentage", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority, + update_every, + RRDSET_TYPE_STACKED); + rrdset_update_rrdlabels(cg->st_cpu, cg->chart_labels); if (!(cg->options & CGROUP_OPTIONS_IS_UNIFIED)) { - cg->rd_cpu = rrddim_add(st_cpu, cg->chart_id, cg->chart_title, 100, system_hz, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_cpu, "user", NULL, 100, system_hz, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_cpu, "system", NULL, 100, system_hz, RRD_ALGORITHM_INCREMENTAL); } else { - cg->rd_cpu = rrddim_add(st_cpu, 
cg->chart_id, cg->chart_title, 100, 1000000, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_cpu, "user", NULL, 100, 1000000, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_cpu, "system", NULL, 100, 1000000, RRD_ALGORITHM_INCREMENTAL); } } - rrddim_set_by_pointer(st_cpu, cg->rd_cpu, cg->cpuacct_stat.user + cg->cpuacct_stat.system); - } - - if(likely(do_mem_usage && cg->memory.updated_usage_in_bytes)) { - if(unlikely(!cg->rd_mem_usage)) - cg->rd_mem_usage = rrddim_add(st_mem_usage, cg->chart_id, cg->chart_title, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); - - rrddim_set_by_pointer(st_mem_usage, cg->rd_mem_usage, cg->memory.usage_in_bytes); + // complete the iteration + rrddim_set(cg->st_cpu, "user", cg->cpuacct_stat.user); + rrddim_set(cg->st_cpu, "system", cg->cpuacct_stat.system); + rrdset_done(cg->st_cpu); } - if(likely(do_mem_detailed && cg->memory.updated_detailed)) { - if(unlikely(!cg->rd_mem_detailed_rss)) - cg->rd_mem_detailed_rss = rrddim_add(st_mem_detailed_rss, cg->chart_id, cg->chart_title, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); - - rrddim_set_by_pointer(st_mem_detailed_rss, cg->rd_mem_detailed_rss, cg->memory.total_rss); + if (unlikely(do_mem_usage && cg->memory.updated_usage_in_bytes)) { + if (unlikely(!cg->st_mem_usage)) { + cg->st_mem_usage = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "mem_usage", + NULL, + "mem", + "systemd.service.memory.usage", + "Systemd Services Used Memory", + "MiB", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 5, + update_every, + RRDSET_TYPE_STACKED); - if(unlikely(!cg->rd_mem_detailed_mapped)) - cg->rd_mem_detailed_mapped = rrddim_add(st_mem_detailed_mapped, cg->chart_id, cg->chart_title, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + rrdset_update_rrdlabels(cg->st_mem_usage, cg->chart_labels); + rrddim_add(cg->st_mem_usage, "ram", NULL, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + if (likely(do_swap_usage)) + rrddim_add(cg->st_mem_usage, "swap", NULL, 1, 1024 * 
1024, RRD_ALGORITHM_ABSOLUTE); + } - rrddim_set_by_pointer(st_mem_detailed_mapped, cg->rd_mem_detailed_mapped, cg->memory.total_mapped_file); + rrddim_set(cg->st_mem_usage, "ram", cg->memory.usage_in_bytes); + if (likely(do_swap_usage)) { + if (!(cg->options & CGROUP_OPTIONS_IS_UNIFIED)) { + rrddim_set( + cg->st_mem_usage, + "swap", + cg->memory.msw_usage_in_bytes > (cg->memory.usage_in_bytes + cg->memory.total_inactive_file) ? + cg->memory.msw_usage_in_bytes - + (cg->memory.usage_in_bytes + cg->memory.total_inactive_file) : + 0); + } else { + rrddim_set(cg->st_mem_usage, "swap", cg->memory.msw_usage_in_bytes); + } + } + rrdset_done(cg->st_mem_usage); + } - if(unlikely(!cg->rd_mem_detailed_cache)) - cg->rd_mem_detailed_cache = rrddim_add(st_mem_detailed_cache, cg->chart_id, cg->chart_title, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + if (likely(do_mem_failcnt && cg->memory.updated_failcnt)) { + if (unlikely(do_mem_failcnt && !cg->st_mem_failcnt)) { + cg->st_mem_failcnt = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "mem_failcnt", + NULL, + "mem", + "systemd.service.memory.failcnt", + "Systemd Services Memory Limit Failures", + "failures/s", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 10, + update_every, + RRDSET_TYPE_LINE); - rrddim_set_by_pointer(st_mem_detailed_cache, cg->rd_mem_detailed_cache, cg->memory.total_cache); + rrdset_update_rrdlabels(cg->st_mem_failcnt, cg->chart_labels); + rrddim_add(cg->st_mem_failcnt, "fail", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); + } - if(unlikely(!cg->rd_mem_detailed_writeback)) - cg->rd_mem_detailed_writeback = rrddim_add(st_mem_detailed_writeback, cg->chart_id, cg->chart_title, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + rrddim_set(cg->st_mem_failcnt, "fail", cg->memory.failcnt); + rrdset_done(cg->st_mem_failcnt); + } - rrddim_set_by_pointer(st_mem_detailed_writeback, cg->rd_mem_detailed_writeback, cg->memory.total_writeback); + if (likely(do_mem_detailed && 
cg->memory.updated_detailed)) { + if (unlikely(!cg->st_mem)) { + cg->st_mem = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "mem_ram_usage", + NULL, + "mem", + "systemd.service.memory.ram.usage", + "Systemd Services Memory", + "MiB", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 15, + update_every, + RRDSET_TYPE_STACKED); - if(unlikely(!cg->rd_mem_detailed_pgfault)) - cg->rd_mem_detailed_pgfault = rrddim_add(st_mem_detailed_pgfault, cg->chart_id, cg->chart_title, system_page_size, 1024 * 1024, RRD_ALGORITHM_INCREMENTAL); + rrdset_update_rrdlabels(cg->st_mem, cg->chart_labels); + rrddim_add(cg->st_mem, "rss", NULL, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(cg->st_mem, "cache", NULL, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(cg->st_mem, "mapped_file", NULL, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(cg->st_mem, "rss_huge", NULL, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + } - rrddim_set_by_pointer(st_mem_detailed_pgfault, cg->rd_mem_detailed_pgfault, cg->memory.total_pgfault); + rrddim_set(cg->st_mem, "rss", cg->memory.total_rss); + rrddim_set(cg->st_mem, "cache", cg->memory.total_cache); + rrddim_set(cg->st_mem, "mapped_file", cg->memory.total_mapped_file); + rrddim_set(cg->st_mem, "rss_huge", cg->memory.total_rss_huge); + rrdset_done(cg->st_mem); - if(unlikely(!cg->rd_mem_detailed_pgmajfault)) - cg->rd_mem_detailed_pgmajfault = rrddim_add(st_mem_detailed_pgmajfault, cg->chart_id, cg->chart_title, system_page_size, 1024 * 1024, RRD_ALGORITHM_INCREMENTAL); + if (unlikely(!cg->st_writeback)) { + cg->st_writeback = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "mem_writeback", + NULL, + "mem", + "systemd.service.memory.writeback", + "Systemd Services Writeback Memory", + "MiB", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 20, + update_every, + RRDSET_TYPE_STACKED); - 
rrddim_set_by_pointer(st_mem_detailed_pgmajfault, cg->rd_mem_detailed_pgmajfault, cg->memory.total_pgmajfault); + rrdset_update_rrdlabels(cg->st_writeback, cg->chart_labels); + rrddim_add(cg->st_writeback, "writeback", NULL, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + rrddim_add(cg->st_writeback, "dirty", NULL, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + } - if(unlikely(!cg->rd_mem_detailed_pgpgin)) - cg->rd_mem_detailed_pgpgin = rrddim_add(st_mem_detailed_pgpgin, cg->chart_id, cg->chart_title, system_page_size, 1024 * 1024, RRD_ALGORITHM_INCREMENTAL); + rrddim_set(cg->st_writeback, "writeback", cg->memory.total_writeback); + rrddim_set(cg->st_writeback, "dirty", cg->memory.total_dirty); + rrdset_done(cg->st_writeback); - rrddim_set_by_pointer(st_mem_detailed_pgpgin, cg->rd_mem_detailed_pgpgin, cg->memory.total_pgpgin); + if (unlikely(!cg->st_pgfaults)) { + cg->st_pgfaults = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "mem_pgfault", + NULL, + "mem", + "systemd.service.memory.paging.faults", + "Systemd Services Memory Minor and Major Page Faults", + "MiB/s", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 25, + update_every, + RRDSET_TYPE_AREA); - if(unlikely(!cg->rd_mem_detailed_pgpgout)) - cg->rd_mem_detailed_pgpgout = rrddim_add(st_mem_detailed_pgpgout, cg->chart_id, cg->chart_title, system_page_size, 1024 * 1024, RRD_ALGORITHM_INCREMENTAL); + rrdset_update_rrdlabels(cg->st_pgfaults, cg->chart_labels); + rrddim_add(cg->st_pgfaults, "minor", NULL, system_page_size, 1024 * 1024, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_pgfaults, "major", NULL, system_page_size, 1024 * 1024, RRD_ALGORITHM_INCREMENTAL); + } - rrddim_set_by_pointer(st_mem_detailed_pgpgout, cg->rd_mem_detailed_pgpgout, cg->memory.total_pgpgout); - } + rrddim_set(cg->st_pgfaults, "minor", cg->memory.total_pgfault); + rrddim_set(cg->st_pgfaults, "major", cg->memory.total_pgmajfault); + rrdset_done(cg->st_pgfaults); - 
if(likely(do_mem_failcnt && cg->memory.updated_failcnt)) { - if(unlikely(!cg->rd_mem_failcnt)) - cg->rd_mem_failcnt = rrddim_add(st_mem_failcnt, cg->chart_id, cg->chart_title, 1, 1, RRD_ALGORITHM_INCREMENTAL); + if (unlikely(!cg->st_mem_activity)) { + cg->st_mem_activity = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "mem_paging_io", + NULL, + "mem", + "systemd.service.memory.paging.io", + "Systemd Services Memory Paging IO", + "MiB/s", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 30, + update_every, + RRDSET_TYPE_AREA); + + rrdset_update_rrdlabels(cg->st_mem_activity, cg->chart_labels); + rrddim_add(cg->st_mem_activity, "in", NULL, system_page_size, 1024 * 1024, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_mem_activity, "out", NULL, -system_page_size, 1024 * 1024, RRD_ALGORITHM_INCREMENTAL); + } - rrddim_set_by_pointer(st_mem_failcnt, cg->rd_mem_failcnt, cg->memory.failcnt); + rrddim_set(cg->st_mem_activity, "in", cg->memory.total_pgpgin); + rrddim_set(cg->st_mem_activity, "out", cg->memory.total_pgpgout); + rrdset_done(cg->st_mem_activity); } - if(likely(do_swap_usage && cg->memory.updated_msw_usage_in_bytes)) { - if(unlikely(!cg->rd_swap_usage)) - cg->rd_swap_usage = rrddim_add(st_swap_usage, cg->chart_id, cg->chart_title, 1, 1024 * 1024, RRD_ALGORITHM_ABSOLUTE); + if (likely(do_io && cg->io_service_bytes.updated)) { + if (unlikely(!cg->st_io)) { + cg->st_io = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "disk_io", + NULL, + "disk", + "systemd.service.disk.io", + "Systemd Services Disk Read/Write Bandwidth", + "KiB/s", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 35, + update_every, + RRDSET_TYPE_AREA); - if(!(cg->options & CGROUP_OPTIONS_IS_UNIFIED)) { - rrddim_set_by_pointer( - st_swap_usage, - cg->rd_swap_usage, - cg->memory.msw_usage_in_bytes > (cg->memory.usage_in_bytes + cg->memory.total_inactive_file) ? 
- cg->memory.msw_usage_in_bytes - (cg->memory.usage_in_bytes + cg->memory.total_inactive_file) : 0); - } else { - rrddim_set_by_pointer(st_swap_usage, cg->rd_swap_usage, cg->memory.msw_usage_in_bytes); + rrdset_update_rrdlabels(cg->st_io, cg->chart_labels); + rrddim_add(cg->st_io, "read", NULL, 1, 1024, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_io, "write", NULL, -1, 1024, RRD_ALGORITHM_INCREMENTAL); } + rrddim_set(cg->st_io, "read", cg->io_service_bytes.Read); + rrddim_set(cg->st_io, "write", cg->io_service_bytes.Write); + rrdset_done(cg->st_io); } - if(likely(do_io && cg->io_service_bytes.updated)) { - if(unlikely(!cg->rd_io_service_bytes_read)) - cg->rd_io_service_bytes_read = rrddim_add(st_io_read, cg->chart_id, cg->chart_title, 1, 1024, RRD_ALGORITHM_INCREMENTAL); - - rrddim_set_by_pointer(st_io_read, cg->rd_io_service_bytes_read, cg->io_service_bytes.Read); - - if(unlikely(!cg->rd_io_service_bytes_write)) - cg->rd_io_service_bytes_write = rrddim_add(st_io_write, cg->chart_id, cg->chart_title, 1, 1024, RRD_ALGORITHM_INCREMENTAL); - - rrddim_set_by_pointer(st_io_write, cg->rd_io_service_bytes_write, cg->io_service_bytes.Write); - } - - if(likely(do_io_ops && cg->io_serviced.updated)) { - if(unlikely(!cg->rd_io_serviced_read)) - cg->rd_io_serviced_read = rrddim_add(st_io_serviced_read, cg->chart_id, cg->chart_title, 1, 1, RRD_ALGORITHM_INCREMENTAL); - - rrddim_set_by_pointer(st_io_serviced_read, cg->rd_io_serviced_read, cg->io_serviced.Read); - - if(unlikely(!cg->rd_io_serviced_write)) - cg->rd_io_serviced_write = rrddim_add(st_io_serviced_write, cg->chart_id, cg->chart_title, 1, 1, RRD_ALGORITHM_INCREMENTAL); + if (likely(do_io_ops && cg->io_serviced.updated)) { + if (unlikely(!cg->st_serviced_ops)) { + cg->st_serviced_ops = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "disk_iops", + NULL, + "disk", + "systemd.service.disk.iops", + "Systemd Services Disk Read/Write Operations", + "operations/s", + PLUGIN_CGROUPS_NAME, + 
PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 40, + update_every, + RRDSET_TYPE_LINE); - rrddim_set_by_pointer(st_io_serviced_write, cg->rd_io_serviced_write, cg->io_serviced.Write); + rrdset_update_rrdlabels(cg->st_serviced_ops, cg->chart_labels); + rrddim_add(cg->st_serviced_ops, "read", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_serviced_ops, "write", NULL, -1, 1, RRD_ALGORITHM_INCREMENTAL); + } + rrddim_set(cg->st_serviced_ops, "read", cg->io_serviced.Read); + rrddim_set(cg->st_serviced_ops, "write", cg->io_serviced.Write); + rrdset_done(cg->st_serviced_ops); } - if(likely(do_throttle_io && cg->throttle_io_service_bytes.updated)) { - if(unlikely(!cg->rd_throttle_io_read)) - cg->rd_throttle_io_read = rrddim_add(st_throttle_io_read, cg->chart_id, cg->chart_title, 1, 1024, RRD_ALGORITHM_INCREMENTAL); - - rrddim_set_by_pointer(st_throttle_io_read, cg->rd_throttle_io_read, cg->throttle_io_service_bytes.Read); - - if(unlikely(!cg->rd_throttle_io_write)) - cg->rd_throttle_io_write = rrddim_add(st_throttle_io_write, cg->chart_id, cg->chart_title, 1, 1024, RRD_ALGORITHM_INCREMENTAL); + if (likely(do_throttle_io && cg->throttle_io_service_bytes.updated)) { + if (unlikely(!cg->st_throttle_io)) { + cg->st_throttle_io = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "disk_throttle_io", + NULL, + "disk", + "systemd.service.disk.throttle.io", + "Systemd Services Throttle Disk Read/Write Bandwidth", + "KiB/s", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 45, + update_every, + RRDSET_TYPE_AREA); - rrddim_set_by_pointer(st_throttle_io_write, cg->rd_throttle_io_write, cg->throttle_io_service_bytes.Write); + rrdset_update_rrdlabels(cg->st_throttle_io, cg->chart_labels); + rrddim_add(cg->st_throttle_io, "read", NULL, 1, 1024, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_throttle_io, "write", NULL, -1, 1024, RRD_ALGORITHM_INCREMENTAL); + } + rrddim_set(cg->st_throttle_io, "read", 
cg->throttle_io_service_bytes.Read); + rrddim_set(cg->st_throttle_io, "write", cg->throttle_io_service_bytes.Write); + rrdset_done(cg->st_throttle_io); } - if(likely(do_throttle_ops && cg->throttle_io_serviced.updated)) { - if(unlikely(!cg->rd_throttle_io_serviced_read)) - cg->rd_throttle_io_serviced_read = rrddim_add(st_throttle_ops_read, cg->chart_id, cg->chart_title, 1, 1, RRD_ALGORITHM_INCREMENTAL); - - rrddim_set_by_pointer(st_throttle_ops_read, cg->rd_throttle_io_serviced_read, cg->throttle_io_serviced.Read); - - if(unlikely(!cg->rd_throttle_io_serviced_write)) - cg->rd_throttle_io_serviced_write = rrddim_add(st_throttle_ops_write, cg->chart_id, cg->chart_title, 1, 1, RRD_ALGORITHM_INCREMENTAL); + if (likely(do_throttle_ops && cg->throttle_io_serviced.updated)) { + if (unlikely(!cg->st_throttle_serviced_ops)) { + cg->st_throttle_serviced_ops = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "disk_throttle_iops", + NULL, + "disk", + "systemd.service.disk.throttle.iops", + "Systemd Services Throttle Disk Read/Write Operations", + "operations/s", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 50, + update_every, + RRDSET_TYPE_LINE); - rrddim_set_by_pointer(st_throttle_ops_write, cg->rd_throttle_io_serviced_write, cg->throttle_io_serviced.Write); + rrdset_update_rrdlabels(cg->st_throttle_serviced_ops, cg->chart_labels); + rrddim_add(cg->st_throttle_serviced_ops, "read", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_throttle_serviced_ops, "write", NULL, -1, 1, RRD_ALGORITHM_INCREMENTAL); + } + rrddim_set(cg->st_throttle_serviced_ops, "read", cg->throttle_io_serviced.Read); + rrddim_set(cg->st_throttle_serviced_ops, "write", cg->throttle_io_serviced.Write); + rrdset_done(cg->st_throttle_serviced_ops); } - if(likely(do_queued_ops && cg->io_queued.updated)) { - if(unlikely(!cg->rd_io_queued_read)) - cg->rd_io_queued_read = rrddim_add(st_queued_ops_read, cg->chart_id, cg->chart_title, 1, 1, 
RRD_ALGORITHM_INCREMENTAL); - - rrddim_set_by_pointer(st_queued_ops_read, cg->rd_io_queued_read, cg->io_queued.Read); - - if(unlikely(!cg->rd_io_queued_write)) - cg->rd_io_queued_write = rrddim_add(st_queued_ops_write, cg->chart_id, cg->chart_title, 1, 1, RRD_ALGORITHM_INCREMENTAL); + if (likely(do_queued_ops && cg->io_queued.updated)) { + if (unlikely(!cg->st_queued_ops)) { + cg->st_queued_ops = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "disk_queued_iops", + NULL, + "disk", + "systemd.service.disk.queued_iops", + "Systemd Services Queued Disk Read/Write Operations", + "operations/s", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 55, + update_every, + RRDSET_TYPE_LINE); - rrddim_set_by_pointer(st_queued_ops_write, cg->rd_io_queued_write, cg->io_queued.Write); + rrdset_update_rrdlabels(cg->st_queued_ops, cg->chart_labels); + rrddim_add(cg->st_queued_ops, "read", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_queued_ops, "write", NULL, -1, 1, RRD_ALGORITHM_INCREMENTAL); + } + rrddim_set(cg->st_queued_ops, "read", cg->io_queued.Read); + rrddim_set(cg->st_queued_ops, "write", cg->io_queued.Write); + rrdset_done(cg->st_queued_ops); } - if(likely(do_merged_ops && cg->io_merged.updated)) { - if(unlikely(!cg->rd_io_merged_read)) - cg->rd_io_merged_read = rrddim_add(st_merged_ops_read, cg->chart_id, cg->chart_title, 1, 1, RRD_ALGORITHM_INCREMENTAL); - - rrddim_set_by_pointer(st_merged_ops_read, cg->rd_io_merged_read, cg->io_merged.Read); - - if(unlikely(!cg->rd_io_merged_write)) - cg->rd_io_merged_write = rrddim_add(st_merged_ops_write, cg->chart_id, cg->chart_title, 1, 1, RRD_ALGORITHM_INCREMENTAL); + if (likely(do_merged_ops && cg->io_merged.updated)) { + if (unlikely(!cg->st_merged_ops)) { + cg->st_merged_ops = rrdset_create_localhost( + cgroup_chart_type(type, cg), + "disk_merged_iops", + NULL, + "disk", + "systemd.service.disk.merged_iops", + "Systemd Services Merged Disk Read/Write 
Operations", + "operations/s", + PLUGIN_CGROUPS_NAME, + PLUGIN_CGROUPS_MODULE_SYSTEMD_NAME, + systemd_cgroup_chart_priority + 60, + update_every, + RRDSET_TYPE_LINE); - rrddim_set_by_pointer(st_merged_ops_write, cg->rd_io_merged_write, cg->io_merged.Write); + rrdset_update_rrdlabels(cg->st_merged_ops, cg->chart_labels); + rrddim_add(cg->st_merged_ops, "read", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); + rrddim_add(cg->st_merged_ops, "write", NULL, -1, 1, RRD_ALGORITHM_INCREMENTAL); + } + rrddim_set(cg->st_merged_ops, "read", cg->io_merged.Read); + rrddim_set(cg->st_merged_ops, "write", cg->io_merged.Write); + rrdset_done(cg->st_merged_ops); } } - - // complete the iteration - if(likely(do_cpu)) - rrdset_done(st_cpu); - - if(likely(do_mem_usage)) - rrdset_done(st_mem_usage); - - if(unlikely(do_mem_detailed)) { - rrdset_done(st_mem_detailed_cache); - rrdset_done(st_mem_detailed_rss); - rrdset_done(st_mem_detailed_mapped); - rrdset_done(st_mem_detailed_writeback); - rrdset_done(st_mem_detailed_pgfault); - rrdset_done(st_mem_detailed_pgmajfault); - rrdset_done(st_mem_detailed_pgpgin); - rrdset_done(st_mem_detailed_pgpgout); - } - - if(likely(do_mem_failcnt)) - rrdset_done(st_mem_failcnt); - - if(likely(do_swap_usage)) - rrdset_done(st_swap_usage); - - if(likely(do_io)) { - rrdset_done(st_io_read); - rrdset_done(st_io_write); - } - - if(likely(do_io_ops)) { - rrdset_done(st_io_serviced_read); - rrdset_done(st_io_serviced_write); - } - - if(likely(do_throttle_io)) { - rrdset_done(st_throttle_io_read); - rrdset_done(st_throttle_io_write); - } - - if(likely(do_throttle_ops)) { - rrdset_done(st_throttle_ops_read); - rrdset_done(st_throttle_ops_write); - } - - if(likely(do_queued_ops)) { - rrdset_done(st_queued_ops_read); - rrdset_done(st_queued_ops_write); - } - - if(likely(do_merged_ops)) { - rrdset_done(st_merged_ops_read); - rrdset_done(st_merged_ops_write); - } -} - -static inline char *cgroup_chart_type(char *buffer, const char *id, size_t len) { - if(buffer[0]) return 
buffer; - - if(id[0] == '\0' || (id[0] == '/' && id[1] == '\0')) - strncpy(buffer, "cgroup_root", len); - else - snprintfz(buffer, len, "%s%s", cgroup_chart_id_prefix, id); - - netdata_fix_chart_id(buffer); - return buffer; } static inline void update_cpu_limits(char **filename, unsigned long long *value, struct cgroup *cg) { @@ -3719,7 +3387,7 @@ void update_cgroup_charts(int update_every) { k8s_is_kubepod(cg) ? "CPU Usage (100%% = 1000 mCPU)" : "CPU Usage (100%% = 1 core)"); cg->st_cpu = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "cpu" , NULL , "cpu" @@ -3788,7 +3456,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "CPU Usage within the limits"); cg->st_cpu_limit = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "cpu_limit" , NULL , "cpu" @@ -3840,7 +3508,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "CPU Throttled Runnable Periods"); cg->st_cpu_nr_throttled = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "throttled" , NULL , "cpu" @@ -3865,7 +3533,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "CPU Throttled Time Duration"); cg->st_cpu_throttled_time = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "throttled_duration" , NULL , "cpu" @@ -3892,7 +3560,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "CPU Time Relative Share"); cg->st_cpu_shares = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "cpu_shares" , NULL , "cpu" @@ -3926,7 +3594,7 @@ void update_cgroup_charts(int update_every) { "CPU Usage (100%% = 1 core) Per Core"); cg->st_cpu_per_core = rrdset_create_localhost( - 
cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "cpu_per_core" , NULL , "cpu" @@ -3960,7 +3628,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Memory Usage"); cg->st_mem = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "mem" , NULL , "mem" @@ -4018,7 +3686,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Writeback Memory"); cg->st_writeback = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "writeback" , NULL , "mem" @@ -4051,7 +3719,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Memory Activity"); cg->st_mem_activity = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "mem_activity" , NULL , "mem" @@ -4080,7 +3748,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Memory Page Faults"); cg->st_pgfaults = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "pgfaults" , NULL , "mem" @@ -4110,7 +3778,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Used Memory"); cg->st_mem_usage = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "mem_usage" , NULL , "mem" @@ -4175,7 +3843,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Used RAM within the limits"); cg->st_mem_usage_limit = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "mem_usage_limit" , NULL , "mem" @@ -4205,7 +3873,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Memory Utilization"); cg->st_mem_utilization = 
rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "mem_utilization" , NULL , "mem" @@ -4253,7 +3921,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Memory Limit Failures"); cg->st_mem_failcnt = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "mem_failcnt" , NULL , "mem" @@ -4281,7 +3949,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "I/O Bandwidth (all disks)"); cg->st_io = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "io" , NULL , "disk" @@ -4311,7 +3979,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Serviced I/O Operations (all disks)"); cg->st_serviced_ops = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "serviced_ops" , NULL , "disk" @@ -4341,7 +4009,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Throttle I/O Bandwidth (all disks)"); cg->st_throttle_io = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "throttle_io" , NULL , "disk" @@ -4371,7 +4039,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Throttle Serviced I/O Operations (all disks)"); cg->st_throttle_serviced_ops = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "throttle_serviced_ops" , NULL , "disk" @@ -4401,7 +4069,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Queued I/O Operations (all disks)"); cg->st_queued_ops = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "queued_ops" , NULL , "disk" @@ -4431,7 
+4099,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Merged I/O Operations (all disks)"); cg->st_merged_ops = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "merged_ops" , NULL , "disk" @@ -4467,7 +4135,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "CPU some pressure"); chart = pcs->share_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "cpu_some_pressure" , NULL , "cpu" @@ -4490,7 +4158,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "CPU some pressure stall time"); chart = pcs->total_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "cpu_some_pressure_stall_time" , NULL , "cpu" @@ -4517,7 +4185,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "CPU full pressure"); chart = pcs->share_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "cpu_full_pressure" , NULL , "cpu" @@ -4540,7 +4208,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "CPU full pressure stall time"); chart = pcs->total_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "cpu_full_pressure_stall_time" , NULL , "cpu" @@ -4570,7 +4238,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "Memory some pressure"); chart = pcs->share_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "mem_some_pressure" , NULL , "mem" @@ -4593,7 +4261,7 @@ void update_cgroup_charts(int 
update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "Memory some pressure stall time"); chart = pcs->total_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "memory_some_pressure_stall_time" , NULL , "mem" @@ -4622,7 +4290,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "Memory full pressure"); chart = pcs->share_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "mem_full_pressure" , NULL , "mem" @@ -4646,7 +4314,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "Memory full pressure stall time"); chart = pcs->total_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "memory_full_pressure_stall_time" , NULL , "mem" @@ -4676,7 +4344,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "IRQ some pressure"); chart = pcs->share_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "irq_some_pressure" , NULL , "interrupts" @@ -4699,7 +4367,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "IRQ some pressure stall time"); chart = pcs->total_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "irq_some_pressure_stall_time" , NULL , "interrupts" @@ -4728,7 +4396,7 @@ void update_cgroup_charts(int update_every) { snprintfz(title, CHART_TITLE_MAX, "IRQ full pressure"); chart = pcs->share_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "irq_full_pressure" , NULL , "interrupts" @@ -4752,7 +4420,7 @@ void update_cgroup_charts(int update_every) 
{ RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "IRQ full pressure stall time"); chart = pcs->total_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "irq_full_pressure_stall_time" , NULL , "interrupts" @@ -4782,7 +4450,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "I/O some pressure"); chart = pcs->share_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "io_some_pressure" , NULL , "disk" @@ -4805,7 +4473,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "I/O some pressure stall time"); chart = pcs->total_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "io_some_pressure_stall_time" , NULL , "disk" @@ -4833,7 +4501,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "I/O full pressure"); chart = pcs->share_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "io_full_pressure" , NULL , "disk" @@ -4856,7 +4524,7 @@ void update_cgroup_charts(int update_every) { RRDSET *chart; snprintfz(title, CHART_TITLE_MAX, "I/O full pressure stall time"); chart = pcs->total_time.st = rrdset_create_localhost( - cgroup_chart_type(type, cg->chart_id, RRD_ID_LENGTH_MAX) + cgroup_chart_type(type, cg) , "io_full_pressure_stall_time" , NULL , "disk" diff --git a/collectors/cgroups.plugin/sys_fs_cgroup.h b/collectors/cgroups.plugin/sys_fs_cgroup.h index dc800ba91..625be755d 100644 --- a/collectors/cgroups.plugin/sys_fs_cgroup.h +++ b/collectors/cgroups.plugin/sys_fs_cgroup.h @@ -39,6 +39,6 @@ typedef struct netdata_ebpf_cgroup_shm { #include "../proc.plugin/plugin_proc.h" -char *cgroup_parse_resolved_name_and_labels(DICTIONARY *labels, char 
*data); +char *cgroup_parse_resolved_name_and_labels(RRDLABELS *labels, char *data); #endif //NETDATA_SYS_FS_CGROUP_H diff --git a/collectors/cgroups.plugin/tests/test_cgroups_plugin.c b/collectors/cgroups.plugin/tests/test_cgroups_plugin.c index a0f915309..bb1fb3988 100644 --- a/collectors/cgroups.plugin/tests/test_cgroups_plugin.c +++ b/collectors/cgroups.plugin/tests/test_cgroups_plugin.c @@ -20,13 +20,12 @@ struct k8s_test_data { int i; }; -static int read_label_callback(const char *name, const char *value, RRDLABEL_SRC ls, void *data) +static int read_label_callback(const char *name, const char *value, void *data) { struct k8s_test_data *test_data = (struct k8s_test_data *)data; test_data->result_key[test_data->i] = name; test_data->result_value[test_data->i] = value; - test_data->result_ls[test_data->i] = ls; test_data->i++; @@ -37,7 +36,7 @@ static void test_cgroup_parse_resolved_name(void **state) { UNUSED(state); - DICTIONARY *labels = rrdlabels_create(); + RRDLABELS *labels = rrdlabels_create(); struct k8s_test_data test_data[] = { // One label diff --git a/collectors/charts.d.plugin/ap/README.md b/collectors/charts.d.plugin/ap/README.md index 339ad1375..5b6e75130 100644..120000 --- a/collectors/charts.d.plugin/ap/README.md +++ b/collectors/charts.d.plugin/ap/README.md @@ -1,104 +1 @@ -<!-- -title: "Access point monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/ap/README.md" -sidebar_label: "Access points" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Remotes/Devices" ---> - -# Access point collector - -The `ap` collector visualizes data related to access points. - -## Example Netdata charts - -![image](https://cloud.githubusercontent.com/assets/2662304/12377654/9f566e88-bd2d-11e5-855a-e0ba96b8fd98.png) - -## How it works - -It does the following: - -1. Runs `iw dev` searching for interfaces that have `type AP`. 
- - From the same output it collects the SSIDs each AP supports by looking for lines `ssid NAME`. - - Example: - -```sh -# iw dev -phy#0 - Interface wlan0 - ifindex 3 - wdev 0x1 - addr 7c:dd:90:77:34:2a - ssid TSAOUSIS - type AP - channel 7 (2442 MHz), width: 20 MHz, center1: 2442 MHz -``` - -2. For each interface found, it runs `iw INTERFACE station dump`. - - From the output is collects: - - - rx/tx bytes - - rx/tx packets - - tx retries - - tx failed - - signal strength - - rx/tx bitrate - - expected throughput - - Example: - -```sh -# iw wlan0 station dump -Station 40:b8:37:5a:ed:5e (on wlan0) - inactive time: 910 ms - rx bytes: 15588897 - rx packets: 127772 - tx bytes: 52257763 - tx packets: 95802 - tx retries: 2162 - tx failed: 28 - signal: -43 dBm - signal avg: -43 dBm - tx bitrate: 65.0 MBit/s MCS 7 - rx bitrate: 1.0 MBit/s - expected throughput: 32.125Mbps - authorized: yes - authenticated: yes - preamble: long - WMM/WME: yes - MFP: no - TDLS peer: no -``` - -3. For each interface found, it creates 6 charts: - - - Number of Connected clients - - Bandwidth for all clients - - Packets for all clients - - Transmit Issues for all clients - - Average Signal among all clients - - Average Bitrate (including average expected throughput) among all clients - -## Configuration - -If using [our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/packages.md), make sure `netdata-plugin-chartsd` is installed. - -Edit the `charts.d/ap.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config charts.d/ap.conf -``` - -You can only set `ap_update_every=NUMBER` to change the data collection frequency. 
- -## Auto-detection - -The plugin is able to auto-detect if you are running access points on your linux box. - - +integrations/access_points.md
\ No newline at end of file diff --git a/collectors/charts.d.plugin/ap/integrations/access_points.md b/collectors/charts.d.plugin/ap/integrations/access_points.md new file mode 100644 index 000000000..0d8d39046 --- /dev/null +++ b/collectors/charts.d.plugin/ap/integrations/access_points.md @@ -0,0 +1,173 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/ap/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/ap/metadata.yaml" +sidebar_label: "Access Points" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Access Points + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: charts.d.plugin +Module: ap + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +The ap collector visualizes data related to wireless access points. + +It uses the `iw` command line utility to detect access points. For each interface that is of `type AP`, it then runs `iw INTERFACE station dump` and collects statistics. + +This collector is only supported on the following platforms: + +- Linux + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +The plugin is able to auto-detect if you are running access points on your linux box. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per wireless device + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ap.clients | clients | clients | +| ap.net | received, sent | kilobits/s | +| ap.packets | received, sent | packets/s | +| ap.issues | retries, failures | issues/s | +| ap.signal | average signal | dBm | +| ap.bitrate | receive, transmit, expected | Mbps | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Install charts.d plugin + +If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. + + +#### `iw` utility. + +Make sure the `iw` utility is installed. + + +### Configuration + +#### File + +The configuration file name for this integration is `charts.d/ap.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config charts.d/ap.conf +``` +#### Options + +The config file is sourced by the charts.d plugin. It's a standard bash file. + +The following collapsed table contains all the options that can be configured for the ap collector. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| ap_update_every | The data collection frequency. If unset, will inherit the netdata update frequency. | 1 | False | +| ap_priority | Controls the order of charts at the netdata dashboard. | 6900 | False | +| ap_retries | The number of retries to do in case of failure before disabling the collector. 
| 10 | False | + +</details> + +#### Examples + +##### Change the collection frequency + +Specify a custom collection frequency (update_every) for this collector + +```yaml +# the data collection frequency +# if unset, will inherit the netdata update frequency +ap_update_every=10 + +# the charts priority on the dashboard +#ap_priority=6900 + +# the number of retries to do in case of failure +# before disabling the module +#ap_retries=10 + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `ap` collector, run the `charts.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `charts.d.plugin` to debug the collector: + + ```bash + ./charts.d.plugin debug 1 ap + ``` + + diff --git a/collectors/charts.d.plugin/ap/metadata.yaml b/collectors/charts.d.plugin/ap/metadata.yaml index c4e96a14a..ee941e417 100644 --- a/collectors/charts.d.plugin/ap/metadata.yaml +++ b/collectors/charts.d.plugin/ap/metadata.yaml @@ -41,6 +41,9 @@ modules: setup: prerequisites: list: + - title: "Install charts.d plugin" + description: | + If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. - title: "`iw` utility." description: "Make sure the `iw` utility is installed." 
configuration: diff --git a/collectors/charts.d.plugin/apcupsd/README.md b/collectors/charts.d.plugin/apcupsd/README.md index 00e9697dc..fc6681fe6 100644..120000 --- a/collectors/charts.d.plugin/apcupsd/README.md +++ b/collectors/charts.d.plugin/apcupsd/README.md @@ -1,26 +1 @@ -<!-- -title: "APC UPS monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/apcupsd/README.md" -sidebar_label: "APC UPS" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Remotes/Devices" ---> - -# APC UPS collector - -Monitors different APC UPS models and retrieves status information using `apcaccess` tool. - -## Configuration - -If using [our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/packages.md), make sure `netdata-plugin-chartsd` is installed. - -Edit the `charts.d/apcupsd.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config charts.d/apcupsd.conf -``` - - +integrations/apc_ups.md
\ No newline at end of file diff --git a/collectors/charts.d.plugin/apcupsd/integrations/apc_ups.md b/collectors/charts.d.plugin/apcupsd/integrations/apc_ups.md new file mode 100644 index 000000000..4d1f2edd6 --- /dev/null +++ b/collectors/charts.d.plugin/apcupsd/integrations/apc_ups.md @@ -0,0 +1,193 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/apcupsd/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/apcupsd/metadata.yaml" +sidebar_label: "APC UPS" +learn_status: "Published" +learn_rel_path: "Data Collection/UPS" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# APC UPS + + +<img src="https://netdata.cloud/img/apc.svg" width="150"/> + + +Plugin: charts.d.plugin +Module: apcupsd + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor APC UPS performance with Netdata for optimal uninterruptible power supply operations. Enhance your power supply reliability with real-time APC UPS metrics. + +The collector uses the `apcaccess` tool to contact the `apcupsd` daemon and get the APC UPS statistics. + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +By default, with no configuration provided, the collector will try to contact 127.0.0.1:3551 using the `apcaccess` utility. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per ups + +Metrics related to UPS. Each UPS provides its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| apcupsd.charge | charge | percentage | +| apcupsd.battery.voltage | voltage, nominal | Volts | +| apcupsd.input.voltage | voltage, min, max | Volts | +| apcupsd.output.voltage | absolute, nominal | Volts | +| apcupsd.input.frequency | frequency | Hz | +| apcupsd.load | load | percentage | +| apcupsd.load_usage | load | Watts | +| apcupsd.temperature | temp | Celsius | +| apcupsd.time | time | Minutes | +| apcupsd.online | online | boolean | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ apcupsd_ups_charge ](https://github.com/netdata/netdata/blob/master/health/health.d/apcupsd.conf) | apcupsd.charge | average UPS charge over the last minute | +| [ apcupsd_10min_ups_load ](https://github.com/netdata/netdata/blob/master/health/health.d/apcupsd.conf) | apcupsd.load | average UPS load over the last 10 minutes | +| [ apcupsd_last_collected_secs ](https://github.com/netdata/netdata/blob/master/health/health.d/apcupsd.conf) | apcupsd.load | number of seconds since the last successful data collection | + + +## Setup + +### Prerequisites + +#### Install charts.d plugin + +If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. + + +#### Required software + +Make sure the `apcaccess` and `apcupsd` are installed and running. + + +### Configuration + +#### File + +The configuration file name for this integration is `charts.d/apcupsd.conf`. 
+ + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config charts.d/apcupsd.conf +``` +#### Options + +The config file is sourced by the charts.d plugin. It's a standard bash file. + +The following collapsed table contains all the options that can be configured for the apcupsd collector. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| apcupsd_sources | This is an array of apcupsd sources. You can have multiple entries there. Please refer to the example below on how to set it. | 127.0.0.1:3551 | False | +| apcupsd_timeout | How long to wait for apcupsd to respond. | 3 | False | +| apcupsd_update_every | The data collection frequency. If unset, will inherit the netdata update frequency. | 1 | False | +| apcupsd_priority | The charts priority on the dashboard. | 90000 | False | +| apcupsd_retries | The number of retries to do in case of failure before disabling the collector. 
| 10 | False | + +</details> + +#### Examples + +##### Multiple apcupsd sources + +Specify multiple apcupsd sources along with a custom update interval + +```yaml +# add all your APC UPSes in this array - uncomment it too +declare -A apcupsd_sources=( + ["local"]="127.0.0.1:3551" + ["remote"]="1.2.3.4:3551" +) + +# how long to wait for apcupsd to respond +#apcupsd_timeout=3 + +# the data collection frequency +# if unset, will inherit the netdata update frequency +apcupsd_update_every=5 + +# the charts priority on the dashboard +#apcupsd_priority=90000 + +# the number of retries to do in case of failure +# before disabling the module +#apcupsd_retries=10 + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `apcupsd` collector, run the `charts.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `charts.d.plugin` to debug the collector: + + ```bash + ./charts.d.plugin debug 1 apcupsd + ``` + + diff --git a/collectors/charts.d.plugin/apcupsd/metadata.yaml b/collectors/charts.d.plugin/apcupsd/metadata.yaml index d078074b7..07d56d48d 100644 --- a/collectors/charts.d.plugin/apcupsd/metadata.yaml +++ b/collectors/charts.d.plugin/apcupsd/metadata.yaml @@ -42,6 +42,9 @@ modules: setup: prerequisites: list: + - title: "Install charts.d plugin" + description: | + If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. 
- title: "Required software" description: "Make sure the `apcaccess` and `apcupsd` are installed and running." configuration: diff --git a/collectors/charts.d.plugin/charts.d.plugin.in b/collectors/charts.d.plugin/charts.d.plugin.in index 20996eb93..34a5a656e 100755 --- a/collectors/charts.d.plugin/charts.d.plugin.in +++ b/collectors/charts.d.plugin/charts.d.plugin.in @@ -20,6 +20,21 @@ PROGRAM_NAME="$(basename $0)" PROGRAM_NAME="${PROGRAM_NAME/.plugin/}" MODULE_NAME="main" +LOG_LEVEL_ERR=1 +LOG_LEVEL_WARN=2 +LOG_LEVEL_INFO=3 +LOG_LEVEL="$LOG_LEVEL_INFO" + +set_log_severity_level() { + case ${NETDATA_LOG_SEVERITY_LEVEL,,} in + "info") LOG_LEVEL="$LOG_LEVEL_INFO";; + "warn" | "warning") LOG_LEVEL="$LOG_LEVEL_WARN";; + "err" | "error") LOG_LEVEL="$LOG_LEVEL_ERR";; + esac +} + +set_log_severity_level + # ----------------------------------------------------------------------------- # create temp dir @@ -55,16 +70,19 @@ log() { } -warning() { - log WARNING "${@}" +info() { + [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_INFO" -gt "$LOG_LEVEL" ]] && return + log INFO "${@}" } -error() { - log ERROR "${@}" +warning() { + [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_WARN" -gt "$LOG_LEVEL" ]] && return + log WARNING "${@}" } -info() { - log INFO "${@}" +error() { + [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_ERR" -gt "$LOG_LEVEL" ]] && return + log ERROR "${@}" } fatal() { diff --git a/collectors/charts.d.plugin/libreswan/README.md b/collectors/charts.d.plugin/libreswan/README.md index b6eeb0180..1416d9597 100644..120000 --- a/collectors/charts.d.plugin/libreswan/README.md +++ b/collectors/charts.d.plugin/libreswan/README.md @@ -1,61 +1 @@ -<!-- -title: "Libreswan IPSec tunnel monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/libreswan/README.md" -sidebar_label: "Libreswan IPSec tunnels" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Networking" ---> - -# Libreswan IPSec tunnel collector - 
-Collects bytes-in, bytes-out and uptime for all established libreswan IPSEC tunnels. - -The following charts are created, **per tunnel**: - -1. **Uptime** - -- the uptime of the tunnel - -2. **Traffic** - -- bytes in -- bytes out - -## Configuration - -If using [our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/packages.md), make sure `netdata-plugin-chartsd` is installed. - -Edit the `charts.d/libreswan.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config charts.d/libreswan.conf -``` - -The plugin executes 2 commands to collect all the information it needs: - -```sh -ipsec whack --status -ipsec whack --trafficstatus -``` - -The first command is used to extract the currently established tunnels, their IDs and their names. -The second command is used to extract the current uptime and traffic. - -Most probably user `netdata` will not be able to query libreswan, so the `ipsec` commands will be denied. -The plugin attempts to run `ipsec` as `sudo ipsec ...`, to get access to libreswan statistics. - -To allow user `netdata` execute `sudo ipsec ...`, create the file `/etc/sudoers.d/netdata` with this content: - -``` -netdata ALL = (root) NOPASSWD: /sbin/ipsec whack --status -netdata ALL = (root) NOPASSWD: /sbin/ipsec whack --trafficstatus -``` - -Make sure the path `/sbin/ipsec` matches your setup (execute `which ipsec` to find the right path). - ---- - - +integrations/libreswan.md
\ No newline at end of file diff --git a/collectors/charts.d.plugin/libreswan/integrations/libreswan.md b/collectors/charts.d.plugin/libreswan/integrations/libreswan.md new file mode 100644 index 000000000..6f93a5f4c --- /dev/null +++ b/collectors/charts.d.plugin/libreswan/integrations/libreswan.md @@ -0,0 +1,193 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/libreswan/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/libreswan/metadata.yaml" +sidebar_label: "Libreswan" +learn_status: "Published" +learn_rel_path: "Data Collection/VPNs" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Libreswan + + +<img src="https://netdata.cloud/img/libreswan.png" width="150"/> + + +Plugin: charts.d.plugin +Module: libreswan + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Libreswan performance for optimal IPsec VPN operations. Improve your VPN operations with Netdata's real-time metrics and built-in alerts. + +The collector uses the `ipsec` command to collect the information it needs. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per IPSEC tunnel + +Metrics related to IPSEC tunnels. 
Each tunnel provides its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| libreswan.net | in, out | kilobits/s | +| libreswan.uptime | uptime | seconds | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Install charts.d plugin + +If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. + + +#### Permissions to execute `ipsec` + +The plugin executes 2 commands to collect all the information it needs: + +```sh +ipsec whack --status +ipsec whack --trafficstatus +``` + +The first command is used to extract the currently established tunnels, their IDs and their names. +The second command is used to extract the current uptime and traffic. + +Most probably user `netdata` will not be able to query libreswan, so the `ipsec` commands will be denied. +The plugin attempts to run `ipsec` as `sudo ipsec ...`, to get access to libreswan statistics. + +To allow user `netdata` execute `sudo ipsec ...`, create the file `/etc/sudoers.d/netdata` with this content: + +``` +netdata ALL = (root) NOPASSWD: /sbin/ipsec whack --status +netdata ALL = (root) NOPASSWD: /sbin/ipsec whack --trafficstatus +``` + +Make sure the path `/sbin/ipsec` matches your setup (execute `which ipsec` to find the right path). + + + +### Configuration + +#### File + +The configuration file name for this integration is `charts.d/libreswan.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). 
+ +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config charts.d/libreswan.conf +``` +#### Options + +The config file is sourced by the charts.d plugin. It's a standard bash file. + +The following collapsed table contains all the options that can be configured for the libreswan collector. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| libreswan_update_every | The data collection frequency. If unset, will inherit the netdata update frequency. | 1 | False | +| libreswan_priority | The charts priority on the dashboard | 90000 | False | +| libreswan_retries | The number of retries to do in case of failure before disabling the collector. | 10 | False | +| libreswan_sudo | Whether to run `ipsec` with `sudo` or not. | 1 | False | + +</details> + +#### Examples + +##### Run `ipsec` without sudo + +Run the `ipsec` utility without sudo + +```yaml +# the data collection frequency +# if unset, will inherit the netdata update frequency +#libreswan_update_every=1 + +# the charts priority on the dashboard +#libreswan_priority=90000 + +# the number of retries to do in case of failure +# before disabling the module +#libreswan_retries=10 + +# set to 1, to run ipsec with sudo (the default) +# set to 0, to run ipsec without sudo +libreswan_sudo=0 + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `libreswan` collector, run the `charts.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `charts.d.plugin` to debug the collector: + + ```bash + ./charts.d.plugin debug 1 libreswan + ``` + + diff --git a/collectors/charts.d.plugin/libreswan/metadata.yaml b/collectors/charts.d.plugin/libreswan/metadata.yaml index 484d79ede..77cb25450 100644 --- a/collectors/charts.d.plugin/libreswan/metadata.yaml +++ b/collectors/charts.d.plugin/libreswan/metadata.yaml @@ -40,6 +40,9 @@ modules: setup: prerequisites: list: + - title: "Install charts.d plugin" + description: | + If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. - title: "Permissions to execute `ipsec`" description: | The plugin executes 2 commands to collect all the information it needs: diff --git a/collectors/charts.d.plugin/nut/README.md b/collectors/charts.d.plugin/nut/README.md index 4608ce3e1..abfefd6f7 100644..120000 --- a/collectors/charts.d.plugin/nut/README.md +++ b/collectors/charts.d.plugin/nut/README.md @@ -1,79 +1 @@ -<!-- -title: "UPS/PDU monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/nut/README.md" -sidebar_label: "UPS/PDU" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Remotes/Devices" ---> - -# UPS/PDU collector - -Collects UPS data for all power devices configured in the system. - -The following charts will be created: - -1. **UPS Charge** - -- percentage changed - -2. **UPS Battery Voltage** - -- current voltage -- high voltage -- low voltage -- nominal voltage - -3. **UPS Input Voltage** - -- current voltage -- fault voltage -- nominal voltage - -4. **UPS Input Current** - -- nominal current - -5. **UPS Input Frequency** - -- current frequency -- nominal frequency - -6. **UPS Output Voltage** - -- current voltage - -7. 
**UPS Load** - -- current load - -8. **UPS Temperature** - -- current temperature - -## Configuration - -If using [our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/packages.md), make sure `netdata-plugin-chartsd` is installed. - -Edit the `charts.d/nut.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config charts.d/nut.conf -``` - -This is the internal default for `charts.d/nut.conf` - -```sh -# a space separated list of UPS names -# if empty, the list returned by 'upsc -l' will be used -nut_ups= - -# how frequently to collect UPS data -nut_update_every=2 -``` - ---- - - +integrations/network_ups_tools_nut.md
\ No newline at end of file diff --git a/collectors/charts.d.plugin/nut/integrations/network_ups_tools_nut.md b/collectors/charts.d.plugin/nut/integrations/network_ups_tools_nut.md new file mode 100644 index 000000000..74be607a1 --- /dev/null +++ b/collectors/charts.d.plugin/nut/integrations/network_ups_tools_nut.md @@ -0,0 +1,207 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/nut/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/nut/metadata.yaml" +sidebar_label: "Network UPS Tools (NUT)" +learn_status: "Published" +learn_rel_path: "Data Collection/UPS" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Network UPS Tools (NUT) + + +<img src="https://netdata.cloud/img/plug-circle-bolt.svg" width="150"/> + + +Plugin: charts.d.plugin +Module: nut + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Examine UPS/PDU metrics with Netdata for insights into power device performance. Improve your power device performance with comprehensive dashboards and anomaly detection. + +This collector uses the `nut` (Network UPS Tools) to query statistics for multiple UPS devices. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per ups + +Metrics related to UPS. Each UPS provides its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| nut.charge | charge | percentage | +| nut.runtime | runtime | seconds | +| nut.battery.voltage | voltage, high, low, nominal | Volts | +| nut.input.voltage | voltage, fault, nominal | Volts | +| nut.input.current | nominal | Ampere | +| nut.input.frequency | frequency, nominal | Hz | +| nut.output.voltage | voltage | Volts | +| nut.load | load | percentage | +| nut.load_usage | load_usage | Watts | +| nut.temperature | temp | temperature | +| nut.clients | clients | clients | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ nut_ups_charge ](https://github.com/netdata/netdata/blob/master/health/health.d/nut.conf) | nut.charge | average UPS charge over the last minute | +| [ nut_10min_ups_load ](https://github.com/netdata/netdata/blob/master/health/health.d/nut.conf) | nut.load | average UPS load over the last 10 minutes | +| [ nut_last_collected_secs ](https://github.com/netdata/netdata/blob/master/health/health.d/nut.conf) | nut.load | number of seconds since the last successful data collection | + + +## Setup + +### Prerequisites + +#### Install charts.d plugin + +If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. + + +#### Required software + +Make sure the Network UPS Tools (`nut`) is installed and can detect your UPS devices. + + +### Configuration + +#### File + +The configuration file name for this integration is `charts.d/nut.conf`. 
+ + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config charts.d/nut.conf +``` +#### Options + +The config file is sourced by the charts.d plugin. It's a standard bash file. + +The following collapsed table contains all the options that can be configured for the nut collector. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| nut_ups | A space separated list of UPS names. If empty, the list returned by `upsc -l` will be used. | | False | +| nut_names | Each line represents an alias for one UPS. If empty, the FQDN will be used. | | False | +| nut_timeout | How long to wait for nut to respond. | 2 | False | +| nut_clients_chart | Set this to 1 to enable another chart showing the number of UPS clients connected to `upsd`. | 1 | False | +| nut_update_every | The data collection frequency. If unset, will inherit the netdata update frequency. | 2 | False | +| nut_priority | The charts priority on the dashboard | 90000 | False | +| nut_retries | The number of retries to do in case of failure before disabling the collector. 
| 10 | False | + +</details> + +#### Examples + +##### Provide names to UPS devices + +Map aliases to UPS devices + +<details><summary>Config</summary> + +```yaml +# a space separated list of UPS names +# if empty, the list returned by 'upsc -l' will be used +#nut_ups= + +# each line represents an alias for one UPS +# if empty, the FQDN will be used +nut_names["XXXXXX"]="UPS-office" +nut_names["YYYYYY"]="UPS-rack" + +# how much time in seconds, to wait for nut to respond +#nut_timeout=2 + +# set this to 1, to enable another chart showing the number +# of UPS clients connected to upsd +#nut_clients_chart=1 + +# the data collection frequency +# if unset, will inherit the netdata update frequency +#nut_update_every=2 + +# the charts priority on the dashboard +#nut_priority=90000 + +# the number of retries to do in case of failure +# before disabling the module +#nut_retries=10 + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `nut` collector, run the `charts.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `charts.d.plugin` to debug the collector: + + ```bash + ./charts.d.plugin debug 1 nut + ``` + + diff --git a/collectors/charts.d.plugin/nut/metadata.yaml b/collectors/charts.d.plugin/nut/metadata.yaml index ea2e6b2eb..ed3ffebf7 100644 --- a/collectors/charts.d.plugin/nut/metadata.yaml +++ b/collectors/charts.d.plugin/nut/metadata.yaml @@ -40,6 +40,9 @@ modules: setup: prerequisites: list: + - title: "Install charts.d plugin" + description: | + If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. - title: "Required software" description: "Make sure the Network UPS Tools (`nut`) is installed and can detect your UPS devices." configuration: diff --git a/collectors/charts.d.plugin/opensips/README.md b/collectors/charts.d.plugin/opensips/README.md index 1d7322140..bb85ba6d0 100644..120000 --- a/collectors/charts.d.plugin/opensips/README.md +++ b/collectors/charts.d.plugin/opensips/README.md @@ -1,24 +1 @@ -<!-- -title: "OpenSIPS monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/opensips/README.md" -sidebar_label: "OpenSIPS" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Networking" ---> - -# OpenSIPS collector - -## Configuration - -If using [our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/packages.md), make sure `netdata-plugin-chartsd` is installed. - -Edit the `charts.d/opensips.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. 
- -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config charts.d/opensips.conf -``` - - +integrations/opensips.md
\ No newline at end of file diff --git a/collectors/charts.d.plugin/opensips/integrations/opensips.md b/collectors/charts.d.plugin/opensips/integrations/opensips.md new file mode 100644 index 000000000..96abc3325 --- /dev/null +++ b/collectors/charts.d.plugin/opensips/integrations/opensips.md @@ -0,0 +1,191 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/opensips/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/opensips/metadata.yaml" +sidebar_label: "OpenSIPS" +learn_status: "Published" +learn_rel_path: "Data Collection/Telephony Servers" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# OpenSIPS + + +<img src="https://netdata.cloud/img/opensips.png" width="150"/> + + +Plugin: charts.d.plugin +Module: opensips + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Examine OpenSIPS metrics for insights into SIP server operations. Study call rates, error rates, and response times for reliable voice over IP services. + +The collector uses the `opensipsctl` command line utility to gather OpenSIPS metrics. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +The collector will attempt to call `opensipsctl` along with a default number of parameters, even without any configuration. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels. + + + +### Per OpenSIPS instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| opensips.dialogs_active | active, early | dialogs | +| opensips.users | registered, location, contacts, expires | users | +| opensips.registrar | accepted, rejected | registrations/s | +| opensips.transactions | UAS, UAC | transactions/s | +| opensips.core_rcv | requests, replies | queries/s | +| opensips.core_fwd | requests, replies | queries/s | +| opensips.core_drop | requests, replies | queries/s | +| opensips.core_err | requests, replies | queries/s | +| opensips.core_bad | bad_URIs_rcvd, unsupported_methods, bad_msg_hdr | queries/s | +| opensips.tm_replies | received, relayed, local | replies/s | +| opensips.transactions_status | 2xx, 3xx, 4xx, 5xx, 6xx | transactions/s | +| opensips.transactions_inuse | inuse | transactions | +| opensips.sl_replies | 1xx, 2xx, 3xx, 4xx, 5xx, 6xx, sent, error, ACKed | replies/s | +| opensips.dialogs | processed, expire, failed | dialogs/s | +| opensips.net_waiting | UDP, TCP | kilobytes | +| opensips.uri_checks | positive, negative | checks / sec | +| opensips.traces | requests, replies | traces / sec | +| opensips.shmem | total, used, real_used, max_used, free | kilobytes | +| opensips.shmem_fragment | fragments | fragments | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Install charts.d plugin + +If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. + + +#### Required software + +The collector requires the `opensipsctl` to be installed. 
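Before configuring the collector, it can help to confirm that `opensipsctl` is actually reachable. The following is a minimal sketch, not part of the collector: `find_opensipsctl` is a hypothetical helper that prefers an explicit path (the kind of value you would put in `opensips_cmd`) and otherwise falls back to a `$PATH` lookup.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the opensips collector.
# Prints the path of a usable opensipsctl binary, or returns non-zero.
find_opensipsctl() {
  local explicit="$1"
  if [ -n "$explicit" ]; then
    # An explicit path must exist and be executable.
    if [ -x "$explicit" ]; then
      printf '%s\n' "$explicit"
      return 0
    fi
    return 1
  fi
  # No explicit path given: look the command up in $PATH.
  command -v opensipsctl
}

# Example use (assumes the documented default options work on your system):
#   cmd="$(find_opensipsctl "")" && timeout 2 "$cmd" fifo get_statistics all | head -n 5
```

If the helper returns non-zero, install `opensipsctl` or point the collector at its full path before debugging further.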
+ + +### Configuration + +#### File + +The configuration file name for this integration is `charts.d/opensips.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config charts.d/opensips.conf +``` +#### Options + +The config file is sourced by the charts.d plugin. It's a standard bash file. + +The following collapsed table contains all the options that can be configured for the opensips collector. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| opensips_opts | Specify parameters to the `opensipsctl` command. If the default value fails to get global status, set here whatever options are needed to connect to the opensips server. | fifo get_statistics all | False | +| opensips_cmd | If `opensipsctl` is not in $PATH, specify its full path here. | | False | +| opensips_timeout | How long to wait for `opensipsctl` to respond. | 2 | False | +| opensips_update_every | The data collection frequency. If unset, will inherit the netdata update frequency. | 5 | False | +| opensips_priority | The charts priority on the dashboard. | 80000 | False | +| opensips_retries | The number of retries to do in case of failure before disabling the collector. 
| 10 | False | + +</details> + +#### Examples + +##### Custom `opensipsctl` command + +Set a custom path to the `opensipsctl` command + +```yaml +#opensips_opts="fifo get_statistics all" +opensips_cmd=/opt/opensips/bin/opensipsctl +#opensips_timeout=2 + +# the data collection frequency +# if unset, will inherit the netdata update frequency +#opensips_update_every=5 + +# the charts priority on the dashboard +#opensips_priority=80000 + +# the number of retries to do in case of failure +# before disabling the module +#opensips_retries=10 + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `opensips` collector, run the `charts.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `charts.d.plugin` to debug the collector: + + ```bash + ./charts.d.plugin debug 1 opensips + ``` + + diff --git a/collectors/charts.d.plugin/opensips/metadata.yaml b/collectors/charts.d.plugin/opensips/metadata.yaml index 27f663286..356de5615 100644 --- a/collectors/charts.d.plugin/opensips/metadata.yaml +++ b/collectors/charts.d.plugin/opensips/metadata.yaml @@ -41,6 +41,9 @@ modules: setup: prerequisites: list: + - title: "Install charts.d plugin" + description: | + If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. - title: "Required software" description: "The collector requires the `opensipsctl` to be installed." 
configuration: diff --git a/collectors/charts.d.plugin/sensors/README.md b/collectors/charts.d.plugin/sensors/README.md index 0dbe96225..7e5a416c4 100644..120000 --- a/collectors/charts.d.plugin/sensors/README.md +++ b/collectors/charts.d.plugin/sensors/README.md @@ -1,81 +1 @@ -# Linux machine sensors collector - -Use this collector when `lm-sensors` doesn't work on your device (e.g. for RPi temperatures). -For all other cases use the [Python collector](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/sensors), which supports multiple -jobs, is more efficient and performs calculations on top of the kernel provided values. - -This plugin will provide charts for all configured system sensors, by reading sensors directly from the kernel. -The values graphed are the raw hardware values of the sensors. - -The plugin will create Netdata charts for: - -1. **Temperature** -2. **Voltage** -3. **Current** -4. **Power** -5. **Fans Speed** -6. **Energy** -7. **Humidity** - -One chart for every sensor chip found and each of the above will be created. - -## Enable the collector - -If using [our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/packages.md), make sure `netdata-plugin-chartsd` is installed. - -The `sensors` collector is disabled by default. - -To enable the collector, you need to edit the configuration file of `charts.d/sensors.conf`. You can do so by using the `edit config` script. - -> ### Info -> -> To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. 
-> It is recommended to use this way for configuring Netdata. -> -> Please also note that after most configuration changes you will need to [restart the Agent](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for the changes to take effect. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config charts.d.conf -``` - -You need to uncomment the regarding `sensors`, and set the value to `force`. - -```shell -# example=force -sensors=force -``` - -## Configuration - -Edit the `charts.d/sensors.conf` configuration file using `edit-config`: - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config charts.d/sensors.conf -``` - -This is the internal default for `charts.d/sensors.conf` - -```sh -# the directory the kernel keeps sensor data -sensors_sys_dir="${NETDATA_HOST_PREFIX}/sys/devices" - -# how deep in the tree to check for sensor data -sensors_sys_depth=10 - -# if set to 1, the script will overwrite internal -# script functions with code generated ones -# leave to 1, is faster -sensors_source_update=1 - -# how frequently to collect sensor data -# the default is to collect it at every iteration of charts.d -sensors_update_every= - -# array of sensors which are excluded -# the default is to include all -sensors_excluded=() -``` - ---- +integrations/linux_sensors_sysfs.md
\ No newline at end of file diff --git a/collectors/charts.d.plugin/sensors/integrations/linux_sensors_sysfs.md b/collectors/charts.d.plugin/sensors/integrations/linux_sensors_sysfs.md new file mode 100644 index 000000000..e0ce74d06 --- /dev/null +++ b/collectors/charts.d.plugin/sensors/integrations/linux_sensors_sysfs.md @@ -0,0 +1,200 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/sensors/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/charts.d.plugin/sensors/metadata.yaml" +sidebar_label: "Linux Sensors (sysfs)" +learn_status: "Published" +learn_rel_path: "Data Collection/Hardware Devices and Sensors" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Linux Sensors (sysfs) + + +<img src="https://netdata.cloud/img/microchip.svg" width="150"/> + + +Plugin: charts.d.plugin +Module: sensors + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Use this collector when `lm-sensors` doesn't work on your device (e.g. for RPi temperatures). +For all other cases use the [Python collector](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/sensors), which supports multiple jobs, is more efficient and performs calculations on top of the kernel provided values. + + +It will provide charts for all configured system sensors, by reading sensors directly from the kernel. +The values graphed are the raw hardware values of the sensors. + + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. 
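To get a feel for what "reading sensors directly from the kernel" means, the sketch below walks a sysfs-like tree and prints the raw value of every temperature input it finds. It is an illustration only, not the collector's implementation: `list_sensor_inputs` is a hypothetical helper, and the `temp*_input` file pattern and the use of `find` are assumptions. The starting directory and depth mirror the collector's documented defaults (`/sys/devices` and `10`).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the sensors collector.
# Usage: list_sensor_inputs [DIR] [MAX_DEPTH]
# Prints "<file> <raw value>" for each temperature input under DIR.
list_sensor_inputs() {
  local dir="${1:-/sys/devices}" depth="${2:-10}" file raw
  find "$dir" -maxdepth "$depth" -type f -name 'temp*_input' 2>/dev/null | sort |
    while IFS= read -r file; do
      # Some sensor files fail to read; skip them instead of aborting.
      raw="$(cat "$file" 2>/dev/null)" || continue
      # Keep only plain integer readings (raw, unscaled hardware values).
      [[ "$raw" =~ ^-?[0-9]+$ ]] || continue
      printf '%s %s\n' "$file" "$raw"
    done
}
```

Running `list_sensor_inputs /sys/devices 10` on a Linux host should list whatever temperature inputs the kernel exposes there. An empty result suggests there is nothing for this collector to chart on that machine.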
+ + +### Default Behavior + +#### Auto-Detection + +By default, the collector will try to read entries under `/sys/devices` + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per sensor chip + +Metrics related to sensor chips. Each chip provides its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| sensors.temp | {filename} | Celsius | +| sensors.volt | {filename} | Volts | +| sensors.curr | {filename} | Ampere | +| sensors.power | {filename} | Watt | +| sensors.fans | {filename} | Rotations / Minute | +| sensors.energy | {filename} | Joule | +| sensors.humidity | {filename} | Percent | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Install charts.d plugin + +If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. + + +#### Enable the sensors collector + +The `sensors` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `charts.d.conf` file. + +```bash +cd /etc/netdata # Replace this path with your Netdata config directory, if different +sudo ./edit-config charts.d.conf +``` + +Change the value of the `sensors` setting to `force` and uncomment the line. 
Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. + + + +### Configuration + +#### File + +The configuration file name for this integration is `charts.d/sensors.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config charts.d/sensors.conf +``` +#### Options + +The config file is sourced by the charts.d plugin. It's a standard bash file. + +The following collapsed table contains all the options that can be configured for the sensors collector. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| sensors_sys_dir | The directory the kernel exposes sensor data. | /sys/devices | False | +| sensors_sys_depth | How deep in the tree to check for sensor data. | 10 | False | +| sensors_source_update | If set to 1, the script will overwrite internal script functions with code generated ones. | 1 | False | +| sensors_update_every | The data collection frequency. If unset, will inherit the netdata update frequency. | 1 | False | +| sensors_priority | The charts priority on the dashboard. | 90000 | False | +| sensors_retries | The number of retries to do in case of failure before disabling the collector. 
| 10 | False | + +</details> + +#### Examples + +##### Set sensors path depth + +Set a different sensors path depth + +```yaml +# the directory the kernel keeps sensor data +#sensors_sys_dir="/sys/devices" + +# how deep in the tree to check for sensor data +sensors_sys_depth=5 + +# if set to 1, the script will overwrite internal +# script functions with code generated ones +# leave to 1, is faster +#sensors_source_update=1 + +# the data collection frequency +# if unset, will inherit the netdata update frequency +#sensors_update_every= + +# the charts priority on the dashboard +#sensors_priority=90000 + +# the number of retries to do in case of failure +# before disabling the module +#sensors_retries=10 + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `sensors` collector, run the `charts.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `charts.d.plugin` to debug the collector: + + ```bash + ./charts.d.plugin debug 1 sensors + ``` + + diff --git a/collectors/charts.d.plugin/sensors/metadata.yaml b/collectors/charts.d.plugin/sensors/metadata.yaml index 33beaad29..47f6f4042 100644 --- a/collectors/charts.d.plugin/sensors/metadata.yaml +++ b/collectors/charts.d.plugin/sensors/metadata.yaml @@ -44,7 +44,20 @@ modules: description: "" setup: prerequisites: - list: [] + list: + - title: "Install charts.d plugin" + description: | + If [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure `netdata-plugin-chartsd` is installed. + - title: "Enable the sensors collector" + description: | + The `sensors` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `charts.d.conf` file. + + ```bash + cd /etc/netdata # Replace this path with your Netdata config directory, if different + sudo ./edit-config charts.d.conf + ``` + + Change the value of the `sensors` setting to `force` and uncomment the line. Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. 
configuration: file: name: charts.d/sensors.conf diff --git a/collectors/cups.plugin/README.md b/collectors/cups.plugin/README.md index 8652ec575..e32570639 100644..120000 --- a/collectors/cups.plugin/README.md +++ b/collectors/cups.plugin/README.md @@ -1,68 +1 @@ -<!-- -title: "Printers (cups.plugin)" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cups.plugin/README.md" -sidebar_label: "cups.plugin" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Remotes/Devices" ---> - -# Printers (cups.plugin) - -`cups.plugin` collects Common Unix Printing System (CUPS) metrics. - -## Prerequisites - -This plugin needs a running local CUPS daemon (`cupsd`). This plugin does not need any configuration. Supports cups since version 1.7. - -If you installed Netdata using our native packages, you will have to additionally install `netdata-plugin-cups` to use this plugin for data collection. It is not installed by default due to the large number of dependencies it requires. - -## Charts - -`cups.plugin` provides one common section `destinations` and one section per destination. - -> Destinations in CUPS represent individual printers or classes (collections or pools) of printers (<https://www.cups.org/doc/cupspm.html#working-with-destinations>) - -The section `server` provides these charts: - -1. **destinations by state** - - - idle - - printing - - stopped - -2. **destinations by options** - - - total - - accepting jobs - - shared - -3. **total job number by status** - - - pending - - processing - - held - -4. **total job size by status** - - - pending - - processing - - held - -For each destination the plugin provides these charts: - -1. **job number by status** - - - pending - - held - - processing - -2. 
**job size by status** - - - pending - - held - - processing - -At the moment only job status pending, processing, and held are reported because we do not have a method to collect stopped, canceled, aborted and completed jobs which scales. - - +integrations/cups.md
\ No newline at end of file diff --git a/collectors/cups.plugin/cups_plugin.c b/collectors/cups.plugin/cups_plugin.c index ce7f05d4d..82bc457a1 100644 --- a/collectors/cups.plugin/cups_plugin.c +++ b/collectors/cups.plugin/cups_plugin.c @@ -241,6 +241,8 @@ int main(int argc, char **argv) { error_log_errors_per_period = 100; error_log_throttle_period = 3600; + log_set_global_severity_for_external_plugins(); + parse_command_line(argc, argv); errno = 0; diff --git a/collectors/cups.plugin/integrations/cups.md b/collectors/cups.plugin/integrations/cups.md new file mode 100644 index 000000000..aa981a99e --- /dev/null +++ b/collectors/cups.plugin/integrations/cups.md @@ -0,0 +1,140 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/cups.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/cups.plugin/metadata.yaml" +sidebar_label: "CUPS" +learn_status: "Published" +learn_rel_path: "Data Collection/Hardware Devices and Sensors" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# CUPS + + +<img src="https://netdata.cloud/img/cups.png" width="150"/> + + +Plugin: cups.plugin +Module: cups.plugin + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor CUPS performance for achieving optimal printing system operations. Monitor job statuses, queue lengths, and error rates to ensure smooth printing tasks. + +The plugin uses CUPS shared library to connect and monitor the server. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs to access the server. Netdata sets permissions during installation time to reach the server through its library. 
+ +### Default Behavior + +#### Auto-Detection + +The plugin detects when CUPS server is running and tries to connect to it. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per CUPS instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cups.dests_state | idle, printing, stopped | dests | +| cups.dests_option | total, acceptingjobs, shared | dests | +| cups.job_num | pending, held, processing | jobs | +| cups.job_size | pending, held, processing | KB | + +### Per destination + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cups.destination_job_num | pending, held, processing | jobs | +| cups.destination_job_size | pending, held, processing | KB | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Minimum setup + +The CUPS server must be installed and running. If you installed `netdata` using a package manager, it is also necessary to install the package `netdata-plugin-cups`. + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:cups]` section within that file. + +The file format is a modified INI syntax. 
The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 1 | False | +| command options | Additional parameters for the collector | | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/cups.plugin/metadata.yaml b/collectors/cups.plugin/metadata.yaml index a416d392e..9ec2f4118 100644 --- a/collectors/cups.plugin/metadata.yaml +++ b/collectors/cups.plugin/metadata.yaml @@ -37,7 +37,7 @@ modules: prerequisites: list: - title: Minimum setup - description: "The CUPS server must be installed and running." + description: "The CUPS server must be installed and running. If you installed `netdata` using a package manager, it is also necessary to install the package `netdata-plugin-cups`." 
 configuration: file: name: "netdata.conf" diff --git a/collectors/debugfs.plugin/debugfs_plugin.c b/collectors/debugfs.plugin/debugfs_plugin.c index c189f908d..105b0a9e4 100644 --- a/collectors/debugfs.plugin/debugfs_plugin.c +++ b/collectors/debugfs.plugin/debugfs_plugin.c @@ -168,6 +168,8 @@ int main(int argc, char **argv) // disable syslog for debugfs.plugin error_log_syslog = 0; + log_set_global_severity_for_external_plugins(); + netdata_configured_host_prefix = getenv("NETDATA_HOST_PREFIX"); if (verify_netdata_host_prefix() == -1) exit(1); diff --git a/collectors/debugfs.plugin/integrations/linux_zswap.md b/collectors/debugfs.plugin/integrations/linux_zswap.md new file mode 100644 index 000000000..fa3948149 --- /dev/null +++ b/collectors/debugfs.plugin/integrations/linux_zswap.md @@ -0,0 +1,137 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/debugfs.plugin/integrations/linux_zswap.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/debugfs.plugin/metadata.yaml" +sidebar_label: "Linux ZSwap" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Memory" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Linux ZSwap + + +<img src="https://netdata.cloud/img/microchip.svg" width="150"/> + + +Plugin: debugfs.plugin +Module: /sys/kernel/debug/zswap + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collects zswap performance metrics on Linux systems. + + +Parses data from `debugfs` files. + +This collector is only supported on the following platforms: + +- Linux + +This collector only supports collecting metrics from a single instance of this integration. + +This integration requires read access to files under `/sys/kernel/debug/zswap`, which are accessible only to the root user by default. Netdata uses Linux Capabilities to give the plugin access to debugfs. 
`CAP_DAC_READ_SEARCH` is added automatically during installation. This capability allows bypassing file read permission checks and directory read and execute permission checks. If file capabilities are not usable, then the plugin is instead installed with the SUID bit set in permissions so that it runs as root. + + +### Default Behavior + +#### Auto-Detection + +Assuming that debugfs is mounted and the required permissions are available, this integration will automatically detect whether or not the system is using zswap. + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + +Monitor the performance statistics of zswap. + +### Per Linux ZSwap instance + +Global zswap performance metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.zswap_pool_compression_ratio | compression_ratio | ratio | +| system.zswap_pool_compressed_size | compressed_size | bytes | +| system.zswap_pool_raw_size | uncompressed_size | bytes | +| system.zswap_rejections | compress_poor, kmemcache_fail, alloc_fail, reclaim_fail | rejections/s | +| system.zswap_pool_limit_hit | limit | events/s | +| system.zswap_written_back_raw_bytes | written_back | bytes/s | +| system.zswap_same_filled_raw_size | same_filled | bytes | +| system.zswap_duplicate_entry | duplicate | entries/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### filesystem + +The debugfs filesystem must be mounted on your host for plugin to collect data. 
You can mount it manually by running `sudo mount -t debugfs none /sys/kernel/debug/`. It is also recommended to add an entry to your fstab(5) so that the filesystem is mounted automatically before Netdata starts. + + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:debugfs]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 1 | False | +| command options | Additional parameters for the collector | | False | + +</details> + +#### Examples +There are no configuration examples. 
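Annotation (not part of the commit): the filesystem prerequisite in the file above recommends an fstab entry for debugfs but does not show one. A minimal sketch of such an entry follows, assuming the conventional `/etc/fstab` location and the same mount point used by the `mount` command quoted in the patch; the `defaults` option set is an assumption and may need tightening on your system.

```conf
# /etc/fstab -- illustrative entry, not taken from the Netdata sources
# <device>  <mount point>       <type>   <options>  <dump>  <pass>
debugfs     /sys/kernel/debug   debugfs  defaults   0       0
```

With an entry like this in place, `sudo mount -a` should mount the filesystem without a reboot. Many systemd-based distributions already mount debugfs at boot, in which case no fstab change is needed.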
+ + diff --git a/collectors/debugfs.plugin/integrations/power_capping.md b/collectors/debugfs.plugin/integrations/power_capping.md new file mode 100644 index 000000000..b17ece9a6 --- /dev/null +++ b/collectors/debugfs.plugin/integrations/power_capping.md @@ -0,0 +1,131 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/debugfs.plugin/integrations/power_capping.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/debugfs.plugin/metadata.yaml" +sidebar_label: "Power Capping" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Kernel" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Power Capping + + +<img src="https://netdata.cloud/img/powersupply.svg" width="150"/> + + +Plugin: debugfs.plugin +Module: intel_rapl + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collects power capping performance metrics on Linux systems. + + +Parses data from `debugfs` files. + +This collector is only supported on the following platforms: + +- Linux + +This collector only supports collecting metrics from a single instance of this integration. + +This integration requires read access to files under `/sys/devices/virtual/powercap`, which are accessible only to the root user by default. Netdata uses Linux Capabilities to give the plugin access to debugfs. `CAP_DAC_READ_SEARCH` is added automatically during installation. This capability allows bypassing file read permission checks and directory read and execute permission checks. If file capabilities are not usable, then the plugin is instead installed with the SUID bit set in permissions so that it runs as root. + + +### Default Behavior + +#### Auto-Detection + +Assuming that debugfs is mounted and the required permissions are available, this integration will automatically detect whether or not the system is using zswap. 
+ + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + +Monitor the Intel RAPL zones Consumption. + +### Per Power Capping instance + +Global Intel RAPL zones. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cpu.powercap_intel_rapl_zone | Power | Watts | +| cpu.powercap_intel_rapl_subzones | dram, core, uncore | Watts | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### filesystem + +The debugfs filesystem must be mounted on your host for plugin to collect data. You can run the command-line (`sudo mount -t debugfs none /sys/kernel/debug/`) to mount it locally. It is also recommended to modify your fstab (5) avoiding necessity to mount the filesystem before starting netdata. + + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:debugfs]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). 
+ +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 1 | False | +| command options | Additional parameters for the collector | | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/debugfs.plugin/integrations/system_memory_fragmentation.md b/collectors/debugfs.plugin/integrations/system_memory_fragmentation.md new file mode 100644 index 000000000..5eed517ed --- /dev/null +++ b/collectors/debugfs.plugin/integrations/system_memory_fragmentation.md @@ -0,0 +1,135 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/debugfs.plugin/integrations/system_memory_fragmentation.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/debugfs.plugin/metadata.yaml" +sidebar_label: "System Memory Fragmentation" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Memory" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# System Memory Fragmentation + + +<img src="https://netdata.cloud/img/microchip.svg" width="150"/> + + +Plugin: debugfs.plugin +Module: /sys/kernel/debug/extfrag + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collects memory fragmentation statistics from the Linux kernel. + +Parses data from `debugfs` files. + +This collector is only supported on the following platforms: + +- Linux + +This collector only supports collecting metrics from a single instance of this integration. + +This integration requires read access to files under `/sys/kernel/debug/extfrag`, which are accessible only to the root user by default. 
Netdata uses Linux Capabilities to give the plugin access to debugfs. `CAP_DAC_READ_SEARCH` is added automatically during installation. This capability allows bypassing file read permission checks and directory read and execute permission checks. If file capabilities are not usable, then the plugin is instead installed with the SUID bit set in permissions so that it runs as root. + + +### Default Behavior + +#### Auto-Detection + +Assuming that debugfs is mounted and the required permissions are available, this integration will automatically run by default. + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + +Monitor the overall memory fragmentation of the system. + +### Per node + +Memory fragmentation statistics for each NUMA node in the system. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| numa_node | The NUMA node the metrics are associated with. | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.fragmentation_index_dma | order0, order1, order2, order3, order4, order5, order6, order7, order8, order9, order10 | index | +| mem.fragmentation_index_dma32 | order0, order1, order2, order3, order4, order5, order6, order7, order8, order9, order10 | index | +| mem.fragmentation_index_normal | order0, order1, order2, order3, order4, order5, order6, order7, order8, order9, order10 | index | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### filesystem + +The debugfs filesystem must be mounted on your host for plugin to collect data. 
You can mount it manually by running `sudo mount -t debugfs none /sys/kernel/debug/`. It is also recommended to add an entry to your fstab(5) so that the filesystem is mounted automatically before Netdata starts. + + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:debugfs]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 1 | False | +| command options | Additional parameters for the collector | | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/debugfs.plugin/sys_devices_virtual_powercap.c b/collectors/debugfs.plugin/sys_devices_virtual_powercap.c index 5f22b19e2..f79b944a9 100644 --- a/collectors/debugfs.plugin/sys_devices_virtual_powercap.c +++ b/collectors/debugfs.plugin/sys_devices_virtual_powercap.c @@ -186,7 +186,7 @@ int do_sys_devices_virtual_powercap(int update_every, const char *name __maybe_u if(get_measurement(zone->path, &zone->energy_uj)) { fprintf(stdout, "BEGIN '%s'\n" - "SET power = %lld\n" + "SET power = %llu\n" "END\n" , zone->zone_chart_id , zone->energy_uj); @@ -200,7 +200,7 @@ int do_sys_devices_virtual_powercap(int update_every, const char *name __maybe_u for (struct zone_t *subzone = zone->subzones; subzone; subzone = subzone->next) { if(get_measurement(subzone->path, &subzone->energy_uj)) { fprintf(stdout, - "SET '%s' = %lld\n", + "SET '%s' = %llu\n", subzone->name, subzone->energy_uj); } diff --git a/collectors/diskspace.plugin/README.md b/collectors/diskspace.plugin/README.md index 5ca1090fd..c9f4e1c5e 100644..120000 --- a/collectors/diskspace.plugin/README.md +++ b/collectors/diskspace.plugin/README.md @@ -1,55 +1 @@ -# Monitor disk (diskspace.plugin) - -This plugin monitors the disk space usage of mounted disks, under Linux. The plugin requires Netdata to have execute/search permissions on the mount point itself, as well as each component of the absolute path to the mount point. - -Two charts are available for every mount: - -- Disk Space Usage -- Disk Files (inodes) Usage - -## configuration - -Simple patterns can be used to exclude mounts from showed statistics based on path or filesystem. By default read-only mounts are not displayed. To display them `yes` should be set for a chart instead of `auto`. - -By default, Netdata will enable monitoring metrics only when they are not zero. If they are constantly zero they are ignored. 
Metrics that will start having values, after Netdata is started, will be detected and charts will be automatically added to the dashboard (a refresh of the dashboard is needed for them to appear though). - -Netdata will try to detect mounts that are duplicates (i.e. from the same device), or binds, and will not display charts for them, as the device is usually already monitored. - -To configure this plugin, you need to edit the configuration file `netdata.conf`. You can do so by using the `edit config` script. - -> ### Info -> -> To edit configuration files in a safe way, we provide the [`edit config` script](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) located in your [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory) (typically is `/etc/netdata`) that creates the proper file and opens it in an editor automatically. -> It is recommended to use this way for configuring Netdata. -> -> Please also note that after most configuration changes you will need to [restart the Agent](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for the changes to take effect. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config netdata.conf -``` - -You can enable the effect of each line by uncommenting it. - -You can set `yes` for a chart instead of `auto` to enable it permanently. You can also set the `enable zero metrics` option to `yes` in the `[global]` section which enables charts with zero metrics for all internal Netdata plugins. 
- -```conf -[plugin:proc:diskspace] - # remove charts of unmounted disks = yes - # update every = 1 - # check for new mount points every = 15 - # exclude space metrics on paths = /proc/* /sys/* /var/run/user/* /run/user/* /snap/* /var/lib/docker/* - # exclude space metrics on filesystems = *gvfs *gluster* *s3fs *ipfs *davfs2 *httpfs *sshfs *gdfs *moosefs fusectl autofs - # space usage for all disks = auto - # inodes usage for all disks = auto -``` - -Charts can be enabled/disabled for every mount separately, just look for the name of the mount after `[plugin:proc:diskspace:`. - -```conf -[plugin:proc:diskspace:/] - # space usage = auto - # inodes usage = auto -``` - -> for disks performance monitoring, see the `proc` plugin, [here](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md#monitoring-disks) +integrations/disk_space.md
\ No newline at end of file
diff --git a/collectors/diskspace.plugin/integrations/disk_space.md b/collectors/diskspace.plugin/integrations/disk_space.md
new file mode 100644
index 000000000..5dd9514c3
--- /dev/null
+++ b/collectors/diskspace.plugin/integrations/disk_space.md
@@ -0,0 +1,139 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/diskspace.plugin/README.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/diskspace.plugin/metadata.yaml"
+sidebar_label: "Disk space"
+learn_status: "Published"
+learn_rel_path: "Data Collection/Linux Systems"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# Disk space
+
+
+<img src="https://netdata.cloud/img/hard-drive.svg" width="150"/>
+
+
+Plugin: diskspace.plugin
+Module: diskspace.plugin
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor Disk space metrics for proficient storage management. Keep track of usage, free space, and error rates to prevent disk space issues.
+
+
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+The plugin reads data from `/proc/self/mountinfo` and `/proc/diskstats file`.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per mount point
+
+
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| mount_point | Path used to mount a filesystem |
+| filesystem | The filesystem used to format a partition. |
+| mount_root | Root directory where mount points are present. |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| disk.space | avail, used, reserved_for_root | GiB |
+| disk.inodes | avail, used, reserved_for_root | inodes |
+
+
+
+## Alerts
+
+
+The following alerts are available:
+
+| Alert name | On metric | Description |
+|:------------|:----------|:------------|
+| [ disk_space_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/disks.conf) | disk.space | disk ${label:mount_point} space utilization |
+| [ disk_inode_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/disks.conf) | disk.inodes | disk ${label:mount_point} inode utilization |
+
+
+## Setup
+
+### Prerequisites
+
+No action required.
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `netdata.conf`.
+Configuration for this specific integration is located in the `[plugin:proc:diskspace]` section within that file.
+
+The file format is a modified INI syntax. The general structure is:
+
+```ini
+[section1]
+    option1 = some value
+    option2 = some other value
+
+[section2]
+    option3 = some third value
+```
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config netdata.conf
+```
+#### Options
+
+You can also specify per mount point `[plugin:proc:diskspace:mountpoint]`
+
+<details><summary>Config options</summary>
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update every | Data collection frequency. | 1 | False |
+| remove charts of unmounted disks | Remove chart when a device is unmounted on host. | yes | False |
+| check for new mount points every | Parse proc files frequency. | 15 | False |
+| exclude space metrics on paths | Do not show metrics (charts) for listed paths. This option accepts netdata simple pattern. | /proc/* /sys/* /var/run/user/* /run/user/* /snap/* /var/lib/docker/* | False |
+| exclude space metrics on filesystems | Do not show metrics (charts) for listed filesystems. This option accepts netdata simple pattern. | *gvfs *gluster* *s3fs *ipfs *davfs2 *httpfs *sshfs *gdfs *moosefs fusectl autofs | False |
+| exclude inode metrics on filesystems | Do not show metrics (charts) for listed filesystems. This option accepts netdata simple pattern. | msdosfs msdos vfat overlayfs aufs* *unionfs | False |
+| space usage for all disks | Define if plugin will show metrics for space usage. When value is set to `auto` plugin will try to access information to display if filesystem or path was not discarded with previous option. | auto | False |
+| inodes usage for all disks | Define if plugin will show metrics for inode usage. When value is set to `auto` plugin will try to access information to display if filesystem or path was not discarded with previous option. | auto | False |
+
+</details>
+
+#### Examples
+There are no configuration examples.
+
+
diff --git a/collectors/diskspace.plugin/plugin_diskspace.c b/collectors/diskspace.plugin/plugin_diskspace.c
index 2153494d9..43c9105dc 100644
--- a/collectors/diskspace.plugin/plugin_diskspace.c
+++ b/collectors/diskspace.plugin/plugin_diskspace.c
@@ -42,7 +42,7 @@ struct mount_point_metadata {
     int updated;
     int slow;

-    DICTIONARY *chart_labels;
+    RRDLABELS *chart_labels;

     size_t collected; // the number of times this has been collected

diff --git a/collectors/ebpf.plugin/README.md b/collectors/ebpf.plugin/README.md
index fb036a5aa..06915ea52 100644
--- a/collectors/ebpf.plugin/README.md
+++ b/collectors/ebpf.plugin/README.md
@@ -261,7 +261,7 @@ You can also enable the following eBPF programs:
 - `swap` : This eBPF program creates charts that show information about swap access.
 - `mdflush`: This eBPF program creates charts that show information about
 - `sync`: Monitor calls to syscalls sync(2), fsync(2), fdatasync(2), syncfs(2), msync(2), and sync_file_range(2).
-- `network viewer`: This eBPF program creates charts with information about `TCP` and `UDP` functions, including the
+- `socket`: This eBPF program creates charts with information about `TCP` and `UDP` functions, including the
   bandwidth consumed by each.
   multi-device software flushes.
 - `vfs`: This eBPF program creates charts that show information about VFS (Virtual File System) functions.
@@ -302,12 +302,13 @@ are divided in the following sections:

 #### `[network connections]`

-You can configure the information shown on `outbound` and `inbound` charts with the settings in this section.
+You can configure the information shown with function `ebpf_socket` using the settings in this section.

 ```conf
 [network connections]
-    maximum dimensions = 500
+    enabled = yes
     resolve hostname ips = no
+    resolve service names = yes
     ports = 1-1024 !145 !domain
     hostnames = !example.com
     ips = !127.0.0.1/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 fc00::/7
@@ -318,24 +319,23 @@ write `ports = 19999`, Netdata will collect only connections for itself. The `ho
 [simple patterns](https://github.com/netdata/netdata/blob/master/libnetdata/simple_pattern/README.md). The `ports`, and `ips` settings accept negation (`!`) to deny
 specific values or asterisk alone to define all values.

-In the above example, Netdata will collect metrics for all ports between 1 and 443, with the exception of 53 (domain)
-and 145.
+In the above example, Netdata will collect metrics for all ports between `1` and `1024`, with the exception of `53` (domain)
+and `145`.

 The following options are available:

+- `enabled`: Disable network connections monitoring. This can affect directly some funcion output.
+- `resolve hostname ips`: Enable resolving IPs to hostnames. It is disabled by default because it can be too slow.
+- `resolve service names`: Convert destination ports into service names, for example, port `53` protocol `UDP` becomes `domain`.
+  all names are read from /etc/services.
 - `ports`: Define the destination ports for Netdata to monitor.
 - `hostnames`: The list of hostnames that can be resolved to an IP address.
 - `ips`: The IP or range of IPs that you want to monitor. You can use IPv4 or IPv6 addresses, use dashes to define a
-  range of IPs, or use CIDR values. By default, only data for private IP addresses is collected, but this can
-  be changed with the `ips` setting.
+  range of IPs, or use CIDR values.

-By default, Netdata displays up to 500 dimensions on network connection charts. If there are more possible dimensions,
-they will be bundled into the `other` dimension. You can increase the number of shown dimensions by changing
-the `maximum dimensions` setting.
-
-The dimensions for the traffic charts are created using the destination IPs of the sockets by default. This can be
-changed setting `resolve hostname ips = yes` and restarting Netdata, after this Netdata will create dimensions using
-the `hostnames` every time that is possible to resolve IPs to their hostnames.
+By default the traffic table is created using the destination IPs and ports of the sockets. This can be
+changed, so that Netdata uses service names (if possible), by specifying `resolve service name = yes` in the configuration
+section.

 #### `[service name]`

@@ -990,13 +990,15 @@ shows how the lockdown module impacts `ebpf.plugin` based on the selected option
 If you or your distribution compiled the kernel with the last combination, your system cannot load shared libraries
 required to run `ebpf.plugin`.

-## Function
+## Functions
+
+### ebpf_thread

 The eBPF plugin has a [function](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md) named
 `ebpf_thread` that controls its internal threads and helps to reduce the overhead on host. Using the function you
 can run the plugin with all threads disabled and enable them only when you want to take a look in specific areas.

-### List threads
+#### List threads

 To list all threads status you can query directly the endpoint function:

@@ -1006,7 +1008,7 @@ It is also possible to query a specific thread adding keyword `thread` and threa

 `http://localhost:19999/api/v1/function?function=ebpf_thread%20thread:mount`

-### Enable thread
+#### Enable thread

 It is possible to enable a specific thread using the keyword `enable`:

@@ -1019,14 +1021,14 @@ after the thread name:

 in this example thread `mount` will run during 600 seconds (10 minutes).

-### Disable thread
+#### Disable thread

 It is also possible to stop any thread running using the keyword `disable`. For example, to disable `cachestat` you can
 request:

 `http://localhost:19999/api/v1/function?function=ebpf_thread%20disable:cachestat`

-### Debugging threads
+#### Debugging threads

 You can verify the impact of threads on the host by running the
 [ebpf_thread_function.sh](https://github.com/netdata/netdata/blob/master/tests/ebpf/ebpf_thread_function.sh)
@@ -1036,3 +1038,34 @@ You can check the results of having threads running on your environment in the N
 dashboard

 <img src="https://github.com/netdata/netdata/assets/49162938/91823573-114c-4c16-b634-cc46f7bb1bcf" alt="Threads running." />
+
+### ebpf_socket
+
+The eBPF plugin has a [function](https://github.com/netdata/netdata/blob/master/docs/cloud/netdata-functions.md) named
+`ebpf_socket` that shows the current status of open sockets on host.
+
+#### Families
+
+The plugin shows by default sockets for IPV4 and IPV6, but it is possible to select a specific family by passing the
+family as an argument:
+
+`http://localhost:19999/api/v1/function?function=ebpf_socket%20family:IPV4`
+
+#### Resolve
+
+The plugin resolves ports to service names by default. You can show the port number by disabling the name resolution:
+
+`http://localhost:19999/api/v1/function?function=ebpf_socket%20resolve:NO`
+
+#### CIDR
+
+The plugin shows connections for all possible destination IPs by default. You can limit the range by specifying the CIDR:
+
+`http://localhost:19999/api/v1/function?function=ebpf_socket%20cidr:192.168.1.0/24`
+
+#### PORT
+
+The plugin shows connections for all possible ports by default.
You can limit the range by specifying a port or range +of ports: + +`http://localhost:19999/api/v1/function?function=ebpf_socket%20port:1-1024` diff --git a/collectors/ebpf.plugin/ebpf.c b/collectors/ebpf.plugin/ebpf.c index 844047305..834808fa5 100644 --- a/collectors/ebpf.plugin/ebpf.c +++ b/collectors/ebpf.plugin/ebpf.c @@ -49,176 +49,258 @@ struct netdata_static_thread cgroup_integration_thread = { }; ebpf_module_t ebpf_modules[] = { - { .thread_name = "process", .config_name = "process", .thread_description = NETDATA_EBPF_MODULE_PROCESS_DESC, - .enabled = 0, .start_routine = ebpf_process_thread, + { .info = {.thread_name = "process", + .config_name = "process", + .thread_description = NETDATA_EBPF_MODULE_PROCESS_DESC}, + .functions = {.start_routine = ebpf_process_thread, + .apps_routine = ebpf_process_create_apps_charts, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_LEVEL_REAL_PARENT, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = ebpf_process_create_apps_charts, .maps = NULL, - .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &process_config, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &process_config, .config_file = NETDATA_PROCESS_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_10 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = NULL, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0 }, - { .thread_name = "socket", .config_name = "socket", .thread_description = NETDATA_EBPF_SOCKET_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_socket_thread, + { .info = {.thread_name = "socket", + .config_name = "socket", + .thread_description = 
NETDATA_EBPF_SOCKET_MODULE_DESC}, + .functions = {.start_routine = ebpf_socket_thread, + .apps_routine = ebpf_socket_create_apps_charts, + .fnct_routine = ebpf_socket_read_open_connections, + .fcnt_name = EBPF_FUNCTION_SOCKET, + .fcnt_desc = EBPF_PLUGIN_SOCKET_FUNCTION_DESCRIPTION, + .fcnt_thread_chart_name = NULL, + .fcnt_thread_lifetime_name = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_LEVEL_REAL_PARENT, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = ebpf_socket_create_apps_charts, .maps = NULL, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &socket_config, .config_file = NETDATA_NETWORK_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = socket_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "cachestat", .config_name = "cachestat", .thread_description = NETDATA_EBPF_CACHESTAT_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_cachestat_thread, + { .info = {.thread_name = "cachestat", .config_name = "cachestat", .thread_description = NETDATA_EBPF_CACHESTAT_MODULE_DESC}, + .functions = {.start_routine = ebpf_cachestat_thread, + .apps_routine = ebpf_cachestat_create_apps_charts, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_LEVEL_REAL_PARENT, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = ebpf_cachestat_create_apps_charts, .maps = cachestat_maps, - .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &cachestat_config, + 
.maps = cachestat_maps, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &cachestat_config, .config_file = NETDATA_CACHESTAT_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18| NETDATA_V5_4 | NETDATA_V5_14 | NETDATA_V5_15 | NETDATA_V5_16, .load = EBPF_LOAD_LEGACY, .targets = cachestat_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "sync", .config_name = "sync", .thread_description = NETDATA_EBPF_SYNC_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_sync_thread, + { .info = {.thread_name = "sync", + .config_name = "sync", + .thread_description = NETDATA_EBPF_SYNC_MODULE_DESC}, + .functions = {.start_routine = ebpf_sync_thread, + .apps_routine = NULL, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .maps = NULL, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_NOT_SET, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = NULL, .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &sync_config, + .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &sync_config, .config_file = NETDATA_SYNC_CONFIG_FILE, // All syscalls have the same kernels .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = sync_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "dc", .config_name = "dc", .thread_description = NETDATA_EBPF_DC_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_dcstat_thread, + { .info = {.thread_name = "dc", + .config_name = "dc", + .thread_description = NETDATA_EBPF_DC_MODULE_DESC}, + .functions = 
{.start_routine = ebpf_dcstat_thread, + .apps_routine = ebpf_dcstat_create_apps_charts, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_LEVEL_REAL_PARENT, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = ebpf_dcstat_create_apps_charts, .maps = dcstat_maps, + .maps = dcstat_maps, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &dcstat_config, .config_file = NETDATA_DIRECTORY_DCSTAT_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = dc_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "swap", .config_name = "swap", .thread_description = NETDATA_EBPF_SWAP_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_swap_thread, + { .info = {.thread_name = "swap", .config_name = "swap", .thread_description = NETDATA_EBPF_SWAP_MODULE_DESC}, + .functions = {.start_routine = ebpf_swap_thread, + .apps_routine = ebpf_swap_create_apps_charts, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_LEVEL_REAL_PARENT, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = ebpf_swap_create_apps_charts, .maps = NULL, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &swap_config, .config_file = NETDATA_DIRECTORY_SWAP_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = swap_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core 
= CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "vfs", .config_name = "vfs", .thread_description = NETDATA_EBPF_VFS_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_vfs_thread, + { .info = {.thread_name = "vfs", + .config_name = "vfs", + .thread_description = NETDATA_EBPF_VFS_MODULE_DESC}, + .functions = {.start_routine = ebpf_vfs_thread, + .apps_routine = ebpf_vfs_create_apps_charts, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_LEVEL_REAL_PARENT, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = ebpf_vfs_create_apps_charts, .maps = NULL, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &vfs_config, .config_file = NETDATA_DIRECTORY_VFS_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = vfs_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "filesystem", .config_name = "filesystem", .thread_description = NETDATA_EBPF_FS_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_filesystem_thread, + { .info = {.thread_name = "filesystem", .config_name = "filesystem", .thread_description = NETDATA_EBPF_FS_MODULE_DESC}, + .functions = {.start_routine = ebpf_filesystem_thread, + .apps_routine = NULL, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_NOT_SET, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = NULL, .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = 
&fs_config, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &fs_config, .config_file = NETDATA_FILESYSTEM_CONFIG_FILE, //We are setting kernels as zero, because we load eBPF programs according the kernel running. .kernels = 0, .load = EBPF_LOAD_LEGACY, .targets = NULL, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "disk", .config_name = "disk", .thread_description = NETDATA_EBPF_DISK_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_disk_thread, + { .info = {.thread_name = "disk", + .config_name = "disk", + .thread_description = NETDATA_EBPF_DISK_MODULE_DESC}, + .functions = {.start_routine = ebpf_disk_thread, + .apps_routine = NULL, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_NOT_SET, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = NULL, .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &disk_config, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &disk_config, .config_file = NETDATA_DISK_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = NULL, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "mount", .config_name = "mount", .thread_description = NETDATA_EBPF_MOUNT_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_mount_thread, + { .info = {.thread_name = "mount", + .config_name = "mount", + .thread_description = NETDATA_EBPF_MOUNT_MODULE_DESC}, + .functions = {.start_routine = ebpf_mount_thread, + .apps_routine = NULL, + .fnct_routine = 
NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_NOT_SET, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = NULL, .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &mount_config, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &mount_config, .config_file = NETDATA_MOUNT_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = mount_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "fd", .config_name = "fd", .thread_description = NETDATA_EBPF_FD_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_fd_thread, + { .info = { .thread_name = "fd", + .config_name = "fd", + .thread_description = NETDATA_EBPF_FD_MODULE_DESC}, + .functions = {.start_routine = ebpf_fd_thread, + .apps_routine = ebpf_fd_create_apps_charts, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_LEVEL_REAL_PARENT, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = ebpf_fd_create_apps_charts, .maps = NULL, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &fd_config, .config_file = NETDATA_FD_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_11 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = fd_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = 
"hardirq", .config_name = "hardirq", .thread_description = NETDATA_EBPF_HARDIRQ_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_hardirq_thread, + { .info = { .thread_name = "hardirq", + .config_name = "hardirq", + .thread_description = NETDATA_EBPF_HARDIRQ_MODULE_DESC}, + .functions = {.start_routine = ebpf_hardirq_thread, + .apps_routine = NULL, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_NOT_SET, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = NULL, .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &hardirq_config, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &hardirq_config, .config_file = NETDATA_HARDIRQ_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = NULL, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "softirq", .config_name = "softirq", .thread_description = NETDATA_EBPF_SOFTIRQ_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_softirq_thread, + { .info = { .thread_name = "softirq", + .config_name = "softirq", + .thread_description = NETDATA_EBPF_SOFTIRQ_MODULE_DESC}, + .functions = {.start_routine = ebpf_softirq_thread, + .apps_routine = NULL, + .fnct_routine = NULL }, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_NOT_SET, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = NULL, .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &softirq_config, + .maps = NULL, .pid_map_size = 
ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &softirq_config, .config_file = NETDATA_SOFTIRQ_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = NULL, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "oomkill", .config_name = "oomkill", .thread_description = NETDATA_EBPF_OOMKILL_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_oomkill_thread, + { .info = {.thread_name = "oomkill", + .config_name = "oomkill", + .thread_description = NETDATA_EBPF_OOMKILL_MODULE_DESC}, + .functions = {.start_routine = ebpf_oomkill_thread, + .apps_routine = ebpf_oomkill_create_apps_charts, + .fnct_routine = NULL},.enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_LEVEL_REAL_PARENT, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = ebpf_oomkill_create_apps_charts, .maps = NULL, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &oomkill_config, .config_file = NETDATA_OOMKILL_CONFIG_FILE, .kernels = NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = NULL, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "shm", .config_name = "shm", .thread_description = NETDATA_EBPF_SHM_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_shm_thread, + { .info = {.thread_name = "shm", + .config_name = "shm", + .thread_description = NETDATA_EBPF_SHM_MODULE_DESC}, + .functions = {.start_routine = ebpf_shm_thread, + .apps_routine = ebpf_shm_create_apps_charts, + .fnct_routine = NULL}, + .enabled = 
NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_LEVEL_REAL_PARENT, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = ebpf_shm_create_apps_charts, .maps = NULL, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &shm_config, .config_file = NETDATA_DIRECTORY_SHM_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = shm_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = "mdflush", .config_name = "mdflush", .thread_description = NETDATA_EBPF_MD_MODULE_DESC, - .enabled = 0, .start_routine = ebpf_mdflush_thread, + { .info = { .thread_name = "mdflush", + .config_name = "mdflush", + .thread_description = NETDATA_EBPF_MD_MODULE_DESC}, + .functions = {.start_routine = ebpf_mdflush_thread, + .apps_routine = NULL, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_NOT_SET, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = NULL, .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &mdflush_config, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = &mdflush_config, .config_file = NETDATA_DIRECTORY_MDFLUSH_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = mdflush_targets, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = 
"functions", .config_name = "functions", .thread_description = NETDATA_EBPF_FUNCTIONS_MODULE_DESC, - .enabled = 1, .start_routine = ebpf_function_thread, + { .info = { .thread_name = "functions", + .config_name = "functions", + .thread_description = NETDATA_EBPF_FUNCTIONS_MODULE_DESC}, + .functions = {.start_routine = ebpf_function_thread, + .apps_routine = NULL, + .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 1, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_NOT_SET, .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, - .apps_routine = NULL, .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = NULL, + .maps = NULL, .pid_map_size = ND_EBPF_DEFAULT_PID_SIZE, .names = NULL, .cfg = NULL, .config_file = NETDATA_DIRECTORY_FUNCTIONS_CONFIG_FILE, .kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_14, .load = EBPF_LOAD_LEGACY, .targets = NULL, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES, .lifetime = EBPF_DEFAULT_LIFETIME, .running_time = 0}, - { .thread_name = NULL, .enabled = 0, .start_routine = NULL, .update_every = EBPF_DEFAULT_UPDATE_EVERY, + { .info = {.thread_name = NULL, .config_name = NULL}, + .functions = {.start_routine = NULL, .apps_routine = NULL, .fnct_routine = NULL}, + .enabled = NETDATA_THREAD_EBPF_NOT_RUNNING, .update_every = EBPF_DEFAULT_UPDATE_EVERY, .global_charts = 0, .apps_charts = NETDATA_EBPF_APPS_FLAG_NO, .apps_level = NETDATA_APPS_NOT_SET, - .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, .apps_routine = NULL, .maps = NULL, - .pid_map_size = 0, .names = NULL, .cfg = NULL, .config_name = NULL, .kernels = 0, .load = EBPF_LOAD_LEGACY, + .cgroup_charts = CONFIG_BOOLEAN_NO, .mode = MODE_ENTRY, .optional = 0, .maps = NULL, + .pid_map_size = 0, .names = NULL, .cfg = NULL, .kernels = 0, .load = 
EBPF_LOAD_LEGACY, .targets = NULL, .probe_links = NULL, .objects = NULL, .thread = NULL, .maps_per_core = CONFIG_BOOLEAN_YES}, }; @@ -559,6 +641,8 @@ ebpf_network_viewer_options_t network_viewer_opt; ebpf_plugin_stats_t plugin_statistics = {.core = 0, .legacy = 0, .running = 0, .threads = 0, .tracepoints = 0, .probes = 0, .retprobes = 0, .trampolines = 0, .memlock_kern = 0, .hash_tables = 0}; +netdata_ebpf_judy_pid_t ebpf_judy_pid = {.pid_table = NULL, .index = {.JudyLArray = NULL}}; +bool ebpf_plugin_exit = false; #ifdef LIBBPF_MAJOR_VERSION struct btf *default_btf = NULL; @@ -580,6 +664,61 @@ char *btf_path = NULL; /***************************************************************** * + * FUNCTIONS USED TO MANIPULATE JUDY ARRAY + * + *****************************************************************/ + +/** + * Hashtable insert unsafe + * + * Find or create a value associated to the index + * + * @return The lsocket = 0 when new item added to the array otherwise the existing item value is returned in *lsocket + * we return a pointer to a pointer, so that the caller can put anything needed at the value of the index. + * The pointer to pointer we return has to be used before any other operation that may change the index (insert/delete). + * + */ +void **ebpf_judy_insert_unsafe(PPvoid_t arr, Word_t key) +{ + JError_t J_Error; + Pvoid_t *idx = JudyLIns(arr, key, &J_Error); + if (unlikely(idx == PJERR)) { + netdata_log_error("Cannot add PID to JudyL, JU_ERRNO_* == %u, ID == %d", + JU_ERRNO(&J_Error), JU_ERRID(&J_Error)); + } + + return idx; +} + +/** + * Get PID from judy + * + * Get a pointer for the `pid` from judy_array; + * + * @param judy_array a judy array where PID is the primary key + * @param pid pid stored. 
+ */ +netdata_ebpf_judy_pid_stats_t *ebpf_get_pid_from_judy_unsafe(PPvoid_t judy_array, uint32_t pid) +{ + netdata_ebpf_judy_pid_stats_t **pid_pptr = + (netdata_ebpf_judy_pid_stats_t **)ebpf_judy_insert_unsafe(judy_array, pid); + netdata_ebpf_judy_pid_stats_t *pid_ptr = *pid_pptr; + if (likely(*pid_pptr == NULL)) { + // a new PID added to the index + *pid_pptr = aral_mallocz(ebpf_judy_pid.pid_table); + + pid_ptr = *pid_pptr; + + pid_ptr->cmdline = NULL; + pid_ptr->socket_stats.JudyLArray = NULL; + rw_spinlock_init(&pid_ptr->socket_stats.rw_spinlock); + } + + return pid_ptr; +} + +/***************************************************************** + * * FUNCTIONS USED TO ALLOCATE APPS/CGROUP MEMORIES (ARAL) * *****************************************************************/ @@ -626,7 +765,7 @@ static inline void ebpf_check_before2go() i = 0; int j; pthread_mutex_lock(&ebpf_exit_cleanup); - for (j = 0; ebpf_modules[j].thread_name != NULL; j++) { + for (j = 0; ebpf_modules[j].info.thread_name != NULL; j++) { if (ebpf_modules[j].enabled < NETDATA_THREAD_EBPF_STOPPING) i++; } @@ -704,14 +843,15 @@ void ebpf_unload_legacy_code(struct bpf_object *objects, struct bpf_link **probe static void ebpf_unload_unique_maps() { int i; - for (i = 0; ebpf_modules[i].thread_name; i++) { + for (i = 0; ebpf_modules[i].info.thread_name; i++) { // These threads are cleaned with other functions if (i != EBPF_MODULE_SOCKET_IDX) continue; if (ebpf_modules[i].enabled != NETDATA_THREAD_EBPF_STOPPED) { if (ebpf_modules[i].enabled != NETDATA_THREAD_EBPF_NOT_RUNNING) - netdata_log_error("Cannot unload maps for thread %s, because it is not stopped.", ebpf_modules[i].thread_name); + netdata_log_error("Cannot unload maps for thread %s, because it is not stopped.", + ebpf_modules[i].info.thread_name); continue; } @@ -775,13 +915,12 @@ static void ebpf_unload_sync() } } -int ebpf_exit_plugin = 0; /** * Close the collector gracefully * * @param sig is the signal number used to close the collector */ 
-static void ebpf_stop_threads(int sig) +void ebpf_stop_threads(int sig) { UNUSED(sig); static int only_one = 0; @@ -794,11 +933,11 @@ static void ebpf_stop_threads(int sig) } only_one = 1; int i; - for (i = 0; ebpf_modules[i].thread_name != NULL; i++) { + for (i = 0; ebpf_modules[i].info.thread_name != NULL; i++) { if (ebpf_modules[i].enabled < NETDATA_THREAD_EBPF_STOPPING) { netdata_thread_cancel(*ebpf_modules[i].thread->thread); #ifdef NETDATA_DEV_MODE - netdata_log_info("Sending cancel for thread %s", ebpf_modules[i].thread_name); + netdata_log_info("Sending cancel for thread %s", ebpf_modules[i].info.thread_name); #endif } } @@ -811,7 +950,7 @@ static void ebpf_stop_threads(int sig) #endif pthread_mutex_unlock(&mutex_cgroup_shm); - ebpf_exit_plugin = 1; + ebpf_plugin_exit = true; ebpf_check_before2go(); @@ -839,8 +978,8 @@ static void ebpf_stop_threads(int sig) * @param root a pointer for the targets. */ static inline void ebpf_create_apps_for_module(ebpf_module_t *em, struct ebpf_target *root) { - if (em->enabled < NETDATA_THREAD_EBPF_STOPPING && em->apps_charts && em->apps_routine) - em->apps_routine(em, root); + if (em->enabled < NETDATA_THREAD_EBPF_STOPPING && em->apps_charts && em->functions.apps_routine) + em->functions.apps_routine(em, root); } /** @@ -1370,6 +1509,607 @@ void ebpf_read_global_table_stats(netdata_idx_t *stats, /***************************************************************** * + * FUNCTIONS USED WITH SOCKET + * + *****************************************************************/ + +/** + * Netmask + * + * Copied from iprange (https://github.com/firehol/iprange/blob/master/iprange.h) + * + * @param prefix create the netmask based in the CIDR value. 
+ * + * @return + */ +static inline in_addr_t ebpf_netmask(int prefix) { + + if (prefix == 0) + return (~((in_addr_t) - 1)); + else + return (in_addr_t)(~((1 << (32 - prefix)) - 1)); + +} + +/** + * Broadcast + * + * Copied from iprange (https://github.com/firehol/iprange/blob/master/iprange.h) + * + * @param addr is the ip address + * @param prefix is the CIDR value. + * + * @return It returns the last address of the range + */ +static inline in_addr_t ebpf_broadcast(in_addr_t addr, int prefix) +{ + return (addr | ~ebpf_netmask(prefix)); +} + +/** + * Network + * + * Copied from iprange (https://github.com/firehol/iprange/blob/master/iprange.h) + * + * @param addr is the ip address + * @param prefix is the CIDR value. + * + * @return It returns the first address of the range. + */ +static inline in_addr_t ebpf_ipv4_network(in_addr_t addr, int prefix) +{ + return (addr & ebpf_netmask(prefix)); +} + +/** + * Calculate ipv6 first address + * + * @param out the address to store the first address. + * @param in the address used to do the math. + * @param prefix number of bits used to calculate the address + */ +static void get_ipv6_first_addr(union netdata_ip_t *out, union netdata_ip_t *in, uint64_t prefix) +{ + uint64_t mask,tmp; + uint64_t ret[2]; + + memcpy(ret, in->addr32, sizeof(union netdata_ip_t)); + + if (prefix == 128) { + memcpy(out->addr32, in->addr32, sizeof(union netdata_ip_t)); + return; + } else if (!prefix) { + ret[0] = ret[1] = 0; + memcpy(out->addr32, ret, sizeof(union netdata_ip_t)); + return; + } else if (prefix <= 64) { + ret[1] = 0ULL; + + tmp = be64toh(ret[0]); + mask = 0xFFFFFFFFFFFFFFFFULL << (64 - prefix); + tmp &= mask; + ret[0] = htobe64(tmp); + } else { + mask = 0xFFFFFFFFFFFFFFFFULL << (128 - prefix); + tmp = be64toh(ret[1]); + tmp &= mask; + ret[1] = htobe64(tmp); + } + + memcpy(out->addr32, ret, sizeof(union netdata_ip_t)); +} + +/** + * Get IPV6 Last Address + * + * @param out the address to store the last address. 
+ * @param in the address used to do the math. + * @param prefix number of bits used to calculate the address + */ +static void get_ipv6_last_addr(union netdata_ip_t *out, union netdata_ip_t *in, uint64_t prefix) +{ + uint64_t mask,tmp; + uint64_t ret[2]; + memcpy(ret, in->addr32, sizeof(union netdata_ip_t)); + + if (prefix == 128) { + memcpy(out->addr32, in->addr32, sizeof(union netdata_ip_t)); + return; + } else if (!prefix) { + ret[0] = ret[1] = 0xFFFFFFFFFFFFFFFF; + memcpy(out->addr32, ret, sizeof(union netdata_ip_t)); + return; + } else if (prefix <= 64) { + ret[1] = 0xFFFFFFFFFFFFFFFFULL; + + tmp = be64toh(ret[0]); + mask = 0xFFFFFFFFFFFFFFFFULL << (64 - prefix); + tmp |= ~mask; + ret[0] = htobe64(tmp); + } else { + mask = 0xFFFFFFFFFFFFFFFFULL << (128 - prefix); + tmp = be64toh(ret[1]); + tmp |= ~mask; + ret[1] = htobe64(tmp); + } + + memcpy(out->addr32, ret, sizeof(union netdata_ip_t)); +} + +/** + * IP to network long + * + * @param dst the vector to store the result + * @param ip the source ip given by our users. + * @param domain the ip domain (IPV4 or IPV6) + * @param source the original string + * + * @return it returns 0 on success and -1 otherwise. + */ +static inline int ebpf_ip2nl(uint8_t *dst, char *ip, int domain, char *source) +{ + if (inet_pton(domain, ip, dst) <= 0) { + netdata_log_error("The address specified (%s) is invalid ", source); + return -1; + } + + return 0; +} + +/** + * Clean port Structure + * + * Clean the allocated list. + * + * @param clean the list that will be cleaned + */ +void ebpf_clean_port_structure(ebpf_network_viewer_port_list_t **clean) +{ + ebpf_network_viewer_port_list_t *move = *clean; + while (move) { + ebpf_network_viewer_port_list_t *next = move->next; + freez(move->value); + freez(move); + + move = next; + } + *clean = NULL; +} + +/** + * Clean IP structure + * + * Clean the allocated list. 
+ * + * @param clean the list that will be cleaned + */ +void ebpf_clean_ip_structure(ebpf_network_viewer_ip_list_t **clean) +{ + ebpf_network_viewer_ip_list_t *move = *clean; + while (move) { + ebpf_network_viewer_ip_list_t *next = move->next; + freez(move->value); + freez(move); + + move = next; + } + *clean = NULL; +} + +/** + * Parse IP List + * + * Parse IP list and link it. + * + * @param out a pointer to store the link list + * @param ip the value given as parameter + */ +static void ebpf_parse_ip_list_unsafe(void **out, char *ip) +{ + ebpf_network_viewer_ip_list_t **list = (ebpf_network_viewer_ip_list_t **)out; + + char *ipdup = strdupz(ip); + union netdata_ip_t first = { }; + union netdata_ip_t last = { }; + char *is_ipv6; + if (*ip == '*' && *(ip+1) == '\0') { + memset(first.addr8, 0, sizeof(first.addr8)); + memset(last.addr8, 0xFF, sizeof(last.addr8)); + + is_ipv6 = ip; + + ebpf_clean_ip_structure(list); + goto storethisip; + } + + char *end = ip; + // Move while I cannot find a separator + while (*end && *end != '/' && *end != '-') end++; + + // We will use only the classic IPV6 for while, but we could consider the base 85 in a near future + // https://tools.ietf.org/html/rfc1924 + is_ipv6 = strchr(ip, ':'); + + int select; + if (*end && !is_ipv6) { // IPV4 range + select = (*end == '/') ? 
0 : 1; + *end++ = '\0'; + if (*end == '!') { + netdata_log_info("The exclusion cannot be in the second part of the range %s, it will be ignored.", ipdup); + goto cleanipdup; + } + + if (!select) { // CIDR + select = ebpf_ip2nl(first.addr8, ip, AF_INET, ipdup); + if (select) + goto cleanipdup; + + select = (int) str2i(end); + if (select < NETDATA_MINIMUM_IPV4_CIDR || select > NETDATA_MAXIMUM_IPV4_CIDR) { + netdata_log_info("The specified CIDR %s is not valid, the IP %s will be ignored.", end, ip); + goto cleanipdup; + } + + last.addr32[0] = htonl(ebpf_broadcast(ntohl(first.addr32[0]), select)); + // This was added to remove + // https://app.codacy.com/manual/netdata/netdata/pullRequest?prid=5810941&bid=19021977 + UNUSED(last.addr32[0]); + + uint32_t ipv4_test = htonl(ebpf_ipv4_network(ntohl(first.addr32[0]), select)); + if (first.addr32[0] != ipv4_test) { + first.addr32[0] = ipv4_test; + struct in_addr ipv4_convert; + ipv4_convert.s_addr = ipv4_test; + char ipv4_msg[INET_ADDRSTRLEN]; + if(inet_ntop(AF_INET, &ipv4_convert, ipv4_msg, INET_ADDRSTRLEN)) + netdata_log_info("The network value of CIDR %s was updated for %s .", ipdup, ipv4_msg); + } + } else { // Range + select = ebpf_ip2nl(first.addr8, ip, AF_INET, ipdup); + if (select) + goto cleanipdup; + + select = ebpf_ip2nl(last.addr8, end, AF_INET, ipdup); + if (select) + goto cleanipdup; + } + + if (htonl(first.addr32[0]) > htonl(last.addr32[0])) { + netdata_log_info("The specified range %s is invalid, the second address is smallest than the first, it will be ignored.", + ipdup); + goto cleanipdup; + } + } else if (is_ipv6) { // IPV6 + if (!*end) { // Unique + select = ebpf_ip2nl(first.addr8, ip, AF_INET6, ipdup); + if (select) + goto cleanipdup; + + memcpy(last.addr8, first.addr8, sizeof(first.addr8)); + } else if (*end == '-') { + *end++ = 0x00; + if (*end == '!') { + netdata_log_info("The exclusion cannot be in the second part of the range %s, it will be ignored.", ipdup); + goto cleanipdup; + } + + select = 
ebpf_ip2nl(first.addr8, ip, AF_INET6, ipdup); + if (select) + goto cleanipdup; + + select = ebpf_ip2nl(last.addr8, end, AF_INET6, ipdup); + if (select) + goto cleanipdup; + } else { // CIDR + *end++ = 0x00; + if (*end == '!') { + netdata_log_info("The exclusion cannot be in the second part of the range %s, it will be ignored.", ipdup); + goto cleanipdup; + } + + select = str2i(end); + if (select < 0 || select > 128) { + netdata_log_info("The CIDR %s is not valid, the address %s will be ignored.", end, ip); + goto cleanipdup; + } + + uint64_t prefix = (uint64_t)select; + select = ebpf_ip2nl(first.addr8, ip, AF_INET6, ipdup); + if (select) + goto cleanipdup; + + get_ipv6_last_addr(&last, &first, prefix); + + union netdata_ip_t ipv6_test; + get_ipv6_first_addr(&ipv6_test, &first, prefix); + + if (memcmp(first.addr8, ipv6_test.addr8, sizeof(union netdata_ip_t)) != 0) { + memcpy(first.addr8, ipv6_test.addr8, sizeof(union netdata_ip_t)); + + struct in6_addr ipv6_convert; + memcpy(ipv6_convert.s6_addr, ipv6_test.addr8, sizeof(union netdata_ip_t)); + + char ipv6_msg[INET6_ADDRSTRLEN]; + if(inet_ntop(AF_INET6, &ipv6_convert, ipv6_msg, INET6_ADDRSTRLEN)) + netdata_log_info("The network value of CIDR %s was updated for %s .", ipdup, ipv6_msg); + } + } + + if ((be64toh(*(uint64_t *)&first.addr32[2]) > be64toh(*(uint64_t *)&last.addr32[2]) && + !memcmp(first.addr32, last.addr32, 2*sizeof(uint32_t))) || + (be64toh(*(uint64_t *)&first.addr32) > be64toh(*(uint64_t *)&last.addr32)) ) { + netdata_log_info("The specified range %s is invalid, the second address is smallest than the first, it will be ignored.", + ipdup); + goto cleanipdup; + } + } else { // Unique ip + select = ebpf_ip2nl(first.addr8, ip, AF_INET, ipdup); + if (select) + goto cleanipdup; + + memcpy(last.addr8, first.addr8, sizeof(first.addr8)); + } + + ebpf_network_viewer_ip_list_t *store; + + storethisip: + store = callocz(1, sizeof(ebpf_network_viewer_ip_list_t)); + store->value = ipdup; + store->hash = 
simple_hash(ipdup); + store->ver = (uint8_t)(!is_ipv6)?AF_INET:AF_INET6; + memcpy(store->first.addr8, first.addr8, sizeof(first.addr8)); + memcpy(store->last.addr8, last.addr8, sizeof(last.addr8)); + + ebpf_fill_ip_list_unsafe(list, store, "socket"); + return; + + cleanipdup: + freez(ipdup); +} + +/** + * Parse IP Range + * + * Parse the IP ranges given and create Network Viewer IP Structure + * + * @param ptr is a pointer with the text to parse. + */ +void ebpf_parse_ips_unsafe(char *ptr) +{ + // No value + if (unlikely(!ptr)) + return; + + while (likely(ptr)) { + // Move forward until next valid character + while (isspace(*ptr)) ptr++; + + // No valid value found + if (unlikely(!*ptr)) + return; + + // Find space that ends the list + char *end = strchr(ptr, ' '); + if (end) { + *end++ = '\0'; + } + + int neg = 0; + if (*ptr == '!') { + neg++; + ptr++; + } + + if (isascii(*ptr)) { // Parse port + ebpf_parse_ip_list_unsafe( + (!neg) ? (void **)&network_viewer_opt.included_ips : (void **)&network_viewer_opt.excluded_ips, ptr); + } + + ptr = end; + } +} + +/** + * Fill Port list + * + * @param out a pointer to the link list. + * @param in the structure that will be linked. 
+ */ +static inline void fill_port_list(ebpf_network_viewer_port_list_t **out, ebpf_network_viewer_port_list_t *in) +{ + if (likely(*out)) { + ebpf_network_viewer_port_list_t *move = *out, *store = *out; + uint16_t first = ntohs(in->first); + uint16_t last = ntohs(in->last); + while (move) { + uint16_t cmp_first = ntohs(move->first); + uint16_t cmp_last = ntohs(move->last); + if (cmp_first <= first && first <= cmp_last && + cmp_first <= last && last <= cmp_last ) { + netdata_log_info("The range/value (%u, %u) is inside the range/value (%u, %u) already inserted, it will be ignored.", + first, last, cmp_first, cmp_last); + freez(in->value); + freez(in); + return; + } else if (first <= cmp_first && cmp_first <= last && + first <= cmp_last && cmp_last <= last) { + netdata_log_info("The range (%u, %u) is bigger than previous range (%u, %u) already inserted, the previous will be ignored.", + first, last, cmp_first, cmp_last); + freez(move->value); + move->value = in->value; + move->first = in->first; + move->last = in->last; + freez(in); + return; + } + + store = move; + move = move->next; + } + + store->next = in; + } else { + *out = in; + } + +#ifdef NETDATA_INTERNAL_CHECKS + netdata_log_info("Adding values %s( %u, %u) to %s port list used on network viewer", + in->value, in->first, in->last, + (*out == network_viewer_opt.included_port)?"included":"excluded"); +#endif +} + +/** + * Parse Service List + * + * @param out a pointer to store the link list + * @param service the service used to create the structure that will be linked. 
+ */ +static void ebpf_parse_service_list(void **out, char *service) +{ + ebpf_network_viewer_port_list_t **list = (ebpf_network_viewer_port_list_t **)out; + struct servent *serv = getservbyname((const char *)service, "tcp"); + if (!serv) + serv = getservbyname((const char *)service, "udp"); + + if (!serv) { + netdata_log_info("Cannot resolve the service '%s' with protocols TCP and UDP, it will be ignored", service); + return; + } + + ebpf_network_viewer_port_list_t *w = callocz(1, sizeof(ebpf_network_viewer_port_list_t)); + w->value = strdupz(service); + w->hash = simple_hash(service); + + w->first = w->last = (uint16_t)serv->s_port; + + fill_port_list(list, w); +} + +/** + * Parse port list + * + * Parse an allocated port list with the range given + * + * @param out a pointer to store the link list + * @param range the informed range for the user. + */ +static void ebpf_parse_port_list(void **out, char *range) +{ + int first, last; + ebpf_network_viewer_port_list_t **list = (ebpf_network_viewer_port_list_t **)out; + + char *copied = strdupz(range); + if (*range == '*' && *(range+1) == '\0') { + first = 1; + last = 65535; + + ebpf_clean_port_structure(list); + goto fillenvpl; + } + + char *end = range; + //Move while I cannot find a separator + while (*end && *end != ':' && *end != '-') end++; + + //It has a range + if (likely(*end)) { + *end++ = '\0'; + if (*end == '!') { + netdata_log_info("The exclusion cannot be in the second part of the range, the range %s will be ignored.", copied); + freez(copied); + return; + } + last = str2i((const char *)end); + } else { + last = 0; + } + + first = str2i((const char *)range); + if (first < NETDATA_MINIMUM_PORT_VALUE || first > NETDATA_MAXIMUM_PORT_VALUE) { + netdata_log_info("The first port %d of the range \"%s\" is invalid and it will be ignored!", first, copied); + freez(copied); + return; + } + + if (!last) + last = first; + + if (last < NETDATA_MINIMUM_PORT_VALUE || last > NETDATA_MAXIMUM_PORT_VALUE) { + 
netdata_log_info("The second port %d of the range \"%s\" is invalid and the whole range will be ignored!", last, copied); + freez(copied); + return; + } + + if (first > last) { + netdata_log_info("The specified order %s is wrong, the smallest value is always the first, it will be ignored!", copied); + freez(copied); + return; + } + + ebpf_network_viewer_port_list_t *w; + fillenvpl: + w = callocz(1, sizeof(ebpf_network_viewer_port_list_t)); + w->value = copied; + w->hash = simple_hash(copied); + w->first = (uint16_t)first; + w->last = (uint16_t)last; + w->cmp_first = (uint16_t)first; + w->cmp_last = (uint16_t)last; + + fill_port_list(list, w); +} + +/** + * Parse Port Range + * + * Parse the port ranges given and create Network Viewer Port Structure + * + * @param ptr is a pointer with the text to parse. + */ +void ebpf_parse_ports(char *ptr) +{ + // No value + if (unlikely(!ptr)) + return; + + while (likely(ptr)) { + // Move forward until next valid character + while (isspace(*ptr)) ptr++; + + // No valid value found + if (unlikely(!*ptr)) + return; + + // Find space that ends the list + char *end = strchr(ptr, ' '); + if (end) { + *end++ = '\0'; + } + + int neg = 0; + if (*ptr == '!') { + neg++; + ptr++; + } + + if (isdigit(*ptr)) { // Parse port + ebpf_parse_port_list( + (!neg) ? (void **)&network_viewer_opt.included_port : (void **)&network_viewer_opt.excluded_port, ptr); + } else if (isalpha(*ptr)) { // Parse service + ebpf_parse_service_list( + (!neg) ? (void **)&network_viewer_opt.included_port : (void **)&network_viewer_opt.excluded_port, ptr); + } else if (*ptr == '*') { // All + ebpf_parse_port_list( + (!neg) ? 
(void **)&network_viewer_opt.included_port : (void **)&network_viewer_opt.excluded_port, ptr); + } + + ptr = end; + } +} + +/***************************************************************** + * * FUNCTIONS TO DEFINE OPTIONS * *****************************************************************/ @@ -1428,13 +2168,7 @@ static inline void ebpf_set_thread_mode(netdata_run_mode_t lmode) */ static inline void ebpf_enable_specific_chart(struct ebpf_module *em, int disable_cgroup) { - em->enabled = CONFIG_BOOLEAN_YES; - - // oomkill stores data inside apps submenu, so it always need to have apps_enabled for plugin to create - // its chart, without this comparison eBPF.plugin will try to store invalid data when apps is disabled. - if (!strcmp(em->thread_name, "oomkill")) { - em->apps_charts = NETDATA_EBPF_APPS_FLAG_YES; - } + em->enabled = NETDATA_THREAD_EBPF_RUNNING; if (!disable_cgroup) { em->cgroup_charts = CONFIG_BOOLEAN_YES; @@ -1451,8 +2185,8 @@ static inline void ebpf_enable_specific_chart(struct ebpf_module *em, int disabl static inline void disable_all_global_charts() { int i; - for (i = 0; ebpf_modules[i].thread_name; i++) { - ebpf_modules[i].enabled = 0; + for (i = 0; ebpf_modules[i].info.thread_name; i++) { + ebpf_modules[i].enabled = NETDATA_THREAD_EBPF_NOT_RUNNING; ebpf_modules[i].global_charts = 0; } } @@ -1465,7 +2199,7 @@ static inline void disable_all_global_charts() static inline void ebpf_enable_chart(int idx, int disable_cgroup) { int i; - for (i = 0; ebpf_modules[i].thread_name; i++) { + for (i = 0; ebpf_modules[i].info.thread_name; i++) { if (i == idx) { ebpf_enable_specific_chart(&ebpf_modules[i], disable_cgroup); break; @@ -1481,7 +2215,7 @@ static inline void ebpf_enable_chart(int idx, int disable_cgroup) static inline void ebpf_disable_cgroups() { int i; - for (i = 0; ebpf_modules[i].thread_name; i++) { + for (i = 0; ebpf_modules[i].info.thread_name; i++) { ebpf_modules[i].cgroup_charts = 0; } } @@ -1662,6 +2396,203 @@ uint32_t 
ebpf_enable_tracepoints(ebpf_tracepoint_t *tps) *****************************************************************/ /** + * Is ip inside the range + * + * Check if the ip is inside a IP range + * + * @param rfirst the first ip address of the range + * @param rlast the last ip address of the range + * @param cmpfirst the first ip to compare + * @param cmplast the last ip to compare + * @param family the IP family + * + * @return It returns 1 if the IP is inside the range and 0 otherwise + */ +static int ebpf_is_ip_inside_range(union netdata_ip_t *rfirst, union netdata_ip_t *rlast, + union netdata_ip_t *cmpfirst, union netdata_ip_t *cmplast, int family) +{ + if (family == AF_INET) { + if ((rfirst->addr32[0] <= cmpfirst->addr32[0]) && (rlast->addr32[0] >= cmplast->addr32[0])) + return 1; + } else { + if (memcmp(rfirst->addr8, cmpfirst->addr8, sizeof(union netdata_ip_t)) <= 0 && + memcmp(rlast->addr8, cmplast->addr8, sizeof(union netdata_ip_t)) >= 0) { + return 1; + } + + } + return 0; +} + +/** + * Fill IP list + * + * @param out a pointer to the link list. + * @param in the structure that will be linked. + * @param table the modified table. 
+ */ +void ebpf_fill_ip_list_unsafe(ebpf_network_viewer_ip_list_t **out, ebpf_network_viewer_ip_list_t *in, + char *table __maybe_unused) +{ + if (in->ver == AF_INET) { // It is simpler to compare using host order + in->first.addr32[0] = ntohl(in->first.addr32[0]); + in->last.addr32[0] = ntohl(in->last.addr32[0]); + } + if (likely(*out)) { + ebpf_network_viewer_ip_list_t *move = *out, *store = *out; + while (move) { + if (in->ver == move->ver && + ebpf_is_ip_inside_range(&move->first, &move->last, &in->first, &in->last, in->ver)) { +#ifdef NETDATA_DEV_MODE + netdata_log_info("The range/value (%s) is inside the range/value (%s) already inserted, it will be ignored.", + in->value, move->value); +#endif + freez(in->value); + freez(in); + return; + } + store = move; + move = move->next; + } + + store->next = in; + } else { + *out = in; + } + +#ifdef NETDATA_DEV_MODE + char first[256], last[512]; + if (in->ver == AF_INET) { + netdata_log_info("Adding values %s: (%u - %u) to %s IP list \"%s\" used on network viewer", + in->value, in->first.addr32[0], in->last.addr32[0], + (*out == network_viewer_opt.included_ips)?"included":"excluded", + table); + } else { + if (inet_ntop(AF_INET6, in->first.addr8, first, INET6_ADDRSTRLEN) && + inet_ntop(AF_INET6, in->last.addr8, last, INET6_ADDRSTRLEN)) + netdata_log_info("Adding values %s - %s to %s IP list \"%s\" used on network viewer", + first, last, + (*out == network_viewer_opt.included_ips)?"included":"excluded", + table); + } +#endif +} + +/** + * Link hostname + * + * @param out is the output link list + * @param in the hostname to add to list. 
+ */ +static void ebpf_link_hostname(ebpf_network_viewer_hostname_list_t **out, ebpf_network_viewer_hostname_list_t *in) +{ + if (likely(*out)) { + ebpf_network_viewer_hostname_list_t *move = *out; + for (; move->next ; move = move->next ) { + if (move->hash == in->hash && !strcmp(move->value, in->value)) { + netdata_log_info("The hostname %s was already inserted, it will be ignored.", in->value); + freez(in->value); + simple_pattern_free(in->value_pattern); + freez(in); + return; + } + } + + move->next = in; + } else { + *out = in; + } +#ifdef NETDATA_INTERNAL_CHECKS + netdata_log_info("Adding value %s to %s hostname list used on network viewer", + in->value, + (*out == network_viewer_opt.included_hostnames)?"included":"excluded"); +#endif +} + +/** + * Link Hostnames + * + * Parse the list of hostnames to create the link list. + * This is not associated with the IP, because simple patterns like *example* cannot be resolved to IP. + * + * @param out is the output link list + * @param parse is a pointer with the text to parser. + */ +static void ebpf_link_hostnames(char *parse) +{ + // No value + if (unlikely(!parse)) + return; + + while (likely(parse)) { + // Find the first valid value + while (isspace(*parse)) parse++; + + // No valid value found + if (unlikely(!*parse)) + return; + + // Find space that ends the list + char *end = strchr(parse, ' '); + if (end) { + *end++ = '\0'; + } + + int neg = 0; + if (*parse == '!') { + neg++; + parse++; + } + + ebpf_network_viewer_hostname_list_t *hostname = callocz(1 , sizeof(ebpf_network_viewer_hostname_list_t)); + hostname->value = strdupz(parse); + hostname->hash = simple_hash(parse); + hostname->value_pattern = simple_pattern_create(parse, NULL, SIMPLE_PATTERN_EXACT, true); + + ebpf_link_hostname((!neg) ? 
&network_viewer_opt.included_hostnames : + &network_viewer_opt.excluded_hostnames, + hostname); + + parse = end; + } +} + +/** + * Parse network viewer section + * + * @param cfg the configuration structure + */ +void parse_network_viewer_section(struct config *cfg) +{ + network_viewer_opt.hostname_resolution_enabled = appconfig_get_boolean(cfg, + EBPF_NETWORK_VIEWER_SECTION, + EBPF_CONFIG_RESOLVE_HOSTNAME, + CONFIG_BOOLEAN_NO); + + network_viewer_opt.service_resolution_enabled = appconfig_get_boolean(cfg, + EBPF_NETWORK_VIEWER_SECTION, + EBPF_CONFIG_RESOLVE_SERVICE, + CONFIG_BOOLEAN_YES); + + char *value = appconfig_get(cfg, EBPF_NETWORK_VIEWER_SECTION, EBPF_CONFIG_PORTS, NULL); + ebpf_parse_ports(value); + + if (network_viewer_opt.hostname_resolution_enabled) { + value = appconfig_get(cfg, EBPF_NETWORK_VIEWER_SECTION, EBPF_CONFIG_HOSTNAMES, NULL); + ebpf_link_hostnames(value); + } else { + netdata_log_info("Name resolution is disabled, collector will not parse \"hostnames\" list."); + } + + value = appconfig_get(cfg, + EBPF_NETWORK_VIEWER_SECTION, + "ips", + NULL); + //"ips", "!127.0.0.1/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 fc00::/7 !::1/128"); + ebpf_parse_ips_unsafe(value); +} + +/** * Read Local Ports * * Parse /proc/net/{tcp,udp} and get the ports Linux is listening. @@ -1705,7 +2636,7 @@ static void read_local_ports(char *filename, uint8_t proto) * * Read the local address from the interfaces. */ -static void read_local_addresses() +void ebpf_read_local_addresses_unsafe() { struct ifaddrs *ifaddr, *ifa; if (getifaddrs(&ifaddr) == -1) { @@ -1754,9 +2685,8 @@ static void read_local_addresses() } } - ebpf_fill_ip_list((family == AF_INET)?&network_viewer_opt.ipv4_local_ip:&network_viewer_opt.ipv6_local_ip, - w, - "selector"); + ebpf_fill_ip_list_unsafe( + (family == AF_INET) ? 
&network_viewer_opt.ipv4_local_ip : &network_viewer_opt.ipv6_local_ip, w, "selector"); } freeifaddrs(ifaddr); @@ -1773,6 +2703,7 @@ void ebpf_start_pthread_variables() pthread_mutex_init(&ebpf_exit_cleanup, NULL); pthread_mutex_init(&collect_data_mutex, NULL); pthread_mutex_init(&mutex_cgroup_shm, NULL); + rw_spinlock_init(&ebpf_judy_pid.index.rw_spinlock); } /** @@ -1780,6 +2711,8 @@ void ebpf_start_pthread_variables() */ static void ebpf_allocate_common_vectors() { + ebpf_judy_pid.pid_table = ebpf_allocate_pid_aral(NETDATA_EBPF_PID_SOCKET_ARAL_TABLE_NAME, + sizeof(netdata_ebpf_judy_pid_stats_t)); ebpf_all_pids = callocz((size_t)pid_max, sizeof(struct ebpf_pid_stat *)); ebpf_aral_init(); } @@ -1825,7 +2758,7 @@ static void ebpf_update_interval(int update_every) int i; int value = (int) appconfig_get_number(&collector_config, EBPF_GLOBAL_SECTION, EBPF_CFG_UPDATE_EVERY, update_every); - for (i = 0; ebpf_modules[i].thread_name; i++) { + for (i = 0; ebpf_modules[i].info.thread_name; i++) { ebpf_modules[i].update_every = value; } } @@ -1840,7 +2773,7 @@ static void ebpf_update_table_size() int i; uint32_t value = (uint32_t) appconfig_get_number(&collector_config, EBPF_GLOBAL_SECTION, EBPF_CFG_PID_SIZE, ND_EBPF_DEFAULT_PID_SIZE); - for (i = 0; ebpf_modules[i].thread_name; i++) { + for (i = 0; ebpf_modules[i].info.thread_name; i++) { ebpf_modules[i].pid_map_size = value; } } @@ -1855,7 +2788,7 @@ static void ebpf_update_lifetime() int i; uint32_t value = (uint32_t) appconfig_get_number(&collector_config, EBPF_GLOBAL_SECTION, EBPF_CFG_LIFETIME, EBPF_DEFAULT_LIFETIME); - for (i = 0; ebpf_modules[i].thread_name; i++) { + for (i = 0; ebpf_modules[i].info.thread_name; i++) { ebpf_modules[i].lifetime = value; } } @@ -1868,7 +2801,7 @@ static void ebpf_update_lifetime() static inline void ebpf_set_load_mode(netdata_ebpf_load_mode_t load, netdata_ebpf_load_mode_t origin) { int i; - for (i = 0; ebpf_modules[i].thread_name; i++) { + for (i = 0; ebpf_modules[i].info.thread_name; 
i++) { ebpf_modules[i].load &= ~NETDATA_EBPF_LOAD_METHODS; ebpf_modules[i].load |= load | origin ; } @@ -1897,7 +2830,7 @@ static void ebpf_update_map_per_core() int i; int value = appconfig_get_boolean(&collector_config, EBPF_GLOBAL_SECTION, EBPF_CFG_MAPS_PER_CORE, CONFIG_BOOLEAN_YES); - for (i = 0; ebpf_modules[i].thread_name; i++) { + for (i = 0; ebpf_modules[i].info.thread_name; i++) { ebpf_modules[i].maps_per_core = value; } } @@ -1961,7 +2894,7 @@ static void read_collector_values(int *disable_cgroups, // Read ebpf programs section enabled = appconfig_get_boolean(&collector_config, EBPF_PROGRAMS_SECTION, - ebpf_modules[EBPF_MODULE_PROCESS_IDX].config_name, CONFIG_BOOLEAN_YES); + ebpf_modules[EBPF_MODULE_PROCESS_IDX].info.config_name, CONFIG_BOOLEAN_YES); if (enabled) { ebpf_enable_chart(EBPF_MODULE_PROCESS_IDX, *disable_cgroups); } @@ -1971,7 +2904,7 @@ static void read_collector_values(int *disable_cgroups, CONFIG_BOOLEAN_NO); if (!enabled) enabled = appconfig_get_boolean(&collector_config, EBPF_PROGRAMS_SECTION, - ebpf_modules[EBPF_MODULE_SOCKET_IDX].config_name, + ebpf_modules[EBPF_MODULE_SOCKET_IDX].info.config_name, CONFIG_BOOLEAN_NO); if (enabled) { ebpf_enable_chart(EBPF_MODULE_SOCKET_IDX, *disable_cgroups); @@ -1979,10 +2912,11 @@ static void read_collector_values(int *disable_cgroups, // This is kept to keep compatibility enabled = appconfig_get_boolean(&collector_config, EBPF_PROGRAMS_SECTION, "network connection monitoring", - CONFIG_BOOLEAN_NO); + CONFIG_BOOLEAN_YES); if (!enabled) enabled = appconfig_get_boolean(&collector_config, EBPF_PROGRAMS_SECTION, "network connections", - CONFIG_BOOLEAN_NO); + CONFIG_BOOLEAN_YES); + network_viewer_opt.enabled = enabled; if (enabled) { if (!ebpf_modules[EBPF_MODULE_SOCKET_IDX].enabled) @@ -1991,7 +2925,7 @@ static void read_collector_values(int *disable_cgroups, // Read network viewer section if network viewer is enabled // This is kept here to keep backward compatibility 
parse_network_viewer_section(&collector_config); - parse_service_name_section(&collector_config); + ebpf_parse_service_name_section(&collector_config); } enabled = appconfig_get_boolean(&collector_config, EBPF_PROGRAMS_SECTION, "cachestat", @@ -2238,7 +3172,7 @@ static void ebpf_parse_args(int argc, char **argv) }; memset(&network_viewer_opt, 0, sizeof(network_viewer_opt)); - network_viewer_opt.max_dim = NETDATA_NV_CAP_VALUE; + rw_spinlock_init(&network_viewer_opt.rw_spinlock); if (argc > 1) { int n = (int)str2l(argv[1]); @@ -2250,6 +3184,7 @@ static void ebpf_parse_args(int argc, char **argv) if (!freq) freq = EBPF_DEFAULT_UPDATE_EVERY; + //rw_spinlock_write_lock(&network_viewer_opt.rw_spinlock); if (ebpf_load_collector_config(ebpf_user_config_dir, &disable_cgroups, freq)) { netdata_log_info( "Does not have a configuration file inside `%s/ebpf.d.conf. It will try to load stock file.", @@ -2260,6 +3195,7 @@ static void ebpf_parse_args(int argc, char **argv) } ebpf_load_thread_config(); + //rw_spinlock_write_unlock(&network_viewer_opt.rw_spinlock); while (1) { int c = getopt_long_only(argc, argv, "", long_options, &option_index); @@ -2457,8 +3393,7 @@ unittest: } if (disable_cgroups) { - if (disable_cgroups) - ebpf_disable_cgroups(); + ebpf_disable_cgroups(); } if (select_threads) { @@ -2510,8 +3445,8 @@ static inline void ebpf_send_hash_table_pid_data(char *chart, uint32_t idx) write_begin_chart(NETDATA_MONITORING_FAMILY, chart); for (i = 0; i < EBPF_MODULE_FUNCTION_IDX; i++) { ebpf_module_t *wem = &ebpf_modules[i]; - if (wem->apps_routine) - write_chart_dimension((char *)wem->thread_name, + if (wem->functions.apps_routine) + write_chart_dimension((char *)wem->info.thread_name, (wem->enabled < NETDATA_THREAD_EBPF_STOPPING) ? 
wem->hash_table_stats[idx]: 0); @@ -2531,7 +3466,7 @@ static inline void ebpf_send_global_hash_table_data() write_begin_chart(NETDATA_MONITORING_FAMILY, NETDATA_EBPF_HASH_TABLES_GLOBAL_ELEMENTS); for (i = 0; i < EBPF_MODULE_FUNCTION_IDX; i++) { ebpf_module_t *wem = &ebpf_modules[i]; - write_chart_dimension((char *)wem->thread_name, + write_chart_dimension((char *)wem->info.thread_name, (wem->enabled < NETDATA_THREAD_EBPF_STOPPING) ? NETDATA_CONTROLLER_END: 0); } write_end_chart(); @@ -2551,7 +3486,10 @@ void ebpf_send_statistic_data() int i; for (i = 0; i < EBPF_MODULE_FUNCTION_IDX; i++) { ebpf_module_t *wem = &ebpf_modules[i]; - write_chart_dimension((char *)wem->thread_name, (wem->enabled < NETDATA_THREAD_EBPF_STOPPING) ? 1 : 0); + if (wem->functions.fnct_routine) + continue; + + write_chart_dimension((char *)wem->info.thread_name, (wem->enabled < NETDATA_THREAD_EBPF_STOPPING) ? 1 : 0); } write_end_chart(); @@ -2560,7 +3498,10 @@ void ebpf_send_statistic_data() ebpf_module_t *wem = &ebpf_modules[i]; // Threads like VFS is slow to load and this can create an invalid number, this is the motive // we are also testing wem->lifetime value. - write_chart_dimension((char *)wem->thread_name, + if (wem->functions.fnct_routine) + continue; + + write_chart_dimension((char *)wem->info.thread_name, (wem->lifetime && wem->enabled < NETDATA_THREAD_EBPF_STOPPING) ? 
(long long) (wem->lifetime - wem->running_time): 0) ; @@ -2589,6 +3530,23 @@ void ebpf_send_statistic_data() ebpf_send_hash_table_pid_data(NETDATA_EBPF_HASH_TABLES_INSERT_PID_ELEMENTS, NETDATA_EBPF_GLOBAL_TABLE_PID_TABLE_ADD); ebpf_send_hash_table_pid_data(NETDATA_EBPF_HASH_TABLES_REMOVE_PID_ELEMENTS, NETDATA_EBPF_GLOBAL_TABLE_PID_TABLE_DEL); + + for (i = 0; i < EBPF_MODULE_FUNCTION_IDX; i++) { + ebpf_module_t *wem = &ebpf_modules[i]; + if (!wem->functions.fnct_routine) + continue; + + write_begin_chart(NETDATA_MONITORING_FAMILY, (char *)wem->functions.fcnt_thread_chart_name); + write_chart_dimension((char *)wem->info.thread_name, (wem->enabled < NETDATA_THREAD_EBPF_STOPPING) ? 1 : 0); + write_end_chart(); + + write_begin_chart(NETDATA_MONITORING_FAMILY, (char *)wem->functions.fcnt_thread_lifetime_name); + write_chart_dimension((char *)wem->info.thread_name, + (wem->lifetime && wem->enabled < NETDATA_THREAD_EBPF_STOPPING) ? + (long long) (wem->lifetime - wem->running_time): + 0) ; + write_end_chart(); + } } /** @@ -2607,57 +3565,51 @@ static void update_internal_metric_variable() } /** - * Create chart for Statistic Thread + * Create Thread Chart * - * Write to standard output current values for threads. + * Write to standard output current values for threads charts. * + * @param name is the chart name + * @param title chart title. + * @param units chart units + * @param order is the chart order * @param update_every time used to update charts + * @param module a module to create a specific chart. */ -static inline void ebpf_create_statistic_thread_chart(int update_every) +static void ebpf_create_thread_chart(char *name, + char *title, + char *units, + int order, + int update_every, + ebpf_module_t *module) { + // common call for specific and all charts. 
ebpf_write_chart_cmd(NETDATA_MONITORING_FAMILY, - NETDATA_EBPF_THREADS, - "Threads running.", - "boolean", + name, + title, + units, NETDATA_EBPF_FAMILY, NETDATA_EBPF_CHART_TYPE_LINE, NULL, - NETDATA_EBPF_ORDER_STAT_THREADS, + order, update_every, - NETDATA_EBPF_MODULE_NAME_PROCESS); + "main"); - int i; - for (i = 0; i < EBPF_MODULE_FUNCTION_IDX; i++) { - ebpf_write_global_dimension((char *)ebpf_modules[i].thread_name, - (char *)ebpf_modules[i].thread_name, + if (module) { + ebpf_write_global_dimension((char *)module->info.thread_name, + (char *)module->info.thread_name, ebpf_algorithms[NETDATA_EBPF_ABSOLUTE_IDX]); + return; } -} - -/** - * Create lifetime Thread Chart - * - * Write to standard output current values for threads lifetime. - * - * @param update_every time used to update charts - */ -static inline void ebpf_create_lifetime_thread_chart(int update_every) -{ - ebpf_write_chart_cmd(NETDATA_MONITORING_FAMILY, - NETDATA_EBPF_LIFE_TIME, - "Threads running.", - "seconds", - NETDATA_EBPF_FAMILY, - NETDATA_EBPF_CHART_TYPE_LINE, - NULL, - NETDATA_EBPF_ORDER_STAT_LIFE_TIME, - update_every, - NETDATA_EBPF_MODULE_NAME_PROCESS); int i; for (i = 0; i < EBPF_MODULE_FUNCTION_IDX; i++) { - ebpf_write_global_dimension((char *)ebpf_modules[i].thread_name, - (char *)ebpf_modules[i].thread_name, + ebpf_module_t *em = &ebpf_modules[i]; + if (em->functions.fnct_routine) + continue; + + ebpf_write_global_dimension((char *)em->info.thread_name, + (char *)em->info.thread_name, ebpf_algorithms[NETDATA_EBPF_ABSOLUTE_IDX]); } } @@ -2792,8 +3744,8 @@ static void ebpf_create_statistic_hash_global_elements(int update_every) int i; for (i = 0; i < EBPF_MODULE_FUNCTION_IDX; i++) { - ebpf_write_global_dimension((char *)ebpf_modules[i].thread_name, - (char *)ebpf_modules[i].thread_name, + ebpf_write_global_dimension((char *)ebpf_modules[i].info.thread_name, + (char *)ebpf_modules[i].info.thread_name, ebpf_algorithms[NETDATA_EBPF_ABSOLUTE_IDX]); } } @@ -2824,9 +3776,9 @@ static void 
ebpf_create_statistic_hash_pid_table(int update_every, char *id, cha int i; for (i = 0; i < EBPF_MODULE_FUNCTION_IDX; i++) { ebpf_module_t *wem = &ebpf_modules[i]; - if (wem->apps_routine) - ebpf_write_global_dimension((char *)wem->thread_name, - (char *)wem->thread_name, + if (wem->functions.apps_routine) + ebpf_write_global_dimension((char *)wem->info.thread_name, + (char *)wem->info.thread_name, ebpf_algorithms[NETDATA_EBPF_INCREMENTAL_IDX]); } } @@ -2850,15 +3802,63 @@ static void ebpf_create_statistic_charts(int update_every) create_charts = 0; - ebpf_create_statistic_thread_chart(update_every); + ebpf_create_thread_chart(NETDATA_EBPF_THREADS, + "Threads running.", + "boolean", + NETDATA_EBPF_ORDER_STAT_THREADS, + update_every, + NULL); + /* #ifdef NETDATA_DEV_MODE EBPF_PLUGIN_FUNCTIONS(EBPF_FUNCTION_THREAD, EBPF_PLUGIN_THREAD_FUNCTION_DESCRIPTION); #endif - - ebpf_create_lifetime_thread_chart(update_every); + */ + + ebpf_create_thread_chart(NETDATA_EBPF_LIFE_TIME, + "Time remaining for thread.", + "seconds", + NETDATA_EBPF_ORDER_STAT_LIFE_TIME, + update_every, + NULL); + /* #ifdef NETDATA_DEV_MODE EBPF_PLUGIN_FUNCTIONS(EBPF_FUNCTION_THREAD, EBPF_PLUGIN_THREAD_FUNCTION_DESCRIPTION); #endif + */ + + int i,j; + char name[256]; + for (i = 0, j = NETDATA_EBPF_ORDER_FUNCTION_PER_THREAD; i < EBPF_MODULE_FUNCTION_IDX; i++) { + ebpf_module_t *em = &ebpf_modules[i]; + if (!em->functions.fnct_routine) + continue; + + em->functions.order_thread_chart = j; + snprintfz(name, 255,"%s_%s", NETDATA_EBPF_THREADS, em->info.thread_name); + em->functions.fcnt_thread_chart_name = strdupz(name); + ebpf_create_thread_chart(name, + "Threads running.", + "boolean", + j++, + update_every, + em); +#ifdef NETDATA_DEV_MODE + EBPF_PLUGIN_FUNCTIONS(em->functions.fcnt_name, em->functions.fcnt_desc); +#endif + + em->functions.order_thread_lifetime = j; + snprintfz(name, 255,"%s_%s", NETDATA_EBPF_LIFE_TIME, em->info.thread_name); + em->functions.fcnt_thread_lifetime_name = strdupz(name); + 
ebpf_create_thread_chart(name, + "Time remaining for thread.", + "seconds", + j++, + update_every, + em); +#ifdef NETDATA_DEV_MODE + EBPF_PLUGIN_FUNCTIONS(em->functions.fcnt_name, em->functions.fcnt_desc); +#endif + } ebpf_create_statistic_load_chart(update_every); @@ -3013,7 +4013,7 @@ static void ebpf_kill_previous_process(char *filename, pid_t pid) */ void ebpf_pid_file(char *filename, size_t length) { - snprintfz(filename, length, "%s%s/ebpf.d/ebpf.pid", netdata_configured_host_prefix, ebpf_plugin_dir); + snprintfz(filename, length, "%s/var/run/ebpf.pid", netdata_configured_host_prefix); } /** @@ -3040,8 +4040,8 @@ static void ebpf_manage_pid(pid_t pid) static void ebpf_set_static_routine() { int i; - for (i = 0; ebpf_modules[i].thread_name; i++) { - ebpf_threads[i].start_routine = ebpf_modules[i].start_routine; + for (i = 0; ebpf_modules[i].info.thread_name; i++) { + ebpf_threads[i].start_routine = ebpf_modules[i].functions.start_routine; } } @@ -3056,6 +4056,9 @@ static void ebpf_manage_pid(pid_t pid) int main(int argc, char **argv) { stderror = stderr; + + log_set_global_severity_for_external_plugins(); + clocks_init(); main_thread_id = gettid(); @@ -3095,7 +4098,7 @@ int main(int argc, char **argv) libbpf_set_strict_mode(LIBBPF_STRICT_ALL); #endif - read_local_addresses(); + ebpf_read_local_addresses_unsafe(); read_local_ports("/proc/net/tcp", IPPROTO_TCP); read_local_ports("/proc/net/tcp6", IPPROTO_TCP); read_local_ports("/proc/net/udp", IPPROTO_UDP); @@ -3116,13 +4119,13 @@ int main(int argc, char **argv) ebpf_module_t *em = &ebpf_modules[i]; em->thread = st; em->thread_id = i; - if (em->enabled) { + if (em->enabled != NETDATA_THREAD_EBPF_NOT_RUNNING) { st->thread = mallocz(sizeof(netdata_thread_t)); em->enabled = NETDATA_THREAD_EBPF_RUNNING; em->lifetime = EBPF_NON_FUNCTION_LIFE_TIME; netdata_thread_create(st->thread, st->name, NETDATA_THREAD_OPTION_DEFAULT, st->start_routine, em); } else { - em->enabled = NETDATA_THREAD_EBPF_NOT_RUNNING; + em->lifetime 
= EBPF_DEFAULT_LIFETIME; } } @@ -3133,7 +4136,7 @@ int main(int argc, char **argv) int update_apps_list = update_apps_every - 1; int process_maps_per_core = ebpf_modules[EBPF_MODULE_PROCESS_IDX].maps_per_core; //Plugin will be killed when it receives a signal - for ( ; !ebpf_exit_plugin ; global_iterations_counter++) { + for ( ; !ebpf_plugin_exit; global_iterations_counter++) { (void)heartbeat_next(&hb, step); if (global_iterations_counter % EBPF_DEFAULT_UPDATE_EVERY == 0) { diff --git a/collectors/ebpf.plugin/ebpf.d/network.conf b/collectors/ebpf.plugin/ebpf.d/network.conf index 00cbf2e8b..99c32edc1 100644 --- a/collectors/ebpf.plugin/ebpf.d/network.conf +++ b/collectors/ebpf.plugin/ebpf.d/network.conf @@ -26,6 +26,11 @@ # # The `maps per core` defines if hash tables will be per core or not. This option is ignored on kernels older than 4.6. # +# The `collect pid` option defines the PID stored inside hash tables and accepts the following options: +# `real parent`: Only stores real parent inside PID +# `parent` : Only stores parent PID. +# `all` : Stores all PIDs used by software. This is the most expensive option. +# # The `lifetime` defines the time length a thread will run when it is enabled by a function. # # Uncomment lines to define specific options for thread. 
@@ -35,12 +40,12 @@ # cgroups = no # update every = 10 bandwidth table size = 16384 - ipv4 connection table size = 16384 - ipv6 connection table size = 16384 + socket monitoring table size = 16384 udp connection table size = 4096 ebpf type format = auto - ebpf co-re tracing = trampoline + ebpf co-re tracing = probe maps per core = no + collect pid = all lifetime = 300 # @@ -49,11 +54,12 @@ # This is a feature with status WIP(Work in Progress) # [network connections] - maximum dimensions = 50 + enabled = yes resolve hostnames = no - resolve service names = no + resolve service names = yes ports = * - ips = !127.0.0.1/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 fc00::/7 !::1/128 +# ips = !127.0.0.1/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 fc00::/7 !::1/128 + ips = * hostnames = * [service name] diff --git a/collectors/ebpf.plugin/ebpf.h b/collectors/ebpf.plugin/ebpf.h index 78e3a9252..d52ea5a4a 100644 --- a/collectors/ebpf.plugin/ebpf.h +++ b/collectors/ebpf.plugin/ebpf.h @@ -31,6 +31,7 @@ #include "daemon/main.h" #include "ebpf_apps.h" +#include "ebpf_functions.h" #include "ebpf_cgroup.h" #define NETDATA_EBPF_OLD_CONFIG_FILE "ebpf.conf" @@ -98,6 +99,26 @@ typedef struct netdata_error_report { int err; } netdata_error_report_t; +typedef struct netdata_ebpf_judy_pid { + ARAL *pid_table; + + // Index for PIDs + struct { // support for multiple indexing engines + Pvoid_t JudyLArray; // the hash table + RW_SPINLOCK rw_spinlock; // protect the index + } index; +} netdata_ebpf_judy_pid_t; + +typedef struct netdata_ebpf_judy_pid_stats { + char *cmdline; + + // Index for Socket timestamp + struct { // support for multiple indexing engines + Pvoid_t JudyLArray; // the hash table + RW_SPINLOCK rw_spinlock; // protect the index + } socket_stats; +} netdata_ebpf_judy_pid_stats_t; + extern ebpf_module_t ebpf_modules[]; enum ebpf_main_index { EBPF_MODULE_PROCESS_IDX, @@ -322,10 +343,19 @@ void ebpf_unload_legacy_code(struct bpf_object *objects, struct bpf_link **probe void 
ebpf_read_global_table_stats(netdata_idx_t *stats, netdata_idx_t *values, int map_fd, int maps_per_core, uint32_t begin, uint32_t end); +void **ebpf_judy_insert_unsafe(PPvoid_t arr, Word_t key); +netdata_ebpf_judy_pid_stats_t *ebpf_get_pid_from_judy_unsafe(PPvoid_t judy_array, uint32_t pid); + +void parse_network_viewer_section(struct config *cfg); +void ebpf_clean_ip_structure(ebpf_network_viewer_ip_list_t **clean); +void ebpf_clean_port_structure(ebpf_network_viewer_port_list_t **clean); +void ebpf_read_local_addresses_unsafe(); extern ebpf_filesystem_partitions_t localfs[]; extern ebpf_sync_syscalls_t local_syscalls[]; -extern int ebpf_exit_plugin; +extern bool ebpf_plugin_exit; +void ebpf_stop_threads(int sig); +extern netdata_ebpf_judy_pid_t ebpf_judy_pid; #define EBPF_MAX_SYNCHRONIZATION_TIME 300 diff --git a/collectors/ebpf.plugin/ebpf_apps.c b/collectors/ebpf.plugin/ebpf_apps.c index c7c0cbbbb..b1b42c8d8 100644 --- a/collectors/ebpf.plugin/ebpf_apps.c +++ b/collectors/ebpf.plugin/ebpf_apps.c @@ -375,58 +375,6 @@ int ebpf_read_hash_table(void *ep, int fd, uint32_t pid) return -1; } -/** - * Read socket statistic - * - * Read information from kernel ring to user ring. - * - * @param ep the table with all process stats values. - * @param fd the file descriptor mapped from kernel - * @param ef a pointer for the functions mapped from dynamic library - * @param pids the list of pids associated to a target. - * - * @return - */ -size_t read_bandwidth_statistic_using_pid_on_target(ebpf_bandwidth_t **ep, int fd, struct ebpf_pid_on_target *pids) -{ - size_t count = 0; - while (pids) { - uint32_t current_pid = pids->pid; - if (!ebpf_read_hash_table(ep[current_pid], fd, current_pid)) - count++; - - pids = pids->next; - } - - return count; -} - -/** - * Read bandwidth statistic using hash table - * - * @param out the output tensor that will receive the information. 
- * @param fd the file descriptor that has the data - * @param bpf_map_lookup_elem a pointer for the function to read the data - * @param bpf_map_get_next_key a pointer fo the function to read the index. - */ -size_t read_bandwidth_statistic_using_hash_table(ebpf_bandwidth_t **out, int fd) -{ - size_t count = 0; - uint32_t key = 0; - uint32_t next_key = 0; - - while (bpf_map_get_next_key(fd, &key, &next_key) == 0) { - ebpf_bandwidth_t *eps = out[next_key]; - if (!eps) { - eps = callocz(1, sizeof(ebpf_process_stat_t)); - out[next_key] = eps; - } - ebpf_read_hash_table(eps, fd, next_key); - } - - return count; -} - /***************************************************************** * * FUNCTIONS CALLED FROM COLLECTORS @@ -887,6 +835,7 @@ static inline int read_proc_pid_cmdline(struct ebpf_pid_stat *p) { static char cmdline[MAX_CMDLINE + 1]; + int ret = 0; if (unlikely(!p->cmdline_filename)) { char filename[FILENAME_MAX + 1]; snprintfz(filename, FILENAME_MAX, "%s/proc/%d/cmdline", netdata_configured_host_prefix, p->pid); @@ -909,20 +858,23 @@ static inline int read_proc_pid_cmdline(struct ebpf_pid_stat *p) cmdline[i] = ' '; } - if (p->cmdline) - freez(p->cmdline); - p->cmdline = strdupz(cmdline); - debug_log("Read file '%s' contents: %s", p->cmdline_filename, p->cmdline); - return 1; + ret = 1; cleanup: // copy the command to the command line if (p->cmdline) freez(p->cmdline); p->cmdline = strdupz(p->comm); - return 0; + + rw_spinlock_write_lock(&ebpf_judy_pid.index.rw_spinlock); + netdata_ebpf_judy_pid_stats_t *pid_ptr = ebpf_get_pid_from_judy_unsafe(&ebpf_judy_pid.index.JudyLArray, p->pid); + if (pid_ptr) + pid_ptr->cmdline = p->cmdline; + rw_spinlock_write_unlock(&ebpf_judy_pid.index.rw_spinlock); + + return ret; } /** @@ -1238,6 +1190,24 @@ static inline void del_pid_entry(pid_t pid) freez(p->status_filename); freez(p->io_filename); freez(p->cmdline_filename); + + rw_spinlock_write_lock(&ebpf_judy_pid.index.rw_spinlock); + netdata_ebpf_judy_pid_stats_t *pid_ptr = 
ebpf_get_pid_from_judy_unsafe(&ebpf_judy_pid.index.JudyLArray, p->pid); + if (pid_ptr) { + if (pid_ptr->socket_stats.JudyLArray) { + Word_t local_socket = 0; + Pvoid_t *socket_value; + bool first_socket = true; + while ((socket_value = JudyLFirstThenNext(pid_ptr->socket_stats.JudyLArray, &local_socket, &first_socket))) { + netdata_socket_plus_t *socket_clean = *socket_value; + aral_freez(aral_socket_table, socket_clean); + } + JudyLFreeArray(&pid_ptr->socket_stats.JudyLArray, PJE0); + } + JudyLDel(&ebpf_judy_pid.index.JudyLArray, p->pid, PJE0); + } + rw_spinlock_write_unlock(&ebpf_judy_pid.index.rw_spinlock); + freez(p->cmdline); ebpf_pid_stat_release(p); @@ -1279,12 +1249,6 @@ int get_pid_comm(pid_t pid, size_t n, char *dest) */ void cleanup_variables_from_other_threads(uint32_t pid) { - // Clean socket structures - if (socket_bandwidth_curr) { - ebpf_socket_release(socket_bandwidth_curr[pid]); - socket_bandwidth_curr[pid] = NULL; - } - // Clean cachestat structure if (cachestat_pid) { ebpf_cachestat_release(cachestat_pid[pid]); diff --git a/collectors/ebpf.plugin/ebpf_apps.h b/collectors/ebpf.plugin/ebpf_apps.h index fc894a55f..5ae5342dd 100644 --- a/collectors/ebpf.plugin/ebpf_apps.h +++ b/collectors/ebpf.plugin/ebpf_apps.h @@ -150,24 +150,6 @@ typedef struct ebpf_process_stat { uint8_t removeme; } ebpf_process_stat_t; -typedef struct ebpf_bandwidth { - uint32_t pid; - - uint64_t first; // First timestamp - uint64_t ct; // Last timestamp - uint64_t bytes_sent; // Bytes sent - uint64_t bytes_received; // Bytes received - uint64_t call_tcp_sent; // Number of times tcp_sendmsg was called - uint64_t call_tcp_received; // Number of times tcp_cleanup_rbuf was called - uint64_t retransmit; // Number of times tcp_retransmit was called - uint64_t call_udp_sent; // Number of times udp_sendmsg was called - uint64_t call_udp_received; // Number of times udp_recvmsg was called - uint64_t close; // Number of times tcp_close was called - uint64_t drop; // THIS IS NOT USED FOR 
WHILE, we are in groom section - uint32_t tcp_v4_connection; // Number of times tcp_v4_connection was called. - uint32_t tcp_v6_connection; // Number of times tcp_v6_connection was called. -} ebpf_bandwidth_t; - /** * Internal function used to write debug messages. * @@ -208,12 +190,6 @@ int ebpf_read_hash_table(void *ep, int fd, uint32_t pid); int get_pid_comm(pid_t pid, size_t n, char *dest); -size_t read_processes_statistic_using_pid_on_target(ebpf_process_stat_t **ep, - int fd, - struct ebpf_pid_on_target *pids); - -size_t read_bandwidth_statistic_using_pid_on_target(ebpf_bandwidth_t **ep, int fd, struct ebpf_pid_on_target *pids); - void collect_data_for_all_processes(int tbl_pid_stats_fd, int maps_per_core); void ebpf_process_apps_accumulator(ebpf_process_stat_t *out, int maps_per_core); diff --git a/collectors/ebpf.plugin/ebpf_cachestat.c b/collectors/ebpf.plugin/ebpf_cachestat.c index affecdea2..890600696 100644 --- a/collectors/ebpf.plugin/ebpf_cachestat.c +++ b/collectors/ebpf.plugin/ebpf_cachestat.c @@ -1288,10 +1288,10 @@ static void cachestat_collector(ebpf_module_t *em) uint32_t lifetime = em->lifetime; netdata_idx_t *stats = em->hash_table_stats; memset(stats, 0, sizeof(em->hash_table_stats)); - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != update_every) + if (ebpf_plugin_exit || ++counter != update_every) continue; counter = 0; @@ -1479,7 +1479,7 @@ static int ebpf_cachestat_load_bpf(ebpf_module_t *em) #endif if (ret) - netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->thread_name); + netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->info.thread_name); return ret; } diff --git a/collectors/ebpf.plugin/ebpf_cgroup.c b/collectors/ebpf.plugin/ebpf_cgroup.c index fd4e783db..b1e2c0746 100644 --- a/collectors/ebpf.plugin/ebpf_cgroup.c +++ b/collectors/ebpf.plugin/ebpf_cgroup.c @@ -373,7 +373,7 @@ void 
*ebpf_cgroup_integration(void *ptr) heartbeat_t hb; heartbeat_init(&hb); //Plugin will be killed when it receives a signal - while (!ebpf_exit_plugin) { + while (!ebpf_plugin_exit) { (void)heartbeat_next(&hb, step); // We are using a small heartbeat time to wake up thread, diff --git a/collectors/ebpf.plugin/ebpf_cgroup.h b/collectors/ebpf.plugin/ebpf_cgroup.h index 6620ea10a..ba8346934 100644 --- a/collectors/ebpf.plugin/ebpf_cgroup.h +++ b/collectors/ebpf.plugin/ebpf_cgroup.h @@ -21,7 +21,7 @@ struct pid_on_target2 { ebpf_process_stat_t ps; netdata_dcstat_pid_t dc; netdata_publish_shm_t shm; - ebpf_bandwidth_t socket; + netdata_socket_t socket; netdata_cachestat_pid_t cachestat; struct pid_on_target2 *next; diff --git a/collectors/ebpf.plugin/ebpf_dcstat.c b/collectors/ebpf.plugin/ebpf_dcstat.c index feb935b93..8c6f60133 100644 --- a/collectors/ebpf.plugin/ebpf_dcstat.c +++ b/collectors/ebpf.plugin/ebpf_dcstat.c @@ -1169,10 +1169,10 @@ static void dcstat_collector(ebpf_module_t *em) uint32_t lifetime = em->lifetime; netdata_idx_t *stats = em->hash_table_stats; memset(stats, 0, sizeof(em->hash_table_stats)); - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != update_every) + if (ebpf_plugin_exit || ++counter != update_every) continue; counter = 0; @@ -1311,7 +1311,7 @@ static int ebpf_dcstat_load_bpf(ebpf_module_t *em) #endif if (ret) - netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->thread_name); + netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->info.thread_name); return ret; } diff --git a/collectors/ebpf.plugin/ebpf_disk.c b/collectors/ebpf.plugin/ebpf_disk.c index 879456270..9dce8dd18 100644 --- a/collectors/ebpf.plugin/ebpf_disk.c +++ b/collectors/ebpf.plugin/ebpf_disk.c @@ -778,10 +778,10 @@ static void disk_collector(ebpf_module_t *em) int maps_per_core = em->maps_per_core; uint32_t running_time = 0; 
uint32_t lifetime = em->lifetime; - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != update_every) + if (ebpf_plugin_exit || ++counter != update_every) continue; counter = 0; @@ -873,7 +873,7 @@ static int ebpf_disk_load_bpf(ebpf_module_t *em) #endif if (ret) - netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->thread_name); + netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->info.thread_name); return ret; } diff --git a/collectors/ebpf.plugin/ebpf_fd.c b/collectors/ebpf.plugin/ebpf_fd.c index f039647a1..49c19ca77 100644 --- a/collectors/ebpf.plugin/ebpf_fd.c +++ b/collectors/ebpf.plugin/ebpf_fd.c @@ -1136,10 +1136,10 @@ static void fd_collector(ebpf_module_t *em) uint32_t lifetime = em->lifetime; netdata_idx_t *stats = em->hash_table_stats; memset(stats, 0, sizeof(em->hash_table_stats)); - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != update_every) + if (ebpf_plugin_exit || ++counter != update_every) continue; counter = 0; @@ -1337,7 +1337,7 @@ static int ebpf_fd_load_bpf(ebpf_module_t *em) #endif if (ret) - netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->thread_name); + netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->info.thread_name); return ret; } diff --git a/collectors/ebpf.plugin/ebpf_filesystem.c b/collectors/ebpf.plugin/ebpf_filesystem.c index 2bff738ca..6203edd9a 100644 --- a/collectors/ebpf.plugin/ebpf_filesystem.c +++ b/collectors/ebpf.plugin/ebpf_filesystem.c @@ -470,12 +470,12 @@ int ebpf_filesystem_initialize_ebpf_data(ebpf_module_t *em) { pthread_mutex_lock(&lock); int i; - const char *saved_name = em->thread_name; + const char *saved_name = em->info.thread_name; uint64_t kernels = em->kernels; for (i = 0; localfs[i].filesystem; i++) { 
ebpf_filesystem_partitions_t *efp = &localfs[i]; if (!efp->probe_links && efp->flags & NETDATA_FILESYSTEM_LOAD_EBPF_PROGRAM) { - em->thread_name = efp->filesystem; + em->info.thread_name = efp->filesystem; em->kernels = efp->kernels; em->maps = efp->fs_maps; #ifdef LIBBPF_MAJOR_VERSION @@ -484,7 +484,7 @@ int ebpf_filesystem_initialize_ebpf_data(ebpf_module_t *em) if (em->load & EBPF_LOAD_LEGACY) { efp->probe_links = ebpf_load_program(ebpf_plugin_dir, em, running_on_kernel, isrh, &efp->objects); if (!efp->probe_links) { - em->thread_name = saved_name; + em->info.thread_name = saved_name; em->kernels = kernels; em->maps = NULL; pthread_mutex_unlock(&lock); @@ -495,7 +495,7 @@ int ebpf_filesystem_initialize_ebpf_data(ebpf_module_t *em) else { efp->fs_obj = filesystem_bpf__open(); if (!efp->fs_obj) { - em->thread_name = saved_name; + em->info.thread_name = saved_name; em->kernels = kernels; return -1; } else { @@ -515,7 +515,7 @@ int ebpf_filesystem_initialize_ebpf_data(ebpf_module_t *em) } efp->flags &= ~NETDATA_FILESYSTEM_LOAD_EBPF_PROGRAM; } - em->thread_name = saved_name; + em->info.thread_name = saved_name; pthread_mutex_unlock(&lock); em->kernels = kernels; em->maps = NULL; @@ -909,10 +909,10 @@ static void filesystem_collector(ebpf_module_t *em) int counter = update_every - 1; uint32_t running_time = 0; uint32_t lifetime = em->lifetime; - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != update_every) + if (ebpf_plugin_exit || ++counter != update_every) continue; counter = 0; diff --git a/collectors/ebpf.plugin/ebpf_functions.c b/collectors/ebpf.plugin/ebpf_functions.c index 7a43692bc..ebee66395 100644 --- a/collectors/ebpf.plugin/ebpf_functions.c +++ b/collectors/ebpf.plugin/ebpf_functions.c @@ -4,6 +4,40 @@ #include "ebpf_functions.h" /***************************************************************** + * EBPF FUNCTION 
COMMON + *****************************************************************/ + +/** + * Function Start thread + * + * Start a specific thread after user request. + * + * @param em The structure with thread information + * @param period + * @return + */ +static int ebpf_function_start_thread(ebpf_module_t *em, int period) +{ + struct netdata_static_thread *st = em->thread; + // another request for thread that already ran, cleanup and restart + if (st->thread) + freez(st->thread); + + if (period <= 0) + period = EBPF_DEFAULT_LIFETIME; + + st->thread = mallocz(sizeof(netdata_thread_t)); + em->enabled = NETDATA_THREAD_EBPF_FUNCTION_RUNNING; + em->lifetime = period; + +#ifdef NETDATA_INTERNAL_CHECKS + netdata_log_info("Starting thread %s with lifetime = %d", em->info.thread_name, period); +#endif + + return netdata_thread_create(st->thread, st->name, NETDATA_THREAD_OPTION_DEFAULT, st->start_routine, em); +} + +/***************************************************************** * EBPF SELECT MODULE *****************************************************************/ @@ -13,17 +47,17 @@ * @param thread_name name of the thread we are looking for. * * @return it returns a pointer for the module that has thread_name on success or NULL otherwise. - */ ebpf_module_t *ebpf_functions_select_module(const char *thread_name) { int i; for (i = 0; i < EBPF_MODULE_FUNCTION_IDX; i++) { - if (strcmp(ebpf_modules[i].thread_name, thread_name) == 0) { + if (strcmp(ebpf_modules[i].info.thread_name, thread_name) == 0) { return &ebpf_modules[i]; } } return NULL; } + */ /***************************************************************** * EBPF HELP FUNCTIONS @@ -35,11 +69,9 @@ ebpf_module_t *ebpf_functions_select_module(const char *thread_name) { * Shows help with all options accepted by thread function. 
* * @param transaction the transaction id that Netdata sent for this function execution -*/ static void ebpf_function_thread_manipulation_help(const char *transaction) { - pthread_mutex_lock(&lock); - pluginsd_function_result_begin_to_stdout(transaction, HTTP_RESP_OK, "text/plain", now_realtime_sec() + 3600); - fprintf(stdout, "%s", + BUFFER *wb = buffer_create(0, NULL); + buffer_sprintf(wb, "%s", "ebpf.plugin / thread\n" "\n" "Function `thread` allows user to control eBPF threads.\n" @@ -57,13 +89,13 @@ static void ebpf_function_thread_manipulation_help(const char *transaction) { " Disable a sp.\n" "\n" "Filters can be combined. Each filter can be given only one time.\n" - "Process thread is not controlled by functions until we finish the creation of functions per thread..\n" ); - pluginsd_function_result_end_to_stdout(); - fflush(stdout); - pthread_mutex_unlock(&lock); -} + pluginsd_function_result_to_stdout(transaction, HTTP_RESP_OK, "text/plain", now_realtime_sec() + 3600, wb); + + buffer_free(wb); +} +*/ /***************************************************************** * EBPF ERROR FUNCTIONS @@ -79,12 +111,7 @@ static void ebpf_function_thread_manipulation_help(const char *transaction) { * @param msg the error message */ static void ebpf_function_error(const char *transaction, int code, const char *msg) { - char buffer[PLUGINSD_LINE_MAX + 1]; - json_escape_string(buffer, msg, PLUGINSD_LINE_MAX); - - pluginsd_function_result_begin_to_stdout(transaction, code, "application/json", now_realtime_sec()); - fprintf(stdout, "{\"status\":%d,\"error_message\":\"%s\"}", code, buffer); - pluginsd_function_result_end_to_stdout(); + pluginsd_function_json_error_to_stdout(transaction, code, msg); } /***************************************************************** @@ -92,7 +119,7 @@ static void ebpf_function_error(const char *transaction, int code, const char *m *****************************************************************/ /** - * Function enable + * Function: thread * 
* Enable a specific thread. * @@ -102,7 +129,6 @@ static void ebpf_function_error(const char *transaction, int code, const char *m * @param line_max Number of arguments given * @param timeout The function timeout * @param em The structure with thread information - */ static void ebpf_function_thread_manipulation(const char *transaction, char *function __maybe_unused, char *line_buffer __maybe_unused, @@ -141,27 +167,15 @@ static void ebpf_function_thread_manipulation(const char *transaction, pthread_mutex_lock(&ebpf_exit_cleanup); if (lem->enabled > NETDATA_THREAD_EBPF_FUNCTION_RUNNING) { - struct netdata_static_thread *st = lem->thread; // Load configuration again ebpf_update_module(lem, default_btf, running_on_kernel, isrh); - // another request for thread that already ran, cleanup and restart - if (st->thread) - freez(st->thread); - - if (period <= 0) - period = EBPF_DEFAULT_LIFETIME; - - st->thread = mallocz(sizeof(netdata_thread_t)); - lem->enabled = NETDATA_THREAD_EBPF_FUNCTION_RUNNING; - lem->lifetime = period; - -#ifdef NETDATA_INTERNAL_CHECKS - netdata_log_info("Starting thread %s with lifetime = %d", thread_name, period); -#endif - - netdata_thread_create(st->thread, st->name, NETDATA_THREAD_OPTION_DEFAULT, - st->start_routine, lem); + if (ebpf_function_start_thread(lem, period)) { + ebpf_function_error(transaction, + HTTP_RESP_INTERNAL_SERVER_ERROR, + "Cannot start thread."); + return; + } } else { lem->running_time = 0; if (period > 0) // user is modifying period to run @@ -226,10 +240,10 @@ static void ebpf_function_thread_manipulation(const char *transaction, // THE ORDER SHOULD BE THE SAME WITH THE FIELDS! 
// thread name - buffer_json_add_array_item_string(wb, wem->thread_name); + buffer_json_add_array_item_string(wb, wem->info.thread_name); // description - buffer_json_add_array_item_string(wb, wem->thread_description); + buffer_json_add_array_item_string(wb, wem->info.thread_description); // Either it is not running or received a disabled signal and it is stopping. if (wem->enabled > NETDATA_THREAD_EBPF_FUNCTION_RUNNING || (!wem->lifetime && (int)wem->running_time == wem->update_every)) { @@ -267,7 +281,7 @@ static void ebpf_function_thread_manipulation(const char *transaction, RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NONE, 0, NULL, NAN, RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, RRDF_FIELD_FILTER_MULTISELECT, - RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, NULL); + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY | RRDF_FIELD_OPTS_UNIQUE_KEY, NULL); buffer_rrdf_table_add_field(wb, fields_id++, "Description", "Thread Desc", RRDF_FIELD_TYPE_STRING, RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NONE, 0, NULL, NAN, @@ -349,19 +363,697 @@ static void ebpf_function_thread_manipulation(const char *transaction, buffer_json_finalize(wb); // Lock necessary to avoid race condition - pthread_mutex_lock(&lock); + pluginsd_function_result_to_stdout(transaction, HTTP_RESP_OK, "application/json", expires, wb); + + buffer_free(wb); +} + */ + +/***************************************************************** + * EBPF SOCKET FUNCTION + *****************************************************************/ + +/** + * Thread Help + * + * Shows help with all options accepted by thread function. 
+ * + * @param transaction the transaction id that Netdata sent for this function execution +*/ +static void ebpf_function_socket_help(const char *transaction) { + pluginsd_function_result_begin_to_stdout(transaction, HTTP_RESP_OK, "text/plain", now_realtime_sec() + 3600); + fprintf(stdout, "%s", + "ebpf.plugin / socket\n" + "\n" + "Function `socket` display information for all open sockets during ebpf.plugin runtime.\n" + "During thread runtime the plugin is always collecting data, but when an option is modified, the plugin\n" + "resets completely the previous table and can show a clean data for the first request before to bring the\n" + "modified request.\n" + "\n" + "The following filters are supported:\n" + "\n" + " family:FAMILY\n" + " Shows information for the FAMILY specified. Option accepts IPV4, IPV6 and all, that is the default.\n" + "\n" + " period:PERIOD\n" + " Enable socket to run a specific PERIOD in seconds. When PERIOD is not\n" + " specified plugin will use the default 300 seconds\n" + "\n" + " resolve:BOOL\n" + " Resolve service name, default value is YES.\n" + "\n" + " range:CIDR\n" + " Show sockets that have only a specific destination. Default all addresses.\n" + "\n" + " port:range\n" + " Show sockets that have only a specific destination.\n" + "\n" + " reset\n" + " Send a reset to collector. When a collector receives this command, it uses everything defined in configuration file.\n" + "\n" + " interfaces\n" + " When the collector receives this command, it read all available interfaces on host.\n" + "\n" + "Filters can be combined. Each filter can be given only one time. Default all ports\n" + ); + pluginsd_function_result_end_to_stdout(); + fflush(stdout); +} + +/** + * Fill Fake socket + * + * Fill socket with an invalid request. + * + * @param fake_values is the structure where we are storing the value. 
+ */ +static inline void ebpf_socket_fill_fake_socket(netdata_socket_plus_t *fake_values) +{ + snprintfz(fake_values->socket_string.src_ip, INET6_ADDRSTRLEN, "%s", "127.0.0.1"); + snprintfz(fake_values->socket_string.dst_ip, INET6_ADDRSTRLEN, "%s", "127.0.0.1"); + fake_values->pid = getpid(); + //fake_values->socket_string.src_port = 0; + fake_values->socket_string.dst_port[0] = 0; + snprintfz(fake_values->socket_string.dst_ip, NI_MAXSERV, "%s", "none"); + fake_values->data.family = AF_INET; + fake_values->data.protocol = AF_UNSPEC; +} + +/** + * Fill function buffer + * + * Fill buffer with data to be shown on cloud. + * + * @param wb buffer where we store data. + * @param values data read from hash table + * @param name the process name + */ +static void ebpf_fill_function_buffer(BUFFER *wb, netdata_socket_plus_t *values, char *name) +{ + buffer_json_add_array_item_array(wb); + + // IMPORTANT! + // THE ORDER SHOULD BE THE SAME WITH THE FIELDS! + + // PID + buffer_json_add_array_item_uint64(wb, (uint64_t)values->pid); + + // NAME + buffer_json_add_array_item_string(wb, (name) ? name : "not identified"); + + // Origin + buffer_json_add_array_item_string(wb, (values->data.external_origin) ? 
"incoming" : "outgoing"); + + // Source IP + buffer_json_add_array_item_string(wb, values->socket_string.src_ip); + + // SRC Port + //buffer_json_add_array_item_uint64(wb, (uint64_t) values->socket_string.src_port); + + // Destination IP + buffer_json_add_array_item_string(wb, values->socket_string.dst_ip); + + // DST Port + buffer_json_add_array_item_string(wb, values->socket_string.dst_port); + + uint64_t connections; + if (values->data.protocol == IPPROTO_TCP) { + // Protocol + buffer_json_add_array_item_string(wb, "TCP"); + + // Bytes received + buffer_json_add_array_item_uint64(wb, (uint64_t) values->data.tcp.tcp_bytes_received); + + // Bytes sent + buffer_json_add_array_item_uint64(wb, (uint64_t) values->data.tcp.tcp_bytes_sent); + + // Connections + connections = values->data.tcp.ipv4_connect + values->data.tcp.ipv6_connect; + } else if (values->data.protocol == IPPROTO_UDP) { + // Protocol + buffer_json_add_array_item_string(wb, "UDP"); + + // Bytes received + buffer_json_add_array_item_uint64(wb, (uint64_t) values->data.udp.udp_bytes_received); + + // Bytes sent + buffer_json_add_array_item_uint64(wb, (uint64_t) values->data.udp.udp_bytes_sent); + + // Connections + connections = values->data.udp.call_udp_sent + values->data.udp.call_udp_received; + } else { + // Protocol + buffer_json_add_array_item_string(wb, "UNSPEC"); + + // Bytes received + buffer_json_add_array_item_uint64(wb, 0); + + // Bytes sent + buffer_json_add_array_item_uint64(wb, 0); + + connections = 1; + } + + // Connections + if (values->flags & NETDATA_SOCKET_FLAGS_ALREADY_OPEN) { + connections++; + } else if (!connections) { + // If no connections, this means that we lost when connection was opened + values->flags |= NETDATA_SOCKET_FLAGS_ALREADY_OPEN; + connections++; + } + buffer_json_add_array_item_uint64(wb, connections); + + buffer_json_array_close(wb); +} + +/** + * Clean Judy array unsafe + * + * Clean all Judy Array allocated to show table when a function is called. 
+ * Before to call this function it is necessary to lock `ebpf_judy_pid.index.rw_spinlock`. + **/ +static void ebpf_socket_clean_judy_array_unsafe() +{ + if (!ebpf_judy_pid.index.JudyLArray) + return; + + Pvoid_t *pid_value, *socket_value; + Word_t local_pid = 0, local_socket = 0; + bool first_pid = true, first_socket = true; + while ((pid_value = JudyLFirstThenNext(ebpf_judy_pid.index.JudyLArray, &local_pid, &first_pid))) { + netdata_ebpf_judy_pid_stats_t *pid_ptr = (netdata_ebpf_judy_pid_stats_t *)*pid_value; + rw_spinlock_write_lock(&pid_ptr->socket_stats.rw_spinlock); + if (pid_ptr->socket_stats.JudyLArray) { + while ((socket_value = JudyLFirstThenNext(pid_ptr->socket_stats.JudyLArray, &local_socket, &first_socket))) { + netdata_socket_plus_t *socket_clean = *socket_value; + aral_freez(aral_socket_table, socket_clean); + } + JudyLFreeArray(&pid_ptr->socket_stats.JudyLArray, PJE0); + pid_ptr->socket_stats.JudyLArray = NULL; + } + rw_spinlock_write_unlock(&pid_ptr->socket_stats.rw_spinlock); + } +} + +/** + * Fill function buffer unsafe + * + * Fill the function buffer with socket information. Before to call this function it is necessary to lock + * ebpf_judy_pid.index.rw_spinlock + * + * @param buf buffer used to store data to be shown by function. + * + * @return it returns 0 on success and -1 otherwise. 
+ */ +static void ebpf_socket_fill_function_buffer_unsafe(BUFFER *buf) +{ + int counter = 0; + + Pvoid_t *pid_value, *socket_value; + Word_t local_pid = 0; + bool first_pid = true; + while ((pid_value = JudyLFirstThenNext(ebpf_judy_pid.index.JudyLArray, &local_pid, &first_pid))) { + netdata_ebpf_judy_pid_stats_t *pid_ptr = (netdata_ebpf_judy_pid_stats_t *)*pid_value; + bool first_socket = true; + Word_t local_timestamp = 0; + rw_spinlock_read_lock(&pid_ptr->socket_stats.rw_spinlock); + if (pid_ptr->socket_stats.JudyLArray) { + while ((socket_value = JudyLFirstThenNext(pid_ptr->socket_stats.JudyLArray, &local_timestamp, &first_socket))) { + netdata_socket_plus_t *values = (netdata_socket_plus_t *)*socket_value; + ebpf_fill_function_buffer(buf, values, pid_ptr->cmdline); + } + counter++; + } + rw_spinlock_read_unlock(&pid_ptr->socket_stats.rw_spinlock); + } + + if (!counter) { + netdata_socket_plus_t fake_values = { }; + ebpf_socket_fill_fake_socket(&fake_values); + ebpf_fill_function_buffer(buf, &fake_values, NULL); + } +} + +/** + * Socket read hash + * + * This is the thread callback. + * This thread is necessary, because we cannot freeze the whole plugin to read the data on very busy socket. + * + * @param buf the buffer to store data; + * @param em the module main structure. + * + * @return It always returns NULL. 
+ */ +void ebpf_socket_read_open_connections(BUFFER *buf, struct ebpf_module *em) +{ + // thread was not initialized or Array was reset + rw_spinlock_read_lock(&ebpf_judy_pid.index.rw_spinlock); + if (!em->maps || (em->maps[NETDATA_SOCKET_OPEN_SOCKET].map_fd == ND_EBPF_MAP_FD_NOT_INITIALIZED) || + !ebpf_judy_pid.index.JudyLArray){ + netdata_socket_plus_t fake_values = { }; + + ebpf_socket_fill_fake_socket(&fake_values); + + ebpf_fill_function_buffer(buf, &fake_values, NULL); + rw_spinlock_read_unlock(&ebpf_judy_pid.index.rw_spinlock); + return; + } + + rw_spinlock_read_lock(&network_viewer_opt.rw_spinlock); + ebpf_socket_fill_function_buffer_unsafe(buf); + rw_spinlock_read_unlock(&network_viewer_opt.rw_spinlock); + rw_spinlock_read_unlock(&ebpf_judy_pid.index.rw_spinlock); +} + +/** + * Function: Socket + * + * Show information for sockets stored in hash tables. + * + * @param transaction the transaction id that Netdata sent for this function execution + * @param function function name and arguments given to thread. + * @param timeout The function timeout + * @param cancelled Variable used to store function status. 
+ */ +static void ebpf_function_socket_manipulation(const char *transaction, + char *function __maybe_unused, + int timeout __maybe_unused, + bool *cancelled __maybe_unused) +{ + UNUSED(timeout); + ebpf_module_t *em = &ebpf_modules[EBPF_MODULE_SOCKET_IDX]; + + char *words[PLUGINSD_MAX_WORDS] = {NULL}; + size_t num_words = quoted_strings_splitter_pluginsd(function, words, PLUGINSD_MAX_WORDS); + const char *name; + int period = -1; + rw_spinlock_write_lock(&ebpf_judy_pid.index.rw_spinlock); + network_viewer_opt.enabled = CONFIG_BOOLEAN_YES; + uint32_t previous; + + for (int i = 1; i < PLUGINSD_MAX_WORDS; i++) { + const char *keyword = get_word(words, num_words, i); + if (!keyword) + break; + + if (strncmp(keyword, EBPF_FUNCTION_SOCKET_FAMILY, sizeof(EBPF_FUNCTION_SOCKET_FAMILY) - 1) == 0) { + name = &keyword[sizeof(EBPF_FUNCTION_SOCKET_FAMILY) - 1]; + previous = network_viewer_opt.family; + uint32_t family = AF_UNSPEC; + if (!strcmp(name, "IPV4")) + family = AF_INET; + else if (!strcmp(name, "IPV6")) + family = AF_INET6; + + if (family != previous) { + rw_spinlock_write_lock(&network_viewer_opt.rw_spinlock); + network_viewer_opt.family = family; + rw_spinlock_write_unlock(&network_viewer_opt.rw_spinlock); + ebpf_socket_clean_judy_array_unsafe(); + } + } else if (strncmp(keyword, EBPF_FUNCTION_SOCKET_PERIOD, sizeof(EBPF_FUNCTION_SOCKET_PERIOD) - 1) == 0) { + name = &keyword[sizeof(EBPF_FUNCTION_SOCKET_PERIOD) - 1]; + pthread_mutex_lock(&ebpf_exit_cleanup); + period = str2i(name); + if (period > 0) { + em->lifetime = period; + } else + em->lifetime = EBPF_NON_FUNCTION_LIFE_TIME; + +#ifdef NETDATA_DEV_MODE + collector_info("Lifetime modified for %u", em->lifetime); +#endif + pthread_mutex_unlock(&ebpf_exit_cleanup); + } else if (strncmp(keyword, EBPF_FUNCTION_SOCKET_RESOLVE, sizeof(EBPF_FUNCTION_SOCKET_RESOLVE) - 1) == 0) { + previous = network_viewer_opt.service_resolution_enabled; + uint32_t resolution; + name = &keyword[sizeof(EBPF_FUNCTION_SOCKET_RESOLVE) - 1]; + 
resolution = (!strcasecmp(name, "YES")) ? CONFIG_BOOLEAN_YES : CONFIG_BOOLEAN_NO; + + if (previous != resolution) { + rw_spinlock_write_lock(&network_viewer_opt.rw_spinlock); + network_viewer_opt.service_resolution_enabled = resolution; + rw_spinlock_write_unlock(&network_viewer_opt.rw_spinlock); + + ebpf_socket_clean_judy_array_unsafe(); + } + } else if (strncmp(keyword, EBPF_FUNCTION_SOCKET_RANGE, sizeof(EBPF_FUNCTION_SOCKET_RANGE) - 1) == 0) { + name = &keyword[sizeof(EBPF_FUNCTION_SOCKET_RANGE) - 1]; + rw_spinlock_write_lock(&network_viewer_opt.rw_spinlock); + ebpf_clean_ip_structure(&network_viewer_opt.included_ips); + ebpf_clean_ip_structure(&network_viewer_opt.excluded_ips); + ebpf_parse_ips_unsafe((char *)name); + rw_spinlock_write_unlock(&network_viewer_opt.rw_spinlock); + + ebpf_socket_clean_judy_array_unsafe(); + } else if (strncmp(keyword, EBPF_FUNCTION_SOCKET_PORT, sizeof(EBPF_FUNCTION_SOCKET_PORT) - 1) == 0) { + name = &keyword[sizeof(EBPF_FUNCTION_SOCKET_PORT) - 1]; + rw_spinlock_write_lock(&network_viewer_opt.rw_spinlock); + ebpf_clean_port_structure(&network_viewer_opt.included_port); + ebpf_clean_port_structure(&network_viewer_opt.excluded_port); + ebpf_parse_ports((char *)name); + rw_spinlock_write_unlock(&network_viewer_opt.rw_spinlock); + + ebpf_socket_clean_judy_array_unsafe(); + } else if (strncmp(keyword, EBPF_FUNCTION_SOCKET_RESET, sizeof(EBPF_FUNCTION_SOCKET_RESET) - 1) == 0) { + rw_spinlock_write_lock(&network_viewer_opt.rw_spinlock); + ebpf_clean_port_structure(&network_viewer_opt.included_port); + ebpf_clean_port_structure(&network_viewer_opt.excluded_port); + + ebpf_clean_ip_structure(&network_viewer_opt.included_ips); + ebpf_clean_ip_structure(&network_viewer_opt.excluded_ips); + ebpf_clean_ip_structure(&network_viewer_opt.ipv4_local_ip); + ebpf_clean_ip_structure(&network_viewer_opt.ipv6_local_ip); + + parse_network_viewer_section(&socket_config); + ebpf_read_local_addresses_unsafe(); + network_viewer_opt.enabled = 
CONFIG_BOOLEAN_YES; + rw_spinlock_write_unlock(&network_viewer_opt.rw_spinlock); + } else if (strncmp(keyword, EBPF_FUNCTION_SOCKET_INTERFACES, sizeof(EBPF_FUNCTION_SOCKET_INTERFACES) - 1) == 0) { + rw_spinlock_write_lock(&network_viewer_opt.rw_spinlock); + ebpf_read_local_addresses_unsafe(); + rw_spinlock_write_unlock(&network_viewer_opt.rw_spinlock); + } else if (strncmp(keyword, "help", 4) == 0) { + ebpf_function_socket_help(transaction); + rw_spinlock_write_unlock(&ebpf_judy_pid.index.rw_spinlock); + return; + } + } + rw_spinlock_write_unlock(&ebpf_judy_pid.index.rw_spinlock); + + pthread_mutex_lock(&ebpf_exit_cleanup); + if (em->enabled > NETDATA_THREAD_EBPF_FUNCTION_RUNNING) { + // Cleanup when we already had a thread running + rw_spinlock_write_lock(&ebpf_judy_pid.index.rw_spinlock); + ebpf_socket_clean_judy_array_unsafe(); + rw_spinlock_write_unlock(&ebpf_judy_pid.index.rw_spinlock); + + if (ebpf_function_start_thread(em, period)) { + ebpf_function_error(transaction, + HTTP_RESP_INTERNAL_SERVER_ERROR, + "Cannot start thread."); + pthread_mutex_unlock(&ebpf_exit_cleanup); + return; + } + } else { + if (period < 0 && em->lifetime < EBPF_NON_FUNCTION_LIFE_TIME) { + em->lifetime = EBPF_NON_FUNCTION_LIFE_TIME; + } + } + pthread_mutex_unlock(&ebpf_exit_cleanup); + + time_t expires = now_realtime_sec() + em->update_every; + + BUFFER *wb = buffer_create(PLUGINSD_LINE_MAX, NULL); + buffer_json_initialize(wb, "\"", "\"", 0, true, false); + buffer_json_member_add_uint64(wb, "status", HTTP_RESP_OK); + buffer_json_member_add_string(wb, "type", "table"); + buffer_json_member_add_time_t(wb, "update_every", em->update_every); + buffer_json_member_add_string(wb, "help", EBPF_PLUGIN_SOCKET_FUNCTION_DESCRIPTION); + + // Collect data + buffer_json_member_add_array(wb, "data"); + ebpf_socket_read_open_connections(wb, em); + buffer_json_array_close(wb); // data + + buffer_json_member_add_object(wb, "columns"); + { + int fields_id = 0; + + // IMPORTANT! 
+ // THE ORDER SHOULD BE THE SAME WITH THE VALUES! + buffer_rrdf_table_add_field(wb, fields_id++, "PID", "Process ID", RRDF_FIELD_TYPE_INTEGER, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NUMBER, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, + NULL); + + buffer_rrdf_table_add_field(wb, fields_id++, "Process Name", "Process Name", RRDF_FIELD_TYPE_STRING, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NONE, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, NULL); + + buffer_rrdf_table_add_field(wb, fields_id++, "Origin", "The connection origin.", RRDF_FIELD_TYPE_STRING, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NONE, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, NULL); + + buffer_rrdf_table_add_field(wb, fields_id++, "Request from", "Request from IP", RRDF_FIELD_TYPE_STRING, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NONE, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, NULL); + + /* + buffer_rrdf_table_add_field(wb, fields_id++, "SRC PORT", "Source Port", RRDF_FIELD_TYPE_INTEGER, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NUMBER, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, + NULL); + */ + + buffer_rrdf_table_add_field(wb, fields_id++, "Destination IP", "Destination IP", RRDF_FIELD_TYPE_STRING, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NONE, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | 
RRDF_FIELD_OPTS_STICKY, NULL); + + buffer_rrdf_table_add_field(wb, fields_id++, "Destination Port", "Destination Port", RRDF_FIELD_TYPE_STRING, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NONE, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, NULL); + + buffer_rrdf_table_add_field(wb, fields_id++, "Protocol", "Communication protocol", RRDF_FIELD_TYPE_STRING, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NONE, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, NULL); + + buffer_rrdf_table_add_field(wb, fields_id++, "Incoming Bandwidth", "Bytes received.", RRDF_FIELD_TYPE_INTEGER, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NUMBER, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, + NULL); + + buffer_rrdf_table_add_field(wb, fields_id++, "Outgoing Bandwidth", "Bytes sent.", RRDF_FIELD_TYPE_INTEGER, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NUMBER, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, + NULL); + + buffer_rrdf_table_add_field(wb, fields_id, "Connections", "Number of calls to tcp_vX_connections and udp_sendmsg, where X is the protocol version.", RRDF_FIELD_TYPE_INTEGER, + RRDF_FIELD_VISUAL_VALUE, RRDF_FIELD_TRANSFORM_NUMBER, 0, NULL, NAN, + RRDF_FIELD_SORT_ASCENDING, NULL, RRDF_FIELD_SUMMARY_COUNT, + RRDF_FIELD_FILTER_MULTISELECT, + RRDF_FIELD_OPTS_VISIBLE | RRDF_FIELD_OPTS_STICKY, + NULL); + } + buffer_json_object_close(wb); // columns + + buffer_json_member_add_object(wb, "charts"); + { + // OutBound Connections + buffer_json_member_add_object(wb, "IPInboundConn"); + { + buffer_json_member_add_string(wb, 
"name", "TCP Inbound Connection"); + buffer_json_member_add_string(wb, "type", "line"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "connected_tcp"); + buffer_json_add_array_item_string(wb, "connected_udp"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // OutBound Connections + buffer_json_member_add_object(wb, "IPTCPOutboundConn"); + { + buffer_json_member_add_string(wb, "name", "TCP Outbound Connection"); + buffer_json_member_add_string(wb, "type", "line"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "connected_V4"); + buffer_json_add_array_item_string(wb, "connected_V6"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // TCP Functions + buffer_json_member_add_object(wb, "TCPFunctions"); + { + buffer_json_member_add_string(wb, "name", "TCPFunctions"); + buffer_json_member_add_string(wb, "type", "line"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "received"); + buffer_json_add_array_item_string(wb, "sent"); + buffer_json_add_array_item_string(wb, "close"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // TCP Bandwidth + buffer_json_member_add_object(wb, "TCPBandwidth"); + { + buffer_json_member_add_string(wb, "name", "TCPBandwidth"); + buffer_json_member_add_string(wb, "type", "line"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "received"); + buffer_json_add_array_item_string(wb, "sent"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // UDP Functions + buffer_json_member_add_object(wb, "UDPFunctions"); + { + buffer_json_member_add_string(wb, "name", "UDPFunctions"); + buffer_json_member_add_string(wb, "type", "line"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "received"); + buffer_json_add_array_item_string(wb, 
"sent"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // UDP Bandwidth + buffer_json_member_add_object(wb, "UDPBandwidth"); + { + buffer_json_member_add_string(wb, "name", "UDPBandwidth"); + buffer_json_member_add_string(wb, "type", "line"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "received"); + buffer_json_add_array_item_string(wb, "sent"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + } + buffer_json_object_close(wb); // charts + + buffer_json_member_add_string(wb, "default_sort_column", "PID"); + + // Do we use only on fields that can be groupped? + buffer_json_member_add_object(wb, "group_by"); + { + // group by PID + buffer_json_member_add_object(wb, "PID"); + { + buffer_json_member_add_string(wb, "name", "Process ID"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "PID"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // group by Process Name + buffer_json_member_add_object(wb, "Process Name"); + { + buffer_json_member_add_string(wb, "name", "Process Name"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "Process Name"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // group by Process Name + buffer_json_member_add_object(wb, "Origin"); + { + buffer_json_member_add_string(wb, "name", "Origin"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "Origin"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // group by Request From IP + buffer_json_member_add_object(wb, "Request from"); + { + buffer_json_member_add_string(wb, "name", "Request from IP"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "Request from"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // group by 
Destination IP + buffer_json_member_add_object(wb, "Destination IP"); + { + buffer_json_member_add_string(wb, "name", "Destination IP"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "Destination IP"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // group by DST Port + buffer_json_member_add_object(wb, "Destination Port"); + { + buffer_json_member_add_string(wb, "name", "Destination Port"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "Destination Port"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + + // group by Protocol + buffer_json_member_add_object(wb, "Protocol"); + { + buffer_json_member_add_string(wb, "name", "Protocol"); + buffer_json_member_add_array(wb, "columns"); + { + buffer_json_add_array_item_string(wb, "Protocol"); + } + buffer_json_array_close(wb); + } + buffer_json_object_close(wb); + } + buffer_json_object_close(wb); // group_by + + buffer_json_member_add_time_t(wb, "expires", expires); + buffer_json_finalize(wb); + + // Lock necessary to avoid race condition pluginsd_function_result_begin_to_stdout(transaction, HTTP_RESP_OK, "application/json", expires); fwrite(buffer_tostring(wb), buffer_strlen(wb), 1, stdout); pluginsd_function_result_end_to_stdout(); fflush(stdout); - pthread_mutex_unlock(&lock); buffer_free(wb); } - /***************************************************************** * EBPF FUNCTION THREAD *****************************************************************/ @@ -375,45 +1067,27 @@ static void ebpf_function_thread_manipulation(const char *transaction, */ void *ebpf_function_thread(void *ptr) { - ebpf_module_t *em = (ebpf_module_t *)ptr; - char buffer[PLUGINSD_LINE_MAX + 1]; - - char *s = NULL; - while(!ebpf_exit_plugin && (s = fgets(buffer, PLUGINSD_LINE_MAX, stdin))) { - char *words[PLUGINSD_MAX_WORDS] = { NULL }; - size_t num_words = quoted_strings_splitter_pluginsd(buffer, words, 
PLUGINSD_MAX_WORDS); - - const char *keyword = get_word(words, num_words, 0); - - if(keyword && strcmp(keyword, PLUGINSD_KEYWORD_FUNCTION) == 0) { - char *transaction = get_word(words, num_words, 1); - char *timeout_s = get_word(words, num_words, 2); - char *function = get_word(words, num_words, 3); - - if(!transaction || !*transaction || !timeout_s || !*timeout_s || !function || !*function) { - netdata_log_error("Received incomplete %s (transaction = '%s', timeout = '%s', function = '%s'). Ignoring it.", - keyword, - transaction?transaction:"(unset)", - timeout_s?timeout_s:"(unset)", - function?function:"(unset)"); - } - else { - int timeout = str2i(timeout_s); - if (!strncmp(function, EBPF_FUNCTION_THREAD, sizeof(EBPF_FUNCTION_THREAD) - 1)) - ebpf_function_thread_manipulation(transaction, - function, - buffer, - PLUGINSD_LINE_MAX + 1, - timeout, - em); - else - ebpf_function_error(transaction, - HTTP_RESP_NOT_FOUND, - "No function with this name found in ebpf.plugin."); - } + (void)ptr; + + struct functions_evloop_globals *wg = functions_evloop_init(1, + "EBPF", + &lock, + &ebpf_plugin_exit); + + functions_evloop_add_function(wg, + "ebpf_socket", + ebpf_function_socket_manipulation, + PLUGINS_FUNCTIONS_TIMEOUT_DEFAULT); + + heartbeat_t hb; + heartbeat_init(&hb); + while(!ebpf_plugin_exit) { + (void)heartbeat_next(&hb, USEC_PER_SEC); + + if (ebpf_plugin_exit) { + break; } - else - netdata_log_error("Received unknown command: %s", keyword ? 
keyword : "(unset)"); } + return NULL; } diff --git a/collectors/ebpf.plugin/ebpf_functions.h b/collectors/ebpf.plugin/ebpf_functions.h index b20dab634..795703b42 100644 --- a/collectors/ebpf.plugin/ebpf_functions.h +++ b/collectors/ebpf.plugin/ebpf_functions.h @@ -3,20 +3,25 @@ #ifndef NETDATA_EBPF_FUNCTIONS_H #define NETDATA_EBPF_FUNCTIONS_H 1 +#ifdef NETDATA_DEV_MODE +// Common +static inline void EBPF_PLUGIN_FUNCTIONS(const char *NAME, const char *DESC) { + fprintf(stdout, "%s \"%s\" 10 \"%s\"\n", PLUGINSD_KEYWORD_FUNCTION, NAME, DESC); +} +#endif + // configuration file & description #define NETDATA_DIRECTORY_FUNCTIONS_CONFIG_FILE "functions.conf" #define NETDATA_EBPF_FUNCTIONS_MODULE_DESC "Show information about current function status." // function list #define EBPF_FUNCTION_THREAD "ebpf_thread" +#define EBPF_FUNCTION_SOCKET "ebpf_socket" +// thread constants #define EBPF_PLUGIN_THREAD_FUNCTION_DESCRIPTION "Detailed information about eBPF threads." #define EBPF_PLUGIN_THREAD_FUNCTION_ERROR_THREAD_NOT_FOUND "ebpf.plugin does not have thread named " -#define EBPF_PLUGIN_FUNCTIONS(NAME, DESC) do { \ - fprintf(stdout, PLUGINSD_KEYWORD_FUNCTION " \"" NAME "\" 10 \"%s\"\n", DESC); \ -} while(0) - #define EBPF_THREADS_SELECT_THREAD "thread:" #define EBPF_THREADS_ENABLE_CATEGORY "enable:" #define EBPF_THREADS_DISABLE_CATEGORY "disable:" @@ -24,6 +29,16 @@ #define EBPF_THREAD_STATUS_RUNNING "running" #define EBPF_THREAD_STATUS_STOPPED "stopped" +// socket constants +#define EBPF_PLUGIN_SOCKET_FUNCTION_DESCRIPTION "Detailed information about open sockets." 
+#define EBPF_FUNCTION_SOCKET_FAMILY "family:"
+#define EBPF_FUNCTION_SOCKET_PERIOD "period:"
+#define EBPF_FUNCTION_SOCKET_RESOLVE "resolve:"
+#define EBPF_FUNCTION_SOCKET_RANGE "range:"
+#define EBPF_FUNCTION_SOCKET_PORT "port:"
+#define EBPF_FUNCTION_SOCKET_RESET "reset"
+#define EBPF_FUNCTION_SOCKET_INTERFACES "interfaces"
+
 void *ebpf_function_thread(void *ptr);
 #endif
diff --git a/collectors/ebpf.plugin/ebpf_hardirq.c b/collectors/ebpf.plugin/ebpf_hardirq.c
index 9092c7ac3..707d92577 100644
--- a/collectors/ebpf.plugin/ebpf_hardirq.c
+++ b/collectors/ebpf.plugin/ebpf_hardirq.c
@@ -580,10 +580,10 @@ static void hardirq_collector(ebpf_module_t *em)
     //This will be cancelled by its parent
     uint32_t running_time = 0;
     uint32_t lifetime = em->lifetime;
-    while (!ebpf_exit_plugin && running_time < lifetime) {
+    while (!ebpf_plugin_exit && running_time < lifetime) {
         (void)heartbeat_next(&hb, USEC_PER_SEC);
-        if (ebpf_exit_plugin || ++counter != update_every)
+        if (ebpf_plugin_exit || ++counter != update_every)
             continue;
         counter = 0;
diff --git a/collectors/ebpf.plugin/ebpf_mdflush.c b/collectors/ebpf.plugin/ebpf_mdflush.c
index 3548d673b..c0adf2ea4 100644
--- a/collectors/ebpf.plugin/ebpf_mdflush.c
+++ b/collectors/ebpf.plugin/ebpf_mdflush.c
@@ -345,10 +345,10 @@ static void mdflush_collector(ebpf_module_t *em)
     int maps_per_core = em->maps_per_core;
     uint32_t running_time = 0;
     uint32_t lifetime = em->lifetime;
-    while (!ebpf_exit_plugin && running_time < lifetime) {
+    while (!ebpf_plugin_exit && running_time < lifetime) {
         (void)heartbeat_next(&hb, USEC_PER_SEC);
-        if (ebpf_exit_plugin || ++counter != update_every)
+        if (ebpf_plugin_exit || ++counter != update_every)
             continue;
         counter = 0;
diff --git a/collectors/ebpf.plugin/ebpf_mount.c b/collectors/ebpf.plugin/ebpf_mount.c
index 57ea5b2f4..473036bd7 100644
--- a/collectors/ebpf.plugin/ebpf_mount.c
+++ b/collectors/ebpf.plugin/ebpf_mount.c
@@ -367,9 +367,9 @@ static void mount_collector(ebpf_module_t *em)
     int maps_per_core = em->maps_per_core;
     uint32_t running_time = 0;
     uint32_t lifetime = em->lifetime;
-    while (!ebpf_exit_plugin && running_time < lifetime) {
+    while (!ebpf_plugin_exit && running_time < lifetime) {
         (void)heartbeat_next(&hb, USEC_PER_SEC);
-        if (ebpf_exit_plugin || ++counter != update_every)
+        if (ebpf_plugin_exit || ++counter != update_every)
             continue;
         counter = 0;
@@ -466,7 +466,7 @@ static int ebpf_mount_load_bpf(ebpf_module_t *em)
 #endif
     if (ret)
-        netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->thread_name);
+        netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->info.thread_name);
     return ret;
 }
diff --git a/collectors/ebpf.plugin/ebpf_oomkill.c b/collectors/ebpf.plugin/ebpf_oomkill.c
index 84830160a..16ce0bddf 100644
--- a/collectors/ebpf.plugin/ebpf_oomkill.c
+++ b/collectors/ebpf.plugin/ebpf_oomkill.c
@@ -420,9 +420,9 @@ static void oomkill_collector(ebpf_module_t *em)
     uint32_t running_time = 0;
     uint32_t lifetime = em->lifetime;
     netdata_idx_t *stats = em->hash_table_stats;
-    while (!ebpf_exit_plugin && running_time < lifetime) {
+    while (!ebpf_plugin_exit && running_time < lifetime) {
         (void)heartbeat_next(&hb, USEC_PER_SEC);
-        if (ebpf_exit_plugin || ++counter != update_every)
+        if (ebpf_plugin_exit || ++counter != update_every)
             continue;
         counter = 0;
diff --git a/collectors/ebpf.plugin/ebpf_process.c b/collectors/ebpf.plugin/ebpf_process.c
index 3537efc55..577044e59 100644
--- a/collectors/ebpf.plugin/ebpf_process.c
+++ b/collectors/ebpf.plugin/ebpf_process.c
@@ -1118,10 +1118,10 @@ static void process_collector(ebpf_module_t *em)
     uint32_t lifetime = em->lifetime;
     netdata_idx_t *stats = em->hash_table_stats;
     memset(stats, 0, sizeof(em->hash_table_stats));
-    while (!ebpf_exit_plugin && running_time < lifetime) {
+    while (!ebpf_plugin_exit && running_time < lifetime) {
         usec_t dt = heartbeat_next(&hb, USEC_PER_SEC);
         (void)dt;
-        if (ebpf_exit_plugin)
+        if (ebpf_plugin_exit)
             break;
         if (++counter == update_every) {
diff --git a/collectors/ebpf.plugin/ebpf_process.h b/collectors/ebpf.plugin/ebpf_process.h
index d49e38452..310b321d6 100644
--- a/collectors/ebpf.plugin/ebpf_process.h
+++ b/collectors/ebpf.plugin/ebpf_process.h
@@ -52,7 +52,8 @@ enum netdata_ebpf_stats_order {
     NETDATA_EBPF_ORDER_STAT_HASH_GLOBAL_TABLE_TOTAL,
     NETDATA_EBPF_ORDER_STAT_HASH_PID_TABLE_ADDED,
     NETDATA_EBPF_ORDER_STAT_HASH_PID_TABLE_REMOVED,
-    NETATA_EBPF_ORDER_STAT_ARAL_BEGIN
+    NETATA_EBPF_ORDER_STAT_ARAL_BEGIN,
+    NETDATA_EBPF_ORDER_FUNCTION_PER_THREAD,
 };
 enum netdata_ebpf_load_mode_stats{
diff --git a/collectors/ebpf.plugin/ebpf_shm.c b/collectors/ebpf.plugin/ebpf_shm.c
index baeb7204e..c171762b6 100644
--- a/collectors/ebpf.plugin/ebpf_shm.c
+++ b/collectors/ebpf.plugin/ebpf_shm.c
@@ -1035,9 +1035,9 @@ static void shm_collector(ebpf_module_t *em)
     uint32_t lifetime = em->lifetime;
     netdata_idx_t *stats = em->hash_table_stats;
     memset(stats, 0, sizeof(em->hash_table_stats));
-    while (!ebpf_exit_plugin && running_time < lifetime) {
+    while (!ebpf_plugin_exit && running_time < lifetime) {
         (void)heartbeat_next(&hb, USEC_PER_SEC);
-        if (ebpf_exit_plugin || ++counter != update_every)
+        if (ebpf_plugin_exit || ++counter != update_every)
             continue;
         counter = 0;
@@ -1222,7 +1222,7 @@ static int ebpf_shm_load_bpf(ebpf_module_t *em)
     if (ret)
-        netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->thread_name);
+        netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->info.thread_name);
     return ret;
 }
diff --git a/collectors/ebpf.plugin/ebpf_socket.c b/collectors/ebpf.plugin/ebpf_socket.c
index e4798b30c..3e3897551 100644
--- a/collectors/ebpf.plugin/ebpf_socket.c
+++ b/collectors/ebpf.plugin/ebpf_socket.c
@@ -5,9 +5,6 @@
 #include "ebpf.h"
 #include "ebpf_socket.h"
-// ----------------------------------------------------------------------------
-// ARAL vectors used to speed up processing
-
 /*****************************************************************
  *
  * GLOBAL VARIABLES
@@ -23,16 +20,7 @@ static char
*socket_id_names[NETDATA_MAX_SOCKET_VECTOR] = { "tcp_cleanup_rbuf", "tcp_connect_v4", "tcp_connect_v6", "inet_csk_accept_tcp", "inet_csk_accept_udp" }; -static ebpf_local_maps_t socket_maps[] = {{.name = "tbl_bandwidth", - .internal_input = NETDATA_COMPILED_CONNECTIONS_ALLOWED, - .user_input = NETDATA_MAXIMUM_CONNECTIONS_ALLOWED, - .type = NETDATA_EBPF_MAP_RESIZABLE | NETDATA_EBPF_MAP_PID, - .map_fd = ND_EBPF_MAP_FD_NOT_INITIALIZED, -#ifdef LIBBPF_MAJOR_VERSION - .map_type = BPF_MAP_TYPE_PERCPU_HASH -#endif - }, - {.name = "tbl_global_sock", +static ebpf_local_maps_t socket_maps[] = {{.name = "tbl_global_sock", .internal_input = NETDATA_SOCKET_COUNTER, .user_input = 0, .type = NETDATA_EBPF_MAP_STATIC, .map_fd = ND_EBPF_MAP_FD_NOT_INITIALIZED, @@ -48,16 +36,7 @@ static ebpf_local_maps_t socket_maps[] = {{.name = "tbl_bandwidth", .map_type = BPF_MAP_TYPE_PERCPU_HASH #endif }, - {.name = "tbl_conn_ipv4", - .internal_input = NETDATA_COMPILED_CONNECTIONS_ALLOWED, - .user_input = NETDATA_MAXIMUM_CONNECTIONS_ALLOWED, - .type = NETDATA_EBPF_MAP_STATIC, - .map_fd = ND_EBPF_MAP_FD_NOT_INITIALIZED, -#ifdef LIBBPF_MAJOR_VERSION - .map_type = BPF_MAP_TYPE_PERCPU_HASH -#endif - }, - {.name = "tbl_conn_ipv6", + {.name = "tbl_nd_socket", .internal_input = NETDATA_COMPILED_CONNECTIONS_ALLOWED, .user_input = NETDATA_MAXIMUM_CONNECTIONS_ALLOWED, .type = NETDATA_EBPF_MAP_STATIC, @@ -93,11 +72,6 @@ static netdata_idx_t *socket_hash_values = NULL; static netdata_syscall_stat_t socket_aggregated_data[NETDATA_MAX_SOCKET_VECTOR]; static netdata_publish_syscall_t socket_publish_aggregated[NETDATA_MAX_SOCKET_VECTOR]; -static ebpf_bandwidth_t *bandwidth_vector = NULL; - -pthread_mutex_t nv_mutex; -netdata_vector_plot_t inbound_vectors = { .plot = NULL, .next = 0, .last = 0 }; -netdata_vector_plot_t outbound_vectors = { .plot = NULL, .next = 0, .last = 0 }; netdata_socket_t *socket_values; ebpf_network_viewer_port_list_t *listen_ports = NULL; @@ -108,28 +82,30 @@ struct config socket_config = 
{ .first_section = NULL, .index = { .avl_tree = { .root = NULL, .compar = appconfig_section_compare }, .rwlock = AVL_LOCK_INITIALIZER } }; -netdata_ebpf_targets_t socket_targets[] = { {.name = "inet_csk_accept", .mode = EBPF_LOAD_TRAMPOLINE}, - {.name = "tcp_retransmit_skb", .mode = EBPF_LOAD_TRAMPOLINE}, - {.name = "tcp_cleanup_rbuf", .mode = EBPF_LOAD_TRAMPOLINE}, - {.name = "tcp_close", .mode = EBPF_LOAD_TRAMPOLINE}, - {.name = "udp_recvmsg", .mode = EBPF_LOAD_TRAMPOLINE}, - {.name = "tcp_sendmsg", .mode = EBPF_LOAD_TRAMPOLINE}, - {.name = "udp_sendmsg", .mode = EBPF_LOAD_TRAMPOLINE}, - {.name = "tcp_v4_connect", .mode = EBPF_LOAD_TRAMPOLINE}, - {.name = "tcp_v6_connect", .mode = EBPF_LOAD_TRAMPOLINE}, +netdata_ebpf_targets_t socket_targets[] = { {.name = "inet_csk_accept", .mode = EBPF_LOAD_PROBE}, + {.name = "tcp_retransmit_skb", .mode = EBPF_LOAD_PROBE}, + {.name = "tcp_cleanup_rbuf", .mode = EBPF_LOAD_PROBE}, + {.name = "tcp_close", .mode = EBPF_LOAD_PROBE}, + {.name = "udp_recvmsg", .mode = EBPF_LOAD_PROBE}, + {.name = "tcp_sendmsg", .mode = EBPF_LOAD_PROBE}, + {.name = "udp_sendmsg", .mode = EBPF_LOAD_PROBE}, + {.name = "tcp_v4_connect", .mode = EBPF_LOAD_PROBE}, + {.name = "tcp_v6_connect", .mode = EBPF_LOAD_PROBE}, {.name = NULL, .mode = EBPF_LOAD_TRAMPOLINE}}; -struct netdata_static_thread socket_threads = { - .name = "EBPF SOCKET READ", - .config_section = NULL, - .config_name = NULL, - .env_name = NULL, - .enabled = 1, - .thread = NULL, - .init_routine = NULL, - .start_routine = NULL +struct netdata_static_thread ebpf_read_socket = { + .name = "EBPF_READ_SOCKET", + .config_section = NULL, + .config_name = NULL, + .env_name = NULL, + .enabled = 1, + .thread = NULL, + .init_routine = NULL, + .start_routine = NULL }; +ARAL *aral_socket_table = NULL; + #ifdef NETDATA_DEV_MODE int socket_disable_priority; #endif @@ -145,7 +121,9 @@ int socket_disable_priority; static void ebpf_socket_disable_probes(struct socket_bpf *obj) { 
bpf_program__set_autoload(obj->progs.netdata_inet_csk_accept_kretprobe, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v4_connect_kprobe, false); bpf_program__set_autoload(obj->progs.netdata_tcp_v4_connect_kretprobe, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v6_connect_kprobe, false); bpf_program__set_autoload(obj->progs.netdata_tcp_v6_connect_kretprobe, false); bpf_program__set_autoload(obj->progs.netdata_tcp_retransmit_skb_kprobe, false); bpf_program__set_autoload(obj->progs.netdata_tcp_cleanup_rbuf_kprobe, false); @@ -156,7 +134,6 @@ static void ebpf_socket_disable_probes(struct socket_bpf *obj) bpf_program__set_autoload(obj->progs.netdata_tcp_sendmsg_kprobe, false); bpf_program__set_autoload(obj->progs.netdata_udp_sendmsg_kretprobe, false); bpf_program__set_autoload(obj->progs.netdata_udp_sendmsg_kprobe, false); - bpf_program__set_autoload(obj->progs.netdata_socket_release_task_kprobe, false); } /** @@ -168,8 +145,10 @@ static void ebpf_socket_disable_probes(struct socket_bpf *obj) */ static void ebpf_socket_disable_trampoline(struct socket_bpf *obj) { - bpf_program__set_autoload(obj->progs.netdata_inet_csk_accept_fentry, false); + bpf_program__set_autoload(obj->progs.netdata_inet_csk_accept_fexit, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v4_connect_fentry, false); bpf_program__set_autoload(obj->progs.netdata_tcp_v4_connect_fexit, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v6_connect_fentry, false); bpf_program__set_autoload(obj->progs.netdata_tcp_v6_connect_fexit, false); bpf_program__set_autoload(obj->progs.netdata_tcp_retransmit_skb_fentry, false); bpf_program__set_autoload(obj->progs.netdata_tcp_cleanup_rbuf_fentry, false); @@ -180,7 +159,6 @@ static void ebpf_socket_disable_trampoline(struct socket_bpf *obj) bpf_program__set_autoload(obj->progs.netdata_tcp_sendmsg_fexit, false); bpf_program__set_autoload(obj->progs.netdata_udp_sendmsg_fentry, false); 
bpf_program__set_autoload(obj->progs.netdata_udp_sendmsg_fexit, false); - bpf_program__set_autoload(obj->progs.netdata_socket_release_task_fentry, false); } /** @@ -190,12 +168,18 @@ static void ebpf_socket_disable_trampoline(struct socket_bpf *obj) */ static void ebpf_set_trampoline_target(struct socket_bpf *obj) { - bpf_program__set_attach_target(obj->progs.netdata_inet_csk_accept_fentry, 0, + bpf_program__set_attach_target(obj->progs.netdata_inet_csk_accept_fexit, 0, socket_targets[NETDATA_FCNT_INET_CSK_ACCEPT].name); + bpf_program__set_attach_target(obj->progs.netdata_tcp_v4_connect_fentry, 0, + socket_targets[NETDATA_FCNT_TCP_V4_CONNECT].name); + bpf_program__set_attach_target(obj->progs.netdata_tcp_v4_connect_fexit, 0, socket_targets[NETDATA_FCNT_TCP_V4_CONNECT].name); + bpf_program__set_attach_target(obj->progs.netdata_tcp_v6_connect_fentry, 0, + socket_targets[NETDATA_FCNT_TCP_V6_CONNECT].name); + bpf_program__set_attach_target(obj->progs.netdata_tcp_v6_connect_fexit, 0, socket_targets[NETDATA_FCNT_TCP_V6_CONNECT].name); @@ -205,7 +189,8 @@ static void ebpf_set_trampoline_target(struct socket_bpf *obj) bpf_program__set_attach_target(obj->progs.netdata_tcp_cleanup_rbuf_fentry, 0, socket_targets[NETDATA_FCNT_CLEANUP_RBUF].name); - bpf_program__set_attach_target(obj->progs.netdata_tcp_close_fentry, 0, socket_targets[NETDATA_FCNT_TCP_CLOSE].name); + bpf_program__set_attach_target(obj->progs.netdata_tcp_close_fentry, 0, + socket_targets[NETDATA_FCNT_TCP_CLOSE].name); bpf_program__set_attach_target(obj->progs.netdata_udp_recvmsg_fentry, 0, socket_targets[NETDATA_FCNT_UDP_RECEVMSG].name); @@ -224,8 +209,6 @@ static void ebpf_set_trampoline_target(struct socket_bpf *obj) bpf_program__set_attach_target(obj->progs.netdata_udp_sendmsg_fexit, 0, socket_targets[NETDATA_FCNT_UDP_SENDMSG].name); - - bpf_program__set_attach_target(obj->progs.netdata_socket_release_task_fentry, 0, EBPF_COMMON_FNCT_CLEAN_UP); } @@ -241,9 +224,13 @@ static inline void 
ebpf_socket_disable_specific_trampoline(struct socket_bpf *ob { if (sel == MODE_RETURN) { bpf_program__set_autoload(obj->progs.netdata_tcp_sendmsg_fentry, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v4_connect_fentry, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v6_connect_fentry, false); bpf_program__set_autoload(obj->progs.netdata_udp_sendmsg_fentry, false); } else { bpf_program__set_autoload(obj->progs.netdata_tcp_sendmsg_fexit, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v4_connect_fexit, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v6_connect_fexit, false); bpf_program__set_autoload(obj->progs.netdata_udp_sendmsg_fexit, false); } } @@ -260,9 +247,13 @@ static inline void ebpf_socket_disable_specific_probe(struct socket_bpf *obj, ne { if (sel == MODE_RETURN) { bpf_program__set_autoload(obj->progs.netdata_tcp_sendmsg_kprobe, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v4_connect_kprobe, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v6_connect_kprobe, false); bpf_program__set_autoload(obj->progs.netdata_udp_sendmsg_kprobe, false); } else { bpf_program__set_autoload(obj->progs.netdata_tcp_sendmsg_kretprobe, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v4_connect_kretprobe, false); + bpf_program__set_autoload(obj->progs.netdata_tcp_v6_connect_kretprobe, false); bpf_program__set_autoload(obj->progs.netdata_udp_sendmsg_kretprobe, false); } } @@ -275,26 +266,12 @@ static inline void ebpf_socket_disable_specific_probe(struct socket_bpf *obj, ne * @param obj is the main structure for bpf objects. * @param sel option selected by user. 
*/ -static int ebpf_socket_attach_probes(struct socket_bpf *obj, netdata_run_mode_t sel) +static long ebpf_socket_attach_probes(struct socket_bpf *obj, netdata_run_mode_t sel) { obj->links.netdata_inet_csk_accept_kretprobe = bpf_program__attach_kprobe(obj->progs.netdata_inet_csk_accept_kretprobe, true, socket_targets[NETDATA_FCNT_INET_CSK_ACCEPT].name); - int ret = libbpf_get_error(obj->links.netdata_inet_csk_accept_kretprobe); - if (ret) - return -1; - - obj->links.netdata_tcp_v4_connect_kretprobe = bpf_program__attach_kprobe(obj->progs.netdata_tcp_v4_connect_kretprobe, - true, - socket_targets[NETDATA_FCNT_TCP_V4_CONNECT].name); - ret = libbpf_get_error(obj->links.netdata_tcp_v4_connect_kretprobe); - if (ret) - return -1; - - obj->links.netdata_tcp_v6_connect_kretprobe = bpf_program__attach_kprobe(obj->progs.netdata_tcp_v6_connect_kretprobe, - true, - socket_targets[NETDATA_FCNT_TCP_V6_CONNECT].name); - ret = libbpf_get_error(obj->links.netdata_tcp_v6_connect_kretprobe); + long ret = libbpf_get_error(obj->links.netdata_inet_csk_accept_kretprobe); if (ret) return -1; @@ -347,6 +324,20 @@ static int ebpf_socket_attach_probes(struct socket_bpf *obj, netdata_run_mode_t ret = libbpf_get_error(obj->links.netdata_udp_sendmsg_kretprobe); if (ret) return -1; + + obj->links.netdata_tcp_v4_connect_kretprobe = bpf_program__attach_kprobe(obj->progs.netdata_tcp_v4_connect_kretprobe, + true, + socket_targets[NETDATA_FCNT_TCP_V4_CONNECT].name); + ret = libbpf_get_error(obj->links.netdata_tcp_v4_connect_kretprobe); + if (ret) + return -1; + + obj->links.netdata_tcp_v6_connect_kretprobe = bpf_program__attach_kprobe(obj->progs.netdata_tcp_v6_connect_kretprobe, + true, + socket_targets[NETDATA_FCNT_TCP_V6_CONNECT].name); + ret = libbpf_get_error(obj->links.netdata_tcp_v6_connect_kretprobe); + if (ret) + return -1; } else { obj->links.netdata_tcp_sendmsg_kprobe = bpf_program__attach_kprobe(obj->progs.netdata_tcp_sendmsg_kprobe, false, @@ -361,13 +352,21 @@ static int 
ebpf_socket_attach_probes(struct socket_bpf *obj, netdata_run_mode_t ret = libbpf_get_error(obj->links.netdata_udp_sendmsg_kprobe); if (ret) return -1; - } - obj->links.netdata_socket_release_task_kprobe = bpf_program__attach_kprobe(obj->progs.netdata_socket_release_task_kprobe, - false, EBPF_COMMON_FNCT_CLEAN_UP); - ret = libbpf_get_error(obj->links.netdata_socket_release_task_kprobe); - if (ret) - return -1; + obj->links.netdata_tcp_v4_connect_kprobe = bpf_program__attach_kprobe(obj->progs.netdata_tcp_v4_connect_kprobe, + false, + socket_targets[NETDATA_FCNT_TCP_V4_CONNECT].name); + ret = libbpf_get_error(obj->links.netdata_tcp_v4_connect_kprobe); + if (ret) + return -1; + + obj->links.netdata_tcp_v6_connect_kprobe = bpf_program__attach_kprobe(obj->progs.netdata_tcp_v6_connect_kprobe, + false, + socket_targets[NETDATA_FCNT_TCP_V6_CONNECT].name); + ret = libbpf_get_error(obj->links.netdata_tcp_v6_connect_kprobe); + if (ret) + return -1; + } return 0; } @@ -381,11 +380,9 @@ static int ebpf_socket_attach_probes(struct socket_bpf *obj, netdata_run_mode_t */ static void ebpf_socket_set_hash_tables(struct socket_bpf *obj) { - socket_maps[NETDATA_SOCKET_TABLE_BANDWIDTH].map_fd = bpf_map__fd(obj->maps.tbl_bandwidth); socket_maps[NETDATA_SOCKET_GLOBAL].map_fd = bpf_map__fd(obj->maps.tbl_global_sock); socket_maps[NETDATA_SOCKET_LPORTS].map_fd = bpf_map__fd(obj->maps.tbl_lports); - socket_maps[NETDATA_SOCKET_TABLE_IPV4].map_fd = bpf_map__fd(obj->maps.tbl_conn_ipv4); - socket_maps[NETDATA_SOCKET_TABLE_IPV6].map_fd = bpf_map__fd(obj->maps.tbl_conn_ipv6); + socket_maps[NETDATA_SOCKET_OPEN_SOCKET].map_fd = bpf_map__fd(obj->maps.tbl_nd_socket); socket_maps[NETDATA_SOCKET_TABLE_UDP].map_fd = bpf_map__fd(obj->maps.tbl_nv_udp); socket_maps[NETDATA_SOCKET_TABLE_CTRL].map_fd = bpf_map__fd(obj->maps.socket_ctrl); } @@ -400,22 +397,13 @@ static void ebpf_socket_set_hash_tables(struct socket_bpf *obj) */ static void ebpf_socket_adjust_map(struct socket_bpf *obj, ebpf_module_t *em) { - 
ebpf_update_map_size(obj->maps.tbl_bandwidth, &socket_maps[NETDATA_SOCKET_TABLE_BANDWIDTH], - em, bpf_map__name(obj->maps.tbl_bandwidth)); - - ebpf_update_map_size(obj->maps.tbl_conn_ipv4, &socket_maps[NETDATA_SOCKET_TABLE_IPV4], - em, bpf_map__name(obj->maps.tbl_conn_ipv4)); - - ebpf_update_map_size(obj->maps.tbl_conn_ipv6, &socket_maps[NETDATA_SOCKET_TABLE_IPV6], - em, bpf_map__name(obj->maps.tbl_conn_ipv6)); + ebpf_update_map_size(obj->maps.tbl_nd_socket, &socket_maps[NETDATA_SOCKET_OPEN_SOCKET], + em, bpf_map__name(obj->maps.tbl_nd_socket)); ebpf_update_map_size(obj->maps.tbl_nv_udp, &socket_maps[NETDATA_SOCKET_TABLE_UDP], em, bpf_map__name(obj->maps.tbl_nv_udp)); - - ebpf_update_map_type(obj->maps.tbl_bandwidth, &socket_maps[NETDATA_SOCKET_TABLE_BANDWIDTH]); - ebpf_update_map_type(obj->maps.tbl_conn_ipv4, &socket_maps[NETDATA_SOCKET_TABLE_IPV4]); - ebpf_update_map_type(obj->maps.tbl_conn_ipv6, &socket_maps[NETDATA_SOCKET_TABLE_IPV6]); + ebpf_update_map_type(obj->maps.tbl_nd_socket, &socket_maps[NETDATA_SOCKET_OPEN_SOCKET]); ebpf_update_map_type(obj->maps.tbl_nv_udp, &socket_maps[NETDATA_SOCKET_TABLE_UDP]); ebpf_update_map_type(obj->maps.socket_ctrl, &socket_maps[NETDATA_SOCKET_TABLE_CTRL]); ebpf_update_map_type(obj->maps.tbl_global_sock, &socket_maps[NETDATA_SOCKET_GLOBAL]); @@ -459,7 +447,7 @@ static inline int ebpf_socket_load_and_attach(struct socket_bpf *obj, ebpf_modul if (test == EBPF_LOAD_TRAMPOLINE) { ret = socket_bpf__attach(obj); } else { - ret = ebpf_socket_attach_probes(obj, em->mode); + ret = (int)ebpf_socket_attach_probes(obj, em->mode); } if (!ret) { @@ -479,211 +467,392 @@ static inline int ebpf_socket_load_and_attach(struct socket_bpf *obj, ebpf_modul *****************************************************************/ /** - * Clean internal socket plot + * Socket Free * - * Clean all structures allocated with strdupz. + * Cleanup variables after child threads to stop * - * @param ptr the pointer with addresses to clean. 
+ * @param ptr thread data. */ -static inline void clean_internal_socket_plot(netdata_socket_plot_t *ptr) +static void ebpf_socket_free(ebpf_module_t *em ) { - freez(ptr->dimension_recv); - freez(ptr->dimension_sent); - freez(ptr->resolved_name); - freez(ptr->dimension_retransmit); + pthread_mutex_lock(&ebpf_exit_cleanup); + em->enabled = NETDATA_THREAD_EBPF_STOPPED; + ebpf_update_stats(&plugin_statistics, em); + ebpf_update_kernel_memory_with_vector(&plugin_statistics, em->maps, EBPF_ACTION_STAT_REMOVE); + pthread_mutex_unlock(&ebpf_exit_cleanup); } /** - * Clean socket plot + * Obsolete Systemd Socket Charts * - * Clean the allocated data for inbound and outbound vectors. -static void clean_allocated_socket_plot() -{ - if (!network_viewer_opt.enabled) - return; - - uint32_t i; - uint32_t end = inbound_vectors.last; - netdata_socket_plot_t *plot = inbound_vectors.plot; - for (i = 0; i < end; i++) { - clean_internal_socket_plot(&plot[i]); - } - - clean_internal_socket_plot(&plot[inbound_vectors.last]); - - end = outbound_vectors.last; - plot = outbound_vectors.plot; - for (i = 0; i < end; i++) { - clean_internal_socket_plot(&plot[i]); - } - clean_internal_socket_plot(&plot[outbound_vectors.last]); -} - */ - -/** - * Clean network ports allocated during initialization. + * Obsolete charts when systemd is enabled * - * @param ptr a pointer to the link list. -static void clean_network_ports(ebpf_network_viewer_port_list_t *ptr) + * @param update_every value to overwrite the update frequency set by the server. 
+ **/ +static void ebpf_obsolete_systemd_socket_charts(int update_every) { - if (unlikely(!ptr)) - return; - - while (ptr) { - ebpf_network_viewer_port_list_t *next = ptr->next; - freez(ptr->value); - freez(ptr); - ptr = next; - } + int order = 20080; + ebpf_write_chart_obsolete(NETDATA_SERVICE_FAMILY, + NETDATA_NET_APPS_CONNECTION_TCP_V4, + "Calls to tcp_v4_connection", + EBPF_COMMON_DIMENSION_CONNECTIONS, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NETDATA_SERVICES_SOCKET_TCP_V4_CONN_CONTEXT, + order++, + update_every); + + ebpf_write_chart_obsolete(NETDATA_SERVICE_FAMILY, + NETDATA_NET_APPS_CONNECTION_TCP_V6, + "Calls to tcp_v6_connection", + EBPF_COMMON_DIMENSION_CONNECTIONS, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NETDATA_SERVICES_SOCKET_TCP_V6_CONN_CONTEXT, + order++, + update_every); + + ebpf_write_chart_obsolete(NETDATA_SERVICE_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_RECV, + "Bytes received", + EBPF_COMMON_DIMENSION_BITS, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NETDATA_SERVICES_SOCKET_BYTES_RECV_CONTEXT, + order++, + update_every); + + ebpf_write_chart_obsolete(NETDATA_SERVICE_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_SENT, + "Bytes sent", + EBPF_COMMON_DIMENSION_BITS, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NETDATA_SERVICES_SOCKET_BYTES_SEND_CONTEXT, + order++, + update_every); + + ebpf_write_chart_obsolete(NETDATA_SERVICE_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_TCP_RECV_CALLS, + "Calls to tcp_cleanup_rbuf.", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NETDATA_SERVICES_SOCKET_TCP_RECV_CONTEXT, + order++, + update_every); + + ebpf_write_chart_obsolete(NETDATA_SERVICE_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_TCP_SEND_CALLS, + "Calls to tcp_sendmsg.", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NETDATA_SERVICES_SOCKET_TCP_SEND_CONTEXT, + order++, + update_every); + + 
ebpf_write_chart_obsolete(NETDATA_SERVICE_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_TCP_RETRANSMIT, + "Calls to tcp_retransmit", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NETDATA_SERVICES_SOCKET_TCP_RETRANSMIT_CONTEXT, + order++, + update_every); + + ebpf_write_chart_obsolete(NETDATA_SERVICE_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_UDP_SEND_CALLS, + "Calls to udp_sendmsg", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NETDATA_SERVICES_SOCKET_UDP_SEND_CONTEXT, + order++, + update_every); + + ebpf_write_chart_obsolete(NETDATA_SERVICE_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_UDP_RECV_CALLS, + "Calls to udp_recvmsg", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NETDATA_SERVICES_SOCKET_UDP_RECV_CONTEXT, + order++, + update_every); } - */ +static void ebpf_obsolete_specific_socket_charts(char *type, int update_every); /** - * Clean service names + * Obsolete cgroup chart * - * Clean the allocated link list that stores names. + * Send obsolete for all charts created before to close. * - * @param names the link list. 
-static void clean_service_names(ebpf_network_viewer_dim_name_t *names) -{ - if (unlikely(!names)) - return; - - while (names) { - ebpf_network_viewer_dim_name_t *next = names->next; - freez(names->name); - freez(names); - names = next; - } -} + * @param em a pointer to `struct ebpf_module` */ +static inline void ebpf_obsolete_socket_cgroup_charts(ebpf_module_t *em) { + pthread_mutex_lock(&mutex_cgroup_shm); -/** - * Clean hostnames - * - * @param hostnames the hostnames to clean -static void clean_hostnames(ebpf_network_viewer_hostname_list_t *hostnames) -{ - if (unlikely(!hostnames)) - return; + ebpf_obsolete_systemd_socket_charts(em->update_every); + + ebpf_cgroup_target_t *ect; + for (ect = ebpf_cgroup_pids; ect ; ect = ect->next) { + if (ect->systemd) + continue; - while (hostnames) { - ebpf_network_viewer_hostname_list_t *next = hostnames->next; - freez(hostnames->value); - simple_pattern_free(hostnames->value_pattern); - freez(hostnames); - hostnames = next; + ebpf_obsolete_specific_socket_charts(ect->name, em->update_every); } + pthread_mutex_unlock(&mutex_cgroup_shm); } - */ /** - * Clean port Structure + * Create apps charts * - * Clean the allocated list. + * Call ebpf_create_chart to create the charts on apps submenu. * - * @param clean the list that will be cleaned + * @param em a pointer to the structure with the default values. */ -void clean_port_structure(ebpf_network_viewer_port_list_t **clean) +void ebpf_socket_obsolete_apps_charts(struct ebpf_module *em) { - ebpf_network_viewer_port_list_t *move = *clean; - while (move) { - ebpf_network_viewer_port_list_t *next = move->next; - freez(move->value); - freez(move); - - move = next; - } - *clean = NULL; -} - -/** - * Clean IP structure - * - * Clean the allocated list. 
+ int order = 20080; + ebpf_write_chart_obsolete(NETDATA_APPS_FAMILY, + NETDATA_NET_APPS_CONNECTION_TCP_V4, + "Calls to tcp_v4_connection", + EBPF_COMMON_DIMENSION_CONNECTIONS, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_APPS_FAMILY, + NETDATA_NET_APPS_CONNECTION_TCP_V6, + "Calls to tcp_v6_connection", + EBPF_COMMON_DIMENSION_CONNECTIONS, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_APPS_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_SENT, + "Bytes sent", + EBPF_COMMON_DIMENSION_BITS, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_APPS_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_RECV, + "bytes received", + EBPF_COMMON_DIMENSION_BITS, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_APPS_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_TCP_SEND_CALLS, + "Calls for tcp_sendmsg", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_APPS_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_TCP_RECV_CALLS, + "Calls for tcp_cleanup_rbuf", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_APPS_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_TCP_RETRANSMIT, + "Calls for tcp_retransmit", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_APPS_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_UDP_SEND_CALLS, + "Calls for udp_sendmsg", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NULL, 
+ order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_APPS_FAMILY, + NETDATA_NET_APPS_BANDWIDTH_UDP_RECV_CALLS, + "Calls for udp_recvmsg", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_APPS_NET_GROUP, + NETDATA_EBPF_CHART_TYPE_STACKED, + NULL, + order++, + em->update_every); +} + +/** + * Obsolete global charts + * + * Obsolete charts created. * - * @param clean the list that will be cleaned + * @param em a pointer to the structure with the default values. */ -static void clean_ip_structure(ebpf_network_viewer_ip_list_t **clean) +static void ebpf_socket_obsolete_global_charts(ebpf_module_t *em) { - ebpf_network_viewer_ip_list_t *move = *clean; - while (move) { - ebpf_network_viewer_ip_list_t *next = move->next; - freez(move->value); - freez(move); + int order = 21070; + ebpf_write_chart_obsolete(NETDATA_EBPF_IP_FAMILY, + NETDATA_INBOUND_CONNECTIONS, + "Inbound connections.", + EBPF_COMMON_DIMENSION_CONNECTIONS, + NETDATA_SOCKET_KERNEL_FUNCTIONS, + NETDATA_EBPF_CHART_TYPE_LINE, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_EBPF_IP_FAMILY, + NETDATA_TCP_OUTBOUND_CONNECTIONS, + "TCP outbound connections.", + EBPF_COMMON_DIMENSION_CONNECTIONS, + NETDATA_SOCKET_KERNEL_FUNCTIONS, + NETDATA_EBPF_CHART_TYPE_LINE, + NULL, + order++, + em->update_every); + + + ebpf_write_chart_obsolete(NETDATA_EBPF_IP_FAMILY, + NETDATA_TCP_FUNCTION_COUNT, + "Calls to internal functions", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_SOCKET_KERNEL_FUNCTIONS, + NETDATA_EBPF_CHART_TYPE_LINE, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_EBPF_IP_FAMILY, + NETDATA_TCP_FUNCTION_BITS, + "TCP bandwidth", + EBPF_COMMON_DIMENSION_BITS, + NETDATA_SOCKET_KERNEL_FUNCTIONS, + NETDATA_EBPF_CHART_TYPE_LINE, + NULL, + order++, + em->update_every); - move = next; + if (em->mode < MODE_ENTRY) { + ebpf_write_chart_obsolete(NETDATA_EBPF_IP_FAMILY, + NETDATA_TCP_FUNCTION_ERROR, + "TCP errors", + EBPF_COMMON_DIMENSION_CALL, + 
NETDATA_SOCKET_KERNEL_FUNCTIONS, + NETDATA_EBPF_CHART_TYPE_LINE, + NULL, + order++, + em->update_every); + } + + ebpf_write_chart_obsolete(NETDATA_EBPF_IP_FAMILY, + NETDATA_TCP_RETRANSMIT, + "Packages retransmitted", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_SOCKET_KERNEL_FUNCTIONS, + NETDATA_EBPF_CHART_TYPE_LINE, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_EBPF_IP_FAMILY, + NETDATA_UDP_FUNCTION_COUNT, + "UDP calls", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_SOCKET_KERNEL_FUNCTIONS, + NETDATA_EBPF_CHART_TYPE_LINE, + NULL, + order++, + em->update_every); + + ebpf_write_chart_obsolete(NETDATA_EBPF_IP_FAMILY, + NETDATA_UDP_FUNCTION_BITS, + "UDP bandwidth", + EBPF_COMMON_DIMENSION_BITS, + NETDATA_SOCKET_KERNEL_FUNCTIONS, + NETDATA_EBPF_CHART_TYPE_LINE, + NULL, + order++, + em->update_every); + + if (em->mode < MODE_ENTRY) { + ebpf_write_chart_obsolete(NETDATA_EBPF_IP_FAMILY, + NETDATA_UDP_FUNCTION_ERROR, + "UDP errors", + EBPF_COMMON_DIMENSION_CALL, + NETDATA_SOCKET_KERNEL_FUNCTIONS, + NETDATA_EBPF_CHART_TYPE_LINE, + NULL, + order++, + em->update_every); } - *clean = NULL; -} + fflush(stdout); +} /** - * Socket Free + * Socket exit * - * Cleanup variables after child threads to stop + * Clean up the main thread. * * @param ptr thread data. 
*/ -static void ebpf_socket_free(ebpf_module_t *em ) +static void ebpf_socket_exit(void *ptr) { - /* We can have thousands of sockets to clean, so we are transferring - * for OS the responsibility while we do not use ARAL here - freez(socket_hash_values); + ebpf_module_t *em = (ebpf_module_t *)ptr; - freez(bandwidth_vector); + if (ebpf_read_socket.thread) + netdata_thread_cancel(*ebpf_read_socket.thread); - freez(socket_values); - clean_allocated_socket_plot(); - freez(inbound_vectors.plot); - freez(outbound_vectors.plot); + if (em->enabled == NETDATA_THREAD_EBPF_FUNCTION_RUNNING) { + pthread_mutex_lock(&lock); - clean_port_structure(&listen_ports); + if (em->cgroup_charts) { + ebpf_obsolete_socket_cgroup_charts(em); + fflush(stdout); + } - clean_network_ports(network_viewer_opt.included_port); - clean_network_ports(network_viewer_opt.excluded_port); - clean_service_names(network_viewer_opt.names); - clean_hostnames(network_viewer_opt.included_hostnames); - clean_hostnames(network_viewer_opt.excluded_hostnames); - */ + if (em->apps_charts & NETDATA_EBPF_APPS_FLAG_CHART_CREATED) { + ebpf_socket_obsolete_apps_charts(em); + fflush(stdout); + } - pthread_mutex_destroy(&nv_mutex); + ebpf_socket_obsolete_global_charts(em); - pthread_mutex_lock(&ebpf_exit_cleanup); - em->enabled = NETDATA_THREAD_EBPF_STOPPED; - ebpf_update_stats(&plugin_statistics, em); - ebpf_update_kernel_memory_with_vector(&plugin_statistics, em->maps, EBPF_ACTION_STAT_REMOVE); - pthread_mutex_unlock(&ebpf_exit_cleanup); -} +#ifdef NETDATA_DEV_MODE + if (ebpf_aral_socket_pid) + ebpf_statistic_obsolete_aral_chart(em, socket_disable_priority); +#endif + pthread_mutex_unlock(&lock); + } -/** - * Socket exit - * - * Clean up the main thread. - * - * @param ptr thread data. 
- */
-static void ebpf_socket_exit(void *ptr)
-{
-    ebpf_module_t *em = (ebpf_module_t *)ptr;
-    pthread_mutex_lock(&nv_mutex);
-    if (socket_threads.thread)
-        netdata_thread_cancel(*socket_threads.thread);
-    pthread_mutex_unlock(&nv_mutex);
     ebpf_socket_free(em);
 }

-/**
- * Socket cleanup
- *
- * Clean up allocated addresses.
- *
- * @param ptr thread data.
- */
-void ebpf_socket_cleanup(void *ptr)
-{
-    UNUSED(ptr);
-}
-
 /*****************************************************************
  *
  * PROCESS DATA AND SEND TO NETDATA
@@ -737,174 +906,6 @@ static void ebpf_update_global_publish(
 }

 /**
- * Update Network Viewer plot data
- *
- * @param plot the structure where the data will be stored
- * @param sock the last update from the socket
- */
-static inline void update_nv_plot_data(netdata_plot_values_t *plot, netdata_socket_t *sock)
-{
-    if (sock->ct != plot->last_time) {
-        plot->last_time = sock->ct;
-        plot->plot_recv_packets = sock->recv_packets;
-        plot->plot_sent_packets = sock->sent_packets;
-        plot->plot_recv_bytes = sock->recv_bytes;
-        plot->plot_sent_bytes = sock->sent_bytes;
-        plot->plot_retransmit = sock->retransmit;
-    }
-
-    sock->recv_packets = 0;
-    sock->sent_packets = 0;
-    sock->recv_bytes = 0;
-    sock->sent_bytes = 0;
-    sock->retransmit = 0;
-}
-
-/**
- * Calculate Network Viewer Plot
- *
- * Do math with collected values before to plot data.
- */
-static inline void calculate_nv_plot()
-{
-    pthread_mutex_lock(&nv_mutex);
-    uint32_t i;
-    uint32_t end = inbound_vectors.next;
-    for (i = 0; i < end; i++) {
-        update_nv_plot_data(&inbound_vectors.plot[i].plot, &inbound_vectors.plot[i].sock);
-    }
-    inbound_vectors.max_plot = end;
-
-    // The 'Other' dimension is always calculated for the chart to have at least one dimension
-    update_nv_plot_data(&inbound_vectors.plot[inbound_vectors.last].plot,
-                        &inbound_vectors.plot[inbound_vectors.last].sock);
-
-    end = outbound_vectors.next;
-    for (i = 0; i < end; i++) {
-        update_nv_plot_data(&outbound_vectors.plot[i].plot, &outbound_vectors.plot[i].sock);
-    }
-    outbound_vectors.max_plot = end;
-
-    /*
-    // The 'Other' dimension is always calculated for the chart to have at least one dimension
-    update_nv_plot_data(&outbound_vectors.plot[outbound_vectors.last].plot,
-                        &outbound_vectors.plot[outbound_vectors.last].sock);
-    */
-    pthread_mutex_unlock(&nv_mutex);
-}
-
-/**
- * Network viewer send bytes
- *
- * @param ptr the structure with values to plot
- * @param chart the chart name.
- */
-static inline void ebpf_socket_nv_send_bytes(netdata_vector_plot_t *ptr, char *chart)
-{
-    uint32_t i;
-    uint32_t end = ptr->last_plot;
-    netdata_socket_plot_t *w = ptr->plot;
-    collected_number value;
-
-    write_begin_chart(NETDATA_EBPF_FAMILY, chart);
-    for (i = 0; i < end; i++) {
-        value = ((collected_number) w[i].plot.plot_sent_bytes);
-        write_chart_dimension(w[i].dimension_sent, value);
-        value = (collected_number) w[i].plot.plot_recv_bytes;
-        write_chart_dimension(w[i].dimension_recv, value);
-    }
-
-    i = ptr->last;
-    value = ((collected_number) w[i].plot.plot_sent_bytes);
-    write_chart_dimension(w[i].dimension_sent, value);
-    value = (collected_number) w[i].plot.plot_recv_bytes;
-    write_chart_dimension(w[i].dimension_recv, value);
-    write_end_chart();
-}
-
-/**
- * Network Viewer Send packets
- *
- * @param ptr the structure with values to plot
- * @param chart the chart name.
- */
-static inline void ebpf_socket_nv_send_packets(netdata_vector_plot_t *ptr, char *chart)
-{
-    uint32_t i;
-    uint32_t end = ptr->last_plot;
-    netdata_socket_plot_t *w = ptr->plot;
-    collected_number value;
-
-    write_begin_chart(NETDATA_EBPF_FAMILY, chart);
-    for (i = 0; i < end; i++) {
-        value = ((collected_number)w[i].plot.plot_sent_packets);
-        write_chart_dimension(w[i].dimension_sent, value);
-        value = (collected_number) w[i].plot.plot_recv_packets;
-        write_chart_dimension(w[i].dimension_recv, value);
-    }
-
-    i = ptr->last;
-    value = ((collected_number)w[i].plot.plot_sent_packets);
-    write_chart_dimension(w[i].dimension_sent, value);
-    value = (collected_number)w[i].plot.plot_recv_packets;
-    write_chart_dimension(w[i].dimension_recv, value);
-    write_end_chart();
-}
-
-/**
- * Network Viewer Send Retransmit
- *
- * @param ptr the structure with values to plot
- * @param chart the chart name.
- */
-static inline void ebpf_socket_nv_send_retransmit(netdata_vector_plot_t *ptr, char *chart)
-{
-    uint32_t i;
-    uint32_t end = ptr->last_plot;
-    netdata_socket_plot_t *w = ptr->plot;
-    collected_number value;
-
-    write_begin_chart(NETDATA_EBPF_FAMILY, chart);
-    for (i = 0; i < end; i++) {
-        value = (collected_number) w[i].plot.plot_retransmit;
-        write_chart_dimension(w[i].dimension_retransmit, value);
-    }
-
-    i = ptr->last;
-    value = (collected_number)w[i].plot.plot_retransmit;
-    write_chart_dimension(w[i].dimension_retransmit, value);
-    write_end_chart();
-}
-
-/**
- * Send network viewer data
- *
- * @param ptr the pointer to plot data
- */
-static void ebpf_socket_send_nv_data(netdata_vector_plot_t *ptr)
-{
-    if (!ptr->flags)
-        return;
-
-    if (ptr == (netdata_vector_plot_t *)&outbound_vectors) {
-        ebpf_socket_nv_send_bytes(ptr, NETDATA_NV_OUTBOUND_BYTES);
-        fflush(stdout);
-
-        ebpf_socket_nv_send_packets(ptr, NETDATA_NV_OUTBOUND_PACKETS);
-        fflush(stdout);
-
-        ebpf_socket_nv_send_retransmit(ptr, NETDATA_NV_OUTBOUND_RETRANSMIT);
-        fflush(stdout);
-    } else {
-        ebpf_socket_nv_send_bytes(ptr, NETDATA_NV_INBOUND_BYTES);
-        fflush(stdout);
-
-        ebpf_socket_nv_send_packets(ptr, NETDATA_NV_INBOUND_PACKETS);
-        fflush(stdout);
-    }
-}
-
-/**
  * Send Global Inbound connection
  *
  * Send number of connections read per protocol.
@@ -1112,7 +1113,7 @@ void ebpf_socket_send_apps_data(ebpf_module_t *em, struct ebpf_target *root)
  *
  * @param em a pointer to the structure with the default values.
  */
-static void ebpf_create_global_charts(ebpf_module_t *em)
+static void ebpf_socket_create_global_charts(ebpf_module_t *em)
 {
     int order = 21070;
     ebpf_create_chart(NETDATA_EBPF_IP_FAMILY,
@@ -1319,138 +1320,6 @@ void ebpf_socket_create_apps_charts(struct ebpf_module *em, void *ptr)
     em->apps_charts |= NETDATA_EBPF_APPS_FLAG_CHART_CREATED;
 }

-/**
- * Create network viewer chart
- *
- * Create common charts.
- *
- * @param id chart id
- * @param title chart title
- * @param units units label
- * @param family group name used to attach the chart on dashboard
- * @param order chart order
- * @param update_every value to overwrite the update frequency set by the server.
- * @param ptr plot structure with values.
- */
-static void ebpf_socket_create_nv_chart(char *id, char *title, char *units,
-                                        char *family, int order, int update_every, netdata_vector_plot_t *ptr)
-{
-    ebpf_write_chart_cmd(NETDATA_EBPF_FAMILY,
-                         id,
-                         title,
-                         units,
-                         family,
-                         NETDATA_EBPF_CHART_TYPE_STACKED,
-                         NULL,
-                         order,
-                         update_every,
-                         NETDATA_EBPF_MODULE_NAME_SOCKET);
-
-    uint32_t i;
-    uint32_t end = ptr->last_plot;
-    netdata_socket_plot_t *w = ptr->plot;
-    for (i = 0; i < end; i++) {
-        fprintf(stdout, "DIMENSION %s '' incremental -1 1\n", w[i].dimension_sent);
-        fprintf(stdout, "DIMENSION %s '' incremental 1 1\n", w[i].dimension_recv);
-    }
-
-    end = ptr->last;
-    fprintf(stdout, "DIMENSION %s '' incremental -1 1\n", w[end].dimension_sent);
-    fprintf(stdout, "DIMENSION %s '' incremental 1 1\n", w[end].dimension_recv);
-}
-
-/**
- * Create network viewer retransmit
- *
- * Create a specific chart.
- *
- * @param id the chart id
- * @param title the chart title
- * @param units the units label
- * @param family the group name used to attach the chart on dashboard
- * @param order the chart order
- * @param update_every value to overwrite the update frequency set by the server.
- * @param ptr the plot structure with values.
- */
-static void ebpf_socket_create_nv_retransmit(char *id, char *title, char *units,
-                                             char *family, int order, int update_every, netdata_vector_plot_t *ptr)
-{
-    ebpf_write_chart_cmd(NETDATA_EBPF_FAMILY,
-                         id,
-                         title,
-                         units,
-                         family,
-                         NETDATA_EBPF_CHART_TYPE_STACKED,
-                         NULL,
-                         order,
-                         update_every,
-                         NETDATA_EBPF_MODULE_NAME_SOCKET);
-
-    uint32_t i;
-    uint32_t end = ptr->last_plot;
-    netdata_socket_plot_t *w = ptr->plot;
-    for (i = 0; i < end; i++) {
-        fprintf(stdout, "DIMENSION %s '' incremental 1 1\n", w[i].dimension_retransmit);
-    }
-
-    end = ptr->last;
-    fprintf(stdout, "DIMENSION %s '' incremental 1 1\n", w[end].dimension_retransmit);
-}
-
-/**
- * Create Network Viewer charts
- *
- * Recreate the charts when new sockets are created.
- *
- * @param ptr a pointer for inbound or outbound vectors.
- * @param update_every value to overwrite the update frequency set by the server.
- */
-static void ebpf_socket_create_nv_charts(netdata_vector_plot_t *ptr, int update_every)
-{
-    // We do not have new sockets, so we do not need move forward
-    if (ptr->max_plot == ptr->last_plot)
-        return;
-
-    ptr->last_plot = ptr->max_plot;
-
-    if (ptr == (netdata_vector_plot_t *)&outbound_vectors) {
-        ebpf_socket_create_nv_chart(NETDATA_NV_OUTBOUND_BYTES,
-                                    "Outbound connections (bytes).", EBPF_COMMON_DIMENSION_BYTES,
-                                    NETDATA_NETWORK_CONNECTIONS_GROUP,
-                                    21080,
-                                    update_every, ptr);
-
-        ebpf_socket_create_nv_chart(NETDATA_NV_OUTBOUND_PACKETS,
-                                    "Outbound connections (packets)",
-                                    EBPF_COMMON_DIMENSION_PACKETS,
-                                    NETDATA_NETWORK_CONNECTIONS_GROUP,
-                                    21082,
-                                    update_every, ptr);
-
-        ebpf_socket_create_nv_retransmit(NETDATA_NV_OUTBOUND_RETRANSMIT,
-                                         "Retransmitted packets",
-                                         EBPF_COMMON_DIMENSION_CALL,
-                                         NETDATA_NETWORK_CONNECTIONS_GROUP,
-                                         21083,
-                                         update_every, ptr);
-    } else {
-        ebpf_socket_create_nv_chart(NETDATA_NV_INBOUND_BYTES,
-                                    "Inbound connections (bytes)", EBPF_COMMON_DIMENSION_BYTES,
-                                    NETDATA_NETWORK_CONNECTIONS_GROUP,
-                                    21084,
-                                    update_every, ptr);
-
-        ebpf_socket_create_nv_chart(NETDATA_NV_INBOUND_PACKETS,
-                                    "Inbound connections (packets)",
-                                    EBPF_COMMON_DIMENSION_PACKETS,
-                                    NETDATA_NETWORK_CONNECTIONS_GROUP,
-                                    21085,
-                                    update_every, ptr);
-    }
-
-    ptr->flags |= NETWORK_VIEWER_CHARTS_CREATED;
-}
-
 /*****************************************************************
  *
  * READ INFORMATION FROM KERNEL RING
@@ -1517,7 +1386,7 @@ static int ebpf_is_specific_ip_inside_range(union netdata_ip_t *cmp, int family)
  *
  * @return It returns 1 when cmp is inside and 0 otherwise.
  */
-static int is_port_inside_range(uint16_t cmp)
+static int ebpf_is_port_inside_range(uint16_t cmp)
 {
     // We do not have restrictions for ports.
     if (!network_viewer_opt.excluded_port && !network_viewer_opt.included_port)
@@ -1525,7 +1394,6 @@ static int is_port_inside_range(uint16_t cmp)

     // Test if port is excluded
     ebpf_network_viewer_port_list_t *move = network_viewer_opt.excluded_port;
-    cmp = htons(cmp);
     while (move) {
         if (move->cmp_first <= cmp && cmp <= move->cmp_last)
             return 0;
@@ -1583,493 +1451,322 @@ int hostname_matches_pattern(char *cmp)
  * Compare destination addresses and destination ports to define next steps
  *
  * @param key the socket read from kernel ring
- * @param family the family used to compare IPs (AF_INET and AF_INET6)
+ * @param data the socket data used also used to refuse some sockets.
  *
  * @return It returns 1 if this socket is inside the ranges and 0 otherwise.
  */
-int is_socket_allowed(netdata_socket_idx_t *key, int family)
+int ebpf_is_socket_allowed(netdata_socket_idx_t *key, netdata_socket_t *data)
 {
-    if (!is_port_inside_range(key->dport))
-        return 0;
-
-    return ebpf_is_specific_ip_inside_range(&key->daddr, family);
-}
-
-/**
- * Compare sockets
- *
- * Compare destination address and destination port.
- * We do not compare source port, because it is random.
- * We also do not compare source address, because inbound and outbound connections are stored in separated AVL trees.
- *
- * @param a pointer to netdata_socket_plot
- * @param b pointer to netdata_socket_plot
- *
- * @return It returns 0 case the values are equal, 1 case a is bigger than b and -1 case a is smaller than b.
- */
-static int ebpf_compare_sockets(void *a, void *b)
-{
-    struct netdata_socket_plot *val1 = a;
-    struct netdata_socket_plot *val2 = b;
-    int cmp = 0;
+    int ret = 0;
+    // If family is not AF_UNSPEC and it is different of specified
+    if (network_viewer_opt.family && network_viewer_opt.family != data->family)
+        goto endsocketallowed;

-    // We do not need to compare val2 family, because data inside hash table is always from the same family
-    if (val1->family == AF_INET) { //IPV4
-        if (network_viewer_opt.included_port || network_viewer_opt.excluded_port)
-            cmp = memcmp(&val1->index.dport, &val2->index.dport, sizeof(uint16_t));
+    if (!ebpf_is_port_inside_range(key->dport))
+        goto endsocketallowed;

-        if (!cmp) {
-            cmp = memcmp(&val1->index.daddr.addr32[0], &val2->index.daddr.addr32[0], sizeof(uint32_t));
-        }
-    } else {
-        if (network_viewer_opt.included_port || network_viewer_opt.excluded_port)
-            cmp = memcmp(&val1->index.dport, &val2->index.dport, sizeof(uint16_t));
+    ret = ebpf_is_specific_ip_inside_range(&key->daddr, data->family);

-        if (!cmp) {
-            cmp = memcmp(&val1->index.daddr.addr32, &val2->index.daddr.addr32, 4*sizeof(uint32_t));
-        }
-    }
-
-    return cmp;
+endsocketallowed:
+    return ret;
 }

 /**
- * Build dimension name
- *
- * Fill dimension name vector with values given
- *
- * @param dimname the output vector
- * @param hostname the hostname for the socket.
- * @param service_name the service used to connect.
- * @param proto the protocol used in this connection
- * @param family is this IPV4(AF_INET) or IPV6(AF_INET6)
+ * Hash accumulator
  *
- * @return it returns the size of the data copied on success and -1 otherwise.
+ * @param values the values used to calculate the data.
+ * @param family the connection family
+ * @param end the values size.
  */
-static inline int ebpf_build_outbound_dimension_name(char *dimname, char *hostname, char *service_name,
-                                                     char *proto, int family)
+static void ebpf_hash_socket_accumulator(netdata_socket_t *values, int end)
 {
-    if (network_viewer_opt.included_port || network_viewer_opt.excluded_port)
-        return snprintf(dimname, CONFIG_MAX_NAME - 7, (family == AF_INET)?"%s:%s:%s_":"%s:%s:[%s]_",
-                        service_name, proto, hostname);
+    int i;
+    uint8_t protocol = values[0].protocol;
+    uint64_t ct = values[0].current_timestamp;
+    uint64_t ft = values[0].first_timestamp;
+    uint16_t family = AF_UNSPEC;
+    uint32_t external_origin = values[0].external_origin;
+    for (i = 1; i < end; i++) {
+        netdata_socket_t *w = &values[i];

-    return snprintf(dimname, CONFIG_MAX_NAME - 7, (family == AF_INET)?"%s:%s_":"%s:[%s]_",
-                    proto, hostname);
-}
+        values[0].tcp.call_tcp_sent += w->tcp.call_tcp_sent;
+        values[0].tcp.call_tcp_received += w->tcp.call_tcp_received;
+        values[0].tcp.tcp_bytes_received += w->tcp.tcp_bytes_received;
+        values[0].tcp.tcp_bytes_sent += w->tcp.tcp_bytes_sent;
+        values[0].tcp.close += w->tcp.close;
+        values[0].tcp.retransmit += w->tcp.retransmit;
+        values[0].tcp.ipv4_connect += w->tcp.ipv4_connect;
+        values[0].tcp.ipv6_connect += w->tcp.ipv6_connect;

-/**
- * Fill inbound dimension name
- *
- * Mount the dimension name with the input given
- *
- * @param dimname the output vector
- * @param service_name the service used to connect.
- * @param proto the protocol used in this connection
- *
- * @return it returns the size of the data copied on success and -1 otherwise.
- */
-static inline int build_inbound_dimension_name(char *dimname, char *service_name, char *proto)
-{
-    return snprintf(dimname, CONFIG_MAX_NAME - 7, "%s:%s_", service_name,
-                    proto);
-}
-
-/**
- * Fill Resolved Name
- *
- * Fill the resolved name structure with the value given.
- * The hostname is the largest value possible, if it is necessary to cut some value, it must be cut.
- *
- * @param ptr the output vector
- * @param hostname the hostname resolved or IP.
- * @param length the length for the hostname.
- * @param service_name the service name associated to the connection
- * @param is_outbound the is this an outbound connection
- */
-static inline void fill_resolved_name(netdata_socket_plot_t *ptr, char *hostname, size_t length,
-                                      char *service_name, int is_outbound)
-{
-    if (length < NETDATA_MAX_NETWORK_COMBINED_LENGTH)
-        ptr->resolved_name = strdupz(hostname);
-    else {
-        length = NETDATA_MAX_NETWORK_COMBINED_LENGTH;
-        ptr->resolved_name = mallocz( NETDATA_MAX_NETWORK_COMBINED_LENGTH + 1);
-        memcpy(ptr->resolved_name, hostname, length);
-        ptr->resolved_name[length] = '\0';
-    }
-
-    char dimname[CONFIG_MAX_NAME];
-    int size;
-    char *protocol;
-    if (ptr->sock.protocol == IPPROTO_UDP) {
-        protocol = "UDP";
-    } else if (ptr->sock.protocol == IPPROTO_TCP) {
-        protocol = "TCP";
-    } else {
-        protocol = "ALL";
-    }
+        if (!protocol)
+            protocol = w->protocol;

-    if (is_outbound)
-        size = ebpf_build_outbound_dimension_name(dimname, hostname, service_name, protocol, ptr->family);
-    else
-        size = build_inbound_dimension_name(dimname,service_name, protocol);
+        if (family == AF_UNSPEC)
+            family = w->family;

-    if (size > 0) {
-        strcpy(&dimname[size], "sent");
-        dimname[size + 4] = '\0';
-        ptr->dimension_sent = strdupz(dimname);
+        if (w->current_timestamp > ct)
+            ct = w->current_timestamp;

-        strcpy(&dimname[size], "recv");
-        ptr->dimension_recv = strdupz(dimname);
+        if (!ft)
+            ft = w->first_timestamp;

-        dimname[size - 1] = '\0';
-        ptr->dimension_retransmit = strdupz(dimname);
+        if (w->external_origin)
+            external_origin = NETDATA_EBPF_SRC_IP_ORIGIN_EXTERNAL;
     }
+
+    values[0].protocol = (!protocol)?IPPROTO_TCP:protocol;
+    values[0].current_timestamp = ct;
+    values[0].first_timestamp = ft;
+    values[0].external_origin = external_origin;
 }

 /**
- * Mount dimension names
- *
- * Fill the vector names after to resolve the addresses
+ * Translate socket
  *
- * @param ptr a pointer to the structure where the values are stored.
- * @param is_outbound is a outbound ptr value?
+ * Convert socket address to string
  *
- * @return It returns 1 if the name is valid and 0 otherwise.
+ * @param dst structure where we will store
+ * @param key the socket address
  */
-int fill_names(netdata_socket_plot_t *ptr, int is_outbound)
+static void ebpf_socket_translate(netdata_socket_plus_t *dst, netdata_socket_idx_t *key)
 {
-    char hostname[NI_MAXHOST], service_name[NI_MAXSERV];
-    if (ptr->resolved)
-        return 1;
-
+    uint32_t resolve = network_viewer_opt.service_resolution_enabled;
+    char service[NI_MAXSERV];
     int ret;
-    static int resolve_name = -1;
-    static int resolve_service = -1;
-    if (resolve_name == -1)
-        resolve_name = network_viewer_opt.hostname_resolution_enabled;
-
-    if (resolve_service == -1)
-        resolve_service = network_viewer_opt.service_resolution_enabled;
-
-    netdata_socket_idx_t *idx = &ptr->index;
-
-    char *errname = { "Not resolved" };
-    // Resolve Name
-    if (ptr->family == AF_INET) { //IPV4
-        struct sockaddr_in myaddr;
-        memset(&myaddr, 0 , sizeof(myaddr));
-
-        myaddr.sin_family = ptr->family;
-        if (is_outbound) {
-            myaddr.sin_port = idx->dport;
-            myaddr.sin_addr.s_addr = idx->daddr.addr32[0];
-        } else {
-            myaddr.sin_port = idx->sport;
-            myaddr.sin_addr.s_addr = idx->saddr.addr32[0];
-        }
-
-        ret = (!resolve_name)?-1:getnameinfo((struct sockaddr *)&myaddr, sizeof(myaddr), hostname,
-                                             sizeof(hostname), service_name, sizeof(service_name), NI_NAMEREQD);
-
-        if (!ret && !resolve_service) {
-            snprintf(service_name, sizeof(service_name), "%u", ntohs(myaddr.sin_port));
+    if (dst->data.family == AF_INET) {
+        struct sockaddr_in ipv4_addr = { };
+        ipv4_addr.sin_port = 0;
+        ipv4_addr.sin_addr.s_addr = key->saddr.addr32[0];
+        ipv4_addr.sin_family = AF_INET;
+        if (resolve) {
+            // NI_NAMEREQD : It is too slow
+            ret = getnameinfo((struct sockaddr *) &ipv4_addr, sizeof(ipv4_addr), dst->socket_string.src_ip,
+                              INET6_ADDRSTRLEN, service, NI_MAXSERV, NI_NUMERICHOST | NI_NUMERICSERV);
+            if (ret) {
+                collector_error("Cannot resolve name: %s", gai_strerror(ret));
+                resolve = 0;
+            } else {
+                ipv4_addr.sin_addr.s_addr = key->daddr.addr32[0];
+
+                ipv4_addr.sin_port = key->dport;
+                ret = getnameinfo((struct sockaddr *) &ipv4_addr, sizeof(ipv4_addr), dst->socket_string.dst_ip,
+                                  INET6_ADDRSTRLEN, dst->socket_string.dst_port, NI_MAXSERV,
+                                  NI_NUMERICHOST);
+                if (ret) {
+                    collector_error("Cannot resolve name: %s", gai_strerror(ret));
+                    resolve = 0;
+                }
+            }
         }
-        if (ret) {
-            // I cannot resolve the name, I will use the IP
-            if (!inet_ntop(AF_INET, &myaddr.sin_addr.s_addr, hostname, NI_MAXHOST)) {
-                strncpy(hostname, errname, 13);
-            }
+        // When resolution fail, we should use addresses
+        if (!resolve) {
+            ipv4_addr.sin_addr.s_addr = key->saddr.addr32[0];

-            snprintf(service_name, sizeof(service_name), "%u", ntohs(myaddr.sin_port));
-            ret = 1;
-        }
-    } else { // IPV6
-        struct sockaddr_in6 myaddr6;
-        memset(&myaddr6, 0 , sizeof(myaddr6));
-
-        myaddr6.sin6_family = AF_INET6;
-        if (is_outbound) {
-            myaddr6.sin6_port = idx->dport;
-            memcpy(myaddr6.sin6_addr.s6_addr, idx->daddr.addr8, sizeof(union netdata_ip_t));
-        } else {
-            myaddr6.sin6_port = idx->sport;
-            memcpy(myaddr6.sin6_addr.s6_addr, idx->saddr.addr8, sizeof(union netdata_ip_t));
-        }
+            if(!inet_ntop(AF_INET, &ipv4_addr.sin_addr, dst->socket_string.src_ip, INET6_ADDRSTRLEN))
+                netdata_log_info("Cannot convert IP %u .", ipv4_addr.sin_addr.s_addr);

-        ret = (!resolve_name)?-1:getnameinfo((struct sockaddr *)&myaddr6, sizeof(myaddr6), hostname,
-                                             sizeof(hostname), service_name, sizeof(service_name), NI_NAMEREQD);
+            ipv4_addr.sin_addr.s_addr = key->daddr.addr32[0];

-        if (!ret && !resolve_service) {
-            snprintf(service_name, sizeof(service_name), "%u", ntohs(myaddr6.sin6_port));
+            if(!inet_ntop(AF_INET, &ipv4_addr.sin_addr, dst->socket_string.dst_ip, INET6_ADDRSTRLEN))
+                netdata_log_info("Cannot convert IP %u .", ipv4_addr.sin_addr.s_addr);
+            snprintfz(dst->socket_string.dst_port, NI_MAXSERV, "%u", ntohs(key->dport));
         }
-
-        if (ret) {
-            // I cannot resolve the name, I will use the IP
-            if (!inet_ntop(AF_INET6, myaddr6.sin6_addr.s6_addr, hostname, NI_MAXHOST)) {
-                strncpy(hostname, errname, 13);
+    } else {
+        struct sockaddr_in6 ipv6_addr = { };
+        memcpy(&ipv6_addr.sin6_addr, key->saddr.addr8, sizeof(key->saddr.addr8));
+        ipv6_addr.sin6_family = AF_INET6;
+        if (resolve) {
+            ret = getnameinfo((struct sockaddr *) &ipv6_addr, sizeof(ipv6_addr), dst->socket_string.src_ip,
+                              INET6_ADDRSTRLEN, service, NI_MAXSERV, NI_NUMERICHOST | NI_NUMERICSERV);
+            if (ret) {
+                collector_error("Cannot resolve name: %s", gai_strerror(ret));
+                resolve = 0;
+            } else {
+                memcpy(&ipv6_addr.sin6_addr, key->daddr.addr8, sizeof(key->daddr.addr8));
+                ret = getnameinfo((struct sockaddr *) &ipv6_addr, sizeof(ipv6_addr), dst->socket_string.dst_ip,
+                                  INET6_ADDRSTRLEN, dst->socket_string.dst_port, NI_MAXSERV,
+                                  NI_NUMERICHOST);
+                if (ret) {
+                    collector_error("Cannot resolve name: %s", gai_strerror(ret));
+                    resolve = 0;
+                }
             }
+        }

-            snprintf(service_name, sizeof(service_name), "%u", ntohs(myaddr6.sin6_port));
+        if (!resolve) {
+            memcpy(&ipv6_addr.sin6_addr, key->saddr.addr8, sizeof(key->saddr.addr8));
+            if(!inet_ntop(AF_INET6, &ipv6_addr.sin6_addr, dst->socket_string.src_ip, INET6_ADDRSTRLEN))
+                netdata_log_info("Cannot convert IPv6 Address.");

-            ret = 1;
+            memcpy(&ipv6_addr.sin6_addr, key->daddr.addr8, sizeof(key->daddr.addr8));
+            if(!inet_ntop(AF_INET6, &ipv6_addr.sin6_addr, dst->socket_string.dst_ip, INET6_ADDRSTRLEN))
+                netdata_log_info("Cannot convert IPv6 Address.");
+            snprintfz(dst->socket_string.dst_port, NI_MAXSERV, "%u", ntohs(key->dport));
         }
     }
+    dst->pid = key->pid;

-    fill_resolved_name(ptr, hostname,
-                       strlen(hostname) + strlen(service_name)+ NETDATA_DOTS_PROTOCOL_COMBINED_LENGTH,
-                       service_name, is_outbound);
-
-    if (resolve_name && !ret)
-        ret = hostname_matches_pattern(hostname);
-
-    ptr->resolved++;
-
-    return ret;
-}
-
-/**
- * Fill last Network Viewer Dimension
- *
- * Fill the unique dimension that is always plotted.
- *
- * @param ptr the pointer for the last dimension
- * @param is_outbound is this an inbound structure?
- */
-static void fill_last_nv_dimension(netdata_socket_plot_t *ptr, int is_outbound)
-{
-    char hostname[NI_MAXHOST], service_name[NI_MAXSERV];
-    char *other = { "other" };
-    // We are also copying the NULL bytes to avoid warnings in new compilers
-    strncpy(hostname, other, 6);
-    strncpy(service_name, other, 6);
-
-    ptr->family = AF_INET;
-    ptr->sock.protocol = 255;
-    ptr->flags = (!is_outbound)?NETDATA_INBOUND_DIRECTION:NETDATA_OUTBOUND_DIRECTION;
-
-    fill_resolved_name(ptr, hostname, 10 + NETDATA_DOTS_PROTOCOL_COMBINED_LENGTH, service_name, is_outbound);
-
-#ifdef NETDATA_INTERNAL_CHECKS
-    netdata_log_info("Last %s dimension added: ID = %u, IP = OTHER, NAME = %s, DIM1 = %s, DIM2 = %s, DIM3 = %s",
-                     (is_outbound)?"outbound":"inbound", network_viewer_opt.max_dim - 1, ptr->resolved_name,
-                     ptr->dimension_recv, ptr->dimension_sent, ptr->dimension_retransmit);
+    if (!strcmp(dst->socket_string.dst_port, "0"))
+        snprintfz(dst->socket_string.dst_port, NI_MAXSERV, "%u", ntohs(key->dport));
+#ifdef NETDATA_DEV_MODE
+    collector_info("New socket: { ORIGIN IP: %s, ORIGIN : %u, DST IP:%s, DST PORT: %s, PID: %u, PROTO: %d, FAMILY: %d}",
+                   dst->socket_string.src_ip,
+                   dst->data.external_origin,
+                   dst->socket_string.dst_ip,
+                   dst->socket_string.dst_port,
+                   dst->pid,
+                   dst->data.protocol,
+                   dst->data.family
+                   );
 #endif
 }

 /**
- * Update Socket Data
+ * Update array vectors
  *
- * Update the socket information with last collected data
+ * Read data from hash table and update vectors.
  *
- * @param sock
- * @param lvalues
+ * @param em the structure with configuration
  */
-static inline void update_socket_data(netdata_socket_t *sock, netdata_socket_t *lvalues)
+static void ebpf_update_array_vectors(ebpf_module_t *em)
 {
-    sock->recv_packets = lvalues->recv_packets;
-    sock->sent_packets = lvalues->sent_packets;
-    sock->recv_bytes = lvalues->recv_bytes;
-    sock->sent_bytes = lvalues->sent_bytes;
-    sock->retransmit = lvalues->retransmit;
-    sock->ct = lvalues->ct;
-}
+    netdata_thread_disable_cancelability();
+    netdata_socket_idx_t key = {};
+    netdata_socket_idx_t next_key = {};

-/**
- * Store socket inside avl
- *
- * Store the socket values inside the avl tree.
- *
- * @param out the structure with information used to plot charts.
- * @param lvalues Values read from socket ring.
- * @param lindex the index information, the real socket.
- * @param family the family associated to the socket
- * @param flags the connection flags
- */
-static void store_socket_inside_avl(netdata_vector_plot_t *out, netdata_socket_t *lvalues,
-                                    netdata_socket_idx_t *lindex, int family, uint32_t flags)
-{
-    netdata_socket_plot_t test, *ret ;
+    int maps_per_core = em->maps_per_core;
+    int fd = em->maps[NETDATA_SOCKET_OPEN_SOCKET].map_fd;

-    memcpy(&test.index, lindex, sizeof(netdata_socket_idx_t));
-    test.flags = flags;
+    netdata_socket_t *values = socket_values;
+    size_t length = sizeof(netdata_socket_t);
+    int test, end;
+    if (maps_per_core) {
+        length *= ebpf_nprocs;
+        end = ebpf_nprocs;
+    } else
+        end = 1;

-    ret = (netdata_socket_plot_t *) avl_search_lock(&out->tree, (avl_t *)&test);
-    if (ret) {
-        if (lvalues->ct != ret->plot.last_time) {
-            update_socket_data(&ret->sock, lvalues);
+    // We need to reset the values when we are working on kernel 4.15 or newer, because kernel does not create
+    // values for specific processor unless it is used to store data. As result of this behavior one the next socket
+    // can have values from the previous one.
+    memset(values, 0, length);
+    time_t update_time = time(NULL);
+    while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
+        test = bpf_map_lookup_elem(fd, &key, values);
+        if (test < 0) {
+            goto end_socket_loop;
         }
-    } else {
-        uint32_t curr = out->next;
-        uint32_t last = out->last;
-
-        netdata_socket_plot_t *w = &out->plot[curr];
-
-        int resolved;
-        if (curr == last) {
-            if (lvalues->ct != w->plot.last_time) {
-                update_socket_data(&w->sock, lvalues);
-            }
-            return;
-        } else {
-            memcpy(&w->sock, lvalues, sizeof(netdata_socket_t));
-            memcpy(&w->index, lindex, sizeof(netdata_socket_idx_t));
-            w->family = family;
-            resolved = fill_names(w, out != (netdata_vector_plot_t *)&inbound_vectors);
+        if (key.pid > (uint32_t)pid_max) {
+            goto end_socket_loop;
         }

-        if (!resolved) {
-            freez(w->resolved_name);
-            freez(w->dimension_sent);
-            freez(w->dimension_recv);
-            freez(w->dimension_retransmit);
-
-            memset(w, 0, sizeof(netdata_socket_plot_t));
+        ebpf_hash_socket_accumulator(values, end);
+        ebpf_socket_fill_publish_apps(key.pid, values);

-            return;
+        // We update UDP to show info with charts, but we do not show them with functions
+        /*
+        if (key.dport == NETDATA_EBPF_UDP_PORT && values[0].protocol == IPPROTO_UDP) {
+            bpf_map_delete_elem(fd, &key);
+            goto end_socket_loop;
         }
+        */

-        w->flags = flags;
-        netdata_socket_plot_t *check ;
-        check = (netdata_socket_plot_t *) avl_insert_lock(&out->tree, (avl_t *)w);
-        if (check != w)
-            netdata_log_error("Internal error, cannot insert the AVL tree.");
-
-#ifdef NETDATA_INTERNAL_CHECKS
-        char iptext[INET6_ADDRSTRLEN];
-        if (inet_ntop(family, &w->index.daddr.addr8, iptext, sizeof(iptext)))
-            netdata_log_info("New %s dimension added: ID = %u, IP = %s, NAME = %s, DIM1 = %s, DIM2 = %s, DIM3 = %s",
-                             (out == &inbound_vectors)?"inbound":"outbound", curr, iptext, w->resolved_name,
-                             w->dimension_recv, w->dimension_sent, w->dimension_retransmit);
-#endif
-        curr++;
-        if (curr > last)
-            curr = last;
-        out->next = curr;
-    }
-}
-
-/**
- * Compare Vector to store
- *
- * Compare input values with local address to select table to store.
- *
- * @param direction store inbound and outbound direction.
- * @param cmp index read from hash table.
- * @param proto the protocol read.
- *
- * @return It returns the structure with address to compare.
- */
-netdata_vector_plot_t * select_vector_to_store(uint32_t *direction, netdata_socket_idx_t *cmp, uint8_t proto)
-{
-    if (!listen_ports) {
-        *direction = NETDATA_OUTBOUND_DIRECTION;
-        return &outbound_vectors;
-    }
+        // Discard non-bind sockets
+        if (!key.daddr.addr64[0] && !key.daddr.addr64[1] && !key.saddr.addr64[0] && !key.saddr.addr64[1]) {
+            bpf_map_delete_elem(fd, &key);
+            goto end_socket_loop;
+        }

-    ebpf_network_viewer_port_list_t *move_ports = listen_ports;
-    while (move_ports) {
-        if (move_ports->protocol == proto && move_ports->first == cmp->sport) {
-            *direction = NETDATA_INBOUND_DIRECTION;
-            return &inbound_vectors;
+        // When socket is not allowed, we do not append it to table, but we are still keeping it to accumulate data.
+        if (!ebpf_is_socket_allowed(&key, values)) {
+            goto end_socket_loop;
         }
-        move_ports = move_ports->next;
-    }
+        // Get PID structure
+        rw_spinlock_write_lock(&ebpf_judy_pid.index.rw_spinlock);
+        PPvoid_t judy_array = &ebpf_judy_pid.index.JudyLArray;
+        netdata_ebpf_judy_pid_stats_t *pid_ptr = ebpf_get_pid_from_judy_unsafe(judy_array, key.pid);
+        if (!pid_ptr) {
+            goto end_socket_loop;
+        }

-    *direction = NETDATA_OUTBOUND_DIRECTION;
-    return &outbound_vectors;
-}
+        // Get Socket structure
+        rw_spinlock_write_lock(&pid_ptr->socket_stats.rw_spinlock);
+        netdata_socket_plus_t **socket_pptr = (netdata_socket_plus_t **)ebpf_judy_insert_unsafe(
+            &pid_ptr->socket_stats.JudyLArray, values[0].first_timestamp);
+        netdata_socket_plus_t *socket_ptr = *socket_pptr;
+        bool translate = false;
+        if (likely(*socket_pptr == NULL)) {
+            *socket_pptr = aral_mallocz(aral_socket_table);

-/**
- * Hash accumulator
- *
- * @param values the values used to calculate the data.
- * @param key the key to store data.
- * @param family the connection family
- * @param end the values size.
- */
-static void hash_accumulator(netdata_socket_t *values, netdata_socket_idx_t *key, int family, int end)
-{
-    if (!network_viewer_opt.enabled || !is_socket_allowed(key, family))
-        return;
+            socket_ptr = *socket_pptr;

-    uint64_t bsent = 0, brecv = 0, psent = 0, precv = 0;
-    uint16_t retransmit = 0;
-    int i;
-    uint8_t protocol = values[0].protocol;
-    uint64_t ct = values[0].ct;
-    for (i = 1; i < end; i++) {
-        netdata_socket_t *w = &values[i];
-
-        precv += w->recv_packets;
-        psent += w->sent_packets;
-        brecv += w->recv_bytes;
-        bsent += w->sent_bytes;
-        retransmit += w->retransmit;
+            translate = true;
+        }
+        uint64_t prev_period = socket_ptr->data.current_timestamp;
+        memcpy(&socket_ptr->data, &values[0], sizeof(netdata_socket_t));
+        if (translate)
+            ebpf_socket_translate(socket_ptr, &key);
+        else { // Check socket was updated
+            if (prev_period) {
+                if (values[0].current_timestamp > prev_period) // Socket updated
+                    socket_ptr->last_update = update_time;
+                else if ((update_time - socket_ptr->last_update) > em->update_every) {
+                    // Socket was not updated since last read
+                    JudyLDel(&pid_ptr->socket_stats.JudyLArray, values[0].first_timestamp, PJE0);
+                    aral_freez(aral_socket_table, socket_ptr);
+                }
+            } else // First time
+                socket_ptr->last_update = update_time;
+        }

-        if (!protocol)
-            protocol = w->protocol;
+        rw_spinlock_write_unlock(&pid_ptr->socket_stats.rw_spinlock);
+        rw_spinlock_write_unlock(&ebpf_judy_pid.index.rw_spinlock);

-        if (w->ct != ct)
-            ct = w->ct;
+end_socket_loop:
+        memset(values, 0, length);
+        memcpy(&key, &next_key, sizeof(key));
     }
-
-    values[0].recv_packets += precv;
-    values[0].sent_packets += psent;
-    values[0].recv_bytes += brecv;
-    values[0].sent_bytes += bsent;
-    values[0].retransmit += retransmit;
-    values[0].protocol = (!protocol)?IPPROTO_TCP:protocol;
-    values[0].ct = ct;
-
-    uint32_t dir;
-    netdata_vector_plot_t *table = select_vector_to_store(&dir, key, protocol);
-    store_socket_inside_avl(table, &values[0], key, family, dir);
+    netdata_thread_enable_cancelability();
 }

 /**
- * Read socket hash table
+ * Socket thread
  *
- * Read data from hash tables created on kernel ring.
+ * Thread used to generate socket charts.
  *
- * @param fd the hash table with data.
- * @param family the family associated to the hash table
- * @param maps_per_core do I need to read all cores?
+ * @param ptr a pointer to `struct ebpf_module`
  *
- * @return it returns 0 on success and -1 otherwise.
+ * @return It always return NULL
  */
-static void ebpf_read_socket_hash_table(int fd, int family, int maps_per_core)
+void *ebpf_read_socket_thread(void *ptr)
 {
-    netdata_socket_idx_t key = {};
-    netdata_socket_idx_t next_key = {};
+    heartbeat_t hb;
+    heartbeat_init(&hb);

-    netdata_socket_t *values = socket_values;
-    size_t length = sizeof(netdata_socket_t);
-    int test, end;
-    if (maps_per_core) {
-        length *= ebpf_nprocs;
-        end = ebpf_nprocs;
-    } else
-        end = 1;
+    ebpf_module_t *em = (ebpf_module_t *)ptr;

-    while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
-        // We need to reset the values when we are working on kernel 4.15 or newer, because kernel does not create
-        // values for specific processor unless it is used to store data. As result of this behavior one the next socket
-        // can have values from the previous one.
- memset(values, 0, length); - test = bpf_map_lookup_elem(fd, &key, values); - if (test < 0) { - key = next_key; + ebpf_update_array_vectors(em); + + int update_every = em->update_every; + int counter = update_every - 1; + + uint32_t running_time = 0; + uint32_t lifetime = em->lifetime; + usec_t period = update_every * USEC_PER_SEC; + while (!ebpf_plugin_exit && running_time < lifetime) { + (void)heartbeat_next(&hb, period); + if (ebpf_plugin_exit || ++counter != update_every) continue; - } - hash_accumulator(values, &key, family, end); + ebpf_update_array_vectors(em); - key = next_key; + counter = 0; } + + return NULL; } /** @@ -2165,44 +1862,6 @@ static void read_listen_table() } /** - * Socket read hash - * - * This is the thread callback. - * This thread is necessary, because we cannot freeze the whole plugin to read the data on very busy socket. - * - * @param ptr It is a NULL value for this thread. - * - * @return It always returns NULL. - */ -void *ebpf_socket_read_hash(void *ptr) -{ - netdata_thread_cleanup_push(ebpf_socket_cleanup, ptr); - ebpf_module_t *em = (ebpf_module_t *)ptr; - - heartbeat_t hb; - heartbeat_init(&hb); - int fd_ipv4 = socket_maps[NETDATA_SOCKET_TABLE_IPV4].map_fd; - int fd_ipv6 = socket_maps[NETDATA_SOCKET_TABLE_IPV6].map_fd; - int maps_per_core = em->maps_per_core; - // This thread is cancelled from another thread - uint32_t running_time; - uint32_t lifetime = em->lifetime; - for (running_time = 0;!ebpf_exit_plugin && running_time < lifetime; running_time++) { - (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin) - break; - - pthread_mutex_lock(&nv_mutex); - ebpf_read_socket_hash_table(fd_ipv4, AF_INET, maps_per_core); - ebpf_read_socket_hash_table(fd_ipv6, AF_INET6, maps_per_core); - pthread_mutex_unlock(&nv_mutex); - } - - netdata_thread_cleanup_pop(1); - return NULL; -} - -/** * Read the hash table and store data to allocated vectors. * * @param stats vector used to read data from control table. 
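Note on the reader loop added above: `ebpf_read_socket_thread` gates its work with a tick counter that starts at `update_every - 1`, so the first heartbeat triggers a read and later reads happen once every `update_every` heartbeats. The following is a minimal standalone sketch of that gating only, with the heartbeat replaced by a plain loop. `should_read` and `count_reads` are hypothetical names introduced for illustration, not Netdata functions, and the sketch does not model the heartbeat period, locking, or the exit flag.

```c
/* Hypothetical helper (not part of Netdata): returns 1 when the gated
 * work should run on this tick. It mirrors the
 * `++counter != update_every` test used in the diff, then resets the
 * counter the same way the loop does with `counter = 0`. */
static int should_read(int *counter, int update_every)
{
    if (++(*counter) != update_every)
        return 0;

    *counter = 0;
    return 1;
}

/* Hypothetical helper (not part of Netdata): simulates `ticks`
 * heartbeats and counts how many of them would trigger a read. */
static int count_reads(int update_every, int ticks)
{
    /* Starting one below the threshold makes the first tick fire. */
    int counter = update_every - 1;
    int reads = 0;

    for (int i = 0; i < ticks; i++)
        reads += should_read(&counter, update_every);

    return reads;
}
```

With `update_every = 5`, eleven simulated heartbeats yield three reads (ticks 1, 6, and 11). With `update_every = 1`, every tick triggers a read.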
@@ -2251,9 +1910,9 @@ static void ebpf_socket_read_hash_global_tables(netdata_idx_t *stats, int maps_p * Fill publish apps when necessary. * * @param current_pid the PID that I am updating - * @param eb the structure with data read from memory. + * @param ns the structure with data read from memory. */ -void ebpf_socket_fill_publish_apps(uint32_t current_pid, ebpf_bandwidth_t *eb) +void ebpf_socket_fill_publish_apps(uint32_t current_pid, netdata_socket_t *ns) { ebpf_socket_publish_apps_t *curr = socket_bandwidth_curr[current_pid]; if (!curr) { @@ -2261,98 +1920,33 @@ void ebpf_socket_fill_publish_apps(uint32_t current_pid, ebpf_bandwidth_t *eb) socket_bandwidth_curr[current_pid] = curr; } - curr->bytes_sent = eb->bytes_sent; - curr->bytes_received = eb->bytes_received; - curr->call_tcp_sent = eb->call_tcp_sent; - curr->call_tcp_received = eb->call_tcp_received; - curr->retransmit = eb->retransmit; - curr->call_udp_sent = eb->call_udp_sent; - curr->call_udp_received = eb->call_udp_received; - curr->call_close = eb->close; - curr->call_tcp_v4_connection = eb->tcp_v4_connection; - curr->call_tcp_v6_connection = eb->tcp_v6_connection; -} - -/** - * Bandwidth accumulator. - * - * @param out the vector with the values to sum - */ -void ebpf_socket_bandwidth_accumulator(ebpf_bandwidth_t *out, int maps_per_core) -{ - int i, end = (maps_per_core) ? 
ebpf_nprocs : 1; - ebpf_bandwidth_t *total = &out[0]; - for (i = 1; i < end; i++) { - ebpf_bandwidth_t *move = &out[i]; - total->bytes_sent += move->bytes_sent; - total->bytes_received += move->bytes_received; - total->call_tcp_sent += move->call_tcp_sent; - total->call_tcp_received += move->call_tcp_received; - total->retransmit += move->retransmit; - total->call_udp_sent += move->call_udp_sent; - total->call_udp_received += move->call_udp_received; - total->close += move->close; - total->tcp_v4_connection += move->tcp_v4_connection; - total->tcp_v6_connection += move->tcp_v6_connection; - } -} - -/** - * Update the apps data reading information from the hash table - * - * @param maps_per_core do I need to read all cores? - */ -static void ebpf_socket_update_apps_data(int maps_per_core) -{ - int fd = socket_maps[NETDATA_SOCKET_TABLE_BANDWIDTH].map_fd; - ebpf_bandwidth_t *eb = bandwidth_vector; - uint32_t key; - struct ebpf_pid_stat *pids = ebpf_root_of_pids; - size_t length = sizeof(ebpf_bandwidth_t); - if (maps_per_core) - length *= ebpf_nprocs; - while (pids) { - key = pids->pid; - - if (bpf_map_lookup_elem(fd, &key, eb)) { - pids = pids->next; - continue; - } - - ebpf_socket_bandwidth_accumulator(eb, maps_per_core); - - ebpf_socket_fill_publish_apps(key, eb); - - memset(eb, 0, length); + curr->bytes_sent += ns->tcp.tcp_bytes_sent; + curr->bytes_received += ns->tcp.tcp_bytes_received; + curr->call_tcp_sent += ns->tcp.call_tcp_sent; + curr->call_tcp_received += ns->tcp.call_tcp_received; + curr->retransmit += ns->tcp.retransmit; + curr->call_close += ns->tcp.close; + curr->call_tcp_v4_connection += ns->tcp.ipv4_connect; + curr->call_tcp_v6_connection += ns->tcp.ipv6_connect; - pids = pids->next; - } + curr->call_udp_sent += ns->udp.call_udp_sent; + curr->call_udp_received += ns->udp.call_udp_received; } /** * Update cgroup * * Update cgroup data based in PIDs. - * - * @param maps_per_core do I need to read all cores? 
*/ -static void ebpf_update_socket_cgroup(int maps_per_core) +static void ebpf_update_socket_cgroup() { ebpf_cgroup_target_t *ect ; - ebpf_bandwidth_t *eb = bandwidth_vector; - int fd = socket_maps[NETDATA_SOCKET_TABLE_BANDWIDTH].map_fd; - - size_t length = sizeof(ebpf_bandwidth_t); - if (maps_per_core) - length *= ebpf_nprocs; - pthread_mutex_lock(&mutex_cgroup_shm); for (ect = ebpf_cgroup_pids; ect; ect = ect->next) { struct pid_on_target2 *pids; for (pids = ect->pids; pids; pids = pids->next) { int pid = pids->pid; - ebpf_bandwidth_t *out = &pids->socket; ebpf_socket_publish_apps_t *publish = &ect->publish_socket; if (likely(socket_bandwidth_curr) && socket_bandwidth_curr[pid]) { ebpf_socket_publish_apps_t *in = socket_bandwidth_curr[pid]; @@ -2367,25 +1961,6 @@ static void ebpf_update_socket_cgroup(int maps_per_core) publish->call_close = in->call_close; publish->call_tcp_v4_connection = in->call_tcp_v4_connection; publish->call_tcp_v6_connection = in->call_tcp_v6_connection; - } else { - if (!bpf_map_lookup_elem(fd, &pid, eb)) { - ebpf_socket_bandwidth_accumulator(eb, maps_per_core); - - memcpy(out, eb, sizeof(ebpf_bandwidth_t)); - - publish->bytes_sent = out->bytes_sent; - publish->bytes_received = out->bytes_received; - publish->call_tcp_sent = out->call_tcp_sent; - publish->call_tcp_received = out->call_tcp_received; - publish->retransmit = out->retransmit; - publish->call_udp_sent = out->call_udp_sent; - publish->call_udp_received = out->call_udp_received; - publish->call_close = out->close; - publish->call_tcp_v4_connection = out->tcp_v4_connection; - publish->call_tcp_v6_connection = out->tcp_v6_connection; - - memset(eb, 0, length); - } } } } @@ -2406,18 +1981,18 @@ static void ebpf_socket_sum_cgroup_pids(ebpf_socket_publish_apps_t *socket, stru memset(&accumulator, 0, sizeof(accumulator)); while (pids) { - ebpf_bandwidth_t *w = &pids->socket; - - accumulator.bytes_received += w->bytes_received; - accumulator.bytes_sent += w->bytes_sent; - 
accumulator.call_tcp_received += w->call_tcp_received; - accumulator.call_tcp_sent += w->call_tcp_sent; - accumulator.retransmit += w->retransmit; - accumulator.call_udp_received += w->call_udp_received; - accumulator.call_udp_sent += w->call_udp_sent; - accumulator.call_close += w->close; - accumulator.call_tcp_v4_connection += w->tcp_v4_connection; - accumulator.call_tcp_v6_connection += w->tcp_v6_connection; + netdata_socket_t *w = &pids->socket; + + accumulator.bytes_received += w->tcp.tcp_bytes_received; + accumulator.bytes_sent += w->tcp.tcp_bytes_sent; + accumulator.call_tcp_received += w->tcp.call_tcp_received; + accumulator.call_tcp_sent += w->tcp.call_tcp_sent; + accumulator.retransmit += w->tcp.retransmit; + accumulator.call_close += w->tcp.close; + accumulator.call_tcp_v4_connection += w->tcp.ipv4_connect; + accumulator.call_tcp_v6_connection += w->tcp.ipv6_connect; + accumulator.call_udp_received += w->udp.call_udp_received; + accumulator.call_udp_sent += w->udp.call_udp_sent; pids = pids->next; } @@ -2902,15 +2477,6 @@ static void socket_collector(ebpf_module_t *em) { heartbeat_t hb; heartbeat_init(&hb); - uint32_t network_connection = network_viewer_opt.enabled; - - if (network_connection) { - socket_threads.thread = mallocz(sizeof(netdata_thread_t)); - socket_threads.start_routine = ebpf_socket_read_hash; - - netdata_thread_create(socket_threads.thread, socket_threads.name, - NETDATA_THREAD_OPTION_DEFAULT, ebpf_socket_read_hash, em); - } int cgroups = em->cgroup_charts; if (cgroups) @@ -2924,9 +2490,9 @@ static void socket_collector(ebpf_module_t *em) uint32_t lifetime = em->lifetime; netdata_idx_t *stats = em->hash_table_stats; memset(stats, 0, sizeof(em->hash_table_stats)); - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != update_every) + if (ebpf_plugin_exit || ++counter != update_every) continue; counter 
= 0; @@ -2937,14 +2503,8 @@ static void socket_collector(ebpf_module_t *em) } pthread_mutex_lock(&collect_data_mutex); - if (socket_apps_enabled) - ebpf_socket_update_apps_data(maps_per_core); - if (cgroups) - ebpf_update_socket_cgroup(maps_per_core); - - if (network_connection) - calculate_nv_plot(); + ebpf_update_socket_cgroup(); pthread_mutex_lock(&lock); if (socket_global_enabled) @@ -2963,20 +2523,6 @@ static void socket_collector(ebpf_module_t *em) fflush(stdout); - if (network_connection) { - // We are calling fflush many times, because when we have a lot of dimensions - // we began to have not expected outputs and Netdata closed the plugin. - pthread_mutex_lock(&nv_mutex); - ebpf_socket_create_nv_charts(&inbound_vectors, update_every); - fflush(stdout); - ebpf_socket_send_nv_data(&inbound_vectors); - - ebpf_socket_create_nv_charts(&outbound_vectors, update_every); - fflush(stdout); - ebpf_socket_send_nv_data(&outbound_vectors); - pthread_mutex_unlock(&nv_mutex); - - } pthread_mutex_unlock(&lock); pthread_mutex_unlock(&collect_data_mutex); @@ -2998,42 +2544,24 @@ static void socket_collector(ebpf_module_t *em) *****************************************************************/ /** - * Allocate vectors used with this thread. + * Initialize vectors used with this thread. + * * We are not testing the return, because callocz does this and shutdown the software * case it was not possible to allocate. - * - * @param apps is apps enabled? 
*/ -static void ebpf_socket_allocate_global_vectors(int apps) +static void ebpf_socket_initialize_global_vectors() { memset(socket_aggregated_data, 0 ,NETDATA_MAX_SOCKET_VECTOR * sizeof(netdata_syscall_stat_t)); memset(socket_publish_aggregated, 0 ,NETDATA_MAX_SOCKET_VECTOR * sizeof(netdata_publish_syscall_t)); socket_hash_values = callocz(ebpf_nprocs, sizeof(netdata_idx_t)); - if (apps) { - ebpf_socket_aral_init(); - socket_bandwidth_curr = callocz((size_t)pid_max, sizeof(ebpf_socket_publish_apps_t *)); - bandwidth_vector = callocz((size_t)ebpf_nprocs, sizeof(ebpf_bandwidth_t)); - } + ebpf_socket_aral_init(); + socket_bandwidth_curr = callocz((size_t)pid_max, sizeof(ebpf_socket_publish_apps_t *)); - socket_values = callocz((size_t)ebpf_nprocs, sizeof(netdata_socket_t)); - if (network_viewer_opt.enabled) { - inbound_vectors.plot = callocz(network_viewer_opt.max_dim, sizeof(netdata_socket_plot_t)); - outbound_vectors.plot = callocz(network_viewer_opt.max_dim, sizeof(netdata_socket_plot_t)); - } -} + aral_socket_table = ebpf_allocate_pid_aral(NETDATA_EBPF_SOCKET_ARAL_TABLE_NAME, + sizeof(netdata_socket_plus_t)); -/** - * Initialize Inbound and Outbound - * - * Initialize the common outbound and inbound sockets. - */ -static void initialize_inbound_outbound() -{ - inbound_vectors.last = network_viewer_opt.max_dim - 1; - outbound_vectors.last = inbound_vectors.last; - fill_last_nv_dimension(&inbound_vectors.plot[inbound_vectors.last], 0); - fill_last_nv_dimension(&outbound_vectors.plot[outbound_vectors.last], 1); + socket_values = callocz((size_t)ebpf_nprocs, sizeof(netdata_socket_t)); } /***************************************************************** @@ -3043,793 +2571,6 @@ static void initialize_inbound_outbound() *****************************************************************/ /** - * Fill Port list - * - * @param out a pointer to the link list. - * @param in the structure that will be linked. 
- */ -static inline void fill_port_list(ebpf_network_viewer_port_list_t **out, ebpf_network_viewer_port_list_t *in) -{ - if (likely(*out)) { - ebpf_network_viewer_port_list_t *move = *out, *store = *out; - uint16_t first = ntohs(in->first); - uint16_t last = ntohs(in->last); - while (move) { - uint16_t cmp_first = ntohs(move->first); - uint16_t cmp_last = ntohs(move->last); - if (cmp_first <= first && first <= cmp_last && - cmp_first <= last && last <= cmp_last ) { - netdata_log_info("The range/value (%u, %u) is inside the range/value (%u, %u) already inserted, it will be ignored.", - first, last, cmp_first, cmp_last); - freez(in->value); - freez(in); - return; - } else if (first <= cmp_first && cmp_first <= last && - first <= cmp_last && cmp_last <= last) { - netdata_log_info("The range (%u, %u) is bigger than previous range (%u, %u) already inserted, the previous will be ignored.", - first, last, cmp_first, cmp_last); - freez(move->value); - move->value = in->value; - move->first = in->first; - move->last = in->last; - freez(in); - return; - } - - store = move; - move = move->next; - } - - store->next = in; - } else { - *out = in; - } - -#ifdef NETDATA_INTERNAL_CHECKS - netdata_log_info("Adding values %s( %u, %u) to %s port list used on network viewer", - in->value, ntohs(in->first), ntohs(in->last), - (*out == network_viewer_opt.included_port)?"included":"excluded"); -#endif -} - -/** - * Parse Service List - * - * @param out a pointer to store the link list - * @param service the service used to create the structure that will be linked. 
- */ -static void parse_service_list(void **out, char *service) -{ - ebpf_network_viewer_port_list_t **list = (ebpf_network_viewer_port_list_t **)out; - struct servent *serv = getservbyname((const char *)service, "tcp"); - if (!serv) - serv = getservbyname((const char *)service, "udp"); - - if (!serv) { - netdata_log_info("Cannot resolv the service '%s' with protocols TCP and UDP, it will be ignored", service); - return; - } - - ebpf_network_viewer_port_list_t *w = callocz(1, sizeof(ebpf_network_viewer_port_list_t)); - w->value = strdupz(service); - w->hash = simple_hash(service); - - w->first = w->last = (uint16_t)serv->s_port; - - fill_port_list(list, w); -} - -/** - * Netmask - * - * Copied from iprange (https://github.com/firehol/iprange/blob/master/iprange.h) - * - * @param prefix create the netmask based in the CIDR value. - * - * @return - */ -static inline in_addr_t netmask(int prefix) { - - if (prefix == 0) - return (~((in_addr_t) - 1)); - else - return (in_addr_t)(~((1 << (32 - prefix)) - 1)); - -} - -/** - * Broadcast - * - * Copied from iprange (https://github.com/firehol/iprange/blob/master/iprange.h) - * - * @param addr is the ip address - * @param prefix is the CIDR value. - * - * @return It returns the last address of the range - */ -static inline in_addr_t broadcast(in_addr_t addr, int prefix) -{ - return (addr | ~netmask(prefix)); -} - -/** - * Network - * - * Copied from iprange (https://github.com/firehol/iprange/blob/master/iprange.h) - * - * @param addr is the ip address - * @param prefix is the CIDR value. - * - * @return It returns the first address of the range. - */ -static inline in_addr_t ipv4_network(in_addr_t addr, int prefix) -{ - return (addr & netmask(prefix)); -} - -/** - * IP to network long - * - * @param dst the vector to store the result - * @param ip the source ip given by our users. - * @param domain the ip domain (IPV4 or IPV6) - * @param source the original string - * - * @return it returns 0 on success and -1 otherwise. 
- */ -static inline int ip2nl(uint8_t *dst, char *ip, int domain, char *source) -{ - if (inet_pton(domain, ip, dst) <= 0) { - netdata_log_error("The address specified (%s) is invalid ", source); - return -1; - } - - return 0; -} - -/** - * Get IPV6 Last Address - * - * @param out the address to store the last address. - * @param in the address used to do the math. - * @param prefix number of bits used to calculate the address - */ -static void get_ipv6_last_addr(union netdata_ip_t *out, union netdata_ip_t *in, uint64_t prefix) -{ - uint64_t mask,tmp; - uint64_t ret[2]; - memcpy(ret, in->addr32, sizeof(union netdata_ip_t)); - - if (prefix == 128) { - memcpy(out->addr32, in->addr32, sizeof(union netdata_ip_t)); - return; - } else if (!prefix) { - ret[0] = ret[1] = 0xFFFFFFFFFFFFFFFF; - memcpy(out->addr32, ret, sizeof(union netdata_ip_t)); - return; - } else if (prefix <= 64) { - ret[1] = 0xFFFFFFFFFFFFFFFFULL; - - tmp = be64toh(ret[0]); - if (prefix > 0) { - mask = 0xFFFFFFFFFFFFFFFFULL << (64 - prefix); - tmp |= ~mask; - } - ret[0] = htobe64(tmp); - } else { - mask = 0xFFFFFFFFFFFFFFFFULL << (128 - prefix); - tmp = be64toh(ret[1]); - tmp |= ~mask; - ret[1] = htobe64(tmp); - } - - memcpy(out->addr32, ret, sizeof(union netdata_ip_t)); -} - -/** - * Calculate ipv6 first address - * - * @param out the address to store the first address. - * @param in the address used to do the math. 
- * @param prefix number of bits used to calculate the address - */ -static void get_ipv6_first_addr(union netdata_ip_t *out, union netdata_ip_t *in, uint64_t prefix) -{ - uint64_t mask,tmp; - uint64_t ret[2]; - - memcpy(ret, in->addr32, sizeof(union netdata_ip_t)); - - if (prefix == 128) { - memcpy(out->addr32, in->addr32, sizeof(union netdata_ip_t)); - return; - } else if (!prefix) { - ret[0] = ret[1] = 0; - memcpy(out->addr32, ret, sizeof(union netdata_ip_t)); - return; - } else if (prefix <= 64) { - ret[1] = 0ULL; - - tmp = be64toh(ret[0]); - if (prefix > 0) { - mask = 0xFFFFFFFFFFFFFFFFULL << (64 - prefix); - tmp &= mask; - } - ret[0] = htobe64(tmp); - } else { - mask = 0xFFFFFFFFFFFFFFFFULL << (128 - prefix); - tmp = be64toh(ret[1]); - tmp &= mask; - ret[1] = htobe64(tmp); - } - - memcpy(out->addr32, ret, sizeof(union netdata_ip_t)); -} - -/** - * Is ip inside the range - * - * Check if the ip is inside a IP range - * - * @param rfirst the first ip address of the range - * @param rlast the last ip address of the range - * @param cmpfirst the first ip to compare - * @param cmplast the last ip to compare - * @param family the IP family - * - * @return It returns 1 if the IP is inside the range and 0 otherwise - */ -static int ebpf_is_ip_inside_range(union netdata_ip_t *rfirst, union netdata_ip_t *rlast, - union netdata_ip_t *cmpfirst, union netdata_ip_t *cmplast, int family) -{ - if (family == AF_INET) { - if ((rfirst->addr32[0] <= cmpfirst->addr32[0]) && (rlast->addr32[0] >= cmplast->addr32[0])) - return 1; - } else { - if (memcmp(rfirst->addr8, cmpfirst->addr8, sizeof(union netdata_ip_t)) <= 0 && - memcmp(rlast->addr8, cmplast->addr8, sizeof(union netdata_ip_t)) >= 0) { - return 1; - } - - } - return 0; -} - -/** - * Fill IP list - * - * @param out a pointer to the link list. - * @param in the structure that will be linked. - * @param table the modified table. 
- */ -void ebpf_fill_ip_list(ebpf_network_viewer_ip_list_t **out, ebpf_network_viewer_ip_list_t *in, char *table) -{ -#ifndef NETDATA_INTERNAL_CHECKS - UNUSED(table); -#endif - if (in->ver == AF_INET) { // It is simpler to compare using host order - in->first.addr32[0] = ntohl(in->first.addr32[0]); - in->last.addr32[0] = ntohl(in->last.addr32[0]); - } - if (likely(*out)) { - ebpf_network_viewer_ip_list_t *move = *out, *store = *out; - while (move) { - if (in->ver == move->ver && - ebpf_is_ip_inside_range(&move->first, &move->last, &in->first, &in->last, in->ver)) { - netdata_log_info("The range/value (%s) is inside the range/value (%s) already inserted, it will be ignored.", - in->value, move->value); - freez(in->value); - freez(in); - return; - } - store = move; - move = move->next; - } - - store->next = in; - } else { - *out = in; - } - -#ifdef NETDATA_INTERNAL_CHECKS - char first[256], last[512]; - if (in->ver == AF_INET) { - netdata_log_info("Adding values %s: (%u - %u) to %s IP list \"%s\" used on network viewer", - in->value, in->first.addr32[0], in->last.addr32[0], - (*out == network_viewer_opt.included_ips)?"included":"excluded", - table); - } else { - if (inet_ntop(AF_INET6, in->first.addr8, first, INET6_ADDRSTRLEN) && - inet_ntop(AF_INET6, in->last.addr8, last, INET6_ADDRSTRLEN)) - netdata_log_info("Adding values %s - %s to %s IP list \"%s\" used on network viewer", - first, last, - (*out == network_viewer_opt.included_ips)?"included":"excluded", - table); - } -#endif -} - -/** - * Parse IP List - * - * Parse IP list and link it. 
- * - * @param out a pointer to store the link list - * @param ip the value given as parameter - */ -static void ebpf_parse_ip_list(void **out, char *ip) -{ - ebpf_network_viewer_ip_list_t **list = (ebpf_network_viewer_ip_list_t **)out; - - char *ipdup = strdupz(ip); - union netdata_ip_t first = { }; - union netdata_ip_t last = { }; - char *is_ipv6; - if (*ip == '*' && *(ip+1) == '\0') { - memset(first.addr8, 0, sizeof(first.addr8)); - memset(last.addr8, 0xFF, sizeof(last.addr8)); - - is_ipv6 = ip; - - clean_ip_structure(list); - goto storethisip; - } - - char *end = ip; - // Move while I cannot find a separator - while (*end && *end != '/' && *end != '-') end++; - - // We will use only the classic IPV6 for while, but we could consider the base 85 in a near future - // https://tools.ietf.org/html/rfc1924 - is_ipv6 = strchr(ip, ':'); - - int select; - if (*end && !is_ipv6) { // IPV4 range - select = (*end == '/') ? 0 : 1; - *end++ = '\0'; - if (*end == '!') { - netdata_log_info("The exclusion cannot be in the second part of the range %s, it will be ignored.", ipdup); - goto cleanipdup; - } - - if (!select) { // CIDR - select = ip2nl(first.addr8, ip, AF_INET, ipdup); - if (select) - goto cleanipdup; - - select = (int) str2i(end); - if (select < NETDATA_MINIMUM_IPV4_CIDR || select > NETDATA_MAXIMUM_IPV4_CIDR) { - netdata_log_info("The specified CIDR %s is not valid, the IP %s will be ignored.", end, ip); - goto cleanipdup; - } - - last.addr32[0] = htonl(broadcast(ntohl(first.addr32[0]), select)); - // This was added to remove - // https://app.codacy.com/manual/netdata/netdata/pullRequest?prid=5810941&bid=19021977 - UNUSED(last.addr32[0]); - - uint32_t ipv4_test = htonl(ipv4_network(ntohl(first.addr32[0]), select)); - if (first.addr32[0] != ipv4_test) { - first.addr32[0] = ipv4_test; - struct in_addr ipv4_convert; - ipv4_convert.s_addr = ipv4_test; - char ipv4_msg[INET_ADDRSTRLEN]; - if(inet_ntop(AF_INET, &ipv4_convert, ipv4_msg, INET_ADDRSTRLEN)) - 
netdata_log_info("The network value of CIDR %s was updated for %s .", ipdup, ipv4_msg); - } - } else { // Range - select = ip2nl(first.addr8, ip, AF_INET, ipdup); - if (select) - goto cleanipdup; - - select = ip2nl(last.addr8, end, AF_INET, ipdup); - if (select) - goto cleanipdup; - } - - if (htonl(first.addr32[0]) > htonl(last.addr32[0])) { - netdata_log_info("The specified range %s is invalid, the second address is smallest than the first, it will be ignored.", - ipdup); - goto cleanipdup; - } - } else if (is_ipv6) { // IPV6 - if (!*end) { // Unique - select = ip2nl(first.addr8, ip, AF_INET6, ipdup); - if (select) - goto cleanipdup; - - memcpy(last.addr8, first.addr8, sizeof(first.addr8)); - } else if (*end == '-') { - *end++ = 0x00; - if (*end == '!') { - netdata_log_info("The exclusion cannot be in the second part of the range %s, it will be ignored.", ipdup); - goto cleanipdup; - } - - select = ip2nl(first.addr8, ip, AF_INET6, ipdup); - if (select) - goto cleanipdup; - - select = ip2nl(last.addr8, end, AF_INET6, ipdup); - if (select) - goto cleanipdup; - } else { // CIDR - *end++ = 0x00; - if (*end == '!') { - netdata_log_info("The exclusion cannot be in the second part of the range %s, it will be ignored.", ipdup); - goto cleanipdup; - } - - select = str2i(end); - if (select < 0 || select > 128) { - netdata_log_info("The CIDR %s is not valid, the address %s will be ignored.", end, ip); - goto cleanipdup; - } - - uint64_t prefix = (uint64_t)select; - select = ip2nl(first.addr8, ip, AF_INET6, ipdup); - if (select) - goto cleanipdup; - - get_ipv6_last_addr(&last, &first, prefix); - - union netdata_ip_t ipv6_test; - get_ipv6_first_addr(&ipv6_test, &first, prefix); - - if (memcmp(first.addr8, ipv6_test.addr8, sizeof(union netdata_ip_t)) != 0) { - memcpy(first.addr8, ipv6_test.addr8, sizeof(union netdata_ip_t)); - - struct in6_addr ipv6_convert; - memcpy(ipv6_convert.s6_addr, ipv6_test.addr8, sizeof(union netdata_ip_t)); - - char ipv6_msg[INET6_ADDRSTRLEN]; - 
if(inet_ntop(AF_INET6, &ipv6_convert, ipv6_msg, INET6_ADDRSTRLEN)) - netdata_log_info("The network value of CIDR %s was updated for %s .", ipdup, ipv6_msg); - } - } - - if ((be64toh(*(uint64_t *)&first.addr32[2]) > be64toh(*(uint64_t *)&last.addr32[2]) && - !memcmp(first.addr32, last.addr32, 2*sizeof(uint32_t))) || - (be64toh(*(uint64_t *)&first.addr32) > be64toh(*(uint64_t *)&last.addr32)) ) { - netdata_log_info("The specified range %s is invalid, the second address is smallest than the first, it will be ignored.", - ipdup); - goto cleanipdup; - } - } else { // Unique ip - select = ip2nl(first.addr8, ip, AF_INET, ipdup); - if (select) - goto cleanipdup; - - memcpy(last.addr8, first.addr8, sizeof(first.addr8)); - } - - ebpf_network_viewer_ip_list_t *store; - -storethisip: - store = callocz(1, sizeof(ebpf_network_viewer_ip_list_t)); - store->value = ipdup; - store->hash = simple_hash(ipdup); - store->ver = (uint8_t)(!is_ipv6)?AF_INET:AF_INET6; - memcpy(store->first.addr8, first.addr8, sizeof(first.addr8)); - memcpy(store->last.addr8, last.addr8, sizeof(last.addr8)); - - ebpf_fill_ip_list(list, store, "socket"); - return; - -cleanipdup: - freez(ipdup); -} - -/** - * Parse IP Range - * - * Parse the IP ranges given and create Network Viewer IP Structure - * - * @param ptr is a pointer with the text to parse. 
- */ -static void ebpf_parse_ips(char *ptr) -{ - // No value - if (unlikely(!ptr)) - return; - - while (likely(ptr)) { - // Move forward until next valid character - while (isspace(*ptr)) ptr++; - - // No valid value found - if (unlikely(!*ptr)) - return; - - // Find space that ends the list - char *end = strchr(ptr, ' '); - if (end) { - *end++ = '\0'; - } - - int neg = 0; - if (*ptr == '!') { - neg++; - ptr++; - } - - if (isascii(*ptr)) { // Parse port - ebpf_parse_ip_list((!neg)?(void **)&network_viewer_opt.included_ips: - (void **)&network_viewer_opt.excluded_ips, - ptr); - } - - ptr = end; - } -} - - - -/** - * Parse port list - * - * Parse an allocated port list with the range given - * - * @param out a pointer to store the link list - * @param range the informed range for the user. - */ -static void parse_port_list(void **out, char *range) -{ - int first, last; - ebpf_network_viewer_port_list_t **list = (ebpf_network_viewer_port_list_t **)out; - - char *copied = strdupz(range); - if (*range == '*' && *(range+1) == '\0') { - first = 1; - last = 65535; - - clean_port_structure(list); - goto fillenvpl; - } - - char *end = range; - //Move while I cannot find a separator - while (*end && *end != ':' && *end != '-') end++; - - //It has a range - if (likely(*end)) { - *end++ = '\0'; - if (*end == '!') { - netdata_log_info("The exclusion cannot be in the second part of the range, the range %s will be ignored.", copied); - freez(copied); - return; - } - last = str2i((const char *)end); - } else { - last = 0; - } - - first = str2i((const char *)range); - if (first < NETDATA_MINIMUM_PORT_VALUE || first > NETDATA_MAXIMUM_PORT_VALUE) { - netdata_log_info("The first port %d of the range \"%s\" is invalid and it will be ignored!", first, copied); - freez(copied); - return; - } - - if (!last) - last = first; - - if (last < NETDATA_MINIMUM_PORT_VALUE || last > NETDATA_MAXIMUM_PORT_VALUE) { - netdata_log_info("The second port %d of the range \"%s\" is invalid and the whole 
range will be ignored!", last, copied); - freez(copied); - return; - } - - if (first > last) { - netdata_log_info("The specified order %s is wrong, the smallest value is always the first, it will be ignored!", copied); - freez(copied); - return; - } - - ebpf_network_viewer_port_list_t *w; -fillenvpl: - w = callocz(1, sizeof(ebpf_network_viewer_port_list_t)); - w->value = copied; - w->hash = simple_hash(copied); - w->first = (uint16_t)htons((uint16_t)first); - w->last = (uint16_t)htons((uint16_t)last); - w->cmp_first = (uint16_t)first; - w->cmp_last = (uint16_t)last; - - fill_port_list(list, w); -} - -/** - * Read max dimension. - * - * Netdata plot two dimensions per connection, so it is necessary to adjust the values. - * - * @param cfg the configuration structure - */ -static void read_max_dimension(struct config *cfg) -{ - int maxdim ; - maxdim = (int) appconfig_get_number(cfg, - EBPF_NETWORK_VIEWER_SECTION, - EBPF_MAXIMUM_DIMENSIONS, - NETDATA_NV_CAP_VALUE); - if (maxdim < 0) { - netdata_log_error("'maximum dimensions = %d' must be a positive number, Netdata will change for default value %ld.", - maxdim, NETDATA_NV_CAP_VALUE); - maxdim = NETDATA_NV_CAP_VALUE; - } - - maxdim /= 2; - if (!maxdim) { - netdata_log_info("The number of dimensions is too small (%u), we are setting it to minimum 2", network_viewer_opt.max_dim); - network_viewer_opt.max_dim = 1; - return; - } - - network_viewer_opt.max_dim = (uint32_t)maxdim; -} - -/** - * Parse Port Range - * - * Parse the port ranges given and create Network Viewer Port Structure - * - * @param ptr is a pointer with the text to parse. 
- */ -static void parse_ports(char *ptr) -{ - // No value - if (unlikely(!ptr)) - return; - - while (likely(ptr)) { - // Move forward until next valid character - while (isspace(*ptr)) ptr++; - - // No valid value found - if (unlikely(!*ptr)) - return; - - // Find space that ends the list - char *end = strchr(ptr, ' '); - if (end) { - *end++ = '\0'; - } - - int neg = 0; - if (*ptr == '!') { - neg++; - ptr++; - } - - if (isdigit(*ptr)) { // Parse port - parse_port_list((!neg)?(void **)&network_viewer_opt.included_port:(void **)&network_viewer_opt.excluded_port, - ptr); - } else if (isalpha(*ptr)) { // Parse service - parse_service_list((!neg)?(void **)&network_viewer_opt.included_port:(void **)&network_viewer_opt.excluded_port, - ptr); - } else if (*ptr == '*') { // All - parse_port_list((!neg)?(void **)&network_viewer_opt.included_port:(void **)&network_viewer_opt.excluded_port, - ptr); - } - - ptr = end; - } -} - -/** - * Link hostname - * - * @param out is the output link list - * @param in the hostname to add to list. - */ -static void link_hostname(ebpf_network_viewer_hostname_list_t **out, ebpf_network_viewer_hostname_list_t *in) -{ - if (likely(*out)) { - ebpf_network_viewer_hostname_list_t *move = *out; - for (; move->next ; move = move->next ) { - if (move->hash == in->hash && !strcmp(move->value, in->value)) { - netdata_log_info("The hostname %s was already inserted, it will be ignored.", in->value); - freez(in->value); - simple_pattern_free(in->value_pattern); - freez(in); - return; - } - } - - move->next = in; - } else { - *out = in; - } -#ifdef NETDATA_INTERNAL_CHECKS - netdata_log_info("Adding value %s to %s hostname list used on network viewer", - in->value, - (*out == network_viewer_opt.included_hostnames)?"included":"excluded"); -#endif -} - -/** - * Link Hostnames - * - * Parse the list of hostnames to create the link list. - * This is not associated with the IP, because simple patterns like *example* cannot be resolved to IP. 
- * - * @param out is the output link list - * @param parse is a pointer with the text to parser. - */ -static void link_hostnames(char *parse) -{ - // No value - if (unlikely(!parse)) - return; - - while (likely(parse)) { - // Find the first valid value - while (isspace(*parse)) parse++; - - // No valid value found - if (unlikely(!*parse)) - return; - - // Find space that ends the list - char *end = strchr(parse, ' '); - if (end) { - *end++ = '\0'; - } - - int neg = 0; - if (*parse == '!') { - neg++; - parse++; - } - - ebpf_network_viewer_hostname_list_t *hostname = callocz(1 , sizeof(ebpf_network_viewer_hostname_list_t)); - hostname->value = strdupz(parse); - hostname->hash = simple_hash(parse); - hostname->value_pattern = simple_pattern_create(parse, NULL, SIMPLE_PATTERN_EXACT, true); - - link_hostname((!neg)?&network_viewer_opt.included_hostnames:&network_viewer_opt.excluded_hostnames, - hostname); - - parse = end; - } -} - -/** - * Parse network viewer section - * - * @param cfg the configuration structure - */ -void parse_network_viewer_section(struct config *cfg) -{ - read_max_dimension(cfg); - - network_viewer_opt.hostname_resolution_enabled = appconfig_get_boolean(cfg, - EBPF_NETWORK_VIEWER_SECTION, - EBPF_CONFIG_RESOLVE_HOSTNAME, - CONFIG_BOOLEAN_NO); - - network_viewer_opt.service_resolution_enabled = appconfig_get_boolean(cfg, - EBPF_NETWORK_VIEWER_SECTION, - EBPF_CONFIG_RESOLVE_SERVICE, - CONFIG_BOOLEAN_NO); - - char *value = appconfig_get(cfg, EBPF_NETWORK_VIEWER_SECTION, EBPF_CONFIG_PORTS, NULL); - parse_ports(value); - - if (network_viewer_opt.hostname_resolution_enabled) { - value = appconfig_get(cfg, EBPF_NETWORK_VIEWER_SECTION, EBPF_CONFIG_HOSTNAMES, NULL); - link_hostnames(value); - } else { - netdata_log_info("Name resolution is disabled, collector will not parser \"hostnames\" list."); - } - - value = appconfig_get(cfg, EBPF_NETWORK_VIEWER_SECTION, - "ips", "!127.0.0.1/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 fc00::/7 !::1/128"); - 
ebpf_parse_ips(value); -} - -/** * Link dimension name * * Link user specified names inside a link list. @@ -3838,7 +2579,7 @@ void parse_network_viewer_section(struct config *cfg) * @param hash the calculated hash for the dimension name. * @param name the dimension name. */ -static void link_dimension_name(char *port, uint32_t hash, char *value) +static void ebpf_link_dimension_name(char *port, uint32_t hash, char *value) { int test = str2i(port); if (test < NETDATA_MINIMUM_PORT_VALUE || test > NETDATA_MAXIMUM_PORT_VALUE){ @@ -3883,13 +2624,13 @@ static void link_dimension_name(char *port, uint32_t hash, char *value) * * @param cfg the configuration structure */ -void parse_service_name_section(struct config *cfg) +void ebpf_parse_service_name_section(struct config *cfg) { struct section *co = appconfig_get_section(cfg, EBPF_SERVICE_NAME_SECTION); if (co) { struct config_option *cv; for (cv = co->values; cv ; cv = cv->next) { - link_dimension_name(cv->name, cv->hash, cv->value); + ebpf_link_dimension_name(cv->name, cv->hash, cv->value); } } @@ -3910,23 +2651,21 @@ void parse_service_name_section(struct config *cfg) // if variable has an invalid value, we assume netdata is using 19999 int default_port = str2i(port_string); if (default_port > 0 && default_port < 65536) - link_dimension_name(port_string, simple_hash(port_string), "Netdata"); + ebpf_link_dimension_name(port_string, simple_hash(port_string), "Netdata"); } } +/** + * Parse table size options + * + * @param cfg configuration options read from user file. 
+ */ void parse_table_size_options(struct config *cfg) { - socket_maps[NETDATA_SOCKET_TABLE_BANDWIDTH].user_input = (uint32_t) appconfig_get_number(cfg, - EBPF_GLOBAL_SECTION, - EBPF_CONFIG_BANDWIDTH_SIZE, NETDATA_MAXIMUM_CONNECTIONS_ALLOWED); - - socket_maps[NETDATA_SOCKET_TABLE_IPV4].user_input = (uint32_t) appconfig_get_number(cfg, - EBPF_GLOBAL_SECTION, - EBPF_CONFIG_IPV4_SIZE, NETDATA_MAXIMUM_CONNECTIONS_ALLOWED); - - socket_maps[NETDATA_SOCKET_TABLE_IPV6].user_input = (uint32_t) appconfig_get_number(cfg, - EBPF_GLOBAL_SECTION, - EBPF_CONFIG_IPV6_SIZE, NETDATA_MAXIMUM_CONNECTIONS_ALLOWED); + socket_maps[NETDATA_SOCKET_OPEN_SOCKET].user_input = (uint32_t) appconfig_get_number(cfg, + EBPF_GLOBAL_SECTION, + EBPF_CONFIG_SOCKET_MONITORING_SIZE, + NETDATA_MAXIMUM_CONNECTIONS_ALLOWED); socket_maps[NETDATA_SOCKET_TABLE_UDP].user_input = (uint32_t) appconfig_get_number(cfg, EBPF_GLOBAL_SECTION, @@ -3965,7 +2704,7 @@ static int ebpf_socket_load_bpf(ebpf_module_t *em) #endif if (ret) { - netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->thread_name); + netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->info.thread_name); } return ret; @@ -3985,25 +2724,23 @@ void *ebpf_socket_thread(void *ptr) netdata_thread_cleanup_push(ebpf_socket_exit, ptr); ebpf_module_t *em = (ebpf_module_t *)ptr; - em->maps = socket_maps; - - parse_table_size_options(&socket_config); - - if (pthread_mutex_init(&nv_mutex, NULL)) { - netdata_log_error("Cannot initialize local mutex"); - goto endsocket; + if (em->enabled > NETDATA_THREAD_EBPF_FUNCTION_RUNNING) { + collector_error("There is already a thread %s running", em->info.thread_name); + return NULL; } - ebpf_socket_allocate_global_vectors(em->apps_charts); + em->maps = socket_maps; - if (network_viewer_opt.enabled) { - memset(&inbound_vectors.tree, 0, sizeof(avl_tree_lock)); - memset(&outbound_vectors.tree, 0, sizeof(avl_tree_lock)); - avl_init_lock(&inbound_vectors.tree, ebpf_compare_sockets); - avl_init_lock(&outbound_vectors.tree, 
ebpf_compare_sockets); + rw_spinlock_write_lock(&network_viewer_opt.rw_spinlock); + // It was not enabled from main config file (ebpf.d.conf) + if (!network_viewer_opt.enabled) + network_viewer_opt.enabled = appconfig_get_boolean(&socket_config, EBPF_NETWORK_VIEWER_SECTION, "enabled", + CONFIG_BOOLEAN_YES); + rw_spinlock_write_unlock(&network_viewer_opt.rw_spinlock); - initialize_inbound_outbound(); - } + parse_table_size_options(&socket_config); + + ebpf_socket_initialize_global_vectors(); if (running_on_kernel < NETDATA_EBPF_KERNEL_5_0) em->mode = MODE_ENTRY; @@ -4026,8 +2763,15 @@ void *ebpf_socket_thread(void *ptr) socket_aggregated_data, socket_publish_aggregated, socket_dimension_names, socket_id_names, algorithms, NETDATA_MAX_SOCKET_VECTOR); + ebpf_read_socket.thread = mallocz(sizeof(netdata_thread_t)); + netdata_thread_create(ebpf_read_socket.thread, + ebpf_read_socket.name, + NETDATA_THREAD_OPTION_DEFAULT, + ebpf_read_socket_thread, + em); + pthread_mutex_lock(&lock); - ebpf_create_global_charts(em); + ebpf_socket_create_global_charts(em); ebpf_update_stats(&plugin_statistics, em); ebpf_update_kernel_memory_with_vector(&plugin_statistics, em->maps, EBPF_ACTION_STAT_ADD); diff --git a/collectors/ebpf.plugin/ebpf_socket.h b/collectors/ebpf.plugin/ebpf_socket.h index ae2ee28ab..fb2404c24 100644 --- a/collectors/ebpf.plugin/ebpf_socket.h +++ b/collectors/ebpf.plugin/ebpf_socket.h @@ -4,6 +4,11 @@ #include <stdint.h> #include "libnetdata/avl/avl.h" +#include <sys/socket.h> +#ifdef HAVE_NETDB_H +#include <netdb.h> +#endif + // Module name & description #define NETDATA_EBPF_MODULE_NAME_SOCKET "socket" #define NETDATA_EBPF_SOCKET_MODULE_DESC "Monitors TCP and UDP bandwidth. This thread is integrated with apps and cgroup." 
@@ -11,8 +16,6 @@ // Vector indexes #define NETDATA_UDP_START 3 -#define NETDATA_SOCKET_READ_SLEEP_MS 800000ULL - // config file #define NETDATA_NETWORK_CONFIG_FILE "network.conf" #define EBPF_NETWORK_VIEWER_SECTION "network connections" @@ -21,18 +24,13 @@ #define EBPF_CONFIG_RESOLVE_SERVICE "resolve service names" #define EBPF_CONFIG_PORTS "ports" #define EBPF_CONFIG_HOSTNAMES "hostnames" -#define EBPF_CONFIG_BANDWIDTH_SIZE "bandwidth table size" -#define EBPF_CONFIG_IPV4_SIZE "ipv4 connection table size" -#define EBPF_CONFIG_IPV6_SIZE "ipv6 connection table size" +#define EBPF_CONFIG_SOCKET_MONITORING_SIZE "socket monitoring table size" #define EBPF_CONFIG_UDP_SIZE "udp connection table size" -#define EBPF_MAXIMUM_DIMENSIONS "maximum dimensions" enum ebpf_socket_table_list { - NETDATA_SOCKET_TABLE_BANDWIDTH, NETDATA_SOCKET_GLOBAL, NETDATA_SOCKET_LPORTS, - NETDATA_SOCKET_TABLE_IPV4, - NETDATA_SOCKET_TABLE_IPV6, + NETDATA_SOCKET_OPEN_SOCKET, NETDATA_SOCKET_TABLE_UDP, NETDATA_SOCKET_TABLE_CTRL }; @@ -122,13 +120,6 @@ typedef enum ebpf_socket_idx { #define NETDATA_NET_APPS_BANDWIDTH_UDP_SEND_CALLS "bandwidth_udp_send" #define NETDATA_NET_APPS_BANDWIDTH_UDP_RECV_CALLS "bandwidth_udp_recv" -// Network viewer charts -#define NETDATA_NV_OUTBOUND_BYTES "outbound_bytes" -#define NETDATA_NV_OUTBOUND_PACKETS "outbound_packets" -#define NETDATA_NV_OUTBOUND_RETRANSMIT "outbound_retransmit" -#define NETDATA_NV_INBOUND_BYTES "inbound_bytes" -#define NETDATA_NV_INBOUND_PACKETS "inbound_packets" - // Port range #define NETDATA_MINIMUM_PORT_VALUE 1 #define NETDATA_MAXIMUM_PORT_VALUE 65535 @@ -163,6 +154,8 @@ typedef enum ebpf_socket_idx { // ARAL name #define NETDATA_EBPF_SOCKET_ARAL_NAME "ebpf_socket" +#define NETDATA_EBPF_PID_SOCKET_ARAL_TABLE_NAME "ebpf_pid_socket" +#define NETDATA_EBPF_SOCKET_ARAL_TABLE_NAME "ebpf_socket_tbl" typedef struct ebpf_socket_publish_apps { // Data read @@ -246,10 +239,11 @@ typedef struct ebpf_network_viewer_hostname_list { struct 
ebpf_network_viewer_hostname_list *next; } ebpf_network_viewer_hostname_list_t; -#define NETDATA_NV_CAP_VALUE 50L typedef struct ebpf_network_viewer_options { + RW_SPINLOCK rw_spinlock; + uint32_t enabled; - uint32_t max_dim; // Store value read from 'maximum dimensions' + uint32_t family; // AF_INET, AF_INET6 or AF_UNSPEC (both) uint32_t hostname_resolution_enabled; uint32_t service_resolution_enabled; @@ -275,98 +269,82 @@ extern ebpf_network_viewer_options_t network_viewer_opt; * Structure to store socket information */ typedef struct netdata_socket { - uint64_t recv_packets; - uint64_t sent_packets; - uint64_t recv_bytes; - uint64_t sent_bytes; - uint64_t first; // First timestamp - uint64_t ct; // Current timestamp - uint32_t retransmit; // It is never used with UDP + // Timestamp + uint64_t first_timestamp; + uint64_t current_timestamp; + // Socket additional info uint16_t protocol; - uint16_t reserved; + uint16_t family; + uint32_t external_origin; + struct { + uint32_t call_tcp_sent; + uint32_t call_tcp_received; + uint64_t tcp_bytes_sent; + uint64_t tcp_bytes_received; + uint32_t close; //It is never used with UDP + uint32_t retransmit; //It is never used with UDP + uint32_t ipv4_connect; + uint32_t ipv6_connect; + } tcp; + + struct { + uint32_t call_udp_sent; + uint32_t call_udp_received; + uint64_t udp_bytes_sent; + uint64_t udp_bytes_received; + } udp; } netdata_socket_t; -typedef struct netdata_plot_values { - // Values used in the previous iteration - uint64_t recv_packets; - uint64_t sent_packets; - uint64_t recv_bytes; - uint64_t sent_bytes; - uint32_t retransmit; +typedef enum netdata_socket_flags { + NETDATA_SOCKET_FLAGS_ALREADY_OPEN = (1<<0) +} netdata_socket_flags_t; + +typedef enum netdata_socket_src_ip_origin { + NETDATA_EBPF_SRC_IP_ORIGIN_LOCAL, + NETDATA_EBPF_SRC_IP_ORIGIN_EXTERNAL +} netdata_socket_src_ip_origin_t; - uint64_t last_time; +typedef struct netata_socket_plus { + netdata_socket_t data; // Data read from database + uint32_t pid; 
+ time_t last_update; + netdata_socket_flags_t flags; + + struct { + char src_ip[INET6_ADDRSTRLEN + 1]; + // uint16_t src_port; + char dst_ip[INET6_ADDRSTRLEN+ 1]; + char dst_port[NI_MAXSERV + 1]; + } socket_string; +} netdata_socket_plus_t; + +enum netdata_udp_ports { + NETDATA_EBPF_UDP_PORT = 53 +}; - // Values used to plot - uint64_t plot_recv_packets; - uint64_t plot_sent_packets; - uint64_t plot_recv_bytes; - uint64_t plot_sent_bytes; - uint16_t plot_retransmit; -} netdata_plot_values_t; +extern ARAL *aral_socket_table; /** * Index used together previous structure */ typedef struct netdata_socket_idx { union netdata_ip_t saddr; - uint16_t sport; + //uint16_t sport; union netdata_ip_t daddr; uint16_t dport; + uint32_t pid; } netdata_socket_idx_t; -// Next values were defined according getnameinfo(3) -#define NETDATA_MAX_NETWORK_COMBINED_LENGTH 1018 -#define NETDATA_DOTS_PROTOCOL_COMBINED_LENGTH 5 // :TCP: -#define NETDATA_DIM_LENGTH_WITHOUT_SERVICE_PROTOCOL 979 - -#define NETDATA_INBOUND_DIRECTION (uint32_t)1 -#define NETDATA_OUTBOUND_DIRECTION (uint32_t)2 -/** - * Allocate the maximum number of structures in the beginning, this can force the collector to use more memory - * in the long term, on the other had it is faster. - */ -typedef struct netdata_socket_plot { - // Search - avl_t avl; - netdata_socket_idx_t index; - - // Current data - netdata_socket_t sock; - - // Previous values and values used to write on chart. 
- netdata_plot_values_t plot; - - int family; // AF_INET or AF_INET6 - char *resolved_name; // Resolve only in the first call - unsigned char resolved; - - char *dimension_sent; - char *dimension_recv; - char *dimension_retransmit; - - uint32_t flags; -} netdata_socket_plot_t; - -#define NETWORK_VIEWER_CHARTS_CREATED (uint32_t)1 -typedef struct netdata_vector_plot { - netdata_socket_plot_t *plot; // Vector used to plot charts - - avl_tree_lock tree; // AVL tree to speed up search - uint32_t last; // The 'other' dimension, the last chart accepted. - uint32_t next; // The next position to store in the vector. - uint32_t max_plot; // Max number of elements to plot. - uint32_t last_plot; // Last element plot - - uint32_t flags; // Flags - -} netdata_vector_plot_t; - -void clean_port_structure(ebpf_network_viewer_port_list_t **clean); +void ebpf_clean_port_structure(ebpf_network_viewer_port_list_t **clean); extern ebpf_network_viewer_port_list_t *listen_ports; void update_listen_table(uint16_t value, uint16_t proto, netdata_passive_connection_t *values); -void parse_network_viewer_section(struct config *cfg); -void ebpf_fill_ip_list(ebpf_network_viewer_ip_list_t **out, ebpf_network_viewer_ip_list_t *in, char *table); -void parse_service_name_section(struct config *cfg); +void ebpf_fill_ip_list_unsafe(ebpf_network_viewer_ip_list_t **out, ebpf_network_viewer_ip_list_t *in, char *table); +void ebpf_parse_service_name_section(struct config *cfg); +void ebpf_parse_ips_unsafe(char *ptr); +void ebpf_parse_ports(char *ptr); +void ebpf_socket_read_open_connections(BUFFER *buf, struct ebpf_module *em); +void ebpf_socket_fill_publish_apps(uint32_t current_pid, netdata_socket_t *ns); + extern struct config socket_config; extern netdata_ebpf_targets_t socket_targets[]; diff --git a/collectors/ebpf.plugin/ebpf_softirq.c b/collectors/ebpf.plugin/ebpf_softirq.c index 8d8930a10..711ff43a6 100644 --- a/collectors/ebpf.plugin/ebpf_softirq.c +++ b/collectors/ebpf.plugin/ebpf_softirq.c @@ 
-218,9 +218,9 @@ static void softirq_collector(ebpf_module_t *em) //This will be cancelled by its parent uint32_t running_time = 0; uint32_t lifetime = em->lifetime; - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != update_every) + if (ebpf_plugin_exit || ++counter != update_every) continue; counter = 0; diff --git a/collectors/ebpf.plugin/ebpf_swap.c b/collectors/ebpf.plugin/ebpf_swap.c index 359fe2308..d0c8cee3d 100644 --- a/collectors/ebpf.plugin/ebpf_swap.c +++ b/collectors/ebpf.plugin/ebpf_swap.c @@ -124,13 +124,6 @@ static int ebpf_swap_attach_kprobe(struct swap_bpf *obj) if (ret) return -1; - obj->links.netdata_release_task_probe = bpf_program__attach_kprobe(obj->progs.netdata_release_task_probe, - false, - EBPF_COMMON_FNCT_CLEAN_UP); - ret = libbpf_get_error(obj->links.netdata_swap_writepage_probe); - if (ret) - return -1; - return 0; } @@ -176,7 +169,6 @@ static void ebpf_swap_adjust_map(struct swap_bpf *obj, ebpf_module_t *em) static void ebpf_swap_disable_release_task(struct swap_bpf *obj) { bpf_program__set_autoload(obj->progs.netdata_release_task_fentry, false); - bpf_program__set_autoload(obj->progs.netdata_release_task_probe, false); } /** @@ -804,9 +796,9 @@ static void swap_collector(ebpf_module_t *em) uint32_t lifetime = em->lifetime; netdata_idx_t *stats = em->hash_table_stats; memset(stats, 0, sizeof(em->hash_table_stats)); - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != update_every) + if (ebpf_plugin_exit || ++counter != update_every) continue; counter = 0; @@ -959,7 +951,7 @@ static int ebpf_swap_load_bpf(ebpf_module_t *em) #endif if (ret) - netdata_log_error("%s %s", EBPF_DEFAULT_ERROR_MSG, em->thread_name); + netdata_log_error("%s %s", 
EBPF_DEFAULT_ERROR_MSG, em->info.thread_name); return ret; } diff --git a/collectors/ebpf.plugin/ebpf_sync.c b/collectors/ebpf.plugin/ebpf_sync.c index 521d39f31..95dda19fd 100644 --- a/collectors/ebpf.plugin/ebpf_sync.c +++ b/collectors/ebpf.plugin/ebpf_sync.c @@ -383,7 +383,7 @@ static void ebpf_sync_exit(void *ptr) */ static int ebpf_sync_load_legacy(ebpf_sync_syscalls_t *w, ebpf_module_t *em) { - em->thread_name = w->syscall; + em->info.thread_name = w->syscall; if (!w->probe_links) { w->probe_links = ebpf_load_program(ebpf_plugin_dir, em, running_on_kernel, isrh, &w->objects); if (!w->probe_links) { @@ -413,7 +413,7 @@ static int ebpf_sync_initialize_syscall(ebpf_module_t *em) #endif int i; - const char *saved_name = em->thread_name; + const char *saved_name = em->info.thread_name; int errors = 0; for (i = 0; local_syscalls[i].syscall; i++) { ebpf_sync_syscalls_t *w = &local_syscalls[i]; @@ -424,7 +424,7 @@ static int ebpf_sync_initialize_syscall(ebpf_module_t *em) if (ebpf_sync_load_legacy(w, em)) errors++; - em->thread_name = saved_name; + em->info.thread_name = saved_name; } #ifdef LIBBPF_MAJOR_VERSION else { @@ -446,12 +446,12 @@ static int ebpf_sync_initialize_syscall(ebpf_module_t *em) w->enabled = false; } - em->thread_name = saved_name; + em->info.thread_name = saved_name; } #endif } } - em->thread_name = saved_name; + em->info.thread_name = saved_name; memset(sync_counter_aggregated_data, 0 , NETDATA_SYNC_IDX_END * sizeof(netdata_syscall_stat_t)); memset(sync_counter_publish_aggregated, 0 , NETDATA_SYNC_IDX_END * sizeof(netdata_publish_syscall_t)); @@ -560,9 +560,9 @@ static void sync_collector(ebpf_module_t *em) int maps_per_core = em->maps_per_core; uint32_t running_time = 0; uint32_t lifetime = em->lifetime; - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != update_every) + if (ebpf_plugin_exit || ++counter 
!= update_every) continue; counter = 0; diff --git a/collectors/ebpf.plugin/ebpf_unittest.c b/collectors/ebpf.plugin/ebpf_unittest.c index 3e1443ad3..11b449e03 100644 --- a/collectors/ebpf.plugin/ebpf_unittest.c +++ b/collectors/ebpf.plugin/ebpf_unittest.c @@ -12,8 +12,8 @@ ebpf_module_t test_em; void ebpf_ut_initialize_structure(netdata_run_mode_t mode) { memset(&test_em, 0, sizeof(ebpf_module_t)); - test_em.thread_name = strdupz("process"); - test_em.config_name = test_em.thread_name; + test_em.info.thread_name = strdupz("process"); + test_em.info.config_name = test_em.info.thread_name; test_em.kernels = NETDATA_V3_10 | NETDATA_V4_14 | NETDATA_V4_16 | NETDATA_V4_18 | NETDATA_V5_4 | NETDATA_V5_10 | NETDATA_V5_14; test_em.pid_map_size = ND_EBPF_DEFAULT_PID_SIZE; @@ -28,7 +28,7 @@ void ebpf_ut_initialize_structure(netdata_run_mode_t mode) */ void ebpf_ut_cleanup_memory() { - freez((void *)test_em.thread_name); + freez((void *)test_em.info.thread_name); } /** @@ -70,14 +70,14 @@ int ebpf_ut_load_real_binary() */ int ebpf_ut_load_fake_binary() { - const char *original = test_em.thread_name; + const char *original = test_em.info.thread_name; - test_em.thread_name = strdupz("I_am_not_here"); + test_em.info.thread_name = strdupz("I_am_not_here"); int ret = ebpf_ut_load_binary(); ebpf_ut_cleanup_memory(); - test_em.thread_name = original; + test_em.info.thread_name = original; return !ret; } diff --git a/collectors/ebpf.plugin/ebpf_vfs.c b/collectors/ebpf.plugin/ebpf_vfs.c index e566e169d..1e06e2a75 100644 --- a/collectors/ebpf.plugin/ebpf_vfs.c +++ b/collectors/ebpf.plugin/ebpf_vfs.c @@ -1960,9 +1960,9 @@ static void vfs_collector(ebpf_module_t *em) uint32_t lifetime = em->lifetime; netdata_idx_t *stats = em->hash_table_stats; memset(stats, 0, sizeof(em->hash_table_stats)); - while (!ebpf_exit_plugin && running_time < lifetime) { + while (!ebpf_plugin_exit && running_time < lifetime) { (void)heartbeat_next(&hb, USEC_PER_SEC); - if (ebpf_exit_plugin || ++counter != 
update_every) + if (ebpf_plugin_exit || ++counter != update_every) continue; counter = 0; diff --git a/collectors/ebpf.plugin/integrations/ebpf_cachestat.md b/collectors/ebpf.plugin/integrations/ebpf_cachestat.md new file mode 100644 index 000000000..3f2d2f57d --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_cachestat.md @@ -0,0 +1,174 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_cachestat.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF Cachestat" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF Cachestat + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: cachestat + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Linux page cache events giving for users a general vision about how his kernel is manipulating files. + +Attach tracing (kprobe, trampoline) to internal kernel functions according options used to compile kernel. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside kernel. Netada sets necessary permission during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. 
+ +#### Performance Impact + +This thread will add overhead every time that an internal kernel function monitored by this thread is called. The estimated additional period of time is between 90-200ms per call on kernels that do not have BTF technology. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per eBPF Cachestat instance + +These metrics show total number of calls to functions inside kernel. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.cachestat_ratio | ratio | % | +| mem.cachestat_dirties | dirty | page/s | +| mem.cachestat_hits | hit | hits/s | +| mem.cachestat_misses | miss | misses/s | + +### Per apps + +These Metrics show grouped information per apps group. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| apps.cachestat_ratio | a dimension per app group | % | +| apps.cachestat_dirties | a dimension per app group | page/s | +| apps.cachestat_hits | a dimension per app group | hits/s | +| apps.cachestat_misses | a dimension per app group | misses/s | + +### Per cgroup + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.cachestat_ratio | ratio | % | +| cgroup.cachestat_dirties | dirty | page/s | +| cgroup.cachestat_hits | hit | hits/s | +| cgroup.cachestat_misses | miss | misses/s | +| services.cachestat_ratio | a dimension per systemd service | % | +| services.cachestat_dirties | a dimension per systemd service | page/s | +| services.cachestat_hits | a dimension per systemd service | hits/s | +| services.cachestat_misses | a dimension per systemd service | misses/s | + + + +## Alerts + +There are no alerts configured by default for this integration. 
+ + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside /boot/config file. Some cited names can be different accoring preferences of Linux distributions. +When you do not have options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution, this last is preferred. The kernel compilation has a well definedd pattern, but distributions can deliver their configuration files +with different names. + +Now follow steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. Copy your new kernel image for boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/cachestat.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/cachestat.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). 
| entry | False | +| apps | Enable or disable integration with apps.plugin | no | False | +| cgroups | Enable or disable integration with cgroup.plugin | no | False | +| pid table size | Number of elements stored inside hash tables used to monitor calls per PID. | 32768 | False | +| ebpf type format | Define the file type to load an eBPF program. Three options are available: `legacy` (Attach only `kprobe`), `co-re` (Plugin tries to use `trampoline` when available), and `auto` (plugin check OS configuration before to load). | auto | False | +| ebpf co-re tracing | Select the attach method used by plugin when `co-re` is defined in previous option. Two options are available: `trampoline` (Option with lowest overhead), and `probe` (the same of legacy code). | trampoline | False | +| maps per core | Define how plugin will load their hash maps. When enabled (`yes`) plugin will load one hash table per core, instead to have centralized information. | yes | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples. 
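The generated page ships no examples. As an illustration only, the defaults from the options table above could be written out in `ebpf.d/cachestat.conf` roughly as follows. This is a minimal sketch, assuming the usual Netdata `[section]` / `name = value` layout; the option names and values are copied from the table and are not a tuning recommendation.

```conf
# Sketch of ebpf.d/cachestat.conf using only the defaults from the options table.
# The layout (section header, "name = value" pairs) is an assumption about the file format.
[global]
    update every = 5
    ebpf load mode = entry
    apps = no
    cgroups = no
    pid table size = 32768
    ebpf type format = auto
    ebpf co-re tracing = trampoline
    maps per core = yes
    lifetime = 300
```

Going by the table's descriptions, setting `apps` or `cgroups` to `yes` would be the way to turn on the corresponding integration.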
+ + diff --git a/collectors/ebpf.plugin/integrations/ebpf_dcstat.md b/collectors/ebpf.plugin/integrations/ebpf_dcstat.md new file mode 100644 index 000000000..6d9abea2c --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_dcstat.md @@ -0,0 +1,172 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_dcstat.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF DCstat" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF DCstat + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: dcstat + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor directory cache events per application given an overall vision about files on memory or storage device. + +Attach tracing (kprobe, trampoline) to internal kernel functions according options used to compile kernel. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside kernel. Netada sets necessary permission during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +This thread will add overhead every time that an internal kernel function monitored by this thread is called. 
The estimated additional period of time is between 90-200ms per call on kernels that do not have BTF technology. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per apps + +These Metrics show grouped information per apps group. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| apps.dc_ratio | a dimension per app group | % | +| apps.dc_reference | a dimension per app group | files | +| apps.dc_not_cache | a dimension per app group | files | +| apps.dc_not_found | a dimension per app group | files | + +### Per filesystem + +These metrics show total number of calls to functions inside kernel. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| filesystem.dc_reference | reference, slow, miss | files | +| filesystem.dc_hit_ratio | ratio | % | + +### Per cgroup + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.dc_ratio | ratio | % | +| cgroup.dc_reference | reference | files | +| cgroup.dc_not_cache | slow | files | +| cgroup.dc_not_found | miss | files | +| services.dc_ratio | a dimension per systemd service | % | +| services.dc_reference | a dimension per systemd service | files | +| services.dc_not_cache | a dimension per systemd service | files | +| services.dc_not_found | a dimension per systemd service | files | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside /boot/config file. Some cited names can be different accoring preferences of Linux distributions. 
+When you do not have options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution, this last is preferred. The kernel compilation has a well definedd pattern, but distributions can deliver their configuration files +with different names. + +Now follow steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. Copy your new kernel image for boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/dcstat.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/dcstat.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config option</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| apps | Enable or disable integration with apps.plugin | no | False | +| cgroups | Enable or disable integration with cgroup.plugin | no | False | +| pid table size | Number of elements stored inside hash tables used to monitor calls per PID. | 32768 | False | +| ebpf type format | Define the file type to load an eBPF program. 
Three options are available: `legacy` (Attach only `kprobe`), `co-re` (Plugin tries to use `trampoline` when available), and `auto` (plugin check OS configuration before to load). | auto | False | +| ebpf co-re tracing | Select the attach method used by plugin when `co-re` is defined in previous option. Two options are available: `trampoline` (Option with lowest overhead), and `probe` (the same of legacy code). | trampoline | False | +| maps per core | Define how plugin will load their hash maps. When enabled (`yes`) plugin will load one hash table per core, instead to have centralized information. | yes | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/ebpf.plugin/integrations/ebpf_disk.md b/collectors/ebpf.plugin/integrations/ebpf_disk.md new file mode 100644 index 000000000..12eafce86 --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_disk.md @@ -0,0 +1,136 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_disk.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF Disk" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF Disk + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: disk + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Measure latency for I/O events on disk. + +Attach tracepoints to internal kernel functions. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. 
+ +The plugin needs setuid because it loads data inside the kernel. Netdata sets the necessary permissions during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT), files inside debugfs, and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +This thread will add overhead every time that an internal kernel function monitored by this thread is called. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per disk + +These metrics measure latency for I/O events on every hard disk present on host. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| disk.latency_io | latency | calls/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with the necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside the /boot/config file. Some cited names can differ according to the preferences of Linux distributions. +When you do not have the options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution; the latter is preferred. The kernel compilation has a well-defined pattern, but distributions can deliver their configuration files +with different names. + +Now follow these steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5.
Copy your new kernel image to the boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + +#### Debug Filesystem + +This thread needs to attach a tracepoint to monitor when a process schedules an exit event. To allow this specific feature, it is necessary to mount `debugfs` (`mount -t debugfs none /sys/kernel/debug/`). + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/disk.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/disk.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples.
+ + diff --git a/collectors/ebpf.plugin/integrations/ebpf_filedescriptor.md b/collectors/ebpf.plugin/integrations/ebpf_filedescriptor.md new file mode 100644 index 000000000..0a749ec31 --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_filedescriptor.md @@ -0,0 +1,172 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_filedescriptor.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF Filedescriptor" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF Filedescriptor + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: filedescriptor + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor calls for functions responsible to open or close a file descriptor and possible errors. + +Attach tracing (kprobe and trampoline) to internal kernel functions according options used to compile kernel. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside kernel. Netdata sets necessary permissions during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. 
+ +#### Performance Impact + +Depending of kernel version and frequency that files are open and close, this thread will add overhead every time that an internal kernel function monitored by this thread is called. The estimated additional period of time is between 90-200ms per call on kernels that do not have BTF technology. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per cgroup + +These Metrics show grouped information per cgroup/service. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.fd_open | open | calls/s | +| cgroup.fd_open_error | open | calls/s | +| cgroup.fd_closed | close | calls/s | +| cgroup.fd_close_error | close | calls/s | +| services.file_open | a dimension per systemd service | calls/s | +| services.file_open_error | a dimension per systemd service | calls/s | +| services.file_closed | a dimension per systemd service | calls/s | +| services.file_close_error | a dimension per systemd service | calls/s | + +### Per eBPF Filedescriptor instance + +These metrics show total number of calls to functions inside kernel. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| filesystem.file_descriptor | open, close | calls/s | +| filesystem.file_error | open, close | calls/s | + +### Per apps + +These Metrics show grouped information per apps group. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| apps.file_open | a dimension per app group | calls/s | +| apps.file_open_error | a dimension per app group | calls/s | +| apps.file_closed | a dimension per app group | calls/s | +| apps.file_close_error | a dimension per app group | calls/s | + + + +## Alerts + +There are no alerts configured by default for this integration. 
+ + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside /boot/config file. Some cited names can be different accoring preferences of Linux distributions. +When you do not have options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution, this last is preferred. The kernel compilation has a well definedd pattern, but distributions can deliver their configuration files +with different names. + +Now follow steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. Copy your new kernel image for boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/fd.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/fd.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). 
| entry | False | +| apps | Enable or disable integration with apps.plugin | no | False | +| cgroups | Enable or disable integration with cgroup.plugin | no | False | +| pid table size | Number of elements stored inside hash tables used to monitor calls per PID. | 32768 | False | +| ebpf type format | Define the file type to load an eBPF program. Three options are available: `legacy` (Attach only `kprobe`), `co-re` (Plugin tries to use `trampoline` when available), and `auto` (plugin check OS configuration before to load). | auto | False | +| ebpf co-re tracing | Select the attach method used by plugin when `co-re` is defined in previous option. Two options are available: `trampoline` (Option with lowest overhead), and `probe` (the same of legacy code). | trampoline | False | +| maps per core | Define how plugin will load their hash maps. When enabled (`yes`) plugin will load one hash table per core, instead to have centralized information. | yes | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/ebpf.plugin/integrations/ebpf_filesystem.md b/collectors/ebpf.plugin/integrations/ebpf_filesystem.md new file mode 100644 index 000000000..b6050657b --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_filesystem.md @@ -0,0 +1,162 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_filesystem.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF Filesystem" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF Filesystem + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: filesystem + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor latency for main actions on filesystem like I/O events. + +Attach tracing (kprobe, trampoline) to internal kernel functions according options used to compile kernel. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside kernel. Netada sets necessary permission during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT), files inside debugfs, and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. 
+ +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per filesystem + +Latency charts associated with filesystem actions. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| filesystem.read_latency | latency period | calls/s | +| filesystem.open_latency | latency period | calls/s | +| filesystem.sync_latency | latency period | calls/s | + +### Per filesystem + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| filesystem.write_latency | latency period | calls/s | + +### Per eBPF Filesystem instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| filesystem.attributte_latency | latency period | calls/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with the necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside the /boot/config file. Some cited names can differ according to the preferences of Linux distributions. +When you do not have the options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution; the latter is preferred. The kernel compilation has a well-defined pattern, but distributions can deliver their configuration files +with different names. + +Now follow these steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. Copy your new kernel image to the boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8.
Update your boot loader + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/filesystem.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/filesystem.conf +``` +#### Options + +This configuration file has two different sections. The `[global]` section overrides the default options, while `[filesystem]` allows the user to select the filesystems to monitor. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | +| btrfsdist | Enable or disable latency monitoring for functions associated with btrfs filesystem. | yes | False | +| ext4dist | Enable or disable latency monitoring for functions associated with ext4 filesystem. | yes | False | +| nfsdist | Enable or disable latency monitoring for functions associated with nfs filesystem. | yes | False | +| xfsdist | Enable or disable latency monitoring for functions associated with xfs filesystem. | yes | False | +| zfsdist | Enable or disable latency monitoring for functions associated with zfs filesystem. | yes | False | + +</details> + +#### Examples +There are no configuration examples.
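Although the generated page lists no examples, the options table above maps onto a file along the following lines. This is a sketch under stated assumptions: the values are simply the defaults from the table, the placement of the `*dist` options under `[filesystem]` is inferred from the description of the two sections, and the helper name `sample_filesystem_conf` is hypothetical rather than part of Netdata.

```bash
# Print a sample ebpf.d/filesystem.conf built from the defaults in the options table.
# Assumption: the per-filesystem switches belong to the [filesystem] section.
sample_filesystem_conf() {
    cat <<'EOF'
[global]
    update every = 5
    ebpf load mode = entry
    lifetime = 300

[filesystem]
    btrfsdist = yes
    ext4dist = yes
    nfsdist = yes
    xfsdist = yes
    zfsdist = yes
EOF
}
```

Running `sample_filesystem_conf` prints the text, which can then be pasted into the editor opened by `sudo ./edit-config ebpf.d/filesystem.conf`; switching one of the `*dist` lines to `no` would disable latency monitoring for that filesystem.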
+ + diff --git a/collectors/ebpf.plugin/integrations/ebpf_hardirq.md b/collectors/ebpf.plugin/integrations/ebpf_hardirq.md new file mode 100644 index 000000000..cd89cd589 --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_hardirq.md @@ -0,0 +1,136 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_hardirq.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF Hardirq" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF Hardirq + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: hardirq + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor latency for each HardIRQ available. + +Attach tracepoints to internal kernel functions. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside kernel. Netada sets necessary permission during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT), files inside debugfs, and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +This thread will add overhead every time that an internal kernel function monitored by this thread is called. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels. + + + +### Per eBPF Hardirq instance + +These metrics show latest timestamp for each hardIRQ available on host. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.hardirq_latency | hardirq names | milliseconds | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with the necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside the /boot/config file. Some cited names can differ according to the preferences of Linux distributions. +When you do not have the options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution; the latter is preferred. The kernel compilation has a well-defined pattern, but distributions can deliver their configuration files +with different names. + +Now follow these steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. Copy your new kernel image to the boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + +#### Debug Filesystem + +This thread needs to attach a tracepoint to monitor when a process schedules an exit event. To allow this specific feature, it is necessary to mount `debugfs` (`mount -t debugfs none /sys/kernel/debug/`). + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/hardirq.conf`.
+ + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/hardirq.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/ebpf.plugin/integrations/ebpf_mdflush.md b/collectors/ebpf.plugin/integrations/ebpf_mdflush.md new file mode 100644 index 000000000..51df30b47 --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_mdflush.md @@ -0,0 +1,131 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_mdflush.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF MDflush" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF MDflush + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: mdflush + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor when flush events happen between disks. + +Attach tracing (kprobe, trampoline) to internal kernel functions according options used to compile kernel. 
+ +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside the kernel. Netdata sets the necessary permissions during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +This thread will add overhead every time that `md_flush_request` is called. The estimated additional period of time is between 90-200ms per call on kernels that do not have BTF technology. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per eBPF MDflush instance + +Number of times md_flush_request was called since last time. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mdstat.mdstat_flush | disk | flushes | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with the necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside the /boot/config file. Some cited names can differ according to the preferences of Linux distributions. +When you do not have the options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution; the latter is preferred. The kernel compilation has a well-defined pattern, but distributions can deliver their configuration files +with different names.
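The check described in the paragraph above can be scripted. The following is a minimal sketch, not part of the Netdata tooling: the function name `missing_kernel_options` is hypothetical, and the path you pass to it is an assumption, since distributions store the kernel configuration under different names (many use `/boot/config-$(uname -r)`).

```bash
# List the options named above that are NOT enabled (=y) in a kernel config file.
# Accepts a plain-text file or a gzip-compressed one whose name ends in .gz.
# The function name is illustrative; it is not shipped with Netdata.
missing_kernel_options() {
    config_file="$1"
    case "$config_file" in
        *.gz) config_text=$(zcat "$config_file") ;;
        *)    config_text=$(cat "$config_file") ;;
    esac
    for opt in CONFIG_KPROBES CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_BPF_JIT; do
        # -x matches whole lines only, so CONFIG_BPF=y cannot match CONFIG_BPF_SYSCALL=y
        printf '%s\n' "$config_text" | grep -x "${opt}=y" >/dev/null || echo "$opt"
    done
}
```

Calling `missing_kernel_options /proc/config.gz` prints one option name per line for anything that is missing, and prints nothing when all four options are enabled; only in the former case would the kernel rebuild steps be needed.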
+ +Now follow steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. Copy your new kernel image for boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/mdflush.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/mdflush.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/ebpf.plugin/integrations/ebpf_mount.md b/collectors/ebpf.plugin/integrations/ebpf_mount.md new file mode 100644 index 000000000..063ffcbad --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_mount.md @@ -0,0 +1,139 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_mount.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF Mount" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF Mount + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: mount + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor calls for mount and umount syscall. + +Attach tracing (kprobe, trampoline) to internal kernel functions according options used to compile kernel. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside kernel. Netada sets necessary permission during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT, CONFIG_HAVE_SYSCALL_TRACEPOINTS), files inside debugfs, and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +This thread will add overhead every time that an internal kernel function monitored by this thread is called. 
The estimated additional period of time is between 90-200ms per call on kernels that do not have BTF technology. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per eBPF Mount instance + +Calls for the mount and umount syscalls. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mount_points.call | mount, umount | calls/s | +| mount_points.error | mount, umount | calls/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with the necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside the /boot/config file. Some cited names can differ according to the preferences of Linux distributions. +When you do not have the options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution; the latter is preferred. The kernel compilation has a well-defined pattern, but distributions can deliver their configuration files +with different names. + +Now follow these steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. Copy your new kernel image to the boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + +#### Debug Filesystem + +This thread needs to attach a tracepoint to monitor when a process schedules an exit event.
To allow this specific feature, it is necessary to mount `debugfs` (`mount -t debugfs none /sys/kernel/debug/`). + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/mount.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/mount.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| ebpf type format | Define the file type to load an eBPF program. Three options are available: `legacy` (attach only `kprobe`), `co-re` (the plugin tries to use `trampoline` when available), and `auto` (the plugin checks the OS configuration before loading). | auto | False | +| ebpf co-re tracing | Select the attach method used by the plugin when `co-re` is defined in the previous option. Two options are available: `trampoline` (the option with the lowest overhead), and `probe` (the same as legacy code). | trampoline | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples.
+
+
diff --git a/collectors/ebpf.plugin/integrations/ebpf_oomkill.md b/collectors/ebpf.plugin/integrations/ebpf_oomkill.md
new file mode 100644
index 000000000..372921387
--- /dev/null
+++ b/collectors/ebpf.plugin/integrations/ebpf_oomkill.md
@@ -0,0 +1,155 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_oomkill.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml"
+sidebar_label: "eBPF OOMkill"
+learn_status: "Published"
+learn_rel_path: "Data Collection/eBPF"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# eBPF OOMkill
+
+
+<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/>
+
+
+Plugin: ebpf.plugin
+Module: oomkill
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor applications that run out of memory.
+
+Attaches a tracepoint to internal kernel functions.
+
+This collector is only supported on the following platforms:
+
+- Linux
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+The plugin needs setuid because it loads data inside the kernel. Netdata sets the necessary permissions at installation time.
+
+### Default Behavior
+
+#### Auto-Detection
+
+The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT), files inside debugfs, and the presence of BTF files to decide which eBPF program will be attached.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+This thread will add overhead every time that an internal kernel function monitored by this thread is called.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per cgroup
+
+These metrics show the cgroup/service that reached OOM.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.oomkills | cgroup name | kills |
+| services.oomkills | a dimension per systemd service | kills |
+
+### Per apps
+
+These metrics show the cgroup/service that reached OOM.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| apps.oomkills | a dimension per app group | kills |
+
+
+
+## Alerts
+
+There are no alerts configured by default for this integration.
+
+
+## Setup
+
+### Prerequisites
+
+#### Compile kernel
+
+Check whether your kernel was compiled with the necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or in the `/boot/config` file. Some of the cited names can differ according to the preferences of each Linux distribution.
+When these options are not set, you need to get the kernel source code from https://kernel.org or a kernel package from your distribution; the latter is preferred. Kernel compilation follows a well-defined pattern, but distributions can deliver their configuration files
+with different names.
+
+Now follow these steps:
+1. Copy the configuration file to /usr/src/linux/.config.
+2. Select the necessary options: make oldconfig
+3. Compile your kernel image: make bzImage
+4. Compile your modules: make modules
+5. Copy your new kernel image to the boot loader directory
+6. Install the new modules: make modules_install
+7. Generate an initial ramdisk image (`initrd`) if necessary.
+8. Update your boot loader
+
+
+#### Debug Filesystem
+
+This thread needs to attach a tracepoint to monitor when a process schedules an exit event. To enable this specific feature, you need to mount `debugfs` (`mount -t debugfs none /sys/kernel/debug/`).
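Both prerequisites described above (the kernel options and the `debugfs` mount) can be checked from a shell before rebuilding anything. What follows is a minimal sketch and not part of Netdata: the helper names `read_kernel_config`, `check_kernel_options`, and `debugfs_mounted` are hypothetical, and the `/boot/config-$(uname -r)` path is an assumption about how a distribution names the `/boot/config` file mentioned in the text.

```bash
# Hypothetical helpers (not shipped with Netdata) to check the eBPF prerequisites.

# Print a kernel config file, decompressing it when the name ends in .gz.
read_kernel_config() {
    case "$1" in
        *.gz) zcat "$1" ;;
        *)    cat "$1" ;;
    esac 2>/dev/null
}

# Report whether each option listed in the prerequisites is set to "y".
check_kernel_options() {
    local config_file="$1" opt
    for opt in CONFIG_KPROBES CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_BPF_JIT; do
        if read_kernel_config "$config_file" | grep -q "^${opt}=y"; then
            echo "${opt}: enabled"
        else
            echo "${opt}: missing"
        fi
    done
}

# Succeed when a mounts file (default: /proc/mounts) lists debugfs on /sys/kernel/debug.
debugfs_mounted() {
    grep -qs ' /sys/kernel/debug debugfs ' "${1:-/proc/mounts}"
}

config_file=/proc/config.gz
# Assumed fallback naming; adjust to your distribution.
[ -r "$config_file" ] || config_file="/boot/config-$(uname -r)"
check_kernel_options "$config_file"

if debugfs_mounted; then
    echo "debugfs: mounted"
else
    echo "debugfs: not mounted (run: sudo mount -t debugfs none /sys/kernel/debug/)"
fi
```

If neither config file is readable, every option is reported as missing, so treat that result as "unknown" rather than as proof that the kernel lacks eBPF support.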
+
+
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `ebpf.d/oomkill.conf`.
+
+
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config ebpf.d/oomkill.conf
+```
+#### Options
+
+Overwrite the default configuration, reducing the number of I/O events.
+
+
+#### Examples
+There are no configuration examples.
+
+
+
+## Troubleshooting
+
+### update every
+
+
+
+### ebpf load mode
+
+
+
+### lifetime
+
+
+
+
diff --git a/collectors/ebpf.plugin/integrations/ebpf_process.md b/collectors/ebpf.plugin/integrations/ebpf_process.md
new file mode 100644
index 000000000..3bd92a06e
--- /dev/null
+++ b/collectors/ebpf.plugin/integrations/ebpf_process.md
@@ -0,0 +1,110 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_process.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml"
+sidebar_label: "eBPF Process"
+learn_status: "Published"
+learn_rel_path: "Data Collection/eBPF"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# eBPF Process
+
+
+<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/>
+
+
+Plugin: ebpf.plugin
+Module: process
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor internal memory usage.
+
+Uses Netdata internal statistics to monitor memory management by the plugin.
+
+This collector is only supported on the following platforms:
+
+- Linux
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+This integration doesn't support auto-detection.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per eBPF Process instance
+
+How the plugin allocates memory.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| netdata.ebpf_aral_stat_size | memory | bytes |
+| netdata.ebpf_aral_stat_alloc | aral | calls |
+| netdata.ebpf_threads | total, running | threads |
+| netdata.ebpf_load_methods | legacy, co-re | methods |
+| netdata.ebpf_kernel_memory | memory_locked | bytes |
+| netdata.ebpf_hash_tables_count | hash_table | hash tables |
+| netdata.ebpf_aral_stat_size | memory | bytes |
+| netdata.ebpf_aral_stat_alloc | aral | calls |
+| netdata.ebpf_aral_stat_size | memory | bytes |
+| netdata.ebpf_aral_stat_alloc | aral | calls |
+| netdata.ebpf_hash_tables_insert_pid_elements | thread | rows |
+| netdata.ebpf_hash_tables_remove_pid_elements | thread | rows |
+
+
+
+## Alerts
+
+There are no alerts configured by default for this integration.
+
+
+## Setup
+
+### Prerequisites
+
+#### Netdata flags.
+
+To have these charts, you need to compile Netdata with the flag `NETDATA_DEV_MODE`.
+
+
+### Configuration
+
+#### File
+
+There is no configuration file.
+#### Options
+
+
+
+There are no configuration options.
+
+#### Examples
+There are no configuration examples.
+
+
diff --git a/collectors/ebpf.plugin/integrations/ebpf_processes.md b/collectors/ebpf.plugin/integrations/ebpf_processes.md
new file mode 100644
index 000000000..6d3c0d40e
--- /dev/null
+++ b/collectors/ebpf.plugin/integrations/ebpf_processes.md
@@ -0,0 +1,182 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_processes.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml"
+sidebar_label: "eBPF Processes"
+learn_status: "Published"
+learn_rel_path: "Data Collection/eBPF"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# eBPF Processes
+
+
+<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/>
+
+
+Plugin: ebpf.plugin
+Module: processes
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor calls to the functions that create tasks (threads and processes) inside the Linux kernel.
+
+Attaches tracing (kprobe or tracepoint, and trampoline) to internal kernel functions.
+
+This collector is only supported on the following platforms:
+
+- Linux
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+The plugin needs setuid because it loads data inside the kernel. Netdata sets the necessary permissions at installation time.
+
+### Default Behavior
+
+#### Auto-Detection
+
+The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT), files inside debugfs, and the presence of BTF files to decide which eBPF program will be attached.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+This thread will add overhead every time that an internal kernel function monitored by this thread is called.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per eBPF Processes instance
+
+These metrics show the total number of calls to functions inside the kernel.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| system.process_thread | process | calls/s |
+| system.process_status | process, zombie | difference |
+| system.exit | process | calls/s |
+| system.task_error | task | calls/s |
+
+### Per apps
+
+These metrics show grouped information per apps group.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| apps.process_create | a dimension per app group | calls/s |
+| apps.thread_create | a dimension per app group | calls/s |
+| apps.task_exit | a dimension per app group | calls/s |
+| apps.task_close | a dimension per app group | calls/s |
+| apps.task_error | a dimension per app group | calls/s |
+
+### Per cgroup
+
+These metrics show grouped information per cgroup/service.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.process_create | process | calls/s |
+| cgroup.thread_create | thread | calls/s |
+| cgroup.task_exit | exit | calls/s |
+| cgroup.task_close | process | calls/s |
+| cgroup.task_error | process | calls/s |
+| services.process_create | a dimension per systemd service | calls/s |
+| services.thread_create | a dimension per systemd service | calls/s |
+| services.task_close | a dimension per systemd service | calls/s |
+| services.task_exit | a dimension per systemd service | calls/s |
+| services.task_error | a dimension per systemd service | calls/s |
+
+
+
+## Alerts
+
+There are no alerts configured by default for this integration.
+
+
+## Setup
+
+### Prerequisites
+
+#### Compile kernel
+
+Check whether your kernel was compiled with the necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or in the `/boot/config` file. Some of the cited names can differ according to the preferences of each Linux distribution.
+When these options are not set, you need to get the kernel source code from https://kernel.org or a kernel package from your distribution; the latter is preferred. Kernel compilation follows a well-defined pattern, but distributions can deliver their configuration files
+with different names.
+
+Now follow these steps:
+1. Copy the configuration file to /usr/src/linux/.config.
+2. Select the necessary options: make oldconfig
+3. Compile your kernel image: make bzImage
+4. Compile your modules: make modules
+5. Copy your new kernel image to the boot loader directory
+6. Install the new modules: make modules_install
+7. Generate an initial ramdisk image (`initrd`) if necessary.
+8. Update your boot loader
+
+
+#### Debug Filesystem
+
+This thread needs to attach a tracepoint to monitor when a process schedules an exit event. To enable this specific feature, you need to mount `debugfs` (`mount -t debugfs none /sys/kernel/debug/`).
+
+
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `ebpf.d/process.conf`.
+
+
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config ebpf.d/process.conf
+```
+#### Options
+
+All options are defined inside section `[global]`.
+
+
+<details><summary>Config options</summary>
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update every | Data collection frequency. | 5 | False |
+| ebpf load mode | Define whether the plugin monitors only calls to the functions (`entry`) or also monitors their return (`return`). | entry | False |
+| apps | Enable or disable integration with apps.plugin | no | False |
+| cgroups | Enable or disable integration with cgroup.plugin | no | False |
+| pid table size | Number of elements stored inside hash tables used to monitor calls per PID. | 32768 | False |
+| ebpf type format | Define the file type used to load an eBPF program. Three options are available: `legacy` (attach only `kprobe`), `co-re` (the plugin tries to use `trampoline` when available), and `auto` (the plugin checks the OS configuration before loading). | auto | False |
+| ebpf co-re tracing | Select the attach method used by the plugin when `co-re` is defined in the previous option. Two options are available: `trampoline` (the option with the lowest overhead) and `probe` (the same as legacy code). This plugin always tries to attach a tracepoint, so this option affects only the function used to monitor task (thread and process) creation. | trampoline | False |
+| maps per core | Define how the plugin loads its hash maps. When enabled (`yes`), the plugin loads one hash table per core instead of keeping centralized information. | yes | False |
+| lifetime | Set the default lifetime for the thread when enabled by cloud. | 300 | False |
+
+</details>
+
+#### Examples
+There are no configuration examples.
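The generated documentation lists no examples, so the following is only an illustrative sketch of how the options in the table above might look in `ebpf.d/process.conf`. It assumes the usual Netdata `name = value` syntax under a `[global]` header; option names and defaults are taken from the table, while `apps = yes` and `cgroups = yes` are non-default values chosen purely for illustration. The function name `sample_process_conf` is hypothetical; it only prints the text so you can review it before pasting anything into `edit-config`.

```bash
# Print a sample ebpf.d/process.conf to stdout (nothing is written to disk).
# "apps" and "cgroups" are switched on for illustration; every other value
# is the default documented in the options table.
sample_process_conf() {
    cat <<'EOF'
[global]
    update every = 5
    ebpf load mode = entry
    apps = yes
    cgroups = yes
    pid table size = 32768
    ebpf type format = auto
    ebpf co-re tracing = trampoline
    maps per core = yes
    lifetime = 300
EOF
}

sample_process_conf
```

Since every option is marked as not required, a real file would normally contain only the lines that differ from the defaults.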
+
+
diff --git a/collectors/ebpf.plugin/integrations/ebpf_shm.md b/collectors/ebpf.plugin/integrations/ebpf_shm.md
new file mode 100644
index 000000000..2cfcbeb16
--- /dev/null
+++ b/collectors/ebpf.plugin/integrations/ebpf_shm.md
@@ -0,0 +1,180 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_shm.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml"
+sidebar_label: "eBPF SHM"
+learn_status: "Published"
+learn_rel_path: "Data Collection/eBPF"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# eBPF SHM
+
+
+<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/>
+
+
+Plugin: ebpf.plugin
+Module: shm
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor the syscalls responsible for manipulating shared memory.
+
+Attaches tracing (kprobe, trampoline) to internal kernel functions according to the options used to compile the kernel.
+
+This collector is only supported on the following platforms:
+
+- Linux
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+The plugin needs setuid because it loads data inside the kernel. Netdata sets the necessary permissions at installation time.
+
+### Default Behavior
+
+#### Auto-Detection
+
+The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) and the presence of BTF files to decide which eBPF program will be attached.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+This thread will add overhead every time that an internal kernel function monitored by this thread is called. The estimated additional time is between 90 and 200ms per call on kernels that do not have BTF technology.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per cgroup
+
+These metrics show grouped information per cgroup/service.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.shmget | get | calls/s |
+| cgroup.shmat | at | calls/s |
+| cgroup.shmdt | dt | calls/s |
+| cgroup.shmctl | ctl | calls/s |
+| services.shmget | a dimension per systemd service | calls/s |
+| services.shmat | a dimension per systemd service | calls/s |
+| services.shmdt | a dimension per systemd service | calls/s |
+| services.shmctl | a dimension per systemd service | calls/s |
+
+### Per apps
+
+These metrics show grouped information per apps group.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| apps.shmget_call | a dimension per app group | calls/s |
+| apps.shmat_call | a dimension per app group | calls/s |
+| apps.shmdt_call | a dimension per app group | calls/s |
+| apps.shmctl_call | a dimension per app group | calls/s |
+
+### Per eBPF SHM instance
+
+These metrics show the number of calls for the specified syscall.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| system.shared_memory_calls | get, at, dt, ctl | calls/s |
+
+
+
+## Alerts
+
+There are no alerts configured by default for this integration.
+
+
+## Setup
+
+### Prerequisites
+
+#### Compile kernel
+
+Check whether your kernel was compiled with the necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or in the `/boot/config` file. Some of the cited names can differ according to the preferences of each Linux distribution.
+When these options are not set, you need to get the kernel source code from https://kernel.org or a kernel package from your distribution; the latter is preferred. Kernel compilation follows a well-defined pattern, but distributions can deliver their configuration files
+with different names.
+
+Now follow these steps:
+1. Copy the configuration file to /usr/src/linux/.config.
+2. Select the necessary options: make oldconfig
+3. Compile your kernel image: make bzImage
+4. Compile your modules: make modules
+5. Copy your new kernel image to the boot loader directory
+6. Install the new modules: make modules_install
+7. Generate an initial ramdisk image (`initrd`) if necessary.
+8. Update your boot loader
+
+
+#### Debug Filesystem
+
+This thread needs to attach a tracepoint to monitor when a process schedules an exit event. To enable this specific feature, you need to mount `debugfs` (`mount -t debugfs none /sys/kernel/debug/`).
+
+
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `ebpf.d/shm.conf`.
+
+
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config ebpf.d/shm.conf
+```
+#### Options
+
+This configuration file has two sections. The `[global]` section overwrites all default options, while `[syscalls]` allows the user to select the syscalls to monitor.
+
+
+<details><summary>Config options</summary>
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update every | Data collection frequency. | 5 | False |
+| ebpf load mode | Define whether the plugin monitors only calls to the functions (`entry`) or also monitors their return (`return`). | entry | False |
+| apps | Enable or disable integration with apps.plugin | no | False |
+| cgroups | Enable or disable integration with cgroup.plugin | no | False |
+| pid table size | Number of elements stored inside hash tables used to monitor calls per PID. | 32768 | False |
+| ebpf type format | Define the file type used to load an eBPF program. Three options are available: `legacy` (attach only `kprobe`), `co-re` (the plugin tries to use `trampoline` when available), and `auto` (the plugin checks the OS configuration before loading). | auto | False |
+| ebpf co-re tracing | Select the attach method used by the plugin when `co-re` is defined in the previous option. Two options are available: `trampoline` (the option with the lowest overhead) and `probe` (the same as legacy code). | trampoline | False |
+| maps per core | Define how the plugin loads its hash maps. When enabled (`yes`), the plugin loads one hash table per core instead of keeping centralized information. | yes | False |
+| lifetime | Set the default lifetime for the thread when enabled by cloud. | 300 | False |
+| shmget | Enable or disable monitoring for syscall `shmget` | yes | False |
+| shmat | Enable or disable monitoring for syscall `shmat` | yes | False |
+| shmdt | Enable or disable monitoring for syscall `shmdt` | yes | False |
+| shmctl | Enable or disable monitoring for syscall `shmctl` | yes | False |
+
+</details>
+
+#### Examples
+There are no configuration examples.
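To make the two-section layout described under Options concrete, here is a hedged sketch of what `ebpf.d/shm.conf` might contain. It assumes the usual Netdata `name = value` syntax; the section and option names come from the text and table above, and turning off `shmdt` and `shmctl` is an illustrative choice, not a recommendation. The function name `sample_shm_conf` is hypothetical and only prints the text for review.

```bash
# Print a sample ebpf.d/shm.conf to stdout (nothing is written to disk).
# [global] holds the general options; [syscalls] selects which syscalls
# are monitored. shmdt and shmctl are disabled here only as an example.
sample_shm_conf() {
    cat <<'EOF'
[global]
    update every = 5
    ebpf load mode = entry
    apps = no
    cgroups = no
    lifetime = 300

[syscalls]
    shmget = yes
    shmat = yes
    shmdt = no
    shmctl = no
EOF
}

sample_shm_conf
```

Options omitted from the file keep the defaults listed in the table.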
+
+
diff --git a/collectors/ebpf.plugin/integrations/ebpf_socket.md b/collectors/ebpf.plugin/integrations/ebpf_socket.md
new file mode 100644
index 000000000..3d621f439
--- /dev/null
+++ b/collectors/ebpf.plugin/integrations/ebpf_socket.md
@@ -0,0 +1,197 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_socket.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml"
+sidebar_label: "eBPF Socket"
+learn_status: "Published"
+learn_rel_path: "Data Collection/eBPF"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# eBPF Socket
+
+
+<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/>
+
+
+Plugin: ebpf.plugin
+Module: socket
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor bandwidth consumption per application for the TCP and UDP protocols.
+
+Attaches tracing (kprobe, trampoline) to internal kernel functions according to the options used to compile the kernel.
+
+This collector is only supported on the following platforms:
+
+- Linux
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+The plugin needs setuid because it loads data inside the kernel. Netdata sets the necessary permissions at installation time.
+
+### Default Behavior
+
+#### Auto-Detection
+
+The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) and the presence of BTF files to decide which eBPF program will be attached.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+This thread will add overhead every time that an internal kernel function monitored by this thread is called. The estimated additional time is between 90 and 200ms per call on kernels that do not have BTF technology.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per eBPF Socket instance
+
+These metrics show the total number of calls to functions inside the kernel.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| ip.inbound_conn | connection_tcp | connections/s |
+| ip.tcp_outbound_conn | received | connections/s |
+| ip.tcp_functions | received, send, closed | calls/s |
+| ip.total_tcp_bandwidth | received, send | kilobits/s |
+| ip.tcp_error | received, send | calls/s |
+| ip.tcp_retransmit | retransmited | calls/s |
+| ip.udp_functions | received, send | calls/s |
+| ip.total_udp_bandwidth | received, send | kilobits/s |
+| ip.udp_error | received, send | calls/s |
+
+### Per apps
+
+These metrics show grouped information per apps group.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| apps.outbound_conn_v4 | a dimension per app group | connections/s |
+| apps.outbound_conn_v6 | a dimension per app group | connections/s |
+| apps.total_bandwidth_sent | a dimension per app group | kilobits/s |
+| apps.total_bandwidth_recv | a dimension per app group | kilobits/s |
+| apps.bandwidth_tcp_send | a dimension per app group | calls/s |
+| apps.bandwidth_tcp_recv | a dimension per app group | calls/s |
+| apps.bandwidth_tcp_retransmit | a dimension per app group | calls/s |
+| apps.bandwidth_udp_send | a dimension per app group | calls/s |
+| apps.bandwidth_udp_recv | a dimension per app group | calls/s |
+| services.net_conn_ipv4 | a dimension per systemd service | connections/s |
+
+### Per cgroup
+
+
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| cgroup.net_conn_ipv4 | connected_v4 | connections/s |
+| cgroup.net_conn_ipv6 | connected_v6 | connections/s |
+| cgroup.net_bytes_recv | received | calls/s |
+| cgroup.net_bytes_sent | sent | calls/s |
+| cgroup.net_tcp_recv | received | calls/s |
+| cgroup.net_tcp_send | sent | calls/s |
+| cgroup.net_retransmit | retransmitted | calls/s |
+| cgroup.net_udp_send | sent | calls/s |
+| cgroup.net_udp_recv | received | calls/s |
+| services.net_conn_ipv6 | a dimension per systemd service | connections/s |
+| services.net_bytes_recv | a dimension per systemd service | kilobits/s |
+| services.net_bytes_sent | a dimension per systemd service | kilobits/s |
+| services.net_tcp_recv | a dimension per systemd service | calls/s |
+| services.net_tcp_send | a dimension per systemd service | calls/s |
+| services.net_tcp_retransmit | a dimension per systemd service | calls/s |
+| services.net_udp_send | a dimension per systemd service | calls/s |
+| services.net_udp_recv | a dimension per systemd service | calls/s |
+
+
+
+## Alerts
+
+There are no alerts configured by default for this integration.
+
+
+## Setup
+
+### Prerequisites
+
+#### Compile kernel
+
+Check whether your kernel was compiled with the necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or in the `/boot/config` file. Some of the cited names can differ according to the preferences of each Linux distribution.
+When these options are not set, you need to get the kernel source code from https://kernel.org or a kernel package from your distribution; the latter is preferred. Kernel compilation follows a well-defined pattern, but distributions can deliver their configuration files
+with different names.
+
+Now follow these steps:
+1. Copy the configuration file to /usr/src/linux/.config.
+2. Select the necessary options: make oldconfig
+3. Compile your kernel image: make bzImage
+4. Compile your modules: make modules
+5. Copy your new kernel image to the boot loader directory
+6. Install the new modules: make modules_install
+7. Generate an initial ramdisk image (`initrd`) if necessary.
+8. Update your boot loader
+
+
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `ebpf.d/network.conf`.
+
+
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config ebpf.d/network.conf
+```
+#### Options
+
+All options are defined inside section `[global]`. Options inside `network connections` are ignored for now.
+
+
+<details><summary>Config options</summary>
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update every | Data collection frequency. | 5 | False |
+| ebpf load mode | Define whether the plugin monitors only calls to the functions (`entry`) or also monitors their return (`return`). | entry | False |
+| apps | Enable or disable integration with apps.plugin | no | False |
+| cgroups | Enable or disable integration with cgroup.plugin | no | False |
+| bandwidth table size | Number of elements stored inside hash tables used to monitor calls per PID. | 16384 | False |
+| ipv4 connection table size | Number of elements stored inside hash tables used to monitor calls per IPv4 connection. | 16384 | False |
+| ipv6 connection table size | Number of elements stored inside hash tables used to monitor calls per IPv6 connection. | 16384 | False |
+| udp connection table size | Number of temporary elements stored inside hash tables used to monitor UDP connections. | 4096 | False |
+| ebpf type format | Define the file type used to load an eBPF program. Three options are available: `legacy` (attach only `kprobe`), `co-re` (the plugin tries to use `trampoline` when available), and `auto` (the plugin checks the OS configuration before loading). | auto | False |
+| ebpf co-re tracing | Select the attach method used by the plugin when `co-re` is defined in the previous option. Two options are available: `trampoline` (the option with the lowest overhead) and `probe` (the same as legacy code). | trampoline | False |
+| maps per core | Define how the plugin loads its hash maps. When enabled (`yes`), the plugin loads one hash table per core instead of keeping centralized information. | yes | False |
+| lifetime | Set the default lifetime for the thread when enabled by cloud. | 300 | False |
+
+</details>
+
+#### Examples
+There are no configuration examples.
+
+
diff --git a/collectors/ebpf.plugin/integrations/ebpf_softirq.md b/collectors/ebpf.plugin/integrations/ebpf_softirq.md
new file mode 100644
index 000000000..3a061368c
--- /dev/null
+++ b/collectors/ebpf.plugin/integrations/ebpf_softirq.md
@@ -0,0 +1,136 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_softirq.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml"
+sidebar_label: "eBPF SoftIRQ"
+learn_status: "Published"
+learn_rel_path: "Data Collection/eBPF"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# eBPF SoftIRQ
+
+
+<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/>
+
+
+Plugin: ebpf.plugin
+Module: softirq
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+Monitor latency for each SoftIRQ available.
+
+Attaches a kprobe to internal kernel functions.
+
+This collector is only supported on the following platforms:
+
+- Linux
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+ +The plugin needs setuid because it loads data inside kernel. Netada sets necessary permission during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT), files inside debugfs, and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +This thread will add overhead every time that an internal kernel function monitored by this thread is called. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per eBPF SoftIRQ instance + +These metrics show latest timestamp for each softIRQ available on host. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.softirq_latency | soft IRQs | milliseconds | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside /boot/config file. Some cited names can be different accoring preferences of Linux distributions. +When you do not have options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution, this last is preferred. The kernel compilation has a well definedd pattern, but distributions can deliver their configuration files +with different names. + +Now follow steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. 
Copy your new kernel image to the boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + +#### Debug Filesystem + +This thread needs to attach a tracepoint to monitor when a process schedules an exit event. To allow this specific feature, it is necessary to mount `debugfs` (`mount -t debugfs none /sys/kernel/debug/`). + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/softirq.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/softirq.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples. 
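The generated page ships no example, but the options table above is enough for a minimal sketch. Assuming the usual Netdata `name = value` INI layout, an `ebpf.d/softirq.conf` that simply restates the documented defaults might look like this. The values are the table defaults, not tuning advice.

```ini
# ebpf.d/softirq.conf: illustrative sketch, values copied from the options table
[global]
    update every = 5        # data collection frequency
    ebpf load mode = entry  # `return` would also monitor function returns
    lifetime = 300          # default thread lifetime when enabled by cloud
```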
+ + diff --git a/collectors/ebpf.plugin/integrations/ebpf_swap.md b/collectors/ebpf.plugin/integrations/ebpf_swap.md new file mode 100644 index 000000000..502cd5bce --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_swap.md @@ -0,0 +1,165 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_swap.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF SWAP" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF SWAP + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: swap + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitors when swap has I/O events and applications executing events. + +Attach tracing (kprobe, trampoline) to internal kernel functions according options used to compile kernel. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside kernel. Netada sets necessary permission during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +This thread will add overhead every time that an internal kernel function monitored by this thread is called. The estimated additional period of time is between 90-200ms per call on kernels that do not have BTF technology. 
+ + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per cgroup + +These Metrics show grouped information per cgroup/service. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.swap_read | read | calls/s | +| cgroup.swap_write | write | calls/s | +| services.swap_read | a dimension per systemd service | calls/s | +| services.swap_write | a dimension per systemd service | calls/s | + +### Per apps + +These Metrics show grouped information per apps group. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| apps.swap_read_call | a dimension per app group | calls/s | +| apps.swap_write_call | a dimension per app group | calls/s | + +### Per eBPF SWAP instance + +These metrics show total number of calls to functions inside kernel. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.swapcalls | write, read | calls/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside /boot/config file. Some cited names can be different accoring preferences of Linux distributions. +When you do not have options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution, this last is preferred. The kernel compilation has a well definedd pattern, but distributions can deliver their configuration files +with different names. + +Now follow steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. 
Compile your modules: make modules +5. Copy your new kernel image for boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/swap.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/swap.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| apps | Enable or disable integration with apps.plugin | no | False | +| cgroups | Enable or disable integration with cgroup.plugin | no | False | +| pid table size | Number of elements stored inside hash tables used to monitor calls per PID. | 32768 | False | +| ebpf type format | Define the file type to load an eBPF program. Three options are available: `legacy` (Attach only `kprobe`), `co-re` (Plugin tries to use `trampoline` when available), and `auto` (plugin check OS configuration before to load). | auto | False | +| ebpf co-re tracing | Select the attach method used by plugin when `co-re` is defined in previous option. Two options are available: `trampoline` (Option with lowest overhead), and `probe` (the same of legacy code). | trampoline | False | +| maps per core | Define how plugin will load their hash maps. 
When enabled (`yes`) plugin will load one hash table per core, instead to have centralized information. | yes | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/ebpf.plugin/integrations/ebpf_sync.md b/collectors/ebpf.plugin/integrations/ebpf_sync.md new file mode 100644 index 000000000..024c3e30e --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_sync.md @@ -0,0 +1,156 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_sync.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF Sync" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF Sync + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: sync + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor syscall responsible to move data from memory to storage device. + +Attach tracing (kprobe, trampoline) to internal kernel functions according options used to compile kernel. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside kernel. Netada sets necessary permission during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT, CONFIG_HAVE_SYSCALL_TRACEPOINTS), files inside debugfs, and presence of BTF files to decide which eBPF program will be attached. 
+ +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +This thread will add overhead every time that an internal kernel function monitored by this thread is called. The estimated additional period of time is between 90-200ms per call on kernels that do not have BTF technology. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per eBPF Sync instance + +These metrics show total number of calls to functions inside kernel. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.file_sync | fsync, fdatasync | calls/s | +| mem.meory_map | msync | calls/s | +| mem.sync | sync, syncfs | calls/s | +| mem.file_segment | sync_file_range | calls/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ sync_freq ](https://github.com/netdata/netdata/blob/master/health/health.d/synchronization.conf) | mem.sync | number of sync() system calls. Every call causes all pending modifications to filesystem metadata and cached file data to be written to the underlying filesystems. | + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside /boot/config file. Some cited names can be different accoring preferences of Linux distributions. +When you do not have options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution, this last is preferred. The kernel compilation has a well definedd pattern, but distributions can deliver their configuration files +with different names. + +Now follow steps: +1. 
Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. Copy your new kernel image to the boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + +#### Debug Filesystem + +This thread needs to attach a tracepoint to monitor when a process schedules an exit event. To allow this specific feature, it is necessary to mount `debugfs` (`mount -t debugfs none /sys/kernel/debug`). + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/sync.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/sync.conf +``` +#### Options + +This configuration file has two different sections. The `[global]` section overrides all default options, while `[syscalls]` allows the user to select the syscalls to monitor. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| apps | Enable or disable integration with apps.plugin | no | False | +| cgroups | Enable or disable integration with cgroup.plugin | no | False | +| pid table size | Number of elements stored inside hash tables used to monitor calls per PID. | 32768 | False | +| ebpf type format | Define the file type to load an eBPF program. 
Three options are available: `legacy` (Attach only `kprobe`), `co-re` (Plugin tries to use `trampoline` when available), and `auto` (plugin check OS configuration before to load). | auto | False | +| ebpf co-re tracing | Select the attach method used by plugin when `co-re` is defined in previous option. Two options are available: `trampoline` (Option with lowest overhead), and `probe` (the same of legacy code). | trampoline | False | +| maps per core | Define how plugin will load their hash maps. When enabled (`yes`) plugin will load one hash table per core, instead to have centralized information. | yes | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | +| sync | Enable or disable monitoring for syscall `sync` | yes | False | +| msync | Enable or disable monitoring for syscall `msync` | yes | False | +| fsync | Enable or disable monitoring for syscall `fsync` | yes | False | +| fdatasync | Enable or disable monitoring for syscall `fdatasync` | yes | False | +| syncfs | Enable or disable monitoring for syscall `syncfs` | yes | False | +| sync_file_range | Enable or disable monitoring for syscall `sync_file_range` | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
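Although no example is provided, the two-section layout described under Options can be sketched from the table. The split below is an assumption: the page names the `[global]` and `[syscalls]` sections but does not say which option belongs where, so the per-syscall toggles are placed under `[syscalls]` and everything else under `[global]`. All values are the documented defaults.

```ini
# ebpf.d/sync.conf: illustrative sketch, section placement is an assumption
[global]
    update every = 5
    ebpf load mode = entry
    apps = no
    cgroups = no
    lifetime = 300

[syscalls]
    sync = yes
    msync = yes
    fsync = yes
    fdatasync = yes
    syncfs = yes
    sync_file_range = yes
```

Per the table, setting one of the syscall entries to `no` would disable monitoring for that syscall only.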
+ + diff --git a/collectors/ebpf.plugin/integrations/ebpf_vfs.md b/collectors/ebpf.plugin/integrations/ebpf_vfs.md new file mode 100644 index 000000000..aa8d82caa --- /dev/null +++ b/collectors/ebpf.plugin/integrations/ebpf_vfs.md @@ -0,0 +1,207 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/integrations/ebpf_vfs.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ebpf.plugin/metadata.yaml" +sidebar_label: "eBPF VFS" +learn_status: "Published" +learn_rel_path: "Data Collection/eBPF" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# eBPF VFS + + +<img src="https://netdata.cloud/img/ebpf.jpg" width="150"/> + + +Plugin: ebpf.plugin +Module: vfs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor I/O events on Linux Virtual Filesystem. + +Attach tracing (kprobe, trampoline) to internal kernel functions according options used to compile kernel. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid because it loads data inside kernel. Netada sets necessary permission during installation time. + +### Default Behavior + +#### Auto-Detection + +The plugin checks kernel compilation flags (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) and presence of BTF files to decide which eBPF program will be attached. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +This thread will add overhead every time that an internal kernel function monitored by this thread is called. The estimated additional period of time is between 90-200ms per call on kernels that do not have BTF technology. 
+ + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per cgroup + +These Metrics show grouped information per cgroup/service. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cgroup.vfs_unlink | delete | calls/s | +| cgroup.vfs_write | write | calls/s | +| cgroup.vfs_write_error | write | calls/s | +| cgroup.vfs_read | read | calls/s | +| cgroup.vfs_read_error | read | calls/s | +| cgroup.vfs_write_bytes | write | bytes/s | +| cgroup.vfs_read_bytes | read | bytes/s | +| cgroup.vfs_fsync | fsync | calls/s | +| cgroup.vfs_fsync_error | fsync | calls/s | +| cgroup.vfs_open | open | calls/s | +| cgroup.vfs_open_error | open | calls/s | +| cgroup.vfs_create | create | calls/s | +| cgroup.vfs_create_error | create | calls/s | +| services.vfs_unlink | a dimension per systemd service | calls/s | +| services.vfs_write | a dimension per systemd service | calls/s | +| services.vfs_write_error | a dimension per systemd service | calls/s | +| services.vfs_read | a dimension per systemd service | calls/s | +| services.vfs_read_error | a dimension per systemd service | calls/s | +| services.vfs_write_bytes | a dimension per systemd service | bytes/s | +| services.vfs_read_bytes | a dimension per systemd service | bytes/s | +| services.vfs_fsync | a dimension per systemd service | calls/s | +| services.vfs_fsync_error | a dimension per systemd service | calls/s | +| services.vfs_open | a dimension per systemd service | calls/s | +| services.vfs_open_error | a dimension per systemd service | calls/s | +| services.vfs_create | a dimension per systemd service | calls/s | +| services.vfs_create_error | a dimension per systemd service | calls/s | + +### Per eBPF VFS instance + +These Metrics show grouped information per cgroup/service. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| filesystem.vfs_deleted_objects | delete | calls/s | +| filesystem.vfs_io | read, write | calls/s | +| filesystem.vfs_io_bytes | read, write | bytes/s | +| filesystem.vfs_io_error | read, write | calls/s | +| filesystem.vfs_fsync | fsync | calls/s | +| filesystem.vfs_fsync_error | fsync | calls/s | +| filesystem.vfs_open | open | calls/s | +| filesystem.vfs_open_error | open | calls/s | +| filesystem.vfs_create | create | calls/s | +| filesystem.vfs_create_error | create | calls/s | + +### Per apps + +These Metrics show grouped information per apps group. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| apps.file_deleted | a dimension per app group | calls/s | +| apps.vfs_write_call | a dimension per app group | calls/s | +| apps.vfs_write_error | a dimension per app group | calls/s | +| apps.vfs_read_call | a dimension per app group | calls/s | +| apps.vfs_read_error | a dimension per app group | calls/s | +| apps.vfs_write_bytes | a dimension per app group | bytes/s | +| apps.vfs_read_bytes | a dimension per app group | bytes/s | +| apps.vfs_fsync | a dimension per app group | calls/s | +| apps.vfs_fsync_error | a dimension per app group | calls/s | +| apps.vfs_open | a dimension per app group | calls/s | +| apps.vfs_open_error | a dimension per app group | calls/s | +| apps.vfs_create | a dimension per app group | calls/s | +| apps.vfs_create_error | a dimension per app group | calls/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Compile kernel + +Check if your kernel was compiled with necessary options (CONFIG_KPROBES, CONFIG_BPF, CONFIG_BPF_SYSCALL, CONFIG_BPF_JIT) in `/proc/config.gz` or inside /boot/config file. Some cited names can be different accoring preferences of Linux distributions. 
+When you do not have options set, it is necessary to get the kernel source code from https://kernel.org or a kernel package from your distribution, this last is preferred. The kernel compilation has a well definedd pattern, but distributions can deliver their configuration files +with different names. + +Now follow steps: +1. Copy the configuration file to /usr/src/linux/.config. +2. Select the necessary options: make oldconfig +3. Compile your kernel image: make bzImage +4. Compile your modules: make modules +5. Copy your new kernel image for boot loader directory +6. Install the new modules: make modules_install +7. Generate an initial ramdisk image (`initrd`) if it is necessary. +8. Update your boot loader + + + +### Configuration + +#### File + +The configuration file name for this integration is `ebpf.d/vfs.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ebpf.d/vfs.conf +``` +#### Options + +All options are defined inside section `[global]`. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 5 | False | +| ebpf load mode | Define whether plugin will monitor the call (`entry`) for the functions or it will also monitor the return (`return`). | entry | False | +| apps | Enable or disable integration with apps.plugin | no | False | +| cgroups | Enable or disable integration with cgroup.plugin | no | False | +| pid table size | Number of elements stored inside hash tables used to monitor calls per PID. | 32768 | False | +| ebpf type format | Define the file type to load an eBPF program. 
Three options are available: `legacy` (Attach only `kprobe`), `co-re` (Plugin tries to use `trampoline` when available), and `auto` (plugin check OS configuration before to load). | auto | False | +| ebpf co-re tracing | Select the attach method used by plugin when `co-re` is defined in previous option. Two options are available: `trampoline` (Option with lowest overhead), and `probe` (the same of legacy code). | trampoline | False | +| maps per core | Define how plugin will load their hash maps. When enabled (`yes`) plugin will load one hash table per core, instead to have centralized information. | yes | False | +| lifetime | Set default lifetime for thread when enabled by cloud. | 300 | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/freebsd.plugin/integrations/dev.cpu.0.freq.md b/collectors/freebsd.plugin/integrations/dev.cpu.0.freq.md new file mode 100644 index 000000000..4172aec21 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/dev.cpu.0.freq.md @@ -0,0 +1,110 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/dev.cpu.0.freq.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "dev.cpu.0.freq" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# dev.cpu.0.freq + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: dev.cpu.0.freq + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Read current CPU Scaling frequency. + +Current CPU Scaling Frequency + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. 
+ + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per dev.cpu.0.freq instance + +The metric shows status of CPU frequency, it is direct affected by system load. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cpu.scaling_cur_freq | frequency | MHz | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `Config options`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config Config options +``` +#### Options + + + +<details><summary></summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| dev.cpu.0.freq | Enable or disable CPU Scaling frequency metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/freebsd.plugin/integrations/dev.cpu.temperature.md b/collectors/freebsd.plugin/integrations/dev.cpu.temperature.md new file mode 100644 index 000000000..20a1a0256 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/dev.cpu.temperature.md @@ -0,0 +1,119 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/dev.cpu.temperature.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "dev.cpu.temperature" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# dev.cpu.temperature + + +<img src="https://netdata.cloud/img/freebsd.org" width="150"/> + + +Plugin: freebsd.plugin +Module: dev.cpu.temperature + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Get current CPU temperature + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per dev.cpu.temperature instance + +This metric show latest CPU temperature. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cpu.temperature | a dimension per core | Celsius | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| dev.cpu.temperature | Enable or disable CPU temperature metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
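No example is listed, but the File and Options sections above give the file (`netdata.conf`), the section (`[plugin:freebsd]`), and the single option. A minimal sketch that restates the documented default would be:

```ini
# netdata.conf: illustrative sketch using the documented default
[plugin:freebsd]
    dev.cpu.temperature = yes
```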
+ + diff --git a/collectors/freebsd.plugin/integrations/devstat.md b/collectors/freebsd.plugin/integrations/devstat.md new file mode 100644 index 000000000..bb578d1dd --- /dev/null +++ b/collectors/freebsd.plugin/integrations/devstat.md @@ -0,0 +1,154 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/devstat.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "devstat" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# devstat + + +<img src="https://netdata.cloud/img/hard-drive.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: devstat + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information per hard disk available on host. + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per devstat instance + +These metrics give a general vision about I/O events on disks. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.io | io, out | KiB/s | + +### Per disk + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| disk.io | reads, writes, frees | KiB/s | +| disk.ops | reads, writes, other, frees | operations/s | +| disk.qops | operations | operations | +| disk.util | utilization | % of time working | +| disk.iotime | reads, writes, other, frees | milliseconds/s | +| disk.await | reads, writes, other, frees | milliseconds/operation | +| disk.avgsz | reads, writes, frees | KiB/operation | +| disk.svctm | svctm | milliseconds/operation | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 10min_disk_utilization ](https://github.com/netdata/netdata/blob/master/health/health.d/disks.conf) | disk.util | average percentage of time ${label:device} disk was busy over the last 10 minutes | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:kern.devstat]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). 
+ +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| enable new disks detected at runtime | Enable or disable possibility to detect new disks. | auto | False | +| performance metrics for pass devices | Enable or disable metrics for disks with type `PASS`. | auto | False | +| total bandwidth for all disks | Enable or disable total bandwidth metric for all disks. | yes | False | +| bandwidth for all disks | Enable or disable bandwidth for all disks metric. | auto | False | +| operations for all disks | Enable or disable operations for all disks metric. | auto | False | +| queued operations for all disks | Enable or disable queued operations for all disks metric. | auto | False | +| utilization percentage for all disks | Enable or disable utilization percentage for all disks metric. | auto | False | +| i/o time for all disks | Enable or disable I/O time for all disks metric. | auto | False | +| average completed i/o time for all disks | Enable or disable average completed I/O time for all disks metric. | auto | False | +| average completed i/o bandwidth for all disks | Enable or disable average completed I/O bandwidth for all disks metric. | auto | False | +| average service time for all disks | Enable or disable average service time for all disks metric. | auto | False | +| disable by default disks matching | Do not create charts for disks listed. | | False | + +</details> + +#### Examples +There are no configuration examples. 
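
Although no examples ship with this integration, a minimal sketch of the `[plugin:freebsd:kern.devstat]` section might look like the following. The option names are taken from the options table above; the values are illustrative assumptions, and `no` is assumed to be the accepted counterpart of the `yes` and `auto` defaults listed there.

```ini
[plugin:freebsd:kern.devstat]
    # Illustrative values only; option names come from the options table.
    performance metrics for pass devices = no
    total bandwidth for all disks = yes
    utilization percentage for all disks = yes
    queued operations for all disks = auto
```

Options that are left out keep the defaults shown in the table.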
+ + diff --git a/collectors/freebsd.plugin/integrations/getifaddrs.md b/collectors/freebsd.plugin/integrations/getifaddrs.md new file mode 100644 index 000000000..86950622e --- /dev/null +++ b/collectors/freebsd.plugin/integrations/getifaddrs.md @@ -0,0 +1,160 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/getifaddrs.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "getifaddrs" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# getifaddrs + + +<img src="https://netdata.cloud/img/network.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: getifaddrs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect traffic per network interface. + +The plugin calls `getifaddrs` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per getifaddrs instance + +General overview about network traffic. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.net | received, sent | kilobits/s | +| system.packets | received, sent, multicast_received, multicast_sent | packets/s | +| system.ipv4 | received, sent | kilobits/s | +| system.ipv6 | received, sent | kilobits/s | + +### Per network device + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| net.net | received, sent | kilobits/s | +| net.packets | received, sent, multicast_received, multicast_sent | packets/s | +| net.errors | inbound, outbound | errors/s | +| net.drops | inbound, outbound | drops/s | +| net.events | collisions | events/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ interface_speed ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.net | network interface ${label:device} current speed | +| [ inbound_packets_dropped_ratio ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.drops | ratio of inbound dropped packets for the network interface ${label:device} over the last 10 minutes | +| [ outbound_packets_dropped_ratio ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.drops | ratio of outbound dropped packets for the network interface ${label:device} over the last 10 minutes | +| [ 1m_received_packets_rate ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.packets | average number of packets received by the network interface ${label:device} over the last minute | +| [ 10s_received_packets_storm ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.packets | ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute | +| [ interface_inbound_errors 
](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.errors | number of inbound errors for the network interface ${label:device} in the last 10 minutes | +| [ interface_outbound_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.errors | number of outbound errors for the network interface ${label:device} in the last 10 minutes | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:getifaddrs]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| enable new interfaces detected at runtime | Enable or disable possibility to discover new interface after plugin starts. | auto | False | +| total bandwidth for physical interfaces | Enable or disable total bandwidth for physical interfaces metric. | auto | False | +| total packets for physical interfaces | Enable or disable total packets for physical interfaces metric. | auto | False | +| total bandwidth for ipv4 interface | Enable or disable total bandwidth for IPv4 interface metric. | auto | False | +| total bandwidth for ipv6 interfaces | Enable or disable total bandwidth for ipv6 interfaces metric. 
| auto | False | +| bandwidth for all interfaces | Enable or disable bandwidth for all interfaces metric. | auto | False | +| packets for all interfaces | Enable or disable packets for all interfaces metric. | auto | False | +| errors for all interfaces | Enable or disable errors for all interfaces metric. | auto | False | +| drops for all interfaces | Enable or disable drops for all interfaces metric. | auto | False | +| collisions for all interface | Enable or disable collisions for all interface metric. | auto | False | +| disable by default interfaces matching | Do not display data for interfaces listed. | lo* | False | +| set physical interfaces for system.net | Do not show network traffic for listed interfaces. | igb* ix* cxl* em* ixl* ixlv* bge* ixgbe* vtnet* vmx* re* igc* dwc* | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/freebsd.plugin/integrations/getmntinfo.md b/collectors/freebsd.plugin/integrations/getmntinfo.md new file mode 100644 index 000000000..43c1df165 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/getmntinfo.md @@ -0,0 +1,130 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/getmntinfo.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "getmntinfo" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# getmntinfo + + +<img src="https://netdata.cloud/img/hard-drive.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: getmntinfo + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information per mount point. + +The plugin calls `getmntinfo` function to collect necessary data. + +This collector is supported on all platforms.
+ +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per mount point + +These metrics show details about mount point usage. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| disk.space | avail, used, reserved_for_root | GiB | +| disk.inodes | avail, used, reserved_for_root | inodes | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ disk_space_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/disks.conf) | disk.space | disk ${label:mount_point} space utilization | +| [ disk_inode_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/disks.conf) | disk.inodes | disk ${label:mount_point} inode utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:getmntinfo]` section within that file. + +The file format is a modified INI syntax.
The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| enable new mount points detected at runtime | Check for new mount points during runtime. | auto | False | +| space usage for all disks | Enable or disable space usage for all disks metric. | auto | False | +| inodes usage for all disks | Enable or disable inodes usage for all disks metric. | auto | False | +| exclude space metrics on paths | Do not show metrics for listed paths. | /proc/* | False | +| exclude space metrics on filesystems | Do not monitor listed filesystems. | autofs procfs subfs devfs none | False | + +</details> + +#### Examples +There are no configuration examples.
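
Although no examples ship with this integration, a minimal sketch of the `[plugin:freebsd:getmntinfo]` section might look like the following. The option names and the `/proc/*` default come from the options table above. The `/mnt/backup/*` pattern is hypothetical, and the space-separated list form is an assumption inferred from the space-separated filesystem default in the same table.

```ini
[plugin:freebsd:getmntinfo]
    # Illustrative values only; option names come from the options table.
    space usage for all disks = yes
    inodes usage for all disks = no
    # Default pattern plus a hypothetical extra path (assumed space-separated).
    exclude space metrics on paths = /proc/* /mnt/backup/*
```

Options that are left out keep the defaults shown in the table.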
+ + diff --git a/collectors/freebsd.plugin/integrations/hw.intrcnt.md b/collectors/freebsd.plugin/integrations/hw.intrcnt.md new file mode 100644 index 000000000..5865a6f15 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/hw.intrcnt.md @@ -0,0 +1,120 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/hw.intrcnt.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "hw.intrcnt" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# hw.intrcnt + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: hw.intrcnt + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Get total number of interrupts + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per hw.intrcnt instance + +These metrics show system interrupts frequency. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.intr | interrupts | interrupts/s | +| system.interrupts | a dimension per interrupt | interrupts/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config option</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| hw.intrcnt | Enable or disable Interrupts metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
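
Although no examples ship with this integration, a minimal sketch based on the section and option named above might look like the following. The value `no` is an assumption, taken to be the accepted counterpart of the `yes` default listed in the table.

```ini
[plugin:freebsd]
    # Turns the interrupts charts off; the default in the table is yes.
    hw.intrcnt = no
```

This module has no dedicated section; its single switch sits in the shared `[plugin:freebsd]` section.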
+ + diff --git a/collectors/freebsd.plugin/integrations/ipfw.md b/collectors/freebsd.plugin/integrations/ipfw.md new file mode 100644 index 000000000..4bd4d120b --- /dev/null +++ b/collectors/freebsd.plugin/integrations/ipfw.md @@ -0,0 +1,125 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/ipfw.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "ipfw" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# ipfw + + +<img src="https://netdata.cloud/img/firewall.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: ipfw + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information about the FreeBSD firewall. + +The plugin uses a raw socket to communicate with the kernel and collect data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per ipfw instance + +These metrics show FreeBSD firewall statistics. + +This scope has no labels.
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipfw.mem | dynamic, static | bytes | +| ipfw.packets | a dimension per static rule | packets/s | +| ipfw.bytes | a dimension per static rule | bytes/s | +| ipfw.active | a dimension per dynamic rule | rules | +| ipfw.expired | a dimension per dynamic rule | rules | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:ipfw]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| counters for static rules | Enable or disable counters for static rules metric. | yes | False | +| number of dynamic rules | Enable or disable number of dynamic rules metric. | yes | False | +| allocated memory | Enable or disable allocated memory metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
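
Although no examples ship with this integration, a minimal sketch of the `[plugin:freebsd:ipfw]` section might look like the following. The option names come from the options table above; the values are illustrative assumptions, and `no` is assumed to be the accepted counterpart of the `yes` default listed there.

```ini
[plugin:freebsd:ipfw]
    # Illustrative values only; option names come from the options table.
    counters for static rules = yes
    number of dynamic rules = no
    allocated memory = yes
```

Options that are left out keep the defaults shown in the table.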
+ + diff --git a/collectors/freebsd.plugin/integrations/kern.cp_time.md b/collectors/freebsd.plugin/integrations/kern.cp_time.md new file mode 100644 index 000000000..8c509671b --- /dev/null +++ b/collectors/freebsd.plugin/integrations/kern.cp_time.md @@ -0,0 +1,138 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/kern.cp_time.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "kern.cp_time" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# kern.cp_time + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: kern.cp_time + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Total CPU utilization + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per kern.cp_time instance + +These metrics show CPU usage statistics. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.cpu | nice, system, user, interrupt, idle | percentage | + +### Per core + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cpu.cpu | nice, system, user, interrupt, idle | percentage | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cpu.conf) | system.cpu | average CPU utilization over the last 10 minutes (excluding iowait, nice and steal) | +| [ 10min_cpu_iowait ](https://github.com/netdata/netdata/blob/master/health/health.d/cpu.conf) | system.cpu | average CPU iowait time over the last 10 minutes | +| [ 20min_steal_cpu ](https://github.com/netdata/netdata/blob/master/health/health.d/cpu.conf) | system.cpu | average CPU steal time over the last 20 minutes | +| [ 10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cpu.conf) | system.cpu | average CPU utilization over the last 10 minutes (excluding nice) | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + +The netdata main configuration file. 
+ +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| kern.cp_time | Enable or disable Total CPU usage. | yes | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/freebsd.plugin/integrations/kern.ipc.msq.md b/collectors/freebsd.plugin/integrations/kern.ipc.msq.md new file mode 100644 index 000000000..56fb11002 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/kern.ipc.msq.md @@ -0,0 +1,121 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/kern.ipc.msq.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "kern.ipc.msq" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# kern.ipc.msq + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: kern.ipc.msq + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect number of IPC message Queues + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + +### Per kern.ipc.msq instance + +These metrics show IPC message queue statistics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.ipc_msq_queues | queues | queues | +| system.ipc_msq_messages | messages | messages | +| system.ipc_msq_size | allocated, used | bytes | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| kern.ipc.msq | Enable or disable IPC message queue metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples.
+ + diff --git a/collectors/freebsd.plugin/integrations/kern.ipc.sem.md b/collectors/freebsd.plugin/integrations/kern.ipc.sem.md new file mode 100644 index 000000000..6dc7d15a1 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/kern.ipc.sem.md @@ -0,0 +1,126 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/kern.ipc.sem.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "kern.ipc.sem" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# kern.ipc.sem + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: kern.ipc.sem + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information about semaphores. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per kern.ipc.sem instance + +These metrics show counters for semaphores on the host. + +This scope has no labels.
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.ipc_semaphores | semaphores | semaphores | +| system.ipc_semaphore_arrays | arrays | arrays | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ semaphores_used ](https://github.com/netdata/netdata/blob/master/health/health.d/ipc.conf) | system.ipc_semaphores | IPC semaphore utilization | +| [ semaphore_arrays_used ](https://github.com/netdata/netdata/blob/master/health/health.d/ipc.conf) | system.ipc_semaphore_arrays | IPC semaphore arrays utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| kern.ipc.sem | Enable or disable semaphore metrics. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/freebsd.plugin/integrations/kern.ipc.shm.md b/collectors/freebsd.plugin/integrations/kern.ipc.shm.md new file mode 100644 index 000000000..e29aa9617 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/kern.ipc.shm.md @@ -0,0 +1,120 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/kern.ipc.shm.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "kern.ipc.shm" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# kern.ipc.shm + + +<img src="https://netdata.cloud/img/memory.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: kern.ipc.shm + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect shared memory information. + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per kern.ipc.shm instance + +These metrics give status about current shared memory segments. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.ipc_shared_mem_segs | segments | segments | +| system.ipc_shared_mem_size | allocated | KiB | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| kern.ipc.shm | Enable or disable shared memory metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/freebsd.plugin/integrations/net.inet.icmp.stats.md b/collectors/freebsd.plugin/integrations/net.inet.icmp.stats.md new file mode 100644 index 000000000..ad3b7c2c1 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/net.inet.icmp.stats.md @@ -0,0 +1,123 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/net.inet.icmp.stats.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "net.inet.icmp.stats" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# net.inet.icmp.stats + + +<img src="https://netdata.cloud/img/network.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: net.inet.icmp.stats + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information about ICMP traffic. + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per net.inet.icmp.stats instance + +These metrics show ICMP connections statistics. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipv4.icmp | received, sent | packets/s | +| ipv4.icmp_errors | InErrors, OutErrors, InCsumErrors | packets/s | +| ipv4.icmpmsg | InEchoReps, OutEchoReps, InEchos, OutEchos | packets/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:net.inet.icmp.stats]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| IPv4 ICMP packets | Enable or disable IPv4 ICMP packets metric. | yes | False | +| IPv4 ICMP error | Enable or disable IPv4 ICMP error metric. | yes | False | +| IPv4 ICMP messages | Enable or disable IPv4 ICMP messages metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
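The net.inet.icmp.stats page lists three options but no example. A hedged sketch, assuming each name in the options table is accepted verbatim as a key in the `[plugin:freebsd:net.inet.icmp.stats]` section, that keeps the packet chart and turns off the other two:

```ini
# Sketch only: keys are copied from the options table and assumed to be accepted as written.
[plugin:freebsd:net.inet.icmp.stats]
    IPv4 ICMP packets = yes
    IPv4 ICMP error = no
    IPv4 ICMP messages = no
```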
+ + diff --git a/collectors/freebsd.plugin/integrations/net.inet.ip.stats.md b/collectors/freebsd.plugin/integrations/net.inet.ip.stats.md new file mode 100644 index 000000000..e3a048391 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/net.inet.ip.stats.md @@ -0,0 +1,125 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/net.inet.ip.stats.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "net.inet.ip.stats" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# net.inet.ip.stats + + +<img src="https://netdata.cloud/img/network.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: net.inet.ip.stats + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect IP stats + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per net.inet.ip.stats instance + +These metrics show IPv4 connections statistics. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipv4.packets | received, sent, forwarded, delivered | packets/s | +| ipv4.fragsout | ok, failed, created | packets/s | +| ipv4.fragsin | ok, failed, all | packets/s | +| ipv4.errors | InDiscards, OutDiscards, InHdrErrors, OutNoRoutes, InAddrErrors, InUnknownProtos | packets/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:net.inet.ip.stats]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| ipv4 packets | Enable or disable IPv4 packets metric. | yes | False | +| ipv4 fragments sent | Enable or disable IPv4 fragments sent metric. | yes | False | +| ipv4 fragments assembly | Enable or disable IPv4 fragments assembly metric. | yes | False | +| ipv4 errors | Enable or disable IPv4 errors metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
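For net.inet.ip.stats, the same pattern would apply to the four options in the table. The following sketch assumes the lowercase `ipv4 ...` names are the literal keys and disables only the two fragment charts:

```ini
# Sketch only: option names are taken from the table above; their exact spelling as keys is an assumption.
[plugin:freebsd:net.inet.ip.stats]
    ipv4 packets = yes
    ipv4 fragments sent = no
    ipv4 fragments assembly = no
    ipv4 errors = yes
```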
+ + diff --git a/collectors/freebsd.plugin/integrations/net.inet.tcp.states.md b/collectors/freebsd.plugin/integrations/net.inet.tcp.states.md new file mode 100644 index 000000000..958d413af --- /dev/null +++ b/collectors/freebsd.plugin/integrations/net.inet.tcp.states.md @@ -0,0 +1,124 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/net.inet.tcp.states.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "net.inet.tcp.states" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# net.inet.tcp.states + + +<img src="https://netdata.cloud/img/network.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: net.inet.tcp.states + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + + + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per net.inet.tcp.states instance + +A counter for TCP connections. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipv4.tcpsock | connections | active connections | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ tcp_connections ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_conn.conf) | ipv4.tcpsock | IPv4 TCP connections utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| net.inet.tcp.states | Enable or disable TCP state metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/freebsd.plugin/integrations/net.inet.tcp.stats.md b/collectors/freebsd.plugin/integrations/net.inet.tcp.stats.md new file mode 100644 index 000000000..9aaca2656 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/net.inet.tcp.stats.md @@ -0,0 +1,141 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/net.inet.tcp.stats.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "net.inet.tcp.stats" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# net.inet.tcp.stats + + +<img src="https://netdata.cloud/img/network.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: net.inet.tcp.stats + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect overall information about TCP connections. + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per net.inet.tcp.stats instance + +These metrics show TCP connections statistics. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipv4.tcppackets | received, sent | packets/s | +| ipv4.tcperrors | InErrs, InCsumErrors, RetransSegs | packets/s | +| ipv4.tcphandshake | EstabResets, ActiveOpens, PassiveOpens, AttemptFails | events/s | +| ipv4.tcpconnaborts | baddata, userclosed, nomemory, timeout, linger | connections/s | +| ipv4.tcpofo | inqueue | packets/s | +| ipv4.tcpsyncookies | received, sent, failed | packets/s | +| ipv4.tcplistenissues | overflows | packets/s | +| ipv4.ecnpkts | InCEPkts, InECT0Pkts, InECT1Pkts, OutECT0Pkts, OutECT1Pkts | packets/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 1m_ipv4_tcp_resets_sent ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf) | ipv4.tcphandshake | average number of sent TCP RESETS over the last minute | +| [ 10s_ipv4_tcp_resets_sent ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf) | ipv4.tcphandshake | average number of sent TCP RESETS over the last 10 seconds. This can indicate a port scan, or that a service running on this host has crashed. Netdata will not send a clear notification for this alarm. | +| [ 1m_ipv4_tcp_resets_received ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf) | ipv4.tcphandshake | average number of received TCP RESETS over the last minute | +| [ 10s_ipv4_tcp_resets_received ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf) | ipv4.tcphandshake | average number of received TCP RESETS over the last 10 seconds. This can be an indication that a service this host needs has crashed. Netdata will not send a clear notification for this alarm. | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. 
+Configuration for this specific integration is located in the `[plugin:freebsd:net.inet.tcp.stats]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| ipv4 TCP packets | Enable or disable ipv4 TCP packets metric. | yes | False | +| ipv4 TCP errors | Enable or disable ipv4 TCP errors metric. | yes | False | +| ipv4 TCP handshake issues | Enable or disable ipv4 TCP handshake issue metric. | yes | False | +| TCP connection aborts | Enable or disable TCP connection aborts metric. | auto | False | +| TCP out-of-order queue | Enable or disable TCP out-of-order queue metric. | auto | False | +| TCP SYN cookies | Enable or disable TCP SYN cookies metric. | auto | False | +| TCP listen issues | Enable or disable TCP listen issues metric. | auto | False | +| ECN packets | Enable or disable ECN packets metric. | auto | False | + +</details> + +#### Examples +There are no configuration examples.
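The net.inet.tcp.stats options mix two defaults, `yes` and `auto`. As an illustrative sketch, assuming the table names are the literal keys and that `yes`, `no`, and `auto` are the accepted values, a section that pins some of the `auto` options to explicit values could look like this:

```ini
# Sketch only: key names come from the options table; the accepted values are an assumption.
[plugin:freebsd:net.inet.tcp.stats]
    ipv4 TCP packets = yes
    ipv4 TCP errors = yes
    # The table lists "auto" as the default for the options below.
    TCP SYN cookies = yes
    TCP listen issues = yes
    ECN packets = no
```

Options left out of the section would presumably keep the defaults shown in the table.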
+ + diff --git a/collectors/freebsd.plugin/integrations/net.inet.udp.stats.md b/collectors/freebsd.plugin/integrations/net.inet.udp.stats.md new file mode 100644 index 000000000..6e9d4a7e0 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/net.inet.udp.stats.md @@ -0,0 +1,127 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/net.inet.udp.stats.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "net.inet.udp.stats" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# net.inet.udp.stats + + +<img src="https://netdata.cloud/img/network.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: net.inet.udp.stats + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information about UDP connections. + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per net.inet.udp.stats instance + +These metrics show UDP connections statistics. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipv4.udppackets | received, sent | packets/s | +| ipv4.udperrors | InErrors, NoPorts, RcvbufErrors, InCsumErrors, IgnoredMulti | events/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 1m_ipv4_udp_receive_buffer_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/udp_errors.conf) | ipv4.udperrors | average number of UDP receive buffer errors over the last minute | +| [ 1m_ipv4_udp_send_buffer_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/udp_errors.conf) | ipv4.udperrors | average number of UDP send buffer errors over the last minute | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:net.inet.udp.stats]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| ipv4 UDP packets | Enable or disable ipv4 UDP packets metric. | yes | False | +| ipv4 UDP errors | Enable or disable ipv4 UDP errors metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
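The net.inet.udp.stats page exposes only two options. A minimal sketch, again assuming the names in the table are used verbatim as keys, that keeps the error chart (the one the listed UDP alerts attach to) and drops the packet chart:

```ini
# Sketch only: keys mirror the options table; treat the exact spelling as an assumption.
[plugin:freebsd:net.inet.udp.stats]
    ipv4 UDP packets = no
    ipv4 UDP errors = yes
```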
+ + diff --git a/collectors/freebsd.plugin/integrations/net.inet6.icmp6.stats.md b/collectors/freebsd.plugin/integrations/net.inet6.icmp6.stats.md new file mode 100644 index 000000000..b10088759 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/net.inet6.icmp6.stats.md @@ -0,0 +1,131 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/net.inet6.icmp6.stats.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "net.inet6.icmp6.stats" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# net.inet6.icmp6.stats + + +<img src="https://netdata.cloud/img/network.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: net.inet6.icmp6.stats + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information about IPv6 ICMP + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per net.inet6.icmp6.stats instance + +Collect IPv6 ICMP traffic statistics. + +This scope has no labels.
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipv6.icmp | received, sent | messages/s | +| ipv6.icmpredir | received, sent | redirects/s | +| ipv6.icmperrors | InErrors, OutErrors, InCsumErrors, InDestUnreachs, InPktTooBigs, InTimeExcds, InParmProblems, OutDestUnreachs, OutTimeExcds, OutParmProblems | errors/s | +| ipv6.icmpechos | InEchos, OutEchos, InEchoReplies, OutEchoReplies | messages/s | +| ipv6.icmprouter | InSolicits, OutSolicits, InAdvertisements, OutAdvertisements | messages/s | +| ipv6.icmpneighbor | InSolicits, OutSolicits, InAdvertisements, OutAdvertisements | messages/s | +| ipv6.icmptypes | InType1, InType128, InType129, InType136, OutType1, OutType128, OutType129, OutType133, OutType135, OutType143 | messages/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:net.inet6.icmp6.stats]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| icmp | Enable or disable ICMP metric. | auto | False | +| icmp redirects | Enable or disable ICMP redirects metric. 
| auto | False | +| icmp errors | Enable or disable ICMP errors metric. | auto | False | +| icmp echos | Enable or disable ICMP echos metric. | auto | False | +| icmp router | Enable or disable ICMP router metric. | auto | False | +| icmp neighbor | Enable or disable ICMP neighbor metric. | auto | False | +| icmp types | Enable or disable ICMP types metric. | auto | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/freebsd.plugin/integrations/net.inet6.ip6.stats.md b/collectors/freebsd.plugin/integrations/net.inet6.ip6.stats.md new file mode 100644 index 000000000..f282bb972 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/net.inet6.ip6.stats.md @@ -0,0 +1,125 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/net.inet6.ip6.stats.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "net.inet6.ip6.stats" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# net.inet6.ip6.stats + + +<img src="https://netdata.cloud/img/network.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: net.inet6.ip6.stats + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information about IPv6 stats. + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection.
+ +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per net.inet6.ip6.stats instance + +These metrics show general information about IPv6 connections. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipv6.packets | received, sent, forwarded, delivers | packets/s | +| ipv6.fragsout | ok, failed, all | packets/s | +| ipv6.fragsin | ok, failed, timeout, all | packets/s | +| ipv6.errors | InDiscards, OutDiscards, InHdrErrors, InAddrErrors, InTruncatedPkts, InNoRoutes, OutNoRoutes | packets/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:net.inet6.ip6.stats]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| ipv6 packets | Enable or disable ipv6 packet metric. 
| auto | False | +| ipv6 fragments sent | Enable or disable ipv6 fragments sent metric. | auto | False | +| ipv6 fragments assembly | Enable or disable ipv6 fragments assembly metric. | auto | False | +| ipv6 errors | Enable or disable ipv6 errors metric. | auto | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/freebsd.plugin/integrations/net.isr.md b/collectors/freebsd.plugin/integrations/net.isr.md new file mode 100644 index 000000000..a378ea30f --- /dev/null +++ b/collectors/freebsd.plugin/integrations/net.isr.md @@ -0,0 +1,139 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/net.isr.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "net.isr" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# net.isr + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: net.isr + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information about system softnet stat. + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. 
+ +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per net.isr instance + +These metrics show statistics about softnet stats. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.softnet_stat | dispatched, hybrid_dispatched, qdrops, queued | events/s | + +### Per core + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cpu.softnet_stat | dispatched, hybrid_dispatched, qdrops, queued | events/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 1min_netdev_backlog_exceeded ](https://github.com/netdata/netdata/blob/master/health/health.d/softnet.conf) | system.softnet_stat | average number of dropped packets in the last minute due to exceeded net.core.netdev_max_backlog | +| [ 1min_netdev_budget_ran_outs ](https://github.com/netdata/netdata/blob/master/health/health.d/softnet.conf) | system.softnet_stat | average number of times ksoftirq ran out of sysctl net.core.netdev_budget or net.core.netdev_budget_usecs with work remaining over the last minute (this can be a cause for dropped packets) | +| [ 10min_netisr_backlog_exceeded ](https://github.com/netdata/netdata/blob/master/health/health.d/softnet.conf) | system.softnet_stat | average number of drops in the last minute due to exceeded sysctl net.route.netisr_maxqlen (this can be a cause for dropped packets) | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:net.isr]` section within that file. + +The file format is a modified INI syntax. 
The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| netisr | Enable or disable general vision about softnet stat metrics. | yes | False | +| netisr per core | Enable or disable softnet stat metric per core. | yes | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/freebsd.plugin/integrations/system.ram.md b/collectors/freebsd.plugin/integrations/system.ram.md new file mode 100644 index 000000000..a1163e9ca --- /dev/null +++ b/collectors/freebsd.plugin/integrations/system.ram.md @@ -0,0 +1,128 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/system.ram.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "system.ram" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# system.ram + + +<img src="https://netdata.cloud/img/memory.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: system.ram + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Show information about system memory usage. + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. 
+ +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per system.ram instance + +This metric shows RAM usage statistics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.ram | free, active, inactive, wired, cache, laundry, buffers | MiB | +| mem.available | avail | MiB | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/ram.conf) | system.ram | system memory utilization | +| [ ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/ram.conf) | system.ram | system memory utilization | +| [ ram_available ](https://github.com/netdata/netdata/blob/master/health/health.d/ram.conf) | mem.available | percentage of estimated amount of RAM available for userspace processes, without causing swapping | +| [ ram_available ](https://github.com/netdata/netdata/blob/master/health/health.d/ram.conf) | mem.available | percentage of estimated amount of RAM available for userspace processes, without causing swapping | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. 
+Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| system.ram | Enable or disable system RAM metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/freebsd.plugin/integrations/uptime.md b/collectors/freebsd.plugin/integrations/uptime.md new file mode 100644 index 000000000..1674be026 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/uptime.md @@ -0,0 +1,119 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/uptime.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "uptime" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# uptime + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: uptime + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Show period of time server is up. + +The plugin calls `clock_gettime` function to collect necessary data. + +This collector is supported on all platforms. 
+ +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per uptime instance + +How long the system is running. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.uptime | uptime | seconds | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| vm.loadavg | Enable or disable load average metric. 
| yes | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/freebsd.plugin/integrations/vm.loadavg.md b/collectors/freebsd.plugin/integrations/vm.loadavg.md new file mode 100644 index 000000000..a934cf8f4 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/vm.loadavg.md @@ -0,0 +1,127 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/vm.loadavg.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "vm.loadavg" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# vm.loadavg + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: vm.loadavg + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +System Load Average + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per vm.loadavg instance + +Monitoring for number of threads running or waiting. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.load | load1, load5, load15 | load | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ load_cpu_number ](https://github.com/netdata/netdata/blob/master/health/health.d/load.conf) | system.load | number of active CPU cores in the system | +| [ load_average_15 ](https://github.com/netdata/netdata/blob/master/health/health.d/load.conf) | system.load | system fifteen-minute load average | +| [ load_average_5 ](https://github.com/netdata/netdata/blob/master/health/health.d/load.conf) | system.load | system five-minute load average | +| [ load_average_1 ](https://github.com/netdata/netdata/blob/master/health/health.d/load.conf) | system.load | system one-minute load average | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| vm.loadavg | Enable or disable load average metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
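A minimal sketch of how the option in the table above could be set, assuming the `[plugin:freebsd]` section and the `vm.loadavg` option name are exactly as documented here. The documented default is `yes`; `no` is shown only to illustrate disabling the chart.

```ini
# netdata.conf (illustrative fragment, not part of the generated file)
[plugin:freebsd]
    # documented default is "yes"; "no" turns the load average chart off
    vm.loadavg = no
```

A restart of the Netdata Agent is typically needed before a change in `netdata.conf` takes effect.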
+ + diff --git a/collectors/freebsd.plugin/integrations/vm.stats.sys.v_intr.md b/collectors/freebsd.plugin/integrations/vm.stats.sys.v_intr.md new file mode 100644 index 000000000..58a0d4ced --- /dev/null +++ b/collectors/freebsd.plugin/integrations/vm.stats.sys.v_intr.md @@ -0,0 +1,119 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/vm.stats.sys.v_intr.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "vm.stats.sys.v_intr" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# vm.stats.sys.v_intr + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: vm.stats.sys.v_intr + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Device interrupts + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per vm.stats.sys.v_intr instance + +This metric shows device interrupt frequency. + +This scope has no labels.
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.dev_intr | interrupts | interrupts/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config option</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| vm.stats.sys.v_intr | Enable or disable device interrupts metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/freebsd.plugin/integrations/vm.stats.sys.v_soft.md b/collectors/freebsd.plugin/integrations/vm.stats.sys.v_soft.md new file mode 100644 index 000000000..bc934239c --- /dev/null +++ b/collectors/freebsd.plugin/integrations/vm.stats.sys.v_soft.md @@ -0,0 +1,119 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/vm.stats.sys.v_soft.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "vm.stats.sys.v_soft" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# vm.stats.sys.v_soft + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: vm.stats.sys.v_soft + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Software Interrupt + +vm.stats.sys.v_soft + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per vm.stats.sys.v_soft instance + +This metric shows software interrupt frequency. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.soft_intr | interrupts | interrupts/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config option</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| vm.stats.sys.v_soft | Enable or disable software interrupts metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples.
+ + diff --git a/collectors/freebsd.plugin/integrations/vm.stats.sys.v_swtch.md b/collectors/freebsd.plugin/integrations/vm.stats.sys.v_swtch.md new file mode 100644 index 000000000..4e16a2869 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/vm.stats.sys.v_swtch.md @@ -0,0 +1,120 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/vm.stats.sys.v_swtch.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "vm.stats.sys.v_swtch" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# vm.stats.sys.v_swtch + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: vm.stats.sys.v_swtch + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +CPU context switch + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per vm.stats.sys.v_swtch instance + +This metric counts the number of context switches happening on the host. + +This scope has no labels.
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.ctxt | switches | context switches/s | +| system.forks | started | processes/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| vm.stats.sys.v_swtch | Enable or disable CPU context switch metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/freebsd.plugin/integrations/vm.stats.vm.v_pgfaults.md b/collectors/freebsd.plugin/integrations/vm.stats.vm.v_pgfaults.md new file mode 100644 index 000000000..5c79c3443 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/vm.stats.vm.v_pgfaults.md @@ -0,0 +1,119 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/vm.stats.vm.v_pgfaults.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "vm.stats.vm.v_pgfaults" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# vm.stats.vm.v_pgfaults + + +<img src="https://netdata.cloud/img/memory.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: vm.stats.vm.v_pgfaults + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect memory page fault events. + +The plugin calls `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per vm.stats.vm.v_pgfaults instance + +The number of page faults that happened on the host. + +This scope has no labels.
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.pgfaults | memory, io_requiring, cow, cow_optimized, in_transit | page faults/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| vm.stats.vm.v_pgfaults | Enable or disable Memory page fault metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/freebsd.plugin/integrations/vm.stats.vm.v_swappgs.md b/collectors/freebsd.plugin/integrations/vm.stats.vm.v_swappgs.md new file mode 100644 index 000000000..8ce678c97 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/vm.stats.vm.v_swappgs.md @@ -0,0 +1,124 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/vm.stats.vm.v_swappgs.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "vm.stats.vm.v_swappgs" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# vm.stats.vm.v_swappgs + + +<img src="https://netdata.cloud/img/memory.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: vm.stats.vm.v_swappgs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This metric shows the amount of data read from and written to SWAP. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per vm.stats.vm.v_swappgs instance + +This metric shows events happening on SWAP. + +This scope has no labels.
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.swapio | in, out | KiB/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 30min_ram_swapped_out ](https://github.com/netdata/netdata/blob/master/health/health.d/swap.conf) | mem.swapio | percentage of the system RAM swapped in the last 30 minutes | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| vm.stats.vm.v_swappgs | Enable or disable information about SWAP I/O metric. | yes | False | + +</details> + +#### Examples +There are no configuration examples.
+ + diff --git a/collectors/freebsd.plugin/integrations/vm.swap_info.md b/collectors/freebsd.plugin/integrations/vm.swap_info.md new file mode 100644 index 000000000..345e82327 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/vm.swap_info.md @@ -0,0 +1,124 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/vm.swap_info.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "vm.swap_info" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# vm.swap_info + + +<img src="https://netdata.cloud/img/freebsd.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: vm.swap_info + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect information about SWAP memory. + +The plugin calls `sysctlnametomib` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per vm.swap_info instance + +This metric shows the SWAP usage. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.swap | free, used | MiB | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ used_swap ](https://github.com/netdata/netdata/blob/master/health/health.d/swap.conf) | mem.swap | swap memory utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| vm.swap_info | Enable or disable SWAP metrics. | yes | False | + +</details> + +#### Examples +There are no configuration examples. 
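Since this module and `vm.stats.vm.v_swappgs` are both documented as reading the `[plugin:freebsd]` section, the two swap-related options can sit side by side. A sketch under that assumption, using the documented default values:

```ini
# netdata.conf (illustrative fragment; option names taken from the tables in these docs)
[plugin:freebsd]
    # swap usage chart (mem.swap)
    vm.swap_info = yes
    # swap I/O chart (mem.swapio)
    vm.stats.vm.v_swappgs = yes
```

Setting either value to `no` would be expected to disable only the corresponding chart.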
+ + diff --git a/collectors/freebsd.plugin/integrations/vm.vmtotal.md b/collectors/freebsd.plugin/integrations/vm.vmtotal.md new file mode 100644 index 000000000..3d9d91633 --- /dev/null +++ b/collectors/freebsd.plugin/integrations/vm.vmtotal.md @@ -0,0 +1,128 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/vm.vmtotal.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "vm.vmtotal" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# vm.vmtotal + + +<img src="https://netdata.cloud/img/memory.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: vm.vmtotal + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect Virtual Memory information from host. + +The plugin calls function `sysctl` to collect data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per vm.vmtotal instance + +These metrics show an overall vision about processes running. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.active_processes | active | processes | +| system.processes | running, blocked | processes | +| mem.real | used | MiB | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ active_processes ](https://github.com/netdata/netdata/blob/master/health/health.d/processes.conf) | system.active_processes | system process IDs (PID) space utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:freebsd:vm.vmtotal]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config Options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| enable total processes | Number of active processes. | yes | False | +| processes running | Show number of processes running or blocked. | yes | False | +| real memory | Memory used on host. | yes | False | + +</details> + +#### Examples +There are no configuration examples.
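Unlike most FreeBSD modules in this set, this one is documented with its own `[plugin:freebsd:vm.vmtotal]` section and three separate options. A minimal sketch, assuming the option names are spelled exactly as in the table above; disabling `real memory` is an arbitrary choice made only for illustration:

```ini
# netdata.conf (illustrative fragment, not part of the generated file)
[plugin:freebsd:vm.vmtotal]
    # system.active_processes chart
    enable total processes = yes
    # system.processes chart (running, blocked)
    processes running = yes
    # mem.real chart, switched off here as an example
    real memory = no
```

Options left out of the section keep their documented default of `yes`.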
+ + diff --git a/collectors/freebsd.plugin/integrations/zfs.md b/collectors/freebsd.plugin/integrations/zfs.md new file mode 100644 index 000000000..bb099ddcb --- /dev/null +++ b/collectors/freebsd.plugin/integrations/zfs.md @@ -0,0 +1,151 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/integrations/zfs.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freebsd.plugin/metadata.yaml" +sidebar_label: "zfs" +learn_status: "Published" +learn_rel_path: "Data Collection/FreeBSD" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# zfs + + +<img src="https://netdata.cloud/img/filesystem.svg" width="150"/> + + +Plugin: freebsd.plugin +Module: zfs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collect metrics for ZFS filesystem + +The plugin uses `sysctl` function to collect necessary data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per zfs instance + +These metrics show detailed information about ZFS filesystem. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| zfs.arc_size | arcsz, target, min, max | MiB | +| zfs.l2_size | actual, size | MiB | +| zfs.reads | arc, demand, prefetch, metadata, l2 | reads/s | +| zfs.bytes | read, write | KiB/s | +| zfs.hits | hits, misses | percentage | +| zfs.hits_rate | hits, misses | events/s | +| zfs.dhits | hits, misses | percentage | +| zfs.dhits_rate | hits, misses | events/s | +| zfs.phits | hits, misses | percentage | +| zfs.phits_rate | hits, misses | events/s | +| zfs.mhits | hits, misses | percentage | +| zfs.mhits_rate | hits, misses | events/s | +| zfs.l2hits | hits, misses | percentage | +| zfs.l2hits_rate | hits, misses | events/s | +| zfs.list_hits | mfu, mfu_ghost, mru, mru_ghost | hits/s | +| zfs.arc_size_breakdown | recent, frequent | percentage | +| zfs.memory_ops | throttled | operations/s | +| zfs.important_ops | evict_skip, deleted, mutex_miss, hash_collisions | operations/s | +| zfs.actual_hits | hits, misses | percentage | +| zfs.actual_hits_rate | hits, misses | events/s | +| zfs.demand_data_hits | hits, misses | percentage | +| zfs.demand_data_hits_rate | hits, misses | events/s | +| zfs.prefetch_data_hits | hits, misses | percentage | +| zfs.prefetch_data_hits_rate | hits, misses | events/s | +| zfs.hash_elements | current, max | elements | +| zfs.hash_chains | current, max | chains | +| zfs.trim_bytes | TRIMmed | bytes | +| zfs.trim_requests | successful, failed, unsupported | requests | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ zfs_memory_throttle ](https://github.com/netdata/netdata/blob/master/health/health.d/zfs.conf) | zfs.memory_ops | number of times ZFS had to limit the ARC growth in the last 10 minutes | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. 
+Configuration for this specific integration is located in the `[plugin:freebsd:zfs_arcstats]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| show zero charts | Do not show charts with zero metrics. | no | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/freebsd.plugin/metadata.yaml b/collectors/freebsd.plugin/metadata.yaml index fca8982f7..d68fc3137 100644 --- a/collectors/freebsd.plugin/metadata.yaml +++ b/collectors/freebsd.plugin/metadata.yaml @@ -2893,36 +2893,16 @@ modules: metric: net.net info: network interface ${label:device} current speed os: "*" - - name: 1m_received_traffic_overflow - link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.net - info: average inbound utilization for the network interface ${label:device} over the last minute - os: "linux" - - name: 1m_sent_traffic_overflow - link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.net - info: average outbound utilization for the network interface ${label:device} over the last minute - os: "linux" - name: inbound_packets_dropped_ratio link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.packets + metric: net.drops info: ratio of inbound dropped packets for the network interface ${label:device} 
over the last 10 minutes - os: "linux" + os: "*" - name: outbound_packets_dropped_ratio link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.packets - info: ratio of outbound dropped packets for the network interface ${label:device} over the last 10 minutes - os: "linux" - - name: wifi_inbound_packets_dropped_ratio - link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.packets - info: ratio of inbound dropped packets for the network interface ${label:device} over the last 10 minutes - os: "linux" - - name: wifi_outbound_packets_dropped_ratio - link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.packets + metric: net.drops info: ratio of outbound dropped packets for the network interface ${label:device} over the last 10 minutes - os: "linux" + os: "*" - name: 1m_received_packets_rate link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf metric: net.packets @@ -2931,9 +2911,7 @@ modules: - name: 10s_received_packets_storm link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf metric: net.packets - info: - ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over - the last minute + info: ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute os: "linux freebsd" - name: interface_inbound_errors link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf @@ -2945,16 +2923,6 @@ modules: metric: net.errors info: number of outbound errors for the network interface ${label:device} in the last 10 minutes os: "freebsd" - - name: inbound_packets_dropped - link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.drops - info: number of inbound dropped packets for the network interface 
${label:device} in the last 10 minutes - os: "linux" - - name: outbound_packets_dropped - link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.drops - info: number of outbound dropped packets for the network interface ${label:device} in the last 10 minutes - os: "linux" metrics: folding: title: Metrics diff --git a/collectors/freeipmi.plugin/README.md b/collectors/freeipmi.plugin/README.md index 5a9fd93c0..f55ebf73d 100644..120000 --- a/collectors/freeipmi.plugin/README.md +++ b/collectors/freeipmi.plugin/README.md @@ -1,287 +1 @@ -<!-- -title: "freeipmi.plugin" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freeipmi.plugin/README.md" -sidebar_label: "freeipmi.plugin" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Devices" ---> - -# freeipmi.plugin - -Netdata has a [freeipmi](https://www.gnu.org/software/freeipmi/) plugin. - -> FreeIPMI provides in-band and out-of-band IPMI software based on the IPMI v1.5/2.0 specification. The IPMI -> specification defines a set of interfaces for platform management and is implemented by a number vendors for system -> management. The features of IPMI that most users will be interested in are sensor monitoring, system event monitoring, -> power control, and serial-over-LAN (SOL). - -## Installing the FreeIPMI plugin - -When using our official DEB/RPM packages, the FreeIPMI plugin is included in a separate package named -`netdata-plugin-freeipmi` which needs to be manually installed using your system package manager. It is not -installed automatically due to the large number of dependencies it requires. - -When using a static build of Netdata, the FreeIPMI plugin will be included and installed automatically, though -you will still need to have FreeIPMI installed on your system to be able to use the plugin. 
- -When using a local build of Netdata, you need to ensure that the FreeIPMI development packages (typically -called `libipmimonitoring-dev`, `libipmimonitoring-devel`, or `freeipmi-devel`) are installed when building Netdata. - -### Special Considerations - -Accessing IPMI requires root access, so the FreeIPMI plugin is automatically installed setuid root. - -FreeIPMI does not work correctly on IBM POWER systems, thus Netdata’s FreeIPMI plugin is not usable on such systems. - -If you have not previously used IPMI on your system, you will probably need to run the `ipmimonitoring` command as root -to initiailze IPMI settings so that the Netdata plugin works correctly. It should return information about available -seensors on the system. - -In some distributions `libipmimonitoring.pc` is located in a non-standard directory, which -can cause building the plugin to fail when building Netdata from source. In that case you -should find the file and link it to the standard pkg-config directory. Usually, running `sudo ln -s -/usr/lib/$(uname -m)-linux-gnu/pkgconfig/libipmimonitoring.pc/libipmimonitoring.pc /usr/lib/pkgconfig/libipmimonitoring.pc` -resolves this issue. - -## Metrics - -The plugin does a speed test when it starts, to find out the duration needed by the IPMI processor to respond. Depending -on the speed of your IPMI processor, charts may need several seconds to show up on the dashboard. - -Metrics grouped by *scope*. - -The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. - -### global - -These metrics refer to the monitored host. - -This scope has no labels. - -Metrics: - -| Metric | Dimensions | Unit | -|----------|:----------:|:------:| -| ipmi.sel | events | events | - -### sensor - -These metrics refer to the sensor. 
- -Labels: - -| Label | Description | -|-----------|-----------------------------------------------------------------------------------------------------------------| -| sensor | Sensor name. Same value as the "Name" column in the `ipmi-sensors` output. | -| type | Sensor type. Same value as the "Type" column in the `ipmi-sensors` output. | -| component | General sensor component. Identified by Netdata based on sensor name and type (e.g. System, Processor, Memory). | - -Metrics: - -| Metric | Dimensions | Unit | -|-----------------------------|:-----------------------------------:|:----------:| -| ipmi.sensor_state | nominal, critical, warning, unknown | state | -| ipmi.sensor_temperature_c | temperature | Celsius | -| ipmi.sensor_temperature_f | temperature | Fahrenheit | -| ipmi.sensor_voltage | voltage | Volts | -| ipmi.sensor_ampere | ampere | Amps | -| ipmi.sensor_fan_speed | rotations | RPM | -| ipmi.sensor_power | power | Watts | -| ipmi.sensor_reading_percent | percentage | % | - -## Alarms - -There are 2 alarms: - -- The sensor is in a warning or critical state. -- System Event Log (SEL) is non-empty. - -## Configuration - -The plugin supports a few options. To see them, run: - -```text -# ./freeipmi.plugin --help - - netdata freeipmi.plugin v1.40.0-137-gf162c25bd - Copyright (C) 2023 Netdata Inc. - Released under GNU General Public License v3 or later. - All rights reserved. - - This program is a data collector plugin for netdata. 
- - Available command line options: - - SECONDS data collection frequency - minimum: 5 - - debug enable verbose output - default: disabled - - sel - no-sel enable/disable SEL collection - default: enabled - - reread-sdr-cache re-read SDR cache on every iteration - default: disabled - - interpret-oem-data attempt to parse OEM data - default: disabled - - assume-system-event-record - tread illegal SEL events records as normal - default: disabled - - ignore-non-interpretable-sensors - do not read sensors that cannot be interpreted - default: disabled - - bridge-sensors bridge sensors not owned by the BMC - default: disabled - - shared-sensors enable shared sensors, if found - default: disabled - - no-discrete-reading do not read sensors that their event/reading type code is invalid - default: enabled - - ignore-scanning-disabled - Ignore the scanning bit and read sensors no matter what - default: disabled - - assume-bmc-owner assume the BMC is the sensor owner no matter what - (usually bridging is required too) - default: disabled - - hostname HOST - username USER - password PASS connect to remote IPMI host - default: local IPMI processor - - no-auth-code-check - noauthcodecheck don't check the authentication codes returned - - driver-type IPMIDRIVER - Specify the driver type to use instead of doing an auto selection. - The currently available outofband drivers are LAN and LAN_2_0, - which perform IPMI 1.5 and IPMI 2.0 respectively. - The currently available inband drivers are KCS, SSIF, OPENIPMI and SUNBMC. - - sdr-cache-dir PATH directory for SDR cache files - default: /tmp - - sensor-config-file FILE filename to read sensor configuration - default: system default - - sel-config-file FILE filename to read sel configuration - default: system default - - ignore N1,N2,N3,... sensor IDs to ignore - default: none - - ignore-status N1,N2,N3,... 
sensor IDs to ignore status (nominal/warning/critical) - default: none - - -v - -V - version print version and exit - - Linux kernel module for IPMI is CPU hungry. - On Linux run this to lower kipmiN CPU utilization: - # echo 10 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us - - or create: /etc/modprobe.d/ipmi.conf with these contents: - options ipmi_si kipmid_max_busy_us=10 - - For more information: - https://github.com/netdata/netdata/tree/master/collectors/freeipmi.plugin -``` - -You can set these options in `/etc/netdata/netdata.conf` at this section: - -``` -[plugin:freeipmi] - update every = 5 - command options = -``` - -Append to `command options =` the settings you need. The minimum `update every` is 5 (enforced internally by the -plugin). IPMI is slow and CPU hungry. So, once every 5 seconds is pretty acceptable. - -## Ignoring specific sensors - -Specific sensor IDs can be excluded from freeipmi tools by editing `/etc/freeipmi/freeipmi.conf` and setting the IDs to -be ignored at `ipmi-sensors-exclude-record-ids`. **However this file is not used by `libipmimonitoring`** (the library -used by Netdata's `freeipmi.plugin`). - -So, `freeipmi.plugin` supports the option `ignore` that accepts a comma separated list of sensor IDs to ignore. To -configure it, edit `/etc/netdata/netdata.conf` and set: - -``` -[plugin:freeipmi] - command options = ignore 1,2,3,4,... -``` - -To find the IDs to ignore, run the command `ipmimonitoring`. 
The first column is the wanted ID: - -``` -ID | Name | Type | State | Reading | Units | Event -1 | Ambient Temp | Temperature | Nominal | 26.00 | C | 'OK' -2 | Altitude | Other Units Based Sensor | Nominal | 480.00 | ft | 'OK' -3 | Avg Power | Current | Nominal | 100.00 | W | 'OK' -4 | Planar 3.3V | Voltage | Nominal | 3.29 | V | 'OK' -5 | Planar 5V | Voltage | Nominal | 4.90 | V | 'OK' -6 | Planar 12V | Voltage | Nominal | 11.99 | V | 'OK' -7 | Planar VBAT | Voltage | Nominal | 2.95 | V | 'OK' -8 | Fan 1A Tach | Fan | Nominal | 3132.00 | RPM | 'OK' -9 | Fan 1B Tach | Fan | Nominal | 2150.00 | RPM | 'OK' -10 | Fan 2A Tach | Fan | Nominal | 2494.00 | RPM | 'OK' -11 | Fan 2B Tach | Fan | Nominal | 1825.00 | RPM | 'OK' -12 | Fan 3A Tach | Fan | Nominal | 3538.00 | RPM | 'OK' -13 | Fan 3B Tach | Fan | Nominal | 2625.00 | RPM | 'OK' -14 | Fan 1 | Entity Presence | Nominal | N/A | N/A | 'Entity Present' -15 | Fan 2 | Entity Presence | Nominal | N/A | N/A | 'Entity Present' -... -``` - -## Debugging - -You can run the plugin by hand: - -```sh -# become user netdata -sudo su -s /bin/sh netdata - -# run the plugin in debug mode -/usr/libexec/netdata/plugins.d/freeipmi.plugin 5 debug -``` - -You will get verbose output on what the plugin does. - -## kipmi0 CPU usage - -There have been reports that kipmi is showing increased CPU when the IPMI is queried. To lower the CPU consumption of -the system you can issue this command: - -```sh -echo 10 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us -``` - -You can also permanently set the above setting by creating the file `/etc/modprobe.d/ipmi.conf` with this content: - -```sh -# prevent kipmi from consuming 100% CPU -options ipmi_si kipmid_max_busy_us=10 -``` - -This instructs the kernel IPMI module to pause for a tick between checking IPMI. Querying IPMI will be a lot slower -now (e.g. several seconds for IPMI to respond), but `kipmi` will not use any noticeable CPU. 
You can also use a higher -number (this is the number of microseconds to poll IPMI for a response, before waiting for a tick). - -If you need to disable IPMI for Netdata, edit `/etc/netdata/netdata.conf` and set: - -``` -[plugins] - freeipmi = no -``` +integrations/intelligent_platform_management_interface_ipmi.md
\ No newline at end of file diff --git a/collectors/freeipmi.plugin/freeipmi_plugin.c b/collectors/freeipmi.plugin/freeipmi_plugin.c index 56a1c4998..63147d621 100644 --- a/collectors/freeipmi.plugin/freeipmi_plugin.c +++ b/collectors/freeipmi.plugin/freeipmi_plugin.c @@ -1146,7 +1146,7 @@ int netdata_ipmi_detect_speed_secs(struct ipmi_monitoring_ipmi_config *ipmi_conf successful++; if(unlikely(state->debug)) - fprintf(stderr, "%s: %s data collection speed was %llu usec\n", + fprintf(stderr, "%s: %s data collection speed was %"PRIu64" usec\n", program_name, netdata_collect_type_to_string(type), end - start); // add it to our total @@ -1307,7 +1307,7 @@ static size_t send_ipmi_sensor_metrics_to_netdata(struct netdata_ipmi_state *sta if(likely(sn->do_metric)) { if(unlikely(!is_sensor_updated(sn->last_collected_metric_ut, state->updates.now_ut, state->sensors.freq_ut))) { if(unlikely(state->debug)) - fprintf(stderr, "%s: %s() sensor '%s' metric is not UPDATED (last updated %llu, now %llu, freq %llu\n", + fprintf(stderr, "%s: %s() sensor '%s' metric is not UPDATED (last updated %"PRIu64", now %"PRIu64", freq %"PRIu64"\n", program_name, __FUNCTION__, sn->sensor_name, sn->last_collected_metric_ut, state->updates.now_ut, state->sensors.freq_ut); } else { @@ -1360,7 +1360,7 @@ static size_t send_ipmi_sensor_metrics_to_netdata(struct netdata_ipmi_state *sta if(likely(sn->do_state)) { if(unlikely(!is_sensor_updated(sn->last_collected_state_ut, state->updates.now_ut, state->sensors.freq_ut))) { if (unlikely(state->debug)) - fprintf(stderr, "%s: %s() sensor '%s' state is not UPDATED (last updated %llu, now %llu, freq %llu\n", + fprintf(stderr, "%s: %s() sensor '%s' state is not UPDATED (last updated %"PRIu64", now %"PRIu64", freq %"PRIu64"\n", program_name, __FUNCTION__, sn->sensor_name, sn->last_collected_state_ut, state->updates.now_ut, state->sensors.freq_ut); } else { @@ -1450,6 +1450,8 @@ int main (int argc, char **argv) { error_log_errors_per_period = 100; 
error_log_throttle_period = 3600; + log_set_global_severity_for_external_plugins(); + // initialize the threads netdata_threads_init_for_external_plugins(0); // set the default threads stack size here @@ -1870,7 +1872,7 @@ int main (int argc, char **argv) { send_ipmi_sel_metrics_to_netdata(&state); if(unlikely(debug)) - fprintf(stderr, "%s: iteration %zu, dt %llu usec, sensors ever collected %zu, sensors last collected %zu \n" + fprintf(stderr, "%s: iteration %zu, dt %"PRIu64" usec, sensors ever collected %zu, sensors last collected %zu \n" , program_name , iteration , dt diff --git a/collectors/freeipmi.plugin/integrations/intelligent_platform_management_interface_ipmi.md b/collectors/freeipmi.plugin/integrations/intelligent_platform_management_interface_ipmi.md new file mode 100644 index 000000000..6d894667b --- /dev/null +++ b/collectors/freeipmi.plugin/integrations/intelligent_platform_management_interface_ipmi.md @@ -0,0 +1,274 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/freeipmi.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/freeipmi.plugin/metadata.yaml" +sidebar_label: "Intelligent Platform Management Interface (IPMI)" +learn_status: "Published" +learn_rel_path: "Data Collection/Hardware Devices and Sensors" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Intelligent Platform Management Interface (IPMI) + + +<img src="https://netdata.cloud/img/netdata.png" width="150"/> + + +Plugin: freeipmi.plugin +Module: freeipmi + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +"Monitor enterprise server sensor readings, event log entries, and hardware statuses to ensure reliable server operations." + + +The plugin uses open source library IPMImonitoring to communicate with sensors. + + +This collector is supported on all platforms. 
+ +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid. + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +Linux kernel module for IPMI can create big overhead. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + +The plugin does a speed test when it starts, to find out the duration needed by the IPMI processor to respond. Depending on the speed of your IPMI processor, charts may need several seconds to show up on the dashboard. + + +### Per Intelligent Platform Management Interface (IPMI) instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipmi.sel | events | events | + +### Per sensor + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| sensor | The sensor name | +| type | One of 45 recognized sensor types (Battery, Voltage...) | +| component | One of 25 recognized components (Processor, Peripheral). 
| + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipmi.sensor_state | nominal, critical, warning, unknown | state | +| ipmi.sensor_temperature_c | temperature | Celsius | +| ipmi.sensor_temperature_f | temperature | Fahrenheit | +| ipmi.sensor_voltage | voltage | Volts | +| ipmi.sensor_ampere | ampere | Amps | +| ipmi.sensor_fan_speed | rotations | RPM | +| ipmi.sensor_power | power | Watts | +| ipmi.sensor_reading_percent | percentage | % | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ ipmi_sensor_state ](https://github.com/netdata/netdata/blob/master/health/health.d/ipmi.conf) | ipmi.sensor_state | IPMI sensor ${label:sensor} (${label:component}) state | + + +## Setup + +### Prerequisites + +#### Install freeipmi.plugin + +When using our official DEB/RPM packages, the FreeIPMI plugin is included in a separate package named `netdata-plugin-freeipmi` which needs to be manually installed using your system package manager. It is not installed automatically due to the large number of dependencies it requires. + +When using a static build of Netdata, the FreeIPMI plugin will be included and installed automatically, though you will still need to have FreeIPMI installed on your system to be able to use the plugin. + +When using a local build of Netdata, you need to ensure that the FreeIPMI development packages (typically called `libipmimonitoring-dev`, `libipmimonitoring-devel`, or `freeipmi-devel`) are installed when building Netdata. + + +#### Preliminary actions + +If you have not previously used IPMI on your system, you will probably need to run the `ipmimonitoring` command as root +to initialize IPMI settings so that the Netdata plugin works correctly. It should return information about available sensors on the system. + + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. 
+Configuration for this specific integration is located in the `[plugin:freeipmi]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + +The configuration is set using command line options: + +``` +# netdata.conf +[plugin:freeipmi] + command options = opt1 opt2 ... optN +``` + +To display a help message listing the available command line options: + +```bash +./usr/libexec/netdata/plugins.d/freeipmi.plugin --help +``` + + +<details><summary>Command options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| SECONDS | Data collection frequency. | | False | +| debug | Enable verbose output. | disabled | False | +| no-sel | Disable System Event Log (SEL) collection. | disabled | False | +| reread-sdr-cache | Re-read SDR cache on every iteration. | disabled | False | +| interpret-oem-data | Attempt to parse OEM data. | disabled | False | +| assume-system-event-record | treat illegal SEL events records as normal. | disabled | False | +| ignore-non-interpretable-sensors | Do not read sensors that cannot be interpreted. | disabled | False | +| bridge-sensors | Bridge sensors not owned by the BMC. | disabled | False | +| shared-sensors | Enable shared sensors if found. | disabled | False | +| no-discrete-reading | Do not read sensors if their event/reading type code is invalid. | enabled | False | +| ignore-scanning-disabled | Ignore the scanning bit and read sensors no matter what. 
| disabled | False | +| assume-bmc-owner | Assume the BMC is the sensor owner no matter what (usually bridging is required too). | disabled | False | +| hostname HOST | Remote IPMI hostname or IP address. | local | False | +| username USER | Username that will be used when connecting to the remote host. | | False | +| password PASS | Password that will be used when connecting to the remote host. | | False | +| noauthcodecheck / no-auth-code-check | Don't check the authentication codes returned. | | False | +| driver-type IPMIDRIVER | Specify the driver type to use instead of doing an auto selection. The currently available outofband drivers are LAN and LAN_2_0, which perform IPMI 1.5 and IPMI 2.0 respectively. The currently available inband drivers are KCS, SSIF, OPENIPMI and SUNBMC. | | False | +| sdr-cache-dir PATH | SDR cache files directory. | /tmp | False | +| sensor-config-file FILE | Sensors configuration filename. | system default | False | +| sel-config-file FILE | SEL configuration filename. | system default | False | +| ignore N1,N2,N3,... | Sensor IDs to ignore. | | False | +| ignore-status N1,N2,N3,... | Sensor IDs to ignore status (nominal/warning/critical). | | False | +| -v | Print version and exit. | | False | +| --help | Print usage message and exit. | | False | + +</details> + +#### Examples + +##### Decrease data collection frequency + +Basic example decreasing data collection frequency. The minimum `update every` is 5 (enforced internally by the plugin). IPMI is slow and CPU hungry. So, once every 5 seconds is pretty acceptable. + +```yaml +[plugin:freeipmi] + update every = 10 + +``` +##### Disable SEL collection + +Append to `command options =` the options you need. 
+ +<details><summary>Config</summary> + +```yaml +[plugin:freeipmi] + command options = no-sel + +``` +</details> + +##### Ignore specific sensors + +Specific sensor IDs can be excluded from freeipmi tools by editing `/etc/freeipmi/freeipmi.conf` and setting the IDs to be ignored at `ipmi-sensors-exclude-record-ids`. + +**However this file is not used by `libipmimonitoring`** (the library used by Netdata's `freeipmi.plugin`). + +To find the IDs to ignore, run the command `ipmimonitoring`. The first column is the wanted ID: + +ID | Name | Type | State | Reading | Units | Event +1 | Ambient Temp | Temperature | Nominal | 26.00 | C | 'OK' +2 | Altitude | Other Units Based Sensor | Nominal | 480.00 | ft | 'OK' +3 | Avg Power | Current | Nominal | 100.00 | W | 'OK' +4 | Planar 3.3V | Voltage | Nominal | 3.29 | V | 'OK' +5 | Planar 5V | Voltage | Nominal | 4.90 | V | 'OK' +6 | Planar 12V | Voltage | Nominal | 11.99 | V | 'OK' +7 | Planar VBAT | Voltage | Nominal | 2.95 | V | 'OK' +8 | Fan 1A Tach | Fan | Nominal | 3132.00 | RPM | 'OK' +9 | Fan 1B Tach | Fan | Nominal | 2150.00 | RPM | 'OK' +10 | Fan 2A Tach | Fan | Nominal | 2494.00 | RPM | 'OK' +11 | Fan 2B Tach | Fan | Nominal | 1825.00 | RPM | 'OK' +12 | Fan 3A Tach | Fan | Nominal | 3538.00 | RPM | 'OK' +13 | Fan 3B Tach | Fan | Nominal | 2625.00 | RPM | 'OK' +14 | Fan 1 | Entity Presence | Nominal | N/A | N/A | 'Entity Present' +15 | Fan 2 | Entity Presence | Nominal | N/A | N/A | 'Entity Present' +... + +`freeipmi.plugin` supports the option `ignore` that accepts a comma separated list of sensor IDs to ignore. To configure it set on `netdata.conf`: + + +<details><summary>Config</summary> + +```yaml +[plugin:freeipmi] + command options = ignore 1,2,3,4,... 
+ +``` +</details> + + + +## Troubleshooting + +### Debug Mode + + + +### kimpi0 CPU usage + + + + diff --git a/collectors/freeipmi.plugin/metadata.yaml b/collectors/freeipmi.plugin/metadata.yaml index 9540410bf..f8c75c2cb 100644 --- a/collectors/freeipmi.plugin/metadata.yaml +++ b/collectors/freeipmi.plugin/metadata.yaml @@ -2,7 +2,7 @@ plugin_name: freeipmi.plugin modules: - meta: plugin_name: freeipmi.plugin - module_name: sensors + module_name: freeipmi monitored_instance: name: Intelligent Platform Management Interface (IPMI) link: "https://en.wikipedia.org/wiki/Intelligent_Platform_Management_Interface" @@ -42,34 +42,225 @@ modules: setup: prerequisites: list: - - title: Preliminary actions + - title: Install freeipmi.plugin description: | - If you have not previously used IPMI on your system, you will probably need to run the ipmimonitoring command as root to initialize IPMI settings so that the Netdata plugin works correctly. It should return information about available sensors on the system. + When using our official DEB/RPM packages, the FreeIPMI plugin is included in a separate package named `netdata-plugin-freeipmi` which needs to be manually installed using your system package manager. It is not installed automatically due to the large number of dependencies it requires. + + When using a static build of Netdata, the FreeIPMI plugin will be included and installed automatically, though you will still need to have FreeIPMI installed on your system to be able to use the plugin. - In some distributions libipmimonitoring.pc is located in a non-standard directory, which can cause building the plugin to fail when building Netdata from source. In that case you should find the file and link it to the standard pkg-config directory. Usually, running sudo ln -s /usr/lib/$(uname -m)-linux-gnu/pkgconfig/libipmimonitoring.pc/libipmimonitoring.pc /usr/lib/pkgconfig/libipmimonitoring.pc resolves this issue. 
+ When using a local build of Netdata, you need to ensure that the FreeIPMI development packages (typically called `libipmimonitoring-dev`, `libipmimonitoring-devel`, or `freeipmi-devel`) are installed when building Netdata. + - title: Preliminary actions + description: | + If you have not previously used IPMI on your system, you will probably need to run the `ipmimonitoring` command as root + to initialize IPMI settings so that the Netdata plugin works correctly. It should return information about available sensors on the system. configuration: file: name: "netdata.conf" - section_name: '[plugin:freeipmi]' - description: "This is netdata main configuration file" + section_name: "[plugin:freeipmi]" options: - description: "This tool receives command line options that are visible when user run: `./usr/libexec/netdata/plugins.d/freeipmi.plugin --help`" + description: | + The configuration is set using command line options: + + ``` + # netdata.conf + [plugin:freeipmi] + command options = opt1 opt2 ... optN + ``` + + To display a help message listing the available command line options: + + ```bash + ./usr/libexec/netdata/plugins.d/freeipmi.plugin --help + ``` folding: - title: "Config options" + title: "Command options" enabled: true list: - - name: command options - description: Variable used to pass arguments for the plugin. - default_value: 1 + - name: SECONDS + description: Data collection frequency. + default_value: "" + required: false + - name: debug + description: Enable verbose output. + default_value: disabled + required: false + - name: no-sel + description: Disable System Event Log (SEL) collection. + default_value: disabled + required: false + - name: reread-sdr-cache + description: Re-read SDR cache on every iteration. + default_value: disabled + required: false + - name: interpret-oem-data + description: Attempt to parse OEM data. 
+ default_value: disabled + required: false + - name: assume-system-event-record + description: treat illegal SEL events records as normal. + default_value: disabled + required: false + - name: ignore-non-interpretable-sensors + description: Do not read sensors that cannot be interpreted. + default_value: disabled + required: false + - name: bridge-sensors + description: Bridge sensors not owned by the BMC. + default_value: disabled + required: false + - name: shared-sensors + description: Enable shared sensors if found. + default_value: disabled + required: false + - name: no-discrete-reading + description: Do not read sensors if their event/reading type code is invalid. + default_value: enabled + required: false + - name: ignore-scanning-disabled + description: Ignore the scanning bit and read sensors no matter what. + default_value: disabled + required: false + - name: assume-bmc-owner + description: Assume the BMC is the sensor owner no matter what (usually bridging is required too). + default_value: disabled + required: false + - name: hostname HOST + description: Remote IPMI hostname or IP address. + default_value: local + required: false + - name: username USER + description: Username that will be used when connecting to the remote host. + default_value: "" + required: false + - name: password PASS + description: Password that will be used when connecting to the remote host. + default_value: "" + required: false + - name: noauthcodecheck / no-auth-code-check + description: Don't check the authentication codes returned. + default_value: "" + required: false + - name: driver-type IPMIDRIVER + description: Specify the driver type to use instead of doing an auto selection. The currently available outofband drivers are LAN and LAN_2_0, which perform IPMI 1.5 and IPMI 2.0 respectively. The currently available inband drivers are KCS, SSIF, OPENIPMI and SUNBMC. 
+ default_value: "" + required: false + - name: sdr-cache-dir PATH + description: SDR cache files directory. + default_value: /tmp + required: false + - name: sensor-config-file FILE + description: Sensors configuration filename. + default_value: system default + required: false + - name: sel-config-file FILE + description: SEL configuration filename. + default_value: system default + required: false + - name: ignore N1,N2,N3,... + description: Sensor IDs to ignore. + default_value: "" + required: false + - name: ignore-status N1,N2,N3,... + description: Sensor IDs to ignore status (nominal/warning/critical). + default_value: "" + required: false + - name: -v + description: Print version and exit. + default_value: "" + required: false + - name: --help + description: Print usage message and exit. + default_value: "" required: false examples: folding: enabled: true - title: "" - list: [] + title: "Config" + list: + - name: Decrease data collection frequency + description: Basic example decreasing data collection frequency. The minimum `update every` is 5 (enforced internally by the plugin). IPMI is slow and CPU hungry. So, once every 5 seconds is pretty acceptable. + config: | + [plugin:freeipmi] + update every = 10 + folding: + enabled: false + - name: Disable SEL collection + description: Append to `command options =` the options you need. + config: | + [plugin:freeipmi] + command options = no-sel + - name: Ignore specific sensors + description: | + Specific sensor IDs can be excluded from freeipmi tools by editing `/etc/freeipmi/freeipmi.conf` and setting the IDs to be ignored at `ipmi-sensors-exclude-record-ids`. + + **However this file is not used by `libipmimonitoring`** (the library used by Netdata's `freeipmi.plugin`). + + To find the IDs to ignore, run the command `ipmimonitoring`. 
The first column is the wanted ID: + + ID | Name | Type | State | Reading | Units | Event + 1 | Ambient Temp | Temperature | Nominal | 26.00 | C | 'OK' + 2 | Altitude | Other Units Based Sensor | Nominal | 480.00 | ft | 'OK' + 3 | Avg Power | Current | Nominal | 100.00 | W | 'OK' + 4 | Planar 3.3V | Voltage | Nominal | 3.29 | V | 'OK' + 5 | Planar 5V | Voltage | Nominal | 4.90 | V | 'OK' + 6 | Planar 12V | Voltage | Nominal | 11.99 | V | 'OK' + 7 | Planar VBAT | Voltage | Nominal | 2.95 | V | 'OK' + 8 | Fan 1A Tach | Fan | Nominal | 3132.00 | RPM | 'OK' + 9 | Fan 1B Tach | Fan | Nominal | 2150.00 | RPM | 'OK' + 10 | Fan 2A Tach | Fan | Nominal | 2494.00 | RPM | 'OK' + 11 | Fan 2B Tach | Fan | Nominal | 1825.00 | RPM | 'OK' + 12 | Fan 3A Tach | Fan | Nominal | 3538.00 | RPM | 'OK' + 13 | Fan 3B Tach | Fan | Nominal | 2625.00 | RPM | 'OK' + 14 | Fan 1 | Entity Presence | Nominal | N/A | N/A | 'Entity Present' + 15 | Fan 2 | Entity Presence | Nominal | N/A | N/A | 'Entity Present' + ... + + `freeipmi.plugin` supports the option `ignore` that accepts a comma separated list of sensor IDs to ignore. To configure it set on `netdata.conf`: + config: | + [plugin:freeipmi] + command options = ignore 1,2,3,4,... troubleshooting: problems: - list: [] + list: + - name: Debug Mode + description: | + You can run `freeipmi.plugin` with the debug option enabled, to troubleshoot issues with it. The output should give you clues as to why the collector isn't working. + + - Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + + - Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + + - Run the `freeipmi.plugin` in debug mode: + + ```bash + ./freeipmi.plugin 5 debug + ``` + - name: kipmi0 CPU usage + description: | + There have been reports that kipmi is showing increased CPU usage when IPMI is queried. To lower the CPU consumption of the system you can issue this command: + + ```sh + echo 10 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us + ``` + + You can also permanently set the above setting by creating the file `/etc/modprobe.d/ipmi.conf` with this content: + + ```sh + # prevent kipmi from consuming 100% CPU + options ipmi_si kipmid_max_busy_us=10 + ``` + + This instructs the kernel IPMI module to pause for a tick between checking IPMI. Querying IPMI will be a lot slower now (e.g. several seconds for IPMI to respond), but `kipmi` will not use any noticeable CPU. + + You can also use a higher number (this is the number of microseconds to poll IPMI for a response, before waiting for a tick). alerts: - name: ipmi_sensor_state link: https://github.com/netdata/netdata/blob/master/health/health.d/ipmi.conf @@ -79,9 +270,20 @@ modules: folding: title: Metrics enabled: false - description: "" + description: | + The plugin does a speed test when it starts, to find out the duration needed by the IPMI processor to respond. Depending on the speed of your IPMI processor, charts may need several seconds to show up on the dashboard. availability: [] scopes: + - name: global + description: These metrics refer to the entire monitored application. + labels: [] + metrics: + - name: ipmi.sel + description: IPMI Events + unit: "events" + chart_type: area + dimensions: + - name: events - name: sensor description: "" labels: @@ -92,12 +294,6 @@ modules: - name: component description: One of 25 recognized components (Processor, Peripheral).
metrics: - - name: ipmi.sel - description: IPMI Events - unit: "events" - chart_type: area - dimensions: - - name: events - name: ipmi.sensor_state description: IPMI Sensors State unit: "state" diff --git a/collectors/idlejitter.plugin/README.md b/collectors/idlejitter.plugin/README.md index 9474a2b97..1ce460b62 100644..120000 --- a/collectors/idlejitter.plugin/README.md +++ b/collectors/idlejitter.plugin/README.md @@ -1,36 +1 @@ -<!-- -title: "idlejitter.plugin" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/idlejitter.plugin/README.md" -sidebar_label: "idlejitter.plugin" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/QoS" ---> - -# idlejitter.plugin - -Idle jitter is a measure of delays in timing for user processes caused by scheduling limitations. - -## How Netdata measures idle jitter - -A thread is spawned that requests to sleep for 20000 microseconds (20ms). -When the system wakes it up, it measures how many microseconds have passed. -The difference between the requested and the actual duration of the sleep, is the idle jitter. -This is done at most 50 times per second, to ensure we have a good average. - -This number is useful: - -- In multimedia-streaming environments such as VoIP gateways, where the CPU jitter can affect the quality of the service. -- On time servers and other systems that require very precise timing, where CPU jitter can actively interfere with timing precision. -- On gaming systems, where CPU jitter can cause frame drops and stuttering. -- In cloud infrastructure that can pause the VM or container for a small duration to perform operations at the host. - -## Charts - -idlejitter.plugin generates the idlejitter chart which measures CPU idle jitter in milliseconds lost per second. - -## Configuration - -This chart is available without any configuration. - - +integrations/idle_os_jitter.md
\ No newline at end of file diff --git a/collectors/idlejitter.plugin/integrations/idle_os_jitter.md b/collectors/idlejitter.plugin/integrations/idle_os_jitter.md new file mode 100644 index 000000000..da650cde9 --- /dev/null +++ b/collectors/idlejitter.plugin/integrations/idle_os_jitter.md @@ -0,0 +1,117 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/idlejitter.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/idlejitter.plugin/metadata.yaml" +sidebar_label: "Idle OS Jitter" +learn_status: "Published" +learn_rel_path: "Data Collection/Synthetic Checks" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Idle OS Jitter + + +<img src="https://netdata.cloud/img/syslog.png" width="150"/> + + +Plugin: idlejitter.plugin +Module: idlejitter.plugin + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor delays in timing for user processes caused by scheduling limitations to optimize the system to run latency-sensitive applications with minimal jitter, improving consistency and quality of service. + + +A thread is spawned that requests to sleep for a fixed amount of time. When the system wakes it up, it measures how many microseconds have passed. The difference between the requested and the actual duration of the sleep is the idle jitter. This is done dozens of times per second to ensure we have a representative sample. + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration will run by default on all supported systems. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection.
+ +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Idle OS Jitter instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.idlejitter | min, max, average | microseconds lost/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + +This integration only supports a single configuration option, and most users will not need to change it. + + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| loop time in ms | Specifies the target time for the data collection thread to sleep, measured in milliseconds. | 20 | False | + +#### Examples +There are no configuration examples.
+ + diff --git a/collectors/ioping.plugin/README.md b/collectors/ioping.plugin/README.md index 73fc35fb0..cb660f13b 100644..120000 --- a/collectors/ioping.plugin/README.md +++ b/collectors/ioping.plugin/README.md @@ -1,80 +1 @@ -# Monitor I/O latency using ioping.plugin - -The ioping plugin supports monitoring I/O latency for any number of directories/files/devices, by pinging them with `ioping`. - -A recent version of `ioping` is required (one that supports option `-N`). -The supplied plugin can install it, by running: - -```sh -/usr/libexec/netdata/plugins.d/ioping.plugin install -``` - -The `-e` option can be supplied to indicate where the Netdata environment file is installed. The default path is `/etc/netdata/.environment`. - -The above will download, build and install the right version as `/usr/libexec/netdata/plugins.d/ioping`. - -Then you need to edit `/etc/netdata/ioping.conf` (to edit it on your system run -`/etc/netdata/edit-config ioping.conf`) like this: - -```sh -# uncomment the following line - it should already be there -ioping="/usr/libexec/netdata/plugins.d/ioping" - -# set here the directory/file/device, you need to ping -destination="destination" - -# override the chart update frequency - the default is inherited from Netdata -update_every="1s" - -# the request size in bytes to ping the destination -request_size="4k" - -# other iping options - these are the defaults -ioping_opts="-T 1000000 -R" -``` - -## alarms - -Netdata will automatically attach a few alarms for each host. -Check the [latest versions of the ioping alarms](https://raw.githubusercontent.com/netdata/netdata/master/health/health.d/ioping.conf) - -## Multiple ioping Plugins With Different Settings - -You may need to run multiple ioping plugins with different settings or different end points. -For example, you may need to ping one destination once per 10 seconds, and another once per second. - -Netdata allows you to add as many `ioping` plugins as you like. 
- -Follow this procedure: - -**1. Create New ioping Configuration File** - -```sh -# Step Into Configuration Directory -cd /etc/netdata - -# Copy Original ioping Configuration File To New Configuration File -cp ioping.conf ioping2.conf -``` - -Edit `ioping2.conf` and set the settings and the destination you need for the seconds instance. - -**2. Soft Link Original ioping Plugin to New Plugin File** - -```sh -# Become root (If The Step Step Is Performed As Non-Root User) -sudo su - -# Step Into The Plugins Directory -cd /usr/libexec/netdata/plugins.d - -# Link ioping.plugin to ioping2.plugin -ln -s ioping.plugin ioping2.plugin -``` - -That's it. Netdata will detect the new plugin and start it. - -You can name the new plugin any name you like. -Just make sure the plugin and the configuration file have the same name. - - +integrations/ioping.md
\ No newline at end of file diff --git a/collectors/ioping.plugin/integrations/ioping.md b/collectors/ioping.plugin/integrations/ioping.md new file mode 100644 index 000000000..4c16d2e3a --- /dev/null +++ b/collectors/ioping.plugin/integrations/ioping.md @@ -0,0 +1,132 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/ioping.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/ioping.plugin/metadata.yaml" +sidebar_label: "IOPing" +learn_status: "Published" +learn_rel_path: "Data Collection/Synthetic Checks" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# IOPing + + +<img src="https://netdata.cloud/img/syslog.png" width="150"/> + + +Plugin: ioping.plugin +Module: ioping.plugin + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor IOPing metrics for efficient disk I/O latency tracking. Keep track of read/write speeds, latency, and error rates for optimized disk operations. + +Plugin uses `ioping` command. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per disk + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ioping.latency | latency | microseconds | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ ioping_disk_latency ](https://github.com/netdata/netdata/blob/master/health/health.d/ioping.conf) | ioping.latency | average I/O latency over the last 10 seconds | + + +## Setup + +### Prerequisites + +#### Install ioping + +You can install the command by passing the argument `install` to the plugin (`/usr/libexec/netdata/plugins.d/ioping.plugin install`). + + + +### Configuration + +#### File + +The configuration file name for this integration is `ioping.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config ioping.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Data collection frequency. | 1s | False | +| destination | The directory/file/device to ioping. | | True | +| request_size | The request size in bytes to ioping the destination (symbolic modifiers are supported) | 4k | False | +| ioping_opts | Options passed to `ioping` commands. | -T 1000000 | False | + +</details> + +#### Examples + +##### Basic Configuration + +This example has the minimum configuration necessary to have the plugin running. 
+ +<details><summary>Config</summary> + +```yaml +destination="/dev/sda" + +``` +</details> + + diff --git a/collectors/ioping.plugin/ioping.plugin.in b/collectors/ioping.plugin/ioping.plugin.in index 1d79eb706..d1283bad9 100755 --- a/collectors/ioping.plugin/ioping.plugin.in +++ b/collectors/ioping.plugin/ioping.plugin.in @@ -96,6 +96,21 @@ fi PROGRAM_NAME="$(basename "${0}")" +LOG_LEVEL_ERR=1 +LOG_LEVEL_WARN=2 +LOG_LEVEL_INFO=3 +LOG_LEVEL="$LOG_LEVEL_INFO" + +set_log_severity_level() { + case ${NETDATA_LOG_SEVERITY_LEVEL,,} in + "info") LOG_LEVEL="$LOG_LEVEL_INFO";; + "warn" | "warning") LOG_LEVEL="$LOG_LEVEL_WARN";; + "err" | "error") LOG_LEVEL="$LOG_LEVEL_ERR";; + esac +} + +set_log_severity_level + logdate() { date "+%Y-%m-%d %H:%M:%S" } @@ -108,18 +123,21 @@ log() { } +info() { + [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_INFO" -gt "$LOG_LEVEL" ]] && return + log INFO "${@}" +} + warning() { + [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_WARN" -gt "$LOG_LEVEL" ]] && return log WARNING "${@}" } error() { + [[ -n "$LOG_LEVEL" && "$LOG_LEVEL_ERR" -gt "$LOG_LEVEL" ]] && return log ERROR "${@}" } -info() { - log INFO "${@}" -} - fatal() { log FATAL "${@}" echo "DISABLE" diff --git a/collectors/macos.plugin/README.md b/collectors/macos.plugin/README.md index 509e22edc..2ea6842e4 100644..120000 --- a/collectors/macos.plugin/README.md +++ b/collectors/macos.plugin/README.md @@ -1,16 +1 @@ -<!-- -title: "macos.plugin" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/macos.plugin/README.md" -sidebar_label: "macos.plugin" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/System metrics" ---> - -# macos.plugin - -Collects resource usage and performance data on macOS systems - -By default, Netdata will enable monitoring metrics for disks, memory, and network only when they are not zero. If they are constantly zero they are ignored. 
Metrics that will start having values, after Netdata is started, will be detected and charts will be automatically added to the dashboard (a refresh of the dashboard is needed for them to appear though). Use `yes` instead of `auto` in plugin configuration sections to enable these charts permanently. You can also set the `enable zero metrics` option to `yes` in the `[global]` section which enables charts with zero metrics for all internal Netdata plugins. - - +integrations/macos.md
\ No newline at end of file diff --git a/collectors/macos.plugin/integrations/macos.md b/collectors/macos.plugin/integrations/macos.md new file mode 100644 index 000000000..e5c54c3dc --- /dev/null +++ b/collectors/macos.plugin/integrations/macos.md @@ -0,0 +1,285 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/macos.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/macos.plugin/metadata.yaml" +sidebar_label: "macOS" +learn_status: "Published" +learn_rel_path: "Data Collection/macOS Systems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# macOS + + +<img src="https://netdata.cloud/img/macos.svg" width="150"/> + + +Plugin: macos.plugin +Module: mach_smi + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor macOS metrics for efficient operating system performance. + +The plugin uses three different methods to collect data: + - The function `sysctlbyname` is called to collect network, swap, loadavg, and boot time. + - The function `host_statistic` is called to collect CPU and Virtual memory data. + - The function `IOServiceGetMatchingServices` is called to collect storage information. + + +This collector is only supported on the following platforms: + +- macOS + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+ + + +### Per macOS instance + +These metrics refer to hardware and network monitoring. + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.cpu | user, nice, system, idle | percentage | +| system.ram | active, wired, throttled, compressor, inactive, purgeable, speculative, free | MiB | +| mem.swapio | io, out | KiB/s | +| mem.pgfaults | memory, cow, pagein, pageout, compress, decompress, zero_fill, reactivate, purge | faults/s | +| system.load | load1, load5, load15 | load | +| mem.swap | free, used | MiB | +| system.ipv4 | received, sent | kilobits/s | +| ipv4.tcppackets | received, sent | packets/s | +| ipv4.tcperrors | InErrs, InCsumErrors, RetransSegs | packets/s | +| ipv4.tcphandshake | EstabResets, ActiveOpens, PassiveOpens, AttemptFails | events/s | +| ipv4.tcpconnaborts | baddata, userclosed, nomemory, timeout | connections/s | +| ipv4.tcpofo | inqueue | packets/s | +| ipv4.tcpsyncookies | received, sent, failed | packets/s | +| ipv4.ecnpkts | CEP, NoECTP | packets/s | +| ipv4.udppackets | received, sent | packets/s | +| ipv4.udperrors | RcvbufErrors, InErrors, NoPorts, InCsumErrors, IgnoredMulti | events/s | +| ipv4.icmp | received, sent | packets/s | +| ipv4.icmp_errors | InErrors, OutErrors, InCsumErrors | packets/s | +| ipv4.icmpmsg | InEchoReps, OutEchoReps, InEchos, OutEchos | packets/s | +| ipv4.packets | received, sent, forwarded, delivered | packets/s | +| ipv4.fragsout | ok, failed, created | packets/s | +| ipv4.fragsin | ok, failed, all | packets/s | +| ipv4.errors | InDiscards, OutDiscards, InHdrErrors, OutNoRoutes, InAddrErrors, InUnknownProtos | packets/s | +| ipv6.packets | received, sent, forwarded, delivers | packets/s | +| ipv6.fragsout | ok, failed, all | packets/s | +| ipv6.fragsin | ok, failed, timeout, all | packets/s | +| ipv6.errors | InDiscards, OutDiscards, InHdrErrors, InAddrErrors, InTruncatedPkts, InNoRoutes, OutNoRoutes | packets/s | +| ipv6.icmp | received, sent 
| messages/s | +| ipv6.icmpredir | received, sent | redirects/s | +| ipv6.icmperrors | InErrors, OutErrors, InCsumErrors, InDestUnreachs, InPktTooBigs, InTimeExcds, InParmProblems, OutDestUnreachs, OutTimeExcds, OutParmProblems | errors/s | +| ipv6.icmpechos | InEchos, OutEchos, InEchoReplies, OutEchoReplies | messages/s | +| ipv6.icmprouter | InSolicits, OutSolicits, InAdvertisements, OutAdvertisements | messages/s | +| ipv6.icmpneighbor | InSolicits, OutSolicits, InAdvertisements, OutAdvertisements | messages/s | +| ipv6.icmptypes | InType1, InType128, InType129, InType136, OutType1, OutType128, OutType129, OutType133, OutType135, OutType143 | messages/s | +| system.uptime | uptime | seconds | +| system.io | in, out | KiB/s | + +### Per disk + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| disk.io | read, writes | KiB/s | +| disk.ops | read, writes | operations/s | +| disk.util | utilization | % of time working | +| disk.iotime | reads, writes | milliseconds/s | +| disk.await | reads, writes | milliseconds/operation | +| disk.avgsz | reads, writes | KiB/operation | +| disk.svctm | svctm | milliseconds/operation | + +### Per mount point + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| disk.space | avail, used, reserved_for_root | GiB | +| disk.inodes | avail, used, reserved_for_root | inodes | + +### Per network device + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| net.net | received, sent | kilobits/s | +| net.packets | received, sent, multicast_received, multicast_sent | packets/s | +| net.errors | inbound, outbound | errors/s | +| net.drops | inbound | drops/s | +| net.events | frames, collisions, carrier | events/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ interface_speed ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.net | network interface ${label:device} current speed | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + +There are three sections in the file which you can configure: + +- `[plugin:macos:sysctl]` - Enable or disable monitoring for network, swap, loadavg, and boot time. +- `[plugin:macos:mach_smi]` - Enable or disable monitoring for CPU and Virtual memory. +- `[plugin:macos:iokit]` - Enable or disable monitoring for storage device. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| enable load average | Enable or disable monitoring of load average metrics (load1, load5, load15). | yes | False | +| system swap | Enable or disable monitoring of system swap metrics (free, used). 
| yes | False | +| bandwidth | Enable or disable monitoring of network bandwidth metrics (received, sent). | yes | False | +| ipv4 TCP packets | Enable or disable monitoring of IPv4 TCP total packets metrics (received, sent). | yes | False | +| ipv4 TCP errors | Enable or disable monitoring of IPv4 TCP packets metrics (Input Errors, Checksum, Retransmission segments). | yes | False | +| ipv4 TCP handshake issues | Enable or disable monitoring of IPv4 TCP handshake metrics (Established Resets, Active Opens, Passive Opens, Attempt Fails). | yes | False | +| ECN packets | Enable or disable monitoring of ECN statistics metrics (InCEPkts, InNoECTPkts). | auto | False | +| TCP SYN cookies | Enable or disable monitoring of TCP SYN cookies metrics (received, sent, failed). | auto | False | +| TCP out-of-order queue | Enable or disable monitoring of TCP out-of-order queue metrics (inqueue). | auto | False | +| TCP connection aborts | Enable or disable monitoring of TCP connection aborts metrics (Bad Data, User closed, No memory, Timeout). | auto | False | +| ipv4 UDP packets | Enable or disable monitoring of ipv4 UDP packets metrics (sent, received). | yes | False | +| ipv4 UDP errors | Enable or disable monitoring of ipv4 UDP errors metrics (Received Buffer error, Input Errors, No Ports, IN Checksum Errors, Ignore Multi). | yes | False | +| ipv4 icmp packets | Enable or disable monitoring of IPv4 ICMP packets metrics (sent, received, in error, OUT error, IN Checksum error). | yes | False | +| ipv4 icmp messages | Enable or disable monitoring of ipv4 ICMP messages metrics (I/O messages, I/O Errors, In Checksum). | yes | False | +| ipv4 packets | Enable or disable monitoring of ipv4 packets metrics (received, sent, forwarded, delivered). | yes | False | +| ipv4 fragments sent | Enable or disable monitoring of IPv4 fragments sent metrics (ok, fails, creates).
| yes | False | +| ipv4 fragments assembly | Enable or disable monitoring of IPv4 fragments assembly metrics (ok, failed, all). | yes | False | +| ipv4 errors | Enable or disable monitoring of IPv4 errors metrics (I/O discard, I/O HDR errors, In Addr errors, In Unknown protos, OUT No Routes). | yes | False | +| ipv6 packets | Enable or disable monitoring of IPv6 packets metrics (received, sent, forwarded, delivered). | auto | False | +| ipv6 fragments sent | Enable or disable monitoring of IPv6 fragments sent metrics (ok, failed, all). | auto | False | +| ipv6 fragments assembly | Enable or disable monitoring of IPv6 fragments assembly metrics (ok, failed, timeout, all). | auto | False | +| ipv6 errors | Enable or disable monitoring of IPv6 errors metrics (I/O Discards, In Hdr Errors, In Addr Errors, In Truncated Packets, I/O No Routes). | auto | False | +| icmp | Enable or disable monitoring of ICMP metrics (sent, received). | auto | False | +| icmp redirects | Enable or disable monitoring of ICMP redirects metrics (received, sent). | auto | False | +| icmp errors | Enable or disable monitoring of ICMP metrics (I/O Errors, In Checksums, In Destination Unreachable, In Packet too big, In Time Exceeds, In Parm Problem, Out Dest Unreachable, Out Time Exceeds, Out Parm Problems). | auto | False | +| icmp echos | Enable or disable monitoring of ICMP echos metrics (I/O Echos, I/O Echo Reply). | auto | False | +| icmp router | Enable or disable monitoring of ICMP router metrics (I/O Solicits, I/O Advertisements). | auto | False | +| icmp neighbor | Enable or disable monitoring of ICMP neighbor metrics (I/O Solicits, I/O Advertisements). | auto | False | +| icmp types | Enable or disable monitoring of ICMP types metrics (I/O Type1, I/O Type128, I/O Type129, Out Type133, Out Type135, In Type136, Out Type145). | auto | False | +| space usage for all disks | Enable or disable monitoring of space usage for all disks metrics (available, used, reserved for root). 
| yes | False | +| inodes usage for all disks | Enable or disable monitoring of inodes usage for all disks metrics (available, used, reserved for root). | yes | False | +| bandwidth | Enable or disable monitoring of bandwidth metrics (received, sent). | yes | False | +| system uptime | Enable or disable monitoring of system uptime metrics (uptime). | yes | False | +| cpu utilization | Enable or disable monitoring of CPU utilization metrics (user, nice, system, idle). | yes | False | +| system ram | Enable or disable monitoring of system RAM metrics (Active, Wired, throttled, compressor, inactive, purgeable, speculative, free). | yes | False | +| swap i/o | Enable or disable monitoring of SWAP I/O metrics (I/O Swap). | yes | False | +| memory page faults | Enable or disable monitoring of memory page faults metrics (memory, cow, I/O page, compress, decompress, zero fill, reactivate, purge). | yes | False | +| disk i/o | Enable or disable monitoring of disk I/O metrics (In, Out). | yes | False | + +</details> + +#### Examples + +##### Disable swap monitoring. + +A basic example that discards swap monitoring + +<details><summary>Config</summary> + +```yaml +[plugin:macos:sysctl] + system swap = no +[plugin:macos:mach_smi] + swap i/o = no + +``` +</details> + +##### Disable complete Machine SMI section. 
+ +A basic example that disables all monitoring in the Machine SMI section + +<details><summary>Config</summary> + +```yaml +[plugin:macos:mach_smi] + cpu utilization = no + system ram = no + swap i/o = no + memory page faults = no + disk i/o = no + +``` +</details> + + diff --git a/collectors/nfacct.plugin/README.md b/collectors/nfacct.plugin/README.md index ae6597a40..ea320d139 100644..120000 --- a/collectors/nfacct.plugin/README.md +++ b/collectors/nfacct.plugin/README.md @@ -1,63 +1 @@ -<!-- -title: "Monitor Netfilter statistics (nfacct.plugin)" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/nfacct.plugin/README.md" -sidebar_label: "Netfilter statistics (nfacct.plugin)" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Networking" ---> - -# Monitor Netfilter statistics (nfacct.plugin) - -`nfacct.plugin` collects Netfilter statistics. - -## Prerequisites - -If you are using [our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/packages.md), install the -`netdata-plugin-nfacct` package using your system package manager. - -If you built Netdata locally: - -1. install `libmnl-dev` and `libnetfilter-acct-dev` using the package manager of your system. - -2. re-install Netdata from source. The installer will detect that the required libraries are now available and will also build `netdata.plugin`. - -Keep in mind that NFACCT requires root access, so the plugin is setuid to root. - -## Charts - -The plugin provides Netfilter connection tracker statistics and nfacct packet and bandwidth accounting: - -Connection tracker: - -1. Connections. -2. Changes. -3. Expectations. -4. Errors. -5. Searches. - -Netfilter accounting: - -1. Packets. -2. Bandwidth. 
- -## Configuration - -If you need to disable NFACCT for Netdata, edit /etc/netdata/netdata.conf and set: - -``` -[plugins] - nfacct = no -``` - -## Debugging - -You can run the plugin by hand: - -``` -sudo /usr/libexec/netdata/plugins.d/nfacct.plugin 1 debug -``` - -You will get verbose output on what the plugin does. - - +integrations/netfilter.md
\ No newline at end of file diff --git a/collectors/nfacct.plugin/integrations/netfilter.md b/collectors/nfacct.plugin/integrations/netfilter.md new file mode 100644 index 000000000..616e29e97 --- /dev/null +++ b/collectors/nfacct.plugin/integrations/netfilter.md @@ -0,0 +1,131 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/nfacct.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/nfacct.plugin/metadata.yaml" +sidebar_label: "Netfilter" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Firewall" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Netfilter + + +<img src="https://netdata.cloud/img/netfilter.png" width="150"/> + + +Plugin: nfacct.plugin +Module: nfacct.plugin + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Netfilter metrics for optimal packet filtering and manipulation. Keep tabs on packet counts, dropped packets, and error rates to secure network operations. + +Netdata uses libmnl (https://www.netfilter.org/projects/libmnl/index.html) to collect information. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +This plugin needs setuid. + +### Default Behavior + +#### Auto-Detection + +This plugin uses socket to connect with netfilter to collect data + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per Netfilter instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| netfilter.netlink_new | new, ignore, invalid | connections/s | +| netfilter.netlink_changes | insert, delete, delete_list | changes/s | +| netfilter.netlink_search | searched, search_restart, found | searches/s | +| netfilter.netlink_errors | icmp_error, insert_failed, drop, early_drop | events/s | +| netfilter.netlink_expect | created, deleted, new | expectations/s | +| netfilter.nfacct_packets | a dimension per nfacct object | packets/s | +| netfilter.nfacct_bytes | a dimension per nfacct object | kilobytes/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Install required packages + +Install `libmnl-dev` and `libnetfilter-acct-dev` using the package manager of your system. + + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:nfacct]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 1 | False | +| command options | Additional parameters for the collector | | False | + +</details> + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/nfacct.plugin/plugin_nfacct.c b/collectors/nfacct.plugin/plugin_nfacct.c index 430ceab52..a788d1a03 100644 --- a/collectors/nfacct.plugin/plugin_nfacct.c +++ b/collectors/nfacct.plugin/plugin_nfacct.c @@ -18,6 +18,8 @@ #define NETDATA_CHART_PRIO_NETFILTER_PACKETS 8906 #define NETDATA_CHART_PRIO_NETFILTER_BYTES 8907 +#define NFACCT_RESTART_EVERY_SECONDS 86400 // restart the plugin every this many seconds + static inline size_t mnl_buffer_size() { long s = MNL_SOCKET_BUFFER_SIZE; if(s <= 0) return 8192; @@ -760,6 +762,8 @@ int main(int argc, char **argv) { error_log_errors_per_period = 100; error_log_throttle_period = 3600; + log_set_global_severity_for_external_plugins(); + // ------------------------------------------------------------------------ // parse command line parameters @@ -852,7 +856,7 @@ int main(int argc, char **argv) { if(unlikely(netdata_exit)) break; if(debug && iteration) - fprintf(stderr, "nfacct.plugin: iteration %zu, dt %llu usec\n" + fprintf(stderr, "nfacct.plugin: iteration %zu, dt %"PRIu64" usec\n" , iteration , dt ); @@ -879,9 +883,11 @@ int main(int argc, char **argv) { fflush(stdout); - // restart check (14400 seconds) - if(now_monotonic_sec() - started_t > 14400) break; + if (now_monotonic_sec() - started_t > NFACCT_RESTART_EVERY_SECONDS) { + collector_info("NFACCT reached my lifetime expectancy. 
Exiting to restart."); + fprintf(stdout, "EXIT\n"); + fflush(stdout); + exit(0); + } } - - collector_info("NFACCT process exiting"); } diff --git a/collectors/perf.plugin/README.md b/collectors/perf.plugin/README.md index a8bd4b0e5..fb8a0cd69 100644..120000 --- a/collectors/perf.plugin/README.md +++ b/collectors/perf.plugin/README.md @@ -1,87 +1 @@ -<!-- -title: "Monitor CPU performance statistics (perf.plugin)" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/perf.plugin/README.md" -sidebar_label: "CPU performance statistics (perf.plugin)" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/System metrics" ---> - -# Monitor CPU performance statistics (perf.plugin) - -`perf.plugin` collects system-wide CPU performance statistics from Performance Monitoring Units (PMU) using -the `perf_event_open()` system call. - -## Important Notes - -If you are using [our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/packages.md), you will need to install -the `netdata-plugin-perf` package using your system package manager. - -Accessing hardware PMUs requires root permissions, so the plugin is setuid to root. - -Keep in mind that the number of PMUs in a system is usually quite limited and every hardware monitoring -event for every CPU core needs a separate file descriptor to be opened. - -## Charts - -The plugin provides statistics for general hardware and software performance monitoring events: - -Hardware events: - -1. CPU cycles -2. Instructions -3. Branch instructions -4. Cache operations -5. BUS cycles -6. Stalled frontend and backend cycles - -Software events: - -1. CPU migrations -2. Alignment faults -3. Emulation faults - -Hardware cache events: - -1. L1D cache operations -2. L1D prefetch cache operations -3. L1I cache operations -4. LL cache operations -5. DTLB cache operations -6. ITLB cache operations -7. 
PBU cache operations - -## Configuration - -The plugin is disabled by default because the number of PMUs is usually quite limited and it is not desired to -allow Netdata to struggle silently for PMUs, interfering with other performance monitoring software. If you need to -enable the perf plugin, edit /etc/netdata/netdata.conf and set: - -```raw -[plugins] - perf = yes -``` - -```raw -[plugin:perf] - update every = 1 - command options = all -``` - -You can use the `command options` parameter to pick what data should be collected and which charts should be -displayed. If `all` is used, all general performance monitoring counters are probed and corresponding charts -are enabled for the available counters. You can also define a particular set of enabled charts using the -following keywords: `cycles`, `instructions`, `branch`, `cache`, `bus`, `stalled`, `migrations`, `alignment`, -`emulation`, `L1D`, `L1D-prefetch`, `L1I`, `LL`, `DTLB`, `ITLB`, `PBU`. - -## Debugging - -You can run the plugin by hand: - -```raw -sudo /usr/libexec/netdata/plugins.d/perf.plugin 1 all debug -``` - -You will get verbose output on what the plugin does. - - +integrations/cpu_performance.md
\ No newline at end of file diff --git a/collectors/perf.plugin/integrations/cpu_performance.md b/collectors/perf.plugin/integrations/cpu_performance.md new file mode 100644 index 000000000..a4adeb1ca --- /dev/null +++ b/collectors/perf.plugin/integrations/cpu_performance.md @@ -0,0 +1,191 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/perf.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/perf.plugin/metadata.yaml" +sidebar_label: "CPU performance" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# CPU performance + + +<img src="https://netdata.cloud/img/bolt.svg" width="150"/> + + +Plugin: perf.plugin +Module: perf.plugin + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors CPU performance metrics about cycles, instructions, migrations, cache operations and more. + +It uses syscall (2) to open a file descriptor to monitor the perf events. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +It needs setuid to use necessary syscall to collect perf events. Netdata sets the permission during installation time. + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per CPU performance instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| perf.cpu_cycles | cpu, ref_cpu | cycles/s | +| perf.instructions | instructions | instructions/s | +| perf.instructions_per_cycle | ipc | instructions/cycle | +| perf.branch_instructions | instructions, misses | instructions/s | +| perf.cache | references, misses | operations/s | +| perf.bus_cycles | bus | cycles/s | +| perf.stalled_cycles | frontend, backend | cycles/s | +| perf.migrations | migrations | migrations | +| perf.alignment_faults | faults | faults | +| perf.emulation_faults | faults | faults | +| perf.l1d_cache | read_access, read_misses, write_access, write_misses | events/s | +| perf.l1d_cache_prefetch | prefetches | prefetches/s | +| perf.l1i_cache | read_access, read_misses | events/s | +| perf.ll_cache | read_access, read_misses, write_access, write_misses | events/s | +| perf.dtlb_cache | read_access, read_misses, write_access, write_misses | events/s | +| perf.itlb_cache | read_access, read_misses | events/s | +| perf.pbu_cache | read_access | events/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Install perf plugin + +If you are [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure the `netdata-plugin-perf` package is installed. + + +#### Enable the perf plugin + +The plugin is disabled by default because the number of PMUs is usually quite limited and it is not desired to allow Netdata to struggle silently for PMUs, interfering with other performance monitoring software. 
+ +To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `netdata.conf` file. + +```bash +cd /etc/netdata # Replace this path with your Netdata config directory, if different +sudo ./edit-config netdata.conf +``` + +Change the value of the `perf` setting to `yes` in the `[plugins]` section. Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. + + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:perf]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + +You can get the available options running: + +```bash +/usr/libexec/netdata/plugins.d/perf.plugin --help +```` + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 1 | False | +| command options | Command options that specify charts shown by plugin. `cycles`, `instructions`, `branch`, `cache`, `bus`, `stalled`, `migrations`, `alignment`, `emulation`, `L1D`, `L1D-prefetch`, `L1I`, `LL`, `DTLB`, `ITLB`, `PBU`. 
| 1 | True | + +</details> + +#### Examples + +##### All metrics + +Monitor all metrics available. + +```yaml +[plugin:perf] + command options = all + +``` +##### CPU cycles + +Monitor CPU cycles. + +<details><summary>Config</summary> + +```yaml +[plugin:perf] + command options = cycles + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + + + + diff --git a/collectors/perf.plugin/metadata.yaml b/collectors/perf.plugin/metadata.yaml index d7539b502..eada3351d 100644 --- a/collectors/perf.plugin/metadata.yaml +++ b/collectors/perf.plugin/metadata.yaml @@ -40,7 +40,22 @@ modules: description: "" setup: prerequisites: - list: [] + list: + - title: Install perf plugin + description: | + If you are [using our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/UPDATE.md#determine-which-installation-method-you-used), make sure the `netdata-plugin-perf` package is installed. + - title: Enable the pref plugin + description: | + The plugin is disabled by default because the number of PMUs is usually quite limited and it is not desired to allow Netdata to struggle silently for PMUs, interfering with other performance monitoring software. + + To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `netdata.conf` file. + + ```bash + cd /etc/netdata # Replace this path with your Netdata config directory, if different + sudo ./edit-config netdata.conf + ``` + + Change the value of the `perf` setting to `yes` in the `[plugins]` section. Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. 
configuration: file: name: "netdata.conf" @@ -49,7 +64,7 @@ modules: options: description: | You can get the available options running: - + ```bash /usr/libexec/netdata/plugins.d/perf.plugin --help ```` @@ -62,7 +77,7 @@ modules: default_value: 1 required: false - name: command options - description: Command options that specify charts shown by plugin. + description: Command options that specify charts shown by plugin. `cycles`, `instructions`, `branch`, `cache`, `bus`, `stalled`, `migrations`, `alignment`, `emulation`, `L1D`, `L1D-prefetch`, `L1I`, `LL`, `DTLB`, `ITLB`, `PBU`. default_value: 1 required: true examples: @@ -84,7 +99,28 @@ modules: command options = cycles troubleshooting: problems: - list: [] + list: + - name: Debug Mode + description: | + You can run `perf.plugin` with the debug option enabled, to troubleshoot issues with it. The output should give you clues as to why the collector isn't working. + + - Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + + - Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + + - Run the `perf.plugin` in debug mode: + + ```bash + ./perf.plugin 1 all debug + ``` alerts: [] metrics: folding: diff --git a/collectors/perf.plugin/perf_plugin.c b/collectors/perf.plugin/perf_plugin.c index 68c0f917d..31dae03e5 100644 --- a/collectors/perf.plugin/perf_plugin.c +++ b/collectors/perf.plugin/perf_plugin.c @@ -1298,6 +1298,8 @@ int main(int argc, char **argv) { error_log_errors_per_period = 100; error_log_throttle_period = 3600; + log_set_global_severity_for_external_plugins(); + parse_command_line(argc, argv); errno = 0; @@ -1328,7 +1330,7 @@ int main(int argc, char **argv) { if(unlikely(netdata_exit)) break; if(unlikely(debug && iteration)) - fprintf(stderr, "perf.plugin: iteration %zu, dt %llu usec\n" + fprintf(stderr, "perf.plugin: iteration %zu, dt %"PRIu64" usec\n" , iteration , dt ); diff --git a/collectors/plugins.d/README.md b/collectors/plugins.d/README.md index 1c3b50cb7..0752d389b 100644 --- a/collectors/plugins.d/README.md +++ b/collectors/plugins.d/README.md @@ -14,20 +14,20 @@ from external processes, thus allowing Netdata to use **external plugins**. 
## Provided External Plugins -|plugin|language|O/S|description| -|:----:|:------:|:-:|:----------| -|[apps.plugin](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/README.md)|`C`|linux, freebsd|monitors the whole process tree on Linux and FreeBSD and breaks down system resource usage by **process**, **user** and **user group**.| -|[charts.d.plugin](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/README.md)|`BASH`|all|a **plugin orchestrator** for data collection modules written in `BASH` v4+.| -|[cups.plugin](https://github.com/netdata/netdata/blob/master/collectors/cups.plugin/README.md)|`C`|all|monitors **CUPS**| -|[ebpf.plugin](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md)|`C`|linux|monitors different metrics on environments using kernel internal functions.| -|[go.d.plugin](https://github.com/netdata/go.d.plugin/blob/master/README.md)|`GO`|all|collects metrics from the system, applications, or third-party APIs.| -|[ioping.plugin](https://github.com/netdata/netdata/blob/master/collectors/ioping.plugin/README.md)|`C`|all|measures disk latency.| -|[freeipmi.plugin](https://github.com/netdata/netdata/blob/master/collectors/freeipmi.plugin/README.md)|`C`|linux|collects metrics from enterprise hardware sensors, on Linux servers.| -|[nfacct.plugin](https://github.com/netdata/netdata/blob/master/collectors/nfacct.plugin/README.md)|`C`|linux|collects netfilter firewall, connection tracker and accounting metrics using `libmnl` and `libnetfilter_acct`.| -|[xenstat.plugin](https://github.com/netdata/netdata/blob/master/collectors/xenstat.plugin/README.md)|`C`|linux|collects XenServer and XCP-ng metrics using `lxenstat`.| -|[perf.plugin](https://github.com/netdata/netdata/blob/master/collectors/perf.plugin/README.md)|`C`|linux|collects CPU performance metrics using performance monitoring units (PMU).| 
-|[python.d.plugin](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md)|`python`|all|a **plugin orchestrator** for data collection modules written in `python` v2 or v3 (both are supported).| -|[slabinfo.plugin](https://github.com/netdata/netdata/blob/master/collectors/slabinfo.plugin/README.md)|`C`|linux|collects kernel internal cache objects (SLAB) metrics.| +| plugin | language | O/S | description | +|:------------------------------------------------------------------------------------------------------:|:--------:|:--------------:|:----------------------------------------------------------------------------------------------------------------------------------------| +| [apps.plugin](https://github.com/netdata/netdata/blob/master/collectors/apps.plugin/README.md) | `C` | linux, freebsd | monitors the whole process tree on Linux and FreeBSD and breaks down system resource usage by **process**, **user** and **user group**. | +| [charts.d.plugin](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/README.md) | `BASH` | all | a **plugin orchestrator** for data collection modules written in `BASH` v4+. | +| [cups.plugin](https://github.com/netdata/netdata/blob/master/collectors/cups.plugin/README.md) | `C` | all | monitors **CUPS** | +| [ebpf.plugin](https://github.com/netdata/netdata/blob/master/collectors/ebpf.plugin/README.md) | `C` | linux | monitors different metrics on environments using kernel internal functions. | +| [go.d.plugin](https://github.com/netdata/go.d.plugin/blob/master/README.md) | `GO` | all | collects metrics from the system, applications, or third-party APIs. | +| [ioping.plugin](https://github.com/netdata/netdata/blob/master/collectors/ioping.plugin/README.md) | `C` | all | measures disk latency. | +| [freeipmi.plugin](https://github.com/netdata/netdata/blob/master/collectors/freeipmi.plugin/README.md) | `C` | linux | collects metrics from enterprise hardware sensors, on Linux servers. 
| +| [nfacct.plugin](https://github.com/netdata/netdata/blob/master/collectors/nfacct.plugin/README.md) | `C` | linux | collects netfilter firewall, connection tracker and accounting metrics using `libmnl` and `libnetfilter_acct`. | +| [xenstat.plugin](https://github.com/netdata/netdata/blob/master/collectors/xenstat.plugin/README.md) | `C` | linux | collects XenServer and XCP-ng metrics using `lxenstat`. | +| [perf.plugin](https://github.com/netdata/netdata/blob/master/collectors/perf.plugin/README.md) | `C` | linux | collects CPU performance metrics using performance monitoring units (PMU). | +| [python.d.plugin](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md) | `python` | all | a **plugin orchestrator** for data collection modules written in `python` v2 or v3 (both are supported). | +| [slabinfo.plugin](https://github.com/netdata/netdata/blob/master/collectors/slabinfo.plugin/README.md) | `C` | linux | collects kernel internal cache objects (SLAB) metrics. | Plugin orchestrators may also be described as **modular plugins**. They are modular since they accept custom made modules to be included. Writing modules for these plugins is easier than accessing the native Netdata API directly. You will find modules already available for each orchestrator under the directory of the particular modular plugin (e.g. under python.d.plugin for the python orchestrator). Each of these modular plugins has each own methods for defining modules. Please check the examples and their documentation. @@ -154,18 +154,18 @@ every 5 seconds. There are a few environment variables that are set by `netdata` and are available for the plugin to use. -|variable|description| -|:------:|:----------| -|`NETDATA_USER_CONFIG_DIR`|The directory where all Netdata-related user configuration should be stored. 
If the plugin requires custom user configuration, this is the place the user has saved it (normally under `/etc/netdata`).| -|`NETDATA_STOCK_CONFIG_DIR`|The directory where all Netdata -related stock configuration should be stored. If the plugin is shipped with configuration files, this is the place they can be found (normally under `/usr/lib/netdata/conf.d`).| -|`NETDATA_PLUGINS_DIR`|The directory where all Netdata plugins are stored.| -|`NETDATA_USER_PLUGINS_DIRS`|The list of directories where custom plugins are stored.| -|`NETDATA_WEB_DIR`|The directory where the web files of Netdata are saved.| -|`NETDATA_CACHE_DIR`|The directory where the cache files of Netdata are stored. Use this directory if the plugin requires a place to store data. A new directory should be created for the plugin for this purpose, inside this directory.| -|`NETDATA_LOG_DIR`|The directory where the log files are stored. By default the `stderr` output of the plugin will be saved in the `error.log` file of Netdata.| -|`NETDATA_HOST_PREFIX`|This is used in environments where system directories like `/sys` and `/proc` have to be accessed at a different path.| -|`NETDATA_DEBUG_FLAGS`|This is a number (probably in hex starting with `0x`), that enables certain Netdata debugging features. Check **\[[Tracing Options]]** for more information.| -|`NETDATA_UPDATE_EVERY`|The minimum number of seconds between chart refreshes. This is like the **internal clock** of Netdata (it is user configurable, defaulting to `1`). 
There is no meaning for a plugin to update its values more frequently than this number of seconds.| +| variable | description | +|:---------------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `NETDATA_USER_CONFIG_DIR` | The directory where all Netdata-related user configuration should be stored. If the plugin requires custom user configuration, this is the place the user has saved it (normally under `/etc/netdata`). | +| `NETDATA_STOCK_CONFIG_DIR` | The directory where all Netdata -related stock configuration should be stored. If the plugin is shipped with configuration files, this is the place they can be found (normally under `/usr/lib/netdata/conf.d`). | +| `NETDATA_PLUGINS_DIR` | The directory where all Netdata plugins are stored. | +| `NETDATA_USER_PLUGINS_DIRS` | The list of directories where custom plugins are stored. | +| `NETDATA_WEB_DIR` | The directory where the web files of Netdata are saved. | +| `NETDATA_CACHE_DIR` | The directory where the cache files of Netdata are stored. Use this directory if the plugin requires a place to store data. A new directory should be created for the plugin for this purpose, inside this directory. | +| `NETDATA_LOG_DIR` | The directory where the log files are stored. By default the `stderr` output of the plugin will be saved in the `error.log` file of Netdata. | +| `NETDATA_HOST_PREFIX` | This is used in environments where system directories like `/sys` and `/proc` have to be accessed at a different path. | +| `NETDATA_DEBUG_FLAGS` | This is a number (probably in hex starting with `0x`), that enables certain Netdata debugging features. Check **\[[Tracing Options]]** for more information. | +| `NETDATA_UPDATE_EVERY` | The minimum number of seconds between chart refreshes. 
This is like the **internal clock** of Netdata (it is user configurable, defaulting to `1`). There is no meaning for a plugin to update its values more frequently than this number of seconds. | ### The output of the plugin @@ -298,7 +298,7 @@ the template is: the context is giving the template of the chart. For example, if multiple charts present the same information for a different family, they should have the same `context` - this is used for looking up rendering information for the chart (colors, sizes, informational texts) and also apply alarms to it + this is used for looking up rendering information for the chart (colors, sizes, informational texts) and also apply alerts to it - `charttype` @@ -388,12 +388,12 @@ the template is: > VARIABLE [SCOPE] name = value -`VARIABLE` defines a variable that can be used in alarms. This is to used for setting constants (like the max connections a server may accept). +`VARIABLE` defines a variable that can be used in alerts. This is to used for setting constants (like the max connections a server may accept). Variables support 2 scopes: - `GLOBAL` or `HOST` to define the variable at the host level. -- `LOCAL` or `CHART` to define the variable at the chart level. Use chart-local variables when the same variable may exist for different charts (i.e. Netdata monitors 2 mysql servers, and you need to set the `max_connections` each server accepts). Using chart-local variables is the ideal to build alarm templates. +- `LOCAL` or `CHART` to define the variable at the chart level. Use chart-local variables when the same variable may exist for different charts (i.e. Netdata monitors 2 mysql servers, and you need to set the `max_connections` each server accepts). Using chart-local variables is the ideal to build alert templates. The position of the `VARIABLE` line, sets its default scope (in case you do not specify a scope). 
So, defining a `VARIABLE` before any `CHART`, or between `END` and `BEGIN` (outside any chart), sets `GLOBAL` scope, while defining a `VARIABLE` just after a `CHART` or a `DIMENSION`, or within the `BEGIN` - `END` block of a chart, sets `LOCAL` scope. diff --git a/collectors/plugins.d/gperf-config.txt b/collectors/plugins.d/gperf-config.txt index b8140e66c..a1d0c51ba 100644 --- a/collectors/plugins.d/gperf-config.txt +++ b/collectors/plugins.d/gperf-config.txt @@ -36,20 +36,22 @@ SET, 11, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_ VARIABLE, 53, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 19 DYNCFG_ENABLE, 101, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 20 DYNCFG_REGISTER_MODULE, 102, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 21 -REPORT_JOB_STATUS, 110, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 22 +DYNCFG_REGISTER_JOB, 103, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 22 +REPORT_JOB_STATUS, 110, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 23 +DELETE_JOB, 111, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 24 # # Streaming only keywords # -CLAIMED_ID, 61, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 23 -BEGIN2, 2, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 24 -SET2, 1, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 25 -END2, 3, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 26 +CLAIMED_ID, 61, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 25 +BEGIN2, 2, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 26 +SET2, 1, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 27 +END2, 3, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 28 # # Streaming Replication keywords # -CHART_DEFINITION_END, 33, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 27 -RBEGIN, 22, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 28 -RDSTATE, 23, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 29 -REND, 25, 
PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 30 -RSET, 21, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 31 -RSSTATE, 24, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 32 +CHART_DEFINITION_END, 33, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 29 +RBEGIN, 22, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 30 +RDSTATE, 23, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 31 +REND, 25, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 32 +RSET, 21, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 33 +RSSTATE, 24, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 34 diff --git a/collectors/plugins.d/gperf-hashtable.h b/collectors/plugins.d/gperf-hashtable.h index e7d20126f..5bbf9fa98 100644 --- a/collectors/plugins.d/gperf-hashtable.h +++ b/collectors/plugins.d/gperf-hashtable.h @@ -30,12 +30,12 @@ #endif -#define GPERF_PARSER_TOTAL_KEYWORDS 32 +#define GPERF_PARSER_TOTAL_KEYWORDS 34 #define GPERF_PARSER_MIN_WORD_LENGTH 3 #define GPERF_PARSER_MAX_WORD_LENGTH 22 #define GPERF_PARSER_MIN_HASH_VALUE 3 -#define GPERF_PARSER_MAX_HASH_VALUE 41 -/* maximum key range = 39, duplicates = 0 */ +#define GPERF_PARSER_MAX_HASH_VALUE 36 +/* maximum key range = 34, duplicates = 0 */ #ifdef __GNUC__ __inline @@ -49,32 +49,32 @@ gperf_keyword_hash_function (register const char *str, register size_t len) { static unsigned char asso_values[] = { - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 16, 7, 2, 11, 0, - 8, 42, 3, 9, 42, 42, 9, 42, 0, 2, - 42, 42, 1, 3, 42, 7, 17, 42, 27, 2, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 
42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, - 42, 42, 42, 42, 42, 42 + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 12, 28, 5, 2, 0, + 0, 37, 3, 13, 37, 37, 14, 37, 0, 2, + 37, 37, 1, 3, 37, 6, 10, 37, 32, 2, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, + 37, 37, 37, 37, 37, 37 }; return len + asso_values[(unsigned char)str[1]] + asso_values[(unsigned char)str[0]]; } @@ -84,70 +84,72 @@ static PARSER_KEYWORD gperf_keywords[] = {(char*)0}, {(char*)0}, {(char*)0}, #line 30 "gperf-config.txt" {"END", 13, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 13}, -#line 46 "gperf-config.txt" - {"END2", 3, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 26}, -#line 53 "gperf-config.txt" - {"REND", 25, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 30}, +#line 48 
"gperf-config.txt" + {"END2", 3, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 28}, +#line 55 "gperf-config.txt" + {"REND", 25, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 32}, #line 35 "gperf-config.txt" {"SET", 11, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 18}, -#line 45 "gperf-config.txt" - {"SET2", 1, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 25}, -#line 54 "gperf-config.txt" - {"RSET", 21, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 31}, +#line 47 "gperf-config.txt" + {"SET2", 1, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 27}, +#line 56 "gperf-config.txt" + {"RSET", 21, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 33}, #line 18 "gperf-config.txt" {"HOST", 71, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 4}, +#line 54 "gperf-config.txt" + {"RDSTATE", 23, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 31}, +#line 57 "gperf-config.txt" + {"RSSTATE", 24, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 34}, +#line 41 "gperf-config.txt" + {"DELETE_JOB", 111, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 24}, #line 26 "gperf-config.txt" {"CHART", 32, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 9}, -#line 55 "gperf-config.txt" - {"RSSTATE", 24, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 32}, -#line 25 "gperf-config.txt" - {"BEGIN", 12, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 8}, -#line 44 "gperf-config.txt" - {"BEGIN2", 2, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 24}, -#line 51 "gperf-config.txt" - {"RBEGIN", 22, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 28}, +#line 31 "gperf-config.txt" + {"FUNCTION", 41, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 14}, #line 21 "gperf-config.txt" {"HOST_LABEL", 74, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 7}, #line 19 "gperf-config.txt" {"HOST_DEFINE", 72, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 5}, -#line 27 "gperf-config.txt" - {"CLABEL", 34, 
PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 10}, -#line 39 "gperf-config.txt" - {"REPORT_JOB_STATUS", 110, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 22}, -#line 52 "gperf-config.txt" - {"RDSTATE", 23, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 29}, -#line 20 "gperf-config.txt" - {"HOST_DEFINE_END", 73, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 6}, -#line 43 "gperf-config.txt" - {"CLAIMED_ID", 61, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 23}, -#line 15 "gperf-config.txt" - {"FLUSH", 97, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 1}, -#line 31 "gperf-config.txt" - {"FUNCTION", 41, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 14}, -#line 28 "gperf-config.txt" - {"CLABEL_COMMIT", 35, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 11}, -#line 50 "gperf-config.txt" - {"CHART_DEFINITION_END", 33, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 27}, #line 37 "gperf-config.txt" {"DYNCFG_ENABLE", 101, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 20}, -#line 16 "gperf-config.txt" - {"DISABLE", 98, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 2}, +#line 40 "gperf-config.txt" + {"REPORT_JOB_STATUS", 110, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 23}, +#line 15 "gperf-config.txt" + {"FLUSH", 97, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 1}, +#line 20 "gperf-config.txt" + {"HOST_DEFINE_END", 73, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 6}, #line 34 "gperf-config.txt" {"OVERWRITE", 52, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 17}, +#line 16 "gperf-config.txt" + {"DISABLE", 98, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 2}, +#line 39 "gperf-config.txt" + {"DYNCFG_REGISTER_JOB", 103, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 22}, #line 29 "gperf-config.txt" {"DIMENSION", 31, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, 
WORKER_PARSER_FIRST_JOB + 12}, -#line 33 "gperf-config.txt" - {"LABEL", 51, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 16}, -#line 17 "gperf-config.txt" - {"EXIT", 99, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 3}, - {(char*)0}, {(char*)0}, {(char*)0}, +#line 27 "gperf-config.txt" + {"CLABEL", 34, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 10}, #line 38 "gperf-config.txt" {"DYNCFG_REGISTER_MODULE", 102, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 21}, #line 32 "gperf-config.txt" {"FUNCTION_RESULT_BEGIN", 42, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 15}, - {(char*)0}, {(char*)0}, {(char*)0}, {(char*)0}, +#line 52 "gperf-config.txt" + {"CHART_DEFINITION_END", 33, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 29}, +#line 45 "gperf-config.txt" + {"CLAIMED_ID", 61, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 25}, #line 36 "gperf-config.txt" - {"VARIABLE", 53, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 19} + {"VARIABLE", 53, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 19}, +#line 33 "gperf-config.txt" + {"LABEL", 51, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 16}, +#line 28 "gperf-config.txt" + {"CLABEL_COMMIT", 35, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 11}, +#line 25 "gperf-config.txt" + {"BEGIN", 12, PARSER_INIT_PLUGINSD|PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 8}, +#line 46 "gperf-config.txt" + {"BEGIN2", 2, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 26}, +#line 53 "gperf-config.txt" + {"RBEGIN", 22, PARSER_INIT_STREAMING, WORKER_PARSER_FIRST_JOB + 30}, +#line 17 "gperf-config.txt" + {"EXIT", 99, PARSER_INIT_PLUGINSD, WORKER_PARSER_FIRST_JOB + 3} }; PARSER_KEYWORD * diff --git a/collectors/plugins.d/plugins_d.h b/collectors/plugins.d/plugins_d.h index 4988b5071..7c5df4168 100644 --- a/collectors/plugins.d/plugins_d.h +++ 
b/collectors/plugins.d/plugins_d.h @@ -10,47 +10,15 @@ #define PLUGINSD_CMD_MAX (FILENAME_MAX*2) #define PLUGINSD_STOCK_PLUGINS_DIRECTORY_PATH 0 -#define PLUGINSD_KEYWORD_CHART "CHART" -#define PLUGINSD_KEYWORD_CHART_DEFINITION_END "CHART_DEFINITION_END" -#define PLUGINSD_KEYWORD_DIMENSION "DIMENSION" -#define PLUGINSD_KEYWORD_BEGIN "BEGIN" -#define PLUGINSD_KEYWORD_SET "SET" -#define PLUGINSD_KEYWORD_END "END" -#define PLUGINSD_KEYWORD_FLUSH "FLUSH" -#define PLUGINSD_KEYWORD_DISABLE "DISABLE" -#define PLUGINSD_KEYWORD_VARIABLE "VARIABLE" -#define PLUGINSD_KEYWORD_LABEL "LABEL" -#define PLUGINSD_KEYWORD_OVERWRITE "OVERWRITE" -#define PLUGINSD_KEYWORD_CLABEL "CLABEL" -#define PLUGINSD_KEYWORD_CLABEL_COMMIT "CLABEL_COMMIT" -#define PLUGINSD_KEYWORD_FUNCTION "FUNCTION" -#define PLUGINSD_KEYWORD_FUNCTION_RESULT_BEGIN "FUNCTION_RESULT_BEGIN" -#define PLUGINSD_KEYWORD_FUNCTION_RESULT_END "FUNCTION_RESULT_END" - -#define PLUGINSD_KEYWORD_REPLAY_CHART "REPLAY_CHART" -#define PLUGINSD_KEYWORD_REPLAY_BEGIN "RBEGIN" -#define PLUGINSD_KEYWORD_REPLAY_SET "RSET" -#define PLUGINSD_KEYWORD_REPLAY_RRDDIM_STATE "RDSTATE" -#define PLUGINSD_KEYWORD_REPLAY_RRDSET_STATE "RSSTATE" -#define PLUGINSD_KEYWORD_REPLAY_END "REND" - -#define PLUGINSD_KEYWORD_BEGIN_V2 "BEGIN2" -#define PLUGINSD_KEYWORD_SET_V2 "SET2" -#define PLUGINSD_KEYWORD_END_V2 "END2" - -#define PLUGINSD_KEYWORD_HOST_DEFINE "HOST_DEFINE" -#define PLUGINSD_KEYWORD_HOST_DEFINE_END "HOST_DEFINE_END" -#define PLUGINSD_KEYWORD_HOST_LABEL "HOST_LABEL" -#define PLUGINSD_KEYWORD_HOST "HOST" +#define PLUGINSD_KEYWORD_FUNCTION_PAYLOAD "FUNCTION_PAYLOAD" +#define PLUGINSD_KEYWORD_FUNCTION_PAYLOAD_END "FUNCTION_PAYLOAD_END" #define PLUGINSD_KEYWORD_DYNCFG_ENABLE "DYNCFG_ENABLE" #define PLUGINSD_KEYWORD_DYNCFG_REGISTER_MODULE "DYNCFG_REGISTER_MODULE" +#define PLUGINSD_KEYWORD_DYNCFG_REGISTER_JOB "DYNCFG_REGISTER_JOB" #define PLUGINSD_KEYWORD_REPORT_JOB_STATUS "REPORT_JOB_STATUS" - -#define PLUGINSD_KEYWORD_EXIT "EXIT" - -#define 
PLUGINS_FUNCTIONS_TIMEOUT_DEFAULT 10 // seconds +#define PLUGINSD_KEYWORD_DELETE_JOB "DELETE_JOB" #define PLUGINSD_LINE_MAX_SSL_READ 512 @@ -99,37 +67,4 @@ void pluginsd_process_thread_cleanup(void *ptr); size_t pluginsd_initialize_plugin_directories(); -#define pluginsd_function_result_begin_to_buffer(wb, transaction, code, content_type, expires) \ - buffer_sprintf(wb \ - , PLUGINSD_KEYWORD_FUNCTION_RESULT_BEGIN " \"%s\" %d \"%s\" %ld\n" \ - , (transaction) ? (transaction) : "" \ - , (int)(code) \ - , (content_type) ? (content_type) : "" \ - , (long int)(expires) \ - ) - -#define pluginsd_function_result_end_to_buffer(wb) \ - buffer_strcat(wb, "\n" PLUGINSD_KEYWORD_FUNCTION_RESULT_END "\n") - -#define pluginsd_function_result_begin_to_stdout(transaction, code, content_type, expires) \ - fprintf(stdout \ - , PLUGINSD_KEYWORD_FUNCTION_RESULT_BEGIN " \"%s\" %d \"%s\" %ld\n" \ - , (transaction) ? (transaction) : "" \ - , (int)(code) \ - , (content_type) ? (content_type) : "" \ - , (long int)(expires) \ - ) - -#define pluginsd_function_result_end_to_stdout() \ - fprintf(stdout, "\n" PLUGINSD_KEYWORD_FUNCTION_RESULT_END "\n") - -static inline void pluginsd_function_json_error(const char *transaction, int code, const char *msg) { - char buffer[PLUGINSD_LINE_MAX + 1]; - json_escape_string(buffer, msg, PLUGINSD_LINE_MAX); - - pluginsd_function_result_begin_to_stdout(transaction, code, "application/json", now_realtime_sec()); - fprintf(stdout, "{\"status\":%d,\"error_message\":\"%s\"}", code, buffer); - pluginsd_function_result_end_to_stdout(); -} - #endif /* NETDATA_PLUGINS_D_H */ diff --git a/collectors/plugins.d/pluginsd_parser.c b/collectors/plugins.d/pluginsd_parser.c index bc265a3af..68667c785 100644 --- a/collectors/plugins.d/pluginsd_parser.c +++ b/collectors/plugins.d/pluginsd_parser.c @@ -4,6 +4,9 @@ #define LOG_FUNCTIONS false +#define SERVING_STREAMING(parser) (parser->repertoire == PARSER_INIT_STREAMING) +#define SERVING_PLUGINSD(parser) (parser->repertoire == 
PARSER_INIT_PLUGINSD) + static ssize_t send_to_plugin(const char *txt, void *data) { PARSER *parser = data; @@ -353,7 +356,7 @@ static inline PARSER_RC pluginsd_end(char **words, size_t num_words, PARSER *par static void pluginsd_host_define_cleanup(PARSER *parser) { string_freez(parser->user.host_define.hostname); - dictionary_destroy(parser->user.host_define.rrdlabels); + rrdlabels_destroy(parser->user.host_define.rrdlabels); parser->user.host_define.hostname = NULL; parser->user.host_define.rrdlabels = NULL; @@ -390,17 +393,17 @@ static inline PARSER_RC pluginsd_host_define(char **words, size_t num_words, PAR return PARSER_RC_OK; } -static inline PARSER_RC pluginsd_host_dictionary(char **words, size_t num_words, PARSER *parser, DICTIONARY *dict, const char *keyword) { +static inline PARSER_RC pluginsd_host_dictionary(char **words, size_t num_words, PARSER *parser, RRDLABELS *labels, const char *keyword) { char *name = get_word(words, num_words, 1); char *value = get_word(words, num_words, 2); if(!name || !*name || !value) return PLUGINSD_DISABLE_PLUGIN(parser, keyword, "missing parameters"); - if(!parser->user.host_define.parsing_host || !dict) + if(!parser->user.host_define.parsing_host || !labels) return PLUGINSD_DISABLE_PLUGIN(parser, keyword, "host is not defined, send " PLUGINSD_KEYWORD_HOST_DEFINE " before this"); - rrdlabels_add(dict, name, value, RRDLABEL_SRC_CONFIG); + rrdlabels_add(labels, name, value, RRDLABEL_SRC_CONFIG); return PARSER_RC_OK; } @@ -733,14 +736,16 @@ static inline PARSER_RC pluginsd_dimension(char **words, size_t num_words, PARSE struct inflight_function { int code; int timeout; - BUFFER *destination_wb; STRING *function; - void (*callback)(BUFFER *wb, int code, void *callback_data); - void *callback_data; + BUFFER *result_body_wb; + rrd_function_result_callback_t result_cb; + void *result_cb_data; usec_t timeout_ut; usec_t started_ut; usec_t sent_ut; const char *payload; + PARSER *parser; + bool virtual; }; static void 
inflight_functions_insert_callback(const DICTIONARY_ITEM *item, void *func, void *parser_ptr) { @@ -751,42 +756,44 @@ static void inflight_functions_insert_callback(const DICTIONARY_ITEM *item, void // leave this code as default, so that when the dictionary is destroyed this will be sent back to the caller pf->code = HTTP_RESP_GATEWAY_TIMEOUT; + const char *transaction = dictionary_acquired_item_name(item); + char buffer[2048 + 1]; snprintfz(buffer, 2048, "%s %s %d \"%s\"\n", pf->payload ? "FUNCTION_PAYLOAD" : "FUNCTION", - dictionary_acquired_item_name(item), + transaction, pf->timeout, string2str(pf->function)); // send the command to the plugin - int ret = send_to_plugin(buffer, parser); + ssize_t ret = send_to_plugin(buffer, parser); pf->sent_ut = now_realtime_usec(); if(ret < 0) { - netdata_log_error("FUNCTION: failed to send function to plugin, error %d", ret); - rrd_call_function_error(pf->destination_wb, "Failed to communicate with collector", HTTP_RESP_BACKEND_FETCH_FAILED); + netdata_log_error("FUNCTION '%s': failed to send it to the plugin, error %zd", string2str(pf->function), ret); + rrd_call_function_error(pf->result_body_wb, "Failed to communicate with collector", HTTP_RESP_SERVICE_UNAVAILABLE); } else { internal_error(LOG_FUNCTIONS, - "FUNCTION '%s' with transaction '%s' sent to collector (%d bytes, in %llu usec)", + "FUNCTION '%s' with transaction '%s' sent to collector (%zd bytes, in %"PRIu64" usec)", string2str(pf->function), dictionary_acquired_item_name(item), ret, pf->sent_ut - pf->started_ut); } if (!pf->payload) return; - + // send the payload to the plugin ret = send_to_plugin(pf->payload, parser); if(ret < 0) { - netdata_log_error("FUNCTION_PAYLOAD: failed to send function to plugin, error %d", ret); - rrd_call_function_error(pf->destination_wb, "Failed to communicate with collector", HTTP_RESP_BACKEND_FETCH_FAILED); + netdata_log_error("FUNCTION_PAYLOAD '%s': failed to send function to plugin, error %zd", string2str(pf->function), ret); + 
rrd_call_function_error(pf->result_body_wb, "Failed to communicate with collector", HTTP_RESP_SERVICE_UNAVAILABLE); } else { internal_error(LOG_FUNCTIONS, - "FUNCTION_PAYLOAD '%s' with transaction '%s' sent to collector (%d bytes, in %llu usec)", + "FUNCTION_PAYLOAD '%s' with transaction '%s' sent to collector (%zd bytes, in %"PRIu64" usec)", string2str(pf->function), dictionary_acquired_item_name(item), ret, pf->sent_ut - pf->started_ut); } @@ -798,23 +805,90 @@ static bool inflight_functions_conflict_callback(const DICTIONARY_ITEM *item __m struct inflight_function *pf = new_func; netdata_log_error("PLUGINSD_PARSER: duplicate UUID on pending function '%s' detected. Ignoring the second one.", string2str(pf->function)); - pf->code = rrd_call_function_error(pf->destination_wb, "This request is already in progress", HTTP_RESP_BAD_REQUEST); - pf->callback(pf->destination_wb, pf->code, pf->callback_data); + pf->code = rrd_call_function_error(pf->result_body_wb, "This request is already in progress", HTTP_RESP_BAD_REQUEST); + pf->result_cb(pf->result_body_wb, pf->code, pf->result_cb_data); string_freez(pf->function); return false; } -static void inflight_functions_delete_callback(const DICTIONARY_ITEM *item __maybe_unused, void *func, void *parser_ptr __maybe_unused) { +void delete_job_finalize(struct parser *parser __maybe_unused, struct configurable_plugin *plug, const char *fnc_sig, int code) { + if (code != DYNCFG_VFNC_RET_CFG_ACCEPTED) + return; + + char *params_local = strdupz(fnc_sig); + char *words[DYNCFG_MAX_WORDS]; + size_t words_c = quoted_strings_splitter(params_local, words, DYNCFG_MAX_WORDS, isspace_map_pluginsd); + + if (words_c != 3) { + netdata_log_error("PLUGINSD_PARSER: invalid number of parameters for delete_job"); + freez(params_local); + return; + } + + const char *module = words[1]; + const char *job = words[2]; + + delete_job(plug, module, job); + + unlink_job(plug->name, module, job); + + rrdpush_send_job_deleted(localhost, plug->name, module, 
job); + + freez(params_local); +} + +void set_job_finalize(struct parser *parser __maybe_unused, struct configurable_plugin *plug __maybe_unused, const char *fnc_sig, int code) { + if (code != DYNCFG_VFNC_RET_CFG_ACCEPTED) + return; + + char *params_local = strdupz(fnc_sig); + char *words[DYNCFG_MAX_WORDS]; + size_t words_c = quoted_strings_splitter(params_local, words, DYNCFG_MAX_WORDS, isspace_map_pluginsd); + + if (words_c != 3) { + netdata_log_error("PLUGINSD_PARSER: invalid number of parameters for set_job_config"); + freez(params_local); + return; + } + + const char *module_name = get_word(words, words_c, 1); + const char *job_name = get_word(words, words_c, 2); + + if (register_job(parser->user.host->configurable_plugins, parser->user.cd->configuration->name, module_name, job_name, JOB_TYPE_USER, JOB_FLG_USER_CREATED, 1)) { + freez(params_local); + return; + } + + // only send this if it is not existing already (register_job cares for that) + rrdpush_send_dyncfg_reg_job(localhost, parser->user.cd->configuration->name, module_name, job_name, JOB_TYPE_USER, JOB_FLG_USER_CREATED); + + freez(params_local); +} + +static void inflight_functions_delete_callback(const DICTIONARY_ITEM *item __maybe_unused, void *func, void *parser_ptr) { struct inflight_function *pf = func; + struct parser *parser = (struct parser *)parser_ptr; internal_error(LOG_FUNCTIONS, - "FUNCTION '%s' result of transaction '%s' received from collector (%zu bytes, request %llu usec, response %llu usec)", + "FUNCTION '%s' result of transaction '%s' received from collector (%zu bytes, request %"PRIu64" usec, response %"PRIu64" usec)", string2str(pf->function), dictionary_acquired_item_name(item), - buffer_strlen(pf->destination_wb), pf->sent_ut - pf->started_ut, now_realtime_usec() - pf->sent_ut); + buffer_strlen(pf->result_body_wb), pf->sent_ut - pf->started_ut, now_realtime_usec() - pf->sent_ut); + + if (pf->virtual && SERVING_PLUGINSD(parser)) { + if (pf->payload) { + if 
(strncmp(string2str(pf->function), FUNCTION_NAME_SET_JOB_CONFIG, strlen(FUNCTION_NAME_SET_JOB_CONFIG)) == 0) + set_job_finalize(parser, parser->user.cd->configuration, string2str(pf->function), pf->code); + dyn_conf_store_config(string2str(pf->function), pf->payload, parser->user.cd->configuration); + } else if (strncmp(string2str(pf->function), FUNCTION_NAME_DELETE_JOB, strlen(FUNCTION_NAME_DELETE_JOB)) == 0) { + delete_job_finalize(parser, parser->user.cd->configuration, string2str(pf->function), pf->code); + } + } + + pf->result_cb(pf->result_body_wb, pf->code, pf->result_cb_data); - pf->callback(pf->destination_wb, pf->code, pf->callback_data); string_freez(pf->function); + freez((void *)pf->payload); } void inflight_functions_init(PARSER *parser) { @@ -830,11 +904,11 @@ static void inflight_functions_garbage_collect(PARSER *parser, usec_t now) { dfe_start_write(parser->inflight.functions, pf) { if (pf->timeout_ut < now) { internal_error(true, - "FUNCTION '%s' removing expired transaction '%s', after %llu usec.", + "FUNCTION '%s' removing expired transaction '%s', after %"PRIu64" usec.", string2str(pf->function), pf_dfe.name, now - pf->started_ut); - if(!buffer_strlen(pf->destination_wb) || pf->code == HTTP_RESP_OK) - pf->code = rrd_call_function_error(pf->destination_wb, + if(!buffer_strlen(pf->result_body_wb) || pf->code == HTTP_RESP_OK) + pf->code = rrd_call_function_error(pf->result_body_wb, "Timeout waiting for collector response.", HTTP_RESP_GATEWAY_TIMEOUT); @@ -847,35 +921,73 @@ static void inflight_functions_garbage_collect(PARSER *parser, usec_t now) { dfe_done(pf); } +void pluginsd_function_cancel(void *data) { + struct inflight_function *look_for = data, *t; + + bool sent = false; + dfe_start_read(look_for->parser->inflight.functions, t) { + if(look_for == t) { + const char *transaction = t_dfe.name; + + internal_error(true, "PLUGINSD: sending function cancellation to plugin for transaction '%s'", transaction); + + char buffer[2048 + 1]; + 
snprintfz(buffer, 2048, "%s %s\n", + PLUGINSD_KEYWORD_FUNCTION_CANCEL, + transaction); + + // send the command to the plugin + ssize_t ret = send_to_plugin(buffer, t->parser); + if(ret < 0) + sent = true; + + break; + } + } + dfe_done(t); + + if(sent <= 0) + netdata_log_error("PLUGINSD: FUNCTION_CANCEL request didn't match any pending function requests in pluginsd.d."); +} + // this is the function that is called from // rrd_call_function_and_wait() and rrd_call_function_async() -static int pluginsd_execute_function_callback(BUFFER *destination_wb, int timeout, const char *function, void *collector_data, void (*callback)(BUFFER *wb, int code, void *callback_data), void *callback_data) { - PARSER *parser = collector_data; +static int pluginsd_function_execute_cb(BUFFER *result_body_wb, int timeout, const char *function, + void *execute_cb_data, + rrd_function_result_callback_t result_cb, void *result_cb_data, + rrd_function_is_cancelled_cb_t is_cancelled_cb __maybe_unused, + void *is_cancelled_cb_data __maybe_unused, + rrd_function_register_canceller_cb_t register_canceller_cb, + void *register_canceller_db_data) { + PARSER *parser = execute_cb_data; usec_t now = now_realtime_usec(); struct inflight_function tmp = { .started_ut = now, - .timeout_ut = now + timeout * USEC_PER_SEC, - .destination_wb = destination_wb, + .timeout_ut = now + timeout * USEC_PER_SEC + RRDFUNCTIONS_TIMEOUT_EXTENSION_UT, + .result_body_wb = result_body_wb, .timeout = timeout, .function = string_strdupz(function), - .callback = callback, - .callback_data = callback_data, - .payload = NULL + .result_cb = result_cb, + .result_cb_data = result_cb_data, + .payload = NULL, + .parser = parser, }; uuid_t uuid; - uuid_generate_time(uuid); + uuid_generate_random(uuid); - char key[UUID_STR_LEN]; - uuid_unparse_lower(uuid, key); + char transaction[UUID_STR_LEN]; + uuid_unparse_lower(uuid, transaction); dictionary_write_lock(parser->inflight.functions); // if there is any error, our dictionary callbacks 
will call the caller callback to notify // the caller about the error - no need for error handling here. - dictionary_set(parser->inflight.functions, key, &tmp, sizeof(struct inflight_function)); + void *t = dictionary_set(parser->inflight.functions, transaction, &tmp, sizeof(struct inflight_function)); + if(register_canceller_cb) + register_canceller_cb(register_canceller_db_data, pluginsd_function_cancel, t); if(!parser->inflight.smaller_timeout || tmp.timeout_ut < parser->inflight.smaller_timeout) parser->inflight.smaller_timeout = tmp.timeout_ut; @@ -890,6 +1002,8 @@ static int pluginsd_execute_function_callback(BUFFER *destination_wb, int timeou } static inline PARSER_RC pluginsd_function(char **words, size_t num_words, PARSER *parser) { + // a plugin or a child is registering a function + bool global = false; size_t i = 1; if(num_words >= 2 && strcmp(get_word(words, num_words, 1), "GLOBAL") == 0) { @@ -926,7 +1040,7 @@ static inline PARSER_RC pluginsd_function(char **words, size_t num_words, PARSER timeout = PLUGINS_FUNCTIONS_TIMEOUT_DEFAULT; } - rrd_collector_add_function(host, st, name, timeout, help, false, pluginsd_execute_function_callback, parser); + rrd_function_add(host, st, name, timeout, help, false, pluginsd_function_execute_cb, parser); parser->user.data_collections_count++; @@ -973,18 +1087,18 @@ static inline PARSER_RC pluginsd_function_result_begin(char **words, size_t num_ } else { if(format && *format) - pf->destination_wb->content_type = functions_format_to_content_type(format); + pf->result_body_wb->content_type = functions_format_to_content_type(format); pf->code = code; - pf->destination_wb->expires = expiration; + pf->result_body_wb->expires = expiration; if(expiration <= now_realtime_sec()) - buffer_no_cacheable(pf->destination_wb); + buffer_no_cacheable(pf->result_body_wb); else - buffer_cacheable(pf->destination_wb); + buffer_cacheable(pf->result_body_wb); } - parser->defer.response = (pf) ? 
pf->destination_wb : NULL; + parser->defer.response = (pf) ? pf->result_body_wb : NULL; parser->defer.end_keyword = PLUGINSD_KEYWORD_FUNCTION_RESULT_END; parser->defer.action = pluginsd_function_result_end; parser->defer.action_data = string_strdupz(key); // it is ok is key is NULL @@ -1163,7 +1277,7 @@ static inline PARSER_RC pluginsd_clabel(char **words, size_t num_words, PARSER * const char *value = get_word(words, num_words, 2); const char *label_source = get_word(words, num_words, 3); - if (!name || !value || !*label_source) { + if (!name || !value || !label_source) { netdata_log_error("Ignoring malformed or empty CHART LABEL command."); return PLUGINSD_DISABLE_PLUGIN(parser, NULL, NULL); } @@ -1604,7 +1718,7 @@ static inline PARSER_RC pluginsd_begin_v2(char **words, size_t num_words, PARSER if(!pluginsd_set_scope_chart(parser, st, PLUGINSD_KEYWORD_BEGIN_V2)) return PLUGINSD_DISABLE_PLUGIN(parser, NULL, NULL); - if(unlikely(rrdset_flag_check(st, RRDSET_FLAG_OBSOLETE | RRDSET_FLAG_ARCHIVED))) + if(unlikely(rrdset_flag_check(st, RRDSET_FLAG_OBSOLETE))) rrdset_isnot_obsolete(st); timing_step(TIMING_STEP_BEGIN2_FIND_CHART); @@ -1894,7 +2008,7 @@ struct mutex_cond { int rc; }; -static void virt_fnc_got_data_cb(BUFFER *wb, int code, void *callback_data) +static void virt_fnc_got_data_cb(BUFFER *wb __maybe_unused, int code, void *callback_data) { struct mutex_cond *ctx = callback_data; pthread_mutex_lock(&ctx->lock); @@ -1904,9 +2018,81 @@ static void virt_fnc_got_data_cb(BUFFER *wb, int code, void *callback_data) } #define VIRT_FNC_TIMEOUT 1 +#define VIRT_FNC_BUF_SIZE (4096) +void call_virtual_function_async(BUFFER *wb, RRDHOST *host, const char *name, const char *payload, rrd_function_result_callback_t callback, void *callback_data) { + PARSER *parser = NULL; + + //TODO simplify (as we really need only first parameter to get plugin name maybe we can avoid parsing all) + char *words[PLUGINSD_MAX_WORDS]; + char *function_with_params = strdupz(name); + size_t 
num_words = quoted_strings_splitter(function_with_params, words, PLUGINSD_MAX_WORDS, isspace_map_pluginsd); + + if (num_words < 2) { + netdata_log_error("PLUGINSD: virtual function name is empty."); + freez(function_with_params); + return; + } + + const DICTIONARY_ITEM *cpi = dictionary_get_and_acquire_item(host->configurable_plugins, get_word(words, num_words, 1)); + if (unlikely(cpi == NULL)) { + netdata_log_error("PLUGINSD: virtual function plugin '%s' not found.", name); + freez(function_with_params); + return; + } + struct configurable_plugin *cp = dictionary_acquired_item_value(cpi); + parser = (PARSER *)cp->cb_usr_ctx; + + BUFFER *function_out = buffer_create(VIRT_FNC_BUF_SIZE, NULL); + // if we are forwarding this to a plugin (as opposed to streaming/child) we have to remove the first parameter (plugin_name) + buffer_strcat(function_out, get_word(words, num_words, 0)); + for (size_t i = 1; i < num_words; i++) { + if (i == 1 && SERVING_PLUGINSD(parser)) + continue; + buffer_sprintf(function_out, " %s", get_word(words, num_words, i)); + } + freez(function_with_params); + + usec_t now = now_realtime_usec(); + + struct inflight_function tmp = { + .started_ut = now, + .timeout_ut = now + VIRT_FNC_TIMEOUT + USEC_PER_SEC, + .result_body_wb = wb, + .timeout = VIRT_FNC_TIMEOUT * 10, + .function = string_strdupz(buffer_tostring(function_out)), + .result_cb = callback, + .result_cb_data = callback_data, + .payload = payload != NULL ? strdupz(payload) : NULL, + .virtual = true, + }; + buffer_free(function_out); + + uuid_t uuid; + uuid_generate_time(uuid); + + char key[UUID_STR_LEN]; + uuid_unparse_lower(uuid, key); + + dictionary_write_lock(parser->inflight.functions); + + // if there is any error, our dictionary callbacks will call the caller callback to notify + // the caller about the error - no need for error handling here. 
+ dictionary_set(parser->inflight.functions, key, &tmp, sizeof(struct inflight_function)); + + if(!parser->inflight.smaller_timeout || tmp.timeout_ut < parser->inflight.smaller_timeout) + parser->inflight.smaller_timeout = tmp.timeout_ut; + + // garbage collect stale inflight functions + if(parser->inflight.smaller_timeout < now) + inflight_functions_garbage_collect(parser, now); + + dictionary_write_unlock(parser->inflight.functions); +} + + dyncfg_config_t call_virtual_function_blocking(PARSER *parser, const char *name, int *rc, const char *payload) { usec_t now = now_realtime_usec(); - BUFFER *wb = buffer_create(4096, NULL); + BUFFER *wb = buffer_create(VIRT_FNC_BUF_SIZE, NULL); struct mutex_cond cond = { .lock = PTHREAD_MUTEX_INITIALIZER, @@ -1916,12 +2102,13 @@ dyncfg_config_t call_virtual_function_blocking(PARSER *parser, const char *name, struct inflight_function tmp = { .started_ut = now, .timeout_ut = now + VIRT_FNC_TIMEOUT + USEC_PER_SEC, - .destination_wb = wb, + .result_body_wb = wb, .timeout = VIRT_FNC_TIMEOUT, .function = string_strdupz(name), - .callback = virt_fnc_got_data_cb, - .callback_data = &cond, - .payload = payload, + .result_cb = virt_fnc_got_data_cb, + .result_cb_data = &cond, + .payload = payload != NULL ? 
strdupz(payload) : NULL, + .virtual = true, }; uuid_t uuid; @@ -1968,98 +2155,188 @@ dyncfg_config_t call_virtual_function_blocking(PARSER *parser, const char *name, return cfg; } -static dyncfg_config_t get_plugin_config_cb(void *usr_ctx) +#define CVF_MAX_LEN (1024) +static dyncfg_config_t get_plugin_config_cb(void *usr_ctx, const char *plugin_name) { PARSER *parser = usr_ctx; - return call_virtual_function_blocking(parser, "get_plugin_config", NULL, NULL); + + if (SERVING_STREAMING(parser)) { + char buf[CVF_MAX_LEN + 1]; + snprintfz(buf, CVF_MAX_LEN, FUNCTION_NAME_GET_PLUGIN_CONFIG " %s", plugin_name); + return call_virtual_function_blocking(parser, buf, NULL, NULL); + } + + return call_virtual_function_blocking(parser, FUNCTION_NAME_GET_PLUGIN_CONFIG, NULL, NULL); } -static dyncfg_config_t get_plugin_config_schema_cb(void *usr_ctx) +static dyncfg_config_t get_plugin_config_schema_cb(void *usr_ctx, const char *plugin_name) { PARSER *parser = usr_ctx; + + if (SERVING_STREAMING(parser)) { + char buf[CVF_MAX_LEN + 1]; + snprintfz(buf, CVF_MAX_LEN, FUNCTION_NAME_GET_PLUGIN_CONFIG_SCHEMA " %s", plugin_name); + return call_virtual_function_blocking(parser, buf, NULL, NULL); + } + return call_virtual_function_blocking(parser, "get_plugin_config_schema", NULL, NULL); } -static dyncfg_config_t get_module_config_cb(void *usr_ctx, const char *module_name) +static dyncfg_config_t get_module_config_cb(void *usr_ctx, const char *plugin_name, const char *module_name) { PARSER *parser = usr_ctx; - char buf[1024]; - snprintfz(buf, sizeof(buf), "get_module_config %s", module_name); - return call_virtual_function_blocking(parser, buf, NULL, NULL); + BUFFER *wb = buffer_create(CVF_MAX_LEN, NULL); + + buffer_strcat(wb, FUNCTION_NAME_GET_MODULE_CONFIG); + if (SERVING_STREAMING(parser)) + buffer_sprintf(wb, " %s", plugin_name); + + buffer_sprintf(wb, " %s", module_name); + + dyncfg_config_t ret = call_virtual_function_blocking(parser, buffer_tostring(wb), NULL, NULL); + + 
buffer_free(wb); + + return ret; } -static dyncfg_config_t get_module_config_schema_cb(void *usr_ctx, const char *module_name) +static dyncfg_config_t get_module_config_schema_cb(void *usr_ctx, const char *plugin_name, const char *module_name) { PARSER *parser = usr_ctx; - char buf[1024]; - snprintfz(buf, sizeof(buf), "get_module_config_schema %s", module_name); - return call_virtual_function_blocking(parser, buf, NULL, NULL); + BUFFER *wb = buffer_create(CVF_MAX_LEN, NULL); + + buffer_strcat(wb, FUNCTION_NAME_GET_MODULE_CONFIG_SCHEMA); + if (SERVING_STREAMING(parser)) + buffer_sprintf(wb, " %s", plugin_name); + + buffer_sprintf(wb, " %s", module_name); + + dyncfg_config_t ret = call_virtual_function_blocking(parser, buffer_tostring(wb), NULL, NULL); + + buffer_free(wb); + + return ret; } -static dyncfg_config_t get_job_config_schema_cb(void *usr_ctx, const char *module_name) +static dyncfg_config_t get_job_config_schema_cb(void *usr_ctx, const char *plugin_name, const char *module_name) { PARSER *parser = usr_ctx; - char buf[1024]; - snprintfz(buf, sizeof(buf), "get_job_config_schema %s", module_name); - return call_virtual_function_blocking(parser, buf, NULL, NULL); + BUFFER *wb = buffer_create(CVF_MAX_LEN, NULL); + + buffer_strcat(wb, FUNCTION_NAME_GET_JOB_CONFIG_SCHEMA); + + if (SERVING_STREAMING(parser)) + buffer_sprintf(wb, " %s", plugin_name); + + buffer_sprintf(wb, " %s", module_name); + + dyncfg_config_t ret = call_virtual_function_blocking(parser, buffer_tostring(wb), NULL, NULL); + + buffer_free(wb); + + return ret; } -static dyncfg_config_t get_job_config_cb(void *usr_ctx, const char *module_name, const char* job_name) +static dyncfg_config_t get_job_config_cb(void *usr_ctx, const char *plugin_name, const char *module_name, const char* job_name) { PARSER *parser = usr_ctx; - char buf[1024]; - snprintfz(buf, sizeof(buf), "get_job_config %s %s", module_name, job_name); - return call_virtual_function_blocking(parser, buf, NULL, NULL); + BUFFER *wb = 
buffer_create(CVF_MAX_LEN, NULL); + + buffer_strcat(wb, FUNCTION_NAME_GET_JOB_CONFIG); + + if (SERVING_STREAMING(parser)) + buffer_sprintf(wb, " %s", plugin_name); + + buffer_sprintf(wb, " %s %s", module_name, job_name); + + dyncfg_config_t ret = call_virtual_function_blocking(parser, buffer_tostring(wb), NULL, NULL); + + buffer_free(wb); + + return ret; } -enum set_config_result set_plugin_config_cb(void *usr_ctx, dyncfg_config_t *cfg) +enum set_config_result set_plugin_config_cb(void *usr_ctx, const char *plugin_name, dyncfg_config_t *cfg) { PARSER *parser = usr_ctx; + BUFFER *wb = buffer_create(CVF_MAX_LEN, NULL); + + buffer_strcat(wb, FUNCTION_NAME_SET_PLUGIN_CONFIG); + + if (SERVING_STREAMING(parser)) + buffer_sprintf(wb, " %s", plugin_name); + int rc; - call_virtual_function_blocking(parser, "set_plugin_config", &rc, cfg->data); - if(rc != 1) + call_virtual_function_blocking(parser, buffer_tostring(wb), &rc, cfg->data); + + buffer_free(wb); + if(rc != DYNCFG_VFNC_RET_CFG_ACCEPTED) return SET_CONFIG_REJECTED; return SET_CONFIG_ACCEPTED; } -enum set_config_result set_module_config_cb(void *usr_ctx, const char *module_name, dyncfg_config_t *cfg) +enum set_config_result set_module_config_cb(void *usr_ctx, const char *plugin_name, const char *module_name, dyncfg_config_t *cfg) { PARSER *parser = usr_ctx; + BUFFER *wb = buffer_create(CVF_MAX_LEN, NULL); + + buffer_strcat(wb, FUNCTION_NAME_SET_MODULE_CONFIG); + + if (SERVING_STREAMING(parser)) + buffer_sprintf(wb, " %s", plugin_name); + + buffer_sprintf(wb, " %s", module_name); + int rc; + call_virtual_function_blocking(parser, buffer_tostring(wb), &rc, cfg->data); - char buf[1024]; - snprintfz(buf, sizeof(buf), "set_module_config %s", module_name); - call_virtual_function_blocking(parser, buf, &rc, cfg->data); + buffer_free(wb); - if(rc != 1) + if(rc != DYNCFG_VFNC_RET_CFG_ACCEPTED) return SET_CONFIG_REJECTED; return SET_CONFIG_ACCEPTED; } -enum set_config_result set_job_config_cb(void *usr_ctx, const char 
*module_name, const char *job_name, dyncfg_config_t *cfg) +enum set_config_result set_job_config_cb(void *usr_ctx, const char *plugin_name, const char *module_name, const char *job_name, dyncfg_config_t *cfg) { PARSER *parser = usr_ctx; + BUFFER *wb = buffer_create(CVF_MAX_LEN, NULL); + + buffer_strcat(wb, FUNCTION_NAME_SET_JOB_CONFIG); + + if (SERVING_STREAMING(parser)) + buffer_sprintf(wb, " %s", plugin_name); + + buffer_sprintf(wb, " %s %s", module_name, job_name); + int rc; + call_virtual_function_blocking(parser, buffer_tostring(wb), &rc, cfg->data); - char buf[1024]; - snprintfz(buf, sizeof(buf), "set_job_config %s %s", module_name, job_name); - call_virtual_function_blocking(parser, buf, &rc, cfg->data); + buffer_free(wb); - if(rc != 1) + if(rc != DYNCFG_VFNC_RET_CFG_ACCEPTED) return SET_CONFIG_REJECTED; return SET_CONFIG_ACCEPTED; } -enum set_config_result delete_job_cb(void *usr_ctx, const char *module_name, const char *job_name) +enum set_config_result delete_job_cb(void *usr_ctx, const char *plugin_name ,const char *module_name, const char *job_name) { PARSER *parser = usr_ctx; + BUFFER *wb = buffer_create(CVF_MAX_LEN, NULL); + + buffer_strcat(wb, FUNCTION_NAME_DELETE_JOB); + + if (SERVING_STREAMING(parser)) + buffer_sprintf(wb, " %s", plugin_name); + + buffer_sprintf(wb, " %s %s", module_name, job_name); + int rc; + call_virtual_function_blocking(parser, buffer_tostring(wb), &rc, NULL); - char buf[1024]; - snprintfz(buf, sizeof(buf), "delete_job %s %s", module_name, job_name); - call_virtual_function_blocking(parser, buf, &rc, NULL); + buffer_free(wb); - if(rc != 1) + if(rc != DYNCFG_VFNC_RET_CFG_ACCEPTED) return SET_CONFIG_REJECTED; return SET_CONFIG_ACCEPTED; } @@ -2079,37 +2356,65 @@ static inline PARSER_RC pluginsd_register_plugin(char **words __maybe_unused, si cfg->get_config_schema_cb = get_plugin_config_schema_cb; cfg->cb_usr_ctx = parser; - parser->user.cd->cfg_dict_item = register_plugin(cfg); - - if (unlikely(parser->user.cd->cfg_dict_item == 
NULL)) { + const DICTIONARY_ITEM *di = register_plugin(parser->user.host->configurable_plugins, cfg, SERVING_PLUGINSD(parser)); + if (unlikely(di == NULL)) { freez(cfg->name); freez(cfg); return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_ENABLE, "error registering plugin"); } - parser->user.cd->configuration = cfg; + if (SERVING_PLUGINSD(parser)) { + // this is an optimization for pluginsd to avoid extra dictionary lookup + // as we know which plugin is communicating with us + parser->user.cd->cfg_dict_item = di; + parser->user.cd->configuration = cfg; + } else { + // register_plugin keeps the item acquired, so we need to release it + dictionary_acquired_item_release(parser->user.host->configurable_plugins, di); + } + + rrdpush_send_dyncfg_enable(parser->user.host, cfg->name); + return PARSER_RC_OK; } +#define LOG_MSG_SIZE (1024) +#define MODULE_NAME_IDX (SERVING_PLUGINSD(parser) ? 1 : 2) +#define MODULE_TYPE_IDX (SERVING_PLUGINSD(parser) ? 2 : 3) static inline PARSER_RC pluginsd_register_module(char **words __maybe_unused, size_t num_words __maybe_unused, PARSER *parser __maybe_unused) { netdata_log_info("PLUGINSD: DYNCFG_REG_MODULE"); - struct configurable_plugin *plug_cfg = parser->user.cd->configuration; - if (unlikely(plug_cfg == NULL)) - return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_MODULE, "you have to enable dynamic configuration first using " PLUGINSD_KEYWORD_DYNCFG_ENABLE); - - if (unlikely(num_words != 3)) - return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_MODULE, "expected 2 parameters module_name followed by module_type"); + size_t expected_num_words = SERVING_PLUGINSD(parser) ? 3 : 4; + + if (unlikely(num_words != expected_num_words)) { + char log[LOG_MSG_SIZE + 1]; + snprintfz(log, LOG_MSG_SIZE, "expected %zu (got %zu) parameters: %smodule_name module_type", expected_num_words - 1, num_words - 1, SERVING_PLUGINSD(parser) ? 
"" : "plugin_name "); + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_MODULE, log); + } + + struct configurable_plugin *plug_cfg; + const DICTIONARY_ITEM *di = NULL; + if (SERVING_PLUGINSD(parser)) { + plug_cfg = parser->user.cd->configuration; + if (unlikely(plug_cfg == NULL)) + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_MODULE, "you have to enable dynamic configuration first using " PLUGINSD_KEYWORD_DYNCFG_ENABLE); + } else { + di = dictionary_get_and_acquire_item(parser->user.host->configurable_plugins, words[1]); + if (unlikely(di == NULL)) + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_MODULE, "plugin not found"); + + plug_cfg = (struct configurable_plugin *)dictionary_acquired_item_value(di); + } struct module *mod = callocz(1, sizeof(struct module)); - mod->type = str2_module_type(words[2]); + mod->type = str2_module_type(words[MODULE_TYPE_IDX]); if (unlikely(mod->type == MOD_TYPE_UNKNOWN)) { freez(mod); return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_MODULE, "unknown module type (allowed: job_array, single)"); } - mod->name = strdupz(words[1]); + mod->name = strdupz(words[MODULE_NAME_IDX]); mod->set_config_cb = set_module_config_cb; mod->get_config_cb = get_module_config_cb; @@ -2122,27 +2427,111 @@ static inline PARSER_RC pluginsd_register_module(char **words __maybe_unused, si mod->delete_job_cb = delete_job_cb; mod->job_config_cb_usr_ctx = parser; - register_module(plug_cfg, mod); + register_module(parser->user.host->configurable_plugins, plug_cfg, mod, SERVING_PLUGINSD(parser)); + + if (di != NULL) + dictionary_acquired_item_release(parser->user.host->configurable_plugins, di); + + rrdpush_send_dyncfg_reg_module(parser->user.host, plug_cfg->name, mod->name, mod->type); + return PARSER_RC_OK; } -// job_status <module_name> <job_name> <status_code> <state> <message> -static inline PARSER_RC pluginsd_job_status(char **words, size_t num_words, PARSER 
*parser) -{ - if (unlikely(num_words != 6 && num_words != 5)) - return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_REPORT_JOB_STATUS, "expected 4 or 5 parameters: module_name, job_name, status_code, state, [optional: message]"); +static inline PARSER_RC pluginsd_register_job_common(char **words __maybe_unused, size_t num_words __maybe_unused, PARSER *parser __maybe_unused, const char *plugin_name) { + if (atol(words[3]) < 0) + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_JOB, "invalid flags"); + dyncfg_job_flg_t flags = atol(words[3]); + if (SERVING_PLUGINSD(parser)) + flags |= JOB_FLG_PLUGIN_PUSHED; + else + flags |= JOB_FLG_STREAMING_PUSHED; - int state = atoi(words[4]); + enum job_type job_type = str2job_type(words[2]); + if (job_type == JOB_TYPE_UNKNOWN) + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_JOB, "unknown job type"); + if (SERVING_PLUGINSD(parser) && job_type == JOB_TYPE_USER) + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_JOB, "plugins cannot push jobs of type \"user\" (this is allowed only in streaming)"); - enum job_status job_status = str2job_state(words[3]); - if (unlikely(job_status == JOB_STATUS_UNKNOWN)) - return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_REPORT_JOB_STATUS, "unknown job state"); + if (register_job(parser->user.host->configurable_plugins, plugin_name, words[0], words[1], job_type, flags, 0)) // ignore existing is off as this is explicitly called register job + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_JOB, "error registering job"); + + rrdpush_send_dyncfg_reg_job(parser->user.host, plugin_name, words[0], words[1], job_type, flags); + return PARSER_RC_OK; +} + +static inline PARSER_RC pluginsd_register_job(char **words __maybe_unused, size_t num_words __maybe_unused, PARSER *parser __maybe_unused) { + size_t expected_num_words = SERVING_PLUGINSD(parser) ? 
5 : 6; + + if (unlikely(num_words != expected_num_words)) { + char log[LOG_MSG_SIZE + 1]; + snprintfz(log, LOG_MSG_SIZE, "expected %zu (got %zu) parameters: %smodule_name job_name job_type", expected_num_words - 1, num_words - 1, SERVING_PLUGINSD(parser) ? "" : "plugin_name "); + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DYNCFG_REGISTER_JOB, log); + } + + if (SERVING_PLUGINSD(parser)) { + return pluginsd_register_job_common(&words[1], num_words - 1, parser, parser->user.cd->configuration->name); + } + return pluginsd_register_job_common(&words[2], num_words - 2, parser, words[1]); +} + +static inline PARSER_RC pluginsd_job_status_common(char **words, size_t num_words, PARSER *parser, const char *plugin_name) { + int state = str2i(words[3]); + + enum job_status status = str2job_state(words[2]); + if (unlikely(SERVING_PLUGINSD(parser) && status == JOB_STATUS_UNKNOWN)) + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_REPORT_JOB_STATUS, "unknown job status"); char *message = NULL; - if (num_words == 6) - message = strdupz(words[5]); + if (num_words == 5) + message = words[4]; + + const DICTIONARY_ITEM *plugin_item; + DICTIONARY *job_dict; + const DICTIONARY_ITEM *job_item = report_job_status_acq_lock(parser->user.host->configurable_plugins, &plugin_item, &job_dict, plugin_name, words[0], words[1], status, state, message); + + if (job_item != NULL) { + struct job *job = dictionary_acquired_item_value(job_item); + rrdpush_send_job_status_update(parser->user.host, plugin_name, words[0], job); + + pthread_mutex_unlock(&job->lock); + dictionary_acquired_item_release(job_dict, job_item); + dictionary_acquired_item_release(parser->user.host->configurable_plugins, plugin_item); + } + + return PARSER_RC_OK; +} - report_job_status(parser->user.cd->configuration, words[1], words[2], job_status, state, message); +// job_status [plugin_name if streaming] <module_name> <job_name> <status_code> <state> [message] +static PARSER_RC pluginsd_job_status(char 
**words, size_t num_words, PARSER *parser) { + if (SERVING_PLUGINSD(parser)) { + if (unlikely(num_words != 5 && num_words != 6)) + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_REPORT_JOB_STATUS, "expected 4 or 5 parameters: module_name, job_name, status_code, state, [optional: message]"); + } else { + if (unlikely(num_words != 6 && num_words != 7)) + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_REPORT_JOB_STATUS, "expected 5 or 6 parameters: plugin_name, module_name, job_name, status_code, state, [optional: message]"); + } + + if (SERVING_PLUGINSD(parser)) { + return pluginsd_job_status_common(&words[1], num_words - 1, parser, parser->user.cd->configuration->name); + } + return pluginsd_job_status_common(&words[2], num_words - 2, parser, words[1]); +} + +static PARSER_RC pluginsd_delete_job(char **words, size_t num_words, PARSER *parser) { + // this can be confusing, but there is a difference between KEYWORD_DELETE_JOB and the actual delete_job function + // they go in opposite directions + if (num_words != 4) + return PLUGINSD_DISABLE_PLUGIN(parser, PLUGINSD_KEYWORD_DELETE_JOB, "expected 3 parameters: plugin_name, module_name, job_name"); + + const char *plugin_name = get_word(words, num_words, 1); + const char *module_name = get_word(words, num_words, 2); + const char *job_name = get_word(words, num_words, 3); + + if (SERVING_STREAMING(parser)) + delete_job_pname(parser->user.host->configurable_plugins, plugin_name, module_name, job_name); + + // forward to parent if any + rrdpush_send_job_deleted(parser->user.host, plugin_name, module_name, job_name); return PARSER_RC_OK; } @@ -2309,15 +2698,22 @@ inline size_t pluginsd_process(RRDHOST *host, struct plugind *cd, FILE *fp_plugi netdata_thread_cleanup_push(pluginsd_process_thread_cleanup, parser); buffered_reader_init(&parser->reader); - char buffer[PLUGINSD_LINE_MAX + 2]; + BUFFER *buffer = buffer_create(sizeof(parser->reader.read_buffer) + 2, NULL); 
while(likely(service_running(SERVICE_COLLECTORS))) { - if (unlikely(!buffered_reader_next_line(&parser->reader, buffer, PLUGINSD_LINE_MAX + 2))) { + if (unlikely(!buffered_reader_next_line(&parser->reader, buffer))) { if(unlikely(!buffered_reader_read_timeout(&parser->reader, fileno((FILE *)parser->fp_input), 2 * 60 * MSEC_PER_SEC))) break; + + continue; } - else if(unlikely(parser_action(parser, buffer))) + + if(unlikely(parser_action(parser, buffer->buffer))) break; + + buffer->len = 0; + buffer->buffer[0] = '\0'; } + buffer_free(buffer); cd->unsafe.enabled = parser->user.enabled; count = parser->user.data_collections_count; @@ -2452,10 +2848,19 @@ PARSER_RC parser_execute(PARSER *parser, PARSER_KEYWORD *keyword, char **words, case 101: return pluginsd_register_plugin(words, num_words, parser); - + case 102: return pluginsd_register_module(words, num_words, parser); + case 103: + return pluginsd_register_job(words, num_words, parser); + + case 110: + return pluginsd_job_status(words, num_words, parser); + + case 111: + return pluginsd_delete_job(words, num_words, parser); + default: fatal("Unknown keyword '%s' with id %zu", keyword->keyword, keyword->id); } @@ -2472,14 +2877,20 @@ void parser_init_repertoire(PARSER *parser, PARSER_REPERTOIRE repertoire) { } } +static void parser_destroy_dyncfg(PARSER *parser) { + if (parser->user.cd != NULL && parser->user.cd->configuration != NULL) { + unregister_plugin(parser->user.host->configurable_plugins, parser->user.cd->cfg_dict_item); + parser->user.cd->configuration = NULL; + } else if (parser->user.host != NULL && SERVING_STREAMING(parser) && parser->user.host != localhost){ + dictionary_flush(parser->user.host->configurable_plugins); + } +} + void parser_destroy(PARSER *parser) { if (unlikely(!parser)) return; - if (parser->user.cd != NULL && parser->user.cd->configuration != NULL) { - unregister_plugin(parser->user.cd->cfg_dict_item); - parser->user.cd->configuration = NULL; - } + parser_destroy_dyncfg(parser); 
dictionary_destroy(parser->inflight.functions); freez(parser); diff --git a/collectors/plugins.d/pluginsd_parser.h b/collectors/plugins.d/pluginsd_parser.h index 5e1ea1242..74767569b 100644 --- a/collectors/plugins.d/pluginsd_parser.h +++ b/collectors/plugins.d/pluginsd_parser.h @@ -10,6 +10,9 @@ // this has to be in-sync with the same at receiver.c #define WORKER_RECEIVER_JOB_REPLICATION_COMPLETION (WORKER_PARSER_FIRST_JOB - 3) +// this controls the max response size of a function +#define PLUGINSD_MAX_DEFERRED_SIZE (20 * 1024 * 1024) + // PARSER return codes typedef enum __attribute__ ((__packed__)) parser_rc { PARSER_RC_OK, // Callback was successful, go on @@ -43,8 +46,8 @@ typedef struct parser_user_object { void *opaque; struct plugind *cd; int trust_durations; - DICTIONARY *new_host_labels; - DICTIONARY *chart_rrdlabels_linked_temporarily; + RRDLABELS *new_host_labels; + RRDLABELS *chart_rrdlabels_linked_temporarily; size_t data_collections_count; int enabled; @@ -55,7 +58,7 @@ typedef struct parser_user_object { uuid_t machine_guid; char machine_guid_str[UUID_STR_LEN]; STRING *hostname; - DICTIONARY *rrdlabels; + RRDLABELS *rrdlabels; } host_define; struct parser_user_object_replay { @@ -151,15 +154,15 @@ static inline int parser_action(PARSER *parser, char *input) { parser->line++; if(unlikely(parser->flags & PARSER_DEFER_UNTIL_KEYWORD)) { - char command[PLUGINSD_LINE_MAX + 1]; - bool has_keyword = find_first_keyword(input, command, PLUGINSD_LINE_MAX, isspace_map_pluginsd); + char command[100 + 1]; + bool has_keyword = find_first_keyword(input, command, 100, isspace_map_pluginsd); if(!has_keyword || strcmp(command, parser->defer.end_keyword) != 0) { if(parser->defer.response) { buffer_strcat(parser->defer.response, input); - if(buffer_strlen(parser->defer.response) > 10 * 1024 * 1024) { - // more than 10MB of data - // a bad plugin that did not send the end_keyword + if(buffer_strlen(parser->defer.response) > PLUGINSD_MAX_DEFERRED_SIZE) { + // more than 
PLUGINSD_MAX_DEFERRED_SIZE of data, + // or a bad plugin that did not send the end_keyword internal_error(true, "PLUGINSD: deferred response is too big (%zu bytes). Stopping this plugin.", buffer_strlen(parser->defer.response)); return 1; } @@ -180,7 +183,7 @@ static inline int parser_action(PARSER *parser, char *input) { return 0; } - char *words[PLUGINSD_MAX_WORDS]; + static __thread char *words[PLUGINSD_MAX_WORDS]; size_t num_words = quoted_strings_splitter_pluginsd(input, words, PLUGINSD_MAX_WORDS); const char *command = get_word(words, num_words, 0); diff --git a/collectors/proc.plugin/README.md b/collectors/proc.plugin/README.md index 16ae6f412..62e46569f 100644 --- a/collectors/proc.plugin/README.md +++ b/collectors/proc.plugin/README.md @@ -398,11 +398,11 @@ You can set the following values for each configuration option: #### Wireless configuration -#### alarms +#### alerts -There are several alarms defined in `health.d/net.conf`. +There are several alerts defined in `health.d/net.conf`. -The tricky ones are `inbound packets dropped` and `inbound packets dropped ratio`. They have quite a strict policy so that they warn users about possible issues. These alarms can be annoying for some network configurations. It is especially true for some bonding configurations if an interface is a child or a bonding interface itself. If it is expected to have a certain number of drops on an interface for a certain network configuration, a separate alarm with different triggering thresholds can be created or the existing one can be disabled for this specific interface. It can be done with the help of the [families](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alarm-line-families) line in the alarm configuration. For example, if you want to disable the `inbound packets dropped` alarm for `eth0`, set `families: !eth0 *` in the alarm definition for `template: inbound_packets_dropped`. 
+The tricky ones are `inbound packets dropped` and `inbound packets dropped ratio`. They have quite a strict policy so that they warn users about possible issues. These alerts can be annoying for some network configurations. It is especially true for some bonding configurations if an interface is a child or a bonding interface itself. If it is expected to have a certain number of drops on an interface for a certain network configuration, a separate alert with different triggering thresholds can be created or the existing one can be disabled for this specific interface. It can be done with the help of the [families](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alert-line-families) line in the alert configuration. For example, if you want to disable the `inbound packets dropped` alert for `eth0`, set `families: !eth0 *` in the alert definition for `template: inbound_packets_dropped`. #### configuration diff --git a/collectors/proc.plugin/integrations/amd_gpu.md b/collectors/proc.plugin/integrations/amd_gpu.md new file mode 100644 index 000000000..c9964dbb7 --- /dev/null +++ b/collectors/proc.plugin/integrations/amd_gpu.md @@ -0,0 +1,109 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/amd_gpu.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "AMD GPU" +learn_status: "Published" +learn_rel_path: "Data Collection/Hardware Devices and Sensors" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# AMD GPU + + +<img src="https://netdata.cloud/img/amd.svg" width="150"/> + + +Plugin: proc.plugin +Module: /sys/class/drm + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration monitors AMD GPU metrics, such as utilization, clock frequency and memory usage. 
+ +It reads `/sys/class/drm` to collect metrics for every AMD GPU card instance it encounters. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per gpu + +These metrics refer to the GPU. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| product_name | GPU product name (e.g. AMD RX 6600) | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| amdgpu.gpu_utilization | utilization | percentage | +| amdgpu.gpu_mem_utilization | utilization | percentage | +| amdgpu.gpu_clk_frequency | frequency | MHz | +| amdgpu.gpu_mem_clk_frequency | frequency | MHz | +| amdgpu.gpu_mem_vram_usage_perc | usage | percentage | +| amdgpu.gpu_mem_vram_usage | free, used | bytes | +| amdgpu.gpu_mem_vis_vram_usage_perc | usage | percentage | +| amdgpu.gpu_mem_vis_vram_usage | free, used | bytes | +| amdgpu.gpu_mem_gtt_usage_perc | usage | percentage | +| amdgpu.gpu_mem_gtt_usage | free, used | bytes | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
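The AMD GPU integration above says it reads `/sys/class/drm` for each card. As a rough illustration only, the sketch below shows how a single numeric attribute could be read from a sysfs-style text file. It is not the proc.plugin implementation: the helper name `read_sysfs_ull` is hypothetical, and the attribute names in the usage note are assumptions based on the amdgpu driver's sysfs interface, not something this diff confirms.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Netdata): read one unsigned integer
 * from a sysfs-style text file. Returns 0 on success, -1 on failure. */
int read_sysfs_ull(const char *path, unsigned long long *value) {
    FILE *fp = fopen(path, "r");
    if (!fp)
        return -1;                  /* file missing: card or attribute absent */

    char line[64];
    char *ok = fgets(line, sizeof(line), fp);
    fclose(fp);
    if (!ok)
        return -1;                  /* empty file or read error */

    char *end = NULL;
    errno = 0;
    unsigned long long v = strtoull(line, &end, 10);
    if (errno != 0 || end == line)
        return -1;                  /* content is not a number */

    *value = v;
    return 0;
}
```

A caller might, for example, pass a path such as `/sys/class/drm/card0/device/gpu_busy_percent` (an assumed attribute name) and treat a `-1` return as "metric not available on this card" rather than as a fatal error.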
+ + diff --git a/collectors/proc.plugin/integrations/btrfs.md b/collectors/proc.plugin/integrations/btrfs.md new file mode 100644 index 000000000..7c0764cf0 --- /dev/null +++ b/collectors/proc.plugin/integrations/btrfs.md @@ -0,0 +1,136 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/btrfs.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "BTRFS" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Filesystem/BTRFS" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# BTRFS + + +<img src="https://netdata.cloud/img/filesystem.svg" width="150"/> + + +Plugin: proc.plugin +Module: /sys/fs/btrfs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides usage and error statistics from the BTRFS filesystem. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per btrfs filesystem + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| filesystem_uuid | TBD | +| filesystem_label | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| btrfs.disk | unallocated, data_free, data_used, meta_free, meta_used, sys_free, sys_used | MiB | +| btrfs.data | free, used | MiB | +| btrfs.metadata | free, used, reserved | MiB | +| btrfs.system | free, used | MiB | +| btrfs.commits | commits | commits | +| btrfs.commits_perc_time | commits | percentage | +| btrfs.commit_timings | last, max | ms | + +### Per btrfs device + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| device_id | TBD | +| filesystem_uuid | TBD | +| filesystem_label | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| btrfs.device_errors | write_errs, read_errs, flush_errs, corruption_errs, generation_errs | errors | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ btrfs_allocated ](https://github.com/netdata/netdata/blob/master/health/health.d/btrfs.conf) | btrfs.disk | percentage of allocated BTRFS physical disk space | +| [ btrfs_data ](https://github.com/netdata/netdata/blob/master/health/health.d/btrfs.conf) | btrfs.data | utilization of BTRFS data space | +| [ btrfs_metadata ](https://github.com/netdata/netdata/blob/master/health/health.d/btrfs.conf) | btrfs.metadata | utilization of BTRFS metadata space | +| [ btrfs_system ](https://github.com/netdata/netdata/blob/master/health/health.d/btrfs.conf) | btrfs.system | utilization of BTRFS system space | +| [ btrfs_device_read_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/btrfs.conf) | btrfs.device_errors | number of encountered BTRFS read errors | +| [ btrfs_device_write_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/btrfs.conf) | 
btrfs.device_errors | number of encountered BTRFS write errors | +| [ btrfs_device_flush_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/btrfs.conf) | btrfs.device_errors | number of encountered BTRFS flush errors | +| [ btrfs_device_corruption_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/btrfs.conf) | btrfs.device_errors | number of encountered BTRFS corruption errors | +| [ btrfs_device_generation_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/btrfs.conf) | btrfs.device_errors | number of encountered BTRFS generation errors | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/conntrack.md b/collectors/proc.plugin/integrations/conntrack.md new file mode 100644 index 000000000..543aafc16 --- /dev/null +++ b/collectors/proc.plugin/integrations/conntrack.md @@ -0,0 +1,104 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/conntrack.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Conntrack" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Firewall" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Conntrack + + +<img src="https://netdata.cloud/img/firewall.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/stat/nf_conntrack + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration monitors the connection tracking mechanism of Netfilter in the Linux Kernel. + + + +This collector is supported on all platforms. 
+ +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Conntrack instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| netfilter.conntrack_sockets | connections | active connections | +| netfilter.conntrack_new | new, ignore, invalid | connections/s | +| netfilter.conntrack_changes | inserted, deleted, delete_list | changes/s | +| netfilter.conntrack_expect | created, deleted, new | expectations/s | +| netfilter.conntrack_search | searched, restarted, found | searches/s | +| netfilter.conntrack_errors | icmp_error, error_failed, drop, early_drop | events/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ netfilter_conntrack_full ](https://github.com/netdata/netdata/blob/master/health/health.d/netfilter.conf) | netfilter.conntrack_sockets | netfilter connection tracker table size utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/disk_statistics.md b/collectors/proc.plugin/integrations/disk_statistics.md new file mode 100644 index 000000000..fc2ce5b08 --- /dev/null +++ b/collectors/proc.plugin/integrations/disk_statistics.md @@ -0,0 +1,148 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/disk_statistics.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Disk Statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Disk" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Disk Statistics + + +<img src="https://netdata.cloud/img/hard-drive.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/diskstats + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Detailed statistics for each of your system's disk devices and partitions. +The data is reported by the kernel and can be used to monitor disk activity on a Linux system. + +Get valuable insight into how your disks are performing and where potential bottlenecks might be. + + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per Disk Statistics instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.io | in, out | KiB/s | + +### Per disk + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| device | TBD | +| mount_point | TBD | +| device_type | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| disk.io | reads, writes | KiB/s | +| disk_ext.io | discards | KiB/s | +| disk.ops | reads, writes | operations/s | +| disk_ext.ops | discards, flushes | operations/s | +| disk.qops | operations | operations | +| disk.backlog | backlog | milliseconds | +| disk.busy | busy | milliseconds | +| disk.util | utilization | % of time working | +| disk.mops | reads, writes | merged operations/s | +| disk_ext.mops | discards | merged operations/s | +| disk.iotime | reads, writes | milliseconds/s | +| disk_ext.iotime | discards, flushes | milliseconds/s | +| disk.await | reads, writes | milliseconds/operation | +| disk_ext.await | discards, flushes | milliseconds/operation | +| disk.avgsz | reads, writes | KiB/operation | +| disk_ext.avgsz | discards | KiB/operation | +| disk.svctm | svctm | milliseconds/operation | +| disk.bcache_cache_alloc | ununsed, dirty, clean, metadata, undefined | percentage | +| disk.bcache_hit_ratio | 5min, 1hour, 1day, ever | percentage | +| disk.bcache_rates | congested, writeback | KiB/s | +| disk.bcache_size | dirty | MiB | +| disk.bcache_usage | avail | percentage | +| disk.bcache_cache_read_races | races, errors | operations/s | +| disk.bcache | hits, misses, collisions, readaheads | operations/s | +| disk.bcache_bypass | hits, misses | operations/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 10min_disk_backlog ](https://github.com/netdata/netdata/blob/master/health/health.d/disks.conf) | disk.backlog | average backlog size of the 
${label:device} disk over the last 10 minutes | +| [ 10min_disk_utilization ](https://github.com/netdata/netdata/blob/master/health/health.d/disks.conf) | disk.util | average percentage of time ${label:device} disk was busy over the last 10 minutes | +| [ bcache_cache_dirty ](https://github.com/netdata/netdata/blob/master/health/health.d/bcache.conf) | disk.bcache_cache_alloc | percentage of cache space used for dirty data and metadata (this usually means your SSD cache is too small) | +| [ bcache_cache_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/bcache.conf) | disk.bcache_cache_read_races | number of times data was read from the cache, the bucket was reused and invalidated in the last 10 minutes (when this occurs the data is reread from the backing device) | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/entropy.md b/collectors/proc.plugin/integrations/entropy.md new file mode 100644 index 000000000..debf2e75e --- /dev/null +++ b/collectors/proc.plugin/integrations/entropy.md @@ -0,0 +1,132 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/entropy.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Entropy" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/System" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Entropy + + +<img src="https://netdata.cloud/img/syslog.png" width="150"/> + + +Plugin: proc.plugin +Module: /proc/sys/kernel/random/entropy_avail + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Entropy, a measure of the randomness or unpredictability of data. + +In the context of cryptography, entropy is used to generate random numbers or keys that are essential for +secure communication and encryption. Without a good source of entropy, cryptographic protocols can become +vulnerable to attacks that exploit the predictability of the generated keys. + +In most operating systems, entropy is generated by collecting random events from various sources, such as +hardware interrupts, mouse movements, keyboard presses, and disk activity. These events are fed into a pool +of entropy, which is then used to generate random numbers when needed. + +The `/dev/random` device in Linux is one such source of entropy, and it provides an interface for programs +to access the pool of entropy. When a program requests random numbers, it reads from the `/dev/random` device, +which blocks until enough entropy is available to generate the requested numbers. This ensures that the +generated numbers are truly random and not predictable. 
+ +However, if the pool of entropy gets depleted, the `/dev/random` device may block indefinitely, causing +programs that rely on random numbers to slow down or even freeze. This is especially problematic for +cryptographic protocols that require a continuous stream of random numbers, such as SSL/TLS and SSH. + +To avoid this issue, some systems use a hardware random number generator (RNG) to generate high-quality +entropy. A hardware RNG generates random numbers by measuring physical phenomena, such as thermal noise or +radioactive decay. These sources of randomness are considered to be more reliable and unpredictable than +software-based sources. + +One such hardware RNG is the Trusted Platform Module (TPM), which is a dedicated hardware chip that is used +for cryptographic operations and secure boot. The TPM contains a built-in hardware RNG that generates +high-quality entropy, which can be used to seed the pool of entropy in the operating system. + +Alternatively, software-based solutions such as `Haveged` can be used to generate additional entropy by +exploiting sources of randomness in the system, such as CPU utilization and network traffic. These solutions +can help to mitigate the risk of entropy depletion, but they may not be as reliable as hardware-based solutions. + + + + +This collector is only supported on the following platforms: + +- linux + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per Entropy instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.entropy | entropy | entropy | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ lowest_entropy ](https://github.com/netdata/netdata/blob/master/health/health.d/entropy.conf) | system.entropy | minimum number of bits of entropy available for the kernel’s random number generator | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/infiniband.md b/collectors/proc.plugin/integrations/infiniband.md new file mode 100644 index 000000000..6ebefe73e --- /dev/null +++ b/collectors/proc.plugin/integrations/infiniband.md @@ -0,0 +1,98 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/infiniband.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "InfiniBand" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# InfiniBand + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: proc.plugin +Module: /sys/class/infiniband + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration monitors InfiniBand network interface statistics. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. 
+ + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per infiniband port + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ib.bytes | Received, Sent | kilobits/s | +| ib.packets | Received, Sent, Mcast_rcvd, Mcast_sent, Ucast_rcvd, Ucast_sent | packets/s | +| ib.errors | Pkts_malformated, Pkts_rcvd_discarded, Pkts_sent_discarded, Tick_Wait_to_send, Pkts_missed_resource, Buffer_overrun, Link_Downed, Link_recovered, Link_integrity_err, Link_minor_errors, Pkts_rcvd_with_EBP, Pkts_rcvd_discarded_by_switch, Pkts_sent_discarded_by_switch | errors/s | +| ib.hwerrors | Duplicated_packets, Pkt_Seq_Num_gap, Ack_timer_expired, Drop_missing_buffer, Drop_out_of_sequence, NAK_sequence_rcvd, CQE_err_Req, CQE_err_Resp, CQE_Flushed_err_Req, CQE_Flushed_err_Resp, Remote_access_err_Req, Remote_access_err_Resp, Remote_invalid_req, Local_length_err_Resp, RNR_NAK_Packets, CNP_Pkts_ignored, RoCE_ICRC_Errors | errors/s | +| ib.hwpackets | RoCEv2_Congestion_sent, RoCEv2_Congestion_rcvd, IB_Congestion_handled, ATOMIC_req_rcvd, Connection_req_rcvd, Read_req_rcvd, Write_req_rcvd, RoCE_retrans_adaptive, RoCE_retrans_timeout, RoCE_slow_restart, RoCE_slow_restart_congestion, RoCE_slow_restart_count | packets/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. 
+#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/inter_process_communication.md b/collectors/proc.plugin/integrations/inter_process_communication.md new file mode 100644 index 000000000..b36b02d3b --- /dev/null +++ b/collectors/proc.plugin/integrations/inter_process_communication.md @@ -0,0 +1,119 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/inter_process_communication.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Inter Process Communication" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/IPC" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Inter Process Communication + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: proc.plugin +Module: ipc + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +IPC stands for Inter-Process Communication. It is a mechanism which allows processes to communicate with each +other and synchronize their actions. + +This collector exposes information about: + +- Message Queues: This allows messages to be exchanged between processes. It's a more flexible method that + allows messages to be placed onto a queue and read at a later time. + +- Shared Memory: This method allows for the fastest form of IPC because processes can exchange data by + reading/writing into shared memory segments. + +- Semaphores: They are used to synchronize the operations performed by independent processes. So, if multiple + processes are trying to access a single shared resource, semaphores can ensure that only one process + accesses the resource at a given time. + + + + +This collector is supported on all platforms. 
+ +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Inter Process Communication instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.ipc_semaphores | semaphores | semaphores | +| system.ipc_semaphore_arrays | arrays | arrays | +| system.message_queue_message | a dimension per queue | messages | +| system.message_queue_bytes | a dimension per queue | bytes | +| system.shared_memory_segments | segments | segments | +| system.shared_memory_bytes | bytes | bytes | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ semaphores_used ](https://github.com/netdata/netdata/blob/master/health/health.d/ipc.conf) | system.ipc_semaphores | IPC semaphore utilization | +| [ semaphore_arrays_used ](https://github.com/netdata/netdata/blob/master/health/health.d/ipc.conf) | system.ipc_semaphore_arrays | IPC semaphore arrays utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/interrupts.md b/collectors/proc.plugin/integrations/interrupts.md new file mode 100644 index 000000000..756324163 --- /dev/null +++ b/collectors/proc.plugin/integrations/interrupts.md @@ -0,0 +1,140 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/interrupts.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Interrupts" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/CPU" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Interrupts + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/interrupts + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitors `/proc/interrupts`, a file organized by CPU and then by the type of interrupt. +The numbers reported are the counts of the interrupts that have occurred of each type. + +An interrupt is a signal to the processor emitted by hardware or software indicating an event that needs +immediate attention. The processor then interrupts its current activities and executes the interrupt handler +to deal with the event. This is part of the way a computer multitasks and handles concurrent processing. + +The types of interrupts include: + +- **I/O interrupts**: These are caused by I/O devices like the keyboard, mouse, printer, etc. For example, when + you type something on the keyboard, an interrupt is triggered so the processor can handle the new input. + +- **Timer interrupts**: These are generated at regular intervals by the system's timer circuit. It's primarily + used to switch the CPU among different tasks. + +- **Software interrupts**: These are generated by a program requiring disk I/O operations, or other system resources. 
+ +- **Hardware interrupts**: These are caused by hardware conditions such as power failure, overheating, etc. + +Monitoring `/proc/interrupts` can be used for: + +- **Performance tuning**: If an interrupt is happening very frequently, it could be a sign that a device is not + configured correctly, or there is a software bug causing unnecessary interrupts. This could lead to system + performance degradation. + +- **System troubleshooting**: If you're seeing a lot of unexpected interrupts, it could be a sign of a hardware problem. + +- **Understanding system behavior**: More generally, keeping an eye on what interrupts are occurring can help you + understand what your system is doing. It can provide insights into the system's interaction with hardware, + drivers, and other parts of the kernel. + + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Interrupts instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.interrupts | a dimension per device | interrupts/s | + +### Per cpu core + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| cpu | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cpu.interrupts | a dimension per device | interrupts/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/ip_virtual_server.md b/collectors/proc.plugin/integrations/ip_virtual_server.md new file mode 100644 index 000000000..22f43544e --- /dev/null +++ b/collectors/proc.plugin/integrations/ip_virtual_server.md @@ -0,0 +1,96 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/ip_virtual_server.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "IP Virtual Server" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# IP Virtual Server + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/ip_vs_stats + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration monitors IP Virtual Server statistics. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. 
+ + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per IP Virtual Server instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipvs.sockets | connections | connections/s | +| ipvs.packets | received, sent | packets/s | +| ipvs.net | received, sent | kilobits/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/ipv6_socket_statistics.md b/collectors/proc.plugin/integrations/ipv6_socket_statistics.md new file mode 100644 index 000000000..bf0fbaa00 --- /dev/null +++ b/collectors/proc.plugin/integrations/ipv6_socket_statistics.md @@ -0,0 +1,98 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/ipv6_socket_statistics.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "IPv6 Socket Statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# IPv6 Socket Statistics + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/sockstat6 + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides IPv6 socket statistics. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per IPv6 Socket Statistics instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipv6.sockstat6_tcp_sockets | inuse | sockets | +| ipv6.sockstat6_udp_sockets | inuse | sockets | +| ipv6.sockstat6_udplite_sockets | inuse | sockets | +| ipv6.sockstat6_raw_sockets | inuse | sockets | +| ipv6.sockstat6_frag_sockets | inuse | fragments | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/kernel_same-page_merging.md b/collectors/proc.plugin/integrations/kernel_same-page_merging.md new file mode 100644 index 000000000..bed7891bd --- /dev/null +++ b/collectors/proc.plugin/integrations/kernel_same-page_merging.md @@ -0,0 +1,102 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/kernel_same-page_merging.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Kernel Same-Page Merging" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Memory" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Kernel Same-Page Merging + + +<img src="https://netdata.cloud/img/microchip.svg" width="150"/> + + +Plugin: proc.plugin +Module: /sys/kernel/mm/ksm + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Kernel Samepage Merging (KSM) is a memory-saving feature in Linux that enables the kernel to examine the +memory of different processes and identify identical pages. It then merges these identical pages into a +single page that the processes share. 
This is particularly useful for virtualization, where multiple virtual +machines might be running the same operating system or applications and have many identical pages. + +The collector provides information about the operation and effectiveness of KSM on your system. + + + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Kernel Same-Page Merging instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.ksm | shared, unshared, sharing, volatile | MiB | +| mem.ksm_savings | savings, offered | MiB | +| mem.ksm_ratios | savings | percentage | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
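The `mem.ksm` and `mem.ksm_savings` values are derived from page counters exposed under `/sys/kernel/mm/ksm`. The following is a minimal sketch of that derivation, not the collector's actual code (proc.plugin is written in C). The function names, the 4096-byte default page size, and the definition of "offered" as the sum of the four counters are assumptions made for illustration.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")
COUNTERS = ("pages_shared", "pages_sharing", "pages_unshared", "pages_volatile")


def read_ksm_pages(ksm_dir=KSM_DIR):
    """Return the raw KSM page counters as a dict of ints."""
    return {name: int((Path(ksm_dir) / name).read_text()) for name in COUNTERS}


def ksm_summary(pages, page_size=4096):
    """Convert page counts to MiB and estimate a savings percentage.

    Assumption: 'offered' is the sum of all four counters and 'saved'
    equals pages_sharing (the kernel documents pages_sharing as the
    number of additional sites sharing a merged page).
    """
    mib = {name: count * page_size / (1024 * 1024) for name, count in pages.items()}
    offered = sum(pages.values())
    saved = pages["pages_sharing"]
    ratio = 100.0 * saved / offered if offered else 0.0
    return {"mib": mib, "saved_mib": mib["pages_sharing"], "savings_percent": ratio}
```

On a Linux host with KSM enabled, `ksm_summary(read_ksm_pages())` would return the current values; on other systems the directory may not exist.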
+ + diff --git a/collectors/proc.plugin/integrations/md_raid.md b/collectors/proc.plugin/integrations/md_raid.md new file mode 100644 index 000000000..ef78b8269 --- /dev/null +++ b/collectors/proc.plugin/integrations/md_raid.md @@ -0,0 +1,124 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/md_raid.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "MD RAID" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Disk" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# MD RAID + + +<img src="https://netdata.cloud/img/hard-drive.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/mdstat + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration monitors the status of MD RAID devices. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per MD RAID instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| md.health | a dimension per md array | failed disks | + +### Per md array + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| device | TBD | +| raid_level | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| md.disks | inuse, down | disks | +| md.mismatch_cnt | count | unsynchronized blocks | +| md.status | check, resync, recovery, reshape | percent | +| md.expected_time_until_operation_finish | finish_in | seconds | +| md.operation_speed | speed | KiB/s | +| md.nonredundant | available | boolean | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ mdstat_last_collected ](https://github.com/netdata/netdata/blob/master/health/health.d/mdstat.conf) | md.disks | number of seconds since the last successful data collection | +| [ mdstat_disks ](https://github.com/netdata/netdata/blob/master/health/health.d/mdstat.conf) | md.disks | number of devices in the down state for the ${label:device} ${label:raid_level} array. Any number > 0 indicates that the array is degraded. | +| [ mdstat_mismatch_cnt ](https://github.com/netdata/netdata/blob/master/health/health.d/mdstat.conf) | md.mismatch_cnt | number of unsynchronized blocks for the ${label:device} ${label:raid_level} array | +| [ mdstat_nonredundant_last_collected ](https://github.com/netdata/netdata/blob/master/health/health.d/mdstat.conf) | md.nonredundant | number of seconds since the last successful data collection | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
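To illustrate where the `md.disks` dimensions come from, the sketch below parses the text of `/proc/mdstat` and extracts, for each active array, the RAID level and the in-use and down device counts. It is a simplified illustration, not the collector's parser: the function name and returned structure are assumptions, and inactive arrays and operation progress lines are ignored.

```python
import re

# Matches "md0 : active raid1 ..." and "md0 : active (auto-read-only) raid1 ..."
ARRAY_RE = re.compile(r"^(md\S*) : active (?:\([^)]*\) )?(\S+)")
# "[2/1]" means 2 devices expected, 1 currently in use
DISKS_RE = re.compile(r"\[(\d+)/(\d+)\]")


def parse_mdstat(text):
    """Return {array: {"raid_level", "inuse", "down"}} for active arrays."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {"raid_level": header.group(2), "inuse": None, "down": None}
            continue
        if not line.strip():
            # A blank line ends the block that describes one array.
            current = None
            continue
        disks = DISKS_RE.search(line)
        if current and disks and arrays[current]["inuse"] is None:
            total, inuse = int(disks.group(1)), int(disks.group(2))
            arrays[current].update(inuse=inuse, down=total - inuse)
    return arrays
```

Any array with `down` greater than 0 corresponds to the degraded state that the `mdstat_disks` alert describes.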
+ + diff --git a/collectors/proc.plugin/integrations/memory_modules_dimms.md b/collectors/proc.plugin/integrations/memory_modules_dimms.md new file mode 100644 index 000000000..dc59fe5fc --- /dev/null +++ b/collectors/proc.plugin/integrations/memory_modules_dimms.md @@ -0,0 +1,145 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/memory_modules_dimms.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Memory modules (DIMMs)" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Memory" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Memory modules (DIMMs) + + +<img src="https://netdata.cloud/img/microchip.svg" width="150"/> + + +Plugin: proc.plugin +Module: /sys/devices/system/edac/mc + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +The Error Detection and Correction (EDAC) subsystem detects and reports errors in the system's memory, +primarily ECC (Error-Correcting Code) memory errors. + +The collector provides data for: + +- Per memory controller (MC): correctable and uncorrectable errors. These can be of 2 kinds: + - errors related to a DIMM + - errors that cannot be associated with a DIMM + +- Per memory DIMM: correctable and uncorrectable errors. There are 2 kinds: + - memory controllers that can identify the physical DIMMs and report errors directly for them, + - memory controllers that report errors for memory address ranges that can be linked to DIMMs. + In this case more DIMMs may be reported than are physically installed. + + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+ + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per memory controller + +These metrics refer to the memory controller. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| controller | [mcX](https://www.kernel.org/doc/html/v5.0/admin-guide/ras.html#mcx-directories) directory name of this memory controller. | +| mc_name | Memory controller type. | +| size_mb | The amount of memory in megabytes that this memory controller manages. | +| max_location | Last available memory slot in this memory controller. | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.edac_mc | correctable, uncorrectable, correctable_noinfo, uncorrectable_noinfo | errors/s | + +### Per memory module + +These metrics refer to the memory module (or rank, [depends on the memory controller](https://www.kernel.org/doc/html/v5.0/admin-guide/ras.html#f5)). + +Labels: + +| Label | Description | +|:-----------|:----------------| +| controller | [mcX](https://www.kernel.org/doc/html/v5.0/admin-guide/ras.html#mcx-directories) directory name of this memory controller. | +| dimm | [dimmX or rankX](https://www.kernel.org/doc/html/v5.0/admin-guide/ras.html#dimmx-or-rankx-directories) directory name of this memory module. | +| dimm_dev_type | Type of DRAM device used in this memory module. For example, x1, x2, x4, x8. | +| dimm_edac_mode | Used type of error detection and correction. For example, S4ECD4ED would mean a Chipkill with x4 DRAM. 
| +| dimm_label | Label assigned to this memory module. | +| dimm_location | Location of the memory module. | +| dimm_mem_type | Type of the memory module. | +| size | The amount of memory in megabytes that this memory module manages. | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.edac_mc_dimm | correctable, uncorrectable | errors/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ ecc_memory_mc_noinfo_correctable ](https://github.com/netdata/netdata/blob/master/health/health.d/memory.conf) | mem.edac_mc | memory controller ${label:controller} ECC correctable errors (unknown DIMM slot) in the last 10 minutes | +| [ ecc_memory_mc_noinfo_uncorrectable ](https://github.com/netdata/netdata/blob/master/health/health.d/memory.conf) | mem.edac_mc | memory controller ${label:controller} ECC uncorrectable errors (unknown DIMM slot) in the last 10 minutes | +| [ ecc_memory_dimm_correctable ](https://github.com/netdata/netdata/blob/master/health/health.d/memory.conf) | mem.edac_mc_dimm | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC correctable errors in the last 10 minutes | +| [ ecc_memory_dimm_uncorrectable ](https://github.com/netdata/netdata/blob/master/health/health.d/memory.conf) | mem.edac_mc_dimm | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC uncorrectable errors in the last 10 minutes | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples.
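The per-controller and per-module error counters described above are plain text files in the EDAC sysfs tree. As a rough sketch of how they could be read (this is not the collector's C implementation), the code below walks the `mcX` directories and their `dimmX` or `rankX` children. The file names follow the kernel's EDAC sysfs documentation; the function names and the returned dictionary layout are assumptions.

```python
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def _read_int(path):
    """Return the integer stored in a sysfs file, or None if it is unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_edac_counts(root=EDAC_ROOT):
    """Collect raw error counters per memory controller and per DIMM/rank."""
    result = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        controller = {
            "correctable": _read_int(mc / "ce_count"),
            "uncorrectable": _read_int(mc / "ue_count"),
            "correctable_noinfo": _read_int(mc / "ce_noinfo_count"),
            "uncorrectable_noinfo": _read_int(mc / "ue_noinfo_count"),
            "dimms": {},
        }
        # Depending on the driver, modules appear as dimmX or rankX directories.
        modules = list(mc.glob("dimm[0-9]*")) + list(mc.glob("rank[0-9]*"))
        for module in sorted(modules):
            controller["dimms"][module.name] = {
                "correctable": _read_int(module / "dimm_ce_count"),
                "uncorrectable": _read_int(module / "dimm_ue_count"),
            }
        result[mc.name] = controller
    return result
```

These are cumulative counters; the `errors/s` unit in the tables implies that a rate is computed from successive samples, which this sketch leaves out.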
+ + diff --git a/collectors/proc.plugin/integrations/memory_statistics.md b/collectors/proc.plugin/integrations/memory_statistics.md new file mode 100644 index 000000000..712b4b5e8 --- /dev/null +++ b/collectors/proc.plugin/integrations/memory_statistics.md @@ -0,0 +1,137 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/memory_statistics.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Memory Statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Memory" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Memory Statistics + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/vmstat + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Linux Virtual memory subsystem. + +Information about memory management, indicating how effectively the kernel allocates and frees +memory resources in response to system demands. + +Monitors page faults, which occur when a process requests a portion of its memory that isn't +immediately available. Monitoring these events can help diagnose inefficiencies in memory management and +provide insights into application behavior. + +Tracks swapping activity — a vital aspect of memory management where the kernel moves data from RAM to +swap space, and vice versa, based on memory demand and usage. It also monitors the utilization of zswap, +a compressed cache for swap pages, and provides insights into its usage and performance implications. + +In the context of virtualized environments, it tracks the ballooning mechanism which is used to balance +memory resources between host and guest systems. 
+ +For systems using NUMA architecture, it provides insights into the local and remote memory accesses, which +can impact the performance based on the memory access times. + +The collector also watches for 'Out of Memory' kills, a drastic measure taken by the system when it runs out +of memory resources. + + + + +This collector is only supported on the following platforms: + +- linux + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Memory Statistics instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.swapio | in, out | KiB/s | +| system.pgpgio | in, out | KiB/s | +| system.pgfaults | minor, major | faults/s | +| mem.balloon | inflate, deflate, migrate | KiB/s | +| mem.zswapio | in, out | KiB/s | +| mem.ksm_cow | swapin, write | KiB/s | +| mem.thp_faults | alloc, fallback, fallback_charge | events/s | +| mem.thp_file | alloc, fallback, mapped, fallback_charge | events/s | +| mem.thp_zero | alloc, failed | events/s | +| mem.thp_collapse | alloc, failed | events/s | +| mem.thp_split | split, failed, split_pmd, split_deferred | events/s | +| mem.thp_swapout | swapout, fallback | events/s | +| mem.thp_compact | success, fail, stall | events/s | +| mem.oom_kill | kills | kills/s | +| mem.numa | local, foreign, interleave, other, pte_updates, huge_pte_updates, hint_faults, hint_faults_local, pages_migrated | events/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 30min_ram_swapped_out ](https://github.com/netdata/netdata/blob/master/health/health.d/swap.conf) | mem.swapio | percentage of the system RAM swapped in the last 30 minutes | +| [ oom_kill ](https://github.com/netdata/netdata/blob/master/health/health.d/ram.conf) | mem.oom_kill | number of out of memory kills in the last 30 minutes | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
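Most of the metrics above are rates computed from the cumulative counters in `/proc/vmstat`, which lists one `name value` pair per line. The sketch below shows the general idea under simplifying assumptions: the function names are hypothetical, counters are assumed not to wrap or reset between samples, and no page-to-KiB conversion is applied.

```python
def parse_vmstat(text):
    """Parse '/proc/vmstat'-style text into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def counter_rates(previous, current, interval_seconds):
    """Return per-second rates for counters present in both samples."""
    return {
        name: (current[name] - previous[name]) / interval_seconds
        for name in current
        if name in previous
    }
```

For example, sampling the file twice one second apart and passing both parsed samples to `counter_rates` gives values comparable to the `kills/s` or `faults/s` units in the table.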
+ + diff --git a/collectors/proc.plugin/integrations/memory_usage.md b/collectors/proc.plugin/integrations/memory_usage.md new file mode 100644 index 000000000..0eef72b12 --- /dev/null +++ b/collectors/proc.plugin/integrations/memory_usage.md @@ -0,0 +1,134 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/memory_usage.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Memory Usage" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Memory" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Memory Usage + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/meminfo + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +`/proc/meminfo` provides detailed information about the system's current memory usage. It includes information +about different types of memory, RAM, Swap, ZSwap, HugePages, Transparent HugePages (THP), Kernel memory, +SLAB memory, memory mappings, and more. + +Monitoring /proc/meminfo can be useful for: + +- **Performance Tuning**: Understanding your system's memory usage can help you make decisions about system + tuning and optimization. For example, if your system is frequently low on free memory, it might benefit + from more RAM. + +- **Troubleshooting**: If your system is experiencing problems, `/proc/meminfo` can provide clues about + whether memory usage is a factor. For example, if your system is slow and cached swap is high, it could + mean that your system is swapping out a lot of memory to disk, which can degrade performance. + +- **Capacity Planning**: By monitoring memory usage over time, you can understand trends and make informed + decisions about future capacity needs. 
+ + + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Memory Usage instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.ram | free, used, cached, buffers | MiB | +| mem.available | avail | MiB | +| mem.swap | free, used | MiB | +| mem.swap_cached | cached | MiB | +| mem.zswap | in-ram, on-disk | MiB | +| mem.hwcorrupt | HardwareCorrupted | MiB | +| mem.committed | Committed_AS | MiB | +| mem.writeback | Dirty, Writeback, FuseWriteback, NfsWriteback, Bounce | MiB | +| mem.kernel | Slab, KernelStack, PageTables, VmallocUsed, Percpu | MiB | +| mem.slab | reclaimable, unreclaimable | MiB | +| mem.hugepages | free, used, surplus, reserved | MiB | +| mem.thp | anonymous, shmem | MiB | +| mem.thp_details | ShmemPmdMapped, FileHugePages, FilePmdMapped | MiB | +| mem.reclaiming | Active, Inactive, Active(anon), Inactive(anon), Active(file), Inactive(file), Unevictable, Mlocked | MiB | +| mem.high_low | high_used, low_used, high_free, low_free | MiB | +| mem.cma | used, free | MiB | +| mem.directmaps | 4k, 2m, 4m, 1g | MiB | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ ram_in_use ](https://github.com/netdata/netdata/blob/master/health/health.d/ram.conf) | system.ram | system
memory utilization | +| [ ram_available ](https://github.com/netdata/netdata/blob/master/health/health.d/ram.conf) | mem.available | percentage of estimated amount of RAM available for userspace processes, without causing swapping | +| [ used_swap ](https://github.com/netdata/netdata/blob/master/health/health.d/swap.conf) | mem.swap | swap memory utilization | +| [ 1hour_memory_hw_corrupted ](https://github.com/netdata/netdata/blob/master/health/health.d/memory.conf) | mem.hwcorrupt | amount of memory corrupted due to a hardware failure | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/network_interfaces.md b/collectors/proc.plugin/integrations/network_interfaces.md new file mode 100644 index 000000000..0d26b5b66 --- /dev/null +++ b/collectors/proc.plugin/integrations/network_interfaces.md @@ -0,0 +1,136 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/network_interfaces.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Network interfaces" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Network interfaces + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/dev + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor network interface metrics about bandwidth, state, errors and more. + + + +This collector is supported on all platforms. 
+ +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Network interfaces instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.net | received, sent | kilobits/s | + +### Per network device + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| interface_type | TBD | +| device | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| net.net | received, sent | kilobits/s | +| net.speed | speed | kilobits/s | +| net.duplex | full, half, unknown | state | +| net.operstate | up, down, notpresent, lowerlayerdown, testing, dormant, unknown | state | +| net.carrier | up, down | state | +| net.mtu | mtu | octets | +| net.packets | received, sent, multicast | packets/s | +| net.errors | inbound, outbound | errors/s | +| net.drops | inbound, outbound | drops/s | +| net.fifo | receive, transmit | errors | +| net.compressed | received, sent | packets/s | +| net.events | frames, collisions, carrier | events/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ interface_speed ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.net | network interface ${label:device} current speed | +| [ 1m_received_traffic_overflow 
](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.net | average inbound utilization for the network interface ${label:device} over the last minute | +| [ 1m_sent_traffic_overflow ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.net | average outbound utilization for the network interface ${label:device} over the last minute | +| [ inbound_packets_dropped_ratio ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.drops | ratio of inbound dropped packets for the network interface ${label:device} over the last 10 minutes | +| [ outbound_packets_dropped_ratio ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.drops | ratio of outbound dropped packets for the network interface ${label:device} over the last 10 minutes | +| [ wifi_inbound_packets_dropped_ratio ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.drops | ratio of inbound dropped packets for the network interface ${label:device} over the last 10 minutes | +| [ wifi_outbound_packets_dropped_ratio ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.drops | ratio of outbound dropped packets for the network interface ${label:device} over the last 10 minutes | +| [ 1m_received_packets_rate ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.packets | average number of packets received by the network interface ${label:device} over the last minute | +| [ 10s_received_packets_storm ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.packets | ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute | +| [ 10min_fifo_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/net.conf) | net.fifo | number of FIFO errors for the network interface ${label:device} in the last 10 
minutes | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/network_statistics.md b/collectors/proc.plugin/integrations/network_statistics.md new file mode 100644 index 000000000..f43da8339 --- /dev/null +++ b/collectors/proc.plugin/integrations/network_statistics.md @@ -0,0 +1,160 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/network_statistics.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Network statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Network statistics + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/netstat + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides metrics from the `netstat`, `snmp` and `snmp6` modules. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels. + + + +### Per Network statistics instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.ip | received, sent | kilobits/s | +| ip.tcpmemorypressures | pressures | events/s | +| ip.tcpconnaborts | baddata, userclosed, nomemory, timeout, linger, failed | connections/s | +| ip.tcpreorders | timestamp, sack, fack, reno | packets/s | +| ip.tcpofo | inqueue, dropped, merged, pruned | packets/s | +| ip.tcpsyncookies | received, sent, failed | packets/s | +| ip.tcp_syn_queue | drops, cookies | packets/s | +| ip.tcp_accept_queue | overflows, drops | packets/s | +| ip.tcpsock | connections | active connections | +| ip.tcppackets | received, sent | packets/s | +| ip.tcperrors | InErrs, InCsumErrors, RetransSegs | packets/s | +| ip.tcpopens | active, passive | connections/s | +| ip.tcphandshake | EstabResets, OutRsts, AttemptFails, SynRetrans | events/s | +| ipv4.packets | received, sent, forwarded, delivered | packets/s | +| ipv4.errors | InDiscards, OutDiscards, InNoRoutes, OutNoRoutes, InHdrErrors, InAddrErrors, InTruncatedPkts, InCsumErrors | packets/s | +| ipv4.bcast | received, sent | kilobits/s | +| ipv4.bcastpkts | received, sent | packets/s | +| ipv4.mcast | received, sent | kilobits/s | +| ipv4.mcastpkts | received, sent | packets/s | +| ipv4.icmp | received, sent | packets/s | +| ipv4.icmpmsg | InEchoReps, OutEchoReps, InDestUnreachs, OutDestUnreachs, InRedirects, OutRedirects, InEchos, OutEchos, InRouterAdvert, OutRouterAdvert, InRouterSelect, OutRouterSelect, InTimeExcds, OutTimeExcds, InParmProbs, OutParmProbs, InTimestamps, OutTimestamps, InTimestampReps, OutTimestampReps | packets/s | +| ipv4.icmp_errors | InErrors, OutErrors, InCsumErrors | packets/s | +| ipv4.udppackets | received, sent | packets/s | +| ipv4.udperrors | RcvbufErrors, SndbufErrors, InErrors, NoPorts, InCsumErrors, IgnoredMulti | events/s | +| ipv4.udplite | received, 
sent | packets/s | +| ipv4.udplite_errors | RcvbufErrors, SndbufErrors, InErrors, NoPorts, InCsumErrors, IgnoredMulti | packets/s | +| ipv4.ecnpkts | CEP, NoECTP, ECTP0, ECTP1 | packets/s | +| ipv4.fragsin | ok, failed, all | packets/s | +| ipv4.fragsout | ok, failed, created | packets/s | +| system.ipv6 | received, sent | kilobits/s | +| ipv6.packets | received, sent, forwarded, delivers | packets/s | +| ipv6.errors | InDiscards, OutDiscards, InHdrErrors, InAddrErrors, InUnknownProtos, InTooBigErrors, InTruncatedPkts, InNoRoutes, OutNoRoutes | packets/s | +| ipv6.bcast | received, sent | kilobits/s | +| ipv6.mcast | received, sent | kilobits/s | +| ipv6.mcastpkts | received, sent | packets/s | +| ipv6.udppackets | received, sent | packets/s | +| ipv6.udperrors | RcvbufErrors, SndbufErrors, InErrors, NoPorts, InCsumErrors, IgnoredMulti | events/s | +| ipv6.udplitepackets | received, sent | packets/s | +| ipv6.udpliteerrors | RcvbufErrors, SndbufErrors, InErrors, NoPorts, InCsumErrors | events/s | +| ipv6.icmp | received, sent | messages/s | +| ipv6.icmpredir | received, sent | redirects/s | +| ipv6.icmperrors | InErrors, OutErrors, InCsumErrors, InDestUnreachs, InPktTooBigs, InTimeExcds, InParmProblems, OutDestUnreachs, OutPktTooBigs, OutTimeExcds, OutParmProblems | errors/s | +| ipv6.icmpechos | InEchos, OutEchos, InEchoReplies, OutEchoReplies | messages/s | +| ipv6.groupmemb | InQueries, OutQueries, InResponses, OutResponses, InReductions, OutReductions | messages/s | +| ipv6.icmprouter | InSolicits, OutSolicits, InAdvertisements, OutAdvertisements | messages/s | +| ipv6.icmpneighbor | InSolicits, OutSolicits, InAdvertisements, OutAdvertisements | messages/s | +| ipv6.icmpmldv2 | received, sent | reports/s | +| ipv6.icmptypes | InType1, InType128, InType129, InType136, OutType1, OutType128, OutType129, OutType133, OutType135, OutType143 | messages/s | +| ipv6.ect | InNoECTPkts, InECT1Pkts, InECT0Pkts, InCEPkts | packets/s | +| ipv6.ect | InNoECTPkts, InECT1Pkts, 
InECT0Pkts, InCEPkts | packets/s | +| ipv6.fragsin | ok, failed, timeout, all | packets/s | +| ipv6.fragsout | ok, failed, all | packets/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 1m_tcp_syn_queue_drops ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_listen.conf) | ip.tcp_syn_queue | average number of SYN requests was dropped due to the full TCP SYN queue over the last minute (SYN cookies were not enabled) | +| [ 1m_tcp_syn_queue_cookies ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_listen.conf) | ip.tcp_syn_queue | average number of sent SYN cookies due to the full TCP SYN queue over the last minute | +| [ 1m_tcp_accept_queue_overflows ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_listen.conf) | ip.tcp_accept_queue | average number of overflows in the TCP accept queue over the last minute | +| [ 1m_tcp_accept_queue_drops ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_listen.conf) | ip.tcp_accept_queue | average number of dropped packets in the TCP accept queue over the last minute | +| [ tcp_connections ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_conn.conf) | ip.tcpsock | TCP connections utilization | +| [ 1m_ip_tcp_resets_sent ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf) | ip.tcphandshake | average number of sent TCP RESETS over the last minute | +| [ 10s_ip_tcp_resets_sent ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf) | ip.tcphandshake | average number of sent TCP RESETS over the last 10 seconds. This can indicate a port scan, or that a service running on this host has crashed. Netdata will not send a clear notification for this alarm. 
| +| [ 1m_ip_tcp_resets_received ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf) | ip.tcphandshake | average number of received TCP RESETS over the last minute | +| [ 10s_ip_tcp_resets_received ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf) | ip.tcphandshake | average number of received TCP RESETS over the last 10 seconds. This can be an indication that a service this host needs has crashed. Netdata will not send a clear notification for this alarm. | +| [ 1m_ipv4_udp_receive_buffer_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/udp_errors.conf) | ipv4.udperrors | average number of UDP receive buffer errors over the last minute | +| [ 1m_ipv4_udp_send_buffer_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/udp_errors.conf) | ipv4.udperrors | average number of UDP send buffer errors over the last minute | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/nfs_client.md b/collectors/proc.plugin/integrations/nfs_client.md new file mode 100644 index 000000000..696e0c0d6 --- /dev/null +++ b/collectors/proc.plugin/integrations/nfs_client.md @@ -0,0 +1,98 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/nfs_client.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "NFS Client" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Filesystem/NFS" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# NFS Client + + +<img src="https://netdata.cloud/img/nfs.png" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/rpc/nfs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides statistics from the Linux kernel's NFS Client. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per NFS Client instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| nfs.net | udp, tcp | operations/s | +| nfs.rpc | calls, retransmits, auth_refresh | calls/s | +| nfs.proc2 | a dimension per proc2 call | calls/s | +| nfs.proc3 | a dimension per proc3 call | calls/s | +| nfs.proc4 | a dimension per proc4 call | calls/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/nfs_server.md b/collectors/proc.plugin/integrations/nfs_server.md new file mode 100644 index 000000000..ddbf03f90 --- /dev/null +++ b/collectors/proc.plugin/integrations/nfs_server.md @@ -0,0 +1,103 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/nfs_server.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "NFS Server" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Filesystem/NFS" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# NFS Server + + +<img src="https://netdata.cloud/img/nfs.png" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/rpc/nfsd + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides statistics from the Linux kernel's NFS Server. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. 
+ +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per NFS Server instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| nfsd.readcache | hits, misses, nocache | reads/s | +| nfsd.filehandles | stale | handles/s | +| nfsd.io | read, write | kilobytes/s | +| nfsd.threads | threads | threads | +| nfsd.net | udp, tcp | packets/s | +| nfsd.rpc | calls, bad_format, bad_auth | calls/s | +| nfsd.proc2 | a dimension per proc2 call | calls/s | +| nfsd.proc3 | a dimension per proc3 call | calls/s | +| nfsd.proc4 | a dimension per proc4 call | calls/s | +| nfsd.proc4ops | a dimension per proc4 operation | operations/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/non-uniform_memory_access.md b/collectors/proc.plugin/integrations/non-uniform_memory_access.md new file mode 100644 index 000000000..58b96a3e7 --- /dev/null +++ b/collectors/proc.plugin/integrations/non-uniform_memory_access.md @@ -0,0 +1,110 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/non-uniform_memory_access.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Non-Uniform Memory Access" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Memory" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Non-Uniform Memory Access + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /sys/devices/system/node + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Information about NUMA (Non-Uniform Memory Access) nodes on the system. + +NUMA is a method of configuring a cluster of microprocessors in a multiprocessing system so that they can +share memory locally, improving performance and the ability of the system to be expanded. NUMA is used in a +symmetric multiprocessing (SMP) system. + +In a NUMA system, processors, memory, and I/O devices are grouped together into cells, also known as nodes. +Each node has its own memory and set of I/O devices, and one or more processors. While a processor can access +memory in any of the nodes, it does so faster when accessing memory within its own node. + +The collector provides statistics on memory allocations for processes running on the NUMA nodes, revealing the +efficiency of memory allocations in multi-node systems. + + + + +This collector is supported on all platforms. 
+ +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per numa node + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| numa_node | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.numa_nodes | hit, miss, local, foreign, interleave, other | events/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
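As an illustration of the NUMA counters described in the overview above, the following is a minimal sketch of parsing a per-node `numastat` file, assuming the usual `name value` layout found under `/sys/devices/system/node/nodeN/numastat`. The function name `parse_numastat`, the sample values, and the mapping from kernel field names to the `mem.numa_nodes` dimension names are assumptions made for this example; it is not the collector's actual implementation.

```python
# Hypothetical mapping from kernel numastat fields to the dimension names
# listed for mem.numa_nodes; the pairing is an assumption for illustration.
FIELD_TO_DIMENSION = {
    "numa_hit": "hit",
    "numa_miss": "miss",
    "local_node": "local",
    "numa_foreign": "foreign",
    "interleave_hit": "interleave",
    "other_node": "other",
}


def parse_numastat(text):
    """Parse 'name value' lines into a dict keyed by dimension name."""
    counters = {}
    for line in text.strip().splitlines():
        name, value = line.split()
        if name in FIELD_TO_DIMENSION:
            counters[FIELD_TO_DIMENSION[name]] = int(value)
    return counters


# Made-up sample content; real values are cumulative counters since boot.
sample_numastat = (
    "numa_hit 12345\n"
    "numa_miss 10\n"
    "numa_foreign 5\n"
    "interleave_hit 100\n"
    "local_node 12000\n"
    "other_node 355\n"
)

numa_counters = parse_numastat(sample_numastat)
print(numa_counters)
```

Because the kernel counters only ever increase, a rate such as `events/s` would be derived from the difference between two successive reads divided by the sampling interval.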
+ + diff --git a/collectors/proc.plugin/integrations/page_types.md b/collectors/proc.plugin/integrations/page_types.md new file mode 100644 index 000000000..7f84182de --- /dev/null +++ b/collectors/proc.plugin/integrations/page_types.md @@ -0,0 +1,112 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/page_types.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Page types" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Memory" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Page types + + +<img src="https://netdata.cloud/img/microchip.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/pagetypeinfo + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides metrics about the system's memory page types + + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Page types instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.pagetype_global | a dimension per pagesize | B | + +### Per node, zone, type + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| node_id | TBD | +| node_zone | TBD | +| node_type | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.pagetype | a dimension per pagesize | B | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/power_supply.md b/collectors/proc.plugin/integrations/power_supply.md new file mode 100644 index 000000000..4980f845b --- /dev/null +++ b/collectors/proc.plugin/integrations/power_supply.md @@ -0,0 +1,106 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/power_supply.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Power Supply" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Power Supply" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Power Supply + + +<img src="https://netdata.cloud/img/powersupply.svg" width="150"/> + + +Plugin: proc.plugin +Module: /sys/class/power_supply + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration monitors Power supply metrics, such as battery status, AC power status and more. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. 
+ + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per power device + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| device | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| powersupply.capacity | capacity | percentage | +| powersupply.charge | empty_design, empty, now, full, full_design | Ah | +| powersupply.energy | empty_design, empty, now, full, full_design | Wh | +| powersupply.voltage | min_design, min, now, max, max_design | V | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ linux_power_supply_capacity ](https://github.com/netdata/netdata/blob/master/health/health.d/linux_power_supply.conf) | powersupply.capacity | percentage of remaining power supply capacity | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/pressure_stall_information.md b/collectors/proc.plugin/integrations/pressure_stall_information.md new file mode 100644 index 000000000..e590a8d38 --- /dev/null +++ b/collectors/proc.plugin/integrations/pressure_stall_information.md @@ -0,0 +1,128 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/pressure_stall_information.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Pressure Stall Information" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Pressure" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Pressure Stall Information + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/pressure + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Introduced in Linux kernel 4.20, `/proc/pressure` exposes pressure stall information +(PSI). PSI is a feature that tracks the amount of time tasks are stalled due to +resource contention, such as CPU, memory, or I/O. + +The collector monitors a separate file for each resource: + +- **cpu**: Tracks the amount of time tasks are stalled due to CPU contention. +- **memory**: Tracks the amount of time tasks are stalled due to memory contention. +- **io**: Tracks the amount of time tasks are stalled due to I/O contention. +- **irq**: Tracks the amount of time tasks are stalled due to IRQ contention. + +Each of them provides metrics for stall time over the last 10 seconds, 1 minute, and 5 minutes. 
+ +Monitoring the /proc/pressure files can provide important insights into system performance and capacity planning: + +- **Identifying resource contention**: If these metrics are consistently high, it indicates that tasks are + frequently being stalled due to lack of resources, which can significantly degrade system performance. + +- **Troubleshooting performance issues**: If a system is experiencing performance issues, these metrics can + help identify whether resource contention is the cause. + +- **Capacity planning**: By monitoring these metrics over time, you can understand trends in resource + utilization and make informed decisions about when to add more resources to your system. + + + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Pressure Stall Information instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.cpu_some_pressure | some10, some60, some300 | percentage | +| system.cpu_some_pressure_stall_time | time | ms | +| system.cpu_full_pressure | some10, some60, some300 | percentage | +| system.cpu_full_pressure_stall_time | time | ms | +| system.memory_some_pressure | some10, some60, some300 | percentage | +| system.memory_some_pressure_stall_time | time | ms | +| system.memory_full_pressure | some10, some60, some300 | percentage | +| system.memory_full_pressure_stall_time | time | ms | +| system.io_some_pressure | some10, some60, some300 | percentage | +| system.io_some_pressure_stall_time | time | ms | +| system.io_full_pressure | some10, some60, some300 | percentage | +| system.io_full_pressure_stall_time | time | ms | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
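To make the PSI file format described in the overview above more concrete, here is a minimal sketch of parsing the contents of one `/proc/pressure/<resource>` file, assuming the kernel's `some`/`full` line layout with `avg10`, `avg60`, `avg300`, and `total` fields. The function name `parse_pressure`, the output key names, and the sample values are assumptions made for this example, not the collector's code.

```python
def parse_pressure(text):
    """Parse PSI lines such as 'some avg10=0.12 avg60=0.05 avg300=0.01 total=123456'."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is 'some' or 'full'
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = {
            "avg10": float(values["avg10"]),    # % of time stalled, last 10 s
            "avg60": float(values["avg60"]),    # % of time stalled, last 60 s
            "avg300": float(values["avg300"]),  # % of time stalled, last 300 s
            "total_us": int(values["total"]),   # cumulative stall time, microseconds
        }
    return result


# Made-up sample content for illustration.
sample_pressure = (
    "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=7890\n"
)

pressure = parse_pressure(sample_pressure)
print(pressure["some"]["avg10"], pressure["full"]["total_us"])
```

The `avg*` fields are already percentages, while `total` is a cumulative counter, so a stall-time chart in milliseconds would be derived from the difference between successive `total` readings.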
+ + diff --git a/collectors/proc.plugin/integrations/sctp_statistics.md b/collectors/proc.plugin/integrations/sctp_statistics.md new file mode 100644 index 000000000..ad9c26bf5 --- /dev/null +++ b/collectors/proc.plugin/integrations/sctp_statistics.md @@ -0,0 +1,98 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/sctp_statistics.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "SCTP Statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# SCTP Statistics + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/sctp/snmp + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides statistics about the Stream Control Transmission Protocol (SCTP). + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per SCTP Statistics instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| sctp.established | established | associations | +| sctp.transitions | active, passive, aborted, shutdown | transitions/s | +| sctp.packets | received, sent | packets/s | +| sctp.packet_errors | invalid, checksum | packets/s | +| sctp.fragmentation | reassembled, fragmented | packets/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/socket_statistics.md b/collectors/proc.plugin/integrations/socket_statistics.md new file mode 100644 index 000000000..2c59f9883 --- /dev/null +++ b/collectors/proc.plugin/integrations/socket_statistics.md @@ -0,0 +1,108 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/socket_statistics.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Socket statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Socket statistics + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/sockstat + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides socket statistics. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. 
+ +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Socket statistics instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ip.sockstat_sockets | used | sockets | +| ipv4.sockstat_tcp_sockets | alloc, orphan, inuse, timewait | sockets | +| ipv4.sockstat_tcp_mem | mem | KiB | +| ipv4.sockstat_udp_sockets | inuse | sockets | +| ipv4.sockstat_udp_mem | mem | sockets | +| ipv4.sockstat_udplite_sockets | inuse | sockets | +| ipv4.sockstat_raw_sockets | inuse | sockets | +| ipv4.sockstat_frag_sockets | inuse | fragments | +| ipv4.sockstat_frag_mem | mem | KiB | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ tcp_orphans ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_orphans.conf) | ipv4.sockstat_tcp_sockets | orphan IPv4 TCP sockets utilization | +| [ tcp_memory ](https://github.com/netdata/netdata/blob/master/health/health.d/tcp_mem.conf) | ipv4.sockstat_tcp_mem | TCP memory utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/softirq_statistics.md b/collectors/proc.plugin/integrations/softirq_statistics.md new file mode 100644 index 000000000..56cf9ab5c --- /dev/null +++ b/collectors/proc.plugin/integrations/softirq_statistics.md @@ -0,0 +1,132 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/softirq_statistics.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "SoftIRQ statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/CPU" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# SoftIRQ statistics + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/softirqs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +In the Linux kernel, handling of hardware interrupts is split into two halves: the top half and the bottom half. +The top half is the routine that responds immediately to an interrupt, while the bottom half is deferred to be processed later. + +Softirqs are a mechanism in the Linux kernel used to handle the bottom halves of interrupts, which can be +deferred and processed later in a context where it's safe to enable interrupts. + +The actual work of handling the interrupt is offloaded to a softirq and executed later when the system +decides it's a good time to process them. This helps to keep the system responsive by not blocking the top +half for too long, which could lead to missed interrupts. + +Monitoring `/proc/softirqs` is useful for: + +- **Performance tuning**: A high rate of softirqs could indicate a performance issue. For instance, a high + rate of network softirqs (`NET_RX` and `NET_TX`) could indicate a network performance issue. 
+ +- **Troubleshooting**: If a system is behaving unexpectedly, checking the softirqs could provide clues about + what is going on. For example, a sudden increase in block device softirqs (BLOCK) might indicate a problem + with a disk. + +- **Understanding system behavior**: Knowing what types of softirqs are happening can help you understand what + your system is doing, particularly in terms of how it's interacting with hardware and how it's handling + interrupts. + + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per SoftIRQ statistics instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.softirqs | a dimension per softirq | softirqs/s | + +### Per cpu core + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| cpu | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cpu.softirqs | a dimension per softirq | softirqs/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
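As a sketch of how the two scopes above (system-wide and per CPU core) relate to `/proc/softirqs`, the following example parses text in that file's layout: a header row of CPU names followed by one row of counters per softirq type. The function name `parse_softirqs` and the sample values are assumptions made for this example; the real collector is not written this way.

```python
def parse_softirqs(text):
    """Return (totals per softirq, per-CPU counts) from /proc/softirqs-style text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # e.g. ['CPU0', 'CPU1']
    per_cpu = {cpu: {} for cpu in cpus}
    totals = {}
    for line in lines[1:]:
        name, *counts = line.split()
        name = name.rstrip(":")  # 'NET_RX:' -> 'NET_RX'
        counts = [int(count) for count in counts]
        totals[name] = sum(counts)
        for cpu, count in zip(cpus, counts):
            per_cpu[cpu][name] = count
    return totals, per_cpu


# Made-up sample content with two CPUs and three softirq types.
sample_softirqs = (
    "                    CPU0       CPU1\n"
    "          HI:          3          1\n"
    "      NET_RX:       1200        800\n"
    "       BLOCK:         40         60\n"
)

softirq_totals, softirq_per_cpu = parse_softirqs(sample_softirqs)
print(softirq_totals)
```

The values in the file are cumulative counts, so a `softirqs/s` rate would come from sampling twice and dividing the difference by the elapsed time.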
+ + diff --git a/collectors/proc.plugin/integrations/softnet_statistics.md b/collectors/proc.plugin/integrations/softnet_statistics.md new file mode 100644 index 000000000..84ac5ac88 --- /dev/null +++ b/collectors/proc.plugin/integrations/softnet_statistics.md @@ -0,0 +1,134 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/softnet_statistics.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Softnet Statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Softnet Statistics + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/softnet_stat + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +`/proc/net/softnet_stat` provides statistics that relate to the handling of network packets by softirq. + +It provides information about: + +- Total number of processed packets (`processed`). +- Number of packets dropped because the backlog queue was full (`dropped`). +- Times ksoftirq ran out of quota (`squeezed`). +- Times net_rx_action was rescheduled. +- Number of times processed all lists before quota. +- Number of times did not process all lists due to quota. +- Number of times net_rx_action was rescheduled for GRO (Generic Receive Offload) cells. +- Number of times GRO cells were processed. + +Monitoring the /proc/net/softnet_stat file can be useful for: + +- **Network performance monitoring**: By tracking the total number of processed packets and how many packets + were dropped, you can gain insights into your system's network performance. + +- **Troubleshooting**: If you're experiencing network-related issues, this collector can provide valuable clues. + For instance, a high number of dropped packets may indicate a network problem. 
+ +- **Capacity planning**: If your system is consistently processing near its maximum capacity of network + packets, it might be time to consider upgrading your network infrastructure. + + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Softnet Statistics instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.softnet_stat | processed, dropped, squeezed, received_rps, flow_limit_count | events/s | + +### Per cpu core + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cpu.softnet_stat | processed, dropped, squeezed, received_rps, flow_limit_count | events/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 1min_netdev_backlog_exceeded ](https://github.com/netdata/netdata/blob/master/health/health.d/softnet.conf) | system.softnet_stat | average number of dropped packets in the last minute due to exceeded net.core.netdev_max_backlog | +| [ 1min_netdev_budget_ran_outs ](https://github.com/netdata/netdata/blob/master/health/health.d/softnet.conf) | system.softnet_stat | average number of times ksoftirq ran out of sysctl net.core.netdev_budget or net.core.netdev_budget_usecs with work remaining over the last minute (this can be a cause for dropped packets) | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
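To illustrate the `processed`, `dropped`, and `squeezed` dimensions listed above, here is a minimal sketch of reading text in the `/proc/net/softnet_stat` layout, assuming one row per CPU with hexadecimal columns where the first three are the processed, dropped, and time-squeeze counters. The function name `parse_softnet_stat` and the sample rows are assumptions made for this example. Later columns vary between kernel versions, so they are deliberately ignored here.

```python
def parse_softnet_stat(text):
    """Return one dict per CPU row with the first three hexadecimal counters."""
    rows = []
    for line in text.strip().splitlines():
        columns = [int(column, 16) for column in line.split()]
        rows.append(
            {
                "processed": columns[0],  # packets processed
                "dropped": columns[1],    # packets dropped (backlog full)
                "squeezed": columns[2],   # times the softirq budget ran out
            }
        )
    return rows


# Made-up sample content: two CPUs, trailing columns left at zero.
sample_softnet = (
    "00000064 00000000 00000001 00000000 00000000\n"
    "000000c8 00000002 00000000 00000000 00000000\n"
)

softnet_rows = parse_softnet_stat(sample_softnet)
softnet_totals = {
    key: sum(row[key] for row in softnet_rows)
    for key in ("processed", "dropped", "squeezed")
}
print(softnet_totals)
```

Summing across rows corresponds to the system-wide scope, while each individual row corresponds to the per-CPU scope.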
+ + diff --git a/collectors/proc.plugin/integrations/synproxy.md b/collectors/proc.plugin/integrations/synproxy.md new file mode 100644 index 000000000..04169773b --- /dev/null +++ b/collectors/proc.plugin/integrations/synproxy.md @@ -0,0 +1,96 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/synproxy.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Synproxy" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Firewall" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Synproxy + + +<img src="https://netdata.cloud/img/firewall.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/stat/synproxy + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides statistics about the Synproxy netfilter module. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Synproxy instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| netfilter.synproxy_syn_received | received | packets/s | +| netfilter.synproxy_conn_reopened | reopened | connections/s | +| netfilter.synproxy_cookies | valid, invalid, retransmits | cookies/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/system_load_average.md b/collectors/proc.plugin/integrations/system_load_average.md new file mode 100644 index 000000000..caff72737 --- /dev/null +++ b/collectors/proc.plugin/integrations/system_load_average.md @@ -0,0 +1,127 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/system_load_average.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "System Load Average" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/System" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# System Load Average + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/loadavg + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +The `/proc/loadavg` file provides information about the system load average. + +The load average is a measure of the amount of computational work that a system performs. It is a +representation of the average system load over a period of time. + +This file contains three numbers representing the system load averages for the last 1, 5, and 15 minutes, +respectively. 
It also includes the currently running processes and the total number of processes. + +Monitoring the load average can be used for: + +- **System performance**: If the load average is too high, it may indicate that your system is overloaded. + On a system with a single CPU, if the load average is 1, it means the single CPU is fully utilized. If the + load averages are consistently higher than the number of CPUs/cores, it may indicate that your system is + overloaded and tasks are waiting for CPU time. + +- **Troubleshooting**: If the load average is unexpectedly high, it can be a sign of a problem. This could be + due to a runaway process, a software bug, or a hardware issue. + +- **Capacity planning**: By monitoring the load average over time, you can understand the trends in your + system's workload. This can help with capacity planning and scaling decisions. + +Remember that load average not only considers CPU usage, but also includes processes waiting for disk I/O. +Therefore, high load averages could be due to I/O contention as well as CPU contention. + + + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per System Load Average instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.load | load1, load5, load15 | load | +| system.active_processes | active | processes | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ load_cpu_number ](https://github.com/netdata/netdata/blob/master/health/health.d/load.conf) | system.load | number of active CPU cores in the system | +| [ load_average_15 ](https://github.com/netdata/netdata/blob/master/health/health.d/load.conf) | system.load | system fifteen-minute load average | +| [ load_average_5 ](https://github.com/netdata/netdata/blob/master/health/health.d/load.conf) | system.load | system five-minute load average | +| [ load_average_1 ](https://github.com/netdata/netdata/blob/master/health/health.d/load.conf) | system.load | system one-minute load average | +| [ active_processes ](https://github.com/netdata/netdata/blob/master/health/health.d/processes.conf) | system.active_processes | system process IDs (PID) space utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/system_statistics.md b/collectors/proc.plugin/integrations/system_statistics.md new file mode 100644 index 000000000..2932dd8d2 --- /dev/null +++ b/collectors/proc.plugin/integrations/system_statistics.md @@ -0,0 +1,168 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/system_statistics.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "System statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/System" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# System statistics + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/stat + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +CPU utilization, states and frequencies and key Linux system performance metrics. + +The `/proc/stat` file provides various types of system statistics: + +- The overall system CPU usage statistics +- Per CPU core statistics +- The total context switching of the system +- The total number of processes running +- The total CPU interrupts +- The total CPU softirqs + +The collector also reads: + +- `/proc/schedstat` for statistics about the process scheduler in the Linux kernel. +- `/sys/devices/system/cpu/[X]/thermal_throttle/core_throttle_count` to get the count of thermal throttling events for a specific CPU core on Linux systems. +- `/sys/devices/system/cpu/[X]/thermal_throttle/package_throttle_count` to get the count of thermal throttling events for a specific CPU package on a Linux system. +- `/sys/devices/system/cpu/[X]/cpufreq/scaling_cur_freq` to get the current operating frequency of a specific CPU core. 
+- `/sys/devices/system/cpu/[X]/cpufreq/stats/time_in_state` to get the amount of time the CPU has spent in each of its available frequency states. +- `/sys/devices/system/cpu/[X]/cpuidle/state[X]/name` to get the names of the idle states for each CPU core in a Linux system. +- `/sys/devices/system/cpu/[X]/cpuidle/state[X]/time` to get the total time each specific CPU core has spent in each idle state since the system was started. + + + + +This collector is only supported on the following platforms: + +- linux + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +The collector auto-detects all metrics. No configuration is needed. + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The collector disables cpu frequency and idle state monitoring when there are more than 128 CPU cores available. + + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per System statistics instance + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.cpu | guest_nice, guest, steal, softirq, irq, user, system, nice, iowait, idle | percentage | +| system.intr | interrupts | interrupts/s | +| system.ctxt | switches | context switches/s | +| system.forks | started | processes/s | +| system.processes | running, blocked | processes | +| cpu.core_throttling | a dimension per cpu core | events/s | +| cpu.package_throttling | a dimension per package | events/s | +| cpu.cpufreq | a dimension per cpu core | MHz | + +### Per cpu core + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| cpu | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| cpu.cpu | guest_nice, guest, steal, softirq, irq, user, system, nice, iowait, idle | percentage | +| cpuidle.cpu_cstate_residency_time | a dimension per c-state | percentage | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ 10min_cpu_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/cpu.conf) | system.cpu | average CPU utilization over the last 10 minutes (excluding iowait, nice and steal) | +| [ 10min_cpu_iowait ](https://github.com/netdata/netdata/blob/master/health/health.d/cpu.conf) | system.cpu | average CPU iowait time over the last 10 minutes | +| [ 20min_steal_cpu ](https://github.com/netdata/netdata/blob/master/health/health.d/cpu.conf) | system.cpu | average CPU steal time over the last 20 minutes | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `plugin:proc:/proc/stat` section within that file. + +The file format is a modified INI syntax. 
The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/system_uptime.md b/collectors/proc.plugin/integrations/system_uptime.md new file mode 100644 index 000000000..7eedd4313 --- /dev/null +++ b/collectors/proc.plugin/integrations/system_uptime.md @@ -0,0 +1,107 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/system_uptime.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "System Uptime" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/System" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# System Uptime + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/uptime + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +The amount of time the system has been up (running). + +Uptime is a critical aspect of overall system performance: + +- **Availability**: Uptime monitoring can show whether a server is consistently available or experiences frequent downtimes. +- **Performance Monitoring**: While server uptime alone doesn't provide detailed performance data, analyzing the duration and frequency of downtimes can help identify patterns or trends. 
+- **Proactive problem detection**: If server uptime monitoring reveals unexpected downtimes or a decreasing uptime trend, it can serve as an early warning sign of potential problems. +- **Root cause analysis**: When investigating server downtime, the uptime metric alone may not provide enough information to pinpoint the exact cause. +- **Load balancing**: Uptime data can indirectly indicate load balancing issues if certain servers have significantly lower uptimes than others. +- **Optimize maintenance efforts**: Servers with consistently low uptimes or frequent downtimes may require more attention. +- **Compliance requirements**: Server uptime data can be used to demonstrate compliance with regulatory requirements or SLAs that mandate a minimum level of server availability. + + + + +This collector is only supported on the following platforms: + +- linux + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per System Uptime instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.uptime | uptime | seconds | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/wireless_network_interfaces.md b/collectors/proc.plugin/integrations/wireless_network_interfaces.md new file mode 100644 index 000000000..57375b975 --- /dev/null +++ b/collectors/proc.plugin/integrations/wireless_network_interfaces.md @@ -0,0 +1,99 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/wireless_network_interfaces.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "Wireless network interfaces" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Wireless network interfaces + + +<img src="https://netdata.cloud/img/network-wired.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/net/wireless + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor wireless devices with metrics about status, link quality, signal level, noise level and more. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per wireless device + + + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| wireless.status | status | status | +| wireless.link_quality | link_quality | value | +| wireless.signal_level | signal_level | dBm | +| wireless.noise_level | noise_level | dBm | +| wireless.discarded_packets | nwid, crypt, frag, retry, misc | packets/s | +| wireless.missed_beacons | missed_beacons | frames/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/zfs_adaptive_replacement_cache.md b/collectors/proc.plugin/integrations/zfs_adaptive_replacement_cache.md new file mode 100644 index 000000000..d62d12ee6 --- /dev/null +++ b/collectors/proc.plugin/integrations/zfs_adaptive_replacement_cache.md @@ -0,0 +1,124 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/zfs_adaptive_replacement_cache.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "ZFS Adaptive Replacement Cache" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Filesystem/ZFS" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# ZFS Adaptive Replacement Cache + + +<img src="https://netdata.cloud/img/filesystem.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/spl/kstat/zfs/arcstats + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration monitors ZFS Adaptive Replacement Cache (ARC) statistics. + + + +This collector is supported on all platforms.
+ +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per ZFS Adaptive Replacement Cache instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| zfs.arc_size | arcsz, target, min, max | MiB | +| zfs.l2_size | actual, size | MiB | +| zfs.reads | arc, demand, prefetch, metadata, l2 | reads/s | +| zfs.bytes | read, write | KiB/s | +| zfs.hits | hits, misses | percentage | +| zfs.hits_rate | hits, misses | events/s | +| zfs.dhits | hits, misses | percentage | +| zfs.dhits_rate | hits, misses | events/s | +| zfs.phits | hits, misses | percentage | +| zfs.phits_rate | hits, misses | events/s | +| zfs.mhits | hits, misses | percentage | +| zfs.mhits_rate | hits, misses | events/s | +| zfs.l2hits | hits, misses | percentage | +| zfs.l2hits_rate | hits, misses | events/s | +| zfs.list_hits | mfu, mfu_ghost, mru, mru_ghost | hits/s | +| zfs.arc_size_breakdown | recent, frequent | percentage | +| zfs.memory_ops | direct, throttled, indirect | operations/s | +| zfs.important_ops | evict_skip, deleted, mutex_miss, hash_collisions | operations/s | +| zfs.actual_hits | hits, misses | percentage | +| zfs.actual_hits_rate | hits, misses | events/s | +| zfs.demand_data_hits | hits, misses | percentage | +| zfs.demand_data_hits_rate | hits, misses | events/s | +| zfs.prefetch_data_hits | hits, misses | percentage | +| 
zfs.prefetch_data_hits_rate | hits, misses | events/s | +| zfs.hash_elements | current, max | elements | +| zfs.hash_chains | current, max | chains | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ zfs_memory_throttle ](https://github.com/netdata/netdata/blob/master/health/health.d/zfs.conf) | zfs.memory_ops | number of times ZFS had to limit the ARC growth in the last 10 minutes | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/integrations/zfs_pools.md b/collectors/proc.plugin/integrations/zfs_pools.md new file mode 100644 index 000000000..b913572e3 --- /dev/null +++ b/collectors/proc.plugin/integrations/zfs_pools.md @@ -0,0 +1,104 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/zfs_pools.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "ZFS Pools" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Filesystem/ZFS" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# ZFS Pools + + +<img src="https://netdata.cloud/img/filesystem.svg" width="150"/> + + +Plugin: proc.plugin +Module: /proc/spl/kstat/zfs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This integration provides metrics about the state of ZFS pools. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. 
+ + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per zfs pool + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| pool | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| zfspool.state | online, degraded, faulted, offline, removed, unavail, suspended | boolean | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ zfs_pool_state_warn ](https://github.com/netdata/netdata/blob/master/health/health.d/zfs.conf) | zfspool.state | ZFS pool ${label:pool} state is degraded | +| [ zfs_pool_state_crit ](https://github.com/netdata/netdata/blob/master/health/health.d/zfs.conf) | zfspool.state | ZFS pool ${label:pool} state is faulted or unavail | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. 
+ + diff --git a/collectors/proc.plugin/integrations/zram.md b/collectors/proc.plugin/integrations/zram.md new file mode 100644 index 000000000..0bcda3eaf --- /dev/null +++ b/collectors/proc.plugin/integrations/zram.md @@ -0,0 +1,105 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/integrations/zram.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/proc.plugin/metadata.yaml" +sidebar_label: "ZRAM" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Memory" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# ZRAM + + +<img src="https://netdata.cloud/img/microchip.svg" width="150"/> + + +Plugin: proc.plugin +Module: /sys/block/zram + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +zRAM, or compressed RAM, is a block device that uses a portion of your system's RAM as a block device. +The data written to this block device is compressed and stored in memory. + +The collector provides information about the operation and the effectiveness of zRAM on your system. + + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+ + + +### Per zram device + + + +Labels: + +| Label | Description | +|:-----------|:----------------| +| device | TBD | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.zram_usage | compressed, metadata | MiB | +| mem.zram_savings | savings, original | MiB | +| mem.zram_ratio | ratio | ratio | +| mem.zram_efficiency | percent | percentage | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + + + +There are no configuration options. + +#### Examples +There are no configuration examples. + + diff --git a/collectors/proc.plugin/metadata.yaml b/collectors/proc.plugin/metadata.yaml index 81d83f50e..45351b36f 100644 --- a/collectors/proc.plugin/metadata.yaml +++ b/collectors/proc.plugin/metadata.yaml @@ -2643,22 +2643,22 @@ modules: os: "linux" - name: inbound_packets_dropped_ratio link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.packets + metric: net.drops info: ratio of inbound dropped packets for the network interface ${label:device} over the last 10 minutes os: "linux" - name: outbound_packets_dropped_ratio link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.packets + metric: net.drops info: ratio of outbound dropped packets for the network interface ${label:device} over the last 10 minutes os: "linux" - name: wifi_inbound_packets_dropped_ratio link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.packets + metric: net.drops info: ratio of inbound dropped packets for the network interface ${label:device} over the last 10 minutes os: "linux" - name: wifi_outbound_packets_dropped_ratio link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.packets + metric: net.drops info: ratio of outbound dropped packets 
for the network interface ${label:device} over the last 10 minutes os: "linux" - name: 1m_received_packets_rate @@ -2669,20 +2669,8 @@ modules: - name: 10s_received_packets_storm link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf metric: net.packets - info: - ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over - the last minute + info: ratio of average number of received packets for the network interface ${label:device} over the last 10 seconds, compared to the rate over the last minute os: "linux freebsd" - - name: inbound_packets_dropped - link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.drops - info: number of inbound dropped packets for the network interface ${label:device} in the last 10 minutes - os: "linux" - - name: outbound_packets_dropped - link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf - metric: net.drops - info: number of outbound dropped packets for the network interface ${label:device} in the last 10 minutes - os: "linux" - name: 10min_fifo_errors link: https://github.com/netdata/netdata/blob/master/health/health.d/net.conf metric: net.fifo @@ -3140,29 +3128,29 @@ modules: os: "linux" - name: tcp_connections link: https://github.com/netdata/netdata/blob/master/health/health.d/tcp_conn.conf - metric: ipv4.tcpsock - info: IPv4 TCP connections utilization + metric: ip.tcpsock + info: TCP connections utilization os: "linux" - - name: 1m_ipv4_tcp_resets_sent + - name: 1m_ip_tcp_resets_sent link: https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf - metric: ipv4.tcphandshake + metric: ip.tcphandshake info: average number of sent TCP RESETS over the last minute os: "linux" - - name: 10s_ipv4_tcp_resets_sent + - name: 10s_ip_tcp_resets_sent link: https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf - metric: 
ipv4.tcphandshake + metric: ip.tcphandshake info: average number of sent TCP RESETS over the last 10 seconds. This can indicate a port scan, or that a service running on this host has crashed. Netdata will not send a clear notification for this alarm. os: "linux" - - name: 1m_ipv4_tcp_resets_received + - name: 1m_ip_tcp_resets_received link: https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf - metric: ipv4.tcphandshake + metric: ip.tcphandshake info: average number of received TCP RESETS over the last minute os: "linux freebsd" - - name: 10s_ipv4_tcp_resets_received + - name: 10s_ip_tcp_resets_received link: https://github.com/netdata/netdata/blob/master/health/health.d/tcp_resets.conf - metric: ipv4.tcphandshake + metric: ip.tcphandshake info: average number of received TCP RESETS over the last 10 seconds. This can be an indication that a service this host needs has crashed. Netdata will not send a clear notification for this alarm. @@ -3189,57 +3177,12 @@ modules: labels: [] metrics: - name: system.ip - description: IP Bandwidth + description: IPv4 Bandwidth unit: "kilobits/s" chart_type: area dimensions: - name: received - name: sent - - name: ip.inerrors - description: IP Input Errors - unit: "packets/s" - chart_type: line - dimensions: - - name: noroutes - - name: truncated - - name: checksum - - name: ip.mcast - description: IP Multicast Bandwidth - unit: "kilobits/s" - chart_type: area - dimensions: - - name: received - - name: sent - - name: ip.bcast - description: IP Broadcast Bandwidth - unit: "kilobits/s" - chart_type: area - dimensions: - - name: received - - name: sent - - name: ip.mcastpkts - description: IP Multicast Packets - unit: "packets/s" - chart_type: line - dimensions: - - name: received - - name: sent - - name: ip.bcastpkts - description: IP Broadcast Packets - unit: "packets/s" - chart_type: line - dimensions: - - name: received - - name: sent - - name: ip.ecnpkts - description: IP ECN Statistics - unit: 
"packets/s" - chart_type: line - dimensions: - - name: CEP - - name: NoECTP - - name: ECTP0 - - name: ECTP1 - name: ip.tcpmemorypressures description: TCP Memory Pressures unit: "events/s" @@ -3297,31 +3240,52 @@ modules: dimensions: - name: overflows - name: drops - - name: ipv4.packets - description: IPv4 Packets + - name: ip.tcpsock + description: IPv4 TCP Connections + unit: "active connections" + chart_type: line + dimensions: + - name: connections + - name: ip.tcppackets + description: IPv4 TCP Packets unit: "packets/s" chart_type: line dimensions: - name: received - name: sent - - name: forwarded - - name: delivered - - name: ipv4.fragsout - description: IPv4 Fragments Sent + - name: ip.tcperrors + description: IPv4 TCP Errors unit: "packets/s" chart_type: line dimensions: - - name: ok - - name: failed - - name: created - - name: ipv4.fragsin - description: IPv4 Fragments Reassembly + - name: InErrs + - name: InCsumErrors + - name: RetransSegs + - name: ip.tcpopens + description: IPv4 TCP Opens + unit: "connections/s" + chart_type: line + dimensions: + - name: active + - name: passive + - name: ip.tcphandshake + description: IPv4 TCP Handshake Issues + unit: "events/s" + chart_type: line + dimensions: + - name: EstabResets + - name: OutRsts + - name: AttemptFails + - name: SynRetrans + - name: ipv4.packets + description: IPv4 Packets unit: "packets/s" chart_type: line dimensions: - - name: ok - - name: failed - - name: all + - name: received + - name: sent + - name: forwarded + - name: delivered - name: ipv4.errors description: IPv4 Errors unit: "packets/s" @@ -3329,25 +3293,47 @@ modules: dimensions: - name: InDiscards - name: OutDiscards - - name: InHdrErrors + - name: InNoRoutes - name: OutNoRoutes + - name: InHdrErrors - name: InAddrErrors - - name: InUnknownProtos - - name: ipv4.icmp - description: IPv4 ICMP Packets + - name: InTruncatedPkts + - name: InCsumErrors + - name: ipc4.bcast + description: IP Broadcast Bandwidth + unit: "kilobits/s" + 
chart_type: area + dimensions: + - name: received + - name: sent + - name: ipv4.bcastpkts + description: IP Broadcast Packets unit: "packets/s" chart_type: line dimensions: - name: received - name: sent - - name: ipv4.icmp_errors - description: IPv4 ICMP Errors + - name: ipv4.mcast + description: IPv4 Multicast Bandwidth + unit: "kilobits/s" + chart_type: area + dimensions: + - name: received + - name: sent + - name: ipv4.mcastpkts + description: IP Multicast Packets unit: "packets/s" chart_type: line dimensions: - - name: InErrors - - name: OutErrors - - name: InCsumErrors + - name: received + - name: sent + - name: ipv4.icmp + description: IPv4 ICMP Packets + unit: "packets/s" + chart_type: line + dimensions: + - name: received + - name: sent - name: ipv4.icmpmsg description: IPv4 ICMP Messages unit: "packets/s" @@ -3373,43 +3359,14 @@ modules: - name: OutTimestamps - name: InTimestampReps - name: OutTimestampReps - - name: ipv4.tcpsock - description: IPv4 TCP Connections - unit: "active connections" - chart_type: line - dimensions: - - name: connections - - name: ipv4.tcppackets - description: IPv4 TCP Packets - unit: "packets/s" - chart_type: line - dimensions: - - name: received - - name: sent - - name: ipv4.tcperrors - description: IPv4 TCP Errors + - name: ipv4.icmp_errors + description: IPv4 ICMP Errors unit: "packets/s" chart_type: line dimensions: - - name: InErrs + - name: InErrors + - name: OutErrors - name: InCsumErrors - - name: RetransSegs - - name: ipv4.tcpopens - description: IPv4 TCP Opens - unit: "connections/s" - chart_type: line - dimensions: - - name: active - - name: passive - - name: ipv4.tcphandshake - description: IPv4 TCP Handshake Issues - unit: "events/s" - chart_type: line - dimensions: - - name: EstabResets - - name: OutRsts - - name: AttemptFails - - name: SynRetrans - name: ipv4.udppackets description: IPv4 UDP Packets unit: "packets/s" @@ -3446,6 +3403,31 @@ modules: - name: NoPorts - name: InCsumErrors - name: IgnoredMulti + - 
name: ipv4.ecnpkts + description: IP ECN Statistics + unit: "packets/s" + chart_type: line + dimensions: + - name: CEP + - name: NoECTP + - name: ECTP0 + - name: ECTP1 + - name: ipv4.fragsin + description: IPv4 Fragments Reassembly + unit: "packets/s" + chart_type: line + dimensions: + - name: ok + - name: failed + - name: all + - name: ipv4.fragsout + description: IPv4 Fragments Sent + unit: "packets/s" + chart_type: line + dimensions: + - name: ok + - name: failed + - name: created - name: system.ipv6 description: IPv6 Bandwidth unit: "kilobits/s" @@ -3453,7 +3435,7 @@ modules: dimensions: - name: received - name: sent - - name: system.ipv6 + - name: ipv6.packets description: IPv6 Packets unit: "packets/s" chart_type: line @@ -3462,23 +3444,6 @@ modules: - name: sent - name: forwarded - name: delivers - - name: ipv6.fragsout - description: IPv6 Fragments Sent - unit: "packets/s" - chart_type: line - dimensions: - - name: ok - - name: failed - - name: all - - name: ipv6.fragsin - description: IPv6 Fragments Reassembly - unit: "packets/s" - chart_type: line - dimensions: - - name: ok - - name: failed - - name: timeout - - name: all - name: ipv6.errors description: IPv6 Errors unit: "packets/s" @@ -3493,6 +3458,27 @@ modules: - name: InTruncatedPkts - name: InNoRoutes - name: OutNoRoutes + - name: ipv6.bcast + description: IPv6 Broadcast Bandwidth + unit: "kilobits/s" + chart_type: area + dimensions: + - name: received + - name: sent + - name: ipv6.mcast + description: IPv6 Multicast Bandwidth + unit: "kilobits/s" + chart_type: area + dimensions: + - name: received + - name: sent + - name: ipv6.mcastpkts + description: IPv6 Multicast Packets + unit: "packets/s" + chart_type: line + dimensions: + - name: received + - name: sent - name: ipv6.udppackets description: IPv6 UDP Packets unit: "packets/s" @@ -3528,27 +3514,6 @@ modules: - name: InErrors - name: NoPorts - name: InCsumErrors - - name: ipv6.mcast - description: IPv6 Multicast Bandwidth - unit: "kilobits/s" - 
chart_type: area - dimensions: - - name: received - - name: sent - - name: ipv6.bcast - description: IPv6 Broadcast Bandwidth - unit: "kilobits/s" - chart_type: area - dimensions: - - name: received - - name: sent - - name: ipv6.mcastpkts - description: IPv6 Multicast Packets - unit: "packets/s" - chart_type: line - dimensions: - - name: received - - name: sent - name: ipv6.icmp description: IPv6 ICMP Messages unit: "messages/s" @@ -3657,6 +3622,23 @@ modules: - name: InECT1Pkts - name: InECT0Pkts - name: InCEPkts + - name: ipv6.fragsin + description: IPv6 Fragments Reassembly + unit: "packets/s" + chart_type: line + dimensions: + - name: ok + - name: failed + - name: timeout + - name: all + - name: ipv6.fragsout + description: IPv6 Fragments Sent + unit: "packets/s" + chart_type: line + dimensions: + - name: ok + - name: failed + - name: all - meta: plugin_name: proc.plugin module_name: /proc/net/sockstat @@ -3734,8 +3716,8 @@ modules: description: "" labels: [] metrics: - - name: ipv4.sockstat_sockets - description: IPv4 Sockets Used + - name: ip.sockstat_sockets + description: Sockets used for all address families unit: "sockets" chart_type: line dimensions: diff --git a/collectors/proc.plugin/plugin_proc.h b/collectors/proc.plugin/plugin_proc.h index a90f4838e..a0ddd76c4 100644 --- a/collectors/proc.plugin/plugin_proc.h +++ b/collectors/proc.plugin/plugin_proc.h @@ -58,7 +58,7 @@ void netdev_rename_device_add( const char *host_device, const char *container_device, const char *container_name, - DICTIONARY *labels, + RRDLABELS *labels, const char *ctx_prefix); void netdev_rename_device_del(const char *host_device); diff --git a/collectors/proc.plugin/proc_diskstats.c b/collectors/proc.plugin/proc_diskstats.c index 09c6498e3..e65c42212 100644 --- a/collectors/proc.plugin/proc_diskstats.c +++ b/collectors/proc.plugin/proc_diskstats.c @@ -17,6 +17,11 @@ static struct disk { char *disk; // the name of the disk (sda, sdb, etc, after being looked up) char *device; // 
the device of the disk (before being looked up) + char *disk_by_id; + char *model; + char *serial; +// bool rotational; +// bool removable; uint32_t hash; unsigned long major; unsigned long minor; @@ -172,6 +177,8 @@ static char *path_to_sys_block_device = NULL; static char *path_to_sys_block_device_bcache = NULL; static char *path_to_sys_devices_virtual_block_device = NULL; static char *path_to_device_mapper = NULL; +static char *path_to_dev_disk = NULL; +static char *path_to_sys_block = NULL; static char *path_to_device_label = NULL; static char *path_to_device_id = NULL; static char *path_to_veritas_volume_groups = NULL; @@ -469,6 +476,109 @@ static inline char *get_disk_name(unsigned long major, unsigned long minor, char return strdup(result); } +static inline bool ends_with(const char *str, const char *suffix) { + if (!str || !suffix) + return false; + + size_t len_str = strlen(str); + size_t len_suffix = strlen(suffix); + if (len_suffix > len_str) + return false; + + return strncmp(str + len_str - len_suffix, suffix, len_suffix) == 0; +} + +static inline char *get_disk_by_id(char *device) { + char pathname[256 + 1]; + snprintfz(pathname, 256, "%s/by-id", path_to_dev_disk); + + struct dirent *entry; + DIR *dp = opendir(pathname); + if (dp == NULL) { + internal_error(true, "Cannot open '%s'", pathname); + return NULL; + } + + while ((entry = readdir(dp))) { + // We ignore the '.' and '..' 
entries + if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) + continue; + + if(strncmp(entry->d_name, "md-uuid-", 8) == 0 || + strncmp(entry->d_name, "dm-uuid-", 8) == 0 || + strncmp(entry->d_name, "nvme-eui.", 9) == 0 || + strncmp(entry->d_name, "wwn-", 4) == 0 || + strncmp(entry->d_name, "lvm-pv-uuid-", 12) == 0) + continue; + + char link_target[256 + 1]; + char full_path[256 + 1]; + snprintfz(full_path, 256, "%s/%s", pathname, entry->d_name); + + ssize_t len = readlink(full_path, link_target, 256); + if (len == -1) + continue; + + link_target[len] = '\0'; + + if (ends_with(link_target, device)) { + char *s = strdupz(entry->d_name); + closedir(dp); + return s; + } + } + + closedir(dp); + return NULL; +} + +static inline char *get_disk_model(char *device) { + char path[256 + 1]; + char buffer[256 + 1]; + + snprintfz(path, 256, "%s/%s/device/model", path_to_sys_block, device); + if(read_file(path, buffer, 256) != 0) { + snprintfz(path, 256, "%s/%s/device/name", path_to_sys_block, device); + if(read_file(path, buffer, 256) != 0) + return NULL; + } + + return strdupz(buffer); +} + +static inline char *get_disk_serial(char *device) { + char path[256 + 1]; + char buffer[256 + 1]; + + snprintfz(path, 256, "%s/%s/device/serial", path_to_sys_block, device); + if(read_file(path, buffer, 256) != 0) + return NULL; + + return strdupz(buffer); +} + +//static inline bool get_disk_rotational(char *device) { +// char path[256 + 1]; +// char buffer[256 + 1]; +// +// snprintfz(path, 256, "%s/%s/queue/rotational", path_to_sys_block, device); +// if(read_file(path, buffer, 256) != 0) +// return false; +// +// return buffer[0] == '1'; +//} +// +//static inline bool get_disk_removable(char *device) { +// char path[256 + 1]; +// char buffer[256 + 1]; +// +// snprintfz(path, 256, "%s/%s/removable", path_to_sys_block, device); +// if(read_file(path, buffer, 256) != 0) +// return false; +// +// return buffer[0] == '1'; +//} + static void get_disk_config(struct disk 
*d) { int def_enable = global_enable_new_disks_detected_at_runtime; @@ -594,6 +704,11 @@ static struct disk *get_disk(unsigned long major, unsigned long minor, char *dis d->disk = get_disk_name(major, minor, disk); d->device = strdupz(disk); + d->disk_by_id = get_disk_by_id(disk); + d->model = get_disk_model(disk); + d->serial = get_disk_serial(disk); +// d->rotational = get_disk_rotational(disk); +// d->removable = get_disk_removable(disk); d->hash = simple_hash(d->device); d->major = major; d->minor = minor; @@ -854,6 +969,11 @@ static struct disk *get_disk(unsigned long major, unsigned long minor, char *dis static void add_labels_to_disk(struct disk *d, RRDSET *st) { rrdlabels_add(st->rrdlabels, "device", d->disk, RRDLABEL_SRC_AUTO); rrdlabels_add(st->rrdlabels, "mount_point", d->mount_point, RRDLABEL_SRC_AUTO); + rrdlabels_add(st->rrdlabels, "id", d->disk_by_id, RRDLABEL_SRC_AUTO); + rrdlabels_add(st->rrdlabels, "model", d->model, RRDLABEL_SRC_AUTO); + rrdlabels_add(st->rrdlabels, "serial", d->serial, RRDLABEL_SRC_AUTO); +// rrdlabels_add(st->rrdlabels, "rotational", d->rotational ? "true" : "false", RRDLABEL_SRC_AUTO); +// rrdlabels_add(st->rrdlabels, "removable", d->removable ? 
"true" : "false", RRDLABEL_SRC_AUTO); switch (d->type) { default: @@ -922,6 +1042,12 @@ int do_proc_diskstats(int update_every, usec_t dt) { snprintfz(buffer, FILENAME_MAX, "%s/dev/mapper", netdata_configured_host_prefix); path_to_device_mapper = config_get(CONFIG_SECTION_PLUGIN_PROC_DISKSTATS, "path to device mapper", buffer); + snprintfz(buffer, FILENAME_MAX, "%s/dev/disk", netdata_configured_host_prefix); + path_to_dev_disk = config_get(CONFIG_SECTION_PLUGIN_PROC_DISKSTATS, "path to /dev/disk", buffer); + + snprintfz(buffer, FILENAME_MAX, "%s/sys/block", netdata_configured_host_prefix); + path_to_sys_block = config_get(CONFIG_SECTION_PLUGIN_PROC_DISKSTATS, "path to /sys/block", buffer); + snprintfz(buffer, FILENAME_MAX, "%s/dev/disk/by-label", netdata_configured_host_prefix); path_to_device_label = config_get(CONFIG_SECTION_PLUGIN_PROC_DISKSTATS, "path to /dev/disk/by-label", buffer); @@ -2026,6 +2152,9 @@ int do_proc_diskstats(int update_every, usec_t dt) { freez(t->disk); freez(t->device); + freez(t->disk_by_id); + freez(t->model); + freez(t->serial); freez(t->mount_point); freez(t->chart_id); freez(t); diff --git a/collectors/proc.plugin/proc_net_dev.c b/collectors/proc.plugin/proc_net_dev.c index 16881d170..eb2d0e0c0 100644 --- a/collectors/proc.plugin/proc_net_dev.c +++ b/collectors/proc.plugin/proc_net_dev.c @@ -123,7 +123,7 @@ static struct netdev { const char *chart_family; - DICTIONARY *chart_labels; + RRDLABELS *chart_labels; int flipped; unsigned long priority; @@ -348,7 +348,7 @@ static struct netdev_rename { const char *container_name; const char *ctx_prefix; - DICTIONARY *chart_labels; + RRDLABELS *chart_labels; int processed; @@ -373,7 +373,7 @@ void netdev_rename_device_add( const char *host_device, const char *container_device, const char *container_name, - DICTIONARY *labels, + RRDLABELS *labels, const char *ctx_prefix) { netdata_mutex_lock(&netdev_rename_mutex); diff --git a/collectors/proc.plugin/proc_net_netstat.c 
b/collectors/proc.plugin/proc_net_netstat.c index ce3068c0e..170daad5d 100644 --- a/collectors/proc.plugin/proc_net_netstat.c +++ b/collectors/proc.plugin/proc_net_netstat.c @@ -2,9 +2,9 @@ #include "plugin_proc.h" -#define RRD_TYPE_NET_NETSTAT "ip" -#define RRD_TYPE_NET_SNMP "ipv4" -#define RRD_TYPE_NET_SNMP6 "ipv6" +#define RRD_TYPE_NET_IP "ip" +#define RRD_TYPE_NET_IP4 "ipv4" +#define RRD_TYPE_NET_IP6 "ipv6" #define PLUGIN_PROC_MODULE_NETSTAT_NAME "/proc/net/netstat" #define CONFIG_SECTION_PLUGIN_PROC_NETSTAT "plugin:" PLUGIN_PROC_CONFIG_NAME ":" PLUGIN_PROC_MODULE_NETSTAT_NAME @@ -424,7 +424,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "packets" , NULL , "packets" @@ -464,7 +464,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "fragsout" , NULL , "fragments6" @@ -506,7 +506,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "fragsin" , NULL , "fragments6" @@ -557,7 +557,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "errors" , NULL , "errors" @@ -605,7 +605,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "udppackets" , NULL , "udp6" @@ -647,7 +647,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "udperrors" , NULL , "udp6" @@ -689,7 +689,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "udplitepackets" , NULL , "udplite6" @@ -730,7 +730,7 @@ static void do_proc_net_snmp6(int 
update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "udpliteerrors" , NULL , "udplite6" @@ -771,7 +771,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "mcast" , NULL , "multicast6" @@ -806,7 +806,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "bcast" , NULL , "broadcast6" @@ -841,7 +841,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "mcastpkts" , NULL , "multicast6" @@ -876,7 +876,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "icmp" , NULL , "icmp6" @@ -910,7 +910,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "icmpredir" , NULL , "icmp6" @@ -962,7 +962,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "icmperrors" , NULL , "icmp6" @@ -1018,7 +1018,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "icmpechos" , NULL , "icmp6" @@ -1064,7 +1064,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "groupmemb" , NULL , "icmp6" @@ -1109,7 +1109,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "icmprouter" , NULL , "icmp6" @@ -1151,7 +1151,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - 
RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "icmpneighbor" , NULL , "icmp6" @@ -1189,7 +1189,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "icmpmldv2" , NULL , "icmp6" @@ -1239,7 +1239,7 @@ static void do_proc_net_snmp6(int update_every) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6 + RRD_TYPE_NET_IP6 , "icmptypes" , NULL , "icmp6" @@ -1287,7 +1287,7 @@ static void do_proc_net_snmp6(int update_every) { if (unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP6, + RRD_TYPE_NET_IP6, "ect", NULL, "packets", @@ -1852,11 +1852,11 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_system_ip)) { st_system_ip = rrdset_create_localhost( "system" - , RRD_TYPE_NET_NETSTAT + , "ip" // FIXME: this is ipv4. Not changing it because it will require to do changes in cloud-frontend too , NULL , "network" , NULL - , "IP Bandwidth" + , "IPv4 Bandwidth" , "kilobits/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME @@ -1874,43 +1874,6 @@ int do_proc_net_netstat(int update_every, usec_t dt) { rrdset_done(st_system_ip); } - if(do_inerrors == CONFIG_BOOLEAN_YES || (do_inerrors == CONFIG_BOOLEAN_AUTO && - (ipext_InNoRoutes || - ipext_InTruncatedPkts || - netdata_zero_metrics_enabled == CONFIG_BOOLEAN_YES))) { - do_inerrors = CONFIG_BOOLEAN_YES; - static RRDSET *st_ip_inerrors = NULL; - static RRDDIM *rd_noroutes = NULL, *rd_truncated = NULL, *rd_checksum = NULL; - - if(unlikely(!st_ip_inerrors)) { - st_ip_inerrors = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT - , "inerrors" - , NULL - , "errors" - , NULL - , "IP Input Errors" - , "packets/s" - , PLUGIN_PROC_NAME - , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IP_ERRORS - , update_every - , RRDSET_TYPE_LINE - ); - - rrdset_flag_set(st_ip_inerrors, RRDSET_FLAG_DETAIL); - - rd_noroutes = rrddim_add(st_ip_inerrors, "InNoRoutes", "noroutes", 1, 1, 
RRD_ALGORITHM_INCREMENTAL); - rd_truncated = rrddim_add(st_ip_inerrors, "InTruncatedPkts", "truncated", 1, 1, RRD_ALGORITHM_INCREMENTAL); - rd_checksum = rrddim_add(st_ip_inerrors, "InCsumErrors", "checksum", 1, 1, RRD_ALGORITHM_INCREMENTAL); - } - - rrddim_set_by_pointer(st_ip_inerrors, rd_noroutes, ipext_InNoRoutes); - rrddim_set_by_pointer(st_ip_inerrors, rd_truncated, ipext_InTruncatedPkts); - rrddim_set_by_pointer(st_ip_inerrors, rd_checksum, ipext_InCsumErrors); - rrdset_done(st_ip_inerrors); - } - if(do_mcast == CONFIG_BOOLEAN_YES || (do_mcast == CONFIG_BOOLEAN_AUTO && (ipext_InMcastOctets || ipext_OutMcastOctets || @@ -1921,7 +1884,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_ip_mcast)) { st_ip_mcast = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP4 , "mcast" , NULL , "multicast" @@ -1930,7 +1893,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "kilobits/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IP_MCAST + , NETDATA_CHART_PRIO_IPV4_MCAST , update_every , RRDSET_TYPE_AREA ); @@ -1960,16 +1923,16 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_ip_bcast)) { st_ip_bcast = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP4 , "bcast" , NULL , "broadcast" , NULL - , "IP Broadcast Bandwidth" + , "IPv4 Broadcast Bandwidth" , "kilobits/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IP_BCAST + , NETDATA_CHART_PRIO_IPV4_BCAST , update_every , RRDSET_TYPE_AREA ); @@ -1999,16 +1962,16 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_ip_mcastpkts)) { st_ip_mcastpkts = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP4 , "mcastpkts" , NULL , "multicast" , NULL - , "IP Multicast Packets" + , "IPv4 Multicast Packets" , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IP_MCAST_PACKETS + , NETDATA_CHART_PRIO_IPV4_MCAST_PACKETS 
, update_every , RRDSET_TYPE_LINE ); @@ -2035,16 +1998,16 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_ip_bcastpkts)) { st_ip_bcastpkts = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP4 , "bcastpkts" , NULL , "broadcast" , NULL - , "IP Broadcast Packets" + , "IPv4 Broadcast Packets" , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IP_BCAST_PACKETS + , NETDATA_CHART_PRIO_IPV4_BCAST_PACKETS , update_every , RRDSET_TYPE_LINE ); @@ -2073,16 +2036,16 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_ecnpkts)) { st_ecnpkts = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP4 , "ecnpkts" , NULL , "ecn" , NULL - , "IP ECN Statistics" + , "IPv4 ECN Statistics" , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IP_ECN + , NETDATA_CHART_PRIO_IPV4_ECN , update_every , RRDSET_TYPE_LINE ); @@ -2114,7 +2077,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_tcpmemorypressures)) { st_tcpmemorypressures = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP , "tcpmemorypressures" , NULL , "tcp" @@ -2123,7 +2086,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "events/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IP_TCP_MEM + , NETDATA_CHART_PRIO_IP_TCP_MEM_PRESSURE , update_every , RRDSET_TYPE_LINE ); @@ -2150,7 +2113,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_tcpconnaborts)) { st_tcpconnaborts = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP , "tcpconnaborts" , NULL , "tcp" @@ -2194,7 +2157,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_tcpreorders)) { st_tcpreorders = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP , "tcpreorders" , NULL , "tcp" @@ -2236,7 +2199,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { 
if(unlikely(!st_ip_tcpofo)) { st_ip_tcpofo = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP , "tcpofo" , NULL , "tcp" @@ -2276,7 +2239,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_syncookies)) { st_syncookies = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP , "tcpsyncookies" , NULL , "tcp" @@ -2315,7 +2278,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_syn_queue)) { st_syn_queue = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP , "tcp_syn_queue" , NULL , "tcp" @@ -2351,7 +2314,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_accept_queue)) { st_accept_queue = rrdset_create_localhost( - RRD_TYPE_NET_NETSTAT + RRD_TYPE_NET_IP , "tcp_accept_queue" , NULL , "tcp" @@ -2392,7 +2355,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "packets" , NULL , "packets" @@ -2433,7 +2396,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "fragsout" , NULL , "fragments" @@ -2442,7 +2405,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_FRAGMENTS + , NETDATA_CHART_PRIO_IPV4_FRAGMENTS_OUT , update_every , RRDSET_TYPE_LINE ); @@ -2473,7 +2436,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "fragsin" , NULL , "fragments" @@ -2482,7 +2445,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_FRAGMENTS + 1 + , NETDATA_CHART_PRIO_IPV4_FRAGMENTS_IN , update_every , RRDSET_TYPE_LINE ); @@ -2513,13 +2476,16 @@ int do_proc_net_netstat(int update_every, 
usec_t dt) { static RRDDIM *rd_InDiscards = NULL, *rd_OutDiscards = NULL, *rd_InHdrErrors = NULL, + *rd_InNoRoutes = NULL, *rd_OutNoRoutes = NULL, *rd_InAddrErrors = NULL, + *rd_InTruncatedPkts = NULL, + *rd_InCsumErrors = NULL, *rd_InUnknownProtos = NULL; if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "errors" , NULL , "errors" @@ -2537,11 +2503,14 @@ int do_proc_net_netstat(int update_every, usec_t dt) { rd_InDiscards = rrddim_add(st, "InDiscards", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); rd_OutDiscards = rrddim_add(st, "OutDiscards", NULL, -1, 1, RRD_ALGORITHM_INCREMENTAL); - rd_InHdrErrors = rrddim_add(st, "InHdrErrors", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); + rd_InNoRoutes = rrddim_add(st, "InNoRoutes", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); rd_OutNoRoutes = rrddim_add(st, "OutNoRoutes", NULL, -1, 1, RRD_ALGORITHM_INCREMENTAL); + rd_InHdrErrors = rrddim_add(st, "InHdrErrors", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); rd_InAddrErrors = rrddim_add(st, "InAddrErrors", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); rd_InUnknownProtos = rrddim_add(st, "InUnknownProtos", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); + rd_InTruncatedPkts = rrddim_add(st, "InTruncatedPkts", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); + rd_InCsumErrors = rrddim_add(st, "InCsumErrors", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); } rrddim_set_by_pointer(st, rd_InDiscards, (collected_number)snmp_root.ip_InDiscards); @@ -2549,7 +2518,10 @@ int do_proc_net_netstat(int update_every, usec_t dt) { rrddim_set_by_pointer(st, rd_InHdrErrors, (collected_number)snmp_root.ip_InHdrErrors); rrddim_set_by_pointer(st, rd_InAddrErrors, (collected_number)snmp_root.ip_InAddrErrors); rrddim_set_by_pointer(st, rd_InUnknownProtos, (collected_number)snmp_root.ip_InUnknownProtos); + rrddim_set_by_pointer(st, rd_InNoRoutes, (collected_number)ipext_InNoRoutes); rrddim_set_by_pointer(st, rd_OutNoRoutes, (collected_number)snmp_root.ip_OutNoRoutes); + rrddim_set_by_pointer(st, 
rd_InTruncatedPkts, (collected_number)ipext_InTruncatedPkts); + rrddim_set_by_pointer(st, rd_InCsumErrors, (collected_number)ipext_InCsumErrors); rrdset_done(st); } @@ -2571,7 +2543,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_packets)) { st_packets = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "icmp" , NULL , "icmp" @@ -2580,7 +2552,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_ICMP + , NETDATA_CHART_PRIO_IPV4_ICMP_PACKETS , update_every , RRDSET_TYPE_LINE ); @@ -2602,7 +2574,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st_errors)) { st_errors = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "icmp_errors" , NULL , "icmp" @@ -2611,7 +2583,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_ICMP + 1 + , NETDATA_CHART_PRIO_IPV4_ICMP_ERRORS , update_every , RRDSET_TYPE_LINE ); @@ -2678,7 +2650,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "icmpmsg" , NULL , "icmp" @@ -2687,7 +2659,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_ICMP + 2 + , NETDATA_CHART_PRIO_IPV4_ICMP_MESSAGES , update_every , RRDSET_TYPE_LINE ); @@ -2754,16 +2726,16 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP , "tcpsock" , NULL , "tcp" , NULL - , "IPv4 TCP Connections" + , "TCP Connections" , "active connections" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_TCP + , NETDATA_CHART_PRIO_IP_TCP_ESTABLISHED_CONNS , update_every , RRDSET_TYPE_LINE ); @@ 
-2787,7 +2759,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP , "tcppackets" , NULL , "tcp" @@ -2796,7 +2768,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_TCP + 4 + , NETDATA_CHART_PRIO_IP_TCP_PACKETS , update_every , RRDSET_TYPE_LINE ); @@ -2826,7 +2798,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP , "tcperrors" , NULL , "tcp" @@ -2835,7 +2807,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_TCP + 20 + , NETDATA_CHART_PRIO_IP_TCP_ERRORS , update_every , RRDSET_TYPE_LINE ); @@ -2864,7 +2836,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP , "tcpopens" , NULL , "tcp" @@ -2873,7 +2845,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "connections/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_TCP + 5 + , NETDATA_CHART_PRIO_IP_TCP_OPENS , update_every , RRDSET_TYPE_LINE ); @@ -2903,7 +2875,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP , "tcphandshake" , NULL , "tcp" @@ -2912,7 +2884,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "events/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_TCP + 30 + , NETDATA_CHART_PRIO_IP_TCP_HANDSHAKE , update_every , RRDSET_TYPE_LINE ); @@ -2946,7 +2918,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "udppackets" , NULL , "udp" @@ 
-2955,7 +2927,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_UDP + , NETDATA_CHART_PRIO_IPV4_UDP_PACKETS , update_every , RRDSET_TYPE_LINE ); @@ -2991,7 +2963,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "udperrors" , NULL , "udp" @@ -3000,7 +2972,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "events/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_UDP + 10 + , NETDATA_CHART_PRIO_IPV4_UDP_ERRORS , update_every , RRDSET_TYPE_LINE ); @@ -3044,7 +3016,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "udplite" , NULL , "udplite" @@ -3053,7 +3025,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_UDPLITE + , NETDATA_CHART_PRIO_IPV4_UDPLITE_PACKETS , update_every , RRDSET_TYPE_LINE ); @@ -3078,7 +3050,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - RRD_TYPE_NET_SNMP + RRD_TYPE_NET_IP4 , "udplite_errors" , NULL , "udplite" @@ -3087,7 +3059,7 @@ int do_proc_net_netstat(int update_every, usec_t dt) { , "packets/s" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NETSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_UDPLITE + 10 + , NETDATA_CHART_PRIO_IPV4_UDPLITE_ERRORS , update_every , RRDSET_TYPE_LINE); diff --git a/collectors/proc.plugin/proc_net_sockstat.c b/collectors/proc.plugin/proc_net_sockstat.c index e94b891ca..b0feab5fa 100644 --- a/collectors/proc.plugin/proc_net_sockstat.c +++ b/collectors/proc.plugin/proc_net_sockstat.c @@ -228,16 +228,16 @@ int do_proc_net_sockstat(int update_every, usec_t dt) { if(unlikely(!st)) { st = rrdset_create_localhost( - "ipv4" + 
"ip" , "sockstat_sockets" , NULL , "sockets" , NULL - , "IPv4 Sockets Used" + , "Sockets used for all address families" , "sockets" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_SOCKETS + , NETDATA_CHART_PRIO_IP_SOCKETS , update_every , RRDSET_TYPE_LINE ); @@ -272,7 +272,7 @@ int do_proc_net_sockstat(int update_every, usec_t dt) { , NULL , "tcp" , NULL - , "IPv4 TCP Sockets" + , "TCP Sockets" , "sockets" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT_NAME @@ -310,11 +310,11 @@ int do_proc_net_sockstat(int update_every, usec_t dt) { , NULL , "tcp" , NULL - , "IPv4 TCP Sockets Memory" + , "TCP Sockets Memory" , "KiB" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_TCP_MEM + , NETDATA_CHART_PRIO_IPV4_TCP_SOCKETS_MEM , update_every , RRDSET_TYPE_AREA ); @@ -347,7 +347,7 @@ int do_proc_net_sockstat(int update_every, usec_t dt) { , "sockets" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_UDP + , NETDATA_CHART_PRIO_IPV4_UDP_SOCKETS , update_every , RRDSET_TYPE_LINE ); @@ -380,7 +380,7 @@ int do_proc_net_sockstat(int update_every, usec_t dt) { , "KiB" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_UDP_MEM + , NETDATA_CHART_PRIO_IPV4_UDP_SOCKETS_MEM , update_every , RRDSET_TYPE_AREA ); @@ -413,7 +413,7 @@ int do_proc_net_sockstat(int update_every, usec_t dt) { , "sockets" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_UDPLITE + , NETDATA_CHART_PRIO_IPV4_UDPLITE_SOCKETS , update_every , RRDSET_TYPE_LINE ); @@ -479,7 +479,7 @@ int do_proc_net_sockstat(int update_every, usec_t dt) { , "fragments" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_FRAGMENTS + , NETDATA_CHART_PRIO_IPV4_FRAGMENTS_SOCKETS , update_every , RRDSET_TYPE_LINE ); @@ -512,7 +512,7 @@ int do_proc_net_sockstat(int update_every, usec_t dt) { , "KiB" , PLUGIN_PROC_NAME , 
PLUGIN_PROC_MODULE_NET_SOCKSTAT_NAME - , NETDATA_CHART_PRIO_IPV4_FRAGMENTS_MEM + , NETDATA_CHART_PRIO_IPV4_FRAGMENTS_SOCKETS_MEM , update_every , RRDSET_TYPE_AREA ); diff --git a/collectors/proc.plugin/proc_net_sockstat6.c b/collectors/proc.plugin/proc_net_sockstat6.c index 065cf6055..16e0248af 100644 --- a/collectors/proc.plugin/proc_net_sockstat6.c +++ b/collectors/proc.plugin/proc_net_sockstat6.c @@ -130,7 +130,7 @@ int do_proc_net_sockstat6(int update_every, usec_t dt) { , "sockets" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT6_NAME - , NETDATA_CHART_PRIO_IPV6_TCP + , NETDATA_CHART_PRIO_IPV6_TCP_SOCKETS , update_every , RRDSET_TYPE_LINE ); @@ -163,7 +163,7 @@ int do_proc_net_sockstat6(int update_every, usec_t dt) { , "sockets" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT6_NAME - , NETDATA_CHART_PRIO_IPV6_UDP + , NETDATA_CHART_PRIO_IPV6_UDP_SOCKETS , update_every , RRDSET_TYPE_LINE ); @@ -196,7 +196,7 @@ int do_proc_net_sockstat6(int update_every, usec_t dt) { , "sockets" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT6_NAME - , NETDATA_CHART_PRIO_IPV6_UDPLITE + , NETDATA_CHART_PRIO_IPV6_UDPLITE_SOCKETS , update_every , RRDSET_TYPE_LINE ); @@ -229,7 +229,7 @@ int do_proc_net_sockstat6(int update_every, usec_t dt) { , "sockets" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT6_NAME - , NETDATA_CHART_PRIO_IPV6_RAW + , NETDATA_CHART_PRIO_IPV6_RAW_SOCKETS , update_every , RRDSET_TYPE_LINE ); @@ -262,7 +262,7 @@ int do_proc_net_sockstat6(int update_every, usec_t dt) { , "fragments" , PLUGIN_PROC_NAME , PLUGIN_PROC_MODULE_NET_SOCKSTAT6_NAME - , NETDATA_CHART_PRIO_IPV6_FRAGMENTS + , NETDATA_CHART_PRIO_IPV6_FRAGMENTS_SOCKETS , update_every , RRDSET_TYPE_LINE ); diff --git a/collectors/proc.plugin/sys_devices_pci_aer.c b/collectors/proc.plugin/sys_devices_pci_aer.c index 134426238..296195182 100644 --- a/collectors/proc.plugin/sys_devices_pci_aer.c +++ b/collectors/proc.plugin/sys_devices_pci_aer.c @@ -268,6 +268,11 @@ int 
do_proc_sys_devices_pci_aer(int update_every, usec_t dt __maybe_unused) { title = "PCI Root-Port Advanced Error Reporting (AER) Fatal Errors"; context = "pci.rootport_aer_fatal"; break; + + default: + title = "Unknown PCI Advanced Error Reporting"; + context = "pci.unknown_aer"; + break; } char id[RRD_ID_LENGTH_MAX + 1]; diff --git a/collectors/proc.plugin/sys_devices_system_edac_mc.c b/collectors/proc.plugin/sys_devices_system_edac_mc.c index 0947f61f0..fdaa22cb7 100644 --- a/collectors/proc.plugin/sys_devices_system_edac_mc.c +++ b/collectors/proc.plugin/sys_devices_system_edac_mc.c @@ -265,22 +265,22 @@ int do_proc_sys_devices_system_edac_mc(int update_every, usec_t dt __maybe_unuse char buffer[1024 + 1]; - if(read_edac_mc_rank_file(m->name, d->name, "dimm_dev_type", buffer, 1024)) + if (read_edac_mc_rank_file(m->name, d->name, "dimm_dev_type", buffer, 1024)) rrdlabels_add(d->st->rrdlabels, "dimm_dev_type", buffer, RRDLABEL_SRC_AUTO); - if(read_edac_mc_rank_file(m->name, d->name, "dimm_edac_mode", buffer, 1024)) + if (read_edac_mc_rank_file(m->name, d->name, "dimm_edac_mode", buffer, 1024)) rrdlabels_add(d->st->rrdlabels, "dimm_edac_mode", buffer, RRDLABEL_SRC_AUTO); - if(read_edac_mc_rank_file(m->name, d->name, "dimm_label", buffer, 1024)) + if (read_edac_mc_rank_file(m->name, d->name, "dimm_label", buffer, 1024)) rrdlabels_add(d->st->rrdlabels, "dimm_label", buffer, RRDLABEL_SRC_AUTO); - if(read_edac_mc_rank_file(m->name, d->name, "dimm_location", buffer, 1024)) + if (read_edac_mc_rank_file(m->name, d->name, "dimm_location", buffer, 1024)) rrdlabels_add(d->st->rrdlabels, "dimm_location", buffer, RRDLABEL_SRC_AUTO); - if(read_edac_mc_rank_file(m->name, d->name, "dimm_mem_type", buffer, 1024)) + if (read_edac_mc_rank_file(m->name, d->name, "dimm_mem_type", buffer, 1024)) rrdlabels_add(d->st->rrdlabels, "dimm_mem_type", buffer, RRDLABEL_SRC_AUTO); - if(read_edac_mc_rank_file(m->name, d->name, "size", buffer, 1024)) + if (read_edac_mc_rank_file(m->name, d->name, 
"size", buffer, 1024)) rrdlabels_add(d->st->rrdlabels, "size", buffer, RRDLABEL_SRC_AUTO); d->ce.rd = rrddim_add(d->st, "correctable", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL); diff --git a/collectors/python.d.plugin/adaptec_raid/README.md b/collectors/python.d.plugin/adaptec_raid/README.md index 41d5b62e0..97a103eb9 100644..120000 --- a/collectors/python.d.plugin/adaptec_raid/README.md +++ b/collectors/python.d.plugin/adaptec_raid/README.md @@ -1,103 +1 @@ -<!-- -title: "Adaptec RAID controller monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/adaptec_raid/README.md" -sidebar_label: "Adaptec RAID" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Hardware" ---> - -# Adaptec RAID controller collector - -Collects logical and physical devices metrics using `arcconf` command-line utility. - -Executed commands: - -- `sudo -n arcconf GETCONFIG 1 LD` -- `sudo -n arcconf GETCONFIG 1 PD` - -## Requirements - -The module uses `arcconf`, which can only be executed by `root`. It uses -`sudo` and assumes that it is configured such that the `netdata` user can execute `arcconf` as root without a password. - -- Add to your `/etc/sudoers` file: - -`which arcconf` shows the full path to the binary. - -```bash -netdata ALL=(root) NOPASSWD: /path/to/arcconf -``` - -- Reset Netdata's systemd - unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux - distributions with systemd) - -The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `arcconf` using `sudo`. 
- - -As the `root` user, do the following: - -```cmd -mkdir /etc/systemd/system/netdata.service.d -echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf -systemctl daemon-reload -systemctl restart netdata.service -``` - -## Charts - -- Logical Device Status -- Physical Device State -- Physical Device S.M.A.R.T warnings -- Physical Device Temperature - -## Enable the collector - -The `adaptec_raid` collector is disabled by default. To enable it, use `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` -file. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d.conf -``` - -Change the value of the `adaptec_raid` setting to `yes`. Save the file and restart the Netdata Agent with `sudo -systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -## Configuration - -Edit the `python.d/adaptec_raid.conf` configuration file using `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/adaptec_raid.conf -``` - -![image](https://user-images.githubusercontent.com/22274335/47278133-6d306680-d601-11e8-87c2-cc9c0f42d686.png) - - - - -### Troubleshooting - -To troubleshoot issues with the `adaptec_raid` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. 
-
-First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's
-not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the
-plugin's directory, switch to the `netdata` user.
-
-```bash
-cd /usr/libexec/netdata/plugins.d/
-sudo su -s /bin/bash netdata
-```
-
-Now you can manually run the `adaptec_raid` module in debug mode:
-
-```bash
-./python.d.plugin adaptec_raid debug trace
-```
-
+integrations/adaptecraid.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/adaptec_raid/integrations/adaptecraid.md b/collectors/python.d.plugin/adaptec_raid/integrations/adaptecraid.md new file mode 100644 index 000000000..59e359d0d --- /dev/null +++ b/collectors/python.d.plugin/adaptec_raid/integrations/adaptecraid.md @@ -0,0 +1,203 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/adaptec_raid/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/adaptec_raid/metadata.yaml" +sidebar_label: "AdaptecRAID" +learn_status: "Published" +learn_rel_path: "Data Collection/Storage, Mount Points and Filesystems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# AdaptecRAID + + +<img src="https://netdata.cloud/img/adaptec.svg" width="150"/> + + +Plugin: python.d.plugin +Module: adaptec_raid + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors Adaptec RAID hardware storage controller metrics about both physical and logical drives. + + +It uses the arcconf command line utility (from adaptec) to monitor your raid controller. + +Executed commands: + - `sudo -n arcconf GETCONFIG 1 LD` + - `sudo -n arcconf GETCONFIG 1 PD` + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + +The module uses arcconf, which can only be executed by root. It uses sudo and assumes that it is configured such that the netdata user can execute arcconf as root without a password. + +### Default Behavior + +#### Auto-Detection + +After all the permissions are satisfied, netdata should be to execute commands via the arcconf command line utility + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. 
+ +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per AdaptecRAID instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| adaptec_raid.ld_status | a dimension per logical device | bool | +| adaptec_raid.pd_state | a dimension per physical device | bool | +| adaptec_raid.smart_warnings | a dimension per physical device | count | +| adaptec_raid.temperature | a dimension per physical device | celsius | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ adaptec_raid_ld_status ](https://github.com/netdata/netdata/blob/master/health/health.d/adaptec_raid.conf) | adaptec_raid.ld_status | logical device status is failed or degraded | +| [ adaptec_raid_pd_state ](https://github.com/netdata/netdata/blob/master/health/health.d/adaptec_raid.conf) | adaptec_raid.pd_state | physical device state is not online | + + +## Setup + +### Prerequisites + +#### Grant permissions for netdata, to run arcconf as sudoer + +The module uses arcconf, which can only be executed by root. It uses sudo and assumes that it is configured such that the netdata user can execute arcconf as root without a password. + +Add to your /etc/sudoers file: +which arcconf shows the full path to the binary. + +```bash +netdata ALL=(root) NOPASSWD: /path/to/arcconf +``` + + +#### Reset Netdata's systemd unit CapabilityBoundingSet (Linux distributions with systemd) + +The default CapabilityBoundingSet doesn't allow using sudo, and is quite strict in general. 
Resetting is not optimal, but a next-best solution given the inability to execute arcconf using sudo. + +As root user, do the following: + +```bash +mkdir /etc/systemd/system/netdata.service.d +echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf +systemctl daemon-reload +systemctl restart netdata.service +``` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/adaptec_raid.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/adaptec_raid.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. 
| yes | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration per job + +```yaml +job_name: + name: my_job_name + update_every: 1 # the JOB's data collection frequency + priority: 60000 # the JOB's order on the dashboard + penalty: yes # the JOB's penalty + autodetection_retry: 0 # the JOB's re-check interval in seconds + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `adaptec_raid` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin adaptec_raid debug trace + ``` + + diff --git a/collectors/python.d.plugin/adaptec_raid/metadata.yaml b/collectors/python.d.plugin/adaptec_raid/metadata.yaml index 7ee4ce7c2..c69baff4a 100644 --- a/collectors/python.d.plugin/adaptec_raid/metadata.yaml +++ b/collectors/python.d.plugin/adaptec_raid/metadata.yaml @@ -27,8 +27,8 @@ modules: It uses the arcconf command line utility (from adaptec) to monitor your raid controller. 
Executed commands: - - sudo -n arcconf GETCONFIG 1 LD - - sudo -n arcconf GETCONFIG 1 PD + - `sudo -n arcconf GETCONFIG 1 LD` + - `sudo -n arcconf GETCONFIG 1 PD` supported_platforms: include: [] exclude: [] diff --git a/collectors/python.d.plugin/alarms/README.md b/collectors/python.d.plugin/alarms/README.md index 0f956b291..85759ae6c 100644..120000 --- a/collectors/python.d.plugin/alarms/README.md +++ b/collectors/python.d.plugin/alarms/README.md @@ -1,89 +1 @@ -<!-- -title: "Alarms" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/alarms/README.md" -sidebar_label: "Alarms" -learn_status: "Published" -learn_rel_path: "Integrations/Monitor/Netdata" ---> - -# Alarms - -This collector creates an 'Alarms' menu with one line plot showing alarm states over time. Alarm states are mapped to integer values according to the below default mapping. Any alarm status types not in this mapping will be ignored (Note: This mapping can be changed by editing the `status_map` in the `alarms.conf` file). If you would like to learn more about the different alarm statuses check out the docs [here](https://github.com/netdata/netdata/blob/master/health/REFERENCE.md#alarm-statuses). - -``` -{ - 'CLEAR': 0, - 'WARNING': 1, - 'CRITICAL': 2 -} -``` - -## Charts - -Below is an example of the chart produced when running `stress-ng --all 2` for a few minutes. You can see the various warning and critical alarms raised. - -![alarms collector](https://user-images.githubusercontent.com/1153921/101641493-0b086a80-39ef-11eb-9f55-0713e5dfb19f.png) - -## Configuration - -Enable the collector and [restart Netdata](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md). 
- -```bash -cd /etc/netdata/ -sudo ./edit-config python.d.conf -# Set `alarms: no` to `alarms: yes` -sudo systemctl restart netdata -``` - -If needed, edit the `python.d/alarms.conf` configuration file using `edit-config` from the your agent's [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is usually at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/alarms.conf -``` - -The `alarms` specific part of the `alarms.conf` file should look like this: - -```yaml -# what url to pull data from -local: - url: 'http://127.0.0.1:19999/api/v1/alarms?all' - # define how to map alarm status to numbers for the chart - status_map: - CLEAR: 0 - WARNING: 1 - CRITICAL: 2 - # set to true to include a chart with calculated alarm values over time - collect_alarm_values: false - # define the type of chart for plotting status over time e.g. 'line' or 'stacked' - alarm_status_chart_type: 'line' - # a "," separated list of words you want to filter alarm names for. For example 'cpu,load' would filter for only - # alarms with "cpu" or "load" in alarm name. Default includes all. - alarm_contains_words: '' - # a "," separated list of words you want to exclude based on alarm name. For example 'cpu,load' would exclude - # all alarms with "cpu" or "load" in alarm name. Default excludes None. - alarm_excludes_words: '' -``` - -It will default to pulling all alarms at each time step from the Netdata rest api at `http://127.0.0.1:19999/api/v1/alarms?all` -### Troubleshooting - -To troubleshoot issues with the `alarms` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. 
If that's
-not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the
-plugin's directory, switch to the `netdata` user.
-
-```bash
-cd /usr/libexec/netdata/plugins.d/
-sudo su -s /bin/bash netdata
-```
-
-Now you can manually run the `alarms` module in debug mode:
-
-```bash
-./python.d.plugin alarms debug trace
-```
-
+integrations/netdata_agent_alarms.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/alarms/integrations/netdata_agent_alarms.md b/collectors/python.d.plugin/alarms/integrations/netdata_agent_alarms.md new file mode 100644 index 000000000..95e4a4a3b --- /dev/null +++ b/collectors/python.d.plugin/alarms/integrations/netdata_agent_alarms.md @@ -0,0 +1,200 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/alarms/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/alarms/metadata.yaml" +sidebar_label: "Netdata Agent alarms" +learn_status: "Published" +learn_rel_path: "Data Collection/Other" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Netdata Agent alarms + +Plugin: python.d.plugin +Module: alarms + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector creates an 'Alarms' menu with one line plot of `alarms.status`. + + +Alarm status is read from the Netdata agent rest api [`/api/v1/alarms?all`](https://learn.netdata.cloud/api#/alerts/alerts1). + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +It discovers instances of Netdata running on localhost, and gathers metrics from `http://127.0.0.1:19999/api/v1/alarms?all`. `CLEAR` status is mapped to `0`, `WARNING` to `1` and `CRITICAL` to `2`. Also, by default all alarms produced will be monitored. + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. 
+ +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Netdata Agent alarms instance + +These metrics refer to the entire monitored application. + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| alarms.status | a dimension per alarm representing the latest status of the alarm. | status | +| alarms.values | a dimension per alarm representing the latest collected value of the alarm. | value | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/alarms.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/alarms.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| url | Netdata agent alarms endpoint to collect from. Can be local or remote so long as reachable by agent. 
| http://127.0.0.1:19999/api/v1/alarms?all | True | +| status_map | Mapping of alarm status to integer number that will be the metric value collected. | {"CLEAR": 0, "WARNING": 1, "CRITICAL": 2} | True | +| collect_alarm_values | set to true to include a chart with calculated alarm values over time. | False | True | +| alarm_status_chart_type | define the type of chart for plotting status over time e.g. 'line' or 'stacked'. | line | True | +| alarm_contains_words | A "," separated list of words you want to filter alarm names for. For example 'cpu,load' would filter for only alarms with "cpu" or "load" in alarm name. Default includes all. | | True | +| alarm_excludes_words | A "," separated list of words you want to exclude based on alarm name. For example 'cpu,load' would exclude all alarms with "cpu" or "load" in alarm name. Default excludes None. | | True | +| update_every | Sets the default data collection frequency. | 10 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration. + +```yaml +jobs: + url: 'http://127.0.0.1:19999/api/v1/alarms?all' + +``` +##### Advanced + +An advanced example configuration with multiple jobs collecting different subsets of alarms for plotting on different charts. +"ML" job will collect status and values for all alarms with "ml_" in the name. Default job will collect status for all other alarms. 
+ + +<details><summary>Config</summary> + +```yaml +ML: + update_every: 5 + url: 'http://127.0.0.1:19999/api/v1/alarms?all' + status_map: + CLEAR: 0 + WARNING: 1 + CRITICAL: 2 + collect_alarm_values: true + alarm_status_chart_type: 'stacked' + alarm_contains_words: 'ml_' + +Default: + update_every: 5 + url: 'http://127.0.0.1:19999/api/v1/alarms?all' + status_map: + CLEAR: 0 + WARNING: 1 + CRITICAL: 2 + collect_alarm_values: false + alarm_status_chart_type: 'stacked' + alarm_excludes_words: 'ml_' + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `alarms` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin alarms debug trace + ``` + + diff --git a/collectors/python.d.plugin/am2320/README.md b/collectors/python.d.plugin/am2320/README.md index b8a6acb0b..0bc5ea90e 100644..120000 --- a/collectors/python.d.plugin/am2320/README.md +++ b/collectors/python.d.plugin/am2320/README.md @@ -1,76 +1 @@ -<!-- -title: "AM2320 sensor monitoring with netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/am2320/README.md" -sidebar_label: "AM2320" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Remotes/Devices" ---> - -# AM2320 sensor monitoring with netdata - -Displays a graph of the temperature and humidity from a AM2320 sensor. 
- -## Requirements - - Adafruit Circuit Python AM2320 library - - Adafruit AM2320 I2C sensor - - Python 3 (Adafruit libraries are not Python 2.x compatible) - - -It produces the following charts: -1. **Temperature** -2. **Humidity** - -## Configuration - -Edit the `python.d/am2320.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/am2320.conf -``` - -Raspberry Pi Instructions: - -Hardware install: -Connect the am2320 to the Raspberry Pi I2C pins - -Raspberry Pi 3B/4 Pins: - -- Board 3.3V (pin 1) to sensor VIN (pin 1) -- Board SDA (pin 3) to sensor SDA (pin 2) -- Board GND (pin 6) to sensor GND (pin 3) -- Board SCL (pin 5) to sensor SCL (pin 4) - -You may also need to add two I2C pullup resistors if your board does not already have them. The Raspberry Pi does have internal pullup resistors but it doesn't hurt to add them anyway. You can use 2.2K - 10K but we will just use 10K. The resistors go from VDD to SCL and SDA each. - -Software install: -- `sudo pip3 install adafruit-circuitpython-am2320` -- edit `/etc/netdata/netdata.conf` -- find `[plugin:python.d]` -- add `command options = -ppython3` -- save the file. -- restart the netdata service. -- check the dashboard. - -### Troubleshooting - -To troubleshoot issues with the `am2320` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. 
- -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `am2320` module in debug mode: - -```bash -./python.d.plugin am2320 debug trace -``` - +integrations/am2320.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/am2320/integrations/am2320.md b/collectors/python.d.plugin/am2320/integrations/am2320.md new file mode 100644 index 000000000..9b41a8fd6 --- /dev/null +++ b/collectors/python.d.plugin/am2320/integrations/am2320.md @@ -0,0 +1,180 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/am2320/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/am2320/metadata.yaml" +sidebar_label: "AM2320" +learn_status: "Published" +learn_rel_path: "Data Collection/Hardware Devices and Sensors" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# AM2320 + + +<img src="https://netdata.cloud/img/microchip.svg" width="150"/> + + +Plugin: python.d.plugin +Module: am2320 + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors AM2320 sensor metrics about temperature and humidity. + +It retrieves temperature and humidity values by contacting an AM2320 sensor over i2c. + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +Assuming prerequisites are met, the collector will try to connect to the sensor via i2c + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per AM2320 instance + +These metrics refer to the entire monitored application. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| am2320.temperature | temperature | celsius | +| am2320.humidity | humidity | percentage | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Sensor connection to a Raspberry Pi + +Connect the am2320 to the Raspberry Pi I2C pins + +Raspberry Pi 3B/4 Pins: + +- Board 3.3V (pin 1) to sensor VIN (pin 1) +- Board SDA (pin 3) to sensor SDA (pin 2) +- Board GND (pin 6) to sensor GND (pin 3) +- Board SCL (pin 5) to sensor SCL (pin 4) + +You may also need to add two I2C pullup resistors if your board does not already have them. The Raspberry Pi does have internal pullup resistors but it doesn't hurt to add them anyway. You can use 2.2K - 10K but we will just use 10K. The resistors go from VDD to SCL and SDA each. + + +#### Software requirements + +Install the Adafruit Circuit Python AM2320 library: + +`sudo pip3 install adafruit-circuitpython-am2320` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/am2320.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/am2320.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. 
+ +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | + +</details> + +#### Examples + +##### Local sensor + +A basic JOB configuration + +```yaml +local_sensor: + name: 'Local AM2320' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `am2320` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin am2320 debug trace + ``` + + diff --git a/collectors/python.d.plugin/beanstalk/README.md b/collectors/python.d.plugin/beanstalk/README.md index c86ca354a..4efe13889 100644..120000 --- a/collectors/python.d.plugin/beanstalk/README.md +++ b/collectors/python.d.plugin/beanstalk/README.md @@ -1,156 +1 @@ -<!-- -title: "Beanstalk monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/beanstalk/README.md" -sidebar_label: "Beanstalk" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Message brokers" ---> - -# Beanstalk collector - -Provides server and tube-level statistics. - -## Requirements - -- `python-beanstalkc` - -**Server statistics:** - -1. **Cpu usage** in cpu time - - - user - - system - -2. **Jobs rate** in jobs/s - - - total - - timeouts - -3. **Connections rate** in connections/s - - - connections - -4. **Commands rate** in commands/s - - - put - - peek - - peek-ready - - peek-delayed - - peek-buried - - reserve - - use - - watch - - ignore - - delete - - release - - bury - - kick - - stats - - stats-job - - stats-tube - - list-tubes - - list-tube-used - - list-tubes-watched - - pause-tube - -5. **Current tubes** in tubes - - - tubes - -6. **Current jobs** in jobs - - - urgent - - ready - - reserved - - delayed - - buried - -7. **Current connections** in connections - - - written - - producers - - workers - - waiting - -8. **Binlog** in records/s - - - written - - migrated - -9. **Uptime** in seconds - - - uptime - -**Per tube statistics:** - -1. **Jobs rate** in jobs/s - - - jobs - -2. **Jobs** in jobs - - - using - - ready - - reserved - - delayed - - buried - -3. **Connections** in connections - - - using - - waiting - - watching - -4. **Commands** in commands/s - - - deletes - - pauses - -5. 
**Pause** in seconds - - - since - - left - -## Configuration - -Edit the `python.d/beanstalk.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/beanstalk.conf -``` - -Sample: - -```yaml -host : '127.0.0.1' -port : 11300 -``` - -If no configuration is given, module will attempt to connect to beanstalkd on `127.0.0.1:11300` address - - - - -### Troubleshooting - -To troubleshoot issues with the `beanstalk` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `beanstalk` module in debug mode: - -```bash -./python.d.plugin beanstalk debug trace -``` - +integrations/beanstalk.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/beanstalk/integrations/beanstalk.md b/collectors/python.d.plugin/beanstalk/integrations/beanstalk.md new file mode 100644 index 000000000..cf2f0dac1 --- /dev/null +++ b/collectors/python.d.plugin/beanstalk/integrations/beanstalk.md @@ -0,0 +1,218 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/beanstalk/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/beanstalk/metadata.yaml" +sidebar_label: "Beanstalk" +learn_status: "Published" +learn_rel_path: "Data Collection/Message Brokers" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Beanstalk + + +<img src="https://netdata.cloud/img/beanstalk.svg" width="150"/> + + +Plugin: python.d.plugin +Module: beanstalk + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Beanstalk metrics to enhance job queueing and processing efficiency. Track job rates, processing times, and queue lengths for better task management. + +The collector uses the `beanstalkc` python module to connect to a `beanstalkd` service and gather metrics. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +If no configuration is given, module will attempt to connect to beanstalkd on 127.0.0.1:11300 address. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels. + + + +### Per Beanstalk instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| beanstalk.cpu_usage | user, system | cpu time | +| beanstalk.jobs_rate | total, timeouts | jobs/s | +| beanstalk.connections_rate | connections | connections/s | +| beanstalk.commands_rate | put, peek, peek-ready, peek-delayed, peek-buried, reserve, use, watch, ignore, delete, bury, kick, stats, stats-job, stats-tube, list-tubes, list-tube-used, list-tubes-watched, pause-tube | commands/s | +| beanstalk.current_tubes | tubes | tubes | +| beanstalk.current_jobs | urgent, ready, reserved, delayed, buried | jobs | +| beanstalk.current_connections | written, producers, workers, waiting | connections | +| beanstalk.binlog | written, migrated | records/s | +| beanstalk.uptime | uptime | seconds | + +### Per tube + +Metrics related to Beanstalk tubes. Each tube produces its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| beanstalk.jobs_rate | jobs | jobs/s | +| beanstalk.jobs | urgent, ready, reserved, delayed, buried | jobs | +| beanstalk.connections | using, waiting, watching | connections | +| beanstalk.commands | deletes, pauses | commands/s | +| beanstalk.pause | since, left | seconds | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ beanstalk_server_buried_jobs ](https://github.com/netdata/netdata/blob/master/health/health.d/beanstalkd.conf) | beanstalk.current_jobs | number of buried jobs across all tubes. You need to manually kick them so they can be processed. Presence of buried jobs in a tube does not affect new jobs.
| + + +## Setup + +### Prerequisites + +#### beanstalkc python module + +The collector requires the `beanstalkc` python module to be installed. + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/beanstalk.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/beanstalk.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| host | IP or URL to a beanstalk service. 
| 127.0.0.1 | False | +| port | Port to the IP or URL to a beanstalk service. | 11300 | False | + +</details> + +#### Examples + +##### Remote beanstalk server + +A basic remote beanstalk server + +```yaml +remote: + name: 'beanstalk' + host: '1.2.3.4' + port: 11300 + +``` +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local_beanstalk' + host: '127.0.0.1' + port: 11300 + +remote_job: + name: 'remote_beanstalk' + host: '192.0.2.1' + port: 11300 + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `beanstalk` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user.
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin beanstalk debug trace + ``` + + diff --git a/collectors/python.d.plugin/beanstalk/metadata.yaml b/collectors/python.d.plugin/beanstalk/metadata.yaml index b6ff2f116..7dff9cb3a 100644 --- a/collectors/python.d.plugin/beanstalk/metadata.yaml +++ b/collectors/python.d.plugin/beanstalk/metadata.yaml @@ -8,7 +8,7 @@ modules: link: "https://beanstalkd.github.io/" categories: - data-collection.message-brokers - - data-collection.task-queues + #- data-collection.task-queues icon_filename: "beanstalk.svg" related_resources: integrations: diff --git a/collectors/python.d.plugin/bind_rndc/README.md b/collectors/python.d.plugin/bind_rndc/README.md index aa173f385..03a182ae8 100644..120000 --- a/collectors/python.d.plugin/bind_rndc/README.md +++ b/collectors/python.d.plugin/bind_rndc/README.md @@ -1,102 +1 @@ -<!-- -title: "ISC Bind monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/bind_rndc/README.md" -sidebar_label: "ISC Bind" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Webapps" ---> - -# ISC Bind collector - -Collects Name server summary performance statistics using `rndc` tool. - -## Requirements - -- Version of bind must be 9.6 + -- Netdata must have permissions to run `rndc stats` - -It produces: - -1. **Name server statistics** - - - requests - - responses - - success - - auth_answer - - nonauth_answer - - nxrrset - - failure - - nxdomain - - recursion - - duplicate - - rejections - -2. **Incoming queries** - - - RESERVED0 - - A - - NS - - CNAME - - SOA - - PTR - - MX - - TXT - - X25 - - AAAA - - SRV - - NAPTR - - A6 - - DS - - RSIG - - DNSKEY - - SPF - - ANY - - DLV - -3. 
**Outgoing queries** - -- Same as Incoming queries - -## Configuration - -Edit the `python.d/bind_rndc.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/bind_rndc.conf -``` - -Sample: - -```yaml -local: - named_stats_path : '/var/log/bind/named.stats' -``` - -If no configuration is given, module will attempt to read named.stats file at `/var/log/bind/named.stats` - - - - -### Troubleshooting - -To troubleshoot issues with the `bind_rndc` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `bind_rndc` module in debug mode: - -```bash -./python.d.plugin bind_rndc debug trace -``` - +integrations/isc_bind_rndc.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/bind_rndc/integrations/isc_bind_rndc.md b/collectors/python.d.plugin/bind_rndc/integrations/isc_bind_rndc.md new file mode 100644 index 000000000..cc847272d --- /dev/null +++ b/collectors/python.d.plugin/bind_rndc/integrations/isc_bind_rndc.md @@ -0,0 +1,214 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/bind_rndc/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/bind_rndc/metadata.yaml" +sidebar_label: "ISC Bind (RNDC)" +learn_status: "Published" +learn_rel_path: "Data Collection/DNS and DHCP Servers" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# ISC Bind (RNDC) + + +<img src="https://netdata.cloud/img/isc.png" width="150"/> + + +Plugin: python.d.plugin +Module: bind_rndc + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor ISCBind (RNDC) performance for optimal DNS server operations. Monitor query rates, response times, and error rates to ensure reliable DNS service delivery. + +This collector uses the `rndc` tool to dump (named.stats) statistics then read them to gather Bind Name Server summary performance metrics. + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +If no configuration is given, the collector will attempt to read named.stats file at `/var/log/bind/named.stats` + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. 
+ +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per ISC Bind (RNDC) instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| bind_rndc.name_server_statistics | requests, rejected_queries, success, failure, responses, duplicate, recursion, nxrrset, nxdomain, non_auth_answer, auth_answer, dropped_queries | stats | +| bind_rndc.incoming_queries | a dimension per incoming query type | queries | +| bind_rndc.outgoing_queries | a dimension per outgoing query type | queries | +| bind_rndc.stats_size | stats_size | MiB | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ bind_rndc_stats_file_size ](https://github.com/netdata/netdata/blob/master/health/health.d/bind_rndc.conf) | bind_rndc.stats_size | BIND statistics-file size | + + +## Setup + +### Prerequisites + +#### Minimum bind version and permissions + +Version of bind must be >=9.6 and the Netdata user must have permissions to run `rndc stats` + +#### Setup log rotate for bind stats + +BIND appends logs at EVERY RUN. It is NOT RECOMMENDED to set `update_every` below 30 sec. +It is STRONGLY RECOMMENDED to create a `bind-rndc.conf` file for logrotate. + +To set up BIND to dump stats do the following: + +1. Add to 'named.conf.options' options {}: +`statistics-file "/var/log/bind/named.stats";` + +2. Create bind/ directory in /var/log: +`cd /var/log/ && mkdir bind` + +3. Change owner of directory to 'bind' user: +`chown bind bind/` + +4. RELOAD (NOT restart) BIND: +`systemctl reload bind9.service` + +5. 
Run as a root 'rndc stats' to dump (BIND will create named.stats in new directory) + +To allow Netdata to run 'rndc stats' change '/etc/bind/rndc.key' group to netdata: +`chown :netdata rndc.key` + +Last, BUT NOT least, is to create bind-rndc.conf in logrotate.d/: +``` +/var/log/bind/named.stats { + + daily + rotate 4 + compress + delaycompress + create 0644 bind bind + missingok + postrotate + rndc reload > /dev/null + endscript +} +``` +To test your logrotate conf file run as root: +`logrotate /etc/logrotate.d/bind-rndc -d (debug dry-run mode)` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/bind_rndc.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/bind_rndc.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. 
| 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| named_stats_path | Path to the named stats file, after it has been dumped by `rndc` | /var/log/bind/named.stats | False | + +</details> + +#### Examples + +##### Local bind stats + +Define a local path to bind stats file + +```yaml +local: + named_stats_path: '/var/log/bind/named.stats' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `bind_rndc` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user.
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin bind_rndc debug trace + ``` + + diff --git a/collectors/python.d.plugin/bind_rndc/metadata.yaml b/collectors/python.d.plugin/bind_rndc/metadata.yaml index 1e9fb24fe..e3568e448 100644 --- a/collectors/python.d.plugin/bind_rndc/metadata.yaml +++ b/collectors/python.d.plugin/bind_rndc/metadata.yaml @@ -4,7 +4,7 @@ modules: plugin_name: python.d.plugin module_name: bind_rndc monitored_instance: - name: ISCBind (RNDC) + name: ISC Bind (RNDC) link: "https://www.isc.org/bind/" categories: - data-collection.dns-and-dhcp-servers diff --git a/collectors/python.d.plugin/boinc/README.md b/collectors/python.d.plugin/boinc/README.md index ea4397754..22c10ca17 100644..120000 --- a/collectors/python.d.plugin/boinc/README.md +++ b/collectors/python.d.plugin/boinc/README.md @@ -1,64 +1 @@ -<!-- -title: "BOINC monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/boinc/README.md" -sidebar_label: "BOINC" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Distributed computing" ---> - -# BOINC collector - -Monitors task counts for the Berkeley Open Infrastructure Networking Computing (BOINC) distributed computing client using the same RPC interface that the BOINC monitoring GUI does. - -It provides charts tracking the total number of tasks and active tasks, as well as ones tracking each of the possible states for tasks. - -## Configuration - -Edit the `python.d/boinc.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. 
- -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/boinc.conf -``` - -BOINC requires use of a password to access it's RPC interface. You can -find this password in the `gui_rpc_auth.cfg` file in your BOINC directory. - -By default, the module will try to auto-detect the password by looking -in `/var/lib/boinc` for this file (this is the location most Linux -distributions use for a system-wide BOINC installation), so things may -just work without needing configuration for the local system. - -You can monitor remote systems as well: - -```yaml -remote: - hostname: some-host - password: some-password -``` - - - - -### Troubleshooting - -To troubleshoot issues with the `boinc` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `boinc` module in debug mode: - -```bash -./python.d.plugin boinc debug trace -``` - +integrations/boinc.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/boinc/integrations/boinc.md b/collectors/python.d.plugin/boinc/integrations/boinc.md new file mode 100644 index 000000000..961f79537 --- /dev/null +++ b/collectors/python.d.plugin/boinc/integrations/boinc.md @@ -0,0 +1,203 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/boinc/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/boinc/metadata.yaml" +sidebar_label: "BOINC" +learn_status: "Published" +learn_rel_path: "Data Collection/Distributed Computing Systems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# BOINC + + +<img src="https://netdata.cloud/img/bolt.svg" width="150"/> + + +Plugin: python.d.plugin +Module: boinc + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors task counts for the Berkeley Open Infrastructure Networking Computing (BOINC) distributed computing client. + +It uses the same RPC interface that the BOINC monitoring GUI does. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +By default, the module will try to auto-detect the password to the RPC interface by looking in `/var/lib/boinc` for this file (this is the location most Linux distributions use for a system-wide BOINC installation), so things may just work without needing configuration for a local system. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. 
+ +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per BOINC instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| boinc.tasks | Total, Active | tasks | +| boinc.states | New, Downloading, Ready to Run, Compute Errors, Uploading, Uploaded, Aborted, Failed Uploads | tasks | +| boinc.sched | Uninitialized, Preempted, Scheduled | tasks | +| boinc.process | Uninitialized, Executing, Suspended, Aborted, Quit, Copy Pending | tasks | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ boinc_total_tasks ](https://github.com/netdata/netdata/blob/master/health/health.d/boinc.conf) | boinc.tasks | average number of total tasks over the last 10 minutes | +| [ boinc_active_tasks ](https://github.com/netdata/netdata/blob/master/health/health.d/boinc.conf) | boinc.tasks | average number of active tasks over the last 10 minutes | +| [ boinc_compute_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/boinc.conf) | boinc.states | average number of compute errors over the last 10 minutes | +| [ boinc_upload_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/boinc.conf) | boinc.states | average number of failed uploads over the last 10 minutes | + + +## Setup + +### Prerequisites + +#### Boinc RPC interface + +BOINC requires a password to access its RPC interface. You can find this password in the `gui_rpc_auth.cfg` file in your BOINC directory. + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/boinc.conf`.
+ + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/boinc.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| hostname | Define a hostname where boinc is running. | localhost | False | +| port | The port of boinc RPC interface. | | False | +| password | Provide a password to connect to a boinc RPC interface. 
| | False | + +</details> + +#### Examples + +##### Configuration of a remote boinc instance + +A basic JOB configuration for a remote boinc instance + +```yaml +remote: + hostname: '1.2.3.4' + port: 1234 + password: 'some-password' + +``` +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local' + host: '127.0.0.1' + port: 1234 + password: 'some-password' + +remote_job: + name: 'remote' + host: '192.0.2.1' + port: 1234 + password: some-other-password + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `boinc` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin boinc debug trace + ``` + + diff --git a/collectors/python.d.plugin/ceph/README.md b/collectors/python.d.plugin/ceph/README.md index 555491ad7..654248b70 100644..120000 --- a/collectors/python.d.plugin/ceph/README.md +++ b/collectors/python.d.plugin/ceph/README.md @@ -1,71 +1 @@ -<!-- -title: "CEPH monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/ceph/README.md" -sidebar_label: "CEPH" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Storage" ---> - -# CEPH collector - -Monitors the ceph cluster usage and consumption data of a server, and produces: - -- Cluster statistics (usage, available, latency, objects, read/write rate) -- OSD usage -- OSD latency -- Pool usage -- Pool read/write operations -- Pool read/write rate -- number of objects per pool - -## Requirements - -- `rados` python module -- Granting read permissions to ceph group from keyring file - -```shell -# chmod 640 /etc/ceph/ceph.client.admin.keyring -``` - -## Configuration - -Edit the `python.d/ceph.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/ceph.conf -``` - -Sample: - -```yaml -local: - config_file: '/etc/ceph/ceph.conf' - keyring_file: '/etc/ceph/ceph.client.admin.keyring' -``` - - - - -### Troubleshooting - -To troubleshoot issues with the `ceph` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. 
- -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `ceph` module in debug mode: - -```bash -./python.d.plugin ceph debug trace -``` - +integrations/ceph.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/ceph/integrations/ceph.md b/collectors/python.d.plugin/ceph/integrations/ceph.md new file mode 100644 index 000000000..051121148 --- /dev/null +++ b/collectors/python.d.plugin/ceph/integrations/ceph.md @@ -0,0 +1,193 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/ceph/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/ceph/metadata.yaml" +sidebar_label: "Ceph" +learn_status: "Published" +learn_rel_path: "Data Collection/Storage, Mount Points and Filesystems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Ceph + + +<img src="https://netdata.cloud/img/ceph.svg" width="150"/> + + +Plugin: python.d.plugin +Module: ceph + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors Ceph metrics about Cluster statistics, OSD usage, latency and Pool statistics. + +Uses the `rados` python module to connect to a Ceph cluster. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Ceph instance + +These metrics refer to the entire monitored application. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ceph.general_usage | avail, used | KiB | +| ceph.general_objects | cluster | objects | +| ceph.general_bytes | read, write | KiB/s | +| ceph.general_operations | read, write | operations | +| ceph.general_latency | apply, commit | milliseconds | +| ceph.pool_usage | a dimension per Ceph Pool | KiB | +| ceph.pool_objects | a dimension per Ceph Pool | objects | +| ceph.pool_read_bytes | a dimension per Ceph Pool | KiB/s | +| ceph.pool_write_bytes | a dimension per Ceph Pool | KiB/s | +| ceph.pool_read_operations | a dimension per Ceph Pool | operations | +| ceph.pool_write_operations | a dimension per Ceph Pool | operations | +| ceph.osd_usage | a dimension per Ceph OSD | KiB | +| ceph.osd_size | a dimension per Ceph OSD | KiB | +| ceph.apply_latency | a dimension per Ceph OSD | milliseconds | +| ceph.commit_latency | a dimension per Ceph OSD | milliseconds | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ ceph_cluster_space_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/ceph.conf) | ceph.general_usage | cluster disk space utilization | + + +## Setup + +### Prerequisites + +#### `rados` python module + +Make sure the `rados` python module is installed + +#### Granting read permissions to ceph group from keyring file + +Execute: `chmod 640 /etc/ceph/ceph.client.admin.keyring` + +#### Create a specific rados_id + +You can optionally create a rados_id to use instead of admin + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/ceph.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). 
+ +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/ceph.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| config_file | Ceph config file | | True | +| keyring_file | Ceph keyring file. netdata user must be added into ceph group and keyring file must be read group permission. | | True | +| rados_id | A rados user id to use for connecting to the Ceph cluster. | admin | False | + +</details> + +#### Examples + +##### Basic local Ceph cluster + +A basic configuration to connect to a local Ceph cluster. 
+ +```yaml +local: + config_file: '/etc/ceph/ceph.conf' + keyring_file: '/etc/ceph/ceph.client.admin.keyring' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `ceph` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin ceph debug trace + ``` + + diff --git a/collectors/python.d.plugin/changefinder/README.md b/collectors/python.d.plugin/changefinder/README.md index 0e9bab887..0ca704eb1 100644..120000 --- a/collectors/python.d.plugin/changefinder/README.md +++ b/collectors/python.d.plugin/changefinder/README.md @@ -1,241 +1 @@ -<!-- -title: "Online change point detection with Netdata" -description: "Use ML-driven change point detection to narrow your focus and shorten root cause analysis." -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/changefinder/README.md" -sidebar_label: "changefinder" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/QoS" ---> - -# Online change point detection with Netdata - -This collector uses the Python [changefinder](https://github.com/shunsukeaihara/changefinder) library to -perform [online](https://en.wikipedia.org/wiki/Online_machine_learning) [changepoint detection](https://en.wikipedia.org/wiki/Change_detection) -on your Netdata charts and/or dimensions. 
- -Instead of this collector just _collecting_ data, it also does some computation on the data it collects to return a -changepoint score for each chart or dimension you configure it to work on. This is -an [online](https://en.wikipedia.org/wiki/Online_machine_learning) machine learning algorithm so there is no batch step -to train the model, instead it evolves over time as more data arrives. That makes this particular algorithm quite cheap -to compute at each step of data collection (see the notes section below for more details) and it should scale fairly -well to work on lots of charts or hosts (if running on a parent node for example). - -> As this is a somewhat unique collector and involves often subjective concepts like changepoints and anomalies, we would love to hear any feedback on it from the community. Please let us know on the [community forum](https://community.netdata.cloud/t/changefinder-collector-feedback/972) or drop us a note at [analytics-ml-team@netdata.cloud](mailto:analytics-ml-team@netdata.cloud) for any and all feedback, both positive and negative. This sort of feedback is priceless to help us make complex features more useful. - -## Charts - -Two charts are available: - -### ChangeFinder Scores (`changefinder.scores`) - -This chart shows the percentile of the score that is output from the ChangeFinder library (it is turned off by default -but available with `show_scores: true`). - -A high observed score is more likely to be a valid changepoint worth exploring, even more so when multiple charts or -dimensions have high changepoint scores at the same time or very close together. - -### ChangeFinder Flags (`changefinder.flags`) - -This chart shows `1` or `0` if the latest score has a percentile value that exceeds the `cf_threshold` threshold. By -default, any scores that are in the 99th or above percentile will raise a flag on this chart. 
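The flag logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the collector's actual code: the helper names `percentile_of_latest` and `make_flagger` are hypothetical, and the percentile is computed naively over an in-memory window. Only the `cf_threshold` and `n_score_samples` parameter names come from the configuration described in this README.

```python
from collections import deque


def percentile_of_latest(history, score):
    """Return the percentage of values in `history` that are <= `score`."""
    if not history:
        return 0.0
    below_or_equal = sum(1 for s in history if s <= score)
    return 100.0 * below_or_equal / len(history)


def make_flagger(cf_threshold=99, n_score_samples=14400):
    """Build a function that maps raw scores to 0/1 flags (hypothetical helper)."""
    # Only the most recent `n_score_samples` scores are kept.
    recent = deque(maxlen=n_score_samples)

    def flag(score):
        recent.append(score)
        pct = percentile_of_latest(recent, score)
        # Flag scores at or above the configured percentile.
        return 1 if pct >= cf_threshold else 0

    return flag
```

With a short history almost every new score ranks near the top of the window, so a sketch like this flags frequently at first and settles once the window fills, which is consistent with the warm-up behaviour mentioned in the notes.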
- -The raw changefinder score itself can be a little noisy and so limiting ourselves to just periods where it surpasses -the 99th percentile can help manage the "[signal to noise ratio](https://en.wikipedia.org/wiki/Signal-to-noise_ratio)" -better. - -The `cf_threshold` parameter might be one you want to play around with to tune things specifically for the workloads on -your node and the specific charts you want to monitor. For example, maybe the 95th percentile might work better for you -than the 99th percentile. - -Below is an example of the chart produced by this collector. The first 3/4 of the period looks normal in that we see a -few individual changes being picked up somewhat randomly over time. But then at around 14:59 towards the end of the -chart we see two periods with 'spikes' of multiple changes for a small period of time. This is the sort of pattern that -might be a sign something on the system that has changed sufficiently enough to merit some investigation. - -![changepoint-collector](https://user-images.githubusercontent.com/2178292/108773528-665de980-7556-11eb-895d-798669bcd695.png) - -## Requirements - -- This collector will only work with Python 3 and requires the packages below be installed. - -```bash -# become netdata user -sudo su -s /bin/bash netdata -# install required packages for the netdata user -pip3 install --user numpy==1.19.5 changefinder==0.03 scipy==1.5.4 -``` - -**Note**: if you need to tell Netdata to use Python 3 then you can pass the below command in the python plugin section -of your `netdata.conf` file. - -```yaml -[ plugin:python.d ] - # update every = 1 - command options = -ppython3 -``` - -## Configuration - -Install the Python requirements above, enable the collector and restart Netdata. 
- -```bash -cd /etc/netdata/ -sudo ./edit-config python.d.conf -# Set `changefinder: no` to `changefinder: yes` -sudo systemctl restart netdata -``` - -The configuration for the changefinder collector defines how it will behave on your system and might take some -experimentation over time to set it optimally for your node. Out of the box, the config comes with -some [sane defaults](https://www.netdata.cloud/blog/redefining-monitoring-netdata/) to get you started that try to -balance the flexibility and power of the ML models with the goal of being as cheap as possible in terms of cost on the -node resources. - -_**Note**: If you are unsure about any of the below configuration options then it's best to just ignore all this and -leave the `changefinder.conf` file alone to begin with. Then you can return to it later if you would like to tune things -a bit more once the collector is running for a while and you have a feeling for its performance on your node._ - -Edit the `python.d/changefinder.conf` configuration file using `edit-config` from your -agent's [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is usually at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/changefinder.conf -``` - -The default configuration should look something like this. Here you can see each parameter (with sane defaults) and some -information about each one and what it does. - -```yaml -# - -# JOBS (data collection sources) - -# Pull data from local Netdata node. -local: - - # A friendly name for this job. - name: 'local' - - # What host to pull data from. - host: '127.0.0.1:19999' - - # What charts to pull data for - A regex like 'system\..*|' or 'system\..*|apps.cpu|apps.mem' etc. - charts_regex: 'system\..*' - - # Charts to exclude, useful if you would like to exclude some specific charts. 
- # Note: should be a ',' separated string like 'chart.name,chart.name'. - charts_to_exclude: '' - - # Get ChangeFinder scores 'per_dim' or 'per_chart'. - mode: 'per_chart' - - # Default parameters that can be passed to the changefinder library. - cf_r: 0.5 - cf_order: 1 - cf_smooth: 15 - - # The percentile above which scores will be flagged. - cf_threshold: 99 - - # The number of recent scores to use when calculating the percentile of the changefinder score. - n_score_samples: 14400 - - # Set to true if you also want to chart the percentile scores in addition to the flags. - # Mainly useful for debugging or if you want to dive deeper on how the scores are evolving over time. - show_scores: false -``` - -## Troubleshooting - -To see any relevant log messages you can use a command like below. - -```bash -grep 'changefinder' /var/log/netdata/error.log -``` - -To see more detail, you can log in as the `netdata` user and run the collector in debug mode. - -```bash -# become netdata user -sudo su -s /bin/bash netdata -# run collector in debug using `nolock` option if netdata is already running the collector itself. -/usr/libexec/netdata/plugins.d/python.d.plugin changefinder debug trace nolock -``` - -## Notes - -- It may take an hour or two (depending on your choice of `n_score_samples`) for the collector to 'settle' into its - typical behaviour in terms of the trained models and scores you will see in the normal running of your node. Mainly - this is because it can take a while to build up a proper distribution of previous scores in order to convert the raw - score returned by the ChangeFinder algorithm into a percentile based on the most recent `n_score_samples` that have - already been produced. So when you first turn the collector on, it will have a lot of flags in the beginning and then - should 'settle down' once it has built up enough history. 
This is a typical characteristic of online machine learning - approaches which need some initial window of time before they can be useful. -- As this collector does most of the work in Python itself, you may want to try it out first on a test or development - system to get a sense of its performance characteristics on a node similar to where you would like to use it. -- On a development n1-standard-2 (2 vCPUs, 7.5 GB memory) vm running Ubuntu 18.04 LTS and not doing any work some of the - typical performance characteristics we saw from running this collector (with defaults) were: - - A runtime (`netdata.runtime_changefinder`) of ~30ms. - - Typically ~1% additional cpu usage. - - About ~85mb of ram (`apps.mem`) being continually used by the `python.d.plugin` under default configuration. - -## Useful links and further reading - -- [PyPi changefinder](https://pypi.org/project/changefinder/) reference page. -- [GitHub repo](https://github.com/shunsukeaihara/changefinder) for the changefinder library. -- Relevant academic papers: - - Yamanishi K, Takeuchi J. A unifying framework for detecting outliers and change points from nonstationary time - series data. 8th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD02. 2002: - 676. ([pdf](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.3469&rep=rep1&type=pdf)) - - Kawahara Y, Sugiyama M. Sequential Change-Point Detection Based on Direct Density-Ratio Estimation. SIAM - International Conference on Data Mining. 2009: - 389–400. ([pdf](https://onlinelibrary.wiley.com/doi/epdf/10.1002/sam.10124)) - - Liu S, Yamada M, Collier N, Sugiyama M. Change-point detection in time-series data by relative density-ratio - estimation. Neural Networks. Jul.2013 43:72–83. [PubMed: 23500502] ([pdf](https://arxiv.org/pdf/1203.0453.pdf)) - - T. Iwata, K. Nakamura, Y. Tokusashi, and H. Matsutani, “Accelerating Online Change-Point Detection Algorithm using - 10 GbE FPGA NIC,” Proc. 
International European Conference on Parallel and Distributed Computing (Euro-Par’18) - Workshops, vol.11339, pp.506–517, Aug. - 2018 ([pdf](https://www.arc.ics.keio.ac.jp/~matutani/papers/iwata_heteropar2018.pdf)) -- The [ruptures](https://github.com/deepcharles/ruptures) python package is also a good place to learn more about - changepoint detection (mostly offline as opposed to online but deals with similar concepts). -- A nice [blog post](https://techrando.com/2019/08/14/a-brief-introduction-to-change-point-detection-using-python/) - showing some of the other options and libraries for changepoint detection in Python. -- [Bayesian changepoint detection](https://github.com/hildensia/bayesian_changepoint_detection) library - we may explore - implementing a collector for this or integrating this approach into this collector at a future date if there is - interest and it proves computationaly feasible. -- You might also find the - Netdata [anomalies collector](https://github.com/netdata/netdata/tree/master/collectors/python.d.plugin/anomalies) - interesting. -- [Anomaly Detection](https://en.wikipedia.org/wiki/Anomaly_detection) wikipedia page. -- [Anomaly Detection YouTube playlist](https://www.youtube.com/playlist?list=PL6Zhl9mK2r0KxA6rB87oi4kWzoqGd5vp0) - maintained by [andrewm4894](https://github.com/andrewm4894/) from Netdata. -- [awesome-TS-anomaly-detection](https://github.com/rob-med/awesome-TS-anomaly-detection) Github list of useful tools, - libraries and resources. -- [Mendeley public group](https://www.mendeley.com/community/interesting-anomaly-detection-papers/) with some - interesting anomaly detection papers we have been reading. -- Good [blog post](https://www.anodot.com/blog/what-is-anomaly-detection/) from Anodot on time series anomaly detection. - Anodot also have some great whitepapers in this space too that some may find useful. 
-- Novelty and outlier detection in - the [scikit-learn documentation](https://scikit-learn.org/stable/modules/outlier_detection.html). - -### Troubleshooting - -To troubleshoot issues with the `changefinder` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `changefinder` module in debug mode: - -```bash -./python.d.plugin changefinder debug trace -``` - +integrations/python.d_changefinder.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/changefinder/integrations/python.d_changefinder.md b/collectors/python.d.plugin/changefinder/integrations/python.d_changefinder.md new file mode 100644 index 000000000..2265d9620 --- /dev/null +++ b/collectors/python.d.plugin/changefinder/integrations/python.d_changefinder.md @@ -0,0 +1,216 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/changefinder/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/changefinder/metadata.yaml" +sidebar_label: "python.d changefinder" +learn_status: "Published" +learn_rel_path: "Data Collection/Other" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# python.d changefinder + +Plugin: python.d.plugin +Module: changefinder + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector uses the Python [changefinder](https://github.com/shunsukeaihara/changefinder) library to +perform [online](https://en.wikipedia.org/wiki/Online_machine_learning) [changepoint detection](https://en.wikipedia.org/wiki/Change_detection) +on your Netdata charts and/or dimensions. + + +Instead of this collector just _collecting_ data, it also does some computation on the data it collects to return a changepoint score for each chart or dimension you configure it to work on. This is an [online](https://en.wikipedia.org/wiki/Online_machine_learning) machine learning algorithm so there is no batch step to train the model, instead it evolves over time as more data arrives. That makes this particular algorithm quite cheap to compute at each step of data collection (see the notes section below for more details) and it should scale fairly well to work on lots of charts or hosts (if running on a parent node for example). 
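To make the "online, no batch training step" idea above concrete, here is a small Python sketch. It is deliberately not the ChangeFinder algorithm, whose internals this document does not describe. It is a simpler stand-in that keeps an exponentially discounted mean and variance and scores each new value against them. The class name `OnlineScorer` and the forgetting factor `r` are assumptions made for illustration only.

```python
class OnlineScorer:
    """Toy online scorer: exponentially discounted mean and variance."""

    def __init__(self, r=0.05):
        self.r = r        # hypothetical forgetting factor, 0 < r < 1
        self.mean = 0.0   # running estimate of the signal level
        self.var = 1.0    # running estimate of the signal variance
        self.seen = 0     # number of samples processed so far

    def update(self, x):
        """Score `x` against the current model, then fold it into the model."""
        if self.seen == 0:
            # The first sample only initialises the model.
            self.mean = float(x)
            self.seen = 1
            return 0.0
        deviation = x - self.mean
        # Squared deviation relative to the current variance estimate.
        score = deviation * deviation / max(self.var, 1e-12)
        # Constant-time update: no past samples are stored or revisited.
        self.mean += self.r * deviation
        self.var = (1.0 - self.r) * self.var + self.r * deviation * deviation
        self.seen += 1
        return score
```

Each call does a fixed amount of arithmetic and keeps three numbers of state, which is why an approach of this shape stays cheap per collection step and can be repeated across many charts or dimensions.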
+### Notes - It may take an hour or two (depending on your choice of `n_score_samples`) for the collector to 'settle' into its + typical behaviour in terms of the trained models and scores you will see in the normal running of your node. Mainly + this is because it can take a while to build up a proper distribution of previous scores in order to convert the raw + score returned by the ChangeFinder algorithm into a percentile based on the most recent `n_score_samples` that have + already been produced. So when you first turn the collector on, it will have a lot of flags in the beginning and then + should 'settle down' once it has built up enough history. This is a typical characteristic of online machine learning + approaches which need some initial window of time before they can be useful. +- As this collector does most of the work in Python itself, you may want to try it out first on a test or development + system to get a sense of its performance characteristics on a node similar to where you would like to use it. +- On a development n1-standard-2 (2 vCPUs, 7.5 GB memory) VM running Ubuntu 18.04 LTS and not doing any work, some of the + typical performance characteristics we saw from running this collector (with defaults) were: + - A runtime (`netdata.runtime_changefinder`) of ~30ms. + - Typically ~1% additional CPU usage. + - About 85 MB of RAM (`apps.mem`) being continually used by the `python.d.plugin` under default configuration. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +By default this collector will work over all `system.*` charts. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. 
+ +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per python.d changefinder instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| changefinder.scores | a dimension per chart | score | +| changefinder.flags | a dimension per chart | flag | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Python Requirements + +This collector will only work with Python 3 and requires the packages below be installed. + +```bash +# become netdata user +sudo su -s /bin/bash netdata +# install required packages for the netdata user +pip3 install --user numpy==1.19.5 changefinder==0.03 scipy==1.5.4 +``` + +**Note**: if you need to tell Netdata to use Python 3 then you can pass the below command in the python plugin section +of your `netdata.conf` file. + +```yaml +[ plugin:python.d ] + # update every = 1 + command options = -ppython3 +``` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/changefinder.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/changefinder.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. 
+ +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| charts_regex | what charts to pull data for - A regex like `system\..*/` or `system\..*/apps.cpu/apps.mem` etc. | system\..* | True | +| charts_to_exclude | charts to exclude, useful if you would like to exclude some specific charts. note: should be a ',' separated string like 'chart.name,chart.name'. | | False | +| mode | get ChangeFinder scores 'per_dim' or 'per_chart'. | per_chart | True | +| cf_r | default parameters that can be passed to the changefinder library. | 0.5 | False | +| cf_order | default parameters that can be passed to the changefinder library. | 1 | False | +| cf_smooth | default parameters that can be passed to the changefinder library. | 15 | False | +| cf_threshold | the percentile above which scores will be flagged. | 99 | False | +| n_score_samples | the number of recent scores to use when calculating the percentile of the changefinder score. | 14400 | False | +| show_scores | set to true if you also want to chart the percentile scores in addition to the flags. (mainly useful for debugging or if you want to dive deeper on how the scores are evolving over time) | False | False | + +</details> + +#### Examples + +##### Default + +Default configuration. 
+ +```yaml +local: + name: 'local' + host: '127.0.0.1:19999' + charts_regex: 'system\..*' + charts_to_exclude: '' + mode: 'per_chart' + cf_r: 0.5 + cf_order: 1 + cf_smooth: 15 + cf_threshold: 99 + n_score_samples: 14400 + show_scores: false + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `changefinder` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin changefinder debug trace + ``` + +### Debug Mode + + + +### Log Messages + + + + diff --git a/collectors/python.d.plugin/changefinder/metadata.yaml b/collectors/python.d.plugin/changefinder/metadata.yaml index 6dcd903e7..170d9146a 100644 --- a/collectors/python.d.plugin/changefinder/metadata.yaml +++ b/collectors/python.d.plugin/changefinder/metadata.yaml @@ -5,55 +5,187 @@ modules: module_name: changefinder monitored_instance: name: python.d changefinder - link: '' + link: "" categories: - data-collection.other - icon_filename: '' + icon_filename: "" related_resources: integrations: list: [] info_provided_to_referring_integrations: - description: '' - keywords: [] + description: "" + keywords: + - change detection + - anomaly detection + - machine learning + - ml most_popular: false overview: data_collection: - metrics_description: '' - method_description: '' + metrics_description: | + This collector uses the Python [changefinder](https://github.com/shunsukeaihara/changefinder) library to + perform [online](https://en.wikipedia.org/wiki/Online_machine_learning) [changepoint 
detection](https://en.wikipedia.org/wiki/Change_detection) + on your Netdata charts and/or dimensions. + method_description: > + Instead of this collector just _collecting_ data, it also does some computation on the data it collects to return a + changepoint score for each chart or dimension you configure it to work on. This is + an [online](https://en.wikipedia.org/wiki/Online_machine_learning) machine learning algorithm so there is no batch step + to train the model, instead it evolves over time as more data arrives. That makes this particular algorithm quite cheap + to compute at each step of data collection (see the notes section below for more details) and it should scale fairly + well to work on lots of charts or hosts (if running on a parent node for example). + + ### Notes + - It may take an hour or two (depending on your choice of `n_score_samples`) for the collector to 'settle' into its + typical behaviour in terms of the trained models and scores you will see in the normal running of your node. Mainly + this is because it can take a while to build up a proper distribution of previous scores in order to convert the raw + score returned by the ChangeFinder algorithm into a percentile based on the most recent `n_score_samples` that have + already been produced. So when you first turn the collector on, it will have a lot of flags in the beginning and then + should 'settle down' once it has built up enough history. This is a typical characteristic of online machine learning + approaches which need some initial window of time before they can be useful. + - As this collector does most of the work in Python itself, you may want to try it out first on a test or development + system to get a sense of its performance characteristics on a node similar to where you would like to use it. 
+ - On a development n1-standard-2 (2 vCPUs, 7.5 GB memory) vm running Ubuntu 18.04 LTS and not doing any work some of the + typical performance characteristics we saw from running this collector (with defaults) were: + - A runtime (`netdata.runtime_changefinder`) of ~30ms. + - Typically ~1% additional cpu usage. + - About ~85mb of ram (`apps.mem`) being continually used by the `python.d.plugin` under default configuration. supported_platforms: include: [] exclude: [] multi_instance: true additional_permissions: - description: '' + description: "" default_behavior: auto_detection: - description: '' + description: "By default this collector will work over all `system.*` charts." limits: - description: '' + description: "" performance_impact: - description: '' + description: "" setup: prerequisites: - list: [] + list: + - title: Python Requirements + description: | + This collector will only work with Python 3 and requires the packages below be installed. + + ```bash + # become netdata user + sudo su -s /bin/bash netdata + # install required packages for the netdata user + pip3 install --user numpy==1.19.5 changefinder==0.03 scipy==1.5.4 + ``` + + **Note**: if you need to tell Netdata to use Python 3 then you can pass the below command in the python plugin section + of your `netdata.conf` file. + + ```yaml + [ plugin:python.d ] + # update every = 1 + command options = -ppython3 + ``` configuration: file: - name: '' - description: '' + name: python.d/changefinder.conf + description: "" options: - description: '' + description: | + There are 2 sections: + + * Global variables + * One or more JOBS that can define multiple different instances to monitor. + + The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + + Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. 
+ + Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. folding: - title: '' + title: "Config options" enabled: true - list: [] + list: + - name: charts_regex + description: what charts to pull data for - A regex like `system\..*|` or `system\..*|apps.cpu|apps.mem` etc. + default_value: "system\\..*" + required: true + - name: charts_to_exclude + description: | + charts to exclude, useful if you would like to exclude some specific charts. + note: should be a ',' separated string like 'chart.name,chart.name'. + default_value: "" + required: false + - name: mode + description: get ChangeFinder scores 'per_dim' or 'per_chart'. + default_value: "per_chart" + required: true + - name: cf_r + description: default parameters that can be passed to the changefinder library. + default_value: 0.5 + required: false + - name: cf_order + description: default parameters that can be passed to the changefinder library. + default_value: 1 + required: false + - name: cf_smooth + description: default parameters that can be passed to the changefinder library. + default_value: 15 + required: false + - name: cf_threshold + description: the percentile above which scores will be flagged. + default_value: 99 + required: false + - name: n_score_samples + description: the number of recent scores to use when calculating the percentile of the changefinder score. + default_value: 14400 + required: false + - name: show_scores + description: | + set to true if you also want to chart the percentile scores in addition to the flags. (mainly useful for debugging or if you want to dive deeper on how the scores are evolving over time) + default_value: false + required: false examples: folding: enabled: true - title: '' - list: [] + title: "Config" + list: + - name: Default + description: Default configuration. 
+ folding: + enabled: false + config: | + local: + name: 'local' + host: '127.0.0.1:19999' + charts_regex: 'system\..*' + charts_to_exclude: '' + mode: 'per_chart' + cf_r: 0.5 + cf_order: 1 + cf_smooth: 15 + cf_threshold: 99 + n_score_samples: 14400 + show_scores: false troubleshooting: problems: - list: [] + list: + - name: "Debug Mode" + description: | + If you would like to log in as `netdata` user and run the collector in debug mode to see more detail. + + ```bash + # become netdata user + sudo su -s /bin/bash netdata + # run collector in debug using `nolock` option if netdata is already running the collector itself. + /usr/libexec/netdata/plugins.d/python.d.plugin changefinder debug trace nolock + ``` + - name: "Log Messages" + description: | + To see any relevant log messages you can use a command like below. + + ```bash + grep 'changefinder' /var/log/netdata/error.log + grep 'changefinder' /var/log/netdata/collector.log + ``` alerts: [] metrics: folding: diff --git a/collectors/python.d.plugin/dovecot/README.md b/collectors/python.d.plugin/dovecot/README.md index 2397b7478..c4749cedc 100644..120000 --- a/collectors/python.d.plugin/dovecot/README.md +++ b/collectors/python.d.plugin/dovecot/README.md @@ -1,128 +1 @@ -<!-- -title: "Dovecot monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/dovecot/README.md" -sidebar_label: "Dovecot" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Webapps" ---> - -# Dovecot collector - -Provides statistics information from Dovecot server. - -Statistics are taken from dovecot socket by executing `EXPORT global` command. 
-More information about dovecot stats can be found on [project wiki page.](http://wiki2.dovecot.org/Statistics) - -Module isn't compatible with new statistic api (v2.3), but you are still able to use the module with Dovecot v2.3 -by following [upgrading steps.](https://wiki2.dovecot.org/Upgrading/2.3). - -**Requirement:** -Dovecot UNIX socket with R/W permissions for user `netdata` or Dovecot with configured TCP/IP socket. - -Module gives information with following charts: - -1. **sessions** - - - active sessions - -2. **logins** - - - logins - -3. **commands** - number of IMAP commands - - - commands - -4. **Faults** - - - minor - - major - -5. **Context Switches** - - - voluntary - - involuntary - -6. **disk** in bytes/s - - - read - - write - -7. **bytes** in bytes/s - - - read - - write - -8. **number of syscalls** in syscalls/s - - - read - - write - -9. **lookups** - number of lookups per second - - - path - - attr - -10. **hits** - number of cache hits - - - hits - -11. **attempts** - authorization attempts - - - success - - failure - -12. **cache** - cached authorization hits - - - hit - - miss - -## Configuration - -Edit the `python.d/dovecot.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/dovecot.conf -``` - -Sample: - -```yaml -localtcpip: - name : 'local' - host : '127.0.0.1' - port : 24242 - -localsocket: - name : 'local' - socket : '/var/run/dovecot/stats' -``` - -If no configuration is given, module will attempt to connect to dovecot using unix socket localized in `/var/run/dovecot/stats` - - - - -### Troubleshooting - -To troubleshoot issues with the `dovecot` module, run the `python.d.plugin` with the debug option enabled. 
The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `dovecot` module in debug mode: - -```bash -./python.d.plugin dovecot debug trace -``` - +integrations/dovecot.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/dovecot/integrations/dovecot.md b/collectors/python.d.plugin/dovecot/integrations/dovecot.md new file mode 100644 index 000000000..4057a5b6c --- /dev/null +++ b/collectors/python.d.plugin/dovecot/integrations/dovecot.md @@ -0,0 +1,196 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/dovecot/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/dovecot/metadata.yaml" +sidebar_label: "Dovecot" +learn_status: "Published" +learn_rel_path: "Data Collection/Mail Servers" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Dovecot + + +<img src="https://netdata.cloud/img/dovecot.svg" width="150"/> + + +Plugin: python.d.plugin +Module: dovecot + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors Dovecot metrics about sessions, logins, commands, page faults and more. + +It uses the Dovecot socket and executes the `EXPORT global` command to get the statistics. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +If no configuration is given, the collector will attempt to connect to Dovecot using the UNIX socket located at `/var/run/dovecot/stats`. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per Dovecot instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| dovecot.sessions | active sessions | number | +| dovecot.logins | logins | number | +| dovecot.commands | commands | commands | +| dovecot.faults | minor, major | faults | +| dovecot.context_switches | voluntary, involuntary | switches | +| dovecot.io | read, write | KiB/s | +| dovecot.net | read, write | kilobits/s | +| dovecot.syscalls | read, write | syscalls/s | +| dovecot.lookup | path, attr | number/s | +| dovecot.cache | hits | hits/s | +| dovecot.auth | ok, failed | attempts | +| dovecot.auth_cache | hit, miss | number | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Dovecot configuration + +The Dovecot UNIX socket should have R/W permissions for user netdata, or Dovecot should be configured with a TCP/IP socket. + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/dovecot.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/dovecot.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. 
+ +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| socket | Use this socket to communicate with Dovecot. | /var/run/dovecot/stats | False | +| host | Instead of using a socket, you can point the collector to an IP address for Dovecot statistics. | | False | +| port | Used in combination with host, configures the port Dovecot listens on. | | False | + +</details> + +#### Examples + +##### Local TCP + +A basic TCP configuration. + +<details><summary>Config</summary> + +```yaml +localtcpip: + name: 'local' + host: '127.0.0.1' + port: 24242 + +``` +</details> + +##### Local socket + +A basic local socket configuration. + +<details><summary>Config</summary> + +```yaml +localsocket: + name: 'local' + socket: '/var/run/dovecot/stats' + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `dovecot` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. 
If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin dovecot debug trace + ``` + + diff --git a/collectors/python.d.plugin/example/README.md b/collectors/python.d.plugin/example/README.md index 63ec7a298..55877a99a 100644..120000 --- a/collectors/python.d.plugin/example/README.md +++ b/collectors/python.d.plugin/example/README.md @@ -1,38 +1 @@ -<!-- -title: "Example module in Python" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/example/README.md" -sidebar_label: "Example module in Python" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Mock Collectors" ---> - -# Example module in Python - -You can add custom data collectors using Python. - -Netdata provides an [example python data collection module](https://github.com/netdata/netdata/tree/master/collectors/python.d.plugin/example). - -If you want to write your own collector, read our [writing a new Python module](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md#how-to-write-a-new-module) tutorial. - - -### Troubleshooting - -To troubleshoot issues with the `example` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. 
- -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `example` module in debug mode: - -```bash -./python.d.plugin example debug trace -``` - +integrations/example_collector.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/example/integrations/example_collector.md b/collectors/python.d.plugin/example/integrations/example_collector.md new file mode 100644 index 000000000..44b405a7d --- /dev/null +++ b/collectors/python.d.plugin/example/integrations/example_collector.md @@ -0,0 +1,170 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/example/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/example/metadata.yaml" +sidebar_label: "Example collector" +learn_status: "Published" +learn_rel_path: "Data Collection/Other" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Example collector + +Plugin: python.d.plugin +Module: example + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Example collector that generates some random numbers as metrics. + +If you want to write your own collector, read our [writing a new Python module](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md#how-to-write-a-new-module) tutorial. + + +The `get_data()` function uses `random.randint()` to generate a random number which will be collected as a metric. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels. + + + +### Per Example collector instance + +These metrics refer to the entire monitored application. + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| example.random | random | number | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/example.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/example.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| num_lines | The number of lines to create. | 4 | False | +| lower | The lower bound of numbers to randomly sample from. | 0 | False | +| upper | The upper bound of numbers to randomly sample from. | 100 | False | +| update_every | Sets the default data collection frequency. | 1 | False | +| priority | Controls the order of charts at the netdata dashboard. 
| 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration. + +```yaml +four_lines: + name: "Four Lines" + update_every: 1 + priority: 60000 + penalty: yes + autodetection_retry: 0 + num_lines: 4 + lower: 0 + upper: 100 + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `example` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin example debug trace + ``` + + diff --git a/collectors/python.d.plugin/exim/README.md b/collectors/python.d.plugin/exim/README.md index bc00ab7c6..f1f2ef9f9 100644..120000 --- a/collectors/python.d.plugin/exim/README.md +++ b/collectors/python.d.plugin/exim/README.md @@ -1,64 +1 @@ -<!-- -title: "Exim monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/exim/README.md" -sidebar_label: "Exim" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Webapps" ---> - -# Exim collector - -Simple module executing `exim -bpc` to grab exim queue. -This command can take a lot of time to finish its execution thus it is not recommended to run it every second. - -## Requirements - -The module uses the `exim` binary, which can only be executed as root by default. We need to allow other users to `exim` binary. We solve that adding `queue_list_requires_admin` statement in exim configuration and set to `false`, because it is `true` by default. On many Linux distributions, the default location of `exim` configuration is in `/etc/exim.conf`. - -1. Edit the `exim` configuration with your preferred editor and add: -`queue_list_requires_admin = false` -2. Restart `exim` and Netdata - -*WHM (CPanel) server* - -On a WHM server, you can reconfigure `exim` over the WHM interface with the following steps. - -1. Login to WHM -2. Navigate to Service Configuration --> Exim Configuration Manager --> tab Advanced Editor -3. Scroll down to the button **Add additional configuration setting** and click on it. -4. In the new dropdown which will appear above we need to find and choose: -`queue_list_requires_admin` and set to `false` -5. Scroll to the end and click the **Save** button. - -It produces only one chart: - -1. 
**Exim Queue Emails** - - - emails - -Configuration is not needed. - - - - -### Troubleshooting - -To troubleshoot issues with the `exim` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `exim` module in debug mode: - -```bash -./python.d.plugin exim debug trace -``` - +integrations/exim.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/exim/integrations/exim.md b/collectors/python.d.plugin/exim/integrations/exim.md new file mode 100644 index 000000000..328d17870 --- /dev/null +++ b/collectors/python.d.plugin/exim/integrations/exim.md @@ -0,0 +1,180 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/exim/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/exim/metadata.yaml" +sidebar_label: "Exim" +learn_status: "Published" +learn_rel_path: "Data Collection/Mail Servers" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Exim + + +<img src="https://netdata.cloud/img/exim.jpg" width="150"/> + + +Plugin: python.d.plugin +Module: exim + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors Exim mail queue. + +It uses the `exim` command line binary to get the statistics. + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +Assuming setup prerequisites are met, the collector will try to gather statistics using the method described above, even without any configuration. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Exim instance + +These metrics refer to the entire monitored application. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| exim.qemails | emails | emails | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Exim configuration - local installation + +The module uses the `exim` binary, which can only be executed as root by default. To allow other users to run the `exim` binary, add the `queue_list_requires_admin` statement to the exim configuration and set it to `false` (it is `true` by default). On many Linux distributions, the default location of the `exim` configuration is `/etc/exim.conf`. + +1. Edit the `exim` configuration with your preferred editor and add: +`queue_list_requires_admin = false` +2. Restart `exim` and Netdata + + +#### Exim configuration - WHM (CPanel) server + +On a WHM server, you can reconfigure `exim` over the WHM interface with the following steps. + +1. Log in to WHM +2. Navigate to Service Configuration --> Exim Configuration Manager --> tab Advanced Editor +3. Scroll down to the button **Add additional configuration setting** and click on it. +4. In the new dropdown that appears above, find and choose: +`queue_list_requires_admin` and set it to `false` +5. Scroll to the end and click the **Save** button. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/exim.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/exim.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. 
+ +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| command | Path and command to the `exim` binary | exim -bpc | False | + +</details> + +#### Examples + +##### Local exim install + +A basic local exim install + +```yaml +local: + command: 'exim -bpc' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `exim` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin exim debug trace + ``` + + diff --git a/collectors/python.d.plugin/fail2ban/README.md b/collectors/python.d.plugin/fail2ban/README.md index 41276d5f7..642a8bcf5 100644..120000 --- a/collectors/python.d.plugin/fail2ban/README.md +++ b/collectors/python.d.plugin/fail2ban/README.md @@ -1,105 +1 @@ -<!-- -title: "Fail2ban monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/fail2ban/README.md" -sidebar_label: "Fail2ban" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Apps" ---> - -# Fail2ban collector - -Monitors the fail2ban log file to show all bans for all active jails. - -## Requirements - -The `fail2ban.log` file must be readable by the user `netdata`: - -- change the file ownership and access permissions. -- update `/etc/logrotate.d/fail2ban` to persists the changes after rotating the log file. - -<details> - <summary>Click to expand the instruction.</summary> - -To change the file ownership and access permissions, execute the following: - -```shell -sudo chown root:netdata /var/log/fail2ban.log -sudo chmod 640 /var/log/fail2ban.log -``` - -To persist the changes after rotating the log file, add `create 640 root netdata` to the `/etc/logrotate.d/fail2ban`: - -```shell -/var/log/fail2ban.log { - - weekly - rotate 4 - compress - - delaycompress - missingok - postrotate - fail2ban-client flushlogs 1>/dev/null - endscript - - # If fail2ban runs as non-root it still needs to have write access - # to logfiles. 
- # create 640 fail2ban adm - create 640 root netdata -} -``` - -</details> - -## Charts - -- Failed attempts in attempts/s -- Bans in bans/s -- Banned IP addresses (since the last restart of netdata) in ips - -## Configuration - -Edit the `python.d/fail2ban.conf` configuration file using `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/fail2ban.conf -``` - -Sample: - -```yaml -local: - log_path: '/var/log/fail2ban.log' - conf_path: '/etc/fail2ban/jail.local' - exclude: 'dropbear apache' -``` - -If no configuration is given, module will attempt to read log file at `/var/log/fail2ban.log` and conf file -at `/etc/fail2ban/jail.local`. If conf file is not found default jail is `ssh`. - - - - -### Troubleshooting - -To troubleshoot issues with the `fail2ban` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `fail2ban` module in debug mode: - -```bash -./python.d.plugin fail2ban debug trace -``` - +integrations/fail2ban.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/fail2ban/integrations/fail2ban.md b/collectors/python.d.plugin/fail2ban/integrations/fail2ban.md new file mode 100644 index 000000000..64bfe21ba --- /dev/null +++ b/collectors/python.d.plugin/fail2ban/integrations/fail2ban.md @@ -0,0 +1,208 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/fail2ban/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/fail2ban/metadata.yaml" +sidebar_label: "Fail2ban" +learn_status: "Published" +learn_rel_path: "Data Collection/Authentication and Authorization" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Fail2ban + + +<img src="https://netdata.cloud/img/fail2ban.png" width="150"/> + + +Plugin: python.d.plugin +Module: fail2ban + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Fail2ban performance for prime intrusion prevention operations. Monitor ban counts, jail statuses, and failed login attempts to ensure robust network security. + + +It collects metrics by reading the default log and configuration files of fail2ban. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The `fail2ban.log` file must be readable by the user `netdata`: + - change the file ownership and access permissions. + - update `/etc/logrotate.d/fail2ban` to persist the changes after rotating the log file.
+ +To change the file ownership and access permissions, execute the following: + +```shell +sudo chown root:netdata /var/log/fail2ban.log +sudo chmod 640 /var/log/fail2ban.log +``` + +To persist the changes after rotating the log file, add `create 640 root netdata` to the `/etc/logrotate.d/fail2ban`: + +```shell +/var/log/fail2ban.log { + + weekly + rotate 4 + compress + + delaycompress + missingok + postrotate + fail2ban-client flushlogs 1>/dev/null + endscript + + # If fail2ban runs as non-root it still needs to have write access + # to logfiles. + # create 640 fail2ban adm + create 640 root netdata +} +``` + + +### Default Behavior + +#### Auto-Detection + +By default the collector will attempt to read log file at /var/log/fail2ban.log and conf file at /etc/fail2ban/jail.local. +If conf file is not found default jail is ssh. + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Fail2ban instance + +These metrics refer to the entire monitored application. + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| fail2ban.failed_attempts | a dimension per jail | attempts/s | +| fail2ban.bans | a dimension per jail | bans/s | +| fail2ban.banned_ips | a dimension per jail | ips | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/fail2ban.conf`. 
+ + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/fail2ban.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| log_path | path to fail2ban.log. | /var/log/fail2ban.log | False | +| conf_path | path to jail.local/jail.conf. | /etc/fail2ban/jail.local | False | +| conf_dir | path to jail.d/. | /etc/fail2ban/jail.d/ | False | +| exclude | jails you want to exclude from autodetection. | | False | +| update_every | Sets the default data collection frequency. | 1 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. 
| | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration. + +```yaml +local: + log_path: '/var/log/fail2ban.log' + conf_path: '/etc/fail2ban/jail.local' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `fail2ban` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin fail2ban debug trace + ``` + +### Debug Mode + + + + diff --git a/collectors/python.d.plugin/fail2ban/metadata.yaml b/collectors/python.d.plugin/fail2ban/metadata.yaml index 80aa68b62..61f762679 100644 --- a/collectors/python.d.plugin/fail2ban/metadata.yaml +++ b/collectors/python.d.plugin/fail2ban/metadata.yaml @@ -35,29 +35,29 @@ modules: The `fail2ban.log` file must be readable by the user `netdata`. - change the file ownership and access permissions. - update `/etc/logrotate.d/fail2ban`` to persist the changes after rotating the log file. - + To change the file ownership and access permissions, execute the following: - + ```shell sudo chown root:netdata /var/log/fail2ban.log sudo chmod 640 /var/log/fail2ban.log ``` - + To persist the changes after rotating the log file, add `create 640 root netdata` to the `/etc/logrotate.d/fail2ban`: - + ```shell /var/log/fail2ban.log { - + weekly rotate 4 compress - + delaycompress missingok postrotate fail2ban-client flushlogs 1>/dev/null endscript - + # If fail2ban runs as non-root it still needs to have write access # to logfiles. 
# create 640 fail2ban adm @@ -67,7 +67,8 @@ modules: default_behavior: auto_detection: description: | - By default the collector will attempt to read log file at /var/log/fail2ban.log and conf file at /etc/fail2ban/jail.local. If conf file is not found default jail is ssh. + By default the collector will attempt to read log file at /var/log/fail2ban.log and conf file at /etc/fail2ban/jail.local. + If conf file is not found default jail is ssh. limits: description: "" performance_impact: @@ -77,19 +78,19 @@ modules: list: [] configuration: file: - name: "" + name: python.d/fail2ban.conf description: "" options: description: | There are 2 sections: - + * Global variables * One or more JOBS that can define multiple different instances to monitor. - + The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - + Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - + Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. folding: title: Config options @@ -146,7 +147,26 @@ modules: conf_path: '/etc/fail2ban/jail.local' troubleshooting: problems: - list: [] + list: + - name: Debug Mode + description: | + To troubleshoot issues with the `fail2ban` module, run the `python.d.plugin` with the debug option enabled. + The output will give you the output of the data collection job or error messages on why the collector isn't working. + + First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's + not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the + plugin's directory, switch to the `netdata` user. 
+ + ```bash + cd /usr/libexec/netdata/plugins.d/ + sudo su -s /bin/bash netdata + ``` + + Now you can manually run the `fail2ban` module in debug mode: + + ```bash + ./python.d.plugin fail2ban debug trace + ``` alerts: [] metrics: folding: diff --git a/collectors/python.d.plugin/gearman/README.md b/collectors/python.d.plugin/gearman/README.md index 329c34726..70189d698 100644..120000 --- a/collectors/python.d.plugin/gearman/README.md +++ b/collectors/python.d.plugin/gearman/README.md @@ -1,73 +1 @@ -<!-- -title: "Gearman monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/gearman/README.md" -sidebar_label: "Gearman" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Distributed computing" ---> - -# Gearman collector - -Monitors Gearman worker statistics. A chart is shown for each job as well as one showing a summary of all workers. - -Note: Charts may show as a line graph rather than an area -graph if you load Netdata with no jobs running. To change -this go to "Settings" > "Which dimensions to show?" and -select "All". - -Plugin can obtain data from tcp socket **OR** unix socket. - -**Requirement:** -Socket MUST be readable by netdata user. - -It produces: - - * Workers queued - * Workers idle - * Workers running - -## Configuration - -Edit the `python.d/gearman.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. 
- -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/gearman.conf -``` - -```yaml -localhost: - name : 'local' - host : 'localhost' - port : 4730 - - # TLS information can be provided as well - tls : no - cert : /path/to/cert - key : /path/to/key -``` - -When no configuration file is found, module tries to connect to TCP/IP socket: `localhost:4730`. - -### Troubleshooting - -To troubleshoot issues with the `gearman` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `gearman` module in debug mode: - -```bash -./python.d.plugin gearman debug trace -``` - +integrations/gearman.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/gearman/integrations/gearman.md b/collectors/python.d.plugin/gearman/integrations/gearman.md new file mode 100644 index 000000000..f988e7448 --- /dev/null +++ b/collectors/python.d.plugin/gearman/integrations/gearman.md @@ -0,0 +1,209 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/gearman/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/gearman/metadata.yaml" +sidebar_label: "Gearman" +learn_status: "Published" +learn_rel_path: "Data Collection/Distributed Computing Systems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Gearman + + +<img src="https://netdata.cloud/img/gearman.png" width="150"/> + + +Plugin: python.d.plugin +Module: gearman + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Gearman metrics for proficient system task distribution. Track job counts, worker statuses, and queue lengths for effective distributed task management. + +This collector connects to a Gearman instance via either TCP or unix socket. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +When no configuration file is found, the collector tries to connect to TCP/IP socket: localhost:4730. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per Gearman instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| gearman.total_jobs | Pending, Running | Jobs | + +### Per gearman job + +Metrics related to Gearman jobs. Each job produces its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| gearman.single_job | Pending, Idle, Running | Jobs | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ gearman_workers_queued ](https://github.com/netdata/netdata/blob/master/health/health.d/gearman.conf) | gearman.single_job | average number of queued jobs over the last 10 minutes | + + +## Setup + +### Prerequisites + +#### Socket permissions + +The gearman UNIX socket should have read permission for user netdata. + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/gearman.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/gearman.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.
+ + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| host | URL or IP where gearman is running. | localhost | False | +| port | Port of URL or IP where gearman is running. | 4730 | False | +| tls | Use tls to connect to gearman. | false | False | +| cert | Provide a certificate file if needed to connect to a TLS gearman instance. | | False | +| key | Provide a key file if needed to connect to a TLS gearman instance. | | False | + +</details> + +#### Examples + +##### Local gearman service + +A basic host and port gearman configuration for localhost. + +```yaml +localhost: + name: 'local' + host: 'localhost' + port: 4730 + +``` +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local' + host: 'localhost' + port: 4730 + +remote: + name: 'remote' + host: '192.0.2.1' + port: 4730 + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `gearman` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. 
If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin gearman debug trace + ``` + + diff --git a/collectors/python.d.plugin/go_expvar/README.md b/collectors/python.d.plugin/go_expvar/README.md index f86fa6d04..f28a82f34 100644..120000 --- a/collectors/python.d.plugin/go_expvar/README.md +++ b/collectors/python.d.plugin/go_expvar/README.md @@ -1,342 +1 @@ -<!-- -title: "Go applications monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/go_expvar/README.md" -sidebar_label: "Go applications" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Application Performance Monitoring" ---> - -# Go applications collector - -Monitors Go application that exposes its metrics with the use of `expvar` package from the Go standard library. The package produces charts for Go runtime memory statistics and optionally any number of custom charts. - -The `go_expvar` module produces the following charts: - -1. **Heap allocations** in kB - - - alloc: size of objects allocated on the heap - - inuse: size of allocated heap spans - -2. **Stack allocations** in kB - - - inuse: size of allocated stack spans - -3. **MSpan allocations** in kB - - - inuse: size of allocated mspan structures - -4. **MCache allocations** in kB - - - inuse: size of allocated mcache structures - -5. **Virtual memory** in kB - - - sys: size of reserved virtual address space - -6. **Live objects** - - - live: number of live objects in memory - -7. 
**GC pauses average** in ns - - - avg: average duration of all GC stop-the-world pauses - -## Monitoring Go applications - -Netdata can be used to monitor running Go applications that expose their metrics with -the use of the [expvar package](https://golang.org/pkg/expvar/) included in Go standard library. - -The `expvar` package exposes these metrics over HTTP and is very easy to use. -Consider this minimal sample below: - -```go -package main - -import ( - _ "expvar" - "net/http" -) - -func main() { - http.ListenAndServe("127.0.0.1:8080", nil) -} -``` - -When imported this way, the `expvar` package registers a HTTP handler at `/debug/vars` that -exposes Go runtime's memory statistics in JSON format. You can inspect the output by opening -the URL in your browser (or by using `wget` or `curl`). - -Sample output: - -```json -{ -"cmdline": ["./expvar-demo-binary"], -"memstats": {"Alloc":630856,"TotalAlloc":630856,"Sys":3346432,"Lookups":27, <omitted for brevity>} -} -``` - -You can of course expose and monitor your own variables as well. -Here is a sample Go application that exposes a few custom variables: - -```go -package main - -import ( - "expvar" - "net/http" - "runtime" - "time" -) - -func main() { - - tick := time.NewTicker(1 * time.Second) - num_go := expvar.NewInt("runtime.goroutines") - counters := expvar.NewMap("counters") - counters.Set("cnt1", new(expvar.Int)) - counters.Set("cnt2", new(expvar.Float)) - - go http.ListenAndServe(":8080", nil) - - for { - select { - case <- tick.C: - num_go.Set(int64(runtime.NumGoroutine())) - counters.Add("cnt1", 1) - counters.AddFloat("cnt2", 1.452) - } - } -} -``` - -Apart from the runtime memory stats, this application publishes two counters and the -number of currently running Goroutines and updates these stats every second. - -In the next section, we will cover how to monitor and chart these exposed stats with -the use of `netdata`s `go_expvar` module. 
- -### Using Netdata go_expvar module - -The `go_expvar` module is disabled by default. To enable it, edit `python.d.conf` (to edit it on your system run -`/etc/netdata/edit-config python.d.conf`), and change the `go_expvar` variable to `yes`: - -``` -# Enable / Disable python.d.plugin modules -#default_run: yes -# -# If "default_run" = "yes" the default for all modules is enabled (yes). -# Setting any of these to "no" will disable it. -# -# If "default_run" = "no" the default for all modules is disabled (no). -# Setting any of these to "yes" will enable it. -... -go_expvar: yes -... -``` - -Next, we need to edit the module configuration file (found at `/etc/netdata/python.d/go_expvar.conf` by default) (to -edit it on your system run `/etc/netdata/edit-config python.d/go_expvar.conf`). The module configuration consists of -jobs, where each job can be used to monitor a separate Go application. Let's see a sample job configuration: - -``` -# /etc/netdata/python.d/go_expvar.conf - -app1: - name : 'app1' - url : 'http://127.0.0.1:8080/debug/vars' - collect_memstats: true - extra_charts: {} -``` - -Let's go over each of the defined options: - -``` -name: 'app1' -``` - -This is the job name that will appear at the Netdata dashboard. -If not defined, the job_name (top level key) will be used. - -``` -url: 'http://127.0.0.1:8080/debug/vars' -``` - -This is the URL of the expvar endpoint. As the expvar handler can be installed -in a custom path, the whole URL has to be specified. This value is mandatory. - -``` -collect_memstats: true -``` - -Whether to enable collecting stats about Go runtime's memory. You can find more -information about the exposed values at the [runtime package docs](https://golang.org/pkg/runtime/#MemStats). - -``` -extra_charts: {} -``` - -Enables the user to specify custom expvars to monitor and chart. -Will be explained in more detail below. 
- -**Note: if `collect_memstats` is disabled and no `extra_charts` are defined, the plugin will -disable itself, as there will be no data to collect!** - -Apart from these options, each job supports options inherited from Netdata's `python.d.plugin` -and its base `UrlService` class. These are: - -``` -update_every: 1 # the job's data collection frequency -priority: 60000 # the job's order on the dashboard -user: admin # use when the expvar endpoint is protected by HTTP Basic Auth -password: sekret # use when the expvar endpoint is protected by HTTP Basic Auth -``` - -### Monitoring custom vars with go_expvar - -Now, memory stats might be useful, but what if you want Netdata to monitor some custom values -that your Go application exposes? The `go_expvar` module can do that as well with the use of -the `extra_charts` configuration variable. - -The `extra_charts` variable is a YaML list of Netdata chart definitions. -Each chart definition has the following keys: - -``` -id: Netdata chart ID -options: a key-value mapping of chart options -lines: a list of line definitions -``` - -**Note: please do not use dots in the chart or line ID field. -See [this issue](https://github.com/netdata/netdata/pull/1902#issuecomment-284494195) for explanation.** - -Please see these two links to the official Netdata documentation for more information about the values: - -- [External plugins - charts](https://github.com/netdata/netdata/blob/master/collectors/plugins.d/README.md#chart) -- [Chart variables](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md#global-variables-order-and-chart) - -**Line definitions** - -Each chart can define multiple lines (dimensions). -A line definition is a key-value mapping of line options. 
-Each line can have the following options: - -``` -# mandatory -expvar_key: the name of the expvar as present in the JSON output of /debug/vars endpoint -expvar_type: value type; supported are "float" or "int" -id: the id of this line/dimension in Netdata - -# optional - Netdata defaults are used if these options are not defined -name: '' -algorithm: absolute -multiplier: 1 -divisor: 100 if expvar_type == float, 1 if expvar_type == int -hidden: False -``` - -Please see the following link for more information about the options and their default values: -[External plugins - dimensions](https://github.com/netdata/netdata/blob/master/collectors/plugins.d/README.md#dimension) - -Apart from top-level expvars, this plugin can also parse expvars stored in a multi-level map; -All dicts in the resulting JSON document are then flattened to one level. -Expvar names are joined together with '.' when flattening. - -Example: - -``` -{ - "counters": {"cnt1": 1042, "cnt2": 1512.9839999999983}, - "runtime.goroutines": 5 -} -``` - -In the above case, the exported variables will be available under `runtime.goroutines`, -`counters.cnt1` and `counters.cnt2` expvar_keys. If the flattening results in a key collision, -the first defined key wins and all subsequent keys with the same name are ignored. - -## Enable the collector - -The `go_expvar` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` file. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d.conf -``` - -Change the value of the `go_expvar` setting to `yes`. Save the file and restart the Netdata Agent with `sudo systemctl -restart netdata`, or the appropriate method for your system, to finish enabling the `go_expvar` collector. 
- -## Configuration - -Edit the `python.d/go_expvar.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/go_expvar.conf -``` - -The configuration below matches the second Go application described above. -Netdata will monitor and chart memory stats for the application, as well as a custom chart of -running goroutines and two dummy counters. - -``` -app1: - name : 'app1' - url : 'http://127.0.0.1:8080/debug/vars' - collect_memstats: true - extra_charts: - - id: "runtime_goroutines" - options: - name: num_goroutines - title: "runtime: number of goroutines" - units: goroutines - family: runtime - context: expvar.runtime.goroutines - chart_type: line - lines: - - {expvar_key: 'runtime.goroutines', expvar_type: int, id: runtime_goroutines} - - id: "foo_counters" - options: - name: counters - title: "some random counters" - units: awesomeness - family: counters - context: expvar.foo.counters - chart_type: line - lines: - - {expvar_key: 'counters.cnt1', expvar_type: int, id: counters_cnt1} - - {expvar_key: 'counters.cnt2', expvar_type: float, id: counters_cnt2} -``` - -**Netdata charts example** - -The images below show how do the final charts in Netdata look. - -![Memory stats charts](https://cloud.githubusercontent.com/assets/15180106/26762052/62b4af58-493b-11e7-9e69-146705acfc2c.png) - -![Custom charts](https://cloud.githubusercontent.com/assets/15180106/26762051/62ae915e-493b-11e7-8518-bd25a3886650.png) - - -### Troubleshooting - -To troubleshoot issues with the `go_expvar` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. 
- -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `go_expvar` module in debug mode: - -```bash -./python.d.plugin go_expvar debug trace -``` - +integrations/go_applications_expvar.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/go_expvar/integrations/go_applications_expvar.md b/collectors/python.d.plugin/go_expvar/integrations/go_applications_expvar.md new file mode 100644 index 000000000..be4db4b70 --- /dev/null +++ b/collectors/python.d.plugin/go_expvar/integrations/go_applications_expvar.md @@ -0,0 +1,334 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/go_expvar/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/go_expvar/metadata.yaml" +sidebar_label: "Go applications (EXPVAR)" +learn_status: "Published" +learn_rel_path: "Data Collection/APM" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Go applications (EXPVAR) + + +<img src="https://netdata.cloud/img/go.png" width="150"/> + + +Plugin: python.d.plugin +Module: go_expvar + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors Go applications that expose their metrics with the use of the `expvar` package from the Go standard library. It produces charts for Go runtime memory statistics and optionally any number of custom charts. + +It connects via http to gather the metrics exposed via the `expvar` package. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. 
+ +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Go applications (EXPVAR) instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| expvar.memstats.heap | alloc, inuse | KiB | +| expvar.memstats.stack | inuse | KiB | +| expvar.memstats.mspan | inuse | KiB | +| expvar.memstats.mcache | inuse | KiB | +| expvar.memstats.live_objects | live | objects | +| expvar.memstats.sys | sys | KiB | +| expvar.memstats.gc_pauses | avg | ns | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Enable the go_expvar collector + +The `go_expvar` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` file. + +```bash +cd /etc/netdata # Replace this path with your Netdata config directory, if different +sudo ./edit-config python.d.conf +``` + +Change the value of the `go_expvar` setting to `yes`. Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. + + +#### Sample `expvar` usage in a Go application + +The `expvar` package exposes metrics over HTTP and is very easy to use. +Consider this minimal sample below: + +```go +package main + +import ( + _ "expvar" + "net/http" +) + +func main() { + http.ListenAndServe("127.0.0.1:8080", nil) +} +``` + +When imported this way, the `expvar` package registers a HTTP handler at `/debug/vars` that +exposes Go runtime's memory statistics in JSON format. 
You can inspect the output by opening +the URL in your browser (or by using `wget` or `curl`). + +Sample output: + +```json +{ +"cmdline": ["./expvar-demo-binary"], +"memstats": {"Alloc":630856,"TotalAlloc":630856,"Sys":3346432,"Lookups":27, <omitted for brevity>} +} +``` + +You can of course expose and monitor your own variables as well. +Here is a sample Go application that exposes a few custom variables: + +```go +package main + +import ( + "expvar" + "net/http" + "runtime" + "time" +) + +func main() { + + tick := time.NewTicker(1 * time.Second) + num_go := expvar.NewInt("runtime.goroutines") + counters := expvar.NewMap("counters") + counters.Set("cnt1", new(expvar.Int)) + counters.Set("cnt2", new(expvar.Float)) + + go http.ListenAndServe(":8080", nil) + + for { + select { + case <- tick.C: + num_go.Set(int64(runtime.NumGoroutine())) + counters.Add("cnt1", 1) + counters.AddFloat("cnt2", 1.452) + } + } +} +``` + +Apart from the runtime memory stats, this application publishes two counters and the +number of currently running Goroutines and updates these stats every second. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/go_expvar.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/go_expvar.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. 
+ +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. Each JOB can be used to monitor a different Go application. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts on the Netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them is allowed to run at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| url | The URL and port of the expvar endpoint. Please include the whole path of the endpoint, as the expvar handler can be installed in a non-standard location. | | True | +| user | If the URL is password protected, this is the username to use. | | False | +| pass | If the URL is password protected, this is the password to use. | | False | +| collect_memstats | Enables charts for Go runtime's memory statistics. | | False | +| extra_charts | Defines extra data/charts to monitor; see the example below. | | False | + +</details> + +#### Examples + +##### Monitor a Go app1 application + +The example below sets a configuration for a Go application called `app1`. Besides the `memstats`, the application also exposes two counters and the number of currently running Goroutines and updates these stats every second. + +The `go_expvar` collector can monitor these as well with the use of the `extra_charts` configuration variable. + +The `extra_charts` variable is a YAML list of Netdata chart definitions. 
+Each chart definition has the following keys: + +``` +id: Netdata chart ID +options: a key-value mapping of chart options +lines: a list of line definitions +``` + +**Note: please do not use dots in the chart or line ID field. +See [this issue](https://github.com/netdata/netdata/pull/1902#issuecomment-284494195) for explanation.** + +Please see these two links to the official Netdata documentation for more information about the values: + +- [External plugins - charts](https://github.com/netdata/netdata/blob/master/collectors/plugins.d/README.md#chart) +- [Chart variables](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/README.md#global-variables-order-and-chart) + +**Line definitions** + +Each chart can define multiple lines (dimensions). +A line definition is a key-value mapping of line options. +Each line can have the following options: + +``` +# mandatory +expvar_key: the name of the expvar as present in the JSON output of /debug/vars endpoint +expvar_type: value type; supported are "float" or "int" +id: the id of this line/dimension in Netdata + +# optional - Netdata defaults are used if these options are not defined +name: '' +algorithm: absolute +multiplier: 1 +divisor: 100 if expvar_type == float, 1 if expvar_type == int +hidden: False +``` + +Please see the following link for more information about the options and their default values: +[External plugins - dimensions](https://github.com/netdata/netdata/blob/master/collectors/plugins.d/README.md#dimension) + +Apart from top-level expvars, this plugin can also parse expvars stored in a multi-level map; +All dicts in the resulting JSON document are then flattened to one level. +Expvar names are joined together with '.' when flattening. 
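The flattening described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the collector's actual code: `flatten` is a hypothetical helper name, and the first-key-wins behavior is an assumption taken from the collision rule this document states.

```python
import json


def flatten(document, prefix=""):
    """Flatten nested dicts into one level, joining key names with '.'.

    Hypothetical helper: on a key collision the first key encountered is
    kept and later duplicates are ignored, as the documentation describes.
    """
    flat = {}
    for key, value in document.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested maps and merge without overwriting.
            for sub_key, sub_value in flatten(value, name).items():
                flat.setdefault(sub_key, sub_value)
        else:
            flat.setdefault(name, value)
    return flat


sample = json.loads(
    '{"counters": {"cnt1": 1042, "cnt2": 1512.9839999999983},'
    ' "runtime.goroutines": 5}'
)
flat = flatten(sample)
print(sorted(flat))
```

The resulting keys are the strings that would be used as `expvar_key` values in a line definition.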
+ +Example: + +``` +{ + "counters": {"cnt1": 1042, "cnt2": 1512.9839999999983}, + "runtime.goroutines": 5 +} +``` + +In the above case, the exported variables will be available under `runtime.goroutines`, +`counters.cnt1` and `counters.cnt2` expvar_keys. If the flattening results in a key collision, +the first defined key wins and all subsequent keys with the same name are ignored. + + +```yaml +app1: + name : 'app1' + url : 'http://127.0.0.1:8080/debug/vars' + collect_memstats: true + extra_charts: + - id: "runtime_goroutines" + options: + name: num_goroutines + title: "runtime: number of goroutines" + units: goroutines + family: runtime + context: expvar.runtime.goroutines + chart_type: line + lines: + - {expvar_key: 'runtime.goroutines', expvar_type: int, id: runtime_goroutines} + - id: "foo_counters" + options: + name: counters + title: "some random counters" + units: awesomeness + family: counters + context: expvar.foo.counters + chart_type: line + lines: + - {expvar_key: 'counters.cnt1', expvar_type: int, id: counters_cnt1} + - {expvar_key: 'counters.cnt2', expvar_type: float, id: counters_cnt2} + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `go_expvar` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin go_expvar debug trace + ``` + + diff --git a/collectors/python.d.plugin/go_expvar/metadata.yaml b/collectors/python.d.plugin/go_expvar/metadata.yaml index 92669dd9c..9419b024a 100644 --- a/collectors/python.d.plugin/go_expvar/metadata.yaml +++ b/collectors/python.d.plugin/go_expvar/metadata.yaml @@ -4,7 +4,7 @@ modules: plugin_name: python.d.plugin module_name: go_expvar monitored_instance: - name: Go applications + name: Go applications (EXPVAR) link: "https://pkg.go.dev/expvar" categories: - data-collection.apm @@ -39,6 +39,16 @@ modules: setup: prerequisites: list: + - title: "Enable the go_expvar collector" + description: | + The `go_expvar` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` file. + + ```bash + cd /etc/netdata # Replace this path with your Netdata config directory, if different + sudo ./edit-config python.d.conf + ``` + + Change the value of the `go_expvar` setting to `yes`. Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - title: "Sample `expvar` usage in a Go application" description: | The `expvar` package exposes metrics over HTTP and is very easy to use. 
diff --git a/collectors/python.d.plugin/hddtemp/README.md b/collectors/python.d.plugin/hddtemp/README.md index b42da7346..95c7593f8 100644..120000 --- a/collectors/python.d.plugin/hddtemp/README.md +++ b/collectors/python.d.plugin/hddtemp/README.md @@ -1,61 +1 @@ -<!-- -title: "Hard drive temperature monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/hddtemp/README.md" -sidebar_label: "Hard drive temperature" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Hardware" ---> - -# Hard drive temperature collector - -Monitors disk temperatures from one or more `hddtemp` daemons. - -**Requirement:** -Running `hddtemp` in daemonized mode with access on tcp port - -It produces one chart **Temperature** with dynamic number of dimensions (one per disk) - -## Configuration - -Edit the `python.d/hddtemp.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/hddtemp.conf -``` - -Sample: - -```yaml -update_every: 3 -host: "127.0.0.1" -port: 7634 -``` - -If no configuration is given, module will attempt to connect to hddtemp daemon on `127.0.0.1:7634` address - - - - -### Troubleshooting - -To troubleshoot issues with the `hddtemp` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. 
- -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `hddtemp` module in debug mode: - -```bash -./python.d.plugin hddtemp debug trace -``` - +integrations/hdd_temperature.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/hddtemp/integrations/hdd_temperature.md b/collectors/python.d.plugin/hddtemp/integrations/hdd_temperature.md new file mode 100644 index 000000000..29512bba3 --- /dev/null +++ b/collectors/python.d.plugin/hddtemp/integrations/hdd_temperature.md @@ -0,0 +1,216 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/hddtemp/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/hddtemp/metadata.yaml" +sidebar_label: "HDD temperature" +learn_status: "Published" +learn_rel_path: "Data Collection/Hardware Devices and Sensors" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# HDD temperature + + +<img src="https://netdata.cloud/img/hard-drive.svg" width="150"/> + + +Plugin: python.d.plugin +Module: hddtemp + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors disk temperatures. + + +It uses the `hddtemp` daemon to gather the metrics. + + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +By default, this collector will attempt to connect to the `hddtemp` daemon on `127.0.0.1:7634` + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per HDD temperature instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| hddtemp.temperatures | a dimension per disk | Celsius | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Run `hddtemp` in daemon mode + +You can execute `hddtemp` in TCP/IP daemon mode by using the `-d` argument. + +Running `hddtemp -d` starts the daemon, by default on port 7634. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/hddtemp.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/hddtemp.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + +By default, this collector tries to autodetect disks (autodetection works only for disks whose names start with `sd`). To override this, set the `devices` option to an array of the desired disks. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. 
| 1 | False | +| priority | Controls the order of charts on the Netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them is allowed to run at any time. This allows autodetection to try several alternatives and pick the one that works. | local | False | +| devices | Array of desired disks to detect, in case their name doesn't start with `sd`. | | False | +| host | The IP or HOSTNAME to connect to. | localhost | True | +| port | The port to connect to. | 7634 | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration. + +```yaml +localhost: + name: 'local' + host: '127.0.0.1' + port: 7634 + +``` +##### Custom disk names + +An example defining the disk names to detect. + +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local' + host: '127.0.0.1' + port: 7634 + devices: + - customdisk1 + - customdisk2 + +``` +</details> + +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local' + host: '127.0.0.1' + port: 7634 + +remote_job: + name : 'remote' + host : '192.0.2.1' + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `hddtemp` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. 
+ + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin hddtemp debug trace + ``` + + diff --git a/collectors/python.d.plugin/hddtemp/metadata.yaml b/collectors/python.d.plugin/hddtemp/metadata.yaml index ee62dc96d..d8b56fc66 100644 --- a/collectors/python.d.plugin/hddtemp/metadata.yaml +++ b/collectors/python.d.plugin/hddtemp/metadata.yaml @@ -105,7 +105,7 @@ modules: examples: folding: enabled: true - title: "" + title: "Config" list: - name: Basic description: A basic example configuration. diff --git a/collectors/python.d.plugin/hpssa/README.md b/collectors/python.d.plugin/hpssa/README.md index 12b250475..82802d8b4 100644..120000 --- a/collectors/python.d.plugin/hpssa/README.md +++ b/collectors/python.d.plugin/hpssa/README.md @@ -1,106 +1 @@ -<!-- -title: "HP Smart Storage Arrays monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/hpssa/README.md" -sidebar_label: "HP Smart Storage Arrays" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Storage" ---> - -# HP Smart Storage Arrays collector - -Monitors controller, cache module, logical and physical drive state and temperature using `ssacli` tool. - -Executed commands: - -- `sudo -n ssacli ctrl all show config detail` - -## Requirements: - -This module uses `ssacli`, which can only be executed by root. It uses -`sudo` and assumes that it is configured such that the `netdata` user can execute `ssacli` as root without a password. - -- Add to your `/etc/sudoers` file: - -`which ssacli` shows the full path to the binary. 
- -```bash -netdata ALL=(root) NOPASSWD: /path/to/ssacli -``` - -- Reset Netdata's systemd - unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux - distributions with systemd) - -The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `ssacli` using `sudo`. - -As the `root` user, do the following: - -```cmd -mkdir /etc/systemd/system/netdata.service.d -echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf -systemctl daemon-reload -systemctl restart netdata.service -``` - -## Charts - -- Controller status -- Controller temperature -- Logical drive status -- Physical drive status -- Physical drive temperature - -## Enable the collector - -The `hpssa` collector is disabled by default. To enable it, use `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` -file. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d.conf -``` - -Change the value of the `hpssa` setting to `yes`. Save the file and restart the Netdata Agent with `sudo systemctl -restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -## Configuration - -Edit the `python.d/hpssa.conf` configuration file using `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. 
- -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/hpssa.conf -``` - -If `ssacli` cannot be found in the `PATH`, configure it in `hpssa.conf`. - -```yaml -ssacli_path: /usr/sbin/ssacli -``` - -Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -### Troubleshooting - -To troubleshoot issues with the `hpssa` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `hpssa` module in debug mode: - -```bash -./python.d.plugin hpssa debug trace -``` - +integrations/hp_smart_storage_arrays.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/hpssa/integrations/hp_smart_storage_arrays.md b/collectors/python.d.plugin/hpssa/integrations/hp_smart_storage_arrays.md new file mode 100644 index 000000000..8ec7a5c5c --- /dev/null +++ b/collectors/python.d.plugin/hpssa/integrations/hp_smart_storage_arrays.md @@ -0,0 +1,204 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/hpssa/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/hpssa/metadata.yaml" +sidebar_label: "HP Smart Storage Arrays" +learn_status: "Published" +learn_rel_path: "Data Collection/Storage, Mount Points and Filesystems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# HP Smart Storage Arrays + + +<img src="https://netdata.cloud/img/hp.svg" width="150"/> + + +Plugin: python.d.plugin +Module: hpssa + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors HP Smart Storage Arrays metrics about operational statuses and temperatures. + +It uses the command line tool `ssacli`. The exact command used is `sudo -n ssacli ctrl all show config detail` + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +If no configuration is provided, the collector will try to execute the `ssacli` binary. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per HP Smart Storage Arrays instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| hpssa.ctrl_status | ctrl_{adapter slot}_status, cache_{adapter slot}_status, battery_{adapter slot}_status per adapter | Status | +| hpssa.ctrl_temperature | ctrl_{adapter slot}_temperature, cache_{adapter slot}_temperature per adapter | Celsius | +| hpssa.ld_status | a dimension per logical drive | Status | +| hpssa.pd_status | a dimension per physical drive | Status | +| hpssa.pd_temperature | a dimension per physical drive | Celsius | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Enable the hpssa collector + +The `hpssa` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` file. + +```bash +cd /etc/netdata # Replace this path with your Netdata config directory, if different +sudo ./edit-config python.d.conf +``` + +Change the value of the `hpssa` setting to `yes`. Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. + + +#### Allow user netdata to execute `ssacli` as root. + +This module uses `ssacli`, which can only be executed by root. It uses `sudo` and assumes that it is configured such that the `netdata` user can execute `ssacli` as root without a password. + +- Add to your `/etc/sudoers` file: + +`which ssacli` shows the full path to the binary. 
+ +```bash +netdata ALL=(root) NOPASSWD: /path/to/ssacli +``` + +- Reset Netdata's systemd + unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux + distributions with systemd) + +The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `ssacli` using `sudo`. + +As the `root` user, do the following: + +```cmd +mkdir /etc/systemd/system/netdata.service.d +echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf +systemctl daemon-reload +systemctl restart netdata.service +``` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/hpssa.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/hpssa.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. 
| 5 | False | +| priority | Controls the order of charts on the Netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them is allowed to run at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| ssacli_path | Path to the `ssacli` command line utility. Configure this if `ssacli` is not in the $PATH | | False | +| use_sudo | Whether or not to use `sudo` to execute `ssacli` | True | False | + +</details> + +#### Examples + +##### Local simple config + +A basic configuration, specifying the path to `ssacli` + +```yaml +local: + ssacli_path: /usr/sbin/ssacli + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `hpssa` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin hpssa debug trace + ``` + + diff --git a/collectors/python.d.plugin/hpssa/metadata.yaml b/collectors/python.d.plugin/hpssa/metadata.yaml index dc91f05e4..7871cc276 100644 --- a/collectors/python.d.plugin/hpssa/metadata.yaml +++ b/collectors/python.d.plugin/hpssa/metadata.yaml @@ -40,6 +40,16 @@ modules: setup: prerequisites: list: + - title: 'Enable the hpssa collector' + description: | + The `hpssa` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` file. + + ```bash + cd /etc/netdata # Replace this path with your Netdata config directory, if different + sudo ./edit-config python.d.conf + ``` + + Change the value of the `hpssa` setting to `yes`. Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - title: 'Allow user netdata to execute `ssacli` as root.' description: | This module uses `ssacli`, which can only be executed by root. It uses `sudo` and assumes that it is configured such that the `netdata` user can execute `ssacli` as root without a password. 
diff --git a/collectors/python.d.plugin/icecast/README.md b/collectors/python.d.plugin/icecast/README.md index 25bbf738e..db3c1b572 100644..120000 --- a/collectors/python.d.plugin/icecast/README.md +++ b/collectors/python.d.plugin/icecast/README.md @@ -1,67 +1 @@ -<!-- -title: "Icecast monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/icecast/README.md" -sidebar_label: "Icecast" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Networking" ---> - -# Icecast collector - -Monitors the number of listeners for active sources. - -## Requirements - -- icecast version >= 2.4.0 - -It produces the following charts: - -1. **Listeners** in listeners - -- source number - -## Configuration - -Edit the `python.d/icecast.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/icecast.conf -``` - -Needs only `url` to server's `/status-json.xsl` - -Here is an example for remote server: - -```yaml -remote: - url : 'http://1.2.3.4:8443/status-json.xsl' -``` - -Without configuration, module attempts to connect to `http://localhost:8443/status-json.xsl` - - - - -### Troubleshooting - -To troubleshoot issues with the `icecast` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. 
- -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `icecast` module in debug mode: - -```bash -./python.d.plugin icecast debug trace -``` - +integrations/icecast.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/icecast/integrations/icecast.md b/collectors/python.d.plugin/icecast/integrations/icecast.md new file mode 100644 index 000000000..06c317864 --- /dev/null +++ b/collectors/python.d.plugin/icecast/integrations/icecast.md @@ -0,0 +1,165 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/icecast/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/icecast/metadata.yaml" +sidebar_label: "Icecast" +learn_status: "Published" +learn_rel_path: "Data Collection/Media Services" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Icecast + + +<img src="https://netdata.cloud/img/icecast.svg" width="150"/> + + +Plugin: python.d.plugin +Module: icecast + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors Icecast listener counts. + +It connects to an icecast URL and uses the `status-json.xsl` endpoint to retrieve statistics. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +Without configuration, the collector attempts to connect to http://localhost:8443/status-json.xsl + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Icecast instance + +These metrics refer to the entire monitored application. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| icecast.listeners | a dimension for each active source | listeners | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Icecast minimum version + +Requires Icecast version 2.4.0 or later. + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/icecast.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/icecast.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. 
This allows autodetection to try several alternatives and pick the one that works. | | False | +| url | The URL (and port) to the icecast server. Needs to also include `/status-json.xsl` | http://localhost:8443/status-json.xsl | False | +| user | Username to use to connect to `url` if it's password protected. | | False | +| pass | Password to use to connect to `url` if it's password protected. | | False | + +</details> + +#### Examples + +##### Remote Icecast server + +Configure a remote icecast server + +```yaml +remote: + url: 'http://1.2.3.4:8443/status-json.xsl' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `icecast` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin icecast debug trace + ``` + + diff --git a/collectors/python.d.plugin/ipfs/README.md b/collectors/python.d.plugin/ipfs/README.md index c990ae34f..eee6a07b2 100644..120000 --- a/collectors/python.d.plugin/ipfs/README.md +++ b/collectors/python.d.plugin/ipfs/README.md @@ -1,74 +1 @@ -<!-- -title: "IPFS monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/ipfs/README.md" -sidebar_label: "IPFS" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Storage" ---> - -# IPFS collector - -Collects [`IPFS`](https://ipfs.io) basic information like file system bandwidth, peers and repo metrics. 
- -## Charts - -It produces the following charts: - -- Bandwidth in `kilobits/s` -- Peers in `peers` -- Repo Size in `GiB` -- Repo Objects in `objects` - -## Configuration - -Edit the `python.d/ipfs.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/ipfs.conf -``` - - - -Calls to the following endpoints are disabled due to `IPFS` bugs: - -- `/api/v0/stats/repo` (https://github.com/ipfs/go-ipfs/issues/3874) -- `/api/v0/pin/ls` (https://github.com/ipfs/go-ipfs/issues/7528) - -Can be enabled in the collector configuration file. - -The configuration needs only `url` to `IPFS` server, here is an example for 2 `IPFS` instances: - -```yaml -localhost: - url: 'http://localhost:5001' - -remote: - url: 'http://203.0.113.10::5001' -``` - - - - -### Troubleshooting - -To troubleshoot issues with the `ipfs` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `ipfs` module in debug mode: - -```bash -./python.d.plugin ipfs debug trace -``` - +integrations/ipfs.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/ipfs/integrations/ipfs.md b/collectors/python.d.plugin/ipfs/integrations/ipfs.md new file mode 100644 index 000000000..c43c27b34 --- /dev/null +++ b/collectors/python.d.plugin/ipfs/integrations/ipfs.md @@ -0,0 +1,202 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/ipfs/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/ipfs/metadata.yaml" +sidebar_label: "IPFS" +learn_status: "Published" +learn_rel_path: "Data Collection/Storage, Mount Points and Filesystems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# IPFS + + +<img src="https://netdata.cloud/img/ipfs.svg" width="150"/> + + +Plugin: python.d.plugin +Module: ipfs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors IPFS server metrics about its quality and performance. + +It connects to an http endpoint of the IPFS server to collect the metrics + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +If the endpoint is accessible by the Agent, netdata will autodetect it + +#### Limits + +Calls to the following endpoints are disabled due to IPFS bugs: + +/api/v0/stats/repo (https://github.com/ipfs/go-ipfs/issues/3874) +/api/v0/pin/ls (https://github.com/ipfs/go-ipfs/issues/7528) + + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per IPFS instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| ipfs.bandwidth | in, out | kilobits/s | +| ipfs.peers | peers | peers | +| ipfs.repo_size | avail, size | GiB | +| ipfs.repo_objects | objects, pinned, recursive_pins | objects | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ ipfs_datastore_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/ipfs.conf) | ipfs.repo_size | IPFS datastore utilization | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/ipfs.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/ipfs.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. 
| 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | The JOB's name as it will appear at the dashboard (by default is the job_name) | job_name | False | +| url | URL to the IPFS API | no | True | +| repoapi | Collect repo metrics. | no | False | +| pinapi | Set status of IPFS pinned object polling. | no | False | + +</details> + +#### Examples + +##### Basic (default out-of-the-box) + +A basic example configuration, one job will run at a time. Autodetect mechanism uses it by default. + +```yaml +localhost: + name: 'local' + url: 'http://localhost:5001' + repoapi: no + pinapi: no + +``` +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local' + url: 'http://localhost:5001' + repoapi: no + pinapi: no + +remote_host: + name: 'remote' + url: 'http://192.0.2.1:5001' + repoapi: no + pinapi: no + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `ipfs` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin ipfs debug trace + ``` + + diff --git a/collectors/python.d.plugin/litespeed/README.md b/collectors/python.d.plugin/litespeed/README.md index 1ad5ad42c..e7418b3dc 100644..120000 --- a/collectors/python.d.plugin/litespeed/README.md +++ b/collectors/python.d.plugin/litespeed/README.md @@ -1,95 +1 @@ -<!-- -title: "LiteSpeed monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/litespeed/README.md" -sidebar_label: "LiteSpeed" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Application Performance Monitoring" ---> - -# LiteSpeed collector - -Collects web server performance metrics for network, connection, requests, and cache. - -It produces: - -1. **Network Throughput HTTP** in kilobits/s - - - in - - out - -2. **Network Throughput HTTPS** in kilobits/s - - - in - - out - -3. **Connections HTTP** in connections - - - free - - used - -4. **Connections HTTPS** in connections - - - free - - used - -5. **Requests** in requests/s - - - requests - -6. **Requests In Processing** in requests - - - processing - -7. **Public Cache Hits** in hits/s - - - hits - -8. **Private Cache Hits** in hits/s - - - hits - -9. **Static Hits** in hits/s - - - hits - -## Configuration - -Edit the `python.d/litespeed.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/litespeed.conf -``` - -```yaml -local: - path : 'PATH' -``` - -If no configuration is given, module will use "/tmp/lshttpd/". 
- - - - -### Troubleshooting - -To troubleshoot issues with the `litespeed` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `litespeed` module in debug mode: - -```bash -./python.d.plugin litespeed debug trace -``` - +integrations/litespeed.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/litespeed/integrations/litespeed.md b/collectors/python.d.plugin/litespeed/integrations/litespeed.md new file mode 100644 index 000000000..511c112e9 --- /dev/null +++ b/collectors/python.d.plugin/litespeed/integrations/litespeed.md @@ -0,0 +1,169 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/litespeed/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/litespeed/metadata.yaml" +sidebar_label: "Litespeed" +learn_status: "Published" +learn_rel_path: "Data Collection/Web Servers and Web Proxies" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Litespeed + + +<img src="https://netdata.cloud/img/litespeed.svg" width="150"/> + + +Plugin: python.d.plugin +Module: litespeed + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Examine Litespeed metrics for insights into web server operations. Analyze request rates, response times, and error rates for efficient web service delivery. + +The collector uses the statistics under /tmp/lshttpd to gather the metrics. + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +If no configuration is present, the collector will attempt to read files under /tmp/lshttpd/. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per Litespeed instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| litespeed.net_throughput | in, out | kilobits/s | +| litespeed.net_throughput | in, out | kilobits/s | +| litespeed.connections | free, used | conns | +| litespeed.connections | free, used | conns | +| litespeed.requests | requests | requests/s | +| litespeed.requests_processing | processing | requests | +| litespeed.cache | hits | hits/s | +| litespeed.cache | hits | hits/s | +| litespeed.static | hits | hits/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/litespeed.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/litespeed.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. 
| 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| path | Use a different path than the default, where the litespeed stats files reside. | /tmp/lshttpd/ | False | + +</details> + +#### Examples + +##### Set the path to statistics + +Change the path for the litespeed stats files + +```yaml +localhost: + name: 'local' + path: '/tmp/lshttpd' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `litespeed` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin litespeed debug trace + ``` + + diff --git a/collectors/python.d.plugin/megacli/README.md b/collectors/python.d.plugin/megacli/README.md index 1af4d0ea7..e5df4d41d 100644..120000 --- a/collectors/python.d.plugin/megacli/README.md +++ b/collectors/python.d.plugin/megacli/README.md @@ -1,109 +1 @@ -<!-- -title: "MegaRAID controller monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/megacli/README.md" -sidebar_label: "MegaRAID controllers" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Devices" ---> - -# MegaRAID controller collector - -Collects adapter, physical drives and battery stats using `megacli` command-line tool. - -Executed commands: - -- `sudo -n megacli -LDPDInfo -aAll` -- `sudo -n megacli -AdpBbuCmd -a0` - -## Requirements - -The module uses `megacli`, which can only be executed by `root`. It uses -`sudo` and assumes that it is configured such that the `netdata` user can execute `megacli` as root without a password. - -- Add to your `/etc/sudoers` file: - -`which megacli` shows the full path to the binary. - -```bash -netdata ALL=(root) NOPASSWD: /path/to/megacli -``` - -- Reset Netdata's systemd - unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux - distributions with systemd) - -The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `megacli` using `sudo`. 
- - -As the `root` user, do the following: - -```cmd -mkdir /etc/systemd/system/netdata.service.d -echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf -systemctl daemon-reload -systemctl restart netdata.service -``` - -## Charts - -- Adapter State -- Physical Drives Media Errors -- Physical Drives Predictive Failures -- Battery Relative State of Charge -- Battery Cycle Count - -## Enable the collector - -The `megacli` collector is disabled by default. To enable it, use `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` -file. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d.conf -``` - -Change the value of the `megacli` setting to `yes`. Save the file and restart the Netdata Agent -with `sudo systemctl restart netdata`, or the appropriate method for your system. - -## Configuration - -Edit the `python.d/megacli.conf` configuration file using `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/megacli.conf -``` - -Battery stats disabled by default. To enable them, modify `megacli.conf`. - -```yaml -do_battery: yes -``` - -Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate -method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - - -### Troubleshooting - -To troubleshoot issues with the `megacli` module, run the `python.d.plugin` with the debug option enabled. 
The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `megacli` module in debug mode: - -```bash -./python.d.plugin megacli debug trace -``` - +integrations/megacli.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/megacli/integrations/megacli.md b/collectors/python.d.plugin/megacli/integrations/megacli.md new file mode 100644 index 000000000..bb3bdf6f2 --- /dev/null +++ b/collectors/python.d.plugin/megacli/integrations/megacli.md @@ -0,0 +1,219 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/megacli/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/megacli/metadata.yaml" +sidebar_label: "MegaCLI" +learn_status: "Published" +learn_rel_path: "Data Collection/Storage, Mount Points and Filesystems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# MegaCLI + + +<img src="https://netdata.cloud/img/hard-drive.svg" width="150"/> + + +Plugin: python.d.plugin +Module: megacli + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Examine MegaCLI metrics with Netdata for insights into RAID controller performance. Improve your RAID controller efficiency with real-time MegaCLI metrics. + +Collects adapter, physical drive, and battery stats using the megacli command-line tool. + +Executed commands: + + - `sudo -n megacli -LDPDInfo -aAll` + - `sudo -n megacli -AdpBbuCmd -a0` + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + +The module uses megacli, which can only be executed by root. It uses sudo and assumes that it is configured such that the netdata user can execute megacli as root without a password. + +### Default Behavior + +#### Auto-Detection + +After all the permissions are satisfied, netdata should be able to execute commands via the megacli command line utility. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. 
+ +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per MegaCLI instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| megacli.adapter_degraded | a dimension per adapter | is degraded | +| megacli.pd_media_error | a dimension per physical drive | errors/s | +| megacli.pd_predictive_failure | a dimension per physical drive | failures/s | + +### Per battery + +Metrics related to Battery Backup Units, each BBU provides its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| megacli.bbu_relative_charge | adapter {battery id} | percentage | +| megacli.bbu_cycle_count | adapter {battery id} | cycle count | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ megacli_adapter_state ](https://github.com/netdata/netdata/blob/master/health/health.d/megacli.conf) | megacli.adapter_degraded | adapter is in the degraded state (0: false, 1: true) | +| [ megacli_pd_media_errors ](https://github.com/netdata/netdata/blob/master/health/health.d/megacli.conf) | megacli.pd_media_error | number of physical drive media errors | +| [ megacli_pd_predictive_failures ](https://github.com/netdata/netdata/blob/master/health/health.d/megacli.conf) | megacli.pd_predictive_failure | number of physical drive predictive failures | +| [ megacli_bbu_relative_charge ](https://github.com/netdata/netdata/blob/master/health/health.d/megacli.conf) | megacli.bbu_relative_charge | average battery backup unit (BBU) relative state of 
charge over the last 10 seconds | +| [ megacli_bbu_cycle_count ](https://github.com/netdata/netdata/blob/master/health/health.d/megacli.conf) | megacli.bbu_cycle_count | average battery backup unit (BBU) charge cycles count over the last 10 seconds | + + +## Setup + +### Prerequisites + +#### Grant permissions for netdata, to run megacli as sudoer + +The module uses megacli, which can only be executed by root. It uses sudo and assumes that it is configured such that the netdata user can execute megacli as root without a password. + +Add the entry below to your /etc/sudoers file. +`which megacli` shows the full path to the binary. + +```bash +netdata ALL=(root) NOPASSWD: /path/to/megacli +``` + + +#### Reset Netdata's systemd unit CapabilityBoundingSet (Linux distributions with systemd) + +The default CapabilityBoundingSet doesn't allow using sudo, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute megacli using sudo. + +As root user, do the following: + +```bash +mkdir /etc/systemd/system/netdata.service.d +echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf +systemctl daemon-reload +systemctl restart netdata.service +``` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/megacli.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/megacli.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. 
+ +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| do_battery | default is no. Battery stats (adds additional call to megacli `megacli -AdpBbuCmd -a0`). | no | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration per job + +```yaml +job_name: + name: myname + update_every: 1 + priority: 60000 + penalty: yes + autodetection_retry: 0 + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `megacli` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin megacli debug trace + ``` + + diff --git a/collectors/python.d.plugin/megacli/metadata.yaml b/collectors/python.d.plugin/megacli/metadata.yaml index f75a8d2ab..4a2ba43ee 100644 --- a/collectors/python.d.plugin/megacli/metadata.yaml +++ b/collectors/python.d.plugin/megacli/metadata.yaml @@ -27,8 +27,8 @@ modules: Executed commands: - sudo -n megacli -LDPDInfo -aAll - sudo -n megacli -AdpBbuCmd -a0 + - `sudo -n megacli -LDPDInfo -aAll` + - `sudo -n megacli -AdpBbuCmd -a0` supported_platforms: include: [] exclude: [] diff --git a/collectors/python.d.plugin/memcached/README.md b/collectors/python.d.plugin/memcached/README.md index 612bd49d7..2cb76d33c 100644..120000 --- a/collectors/python.d.plugin/memcached/README.md +++ b/collectors/python.d.plugin/memcached/README.md @@ -1,122 +1 @@ -<!-- -title: "Memcached monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/memcached/README.md" -sidebar_label: "Memcached" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Databases" ---> - -# Memcached collector - -Collects memory-caching system performance metrics. It reads server response to stats command ([stats interface](https://github.com/memcached/memcached/wiki/Commands#stats)). - - -1. **Network** in kilobytes/s - - - read - - written - -2. **Connections** per second - - - current - - rejected - - total - -3. **Items** in cluster - - - current - - total - -4. **Evicted and Reclaimed** items - - - evicted - - reclaimed - -5. **GET** requests/s - - - hits - - misses - -6. **GET rate** rate in requests/s - - - rate - -7. **SET rate** rate in requests/s - - - rate - -8. **DELETE** requests/s - - - hits - - misses - -9. **CAS** requests/s - - - hits - - misses - - bad value - -10. **Increment** requests/s - - - hits - - misses - -11. 
**Decrement** requests/s - - - hits - - misses - -12. **Touch** requests/s - - - hits - - misses - -13. **Touch rate** rate in requests/s - - - rate - -## Configuration - -Edit the `python.d/memcached.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/memcached.conf -``` - -Sample: - -```yaml -localtcpip: - name : 'local' - host : '127.0.0.1' - port : 24242 -``` - -If no configuration is given, module will attempt to connect to memcached instance on `127.0.0.1:11211` address. - - - - -### Troubleshooting - -To troubleshoot issues with the `memcached` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `memcached` module in debug mode: - -```bash -./python.d.plugin memcached debug trace -``` - +integrations/memcached.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/memcached/integrations/memcached.md b/collectors/python.d.plugin/memcached/integrations/memcached.md new file mode 100644 index 000000000..012758304 --- /dev/null +++ b/collectors/python.d.plugin/memcached/integrations/memcached.md @@ -0,0 +1,214 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/memcached/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/memcached/metadata.yaml" +sidebar_label: "Memcached" +learn_status: "Published" +learn_rel_path: "Data Collection/Databases" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Memcached + + +<img src="https://netdata.cloud/img/memcached.svg" width="150"/> + + +Plugin: python.d.plugin +Module: memcached + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor Memcached metrics for proficient in-memory key-value store operations. Track cache hits, misses, and memory usage for efficient data caching. + +It reads server response to stats command ([stats interface](https://github.com/memcached/memcached/wiki/Commands#stats)). + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +If no configuration is given, collector will attempt to connect to memcached instance on `127.0.0.1:11211` address. + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels. + + + +### Per Memcached instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| memcached.cache | available, used | MiB | +| memcached.net | in, out | kilobits/s | +| memcached.connections | current, rejected, total | connections/s | +| memcached.items | current, total | items | +| memcached.evicted_reclaimed | reclaimed, evicted | items | +| memcached.get | hits, misses | requests | +| memcached.get_rate | rate | requests/s | +| memcached.set_rate | rate | requests/s | +| memcached.delete | hits, misses | requests | +| memcached.cas | hits, misses, bad value | requests | +| memcached.increment | hits, misses | requests | +| memcached.decrement | hits, misses | requests | +| memcached.touch | hits, misses | requests | +| memcached.touch_rate | rate | requests/s | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ memcached_cache_memory_usage ](https://github.com/netdata/netdata/blob/master/health/health.d/memcached.conf) | memcached.cache | cache memory utilization | +| [ memcached_cache_fill_rate ](https://github.com/netdata/netdata/blob/master/health/health.d/memcached.conf) | memcached.cache | average rate the cache fills up (positive), or frees up (negative) space over the last hour | +| [ memcached_out_of_cache_space_time ](https://github.com/netdata/netdata/blob/master/health/health.d/memcached.conf) | memcached.cache | estimated time the cache will run out of space if the system continues to add data at the same rate as the past hour | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/memcached.conf`. 
+ + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/memcached.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| host | the host to connect to. | 127.0.0.1 | False | +| port | the port to connect to. | 11211 | False | +| update_every | Sets the default data collection frequency. | 10 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | + +</details> + +#### Examples + +##### localhost + +An example configuration for localhost. + +```yaml +localhost: + name: 'local' + host: 'localhost' + port: 11211 + +``` +##### localipv4 + +An example configuration for localipv4. 
+ +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local' + host: '127.0.0.1' + port: 11211 + +``` +</details> + +##### localipv6 + +An example configuration for localipv6. + +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local' + host: '::1' + port: 11211 + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `memcached` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin memcached debug trace + ``` + + diff --git a/collectors/python.d.plugin/monit/README.md b/collectors/python.d.plugin/monit/README.md index f762de0d3..ac69496f4 100644..120000 --- a/collectors/python.d.plugin/monit/README.md +++ b/collectors/python.d.plugin/monit/README.md @@ -1,78 +1 @@ -<!-- -title: "Monit monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/monit/README.md" -sidebar_label: "Monit" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Storage" ---> - -# Monit collector - -Monit monitoring module. Data is grabbed from stats XML interface (exists for a long time, but not mentioned in official -documentation). Mostly this plugin shows statuses of monit targets, i.e. -[statuses of specified checks](https://mmonit.com/monit/documentation/monit.html#Service-checks). - -1. **Filesystems** - - - Filesystems - - Directories - - Files - - Pipes - -2. 
**Applications** - - - Processes (+threads/childs) - - Programs - -3. **Network** - - - Hosts (+latency) - - Network interfaces - -## Configuration - -Edit the `python.d/monit.conf` configuration file using `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically -at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/monit.conf -``` - -Sample: - -```yaml -local: - name: 'local' - url: 'http://localhost:2812' - user: : admin - pass: : monit -``` - -If no configuration is given, module will attempt to connect to monit as `http://localhost:2812`. - - - - -### Troubleshooting - -To troubleshoot issues with the `monit` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `monit` module in debug mode: - -```bash -./python.d.plugin monit debug trace -``` - +integrations/monit.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/monit/integrations/monit.md b/collectors/python.d.plugin/monit/integrations/monit.md new file mode 100644 index 000000000..ecf522f84 --- /dev/null +++ b/collectors/python.d.plugin/monit/integrations/monit.md @@ -0,0 +1,213 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/monit/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/monit/metadata.yaml" +sidebar_label: "Monit" +learn_status: "Published" +learn_rel_path: "Data Collection/Synthetic Checks" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Monit + + +<img src="https://netdata.cloud/img/monit.png" width="150"/> + + +Plugin: python.d.plugin +Module: monit + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors Monit targets such as filesystems, directories, files, FIFO pipes and more. + + +It gathers data from Monit's XML interface. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +By default, this collector will attempt to connect to Monit at `http://localhost:2812` + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Monit instance + +These metrics refer to the entire monitored application. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| monit.filesystems | a dimension per target | filesystems | +| monit.directories | a dimension per target | directories | +| monit.files | a dimension per target | files | +| monit.fifos | a dimension per target | pipes | +| monit.programs | a dimension per target | programs | +| monit.services | a dimension per target | processes | +| monit.process_uptime | a dimension per target | seconds | +| monit.process_threads | a dimension per target | threads | +| monit.process_childrens | a dimension per target | children | +| monit.hosts | a dimension per target | hosts | +| monit.host_latency | a dimension per target | milliseconds | +| monit.networks | a dimension per target | interfaces | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/monit.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/monit.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. 
+ + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 1 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | local | False | +| url | The URL to fetch Monit's metrics. | http://localhost:2812 | True | +| user | Username in case the URL is password protected. | | False | +| pass | Password in case the URL is password protected. | | False | + +</details> + +#### Examples + +##### Basic + +A basic configuration example. + +```yaml +localhost: + name : 'local' + url : 'http://localhost:2812' + +``` +##### Basic Authentication + +Example using basic username and password in order to authenticate. + +<details><summary>Config</summary> + +```yaml +localhost: + name : 'local' + url : 'http://localhost:2812' + user: 'foo' + pass: 'bar' + +``` +</details> + +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local' + url: 'http://localhost:2812' + +remote_job: + name: 'remote' + url: 'http://192.0.2.1:2812' + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `monit` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. 
+ +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin monit debug trace + ``` + + diff --git a/collectors/python.d.plugin/nsd/README.md b/collectors/python.d.plugin/nsd/README.md index ccc4e712b..59fcfe491 100644..120000 --- a/collectors/python.d.plugin/nsd/README.md +++ b/collectors/python.d.plugin/nsd/README.md @@ -1,91 +1 @@ -<!-- -title: "NSD monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/nsd/README.md" -sidebar_label: "NSD" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Networking" ---> - -# NSD collector - -Uses the `nsd-control stats_noreset` command to provide `nsd` statistics. - -## Requirements - -- Version of `nsd` must be 4.0+ -- Netdata must have permissions to run `nsd-control stats_noreset` - -It produces: - -1. **Queries** - - - queries - -2. **Zones** - - - master - - slave - -3. **Protocol** - - - udp - - udp6 - - tcp - - tcp6 - -4. **Query Type** - - - A - - NS - - CNAME - - SOA - - PTR - - HINFO - - MX - - NAPTR - - TXT - - AAAA - - SRV - - ANY - -5. **Transfer** - - - NOTIFY - - AXFR - -6. **Return Code** - - - NOERROR - - FORMERR - - SERVFAIL - - NXDOMAIN - - NOTIMP - - REFUSED - - YXDOMAIN - -Configuration is not needed. - - - - -### Troubleshooting - -To troubleshoot issues with the `nsd` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. 
- -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `nsd` module in debug mode: - -```bash -./python.d.plugin nsd debug trace -``` - +integrations/name_server_daemon.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/nsd/integrations/name_server_daemon.md b/collectors/python.d.plugin/nsd/integrations/name_server_daemon.md new file mode 100644 index 000000000..8ed86bdf9 --- /dev/null +++ b/collectors/python.d.plugin/nsd/integrations/name_server_daemon.md @@ -0,0 +1,198 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/nsd/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/nsd/metadata.yaml" +sidebar_label: "Name Server Daemon" +learn_status: "Published" +learn_rel_path: "Data Collection/DNS and DHCP Servers" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Name Server Daemon + + +<img src="https://netdata.cloud/img/nsd.svg" width="150"/> + + +Plugin: python.d.plugin +Module: nsd + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors NSD statistics like queries, zones, protocols, query types and more. + + +It uses the `nsd-control stats_noreset` command to gather metrics. + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +If permissions are satisfied, the collector will be able to run `nsd-control stats_noreset`, thus collecting metrics. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per Name Server Daemon instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| nsd.queries | queries | queries/s | +| nsd.zones | master, slave | zones | +| nsd.protocols | udp, udp6, tcp, tcp6 | queries/s | +| nsd.type | A, NS, CNAME, SOA, PTR, HINFO, MX, NAPTR, TXT, AAAA, SRV, ANY | queries/s | +| nsd.transfer | NOTIFY, AXFR | queries/s | +| nsd.rcode | NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED, YXDOMAIN | queries/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### NSD version + +The version of `nsd` must be 4.0+. + + +#### Provide Netdata the permissions to run the command + +Netdata must have permissions to run the `nsd-control stats_noreset` command. + +You can: + +- Add "netdata" user to "nsd" group: + ``` + usermod -aG nsd netdata + ``` +- Add Netdata to sudoers + 1. Edit the sudoers file: + ``` + visudo -f /etc/sudoers.d/netdata + ``` + 2. Add the entry: + ``` + Defaults:netdata !requiretty + netdata ALL=(ALL) NOPASSWD: /usr/sbin/nsd-control stats_noreset + ``` + + > Note that you will need to set the `command` option to `sudo /usr/sbin/nsd-control stats_noreset` if you use this method. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/nsd.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/nsd.conf +``` +#### Options + +This particular collector does not need further configuration to work if permissions are satisfied, but you can always customize its data collection behavior. 
+ +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 30 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| command | The command to run | nsd-control stats_noreset | False | + +</details> + +#### Examples + +##### Basic + +A basic configuration example. + +```yaml +local: + name: 'nsd_local' + command: 'nsd-control stats_noreset' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `nsd` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. 
+ + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin nsd debug trace + ``` + + diff --git a/collectors/python.d.plugin/nsd/metadata.yaml b/collectors/python.d.plugin/nsd/metadata.yaml index bd0a256f3..f5e2c46b0 100644 --- a/collectors/python.d.plugin/nsd/metadata.yaml +++ b/collectors/python.d.plugin/nsd/metadata.yaml @@ -40,6 +40,9 @@ modules: setup: prerequisites: list: + - title: NSD version + description: | + The version of `nsd` must be 4.0+. - title: Provide Netdata the permissions to run the command description: | Netdata must have permissions to run the `nsd-control stats_noreset` command. diff --git a/collectors/python.d.plugin/openldap/README.md b/collectors/python.d.plugin/openldap/README.md index eddf40b2c..45f36b9b9 100644..120000 --- a/collectors/python.d.plugin/openldap/README.md +++ b/collectors/python.d.plugin/openldap/README.md @@ -1,102 +1 @@ -<!-- -title: "OpenLDAP monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/openldap/README.md" -sidebar_label: "OpenLDAP" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Networking" ---> - -# OpenLDAP collector - -Provides statistics information from openldap (slapd) server. -Statistics are taken from LDAP monitoring interface. Manual page, slapd-monitor(5) is available. - -**Requirement:** - -- Follow instructions from <https://www.openldap.org/doc/admin24/monitoringslapd.html> to activate monitoring interface. -- Install python ldap module `pip install ldap` or `yum install python-ldap` -- Modify openldap.conf with your credentials - -### Module gives information with following charts: - -1. **connections** - - - total connections number - -2. **Bytes** - - - sent - -3. **operations** - - - completed - - initiated - -4. 
**referrals** - - - sent - -5. **entries** - - - sent - -6. **ldap operations** - - - bind - - search - - unbind - - add - - delete - - modify - - compare - -7. **waiters** - - - read - - write - -## Configuration - -Edit the `python.d/openldap.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/openldap.conf -``` - -Sample: - -```yaml -openldap: - name : 'local' - username : "cn=monitor,dc=superb,dc=eu" - password : "testpass" - server : 'localhost' - port : 389 -``` - - - - -### Troubleshooting - -To troubleshoot issues with the `openldap` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `openldap` module in debug mode: - -```bash -./python.d.plugin openldap debug trace -``` - +integrations/openldap.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/openldap/integrations/openldap.md b/collectors/python.d.plugin/openldap/integrations/openldap.md new file mode 100644 index 000000000..375132edb --- /dev/null +++ b/collectors/python.d.plugin/openldap/integrations/openldap.md @@ -0,0 +1,214 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/openldap/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/openldap/metadata.yaml" +sidebar_label: "OpenLDAP" +learn_status: "Published" +learn_rel_path: "Data Collection/Authentication and Authorization" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# OpenLDAP + + +<img src="https://netdata.cloud/img/statsd.png" width="150"/> + + +Plugin: python.d.plugin +Module: openldap + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors OpenLDAP metrics about connections, operations, referrals and more. + +Statistics are taken from the monitoring interface of a openLDAP (slapd) server + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +This collector doesn't work until all the prerequisites are checked. + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per OpenLDAP instance + +These metrics refer to the entire monitored application. 
+ +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| openldap.total_connections | connections | connections/s | +| openldap.traffic_stats | sent | KiB/s | +| openldap.operations_status | completed, initiated | ops/s | +| openldap.referrals | sent | referrals/s | +| openldap.entries | sent | entries/s | +| openldap.ldap_operations | bind, search, unbind, add, delete, modify, compare | ops/s | +| openldap.waiters | write, read | waiters/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Configure the openLDAP server to expose metrics to monitor it. + +Follow instructions from https://www.openldap.org/doc/admin24/monitoringslapd.html to activate monitoring interface. + + +#### Install python-ldap module + +Install python ldap module + +1. From pip package manager + +```bash +pip install ldap +``` + +2. With apt package manager (in most deb based distros) + + +```bash +apt-get install python-ldap +``` + + +3. With yum package manager (in most rpm based distros) + + +```bash +yum install python-ldap +``` + + +#### Insert credentials for Netdata to access openLDAP server + +Use the `ldappasswd` utility to set a password for the username you will use. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/openldap.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/openldap.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. 
+ +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| username | The bind user with right to access monitor statistics | | True | +| password | The password for the binded user | | True | +| server | The listening address of the LDAP server. In case of TLS, use the hostname which the certificate is published for. | | True | +| port | The listening port of the LDAP server. Change to 636 port in case of TLS connection. | 389 | True | +| use_tls | Make True if a TLS connection is used over ldaps:// | False | False | +| use_start_tls | Make True if a TLS connection is used over ldap:// | False | False | +| cert_check | False if you want to ignore certificate check | True | True | +| timeout | Seconds to timeout if no connection exist | | True | + +</details> + +#### Examples + +##### Basic + +A basic example configuration. 
+ +```yaml +username: "cn=admin" +password: "pass" +server: "localhost" +port: "389" +check_cert: True +timeout: 1 + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `openldap` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin openldap debug trace + ``` + + diff --git a/collectors/python.d.plugin/oracledb/README.md b/collectors/python.d.plugin/oracledb/README.md index 315816de0..a75e3611e 100644..120000 --- a/collectors/python.d.plugin/oracledb/README.md +++ b/collectors/python.d.plugin/oracledb/README.md @@ -1,115 +1 @@ -<!-- -title: "OracleDB monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/oracledb/README.md" -sidebar_label: "OracleDB" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Databases" ---> - -# OracleDB collector - -Monitors the performance and health metrics of the Oracle database. - -## Requirements - -- `oracledb` package. 
- -It produces following charts: - -- session activity - - Session Count - - Session Limit Usage - - Logons -- disk activity - - Physical Disk Reads/Writes - - Sorts On Disk - - Full Table Scans -- database and buffer activity - - Database Wait Time Ratio - - Shared Pool Free Memory - - In-Memory Sorts Ratio - - SQL Service Response Time - - User Rollbacks - - Enqueue Timeouts -- cache - - Cache Hit Ratio - - Global Cache Blocks Events -- activities - - Activities -- wait time - - Wait Time -- tablespace - - Size - - Usage - - Usage In Percent -- allocated space - - Size - - Usage - - Usage In Percent - -## prerequisite - -To use the Oracle module do the following: - -1. Install `oracledb` package ([link](https://python-oracledb.readthedocs.io/en/latest/user_guide/installation.html)). - -2. Create a read-only `netdata` user with proper access to your Oracle Database Server. - -Connect to your Oracle database with an administrative user and execute: - -```SQL -CREATE USER netdata IDENTIFIED BY <PASSWORD>; - -GRANT CONNECT TO netdata; -GRANT SELECT_CATALOG_ROLE TO netdata; -``` - -## Configuration - -Edit the `python.d/oracledb.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/oracledb.conf -``` - -```yaml -local: - user: 'netdata' - password: 'secret' - server: 'localhost:1521' - service: 'XE' - - -remote: - user: 'netdata' - password: 'secret' - server: '10.0.0.1:1521' - service: 'XE' -``` - -All parameters are required. Without them module will fail to start. - - -### Troubleshooting - -To troubleshoot issues with the `oracledb` module, run the `python.d.plugin` with the debug option enabled. 
The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `oracledb` module in debug mode: - -```bash -./python.d.plugin oracledb debug trace -``` - +integrations/oracle_db.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/oracledb/integrations/oracle_db.md b/collectors/python.d.plugin/oracledb/integrations/oracle_db.md new file mode 100644 index 000000000..cb6637e8a --- /dev/null +++ b/collectors/python.d.plugin/oracledb/integrations/oracle_db.md @@ -0,0 +1,225 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/oracledb/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/oracledb/metadata.yaml" +sidebar_label: "Oracle DB" +learn_status: "Published" +learn_rel_path: "Data Collection/Databases" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Oracle DB + + +<img src="https://netdata.cloud/img/oracle.svg" width="150"/> + + +Plugin: python.d.plugin +Module: oracledb + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors OracleDB database metrics about sessions, tables, memory and more. + +It collects the metrics via the supported database client library + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +In order for this collector to work, it needs a read-only user `netdata` in the RDBMS. + + +### Default Behavior + +#### Auto-Detection + +When the requirements are met, databases on the local host on port 1521 will be auto-detected + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ +These metrics refer to the entire monitored application. + +### Per Oracle DB instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| oracledb.session_count | total, active | sessions | +| oracledb.session_limit_usage | usage | % | +| oracledb.logons | logons | events/s | +| oracledb.physical_disk_read_writes | reads, writes | events/s | +| oracledb.sorts_on_disks | sorts | events/s | +| oracledb.full_table_scans | full table scans | events/s | +| oracledb.database_wait_time_ratio | wait time ratio | % | +| oracledb.shared_pool_free_memory | free memory | % | +| oracledb.in_memory_sorts_ratio | in-memory sorts | % | +| oracledb.sql_service_response_time | time | seconds | +| oracledb.user_rollbacks | rollbacks | events/s | +| oracledb.enqueue_timeouts | enqueue timeouts | events/s | +| oracledb.cache_hit_ration | buffer, cursor, library, row | % | +| oracledb.global_cache_blocks | corrupted, lost | events/s | +| oracledb.activity | parse count, execute count, user commits, user rollbacks | events/s | +| oracledb.wait_time | application, configuration, administrative, concurrency, commit, network, user I/O, system I/O, scheduler, other | ms | +| oracledb.tablespace_size | a dimension per active tablespace | KiB | +| oracledb.tablespace_usage | a dimension per active tablespace | KiB | +| oracledb.tablespace_usage_in_percent | a dimension per active tablespace | % | +| oracledb.allocated_size | a dimension per active tablespace | B | +| oracledb.allocated_usage | a dimension per active tablespace | B | +| oracledb.allocated_usage_in_percent | a dimension per active tablespace | % | + + + +## Alerts + +There are no alerts configured by default for this integration. 
+ + +## Setup + +### Prerequisites + +#### Install the python-oracledb package + +You can follow the official guide below to install the required package: + +Source: https://python-oracledb.readthedocs.io/en/latest/user_guide/installation.html + + +#### Create a read only user for netdata + +Follow the official instructions for your oracle RDBMS to create a read-only user for netdata. The operation may follow this approach + +Connect to your Oracle database with an administrative user and execute: + +```bash +CREATE USER netdata IDENTIFIED BY <PASSWORD>; + +GRANT CONNECT TO netdata; +GRANT SELECT_CATALOG_ROLE TO netdata; +``` + + +#### Edit the configuration + +Edit the configuration troubleshooting: + +1. Provide a valid user for the netdata collector to access the database +2. Specify the network target this database is listening. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/oracledb.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/oracledb.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. 
+ + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| user | The username for the user account. | no | True | +| password | The password for the user account. | no | True | +| server | The IP address or hostname (and port) of the Oracle Database Server. | no | True | +| service | The Oracle Database service name. To view the services available on your server run this query, `select SERVICE_NAME from gv$session where sid in (select sid from V$MYSTAT)`. | no | True | +| protocol | one of the strings "tcp" or "tcps" indicating whether to use unencrypted network traffic or encrypted network traffic | no | True | + +</details> + +#### Examples + +##### Basic + +A basic example configuration, two jobs described for two databases. + +```yaml +local: + user: 'netdata' + password: 'secret' + server: 'localhost:1521' + service: 'XE' + protocol: 'tcps' + +remote: + user: 'netdata' + password: 'secret' + server: '10.0.0.1:1521' + service: 'XE' + protocol: 'tcps' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `oracledb` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin oracledb debug trace + ``` + + diff --git a/collectors/python.d.plugin/pandas/README.md b/collectors/python.d.plugin/pandas/README.md index 19b11d5be..2fabe63c1 100644..120000 --- a/collectors/python.d.plugin/pandas/README.md +++ b/collectors/python.d.plugin/pandas/README.md @@ -1,96 +1 @@ -# Ingest structured data (Pandas) - -<a href="https://pandas.pydata.org/" target="_blank"> - <img src="https://pandas.pydata.org/docs/_static/pandas.svg" alt="Pandas" width="100px" height="50px" /> - </a> - -[Pandas](https://pandas.pydata.org/) is a de-facto standard in reading and processing most types of structured data in Python. -If you have metrics appearing in a CSV, JSON, XML, HTML, or [other supported format](https://pandas.pydata.org/docs/user_guide/io.html), -either locally or via some HTTP endpoint, you can easily ingest and present those metrics in Netdata, by leveraging the Pandas collector. - -The collector uses [pandas](https://pandas.pydata.org/) to pull data and do pandas-based -preprocessing, before feeding to Netdata. - -## Requirements - -This collector depends on some Python (Python 3 only) packages that can usually be installed via `pip` or `pip3`. - -```bash -sudo pip install pandas requests -``` - -Note: If you would like to use [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html) to query a database, you will need to install the below packages as well. - -```bash -sudo pip install 'sqlalchemy<2.0' psycopg2-binary -``` - -## Configuration - -Below is an example configuration to query some json weather data from [Open-Meteo](https://open-meteo.com), -do some data wrangling on it and save in format as expected by Netdata. 
- -```yaml -# example pulling some hourly temperature data -temperature: - name: "temperature" - update_every: 3 - chart_configs: - - name: "temperature_by_city" - title: "Temperature By City" - family: "temperature.today" - context: "pandas.temperature" - type: "line" - units: "Celsius" - df_steps: > - pd.DataFrame.from_dict( - {city: requests.get( - f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&hourly=temperature_2m' - ).json()['hourly']['temperature_2m'] - for (city,lat,lng) - in [ - ('dublin', 53.3441, -6.2675), - ('athens', 37.9792, 23.7166), - ('london', 51.5002, -0.1262), - ('berlin', 52.5235, 13.4115), - ('paris', 48.8567, 2.3510), - ] - } - ); # use dictionary comprehension to make multiple requests; - df.describe(); # get aggregate stats for each city; - df.transpose()[['mean', 'max', 'min']].reset_index(); # just take mean, min, max; - df.rename(columns={'index':'city'}); # some column renaming; - df.pivot(columns='city').mean().to_frame().reset_index(); # force to be one row per city; - df.rename(columns={0:'degrees'}); # some column renaming; - pd.concat([df, df['city']+'_'+df['level_0']], axis=1); # add new column combining city and summary measurement label; - df.rename(columns={0:'measurement'}); # some column renaming; - df[['measurement', 'degrees']].set_index('measurement'); # just take two columns we want; - df.sort_index(); # sort by city name; - df.transpose(); # transpose so its just one wide row; -``` - -`chart_configs` is a list of dictionary objects where each one defines the sequence of `df_steps` to be run using [`pandas`](https://pandas.pydata.org/), -and the `name`, `title` etc to define the -[CHART variables](https://github.com/netdata/netdata/blob/master/docs/guides/python-collector.md#create-charts) -that will control how the results will look in netdata. - -The example configuration above would result in a `data` dictionary like the below being collected by Netdata -at each time step. 
They keys in this dictionary will be the "dimensions" of the chart. - -```javascript -{'athens_max': 26.2, 'athens_mean': 19.45952380952381, 'athens_min': 12.2, 'berlin_max': 17.4, 'berlin_mean': 10.764285714285714, 'berlin_min': 5.7, 'dublin_max': 15.3, 'dublin_mean': 12.008928571428571, 'dublin_min': 6.6, 'london_max': 18.9, 'london_mean': 12.510714285714286, 'london_min': 5.2, 'paris_max': 19.4, 'paris_mean': 12.054166666666665, 'paris_min': 4.8} -``` - -Which, given the above configuration would end up as a chart like below in Netdata. - -![pandas collector temperature example chart](https://user-images.githubusercontent.com/2178292/195075312-8ce8cf68-5172-48e3-af09-104ffecfcdd6.png) - -## Notes -- Each line in `df_steps` must return a pandas -[DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object (`df`) at each step. -- You can use -[this colab notebook](https://colab.research.google.com/drive/1VYrddSegZqGtkWGFuiUbMbUk5f3rW6Hi?usp=sharing) -to mock up and work on your `df_steps` iteratively before adding them to your config. -- This collector is expecting one row in the final pandas DataFrame. It is that first row that will be taken -as the most recent values for each dimension on each chart using (`df.to_dict(orient='records')[0]`). -See [pd.to_dict()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html). +integrations/pandas.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/pandas/integrations/pandas.md b/collectors/python.d.plugin/pandas/integrations/pandas.md new file mode 100644 index 000000000..d5da2f262 --- /dev/null +++ b/collectors/python.d.plugin/pandas/integrations/pandas.md @@ -0,0 +1,364 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/pandas/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/pandas/metadata.yaml" +sidebar_label: "Pandas" +learn_status: "Published" +learn_rel_path: "Data Collection/Generic Data Collection" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Pandas + + +<img src="https://netdata.cloud/img/pandas.png" width="150"/> + + +Plugin: python.d.plugin +Module: pandas + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +[Pandas](https://pandas.pydata.org/) is a de-facto standard in reading and processing most types of structured data in Python. +If you have metrics appearing in a CSV, JSON, XML, HTML, or [other supported format](https://pandas.pydata.org/docs/user_guide/io.html), +either locally or via some HTTP endpoint, you can easily ingest and present those metrics in Netdata, by leveraging the Pandas collector. + +This collector can be used to collect pretty much anything that can be read by Pandas, and then processed by Pandas. + + +The collector uses [pandas](https://pandas.pydata.org/) to pull data and do pandas-based preprocessing, before feeding to Netdata. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. 
+ +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + +This collector is expecting one row in the final pandas DataFrame. It is that first row that will be taken +as the most recent values for each dimension on each chart using (`df.to_dict(orient='records')[0]`). +See [pd.to_dict()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html)." + + +### Per Pandas instance + +These metrics refer to the entire monitored application. + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Python Requirements + +This collector depends on some Python (Python 3 only) packages that can usually be installed via `pip` or `pip3`. + +```bash +sudo pip install pandas requests +``` + +Note: If you would like to use [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html) to query a database, you will need to install the below packages as well. + +```bash +sudo pip install 'sqlalchemy<2.0' psycopg2-binary +``` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/pandas.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). 
+ +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/pandas.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| chart_configs | an array of chart configuration dictionaries | [] | True | +| chart_configs.name | name of the chart to be displayed in the dashboard. | None | True | +| chart_configs.title | title of the chart to be displayed in the dashboard. | None | True | +| chart_configs.family | [family](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md#families) of the chart to be displayed in the dashboard. | None | True | +| chart_configs.context | [context](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md#contexts) of the chart to be displayed in the dashboard. | None | True | +| chart_configs.type | the type of the chart to be displayed in the dashboard. | None | True | +| chart_configs.units | the units of the chart to be displayed in the dashboard. | None | True | +| chart_configs.df_steps | a series of pandas operations (one per line) that each returns a dataframe. | None | True | +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. 
| 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | + +</details> + +#### Examples + +##### Temperature API Example + +example pulling some hourly temperature data, a chart for today forecast (mean,min,max) and another chart for current. + +<details><summary>Config</summary> + +```yaml +temperature: + name: "temperature" + update_every: 5 + chart_configs: + - name: "temperature_forecast_by_city" + title: "Temperature By City - Today Forecast" + family: "temperature.today" + context: "pandas.temperature" + type: "line" + units: "Celsius" + df_steps: > + pd.DataFrame.from_dict( + {city: requests.get(f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&hourly=temperature_2m').json()['hourly']['temperature_2m'] + for (city,lat,lng) + in [ + ('dublin', 53.3441, -6.2675), + ('athens', 37.9792, 23.7166), + ('london', 51.5002, -0.1262), + ('berlin', 52.5235, 13.4115), + ('paris', 48.8567, 2.3510), + ('madrid', 40.4167, -3.7033), + ('new_york', 40.71, -74.01), + ('los_angeles', 34.05, -118.24), + ] + } + ); + df.describe(); # get aggregate stats for each city; + df.transpose()[['mean', 'max', 'min']].reset_index(); # just take mean, min, max; + df.rename(columns={'index':'city'}); # some column renaming; + df.pivot(columns='city').mean().to_frame().reset_index(); # force to be one row per city; + df.rename(columns={0:'degrees'}); # some column renaming; + pd.concat([df, df['city']+'_'+df['level_0']], axis=1); # add new column combining city and summary measurement label; + df.rename(columns={0:'measurement'}); # some column renaming; + 
df[['measurement', 'degrees']].set_index('measurement'); # just take two columns we want; + df.sort_index(); # sort by city name; + df.transpose(); # transpose so its just one wide row; + - name: "temperature_current_by_city" + title: "Temperature By City - Current" + family: "temperature.current" + context: "pandas.temperature" + type: "line" + units: "Celsius" + df_steps: > + pd.DataFrame.from_dict( + {city: requests.get(f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&current_weather=true').json()['current_weather'] + for (city,lat,lng) + in [ + ('dublin', 53.3441, -6.2675), + ('athens', 37.9792, 23.7166), + ('london', 51.5002, -0.1262), + ('berlin', 52.5235, 13.4115), + ('paris', 48.8567, 2.3510), + ('madrid', 40.4167, -3.7033), + ('new_york', 40.71, -74.01), + ('los_angeles', 34.05, -118.24), + ] + } + ); + df.transpose(); + df[['temperature']]; + df.transpose(); + +``` +</details> + +##### API CSV Example + +example showing a read_csv from a url and some light pandas data wrangling. + +<details><summary>Config</summary> + +```yaml +example_csv: + name: "example_csv" + update_every: 2 + chart_configs: + - name: "london_system_cpu" + title: "London System CPU - Ratios" + family: "london_system_cpu" + context: "pandas" + type: "line" + units: "n" + df_steps: > + pd.read_csv('https://london.my-netdata.io/api/v1/data?chart=system.cpu&format=csv&after=-60', storage_options={'User-Agent': 'netdata'}); + df.drop('time', axis=1); + df.mean().to_frame().transpose(); + df.apply(lambda row: (row.user / row.system), axis = 1).to_frame(); + df.rename(columns={0:'average_user_system_ratio'}); + df*100; + +``` +</details> + +##### API JSON Example + +example showing a read_json from a url and some light pandas data wrangling.
+ +<details><summary>Config</summary> + +```yaml +example_json: + name: "example_json" + update_every: 2 + chart_configs: + - name: "london_system_net" + title: "London System Net - Total Bandwidth" + family: "london_system_net" + context: "pandas" + type: "area" + units: "kilobits/s" + df_steps: > + pd.DataFrame(requests.get('https://london.my-netdata.io/api/v1/data?chart=system.net&format=json&after=-1').json()['data'], columns=requests.get('https://london.my-netdata.io/api/v1/data?chart=system.net&format=json&after=-1').json()['labels']); + df.drop('time', axis=1); + abs(df); + df.sum(axis=1).to_frame(); + df.rename(columns={0:'total_bandwidth'}); + +``` +</details> + +##### XML Example + +example showing a read_xml from a url and some light pandas data wrangling. + +<details><summary>Config</summary> + +```yaml +example_xml: + name: "example_xml" + update_every: 2 + line_sep: "|" + chart_configs: + - name: "temperature_forcast" + title: "Temperature Forecast" + family: "temp" + context: "pandas.temp" + type: "line" + units: "celsius" + df_steps: > + pd.read_xml('http://metwdb-openaccess.ichec.ie/metno-wdb2ts/locationforecast?lat=54.7210798611;long=-8.7237392806', xpath='./product/time[1]/location/temperature', parser='etree')| + df.rename(columns={'value': 'dublin'})| + df[['dublin']]| + +``` +</details> + +##### SQL Example + +example showing a read_sql from a postgres database using sqlalchemy. 
+ +<details><summary>Config</summary> + +```yaml +sql: + name: "sql" + update_every: 5 + chart_configs: + - name: "sql" + title: "SQL Example" + family: "sql.example" + context: "example" + type: "line" + units: "percent" + df_steps: > + pd.read_sql_query( + sql='\ + select \ + random()*100 as metric_1, \ + random()*100 as metric_2 \ + ', + con=create_engine('postgresql://localhost/postgres?user=netdata&password=netdata') + ); + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `pandas` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin pandas debug trace + ``` + + diff --git a/collectors/python.d.plugin/pandas/metadata.yaml b/collectors/python.d.plugin/pandas/metadata.yaml index 28a1d3b21..92ee1e986 100644 --- a/collectors/python.d.plugin/pandas/metadata.yaml +++ b/collectors/python.d.plugin/pandas/metadata.yaml @@ -5,7 +5,7 @@ modules: module_name: pandas monitored_instance: name: Pandas - link: https://learn.netdata.cloud/docs/data-collection/generic-data-collection/structured-data-pandas + link: https://pandas.pydata.org/ categories: - data-collection.generic-data-collection icon_filename: pandas.png @@ -26,8 +26,6 @@ modules: either locally or via some HTTP endpoint, you can easily ingest and present those metrics in Netdata, by leveraging the Pandas collector. This collector can be used to collect pretty much anything that can be read by Pandas, and then processed by Pandas. 
- - More detailed information can be found in the Netdata documentation [here](https://learn.netdata.cloud/docs/data-collection/generic-data-collection/structured-data-pandas). method_description: | The collector uses [pandas](https://pandas.pydata.org/) to pull data and do pandas-based preprocessing, before feeding to Netdata. supported_platforms: @@ -92,11 +90,11 @@ modules: default_value: None required: true - name: chart_configs.family - description: "[family](https://learn.netdata.cloud/docs/data-collection/chart-dimensions-contexts-and-families#family) of the chart to be displayed in the dashboard." + description: "[family](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md#families) of the chart to be displayed in the dashboard." default_value: None required: true - name: chart_configs.context - description: "[context](https://learn.netdata.cloud/docs/data-collection/chart-dimensions-contexts-and-families#context) of the chart to be displayed in the dashboard." + description: "[context](https://github.com/netdata/netdata/blob/master/docs/cloud/visualize/interact-new-charts.md#contexts) of the chart to be displayed in the dashboard." default_value: None required: true - name: chart_configs.type diff --git a/collectors/python.d.plugin/postfix/README.md b/collectors/python.d.plugin/postfix/README.md index ba5565499..c62eb5c24 100644..120000 --- a/collectors/python.d.plugin/postfix/README.md +++ b/collectors/python.d.plugin/postfix/README.md @@ -1,59 +1 @@ -<!-- -title: "Postfix monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/postfix/README.md" -sidebar_label: "Postfix" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Webapps" ---> - -# Postfix collector - -Monitors MTA email queue statistics using [postqueue](http://www.postfix.org/postqueue.1.html) tool. 
- -The collector executes `postqueue -p` to get Postfix queue statistics. - -## Requirements - -Postfix has internal access controls that limit activities on the mail queue. By default, all users are allowed to view -the queue. If your system is configured with stricter access controls, you need to grant the `netdata` user access to -view the mail queue. In order to do it, add `netdata` to `authorized_mailq_users` in the `/etc/postfix/main.cf` file. - -See the `authorized_mailq_users` setting in -the [Postfix documentation](https://www.postfix.org/postconf.5.html) for more details. - -## Charts - -It produces only two charts: - -1. **Postfix Queue Emails** - - - emails - -2. **Postfix Queue Emails Size** in KB - - - size - -## Configuration - -Configuration is not needed. -### Troubleshooting - -To troubleshoot issues with the `postfix` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `postfix` module in debug mode: - -```bash -./python.d.plugin postfix debug trace -``` - +integrations/postfix.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/postfix/integrations/postfix.md b/collectors/python.d.plugin/postfix/integrations/postfix.md new file mode 100644 index 000000000..7113d7ddd --- /dev/null +++ b/collectors/python.d.plugin/postfix/integrations/postfix.md @@ -0,0 +1,150 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/postfix/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/postfix/metadata.yaml" +sidebar_label: "Postfix" +learn_status: "Published" +learn_rel_path: "Data Collection/Mail Servers" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Postfix + + +<img src="https://netdata.cloud/img/postfix.svg" width="150"/> + + +Plugin: python.d.plugin +Module: postfix + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Keep an eye on Postfix metrics for efficient mail server operations. +Improve your mail server performance with Netdata's real-time metrics and built-in alerts. + + +Monitors MTA email queue statistics using [postqueue](http://www.postfix.org/postqueue.1.html) tool. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +Postfix has internal access controls that limit activities on the mail queue. By default, all users are allowed to view the queue. If your system is configured with stricter access controls, you need to grant the `netdata` user access to view the mail queue. In order to do it, add `netdata` to `authorized_mailq_users` in the `/etc/postfix/main.cf` file. +See the `authorized_mailq_users` setting in the [Postfix documentation](https://www.postfix.org/postconf.5.html) for more details. 
+ + +### Default Behavior + +#### Auto-Detection + +The collector executes `postqueue -p` to get Postfix queue statistics. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Postfix instance + +These metrics refer to the entire monitored application. + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| postfix.qemails | emails | emails | +| postfix.qsize | size | KiB | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +There is no configuration file. +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 1 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. 
| 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | + +</details> + +#### Examples +There are no configuration examples. + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `postfix` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin postfix debug trace + ``` + + diff --git a/collectors/python.d.plugin/puppet/README.md b/collectors/python.d.plugin/puppet/README.md index 3b0c55b97..b6c4c83f9 100644..120000 --- a/collectors/python.d.plugin/puppet/README.md +++ b/collectors/python.d.plugin/puppet/README.md @@ -1,90 +1 @@ -<!-- -title: "Puppet monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/puppet/README.md" -sidebar_label: "Puppet" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Provisioning tools" ---> - -# Puppet collector - -Monitor status of Puppet Server and Puppet DB. - -Following charts are drawn: - -1. **JVM Heap** - - - committed (allocated from OS) - - used (actual use) - -2. **JVM Non-Heap** - - - committed (allocated from OS) - - used (actual use) - -3. 
**CPU Usage** - - - execution - - GC (taken by garbage collection) - -4. **File Descriptors** - - - max - - used - -## Configuration - -Edit the `python.d/puppet.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/puppet.conf -``` - -```yaml -puppetdb: - url: 'https://fqdn.example.com:8081' - tls_cert_file: /path/to/client.crt - tls_key_file: /path/to/client.key - autodetection_retry: 1 - -puppetserver: - url: 'https://fqdn.example.com:8140' - autodetection_retry: 1 -``` - -When no configuration is given, module uses `https://fqdn.example.com:8140`. - -### notes - -- Exact Fully Qualified Domain Name of the node should be used. -- Usually Puppet Server/DB startup time is VERY long. So, there should - be quite reasonable retry count. -- Secure PuppetDB config may require client certificate. Not applies - to default PuppetDB configuration though. - - - - -### Troubleshooting - -To troubleshoot issues with the `puppet` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `puppet` module in debug mode: - -```bash -./python.d.plugin puppet debug trace -``` - +integrations/puppet.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/puppet/integrations/puppet.md b/collectors/python.d.plugin/puppet/integrations/puppet.md new file mode 100644 index 000000000..be68749a3 --- /dev/null +++ b/collectors/python.d.plugin/puppet/integrations/puppet.md @@ -0,0 +1,214 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/puppet/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/puppet/metadata.yaml" +sidebar_label: "Puppet" +learn_status: "Published" +learn_rel_path: "Data Collection/CICD Platforms" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Puppet + + +<img src="https://netdata.cloud/img/puppet.svg" width="150"/> + + +Plugin: python.d.plugin +Module: puppet + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors Puppet metrics about JVM Heap, Non-Heap, CPU usage and file descriptors.' + + +It uses Puppet's metrics API endpoint to gather the metrics. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +By default, this collector will use `https://fqdn.example.com:8140` as the URL to look for metrics. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Puppet instance + +These metrics refer to the entire monitored application. 
+ +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| puppet.jvm | committed, used | MiB | +| puppet.jvm | committed, used | MiB | +| puppet.cpu | execution, GC | percentage | +| puppet.fdopen | used | descriptors | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/puppet.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/puppet.conf +``` +#### Options + +This particular collector does not need further configuration to work if permissions are satisfied, but you can always customize it's data collection behavior. + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + +> Notes: +> - Exact Fully Qualified Domain Name of the node should be used. +> - Usually Puppet Server/DB startup time is VERY long. So, there should be quite reasonable retry count. +> - A secured PuppetDB config may require a client certificate. This does not apply to the default PuppetDB configuration though. 
+ + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| url | HTTP or HTTPS URL, exact Fully Qualified Domain Name of the node should be used. | https://fqdn.example.com:8081 | True | +| tls_verify | Control HTTPS server certificate verification. | False | False | +| tls_ca_file | Optional CA (bundle) file to use | | False | +| tls_cert_file | Optional client certificate file | | False | +| tls_key_file | Optional client key file | | False | +| update_every | Sets the default data collection frequency. | 30 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration + +```yaml +puppetserver: + url: 'https://fqdn.example.com:8140' + autodetection_retry: 1 + +``` +##### TLS Certificate + +An example using a TLS certificate + +<details><summary>Config</summary> + +```yaml +puppetdb: + url: 'https://fqdn.example.com:8081' + tls_cert_file: /path/to/client.crt + tls_key_file: /path/to/client.key + autodetection_retry: 1 + +``` +</details> + +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. 
+ + +<details><summary>Config</summary> + +```yaml +puppetserver1: + url: 'https://fqdn.example.com:8140' + autodetection_retry: 1 + +puppetserver2: + url: 'https://fqdn.example2.com:8140' + autodetection_retry: 1 + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `puppet` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin puppet debug trace + ``` + + diff --git a/collectors/python.d.plugin/python.d.plugin.in b/collectors/python.d.plugin/python.d.plugin.in index 681ceb403..bc171e032 100644 --- a/collectors/python.d.plugin/python.d.plugin.in +++ b/collectors/python.d.plugin/python.d.plugin.in @@ -582,8 +582,8 @@ class Plugin: try: statuses = JobsStatuses().from_file(abs_path) except Exception as error: - self.log.error("[{0}] config file invalid YAML format: {1}".format( - module_name, ' '.join([v.strip() for v in str(error).split('\n')]))) + self.log.error("'{0}' invalid JSON format: {1}".format( + abs_path, ' '.join([v.strip() for v in str(error).split('\n')]))) return None self.log.debug("'{0}' is loaded".format(abs_path)) return statuses @@ -876,6 +876,17 @@ def main(): cmd = parse_command_line() log = PythonDLogger() + level = os.getenv('NETDATA_LOG_SEVERITY_LEVEL') or str() + level = level.lower() + if level == 'debug': + log.logger.severity = 'DEBUG' + elif level == 'info': + log.logger.severity = 'INFO' + elif level == 'warn' or level == 'warning': + log.logger.severity = 'WARNING' + elif level == 'err' or 
level == 'error': + log.logger.severity = 'ERROR' + if cmd.debug: log.logger.severity = 'DEBUG' if cmd.trace: diff --git a/collectors/python.d.plugin/rethinkdbs/README.md b/collectors/python.d.plugin/rethinkdbs/README.md index 527ce4c31..78ddcfa18 100644..120000 --- a/collectors/python.d.plugin/rethinkdbs/README.md +++ b/collectors/python.d.plugin/rethinkdbs/README.md @@ -1,77 +1 @@ -<!-- -title: "RethinkDB monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/rethinkdbs/README.md" -sidebar_label: "RethinkDB" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Databases" ---> - -# RethinkDB collector - -Collects database server and cluster statistics. - -Following charts are drawn: - -1. **Connected Servers** - - - connected - - missing - -2. **Active Clients** - - - active - -3. **Queries** per second - - - queries - -4. **Documents** per second - - - documents - -## Configuration - -Edit the `python.d/rethinkdbs.conf` configuration file using `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically -at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/rethinkdbs.conf -``` - -```yaml -localhost: - name: 'local' - host: '127.0.0.1' - port: 28015 - user: "user" - password: "pass" -``` - -When no configuration file is found, module tries to connect to `127.0.0.1:28015`. - - - - -### Troubleshooting - -To troubleshoot issues with the `rethinkdbs` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. 
If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `rethinkdbs` module in debug mode: - -```bash -./python.d.plugin rethinkdbs debug trace -``` - +integrations/rethinkdb.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/rethinkdbs/integrations/rethinkdb.md b/collectors/python.d.plugin/rethinkdbs/integrations/rethinkdb.md new file mode 100644 index 000000000..c0b2cfbfd --- /dev/null +++ b/collectors/python.d.plugin/rethinkdbs/integrations/rethinkdb.md @@ -0,0 +1,189 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/rethinkdbs/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/rethinkdbs/metadata.yaml" +sidebar_label: "RethinkDB" +learn_status: "Published" +learn_rel_path: "Data Collection/Databases" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# RethinkDB + + +<img src="https://netdata.cloud/img/rethinkdb.png" width="150"/> + + +Plugin: python.d.plugin +Module: rethinkdbs + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors metrics about RethinkDB clusters and database servers. + +It uses the `rethinkdb` python module to connect to a RethinkDB server instance and gather statistics. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +When no configuration file is found, the collector tries to connect to 127.0.0.1:28015. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per RethinkDB instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| rethinkdb.cluster_connected_servers | connected, missing | servers | +| rethinkdb.cluster_clients_active | active | clients | +| rethinkdb.cluster_queries | queries | queries/s | +| rethinkdb.cluster_documents | reads, writes | documents/s | + +### Per database server + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| rethinkdb.client_connections | connections | connections | +| rethinkdb.clients_active | active | clients | +| rethinkdb.queries | queries | queries/s | +| rethinkdb.documents | reads, writes | documents/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Required python module + +The collector requires the `rethinkdb` python module to be installed. + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/rethinkdbs.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/rethinkdbs.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. 
+ +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| host | Hostname or ip of the RethinkDB server. | localhost | False | +| port | Port to connect to the RethinkDB server. | 28015 | False | +| user | The username to use to connect to the RethinkDB server. | admin | False | +| password | The password to use to connect to the RethinkDB server. | | False | +| timeout | Set a connect timeout to the RethinkDB server. | 2 | False | + +</details> + +#### Examples + +##### Local RethinkDB server + +An example of a configuration for a local RethinkDB server + +```yaml +localhost: + name: 'local' + host: '127.0.0.1' + port: 28015 + user: "user" + password: "pass" + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `rethinkdbs` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. 
+ + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin rethinkdbs debug trace + ``` + + diff --git a/collectors/python.d.plugin/retroshare/README.md b/collectors/python.d.plugin/retroshare/README.md index b7f2fcb14..4e4c2cdb7 100644..120000 --- a/collectors/python.d.plugin/retroshare/README.md +++ b/collectors/python.d.plugin/retroshare/README.md @@ -1,70 +1 @@ -<!-- -title: "RetroShare monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/retroshare/README.md" -sidebar_label: "RetroShare" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Apm" ---> - -# RetroShare collector - -Monitors application bandwidth, peers and DHT metrics. - -This module will monitor one or more `RetroShare` applications, depending on your configuration. - -## Charts - -This module produces the following charts: - -- Bandwidth in `kilobits/s` -- Peers in `peers` -- DHT in `peers` - - -## Configuration - -Edit the `python.d/retroshare.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/retroshare.conf -``` - -Here is an example for 2 servers: - -```yaml -localhost: - url : 'http://localhost:9090' - user : "user" - password : "pass" - -remote: - url : 'http://203.0.113.1:9090' - user : "user" - password : "pass" -``` - - - -### Troubleshooting - -To troubleshoot issues with the `retroshare` module, run the `python.d.plugin` with the debug option enabled. 
The
-output will give you the output of the data collection job or error messages on why the collector isn't working.
-
-First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's
-not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the
-plugin's directory, switch to the `netdata` user.
-
-```bash
-cd /usr/libexec/netdata/plugins.d/
-sudo su -s /bin/bash netdata
-```
-
-Now you can manually run the `retroshare` module in debug mode:
-
-```bash
-./python.d.plugin retroshare debug trace
-```
-
+integrations/retroshare.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/retroshare/integrations/retroshare.md b/collectors/python.d.plugin/retroshare/integrations/retroshare.md new file mode 100644 index 000000000..753a218c1 --- /dev/null +++ b/collectors/python.d.plugin/retroshare/integrations/retroshare.md @@ -0,0 +1,190 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/retroshare/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/retroshare/metadata.yaml" +sidebar_label: "RetroShare" +learn_status: "Published" +learn_rel_path: "Data Collection/Media Services" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# RetroShare + + +<img src="https://netdata.cloud/img/retroshare.png" width="150"/> + + +Plugin: python.d.plugin +Module: retroshare + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors RetroShare statistics such as application bandwidth, peers, and DHT metrics. + +It connects to the RetroShare web interface to gather metrics. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +The collector will attempt to connect and detect a RetroShare web interface through http://localhost:9090, even without any configuration. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per RetroShare instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| retroshare.bandwidth | Upload, Download | kilobits/s | +| retroshare.peers | All friends, Connected friends | peers | +| retroshare.dht | DHT nodes estimated, RS nodes estimated | peers | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ retroshare_dht_working ](https://github.com/netdata/netdata/blob/master/health/health.d/retroshare.conf) | retroshare.dht | number of DHT peers | + + +## Setup + +### Prerequisites + +#### RetroShare web interface + +RetroShare needs to be configured to enable the RetroShare WEB Interface and allow access from the Netdata host. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/retroshare.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/retroshare.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. 
+ + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| url | The URL to the RetroShare Web UI. | http://localhost:9090 | False | + +</details> + +#### Examples + +##### Local RetroShare Web UI + +A basic configuration for a RetroShare server running on localhost. + +<details><summary>Config</summary> + +```yaml +localhost: + name: 'local retroshare' + url: 'http://localhost:9090' + +``` +</details> + +##### Remote RetroShare Web UI + +A basic configuration for a remote RetroShare server. + +<details><summary>Config</summary> + +```yaml +remote: + name: 'remote retroshare' + url: 'http://1.2.3.4:9090' + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `retroshare` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin retroshare debug trace + ``` + + diff --git a/collectors/python.d.plugin/riakkv/README.md b/collectors/python.d.plugin/riakkv/README.md index e822c551e..f43ece09b 100644..120000 --- a/collectors/python.d.plugin/riakkv/README.md +++ b/collectors/python.d.plugin/riakkv/README.md @@ -1,149 +1 @@ -<!-- -title: "Riak KV monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/riakkv/README.md" -sidebar_label: "Riak KV" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Databases" ---> - -# Riak KV collector - -Collects database stats from `/stats` endpoint. - -## Requirements - -- An accessible `/stats` endpoint. See [the Riak KV configuration reference documentation](https://docs.riak.com/riak/kv/2.2.3/configuring/reference/#client-interfaces) - for how to enable this. - -The following charts are included, which are mostly derived from the metrics -listed -[here](https://docs.riak.com/riak/kv/latest/using/reference/statistics-monitoring/index.html#riak-metrics-to-graph). - -1. **Throughput** in operations/s - -- **KV operations** - - gets - - puts - -- **Data type updates** - - counters - - sets - - maps - -- **Search queries** - - queries - -- **Search documents** - - indexed - -- **Strong consistency operations** - - gets - - puts - -2. 
**Latency** in milliseconds - -- **KV latency** of the past minute - - get (mean, median, 95th / 99th / 100th percentile) - - put (mean, median, 95th / 99th / 100th percentile) - -- **Data type latency** of the past minute - - counter_merge (mean, median, 95th / 99th / 100th percentile) - - set_merge (mean, median, 95th / 99th / 100th percentile) - - map_merge (mean, median, 95th / 99th / 100th percentile) - -- **Search latency** of the past minute - - query (median, min, max, 95th / 99th percentile) - - index (median, min, max, 95th / 99th percentile) - -- **Strong consistency latency** of the past minute - - get (mean, median, 95th / 99th / 100th percentile) - - put (mean, median, 95th / 99th / 100th percentile) - -3. **Erlang VM metrics** - -- **System counters** - - processes - -- **Memory allocation** in MB - - processes.allocated - - processes.used - -4. **General load / health metrics** - -- **Siblings encountered in KV operations** during the past minute - - get (mean, median, 95th / 99th / 100th percentile) - -- **Object size in KV operations** during the past minute in KB - - get (mean, median, 95th / 99th / 100th percentile) - -- **Message queue length** in unprocessed messages - - vnodeq_size (mean, median, 95th / 99th / 100th percentile) - -- **Index operations** encountered by Search - - errors - -- **Protocol buffer connections** - - active - -- **Repair operations coordinated by this node** - - read - -- **Active finite state machines by kind** - - get - - put - - secondary_index - - list_keys - -- **Rejected finite state machines** - - get - - put - -- **Number of writes to Search failed due to bad data format by reason** - - bad_entry - - extract_fail - -## Configuration - -Edit the `python.d/riakkv.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. 
-
-```bash
-cd /etc/netdata # Replace this path with your Netdata config directory, if different
-sudo ./edit-config python.d/riakkv.conf
-```
-
-The module needs to be passed the full URL to Riak's stats endpoint.
-For example:
-
-```yaml
-myriak:
-  url: http://myriak.example.com:8098/stats
-```
-
-With no explicit configuration given, the module will attempt to connect to
-`http://localhost:8098/stats`.
-
-The default update frequency for the plugin is set to 2 seconds as Riak
-internally updates the metrics every second. If we were to update the metrics
-every second, the resulting graph would contain odd jitter.
-### Troubleshooting
-
-To troubleshoot issues with the `riakkv` module, run the `python.d.plugin` with the debug option enabled. The
-output will give you the output of the data collection job or error messages on why the collector isn't working.
-
-First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's
-not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the
-plugin's directory, switch to the `netdata` user.
-
-```bash
-cd /usr/libexec/netdata/plugins.d/
-sudo su -s /bin/bash netdata
-```
-
-Now you can manually run the `riakkv` module in debug mode:
-
-```bash
-./python.d.plugin riakkv debug trace
-```
-
+integrations/riakkv.md
\ No newline at end of file
diff --git a/collectors/python.d.plugin/riakkv/integrations/riakkv.md b/collectors/python.d.plugin/riakkv/integrations/riakkv.md
new file mode 100644
index 000000000..f83def446
--- /dev/null
+++ b/collectors/python.d.plugin/riakkv/integrations/riakkv.md
@@ -0,0 +1,219 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/riakkv/README.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/riakkv/metadata.yaml"
+sidebar_label: "RiakKV"
+learn_status: "Published"
+learn_rel_path: "Data Collection/Databases"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# RiakKV
+
+
+<img src="https://netdata.cloud/img/riak.svg" width="150"/>
+
+
+Plugin: python.d.plugin
+Module: riakkv
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+This collector monitors RiakKV metrics about throughput, latency, resources and more.
+
+
+This collector reads the database stats from the `/stats` endpoint.
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+If the /stats endpoint is accessible, RiakKV instances on the local host running on port 8098 will be autodetected.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per RiakKV instance
+
+These metrics refer to the entire monitored application.
+ +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| riak.kv.throughput | gets, puts | operations/s | +| riak.dt.vnode_updates | counters, sets, maps | operations/s | +| riak.search | queries | queries/s | +| riak.search.documents | indexed | documents/s | +| riak.consistent.operations | gets, puts | operations/s | +| riak.kv.latency.get | mean, median, 95, 99, 100 | ms | +| riak.kv.latency.put | mean, median, 95, 99, 100 | ms | +| riak.dt.latency.counter_merge | mean, median, 95, 99, 100 | ms | +| riak.dt.latency.set_merge | mean, median, 95, 99, 100 | ms | +| riak.dt.latency.map_merge | mean, median, 95, 99, 100 | ms | +| riak.search.latency.query | median, min, 95, 99, 999, max | ms | +| riak.search.latency.index | median, min, 95, 99, 999, max | ms | +| riak.consistent.latency.get | mean, median, 95, 99, 100 | ms | +| riak.consistent.latency.put | mean, median, 95, 99, 100 | ms | +| riak.vm | processes | total | +| riak.vm.memory.processes | allocated, used | MB | +| riak.kv.siblings_encountered.get | mean, median, 95, 99, 100 | siblings | +| riak.kv.objsize.get | mean, median, 95, 99, 100 | KB | +| riak.search.vnodeq_size | mean, median, 95, 99, 100 | messages | +| riak.search.index | errors | errors | +| riak.core.protobuf_connections | active | connections | +| riak.core.repairs | read | repairs | +| riak.core.fsm_active | get, put, secondary index, list keys | fsms | +| riak.core.fsm_rejected | get, put | fsms | +| riak.search.index | bad_entry, extract_fail | writes | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ riakkv_1h_kv_get_mean_latency ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.kv.latency.get | average time between reception of client GET request and subsequent response to client over the last hour | +| [ riakkv_kv_get_slow 
](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.kv.latency.get | average time between reception of client GET request and subsequent response to the client over the last 3 minutes, compared to the average over the last hour | +| [ riakkv_1h_kv_put_mean_latency ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.kv.latency.put | average time between reception of client PUT request and subsequent response to the client over the last hour | +| [ riakkv_kv_put_slow ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.kv.latency.put | average time between reception of client PUT request and subsequent response to the client over the last 3 minutes, compared to the average over the last hour | +| [ riakkv_vm_high_process_count ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.vm | number of processes running in the Erlang VM | +| [ riakkv_list_keys_active ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.core.fsm_active | number of currently running list keys finite state machines | + + +## Setup + +### Prerequisites + +#### Configure RiakKV to enable /stats endpoint + +You can follow the RiakKV configuration reference documentation for how to enable this. + +Source : https://docs.riak.com/riak/kv/2.2.3/configuring/reference/#client-interfaces + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/riakkv.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). 
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config python.d/riakkv.conf
+```
+#### Options
+
+There are 2 sections:
+
+* Global variables
+* One or more JOBS that can define multiple different instances to monitor.
+
+The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.
+
+Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition.
+
+Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.
+
+
+<details><summary>Config options</summary>
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update_every | Sets the default data collection frequency. | 5 | False |
+| priority | Controls the order of charts at the netdata dashboard. | 60000 | False |
+| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False |
+| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False |
+| url | The url of the server | no | True |
+
+</details>
+
+#### Examples
+
+##### Basic (default)
+
+A basic example configuration per job
+
+```yaml
+local:
+  url: 'http://localhost:8098/stats'
+
+```
+##### Multi-instance
+
+> **Note**: When you define multiple jobs, their names must be unique.
+
+Collecting metrics from local and remote instances.
+
+
+<details><summary>Config</summary>
+
+```yaml
+local:
+  url: 'http://localhost:8098/stats'
+
+remote:
+  url: 'http://192.0.2.1:8098/stats'
+
+```
+</details>
+
+
+
+## Troubleshooting
+
+### Debug Mode
+
+To troubleshoot issues with the `riakkv` collector, run the `python.d.plugin` with the debug option enabled. The output
+should give you clues as to why the collector isn't working.
+
+- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`.
If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin riakkv debug trace + ``` + + diff --git a/collectors/python.d.plugin/samba/README.md b/collectors/python.d.plugin/samba/README.md index 8fe133fd5..3b63bbab6 100644..120000 --- a/collectors/python.d.plugin/samba/README.md +++ b/collectors/python.d.plugin/samba/README.md @@ -1,144 +1 @@ -<!-- -title: "Samba monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/samba/README.md" -sidebar_label: "Samba" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Apps" ---> - -# Samba collector - -Monitors the performance metrics of Samba file sharing using `smbstatus` command-line tool. - -Executed commands: - -- `sudo -n smbstatus -P` - -## Requirements - -- `smbstatus` program -- `sudo` program -- `smbd` must be compiled with profiling enabled -- `smbd` must be started either with the `-P 1` option or inside `smb.conf` using `smbd profiling level` - -The module uses `smbstatus`, which can only be executed by `root`. It uses -`sudo` and assumes that it is configured such that the `netdata` user can execute `smbstatus` as root without a -password. - -- Add to your `/etc/sudoers` file: - -`which smbstatus` shows the full path to the binary. - -```bash -netdata ALL=(root) NOPASSWD: /path/to/smbstatus -``` - -- Reset Netdata's systemd - unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux - distributions with systemd) - -The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. 
Resetting is not optimal, but a next-best solution given the inability to execute `smbstatus` using `sudo`. - - -As the `root` user, do the following: - -```cmd -mkdir /etc/systemd/system/netdata.service.d -echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf -systemctl daemon-reload -systemctl restart netdata.service -``` - -## Charts - -1. **Syscall R/Ws** in kilobytes/s - - - sendfile - - recvfile - -2. **Smb2 R/Ws** in kilobytes/s - - - readout - - writein - - readin - - writeout - -3. **Smb2 Create/Close** in operations/s - - - create - - close - -4. **Smb2 Info** in operations/s - - - getinfo - - setinfo - -5. **Smb2 Find** in operations/s - - - find - -6. **Smb2 Notify** in operations/s - - - notify - -7. **Smb2 Lesser Ops** as counters - - - tcon - - negprot - - tdis - - cancel - - logoff - - flush - - lock - - keepalive - - break - - sessetup - -## Enable the collector - -The `samba` collector is disabled by default. To enable it, use `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` -file. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d.conf -``` - -Change the value of the `samba` setting to `yes`. Save the file and restart the Netdata Agent with `sudo systemctl -restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - -## Configuration - -Edit the `python.d/samba.conf` configuration file using `edit-config` from the -Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. 
-
-```bash
-cd /etc/netdata # Replace this path with your Netdata config directory, if different
-sudo ./edit-config python.d/samba.conf
-```
-
-
-
-
-### Troubleshooting
-
-To troubleshoot issues with the `samba` module, run the `python.d.plugin` with the debug option enabled. The
-output will give you the output of the data collection job or error messages on why the collector isn't working.
-
-First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's
-not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the
-plugin's directory, switch to the `netdata` user.
-
-```bash
-cd /usr/libexec/netdata/plugins.d/
-sudo su -s /bin/bash netdata
-```
-
-Now you can manually run the `samba` module in debug mode:
-
-```bash
-./python.d.plugin samba debug trace
-```
-
+integrations/samba.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/samba/integrations/samba.md b/collectors/python.d.plugin/samba/integrations/samba.md new file mode 100644 index 000000000..5638c6d94 --- /dev/null +++ b/collectors/python.d.plugin/samba/integrations/samba.md @@ -0,0 +1,220 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/samba/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/samba/metadata.yaml" +sidebar_label: "Samba" +learn_status: "Published" +learn_rel_path: "Data Collection/Storage, Mount Points and Filesystems" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Samba + + +<img src="https://netdata.cloud/img/samba.svg" width="150"/> + + +Plugin: python.d.plugin +Module: samba + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors the performance metrics of Samba file sharing. + +It is using the `smbstatus` command-line tool. + +Executed commands: + +- `sudo -n smbstatus -P` + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + +`smbstatus` is used, which can only be executed by `root`. It uses `sudo` and assumes that it is configured such that the `netdata` user can execute `smbstatus` as root without a password. + + +### Default Behavior + +#### Auto-Detection + +After all the permissions are satisfied, the `smbstatus -P` binary is executed. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels. + + + +### Per Samba instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| syscall.rw | sendfile, recvfile | KiB/s | +| smb2.rw | readout, writein, readin, writeout | KiB/s | +| smb2.create_close | create, close | operations/s | +| smb2.get_set_info | getinfo, setinfo | operations/s | +| smb2.find | find | operations/s | +| smb2.notify | notify | operations/s | +| smb2.sm_counters | tcon, negprot, tdis, cancel, logoff, flush, lock, keepalive, break, sessetup | count | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Enable the samba collector + +The `samba` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` file. + +```bash +cd /etc/netdata # Replace this path with your Netdata config directory, if different +sudo ./edit-config python.d.conf +``` +Change the value of the `samba` setting to `yes`. Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. + + +#### Permissions and programs + +To run the collector you need: + +- `smbstatus` program +- `sudo` program +- `smbd` must be compiled with profiling enabled +- `smbd` must be started either with the `-P 1` option or inside `smb.conf` using `smbd profiling level` + +The module uses `smbstatus`, which can only be executed by `root`. It uses `sudo` and assumes that it is configured such that the `netdata` user can execute `smbstatus` as root without a password. 
+ +- add to your `/etc/sudoers` file: + + `which smbstatus` shows the full path to the binary. + + ```bash + netdata ALL=(root) NOPASSWD: /path/to/smbstatus + ``` + +- Reset Netdata's systemd unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux distributions with systemd) + + The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `smbstatus` using `sudo`. + + + As the `root` user, do the following: + + ```cmd + mkdir /etc/systemd/system/netdata.service.d + echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf + systemctl daemon-reload + systemctl restart netdata.service + ``` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/samba.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/samba.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. 
+ + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration. + +<details><summary>Config</summary> + +```yaml +my_job_name: + name: my_name + update_every: 1 + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `samba` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin samba debug trace + ``` + + diff --git a/collectors/python.d.plugin/samba/metadata.yaml b/collectors/python.d.plugin/samba/metadata.yaml index 43bca208e..ec31e0475 100644 --- a/collectors/python.d.plugin/samba/metadata.yaml +++ b/collectors/python.d.plugin/samba/metadata.yaml @@ -23,9 +23,9 @@ modules: metrics_description: "This collector monitors the performance metrics of Samba file sharing." method_description: | It is using the `smbstatus` command-line tool. 
- + Executed commands: - + - `sudo -n smbstatus -P` supported_platforms: include: [] @@ -44,32 +44,41 @@ modules: setup: prerequisites: list: + - title: Enable the samba collector + description: | + The `samba` collector is disabled by default. To enable it, use `edit-config` from the Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`, to edit the `python.d.conf` file. + + ```bash + cd /etc/netdata # Replace this path with your Netdata config directory, if different + sudo ./edit-config python.d.conf + ``` + Change the value of the `samba` setting to `yes`. Save the file and restart the Netdata Agent with `sudo systemctl restart netdata`, or the [appropriate method](https://github.com/netdata/netdata/blob/master/docs/configure/start-stop-restart.md) for your system. - title: Permissions and programs description: | To run the collector you need: - + - `smbstatus` program - `sudo` program - `smbd` must be compiled with profiling enabled - `smbd` must be started either with the `-P 1` option or inside `smb.conf` using `smbd profiling level` - + The module uses `smbstatus`, which can only be executed by `root`. It uses `sudo` and assumes that it is configured such that the `netdata` user can execute `smbstatus` as root without a password. - + - add to your `/etc/sudoers` file: - + `which smbstatus` shows the full path to the binary. - + ```bash netdata ALL=(root) NOPASSWD: /path/to/smbstatus ``` - + - Reset Netdata's systemd unit [CapabilityBoundingSet](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Capabilities) (Linux distributions with systemd) - + The default CapabilityBoundingSet doesn't allow using `sudo`, and is quite strict in general. Resetting is not optimal, but a next-best solution given the inability to execute `smbstatus` using `sudo`. 
- - + + As the `root` user, do the following: - + ```cmd mkdir /etc/systemd/system/netdata.service.d echo -e '[Service]\nCapabilityBoundingSet=~' | tee /etc/systemd/system/netdata.service.d/unset-capability-bounding-set.conf @@ -82,14 +91,14 @@ modules: options: description: | There are 2 sections: - + * Global variables * One or more JOBS that can define multiple different instances to monitor. - + The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. - + Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. - + Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. folding: title: "Config options" diff --git a/collectors/python.d.plugin/sensors/README.md b/collectors/python.d.plugin/sensors/README.md index 7ee31bd67..4e92b0882 100644..120000 --- a/collectors/python.d.plugin/sensors/README.md +++ b/collectors/python.d.plugin/sensors/README.md @@ -1,55 +1 @@ -<!-- -title: "Linux machine sensors monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/sensors/README.md" -sidebar_label: "sensors-python.d.plugin" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Devices" ---> - -# Linux machine sensors collector - -Reads system sensors information (temperature, voltage, electric current, power, etc.). - -Charts are created dynamically. - -## Configuration - -Edit the `python.d/sensors.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. 
- -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/sensors.conf -``` - -### possible issues - -There have been reports from users that on certain servers, ACPI ring buffer errors are printed by the kernel (`dmesg`) -when ACPI sensors are being accessed. We are tracking such cases in -issue [#827](https://github.com/netdata/netdata/issues/827). Please join this discussion for help. - -When `lm-sensors` doesn't work on your device (e.g. for RPi temperatures), -use [the legacy bash collector](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/sensors/README.md) - - -### Troubleshooting - -To troubleshoot issues with the `sensors` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `sensors` module in debug mode: - -```bash -./python.d.plugin sensors debug trace -``` - +integrations/linux_sensors_lm-sensors.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/sensors/integrations/linux_sensors_lm-sensors.md b/collectors/python.d.plugin/sensors/integrations/linux_sensors_lm-sensors.md new file mode 100644 index 000000000..c807d6b3e --- /dev/null +++ b/collectors/python.d.plugin/sensors/integrations/linux_sensors_lm-sensors.md @@ -0,0 +1,186 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/sensors/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/sensors/metadata.yaml" +sidebar_label: "Linux Sensors (lm-sensors)" +learn_status: "Published" +learn_rel_path: "Data Collection/Hardware Devices and Sensors" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Linux Sensors (lm-sensors) + + +<img src="https://netdata.cloud/img/microchip.svg" width="150"/> + + +Plugin: python.d.plugin +Module: sensors + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Examine Linux Sensors metrics with Netdata for insights into hardware health and performance. + +Enhance your system's reliability with real-time hardware health insights. + + +Reads system sensors information (temperature, voltage, electric current, power, etc.) via [lm-sensors](https://hwmon.wiki.kernel.org/lm_sensors). + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +The following type of sensors are auto-detected: +- temperature - fan - voltage - current - power - energy - humidity + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. 
+ +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per chip + +Metrics related to chips. Each chip provides a set of the following metrics, each having the chip name in the metric name as reported by `sensors -u`. + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| sensors.temperature | a dimension per sensor | Celsius | +| sensors.voltage | a dimension per sensor | Volts | +| sensors.current | a dimension per sensor | Ampere | +| sensors.power | a dimension per sensor | Watt | +| sensors.fan | a dimension per sensor | Rotations/min | +| sensors.energy | a dimension per sensor | Joule | +| sensors.humidity | a dimension per sensor | Percent | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/sensors.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/sensors.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. 
+ +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| types | The types of sensors to collect. | temperature, fan, voltage, current, power, energy, humidity | True | +| update_every | Sets the default data collection frequency. | 1 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | + +</details> + +#### Examples + +##### Default + +Default configuration. + +```yaml +types: + - temperature + - fan + - voltage + - current + - power + - energy + - humidity + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `sensors` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin sensors debug trace + ``` + +### lm-sensors doesn't work on your device + + + +### ACPI ring buffer errors are printed + + + + diff --git a/collectors/python.d.plugin/sensors/metadata.yaml b/collectors/python.d.plugin/sensors/metadata.yaml index c3f681915..d7cb2206f 100644 --- a/collectors/python.d.plugin/sensors/metadata.yaml +++ b/collectors/python.d.plugin/sensors/metadata.yaml @@ -117,7 +117,16 @@ modules: - humidity troubleshooting: problems: - list: [] + list: + - name: lm-sensors doesn't work on your device + description: | + When `lm-sensors` doesn't work on your device (e.g. for RPi temperatures), + use [the legacy bash collector](https://github.com/netdata/netdata/blob/master/collectors/charts.d.plugin/sensors/README.md) + - name: ACPI ring buffer errors are printed + description: | + There have been reports from users that on certain servers, ACPI ring buffer errors are printed by the kernel (`dmesg`) + when ACPI sensors are being accessed. We are tracking such cases in issue [#827](https://github.com/netdata/netdata/issues/827). + Please join this discussion for help. 
alerts: [] metrics: folding: diff --git a/collectors/python.d.plugin/sensors/sensors.chart.py b/collectors/python.d.plugin/sensors/sensors.chart.py index 701bf6414..0d9de3750 100644 --- a/collectors/python.d.plugin/sensors/sensors.chart.py +++ b/collectors/python.d.plugin/sensors/sensors.chart.py @@ -66,7 +66,7 @@ CHARTS = { LIMITS = { 'temperature': [-127, 1000], - 'voltage': [-127, 127], + 'voltage': [-400, 400], 'current': [-127, 127], 'fan': [0, 65535] } diff --git a/collectors/python.d.plugin/smartd_log/README.md b/collectors/python.d.plugin/smartd_log/README.md index e79348b05..63aad6c85 100644..120000 --- a/collectors/python.d.plugin/smartd_log/README.md +++ b/collectors/python.d.plugin/smartd_log/README.md @@ -1,148 +1 @@ -<!-- -title: "Storage devices monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/smartd_log/README.md" -sidebar_label: "S.M.A.R.T. attributes" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Devices" ---> - -# Storage devices collector - -Monitors `smartd` log files to collect HDD/SSD S.M.A.R.T attributes. - -## Requirements - -- `smartmontools` - -It produces following charts for SCSI devices: - -1. **Read Error Corrected** - -2. **Read Error Uncorrected** - -3. **Write Error Corrected** - -4. **Write Error Uncorrected** - -5. **Verify Error Corrected** - -6. **Verify Error Uncorrected** - -7. **Temperature** - -For ATA devices: - -1. **Read Error Rate** - -2. **Seek Error Rate** - -3. **Soft Read Error Rate** - -4. **Write Error Rate** - -5. **SATA Interface Downshift** - -6. **UDMA CRC Error Count** - -7. **Throughput Performance** - -8. **Seek Time Performance** - -9. **Start/Stop Count** - -10. **Power-On Hours Count** - -11. **Power Cycle Count** - -12. **Unexpected Power Loss** - -13. **Spin-Up Time** - -14. **Spin-up Retries** - -15. **Calibration Retries** - -16. **Temperature** - -17. 
**Reallocated Sectors Count** - -18. **Reserved Block Count** - -19. **Program Fail Count** - -20. **Erase Fail Count** - -21. **Wear Leveller Worst Case Erase Count** - -22. **Unused Reserved NAND Blocks** - -23. **Reallocation Event Count** - -24. **Current Pending Sector Count** - -25. **Offline Uncorrectable Sector Count** - -26. **Percent Lifetime Used** - -## prerequisite - -`smartd` must be running with `-A` option to write smartd attribute information to files. - -For this you need to set `smartd_opts` (or `SMARTD_ARGS`, check _smartd.service_ content) in `/etc/default/smartmontools`: - -``` -# dump smartd attrs info every 600 seconds -smartd_opts="-A /var/log/smartd/ -i 600" -``` - -You may need to create the smartd directory before smartd will write to it: - -```sh -mkdir -p /var/log/smartd -``` - -Otherwise, all the smartd `.csv` files may get written to `/var/lib/smartmontools` (default location). See also <https://linux.die.net/man/8/smartd> for more info on the `-A --attributelog=PREFIX` command. - -`smartd` appends logs at every run. It's strongly recommended to use `logrotate` for smartd files. - -## Configuration - -Edit the `python.d/smartd_log.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/smartd_log.conf -``` - -```yaml -local: - log_path : '/var/log/smartd/' -``` - -If no configuration is given, module will attempt to read log files in `/var/log/smartd/` directory. - - - - -### Troubleshooting - -To troubleshoot issues with the `smartd_log` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. 
- -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `smartd_log` module in debug mode: - -```bash -./python.d.plugin smartd_log debug trace -``` - +integrations/s.m.a.r.t..md
\ No newline at end of file diff --git a/collectors/python.d.plugin/smartd_log/integrations/s.m.a.r.t..md b/collectors/python.d.plugin/smartd_log/integrations/s.m.a.r.t..md new file mode 100644 index 000000000..a943f8704 --- /dev/null +++ b/collectors/python.d.plugin/smartd_log/integrations/s.m.a.r.t..md @@ -0,0 +1,222 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/smartd_log/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/smartd_log/metadata.yaml" +sidebar_label: "S.M.A.R.T." +learn_status: "Published" +learn_rel_path: "Data Collection/Hardware Devices and Sensors" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# S.M.A.R.T. + + +<img src="https://netdata.cloud/img/smart.png" width="150"/> + + +Plugin: python.d.plugin +Module: smartd_log + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors HDD/SSD S.M.A.R.T. metrics about drive health and performance. + + +It reads `smartd` log files to collect the metrics. + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + + +### Default Behavior + +#### Auto-Detection + +Upon satisfying the prerequisites, the collector will auto-detect metrics if written in either `/var/log/smartd/` or `/var/lib/smartmontools/`. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ +The metrics listed below are split in terms of availability on device type, SCSI or ATA. + +### Per S.M.A.R.T. instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | SCSI | ATA | +|:------|:----------|:----|:---:|:---:| +| smartd_log.read_error_rate | a dimension per device | value | | • | +| smartd_log.seek_error_rate | a dimension per device | value | | • | +| smartd_log.soft_read_error_rate | a dimension per device | errors | | • | +| smartd_log.write_error_rate | a dimension per device | value | | • | +| smartd_log.read_total_err_corrected | a dimension per device | errors | • | | +| smartd_log.read_total_unc_errors | a dimension per device | errors | • | | +| smartd_log.write_total_err_corrected | a dimension per device | errors | • | | +| smartd_log.write_total_unc_errors | a dimension per device | errors | • | | +| smartd_log.verify_total_err_corrected | a dimension per device | errors | • | | +| smartd_log.verify_total_unc_errors | a dimension per device | errors | • | | +| smartd_log.sata_interface_downshift | a dimension per device | events | | • | +| smartd_log.udma_crc_error_count | a dimension per device | errors | | • | +| smartd_log.throughput_performance | a dimension per device | value | | • | +| smartd_log.seek_time_performance | a dimension per device | value | | • | +| smartd_log.start_stop_count | a dimension per device | events | | • | +| smartd_log.power_on_hours_count | a dimension per device | hours | | • | +| smartd_log.power_cycle_count | a dimension per device | events | | • | +| smartd_log.unexpected_power_loss | a dimension per device | events | | • | +| smartd_log.spin_up_time | a dimension per device | ms | | • | +| smartd_log.spin_up_retries | a dimension per device | retries | | • | +| smartd_log.calibration_retries | a dimension per device | retries | | • | +| smartd_log.airflow_temperature_celsius | a dimension per device | celsius | | • | 
+| smartd_log.temperature_celsius | a dimension per device | celsius | • | • | +| smartd_log.reallocated_sectors_count | a dimension per device | sectors | | • | +| smartd_log.reserved_block_count | a dimension per device | percentage | | • | +| smartd_log.program_fail_count | a dimension per device | errors | | • | +| smartd_log.erase_fail_count | a dimension per device | failures | | • | +| smartd_log.wear_leveller_worst_case_erase_count | a dimension per device | erases | | • | +| smartd_log.unused_reserved_nand_blocks | a dimension per device | blocks | | • | +| smartd_log.reallocation_event_count | a dimension per device | events | | • | +| smartd_log.current_pending_sector_count | a dimension per device | sectors | | • | +| smartd_log.offline_uncorrectable_sector_count | a dimension per device | sectors | | • | +| smartd_log.percent_lifetime_used | a dimension per device | percentage | | • | +| smartd_log.media_wearout_indicator | a dimension per device | percentage | | • | +| smartd_log.nand_writes_1gib | a dimension per device | GiB | | • | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Configure `smartd` to write attribute information to files. + +`smartd` must be running with `-A` option to write `smartd` attribute information to files. + +For this you need to set `smartd_opts` (or `SMARTD_ARGS`, check _smartd.service_ content) in `/etc/default/smartmontools`: + +``` +# dump smartd attrs info every 600 seconds +smartd_opts="-A /var/log/smartd/ -i 600" +``` + +You may need to create the smartd directory before smartd will write to it: + +```sh +mkdir -p /var/log/smartd +``` + +Otherwise, all the smartd `.csv` files may get written to `/var/lib/smartmontools` (default location). See also <https://linux.die.net/man/8/smartd> for more info on the `-A --attributelog=PREFIX` command. + +`smartd` appends logs at every run. 
It's strongly recommended to use `logrotate` for smartd files. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/smartd_log.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/smartd_log.conf +``` +#### Options + +This particular collector does not need further configuration to work if permissions are satisfied, but you can always customize it's data collection behavior. + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| log_path | path to smartd log files. | /var/log/smartd | True | +| exclude_disks | Space-separated patterns. If the pattern is in the drive name, the module will not collect data for it. | | False | +| age | Time in minutes since the last dump to file. | 30 | False | +| update_every | Sets the default data collection frequency. | 1 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. 
| yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | + +</details> + +#### Examples + +##### Basic + +A basic configuration example. + +```yaml +custom: + name: smartd_log + log_path: '/var/log/smartd/' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `smartd_log` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin smartd_log debug trace + ``` + + diff --git a/collectors/python.d.plugin/spigotmc/README.md b/collectors/python.d.plugin/spigotmc/README.md index f39d9bab6..66e5c9c47 100644..120000 --- a/collectors/python.d.plugin/spigotmc/README.md +++ b/collectors/python.d.plugin/spigotmc/README.md @@ -1,61 +1 @@ -<!-- -title: "SpigotMC monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/spigotmc/README.md" -sidebar_label: "SpigotMC" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Webapps" ---> - -# SpigotMC collector - -Performs basic monitoring for Spigot Minecraft servers. - -It provides two charts, one tracking server-side ticks-per-second in -1, 5 and 15 minute averages, and one tracking the number of currently -active users. 
- -This is not compatible with Spigot plugins which change the format of -the data returned by the `tps` or `list` console commands. - -## Configuration - -Edit the `python.d/spigotmc.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/spigotmc.conf -``` - -```yaml -host: localhost -port: 25575 -password: pass -``` - -By default, a connection to port 25575 on the local system is attempted with an empty password. - - - - -### Troubleshooting - -To troubleshoot issues with the `spigotmc` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `spigotmc` module in debug mode: - -```bash -./python.d.plugin spigotmc debug trace -``` - +integrations/spigotmc.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/spigotmc/integrations/spigotmc.md b/collectors/python.d.plugin/spigotmc/integrations/spigotmc.md new file mode 100644 index 000000000..af330bdd1 --- /dev/null +++ b/collectors/python.d.plugin/spigotmc/integrations/spigotmc.md @@ -0,0 +1,215 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/spigotmc/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/spigotmc/metadata.yaml" +sidebar_label: "SpigotMC" +learn_status: "Published" +learn_rel_path: "Data Collection/Gaming" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# SpigotMC + + +<img src="https://netdata.cloud/img/spigot.jfif" width="150"/> + + +Plugin: python.d.plugin +Module: spigotmc + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors SpigotMC server performance, in the form of ticks per second average, memory utilization, and active users. + + +It sends the `tps`, `list` and `online` commands to the Server, and gathers the metrics from the responses. + + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +By default, this collector will attempt to connect to a Spigot server running on the local host on port `25575`. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. 
An instance is uniquely identified by a set of labels. + + + +### Per SpigotMC instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| spigotmc.tps | 1 Minute Average, 5 Minute Average, 15 Minute Average | ticks | +| spigotmc.users | Users | users | +| spigotmc.mem | used, allocated, max | MiB | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Enable the Remote Console Protocol + +Under your SpigotMC server's `server.properties` configuration file, you should set `enable-rcon` to `true`. + +This will allow the Server to listen and respond to queries over the rcon protocol. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/spigotmc.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/spigotmc.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. 
| 1 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| host | The host's IP to connect to. | localhost | True | +| port | The port the remote console is listening on. | 25575 | True | +| password | Remote console password if any. | | False | + +</details> + +#### Examples + +##### Basic + +A basic configuration example. + +```yaml +local: + name: local_server + url: 127.0.0.1 + port: 25575 + +``` +##### Basic Authentication + +An example using basic password for authentication with the remote console. + +<details><summary>Config</summary> + +```yaml +local: + name: local_server_pass + url: 127.0.0.1 + port: 25575 + password: 'foobar' + +``` +</details> + +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +<details><summary>Config</summary> + +```yaml +local_server: + name : my_local_server + url : 127.0.0.1 + port: 25575 + +remote_server: + name : another_remote_server + url : 192.0.2.1 + port: 25575 + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `spigotmc` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. 
+ + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin spigotmc debug trace + ``` + + diff --git a/collectors/python.d.plugin/squid/README.md b/collectors/python.d.plugin/squid/README.md index da5349184..c4e5a03d7 100644..120000 --- a/collectors/python.d.plugin/squid/README.md +++ b/collectors/python.d.plugin/squid/README.md @@ -1,81 +1 @@ -<!-- -title: "Squid monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/squid/README.md" -sidebar_label: "Squid" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Webapps" ---> - -# Squid collector - -Monitors one or more squid instances depending on configuration. - -It produces following charts: - -1. **Client Bandwidth** in kilobits/s - - - in - - out - - hits - -2. **Client Requests** in requests/s - - - requests - - hits - - errors - -3. **Server Bandwidth** in kilobits/s - - - in - - out - -4. **Server Requests** in requests/s - - - requests - - errors - -## Configuration - -Edit the `python.d/squid.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/squid.conf -``` - -```yaml -priority : 50000 - -local: - request : 'cache_object://localhost:3128/counters' - host : 'localhost' - port : 3128 -``` - -Without any configuration module will try to autodetect where squid presents its `counters` data - - - - -### Troubleshooting - -To troubleshoot issues with the `squid` module, run the `python.d.plugin` with the debug option enabled. 
The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `squid` module in debug mode: - -```bash -./python.d.plugin squid debug trace -``` - +integrations/squid.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/squid/integrations/squid.md b/collectors/python.d.plugin/squid/integrations/squid.md new file mode 100644 index 000000000..484d8706c --- /dev/null +++ b/collectors/python.d.plugin/squid/integrations/squid.md @@ -0,0 +1,198 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/squid/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/squid/metadata.yaml" +sidebar_label: "Squid" +learn_status: "Published" +learn_rel_path: "Data Collection/Web Servers and Web Proxies" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Squid + + +<img src="https://netdata.cloud/img/squid.png" width="150"/> + + +Plugin: python.d.plugin +Module: squid + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors statistics about the Squid Clients and Servers, like bandwidth and requests. + + +It collects metrics from the endpoint where Squid exposes its `counters` data. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +By default, this collector will try to autodetect where Squid presents its `counters` data, by trying various configurations. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per Squid instance + +These metrics refer to each monitored Squid instance. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| squid.clients_net | in, out, hits | kilobits/s | +| squid.clients_requests | requests, hits, errors | requests/s | +| squid.servers_net | in, out | kilobits/s | +| squid.servers_requests | requests, errors | requests/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Configure Squid's Cache Manager + +Take a look at [Squid's official documentation](https://wiki.squid-cache.org/Features/CacheManager/Index#controlling-access-to-the-cache-manager) on how to configure access to the Cache Manager. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/squid.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/squid.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. 
| 1 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | local | False | +| host | The host to connect to. | | True | +| port | The port to connect to. | | True | +| request | The URL to request from Squid. | | True | + +</details> + +#### Examples + +##### Basic + +A basic configuration example. + +```yaml +example_job_name: + name: 'local' + host: 'localhost' + port: 3128 + request: 'cache_object://localhost:3128/counters' + +``` +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +<details><summary>Config</summary> + +```yaml +local_job: + name: 'local' + host: '127.0.0.1' + port: 3128 + request: 'cache_object://127.0.0.1:3128/counters' + +remote_job: + name: 'remote' + host: '192.0.2.1' + port: 3128 + request: 'cache_object://192.0.2.1:3128/counters' + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `squid` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. 
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin squid debug trace + ``` + + diff --git a/collectors/python.d.plugin/tomcat/README.md b/collectors/python.d.plugin/tomcat/README.md index 923d6238f..997090c35 100644..120000 --- a/collectors/python.d.plugin/tomcat/README.md +++ b/collectors/python.d.plugin/tomcat/README.md @@ -1,76 +1 @@ -<!-- -title: "Apache Tomcat monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/tomcat/README.md" -sidebar_label: "Tomcat" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Webapps" ---> - -# Apache Tomcat collector - -Presents memory utilization of tomcat containers. - -Charts: - -1. **Requests** per second - - - accesses - -2. **Volume** in KB/s - - - volume - -3. **Threads** - - - current - - busy - -4. **JVM Free Memory** in MB - - - jvm - -## Configuration - -Edit the `python.d/tomcat.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/tomcat.conf -``` - -```yaml -localhost: - name : 'local' - url : 'http://127.0.0.1:8080/manager/status?XML=true' - user : 'tomcat_username' - pass : 'secret_tomcat_password' -``` - -Without configuration, module attempts to connect to `http://localhost:8080/manager/status?XML=true`, without any credentials. -So it will probably fail. - - - - -### Troubleshooting - -To troubleshoot issues with the `tomcat` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. 
- -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `tomcat` module in debug mode: - -```bash -./python.d.plugin tomcat debug trace -``` - +integrations/tomcat.md
\ No newline at end of file
diff --git a/collectors/python.d.plugin/tomcat/integrations/tomcat.md b/collectors/python.d.plugin/tomcat/integrations/tomcat.md
new file mode 100644
index 000000000..8210835c1
--- /dev/null
+++ b/collectors/python.d.plugin/tomcat/integrations/tomcat.md
@@ -0,0 +1,202 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/tomcat/README.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/tomcat/metadata.yaml"
+sidebar_label: "Tomcat"
+learn_status: "Published"
+learn_rel_path: "Data Collection/Web Servers and Web Proxies"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# Tomcat
+
+
+<img src="https://netdata.cloud/img/tomcat.svg" width="150"/>
+
+
+Plugin: python.d.plugin
+Module: tomcat
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+This collector monitors Tomcat metrics about bandwidth, processing time, threads and more.
+
+
+It parses the information provided by the `/manager/status` HTTP endpoint in XML format.
+
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+You need to provide the username and the password to access the webserver's status page. Create a separate user with read-only rights for this particular endpoint.
+
+### Default Behavior
+
+#### Auto-Detection
+
+If the Netdata Agent and the Tomcat webserver are on the same host, the module, without configuration, attempts to connect to http://localhost:8080/manager/status?XML=true without any credentials, so it will probably fail.
+
+#### Limits
+
+This module does not support SSL communication. If you want a Netdata Agent to monitor a Tomcat deployment, you shouldn't try to monitor it over a public network (public internet).
Credentials are passed by Netdata over an unsecured connection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per Tomcat instance
+
+These metrics refer to the entire monitored application.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| tomcat.accesses | accesses, errors | requests/s |
+| tomcat.bandwidth | sent, received | KiB/s |
+| tomcat.processing_time | processing time | seconds |
+| tomcat.threads | current, busy | current threads |
+| tomcat.jvm | free, eden, survivor, tenured, code cache, compressed, metaspace | MiB |
+| tomcat.jvm_eden | used, committed, max | MiB |
+| tomcat.jvm_survivor | used, committed, max | MiB |
+| tomcat.jvm_tenured | used, committed, max | MiB |
+
+
+
+## Alerts
+
+There are no alerts configured by default for this integration.
+
+
+## Setup
+
+### Prerequisites
+
+#### Create a read-only `netdata` user, to monitor the `/status` endpoint.
+
+This is necessary for configuring the collector.
+
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `python.d/tomcat.conf`.
+
+
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config python.d/tomcat.conf
+```
+#### Options
+
+There are 2 sections:
+
+* Global variables
+* One or more JOBS that can define multiple different instances to monitor.
+
+The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition.
+
+Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.
+
+
+<details><summary>Config options per job</summary>
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update_every | Sets the default data collection frequency. | 5 | False |
+| priority | Controls the order of charts at the netdata dashboard. | 60000 | False |
+| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False |
+| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False |
+| url | The URL of the Tomcat server's status endpoint. Always add the suffix ?XML=true. | no | True |
+| user | A valid user with read permission to access the /manager/status endpoint of the server. Required if the endpoint is password protected | no | False |
+| pass | A valid password for the user in question.
Required if the endpoint is password protected | no | False | +| connector_name | The connector component that communicates with a web connector via the AJP protocol, e.g ajp-bio-8009 | | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration + +```yaml +localhost: + name : 'local' + url : 'http://localhost:8080/manager/status?XML=true' + +``` +##### Using an IPv4 endpoint + +A typical configuration using an IPv4 endpoint + +<details><summary>Config</summary> + +```yaml +local_ipv4: + name : 'local' + url : 'http://127.0.0.1:8080/manager/status?XML=true' + +``` +</details> + +##### Using an IPv6 endpoint + +A typical configuration using an IPv6 endpoint + +<details><summary>Config</summary> + +```yaml +local_ipv6: + name : 'local' + url : 'http://[::1]:8080/manager/status?XML=true' + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `tomcat` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin tomcat debug trace + ``` + + diff --git a/collectors/python.d.plugin/tomcat/metadata.yaml b/collectors/python.d.plugin/tomcat/metadata.yaml index c22f4f58b..e68526073 100644 --- a/collectors/python.d.plugin/tomcat/metadata.yaml +++ b/collectors/python.d.plugin/tomcat/metadata.yaml @@ -45,7 +45,7 @@ modules: prerequisites: list: - title: Create a read-only `netdata` user, to monitor the `/status` endpoint. 
- description: You will need this configuring the collector + description: This is necessary for configuring the collector. configuration: file: name: "python.d/tomcat.conf" diff --git a/collectors/python.d.plugin/tor/README.md b/collectors/python.d.plugin/tor/README.md index 15f7e2282..7c20cd40a 100644..120000 --- a/collectors/python.d.plugin/tor/README.md +++ b/collectors/python.d.plugin/tor/README.md @@ -1,89 +1 @@ -<!-- -title: "Tor monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/tor/README.md" -sidebar_label: "Tor" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Apps" ---> - -# Tor collector - -Connects to the Tor control port to collect traffic statistics. - -## Requirements - -- `tor` program -- `stem` python package - -It produces only one chart: - -1. **Traffic** - - - read - - write - -## Configuration - -Edit the `python.d/tor.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/tor.conf -``` - -Needs only `control_port`. - -Here is an example for local server: - -```yaml -update_every : 1 -priority : 60000 - -local_tcp: - name: 'local' - control_port: 9051 - password: <password> # if required - -local_socket: - name: 'local' - control_port: '/var/run/tor/control' - password: <password> # if required -``` - -### prerequisite - -Add to `/etc/tor/torrc`: - -``` -ControlPort 9051 -``` - -For more options please read the manual. - -Without configuration, module attempts to connect to `127.0.0.1:9051`. - - - - -### Troubleshooting - -To troubleshoot issues with the `tor` module, run the `python.d.plugin` with the debug option enabled. 
The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `tor` module in debug mode: - -```bash -./python.d.plugin tor debug trace -``` - +integrations/tor.md
\ No newline at end of file
diff --git a/collectors/python.d.plugin/tor/integrations/tor.md b/collectors/python.d.plugin/tor/integrations/tor.md
new file mode 100644
index 000000000..f5c0026af
--- /dev/null
+++ b/collectors/python.d.plugin/tor/integrations/tor.md
@@ -0,0 +1,196 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/tor/README.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/tor/metadata.yaml"
+sidebar_label: "Tor"
+learn_status: "Published"
+learn_rel_path: "Data Collection/VPNs"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# Tor
+
+
+<img src="https://netdata.cloud/img/tor.svg" width="150"/>
+
+
+Plugin: python.d.plugin
+Module: tor
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+This collector monitors Tor bandwidth traffic.
+
+It connects to the Tor control port to collect traffic statistics.
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+If no configuration is provided, the collector will try to connect to 127.0.0.1:9051 to detect a running Tor instance.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per Tor instance
+
+These metrics refer to the entire monitored application.
+
+This scope has no labels.
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| tor.traffic | read, write | KiB/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Required python module + +The `stem` python library needs to be installed. + + +#### Required Tor configuration + +Add to /etc/tor/torrc: + +ControlPort 9051 + +For more options please read the manual. + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/tor.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/tor.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. 
JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False |
+| control_addr | Tor control IP address | 127.0.0.1 | False |
+| control_port | Tor control port. Can be either a tcp port, or a path to a socket file. | 9051 | False |
+| password | Tor control password | | False |
+
+</details>
+
+#### Examples
+
+##### Local TCP
+
+A basic TCP configuration. `control_addr` is omitted and will default to `127.0.0.1`.
+
+<details><summary>Config</summary>
+
+```yaml
+local_tcp:
+  name: 'local'
+  control_port: 9051
+  password: <password> # if required
+
+```
+</details>
+
+##### Local socket
+
+A basic local socket configuration
+
+<details><summary>Config</summary>
+
+```yaml
+local_socket:
+  name: 'local'
+  control_port: '/var/run/tor/control'
+  password: <password> # if required
+
+```
+</details>
+
+
+
+## Troubleshooting
+
+### Debug Mode
+
+To troubleshoot issues with the `tor` collector, run the `python.d.plugin` with the debug option enabled. The output
+should give you clues as to why the collector isn't working.
+
+- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
+  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.
+
+  ```bash
+  cd /usr/libexec/netdata/plugins.d/
+  ```
+
+- Switch to the `netdata` user.
+ + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin tor debug trace + ``` + + diff --git a/collectors/python.d.plugin/tor/metadata.yaml b/collectors/python.d.plugin/tor/metadata.yaml index d0ecc1a43..8647eca23 100644 --- a/collectors/python.d.plugin/tor/metadata.yaml +++ b/collectors/python.d.plugin/tor/metadata.yaml @@ -39,6 +39,9 @@ modules: setup: prerequisites: list: + - title: 'Required python module' + description: | + The `stem` python library needs to be installed. - title: 'Required Tor configuration' description: | Add to /etc/tor/torrc: diff --git a/collectors/python.d.plugin/uwsgi/README.md b/collectors/python.d.plugin/uwsgi/README.md index 393be9fc5..44b855949 100644..120000 --- a/collectors/python.d.plugin/uwsgi/README.md +++ b/collectors/python.d.plugin/uwsgi/README.md @@ -1,75 +1 @@ -<!-- -title: "uWSGI monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/uwsgi/README.md" -sidebar_label: "uWSGI" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Webapps" ---> - -# uWSGI collector - -Monitors performance metrics exposed by [`Stats Server`](https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html). - - -Following charts are drawn: - -1. **Requests** - - - requests per second - - transmitted data - - average request time - -2. **Memory** - - - rss - - vsz - -3. **Exceptions** -4. **Harakiris** -5. **Respawns** - -## Configuration - -Edit the `python.d/uwsgi.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. 
- -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/uwsgi.conf -``` - -```yaml -socket: - name : 'local' - socket : '/tmp/stats.socket' - -localhost: - name : 'local' - host : 'localhost' - port : 1717 -``` - -When no configuration file is found, module tries to connect to TCP/IP socket: `localhost:1717`. - - -### Troubleshooting - -To troubleshoot issues with the `uwsgi` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `uwsgi` module in debug mode: - -```bash -./python.d.plugin uwsgi debug trace -``` - +integrations/uwsgi.md
\ No newline at end of file
diff --git a/collectors/python.d.plugin/uwsgi/integrations/uwsgi.md b/collectors/python.d.plugin/uwsgi/integrations/uwsgi.md
new file mode 100644
index 000000000..309265789
--- /dev/null
+++ b/collectors/python.d.plugin/uwsgi/integrations/uwsgi.md
@@ -0,0 +1,218 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/uwsgi/README.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/uwsgi/metadata.yaml"
+sidebar_label: "uWSGI"
+learn_status: "Published"
+learn_rel_path: "Data Collection/Web Servers and Web Proxies"
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# uWSGI
+
+
+<img src="https://netdata.cloud/img/uwsgi.svg" width="150"/>
+
+
+Plugin: python.d.plugin
+Module: uwsgi
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+This collector monitors uWSGI metrics about requests, workers, memory and more.
+
+It collects every metric exposed from the stats server of uWSGI, either from the `stats.socket` or from the web server's TCP/IP socket.
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+This collector will auto-detect uWSGI instances deployed on the local host, running on port 1717, or exposing stats on socket `/tmp/stats.socket`.
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per uWSGI instance
+
+These metrics refer to the entire monitored application.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| uwsgi.requests | a dimension per worker | requests/s |
+| uwsgi.tx | a dimension per worker | KiB/s |
+| uwsgi.avg_rt | a dimension per worker | milliseconds |
+| uwsgi.memory_rss | a dimension per worker | MiB |
+| uwsgi.memory_vsz | a dimension per worker | MiB |
+| uwsgi.exceptions | exceptions | exceptions |
+| uwsgi.harakiris | harakiris | harakiris |
+| uwsgi.respawns | respawns | respawns |
+
+
+
+## Alerts
+
+There are no alerts configured by default for this integration.
+
+
+## Setup
+
+### Prerequisites
+
+#### Enable the uWSGI Stats server
+
+Make sure that uWSGI exposes its metrics via a Stats server.
+
+Source: https://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html
+
+
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `python.d/uwsgi.conf`.
+
+
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config python.d/uwsgi.conf
+```
+#### Options
+
+There are 2 sections:
+
+* Global variables
+* One or more JOBS that can define multiple different instances to monitor.
+
+The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values.
+
+Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition.
+
+Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified.
+ + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | The JOB's name as it will appear at the dashboard (by default is the job_name) | job_name | False | +| socket | The 'path/to/uwsgistats.sock' | no | False | +| host | The host to connect to | no | False | +| port | The port to connect to | no | False | + +</details> + +#### Examples + +##### Basic (default out-of-the-box) + +A basic example configuration, one job will run at a time. Autodetect mechanism uses it by default. As all JOBs have the same name, only one can run at a time. + +<details><summary>Config</summary> + +```yaml +socket: + name : 'local' + socket : '/tmp/stats.socket' + +localhost: + name : 'local' + host : 'localhost' + port : 1717 + +localipv4: + name : 'local' + host : '127.0.0.1' + port : 1717 + +localipv6: + name : 'local' + host : '::1' + port : 1717 + +``` +</details> + +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +<details><summary>Config</summary> + +```yaml +local: + name : 'local' + host : 'localhost' + port : 1717 + +remote: + name : 'remote' + host : '192.0.2.1' + port : 1717 + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `uwsgi` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. 
If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin uwsgi debug trace + ``` + + diff --git a/collectors/python.d.plugin/varnish/README.md b/collectors/python.d.plugin/varnish/README.md index d30a9fb1d..194be2335 100644..120000 --- a/collectors/python.d.plugin/varnish/README.md +++ b/collectors/python.d.plugin/varnish/README.md @@ -1,88 +1 @@ -<!-- -title: "Varnish Cache monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/varnish/README.md" -sidebar_label: "Varnish Cache" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Webapps" ---> - -# Varnish Cache collector - -Provides HTTP accelerator global, Backends (VBE) and Storages (SMF, SMA, MSE) statistics using `varnishstat` tool. - -Note that both, Varnish-Cache (free and open source) and Varnish-Plus (Commercial/Enterprise version), are supported. 
- -## Requirements - -- `netdata` user must be a member of the `varnish` group - -## Charts - -This module produces the following charts: - -- Connections Statistics in `connections/s` -- Client Requests in `requests/s` -- All History Hit Rate Ratio in `percent` -- Current Poll Hit Rate Ratio in `percent` -- Expired Objects in `expired/s` -- Least Recently Used Nuked Objects in `nuked/s` -- Number Of Threads In All Pools in `pools` -- Threads Statistics in `threads/s` -- Current Queue Length in `requests` -- Backend Connections Statistics in `connections/s` -- Requests To The Backend in `requests/s` -- ESI Statistics in `problems/s` -- Memory Usage in `MiB` -- Uptime in `seconds` - -For every backend (VBE): - -- Backend Response Statistics in `kilobits/s` - -For every storage (SMF, SMA, or MSE): - -- Storage Usage in `KiB` -- Storage Allocated Objects - -## Configuration - -Edit the `python.d/varnish.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/varnish.conf -``` - -Only one parameter is supported: - -```yaml -instance_name: 'name' -``` - -The name of the `varnishd` instance to get logs from. If not specified, the host name is used. - - - - -### Troubleshooting - -To troubleshoot issues with the `varnish` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. 
- -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `varnish` module in debug mode: - -```bash -./python.d.plugin varnish debug trace -``` - +integrations/varnish.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/varnish/integrations/varnish.md b/collectors/python.d.plugin/varnish/integrations/varnish.md new file mode 100644 index 000000000..142875f4b --- /dev/null +++ b/collectors/python.d.plugin/varnish/integrations/varnish.md @@ -0,0 +1,212 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/varnish/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/varnish/metadata.yaml" +sidebar_label: "Varnish" +learn_status: "Published" +learn_rel_path: "Data Collection/Web Servers and Web Proxies" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Varnish + + +<img src="https://netdata.cloud/img/varnish.svg" width="150"/> + + +Plugin: python.d.plugin +Module: varnish + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors Varnish global HTTP accelerator, backend (VBE), and storage (SMF, SMA, MSE) statistics. + +Both Varnish-Cache (free and open source) and Varnish-Plus (commercial/enterprise version) are supported. + + +It uses the `varnishstat` tool to collect the metrics. + + +This collector is supported on all platforms. + +This collector only supports collecting metrics from a single instance of this integration. + +The `netdata` user must be a member of the `varnish` group. + + +### Default Behavior + +#### Auto-Detection + +By default, if the permissions are satisfied, the `varnishstat` tool will be executed on the host. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*.
+ +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Varnish instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| varnish.session_connection | accepted, dropped | connections/s | +| varnish.client_requests | received | requests/s | +| varnish.all_time_hit_rate | hit, miss, hitpass | percentage | +| varnish.current_poll_hit_rate | hit, miss, hitpass | percentage | +| varnish.cached_objects_expired | objects | expired/s | +| varnish.cached_objects_nuked | objects | nuked/s | +| varnish.threads_total | None | number | +| varnish.threads_statistics | created, failed, limited | threads/s | +| varnish.threads_queue_len | in queue | requests | +| varnish.backend_connections | successful, unhealthy, reused, closed, recycled, failed | connections/s | +| varnish.backend_requests | sent | requests/s | +| varnish.esi_statistics | errors, warnings | problems/s | +| varnish.memory_usage | free, allocated | MiB | +| varnish.uptime | uptime | seconds | + +### Per Backend + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| varnish.backend | header, body | kilobits/s | + +### Per Storage + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| varnish.storage_usage | free, allocated | KiB | +| varnish.storage_alloc_objs | allocated | objects | + + + +## Alerts + +There are no alerts configured by default for this integration. 
+ + +## Setup + +### Prerequisites + +#### Provide the necessary permissions + +In order for the collector to work, you need to add the `netdata` user to the `varnish` user group, so that it can execute the `varnishstat` tool: + +``` +usermod -aG varnish netdata +``` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/varnish.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/varnish.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| instance_name | the name of the varnishd instance to get logs from. If not specified, the local host name is used. | | True | +| update_every | Sets the default data collection frequency. | 10 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. 
JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | + +</details> + +#### Examples + +##### Basic + +An example configuration. + +```yaml +job_name: + instance_name: '<name-of-varnishd-instance>' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `varnish` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin varnish debug trace + ``` + + diff --git a/collectors/python.d.plugin/varnish/metadata.yaml b/collectors/python.d.plugin/varnish/metadata.yaml index aa245c25f..d31c1cf6f 100644 --- a/collectors/python.d.plugin/varnish/metadata.yaml +++ b/collectors/python.d.plugin/varnish/metadata.yaml @@ -75,8 +75,8 @@ modules: enabled: true list: - name: instance_name - description: the name of the varnishd instance to get logs from. If not specified, the host name is used. - default_value: '<host name>' + description: the name of the varnishd instance to get logs from. If not specified, the local host name is used. + default_value: "" required: true - name: update_every description: Sets the default data collection frequency. 
diff --git a/collectors/python.d.plugin/w1sensor/README.md b/collectors/python.d.plugin/w1sensor/README.md index ca08b0400..c0fa9cd1b 100644..120000 --- a/collectors/python.d.plugin/w1sensor/README.md +++ b/collectors/python.d.plugin/w1sensor/README.md @@ -1,50 +1 @@ -<!-- -title: "1-Wire Sensors monitoring with Netdata" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/w1sensor/README.md" -sidebar_label: "1-Wire sensors" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Remotes/Devices" ---> - -# 1-Wire Sensors collector - -Monitors sensor temperature. - -On Linux these are supported by the wire, w1_gpio, and w1_therm modules. -Currently temperature sensors are supported and automatically detected. - -Charts are created dynamically based on the number of detected sensors. - -## Configuration - -Edit the `python.d/w1sensor.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/w1sensor.conf -``` - -An example of a working configuration can be found in the default [configuration file](https://github.com/netdata/netdata/blob/master/collectors/python.d.plugin/w1sensor/w1sensor.conf) of this collector. - -### Troubleshooting - -To troubleshoot issues with the `w1sensor` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. 
Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `w1sensor` module in debug mode: - -```bash -./python.d.plugin w1sensor debug trace -``` - +integrations/1-wire_sensors.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/w1sensor/integrations/1-wire_sensors.md b/collectors/python.d.plugin/w1sensor/integrations/1-wire_sensors.md new file mode 100644 index 000000000..39987743e --- /dev/null +++ b/collectors/python.d.plugin/w1sensor/integrations/1-wire_sensors.md @@ -0,0 +1,166 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/w1sensor/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/w1sensor/metadata.yaml" +sidebar_label: "1-Wire Sensors" +learn_status: "Published" +learn_rel_path: "Data Collection/Hardware Devices and Sensors" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# 1-Wire Sensors + + +<img src="https://netdata.cloud/img/1-wire.png" width="150"/> + + +Plugin: python.d.plugin +Module: w1sensor + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Monitor 1-Wire Sensors metrics with Netdata for optimal environmental conditions monitoring. Enhance your environmental monitoring with real-time insights and alerts. + +The collector uses the wire, w1_gpio, and w1_therm kernel modules. Currently temperature sensors are supported and automatically detected. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +The collector will try to auto detect available 1-Wire devices. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. 
+ +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per 1-Wire Sensors instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| w1sensor.temp | a dimension per sensor | Celsius | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Required Linux kernel modules + +Make sure `wire`, `w1_gpio`, and `w1_therm` kernel modules are loaded. + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/w1sensor.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/w1sensor.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. 
| 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | False | +| name_<1-Wire id> | This allows associating a human readable name with a sensor's 1-Wire identifier. | | False | + +</details> + +#### Examples + +##### Provide human readable names + +Associate two 1-Wire identifiers with human readable names. + +```yaml +sensors: + name_00000022276e: 'Machine room' + name_00000022298f: 'Rack 12' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `w1sensor` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin w1sensor debug trace + ``` + + diff --git a/collectors/python.d.plugin/zscores/README.md b/collectors/python.d.plugin/zscores/README.md index dcb685c98..159ce0787 100644..120000 --- a/collectors/python.d.plugin/zscores/README.md +++ b/collectors/python.d.plugin/zscores/README.md @@ -1,158 +1 @@ -# Basic anomaly detection using Z-scores - -By using smoothed, rolling [Z-Scores](https://en.wikipedia.org/wiki/Standard_score) for selected metrics or charts you can narrow down your focus and shorten root cause analysis. 
- -This collector uses the [Netdata rest api](https://github.com/netdata/netdata/blob/master/web/api/README.md) to get the `mean` and `stddev` -for each dimension on specified charts over a time range (defined by `train_secs` and `offset_secs`). For each dimension -it will calculate a Z-Score as `z = (x - mean) / stddev` (clipped at `z_clip`). Scores are then smoothed over -time (`z_smooth_n`) and, if `mode: 'per_chart'`, aggregated across dimensions to a smoothed, rolling chart level Z-Score -at each time step. - -## Charts - -Two charts are produced: - -- **Z-Score** (`zscores.z`): This chart shows the calculated Z-Score per chart (or dimension if `mode='per_dim'`). -- **Z-Score >3** (`zscores.3stddev`): This chart shows a `1` if the absolute value of the Z-Score is greater than 3 or - a `0` otherwise. - -Below is an example of the charts produced by this collector and a typical example of how they would look when things -are 'normal' on the system. Most of the zscores tend to bounce randomly around a range typically between 0 to +3 (or -3 -to +3 if `z_abs: 'false'`), a few charts might stay steady at a more constant higher value depending on your -configuration and the typical workload on your system (typically those charts that do not change that much have a -smaller range of values on which to calculate a zscore and so tend to have a higher typical zscore). - -So really its a combination of the zscores values themselves plus, perhaps more importantly, how they change when -something strange occurs on your system which can be most useful. - -![zscores-collector-normal](https://user-images.githubusercontent.com/2178292/108776300-21d44d00-755a-11eb-92a4-ecb8f7d2f175.png) - -For example, if we go onto the system and run a command -like [`stress-ng --all 2`](https://wiki.ubuntu.com/Kernel/Reference/stress-ng) to create some stress, we see many charts -begin to have zscores that jump outside the typical range. 
When the absolute zscore for a chart is greater than 3 you -will see a corresponding line appear on the `zscores.3stddev` chart to make it a bit clearer what charts might be worth -looking at first (for more background information on why 3 stddev -see [here](https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule#:~:text=In%20the%20empirical%20sciences%20the,99.7%25%20probability%20as%20near%20certainty.)) -. - -In the example below we basically took a sledge hammer to our system so its not surprising that lots of charts light up -after we run the stress command. In a more realistic setting you might just see a handful of charts with strange zscores -and that could be a good indication of where to look first. - -![zscores-collector-abnormal](https://user-images.githubusercontent.com/2178292/108776316-28fb5b00-755a-11eb-80de-ec5d38089ecc.png) - -Then as the issue passes the zscores should settle back down into their normal range again as they are calculated in a -rolling and smoothed way (as defined by your `zscores.conf` file). - -![zscores-collector-normal-again](https://user-images.githubusercontent.com/2178292/108776439-4fb99180-755a-11eb-8bb7-b4df144cb44c.png) - -## Requirements - -This collector will only work with Python 3 and requires the below packages be installed. - -```bash -# become netdata user -sudo su -s /bin/bash netdata -# install required packages -pip3 install numpy pandas requests netdata-pandas==0.0.38 -``` - -## Configuration - -Install the underlying Python requirements, Enable the collector and restart Netdata. - -```bash -cd /etc/netdata/ -sudo ./edit-config python.d.conf -# Set `zscores: no` to `zscores: yes` -sudo systemctl restart netdata -``` - -The configuration for the zscores collector defines how it will behave on your system and might take some -experimentation with over time to set it optimally. 
Out of the box, the config comes with -some [sane defaults](https://www.netdata.cloud/blog/redefining-monitoring-netdata/) to get you started. - -If you are unsure about any of the below configuration options then it's best to just ignore all this and leave -the `zscores.conf` files alone to begin with. Then you can return to it later if you would like to tune things a bit -more once the collector is running for a while. - -Edit the `python.d/zscores.conf` configuration file using `edit-config` from the your -agent's [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory), which is -usually at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/zscores.conf -``` - -The default configuration should look something like this. Here you can see each parameter (with sane defaults) and some -information about each one and what it does. - -```bash -# what host to pull data from -host: '127.0.0.1:19999' -# What charts to pull data for - A regex like 'system\..*|' or 'system\..*|apps.cpu|apps.mem' etc. 
-charts_regex: 'system\..*' -# length of time to base calculations off for mean and stddev -train_secs: 14400 # use last 4 hours to work out the mean and stddev for the zscore -# offset preceding latest data to ignore when calculating mean and stddev -offset_secs: 300 # ignore last 5 minutes of data when calculating the mean and stddev -# recalculate the mean and stddev every n steps of the collector -train_every_n: 900 # recalculate mean and stddev every 15 minutes -# smooth the z score by averaging it over last n values -z_smooth_n: 15 # take a rolling average of the last 15 zscore values to reduce sensitivity to temporary 'spikes' -# cap absolute value of zscore (before smoothing) for better stability -z_clip: 10 # cap each zscore at 10 so as to avoid really large individual zscores swamping any rolling average -# set z_abs: 'true' to make all zscores be absolute values only. -z_abs: 'true' -# burn in period in which to initially calculate mean and stddev on every step -burn_in: 2 # on startup of the collector continually update the mean and stddev in case any gaps or initial calculations fail to return -# mode can be to get a zscore 'per_dim' or 'per_chart' -mode: 'per_chart' # 'per_chart' means individual dimension level smoothed zscores will be aggregated to one zscore per chart per time step -# per_chart_agg is how you aggregate from dimension to chart when mode='per_chart' -per_chart_agg: 'mean' # 'absmax' will take the max absolute value across all dimensions but will maintain the sign. 'mean' will just average. -``` - -## Notes - -- Python 3 is required as the [`netdata-pandas`](https://github.com/netdata/netdata-pandas) package uses python async - libraries ([asks](https://pypi.org/project/asks/) and [trio](https://pypi.org/project/trio/)) to make asynchronous - calls to the netdata rest api to get the required data for each chart when calculating the mean and stddev. 
-- It may take a few hours or so for the collector to 'settle' into it's typical behaviour in terms of the scores you - will see in the normal running of your system. -- The zscore you see for each chart when using `mode: 'per_chart'` as actually an aggregated zscore across all the - dimensions on the underlying chart. -- If you set `mode: 'per_dim'` then you will see a zscore for each dimension on each chart as opposed to one per chart. -- As this collector does some calculations itself in python you may want to try it out first on a test or development - system to get a sense of its performance characteristics. Most of the work in calculating the mean and stddev will be - pushed down to the underlying Netdata C libraries via the rest api. But some data wrangling and calculations are then - done using [Pandas](https://pandas.pydata.org/) and [Numpy](https://numpy.org/) within the collector itself. -- On a development n1-standard-2 (2 vCPUs, 7.5 GB memory) vm running Ubuntu 18.04 LTS and not doing any work some of the - typical performance characteristics we saw from running this collector were: - - A runtime (`netdata.runtime_zscores`) of ~50ms when doing scoring and ~500ms when recalculating the mean and - stddev. - - Typically 3%-3.5% cpu usage from scoring, jumping to ~35% for one second when recalculating the mean and stddev. - - About ~50mb of ram (`apps.mem`) being continually used by the `python.d.plugin`. -- If you activate this collector on a fresh node, it might take a little while to build up enough data to calculate a - proper zscore. So until you actually have `train_secs` of available data the mean and stddev calculated will be subject - to more noise. -### Troubleshooting - -To troubleshoot issues with the `zscores` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. 
- -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `zscores` module in debug mode: - -```bash -./python.d.plugin zscores debug trace -``` - +integrations/python.d_zscores.md
\ No newline at end of file diff --git a/collectors/python.d.plugin/zscores/integrations/python.d_zscores.md b/collectors/python.d.plugin/zscores/integrations/python.d_zscores.md new file mode 100644 index 000000000..1ebe865f0 --- /dev/null +++ b/collectors/python.d.plugin/zscores/integrations/python.d_zscores.md @@ -0,0 +1,194 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/zscores/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/python.d.plugin/zscores/metadata.yaml" +sidebar_label: "python.d zscores" +learn_status: "Published" +learn_rel_path: "Data Collection/Other" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# python.d zscores + +Plugin: python.d.plugin +Module: zscores + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +By using smoothed, rolling [Z-Scores](https://en.wikipedia.org/wiki/Standard_score) for selected metrics or charts you can narrow down your focus and shorten root cause analysis. + + +This collector uses the [Netdata rest api](https://github.com/netdata/netdata/blob/master/web/api/README.md) to get the `mean` and `stddev` +for each dimension on specified charts over a time range (defined by `train_secs` and `offset_secs`). + +For each dimension it will calculate a Z-Score as `z = (x - mean) / stddev` (clipped at `z_clip`). Scores are then smoothed over +time (`z_smooth_n`) and, if `mode: 'per_chart'`, aggregated across dimensions to a smoothed, rolling chart level Z-Score at each time step. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. 
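
The Z-Score calculation described in the overview of this file can be sketched in a few lines of Python. This is an illustration only, not the collector's actual code: the names `clipped_zscore` and `RollingZScore` are hypothetical, and returning 0 when `stddev` is 0 is an assumption made here to avoid a division by zero. The parameters `z_clip` and `z_smooth_n` mirror the config options of the same name.

```python
from collections import deque


def clipped_zscore(x, mean, stddev, z_clip=10.0):
    """Compute z = (x - mean) / stddev, capped to the range [-z_clip, z_clip]."""
    if stddev == 0:
        # Assumption for this sketch: a constant series has no deviation.
        return 0.0
    z = (x - mean) / stddev
    return max(-z_clip, min(z_clip, z))


class RollingZScore:
    """Smooth clipped Z-Scores by averaging the last z_smooth_n values."""

    def __init__(self, mean, stddev, z_clip=10.0, z_smooth_n=15):
        self.mean = mean
        self.stddev = stddev
        self.z_clip = z_clip
        # deque(maxlen=...) drops the oldest score once the window is full.
        self.window = deque(maxlen=z_smooth_n)

    def update(self, x):
        """Add one observation and return the smoothed Z-Score."""
        self.window.append(clipped_zscore(x, self.mean, self.stddev, self.z_clip))
        return sum(self.window) / len(self.window)


if __name__ == "__main__":
    scorer = RollingZScore(mean=10.0, stddev=2.0, z_clip=10.0, z_smooth_n=3)
    for value in (12.0, 14.0, 1000.0, 10.0):
        print(value, scorer.update(value))
```

Clipping before smoothing means a single extreme value (such as `1000.0` above) contributes at most `z_clip` to the rolling average. That matches the stated purpose of the `z_clip` option in the config table.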
+ +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per python.d zscores instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| zscores.z | a dimension per chart or dimension | z | +| zscores.3stddev | a dimension per chart or dimension | count | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Python Requirements + +This collector will only work with Python 3 and requires the below packages be installed. + +```bash +# become netdata user +sudo su -s /bin/bash netdata +# install required packages +pip3 install numpy pandas requests netdata-pandas==0.0.38 +``` + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/zscores.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/zscores.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. 
+ +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| charts_regex | what charts to pull data for - A regex like `system\..*/` or `system\..*/apps.cpu/apps.mem` etc. | system\..* | True | +| train_secs | length of time (in seconds) to base calculations off for mean and stddev. | 14400 | True | +| offset_secs | offset (in seconds) preceding latest data to ignore when calculating mean and stddev. | 300 | True | +| train_every_n | recalculate the mean and stddev every n steps of the collector. | 900 | True | +| z_smooth_n | smooth the z score (to reduce sensitivity to spikes) by averaging it over last n values. | 15 | True | +| z_clip | cap absolute value of zscore (before smoothing) for better stability. | 10 | True | +| z_abs | set z_abs: 'true' to make all zscores be absolute values only. | true | True | +| burn_in | burn in period in which to initially calculate mean and stddev on every step. | 2 | True | +| mode | mode can be to get a zscore 'per_dim' or 'per_chart'. | per_chart | True | +| per_chart_agg | per_chart_agg is how you aggregate from dimension to chart when mode='per_chart'. | mean | True | +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | + +</details> + +#### Examples + +##### Default + +Default configuration. 
+ +```yaml +local: + name: 'local' + host: '127.0.0.1:19999' + charts_regex: 'system\..*' + charts_to_exclude: 'system.uptime' + train_secs: 14400 + offset_secs: 300 + train_every_n: 900 + z_smooth_n: 15 + z_clip: 10 + z_abs: 'true' + burn_in: 2 + mode: 'per_chart' + per_chart_agg: 'mean' + +``` + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `zscores` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin zscores debug trace + ``` + + diff --git a/collectors/slabinfo.plugin/README.md b/collectors/slabinfo.plugin/README.md index abcbe1e3f..4d4629a77 100644..120000 --- a/collectors/slabinfo.plugin/README.md +++ b/collectors/slabinfo.plugin/README.md @@ -1,36 +1 @@ -<!-- -title: "slabinfo.plugin" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/slabinfo.plugin/README.md" -sidebar_label: "slabinfo.plugin" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/System metrics" ---> - -# slabinfo.plugin - -SLAB is a cache mechanism used by the Kernel to avoid fragmentation. - -Each internal structure (process, file descriptor, inode...) is stored within a SLAB. - -## configuring Netdata for slabinfo - -The plugin is disabled by default because it collects and displays a huge amount of metrics. -To enable it set `slabinfo = yes` in the `plugins` section of the `netdata.conf` configuration file. 
- -If you are using [our official native DEB/RPM packages](https://github.com/netdata/netdata/blob/master/packaging/installer/methods/packages.md), you will additionally need to install the `netdata-plugin-slabinfo` -package using your system package manager. - -There is currently no configuration needed for the plugin itself. - -As `/proc/slabinfo` is only readable by root, this plugin is setuid root. - -## For what use - -This slabinfo details allows to have clues on actions done on your system. -In the following screenshot, you can clearly see a `find` done on a ext4 filesystem (the number of `ext4_inode_cache` & `dentry` are rising fast), and a few seconds later, an admin issued a `echo 3 > /proc/sys/vm/drop_cached` as their count dropped. - -![netdata_slabinfo](https://user-images.githubusercontent.com/9157986/64433811-7f06e500-d0bf-11e9-8e1e-087497e61033.png) - - - +integrations/linux_kernel_slab_allocator_statistics.md
\ No newline at end of file diff --git a/collectors/slabinfo.plugin/integrations/linux_kernel_slab_allocator_statistics.md b/collectors/slabinfo.plugin/integrations/linux_kernel_slab_allocator_statistics.md new file mode 100644 index 000000000..54ccf605f --- /dev/null +++ b/collectors/slabinfo.plugin/integrations/linux_kernel_slab_allocator_statistics.md @@ -0,0 +1,130 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/slabinfo.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/slabinfo.plugin/metadata.yaml" +sidebar_label: "Linux kernel SLAB allocator statistics" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Kernel" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Linux kernel SLAB allocator statistics + + +<img src="https://netdata.cloud/img/linuxserver.svg" width="150"/> + + +Plugin: slabinfo.plugin +Module: slabinfo.plugin + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Collects metrics on kernel SLAB cache utilization to monitor the low-level performance impact of workloads in the kernel. + + +The plugin parses `/proc/slabinfo` + +This collector is only supported on the following platforms: + +- Linux + +This collector only supports collecting metrics from a single instance of this integration. + +This integration requires read access to `/proc/slabinfo`, which is accessible only to the root user by default. Netdata uses Linux Capabilities to give the plugin access to this file. `CAP_DAC_READ_SEARCH` is added automatically during installation. This capability allows bypassing file read permission checks and directory read and execute permission checks. If file capabilities are not usable, then the plugin is instead installed with the SUID bit set in permissions so that it runs as root.
+ + +### Default Behavior + +#### Auto-Detection + +Due to the large number of metrics generated by this integration, it is disabled by default and must be manually enabled inside `/etc/netdata/netdata.conf` + + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + +SLAB cache utilization metrics for the whole system. + +### Per Linux kernel SLAB allocator statistics instance + + + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| mem.slabmemory | a dimension per cache | B | +| mem.slabfilling | a dimension per cache | % | +| mem.slabwaste | a dimension per cache | B | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Minimum setup + +If you installed `netdata` using a package manager, it is also necessary to install the package `netdata-plugin-slabinfo`. + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugins]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). 
+ +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>The main configuration file.</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| Enable plugin | As described above plugin is disabled by default, this option is used to enable plugin. | no | True | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/slabinfo.plugin/metadata.yaml b/collectors/slabinfo.plugin/metadata.yaml index 7d135d611..f19778297 100644 --- a/collectors/slabinfo.plugin/metadata.yaml +++ b/collectors/slabinfo.plugin/metadata.yaml @@ -50,7 +50,9 @@ modules: description: "" setup: prerequisites: - list: [] + list: + - title: Minimum setup + description: "If you installed `netdata` using a package manager, it is also necessary to install the package `netdata-plugin-slabinfo`." configuration: file: name: "netdata.conf" diff --git a/collectors/slabinfo.plugin/slabinfo.c b/collectors/slabinfo.plugin/slabinfo.c index 25b96e386..366cba643 100644 --- a/collectors/slabinfo.plugin/slabinfo.c +++ b/collectors/slabinfo.plugin/slabinfo.c @@ -343,6 +343,8 @@ int main(int argc, char **argv) { program_version = "0.1"; error_log_syslog = 0; + log_set_global_severity_for_external_plugins(); + int update_every = 1, i, n, freq = 0; for (i = 1; i < argc; i++) { diff --git a/collectors/statsd.plugin/README.md b/collectors/statsd.plugin/README.md index dd74923ec..d80849dba 100644 --- a/collectors/statsd.plugin/README.md +++ b/collectors/statsd.plugin/README.md @@ -36,7 +36,7 @@ Netdata ships with a few synthetic chart definitions to automatically present ap more uniform way. These synthetic charts are configuration files (you can create your own) that re-arrange statsd metrics into a more meaningful way. -On synthetic charts, we can have alarms as with any metric and chart. 
+On synthetic charts, we can have alerts as with any metric and chart. - [K6 load testing tool](https://k6.io) - **Description:** k6 is a developer-centric, free and open-source load testing tool built for making performance testing a productive and enjoyable experience. @@ -348,11 +348,11 @@ Using the above configuration `myapp` should get its own section on the dashboar - `gaps when not collected = yes|no`, enables or disables gaps on the charts of the application in case that no metrics are collected. - `memory mode` sets the memory mode for all charts of the application. The default is the global default for Netdata (not the global default for StatsD private charts). We suggest not to use this (we have commented it out in the example) and let your app use the global default for Netdata, which is our dbengine. -- `history` sets the size of the round robin database for this application. The default is the global default for Netdata (not the global default for StatsD private charts). This is only relevant if you use `memory mode = save`. Read more on our [metrics storage(]/docs/store/change-metrics-storage.md) doc. +- `history` sets the size of the round-robin database for this application. The default is the global default for Netdata (not the global default for StatsD private charts). This is only relevant if you use `memory mode = save`. Read more on our [metrics storage(]/docs/store/change-metrics-storage.md) doc. `[dictionary]` defines name-value associations. These are used to renaming metrics, when added to synthetic charts. Metric names are also defined at each `dimension` line. However, using the dictionary dimension names can be declared globally, for each app and is the only way to rename dimensions when using patterns. Of course the dictionary can be empty or missing. -Then, add any number of charts. Each chart should start with `[id]`. The chart will be called `app_name.id`. `family` controls the submenu on the dashboard. 
`context` controls the alarm templates. `priority` controls the ordering of the charts on the dashboard. The rest of the settings are informational. +Then, add any number of charts. Each chart should start with `[id]`. The chart will be called `app_name.id`. `family` controls the submenu on the dashboard. `context` controls the alert templates. `priority` controls the ordering of the charts on the dashboard. The rest of the settings are informational. Add any number of metrics to a chart, using `dimension` lines. These lines accept 5 space separated parameters: @@ -361,7 +361,7 @@ Add any number of metrics to a chart, using `dimension` lines. These lines accep 3. an optional selector (type) of the value to shown (see below) 4. an optional multiplier 5. an optional divider -6. optional flags, space separated and enclosed in quotes. All the external plugins `DIMENSION` flags can be used. Currently the only usable flag is `hidden`, to add the dimension, but not show it on the dashboard. This is usually needed to have the values available for percentage calculation, or use them in alarms. +6. optional flags, space separated and enclosed in quotes. All the external plugins `DIMENSION` flags can be used. Currently, the only usable flag is `hidden`, to add the dimension, but not show it on the dashboard. This is usually needed to have the values available for percentage calculation, or use them in alerts. So, the format is this: @@ -439,7 +439,7 @@ Use the dictionary in 2 ways: 1. set `dimension = myapp.metric1 ''` and have at the dictionary `myapp.metric1 = metric1 name` 2. set `dimension = myapp.metric1 'm1'` and have at the dictionary `m1 = metric1 name` -In both cases, the dimension will be added with ID `myapp.metric1` and will be named `metric1 name`. So, in alarms use either of the 2 as `${myapp.metric1}` or `${metric1 name}`. +In both cases, the dimension will be added with ID `myapp.metric1` and will be named `metric1 name`. 
So, in alerts use either of the 2 as `${myapp.metric1}` or `${metric1 name}`. > keep in mind that if you add multiple times the same StatsD metric to a chart, Netdata will append `TYPE` to the dimension ID, so `myapp.metric1` will be added as `myapp.metric1_last` or `myapp.metric1_events`, etc. If you add multiple times the same metric with the same `TYPE` to a chart, Netdata will also append an incremental counter to the dimension ID, i.e. `myapp.metric1_last1`, `myapp.metric1_last2`, etc. diff --git a/collectors/systemd-journal.plugin/README.md b/collectors/systemd-journal.plugin/README.md index e69de29bb..51aa1b7cd 100644 --- a/collectors/systemd-journal.plugin/README.md +++ b/collectors/systemd-journal.plugin/README.md @@ -0,0 +1,673 @@ + +# `systemd` journal plugin + +[KEY FEATURES](#key-features) | [JOURNAL SOURCES](#journal-sources) | [JOURNAL FIELDS](#journal-fields) | +[PLAY MODE](#play-mode) | [FULL TEXT SEARCH](#full-text-search) | [PERFORMANCE](#query-performance) | +[CONFIGURATION](#configuration-and-maintenance) | [FAQ](#faq) + +The `systemd` journal plugin by Netdata makes viewing, exploring and analyzing `systemd` journal logs simple and +efficient. +It automatically discovers available journal sources, allows advanced filtering, offers interactive visual +representations and supports exploring the logs of both individual servers and the logs on infrastructure wide +journal centralization servers. + +![image](https://github.com/netdata/netdata/assets/2662304/691b7470-ec56-430c-8b81-0c9e49012679) + +## Key features + +- Works on both **individual servers** and **journal centralization servers**. +- Supports `persistent` and `volatile` journals. +- Supports `system`, `user`, `namespaces` and `remote` journals. +- Allows filtering on **any journal field** or **field value**, for any time-frame. +- Allows **full text search** (`grep`) on all journal fields, for any time-frame. 
+- Provides a **histogram** for log entries over time, with a break down per field-value, for any field and any + time-frame. +- Works directly on journal files, without any other third-party components. +- Supports coloring log entries, the same way `journalctl` does. +- In PLAY mode provides the same experience as `journalctl -f`, showing new log entries immediately after they are + received. + +### Prerequisites + +`systemd-journal.plugin` is a Netdata Function Plugin. + +To protect your privacy, as with all Netdata Functions, a free Netdata Cloud user account is required to access it. +For more information check [this discussion](https://github.com/netdata/netdata/discussions/16136). + +### Limitations + +#### Plugin availability + +The following are limitations related to the availability of the plugin: + +- This plugin is not available when Netdata is installed in a container. The problem is that `libsystemd` is not + available in Alpine Linux (there is a `libsystemd`, but it is a dummy that returns failure on all calls). We plan to + change this, by shipping Netdata containers based on Debian. +- For the same reason (lack of `systemd` support for Alpine Linux), the plugin is not available on `static` builds of + Netdata (which are based on `muslc`, not `glibc`). +- On old systemd systems (like Centos 7), the plugin runs always in "full data query" mode, which makes it slower. The + reason, is that systemd API is missing some important calls we need to use the field indexes of `systemd` journal. + However, when running in this mode, the plugin offers also negative matches on the data (like filtering for all logs + that do not have set some field), and this is the reason "full data query" mode is also offered as an option even on + newer versions of `systemd`. + +To use the plugin, install one of our native distribution packages, or install it from source. 
+ +#### `systemd` journal features + +The following are limitations related to the features of `systemd` journal: + +- This plugin does not support binary field values. `systemd` journal has the ability to assign fields with binary data. + This plugin assumes all fields contain text values (text in this context includes numbers). +- This plugin does not support multiple values per field for any given log entry. `systemd` journal has the ability to + accept the same field key, multiple times, with multiple values on a single log entry. This plugin will present the + last value and ignore the others for this log entry. +- This plugin will only read journal files located in `/var/log/journal` or `/run/log/journal`. `systemd-remote` has the + ability to store journal files anywhere (user configured). If journal files are not located in `/var/log/journal` + or `/run/log/journal` (and any of their subdirectories), the plugin will not find them. + +Other than the above, this plugin supports all features of `systemd` journals. + +## Journal Sources + +The plugin automatically detects the available journal sources, based on the journal files available in +`/var/log/journal` (persistent logs) and `/run/log/journal` (volatile logs). + +![journal-sources](https://github.com/netdata/netdata/assets/2662304/28e63a3e-6809-4586-b3b0-80755f340e31) + +The plugin, by default, merges all journal sources together, to provide a unified view of all log messages available. + +> To improve query performance, we recommend selecting the relevant journal source, before doing more analysis on the +> logs. + +### `system` journals + +`system` journals are the default journals available on all `systemd` based systems. 
+ +`system` journals contain: + +- kernel log messages (via `kmsg`), +- audit records, originating from the kernel audit subsystem, +- messages received by `systemd-journald` via `syslog`, +- messages received via the standard output and error of service units, +- structured messages received via the native journal API. + +### `user` journals + +Unlike `journalctl`, the Netdata plugin allows viewing, exploring and querying the journal files of **all users**. + +By default, each user, with a UID outside the range of system users (0 - 999), dynamic service users, +and the nobody user (65534), will get their own set of `user` journal files. For more information about +this policy check [Users, Groups, UIDs and GIDs on systemd Systems](https://systemd.io/UIDS-GIDS/). + +Keep in mind that `user` journals are merged with the `system` journals when they are propagated to a journal +centralization server. So, at the centralization server, the `remote` journals contain both the `system` and `user` +journals of the sender. + +### `namespaces` journals + +The plugin auto-detects the namespaces available and provides a list of all namespaces at the "sources" list on the UI. + +Journal namespaces are both a mechanism for logically isolating the log stream of projects consisting +of one or more services from the rest of the system and a mechanism for improving performance. + +`systemd` service units may be assigned to a specific journal namespace through the `LogNamespace=` unit file setting. + +Keep in mind that namespaces require special configuration to be propagated to a journal centralization server. +This makes them a little more difficult to handle, from the administration perspective. + +### `remote` journals + +Remote journals are created by `systemd-journal-remote`. This `systemd` feature allows creating logs centralization +points within your infrastructure, based exclusively on `systemd`. 
+ +Usually `remote` journals are named by the IP of the server sending these logs. The Netdata plugin automatically +extracts these IPs and performs a reverse DNS lookup to find their hostnames. When this is successful, +`remote` journals are named by the hostnames of the origin servers. + +For information about configuring a journals' centralization server, +check [this FAQ item](#how-do-i-configure-a-journals-centralization-server). + +## Journal Fields + +`systemd` journals are designed to support multiple fields per log entry. The power of `systemd` journals is that, +unlike other log management systems, it supports dynamic and variable fields for each log message, +while all fields and their values are indexed for fast querying. + +This means that each application can log messages annotated with its own unique fields and values, and `systemd` +journals will automatically index all of them, without any configuration or manual action. + +For a description of the most frequent fields found in `systemd` journals, check `man systemd.journal-fields`. + +Fields found in the journal files are automatically added to the UI in multiple places to help you explore +and filter the data. + +The plugin automatically enriches certain fields to make them more user-friendly: + +- `_BOOT_ID`: the hex value is annotated with the timestamp of the first message encountered for this boot id. +- `PRIORITY`: the numeric value is replaced with the human-readable name of each priority. +- `SYSLOG_FACILITY`: the encoded value is replaced with the human-readable name of each facility. +- `ERRNO`: the numeric value is annotated with the short name of each value. +- `_UID` `_AUDIT_LOGINUID`, `_SYSTEMD_OWNER_UID`, `OBJECT_UID`, `OBJECT_SYSTEMD_OWNER_UID`, `OBJECT_AUDIT_LOGINUID`: + the local user database is consulted to annotate them with usernames. +- `_GID`, `OBJECT_GID`: the local group database is consulted to annotate them with group names. 
+- `_CAP_EFFECTIVE`: the encoded value is annotated with a human-readable list of the Linux capabilities. +- `_SOURCE_REALTIME_TIMESTAMP`: the numeric value is annotated with human-readable datetime in UTC. + +The values of all other fields are presented as found in the journals. + +> IMPORTANT: +> The UID and GID annotations are added during presentation and are taken from the server running the plugin. +> For `remote` sources, the names presented may not reflect the actual user and group names on the origin server. +> The numeric value will still be visible though, as-is on the origin server. + +The annotations are not searchable with full-text search. They are only added for the presentation of the fields. + +### Journal fields as columns in the table + +All journal fields available in the journal files are offered as columns on the UI. Use the gear button above the table: + +![image](https://github.com/netdata/netdata/assets/2662304/cd75fb55-6821-43d4-a2aa-033792c7f7ac) + +### Journal fields as additional info to each log entry + +When you click a log line, the `info` sidebar will open on the right of the screen, to provide the full list of fields +related to this log line. You can close this `info` sidebar, by selecting the filter icon at its top. + +![image](https://github.com/netdata/netdata/assets/2662304/3207794c-a61b-444c-8ffe-6c07cbc90ae2) + +### Journal fields as filters + +The plugin presents a select list of fields as filters to the query, with counters for each of the possible values +for the field. This list can be used to quickly check which fields and values are available for the entire time-frame +of the query. + +Internally the plugin has: + +1. A white-list of fields, to be presented as filters. +2. A black-list of fields, to prevent them from becoming filters. This list includes fields with a very high + cardinality, like timestamps, unique message ids, etc.
This is mainly for protecting the server's performance, + to avoid building in memory indexes for the fields that almost each of their values is unique. + +Keep in mind that the values presented in the filters, and their sorting is affected by the "full data queries" +setting: + +![image](https://github.com/netdata/netdata/assets/2662304/ac710d46-07c2-487b-8ce3-e7f767b9ae0f) + +When "full data queries" is off, empty values are hidden and cannot be selected. This is due to a limitation of +`libsystemd` that does not allow negative or empty matches. Also, values with zero counters may appear in the list. + +When "full data queries" is on, Netdata is applying all filtering to the data (not `libsystemd`), but this means +that all the data of the entire time-frame, without any filtering applied, have to be read by the plugin to prepare +the response required. So, "full data queries" can be significantly slower over long time-frames. + +### Journal fields as histogram sources + +The plugin presents a histogram of the number of log entries across time. + +The data source of this histogram can be any of the fields that are available as filters. +For each of the values this field has, across the entire time-frame of the query, the histogram will get corresponding +dimensions, showing the number of log entries, per value, over time. + +The granularity of the histogram is adjusted automatically to have about 150 columns visible on screen. + +The histogram presented by the plugin is interactive: + +- **Zoom**, either with the global date-time picker, or the zoom tool in the histogram's toolbox. +- **Pan**, either with global date-time picker, or by dragging with the mouse the chart to the left or the right. +- **Click**, to quickly jump to the highlighted point in time in the log entries. 
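As a rough illustration of the automatic granularity described above, the following Python sketch picks a bucket width so that a time-frame yields at most about 150 columns, then counts log entries per field value in each bucket. The function name, the rounding to whole seconds, and the shape of the input are assumptions made for this example; the plugin's actual implementation may differ.

```python
from collections import Counter, defaultdict

TARGET_COLUMNS = 150  # "about 150 columns", as stated in the text above


def build_histogram(entries, start, end, target_columns=TARGET_COLUMNS):
    """Count log entries per (bucket, field value).

    entries: iterable of (timestamp_seconds, field_value) pairs
             (a hypothetical input shape chosen for this sketch).
    Returns (bucket_width_seconds, {bucket_start: Counter({value: count})}).
    """
    if end <= start:
        raise ValueError("end must be after start")
    span = end - start
    # Assumed rounding: whole seconds, rounded up, at least 1 second per column.
    width = max(1, (span + target_columns - 1) // target_columns)
    buckets = defaultdict(Counter)
    for ts, value in entries:
        if start <= ts < end:  # ignore entries outside the time-frame
            bucket_start = start + ((ts - start) // width) * width
            buckets[bucket_start][value] += 1
    return width, dict(buckets)
```

Each distinct field value becomes one dimension of the histogram, which is why only fields that are available as filters (bounded cardinality) are suitable data sources.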
+ +![image](https://github.com/netdata/netdata/assets/2662304/d3dcb1d1-daf4-49cf-9663-91b5b3099c2d) + +## PLAY mode + +The plugin supports PLAY mode, to continuously update the screen with new log entries found in the journal files. +Just hit the "play" button at the top of the Netdata dashboard screen. + +On centralized log servers, PLAY mode provides a unified view of all the new logs encountered across the entire +infrastructure, +from all hosts sending logs to the central logs server via `systemd-remote`. + +## Full-text search + +The plugin supports searching for any text on all fields of the log entries. + +Full text search is combined with the selected filters. + +The text box accepts asterisks `*` as wildcards. So, `a*b*c` means match anything that contains `a`, then `b` and +then `c` with anything between them. + +## Query performance + +Journal files are designed to be accessed by multiple readers and one writer, concurrently. + +Readers (like this Netdata plugin), open the journal files and `libsystemd`, behind the scenes, maps regions +of the files into memory, to satisfy each query. + +On logs aggregation servers, the performance of the queries depends on the following factors: + +1. The **number of files** involved in each query. + + This is why we suggest selecting a source when possible. + +2. The **speed of the disks** hosting the journal files. + + Journal files perform a lot of reading while querying, so the faster the disks, the faster the query will finish. + +3. The **memory available** for caching parts of the files. + + Increased memory will help the kernel cache the most frequently used parts of the journal files, avoiding disk I/O + and speeding up queries. + +4. The **number of filters** applied. + + Queries are significantly faster when just a few filters are selected. + +In general, for a faster experience, **keep a low number of rows within the visible timeframe**.
+ +Even on long timeframes, selecting a couple of filters that will result in a **few dozen thousand** log entries +will provide fast / rapid responses, usually less than a second. To the contrary, viewing timeframes with **millions +of entries** may result in longer delays. + +The plugin aborts journal queries when your browser cancels inflight requests. This allows you to work on the UI +while there are background queries running. + +At the time of this writing, this Netdata plugin is about 25-30 times faster than `journalctl` on queries that access +multiple journal files, over long time-frames. + +During the development of this plugin, we submitted, to `systemd`, a number of patches to improve `journalctl` +performance by a factor of 14: + +- https://github.com/systemd/systemd/pull/29365 +- https://github.com/systemd/systemd/pull/29366 +- https://github.com/systemd/systemd/pull/29261 + +However, even after these patches are merged, `journalctl` will still be 2x slower than this Netdata plugin, +on multi-journal queries. + +The problem lies in the way `libsystemd` handles multi-journal file queries. To overcome this problem, +the Netdata plugin queries each file individually and then merges the results to be returned. +This is transparent, thanks to the `facets` library in `libnetdata` that handles on-the-fly indexing, filtering, +and searching of any dataset, independently of its source. + +## Configuration and maintenance + +This Netdata plugin does not require any configuration or maintenance. + +## FAQ + +### Can I use this plugin on journals' centralization servers? + +Yes. You can centralize your logs using `systemd-journal-remote`, and then install Netdata +on this logs centralization server to explore the logs of all your infrastructure. + +This plugin will automatically provide multi-node views of your logs and also give you the ability to combine the logs +of multiple servers, as you see fit.
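The per-file strategy described under Query performance (query each journal file individually, then merge the results) is also what makes combined multi-server views possible. The idea can be sketched in Python as a k-way merge of per-file results that are already sorted. This is only an illustration: the tuple layout and the function name are hypothetical, and the plugin itself relies on the `facets` library rather than on code like this.

```python
import heapq
from itertools import islice


def merge_newest_first(per_file_results, limit):
    """Merge per-file query results into one newest-first list.

    per_file_results: list of lists of (timestamp_usec, message) tuples,
    each inner list already sorted newest first (hypothetical layout).
    limit: maximum number of merged entries to return.
    """
    # heapq.merge streams the inputs lazily, so only `limit` entries are
    # ever materialized, regardless of how large each per-file result is.
    merged = heapq.merge(*per_file_results, key=lambda e: e[0], reverse=True)
    return list(islice(merged, limit))
```

Because each file is read independently, a slow or very large file does not change how the other files are queried; only the final merge sees all of them.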
+ +Check [configuring a logs centralization server](#configuring-a-journals-centralization-server). + +### Can I use this plugin from a parent Netdata? + +Yes. When your nodes are connected to a Netdata parent, all their functions are available +via the parent's UI. So, from the parent UI, you can access the functions of all your nodes. + +Keep in mind that to protect your privacy, in order to access Netdata functions, you need a +free Netdata Cloud account. + +### Is any of my data exposed to Netdata Cloud from this plugin? + +No. When you access the agent directly, none of your data passes through Netdata Cloud. +You need a free Netdata Cloud account only to verify your identity and enable the use of +Netdata Functions. Once this is done, all the data flow directly from your Netdata agent +to your web browser. + +Also check [this discussion](https://github.com/netdata/netdata/discussions/16136). + +When you access Netdata via `https://app.netdata.cloud`, your data travel via Netdata Cloud, +but they are not stored in Netdata Cloud. This is to allow you access your Netdata agents from +anywhere. All communication from/to Netdata Cloud is encrypted. + +### What are `volatile` and `persistent` journals? + +`systemd` `journald` allows creating both `volatile` journals in a `tmpfs` ram drive, +and `persistent` journals stored on disk. + +`volatile` journals are particularly useful when the system monitored is sensitive to +disk I/O, or does not have any writable disks at all. + +For more information check `man systemd-journald`. + +### I centralize my logs with Loki. Why to use Netdata for my journals? + +`systemd` journals have almost infinite cardinality at their labels and all of them are indexed, +even if every single message has unique fields and values. 
+ +When you send `systemd` journal logs to Loki, even if you use the `relabel_rules` argument to +`loki.source.journal` with a JSON format, you need to specify which of the fields from journald +you want inherited by Loki. This means you need to know the most important fields beforehand. +At the same time you lose all the flexibility `systemd` journal provides: +**indexing on all fields and all their values**. + +Loki generally assumes that all logs are like a table. All entries in a stream share the same +fields. But journald does exactly the opposite. Each log entry is unique and may have its own unique fields. + +So, Loki and `systemd-journal` are good for different use cases. + +`systemd-journal` already runs in your systems. You use it today. It is there inside all your systems +collecting the system and applications logs. And for its use case, it has advantages over other +centralization solutions. So, why not use it? + +### Is it worth to build a `systemd` logs centralization server? + +Yes. It is simple, fast and the software to do it is already in your systems. + +For application and system logs, `systemd` journal is ideal and the visibility you can get +by centralizing your system logs and the use of this Netdata plugin, is unparalleled. + +### How do I configure a journals' centralization server? + +A short summary to get journal server running can be found below. +There are two strategies you can apply, when it comes down to a centralized server for `systemd` journal logs. + +1. _Active sources_, where the centralized server fetches the logs from each individual server +2. _Passive sources_, where the centralized server accepts a log stream from an individual server. + +For more options and reference to documentation, check `man systemd-journal-remote` and `man systemd-journal-upload`. + +#### _passive_ journals' centralization without encryption + +> ℹ️ _passive_ is a journal server that waits for clients to push their logs to it.
+ +> ⚠️ **IMPORTANT** +> These instructions will copy your logs to a central server, without any encryption or authorization. +> DO NOT USE THIS ON NON-TRUSTED NETWORKS. + +##### _passive_ server, without encryption + +On the centralization server install `systemd-journal-remote`: + +```sh +# change this according to your distro +sudo apt-get install systemd-journal-remote +``` + +Make sure the journal transfer protocol is `http`: + +```sh +sudo cp /lib/systemd/system/systemd-journal-remote.service /etc/systemd/system/ + +# edit it to make sure it says: +# --listen-http=-3 +# not: +# --listen-https=-3 +sudo nano /etc/systemd/system/systemd-journal-remote.service + +# reload systemd +sudo systemctl daemon-reload +``` + +Optionally, if you want to change the port (the default is `19532`), edit `systemd-journal-remote.socket` + +```sh +# edit the socket file +sudo systemctl edit systemd-journal-remote.socket +``` + +and add the following lines into the instructed place, and choose your desired port; save and exit. + +```sh +[Socket] +ListenStream=<DESIRED_PORT> +``` + +Finally, enable it, so that it will start automatically upon receiving a connection: + +``` +# enable systemd-journal-remote +sudo systemctl enable --now systemd-journal-remote.socket +sudo systemctl enable systemd-journal-remote.service +``` + +`systemd-journal-remote` is now listening for incoming journals from remote hosts. 
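To confirm that the socket is really listening, you can look for the port in the output of `ss -ltn`. The following is a minimal sketch and not part of the plugin or of `systemd`: `port_is_listening` is a hypothetical helper name, and it assumes the usual `ss -ltn` layout, where the local address and port are in the fourth column.

```sh
# Hypothetical helper (not part of systemd or Netdata).
# Reads `ss -ltn`-style output on stdin and succeeds if any
# local address (4th column) ends with the given port.
port_is_listening() {
    awk -v port="$1" '
        NR > 1 { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the centralization server (19532 is the default port):
#   ss -ltn | port_is_listening 19532 && echo "journal-remote socket is listening"
```

The helper only parses text, so the same check works on saved output from another host. If you changed `ListenStream=`, pass your own port instead of `19532`.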
+ +##### _passive_ client, without encryption + +On the clients, install `systemd-journal-remote`: + +```sh +# change this according to your distro +sudo apt-get install systemd-journal-remote +``` + +Edit `/etc/systemd/journal-upload.conf` and set the IP address and the port of the server, like so: + +``` +[Upload] +URL=http://centralization.server.ip:19532 +``` + +Edit `systemd-journal-upload`, and add `Restart=always` to make sure the client will keep trying to push logs, even if the server is temporarily not there, like this: + +```sh +sudo systemctl edit systemd-journal-upload +``` + +At the top, add: + +``` +[Service] +Restart=always +``` + +Enable and start `systemd-journal-upload`, like this: + +```sh +sudo systemctl enable systemd-journal-upload +sudo systemctl start systemd-journal-upload +``` + +##### verify it works + +To verify the central server is receiving logs, run this on the central server: + +```sh +sudo ls -l /var/log/journal/remote/ +``` + +You should see new files from the client's IP. + +Also, `systemctl status systemd-journal-remote` should show something like this: + +``` +systemd-journal-remote.service - Journal Remote Sink Service + Loaded: loaded (/etc/systemd/system/systemd-journal-remote.service; indirect; preset: disabled) + Active: active (running) since Sun 2023-10-15 14:29:46 EEST; 2h 24min ago +TriggeredBy: ● systemd-journal-remote.socket + Docs: man:systemd-journal-remote(8) + man:journal-remote.conf(5) + Main PID: 2118153 (systemd-journal) + Status: "Processing requests..." + Tasks: 1 (limit: 154152) + Memory: 2.2M + CPU: 71ms + CGroup: /system.slice/systemd-journal-remote.service + └─2118153 /usr/lib/systemd/systemd-journal-remote --listen-http=-3 --output=/var/log/journal/remote/ +``` + +Note the `status: "Processing requests..."` and the PID under `CGroup`. 
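The server-side check above can also be scripted. This is a minimal sketch, assuming the received journals are regular files ending in `.journal` under `/var/log/journal/remote/`; `count_journal_files` is a hypothetical helper name, not a `systemd` tool.

```sh
# Hypothetical helper: prints how many *.journal files exist under a directory.
# Defaults to the directory used by systemd-journal-remote in this guide.
count_journal_files() {
    dir="${1:-/var/log/journal/remote}"
    # errors (for example a missing directory) are ignored and count as 0 files
    find "$dir" -type f -name '*.journal' 2>/dev/null | wc -l | tr -d ' '
}

# Usage on the centralization server (may need root to read the directory):
#   count_journal_files /var/log/journal/remote
```

A count that stays at `0` after a client has been started suggests the client is not reaching the server yet.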
+ +On the client, `systemctl status systemd-journal-upload` should show something like this: + +``` +● systemd-journal-upload.service - Journal Remote Upload Service + Loaded: loaded (/lib/systemd/system/systemd-journal-upload.service; enabled; vendor preset: disabled) + Drop-In: /etc/systemd/system/systemd-journal-upload.service.d + └─override.conf + Active: active (running) since Sun 2023-10-15 10:39:04 UTC; 3h 17min ago + Docs: man:systemd-journal-upload(8) + Main PID: 4169 (systemd-journal) + Status: "Processing input..." + Tasks: 1 (limit: 13868) + Memory: 3.5M + CPU: 1.081s + CGroup: /system.slice/systemd-journal-upload.service + └─4169 /lib/systemd/systemd-journal-upload --save-state +``` + +Note the `Status: "Processing input..."` and the PID under `CGroup`. + +#### _passive_ journals' centralization with encryption using self-signed certificates + +> ℹ️ _passive_ is a journal server that waits for clients to push their logs to it. + +##### _passive_ server, with encryption and self-signed certificates + +On the centralization server, install `systemd-journal-remote` and `openssl`: + +```sh +# change this according to your distro +sudo apt-get install systemd-journal-remote openssl +``` + +Make sure the journal transfer protocol is `https`: + +```sh +sudo cp /lib/systemd/system/systemd-journal-remote.service /etc/systemd/system/ + +# edit it to make sure it says: +# --listen-https=-3 +# not: +# --listen-http=-3 +sudo nano /etc/systemd/system/systemd-journal-remote.service + +# reload systemd +sudo systemctl daemon-reload +``` + +Optionally, if you want to change the port (the default is `19532`), edit `systemd-journal-remote.socket` + +```sh +# edit the socket file +sudo systemctl edit systemd-journal-remote.socket +``` + +and add the following lines into the instructed place, and choose your desired port; save and exit.
+ +```sh +[Socket] +ListenStream=<DESIRED_PORT> +``` + +Finally, enable it, so that it will start automatically upon receiving a connection: + +```sh +# enable systemd-journal-remote +sudo systemctl enable --now systemd-journal-remote.socket +sudo systemctl enable systemd-journal-remote.service +``` + +`systemd-journal-remote` is now listening for incoming journals from remote hosts. + +Use [this script](https://gist.github.com/ktsaou/d62b8a6501cf9a0da94f03cbbb71c5c7) to create a self-signed certificates authority and certificates for all your servers. + +```sh +wget -O systemd-journal-self-signed-certs.sh "https://gist.githubusercontent.com/ktsaou/d62b8a6501cf9a0da94f03cbbb71c5c7/raw/c346e61e0a66f45dc4095d254bd23917f0a01bd0/systemd-journal-self-signed-certs.sh" +chmod 755 systemd-journal-self-signed-certs.sh +``` + +Edit the script and at its top, set your settings: + +```sh +# The directory to save the generated certificates (and everything about this certificate authority). +# This is only used on the node generating the certificates (usually on the journals server). +DIR="/etc/ssl/systemd-journal-remote" + +# The journals centralization server name (the CN of the server certificate). +SERVER="server-hostname" + +# All the DNS names or IPs this server is reachable at (the certificate will include them). +# Journal clients can use any of them to connect to this server. +# systemd-journal-upload validates its URL= hostname, against this list. +SERVER_ALIASES=("DNS:server-hostname1" "DNS:server-hostname2" "IP:1.2.3.4" "IP:10.1.1.1" "IP:172.16.1.1") + +# All the names of the journal clients that will be sending logs to the server (the CNs of their certificates). +# These names are used by systemd-journal-remote to name the journal files in /var/log/journal/remote/. +# Also the remote hosts will be presented using these names on Netdata dashboards. 
+CLIENTS=("vm1" "vm2" "vm3" "add_as_many_as_needed") +``` + +Then run the script: + +```sh +sudo ./systemd-journal-self-signed-certs.sh +``` + +The script will create the directory `/etc/ssl/systemd-journal-remote` and in it you will find all the certificates needed. + +There will also be files named `runme-on-XXX.sh`. There will be 1 script for the server and 1 script for each of the clients. You can copy and paste (or `scp`) these scripts to your server and each of your clients and run them as root: + +```sh +scp /etc/ssl/systemd-journal-remote/runme-on-XXX.sh XXX:/tmp/ +``` + +Once the above is done, `ssh` to each server/client and do: + +```sh +sudo bash /tmp/runme-on-XXX.sh +``` + +The scripts install the needed certificates, fix their file permissions to be accessible by systemd-journal-remote/upload, change `/etc/systemd/journal-remote.conf` (on the server) or `/etc/systemd/journal-upload.conf` (on the clients), and restart the relevant services. + + +##### _passive_ client, with encryption and self-signed certificates + +On the clients, install `systemd-journal-remote`: + +```sh +# change this according to your distro +sudo apt-get install systemd-journal-remote +``` + +Edit `/etc/systemd/journal-upload.conf` and set the IP address and the port of the server, like so: + +``` +[Upload] +URL=https://centralization.server.ip:19532 +``` + +Make sure that `centralization.server.ip` is one of the `SERVER_ALIASES` you set when you created the certificates.
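Before pointing a client at the server, it can help to check that the host used in `URL=` is one of the aliases given to the certificate script. The sketch below is only an illustration: `alias_listed` is a hypothetical helper name, and it assumes the aliases use the `DNS:name` / `IP:address` form shown in the script settings.

```sh
# Hypothetical helper: succeeds if HOST equals the value part of any
# "DNS:..." or "IP:..." entry passed after it.
alias_listed() {
    host="$1"
    shift
    for entry in "$@"; do
        # strip the "DNS:" / "IP:" type prefix before comparing
        [ "${entry#*:}" = "$host" ] && return 0
    done
    return 1
}

# Usage (bash, with the SERVER_ALIASES array from the script settings):
#   alias_listed "centralization.server.ip" "${SERVER_ALIASES[@]}" || echo "not covered by the certificate"
```

This compares against the list you configured, not against the certificate itself, so it only catches mismatches between `URL=` and your settings.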
+ +Edit `systemd-journal-upload`, and add `Restart=always` to make sure the client will keep trying to push logs, even if the server is temporarily not there, like this: + +```sh +sudo systemctl edit systemd-journal-upload +``` + +At the top, add: + +``` +[Service] +Restart=always +``` + +Enable and start `systemd-journal-upload`, like this: + +```sh +sudo systemctl enable systemd-journal-upload +``` + +Copy the relevant `runme-on-XXX.sh` script as described on server setup and run it: + +```sh +sudo bash /tmp/runme-on-XXX.sh +``` + + +#### Limitations when using a logs centralization server + +As of this writing `namespaces` support by `systemd` is limited: + +- Docker containers cannot log to namespaces. Check [this issue](https://github.com/moby/moby/issues/41879). +- `systemd-journal-upload` automatically uploads `system` and `user` journals, but not `namespaces` journals. For this + you need to spawn a `systemd-journal-upload` per namespace. + diff --git a/collectors/systemd-journal.plugin/systemd-journal.c b/collectors/systemd-journal.plugin/systemd-journal.c index 304ff244a..c2bd98e7d 100644 --- a/collectors/systemd-journal.plugin/systemd-journal.c +++ b/collectors/systemd-journal.plugin/systemd-journal.c @@ -5,29 +5,110 @@ * GPL v3+ */ -// TODO - 1) MARKDOC - #include "collectors/all.h" #include "libnetdata/libnetdata.h" #include "libnetdata/required_dummies.h" -#ifndef SD_JOURNAL_ALL_NAMESPACES -#define JOURNAL_NAMESPACE SD_JOURNAL_LOCAL_ONLY -#else -#define JOURNAL_NAMESPACE SD_JOURNAL_ALL_NAMESPACES -#endif - +#include <linux/capability.h> #include <systemd/sd-journal.h> #include <syslog.h> +/* + * TODO + * + * _UDEV_DEVLINK is frequently set more than once per field - support multi-value faces + * + */ + + +// ---------------------------------------------------------------------------- +// fstat64 overloading to speed up libsystemd +// https://github.com/systemd/systemd/pull/29261 + +#define ND_SD_JOURNAL_OPEN_FLAGS (0) + +#include <dlfcn.h> +#include 
<sys/stat.h> + +#define FSTAT_CACHE_MAX 1024 +struct fdstat64_cache_entry { + bool enabled; + bool updated; + int err_no; + struct stat64 stat; + int ret; + size_t cached_count; + size_t session; +}; + +struct fdstat64_cache_entry fstat64_cache[FSTAT_CACHE_MAX] = {0 }; +static __thread size_t fstat_thread_calls = 0; +static __thread size_t fstat_thread_cached_responses = 0; +static __thread bool enable_thread_fstat = false; +static __thread size_t fstat_caching_thread_session = 0; +static size_t fstat_caching_global_session = 0; + +static void fstat_cache_enable_on_thread(void) { + fstat_caching_thread_session = __atomic_add_fetch(&fstat_caching_global_session, 1, __ATOMIC_ACQUIRE); + enable_thread_fstat = true; +} + +static void fstat_cache_disable_on_thread(void) { + fstat_caching_thread_session = __atomic_add_fetch(&fstat_caching_global_session, 1, __ATOMIC_RELEASE); + enable_thread_fstat = false; +} + +int fstat64(int fd, struct stat64 *buf) { + static int (*real_fstat)(int, struct stat64 *) = NULL; + if (!real_fstat) + real_fstat = dlsym(RTLD_NEXT, "fstat64"); + + fstat_thread_calls++; + + if(fd >= 0 && fd < FSTAT_CACHE_MAX) { + if(enable_thread_fstat && fstat64_cache[fd].session != fstat_caching_thread_session) { + fstat64_cache[fd].session = fstat_caching_thread_session; + fstat64_cache[fd].enabled = true; + fstat64_cache[fd].updated = false; + } + + if(fstat64_cache[fd].enabled && fstat64_cache[fd].updated && fstat64_cache[fd].session == fstat_caching_thread_session) { + fstat_thread_cached_responses++; + errno = fstat64_cache[fd].err_no; + *buf = fstat64_cache[fd].stat; + fstat64_cache[fd].cached_count++; + return fstat64_cache[fd].ret; + } + } + + int ret = real_fstat(fd, buf); + + if(fd >= 0 && fd < FSTAT_CACHE_MAX && fstat64_cache[fd].enabled) { + fstat64_cache[fd].ret = ret; + fstat64_cache[fd].updated = true; + fstat64_cache[fd].err_no = errno; + fstat64_cache[fd].stat = *buf; + fstat64_cache[fd].session = fstat_caching_thread_session; + } + + return 
ret; +} + +// ---------------------------------------------------------------------------- + #define FACET_MAX_VALUE_LENGTH 8192 +#define SYSTEMD_JOURNAL_MAX_SOURCE_LEN 64 #define SYSTEMD_JOURNAL_FUNCTION_DESCRIPTION "View, search and analyze systemd journal entries." #define SYSTEMD_JOURNAL_FUNCTION_NAME "systemd-journal" -#define SYSTEMD_JOURNAL_DEFAULT_TIMEOUT 30 +#define SYSTEMD_JOURNAL_DEFAULT_TIMEOUT 60 #define SYSTEMD_JOURNAL_MAX_PARAMS 100 -#define SYSTEMD_JOURNAL_DEFAULT_QUERY_DURATION (3 * 3600) +#define SYSTEMD_JOURNAL_DEFAULT_QUERY_DURATION (1 * 3600) #define SYSTEMD_JOURNAL_DEFAULT_ITEMS_PER_QUERY 200 +#define SYSTEMD_JOURNAL_WORKER_THREADS 5 + +#define JOURNAL_VS_REALTIME_DELTA_DEFAULT_UT (5 * USEC_PER_SEC) // assume always 5 seconds latency +#define JOURNAL_VS_REALTIME_DELTA_MAX_UT (2 * 60 * USEC_PER_SEC) // up to 2 minutes latency #define JOURNAL_PARAMETER_HELP "help" #define JOURNAL_PARAMETER_AFTER "after" @@ -35,147 +116,1540 @@ #define JOURNAL_PARAMETER_ANCHOR "anchor" #define JOURNAL_PARAMETER_LAST "last" #define JOURNAL_PARAMETER_QUERY "query" +#define JOURNAL_PARAMETER_FACETS "facets" +#define JOURNAL_PARAMETER_HISTOGRAM "histogram" +#define JOURNAL_PARAMETER_DIRECTION "direction" +#define JOURNAL_PARAMETER_IF_MODIFIED_SINCE "if_modified_since" +#define JOURNAL_PARAMETER_DATA_ONLY "data_only" +#define JOURNAL_PARAMETER_SOURCE "source" +#define JOURNAL_PARAMETER_INFO "info" +#define JOURNAL_PARAMETER_ID "id" +#define JOURNAL_PARAMETER_PROGRESS "progress" +#define JOURNAL_PARAMETER_SLICE "slice" +#define JOURNAL_PARAMETER_DELTA "delta" +#define JOURNAL_PARAMETER_TAIL "tail" + +#define JOURNAL_KEY_ND_JOURNAL_FILE "ND_JOURNAL_FILE" +#define JOURNAL_KEY_ND_JOURNAL_PROCESS "ND_JOURNAL_PROCESS" + +#define JOURNAL_DEFAULT_SLICE_MODE true +#define JOURNAL_DEFAULT_DIRECTION FACETS_ANCHOR_DIRECTION_BACKWARD #define SYSTEMD_ALWAYS_VISIBLE_KEYS NULL -#define SYSTEMD_KEYS_EXCLUDED_FROM_FACETS NULL + +#define SYSTEMD_KEYS_EXCLUDED_FROM_FACETS \ + "*MESSAGE*" 
\ + "|*_RAW" \ + "|*_USEC" \ + "|*_NSEC" \ + "|*TIMESTAMP*" \ + "|*_ID" \ + "|*_ID_*" \ + "|__*" \ + "" + #define SYSTEMD_KEYS_INCLUDED_IN_FACETS \ - "_TRANSPORT" \ - "|SYSLOG_IDENTIFIER" \ - "|SYSLOG_FACILITY" \ + \ + /* --- USER JOURNAL FIELDS --- */ \ + \ + /* "|MESSAGE" */ \ + /* "|MESSAGE_ID" */ \ "|PRIORITY" \ - "|_HOSTNAME" \ - "|_RUNTIME_SCOPE" \ - "|_PID" \ + "|CODE_FILE" \ + /* "|CODE_LINE" */ \ + "|CODE_FUNC" \ + "|ERRNO" \ + /* "|INVOCATION_ID" */ \ + /* "|USER_INVOCATION_ID" */ \ + "|SYSLOG_FACILITY" \ + "|SYSLOG_IDENTIFIER" \ + /* "|SYSLOG_PID" */ \ + /* "|SYSLOG_TIMESTAMP" */ \ + /* "|SYSLOG_RAW" */ \ + /* "!DOCUMENTATION" */ \ + /* "|TID" */ \ + "|UNIT" \ + "|USER_UNIT" \ + "|UNIT_RESULT" /* undocumented */ \ + \ + \ + /* --- TRUSTED JOURNAL FIELDS --- */ \ + \ + /* "|_PID" */ \ "|_UID" \ "|_GID" \ - "|_SYSTEMD_UNIT" \ - "|_SYSTEMD_SLICE" \ - "|_SYSTEMD_USER_SLICE" \ "|_COMM" \ "|_EXE" \ + /* "|_CMDLINE" */ \ + "|_CAP_EFFECTIVE" \ + /* "|_AUDIT_SESSION" */ \ + "|_AUDIT_LOGINUID" \ "|_SYSTEMD_CGROUP" \ + "|_SYSTEMD_SLICE" \ + "|_SYSTEMD_UNIT" \ "|_SYSTEMD_USER_UNIT" \ - "|USER_UNIT" \ - "|UNIT" \ + "|_SYSTEMD_USER_SLICE" \ + "|_SYSTEMD_SESSION" \ + "|_SYSTEMD_OWNER_UID" \ + "|_SELINUX_CONTEXT" \ + /* "|_SOURCE_REALTIME_TIMESTAMP" */ \ + "|_BOOT_ID" \ + "|_MACHINE_ID" \ + /* "|_SYSTEMD_INVOCATION_ID" */ \ + "|_HOSTNAME" \ + "|_TRANSPORT" \ + "|_STREAM_ID" \ + /* "|LINE_BREAK" */ \ + "|_NAMESPACE" \ + "|_RUNTIME_SCOPE" \ + \ + \ + /* --- KERNEL JOURNAL FIELDS --- */ \ + \ + /* "|_KERNEL_DEVICE" */ \ + "|_KERNEL_SUBSYSTEM" \ + /* "|_UDEV_SYSNAME" */ \ + "|_UDEV_DEVNODE" \ + /* "|_UDEV_DEVLINK" */ \ + \ + \ + /* --- LOGGING ON BEHALF --- */ \ + \ + "|OBJECT_UID" \ + "|OBJECT_GID" \ + "|OBJECT_COMM" \ + "|OBJECT_EXE" \ + /* "|OBJECT_CMDLINE" */ \ + /* "|OBJECT_AUDIT_SESSION" */ \ + "|OBJECT_AUDIT_LOGINUID" \ + "|OBJECT_SYSTEMD_CGROUP" \ + "|OBJECT_SYSTEMD_SESSION" \ + "|OBJECT_SYSTEMD_OWNER_UID" \ + "|OBJECT_SYSTEMD_UNIT" \ + "|OBJECT_SYSTEMD_USER_UNIT" \ 
+ \ + \ + /* --- CORE DUMPS --- */ \ + \ + "|COREDUMP_COMM" \ + "|COREDUMP_UNIT" \ + "|COREDUMP_USER_UNIT" \ + "|COREDUMP_SIGNAL_NAME" \ + "|COREDUMP_CGROUP" \ + \ + \ + /* --- DOCKER --- */ \ + \ + "|CONTAINER_ID" \ + /* "|CONTAINER_ID_FULL" */ \ + "|CONTAINER_NAME" \ + "|CONTAINER_TAG" \ + "|IMAGE_NAME" /* undocumented */ \ + /* "|CONTAINER_PARTIAL_MESSAGE" */ \ + \ "" -static netdata_mutex_t mutex = NETDATA_MUTEX_INITIALIZER; +static netdata_mutex_t stdout_mutex = NETDATA_MUTEX_INITIALIZER; static bool plugin_should_exit = false; -DICTIONARY *uids = NULL; -DICTIONARY *gids = NULL; +// ---------------------------------------------------------------------------- +typedef enum { + ND_SD_JOURNAL_NO_FILE_MATCHED, + ND_SD_JOURNAL_FAILED_TO_OPEN, + ND_SD_JOURNAL_FAILED_TO_SEEK, + ND_SD_JOURNAL_TIMED_OUT, + ND_SD_JOURNAL_OK, + ND_SD_JOURNAL_NOT_MODIFIED, + ND_SD_JOURNAL_CANCELLED, +} ND_SD_JOURNAL_STATUS; + +typedef enum { + SDJF_ALL = 0, + SDJF_LOCAL = (1 << 0), + SDJF_REMOTE = (1 << 1), + SDJF_SYSTEM = (1 << 2), + SDJF_USER = (1 << 3), + SDJF_NAMESPACE = (1 << 4), + SDJF_OTHER = (1 << 5), +} SD_JOURNAL_FILE_SOURCE_TYPE; + +typedef struct function_query_status { + bool *cancelled; // a pointer to the cancelling boolean + usec_t stop_monotonic_ut; + + usec_t started_monotonic_ut; + + // request + SD_JOURNAL_FILE_SOURCE_TYPE source_type; + STRING *source; + usec_t after_ut; + usec_t before_ut; + + struct { + usec_t start_ut; + usec_t stop_ut; + } anchor; + + FACETS_ANCHOR_DIRECTION direction; + size_t entries; + usec_t if_modified_since; + bool delta; + bool tail; + bool data_only; + bool slice; + size_t filters; + usec_t last_modified; + const char *query; + const char *histogram; + + // per file progress info + size_t cached_count; + + // progress statistics + usec_t matches_setup_ut; + size_t rows_useful; + size_t rows_read; + size_t bytes_read; + size_t files_matched; + size_t file_working; +} FUNCTION_QUERY_STATUS; + +struct journal_file { + const char *filename; + 
size_t filename_len; + STRING *source; + SD_JOURNAL_FILE_SOURCE_TYPE source_type; + usec_t file_last_modified_ut; + usec_t msg_first_ut; + usec_t msg_last_ut; + usec_t last_scan_ut; + size_t size; + bool logged_failure; + usec_t max_journal_vs_realtime_delta_ut; +}; + +static void log_fqs(FUNCTION_QUERY_STATUS *fqs, const char *msg) { + netdata_log_error("ERROR: %s, on query " + "timeframe [%"PRIu64" - %"PRIu64"], " + "anchor [%"PRIu64" - %"PRIu64"], " + "if_modified_since %"PRIu64", " + "data_only:%s, delta:%s, tail:%s, direction:%s" + , msg + , fqs->after_ut, fqs->before_ut + , fqs->anchor.start_ut, fqs->anchor.stop_ut + , fqs->if_modified_since + , fqs->data_only ? "true" : "false" + , fqs->delta ? "true" : "false" + , fqs->tail ? "tail" : "false" + , fqs->direction == FACETS_ANCHOR_DIRECTION_FORWARD ? "forward" : "backward"); +} -// ---------------------------------------------------------------------------- +static inline bool netdata_systemd_journal_seek_to(sd_journal *j, usec_t timestamp) { + if(sd_journal_seek_realtime_usec(j, timestamp) < 0) { + netdata_log_error("SYSTEMD-JOURNAL: Failed to seek to %" PRIu64, timestamp); + if(sd_journal_seek_tail(j) < 0) { + netdata_log_error("SYSTEMD-JOURNAL: Failed to seek to journal's tail"); + return false; + } + } + + return true; +} + +#define JD_SOURCE_REALTIME_TIMESTAMP "_SOURCE_REALTIME_TIMESTAMP" -int systemd_journal_query(BUFFER *wb, FACETS *facets, usec_t after_ut, usec_t before_ut, usec_t stop_monotonic_ut) { - sd_journal *j; - int r; +static inline bool parse_journal_field(const char *data, size_t data_length, const char **key, size_t *key_length, const char **value, size_t *value_length) { + const char *k = data; + const char *equal = strchr(k, '='); + if(unlikely(!equal)) + return false; - // Open the system journal for reading - r = sd_journal_open(&j, JOURNAL_NAMESPACE); - if (r < 0) - return HTTP_RESP_INTERNAL_SERVER_ERROR; + size_t kl = equal - k; + + const char *v = ++equal; + size_t vl = data_length - 
kl - 1; + + *key = k; + *key_length = kl; + *value = v; + *value_length = vl; + + return true; +} + +static inline size_t netdata_systemd_journal_process_row(sd_journal *j, FACETS *facets, struct journal_file *jf, usec_t *msg_ut) { + const void *data; + size_t length, bytes = 0; + + facets_add_key_value_length(facets, JOURNAL_KEY_ND_JOURNAL_FILE, sizeof(JOURNAL_KEY_ND_JOURNAL_FILE) - 1, jf->filename, jf->filename_len); + + SD_JOURNAL_FOREACH_DATA(j, data, length) { + const char *key, *value; + size_t key_length, value_length; + + if(!parse_journal_field(data, length, &key, &key_length, &value, &value_length)) + continue; + +#ifdef NETDATA_INTERNAL_CHECKS + usec_t origin_journal_ut = *msg_ut; +#endif + if(unlikely(key_length == sizeof(JD_SOURCE_REALTIME_TIMESTAMP) - 1 && + memcmp(key, JD_SOURCE_REALTIME_TIMESTAMP, sizeof(JD_SOURCE_REALTIME_TIMESTAMP) - 1) == 0)) { + usec_t ut = str2ull(value, NULL); + if(ut && ut < *msg_ut) { + usec_t delta = *msg_ut - ut; + *msg_ut = ut; + + if(delta > JOURNAL_VS_REALTIME_DELTA_MAX_UT) + delta = JOURNAL_VS_REALTIME_DELTA_MAX_UT; + + // update max_journal_vs_realtime_delta_ut if the delta increased + usec_t expected = jf->max_journal_vs_realtime_delta_ut; + do { + if(delta <= expected) + break; + } while(!__atomic_compare_exchange_n(&jf->max_journal_vs_realtime_delta_ut, &expected, delta, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED)); + + internal_error(delta > expected, + "increased max_journal_vs_realtime_delta_ut from %"PRIu64" to %"PRIu64", " + "journal %"PRIu64", actual %"PRIu64" (delta %"PRIu64")" + , expected, delta, origin_journal_ut, *msg_ut, origin_journal_ut - (*msg_ut)); + } + } + + bytes += length; + facets_add_key_value_length(facets, key, key_length, value, value_length <= FACET_MAX_VALUE_LENGTH ? 
value_length : FACET_MAX_VALUE_LENGTH); + } + + return bytes; +} + +#define FUNCTION_PROGRESS_UPDATE_ROWS(rows_read, rows) __atomic_fetch_add(&(rows_read), rows, __ATOMIC_RELAXED) +#define FUNCTION_PROGRESS_UPDATE_BYTES(bytes_read, bytes) __atomic_fetch_add(&(bytes_read), bytes, __ATOMIC_RELAXED) +#define FUNCTION_PROGRESS_EVERY_ROWS (1ULL << 13) +#define FUNCTION_DATA_ONLY_CHECK_EVERY_ROWS (1ULL << 7) + +static inline ND_SD_JOURNAL_STATUS check_stop(const bool *cancelled, const usec_t *stop_monotonic_ut) { + if(cancelled && __atomic_load_n(cancelled, __ATOMIC_RELAXED)) { + internal_error(true, "Function has been cancelled"); + return ND_SD_JOURNAL_CANCELLED; + } + + if(now_monotonic_usec() > __atomic_load_n(stop_monotonic_ut, __ATOMIC_RELAXED)) { + internal_error(true, "Function timed out"); + return ND_SD_JOURNAL_TIMED_OUT; + } + + return ND_SD_JOURNAL_OK; +} + +ND_SD_JOURNAL_STATUS netdata_systemd_journal_query_backward( + sd_journal *j, BUFFER *wb __maybe_unused, FACETS *facets, + struct journal_file *jf, FUNCTION_QUERY_STATUS *fqs) { + + usec_t anchor_delta = __atomic_load_n(&jf->max_journal_vs_realtime_delta_ut, __ATOMIC_RELAXED); + + usec_t start_ut = ((fqs->data_only && fqs->anchor.start_ut) ? fqs->anchor.start_ut : fqs->before_ut) + anchor_delta; + usec_t stop_ut = (fqs->data_only && fqs->anchor.stop_ut) ? 
fqs->anchor.stop_ut : fqs->after_ut; + bool stop_when_full = (fqs->data_only && !fqs->anchor.stop_ut); + + if(!netdata_systemd_journal_seek_to(j, start_ut)) + return ND_SD_JOURNAL_FAILED_TO_SEEK; + + size_t errors_no_timestamp = 0; + usec_t earliest_msg_ut = 0; + size_t row_counter = 0, last_row_counter = 0, rows_useful = 0; + size_t bytes = 0, last_bytes = 0; + + usec_t last_usec_from = 0; + usec_t last_usec_to = 0; + + ND_SD_JOURNAL_STATUS status = ND_SD_JOURNAL_OK; facets_rows_begin(facets); + while (status == ND_SD_JOURNAL_OK && sd_journal_previous(j) > 0) { + usec_t msg_ut = 0; + if(sd_journal_get_realtime_usec(j, &msg_ut) < 0 || !msg_ut) { + errors_no_timestamp++; + continue; + } + + if(unlikely(msg_ut > earliest_msg_ut)) + earliest_msg_ut = msg_ut; - bool timed_out = false; - size_t row_counter = 0; - sd_journal_seek_realtime_usec(j, before_ut); - SD_JOURNAL_FOREACH_BACKWARDS(j) { - row_counter++; + if (unlikely(msg_ut > start_ut)) + continue; + + if (unlikely(msg_ut < stop_ut)) + break; - uint64_t msg_ut; - sd_journal_get_realtime_usec(j, &msg_ut); - if (msg_ut < after_ut) + bytes += netdata_systemd_journal_process_row(j, facets, jf, &msg_ut); + + // make sure each line gets a unique timestamp + if(unlikely(msg_ut >= last_usec_from && msg_ut <= last_usec_to)) + msg_ut = --last_usec_from; + else + last_usec_from = last_usec_to = msg_ut; + + if(facets_row_finished(facets, msg_ut)) + rows_useful++; + + row_counter++; + if(unlikely((row_counter % FUNCTION_DATA_ONLY_CHECK_EVERY_ROWS) == 0 && + stop_when_full && + facets_rows(facets) >= fqs->entries)) { + // stop the data only query + usec_t oldest = facets_row_oldest_ut(facets); + if(oldest && msg_ut < (oldest - anchor_delta)) break; + } - const void *data; - size_t length; - SD_JOURNAL_FOREACH_DATA(j, data, length) { - const char *key = data; - const char *equal = strchr(key, '='); - if(unlikely(!equal)) - continue; + if(unlikely(row_counter % FUNCTION_PROGRESS_EVERY_ROWS == 0)) { + 
FUNCTION_PROGRESS_UPDATE_ROWS(fqs->rows_read, row_counter - last_row_counter); + last_row_counter = row_counter; - const char *value = ++equal; - size_t key_length = value - key; // including '\0' + FUNCTION_PROGRESS_UPDATE_BYTES(fqs->bytes_read, bytes - last_bytes); + last_bytes = bytes; - char key_copy[key_length]; - memcpy(key_copy, key, key_length - 1); - key_copy[key_length - 1] = '\0'; + status = check_stop(fqs->cancelled, &fqs->stop_monotonic_ut); + } + } - size_t value_length = length - key_length; // without '\0' - facets_add_key_value_length(facets, key_copy, value, value_length <= FACET_MAX_VALUE_LENGTH ? value_length : FACET_MAX_VALUE_LENGTH); - } + FUNCTION_PROGRESS_UPDATE_ROWS(fqs->rows_read, row_counter - last_row_counter); + FUNCTION_PROGRESS_UPDATE_BYTES(fqs->bytes_read, bytes - last_bytes); + + fqs->rows_useful += rows_useful; + + if(errors_no_timestamp) + netdata_log_error("SYSTEMD-JOURNAL: %zu lines did not have timestamps", errors_no_timestamp); + + if(earliest_msg_ut > fqs->last_modified) + fqs->last_modified = earliest_msg_ut; + + return status; +} + +ND_SD_JOURNAL_STATUS netdata_systemd_journal_query_forward( + sd_journal *j, BUFFER *wb __maybe_unused, FACETS *facets, + struct journal_file *jf, FUNCTION_QUERY_STATUS *fqs) { + + usec_t anchor_delta = __atomic_load_n(&jf->max_journal_vs_realtime_delta_ut, __ATOMIC_RELAXED); + + usec_t start_ut = (fqs->data_only && fqs->anchor.start_ut) ? fqs->anchor.start_ut : fqs->after_ut; + usec_t stop_ut = ((fqs->data_only && fqs->anchor.stop_ut) ? 
fqs->anchor.stop_ut : fqs->before_ut) + anchor_delta; + bool stop_when_full = (fqs->data_only && !fqs->anchor.stop_ut); + + if(!netdata_systemd_journal_seek_to(j, start_ut)) + return ND_SD_JOURNAL_FAILED_TO_SEEK; + + size_t errors_no_timestamp = 0; + usec_t earliest_msg_ut = 0; + size_t row_counter = 0, last_row_counter = 0, rows_useful = 0; + size_t bytes = 0, last_bytes = 0; + + usec_t last_usec_from = 0; + usec_t last_usec_to = 0; + + ND_SD_JOURNAL_STATUS status = ND_SD_JOURNAL_OK; + + facets_rows_begin(facets); + while (status == ND_SD_JOURNAL_OK && sd_journal_next(j) > 0) { + usec_t msg_ut = 0; + if(sd_journal_get_realtime_usec(j, &msg_ut) < 0 || !msg_ut) { + errors_no_timestamp++; + continue; + } + + if(likely(msg_ut > earliest_msg_ut)) + earliest_msg_ut = msg_ut; - facets_row_finished(facets, msg_ut); + if (unlikely(msg_ut < start_ut)) + continue; - if((row_counter % 100) == 0 && now_monotonic_usec() > stop_monotonic_ut) { - timed_out = true; + if (unlikely(msg_ut > stop_ut)) + break; + + bytes += netdata_systemd_journal_process_row(j, facets, jf, &msg_ut); + + // make sure each line gets a unique timestamp + if(unlikely(msg_ut >= last_usec_from && msg_ut <= last_usec_to)) + msg_ut = ++last_usec_to; + else + last_usec_from = last_usec_to = msg_ut; + + if(facets_row_finished(facets, msg_ut)) + rows_useful++; + + row_counter++; + if(unlikely((row_counter % FUNCTION_DATA_ONLY_CHECK_EVERY_ROWS) == 0 && + stop_when_full && + facets_rows(facets) >= fqs->entries)) { + // stop the data only query + usec_t newest = facets_row_newest_ut(facets); + if(newest && msg_ut > (newest + anchor_delta)) break; + } + + if(unlikely(row_counter % FUNCTION_PROGRESS_EVERY_ROWS == 0)) { + FUNCTION_PROGRESS_UPDATE_ROWS(fqs->rows_read, row_counter - last_row_counter); + last_row_counter = row_counter; + + FUNCTION_PROGRESS_UPDATE_BYTES(fqs->bytes_read, bytes - last_bytes); + last_bytes = bytes; + + status = check_stop(fqs->cancelled, &fqs->stop_monotonic_ut); + } + } + + 
FUNCTION_PROGRESS_UPDATE_ROWS(fqs->rows_read, row_counter - last_row_counter); + FUNCTION_PROGRESS_UPDATE_BYTES(fqs->bytes_read, bytes - last_bytes); + + fqs->rows_useful += rows_useful; + + if(errors_no_timestamp) + netdata_log_error("SYSTEMD-JOURNAL: %zu lines did not have timestamps", errors_no_timestamp); + + if(earliest_msg_ut > fqs->last_modified) + fqs->last_modified = earliest_msg_ut; + + return status; +} + +bool netdata_systemd_journal_check_if_modified_since(sd_journal *j, usec_t seek_to, usec_t last_modified) { + // return true, if data have been modified since the timestamp + + if(!last_modified || !seek_to) + return false; + + if(!netdata_systemd_journal_seek_to(j, seek_to)) + return false; + + usec_t first_msg_ut = 0; + while (sd_journal_previous(j) > 0) { + usec_t msg_ut; + if(sd_journal_get_realtime_usec(j, &msg_ut) < 0) + continue; + + first_msg_ut = msg_ut; + break; + } + + return first_msg_ut != last_modified; +} + +#ifdef HAVE_SD_JOURNAL_RESTART_FIELDS +static bool netdata_systemd_filtering_by_journal(sd_journal *j, FACETS *facets, FUNCTION_QUERY_STATUS *fqs) { + const char *field = NULL; + const void *data = NULL; + size_t data_length; + size_t added_keys = 0; + size_t failures = 0; + size_t filters_added = 0; + + SD_JOURNAL_FOREACH_FIELD(j, field) { + bool interesting; + + if(fqs->data_only) + interesting = facets_key_name_is_filter(facets, field); + else + interesting = facets_key_name_is_facet(facets, field); + + if(interesting) { + if(sd_journal_query_unique(j, field) >= 0) { + bool added_this_key = false; + size_t added_values = 0; + + SD_JOURNAL_FOREACH_UNIQUE(j, data, data_length) { + const char *key, *value; + size_t key_length, value_length; + + if(!parse_journal_field(data, data_length, &key, &key_length, &value, &value_length)) + continue; + + facets_add_possible_value_name_to_key(facets, key, key_length, value, value_length); + + if(!facets_key_name_value_length_is_selected(facets, key, key_length, value, value_length)) + continue; 
+ + if(added_keys && !added_this_key) { + if(sd_journal_add_conjunction(j) < 0) + failures++; + + added_this_key = true; + added_keys++; + } + else if(added_values) + if(sd_journal_add_disjunction(j) < 0) + failures++; + + if(sd_journal_add_match(j, data, data_length) < 0) + failures++; + + added_values++; + filters_added++; + } } } + } + + if(failures) { + log_fqs(fqs, "failed to setup journal filter, will run the full query."); + sd_journal_flush_matches(j); + return true; + } + + return filters_added ? true : false; +} +#endif // HAVE_SD_JOURNAL_RESTART_FIELDS + +static ND_SD_JOURNAL_STATUS netdata_systemd_journal_query_one_file( + const char *filename, BUFFER *wb, FACETS *facets, + struct journal_file *jf, FUNCTION_QUERY_STATUS *fqs) { + + sd_journal *j = NULL; + errno = 0; + + fstat_cache_enable_on_thread(); + + const char *paths[2] = { + [0] = filename, + [1] = NULL, + }; + + if(sd_journal_open_files(&j, paths, ND_SD_JOURNAL_OPEN_FLAGS) < 0 || !j) { + fstat_cache_disable_on_thread(); + return ND_SD_JOURNAL_FAILED_TO_OPEN; + } + + ND_SD_JOURNAL_STATUS status; + bool matches_filters = true; + +#ifdef HAVE_SD_JOURNAL_RESTART_FIELDS + if(fqs->slice) { + usec_t started = now_monotonic_usec(); + + matches_filters = netdata_systemd_filtering_by_journal(j, facets, fqs) || !fqs->filters; + usec_t ended = now_monotonic_usec(); + + fqs->matches_setup_ut += (ended - started); + } +#endif // HAVE_SD_JOURNAL_RESTART_FIELDS + + if(matches_filters) { + if(fqs->direction == FACETS_ANCHOR_DIRECTION_FORWARD) + status = netdata_systemd_journal_query_forward(j, wb, facets, jf, fqs); + else + status = netdata_systemd_journal_query_backward(j, wb, facets, jf, fqs); + } + else + status = ND_SD_JOURNAL_NO_FILE_MATCHED; + + sd_journal_close(j); + fstat_cache_disable_on_thread(); + + return status; +} + +// ---------------------------------------------------------------------------- +// journal files registry + +#define VAR_LOG_JOURNAL_MAX_DEPTH 10 +#define MAX_JOURNAL_DIRECTORIES 100 
+ +struct journal_directory { + char *path; + bool logged_failure; +}; + +static struct journal_directory journal_directories[MAX_JOURNAL_DIRECTORIES] = { 0 }; +static DICTIONARY *journal_files_registry = NULL; +static DICTIONARY *used_hashes_registry = NULL; + +static usec_t systemd_journal_session = 0; + +static void buffer_json_journal_versions(BUFFER *wb) { + buffer_json_member_add_object(wb, "versions"); + { + buffer_json_member_add_uint64(wb, "sources", + systemd_journal_session + dictionary_version(journal_files_registry)); + } + buffer_json_object_close(wb); +} + +static void journal_file_update_msg_ut(const char *filename, struct journal_file *jf) { + fstat_cache_enable_on_thread(); + + const char *files[2] = { + [0] = filename, + [1] = NULL, + }; + + sd_journal *j = NULL; + if(sd_journal_open_files(&j, files, ND_SD_JOURNAL_OPEN_FLAGS) < 0 || !j) { + fstat_cache_disable_on_thread(); + + if(!jf->logged_failure) { + netdata_log_error("cannot open journal file '%s', using file timestamps to understand time-frame.", filename); + jf->logged_failure = true; + } + + jf->msg_first_ut = 0; + jf->msg_last_ut = jf->file_last_modified_ut; + return; + } + + usec_t first_ut = 0, last_ut = 0; + + if(sd_journal_seek_head(j) < 0 || sd_journal_next(j) < 0 || sd_journal_get_realtime_usec(j, &first_ut) < 0 || !first_ut) { + internal_error(true, "cannot find the timestamp of the first message in '%s'", filename); + first_ut = 0; + } + + if(sd_journal_seek_tail(j) < 0 || sd_journal_previous(j) < 0 || sd_journal_get_realtime_usec(j, &last_ut) < 0 || !last_ut) { + internal_error(true, "cannot find the timestamp of the last message in '%s'", filename); + last_ut = jf->file_last_modified_ut; + } sd_journal_close(j); + fstat_cache_disable_on_thread(); + + if(first_ut > last_ut) { + internal_error(true, "timestamps are flipped in file '%s'", filename); + usec_t t = first_ut; + first_ut = last_ut; + last_ut = t; + } + + jf->msg_first_ut = first_ut; + jf->msg_last_ut = last_ut; +} + 
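Reviewer note: the fallback rules in `journal_file_update_msg_ut()` above can be restated as a small standalone sketch. It is not code from the patch. `struct msg_range` and `normalize_msg_range()` are hypothetical names introduced only for illustration, and a timestamp of 0 is assumed to mean "could not be read", which appears to be how the function treats it.

```c
#include <stdint.h>

typedef uint64_t usec_t;   /* assumption: microsecond timestamps, as in the patch */

/* Hypothetical container for the result; not a type from the patch. */
struct msg_range {
    usec_t first_ut;
    usec_t last_ut;
};

/*
 * Hypothetical helper restating the fallback rules:
 *  - a last timestamp of 0 means "unknown" and is replaced by the file mtime;
 *  - a first timestamp of 0 stays 0 ("unknown");
 *  - if the pair ends up reversed, the two values are swapped.
 */
static struct msg_range normalize_msg_range(usec_t first_ut, usec_t last_ut, usec_t file_mtime_ut) {
    if (!last_ut)
        last_ut = file_mtime_ut;

    if (first_ut > last_ut) {
        usec_t t = first_ut;
        first_ut = last_ut;
        last_ut = t;
    }

    struct msg_range r = { .first_ut = first_ut, .last_ut = last_ut };
    return r;
}
```

Calling `normalize_msg_range(0, 0, mtime)` corresponds to the case where the journal file cannot be opened at all: the first timestamp stays unknown and the last one falls back to the file modification time.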
+static STRING *string_strdupz_source(const char *s, const char *e, size_t max_len, const char *prefix) { + char buf[max_len]; + size_t len; + char *dst = buf; + + if(prefix) { + len = strlen(prefix); + memcpy(buf, prefix, len); + dst = &buf[len]; + max_len -= len; + } + + len = e - s; + if(len >= max_len) + len = max_len - 1; + memcpy(dst, s, len); + dst[len] = '\0'; + buf[max_len - 1] = '\0'; + + for(size_t i = 0; buf[i] ;i++) + if(!isalnum(buf[i]) && buf[i] != '-' && buf[i] != '.' && buf[i] != ':') + buf[i] = '_'; + + return string_strdupz(buf); +} + +static void files_registry_insert_cb(const DICTIONARY_ITEM *item, void *value, void *data __maybe_unused) { + struct journal_file *jf = value; + jf->filename = dictionary_acquired_item_name(item); + jf->filename_len = strlen(jf->filename); + + // based on the filename + // decide the source to show to the user + const char *s = strrchr(jf->filename, '/'); + if(s) { + if(strstr(jf->filename, "/remote/")) + jf->source_type = SDJF_REMOTE; + else { + const char *t = s - 1; + while(t >= jf->filename && *t != '.' && *t != '/') + t--; + + if(t >= jf->filename && *t == '.') { + jf->source_type = SDJF_NAMESPACE; + jf->source = string_strdupz_source(t + 1, s, SYSTEMD_JOURNAL_MAX_SOURCE_LEN, "namespace-"); + } + else + jf->source_type = SDJF_LOCAL; + } + + if(strncmp(s, "/system", 7) == 0) + jf->source_type |= SDJF_SYSTEM; + + else if(strncmp(s, "/user", 5) == 0) + jf->source_type |= SDJF_USER; + + else if(strncmp(s, "/remote-", 8) == 0) { + jf->source_type |= SDJF_REMOTE; + + s = &s[8]; // skip "/remote-" + + char *e = strchr(s, '@'); + if(!e) + e = strstr(s, ".journal"); + + if(e) { + const char *d = s; + for(; d < e && (isdigit(*d) || *d == '.' 
|| *d == ':') ; d++) ; + if(d == e) { + // a valid IP address + char ip[e - s + 1]; + memcpy(ip, s, e - s); + ip[e - s] = '\0'; + char buf[SYSTEMD_JOURNAL_MAX_SOURCE_LEN]; + if(ip_to_hostname(ip, buf, sizeof(buf))) + jf->source = string_strdupz_source(buf, &buf[strlen(buf)], SYSTEMD_JOURNAL_MAX_SOURCE_LEN, "remote-"); + else { + internal_error(true, "Cannot find the hostname for IP '%s'", ip); + jf->source = string_strdupz_source(s, e, SYSTEMD_JOURNAL_MAX_SOURCE_LEN, "remote-"); + } + } + else + jf->source = string_strdupz_source(s, e, SYSTEMD_JOURNAL_MAX_SOURCE_LEN, "remote-"); + } + else + jf->source_type |= SDJF_OTHER; + } + else + jf->source_type |= SDJF_OTHER; + } + else + jf->source_type = SDJF_LOCAL | SDJF_OTHER; + + journal_file_update_msg_ut(jf->filename, jf); + + internal_error(true, + "found journal file '%s', type %d, source '%s', " + "file modified: %"PRIu64", " + "msg {first: %"PRIu64", last: %"PRIu64"}", + jf->filename, jf->source_type, jf->source ? string2str(jf->source) : "<unset>", + jf->file_last_modified_ut, + jf->msg_first_ut, jf->msg_last_ut); +} + +static bool files_registry_conflict_cb(const DICTIONARY_ITEM *item, void *old_value, void *new_value, void *data __maybe_unused) { + struct journal_file *jf = old_value; + struct journal_file *njf = new_value; + + if(njf->last_scan_ut > jf->last_scan_ut) + jf->last_scan_ut = njf->last_scan_ut; + + if(njf->file_last_modified_ut > jf->file_last_modified_ut) { + jf->file_last_modified_ut = njf->file_last_modified_ut; + jf->size = njf->size; + + const char *filename = dictionary_acquired_item_name(item); + journal_file_update_msg_ut(filename, jf); + +// internal_error(true, +// "updated journal file '%s', type %d, " +// "file modified: %"PRIu64", " +// "msg {first: %"PRIu64", last: %"PRIu64"}", +// filename, jf->source_type, +// jf->file_last_modified_ut, +// jf->msg_first_ut, jf->msg_last_ut); + } + + return false; +} + +#define SDJF_SOURCE_ALL_NAME "all" +#define SDJF_SOURCE_LOCAL_NAME 
"all-local-logs" +#define SDJF_SOURCE_LOCAL_SYSTEM_NAME "all-local-system-logs" +#define SDJF_SOURCE_LOCAL_USERS_NAME "all-local-user-logs" +#define SDJF_SOURCE_LOCAL_OTHER_NAME "all-uncategorized" +#define SDJF_SOURCE_NAMESPACES_NAME "all-local-namespaces" +#define SDJF_SOURCE_REMOTES_NAME "all-remote-systems" + +struct journal_file_source { + usec_t first_ut; + usec_t last_ut; + size_t count; + uint64_t size; +}; + +static void human_readable_size_ib(uint64_t size, char *dst, size_t dst_len) { + if(size > 1024ULL * 1024 * 1024 * 1024) + snprintfz(dst, dst_len, "%0.2f TiB", (double)size / 1024.0 / 1024.0 / 1024.0 / 1024.0); + else if(size > 1024ULL * 1024 * 1024) + snprintfz(dst, dst_len, "%0.2f GiB", (double)size / 1024.0 / 1024.0 / 1024.0); + else if(size > 1024ULL * 1024) + snprintfz(dst, dst_len, "%0.2f MiB", (double)size / 1024.0 / 1024.0); + else if(size > 1024ULL) + snprintfz(dst, dst_len, "%0.2f KiB", (double)size / 1024.0); + else + snprintfz(dst, dst_len, "%"PRIu64" B", size); +} + +#define print_duration(dst, dst_len, pos, remaining, duration, one, many, printed) do { \ + if((remaining) > (duration)) { \ + uint64_t _count = (remaining) / (duration); \ + uint64_t _rem = (remaining) - (_count * (duration)); \ + (pos) += snprintfz(&(dst)[pos], (dst_len) - (pos), "%s%s%"PRIu64" %s", (printed) ? ", " : "", _rem ? "" : "and ", _count, _count > 1 ? 
(many) : (one)); \ + (remaining) = _rem; \ + (printed) = true; \ + } \ +} while(0) + +static void human_readable_duration_s(time_t duration_s, char *dst, size_t dst_len) { + if(duration_s < 0) + duration_s = -duration_s; + + size_t pos = 0; + dst[0] = 0 ; + + bool printed = false; + print_duration(dst, dst_len, pos, duration_s, 86400 * 365, "year", "years", printed); + print_duration(dst, dst_len, pos, duration_s, 86400 * 30, "month", "months", printed); + print_duration(dst, dst_len, pos, duration_s, 86400 * 1, "day", "days", printed); + print_duration(dst, dst_len, pos, duration_s, 3600 * 1, "hour", "hours", printed); + print_duration(dst, dst_len, pos, duration_s, 60 * 1, "min", "mins", printed); + print_duration(dst, dst_len, pos, duration_s, 1, "sec", "secs", printed); +} + +static int journal_file_to_json_array_cb(const DICTIONARY_ITEM *item, void *entry, void *data) { + struct journal_file_source *jfs = entry; + BUFFER *wb = data; + + const char *name = dictionary_acquired_item_name(item); + + buffer_json_add_array_item_object(wb); + { + char size_for_humans[100]; + human_readable_size_ib(jfs->size, size_for_humans, sizeof(size_for_humans)); + + char duration_for_humans[1024]; + human_readable_duration_s((time_t)((jfs->last_ut - jfs->first_ut) / USEC_PER_SEC), + duration_for_humans, sizeof(duration_for_humans)); + + char info[1024]; + snprintfz(info, sizeof(info), "%zu files, with a total size of %s, covering %s", + jfs->count, size_for_humans, duration_for_humans); + + buffer_json_member_add_string(wb, "id", name); + buffer_json_member_add_string(wb, "name", name); + buffer_json_member_add_string(wb, "pill", size_for_humans); + buffer_json_member_add_string(wb, "info", info); + } + buffer_json_object_close(wb); // options object + + return 1; +} + +static bool journal_file_merge_sizes(const DICTIONARY_ITEM *item __maybe_unused, void *old_value, void *new_value , void *data __maybe_unused) { + struct journal_file_source *jfs = old_value, *njfs = new_value; + 
jfs->count += njfs->count; + jfs->size += njfs->size; + + if(njfs->first_ut && njfs->first_ut < jfs->first_ut) + jfs->first_ut = njfs->first_ut; + + if(njfs->last_ut && njfs->last_ut > jfs->last_ut) + jfs->last_ut = njfs->last_ut; + + return false; +} + +static void available_journal_file_sources_to_json_array(BUFFER *wb) { + DICTIONARY *dict = dictionary_create(DICT_OPTION_SINGLE_THREADED|DICT_OPTION_NAME_LINK_DONT_CLONE|DICT_OPTION_DONT_OVERWRITE_VALUE); + dictionary_register_conflict_callback(dict, journal_file_merge_sizes, NULL); + + struct journal_file_source t = { 0 }; + + struct journal_file *jf; + dfe_start_read(journal_files_registry, jf) { + t.first_ut = jf->msg_first_ut; + t.last_ut = jf->msg_last_ut; + t.count = 1; + t.size = jf->size; + + dictionary_set(dict, SDJF_SOURCE_ALL_NAME, &t, sizeof(t)); + + if((jf->source_type & (SDJF_LOCAL)) == (SDJF_LOCAL)) + dictionary_set(dict, SDJF_SOURCE_LOCAL_NAME, &t, sizeof(t)); + if((jf->source_type & (SDJF_LOCAL | SDJF_SYSTEM)) == (SDJF_LOCAL | SDJF_SYSTEM)) + dictionary_set(dict, SDJF_SOURCE_LOCAL_SYSTEM_NAME, &t, sizeof(t)); + if((jf->source_type & (SDJF_LOCAL | SDJF_USER)) == (SDJF_LOCAL | SDJF_USER)) + dictionary_set(dict, SDJF_SOURCE_LOCAL_USERS_NAME, &t, sizeof(t)); + if((jf->source_type & (SDJF_LOCAL | SDJF_OTHER)) == (SDJF_LOCAL | SDJF_OTHER)) + dictionary_set(dict, SDJF_SOURCE_LOCAL_OTHER_NAME, &t, sizeof(t)); + if((jf->source_type & (SDJF_NAMESPACE)) == (SDJF_NAMESPACE)) + dictionary_set(dict, SDJF_SOURCE_NAMESPACES_NAME, &t, sizeof(t)); + if((jf->source_type & (SDJF_REMOTE)) == (SDJF_REMOTE)) + dictionary_set(dict, SDJF_SOURCE_REMOTES_NAME, &t, sizeof(t)); + if(jf->source) + dictionary_set(dict, string2str(jf->source), &t, sizeof(t)); + } + dfe_done(jf); + + dictionary_sorted_walkthrough_read(dict, journal_file_to_json_array_cb, wb); + + dictionary_destroy(dict); +} + +static void files_registry_delete_cb(const DICTIONARY_ITEM *item, void *value, void *data __maybe_unused) { + struct journal_file *jf = 
value; (void)jf; + const char *filename = dictionary_acquired_item_name(item); (void)filename; + + string_freez(jf->source); + internal_error(true, "removed journal file '%s'", filename); +} + +void journal_directory_scan(const char *dirname, int depth, usec_t last_scan_ut) { + static const char *ext = ".journal"; + static const size_t ext_len = sizeof(".journal") - 1; + + if (depth > VAR_LOG_JOURNAL_MAX_DEPTH) + return; + + DIR *dir; + struct dirent *entry; + struct stat info; + char absolute_path[FILENAME_MAX]; + + // Open the directory. + if ((dir = opendir(dirname)) == NULL) { + if(errno != ENOENT && errno != ENOTDIR) + netdata_log_error("Cannot opendir() '%s'", dirname); + return; + } + + // Read each entry in the directory. + while ((entry = readdir(dir)) != NULL) { + snprintfz(absolute_path, sizeof(absolute_path), "%s/%s", dirname, entry->d_name); + if (stat(absolute_path, &info) != 0) { + netdata_log_error("Failed to stat() '%s'", absolute_path); + continue; + } + + if (S_ISDIR(info.st_mode)) { + // If entry is a directory, scan it recursively. + if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0) + journal_directory_scan(absolute_path, depth + 1, last_scan_ut); + + } + else if (S_ISREG(info.st_mode)) { + // If entry is a regular file, check if it ends with .journal. 
+ char *filename = entry->d_name; + size_t len = strlen(filename); + + if (len > ext_len && strcmp(filename + len - ext_len, ext) == 0) { + struct journal_file t = { + .file_last_modified_ut = info.st_mtim.tv_sec * USEC_PER_SEC + info.st_mtim.tv_nsec / NSEC_PER_USEC, + .last_scan_ut = last_scan_ut, + .size = info.st_size, + .max_journal_vs_realtime_delta_ut = JOURNAL_VS_REALTIME_DELTA_DEFAULT_UT, + }; + dictionary_set(journal_files_registry, absolute_path, &t, sizeof(t)); + } + } + } + + closedir(dir); +} + +static void journal_files_registry_update() { + usec_t scan_ut = now_monotonic_usec(); + + for(unsigned i = 0; i < MAX_JOURNAL_DIRECTORIES ;i++) { + if(!journal_directories[i].path) + break; + + journal_directory_scan(journal_directories[i].path, 0, scan_ut); + } + + struct journal_file *jf; + dfe_start_write(journal_files_registry, jf) { + if(jf->last_scan_ut < scan_ut) + dictionary_del(journal_files_registry, jf_dfe.name); + } + dfe_done(jf); +} + +// ---------------------------------------------------------------------------- + +static bool jf_is_mine(struct journal_file *jf, FUNCTION_QUERY_STATUS *fqs) { + + if((fqs->source_type == SDJF_ALL || (jf->source_type & fqs->source_type) == fqs->source_type) && + (!fqs->source || fqs->source == jf->source)) { + + usec_t anchor_delta = JOURNAL_VS_REALTIME_DELTA_MAX_UT; + usec_t first_ut = jf->msg_first_ut; + usec_t last_ut = jf->msg_last_ut + anchor_delta; + + if(last_ut >= fqs->after_ut && first_ut <= fqs->before_ut) + return true; + } + + return false; +} + +static int journal_file_dict_items_backward_compar(const void *a, const void *b) { + const DICTIONARY_ITEM **ad = (const DICTIONARY_ITEM **)a, **bd = (const DICTIONARY_ITEM **)b; + struct journal_file *jfa = dictionary_acquired_item_value(*ad); + struct journal_file *jfb = dictionary_acquired_item_value(*bd); + + if(jfa->msg_last_ut < jfb->msg_last_ut) + return 1; + + if(jfa->msg_last_ut > jfb->msg_last_ut) + return -1; + + if(jfa->msg_first_ut < 
jfb->msg_first_ut) + return 1; + + if(jfa->msg_first_ut > jfb->msg_first_ut) + return -1; + + return 0; +} + +static int journal_file_dict_items_forward_compar(const void *a, const void *b) { + return -journal_file_dict_items_backward_compar(a, b); +} + +static int netdata_systemd_journal_query(BUFFER *wb, FACETS *facets, FUNCTION_QUERY_STATUS *fqs) { + ND_SD_JOURNAL_STATUS status = ND_SD_JOURNAL_NO_FILE_MATCHED; + struct journal_file *jf; + + fqs->files_matched = 0; + fqs->file_working = 0; + fqs->rows_useful = 0; + fqs->rows_read = 0; + fqs->bytes_read = 0; + + size_t files_used = 0; + size_t files_max = dictionary_entries(journal_files_registry); + const DICTIONARY_ITEM *file_items[files_max]; + + // count the files + bool files_are_newer = false; + dfe_start_read(journal_files_registry, jf) { + if(!jf_is_mine(jf, fqs)) + continue; + + file_items[files_used++] = dictionary_acquired_item_dup(journal_files_registry, jf_dfe.item); + + if(jf->msg_last_ut > fqs->if_modified_since) + files_are_newer = true; + } + dfe_done(jf); + + fqs->files_matched = files_used; + + if(fqs->if_modified_since && !files_are_newer) { + buffer_flush(wb); + return HTTP_RESP_NOT_MODIFIED; + } + + // sort the files, so that they are optimal for facets + if(files_used >= 2) { + if (fqs->direction == FACETS_ANCHOR_DIRECTION_BACKWARD) + qsort(file_items, files_used, sizeof(const DICTIONARY_ITEM *), + journal_file_dict_items_backward_compar); + else + qsort(file_items, files_used, sizeof(const DICTIONARY_ITEM *), + journal_file_dict_items_forward_compar); + } + + bool partial = false; + usec_t started_ut; + usec_t ended_ut = now_monotonic_usec(); + + buffer_json_member_add_array(wb, "_journal_files"); + for(size_t f = 0; f < files_used ;f++) { + const char *filename = dictionary_acquired_item_name(file_items[f]); + jf = dictionary_acquired_item_value(file_items[f]); + + if(!jf_is_mine(jf, fqs)) + continue; + + fqs->file_working++; + fqs->cached_count = 0; + + size_t fs_calls = 
fstat_thread_calls; + size_t fs_cached = fstat_thread_cached_responses; + size_t rows_useful = fqs->rows_useful; + size_t rows_read = fqs->rows_read; + size_t bytes_read = fqs->bytes_read; + size_t matches_setup_ut = fqs->matches_setup_ut; + + ND_SD_JOURNAL_STATUS tmp_status = netdata_systemd_journal_query_one_file(filename, wb, facets, jf, fqs); + + rows_useful = fqs->rows_useful - rows_useful; + rows_read = fqs->rows_read - rows_read; + bytes_read = fqs->bytes_read - bytes_read; + matches_setup_ut = fqs->matches_setup_ut - matches_setup_ut; + fs_calls = fstat_thread_calls - fs_calls; + fs_cached = fstat_thread_cached_responses - fs_cached; + + started_ut = ended_ut; + ended_ut = now_monotonic_usec(); + usec_t duration_ut = ended_ut - started_ut; + + buffer_json_add_array_item_object(wb); // journal file + { + // information about the file + buffer_json_member_add_string(wb, "_filename", filename); + buffer_json_member_add_uint64(wb, "_source_type", jf->source_type); + buffer_json_member_add_string(wb, "_source", string2str(jf->source)); + buffer_json_member_add_uint64(wb, "_last_modified_ut", jf->file_last_modified_ut); + buffer_json_member_add_uint64(wb, "_msg_first_ut", jf->msg_first_ut); + buffer_json_member_add_uint64(wb, "_msg_last_ut", jf->msg_last_ut); + buffer_json_member_add_uint64(wb, "_journal_vs_realtime_delta_ut", jf->max_journal_vs_realtime_delta_ut); + + // information about the current use of the file + buffer_json_member_add_uint64(wb, "duration_ut", ended_ut - started_ut); + buffer_json_member_add_uint64(wb, "rows_read", rows_read); + buffer_json_member_add_uint64(wb, "rows_useful", rows_useful); + buffer_json_member_add_double(wb, "rows_per_second", (double) rows_read / (double) duration_ut * (double) USEC_PER_SEC); + buffer_json_member_add_uint64(wb, "bytes_read", bytes_read); + buffer_json_member_add_double(wb, "bytes_per_second", (double) bytes_read / (double) duration_ut * (double) USEC_PER_SEC); + buffer_json_member_add_uint64(wb, 
"duration_matches_ut", matches_setup_ut); + buffer_json_member_add_uint64(wb, "fstat_query_calls", fs_calls); + buffer_json_member_add_uint64(wb, "fstat_query_cached_responses", fs_cached); + } + buffer_json_object_close(wb); // journal file + + bool stop = false; + switch(tmp_status) { + case ND_SD_JOURNAL_OK: + case ND_SD_JOURNAL_NO_FILE_MATCHED: + status = (status == ND_SD_JOURNAL_OK) ? ND_SD_JOURNAL_OK : tmp_status; + break; + + case ND_SD_JOURNAL_FAILED_TO_OPEN: + case ND_SD_JOURNAL_FAILED_TO_SEEK: + partial = true; + if(status == ND_SD_JOURNAL_NO_FILE_MATCHED) + status = tmp_status; + break; + + case ND_SD_JOURNAL_CANCELLED: + case ND_SD_JOURNAL_TIMED_OUT: + partial = true; + stop = true; + status = tmp_status; + break; + + case ND_SD_JOURNAL_NOT_MODIFIED: + internal_fatal(true, "this should never be returned here"); + break; + } + + if(stop) + break; + } + buffer_json_array_close(wb); // _journal_files + + // release the files + for(size_t f = 0; f < files_used ;f++) + dictionary_acquired_item_release(journal_files_registry, file_items[f]); + + switch (status) { + case ND_SD_JOURNAL_OK: + if(fqs->if_modified_since && !fqs->rows_useful) { + buffer_flush(wb); + return HTTP_RESP_NOT_MODIFIED; + } + break; + + case ND_SD_JOURNAL_TIMED_OUT: + case ND_SD_JOURNAL_NO_FILE_MATCHED: + break; + + case ND_SD_JOURNAL_CANCELLED: + buffer_flush(wb); + return HTTP_RESP_CLIENT_CLOSED_REQUEST; + + case ND_SD_JOURNAL_NOT_MODIFIED: + buffer_flush(wb); + return HTTP_RESP_NOT_MODIFIED; + + default: + case ND_SD_JOURNAL_FAILED_TO_OPEN: + case ND_SD_JOURNAL_FAILED_TO_SEEK: + buffer_flush(wb); + return HTTP_RESP_INTERNAL_SERVER_ERROR; + } buffer_json_member_add_uint64(wb, "status", HTTP_RESP_OK); - buffer_json_member_add_boolean(wb, "partial", timed_out); + buffer_json_member_add_boolean(wb, "partial", partial); buffer_json_member_add_string(wb, "type", "table"); - buffer_json_member_add_time_t(wb, "update_every", 1); - buffer_json_member_add_string(wb, "help", 
SYSTEMD_JOURNAL_FUNCTION_DESCRIPTION); - facets_report(facets, wb); + if(!fqs->data_only) { + buffer_json_member_add_time_t(wb, "update_every", 1); + buffer_json_member_add_string(wb, "help", SYSTEMD_JOURNAL_FUNCTION_DESCRIPTION); + } + + if(!fqs->data_only || fqs->tail) + buffer_json_member_add_uint64(wb, "last_modified", fqs->last_modified); + + facets_sort_and_reorder_keys(facets); + facets_report(facets, wb, used_hashes_registry); + + buffer_json_member_add_time_t(wb, "expires", now_realtime_sec() + (fqs->data_only ? 3600 : 0)); - buffer_json_member_add_time_t(wb, "expires", now_realtime_sec()); + buffer_json_member_add_object(wb, "_fstat_caching"); + { + buffer_json_member_add_uint64(wb, "calls", fstat_thread_calls); + buffer_json_member_add_uint64(wb, "cached", fstat_thread_cached_responses); + } + buffer_json_object_close(wb); // _fstat_caching buffer_json_finalize(wb); return HTTP_RESP_OK; } -static void systemd_journal_function_help(const char *transaction) { - pluginsd_function_result_begin_to_stdout(transaction, HTTP_RESP_OK, "text/plain", now_realtime_sec() + 3600); - fprintf(stdout, +static void netdata_systemd_journal_function_help(const char *transaction) { + BUFFER *wb = buffer_create(0, NULL); + buffer_sprintf(wb, "%s / %s\n" "\n" "%s\n" "\n" - "The following filters are supported:\n" + "The following parameters are supported:\n" "\n" - " help\n" + " "JOURNAL_PARAMETER_HELP"\n" " Shows this help message.\n" "\n" - " before:TIMESTAMP\n" + " "JOURNAL_PARAMETER_INFO"\n" + " Request initial configuration information about the plugin.\n" + " The key entity returned is the required_params array, which includes\n" + " all the available systemd journal sources.\n" + " When `"JOURNAL_PARAMETER_INFO"` is requested, all other parameters are ignored.\n" + "\n" + " "JOURNAL_PARAMETER_ID":STRING\n" + " Caller supplied unique ID of the request.\n" + " This can be used later to request a progress report of the query.\n" + " Optional, but if omitted no 
`"JOURNAL_PARAMETER_PROGRESS"` can be requested.\n" + "\n" + " "JOURNAL_PARAMETER_PROGRESS"\n" + " Request a progress report (the `id` of a running query is required).\n" + " When `"JOURNAL_PARAMETER_PROGRESS"` is requested, only parameter `"JOURNAL_PARAMETER_ID"` is used.\n" + "\n" + " "JOURNAL_PARAMETER_DATA_ONLY":true or "JOURNAL_PARAMETER_DATA_ONLY":false\n" + " Quickly respond with data requested, without generating a\n" + " `histogram`, `facets` counters and `items`.\n" + "\n" + " "JOURNAL_PARAMETER_DELTA":true or "JOURNAL_PARAMETER_DELTA":false\n" + " When doing data only queries, include deltas for histogram, facets and items.\n" + "\n" + " "JOURNAL_PARAMETER_TAIL":true or "JOURNAL_PARAMETER_TAIL":false\n" + " When doing data only queries, respond with the newest messages,\n" + " and up to the anchor, but calculate deltas (if requested) for\n" + " the duration [anchor - before].\n" + "\n" + " "JOURNAL_PARAMETER_SLICE":true or "JOURNAL_PARAMETER_SLICE":false\n" + " When it is turned on, the plugin is executing filtering via libsystemd,\n" + " utilizing all the available indexes of the journal files.\n" + " When it is off, only the time constraint is handled by libsystemd and\n" + " all filtering is done by the plugin.\n" + " The default is: %s\n" + "\n" + " "JOURNAL_PARAMETER_SOURCE":SOURCE\n" + " Query only the specified journal sources.\n" + " Do an `"JOURNAL_PARAMETER_INFO"` query to find the sources.\n" + "\n" + " "JOURNAL_PARAMETER_BEFORE":TIMESTAMP_IN_SECONDS\n" " Absolute or relative (to now) timestamp in seconds, to start the query.\n" " The query is always executed from the most recent to the oldest log entry.\n" " If not given the default is: now.\n" "\n" - " after:TIMESTAMP\n" + " "JOURNAL_PARAMETER_AFTER":TIMESTAMP_IN_SECONDS\n" " Absolute or relative (to `before`) timestamp in seconds, to end the query.\n" " If not given, the default is %d.\n" "\n" - " last:ITEMS\n" + " "JOURNAL_PARAMETER_LAST":ITEMS\n" " The number of items to return.\n" " The 
default is %d.\n" "\n" - " anchor:NUMBER\n" - " The `timestamp` of the item last received, to return log entries after that.\n" - " If not given, the query will return the top `ITEMS` from the most recent.\n" + " "JOURNAL_PARAMETER_ANCHOR":TIMESTAMP_IN_MICROSECONDS\n" + " Return items relative to this timestamp.\n" + " The exact items to be returned depend on the query `"JOURNAL_PARAMETER_DIRECTION"`.\n" + "\n" + " "JOURNAL_PARAMETER_DIRECTION":forward or "JOURNAL_PARAMETER_DIRECTION":backward\n" + " When set to `backward` (default) the items returned are the newest before the\n" + " `"JOURNAL_PARAMETER_ANCHOR"`, (or `"JOURNAL_PARAMETER_BEFORE"` if `"JOURNAL_PARAMETER_ANCHOR"` is not set)\n" + " When set to `forward` the items returned are the oldest after the\n" + " `"JOURNAL_PARAMETER_ANCHOR"`, (or `"JOURNAL_PARAMETER_AFTER"` if `"JOURNAL_PARAMETER_ANCHOR"` is not set)\n" + " The default is: %s\n" + "\n" + " "JOURNAL_PARAMETER_QUERY":SIMPLE_PATTERN\n" + " Do a full text search to find the log entries matching the pattern given.\n" + " The plugin searches for matches on all fields of the database.\n" + "\n" + " "JOURNAL_PARAMETER_IF_MODIFIED_SINCE":TIMESTAMP_IN_MICROSECONDS\n" + " Each successful response includes a `last_modified` field.\n" + " By providing the timestamp to the `"JOURNAL_PARAMETER_IF_MODIFIED_SINCE"` parameter,\n" + " the plugin will return 200 with a successful response, or 304 if the source has not\n" + " been modified since that timestamp.\n" + "\n" + " "JOURNAL_PARAMETER_HISTOGRAM":facet_id\n" + " Use the given `facet_id` for the histogram.\n" + " This parameter is ignored in `"JOURNAL_PARAMETER_DATA_ONLY"` mode.\n" + "\n" + " "JOURNAL_PARAMETER_FACETS":facet_id1,facet_id2,facet_id3,...\n" + " Add the given facets to the list of fields for which analysis is required.\n" + " The plugin will offer both a histogram and facet value counters for its values.\n" + " This parameter is ignored in `"JOURNAL_PARAMETER_DATA_ONLY"` mode.\n" "\n" " 
facet_id:value_id1,value_id2,value_id3,...\n" " Apply filters to the query, based on the facet IDs returned.\n" " Each `facet_id` can be given once, but multiple `facet_ids` can be given.\n" "\n" - "Filters can be combined. Each filter can be given only one time.\n" , program_name , SYSTEMD_JOURNAL_FUNCTION_NAME , SYSTEMD_JOURNAL_FUNCTION_DESCRIPTION + , JOURNAL_DEFAULT_SLICE_MODE ? "true" : "false" // slice , -SYSTEMD_JOURNAL_DEFAULT_QUERY_DURATION , SYSTEMD_JOURNAL_DEFAULT_ITEMS_PER_QUERY + , JOURNAL_DEFAULT_DIRECTION == FACETS_ANCHOR_DIRECTION_BACKWARD ? "backward" : "forward" ); - pluginsd_function_result_end_to_stdout(); + + netdata_mutex_lock(&stdout_mutex); + pluginsd_function_result_to_stdout(transaction, HTTP_RESP_OK, "text/plain", now_realtime_sec() + 3600, wb); + netdata_mutex_unlock(&stdout_mutex); + + buffer_free(wb); } +const char *errno_map[] = { + [1] = "1 (EPERM)", // "Operation not permitted", + [2] = "2 (ENOENT)", // "No such file or directory", + [3] = "3 (ESRCH)", // "No such process", + [4] = "4 (EINTR)", // "Interrupted system call", + [5] = "5 (EIO)", // "Input/output error", + [6] = "6 (ENXIO)", // "No such device or address", + [7] = "7 (E2BIG)", // "Argument list too long", + [8] = "8 (ENOEXEC)", // "Exec format error", + [9] = "9 (EBADF)", // "Bad file descriptor", + [10] = "10 (ECHILD)", // "No child processes", + [11] = "11 (EAGAIN)", // "Resource temporarily unavailable", + [12] = "12 (ENOMEM)", // "Cannot allocate memory", + [13] = "13 (EACCES)", // "Permission denied", + [14] = "14 (EFAULT)", // "Bad address", + [15] = "15 (ENOTBLK)", // "Block device required", + [16] = "16 (EBUSY)", // "Device or resource busy", + [17] = "17 (EEXIST)", // "File exists", + [18] = "18 (EXDEV)", // "Invalid cross-device link", + [19] = "19 (ENODEV)", // "No such device", + [20] = "20 (ENOTDIR)", // "Not a directory", + [21] = "21 (EISDIR)", // "Is a directory", + [22] = "22 (EINVAL)", // "Invalid argument", + [23] = "23 (ENFILE)", // "Too many open 
files in system", + [24] = "24 (EMFILE)", // "Too many open files", + [25] = "25 (ENOTTY)", // "Inappropriate ioctl for device", + [26] = "26 (ETXTBSY)", // "Text file busy", + [27] = "27 (EFBIG)", // "File too large", + [28] = "28 (ENOSPC)", // "No space left on device", + [29] = "29 (ESPIPE)", // "Illegal seek", + [30] = "30 (EROFS)", // "Read-only file system", + [31] = "31 (EMLINK)", // "Too many links", + [32] = "32 (EPIPE)", // "Broken pipe", + [33] = "33 (EDOM)", // "Numerical argument out of domain", + [34] = "34 (ERANGE)", // "Numerical result out of range", + [35] = "35 (EDEADLK)", // "Resource deadlock avoided", + [36] = "36 (ENAMETOOLONG)", // "File name too long", + [37] = "37 (ENOLCK)", // "No locks available", + [38] = "38 (ENOSYS)", // "Function not implemented", + [39] = "39 (ENOTEMPTY)", // "Directory not empty", + [40] = "40 (ELOOP)", // "Too many levels of symbolic links", + [42] = "42 (ENOMSG)", // "No message of desired type", + [43] = "43 (EIDRM)", // "Identifier removed", + [44] = "44 (ECHRNG)", // "Channel number out of range", + [45] = "45 (EL2NSYNC)", // "Level 2 not synchronized", + [46] = "46 (EL3HLT)", // "Level 3 halted", + [47] = "47 (EL3RST)", // "Level 3 reset", + [48] = "48 (ELNRNG)", // "Link number out of range", + [49] = "49 (EUNATCH)", // "Protocol driver not attached", + [50] = "50 (ENOCSI)", // "No CSI structure available", + [51] = "51 (EL2HLT)", // "Level 2 halted", + [52] = "52 (EBADE)", // "Invalid exchange", + [53] = "53 (EBADR)", // "Invalid request descriptor", + [54] = "54 (EXFULL)", // "Exchange full", + [55] = "55 (ENOANO)", // "No anode", + [56] = "56 (EBADRQC)", // "Invalid request code", + [57] = "57 (EBADSLT)", // "Invalid slot", + [59] = "59 (EBFONT)", // "Bad font file format", + [60] = "60 (ENOSTR)", // "Device not a stream", + [61] = "61 (ENODATA)", // "No data available", + [62] = "62 (ETIME)", // "Timer expired", + [63] = "63 (ENOSR)", // "Out of streams resources", + [64] = "64 (ENONET)", // "Machine is 
not on the network", + [65] = "65 (ENOPKG)", // "Package not installed", + [66] = "66 (EREMOTE)", // "Object is remote", + [67] = "67 (ENOLINK)", // "Link has been severed", + [68] = "68 (EADV)", // "Advertise error", + [69] = "69 (ESRMNT)", // "Srmount error", + [70] = "70 (ECOMM)", // "Communication error on send", + [71] = "71 (EPROTO)", // "Protocol error", + [72] = "72 (EMULTIHOP)", // "Multihop attempted", + [73] = "73 (EDOTDOT)", // "RFS specific error", + [74] = "74 (EBADMSG)", // "Bad message", + [75] = "75 (EOVERFLOW)", // "Value too large for defined data type", + [76] = "76 (ENOTUNIQ)", // "Name not unique on network", + [77] = "77 (EBADFD)", // "File descriptor in bad state", + [78] = "78 (EREMCHG)", // "Remote address changed", + [79] = "79 (ELIBACC)", // "Can not access a needed shared library", + [80] = "80 (ELIBBAD)", // "Accessing a corrupted shared library", + [81] = "81 (ELIBSCN)", // ".lib section in a.out corrupted", + [82] = "82 (ELIBMAX)", // "Attempting to link in too many shared libraries", + [83] = "83 (ELIBEXEC)", // "Cannot exec a shared library directly", + [84] = "84 (EILSEQ)", // "Invalid or incomplete multibyte or wide character", + [85] = "85 (ERESTART)", // "Interrupted system call should be restarted", + [86] = "86 (ESTRPIPE)", // "Streams pipe error", + [87] = "87 (EUSERS)", // "Too many users", + [88] = "88 (ENOTSOCK)", // "Socket operation on non-socket", + [89] = "89 (EDESTADDRREQ)", // "Destination address required", + [90] = "90 (EMSGSIZE)", // "Message too long", + [91] = "91 (EPROTOTYPE)", // "Protocol wrong type for socket", + [92] = "92 (ENOPROTOOPT)", // "Protocol not available", + [93] = "93 (EPROTONOSUPPORT)", // "Protocol not supported", + [94] = "94 (ESOCKTNOSUPPORT)", // "Socket type not supported", + [95] = "95 (ENOTSUP)", // "Operation not supported", + [96] = "96 (EPFNOSUPPORT)", // "Protocol family not supported", + [97] = "97 (EAFNOSUPPORT)", // "Address family not supported by protocol", + [98] = "98 
(EADDRINUSE)", // "Address already in use", + [99] = "99 (EADDRNOTAVAIL)", // "Cannot assign requested address", + [100] = "100 (ENETDOWN)", // "Network is down", + [101] = "101 (ENETUNREACH)", // "Network is unreachable", + [102] = "102 (ENETRESET)", // "Network dropped connection on reset", + [103] = "103 (ECONNABORTED)", // "Software caused connection abort", + [104] = "104 (ECONNRESET)", // "Connection reset by peer", + [105] = "105 (ENOBUFS)", // "No buffer space available", + [106] = "106 (EISCONN)", // "Transport endpoint is already connected", + [107] = "107 (ENOTCONN)", // "Transport endpoint is not connected", + [108] = "108 (ESHUTDOWN)", // "Cannot send after transport endpoint shutdown", + [109] = "109 (ETOOMANYREFS)", // "Too many references: cannot splice", + [110] = "110 (ETIMEDOUT)", // "Connection timed out", + [111] = "111 (ECONNREFUSED)", // "Connection refused", + [112] = "112 (EHOSTDOWN)", // "Host is down", + [113] = "113 (EHOSTUNREACH)", // "No route to host", + [114] = "114 (EALREADY)", // "Operation already in progress", + [115] = "115 (EINPROGRESS)", // "Operation now in progress", + [116] = "116 (ESTALE)", // "Stale file handle", + [117] = "117 (EUCLEAN)", // "Structure needs cleaning", + [118] = "118 (ENOTNAM)", // "Not a XENIX named type file", + [119] = "119 (ENAVAIL)", // "No XENIX semaphores available", + [120] = "120 (EISNAM)", // "Is a named type file", + [121] = "121 (EREMOTEIO)", // "Remote I/O error", + [122] = "122 (EDQUOT)", // "Disk quota exceeded", + [123] = "123 (ENOMEDIUM)", // "No medium found", + [124] = "124 (EMEDIUMTYPE)", // "Wrong medium type", + [125] = "125 (ECANCELED)", // "Operation canceled", + [126] = "126 (ENOKEY)", // "Required key not available", + [127] = "127 (EKEYEXPIRED)", // "Key has expired", + [128] = "128 (EKEYREVOKED)", // "Key has been revoked", + [129] = "129 (EKEYREJECTED)", // "Key was rejected by service", + [130] = "130 (EOWNERDEAD)", // "Owner died", + [131] = "131 (ENOTRECOVERABLE)", // 
"State not recoverable", + [132] = "132 (ERFKILL)", // "Operation not possible due to RF-kill", + [133] = "133 (EHWPOISON)", // "Memory page has hardware error", +}; + static const char *syslog_facility_to_name(int facility) { switch (facility) { case LOG_FAC(LOG_KERN): return "kern"; @@ -216,31 +1690,57 @@ static const char *syslog_priority_to_name(int priority) { } } +static FACET_ROW_SEVERITY syslog_priority_to_facet_severity(FACETS *facets __maybe_unused, FACET_ROW *row, void *data __maybe_unused) { + // same to + // https://github.com/systemd/systemd/blob/aab9e4b2b86905a15944a1ac81e471b5b7075932/src/basic/terminal-util.c#L1501 + // function get_log_colors() + + FACET_ROW_KEY_VALUE *priority_rkv = dictionary_get(row->dict, "PRIORITY"); + if(!priority_rkv || priority_rkv->empty) + return FACET_ROW_SEVERITY_NORMAL; + + int priority = str2i(buffer_tostring(priority_rkv->wb)); + + if(priority <= LOG_ERR) + return FACET_ROW_SEVERITY_CRITICAL; + + else if (priority <= LOG_WARNING) + return FACET_ROW_SEVERITY_WARNING; + + else if(priority <= LOG_NOTICE) + return FACET_ROW_SEVERITY_NOTICE; + + else if(priority >= LOG_DEBUG) + return FACET_ROW_SEVERITY_DEBUG; + + return FACET_ROW_SEVERITY_NORMAL; +} + static char *uid_to_username(uid_t uid, char *buffer, size_t buffer_size) { - struct passwd pw, *result; - char tmp[1024 + 1]; + static __thread char tmp[1024 + 1]; + struct passwd pw, *result = NULL; - if (getpwuid_r(uid, &pw, tmp, 1024, &result) != 0 || result == NULL) - return NULL; + if (getpwuid_r(uid, &pw, tmp, sizeof(tmp), &result) != 0 || !result || !pw.pw_name || !(*pw.pw_name)) + snprintfz(buffer, buffer_size - 1, "%u", uid); + else + snprintfz(buffer, buffer_size - 1, "%u (%s)", uid, pw.pw_name); - strncpy(buffer, pw.pw_name, buffer_size - 1); - buffer[buffer_size - 1] = '\0'; // Null-terminate just in case return buffer; } static char *gid_to_groupname(gid_t gid, char* buffer, size_t buffer_size) { - struct group grp, *result; - char tmp[1024 + 1]; + static 
__thread char tmp[1024]; + struct group grp, *result = NULL; - if (getgrgid_r(gid, &grp, tmp, 1024, &result) != 0 || result == NULL) - return NULL; + if (getgrgid_r(gid, &grp, tmp, sizeof(tmp), &result) != 0 || !result || !grp.gr_name || !(*grp.gr_name)) + snprintfz(buffer, buffer_size - 1, "%u", gid); + else + snprintfz(buffer, buffer_size - 1, "%u (%s)", gid, grp.gr_name); - strncpy(buffer, grp.gr_name, buffer_size - 1); - buffer[buffer_size - 1] = '\0'; // Null-terminate just in case return buffer; } -static void systemd_journal_transform_syslog_facility(FACETS *facets __maybe_unused, BUFFER *wb, void *data __maybe_unused) { +static void netdata_systemd_journal_transform_syslog_facility(FACETS *facets __maybe_unused, BUFFER *wb, FACETS_TRANSFORMATION_SCOPE scope __maybe_unused, void *data __maybe_unused) { const char *v = buffer_tostring(wb); if(*v && isdigit(*v)) { int facility = str2i(buffer_tostring(wb)); @@ -252,7 +1752,10 @@ static void systemd_journal_transform_syslog_facility(FACETS *facets __maybe_unu } } -static void systemd_journal_transform_priority(FACETS *facets __maybe_unused, BUFFER *wb, void *data __maybe_unused) { +static void netdata_systemd_journal_transform_priority(FACETS *facets __maybe_unused, BUFFER *wb, FACETS_TRANSFORMATION_SCOPE scope __maybe_unused, void *data __maybe_unused) { + if(scope == FACETS_TRANSFORM_FACET_SORT) + return; + const char *v = buffer_tostring(wb); if(*v && isdigit(*v)) { int priority = str2i(buffer_tostring(wb)); @@ -264,141 +1767,663 @@ static void systemd_journal_transform_priority(FACETS *facets __maybe_unused, BU } } -static void systemd_journal_transform_uid(FACETS *facets __maybe_unused, BUFFER *wb, void *data) { - DICTIONARY *cache = data; +static void netdata_systemd_journal_transform_errno(FACETS *facets __maybe_unused, BUFFER *wb, FACETS_TRANSFORMATION_SCOPE scope __maybe_unused, void *data __maybe_unused) { + if(scope == FACETS_TRANSFORM_FACET_SORT) + return; + const char *v = buffer_tostring(wb); if(*v 
&& isdigit(*v)) { - const char *sv = dictionary_get(cache, v); - if(!sv) { - char buf[1024 + 1]; - int uid = str2i(buffer_tostring(wb)); - const char *name = uid_to_username(uid, buf, 1024); - if (!name) - name = v; + unsigned err_no = str2u(buffer_tostring(wb)); + if(err_no > 0 && err_no < sizeof(errno_map) / sizeof(*errno_map)) { + const char *name = errno_map[err_no]; + if(name) { + buffer_flush(wb); + buffer_strcat(wb, name); + } + } + } +} + +// ---------------------------------------------------------------------------- +// UID and GID transformation + +#define UID_GID_HASHTABLE_SIZE 10000 + +struct word_t2str_hashtable_entry { + struct word_t2str_hashtable_entry *next; + Word_t hash; + size_t len; + char str[]; +}; + +struct word_t2str_hashtable { + SPINLOCK spinlock; + size_t size; + struct word_t2str_hashtable_entry *hashtable[UID_GID_HASHTABLE_SIZE]; +}; + +struct word_t2str_hashtable uid_hashtable = { + .size = UID_GID_HASHTABLE_SIZE, +}; + +struct word_t2str_hashtable gid_hashtable = { + .size = UID_GID_HASHTABLE_SIZE, +}; + +struct word_t2str_hashtable_entry **word_t2str_hashtable_slot(struct word_t2str_hashtable *ht, Word_t hash) { + size_t slot = hash % ht->size; + struct word_t2str_hashtable_entry **e = &ht->hashtable[slot]; + + while(*e && (*e)->hash != hash) + e = &((*e)->next); + + return e; +} + +const char *uid_to_username_cached(uid_t uid, size_t *length) { + spinlock_lock(&uid_hashtable.spinlock); + + struct word_t2str_hashtable_entry **e = word_t2str_hashtable_slot(&uid_hashtable, uid); + if(!(*e)) { + static __thread char buf[1024]; + const char *name = uid_to_username(uid, buf, sizeof(buf)); + size_t size = strlen(name) + 1; + + *e = callocz(1, sizeof(struct word_t2str_hashtable_entry) + size); + (*e)->len = size - 1; + (*e)->hash = uid; + memcpy((*e)->str, name, size); + } + + spinlock_unlock(&uid_hashtable.spinlock); - sv = dictionary_set(cache, v, (void *)name, strlen(name) + 1); + *length = (*e)->len; + return (*e)->str; +} + +const 
char *gid_to_groupname_cached(gid_t gid, size_t *length) { + spinlock_lock(&gid_hashtable.spinlock); + + struct word_t2str_hashtable_entry **e = word_t2str_hashtable_slot(&gid_hashtable, gid); + if(!(*e)) { + static __thread char buf[1024]; + const char *name = gid_to_groupname(gid, buf, sizeof(buf)); + size_t size = strlen(name) + 1; + + *e = callocz(1, sizeof(struct word_t2str_hashtable_entry) + size); + (*e)->len = size - 1; + (*e)->hash = gid; + memcpy((*e)->str, name, size); + } + + spinlock_unlock(&gid_hashtable.spinlock); + + *length = (*e)->len; + return (*e)->str; +} + +DICTIONARY *boot_ids_to_first_ut = NULL; + +static void netdata_systemd_journal_transform_boot_id(FACETS *facets __maybe_unused, BUFFER *wb, FACETS_TRANSFORMATION_SCOPE scope __maybe_unused, void *data __maybe_unused) { + const char *boot_id = buffer_tostring(wb); + if(*boot_id && isxdigit(*boot_id)) { + usec_t ut = UINT64_MAX; + usec_t *p_ut = dictionary_get(boot_ids_to_first_ut, boot_id); + if(!p_ut) { + struct journal_file *jf; + dfe_start_read(journal_files_registry, jf) { + const char *files[2] = { + [0] = jf_dfe.name, + [1] = NULL, + }; + + sd_journal *j = NULL; + if(sd_journal_open_files(&j, files, ND_SD_JOURNAL_OPEN_FLAGS) < 0 || !j) + continue; + + char m[100]; + size_t len = snprintfz(m, sizeof(m), "_BOOT_ID=%s", boot_id); + usec_t t_ut = 0; + if(sd_journal_add_match(j, m, len) < 0 || + sd_journal_seek_head(j) < 0 || + sd_journal_next(j) < 0 || + sd_journal_get_realtime_usec(j, &t_ut) < 0 || !t_ut) { + sd_journal_close(j); + continue; + } + + if(t_ut < ut) + ut = t_ut; + + sd_journal_close(j); + } + dfe_done(jf); + + dictionary_set(boot_ids_to_first_ut, boot_id, &ut, sizeof(ut)); } + else + ut = *p_ut; + + if(ut != UINT64_MAX) { + time_t timestamp_sec = (time_t)(ut / USEC_PER_SEC); + struct tm tm; + char buffer[30]; + + gmtime_r(&timestamp_sec, &tm); + strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", &tm); + + switch(scope) { + default: + case FACETS_TRANSFORM_DATA: + case
FACETS_TRANSFORM_VALUE: + buffer_sprintf(wb, " (%s UTC) ", buffer); + break; + + case FACETS_TRANSFORM_FACET: + case FACETS_TRANSFORM_FACET_SORT: + case FACETS_TRANSFORM_HISTOGRAM: + buffer_flush(wb); + buffer_sprintf(wb, "%s UTC", buffer); + break; + } + } + } +} - buffer_flush(wb); - buffer_strcat(wb, sv); +static void netdata_systemd_journal_transform_uid(FACETS *facets __maybe_unused, BUFFER *wb, FACETS_TRANSFORMATION_SCOPE scope __maybe_unused, void *data __maybe_unused) { + if(scope == FACETS_TRANSFORM_FACET_SORT) + return; + + const char *v = buffer_tostring(wb); + if(*v && isdigit(*v)) { + uid_t uid = str2i(buffer_tostring(wb)); + size_t len; + const char *name = uid_to_username_cached(uid, &len); + buffer_contents_replace(wb, name, len); } } -static void systemd_journal_transform_gid(FACETS *facets __maybe_unused, BUFFER *wb, void *data) { - DICTIONARY *cache = data; +static void netdata_systemd_journal_transform_gid(FACETS *facets __maybe_unused, BUFFER *wb, FACETS_TRANSFORMATION_SCOPE scope __maybe_unused, void *data __maybe_unused) { + if(scope == FACETS_TRANSFORM_FACET_SORT) + return; + const char *v = buffer_tostring(wb); if(*v && isdigit(*v)) { - const char *sv = dictionary_get(cache, v); - if(!sv) { - char buf[1024 + 1]; - int gid = str2i(buffer_tostring(wb)); - const char *name = gid_to_groupname(gid, buf, 1024); - if (!name) - name = v; + gid_t gid = str2i(buffer_tostring(wb)); + size_t len; + const char *name = gid_to_groupname_cached(gid, &len); + buffer_contents_replace(wb, name, len); + } +} - sv = dictionary_set(cache, v, (void *)name, strlen(name) + 1); +const char *linux_capabilities[] = { + [CAP_CHOWN] = "CHOWN", + [CAP_DAC_OVERRIDE] = "DAC_OVERRIDE", + [CAP_DAC_READ_SEARCH] = "DAC_READ_SEARCH", + [CAP_FOWNER] = "FOWNER", + [CAP_FSETID] = "FSETID", + [CAP_KILL] = "KILL", + [CAP_SETGID] = "SETGID", + [CAP_SETUID] = "SETUID", + [CAP_SETPCAP] = "SETPCAP", + [CAP_LINUX_IMMUTABLE] = "LINUX_IMMUTABLE", + [CAP_NET_BIND_SERVICE] = 
"NET_BIND_SERVICE", + [CAP_NET_BROADCAST] = "NET_BROADCAST", + [CAP_NET_ADMIN] = "NET_ADMIN", + [CAP_NET_RAW] = "NET_RAW", + [CAP_IPC_LOCK] = "IPC_LOCK", + [CAP_IPC_OWNER] = "IPC_OWNER", + [CAP_SYS_MODULE] = "SYS_MODULE", + [CAP_SYS_RAWIO] = "SYS_RAWIO", + [CAP_SYS_CHROOT] = "SYS_CHROOT", + [CAP_SYS_PTRACE] = "SYS_PTRACE", + [CAP_SYS_PACCT] = "SYS_PACCT", + [CAP_SYS_ADMIN] = "SYS_ADMIN", + [CAP_SYS_BOOT] = "SYS_BOOT", + [CAP_SYS_NICE] = "SYS_NICE", + [CAP_SYS_RESOURCE] = "SYS_RESOURCE", + [CAP_SYS_TIME] = "SYS_TIME", + [CAP_SYS_TTY_CONFIG] = "SYS_TTY_CONFIG", + [CAP_MKNOD] = "MKNOD", + [CAP_LEASE] = "LEASE", + [CAP_AUDIT_WRITE] = "AUDIT_WRITE", + [CAP_AUDIT_CONTROL] = "AUDIT_CONTROL", + [CAP_SETFCAP] = "SETFCAP", + [CAP_MAC_OVERRIDE] = "MAC_OVERRIDE", + [CAP_MAC_ADMIN] = "MAC_ADMIN", + [CAP_SYSLOG] = "SYSLOG", + [CAP_WAKE_ALARM] = "WAKE_ALARM", + [CAP_BLOCK_SUSPEND] = "BLOCK_SUSPEND", + [37 /*CAP_AUDIT_READ*/] = "AUDIT_READ", + [38 /*CAP_PERFMON*/] = "PERFMON", + [39 /*CAP_BPF*/] = "BPF", + [40 /* CAP_CHECKPOINT_RESTORE */] = "CHECKPOINT_RESTORE", +}; + +static void netdata_systemd_journal_transform_cap_effective(FACETS *facets __maybe_unused, BUFFER *wb, FACETS_TRANSFORMATION_SCOPE scope __maybe_unused, void *data __maybe_unused) { + if(scope == FACETS_TRANSFORM_FACET_SORT) + return; + + const char *v = buffer_tostring(wb); + if(*v && isdigit(*v)) { + uint64_t cap = strtoul(buffer_tostring(wb), NULL, 16); + if(cap) { + buffer_fast_strcat(wb, " (", 2); + for (size_t i = 0, added = 0; i < sizeof(linux_capabilities) / sizeof(linux_capabilities[0]); i++) { + if (linux_capabilities[i] && (cap & (1ULL << i))) { + + if (added) + buffer_fast_strcat(wb, " | ", 3); + + buffer_strcat(wb, linux_capabilities[i]); + added++; + } + } + buffer_fast_strcat(wb, ")", 1); } + } +} - buffer_flush(wb); - buffer_strcat(wb, sv); +static void netdata_systemd_journal_transform_timestamp_usec(FACETS *facets __maybe_unused, BUFFER *wb, FACETS_TRANSFORMATION_SCOPE scope __maybe_unused, void 
*data __maybe_unused) { + if(scope == FACETS_TRANSFORM_FACET_SORT) + return; + + const char *v = buffer_tostring(wb); + if(*v && isdigit(*v)) { + uint64_t ut = str2ull(buffer_tostring(wb), NULL); + if(ut) { + time_t timestamp_sec = ut / USEC_PER_SEC; + struct tm tm; + char buffer[30]; + + gmtime_r(&timestamp_sec, &tm); + strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", &tm); + buffer_sprintf(wb, " (%s.%06llu UTC)", buffer, ut % USEC_PER_SEC); + } } } -static void systemd_journal_dynamic_row_id(FACETS *facets __maybe_unused, BUFFER *json_array, FACET_ROW_KEY_VALUE *rkv, FACET_ROW *row, void *data __maybe_unused) { +// ---------------------------------------------------------------------------- + +static void netdata_systemd_journal_dynamic_row_id(FACETS *facets __maybe_unused, BUFFER *json_array, FACET_ROW_KEY_VALUE *rkv, FACET_ROW *row, void *data __maybe_unused) { FACET_ROW_KEY_VALUE *pid_rkv = dictionary_get(row->dict, "_PID"); const char *pid = pid_rkv ? buffer_tostring(pid_rkv->wb) : FACET_VALUE_UNSET; - FACET_ROW_KEY_VALUE *syslog_identifier_rkv = dictionary_get(row->dict, "SYSLOG_IDENTIFIER"); - const char *identifier = syslog_identifier_rkv ? buffer_tostring(syslog_identifier_rkv->wb) : FACET_VALUE_UNSET; + const char *identifier = NULL; + FACET_ROW_KEY_VALUE *container_name_rkv = dictionary_get(row->dict, "CONTAINER_NAME"); + if(container_name_rkv && !container_name_rkv->empty) + identifier = buffer_tostring(container_name_rkv->wb); - if(strcmp(identifier, FACET_VALUE_UNSET) == 0) { - FACET_ROW_KEY_VALUE *comm_rkv = dictionary_get(row->dict, "_COMM"); - identifier = comm_rkv ?
buffer_tostring(comm_rkv->wb) : FACET_VALUE_UNSET; + if(!identifier) { + FACET_ROW_KEY_VALUE *syslog_identifier_rkv = dictionary_get(row->dict, "SYSLOG_IDENTIFIER"); + if(syslog_identifier_rkv && !syslog_identifier_rkv->empty) + identifier = buffer_tostring(syslog_identifier_rkv->wb); + + if(!identifier) { + FACET_ROW_KEY_VALUE *comm_rkv = dictionary_get(row->dict, "_COMM"); + if(comm_rkv && !comm_rkv->empty) + identifier = buffer_tostring(comm_rkv->wb); + } } buffer_flush(rkv->wb); - if(strcmp(pid, FACET_VALUE_UNSET) == 0) - buffer_strcat(rkv->wb, identifier); + if(!identifier) + buffer_strcat(rkv->wb, FACET_VALUE_UNSET); else buffer_sprintf(rkv->wb, "%s[%s]", identifier, pid); buffer_json_add_array_item_string(json_array, buffer_tostring(rkv->wb)); } -static void function_systemd_journal(const char *transaction, char *function, char *line_buffer __maybe_unused, int line_max __maybe_unused, int timeout __maybe_unused) { - char *words[SYSTEMD_JOURNAL_MAX_PARAMS] = { NULL }; - size_t num_words = quoted_strings_splitter_pluginsd(function, words, SYSTEMD_JOURNAL_MAX_PARAMS); +static void netdata_systemd_journal_rich_message(FACETS *facets __maybe_unused, BUFFER *json_array, FACET_ROW_KEY_VALUE *rkv, FACET_ROW *row __maybe_unused, void *data __maybe_unused) { + buffer_json_add_array_item_object(json_array); + buffer_json_member_add_string(json_array, "value", buffer_tostring(rkv->wb)); + buffer_json_object_close(json_array); +} + +DICTIONARY *function_query_status_dict = NULL; + +static void function_systemd_journal_progress(BUFFER *wb, const char *transaction, const char *progress_id) { + if(!progress_id || !(*progress_id)) { + netdata_mutex_lock(&stdout_mutex); + pluginsd_function_json_error_to_stdout(transaction, HTTP_RESP_BAD_REQUEST, "missing progress id"); + netdata_mutex_unlock(&stdout_mutex); + return; + } + + const DICTIONARY_ITEM *item = dictionary_get_and_acquire_item(function_query_status_dict, progress_id); + + if(!item) { + 
netdata_mutex_lock(&stdout_mutex); + pluginsd_function_json_error_to_stdout(transaction, HTTP_RESP_NOT_FOUND, "progress id is not found here"); + netdata_mutex_unlock(&stdout_mutex); + return; + } + + FUNCTION_QUERY_STATUS *fqs = dictionary_acquired_item_value(item); + + usec_t now_monotonic_ut = now_monotonic_usec(); + if(now_monotonic_ut + 10 * USEC_PER_SEC > fqs->stop_monotonic_ut) + fqs->stop_monotonic_ut = now_monotonic_ut + 10 * USEC_PER_SEC; + + usec_t duration_ut = now_monotonic_ut - fqs->started_monotonic_ut; + + size_t files_matched = fqs->files_matched; + size_t file_working = fqs->file_working; + if(file_working > files_matched) + files_matched = file_working; + + size_t rows_read = __atomic_load_n(&fqs->rows_read, __ATOMIC_RELAXED); + size_t bytes_read = __atomic_load_n(&fqs->bytes_read, __ATOMIC_RELAXED); - BUFFER *wb = buffer_create(0, NULL); buffer_flush(wb); - buffer_json_initialize(wb, "\"", "\"", 0, true, BUFFER_JSON_OPTIONS_NEWLINE_ON_ARRAY_ITEMS); + buffer_json_initialize(wb, "\"", "\"", 0, true, BUFFER_JSON_OPTIONS_MINIFY); + buffer_json_member_add_uint64(wb, "status", HTTP_RESP_OK); + buffer_json_member_add_string(wb, "type", "table"); + buffer_json_member_add_uint64(wb, "running_duration_usec", duration_ut); + buffer_json_member_add_double(wb, "progress", (double)file_working * 100.0 / (double)files_matched); + char msg[1024 + 1]; + snprintfz(msg, 1024, + "Read %zu rows (%0.0f rows/s), " + "data %0.1f MB (%0.1f MB/s), " + "file %zu of %zu", + rows_read, (double)rows_read / (double)duration_ut * (double)USEC_PER_SEC, + (double)bytes_read / 1024.0 / 1024.0, ((double)bytes_read / (double)duration_ut * (double)USEC_PER_SEC) / 1024.0 / 1024.0, + file_working, files_matched + ); + buffer_json_member_add_string(wb, "message", msg); + buffer_json_finalize(wb); + + netdata_mutex_lock(&stdout_mutex); + pluginsd_function_result_to_stdout(transaction, HTTP_RESP_OK, "application/json", now_realtime_sec() + 1, wb); + netdata_mutex_unlock(&stdout_mutex); - 
FACETS *facets = facets_create(50, 0, FACETS_OPTION_ALL_KEYS_FTS, + dictionary_acquired_item_release(function_query_status_dict, item); +} + +static void function_systemd_journal(const char *transaction, char *function, int timeout, bool *cancelled) { + fstat_thread_calls = 0; + fstat_thread_cached_responses = 0; + journal_files_registry_update(); + + BUFFER *wb = buffer_create(0, NULL); + buffer_flush(wb); + buffer_json_initialize(wb, "\"", "\"", 0, true, BUFFER_JSON_OPTIONS_MINIFY); + + usec_t now_monotonic_ut = now_monotonic_usec(); + FUNCTION_QUERY_STATUS tmp_fqs = { + .cancelled = cancelled, + .started_monotonic_ut = now_monotonic_ut, + .stop_monotonic_ut = now_monotonic_ut + (timeout * USEC_PER_SEC), + }; + FUNCTION_QUERY_STATUS *fqs = NULL; + const DICTIONARY_ITEM *fqs_item = NULL; + + FACETS *facets = facets_create(50, FACETS_OPTION_ALL_KEYS_FTS, SYSTEMD_ALWAYS_VISIBLE_KEYS, SYSTEMD_KEYS_INCLUDED_IN_FACETS, SYSTEMD_KEYS_EXCLUDED_FROM_FACETS); + facets_accepted_param(facets, JOURNAL_PARAMETER_INFO); + facets_accepted_param(facets, JOURNAL_PARAMETER_SOURCE); facets_accepted_param(facets, JOURNAL_PARAMETER_AFTER); facets_accepted_param(facets, JOURNAL_PARAMETER_BEFORE); facets_accepted_param(facets, JOURNAL_PARAMETER_ANCHOR); + facets_accepted_param(facets, JOURNAL_PARAMETER_DIRECTION); facets_accepted_param(facets, JOURNAL_PARAMETER_LAST); facets_accepted_param(facets, JOURNAL_PARAMETER_QUERY); + facets_accepted_param(facets, JOURNAL_PARAMETER_FACETS); + facets_accepted_param(facets, JOURNAL_PARAMETER_HISTOGRAM); + facets_accepted_param(facets, JOURNAL_PARAMETER_IF_MODIFIED_SINCE); + facets_accepted_param(facets, JOURNAL_PARAMETER_DATA_ONLY); + facets_accepted_param(facets, JOURNAL_PARAMETER_ID); + facets_accepted_param(facets, JOURNAL_PARAMETER_PROGRESS); + facets_accepted_param(facets, JOURNAL_PARAMETER_DELTA); + facets_accepted_param(facets, JOURNAL_PARAMETER_TAIL); + +#ifdef HAVE_SD_JOURNAL_RESTART_FIELDS + facets_accepted_param(facets, 
JOURNAL_PARAMETER_SLICE); +#endif // HAVE_SD_JOURNAL_RESTART_FIELDS // register the fields in the order you want them on the dashboard - facets_register_dynamic_key(facets, "ND_JOURNAL_PROCESS", FACET_KEY_OPTION_NO_FACET|FACET_KEY_OPTION_VISIBLE|FACET_KEY_OPTION_FTS, - systemd_journal_dynamic_row_id, NULL); + facets_register_row_severity(facets, syslog_priority_to_facet_severity, NULL); + + facets_register_key_name(facets, "_HOSTNAME", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_VISIBLE | FACET_KEY_OPTION_FTS); + + facets_register_dynamic_key_name(facets, JOURNAL_KEY_ND_JOURNAL_PROCESS, + FACET_KEY_OPTION_NEVER_FACET | FACET_KEY_OPTION_VISIBLE | FACET_KEY_OPTION_FTS, + netdata_systemd_journal_dynamic_row_id, NULL); + + facets_register_key_name(facets, "MESSAGE", + FACET_KEY_OPTION_NEVER_FACET | FACET_KEY_OPTION_MAIN_TEXT | + FACET_KEY_OPTION_VISIBLE | FACET_KEY_OPTION_FTS); + +// facets_register_dynamic_key_name(facets, "MESSAGE", +// FACET_KEY_OPTION_NEVER_FACET | FACET_KEY_OPTION_MAIN_TEXT | FACET_KEY_OPTION_RICH_TEXT | +// FACET_KEY_OPTION_VISIBLE | FACET_KEY_OPTION_FTS, +// netdata_systemd_journal_rich_message, NULL); - facets_register_key(facets, "MESSAGE", - FACET_KEY_OPTION_NO_FACET|FACET_KEY_OPTION_MAIN_TEXT|FACET_KEY_OPTION_VISIBLE|FACET_KEY_OPTION_FTS); + facets_register_key_name_transformation(facets, "PRIORITY", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_priority, NULL); - facets_register_key_transformation(facets, "PRIORITY", FACET_KEY_OPTION_FACET|FACET_KEY_OPTION_FTS, - systemd_journal_transform_priority, NULL); + facets_register_key_name_transformation(facets, "SYSLOG_FACILITY", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_syslog_facility, NULL); - facets_register_key_transformation(facets, "SYSLOG_FACILITY", FACET_KEY_OPTION_FACET|FACET_KEY_OPTION_FTS, - systemd_journal_transform_syslog_facility, 
NULL); + facets_register_key_name_transformation(facets, "ERRNO", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_errno, NULL); - facets_register_key(facets, "SYSLOG_IDENTIFIER", FACET_KEY_OPTION_FACET|FACET_KEY_OPTION_FTS); - facets_register_key(facets, "UNIT", FACET_KEY_OPTION_FACET|FACET_KEY_OPTION_FTS); - facets_register_key(facets, "USER_UNIT", FACET_KEY_OPTION_FACET|FACET_KEY_OPTION_FTS); + facets_register_key_name(facets, JOURNAL_KEY_ND_JOURNAL_FILE, + FACET_KEY_OPTION_NEVER_FACET); - facets_register_key_transformation(facets, "_UID", FACET_KEY_OPTION_FACET|FACET_KEY_OPTION_FTS, - systemd_journal_transform_uid, uids); + facets_register_key_name(facets, "SYSLOG_IDENTIFIER", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS); - facets_register_key_transformation(facets, "_GID", FACET_KEY_OPTION_FACET|FACET_KEY_OPTION_FTS, - systemd_journal_transform_gid, gids); + facets_register_key_name(facets, "UNIT", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS); + facets_register_key_name(facets, "USER_UNIT", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS); + + facets_register_key_name_transformation(facets, "_BOOT_ID", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_boot_id, NULL); + + facets_register_key_name_transformation(facets, "_SYSTEMD_OWNER_UID", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_uid, NULL); + + facets_register_key_name_transformation(facets, "_UID", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_uid, NULL); + + facets_register_key_name_transformation(facets, "OBJECT_SYSTEMD_OWNER_UID", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_uid, NULL); + + facets_register_key_name_transformation(facets, 
"OBJECT_UID", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_uid, NULL); + + facets_register_key_name_transformation(facets, "_GID", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_gid, NULL); + + facets_register_key_name_transformation(facets, "OBJECT_GID", + FACET_KEY_OPTION_FACET | FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_gid, NULL); + + facets_register_key_name_transformation(facets, "_CAP_EFFECTIVE", + FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_cap_effective, NULL); + + facets_register_key_name_transformation(facets, "_AUDIT_LOGINUID", + FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_uid, NULL); + + facets_register_key_name_transformation(facets, "OBJECT_AUDIT_LOGINUID", + FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_uid, NULL); + + facets_register_key_name_transformation(facets, "_SOURCE_REALTIME_TIMESTAMP", + FACET_KEY_OPTION_FTS | FACET_KEY_OPTION_TRANSFORM_VIEW, + netdata_systemd_journal_transform_timestamp_usec, NULL); + + // ------------------------------------------------------------------------ + // parse the parameters + + bool info = false, data_only = false, progress = false, slice = JOURNAL_DEFAULT_SLICE_MODE, delta = false, tail = false; time_t after_s = 0, before_s = 0; usec_t anchor = 0; + usec_t if_modified_since = 0; size_t last = 0; + FACETS_ANCHOR_DIRECTION direction = JOURNAL_DEFAULT_DIRECTION; const char *query = NULL; + const char *chart = NULL; + const char *source = NULL; + const char *progress_id = NULL; + SD_JOURNAL_FILE_SOURCE_TYPE source_type = SDJF_ALL; + size_t filters = 0; - buffer_json_member_add_object(wb, "request"); - buffer_json_member_add_object(wb, "filters"); + 
buffer_json_member_add_object(wb, "_request"); + char *words[SYSTEMD_JOURNAL_MAX_PARAMS] = { NULL }; + size_t num_words = quoted_strings_splitter_pluginsd(function, words, SYSTEMD_JOURNAL_MAX_PARAMS); for(int i = 1; i < SYSTEMD_JOURNAL_MAX_PARAMS ;i++) { - const char *keyword = get_word(words, num_words, i); + char *keyword = get_word(words, num_words, i); if(!keyword) break; if(strcmp(keyword, JOURNAL_PARAMETER_HELP) == 0) { - systemd_journal_function_help(transaction); + netdata_systemd_journal_function_help(transaction); goto cleanup; } - else if(strncmp(keyword, JOURNAL_PARAMETER_AFTER ":", strlen(JOURNAL_PARAMETER_AFTER ":")) == 0) { - after_s = str2l(&keyword[strlen(JOURNAL_PARAMETER_AFTER ":")]); + else if(strcmp(keyword, JOURNAL_PARAMETER_INFO) == 0) { + info = true; } - else if(strncmp(keyword, JOURNAL_PARAMETER_BEFORE ":", strlen(JOURNAL_PARAMETER_BEFORE ":")) == 0) { - before_s = str2l(&keyword[strlen(JOURNAL_PARAMETER_BEFORE ":")]); + else if(strcmp(keyword, JOURNAL_PARAMETER_PROGRESS) == 0) { + progress = true; } - else if(strncmp(keyword, JOURNAL_PARAMETER_ANCHOR ":", strlen(JOURNAL_PARAMETER_ANCHOR ":")) == 0) { - anchor = str2ull(&keyword[strlen(JOURNAL_PARAMETER_ANCHOR ":")], NULL); + else if(strncmp(keyword, JOURNAL_PARAMETER_DELTA ":", sizeof(JOURNAL_PARAMETER_DELTA ":") - 1) == 0) { + char *v = &keyword[sizeof(JOURNAL_PARAMETER_DELTA ":") - 1]; + + if(strcmp(v, "false") == 0 || strcmp(v, "no") == 0 || strcmp(v, "0") == 0) + delta = false; + else + delta = true; } - else if(strncmp(keyword, JOURNAL_PARAMETER_LAST ":", strlen(JOURNAL_PARAMETER_LAST ":")) == 0) { - last = str2ul(&keyword[strlen(JOURNAL_PARAMETER_LAST ":")]); + else if(strncmp(keyword, JOURNAL_PARAMETER_TAIL ":", sizeof(JOURNAL_PARAMETER_TAIL ":") - 1) == 0) { + char *v = &keyword[sizeof(JOURNAL_PARAMETER_TAIL ":") - 1]; + + if(strcmp(v, "false") == 0 || strcmp(v, "no") == 0 || strcmp(v, "0") == 0) + tail = false; + else + tail = true; } - else if(strncmp(keyword, 
JOURNAL_PARAMETER_QUERY ":", strlen(JOURNAL_PARAMETER_QUERY ":")) == 0) { - query= &keyword[strlen(JOURNAL_PARAMETER_QUERY ":")]; + else if(strncmp(keyword, JOURNAL_PARAMETER_DATA_ONLY ":", sizeof(JOURNAL_PARAMETER_DATA_ONLY ":") - 1) == 0) { + char *v = &keyword[sizeof(JOURNAL_PARAMETER_DATA_ONLY ":") - 1]; + + if(strcmp(v, "false") == 0 || strcmp(v, "no") == 0 || strcmp(v, "0") == 0) + data_only = false; + else + data_only = true; + } + else if(strncmp(keyword, JOURNAL_PARAMETER_SLICE ":", sizeof(JOURNAL_PARAMETER_SLICE ":") - 1) == 0) { + char *v = &keyword[sizeof(JOURNAL_PARAMETER_SLICE ":") - 1]; + + if(strcmp(v, "false") == 0 || strcmp(v, "no") == 0 || strcmp(v, "0") == 0) + slice = false; + else + slice = true; + } + else if(strncmp(keyword, JOURNAL_PARAMETER_ID ":", sizeof(JOURNAL_PARAMETER_ID ":") - 1) == 0) { + char *id = &keyword[sizeof(JOURNAL_PARAMETER_ID ":") - 1]; + + if(*id) + progress_id = id; + } + else if(strncmp(keyword, JOURNAL_PARAMETER_SOURCE ":", sizeof(JOURNAL_PARAMETER_SOURCE ":") - 1) == 0) { + source = &keyword[sizeof(JOURNAL_PARAMETER_SOURCE ":") - 1]; + + if(strcmp(source, SDJF_SOURCE_ALL_NAME) == 0) { + source_type = SDJF_ALL; + source = NULL; + } + else if(strcmp(source, SDJF_SOURCE_LOCAL_NAME) == 0) { + source_type = SDJF_LOCAL; + source = NULL; + } + else if(strcmp(source, SDJF_SOURCE_REMOTES_NAME) == 0) { + source_type = SDJF_REMOTE; + source = NULL; + } + else if(strcmp(source, SDJF_SOURCE_NAMESPACES_NAME) == 0) { + source_type = SDJF_NAMESPACE; + source = NULL; + } + else if(strcmp(source, SDJF_SOURCE_LOCAL_SYSTEM_NAME) == 0) { + source_type = SDJF_LOCAL | SDJF_SYSTEM; + source = NULL; + } + else if(strcmp(source, SDJF_SOURCE_LOCAL_USERS_NAME) == 0) { + source_type = SDJF_LOCAL | SDJF_USER; + source = NULL; + } + else if(strcmp(source, SDJF_SOURCE_LOCAL_OTHER_NAME) == 0) { + source_type = SDJF_LOCAL | SDJF_OTHER; + source = NULL; + } + else { + source_type = SDJF_ALL; + // else, match the source, whatever it is + } + } + else 
if(strncmp(keyword, JOURNAL_PARAMETER_AFTER ":", sizeof(JOURNAL_PARAMETER_AFTER ":") - 1) == 0) { + after_s = str2l(&keyword[sizeof(JOURNAL_PARAMETER_AFTER ":") - 1]); + } + else if(strncmp(keyword, JOURNAL_PARAMETER_BEFORE ":", sizeof(JOURNAL_PARAMETER_BEFORE ":") - 1) == 0) { + before_s = str2l(&keyword[sizeof(JOURNAL_PARAMETER_BEFORE ":") - 1]); + } + else if(strncmp(keyword, JOURNAL_PARAMETER_IF_MODIFIED_SINCE ":", sizeof(JOURNAL_PARAMETER_IF_MODIFIED_SINCE ":") - 1) == 0) { + if_modified_since = str2ull(&keyword[sizeof(JOURNAL_PARAMETER_IF_MODIFIED_SINCE ":") - 1], NULL); + } + else if(strncmp(keyword, JOURNAL_PARAMETER_ANCHOR ":", sizeof(JOURNAL_PARAMETER_ANCHOR ":") - 1) == 0) { + anchor = str2ull(&keyword[sizeof(JOURNAL_PARAMETER_ANCHOR ":") - 1], NULL); + } + else if(strncmp(keyword, JOURNAL_PARAMETER_DIRECTION ":", sizeof(JOURNAL_PARAMETER_DIRECTION ":") - 1) == 0) { + direction = strcasecmp(&keyword[sizeof(JOURNAL_PARAMETER_DIRECTION ":") - 1], "forward") == 0 ? FACETS_ANCHOR_DIRECTION_FORWARD : FACETS_ANCHOR_DIRECTION_BACKWARD; + } + else if(strncmp(keyword, JOURNAL_PARAMETER_LAST ":", sizeof(JOURNAL_PARAMETER_LAST ":") - 1) == 0) { + last = str2ul(&keyword[sizeof(JOURNAL_PARAMETER_LAST ":") - 1]); + } + else if(strncmp(keyword, JOURNAL_PARAMETER_QUERY ":", sizeof(JOURNAL_PARAMETER_QUERY ":") - 1) == 0) { + query= &keyword[sizeof(JOURNAL_PARAMETER_QUERY ":") - 1]; + } + else if(strncmp(keyword, JOURNAL_PARAMETER_HISTOGRAM ":", sizeof(JOURNAL_PARAMETER_HISTOGRAM ":") - 1) == 0) { + chart = &keyword[sizeof(JOURNAL_PARAMETER_HISTOGRAM ":") - 1]; + } + else if(strncmp(keyword, JOURNAL_PARAMETER_FACETS ":", sizeof(JOURNAL_PARAMETER_FACETS ":") - 1) == 0) { + char *value = &keyword[sizeof(JOURNAL_PARAMETER_FACETS ":") - 1]; + if(*value) { + buffer_json_member_add_array(wb, JOURNAL_PARAMETER_FACETS); + + while(value) { + char *sep = strchr(value, ','); + if(sep) + *sep++ = '\0'; + + facets_register_facet_id(facets, value, 
FACET_KEY_OPTION_FACET|FACET_KEY_OPTION_FTS|FACET_KEY_OPTION_REORDER); + buffer_json_add_array_item_string(wb, value); + + value = sep; + } + + buffer_json_array_close(wb); // JOURNAL_PARAMETER_FACETS + } } else { char *value = strchr(keyword, ':'); @@ -412,8 +2437,9 @@ static void function_systemd_journal(const char *transaction, char *function, ch if(sep) *sep++ = '\0'; - facets_register_facet_filter(facets, keyword, value, FACET_KEY_OPTION_REORDER); + facets_register_facet_id_filter(facets, keyword, value, FACET_KEY_OPTION_FACET|FACET_KEY_OPTION_FTS|FACET_KEY_OPTION_REORDER); buffer_json_add_array_item_string(wb, value); + filters++; value = sep; } @@ -423,18 +2449,31 @@ static void function_systemd_journal(const char *transaction, char *function, ch } } - buffer_json_object_close(wb); // filters + // ------------------------------------------------------------------------ + // put this request into the progress db + + if(progress_id && *progress_id) { + fqs_item = dictionary_set_and_acquire_item(function_query_status_dict, progress_id, &tmp_fqs, sizeof(tmp_fqs)); + fqs = dictionary_acquired_item_value(fqs_item); + } + else { + // no progress id given, proceed without registering our progress in the dictionary + fqs = &tmp_fqs; + fqs_item = NULL; + } + + // ------------------------------------------------------------------------ + // validate parameters - time_t expires = now_realtime_sec() + 1; - time_t now_s; + time_t now_s = now_realtime_sec(); + time_t expires = now_s + 1; if(!after_s && !before_s) { - now_s = now_realtime_sec(); before_s = now_s; after_s = before_s - SYSTEMD_JOURNAL_DEFAULT_QUERY_DURATION; } else - rrdr_relative_window_to_absolute(&after_s, &before_s, &now_s, false); + rrdr_relative_window_to_absolute(&after_s, &before_s, now_s); if(after_s > before_s) { time_t tmp = after_s; @@ -448,85 +2487,179 @@ static void function_systemd_journal(const char *transaction, char *function, ch if(!last) last = SYSTEMD_JOURNAL_DEFAULT_ITEMS_PER_QUERY; - 
buffer_json_member_add_time_t(wb, "after", after_s); - buffer_json_member_add_time_t(wb, "before", before_s); - buffer_json_member_add_uint64(wb, "anchor", anchor); - buffer_json_member_add_uint64(wb, "last", last); - buffer_json_member_add_string(wb, "query", query); - buffer_json_member_add_time_t(wb, "timeout", timeout); - buffer_json_object_close(wb); // request - - facets_set_items(facets, last); - facets_set_anchor(facets, anchor); - facets_set_query(facets, query); - int response = systemd_journal_query(wb, facets, after_s * USEC_PER_SEC, before_s * USEC_PER_SEC, - now_monotonic_usec() + (timeout - 1) * USEC_PER_SEC); - if(response != HTTP_RESP_OK) { - pluginsd_function_json_error(transaction, response, "failed"); - goto cleanup; + // ------------------------------------------------------------------------ + // set query time-frame, anchors and direction + + fqs->after_ut = after_s * USEC_PER_SEC; + fqs->before_ut = (before_s * USEC_PER_SEC) + USEC_PER_SEC - 1; + fqs->if_modified_since = if_modified_since; + fqs->data_only = data_only; + fqs->delta = (fqs->data_only) ? delta : false; + fqs->tail = (fqs->data_only && fqs->if_modified_since) ? tail : false; + fqs->source = string_strdupz(source); + fqs->source_type = source_type; + fqs->entries = last; + fqs->last_modified = 0; + fqs->filters = filters; + fqs->query = (query && *query) ? query : NULL; + fqs->histogram = (chart && *chart) ? 
chart : NULL; + fqs->direction = direction; + fqs->anchor.start_ut = anchor; + fqs->anchor.stop_ut = 0; + + if(fqs->anchor.start_ut && fqs->tail) { + // a tail request + // we need the top X entries from BEFORE + // but, we need to calculate the facets and the + // histogram up to the anchor + fqs->direction = direction = FACETS_ANCHOR_DIRECTION_BACKWARD; + fqs->anchor.start_ut = 0; + fqs->anchor.stop_ut = anchor; } - pluginsd_function_result_begin_to_stdout(transaction, HTTP_RESP_OK, "application/json", expires); - fwrite(buffer_tostring(wb), buffer_strlen(wb), 1, stdout); + if(anchor && anchor < fqs->after_ut) { + log_fqs(fqs, "received anchor is too small for query timeframe, ignoring anchor"); + anchor = 0; + fqs->anchor.start_ut = 0; + fqs->anchor.stop_ut = 0; + fqs->direction = direction = FACETS_ANCHOR_DIRECTION_BACKWARD; + } + else if(anchor > fqs->before_ut) { + log_fqs(fqs, "received anchor is too big for query timeframe, ignoring anchor"); + anchor = 0; + fqs->anchor.start_ut = 0; + fqs->anchor.stop_ut = 0; + fqs->direction = direction = FACETS_ANCHOR_DIRECTION_BACKWARD; + } - pluginsd_function_result_end_to_stdout(); + facets_set_anchor(facets, fqs->anchor.start_ut, fqs->anchor.stop_ut, fqs->direction); -cleanup: - facets_destroy(facets); - buffer_free(wb); -} + facets_set_additional_options(facets, + ((fqs->data_only) ? FACETS_OPTION_DATA_ONLY : 0) | + ((fqs->delta) ? 
FACETS_OPTION_SHOW_DELTAS : 0)); -static void *reader_main(void *arg __maybe_unused) { - char buffer[PLUGINSD_LINE_MAX + 1]; + // ------------------------------------------------------------------------ + // set the rest of the query parameters - char *s = NULL; - while(!plugin_should_exit && (s = fgets(buffer, PLUGINSD_LINE_MAX, stdin))) { - char *words[PLUGINSD_MAX_WORDS] = { NULL }; - size_t num_words = quoted_strings_splitter_pluginsd(buffer, words, PLUGINSD_MAX_WORDS); + facets_set_items(facets, fqs->entries); + facets_set_query(facets, fqs->query); - const char *keyword = get_word(words, num_words, 0); +#ifdef HAVE_SD_JOURNAL_RESTART_FIELDS + fqs->slice = slice; + if(slice) + facets_enable_slice_mode(facets); +#else + fqs->slice = false; +#endif - if(keyword && strcmp(keyword, PLUGINSD_KEYWORD_FUNCTION) == 0) { - char *transaction = get_word(words, num_words, 1); - char *timeout_s = get_word(words, num_words, 2); - char *function = get_word(words, num_words, 3); + if(fqs->histogram) + facets_set_timeframe_and_histogram_by_id(facets, fqs->histogram, fqs->after_ut, fqs->before_ut); + else + facets_set_timeframe_and_histogram_by_name(facets, "PRIORITY", fqs->after_ut, fqs->before_ut); - if(!transaction || !*transaction || !timeout_s || !*timeout_s || !function || !*function) { - netdata_log_error("Received incomplete %s (transaction = '%s', timeout = '%s', function = '%s'). 
Ignoring it.", - keyword, - transaction?transaction:"(unset)", - timeout_s?timeout_s:"(unset)", - function?function:"(unset)"); - } - else { - int timeout = str2i(timeout_s); - if(timeout <= 0) timeout = SYSTEMD_JOURNAL_DEFAULT_TIMEOUT; - netdata_mutex_lock(&mutex); + // ------------------------------------------------------------------------ + // complete the request object + + buffer_json_member_add_boolean(wb, JOURNAL_PARAMETER_INFO, false); + buffer_json_member_add_boolean(wb, JOURNAL_PARAMETER_SLICE, fqs->slice); + buffer_json_member_add_boolean(wb, JOURNAL_PARAMETER_DATA_ONLY, fqs->data_only); + buffer_json_member_add_boolean(wb, JOURNAL_PARAMETER_PROGRESS, false); + buffer_json_member_add_boolean(wb, JOURNAL_PARAMETER_DELTA, fqs->delta); + buffer_json_member_add_boolean(wb, JOURNAL_PARAMETER_TAIL, fqs->tail); + buffer_json_member_add_string(wb, JOURNAL_PARAMETER_ID, progress_id); + buffer_json_member_add_string(wb, JOURNAL_PARAMETER_SOURCE, string2str(fqs->source)); + buffer_json_member_add_uint64(wb, "source_type", fqs->source_type); + buffer_json_member_add_uint64(wb, JOURNAL_PARAMETER_AFTER, fqs->after_ut / USEC_PER_SEC); + buffer_json_member_add_uint64(wb, JOURNAL_PARAMETER_BEFORE, fqs->before_ut / USEC_PER_SEC); + buffer_json_member_add_uint64(wb, "if_modified_since", fqs->if_modified_since); + buffer_json_member_add_uint64(wb, JOURNAL_PARAMETER_ANCHOR, anchor); + buffer_json_member_add_string(wb, JOURNAL_PARAMETER_DIRECTION, fqs->direction == FACETS_ANCHOR_DIRECTION_FORWARD ? 
"forward" : "backward"); + buffer_json_member_add_uint64(wb, JOURNAL_PARAMETER_LAST, fqs->entries); + buffer_json_member_add_string(wb, JOURNAL_PARAMETER_QUERY, fqs->query); + buffer_json_member_add_string(wb, JOURNAL_PARAMETER_HISTOGRAM, fqs->histogram); + buffer_json_object_close(wb); // request - if(strncmp(function, SYSTEMD_JOURNAL_FUNCTION_NAME, strlen(SYSTEMD_JOURNAL_FUNCTION_NAME)) == 0) - function_systemd_journal(transaction, function, buffer, PLUGINSD_LINE_MAX + 1, timeout); - else - pluginsd_function_json_error(transaction, HTTP_RESP_NOT_FOUND, "No function with this name found in systemd-journal.plugin."); + buffer_json_journal_versions(wb); - fflush(stdout); - netdata_mutex_unlock(&mutex); + // ------------------------------------------------------------------------ + // run the request + + int response; + + if(info) { + facets_accepted_parameters_to_json_array(facets, wb, false); + buffer_json_member_add_array(wb, "required_params"); + { + buffer_json_add_array_item_object(wb); + { + buffer_json_member_add_string(wb, "id", "source"); + buffer_json_member_add_string(wb, "name", "source"); + buffer_json_member_add_string(wb, "help", "Select the SystemD Journal source to query"); + buffer_json_member_add_string(wb, "type", "select"); + buffer_json_member_add_array(wb, "options"); + { + available_journal_file_sources_to_json_array(wb); + } + buffer_json_array_close(wb); // options array } + buffer_json_object_close(wb); // required params object } - else - netdata_log_error("Received unknown command: %s", keyword?keyword:"(unset)"); + buffer_json_array_close(wb); // required_params array + + facets_table_config(wb); + + buffer_json_member_add_uint64(wb, "status", HTTP_RESP_OK); + buffer_json_member_add_string(wb, "type", "table"); + buffer_json_member_add_string(wb, "help", SYSTEMD_JOURNAL_FUNCTION_DESCRIPTION); + buffer_json_finalize(wb); + response = HTTP_RESP_OK; + goto output; } - if(!s || feof(stdin) || ferror(stdin)) { - plugin_should_exit = true; - 
netdata_log_error("Received error on stdin."); + if(progress) { + function_systemd_journal_progress(wb, transaction, progress_id); + goto cleanup; } - exit(1); + response = netdata_systemd_journal_query(wb, facets, fqs); + + // ------------------------------------------------------------------------ + // cleanup query params + + string_freez(fqs->source); + fqs->source = NULL; + + // ------------------------------------------------------------------------ + // handle error response + + if(response != HTTP_RESP_OK) { + netdata_mutex_lock(&stdout_mutex); + pluginsd_function_json_error_to_stdout(transaction, response, "failed"); + netdata_mutex_unlock(&stdout_mutex); + goto cleanup; + } + +output: + netdata_mutex_lock(&stdout_mutex); + pluginsd_function_result_to_stdout(transaction, response, "application/json", expires, wb); + netdata_mutex_unlock(&stdout_mutex); + +cleanup: + facets_destroy(facets); + buffer_free(wb); + + if(fqs_item) { + dictionary_del(function_query_status_dict, dictionary_acquired_item_name(fqs_item)); + dictionary_acquired_item_release(function_query_status_dict, fqs_item); + dictionary_garbage_collect(function_query_status_dict); + } } +// ---------------------------------------------------------------------------- + int main(int argc __maybe_unused, char **argv __maybe_unused) { stderror = stderr; clocks_init(); @@ -540,44 +2673,104 @@ int main(int argc __maybe_unused, char **argv __maybe_unused) { error_log_errors_per_period = 100; error_log_throttle_period = 3600; - // initialize the threads - netdata_threads_init_for_external_plugins(0); // set the default threads stack size here + log_set_global_severity_for_external_plugins(); + + netdata_configured_host_prefix = getenv("NETDATA_HOST_PREFIX"); + if(verify_netdata_host_prefix() == -1) exit(1); + + // ------------------------------------------------------------------------ + // setup the journal directories + + unsigned d = 0; + + journal_directories[d++].path = strdupz("/var/log/journal"); 
+ journal_directories[d++].path = strdupz("/run/log/journal"); + + if(*netdata_configured_host_prefix) { + char path[PATH_MAX]; + snprintfz(path, sizeof(path), "%s/var/log/journal", netdata_configured_host_prefix); + journal_directories[d++].path = strdupz(path); + snprintfz(path, sizeof(path), "%s/run/log/journal", netdata_configured_host_prefix); + journal_directories[d++].path = strdupz(path); + } + + // terminate the list + journal_directories[d].path = NULL; + + // ------------------------------------------------------------------------ + + function_query_status_dict = dictionary_create_advanced( + DICT_OPTION_DONT_OVERWRITE_VALUE | DICT_OPTION_FIXED_SIZE, + NULL, sizeof(FUNCTION_QUERY_STATUS)); + + // ------------------------------------------------------------------------ + // initialize the used hashes files registry + + used_hashes_registry = dictionary_create(DICT_OPTION_DONT_OVERWRITE_VALUE); + + + // ------------------------------------------------------------------------ + // initialize the journal files registry + + systemd_journal_session = (now_realtime_usec() / USEC_PER_SEC) * USEC_PER_SEC; + + journal_files_registry = dictionary_create_advanced( + DICT_OPTION_DONT_OVERWRITE_VALUE | DICT_OPTION_FIXED_SIZE, + NULL, sizeof(struct journal_file)); + + dictionary_register_insert_callback(journal_files_registry, files_registry_insert_cb, NULL); + dictionary_register_delete_callback(journal_files_registry, files_registry_delete_cb, NULL); + dictionary_register_conflict_callback(journal_files_registry, files_registry_conflict_cb, NULL); + + boot_ids_to_first_ut = dictionary_create_advanced( + DICT_OPTION_DONT_OVERWRITE_VALUE | DICT_OPTION_FIXED_SIZE, + NULL, sizeof(usec_t)); + + journal_files_registry_update(); - uids = dictionary_create(0); - gids = dictionary_create(0); // ------------------------------------------------------------------------ // debug if(argc == 2 && strcmp(argv[1], "debug") == 0) { - char buf[] = "systemd-journal after:-86400 before:0 
last:500"; - function_systemd_journal("123", buf, "", 0, 30); + bool cancelled = false; + char buf[] = "systemd-journal after:-16000000 before:0 last:1"; + // char buf[] = "systemd-journal after:1695332964 before:1695937764 direction:backward last:100 slice:true source:all DHKucpqUoe1:PtVoyIuX.MU"; + // char buf[] = "systemd-journal after:1694511062 before:1694514662 anchor:1694514122024403"; + function_systemd_journal("123", buf, 600, &cancelled); exit(1); } // ------------------------------------------------------------------------ + // the event loop for functions + + struct functions_evloop_globals *wg = + functions_evloop_init(SYSTEMD_JOURNAL_WORKER_THREADS, "SDJ", &stdout_mutex, &plugin_should_exit); + + functions_evloop_add_function(wg, SYSTEMD_JOURNAL_FUNCTION_NAME, function_systemd_journal, + SYSTEMD_JOURNAL_DEFAULT_TIMEOUT); - netdata_thread_t reader_thread; - netdata_thread_create(&reader_thread, "SDJ_READER", NETDATA_THREAD_OPTION_DONT_LOG, reader_main, NULL); // ------------------------------------------------------------------------ time_t started_t = now_monotonic_sec(); - size_t iteration; + size_t iteration = 0; usec_t step = 1000 * USEC_PER_MS; bool tty = isatty(fileno(stderr)) == 1; - netdata_mutex_lock(&mutex); + netdata_mutex_lock(&stdout_mutex); fprintf(stdout, PLUGINSD_KEYWORD_FUNCTION " GLOBAL \"%s\" %d \"%s\"\n", SYSTEMD_JOURNAL_FUNCTION_NAME, SYSTEMD_JOURNAL_DEFAULT_TIMEOUT, SYSTEMD_JOURNAL_FUNCTION_DESCRIPTION); heartbeat_t hb; heartbeat_init(&hb); - for(iteration = 0; 1 ; iteration++) { - netdata_mutex_unlock(&mutex); + while(!plugin_should_exit) { + iteration++; + + netdata_mutex_unlock(&stdout_mutex); heartbeat_next(&hb, step); - netdata_mutex_lock(&mutex); + netdata_mutex_lock(&stdout_mutex); if(!tty) fprintf(stdout, "\n"); @@ -589,8 +2782,5 @@ int main(int argc __maybe_unused, char **argv __maybe_unused) { break; } - dictionary_destroy(uids); - dictionary_destroy(gids); - exit(0); } diff --git a/collectors/tc.plugin/README.md 
b/collectors/tc.plugin/README.md index de5fd4743..2a20ff262 100644..120000 --- a/collectors/tc.plugin/README.md +++ b/collectors/tc.plugin/README.md @@ -1,209 +1 @@ -<!-- -title: "tc.plugin" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/tc.plugin/README.md" -sidebar_label: "tc.plugin" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Networking" ---> - -# tc.plugin - -Live demo - **[see it in action here](https://registry.my-netdata.io/#menu_tc)** ! - -![qos](https://cloud.githubusercontent.com/assets/2662304/14439411/b7f36254-0033-11e6-93f0-c739bb6a1c3a.gif) - -Netdata monitors `tc` QoS classes for all interfaces. - -If you also use [FireQOS](http://firehol.org/tutorial/fireqos-new-user/) it will collect interface and class names. - -There is a [shell helper](https://raw.githubusercontent.com/netdata/netdata/master/collectors/tc.plugin/tc-qos-helper.sh.in) for this (all parsing is done by the plugin in `C` code - this shell script is just a configuration for the command to run to get `tc` output). - -The source of the tc plugin is [here](https://raw.githubusercontent.com/netdata/netdata/master/collectors/tc.plugin/plugin_tc.c). It is somewhat complex, because a state machine was needed to keep track of all the `tc` classes, including the pseudo classes tc dynamically creates. - -## Motivation - -One category of metrics missing in Linux monitoring, is bandwidth consumption for each open socket (inbound and outbound traffic). So, you cannot tell how much bandwidth your web server, your database server, your backup, your ssh sessions, etc are using. - -To solve this problem, the most *adventurous* Linux monitoring tools install kernel modules to capture all traffic, analyze it and provide reports per application. A lot of work, CPU intensive and with a great degree of risk (due to the kernel modules involved which might affect the stability of the whole system). 
Not to mention that such solutions are probably better suited for a core linux router in your network. - -Others use NFACCT, the netfilter accounting module which is already part of the Linux firewall. However, this would require configuring a firewall on every system you want to measure bandwidth (just FYI, I do install a firewall on every server - and I strongly advise you to do so too - but configuring accounting on all servers seems overkill when you don't really need it for billing purposes). - -**There is however a much simpler approach**. - -## QoS - -One of the features the Linux kernel has, but it is rarely used, is its ability to **apply QoS on traffic**. Even most interesting is that it can apply QoS to **both inbound and outbound traffic**. - -QoS is about 2 features: - -1. **Classify traffic** - - Classification is the process of organizing traffic in groups, called **classes**. Classification can evaluate every aspect of network packets, like source and destination ports, source and destination IPs, netfilter marks, etc. - - When you classify traffic, you just assign a label to it. Of course classes have some properties themselves (like queuing mechanisms), but let's say it is that simple: **a label**. For example **I call `web server` traffic, the traffic from my server's tcp/80, tcp/443 and to my server's tcp/80, tcp/443, while I call `web surfing` all other tcp/80 and tcp/443 traffic**. You can use any combinations you like. There is no limit. - -2. **Apply traffic shaping rules to these classes** - - Traffic shaping is used to control how network interface bandwidth should be shared among the classes. Normally, you need to do this, when there is not enough bandwidth to satisfy all the demand, or when you want to control the supply of bandwidth to certain services. Of course classification is sufficient for monitoring traffic, but traffic shaping is also quite important, as we will explain in the next section. - -## Why you want QoS - -1. 
**Monitoring the bandwidth used by services** - - Netdata provides wonderful real-time charts, like this one (wait to see the orange `rsync` part): - - ![qos3](https://cloud.githubusercontent.com/assets/2662304/14474189/713ede84-0104-11e6-8c9c-8dca5c2abd63.gif) - -2. **Ensure sensitive administrative tasks will not starve for bandwidth** - - Have you tried to ssh to a server when the network is congested? If you have, you already know it does not work very well. QoS can guarantee that services like ssh, dns, ntp, etc will always have a small supply of bandwidth. So, no matter what happens, you will be able to ssh to your server and DNS will always work. - -3. **Ensure administrative tasks will not monopolize all the bandwidth** - - Services like backups, file copies, database dumps, etc can easily monopolize all the available bandwidth. It is common for example a nightly backup, or a huge file transfer to negatively influence the end-user experience. QoS can fix that. - -4. **Ensure each end-user connection will get a fair cut of the available bandwidth.** - - Several QoS queuing disciplines in Linux do this automatically, without any configuration from you. The result is that new sockets are favored over older ones, so that users will get a snappier experience, while others are transferring large amounts of traffic. - -5. **Protect the servers from DDoS attacks.** - - When your system is under a DDoS attack, it will get a lot more bandwidth compared to the one it can handle and probably your applications will crash. Setting a limit on the inbound traffic using QoS, will protect your servers (throttle the requests) and depending on the size of the attack may allow your legitimate users to access the server, while the attack is taking place. - - Using QoS together with a [SYNPROXY](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md) will provide a great degree of protection against most DDoS attacks. 
Actually when I wrote that article, a few folks tried to DDoS the Netdata demo site to see in real-time the SYNPROXY operation. They did not do it right, but anyway a great deal of requests reached the Netdata server. What saved Netdata was QoS. The Netdata demo server has QoS installed, so the requests were throttled and the server did not even reach the point of resource starvation. Read about it [here](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md). - -On top of all these, QoS is extremely light. You will configure it once, and this is it. It will not bother you again and it will not use any noticeable CPU resources, especially on application and database servers. - -``` -- ensure administrative tasks (like ssh, dns, etc) will always have a small but guaranteed bandwidth. So, no matter what happens, I will be able to ssh to my server and DNS will work. - -- ensure other administrative tasks will not monopolize all the available bandwidth. So, my nightly backup will not hurt my users, a developer that is copying files over the net will not get all the available bandwidth, etc. - -- ensure each end-user connection will get a fair cut of the available bandwidth. -``` - -Once **traffic classification** is applied, we can use **[netdata](https://github.com/netdata/netdata)** to visualize the bandwidth consumption per class in real-time (no configuration is needed for Netdata - it will figure it out). - -QoS, is extremely light. You will configure it once, and this is it. It will not bother you again and it will not use any noticeable CPU resources, especially on application and database servers. - -This is QoS from a home linux router. Check these features: - -1. It is real-time (per second updates) -2. QoS really works in Linux - check that the `background` traffic is squeezed when `surfing` needs it. 
- -![test2](https://cloud.githubusercontent.com/assets/2662304/14093004/68966020-f553-11e5-98fe-ffee2086fafd.gif) - ---- - -## QoS in Linux? - -Of course, `tc` is probably **the most undocumented, complicated and unfriendly** command in Linux. - -For example, do you know that for matching a simple port range in `tc`, e.g. all the high ports, from 1025 to 65535 inclusive, you have to match these: - -``` -1025/0xffff -1026/0xfffe -1028/0xfffc -1032/0xfff8 -1040/0xfff0 -1056/0xffe0 -1088/0xffc0 -1152/0xff80 -1280/0xff00 -1536/0xfe00 -2048/0xf800 -4096/0xf000 -8192/0xe000 -16384/0xc000 -32768/0x8000 -``` - -To do it the hard way, you can go through the [tc configuration steps](#qos-configuration-with-tc). An easier way is to use **[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)**, a tool that simplifies QoS management in Linux. - -## Qos Configuration with FireHOL - -The **[FireHOL](https://firehol.org/)** package already distributes **[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)**. Check the **[FireQOS tutorial](https://firehol.org/tutorial/fireqos-new-user/)** to learn how to write your own QoS configuration. - -With **[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)**, it is **really simple for everyone to use QoS in Linux**. Just install the package `firehol`. It should already be available for your distribution. If not, check the **[FireHOL Installation Guide](https://firehol.org/installing/)**. 
After that, you will have the `fireqos` command which uses a configuration like the following `/etc/firehol/fireqos.conf`, used at the Netdata demo site: - -```sh - # configure the Netdata ports - server_netdata_ports="tcp/19999" - - interface eth0 world bidirectional ethernet balanced rate 50Mbit - class arp - match arp - - class icmp - match icmp - - class dns commit 1Mbit - server dns - client dns - - class ntp - server ntp - client ntp - - class ssh commit 2Mbit - server ssh - client ssh - - class rsync commit 2Mbit max 10Mbit - server rsync - client rsync - - class web_server commit 40Mbit - server http - server netdata - - class client - client surfing - - class nms commit 1Mbit - match input src 10.2.3.5 -``` - -Nothing more is needed. You just run `fireqos start` to apply this configuration, restart Netdata and you have real-time visualization of the bandwidth consumption of your applications. FireQOS is not a daemon. It will just convert the configuration to `tc` commands. It will run them and it will exit. - -**IMPORTANT**: If you copy this configuration to apply it to your system, please adapt the speeds - experiment in non-production environments to learn the tool, before applying it on your servers. - -And this is what you are going to get: - -![image](https://cloud.githubusercontent.com/assets/2662304/14436322/c91d90a4-0024-11e6-9fb1-57cdef1580df.png) - -## QoS Configuration with tc - -First, setup the tc rules in rc.local using commands to assign different QoS markings to different classids. You can see one such example in [github issue #4563](https://github.com/netdata/netdata/issues/4563#issuecomment-455711973). - -Then, map the classids to names by creating `/etc/iproute2/tc_cls`. 
For example: - -``` -2:1 Standard -2:8 LowPriorityData -2:10 HighThroughputData -2:16 OAM -2:18 LowLatencyData -2:24 BroadcastVideo -2:26 MultimediaStreaming -2:32 RealTimeInteractive -2:34 MultimediaConferencing -2:40 Signalling -2:46 Telephony -2:48 NetworkControl -``` - -Add the following configuration option in `/etc/netdata.conf`: - -```\[plugin:tc] - enable show all classes and qdiscs for all interfaces = yes -``` - -Finally, create `/etc/netdata/tc-qos-helper.conf` with this content: -`tc_show="class"` - -Please note, that by default Netdata will enable monitoring metrics only when they are not zero. If they are constantly zero they are ignored. Metrics that will start having values, after Netdata is started, will be detected and charts will be automatically added to the dashboard (a refresh of the dashboard is needed for them to appear though). Set `yes` for a chart instead of `auto` to enable it permanently. You can also set the `enable zero metrics` option to `yes` in the `[global]` section which enables charts with zero metrics for all internal Netdata plugins. - - +integrations/tc_qos_classes.md
\ No newline at end of file diff --git a/collectors/tc.plugin/integrations/tc_qos_classes.md b/collectors/tc.plugin/integrations/tc_qos_classes.md new file mode 100644 index 000000000..2e013fc00 --- /dev/null +++ b/collectors/tc.plugin/integrations/tc_qos_classes.md @@ -0,0 +1,170 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/tc.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/tc.plugin/metadata.yaml" +sidebar_label: "tc QoS classes" +learn_status: "Published" +learn_rel_path: "Data Collection/Linux Systems/Network" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# tc QoS classes + + +<img src="https://netdata.cloud/img/netdata.png" width="150"/> + + +Plugin: tc.plugin +Module: tc.plugin + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Examine tc metrics to gain insights into Linux traffic control operations. Study packet flow rates, queue lengths, and drop rates to optimize network traffic flow. + +The plugin uses `tc` command to collect information about Traffic control. + +This collector is only supported on the following platforms: + +- Linux + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs to access command `tc` to get the necessary metrics. To achieve this netdata modifies permission of file `/usr/libexec/netdata/plugins.d/tc-qos-helper.sh`. + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. 
+ +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per network device direction + +Metrics related to QoS network device directions. Each direction (in/out) produces its own set of the following metrics. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| device | The network interface. | +| device_name | The network interface name | +| group | The device family | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| tc.qos | a dimension per class | kilobits/s | +| tc.qos_packets | a dimension per class | packets/s | +| tc.qos_dropped | a dimension per class | packets/s | +| tc.qos_tokens | a dimension per class | tokens | +| tc.qos_ctokens | a dimension per class | ctokens | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Create `tc-qos-helper.conf` + +In order to view tc classes, you need to create the file `/etc/netdata/tc-qos-helper.conf` with content: + +```conf +tc_show="class" +``` + + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:tc]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). 
+ +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config option</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| script to run to get tc values | Path to script `tc-qos-helper.sh` | usr/libexec/netdata/plugins.d/tc-qos-helper.s | False | +| enable show all classes and qdiscs for all interfaces | yes/no flag to control what data is presented. | yes | False | + +</details> + +#### Examples + +##### Basic + +A basic example configuration using classes defined in `/etc/iproute2/tc_cls`. + +An example of class IDs mapped to names in that file can be: + +```conf +2:1 Standard +2:8 LowPriorityData +2:10 HighThroughputData +2:16 OAM +2:18 LowLatencyData +2:24 BroadcastVideo +2:26 MultimediaStreaming +2:32 RealTimeInteractive +2:34 MultimediaConferencing +2:40 Signalling +2:46 Telephony +2:48 NetworkControl +``` + +You can read more about setting up the tc rules in rc.local in this [GitHub issue](https://github.com/netdata/netdata/issues/4563#issuecomment-455711973). 
+ + +```yaml +[plugin:tc] + script to run to get tc values = /usr/libexec/netdata/plugins.d/tc-qos-helper.sh + enable show all classes and qdiscs for all interfaces = yes + +``` + diff --git a/collectors/tc.plugin/metadata.yaml b/collectors/tc.plugin/metadata.yaml index dcd03e470..f4039a8c5 100644 --- a/collectors/tc.plugin/metadata.yaml +++ b/collectors/tc.plugin/metadata.yaml @@ -36,7 +36,14 @@ modules: description: "" setup: prerequisites: - list: [] + list: + - title: Create `tc-qos-helper.conf` + description: | + In order to view tc classes, you need to create the file `/etc/netdata/tc-qos-helper.conf` with content: + + ```conf + tc_show="class" + ``` configuration: file: name: "netdata.conf" @@ -52,16 +59,42 @@ modules: description: Path to script `tc-qos-helper.sh` default_value: "usr/libexec/netdata/plugins.d/tc-qos-helper.s" required: false + - name: enable show all classes and qdiscs for all interfaces + description: yes/no flag to control what data is presented. + default_value: "yes" + required: false examples: folding: enabled: false title: "Config" list: - name: Basic - description: A basic example configuration. + description: | + A basic example configuration using classes defined in `/etc/iproute2/tc_cls`. + + An example of class IDs mapped to names in that file can be: + + ```conf + 2:1 Standard + 2:8 LowPriorityData + 2:10 HighThroughputData + 2:16 OAM + 2:18 LowLatencyData + 2:24 BroadcastVideo + 2:26 MultimediaStreaming + 2:32 RealTimeInteractive + 2:34 MultimediaConferencing + 2:40 Signalling + 2:46 Telephony + 2:48 NetworkControl + ``` + + You can read more about setting up the tc rules in rc.local in this [GitHub issue](https://github.com/netdata/netdata/issues/4563#issuecomment-455711973). 
+ config: | [plugin:tc] script to run to get tc values = /usr/libexec/netdata/plugins.d/tc-qos-helper.sh + enable show all classes and qdiscs for all interfaces = yes troubleshooting: problems: list: [] diff --git a/collectors/tc.plugin/tc-qos-helper.sh.in b/collectors/tc.plugin/tc-qos-helper.sh.in index 97d4d016d..0fab69eef 100755 --- a/collectors/tc.plugin/tc-qos-helper.sh.in +++ b/collectors/tc.plugin/tc-qos-helper.sh.in @@ -291,7 +291,7 @@ while true; do echo "WORKTIME ${LOOPSLEEPMS_LASTWORK}" || exit - loopsleepms ${update_every} + loopsleepms "${update_every}" [ ${gc} -gt ${exit_after} ] && exit 0 done diff --git a/collectors/timex.plugin/README.md b/collectors/timex.plugin/README.md index 6173503b8..89c1bd0d4 100644..120000 --- a/collectors/timex.plugin/README.md +++ b/collectors/timex.plugin/README.md @@ -1,35 +1 @@ -<!-- -title: "timex.plugin" -description: "Monitor the system clock synchronization state." -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/timex.plugin/README.md" -sidebar_label: "timex.plugin" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/System metrics" ---> - -# timex.plugin - -This plugin monitors the system kernel clock synchronization state. - -This plugin creates the following charts: - -- System clock synchronization state according to the system kernel -- System clock status which gives the value of the `time_status` variable in the kernel -- Computed time offset between local system and reference clock - -This is obtained from the information provided by the [ntp_adjtime()](https://man7.org/linux/man-pages/man2/adjtimex.2.html) system call. -An unsynchronized clock may indicate a hardware clock error, or an issue with UTC synchronization. 
- -## Configuration - -Edit the `netdata.conf` configuration file using [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#use-edit-config-to-edit-configuration-files) from the [Netdata config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory), which is typically at `/etc/netdata`. - -Scroll down to the `[plugin:timex]` section to find the available options: - -```ini -[plugin:timex] - # update every = 1 - # clock synchronization state = yes - # time offset = yes -``` +integrations/timex.md
\ No newline at end of file diff --git a/collectors/timex.plugin/integrations/timex.md b/collectors/timex.plugin/integrations/timex.md new file mode 100644 index 000000000..80d77bc8f --- /dev/null +++ b/collectors/timex.plugin/integrations/timex.md @@ -0,0 +1,142 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/timex.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/timex.plugin/metadata.yaml" +sidebar_label: "Timex" +learn_status: "Published" +learn_rel_path: "Data Collection/System Clock and NTP" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Timex + + +<img src="https://netdata.cloud/img/syslog.png" width="150"/> + + +Plugin: timex.plugin +Module: timex.plugin + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +Examine Timex metrics to gain insights into system clock operations. Study time sync status, clock drift, and adjustments to ensure accurate system timekeeping. + +It uses system call adjtimex on Linux and ntp_adjtime on FreeBSD or Mac to monitor the system kernel clock synchronization state. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +This integration doesn't support auto-detection. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. 
+ + + +### Per Timex instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| system.clock_sync_state | state | state | +| system.clock_status | unsync, clockerr | status | +| system.clock_sync_offset | offset | milliseconds | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ system_clock_sync_state ](https://github.com/netdata/netdata/blob/master/health/health.d/timex.conf) | system.clock_sync_state | when set to 0, the system kernel believes the system clock is not properly synchronized to a reliable server | + + +## Setup + +### Prerequisites + +No action required. + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:timex]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + +At least one option ('clock synchronization state', 'time offset') needs to be enabled for this collector to run. + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 1 | False | +| clock synchronization state | Make chart showing system clock synchronization state. 
| yes | True | +| time offset | Make chart showing computed time offset between local system and reference clock | yes | True | + +</details> + +#### Examples + +##### Basic + +A basic configuration example. + +<details><summary>Config</summary> + +```yaml +[plugin:timex] + update every = 1 + clock synchronization state = yes + time offset = yes + +``` +</details> + + diff --git a/collectors/xenstat.plugin/README.md b/collectors/xenstat.plugin/README.md index 8d17a33cd..826e18e41 100644..120000 --- a/collectors/xenstat.plugin/README.md +++ b/collectors/xenstat.plugin/README.md @@ -1,57 +1 @@ -<!-- -title: "xenstat.plugin" -custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/xenstat.plugin/README.md" -sidebar_label: "xenstat.plugin" -learn_status: "Published" -learn_topic_type: "References" -learn_rel_path: "Integrations/Monitor/Virtualized environments/Virtualize hosts" ---> - -# xenstat.plugin - -`xenstat.plugin` collects XenServer and XCP-ng statistics. - -## Prerequisites - -1. install `xen-dom0-libs-devel` and `yajl-devel` using the package manager of your system. - Note: On Cent-OS systems you will need `centos-release-xen` repository and the required package for xen is `xen-devel` - -2. re-install Netdata from source. The installer will detect that the required libraries are now available and will also build xenstat.plugin. - -Keep in mind that `libxenstat` requires root access, so the plugin is setuid to root. - -## Charts - -The plugin provides XenServer and XCP-ng host and domains statistics: - -Host: - -1. Number of domains. - -Domain: - -1. CPU. -2. Memory. -3. Networks. -4. VBDs. - -## Configuration - -If you need to disable xenstat for Netdata, edit /etc/netdata/netdata.conf and set: - -``` -[plugins] - xenstat = no -``` - -## Debugging - -You can run the plugin by hand: - -``` -sudo /usr/libexec/netdata/plugins.d/xenstat.plugin 1 debug -``` - -You will get verbose output on what the plugin does. - - +integrations/xen-xcp-ng.md
\ No newline at end of file diff --git a/collectors/xenstat.plugin/integrations/xen-xcp-ng.md b/collectors/xenstat.plugin/integrations/xen-xcp-ng.md new file mode 100644 index 000000000..e4aea6fee --- /dev/null +++ b/collectors/xenstat.plugin/integrations/xen-xcp-ng.md @@ -0,0 +1,175 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/collectors/xenstat.plugin/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/collectors/xenstat.plugin/metadata.yaml" +sidebar_label: "Xen/XCP-ng" +learn_status: "Published" +learn_rel_path: "Data Collection/Containers and VMs" +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# Xen/XCP-ng + + +<img src="https://netdata.cloud/img/xen.png" width="150"/> + + +Plugin: xenstat.plugin +Module: xenstat.plugin + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors XenServer and XCP-ng host and domains statistics. + + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + +The plugin needs setuid. + +### Default Behavior + +#### Auto-Detection + +This plugin requires the `xen-dom0-libs-devel` and `yajl-devel` libraries to be installed. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per Xen/XCP-ng instance + +These metrics refer to the entire monitored application. + +This scope has no labels. 
+ +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| xenstat.mem | free, used | MiB | +| xenstat.domains | domains | domains | +| xenstat.cpus | cpus | cpus | +| xenstat.cpu_freq | frequency | MHz | + +### Per xendomain + +Metrics related to Xen domains. Each domain provides its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| xendomain.states | running, blocked, paused, shutdown, crashed, dying | boolean | +| xendomain.cpu | used | percentage | +| xendomain.mem | maximum, current | MiB | +| xendomain.vcpu | a dimension per vcpu | percentage | + +### Per xendomain vbd + +Metrics related to Xen domain Virtual Block Device. Each VBD provides its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| xendomain.oo_req_vbd | requests | requests/s | +| xendomain.requests_vbd | read, write | requests/s | +| xendomain.sectors_vbd | read, write | sectors/s | + +### Per xendomain network + +Metrics related to Xen domain network interfaces. Each network interface provides its own set of the following metrics. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| xendomain.bytes_network | received, sent | kilobits/s | +| xendomain.packets_network | received, sent | packets/s | +| xendomain.errors_network | received, sent | errors/s | +| xendomain.drops_network | received, sent | drops/s | + + + +## Alerts + +There are no alerts configured by default for this integration. + + +## Setup + +### Prerequisites + +#### Libraries + +1. Install `xen-dom0-libs-devel` and `yajl-devel` using the package manager of your system. + + Note: On CentOS systems, you will need the `centos-release-xen` repository, and the required Xen package is `xen-devel`. + +2. Re-install Netdata from source.
The installer will detect that the required libraries are now available and will also build xenstat.plugin. + + + +### Configuration + +#### File + +The configuration file name for this integration is `netdata.conf`. +Configuration for this specific integration is located in the `[plugin:xenstat]` section within that file. + +The file format is a modified INI syntax. The general structure is: + +```ini +[section1] + option1 = some value + option2 = some other value + +[section2] + option3 = some third value +``` +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config netdata.conf +``` +#### Options + + + +<details><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update every | Data collection frequency. | 1 | False | + +</details> + +#### Examples +There are no configuration examples. + + diff --git a/collectors/xenstat.plugin/xenstat_plugin.c b/collectors/xenstat.plugin/xenstat_plugin.c index acd072605..c05d5e298 100644 --- a/collectors/xenstat.plugin/xenstat_plugin.c +++ b/collectors/xenstat.plugin/xenstat_plugin.c @@ -935,6 +935,8 @@ int main(int argc, char **argv) { error_log_errors_per_period = 100; error_log_throttle_period = 3600; + log_set_global_severity_for_external_plugins(); + // ------------------------------------------------------------------------ // parse command line parameters |