author     Daniel Baumann <daniel.baumann@progress-linux.org>  2024-07-24 09:54:23 +0000
committer  Daniel Baumann <daniel.baumann@progress-linux.org>  2024-07-24 09:54:44 +0000
commit     836b47cb7e99a977c5a23b059ca1d0b5065d310e (patch)
tree       1604da8f482d02effa033c94a84be42bc0c848c3 /src/collectors/python.d.plugin/pandas
parent     Releasing debian version 1.44.3-2. (diff)
Merging upstream version 1.46.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/collectors/python.d.plugin/pandas')
 src/collectors/python.d.plugin/pandas/README.md (symlink)    |   1
 src/collectors/python.d.plugin/pandas/integrations/pandas.md | 365
 src/collectors/python.d.plugin/pandas/metadata.yaml          | 308
 src/collectors/python.d.plugin/pandas/pandas.chart.py        |  99
 src/collectors/python.d.plugin/pandas/pandas.conf            | 211
5 files changed, 984 insertions, 0 deletions
diff --git a/src/collectors/python.d.plugin/pandas/README.md b/src/collectors/python.d.plugin/pandas/README.md
new file mode 120000
index 000000000..2fabe63c1
--- /dev/null
+++ b/src/collectors/python.d.plugin/pandas/README.md
@@ -0,0 +1 @@
+integrations/pandas.md
\ No newline at end of file
diff --git a/src/collectors/python.d.plugin/pandas/integrations/pandas.md b/src/collectors/python.d.plugin/pandas/integrations/pandas.md
new file mode 100644
index 000000000..898e23f0a
--- /dev/null
+++ b/src/collectors/python.d.plugin/pandas/integrations/pandas.md
@@ -0,0 +1,365 @@
<!--startmeta
custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/collectors/python.d.plugin/pandas/README.md"
meta_yaml: "https://github.com/netdata/netdata/edit/master/src/collectors/python.d.plugin/pandas/metadata.yaml"
sidebar_label: "Pandas"
learn_status: "Published"
learn_rel_path: "Collecting Metrics/Generic Collecting Metrics"
most_popular: False
message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
endmeta-->

# Pandas

<img src="https://netdata.cloud/img/pandas.png" width="150"/>

Plugin: python.d.plugin
Module: pandas

<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />

## Overview

[Pandas](https://pandas.pydata.org/) is the de facto standard for reading and processing most types of structured data in Python.
If you have metrics appearing in a CSV, JSON, XML, HTML, or [other supported format](https://pandas.pydata.org/docs/user_guide/io.html),
either locally or via some HTTP endpoint, you can easily ingest and present those metrics in Netdata by leveraging the Pandas collector.

This collector can gather pretty much anything that Pandas can read and process.

The collector uses [pandas](https://pandas.pydata.org/) to pull data and perform pandas-based preprocessing before feeding the results to Netdata.

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

### Default Behavior

#### Auto-Detection

This integration doesn't support auto-detection.

#### Limits

The default configuration for this integration does not impose any limits on data collection.

#### Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

## Metrics

Metrics grouped by *scope*.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

This collector expects exactly one row in the final pandas DataFrame. That first row is taken
as the most recent value for each dimension on each chart, using `df.to_dict(orient='records')[0]`.
See [pd.to_dict()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html).
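
For illustration, here is how that final one-row DataFrame reduces to a flat dict of dimension values (a minimal standalone sketch; the column names are made up):

```python
import pandas as pd

# A final one-row DataFrame, as a `df_steps` chain is expected to produce.
df = pd.DataFrame({'metric_1': [42.0], 'metric_2': [7.5]})

# The collector takes the first record as the latest value for each dimension.
data = df.to_dict(orient='records')[0]
print(data)  # {'metric_1': 42.0, 'metric_2': 7.5}
```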

### Per Pandas instance

These metrics refer to the entire monitored application.

This scope has no labels.

Metrics:

| Metric | Dimensions | Unit |
|:------|:----------|:----|

## Alerts

There are no alerts configured by default for this integration.

## Setup

### Prerequisites

#### Python Requirements

This collector depends on some Python (Python 3 only) packages that can usually be installed via `pip` or `pip3`.

```bash
sudo pip install pandas requests
```

Note: If you would like to use [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html) to query a database, you will need to install the packages below as well.

```bash
sudo pip install 'sqlalchemy<2.0' psycopg2-binary
```

### Configuration

#### File

The configuration file name for this integration is `python.d/pandas.conf`.

You can edit the configuration file using the `edit-config` script from the
Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).

```bash
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config python.d/pandas.conf
```

#### Options

There are 2 sections:

* Global variables
* One or more JOBS that can define multiple different instances to monitor.

The following options can be defined globally (priority, penalty, autodetection_retry, update_every) and can also be defined per JOB to override the global values.

Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition.

Every configuration JOB starts with a `job_name` value, which will appear in the dashboard unless a `name` parameter is specified.

<details open><summary>Config options</summary>

| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
| chart_configs | an array of chart configuration dictionaries | [] | yes |
| chart_configs.name | name of the chart to be displayed in the dashboard. | None | yes |
| chart_configs.title | title of the chart to be displayed in the dashboard. | None | yes |
| chart_configs.family | [family](/docs/dashboards-and-charts/netdata-charts.md#families) of the chart to be displayed in the dashboard. | None | yes |
| chart_configs.context | [context](/docs/dashboards-and-charts/netdata-charts.md#contexts) of the chart to be displayed in the dashboard. | None | yes |
| chart_configs.type | the type of the chart to be displayed in the dashboard. | None | yes |
| chart_configs.units | the units of the chart to be displayed in the dashboard. | None | yes |
| chart_configs.df_steps | a series of pandas operations (one per line) that each returns a dataframe. | None | yes |
| update_every | Sets the default data collection frequency. | 5 | no |
| priority | Controls the order of charts at the netdata dashboard. | 60000 | no |
| autodetection_retry | Sets the job re-check interval in seconds. | 0 | no |
| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | no |
| name | Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works. | | no |

</details>
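
The `df_steps` option is the heart of each job: a chain of pandas expressions separated by `line_sep` (`;` by default), where `df` is bound to the result of the previous line and each line must itself evaluate to a DataFrame. The sketch below mimics the evaluation loop in `run_code()` from `pandas.chart.py` (shown later in this diff); `run_df_steps` and its example chain are hypothetical, for illustration only:

```python
import pandas as pd

def run_df_steps(df_steps: str, line_sep: str = ';') -> dict:
    """Evaluate each line of a df_steps chain, rebinding df to the result."""
    df = pd.DataFrame()
    for line in df_steps.split(line_sep):
        line = line.strip()
        if line and not line.startswith('#'):
            df = eval(line)  # each line must itself evaluate to a DataFrame
            assert isinstance(df, pd.DataFrame)
    # reduce the final one-row frame to {dimension: value}
    return df.to_dict(orient='records')[0]

steps = """
pd.DataFrame({'user': [20.0], 'system': [10.0]});
df.apply(lambda row: row.user / row.system, axis=1).to_frame();
df.rename(columns={0: 'user_system_ratio'});
"""
print(run_df_steps(steps))  # {'user_system_ratio': 2.0}
```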

#### Examples

##### Temperature API Example

An example pulling some hourly temperature data, with one chart for today's forecast (mean, min, max) and another for the current temperature.

<details open><summary>Config</summary>

```yaml
temperature:
  name: "temperature"
  update_every: 5
  chart_configs:
    - name: "temperature_forecast_by_city"
      title: "Temperature By City - Today Forecast"
      family: "temperature.today"
      context: "pandas.temperature"
      type: "line"
      units: "Celsius"
      df_steps: >
        pd.DataFrame.from_dict(
          {city: requests.get(f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&hourly=temperature_2m').json()['hourly']['temperature_2m']
          for (city,lat,lng)
          in [
              ('dublin', 53.3441, -6.2675),
              ('athens', 37.9792, 23.7166),
              ('london', 51.5002, -0.1262),
              ('berlin', 52.5235, 13.4115),
              ('paris', 48.8567, 2.3510),
              ('madrid', 40.4167, -3.7033),
              ('new_york', 40.71, -74.01),
              ('los_angeles', 34.05, -118.24),
          ]
          }
          );
        df.describe();  # get aggregate stats for each city;
        df.transpose()[['mean', 'max', 'min']].reset_index();  # just take mean, min, max;
        df.rename(columns={'index':'city'});  # some column renaming;
        df.pivot(columns='city').mean().to_frame().reset_index();  # force to be one row per city;
        df.rename(columns={0:'degrees'});  # some column renaming;
        pd.concat([df, df['city']+'_'+df['level_0']], axis=1);  # add new column combining city and summary measurement label;
        df.rename(columns={0:'measurement'});  # some column renaming;
        df[['measurement', 'degrees']].set_index('measurement');  # just take two columns we want;
        df.sort_index();  # sort by city name;
        df.transpose();  # transpose so it's just one wide row;
    - name: "temperature_current_by_city"
      title: "Temperature By City - Current"
      family: "temperature.current"
      context: "pandas.temperature"
      type: "line"
      units: "Celsius"
      df_steps: >
        pd.DataFrame.from_dict(
          {city: requests.get(f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&current_weather=true').json()['current_weather']
          for (city,lat,lng)
          in [
              ('dublin', 53.3441, -6.2675),
              ('athens', 37.9792, 23.7166),
              ('london', 51.5002, -0.1262),
              ('berlin', 52.5235, 13.4115),
              ('paris', 48.8567, 2.3510),
              ('madrid', 40.4167, -3.7033),
              ('new_york', 40.71, -74.01),
              ('los_angeles', 34.05, -118.24),
          ]
          }
          );
        df.transpose();
        df[['temperature']];
        df.transpose();

```
</details>

##### API CSV Example

An example showing a `read_csv` from a URL and some light pandas data wrangling.

<details open><summary>Config</summary>

```yaml
example_csv:
  name: "example_csv"
  update_every: 2
  chart_configs:
    - name: "london_system_cpu"
      title: "London System CPU - Ratios"
      family: "london_system_cpu"
      context: "pandas"
      type: "line"
      units: "n"
      df_steps: >
        pd.read_csv('https://london.my-netdata.io/api/v1/data?chart=system.cpu&format=csv&after=-60', storage_options={'User-Agent': 'netdata'});
        df.drop('time', axis=1);
        df.mean().to_frame().transpose();
        df.apply(lambda row: (row.user / row.system), axis=1).to_frame();
        df.rename(columns={0:'average_user_system_ratio'});
        df*100;

```
</details>

##### API JSON Example

An example showing a `read_json` from a URL and some light pandas data wrangling.

<details open><summary>Config</summary>

```yaml
example_json:
  name: "example_json"
  update_every: 2
  chart_configs:
    - name: "london_system_net"
      title: "London System Net - Total Bandwidth"
      family: "london_system_net"
      context: "pandas"
      type: "area"
      units: "kilobits/s"
      df_steps: >
        pd.DataFrame(requests.get('https://london.my-netdata.io/api/v1/data?chart=system.net&format=json&after=-1').json()['data'], columns=requests.get('https://london.my-netdata.io/api/v1/data?chart=system.net&format=json&after=-1').json()['labels']);
        df.drop('time', axis=1);
        abs(df);
        df.sum(axis=1).to_frame();
        df.rename(columns={0:'total_bandwidth'});

```
</details>

##### XML Example

An example showing a `read_xml` from a URL and some light pandas data wrangling. Note that `line_sep` is set to `|` here because the URL itself contains `;` characters.

<details open><summary>Config</summary>

```yaml
example_xml:
  name: "example_xml"
  update_every: 2
  line_sep: "|"
  chart_configs:
    - name: "temperature_forecast"
      title: "Temperature Forecast"
      family: "temp"
      context: "pandas.temp"
      type: "line"
      units: "celsius"
      df_steps: >
        pd.read_xml('http://metwdb-openaccess.ichec.ie/metno-wdb2ts/locationforecast?lat=54.7210798611;long=-8.7237392806', xpath='./product/time[1]/location/temperature', parser='etree')|
        df.rename(columns={'value': 'dublin'})|
        df[['dublin']]|

```
</details>

##### SQL Example

An example showing a `read_sql` from a postgres database using sqlalchemy.

<details open><summary>Config</summary>

```yaml
sql:
  name: "sql"
  update_every: 5
  chart_configs:
    - name: "sql"
      title: "SQL Example"
      family: "sql.example"
      context: "example"
      type: "line"
      units: "percent"
      df_steps: >
        pd.read_sql_query(
          sql='\
            select \
              random()*100 as metric_1, \
              random()*100 as metric_2 \
          ',
          con=create_engine('postgresql://localhost/postgres?user=netdata&password=netdata')
          );

```
</details>
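
Before wiring a SQL job into Netdata, it can help to confirm the query and connection string work on their own. A quick standalone check, assuming the same local postgres credentials as the example above:

```python
import pandas as pd
from sqlalchemy import create_engine

# Same connection string as in the example job above.
engine = create_engine('postgresql://localhost/postgres?user=netdata&password=netdata')

df = pd.read_sql_query(
    sql='select random()*100 as metric_1, random()*100 as metric_2',
    con=engine,
)
print(df.to_dict(orient='records')[0])  # e.g. {'metric_1': 61.3, 'metric_2': 8.9}
```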

## Troubleshooting

### Debug Mode

To troubleshoot issues with the `pandas` collector, run the `python.d.plugin` with the debug option enabled. The output
should give you clues as to why the collector isn't working.

- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.

  ```bash
  cd /usr/libexec/netdata/plugins.d/
  ```

- Switch to the `netdata` user.

  ```bash
  sudo -u netdata -s
  ```

- Run the `python.d.plugin` to debug the collector:

  ```bash
  ./python.d.plugin pandas debug trace
  ```

diff --git a/src/collectors/python.d.plugin/pandas/metadata.yaml b/src/collectors/python.d.plugin/pandas/metadata.yaml
new file mode 100644
index 000000000..c32be09b4
--- /dev/null
+++ b/src/collectors/python.d.plugin/pandas/metadata.yaml
@@ -0,0 +1,308 @@
plugin_name: python.d.plugin
modules:
  - meta:
      plugin_name: python.d.plugin
      module_name: pandas
      monitored_instance:
        name: Pandas
        link: https://pandas.pydata.org/
        categories:
          - data-collection.generic-data-collection
        icon_filename: pandas.png
      related_resources:
        integrations:
          list: []
      info_provided_to_referring_integrations:
        description: ""
      keywords:
        - pandas
        - python
      most_popular: false
    overview:
      data_collection:
        metrics_description: |
          [Pandas](https://pandas.pydata.org/) is the de facto standard for reading and processing most types of structured data in Python.
          If you have metrics appearing in a CSV, JSON, XML, HTML, or [other supported format](https://pandas.pydata.org/docs/user_guide/io.html),
          either locally or via some HTTP endpoint, you can easily ingest and present those metrics in Netdata by leveraging the Pandas collector.

          This collector can gather pretty much anything that Pandas can read and process.
        method_description: |
          The collector uses [pandas](https://pandas.pydata.org/) to pull data and perform pandas-based preprocessing before feeding the results to Netdata.
      supported_platforms:
        include: []
        exclude: []
      multi_instance: true
      additional_permissions:
        description: ""
      default_behavior:
        auto_detection:
          description: ""
        limits:
          description: ""
        performance_impact:
          description: ""
    setup:
      prerequisites:
        list:
          - title: Python Requirements
            description: |
              This collector depends on some Python (Python 3 only) packages that can usually be installed via `pip` or `pip3`.

              ```bash
              sudo pip install pandas requests
              ```

              Note: If you would like to use [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html) to query a database, you will need to install the packages below as well.

              ```bash
              sudo pip install 'sqlalchemy<2.0' psycopg2-binary
              ```
      configuration:
        file:
          name: python.d/pandas.conf
          description: ""
        options:
          description: |
            There are 2 sections:

            * Global variables
            * One or more JOBS that can define multiple different instances to monitor.

            The following options can be defined globally (priority, penalty, autodetection_retry, update_every) and can also be defined per JOB to override the global values.

            Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition.

            Every configuration JOB starts with a `job_name` value, which will appear in the dashboard unless a `name` parameter is specified.
          folding:
            title: Config options
            enabled: true
          list:
            - name: chart_configs
              description: an array of chart configuration dictionaries
              default_value: "[]"
              required: true
            - name: chart_configs.name
              description: name of the chart to be displayed in the dashboard.
              default_value: None
              required: true
            - name: chart_configs.title
              description: title of the chart to be displayed in the dashboard.
              default_value: None
              required: true
            - name: chart_configs.family
              description: "[family](/docs/dashboards-and-charts/netdata-charts.md#families) of the chart to be displayed in the dashboard."
              default_value: None
              required: true
            - name: chart_configs.context
              description: "[context](/docs/dashboards-and-charts/netdata-charts.md#contexts) of the chart to be displayed in the dashboard."
              default_value: None
              required: true
            - name: chart_configs.type
              description: the type of the chart to be displayed in the dashboard.
              default_value: None
              required: true
            - name: chart_configs.units
              description: the units of the chart to be displayed in the dashboard.
              default_value: None
              required: true
            - name: chart_configs.df_steps
              description: a series of pandas operations (one per line) that each returns a dataframe.
              default_value: None
              required: true
            - name: update_every
              description: Sets the default data collection frequency.
              default_value: 5
              required: false
            - name: priority
              description: Controls the order of charts at the netdata dashboard.
              default_value: 60000
              required: false
            - name: autodetection_retry
              description: Sets the job re-check interval in seconds.
              default_value: 0
              required: false
            - name: penalty
              description: Indicates whether to apply penalty to update_every in case of failures.
              default_value: yes
              required: false
            - name: name
              description: Job name. This value will overwrite the `job_name` value. JOBS with the same name are mutually exclusive. Only one of them will be allowed running at any time. This allows autodetection to try several alternatives and pick the one that works.
              default_value: ""
              required: false
        examples:
          folding:
            enabled: true
            title: Config
          list:
            - name: Temperature API Example
              folding:
                enabled: true
              description: An example pulling some hourly temperature data, with one chart for today's forecast (mean, min, max) and another for the current temperature.
              config: |
                temperature:
                  name: "temperature"
                  update_every: 5
                  chart_configs:
                    - name: "temperature_forecast_by_city"
                      title: "Temperature By City - Today Forecast"
                      family: "temperature.today"
                      context: "pandas.temperature"
                      type: "line"
                      units: "Celsius"
                      df_steps: >
                        pd.DataFrame.from_dict(
                          {city: requests.get(f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&hourly=temperature_2m').json()['hourly']['temperature_2m']
                          for (city,lat,lng)
                          in [
                              ('dublin', 53.3441, -6.2675),
                              ('athens', 37.9792, 23.7166),
                              ('london', 51.5002, -0.1262),
                              ('berlin', 52.5235, 13.4115),
                              ('paris', 48.8567, 2.3510),
                              ('madrid', 40.4167, -3.7033),
                              ('new_york', 40.71, -74.01),
                              ('los_angeles', 34.05, -118.24),
                          ]
                          }
                          );
                        df.describe();  # get aggregate stats for each city;
                        df.transpose()[['mean', 'max', 'min']].reset_index();  # just take mean, min, max;
                        df.rename(columns={'index':'city'});  # some column renaming;
                        df.pivot(columns='city').mean().to_frame().reset_index();  # force to be one row per city;
                        df.rename(columns={0:'degrees'});  # some column renaming;
                        pd.concat([df, df['city']+'_'+df['level_0']], axis=1);  # add new column combining city and summary measurement label;
                        df.rename(columns={0:'measurement'});  # some column renaming;
                        df[['measurement', 'degrees']].set_index('measurement');  # just take two columns we want;
                        df.sort_index();  # sort by city name;
                        df.transpose();  # transpose so it's just one wide row;
                    - name: "temperature_current_by_city"
                      title: "Temperature By City - Current"
                      family: "temperature.current"
                      context: "pandas.temperature"
                      type: "line"
                      units: "Celsius"
                      df_steps: >
                        pd.DataFrame.from_dict(
                          {city: requests.get(f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&current_weather=true').json()['current_weather']
                          for (city,lat,lng)
                          in [
                              ('dublin', 53.3441, -6.2675),
                              ('athens', 37.9792, 23.7166),
                              ('london', 51.5002, -0.1262),
                              ('berlin', 52.5235, 13.4115),
                              ('paris', 48.8567, 2.3510),
                              ('madrid', 40.4167, -3.7033),
                              ('new_york', 40.71, -74.01),
                              ('los_angeles', 34.05, -118.24),
                          ]
                          }
                          );
                        df.transpose();
                        df[['temperature']];
                        df.transpose();
            - name: API CSV Example
              folding:
                enabled: true
              description: An example showing a `read_csv` from a URL and some light pandas data wrangling.
              config: |
                example_csv:
                  name: "example_csv"
                  update_every: 2
                  chart_configs:
                    - name: "london_system_cpu"
                      title: "London System CPU - Ratios"
                      family: "london_system_cpu"
                      context: "pandas"
                      type: "line"
                      units: "n"
                      df_steps: >
                        pd.read_csv('https://london.my-netdata.io/api/v1/data?chart=system.cpu&format=csv&after=-60', storage_options={'User-Agent': 'netdata'});
                        df.drop('time', axis=1);
                        df.mean().to_frame().transpose();
                        df.apply(lambda row: (row.user / row.system), axis=1).to_frame();
                        df.rename(columns={0:'average_user_system_ratio'});
                        df*100;
            - name: API JSON Example
              folding:
                enabled: true
              description: An example showing a `read_json` from a URL and some light pandas data wrangling.
              config: |
                example_json:
                  name: "example_json"
                  update_every: 2
                  chart_configs:
                    - name: "london_system_net"
                      title: "London System Net - Total Bandwidth"
                      family: "london_system_net"
                      context: "pandas"
                      type: "area"
                      units: "kilobits/s"
                      df_steps: >
                        pd.DataFrame(requests.get('https://london.my-netdata.io/api/v1/data?chart=system.net&format=json&after=-1').json()['data'], columns=requests.get('https://london.my-netdata.io/api/v1/data?chart=system.net&format=json&after=-1').json()['labels']);
                        df.drop('time', axis=1);
                        abs(df);
                        df.sum(axis=1).to_frame();
                        df.rename(columns={0:'total_bandwidth'});
            - name: XML Example
              folding:
                enabled: true
              description: An example showing a `read_xml` from a URL and some light pandas data wrangling. Note that `line_sep` is set to `|` because the URL itself contains `;` characters.
              config: |
                example_xml:
                  name: "example_xml"
                  update_every: 2
                  line_sep: "|"
                  chart_configs:
                    - name: "temperature_forecast"
                      title: "Temperature Forecast"
                      family: "temp"
                      context: "pandas.temp"
                      type: "line"
                      units: "celsius"
                      df_steps: >
                        pd.read_xml('http://metwdb-openaccess.ichec.ie/metno-wdb2ts/locationforecast?lat=54.7210798611;long=-8.7237392806', xpath='./product/time[1]/location/temperature', parser='etree')|
                        df.rename(columns={'value': 'dublin'})|
                        df[['dublin']]|
            - name: SQL Example
              folding:
                enabled: true
              description: An example showing a `read_sql` from a postgres database using sqlalchemy.
              config: |
                sql:
                  name: "sql"
                  update_every: 5
                  chart_configs:
                    - name: "sql"
                      title: "SQL Example"
                      family: "sql.example"
                      context: "example"
                      type: "line"
                      units: "percent"
                      df_steps: >
                        pd.read_sql_query(
                          sql='\
                            select \
                              random()*100 as metric_1, \
                              random()*100 as metric_2 \
                          ',
                          con=create_engine('postgresql://localhost/postgres?user=netdata&password=netdata')
                          );
    troubleshooting:
      problems:
        list: []
    alerts: []
    metrics:
      folding:
        title: Metrics
        enabled: false
      description: |
        This collector expects exactly one row in the final pandas DataFrame. That first row is taken
        as the most recent value for each dimension on each chart, using `df.to_dict(orient='records')[0]`.
        See [pd.to_dict()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html).
      availability: []
      scopes:
        - name: global
          description: |
            These metrics refer to the entire monitored application.
          labels: []
          metrics: []

diff --git a/src/collectors/python.d.plugin/pandas/pandas.chart.py b/src/collectors/python.d.plugin/pandas/pandas.chart.py
new file mode 100644
index 000000000..7977bcb36
--- /dev/null
+++ b/src/collectors/python.d.plugin/pandas/pandas.chart.py
@@ -0,0 +1,99 @@
# -*- coding: utf-8 -*-
# Description: pandas netdata python.d module
# Author: Andrew Maguire (andrewm4894)
# SPDX-License-Identifier: GPL-3.0-or-later

import os

import pandas as pd

try:
    import requests
    HAS_REQUESTS = True
except ImportError:
    HAS_REQUESTS = False

try:
    from sqlalchemy import create_engine
    HAS_SQLALCHEMY = True
except ImportError:
    HAS_SQLALCHEMY = False

from bases.FrameworkServices.SimpleService import SimpleService

ORDER = []

CHARTS = {}


class Service(SimpleService):
    def __init__(self, configuration=None, name=None):
        SimpleService.__init__(self, configuration=configuration, name=name)
        self.order = ORDER
        self.definitions = CHARTS
        self.chart_configs = self.configuration.get('chart_configs', None)
        self.line_sep = self.configuration.get('line_sep', ';')

    def run_code(self, df_steps):
        """eval() each line of code and ensure the result is a pandas dataframe"""

        # process each line of code
        lines = df_steps.split(self.line_sep)
        for line in lines:
            line_clean = line.strip('\n').strip(' ')
            if line_clean != '' and line_clean[0] != '#':
                df = eval(line_clean)
                assert isinstance(df, pd.DataFrame), 'The result of each evaluated line of `df_steps` must be of type `pd.DataFrame`'

        # take top row of final df as data to be collected by netdata
        data = df.to_dict(orient='records')[0]

        return data

    def check(self):
        """ensure charts and dims all configured and that we can get data"""

        if not HAS_REQUESTS:
            self.warning('requests library could not be imported')

        if not HAS_SQLALCHEMY:
            self.warning('sqlalchemy library could not be imported')

        if not self.chart_configs:
            self.error('chart_configs must be defined')
            return False

        data = dict()

        # add each chart as defined by the config
        for chart_config in self.chart_configs:
            if chart_config['name'] not in self.charts:
                chart_template = {
                    'options': [
                        chart_config['name'],
                        chart_config['title'],
                        chart_config['units'],
                        chart_config['family'],
                        chart_config['context'],
                        chart_config['type']
                    ],
                    'lines': []
                }
                self.charts.add_chart([chart_config['name']] + chart_template['options'])

            data_tmp = self.run_code(chart_config['df_steps'])
            data.update(data_tmp)

            for dim in data_tmp:
                self.charts[chart_config['name']].add_dimension([dim, dim, 'absolute', 1, 1])

        return True

    def get_data(self):
        """get data for each chart config"""

        data = dict()

        for chart_config in self.chart_configs:
            data_tmp = self.run_code(chart_config['df_steps'])
            data.update(data_tmp)

        return data

diff --git a/src/collectors/python.d.plugin/pandas/pandas.conf b/src/collectors/python.d.plugin/pandas/pandas.conf
new file mode 100644
index 000000000..74a7da3e9
--- /dev/null
+++ b/src/collectors/python.d.plugin/pandas/pandas.conf
@@ -0,0 +1,211 @@
# netdata python.d.plugin configuration for pandas
#
# This file is in YaML format. Generally the format is:
#
# name: value
#
# There are 2 sections:
#  - global variables
#  - one or more JOBS
#
# JOBS allow you to collect values from multiple sources.
# Each source will have its own set of charts.
#
# JOB parameters have to be indented (using spaces only, example below).

# ----------------------------------------------------------------------
# Global Variables
# These variables set the defaults for all JOBs, however each JOB
# may define its own, overriding the defaults.

# update_every sets the default data collection frequency.
# If unset, the python.d.plugin default is used.
update_every: 5

# priority controls the order of charts at the netdata dashboard.
# Lower numbers move the charts towards the top of the page.
# If unset, the default for python.d.plugin is used.
# priority: 60000

# penalty indicates whether to apply penalty to update_every in case of failures.
# Penalty will increase every 5 failed updates in a row. Maximum penalty is 10 minutes.
# penalty: yes

# autodetection_retry sets the job re-check interval in seconds.
# The job is not deleted if check fails.
# Attempts to start the job are made once every autodetection_retry.
# This feature is disabled by default.
# autodetection_retry: 0

# ----------------------------------------------------------------------
# JOBS (data collection sources)
#
# The default JOBS share the same *name*. JOBS with the same name
# are mutually exclusive. Only one of them will be allowed running at
# any time. This allows autodetection to try several alternatives and
# pick the one that works.
#
# Any number of jobs is supported.
#
# All python.d.plugin JOBS (for all its modules) support a set of
# predefined parameters. These are:
#
# job_name:
#     name: myname            # the JOB's name as it will appear on the
#                             # dashboard (by default it is the job_name)
#                             # JOBs sharing a name are mutually exclusive
#     update_every: 1         # the JOB's data collection frequency
#     priority: 60000         # the JOB's order on the dashboard
#     penalty: yes            # the JOB's penalty
#     autodetection_retry: 0  # the JOB's re-check interval in seconds
#
# In addition to the above, this module also supports the following:
#
#     chart_configs: [<dictionary>]  # an array of chart config dictionaries.
#
# ----------------------------------------------------------------------
# AUTO-DETECTION JOBS

# Some example configurations. To enable this collector, uncomment an example below and restart netdata.

# example pulling some hourly temperature data, with one chart for today's forecast (mean, min, max) and another for the current temperature.
# temperature:
#   name: "temperature"
#   update_every: 5
#   chart_configs:
#     - name: "temperature_forecast_by_city"
#       title: "Temperature By City - Today Forecast"
#       family: "temperature.today"
#       context: "pandas.temperature"
#       type: "line"
#       units: "Celsius"
#       df_steps: >
#         pd.DataFrame.from_dict(
#           {city: requests.get(f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&hourly=temperature_2m').json()['hourly']['temperature_2m']
#           for (city,lat,lng)
#           in [
#               ('dublin', 53.3441, -6.2675),
#               ('athens', 37.9792, 23.7166),
#               ('london', 51.5002, -0.1262),
#               ('berlin', 52.5235, 13.4115),
#               ('paris', 48.8567, 2.3510),
#               ('madrid', 40.4167, -3.7033),
#               ('new_york', 40.71, -74.01),
#               ('los_angeles', 34.05, -118.24),
#           ]
#           }
#           );
#         df.describe();  # get aggregate stats for each city;
#         df.transpose()[['mean', 'max', 'min']].reset_index();  # just take mean, min, max;
#         df.rename(columns={'index':'city'});  # some column renaming;
#         df.pivot(columns='city').mean().to_frame().reset_index();  # force to be one row per city;
#         df.rename(columns={0:'degrees'});  # some column renaming;
#         pd.concat([df, df['city']+'_'+df['level_0']], axis=1);  # add new column combining city and summary measurement label;
#         df.rename(columns={0:'measurement'});  # some column renaming;
#         df[['measurement', 'degrees']].set_index('measurement');  # just take two columns we want;
#         df.sort_index();  # sort by city name;
#         df.transpose();  # transpose so it's just one wide row;
#     - name: "temperature_current_by_city"
#       title: "Temperature By City - Current"
#       family: "temperature.current"
#       context: "pandas.temperature"
#       type: "line"
#       units: "Celsius"
#       df_steps: >
#         pd.DataFrame.from_dict(
#           {city: requests.get(f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&current_weather=true').json()['current_weather']
#           for (city,lat,lng)
#           in [
#               ('dublin', 53.3441, -6.2675),
#               ('athens', 37.9792, 23.7166),
#               ('london', 51.5002, -0.1262),
#               ('berlin', 52.5235, 13.4115),
#               ('paris', 48.8567, 2.3510),
#               ('madrid', 40.4167, -3.7033),
#               ('new_york', 40.71, -74.01),
#               ('los_angeles', 34.05, -118.24),
#           ]
#           }
#           );
#         df.transpose();
#         df[['temperature']];
#         df.transpose();

# example showing a read_csv from a url and some light pandas data wrangling.
# pull data in csv format from the london demo server and compute the ratio of user cpu over system cpu, averaged over the last 60 seconds.
# example_csv:
#   name: "example_csv"
#   update_every: 2
#   chart_configs:
#     - name: "london_system_cpu"
#       title: "London System CPU - Ratios"
#       family: "london_system_cpu"
#       context: "pandas"
#       type: "line"
#       units: "n"
#       df_steps: >
#         pd.read_csv('https://london.my-netdata.io/api/v1/data?chart=system.cpu&format=csv&after=-60', storage_options={'User-Agent': 'netdata'});
#         df.drop('time', axis=1);
#         df.mean().to_frame().transpose();
#         df.apply(lambda row: (row.user / row.system), axis=1).to_frame();
#         df.rename(columns={0:'average_user_system_ratio'});
#         df*100;

# example showing a read_json from a url and some light pandas data wrangling.
# pull data in json format (using requests.get() if the json data is too complex for pd.read_json()) from the london demo server and work out 'total_bandwidth'.
# example_json:
#   name: "example_json"
#   update_every: 2
#   chart_configs:
#     - name: "london_system_net"
#       title: "London System Net - Total Bandwidth"
#       family: "london_system_net"
#       context: "pandas"
#       type: "area"
#       units: "kilobits/s"
#       df_steps: >
#         pd.DataFrame(requests.get('https://london.my-netdata.io/api/v1/data?chart=system.net&format=json&after=-1').json()['data'], columns=requests.get('https://london.my-netdata.io/api/v1/data?chart=system.net&format=json&after=-1').json()['labels']);
#         df.drop('time', axis=1);
#         abs(df);
#         df.sum(axis=1).to_frame();
#         df.rename(columns={0:'total_bandwidth'});

# example showing a read_xml from a url and some light pandas data wrangling.
# pull weather forecast data in xml format and use xpath to pull out the temperature forecast.
# note: line_sep is set to '|' because the url itself contains ';' characters.
# example_xml:
#   name: "example_xml"
#   update_every: 2
#   line_sep: "|"
#   chart_configs:
#     - name: "temperature_forecast"
#       title: "Temperature Forecast"
#       family: "temp"
#       context: "pandas.temp"
#       type: "line"
#       units: "celsius"
#       df_steps: >
#         pd.read_xml('http://metwdb-openaccess.ichec.ie/metno-wdb2ts/locationforecast?lat=54.7210798611;long=-8.7237392806', xpath='./product/time[1]/location/temperature', parser='etree')|
#         df.rename(columns={'value': 'dublin'})|
#         df[['dublin']]|

# example showing a read_sql from a postgres database using sqlalchemy.
# note: this example assumes a running postgres db on localhost with user netdata and password netdata.
# sql:
#   name: "sql"
#   update_every: 5
#   chart_configs:
#     - name: "sql"
#       title: "SQL Example"
#       family: "sql.example"
#       context: "example"
#       type: "line"
#       units: "percent"
#       df_steps: >
#         pd.read_sql_query(
#           sql='\
#             select \
#               random()*100 as metric_1, \
#               random()*100 as metric_2 \
#           ',
#           con=create_engine('postgresql://localhost/postgres?user=netdata&password=netdata')
#           );