Diffstat:
 collectors/python.d.plugin/pandas/README.md | 97 +------------------------------------------
 1 file changed, 1 insertion(+), 96 deletions(-)
diff --git a/collectors/python.d.plugin/pandas/README.md b/collectors/python.d.plugin/pandas/README.md
index 19b11d5be..2fabe63c1 100644..120000
--- a/collectors/python.d.plugin/pandas/README.md
+++ b/collectors/python.d.plugin/pandas/README.md
@@ -1,96 +1 @@
-# Ingest structured data (Pandas)
-
-<a href="https://pandas.pydata.org/" target="_blank">
- <img src="https://pandas.pydata.org/docs/_static/pandas.svg" alt="Pandas" width="100px" height="50px" />
- </a>
-
-[Pandas](https://pandas.pydata.org/) is the de facto standard for reading and processing most types of structured data in Python.
-If you have metrics in a CSV, JSON, XML, HTML, or [other supported format](https://pandas.pydata.org/docs/user_guide/io.html),
-either locally or behind some HTTP endpoint, you can easily ingest and present those metrics in Netdata by leveraging the Pandas collector.
-
-The collector uses [pandas](https://pandas.pydata.org/) to pull the data and do any pandas-based
-preprocessing before feeding it to Netdata.
-
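-The kind of call the collector wraps is just ordinary pandas I/O. As a minimal, Netdata-independent sketch (the URL and column names below are purely illustrative):
-
-```python
-import pandas as pd
-
-# pull structured data from a local file or an HTTP endpoint,
-# exactly as you would in any pandas script
-df = pd.read_csv("https://example.com/metrics.csv")  # illustrative URL
-
-# light preprocessing before the values are handed over to Netdata
-df = df[["timestamp", "value"]].set_index("timestamp")
-print(df.tail())
-```
-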
-## Requirements
-
-This collector depends on a few Python packages (Python 3 only) that can usually be installed via `pip` or `pip3`.
-
-```bash
-sudo pip install pandas requests
-```
-
-Note: If you would like to use [`pandas.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html) to query a database, you will need to install the packages below as well.
-
-```bash
-sudo pip install 'sqlalchemy<2.0' psycopg2-binary
-```
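-
-If you do go down the `pandas.read_sql` route, a query step might look something like the following (a rough sketch; the connection string and query are placeholders, not defaults used by the collector):
-
-```python
-import pandas as pd
-from sqlalchemy import create_engine
-
-# placeholder connection string - point this at your own database
-engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydb")
-
-# read the query results straight into a DataFrame,
-# which can then be wrangled further by subsequent df_steps
-df = pd.read_sql("SELECT city, temperature FROM weather", con=engine)
-```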
-
-## Configuration
-
-Below is an example configuration that queries some JSON weather data from [Open-Meteo](https://open-meteo.com),
-does some data wrangling on it, and saves it in the format Netdata expects.
-
-```yaml
-# example pulling some hourly temperature data
-temperature:
-    name: "temperature"
-    update_every: 3
-    chart_configs:
-      - name: "temperature_by_city"
-        title: "Temperature By City"
-        family: "temperature.today"
-        context: "pandas.temperature"
-        type: "line"
-        units: "Celsius"
-        df_steps: >
-          pd.DataFrame.from_dict(
-            {city: requests.get(
-              f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lng}&hourly=temperature_2m'
-              ).json()['hourly']['temperature_2m']
-             for (city, lat, lng)
-             in [
-               ('dublin', 53.3441, -6.2675),
-               ('athens', 37.9792, 23.7166),
-               ('london', 51.5002, -0.1262),
-               ('berlin', 52.5235, 13.4115),
-               ('paris', 48.8567, 2.3510),
-             ]
-            }
-          );  # use a dictionary comprehension to make multiple requests;
-          df.describe();  # get aggregate stats for each city;
-          df.transpose()[['mean', 'max', 'min']].reset_index();  # just take mean, min, max;
-          df.rename(columns={'index': 'city'});  # some column renaming;
-          df.pivot(columns='city').mean().to_frame().reset_index();  # force to be one row per city;
-          df.rename(columns={0: 'degrees'});  # some column renaming;
-          pd.concat([df, df['city'] + '_' + df['level_0']], axis=1);  # add a new column combining city and summary measurement label;
-          df.rename(columns={0: 'measurement'});  # some column renaming;
-          df[['measurement', 'degrees']].set_index('measurement');  # just take the two columns we want;
-          df.sort_index();  # sort by city name;
-          df.transpose();  # transpose so it's just one wide row;
-```
-
-`chart_configs` is a list of dictionary objects, each of which defines the sequence of `df_steps` to be run using [`pandas`](https://pandas.pydata.org/),
-along with the `name`, `title`, etc. used to define the
-[CHART variables](https://github.com/netdata/netdata/blob/master/docs/guides/python-collector.md#create-charts)
-that control how the results will look in Netdata.
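-
-Conceptually, each `;`-terminated line in `df_steps` is evaluated in turn, with the result of the previous step available as `df`. A rough, simplified illustration of that flow (not the collector's actual implementation):
-
-```python
-import pandas as pd
-import requests  # available inside the steps, as in the example above
-
-# a toy df_steps string: three steps separated by ';'
-df_steps = "pd.DataFrame({'x': [1, 2, 3]}); df.describe(); df.transpose()"
-
-df = None
-for step in df_steps.split(";"):
-    step = step.strip()
-    if not step:
-        continue
-    # each step must itself evaluate to a DataFrame,
-    # which is then bound to `df` for the next step
-    df = eval(step, {"pd": pd, "requests": requests, "df": df})
-
-print(df)  # one wide row of summary stats for the 'x' column
-```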
-
-The example configuration above would result in a `data` dictionary like the one below being collected by Netdata
-at each time step. The keys in this dictionary will be the "dimensions" of the chart.
-
-```javascript
-{'athens_max': 26.2, 'athens_mean': 19.45952380952381, 'athens_min': 12.2, 'berlin_max': 17.4, 'berlin_mean': 10.764285714285714, 'berlin_min': 5.7, 'dublin_max': 15.3, 'dublin_mean': 12.008928571428571, 'dublin_min': 6.6, 'london_max': 18.9, 'london_mean': 12.510714285714286, 'london_min': 5.2, 'paris_max': 19.4, 'paris_mean': 12.054166666666665, 'paris_min': 4.8}
-```
-
-Given the above configuration, this would end up as a chart like the one below in Netdata.
-
-![pandas collector temperature example chart](https://user-images.githubusercontent.com/2178292/195075312-8ce8cf68-5172-48e3-af09-104ffecfcdd6.png)
-
-## Notes
-
-- Each line in `df_steps` must return a pandas
-[DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object (`df`).
-- You can use
-[this colab notebook](https://colab.research.google.com/drive/1VYrddSegZqGtkWGFuiUbMbUk5f3rW6Hi?usp=sharing)
-to mock up and work on your `df_steps` iteratively before adding them to your config.
-- This collector expects the final pandas DataFrame to contain a single row. That row is taken
-as the most recent value for each dimension on each chart, via `df.to_dict(orient='records')[0]`
-(see [pd.to_dict()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html)), as sketched below.
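-
-As a quick illustration of that last note, here is roughly what happens to the final DataFrame (a sketch only; the dimension names come from the example configuration above):
-
-```python
-import pandas as pd
-
-# a one-row DataFrame shaped like the final df_steps output above:
-# one column per '<city>_<stat>' dimension
-df = pd.DataFrame(
-    [[12.2, 19.5, 26.2]],
-    columns=["athens_min", "athens_mean", "athens_max"],
-)
-
-# the collector takes the first (and only) row as the latest values
-data = df.to_dict(orient="records")[0]
-print(data)  # {'athens_min': 12.2, 'athens_mean': 19.5, 'athens_max': 26.2}
-```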
+integrations/pandas.md
\ No newline at end of file