author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2019-10-13 08:36:33 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2019-10-13 08:36:33 +0000 |
commit | a30a849b78fa4fe8552141b7b2802d1af1b18c09 | |
tree | fab3c8bf29bf2d565595d4fa6a9413916ff02fee /database/engine | |
parent | Adding upstream version 1.17.1. | |
download | netdata-upstream/1.18.0.tar.xz netdata-upstream/1.18.0.zip |
Adding upstream version 1.18.0. (upstream/1.18.0)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'database/engine')

mode | file | lines changed |
---|---|---|
-rw-r--r-- | database/engine/README.md | 186 |
-rw-r--r-- | database/engine/pagecache.c | 36 |
-rw-r--r-- | database/engine/pagecache.h | 38 |
-rw-r--r-- | database/engine/rrdengine.c | 69 |
-rw-r--r-- | database/engine/rrdengineapi.c | 42 |
-rw-r--r-- | database/engine/rrdengineapi.h | 2 |
-rw-r--r-- | database/engine/rrdenginelib.c | 4 |

7 files changed, 221 insertions, 156 deletions
````diff
diff --git a/database/engine/README.md b/database/engine/README.md
index 7791a549f..78f3b15ec 100644
--- a/database/engine/README.md
+++ b/database/engine/README.md
@@ -1,18 +1,17 @@
 # Database engine

-The Database Engine works like a traditional
-database. There is some amount of RAM dedicated to data caching and indexing and the rest of
-the data reside compressed on disk. The number of history entries is not fixed in this case,
-but depends on the configured disk space and the effective compression ratio of the data stored.
-This is the **only mode** that supports changing the data collection update frequency
-(`update_every`) **without losing** the previously stored metrics.
+The Database Engine works like a traditional database. There is some amount of RAM dedicated to data caching and
+indexing and the rest of the data reside compressed on disk. The number of history entries is not fixed in this case,
+but depends on the configured disk space and the effective compression ratio of the data stored. This is the **only
+mode** that supports changing the data collection update frequency (`update_every`) **without losing** the previously
+stored metrics.

 ## Files

-With the DB engine memory mode the metric data are stored in database files. These files are
-organized in pairs, the datafiles and their corresponding journalfiles, e.g.:
+With the DB engine memory mode the metric data are stored in database files. These files are organized in pairs, the
+datafiles and their corresponding journalfiles, e.g.:

-```
+```sh
 datafile-1-0000000001.ndf
 journalfile-1-0000000001.njf
 datafile-1-0000000002.ndf
@@ -22,21 +21,19 @@ journalfile-1-0000000003.njf
 ...
 ```

-They are located under their host's cache directory in the directory `./dbengine`
-(e.g. for localhost the default location is `/var/cache/netdata/dbengine/*`). The higher
-numbered filenames contain more recent metric data. The user can safely delete some pairs
-of files when Netdata is stopped to manually free up some space.
+They are located under their host's cache directory in the directory `./dbengine` (e.g. for localhost the default
+location is `/var/cache/netdata/dbengine/*`). The higher numbered filenames contain more recent metric data. The user
+can safely delete some pairs of files when Netdata is stopped to manually free up some space.

 _Users should_ **back up** _their `./dbengine` folders if they consider this data to be important._

 ## Configuration

-There is one DB engine instance per Netdata host/node. That is, there is one `./dbengine` folder
-per node, and all charts of `dbengine` memory mode in such a host share the same storage space
-and DB engine instance memory state. You can select the memory mode for localhost by editing
-netdata.conf and setting:
+There is one DB engine instance per Netdata host/node. That is, there is one `./dbengine` folder per node, and all
+charts of `dbengine` memory mode in such a host share the same storage space and DB engine instance memory state. You
+can select the memory mode for localhost by editing netdata.conf and setting:

-```
+```conf
 [global]
     memory mode = dbengine
 ```
@@ -44,110 +41,157 @@ netdata.conf and setting:
 For setting the memory mode for the rest of the nodes you should look at
 [streaming](../../streaming/).

-The `history` configuration option is meaningless for `memory mode = dbengine` and is ignored
-for any metrics being stored in the DB engine.
+The `history` configuration option is meaningless for `memory mode = dbengine` and is ignored for any metrics being
+stored in the DB engine.

-All DB engine instances, for localhost and all other streaming recipient nodes inherit their
-configuration from `netdata.conf`:
+All DB engine instances, for localhost and all other streaming recipient nodes inherit their configuration from
+`netdata.conf`:

-```
+```conf
 [global]
     page cache size = 32
     dbengine disk space = 256
 ```

-The above values are the default and minimum values for Page Cache size and DB engine disk space
-quota. Both numbers are in **MiB**. All DB engine instances will allocate the configured resources
-separately.
+The above values are the default and minimum values for Page Cache size and DB engine disk space quota. Both numbers are
+in **MiB**. All DB engine instances will allocate the configured resources separately.

-The `page cache size` option determines the amount of RAM in **MiB** that is dedicated to caching
-Netdata metric values themselves.
+The `page cache size` option determines the amount of RAM in **MiB** that is dedicated to caching Netdata metric values
+themselves as far as queries are concerned. The total page cache size will be greater since data collection itself will
+consume additional memory as is described in the [Memory requirements](#memory-requirements) section.

-The `dbengine disk space` option determines the amount of disk space in **MiB** that is dedicated
-to storing Netdata metric values and all related metadata describing them.
+The `dbengine disk space` option determines the amount of disk space in **MiB** that is dedicated to storing Netdata
+metric values and all related metadata describing them.

 ## Operation

-The DB engine stores chart metric values in 4096-byte pages in memory. Each chart dimension gets
-its own page to store consecutive values generated from the data collectors. Those pages comprise
-the **Page Cache**.
+The DB engine stores chart metric values in 4096-byte pages in memory. Each chart dimension gets its own page to store
+consecutive values generated from the data collectors. Those pages comprise the **Page Cache**.

-When those pages fill up they are slowly compressed and flushed to disk.
-It can take `4096 / 4 = 1024 seconds = 17 minutes`, for a chart dimension that is being collected
-every 1 second, to fill a page. Pages can be cut short when we stop Netdata or the DB engine
-instance so as to not lose the data. When we query the DB engine for data we trigger disk read
-I/O requests that fill the Page Cache with the requested pages and potentially evict cold
-(not recently used) pages.
+When those pages fill up they are slowly compressed and flushed to disk. It can take `4096 / 4 = 1024 seconds = 17
+minutes`, for a chart dimension that is being collected every 1 second, to fill a page. Pages can be cut short when we
+stop Netdata or the DB engine instance so as to not lose the data. When we query the DB engine for data we trigger disk
+read I/O requests that fill the Page Cache with the requested pages and potentially evict cold (not recently used)
+pages.

-When the disk quota is exceeded the oldest values are removed from the DB engine at real time, by
-automatically deleting the oldest datafile and journalfile pair. Any corresponding pages residing
-in the Page Cache will also be invalidated and removed. The DB engine logic will try to maintain
-between 10 and 20 file pairs at any point in time.
+When the disk quota is exceeded the oldest values are removed from the DB engine at real time, by automatically deleting
+the oldest datafile and journalfile pair. Any corresponding pages residing in the Page Cache will also be invalidated
+and removed. The DB engine logic will try to maintain between 10 and 20 file pairs at any point in time.

-The Database Engine uses direct I/O to avoid polluting the OS filesystem caches and does not
-generate excessive I/O traffic so as to create the minimum possible interference with other
-applications.
+The Database Engine uses direct I/O to avoid polluting the OS filesystem caches and does not generate excessive I/O
+traffic so as to create the minimum possible interference with other applications.

 ## Memory requirements

-Using memory mode `dbengine` we can overcome most memory restrictions and store a dataset that
-is much larger than the available memory.
+Using memory mode `dbengine` we can overcome most memory restrictions and store a dataset that is much larger than the
+available memory.

-There are explicit memory requirements **per** DB engine **instance**, meaning **per** Netdata
-**node** (e.g. localhost and streaming recipient nodes):
+There are explicit memory requirements **per** DB engine **instance**, meaning **per** Netdata **node** (e.g. localhost
+and streaming recipient nodes):

-- `page cache size` must be at least `#dimensions-being-collected x 4096 x 2` bytes.
+- The total page cache memory footprint will be an additional `#dimensions-being-collected x 4096 x 2` bytes over what
+  the user configured with `page cache size`.

 - an additional `#pages-on-disk x 4096 x 0.03` bytes of RAM are allocated for metadata.

   - roughly speaking this is 3% of the uncompressed disk space taken by the DB files.

-  - for very highly compressible data (compression ratio > 90%) this RAM overhead
-    is comparable to the disk space footprint.
+  - for very highly compressible data (compression ratio > 90%) this RAM overhead is comparable to the disk space
+    footprint.

-An important observation is that RAM usage depends on both the `page cache size` and the
-`dbengine disk space` options.
+An important observation is that RAM usage depends on both the `page cache size` and the `dbengine disk space` options.

 ## File descriptor requirements

-The Database Engine may keep a **significant** amount of files open per instance (e.g. per streaming
-slave or master server). When configuring your system you should make sure there are at least 50
-file descriptors available per `dbengine` instance.
+The Database Engine may keep a **significant** amount of files open per instance (e.g. per streaming slave or master
+server). When configuring your system you should make sure there are at least 50 file descriptors available per
+`dbengine` instance.

-Netdata allocates 25% of the available file descriptors to its Database Engine instances. This means that only 25%
-of the file descriptors that are available to the Netdata service are accessible by dbengine instances.
-You should take that into account when configuring your service
-or system-wide file descriptor limits. You can roughly estimate that the Netdata service needs 2048 file
-descriptors for every 10 streaming slave hosts when streaming is configured to use `memory mode = dbengine`.
+Netdata allocates 25% of the available file descriptors to its Database Engine instances. This means that only 25% of
+the file descriptors that are available to the Netdata service are accessible by dbengine instances. You should take
+that into account when configuring your service or system-wide file descriptor limits. You can roughly estimate that the
+Netdata service needs 2048 file descriptors for every 10 streaming slave hosts when streaming is configured to use
+`memory mode = dbengine`.

-If for example one wants to allocate 65536 file descriptors to the Netdata service on a systemd system
-one needs to override the Netdata service by running `sudo systemctl edit netdata` and creating a
-file with contents:
+If for example one wants to allocate 65536 file descriptors to the Netdata service on a systemd system one needs to
+override the Netdata service by running `sudo systemctl edit netdata` and creating a file with contents:

-```
+```sh
 [Service]
 LimitNOFILE=65536
 ```

 For other types of services one can add the line:

-```
+```sh
 ulimit -n 65536
 ```

-at the beginning of the service file. Alternatively you can change the system-wide limits of the kernel by changing `/etc/sysctl.conf`. For linux that would be:
+at the beginning of the service file. Alternatively you can change the system-wide limits of the kernel by changing
+ `/etc/sysctl.conf`. For linux that would be:

-```
+```conf
 fs.file-max = 65536
 ```

 In FreeBSD and OS X you change the lines like this:

-```
+```conf
 kern.maxfilesperproc=65536
 kern.maxfiles=65536
 ```

 You can apply the settings by running `sysctl -p` or by rebooting.

+## Evaluation
+
+We have evaluated the performance of the `dbengine` API that the netdata daemon uses internally. This is **not** the
+web API of netdata. Our benchmarks ran on a **single** `dbengine` instance, multiple of which can be running in a
+netdata master server. We used a server with an AMD Ryzen Threadripper 2950X 16-Core Processor and 2 disk drives, a
+Seagate Constellation ES.3 2TB magnetic HDD and a SAMSUNG MZQLB960HAJR-00007 960GB NAND Flash SSD.
+
+For our workload, we defined 32 charts with 128 metrics each, giving us a total of 4096 metrics. We defined 1 worker
+thread per chart (32 threads) that generates new data points with a data generation interval of 1 second. The time axis
+of the time-series is emulated and accelerated so that the worker threads can generate as many data points as possible
+without delays.
+
+We also defined 32 worker threads that perform queries on random metrics with semi-random time ranges. The
+starting time of the query is randomly selected between the beginning of the time-series and the time of the latest data
+point. The ending time is randomly selected between 1 second and 1 hour after the starting time. The pseudo-random
+numbers are generated with a uniform distribution.
+
+The data are written to the database at the same time as they are read from it. This is a concurrent read/write mixed
+workload with a duration of 60 seconds. The faster `dbengine` runs, the bigger the dataset size becomes since more
+data points will be generated. We set a page cache size of 64MiB for the two disk-bound scenarios. This way, the dataset
+size of the metric data is much bigger than the RAM that is being used for caching so as to trigger I/O requests most
+of the time. In our final scenario, we set the page cache size to 16 GiB. That way, the dataset fits in the page cache
+so as to avoid all disk bottlenecks.
+
+The reported numbers are the following:
+
+| device | page cache | dataset | reads/sec | writes/sec |
+| :---: | :---: | ---: | ---: | ---: |
+| HDD | 64 MiB | 4.1 GiB | 813K | 18.0M |
+| SSD | 64 MiB | 9.8 GiB | 1.7M | 43.0M |
+| N/A | 16 GiB | 6.8 GiB |118.2M | 30.2M |
+
+where "reads/sec" is the number of metric data points being read from the database via its API per second and
+"writes/sec" is the number of metric data points being written to the database per second.
+
+Notice that the HDD numbers are pretty high and not much slower than the SSD numbers. This is thanks to the database
+engine design being optimized for rotating media. In the database engine disk I/O requests are:
+
+- asynchronous to mask the high I/O latency of HDDs.
+- mostly large to reduce the amount of HDD seeking time.
+- mostly sequential to reduce the amount of HDD seeking time.
+- compressed to reduce the amount of required throughput.
+
+As a result, the HDD is not thousands of times slower than the SSD, which is typical for other workloads.
+
+An interesting observation to make is that the CPU-bound run (16 GiB page cache) generates fewer data than the SSD run
+(6.8 GiB vs 9.8 GiB). The reason is that the 32 reader threads in the SSD scenario are more frequently blocked by I/O,
+and generate a read load of 1.7M/sec, whereas in the CPU-bound scenario the read load is 70 times higher at 118M/sec.
+Consequently, there is a significant degree of interference by the reader threads, that slow down the writer threads.
+This is also possible because the interference effects are greater than the SSD impact on data generation throughput.
+
 [![analytics](https://www.google-analytics.com/collect?v=1&aip=1&t=pageview&_s=1&ds=github&dr=https%3A%2F%2Fgithub.com%2Fnetdata%2Fnetdata&dl=https%3A%2F%2Fmy-netdata.io%2Fgithub%2Fdatabase%2Fengine%2FREADME&_u=MAC~&cid=5792dfd7-8dc4-476b-af31-da2fdb9f93d2&tid=UA-64295674-3)](<>)
````

```diff
diff --git a/database/engine/pagecache.c b/database/engine/pagecache.c
index 457bcb218..a419ba981 100644
--- a/database/engine/pagecache.c
+++ b/database/engine/pagecache.c
@@ -209,9 +209,31 @@ static void pg_cache_release_pages(struct rrdengine_instance *ctx, unsigned numb
     pg_cache_release_pages_unsafe(ctx, number);
     uv_rwlock_wrunlock(&pg_cache->pg_cache_rwlock);
 }
+
+/*
+ * This function returns the maximum number of pages allowed in the page cache.
+ * The caller must hold the page cache lock.
+ */
+static inline unsigned long pg_cache_hard_limit(struct rrdengine_instance *ctx)
+{
+    /* it's twice the number of producers since we pin 2 pages per producer */
+    return ctx->max_cache_pages + 2 * (unsigned long)ctx->stats.metric_API_producers;
+}
+
+/*
+ * This function returns the low watermark number of pages in the page cache. The page cache should strive to keep the
+ * number of pages below that number.
+ * The caller must hold the page cache lock.
+ */
+static inline unsigned long pg_cache_soft_limit(struct rrdengine_instance *ctx)
+{
+    /* it's twice the number of producers since we pin 2 pages per producer */
+    return ctx->cache_pages_low_watermark + 2 * (unsigned long)ctx->stats.metric_API_producers;
+}
+
 /*
  * This function will block until it reserves #number populated pages.
- * It will trigger evictions or dirty page flushing if the ctx->max_cache_pages limit is hit.
+ * It will trigger evictions or dirty page flushing if the pg_cache_hard_limit() limit is hit.
  */
 static void pg_cache_reserve_pages(struct rrdengine_instance *ctx, unsigned number)
 {
@@ -223,10 +245,10 @@ static void pg_cache_reserve_pages(struct rrdengine_instance *ctx, unsigned numb
     assert(number < ctx->max_cache_pages);

     uv_rwlock_wrlock(&pg_cache->pg_cache_rwlock);
-    if (pg_cache->populated_pages + number >= ctx->max_cache_pages + 1)
+    if (pg_cache->populated_pages + number >= pg_cache_hard_limit(ctx) + 1)
         debug(D_RRDENGINE, "==Page cache full. Reserving %u pages.==",
                 number);
-    while (pg_cache->populated_pages + number >= ctx->max_cache_pages + 1) {
+    while (pg_cache->populated_pages + number >= pg_cache_hard_limit(ctx) + 1) {

         if (!pg_cache_try_evict_one_page_unsafe(ctx)) {
             /* failed to evict */
@@ -260,7 +282,7 @@ static void pg_cache_reserve_pages(struct rrdengine_instance *ctx, unsigned numb

 /*
  * This function will attempt to reserve #number populated pages.
- * It may trigger evictions if the ctx->cache_pages_low_watermark limit is hit.
+ * It may trigger evictions if the pg_cache_soft_limit() limit is hit.
  * Returns 0 on failure and 1 on success.
  */
 static int pg_cache_try_reserve_pages(struct rrdengine_instance *ctx, unsigned number)
@@ -272,7 +294,7 @@ static int pg_cache_try_reserve_pages(struct rrdengine_instance *ctx, unsigned n
     assert(number < ctx->max_cache_pages);

     uv_rwlock_wrlock(&pg_cache->pg_cache_rwlock);
-    if (pg_cache->populated_pages + number >= ctx->cache_pages_low_watermark + 1) {
+    if (pg_cache->populated_pages + number >= pg_cache_soft_limit(ctx) + 1) {
         debug(D_RRDENGINE, "==Page cache full. Trying to reserve %u pages.==",
                 number);

@@ -280,11 +302,11 @@ static int pg_cache_try_reserve_pages(struct rrdengine_instance *ctx, unsigned n
             if (!pg_cache_try_evict_one_page_unsafe(ctx))
                 break;
             ++count;
-        } while (pg_cache->populated_pages + number >= ctx->cache_pages_low_watermark + 1);
+        } while (pg_cache->populated_pages + number >= pg_cache_soft_limit(ctx) + 1);
         debug(D_RRDENGINE, "Evicted %u pages.", count);
     }

-    if (pg_cache->populated_pages + number < ctx->max_cache_pages + 1) {
+    if (pg_cache->populated_pages + number < pg_cache_hard_limit(ctx) + 1) {
         pg_cache->populated_pages += number;
         ret = 1; /* success */
     }
```

```diff
diff --git a/database/engine/pagecache.h b/database/engine/pagecache.h
index d464211e9..ab1a5c1ad 100644
--- a/database/engine/pagecache.h
+++ b/database/engine/pagecache.h
@@ -183,4 +183,42 @@ extern void free_page_cache(struct rrdengine_instance *ctx);
 extern void pg_cache_add_new_metric_time(struct pg_cache_page_index *page_index, struct rrdeng_page_descr *descr);
 extern void pg_cache_update_metric_times(struct pg_cache_page_index *page_index);

+static inline void
+    pg_cache_atomic_get_pg_info(struct rrdeng_page_descr *descr, usec_t *end_timep, uint32_t *page_lengthp)
+{
+    usec_t end_time, old_end_time;
+    uint32_t page_length;
+
+    if (NULL == descr->extent) {
+        /* this page is currently being modified, get consistent info locklessly */
+        do {
+            end_time = descr->end_time;
+            __sync_synchronize();
+            old_end_time = end_time;
+            page_length = descr->page_length;
+            __sync_synchronize();
+            end_time = descr->end_time;
+            __sync_synchronize();
+        } while ((end_time != old_end_time || (end_time & 1) != 0));
+
+        *end_timep = end_time;
+        *page_lengthp = page_length;
+    } else {
+        *end_timep = descr->end_time;
+        *page_lengthp = descr->page_length;
+    }
+}
+
+/* The caller must hold a reference to the page and must have already set the new data */
+static inline void pg_cache_atomic_set_pg_info(struct rrdeng_page_descr *descr, usec_t end_time, uint32_t page_length)
+{
+    assert(!(end_time & 1));
+    __sync_synchronize();
+    descr->end_time |= 1; /* mark start of uncertainty period by adding 1 microsecond */
+    __sync_synchronize();
+    descr->page_length = page_length;
+    __sync_synchronize();
+    descr->end_time = end_time; /* mark end of uncertainty period */
+}
+
 #endif /* NETDATA_PAGECACHE_H */
```

```diff
diff --git a/database/engine/rrdengine.c b/database/engine/rrdengine.c
index 36d917541..896d71f16 100644
--- a/database/engine/rrdengine.c
+++ b/database/engine/rrdengine.c
@@ -5,6 +5,8 @@

 rrdeng_stats_t global_io_errors = 0;
 rrdeng_stats_t global_fs_errors = 0;
+rrdeng_stats_t global_pg_cache_warnings = 0;
+rrdeng_stats_t global_pg_cache_errors = 0;
 rrdeng_stats_t rrdeng_reserved_file_descriptors = 0;

 void sanity_check(void)
@@ -251,13 +253,10 @@ void flush_pages_cb(uv_fs_t* req)
 {
     struct rrdengine_worker_config* wc = req->loop->data;
     struct rrdengine_instance *ctx = wc->ctx;
-    struct page_cache *pg_cache = &ctx->pg_cache;
     struct extent_io_descriptor *xt_io_descr;
     struct rrdeng_page_descr *descr;
     struct page_cache_descr *pg_cache_descr;
-    int ret;
     unsigned i, count;
-    Word_t commit_id;

     xt_io_descr = req->data;
     if (req->result < 0) {
@@ -277,13 +276,6 @@ void flush_pages_cb(uv_fs_t* req)
         /* care, we don't hold the descriptor mutex */
         descr = xt_io_descr->descr_array[i];

-        uv_rwlock_wrlock(&pg_cache->commited_page_index.lock);
-        commit_id = xt_io_descr->descr_commit_idx_array[i];
-        ret = JudyLDel(&pg_cache->commited_page_index.JudyL_array, commit_id, PJE0);
-        assert(1 == ret);
-        --pg_cache->commited_page_index.nr_commited_pages;
-        uv_rwlock_wrunlock(&pg_cache->commited_page_index.lock);
-
         pg_cache_replaceQ_insert(ctx, descr);

         rrdeng_page_descr_mutex_lock(ctx, descr);
@@ -331,7 +323,7 @@ static int do_flush_pages(struct rrdengine_worker_config* wc, int force, struct
     if (force) {
         debug(D_RRDENGINE, "Asynchronous flushing of extent has been forced by page pressure.");
     }
-    uv_rwlock_rdlock(&pg_cache->commited_page_index.lock);
+    uv_rwlock_wrlock(&pg_cache->commited_page_index.lock);
     for (Index = 0, count = 0, uncompressed_payload_length = 0,
          PValue = JudyLFirst(pg_cache->commited_page_index.JudyL_array, &Index, PJE0),
          descr = unlikely(NULL == PValue) ? NULL : *PValue ;
@@ -340,11 +332,15 @@ static int do_flush_pages(struct rrdengine_worker_config* wc, int force, struct
          PValue = JudyLNext(pg_cache->commited_page_index.JudyL_array, &Index, PJE0),
          descr = unlikely(NULL == PValue) ? NULL : *PValue) {

+        uint8_t page_write_pending;
+
         assert(0 != descr->page_length);
+        page_write_pending = 0;

         rrdeng_page_descr_mutex_lock(ctx, descr);
         pg_cache_descr = descr->pg_cache_descr;
         if (!(pg_cache_descr->flags & RRD_PAGE_WRITE_PENDING)) {
+            page_write_pending = 1;
             /* care, no reference being held */
             pg_cache_descr->flags |= RRD_PAGE_WRITE_PENDING;
             uncompressed_payload_length += descr->page_length;
@@ -352,8 +348,14 @@ static int do_flush_pages(struct rrdengine_worker_config* wc, int force, struct
             eligible_pages[count++] = descr;
         }
         rrdeng_page_descr_mutex_unlock(ctx, descr);
+
+        if (page_write_pending) {
+            ret = JudyLDel(&pg_cache->commited_page_index.JudyL_array, Index, PJE0);
+            assert(1 == ret);
+            --pg_cache->commited_page_index.nr_commited_pages;
+        }
     }
-    uv_rwlock_rdunlock(&pg_cache->commited_page_index.lock);
+    uv_rwlock_wrunlock(&pg_cache->commited_page_index.lock);

     if (!count) {
         debug(D_RRDENGINE, "%s: no pages eligible for flushing.", __func__);
@@ -813,47 +815,6 @@ error_after_loop_init:
     complete(&ctx->rrdengine_completion);
 }

-
-#define NR_PAGES (256)
-static void basic_functional_test(struct rrdengine_instance *ctx)
-{
-    int i, j, failed_validations;
-    uuid_t uuid[NR_PAGES];
-    void *buf;
-    struct rrdeng_page_descr *handle[NR_PAGES];
-    char uuid_str[UUID_STR_LEN];
-    char backup[NR_PAGES][UUID_STR_LEN * 100]; /* backup storage for page data verification */
-
-    for (i = 0 ; i < NR_PAGES ; ++i) {
-        uuid_generate(uuid[i]);
-        uuid_unparse_lower(uuid[i], uuid_str);
-//      fprintf(stderr, "Generated uuid[%d]=%s\n", i, uuid_str);
-        buf = rrdeng_create_page(ctx, &uuid[i], &handle[i]);
-        /* Each page contains 10 times its own UUID stringified */
-        for (j = 0 ; j < 100 ; ++j) {
-            strcpy(buf + UUID_STR_LEN * j, uuid_str);
-            strcpy(backup[i] + UUID_STR_LEN * j, uuid_str);
-        }
-        rrdeng_commit_page(ctx, handle[i], (Word_t)i);
-    }
-    fprintf(stderr, "\n********** CREATED %d METRIC PAGES ***********\n\n", NR_PAGES);
-    failed_validations = 0;
-    for (i = 0 ; i < NR_PAGES ; ++i) {
-        buf = rrdeng_get_latest_page(ctx, &uuid[i], (void **)&handle[i]);
-        if (NULL == buf) {
-            ++failed_validations;
-            fprintf(stderr, "Page %d was LOST.\n", i);
-        }
-        if (memcmp(backup[i], buf, UUID_STR_LEN * 100)) {
-            ++failed_validations;
-            fprintf(stderr, "Page %d data comparison with backup FAILED validation.\n", i);
-        }
-        rrdeng_put_page(ctx, handle[i]);
-    }
-    fprintf(stderr, "\n********** CORRECTLY VALIDATED %d/%d METRIC PAGES ***********\n\n",
-            NR_PAGES - failed_validations, NR_PAGES);
-
-}
 /* C entry point for development purposes
  * make "LDFLAGS=-errdengine_main"
  */
@@ -866,8 +827,6 @@ void rrdengine_main(void)
     if (ret) {
         exit(ret);
     }
-    basic_functional_test(ctx);
-
     rrdeng_exit(ctx);
     fprintf(stderr, "Hello world!");
     exit(0);
```

```diff
diff --git a/database/engine/rrdengineapi.c b/database/engine/rrdengineapi.c
index bf373f31c..5fa23d8fd 100644
--- a/database/engine/rrdengineapi.c
+++ b/database/engine/rrdengineapi.c
@@ -4,7 +4,7 @@

 /* Default global database instance */
 static struct rrdengine_instance default_global_ctx;
-int default_rrdeng_page_cache_mb = RRDENG_MIN_PAGE_CACHE_SIZE_MB;
+int default_rrdeng_page_cache_mb = 32;
 int default_rrdeng_disk_quota_mb = RRDENG_MIN_DISK_SPACE_MB;

 /*
@@ -95,9 +95,8 @@ void rrdeng_store_metric_flush_current_page(RRDDIM *rd)
     if (likely(descr->page_length)) {
         int ret, page_is_empty;

-#ifdef NETDATA_INTERNAL_CHECKS
         rrd_stat_atomic_add(&ctx->stats.metric_API_producers, -1);
-#endif
+
         if (handle->prev_descr) {
             /* unpin old second page */
             pg_cache_put(ctx, handle->prev_descr);
@@ -185,16 +184,14 @@ void rrdeng_store_metric_next(RRDDIM *rd, usec_t point_in_time, storage_number n
     }
     page = descr->pg_cache_descr->page;
     page[descr->page_length / sizeof(number)] = number;
-    descr->end_time = point_in_time;
-    descr->page_length += sizeof(number);
+    pg_cache_atomic_set_pg_info(descr, point_in_time, descr->page_length + sizeof(number));
+
     if (perfect_page_alignment)
         rd->rrdset->rrddim_page_alignment = descr->page_length;

     if (unlikely(INVALID_TIME == descr->start_time)) {
         descr->start_time = point_in_time;
-#ifdef NETDATA_INTERNAL_CHECKS
         rrd_stat_atomic_add(&ctx->stats.metric_API_producers, 1);
-#endif
         pg_cache_insert(ctx, handle->page_index, descr);
     } else {
         pg_cache_add_new_metric_time(handle->page_index, descr);
@@ -312,8 +309,9 @@ unsigned rrdeng_variable_step_boundaries(RRDSET *st, time_t start_time, time_t e
         curr = &page_info_array[i];
         *pginfo_to_points(curr) = 0; /* initialize to invalid page */
         *pginfo_to_dt(curr) = 0; /* no known data collection interval yet */
-        if (unlikely(INVALID_TIME == curr->start_time || INVALID_TIME == curr->end_time)) {
-            info("Ignoring page with invalid timestamp.");
+        if (unlikely(INVALID_TIME == curr->start_time || INVALID_TIME == curr->end_time ||
+                     curr->end_time < curr->start_time)) {
+            info("Ignoring page with invalid timestamps.");
             prev = old_prev;
             continue;
         }
@@ -366,7 +364,7 @@ unsigned rrdeng_variable_step_boundaries(RRDSET *st, time_t start_time, time_t e
             continue;
         }

-        if (unlikely(0 == dt)) { /* unknown data collection interval */
+        if (unlikely(0 == *pginfo_to_dt(curr))) { /* unknown data collection interval */
             assert(1 == page_points);

             if (likely(NULL != prev)) { /* get interval from previous page */
@@ -454,7 +452,8 @@ storage_number rrdeng_load_metric_next(struct rrddim_query_handle *rrdimm_handle
     struct rrdeng_page_descr *descr;
     storage_number *page, ret;
     unsigned position, entries;
-    usec_t next_page_time, current_position_time;
+    usec_t next_page_time, current_position_time, page_end_time;
+    uint32_t page_length;

     handle = &rrdimm_handle->rrdeng;
     if (unlikely(INVALID_TIME == handle->next_page_time)) {
@@ -464,15 +463,17 @@ storage_number rrdeng_load_metric_next(struct rrddim_query_handle *rrdimm_handle
     if (unlikely(NULL == (descr = handle->descr))) {
         /* it's the first call */
         next_page_time = handle->next_page_time * USEC_PER_SEC;
+    } else {
+        pg_cache_atomic_get_pg_info(descr, &page_end_time, &page_length);
     }

     position = handle->position + 1;
     if (unlikely(NULL == descr ||
-                 position >= (descr->page_length / sizeof(storage_number)))) {
+                 position >= (page_length / sizeof(storage_number)))) {
         /* We need to get a new page */
         if (descr) {
             /* Drop old page's reference */
-            handle->next_page_time = (descr->end_time / USEC_PER_SEC) + 1;
+            handle->next_page_time = (page_end_time / USEC_PER_SEC) + 1;
             if (unlikely(handle->next_page_time > rrdimm_handle->end_time)) {
                 goto no_more_metrics;
             }
@@ -492,26 +493,27 @@ storage_number rrdeng_load_metric_next(struct rrddim_query_handle *rrdimm_handle
         rrd_stat_atomic_add(&ctx->stats.metric_API_consumers, 1);
 #endif
         handle->descr = descr;
+        pg_cache_atomic_get_pg_info(descr, &page_end_time, &page_length);
         if (unlikely(INVALID_TIME == descr->start_time ||
-                     INVALID_TIME == descr->end_time)) {
+                     INVALID_TIME == page_end_time)) {
             goto no_more_metrics;
         }
-        if (unlikely(descr->start_time != descr->end_time && next_page_time > descr->start_time)) {
+        if (unlikely(descr->start_time != page_end_time && next_page_time > descr->start_time)) {
             /* we're in the middle of the page somewhere */
-            entries = descr->page_length / sizeof(storage_number);
-            position = ((uint64_t)(next_page_time - descr->start_time)) * entries /
-                       (descr->end_time - descr->start_time + 1);
+            entries = page_length / sizeof(storage_number);
+            position = ((uint64_t)(next_page_time - descr->start_time)) * (entries - 1) /
+                       (page_end_time - descr->start_time);
         } else {
             position = 0;
         }
     }
     page = descr->pg_cache_descr->page;
     ret = page[position];
-    entries = descr->page_length / sizeof(storage_number);
+    entries = page_length / sizeof(storage_number);
     if (entries > 1) {
         usec_t dt;

-        dt = (descr->end_time - descr->start_time) / (entries - 1);
+        dt = (page_end_time - descr->start_time) / (entries - 1);
         current_position_time = descr->start_time + position * dt;
     } else {
         current_position_time = descr->start_time;
```

```diff
diff --git a/database/engine/rrdengineapi.h b/database/engine/rrdengineapi.h
index 9b1ab1874..c876705e4 100644
--- a/database/engine/rrdengineapi.h
+++ b/database/engine/rrdengineapi.h
@@ -5,7 +5,7 @@

 #include "rrdengine.h"

-#define RRDENG_MIN_PAGE_CACHE_SIZE_MB (32)
+#define RRDENG_MIN_PAGE_CACHE_SIZE_MB (8)
 #define RRDENG_MIN_DISK_SPACE_MB (256)

 #define RRDENG_NR_STATS (33)
```

```diff
diff --git a/database/engine/rrdenginelib.c b/database/engine/rrdenginelib.c
index 96504b275..1a04dc2a4 100644
--- a/database/engine/rrdenginelib.c
+++ b/database/engine/rrdenginelib.c
@@ -8,7 +8,7 @@ void print_page_cache_descr(struct rrdeng_page_descr *descr)
 {
     struct page_cache_descr *pg_cache_descr = descr->pg_cache_descr;
     char uuid_str[UUID_STR_LEN];
-    char str[BUFSIZE];
+    char str[BUFSIZE + 1];
     int pos = 0;

     uuid_unparse_lower(*descr->id, uuid_str);
@@ -31,7 +31,7 @@ void print_page_cache_descr(struct rrdeng_page_descr *descr)
 void print_page_descr(struct rrdeng_page_descr *descr)
 {
     char uuid_str[UUID_STR_LEN];
-    char str[BUFSIZE];
+    char str[BUFSIZE + 1];
     int pos = 0;

     uuid_unparse_lower(*descr->id, uuid_str);
```