author: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
committer: Daniel Baumann <daniel.baumann@progress-linux.org> 2024-04-19 02:57:58 +0000
commit: be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97
tree: 9754ff1ca740f6346cf8483ec915d4054bc5da2d /fluent-bit/CHUNKS.md
parent: Initial commit.
Adding upstream version 1.44.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'fluent-bit/CHUNKS.md')
-rw-r--r-- fluent-bit/CHUNKS.md 109
1 files changed, 109 insertions, 0 deletions
diff --git a/fluent-bit/CHUNKS.md b/fluent-bit/CHUNKS.md
new file mode 100644
index 00000000..852cc8e6
--- /dev/null
+++ b/fluent-bit/CHUNKS.md
@@ -0,0 +1,109 @@
+# Fluent Bit Chunks (internals)
+
+When using Fluent Bit, you might read about `chunks`. A Chunk is a unit of
+data that groups multiple records of the same type (logs or metrics) under the same Tag.
+
+As part of the data ingestion workflow in the pipeline, input plugins, which are in
+charge of collecting information from different sources, encode the data as `records`
+in a MessagePack buffer and associate them with a Tag (tags are used for routing).
+
+Internally, Fluent Bit offers two APIs to _ingest_ records into the pipeline,
+depending on the type of message being ingested:
+
+- `flb_input_chunk_append_raw()`: logs ingestion, defined in `flb_input_chunk.c`
+- `flb_input_metrics_append()`: metrics ingestion, defined in `flb_input_metric.c`
+
+When either of these functions is invoked, the API looks for a pre-existing Chunk of
+the same type whose Tag exactly matches the one specified by the caller. If no such
+Chunk is available, a new one is created.
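+
+The lookup-or-create behavior described above can be sketched in C. Everything
+here is hypothetical: `struct chunk`, `struct chunk_pool` and `chunk_get()` are
+illustrative names, not Fluent Bit's internal API. The `available` flag stands in
+for whatever conditions make an existing Chunk unusable, which the text does not
+spell out.
+
+```c
+#include <stddef.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Hypothetical record types (logs vs. metrics). */
+enum chunk_type { CHUNK_LOGS = 0, CHUNK_METRICS = 1 };
+
+/* Hypothetical, simplified chunk descriptor (not Fluent Bit's struct). */
+struct chunk {
+    enum chunk_type type;
+    char tag[64];        /* tags longer than 63 chars are truncated here */
+    int available;       /* 0 when the chunk must not receive more data */
+    size_t records;      /* number of records appended so far */
+};
+
+#define MAX_CHUNKS 16
+
+struct chunk_pool {
+    struct chunk chunks[MAX_CHUNKS];
+    size_t count;
+};
+
+/* Return an available chunk with the same type and exact same tag, or
+ * create a new one. Returns NULL when the pool has no free slots. */
+static struct chunk *chunk_get(struct chunk_pool *pool,
+                               enum chunk_type type, const char *tag)
+{
+    for (size_t i = 0; i < pool->count; i++) {
+        struct chunk *c = &pool->chunks[i];
+        if (c->available && c->type == type && strcmp(c->tag, tag) == 0) {
+            return c;
+        }
+    }
+    if (pool->count == MAX_CHUNKS) {
+        return NULL;
+    }
+    struct chunk *c = &pool->chunks[pool->count++];
+    c->type = type;
+    snprintf(c->tag, sizeof(c->tag), "%s", tag);
+    c->available = 1;
+    c->records = 0;
+    return c;
+}
+```
+
+In this model, two appends with the same type and Tag land in the same Chunk,
+while a different type, or an existing Chunk that is no longer available, leads
+to a new one.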
+
+For reliability and flexibility reasons, an input plugin can be configured to keep all
+of its Chunks only in memory, or it can enable `storage.type filesystem` so that its
+Chunks are also stored in the filesystem.
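+
+A minimal sketch of the filesystem option in Fluent Bit's classic configuration
+format, using the `tail` input as an example. Filesystem buffering also relies on
+the `storage.path` setting in the `[SERVICE]` section, which sets the directory
+where Chunks are written; both paths below are placeholders.
+
+```
+[SERVICE]
+    # Placeholder directory for filesystem-backed Chunks
+    storage.path /var/lib/fluent-bit/storage/
+
+[INPUT]
+    Name         tail
+    # Placeholder log file to collect
+    Path         /var/log/app.log
+    storage.type filesystem
+```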
+
+## Chunk I/O: Low level
+
+At the low level, all Chunk management happens in a thin library called
+[Chunk I/O](https://github.com/edsiper/chunkio). This library provides
+different backend types, such as memory and filesystem, along with checksums and
+filesystem data synchronization.
+
+At the filesystem level, Chunks have their own format, which is completely agnostic
+of the content that Fluent Bit stores in them.
+
+The following is the layout of a Chunk in the file system:
+
+```
++--------------+----------------+
+|     0xC1     |     0x00       +--> Header 2 bytes
++--------------+----------------+
+|    4 BYTES CRC32 + 16 BYTES   +--> CRC32(Content) + Padding
++-------------------------------+
+|            Content            |
+|  +-------------------------+  |
+|  |         2 BYTES         +-----> Metadata Length
+|  +-------------------------+  |
+|  +-------------------------+  |
+|  |                         |  |
+|  |        Metadata         +-----> Optional Metadata (up to 65535 bytes)
+|  |                         |  |
+|  +-------------------------+  |
+|  +-------------------------+  |
+|  |                         |  |
+|  |       Content Data      +-----> User Data
+|  |                         |  |
+|  +-------------------------+  |
++-------------------------------+
+```
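+
+Following the diagram, the Metadata Length field sits at byte offset 22 (2 header
+bytes + 4 CRC32 bytes + 16 padding bytes), and the Metadata itself starts at
+offset 24. The sketch below, using the hypothetical names `struct chunk_view` and
+`chunk_parse()` (not the Chunk I/O API), splits a chunk file already loaded into
+memory into its Metadata and Content Data. It assumes the 2-byte length is stored
+big-endian and it does not verify the CRC32; check the Chunk I/O sources before
+relying on either detail.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Offsets derived from the layout diagram. */
+#define CHUNK_META_LEN_OFF  22   /* 2 (header) + 4 (CRC32) + 16 (padding) */
+#define CHUNK_META_OFFSET   24   /* metadata starts after the 2-byte length */
+
+/* Hypothetical view into an in-memory chunk file. */
+struct chunk_view {
+    const uint8_t *meta;   /* optional metadata */
+    size_t meta_len;
+    const uint8_t *data;   /* content data (user data) */
+    size_t data_len;
+};
+
+/* Returns 0 on success, -1 if buf does not match the expected layout.
+ * ASSUMPTION: the metadata length is stored big-endian. CRC32 not checked. */
+static int chunk_parse(const uint8_t *buf, size_t size, struct chunk_view *out)
+{
+    if (size < CHUNK_META_OFFSET || buf[0] != 0xC1 || buf[1] != 0x00) {
+        return -1;
+    }
+    size_t meta_len = ((size_t) buf[CHUNK_META_LEN_OFF] << 8) |
+                      buf[CHUNK_META_LEN_OFF + 1];
+    if (meta_len > size - CHUNK_META_OFFSET) {
+        return -1;   /* declared metadata would run past the buffer */
+    }
+    out->meta = buf + CHUNK_META_OFFSET;
+    out->meta_len = meta_len;
+    out->data = out->meta + meta_len;
+    out->data_len = size - CHUNK_META_OFFSET - meta_len;
+    return 0;
+}
+```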
+
+For Fluent Bit, the important areas of information are _Metadata_ and _Content Data_.
+
+## Metadata and Content Data
+
+In Fluent Bit, metadata and content handling changed between the original format used
+up to v1.8 and the new format introduced in the v1.9 series:
+
+### Fluent Bit >= v1.9
+
+In this version, the metadata starts with 4 bytes. The first two, `0xF1` and `0x77`,
+are magic bytes that identify the format version. The third byte, called `type`,
+specifies the type of records the Chunk stores: `0x0` for logs and `0x1` for metrics.
+The fourth byte is unused for now.
+
+The following diagram shows the data format:
+
+
+```
+                --   +---------+-------+
+               /     |  0xF1   | 0x77  |  <- Magic Bytes
+              /      +---------+-------+
+    Metadata <       |  Type   | 0x00  |  <- Chunk type and unused byte
+              \      +---------+-------+
+               \     |       Tag       |  <- Tag associated to records in the content
+                --   +-----------------+
+               /     |  +-----------+  |
+              /      |  |           |  |
+Content Data <       |  |  records  |  |
+              \      |  |           |  |
+               \     |  +-----------+  |
+                --   +-----------------+
+```
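+
+A minimal C sketch of writing v1.9-style metadata. `meta_encode()` and the
+`META_*` macros are hypothetical names; the byte values come from the description
+above. It assumes the Tag fills the rest of the metadata area without a
+terminating NUL byte, which the diagram suggests but does not state.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Byte values from the v1.9 metadata description. */
+#define META_MAGIC_0       0xF1
+#define META_MAGIC_1       0x77
+#define META_TYPE_LOGS     0x0
+#define META_TYPE_METRICS  0x1
+#define META_HEADER_SIZE   4
+
+/* Hypothetical helper: write the 4-byte header followed by the tag.
+ * Returns the number of bytes written, or 0 if the result does not fit
+ * in out or exceeds the 2-byte Metadata Length limit. */
+static size_t meta_encode(uint8_t *out, size_t out_size,
+                          uint8_t type, const char *tag)
+{
+    size_t tag_len = strlen(tag);
+    size_t total = META_HEADER_SIZE + tag_len;
+
+    if (total > out_size || total > UINT16_MAX) {
+        return 0;
+    }
+    out[0] = META_MAGIC_0;
+    out[1] = META_MAGIC_1;
+    out[2] = type;
+    out[3] = 0x00;                                   /* unused byte */
+    memcpy(out + META_HEADER_SIZE, tag, tag_len);    /* no NUL terminator */
+    return total;
+}
+```
+
+The `UINT16_MAX` check mirrors the 2-byte Metadata Length field in the Chunk I/O
+layout, which caps metadata at 65535 bytes.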
+
+The Fluent Bit API remains backward compatible with the previous metadata and content
+format used in the v1.8 series.
+
+### Fluent Bit <= v1.8
+
+Up to Fluent Bit v1.8.x, the metadata and content data are simple: the metadata
+stores only the Tag, and the content data stores the MessagePack records.
+
+```
+                     +-----------------+
+    Metadata <       |       Tag       |  <- Tag associated to records in the content
+                --   +-----------------+
+               /     |  +-----------+  |
+              /      |  |           |  |
+Content Data <       |  |  records  |  |
+              \      |  |           |  |
+               \     |  +-----------+  |
+                --   +-----------------+
+```
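+
+One way to read both formats, sketched in C with hypothetical names
+(`struct meta_info`, `meta_decode()`): metadata that begins with the `0xF1 0x77`
+magic bytes is parsed as v1.9+, and anything else is treated as a v1.8 Tag.
+Because v1.8 metadata carries no type byte, the sketch assumes such Chunks hold
+logs.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+#define META_HEADER_SIZE 4
+
+/* Hypothetical result of reading chunk metadata. */
+struct meta_info {
+    int type;             /* 0x0 logs, 0x1 metrics */
+    const char *tag;      /* points into the metadata; not NUL-terminated */
+    size_t tag_len;
+};
+
+/* Decode metadata written by either format.
+ * ASSUMPTION: metadata without the magic bytes is a v1.8 logs chunk. */
+static void meta_decode(const uint8_t *meta, size_t meta_len,
+                        struct meta_info *out)
+{
+    if (meta_len >= META_HEADER_SIZE && meta[0] == 0xF1 && meta[1] == 0x77) {
+        out->type = meta[2];                               /* v1.9+ */
+        out->tag = (const char *) meta + META_HEADER_SIZE;
+        out->tag_len = meta_len - META_HEADER_SIZE;
+    } else {
+        out->type = 0x0;                                   /* v1.8 */
+        out->tag = (const char *) meta;
+        out->tag_len = meta_len;
+    }
+}
+```
+
+The magic-byte check works as a discriminator because Tags are normally printable
+text, so a v1.8 Tag is unlikely to begin with the non-ASCII byte `0xF1`.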