Diffstat (limited to 'doc/wiki/Design.InputStreams.txt')
-rw-r--r-- doc/wiki/Design.InputStreams.txt 207
1 files changed, 207 insertions, 0 deletions
diff --git a/doc/wiki/Design.InputStreams.txt b/doc/wiki/Design.InputStreams.txt
new file mode 100644
index 0000000..c64d4cb
--- /dev/null
+++ b/doc/wiki/Design.InputStreams.txt
@@ -0,0 +1,207 @@
+Input Streams
+=============
+
+'lib/istream.h' describes Dovecot's input streams. Input streams can be stacked
+on top of each other as many times as needed.
+
+Input streams actually reading data:
+
+ * file: Read data from fd using 'pread()' for files and 'read()' for
+ non-files.
+ * unix: Read data from UNIX socket. Similar to file, but supports receiving
+ file descriptors.
+ * mmap: Read data from file using 'mmap()'. This usually seems to be slower
+ than just using 'read()', so this input stream is probably quite
+ unnecessary.
+ * data: Read data from memory.
+
+Input stream filters:
+
+ * concat: Concatenate multiple input streams together.
+ * chain: Chain multiple input streams together. Similar to istream-concat, but
+ more istreams can be added after initialization and EOF needs to be
+ explicitly added.
+ * seekable: Make a number of (possibly non-seekable) input streams into a
+ single seekable input stream. If all of the input streams are already
+ seekable, a concat stream is created instead.
+ * Usually the only non-seekable input streams are non-file fds, such as
+ pipes or sockets.
+ * crlf: Change all newlines to either LFs or CRLFs, by adding or removing CRs
+ as necessary.
+ * limit: Limit input stream's length, so after reading a given number of bytes
+ it returns EOF.
+ * sized: Require istream's length to be exactly the given size, or the last
+ read returns an error.
+ * timeout: Fail the read when given timeout is reached.
+ * try: Read from the first input stream that doesn't fail with EINVAL.
+ * tee: Fork an input stream to multiple streams that can be read
+ independently.
+ * multiplex: Multiplex-iostreams support multiple iostream channels inside a
+ single parent istream.
+ * callback: Build an input stream by calling callback functions that return
+ the data.
+ * base64-encoder, base64-decoder: Encode/decode base64.
+ * failure-at: Insert a failure at the specified offset. This can be useful for
+ testing.
+ * hash: Calculate hash of the istream while it's being read.
+ * lib-mail/dot: Read SMTP-style DATA input where the input ends with an empty
+ "." line.
+ * lib-mail/header-filter: Add/remove/modify email headers.
+ * lib-compression/*: Read zlib/bzlib/lz4/lzma compressed data.
+
+Reading
+-------
+
+'i_stream_read()' tries to read more data into the stream's buffer. It returns:
+
+ * -2: Nothing was read, because buffer is full.
+ * -1: Either input reached EOF, or read failed and stream_errno was set.
+ * 0: Input stream is non-blocking, and no more input is available now.
+ * >0: Number of bytes read.
+
+Reading from a stream doesn't actually go forward in the stream; that needs to
+be done manually with 'i_stream_skip()'. This makes it easy to read full data
+records into the stream directly instead of creating separate buffers. For
+example, when reading line-based input you can keep reading input into the
+stream until you find LF and then just access the string directly from the
+input buffer. There are helper functions for this:
+'i_stream_next_line()' attempts to return the next line if available, and
+'i_stream_read_next_line()' does the same but also does a read to try to get
+the data.
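+
+The "read now, skip later" idea can be illustrated with a small standalone
+model. This is only a sketch: 'struct line_buf' and 'line_buf_next_line()' are
+hypothetical names, not part of lib/istream.h, and replacing the LF with NUL
+in place is an assumption made to keep the example short.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stream buffer; not Dovecot's struct. */
struct line_buf {
	char *data;   /* buffered input, writable so LF can become NUL */
	size_t skip;  /* bytes already consumed */
	size_t pos;   /* end of the valid data */
};

/* Return the next complete line (without its LF) directly from the buffer,
   or NULL if no LF has been buffered yet. Only a returned line is consumed. */
static char *line_buf_next_line(struct line_buf *b)
{
	char *start = b->data + b->skip;
	char *lf = memchr(start, '\n', b->pos - b->skip);

	if (lf == NULL)
		return NULL; /* incomplete line stays in the buffer */
	*lf = '\0';
	b->skip = (size_t)(lf - b->data) + 1; /* skip past the line and LF */
	return start;
}
```
+
+An incomplete trailing line is left untouched in the buffer, so a later read
+can append the rest of it before the line is returned.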
+
+Because more and more data can be read into the buffer, the buffer size is
+typically limited, and once this limit is reached read returns -2. The buffer
+size is usually given as a parameter to the 'i_stream_create_*()' functions;
+filters use their parent stream's buffer size. The buffer size can also be
+changed with 'i_stream_set_max_buffer_size()'. Choosing the buffer size
+depends on the situation. It should be large enough to contain all valid input,
+but small enough that users can't cause a DoS by sending an overly large record
+and having Dovecot eat up all the memory.
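+
+The interaction between the buffer limit, skipping and the return values can
+be sketched with a toy reader over an in-memory source. Everything here
+('struct toy_reader', 'toy_read()', the 8-byte limit) is hypothetical and only
+mimics the return codes described above; it is not how the real istreams are
+implemented.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_BUFFER_SIZE 8 /* hypothetical, tiny limit for illustration */

struct toy_reader {
	const char *src;  /* remaining source input */
	size_t src_left;
	char buf[TOY_MAX_BUFFER_SIZE];
	size_t skip, pos; /* consumed bytes / end of valid data */
};

/* Returns -2 if the buffer is full, -1 at EOF, else the bytes read. */
static int toy_read(struct toy_reader *r)
{
	size_t space, n;

	if (r->skip > 0) {
		/* reclaim the space used by already-skipped bytes */
		memmove(r->buf, r->buf + r->skip, r->pos - r->skip);
		r->pos -= r->skip;
		r->skip = 0;
	}
	space = sizeof(r->buf) - r->pos;
	if (space == 0)
		return -2; /* caller must skip something first */
	if (r->src_left == 0)
		return -1; /* EOF; buffered data may still remain */
	n = r->src_left < space ? r->src_left : space;
	memcpy(r->buf + r->pos, r->src, n);
	r->src += n;
	r->src_left -= n;
	r->pos += n;
	return (int)n;
}
```
+
+In this model a -2 return is not an error: the caller either consumes some of
+the buffered data and retries, or treats the record as too large and gives up.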
+
+Once read returns -1, the stream has reached EOF or the read has failed
+('stream_errno' is set in the latter case). 'stream->eof=TRUE' is also
+set. In this situation it's important to remember that there may still be data
+available in the buffer. If 'i_stream_have_bytes_left()' returns FALSE, there
+really isn't anything left to read.
+
+Whenever 'i_stream_read()' returns >0, all the existing pointers are potentially
+invalidated. v2.3+: When 'i_stream_read()' returns <= 0, the data previously
+returned by 'i_stream_get_data()' is still valid, preserved in "snapshots".
+(<v2.3 may or may not have invalidated it.)
+
+Example:
+
+---%<-------------------------------------------------------------------------
+/* read line-based data from file_fd, buffer size has no limits */
+struct istream *input = i_stream_create_fd(file_fd, (size_t)-1, FALSE);
+const char *line;
+
+/* return the last line also even if it doesn't end with LF.
+ this is generally a good idea when reading files (but not a good idea
+ when reading commands from e.g. socket). */
+i_stream_set_return_partial_line(input, TRUE);
+while ((line = i_stream_read_next_line(input)) != NULL) {
+ /* handle line */
+}
+i_stream_destroy(&input);
+---%<-------------------------------------------------------------------------
+
+Internals
+---------
+
+'lib/istream-internal.h' describes the internal API that input streams need to
+implement. The methods that need to be implemented are:
+
+ * 'read()' is the most important function. It can also be tricky to get it
+ completely bug-free. See the existing unit tests for other istreams and try
+ to test the edge cases as well (such as ability to read one byte at a time
+ and also with max buffer size of 1). When it needs to read from parent
+ streams, try to use 'i_stream_read_memarea(parent)' if possible so a new
+ snapshot isn't unnecessarily created (see the snapshot discussion below).
+ * 'seek(v_offset, mark)' seeks to given offset. The 'mark' parameter is
+ necessary only when it's difficult to seek backwards in the stream, such as
+ when reading compressed input.
+ * 'sync()' removes everything from internal buffers, so that if the underlying
+ file has changed the changes get noticed immediately after sync.
+ * 'get_size(exact)' returns the size of the input stream, if it's known. If
+ 'exact=TRUE', the returned size must equal the number of bytes that can be
+ read from the input. If 'exact=FALSE', the size is mainly used to compare
+ against another stat to see if the underlying input has changed. For example
+ with compressed input the size could be the compressed size.
+ * 'stat(exact)' stats the file, filling as many of the fields as make sense.
+ 'st_size' field is filled the same way as with 'get_size()', or set to -1 if
+ it's unknown.
+ * 'snapshot(prev_snapshot)' creates a snapshot of the data that is currently
+ available via 'i_stream_get_data()', merges it with prev_snapshot (if any) and
+ returns the merged snapshot (see below for more details).
+
+There are some variables available:
+
+ * 'buffer' contains pointer to the data.
+ * First 'skip' bytes of the buffer are already skipped over (with
+ 'i_stream_skip()' or seeking).
+ * Data up to 'pos' bytes (beginning after 'skip') in the buffer are available
+ with 'i_stream_get_data()'. If pos=skip, it means there is no available data
+ in the buffer.
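+
+The relationship between 'buffer', 'skip' and 'pos' can be modelled in a few
+lines. The following is a simplified sketch with hypothetical names
+('struct toy_istream', 'toy_get_data()', 'toy_skip()'); it only mirrors the
+arithmetic described above and is not the real 'struct istream_private'.
+
```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the buffer fields described above. */
struct toy_istream {
	const unsigned char *buffer; /* start of the buffered data */
	size_t skip;                 /* bytes already skipped over */
	size_t pos;                  /* end of the valid data in buffer */
};

/* Models i_stream_get_data(): return the unread data and its size. */
static const unsigned char *
toy_get_data(const struct toy_istream *s, size_t *size_r)
{
	assert(s->skip <= s->pos);
	*size_r = s->pos - s->skip; /* 0 when pos == skip */
	return s->buffer + s->skip;
}

/* Models i_stream_skip(): consume count bytes of the unread data. */
static void toy_skip(struct toy_istream *s, size_t count)
{
	assert(count <= s->pos - s->skip);
	s->skip += count;
}
```
+
+Skipping never touches the data itself; it only moves the point from which
+the available data is considered to start.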
+
+If your input stream needs a write buffer, you can use some of the common
+helper functions and variables:
+
+ * 'w_buffer' contains the pointer where you can write data. It should be kept
+ in sync with 'buffer'.
+ * 'buffer_size' specifies the buffer's size, and 'max_buffer_size' the max.
+ size the buffer can be grown to.
+ * 'i_stream_try_alloc(wanted_size, size_r)' can be used when you want to store
+ 'wanted_size' bytes into 'w_buffer'. If the buffer isn't large enough for it,
+ it's grown if possible. The buffer isn't grown above the stream's max buffer
+ size. The returned 'size_r' specifies how many bytes are actually available
+ for writing at 'stream->w_buffer + stream->pos'.
+ * 'i_stream_alloc(size)' is like 'i_stream_try_alloc()', except it always
+ succeeds in allocating 'size' bytes, even if it has to grow the buffer larger
+ than the stream's max buffer size.
+ * Lower-level memory allocation functions:
+ * 'i_stream_w_buffer_realloc(old_size)' reallocates 'w_buffer' to the
+ current 'buffer_size'. If memarea's refcount is 1, this can be done with
+ 'i_realloc()', otherwise new memory is allocated.
+ * 'i_stream_grow_buffer(bytes)' grows the 'w_buffer' by the given number of
+ bytes, if possible. It won't grow beyond the stream's current max buffer size.
+ The caller must verify from 'buffer_size' how large the buffer became as
+ a result of this call.
+ * 'i_stream_compress()' attempts to compress the current 'w_buffer' by
+ removing already-skipped data with 'memmove()'. If 'skip' is 0, it does
+ nothing. Note that this function must not be called if 'memarea' has
+ refcount>1, because that could modify a snapshotted memarea.
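+
+How a bounded "try alloc" and buffer compression fit together can be sketched
+as follows. This is a standalone model under stated assumptions: the names
+('struct toy_wbuf', 'toy_compress()', 'toy_try_alloc()') are hypothetical, it
+uses plain 'realloc()', and it ignores memareas and snapshots entirely, so it
+is not a description of the real helper functions.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_wbuf {
	unsigned char *w_buffer;
	size_t buffer_size;     /* currently allocated size */
	size_t max_buffer_size; /* growth limit */
	size_t skip, pos;
};

/* Drop already-skipped bytes from the front; no-op if skip is 0. */
static void toy_compress(struct toy_wbuf *b)
{
	if (b->skip == 0)
		return;
	memmove(b->w_buffer, b->w_buffer + b->skip, b->pos - b->skip);
	b->pos -= b->skip;
	b->skip = 0;
}

/* Try to make room for wanted_size bytes at w_buffer + pos without growing
   past max_buffer_size. Stores the writable byte count in *size_r and
   returns true if any space is available. */
static bool toy_try_alloc(struct toy_wbuf *b, size_t wanted_size,
			  size_t *size_r)
{
	if (b->buffer_size - b->pos < wanted_size)
		toy_compress(b);
	if (b->buffer_size - b->pos < wanted_size &&
	    b->buffer_size < b->max_buffer_size) {
		size_t new_size = b->pos + wanted_size;
		unsigned char *p;

		if (new_size > b->max_buffer_size)
			new_size = b->max_buffer_size;
		p = realloc(b->w_buffer, new_size);
		if (p != NULL) {
			b->w_buffer = p;
			b->buffer_size = new_size;
		}
	}
	*size_r = b->buffer_size - b->pos;
	return *size_r > 0;
}
```
+
+The caller has to look at 'size_r' rather than assume that 'wanted_size'
+bytes fit, since the limit may have capped the growth.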
+
+Snapshots have made implementing istreams slightly more complicated than
+before. There are a few different ways to implement istreams:
+
+ * Always point 'buffer=w_buffer' and use 'i_stream_try_alloc()' and/or
+ 'i_stream_alloc()' to allocate the 'w_buffer'. The generic code will handle
+ all the snapshotting. Use 'i_stream_read_memarea()' to read data from parent
+ stream so multiple snapshots aren't unnecessarily created.
+ * Guarantee that if 'read()' returns <=0, the existing 'buffer' will stay
+ valid. Use 'ISTREAM_CREATE_FLAG_NOOP_SNAPSHOT' flag in 'i_stream_create()'
+ so your filter stream isn't unnecessarily snapshotted (and doesn't cause a
+ panic due to a missing 'snapshot()' implementation).
+ * One way of doing this with filter streams is to read from the parent
+ stream via 'i_stream_read(parent)' and always use
+ 'buffer=i_stream_get_data(parent)'. The parent's snapshotting guarantees
+ that the buffer will stay valid.
+ * Implement the 'snapshot()' yourself in the stream. You'll need to create a
+ new memarea of the current data available via 'i_stream_get_data()' and it
+ must not change, i.e. most likely you'll need to duplicate the allocated
+ memory. Create a new 'struct istream_snapshot' and assign the allocated
+ memarea to its 'old_memarea'. Fill 'prev_snapshot' field and return your new
+ snapshot. The snapshot will be freed by the generic istream code either when
+ the next 'read()' returns >0 or when the istream is destroyed.
+ * Filter streams that only pass through parent stream's contents without
+ changes can just point to the parent stream. The default snapshotting causes
+ the parent to be snapshotted, so the filter stream can simply use
+ 'i_stream_read_memarea()' and point to the parent's buffer.
+
+When Dovecot is configured with '--enable-devel-checks', 'i_stream_read()' will
+verify that the first and the last two bytes of the buffer didn't unexpectedly
+change due to a 'read()'. While developing istream changes you should use this
+to make sure the istream is working properly. Running the istream unit tests
+via valgrind can also be used to verify that the buffer wasn't freed.
+
+(This file was created from the wiki on 2019-06-19 12:42)