2014/04/16 - Pointer assignments during processing of the HTTP body

In HAProxy, a struct http_msg is a descriptor for an HTTP message, which stores
the state of an HTTP parser at any given instant, relative to a buffer which
contains part of the message being inspected.

Currently, an http_msg holds a few pointers and offsets to some important
locations in a message depending on the state the parser is in. Some of these
pointers and offsets may move when data are inserted into or removed from the
buffer, others won't move.

An important point is that the state of the parser only reflects what the
parser is reading, and not at all what is being done on the message (e.g.
forwarding).

For an HTTP message msg and a buffer buf, we have the following elements to
work with :


Buffer :
--------

buf.size : the allocated size of the buffer. A message cannot be larger than
           this size. In general, a message will even be smaller because the
           size is almost always reduced by global.maxrewrite bytes.

buf.data : memory area containing the part of the message being worked on. This
           area is exactly buf.size bytes long. It should be seen as a sliding
           window over the message, but in terms of implementation, it's closer
           to a wrapping window. For ease of processing, new messages (requests
           or responses) are aligned to the beginning of the buffer so that they
           never wrap and common string processing functions can be used.

buf.p    : memory pointer (char *) to the beginning of the buffer as the parser
           understands it. It commonly refers to the first character of an HTTP
           request or response, but during forwarding, it can point to other
           locations. This pointer always points to a location in buf.data.

buf.i    : number of bytes after buf.p that are available in the buffer. If
           buf.p + buf.i exceeds buf.data + buf.size, then the pending data
           wrap at the end of the buffer and continue at buf.data.

buf.o    : number of bytes already processed before buf.p that are pending
           for departure. These bytes may leave at any instant once a connection
           is established.
           These bytes may wrap backwards past buf.data, in which case
           they start before buf.data + buf.size.

It's common to call the part between buf.p and buf.p+buf.i the input buffer, and
the part between buf.p-buf.o and buf.p the output buffer. This design permits
efficient forwarding without copies. As a result, forwarding one byte from the
input buffer to the output buffer only consists in :
     - incrementing buf.p
     - incrementing buf.o
     - decrementing buf.i


Message :
---------
Unless stated otherwise, all values are relative to buf.p, and are always
between 0 and buf.i. These values are relative offsets and they do not need to
take wrapping into account, they are used as if the buffer was an infinite
length sliding window. The buffer management functions handle the wrapping
automatically.

msg.next : points to the next byte to inspect. This offset is automatically
           adjusted when inserting/removing some headers. In data states, it is
           automatically adjusted to the number of bytes already inspected.

msg.sov  : start of value. First character of the header's value in the header
           states, start of the body in the data states. Strictly positive
           values indicate that headers were not forwarded yet (buf.p is
           before the start of the body), and null or negative values are seen
           after headers are forwarded (buf.p is at or past the start of the
           body). The value stops changing when data start to leave the buffer
           (in order to avoid integer overflows). So the maximum possible range
           is -buf.size to +buf.size. This offset is automatically adjusted
           when inserting or removing some headers. It is useful to rewind the
           request buffer to the beginning of the body at any phase. The
           response buffer does not really use it since it is immediately
           forwarded to the client.

msg.sol  : start of line. Points to the beginning of the current header line
           while parsing headers.
           It is cleared to zero in the BODY state, and contains exactly the
           number of bytes comprising the preceding chunk size in the DATA
           state (which can be zero), so that the sum of msg.sov + msg.sol
           always points to the beginning of data for all states starting with
           DATA. For chunked encoded messages, this sum always corresponds to
           the beginning of the current chunk of data as it appears in the
           buffer, or to be more precise, it corresponds to the first of the
           remaining bytes of chunked data to be inspected. In TRAILERS state,
           it contains the length of the last parsed part of the trailer
           headers.

msg.eoh  : end of headers. Points to the CRLF (or LF) preceding the body and
           marking the end of headers. It is where new headers are appended.
           This offset is automatically adjusted when inserting/removing some
           headers. It always contains the size of the headers excluding the
           trailing CRLF even after headers have been forwarded.

msg.eol  : end of line. Points to the CRLF or LF of the current header line
           being inspected during the various header states. In data states, it
           holds the trailing CRLF length (1 or 2) so that msg.eoh + msg.eol
           always equals the exact header length. It is not affected during
           data states nor by forwarding.

The beginning of the message headers can always be found this way even after
headers or data have been forwarded, provided that everything is still present
in the buffer :

            headers = buf.p + msg->sov - msg->eoh - msg->eol


Message length :
----------------
msg.chunk_len : amount of bytes of the current chunk or total message body
                remaining to be inspected after msg.next. It is automatically
                incremented when parsing a chunk size, and decremented as data
                are forwarded.

msg.body_len  : total message body length, for logging. Equals Content-Length
                when used, otherwise is the sum of all correctly parsed chunks.


Message state :
---------------
msg.msg_state contains the current parser state, one of HTTP_MSG_*. The state
indicates what byte is expected at msg->next.

HTTP_MSG_BODY       : all headers have been parsed, parsing of body has not
                      started yet.

HTTP_MSG_100_SENT   : parsing of body has started. If a 100-Continue was needed
                      it has already been sent.

HTTP_MSG_DATA       : some bytes are remaining for either the whole body when
                      the message size is determined by Content-Length, or for
                      the current chunk in chunked-encoded mode.

HTTP_MSG_CHUNK_CRLF : msg->next points to the CRLF after the current data chunk.

HTTP_MSG_TRAILERS   : msg->next points to the beginning of a possibly empty
                      trailer line after the final empty chunk.

HTTP_MSG_DONE       : all the Content-Length data has been inspected, or the
                      final CRLF after trailers has been met.


Message forwarding :
--------------------
Forwarding part of a message consists in advancing buf.p up to the point where
it points to the byte following the last one to be forwarded. This can be done
inline if enough bytes are present in the buffer, or in multiple steps if more
buffers need to be forwarded (possibly including splicing). Thus by definition,
after a block has been scheduled for being forwarded, msg->next and msg->sov
must be reset.

The communication channel between the producer and the consumer holds a counter
of extra bytes remaining to be forwarded directly without consulting analysers,
after buf.p. This counter is called to_forward. It commonly holds the advertised
chunk length or content-length that does not fit in the buffer. For example, if
2000 bytes are to be forwarded, and 10 bytes are present after buf.p as reported
by buf.i, then both buf.o and buf.p will advance by 10, buf.i will be reset, and
to_forward will be set to 1990 so that in total, 2000 bytes will be forwarded.
At the end of the forwarding, buf.p will point to the first byte to be inspected
after the 2000 forwarded bytes.
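
The forwarding arithmetic described above can be illustrated with a minimal
sketch. It is not HAProxy's actual code: struct sketch_buf and both functions
are hypothetical names made up for this note, buf.p is modelled as an offset
into the storage area rather than a char pointer, and to_forward is kept in
the same struct for brevity although the text places it in the channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the buffer and channel counters.
 * Field names follow the text; the struct itself is an assumption.
 */
struct sketch_buf {
	size_t size;        /* allocated size of the storage area */
	size_t p;           /* position of buf.p, as an offset in the area */
	size_t i;           /* input bytes available after p */
	size_t o;           /* output bytes pending for departure before p */
	size_t to_forward;  /* bytes still to be forwarded once they arrive */
};

/* Move <n> bytes from the input part to the output part without copying:
 * p and o advance by <n>, i shrinks by <n>.
 */
void sketch_advance(struct sketch_buf *b, size_t n)
{
	assert(n <= b->i);             /* cannot forward more than is present */
	b->p = (b->p + n) % b->size;   /* p wraps at the end of the area */
	b->o += n;
	b->i -= n;
}

/* Schedule <len> bytes for forwarding. What is already present in the
 * input part is moved inline, the remainder is recorded in to_forward.
 */
void sketch_forward(struct sketch_buf *b, size_t len)
{
	size_t now = len < b->i ? len : b->i;

	sketch_advance(b, now);
	b->to_forward += len - now;
}
```

Starting from p = 0, i = 10, o = 0 and calling sketch_forward() with 2000, this
sketch leaves i at 0, p and o at 10, and to_forward at 1990, which matches the
example given in the text.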