Diffstat
-rw-r--r-- doc/design-thoughts/http2.txt 277
1 files changed, 277 insertions, 0 deletions
diff --git a/doc/design-thoughts/http2.txt b/doc/design-thoughts/http2.txt
new file mode 100644
index 0000000..c21ac10
--- /dev/null
+++ b/doc/design-thoughts/http2.txt
@@ -0,0 +1,277 @@
+2014/10/23 - design thoughts for HTTP/2
+
+- connections : HTTP/2 depends a lot more on a connection than HTTP/1 because a
+ connection holds a compression context (headers table, etc...). We probably
+ need to have an h2_conn struct.
+
+- multiple transactions will be handled in parallel for a given h2_conn. They
+ are called streams in HTTP/2 terminology.
+
+- multiplexing : for a given client-side h2 connection, we can have multiple
+ server-side h2 connections. And for a server-side h2 connection, we can have
+ multiple client-side h2 connections. Streams circulate in N-to-N fashion.
+
+- flow control : flow control will be applied between multiple streams. Special
+ care must be taken so that an H2 client cannot block some H2 servers by
+ sending requests spread over multiple servers to the point where one server
+ response is blocked and prevents other responses from the same server from
+ reaching their clients. H2 connection buffers must always be empty or nearly
+ empty. The per-stream flow control needs to be respected as well as the
+ connection's buffers. It is important to implement some fairness between all
+ the streams so that it's not always the same one that gets the bandwidth when
+ the connection is congested.
+
+- some clients can be H1 with an H2 server (is this really needed ?). Most of
+ the initial use cases will be H2 clients to H1 servers. It is important to keep
+ in mind that H1 servers do not do flow control and that we don't want them to
+ block transfers (eg: post upload).
+
+- internal tasks : some H2 clients will be internal tasks (eg: health checks).
+ Some H2 servers will be internal tasks (eg: stats, cache). The model must be
+ compatible with this use case.
+
+- header indexing : headers are transported compressed, with a reference to a
+ static or a dynamic header, or a literal, possibly Huffman-encoded. Indexing
+ is specific to the H2 connection. This means there is no way any binary data
+ can flow between both sides: headers will have to be decoded according to the
+ incoming connection's context and re-encoded according to the outgoing
+ connection's context, which can significantly differ. In order to avoid the
+ parsing trouble we currently face, headers will have to be clearly split
+ between name and value. It is worth noting that neither the incoming nor the
+ outgoing connections' contexts will be of any use while processing the
+ headers. At best we can have some shortcuts for well-known names that map
+ well to the static ones (eg: use the first static entry with same name), and
+ maybe have a few special cases for static name+value as well. Probably we can
+ classify headers in such categories :
+
+ - static name + value
+ - static name + other value
+ - dynamic name + other value
+
+ This will allow for better processing in some specific cases. Headers
+ supporting a single value (:method, :status, :path, ...) should probably
+ be stored in a single location with a direct access. That would allow us
+ to retrieve a method using hdr[METHOD]. All such indexing must be performed
+ while parsing. That also means that HTTP/1 will have to be converted to this
+ representation very early in the parser and possibly converted back to H/1
+ after processing.
+
+ Header names/values will have to be placed in a small memory area that will
+ inevitably get fragmented as headers are rewritten. An automatic packing
+ mechanism must be implemented so that when there's no more room, headers are
+ simply defragmented/packed into a new table and the old one is released. Just
+ like for the static chunks, we need to have a few such tables pre-allocated
+ and ready to be swapped at any moment. Repacking must not change any index
+ nor affect the way headers are compressed so that it can happen late after a
+ retry (send-name-header for example).
+
+- header processing : can still happen on a (header, value) basis. Reqrep/
+ rsprep completely disappear and will have to be replaced with something else
+ to support renaming headers and rewriting url/path/...
+
+- push_promise : servers can push dummy requests+responses. They advertise
+ the new stream ID in a push_promise frame sent on the associated stream.
+ This means that it is possible to initiate a client-server stream from the
+ information coming from the server and make the data flow as if the client
+ had made it. It's likely that we'll have to support two types of server
+ connections: those which support push and those which do not. That way client
+ streams will be distributed to existing server connections based on their
+ capabilities. It's important to keep in mind that PUSH will not be rewritten
+ in responses.
+
+- stream ID mapping : since the stream ID is per H2 connection, stream IDs will
+ have to be mapped. Thus a given stream is an entity with two IDs (one per
+ side). Or more precisely a stream has two end points, each one carrying an ID
+ when it ends on an HTTP/2 connection. Also, for each stream ID we need to
+ quickly find the associated transaction in progress. Using a small quick
+ unique tree seems indicated considering the wide range of valid values.
+
+- frame sizes : frames have to be remapped between both sides as multiplexed
+ connections won't always have the same characteristics. Thus some frames
+ might be spliced and others will be sliced.
+
+- error processing : care must be taken to never break a connection unless it
+ is dead or corrupt at the protocol level. Stats counters must exist to observe
+ the causes. Timeouts are a great problem because silent connections might
+ die out of inactivity. Ping frames should probably be scheduled a few seconds
+ before the connection timeout so that an unused connection is verified before
+ being killed. Abnormal requests must be dealt with using RST_STREAM.
+
+- ALPN : ALPN must be observed on the client side, and transmitted to the server
+ side.
+
+- proxy protocol : proxy protocol makes little to no sense in a multiplexed
+ protocol. A per-stream equivalent will surely be needed if implementations
+ do not quickly generalize the use of Forward.
+
+- simplified protocol for local devices (eg: haproxy->varnish in clear and
+ without handshake, and possibly even with splicing if the connection's
+ settings are shared)
+
+- logging : logging must report extra information such as the
+ stream ID, and whether the transaction was initiated by the client or by the
+ server (which can be deduced from the stream ID's parity). In case of push,
+ the number of the associated stream must also be reported.
+
+- memory usage : H2 increases memory usage by mandating a minimum frame size
+ of 16384 bytes. That means slightly more than 16kB of buffer in each
+ direction to process any frame. It will definitely have an impact on the
+ deployed maxconn setting in places using less than this (4..8kB are common).
+ Also, the header list is persistent per connection, so if we reach the same
+ size as the request, that's another 16kB in each direction, resulting in
+ about 48kB of memory where 8 were previously used. A more careful encoder
+ can work with a much smaller set even if that implies evicting entries
+ between multiple headers of the same message.
+
+- HTTP/1.0 should very carefully be transported over H2. Since there's no way
+ to pass version information in the protocol, the server could use some
+ features of HTTP/1.1 that are unsafe in HTTP/1.0 (compression, trailers,
+ ...).
+
+- host / :authority : ":authority" is the norm, and "host" will be absent when
+ H2 clients generate :authority. This probably means that a dummy Host header
+ will have to be produced internally from :authority and removed when passing
+ to H2 behind. This can cause some trouble when passing H2 requests to H1
+ proxies, because there's no way to know if the request should contain scheme
+ and authority in H1 or not based on the H2 request. Thus a "proxy" option
+ will have to be explicitly mentioned on HTTP/1 server lines. One of the
+ problems this creates is that it's no longer possible to pass H/1 requests
+ to H/1 proxies without an explicit configuration. Maybe a table of the
+ various combinations is needed.
+
+                      :scheme    :authority    host
+  HTTP/2 request      present    present       absent
+  HTTP/1 server req   absent     absent        present
+  HTTP/1 proxy req    present    present       present
+
+ So in the end the issue is only with H/2 requests passed to H/1 proxies.
+
+- ping frames : they don't indicate any stream ID so by definition they cannot
+ be forwarded to any server. The H2 connection should deal with them only.
+
+There's a layering problem with H2. The framing layer has to be aware of the
+upper layer semantics. We can't simply re-encode HTTP/1 to HTTP/2 then pass
+it over a framing layer to mux the streams; the frame type must be passed below
+so that frames are properly arranged. Header encoding is connection-based and
+all streams using the same connection will interact in the way their headers
+are encoded. Thus the encoder *has* to be placed in the h2_conn entity, and
+this entity has to know for each stream what its headers are.
+
+We should probably remove *all* headers from transported data and move
+them on the fly to a parallel structure that can be shared between H1 and H2
+and consumed at the appropriate level. That means buffers only transport data.
+Trailers have to be dealt with differently.
+
+So if we consider an H1 request being forwarded between a client and a server,
+it would look approximately like this :
+
+ - request header + body land into a stream's receive buffer
+ - headers are indexed and stripped out so that only the body and whatever
+ follows remain in the buffer
+ - both the header index and the buffer with the body stay attached to the
+ stream
+ - the sender can rebuild the whole headers. Since they're found in a table
+ supposed to be stable, it can rebuild them as many times as desired and
+ will always get the same result, so it's safe to build them into the trash
+ buffer for immediate sending, just as we do for the PROXY protocol.
+ - the upper protocol should probably provide a build_hdr() callback which
+ when called by the socket layer, builds this header block based on the
+ current stream's header list, ready to be sent.
+ - the socket layer has to know how many bytes from the headers are left to be
+ forwarded prior to processing the body.
+ - the socket layer needs to consume only the acceptable part of the body and
+ must not release the buffer if any data remains in it (eg: pipelining over
+ H1). This is already handled by channel->o and channel->to_forward.
+ - we could possibly have another optional callback to send a preamble before
+ data, that could be used to send chunk sizes in H1. The danger is that it
+ absolutely needs to be stable if it has to be retried. But it could
+ considerably simplify de-chunking.
+
+When the request is sent to an H2 server, an H2 stream request must be made
+to the server: we find an existing connection whose settings are compatible
+with our needs (eg: tls/clear, push/no-push), and with a spare stream ID. If
+none is found, a new connection must be established, unless maxconn is reached.
+
+Servers must have a maxstream setting just like they have a maxconn. The same
+queue may be used for that.
+
+The "tcp-request content" ruleset must apply to the TCP layer. But with HTTP/2
+that becomes impossible (and useless). We still need something like the
+"tcp-request session" hook to apply just after the SSL handshake is done.
+
+It is impossible to defragment the body on the fly in HTTP/2. Since multiple
+messages are interleaved, we cannot wait for all of them and block the head of
+line. Thus if body analysis is required, it will have to use the stream's
+buffer, which necessarily implies a copy. That means that with each H2 end we
+necessarily have at least one copy. Sometimes we might be able to "splice" some
+bytes from one side to the other without copying into the stream buffer (same
+rules as for TCP splicing).
+
+In theory, only data should flow through the channel buffer, so each side's
+connector is responsible for encoding data (H1: linear/chunks, H2: frames).
+Maybe the same mechanism could be extrapolated to tunnels / TCP.
+
+Since we'd use buffers only for data (and for receipt of headers), we need to
+have dynamic buffer allocation.
+
+Thus :
+- Tx buffers do not exist. We allocate a buffer on the fly when we're ready to
+ send something that we need to build and that needs to be persistent in case
+ of partial send. H1 headers are built on the fly from the header table to a
+ temporary buffer that is immediately sent and whose amount of sent bytes is
+ the only information kept (like for PROXY protocol). H2 headers are more
+ complex since the encoding depends on what was successfully sent. Thus we
+ need to build them and put them into a temporary buffer that remains
+ persistent in case send() fails. It is possible to have a limited pool of
+ Tx buffers and refrain from sending if there is no more buffer available in
+ the pool. In that case we need a wake-up mechanism once a buffer is
+ available. Once the data are sent, the Tx buffer is then immediately recycled
+ in its pool. Note that no tx buffer being used (eg: for hdr or control) means
+ that we have to be able to serialize access to the connection and retry with
+ the same stream. It also means that a stream that times out while waiting for
+ the connector to read the second half of its request has to stay there, or at
+ least needs to be handled gracefully. However if the connector cannot read
+ the data to be sent, it means that the buffer is congested and the connection
+ is dead, so that probably means it can be killed.
+
+- Rx buffers have to be pre-allocated just before calling recv(). A connection
+ will first try to pick a buffer and disable reception if it fails, then
+ subscribe to the list of tasks waiting for an Rx buffer.
+
+- full Rx buffers might sometimes be moved around to the next buffer instead of
+ experiencing a copy. That means that channels and connectors must use the
+ same format of buffer, and that only the channel will have to see its
+ pointers adjusted.
+
+- Tx of data should be made as much as possible without copying. That possibly
+ means by directly looking into the connection buffer on the other side if
+ the local Tx buffer does not exist and the stream buffer is not allocated, or
+ even performing a splice() call between the two sides. One of the problems in
+ doing this is that it requires proper ordering of the operations (eg: when
+ multiple readers are attached to the same buffer). If the splitting occurs upon
+ receipt, there's no problem. If we expect to retrieve data directly from the
+ original buffer, it's harder since it contains various things in an order
+ which does not even indicate what belongs to whom. Thus possibly the only
+ mechanism to implement is the buffer permutation which guarantees zero-copy
+ and only in the 100% safe case. Also it's atomic and does not cause HOL
+ blocking.
+
+It makes sense to choose the frontend_accept() function right after the
+handshake has ended. It is then possible to check the ALPN, the SNI, the
+ciphers and to switch to the h2_conn_accept handler only if everything is OK.
+The h2_conn_accept handler will have to deal with the connection setup,
+initialization of the header table, exchange of the settings frames and
+preparing whatever is needed to fire new streams upon receipt of unknown
+stream IDs. Note: most of the time it will not be possible to splice() because
+we need to know in advance the number of bytes to write the header, and here it
+will not be possible.
+
+H2 health checks must be seen as regular transactions/streams. The check runs a
+normal client which seeks an available stream from a server. The server then
+finds one on an existing connection or initiates a new H2 connection. The H2
+checks will have to be configurable for sharing streams or not. Another option
+could be to specify how many requests can be made over existing connections
+before insisting on getting a separate connection. Note that such separate
+connections might end up stacking up once released. So they probably need
+to be recycled very quickly (eg: fix how many unused ones can exist max).
+
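
Illustration (not part of the patch): the note on header storage asks for
direct-access slots such as hdr[METHOD], a small memory area that fragments as
headers are rewritten, and a repacking step that swaps in a spare area without
changing any index. Below is a minimal sketch of one way this could look. All
names and sizes here (hdr_tbl, hdr_ent, hdr_set, hdr_repack, hdr_is, HDR_MAX,
ARENA_SZ) are hypothetical and chosen only for this example; none of them
comes from the patch or from existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HDR_MAX   8   /* hypothetical: number of header slots per stream */
#define ARENA_SZ  64  /* hypothetical: deliberately tiny to force repacking */

/* Hypothetical direct-access slots for single-valued headers. */
enum { HDR_METHOD, HDR_PATH };

struct hdr_ent { size_t n_off, n_len, v_off, v_len; int used; };

struct hdr_tbl {
    struct hdr_ent ent[HDR_MAX]; /* slot indexes stay stable across repacks */
    char *area;                  /* current storage area */
    size_t tail;                 /* next free byte in the area */
};

/* Append <len> bytes to the area; returns the offset or -1 when full. */
static int arena_put(struct hdr_tbl *t, const char *s, size_t len)
{
    if (t->tail + len > ARENA_SZ)
        return -1;
    memcpy(t->area + t->tail, s, len);
    t->tail += len;
    return (int)(t->tail - len);
}

/* Set slot <idx>. Bytes of a previous value are left behind as a hole.
 * Returns -1 when there is no room, so the caller can repack and retry.
 */
int hdr_set(struct hdr_tbl *t, int idx, const char *n, const char *v)
{
    size_t nl = strlen(n), vl = strlen(v);
    int no, vo;

    if (idx < 0 || idx >= HDR_MAX || t->tail + nl + vl > ARENA_SZ)
        return -1;
    no = arena_put(t, n, nl);
    vo = arena_put(t, v, vl);
    t->ent[idx] = (struct hdr_ent){ no, nl, vo, vl, 1 };
    return 0;
}

/* Copy live headers into <spare>, switch to it and return the old area so
 * that it can go back to a pool of pre-allocated areas.
 */
char *hdr_repack(struct hdr_tbl *t, char *spare)
{
    char *old = t->area;
    size_t tail = 0;

    for (int i = 0; i < HDR_MAX; i++) {
        struct hdr_ent *e = &t->ent[i];

        if (!e->used)
            continue;
        memcpy(spare + tail, old + e->n_off, e->n_len);
        e->n_off = tail; tail += e->n_len;
        memcpy(spare + tail, old + e->v_off, e->v_len);
        e->v_off = tail; tail += e->v_len;
    }
    assert(tail <= t->tail); /* packing never grows the data */
    t->area = spare;
    t->tail = tail;
    return old;
}

/* Returns non-zero if slot <idx> currently holds exactly name <n>, value <v>. */
int hdr_is(const struct hdr_tbl *t, int idx, const char *n, const char *v)
{
    const struct hdr_ent *e = &t->ent[idx];

    return e->used &&
           e->n_len == strlen(n) && !memcmp(t->area + e->n_off, n, e->n_len) &&
           e->v_len == strlen(v) && !memcmp(t->area + e->v_off, v, e->v_len);
}
```

In this sketch a caller whose hdr_set() fails would call hdr_repack() with a
spare area and retry. Only the offsets inside each slot change, so a reference
like HDR_PATH remains valid after the swap, which is the property the note
relies on for late retries.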