summaryrefslogtreecommitdiffstats
path: root/llhttp/README.md
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--llhttp/README.md501
1 files changed, 501 insertions, 0 deletions
diff --git a/llhttp/README.md b/llhttp/README.md
new file mode 100644
index 0000000..4960dbb
--- /dev/null
+++ b/llhttp/README.md
@@ -0,0 +1,501 @@
+# llhttp
+[![CI](https://github.com/nodejs/llhttp/workflows/CI/badge.svg)](https://github.com/nodejs/llhttp/actions?query=workflow%3ACI)
+
+Port of [http_parser][0] to [llparse][1].
+
+## Why?
+
+Let's face it, [http_parser][0] is practically unmaintainable. Even
+introduction of a single new method results in a significant code churn.
+
+This project aims to:
+
+* Make it maintainable
+* Verifiable
+* Improving benchmarks where possible
+
+More details in [Fedor Indutny's talk at JSConf EU 2019](https://youtu.be/x3k_5Mi66sY)
+
+## How?
+
+Over time, different approaches for improving [http_parser][0]'s code base
+were tried. However, all of them failed due to resulting significant performance
+degradation.
+
+This project is a port of [http_parser][0] to TypeScript. [llparse][1] is used
+to generate the output C source file, which could be compiled and
+linked with the embedder's program (like [Node.js][7]).
+
+## Performance
+
+So far llhttp outperforms http_parser:
+
+| | input size | bandwidth | reqs/sec | time |
+|:----------------|-----------:|-------------:|-----------:|--------:|
+| **llhttp** | 8192.00 mb | 1777.24 mb/s | 3583799.39 req/sec | 4.61 s |
+| **http_parser** | 8192.00 mb | 694.66 mb/s | 1406180.33 req/sec | 11.79 s |
+
+llhttp is faster by approximately **156%**.
+
+## Maintenance
+
+llhttp project has about 1400 lines of TypeScript code describing the parser
+itself and around 450 lines of C code and headers providing the helper methods.
+The whole [http_parser][0] is implemented in approximately 2500 lines of C, and
+436 lines of headers.
+
+All optimizations and multi-character matching in llhttp are generated
+automatically, and thus doesn't add any extra maintenance cost. On the contrary,
+most of http_parser's code is hand-optimized and unrolled. Instead describing
+"how" it should parse the HTTP requests/responses, a maintainer should
+implement the new features in [http_parser][0] cautiously, considering
+possible performance degradation and manually optimizing the new code.
+
+## Verification
+
+The state machine graph is encoded explicitly in llhttp. The [llparse][1]
+automatically checks the graph for absence of loops and correct reporting of the
+input ranges (spans) like header names and values. In the future, additional
+checks could be performed to get even stricter verification of the llhttp.
+
+## Usage
+
+```C
+#include "stdio.h"
+#include "llhttp.h"
+#include "string.h"
+
+int handle_on_message_complete(llhttp_t* parser) {
+ fprintf(stdout, "Message completed!\n");
+ return 0;
+}
+
+int main() {
+ llhttp_t parser;
+ llhttp_settings_t settings;
+
+ /*Initialize user callbacks and settings */
+ llhttp_settings_init(&settings);
+
+ /*Set user callback */
+ settings.on_message_complete = handle_on_message_complete;
+
+ /*Initialize the parser in HTTP_BOTH mode, meaning that it will select between
+ *HTTP_REQUEST and HTTP_RESPONSE parsing automatically while reading the first
+ *input.
+ */
+ llhttp_init(&parser, HTTP_BOTH, &settings);
+
+ /*Parse request! */
+ const char* request = "GET / HTTP/1.1\r\n\r\n";
+ int request_len = strlen(request);
+
+ enum llhttp_errno err = llhttp_execute(&parser, request, request_len);
+ if (err == HPE_OK) {
+ fprintf(stdout, "Successfully parsed!\n");
+ } else {
+ fprintf(stderr, "Parse error: %s %s\n", llhttp_errno_name(err), parser.reason);
+ }
+}
+```
+For more information on API usage, please refer to [src/native/api.h](https://github.com/nodejs/llhttp/blob/main/src/native/api.h).
+
+## API
+
+### llhttp_settings_t
+
+The settings object contains a list of callbacks that the parser will invoke.
+
+The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_PAUSED` (pause the parser):
+
+* `on_message_begin`: Invoked when a new request/response starts.
+* `on_message_complete`: Invoked when a request/response has been completedly parsed.
+* `on_url_complete`: Invoked after the URL has been parsed.
+* `on_method_complete`: Invoked after the HTTP method has been parsed.
+* `on_version_complete`: Invoked after the HTTP version has been parsed.
+* `on_status_complete`: Invoked after the status code has been parsed.
+* `on_header_field_complete`: Invoked after a header name has been parsed.
+* `on_header_value_complete`: Invoked after a header value has been parsed.
+* `on_chunk_header`: Invoked after a new chunk is started. The current chunk length is stored in `parser->content_length`.
+* `on_chunk_extension_name_complete`: Invoked after a chunk extension name is started.
+* `on_chunk_extension_value_complete`: Invoked after a chunk extension value is started.
+* `on_chunk_complete`: Invoked after a new chunk is received.
+* `on_reset`: Invoked after `on_message_complete` and before `on_message_begin` when a new message
+ is received on the same parser. This is not invoked for the first message of the parser.
+
+The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_USER` (error from the callback):
+
+* `on_url`: Invoked when another character of the URL is received.
+* `on_status`: Invoked when another character of the status is received.
+* `on_method`: Invoked when another character of the method is received.
+ When parser is created with `HTTP_BOTH` and the input is a response, this also invoked for the sequence `HTTP/`
+ of the first message.
+* `on_version`: Invoked when another character of the version is received.
+* `on_header_field`: Invoked when another character of a header name is received.
+* `on_header_value`: Invoked when another character of a header value is received.
+* `on_chunk_extension_name`: Invoked when another character of a chunk extension name is received.
+* `on_chunk_extension_value`: Invoked when another character of a extension value is received.
+
+The callback `on_headers_complete`, invoked when headers are completed, can return:
+
+* `0`: Proceed normally.
+* `1`: Assume that request/response has no body, and proceed to parsing the next message.
+* `2`: Assume absence of body (as above) and make `llhttp_execute()` return `HPE_PAUSED_UPGRADE`.
+* `-1`: Error
+* `HPE_PAUSED`: Pause the parser.
+
+### `void llhttp_init(llhttp_t* parser, llhttp_type_t type, const llhttp_settings_t* settings)`
+
+Initialize the parser with specific type and user settings.
+
+### `uint8_t llhttp_get_type(llhttp_t* parser)`
+
+Returns the type of the parser.
+
+### `uint8_t llhttp_get_http_major(llhttp_t* parser)`
+
+Returns the major version of the HTTP protocol of the current request/response.
+
+### `uint8_t llhttp_get_http_minor(llhttp_t* parser)`
+
+Returns the minor version of the HTTP protocol of the current request/response.
+
+### `uint8_t llhttp_get_method(llhttp_t* parser)`
+
+Returns the method of the current request.
+
+### `int llhttp_get_status_code(llhttp_t* parser)`
+
+Returns the method of the current response.
+
+### `uint8_t llhttp_get_upgrade(llhttp_t* parser)`
+
+Returns `1` if request includes the `Connection: upgrade` header.
+
+### `void llhttp_reset(llhttp_t* parser)`
+
+Reset an already initialized parser back to the start state, preserving the
+existing parser type, callback settings, user data, and lenient flags.
+
+### `void llhttp_settings_init(llhttp_settings_t* settings)`
+
+Initialize the settings object.
+
+### `llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len)`
+
+Parse full or partial request/response, invoking user callbacks along the way.
+
+If any of `llhttp_data_cb` returns errno not equal to `HPE_OK` - the parsing interrupts,
+and such errno is returned from `llhttp_execute()`. If `HPE_PAUSED` was used as a errno,
+the execution can be resumed with `llhttp_resume()` call.
+
+In a special case of CONNECT/Upgrade request/response `HPE_PAUSED_UPGRADE` is returned
+after fully parsing the request/response. If the user wishes to continue parsing,
+they need to invoke `llhttp_resume_after_upgrade()`.
+
+**if this function ever returns a non-pause type error, it will continue to return
+the same error upon each successive call up until `llhttp_init()` is called.**
+
+### `llhttp_errno_t llhttp_finish(llhttp_t* parser)`
+
+This method should be called when the other side has no further bytes to
+send (e.g. shutdown of readable side of the TCP connection.)
+
+Requests without `Content-Length` and other messages might require treating
+all incoming bytes as the part of the body, up to the last byte of the
+connection.
+
+This method will invoke `on_message_complete()` callback if the
+request was terminated safely. Otherwise a error code would be returned.
+
+
+### `int llhttp_message_needs_eof(const llhttp_t* parser)`
+
+Returns `1` if the incoming message is parsed until the last byte, and has to be completed by calling `llhttp_finish()` on EOF.
+
+### `int llhttp_should_keep_alive(const llhttp_t* parser)`
+
+Returns `1` if there might be any other messages following the last that was
+successfully parsed.
+
+### `void llhttp_pause(llhttp_t* parser)`
+
+Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
+appropriate error reason.
+
+**Do not call this from user callbacks! User callbacks must return
+`HPE_PAUSED` if pausing is required.**
+
+### `void llhttp_resume(llhttp_t* parser)`
+
+Might be called to resume the execution after the pause in user's callback.
+
+See `llhttp_execute()` above for details.
+
+**Call this only if `llhttp_execute()` returns `HPE_PAUSED`.**
+
+### `void llhttp_resume_after_upgrade(llhttp_t* parser)`
+
+Might be called to resume the execution after the pause in user's callback.
+See `llhttp_execute()` above for details.
+
+**Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`**
+
+### `llhttp_errno_t llhttp_get_errno(const llhttp_t* parser)`
+
+Returns the latest error.
+
+### `const char* llhttp_get_error_reason(const llhttp_t* parser)`
+
+Returns the verbal explanation of the latest returned error.
+
+**User callback should set error reason when returning the error. See
+`llhttp_set_error_reason()` for details.**
+
+### `void llhttp_set_error_reason(llhttp_t* parser, const char* reason)`
+
+Assign verbal description to the returned error. Must be called in user
+callbacks right before returning the errno.
+
+**`HPE_USER` error code might be useful in user callbacks.**
+
+### `const char* llhttp_get_error_pos(const llhttp_t* parser)`
+
+Returns the pointer to the last parsed byte before the returned error. The
+pointer is relative to the `data` argument of `llhttp_execute()`.
+
+**This method might be useful for counting the number of parsed bytes.**
+
+### `const char* llhttp_errno_name(llhttp_errno_t err)`
+
+Returns textual name of error code.
+
+### `const char* llhttp_method_name(llhttp_method_t method)`
+
+Returns textual name of HTTP method.
+
+### `const char* llhttp_status_name(llhttp_status_t status)`
+
+Returns textual name of HTTP status.
+
+### `void llhttp_set_lenient_headers(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient header value parsing (disabled by default).
+Lenient parsing disables header value token checks, extending llhttp's
+protocol support to highly non-compliant clients/server.
+
+No `HPE_INVALID_HEADER_TOKEN` will be raised for incorrect header values when
+lenient parsing is "on".
+
+**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**
+
+### `void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient handling of conflicting `Transfer-Encoding` and
+`Content-Length` headers (disabled by default).
+
+Normally `llhttp` would error when `Transfer-Encoding` is present in
+conjunction with `Content-Length`.
+
+This error is important to prevent HTTP request smuggling, but may be less desirable
+for small number of cases involving legacy servers.
+
+**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**
+
+### `void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient handling of `Connection: close` and HTTP/1.0
+requests responses.
+
+Normally `llhttp` would error the HTTP request/response
+after the request/response with `Connection: close` and `Content-Length`.
+
+This is important to prevent cache poisoning attacks,
+but might interact badly with outdated and insecure clients.
+
+With this flag the extra request/response will be parsed normally.
+
+**Enabling this flag can pose a security issue since you will be exposed to poisoning attacks. USE WITH CAUTION!**
+
+### `void llhttp_set_lenient_transfer_encoding(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient handling of `Transfer-Encoding` header.
+
+Normally `llhttp` would error when a `Transfer-Encoding` has `chunked` value
+and another value after it (either in a single header or in multiple
+headers whose value are internally joined using `, `).
+
+This is mandated by the spec to reliably determine request body size and thus
+avoid request smuggling.
+
+With this flag the extra value will be parsed normally.
+
+**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**
+
+### `void llhttp_set_lenient_version(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient handling of HTTP version.
+
+Normally `llhttp` would error when the HTTP version in the request or status line
+is not `0.9`, `1.0`, `1.1` or `2.0`.
+With this flag the extra value will be parsed normally.
+
+**Enabling this flag can pose a security issue since you will allow unsupported HTTP versions. USE WITH CAUTION!**
+
+### `void llhttp_set_lenient_data_after_close(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient handling of additional data received after a message ends
+and keep-alive is disabled.
+
+Normally `llhttp` would error when additional unexpected data is received if the message
+contains the `Connection` header with `close` value.
+With this flag the extra data will discarded without throwing an error.
+
+**Enabling this flag can pose a security issue since you will be exposed to poisoning attacks. USE WITH CAUTION!**
+
+### `void llhttp_set_lenient_optional_lf_after_cr(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient handling of incomplete CRLF sequences.
+
+Normally `llhttp` would error when a CR is not followed by LF when terminating the
+request line, the status line, the headers or a chunk header.
+With this flag only a CR is required to terminate such sections.
+
+**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**
+
+### `void llhttp_set_lenient_optional_cr_before_lf(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient handling of line separators.
+
+Normally `llhttp` would error when a LF is not preceded by CR when terminating the
+request line, the status line, the headers, a chunk header or a chunk data.
+With this flag only a LF is required to terminate such sections.
+
+**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**
+
+### `void llhttp_set_lenient_optional_crlf_after_chunk(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient handling of chunks not separated via CRLF.
+
+Normally `llhttp` would error when after a chunk data a CRLF is missing before
+starting a new chunk.
+With this flag the new chunk can start immediately after the previous one.
+
+**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**
+
+### `void llhttp_set_lenient_spaces_after_chunk_size(llhttp_t* parser, int enabled)`
+
+Enables/disables lenient handling of spaces after chunk size.
+
+Normally `llhttp` would error when after a chunk size is followed by one or more spaces are present instead of a CRLF or `;`.
+With this flag this check is disabled.
+
+**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**
+
+## Build Instructions
+
+Make sure you have [Node.js](https://nodejs.org/), npm and npx installed. Then under project directory run:
+
+```sh
+npm install
+make
+```
+
+---
+
+### Bindings to other languages
+
+* Lua: [MunifTanjim/llhttp.lua][11]
+* Python: [pallas/pyllhttp][8]
+* Ruby: [metabahn/llhttp][9]
+* Rust: [JackLiar/rust-llhttp][10]
+
+### Using with CMake
+
+If you want to use this library in a CMake project as a shared library, you can use the snippet below.
+
+```
+FetchContent_Declare(llhttp
+ URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")
+
+FetchContent_MakeAvailable(llhttp)
+
+# Link with the llhttp_shared target
+target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_shared ${PROJECT_NAME})
+```
+
+If you want to use this library in a CMake project as a static library, you can set some cache variables first.
+
+```
+FetchContent_Declare(llhttp
+ URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")
+
+set(BUILD_SHARED_LIBS OFF CACHE INTERNAL "")
+set(BUILD_STATIC_LIBS ON CACHE INTERNAL "")
+FetchContent_MakeAvailable(llhttp)
+
+# Link with the llhttp_static target
+target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_static ${PROJECT_NAME})
+```
+
+_Note that using the git repo directly (e.g., via a git repo url and tag) will not work with FetchContent_Declare because [CMakeLists.txt](./CMakeLists.txt) requires string replacements (e.g., `_RELEASE_`) before it will build._
+
+## Building on Windows
+
+### Installation
+
+* `choco install git`
+* `choco install node`
+* `choco install llvm` (or install the `C++ Clang tools for Windows` optional package from the Visual Studio 2019 installer)
+* `choco install make` (or if you have MinGW, it comes bundled)
+
+1. Ensure that `Clang` and `make` are in your system path.
+2. Using Git Bash, clone the repo to your preferred location.
+3. Cd into the cloned directory and run `npm install`
+5. Run `make`
+6. Your `repo/build` directory should now have `libllhttp.a` and `libllhttp.so` static and dynamic libraries.
+7. When building your executable, you can link to these libraries. Make sure to set the build folder as an include path when building so you can reference the declarations in `repo/build/llhttp.h`.
+
+### A simple example on linking with the library:
+
+Assuming you have an executable `main.cpp` in your current working directory, you would run: `clang++ -Os -g3 -Wall -Wextra -Wno-unused-parameter -I/path/to/llhttp/build main.cpp /path/to/llhttp/build/libllhttp.a -o main.exe`.
+
+If you are getting `unresolved external symbol` linker errors you are likely attempting to build `llhttp.c` without linking it with object files from `api.c` and `http.c`.
+
+#### LICENSE
+
+This software is licensed under the MIT License.
+
+Copyright Fedor Indutny, 2018.
+
+Permission is hereby granted, free of charge, to any person obtaining a
+copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to permit
+persons to whom the Software is furnished to do so, subject to the
+following conditions:
+
+The above copyright notice and this permission notice shall be included
+in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
+NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
+DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+[0]: https://github.com/nodejs/http-parser
+[1]: https://github.com/nodejs/llparse
+[2]: https://en.wikipedia.org/wiki/Register_allocation#Spilling
+[3]: https://en.wikipedia.org/wiki/Tail_call
+[4]: https://llvm.org/docs/LangRef.html
+[5]: https://llvm.org/docs/LangRef.html#call-instruction
+[6]: https://clang.llvm.org/
+[7]: https://github.com/nodejs/node
+[8]: https://github.com/pallas/pyllhttp
+[9]: https://github.com/metabahn/llhttp
+[10]: https://github.com/JackLiar/rust-llhttp
+[11]: https://github.com/MunifTanjim/llhttp.lua