summaryrefslogtreecommitdiffstats
path: root/third-party/llhttp/README.md
blob: e982ed0ae718c2665ada2e40f88b1c694fe156f6 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
# llhttp
[![CI](https://github.com/nodejs/llhttp/workflows/CI/badge.svg)](https://github.com/nodejs/llhttp/actions?query=workflow%3ACI)

Port of [http_parser][0] to [llparse][1].

## Why?

Let's face it, [http_parser][0] is practically unmaintainable. Even
introduction of a single new method results in a significant code churn.

This project aims to:

* Make it maintainable
* Verifiable
* Improving benchmarks where possible

More details in [Fedor Indutny's talk at JSConf EU 2019](https://youtu.be/x3k_5Mi66sY)

## How?

Over time, different approaches for improving [http_parser][0]'s code base
were tried. However, all of them failed due to resulting significant performance
degradation.

This project is a port of [http_parser][0] to TypeScript. [llparse][1] is used
to generate the output C source file, which could be compiled and
linked with the embedder's program (like [Node.js][7]).

## Performance

So far llhttp outperforms http_parser:

|                 | input size |  bandwidth   |  reqs/sec  |   time  |
|:----------------|-----------:|-------------:|-----------:|--------:|
| **llhttp**      | 8192.00 mb | 1777.24 mb/s | 3583799.39 req/sec | 4.61 s |
| **http_parser** | 8192.00 mb | 694.66 mb/s | 1406180.33 req/sec | 11.79 s |

llhttp is faster by approximately **156%**.

## Maintenance

llhttp project has about 1400 lines of TypeScript code describing the parser
itself and around 450 lines of C code and headers providing the helper methods.
The whole [http_parser][0] is implemented in approximately 2500 lines of C, and
436 lines of headers.

All optimizations and multi-character matching in llhttp are generated
automatically, and thus doesn't add any extra maintenance cost. On the contrary,
most of http_parser's code is hand-optimized and unrolled. Instead describing
"how" it should parse the HTTP requests/responses, a maintainer should
implement the new features in [http_parser][0] cautiously, considering
possible performance degradation and manually optimizing the new code.

## Verification

The state machine graph is encoded explicitly in llhttp. The [llparse][1]
automatically checks the graph for absence of loops and correct reporting of the
input ranges (spans) like header names and values. In the future, additional
checks could be performed to get even stricter verification of the llhttp.

## Usage

```C
#include "stdio.h"
#include "llhttp.h"
#include "string.h"

int handle_on_message_complete(llhttp_t* parser) {
	fprintf(stdout, "Message completed!\n");
	return 0;
}

int main() {
	llhttp_t parser;
	llhttp_settings_t settings;

	/*Initialize user callbacks and settings */
	llhttp_settings_init(&settings);

	/*Set user callback */
	settings.on_message_complete = handle_on_message_complete;

	/*Initialize the parser in HTTP_BOTH mode, meaning that it will select between
	*HTTP_REQUEST and HTTP_RESPONSE parsing automatically while reading the first
	*input.
	*/
	llhttp_init(&parser, HTTP_BOTH, &settings);

	/*Parse request! */
	const char* request = "GET / HTTP/1.1\r\n\r\n";
	int request_len = strlen(request);

	enum llhttp_errno err = llhttp_execute(&parser, request, request_len);
	if (err == HPE_OK) {
		fprintf(stdout, "Successfully parsed!\n");
	} else {
		fprintf(stderr, "Parse error: %s %s\n", llhttp_errno_name(err), parser.reason);
	}
}
```
For more information on API usage, please refer to [src/native/api.h](https://github.com/nodejs/llhttp/blob/main/src/native/api.h).

## API

### llhttp_settings_t

The settings object contains a list of callbacks that the parser will invoke.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_PAUSED` (pause the parser):

* `on_message_begin`: Invoked when a new request/response starts.
* `on_message_complete`: Invoked when a request/response has been completedly parsed.
* `on_url_complete`: Invoked after the URL has been parsed.
* `on_method_complete`: Invoked after the HTTP method has been parsed.
* `on_version_complete`: Invoked after the HTTP version has been parsed.
* `on_status_complete`: Invoked after the status code has been parsed.
* `on_header_field_complete`: Invoked after a header name has been parsed.
* `on_header_value_complete`: Invoked after a header value has been parsed.
* `on_chunk_header`: Invoked after a new chunk is started. The current chunk length is stored in `parser->content_length`.
* `on_chunk_extension_name_complete`: Invoked after a chunk extension name is started.
* `on_chunk_extension_value_complete`: Invoked after a chunk extension value is started.
* `on_chunk_complete`: Invoked after a new chunk is received. 
* `on_reset`: Invoked after `on_message_complete` and before `on_message_begin` when a new message 
   is received on the same parser. This is not invoked for the first message of the parser.

The following callbacks can return `0` (proceed normally), `-1` (error) or `HPE_USER` (error from the callback): 

* `on_url`: Invoked when another character of the URL is received. 
* `on_status`: Invoked when another character of the status is received.
* `on_method`: Invoked when another character of the method is received. 
   When parser is created with `HTTP_BOTH` and the input is a response, this also invoked for the sequence `HTTP/`
   of the first message.
* `on_version`: Invoked when another character of the version is received.
* `on_header_field`: Invoked when another character of a header name is received.
* `on_header_value`: Invoked when another character of a header value is received.
* `on_chunk_extension_name`: Invoked when another character of a chunk extension name is received.
* `on_chunk_extension_value`: Invoked when another character of a extension value is received.

The callback `on_headers_complete`, invoked when headers are completed, can return:

* `0`: Proceed normally.
* `1`: Assume that request/response has no body, and proceed to parsing the next message.
* `2`: Assume absence of body (as above) and make `llhttp_execute()` return `HPE_PAUSED_UPGRADE`.
* `-1`: Error
* `HPE_PAUSED`: Pause the parser.

### `void llhttp_init(llhttp_t* parser, llhttp_type_t type, const llhttp_settings_t* settings)`

Initialize the parser with specific type and user settings.

### `uint8_t llhttp_get_type(llhttp_t* parser)`

Returns the type of the parser.

### `uint8_t llhttp_get_http_major(llhttp_t* parser)`

Returns the major version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_http_minor(llhttp_t* parser)`

Returns the minor version of the HTTP protocol of the current request/response.

### `uint8_t llhttp_get_method(llhttp_t* parser)`

Returns the method of the current request.

### `int llhttp_get_status_code(llhttp_t* parser)`

Returns the method of the current response.

### `uint8_t llhttp_get_upgrade(llhttp_t* parser)`

Returns `1` if request includes the `Connection: upgrade` header.

### `void llhttp_reset(llhttp_t* parser)`

Reset an already initialized parser back to the start state, preserving the 
existing parser type, callback settings, user data, and lenient flags.

### `void llhttp_settings_init(llhttp_settings_t* settings)`

Initialize the settings object.

### `llhttp_errno_t llhttp_execute(llhttp_t* parser, const char* data, size_t len)`

Parse full or partial request/response, invoking user callbacks along the way.

If any of `llhttp_data_cb` returns errno not equal to `HPE_OK` - the parsing interrupts, 
and such errno is returned from `llhttp_execute()`. If `HPE_PAUSED` was used as a errno, 
the execution can be resumed with `llhttp_resume()` call.

In a special case of CONNECT/Upgrade request/response `HPE_PAUSED_UPGRADE` is returned 
after fully parsing the request/response. If the user wishes to continue parsing, 
they need to invoke `llhttp_resume_after_upgrade()`.

**if this function ever returns a non-pause type error, it will continue to return 
the same error upon each successive call up until `llhttp_init()` is called.**

### `llhttp_errno_t llhttp_finish(llhttp_t* parser)`

This method should be called when the other side has no further bytes to
send (e.g. shutdown of readable side of the TCP connection.)

Requests without `Content-Length` and other messages might require treating
all incoming bytes as the part of the body, up to the last byte of the
connection. 

This method will invoke `on_message_complete()` callback if the
request was terminated safely. Otherwise a error code would be returned.


### `int llhttp_message_needs_eof(const llhttp_t* parser)`

Returns `1` if the incoming message is parsed until the last byte, and has to be completed by calling `llhttp_finish()` on EOF.

### `int llhttp_should_keep_alive(const llhttp_t* parser)`

Returns `1` if there might be any other messages following the last that was
successfully parsed.

### `void llhttp_pause(llhttp_t* parser)`

Make further calls of `llhttp_execute()` return `HPE_PAUSED` and set
appropriate error reason.

**Do not call this from user callbacks! User callbacks must return
`HPE_PAUSED` if pausing is required.**

### `void llhttp_resume(llhttp_t* parser)`

Might be called to resume the execution after the pause in user's callback.

See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED`.**

### `void llhttp_resume_after_upgrade(llhttp_t* parser)`

Might be called to resume the execution after the pause in user's callback.
See `llhttp_execute()` above for details.

**Call this only if `llhttp_execute()` returns `HPE_PAUSED_UPGRADE`**

### `llhttp_errno_t llhttp_get_errno(const llhttp_t* parser)`

Returns the latest error.

### `const char* llhttp_get_error_reason(const llhttp_t* parser)`

Returns the verbal explanation of the latest returned error.

**User callback should set error reason when returning the error. See
`llhttp_set_error_reason()` for details.**

### `void llhttp_set_error_reason(llhttp_t* parser, const char* reason)`

Assign verbal description to the returned error. Must be called in user
callbacks right before returning the errno.

**`HPE_USER` error code might be useful in user callbacks.**

### `const char* llhttp_get_error_pos(const llhttp_t* parser)`

Returns the pointer to the last parsed byte before the returned error. The
pointer is relative to the `data` argument of `llhttp_execute()`.

**This method might be useful for counting the number of parsed bytes.**

### `const char* llhttp_errno_name(llhttp_errno_t err)`

Returns textual name of error code.

### `const char* llhttp_method_name(llhttp_method_t method)`

Returns textual name of HTTP method.

### `const char* llhttp_status_name(llhttp_status_t status)`

Returns textual name of HTTP status.

### `void llhttp_set_lenient_headers(llhttp_t* parser, int enabled)`

Enables/disables lenient header value parsing (disabled by default).
Lenient parsing disables header value token checks, extending llhttp's
protocol support to highly non-compliant clients/server. 

No `HPE_INVALID_HEADER_TOKEN` will be raised for incorrect header values when
lenient parsing is "on".

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_chunked_length(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of conflicting `Transfer-Encoding` and
`Content-Length` headers (disabled by default).

Normally `llhttp` would error when `Transfer-Encoding` is present in
conjunction with `Content-Length`. 

This error is important to prevent HTTP request smuggling, but may be less desirable
for small number of cases involving legacy servers.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_keep_alive(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of `Connection: close` and HTTP/1.0
requests responses.

Normally `llhttp` would error the HTTP request/response 
after the request/response with `Connection: close` and `Content-Length`. 

This is important to prevent cache poisoning attacks,
but might interact badly with outdated and insecure clients. 

With this flag the extra request/response will be parsed normally.

**Enabling this flag can pose a security issue since you will be exposed to poisoning attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_transfer_encoding(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of `Transfer-Encoding` header.

Normally `llhttp` would error when a `Transfer-Encoding` has `chunked` value
and another value after it (either in a single header or in multiple
headers whose value are internally joined using `, `).

This is mandated by the spec to reliably determine request body size and thus
avoid request smuggling.

With this flag the extra value will be parsed normally.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_version(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of HTTP version.

Normally `llhttp` would error when the HTTP version in the request or status line
is not `0.9`, `1.0`, `1.1` or `2.0`.
With this flag the extra value will be parsed normally.

**Enabling this flag can pose a security issue since you will allow unsupported HTTP versions. USE WITH CAUTION!**

### `void llhttp_set_lenient_data_after_close(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of additional data received after a message ends
and keep-alive is disabled.

Normally `llhttp` would error when additional unexpected data is received if the message
contains the `Connection` header with `close` value.
With this flag the extra data will discarded without throwing an error.

**Enabling this flag can pose a security issue since you will be exposed to poisoning attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_optional_lf_after_cr(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of incomplete CRLF sequences.

Normally `llhttp` would error when a CR is not followed by LF when terminating the
request line, the status line, the headers or a chunk header.
With this flag only a CR is required to terminate such sections.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

### `void llhttp_set_lenient_optional_crlf_after_chunk(llhttp_t* parser, int enabled)`

Enables/disables lenient handling of chunks not separated via CRLF.

Normally `llhttp` would error when after a chunk data a CRLF is missing before
starting a new chunk.
With this flag the new chunk can start immediately after the previous one.

**Enabling this flag can pose a security issue since you will be exposed to request smuggling attacks. USE WITH CAUTION!**

## Build Instructions

Make sure you have [Node.js](https://nodejs.org/), npm and npx installed. Then under project directory run:

```sh
npm install
make
```

---

### Bindings to other languages

* Lua: [MunifTanjim/llhttp.lua][11]
* Python: [pallas/pyllhttp][8]
* Ruby: [metabahn/llhttp][9]
* Rust: [JackLiar/rust-llhttp][10]

### Using with CMake

If you want to use this library in a CMake project as a shared library, you can use the snippet below.

```
FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")

FetchContent_MakeAvailable(llhttp)

# Link with the llhttp_shared target
target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_shared ${PROJECT_NAME})
```

If you want to use this library in a CMake project as a static library, you can set some cache variables first.

```
FetchContent_Declare(llhttp
  URL "https://github.com/nodejs/llhttp/archive/refs/tags/release/v8.1.0.tar.gz")

set(BUILD_SHARED_LIBS OFF CACHE INTERNAL "")
set(BUILD_STATIC_LIBS ON CACHE INTERNAL "")
FetchContent_MakeAvailable(llhttp)

# Link with the llhttp_static target
target_link_libraries(${EXAMPLE_PROJECT_NAME} ${PROJECT_LIBRARIES} llhttp_static ${PROJECT_NAME})
```

_Note that using the git repo directly (e.g., via a git repo url and tag) will not work with FetchContent_Declare because [CMakeLists.txt](./CMakeLists.txt) requires string replacements (e.g., `_RELEASE_`) before it will build._

## Building on Windows

### Installation

* `choco install git`
* `choco install node`
* `choco install llvm` (or install the `C++ Clang tools for Windows` optional package from the Visual Studio 2019 installer)
* `choco install make` (or if you have MinGW, it comes bundled)

1. Ensure that `Clang` and `make` are in your system path.
2. Using Git Bash, clone the repo to your preferred location.
3. Cd into the cloned directory and run `npm install`
5. Run `make`
6. Your `repo/build` directory should now have `libllhttp.a` and `libllhttp.so` static and dynamic libraries.
7. When building your executable, you can link to these libraries. Make sure to set the build folder as an include path when building so you can reference the declarations in `repo/build/llhttp.h`.

### A simple example on linking with the library:

Assuming you have an executable `main.cpp` in your current working directory, you would run: `clang++ -Os -g3 -Wall -Wextra -Wno-unused-parameter -I/path/to/llhttp/build main.cpp /path/to/llhttp/build/libllhttp.a -o main.exe`.

If you are getting `unresolved external symbol` linker errors you are likely attempting to build `llhttp.c` without linking it with object files from `api.c` and `http.c`.

#### LICENSE

This software is licensed under the MIT License.

Copyright Fedor Indutny, 2018.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.

[0]: https://github.com/nodejs/http-parser
[1]: https://github.com/nodejs/llparse
[2]: https://en.wikipedia.org/wiki/Register_allocation#Spilling
[3]: https://en.wikipedia.org/wiki/Tail_call
[4]: https://llvm.org/docs/LangRef.html
[5]: https://llvm.org/docs/LangRef.html#call-instruction
[6]: https://clang.llvm.org/
[7]: https://github.com/nodejs/node
[8]: https://github.com/pallas/pyllhttp
[9]: https://github.com/metabahn/llhttp
[10]: https://github.com/JackLiar/rust-llhttp
[11]: https://github.com/MunifTanjim/llhttp.lua