path: root/debian/patches/BUG-MINOR-h1-do-not-accept-as-part-of-the-URI-compon.patch
blob: 02d1e747eefbf2e1d4494e0df7cd4a22d7be4e3e
From: Willy Tarreau <w@1wt.eu>
Date: Tue, 8 Aug 2023 16:17:22 +0200
Subject: BUG/MINOR: h1: do not accept '#' as part of the URI component
Origin: https://git.haproxy.org/?p=haproxy-2.6.git;a=commit;h=832b672eee54866c7a42a1d46078cc9ae0d544d9
Bug-Debian-Security: https://security-tracker.debian.org/tracker/CVE-2023-45539

Seth Manesse and Paul Plasil reported that the "path" sample fetch
function incorrectly accepts '#' as part of the path component. This
can in some cases lead to misrouted requests for rules that would apply
on the suffix:

    use_backend static if { path_end .png .jpg .gif .css .js }
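
As a rough illustration of the misrouting above, the sketch below
compares a suffix match on a path that keeps '#' with one that stops at
'#'. The helper names are hypothetical and this is not HAProxy's
sample fetch code, only a minimal model of the two behaviors.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the first <len> bytes of <s> end with <suffix>, else 0. */
static int ends_with(const char *s, size_t len, const char *suffix)
{
	size_t slen = strlen(suffix);

	return len >= slen && memcmp(s + len - slen, suffix, slen) == 0;
}

/* Hypothetical model of the faulty behavior: the path only stops at
 * '?', so '#' and what follows it remain part of the path.
 */
static size_t path_len_keep_fragment(const char *uri)
{
	return strcspn(uri, "?");
}

/* Hypothetical model of a stricter behavior: the path stops at the
 * first '?' or '#'.
 */
static size_t path_len_strip_fragment(const char *uri)
{
	return strcspn(uri, "?#");
}
```

With "/admin/report#.png", the first model sees a path ending in ".png"
and a rule such as the one above would pick the static backend, while
the second model sees "/admin/report" and does not match.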

Note that this behavior can be selectively configured using
"normalize-uri fragment-encode" and "normalize-uri fragment-strip".

The problem is that while the RFC says that '#' must never be emitted,
it does not say, as is often the case, how servers should handle it. A
diminishing number of servers still accept it and silently trim it,
while others reject it, as indicated in the conversation below with
other implementers:

   https://lists.w3.org/Archives/Public/ietf-http-wg/2023JulSep/0070.html

Looking at logs from publicly exposed servers, such requests appear at
a rate of roughly 1 per million and only come from attacks or poorly
written web crawlers incorrectly following links found on various pages.

Thus it looks like the best solution to this problem is to simply reject
such ambiguous requests by default, and include this in the list of
controls that can be disabled using "option accept-invalid-http-request".

We're already rejecting URIs containing any control char anyway, so we
should also reject '#'.

In the H1 parser for the H1_MSG_RQURI state, there is an accelerated
parser for bytes 0x21..0x7e that has been tightened to 0x24..0x7e (it
should not impact perf since 0x21..0x23 are not supposed to appear in
a URI anyway). This way '#' falls through the fine-grained filter and
we can add the special case for it, also conditioned by a check on the
proxy's option "accept-invalid-http-request", with no overhead for the
vast majority of valid URIs. Here this information is available through
h1m->err_pos, which is -2 when the option is not set and -1 when it is
(so we don't need to change the API to expose the proxy). Example with a
trivial GET through netcat:

  [08/Aug/2023:16:16:52.651] frontend layer1 (#2): invalid request
    backend <NONE> (#-1), server <NONE> (#-1), event #0, src 127.0.0.1:50812
    buffer starts at 0 (including 0 out), 16361 free,
    len 23, wraps at 16336, error at position 7
    H1 connection flags 0x00000000, H1 stream flags 0x00000810
    H1 msg state MSG_RQURI(4), H1 msg flags 0x00001400
    H1 chunk len 0 bytes, H1 body len 0 bytes :

    00000  GET /aa#bb HTTP/1.0\r\n
    00021  \r\n
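
The two-stage scan described above (a word-at-a-time fast path for
0x24..0x7e, then a per-byte check where '#' is handled according to
err_pos) can be sketched as follows. This is a standalone
approximation, not the code from src/h1.c: word_all_fast() and
scan_uri() are hypothetical names, the word test uses an add-and-mask
formulation instead of the subtractions in the patch, and memcpy()
replaces the unaligned cast so that it does not depend on
HA_UNALIGNED_LE. Offsets are relative to the start of the URI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HI_BITS 0x80808080u

/* Return 1 if all four bytes of <w> are within 0x24..0x7e, else 0. */
static int word_all_fast(uint32_t w)
{
	if (w & HI_BITS)
		return 0; /* at least one byte is >= 0x80 */

	/* Every byte is now below 0x80, so the per-byte additions below
	 * cannot carry into the neighbouring byte.
	 */
	if (((w + 0x5c5c5c5cu) & HI_BITS) != HI_BITS)
		return 0; /* at least one byte is below 0x24 */

	if ((w + 0x01010101u) & HI_BITS)
		return 0; /* at least one byte is 0x7f */

	return 1;
}

/* Scan a URI in <buf>. Returns the offset of the first byte outside
 * 0x21..0x7e (or <len> if none), or -1 if '#' is met while *err_pos
 * is below -1. When *err_pos is -1, the offset of the first '#' is
 * stored there and scanning continues.
 */
static long scan_uri(const char *buf, size_t len, int *err_pos)
{
	size_t i = 0;

	/* fast path: skip four bytes at a time while none needs care */
	while (len - i >= sizeof(uint32_t)) {
		uint32_t w;

		memcpy(&w, buf + i, sizeof(w));
		if (!word_all_fast(w))
			break;
		i += sizeof(w);
	}

	/* slow path: fine-grained filter, one byte at a time */
	for (; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];

		if ((unsigned char)(c - 0x21) > 0x5d)
			break; /* outside 0x21..0x7e: end of the URI */

		if (c == '#') {
			if (*err_pos < -1)
				return -1; /* strict mode: reject */
			if (*err_pos == -1)
				*err_pos = (int)i; /* relaxed: just record */
		}
	}
	return (long)i;
}
```

Since 0x21..0x23 are excluded from the fast path, a word containing '#'
always reaches the per-byte loop, which is the only place that has to
look at err_pos.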

This should be progressively backported to all stable versions along with
the following patch:

    REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri tests

Similar fixes for h2 and h3 will come in followup patches.

Thanks to Seth Manesse and Paul Plasil for reporting this problem with
detailed explanations.

(cherry picked from commit 2eab6d354322932cfec2ed54de261e4347eca9a6)
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
(cherry picked from commit 9bf75c8e22a8f2537f27c557854a8803087046d0)
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
(cherry picked from commit 9facd01c9ac85fe9bcb331594b80fa08e7406552)
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
---
 src/h1.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/h1.c b/src/h1.c
index 126f23cc7376..92ec96bfe19e 100644
--- a/src/h1.c
+++ b/src/h1.c
@@ -565,13 +565,13 @@ int h1_headers_to_hdr_list(char *start, const char *stop,
 	case H1_MSG_RQURI:
 	http_msg_rquri:
 #ifdef HA_UNALIGNED_LE
-		/* speedup: skip bytes not between 0x21 and 0x7e inclusive */
+		/* speedup: skip bytes not between 0x24 and 0x7e inclusive */
 		while (ptr <= end - sizeof(int)) {
-			int x = *(int *)ptr - 0x21212121;
+			int x = *(int *)ptr - 0x24242424;
 			if (x & 0x80808080)
 				break;
 
-			x -= 0x5e5e5e5e;
+			x -= 0x5b5b5b5b;
 			if (!(x & 0x80808080))
 				break;
 
@@ -583,8 +583,15 @@ int h1_headers_to_hdr_list(char *start, const char *stop,
 			goto http_msg_ood;
 		}
 	http_msg_rquri2:
-		if (likely((unsigned char)(*ptr - 33) <= 93)) /* 33 to 126 included */
+		if (likely((unsigned char)(*ptr - 33) <= 93)) { /* 33 to 126 included */
+			if (*ptr == '#') {
+				if (h1m->err_pos < -1) /* PR_O2_REQBUG_OK not set */
+					goto invalid_char;
+				if (h1m->err_pos == -1) /* PR_O2_REQBUG_OK set: just log */
+					h1m->err_pos = ptr - start + skip;
+			}
 			EAT_AND_JUMP_OR_RETURN(ptr, end, http_msg_rquri2, http_msg_ood, state, H1_MSG_RQURI);
+		}
 
 		if (likely(HTTP_IS_SPHT(*ptr))) {
 			sl.rq.u.len = ptr - sl.rq.u.ptr;
-- 
2.43.0