QUICK START
-----------
LibHTP is envisioned to be many things, but so far it has been tested in only one scenario:
parsing a duplex HTTP stream obtained by passively intercepting a communication channel.
The assumption is that you have raw TCP data (after SSL decryption, if SSL is used).
Every parsing operation needs to follow these steps:
1. Configure-time:
1.1. Create one or more parser configuration structures.
1.2. Tweak the configuration of each parser to match the behaviour of
the server whose communication you are intercepting (htp_config_set_* functions).
1.3. Register the parser callbacks you'll need. You will need to use parser callbacks
if you want to monitor parsing events as they occur, and gain access to partial
transaction information. If you are processing data in batch (off-line) you may
simply parse entire streams at a time and only analyze complete transaction data
after the fact.
If you need to gain access to request and response bodies, your only option at
this time is to use the callbacks, because the parser will not preserve that
information.
For callback registration, look up the htp_config_register_* functions.
If your program operates in real-time then it may be desirable to dispose of
the used resources after each transaction is parsed. To do that, use the
htp_config_set_tx_auto_destroy() function to tell LibHTP to delete transactions
after they are no longer needed.
2. Run-time:
2.1. Create a parser instance for every TCP stream you want to process.
2.2. Feed the parser inbound and outbound data.
The parser will typically consume complete data chunks and return
STREAM_STATE_DATA, which means that you can continue to feed it more data
when you have it. If you have a queue of data chunks, always send the
parser all the _request_ chunks you have first. That ensures that the parser
never encounters a response for which it has not seen a request (which
would result in a fatal error).
If you get STREAM_STATE_ERROR, the parser has encountered a fatal error and
is unable to continue parsing the stream. An error should never happen for
a valid HTTP stream. If you encounter such an error and you believe the
HTTP stream is valid, please send us a PCAP file that we can use to diagnose
the problem.
There is one situation when the parser will not be able to consume a complete
request data chunk, in which case it will return STREAM_STATE_DATA_OTHER. This
means that the parser needs to see some response data. You will then need to
do the following:
2.2.1. Remember how many bytes of the request chunk data were consumed (using
htp_connp_req_data_consumed()).
2.2.2. Suspend request parsing until you get some response data.
2.2.3. Feed some response data (when you have it) to the parser.
Note that it is also possible to receive STREAM_STATE_DATA_OTHER
from the response parser. If that happens, you will need to
remember how many bytes were consumed using
htp_connp_res_data_consumed().
2.2.4. After each chunk of response data fed to the parser, attempt
to resume request stream parsing.
2.2.5. If you again receive STREAM_STATE_DATA_OTHER go back to 2.2.3.
2.2.6. Otherwise, feed the parser all the request data you have. This is
necessary to prevent the parser from seeing more responses
than requests (which would inevitably result in an error).
2.2.7. Send unprocessed response data from 2.2.3 (if any).
2.2.8. Continue sending request/response data as normal.
The above situation should occur very rarely.
2.3. Analyze transaction data in callbacks (if you want to have access to
the data as it is being produced).
2.4. Analyze transaction data after an entire TCP stream has been processed.
2.5. Destroy the parser instance to free the allocated resources.
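The interleaving described in steps 2.2.1 to 2.2.8 above can be sketched as a small driver loop.
The sketch below does not link against LibHTP: demo_req_data() and demo_res_data() are
hypothetical stand-ins for the real request and response feeding functions, the DEMO_STATE_*
values only mirror the STREAM_STATE_* names used in this document, and the rule that a '!'
byte blocks the request side is invented purely to trigger the rare case. Only the shape of
feed_request() is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins that mirror the STREAM_STATE_* values described above. */
enum demo_state { DEMO_STATE_ERROR = -1, DEMO_STATE_DATA = 1, DEMO_STATE_DATA_OTHER = 2 };

/* Hypothetical mock parser; the real parser instance is opaque. */
struct demo_parser {
    int waiting_for_response; /* request side is blocked until response data arrives */
    size_t req_consumed;      /* bytes consumed from the last request chunk */
    size_t req_total;         /* total request bytes consumed */
    size_t res_total;         /* total response bytes consumed */
};

/* Mock request feed: a '!' byte means "response data is needed before continuing". */
static enum demo_state demo_req_data(struct demo_parser *p, const char *data, size_t len)
{
    if (p->waiting_for_response) {
        p->req_consumed = 0; /* mock behaviour: nothing is consumed while blocked */
        return DEMO_STATE_DATA_OTHER;
    }
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '!') {
            p->req_consumed = i + 1;
            p->req_total += i + 1;
            p->waiting_for_response = 1;
            return DEMO_STATE_DATA_OTHER;
        }
    }
    p->req_consumed = len;
    p->req_total += len;
    return DEMO_STATE_DATA;
}

/* Mock response feed: any response data unblocks the request side. */
static enum demo_state demo_res_data(struct demo_parser *p, const char *data, size_t len)
{
    (void)data;
    if (len > 0) p->waiting_for_response = 0;
    p->res_total += len;
    return DEMO_STATE_DATA;
}

/*
 * Feed one request chunk. *req_offset records how much of the chunk has been
 * consumed so far (step 2.2.1), so the call can be repeated later with the
 * same chunk if no response data was available yet.
 */
static enum demo_state feed_request(struct demo_parser *p,
                                    const char *req, size_t req_len, size_t *req_offset,
                                    const char **res_chunks, size_t res_count, size_t *res_next)
{
    assert(p != NULL && req_offset != NULL && res_next != NULL);
    for (;;) {
        enum demo_state rc = demo_req_data(p, req + *req_offset, req_len - *req_offset);
        *req_offset += p->req_consumed;                   /* 2.2.1: remember consumed bytes */
        if (rc != DEMO_STATE_DATA_OTHER) return rc;       /* chunk fully consumed, or error */
        if (*res_next >= res_count) return rc;            /* 2.2.2: suspend, nothing to feed */
        const char *res = res_chunks[(*res_next)++];      /* 2.2.3: feed some response data */
        if (demo_res_data(p, res, strlen(res)) == DEMO_STATE_ERROR) return DEMO_STATE_ERROR;
        /* 2.2.4 and 2.2.5: loop round and try to resume the request stream */
    }
}
```

When feed_request() returns DEMO_STATE_DATA_OTHER, the caller keeps the chunk and the offset,
waits for more response data, and calls it again. Once it returns DEMO_STATE_DATA, the
remaining request chunks go in first and any leftover response data follows, as in steps
2.2.6 and 2.2.7.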
USER DATA
---------
If you're using the callbacks and you need to keep state between invocations, you have two
options:
1. Associate one opaque structure with a parser instance, using htp_connp_set_user_data().
2. Associate one opaque structure with a transaction instance, using htp_tx_set_user_data().
The best place to do this is in a TRANSACTION_START callback. Don't forget to free
any resources you allocate on a per-transaction basis before you delete each transaction.
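As a rough illustration of the per-transaction option, the sketch below attaches a small state
structure when a transaction starts, updates it from a later callback, and releases it before
the transaction is deleted. It is self-contained and does not use LibHTP: struct demo_tx,
demo_tx_set_user_data() and demo_tx_get_user_data() are hypothetical stand-ins for the real
transaction type and for htp_tx_set_user_data(), and the callback signatures are simplified
assumptions rather than the library's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transaction; the real structure holds far more. */
struct demo_tx { void *user_data; };

/* Stand-ins for attaching and reading back the opaque pointer. */
static void demo_tx_set_user_data(struct demo_tx *tx, void *data) { tx->user_data = data; }
static void *demo_tx_get_user_data(const struct demo_tx *tx) { return tx->user_data; }

/* Application-defined state kept for the lifetime of one transaction. */
struct tx_state { size_t body_bytes; };

/* Called at transaction start: allocate the state and attach it. */
static int on_transaction_start(struct demo_tx *tx)
{
    assert(tx != NULL);
    struct tx_state *st = calloc(1, sizeof(*st));
    if (st == NULL) return -1;
    demo_tx_set_user_data(tx, st);
    return 0;
}

/* Called for each piece of body data: update the attached state. */
static int on_body_data(struct demo_tx *tx, size_t len)
{
    struct tx_state *st = demo_tx_get_user_data(tx);
    if (st == NULL) return -1; /* no state attached */
    st->body_bytes += len;
    return 0;
}

/* Called before the transaction is deleted: free what was allocated. */
static void release_tx_state(struct demo_tx *tx)
{
    free(demo_tx_get_user_data(tx));
    demo_tx_set_user_data(tx, NULL);
}
```

Clearing the pointer after freeing it means a late callback sees "no state" instead of a
dangling pointer. The same pattern applies to per-parser state, with one structure attached
to the parser instance instead of to each transaction.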