QUICK START
-----------

LibHTP is envisioned to be many things, but the only scenario in which it has been tested
so far is parsing a duplex HTTP stream obtained by passively intercepting a communication
channel. The assumption is that you have raw TCP payload data (already decrypted, if
SSL/TLS is used).

Every parsing operation needs to follow these steps:

  1. Configure-time:

     1.1. Create one or more parser configuration structures.

     1.2. Tweak the configuration of each parser to match the behaviour of the server
          whose communication you are intercepting (see the htp_config_set_* functions).

     1.3. Register the parser callbacks you'll need. Callbacks are required if you want
          to monitor parsing events as they occur and gain access to partial transaction
          information. If you are processing data in batch (off-line), you can simply
          parse entire streams at a time and analyze complete transaction data only
          after the fact.

          If you need to gain access to request and response bodies, your only option at
          this time is to use the callbacks, because the parser will not preserve that
          information.

          For callback registration, look up the htp_config_register_* functions.

          If your program operates in real-time, it may be desirable to dispose of the
          resources used by each transaction as soon as it has been parsed. To do that,
          use the htp_config_set_tx_auto_destroy() function to tell LibHTP to delete
          transactions once they are no longer needed.

          A configuration sketch covering steps 1.1 through 1.3 is shown below.
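
     The sketch assumes the 0.2.x-era C API: the HTP_SERVER_APACHE_2_2 personality
     constant, the callback signature taking htp_connp_t *, and the HTP_OK return code
     are assumptions you should verify against the htp.h you are building against.

       #include <htp/htp.h>

       /* Request-headers callback. The parameter type and the HTP_OK return
          code are assumptions (0.2.x-era API); check them against htp.h. */
       static int callback_request_headers(htp_connp_t *connp) {
           /* Partial transaction data is reachable here, e.g. via connp->in_tx. */
           return HTP_OK;
       }

       htp_cfg_t *create_config(void) {
           /* 1.1. Create a parser configuration structure. */
           htp_cfg_t *cfg = htp_config_create();
           if (cfg == NULL) return NULL;

           /* 1.2. Match the personality of the monitored server;
              HTP_SERVER_APACHE_2_2 is just one of the available constants. */
           htp_config_set_server_personality(cfg, HTP_SERVER_APACHE_2_2);

           /* 1.3. Register the callbacks you need. */
           htp_config_register_request_headers(cfg, callback_request_headers);

           /* Optional, for real-time use: destroy each transaction
              automatically once it is no longer needed. */
           htp_config_set_tx_auto_destroy(cfg, 1);

           return cfg;
       }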

  2. Run-time:

     2.1. Create a parser instance for every TCP stream you want to process.

     2.2. Feed the parser inbound and outbound data.

          The parser will usually consume complete data chunks and return
          STREAM_STATE_DATA, which means that you can continue to feed it more data
          as you receive it. If you have a queue of data chunks, always send the
          parser all the _request_ chunks you have first. That ensures the parser
          never encounters a response for which it has not yet seen a request (which
          would result in a fatal error).

          If you get STREAM_STATE_ERROR, the parser has encountered a fatal error and
          is unable to continue parsing the stream. An error should never happen for
          a valid HTTP stream. If you encounter such an error and you believe the
          HTTP stream is valid, please send us a PCAP file of the traffic so that we
          can diagnose the problem.

          There is one situation in which the parser will not be able to consume a
          complete request data chunk, in which case it will return
          STREAM_STATE_DATA_OTHER. This means that the parser needs to see some
          response data first. You will then need to do the following (a sketch of
          this procedure appears after this list):

          2.2.1. Remember how many bytes of the request chunk data were consumed (using
                 htp_connp_req_data_consumed()).

          2.2.2. Suspend request parsing until you get some response data.

          2.2.3. Feed some response data (when you have it) to the parser.

                 Note that it is also possible to receive STREAM_STATE_DATA_OTHER
                 from the response parser. If that happens, you will need to
                 remember how many bytes were consumed using
                 htp_connp_res_data_consumed().

          2.2.4. After each chunk of response data fed to the parser, attempt
                 to resume request stream parsing.

          2.2.5. If you again receive STREAM_STATE_DATA_OTHER, go back to 2.2.3.

          2.2.6. Otherwise, feed the parser all the request data you have, resuming
                 from the position recorded in 2.2.1. This is necessary to prevent
                 the parser from seeing more responses than requests (which would
                 inevitably result in an error).

          2.2.7. Send unprocessed response data from 2.2.3 (if any).

          2.2.8. Continue sending request/response data as normal.

          The above situation should occur very rarely.
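
          As an illustration of steps 2.1 and 2.2, the sketch below feeds one
          request chunk and one response chunk to a parser instance created with
          htp_connp_create(), handling the STREAM_STATE_DATA_OTHER case described
          above. The argument order of htp_connp_req_data()/htp_connp_res_data()
          (timestamp, data pointer, length) and the integer timestamp follow the
          0.2.x-era API and are assumptions to verify against your htp.h.

          /* Sketch: feed one request chunk and one response chunk to an existing
             parser instance (one instance per TCP stream, see 2.1). Returns 0 on
             success and -1 on STREAM_STATE_ERROR. */
          int feed_chunks(htp_connp_t *connp,
                          unsigned char *req, size_t req_len,
                          unsigned char *res, size_t res_len) {
              /* The timestamp is passed as 0 because this sketch has no packet
                 timestamps (an assumption about the 0.2.x htp_time_t). */
              int rc = htp_connp_req_data(connp, 0, req, req_len);
              if (rc == STREAM_STATE_ERROR) return -1;

              if (rc == STREAM_STATE_DATA_OTHER) {
                  /* 2.2.1. Remember how much of the request chunk was consumed. */
                  size_t consumed = htp_connp_req_data_consumed(connp);

                  /* 2.2.2-2.2.3. Suspend request parsing, feed some response data.
                     If this also returned STREAM_STATE_DATA_OTHER, a real program
                     would loop over steps 2.2.3-2.2.5; omitted here for brevity. */
                  rc = htp_connp_res_data(connp, 0, res, res_len);
                  if (rc == STREAM_STATE_ERROR) return -1;

                  /* 2.2.4-2.2.6. Resume the request stream where it stopped. */
                  rc = htp_connp_req_data(connp, 0, req + consumed,
                                          req_len - consumed);
                  if (rc == STREAM_STATE_ERROR) return -1;
                  return 0;
              }

              /* STREAM_STATE_DATA: the request chunk was fully consumed. */
              rc = htp_connp_res_data(connp, 0, res, res_len);
              if (rc == STREAM_STATE_ERROR) return -1;
              return 0;
          }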

     2.3. Analyze transaction data in callbacks (if you want to have access to
          the data as it is being produced).

     2.4. Analyze transaction data after an entire TCP stream has been processed
          (a sketch of this step and the next follows below).

     2.5. Destroy the parser instance to free up the allocated resources.
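
     If you analyze data only after the fact (2.4), the completed transactions remain
     attached to the connection parser (unless transaction auto-destroy is enabled).
     The sketch below walks them and then releases everything; the
     connp->conn->transactions field, the list_size()/list_get() helpers from LibHTP's
     bundled list code, and the transaction field names are assumptions to verify
     against your version.

       /* Sketch: post-processing after the whole stream has been fed (2.4),
          followed by cleanup (2.5). Field and helper names are assumptions. */
       void finish_stream(htp_connp_t *connp) {
           size_t i, n = list_size(connp->conn->transactions);
           for (i = 0; i < n; i++) {
               htp_tx_t *tx = list_get(connp->conn->transactions, i);
               if (tx == NULL) continue; /* possible if auto-destroy was used */
               /* Inspect the transaction here, e.g. tx->request_method or
                  tx->response_status_number (assumed field names). */
           }

           /* 2.5. Free the parser, its connection and any remaining
              transactions in one call. */
           htp_connp_destroy_all(connp);
       }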


USER DATA
---------

If you're using the callbacks and you need to keep state between invocations, you have two
options:

  1. Associate one opaque structure with a parser instance, using htp_connp_set_user_data().

  2. Associate one opaque structure with a transaction instance, using htp_tx_set_user_data().
     The best place to do this is in a TRANSACTION_START callback. Don't forget to free any
     resources you allocate on a per-transaction basis before each transaction is deleted.
     A sketch of this approach follows below.
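
     As a sketch of option 2, the TRANSACTION_START callback below attaches a small
     per-transaction structure and a transaction-complete (RESPONSE) callback frees it.
     The callback signatures taking htp_connp_t *, the connp->in_tx / connp->out_tx
     fields, htp_tx_get_user_data(), and the HTP_OK/HTP_ERROR return codes follow the
     0.2.x-era API and are assumptions to verify against your htp.h.

       #include <htp/htp.h>
       #include <stdlib.h>

       /* Hypothetical per-transaction state kept between callback invocations. */
       struct tx_state {
           size_t request_body_bytes;
       };

       /* TRANSACTION_START: allocate state and attach it to the new transaction,
          which (assumption) is reachable as connp->in_tx at this point. */
       static int callback_transaction_start(htp_connp_t *connp) {
           struct tx_state *s = calloc(1, sizeof(*s));
           if (s == NULL) return HTP_ERROR;
           htp_tx_set_user_data(connp->in_tx, s);
           return HTP_OK;
       }

       /* RESPONSE (transaction complete): free the state before the transaction
          is destroyed (important when transaction auto-destroy is enabled). */
       static int callback_response(htp_connp_t *connp) {
           free(htp_tx_get_user_data(connp->out_tx));
           return HTP_OK;
       }

       /* Registered at configure-time, for example:
            htp_config_register_transaction_start(cfg, callback_transaction_start);
            htp_config_register_response(cfg, callback_response);                  */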