Diffstat (limited to 'doc/userguide/file-extraction')
-rw-r--r-- doc/userguide/file-extraction/config-update.rst | 41
-rw-r--r-- doc/userguide/file-extraction/file-extraction.rst | 177
-rw-r--r-- doc/userguide/file-extraction/md5.rst | 124
-rw-r--r-- doc/userguide/file-extraction/public-sha1-md5-data-sets.rst | 4
4 files changed, 346 insertions, 0 deletions
diff --git a/doc/userguide/file-extraction/config-update.rst b/doc/userguide/file-extraction/config-update.rst
new file mode 100644
index 0000000..5035ac8
--- /dev/null
+++ b/doc/userguide/file-extraction/config-update.rst
@@ -0,0 +1,41 @@
+.. _filestore-update-v1-to-v2:
+
+Update File-store v1 Configuration to V2
+========================================
+
+Given a file-store configuration like::
+
+  - file-store:
+      enabled: yes            # set to yes to enable
+      log-dir: files          # directory to store the files
+      force-magic: no         # force logging magic on all stored files
+      force-hash: [md5]       # force logging of md5 checksums
+      force-filestore: no     # force storing of all files
+      stream-depth: 1mb       # reassemble 1mb into a stream, set to no to disable
+      waldo: file.waldo       # waldo file to store the file_id across runs
+      max-open-files: 0       # how many files to keep open (0 means none)
+      write-meta: yes         # write a .meta file if set to yes
+      include-pid: yes        # include the pid in filenames if set to yes.
+
+The following changes will need to be made to convert to a v2 style configuration:
+
+* The ``version`` field must be set to 2.
+* The ``log-dir`` field should be renamed to ``dir``. It is recommended to use a new directory instead of an existing v1 directory.
+* Remove the ``waldo`` option. It is no longer used.
+* Remove the ``write-meta`` option.
+* Optionally set ``write-fileinfo`` to enable writing of a metadata file alongside the extracted file. Note that this option is disabled by default as a ``fileinfo`` event can be written to the Eve log file.
+* Remove the ``include-pid`` option. There is no equivalent to this option in file-store v2.
+
+Example converted configuration::
+
+  - file-store:
+      version: 2
+      enabled: yes
+      dir: filestore
+      force-hash: [md5]
+      force-filestore: no
+      stream-depth: 1mb
+      max-open-files: 0
+      write-fileinfo: yes
+
+Refer to the :ref:`File Extraction` section of the manual for information about the format of the file-store directory for file-store v2.
diff --git a/doc/userguide/file-extraction/file-extraction.rst b/doc/userguide/file-extraction/file-extraction.rst
new file mode 100644
index 0000000..b642ed3
--- /dev/null
+++ b/doc/userguide/file-extraction/file-extraction.rst
@@ -0,0 +1,177 @@
+.. _File Extraction:
+
+File Extraction
+===============
+
+Architecture
+~~~~~~~~~~~~
+
+The file extraction code works on top of selected protocol parsers (see supported protocols below). The application layer parsers run on top of the stream reassembly engine and the UDP flow tracking.
+
+In case of HTTP, the parser takes care of dechunking and unzipping the request and/or response data if necessary.
+
+This means that settings in the stream engine, reassembly engine and the application layer parsers all affect the workings of the file extraction.
+
+The rule language controls which files are extracted and stored on disk.
+
+Supported protocols are:
+
+- HTTP
+- SMTP
+- FTP
+- NFS
+- SMB
+- HTTP2
+
+Settings
+~~~~~~~~
+
+*stream.checksum-validation* controls whether or not the stream engine rejects packets with invalid checksums. This is normally a good idea, but if the network interface performs checksum offloading, a lot of packets may seem to be broken. This setting is enabled by default, and can be disabled by setting it to "no". Note that the checksum handling can be controlled per interface, see "checksum-checks" in the example configuration.
+
+*file-store.stream-depth* controls how far into a stream reassembly is done. Beyond this value no reassembly will be done, which means that the HTTP session will no longer be tracked past that point. By default a setting of 1 Megabyte is used. 0 sets it to unlimited. If set to no, it is disabled and ``stream.reassembly.depth`` is considered. Non-zero values must be greater than ``stream.reassembly.depth`` to be used.
+
+*libhtp.default-config.request-body-limit* / *libhtp.server-config.<config>.request-body-limit* controls how much of the HTTP request body is tracked for inspection by the `http_client_body` keyword, but also used to limit file inspection. A value of 0 means unlimited.
+
+*libhtp.default-config.response-body-limit* / *libhtp.server-config.<config>.response-body-limit* is like the request body limit, only it applies to the HTTP response body.
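+
+As a rough sketch of how these settings fit together, the fragment below keeps the normal reassembly depth at 1 MB while allowing larger files to be stored. The values are illustrative assumptions rather than recommendations, and the placement of ``file-store`` under ``outputs`` is assumed to follow the usual ``suricata.yaml`` layout::
+
+  stream:
+    checksum-validation: yes
+    reassembly:
+      depth: 1mb
+
+  outputs:
+    - file-store:
+        version: 2
+        enabled: yes
+        stream-depth: 10mb   # must be greater than stream.reassembly.depth
+
+  libhtp:
+    default-config:
+      request-body-limit: 0    # 0 means unlimited
+      response-body-limit: 0   # 0 means unlimited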
+
+
+Output
+~~~~~~
+
+File-Store and Eve Fileinfo
+---------------------------
+
+There are two output modules for logging information about extracted files.
+The first is ``eve.files`` which is an ``eve`` sub-logger
+that logs ``fileinfo`` records. These ``fileinfo`` records provide
+metadata about the file, but not the actual file contents.
+
+This must be enabled in the ``eve`` output::
+
+  - outputs:
+      - eve-log:
+          types:
+            - files:
+                force-magic: no
+                force-hash: [md5,sha256]
+
+See :ref:`suricata-yaml-outputs-eve` for more details on working
+with the `eve` output.
+
+The other output module, ``file-store`` stores the actual files to
+disk.
+
+The ``file-store`` module uses its own log directory (default: `filestore` in
+the default logging directory) and logs files using the SHA256 of the
+contents as the filename. Each file is then placed in a directory
+named `00` to `ff` where the directory shares the first 2 characters
+of the filename. For example, if the SHA256 hex string of an extracted
+file starts with "f9bc6d..." the file will be placed in the directory
+`filestore/f9`.
+
+The size of a file that can be stored depends on ``file-store.stream-depth``.
+If this value is reached, a file can be truncated and might not be stored completely.
+If ``file-store.stream-depth`` is not enabled, ``stream.reassembly.depth`` will be considered.
+
+Setting ``file-store.stream-depth`` to 0 permits storing the entire file;
+here, 0 means "unlimited."
+
+``file-store.stream-depth`` will always override ``stream.reassembly.depth``
+when the ``filestore`` keyword is used. However, it is not possible to set ``file-store.stream-depth``
+to a value less than ``stream.reassembly.depth``. Values less than this amount are ignored
+and a warning message will be displayed.
+
+A protocol parser, like modbus, may allow a different store-depth value
+to be set and use it rather than ``file-store.stream-depth``.
+
+Using the SHA256 for file names allows for automatic de-duplication of
+extracted files. However, the timestamp of a preexisting file will be
+updated if the same file is extracted again, similar to the `touch`
+command.
+
+Optionally a ``fileinfo`` record can be written to its own file
+sharing the same SHA256 as the file it references. To handle recording
+the metadata of each occurrence of an extracted file, these filenames
+include some extra fields to ensure uniqueness. Currently the format
+is::
+
+ <SHA256>.<SECONDS>.<ID>.json
+
+where ``<SECONDS>`` is the seconds from the packet that triggered the
+stored file to be closed and ``<ID>`` is a unique ID for the runtime
+of the Suricata instance. These values should not be depended on, and
+are simply used to ensure uniqueness.
+
+These ``fileinfo`` records are identical to the ``fileinfo`` records
+logged to the ``eve`` output.
+
+See :ref:`suricata-yaml-file-store` for more information on
+configuring the file-store output.
+
+.. note:: This section documents version 2 of the ``file-store``. Version 1 of the file-store has been removed as of Suricata version 6.
+
+Rules
+~~~~~
+
+Without rules in place no extraction will happen. The simplest rule would be:
+
+::
+
+ alert http any any -> any any (msg:"FILE store all"; filestore; sid:1; rev:1;)
+
+This will simply store all files to disk.
+
+
+Want to store all files with a pdf extension?
+
+::
+
+ alert http any any -> any any (msg:"FILE PDF file claimed"; fileext:"pdf"; filestore; sid:2; rev:1;)
+
+
+Or rather all actual pdf files?
+
+::
+
+ alert http any any -> any any (msg:"FILE pdf detected"; filemagic:"PDF document"; filestore; sid:3; rev:1;)
+
+
+Or only store files whose MD5 checksum is on a black list?
+
+::
+
+ alert http any any -> any any (msg:"Black list checksum match and extract MD5"; filemd5:fileextraction-chksum.list; filestore; sid:4; rev:1;)
+
+
+Or only store files whose SHA1 checksum is on a black list?
+
+::
+
+ alert http any any -> any any (msg:"Black list checksum match and extract SHA1"; filesha1:fileextraction-chksum.list; filestore; sid:5; rev:1;)
+
+Or, finally, only store files whose SHA256 checksum is on a black list?
+
+::
+
+ alert http any any -> any any (msg:"Black list checksum match and extract SHA256"; filesha256:fileextraction-chksum.list; filestore; sid:6; rev:1;)
+
+Bundled with the Suricata download is a file with more example rules. In the archive, go to the `rules` directory and check the ``files.rules`` file.
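+
+As a further illustration, the ``fileext`` and ``filemagic`` keywords shown above can be combined in one rule. The sketch below assumes that ``filemagic`` accepts ``!`` negation, and the ``sid`` is an arbitrary placeholder. It stores files that claim a pdf extension but are not detected as PDF documents:
+
+::
+
+ alert http any any -> any any (msg:"FILE pdf claimed, but not pdf"; fileext:"pdf"; filemagic:!"PDF document"; filestore; sid:7; rev:1;)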
+
+
+MD5
+~~~
+
+Suricata can calculate MD5 checksums of files on the fly and log them. See :doc:`md5` for an explanation on how to enable this.
+
+
+.. toctree::
+
+ md5
+ public-sha1-md5-data-sets
+
+Updating Filestore Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+
+ config-update
diff --git a/doc/userguide/file-extraction/md5.rst b/doc/userguide/file-extraction/md5.rst
new file mode 100644
index 0000000..939463b
--- /dev/null
+++ b/doc/userguide/file-extraction/md5.rst
@@ -0,0 +1,124 @@
+.. _md5:
+
+Storing MD5 checksums
+======================
+
+Configuration
+~~~~~~~~~~~~~
+
+In the Suricata config file:
+
+::
+
+  - file-store:
+       enabled: yes       # set to yes to enable
+       dir: filestore    # directory to store the files
+       force-hash: [md5]  # force logging of md5 checksums
+
+
+For JSON output:
+
+::
+
+  outputs:
+    - eve-log:
+        enabled: yes
+        filetype: regular #regular|syslog|unix_dgram|unix_stream|redis
+        filename: eve.json
+        types:
+          - files:
+              force-magic: no   # force logging magic on all logged files
+              # force logging of checksums, available hash functions are md5,
+              # sha1 and sha256
+              #force-hash: [md5]
+
+
+Other settings affecting :doc:`file-extraction`
+
+::
+
+  stream:
+    memcap: 64mb
+    checksum-validation: yes      # reject wrong csums
+    inline: no                    # no inline mode
+    reassembly:
+      memcap: 32mb
+      depth: 0                    # reassemble all of a stream
+      toserver-chunk-size: 2560
+      toclient-chunk-size: 2560
+
+Make sure we have *depth: 0* so all files can be tracked fully.
+
+
+::
+
+  libhtp:
+    default-config:
+      personality: IDS
+      # Can be specified in kb, mb, gb. Just a number indicates
+      # it's in bytes.
+      request-body-limit: 0
+      response-body-limit: 0
+
+Make sure we have *request-body-limit: 0* and *response-body-limit: 0*.
+
+Testing
+~~~~~~~
+
+For the purpose of testing we use only this rule, placed in a file named file.rules (a test/example file):
+
+
+::
+
+ alert http any any -> any any (msg:"FILE store all"; filestore; sid:1; rev:1;)
+
+The rule above will save all the file data for files that are opened/downloaded through HTTP.
+
+Start Suricata (``-S`` option *ONLY loads* the specified rule file and disregards any other rules that are enabled in suricata.yaml):
+
+::
+
+ suricata -c /etc/suricata/suricata.yaml -S file.rules -i eth0
+
+
+Meta data:
+
+::
+
+ TIME:              05/01/2012-11:09:52.425751
+ SRC IP:            2.23.144.170
+ DST IP:            192.168.1.91
+ PROTO:             6
+ SRC PORT:          80
+ DST PORT:          51598
+ HTTP URI:          /en/US/prod/collateral/routers/ps5855/prod_brochure0900aecd8019dc1f.pdf
+ HTTP HOST:         www.cisco.com
+ HTTP REFERER:      http://www.cisco.com/c/en/us/products/routers/3800-series-integrated-services-routers-isr/index.html
+ FILENAME:          /en/US/prod/collateral/routers/ps5855/prod_brochure0900aecd8019dc1f.pdf
+ MAGIC:             PDF document, version 1.6
+ STATE:             CLOSED
+ MD5:               59eba188e52467adc11bf2442ee5bf57
+ SIZE:              9485123
+
+and in files-json.log (or eve.json) :
+
+
+::
+
+ { "id": 1, "timestamp": "05\/01\/2012-11:10:27.693583", "ipver": 4, "srcip": "2.23.144.170", "dstip": "192.168.1.91", "protocol": 6, "sp": 80, "dp": 51598, "http_uri": "\/en\/US\/prod\/collateral\/routers\/ps5855\/prod_brochure0900aecd8019dc1f.pdf", "http_host": "www.cisco.com", "http_referer": "http:\/\/www.google.com\/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CDAQFjAA&url=http%3A%2F%2Fwww.cisco.com%2Fen%2FUS%2Fprod%2Fcollateral%2Frouters%2Fps5855%2Fprod_brochure0900aecd8019dc1f.pdf&ei=OqyfT9eoJubi4QTyiamhAw&usg=AFQjCNGdjDBpBDfQv2r3VogSH41V6T5x9Q", "filename": "\/en\/US\/prod\/collateral\/routers\/ps5855\/prod_brochure0900aecd8019dc1f.pdf", "magic": "PDF document, version 1.6", "state": "CLOSED", "md5": "59eba188e52467adc11bf2442ee5bf57", "stored": true, "size": 9485123 }
+ { "id": 12, "timestamp": "05\/01\/2012-11:12:57.421420", "ipver": 4, "srcip": "2.23.144.170", "dstip": "192.168.1.91", "protocol": 6, "sp": 80, "dp": 51598, "http_uri": "\/en\/US\/prod\/collateral\/routers\/ps5855\/prod_brochure0900aecd8019dc1f.pdf", "http_host": "www.cisco.com", "http_referer": "http:\/\/www.google.com\/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CDAQFjAA&url=http%3A%2F%2Fwww.cisco.com%2Fen%2FUS%2Fprod%2Fcollateral%2Frouters%2Fps5855%2Fprod_brochure0900aecd8019dc1f.pdf&ei=OqyfT9eoJubi4QTyiamhAw&usg=AFQjCNGdjDBpBDfQv2r3VogSH41V6T5x9Q", "filename": "\/en\/US\/prod\/collateral\/routers\/ps5855\/prod_brochure0900aecd8019dc1f.pdf", "magic": "PDF document, version 1.6", "state": "CLOSED", "md5": "59eba188e52467adc11bf2442ee5bf57", "stored": true, "size": 9485123 }
+
+
+Log all MD5s without any rules
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you would like to log MD5s for everything and anything that passes through the traffic that you are inspecting with Suricata, but not log the files themselves, all you have to do is disable file-store and enable only the JSON output with forced MD5s - in suricata.yaml like so:
+
+::
+
+  - file-store:
+      version: 2
+      enabled: no         # set to yes to enable
+      dir: filestore      # directory to store the files
+      force-filestore: no
+      force-hash: [md5]   # force logging of md5 checksums
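+
+The snippet above only covers the ``file-store`` part. A sketch of the matching JSON output, assuming the same ``eve-log`` layout shown in the Configuration section but with ``force-hash`` uncommented, might look like this:
+
+::
+
+  outputs:
+    - eve-log:
+        enabled: yes
+        filetype: regular
+        filename: eve.json
+        types:
+          - files:
+              force-magic: no
+              force-hash: [md5]   # log an MD5 for every tracked file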
diff --git a/doc/userguide/file-extraction/public-sha1-md5-data-sets.rst b/doc/userguide/file-extraction/public-sha1-md5-data-sets.rst
new file mode 100644
index 0000000..2e850ce
--- /dev/null
+++ b/doc/userguide/file-extraction/public-sha1-md5-data-sets.rst
@@ -0,0 +1,4 @@
+Public SHA1 MD5 data sets
+=========================
+
+National Software Reference Library - http://www.nsrl.nist.gov/Downloads.html