summaryrefslogtreecommitdiffstats
path: root/CBOR_DNS_STREAM.md
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2021-03-04 19:22:03 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2021-03-04 20:43:22 +0000
commit22c74419e2c258319bc723351876604b3304604b (patch)
tree8c799a78d53f67388fdf42900657eda617c1306a /CBOR_DNS_STREAM.md
parentInitial commit. (diff)
downloaddnscap-22c74419e2c258319bc723351876604b3304604b.tar.xz
dnscap-22c74419e2c258319bc723351876604b3304604b.zip
Adding upstream version 2.0.0+debian.upstream/2.0.0+debian
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'CBOR_DNS_STREAM.md')
-rw-r--r--CBOR_DNS_STREAM.md399
1 files changed, 399 insertions, 0 deletions
diff --git a/CBOR_DNS_STREAM.md b/CBOR_DNS_STREAM.md
new file mode 100644
index 0000000..a54dc63
--- /dev/null
+++ b/CBOR_DNS_STREAM.md
@@ -0,0 +1,399 @@
+# CBOR DNS Stream Format version 1 (CDSv1)
+
+This is an experimental format for representing DNS information in CBOR
+with the goals to:
+- Be able to stream the information
+- Support incomplete, broken and/or invalid DNS
+- Have close to no data quality and signature degradation
+- Support additional non-DNS meta data (such as ICMP/TCP attributes)
+
+## Overview
+
+In CBOR you are expected to have one root element, most likely an array or
+map. This format does not have a root element, instead you are expected to
+read one CBOR array element at a time as a stream of CBOR elements with the
+first array element being the stream initiator object.
+
+```
+[stream_init]
+[message]
+...
+[message]
+```
+
+Here are some number on the compression rate compared to PCAP:
+
+Uncompressed | PCAP | CDS | Factor
+-------------|------------|-----------|-------
+client | 458373 | 133640 | 0,2915
+zonalizer | 51769844 | 9450475 | 0,1825
+large ditl | 1003931674 | 298167709 | 0,2970
+small ditl | 1651252 | 603314 | 0,3653
+
+Gzipped | PCAP | CDS | Factor | F/Uncompressed
+-------------|------------|-----------|---------|---------------
+client | 108136 | 45944 | 0,4248 | 0,1002
+zonalizer | 12468329 | 2485620 | 0,1993 | 0,0480
+large ditl | 327227203 | 117569598 | 0,3592 | 0,1171
+small ditl | 539323 | 253402 | 0,4698 | 0,1534
+
+Xzipped | PCAP | CDS | Factor | F/Uncompressed
+-------------|------------|-----------|---------|---------------
+client | 76248 | 36308 | 0,4761 | 0,0792
+zonalizer | 7894356 | 1695920 | 0,2148 | 0,0327
+large ditl | 267031412 | 86747604 | 0,3248 | 0,0864
+small ditl | 442260 | 206596 | 0,4671 | 0,1251
+
+- `client` is a couple of hours of DNS from my workstation
+- `zonalizer` is half a day from [Zonalizer](https://zonalizer.makeinstall.se) which continuously tests gTLDs
+- `large ditl`, `small ditl` are capture from [DITL](https://www.dns-oarc.net/oarc/data/ditl)
+
+## Types
+
+- `int`: A CBOR integer (major type 0x00)
+- `uint`: A CBOR integer (value >= 0, major type 0x00)
+- `nint`: A CBOR negative integer (value < 0, major type 0x00), this type has special meaning see `Negative Integers`
+- `simple`: A CBOR simple value (major type 0xe0)
+- `bytes`: A CBOR byte string (major type 0x40)
+- `string`: A CBOR UTF-8 string (major type 0x60)
+- `any`: Any CBOR value
+- `bool`: A CBOR boolean
+- `rindex`: A CBOR negative integer that is a reverse index, see `Deduplication`
+
+## Special Keywords
+
+- `union`: Can be used to merge the given array or map into the current object
+- `optional`: The attribute or object reference is optional
+
+## Negative Integers
+
+CBOR encodes negative numbers in a special way and this format uses that
+for none negative number to tell them apart.
+
+Because of that, all negative numbers needs special decoding:
+
+```
+value = -value - 1
+```
+
+## Objects
+
+The object code below uses:
+- `[` and `]` to indicate the start and end of an array
+- `type name` per object attribute
+- `name` per object reference
+- `...` to indicate a list of previous definition
+- `(`, `|` and `)` to indicate list of various types that the attribute can be
+
+### stream_init
+
+The initial object in the stream.
+
+```
+[
+ string version,
+ union stream_option option,
+ ...
+]
+```
+
+- `version`: The version of the format
+- `option`: A list of stream option objects
+
+### stream_option
+
+A stream option that can specify critical information about the stream and
+how it should be decoded, see `Stream Options` for more information.
+
+```
+[
+ uint option_type,
+ optional any option_value
+]
+```
+
+- `option_type`: The type of option represented as a number
+- `option_value`: The option value
+
+### message
+
+A message object that describes various DNS packets or other information.
+
+```
+[
+ optional bool is_complete,
+ union timestamp timestamp,
+ simple message_bits,
+ union ip_header ip_header,
+ union ( icmp_message | udp_message | tcp_message | dns_message ) content
+]
+```
+
+- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists
+- `timestamp`: A timestamp object
+- `message_bits`: Bitmap indicating message content
+ - Bit 0: 0=Not DNS 1=DNS
+ - Bit 1: if DNS: 0=UDP 1=TCP else: 0=ICMP/ICMPv6 1=TCP
+ - Bit 2: Fragmented (0=no 1=yes)
+ - Bit 3: Malformed (0=no 1=yes)
+- `ip_header`: An IP header object
+- `content`: The message content, may be an ICMP, UDP, TCP or DNS message object
+
+### timestamp
+
+The timestamp object of a message.
+
+```
+[
+ ( uint seconds | nint diff_from_last ),
+ optional uint useconds
+ optional uint nseconds
+]
+```
+
+- `seconds`: The seconds of a UNIX timestamp
+- `diff_from_last`: The differentially from last `timestamp.seconds`
+- `useconds`: The microseconds of a UNIX timestamp or if `diff_from_last` is used it will be the differentially from last `timestamp.useconds`
+- `nseconds`: The nanoseconds of a UNIX timestamp or if `diff_from_last` is used it will be the differentially from last `timestamp.nseconds`
+
+### ip_header
+
+The IP header of a message.
+
+```
+[
+ ( uint | nint ) ip_bits,
+ optional bytes src_addr,
+ optional bytes dest_addr,
+ optional ( uint | nint ) src_dest_port
+]
+```
+
+- `ip_bits`: Bitmap indicating IP header content, if the type is `nint` it also indicates that it is a reverse from last, see `Deduplication` for more information
+ - Bit 0: address family (0=AF_INET, 1=AF_INET6)
+ - Bit 1: src_addr present
+ - Bit 2: dest_addr present
+ - Bit 3: port present
+- `src_addr`: The source address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6
+- `dest_addr`: The destination address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6
+- `src_dest_port`: A combined source and destination port, see `Source And Destination Port`
+
+#### Source And Destination Port
+
+The source and destination port are combined into one value. If both source
+and destination exists then the value is larger then 65535, the destination
+will be the high 16 bits and source the low otherwise it will only be the
+source. If the value is negative then only the destination exists.
+
+```
+if value > 0xffff then
+ src_port = value & 0xffff
+ dest_port = value >> 16
+else if value < 0 then
+ dest_port = -value - 1
+else
+ src_port = value
+```
+
+### icmp_message
+
+`if ip_header.ip_bits.1=0 && ip_header.ip_bits.2=0`
+
+```
+[
+ uint type,
+ uint code
+]
+```
+
+- `type`: TODO
+- `code`: TODO
+
+### udp_message
+
+`if ip_header.ip_bits.1=1 && ip_header.ip_bits.2=0`
+
+TODO
+
+### tcp_message
+
+`if ip_header.ip_bits.2=1`
+
+```
+[
+ uint seq_nr,
+ uint ack_nr,
+ uint tcp_bits,
+ uint window
+]
+```
+
+- `seq_nr`: TODO
+- `ack_nr`: TODO
+- `tcp_bits`: TODO
+ - 0: URG
+ - 1: ACK
+ - 2: PSH
+ - 3: RST
+ - 4: SYN
+ - 5: FIN
+- `window`: TODO
+
+### dns_message
+
+A DNS packet.
+
+```
+[
+ optional bool is_complete,
+ uint id,
+ uint raw_dns_header, # TODO
+ optional nint count_bits,
+ optional uint qdcount,
+ optional uint ancount,
+ optional uint nscount,
+ optional uint arcount,
+ optional simple rr_bits,
+ optional [
+ dns_question question,
+ ...
+ ],
+ optional [
+ resource_record answer,
+ ...
+ ],
+ optional [
+ resource_record authority,
+ ...
+ ],
+ optional [
+ resource_record additional,
+ ...
+ ],
+ optional bytes malformed
+]
+```
+
+- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists
+- `id`: DNS identifier
+- `raw_dns_header`: TODO
+- `count_bits`: Bitmap indicating which counts are present, see `Negative Integers` and `Deduplication`
+ - Bit 0: qdcount present
+ - Bit 1: ancount present
+ - Bit 2: nscount present
+ - Bit 3: arcount present
+- `qdcount`: Number of question records if different from the number of entries in `question`
+- `ancount`: Number of answer resource records if different from the number of entries in `answer`
+- `nscount`: Number of authority resource records if different from the number of entries in `authority`
+- `arcount`: Number of additional resource records if different from the number of entries in `additional`
+- `question`: The question records
+- `answer`: The answer resource records
+- `authority`: The authority resource records
+- `additional`: The additional resource records
+- `malformed`: Holds the bytes of the message that was not parsed
+
+### question
+
+A DNS question record.
+
+```
+[
+ optional bool is_complete,
+ ( bytes | compressed_name | rindex ) qname,
+ optional uint qtype,
+ optional nint qclass
+]
+```
+
+- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists
+- `qname`: The QNAME as byte string, a name compression object or a reverse index, see `Deduplication`
+- `qtype`: The QTYPE, see `Deduplication`
+- `qclass`: The QCLASS, see `Negative Integers` and `Deduplication`
+
+### compressed_name
+
+An compressed name which has references to other labels within the same message.
+
+```
+[
+ ( bytes label | uint label_index | nint offset | simple extension_bits ),
+ ...
+]
+```
+
+- `label`: A byte string with a label part
+- `label_index`: An index to the N byte string label in the message
+- `offset`: The offset specified in the DNS message which could not be translated into a label index
+- `extension_bits`: The extension bits if not 0b00 or 0b11 # TODO: add the extension bits
+
+### resource_record
+
+A DNS resource record.
+
+```
+[
+ optional bool is_complete,
+ ( bytes | compressed_name | rindex ) name,
+ optional simple rr_bits,
+ optional uint type,
+ optional uint class,
+ optional uint ttl,
+ optional uint rdlength,
+ ( bytes | mixed_rdata ) rdata
+]
+```
+
+- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists
+- `name`:
+- `rr_bits`: Bitmap indicating what is present, see `Deduplication`
+ - Bit 0: type
+ - Bit 1: class
+ - Bit 2: ttl
+ - Bit 3: rdlength # TODO: reverse index for TTL?
+- `type`: The resource record type
+- `class`: The resource record class
+- `ttl`: The resource record ttl
+- `rdlength`: The resource record rdata length
+- `rdata`: The resource record data
+
+### mixed_rdata
+
+An array mixed with resource data and compressed names.
+
+```
+[
+ ( bytes | compressed_name ) rdata_part,
+ ...
+]
+```
+- `rdata_part`: The parts of the resource records data
+
+## Stream Options
+
+Each option is specified here as OptionName(OptionNumber) and optional
+OptionValue type.
+
+- `RLABELS(0) uint`: Indicates how many labels should be stored in the reverse label index before discarding them
+- `RLABEL_MIN_SIZE(1) uint`: The minimum size a label must be to be put in the reverse label index
+- `RDATA_RINDEX_SIZE(2) uint`: Indicates how many rdata should be stored in the reverse rdata index before discarding them
+- `RDATA_RINDEX_MIN_SIZE(3) uint`: The minimum size a rdata must be to be put in the reverse rdata index
+- `USE_RDATA_INDEX(4)`: If present then the stream uses rdata indexing
+- `RDATA_INDEX_MIN_SIZE(5) uint`: The minimum size a rdata must be to be put in the rdata index
+
+## Deduplication
+
+Deduplication is done in a few different ways, data may be left out to
+indicate that it is the same as the previous value, an index may be used to
+indicate that it is the same as the N previous value and a reverse index
+may be used to indicate that it is the N previous value looking backwards
+across the stream.
+
+In other words, using the index deduplication you will need to build a table
+of the values you come across during the decoding of the stream, this table
+can grow very large.
+
+As an smaller alternative a reverse index can indicate often used data from
+the N previous value looking back over the stream. This type of index also
+reorder itself to try and put the most used data always in the index.
+
+TODO: details of each attribute and it's deduplication