diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2021-03-04 19:22:03 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2021-03-04 20:43:22 +0000 |
commit | 22c74419e2c258319bc723351876604b3304604b (patch) | |
tree | 8c799a78d53f67388fdf42900657eda617c1306a /CBOR_DNS_STREAM.md | |
parent | Initial commit. (diff) | |
download | dnscap-22c74419e2c258319bc723351876604b3304604b.tar.xz dnscap-22c74419e2c258319bc723351876604b3304604b.zip |
Adding upstream version 2.0.0+debian.upstream/2.0.0+debian
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'CBOR_DNS_STREAM.md')
-rw-r--r-- | CBOR_DNS_STREAM.md | 399 |
1 files changed, 399 insertions, 0 deletions
diff --git a/CBOR_DNS_STREAM.md b/CBOR_DNS_STREAM.md new file mode 100644 index 0000000..a54dc63 --- /dev/null +++ b/CBOR_DNS_STREAM.md @@ -0,0 +1,399 @@ +# CBOR DNS Stream Format version 1 (CDSv1) + +This is an experimental format for representing DNS information in CBOR +with the goals to: +- Be able to stream the information +- Support incomplete, broken and/or invalid DNS +- Have close to no data quality and signature degradation +- Support additional non-DNS meta data (such as ICMP/TCP attributes) + +## Overview + +In CBOR you are expected to have one root element, most likely an array or +map. This format does not have a root element, instead you are expected to +read one CBOR array element at a time as a stream of CBOR elements with the +first array element being the stream initiator object. + +``` +[stream_init] +[message] +... +[message] +``` + +Here are some number on the compression rate compared to PCAP: + +Uncompressed | PCAP | CDS | Factor +-------------|------------|-----------|------- +client | 458373 | 133640 | 0,2915 +zonalizer | 51769844 | 9450475 | 0,1825 +large ditl | 1003931674 | 298167709 | 0,2970 +small ditl | 1651252 | 603314 | 0,3653 + +Gzipped | PCAP | CDS | Factor | F/Uncompressed +-------------|------------|-----------|---------|--------------- +client | 108136 | 45944 | 0,4248 | 0,1002 +zonalizer | 12468329 | 2485620 | 0,1993 | 0,0480 +large ditl | 327227203 | 117569598 | 0,3592 | 0,1171 +small ditl | 539323 | 253402 | 0,4698 | 0,1534 + +Xzipped | PCAP | CDS | Factor | F/Uncompressed +-------------|------------|-----------|---------|--------------- +client | 76248 | 36308 | 0,4761 | 0,0792 +zonalizer | 7894356 | 1695920 | 0,2148 | 0,0327 +large ditl | 267031412 | 86747604 | 0,3248 | 0,0864 +small ditl | 442260 | 206596 | 0,4671 | 0,1251 + +- `client` is a couple of hours of DNS from my workstation +- `zonalizer` is half a day from [Zonalizer](https://zonalizer.makeinstall.se) which continuously tests gTLDs +- `large ditl`, `small ditl` are capture from [DITL](https://www.dns-oarc.net/oarc/data/ditl) + +## Types + +- `int`: A CBOR integer (major type 0x00) +- `uint`: A CBOR integer (value >= 0, major type 0x00) +- `nint`: A CBOR negative integer (value < 0, major type 0x00), this type has special meaning see `Negative Integers` +- `simple`: A CBOR simple value (major type 0xe0) +- `bytes`: A CBOR byte string (major type 0x40) +- `string`: A CBOR UTF-8 string (major type 0x60) +- `any`: Any CBOR value +- `bool`: A CBOR boolean +- `rindex`: A CBOR negative integer that is a reverse index, see `Deduplication` + +## Special Keywords + +- `union`: Can be used to merge the given array or map into the current object +- `optional`: The attribute or object reference is optional + +## Negative Integers + +CBOR encodes negative numbers in a special way and this format uses that +for none negative number to tell them apart. + +Because of that, all negative numbers needs special decoding: + +``` +value = -value - 1 +``` + +## Objects + +The object code below uses: +- `[` and `]` to indicate the start and end of an array +- `type name` per object attribute +- `name` per object reference +- `...` to indicate a list of previous definition +- `(`, `|` and `)` to indicate list of various types that the attribute can be + +### stream_init + +The initial object in the stream. + +``` +[ + string version, + union stream_option option, + ... +] +``` + +- `version`: The version of the format +- `option`: A list of stream option objects + +### stream_option + +A stream option that can specify critical information about the stream and +how it should be decoded, see `Stream Options` for more information. + +``` +[ + uint option_type, + optional any option_value +] +``` + +- `option_type`: The type of option represented as a number +- `option_value`: The option value + +### message + +A message object that describes various DNS packets or other information. + +``` +[ + optional bool is_complete, + union timestamp timestamp, + simple message_bits, + union ip_header ip_header, + union ( icmp_message | udp_message | tcp_message | dns_message ) content +] +``` + +- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists +- `timestamp`: A timestamp object +- `message_bits`: Bitmap indicating message content + - Bit 0: 0=Not DNS 1=DNS + - Bit 1: if DNS: 0=UDP 1=TCP else: 0=ICMP/ICMPv6 1=TCP + - Bit 2: Fragmented (0=no 1=yes) + - Bit 3: Malformed (0=no 1=yes) +- `ip_header`: An IP header object +- `content`: The message content, may be an ICMP, UDP, TCP or DNS message object + +### timestamp + +The timestamp object of a message. + +``` +[ + ( uint seconds | nint diff_from_last ), + optional uint useconds + optional uint nseconds +] +``` + +- `seconds`: The seconds of a UNIX timestamp +- `diff_from_last`: The differentially from last `timestamp.seconds` +- `useconds`: The microseconds of a UNIX timestamp or if `diff_from_last` is used it will be the differentially from last `timestamp.useconds` +- `nseconds`: The nanoseconds of a UNIX timestamp or if `diff_from_last` is used it will be the differentially from last `timestamp.nseconds` + +### ip_header + +The IP header of a message. + +``` +[ + ( uint | nint ) ip_bits, + optional bytes src_addr, + optional bytes dest_addr, + optional ( uint | nint ) src_dest_port +] +``` + +- `ip_bits`: Bitmap indicating IP header content, if the type is `nint` it also indicates that it is a reverse from last, see `Deduplication` for more information + - Bit 0: address family (0=AF_INET, 1=AF_INET6) + - Bit 1: src_addr present + - Bit 2: dest_addr present + - Bit 3: port present +- `src_addr`: The source address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6 +- `dest_addr`: The destination address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6 +- `src_dest_port`: A combined source and destination port, see `Source And Destination Port` + +#### Source And Destination Port + +The source and destination port are combined into one value. If both source +and destination exists then the value is larger then 65535, the destination +will be the high 16 bits and source the low otherwise it will only be the +source. If the value is negative then only the destination exists. + +``` +if value > 0xffff then + src_port = value & 0xffff + dest_port = value >> 16 +else if value < 0 then + dest_port = -value - 1 +else + src_port = value +``` + +### icmp_message + +`if ip_header.ip_bits.1=0 && ip_header.ip_bits.2=0` + +``` +[ + uint type, + uint code +] +``` + +- `type`: TODO +- `code`: TODO + +### udp_message + +`if ip_header.ip_bits.1=1 && ip_header.ip_bits.2=0` + +TODO + +### tcp_message + +`if ip_header.ip_bits.2=1` + +``` +[ + uint seq_nr, + uint ack_nr, + uint tcp_bits, + uint window +] +``` + +- `seq_nr`: TODO +- `ack_nr`: TODO +- `tcp_bits`: TODO + - 0: URG + - 1: ACK + - 2: PSH + - 3: RST + - 4: SYN + - 5: FIN +- `window`: TODO + +### dns_message + +A DNS packet. + +``` +[ + optional bool is_complete, + uint id, + uint raw_dns_header, # TODO + optional nint count_bits, + optional uint qdcount, + optional uint ancount, + optional uint nscount, + optional uint arcount, + optional simple rr_bits, + optional [ + dns_question question, + ... + ], + optional [ + resource_record answer, + ... + ], + optional [ + resource_record authority, + ... + ], + optional [ + resource_record additional, + ... + ], + optional bytes malformed +] +``` + +- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists +- `id`: DNS identifier +- `raw_dns_header`: TODO +- `count_bits`: Bitmap indicating which counts are present, see `Negative Integers` and `Deduplication` + - Bit 0: qdcount present + - Bit 1: ancount present + - Bit 2: nscount present + - Bit 3: arcount present +- `qdcount`: Number of question records if different from the number of entries in `question` +- `ancount`: Number of answer resource records if different from the number of entries in `answer` +- `nscount`: Number of authority resource records if different from the number of entries in `authority` +- `arcount`: Number of additional resource records if different from the number of entries in `additional` +- `question`: The question records +- `answer`: The answer resource records +- `authority`: The authority resource records +- `additional`: The additional resource records +- `malformed`: Holds the bytes of the message that was not parsed + +### question + +A DNS question record. + +``` +[ + optional bool is_complete, + ( bytes | compressed_name | rindex ) qname, + optional uint qtype, + optional nint qclass +] +``` + +- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists +- `qname`: The QNAME as byte string, a name compression object or a reverse index, see `Deduplication` +- `qtype`: The QTYPE, see `Deduplication` +- `qclass`: The QCLASS, see `Negative Integers` and `Deduplication` + +### compressed_name + +An compressed name which has references to other labels within the same message. + +``` +[ + ( bytes label | uint label_index | nint offset | simple extension_bits ), + ... +] +``` + +- `label`: A byte string with a label part +- `label_index`: An index to the N byte string label in the message +- `offset`: The offset specified in the DNS message which could not be translated into a label index +- `extension_bits`: The extension bits if not 0b00 or 0b11 # TODO: add the extension bits + +### resource_record + +A DNS resource record. + +``` +[ + optional bool is_complete, + ( bytes | compressed_name | rindex ) name, + optional simple rr_bits, + optional uint type, + optional uint class, + optional uint ttl, + optional uint rdlength, + ( bytes | mixed_rdata ) rdata +] +``` + +- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists +- `name`: +- `rr_bits`: Bitmap indicating what is present, see `Deduplication` + - Bit 0: type + - Bit 1: class + - Bit 2: ttl + - Bit 3: rdlength # TODO: reverse index for TTL? +- `type`: The resource record type +- `class`: The resource record class +- `ttl`: The resource record ttl +- `rdlength`: The resource record rdata length +- `rdata`: The resource record data + +### mixed_rdata + +An array mixed with resource data and compressed names. + +``` +[ + ( bytes | compressed_name ) rdata_part, + ... +] +``` +- `rdata_part`: The parts of the resource records data + +## Stream Options + +Each option is specified here as OptionName(OptionNumber) and optional +OptionValue type. + +- `RLABELS(0) uint`: Indicates how many labels should be stored in the reverse label index before discarding them +- `RLABEL_MIN_SIZE(1) uint`: The minimum size a label must be to be put in the reverse label index +- `RDATA_RINDEX_SIZE(2) uint`: Indicates how many rdata should be stored in the reverse rdata index before discarding them +- `RDATA_RINDEX_MIN_SIZE(3) uint`: The minimum size a rdata must be to be put in the reverse rdata index +- `USE_RDATA_INDEX(4)`: If present then the stream uses rdata indexing +- `RDATA_INDEX_MIN_SIZE(5) uint`: The minimum size a rdata must be to be put in the rdata index + +## Deduplication + +Deduplication is done in a few different ways, data may be left out to +indicate that it is the same as the previous value, an index may be used to +indicate that it is the same as the N previous value and a reverse index +may be used to indicate that it is the N previous value looking backwards +across the stream. + +In other words, using the index deduplication you will need to build a table +of the values you come across during the decoding of the stream, this table +can grow very large. + +As an smaller alternative a reverse index can indicate often used data from +the N previous value looking back over the stream. This type of index also +reorder itself to try and put the most used data always in the index. + +TODO: details of each attribute and it's deduplication |