# CBOR DNS Stream Format version 1 (CDSv1)

This is an experimental format for representing DNS information in CBOR with the goals to:

- Be able to stream the information
- Support incomplete, broken and/or invalid DNS
- Have close to no data quality and signature degradation
- Support additional non-DNS meta data (such as ICMP/TCP attributes)

## Overview

In CBOR you are expected to have one root element, most likely an array or map. This format does not have a root element; instead you are expected to read one CBOR array element at a time as a stream of CBOR elements, with the first array element being the stream initiator object.

```
[stream_init]
[message]
...
[message]
```

Here are some numbers on the compression rate compared to PCAP:

Uncompressed | PCAP       | CDS       | Factor
-------------|------------|-----------|-------
client       | 458373     | 133640    | 0.2915
zonalizer    | 51769844   | 9450475   | 0.1825
large ditl   | 1003931674 | 298167709 | 0.2970
small ditl   | 1651252    | 603314    | 0.3653

Gzipped      | PCAP       | CDS       | Factor  | F/Uncompressed
-------------|------------|-----------|---------|---------------
client       | 108136     | 45944     | 0.4248  | 0.1002
zonalizer    | 12468329   | 2485620   | 0.1993  | 0.0480
large ditl   | 327227203  | 117569598 | 0.3592  | 0.1171
small ditl   | 539323     | 253402    | 0.4698  | 0.1534

Xzipped      | PCAP       | CDS       | Factor  | F/Uncompressed
-------------|------------|-----------|---------|---------------
client       | 76248      | 36308     | 0.4761  | 0.0792
zonalizer    | 7894356    | 1695920   | 0.2148  | 0.0327
large ditl   | 267031412  | 86747604  | 0.3248  | 0.0864
small ditl   | 442260     | 206596    | 0.4671  | 0.1251

- `client` is a couple of hours of DNS from my workstation
- `zonalizer` is half a day from [Zonalizer](https://zonalizer.makeinstall.se) which continuously tests gTLDs
- `large ditl`, `small ditl` are captures from [DITL](https://www.dns-oarc.net/oarc/data/ditl)

## Types

- `int`: A CBOR integer (major type 0x00 or 0x20)
- `uint`: A CBOR integer (value >= 0, major type 0x00)
- `nint`: A CBOR negative integer (value < 0, major type 0x20), this type has special meaning, see `Negative Integers`
- `simple`: A CBOR simple value (major type 0xe0)
- `bytes`: A CBOR byte string (major type 0x40)
- `string`: A CBOR UTF-8 string (major type 0x60)
- `any`: Any CBOR value
- `bool`: A CBOR boolean
- `rindex`: A CBOR negative integer that is a reverse index, see `Deduplication`

## Special Keywords

- `union`: Can be used to merge the given array or map into the current object
- `optional`: The attribute or object reference is optional

## Negative Integers

CBOR encodes negative numbers in a special way, and this format uses that encoding for non-negative numbers in order to tell them apart. Because of that, all negative numbers need special decoding:

```
value = -value - 1
```

## Objects

The object code below uses:

- `[` and `]` to indicate the start and end of an array
- `type name` per object attribute
- `name` per object reference
- `...` to indicate a list of the previous definition
- `(`, `|` and `)` to indicate a list of the types that the attribute can be

### stream_init

The initial object in the stream.

```
[
  string version,
  union stream_option option,
  ...
]
```

- `version`: The version of the format
- `option`: A list of stream option objects

### stream_option

A stream option that can specify critical information about the stream and how it should be decoded, see `Stream Options` for more information.

```
[
  uint option_type,
  optional any option_value
]
```

- `option_type`: The type of option represented as a number
- `option_value`: The option value

### message

A message object that describes various DNS packets or other information.
```
[
  optional bool is_complete,
  union timestamp timestamp,
  simple message_bits,
  union ip_header ip_header,
  union ( icmp_message | udp_message | tcp_message | dns_message ) content
]
```

- `is_complete`: Will exist and be false if the message is not complete, in which case the following attributes may not exist
- `timestamp`: A timestamp object
- `message_bits`: Bitmap indicating message content
  - Bit 0: 0=Not DNS 1=DNS
  - Bit 1: if DNS: 0=UDP 1=TCP else: 0=ICMP/ICMPv6 1=TCP
  - Bit 2: Fragmented (0=no 1=yes)
  - Bit 3: Malformed (0=no 1=yes)
- `ip_header`: An IP header object
- `content`: The message content, may be an ICMP, UDP, TCP or DNS message object

### timestamp

The timestamp object of a message.

```
[
  ( uint seconds | nint diff_from_last ),
  optional uint useconds,
  optional uint nseconds
]
```

- `seconds`: The seconds of a UNIX timestamp
- `diff_from_last`: The difference from the last `timestamp.seconds`
- `useconds`: The microseconds of a UNIX timestamp or, if `diff_from_last` is used, the difference from the last `timestamp.useconds`
- `nseconds`: The nanoseconds of a UNIX timestamp or, if `diff_from_last` is used, the difference from the last `timestamp.nseconds`

### ip_header

The IP header of a message.
```
[
  ( uint | nint ) ip_bits,
  optional bytes src_addr,
  optional bytes dest_addr,
  optional ( uint | nint ) src_dest_port
]
```

- `ip_bits`: Bitmap indicating IP header content, if the type is `nint` it also indicates that it is a reverse from last, see `Deduplication` for more information
  - Bit 0: address family (0=AF_INET, 1=AF_INET6)
  - Bit 1: src_addr present
  - Bit 2: dest_addr present
  - Bit 3: port present
- `src_addr`: The source address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6
- `dest_addr`: The destination address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6
- `src_dest_port`: A combined source and destination port, see `Source And Destination Port`

#### Source And Destination Port

The source and destination port are combined into one value. If both source and destination exist then the value is larger than 65535, with the destination in the high 16 bits and the source in the low 16 bits; otherwise it will only be the source. If the value is negative then only the destination exists.

```
if value > 0xffff then
  src_port = value & 0xffff
  dest_port = value >> 16
else if value < 0 then
  dest_port = -value - 1
else
  src_port = value
```

### icmp_message

`if ip_header.ip_bits.1=0 && ip_header.ip_bits.2=0`

```
[
  uint type,
  uint code
]
```

- `type`: TODO
- `code`: TODO

### udp_message

`if ip_header.ip_bits.1=1 && ip_header.ip_bits.2=0`

TODO

### tcp_message

`if ip_header.ip_bits.2=1`

```
[
  uint seq_nr,
  uint ack_nr,
  uint tcp_bits,
  uint window
]
```

- `seq_nr`: TODO
- `ack_nr`: TODO
- `tcp_bits`: TODO
  - 0: URG
  - 1: ACK
  - 2: PSH
  - 3: RST
  - 4: SYN
  - 5: FIN
- `window`: TODO

### dns_message

A DNS packet.

```
[
  optional bool is_complete,
  uint id,
  uint raw_dns_header, # TODO
  optional nint count_bits,
  optional uint qdcount,
  optional uint ancount,
  optional uint nscount,
  optional uint arcount,
  optional simple rr_bits,
  optional [
    dns_question question,
    ...
  ],
  optional [
    resource_record answer,
    ...
  ],
  optional [
    resource_record authority,
    ...
  ],
  optional [
    resource_record additional,
    ...
  ],
  optional bytes malformed
]
```

- `is_complete`: Will exist and be false if the message is not complete, in which case the following attributes may not exist
- `id`: DNS identifier
- `raw_dns_header`: TODO
- `count_bits`: Bitmap indicating which counts are present, see `Negative Integers` and `Deduplication`
  - Bit 0: qdcount present
  - Bit 1: ancount present
  - Bit 2: nscount present
  - Bit 3: arcount present
- `qdcount`: Number of question records if different from the number of entries in `question`
- `ancount`: Number of answer resource records if different from the number of entries in `answer`
- `nscount`: Number of authority resource records if different from the number of entries in `authority`
- `arcount`: Number of additional resource records if different from the number of entries in `additional`
- `question`: The question records
- `answer`: The answer resource records
- `authority`: The authority resource records
- `additional`: The additional resource records
- `malformed`: Holds the bytes of the message that were not parsed

### question

A DNS question record.

```
[
  optional bool is_complete,
  ( bytes | compressed_name | rindex ) qname,
  optional uint qtype,
  optional nint qclass
]
```

- `is_complete`: Will exist and be false if the message is not complete, in which case the following attributes may not exist
- `qname`: The QNAME as byte string, a name compression object or a reverse index, see `Deduplication`
- `qtype`: The QTYPE, see `Deduplication`
- `qclass`: The QCLASS, see `Negative Integers` and `Deduplication`

### compressed_name

A compressed name which has references to other labels within the same message.

```
[
  ( bytes label | uint label_index | nint offset | simple extension_bits ),
  ...
]
```

- `label`: A byte string with a label part
- `label_index`: An index to the Nth byte string label in the message
- `offset`: The offset specified in the DNS message which could not be translated into a label index
- `extension_bits`: The extension bits if not 0b00 or 0b11 # TODO: add the extension bits

### resource_record

A DNS resource record.

```
[
  optional bool is_complete,
  ( bytes | compressed_name | rindex ) name,
  optional simple rr_bits,
  optional uint type,
  optional uint class,
  optional uint ttl,
  optional uint rdlength,
  ( bytes | mixed_rdata ) rdata
]
```

- `is_complete`: Will exist and be false if the message is not complete, in which case the following attributes may not exist
- `name`: The NAME as byte string, a name compression object or a reverse index, see `Deduplication`
- `rr_bits`: Bitmap indicating what is present, see `Deduplication`
  - Bit 0: type
  - Bit 1: class
  - Bit 2: ttl
  - Bit 3: rdlength # TODO: reverse index for TTL?
- `type`: The resource record type
- `class`: The resource record class
- `ttl`: The resource record ttl
- `rdlength`: The resource record rdata length
- `rdata`: The resource record data

### mixed_rdata

An array mixed with resource data and compressed names.

```
[
  ( bytes | compressed_name ) rdata_part,
  ...
]
```

- `rdata_part`: The parts of the resource record's data

## Stream Options

Each option is specified here as OptionName(OptionNumber), followed by an optional OptionValue type.
- `RLABELS(0) uint`: Indicates how many labels should be stored in the reverse label index before discarding them
- `RLABEL_MIN_SIZE(1) uint`: The minimum size a label must be to be put in the reverse label index
- `RDATA_RINDEX_SIZE(2) uint`: Indicates how many rdata should be stored in the reverse rdata index before discarding them
- `RDATA_RINDEX_MIN_SIZE(3) uint`: The minimum size an rdata must be to be put in the reverse rdata index
- `USE_RDATA_INDEX(4)`: If present then the stream uses rdata indexing
- `RDATA_INDEX_MIN_SIZE(5) uint`: The minimum size an rdata must be to be put in the rdata index

## Deduplication

Deduplication is done in a few different ways: data may be left out to indicate that it is the same as the previous value, an index may be used to indicate that it is the same as the Nth previous value, and a reverse index may be used to indicate that it is the Nth previous value looking backwards across the stream.

In other words, using the index deduplication you will need to build a table of the values you come across during the decoding of the stream, and this table can grow very large. As a smaller alternative, a reverse index can indicate often used data from the Nth previous value looking back over the stream. This type of index also reorders itself to try to always keep the most used data in the index.

TODO: details of each attribute and its deduplication
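As an illustration of the decoding rules given in `Negative Integers`, `Source And Destination Port` and the `message_bits` bitmap, here is a minimal Python sketch. It is not part of the format: the function names are hypothetical, it assumes the CBOR library has already turned each integer (and the `message_bits` simple value) into a plain Python `int`, and it assumes that "Bit 0" means the least significant bit.

```python
from typing import Optional, Tuple


def decode_nint(value: int) -> int:
    """Apply the `Negative Integers` rule: value = -value - 1."""
    if value >= 0:
        raise ValueError("expected a negative integer")
    return -value - 1


def decode_ports(value: int) -> Tuple[Optional[int], Optional[int]]:
    """Split `src_dest_port` into (src_port, dest_port); None means absent."""
    if value > 0xFFFF:
        # Both present: destination in the high 16 bits, source in the low.
        return value & 0xFFFF, value >> 16
    if value < 0:
        # Only the destination is present, stored as a negative integer.
        return None, decode_nint(value)
    # Only the source is present.
    return value, None


def decode_message_bits(bits: int) -> dict:
    """Expand `message_bits` (assumption: bit 0 is the least significant bit)."""
    is_dns = bool(bits & 0x1)
    if bits & 0x2:
        transport = "tcp"
    else:
        transport = "udp" if is_dns else "icmp"
    return {
        "dns": is_dns,
        "transport": transport,
        "fragmented": bool(bits & 0x4),
        "malformed": bool(bits & 0x8),
    }


if __name__ == "__main__":
    print(decode_nint(-1))                   # 0
    print(decode_ports((53 << 16) | 12345))  # (12345, 53)
    print(decode_ports(-54))                 # (None, 53)
    print(decode_message_bits(0b0011))
```

Returning `None` for an absent port keeps "not present" distinct from port 0, which the combined value itself cannot always express: a destination port of 0 together with a source port would be read back as source only.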