summaryrefslogtreecommitdiffstats
path: root/CBOR_DNS_STREAM.md
diff options
context:
space:
mode:
Diffstat (limited to 'CBOR_DNS_STREAM.md')
-rw-r--r--CBOR_DNS_STREAM.md399
1 files changed, 0 insertions, 399 deletions
diff --git a/CBOR_DNS_STREAM.md b/CBOR_DNS_STREAM.md
deleted file mode 100644
index a54dc63..0000000
--- a/CBOR_DNS_STREAM.md
+++ /dev/null
@@ -1,399 +0,0 @@
-# CBOR DNS Stream Format version 1 (CDSv1)
-
-This is an experimental format for representing DNS information in CBOR
-with the goals to:
-- Be able to stream the information
-- Support incomplete, broken and/or invalid DNS
-- Have close to no data quality and signature degradation
-- Support additional non-DNS meta data (such as ICMP/TCP attributes)
-
-## Overview
-
-In CBOR you are expected to have one root element, most likely an array or
-map. This format does not have a root element, instead you are expected to
-read one CBOR array element at a time as a stream of CBOR elements with the
-first array element being the stream initiator object.
-
-```
-[stream_init]
-[message]
-...
-[message]
-```
-
-Here are some number on the compression rate compared to PCAP:
-
-Uncompressed | PCAP | CDS | Factor
--------------|------------|-----------|-------
-client | 458373 | 133640 | 0,2915
-zonalizer | 51769844 | 9450475 | 0,1825
-large ditl | 1003931674 | 298167709 | 0,2970
-small ditl | 1651252 | 603314 | 0,3653
-
-Gzipped | PCAP | CDS | Factor | F/Uncompressed
--------------|------------|-----------|---------|---------------
-client | 108136 | 45944 | 0,4248 | 0,1002
-zonalizer | 12468329 | 2485620 | 0,1993 | 0,0480
-large ditl | 327227203 | 117569598 | 0,3592 | 0,1171
-small ditl | 539323 | 253402 | 0,4698 | 0,1534
-
-Xzipped | PCAP | CDS | Factor | F/Uncompressed
--------------|------------|-----------|---------|---------------
-client | 76248 | 36308 | 0,4761 | 0,0792
-zonalizer | 7894356 | 1695920 | 0,2148 | 0,0327
-large ditl | 267031412 | 86747604 | 0,3248 | 0,0864
-small ditl | 442260 | 206596 | 0,4671 | 0,1251
-
-- `client` is a couple of hours of DNS from my workstation
-- `zonalizer` is half a day from [Zonalizer](https://zonalizer.makeinstall.se) which continuously tests gTLDs
-- `large ditl`, `small ditl` are capture from [DITL](https://www.dns-oarc.net/oarc/data/ditl)
-
-## Types
-
-- `int`: A CBOR integer (major type 0x00)
-- `uint`: A CBOR integer (value >= 0, major type 0x00)
-- `nint`: A CBOR negative integer (value < 0, major type 0x00), this type has special meaning see `Negative Integers`
-- `simple`: A CBOR simple value (major type 0xe0)
-- `bytes`: A CBOR byte string (major type 0x40)
-- `string`: A CBOR UTF-8 string (major type 0x60)
-- `any`: Any CBOR value
-- `bool`: A CBOR boolean
-- `rindex`: A CBOR negative integer that is a reverse index, see `Deduplication`
-
-## Special Keywords
-
-- `union`: Can be used to merge the given array or map into the current object
-- `optional`: The attribute or object reference is optional
-
-## Negative Integers
-
-CBOR encodes negative numbers in a special way and this format uses that
-for none negative number to tell them apart.
-
-Because of that, all negative numbers needs special decoding:
-
-```
-value = -value - 1
-```
-
-## Objects
-
-The object code below uses:
-- `[` and `]` to indicate the start and end of an array
-- `type name` per object attribute
-- `name` per object reference
-- `...` to indicate a list of previous definition
-- `(`, `|` and `)` to indicate list of various types that the attribute can be
-
-### stream_init
-
-The initial object in the stream.
-
-```
-[
- string version,
- union stream_option option,
- ...
-]
-```
-
-- `version`: The version of the format
-- `option`: A list of stream option objects
-
-### stream_option
-
-A stream option that can specify critical information about the stream and
-how it should be decoded, see `Stream Options` for more information.
-
-```
-[
- uint option_type,
- optional any option_value
-]
-```
-
-- `option_type`: The type of option represented as a number
-- `option_value`: The option value
-
-### message
-
-A message object that describes various DNS packets or other information.
-
-```
-[
- optional bool is_complete,
- union timestamp timestamp,
- simple message_bits,
- union ip_header ip_header,
- union ( icmp_message | udp_message | tcp_message | dns_message ) content
-]
-```
-
-- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists
-- `timestamp`: A timestamp object
-- `message_bits`: Bitmap indicating message content
- - Bit 0: 0=Not DNS 1=DNS
- - Bit 1: if DNS: 0=UDP 1=TCP else: 0=ICMP/ICMPv6 1=TCP
- - Bit 2: Fragmented (0=no 1=yes)
- - Bit 3: Malformed (0=no 1=yes)
-- `ip_header`: An IP header object
-- `content`: The message content, may be an ICMP, UDP, TCP or DNS message object
-
-### timestamp
-
-The timestamp object of a message.
-
-```
-[
- ( uint seconds | nint diff_from_last ),
- optional uint useconds
- optional uint nseconds
-]
-```
-
-- `seconds`: The seconds of a UNIX timestamp
-- `diff_from_last`: The differentially from last `timestamp.seconds`
-- `useconds`: The microseconds of a UNIX timestamp or if `diff_from_last` is used it will be the differentially from last `timestamp.useconds`
-- `nseconds`: The nanoseconds of a UNIX timestamp or if `diff_from_last` is used it will be the differentially from last `timestamp.nseconds`
-
-### ip_header
-
-The IP header of a message.
-
-```
-[
- ( uint | nint ) ip_bits,
- optional bytes src_addr,
- optional bytes dest_addr,
- optional ( uint | nint ) src_dest_port
-]
-```
-
-- `ip_bits`: Bitmap indicating IP header content, if the type is `nint` it also indicates that it is a reverse from last, see `Deduplication` for more information
- - Bit 0: address family (0=AF_INET, 1=AF_INET6)
- - Bit 1: src_addr present
- - Bit 2: dest_addr present
- - Bit 3: port present
-- `src_addr`: The source address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6
-- `dest_addr`: The destination address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6
-- `src_dest_port`: A combined source and destination port, see `Source And Destination Port`
-
-#### Source And Destination Port
-
-The source and destination port are combined into one value. If both source
-and destination exists then the value is larger then 65535, the destination
-will be the high 16 bits and source the low otherwise it will only be the
-source. If the value is negative then only the destination exists.
-
-```
-if value > 0xffff then
- src_port = value & 0xffff
- dest_port = value >> 16
-else if value < 0 then
- dest_port = -value - 1
-else
- src_port = value
-```
-
-### icmp_message
-
-`if ip_header.ip_bits.1=0 && ip_header.ip_bits.2=0`
-
-```
-[
- uint type,
- uint code
-]
-```
-
-- `type`: TODO
-- `code`: TODO
-
-### udp_message
-
-`if ip_header.ip_bits.1=1 && ip_header.ip_bits.2=0`
-
-TODO
-
-### tcp_message
-
-`if ip_header.ip_bits.2=1`
-
-```
-[
- uint seq_nr,
- uint ack_nr,
- uint tcp_bits,
- uint window
-]
-```
-
-- `seq_nr`: TODO
-- `ack_nr`: TODO
-- `tcp_bits`: TODO
- - 0: URG
- - 1: ACK
- - 2: PSH
- - 3: RST
- - 4: SYN
- - 5: FIN
-- `window`: TODO
-
-### dns_message
-
-A DNS packet.
-
-```
-[
- optional bool is_complete,
- uint id,
- uint raw_dns_header, # TODO
- optional nint count_bits,
- optional uint qdcount,
- optional uint ancount,
- optional uint nscount,
- optional uint arcount,
- optional simple rr_bits,
- optional [
- dns_question question,
- ...
- ],
- optional [
- resource_record answer,
- ...
- ],
- optional [
- resource_record authority,
- ...
- ],
- optional [
- resource_record additional,
- ...
- ],
- optional bytes malformed
-]
-```
-
-- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists
-- `id`: DNS identifier
-- `raw_dns_header`: TODO
-- `count_bits`: Bitmap indicating which counts are present, see `Negative Integers` and `Deduplication`
- - Bit 0: qdcount present
- - Bit 1: ancount present
- - Bit 2: nscount present
- - Bit 3: arcount present
-- `qdcount`: Number of question records if different from the number of entries in `question`
-- `ancount`: Number of answer resource records if different from the number of entries in `answer`
-- `nscount`: Number of authority resource records if different from the number of entries in `authority`
-- `arcount`: Number of additional resource records if different from the number of entries in `additional`
-- `question`: The question records
-- `answer`: The answer resource records
-- `authority`: The authority resource records
-- `additional`: The additional resource records
-- `malformed`: Holds the bytes of the message that was not parsed
-
-### question
-
-A DNS question record.
-
-```
-[
- optional bool is_complete,
- ( bytes | compressed_name | rindex ) qname,
- optional uint qtype,
- optional nint qclass
-]
-```
-
-- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists
-- `qname`: The QNAME as byte string, a name compression object or a reverse index, see `Deduplication`
-- `qtype`: The QTYPE, see `Deduplication`
-- `qclass`: The QCLASS, see `Negative Integers` and `Deduplication`
-
-### compressed_name
-
-An compressed name which has references to other labels within the same message.
-
-```
-[
- ( bytes label | uint label_index | nint offset | simple extension_bits ),
- ...
-]
-```
-
-- `label`: A byte string with a label part
-- `label_index`: An index to the N byte string label in the message
-- `offset`: The offset specified in the DNS message which could not be translated into a label index
-- `extension_bits`: The extension bits if not 0b00 or 0b11 # TODO: add the extension bits
-
-### resource_record
-
-A DNS resource record.
-
-```
-[
- optional bool is_complete,
- ( bytes | compressed_name | rindex ) name,
- optional simple rr_bits,
- optional uint type,
- optional uint class,
- optional uint ttl,
- optional uint rdlength,
- ( bytes | mixed_rdata ) rdata
-]
-```
-
-- `is_complete`: Will exist and be false if the message is not complete and following attributes may not exists
-- `name`:
-- `rr_bits`: Bitmap indicating what is present, see `Deduplication`
- - Bit 0: type
- - Bit 1: class
- - Bit 2: ttl
- - Bit 3: rdlength # TODO: reverse index for TTL?
-- `type`: The resource record type
-- `class`: The resource record class
-- `ttl`: The resource record ttl
-- `rdlength`: The resource record rdata length
-- `rdata`: The resource record data
-
-### mixed_rdata
-
-An array mixed with resource data and compressed names.
-
-```
-[
- ( bytes | compressed_name ) rdata_part,
- ...
-]
-```
-- `rdata_part`: The parts of the resource records data
-
-## Stream Options
-
-Each option is specified here as OptionName(OptionNumber) and optional
-OptionValue type.
-
-- `RLABELS(0) uint`: Indicates how many labels should be stored in the reverse label index before discarding them
-- `RLABEL_MIN_SIZE(1) uint`: The minimum size a label must be to be put in the reverse label index
-- `RDATA_RINDEX_SIZE(2) uint`: Indicates how many rdata should be stored in the reverse rdata index before discarding them
-- `RDATA_RINDEX_MIN_SIZE(3) uint`: The minimum size a rdata must be to be put in the reverse rdata index
-- `USE_RDATA_INDEX(4)`: If present then the stream uses rdata indexing
-- `RDATA_INDEX_MIN_SIZE(5) uint`: The minimum size a rdata must be to be put in the rdata index
-
-## Deduplication
-
-Deduplication is done in a few different ways, data may be left out to
-indicate that it is the same as the previous value, an index may be used to
-indicate that it is the same as the N previous value and a reverse index
-may be used to indicate that it is the N previous value looking backwards
-across the stream.
-
-In other words, using the index deduplication you will need to build a table
-of the values you come across during the decoding of the stream, this table
-can grow very large.
-
-As an smaller alternative a reverse index can indicate often used data from
-the N previous value looking back over the stream. This type of index also
-reorder itself to try and put the most used data always in the index.
-
-TODO: details of each attribute and it's deduplication