
RFC 8618

Compacted-DNS (C-DNS): A Format for DNS Packet Capture

Pages: 79
Proposed Standard
Part 1 of 4 – Pages 1 to 17

Top   ToC   RFC8618 - Page 1
Internet Engineering Task Force (IETF)                      J. Dickinson
Request for Comments: 8618                                      J. Hague
Category: Standards Track                                   S. Dickinson
ISSN: 2070-1721                                               Sinodun IT
                                                            T. Manderson
                                                                   ICANN
                                                                 J. Bond
                                              Wikimedia Foundation, Inc.
                                                          September 2019


         Compacted-DNS (C-DNS): A Format for DNS Packet Capture

Abstract

   This document describes a data representation for collections of DNS
   messages.  The format is designed for efficient storage and
   transmission of large packet captures of DNS traffic; it attempts to
   minimize the size of such packet capture files but retain the full
   DNS message contents along with the most useful transport metadata.
   It is intended to assist with the development of DNS
   traffic-monitoring applications.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8618.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   4
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   5
   3.  Data Collection Use Cases . . . . . . . . . . . . . . . . . .   5
   4.  Design Considerations . . . . . . . . . . . . . . . . . . . .   8
   5.  Choice of CBOR  . . . . . . . . . . . . . . . . . . . . . . .  10
   6.  C-DNS Format Conceptual Overview  . . . . . . . . . . . . . .  10
     6.1.  Block Parameters  . . . . . . . . . . . . . . . . . . . .  14
     6.2.  Storage Parameters  . . . . . . . . . . . . . . . . . . .  14
       6.2.1.  Optional Data Items . . . . . . . . . . . . . . . . .  15
       6.2.2.  Optional RRs and OPCODEs  . . . . . . . . . . . . . .  16
       6.2.3.  Storage Flags . . . . . . . . . . . . . . . . . . . .  17
       6.2.4.  IP Address Storage  . . . . . . . . . . . . . . . . .  17
   7.  C-DNS Format Detailed Description . . . . . . . . . . . . . .  18
     7.1.  Map Quantities and Indexes  . . . . . . . . . . . . . . .  18
     7.2.  Tabular Representation  . . . . . . . . . . . . . . . . .  18
     7.3.  "File"  . . . . . . . . . . . . . . . . . . . . . . . . .  19
       7.3.1.  "FilePreamble"  . . . . . . . . . . . . . . . . . . .  20
         7.3.1.1.  "BlockParameters" . . . . . . . . . . . . . . . .  20
           7.3.1.1.1.  "StorageParameters" . . . . . . . . . . . . .  21
             7.3.1.1.1.1.  "StorageHints"  . . . . . . . . . . . . .  22
           7.3.1.1.2.  "CollectionParameters"  . . . . . . . . . . .  24
       7.3.2.  "Block" . . . . . . . . . . . . . . . . . . . . . . .  25
         7.3.2.1.  "BlockPreamble" . . . . . . . . . . . . . . . . .  26
         7.3.2.2.  "BlockStatistics" . . . . . . . . . . . . . . . .  27
         7.3.2.3.  "BlockTables" . . . . . . . . . . . . . . . . . .  28
           7.3.2.3.1.  "ClassType" . . . . . . . . . . . . . . . . .  29
           7.3.2.3.2.  "QueryResponseSignature"  . . . . . . . . . .  30
           7.3.2.3.3.  "Question"  . . . . . . . . . . . . . . . . .  33
           7.3.2.3.4.  "RR"  . . . . . . . . . . . . . . . . . . . .  34
           7.3.2.3.5.  "MalformedMessageData"  . . . . . . . . . . .  34
         7.3.2.4.  "QueryResponse" . . . . . . . . . . . . . . . . .  35
           7.3.2.4.1.  "ResponseProcessingData"  . . . . . . . . . .  36
           7.3.2.4.2.  "QueryResponseExtended" . . . . . . . . . . .  37
         7.3.2.5.  "AddressEventCount" . . . . . . . . . . . . . . .  38
         7.3.2.6.  "MalformedMessage"  . . . . . . . . . . . . . . .  39
   8.  Versioning  . . . . . . . . . . . . . . . . . . . . . . . . .  39
   9.  C-DNS to PCAP . . . . . . . . . . . . . . . . . . . . . . . .  40
     9.1.  Name Compression  . . . . . . . . . . . . . . . . . . . .  42
   10. Data Collection . . . . . . . . . . . . . . . . . . . . . . .  42
     10.1.  Matching Algorithm . . . . . . . . . . . . . . . . . . .  43
     10.2.  Message Identifiers  . . . . . . . . . . . . . . . . . .  45
       10.2.1.  Primary ID (Required)  . . . . . . . . . . . . . . .  45
       10.2.2.  Secondary ID (Optional)  . . . . . . . . . . . . . .  46
     10.3.  Algorithm Parameters . . . . . . . . . . . . . . . . . .  46
     10.4.  Algorithm Requirements . . . . . . . . . . . . . . . . .  46
     10.5.  Algorithm Limitations  . . . . . . . . . . . . . . . . .  47
     10.6.  Workspace  . . . . . . . . . . . . . . . . . . . . . . .  47
     10.7.  Output . . . . . . . . . . . . . . . . . . . . . . . . .  47
     10.8.  Post-Processing  . . . . . . . . . . . . . . . . . . . .  47
   11. Implementation Guidance . . . . . . . . . . . . . . . . . . .  47
     11.1.  Optional Data  . . . . . . . . . . . . . . . . . . . . .  48
     11.2.  Trailing Bytes . . . . . . . . . . . . . . . . . . . . .  48
     11.3.  Limiting Collection of RDATA . . . . . . . . . . . . . .  49
     11.4.  Timestamps . . . . . . . . . . . . . . . . . . . . . . .  49
   12. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  49
     12.1.  Transport Types  . . . . . . . . . . . . . . . . . . . .  49
     12.2.  Data Storage Flags . . . . . . . . . . . . . . . . . . .  50
     12.3.  Response-Processing Flags  . . . . . . . . . . . . . . .  51
     12.4.  AddressEvent Types . . . . . . . . . . . . . . . . . . .  51
   13. Security Considerations . . . . . . . . . . . . . . . . . . .  52
   14. Privacy Considerations  . . . . . . . . . . . . . . . . . . .  52
   15. References  . . . . . . . . . . . . . . . . . . . . . . . . .  53
     15.1.  Normative References . . . . . . . . . . . . . . . . . .  53
     15.2.  Informative References . . . . . . . . . . . . . . . . .  55
   Appendix A.  CDDL . . . . . . . . . . . . . . . . . . . . . . . .  58
   Appendix B.  DNS Name Compression Example . . . . . . . . . . . .  69
     B.1.  NSD Compression Algorithm . . . . . . . . . . . . . . . .  70
     B.2.  Knot Authoritative Compression Algorithm  . . . . . . . .  70
     B.3.  Observed Differences  . . . . . . . . . . . . . . . . . .  71
   Appendix C.  Comparison of Binary Formats . . . . . . . . . . . .  71
     C.1.  Comparison with Full PCAP Files . . . . . . . . . . . . .  74
     C.2.  Simple versus Block Coding  . . . . . . . . . . . . . . .  74
     C.3.  Binary versus Text Formats  . . . . . . . . . . . . . . .  75
     C.4.  Performance . . . . . . . . . . . . . . . . . . . . . . .  75
     C.5.  Conclusions . . . . . . . . . . . . . . . . . . . . . . .  75
     C.6.  Block Size Choice . . . . . . . . . . . . . . . . . . . .  76
   Appendix D.  Data Fields for Traffic Regeneration . . . . . . . .  77
     D.1.  Recommended Fields for Traffic Regeneration . . . . . . .  77
     D.2.  Issues with Small Data Captures . . . . . . . . . . . . .  77
   Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . . .  78
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  79

1. Introduction

   There has long been a need for server operators to collect DNS
   Queries and Responses on authoritative and recursive name servers for
   monitoring and analysis.  This data is used in a number of ways,
   including traffic monitoring, analyzing network attacks, and "day in
   the life" (DITL) [ditl] analysis.

   A wide variety of tools already exist that facilitate the collection
   of DNS traffic data, such as the DNS Statistics Collector (DSC)
   [dsc], packetq [packetq], dnscap [dnscap], and dnstap [dnstap].
   However, there is no standard exchange format for large DNS packet
   captures.  The PCAP ("packet capture") [pcap] format or the PCAP Next
   Generation (PCAP-NG) [pcapng] format is typically used in practice
   for packet captures, but these file formats can contain a great deal
   of additional information that is not directly pertinent to DNS
   traffic analysis and thus unnecessarily increases the capture file
   size.  Additionally, these tools and formats typically have no filter
   mechanism to selectively record only certain fields at capture time,
   requiring post-processing for anonymization or pseudonymization of
   data to protect user privacy.

   There has also been work on using text-based formats to describe DNS
   packets (for example, see [dnsxml] and [RFC8427]), but this work is
   largely aimed at producing convenient representations of single
   messages.

   Many DNS operators may receive hundreds of thousands of Queries per
   second on a single name server instance, so a mechanism to minimize
   the storage and transmission size (and therefore upload overhead) of
   the data collected is highly desirable.

   The format described in this document, C-DNS (Compacted-DNS), focuses
   on the problem of capturing and storing large packet capture files of
   DNS traffic with the following goals in mind:

   o  Minimize the file size for storage and transmission.

   o  Minimize the overhead of producing the packet capture file and the
      cost of any further (general-purpose) compression of the file.

   This document contains:

   o  A discussion of some common use cases in which DNS data is
      collected; see Section 3.

   o  A discussion of the major design considerations in developing an
      efficient data representation for collections of DNS messages; see
      Section 4.

   o  A description of why the Concise Binary Object Representation
      (CBOR) [RFC7049] was chosen for this format; see Section 5.

   o  A conceptual overview of the C-DNS format; see Section 6.

   o  The definition of the C-DNS format for the collection of DNS
      messages; see Section 7.

   o  Notes on converting C-DNS data to PCAP format; see Section 9.

   o  Some high-level implementation considerations for applications
      designed to produce C-DNS; see Section 10.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   "Packet" refers to an individual IPv4 or IPv6 packet.  Typically,
   packets are UDP datagrams, but such packets may also be part of a TCP
   data stream.  "Message", unless otherwise qualified, refers to a DNS
   payload extracted from a UDP datagram or a TCP data stream.

   The parts of DNS messages are named as they are in [RFC1035].
   Specifically, the DNS message has five sections: Header, Question,
   Answer, Authority, and Additional.

3. Data Collection Use Cases

From a purely server operator perspective, collecting full packet captures of all packets going into or out of a name server provides the most comprehensive picture of network activity. However, there are several design choices or other limitations that are common to many DNS installations and operators.
   o  DNS servers are hosted in a variety of situations:

      *  Self-hosted servers

      *  Third-party hosting (including multiple third parties)

      *  Third-party hardware (including multiple third parties)

   o  Data is collected under different conditions:

      *  On well-provisioned servers running in a steady state

      *  On heavily loaded servers

      *  On virtualized servers

      *  On servers that are under DoS attack

      *  On servers that are unwitting intermediaries in DoS attacks

   o  Traffic can be collected via a variety of mechanisms:

      *  Within the name server implementation itself

      *  On the same hardware as the name server itself

      *  Using a network tap on an adjacent host to listen to DNS
         traffic

      *  Using port mirroring to listen from another host

   o  The capabilities of data collection (and upload) networks vary:

      *  Out-of-band networks with the same capacity as the in-band
         network

      *  Out-of-band networks with less capacity than the in-band
         network

      *  Everything being on the in-band network

   Thus, there is a wide range of use cases, from very limited data
   collection environments (third-party hardware, servers that are under
   attack, packet capture on the name server itself and no out-of-band
   network) to "limitless" environments (self-hosted, well-provisioned
   servers, using a network tap or port mirroring with out-of-band
   networks with the same capacity as the in-band network).  In the
   former case, it is infeasible to reliably collect full packet
   captures, especially if the server is under attack.  In the latter
   case, collection of full packet captures may be reasonable.

   As a result of these restrictions, the C-DNS data format is designed
   with the most limited use case in mind, such that:

   o  Data collection will occur on the same hardware as the name server
      itself

   o  Collected data will be stored on the same hardware as the name
      server itself, at least temporarily

   o  Collected data being returned to some central analysis system will
      use the same network interface as the DNS Queries and Responses

   o  There can be multiple third-party servers involved

   Because of these considerations, a major factor in the design of the
   format is minimal storage size of the capture files.

   Another significant consideration for any application that records
   DNS traffic is that the running of the name server software and the
   transmission of DNS Queries and Responses are the most important jobs
   of a name server; capturing data is not.  Any data collection system
   co-located with the name server needs to be intelligent enough to
   carefully manage its CPU, disk, memory, and network utilization.
   This leads to designing a format that requires a relatively low
   overhead to produce and minimizes the requirement for further
   potentially costly compression.

   However, it is also essential that interoperability with less
   restricted infrastructure is maintained.  In particular, it is highly
   desirable that the collection format should facilitate the
   re-creation of common formats (such as PCAP) that are as close to the
   original as is realistic, given the restrictions above.

4. Design Considerations

   This section presents some of the major design considerations used in
   the development of the C-DNS format.

   1.  The basic unit of data is a combined DNS Query and the associated
       Response (a "Query/Response (Q/R) data item").  The same
       structure will be used for unmatched Queries and Responses.
       Queries without Responses will be captured omitting the Response
       data.  Responses without Queries will be captured omitting the
       Query data (but using the Question section from the Response, if
       present, as an identifying QNAME).

       *  Rationale: A Query and the associated Response represent the
          basic level of a client's interaction with the server.  Also,
          combining the Query and Response into one item often reduces
          storage requirements due to commonality in the data of the two
          messages.

       In the context of generating a C-DNS file, it is assumed that
       only those DNS payloads that can be parsed to produce a
       well-formed DNS message are stored in the structured
       Query/Response data items of the C-DNS format and that all other
       messages will (optionally) be recorded as separate malformed
       messages.  Parsing a well-formed message means, at a minimum, the
       following:

       *  The packet has a well-formed 12-byte DNS Header with a
          recognized OPCODE.

       *  The section counts are consistent with the section contents.

       *  All of the Resource Records (RRs) can be fully parsed.

   2.  All top-level fields in each Query/Response data item will be
       optional.

       *  Rationale: Different operators will have different
          requirements for data to be available for analysis.  Operators
          with minimal requirements should not have to pay the cost of
          recording full data, though this will limit the ability to
          perform certain kinds of data analysis and also to reconstruct
          packet captures.  For example, omitting the RRs from a
          Response will reduce the C-DNS file size; in principle,
          Responses can be synthesized if there is enough context.

       *  Operators may have different policies for collecting user data
          and can choose to omit or anonymize certain fields at capture
          time, e.g., client address.

   3.  Multiple Query/Response data items will be collected into blocks
       in the format.  Common data in a block will be abstracted and
       referenced from individual Query/Response data items by indexing.
       The maximum number of Query/Response data items in a block will
       be configurable.

       *  Rationale: This blocking and indexing action provides a
          significant reduction in the volume of file data generated.
          Although this introduces complexity, it provides compression
          of the data that makes use of knowledge of the DNS message
          structure.

       *  It is anticipated that the files produced can be subject to
          further compression using general-purpose compression tools.
          Measurements show that blocking significantly reduces the CPU
          required to perform such strong compression.  See
          Appendix C.2.

       *  Examples of commonality between DNS messages are that in most
          cases the QUESTION RR is the same in the Query and Response
          and that there is a finite set of Query "signatures" (based on
          a subset of attributes).  For many authoritative servers,
          there is very likely to be a finite set of Responses that are
          generated, of which a large number are NXDOMAIN.

   4.  Traffic metadata can optionally be included in each block.
       Specifically, counts of some types of non-DNS packets (e.g.,
       ICMP, TCP resets) sent to the server may be of interest.

   5.  The wire-format content of malformed DNS messages may optionally
       be recorded.

       *  Rationale: Any structured capture format that does not capture
          the DNS payload byte for byte will be limited to some extent
          in that it cannot represent malformed DNS messages.  Only
          those messages that can be fully parsed and transformed into
          the structured format can be fully represented.  Note,
          however, that this can result in rather misleading statistics.
          For example, a malformed Query that cannot be represented in
          the C-DNS format will lead to the (well-formed) DNS Response
          with error code FORMERR appearing as "unmatched".  Therefore,
          it can greatly aid downstream analysis to have the wire format
          of the malformed DNS messages available directly in the
          C-DNS file.
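
   The minimum well-formedness checks listed under item 1 can be
   sketched as follows.  This is an illustrative Python sketch and not
   part of the C-DNS specification: the function names and the set of
   OPCODEs treated as "recognized" are assumptions, and the RR walk only
   verifies that each record's fixed fields and RDATA fit inside the
   message rather than parsing type-specific RDATA.

```python
import struct

# Assumption: OPCODEs treated as "recognized" here (QUERY, IQUERY, STATUS,
# NOTIFY, UPDATE, DSO); a real collector would take this from configuration.
RECOGNIZED_OPCODES = {0, 1, 2, 4, 5, 6}


def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the domain name starting at pos."""
    while True:
        if pos >= len(msg):
            raise ValueError("name runs past end of message")
        length = msg[pos]
        if length == 0:                    # root label terminates the name
            return pos + 1
        if (length & 0xC0) == 0xC0:        # 2-byte compression pointer ends the name
            if pos + 2 > len(msg):
                raise ValueError("truncated compression pointer")
            return pos + 2
        if length & 0xC0:                  # 0x40 / 0x80 label types are not accepted
            raise ValueError("unsupported label type")
        pos += 1 + length


def is_well_formed(msg: bytes) -> bool:
    """Apply the three minimum checks to a DNS payload."""
    if len(msg) < 12:                      # 12-byte DNS Header
        return False
    _id, flags, qd, an, ns, ar = struct.unpack("!6H", msg[:12])
    if (flags >> 11) & 0xF not in RECOGNIZED_OPCODES:
        return False
    try:
        pos = 12
        for _ in range(qd):                # Question entries: name, type, class
            pos = skip_name(msg, pos) + 4
        for _ in range(an + ns + ar):      # RRs: name, type, class, TTL, RDLENGTH
            pos = skip_name(msg, pos)
            if pos + 10 > len(msg):
                return False
            (rdlength,) = struct.unpack("!H", msg[pos + 8:pos + 10])
            pos += 10 + rdlength
        return pos <= len(msg)             # section counts fit the contents
    except ValueError:
        return False
```

   A payload that fails this check would not be stored as a
   Query/Response data item; under item 5, its wire format may instead
   be recorded as a malformed message.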

5. Choice of CBOR

   This document presents a detailed format description for C-DNS.  The
   format uses CBOR [RFC7049].

   The choice of CBOR was made taking a number of factors into account.

   o  CBOR is a binary representation and thus is economical in storage
      space.

   o  Other binary representations were investigated, and whilst all had
      attractive features, none had a significant advantage over CBOR.
      See Appendix C for some discussion of this.

   o  CBOR is an IETF specification and is familiar to IETF
      participants.  It is based on the now-common ideas of lists and
      objects and thus requires very little familiarization for those in
      the wider industry.

   o  CBOR is a simple format and can easily be implemented from scratch
      if necessary.  Formats that are more complex require library
      support, which may present problems on unusual platforms.

   o  CBOR can also be easily converted to text formats such as JSON
      [RFC8259] for debugging and other human inspection requirements.

   o  CBOR data schemas can be described using the Concise Data
      Definition Language (CDDL) [RFC8610].
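
   To illustrate the point that CBOR can be implemented from scratch,
   the following Python sketch encodes a small subset of CBOR
   (integers, byte strings, text strings, arrays, and maps) using only
   definite-length encoding.  It is a minimal sketch for illustration;
   the function names are assumptions, and tags, floating-point values,
   and indefinite-length items are deliberately left out.

```python
def _head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus the shortest-form unsigned argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("integer too large for this sketch")


def encode(item) -> bytes:
    """Encode a small subset of Python values as CBOR."""
    if isinstance(item, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(item, int):
        if item >= 0:
            return _head(0, item)             # major type 0: unsigned integer
        return _head(1, -1 - item)            # major type 1: negative integer
    if isinstance(item, bytes):
        return _head(2, len(item)) + item     # major type 2: byte string
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return _head(3, len(raw)) + raw       # major type 3: text string
    if isinstance(item, (list, tuple)):       # major type 4: array
        return _head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):                # major type 5: map
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return _head(5, len(item)) + body
    raise TypeError(f"unsupported type: {type(item).__name__}")
```

   For example, encode("C-DNS") yields the byte 0x65 (text string of
   length 5) followed by the five ASCII bytes of the string.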

6. C-DNS Format Conceptual Overview

   The following figures show purely schematic representations of the
   C-DNS format to convey the high-level structure of the C-DNS format.
   Section 7 provides a detailed discussion of the CBOR representation
   and individual elements.

   Figure 1 shows the C-DNS format at the top level, including the file
   header and data blocks.  The Query/Response data items, Address/Event
   Count data items, and Malformed Message data items link to various
   Block Tables.

                   +-------+
                   | C-DNS |
                   +-------+--------------------------+
                   | File Type Identifier             |
                   +----------------------------------+
                   | File Preamble                    |
                   | +--------------------------------+
                   | | Format Version                 |
                   | +--------------------------------+
                   | | Block Parameters               |
                   +-+--------------------------------+
                   | Block                            |
                   | +--------------------------------+
                   | | Block Preamble                 |
                   | +--------------------------------+
                   | | Block Statistics               |
                   | +--------------------------------+
                   | | Block Tables                   |
                   | +--------------------------------+
                   | | Query/Response data items      |
                   | +--------------------------------+
                   | | Address/Event Count data items |
                   | +--------------------------------+
                   | | Malformed Message data items   |
                   +-+--------------------------------+
                   | Block                            |
                   | +--------------------------------+
                   | | Block Preamble                 |
                   | +--------------------------------+
                   | | Block Statistics               |
                   | +--------------------------------+
                   | | Block Tables                   |
                   | +--------------------------------+
                   | | Query/Response data items      |
                   | +--------------------------------+
                   | | Address/Event Count data items |
                   | +--------------------------------+
                   | | Malformed Message data items   |
                   +-+--------------------------------+
                   | Further Blocks...                |
                   +----------------------------------+

                        Figure 1: The C-DNS Format

   Figure 2 shows some more-detailed relationships within each Block,
   specifically those between the Query/Response data item and the
   relevant Block Tables.  Some fields have been omitted for clarity.
   +----------------+
   | Query/Response |
   +-------------------------+
   | Time Offset             |
   +-------------------------+            +------------------+
   | Client Address          |---------+->| IP Address array |
   +-------------------------+         |  +------------------+
   | Client Port             |         |
   +-------------------------+         |  +------------------+
   | Transaction ID          |     +---)->| Name/RDATA array |<--------+
   +-------------------------+     |   |  +------------------+         |
   | Query Signature         |--+  |   |                               |
   +-------------------------+  |  |   |  +-----------------+          |
   | Client Hoplimit (q)     |  +--)---)->| Query Signature |          |
   +-------------------------+     |   |  +-----------------+-------+  |
   | Response Delay (r)      |     |   +--| Server Address          |  |
   +-------------------------+     |      +-------------------------+  |
   | Query Name              |--+--+      | Server Port             |  |
   +-------------------------+  |         +-------------------------+  |
   | Query Size (q)          |  |         | Transport Flags         |  |
   +-------------------------+  |         +-------------------------+  |
   | Response Size (r)       |  |         | QR Type                 |  |
   +-------------------------+  |         +-------------------------+  |
   | Response Processing (r) |  |         | QR Signature Flags      |  |
   | +-----------------------+  |         +-------------------------+  |
   | | Bailiwick             |--+         | Query OPCODE (q)        |  |
   | +-----------------------+            +-------------------------+  |
   | | Flags                 |            | QR DNS Flags            |  |
   +-+-----------------------+            +-------------------------+  |
   | Extra Query Info (q)    |            | Query RCODE (q)         |  |
   | +-----------------------+            +-------------------------+  |
   | | Question              |--+---+  +--+-Query Class/Type (q)    |  |
   | +-----------------------+      |  |  +-------------------------+  |
   | | Answer                |--+   |  |  | Query QDCOUNT (q)       |  |
   | +-----------------------+  |   |  |  +-------------------------+  |
   | | Authority             |--+   |  |  | Query ANCOUNT (q)       |  |
   | +-----------------------+  |   |  |  +-------------------------+  |
   | | Additional            |--+   |  |  | Query NSCOUNT (q)       |  |
   +-+-----------------------+  |   |  |  +-------------------------+  |
   | Extra Response Info (r) |  |-+ |  |  | Query ARCOUNT (q)       |  |
   | +-----------------------+  | | |  |  +-------------------------+  |
   | | Answer                |--+ | |  |  | Query EDNS version (q)  |  |
   | +-----------------------+  | | |  |  +-------------------------+  |
   | | Authority             |--+ | |  |  | Query EDNS UDP Size (q) |  |
   | +-----------------------+  | | |  |  +-------------------------+  |
   | | Additional            |--+ | |  |  | Query OPT RDATA (q)     |--+
   +-+-----------------------+    | |  |  +-------------------------+  |
                                  | |  |  | Response RCODE (r)      |  |
                                  | |  |  +-------------------------+  |
   + -----------------------------+ |  +----------+                    |
   |                                |             |                    |
   | + -----------------------------+             |                    |
   | |  +---------------+  +----------+           |                    |
   | +->| Question List |->| Question |           |                    |
   |    | array         |  | array    |           |                    |
   |    +---------------+  +----------+--+        |                    |
   |                       | Name        |--+-----)--------------------+
   |                       +-------------+  |     |  +------------+
   |                       | Class/Type  |--)---+-+->| Class/Type |
   |                       +-------------+  |   |    | array      |
   |                                        |   |    +------------+--+
   |                                        |   |    | CLASS         |
   |    +---------------+  +----------+     |   |    +---------------+
   +--->| RR List array |->| RR array |     |   |    | TYPE          |
        +---------+-----+  +----------+--+  |   |    +---------------+
                           | Name        |--+   |
                           +-------------+      |
                           | Class/Type  |------+
                           +-------------+

       Figure 2: The Query/Response Data Item and Subsidiary Tables

   In Figure 2, data items annotated (q) are only present when a
   Query/Response has a Query, and those annotated (r) are only present
   when a Query/Response has a Response.

   A C-DNS file begins with a file header containing a File Type
   Identifier and a File Preamble.  The File Preamble contains
   information on the file Format Version and an array of Block
   Parameters items (the contents of which include Collection and
   Storage Parameters used for one or more Blocks).

   The file header is followed by a series of Blocks.
   A Block consists of a Block Preamble item, some Block Statistics for
   the traffic stored within the Block, and then various arrays of
   common data collectively called the Block Tables.  This is then
   followed by an array of the Query/Response data items detailing the
   Queries and Responses stored within the Block.  The array of
   Query/Response data items is in turn followed by the Address/Event
   Count data items (an array of per-client counts of particular IP
   events) and then Malformed Message data items (an array of malformed
   messages that are stored in the Block).

   The exact nature of the DNS data will affect what Block size is the
   best fit; however, sample data for a root server indicated that Block
   sizes up to 10,000 Query/Response data items give good results.  See
   Appendix C.6 for more details.

   This design exploits data commonality and block-based storage to
   minimize the C-DNS file size.  As a result, C-DNS cannot be streamed
   below the level of a Block.
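
   The way a Block exploits data commonality can be sketched as
   follows.  This Python sketch is illustrative only: the "Table" class,
   the "build_block" function, and the string keys are assumptions made
   for readability, and Section 7 defines the actual CBOR
   representation.  Each distinct address or name is stored once in a
   table, and every Query/Response data item refers to it by index.

```python
class Table:
    """Append-only table that stores each distinct value once."""

    def __init__(self):
        self.items = []
        self._index = {}

    def add(self, value) -> int:
        """Return the index of value, appending it on first sight."""
        idx = self._index.get(value)
        if idx is None:
            idx = len(self.items)
            self._index[value] = idx
            self.items.append(value)
        return idx


def build_block(records):
    """Turn (client_address, query_name) pairs into one Block-like dict."""
    addresses, names = Table(), Table()
    query_responses = [
        {"client-address-index": addresses.add(addr),
         "query-name-index": names.add(name)}
        for addr, name in records
    ]
    return {
        # Hypothetical key names standing in for the Block Tables.
        "block-tables": {"ip-address": addresses.items,
                         "name-rdata": names.items},
        "query-responses": query_responses,
    }
```

   Because the indexes only have meaning relative to the tables of the
   same Block, a reader needs the complete Block before it can resolve
   any Query/Response data item, which is why the format cannot be
   streamed below the level of a Block.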

6.1. Block Parameters

   The details of the Block Parameters items are not shown in the
   diagrams but are discussed here for context.

   An array of Block Parameters items is stored in the File Preamble
   (with a minimum of one item at index 0); a Block Parameters item
   consists of a collection of Storage and Collection Parameters that
   applies to any given Block.  An array is used in order to support use
   cases such as wanting to merge C-DNS files from different sources.
   The Block Preamble item then contains an optional index for the Block
   Parameters item that applies for that Block; if not present, the
   index defaults to 0.  Hence, in effect, a global Block Parameters
   item is defined that can then be overridden per Block.
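
   The lookup described above can be sketched in Python as follows.
   The function name and the string keys "block-parameters" and
   "block-parameters-index" are assumptions chosen for this
   illustration; they stand in for whatever representation a decoder
   uses after parsing the file.

```python
def block_parameters_for(file_preamble: dict, block_preamble: dict) -> dict:
    """Select the Block Parameters item that applies to one Block."""
    params = file_preamble["block-parameters"]
    if not params:
        raise ValueError("File Preamble must hold at least one Block Parameters item")
    # The index is optional in the Block Preamble; absent means item 0.
    index = block_preamble.get("block-parameters-index", 0)
    if not 0 <= index < len(params):
        raise ValueError(f"Block Parameters index {index} out of range")
    return params[index]
```

   When two C-DNS files are merged, the second file's Block Parameters
   items can be appended to the array and its Blocks given the
   corresponding index, leaving the first file's Blocks untouched.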

6.2. Storage Parameters

   The Block Parameters item includes a Storage Parameters item -- this
   contains information about the specific data fields stored in the
   C-DNS file.

   These parameters include:

   o  The sub-second timing resolution used by the data.

   o  Information (hints) on which optional data are omitted.  See
      Section 6.2.1.

   o  Recorded OPCODES [opcodes] and RR TYPEs [rrtypes].  See
      Section 6.2.2.

   o  Flags indicating, for example, whether the data is sampled or
      anonymized.  See Sections 6.2.3 and 14.

   o  Client and server IPv4 and IPv6 address prefixes.  See
      Section 6.2.4.

6.2.1. Optional Data Items

   To enable implementations to store data to their precise requirements
   in as space-efficient a manner as possible, all fields in the
   following arrays are optional:

   o  Query/Response

   o  Query Signature

   o  Malformed Messages

   In other words, an implementation can choose to omit any data item
   that is not required for its use case (whilst observing the
   restrictions relating to IP address storage described in
   Section 6.2.4).  In addition, implementations may be configured to
   not record all RRs or to only record messages with certain OPCODEs.

   This does, however, mean that a consumer of a C-DNS file faces two
   problems:

   1.  How can it quickly determine if a file definitely does not
       contain the data items it requires to complete a particular task
       (e.g., reconstructing DNS traffic or performing a specific piece
       of data analysis)?

   2.  How can it determine whether a data item is not present because
       it was (1) explicitly not recorded or (2) not available/present?

   For example, capturing C-DNS data from within a name server
   implementation makes it unlikely that the Client Hoplimit can be
   recorded.  Or, if there is no Query ARCOUNT recorded and no Query OPT
   RDATA [RFC6891] recorded, is that because no Query contained an OPT
   RR, or because that data was not stored?

   The Storage Parameters item therefore also contains a Storage Hints
   item, which specifies which items the encoder of the file omits from
   the stored data and will therefore never be present.  (This approach
   is taken because a flag that indicated which items were included for
   collection would not guarantee that the item was present -- only that
   it might be.)  An implementation decoding that file can then use
   these flags to quickly determine whether the input data is not rich
   enough for its needs.

   One scenario where this may be particularly important is the case of
   regenerating traffic.  It is possible to collect such a small set of
   data items that an implementation decoding the file cannot determine
   if a given Query/Response data item was generated from just a Query,
   just a Response, or a Query/Response pair.  This makes it impossible
   to reconstruct DNS traffic even if sensible defaults are provided for
   the missing data items.  This is discussed in more detail in
   Section 9.
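
   A minimal sketch of how a decoder might use such hints is given
   below.  It is not the wire encoding: the hints are modelled simply
   as a set of item names the encoder never stores, and the item names
   and function names are illustrative assumptions.  Note that the
   check can only rule a file out; an item that is not listed as
   omitted might still be absent from any individual record.

```python
from typing import AbstractSet, Iterable, List


def items_ruled_out(omitted: AbstractSet[str], required: Iterable[str]) -> List[str]:
    """Return the required items that the encoder declared it never stores."""
    return sorted(item for item in set(required) if item in omitted)


def definitely_insufficient(omitted: AbstractSet[str], required: Iterable[str]) -> bool:
    """True if the file cannot contain everything a task needs."""
    return bool(items_ruled_out(omitted, required))


# Hypothetical hint contents and task requirements (labels are illustrative).
omitted_by_encoder = frozenset({"client-hoplimit", "query-opt-rdata"})
replay_needs = ["time-offset", "client-address", "query-opt-rdata"]
size_report_needs = ["time-offset", "query-size"]

replay_blocked = definitely_insufficient(omitted_by_encoder, replay_needs)
size_report_blocked = definitely_insufficient(omitted_by_encoder, size_report_needs)
```

   With these inputs, the replay task is ruled out before any Block is
   read, while the size report is not ruled out and can proceed to
   inspect the data.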

6.2.2. Optional RRs and OPCODEs

   Also included in the Storage Parameters item are explicit arrays
   listing the RR TYPEs and the OPCODEs to be recorded.  These arrays
   remove any ambiguity over whether, for example, messages containing
   particular OPCODEs are not present because (1) certain OPCODEs did
   not occur or (2) the implementation is not configured to record them.

   In the case of OPCODEs, for a message to be fully parsable, the
   OPCODE must be known to the collecting implementation.  Any message
   with an OPCODE unknown to the collecting implementation cannot be
   validated as correctly formed and so must be treated as malformed.
   Messages with OPCODEs known to the recording application but not
   listed in the Storage Parameters item are discarded by the recording
   application during C-DNS capture (regardless of whether they are
   malformed or not).

   In the case of RRs, each record in a message must be fully parsable,
   including parsing the record RDATA, as otherwise the message cannot
   be validated as correctly formed.  Any RR with an RR TYPE not known
   to the collecting implementation cannot be validated as correctly
   formed and so must be treated as malformed.  Once a message is
   correctly parsed, an implementation is free to record only a subset
   of the RRs present.
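
   The handling rules above can be sketched as a small decision
   function.  This is an illustration under stated assumptions, not a
   normative algorithm: the function and enum names are hypothetical,
   OPCODEs and RR TYPEs are passed as plain integers, and a message
   carrying an RR of unknown type is treated as malformed as a whole,
   which is one reading of the text.

```python
from enum import Enum
from typing import Collection, Iterable, List


class Disposition(Enum):
    RECORD = "record"
    DISCARD = "discard"
    MALFORMED = "malformed"


def classify_message(
    opcode: int,
    rr_types: Iterable[int],
    known_opcodes: Collection[int],
    recorded_opcodes: Collection[int],
    known_rr_types: Collection[int],
) -> Disposition:
    """Decide what a collecting implementation does with one message."""
    # An OPCODE unknown to the collector means the message cannot be validated.
    if opcode not in known_opcodes:
        return Disposition.MALFORMED
    # Known but not listed in Storage Parameters: discarded, malformed or not.
    if opcode not in recorded_opcodes:
        return Disposition.DISCARD
    # Every RR must be fully parsable, which requires a known RR TYPE.
    if any(rr_type not in known_rr_types for rr_type in rr_types):
        return Disposition.MALFORMED
    return Disposition.RECORD


def rrs_to_store(rr_types: Iterable[int], recorded_rr_types: Collection[int]) -> List[int]:
    """After a successful parse, keep only the RR TYPEs configured for recording."""
    return [rr_type for rr_type in rr_types if rr_type in recorded_rr_types]


# Hypothetical collector configuration: QUERY(0), IQUERY(1), STATUS(2),
# NOTIFY(4), and UPDATE(5) are understood, but only QUERY and NOTIFY are recorded.
KNOWN_OPCODES = {0, 1, 2, 4, 5}
RECORDED_OPCODES = {0, 4}
KNOWN_RR_TYPES = {1, 2, 5, 6, 28, 41}  # A, NS, CNAME, SOA, AAAA, OPT

query = classify_message(0, [1, 28], KNOWN_OPCODES, RECORDED_OPCODES, KNOWN_RR_TYPES)
update = classify_message(5, [6], KNOWN_OPCODES, RECORDED_OPCODES, KNOWN_RR_TYPES)
unknown_opcode = classify_message(3, [], KNOWN_OPCODES, RECORDED_OPCODES, KNOWN_RR_TYPES)
unknown_rr = classify_message(0, [1, 65280], KNOWN_OPCODES, RECORDED_OPCODES, KNOWN_RR_TYPES)
```

   The OPCODE check against the recorded list runs before RR
   validation because such messages are discarded whether or not they
   are malformed.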

6.2.3. Storage Flags

   The Storage Parameters item contains flags that can be used to
   indicate if:

   o  the data is anonymized,

   o  the data is produced from sample data, or

   o  names in the data have been normalized (converted to uniform
      case).

   The Storage Parameters item also contains optional fields holding
   details of the sampling method used and the anonymization method
   used.  It is RECOMMENDED that these fields contain URIs [RFC3986]
   pointing to resources describing the methods used.  See Section 14
   for further discussion of anonymization and normalization.

6.2.4. IP Address Storage

   The format can store either full IP addresses or just IP prefixes;
   the Storage Parameters item contains fields to indicate if only IP
   prefixes were stored.

   If the IP address prefixes are absent, then full addresses are
   stored.  In this case, the IP version can be directly inferred from
   the stored address length and the fields "qr-transport-flags" in
   QueryResponseSignature, "ae-transport-flags" in AddressEventCount,
   and "mm-transport-flags" in MalformedMessageData (which contain the
   IP version bit) are optional.

   If IP address prefixes are given, only the prefix bits of addresses
   are stored.  In this case, in order to determine the IP version, the
   fields "qr-transport-flags" in QueryResponseSignature,
   "ae-transport-flags" in AddressEventCount, and "mm-transport-flags"
   in MalformedMessageData MUST be present.  See Sections 7.3.2.3.2 and
   7.3.2.3.5.

   As an example of storing only IP prefixes, if a client IPv6 prefix of
   48 is specified, a client address of 2001:db8:85a3::8a2e:370:7334
   will be stored as 0x20010db885a3, reducing address storage space
   requirements.  Similarly, if a client IPv4 prefix of 16 is specified,
   a client address of 192.0.2.1 will be stored as 0xc000 (192.0).
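
   The two worked examples above can be reproduced with the short
   sketch below.  The function names are hypothetical.  The examples
   in this section use byte-aligned prefixes only; zeroing the unused
   low-order bits of the final byte for other prefix lengths is an
   assumption made here for illustration.

```python
import ipaddress
from typing import Optional


def stored_address(address: str, prefix_length: Optional[int] = None) -> bytes:
    """Return the bytes kept for an address, in full or truncated to a prefix."""
    packed = ipaddress.ip_address(address).packed
    if prefix_length is None:
        return packed  # no prefix configured: the full address is stored
    if not 0 <= prefix_length <= len(packed) * 8:
        raise ValueError(f"invalid prefix length {prefix_length} for {address}")
    n_bytes = (prefix_length + 7) // 8
    stored = bytearray(packed[:n_bytes])
    spare_bits = n_bytes * 8 - prefix_length
    if spare_bits:
        # Assumption: bits beyond the prefix in the last byte are cleared.
        stored[-1] &= (0xFF << spare_bits) & 0xFF
    return bytes(stored)


def ip_version_from_full_address(stored: bytes) -> int:
    """Infer the IP version from the length of a full stored address."""
    if len(stored) == 4:
        return 4
    if len(stored) == 16:
        return 6
    raise ValueError("length does not identify a full IPv4 or IPv6 address")


v6_prefix = stored_address("2001:db8:85a3::8a2e:370:7334", 48)
v4_prefix = stored_address("192.0.2.1", 16)
v4_full = stored_address("192.0.2.1")
```

   A truncated value such as 0xc000 no longer identifies its address
   family by length alone, which is why the transport flags carrying
   the IP version bit become mandatory once prefixes are configured.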

