RFC 8610

Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures

Pages: 64
Group: CBOR
Proposed STD

Top   ToC   RFC8610 - Page 1
Internet Engineering Task Force (IETF)                       H. Birkholz
Request for Comments: 8610                                Fraunhofer SIT
Category: Standards Track                                      C. Vigano
ISSN: 2070-1721                                      Universitaet Bremen
                                                              C. Bormann
                                                 Universitaet Bremen TZI
                                                               June 2019


    Concise Data Definition Language (CDDL): A Notational Convention
         to Express Concise Binary Object Representation (CBOR)
                        and JSON Data Structures

Abstract

   This document proposes a notational convention to express Concise
   Binary Object Representation (CBOR) data structures (RFC 7049).  Its
   main goal is to provide an easy and unambiguous way to express
   structures for protocol messages and data formats that use CBOR or
   JSON.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8610.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Notation
      1.2. Terminology
   2. The Style of Data Structure Specification
      2.1. Groups and Composition in CDDL
           2.1.1. Usage
           2.1.2. Syntax
      2.2. Types
           2.2.1. Values
           2.2.2. Choices
           2.2.3. Representation Types
           2.2.4. Root Type
   3. Syntax
      3.1. General Conventions
      3.2. Occurrence
      3.3. Predefined Names for Types
      3.4. Arrays
      3.5. Maps
           3.5.1. Structs
           3.5.2. Tables
           3.5.3. Non-deterministic Order
           3.5.4. Cuts in Maps
      3.6. Tags
      3.7. Unwrapping
      3.8. Controls
           3.8.1. Control Operator .size
           3.8.2. Control Operator .bits
           3.8.3. Control Operator .regexp
           3.8.4. Control Operators .cbor and .cborseq
           3.8.5. Control Operators .within and .and
           3.8.6. Control Operators .lt, .le, .gt, .ge, .eq,
                  .ne, and .default
      3.9. Socket/Plug
      3.10. Generics
      3.11. Operator Precedence
   4. Making Use of CDDL
      4.1. As a Guide for a Human User
      4.2. For Automated Checking of CBOR Data Structures
      4.3. For Data Analysis Tools
   5. Security Considerations
   6. IANA Considerations
      6.1. CDDL Control Operators Registry
   7. References
      7.1. Normative References
      7.2. Informative References
   Appendix A. Parsing Expression Grammars (PEGs)
   Appendix B. ABNF Grammar
   Appendix C. Matching Rules
   Appendix D. Standard Prelude
   Appendix E. Use with JSON
   Appendix F. A CDDL Tool
   Appendix G. Extended Diagnostic Notation
     G.1. Whitespace in Byte String Notation
     G.2. Text in Byte String Notation
     G.3. Embedded CBOR and CBOR Sequences in Byte Strings
     G.4. Concatenated Strings
     G.5. Hexadecimal, Octal, and Binary Numbers
     G.6. Comments
   Appendix H. Examples
   Acknowledgements
   Contributors
   Authors' Addresses

1.  Introduction

   In this document, a notational convention to express Concise Binary
   Object Representation (CBOR) data structures [RFC7049] is defined.

   The main goal for the convention is to provide a unified notation
   that can be used when defining protocols that use CBOR.  We term the
   convention "Concise Data Definition Language", or CDDL.

   The CBOR notational convention has the following goals:

   (G1)  Provide an unambiguous description of the overall structure of
         a CBOR data item.

   (G2)  Be flexible in expressing the multiple ways in which data can
         be represented in the CBOR data format.

   (G3)  Be able to express common CBOR datatypes and structures.

   (G4)  Provide a single format that is both readable and editable for
         humans and processable by a machine.

   (G5)  Enable automatic checking of CBOR data items for data format
         compliance.

   (G6)  Enable extraction of specific elements from CBOR data for
         further processing.

   Not an original goal per se, but a convenient side effect of the JSON
   generic data model being a subset of the CBOR generic data model, is
   the fact that CDDL can also be used for describing JSON data
   structures (see Appendix E).

   This document has the following structure:

   The syntax of CDDL is defined in Section 3.  Examples of CDDL and a
   related CBOR data item ("instance"), some of which use the JSON form,
   are described in Appendix H.  Section 4 discusses usage of CDDL.
   Examples are provided throughout the text to better illustrate
   concept definitions.  A formal definition of CDDL using ABNF grammar
   [RFC5234] is provided in Appendix B.  Finally, a _prelude_ of
   standard CDDL definitions that is automatically prepended to, and
   thus available in, every CDDL specification is listed in Appendix D.

1.1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.2.  Terminology

   New terms are introduced in _cursive_, which is rendered in plain
   text as the new term surrounded by underscores.  CDDL text in the
   running text is in "typewriter", which is rendered in plain text as
   the CDDL text in double quotes (double quotes are also used in the
   usual English sense; the reader is expected to disambiguate this by
   context).

   In this specification, the term "byte" is used in its now-customary
   sense as a synonym for "octet".

2.  The Style of Data Structure Specification

   CDDL focuses on styles of specification that are in use in the
   community employing the data model as pioneered by JSON and now
   refined in CBOR.

   There are a number of more or less atomic elements of a CBOR data
   model, such as numbers, simple values (false, true, nil), text
   strings, and byte strings; CDDL does not focus on specifying their
   structure.  CDDL of course also allows adding a CBOR tag to a
   data item.

   Beyond those atomic elements, further components of a data structure
   definition language are the datatypes used for composition: arrays
   and maps in CBOR (called "arrays" and "objects" in JSON).  While
   these are only two representation formats, they are used to specify
   four loosely distinguishable styles of composition:

   o  A _vector_: an array of elements that are mostly of the same
      semantics.  The set of signatures associated with a signed data
      item is a typical application of a vector.

   o  A _record_: an array the elements of which have different,
      positionally defined semantics, as detailed in the data structure
      definition.  A 2D point, specified as an array of an x coordinate
      (which comes first) and a y coordinate (coming second), is an
      example of a record, as is the pair of exponent (first) and
      mantissa (second) in a CBOR decimal fraction.

   o  A _table_: a map from a domain of map keys to a domain of map
      values, that are mostly of the same semantics.  A set of language
      tags, each mapped to a text string translated to that specific
      language, is an example of a table.  The key domain is usually not
      limited to a specific set by the specification but is open for the
      application, e.g., in a table mapping IP addresses to Media Access
      Control (MAC) addresses, the specification does not attempt to
      foresee all possible IP addresses.  In a language such as
      JavaScript, a "Map" (as opposed to a plain "Object") would often
      be employed to achieve the generality of the key domain.

   o  A _struct_: a map from a domain of map keys as defined by the
      specification to a domain of map values the semantics of each of
      which is bound to a specific map key.  This is what many people
      have in mind when they think about JSON objects; CBOR adds the
      ability to use map keys that are not just text strings.  Structs
      can be used to solve problems similar to those records are used
      for; the use of explicit map keys facilitates optionality and
      extensibility.
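
   As a rough illustration (not part of the specification), the four
   styles can be pictured with Python lists standing in for CBOR arrays
   and Python dicts standing in for CBOR maps.  All names and values in
   this sketch are invented for the example.

```python
# Python lists stand in for CBOR arrays, dicts for CBOR maps.
# All values are invented for illustration.

# Vector: elements share the same semantics; the length is open.
signatures = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]

# Record: the position defines the meaning (x first, y second).
point = [3, 4]
x, y = point

# Table: open key domain, values share the same semantics.
greetings = {"en": "hello", "de": "hallo", "fr": "bonjour"}

# Struct: keys fixed by the specification, each with its own meaning.
person = {"age": 42, "name": "Alice", "employer": "Example Corp"}
```

   The two array styles differ only in how positions are interpreted,
   and the two map styles differ only in whether the key set is fixed
   by the specification.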

   Two important concepts provide the foundation for CDDL:

   1.  Instead of defining all four types of composition in CDDL
       separately, or even defining one kind for arrays (vectors and
       records) and one kind for maps (tables and structs), there is
       only one kind of composition in CDDL: the _group_ (Section 2.1).

   2.  The other important concept is that of a _type_.  The entire CDDL
       specification defines a type (the one defined by its first
       _rule_), which formally is the set of CBOR data items that are
       acceptable as "instances" for this specification.  CDDL
       predefines a number of basic types such as "uint" (unsigned
       integer) or "tstr" (text string), often making use of a simple
       formal notation for CBOR data items.  Each value that can be
       expressed as a CBOR data item is also a type in its own right,
       e.g., "1".  A type can be built as a _choice_ of other types,
       e.g., an "int" is either a "uint" or a "nint" (negative integer).
       Finally, a type can be built as an array or a map from a group.

   The rest of this section introduces a number of basic concepts of
   CDDL, and Section 3 defines additional syntax.  Appendix C gives a
   concise summary of the semantics of CDDL.

2.1.  Groups and Composition in CDDL

   CDDL groups are lists of group _entries_, each of which can be a
   name/value pair or a more complex group expression (which then in
   turn stands for a sequence of name/value pairs).  A CDDL group is a
   production in a grammar that matches certain sequences of name/value
   pairs but not others.  The grammar is based on the concepts of
   Parsing Expression Grammars (PEGs) (see Appendix A).

   In an array context, only the value of the name/value pair is
   represented; the name is annotation only (and can be left off from
   the group specification if not needed).  In a map context, the names
   become the map keys ("member keys").

   In an array context, the actual sequence of elements in the group is
   important, as that sequence is the information that allows
   associating actual array elements with entries in the group.  In a
   map context, the sequence of entries in a group is not relevant (but
   there is still a need to write down group entries in a sequence).

   An array matches a specification given as a group when the group
   matches a sequence of name/value pairs the value parts of which
   exactly match the elements of the array in order.

   A map matches a specification given as a group when the group matches
   a sequence of name/value pairs such that all of these name/value
   pairs are present in the map and the map has no name/value pair that
   is not covered by the group.
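
   The two matching rules above can be sketched in Python.  This is a
   minimal model under strong assumptions: a group is a fixed list of
   (name, type) entries, Python types stand in for CDDL types, and
   occurrence indicators, choices, and non-text keys are ignored.  The
   names "POINT", "matches_array", and "matches_map" are hypothetical.

```python
# Toy group: an ordered list of (name, Python type) entries.
# Occurrence indicators, choices, and non-text keys are ignored here.
POINT = [("x", int), ("y", int)]

def matches_array(group, items):
    """Array context: names are annotation only; values match in order."""
    return len(items) == len(group) and all(
        type(item) is typ for (_, typ), item in zip(group, items)
    )

def matches_map(group, mapping):
    """Map context: all entries present, no extra keys, order irrelevant."""
    return set(mapping) == {name for name, _ in group} and all(
        type(mapping[name]) is typ for name, typ in group
    )
```

   In this model, "matches_array(POINT, [3, 4])" succeeds while a
   one-element array fails, and "matches_map" accepts the keys in any
   order but rejects a map with a missing or additional key.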

   A simple example of using a group directly in a map definition is:

                             person = {
                               age: int,
                               name: tstr,
                               employer: tstr,
                             }

                 Figure 1: Using a Group Directly in a Map

   The three entries of the group are written between the curly braces
   that create the map: here, "age", "name", and "employer" are the
   names that turn into the map key text strings, and "int" and "tstr"
   (text string) are the types of the map values under these keys.

   A group by itself (without creating a map around it) can be placed in
   (round) parentheses and given a name by using it in a rule:

                             pii = (
                               age: int,
                               name: tstr,
                               employer: tstr,
                             )

                          Figure 2: A Basic Group

   This separate, named group definition allows us to rephrase
   Figure 1 as:

                                person = {
                                  pii
                                }

                      Figure 3: Using a Group by Name

   Note that the (curly) braces signify the creation of a map; the
   groups themselves are neutral as to whether they will be used in a
   map or an array.

   As shown in Figure 1, the parentheses for groups are optional when
   there is some other set of brackets present.  Note that they can
   still be used, leading to this not-so-realistic, but perfectly valid,
   example:

                             person = {(
                               age: int,
                               name: tstr,
                               employer: tstr,
                             )}

              Figure 4: Using a Parenthesized Group in a Map

   Groups can be used to factor out common parts of structs, e.g.,
   instead of writing specifications in copy/paste style, such as in
   Figure 5, one can factor out the common subgroup, choose a name for
   it, and write only the specific parts into the individual maps
   (Figure 6).

                          person = {
                            age: int,
                            name: tstr,
                            employer: tstr,
                          }

                          dog = {
                            age: int,
                            name: tstr,
                            leash-length: float,
                          }

                      Figure 5: Maps with Copy/Paste

                          person = {
                            identity,
                            employer: tstr,
                          }

                          dog = {
                            identity,
                            leash-length: float,
                          }

                          identity = (
                            age: int,
                            name: tstr,
                          )

                 Figure 6: Using a Group for Factorization

   Note that the lists inside the braces in the above definitions
   constitute (anonymous) groups, while "identity" is a named group,
   which can then be included as part of other groups (anonymous as in
   the example, or themselves named).

2.1.1.  Usage

   Groups are the instrument used in composing data structures with
   CDDL.  It is a matter of style in defining those structures whether
   to define groups (anonymously) right in their contexts or whether to
   define them in a separate rule and to reference them with their
   respective name (possibly more than once).

   This allows a specification either to define the small parts of its
   data structures in separate rules and compose bigger protocol data
   units from those, or to define one big protocol data unit that has
   all definitions ad hoc where they are needed.

2.1.2.  Syntax

   The composition syntax is intended to be concise and easy to read:

   o  The start and end of a group can be marked by "(" and ")".

   o  Definitions of entries inside of a group are noted as follows:
      _keytype => valuetype,_ (read "keytype maps to valuetype").  The
      comma is actually optional (not just in the final entry), but it
      is considered good style to set it.  The double arrow can be
      replaced by a colon in the common case of directly using a text
      string or integer literal as a key; see Section 3.5.1.  This is
      also the common way of naming elements of an array just for
      documentation; see Section 3.4.

   A basic entry consists of a _keytype_ and a _valuetype_, both of
   which are types (Section 2.2); this entry matches any name/value pair
   the name of which is in the keytype and the value of which is in the
   valuetype.

   A group defined as a sequence of group entries matches any sequence
   of name/value pairs that is composed by concatenation in order of
   what the entries match.
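
   Concatenation in order can be pictured as flattening nested groups
   into one sequence of entries.  The following Python sketch assumes a
   toy model in which a group is a list of (name, type) entries or
   nested lists; "flatten" is a hypothetical helper, and the rule names
   follow Figure 6.

```python
# Toy model: a group is a list whose items are either (name, type)
# entries or nested groups (lists). For illustration only.
IDENTITY = [("age", int), ("name", str)]
PERSON = [IDENTITY, ("employer", str)]
DOG = [IDENTITY, ("leash-length", float)]

def flatten(group):
    """Expand nested groups into one flat sequence of entries, in order."""
    entries = []
    for item in group:
        if isinstance(item, list):
            entries.extend(flatten(item))   # nested group: splice it in
        else:
            entries.append(item)            # plain entry
    return entries
```

   Flattening "PERSON" yields the entries for "age", "name", and
   "employer" in that order, which is the sequence the copy/paste form
   in Figure 5 spells out by hand.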

   A group definition can also contain choices between groups; see
   Section 2.2.2.

2.2.  Types

2.2.1.  Values

   Values such as numbers and strings can be used in place of a type.
   (For instance, this is a very common thing to do for a key type,
   common enough that CDDL provides additional convenience syntax
   for this.)

   The value notation is based on the C language, but does not offer all
   the syntactic variations (see Appendix B for details).  The value
   notation for numbers inherits from C the distinction between integer
   values (no fractional part or exponent given -- NR1 [ISO6093];
   "NR" stands for "numerical representation") and floating-point values
   (where a fractional part, an exponent, or both are present -- NR2 or
   NR3), so the type "1" does not include any floating-point numbers
   while the types "1e3" and "1.5" are both floating-point numbers and
   do not include any integer numbers.
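
   The integer/floating-point distinction matters for tools written in
   languages whose numeric equality ignores it.  In Python, for
   example, "1 == 1.0" is true, so a checker has to compare the kind of
   number as well as its value.  The sketch below is one possible way
   to do that; "matches_literal" is a hypothetical helper.

```python
def matches_literal(literal, value):
    """A value used as a type matches only itself, of the same kind."""
    # bool is a subclass of int in Python, so compare exact types
    # instead of relying on isinstance() or on == alone.
    return type(value) is type(literal) and value == literal
```

   With this helper, the literal 1 accepts the integer 1 but not the
   float 1.0, and the literal 1e3 accepts 1000.0 but not the integer
   1000.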

2.2.2.  Choices

   Many places that allow a type also allow a choice between types,
   delimited by a "/" (slash).  The entire choice construct can be put
   into parentheses if this is required to make the construction
   unambiguous (please see Appendix B for details of the CDDL grammar).

   Choices of values can be used to express enumerations:

            attire = "bow tie" / "necktie" / "Internet attire"
            protocol = 6 / 17

   Analogous to types, CDDL also allows choices between groups,
   delimited by a "//" (double slash).  Note that the "//" operator
   binds much more weakly than the other CDDL operators, so each line
   within "delivery" in the following example is its own alternative in
   the group choice:

                   address = { delivery }

                   delivery = (
                   street: tstr, ? number: uint, city //
                   po-box: uint, city //
                   per-pickup: true )

                   city = (
                   name: tstr, zip-code: uint
                   )

   A group choice matches the union of the sets of name/value pair
   sequences that its alternatives match.

   For both type choices and group choices, additional alternatives can
   be added to a rule later in separate rules by using "/=" and "//=",
   respectively, instead of "=":

                 attire /= "swimwear"

                 delivery //= (
                 lat: float, long: float, drone-type: tstr
                 )

   It is not an error if a name is first used with a "/=" or "//="
   (there is no need to "create it" with "=").
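
   For enumerations, a type choice behaves like a set of accepted
   values, and "/=" behaves like adding members to that set.  The
   Python sketch below models only this simple case, under the
   assumption that rules are kept in a dict of sets; "rules" and
   "extend_choice" are hypothetical names.

```python
# Toy model: an enumeration type is the set of values it accepts.
rules = {}

def extend_choice(name, *alternatives):
    """Model "/=": add alternatives, creating the rule if needed."""
    rules.setdefault(name, set()).update(alternatives)

extend_choice("attire", "bow tie", "necktie", "Internet attire")
extend_choice("attire", "swimwear")      # attire /= "swimwear"
extend_choice("protocol", 6, 17)         # first use may already be "/="
```

   Because "extend_choice" creates a missing rule on first use, it also
   mirrors the point that a name need not be created with "=" before
   it is extended.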

2.2.2.1.  Ranges

   Instead of naming all the values that make up a choice, CDDL allows
   building a _range_ out of two values that are in an ordering
   relationship: a lower bound (first value) and an upper bound (second
   value).  A range can be inclusive of both bounds given (denoted by
   joining two values by ".."), or it can include the lower bound and
   exclude the upper bound (denoted by instead using "...").  If the
   lower bound exceeds the upper bound, the resulting type is the empty
   set (this behavior can be desirable when generics (Section 3.10) are
   being used).

         device-address = byte
         max-byte = 255
         byte = 0..max-byte ; inclusive range
         first-non-byte = 256
         byte1 = 0...first-non-byte ; byte1 is equivalent to byte

   CDDL currently only allows ranges between integers (matching integer
   values) or between floating-point values (matching floating-point
   values).  If both are needed in a type, a type choice between the two
   kinds of ranges can be (clumsily) used:

                int-range = 0..10 ; only integers match
                float-range = 0.0..10.0 ; only floats match
                BAD-range1 = 0..10.0 ; NOT DEFINED
                BAD-range2 = 0.0..10 ; NOT DEFINED
                numeric-range = int-range / float-range

   (See also the control operators .lt/.ge and .le/.gt in
   Section 3.8.6.)

   Note that the dot is a valid name continuation character in CDDL, so

      min..max

   is not a range expression but a single name.  When using a name as
   the left-hand side of a range operator, use spacing as in

      min .. max

   to separate off the range operator.
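
   The range rules of this section can be summarized in a short Python
   sketch.  It is a simplified model, not a CDDL implementation:
   "in_range" is a hypothetical helper, Python "int" and "float" stand
   in for CDDL integer and floating-point values, and the undefined
   mixed-bound case is reported as an error by assumption.

```python
def in_range(lower, upper, value, inclusive=True):
    """Model "lower..upper" (inclusive) and "lower...upper" (exclusive)."""
    if type(lower) is not type(upper):
        # e.g. 0..10.0 is not defined; raising is this sketch's choice.
        raise ValueError("mixed integer/float bounds are not defined")
    if type(value) is not type(lower):
        return False      # an integer range matches only integers, etc.
    if inclusive:
        return lower <= value <= upper
    return lower <= value < upper
```

   Here "in_range(0, 255, 255)" and "in_range(0, 256, 255,
   inclusive=False)" both hold, matching the equivalence of "byte" and
   "byte1" above, and a range whose lower bound exceeds its upper bound
   matches nothing.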

2.2.2.2.  Turning a Group into a Choice

   Some choices are built out of large numbers of values, often
   integers, each of which is best given a semantic name in the
   specification.  Instead of naming each of these integers and then
   accumulating them into a choice, CDDL allows building a choice from a
   group by prefixing it with an "&" character:

              terminal-color = &basecolors
              basecolors = (
                black: 0,  red: 1,  green: 2,  yellow: 3,
                blue: 4,  magenta: 5,  cyan: 6,  white: 7,
              )
              extended-color = &(
                basecolors,
                orange: 8,  pink: 9,  purple: 10,  brown: 11,
              )

   As with the use of groups in arrays (Section 3.4), the member names
   have only documentary value (in particular, they might be used by a
   tool when displaying integers that are taken from that choice).
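
   A Python sketch of the same idea, assuming a group is modeled as a
   dict from member names to values: the "&" operator keeps only the
   values as the choice, while the names remain available for display.
   The helper "display" is hypothetical.

```python
# The group's member names map to values; "&" keeps only the values.
basecolors = {"black": 0, "red": 1, "green": 2, "yellow": 3,
              "blue": 4, "magenta": 5, "cyan": 6, "white": 7}
extended = {**basecolors, "orange": 8, "pink": 9, "purple": 10, "brown": 11}

terminal_color = set(basecolors.values())   # terminal-color = &basecolors
extended_color = set(extended.values())     # extended-color = &( ... )

def display(value, group):
    """Names are documentary: a tool may use them to label a value."""
    names = {v: k for k, v in group.items()}
    return names.get(value, str(value))
```

   In this model, 8 is a member of "extended_color" but not of
   "terminal_color", and "display(5, basecolors)" returns the label
   "magenta".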

2.2.3.  Representation Types

   CDDL allows the specification of a data item type by referring to the
   CBOR representation (specifically, to major types and additional
   information; see Section 2 of [RFC7049]).  How this is used should be
   evident from the prelude (Appendix D): a hash mark ("#") optionally
   followed by a number from 0 to 7 identifying the major type, which
   then can be followed by a dot and a number specifying the additional
   information.  This construction specifies the set of values that can
   be serialized in CBOR at all (i.e., "any") if no major type is given,
   by the given major type if only that is given, or by the given major
   type with the given additional information if both are given.  Where
   a major type of 6 (Tag) is used, the type of the tagged item can be
   specified by appending it in parentheses.

   Note that although this notation is based on the CBOR serialization,
   it is about a set of values at the data model level, e.g., "#7.25"
   specifies the set of values that can be represented as half-precision
   floats; it does not mandate that these values also do have to be
   serialized as half-precision floats: CDDL does not provide any
   language means to restrict the choice of serialization variants.
   This also enables the use of CDDL with JSON, which uses a
   fundamentally different way of serializing (some of) the same values.
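
   To relate the notation to the encoding it is named after: in CBOR,
   the initial byte of a data item carries the major type in its upper
   three bits and the additional information in its lower five bits
   (Section 2 of [RFC7049]).  The Python sketch below splits an initial
   byte and parses the plain "#", "#M", and "#M.A" forms; both function
   names are hypothetical, and the parenthesized tag form such as
   "#6.55799(breakfast)" is deliberately not handled.

```python
def split_initial_byte(byte):
    """Split a CBOR initial byte into (major type, additional info)."""
    return byte >> 5, byte & 0x1F

def parse_representation_type(text):
    """Parse "#", "#M", or "#M.A"; None means "not constrained"."""
    if not text.startswith("#"):
        raise ValueError("representation types start with '#'")
    body = text[1:]
    if body == "":
        return (None, None)               # plain "#" is "any"
    major, _, additional = body.partition(".")
    return (int(major), int(additional) if additional else None)
```

   For example, the initial byte 0xF9 of a half-precision float splits
   into major type 7 and additional information 25, which is the pair
   that "#7.25" names.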

   It may be necessary to make use of representation types outside the
   prelude, e.g., a specification could start by making use of an
   existing tag in a more specific way or could define a new tag not
   defined in the prelude:

      my_breakfast = #6.55799(breakfast)   ; cbor-any is too general!
      breakfast = cereal / porridge
      cereal = #6.998(tstr)
      porridge = #6.999([liquid, solid])
      liquid = milk / water
      milk = 0
      water = 1
      solid = tstr

2.2.4.  Root Type

   There is no special syntax to identify the root of a CDDL data
   structure definition: that role is simply taken by the first rule
   defined in the file.

   This is motivated by the usual top-down approach for defining data
   structures, decomposing a big data structure unit into smaller parts;
   however, except for the root type, there is no need to strictly
   follow this sequence.

   (Note that there is no way to use a group as a root -- it must be
   a type.)

