
RFC 8610

Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures

Pages: 64
Proposed Standard

Top   ToC   RFC8610 - Page 1
Internet Engineering Task Force (IETF)                       H. Birkholz
Request for Comments: 8610                                Fraunhofer SIT
Category: Standards Track                                      C. Vigano
ISSN: 2070-1721                                      Universitaet Bremen
                                                              C. Bormann
                                                 Universitaet Bremen TZI
                                                               June 2019


    Concise Data Definition Language (CDDL): A Notational Convention
         to Express Concise Binary Object Representation (CBOR)
                        and JSON Data Structures

Abstract

This document proposes a notational convention to express Concise Binary Object Representation (CBOR) data structures (RFC 7049). Its main goal is to provide an easy and unambiguous way to express structures for protocol messages and data formats that use CBOR or JSON.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8610.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Notation
      1.2. Terminology
   2. The Style of Data Structure Specification
      2.1. Groups and Composition in CDDL
           2.1.1. Usage
           2.1.2. Syntax
      2.2. Types
           2.2.1. Values
           2.2.2. Choices
           2.2.3. Representation Types
           2.2.4. Root Type
   3. Syntax
      3.1. General Conventions
      3.2. Occurrence
      3.3. Predefined Names for Types
      3.4. Arrays
      3.5. Maps
           3.5.1. Structs
           3.5.2. Tables
           3.5.3. Non-deterministic Order
           3.5.4. Cuts in Maps
      3.6. Tags
      3.7. Unwrapping
      3.8. Controls
           3.8.1. Control Operator .size
           3.8.2. Control Operator .bits
           3.8.3. Control Operator .regexp
           3.8.4. Control Operators .cbor and .cborseq
           3.8.5. Control Operators .within and .and
           3.8.6. Control Operators .lt, .le, .gt, .ge, .eq,
                  .ne, and .default
      3.9. Socket/Plug
      3.10. Generics
      3.11. Operator Precedence
   4. Making Use of CDDL
      4.1. As a Guide for a Human User
      4.2. For Automated Checking of CBOR Data Structures
      4.3. For Data Analysis Tools
   5. Security Considerations
   6. IANA Considerations
      6.1. CDDL Control Operators Registry
   7. References
      7.1. Normative References
      7.2. Informative References
   Appendix A. Parsing Expression Grammars (PEGs)
   Appendix B. ABNF Grammar
   Appendix C. Matching Rules
   Appendix D. Standard Prelude
   Appendix E. Use with JSON
   Appendix F. A CDDL Tool
   Appendix G. Extended Diagnostic Notation
     G.1. Whitespace in Byte String Notation
     G.2. Text in Byte String Notation
     G.3. Embedded CBOR and CBOR Sequences in Byte Strings
     G.4. Concatenated Strings
     G.5. Hexadecimal, Octal, and Binary Numbers
     G.6. Comments
   Appendix H. Examples
   Acknowledgements
   Contributors
   Authors' Addresses

1. Introduction

In this document, a notational convention to express Concise Binary Object Representation (CBOR) data structures [RFC7049] is defined.

The main goal for the convention is to provide a unified notation that can be used when defining protocols that use CBOR. We term the convention "Concise Data Definition Language", or CDDL.

The CBOR notational convention has the following goals:

   (G1)  Provide an unambiguous description of the overall structure of
         a CBOR data item.

   (G2)  Be flexible in expressing the multiple ways in which data can
         be represented in the CBOR data format.

   (G3)  Be able to express common CBOR datatypes and structures.

   (G4)  Provide a single format that is both readable and editable for
         humans and processable by a machine.

   (G5)  Enable automatic checking of CBOR data items for data format
         compliance.

   (G6)  Enable extraction of specific elements from CBOR data for
         further processing.

Not an original goal per se, but a convenient side effect of the JSON generic data model being a subset of the CBOR generic data model, is the fact that CDDL can also be used for describing JSON data structures (see Appendix E).

This document has the following structure:

The syntax of CDDL is defined in Section 3. Examples of CDDL and a related CBOR data item ("instance"), some of which use the JSON form, are described in Appendix H. Section 4 discusses usage of CDDL. Examples are provided throughout the text to better illustrate concept definitions. A formal definition of CDDL using ABNF grammar [RFC5234] is provided in Appendix B. Finally, a _prelude_ of standard CDDL definitions that is automatically prepended to, and thus available in, every CDDL specification is listed in Appendix D.

1.1. Requirements Notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

1.2. Terminology

New terms are introduced in _cursive_, which is rendered in plain text as the new term surrounded by underscores. CDDL text in the running text is in "typewriter", which is rendered in plain text as the CDDL text in double quotes (double quotes are also used in the usual English sense; the reader is expected to disambiguate this by context).

In this specification, the term "byte" is used in its now-customary sense as a synonym for "octet".

2. The Style of Data Structure Specification

CDDL focuses on styles of specification that are in use in the community employing the data model as pioneered by JSON and now refined in CBOR.

There are a number of more or less atomic elements of a CBOR data model, such as numbers, simple values (false, true, nil), text strings, and byte strings; CDDL does not focus on specifying their structure. CDDL of course also allows adding a CBOR tag to a data item.

Beyond those atomic elements, further components of a data structure definition language are the datatypes used for composition: arrays and maps in CBOR (called "arrays" and "objects" in JSON). While these are only two representation formats, they are used to specify four loosely distinguishable styles of composition:

   o  A _vector_: an array of elements that are mostly of the same
      semantics.  The set of signatures associated with a signed data
      item is a typical application of a vector.

   o  A _record_: an array the elements of which have different,
      positionally defined semantics, as detailed in the data structure
      definition.  A 2D point, specified as an array of an x coordinate
      (which comes first) and a y coordinate (coming second), is an
      example of a record, as is the pair of exponent (first) and
      mantissa (second) in a CBOR decimal fraction.

   o  A _table_: a map from a domain of map keys to a domain of map
      values, that are mostly of the same semantics.  A set of language
      tags, each mapped to a text string translated to that specific
      language, is an example of a table.  The key domain is usually not
      limited to a specific set by the specification but is open for the
      application, e.g., in a table mapping IP addresses to Media Access
      Control (MAC) addresses, the specification does not attempt to
      foresee all possible IP addresses.  In a language such as
      JavaScript, a "Map" (as opposed to a plain "Object") would often
      be employed to achieve the generality of the key domain.

   o  A _struct_: a map from a domain of map keys as defined by the
      specification to a domain of map values the semantics of each of
      which is bound to a specific map key.  This is what many people
      have in mind when they think about JSON objects; CBOR adds the
      ability to use map keys that are not just text strings.  Structs
      can be used to solve problems similar to those records are used
      for; the use of explicit map keys facilitates optionality and
      extensibility.
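
   As a rough illustration only (it is not part of the specification),
   the four styles can be pictured as plain Python values, with lists
   standing in for CBOR arrays and dicts for CBOR maps.  All variable
   names, the helper "is_vector_of", and the sample data are
   hypothetical.

```python
# Illustrative only: Python lists stand in for CBOR arrays, dicts for CBOR maps.

# Vector: an array whose elements all share the same semantics.
signatures = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]

# Record: an array whose elements have positionally defined semantics.
point_2d = [3, 4]          # [x, y]
x, y = point_2d

# Table: a map with an open key domain and uniform value semantics.
greetings = {"en": "Hello", "de": "Hallo", "fr": "Bonjour"}

# Struct: a map whose keys are fixed by the specification.
person = {"age": 42, "name": "Alice", "employer": "Example Corp"}


def is_vector_of(items, kind):
    """True if items is a list whose elements are all of one Python type."""
    return isinstance(items, list) and all(isinstance(i, kind) for i in items)
```

   The array and the map are the only two representation formats here;
   the difference between a vector and a record, or between a table and
   a struct, lies in how the specification assigns meaning to them.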

   Two important concepts provide the foundation for CDDL:

   1.  Instead of defining all four types of composition in CDDL
       separately, or even defining one kind for arrays (vectors and
       records) and one kind for maps (tables and structs), there is
       only one kind of composition in CDDL: the _group_ (Section 2.1).

   2.  The other important concept is that of a _type_.  The entire CDDL
       specification defines a type (the one defined by its first
       _rule_), which formally is the set of CBOR data items that are
       acceptable as "instances" for this specification.  CDDL
       predefines a number of basic types such as "uint" (unsigned
       integer) or "tstr" (text string), often making use of a simple
       formal notation for CBOR data items.  Each value that can be
       expressed as a CBOR data item is also a type in its own right,
       e.g., "1".  A type can be built as a _choice_ of other types,
       e.g., an "int" is either a "uint" or a "nint" (negative integer).
       Finally, a type can be built as an array or a map from a group.

   The rest of this section introduces a number of basic concepts of
   CDDL, and Section 3 defines additional syntax.  Appendix C gives a
   concise summary of the semantics of CDDL.

2.1. Groups and Composition in CDDL

CDDL groups are lists of group _entries_, each of which can be a name/value pair or a more complex group expression (which then in turn stands for a sequence of name/value pairs). A CDDL group is a production in a grammar that matches certain sequences of name/value pairs but not others. The grammar is based on the concepts of Parsing Expression Grammars (PEGs) (see Appendix A).

In an array context, only the value of the name/value pair is represented; the name is annotation only (and can be left off from the group specification if not needed). In a map context, the names become the map keys ("member keys").

In an array context, the actual sequence of elements in the group is important, as that sequence is the information that allows associating actual array elements with entries in the group. In a map context, the sequence of entries in a group is not relevant (but there is still a need to write down group entries in a sequence).

An array matches a specification given as a group when the group matches a sequence of name/value pairs the value parts of which exactly match the elements of the array in order.

A map matches a specification given as a group when the group matches a sequence of name/value pairs such that all of these name/value pairs are present in the map and the map has no name/value pair that is not covered by the group.

A simple example of using a group directly in a map definition is:

                             person = {
                               age: int,
                               name: tstr,
                               employer: tstr,
                             }

                 Figure 1: Using a Group Directly in a Map

The three entries of the group are written between the curly braces that create the map: here, "age", "name", and "employer" are the names that turn into the map key text strings, and "int" and "tstr" (text string) are the types of the map values under these keys.

   A group by itself (without creating a map around it) can be placed in
   (round) parentheses and given a name by using it in a rule:

                             pii = (
                               age: int,
                               name: tstr,
                               employer: tstr,
                             )

                          Figure 2: A Basic Group

   This separate, named group definition allows us to rephrase
   Figure 1 as:

                                person = {
                                  pii
                                }

                      Figure 3: Using a Group by Name

   Note that the (curly) braces signify the creation of a map; the
   groups themselves are neutral as to whether they will be used in a
   map or an array.
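
   The neutrality of a group can be sketched in Python.  This is a
   minimal model under stated assumptions, not an implementation of
   CDDL matching: a group is a list of (name, check) entries, and
   occurrence indicators, choices, and nested groups are left out.  The
   names "PII", "matches_array", and "matches_map" are hypothetical.

```python
# A group is modelled as an ordered list of (name, type_check) entries.
# Simplified sketch: no occurrence indicators, choices, or nesting.

def is_int(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_tstr(value):
    return isinstance(value, str)


PII = [("age", is_int), ("name", is_tstr), ("employer", is_tstr)]


def matches_array(group, array):
    """Array context: names are annotation only; values must match in order."""
    return (isinstance(array, list)
            and len(array) == len(group)
            and all(check(v) for (_, check), v in zip(group, array)))


def matches_map(group, mapping):
    """Map context: names become keys; order is irrelevant, no extra keys."""
    if not isinstance(mapping, dict):
        return False
    names = {name for name, _ in group}
    return (set(mapping) == names
            and all(check(mapping[name]) for name, check in group))
```

   The same "PII" list is used unchanged in both functions; only the
   enclosing context decides whether the names are dropped (array) or
   turned into keys (map).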

   As shown in Figure 1, the parentheses for groups are optional when
   there is some other set of brackets present.  Note that they can
   still be used, leading to this not-so-realistic, but perfectly valid,
   example:

                             person = {(
                               age: int,
                               name: tstr,
                               employer: tstr,
                             )}

              Figure 4: Using a Parenthesized Group in a Map

   Groups can be used to factor out common parts of structs, e.g.,
   instead of writing specifications in copy/paste style, such as in
   Figure 5, one can factor out the common subgroup, choose a name for
   it, and write only the specific parts into the individual maps
   (Figure 6).

                          person = {
                            age: int,
                            name: tstr,
                            employer: tstr,
                          }

                          dog = {
                            age: int,
                            name: tstr,
                            leash-length: float,
                          }

                      Figure 5: Maps with Copy/Paste

                          person = {
                            identity,
                            employer: tstr,
                          }

                          dog = {
                            identity,
                            leash-length: float,
                          }

                          identity = (
                            age: int,
                            name: tstr,
                          )

                 Figure 6: Using a Group for Factorization

   Note that the lists inside the braces in the above definitions
   constitute (anonymous) groups, while "identity" is a named group,
   which can then be included as part of other groups (anonymous as in
   the example, or themselves named).
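
   The effect of including a named group can be sketched as a simple
   expansion step.  The representation below is an assumption made for
   illustration: a rule body is a list whose items are either
   (name, type name) entries or a string naming another group.  The
   names "RULES" and "expand" are hypothetical, and the sketch does not
   guard against cyclic references.

```python
# Hypothetical mini-model: a rule body is a list whose items are either
# (name, type_name) entries or a string naming another group to include.
RULES = {
    "identity": [("age", "int"), ("name", "tstr")],
    "person": ["identity", ("employer", "tstr")],
    "dog": ["identity", ("leash-length", "float")],
}


def expand(rule_name, rules):
    """Return the flat list of (name, type_name) entries for a rule."""
    flat = []
    for item in rules[rule_name]:
        if isinstance(item, str):      # reference to a named group
            flat.extend(expand(item, rules))
        else:                          # plain name/type entry
            flat.append(item)
    return flat
```

   Expanding "person" and "dog" in this model yields the same entries
   as the copy/paste definitions in Figure 5.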

2.1.1. Usage

Groups are the instrument used in composing data structures with CDDL. It is a matter of style in defining those structures whether to define groups (anonymously) right in their contexts or whether to define them in a separate rule and to reference them with their respective name (possibly more than once). With this, one is allowed to define all small parts of their data structures and compose bigger protocol data units with those or to have only one big protocol data unit that has all definitions ad hoc where needed.

2.1.2. Syntax

The composition syntax is intended to be concise and easy to read:

   o  The start and end of a group can be marked by "(" and ")".

   o  Definitions of entries inside of a group are noted as follows:
      _keytype => valuetype,_ (read "keytype maps to valuetype").  The
      comma is actually optional (not just in the final entry), but it
      is considered good style to set it.  The double arrow can be
      replaced by a colon in the common case of directly using a text
      string or integer literal as a key; see Section 3.5.1.  This is
      also the common way of naming elements of an array just for
      documentation; see Section 3.4.

A basic entry consists of a _keytype_ and a _valuetype_, both of which are types (Section 2.2); this entry matches any name/value pair the name of which is in the keytype and the value of which is in the valuetype.

A group defined as a sequence of group entries matches any sequence of name/value pairs that is composed by concatenation in order of what the entries match.

A group definition can also contain choices between groups; see Section 2.2.2.
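
The entry and concatenation rules can be sketched in Python. This is a simplified model under stated assumptions: types are predicates, each entry matches exactly one pair, and occurrence indicators and group choices are ignored. The helper names ("literal", "entry_matches", "group_matches", "ENTRIES") are hypothetical.

```python
def is_tstr(value):
    return isinstance(value, str)


def is_uint(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def literal(expected):
    """Build a type that contains exactly one value."""
    return lambda value: type(value) is type(expected) and value == expected


def entry_matches(keytype, valuetype, pair):
    """A pair matches if its name is in keytype and its value in valuetype."""
    name, value = pair
    return keytype(name) and valuetype(value)


def group_matches(entries, pairs):
    """Entries match pairs by concatenation, one pair per entry, in order."""
    return len(entries) == len(pairs) and all(
        entry_matches(keytype, valuetype, pair)
        for (keytype, valuetype), pair in zip(entries, pairs)
    )


# Models the group ( "age" => uint, tstr => tstr ).
ENTRIES = [(literal("age"), is_uint), (is_tstr, is_tstr)]
```

For example, the pair sequence ("age", 42), ("nickname", "Al") matches "ENTRIES" in this model, while the same pairs in the opposite order do not.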

2.2. Types

2.2.1. Values

Values such as numbers and strings can be used in place of a type. (For instance, this is a very common thing to do for a key type, common enough that CDDL provides additional convenience syntax for this.)

The value notation is based on the C language, but does not offer all the syntactic variations (see Appendix B for details). The value notation for numbers inherits from C the distinction between integer values (no fractional part or exponent given -- NR1 [ISO6093]; "NR" stands for "numerical representation") and floating-point values (where a fractional part, an exponent, or both are present -- NR2 or NR3), so the type "1" does not include any floating-point numbers while the types "1e3" and "1.5" are both floating-point numbers and do not include any integer numbers.
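
The integer/floating-point distinction can be sketched in Python. The regular expressions below are a simplification that covers plain decimal literals only (the full grammar is in Appendix B), and the function names "literal_kind" and "matches_value" are hypothetical.

```python
import re

# NR1: digits only (integer). NR2/NR3: fractional part and/or exponent (float).
# Simplified: decimal notation only; see Appendix B for the full grammar.
_INT = re.compile(r"-?\d+\Z")
_FLOAT = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?\Z")


def literal_kind(text):
    """Classify a decimal number literal as 'int' or 'float'."""
    if _INT.match(text):
        return "int"
    if _FLOAT.match(text):
        return "float"
    raise ValueError(f"not a decimal number literal: {text!r}")


def matches_value(literal, instance):
    """True if instance is the value denoted by literal, kind included."""
    if isinstance(instance, bool):
        return False
    if literal_kind(literal) == "int":
        return isinstance(instance, int) and instance == int(literal)
    return isinstance(instance, float) and instance == float(literal)
```

Python itself treats 1 and 1.0 as equal, so the sketch has to compare the kind as well as the numeric value to reflect the rule that the type "1" contains no floating-point number.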

2.2.2. Choices

Many places that allow a type also allow a choice between types, delimited by a "/" (slash). The entire choice construct can be put into parentheses if this is required to make the construction unambiguous (please see Appendix B for details of the CDDL grammar).

Choices of values can be used to express enumerations:

      attire = "bow tie" / "necktie" / "Internet attire"
      protocol = 6 / 17

Analogous to types, CDDL also allows choices between groups, delimited by a "//" (double slash). Note that the "//" operator binds much more weakly than the other CDDL operators, so each line within "delivery" in the following example is its own alternative in the group choice:

      address = { delivery }

      delivery = (
      street: tstr, ? number: uint, city //
      po-box: uint, city //
      per-pickup: true )

      city = (
      name: tstr, zip-code: uint
      )

   A group choice matches the union of the sets of name/value pair
   sequences that the alternatives in the choice can.
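
   The union behaviour can be sketched in Python for the "address"
   example, used in a map context.  The model is an assumption made for
   illustration: each alternative is a list of (key, check, required)
   entries, with "required" set to False for the entry marked "?".
   The names "DELIVERY", "matches_alternative", and
   "matches_group_choice" are hypothetical.

```python
def is_tstr(value):
    return isinstance(value, str)


def is_uint(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def is_true(value):
    return value is True


# Entries are (key, check, required); the "city" group is included by value.
CITY = [("name", is_tstr, True), ("zip-code", is_uint, True)]
DELIVERY = [
    [("street", is_tstr, True), ("number", is_uint, False)] + CITY,
    [("po-box", is_uint, True)] + CITY,
    [("per-pickup", is_true, True)],
]


def matches_alternative(entries, mapping):
    """True if the map has all required keys, valid values, and no extras."""
    allowed = {key for key, _, _ in entries}
    if not set(mapping) <= allowed:
        return False
    for key, check, required in entries:
        if key in mapping:
            if not check(mapping[key]):
                return False
        elif required:
            return False
    return True


def matches_group_choice(alternatives, mapping):
    """A group choice matches if any one of its alternatives matches."""
    return any(matches_alternative(alt, mapping) for alt in alternatives)
```

   A map is accepted as soon as one alternative accepts it; keys from
   different alternatives cannot be mixed within one map.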

   For both type choices and group choices, additional alternatives can
   be added to a rule later in separate rules by using "/=" and "//=",
   respectively, instead of "=":

                 attire /= "swimwear"

                 delivery //= (
                 lat: float, long: float, drone-type: tstr
                 )

   It is not an error if a name is first used with a "/=" or "//="
   (there is no need to "create it" with "=").
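
   The difference between "=" and "/=" can be sketched with a small
   rule table in Python.  Only type choices over literal values are
   modelled; "//=" would append group alternatives in the same way.
   The names "rules", "define", "extend", and "matches", and the rule
   name "transport", are hypothetical.

```python
# Each rule name maps to a list of alternatives; a type matches an instance
# if any alternative does. Only literal values are modelled here.
rules = {}


def define(name, *alternatives):
    """Model 'name = a / b / ...': set the rule's alternatives."""
    rules[name] = list(alternatives)


def extend(name, *alternatives):
    """Model 'name /= c': append alternatives, creating the rule if needed."""
    rules.setdefault(name, []).extend(alternatives)


def matches(name, instance):
    """True if the instance equals one of the rule's alternatives."""
    return any(type(alt) is type(instance) and alt == instance
               for alt in rules[name])


define("attire", "bow tie", "necktie", "Internet attire")
define("protocol", 6, 17)
extend("attire", "swimwear")     # attire /= "swimwear"
extend("transport", "drone")     # first use via "/=" is not an error
```

   Using "setdefault" mirrors the rule that a name does not have to be
   created with "=" before alternatives are added to it.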

2.2.2.1. Ranges

Instead of naming all the values that make up a choice, CDDL allows building a _range_ out of two values that are in an ordering relationship: a lower bound (first value) and an upper bound (second value). A range can be inclusive of both bounds given (denoted by joining two values by ".."), or it can include the lower bound and exclude the upper bound (denoted by instead using "..."). If the lower bound exceeds the upper bound, the resulting type is the empty set (this behavior can be desirable when generics (Section 3.10) are being used).

      device-address = byte
      max-byte = 255
      byte = 0..max-byte ; inclusive range
      first-non-byte = 256
      byte1 = 0...first-non-byte ; byte1 is equivalent to byte

CDDL currently only allows ranges between integers (matching integer values) or between floating-point values (matching floating-point values). If both are needed in a type, a type choice between the two kinds of ranges can be (clumsily) used:

      int-range = 0..10 ; only integers match
      float-range = 0.0..10.0 ; only floats match
      BAD-range1 = 0..10.0 ; NOT DEFINED
      BAD-range2 = 0.0..10 ; NOT DEFINED
      numeric-range = int-range / float-range

(See also the control operators .lt/.ge and .le/.gt in Section 3.8.6.)

   Note that the dot is a valid name continuation character in CDDL, so

      min..max

   is not a range expression but a single name.  When using a name as
   the left-hand side of a range operator, use spacing as in

      min .. max

   to separate off the range operator.
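
   The range rules of this section can be sketched in Python.  The
   function names "in_range" and "in_numeric_range" are hypothetical,
   and raising TypeError for mixed bounds is a choice made for this
   sketch; the specification only says that such ranges are not
   defined.

```python
def in_range(lower, upper, instance, inclusive=True):
    """Match instance against 'lower..upper' or, if not inclusive, 'lower...upper'."""
    if type(lower) is not type(upper) or type(lower) not in (int, float):
        raise TypeError("range bounds must both be int or both be float")
    if type(instance) is not type(lower):
        return False  # integer ranges match only integers, float ranges only floats
    if inclusive:
        return lower <= instance <= upper
    return lower <= instance < upper


def in_numeric_range(instance):
    """Model 'numeric-range = int-range / float-range' for bounds 0 and 10."""
    return in_range(0, 10, instance) or in_range(0.0, 10.0, instance)
```

   With a lower bound above the upper bound, neither comparison can
   succeed, so the modelled type is empty, as described above.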

2.2.2.2. Turning a Group into a Choice

Some choices are built out of large numbers of values, often integers, each of which is best given a semantic name in the specification. Instead of naming each of these integers and then accumulating them into a choice, CDDL allows building a choice from a group by prefixing it with an "&" character:

      terminal-color = &basecolors
      basecolors = (
        black: 0,  red: 1,  green: 2,  yellow: 3,
        blue: 4,  magenta: 5,  cyan: 6,  white: 7,
      )
      extended-color = &(
        basecolors,
        orange: 8,  pink: 9,  purple: 10,  brown: 11,
      )

As with the use of groups in arrays (Section 3.4), the member names have only documentary value (in particular, they might be used by a tool when displaying integers that are taken from that choice).
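
The "&" construction can be sketched in Python. The model is an assumption made for illustration: a group is a list of (name, value) pairs, and the resulting choice is the set of values. The names "choice_from_group" and "display_name" are hypothetical.

```python
# Group entries are (name, value) pairs; "&" keeps only the values as
# alternatives. Names are retained here purely for display purposes.
BASECOLORS = [("black", 0), ("red", 1), ("green", 2), ("yellow", 3),
              ("blue", 4), ("magenta", 5), ("cyan", 6), ("white", 7)]
EXTENDED = BASECOLORS + [("orange", 8), ("pink", 9),
                         ("purple", 10), ("brown", 11)]


def choice_from_group(group):
    """Model '&group': the set of values that the resulting choice allows."""
    return {value for _, value in group}


def display_name(group, value):
    """Return the documentary name for a value, or None if not in the group."""
    for name, candidate in group:
        if candidate == value:
            return name
    return None


terminal_color = choice_from_group(BASECOLORS)
extended_color = choice_from_group(EXTENDED)
```

Matching only consults the value sets; "display_name" shows the kind of use a tool might make of the names when presenting such integers.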

2.2.3. Representation Types

CDDL allows the specification of a data item type by referring to the CBOR representation (specifically, to major types and additional information; see Section 2 of [RFC7049]). How this is used should be evident from the prelude (Appendix D): a hash mark ("#") optionally followed by a number from 0 to 7 identifying the major type, which then can be followed by a dot and a number specifying the additional information. A bare "#" specifies the set of all values that can be serialized in CBOR (i.e., "any"); if a major type is given, the set is narrowed to the values serialized with that major type; if additional information is given as well, it is narrowed further to the values serialized with that major type and that additional information. Where a major type of 6 (Tag) is used, the type of the tagged item can be specified by appending it in parentheses.

   Note that although this notation is based on the CBOR serialization,
   it is about a set of values at the data model level, e.g., "#7.25"
   specifies the set of values that can be represented as half-precision
   floats; it does not mandate that these values also do have to be
   serialized as half-precision floats: CDDL does not provide any
   language means to restrict the choice of serialization variants.
   This also enables the use of CDDL with JSON, which uses a
   fundamentally different way of serializing (some of) the same values.

   It may be necessary to make use of representation types outside the
   prelude, e.g., a specification could start by making use of an
   existing tag in a more specific way or could define a new tag not
   defined in the prelude:

      my_breakfast = #6.55799(breakfast)   ; cbor-any is too general!
      breakfast = cereal / porridge
      cereal = #6.998(tstr)
      porridge = #6.999([liquid, solid])
      liquid = milk / water
      milk = 0
      water = 1
      solid = tstr
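
   The surface form of a representation type can be picked apart with
   a short Python sketch.  The regular expression is a simplification
   (the authoritative grammar is in Appendix B), the tagged type in
   parentheses is captured as text without being interpreted, and the
   function names are hypothetical.  The helper "initial_byte" only
   shows where the two numbers sit in a CBOR initial byte when the
   additional information fits in five bits; for major type 6, the
   number after the dot is a tag number such as 55799, which does not.

```python
import re

# "#", "#<major>", or "#<major>.<additional>"; a parenthesised tagged type,
# as in "#6.55799(breakfast)", is captured but not interpreted.
_REP = re.compile(r"#(?:([0-7])(?:\.(\d+))?)?(?:\((.*)\))?\Z")


def parse_representation_type(text):
    """Return (major, additional, tagged) with None for absent parts."""
    m = _REP.match(text)
    if m is None:
        raise ValueError(f"not a representation type: {text!r}")
    major, additional, tagged = m.groups()
    if tagged is not None and major != "6":
        raise ValueError("only major type 6 (tag) takes a tagged type")
    return (None if major is None else int(major),
            None if additional is None else int(additional),
            tagged)


def initial_byte(major, additional_info):
    """CBOR initial byte: major type in the high 3 bits, info in the low 5."""
    if not (0 <= major <= 7 and 0 <= additional_info <= 31):
        raise ValueError("major must be 0..7 and additional info 0..31")
    return (major << 5) | additional_info
```

   For "#7.25", the pair (7, 25) corresponds to the initial byte 0xf9
   that CBOR uses for half-precision floats.  As noted above, this
   identifies a set of values and does not mandate that serialization.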

2.2.4. Root Type

There is no special syntax to identify the root of a CDDL data structure definition: that role is simply taken by the first rule defined in the file. This is motivated by the usual top-down approach for defining data structures, decomposing a big data structure unit into smaller parts; however, except for the root type, there is no need to strictly follow this sequence. (Note that there is no way to use a group as a root -- it must be a type.)

