Internet Research Task Force (IRTF) W. Eddy Request for Comments: 6256 MTI Systems Category: Informational E. Davies ISSN: 2070-1721 Folly Consulting May 2011 Using Self-Delimiting Numeric Values in Protocols
AbstractSelf-Delimiting Numeric Values (SDNVs) have recently been introduced as a field type in proposed Delay-Tolerant Networking protocols. SDNVs encode an arbitrary-length non-negative integer or arbitrary- length bitstring with minimum overhead. They are intended to provide protocol flexibility without sacrificing economy and to assist in future-proofing protocols under development. This document describes formats and algorithms for SDNV encoding and decoding, along with notes on implementation and usage. This document is a product of the Delay-Tolerant Networking Research Group and has been reviewed by that group. No objections to its publication as an RFC were raised. Status of This Memo This document is not an Internet Standards Track specification; it is published for informational purposes. This document is a product of the Internet Research Task Force (IRTF). The IRTF publishes the results of Internet-related research and development activities. These results might not be suitable for deployment. This RFC represents the consensus of the Delay-Tolerant Networking Research Group of the Internet Research Task Force (IRTF). Documents approved for publication by the IRSG are not a candidate for any level of Internet Standard; see Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6256.
Copyright Notice Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. 1. Introduction ....................................................2 1.1. Problems with Fixed-Value Fields ...........................3 1.2. SDNVs for DTN Protocols ....................................4 1.3. SDNV Usage .................................................5 2. Definition of SDNVs .............................................6 3. Basic Algorithms ................................................8 3.1. Encoding Algorithm .........................................8 3.2. Decoding Algorithm .........................................9 3.3. Limitations of Implementations ............................10 4. Comparison to Alternatives .....................................10 5. Security Considerations ........................................13 6. Acknowledgements ...............................................13 7. Informative References .........................................14 Appendix A. SDNV Python Source Code ...............................15
encoding rules and Haverty's Message Services Data Transmission Protocol (MSDTP) [RFC0713]. Because of this focus, SDNVs can be quickly implemented with only a small amount of code. SDNVs are tersely defined in both the Bundle Protocol [RFC5050] and Licklider Transmission Protocol (LTP) [RFC5326] specifications, due to the flow of document production in the DTNRG. This document clarifies and further explains the motivations and engineering decisions behind SDNVs. RFC0793], encoding a maximum value of around 65 kilobytes. The purpose of this value is to provide flow control, by allowing a receiver to specify how many sent bytes its peer can have outstanding (unacknowledged) at any time, thus allowing the receiver to limit its buffer size. As network speeds have grown by several orders of magnitude since TCP's inception, the combination of the 65 kilobyte maximum advertised window and long round-trip times prevented TCP senders from being able to achieve the high throughput that the underlying network supported. This limitation was remedied through the use of the Window Scale option [RFC1323], which provides a multiplier for the advertised window field. However, the Window Scale multiplier is fixed for the duration of the connection, requires support from each end of a TCP connection, and limits the precision of the advertised receive window, so this is certainly a less-than-ideal solution. Because of the field width limit in the original design however, the Window Scale is necessary for TCP to reach high sending rates. An IPv4 address is fixed at 32 bits [RFC0791] (as a historical note, an early version of the IP header format specification in [IEN21] used variable-length addresses in multiples of 8 bits up to 120 bits). Due to the way that subnetting and assignment of address blocks was performed, the number of IPv4 addresses has been seen as a
limit to the growth of the Internet [Hain05]. Two divergent paths to solve this problem have been the use of Network Address Translators (NATs) and the development of IPv6. NATs have caused a number of other issues and problems [RFC2993], leading to increased complexity and fragility, as well as forcing workarounds to be engineered for many other protocols to function within a NATed environment. The IPv6 solution's transitional work has been underway for several years, but has still only just begun to have visible impact on the global Internet. Of course, in both the case of the TCP receive window and IPv4 address length, the field size chosen by the designers seemed like a good idea at the time. The fields were more than big enough for the originally perceived usage of the protocols, and yet were small enough to allow the headers to remain compact and relatively easy and efficient to parse on machines of the time. The fixed sizes that were defined represented a trade-off between the scalability of the protocol versus the overhead and efficiency of processing. In both cases, these engineering decisions turned out to be painfully restrictive in the longer term. RFC5050] and Licklider Transmission Protocol (LTP) [RFC5326], SDNVs have been used for several fields including identifiers, payload/header lengths, and serial (sequence) numbers. SDNVs were developed for use in these types of fields, to avoid sending more bytes than needed, as well as avoiding fixed sizes that may not end up being appropriate. For example, since LTP is intended primarily for use in long-delay interplanetary communications [RFC5325], where links may be fairly low in capacity, it is desirable to avoid the header overhead of routinely sending a 64-bit field where a 16-bit field would suffice. Since many of the nodes implementing LTP are expected to be beyond the current range of human spaceflight, upgrading their on-board LTP implementations to use longer values if the defined fields are found to be too short would also be problematic. Furthermore, extensions similar in mechanism to TCP's Window Scale option are unsuitable for use in DTN protocols since, due to high delays, DTN protocols must avoid handshaking and configuration parameter negotiation to the greatest extent possible. All of these reasons make the choice of SDNVs for use in DTN protocols attractive.
RFC4963]). o Sequence numbers as in TCP or the Stream Control Transmission Protocol (SCTP). o Values used in cryptographic algorithms such as RSA keys, Diffie- Hellman key agreement, or coordinates of points on elliptic curves. o Message lengths as used in file transfer protocols. o Nonces and cookies. As any bitfield can be interpreted as an unsigned integer, SDNVs can also encode arbitrary-length bitfields, including bitfields representing signed integers or other data types; however, this document assumes SDNV encoding and decoding in terms of unsigned integers. Implementations may differ in the interface that they provide to SDNV encoding and decoding functions, in terms of whether the values are numeric, bitfields, etc.; this detail does not alter the representation or algorithms described in this document. The use of SDNVs rather than fixed-length fields gives protocol designers the ability to ameliorate the consequences of making difficult-to-reverse field-sizing decisions, as the SDNV format grows and shrinks depending on the particular value encoded. SDNVs do not necessarily provide optimal encodings for values of any particular length; however, they allow protocol designers to avoid potential blunders in assigning fixed lengths and remove the complexity involved with either negotiating field lengths or constructing protocol extensions. However, if SDNVs are used to encode bitfields, it is essential that the sender and receiver have a consistent interpretation of the decoded value. This is discussed further in Section 2. To our knowledge, at this time, no IETF transport or network-layer protocol designed for use outside of the DTN domain has proposed to use SDNVs; however, there is no inherent reason not to use SDNVs more
broadly in the future. The two examples cited here, of fields that have proven too small in general Internet protocols, are only a small sampling of the much larger set of similar instances that the authors can think of. Outside the Internet protocols, within ASN.1 and previous ITU protocols, constructs very similar to SDNVs have been used for many years due to engineering concerns very similar to those facing the DTNRG. Many protocols use a Type-Length-Value method for encoding variable- length fields (e.g., TCP's options format or many of the fields in the Internet Key Exchange Protocol version 2 (IKEv2)). An SDNV is equivalent to combining the length and value portions of this type of field, with the overhead of the length portion amortized out over the bytes of the value. The penalty paid for this in an SDNV may be several extra bytes for long values (e.g., 1024-bit RSA keys). See Section 4 for further discussion and a comparison. As is shown in later sections, for large values, the current SDNV scheme is fairly inefficient in terms of space (1/8 of the bits are overhead) and not particularly easy to encode/decode in comparison to alternatives. The best use of SDNVs may often be to define the Length field of a TLV structure to be an SDNV whose value is the length of the TLV's Value field. In this way, one can avoid forcing large numbers from being directly encoded as an SDNV, yet retain the extensibility that using SDNVs grants. BRF04]) bore a resemblance to the ASN.1 [ASN1] Basic Encoding Rules (BER) [X.690] for lengths (Section 8.1.3 of X.690). The current SDNV format is the one used by ASN.1 BER for encoding tag identifiers greater than or equal to 31 (Section 220.127.116.11.2 of X.690). A comparison between the current SDNV format and the early SDNV format is made in Section 4. The format currently used is very simple. Before encoding, an integer is represented as a left-to-right bitstring beginning with its most significant bit and ending with its least significant bit. If the bitstring's length is not a multiple of 7, then the string is left-padded with zeros. When transmitted, the bits are encoded into a series of bytes. The low-order 7 bits of each byte in the encoded format are taken left-to-right from the integer's bitstring
representation. The most significant bit of each byte specifies whether it is the final byte of the encoded value (when it holds a 0), or not (when it holds a 1). For example: o 1 (decimal) is represented by the bitstring "0000001" and encoded as the single byte 0x01 (in hexadecimal). o 128 is represented by the bitstring "10000001 00000000" and encoded as the bytes 0x81 followed by 0x00. o Other values can be found in the test vectors of the source code in Appendix A. To be perfectly clear, and avoid potential interoperability issues (as have occurred with ASN.1 BER time values), we explicitly state two considerations regarding zero-padding. (1) When encoding SDNVs, any leading (most significant) zero bits in the input number might be discarded by the SDNV encoder. Protocols that use SDNVs should not rely on leading-zeros being retained after encoding and decoding operations. (2) When decoding SDNVs, the relevant number of leading zeros required to pad up to a machine word or other natural data unit might be added. These are put in the most significant positions in order to not change the value of the number. Protocols using SDNVs should consider situations where lost zero-padding may be problematic. The issues of zero-padding are particularly relevant where an SDNV is being used to represent a bitfield to be transmitted by a protocol. The specification of the protocol and any associated IANA registry should specify the allocation and usage of bit positions within the unencoded field. Unassigned and reserved bits in the unencoded field will be treated as zeros by the SDNV encoding prior to transmission. Assuming the bit positions are numbered starting from 0 at the least significant bit position in the integer representation, then if higher-numbered positions in the field contain all zeros, the encoding process may not transmit these bits explicitly (e.g., if all the bit positions numbered 7 or higher are zeros, then the transmitted SDNV can consist of just one octet). On reception, the decoding process will treat any untransmitted higher-numbered bits as zeros. To ensure correct operation of the protocol, the sender and receiver must have a consistent interpretation of the width of the bitfield. This can be achieved in various ways: o the bitfield width is implicitly defined by the version of the protocol in use in the sender and receiver,
o sending the width of the bitfield explicitly in a separate item, o the higher-numbered bits can be safely ignored by the receiver (e.g., because they represent optimizations), or o marking the highest-numbered bit by prepending a '1' bit to the bitfield. The protocol specification must record how the consistent interpretation is achieved. The SDNV encoding technique is also known as Variable Byte Encoding (see Section 5.3.1 of [Manning09]) and is equivalent to Base-128 Elias Gamma Encoding (see Section 5.3.2 of [Manning09] and Section 3.5 of [Sayood02]). However, the primary motivation for SDNVs is to provide an extensible protocol framework rather than optimal data compression, which is the motivation behind the other uses of the technique. [Manning09] points out that the key feature of this encoding is that it is "prefix free" meaning that no code is a prefix of any other, which is an alternative way of expressing the self- delimiting property. Appendix A contains Python source code implementing the routines described here. The algorithms presented here are convenient for converting between an internal data block and serialized data stream associated with a transmission device. Other approaches are possible with different efficiencies and trade-offs.
This encoding algorithm has a time complexity of O(log n), since it takes a number of steps equal to ceil(n/7), and no additional space beyond the size of the result (8/7 log n) is required. One aspect of this algorithm is that it assumes strings can be efficiently appended to new bytes. One way to implement this is to allocate a buffer for the expected length of the result and fill that buffer one byte at a time from the right end. If, for some reason, an implementation requires an encoded SDNV to be some specific length (possibly related to a machine word), any leftmost zero-padding included needs to properly set the high-order bit in each byte of padding. Section 3.3, protocols using SDNVs must specify the largest number of bits that an implementation is expected to handle, which mitigates this problem. o (Initial Step) Set the result to 0. Set an index to the first byte of the encoded SDNV. o (Recursion Step) Shift the result left 7 bits. Add the low-order 7 bits of the value at the index to the result. If the high-order bit under the pointer is a 1, advance the index by one byte within the encoded SDNV and repeat the Recursion Step, otherwise return the current value of the result. This decoding algorithm takes no more additional space than what is required for the result (7/8 the length of the SDNV) and the pointer. The complication is that before the result can be left-shifted in the Recursion Step, an implementation needs to first make sure that this will not cause any bits to be lost, and re-allocate a larger piece of memory for the result, if required. The pure time complexity is the same as for the encoding algorithm given, but if re-allocation is needed due to the inability to predict the size of the result, decoding may be slower. These decoding steps include removal of any leftmost zero-padding that might be used by an encoder to create encodings of a certain length.
Section 4.1 of [RFC5050] specifies that implementations of the DTN Bundle Protocol are not required to handle SDNVs with more than 64 bits in their unencoded value. Accordingly, integer values transmitted in SDNVs have an upper limit and SDNV-encoded flag fields must be limited to 64 bit positions in any future revisions of the protocol unless the restriction is altered. BRF04], and (3) the current SDNV format. The TLV method uses two fixed-length fields to hold the Type and Length elements that then imply the syntax and semantics of the Value element. This is only similar to an SDNV in that the value element can grow or shrink within the bounds capable of being conveyed by the Length field. Two fundamental differences between TLVs and SDNVs are that through the Type element, TLVs also contain some notion of what their contents are semantically, while SDNVs are simply generic non- negative integers, and protocol engineers still have to choose fixed- field lengths for the Type and Length fields in the TLV format. Some protocols use TLVs where the value conveyed within the Length field needs to be decoded into the actual length of the Value field. This may be accomplished through simple multiplication, left- shifting, or a look-up table. In any case, this tactic limits the granularity of the possible Value lengths, and can contribute some degree of bloat if Values do not fit neatly within the available decoded Lengths. In the SDNV format originally used by LTP, parsing the first byte of the SDNV told an implementation how much space was required to hold the contained value. There were two different types of SDNVs defined for different ranges of use. The SDNV-8 type could hold values up to 127 in a single byte, while the SDNV-16 type could hold values up to 32,767 in 2 bytes. Both formats could encode values requiring up to
N bytes in N+2 bytes, where N<127. The major difference between this old SDNV format and the current SDNV format is that the new format is not as easily decoded as the old format was, but the new format also has absolutely no limitation on its length. The advantage in ease of parsing the old format manifests itself in two aspects: (1) the size of the value is determinable ahead of time, in a way equivalent to parsing a TLV, and (2) the actual value is directly encoded and decoded, without shifting and masking bits as is required in the new format. For these reasons, the old format requires less computational overhead to deal with, but is also very limited in that it can only hold a 1024-bit number, at maximum. Since according to IETF Best Current Practices, an asymmetric cryptography key needed to last for a long term requires using moduli of over 1228 bits [RFC3766], this could be seen as a severe limitation of the old style of SDNVs, from which the currently used style does not suffer. Table 1 compares the maximum values that can be encoded into SDNVs of various lengths using the old SDNV-8/16 method and the current SDNV method. The only place in this table where SDNV-16 is used rather than SDNV-8 is in the 2-byte row. Starting with a single byte, the two methods are equivalent, but when using 2 bytes, the old method is a more compact encoding by one bit. From 3 to 7 bytes of length though, the current SDNV format is more compact, since it only requires one bit per byte of overhead, whereas the old format used a full byte. Thus, at 8 bytes, both schemes are equivalent in efficiency since they both use 8 bits of overhead. Up to 129 bytes, the old format is more compact than the current one, although after this, limit it becomes unusable.
+-------+---------------+-------------+---------------+-------------+ | Bytes | SDNV-8/16 | SDNV | SDNV-8/16 | SDNV | | | Maximum Value | Maximum | Overhead Bits | Overhead | | | | Value | | Bits | +-------+---------------+-------------+---------------+-------------+ | 1 | 127 | 127 | 1 | 1 | | | | | | | | 2 | 32,767 | 16,383 | 1 | 2 | | | | | | | | 3 | 65,535 | 2,097,151 | 8 | 3 | | | | | | | | 4 | 2^24 - 1 | 2^28 - 1 | 8 | 4 | | | | | | | | 5 | 2^32 - 1 | 2^35 - 1 | 8 | 5 | | | | | | | | 6 | 2^40 - 1 | 2^42 - 1 | 8 | 6 | | | | | | | | 7 | 2^48 - 1 | 2^49 - 1 | 8 | 7 | | | | | | | | 8 | 2^56 - 1 | 2^56 - 1 | 8 | 8 | | | | | | | | 9 | 2^64 - 1 | 2^63 - 1 | 8 | 9 | | | | | | | | 10 | 2^72 - 1 | 2^70 - 1 | 8 | 10 | | | | | | | | 16 | 2^120 - 1 | 2^112 - 1 | 8 | 16 | | | | | | | | 32 | 2^248 - 1 | 2^224 - 1 | 8 | 32 | | | | | | | | 64 | 2^504 - 1 | 2^448 - 1 | 8 | 64 | | | | | | | | 128 | 2^1016 - 1 | 2^896 - 1 | 8 | 128 | | | | | | | | 129 | 2^1024 - 1 | 2^903 - 1 | 8 | 129 | | | | | | | | 130 | N/A | 2^910 - 1 | N/A | 130 | | | | | | | | 256 | N/A | 2^1792 - 1 | N/A | 256 | +-------+---------------+-------------+---------------+-------------+ Table 1 Suggested usages of the SDNV format that leverage its strengths and limit the effects of its weaknesses are discussed in Section 1.3. Another aspect of the comparison between SDNVs and alternatives using fixed-length fields is the result of errors in transmission. Bit- errors in an SDNV can result in either errors in the decoded value,
or parsing errors in subsequent fields of the protocol. In fixed- length fields, bit errors always result in errors to the decoded value rather than parsing errors in subsequent fields. If the decoded values from either type of field encoding (SDNV or fixed- length) are used as indexes, offsets, or lengths of further fields in the protocol, similar failures result.
[ASN1] ITU-T Rec. X.680, "Abstract Syntax Notation One (ASN.1). Specification of Basic Notation", ISO/IEC 8824-1:2002, 2002. [BRF04] Burleigh, S., Ramadas, M., and S. Farrell, "Licklider Transmission Protocol", Work in Progress, May 2004. [Hain05] Hain, T., "A Pragmatic Report on IPv4 Address Space Consumption", Internet Protocol Journal Vol. 8, No. 3, September 2005. [IEN21] Cerf, V. and J. Postel, "Specification of Internetwork Transmission Control Program: TCP Version 3", Internet Experimental Note 21, January 1978. [Manning09] Manning, c., Raghavan, P., and H. Schuetze, "Introduction to Information Retrieval", Cambridge University Press ISBN-13: 978-0521865715, 2009, <http://informationretrieval.org/>. [RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission Protocol", RFC 713, April 1976. [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981. [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981. [RFC1323] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for High Performance", RFC 1323, May 1992. [RFC2993] Hain, T., "Architectural Implications of NAT", RFC 2993, November 2000. [RFC3766] Orman, H. and P. Hoffman, "Determining Strengths For Public Keys Used For Exchanging Symmetric Keys", BCP 86, RFC 3766, April 2004. [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly Errors at High Data Rates", RFC 4963, July 2007. [RFC5050] Scott, K. and S. Burleigh, "Bundle Protocol Specification", RFC 5050, November 2007.
[RFC5325] Burleigh, S., Ramadas, M., and S. Farrell, "Licklider Transmission Protocol - Motivation", RFC 5325, September 2008. [RFC5326] Ramadas, M., Burleigh, S., and S. Farrell, "Licklider Transmission Protocol - Specification", RFC 5326, September 2008. [Sayood02] Sayood, K., "Lossless Data Compression", Academic Press ISBN-13: 978-0126208610, December 2002, <http://books.google.co.uk/books?id=LjQiGwyabVwC>. [X.690] ITU-T Rec. X.690, "Abstract Syntax Notation One (ASN.1). Encoding Rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER)", ISO/IEC 8825-1:2002, 2002.
flag = 0x80 if n == 0: done = True return r # test cases from LTP and BP internet-drafts, only print failures def sdnv_test(): tests = [(0xABC, chr(0x95) + chr(0x3C)), (0x1234, chr(0xA4) + chr (0x34)), (0x4234, chr(0x81) + chr(0x84) + chr(0x34)), (0x7F, chr(0x7F))] for tp in tests: # test encoding function if sdnv_encode(tp) != tp: print "sdnv_encode fails on input %s" % hex(tp) # test decoding function if sdnv_decode(tp) != tp: print "sdnv_decode fails on input %s, giving %s" % \ (hex(tp), sdnv_decode(tp))