Tech-invite3GPPspaceIETFspace
959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 6940

REsource LOcation And Discovery (RELOAD) Base Protocol

Pages: 176
Proposed Standard
Errata
Part 4 of 7 – Pages 67 to 92
First   Prev   Next

Top   ToC   RFC6940 - Page 67   prevText

6.5. Forwarding and Link Management Layer

Each node maintains connections to a set of other nodes defined by the Topology Plug-in. This section defines the methods RELOAD uses to form and maintain connections between nodes in the overlay. Three methods are defined: Attach Used to form RELOAD connections between nodes using ICE for NAT traversal. When node A wants to connect to node B, it sends an Attach message to node B through the overlay. The Attach contains A's ICE parameters. B responds with its ICE parameters, and the two nodes perform ICE to form connection. Attach also allows two nodes to connect via No-ICE instead of full ICE. AppAttach Used to form application-layer connections between nodes. Ping A simple request/response which is used to verify connectivity of the target peer.

6.5.1. Attach

A node sends an Attach request when it wishes to establish a direct Overlay Link connection to another node for the purpose of sending RELOAD messages. A client that can establish a connection directly need not send an Attach, as described in the second bullet of Section 4.2.1. As described in Section 6.1, an Attach may be routed to either a Node-ID or a Resource-ID. An Attach routed to a specific Node-ID will fail if that node is not reached. An Attach routed to a Resource-ID will establish a connection with the peer currently responsible for that Resource-ID, which may be useful in establishing a direct connection to the responsible peer for use with frequent or large resource updates. An Attach, in and of itself, does not result in updating the Routing Table of either node. That function is performed by Updates. If node A has Attached to node B, but has not received any Updates from B, it MAY route messages which are directly addressed to B through that channel, but it MUST NOT route messages through B to other peers via that channel. The process of Attaching is separate from the process of becoming a peer (using Join and Update), to prevent half- open states where a node has started to form connections but is not really ready to act as a peer. Thus, clients (unlike peers) can simply Attach without sending Join or Update.
Top   ToC   RFC6940 - Page 68
6.5.1.1. Request Definition
An Attach request message contains the requesting node ICE connection parameters formatted into a binary structure. enum { invalidOverlayLinkType(0), DTLS-UDP-SR(1), DTLS-UDP-SR-NO-ICE(3), TLS-TCP-FH-NO-ICE(4), (255) } OverlayLinkType; enum { invalidCandType(0), host(1), srflx(2), /* RESERVED(3), */ relay(4), (255) } CandType; struct { opaque name<0..2^16-1>; opaque value<0..2^16-1>; } IceExtension; struct { IpAddressPort addr_port; OverlayLinkType overlay_link; opaque foundation<0..255>; uint32 priority; CandType type; select (type) { case host: ; /* Empty */ case srflx: case relay: IpAddressPort rel_addr_port; }; IceExtension extensions<0..2^16-1>; } IceCandidate; struct { opaque ufrag<0..2^8-1>; opaque password<0..2^8-1>; opaque role<0..2^8-1>; IceCandidate candidates<0..2^16-1>; Boolean send_update; } AttachReqAns; The values contained in AttachReqAns are: ufrag The username fragment (from ICE).
Top   ToC   RFC6940 - Page 69
   password
      The ICE password.

   role
      An active/passive/actpass attribute from RFC 4145 [RFC4145].  This
      value MUST be "passive" for the offerer (the peer sending the
      Attach request) and "active" for the answerer (the peer sending
      the Attach response).

   candidates
      One or more ICE candidate values, as described below.

   send_update
      Has the same meaning as the send_update field in RouteQueryReq.

   Each ICE candidate is represented as an IceCandidate structure, which
   is a direct translation of the information from the ICE string
   structures, with the exception of the component ID.  Since there is
   only one component, it is always 1, and thus left out of the
   structure.  The remaining values are specified as follows:

   addr_port
      Corresponds to the ICE connection-address and port productions.

   overlay_link
      Corresponds to the ICE transport production.  Overlay Link
      protocols used with No-ICE MUST specify "No-ICE" in their
      description.  Future overlay link values can be added by defining
      new OverlayLinkType values in the IANA registry as described in
      Section 14.10.  Future extensions to the encapsulation or framing
      that provide for backward compatibility with the previously
      specified encapsulation or framing values MUST use the same
      OverlayLinkType value that was previously defined.
      OverlayLinkType protocols are defined in Section 6.6

      A single AttachReqAns MUST NOT include both candidates whose
      OverlayLinkType protocols use ICE (the default) and candidates
      that specify "No-ICE".

   foundation
      Corresponds to the ICE foundation production.

   priority
      Corresponds to the ICE priority production.

   type
      Corresponds to the ICE cand-type production.
Top   ToC   RFC6940 - Page 70
   rel_addr_port
      Corresponds to the ICE rel-addr and rel-port productions.  It is
      present only for types "relay", "prfix", and "srflx".

   extensions
      ICE extensions.  The name and value fields correspond to binary
      translations of the equivalent fields in the ICE extensions.

   These values should be generated using the procedures described in
   Section 6.5.1.3.

6.5.1.2. Response Definition
If a peer receives an Attach request, it MUST determine how to process the request as follows: o If the peer has not initiated an Attach request to the originating peer of this Attach request, it MUST process this request and SHOULD generate its own response with an AttachReqAns. It should then begin ICE checks. o If the peer has already sent an Attach request to and received the response from the originating peer of this Attach request and, as a result, an ICE check and TLS connection are in progress, then it SHOULD generate an Error_In_Progress error instead of an AttachReqAns. o If the peer has already sent an Attach request to but not yet received the response from the originating peer of this Attach request, it SHOULD apply the following tie-breaker heuristic to determine how to handle this Attach request and the incomplete Attach request it has sent out: * If the peer's own Node-ID is smaller when compared as big- endian unsigned integers, it MUST cancel retransmission of its own incomplete Attach request. It MUST then process this Attach request, generate an AttachReqAns response, and proceed with the corresponding ICE check. * If the peer's own Node-ID is larger when compared as big-endian unsigned integers, it MUST generate an Error_In_Progress error to this Attach request, and then proceed to wait for and complete the Attach and the corresponding ICE check it has originated. o If the peer is overloaded or detects some other kind of error, it MAY generate an error instead of an AttachReqAns.
Top   ToC   RFC6940 - Page 71
   When a peer receives an Attach response, it SHOULD parse the response
   and begin its own ICE checks.

6.5.1.3. Using ICE with RELOAD
This section describes the profile of ICE that is used with RELOAD. RELOAD implementations MUST implement full ICE. In ICE, as defined by [RFC5245], the Session Description Protocol (SDP) is used to carry the ICE parameters. In RELOAD, this function is performed by a binary encoding in the Attach method. This encoding is more restricted than the SDP encoding because the RELOAD environment is simpler: o Only a single media stream is supported. o In this case, the "stream" refers not to RTP or other types of media, but rather to a connection for RELOAD itself or other application-layer protocols, such as SIP. o RELOAD allows only for a single offer/answer exchange. Unlike the usage of ICE within SIP, there is never a need to send a subsequent offer to update the default candidates to match the ones selected by ICE. An agent follows the ICE specification as described in [RFC5245] with the changes and additional procedures described in the subsections below.
6.5.1.4. Collecting STUN Servers
ICE relies on the node having one or more Session Traversal Utilities for NAT (STUN) servers to use. In conventional ICE, it is assumed that nodes are configured with one or more STUN servers through some out-of-band mechanism. This is still possible in RELOAD, but RELOAD also learns STUN servers as it connects to other peers. A peer on a well-provisioned wide-area overlay will be configured with one or more bootstrap nodes. These nodes make an initial list of STUN servers. However, as the peer forms connections with additional peers, it builds more peers that it can use like STUN servers. Because complicated NAT topologies are possible, a peer may need more than one STUN server. Specifically, a peer that is behind a single NAT will typically observe only two IP addresses in its STUN checks: its local address and its server reflexive address from a STUN server outside its NAT. However, if more NATs are involved, a peer may
Top   ToC   RFC6940 - Page 72
   learn additional server reflexive addresses (which vary based on
   where in the topology the STUN server is).  To maximize the chance of
   achieving a direct connection, a peer SHOULD group other peers by the
   peer-reflexive addresses it discovers through them.  It SHOULD then
   select one peer from each group to use as a STUN server for future
   connections.

   Only peers to which the peer currently has connections may be used.
   If the connection to that host is lost, it MUST be removed from the
   list of STUN servers, and a new server from the same group MUST be
   selected unless there are no others servers in the group, in which
   case some other peer MAY be used.

6.5.1.5. Gathering Candidates
When a node wishes to establish a connection for the purposes of RELOAD signaling or application signaling, it follows the process of gathering candidates as described in Section 4 of ICE [RFC5245]. RELOAD utilizes a single component. Consequently, gathering for these "streams" requires a single component. In the case where a node has not yet found a TURN server, the agent would not include a relayed candidate. The ICE specification assumes that an ICE agent is configured with, or somehow knows of, TURN and STUN servers. RELOAD provides a way for an agent to learn these by querying the overlay, as described in Sections 6.5.1.4 and 9. The default candidate selection described in Section 4.1.4 of ICE is ignored; defaults are not signaled or utilized by RELOAD. An alternative to using the full ICE supported by the Attach request is to use the No-ICE mechanism by providing candidates with "No-ICE" Overlay Link protocols. Configuration for the overlay indicates whether or not these Overlay Link protocols can be used. An overlay MUST be either all ICE or all No-ICE. No-ICE will not work in all the scenarios where ICE would work, but in some cases, particularly those with no NATs or firewalls, it will work.
6.5.1.6. Prioritizing Candidates
Standardization of additional protocols for use with ICE is expected, including TCP [RFC6544] and protocols such as the Stream Control Transmission Protocol (SCTP) [RFC4960] and Datagram Congestion Control Protocol (DCCP) [RFC4340]. UDP encapsulations for SCTP and DCCP would expand the Overlay Link protocols available for RELOAD.
Top   ToC   RFC6940 - Page 73
   When additional protocols are available, the following prioritization
   is RECOMMENDED:

   o  Highest priority is assigned to protocols that offer well-
      understood congestion and flow control without head-of-line
      blocking, for example, SCTP without message ordering, DCCP, and
      those protocols encapsulated using UDP.

   o  Second highest priority is assigned to protocols that offer well-
      understood congestion and flow control, but that have head-of-line
      blocking, such as TCP.

   o  Lowest priority is assigned to protocols encapsulated over UDP
      that do not implement well-established congestion control
      algorithms.  The DTLS/UDP with Simple Reliability (SR) overlay
      link protocol is an example of such a protocol.

   Head-of-line blocking is undesirable in an Overlay Link protocol,
   because the messages carried on a RELOAD link are independent, rather
   than stream-oriented.  Therefore, if message N on a link is lost,
   delaying message N+1 on that same link until N is successfully
   retransmitted does nothing other than increase the latency for the
   transaction of message N+1, as they are unrelated to each other.
   Therefore, while the high quality, performance, and availability of
   modern TCP implementations makes them very attractive, their
   performance as Overlay Link protocols is not optimal.

   Note that none of the protocols defined in this document meets these
   conditions, but it is expected that new Overlay Link protocols
   defined in the future will fill this gap.

6.5.1.7. Encoding the Attach Message
Section 4.3 of ICE describes procedures for encoding the SDP for conveying RELOAD candidates. Instead of actually encoding an SDP message, the candidate information (IP address and port and transport protocol, priority, foundation, type, and related address) is carried within the attributes of the Attach request or its response. Similarly, the username fragment and password are carried in the Attach message or its response. Section 6.5.1 describes the detailed attribute encoding for Attach. The Attach request and its response do not contain any default candidates or the ice-lite attribute, as these features of ICE are not used by RELOAD. Since the Attach request contains the candidate information and short term credentials, it is considered as an offer for a single media stream that happens to be encoded in a format different than SDP, but is otherwise considered a valid offer for the purposes of following
Top   ToC   RFC6940 - Page 74
   the ICE specification.  Similarly, the Attach response is considered
   a valid answer for the purposes of following the ICE specification.

6.5.1.8. Verifying ICE Support
An agent MUST skip the verification procedures in Sections 5.1 and 6.1 of ICE. Since RELOAD requires full ICE from all agents, this check is not required.
6.5.1.9. Role Determination
The roles of controlling and controlled, as described in Section 5.2 of ICE, are still utilized with RELOAD. However, the offerer (the entity sending the Attach request) will always be controlling, and the answerer (the entity sending the Attach response) will always be controlled. The connectivity checks MUST still contain the ICE- CONTROLLED and ICE-CONTROLLING attributes, however, even though the role reversal capability for which they are defined will never be needed with RELOAD. This is to allow for a common codebase between ICE for RELOAD and ICE for SDP.
6.5.1.10. Full ICE
When the overlay uses ICE, connectivity checks and nominations are used as in regular ICE.
6.5.1.10.1. Connectivity Checks
The processes of forming check lists in Section 5.7 of ICE, scheduling checks in Section 5.8, and checking connectivity checks in Section 7 are used with RELOAD without change.
6.5.1.10.2. Concluding ICE
The procedures in Section 8 of ICE are followed to conclude ICE, with the following exceptions: o The controlling agent MUST NOT attempt to send an updated offer once the state of its single media stream reaches Completed. o Once the state of ICE reaches Completed, the agent can immediately free all unused candidates. This is because RELOAD does not have the concept of forking, and thus the three-second delay in Section 8.3 of ICE does not apply.
Top   ToC   RFC6940 - Page 75
6.5.1.10.3. Media Keepalives
STUN MUST be utilized for the keepalives described in Section 10 of ICE.
6.5.1.11. No-ICE
No-ICE is selected when either side has provided "no ICE" Overlay Link candidates. STUN is not used for connectivity checks when doing No-ICE; instead, the DTLS or TLS handshake (or similar security layer of future overlay link protocols) forms the connectivity check. The certificate exchanged during the TLS or DTLS handshake MUST match the node which sent the AttachReqAns, and if it does not, the connection MUST be closed.
6.5.1.12. Subsequent Offers and Answers
An agent MUST NOT send a subsequent offer or answer. Thus, the procedures in Section 9 of ICE MUST be ignored.
6.5.1.13. Sending Media
The procedures of Section 11 of ICE apply to RELOAD as well. However, in this case, the "media" takes the form of application- layer protocols (e.g., RELOAD) over TLS or DTLS. Consequently, once ICE processing completes, the agent will begin TLS or DTLS procedures to establish a secure connection. The node that sent the Attach request MUST be the TLS server. The other node MUST be the TLS client. The server MUST request TLS client authentication. The nodes MUST verify that the certificate presented in the handshake matches the identity of the other peer as found in the Attach message. Once the TLS or DTLS signaling is complete, the application protocol is free to use the connection. The concept of a previous selected pair for a component does not apply to RELOAD, since ICE restarts are not possible with RELOAD.
6.5.1.14. Receiving Media
An agent MUST be prepared to receive packets for the application protocol (TLS or DTLS carrying RELOAD) at any time. The jitter and RTP considerations in Section 11 of ICE do not apply to RELOAD.

6.5.2. AppAttach

A node sends an AppAttach request when it wishes to establish a direct connection to another node for the purposes of sending application-layer messages. AppAttach is nearly identical to Attach,
Top   ToC   RFC6940 - Page 76
   except for the purpose of the connection: it is used to transport
   non-RELOAD "media".  A separate request is used to avoid implementer
   confusion between the two methods (this was found to be a real
   problem with initial implementations).  The AppAttach request and its
   response contain an application attribute, which indicates what
   protocol is to be run over the connection.

6.5.2.1. Request Definition
An AppAttachReq message contains the requesting node's ICE connection parameters formatted into a binary structure. struct { opaque ufrag<0..2^8-1>; opaque password<0..2^8-1>; uint16 application; opaque role<0..2^8-1>; IceCandidate candidates<0..2^16-1>; } AppAttachReq; The values contained in AppAttachReq and AppAttachAns are: ufrag The username fragment (from ICE). password The ICE password. application A 16-bit Application-ID, as defined in the Section 14.5. This number represents the IANA-registered application that is going to send data on this connection. role An active/passive/actpass attribute from RFC 4145 [RFC4145]. candidates One or more ICE candidate values. The application using the connection that is set up with this request is responsible for providing traffic of sufficient frequency to keep the NAT and Firewall binding alive. Applications will often send traffic every 25 seconds to ensure this.
Top   ToC   RFC6940 - Page 77
6.5.2.2. Response Definition
If a peer receives an AppAttach request, it SHOULD process the request and generate its own response with a AppAttachAns. It should then begin ICE checks. When a peer receives an AppAttach response, it SHOULD parse the response and begin its own ICE checks. If the Application ID is not supported, the peer MUST reply with an Error_Not_Found error. struct { opaque ufrag<0..2^8-1>; opaque password<0..2^8-1>; uint16 application; opaque role<0..2^8-1>; IceCandidate candidates<0..2^16-1>; } AppAttachAns; The meaning of the fields is the same as in the AppAttachReq.

6.5.3. Ping

Ping is used to test connectivity along a path. A ping can be addressed to a specific Node-ID, to the peer controlling a given location (by using a Resource-ID), or to the wildcard Node-ID.
6.5.3.1. Request Definition
The PingReq structure is used to make a Ping request. struct { opaque<0..2^16-1> padding; } PingReq; The Ping request is empty of meaningful contents. However, it may contain up to 65535 bytes of padding to facilitate the discovery of overlay maximum packet sizes.
6.5.3.2. Response Definition
A successful PingAns response contains the information elements requested by the peer. struct { uint64 response_id; uint64 time; } PingAns;
Top   ToC   RFC6940 - Page 78
   A PingAns message contains the following elements:

   response_id
      A randomly generated 64-bit response ID.  This is used to
      distinguish Ping responses.

   time
      The time when the Ping response was created, represented in the
      same way as storage_time, defined in Section 7.

6.5.4. ConfigUpdate

The ConfigUpdate method is used to push updated configuration data across the overlay. Whenever a node detects that another node has old configuration data, it MUST generate a ConfigUpdate request. The ConfigUpdate request allows updating of two kinds of data: the configuration data (Section 6.3.2.1) and the Kind information (Section 7.4.1.1).
6.5.4.1. Request Definition
The ConfigUpdateReq structure is used to provide updated configuration information. enum { invalidConfigUpdateType(0), config(1), kind(2), (255) } ConfigUpdateType; typedef uint32 KindId; typedef opaque KindDescription<0..2^16-1>; struct { ConfigUpdateType type; uint32 length; select (type) { case config: opaque config_data<0..2^24-1>; case kind: KindDescription kinds<0..2^24-1>; /* This structure may be extended with new types */ }; } ConfigUpdateReq;
Top   ToC   RFC6940 - Page 79
   The ConfigUpdateReq message contains the following elements:

   type
      The type of the contents of the message.  This structure allows
      for unknown content types.

   length
      The length of the remainder of the message.  This is included to
      preserve backward compatibility and is 32 bits instead of 24 to
      facilitate easy conversion between network and host byte order.

   config_data (type==config)
      The contents of the Configuration Document.

   kinds (type==kind)
      One or more XML kind-block productions (see Section 11.1).  These
      MUST be encoded with UTF-8 and assume a default namespace of
      "urn:ietf:params:xml:ns:p2p:config-base".

6.5.4.2. Response Definition
The ConfigUpdateAns structure is used to respond to a ConfigUpdateReq request. struct { } ConfigUpdateAns; If the ConfigUpdateReq is of type "config", it MUST be processed only if all the following are true: o The sequence number in the document is greater than the current configuration sequence number. o The Configuration Document is correctly digitally signed (see Section 11 for details on signatures). Otherwise, appropriate errors MUST be generated. If the ConfigUpdateReq is of type "kind", it MUST be processed only if it is correctly digitally signed by an acceptable Kind signer (i.e., one listed in the current configuration file). Details on the kind-signer field in the configuration file are described in Section 11.1. In addition, if the Kind update conflicts with an existing known Kind (i.e., it is signed by a different signer), then it should be rejected with an Error_Forbidden error. This should not happen in correctly functioning overlays.
Top   ToC   RFC6940 - Page 80
   If the update is acceptable, then the node MUST reconfigure itself to
   match the new information.  This may include adding permissions for
   new Kinds, deleting old Kinds, or even, in extreme circumstances,
   exiting and re-entering the overlay, if, for instance, the DHT
   algorithm has changed.

   If an implementation misses enough ConfigUpdates that include key
   changes, it is possible that it will no longer be able to verify new
   valid ConfigUpdates.  In this case, the only available recovery
   mechanism is to attempt to retrieve a new Configuration Document,
   typically by the mechanisms used for initial bootstrapping.  It is up
   to implementers whether or how to decide to employ this sort of
   recovery mechanism.

   The response for ConfigUpdate is empty.

6.6. Overlay Link Layer

RELOAD can use multiple Overlay Link protocols to send its messages. Because ICE is used to establish connections (see Section 6.5.1.3), RELOAD nodes are able to detect which Overlay Link protocols are offered by other nodes and establish connections between them. Any link protocol needs to be able to establish a secure, authenticated connection and to provide data origin authentication and message integrity for individual data elements. RELOAD currently supports three Overlay Link protocols: o DTLS [RFC6347] over UDP with Simple Reliability (SR) (OverlayLinkType=DTLS-UDP-SR) o TLS [RFC5246] over TCP with Framing Header, No-ICE (OverlayLinkType=TLS-TCP-FH-NO-ICE) o DTLS [RFC6347] over UDP with SR, No-ICE (OverlayLinkType=DTLS-UDP-SR-NO-ICE) Note that although UDP does not properly have "connections", both TLS and DTLS have a handshake that establishes a similar, stateful association. We refer to these as "connections" for the purposes of this document. If a peer receives a message that is larger than the value of max- message-size defined in the overlay configuration, the peer SHOULD send an Error_Message_Too_Large error and then close the TLS or DTLS session from which the message was received. Note that this error can be sent and the session closed before the peer receives the complete message. If the forwarding header is larger than the max-
Top   ToC   RFC6940 - Page 81
   message-size, the receiver SHOULD close the TLS or DTLS session
   without sending an error.

   The RELOAD mechanism requires that failed links be quickly removed
   from the Routing Table so end-to-end retransmission can handle lost
   messages.  Overlay Link protocols MUST be designed with a mechanism
   that quickly signals a likely failure, and implementations SHOULD
   quickly act to remove a failed link from the Routing Table when
   receiving this signal.  The entry can be restored if it proves to
   resume functioning, or it can be replaced at some point in the future
   if necessary.  Section 10.7.2 contains more details specific to the
   CHORD-RELOAD Topology Plug-in.

   The Framing Header (FH) is used to frame messages and provide timing
   when used on a reliable stream-based transport protocol.  Simple
   Reliability (SR) uses the FH to provide congestion control and
   partial reliability when using unreliable message-oriented transport
   protocols.  We will first define each of these algorithms in Sections
   6.6.2 and 6.6.3, and then define Overlay Link protocols that use them
   in Sections 6.6.4, 6.6.5, and 6.6.6.

   Note: We expect future Overlay Link protocols to define replacements
   for all components of these protocols, including the Framing Header.
   The three protocols that we will discuss have been chosen for
   simplicity of implementation and reasonable performance.

6.6.1. Future Overlay Link Protocols

It is possible to define new link-layer protocols and apply them to a new overlay using the "overlay-link-protocol" configuration directive (see Section 11.1.). However, any new protocols MUST meet the following requirements: Endpoint authentication: When a node forms an association with another endpoint, it MUST be possible to cryptographically verify that the endpoint has a given Node-ID. Traffic origin authentication and integrity: When a node receives traffic from another endpoint, it MUST be possible to cryptographically verify that the traffic came from a given association and that it has not been modified in transit from the other endpoint in the association. The overlay link protocol MUST also provide replay prevention/detection. Traffic confidentiality: When a node sends traffic to another endpoint, it MUST NOT be possible for a third party that is not involved in the association to determine the contents of that traffic.
Top   ToC   RFC6940 - Page 82
   Any new overlay protocol MUST be defined via Standards Action
   [RFC5226].  See Section 14.11.

6.6.1.1. HIP
In a Host Identity Protocol Based Overlay Networking Environment (HIP BONE) [RFC6079], HIP [RFC5201] provides connection management (e.g., NAT traversal and mobility) and security for the overlay network. The P2PSIP Working Group has expressed interest in supporting a HIP- based link protocol. Such support would require specifying such details as: o How to issue certificates which provide identities meaningful to the HIP base exchange. We anticipate that this would require a mapping between Overlay Routable Cryptographic Hash Identifiers (ORCHIDs) and NodeIds. o How to carry the HIP I1 and I2 messages. o How to carry RELOAD messages over HIP. [HIP-RELOAD] documents work in progress on using RELOAD with the HIP BONE.
6.6.1.2. ICE-TCP
The ICE-TCP RFC [RFC6544] allows TCP to be supported as an Overlay Link protocol that can be added using ICE.
6.6.1.3. Message-Oriented Transports
Modern message-oriented transports offer high performance and good congestion control, and they avoid head-of-line blocking in case of lost data. These characteristics make them preferable as underlying transport protocols for RELOAD links. SCTP without message ordering and DCCP are two examples of such protocols. However, currently they are not well-supported by commonly available NATs, and specifications for ICE session establishment are not available.
6.6.1.4. Tunneled Transports
As of the time of this writing, there is significant interest in the IETF community in tunneling other transports over UDP, which is motivated by the situation that UDP is well-supported by modern NAT hardware and by the fact that performance similar to a native implementation can be achieved. Currently, SCTP, DCCP, and a generic tunneling extension are being proposed for message-oriented protocols. Once ICE traversal has been specified for these tunneled
Top   ToC   RFC6940 - Page 83
   protocols, they should be straightforward to support as overlay link
   protocols.

6.6.2. Framing Header

In order to support unreliable links and to allow for quick detection of link failures when using reliable end-to-end transports, each message is wrapped in a very simple framing layer (FramedMessage), which is used only for each hop. This layer contains a sequence number which can then be used for ACKs. The same header is used for both reliable and unreliable transports for simplicity of implementation. The definition of FramedMessage is: enum { data(128), ack(129), (255) } FramedMessageType; struct { FramedMessageType type; select (type) { case data: uint32 sequence; opaque message<0..2^24-1>; case ack: uint32 ack_sequence; uint32 received; }; } FramedMessage; The type field of the PDU is set to indicate whether the message is data or an acknowledgement. If the message is of type "data", then the remainder of the PDU is as follows: sequence The sequence number. This increments by one for each framed message sent over this transport session. message The message that is being transmitted. Each connection has it own sequence number space. Initially, the value is zero, and it increments by exactly one for each message sent over that connection.
Top   ToC   RFC6940 - Page 84
   When the receiver receives a message, it SHOULD immediately send an
   ACK message.  The receiver MUST keep track of the 32 most recent
   sequence numbers received on this association in order to generate
   the appropriate ACK.

   If the PDU is of type "ack", the contents are as follows:

   ack_sequence
      The sequence number of the message being acknowledged.

   received
      A bitmask indicating if each of the previous 32 sequence numbers
      before this packet has been among the 32 packets most recently
      received on this connection.  When a packet is received with a
      sequence number N, the receiver looks at the sequence number of
      the 32 previously received packets on this connection.  We call
      the previously received packet number M.  For each of the previous
      32 packets, if the sequence number M is less than N but greater
      than N-32, the N-M bit of the received bitmask is set to one;
      otherwise, it is set to zero.  Note that a bit being set to one
      indicates positively that a particular packet was received, but a
      bit being set to zero means only that it is unknown whether or not
      the packet has been received, because it might have been received
      before the 32 most recently received packets.

   The received field bits in the ACK provide a high degree of
   redundancy so that the sender can figure out which packets the
   receiver has received and can then estimate packet loss rates.  If
   the sender also keeps track of the time at which recent sequence
   numbers have been sent, the RTT (round-trip time) can be estimated.

   Note that because retransmissions receive new sequence numbers,
   multiple ACKs may be received for the same message.  This approach
   provides more information than traditional TCP sequence numbers, but
   care must be taken when applying algorithms designed based on TCP's
   stream-oriented sequence number.

6.6.3. Simple Reliability

When RELOAD is carried over DTLS or another unreliable link protocol, it needs to be used with a reliability and congestion control mechanism, which is provided on a hop-by-hop basis. The basic principle is that each message, regardless of whether or not it carries a request or response, will get an ACK and be reliably retransmitted. The receiver's job is very simple, and is limited to just sending ACKs. All the complexity is at the sender side. This allows the sending implementation to trade off performance versus implementation complexity without affecting the wire protocol.
Top   ToC   RFC6940 - Page 85
   Because the receiver's role is limited to providing packet
   acknowledgements, a wide variety of congestion control algorithms can
   be implemented on the sender side while using the same basic wire
   protocol.  The sender algorithm used MUST meet the requirements of
   [RFC5405].

6.6.3.1. Stop and Wait Sender Algorithm
This section describes one possible implementation of a sender algorithm for Simple Reliability. It is adequate for overlays running on underlying networks with low latency and loss (LANs) or low-traffic overlays on the Internet. A node MUST NOT have more than one unacknowledged message on the DTLS connection at a time. Note that because retransmissions of the same message are given new sequence numbers, there may be multiple unacknowledged sequence numbers in use. The RTO (Retransmission TimeOut) is based on an estimate of the RTT. The value for RTO is calculated separately for each DTLS session. Implementations can use a static value for RTO or a dynamic estimate, which will result in better performance. For implementations that use a static value, the default value for RTO is 500 ms. Nodes MAY use smaller values of RTO if it is known that all nodes are within the local network. The default RTO MAY be set to a larger value, which is RECOMMENDED if it is known in advance (such as on high- latency access links) that the RTT is larger. Implementations that use a dynamic estimate to compute the RTO MUST use the algorithm described in RFC 6298 [RFC6298], with the exception that the value of RTO SHOULD NOT be rounded up to the nearest second, but instead rounded up to the nearest millisecond. The RTT of a successful STUN transaction from the ICE stage is used as the initial measurement for formula 2.2 of RFC 6298. The sender keeps track of the time each message was sent for all recently sent messages. Any time an ACK is received, the sender can compute the RTT for that message by looking at the time the ACK was received and the time when the message was sent. This is used as a subsequent RTT measurement for formula 2.3 of RFC 6298 to update the RTO estimate. (Note that because retransmissions receive new sequence numbers, all received ACKs are used.) An initiating node SHOULD retransmit a message if it has not received an ACK after an interval of RTO (transit nodes do not retransmit at this layer). The node MUST double the time to wait after each retransmission. For each retransmission, the sequence number MUST be incremented.
Top   ToC   RFC6940 - Page 86
   Retransmissions continue until a response is received, until a total
   of 5 requests have been sent, until there has been a hard ICMP error
   [RFC1122], or until a TLS alert indicating the end of the connection
   has been sent or received.  The sender knows a response was received
   when it receives an ACK with a sequence number that indicates it is a
   response to one of the transmissions of this message.  For example,
   assuming an RTO of 500 ms, requests would be sent at times 0 ms, 500
   ms, 1500 ms, 3500 ms, and 7500 ms.  If all retransmissions for a
   message fail, then the sending node SHOULD close the connection
   routing the message.

   To determine when a link might be failing without waiting for the
   final timeout, observe when no ACKs have been received for an entire
   RTO interval, and then wait for three retransmissions to occur beyond
   that point.  If no ACKs have been received by the time the third
   retransmission occurs, it is RECOMMENDED that the link be removed
   from the Routing Table.  The link MAY be restored to the Routing
   Table if ACKs resume before the connection is closed, as described
   above.

   A sender MUST wait 10 ms between receipt of an ACK and transmission
   of the next message.

6.6.4. DTLS/UDP with SR

This overlay link protocol consists of DTLS over UDP while implementing the SR protocol. STUN connectivity checks and keepalives are used. Any compliant sender algorithm may be used.

6.6.5. TLS/TCP with FH, No-ICE

This overlay link protocol consists of TLS over TCP with the framing header. Because ICE is not used, STUN connectivity checks are not used upon establishing the TCP connection, nor are they used for keepalives. Because the TCP layer's application-level timeout is too slow to be useful for overlay routing, the Overlay Link implementation MUST use the framing header to measure the RTT of the connection and calculate an RTO as specified in Section 2 of [RFC6298]. The resulting RTO is not used for retransmissions, but rather as a timeout to indicate when the link SHOULD be removed from the Routing Table. It is RECOMMENDED that such a connection be retained for 30 seconds to determine if the failure was transient before concluding the link has failed permanently. When sending candidates for TLS/TCP with FH, No-ICE, a passive candidate MUST be provided.
Top   ToC   RFC6940 - Page 87

6.6.6. DTLS/UDP with SR, No-ICE

This overlay link protocol consists of DTLS over UDP while implementing the Simple Reliability protocol. Because ICE is not used, no STUN connectivity checks or keepalives are used.

6.7. Fragmentation and Reassembly

In order to allow transmission over datagram protocols such as DTLS, RELOAD messages may be fragmented. Any node along the path can fragment the message, but only the final destination reassembles the fragments. When a node takes a packet and fragments it, each fragment has a full copy of the forwarding header, but the data after the forwarding header is broken up into appropriately sized chunks. The size of the payload chunks needs to take into account space to allow the Via and Destination Lists to grow. Each fragment MUST contain a full copy of the Via List, Destination List, and ForwardingOptions and MUST contain at least 256 bytes of the message body. If these elements cannot fit within the MTU of the underlying datagram protocol, RELOAD fragmentation is not performed, and IP-layer fragmentation is allowed to occur. The length field MUST contain the size of the message after fragmentation. When a message MUST be fragmented, it SHOULD be split into equal-sized fragments that are no larger than the Path MTU (PMTU) of the next overlay link minus 32 bytes. This is to allow the Via List to grow before further fragmentation is required. Note that this fragmentation is not optimal for the end-to-end path -- a message may be refragmented multiple times as it traverses the overlay, but it is assembled only at the final destination. This option has been chosen as it is far easier to implement than end-to- end (e2e) PMTU discovery across an ever-changing overlay and it effectively addresses the reliability issues of relying on IP-layer fragmentation. However, Ping can be used to allow e2e PMTU discovery to be implemented if desired. Upon receipt of a fragmented message by the intended peer, the peer holds the fragments in a holding buffer until the entire message has been received. The message is then reassembled into a single message and processed. In order to mitigate denial-of-service (DoS) attacks, receivers SHOULD time out incomplete fragments after the maximum request lifetime (15 seconds). This time was derived from looking at the end-to-end retransmission time and saving fragments long enough for the full end-to-end retransmissions to take place. Ideally, the receiver would have enough buffer space to deal with as many fragments as can arrive in the maximum request lifetime. However, if
Top   ToC   RFC6940 - Page 88
   the receiver runs out of buffer space to reassemble a message, it
   MUST drop the message.

   The fragment field of the forwarding header is used to encode
   fragmentation information.  The offset is the number of bytes between
   the end of the forwarding header and the start of the data.  The
   first fragment therefore has an offset of 0.  The last fragment
   indicator MUST be appropriately set.  If the message is not
   fragmented, it is simply treated as if it is the only fragment: the
   last fragment bit is set and the offset is 0, resulting in a fragment
   value of 0xC0000000.

   Note: The reason for this definition of the fragment field is that
   originally, the high bit was defined in part of the specification as
   "is fragmented", so there was some specification ambiguity about how
   to encode messages with only one fragment.  This ambiguity was
   resolved in favor of always encoding as the "last" fragment with
   offset 0, thus simplifying the receiver code path, but resulting in
   the high bit being redundant.  Because messages MUST be set with the
   high bit set to 1, implementations SHOULD discard any message with it
   set to 0.  Implementations (presumably legacy ones) which choose to
   accept such messages MUST either ignore the remaining bits or ensure
   that they are 0.  They MUST NOT try to interpret as fragmented
   messages with the high bit set low.

7. Data Storage Protocol

RELOAD provides a set of generic mechanisms for storing and retrieving data in the Overlay Instance. These mechanisms can be used for new applications simply by defining new code points and a small set of rules. No new protocol mechanisms are required. The basic unit of stored data is a single StoredData structure: struct { uint32 length; uint64 storage_time; uint32 lifetime; StoredDataValue value; Signature signature; } StoredData; The contents of this structure are as follows: length The size of the StoredData structure, in bytes, excluding the size of length itself.
Top   ToC   RFC6940 - Page 89
   storage_time
      The time when the data was stored, represented as the number of
      milliseconds elapsed since midnight Jan 1, 1970 UTC, not counting
      leap seconds.  This will have the same values for seconds as
      standard UNIX or POSIX time.  More information can be found at
      [UnixTime].  Any attempt to store a data value with a storage time
      before that of a value already stored at this location MUST
      generate an Error_Data_Too_Old error.  This prevents rollback
      attacks.  The node SHOULD make a best-effort attempt to use a
      correct clock to determine this number.  However, the protocol
      does not require synchronized clocks: the receiving peer uses the
      storage time in the previous store, not its own clock.  Clock
      values are used so that when clocks are generally synchronized,
      data may be stored in a single transaction, rather than querying
      for the value of a counter before the actual store.

      If a node attempting to store new data in response to a user
      request (rather than as an overlay maintenance operation such as
      occurs when healing the overlay from a partition) is rejected with
      an Error_Data_Too_Old error, the node MAY elect to perform its
      store using a storage_time that increments the value used with the
      previous store (this may be obtained by doing a Fetch).  This
      situation may occur when the clocks of nodes storing to this
      location are not properly synchronized.

   lifetime
      The validity period for the data, in seconds, starting from the
      time the peer receives the StoreReq.

   value
      The data value itself, as described in Section 7.2.

   signature
      A signature, as defined in Section 7.1.

   Each Resource-ID specifies a single location in the Overlay Instance.
   However, each location may contain multiple StoredData values,
   distinguished by Kind-ID.  The definition of a Kind describes both
   the data values which may be stored and the data model of the data.
   Some data models allow multiple values to be stored under the same
   Kind-ID.  Section 7.2 describes the available data models.  Thus, for
   instance, a given Resource-ID might contain a single-value element
   stored under Kind-ID X and an array containing multiple values stored
   under Kind-ID Y.
Top   ToC   RFC6940 - Page 90

7.1. Data Signature Computation

Each StoredData element is individually signed. However, the signature also must be self-contained and must cover the Kind-ID and Resource-ID, even though they are not present in the StoredData structure. The input to the signature algorithm is: resource_id || kind || storage_time || StoredDataValue || SignerIdentity where || indicates concatenation and where these values are: resource_id The Resource-ID where this data is stored. kind The Kind-ID for this data. storage_time The contents of the storage_time data value. StoredDataValue The contents of the stored data value, as described in the previous sections. SignerIdentity The signer identity, as defined in Section 6.3.4. Once the signature has been computed, the signature is represented using a signature element, as described in Section 6.3.4. Note that there is no necessary relationship between the validity window of a certificate and the expiry of the data it is authenticating. When signatures are verified, the current time MUST be compared to the certificate validity period. Stored data MAY be set to expire after the signing certificate's validity period. Such signatures are not considered valid after the signing certificate expires. Implementations may "garbage collect" such data at their convenience, either by purging it automatically (perhaps by setting the upper bound on data storage to the lifetime of the signing certificate) or by simply leaving it in place until it expires naturally and relying on users of that data to notice the expired signing certificate.
Top   ToC   RFC6940 - Page 91

7.2. Data Models

The protocol currently defines the following data models: o single value o array o dictionary These are represented with the StoredDataValue structure. The actual data model is known from the Kind being stored. struct { Boolean exists; opaque value<0..2^32-1>; } DataValue; struct { select (DataModel) { case single_value: DataValue single_value_entry; case array: ArrayEntry array_entry; case dictionary: DictionaryEntry dictionary_entry; /* This structure may be extended */ }; } StoredDataValue; The following sections discuss the properties of each data model.

7.2.1. Single Value

A single-value element is a simple sequence of bytes. There may be only one single-value element for each Resource-ID, Kind-ID pair. A single value element is represented as a DataValue, which contains the following two elements: exists This value indicates whether the value exists at all. If it is set to False, it means that no value is present. If it is True, this means that a value is present. This gives the protocol a mechanism for indicating nonexistence as opposed to emptiness.
Top   ToC   RFC6940 - Page 92
   value
      The stored data.

7.2.2. Array

An array is a set of opaque values addressed by an integer index. Arrays are zero based. Note that arrays can be sparse. For instance, a Store of "X" at index 2 in an empty array produces an array with the values [ NA, NA, "X"]. Future attempts to fetch elements at index 0 or 1 will return values with "exists" set to False. An array element is represented as an ArrayEntry: struct { uint32 index; DataValue value; } ArrayEntry; The contents of this structure are: index The index of the data element in the array. value The stored data.

7.2.3. Dictionary

A dictionary is a set of opaque values indexed by an opaque key, with one value for each key. A single dictionary entry is represented as a DictionaryEntry: typedef opaque DictionaryKey<0..2^16-1>; struct { DictionaryKey key; DataValue value; } DictionaryEntry; The contents of this structure are: key The dictionary key for this value. value The stored data.


(next page on part 5)

Next Section