RFC 6824

TCP Extensions for Multipath Operation with Multiple Addresses

Pages: 64
Obsoleted by: 8684

RFC6824 - Page 1

Internet Engineering Task Force (IETF)                           A. Ford
Request for Comments: 6824                                         Cisco
Category: Experimental                                         C. Raiciu
ISSN: 2070-1721                             U. Politechnica of Bucharest
                                                              M. Handley
                                                       U. College London
                                                          O. Bonaventure
                                                U. catholique de Louvain
                                                            January 2013


     TCP Extensions for Multipath Operation with Multiple Addresses

Abstract

   TCP/IP communication is currently restricted to a single path per
   connection, yet multiple paths often exist between peers.  The
   simultaneous use of these multiple paths for a TCP/IP session would
   improve resource usage within the network and, thus, improve user
   experience through higher throughput and improved resilience to
   network failure.

   Multipath TCP provides the ability to simultaneously use multiple
   paths between peers.  This document presents a set of extensions to
   traditional TCP to support multipath operation.  The protocol offers
   the same type of service to applications as TCP (i.e., reliable
   bytestream), and it provides the components necessary to establish
   and use multiple TCP flows across potentially disjoint paths.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for examination, experimental implementation, and
   evaluation.

   This document defines an Experimental Protocol for the Internet
   community.  This document is a product of the Internet Engineering
   Task Force (IETF).  It represents the consensus of the IETF
   community.  It has received public review and has been approved for
   publication by the Internet Engineering Steering Group (IESG).  Not
   all documents approved by the IESG are a candidate for any level of
   Internet Standard; see Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc6824.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Design Assumptions
      1.2. Multipath TCP in the Networking Stack
      1.3. Terminology
      1.4. MPTCP Concept
      1.5. Requirements Language
   2. Operation Overview
      2.1. Initiating an MPTCP Connection
      2.2. Associating a New Subflow with an Existing MPTCP Connection
      2.3. Informing the Other Host about Another Potential Address
      2.4. Data Transfer Using MPTCP
      2.5. Requesting a Change in a Path's Priority
      2.6. Closing an MPTCP Connection
      2.7. Notable Features
   3. MPTCP Protocol
      3.1. Connection Initiation
      3.2. Starting a New Subflow
      3.3. General MPTCP Operation
           3.3.1. Data Sequence Mapping
           3.3.2. Data Acknowledgments
           3.3.3. Closing a Connection
           3.3.4. Receiver Considerations
           3.3.5. Sender Considerations
           3.3.6. Reliability and Retransmissions
           3.3.7. Congestion Control Considerations
           3.3.8. Subflow Policy
      3.4. Address Knowledge Exchange (Path Management)
           3.4.1. Address Advertisement
           3.4.2. Remove Address
      3.5. Fast Close
      3.6. Fallback
      3.7. Error Handling
      3.8. Heuristics
           3.8.1. Port Usage
           3.8.2. Delayed Subflow Start
           3.8.3. Failure Handling
   4. Semantic Issues
   5. Security Considerations
   6. Interactions with Middleboxes
   7. Acknowledgments
   8. IANA Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   Appendix A. Notes on Use of TCP Options
   Appendix B. Control Blocks
      B.1. MPTCP Control Block
           B.1.1. Authentication and Metadata
           B.1.2. Sending Side
           B.1.3. Receiving Side
      B.2. TCP Control Blocks
           B.2.1. Sending Side
           B.2.2. Receiving Side
   Appendix C. Finite State Machine

1.  Introduction

   Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to
   provide a Multipath TCP [2] service, which enables a transport
   connection to operate across multiple paths simultaneously.  This
   document presents the protocol changes required to add multipath
   capability to TCP; specifically, those for signaling and setting up
   multiple paths ("subflows"), managing these subflows, reassembly of
   data, and termination of sessions.  This is not the only information
   required to create a Multipath TCP implementation, however.  This
   document is complemented by three others:

   o  Architecture [2], which explains the motivations behind Multipath
      TCP, contains a discussion of high-level design decisions on which
      this design is based, and an explanation of a functional
      separation through which an extensible MPTCP implementation can be
      developed.

   o  Congestion control [5] presents a safe congestion control
      algorithm for coupling the behavior of the multiple paths in order
      to "do no harm" to other network users.

   o  Application considerations [6] discusses what impact MPTCP will
      have on applications, what applications will want to do with
      MPTCP, and as a consequence of these factors, what API extensions
      an MPTCP implementation should present.

1.1.  Design Assumptions

   In order to limit the potentially huge design space, the working
   group imposed two key constraints on the Multipath TCP design
   presented in this document:

   o  It must be backwards-compatible with current, regular TCP, to
      increase its chances of deployment.

   o  It can be assumed that one or both hosts are multihomed and
      multiaddressed.

   To simplify the design, we assume that the presence of multiple
   addresses at a host is sufficient to indicate the existence of
   multiple paths.  These paths need not be entirely disjoint: they may
   share one or many routers between them.  Even in such a situation,
   making use of multiple paths is beneficial, improving resource
   utilization and resilience to a subset of node failures.  The
   congestion control algorithms defined in [5] ensure this does not act
   detrimentally.  Furthermore, there may be some scenarios where
   different TCP ports on a single host can provide disjoint paths (such
   as through certain Equal-Cost Multipath (ECMP) implementations [7]),
   and so the MPTCP design also supports the use of ports in path
   identifiers.

   There are three aspects to the backwards-compatibility listed above
   (discussed in more detail in [2]):

   External Constraints:  The protocol must function through the vast
      majority of existing middleboxes such as NATs, firewalls, and
      proxies, and as such must resemble existing TCP as far as possible
      on the wire.  Furthermore, the protocol must not assume the
      segments it sends on the wire arrive unmodified at the
      destination: they may be split or coalesced; TCP options may be
      removed or duplicated.

   Application Constraints:  The protocol must be usable with no change
      to existing applications that use the common TCP API (although it
      is reasonable that not all features would be available to such
      legacy applications).  Furthermore, the protocol must provide the
      same service model as regular TCP to the application.

   Fallback:  The protocol should be able to fall back to standard TCP
      with no interference from the user, to be able to communicate with
      legacy hosts.

   The complementary application considerations document [6] discusses
   the necessary features of an API to provide backwards-compatibility,
   as well as API extensions to convey the behavior of MPTCP at a level
   of control and information equivalent to that available with regular,
   single-path TCP.

   Further discussion of the design constraints and associated design
   decisions is given in the MPTCP Architecture document [2] and in
   [8].

1.2.  Multipath TCP in the Networking Stack

   MPTCP operates at the transport layer and aims to be transparent to
   both higher and lower layers.  It is a set of additional features on
   top of standard TCP; Figure 1 illustrates this layering.  MPTCP is
   designed to be usable by legacy applications with no changes;
   detailed discussion of its interactions with applications is given in
   [6].

                                   +-------------------------------+
                                   |           Application         |
      +---------------+            +-------------------------------+
      |  Application  |            |             MPTCP             |
      +---------------+            + - - - - - - - + - - - - - - - +
      |      TCP      |            | Subflow (TCP) | Subflow (TCP) |
      +---------------+            +-------------------------------+
      |      IP       |            |       IP      |      IP       |
      +---------------+            +-------------------------------+

      Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks

1.3.  Terminology

   This document makes use of a number of terms that are either MPTCP-
   specific or have defined meaning in the context of MPTCP, as follows:

   Path:  A sequence of links between a sender and a receiver, defined
      in this context by a 4-tuple of source and destination address/
      port pairs.

   Subflow:  A flow of TCP segments operating over an individual path,
      which forms part of a larger MPTCP connection.  A subflow is
      started and terminated similarly to a regular TCP connection.

   (MPTCP) Connection:  A set of one or more subflows, over which an
      application can communicate between two hosts.  There is a one-to-
      one mapping between a connection and an application socket.

   Data-level:  The payload data is nominally transferred over a
      connection, which in turn is transported over subflows.  Thus, the
      term "data-level" is synonymous with "connection level", in
      contrast to "subflow-level", which refers to properties of an
      individual subflow.

   Token:  A locally unique identifier given to a multipath connection
      by a host.  May also be referred to as a "Connection ID".

   Host:  An end host operating an MPTCP implementation, and either
      initiating or accepting an MPTCP connection.

   In addition to these terms, note that MPTCP's interpretation of, and
   effect on, regular single-path TCP semantics are discussed in
   Section 4.

1.4.  MPTCP Concept

   This section provides a high-level summary of normal operation of
   MPTCP, and is illustrated by the scenario shown in Figure 2.  A
   detailed description of operation is given in Section 3.

   o  To a non-MPTCP-aware application, MPTCP will behave the same as
      normal TCP.  Extended APIs could provide additional control to
      MPTCP-aware applications [6].  An application begins by opening a
      TCP socket in the normal way.  MPTCP signaling and operation are
      handled by the MPTCP implementation.

   o  An MPTCP connection begins similarly to a regular TCP connection.
      This is illustrated in Figure 2 where an MPTCP connection is
      established between addresses A1 and B1 on Hosts A and B,
      respectively.

   o  If extra paths are available, additional TCP sessions (termed
      MPTCP "subflows") are created on these paths, and are combined
      with the existing session, which continues to appear as a single
      connection to the applications at both ends.  The creation of the
      additional TCP session is illustrated between Address A2 on Host A
      and Address B1 on Host B.

   o  MPTCP identifies multiple paths by the presence of multiple
      addresses at hosts.  Combinations of these multiple addresses
      equate to the additional paths.  In the example, other potential
      paths that could be set up are A1<->B2 and A2<->B2.  Although this
      additional session is shown as being initiated from A2, it could
      equally have been initiated from B1.

   o  The discovery and setup of additional subflows will be achieved
      through a path management method; this document describes a
      mechanism by which a host can initiate new subflows by using its
      own additional addresses, or by signaling its available addresses
      to the other host.

   o  MPTCP adds connection-level sequence numbers to allow the
      reassembly of segments arriving on multiple subflows with
      differing network delays.

   o  Subflows are terminated as regular TCP connections, with a four-
      way FIN handshake.  The MPTCP connection is terminated by a
      connection-level FIN.

               Host A                               Host B
      ------------------------             ------------------------
      Address A1    Address A2             Address B1    Address B2
      ----------    ----------             ----------    ----------
          |             |                      |             |
          |     (initial connection setup)     |             |
          |----------------------------------->|             |
          |<-----------------------------------|             |
          |             |                      |             |
          |            (additional subflow setup)            |
          |             |--------------------->|             |
          |             |<---------------------|             |
          |             |                      |             |
          |             |                      |             |

                  Figure 2: Example MPTCP Usage Scenario
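
   The way address combinations equate to candidate paths can be
   sketched as follows.  This is a non-normative illustration only: the
   Path type and the candidate_paths helper are hypothetical names, the
   address labels are those of Figure 2, and ports are left out for
   brevity even though a path is formally a 4-tuple.

```python
from itertools import product
from typing import NamedTuple


class Path(NamedTuple):
    """A candidate path, identified here only by its endpoint addresses."""
    local: str
    remote: str


def candidate_paths(local_addrs, remote_addrs):
    """Return every local/remote address combination as a candidate path."""
    return [Path(a, b) for a, b in product(local_addrs, remote_addrs)]


# Addresses as labeled in Figure 2.
all_paths = candidate_paths(["A1", "A2"], ["B1", "B2"])

# Subflows already established in the figure.
established = {Path("A1", "B1"), Path("A2", "B1")}

# Paths that could still be set up: A1<->B2 and A2<->B2.
remaining = [p for p in all_paths if p not in established]
print(remaining)
```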

1.5.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

2.  Operation Overview

   This section presents a single description of common MPTCP operation,
   with reference to the protocol operation.  This is a high-level
   overview of the key functions; the full specification follows in
   Section 3.  Extensibility and negotiated features are not discussed
   here.  Considerable reference is made to symbolic names of MPTCP
   options throughout this section -- these are subtypes of the IANA-
   assigned MPTCP option (see Section 8), and their formats are defined
   in the detailed protocol specification that follows in Section 3.

   A Multipath TCP connection provides a bidirectional bytestream
   between two hosts communicating like normal TCP and, thus, does not
   require any change to the applications.  However, Multipath TCP
   enables the hosts to use different paths with different IP addresses
   to exchange packets belonging to the MPTCP connection.  A Multipath
   TCP connection appears like a normal TCP connection to an
   application.  However, to the network layer, each MPTCP subflow looks
   like a regular TCP flow whose segments carry a new TCP option type.
   Multipath TCP manages the creation, removal, and utilization of these
   subflows to send data.  The number of subflows that are managed
   within a Multipath TCP connection is not fixed and it can fluctuate
   during the lifetime of the Multipath TCP connection.

   All MPTCP operations are signaled with a TCP option -- a single
   numerical type for MPTCP, with "sub-types" for each MPTCP message.
   What follows is a summary of the purpose and rationale of these
   messages.

2.1.  Initiating an MPTCP Connection

   This is the same signaling as for initiating a normal TCP connection,
   but the SYN, SYN/ACK, and ACK packets also carry the MP_CAPABLE
   option.  This is variable length and serves multiple purposes.
   Firstly, it verifies whether the remote host supports Multipath TCP;
   secondly, this option allows the hosts to exchange some information
   to authenticate the establishment of additional subflows.  Further
   details are given in Section 3.1.

      Host A                                  Host B
      ------                                  ------
      MP_CAPABLE            ->
      [A's key, flags]
                            <-                MP_CAPABLE
                                              [B's key, flags]
      ACK + MP_CAPABLE      ->
      [A's key, B's key, flags]
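
   As a non-normative sketch, the three MP_CAPABLE options above could
   be serialized as shown below.  The field layout (option kind 30,
   subtype 0x0, version 0, flag bits "A" and "H", 64-bit keys) is taken
   from Section 3.1 and Section 8 and should be checked against them;
   the function and constant names are illustrative assumptions, not
   part of the specification.

```python
import os
import struct

TCP_OPTION_KIND_MPTCP = 30      # IANA-assigned option kind (Section 8)
SUBTYPE_MP_CAPABLE = 0x0        # subtype value from Section 3.1
MPTCP_VERSION = 0               # version defined by this document
FLAG_CHECKSUM_REQUIRED = 0x80   # flag "A": DSS checksums are required
FLAG_HMAC_SHA1 = 0x01           # flag "H": HMAC-SHA1 is the crypto algorithm


def generate_key() -> bytes:
    """Return a random 64-bit key (hypothetical helper)."""
    return os.urandom(8)


def pack_mp_capable(sender_key: bytes, receiver_key: bytes = b"",
                    checksum: bool = True) -> bytes:
    """Serialize an MP_CAPABLE option.

    SYN and SYN/ACK carry only the sender's key (12 octets in total);
    the final ACK carries both keys (20 octets in total).
    """
    if len(sender_key) != 8 or len(receiver_key) not in (0, 8):
        raise ValueError("keys must be exactly 8 octets")
    flags = FLAG_HMAC_SHA1 | (FLAG_CHECKSUM_REQUIRED if checksum else 0)
    length = 4 + len(sender_key) + len(receiver_key)
    header = struct.pack("!BBBB", TCP_OPTION_KIND_MPTCP, length,
                         (SUBTYPE_MP_CAPABLE << 4) | MPTCP_VERSION, flags)
    return header + sender_key + receiver_key


key_a, key_b = generate_key(), generate_key()
syn = pack_mp_capable(key_a)             # Host A -> Host B
syn_ack = pack_mp_capable(key_b)         # Host B -> Host A
ack = pack_mp_capable(key_a, key_b)      # Host A -> Host B
```

   The differing lengths of the SYN and ACK forms are what makes the
   option "variable length" in the sense used above.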

2.2.  Associating a New Subflow with an Existing MPTCP Connection

   The exchange of keys in the MP_CAPABLE handshake provides material
   that can be used to authenticate the endpoints when new subflows are
   set up.  Additional subflows begin in the same way as initiating a
   normal TCP connection, but the SYN, SYN/ACK, and ACK packets also
   carry the MP_JOIN option.

   Host A initiates a new subflow between one of its addresses and one
   of Host B's addresses.  The token -- generated from the key -- is
   used to identify which MPTCP connection it is joining, and the HMAC
   is used for authentication.  The Hash-based Message Authentication
   Code (HMAC) uses the keys exchanged in the MP_CAPABLE handshake, and
   the random numbers (nonces) exchanged in these MP_JOIN options.
   MP_JOIN also contains flags and an Address ID that can be used to
   refer to the source address without the sender needing to know if it
   has been changed by a NAT.  Further details are in Section 3.2.

      Host A                                  Host B
      ------                                  ------
      MP_JOIN               ->
      [B's token, A's nonce,
       A's Address ID, flags]
                            <-                MP_JOIN
                                              [B's HMAC, B's nonce,
                                               B's Address ID, flags]
      ACK + MP_JOIN         ->
      [A's HMAC]

                            <-                ACK
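
   The token and HMAC computations behind this exchange can be sketched
   as follows.  This is a minimal, non-normative illustration that
   assumes the definitions given later in Sections 3.1 and 3.2: the
   token is the most significant 32 bits of the SHA-1 hash of a key,
   each HMAC is keyed with the sender's key followed by the receiver's
   key, and the SYN/ACK carries only the leftmost 64 bits of B's HMAC.
   The helper names are hypothetical.

```python
import hashlib
import hmac
import os


def token_from_key(key: bytes) -> bytes:
    """Token: most significant 32 bits of SHA-1(key)."""
    return hashlib.sha1(key).digest()[:4]


def join_hmac(sender_key: bytes, receiver_key: bytes,
              sender_nonce: bytes, receiver_nonce: bytes) -> bytes:
    """HMAC-SHA1 keyed with (sender key + receiver key) over the two
    nonces, sender's nonce first."""
    return hmac.new(sender_key + receiver_key,
                    sender_nonce + receiver_nonce,
                    hashlib.sha1).digest()


# Keys were exchanged earlier, in the MP_CAPABLE handshake.
key_a, key_b = os.urandom(8), os.urandom(8)

# SYN + MP_JOIN: A names the connection by B's token and sends a nonce.
token_b = token_from_key(key_b)
nonce_a = os.urandom(4)

# SYN/ACK + MP_JOIN: B replies with its own nonce and a truncated HMAC.
nonce_b = os.urandom(4)
hmac_b_truncated = join_hmac(key_b, key_a, nonce_b, nonce_a)[:8]

# A recomputes B's HMAC from its own copy of the keys before replying.
a_accepts_b = hmac.compare_digest(
    hmac_b_truncated, join_hmac(key_b, key_a, nonce_b, nonce_a)[:8])

# ACK + MP_JOIN: A sends its full 160-bit HMAC, which B verifies.
hmac_a = join_hmac(key_a, key_b, nonce_a, nonce_b)
b_accepts_a = hmac.compare_digest(
    hmac_a, join_hmac(key_a, key_b, nonce_a, nonce_b))
```

   Because the nonces change on every join attempt, a captured HMAC
   cannot simply be replayed on a later subflow.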

2.3.  Informing the Other Host about Another Potential Address

   The set of IP addresses associated with a multihomed host may change
   during the lifetime of an MPTCP connection.  MPTCP supports the
   addition and removal of addresses on a host both implicitly and
   explicitly.  If Host A has established a subflow starting at address
   IP#-A1 and wants to open a second subflow starting at address IP#-A2,
   it simply initiates the establishment of the subflow as explained
   above.  The remote host will then be implicitly informed about the
   new address.

   In some circumstances, a host may want to advertise to the remote
   host the availability of an address without establishing a new
   subflow, for example, when a NAT prevents setup in one direction.  In
   the example below, Host A informs Host B about its alternative IP
   address (IP#-A2).  Host B may later send an MP_JOIN to this new
   address.  Due to the presence of middleboxes that may translate IP
   addresses, this option uses an address identifier to unambiguously
   identify an address on a host.  Further details are in Section 3.4.1.

      Host A                                 Host B
      ------                                 ------
      ADD_ADDR                  ->
      [IP#-A2,
       IP#-A2's Address ID]

   There is a corresponding signal for address removal, making use of
   the Address ID that is signaled in the add address handshake.
   Further details are in Section 3.4.2.

      Host A                                 Host B
      ------                                 ------
      REMOVE_ADDR               ->
      [IP#-A2's Address ID]
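
   A non-normative sketch of how these two options might be serialized
   is given below.  It assumes the layouts of Sections 3.4.1 and 3.4.2
   (subtype 0x3 followed by a 4-bit IP version for ADD_ADDR, subtype
   0x4 followed by a list of Address IDs for REMOVE_ADDR); the function
   names are hypothetical, and 192.0.2.2 is a documentation address
   standing in for IP#-A2.

```python
import ipaddress
import struct

TCP_OPTION_KIND_MPTCP = 30   # IANA-assigned option kind (Section 8)
SUBTYPE_ADD_ADDR = 0x3
SUBTYPE_REMOVE_ADDR = 0x4


def pack_add_addr(address: str, address_id: int, port=None) -> bytes:
    """Serialize ADD_ADDR for an IPv4 or IPv6 address, with optional port."""
    ip = ipaddress.ip_address(address)
    body = struct.pack("!BB", (SUBTYPE_ADD_ADDR << 4) | ip.version,
                       address_id) + ip.packed
    if port is not None:
        body += struct.pack("!H", port)
    # Kind and length octets come first; length covers the whole option.
    return struct.pack("!BB", TCP_OPTION_KIND_MPTCP, 2 + len(body)) + body


def pack_remove_addr(*address_ids: int) -> bytes:
    """Serialize REMOVE_ADDR for one or more Address IDs."""
    if not address_ids:
        raise ValueError("at least one Address ID is required")
    body = bytes([SUBTYPE_REMOVE_ADDR << 4]) + bytes(address_ids)
    return struct.pack("!BB", TCP_OPTION_KIND_MPTCP, 2 + len(body)) + body


# Host A advertises its second address under Address ID 2 ...
add_addr = pack_add_addr("192.0.2.2", address_id=2)

# ... and later withdraws it, quoting only the Address ID.
remove_addr = pack_remove_addr(2)
```

   Note that REMOVE_ADDR never repeats the address itself, so it stays
   meaningful even if a NAT rewrote the address in transit.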

2.4.  Data Transfer Using MPTCP

   To ensure reliable, in-order delivery of data over subflows that may
   appear and disappear at any time, MPTCP uses a 64-bit data sequence
   number (DSN) to number all data sent over the MPTCP connection.  Each
   subflow has its own 32-bit sequence number space and an MPTCP option
   maps the subflow sequence space to the data sequence space.  In this
   way, data can be retransmitted on different subflows (mapped to the
   same DSN) in the event of failure.

   The "Data Sequence Signal" carries the "Data Sequence Mapping".  The
   data sequence mapping consists of the subflow sequence number, data
   sequence number, and length for which this mapping is valid.  This
   option can also carry a connection-level acknowledgment (the "Data
   ACK") for the received DSN.

   With MPTCP, all subflows share the same receive buffer and advertise
   the same receive window.  There are two levels of acknowledgment in
   MPTCP.  Regular TCP acknowledgments are used on each subflow to
   acknowledge the reception of the segments sent over the subflow
   independently of their DSN.  In addition, there are connection-level
   acknowledgments for the data sequence space.  These acknowledgments
   track the advancement of the bytestream and slide the receiving
   window.

   Further details are in Section 3.3.

      Host A                                 Host B
      ------                                 ------
      DATA_SEQUENCE_SIGNAL      ->
      [Data Sequence Mapping]
      [Data ACK]
      [Checksum]
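
   The relationship between subflow sequence numbers, data sequence
   numbers, and the Data ACK can be illustrated with a simplified
   receiver model.  This is a sketch under stated assumptions, not the
   specified algorithm: the Mapping and Receiver classes are
   hypothetical, sequence-number wraparound and the checksum are
   ignored, and subflow sequence numbers are treated as small relative
   values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    dsn: int      # data sequence number of the first mapped byte
    ssn: int      # subflow sequence number of the first mapped byte
    length: int   # number of bytes covered by this mapping


class Receiver:
    """Connection-level reassembly shared by all subflows."""

    def __init__(self, initial_dsn: int):
        self.next_dsn = initial_dsn   # value reported in the Data ACK
        self.pending = {}             # out-of-order bytes, keyed by DSN
        self.stream = bytearray()     # in-order bytestream for the app

    def on_segment(self, mapping: Mapping, ssn: int, payload: bytes) -> int:
        """Accept payload that starts at subflow sequence number ssn and
        return the resulting cumulative Data ACK."""
        start = ssn - mapping.ssn
        if start < 0 or start + len(payload) > mapping.length:
            raise ValueError("payload falls outside the mapping")
        for offset, byte in enumerate(payload):
            dsn = mapping.dsn + start + offset
            if dsn >= self.next_dsn:          # ignore duplicate data
                self.pending[dsn] = byte
        while self.next_dsn in self.pending:  # deliver in DSN order
            self.stream.append(self.pending.pop(self.next_dsn))
            self.next_dsn += 1
        return self.next_dsn


rx = Receiver(initial_dsn=1000)

# The second subflow happens to deliver the later data first.
ack1 = rx.on_segment(Mapping(dsn=1006, ssn=1, length=5), 1, b"world")

# The first subflow then delivers the earlier data.
ack2 = rx.on_segment(Mapping(dsn=1000, ssn=1, length=6), 1, b"hello ")

# A retransmission on the first subflow reuses the DSN under a new SSN.
ack3 = rx.on_segment(Mapping(dsn=1006, ssn=7, length=5), 7, b"world")
```

   In this model, the retransmitted bytes are discarded as duplicates
   because their DSNs fall below the next expected DSN, even though
   they arrived on a different subflow with different subflow sequence
   numbers.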

2.5.  Requesting a Change in a Path's Priority

   Hosts can indicate at initial subflow setup whether they wish the
   subflow to be used as a regular or backup path -- a backup path only
   being used if there are no regular paths available.  During a
   connection, Host A can request a change in the priority of a subflow
   through the MP_PRIO signal to Host B.  Further details are in
   Section 3.3.8.

      Host A                                 Host B
      ------                                 ------
      MP_PRIO                   ->
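
   One way a sender might honor the backup indication is sketched
   below.  This is a non-normative illustration of local policy: the
   Subflow class, its fields, and both functions are hypothetical, and
   real implementations are free to schedule differently.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    name: str
    backup: bool = False   # peer asked for this subflow to be a backup
    usable: bool = True    # local view of whether the path still works


def on_mp_prio(subflow: Subflow, backup: bool) -> None:
    """Record a priority change requested by the peer via MP_PRIO."""
    subflow.backup = backup


def eligible_subflows(subflows):
    """Prefer regular subflows; use backups only if no regular one works."""
    regular = [s for s in subflows if s.usable and not s.backup]
    if regular:
        return regular
    return [s for s in subflows if s.usable and s.backup]


wifi = Subflow("wifi")
cellular = Subflow("cellular")

# The peer asks for the cellular subflow to become a backup path.
on_mp_prio(cellular, backup=True)
normal_choice = [s.name for s in eligible_subflows([wifi, cellular])]

# If the regular path fails, the backup path takes over.
wifi.usable = False
failover_choice = [s.name for s in eligible_subflows([wifi, cellular])]
```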

2.6.  Closing an MPTCP Connection

   When Host A wants to inform Host B that it has no more data to send,
   it signals this "Data FIN" as part of the Data Sequence Signal (see
   above).  It has the same semantics and behavior as a regular TCP FIN,
   but at the connection level.  Once all the data on the MPTCP
   connection has been successfully received, then this message is
   acknowledged at the connection level with a DATA_ACK.  Further
   details are in Section 3.3.3.

      Host A                                 Host B
      ------                                 ------
      DATA_SEQUENCE_SIGNAL      ->
      [Data FIN]

                                <-           (MPTCP DATA_ACK)
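
   The acknowledgment rule for the Data FIN can be sketched as below.
   The sketch assumes, per Section 3.3.3, that the Data FIN occupies
   one octet of the data sequence space, and that the Data ACK is
   cumulative and names the next DSN expected; the function name is
   hypothetical and wraparound is ignored.

```python
def data_ack_after_fin(next_expected_dsn: int, data_fin_dsn: int) -> int:
    """Return the Data ACK value to send after seeing a Data FIN.

    The FIN is covered only once every byte before it has arrived;
    until then the Data ACK still points at the first missing byte.
    """
    if next_expected_dsn >= data_fin_dsn:
        return data_fin_dsn + 1   # the FIN consumes one DSN
    return next_expected_dsn


# All data up to DSN 1010 received; the Data FIN sits at DSN 1011.
ack_complete = data_ack_after_fin(next_expected_dsn=1011, data_fin_dsn=1011)

# Data from DSN 1006 onward is still missing, so the FIN is not yet ACKed.
ack_incomplete = data_ack_after_fin(next_expected_dsn=1006, data_fin_dsn=1011)
```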

2.7.  Notable Features

   It is worth highlighting that MPTCP's signaling has been designed
   with several key requirements in mind:

   o  To cope with NATs on the path, addresses are referred to by
      Address IDs, in case the IP packet's source address gets changed
      by a NAT.  Setting up a new TCP flow is not possible if the
      passive opener is behind a NAT; to allow subflows to be created
      when either end is behind a NAT, MPTCP uses the ADD_ADDR message.

   o  MPTCP falls back to ordinary TCP if MPTCP operation is not
      possible, for example, if one host is not MPTCP capable or if a
      middlebox alters the payload.

   o  To meet the threats identified in [9], the following steps are
      taken: keys are sent in the clear in the MP_CAPABLE messages;
      MP_JOIN messages are secured with HMAC-SHA1 ([10], [4]) using
      those keys; and standard TCP validity checks are made on the other
      messages (ensuring sequence numbers are in-window).
