Internet Engineering Task Force (IETF)                           A. Ford
Request for Comments: 6824                                         Cisco
Category: Experimental                                         C. Raiciu
ISSN: 2070-1721                             U. Politechnica of Bucharest
                                                              M. Handley
                                                       U. College London
                                                          O. Bonaventure
                                                U. catholique de Louvain
                                                            January 2013
     TCP Extensions for Multipath Operation with Multiple Addresses
Abstract
TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network and, thus, improve user
experience through higher throughput and improved resilience to
network failure.
Multipath TCP provides the ability to simultaneously use multiple
paths between peers. This document presents a set of extensions to
traditional TCP to support multipath operation. The protocol offers
the same type of service to applications as TCP (i.e., reliable
bytestream), and it provides the components necessary to establish
and use multiple TCP flows across potentially disjoint paths.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for examination, experimental implementation, and
evaluation.
This document defines an Experimental Protocol for the Internet
community. This document is a product of the Internet Engineering
Task Force (IETF). It represents the consensus of the IETF
community. It has received public review and has been approved for
publication by the Internet Engineering Steering Group (IESG). Not
all documents approved by the IESG are a candidate for any level of
Internet Standard; see Section 2 of RFC 5741.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc6824.
Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
   1.1. Design Assumptions
   1.2. Multipath TCP in the Networking Stack
   1.3. Terminology
   1.4. MPTCP Concept
   1.5. Requirements Language
2. Operation Overview
   2.1. Initiating an MPTCP Connection
   2.2. Associating a New Subflow with an Existing MPTCP Connection
   2.3. Informing the Other Host about Another Potential Address
   2.4. Data Transfer Using MPTCP
   2.5. Requesting a Change in a Path's Priority
   2.6. Closing an MPTCP Connection
   2.7. Notable Features
3. MPTCP Protocol
   3.1. Connection Initiation
   3.2. Starting a New Subflow
   3.3. General MPTCP Operation
        3.3.1. Data Sequence Mapping
        3.3.2. Data Acknowledgments
        3.3.3. Closing a Connection
        3.3.4. Receiver Considerations
        3.3.5. Sender Considerations
        3.3.6. Reliability and Retransmissions
        3.3.7. Congestion Control Considerations
        3.3.8. Subflow Policy
   3.4. Address Knowledge Exchange (Path Management)
        3.4.1. Address Advertisement
        3.4.2. Remove Address
   3.5. Fast Close
   3.6. Fallback
   3.7. Error Handling
   3.8. Heuristics
        3.8.1. Port Usage
        3.8.2. Delayed Subflow Start
        3.8.3. Failure Handling
4. Semantic Issues
5. Security Considerations
6. Interactions with Middleboxes
7. Acknowledgments
8. IANA Considerations
9. References
   9.1. Normative References
   9.2. Informative References
Appendix A. Notes on Use of TCP Options
Appendix B. Control Blocks
   B.1. MPTCP Control Block
        B.1.1. Authentication and Metadata
        B.1.2. Sending Side
        B.1.3. Receiving Side
   B.2. TCP Control Blocks
        B.2.1. Sending Side
        B.2.2. Receiving Side
Appendix C. Finite State Machine
1. Introduction
Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to
provide a Multipath TCP [2] service, which enables a transport
connection to operate across multiple paths simultaneously. This
document presents the protocol changes required to add multipath
capability to TCP; specifically, those for signaling and setting up
multiple paths ("subflows"), managing these subflows, reassembly of
data, and termination of sessions. This is not the only information
required to create a Multipath TCP implementation, however. This
document is complemented by three others:
o Architecture [2], which explains the motivations behind Multipath
TCP, contains a discussion of high-level design decisions on which
this design is based, and an explanation of a functional
separation through which an extensible MPTCP implementation can be
developed.
o Congestion control [5] presents a safe congestion control
algorithm for coupling the behavior of the multiple paths in order
to "do no harm" to other network users.
o Application considerations [6] discusses what impact MPTCP will
have on applications, what applications will want to do with
MPTCP, and as a consequence of these factors, what API extensions
an MPTCP implementation should present.
1.1. Design Assumptions
In order to limit the potentially huge design space, the working
group imposed two key constraints on the Multipath TCP design
presented in this document:
o It must be backwards-compatible with current, regular TCP, to
increase its chances of deployment.
o It can be assumed that one or both hosts are multihomed and
multiaddressed.
To simplify the design, we assume that the presence of multiple
addresses at a host is sufficient to indicate the existence of
multiple paths. These paths need not be entirely disjoint: they may
share one or many routers between them. Even in such a situation,
making use of multiple paths is beneficial, improving resource
utilization and resilience to a subset of node failures. The
congestion control algorithms defined in [5] ensure this does not act
detrimentally. Furthermore, there may be some scenarios where
different TCP ports on a single host can provide disjoint paths (such
as through certain Equal-Cost Multipath (ECMP) implementations [7]),
and so the MPTCP design also supports the use of ports in path
identifiers.
There are three aspects to the backwards-compatibility listed above
(discussed in more detail in [2]):
External Constraints: The protocol must function through the vast
majority of existing middleboxes such as NATs, firewalls, and
proxies, and as such must resemble existing TCP as far as possible
on the wire. Furthermore, the protocol must not assume the
segments it sends on the wire arrive unmodified at the
destination: they may be split or coalesced; TCP options may be
removed or duplicated.
Application Constraints: The protocol must be usable with no change
to existing applications that use the common TCP API (although it
is reasonable that not all features would be available to such
legacy applications). Furthermore, the protocol must provide the
same service model as regular TCP to the application.
Fallback: The protocol should be able to fall back to standard TCP
with no interference from the user, to be able to communicate with
legacy hosts.
The complementary application considerations document [6] discusses
the necessary features of an API to provide backwards-compatibility,
as well as API extensions to convey the behavior of MPTCP at a level
of control and information equivalent to that available with regular,
single-path TCP.
Further discussion of the design constraints and associated design
decisions is given in the MPTCP Architecture document [2] and in
[8].
1.2. Multipath TCP in the Networking Stack
MPTCP operates at the transport layer and aims to be transparent to
both higher and lower layers. It is a set of additional features on
top of standard TCP; Figure 1 illustrates this layering. MPTCP is
designed to be usable by legacy applications with no changes;
detailed discussion of its interactions with applications is given in
[6].
                               +-------------------------------+
                               |           Application         |
      +---------------+        +-------------------------------+
      |  Application  |        |             MPTCP             |
      +---------------+        + - - - - - - - + - - - - - - - +
      |      TCP      |        | Subflow (TCP) | Subflow (TCP) |
      +---------------+        +-------------------------------+
      |      IP       |        |       IP      |      IP       |
      +---------------+        +-------------------------------+
Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks
1.3. Terminology
This document makes use of a number of terms that are either MPTCP-
specific or have defined meaning in the context of MPTCP, as follows:
Path: A sequence of links between a sender and a receiver, defined
in this context by a 4-tuple of source and destination address/
port pairs.
Subflow: A flow of TCP segments operating over an individual path,
which forms part of a larger MPTCP connection. A subflow is
started and terminated similarly to a regular TCP connection.
(MPTCP) Connection: A set of one or more subflows, over which an
application can communicate between two hosts. There is a one-to-
one mapping between a connection and an application socket.
Data-level: The payload data is nominally transferred over a
connection, which in turn is transported over subflows. Thus, the
term "data-level" is synonymous with "connection level", in
contrast to "subflow-level", which refers to properties of an
individual subflow.
Token: A locally unique identifier given to a multipath connection
by a host. May also be referred to as a "Connection ID".
Host: An end host operating an MPTCP implementation, and either
initiating or accepting an MPTCP connection.
In addition to these terms, note that MPTCP's interpretation of, and
effect on, regular single-path TCP semantics are discussed in
Section 4.
1.4. MPTCP Concept
This section provides a high-level summary of normal operation of
MPTCP, and is illustrated by the scenario shown in Figure 2. A
detailed description of operation is given in Section 3.
o To a non-MPTCP-aware application, MPTCP will behave the same as
normal TCP. Extended APIs could provide additional control to
MPTCP-aware applications [6]. An application begins by opening a
TCP socket in the normal way. MPTCP signaling and operation are
handled by the MPTCP implementation.
o An MPTCP connection begins similarly to a regular TCP connection.
This is illustrated in Figure 2 where an MPTCP connection is
established between addresses A1 and B1 on Hosts A and B,
respectively.
o If extra paths are available, additional TCP sessions (termed
MPTCP "subflows") are created on these paths, and are combined
with the existing session, which continues to appear as a single
connection to the applications at both ends. The creation of the
additional TCP session is illustrated between Address A2 on Host A
and Address B1 on Host B.
o MPTCP identifies multiple paths by the presence of multiple
addresses at hosts. Combinations of these multiple addresses
equate to the additional paths. In the example, other potential
paths that could be set up are A1<->B2 and A2<->B2. Although this
additional session is shown as being initiated from A2, it could
equally have been initiated from B1.
o The discovery and setup of additional subflows will be achieved
through a path management method; this document describes a
mechanism by which a host can initiate new subflows by using its
own additional addresses, or by signaling its available addresses
to the other host.
o MPTCP adds connection-level sequence numbers to allow the
reassembly of segments arriving on multiple subflows with
differing network delays.
o Subflows are terminated as regular TCP connections, with a four-
way FIN handshake. The MPTCP connection is terminated by a
connection-level FIN.
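As an illustration only, the following Python sketch enumerates the address pairs a host could still use for new subflows once some pairs are already in use. The function name candidate_paths is hypothetical, and the sketch treats each (local, remote) address combination as one path, ignoring ports.

```python
from itertools import product


def candidate_paths(local_addrs, remote_addrs, existing):
    """Return (local, remote) address pairs not yet used by a subflow.

    Hypothetical helper: every combination of a local and a remote
    address is treated as a potential path; ports are ignored.
    """
    used = set(existing)
    return [pair for pair in product(local_addrs, remote_addrs) if pair not in used]


# Scenario of Figure 2: subflows A1<->B1 and A2<->B1 already exist.
paths = candidate_paths(["A1", "A2"], ["B1", "B2"],
                        existing=[("A1", "B1"), ("A2", "B1")])
print(paths)  # [('A1', 'B2'), ('A2', 'B2')]
```

The result matches the two remaining paths named in the text, A1<->B2 and A2<->B2.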
            Host A                               Host B
   ------------------------             ------------------------
   Address A1    Address A2             Address B1    Address B2
   ----------    ----------             ----------    ----------
       |             |                      |             |
       |     (initial connection setup)     |             |
       |----------------------------------->|             |
       |<-----------------------------------|             |
       |             |                      |             |
       |             |  (additional subflow setup)        |
       |             |--------------------->|             |
       |             |<---------------------|             |
       |             |                      |             |
       |             |                      |             |
Figure 2: Example MPTCP Usage Scenario
1.5. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [3].
2. Operation Overview
This section presents a single description of common MPTCP operation,
with reference to the protocol operation. This is a high-level
overview of the key functions; the full specification follows in
Section 3. Extensibility and negotiated features are not discussed
here. Considerable reference is made to symbolic names of MPTCP
options throughout this section -- these are subtypes of the IANA-
assigned MPTCP option (see Section 8), and their formats are defined
in the detailed protocol specification that follows in Section 3.
A Multipath TCP connection provides a bidirectional bytestream
between two hosts communicating like normal TCP and, thus, does not
require any change to the applications. However, Multipath TCP
enables the hosts to use different paths with different IP addresses
to exchange packets belonging to the MPTCP connection. A Multipath
TCP connection appears like a normal TCP connection to an
application. However, to the network layer, each MPTCP subflow looks
like a regular TCP flow whose segments carry a new TCP option type.
Multipath TCP manages the creation, removal, and utilization of these
subflows to send data. The number of subflows that are managed
within a Multipath TCP connection is not fixed and it can fluctuate
during the lifetime of the Multipath TCP connection.
All MPTCP operations are signaled with a TCP option -- a single
numerical type for MPTCP, with "sub-types" for each MPTCP message.
What follows is a summary of the purpose and rationale of these
messages.
2.1. Initiating an MPTCP Connection
This is the same signaling as for initiating a normal TCP connection,
but the SYN, SYN/ACK, and ACK packets also carry the MP_CAPABLE
option. This is variable length and serves multiple purposes.
Firstly, it verifies whether the remote host supports Multipath TCP;
secondly, this option allows the hosts to exchange some information
to authenticate the establishment of additional subflows. Further
details are given in Section 3.1.
   Host A                                  Host B
   ------                                  ------
   MP_CAPABLE                       ->
   [A's key, flags]
                                    <-     MP_CAPABLE
                                           [B's key, flags]
   ACK + MP_CAPABLE                 ->
   [A's key, B's key, flags]
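The keys exchanged in this handshake are what later subflows depend on. The following Python sketch assumes our reading of Section 3.1: the token is the most significant 32 bits of the SHA-1 hash of a host's 64-bit key, and the initial data sequence number (IDSN) is the least significant 64 bits of the same hash. The helper name token_and_idsn is hypothetical.

```python
import hashlib
import os
from typing import Tuple


def token_and_idsn(key: bytes) -> Tuple[int, int]:
    """Derive (token, IDSN) from a 64-bit key (assumed per Section 3.1)."""
    if len(key) != 8:
        raise ValueError("MPTCP keys are 64 bits")
    digest = hashlib.sha1(key).digest()           # 20-byte SHA-1 digest
    token = int.from_bytes(digest[:4], "big")     # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")     # least significant 64 bits
    return token, idsn


key_a = os.urandom(8)  # each host chooses its own random key
token_a, idsn_a = token_and_idsn(key_a)
```

Because both values come from the same key, a host that learns its peer's key in MP_CAPABLE can compute the peer's token without it ever being sent.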
2.2. Associating a New Subflow with an Existing MPTCP Connection
The exchange of keys in the MP_CAPABLE handshake provides material
that can be used to authenticate the endpoints when new subflows will
be set up. Additional subflows begin in the same way as initiating a
normal TCP connection, but the SYN, SYN/ACK, and ACK packets also
carry the MP_JOIN option.
Host A initiates a new subflow between one of its addresses and one
of Host B's addresses. The token, generated from the key, is used to
identify which MPTCP connection it is joining, and a Hash-based
Message Authentication Code (HMAC) is used for authentication. The
HMAC uses the keys exchanged in the MP_CAPABLE handshake and the
random numbers (nonces) exchanged in these MP_JOIN options.
MP_JOIN also contains flags and an Address ID that can be used to
refer to the source address without the sender needing to know if it
has been changed by a NAT. Further details are in Section 3.2.
   Host A                                  Host B
   ------                                  ------
   MP_JOIN                          ->
   [B's token, A's nonce,
    A's Address ID, flags]
                                    <-     MP_JOIN
                                           [B's HMAC, B's nonce,
                                            B's Address ID, flags]
   ACK + MP_JOIN                    ->
   [A's HMAC]
                                    <-     ACK
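A minimal Python sketch of the MP_JOIN authentication follows. It assumes our reading of Section 3.2: each side computes HMAC-SHA1 keyed with its own 64-bit key followed by the peer's key, over its own 32-bit nonce followed by the peer's nonce. Host B sends the leftmost 64 bits in the SYN/ACK, and Host A sends the full 160-bit value in the third ACK. The helper join_hmac is hypothetical.

```python
import hashlib
import hmac
import os


def join_hmac(own_key: bytes, peer_key: bytes,
              own_nonce: bytes, peer_nonce: bytes) -> bytes:
    """HMAC-SHA1(Key = own_key + peer_key, Msg = own_nonce + peer_nonce)."""
    return hmac.new(own_key + peer_key, own_nonce + peer_nonce, hashlib.sha1).digest()


key_a, key_b = os.urandom(8), os.urandom(8)      # exchanged in MP_CAPABLE
nonce_a, nonce_b = os.urandom(4), os.urandom(4)  # from the MP_JOIN SYN and SYN/ACK

hmac_b = join_hmac(key_b, key_a, nonce_b, nonce_a)[:8]  # truncated, in SYN/ACK
hmac_a = join_hmac(key_a, key_b, nonce_a, nonce_b)      # full 160 bits, in ACK

# Host A verifies B's value by recomputing it; compare in constant time.
assert hmac.compare_digest(hmac_b, join_hmac(key_b, key_a, nonce_b, nonce_a)[:8])
```

Ordering the inputs by sender means the two directions produce different values, so one host cannot simply reflect the other's HMAC back.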
2.3. Informing the Other Host about Another Potential Address
The set of IP addresses associated with a multihomed host may change
during the lifetime of an MPTCP connection. MPTCP supports the
addition and removal of addresses on a host both implicitly and
explicitly. If Host A has established a subflow starting at address
IP#-A1 and wants to open a second subflow starting at address IP#-A2,
it simply initiates the establishment of the subflow as explained
above. The remote host will then be implicitly informed about the
new address.
In some circumstances, a host may want to advertise to the remote
host the availability of an address without establishing a new
subflow, for example, when a NAT prevents setup in one direction. In
the example below, Host A informs Host B about its alternative IP
address (IP#-A2). Host B may later send an MP_JOIN to this new
address. Due to the presence of middleboxes that may translate IP
addresses, this option uses an address identifier to unambiguously
identify an address on a host. Further details are in Section 3.4.1.
   Host A                                  Host B
   ------                                  ------
   ADD_ADDR                         ->
   [IP#-A2,
    IP#-A2's Address ID]
There is a corresponding signal for address removal, making use of
the Address ID that is signaled in the add address handshake.
Further details are in Section 3.4.2.
   Host A                                  Host B
   ------                                  ------
   REMOVE_ADDR                      ->
   [IP#-A2's Address ID]
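Because ADD_ADDR and REMOVE_ADDR both refer to an address by its Address ID, a receiver can keep a per-connection table keyed by that ID. The class below is a hypothetical Python sketch of such a table, not a structure defined by this document. It omits tearing down the subflows that use a removed address.

```python
class AddressTable:
    """Hypothetical record of the addresses a peer has advertised."""

    def __init__(self):
        self.by_id = {}  # Address ID -> advertised address

    def on_add_addr(self, address_id, address):
        # A later ADD_ADDR with the same ID replaces the earlier entry.
        self.by_id[address_id] = address

    def on_remove_addr(self, address_id):
        # REMOVE_ADDR carries only the ID; return the address it named, if any.
        return self.by_id.pop(address_id, None)


table = AddressTable()
table.on_add_addr(2, "192.0.2.20")  # ADD_ADDR [IP#-A2, IP#-A2's Address ID]
table.on_remove_addr(2)             # REMOVE_ADDR [IP#-A2's Address ID]
```

Keying on the ID rather than the IP address lets removal work even when the address seen on the wire differs from the one the peer knows itself by.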
2.4. Data Transfer Using MPTCP
To ensure reliable, in-order delivery of data over subflows that may
appear and disappear at any time, MPTCP uses a 64-bit data sequence
number (DSN) to number all data sent over the MPTCP connection. Each
subflow has its own 32-bit sequence number space and an MPTCP option
maps the subflow sequence space to the data sequence space. In this
way, data can be retransmitted on different subflows (mapped to the
same DSN) in the event of failure.
The "Data Sequence Signal" carries the "Data Sequence Mapping". The
data sequence mapping consists of the subflow sequence number, data
sequence number, and length for which this mapping is valid. This
option can also carry a connection-level acknowledgment (the "Data
ACK") for the received DSN.
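The following Python sketch shows how one Data Sequence Mapping could translate a subflow sequence number into a DSN. It assumes a mapping covers exactly `length` bytes starting at its subflow sequence number. It uses plain 32-bit and 64-bit modular arithmetic and does not model how Section 3.3.1 encodes these fields on the wire. The class name DataSequenceMapping is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SSN_MOD = 2**32  # subflow sequence numbers are 32-bit
DSN_MOD = 2**64  # data sequence numbers are 64-bit


@dataclass(frozen=True)
class DataSequenceMapping:
    subflow_seq: int  # first subflow sequence number covered
    data_seq: int     # DSN of that first byte
    length: int       # number of bytes covered by the mapping

    def to_dsn(self, ssn: int) -> Optional[int]:
        """Return the DSN for subflow byte `ssn`, or None if unmapped."""
        offset = (ssn - self.subflow_seq) % SSN_MOD  # tolerates 32-bit wraparound
        if offset >= self.length:
            return None
        return (self.data_seq + offset) % DSN_MOD


m = DataSequenceMapping(subflow_seq=2**32 - 10, data_seq=1000, length=100)
m.to_dsn(2**32 - 10)  # 1000
m.to_dsn(5)           # 1015: subflow space wrapped, 15 bytes into the mapping
m.to_dsn(200)         # None: outside the 100-byte mapping
```

If the same bytes are resent on another subflow, that subflow simply carries its own mapping to the same DSN range.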
With MPTCP, all subflows share the same receive buffer and advertise
the same receive window. There are two levels of acknowledgment in
MPTCP. Regular TCP acknowledgments are used on each subflow to
acknowledge the reception of the segments sent over the subflow
independently of their DSN. In addition, there are connection-level
acknowledgments for the data sequence space. These acknowledgments
track the advancement of the bytestream and slide the receiving
window.
Further details are in Section 3.3.
   Host A                                  Host B
   ------                                  ------
   DATA_SEQUENCE_SIGNAL             ->
   [Data Sequence Mapping]
   [Data ACK]
   [Checksum]
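To illustrate the connection-level acknowledgment, the hypothetical receiver below buffers bytes by DSN, regardless of which subflow delivered them, and reports the next DSN it expects as the Data ACK. This Python sketch makes two simplifying assumptions: segments never partially overlap, and 64-bit DSN wraparound is ignored.

```python
class DataLevelReceiver:
    """Hypothetical connection-level reassembly buffer shared by all subflows."""

    def __init__(self, initial_dsn: int):
        self.data_ack = initial_dsn   # next DSN expected in order
        self.pending = {}             # DSN -> bytes received ahead of data_ack
        self.delivered = bytearray()  # in-order bytestream for the application

    def on_data(self, dsn: int, payload: bytes) -> int:
        """Accept payload mapped to `dsn` from any subflow; return the Data ACK."""
        if dsn >= self.data_ack:      # data below data_ack is a duplicate
            self.pending.setdefault(dsn, payload)
        while self.data_ack in self.pending:
            chunk = self.pending.pop(self.data_ack)
            self.delivered += chunk
            self.data_ack += len(chunk)
        return self.data_ack


rx = DataLevelReceiver(initial_dsn=1000)
first = rx.on_data(1005, b"world")   # arrives early on a fast subflow
second = rx.on_data(1000, b"hello")  # fills the gap from a slower subflow
```

Here `first` is 1000 because the bytes at DSN 1005 are held back, and `second` is 1010 once both chunks are in order. The subflow-level ACKs for these segments would have been sent independently.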
2.5. Requesting a Change in a Path's Priority
Hosts can indicate at initial subflow setup whether they wish the
subflow to be used as a regular or backup path -- a backup path only
being used if there are no regular paths available. During a
connection, Host A can request a change in the priority of a subflow
through the MP_PRIO signal to Host B. Further details are in
Section 3.3.8.
   Host A                                  Host B
   ------                                  ------
   MP_PRIO                          ->
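The backup semantics above suggest a simple selection rule, sketched here in Python: send on usable regular subflows, and use backup subflows only when no regular subflow is usable. The Subflow fields and the function eligible_subflows are hypothetical. Receiving MP_PRIO would simply update a subflow's backup flag.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    name: str
    backup: bool        # set at subflow setup or changed later via MP_PRIO
    usable: bool = True


def eligible_subflows(subflows):
    """Regular subflows if any are usable; otherwise the usable backups."""
    regular = [s for s in subflows if s.usable and not s.backup]
    if regular:
        return regular
    return [s for s in subflows if s.usable and s.backup]


flows = [Subflow("wifi", backup=False), Subflow("cellular", backup=True)]
first_choice = [s.name for s in eligible_subflows(flows)]  # ['wifi']
flows[0].usable = False                                    # regular path fails
fallback = [s.name for s in eligible_subflows(flows)]      # ['cellular']
```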
2.6. Closing an MPTCP Connection
When Host A wants to inform Host B that it has no more data to send,
it signals this "Data FIN" as part of the Data Sequence Signal (see
above). It has the same semantics and behavior as a regular TCP FIN,
but at the connection level. Once all the data on the MPTCP
connection has been successfully received, then this message is
acknowledged at the connection level with a DATA_ACK. Further
details are in Section 3.3.3.
   Host A                                  Host B
   ------                                  ------
   DATA_SEQUENCE_SIGNAL             ->
   [Data FIN]
                                    <-     (MPTCP DATA_ACK)
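As a sketch of this exchange, the hypothetical Python class below assumes that, like a TCP FIN, the Data FIN occupies one position in the 64-bit data sequence space. Under that assumption, the Data FIN is acknowledged by a DATA_ACK equal to its DSN plus one.

```python
DSN_MOD = 2**64


class DataFinSender:
    """Hypothetical sender-side tracking of a connection-level close."""

    def __init__(self, next_dsn: int):
        self.next_dsn = next_dsn  # DSN the next byte (or the Data FIN) would use
        self.fin_dsn = None       # DSN assigned to the Data FIN, once sent

    def send_data_fin(self) -> int:
        # Assumption: like a TCP FIN, the Data FIN consumes one DSN.
        self.fin_dsn = self.next_dsn
        self.next_dsn = (self.next_dsn + 1) % DSN_MOD
        return self.fin_dsn

    def fin_acked(self, data_ack: int) -> bool:
        """True once the peer's DATA_ACK covers the Data FIN itself."""
        return self.fin_dsn is not None and data_ack == self.next_dsn


s = DataFinSender(next_dsn=5000)
fin = s.send_data_fin()  # the Data FIN sits at DSN 5000
```

A DATA_ACK of 5000 only confirms the data before the Data FIN; the connection-level close is acknowledged once DATA_ACK reaches 5001.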
2.7. Notable Features
It is worth highlighting that MPTCP's signaling has been designed
with several key requirements in mind:
o To cope with NATs on the path, addresses are referred to by
Address IDs, in case the IP packet's source address gets changed
by a NAT. Setting up a new TCP flow is not possible if the
passive opener is behind a NAT; to allow subflows to be created
when either end is behind a NAT, MPTCP uses the ADD_ADDR message.
o MPTCP falls back to ordinary TCP if MPTCP operation is not
possible, for example, if one host is not MPTCP capable or if a
middlebox alters the payload.
o To meet the threats identified in [9], the following steps are
taken: keys are sent in the clear in the MP_CAPABLE messages;
MP_JOIN messages are secured with HMAC-SHA1 ([10], [4]) using
those keys; and standard TCP validity checks are made on the other
messages (ensuring sequence numbers are in-window).
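The in-window check mentioned in the last item can be sketched as a modular comparison over the 32-bit subflow sequence space. This simplified Python version tests a single sequence number against [rcv_nxt, rcv_nxt + rcv_wnd). Full TCP segment acceptability also accounts for segment length and zero windows. The function name in_window is hypothetical.

```python
SEQ_MOD = 2**32  # TCP sequence numbers wrap at 2**32


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if `seq` lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


ok = in_window(100, rcv_nxt=100, rcv_wnd=65535)            # next expected byte
wrapped = in_window(5, rcv_nxt=2**32 - 10, rcv_wnd=65535)  # window spans the wrap
stale = in_window(100, rcv_nxt=200, rcv_wnd=65535)         # already received
```

Here `ok` and `wrapped` are True and `stale` is False. An option arriving on a segment that fails this test would be ignored rather than acted upon.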