3.4. Address Knowledge Exchange (Path Management)
We use the term "path management" to refer to the exchange of
information about additional paths between hosts, which in this
design is managed by multiple addresses at hosts. For more detail of
the architectural thinking behind this design, see the MPTCP
Architecture document.
This design makes use of two methods of sharing such information, and
both can be used on a connection. The first is the direct setup of
new subflows, already described in Section 3.2, where the initiator
has an additional address. The second method, described in the
following subsections, signals addresses explicitly to the other host
to allow it to initiate new subflows. The two mechanisms are
complementary: the first is implicit and simple, while the second is
explicit and more complex but more robust. Together, the mechanisms allow
addresses to change in flight (and thus support operation through
NATs, since the source address need not be known), and also allow the
signaling of previously unknown addresses, and of addresses belonging
to other address families (e.g., both IPv4 and IPv6).
Here is an example of typical operation of the protocol:
o An MPTCP connection is initially set up between address/port A1 of
Host A and address/port B1 of Host B. If Host A is multihomed and
multiaddressed, it can start an additional subflow from its
address A2 to B1, by sending a SYN with a Join option from A2 to
B1, using B's previously declared token for this connection.
Alternatively, if B is multihomed, it can try to set up a new
subflow from B2 to A1, using A's previously declared token. In
either case, the SYN will be sent to the port already in use for
the original subflow on the receiving host.
o Simultaneously (or after a timeout), an ADD_ADDR option
(Section 3.4.1) is sent on an existing subflow, informing the
receiver of the sender's alternative address(es). The recipient
can use this information to open a new subflow to the sender's
additional address. In our example, A will send an ADD_ADDR option
informing B of address/port A2. The mix of using the SYN-based
option and the ADD_ADDR option, including timeouts, is
implementation specific and can be tailored to agree with local
policy.
o If subflow A2-B1 is successfully set up, Host B can use the
Address ID in the Join option to correlate this with the ADD_ADDR
option that will also arrive on an existing subflow; now B knows
not to open A2-B1, ignoring the ADD_ADDR. Otherwise, if B has not
received the A2-B1 MP_JOIN SYN but received the ADD_ADDR, it can
try to initiate a new subflow from one or more of its addresses to
address A2. This permits new sessions to be opened if one host is
behind a NAT.
Other ways of using the two signaling mechanisms are possible; for
instance, signaling addresses in other address families can only be
done explicitly using the Add Address option.
3.4.1. Address Advertisement
The Add Address (ADD_ADDR) TCP option announces additional addresses
(and optionally, ports) on which a host can be reached (Figure 12).
Multiple instances of this TCP option can be added in a single
message if there is sufficient TCP option space; otherwise, multiple
TCP messages containing this option will be sent. This option can be
used at any time during a connection, depending on when the sender
wishes to enable multiple paths and/or when paths become available.
As with all MPTCP signals, the receiver MUST undertake standard TCP
validity checks before acting upon it.
Every address has an Address ID that can be used for uniquely
identifying the address within a connection for address removal.
This is also used to identify MP_JOIN options (see Section 3.2)
relating to the same address, even when address translators are in
use. The Address ID MUST uniquely identify the address to the sender
(within the scope of the connection), but the mechanism for
allocating such IDs is implementation specific.
All address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be
stored by the receiver in a data structure that gathers all the
Address ID to address mappings for a connection (identified by a
token pair). In this way, there is a stored mapping between Address
ID, observed source address, and token pair for future processing of
control information for a connection. Note that an implementation
MAY discard incoming address advertisements at will, for example, for
avoiding the required mapping state, or because advertised addresses
are of no use to it (for example, IPv6 addresses when it has IPv4
only). Therefore, a host MUST treat address advertisements as soft
state, and it MAY choose to refresh advertisements periodically.
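The mapping store described above could be sketched as follows. This
is only an illustration: the class name AddressTable, its methods, the
use of a (local token, remote token) tuple as the key, and the
max_entries limit are all assumptions made for the example, not part
of the protocol.

```python
from typing import Dict, Optional, Tuple

TokenPair = Tuple[int, int]           # (local token, remote token); assumed key
Endpoint = Tuple[str, Optional[int]]  # (address, port or None)


class AddressTable:
    """Soft-state store of Address ID -> address mappings per connection."""

    def __init__(self, max_entries: int = 8) -> None:
        # max_entries is an assumed local limit; the text only says that
        # a host MAY discard incoming advertisements at will.
        self.max_entries = max_entries
        self._table: Dict[TokenPair, Dict[int, Endpoint]] = {}

    def learn(self, conn: TokenPair, addr_id: int, address: str,
              port: Optional[int] = None) -> bool:
        """Record a mapping learned via MP_JOIN or ADD_ADDR.

        Returns False when the advertisement is discarded.
        """
        entries = self._table.setdefault(conn, {})
        if addr_id not in entries and len(entries) >= self.max_entries:
            return False  # discarding is permitted: this is soft state
        entries[addr_id] = (address, port)
        return True

    def lookup(self, conn: TokenPair, addr_id: int) -> Optional[Endpoint]:
        """Return the stored endpoint, or None if the ID is unknown."""
        return self._table.get(conn, {}).get(addr_id)

    def forget(self, conn: TokenPair, addr_id: int) -> None:
        """Drop a mapping; unknown IDs are ignored."""
        self._table.get(conn, {}).pop(addr_id, None)
```

Because the store is soft state, a caller has to cope with lookup()
returning None for an ID that was advertised earlier.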
This option is shown in Figure 12. The illustration is sized for
IPv4 addresses (IPVer = 4). For IPv6, the IPVer field will read 6,
and the length of the address will be 16 octets (instead of 4).
The final 2 octets, specifying the TCP port number to use, are
optional, and their presence can be inferred from the length of the
option.
Although it is expected that the majority of use cases will use the
same port pairs as used for the initial subflow (e.g., port 80
remains port 80 on all subflows, as does the ephemeral port at the
client), there may be cases (such as port-based load balancing) where
the explicit specification of a different port is required. If no
port is specified, MPTCP SHOULD attempt to connect to the specified
address on the same port as is already in use by the subflow on which
the ADD_ADDR signal was sent; this is discussed in more detail in
Section 3.8.1.
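                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+
|     Kind      |     Length    |Subtype| IPVer |  Address ID   |
+---------------+---------------+-------+-------+---------------+
|          Address (IPv4 - 4 octets / IPv6 - 16 octets)         |
+-------------------------------+-------------------------------+
|   Port (2 octets, optional)   |
+-------------------------------+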
Figure 12: Add Address (ADD_ADDR) Option
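As an illustration of the layout in Figure 12, the following Python
sketch builds and parses an ADD_ADDR option. The option kind (30) and
the ADD_ADDR subtype (0x3) are not stated in this section and are used
here as assumptions; the function names are hypothetical.

```python
import ipaddress
import struct
from typing import Optional, Tuple

MPTCP_KIND = 30         # assumed TCP option kind for MPTCP
ADD_ADDR_SUBTYPE = 0x3  # assumed subtype value for ADD_ADDR


def build_add_addr(addr_id: int, address: str,
                   port: Optional[int] = None) -> bytes:
    """Encode an ADD_ADDR option; the port is appended only if given."""
    ip = ipaddress.ip_address(address)
    body = ip.packed
    if port is not None:
        body += struct.pack("!H", port)
    length = 4 + len(body)  # Kind, Length, Subtype/IPVer, Address ID, body
    header = struct.pack("!BBBB", MPTCP_KIND, length,
                         (ADD_ADDR_SUBTYPE << 4) | ip.version, addr_id)
    return header + body


def parse_add_addr(option: bytes) -> Tuple[int, str, Optional[int]]:
    """Decode an ADD_ADDR option into (Address ID, address, port)."""
    if len(option) < 4:
        raise ValueError("option too short")
    kind, length, sub_ver, addr_id = struct.unpack("!BBBB", option[:4])
    if (kind != MPTCP_KIND or sub_ver >> 4 != ADD_ADDR_SUBTYPE
            or length != len(option)):
        raise ValueError("not a well-formed ADD_ADDR option")
    ip_ver = sub_ver & 0x0F
    addr_len = {4: 4, 6: 16}.get(ip_ver)
    if addr_len is None or length not in (4 + addr_len, 6 + addr_len):
        raise ValueError("unexpected IPVer or option length")
    raw = option[4:4 + addr_len]
    ip = ipaddress.IPv4Address(raw) if ip_ver == 4 else ipaddress.IPv6Address(raw)
    # The port is present only when the length leaves room for 2 octets.
    port = None
    if length == 6 + addr_len:
        port = struct.unpack("!H", option[4 + addr_len:])[0]
    return addr_id, str(ip), port
```

With these assumptions, an IPv4 advertisement without a port occupies
8 octets and an IPv6 advertisement with a port occupies 22 octets.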
Due to the proliferation of NATs, it is reasonably likely that one
host may attempt to advertise private addresses. It is not
desirable to prohibit this, since there may be cases where both hosts
have additional interfaces on the same private network, and a host
MAY want to advertise such addresses. The MP_JOIN handshake to
create a new subflow (Section 3.2) provides mechanisms to minimize
security risks. The MP_JOIN message contains a 32-bit token that
uniquely identifies the connection to the receiving host. If the
token is unknown, the host will return with a RST. In the unlikely
event that the token is known, subflow setup will continue, but the
HMAC exchange must occur for authentication. This will fail, and
will provide sufficient protection against two unconnected hosts
accidentally setting up a new subflow upon the signal of a private
address. Further security considerations around the issue of
ADD_ADDR messages that accidentally misdirect, or maliciously direct,
new MP_JOIN attempts are discussed in Section 5.
Ideally, ADD_ADDR and REMOVE_ADDR options would be sent reliably, and
in order, to the other end. This would ensure that address
management does not unnecessarily cause an outage in the connection
when remove and add operations are processed in reverse order, and
that all possible paths are used. Note, however, that losing
reliability and ordering will not break the multipath connection; it
will just reduce the opportunity to open multiple paths and to
survive different patterns of path failures.
Therefore, implementing reliability signals for these TCP options is
not necessary. In order to minimize the impact of the loss of these
options, however, it is RECOMMENDED that a sender send these
options on all available subflows. If these options need to be
received in order, an implementation SHOULD only send one ADD_ADDR/
REMOVE_ADDR option per RTT, to minimize the risk of misordering.
A host can send an ADD_ADDR message with an already assigned Address
ID, but the Address MUST be the same as previously assigned to this
Address ID, and the Port MUST be different from one already in use
for this Address ID. If these conditions are not met, the receiver
SHOULD silently ignore the ADD_ADDR. A host wishing to replace an
existing Address ID MUST first remove the existing one
(Section 3.4.2).
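A receiver-side check for these conditions might look like the
following sketch. The dictionary layout and the function name are
hypothetical, and deciding which ports count as "in use" is left to
the caller that maintains the set.

```python
from typing import Dict, Optional, Set, Tuple

# addr_id -> (address, ports in use for that ID); None stands for
# "no port given". This layout is an assumption made for the sketch.
Known = Dict[int, Tuple[str, Set[Optional[int]]]]


def accept_add_addr(known: Known, addr_id: int, address: str,
                    port: Optional[int] = None) -> bool:
    """Return True if the ADD_ADDR is accepted, False if it is ignored."""
    if addr_id not in known:
        known[addr_id] = (address, {port})
        return True
    prev_address, ports_in_use = known[addr_id]
    if address != prev_address or port in ports_in_use:
        return False  # conditions not met: ignore silently, send nothing
    ports_in_use.add(port)
    return True
```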
A host that receives an ADD_ADDR but finds that connection setup to
that IP address and port number is unsuccessful SHOULD NOT perform
further connection attempts to this address/port combination for this
connection. A sender that wants to trigger a new incoming connection
attempt on a previously advertised address/port combination can
therefore refresh ADD_ADDR information by sending the option again.
During normal MPTCP operation, it is unlikely that there will be
sufficient TCP option space for ADD_ADDR to be included along with
those for data sequence numbering (Section 3.3.1). Therefore, it is
expected that an MPTCP implementation will send the ADD_ADDR option
on separate ACKs. As discussed earlier, however, an MPTCP
implementation MUST NOT treat duplicate ACKs with any MPTCP option,
with the exception of the DSS option, as indications of congestion,
and an MPTCP implementation SHOULD NOT send more than two
duplicate ACKs in a row for signaling purposes.
3.4.2. Remove Address
If, during the lifetime of an MPTCP connection, a previously
announced address becomes invalid (e.g., if the interface
disappears), the affected host SHOULD announce this so that the peer
can remove subflows related to this address.
This is achieved through the Remove Address (REMOVE_ADDR) option
(Figure 13), which will remove a previously added address (or list of
addresses) from a connection and terminate any subflows currently
using that address.
For security purposes, if a host receives a REMOVE_ADDR option, it
must ensure the affected path(s) are no longer in use before it
instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger
the sending of a TCP keepalive on the path, and if a response is
received the path SHOULD NOT be removed. Typical TCP validity tests
on the subflow (e.g., ensuring sequence and ACK numbers are correct)
MUST also be undertaken. An implementation can use indications of
these test failures as part of intrusion detection or error logging.
The sending and receipt (if no keepalive response was received) of
this message SHOULD trigger the sending of RSTs by both hosts on the
affected subflow(s) (if possible), as a courtesy to allow middlebox
state to be cleaned up, before cleaning up any local state.
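The receive-side handling can be sketched as below. The function name,
the string subflow labels, and the keepalive_answered callback are
hypothetical stand-ins for whatever an implementation uses to probe a
path; the TCP validity tests mentioned above are assumed to have
passed already.

```python
from typing import Callable, Dict, Iterable, List


def handle_remove_addr(addr_ids: Iterable[int],
                       subflows_by_addr_id: Dict[int, List[str]],
                       keepalive_answered: Callable[[str], bool]) -> List[str]:
    """Return the subflows that should be reset after a REMOVE_ADDR."""
    to_reset: List[str] = []
    for addr_id in addr_ids:
        # An Address ID with no known subflows contributes nothing.
        for subflow in subflows_by_addr_id.get(addr_id, []):
            # Probe first: a keepalive response means the path is still
            # alive, so the subflow is left in place.
            if not keepalive_answered(subflow):
                to_reset.append(subflow)
    return to_reset
```

Only the subflows returned here would be sent a RST; local state is
cleaned up afterwards.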
Address removal is undertaken by ID, so as to permit the use of NATs
and other middleboxes that rewrite source addresses. If there is no
address at the requested ID, the receiver will silently ignore the
request.
A subflow that is still functioning MUST be closed with a FIN
exchange as in regular TCP, rather than using this option. For more
information, see Section 3.3.3.
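                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-------+---------------+
|     Kind      |  Length = 3+n |Subtype|(resvd)|   Address ID  | ...
+---------------+---------------+-------+-------+---------------+
                           (followed by n-1 Address IDs, if required)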
Figure 13: Remove Address (REMOVE_ADDR) Option
3.5. Fast Close
Regular TCP has the means of sending a reset (RST) signal to abruptly
close a connection. With MPTCP, the RST only has the scope of the
subflow and will only close the concerned subflow but not affect the
remaining subflows. MPTCP's connection will stay alive at the data
level, in order to permit break-before-make handover between
subflows. It is therefore necessary to provide an MPTCP-level
"reset" to allow the abrupt closure of the whole MPTCP connection,
and this is the MP_FASTCLOSE option.
MP_FASTCLOSE is used to indicate to the peer that the connection will
be abruptly closed and no data will be accepted anymore. The reasons
for triggering an MP_FASTCLOSE are implementation specific. Regular
TCP does not allow sending a RST while the connection is in a
synchronized state. Nevertheless, implementations allow the
sending of a RST in this state, if, for example, the operating system
is running out of resources. In these cases, MPTCP should send the
MP_FASTCLOSE. This option is illustrated in Figure 14.
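                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+-----------------------+
|     Kind      |    Length     |Subtype|      (reserved)       |
+---------------+---------------+-------+-----------------------+
|                     Option Receiver's Key                     |
|                           (64 bits)                           |
+---------------------------------------------------------------+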
Figure 14: Fast Close (MP_FASTCLOSE) Option
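A possible encoding of the option in Figure 14 is sketched below. The
option kind (30), the MP_FASTCLOSE subtype (0x7), and the total length
of 12 octets (derived from the field sizes in the figure) are
assumptions for the example, as are the function names.

```python
import struct

MPTCP_KIND = 30             # assumed TCP option kind for MPTCP
MP_FASTCLOSE_SUBTYPE = 0x7  # assumed subtype value for MP_FASTCLOSE
MP_FASTCLOSE_LENGTH = 12    # 4 header octets plus the 64-bit key


def build_mp_fastclose(receiver_key: int) -> bytes:
    """Encode MP_FASTCLOSE carrying the key of the option's receiver."""
    # 4-bit subtype followed by 12 reserved bits, sent as zero.
    subtype_and_reserved = MP_FASTCLOSE_SUBTYPE << 12
    return struct.pack("!BBHQ", MPTCP_KIND, MP_FASTCLOSE_LENGTH,
                       subtype_and_reserved, receiver_key)


def fastclose_key_matches(option: bytes, local_key: int) -> bool:
    """Check that a received MP_FASTCLOSE carries this host's own key."""
    if len(option) != MP_FASTCLOSE_LENGTH:
        return False
    kind, length, sub_res, key = struct.unpack("!BBHQ", option)
    return (kind == MPTCP_KIND and length == MP_FASTCLOSE_LENGTH
            and sub_res >> 12 == MP_FASTCLOSE_SUBTYPE and key == local_key)
```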
If Host A wants to force the closure of an MPTCP connection, the
MPTCP Fast Close procedure is as follows:
o Host A sends an ACK containing the MP_FASTCLOSE option on one
subflow, containing the key of Host B as declared in the initial
connection handshake. On all the other subflows, Host A sends a
regular TCP RST to close these subflows, and tears them down.
Host A now enters FASTCLOSE_WAIT state.
o Upon receipt of an MP_FASTCLOSE, containing the valid key, Host B
answers on the same subflow with a TCP RST and tears down all
subflows. Host B can now close the whole MPTCP connection (it
transitions directly to CLOSED state).
o As soon as Host A has received the TCP RST on the remaining
subflow, it can close this subflow and tear down the whole
connection (transition from FASTCLOSE_WAIT to CLOSED states). If
Host A receives an MP_FASTCLOSE instead of a TCP RST, both hosts
attempted fast closure simultaneously. Host A should reply with a
TCP RST and tear down the connection.
o If Host A does not receive a TCP RST in reply to its MP_FASTCLOSE
after one retransmission timeout (RTO) (the RTO of the subflow
where the MP_FASTCLOSE has been sent), it SHOULD retransmit the
MP_FASTCLOSE. The number of retransmissions SHOULD be limited to
prevent this connection from being retained for a long time, but
this limit is implementation specific. A RECOMMENDED number is 3.
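The procedure above, seen from Host A, can be traced with a small
Python sketch. The event labels "RST", "MP_FASTCLOSE", and "TIMEOUT"
and the log strings are hypothetical, and closing locally once the
retransmission limit is reached is an assumption of this sketch; the
text only says that the number of retransmissions should be limited.

```python
from typing import Iterable, List

RECOMMENDED_RETRANSMISSIONS = 3  # the limit suggested in the text


def fast_close(replies: Iterable[str],
               limit: int = RECOMMENDED_RETRANSMISSIONS) -> List[str]:
    """Trace Host A's actions from sending MP_FASTCLOSE to CLOSED.

    `replies` yields what arrives within each RTO on the subflow that
    carries the MP_FASTCLOSE.
    """
    log = ["send MP_FASTCLOSE", "RST other subflows", "state FASTCLOSE_WAIT"]
    retransmissions = 0
    for reply in replies:
        if reply == "RST":
            break  # the peer answered; the last subflow can be closed
        if reply == "MP_FASTCLOSE":
            log.append("send RST")  # both hosts attempted fast closure
            break
        if retransmissions == limit:
            break  # assumed: give up instead of retaining the connection
        retransmissions += 1
        log.append("retransmit MP_FASTCLOSE")
    log.append("state CLOSED")
    return log
```

For example, a single timeout followed by a RST yields one
retransmission before the transition to CLOSED.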
3.6. Fallback
Sometimes, middleboxes will exist on a path that could prevent the
operation of MPTCP. MPTCP has been designed in order to cope with
many middlebox modifications (see Section 6), but there are still
some cases where a subflow could fail to operate within the MPTCP
requirements. These cases are notably the following: the loss of TCP
options on a path and the modification of payload data. If such an
event occurs, it is necessary to "fall back" to the previous, safe
operation. This may be either falling back to regular TCP or
removing a problematic subflow.
At the start of an MPTCP connection (i.e., the first subflow), it is
important to ensure that the path is fully MPTCP capable and the
necessary TCP options can reach each host. The handshake as
described in Section 3.1 SHOULD fall back to regular TCP if either of
the SYN messages does not have the MPTCP options: this is the same, and
desired, behavior in the case where a host is not MPTCP capable, or
the path does not support the MPTCP options. When attempting to join
an existing MPTCP connection (Section 3.2), if a path is not MPTCP
capable and the TCP options do not get through on the SYNs, the
subflow will be closed according to the MP_JOIN logic.
There is, however, another corner case that should be addressed.
That is one of MPTCP options getting through on the SYN, but not on
regular packets. This can be resolved if the subflow is the first
subflow, and thus all data in flight is contiguous, using the
following rules.
A sender MUST include a DSS option with data sequence mapping in
every segment until one of the sent segments has been acknowledged
with a DSS option containing a Data ACK. Upon reception of the
acknowledgment, the sender has the confirmation that the DSS option
passes in both directions and may choose to send fewer DSS options
than once per segment.
If, however, an ACK is received for data (not just for the SYN)
without a DSS option containing a Data ACK, the sender determines the
path is not MPTCP capable. In the case of this occurring on an
additional subflow (i.e., one started with MP_JOIN), the host MUST
close the subflow with a RST. In the case of the first subflow
(i.e., that started with MP_CAPABLE), it MUST drop out of an MPTCP
mode back to regular TCP. The sender will send one final data
sequence mapping, with the Data-Level Length value of 0 indicating an
infinite mapping (in case the path drops options in one direction
only), and then revert to sending data on the single subflow without
any MPTCP options.
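The decision a sender makes on the first ACK for data can be
summarized in a short sketch. The enum and function names are
hypothetical; only the three outcomes come from the rules above.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue MPTCP operation"
    RESET_SUBFLOW = "close the subflow with a RST"
    FALL_BACK = "send an infinite mapping, then continue as regular TCP"


# A Data-Level Length of 0 in the final mapping marks it as infinite.
INFINITE_MAPPING_LENGTH = 0


def on_data_ack(is_first_subflow: bool, ack_has_dss_data_ack: bool) -> Action:
    """Decide how the sender reacts to an ACK for data (not just the SYN)
    that arrives before the path is confirmed as MPTCP capable."""
    if ack_has_dss_data_ack:
        return Action.CONTINUE       # DSS passes in both directions
    if is_first_subflow:
        return Action.FALL_BACK      # subflow started with MP_CAPABLE
    return Action.RESET_SUBFLOW      # subflow started with MP_JOIN
```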
Note that this rule essentially prohibits the sending of data on the
third packet of an MP_CAPABLE or MP_JOIN handshake, since both that
option and a DSS cannot fit in TCP option space. If the initiator is
to send first, another segment must be sent that contains the data
and DSS. Note also that an additional subflow cannot be used until
the initial path has been verified as MPTCP capable.
These rules should cover all cases where such a failure could happen:
whether it's on the forward or reverse path and whether the server or
the client first sends data. If lost options on data packets occur
on any other subflow apart from the initial subflow, it should be
treated as a standard path failure. The data would not be DATA_ACKed
(since there is no mapping for the data), and the subflow can be
closed with a RST.
The case described above is a specialized case of fallback, for when
the lack of MPTCP support is detected before any data is acknowledged
at the connection level on a subflow. More generally, fallback
(either closing a subflow, or to regular TCP) can become necessary at
any point during a connection if a non-MPTCP-aware middlebox changes
the data stream.
As described in Section 3.3, each portion of data for which there is
a mapping is protected by a checksum. This mechanism is used to
detect if middleboxes have made any adjustments to the payload
(added, removed, or changed data). A checksum will fail if the data
has been changed in any way. This will also detect if the length of
data on the subflow is increased or decreased, and this means the
data sequence mapping is no longer valid. The sender no longer knows
what subflow-level sequence number the receiver is genuinely
operating at (the middlebox will be faking ACKs in return), and it
cannot signal any further mappings. Furthermore, in addition to the
possibility of payload modifications that are valid at the
application layer, there is the possibility that false positives
could be hit across MPTCP segment boundaries, corrupting the data.
Therefore, all data from the start of the segment that failed the
checksum onwards is not trustworthy.
When multiple subflows are in use, the data in flight on a subflow
will likely involve data that is not contiguously part of the
connection-level stream, since segments will be spread across the
multiple subflows. Due to the problems identified above, it is not
possible to determine what the adjustment has done to the data
(notably, any changes to the subflow sequence numbering). Therefore,
it is not possible to recover the subflow, and the affected subflow
must be immediately closed with a RST, featuring an MP_FAIL option
(Figure 15), which defines the data sequence number at the start of
the segment (defined by the data sequence mapping) that had the
checksum failure. Note that the MP_FAIL option requires the use of
the full 64-bit sequence number, even if 32-bit sequence numbers are
normally in use in the DSS signals on the path.
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| Kind | Length=12 |Subtype| (reserved) |
| Data Sequence Number (8 octets) |
Figure 15: Fallback (MP_FAIL) Option
The receiver MUST discard all data following the data sequence number
specified. Failed data MUST NOT be DATA_ACKed and so will be
retransmitted on other subflows (Section 3.3.6).
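A sketch of the MP_FAIL encoding and of the discard rule follows. The
option kind (30) and the MP_FAIL subtype (0x6) are assumptions not
stated in this section. The discard_from helper is hypothetical and
simplifies the receive buffer to a dictionary keyed by the data
sequence number at which each buffered segment starts.

```python
import struct
from typing import Dict

MPTCP_KIND = 30        # assumed TCP option kind for MPTCP
MP_FAIL_SUBTYPE = 0x6  # assumed subtype value for MP_FAIL


def build_mp_fail(dsn: int) -> bytes:
    """Encode MP_FAIL with the full 64-bit data sequence number."""
    # 4-bit subtype followed by 12 reserved bits, sent as zero.
    return struct.pack("!BBHQ", MPTCP_KIND, 12, MP_FAIL_SUBTYPE << 12,
                       dsn & 0xFFFFFFFFFFFFFFFF)


def discard_from(buffered: Dict[int, bytes], fail_dsn: int) -> Dict[int, bytes]:
    """Keep only segments that start before the failed sequence number."""
    return {dsn: data for dsn, data in buffered.items() if dsn < fail_dsn}
```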
A special case is when there is a single subflow and it fails with a
checksum error. If it is known that all unacknowledged data in
flight is contiguous (which will usually be the case with a single
subflow), an infinite mapping can be applied to the subflow without
the need to close it first, and essentially turn off all further
MPTCP signaling. In this case, if a receiver identifies a checksum
failure when there is only one path, it will send back an MP_FAIL
option on the subflow-level ACK, referring to the data-level sequence
number of the start of the segment on which the checksum error was
detected. The sender will receive this, and if all unacknowledged
data in flight is contiguous, will signal an infinite mapping. This
infinite mapping will be a DSS option (Section 3.3) on the first new
packet, containing a data sequence mapping that acts retroactively,
referring to the start of the subflow sequence number of the last
segment that was known to be delivered intact. From that point
onwards, data can be altered by a middlebox without affecting MPTCP,
as the data stream is equivalent to a regular, legacy TCP session.
In the rare case that the data is not contiguous (which could happen
when there is only one subflow but it is retransmitting data from a
subflow that has recently been uncleanly closed), the receiver MUST
close the subflow with a RST with MP_FAIL. The receiver MUST discard
all data that follows the data sequence number specified. The sender
MAY attempt to create a new subflow belonging to the same connection,
and, if it chooses to do so, SHOULD place the single subflow
immediately in single-path mode by setting an infinite data sequence
mapping. This mapping will begin from the data-level sequence number
that was declared in the MP_FAIL.
After a sender signals an infinite mapping, it MUST only use subflow
ACKs to clear its send buffer. This is because Data ACKs may become
misaligned with the subflow ACKs when middleboxes insert or delete
data. The receiver SHOULD stop generating Data ACKs after it receives
an infinite mapping.
When a connection has fallen back, only one subflow can send data;
otherwise, the receiver would not know how to reorder the data. In
practice, this means that all MPTCP subflows will have to be
terminated except one. Once MPTCP falls back to regular TCP, it MUST
NOT revert to MPTCP later in the connection.
It should be emphasized that we are not attempting to prevent the use
of middleboxes that want to adjust the payload. An MPTCP-aware
middlebox could provide such functionality by also rewriting
checksums.
3.7. Error Handling
In addition to the fallback mechanism as described above, the
standard classes of TCP errors may need to be handled in an MPTCP-
specific way. Note that changing semantics -- such as the relevance
of a RST -- are covered in Section 4. Where possible, we do not want
to deviate from regular TCP behavior.
The following list covers possible errors and the appropriate MPTCP
behavior:
o Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or
missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's
behavior on an unknown port)
o DSN out of window (during normal operation): drop the data, do not
send Data ACKs
o Remove request for unknown address ID: silently ignore
3.8. Heuristics
There are a number of heuristics that are needed for performance or
deployment but that are not required for protocol correctness. In
this section, we detail such heuristics. Note that discussion of
buffering and certain sender and receiver window behaviors is
presented in Sections 3.3.4 and 3.3.5, as well as retransmission in
Section 3.3.6.
3.8.1. Port Usage
Under typical operation, an MPTCP implementation SHOULD use the same
ports as already in use. In other words, the destination port of a
SYN containing an MP_JOIN option SHOULD be the same as the remote
port of the first subflow in the connection. The local port for such
SYNs SHOULD also be the same as for the first subflow (and as such,
an implementation SHOULD reserve ephemeral ports across all local IP
addresses), although there may be cases where this is infeasible.
This strategy is intended to maximize the probability of the SYN
being permitted by a firewall or NAT at the recipient and to avoid
confusing any network monitoring software.
There may also be cases, however, where the passive opener wishes to
signal to the other host that a specific port should be used, and
this facility is provided in the Add Address option as documented in
Section 3.4.1. It is therefore feasible to allow multiple subflows
between the same two addresses but using different port pairs, and
such a facility could be used to allow load balancing within the
network based on 5-tuples (e.g., some ECMP implementations).
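The port selection described in this subsection amounts to a small
rule, sketched here with a hypothetical function name and parameter
names.

```python
from typing import Optional, Tuple


def join_ports(first_subflow_local_port: int,
               first_subflow_remote_port: int,
               advertised_port: Optional[int] = None) -> Tuple[int, int]:
    """Return (local port, destination port) for an MP_JOIN SYN."""
    # Use the port signaled in ADD_ADDR when there is one; otherwise
    # reuse the remote port of the first subflow.
    if advertised_port is not None:
        destination = advertised_port
    else:
        destination = first_subflow_remote_port
    return first_subflow_local_port, destination
```

Reusing the local port presumes that the implementation reserved the
ephemeral port across all local addresses, which, as noted above, may
not always be feasible.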
3.8.2. Delayed Subflow Start
Many TCP connections are short-lived and consist only of a few
segments, and so the overheads of using MPTCP outweigh any benefits.
A heuristic is required, therefore, to decide when to start using
additional subflows in an MPTCP connection. We expect that
experience gathered from deployments will provide further guidance on
this, and will be affected by particular application characteristics
(which are likely to change over time). However, a suggested
general-purpose heuristic that an implementation MAY choose to employ
is as follows. Results from experimental deployments are needed in
order to verify the correctness of this proposal.
If a host has data buffered for its peer (which implies that the
application has received a request for data), the host opens one
subflow for each initial window's worth of data that is buffered.
Consideration should also be given to limiting the rate of adding new
subflows, as well as limiting the total number of subflows open for a
particular connection. A host may choose to vary these values based
on its load or knowledge of traffic and path characteristics.
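One possible reading of this heuristic is sketched below: the target
number of subflows is the number of whole initial windows buffered,
capped by a local limit. The function name, the default cap of 4, and
the choice to treat the result as a total rather than an increment are
assumptions; rate limiting of subflow creation is not modeled.

```python
def subflows_to_open(buffered_bytes: int, initial_window_bytes: int,
                     open_subflows: int, max_subflows: int = 4) -> int:
    """Return how many additional subflows to open right now."""
    if initial_window_bytes <= 0:
        raise ValueError("initial window must be positive")
    # One subflow per initial window's worth of buffered data, with the
    # existing first subflow counting towards the target.
    target = max(1, buffered_bytes // initial_window_bytes)
    target = min(target, max_subflows)  # cap on total subflows
    return max(0, target - open_subflows)
```

A host could lower max_subflows under load, in line with the remark
that these values may vary with load or path knowledge.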
Note that this heuristic alone is probably insufficient. Traffic for
many common applications, such as downloads, is highly asymmetric and
the host that is multihomed may well be the client that will never
fill its buffers, and thus never use MPTCP. Advanced APIs that allow
an application to signal its traffic requirements would aid in these
decisions.
An additional time-based heuristic could be applied, opening
additional subflows after a given period of time has passed. This
would alleviate the above issue, and also provide resilience for low-
bandwidth but long-lived applications.
This section has shown some of the considerations that an implementer
should give when developing MPTCP heuristics, but is not intended to
be prescriptive.
3.8.3. Failure Handling
Requirements for MPTCP's handling of unexpected signals have been
given in Section 3.7. There are other failure cases, however, where
hosts can choose appropriate behavior.
For example, Section 3.1 suggests that a host SHOULD fall back to
trying regular TCP SYNs after one or more failures of MPTCP SYNs for
a connection. A host may keep a system-wide cache of such
information, so that it can back off from using MPTCP, firstly for
that particular destination host, and eventually on a whole
interface, if MPTCP connections continue failing.
Another failure could occur when the MP_JOIN handshake fails.
Section 3.7 specifies that an incorrect handshake MUST lead to the
subflow being closed with a RST. A host operating an active
intrusion detection system may choose to start blocking MP_JOIN
packets from the source host if multiple failed MP_JOIN attempts are
seen. From the connection initiator's point of view, if an MP_JOIN
fails, it SHOULD NOT attempt to connect to the same IP address and
port during the lifetime of the connection, unless the other host
refreshes the information with another ADD_ADDR option. Note that
the ADD_ADDR option is informational only, and does not guarantee the
other host will attempt a connection.
In addition, an implementation may learn, over a number of
connections, that certain interfaces or destination addresses
consistently fail and may default to not trying to use MPTCP for
these. Behavior could also be learned for particularly badly
performing subflows or subflows that regularly fail during use, in
order to temporarily choose not to use these paths.
4. Semantic Issues
In order to support multipath operation, the semantics of some TCP
components have changed. To aid clarity, this section collects these
semantic changes as a reference.
Sequence number: The (in-header) TCP sequence number is specific to
the subflow. To allow the receiver to reorder application data,
an additional data-level sequence space is used. In this data-
level sequence space, the initial SYN and the final DATA_FIN
occupy 1 octet of sequence space. There is an explicit mapping of
data sequence space to subflow sequence space, which is signaled
through TCP options in data packets.
ACK: The ACK field in the TCP header acknowledges only the subflow
sequence number, not the data-level sequence space.
Implementations SHOULD NOT attempt to infer a data-level
acknowledgment from the subflow ACKs. This separates subflow- and
connection-level processing at an end host.
Duplicate ACK: A duplicate ACK that includes any MPTCP signaling
(with the exception of the DSS option) MUST NOT be treated as a
signal of congestion. To limit the chances of non-MPTCP-aware
entities mistakenly interpreting duplicate ACKs as a signal of
congestion, MPTCP SHOULD NOT send more than two duplicate ACKs
containing (non-DSS) MPTCP signals in a row.
Receive Window: The receive window in the TCP header indicates the
amount of free buffer space for the whole data-level connection
(as opposed to for this subflow) that is available at the
receiver. This is the same semantics as regular TCP, but to
maintain these semantics the receive window must be interpreted at
the sender as relative to the sequence number given in the
DATA_ACK rather than the subflow ACK in the TCP header. In this
way, the original flow control role is preserved. Note that some
middleboxes may change the receive window, and so a host SHOULD
use the maximum value of those recently seen on the constituent
subflows for the connection-level receive window, and also needs
to maintain a subflow-level window for subflow-level processing.
FIN: The FIN flag in the TCP header applies only to the subflow it
is sent on, not to the whole connection. For connection-level FIN
semantics, the DATA_FIN option is used.
RST: The RST flag in the TCP header applies only to the subflow it
is sent on, not to the whole connection. The MP_FASTCLOSE option
provides the fast close functionality of a RST at the MPTCP
connection level.
Address List: Address list management (i.e., knowledge of the local
and remote hosts' lists of available IP addresses) is handled on a
per-connection basis (as opposed to per subflow, per host, or per
pair of communicating hosts). This permits the application of
per-connection local policy. Adding an address to one connection
(either explicitly through an Add Address message, or implicitly
through a Join) has no implication for other connections between
the same pair of hosts.
5-tuple: The 5-tuple (protocol, local address, local port, remote
address, remote port) presented by kernel APIs to the application
layer in a non-multipath-aware application is that of the first
subflow, even if the subflow has since been closed and removed
from the connection. This decision, and other related API issues,
are discussed in more detail in .
5. Security Considerations
As identified in , the addition of multipath capability to TCP
will bring with it a number of new classes of threat. In order to
prevent these,  presents a set of requirements for a security
solution for MPTCP. The fundamental goal is for the security of
MPTCP to be "no worse" than regular TCP today, and the key security
requirements are:
o Provide a mechanism to confirm that the parties in a subflow
handshake are the same as in the original connection setup.
o Provide verification that the peer can receive traffic at a new
address before using it as part of a connection.
o Provide replay protection, i.e., ensure that a request to add/
remove a subflow is 'fresh'.
In order to achieve these goals, MPTCP includes a hash-based
handshake algorithm documented in Sections 3.1 and 3.2.
The security of the MPTCP connection hangs on the use of keys that
are shared once at the start of the first subflow, and are never sent
again over the network (unless used in the fast close mechanism,
Section 3.5). To ease demultiplexing while not giving away any
cryptographic material, future subflows use a truncated cryptographic
hash of this key as the connection identification "token". The keys
are concatenated and used as keys for creating Hash-based Message
Authentication Codes (HMACs) used on subflow setup, in order to
verify that the parties in the handshake are the same as in the
original connection setup. The HMAC exchange also verifies that the
peer can receive traffic at this new address. Replay attacks would
still be possible when only keys are used; therefore, the handshakes
use single-use random numbers (nonces) at both ends -- this ensures
the HMAC will never be the same on two handshakes. Guidance on
generating random numbers suitable for use as keys is given in 
and discussed in Section 3.1.
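The relationship between keys, token, and HMAC described above can be sketched as follows. This is an illustration under stated assumptions, not the normative algorithm: it assumes SHA-1 as the hash (the HMAC-SHA1 handshake algorithm listed in Table 3), a token formed from the first 32 bits of the hash, and a simple local-then-remote ordering for keys and nonces. Sections 3.1 and 3.2 define the exact formats. The function names are hypothetical.

```python
import hashlib
import hmac
import os


def token_from_key(key: bytes) -> int:
    """Connection token: a truncated hash of the key.

    Assumption for this sketch: the first 32 bits of SHA-1(key).
    """
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big")


def join_hmac(local_key: bytes, remote_key: bytes,
              local_nonce: bytes, remote_nonce: bytes) -> bytes:
    """HMAC keyed by the concatenated keys, computed over both nonces."""
    mac_key = local_key + remote_key
    message = local_nonce + remote_nonce
    return hmac.new(mac_key, message, hashlib.sha1).digest()


# Keys are exchanged once, on the first subflow.
key_a, key_b = os.urandom(8), os.urandom(8)

# Later subflows identify the connection by token, never by the key itself.
token_b = token_from_key(key_b)

# Fresh nonces on every handshake make each HMAC single-use.
mac_first = join_hmac(key_a, key_b, os.urandom(4), os.urandom(4))
mac_second = join_hmac(key_a, key_b, os.urandom(4), os.urandom(4))
```

When checking a received HMAC, a constant-time comparison such as `hmac.compare_digest` is preferable to `==`.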
The use of crypto capability bits in the initial connection handshake
to negotiate use of a particular algorithm allows the deployment of
additional crypto mechanisms in the future. Note that this would be
susceptible to bid-down attacks only if the attacker was on-path (and
thus would be able to modify the data anyway). The security
mechanism presented in this document should therefore protect against
all forms of flooding and hijacking attacks discussed in .
During normal operation, regular TCP protection mechanisms (such as
ensuring sequence numbers are in-window) will provide the same level
of protection against attacks on individual TCP subflows as exists
for regular TCP today. Implementations will introduce additional
buffers compared to regular TCP, to reassemble data at the connection
level. The application of window sizing will minimize the risk of
denial-of-service attacks consuming resources.
As discussed in Section 3.4.1, a host may advertise its private
addresses, but these might point to different hosts in the receiver's
network. The MP_JOIN handshake (Section 3.2) will ensure that this
does not succeed in setting up a subflow to the incorrect host.
However, it could still create unwanted TCP handshake traffic. This
feature of MPTCP could be a target for denial-of-service exploits,
with malicious participants in MPTCP connections encouraging the
recipient to target other hosts in the network. Therefore,
implementations should consider heuristics (Section 3.8) at both the
sender and receiver to reduce the impact of this.
A small security risk could theoretically exist with key reuse, but
in order to accomplish a replay attack, both the sender and receiver
keys, and the sender and receiver random numbers, in the MP_JOIN
handshake (Section 3.2) would have to match.
Whilst this specification defines a "medium" security solution,
meeting the criteria specified at the start of this section and the
threat analysis, since attacks only ever get worse, it is
likely that a future Standards Track version of MPTCP would need to
be able to support stronger security. There are several ways the
security of MPTCP could potentially be improved; some of these would
be compatible with MPTCP as defined in this document, whilst others
may not be. For now, the best approach is to get experience with the
current approach, establish what might work, and check that the
threat analysis is still accurate.
Possible ways of improving MPTCP security could include:
o defining a new MPTCP cryptographic algorithm, as negotiated in
MP_CAPABLE. A sub-case could be to include an additional
deployment assumption, such as stateful servers, in order to allow
a more powerful algorithm to be used.
o defining how to secure data transfer with MPTCP, whilst not
changing the signaling part of the protocol.
o defining security that requires more option space, perhaps in
conjunction with a "long options" proposal for extending the TCP
options space (such as those surveyed in ), or perhaps
building on the current approach with a second stage of MPTCP-
o revisiting the working group's decision to exclusively use TCP
options for MPTCP signaling, and instead look at also making use
of the TCP payloads.
MPTCP has been designed with several methods available to indicate a
new security mechanism, including:
o available flags in MP_CAPABLE (Figure 4);
o available subtypes in the MPTCP option (Figure 3);
o the version field in MP_CAPABLE (Figure 4).
6. Interactions with Middleboxes
Multipath TCP was designed to be deployable in the present world.
Its design takes into account "reasonable" existing middlebox
behavior. In this section, we outline a few representative
middlebox-related failure scenarios and show how Multipath TCP
handles them. Next, we list the design decisions multipath has made
to accommodate the different middleboxes.
A primary concern is our use of a new TCP option. Middleboxes should
forward packets with unknown options unchanged, yet there are some
that don't. These we expect will either strip options and pass the
data, drop packets with new options, copy the same option into
multiple segments (e.g., when doing segmentation), or drop options
during segment coalescing.
MPTCP uses a single new TCP option "Kind", and all message types are
defined by "subtype" values (see Section 8). This should reduce the
chances of only some types of MPTCP options being passed, and instead
the key differing characteristics are different paths, and the
presence of the SYN flag.
MPTCP SYN packets on the first subflow of a connection contain the
MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD
fall back to regular TCP. If packets with the MP_JOIN option
(Section 3.2) are dropped, the paths will simply not be used.
If a middlebox strips options but otherwise passes the packets
unchanged, MPTCP will behave safely. If an MP_CAPABLE option is
dropped on either the outgoing or the return path, the initiating
host can fall back to regular TCP, as illustrated in Figure 16 and
discussed in Section 3.1.
Subflow SYNs contain the MP_JOIN option. If this option is stripped
on the outgoing path, the SYN will appear to be a regular SYN to Host
B. Depending on whether there is a listening socket on the target
port, Host B will reply either with SYN/ACK or RST (subflow
connection fails). When Host A receives the SYN/ACK it sends a RST
because the SYN/ACK does not contain the MP_JOIN option and its
token. Either way, the subflow setup fails, but otherwise does not
affect the MPTCP connection as a whole.
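The initiator's reaction to a SYN/ACK, as described in the preceding paragraphs, can be summarized in a small decision function. This is a sketch only; the enum and function names are hypothetical, and a real implementation would also validate the option contents (for example, the HMAC in MP_JOIN).

```python
from enum import Enum


class Action(Enum):
    USE_MPTCP = "continue with MPTCP on this subflow"
    FALL_BACK_TO_TCP = "fall back to regular TCP"
    RESET_SUBFLOW = "send RST; abandon this subflow only"


def on_syn_ack(is_initial_subflow: bool, option_present: bool) -> Action:
    """Decide what the initiator does when a SYN/ACK arrives.

    option_present is True if the SYN/ACK still carries the expected
    MPTCP option (MP_CAPABLE on the first subflow, MP_JOIN otherwise).
    """
    if option_present:
        return Action.USE_MPTCP
    if is_initial_subflow:
        # MP_CAPABLE was stripped on the outgoing or the return path.
        return Action.FALL_BACK_TO_TCP
    # MP_JOIN was stripped: only this subflow fails, not the connection.
    return Action.RESET_SUBFLOW
```

The asymmetry is deliberate: losing MP_CAPABLE downgrades the whole connection, whereas losing MP_JOIN only costs the additional path.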
Host A                              Host B
 |              Middlebox M            |
 |                   |                 |
 |  SYN(MP_CAPABLE)  |        SYN      |
 |-------------------|---------------->|
 |                SYN/ACK              |
 |<------------------------------------|
a) MP_CAPABLE option stripped on outgoing path
Host A                              Host B
 |            SYN(MP_CAPABLE)          |
 |------------------------------------>|
 |             Middlebox M             |
 |                 |                   |
 |    SYN/ACK      |SYN/ACK(MP_CAPABLE)|
 |<----------------|-------------------|
b) MP_CAPABLE option stripped on return path
Figure 16: Connection Setup with Middleboxes that
Strip Options from Packets
We now examine data flow with MPTCP, assuming the flow is correctly
set up, which implies the options in the SYN packets were allowed
through by the relevant middleboxes. If options are allowed through
and there is no resegmentation or coalescing to TCP segments,
Multipath TCP flows can proceed without problems.
The case when options get stripped on data packets has been discussed
in the Fallback section. If a fraction of options are stripped,
behavior is not deterministic. If some data sequence mappings are
lost, the connection can continue so long as mappings exist for the
subflow-level data (e.g., if multiple maps have been sent that
reinforce each other). If some subflow-level space is left unmapped,
however, the subflow is treated as broken and is closed, through the
process described in Section 3.6. MPTCP should survive with a loss
of some Data ACKs, but performance will degrade as the fraction of
stripped options increases. We do not expect such cases to appear in
practice, though: most middleboxes will either strip all options or
let them all through.
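The rule that a subflow is broken once some subflow-level sequence space is left unmapped can be illustrated with a simple coverage check. This is a sketch under simplifying assumptions: sequence numbers are treated as plain integers without 32-bit wraparound, mappings are given as (subflow sequence number, length) pairs, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def unmapped_ranges(received_start: int, received_end: int,
                    mappings: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) subflow ranges not covered by any mapping.

    mappings holds (subflow_seq, length) pairs; overlapping or duplicate
    mappings are allowed and simply reinforce each other.
    """
    if received_start >= received_end:
        return []
    gaps = []
    cursor = received_start
    for start, length in sorted(mappings):
        end = start + length
        if end <= cursor:
            continue  # mapping lies entirely before the uncovered region
        if start > cursor:
            gaps.append((cursor, min(start, received_end)))
        cursor = max(cursor, end)
        if cursor >= received_end:
            break
    if cursor < received_end:
        gaps.append((cursor, received_end))
    return gaps


# Overlapping mappings cover bytes 0..150 and 200..300; 150..200 is unmapped.
gaps = unmapped_ranges(0, 300, [(0, 100), (50, 100), (200, 100)])
subflow_is_broken = bool(gaps)
```

An empty result means every received byte has a mapping and the subflow can continue; any gap corresponds to the "unmapped" case in which the subflow is closed.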
We end this section with a list of middlebox classes, their behavior,
and the elements in the MPTCP design that allow operation through
such middleboxes. Issues surrounding dropping packets with options
or stripping options were discussed above, and are not included here:
o NATs (Network Address (and Port) Translators) change the
source address (and often source port) of packets. This means
that a host will not know its public-facing address for signaling
in MPTCP. Therefore, MPTCP permits implicit address addition via
the MP_JOIN option, and the handshake mechanism ensures that
connection attempts to private addresses do not cause
problems. Explicit address removal refers to an address by its
Address ID, so that no knowledge of the source address is required.
o Performance Enhancing Proxies (PEPs) might proactively ACK
data to increase performance. MPTCP, however, relies on accurate
congestion control signals from the end host, and non-MPTCP-aware
PEPs will not be able to provide such signals. MPTCP will,
therefore, fall back to single-path TCP, or close the problematic
subflow (see Section 3.6).
o Traffic Normalizers may not allow holes in sequence numbers,
and may cache packets and retransmit the same data. MPTCP looks
like standard TCP on the wire, and will not retransmit different
data on the same subflow sequence number. In the event of a
retransmission, the same data will be retransmitted on the
original TCP subflow even if it is additionally retransmitted at
the connection level on a different subflow.
o Firewalls might perform initial sequence number randomization
on TCP connections. MPTCP uses relative sequence numbers in data
sequence mapping to cope with this. Like NATs, firewalls will not
permit many incoming connections, so MPTCP supports address
signaling (ADD_ADDR) so that a multiaddressed host can invite its
peer behind the firewall/NAT to connect out to its additional
o Intrusion Detection Systems look out for traffic patterns and
content that could threaten a network. Multipath will mean that
such data is potentially spread, so it is more difficult for an
IDS to analyze the whole traffic, and potentially increases the
risk of false positives. However, for an MPTCP-aware IDS, tokens
can be read by such systems to correlate multiple subflows and
reassemble for analysis.
o Application-level middleboxes such as content-aware firewalls may
alter the payload within a subflow, such as rewriting URIs in HTTP
traffic. MPTCP will detect these using the checksum and close the
affected subflow(s), if there are other subflows that can be used.
If all subflows are affected, multipath will fall back to TCP,
allowing such middleboxes to change the payload. MPTCP-aware
middleboxes should be able to adjust the payload and MPTCP
metadata in order not to break the connection.
In addition, all classes of middleboxes may affect TCP traffic in the
following ways:
o TCP options may be removed, or packets with unknown options
dropped, by many classes of middleboxes. It is intended that the
initial SYN exchange, with a TCP option, will be sufficient to
identify the path capabilities. If such a packet does not get
through, MPTCP will end up falling back to regular TCP.
o Segmentation/Coalescing (e.g., TCP segmentation offloading) might
copy options between packets and might strip some options.
MPTCP's data sequence mapping includes the relative subflow
sequence number instead of using the sequence number in the
segment. In this way, the mapping is independent of the packets
that carry it.
o The receive window may be shrunk by some middleboxes at the
subflow level. MPTCP will use the maximum window at data level,
but will also obey subflow-specific windows.
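The use of relative subflow sequence numbers, mentioned above for firewalls and for segmentation/coalescing, can be illustrated with modular arithmetic. This sketch assumes a middlebox that rewrites sequence numbers by adding a constant offset to every segment of a subflow, including the SYN; the function names are hypothetical.

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def relative_ssn(absolute_seq: int, initial_seq: int) -> int:
    """Offset of a segment from the subflow's initial sequence number."""
    return (absolute_seq - initial_seq) % SEQ_MODULUS


def middlebox_rewrite(seq: int, offset: int) -> int:
    """Model of ISN randomization: every sequence number is shifted equally."""
    return (seq + offset) % SEQ_MODULUS


# Sender side: an ISN close to the wrap point and a segment beyond it.
sender_isn = 4_000_000_000
sender_seq = (sender_isn + 300_000_000) % SEQ_MODULUS

# Receiver side: both values arrive shifted by the same unknown offset.
offset = 123_456_789
receiver_isn = middlebox_rewrite(sender_isn, offset)
receiver_seq = middlebox_rewrite(sender_seq, offset)

# The relative value is identical at both ends, so a mapping that
# carries it remains valid despite the rewriting.
sender_view = relative_ssn(sender_seq, sender_isn)
receiver_view = relative_ssn(receiver_seq, receiver_isn)
```

Because the mapping carries an offset rather than an absolute sequence number, it is also independent of which packet happens to carry it.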
7. Acknowledgments
The authors were originally supported by Trilogy
(http://www.trilogy-project.org), a research project (ICT-216372)
partially funded by the European Community under its Seventh
Framework Program.
Alan Ford was originally supported by Roke Manor Research.
The authors gratefully acknowledge significant input into this
document from Sebastien Barre, Christoph Paasch, and Andrew McDonald.
The authors also wish to acknowledge reviews and contributions from
Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey
Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks,
Sean Turner, Stephen Farrell, and Martin Stiemerling.
8. IANA Considerations
This document defines a new TCP option for MPTCP, assigned a value of
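30 (decimal) from the TCP option space. This value is the value of
"Kind" as seen in all MPTCP options in this document. This value is
defined as:
+------+--------+-----------------------+-----------+
| Kind | Length | Meaning               | Reference |
+------+--------+-----------------------+-----------+
| 30   | N      | Multipath TCP (MPTCP) | RFC 6824  |
+------+--------+-----------------------+-----------+
Table 1: TCP Option Kind Numbers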
This document also defines a 4-bit subtype field, for which IANA has
created and will maintain a new sub-registry entitled "MPTCP Option
Subtypes" under the "Transmission Control Protocol (TCP) Parameters"
registry. Initial values for the MPTCP option subtype registry are
given below; future assignments are to be defined by Standards Action
as defined by . Assignments consist of the MPTCP subtype's
symbolic name and its associated value, as per the following table.
+-------+--------------+----------------------------+---------------+
| Value | Symbol       | Name                       | Reference     |
+-------+--------------+----------------------------+---------------+
| 0x0   | MP_CAPABLE   | Multipath Capable          | Section 3.1   |
| 0x1   | MP_JOIN      | Join Connection            | Section 3.2   |
| 0x2   | DSS          | Data Sequence Signal (Data | Section 3.3   |
|       |              | ACK and data sequence      |               |
|       |              | mapping)                   |               |
| 0x3   | ADD_ADDR     | Add Address                | Section 3.4.1 |
| 0x4   | REMOVE_ADDR  | Remove Address             | Section 3.4.2 |
| 0x5   | MP_PRIO      | Change Subflow Priority    | Section 3.3.8 |
| 0x6   | MP_FAIL      | Fallback                   | Section 3.6   |
| 0x7   | MP_FASTCLOSE | Fast Close                 | Section 3.5   |
+-------+--------------+----------------------------+---------------+
Table 2: MPTCP Option Subtypes
Values 0x8 through 0xe are currently unassigned. The value 0xf is
reserved for Private Use within controlled testbeds.
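As an illustration of how the Kind value and the subtype registry fit together, the following sketch classifies a raw TCP option. It assumes the option layout of Figure 3, with Kind in the first byte, Length in the second, and the 4-bit subtype in the upper half of the third byte; that layout is not restated in this section, so treat it as an assumption here. The function name is hypothetical.

```python
from typing import Tuple

MPTCP_KIND = 30  # TCP option Kind from Table 1

# Subtype values and symbols from Table 2.
SUBTYPES = {
    0x0: "MP_CAPABLE",
    0x1: "MP_JOIN",
    0x2: "DSS",
    0x3: "ADD_ADDR",
    0x4: "REMOVE_ADDR",
    0x5: "MP_PRIO",
    0x6: "MP_FAIL",
    0x7: "MP_FASTCLOSE",
}


def parse_mptcp_option(option: bytes) -> Tuple[int, str]:
    """Return (subtype value, symbolic name) for one raw MPTCP option."""
    if len(option) < 3:
        raise ValueError("option too short to carry a subtype")
    kind, length = option[0], option[1]
    if kind != MPTCP_KIND:
        raise ValueError("not an MPTCP option")
    if length != len(option):
        raise ValueError("Length field does not match option size")
    subtype = option[2] >> 4  # assumed position: upper 4 bits of byte 3
    if subtype == 0xF:
        return subtype, "Private Use"
    return subtype, SUBTYPES.get(subtype, "Unassigned")
```

For example, `parse_mptcp_option(bytes([30, 4, 0x10, 0x00]))` classifies the option as MP_JOIN without interpreting the remaining fields.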
IANA has created another sub-registry, "MPTCP Handshake Algorithms"
under the "Transmission Control Protocol (TCP) Parameters" registry,
based on the flags in MP_CAPABLE (Section 3.1). The flags consist of
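8 bits, labeled "A" through "H", and this document assigns the bits
as follows:
+----------+-------------------+-----------------------+
| Flag Bit | Meaning           | Reference             |
+----------+-------------------+-----------------------+
| A        | Checksum required | RFC 6824, Section 3.1 |
| B        | Extensibility     | RFC 6824, Section 3.1 |
| C-G      | Unassigned        |                       |
| H        | HMAC-SHA1         | RFC 6824, Section 3.2 |
+----------+-------------------+-----------------------+
Table 3: MPTCP Handshake Algorithms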
Note that the meanings of bits C through H can be dependent upon bit
B, depending on how Extensibility is defined in future
specifications; see Section 3.1 for more information.
Future assignments in this registry are also to be defined by
Standards Action as defined by . Assignments consist of the
value of the flags, a symbolic name for the algorithm, and a
reference to its specification.
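The flag assignments in Table 3 can be decoded as in the following sketch. It assumes that "A" is the most significant bit of the flags byte and "H" the least significant, following the left-to-right labeling of Figure 4; that bit order is an assumption in this example, and the function name is hypothetical.

```python
from typing import Dict

# Meanings assigned by Table 3; bits C through G are unassigned.
FLAG_MEANINGS = {
    "A": "Checksum required",
    "B": "Extensibility",
    "H": "HMAC-SHA1",
}


def decode_flags(flags: int) -> Dict[str, str]:
    """Map each set bit in the MP_CAPABLE flags byte to its registry meaning."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in 8 bits")
    decoded = {}
    for position, label in enumerate("ABCDEFGH"):
        mask = 0x80 >> position  # assumed order: A is the leftmost bit
        if flags & mask:
            decoded[label] = FLAG_MEANINGS.get(label, "Unassigned")
    return decoded


# Checksums required and HMAC-SHA1 offered: bits A and H set.
example = decode_flags(0x81)
```

A receiver that sees bit B set cannot rely on this table for bits C through H, since their meaning may then depend on how Extensibility is defined in a future specification.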