Internet Engineering Task Force (IETF)                    O. Bonaventure
Request for Comments: 8041                                     UCLouvain
Category: Informational                                        C. Paasch
ISSN: 2070-1721                                              Apple, Inc.
                                                                G. Detal
                                                                Tessares
                                                            January 2017

       Use Cases and Operational Experience with Multipath TCP
Abstract

This document discusses both use cases and operational experience with Multipath TCP (MPTCP) in real networks. It lists several prominent use cases where Multipath TCP has been considered and is being used. It also gives insight into some heuristics and decisions that have helped to realize these use cases and suggests possible improvements.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc8041.
Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Use Cases
      2.1. Datacenters
      2.2. Cellular/WiFi Offload
      2.3. Multipath TCP Proxies
   3. Operational Experience
      3.1. Middlebox Interference
      3.2. Congestion Control
      3.3. Subflow Management
      3.4. Implemented Subflow Managers
      3.5. Subflow Destination Port
      3.6. Closing Subflows
      3.7. Packet Schedulers
      3.8. Segment Size Selection
      3.9. Interactions with the Domain Name System
      3.10. Captive Portals
      3.11. Stateless Webservers
      3.12. Load-Balanced Server Farms
   4. Security Considerations
   5. References
      5.1. Normative References
      5.2. Informative References
   Acknowledgements
   Authors' Addresses
1. Introduction

Multipath TCP was specified in [RFC6824] and five independent implementations have been developed. As of November 2016, Multipath TCP has been or is being implemented on the following platforms:

o Linux kernel [MultipathTCP-Linux]

o Apple iOS and macOS

o Citrix load balancers

o FreeBSD [FreeBSD-MPTCP]

o Oracle Solaris

The first three implementations are known to interoperate. Three of these implementations are open source (Linux kernel, FreeBSD, and Apple's iOS and macOS). Apple's implementation is widely deployed.

Since the publication of [RFC6824] as an Experimental RFC, experience has been gathered by various network researchers and users about the operational issues that arise when Multipath TCP is used in today's Internet.

When the MPTCP working group was created, several use cases for Multipath TCP were identified [RFC6182]. Since then, other use cases have been proposed and some have been tested and even deployed. We describe these use cases in Section 2.

Section 3 focuses on the operational experience with Multipath TCP. Most of this experience comes from the utilization of the Multipath TCP implementation in the Linux kernel [MultipathTCP-Linux]. This open-source implementation has been downloaded and installed by thousands of users all over the world. Many of these users have provided direct or indirect feedback by writing documents (scientific articles or blog messages) or posting to the mptcp-dev mailing list (see https://listes-2.sipr.ucl.ac.be/sympa/arc/mptcp-dev). This Multipath TCP implementation is actively maintained and continuously improved. It is used on various types of hosts, ranging from smartphones or embedded routers to high-end servers.

The Multipath TCP implementation in the Linux kernel is far from being the most widespread deployment of Multipath TCP. Since September 2013, Multipath TCP has also been supported on smartphones and tablets, beginning with iOS 7 [IETFJ]. There are likely hundreds of millions of MPTCP-enabled devices. This Multipath TCP implementation is
currently only used to support the Siri voice recognition/control application. Some lessons learned from this deployment are described in [IETFJ].

Section 3 is organized as follows. Supporting the middleboxes was one of the difficult issues in designing the Multipath TCP protocol. We explain in Section 3.1 which types of middleboxes the Linux kernel implementation of Multipath TCP supports and how it reacts upon encountering them. Section 3.2 summarizes the MPTCP-specific congestion controls that have been implemented. Sections 3.3 to 3.7 discuss heuristics and issues with respect to subflow management as well as the scheduling across the subflows. Section 3.8 explains some problems that occurred with subflows having different Maximum Segment Size (MSS) values. Section 3.9 presents issues with respect to content delivery networks and suggests a solution. Finally, Section 3.10 documents an issue with captive portals where MPTCP will behave suboptimally.

2. Use Cases

[MPTCPBIB]. Several of the papers published in the scientific literature have identified possible improvements that are worth discussing here.

2.1. Datacenters

[HotNets] [SIGCOMM11]. Today's datacenters are designed to provide several paths between single-homed servers. The multiplicity of these paths comes from the utilization of Equal-Cost Multipath (ECMP) and other load-balancing techniques inside the datacenter. Most of the deployed load-balancing techniques in datacenters rely on hashes computed over the 5-tuple. Thus, all packets from the same TCP connection follow the same path and are not reordered. The results in [HotNets] demonstrate by simulations that Multipath TCP can achieve a better utilization of the available network by using multiple subflows for each Multipath TCP session. Although [RFC6182] assumes that at least one of the communicating hosts has several IP addresses, [HotNets] demonstrates that Multipath TCP is beneficial even when both hosts are single-homed.
This idea is analyzed in more detail in [SIGCOMM11], where the Multipath TCP implementation in the Linux kernel is modified to be able to use several subflows from the same IP address. Measurements in a public datacenter show the quantitative benefits of Multipath TCP [SIGCOMM11] in this environment.
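The effect of 5-tuple hashing on subflows that share the same pair of IP addresses can be illustrated with a minimal sketch. It assumes a switch that selects the next hop by taking a SHA-256 hash of the 5-tuple modulo the number of equal-cost paths; real devices use their own hash functions, and the names `ecmp_path` and `paths_used`, as well as the addresses and ports, are hypothetical.

```python
import hashlib


def ecmp_path(src_ip, src_port, dst_ip, dst_port, n_paths, proto=6):
    """Return a path index derived from a hash of the 5-tuple.

    Illustrative only: the hash function is an assumption, not the one
    used by any particular switch. proto=6 is the TCP protocol number.
    """
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def paths_used(src_ip, dst_ip, dst_port, src_ports, n_paths):
    """Set of path indices reached by subflows differing only in source port."""
    return {
        ecmp_path(src_ip, port, dst_ip, dst_port, n_paths)
        for port in src_ports
    }


if __name__ == "__main__":
    # One regular TCP connection always maps to a single path.
    single = paths_used("10.0.0.1", "10.0.1.1", 80, [40000], 4)
    # Several subflows with distinct source ports may spread over more paths.
    several = paths_used("10.0.0.1", "10.0.1.1", 80, range(40000, 40008), 4)
    print(len(single), len(several))
```

Because the hash is deterministic, a single connection never leaves its path, whereas each additional subflow gets an independent chance of landing on a different one.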
Although ECMP is widely used inside datacenters, this is not the only environment where there are different paths between a pair of hosts. ECMP and other load-balancing techniques such as Link Aggregation Groups (LAGs) are widely used in today's networks; having multiple paths between a pair of single-homed hosts is becoming the norm instead of the exception. Although these multiple paths often have the same cost (from an IGP metrics viewpoint), they do not necessarily have the same performance. For example, [IMC13c] reports the results of a long measurement study showing that load-balanced Internet paths between the same pair of hosts can have huge delay differences.

2.2. Cellular/WiFi Offload

[IETFJ]. It has been briefly discussed during IETF 88 [IETF88], but there is no published paper or report that analyzes this deployment. For this reason, we only discuss published papers that have mainly used the Multipath TCP implementation in the Linux kernel for their experiments.

The performance of Multipath TCP in wireless networks was briefly evaluated in [NSDI12]. One experiment analyzes the performance of Multipath TCP on a client with two wireless interfaces. This evaluation shows that when the receive window is large, Multipath TCP can efficiently use the two available links. However, if the window becomes smaller, then packets sent on a slow path can block the transmission of packets on a faster path. In some cases, the performance of Multipath TCP over two paths can become lower than the performance of regular TCP over the best-performing path. Two heuristics, reinjection and penalization, are proposed in [NSDI12] to solve this performance problem. These two heuristics have since been used in the Multipath TCP implementation in the Linux kernel. [CONEXT13] explored the problem in more detail and revealed some other scenarios where Multipath TCP can have difficulties in efficiently pooling the available paths.
Improvements to the Multipath TCP implementation in the Linux kernel are proposed in [CONEXT13] to cope with some of these problems. The first experimental analysis of Multipath TCP in a public wireless environment was presented in [Cellnet12]. These measurements explore the ability of Multipath TCP to use two wireless networks (real WiFi and 3G networks). Three modes of operation are compared. The first mode of operation is the simultaneous use of the two wireless networks. In this mode, Multipath TCP pools the available resources
and uses both wireless interfaces. This mode provides fast handover from WiFi to cellular or the opposite when the user moves. Measurements presented in [CACM14] show that the handover from one wireless network to another is not an abrupt process. When a host moves, there are regions where the quality of one of the wireless networks is weaker than the other, but the host considers this wireless network to still be up. When a mobile host enters such regions, its ability to send packets over another wireless network is important to ensure a smooth handover. This is clearly illustrated from the packet trace discussed in [CACM14]. Many cellular networks use volume-based pricing; users often prefer to use unmetered WiFi networks when available instead of metered cellular networks. [Cellnet12] implements support for the MP_PRIO option to explore two other modes of operation. In the backup mode, Multipath TCP opens a TCP subflow over each interface, but the cellular interface is configured in backup mode. This implies that data flows only over the WiFi interface when both interfaces are considered to be active. If the WiFi interface fails, then the traffic switches quickly to the cellular interface, ensuring a smooth handover from the user's viewpoint [Cellnet12]. The cost of this approach is that the WiFi and cellular interfaces are likely to remain active all the time since all subflows are established over the two interfaces. The single-path mode is slightly different. This mode benefits from the break-before-make capability of Multipath TCP. When an MPTCP session is established, a subflow is created over the WiFi interface. No packet is sent over the cellular interface as long as the WiFi interface remains up [Cellnet12]. This implies that the cellular interface can remain idle and battery capacity is preserved. When the WiFi interface fails, a new subflow is established over the cellular interface in order to preserve the established Multipath TCP sessions. 
Compared to the backup mode described earlier, measurements reported in [Cellnet12] indicate that this mode of operation is characterized by a throughput drop while the cellular interface is brought up and the subflows are reestablished. From a protocol viewpoint, [Cellnet12] discusses the problem posed by the unreliability of the REMOVE_ADDR option and proposes a small protocol extension to allow hosts to reliably exchange this option. It would be useful to analyze packet traces to understand whether the unreliability of the REMOVE_ADDR option poses an operational problem in real deployments.
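The backup mode discussed above can be sketched as a subflow selection rule. This is a minimal illustration under stated assumptions, not the logic of any real implementation: the `Subflow` record, its fields, and the lowest-RTT tie-break in `pick_subflow` are hypothetical. The only behavior taken from the text is that a subflow flagged as backup carries data only when no regular subflow is usable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subflow:
    name: str
    up: bool        # interface and subflow currently considered active
    backup: bool    # flagged as backup, e.g., through the MP_PRIO option
    rtt_ms: float   # smoothed RTT; used here only as an assumed tie-break


def pick_subflow(subflows: List[Subflow]) -> Optional[Subflow]:
    """Prefer regular subflows; fall back to backup ones only if none is up."""
    usable = [s for s in subflows if s.up]
    regular = [s for s in usable if not s.backup]
    candidates = regular or usable
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.rtt_ms)


if __name__ == "__main__":
    wifi = Subflow("wifi", up=True, backup=False, rtt_ms=40.0)
    cell = Subflow("cellular", up=True, backup=True, rtt_ms=25.0)
    print(pick_subflow([wifi, cell]).name)   # data stays on WiFi
    wifi.up = False
    print(pick_subflow([wifi, cell]).name)   # traffic switches to cellular
```

In the single-path mode, the cellular subflow would not exist in the list until the WiFi interface fails, which is where the throughput drop reported in [Cellnet12] comes from.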
Another study of the performance of Multipath TCP in wireless networks was reported in [IMC13b]. This study uses laptops connected to various cellular ISPs and WiFi hotspots. It compares various file transfer scenarios. [IMC13b] observes that 4-path MPTCP outperforms 2-path MPTCP, especially for larger files. However, for three congestion-control algorithms (LIA, OLIA, and Reno; see Section 3.2), there is no significant performance difference for file sizes smaller than 4 MB.

A different study of the performance of Multipath TCP with two wireless networks is presented in [INFOCOM14]. In this study, the two networks had different qualities: a good network and a lossy network. When using two paths with different packet-loss ratios, the Multipath TCP congestion-control scheme moves traffic away from the lossy link that is considered to be congested. However, [INFOCOM14] documents an interesting scenario that is summarized hereafter.

   client ----------- path1 -------- server
     |                                  |
     +--------------- path2 ------------+

                 Figure 1: Simple network topology

Initially, the two paths in Figure 1 have the same quality and Multipath TCP distributes the load over both of them. During the transfer, path2 becomes lossy, e.g., because the client moves. Multipath TCP detects the packet losses and they are retransmitted over path1. This enables the data transfer to continue over this path. However, the subflow over path2 is still up and transmits one packet from time to time. Although the N lost packets have been acknowledged over the first subflow (at the MPTCP level), they have not been acknowledged at the TCP level over the second subflow. To preserve the continuity of the sequence numbers over the second subflow, TCP will continue to retransmit these segments until either they are acknowledged or the maximum number of retransmissions is reached.
This behavior is clearly inefficient and may lead to blocking since the second subflow will consume window space to be able to retransmit these packets. [INFOCOM14] proposes a new Multipath TCP option to solve this problem. In practice, a new TCP option is probably not required. When the client detects that the data transmitted over the second subflow has been acknowledged over the first subflow, it could decide to terminate the second subflow by sending a RST segment. If the interface associated to this subflow is still up, a new subflow could be immediately reestablished. It would then be immediately usable to send new data and would not be forced to first retransmit the previously transmitted data. As of this writing, this dynamic management of the subflows is not yet implemented in the Multipath TCP implementation in the Linux kernel.
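The heuristic suggested above (reset a subflow whose outstanding data has already been acknowledged at the MPTCP level) can be sketched as a simple predicate. This is a hypothetical illustration, since the text notes that no such mechanism is implemented: the function name `should_reset`, its parameters, and the `min_retrans` threshold are assumptions, and sequence-number wraparound is ignored for brevity.

```python
def should_reset(outstanding, data_ack, retransmissions, min_retrans=1):
    """Decide whether a subflow could be reset and re-established.

    outstanding     -- list of (data_seq_start, length) mappings sent on this
                       subflow and not yet acknowledged at the TCP level
    data_ack        -- cumulative connection-level (MPTCP) acknowledgment
    retransmissions -- TCP-level retransmissions seen on this subflow
    min_retrans     -- assumed threshold before considering the subflow stuck

    Sequence-number wraparound is deliberately not handled in this sketch.
    """
    if not outstanding or retransmissions < min_retrans:
        return False
    # Every byte still pending on this subflow was delivered over another one.
    return all(start + length <= data_ack for start, length in outstanding)


if __name__ == "__main__":
    pending = [(1000, 500), (1500, 500)]
    print(should_reset(pending, data_ack=2000, retransmissions=3))
    print(should_reset(pending, data_ack=1500, retransmissions=3))
```

A caller would send a RST on the subflow when the predicate holds and, if the interface is still up, immediately open a new subflow that starts with fresh data.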
Some studies have started to analyze the performance of Multipath TCP on smartphones with real applications. In contrast with the bulk transfers that are used by many publications, many deployed applications do not exchange huge amounts of data and mainly use small connections. [COMMAG2016] proposes a software testing framework that automates Android applications in order to study their interactions with Multipath TCP. [PAM2016] analyzes a one-month packet trace of all the packets exchanged by a dozen smartphones used by regular users. This analysis reveals that short connections are important on smartphones and that the main benefit of using Multipath TCP on smartphones is the ability to perform seamless handovers between different wireless networks. Long connections benefit from these handovers.

2.3. Multipath TCP Proxies

[HotMiddlebox13b] [HAMPEL]. Another possibility leverages the SOCKS protocol [RFC1928]. SOCKS is often used in enterprise networks to allow clients to reach external servers. For this, the client opens a TCP connection to the SOCKS server that relays it to the final destination. If both the client and the SOCKS server use Multipath TCP, but not the final destination, then Multipath TCP can still be used on the path between the clients and the SOCKS server. At IETF 93, Korea Telecom announced that they have deployed (in June 2015) a commercial service that uses Multipath TCP on smartphones. These smartphones access regular TCP servers through a SOCKS proxy. This enables them to achieve throughputs of up to 850 Mbps [KT].
Measurements performed with Android smartphones [Mobicom15] show that popular applications work correctly through a SOCKS proxy and MPTCP-enabled smartphones. Thanks to Multipath TCP, long-lived connections can be spread over the two available interfaces. However, for short-lived connections, most of the data is sent over the initial subflow that is created over the interface corresponding to the default route and the second subflow is almost not used [PAM2016].

A second use case is when Multipath TCP is used by middleboxes, typically inside access networks. Various network operators are discussing and evaluating solutions for hybrid access networks [TR-348]. Such networks arise when a network operator controls two different access network technologies, e.g., wired and cellular, and wants to combine them to improve the bandwidth offered to the end users [HYA-ARCH]. Several solutions are currently being investigated for such networks [TR-348]. Figure 2 shows the organization of such a network. When a client creates a normal TCP connection, it is intercepted by the Hybrid CPE (HCPE) that converts it into a Multipath TCP connection so that it can use the available access networks (DSL and LTE in the example). The Hybrid Access Gateway (HAG) does the opposite to ensure that the regular server sees a normal TCP connection. Some of the solutions currently discussed for hybrid networks use Multipath TCP on the HCPE and the HAG. Other solutions rely on tunnels between the HCPE and the HAG [GRE-NOTIFY].

   client --- HCPE ------ DSL ------- HAG --- internet --- server
              |                       |
              +------- LTE -----------+

                   Figure 2: Hybrid Access Network
3. Operational Experience

3.1. Middlebox Interference

The first analysis appears in [IMC11]. This paper was the main motivation for Multipath TCP incorporating various techniques to cope with middlebox interference. More specifically, Multipath TCP has been designed to cope with middleboxes that:

o change source or destination addresses

o change source or destination port numbers

o change TCP sequence numbers

o split or coalesce segments

o remove TCP options

o modify the payload of TCP segments

These middlebox interferences have all been included in the MBtest suite [MBTest]. This test suite is used in [HotMiddlebox13] to verify the reaction of the Multipath TCP implementation in the Linux kernel [MultipathTCP-Linux] when faced with middlebox interference. The test environment used for this evaluation is a dual-homed client connected to a single-homed server. The middlebox behavior can be activated on any of the paths. The main results of this analysis are:

o the Multipath TCP implementation in the Linux kernel is not affected by a middlebox that performs NAT or modifies TCP sequence numbers

o when a middlebox removes the MP_CAPABLE option from the initial SYN segment, the Multipath TCP implementation in the Linux kernel falls back correctly to regular TCP

o when a middlebox removes the DSS option from all data segments, the Multipath TCP implementation in the Linux kernel falls back correctly to regular TCP

o when a middlebox performs segment coalescing, the Multipath TCP implementation in the Linux kernel is still able to accurately extract the data corresponding to the indicated mapping

o when a middlebox performs segment splitting, the Multipath TCP implementation in the Linux kernel correctly reassembles the data corresponding to the indicated mapping. [HotMiddlebox13] shows, in Figure 4 in Section 3.3, a corner case with segment splitting that may lead to a desynchronization between the two hosts.
The interactions between Multipath TCP and real deployed middleboxes are also analyzed in [HotMiddlebox13]; a particular scenario with the FTP Application Level Gateway running on a NAT is described. Middlebox interference can also be detected by analyzing packet traces on MPTCP-enabled servers. A closer look at the packets received on the multipath-tcp.org server [TMA2015] shows that among the 184,000 Multipath TCP connections, only 125 of them were falling back to regular TCP. These connections originated from 28 different client IP addresses. These include 91 HTTP connections and 34 FTP connections. The FTP interference is expected since Application Level Gateways used for FTP modify the TCP payload and the DSS Checksum detects these modifications. The HTTP interference appeared only on the direction from server to client and could have been caused by transparent proxies deployed in cellular or enterprise networks. A longer trace is discussed in [COMCOM2016] and similar conclusions about the middlebox interference are provided. From an operational viewpoint, knowing that Multipath TCP can cope with various types of middlebox interference is important. However, there are situations where the network operators need to gather information about where a particular middlebox interference occurs. The tracebox software [tracebox] described in [IMC13a] is an extension of the popular traceroute software that enables network operators to check at which hop a particular field of the TCP header (including options) is modified. It has been used by several network operators to debug various middlebox interference problems. Experience with tracebox indicates that supporting the ICMP extension defined in [RFC1812] makes it easier to debug middlebox problems in IPv4 networks. Users of the Multipath TCP implementation have reported some experience with middlebox interference. 
The strangest scenario has been a middlebox that accepts the Multipath TCP options in the SYN segment but later replaces Multipath TCP options with a TCP EOL option [StrangeMbox]. This causes Multipath TCP to perform a fallback to regular TCP without any impact on the application.

3.2. Congestion Control

[RFC6356] in an adaptation of the NewReno algorithm. A detailed description of this coupled algorithm is provided in [NSDI11]. It is the default scheme in the Linux implementation of Multipath TCP, but Linux supports other schemes.
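The coupled increase rule specified in [RFC6356] can be sketched as follows. For each ACK on subflow i, the congestion window grows by the minimum of a coupled term, alpha * bytes_acked * MSS_i / cwnd_total, and the increase regular TCP would apply, bytes_acked * MSS_i / cwnd_i, where alpha = cwnd_total * max(cwnd_i / rtt_i^2) / (sum(cwnd_i / rtt_i))^2. The function names and the representation of a subflow as a `(cwnd_bytes, rtt_seconds)` tuple are assumptions made for this illustration; a kernel implementation would use integer arithmetic and handle slow start and losses separately.

```python
def lia_alpha(subflows):
    """Aggressiveness factor alpha of the coupled algorithm in RFC 6356.

    subflows -- list of (cwnd_bytes, rtt_seconds) tuples, one per subflow
    """
    cwnd_total = sum(cwnd for cwnd, _ in subflows)
    best = max(cwnd / (rtt * rtt) for cwnd, rtt in subflows)
    denom = sum(cwnd / rtt for cwnd, rtt in subflows) ** 2
    return cwnd_total * best / denom


def lia_increase(subflows, i, bytes_acked, mss):
    """Window increase (in bytes) for one ACK on subflow i in congestion avoidance."""
    cwnd_total = sum(cwnd for cwnd, _ in subflows)
    cwnd_i = subflows[i][0]
    coupled = lia_alpha(subflows) * bytes_acked * mss / cwnd_total
    uncoupled = bytes_acked * mss / cwnd_i   # what a regular TCP flow would do
    return min(coupled, uncoupled)


if __name__ == "__main__":
    one = [(10000, 0.1)]
    two = [(10000, 0.1), (10000, 0.1)]
    print(lia_increase(one, 0, 1000, 1000))
    print(lia_increase(two, 0, 1000, 1000))
```

With a single subflow, alpha equals 1 and the rule reduces to the regular increase. With two identical subflows, each one grows at a quarter of that rate, so the aggregate stays less aggressive than a single TCP flow on a shared bottleneck.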
The second congestion-control scheme is OLIA [CONEXT12]. It is also an adaptation of the NewReno single-path congestion-control scheme to support multiple paths. Simulations [CONEXT12] and measurements [CONEXT13] have shown that it provides some performance benefits compared to the default coupled congestion-control scheme.

The delay-based scheme proposed in [ICNP12] has also been ported to the Multipath TCP implementation in the Linux kernel. It has been evaluated by using simulations [ICNP12] and measurements [PaaschPhD].

BALIA, defined in [BALIA], provides a better balance between TCP friendliness, responsiveness, and window oscillation.

These different congestion-control schemes have been compared in several articles. [CONEXT13] and [PaaschPhD] compare these algorithms in an emulated environment. The evaluation showed that the delay-based congestion-control scheme is less able to efficiently use the available links than the three other schemes.

3.3. Subflow Management

[RFC6182] and the protocol specification [RFC6824] define the basic usage of the subflows and the protocol mechanisms that are required to create and terminate them. However, there are no guidelines on how subflows are used during the lifetime of a Multipath TCP session. Most of the published experiments with Multipath TCP have been performed in controlled environments. Still, based on the experience running them and discussions on the mptcp-dev mailing list, interesting lessons have been learned about the management of these subflows.

From a subflow viewpoint, the Multipath TCP protocol is completely symmetrical. Both the client and the server have the capability to create subflows. However, in practice, the existing Multipath TCP implementations have opted for a strategy where only the client creates new subflows. The main motivation for this strategy is that often the client resides behind a NAT or a firewall, preventing passive subflow openings on the client.
Although there are environments such as datacenters where this problem does not occur, as of this writing, no precise requirement has emerged for allowing the server to create new subflows.
[MPTCP-MAX-SUB]. This might require the definition of policy rules to control the operation of the subflow manager. The two scenarios below illustrate some of these requirements.

   host1 ---------- switch1 ----- host2
     |                 |             |
     +-------------- switch2 --------+

             Figure 3: Simple Switched Network Topology
Consider the simple network topology shown in Figure 3. From an operational viewpoint, a network operator could want to create two subflows between the communicating hosts. From a bandwidth utilization viewpoint, the most natural paths are host1-switch1-host2 and host1-switch2-host2. However, a Multipath TCP implementation running on these two hosts may sometimes have difficulty obtaining this result.

To understand the difficulty, let us consider different allocation strategies for the IP addresses. A first strategy is to assign two subnets: subnetA (resp. subnetB) contains the IP addresses of host1's interface to switch1 (resp. switch2) and host2's interface to switch1 (resp. switch2). In this case, a Multipath TCP subflow manager should only create one subflow per subnet. To enforce the utilization of these paths, the network operator would have to specify a policy that prefers the subflows in the same subnet over subflows between addresses in different subnets. It should be noted that the policy should probably also specify how the subflow manager should react when an interface or subflow fails.

A second strategy is to use a single subnet for all IP addresses. In this case, it becomes more difficult to specify a policy that indicates which subflows should be established.

The second subflow manager that is currently supported by the Multipath TCP implementation in the Linux kernel is the ndiffport subflow manager. This manager was initially created to exploit the path diversity that exists between single-homed hosts due to the utilization of flow-based load-balancing techniques [SIGCOMM11]. This subflow manager creates N subflows between the same pair of IP addresses. The N subflows are created by the client and differ only in the source port selected by the client. It was not designed to be used on multihomed hosts.

A more flexible subflow manager has been proposed, implemented, and evaluated in [CONEXT15].
This subflow manager exposes various kernel events to a user space daemon that decides when subflows need to be created and terminated based on various policies.
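The subnet-preference policy discussed for Figure 3 is the kind of rule such a policy-driven manager could apply. The sketch below is a hypothetical illustration: the function `same_subnet_pairs` and the addresses are invented, and it assumes the manager knows the prefix length of the remote addresses, which a real host would have to learn or be configured with.

```python
import ipaddress


def same_subnet_pairs(local_ifaces, remote_ifaces):
    """Return (local, remote) address pairs that share a subnet.

    Each interface is given as 'address/prefixlen', e.g. '10.0.1.1/24'.
    Knowing the remote prefix length is an assumption of this sketch.
    """
    pairs = []
    for local in local_ifaces:
        l_if = ipaddress.ip_interface(local)
        for remote in remote_ifaces:
            r_if = ipaddress.ip_interface(remote)
            if l_if.network == r_if.network:
                pairs.append((str(l_if.ip), str(r_if.ip)))
    return pairs


if __name__ == "__main__":
    # First strategy: one subnet per switch gives one subflow per subnet.
    print(same_subnet_pairs(["10.0.1.1/24", "10.0.2.1/24"],
                            ["10.0.1.2/24", "10.0.2.2/24"]))
    # Second strategy: a single subnet no longer discriminates between paths.
    print(same_subnet_pairs(["10.0.0.1/24", "10.0.0.2/24"],
                            ["10.0.0.3/24", "10.0.0.4/24"]))
```

With two subnets, the rule selects exactly the two intended paths. With a single subnet, all four address pairs match, which shows why the second allocation strategy makes the policy harder to express.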
3.5. Subflow Destination Port

Figure 4.

   client ------- r1 --- internet --- server
       |                   |
       +----------r2-------+

    Figure 4: Multihomed-Client Connected to Single-Homed Server

When the Multipath TCP implementation in the Linux kernel creates the second subflow, it uses the same destination port as the initial subflow. This choice is motivated by the fact that the server might be protected by a firewall and only accept TCP connections (including subflows) on the official port number. Using the same destination port for all subflows is also useful for operators that rely on the port numbers to track application usage in their network.

There have been suggestions from Multipath TCP users to modify the implementation to allow the client to use different destination ports to reach the server. This suggestion seems mainly motivated by traffic-shaping middleboxes that are used in some wireless networks. In networks where different shaping rates are associated with different destination port numbers, this could allow Multipath TCP to reach a higher performance. This behavior is valid according to the Multipath TCP specification [RFC6824]. An application could use an enhanced socket API [SOCKET] to behave in this way.

However, from an implementation point of view, supporting different destination ports for the same Multipath TCP connection can cause some issues. A legacy implementation of a TCP stack creates a listening socket to react upon incoming SYN segments. The listening socket handles the SYN segments that are sent on a specific port number. Demultiplexing incoming segments can thus be done solely by looking at the IP addresses and the port numbers. With Multipath TCP, however, incoming SYN segments may have an MP_JOIN option with a different destination port. This means that all incoming segments that did not match an existing listening socket or an already established socket must be parsed for a possible MP_JOIN option. This imposes an additional cost on servers that did not exist in legacy TCP implementations.
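The extra demultiplexing step described above can be sketched as follows. This is a simplified illustration and not the logic of the Linux kernel: segments are modeled as dictionaries, and the `established`, `listeners`, and `tokens` tables, as well as the key `mp_join_token`, are hypothetical names.

```python
def demultiplex(segment, established, listeners, tokens):
    """Classify an incoming segment as a legacy stack would, plus the MP_JOIN step.

    segment     -- dict with 'src', 'sport', 'dst', 'dport', optional 'syn'
                   flag and optional 'mp_join_token' (token of an MP_JOIN option)
    established -- dict: 4-tuple -> established socket
    listeners   -- dict: (dst, dport) -> listening socket
    tokens      -- dict: MPTCP token -> existing Multipath TCP session
    """
    four_tuple = (segment["src"], segment["sport"],
                  segment["dst"], segment["dport"])
    if four_tuple in established:
        return ("established", established[four_tuple])
    if segment.get("syn"):
        token = segment.get("mp_join_token")
        listener = listeners.get((segment["dst"], segment["dport"]))
        if listener is not None and token is None:
            return ("listener", listener)
        # Additional cost: the options of a SYN that matches no listening
        # port must still be inspected to find a possible MP_JOIN.
        if token in tokens:
            return ("join", tokens[token])
    return ("reset", None)


if __name__ == "__main__":
    listeners = {("192.0.2.1", 80): "http-listener"}
    tokens = {0xABCD: "session-1"}
    syn_join = {"src": "198.51.100.7", "sport": 50000, "dst": "192.0.2.1",
                "dport": 8080, "syn": True, "mp_join_token": 0xABCD}
    print(demultiplex(syn_join, {}, listeners, tokens))
```

In this sketch, a SYN carrying MP_JOIN toward port 8080 is accepted even though nothing listens on that port, which is exactly the lookup a legacy stack never had to perform.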