Network Working Group G. Montenegro Request for Comments: 2757 Sun Microsystems, Inc. Category: Informational S. Dawkins Nortel Networks M. Kojo University of Helsinki V. Magret Alcatel N. Vaidya Texas A&M University January 2000 Long Thin Networks Status of this Memo This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited. Copyright Notice Copyright (C) The Internet Society (2000). All Rights Reserved. Abstract In view of the unpredictable and problematic nature of long thin networks (for example, wireless WANs), arriving at an optimized transport is a daunting task. We have reviewed the existing proposals along with future research items. Based on this overview, we also recommend mechanisms for implementation in long thin networks. Our goal is to identify a TCP that works for all users, including users of long thin networks. We started from the working recommendations of the IETF TCP Over Satellite Links (tcpsat) working group with this end in mind. We recognize that not every tcpsat recommendation will be required for long thin networks as well, and work toward a set of TCP recommendations that are 'benign' in environments that do not require them.
Table of Contents 1 Introduction ................................................. 3 1.1 Network Architecture .................................... 5 1.2 Assumptions about the Radio Link ........................ 6 2 Should it be IP or Not? ..................................... 7 2.1 Underlying Network Error Characteristics ................ 7 2.2 Non-IP Alternatives ..................................... 8 2.2.1 WAP ................................................ 8 2.2.2 Deploying Non-IP Alternatives ...................... 9 2.3 IP-based Considerations ................................. 9 2.3.1 Choosing the MTU [Stevens94, RFC1144] .............. 9 2.3.2 Path MTU Discovery [RFC1191] ....................... 10 2.3.3 Non-TCP Proposals .................................. 10 3 The Case for TCP ............................................. 11 4 Candidate Optimizations ...................................... 12 4.1 TCP: Current Mechanisms ................................. 12 4.1.1 Slow Start and Congestion Avoidance ................ 12 4.1.2 Fast Retransmit and Fast Recovery .................. 12 4.2 Connection Setup with T/TCP [RFC1397, RFC1644] .......... 14 4.3 Slow Start Proposals .................................... 14 4.3.1 Larger Initial Window .............................. 14 4.3.2 Growing the Window during Slow Start ............... 15 220.127.116.11 ACK Counting .................................. 15 18.104.22.168 ACK-every-segment ............................. 16 4.3.3 Terminating Slow Start ............................. 17 4.3.4 Generating ACKs during Slow Start .................. 17 4.4 ACK Spacing ............................................. 17 4.5 Delayed Duplicate Acknowlegements ....................... 18 4.6 Selective Acknowledgements [RFC2018] .................... 18 4.7 Detecting Corruption Loss ............................... 19 4.7.1 Without Explicit Notification ...................... 19 4.7.2 With Explicit Notifications ........................ 20 4.8 Active Queue Management ................................. 21 4.9 Scheduling Algorithms ................................... 21 4.10 Split TCP and Performance-Enhancing Proxies (PEPs) ..... 22 4.10.1 Split TCP Approaches .............................. 23 4.10.2 Application Level Proxies ......................... 26 4.10.3 Snoop and its Derivatives ......................... 27 4.10.4 PEPs to handle Periods of Disconnection ........... 29 4.11 Header Compression Alternatives ........................ 30 4.12 Payload Compression .................................... 31 4.13 TCP Control Block Interdependence [Touch97] ............ 32 5 Summary of Recommended Optimizations ......................... 33 6 Conclusion ................................................... 35 7 Acknowledgements ............................................. 35 8 Security Considerations ...................................... 35
9 References ................................................... 36 Authors' Addresses ............................................. 44 Full Copyright Statement ....................................... 46 1 Introduction Optimized wireless networking is one of the major hurdles that Mobile Computing must solve if it is to enable ubiquitous access to networking resources. However, current data networking protocols have been optimized primarily for wired networks. Wireless environments have very different characteristics in terms of latency, jitter, and error rate as compared to wired networks. Accordingly, traditional protocols are ill-suited to this medium. Mobile Wireless networks can be grouped in W-LANs (for example, 802.11 compliant networks) and W-WANs (for example, CDPD [CDPD], Ricochet, CDMA [CDMA], PHS, DoCoMo, GSM [GSM] to name a few). W-WANs present the most serious challenge, given that the length of the wireless link (expressed as the delay*bandwidth product) is typically 4 to 5 times as long as that of its W-LAN counterparts. For example, for an 802.11 network, assuming the delay (round-trip time) is about 3 ms. and the bandwidth is 1.5 Mbps, the delay*bandwidth product is 4500 bits. For a W-WAN such as Ricochet, a typical round-trip time may be around 500 ms. (the best is about 230 ms.), and the sustained bandwidth is about 24 Kbps. This yields a delay*bandwidth product roughly equal to 1.5 KB. In the near future, 3rd Generation wireless services will offer 384Kbps and more. Assuming a 200 ms round-trip, the delay*bandwidth product in this case is 76.8 Kbits (9.6 KB). This value is larger than the default 8KB buffer space used by many TCP implementations. This means that, whereas for W-LANs the default buffer space is enough, future W-WANs will operate inefficiently (that is, they will not be able to fill the pipe) unless they override the default value. A 3rd Generation wireless service offering 2 Mbps with 200-millisecond latency requires a 50 KB buffer. Most importantly, latency across a link adversely affects throughput. For example, [MSMO97] derives an upper bound on TCP throughput. Indeed, the resultant expression is inversely related to the round-trip time. The long latencies also push the limits (and commonly transgress them) for what is acceptable to users of interactive applications. As a quick glance to our list of references will reveal, there is a wealth of proposals that attempt to solve the wireless networking problem. In this document, we survey the different solutions available or under investigation, and issue the corresponding recommendations.
There is a large body of work on the subject of improving TCP performance over satellite links. The documents under development by the tcpsat working group of the IETF [AGS98, ADGGHOSSTT98] are very relevant. In both cases, it is essential to start by improving the characteristics of the medium by using forward error correction (FEC) at the link layer to reduce the BER (bit error rate) from values as high as 10-3 to 10-6 or better. This makes the BER manageable. Once in this realm, retransmission schemes like ARQ (automatic repeat request) may be used to bring it down even further. Notice that sometimes it may be desirable to forego ARQ because of the additional delay it implies. In particular, time sensitive traffic (video, audio) must be delivered within a certain time limit beyond which the data is obsolete. Exhaustive retransmissions in this case merely succeed in wasting time in order to deliver data that will be discarded once it arrives at its destination. This indicates the desirability of augmenting the protocol stack implementation on devices such that the upper protocol layers can inform the link and MAC layer when to avoid such costly retransmission schemes. Networks that include satellite links are examples of "long fat networks" (LFNs or "elephants"). They are "long" networks because their round-trip time is quite high (for example, 0.5 sec and higher for geosynchronous satellites). Not all satellite links fall within the LFN regime. In particular, round-trip times in a low-earth orbiting (LEO) satellite network may be as little as a few milliseconds (and never extend beyond 160 to 200 ms). W-WANs share the "L" with LFNs. However, satellite networks are also "fat" in the sense that they may have high bandwidth. Satellite networks may often have a delay*bandwidth product above 64 KBytes, in which case they pose additional problems to TCP [TCPHP]. W-WANs do not generally exhibit this behavior. Accordingly, this document only deals with links that are "long thin pipes", and the networks that contain them: "long thin networks". We call these "LTNs". This document does not give an overview of the API used to access the underlying transport. We believe this is an orthogonal issue, even though some of the proposals below have been put forth assuming a given interface. It is possible, for example, to support the traditional socket semantics without fully relying on TCP/IP transport [MOWGLI]. Our focus is on the on-the-wire protocols. We try to include the most relevant ones and briefly (given that we provide the references needed for further study) mention their most salient points.
1.1 Network Architecture One significant difference between LFNs and LTNs is that we assume the W-WAN link is the last hop to the end user. This allows us to assume that a single intermediate node sees all packets transferred between the wireless mobile device and the rest of the Internet. This is only one of the topologies considered by the TCP Satellite community. Given our focus on mobile wireless applications, we only consider a very specific architecture that includes: - a wireless mobile device, connected via - a wireless link (which may, in fact comprise several hops at the link layer), to - an intermediate node (sometimes referred to as a base station) connected via - a wireline link, which in turn interfaces with - the landline Internet and millions of legacy servers and web sites. Specifically, we are not as concerned with paths that include two wireless segments separated by a wired one. This may occur, for example, if one mobile device connects across its immediate wireless segment via an intermediate node to the Internet, and then via a second wireless segment to another mobile device. Quite often, mobile devices connect to a legacy server on the wired Internet. Typically, the endpoints of the wireless segment are the intermediate node and the mobile device. However, the latter may be a wireless router to a mobile network. This is also important and has applications in, for example, disaster recovery. Our target architecture has implications which concern the deployability of candidate solutions. In particular, an important requirement is that we cannot alter the networking stack on the legacy servers. It would be preferable to only change the networking stack at the intermediate node, although changing it at the mobile devices is certainly an option and perhaps a necessity. We envision mobile devices that can use the wireless medium very efficiently, but overcome some of its traditional constraints. That is, full mobility implies that the devices have the flexibility and agility to use whichever happens to be the best network connection
available at any given point in time or space. Accordingly, devices could switch from a wired office LAN and hand over their ongoing connections to continue on, say, a wireless WAN. This type of agility also requires Mobile IP [RFC2002]. 1.2 Assumptions about the Radio Link The system architecture described above assumes at most one wireless link (perhaps comprising more than one wireless hop). However, this is not enough to characterize a wireless link. Additional considerations are: - What are the error characteristics of the wireless medium? The link may present a higher BER than a wireline network due to burst errors and disconnections. The techniques below usually do not address all the types of errors. Accordingly, a complete solution should combine the best of all the proposals. Nevertheless, in this document we are more concerned with (and give preference to solving) the most typical case: (1) higher BER due to random errors (which implies longer and more variable delays due to link-layer error corrections and retransmissions) rather than (2) an interruption in service due to a handoff or a disconnection. The latter are also important and we do include relevant proposals in this survey. - Is the wireless service datagram oriented, or is it a virtual circuit? Currently, switched virtual circuits are more common, but packet networks are starting to appear, for example, Metricom's Starmode [CB96], CDPD [CDPD] and General Packet Radio Service (GPRS) [GPRS],[BW97] in GSM. - What kind of reliability does the link provide? Wireless services typically retransmit a packet (frame) until it has been acknowledged by the target. They may allow the user to turn off this behavior. For example, GSM allows RLP [RLP] (Radio Link Protocol) to be turned off. Metricom has a similar "lightweight" mode. In GSM RLP, a frame is retransmitted until the maximum number of retransmissions (protocol parameter) is reached. What happens when this limit is reached is determined by the telecom operator: the physical link connection is either disconnected or a link reset is enforced where the sequence numbers are resynchronized and the transmit and receive buffers are flushed resulting in lost data. Some wireless services, like CDMA IS95-RLP [CDMA, Karn93], limit the latency on the wireless link by retransmitting a frame only a couple of times. This decreases the residual frame error rate significantly, but does not provide fully reliable link service.
- Does the mobile device transmit and receive at the same time? Doing so increases the cost of the electronics on the mobile device. Typically, this is not the case. We assume in this document that mobile devices do not transmit and receive simultaneously. - Does the mobile device directly address more than one peer on the wireless link? Packets to each different peer may traverse spatially distinct wireless paths. Accordingly, the path to each peer may exhibit very different characteristics. Quite commonly, the mobile device addresses only one peer (the intermediate node) at any given point in time. When this is not the case, techniques such as Channel-State Dependent Packet Scheduling come into play (see the section "Packet Scheduling" below). 2 Should it be IP or Not? The first decision is whether to use IP as the underlying network protocol or not. In particular, some data protocols evolved from wireless telephony are not always -- though at times they may be -- layered on top of IP [MOWGLI, WAP]. These proposals are based on the concept of proxies that provide adaptation services between the wireless and wireline segments. This is a reasonable model for mobile devices that always communicate through the proxy. However, we expect many wireless mobile devices to utilize wireline networks whenever they are available. This model closely follows current laptop usage patterns: devices typically utilize LANs, and only resort to dial-up access when "out of the office." For these devices, an architecture that assumes IP is the best approach, because it will be required for communications that do not traverse the intermediate node (for example, upon reconnection to a W-LAN or a 10BaseT network at the office). 2.1 Underlying Network Error Characteristics Using IP as the underlying network protocol requires a certain (low) level of link robustness that is expected of wireless links. IP, and the protocols that are carried in IP packets, are protected end-to-end by checksums that are relatively weak [Stevens94, Paxson97] (and, in some cases, optional). For much of the Internet, these checksums are sufficient; in wireless environments, the error characteristics of the raw wireless link are much less robust than the rest of the end-to-end path. Hence for paths that include
wireless links, exclusively relying on end-to-end mechanisms to detect and correct transmission errors is undesirable. These should be complemented by local link-level mechanisms. Otherwise, damaged IP packets are propagated through the network only to be discarded at the destination host. For example, intermediate routers are required to check the IP header checksum, but not the UDP or TCP checksums. Accordingly, when the payload of an IP packet is corrupted, this is not detected until the packet arrives at its ultimate destination. A better approach is to use link-layer mechanisms such as FEC, retransmissions, and so on in order to improve the characteristics of the wireless link and present a much more reliable service to IP. This approach has been taken by CDPD, Ricochet and CDMA. This approach is roughly analogous to the successful deployment of Point-to-Point Protocol (PPP), with robust framing and 16-bit checksumming, on wireline networks as a replacement for the Serial Line Interface Protocol (SLIP), with only a single framing byte and no checksumming. [AGS98] recommends the use of FEC in satellite environments. Notice that the link-layer could adapt its frame size to the prevalent BER. It would perform its own fragmentation and reassembly so that IP could still enjoy a large enough MTU size [LS98]. A common concern for using IP as a transport is the header overhead it implies. Typically, the underlying link-layer appears as PPP [RFC1661] to the IP layer above. This allows for header compression schemes [IPHC, IPHC-RTP, IPHC-PPP] which greatly alleviate the problem. 2.2 Non-IP Alternatives A number of non-IP alternatives aimed at wireless environments have been proposed. One representative proposal is discussed here. 2.2.1 WAP The Wireless Application Protocol (WAP) specifies an application framework and network protocols for wireless devices such as mobile telephones, pagers, and PDAs [WAP]. The architecture requires a proxy between the mobile device and the server. The WAP protocol stack is layered over a datagram transport service. Such a service is provided by most wireless networks; for example, IS-136, GSM SMS/USSD, and UDP in IP networks like CDPD and GSM GPRS. The core of
the WAP protocols is a binary HTTP/1.1 protocol with additional features such as header caching between requests and a shared state between client and server. 2.2.2 Deploying Non-IP Alternatives IP is such a fundamental element of the Internet that non-IP alternatives face substantial obstacles to deployment, because they do not exploit the IP infrastructure. Any non-IP alternative that is used to provide gatewayed access to the Internet must map between IP addresses and non-IP addresses, must terminate IP-level security at a gateway, and cannot use IP-oriented discovery protocols (Dynamic Host Configuration Protocol, Domain Name Services, Lightweight Directory Access Protocol, Service Location Protocol, etc.) without translation at a gateway. A further complexity occurs when a device supports both wireless and wireline operation. If the device uses IP for wireless operation, uninterrupted operation when the device is connected to a wireline network is possible (using Mobile IP). If a non-IP alternative is used, this switchover is more difficult to accomplish. Non-IP alternatives face the burden of proof that IP is so ill-suited to a wireless environment that it is not a viable technology. 2.3 IP-based Considerations Given its worldwide deployment, IP is an obvious choice for the underlying network technology. Optimizations implemented at this level benefit traditional Internet application protocols as well as new ones layered on top of IP or UDP. 2.3.1 Choosing the MTU [Stevens94, RFC1144] In slow networks, the time required to transmit the largest possible packet may be considerable. Interactive response time should not exceed the well-known human factors limit of 100 to 200 ms. This should be considered the maximum time budget to (1) send a packet and (2) obtain a response. In most networking stack implementations, (1) is highly dependent on the maximum transmission unit (MTU). In the worst case, a small packet from an interactive application may have to wait for a large packet from a bulk transfer application before being sent. Hence, a good rule of thumb is to choose an MTU such that its transmission time is less than (or not much larger than) 200 ms.
Of course, compression and type-of-service queuing (whereby interactive data packets are given a higher priority) may alleviate this problem. In particular, the latter may reduce the average wait time to about half the MTU's transmission time. 2.3.2 Path MTU Discovery [RFC1191] Path MTU discovery benefits any protocol built on top of IP. It allows a sender to determine what the maximum end-to-end transmission unit is to a given destination. Without Path MTU discovery, the default IPv4 MTU size is 576. The benefits of using a larger MTU are: - Smaller ratio of header overhead to data - Allows TCP to grow its congestion window faster, since it increases in units of segments. Of course, for a given BER, a larger MTU has a correspondingly larger probability of error within any given segment. The BER may be reduced using lower level techniques like FEC and link-layer retransmissions. The issue is that now delays may become a problem due to the additional retransmissions, and the fact that packet transmission time increases with a larger MTU. Recommendation: Path MTU discovery is recommended. [AGS98] already recommends its use in satellite environments. 2.3.3 Non-TCP Proposals Other proposals assume an underlying IP datagram service, and implement an optimized transport either directly on top of IP [NETBLT] or on top of UDP [MNCP]. Not relying on TCP is a bold move, given the wealth of experience and research related to it. It could be argued that the Internet has not collapsed because its main protocol, TCP, is very careful in how it uses the network, and generally treats it as a black box assuming all packet losses are due to congestion and prudently backing off. This avoids further congestion. However, in the wireless medium, packet losses may also be due to corruption due to high BER, fading, and so on. Here, the right approach is to try harder, instead of backing off. Alternative transport protocols are: - NETBLT [NETBLT, RFC1986, RFC1030] - MNCP [MNCP]
- ESRO [RFC2188] - RDP [RFC908, RFC1151] - VMTP [VMTP] 3 The Case for TCP This is one of the most hotly debated issues in the wireless arena. Here are some arguments against it: - It is generally recognized that TCP does not perform well in the presence of significant levels of non-congestion loss. TCP detractors argue that the wireless medium is one such case, and that it is hard enough to fix TCP. They argue that it is easier to start from scratch. - TCP has too much header overhead. - By the time the mechanisms are in place to fix it, TCP is very heavy, and ill-suited for use by lightweight, portable devices. and here are some in support of TCP: - It is preferable to continue using the same protocol that the rest of the Internet uses for compatibility reasons. Any extensions specific to the wireless link may be negotiated. - Legacy mechanisms may be reused (for example three-way handshake). - Link-layer FEC and ARQ can reduce the BER such that any losses TCP does see are, in fact, caused by congestion (or a sustained interruption of link connectivity). Modern W-WAN technologies do this (CDPD, US-TDMA, CDMA, GSM), thus improving TCP throughput. - Handoffs among different technologies are made possible by Mobile IP [RFC2002], but only if the same protocols, namely TCP/IP, are used throughout. - Given TCP's wealth of research and experience, alternative protocols are relatively immature, and the full implications of their widespread deployment not clearly understood. Overall, we feel that the performance of TCP over long-thin networks can be improved significantly. Mechanisms to do so are discussed in the next sections.
4 Candidate Optimizations There is a large volume of work on the subject of optimizing TCP for operation over wireless media. Even though satellite networks generally fall in the LFN regime, our current LTN focus has much to benefit from it. For example, the work of the TCP-over-Satellite working group of the IETF has been extremely helpful in preparing this section [AGS98, ADGGHOSSTT98]. 4.1 TCP: Current Mechanisms A TCP sender adapts its use of bandwidth based on feedback from the receiver. The high latency characteristic of LTNs implies that TCP's adaptation is correspondingly slower than on networks with shorter delays. Similarly, delayed ACKs exacerbate the perceived latency on the link. Given that TCP grows its congestion window in units of segments, small MTUs may slow adaptation even further. 4.1.1 Slow Start and Congestion Avoidance Slow Start and Congestion Avoidance [RFC2581] are essential the Internet's stability. However there are two reasons why the wireless medium adversely affects them: - Whenever TCP's retransmission timer expires, the sender assumes that the network is congested and invokes slow start. This is why it is important to minimize the losses caused by corruption, leaving only those caused by congestion (as expected by TCP). - The sender increases its window based on the number of ACKs received. Their rate of arrival, of course, is dependent on the RTT (round-trip-time) between sender and receiver, which implies long ramp-up times in high latency links like LTNs. The dependency lasts until the pipe is filled. - During slow start, the sender increases its window in units of segments. This is why it is important to use an appropriately large MTU which, in turn, requires requires link layers with low loss. 4.1.2 Fast Retransmit and Fast Recovery When a TCP sender receives several duplicate ACKs, fast retransmit [RFC2581] allows it to infer that a segment was lost. The sender retransmits what it considers to be this lost segment without waiting for the full timeout, thus saving time.
After a fast retransmit, a sender invokes the fast recovery [RFC2581] algorithm. Fast recovery allows the sender to transmit at half its previous rate (regulating the growth of its window based on congestion avoidance), rather than having to begin a slow start. This also saves time. In general, TCP can increase its window beyond the delay-bandwidth product. However, in LTN links the congestion window may remain rather small, less than four segments, for long periods of time due to any of the following reasons: 1. Typical "file size" to be transferred over a connection is relatively small (Web requests, Web document objects, email messages, files, etc.) In particular, users of LTNs are not very willing to carry out large transfers as the response time is so long. 2. If the link has high BER, the congestion window tends to stay small 3. When an LTN is combined with a highly congested wireline Internet path, congestion losses on the Internet have the same effect as 2. 4. Commonly, ISPs/operators configure only a small number of buffers (even as few as for 3 packets) per user in their dial- up routers 5. Often small socket buffers are recommended with LTNs in order to prevent the RTO from inflating and to diminish the amount of packets with competing traffic. A small window effectively prevents the sender from taking advantage of Fast Retransmits. Moreover, efficient recovery from multiple losses within a single window requires adoption of new proposals (NewReno [RFC2582]). In addition, on slow paths with no packet reordering waiting for three duplicate ACKs to arrive postpones retransmission unnecessarily. Recommendation: Implement Fast Retransmit and Fast Recovery at this time. This is a widely-implemented optimization and is currently at Proposed Standard level. [AGS98] recommends implementation of Fast Retransmit/Fast Recovery in satellite environments. NewReno [RFC2582] apparently does help a sender better handle partial ACKs and multiple losses in a single window, but at this point is not recommended due to its experimental nature. Instead, SACK [RFC2018] is the preferred mechanism.
4.2 Connection Setup with T/TCP [RFC1397, RFC1644] TCP engages in a "three-way handshake" whenever a new connection is set up. Data transfer is only possible after this phase has completed successfully. T/TCP allows data to be exchanged in parallel with the connection set up, saving valuable time for short transactions on long-latency networks. Recommendation: T/TCP is not recommended, for these reasons: - It is an Experimental RFC. - It is not widely deployed, and it has to be deployed at both ends of a connection. - Security concerns have been raised that T/TCP is more vulnerable to address-spoofing attacks than TCP itself. - At least some of the benefits of T/TCP (eliminating three-way handshake on subsequent query-response transactions, for instance) are also available with persistent connections on HTTP/1.1, which is more widely deployed. [ADGGHOSSTT98] does not have a recommendation on T/TCP in satellite environments. 4.3 Slow Start Proposals Because slow start dominates the network response seen by interactive users at the beginning of a TCP connection, a number of proposals have been made to modify or eliminate slow start in long latency environments. Stability of the Internet is paramount, so these proposals must demonstrate that they will not adversely affect Internet congestion levels in significant ways. 4.3.1 Larger Initial Window Traditional slow start, with an initial window of one segment, is a time-consuming bandwidth adaptation procedure over LTNs. Studies on an initial window larger than one segment [RFC2414, AHO98] resulted in the TCP standard supporting a maximum value of 2 [RFC2581]. Higher values are still experimental in nature.
In simulations with an increased initial window of three packets [RFC2415], this proposal does not contribute significantly to packet drop rates, and it has the added benefit of improving initial response times when the peer device delays acknowledgements during slow start (see next proposal). [RFC2416] addresses situations where the initial window exceeds the number of buffers available to TCP and indicates that this situation is no different from the case where the congestion window grows beyond the number of buffers available. [RFC2581] now allows an initial congestion window of two segments. A larger initial window, perhaps as many as four segments, might be allowed in the future in environments where this significantly improves performance (LFNs and LTNs). Recommendation: Implement this on devices now. The research on this optimization indicates that 3 segments is a safe initial setting, and is centering on choosing between 2, 3, and 4. For now, use 2 (following RFC2581), which at least allows clients running query- response applications to get an initial ACK from unmodified servers without waiting for a typical delayed ACK timeout of 200 milliseconds, and saves two round-trips. An initial window of 3 [RFC2415] looks promising and may be adopted in the future pending further research and experience. 4.3.2 Growing the Window during Slow Start The sender increases its window based on the flow of ACKs coming back from the receiver. Particularly during slow start, this flow is very important. A couple of the proposals that have been studied are (1) ACK counting and (2) ACK-every-segment. 22.214.171.124 ACK Counting The main idea behind ACK counting is: - Make each ACK count to its fullest by growing the window based on the data being acknowledged (byte counting) instead of the number of ACKs (ACK counting). This has been shown to cause bursts which lead to congestion. [Allman98] shows that Limited Byte Counting (LBC), in which the window growth is limited to 2 segments, does not lead to as much burstiness, and offers some performance gains. Recommendation: Unlimited byte counting is not recommended. Van Jacobson cautions against byte counting [TCPSATMIN] because it leads to burstiness, and recommends ACK spacing [ACKSPACING] instead.
ACK spacing requires ACKs to consistently pass through a single ACK- spacing router. This requirement works well for W-WAN environments if the ACK-spacing router is also the intermediate node. Limited byte counting warrants further investigation before we can recommend this proposal, but it shows promise. 126.96.36.199 ACK-every-segment The main idea behind ACK-every-segment is: - Keep a constant stream of ACKs coming back by turning off delayed ACKs [RFC1122] during slow start. ACK-every-segment must be limited to slow start, in order to avoid penalizing asymmetric-bandwidth configurations. For instance, a low bandwidth link carrying acknowledgements back to the sender, hinders the growth of the congestion window, even if the link toward the client has a greater bandwidth [BPK99]. Even though simulations confirm its promise (it allows receivers to receive the second segment from unmodified senders without waiting for a typical delayed ACK timeout of 200 milliseconds), for this technique to be practical the receiver must acknowledge every segment only when the sender is in slow start. Continuing to do so when the sender is in congestion avoidance may have adverse effects on the mobile device's battery consumption and on traffic in the network. This violates a SHOULD in [RFC2581]: delayed acknowledgements SHOULD be used by a TCP receiver. "Disabling Delayed ACKs During Slow Start" is technically unimplementable, as the receiver has no way of knowing when the sender crosses ssthresh (the "slow start threshold") and begins using the congestion avoidance algorithm. If receivers follow recommendations for increased initial windows, disabling delayed ACKs during an increased initial window would open the TCP window more rapidly without doubling ACK traffic in general. However, this scheme might double ACK traffic if most connections remain in slow- start. Recommendation: ACK only the first segment on a new connection with no delay.
4.3.3 Terminating Slow Start New mechanisms [ADGGHOSSTT98] are being proposed to improve TCP's adaptive properties such that the available bandwidth is better utilized while reducing the possibility of congesting the network. This results in the closing of the congestion window to 1 segment (which precludes fast retransmit), and the subsequent slow start phase. Theoretically, an optimum value for slow-start threshold (ssthresh) allows connection bandwidth utilization to ramp up as aggressively as possible without "overshoot" (using so much bandwidth that packets are lost and congestion avoidance procedures are invoked). Recommendation: Estimating the slow start threshold is not recommended. Although this would be helpful if we knew how to do it, rough consensus on the tcp-impl and tcp-sat mailing lists is that in non-trivial operational networks there is no reliable method to probe during TCP startup and estimate the bandwidth available. 4.3.4 Generating ACKs during Slow Start Mitigations that inject additional ACKs (whether "ACK-first-segment" or "ACK-every-segment-during-slow-start") beyond what today's conformant TCPs inject are only applicable during the slow-start phases of a connection. After an initial exchange, the connection usually completes slow-start, so TCPs only inject additional ACKs when (1) the connection is closed, and a new connection is opened, or (2) the TCPs handle idle connection restart correctly by performing slow start. Item (1) is typical when using HTTP/1.0, in which each request- response transaction requires a new connection. Persistent connections in HTTP/1.1 help in maintaining a connection in congestion avoidance instead of constantly reverting to slow-start. Because of this, these optimizations which are only enabled during slow-start do not get as much of a chance to act. Item (2), of course, is independent of HTTP version. 4.4 ACK Spacing During slow start, the sender responds to the incoming ACK stream by transmitting N+1 segments for each ACK, where N is the number of new segments acknowledged by the incoming ACK. This results in data being sent at twice the speed at which it can be processed by the network. Accordingly, queues will form, and due to insufficient buffering at the bottleneck router, packets may get dropped before the link's capacity is full.
Spacing out the ACKs effectively controls the rate at which the sender will transmit into the network, and may result in little or no queueing at the bottleneck router [ACKSPACING]. Furthermore, ack spacing reduces the size of the bursts. Recommendation: No recommendation at this time. Continue monitoring research in this area. 4.5 Delayed Duplicate Acknowlegements As was mentioned above, link-layer retransmissions may decrease the BER enough that congestion accounts for most of packet losses; still, nothing can be done about interruptions due to handoffs, moving beyond wireless coverage, etc. In this scenario, it is imperative to prevent interaction between link-layer retransmission and TCP retransmission as these layers duplicate each other's efforts. In such an environment it may make sense to delay TCP's efforts so as to give the link-layer a chance to recover. With this in mind, the Delayed Dupacks [MV97, Vaidya99] scheme selectively delays duplicate acknowledgements at the receiver. It is preferable to allow a local mechanism to resolve a local problem, instead of invoking TCP's end- to-end mechanism and incurring the associated costs, both in terms of wasted bandwidth and in terms of its effect on TCP's window behavior. The Delayed Dupacks scheme can be used despite IP encryption since the intermediate node does not need to examine the TCP headers. Currently, it is not well understood how long the receiver should delay the duplicate acknowledgments. In particular, the impact of wireless medium access control (MAC) protocol on the choice of delay parameter needs to be studied. The MAC protocol may affect the ability to choose the appropriate delay (either statically or dynamically). In general, significant variabilities in link-level retransmission times can have an adverse impact on the performance of the Delayed Dupacks scheme. Furthermore, as discussed later in section 4.10.3, Delayed Dupacks and some other schemes (such as Snoop [SNOOP]) are only beneficial in certain types of network links. Recommendation: Delaying duplicate acknowledgements may be useful in specific network topologies, but a general recommendation requires further research and experience. 4.6 Selective Acknowledgements [RFC2018] SACK may not be useful in many LTNs, according to Section 1.1 of [TCPHP]. In particular, SACK is more useful in the LFN regime, especially if large windows are being used, because there is a
considerable probability of multiple segment losses per window. In the LTN regime, TCP windows are much smaller, and burst errors must be much longer in duration in order to damage multiple segments. Accordingly, the complexity of SACK may not be justifiable, unless there is a high probability of burst errors and congestion on the wireless link. A desire for compatibility with TCP recommendations for non-LTN environments may dictate LTN support for SACK anyway. [AGS98] recommends use of SACK with Large TCP Windows in satellite environments, and notes that this implies support for PAWS (Protection Against Wrapped Sequence space) and RTTM (Round Trip Time Measurement) as well. Berkeley's SNOOP protocol research [SNOOP] indicates that SACK does improve throughput for SNOOP when multiple segments are lost per window [BPSK96]. SACK allows SNOOP to recover from multi-segment losses in one round-trip. In this case, the mobile device needs to implement some form of selective acknowledgements. If SACK is not used, TCP may enter congestion avoidance as the time needed to retransmit the lost segments may be greater than the retransmission timer. Recommendation: Implement SACK now for compatibility with other TCPs and improved performance with SNOOP. 4.7 Detecting Corruption Loss 4.7.1 Without Explicit Notification In the absence of explicit notification from the network, some researchers have suggested statistical methods for congestion avoidance [Jain89, WC91, VEGAS]. A natural extension of these heuristics would enable a sender to distinguish between losses caused by congestion and other causes. The research results on the reliability of sender-based heuristics is unfavorable [BV97, BV98]. [BV98a] reports better results in constrained environments using packet inter-arrival times measured at the receiver, but highly- variable delay - of the type encountered in wireless environments during intercell handoff - confounds these heuristics. Recommendation: No recommendation at this time - continue to monitor research results.
4.7.2 With Explicit Notifications With explicit notification from the network it is possible to determine when a loss is due to congestion. Several proposals along these lines include: - Explicit Loss Notification (ELN) [BPSK96] - Explicit Bad State Notification (EBSN) [BBKVP96] - Explicit Loss Notification to the Receiver (ELNR), and Explicit Delayed Dupack Activation Notification (EDDAN) (notifications to mobile receiver) [MV97] - Explicit Congestion Notification (ECN) [ECN] Of these proposals, Explicit Congestion Notification (ECN) seems closest to deployment on the Internet, and will provide some benefit for TCP connections on long thin networks (as well as for all other TCP connections). Recommendation: No recommendation at this time. Schemes like ELNR and EDDAN [MV97], in which the only systems that need to be modified are the intermediate node and the mobile device, are slated for adoption pending further research. However, this solution has some limitations. Since the intermediate node must have access to the TCP headers, the IP payload must not be encrypted. ECN uses the TOS byte in the IP header to carry congestion information (ECN-capable and Congestion-encountered). This byte is not encrypted in IPSEC, so ECN can be used on TCP connections that are encrypted using IPSEC. Recommendation: Implement ECN. In spite of this, mechanisms for explicit corruption notification are still relevant and should be tracked. Note: ECN provides useful information to avoid deteriorating further a bad situation, but has some limitations for wireless applications. Absence of packets marked with ECN should not be interpreted by ECN- capable TCP connections as a green light for aggressive retransmissions. On the contrary, during periods of extreme network congestion routers may drop packets marked with explicit notification because their buffers are exhausted - exactly the wrong time for a host to begin retransmitting aggressively.