Network Working Group M. Allman, Editor Request for Comments: 2760 NASA Glenn Research Center/BBN Technologies Category: Informational S. Dawkins Nortel D. Glover J. Griner D. Tran NASA Glenn Research Center T. Henderson University of California at Berkeley J. Heidemann J. Touch University of Southern California/ISI H. Kruse S. Ostermann Ohio University K. Scott The MITRE Corporation J. Semke Pittsburgh Supercomputing Center February 2000 Ongoing TCP Research Related to Satellites Status of this Memo This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited. Copyright Notice Copyright (C) The Internet Society (2000). All Rights Reserved. Abstract This document outlines possible TCP enhancements that may allow TCP to better utilize the available bandwidth provided by networks containing satellite links. The algorithms and mechanisms outlined have not been judged to be mature enough to be recommended by the IETF. The goal of this document is to educate researchers as to the current work and progress being done in TCP research related to satellite networks.
Table of Contents 1 Introduction. . . . . . . . . . . . . . . . . . . . 2 2 Satellite Architectures . . . . . . . . . . . . . . 3 2.1 Asymmetric Satellite Networks . . . . . . . . . . . 3 2.2 Satellite Link as Last Hop. . . . . . . . . . . . . 3 2.3 Hybrid Satellite Networks . . . . . . . . . . . 4 2.4 Point-to-Point Satellite Networks . . . . . . . . . 4 2.5 Multiple Satellite Hops . . . . . . . . . . . . . . 4 3 Mitigations . . . . . . . . . . . . . . . . . . . . 4 3.1 TCP For Transactions. . . . . . . . . . . . . . . . 4 3.2 Slow Start. . . . . . . . . . . . . . . . . . . . . 5 3.2.1 Larger Initial Window . . . . . . . . . . . . . . . 6 3.2.2 Byte Counting . . . . . . . . . . . . . . . . . . . 7 3.2.3 Delayed ACKs After Slow Start . . . . . . . . . . . 9 3.2.4 Terminating Slow Start. . . . . . . . . . . . . . . 11 3.3 Loss Recovery . . . . . . . . . . . . . . . . . . . 12 3.3.1 Non-SACK Based Mechanisms . . . . . . . . . . . . . 12 3.3.2 SACK Based Mechanisms . . . . . . . . . . . . . . . 13 3.3.3 Explicit Congestion Notification. . . . . . . . . . 16 3.3.4 Detecting Corruption Loss . . . . . . . . . . . . . 18 3.4 Congestion Avoidance. . . . . . . . . . . . . . . . 21 3.5 Multiple Data Connections . . . . . . . . . . . . . 22 3.6 Pacing TCP Segments . . . . . . . . . . . . . . . . 24 3.7 TCP Header Compression. . . . . . . . . . . . . . . 26 3.8 Sharing TCP State Among Similar Connections . . . . 29 3.9 ACK Congestion Control. . . . . . . . . . . . . . . 32 3.10 ACK Filtering . . . . . . . . . . . . . . . . . . . 34 4 Conclusions . . . . . . . . . . . . . . . . . . . . 36 5 Security Considerations . . . . . . . . . . . . . . 36 6 Acknowledgments . . . . . . . . . . . . . . . . . . 37 7 References. . . . . . . . . . . . . . . . . . . . . 37 8 Authors' Addresses. . . . . . . . . . . . . . . . . 43 9 Full Copyright Statement. . . . . . . . . . . . . . 46 1 Introduction This document outlines mechanisms that may help the Transmission Control Protocol (TCP) [Pos81] better utilize the bandwidth provided by long-delay satellite environments. These mechanisms may also help in other environments or for other protocols. The proposals outlined in this document are currently being studied throughout the research community. Therefore, these mechanisms are not mature enough to be recommended for wide-spread use by the IETF. However, some of these mechanisms may be safely used today. It is hoped that this document will stimulate further study into the described mechanisms. If, at
some point, the mechanisms discussed in this memo prove to be safe and appropriate to be recommended for general use, the appropriate IETF documents will be written. It should be noted that non-TCP mechanisms that help performance over satellite links do exist (e.g., application-level changes, queueing disciplines, etc.). However, outlining these non-TCP mitigations is beyond the scope of this document and therefore is left as future work. Additionally, there are a number of mitigations to TCP's performance problems that involve very active intervention by gateways along the end-to-end path from the sender to the receiver. Documenting the pros and cons of such solutions is also left as future work. 2 Satellite Architectures Specific characteristics of satellite links and the impact these characteristics have on TCP are presented in RFC 2488 [AGS99]. This section discusses several possible topologies where satellite links may be integrated into the global Internet. The mitigation outlined in section 3 will include a discussion of which environment the mechanism is expected to benefit. 2.1 Asymmetric Satellite Networks Some satellite networks exhibit a bandwidth asymmetry, a larger data rate in one direction than the reverse direction, because of limits on the transmission power and the antenna size at one end of the link. Meanwhile, some other satellite systems are unidirectional and use a non-satellite return path (such as a dialup modem link). The nature of most TCP traffic is asymmetric with data flowing in one direction and acknowledgments in opposite direction. However, the term asymmetric in this document refers to different physical capacities in the forward and return links. Asymmetry has been shown to be a problem for TCP [BPK97,BPK98]. 2.2 Satellite Link as Last Hop Satellite links that provide service directly to end users, as opposed to satellite links located in the middle of a network, may allow for specialized design of protocols used over the last hop. Some satellite providers use the satellite link as a shared high speed downlink to users with a lower speed, non-shared terrestrial link that is used as a return link for requests and acknowledgments. Many times this creates an asymmetric network, as discussed above.
2.3 Hybrid Satellite Networks In the more general case, satellite links may be located at any point in the network topology. In this case, the satellite link acts as just another link between two gateways. In this environment, a given connection may be sent over terrestrial links (including terrestrial wireless), as well as satellite links. On the other hand, a connection could also travel over only the terrestrial network or only over the satellite portion of the network. 2.4 Point-to-Point Satellite Networks In point-to-point satellite networks, the only hop in the network is over the satellite link. This pure satellite environment exhibits only the problems associated with the satellite links, as outlined in [AGS99]. Since this is a private network, some mitigations that are not appropriate for shared networks can be considered. 2.5 Multiple Satellite Hops In some situations, network traffic may traverse multiple satellite hops between the source and the destination. Such an environment aggravates the satellite characteristics described in [AGS99]. 3 Mitigations The following sections will discuss various techniques for mitigating the problems TCP faces in the satellite environment. Each of the following sections will be organized as follows: First, each mitigation will be briefly outlined. Next, research work involving the mechanism in question will be briefly discussed. Next the implementation issues of the mechanism will be presented (including whether or not the particular mechanism presents any dangers to shared networks). Then a discussion of the mechanism's potential with regard to the topologies outlined above is given. Finally, the relationships and possible interactions with other TCP mechanisms are outlined. The reader is expected to be familiar with the TCP terminology used in [AGS99]. 3.1 TCP For Transactions 3.1.1 Mitigation Description TCP uses a three-way handshake to setup a connection between two hosts [Pos81]. This connection setup requires 1-1.5 round-trip times (RTTs), depending upon whether the data sender started the connection actively or passively. This startup time can be eliminated by using TCP extensions for transactions (T/TCP) [Bra94]. After the first
connection between a pair of hosts is established, T/TCP is able to bypass the three-way handshake, allowing the data sender to begin transmitting data in the first segment sent (along with the SYN). This is especially helpful for short request/response traffic, as it saves a potentially long setup phase when no useful data is being transmitted. 3.1.2 Research T/TCP is outlined and analyzed in [Bra92,Bra94]. 3.1.3 Implementation Issues T/TCP requires changes in the TCP stacks of both the data sender and the data receiver. While T/TCP is safe to implement in shared networks from a congestion control perspective, several security implications of sending data in the first data segment have been identified [ddKI99]. 3.1.4 Topology Considerations It is expected that T/TCP will be equally beneficial in all environments outlined in section 2. 3.1.5 Possible Interaction and Relationships with Other Research T/TCP allows data transfer to start more rapidly, much like using a larger initial congestion window (see section 3.2.1), delayed ACKs after slow start (section 3.2.3) or byte counting (section 3.2.2). 3.2 Slow Start The slow start algorithm is used to gradually increase the size of TCP's congestion window (cwnd) [Jac88,Ste97,APS99]. The algorithm is an important safe-guard against transmitting an inappropriate amount of data into the network when the connection starts up. However, slow start can also waste available network capacity, especially in long-delay networks [All97a,Hay97]. Slow start is particularly inefficient for transfers that are short compared to the delay*bandwidth product of the network (e.g., WWW transfers). Delayed ACKs are another source of wasted capacity during the slow start phase. RFC 1122 [Bra89] suggests data receivers refrain from ACKing every incoming data segment. However, every second full-sized segment should be ACKed. If a second full-sized segment does not arrive within a given timeout, an ACK must be generated (this timeout cannot exceed 500 ms). Since the data sender increases the size of cwnd based on the number of arriving ACKs, reducing the number of
ACKs slows the cwnd growth rate. In addition, when TCP starts sending, it sends 1 segment. When using delayed ACKs a second segment must arrive before an ACK is sent. Therefore, the receiver is always forced to wait for the delayed ACK timer to expire before ACKing the first segment, which also increases the transfer time. Several proposals have suggested ways to make slow start less time consuming. These proposals are briefly outlined below and references to the research work given. 3.2.1 Larger Initial Window 22.214.171.124 Mitigation Description One method that will reduce the amount of time required by slow start (and therefore, the amount of wasted capacity) is to increase the initial value of cwnd. An experimental TCP extension outlined in [AFP98] allows the initial size of cwnd to be increased from 1 segment to that given in equation (1). min (4*MSS, max (2*MSS, 4380 bytes)) (1) By increasing the initial value of cwnd, more packets are sent during the first RTT of data transmission, which will trigger more ACKs, allowing the congestion window to open more rapidly. In addition, by sending at least 2 segments initially, the first segment does not need to wait for the delayed ACK timer to expire as is the case when the initial size of cwnd is 1 segment (as discussed above). Therefore, the value of cwnd given in equation 1 saves up to 3 RTTs and a delayed ACK timeout when compared to an initial cwnd of 1 segment. Also, we note that RFC 2581 [APS99], a standards-track document, allows a TCP to use an initial cwnd of up to 2 segments. This change is highly recommended for satellite networks. 126.96.36.199 Research Several researchers have studied the use of a larger initial window in various environments. [Nic97] and [KAGT98] show a reduction in WWW page transfer time over hybrid fiber coax (HFC) and satellite links respectively. Furthermore, it has been shown that using an initial cwnd of 4 segments does not negatively impact overall performance over dialup modem links with a small number of buffers [SP98]. [AHO98] shows an improvement in transfer time for 16 KB files across the Internet and dialup modem links when using a larger initial value for cwnd. However, a slight increase in dropped
segments was also shown. Finally, [PN98] shows improved transfer time for WWW traffic in simulations with competing traffic, in addition to a small increase in the drop rate. 188.8.131.52 Implementation Issues The use of a larger initial cwnd value requires changes to the sender's TCP stack. Using an initial congestion window of 2 segments is allowed by RFC 2581 [APS99]. Using an initial congestion window of 3 or 4 segments is not expected to present any danger of congestion collapse [AFP98], however may degrade performance in some networks. 184.108.40.206 Topology Considerations It is expected that the use of a large initial window would be equally beneficial to all network architectures outlined in section 2. 220.127.116.11 Possible Interaction and Relationships with Other Research Using a fixed larger initial congestion window decreases the impact of a long RTT on transfer time (especially for short transfers) at the cost of bursting data into a network with unknown conditions. A mechanism that mitigates bursts may make the use of a larger initial congestion window more appropriate (e.g., limiting the size of line- rate bursts [FF96] or pacing the segments in a burst [VH97a]). Also, using delayed ACKs only after slow start (as outlined in section 3.2.3) offers an alternative way to immediately ACK the first segment of a transfer and open the congestion window more rapidly. Finally, using some form of TCP state sharing among a number of connections (as discussed in 3.8) may provide an alternative to using a fixed larger initial window. 3.2.2 Byte Counting 18.104.22.168 Mitigation Description As discussed above, the wide-spread use of delayed ACKs increases the time needed by a TCP sender to increase the size of the congestion window during slow start. This is especially harmful to flows traversing long-delay GEO satellite links. One mechanism that has been suggested to mitigate the problems caused by delayed ACKs is the use of "byte counting", rather than standard ACK counting [All97a,All98]. Using standard ACK counting, the congestion window is increased by 1 segment for each ACK received during slow start. However, using byte counting the congestion window increase is based
on the number of previously unacknowledged bytes covered by each incoming ACK, rather than on the number of ACKs received. This makes the increase relative to the amount of data transmitted, rather than being dependent on the ACK interval used by the receiver. Two forms of byte counting are studied in [All98]. The first is unlimited byte counting (UBC). This mechanism simply uses the number of previously unacknowledged bytes to increase the congestion window each time an ACK arrives. The second form is limited byte counting (LBC). LBC limits the amount of cwnd increase to 2 segments. This limit throttles the size of the burst of data sent in response to a "stretch ACK" [Pax97]. Stretch ACKs are acknowledgments that cover more than 2 segments of previously unacknowledged data. Stretch ACKs can occur by design [Joh95] (although this is not standard), due to implementation bugs [All97b,PADHV99] or due to ACK loss. [All98] shows that LBC prevents large line-rate bursts when compared to UBC, and therefore offers fewer dropped segments and better performance. In addition, UBC causes large bursts during slow start based loss recovery due to the large cumulative ACKs that can arrive during loss recovery. The behavior of UBC during loss recovery can cause large decreases in performance and [All98] strongly recommends UBC not be deployed without further study into mitigating the large bursts. Note: The standards track RFC 2581 [APS99] allows a TCP to use byte counting to increase cwnd during congestion avoidance, however not during slow start. 22.214.171.124 Research Using byte counting, as opposed to standard ACK counting, has been shown to reduce the amount of time needed to increase the value of cwnd to an appropriate size in satellite networks [All97a]. In addition, [All98] presents a simulation comparison of byte counting and the standard cwnd increase algorithm in uncongested networks and networks with competing traffic. This study found that the limited form of byte counting outlined above can improve performance, while also increasing the drop rate slightly. [BPK97,BPK98] also investigated unlimited byte counting in conjunction with various ACK filtering algorithms (discussed in section 3.10) in asymmetric networks.
126.96.36.199 Implementation Issues Changing from ACK counting to byte counting requires changes to the data sender's TCP stack. Byte counting violates the algorithm for increasing the congestion window outlined in RFC 2581 [APS99] (by making congestion window growth more aggressive during slow start) and therefore should not be used in shared networks. 188.8.131.52 Topology Considerations It has been suggested by some (and roundly criticized by others) that byte counting will allow TCP to provide uniform cwnd increase, regardless of the ACKing behavior of the receiver. In addition, byte counting also mitigates the retarded window growth provided by receivers that generate stretch ACKs because of the capacity of the return link, as discussed in [BPK97,BPK98]. Therefore, this change is expected to be especially beneficial to asymmetric networks. 184.108.40.206 Possible Interaction and Relationships with Other Research Unlimited byte counting should not be used without a method to mitigate the potentially large line-rate bursts the algorithm can cause. Also, LBC may send bursts that are too large for the given network conditions. In this case, LBC may also benefit from some algorithm that would lessen the impact of line-rate bursts of segments. Also note that using delayed ACKs only after slow start (as outlined in section 3.2.3) negates the limited byte counting algorithm because each ACK covers only one segment during slow start. Therefore, both ACK counting and byte counting yield the same increase in the congestion window at this point (in the first RTT). 3.2.3 Delayed ACKs After Slow Start 220.127.116.11 Mitigation Description As discussed above, TCP senders use the number of incoming ACKs to increase the congestion window during slow start. And, since delayed ACKs reduce the number of ACKs returned by the receiver by roughly half, the rate of growth of the congestion window is reduced. One proposed solution to this problem is to use delayed ACKs only after the slow start (DAASS) phase. This provides more ACKs while TCP is aggressively increasing the congestion window and less ACKs while TCP is in steady state, which conserves network resources.
18.104.22.168 Research [All98] shows that in simulation, using delayed ACKs after slow start (DAASS) improves transfer time when compared to a receiver that always generates delayed ACKs. However, DAASS also slightly increases the loss rate due to the increased rate of cwnd growth. 22.214.171.124 Implementation Issues The major problem with DAASS is in the implementation. The receiver has to somehow know when the sender is using the slow start algorithm. The receiver could implement a heuristic that attempts to watch the change in the amount of data being received and change the ACKing behavior accordingly. Or, the sender could send a message (a flipped bit in the TCP header, perhaps) indicating that it was using slow start. The implementation of DAASS is, therefore, an open issue. Using DAASS does not violate the TCP congestion control specification [APS99]. However, the standards (RFC 2581 [APS99]) currently recommend using delayed acknowledgments and DAASS goes (partially) against this recommendation. 126.96.36.199 Topology Considerations DAASS should work equally well in all scenarios presented in section 2. However, in asymmetric networks it may aggravate ACK congestion in the return link, due to the increased number of ACKs (see sections 3.9 and 3.10 for a more detailed discussion of ACK congestion). 188.8.131.52 Possible Interaction and Relationships with Other Research DAASS has several possible interactions with other proposals made in the research community. DAASS can aggravate congestion on the path between the data receiver and the data sender due to the increased number of returning acknowledgments. This can have an especially adverse effect on asymmetric networks that are prone to experiencing ACK congestion. As outlined in sections 3.9 and 3.10, several mitigations have been proposed to reduce the number of ACKs that are passed over a low-bandwidth return link. Using DAASS will increase the number of ACKs sent by the receiver. The interaction between DAASS and the methods for reducing the number of ACKs is an open research question. Also, as noted in section 184.108.40.206 above, DAASS provides some of the same benefits as using a larger initial congestion window and therefore it may not be desirable to use both mechanisms together. However, this remains an open question. Finally, DAASS and limited byte counting are both used to increase
the rate at which the congestion window is opened. The DAASS algorithm substantially reduces the impact limited byte counting has on the rate of congestion window increase. 3.2.4 Terminating Slow Start 220.127.116.11 Mitigation Description The initial slow start phase is used by TCP to determine an appropriate congestion window size for the given network conditions [Jac88]. Slow start is terminated when TCP detects congestion, or when the size of cwnd reaches the size of the receiver's advertised window. Slow start is also terminated if cwnd grows beyond a certain size. The threshold at which TCP ends slow start and begins using the congestion avoidance algorithm is called "ssthresh" [Jac88]. In most implementations, the initial value for ssthresh is the receiver's advertised window. During slow start, TCP roughly doubles the size of cwnd every RTT and therefore can overwhelm the network with at most twice as many segments as the network can handle. By setting ssthresh to a value less than the receiver's advertised window initially, the sender may avoid overwhelming the network with twice the appropriate number of segments. Hoe [Hoe96] proposes using the packet-pair algorithm [Kes91] and the measured RTT to determine a more appropriate value for ssthresh. The algorithm observes the spacing between the first few returning ACKs to determine the bandwidth of the bottleneck link. Together with the measured RTT, the delay*bandwidth product is determined and ssthresh is set to this value. When TCP's cwnd reaches this reduced ssthresh, slow start is terminated and transmission continues using congestion avoidance, which is a more conservative algorithm for increasing the size of the congestion window. 18.104.22.168 Research It has been shown that estimating ssthresh can improve performance and decrease packet loss in simulations [Hoe96]. However, obtaining an accurate estimate of the available bandwidth in a dynamic network is very challenging, especially attempting to do so on the sending side of the TCP connection [AP99]. Therefore, before this mechanism is widely deployed, bandwidth estimation must be studied in a more detail. 22.214.171.124 Implementation Issues As outlined in [Hoe96], estimating ssthresh requires changes to the data sender's TCP stack. As suggested in [AP99], bandwidth estimates may be more accurate when taken by the TCP receiver, and therefore both sender and receiver changes would be required. Estimating
ssthresh is safe to implement in production networks from a congestion control perspective, as it can only make TCP more conservative than outlined in RFC 2581 [APS99] (assuming the TCP implementation is using an initial ssthresh of infinity as allowed by [APS99]). 126.96.36.199 Topology Considerations It is expected that this mechanism will work equally well in all symmetric topologies outlined in section 2. However, asymmetric links pose a special problem, as the rate of the returning ACKs may not be the bottleneck bandwidth in the forward direction. This can lead to the sender setting ssthresh too low. Premature termination of slow start can hurt performance, as congestion avoidance opens cwnd more conservatively. Receiver-based bandwidth estimators do not suffer from this problem. 188.8.131.52 Possible Interaction and Relationships with Other Research Terminating slow start at the right time is useful to avoid multiple dropped segments. However, using a selective acknowledgment-based loss recovery scheme (as outlined in section 3.3.2) can drastically improve TCP's ability to quickly recover from multiple lost segments Therefore, it may not be as important to terminate slow start before a large loss event occurs. [AP99] shows that using delayed acknowledgments [Bra89] reduces the effectiveness of sender-side bandwidth estimation. Therefore, using delayed ACKs only during slow start (as outlined in section 3.2.3) may make bandwidth estimation more feasible.