MMFR96]. These algorithms generally work by updating the fast recovery algorithm to use information provided by "partial ACKs" to trigger retransmissions. A partial ACK covers some new data, but not all data outstanding when a particular loss event starts. For instance, consider the case when segment N is retransmitted using the fast
retransmit algorithm and segment M is the last segment sent when segment N is resent. If segment N is the only segment lost, the ACK elicited by the retransmission of segment N would be for segment M. If, however, segment N+1 was also lost, the ACK elicited by the retransmission of segment N will be N+1. This can be taken as an indication that segment N+1 was lost and used to trigger a retransmission. Hoe95,Hoe96] introduced the idea of using partial ACKs to trigger retransmissions and showed that doing so could improve performance. [FF96] shows that in some cases using partial ACKs to trigger retransmissions reduces the time required to recover from multiple lost segments. However, [FF96] also shows that in some cases (many lost segments) relying on the RTO timer can improve performance over simply using partial ACKs to trigger all retransmissions. [HK99] shows that using partial ACKs to trigger retransmissions, in conjunction with SACK, improves performance when compared to TCP using fast retransmit/fast recovery in a satellite environment. Finally, [FH99] describes several slightly different variants of NewReno. RFC 2581 [APS99]. section 2. section 188.8.131.52.5. FF96] describe a conservative extension to the fast recovery algorithm that takes into account information provided by selective acknowledgments (SACKs) [MMFR96] sent by the receiver. The algorithm starts after fast retransmit triggers the resending of a
segment. As with fast retransmit, the algorithm cuts cwnd in half when a loss is detected. The algorithm keeps a variable called "pipe", which is an estimate of the number of outstanding segments in the network. The pipe variable is decremented by 1 segment for each duplicate ACK that arrives with new SACK information. The pipe variable is incremented by 1 for each new or retransmitted segment sent. A segment may be sent when the value of pipe is less than cwnd (this segment is either a retransmission per the SACK information or a new segment if the SACK information indicates that no more retransmits are needed). This algorithm generally allows TCP to recover from multiple segment losses in a window of data within one RTT of loss detection. Like the forward acknowledgment (FACK) algorithm described below, the SACK information allows the pipe algorithm to decouple the choice of when to send a segment from the choice of what segment to send. [APS99] allows the use of this algorithm, as it is consistent with the spirit of the fast recovery algorithm. FF96] shows that the above described SACK algorithm performs better than several non-SACK based recovery algorithms when 1--4 segments are lost from a window of data. [AHKO97] shows that the algorithm improves performance over satellite links. Hayes [Hay97] shows the in certain circumstances, the SACK algorithm can hurt performance by generating a large line-rate burst of data at the end of loss recovery, which causes further loss. RFC 2581 [APS99]. section 2. section 184.108.40.206.5.
MM96a,MM96b] was developed to improve TCP congestion control during loss recovery. FACK uses TCP SACK options to glean additional information about the congestion state, adding more precise control to the injection of data into the network during recovery. FACK decouples the congestion control algorithms from the data recovery algorithms to provide a simple and direct way to use SACK to improve congestion control. Due to the separation of these two algorithms, new data may be sent during recovery to sustain TCP's self-clock when there is no further data to retransmit. The most recent version of FACK is Rate-Halving [MM96b], in which one packet is sent for every two ACKs received during recovery. Transmitting a segment for every-other ACK has the result of reducing the congestion window in one round trip to half of the number of packets that were successfully handled by the network (so when cwnd is too large by more than a factor of two it still gets reduced to half of what the network can sustain). Another important aspect of FACK with Rate-Halving is that it sustains the ACK self-clock during recovery because transmitting a packet for every-other ACK does not require half a cwnd of data to drain from the network before transmitting, as required by the fast recovery algorithm [Ste97,APS99]. In addition, the FACK with Rate-Halving implementation provides Thresholded Retransmission to each lost segment. "Tcprexmtthresh" is the number of duplicate ACKs required by TCP to trigger a fast retransmit and enter recovery. FACK applies thresholded retransmission to all segments by waiting until tcprexmtthresh SACK blocks indicate that a given segment is missing before resending the segment. This allows reasonable behavior on links that reorder segments. As described above, FACK sends a segment for every second ACK received during recovery. New segments are transmitted except when tcprexmtthresh SACK blocks have been observed for a dropped segment, at which point the dropped segment is retransmitted. [APS99] allows the use of this algorithm, as it is consistent with the spirit of the fast recovery algorithm. MM96a]. The algorithm was later enhanced to include Rate-Halving [MM96b]. The real-world performance of FACK with Rate-Halving was shown to be much closer to
the theoretical maximum for TCP than either TCP Reno or the SACK- based extensions to fast recovery outlined in section 220.127.116.11 [MSMO97]. RFC 2581 [APS99]. section 2. Since it is better able to sustain its self-clock than TCP Reno, it may be considerably more attractive over long delay paths. section 3.3.1, in that they attempt to recover from multiple lost segments without reverting to using the retransmission timer. As has been shown, the above SACK based algorithms are more robust than the NewReno algorithm. However, the SACK algorithm requires a cooperating TCP receiver, which the NewReno algorithm does not. A reasonable TCP implementation might include both a SACK-based and a NewReno-based loss recovery algorithm such that the sender can use the most appropriate loss recovery algorithm based on whether or not the receiver supports SACKs. Finally, both SACK-based and non-SACK-based versions of fast recovery have been shown to transmit a large burst of data upon leaving loss recovery, in some cases [Hay97]. Therefore, the algorithms may benefit from some burst suppression algorithm.
informing it of congestion. IP routers can accomplish this with an ICMP Source Quench message. The arrival of a BECN signal may or may not mean that a TCP data segment has been dropped, but it is a clear indication that the TCP sender should reduce its sending rate (i.e., the value of cwnd). The second major form of congestion notification is forward ECN (FECN). FECN routers mark data segments with a special tag when congestion is imminent, but forward the data segment. The data receiver then echos the congestion information back to the sender in the ACK packet. A description of a FECN mechanism for TCP/IP is given in [RF99]. As described in [RF99], senders transmit segments with an "ECN- Capable Transport" bit set in the IP header of each packet. If a router employing an active queueing strategy, such as Random Early Detection (RED) [FJ93,BCC+98], would otherwise drop this segment, an "Congestion Experienced" bit in the IP header is set instead. Upon reception, the information is echoed back to TCP senders using a bit in the TCP header. The TCP sender adjusts the congestion window just as it would if a segment was dropped. The implementation of ECN as specified in [RF99] requires the deployment of active queue management mechanisms in the affected routers. This allows the routers to signal congestion by sending TCP a small number of "congestion signals" (segment drops or ECN messages), rather than discarding a large number of segments, as can happen when TCP overwhelms a drop-tail router queue. Since satellite networks generally have higher bit-error rates than terrestrial networks, determining whether a segment was lost due to congestion or corruption may allow TCP to achieve better performance in high BER environments than currently possible (due to TCP's assumption that all loss is due to congestion). While not a solution to this problem, adding an ECN mechanism to TCP may be a part of a mechanism that will help achieve this goal. See section 3.3.4 for a more detailed discussion of differentiating between corruption and congestion based losses. Flo94] shows that ECN is effective in reducing the segment loss rate which yields better performance especially for short and interactive TCP connections. Furthermore, [Flo94] also shows that ECN avoids some unnecessary, and costly TCP retransmission timeouts. Finally, [Flo94] also considers some of the advantages and disadvantages of various forms of explicit congestion notification.
section 2 will present a bias towards or against ECN traffic. Jac88,Jac90] and defined in [Bra89,Ste97,APS99], is to assume that all loss is due to congestion and to trigger the congestion control algorithms, as defined in [Ste97,APS99]. The loss may be detected using the fast retransmit algorithm, or in the worst case is detected by the expiration of TCP's retransmission timer. TCP's assumption that loss is due to congestion rather than corruption is a conservative mechanism that prevents congestion collapse [Jac88,FF98]. Over satellite networks, however, as in many wireless environments, loss due to corruption is more common than on terrestrial networks. One common partial solution to this problem is
to add Forward Error Correction (FEC) to the data that's sent over the satellite/wireless link. A more complete discussion of the benefits of FEC can be found in [AGS99]. However, given that FEC does not always work or cannot be universally applied, other mechanisms have been studied to attempt to make TCP able to differentiate between congestion-based and corruption-based loss. TCP segments that have been corrupted are most often dropped by intervening routers when link-level checksum mechanisms detect that an incoming frame has errors. Occasionally, a TCP segment containing an error may survive without detection until it arrives at the TCP receiving host, at which point it will almost always either fail the IP header checksum or the TCP checksum and be discarded as in the link-level error case. Unfortunately, in either of these cases, it's not generally safe for the node detecting the corruption to return information about the corrupt packet to the TCP sender because the sending address itself might have been corrupted. SF98] show a performance improvement when TCP can properly differentiate between between corruption and congestion of wireless links. Perhaps the greatest research challenge in detecting corruption is getting TCP (a transport-layer protocol) to receive appropriate information from either the network layer (IP) or the link layer. Much of the work done to date has involved link-layer mechanisms that retransmit damaged segments. The challenge seems to be to get these mechanisms to make repairs in such a way that TCP understands what happened and can respond appropriately. BKVP97,BPSK96]. One of the problems with this promising technique is that it causes an effective reordering of the segments from the TCP receiver's point of view. As a simple example, if segments A B C D are sent across a noisy link and segment B is corrupted, segments C and D may have already crossed the link before B can be retransmitted at the link
level, causing them to arrive at the TCP receiver in the order A C D B. This segment reordering would cause the TCP receiver to generate duplicate ACKs upon the arrival of segments C and D. If the reordering was bad enough, the sender would trigger the fast retransmit algorithm in the TCP sender, in response to the duplicate ACKs. Research presented in [MV98] proposes the idea of suppressing or delaying the duplicate ACKs in the reverse direction to counteract this behavior. Alternatively, proposals that make TCP more robust in the face of re-ordered segment arrivals [Flo99] may reduce the side effects of the re-ordering caused by link-layer retransmissions. A more high-level approach, outlined in the [DMT96], uses a new "corruption experienced" ICMP error message generated by routers that detect corruption. These messages are sent in the forward direction, toward the packet's destination, rather than in the reverse direction as is done with ICMP Source Quench messages. Sending the error messages in the forward direction allows this feedback to work over asymmetric paths. As noted above, generating an error message in response to a damaged packet is problematic because the source and destination addresses may not be valid. The mechanism outlined in [DMT96] gets around this problem by having the routers maintain a small cache of recent packet destinations; when the router experiences an error rate above some threshold, it sends an ICMP corruption-experienced message to all of the destinations in its cache. Each TCP receiver then must return this information to its respective TCP sender (through a TCP option). Upon receiving an ACK with this "corruption-experienced" option, the TCP sender assumes that packet loss is due to corruption rather than congestion for two round trip times (RTT) or until it receives additional link state information (such as "link down", source quench, or additional "corruption experienced" messages). Note that in shared networks, ignoring segment loss for 2 RTTs may aggravate congestion by making TCP unresponsive. section 2. It would be particularly beneficial in the satellite/wireless environment over which these errors may be more prevalent.
Jac88,Ste97,APS99]. Several researchers have observed that this policy leads to unfair sharing of bandwidth when multiple connections with different RTTs traverse the same bottleneck link, with the long RTT connections obtaining only a small fraction of their fair share of the bandwidth. One effective solution to this problem is to deploy fair queueing and TCP-friendly buffer management in network routers [Sut98]. However, in the absence of help from the network, other researchers have investigated changes to the congestion avoidance policy at the TCP sender, as described in [Flo91,HK98]. Flo91,HK98]. It attempts to equalize the rate at which TCP senders increase their sending rate during congestion avoidance. Both [Flo91] and [HK98] illustrate cases in which the "Constant-Rate" policy largely corrects the bias against long RTT connections, although [HK98] presents some evidence that such a policy may be difficult to incrementally deploy in an operational network. The proper selection of a constant (for the constant rate of increase) is an open issue. The "Increase-by-K" policy can be selectively used by long RTT connections in a heterogeneous environment. This policy simply changes the slope of the linear increase, with connections over a given RTT threshold adding "K" segments to the congestion window every RTT, instead of one. [HK98] presents evidence that this policy, when used with small values of "K", may be successful in reducing the unfairness while keeping the link utilization high, when a small number of connections share a bottleneck link. The selection of the constant "K," the RTT threshold to invoke this policy, and performance under a large number of flows are all open issues.
RFC 2581 [APS99] and therefore should not be implemented in shared networks at this time. PADHV99], increasing the congestion window by multiple segments per RTT can cause TCP to drop multiple segments and force a retransmission timeout in some versions of TCP. Therefore, the above changes to the congestion avoidance algorithm may need to be accompanied by a SACK-based loss recovery algorithm that can quickly repair multiple dropped segments.
2. During the congestion avoidance phase, the transfer increases the effective cwnd by N segments per RTT, rather than the one segment per RTT increase that a single TCP connection provides. Again, this can aid the transfer by more rapidly increasing the effective cwnd to an appropriate point. However, this rate of increase can also be too aggressive for the network conditions. In this case, the use of multiple data connections can aggravate congestion in the network. 3. Using multiple connections can provide a very large overall congestion window. This can be an advantage for TCP implementations that do not support the TCP window scaling extension [JBB92]. However, the aggregate cwnd size across all N connections is equivalent to using a TCP implementation that supports large windows. 4. The overall cwnd decrease in the face of dropped segments is reduced when using N parallel connections. A single TCP connection reduces the effective size of cwnd to half when a single segment loss is detected. When utilizing N connections each using a window of W bytes, a single drop reduces the window to: (N * W) - (W / 2) Clearly this is a less dramatic reduction in the effective cwnd size than when using a single TCP connection. And, the amount by which the cwnd is decreased is further reduced by increasing N. The use of multiple data connections can increase the ability of non-SACK TCP implementations to quickly recover from multiple dropped segments without resorting to a timeout, assuming the dropped segments cross connections. The use of multiple parallel connections makes TCP overly aggressive for many environments and can contribute to congestive collapse in shared networks [FF99]. The advantages provided by using multiple TCP connections are now largely provided by TCP extensions (larger windows, SACKs, etc.). Therefore, the use of a single TCP connection is more "network friendly" than using multiple parallel connections. However, using multiple parallel TCP connections may provide performance improvement in private networks.
IL92,Hah94,AOK95,AKO96]. In addition, research has shown that multiple TCP connections can outperform a single modern TCP connection (with large windows and SACK) [AHKO97]. However, these studies did not consider the impact of using multiple TCP connections on competing traffic. [FF99] argues that using multiple simultaneous connections to transfer a given file may lead to congestive collapse in shared networks. FF99] using multiple parallel TCP connections is not safe (from a congestion control perspective) in shared networks and should not be used. FF99] outlines that the use of multiple parallel connections in a shared network, such as the Internet, may lead to congestive collapse. However, the use of multiple connections may be safe and beneficial in private networks. The specific topology being used will dictate the number of parallel connections required. Some work has been done to determine the appropriate number of connections on the fly [AKO96], but such a mechanism is far from complete. JBB92]. In addition, a larger initial congestion window is achieved, similar to using [AFP98] or TCB sharing (see section 3.8). JK92]),
performance is reduced in long-lived (but bursty) connections (such as HTTP/1.1, which uses persistent TCP connections to transfer multiple WWW page elements) [Hei97a]. Rate-based pacing (RBP) is a technique, used in the absence of incoming ACKs, where the data sender temporarily paces TCP segments at a given rate to restart the ACK clock. Upon receipt of the first ACK, pacing is discontinued and normal TCP ACK clocking resumes. The pacing rate may either be known from recent traffic estimates (when restarting an idle connection or from recent prior connections), or may be known through external means (perhaps in a point-to-point or point-to-multipoint satellite network where available bandwidth can be assumed to be large). In addition, pacing data during the first RTT of a transfer may allow TCP to make effective use of high bandwidth-delay links even for short transfers. However, in order to pace segments during the first RTT a TCP will have to be using a non-standard initial congestion window and a new mechanism to pace outgoing segments rather than send them back-to-back. Determining an appropriate size for the initial cwnd is an open research question. Pacing can also be used to reduce bursts in general (due to buggy TCPs or byte counting, see section 3.2.2 for a discussion on byte counting). VH97a]. In this environment, RBP substantially improves performance compared to slow-start-after-idle for intermittent senders, and it slightly improves performance over burst-full-cwnd-after-idle (because of drops) [VH98]. More recently, pacing has been suggested to eliminate burstiness in networks with ACK filtering [BPK97]. VH97b]. RBP requires an additional sender timer for pacing. The overhead of timer-driven data transfer is often considered too high for practical use. Preliminary experiments suggest that in RBP this overhead is minimal because RBP only requires this timer for one RTT of transmission [VH98]. RBP is expected to make TCP more conservative in sending bursts of data after an idle period in hosts that do not revert to slow start after an idle period. On the other hand, RBP makes TCP more aggressive if the sender uses the slow start algorithm to start the ACK clock after a long idle period.
Section 2. Use at the beginning of new connections would be restricted to topologies where available bandwidth can be estimated out-of-band. section 3.2.4) more difficult. DNP99,DENP97,Jac90] reduce the overhead of TCP sessions by replacing the data in the TCP and IP headers that remains constant, changes slowly, or changes in a predictable manner with a short "connection number". Using this method, the sender first sends a full TCP/IP header, including in it a connection number that the sender will use to reference the connection. The receiver stores the full header and uses it as a template, filling in some fields from the limited
information contained in later, compressed headers. This compression can reduce the size of an IPv4/TCP headers from 40 to as few as 3 to 5 bytes (3 bytes for some common cases, 5 bytes in general). Compression and decompression generally happen below the IP layer, at the end-points of a given physical link (such as at two routers connected by a serial line). The hosts on either side of the physical link must maintain some state about the TCP connections that are using the link. The decompresser must pass complete, uncompressed packets to the IP layer. Thus header compression is transparent to routing, for example, since an incoming packet with compressed headers is expanded before being passed to the IP layer. A variety of methods can be used by the compressor/decompressor to negotiate the use of header compression. For example, the PPP serial line protocol allows for an option exchange, during which time the compressor/decompressor agree on whether or not to use header compression. For older SLIP implementations, [Jac90] describes a mechanism that uses the first bit in the IP packet as a flag. The reduction in overhead is especially useful when the link is bandwidth-limited such as terrestrial wireless and mobile satellite links, where the overhead associated with transmitting the header bits is nontrivial. Header compression has the added advantage that for the case of uniformly distributed bit errors, compressing TCP/IP headers can provide a better quality of service by decreasing the packet error probability. The shorter, compressed packets are less likely to be corrupted, and the reduction in errors increases the connection's throughput. Extra space is saved by encoding changes in fields that change relatively slowly by sending only their difference from their values in the previous packet instead of their absolute values. In order to decode headers compressed this way, the receiver keeps a copy of each full, reconstructed TCP header after it is decoded, and applies the delta values from the next decoded compressed header to the reconstructed full header template. A disadvantage to using this delta encoding scheme where values are encoded as deltas from their values in the previous packet is that if a single compressed packet is lost, subsequent packets with compressed headers can become garbled if they contain fields which depend on the lost packet. Consider a forward data stream of packets with compressed headers and increasing sequence numbers. If packet N is lost, the full header of packet N+1 will be reconstructed at the receiver using packet N-1's full header as a template. Thus the
sequence number, which should have been calculated from packet N's header, will be wrong, the checksum will fail, and the packet will be discarded. When the sending TCP times out and retransmits a packet with a full header is forwarded to re-synchronize the decompresser. It is important to note that the compressor does not maintain any timers, nor does the decompresser know when an error occurred (only the receiving TCP knows this, when the TCP checksum fails). A single bit error will cause the decompresser to lose sync, and subsequent packets with compressed headers will be dropped by the receiving TCP, since they will all fail the TCP checksum. When this happens, no duplicate acknowledgments will be generated, and the decompresser can only re-synchronize when it receives a packet with an uncompressed header. This means that when header compression is being used, both fast retransmit and selective acknowledgments will not be able correct packets lost on a compressed link. The "twice" algorithm, described below, may be a partial solution to this problem. [DNP99] and [DENP97] describe TCP/IPv4 and TCP/IPv6 compression algorithms including compressing the various IPv6 extension headers as well as methods for compressing non-TCP streams. [DENP97] also augments TCP header compression by introducing the "twice" algorithm. If a particular packet fails to decompress properly, the twice algorithm modifies its assumptions about the inferred fields in the compressed header, assuming that a packet identical to the current one was dropped between the last correctly decoded packet and the current one. Twice then tries to decompress the received packet under the new assumptions and, if the checksum passes, the packet is passed to IP and the decompresser state has been re-synchronized. This procedure can be extended to three or more decoding attempts. Additional robustness can be achieved by caching full copies of packets which don't decompress properly in the hopes that later arrivals will fix the problem. Finally, the performance improvement if the decompresser can explicitly request a full header is discussed. Simulation results show that twice, in conjunction with the full header request mechanism, can improve throughput over uncompressed streams. Jac90] outlines a simple header compression scheme for TCP/IP. In [DENP97] the authors present the results of simulations showing that header compression is advantageous for both low and medium bandwidth links. Simulations show that the twice algorithm, combined with an explicit header request mechanism, improved throughput by 10-15% over uncompressed sessions across a wide range of bit error rates.
Much of this improvement may have been due to the twice algorithm quickly re-synchronizing the decompresser when a packet is lost. This is because the twice algorithm, applied one or two times when the decompresser becomes unsynchronized, will re-sync the decompresser in between 83% and 99% of the cases examined. This means that packets received correctly after twice has resynchronized the decompresser will cause duplicate acknowledgments. This re- enables the use of both fast retransmit and SACK in conjunction with header compression. DENP97] requires more extensive modifications to the sending and receiving ends of each link that employs header compression. Header compression does not violate TCP's congestion control mechanisms and therefore can be safely implemented in shared networks. section 2, but will provide relatively more improvement in situations where packet sizes are small (i.e., overhead is large) and there is medium to low bandwidth and/or higher BER. When TCP's congestion window size is large, implementing the explicit header request mechanism, the twice algorithm, and caching packets which fail to decompress properly becomes more critical.