Network Working Group N. Spring Request for Comments: 3540 D. Wetherall Category: Experimental D. Ely University of Washington June 2003 Robust Explicit Congestion Notification (ECN) Signaling with Nonces Status of this Memo This memo defines an Experimental Protocol for the Internet community. It does not specify an Internet standard of any kind. Discussion and suggestions for improvement are requested. Distribution of this memo is unlimited. Copyright Notice Copyright (C) The Internet Society (2003). All Rights Reserved.
AbstractThis note describes the Explicit Congestion Notification (ECN)-nonce, an optional addition to ECN that protects against accidental or malicious concealment of marked packets from the TCP sender. It improves the robustness of congestion control by preventing receivers from exploiting ECN to gain an unfair share of network bandwidth. The ECN-nonce uses the two ECN-Capable Transport (ECT)codepoints in the ECN field of the IP header, and requires a flag in the TCP header. It is computationally efficient for both routers and hosts. RFC3168] improving its robustness against malicious or accidental concealment of marked packets. It has not been deployed widely. One goal of publication as an Experimental RFC is to be prudent, and encourage use and deployment prior to publication in the standards track. Another consideration is to give time for firewall developers to recognize and accept the pattern presented by the nonce. It is the intent of the Transport Area Working Group to re-submit this specification as an IETF Proposed Standard in the future after more experience has been gained.
The correct operation of ECN requires the cooperation of the receiver to return Congestion Experienced signals to the sender, but the protocol lacks a mechanism to enforce this cooperation. This raises the possibility that an unscrupulous or poorly implemented receiver could always clear ECN-Echo and simply not return congestion signals to the sender. This would give the receiver a performance advantage at the expense of competing connections that behave properly. More generally, any device along the path (NAT box, firewall, QOS bandwidth shapers, and so forth) could remove congestion marks with impunity. The above behaviors may or may not constitute a threat to the operation of congestion control in the Internet. However, given the central role of congestion control, it is prudent to design the ECN signaling loop to be robust against as many threats as possible. In this way, ECN can provide a clear incentive for improvement over the prior state-of-the-art without potential incentives for abuse. The ECN-nonce is a simple, efficient mechanism to eliminate the potential abuse of ECN. The ECN-nonce enables the sender to verify the correct behavior of the ECN receiver and that there is no other interference that conceals marked (or dropped) packets in the signaling path. The ECN- nonce protects against both implementation errors and deliberate abuse. The ECN-nonce: - catches a misbehaving receiver with a high probability, and never implicates an innocent receiver. - does not change other aspects of ECN, nor does it reduce the benefits of ECN for behaving receivers. - is cheap in both per-packet overhead (one TCP header flag) and processing requirements. - is simple and, to the best of our knowledge, not prone to other attacks. We also note that use of the ECN-nonce has two additional benefits, even when only drop-tail routers are used. First, packet drops cannot be concealed from the sender. Second, it prevents optimistic acknowledgements [Savage], in which TCP segments are acknowledged before they have been received. These benefits also serve to increase the robustness of congestion control from attacks. We do not elaborate on these benefits in this document. The rest of this document describes the ECN-nonce. We present an overview followed by detailed behavior at senders and receivers.
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [RFC2119]. ECN] is assumed. For simplicity, we describe the ECN-nonce in one direction only, though it is run in both directions in parallel. The ECN protocol for TCP remains unchanged, except for the definition of a new field in the TCP header. As in [RFC3168], ECT(0) or ECT(1) (ECN-Capable Transport) is set in the ECN field of the IP header on outgoing packets. Congested routers change this field to CE (Congestion Experienced). When TCP receivers notice CE, the ECE (ECN-Echo) flag is set in subsequent acknowledgements until receiving a CWR flag. The CWR flag is sent on new data whenever the sender reacts to congestion. The ECN-nonce adds to this protocol, and enables the receiver to demonstrate to the sender that segments being acknowledged were received unmarked. A random one-bit value (a nonce) is encoded in the two ECT codepoints. The one-bit sum of these nonces is returned in a TCP header flag, the nonce sum (NS) bit. Packet marking erases the nonce value in the ECT codepoints because CE overwrites both ECN IP header bits. Since each nonce is required to calculate the sum, the correct nonce sum implies receipt of only unmarked packets. Not only are receivers prevented from concealing marked packets, middle- boxes along the network path cannot unmark a packet without successfully guessing the value of the original nonce. The sender can verify the nonce sum returned by the receiver to ensure that congestion indications in the form of marked (or dropped) packets are not being concealed. Because the nonce sum is only one bit long, senders have a 50-50 chance of catching a lying receiver whenever an acknowledgement conceals a mark. Because each acknowledgement is an independent trial, cheaters will be caught quickly if there are repeated congestion signals. The following paragraphs describe aspects of the ECN-nonce protocol in greater detail.
Each acknowledgement carries a nonce sum, which is the one bit sum (exclusive-or, or parity) of nonces over the byte range represented by the acknowledgement. The sum is used because not every packet is acknowledged individually, nor are packets acknowledged reliably. If a sum were not used, the nonce in an unmarked packet could be echoed to prove to the sender that the individual packet arrived unmarked. However, since these acks are not reliably delivered, the sender could not distinguish a lost ACK from one that was never sent in order to conceal a marked packet. The nonce sum prevents the receiver from concealing individual marked packets by not acknowledging them. Because the nonce and nonce sum are both one bit quantities, the sum is no easier to guess than the individual nonces. We show the nonce sum calculation below in Figure 1. Sender Receiver initial sum = 1 -- 1:4 ECT(0) --> NS = 1 + 0(1:4) = 1(:4) <- ACK 4, NS=1 --- -- 4:8 ECT(1) --> NS = 1(:4) + 1(4:8) = 0(:8) <- ACK 8, NS=0 --- -- 8:12 ECT(1) -> NS = 0(:8) + 1(8:12) = 1(:12) <- ACK 12, NS=1 -- -- 12:16 ECT(1) -> NS = 1(:12) + 1(12:16) = 0(:16) <- ACK 16, NS=0 -- Figure 1: The calculation of nonce sums at the receiver. After congestion has occurred and packets have been marked or lost, resynchronization of the sender and receiver nonce sums is needed. When packets are marked, the nonce is cleared, and the sum of the nonces at the receiver will no longer match the sum at the sender. Once nonces have been lost, the difference between sender and receiver nonce sums is constant until there is further loss. This means that it is possible to resynchronize the sender and receiver after congestion by having the sender set its nonce sum to that of the receiver. Because congestion indications do not need to be conveyed more frequently than once per round trip, the sender suspends checking while the CWR signal is being delivered and resets its nonce sum to the receiver's when new data is acknowledged. This has the benefit that the receiver is not explicitly involved in the re-synchronization process. The resynchronization process is shown in Figure 2 below. Note that the nonce sum returned in ACK 12 (NS=0) differs from that in the previous example (NS=1), and it continues to differ for ACK 16.
Sender Receiver initial sum = 1 -- 1:4 ECT(0) -> NS = 1 + 0(1:4) = 1(:4) <- ACK 4, NS=1 -- -- 4:8 ECT(1) -> CE -> NS = 1(:4) + ?(4:8) = 1(:8) <- ACK 8, ECE NS=1 -- -- 8:12 ECT(1), CWR -> NS = 1(:8) + 1(8:12) = 0(:12) <- ACK 12, NS=0 -- -- 12:16 ECT(1) -> NS = 0(:12) + 1(12:16) = 1(:16) <- ACK 16, NS=1 -- Figure 2: The calculation of nonce sums at the receiver when a packet (4:8) is marked. The receiver may calculate the wrong nonce sum when the original nonce information is lost after a packet is marked. Third, we need to reconcile that nonces are sent with packets but acknowledgements cover byte ranges. Acknowledged byte boundaries need not match the transmitted boundaries, and information can be retransmitted in packets with different byte boundaries. We discuss the first issue, how a receiver sets a nonce when acknowledging part of a segment, in Section 6.1. The second question, what nonce to send when retransmitting smaller segments as a large segment, has a simple answer: ECN is disabled for retransmissions, so can carry no nonce. Because retransmissions are associated with congestion events, nonce checking is suspended until after CWR is acknowledged and the congestion event is over. The next sections describe the detailed behavior of senders, routers and receivers, starting with sender transmit behavior, then around the ECN signaling loop, and finish with sender acknowledgement processing.
RFC3168]. By marking packets to signal congestion, the original value of the nonce, in ECT(0) or ECT(1), is removed. Neither the receiver nor any other party can unmark the packet without successfully guessing the value of the original nonce. RFC3168]. Returning the nonce sum is optional, but recommended, as senders are allowed to discontinue sending ECN-capable packets to receivers that do not support the ECN-nonce. As packets are removed from the queue of out-of-order packets to be acknowledged, the nonce is recovered from the IP header. The nonce is added to the current nonce sum as the acknowledgement sequence number is advanced for the recent packet. In the case of marked packets, one or more nonce values may be unknown to the receiver. In this case the missing nonce values are ignored when calculating the sum (or equivalently a value of zero is assumed) and ECN-Echo will be set to signal congestion to the sender. Returning the nonce sum corresponding to a given acknowledgement is straightforward. It is carried in a single "NS" (Nonce Sum) bit in the TCP header. This bit is adjacent to the CWR and ECN-Echo bits, set as Bit 7 in byte 13 of the TCP header, as shown below: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | | N | C | E | U | A | P | R | S | F | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | | | R | E | G | K | H | T | N | N | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ Figure 3: The new definition of bytes 13 and 14 of the TCP Header. The initial nonce sum is 1, and is included in the SYN/ACK and ACK of the three way TCP handshake. This allows the other endpoint to infer nonce support, but is not a negotiation, in that the receiver of the SYN/ACK need not check if NS is set to decide whether to set NS in the subsequent ACK.
RFC3168]. During this recovery process, the sum may be incorrect because one or more nonces were not received. This does not matter during recovery, because TCP invokes congestion mechanisms at most once per RTT, whether there are one or more losses during that period.
Sender Receiver initial sum = 1 -- 1:4 ECT(0) -> NS = 1 + 0(1:4) = 1(:4) <- ACK 4, NS=1 -- -- 4:8 ECT(1) -> LOST -- 8:12 ECT(1) -> nonce sum calculation deferred until in-order data received <- ACK 4, NS=0 -- -- 12:16 ECT(1) -> nonce sum calculation deferred <- ACK 4, NS=0 -- -- 4:8 retransmit -> NS = 1(:4) + ?(4:8) + 1(8:12) + 1(12:16) = 1(:16) <- ACK 16, NS=1 -- -- 16:20 ECT(1) CWR -> <- ACK 20, NS=0 -- NS = 1(:16) + 1(16:20) = 0(:20) Figure 4: The calculation of nonce sums at the receiver when a packet is lost, and resynchronization after loss. The nonce sum is not changed until the cumulative acknowledgement is advanced. In practice, resynchronization can be accomplished by storing a bit that has the value one if the expected nonce sum stored by the sender and the received nonce sum in the acknowledgement of CWR differ, and zero otherwise. This synchronization offset bit can then be used in the comparison between expected nonce sum and received nonce sum. The sender should ignore the nonce sum returned on any acknowledgements bearing the ECN-echo flag. When an acknowledgment covers only a portion of a segment, such as when a middlebox resegments at the TCP layer instead of fragmenting IP packets, the sender should accept the nonce sum expected at the next segment boundary. In other words, an acknowledgement covering part of an original segment will include the nonce sum expected when the entire segment is acknowledged. Finally, in ECN, senders can choose not to indicate ECN capability on some packets for any reason. An ECN-nonce sender must resynchronize after sending such ECN-incapable packets, as though a CWR had been sent with the first new data after the ECN-incapable packets. The sender loses protection for any unacknowledged packets until resynchronization occurs.
Eifel], a recently proposed mechanism for removing the retransmission ambiguity to improve TCP performance. A misbehaving receiver might claim to have received only original transmissions to convince the sender to undo congestion actions. Since retransmissions are sent without ECT, and thus no nonce, returning the correct nonce sum confirms that only original transmissions were received.
RFC3168, use of the Don't Fragment bit with ECN is recommended. Receivers that receive unmarked fragments can reconstruct the original nonce to conceal a marked fragment. The ECN-nonce cannot protect against misbehaving receivers that conceal marked fragments, so some protection is lost in situations where Path MTU discovery is disabled. When responding to a small path MTU, the sender will retransmit a smaller frame in place of a larger one. Since these smaller packets are retransmissions, they will be ECN-incapable and bear no nonce. The sender should resynchronize on the first newly transmitted packet. Schneier], or similar scheme that would allow an adversary who has seen several previous bits to infer the generation function and thus its future output.
Although the ECN-nonce protects against concealment of congestion signals and optimistic acknowledgement, it provides no additional protection for the integrity of the connection. RFC 2780. The IANA has added the following to the registry for "TCP Header Flags": RFC 3540 defines bit 7 from the Reserved field to be used for the Nonce Sum, as follows: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | | N | C | E | U | A | P | R | S | F | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | | | R | E | G | K | H | T | N | N | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ TCP Header Flags Bit Name Reference --- ---- --------- 7 NS (Nonce Sum) [RFC 3540]
[ECN] "The ECN Web Page", URL "http://www.icir.org/floyd/ecn.html". [RFC3168] Ramakrishnan, K., Floyd, S. and D. Black, "The addition of explicit congestion notification (ECN) to IP", RFC 3168, September 2001. [Eifel] R. Ludwig and R. Katz. The Eifel Algorithm: Making TCP Robust Against Spurious Retransmissions. Computer Communications Review, January, 2000. [B97] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [Savage] S. Savage, N. Cardwell, D. Wetherall, T. Anderson. TCP congestion control with a misbehaving receiver. SIGCOMM CCR, October 1999. [Schneier] Bruce Schneier. Applied Cryptography, 2nd ed., 1996
Acknowledgement Funding for the RFC Editor function is currently provided by the Internet Society.