Network Working Group B. Davie Request for Comments: 3246 A. Charny Obsoletes: 2598 Cisco Systems, Inc. Category: Standards Track J.C.R. Bennett Motorola K. Benson Tellabs J.Y. Le Boudec EPFL W. Courtney TRW S. Davari PMC-Sierra V. Firoiu Nortel Networks D. Stiliadis Lucent Technologies March 2002 An Expedited Forwarding PHB (Per-Hop Behavior) Status of this Memo This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited. Copyright Notice Copyright (C) The Internet Society (2001). All Rights Reserved.
AbstractThis document defines a PHB (per-hop behavior) called Expedited Forwarding (EF). The PHB is a basic building block in the Differentiated Services architecture. EF is intended to provide a building block for low delay, low jitter and low loss services by ensuring that the EF aggregate is served at a certain configured rate. This document obsoletes RFC 2598.
1 Introduction ........................................... 2 1.1 Relationship to RFC 2598 ............................... 3 2 Definition of EF PHB ................................... 3 2.1 Intuitive Description of EF ............................ 3 2.2 Formal Definition of the EF PHB ........................ 5 2.3 Figures of merit ....................................... 8 2.4 Delay and jitter ....................................... 8 2.5 Loss ................................................... 9 2.6 Microflow misordering .................................. 9 2.7 Recommended codepoint for this PHB ..................... 9 2.8 Mutability ............................................. 10 2.9 Tunneling .............................................. 10 2.10 Interaction with other PHBs ............................ 10 3 Security Considerations ................................ 10 4 IANA Considerations .................................... 11 5 Acknowledgments ........................................ 11 6 References ............................................. 11 Appendix: Implementation Examples .............................. 12 Authors' Addresses ............................................. 14 Full Copyright Statement ....................................... 16 3, 4]. This memo describes a particular PHB called expedited forwarding (EF). The intent of the EF PHB is to provide a building block for low loss, low delay, and low jitter services. The details of exactly how to build such services are outside the scope of this specification. The dominant causes of delay in packet networks are fixed propagation delays (e.g. those arising from speed-of-light delays) on wide area links and queuing delays in switches and routers. Since propagation delays are a fixed property of the topology, delay and jitter are minimized when queuing delays are minimized. In this context, jitter is defined as the variation between maximum and minimum delay. The intent of the EF PHB is to provide a PHB in which suitably marked packets usually encounter short or empty queues. Furthermore, if queues remain short relative to the buffer space available, packet loss is also kept to a minimum.
To ensure that queues encountered by EF packets are usually short, it is necessary to ensure that the service rate of EF packets on a given output interface exceeds their arrival rate at that interface over long and short time intervals, independent of the load of other (non-EF) traffic. This specification defines a PHB in which EF packets are guaranteed to receive service at or above a configured rate and provides a means to quantify the accuracy with which this service rate is delivered over any time interval. It also provides a means to quantify the maximum delay and jitter that a packet may experience under bounded operating conditions. Note that the EF PHB only defines the behavior of a single node. The specification of behavior of a collection of nodes is outside the scope of this document. A Per-Domain Behavior (PDB) specification  may provide such information. When a DS-compliant node claims to implement the EF PHB, the implementation MUST conform to the specification given in this document. However, the EF PHB is not a mandatory part of the Differentiated Services architecture - a node is NOT REQUIRED to implement the EF PHB in order to be considered DS-compliant. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 . RFC 2598 . The main difference is that it adds mathematical formalism to give a more rigorous definition of the behavior described. The full rationale for this is given in .
- EF traffic clearly cannot be served at rate R if there are no EF packets waiting to be served, but it may be impossible to determine externally whether EF packets are actually waiting to be served by the output scheduler. For example, if an EF packet has entered the router and not exited, it may be awaiting service, or it may simply have encountered some processing or transmission delay within the router. The formal definition below takes account of these issues. It assumes that EF packets should ideally be served at rate R or faster, and bounds the deviation of the actual departure time of each packet from the "ideal" departure time of that packet. We define the departure time of a packet as the time when the last bit of that packet leaves the node. The "ideal" departure time of each EF packet is computed iteratively. In the case when an EF packet arrives at a device when all the previous EF packets have already departed, the computation of the ideal departure time is simple. Service of the packet should (ideally) start as soon as it arrives, so the ideal departure time is simply the arrival time plus the ideal time to transmit the packet at rate R. For a packet of length L_j, that transmission time at the configured rate R is L_j/R. (Of course, a real packet will typically get transmitted at line rate once its transmission actually starts, but we are calculating the ideal target behavior here; the ideal service takes place at rate R.) In the case when an EF packet arrives at a device that still contains EF packets awaiting service, the computation of the ideal departure time is more complicated. There are two cases to be considered. If the previous (j-1-th) departure occurred after its own ideal departure time, then the scheduler is running "late". In this case, the ideal time to start service of the new packet is the ideal departure time of the previous (j-1-th) packet, or the arrival time of the new packet, whichever is later, because we cannot expect a packet to begin service before it arrives. If the previous (j-1-th) departure occurred before its own ideal departure time, then the scheduler is running "early". In this case, service of the new packet should begin at the actual departure time of the previous packet. Once we know the time at which service of the j-th packet should (ideally) begin, then the ideal departure time of the j-th packet is L_j/R seconds later. Thus we are able to express the ideal departure time of the j-th packet in terms of the arrival time of the j-th packet, the actual departure time of the j-1-th packet, and the ideal departure time of the j-1-th packet. Equations eq_1 and eq_2 in Section 2.2 capture this relationship.
Whereas the original EF definition did not provide any means to guarantee the delay of an individual EF packet, this property may be desired. For this reason, the equations in Section 2.2 consist of two parts: an "aggregate behavior" set and a "packet-identity-aware" set of equations. The aggregate behavior equations (eq_1 and eq_2) simply describe the properties of the service delivered to the EF aggregate by the device. The "packet-identity-aware" equations (eq_3 and eq_4) enable the bound on delay of an individual packet to be calculated given a knowledge of the operating conditions of the device. The significance of these two sets of equations is discussed further in Section 2.2. Note that these two sets of equations provide two ways of characterizing the behavior of a single device, not two different modes of behavior.
- E_a is the error term for the treatment of the EF aggregate. Note that E_a represents the worst case deviation between the actual departure time of an EF packet and the ideal departure time of the same packet, i.e. E_a provides an upper bound on (d_j - f_j) for all j. - d_0 and f_0 do not refer to a real packet departure but are used purely for the purposes of the recursion. The time origin should be chosen such that no EF packets are in the system at time 0. - for the definitions of a_j and d_j, the "last bit" of the packet includes the layer 2 trailer if present, because a packet cannot generally be considered available for forwarding until such a trailer has been received. An EF-compliant node MUST be able to be characterized by the range of possible R values that it can support on each of its interfaces while conforming to these equations, and the value of E_a that can be met on each interface. R may be line rate or less. E_a MAY be specified as a worst-case value for all possible R values or MAY be expressed as a function of R. Note also that, since a node may have multiple inputs and complex internal scheduling, the j-th EF packet to arrive at the node destined for a certain interface may not be the j-th EF packet to depart from that interface. It is in this sense that eq_1 and eq_2 are unaware of packet identity. In addition, a node that supports EF on an interface I at some configured rate R MUST satisfy the following equations: D_j <= F_j + E_p for all j > 0 (eq_3) where F_j is defined iteratively by F_0 = 0, D_0 = 0 F_j = max(A_j, min(D_j-1, F_j-1)) + L_j/R, for all j > 0 (eq_4) In this definition: - D_j is the actual departure time of the individual EF packet that arrived at the node destined for interface I at time A_j, i.e., given a packet which was the j-th EF packet destined for I to arrive at the node via any input, D_j is the time at which the last bit of that individual packet actually leaves the node from the interface I.
- F_j is the target departure time for the individual EF packet that arrived at the node destined for interface I at time A_j. - A_j is the time that the last bit of the j-th EF packet destined to the output I to arrive actually arrives at the node. - L_j is the size (bits) of the j-th EF packet to arrive at the node that is destined to output I. L_j is measured on the IP datagram (IP header plus payload) and does not include any lower layer (e.g. MAC layer) overhead. - R is the EF configured rate at output I (in bits/second). - E_p is the error term for the treatment of individual EF packets. Note that E_p represents the worst case deviation between the actual departure time of an EF packet and the ideal departure time of the same packet, i.e. E_p provides an upper bound on (D_j - F_j) for all j. - D_0 and F_0 do not refer to a real packet departure but are used purely for the purposes of the recursion. The time origin should be chosen such that no EF packets are in the system at time 0. - for the definitions of A_j and D_j, the "last bit" of the packet includes the layer 2 trailer if present, because a packet cannot generally be considered available for forwarding until such a trailer has been received. It is the fact that D_j and F_j refer to departure times for the j-th packet to arrive that makes eq_3 and eq_4 aware of packet identity. This is the critical distinction between the last two equations and the first two. An EF-compliant node SHOULD be able to be characterized by the range of possible R values that it can support on each of its interfaces while conforming to these equations, and the value of E_p that can be met on each interface. E_p MAY be specified as a worst-case value for all possible R values or MAY be expressed as a function of R. An E_p value of "undefined" MAY be specified. For discussion of situations in which E_p may be undefined see the Appendix and . For the purposes of testing conformance to these equations, it may be necessary to deal with packet arrivals on different interfaces that are closely spaced in time. If two or more EF packets destined for the same output interface arrive (on different inputs) at almost the
Since the minimum delay through the device is clearly at least zero, D also provides a bound on jitter. To provide a tighter bound on jitter, the value of E_p for a device MAY be specified as two separate components such that E_p = E_fixed + E_variable where E_fixed represents the minimum delay that can be experienced by an EF packet through the node. Section 2.2.
5]. Section 2.2 MUST hold for a node independent of the amount of non-EF traffic offered to it. If the EF PHB is implemented by a mechanism that allows unlimited preemption of other traffic (e.g., a priority queue), the implementation MUST include some means to limit the damage EF traffic could inflict on other traffic (e.g., a token bucket rate limiter). Traffic that exceeds this limit MUST be discarded. This maximum EF rate, and burst size if appropriate, MUST be settable by a network administrator (using whatever mechanism the node supports for non- volatile configuration). 4]. It protects against denial-of-service and theft-of-service attacks which exploit DSCPs that are not identified in any Traffic Conditioning Specification provisioned at an ingress interface, but which map to EF inside the DS-domain.
3].  Jacobson, V., Nichols, K. and K. Poduri, "An Expedited Forwarding PHB", RFC 2598, June 1999.  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.  Nichols, K., Blake, S., Baker, F. and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.  Black, D., Blake, S., Carlson, M., Davies, E., Wang, Z. and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, December 1998.  Black, D., "Differentiated Services and Tunnels", RFC 2983, October 2000.  Charny, A., Baker, F., Davie, B., Bennett, J.C.R., Benson, K., Le Boudec, J.Y., Chiu, A., Courtney, W., Davari, S., Firoiu, V., Kalmanek, C., Ramakrishnan, K.K. and D. Stiliadis, "Supplemental Information for the New Definition of the EF PHB (Expedited Forwarding Per-Hop Behavior)", RFC 3247, March 2002.  Nichols K. and B. Carpenter, "Definition of Differentiated Services Per Domain Behaviors and Rules for their Specification", RFC 3086, April 2001.
6], and include both output schedulers and the internal design of a device. A priority queue is widely considered as the canonical example of an implementation of EF. A "perfect" output buffered device (i.e. one which delivers packets immediately to the appropriate output queue) with a priority queue for EF traffic will provide both a low E_a and a low E_p. We note that the main factor influencing E_a will be the inability to pre-empt an MTU-sized non-EF packet that has just begun transmission at the time when an EF packet arrives at the output interface, plus any additional delay that might be caused by non- pre-emptable queues between the priority queue and the physical interface. E_p will be influenced primarily by the number of interfaces. Another example of an implementation of EF is a weighted round robin scheduler. Such an implementation will typically not be able to support values of R as high as the link speeds, because the maximum rate at which EF traffic can be served in the presence of competing traffic will be affected by the number of other queues and the weights given to them. Furthermore, such an implementation is likely to have a value of E_a that is higher than a priority queue implementation, all else being equal, as a result of the time spent serving non-EF queues by the round robin scheduler. Finally, it is possible to implement hierarchical scheduling algorithms, such that some non-FIFO scheduling algorithm is run on sub-flows within the EF aggregate, while the EF aggregate as a whole could be served at high priority or with a large weight by the top- level scheduler. Such an algorithm might perform per-input scheduling or per-microflow scheduling within the EF aggregate, for example. Because such algorithms lead to non-FIFO service within the EF aggregate, the value of E_p for such algorithms may be higher than for other implementations. For some schedulers of this type it may be difficult to provide a meaningful bound on E_p that would hold for any pattern of traffic arrival, and thus a value of "undefined" may be most appropriate.
It should be noted that it is quite acceptable for a Diffserv domain to provide multiple instances of EF. Each instance should be characterizable by the equations in Section 2.2 of this specification. The effect of having multiple instances of EF on the E_a and E_p values of each instance will depend considerably on how the multiple instances are implemented. For example, in a multi- level priority scheduler, an instance of EF that is not at the highest priority may experience relatively long periods when it receives no service while higher priority instances of EF are served. This would result in relatively large values of E_a and E_p. By contrast, in a WFQ-like scheduler, each instance of EF would be represented by a queue served at some configured rate and the values of E_a and E_p could be similar to those for a single EF instance.
Shahram Davari PMC-Sierra Inc 411 Legget Drive Ottawa, ON K2K 3C9, Canada EMail: email@example.com Victor Firoiu Nortel Networks 600 Tech Park Billerica, MA 01821 EMail: firstname.lastname@example.org Dimitrios Stiliadis Lucent Technologies 101 Crawfords Corner Road Holmdel, NJ 07733 EMail: email@example.com
Full Copyright Statement Copyright (C) The Internet Society (2001). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Acknowledgement Funding for the RFC Editor function is currently provided by the Internet Society.