A survey of implementations (Section 4.2.4) found no evidence that byte-mode packet drop had been implemented, so deployment will be sparse at best. A migration strategy is hardly needed to remove an algorithm that may not even be deployed. A programme of experimental updates that take packet size into account in transport congestion control protocols has already started with TFRC-SP [RFC4828].
This problem (discussed further in Appendix B.2) is not pressing, because most Internet resources are designed to be bit-congestible before packet processing starts to congest (see Section 1.1). The IRTF's Internet Congestion Control Research Group (ICCRG) has set itself the task of reaching consensus on generic forwarding mechanisms that are necessary and sufficient to support the Internet's future congestion control requirements (the first challenge in [RFC6077]). The research question of whether packet congestion might become common, and what to do if it does, may be explored in the IRTF in future (see "Challenge 3: Packet Size" in [RFC6077]).

Note that sometimes a resource seems to be congested by neither bits nor packets, e.g., where the queue for access to a wireless medium is in units of transmission opportunities. However, the root cause of such congestion is overload of the underlying spectrum with bits (see Section 4.1.2).
Appendix C explains why the ability of networks to police the response of _any_ transport to congestion depends on bit-congestible network resources only doing packet-mode drop, not byte-mode drop. In summary, it says that making drop probability depend on the size of the packets that bits happen to be divided into simply encourages the bits to be divided into smaller packets. Byte-mode drop would therefore irreversibly complicate any attempt to fix the Internet's incentive structures.

The recommendations of Section 2 of this memo are different in each case:

o  When network equipment measures the length of a queue, if it is
   not feasible to use time, it is recommended to count in bytes if
   the network resource is congested by bytes, or to count in
   packets if it is congested by packets.

o  When network equipment decides whether to drop (or mark) a
   packet, it is recommended that the size of the particular packet
   should not be taken into account.

o  However, when a transport algorithm responds to a dropped or
   marked packet, the size of the rate reduction should be
   proportionate to the size of the packet.

In summary, the answers are 'it depends', 'no', and 'yes', respectively.

For the specific case of RED, this means that byte-mode queue measurement will often be appropriate, but the use of byte-mode drop is very strongly discouraged.

At the transport layer, the IETF should continue updating congestion control protocols to take into account the size of each packet that indicates congestion. Also, the IETF should continue to make protocols less sensitive to losing control packets like SYNs, pure ACKs, and DNS exchanges. Although many control packets happen to be small, the alternative of network equipment favouring all small packets would be dangerous, because it would create perverse incentives to split data transfers into smaller packets.

The memo develops these recommendations from principled arguments concerning scaling, layering, incentives, inherent efficiency, security, and 'policeability'.
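The first two recommendations above can be sketched in a few lines of code. The following Python fragment is purely illustrative and not part of this memo: the class name, thresholds, and linear drop ramp are simplified assumptions in the style of RED, not a faithful RED implementation. It shows a bit-congestible queue measured in bytes whose drop decision never looks at the size of the arriving packet (packet-mode drop).

```python
import random

class PacketModeDropQueue:
    """Illustrative sketch of an AQM for a bit-congestible link.

    Recommendation 1: the queue is measured in bytes, because this
    resource is congested by bytes.
    Recommendation 2: the drop decision ignores the size of the
    particular arriving packet (packet-mode drop, not byte-mode drop).
    """

    def __init__(self, min_bytes=30000, max_bytes=90000, max_p=0.02):
        self.backlog_bytes = 0        # queue length counted in bytes
        self.min_bytes = min_bytes    # below this backlog: never drop
        self.max_bytes = max_bytes    # at or above this: always drop
        self.max_p = max_p            # drop probability just below max_bytes

    def drop_probability(self):
        # Depends only on the byte backlog, never on the new packet's size.
        if self.backlog_bytes < self.min_bytes:
            return 0.0
        if self.backlog_bytes >= self.max_bytes:
            return 1.0
        frac = ((self.backlog_bytes - self.min_bytes)
                / (self.max_bytes - self.min_bytes))
        return frac * self.max_p

    def enqueue(self, packet_size_bytes):
        """Return True if the packet is queued, False if dropped/marked."""
        if random.random() < self.drop_probability():
            return False
        self.backlog_bytes += packet_size_bytes
        return True
```

The third recommendation belongs at the transport, not in this queue: on each loss or mark, the sender would scale its rate reduction by the size of the lost packet, as TFRC-SP [RFC4828] does.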
It also addresses practical issues such as specific buffer architectures and incremental deployment. Indeed, a limited survey of RED implementations is discussed, which shows that there appears to be little, if any, installed base of RED's byte-mode drop. Therefore, it can be deprecated with few, if any, incremental deployment complications.

The recommendations have been developed on the well-founded basis that most Internet resources are bit-congestible, not packet-congestible. We need to know the likelihood that this assumption will prevail in the longer term and, if it might not, what protocol changes would be needed to cater for a mix of the two. The IRTF Internet Congestion Control Research Group (ICCRG) is currently working on these problems [RFC6077].

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
           S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
           Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
           S., Wroclawski, J., and L. Zhang, "Recommendations on
           Queue Management and Congestion Avoidance in the
           Internet", RFC 2309, April 1998.
[RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41,
           RFC 2914, September 2000.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
           of Explicit Congestion Notification (ECN) to IP",
           RFC 3168, September 2001.

[BLUE02]   Feng, W-c., Shin, K., Kandlur, D., and D. Saha, "The BLUE
           active queue management algorithms", IEEE/ACM Transactions
           on Networking 10(4) 513-528, August 2002,
           <http://dx.doi.org/10.1109/TNET.2002.801399>.

[CCvarPktSize]
           Widmer, J., Boutremans, C., and J-Y. Le Boudec, "End-to-
           end congestion control for TCP-friendly flows with
           variable packet size", ACM CCR 34(2) 137-151, April 2004,
           <http://doi.acm.org/10.1145/997150.997162>.

[CHOKe_Var_Pkt]
           Psounis, K., Pan, R., and B. Prabhaker, "Approximate Fair
           Dropping for Variable-Length Packets", IEEE Micro
           21(1):48-56, January-February 2001,
           <http://ieeexplore.ieee.org/xpl/
           articleDetails.jsp?arnumber=903061>.

[CoDel]    Nichols, K. and V. Jacobson, "Controlled Delay Active
           Queue Management", Work in Progress, February 2013.

[DRQ]      Shin, M., Chong, S., and I. Rhee, "Dual-Resource TCP/AQM
           for Processing-Constrained Networks", IEEE/ACM
           Transactions on Networking Vol 16, issue 2, April 2008,
           <http://dx.doi.org/10.1109/TNET.2007.900415>.

[DupTCP]   Wischik, D., "Short messages", Philosophical Transactions
           of the Royal Society A 366(1872):1941-1953, June 2008,
           <http://rsta.royalsocietypublishing.org/content/366/1872/
           1941.full.pdf+html>.

[ECNFixedWireless]
           Siris, V., "Resource Control for Elastic Traffic in CDMA
           Networks", Proc. ACM MOBICOM'02, September 2002,
           <http://www.ics.forth.gr/netlab/publications/
           resource_control_elastic_cdma.html>.
[Evol_cc]  Gibbens, R. and F. Kelly, "Resource pricing and the
           evolution of congestion control", Automatica 35(12)
           1969-1985, December 1999,
           <http://www.sciencedirect.com/science/article/pii/
           S0005109899001351>.

[GentleAggro]
           Flach, T., Dukkipati, N., Terzis, A., Raghavan, B.,
           Cardwell, N., Cheng, Y., Jain, A., Hao, S., Katz-Bassett,
           E., and R. Govindan, "Reducing web latency: the virtue of
           gentle aggression", ACM SIGCOMM CCR 43(4) 159-170,
           August 2013, <http://doi.acm.org/10.1145/2486001.2486014>.

[IOSArch]  Bollapragada, V., White, R., and C. Murphy, "Inside Cisco
           IOS Software Architecture", Cisco Press: CCIE Professional
           Development, ISBN13: 978-1-57870-181-0, July 2000.

[PIE]      Pan, R., Natarajan, P., Piglione, C., Prabhu, M.,
           Subramanian, V., Baker, F., and B. Steeg, "PIE: A
           Lightweight Control Scheme To Address the Bufferbloat
           Problem", Work in Progress, February 2014.

[PktSizeEquCC]
           Vasallo, P., "Variable Packet Size Equation-Based
           Congestion Control", ICSI Technical Report tr-00-008,
           2000, <http://http.icsi.berkeley.edu/ftp/global/pub/
           techreports/2000/tr-00-008.pdf>.

[RED93]    Floyd, S. and V. Jacobson, "Random Early Detection (RED)
           gateways for Congestion Avoidance", IEEE/ACM Transactions
           on Networking 1(4) 397-413, August 1993,
           <http://ieeexplore.ieee.org/xpls/
           abs_all.jsp?arnumber=251892>.

[REDbias]  Eddy, W. and M. Allman, "A Comparison of RED's Byte and
           Packet Modes", Computer Networks 42(3) 261-280, June 2003,
           <http://www.ir.bbn.com/documents/articles/redbias.ps>.

[REDbyte]  De Cnodder, S., Elloumi, O., and K. Pauwels, "Effect of
           different packet sizes on RED performance", Proc. 5th IEEE
           Symposium on Computers and Communications (ISCC) 793-799,
           July 2000, <http://ieeexplore.ieee.org/xpls/
           abs_all.jsp?arnumber=860741>.
[RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
           "Definition of the Differentiated Services Field (DS
           Field) in the IPv4 and IPv6 Headers", RFC 2474,
           December 1998.

[RFC3426]  Floyd, S., "General Architectural and Policy
           Considerations", RFC 3426, November 2002.

[RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
           Jacobson, "RTP: A Transport Protocol for Real-Time
           Applications", STD 64, RFC 3550, July 2003.

[RFC3714]  Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion
           Control for Voice Traffic in the Internet", RFC 3714,
           March 2004.

[RFC4828]  Floyd, S. and E. Kohler, "TCP Friendly Rate Control
           (TFRC): The Small-Packet (SP) Variant", RFC 4828,
           April 2007.

[RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
           Friendly Rate Control (TFRC): Protocol Specification",
           RFC 5348, September 2008.

[RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K.
           Ramakrishnan, "Adding Explicit Congestion Notification
           (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
           June 2009.

[RFC5670]  Eardley, P., "Metering and Marking Behaviour of PCN-
           Nodes", RFC 5670, November 2009.

[RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
           Control", RFC 5681, September 2009.

[RFC5690]  Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
           Acknowledgement Congestion Control to TCP", RFC 5690,
           February 2010.

[RFC6077]  Papadimitriou, D., Welzl, M., Scharf, M., and B. Briscoe,
           "Open Research Issues in Internet Congestion Control",
           RFC 6077, February 2011.

[RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
           and K. Carlberg, "Explicit Congestion Notification (ECN)
           for RTP over UDP", RFC 6679, August 2012.
[RFC6789]  Briscoe, B., Woundy, R., and A. Cooper, "Congestion
           Exposure (ConEx) Concepts and Use Cases", RFC 6789,
           December 2012.

[Rate_fair_Dis]
           Briscoe, B., "Flow Rate Fairness: Dismantling a Religion",
           ACM CCR 37(2) 63-74, April 2007,
           <http://portal.acm.org/citation.cfm?id=1232926>.

[gentle_RED]
           Floyd, S., "Recommendation on using the "gentle_" variant
           of RED", Web page, March 2000,
           <http://www.icir.org/floyd/red/gentle.html>.

[pBox]     Floyd, S. and K. Fall, "Promoting the Use of End-to-End
           Congestion Control", IEEE/ACM Transactions on Networking
           7(4) 458-472, August 1999, <http://ieeexplore.ieee.org/
           xpls/abs_all.jsp?arnumber=793002>.

[pktByteEmail]
           Floyd, S., "RED: Discussions of Byte and Packet Modes",
           email, March 1997,
           <http://ee.lbl.gov/floyd/REDaveraging.txt>.
The results of the survey are summarised in Table 3. About 19% of those surveyed replied, giving a sample size of 16. Although in most cases we do not have permission to identify the respondents, we can say that those who responded include most of the larger equipment vendors, covering a large fraction of the market. The two who gave permission to be identified were Cisco and Alcatel-Lucent. The others range across the large network equipment vendors at L3 & L2, firewall vendors, and wireless equipment vendors, as well as large software businesses with a small selection of networking products. All those who responded confirmed that they have not implemented the variant of RED with drop dependent on packet size (2 were fairly sure they had not but needed to check more thoroughly). At the time the survey was conducted, Linux did not implement RED with packet-size bias of drop, although we have not investigated a wider range of open source code.

   +-------------------------------+----------------+--------------+
   | Response                      | No. of vendors | % of vendors |
   +-------------------------------+----------------+--------------+
   | Not implemented               |             14 |          17% |
   | Not implemented (probably)    |              2 |           2% |
   | Implemented                   |              0 |           0% |
   | No response                   |             68 |          81% |
   | Total companies/orgs surveyed |             84 |         100% |
   +-------------------------------+----------------+--------------+

     Table 3: Vendor Survey on byte-mode drop variant of RED (lower
                  drop probability for small packets)

Where reasons were given for why the byte-mode drop variant had not been implemented, the extra complexity of packet-bias code was the most prevalent, though one vendor had a more principled reason for avoiding it, similar to the argument of this document.

Our survey was of vendor implementations, so we cannot be certain about operator deployment. But we believe many queues in the Internet are still tail drop.
The company of one of the co-authors (BT) has widely deployed RED; however, many tail-drop queues are bound to still exist, particularly in access network equipment and on middleboxes like firewalls, where RED is not always available.
Routers using a memory architecture based on fixed-size buffers with borrowing may also still be prevalent in the Internet. As explained in Section 4.2.1, these also provide a marginal (but legitimate) bias towards small packets. So even though RED byte-mode drop is not prevalent, it is likely that there is still some bias towards small packets in the Internet due to tail drop and fixed-buffer borrowing.

The four possible scenarios are summarised in Table 4. Checking the two scenarios in each of the horizontal and vertical dimensions tests the extremes of sensitivity to packet size in the transport and in the network, respectively. Note that this section does not consider byte-mode drop at all. Having deprecated byte-mode drop, the goal here is to check that packet-mode drop will be sufficient in all cases.

 +-------------------------------+-----------------+-----------------+
 |                  Transport -> | a) Independent  | b) Dependent on |
 | ----------------------------- | of packet size  | packet size of  |
 | Network                       | of congestion   | congestion      |
 |                               | notifications   | notifications   |
 +-------------------------------+-----------------+-----------------+
 | 1) Predominantly bit-         |  Scenario a1)   |  Scenario b1)   |
 |    congestible network        |                 |                 |
 | 2) Mix of bit-congestible and |  Scenario a2)   |  Scenario b2)   |
 |    pkt-congestible network    |                 |                 |
 +-------------------------------+-----------------+-----------------+

               Table 4: Four Possible Congestion Scenarios

Appendix B.1 focuses on the horizontal dimension of Table 4, checking that packet-mode drop (or marking) gives sufficient information, whether or not the transport uses it -- scenarios b) and a), respectively.

Appendix B.2 focuses on the vertical dimension of Table 4, checking that packet-mode drop gives sufficient information to the transport whether resources in the network are bit-congestible or packet-congestible (these terms are defined in Section 1.1).
Notation: To be concrete, we will compare two flows with different packet sizes, s_1 and s_2. As an example, we will take s_1 = 60 B = 480 b and s_2 = 1,500 B = 12,000 b. A flow's bit rate, x [bps], is related to its packet rate, u [pps], by x(t) = s*u(t). In the bit-congestible case, path congestion will be denoted by p_b, and in the packet-congestible case by p_p. When either case is implied, the letter p alone will denote path congestion.

RFC 5681 treats a congestion notification on any packet, whatever its size, as one event. However, a network with just the packet-mode drop algorithm gives more information if the transport chooses to use it. We will use Table 5 to illustrate this.

We will set aside the last column until later. The columns labelled 'Flow 1' and 'Flow 2' compare two flows consisting of 60 B and 1,500 B packets, respectively. The body of the table considers two separate cases, one where the flows have equal bit rates and the other where they have equal packet rates. In both cases, the two flows fill a 96 Mbps link. Therefore, in the equal bit rate case, each flow has half the bit rate (48 Mbps). In the equal packet rate case, Flow 1 uses packets 25 times smaller, so it gets 25 times less bit rate: only 1/(1+25) of the link capacity (96 Mbps / 26 = 4 Mbps after rounding), while Flow 2 gets 25 times more bit rate (92 Mbps) because its packets are 25 times larger. The packet rate shown for each flow can easily be derived from the bit rate by dividing by the packet size, as shown in the column labelled 'Formula'.
   Parameter               Formula       Flow 1   Flow 2   Combined
   ----------------------- -----------  -------- -------- ---------
   Packet size             s/8              60 B  1,500 B     (Mix)
   Packet size             s               480 b 12,000 b     (Mix)
   Pkt loss probability    p                0.1%     0.1%      0.1%

   EQUAL BIT RATE CASE
   Bit rate                x             48 Mbps  48 Mbps   96 Mbps
   Packet rate             u = x/s      100 kpps   4 kpps  104 kpps
   Absolute pkt-loss rate  p*u           100 pps    4 pps   104 pps
   Absolute bit-loss rate  p*u*s         48 kbps  48 kbps   96 kbps
   Ratio of lost/sent pkts p*u/u            0.1%     0.1%      0.1%
   Ratio of lost/sent bits p*u*s/(u*s)      0.1%     0.1%      0.1%

   EQUAL PACKET RATE CASE
   Bit rate                x              4 Mbps  92 Mbps   96 Mbps
   Packet rate             u = x/s        8 kpps   8 kpps   15 kpps
   Absolute pkt-loss rate  p*u             8 pps    8 pps    15 pps
   Absolute bit-loss rate  p*u*s          4 kbps  92 kbps   96 kbps
   Ratio of lost/sent pkts p*u/u            0.1%     0.1%      0.1%
   Ratio of lost/sent bits p*u*s/(u*s)      0.1%     0.1%      0.1%

    Table 5: Absolute Loss Rates and Loss Ratios for Flows of Small
                 and Large Packets and Both Combined

So far, we have merely set up the scenarios. We now consider congestion notification in these scenarios. Two TCP flows with the same round-trip time aim to equalise their packet-loss rates over time; that is, the number of packets lost in a second, which is the packets per second (u) multiplied by the probability that each one is dropped (p). Thus, TCP converges on the case labelled 'Equal packet rate' in the table, where both flows aim for the same absolute packet-loss rate (both 8 pps in the table).

Packet-mode drop actually gives flows sufficient information to measure their loss rate in bits per second, if they choose, not just in packets per second. Each flow can count the size of a lost or marked packet and scale its rate response in proportion (as TFRC-SP does). The result is shown in the row entitled 'Absolute bit-loss rate', where the bits lost in a second is the packets per second (u) multiplied by the probability of losing a packet (p) multiplied by the packet size (s).
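The arithmetic behind Table 5 can be checked with a short script. This is an illustrative sketch, not part of the memo; the variable names and the `loss_rates` helper are our own, and the inputs are just the worked example above (96 Mbps link, p = 0.1%, 60 B and 1,500 B packets).

```python
# Recompute the worked example behind Table 5 (illustrative sketch).
P = 0.001                    # packet loss probability at the queue
S1, S2 = 480, 12_000         # packet sizes in bits (60 B and 1,500 B)
LINK = 96_000_000            # link capacity in bits per second

def loss_rates(bit_rate, size_bits, p=P):
    """Return (packet rate [pps], pkt-loss rate [pps], bit-loss rate [bps])."""
    u = bit_rate / size_bits         # u = x/s
    return u, p * u, p * u * size_bits

# Equal bit rate case: each flow gets half the link (48 Mbps).
u1, pkt_loss1, bit_loss1 = loss_rates(LINK / 2, S1)
u2, pkt_loss2, bit_loss2 = loss_rates(LINK / 2, S2)
# Flow 1: 100 kpps, 100 pps lost, 48 kbps lost
# Flow 2:   4 kpps,   4 pps lost, 48 kbps lost

# Equal packet rate case: bit rates split in the ratio 1:25.
x1 = LINK * 1 / 26               # ~4 Mbps (after rounding)
x2 = LINK * 25 / 26              # ~92 Mbps
u1e, _, bit_loss1e = loss_rates(x1, S1)
u2e, _, bit_loss2e = loss_rates(x2, S2)
# Both flows now lose the same number of packets per second, but very
# different numbers of bits per second (~4 kbps vs. ~92 kbps).
```

The script reproduces the table: equal bit rates give equal absolute bit-loss rates, while equal packet rates give equal absolute packet-loss rates but a 25:1 disparity in bit-loss rates.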
Such an algorithm would try to remove any imbalance in the bit-loss rate, such as the wide disparity in the case labelled 'Equal packet rate' (4 kbps vs. 92 kbps). That is, a packet-size-dependent algorithm would aim for equal bit-loss rates, which would drive both flows towards the case labelled 'Equal bit rate' (both 48 kbps in this example).
The explanation so far has assumed that each flow consists of packets of only one constant size. Nonetheless, it extends naturally to flows with mixed packet sizes. In the right-most column of Table 5, a flow of mixed-size packets is created simply by considering Flow 1 and Flow 2 as a single aggregated flow. There is no need for a flow to maintain an average packet size. It is only necessary for the transport to scale its response to each congestion indication by the size of each individual lost (or marked) packet.

Taking, for example, the case labelled 'Equal packet rate', in one second about 7.7 small packets and 7.7 large packets are lost (the table rounds these to 8 pps each, which is why the combined rate is closer to 15 than 16 losses per second). If the transport multiplies each loss by its size, in one second it responds to 7.7*480 and 7.7*12,000 lost bits, adding up to about 96,000 lost bits in a second. This double-checks correctly, being the same as 0.1% of the total bit rate of 96 Mbps. For completeness, the formula for the combined absolute bit-loss rate is p*(u1*s1 + u2*s2).

Incidentally, a transport will always measure the loss probability the same, irrespective of whether it measures in packets or in bytes. In other words, the ratio of lost packets to sent packets will be the same as the ratio of lost bytes to sent bytes. (This is why TCP's bit rate is still proportional to packet size, even when byte counting is used, as recommended for TCP in [RFC5681], mainly for orthogonal security reasons.) This is intuitively obvious from comparing two example flows: one with 60 B packets, the other with 1,500 B packets. If both flows pass through a queue with drop probability 0.1%, each flow will lose 1 in 1,000 packets. In the stream of 60 B packets, the ratio of lost bytes to sent bytes will be 60 B in every 60,000 B; and in the stream of 1,500 B packets, the loss ratio will be 1,500 B out of 1,500,000 B.
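A few lines of arithmetic confirm both points above: the loss ratio is identical whether counted in packets or in bytes, and a transport that scales each loss by packet size recovers the combined absolute bit-loss rate p*(u1*s1 + u2*s2). This sketch is ours, not from the memo; it reuses the same example numbers.

```python
# Two flows of 60 B and 1,500 B packets through a queue with p = 0.1%.
P = 0.001
S1, S2 = 480, 12_000          # packet sizes in bits
U = 96_000_000 / (S1 + S2)    # equal packet rate that fills a 96 Mbps link

# The loss ratio is the same whether counted in packets or in bytes:
# each flow loses 1 in 1,000 packets, hence 1 in 1,000 of its bytes.
pkt_ratio_1 = (P * U) / U                 # lost/sent packets, Flow 1
byte_ratio_1 = (P * U * S1) / (U * S1)    # lost/sent bits, Flow 1
# Both equal P = 0.1%; the packet size cancels out of the ratio.

# A transport that scales each loss by the lost packet's size measures
# the combined absolute bit-loss rate p*(u1*s1 + u2*s2):
combined_bit_loss = P * (U * S1 + U * S2)   # bits lost per second
```

The combined bit-loss rate comes out at about 96,000 bits per second, i.e., 0.1% of the 96 Mbps link, matching the 'Combined' column of Table 5.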
When the transport responds to the ratio of lost to sent packets, it will measure the same ratio whether it measures in packets or in bytes: 0.1% in both cases. The fact that this ratio is the same whether measured in packets or bytes can be seen in Table 5, where the ratio of lost packets to sent packets and the ratio of lost bytes to sent bytes are always 0.1% in all cases (recall that the scenario was set up with p = 0.1%).

This discussion of how the ratio can be measured in packets or bytes is only raised here to highlight that it is irrelevant to this memo! Whether or not a transport depends on packet size is determined by how this ratio is used within the congestion control algorithm.

So far, we have shown that packet-mode drop passes sufficient information to the transport layer so that the transport can take bit congestion into account, by using the sizes of the packets that indicate congestion. We have also shown that the transport can
choose not to take packet size into account if it wishes. We will now consider whether the transport can know which to do.

Section 3 gives four good reasons why it would be a bad idea to allow for packet size by biasing drop probability in favour of small packets within the network. The impracticality of our thought
experiment shows that it will be hard to give transports a practical way to know whether or not to take into account the size of congestion indication packets. Fortunately, this dilemma is not pressing, because by design most equipment becomes bit-congested before its packet processing becomes congested (as already outlined in Section 1.1). Therefore, transports can be designed on the relatively sound assumption that a congestion indication will usually imply bit congestion.

Nonetheless, although the above idealised protocol is not intended for implementation, we do want to emphasise that research is needed to predict whether there are good reasons to believe that packet congestion might become more common and, if so, to find a way to somehow distinguish between bit and packet congestion [RFC3714].

Recently, the dual-resource queue (DRQ) proposal [DRQ] has been made on the premise that, as network processors become more cost-effective, per-packet operations will become more complex (irrespective of whether more function in the network is desirable). Consequently, the premise is that CPU congestion will become more common. DRQ is a proposed modification to the RED algorithm that folds both bit congestion and packet congestion into one signal (either loss or ECN).

Finally, we note one further complication. Strictly, packet-congestible resources are often cycle-congestible. For instance, for routing lookups, load depends on the complexity of each lookup and whether or not the pattern of arrivals is amenable to caching. This also reminds us that any solution must not require a forwarding engine to use excessive processor cycles in order to decide how to say it has no spare processor cycles.
necessarily equivalent to another from a different link. A policing function local to the link can know the local MTU where the congestion occurred. However, a policer at the edge of the network cannot, at least not without a lot of complexity.

The early research proposals for type (i) policing at a bottleneck link [pBox] used byte-mode drop, then detected flows that contributed disproportionately to the number of packets dropped. However, with no extra complexity, later proposals used packet-mode drop and looked for flows that contributed a disproportionate amount of dropped bytes [CHOKe_Var_Pkt].

Work is progressing on the Congestion Exposure (ConEx) protocol [RFC6789], which enables a type (ii) edge policer located at a user's attachment point. The idea is to be able to take an integrated view of the effect of all a user's traffic on any link in the internetwork. However, byte-mode drop would effectively preclude such edge policing because of the MTU issue above.

Indeed, making drop probability depend on the size of the packets that bits happen to be divided into would simply encourage the bits to be divided into smaller packets in order to confuse policing. In contrast, as long as a dropped/marked packet is taken to mean that all the bytes in the packet are dropped/marked, a policer can remain robust against sequences of bits being re-divided into different size packets or across different size flows [Rate_fair_Dis].
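To illustrate why byte-based accounting is robust where packet counting is not, here is a sketch of the later style of policer, in the spirit of [CHOKe_Var_Pkt]. It is purely illustrative and not from any cited specification: the class, the flow identifiers, and the feed of drop events are all assumptions. The queue does packet-mode drop, and the policer attributes each drop to a flow by its size in bytes.

```python
from collections import defaultdict

class DroppedByteAccountant:
    """Illustrative policer detector: the queue uses packet-mode drop,
    and the policer looks for flows contributing a disproportionate
    share of *dropped bytes*.

    Counting dropped bytes (not dropped packets) means a flow gains
    nothing by re-dividing the same bytes into smaller packets.
    """

    def __init__(self):
        self.dropped_bytes = defaultdict(int)   # flow_id -> bytes dropped

    def record_drop(self, flow_id, packet_size_bytes):
        # Every byte of a dropped packet counts against its flow.
        self.dropped_bytes[flow_id] += packet_size_bytes

    def worst_offender(self):
        # The flow to police is the one with the most dropped bytes,
        # however its bytes happened to be packetised.
        return max(self.dropped_bytes, key=self.dropped_bytes.get)

acct = DroppedByteAccountant()
# Flow A loses four 1,500 B packets; Flow B loses forty 60 B packets.
for _ in range(4):
    acct.record_drop("A", 1500)
for _ in range(40):
    acct.record_drop("B", 60)
# Flow B lost ten times more packets, but Flow A lost more bytes
# (6,000 B vs. 2,400 B), so byte accounting singles out Flow A.
```

Had the accountant counted dropped packets instead, Flow B would have appeared ten times worse, and a flow could evade policing simply by repacketising its bytes into larger packets, which is exactly the perverse incentive the text above describes.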
http://bobbriscoe.net/

Jukka Manner
Aalto University
Department of Communications and Networking (Comnet)
P.O. Box 13000
FIN-00076 Aalto
Finland

Phone: +358 9 470 22481
EMail: firstname.lastname@example.org
URI:   http://www.netlab.tkk.fi/~jmanner/