RFC 7141

Byte and Packet Congestion Notification

Pages: 41
Best Current Practice: 41
Errata
BCP 41 is also:  2914
Updates:  2309, 2914
Part 3 of 3 – Pages 25 to 41

Top   ToC   RFC7141 - Page 25   prevText

5. Outstanding Issues and Next Steps

5.1. Bit-congestible Network

   For a connectionless network with nearly all resources being
   bit-congestible, the recommended position is clear -- the network
   should not make allowance for packet sizes and the transport should.
   This leaves two outstanding issues:

   o  The question of how to handle any legacy AQM deployments using
      byte-mode drop;

   o  The need to start a programme to update transport congestion
      control protocol standards to take packet size into account.

   A survey of equipment vendors (Section 4.2.4) found no evidence that
   byte-mode packet drop had been implemented, so deployment will be
   sparse at best.  A migration strategy is not really needed to remove
   an algorithm that may not even be deployed.

   A programme of experimental updates to take packet size into account
   in transport congestion control protocols has already started with
   TFRC-SP [RFC4828].

5.2. Bit- and Packet-Congestible Network

   The position is much less clear-cut if the Internet becomes populated
   by a more even mix of both packet-congestible and bit-congestible
   resources (see Appendix B.2).  This problem is not pressing, because
   most Internet resources are designed to be bit-congestible before
   packet processing starts to congest (see Section 1.1).

   The IRTF's Internet Congestion Control Research Group (ICCRG) has set
   itself the task of reaching consensus on generic forwarding
   mechanisms that are necessary and sufficient to support the
   Internet's future congestion control requirements (the first
   challenge in [RFC6077]).  The research question of whether packet
   congestion might become common and what to do if it does may in the
   future be explored in the IRTF (the "Challenge 3: Packet Size" in
   [RFC6077]).

   Note that sometimes it seems that resources might be congested by
   neither bits nor packets, e.g., where the queue for access to a
   wireless medium is in units of transmission opportunities.  However,
   the root cause of congestion of the underlying spectrum is overload
   of bits (see Section 4.1.2).

6. Security Considerations

   This memo recommends that queues do not bias drop probability due to
   packet size.  For instance, dropping small packets less often than
   large ones creates a perverse incentive for transports to break down
   their flows into tiny segments.  One of the benefits of implementing
   AQM was meant to be to remove this perverse incentive that tail-drop
   queues gave to small packets.

   In practice, transports cannot all be trusted to respond to
   congestion.  So another reason for recommending that queues not bias
   drop probability in favour of small packets is to avoid the
   vulnerability to small-packet DDoS attacks that would otherwise
   result.  One of the benefits of implementing AQM was meant to be to
   remove tail drop's DoS vulnerability to small packets, so we
   shouldn't add it back again.

   If most queues implemented AQM with byte-mode drop, the resulting
   network would amplify the potency of a small-packet DDoS attack.  At
   the first queue, the stream of packets would push aside a greater
   proportion of large packets, so more of the small packets would
   survive to attack the next queue.  Thus a flood of small packets
   would continue on towards the destination, pushing regular traffic
   with large packets out of the way in one queue after the next, but
   suffering much less drop itself.

   Appendix C explains why the ability of networks to police the
   response of _any_ transport to congestion depends on bit-congestible
   network resources only doing packet-mode drop, not byte-mode drop.
   In summary, it says that making drop probability depend on the size
   of the packets that bits happen to be divided into simply encourages
   the bits to be divided into smaller packets.  Byte-mode drop would
   therefore irreversibly complicate any attempt to fix the Internet's
   incentive structures.
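
   The amplification argument above can be illustrated with a small
   calculation.  This is only a sketch under stated assumptions: every
   queue on the path applies the same base drop probability, and
   byte-mode drop is modelled as scaling that probability by
   packet size / maximum packet size.  The function and parameter names
   and the values of P and HOPS are hypothetical; they are not taken
   from this memo.

```python
# Illustrative model only. Assumption: byte-mode drop scales the base
# drop probability by size / max_size; packet-mode drop ignores size.

def survival(p, size, max_size, hops, byte_mode):
    """Fraction of a flow's packets that survive `hops` identical queues."""
    per_hop_drop = p * size / max_size if byte_mode else p
    return (1.0 - per_hop_drop) ** hops

P = 0.2                   # assumed base drop probability at each queue
HOPS = 5                  # assumed number of congested queues on the path
SMALL, LARGE = 60, 1500   # packet sizes in bytes, as used in Appendix B

for byte_mode in (False, True):
    small = survival(P, SMALL, LARGE, HOPS, byte_mode)
    large = survival(P, LARGE, LARGE, HOPS, byte_mode)
    mode = "byte-mode" if byte_mode else "packet-mode"
    print(f"{mode:11s}: small survive {small:.3f}, large survive {large:.3f}")
```

   Under these assumptions, packet-mode drop thins small and large
   packets equally at every hop, whereas byte-mode drop lets nearly all
   of the small packets through while the large packets are thinned as
   before.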

7. Conclusions

   This memo identifies the three distinct stages of the congestion
   notification process where implementations need to decide whether to
   take packet size into account.  The recommendations provided in
   Section 2 of this memo are different in each case:

   o  When network equipment measures the length of a queue, if it is
      not feasible to use time, it is recommended to count in bytes if
      the network resource is congested by bytes, or to count in packets
      if it is congested by packets.

   o  When network equipment decides whether to drop (or mark) a packet,
      it is recommended that the size of the particular packet should
      not be taken into account.

   o  However, when a transport algorithm responds to a dropped or
      marked packet, the size of the rate reduction should be
      proportionate to the size of the packet.

   In summary, the answers are 'it depends', 'no', and 'yes',
   respectively.

   For the specific case of RED, this means that byte-mode queue
   measurement will often be appropriate, but the use of byte-mode drop
   is very strongly discouraged.

   At the transport layer, the IETF should continue updating congestion
   control protocols to take into account the size of each packet that
   indicates congestion.  Also, the IETF should continue to make
   protocols less sensitive to losing control packets like SYNs, pure
   ACKs, and DNS exchanges.  Although many control packets happen to be
   small, the alternative of network equipment favouring all small
   packets would be dangerous.  That would create perverse incentives to
   split data transfers into smaller packets.

   The memo develops these recommendations from principled arguments
   concerning scaling, layering, incentives, inherent efficiency,
   security, and 'policeability'.  It also addresses practical issues
   such as specific buffer architectures and incremental deployment.
   Indeed, a limited survey of RED implementations is discussed, which
   shows there appears to be little, if any, installed base of RED's
   byte-mode drop.  Therefore, it can be deprecated with little, if any,
   incremental deployment complications.

   The recommendations have been developed on the well-founded basis
   that most Internet resources are bit-congestible, not packet-
   congestible.  We need to know the likelihood that this assumption
   will prevail in the longer term and, if it might not, what protocol
   changes will be needed to cater for a mix of the two.  The IRTF
   Internet Congestion Control Research Group (ICCRG) is currently
   working on these problems [RFC6077].
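
   The three stages summarised in this section can be sketched in code
   for the bit-congestible case.  This is a toy illustration, not an
   implementation of RED or of any transport: the class, the function
   names, the way the random number is passed in, and the choice of a
   reference size are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class BitCongestibleQueue:
    """Toy queue for a bit-congestible link (all names are illustrative)."""
    packets: deque = field(default_factory=deque)   # packet sizes in bytes

    def length_bytes(self):
        # Stage 1: the resource is congested by bytes, so the queue is
        # measured in bytes (time would be preferred where feasible).
        return sum(self.packets)

    def should_drop(self, drop_prob, packet_size, rnd):
        # Stage 2: packet-mode drop. packet_size is deliberately unused.
        # `rnd` is a uniform random number in [0, 1) from the caller.
        return rnd < drop_prob

def rate_reduction(base_reduction, lost_size, reference_size):
    # Stage 3: the transport scales its response in proportion to the
    # size of the packet that signalled congestion.
    return base_reduction * lost_size / reference_size

q = BitCongestibleQueue(deque([60, 1500, 1500]))
print(q.length_bytes())                    # 3060
print(q.should_drop(0.1, 60, 0.05))        # True
print(q.should_drop(0.1, 1500, 0.05))      # True
print(rate_reduction(1.0, 60, 1500))
```

   The drop decision is identical for the 60 B and the 1,500 B packet;
   only the transport's response differs with packet size.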

8. Acknowledgements

   Thank you to Sally Floyd, who gave extensive and useful review
   comments.  Also thanks for the reviews from Philip Eardley, David
   Black, Fred Baker, David Taht, Toby Moncaster, Arnaud Jacquet, and
   Mirja Kuehlewind, as well as helpful explanations of different
   hardware approaches from Larry Dunn and Fred Baker.  We are grateful
   to Bruce Davie and his colleagues for providing a timely and
   efficient survey of RED implementation in Cisco's product range.
   Also, grateful thanks to Toby Moncaster, Will Dormann, John Regnault,
   Simon Carter, and Stefaan De Cnodder who further helped survey the
   current status of RED implementation and deployment, and, finally,
   thanks to the anonymous individuals who responded.

   Bob Briscoe and Jukka Manner were partly funded by Trilogy and
   Trilogy 2, research projects (ICT-216372, ICT-317756) supported by
   the European Community under its Seventh Framework Programme.  The
   views expressed here are those of the authors only.

9. References

9.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
              S., Wroclawski, J., and L. Zhang, "Recommendations on
              Queue Management and Congestion Avoidance in the
              Internet", RFC 2309, April 1998.

   [RFC2914]  Floyd, S., "Congestion Control Principles", BCP 41, RFC
              2914, September 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP", RFC
              3168, September 2001.

9.2. Informative References

   [BLUE02]   Feng, W-c., Shin, K., Kandlur, D., and D. Saha, "The BLUE
              active queue management algorithms", IEEE/ACM Transactions
              on Networking 10(4) 513-528, August 2002,
              <http://dx.doi.org/10.1109/TNET.2002.801399>.

   [CCvarPktSize]
              Widmer, J., Boutremans, C., and J-Y. Le Boudec,
              "End-to-end congestion control for TCP-friendly flows with
              variable packet size", ACM CCR 34(2) 137-151, April 2004,
              <http://doi.acm.org/10.1145/997150.997162>.

   [CHOKe_Var_Pkt]
              Psounis, K., Pan, R., and B. Prabhaker, "Approximate Fair
              Dropping for Variable-Length Packets", IEEE Micro
              21(1):48-56, January-February 2001,
              <http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=903061>.

   [CoDel]    Nichols, K. and V. Jacobson, "Controlled Delay Active
              Queue Management", Work in Progress, February 2013.

   [DRQ]      Shin, M., Chong, S., and I. Rhee, "Dual-Resource TCP/AQM
              for Processing-Constrained Networks", IEEE/ACM
              Transactions on Networking Vol 16, issue 2, April 2008,
              <http://dx.doi.org/10.1109/TNET.2007.900415>.

   [DupTCP]   Wischik, D., "Short messages", Philosophical Transactions
              of the Royal Society A 366(1872):1941-1953, June 2008,
              <http://rsta.royalsocietypublishing.org/content/366/1872/1941.full.pdf+html>.

   [ECNFixedWireless]
              Siris, V., "Resource Control for Elastic Traffic in CDMA
              Networks", Proc. ACM MOBICOM'02, September 2002,
              <http://www.ics.forth.gr/netlab/publications/resource_control_elastic_cdma.html>.

   [Evol_cc]  Gibbens, R. and F. Kelly, "Resource pricing and the
              evolution of congestion control", Automatica
              35(12)1969-1985, December 1999,
              <http://www.sciencedirect.com/science/article/pii/
              S0005109899001351>.

   [GentleAggro]
              Flach, T., Dukkipati, N., Terzis, A., Raghavan, B.,
              Cardwell, N., Cheng, Y., Jain, A., Hao, S., Katz-Bassett,
              E., and R. Govindan, "Reducing web latency: the virtue of
              gentle aggression", ACM SIGCOMM CCR 43(4)159-170, August
              2013, <http://doi.acm.org/10.1145/2486001.2486014>.

   [IOSArch]  Bollapragada, V., White, R., and C. Murphy, "Inside Cisco
              IOS Software Architecture", Cisco Press: CCIE Professional
              Development ISBN13: 978-1-57870-181-0, July 2000.

   [PIE]      Pan, R., Natarajan, P., Piglione, C., Prabhu, M.,
              Subramanian, V., Baker, F., and B. Steeg, "PIE: A
              Lightweight Control Scheme To Address the Bufferbloat
              Problem", Work in Progress, February 2014.

   [PktSizeEquCC]
              Vasallo, P., "Variable Packet Size Equation-Based
              Congestion Control", ICSI Technical Report tr-00-008,
              2000, <http://http.icsi.berkeley.edu/ftp/global/pub/
              techreports/2000/tr-00-008.pdf>.

   [RED93]    Floyd, S. and V. Jacobson, "Random Early Detection (RED)
              gateways for Congestion Avoidance", IEEE/ACM Transactions
              on Networking 1(4) 397--413, August 1993,
              <http://ieeexplore.ieee.org/xpls/
              abs_all.jsp?arnumber=251892>.

   [REDbias]  Eddy, W. and M. Allman, "A Comparison of RED's Byte and
              Packet Modes", Computer Networks 42(3) 261--280, June
              2003,
              <http://www.ir.bbn.com/documents/articles/redbias.ps>.

   [REDbyte]  De Cnodder, S., Elloumi, O., and K. Pauwels, "Effect of
              different packet sizes on RED performance", Proc. 5th IEEE
              Symposium on Computers and Communications (ISCC) 793-799,
              July 2000, <http://ieeexplore.ieee.org/xpls/
              abs_all.jsp?arnumber=860741>.
   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474, December
              1998.

   [RFC3426]  Floyd, S., "General Architectural and Policy
              Considerations", RFC 3426, November 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3714]  Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion
              Control for Voice Traffic in the Internet", RFC 3714,
              March 2004.

   [RFC4828]  Floyd, S. and E. Kohler, "TCP Friendly Rate Control
              (TFRC): The Small-Packet (SP) Variant", RFC 4828, April
              2007.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification", RFC
              5348, September 2008.

   [RFC5562]  Kuzmanovic, A., Mondal, A., Floyd, S., and K.
              Ramakrishnan, "Adding Explicit Congestion Notification
              (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, June
              2009.

   [RFC5670]  Eardley, P., "Metering and Marking Behaviour of PCN-
              Nodes", RFC 5670, November 2009.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC5690]  Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
              Acknowledgement Congestion Control to TCP", RFC 5690,
              February 2010.

   [RFC6077]  Papadimitriou, D., Welzl, M., Scharf, M., and B. Briscoe,
              "Open Research Issues in Internet Congestion Control", RFC
              6077, February 2011.

   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
              and K. Carlberg, "Explicit Congestion Notification (ECN)
              for RTP over UDP", RFC 6679, August 2012.
   [RFC6789]  Briscoe, B., Woundy, R., and A. Cooper, "Congestion
              Exposure (ConEx) Concepts and Use Cases", RFC 6789,
              December 2012.

   [Rate_fair_Dis]
              Briscoe, B., "Flow Rate Fairness: Dismantling a Religion",
              ACM CCR 37(2)63-74, April 2007,
              <http://portal.acm.org/citation.cfm?id=1232926>.

   [gentle_RED]
              Floyd, S., "Recommendation on using the "gentle_" variant
              of RED", Web page , March 2000,
              <http://www.icir.org/floyd/red/gentle.html>.

   [pBox]     Floyd, S. and K. Fall, "Promoting the Use of End-to-End
              Congestion Control", IEEE/ACM Transactions on Networking
              7(4) 458--472, August 1999, <http://ieeexplore.ieee.org/
              xpls/abs_all.jsp?arnumber=793002>.

   [pktByteEmail]
              Floyd, S., "RED: Discussions of Byte and Packet Modes",
              email, March 1997,
              <http://ee.lbl.gov/floyd/REDaveraging.txt>.

Appendix A. Survey of RED Implementation Status

   This Appendix is informative, not normative.

   In May 2007 a survey was conducted of 84 vendors to assess how widely
   drop probability based on packet size has been implemented in RED
   (Table 3).  About 19% of those surveyed replied, giving a sample size
   of 16.  Although in most cases we do not have permission to identify
   the respondents, we can say that those that have responded include
   most of the larger equipment vendors, covering a large fraction of
   the market.  The two who gave permission to be identified were Cisco
   and Alcatel-Lucent.  The others range across the large network
   equipment vendors at L3 & L2, firewall vendors, wireless equipment
   vendors, as well as large software businesses with a small selection
   of networking products.  All those who responded confirmed that they
   have not implemented the variant of RED with drop dependent on packet
   size (2 were fairly sure they had not but needed to check more
   thoroughly).  At the time the survey was conducted, Linux did not
   implement RED with packet-size bias of drop, although we have not
   investigated a wider range of open source code.

   +-------------------------------+----------------+--------------+
   | Response                      | No. of vendors | % of vendors |
   +-------------------------------+----------------+--------------+
   | Not implemented               | 14             | 17%          |
   | Not implemented (probably)    | 2              | 2%           |
   | Implemented                   | 0              | 0%           |
   | No response                   | 68             | 81%          |
   | Total companies/orgs surveyed | 84             | 100%         |
   +-------------------------------+----------------+--------------+

     Table 3: Vendor Survey on byte-mode drop variant of RED (lower drop
                       probability for small packets)

   Where reasons were given for why the byte-mode drop variant had not
   been implemented, the extra complexity of packet-bias code was most
   prevalent, though one vendor had a more principled reason for
   avoiding it -- similar to the argument of this document.

   Our survey was of vendor implementations, so we cannot be certain
   about operator deployment.  But we believe many queues in the
   Internet are still tail drop.  The company of one of the co-authors
   (BT) has widely deployed RED; however, many tail-drop queues are
   bound to still exist, particularly in access network equipment and on
   middleboxes like firewalls, where RED is not always available.

   Routers using a memory architecture based on fixed-size buffers with
   borrowing may also still be prevalent in the Internet.  As explained
   in Section 4.2.1, these also provide a marginal (but legitimate) bias
   towards small packets.  So even though RED byte-mode drop is not
   prevalent, it is likely there is still some bias towards small
   packets in the Internet due to tail-drop and fixed-buffer borrowing.
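
   The percentages reported in Table 3 can be re-derived from the vendor
   counts.  The short sketch below does only that; the dictionary keys
   mirror the table's row labels, and rounding to whole percentages is
   an assumption about how the table was prepared.

```python
# Vendor counts copied from Table 3.
responses = {
    "Not implemented": 14,
    "Not implemented (probably)": 2,
    "Implemented": 0,
    "No response": 68,
}

total = sum(responses.values())                 # companies/orgs surveyed
percent = {k: round(100 * v / total) for k, v in responses.items()}
replied = total - responses["No response"]      # sample size

for label, pct in percent.items():
    print(f"{label:28s} {responses[label]:3d} {pct:3d}%")
print(f"replied: {replied} of {total} ({round(100 * replied / total)}%)")
```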

Appendix B. Sufficiency of Packet-Mode Drop

   This Appendix is informative, not normative.

   Here we check that packet-mode drop (or marking) in the network gives
   sufficiently generic information for the transport layer to use.  We
   check against a 2x2 matrix of four scenarios that may occur now or in
   the future (Table 4).  Checking the two scenarios in each of the
   horizontal and vertical dimensions tests the extremes of sensitivity
   to packet size in the transport and in the network respectively.

   Note that this section does not consider byte-mode drop at all.
   Having deprecated byte-mode drop, the goal here is to check that
   packet-mode drop will be sufficient in all cases.

   +-------------------------------+-----------------+-----------------+
   |                  Transport -> | a) Independent  | b) Dependent on |
   | ----------------------------- | of packet size  | packet size of  |
   | Network                       | of congestion   | congestion      |
   |                               | notifications   | notifications   |
   +-------------------------------+-----------------+-----------------+
   | 1) Predominantly              | Scenario a1)    | Scenario b1)    |
   |    bit-congestible network    |                 |                 |
   | 2) Mix of bit-congestible and | Scenario a2)    | Scenario b2)    |
   |    pkt-congestible network    |                 |                 |
   +-------------------------------+-----------------+-----------------+

                Table 4: Four Possible Congestion Scenarios

   Appendix B.1 focuses on the horizontal dimension of Table 4, checking
   that packet-mode drop (or marking) gives sufficient information,
   whether or not the transport uses it -- scenarios b) and a)
   respectively.

   Appendix B.2 focuses on the vertical dimension of Table 4, checking
   that packet-mode drop gives sufficient information to the transport
   whether resources in the network are bit-congestible or
   packet-congestible (these terms are defined in Section 1.1).

   Notation:  To be concrete, we will compare two flows with different
      packet sizes, s_1 and s_2.  As an example, we will take
      s_1 = 60 B = 480 b and s_2 = 1,500 B = 12,000 b.

      A flow's bit rate, x [bps], is related to its packet rate, u
      [pps], by

         x(t) = s*u(t).

      In the bit-congestible case, path congestion will be denoted by
      p_b, and in the packet-congestible case by p_p.  When either case
      is implied, the letter p alone will denote path congestion.

B.1. Packet-Size (In)Dependence in Transports

   In all cases, we consider a packet-mode drop queue that indicates
   congestion by dropping (or marking) packets with probability p
   irrespective of packet size.  We use an example value of loss
   (marking) probability, p=0.1%.

   A transport like TCP as specified in RFC 5681 treats a congestion
   notification on any packet whatever its size as one event.  However,
   a network with just the packet-mode drop algorithm gives more
   information if the transport chooses to use it.  We will use Table 5
   to illustrate this.

   We will set aside the last column until later.  The columns labelled
   'Flow 1' and 'Flow 2' compare two flows consisting of 60 B and
   1,500 B packets respectively.  The body of the table considers two
   separate cases, one where the flows have an equal bit rate and the
   other with equal packet rates.  In both cases, the two flows fill a
   96 Mbps link.  Therefore, in the equal bit rate case, they each have
   half the bit rate (48 Mbps).  Whereas, with equal packet rates,
   Flow 1 uses 25 times smaller packets so it gets 25 times less bit
   rate -- it only gets 1/(1+25) of the link capacity (96 Mbps / 26 =
   4 Mbps after rounding).  In contrast, Flow 2 gets 25 times more bit
   rate (92 Mbps) in the equal packet rate case because its packets are
   25 times larger.  The packet rate shown for each flow could easily be
   derived once the bit rate was known by dividing the bit rate by
   packet size, as shown in the column labelled 'Formula'.

      Parameter               Formula       Flow 1   Flow 2 Combined
      ----------------------- ----------- -------- -------- --------
      Packet size             s/8             60 B  1,500 B    (Mix)
      Packet size             s              480 b 12,000 b    (Mix)
      Pkt loss probability    p               0.1%     0.1%     0.1%

      EQUAL BIT RATE CASE
      Bit rate                x            48 Mbps  48 Mbps  96 Mbps
      Packet rate             u = x/s     100 kpps   4 kpps 104 kpps
      Absolute pkt-loss rate  p*u          100 pps    4 pps  104 pps
      Absolute bit-loss rate  p*u*s        48 kbps  48 kbps  96 kbps
      Ratio of lost/sent pkts p*u/u           0.1%     0.1%     0.1%
      Ratio of lost/sent bits p*u*s/(u*s)     0.1%     0.1%     0.1%

      EQUAL PACKET RATE CASE
      Bit rate                x             4 Mbps  92 Mbps  96 Mbps
      Packet rate             u = x/s       8 kpps   8 kpps  15 kpps
      Absolute pkt-loss rate  p*u            8 pps    8 pps   15 pps
      Absolute bit-loss rate  p*u*s         4 kbps  92 kbps  96 kbps
      Ratio of lost/sent pkts p*u/u           0.1%     0.1%     0.1%
      Ratio of lost/sent bits p*u*s/(u*s)     0.1%     0.1%     0.1%

    Table 5: Absolute Loss Rates and Loss Ratios for Flows of Small and
                      Large Packets and Both Combined

   So far, we have merely set up the scenarios.  We now consider
   congestion notification in the scenario.  Two TCP flows with the same
   round-trip time aim to equalise their packet-loss rates over time;
   that is, the number of packets lost in a second, which is the packets
   per second (u) multiplied by the probability that each one is dropped
   (p).  Thus, TCP converges on the case labelled 'Equal packet rate' in
   the table, where both flows aim for the same absolute packet-loss
   rate (both 8 pps in the table).

   Packet-mode drop actually gives flows sufficient information to
   measure their loss rate in bits per second, if they choose, not just
   packets per second.  Each flow can count the size of a lost or marked
   packet and scale its rate response in proportion (as TFRC-SP does).
   The result is shown in the row entitled 'Absolute bit-loss rate',
   where the bits lost in a second is the packets per second (u)
   multiplied by the probability of losing a packet (p) multiplied by
   the packet size (s).  Such an algorithm would try to remove any
   imbalance in the bit-loss rate such as the wide disparity in the case
   labelled 'Equal packet rate' (4 kbps vs. 92 kbps).  Instead, a
   packet-size-dependent algorithm would aim for equal bit-loss rates,
   which would drive both flows towards the case labelled 'Equal bit
   rate', by driving them to equal bit-loss rates (both 48 kbps in this
   example).
   The explanation so far has assumed that each flow consists of packets
   of only one constant size.  Nonetheless, it extends naturally to
   flows with mixed packet sizes.  In the right-most column of Table 5,
   a flow of mixed-size packets is created simply by considering Flow 1
   and Flow 2 as a single aggregated flow.  There is no need for a flow
   to maintain an average packet size.  It is only necessary for the
   transport to scale its response to each congestion indication by the
   size of each individual lost (or marked) packet.  Taking, for
   example, the case labelled 'Equal packet rate', in one second about 8
   small packets and 8 large packets are lost (making closer to 15 than
   16 losses per second due to rounding).  If the transport multiplies
   each loss by its size, in one second it responds to about 7.7*480 and
   7.7*12,000 lost bits (using the unrounded loss rate of about 7.7 pps
   per flow), adding up to about 96,000 lost bits in a second.  This
   double checks correctly, being the same as 0.1% of the total bit rate
   of 96 Mbps.  For completeness, the formula for absolute bit-loss rate
   is p(u1*s1+u2*s2).
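
   Worked through with unrounded values for the 'Equal packet rate'
   case, the formula gives the following.  The symbols are those of the
   Notation paragraph and Table 5; writing the common packet rate as u
   is a convenience introduced only for this illustration.

```latex
\begin{align*}
u_1 = u_2 = u &= \frac{x_1 + x_2}{s_1 + s_2}
               = \frac{96 \times 10^{6}\ \text{b/s}}{480\ \text{b} + 12{,}000\ \text{b}}
               \approx 7{,}692\ \text{pps} \\
p\,u\,s_1 &\approx 0.001 \times 7{,}692 \times 480 \approx 3{,}692\ \text{b/s} \\
p\,u\,s_2 &\approx 0.001 \times 7{,}692 \times 12{,}000 \approx 92{,}308\ \text{b/s} \\
p\,(u_1 s_1 + u_2 s_2) &= p\,u\,(s_1 + s_2)
               = 0.001 \times 96 \times 10^{6} = 96{,}000\ \text{b/s}
\end{align*}
```

   The rounded figure of 8 pps per flow would overstate the sum
   (8*480 + 8*12,000 = 99,840), which is why the unrounded loss rate is
   used here.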

   Incidentally, a transport will always measure the loss probability
   the same, irrespective of whether it measures in packets or in bytes.
   In other words, the ratio of lost packets to sent packets will be the
   same as the ratio of lost bytes to sent bytes.  (This is why TCP's
   bit rate is still proportional to packet size, even when byte
   counting is used, as recommended for TCP in [RFC5681], mainly for
   orthogonal security reasons.)  This is intuitively obvious by
   comparing two example flows; one with 60 B packets, the other with
   1,500 B packets.  If both flows pass through a queue with drop
   probability 0.1%, each flow will lose 1 in 1,000 packets.  In the
   stream of 60 B packets, the ratio of lost bytes to sent bytes will be
   60 B in every 60,000 B; and in the stream of 1,500 B packets, the
   loss ratio will be 1,500 B out of 1,500,000 B.  When the transport
   responds to the ratio of lost to sent packets, it will measure the
   same ratio whether it measures in packets or bytes: 0.1% in both
   cases.  The fact that this ratio is the same whether measured in
   packets or bytes can be seen in Table 5, where the ratio of lost
   packets to sent packets and the ratio of lost bytes to sent bytes is
   always 0.1% in all cases (recall that the scenario was set up with
   p=0.1%).
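
   The equality of the two ratios for a flow of constant-size packets
   can be checked with exact arithmetic.  The sketch below uses the
   example from the text (1 packet lost in every 1,000); the function
   name is illustrative.

```python
from fractions import Fraction

def loss_ratios(size_bytes, sent_pkts, lost_pkts):
    """Return (lost/sent packets, lost/sent bytes) for fixed-size packets."""
    pkt_ratio = Fraction(lost_pkts, sent_pkts)
    byte_ratio = Fraction(lost_pkts * size_bytes, sent_pkts * size_bytes)
    return pkt_ratio, byte_ratio

for size in (60, 1500):
    pkts, byts = loss_ratios(size, sent_pkts=1000, lost_pkts=1)
    print(size, pkts, byts)
# prints:
# 60 1/1000 1/1000
# 1500 1/1000 1/1000
```

   The packet size cancels out of the byte ratio, so measuring in bytes
   instead of packets does not by itself make a transport depend on
   packet size.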

   This discussion of how the ratio can be measured in packets or bytes
   is only raised here to highlight that it is irrelevant to this memo!
   Whether or not a transport depends on packet size depends on how this
   ratio is used within the congestion control algorithm.

   So far, we have shown that packet-mode drop passes sufficient
   information to the transport layer so that the transport can take bit
   congestion into account, by using the sizes of the packets that
   indicate congestion.  We have also shown that the transport can
   choose not to take packet size into account if it wishes.  We will
   now consider whether the transport can know which to do.

B.2. Bit-Congestible and Packet-Congestible Indications

   As a thought-experiment, imagine an idealised congestion notification
   protocol that supports both bit-congestible and packet-congestible
   resources.  It would require at least two ECN flags, one for each of
   the bit-congestible and packet-congestible resources.

   1.  A packet-congestible resource trying to code congestion level p_p
       into a packet stream should mark the idealised 'packet
       congestion' field in each packet with probability p_p
       irrespective of the packet's size.  The transport should then
       take a packet with the packet congestion field marked to mean
       just one mark, irrespective of the packet size.

   2.  A bit-congestible resource trying to code time-varying
       byte-congestion level p_b into a packet stream should mark the
       'byte congestion' field in each packet with probability p_b,
       again irrespective of the packet's size.  Unlike before, the
       transport should take a packet with the byte congestion field
       marked to count as a mark on each byte in the packet.

   This hides a fundamental problem -- much more fundamental than
   whether we can magically create header space for yet another ECN
   flag, or whether it would work while being deployed incrementally.
   Distinguishing drop from delivery naturally provides just one
   implicit bit of congestion indication information -- the packet is
   either dropped or not.  It is hard to drop a packet in two ways that
   are distinguishable remotely.  This is a similar problem to that of
   distinguishing wireless transmission losses from congestive losses.

   This problem would not be solved, even if ECN were universally
   deployed.  A congestion notification protocol must survive a
   transition from low levels of congestion to high.  Marking two states
   is feasible with explicit marking, but it is much harder if packets
   are dropped.  Also, it will not always be cost-effective to implement
   AQM at every low-level resource, so drop will often have to suffice.

   We are not saying two ECN fields will be needed (and we are not
   saying that somehow a resource should be able to drop a packet in one
   of two different ways so that the transport can distinguish which
   sort of drop it was!).  These two congestion notification channels
   are a conceptual device to illustrate a dilemma we could face in the
   future.  Section 3 gives four good reasons why it would be a bad idea
   to allow for packet size by biasing drop probability in favour of
   small packets within the network.  The impracticality of our thought
   experiment shows that it will be hard to give transports a practical
   way to know whether or not to take into account the size of
   congestion indication packets.
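
   The transport side of the thought experiment can be written down in a
   few lines.  This sketch is purely conceptual, like the idealised
   protocol itself: the two boolean flags stand for the hypothetical
   'packet congestion' and 'byte congestion' fields, and the function
   name is invented for the illustration.

```python
def congestion_units(size_bytes, pkt_congestion_marked, byte_congestion_marked):
    """Return (packet_marks, byte_marks) a transport counts for one packet."""
    # A 'packet congestion' mark counts once, whatever the packet size.
    packet_marks = 1 if pkt_congestion_marked else 0
    # A 'byte congestion' mark counts as a mark on every byte in the packet.
    byte_marks = size_bytes if byte_congestion_marked else 0
    return packet_marks, byte_marks

print(congestion_units(60, True, False))     # (1, 0)
print(congestion_units(1500, True, False))   # (1, 0)
print(congestion_units(60, False, True))     # (0, 60)
print(congestion_units(1500, False, True))   # (0, 1500)
```

   With only a single drop or mark available in practice, the transport
   cannot tell which of the two interpretations applies.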

   Fortunately, this dilemma is not pressing because by design most
   equipment becomes bit-congested before its packet processing becomes
   congested (as already outlined in Section 1.1).  Therefore,
   transports can be designed on the relatively sound assumption that a
   congestion indication will usually imply bit congestion.

   Nonetheless, although the above idealised protocol isn't intended for
   implementation, we do want to emphasise that research is needed to
   predict whether there are good reasons to believe that packet
   congestion might become more common, and if so, to find a way to
   somehow distinguish between bit and packet congestion [RFC3714].

   Recently, the dual resource queue (DRQ) proposal [DRQ] has been made
   on the premise that, as network processors become more cost-
   effective, per-packet operations will become more complex
   (irrespective of whether more function in the network is desirable).
   Consequently the premise is that CPU congestion will become more
   common.  DRQ is a proposed modification to the RED algorithm that
   folds both bit congestion and packet congestion into one signal
   (either loss or ECN).

   Finally, we note one further complication.  Strictly, packet-
   congestible resources are often cycle-congestible.  For instance, for
   routing lookups, load depends on the complexity of each lookup and
   whether or not the pattern of arrivals is amenable to caching.  This
   also reminds us that any solution must not require a forwarding
   engine to use excessive processor cycles in order to decide how to
   say it has no spare processor cycles.

Appendix C. Byte-Mode Drop Complicates Policing Congestion Response

   This section is informative, not normative.

   There are two main classes of approach to policing congestion
   response: (i) policing at each bottleneck link or (ii) policing at
   the edges of networks.  Packet-mode drop in RED is compatible with
   either, while byte-mode drop precludes edge policing.

   The simplicity of an edge policer relies on one dropped or marked
   packet being equivalent to another of the same size without having to
   know which link the drop or mark occurred at.  However, the byte-mode
   drop algorithm has to depend on the local MTU of the line -- it needs
   to use some concept of a 'normal' packet size.  Therefore, one
   dropped or marked packet from a byte-mode drop algorithm is not
   necessarily equivalent to another from a different link.  A policing
   function local to the link can know the local MTU where the
   congestion occurred.  However, a policer at the edge of the network
   cannot, at least not without a lot of complexity.
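
   The MTU dependence can be made concrete with a short calculation.
   The sketch assumes that byte-mode drop scales the underlying
   congestion level by packet size divided by a locally configured
   'normal' size (taken here to be the link MTU).  The function names
   and the two example MTUs are hypothetical.

```python
def byte_mode_drop_prob(p, size_bytes, normal_size_bytes):
    """Assumed byte-mode rule: scale p by size relative to a 'normal' size."""
    return p * size_bytes / normal_size_bytes

def implied_congestion(drop_prob, size_bytes, normal_size_bytes):
    """Invert the rule above; this requires the local 'normal' size."""
    return drop_prob * normal_size_bytes / size_bytes

observed = 0.01    # drop probability seen by a flow of 1,500 B packets
for mtu in (1500, 9000):    # two hypothetical links with different MTUs
    level = implied_congestion(observed, 1500, mtu)
    print(f"MTU {mtu}: implied congestion level {level:.3f}")
```

   The same observed drop probability corresponds to different
   congestion levels on the two links, so a policer that does not know
   which link dropped the packet cannot interpret the drop.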

   The early research proposals for type (i) policing at a bottleneck
   link [pBox] used byte-mode drop, then detected flows that contributed
   disproportionately to the number of packets dropped.  However, with
   no extra complexity, later proposals used packet-mode drop and looked
   for flows that contributed a disproportionate amount of dropped bytes
   [CHOKe_Var_Pkt].

   Work is progressing on the Congestion Exposure (ConEx) protocol
   [RFC6789], which enables a type (ii) edge policer located at a user's
   attachment point.  The idea is to be able to take an integrated view
   of the effect of all a user's traffic on any link in the
   internetwork.  However, byte-mode drop would effectively preclude
   such edge policing because of the MTU issue above.

   Indeed, making drop probability depend on the size of the packets
   that bits happen to be divided into would simply encourage the bits
   to be divided into smaller packets in order to confuse policing.  In
   contrast, as long as a dropped/marked packet is taken to mean that
   all the bytes in the packet are dropped/marked, a policer can remain
   robust against sequences of bits being re-divided into different size
   packets or across different size flows [Rate_fair_Dis].
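
   The robustness claim can be checked by computing the expected number
   of dropped or marked bytes when the same volume of data is divided
   into packets of different sizes.  As before, this is a sketch under
   the assumption that byte-mode drop scales the probability by packet
   size over a 'normal' size; the names and example values are
   illustrative.

```python
def expected_marked_bytes(total_bytes, size, p, byte_mode=False,
                          normal_size=1500):
    """Expected bytes in dropped/marked packets for a given packetisation."""
    n_packets = total_bytes / size
    per_pkt_prob = p * size / normal_size if byte_mode else p
    # Every byte of a dropped/marked packet counts as dropped/marked.
    return n_packets * per_pkt_prob * size

TOTAL = 1_500_000    # bytes to send
P = 0.001            # congestion level

for size in (60, 1500):
    pkt_mode = expected_marked_bytes(TOTAL, size, P)
    byte_mode = expected_marked_bytes(TOTAL, size, P, byte_mode=True)
    print(f"{size:5d} B packets: packet-mode {pkt_mode:8.1f} B, "
          f"byte-mode {byte_mode:8.1f} B")
```

   Under packet-mode drop, the expected marked bytes equal p times the
   bytes sent however they are packetised.  Under the assumed byte-mode
   rule, re-dividing the same bytes into smaller packets reduces the
   count, which is the incentive the text warns against.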

Authors' Addresses

   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   EMail: bob.briscoe@bt.com
   URI:   http://bobbriscoe.net/


   Jukka Manner
   Aalto University
   Department of Communications and Networking (Comnet)
   P.O. Box 13000
   FIN-00076 Aalto
   Finland

   Phone: +358 9 470 22481
   EMail: jukka.manner@aalto.fi
   URI:   http://www.netlab.tkk.fi/~jmanner/