Tech-invite3GPPspaceIETFspace
959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 3290

An Informal Management Model for Diffserv Routers

Pages: 56
Informational
Part 3 of 3 – Pages 35 to 56
First   Prev   None

Top   ToC   RFC3290 - Page 35   prevText

8. Traffic Conditioning Blocks (TCBs)

The Classifier, Meter, Action, Algorithmic Dropper, Queue and Scheduler functional datapath elements described above can be combined into Traffic Conditioning Blocks (TCBs). A TCB is an abstraction of a set of functional datapath elements that may be used to facilitate the definition of specific traffic conditioning functionality (e.g., it might be likened to a template which can be replicated multiple times for different traffic streams or different customers). It has no likely physical representation in the implementation of the data path: it is invented purely as an abstraction for use by management tools. This model describes the configuration and management of a Diffserv interface in terms of a TCB that contains, by definition, zero or more Classifier, Meter, Action, Algorithmic Dropper, Queue and Scheduler elements. These elements are arranged arbitrarily according to the policy being expressed, but always in the order here. Traffic may be classified; classified traffic may be metered; each stream of traffic identified by a combination of classifiers and meters may have some set of actions performed on it, followed by drop
Top   ToC   RFC3290 - Page 36
   algorithms; packets of the traffic stream may ultimately be stored
   into a queue and then be scheduled out to the next TCB or physical
   interface.  It is permissible to omit elements or include null
   elements of any type, or to concatenate multiple functional datapath
   elements of the same type.

   When the Diffserv treatment for a given packet needs to have such
   building blocks repeated, this is performed by cascading multiple
   TCBs:  an output of one TCB may drive the input of a succeeding one.
   For example, consider the case where traffic of a set of classes is
   shaped to a set of rates, but the total output rate of the group of
   classes must also be limited to a rate.  One might imagine a set of
   network news feeds, each with a certain maximum rate, and a policy
   that their aggregate may not exceed some figure.  This may be simply
   accomplished by cascading two TCBs.  The first classifies the traffic
   into its separate feeds and queues each feed separately.  The feeds
   (or a subset of them) are now fed into a second TCB, which places all
   input (these news feeds) into a single queue with a certain maximum
   rate.  In implementation, one could imagine this as the several
   literal queues, a CBQ or WFQ system with an appropriate (and complex)
   weighting scheme, or a number of other approaches.  But they would
   have the same externally measurable effect on the traffic as if they
   had been literally implemented with separate TCBs.

8.1. TCB

A generalized TCB might consist of the following stages: - Classification stage - Metering stage - Action stage (involving Markers, Absolute Droppers, Counters, and Multiplexors) - Queuing stage (involving Algorithmic Droppers, Queues, and Schedulers) where each stage may consist of a set of parallel datapaths consisting of pipelined elements. A Classifier or a Meter is typically a 1:N element, an Action, Algorithmic Dropper, or Queue is typically a 1:1 element and a Scheduler is a N:1 element. A complete TCB should, however, result in a 1:1 or 1:N abstract element. Note that the fan-in or fan-out of an element is not an important defining characteristic of this taxonomy.
Top   ToC   RFC3290 - Page 37

8.1.1. Building blocks for Queuing

Some particular rules are applied to the ordering of elements within a Queuing stage within a TCB: elements of the same type may appear more than once, either in parallel or in series. Typically, a queuing stage will have relatively many elements in parallel and few in series. Iteration and recursion are not supported constructs (the elements are arranged in an acyclic graph). The following inter- connections of elements are allowed: - The input of a Queue may be the input of the queuing block, or it may be connected to the output of an Algorithmic Dropper, or to an output of a Scheduler. - Each input of a Scheduler may be connected to the output of a Queue, to the output of an Algorithmic Dropper, or to the output of another Scheduler. - The input of an Algorithmic Dropper may be the first element of the queuing stage, the output of another Algorithmic Dropper, or it may be connected to the output of a Queue (to indicate head-dropping). - The output of the queuing block may be the output of a Queue, an Algorithmic Dropper, or a Scheduler. Note, in particular, that Schedulers may operate in series such so that a packet at the head of a Queue feeding the concatenated Schedulers is serviced only after all of the scheduling criteria are met. For example, a Queue which carries EF traffic streams may be served first by a non-work-conserving Scheduler to shape the stream to a maximum rate, then by a work-conserving Scheduler to mix EF traffic streams with other traffic streams. Alternatively, there might be a Queue and/or a dropper between the two Schedulers. Note also that some non-sensical scenarios (e.g., a Queue preceding an Algorithmic Dropper, directly feeding into another Queue), are prohibited.

8.2. An Example TCB

A SLS is presumed to have been negotiated between the customer and the provider which specifies the handling of the customer's traffic, as defined by a TCS) by the provider's network. The agreement might be of the following form:
Top   ToC   RFC3290 - Page 38
      DSCP     PHB   Profile     Treatment
      ----     ---   -------     ----------------------
      001001   EF    Profile4    Discard non-conforming.
      001100   AF11  Profile5    Shape to profile, tail-drop when full.
      001101   AF21  Profile3    Re-mark non-conforming to DSCP 001000,
                                 tail-drop when full.
      other    BE    none        Apply RED-like dropping.

   This SLS specifies that the customer may submit packets marked for
   DSCP 001001 which will get EF treatment so long as they remain
   conforming to Profile4, which will be discarded if they exceed this
   profile.  The discarded packets are counted in this example, perhaps
   for use by the provider's sales department in convincing the customer
   to buy a larger SLS.  Packets marked for DSCP 001100 will be shaped
   to Profile5 before forwarding.  Packets marked for DSCP 001101 will
   be metered to Profile3 with non-conforming packets "downgraded" by
   being re-marked with a DSCP of 001000.  It is implicit in this
   agreement that conforming packets are given the PHB originally
   indicated by the packets' DSCP field.

   Figures 6 and 7 illustrates a TCB that might be used to handle this
   SLS at an ingress interface at the customer/provider boundary.

   The Classification stage of this example consists of a single BA
   classifier.  The BA classifier is used to separate traffic based on
   the Diffserv service level requested by the customer (as indicated by
   the DSCP in each submitted packet's IP header).  We illustrate three
   DSCP filter values: A, B, and C. The 'X' in the BA classifier is a
   wildcard filter that matches every packet not otherwise matched.

   The path for DSCP 001100 proceeds directly to Dropper1 whilst the
   paths for DSCP 001001 and 001101 include a metering stage.  All other
   traffic is passed directly on to Dropper3.  There is a separate meter
   for each set of packets corresponding to classifier outputs A and C.
   Each meter uses a specific profile, as specified in the TCS, for the
   corresponding Diffserv service level.  The meters in this example
   each indicate one of two conformance levels: conforming or non-
   conforming.

   Following the Metering stage is an Action stage in some of the
   branches.  Packets submitted for DSCP 001001 (Classifier output A)
   that are deemed non-conforming by Meter1 are counted and discarded
   while packets that are conforming are passed on to Queue1.  Packets
   submitted for DSCP 001101 (Classifier output C) that are deemed non-
   conforming by Meter2 are re-marked and then both conforming and non-
   conforming packets are multiplexed together before being passed on to
   Dropper2/Queue3.
Top   ToC   RFC3290 - Page 39
   The Algorithmic Dropping, Queuing and Scheduling stages are realized
   as follows, illustrated in figure 7.  Note that the figure does not
   show any of the implicit control linkages between elements that allow
   e.g., an Algorithmic Dropper to sense the current state of a
   succeeding Queue.

                         +-----+
                         |    A|---------------------------> to Queue1
                      +->|     |
                      |  |    B|--+  +-----+    +-----+
                      |  +-----+  |  |     |    |     |
                      |  Meter1   +->|     |--->|     |
                      |              |     |    |     |
                      |              +-----+    +-----+
                      |              Counter1   Absolute
submitted +-----+     |                         Dropper1
traffic   |    A|-----+
--------->|    B|--------------------------------------> to AlgDropper1
          |    C|-----+
          |    X|--+  |
          +-----+  |  |  +-----+                +-----+
        Classifier1|  |  |    A|--------------->|A    |
           (BA)    |  +->|     |                |     |--> to AlgDrop2
                   |     |    B|--+  +-----+ +->|B    |
                   |     +-----+  |  |     | |  +-----+
                   |     Meter2   +->|     |-+    Mux1
                   |                 |     |
                   |                 +-----+
                   |                 Marker1
                   +-----------------------------------> to AlgDropper3

     Figure 6:  An Example Traffic Conditioning Block (Part 1)

   Conforming DSCP 001001 packets from Meter1 are passed directly to
   Queue1: there is no way, with configuration of the following
   Scheduler to match the metering, for these packets to overflow the
   depth of Queue1, so there is no requirement for dropping at this
   point.  Packets marked for DSCP 001100 must be passed through a
   tail-dropper, AlgDropper1, which serves to limit the depth of the
   following queue, Queue2: packets that arrive to a full queue will be
   discarded.  This is likely to be an error case: the customer is
   obviously not sticking to its agreed profile.  Similarly, all packets
   from the original DSCP 001101 stream (some may have been re-marked by
   this stage) are passed to AlgDropper2 and Queue3.  Packets marked for
   all other DSCPs are passed to AlgDropper3 which is a RED-like
   Algorithmic Dropper: based on feedback of the current depth of
   Queue4, this dropper is supposed to discard enough packets from its
   input stream to keep the queue depth under control.
Top   ToC   RFC3290 - Page 40
   These four Queue elements are then serviced by a Scheduler element
   Scheduler1: this must be configured to give each of its inputs an
   appropriate priority and/or bandwidth share.  Inputs A and C are
   given guarantees of bandwidth, as appropriate for the contracted
   profiles.  Input B is given a limit on the bandwidth it can use
   (i.e., a non-work-conserving discipline) in order to achieve the
   desired shaping of this stream.  Input D is given no limits or
   guarantees but a lower priority than the other queues, appropriate
   for its best-effort status.  Traffic then exits the Scheduler in a
   single orderly stream.

   The interconnections of the TCB elements illustrated in Figures 6 and
   7 can be represented textually as follows:

        TCB1:

        Classifier1:
        FilterA:             Meter1
        FilterB:             Dropper1
        FilterC:             Meter2
        Default:             Dropper3


      from Meter1                     +-----+
      ------------------------------->|     |----+
                                      |     |    |
                                      +-----+    |
                                      Queue1     |
                                                 |  +-----+
      from Classifier1 +-----+        +-----+    +->|A    |
      ---------------->|     |------->|     |------>|B    |------->
                       |     |        |     |  +--->|C    |  exiting
                       +-----+        +-----+  | +->|D    |  traffic
                       AlgDropper1    Queue2   | |  +-----+
                                               | |  Scheduler1
      from Mux1        +-----+        +-----+  | |
      ---------------->|     |------->|     |--+ |
                       |     |        |     |    |
                       +-----+        +-----+    |
                       AlgDropper2    Queue3     |
                                                 |
      from Classifier1 +-----+        +-----+    |
      ---------------->|     |------->|     |----+
                       |     |        |     |
                       +-----+        +-----+
                       AlgDropper3    Queue4

        Figure 7: An Example Traffic Conditioning Block (Part 2)
Top   ToC   RFC3290 - Page 41
        Meter1:
        Type:                AverageRate
        Profile:             Profile4
        ConformingOutput:    Queue1
        NonConformingOutput: Counter1

        Counter1:
        Output:              AbsoluteDropper1

        Meter2:
        Type:                AverageRate
        Profile:             Profile3
        ConformingOutput:    Mux1.InputA
        NonConformingOutput: Marker1

        Marker1:
        Type:                DSCPMarker
        Mark:                001000
        Output:              Mux1.InputB

        Mux1:
        Output:              Dropper2

        AlgDropper1:
        Type:                AlgorithmicDropper
        Discipline:          Drop-on-threshold
        Trigger:             Queue2.Depth > 10kbyte
        Output:              Queue2

        AlgDropper2:
        Type:                AlgorithmicDropper
        Discipline:          Drop-on-threshold
        Trigger:             Queue3.Depth > 20kbyte
        Output:              Queue3

        AlgDropper3:
        Type:                AlgorithmicDropper
        Discipline:          RED93
        Trigger:             Internal
        Output:              Queue3
        MinThresh:           Queue3.Depth > 20 kbyte
        MaxThresh:           Queue3.Depth > 40 kbyte
           <other RED parms too>
Top   ToC   RFC3290 - Page 42
        Queue1:
        Type:                FIFO
        Output:              Scheduler1.InputA

        Queue2:
        Type:                FIFO
        Output:              Scheduler1.InputB

        Queue3:
        Type:                FIFO
        Output:              Scheduler1.InputC

        Queue4:
        Type:                FIFO
        Output:              Scheduler1.InputD

        Scheduler1:
        Type:                Scheduler4Input
        InputA:
        MaxRateProfile:      none
        MinRateProfile:      Profile4
        Priority:            20
        InputB:
        MaxRateProfile:      Profile5
        MinRateProfile:      none
        Priority:            40
        InputC:
        MaxRateProfile:      none
        MinRateProfile:      Profile3
        Priority:            20
        InputD:
        MaxRateProfile:      none
        MinRateProfile:      none
        Priority:            10

8.3. An Example TCB to Support Multiple Customers

The TCB described above can be installed on an ingress interface to implement a provider/customer TCS if the interface is dedicated to the customer. However, if a single interface is shared between multiple customers, then the TCB above will not suffice, since it does not differentiate among traffic from different customers. Its classification stage uses only BA classifiers. The configuration is readily modified to support the case of multiple customers per interface, as follows. First, a TCB is defined for each customer to reflect the TCS with that customer: TCB1, defined above is the TCB for customer 1. Similar elements are created for
Top   ToC   RFC3290 - Page 43
   TCB2 and for TCB3 which reflect the agreements with customers 2 and 3
   respectively.  These 3 TCBs may or may not contain similar elements
   and parameters.

   Finally, a classifier is added to the front end to separate the
   traffic from the three different customers.  This forms a new TCB,
   TCB4, which is illustrated in Figure 8.

   A representation of this multi-customer TCB might be:

      TCB4:

      Classifier4:
      Filter1:     to TCB1
      Filter2:     to TCB2
      Filter3:     to TCB3
      No Match:    AbsoluteDropper4

      AbsoluteDropper4:
      Type:                AbsoluteDropper

      TCB1:
      (as defined above)

      TCB2:
      (similar to TCB1, perhaps with different
       elements or numeric parameters)

      TCB3:
      (similar to TCB1, perhaps with different
       elements or numeric parameters)

   and the filters, based on each customer's source MAC address, could
   be defined as follows:

      Filter1:

      submitted +-----+
      traffic   |    A|--------> TCB1
      --------->|    B|--------> TCB2
                |    C|--------> TCB3
                |    X|------+   +-----+
                +-----+      +-->|     |
                Classifier4      +-----+
                                 AbsoluteDrop4

      Figure 8: An Example of a Multi-Customer TCB
Top   ToC   RFC3290 - Page 44
      Type:        MacAddress
      SrcValue:    01-02-03-04-05-06 (source MAC address of customer 1)
      SrcMask:     FF-FF-FF-FF-FF-FF
      DestValue:   00-00-00-00-00-00
      DestMask:    00-00-00-00-00-00

      Filter2:
      (similar to Filter1 but with customer 2's source MAC address as
       SrcValue)

      Filter3:
      (similar to Filter1 but with customer 3's source MAC address as
       SrcValue)

   In this example, Classifier4 separates traffic submitted from
   different customers based on the source MAC address in submitted
   packets.  Those packets with recognized source MAC addresses are
   passed to the TCB implementing the TCS with the corresponding
   customer.  Those packets with unrecognized source MAC addresses are
   passed to a dropper.

   TCB4 has a Classifier stage and an Action element stage performing
   dropping of all unmatched traffic.

8.4. TCBs Supporting Microflow-based Services

The TCB illustrated above describes a configuration that might be suitable for enforcing a SLS at a router's ingress. It assumes that the customer marks its own traffic for the appropriate service level. It then limits the rate of aggregate traffic submitted at each service level, thereby protecting the resources of the Diffserv network. It does not provide any isolation between the customer's individual microflows. A more complex example might be a TCB configuration that offers additional functionality to the customer. It recognizes individual customer microflows and marks each one independently. It also isolates the customer's individual microflows from each other in order to prevent a single microflow from seizing an unfair share of the resources available to the customer at a certain service level. This is illustrated in Figure 9. Suppose that the customer has an SLS which specifies 2 service levels, to be identified to the provider by DSCP A and DSCP B. Traffic is first directed to a MF classifier which classifies traffic based on miscellaneous classification criteria, to a granularity sufficient to identify individual customer microflows. Each microflow can then be marked for a specific DSCP The metering
Top   ToC   RFC3290 - Page 45
   elements limit the contribution of each of the customer's microflows
   to the service level for which it was marked.  Packets exceeding the
   allowable limit for the microflow are dropped.

                     +-----+   +-----+
    Classifier1      |     |   |     |---------------+
        (MF)      +->|     |-->|     |     +-----+   |
      +-----+     |  |     |   |     |---->|     |   |
      |    A|------  +-----+   +-----+     +-----+   |
   -->|    B|-----+  Marker1   Meter1      Absolute  |
      |    C|---+ |                        Dropper1  |   +-----+
      |    X|-+ | |  +-----+   +-----+               +-->|A    |
      +-----+ | | |  |     |   |     |------------------>|B    |--->
              | | +->|     |-->|     |     +-----+   +-->|C    | to TCB2
              | |    |     |   |     |---->|     |   |   +-----+
              | |    +-----+   +-----+     +-----+   |    Mux1
              | |    Marker2   Meter2      Absolute  |
              | |                          Dropper2  |
              | |    +-----+   +-----+               |
              | |    |     |   |     |---------------+
              | |--->|     |-->|     |     +-----+
              |      |     |   |     |---->|     |
              |      +-----+   +-----+     +-----+
              |      Marker3   Meter3      Absolute
              |                            Dropper3
              V etc.

      Figure 9: An Example of a Marking and Traffic Isolation TCB

   This TCB could be formally specified as follows:

      TCB1:
      Classifier1: (MF)
      FilterA:             Marker1
      FilterB:             Marker2
      FilterC:             Marker3
      etc.

      Marker1:
      Output:              Meter1

      Marker2:
      Output:              Meter2

      Marker3:
      Output:              Meter3
Top   ToC   RFC3290 - Page 46
      Meter1:
      ConformingOutput:    Mux1.InputA
      NonConformingOutput: AbsoluteDropper1

      Meter2:
      ConformingOutput:    Mux1.InputB
      NonConformingOutput: AbsoluteDropper2

      Meter3:
      ConformingOutput:    Mux1.InputC
      NonConformingOutput: AbsoluteDropper3

      etc.

      Mux1:
      Output:              to TCB2

   Note that the detailed traffic element declarations are not shown
   here.  Traffic is either dropped by TCB1 or emerges marked for one of
   two DSCPs.  This traffic is then passed to TCB2 which is illustrated
   in Figure 10.

   TCB2 could then be specified as follows:

      Classifier2: (BA)
      FilterA:               Meter5
      FilterB:               Meter6


                     +-----+
                     |     |---------------> to Queue1
                  +->|     |     +-----+
        +-----+   |  |     |---->|     |
        |    A|---+  +-----+     +-----+
      ->|     |       Meter5     AbsoluteDropper4
        |    B|---+  +-----+
        +-----+   |  |     |---------------> to Queue2
      Classifier2 +->|     |     +-----+
         (BA)        |     |---->|     |
                     +-----+     +-----+
                      Meter6     AbsoluteDropper5

      Figure 10: Additional Example: TCB2

      Meter5:
      ConformingOutput:      Queue1
      NonConformingOutput:   AbsoluteDropper4
Top   ToC   RFC3290 - Page 47
      Meter6:
      ConformingOutput:      Queue2
      NonConformingOutput:   AbsoluteDropper5

8.5. Cascaded TCBs

Nothing in this model prevents more complex scenarios in which one microflow TCB precedes another (e.g., for TCBs implementing separate TCSs for the source and for a set of destinations).

9. Security Considerations

Security vulnerabilities of Diffserv network operation are discussed in [DSARCH]. This document describes an abstract functional model of Diffserv router elements. Certain denial-of-service attacks such as those resulting from resource starvation may be mitigated by appropriate configuration of these router elements; for example, by rate limiting certain traffic streams or by authenticating traffic marked for higher quality-of-service. There may be theft-of-service scenarios where a malicious host can exploit a loose token bucket policer to obtain slightly better QoS than that committed in the TCS.

10. Acknowledgments

Concepts, terminology, and text have been borrowed liberally from [POLTERM], as well as from other IETF work on MIBs and policy- management. We wish to thank the authors of some of those documents: Fred Baker, Michael Fine, Keith McCloghrie, John Seligson, Kwok Chan, Scott Hahn, and Andrea Westerinen for their contributions. This document has benefited from the comments and suggestions of several participants of the Diffserv working group, particularly Shahram Davari, John Strassner, and Walter Weiss. This document could never have reached this level of rough consensus without the relentless pressure of the co-chairs Brian Carpenter and Kathie Nichols, for which the authors are grateful.

11. References

[AF-PHB] Heinanen, J., Baker, F., Weiss, W. and J. Wroclawski, "Assured Forwarding PHB Group", RFC 2597, June 1999. [DSARCH] Carlson, M., Weiss, W., Blake, S., Wang, Z., Black, D. and E. Davies, "An Architecture for Differentiated Services", RFC 2475, December 1998.
Top   ToC   RFC3290 - Page 48
   [DSFIELD]   Nichols, K., Blake, S., Baker, F. and D. Black,
               "Definition of the Differentiated Services Field (DS
               Field) in the IPv4 and IPv6 Headers", RFC 2474, December
               1998.

   [DSMIB]     Baker, F., Smith, A., and K. Chan, "Management
               Information Base for the Differentiated Services
               Architecture", RFC 3289, May 2002.

   [E2E]       Bernet, Y., Yavatkar, R., Ford, P., Baker, F., Zhang, L.,
               Speer, M., Nichols, K., Braden, R., Davie, B.,
               Wroclawski, J. and E. Felstaine, "A Framework for
               Integrated Services Operation over Diffserv Networks",
               RFC 2998, November 2000.

   [EF-PHB]    Davie, B., Charny, A., Bennett, J.C.R., Benson, K., Le
               Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V. and D.
               Stiliadis, "An Expedited Forwarding PHB (Per-Hop
               Behavior)", RFC 3246, March 2002.

   [FJ95]      Floyd, S. and V. Jacobson, "Link Sharing and Resource
               Management Models for Packet Networks", IEEE/ACM
               Transactions on Networking, Vol. 3 No. 4, August 1995l.

   [INTSERV]   Braden, R., Clark, D. and S. Shenker, "Integrated
               Services in the Internet Architecture: an Overview", RFC
               1633, June 1994.

   [NEWTERMS]  Grossman, D., "New Terminology and Clarifications for
               Diffserv", RFC 3260, April, 2002

   [PDBDEF]    K. Nichols and B. Carpenter, "Definition of
               Differentiated Services Per Domain Behaviors and Rules
               for Their Specification", RFC 3086, April 2001.

   [POLTERM]   Westerinen, A., Schnizlein, J., Strassner, J., Scherling,
               M., Quinn, B., Herzog, S., Huynh, A., Carlson, M., Perry,
               J. and S. Waldbusser, "Policy Terminology", RFC 3198,
               November 2001.

   [QOSDEVMOD] Strassner, J., Westerinen, A. and B. Moore, "Information
               Model for Describing Network Device QoS Mechanisms", Work
               in Progress.
Top   ToC   RFC3290 - Page 49
   [QUEUEMGMT] Braden, R., Clark, D., Crowcroft, J., Davie, B., Deering,
               S., Estrin, D., Floyd, S., Jacobson, V., Minshall, C.,
               Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
               S., Wroclawski, J. and L. Zhang, "Recommendations on
               Queue Management and Congestion Avoidance in the
               Internet", RFC 2309, April 1998.

   [SRTCM]     Heinanen, J. and R. Guerin, "A Single Rate Three Color
               Marker", RFC 2697, September 1999.

   [TRTCM]     Heinanen, J. and R. Guerin, "A Two Rate Three Color
               Marker", RFC 2698, September 1999.

   [VIC]       McCanne, S. and Jacobson, V., "vic: A Flexible Framework
               for Packet Video", ACM Multimedia '95, November 1995, San
               Francisco, CA, pp. 511-522.
               <ftp://ftp.ee.lbl.gov/papers/vic-mm95.ps.Z>

   [802.1D]   "Information technology - Telecommunications and
               information exchange between systems - Local and
               metropolitan area networks - Common specifications - Part
               3: Media Access Control (MAC) Bridges:  Revision.  This
               is a revision of ISO/IEC 10038: 1993, 802.1j-1992 and
               802.6k-1992.  It incorporates P802.11c, P802.1p and
               P802.12e.", ISO/IEC 15802-3: 1998.
Top   ToC   RFC3290 - Page 50

Appendix A. Discussion of Token Buckets and Leaky Buckets

"Leaky bucket" and/or "Token Bucket" models are used to describe rate control in several architectures, including Frame Relay, ATM, Integrated Services and Differentiated Services. Both of these models are, by definition, theoretical relationships between some defined burst size, B, a rate, R, and a time interval, t: R = B/t Thus, a token bucket or leaky bucket might specify an information rate of 1.2 Mbps with a burst size of 1500 bytes. In this case, the token rate is 1,200,000 bits per second, the token burst is 12,000 bits and the token interval is 10 milliseconds. The specification says that conforming traffic will, in the worst case, come in 100 bursts per second of 1500 bytes each and at an average rate not exceeding 1.2 Mbps.

A.1 Leaky Buckets

A leaky bucket algorithm is primarily used for shaping traffic as it leaves an interface onto the network (handled under Queues and Schedulers in this model). Traffic theoretically departs from an interface at a rate of one bit every so many time units (in the example, one bit every 0.83 microseconds) but, in fact, departs in multi-bit units (packets) at a rate approximating the theoretical, as measured over a longer interval. In the example, it might send one 1500 byte packet every 10 ms or perhaps one 500 byte packet every 3.3 ms. It is also possible to build multi-rate leaky buckets in which traffic departs from the interface at varying rates depending on recent activity or inactivity. Implementations generally seek as constant a transmission rate as achievable. In theory, a 10 Mbps shaped transmission stream from an algorithmic implementation and a stream which is running at 10 Mbps because its bottleneck link has been a 10 Mbps Ethernet link should be indistinguishable. Depending on configuration, the approximation to theoretical smoothness may vary by moving as much as an MTU from one token interval to another. Traffic may also be jostled by other traffic competing for the same transmission resources.

A.2 Token Buckets

A token bucket, on the other hand, measures the arrival rate of traffic from another device. This traffic may originally have been shaped using a leaky bucket shaper or its equivalent. The token bucket determines whether the traffic (still) conforms to the specification. Multi-rate token buckets (e.g., token buckets with
Top   ToC   RFC3290 - Page 51
   both a peak rate and a mean rate, and sometimes more) are commonly
   used, such as those described in [SRTCM] and [TRTCM].  In this case,
   absolute smoothness is not expected, but conformance to one or more
   of the specified rates is.

   Simplistically, a data stream is said to conform to a simple token
   bucket parameterized by a {R, B} if the system receives in any time
   interval, t, at most, an amount of data not exceeding (R * t) + B.

   For a multi-rate token bucket case, the data stream is said to
   conform if, for each of the rates, the stream conforms to the token-
   bucket profile appropriate for traffic of that class.  For example,
   received traffic that arrives pre-classified as one of the "excess"
   rates (e.g., AF12 or AF13 traffic for a device implementing the AF1x
   PHB) is only compared to the relevant "excess" token bucket profile.

A.3 Some Consequences

The fact that Internet Protocol data is organized into variable length packets introduces some uncertainty in the conformance decision made by any downstream Meter that is attempting to determine conformance to a traffic profile that is theoretically designed for fixed-length units of data. When used as a leaky bucket shaper, the above definition interacts with clock granularity in ways one might not expect. A leaky bucket releases a packet only when all of its bits would have been allowed: it does not borrow from future capacity. If the clock is very fine grain, on the order of the bit rate or faster, this is not an issue. But if the clock is relatively slow (and millisecond or multi- millisecond clocks are not unusual in networking equipment), this can introduce jitter to the shaped stream. This leaves an implementor of a token bucket Meter with a dilemma. When the number of bandwidth tokens, b, left in the token bucket is positive but less than the size of the packet being operated on, L, one of three actions can be performed: (1) The whole size of the packet can be subtracted from the bucket, leaving it negative, remembering that, when new tokens are next added to the bucket, the new token allocation, B, must be added to b rather than simply setting the bucket to "full". This option potentially puts more than the desired burst size of data into this token bucket interval and correspondingly less into the next. It does, however, keep the average amount accepted per token bucket interval equal to the token burst. This approach accepts traffic if any one bit in the packet would have been
Top   ToC   RFC3290 - Page 52
            accepted and borrows up to one MTU of capacity from one or
            more subsequent intervals when necessary.  Such a token
            bucket meter implementation is said to offer "loose"
            conformance to the token bucket.

      (2)   Alternatively, the packet can be rejected and the amount of
            tokens in the bucket left unchanged (and maybe an attempt
            could be made to accept the packet under another threshold
            in another bucket), remembering that, when new tokens are
            next added to the bucket, the new token allocation, B, must
            be added to b rather than simply setting the bucket to
            "full".  This potentially puts less than the permissible
            burst size of data into this token bucket interval and
            correspondingly more into the next.  Like the first option,
            it keeps the average amount accepted per token bucket
            interval equal to the token burst.  This approach accepts
            traffic only if every bit in the packet would have been
            accepted and borrows up to one MTU of capacity from one or
            more previous intervals when necessary.  Such a token bucket
            meter implementation is said to offer "strict" (or perhaps
            "stricter") conformance to the token bucket.  This option is
            consistent with [SRTCM] and [TRTCM] and is often used in ATM
            and frame-relay implementations.

      (3)   The TB variable can be set to zero to account for the first
            part of the packet and the remainder of the packet size can
            be taken out of the next-colored bucket.  This, of course,
            has another bug:  the same packet cannot have both
            conforming and non-conforming components in the Diffserv
            architecture and so is not really appropriate here and we do
            not discuss this option further here.

            Unfortunately, the thing that cannot be done is exactly to
            fit the token burst specification with random sized packets:
            therefore token buckets in a variable length packet
            environment always have a some variance from theoretical
            reality.  This has also been observed in the ATM Guaranteed
            Frame Rate (GFR) service category specification and Frame
            Relay.  A number of observations may be made:

   o  Operationally, a token bucket meter is reasonable for traffic
      which has been shaped by a leaky bucket shaper or a serial line.
      However, traffic in the Internet is rarely shaped in that way: TCP
      applies no shaping to its traffic, but rather depends on longer-
      range ACK-clocking behavior to help it approximate a certain rate
      and explicitly sends traffic bursts during slow start,
      retransmission, and fast recovery.  Video-on-IP implementations
      such as [VIC] may have a leaky bucket shaper available to them,
Top   ToC   RFC3290 - Page 53
      but often do not, and simply enqueue the output of their codec for
      transmission on the appropriate interface.  As a result, in each
      of these cases, a token bucket meter may reject traffic in the
      short term (over a single token interval) which it would have
      accepted if it had a longer time in view and which it needs to
      accept for the application to work properly.  To work around this,
      the token interval, B/R, must approximate or exceed the RTT of the
      session(s) in question and the burst size, B, must accommodate the
      largest burst that the originator might send.

   o  The behavior of a loose token bucket is significantly different
      from the token bucket description for ATM and for Frame Relay.

   o  A loose token bucket does not accept packets while the token count
      is negative.  This means that, when a large packet has just
      borrowed tokens from the future, even a small incoming packet
      (e.g., a 40-byte TCP ACK/SYN) will not be accepted.  Therefore, if
      such a loose token bucket is configured with a burst size close to
      the MTU, some discrimination against smaller packets can take
      place: use of a larger burst size avoids this problem.

   o  The converse of the above is that a strict token bucket sometimes
      does not accept large packets when a loose one would do so.
      Therefore, if such a strict token bucket is configured with a
      burst size close to the MTU, some discrimination against larger
      packets can take place: use of a larger burst size avoids this
      problem.

   o  In real-world deployments, MTUs are often larger than the burst
      size offered by a link-layer network service provider.  If so then
      it is possible that a strict token bucket meter would find that
      traffic never matches the specified profile: this may be avoided
      by not allowing such a specification to be used.  This situation
      cannot arise with a loose token bucket since the smallest burst
      size that can be configured is 1 bit, by definition limiting a
      loose token bucket to having a burst size of greater than one MTU.

   o  Both strict token bucket specifications, as specified in [SRTCM]
      and [TRTCM], and loose ones, are subject to a persistent under-
      run.  These accumulate burst capacity over time, up to the maximum
      burst size.  Suppose that the maximum burst size is exactly the
      size of the packets being sent - which one might call the
      "strictest" token bucket implementation.  In such a case, when one
      packet has been accepted, the token depth becomes zero and starts
      to accumulate again.  If the next packet is received any time
      earlier than a token interval later, it will not be accepted.  If
      the next packet arrives exactly on time, it will be accepted and
      the token depth again set to zero.  If it arrives later, however,
Top   ToC   RFC3290 - Page 54
      accumulation of tokens will have stopped because it is capped by
      the maximum burst size: during the interval between the bucket
      becoming full and the actual arrival of the packet, no new tokens
      are added.  As a result, jitter that accumulates across multiple
      hops in the network conspires against the algorithm to reduce the
      actual acceptance rate.  Thus it usually makes sense to set the
      maximum token bucket size somewhat greater than the MTU in order
      to absorb some of the jitter and allow a practical acceptance rate
      more in line with the desired theoretical rate.

A.4 Mathematical Definition of Strict Token Bucket Conformance

The strict token bucket conformance behavior defined in [SRTCM] and [TRTCM] is not mandatory for compliance with any current Diffserv standards, but we give here a mathematical definition of two- parameter token bucket operation which is consistent with those documents and which can also be used to define a shaping profile. Define a token bucket with bucket size B, token accumulation rate R and instantaneous token occupancy b(t). Assume that b(0) = B. Then after an arbitrary interval with no packet arrivals, b(t) will not change since the bucket is already full of tokens. Assume a packet of size L bytes arrives at time t'. The bucket occupancy is still B. Then, as long as L <= B, the packet conforms to the meter, and afterwards b(t') = B - L. Assume now an interval delta_t = t - t' elapses before the next packet arrives, of size L' <= B. Just before this, at time t-, the bucket has accumulated delta_t*R tokens over the interval, up to a maximum of B tokens so that: b(t-) = min{ B, b(t') + delta_t*R } For a strict token bucket, the conformance test is as follows: if (b(t-) - L' >= 0) { /* the packet conforms */ b(t) = b(t-) - L'; } else { /* the packet does not conform */ b(t) = b(t-); }
Top   ToC   RFC3290 - Page 55
   This function can also be used to define a shaping profile.  If a
   packet of size L arrives at time t, it will be eligible for
   transmission at time te given as follows (we still assume L <= B):

                  te = max{ t, t" }

   where t" = (L - b(t') + t'*R) / R and b(t") = L, the time when L
   credits have accumulated in the bucket, and when the packet would
   conform if the token bucket were a meter. te != t" only if t > t".

   A mathematical definition along these lines for loose token bucket
   conformance is left as an exercise for the reader.

Authors' Addresses

Yoram Bernet Microsoft One Microsoft Way Redmond, WA 98052 Phone: +1 425 936 9568 EMail: ybernet@msn.com Steven Blake Ericsson 920 Main Campus Drive, Suite 500 Raleigh, NC 27606 Phone: +1 919 472 9913 EMail: steven.blake@ericsson.com Daniel Grossman Motorola Inc. 20 Cabot Blvd. Mansfield, MA 02048 Phone: +1 508 261 5312 EMail: dan@dma.isg.mot.com Andrew Smith (editor) Harbour Networks Jiuling Building 21 North Xisanhuan Ave. Beijing, 100089 PRC Fax: +1 415 345 1827 EMail: ah_smith@acm.org
Top   ToC   RFC3290 - Page 56
Full Copyright Statement

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.