Metering is defined in [DSARCH]. Diffserv network providers may
choose to offer services to customers based on a temporal (i.e.,
rate) profile within which the customer submits traffic for the
service. In this event, a meter might be used to trigger real-time
traffic conditioning actions (e.g., marking) by routing a non-
conforming packet through an appropriate next-stage action element.
Alternatively, by counting conforming and/or non-conforming traffic
using a Counter element downstream of the Meter, it might also be
used to help in collecting data for out-of-band management functions
such as billing applications.
Meters are logically 1:N (fan-out) devices (although a multiplexor
can be used in front of a meter). Meters are parameterized by a
temporal profile and by conformance levels, each of which is
associated with a meter's output. Each output can be connected to
another functional element.
Note that this model of a meter differs slightly from that described
in [DSARCH]. In that description the meter is not a datapath element
but is instead used to monitor the traffic stream and send control
signals to action elements to dynamically modulate their behavior
based on the conformance of the packet. This difference in the
description does not change the function of a meter. Figure 4
illustrates a meter with 3 levels of conformance.
In some Diffserv examples (e.g., [AF-PHB]), three levels of
conformance are discussed in terms of colors, with green representing
conforming, yellow representing partially conforming and red
representing non-conforming. These different conformance levels may
be used to trigger different queuing, marking or dropping treatment
later on in the processing. Other example meters use a binary notion
of conformance; in the general case N levels of conformance can be
supported. In general there is no constraint on the type of
functional datapath element following a meter output, but care must
be taken not to inadvertently configure a datapath that results in
packet reordering that is not consistent with the requirements of the
relevant PHB specification.
          +---------+
          |         |--------> conformance A
--------->|  meter  |--------> conformance B
          |         |--------> conformance C
          +---------+
Figure 4. A Generic Meter
A meter, according to this model, measures the rate at which packets
making up a stream of traffic pass it, compares the rate to some set
of thresholds, and produces some number of potential results (two or
more): a given packet is said to be "conformant" to a level of the
meter if, at the time that the packet is being examined, the stream
appears to be within the rate limit for the profile associated with
that level. A fuller discussion of conformance to meter profiles
(and the associated requirements that this places on the schedulers
upstream) is provided in Appendix A.
The following are some examples of possible meters.
5.1.1. Average Rate Meter
An example of a very simple meter is an average rate meter. This
type of meter measures the average rate at which packets are
submitted to it over a specified averaging time.
An average rate profile may take the following form:
AverageRate: 120 kbps
Delta: 100 msec
A Meter measuring against this profile would continually maintain a
count that indicates the total number and/or cumulative byte-count of
packets arriving between time T (now) and time T - 100 msecs. So
long as an arriving packet does not push the count over 12 kbits in
the last 100 msec, the packet would be deemed conforming. Any packet
that pushes the count over 12 kbits would be deemed non-conforming.
Thus, this Meter deems packets to correspond to one of two
conformance levels: conforming or non-conforming, and sends them on
for the appropriate subsequent treatment.
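As an illustration only, the sliding-window behavior described above might be sketched as follows. The class name, the method name, and the use of integer millisecond timestamps are assumptions made for this sketch and are not part of the model. The bit budget follows from the profile: 120 kbps over 100 msec is 12 kbits.

```python
from collections import deque


class AverageRateMeter:
    """Two-level average rate meter over a sliding window (hypothetical sketch)."""

    def __init__(self, average_rate_bps, delta_ms):
        # Bits permitted in any window of delta_ms milliseconds.
        self.budget_bits = average_rate_bps * delta_ms // 1000
        self.delta_ms = delta_ms
        self.window = deque()  # (arrival_ms, size_bits) per counted packet
        self.count_bits = 0

    def conforms(self, now_ms, size_bytes):
        # Age out packets that arrived at or before T - Delta.
        while self.window and self.window[0][0] <= now_ms - self.delta_ms:
            _, old_bits = self.window.popleft()
            self.count_bits -= old_bits
        # Every arrival is counted, whether or not it conforms.
        size_bits = size_bytes * 8
        self.window.append((now_ms, size_bits))
        self.count_bits += size_bits
        return self.count_bits <= self.budget_bits


meter = AverageRateMeter(average_rate_bps=120_000, delta_ms=100)
results = [meter.conforms(t, 500) for t in (0, 10, 20, 30, 125)]
```

With 500-byte packets, the fourth arrival pushes the count to 16 kbits and is deemed non-conforming; by 125 msec the earliest three packets have aged out of the window.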
5.1.2. Exponential Weighted Moving Average (EWMA) Meter
The EWMA form of Meter is easy to implement in hardware and can be
parameterized as follows:
avg_rate(t) = (1 - Gain) * avg_rate(t') + Gain * rate(t)
t = t' + Delta
For a packet arriving at time t:
   if (avg_rate(t) > AverageRate)
      non-conforming
   else
      conforming
"Gain" controls the time constant (e.g., frequency response) of what
is essentially a simple IIR low-pass filter. "Rate(t)" measures the
number of incoming bytes in a small fixed sampling interval, Delta.
Any packet that arrives and pushes the average rate over a predefined
rate AverageRate is deemed non-conforming. An EWMA Meter profile
might look something like the following:
AverageRate: 25 kbps
Delta: 10 usec
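The recurrence above could be exercised with a sketch along the following lines. It is an assumption-laden illustration rather than a reference implementation: the class and method names are hypothetical, time is expressed as an integer index of sampling intervals, the initial average is taken as zero, and the Gain of 0.5 used in the usage lines is chosen only to keep the arithmetic easy to follow.

```python
class EwmaMeter:
    """EWMA meter: avg(t) = (1 - Gain) * avg(t') + Gain * rate(t) (sketch)."""

    def __init__(self, average_rate_bps, delta_s, gain):
        self.limit_bps = average_rate_bps
        self.delta_s = delta_s
        self.gain = gain
        self.avg_bps = 0.0   # avg_rate(t') from the last completed interval
        self.tick = 0        # index of the current sampling interval
        self.bytes_seen = 0  # bytes counted in the current interval

    def _provisional_avg(self):
        # rate(t): bytes seen in this interval, converted to bits per second.
        rate_bps = self.bytes_seen * 8 / self.delta_s
        return (1 - self.gain) * self.avg_bps + self.gain * rate_bps

    def conforms(self, tick, size_bytes):
        # Close out every sampling interval that ended before this arrival.
        while self.tick < tick:
            self.avg_bps = self._provisional_avg()
            self.bytes_seen = 0
            self.tick += 1
        # The arriving packet is included in the current sample.
        self.bytes_seen += size_bytes
        return self._provisional_avg() <= self.limit_bps


meter = EwmaMeter(average_rate_bps=1000, delta_s=1.0, gain=0.5)
results = [meter.conforms(0, 100), meter.conforms(0, 200), meter.conforms(1, 50)]
```

A hardware implementation would more likely update the average once per Delta rather than looping over idle intervals; the loop is kept here for clarity.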
5.1.3. Two-Parameter Token Bucket Meter
A more sophisticated Meter might measure conformance to a token
bucket (TB) profile. A TB profile generally has two parameters, an
average token rate, R, and a burst size, B. TB Meters compare the
arrival rate of packets to the average rate specified by the TB
profile. Logically, tokens accumulate in a bucket at the average
rate, R, up to a maximum credit which is the burst size, B. When a
packet of length L arrives, a conformance test is applied. There are
at least two such tests in widespread use:
Strict conformance
Packets of length L bytes are considered conforming only if there
are sufficient tokens available in the bucket at the time of
packet arrival for the complete packet (i.e., the current depth is
greater than or equal to L): no tokens may be borrowed from future
token allocations. For an example of this approach, see [SRTCM].
Loose conformance
Packets of length L bytes are considered conforming if any tokens
are available in the bucket at the time of packet arrival: up to L
bytes may then be borrowed from future token allocations.
Packets are allowed to exceed the average rate in bursts up to the
burst size. For further discussion of loose and strict conformance
to token bucket profiles, as well as system and implementation
issues, see Appendix A.
A two-parameter TB meter has exactly two possible conformance levels
(conforming, non-conforming). Such a meter might appear as follows:
AverageRate: 200 kbps
BurstSize: 100 kbytes
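The two conformance tests described above might be sketched as follows. The class name, the "strict" flag, and the choice to start with a full bucket are assumptions of this sketch; rates are expressed in bytes per second, so the 200 kbps profile above would correspond to 25000 bytes per second.

```python
class TokenBucketMeter:
    """Two-parameter token bucket meter (hypothetical sketch)."""

    def __init__(self, rate_bytes_per_s, burst_bytes, strict=True):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.strict = strict
        self.tokens = burst_bytes  # assumption: the bucket starts full
        self.last_s = 0.0

    def conforms(self, now_s, length):
        # Accumulate tokens at rate R, capped at the burst size B.
        elapsed = now_s - self.last_s
        self.last_s = now_s
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.strict:
            ok = self.tokens >= length  # the whole packet must be covered
        else:
            ok = self.tokens > 0        # any token suffices; the rest is borrowed
        if ok:
            self.tokens -= length       # may go negative when borrowing
        return ok


strict = TokenBucketMeter(rate_bytes_per_s=1000, burst_bytes=1500, strict=True)
strict_results = [strict.conforms(0.0, 1000), strict.conforms(0.0, 1000),
                  strict.conforms(0.5, 1000)]

loose = TokenBucketMeter(rate_bytes_per_s=1000, burst_bytes=1500, strict=False)
loose_results = [loose.conforms(0.0, 1000), loose.conforms(0.0, 1000),
                 loose.conforms(0.0, 1000), loose.conforms(1.0, 1000)]
```

Under the loose test the second packet is accepted against only 500 remaining tokens, leaving a deficit that must be repaid before a later packet can conform.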
5.1.4. Multi-Stage Token Bucket Meter
More complicated TB meters might define multiple burst sizes and more
conformance levels. Packets found to exceed the larger burst size
are deemed non-conforming. Packets found to exceed the smaller burst
size are deemed partially-conforming. Packets exceeding neither are
deemed conforming. Some token bucket meters designed for Diffserv
networks are described in more detail in [SRTCM, TRTCM]; in some of
these references, three levels of conformance are discussed in terms
of colors with green representing conforming, yellow representing
partially conforming, and red representing non-conforming. Note that
these multiple-conformance-level meters can sometimes be implemented
using an appropriate sequence of multiple two-parameter TB meters.
A profile for a multi-stage TB meter with three levels of conformance
might look as follows:
AverageRate: 100 kbps
BurstSize: 20 kbytes
AverageRate: 100 kbps
BurstSize: 100 kbytes
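One possible arrangement, sketched below, chains two strict two-parameter token buckets: packets that fail the smaller bucket are tested against the larger one. This is only one of several plausible constructions, and it is not the algorithm of [SRTCM] or [TRTCM]. The names Bucket and three_level_meter are hypothetical, and the small numbers in the usage lines are chosen for readability rather than taken from the profile above.

```python
class Bucket:
    """Strict token bucket used as one stage of the cascade (sketch)."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # assumption: the bucket starts full
        self.last_s = 0.0

    def take(self, now_s, length):
        self.tokens = min(self.burst,
                          self.tokens + (now_s - self.last_s) * self.rate)
        self.last_s = now_s
        if self.tokens >= length:
            self.tokens -= length
            return True
        return False


def three_level_meter(small, large):
    """Return a classifier function built from two cascaded buckets."""
    def classify(now_s, length):
        if small.take(now_s, length):
            return "conforming"
        # Only packets that exceed the smaller burst reach the second stage.
        if large.take(now_s, length):
            return "partially-conforming"
        return "non-conforming"
    return classify


classify = three_level_meter(Bucket(1000, 1000), Bucket(1000, 3000))
levels = [classify(0.0, 1000) for _ in range(5)]
```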
5.1.5. Null Meter
A null meter has only one output: always conforming, and no
associated temporal profile. Such a meter is useful to define in the
event that the configuration or management interface does not have
the flexibility to omit a meter in a datapath segment.
6. Action Elements
The classifiers and meters described up to this point are fan-out
elements which are generally used to determine the appropriate action
to apply to a packet. The set of possible actions that can then be
applied includes:
- Marking
- Absolute Dropping
- Multiplexing
- Counting
- Null action - do nothing
The corresponding action elements are described in the following
sections.
6.1. DSCP Marker
DSCP Markers are 1:1 elements which set a codepoint (e.g., the DSCP
in an IP header). DSCP Markers may also act on unmarked packets
(e.g., those submitted with DSCP of zero) or may re-mark previously
marked packets. In particular, the model supports the application of
marking based on a preceding classifier match. The mark set in a
packet will determine its subsequent PHB treatment in downstream
nodes of a network and possibly also in subsequent processing stages
within this router.
DSCP Markers for Diffserv are normally parameterized by a single
parameter: the 6-bit DSCP to be marked in the packet header.
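As a small illustration, the marking operation on an IPv4 header can be thought of as rewriting the upper six bits of the octet that carries the DS field while leaving the two low-order bits, which are not part of the DSCP, untouched. The function name mark_dscp is hypothetical.

```python
def mark_dscp(ds_octet, dscp):
    """Return the DS octet with its 6-bit DSCP replaced (sketch)."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    # The DSCP occupies the six high-order bits; keep the two low-order bits.
    return (dscp << 2) | (ds_octet & 0x03)


marked = mark_dscp(0x00, 46)     # mark a previously unmarked packet
remarked = mark_dscp(0xFF, 10)   # re-mark, preserving the low-order bits
```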
6.2. Absolute Dropper
Absolute Droppers simply discard packets. There are no parameters
for these droppers. Because this Absolute Dropper is a terminating
point of the datapath and has no outputs, it is probably desirable to
forward the packet through a Counter Action first for instrumentation
purposes.
Absolute Droppers are not the only elements that can cause a packet
to be discarded: another element is an Algorithmic Dropper element
(see Section 7.1.3). However, since this element's behavior is
closely tied to the state of one or more queues, we choose to
distinguish it as a separate functional datapath element.
6.3. Multiplexor
It is occasionally necessary to multiplex traffic streams into a
functional datapath element with a single input. An M:1 (fan-in)
multiplexor is a simple logical device for merging traffic streams.
It is parameterized by its number of incoming ports.
6.4. Counter
One passive action is to account for the fact that a data packet was
processed. The statistics that result might be used later for
customer billing, service verification or network engineering
purposes. Counters are 1:1 functional datapath elements which update
a byte counter by L and a packet counter by 1 every time an L-byte
packet passes through them. Counters can be used to count packets
about to be dropped by an Absolute Dropper or to count packets
arriving at or departing from some other functional element.
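A counter placed in front of an Absolute Dropper might be sketched as follows. The class, its attribute names, and the convention that an element is a callable returning the packet (or None when the packet is discarded) are assumptions of this sketch only.

```python
class Counter:
    """1:1 element that counts bytes and packets, then passes the packet on."""

    def __init__(self, next_element=None):
        self.octets = 0
        self.packets = 0
        self.next_element = next_element

    def process(self, packet):
        self.octets += len(packet)  # byte counter advances by L
        self.packets += 1           # packet counter advances by 1
        if self.next_element is not None:
            return self.next_element(packet)
        return packet


def absolute_dropper(packet):
    # Terminating element: the packet is discarded and nothing is forwarded.
    return None


drop_counter = Counter(next_element=absolute_dropper)
forwarded = drop_counter.process(b"\x00" * 100)
```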
6.5. Null Action
A null action has one input and one output. The element performs no
action on the packet. Such an element is useful to define in the
event that the configuration or management interface does not have
the flexibility to omit an action element in a datapath segment.
7. Queuing Elements
Queuing elements modulate the transmission of packets belonging to
the different traffic streams and determine their ordering, possibly
storing them temporarily or discarding them. Packets are usually
stored either because there is a resource constraint (e.g., available
bandwidth) which prevents immediate forwarding, or because the
queuing block is being used to alter the temporal properties of a
traffic stream (i.e., shaping). Packets are discarded for one of the
following reasons:
- because of buffering limitations.
- because a buffer threshold is exceeded (including when shaping
is performed).
- as a feedback control signal to reactive control protocols such
as TCP.
- because a meter exceeds a configured profile (i.e., policing).
The queuing elements in this model represent a logical abstraction of
a queuing system which is used to configure PHB-related parameters.
The model can be used to represent a broad variety of possible
implementations. However, it need not necessarily map one-to-one
with physical queuing systems in a specific router implementation.
Implementors should map the configurable parameters of the
implementation's queuing systems to these queuing element parameters
as appropriate to achieve equivalent behaviors.
7.1. Queuing Model
Queuing is a function which lends itself to innovation. It must be
modeled to allow a broad range of possible implementations to be
represented using common structures and parameters. This model uses
functional decomposition as a tool to permit the needed latitude.
Queuing systems perform three distinct, but related, functions: they
store packets, they modulate the departure of packets belonging to
various traffic streams and they selectively discard packets. This
model decomposes queuing into the component elements that perform
each of these functions: Queues, Schedulers, and Algorithmic
Droppers, respectively. These elements may be connected together as
part of a TCB, as described in section 8.
The remainder of this section discusses FIFO Queues: typically, the
Queue element of this model will be implemented as a FIFO data
structure. However, this does not preclude implementations which are
not strictly FIFO, in that they also support operations that remove
or examine packets (e.g., for use by discarders) other than at the
head or tail. However, such operations must not have the effect of
reordering packets belonging to the same microflow.
Note that the term FIFO has multiple different common usages: it is
sometimes taken to mean, among other things, a data structure that
permits items to be removed only in the order in which they were
inserted or a service discipline which is non-reordering.
7.1.1. FIFO Queue
In this model, a FIFO Queue element is a data structure which at any
time may contain zero or more packets. It may have one or more
thresholds associated with it. A FIFO has one or more inputs and
exactly one output. It must support an enqueue operation to add a
packet to the tail of the queue and a dequeue operation to remove a
packet from the head of the queue. Packets must be dequeued in the
order in which they were enqueued. A FIFO has a current depth, which
indicates the number of packets and/or bytes that it contains at a
particular time. FIFOs in this model are modeled without inherent
limits on their depth - obviously this does not reflect the reality
of implementations: FIFO size limits are modeled here by an
algorithmic dropper associated with the FIFO, typically at its input.
It is quite likely that every FIFO will be preceded by an algorithmic
dropper. One exception might be the case where the packet stream has
already been policed to a profile that can never exceed the scheduler
bandwidth available at the FIFO's output - this would not need an
algorithmic dropper at the input to the FIFO.
This representation of a FIFO allows for one common type of depth
limit, one that results from a FIFO supplied from a limited pool of
buffers, shared between multiple FIFOs.
In an implementation, packets are presumably stored in one or more
buffers. Buffers are allocated from one or more free buffer pools.
If there are multiple instances of a FIFO, their packet buffers may
or may not be allocated out of the same free buffer pool. Free
buffer pools may also have one or more thresholds associated with
them, which may affect discarding and/or scheduling. Other than
this, buffering mechanisms are implementation specific and not part
of this model.
A FIFO might be represented using the following parameters:
Note that a FIFO must provide triggers and/or current state
information to other elements upstream and downstream from it: in
particular, it is likely that the current depth will need to be used
by Algorithmic Dropper elements placed before or after the FIFO. It
will also likely need to provide an implicit "I have packets for you"
signal to downstream Scheduler elements.
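The operations and state described in this section might be sketched as below. This is an illustrative data structure only: the class and attribute names are hypothetical, depth limits are deliberately absent (they are modeled by a separate Algorithmic Dropper), and has_packets stands in for the implicit signal to a downstream Scheduler.

```python
from collections import deque


class FifoQueue:
    """Unbounded FIFO exposing its current depth to other elements (sketch)."""

    def __init__(self):
        self._packets = deque()
        self.depth_bytes = 0

    @property
    def depth_packets(self):
        return len(self._packets)

    def enqueue(self, packet):
        # Add a packet at the tail of the queue.
        self._packets.append(packet)
        self.depth_bytes += len(packet)

    def dequeue(self):
        # Remove the packet at the head; raises IndexError if the queue is empty.
        packet = self._packets.popleft()
        self.depth_bytes -= len(packet)
        return packet

    def has_packets(self):
        # Stand-in for the "I have packets for you" signal.
        return bool(self._packets)


fifo = FifoQueue()
fifo.enqueue(b"first")
fifo.enqueue(b"second!")
depth_before = (fifo.depth_packets, fifo.depth_bytes)
head = fifo.dequeue()
```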
7.1.2. Scheduler
A scheduler is an element which gates the departure of each packet
that arrives at one of its inputs, based on a service discipline. It
has one or more inputs and exactly one output. Each input has an
upstream element to which it is connected, and a set of parameters
that affects the scheduling of packets received at that input.
The service discipline (also known as a scheduling algorithm) is an
algorithm which might take any of the following as its input(s):
a) static parameters such as relative priority associated with each
of the scheduler's inputs.
b) absolute token bucket parameters for maximum or minimum rates
associated with each of the scheduler's inputs.
c) parameters, such as packet length or DSCP, associated with the
packet currently present at its input.
d) absolute time and/or local state.
Possible service disciplines fall into a number of categories,
including (but not limited to) first come, first served (FCFS),
strict priority, weighted fair bandwidth sharing (e.g., WFQ), rate-
limited strict priority, and rate-based. Service disciplines can be
further distinguished by whether they are work-conserving or non-
work-conserving (see Glossary). Non-work-conserving schedulers can
be used to shape traffic streams to match some profile by delaying
packets that might be deemed non-conforming by some downstream node:
a packet is delayed until such time as it would conform to a
downstream meter using the same profile.
[DSARCH] defines PHBs without specifying required scheduling
algorithms. However, PHBs such as the class selectors [DSFIELD], EF
[EF-PHB] and AF [AF-PHB] have descriptions or configuration
parameters which strongly suggest the sort of scheduling discipline
needed to implement them. This document discusses a minimal set of
queue parameters to enable realization of these PHBs. It does not
attempt to specify an all-embracing set of parameters to cover all
possible implementation models. A minimal set includes:
a) a minimum service rate profile which allows rate guarantees for
each traffic stream as required by EF and AF without specifying
the details of how excess bandwidth between these traffic streams
is shared. Additional parameters to control this behavior should
be made available, but are dependent on the particular scheduling
algorithm used.
b) a service priority, used only after the minimum rate profiles of
all inputs have been satisfied, to decide how to allocate any
remaining bandwidth.
c) a maximum service rate profile, for use only with a non-work-
conserving service discipline.
Any one of these profiles is composed, for the purposes of this
model, of both a rate (in suitable units of bits, bytes or larger
chunks in some unit of time) and a burst size, as discussed further
in Appendix A.
By way of example, for an implementation of the EF PHB using a strict
priority scheduling algorithm that assumes that the aggregate EF rate
has been appropriately bounded by upstream policing to avoid
starvation of other BAs, the service rate profiles are not used: the
minimum service rate profile would be defaulted to zero and the
maximum service rate profile would effectively be the "line rate".
Such an implementation, with multiple priority classes, could also be
used for the Diffserv class selectors [DSFIELD].
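A strict priority discipline of the kind just described might be sketched as follows. The sketch ignores the minimum and maximum service rate profiles, in line with the example above, and assumes that upstream policing bounds the high-priority load. The class and method names are hypothetical, and each input is represented by a plain deque rather than a full FIFO element.

```python
from collections import deque


class StrictPriorityScheduler:
    """Work-conserving scheduler that always serves the highest priority input."""

    def __init__(self):
        self._inputs = []  # list of (priority, queue), highest priority first

    def add_input(self, priority):
        queue = deque()
        self._inputs.append((priority, queue))
        # Stable sort: inputs of equal priority keep their insertion order.
        self._inputs.sort(key=lambda entry: -entry[0])
        return queue

    def next_packet(self):
        # Serve the first non-empty input in priority order.
        for _, queue in self._inputs:
            if queue:
                return queue.popleft()
        return None  # nothing to send


scheduler = StrictPriorityScheduler()
best_effort = scheduler.add_input(priority=0)
expedited = scheduler.add_input(priority=7)
best_effort.extend(["be-1", "be-2"])
expedited.append("ef-1")
order = [scheduler.next_packet() for _ in range(4)]
```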
Alternatively, setting the service priority values for each input to
the scheduler to the same value enables the scheduler to satisfy the
minimum service rates for each input, so long as the sum of all
minimum service rates is less than or equal to the line rate.
For example, a non-work-conserving scheduler, allocating spare
bandwidth equally between all its inputs, might be represented using
the following parameters:
A work-conserving scheduler might be represented using the following
7.1.3. Algorithmic Dropper
An Algorithmic Dropper is an element which selectively discards
packets that arrive at its input, based on a discarding algorithm.
It has one data input and one output. In this model (but not
necessarily in a real implementation), a packet enters the dropper at
its input and either its buffer is returned to a free buffer pool or
the packet exits the dropper at the output.
Alternatively, an Algorithmic Dropper can be thought of as invoking
operations on a FIFO Queue which selectively remove a packet and
return its buffer to the free buffer pool based on a discarding
algorithm. In this case, the operation could be modeled as being a
side-effect on the FIFO upon which it operated, rather than as having
a discrete input and output. This treatment is equivalent and we
choose the one described in the previous paragraph for this model.
One of the primary characteristics of an Algorithmic Dropper is the
choice of which packet (if any) is to be dropped: for the purposes of
this model, we restrict the packet selection choices to one of the
following and we indicate the choice by the relative positions of
Algorithmic Dropper and FIFO Queue elements in the model:
a) selection of a packet that is about to be added to the tail of a
queue (a "Tail Dropper"): the output of the Algorithmic Dropper
element is connected to the input of the relevant FIFO Queue
element.
b) a packet that is currently at the head of a queue (a "Head
Dropper"): the output of the FIFO Queue element is connected to
the input of the Algorithmic Dropper element.
Other packet selection methods could be added to this model in the
form of a different type of datapath element.
The Algorithmic Dropper is modeled as having a single input. It is
possible that packets which were classified differently by a
Classifier in this TCB will end up passing through the same dropper.
The dropper's algorithm may need to apply different calculations
based on characteristics of the incoming packet (e.g., its DSCP). So
there is a need, in implementations of this model, to be able to
relate information about which classifier element was matched by a
packet from a Classifier to an Algorithmic Dropper. In the rare
cases where this is required, the chosen model is to insert another
Classifier element at this point in the flow and for it to feed into
multiple Algorithmic Dropper elements, each one implementing a drop
calculation that is independent of any classification keys of the
packet: this will likely require the creation of a new TCB to contain
the Classifier and the Algorithmic Dropper elements.
NOTE: There are many other formulations of a model that could
represent this linkage that are different from the one described
above: one formulation would have been to have a pointer from one
of the drop probability calculation algorithms inside the dropper
to the original Classifier element that selects this algorithm.
Another way would have been to have multiple "inputs" to the
Algorithmic Dropper element fed from the preceding elements,
leading eventually back to the Classifier elements that matched
the packet. Yet another formulation might have been for the
Classifier to (logically) include some sort of "classification
identifier" along with the packet along its path, for use by any
subsequent element. And yet another could have been to include a
classifier inside the dropper, in order for it to pick out the
drop algorithm to be applied. These other approaches could be
used by implementations but were deemed to be less clear than the
approach taken here.
An Algorithmic Dropper, an example of which is illustrated in Figure
5, has one or more triggers that cause it to make a decision whether
or not to drop one (or possibly more than one) packet. A trigger may
be internal (the arrival of a packet at the input to the dropper) or
it may be external (resulting from one or more state changes at
another element, such as a FIFO Queue depth crossing a threshold or a
scheduling event). It is likely that an instantaneous FIFO depth
will need to be smoothed over some averaging interval before being
used as a useful trigger. Some dropping algorithms may require
several trigger inputs feeding back from events elsewhere in the
system (e.g., depth-smoothing functions that calculate averages over
more than one time interval).
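            +------------------+      +-----------+
            | +-------+        |  n   |smoothing  |
            | |trigger|<----------/---|function(s)|
            | |calc.  |        |      |(optional) |
            | +-------+        |      +-----------+
            |     |            |          ^
            |     v            |          |Depth
Input       | +-------+ no     |      ------------+   to Scheduler
----------->| |discard|-------------->    |x|x|x|x|------->
            | |   ?   |        |      ------------+
            | +-------+        |           FIFO
            |    |yes          |
            |  | | |           |
            |  | v | count +   |
            |  +---+ bit-bucket|
            +------------------+
            Algorithmic
            Dropper
Figure 5. Example of Algorithmic Dropper from Tail of a Queue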
A trigger may be a boolean combination of events (e.g., a FIFO depth
exceeding a threshold OR a buffer pool depth falling below a
threshold). It takes as its input some set of dynamic parameters
(e.g., smoothed or instantaneous FIFO depth), and some set of static
parameters (e.g., thresholds), and possibly other parameters
associated with the packet. It may also have internal state (e.g.,
history of its past actions). Note that, although an Algorithmic
Dropper may require knowledge of data fields in a packet, as
discovered by a Classifier in the same TCB, it may not modify the
packet (i.e., it is not a marker).
The result of the trigger calculation is that the dropping algorithm
makes a decision on whether to forward or to discard a packet. The
discarding function is likely to keep counters regarding the
discarded packets (there is no appropriate place here to include a
Counter Action element).
The example in Figure 5 also shows a FIFO Queue element from whose
tail the dropping is to take place and whose depth characteristics
are used by this Algorithmic Dropper. It also shows where a depth-
smoothing function might be included: smoothing functions are outside
the scope of this document and are not modeled explicitly here; we
merely indicate where they might be added.
RED, RED-on-In-and-Out (RIO) and Drop-on-threshold are examples of
dropping algorithms. Tail-dropping and head-dropping are effected by
the location of the Algorithmic Dropper element relative to the FIFO
Queue element. As an example, a dropper using a RIO algorithm might
be represented using 2 Algorithmic Droppers with the following
parameters:
AlgorithmicDropper1: (for in-profile traffic)
MinThresh: Fifo1.Depth > 20 kbyte
MaxThresh: Fifo1.Depth > 30 kbyte
AlgorithmicDropper2: (for out-of-profile traffic)
MinThresh: Fifo1.Depth > 10 kbyte
MaxThresh: Fifo1.Depth > 20 kbyte
Another form of Algorithmic Dropper, a threshold-dropper, might be
represented using the following parameters:
Trigger: Fifo2.Depth > 20 kbyte
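To show how such thresholds might drive a drop decision, the sketch below uses the usual RED-style linear ramp between MinThresh and MaxThresh. Several things here are assumptions rather than part of the parameters above: the maximum drop probability (max_drop_prob), the reading of "kbyte" as 1000 bytes, the function names, and the convention that the caller supplies both the (smoothed or instantaneous) depth and a uniform random draw in [0, 1).

```python
KBYTE = 1000  # assumption: "kbyte" is read as 1000 bytes

# (MinThresh, MaxThresh) per dropper, taken from the RIO example above.
RIO_THRESHOLDS = {
    True: (20 * KBYTE, 30 * KBYTE),   # in-profile traffic
    False: (10 * KBYTE, 20 * KBYTE),  # out-of-profile traffic
}


def red_drop_probability(depth, min_thresh, max_thresh, max_drop_prob):
    """Linear ramp from 0 at MinThresh to max_drop_prob at MaxThresh."""
    if depth <= min_thresh:
        return 0.0
    if depth > max_thresh:
        return 1.0
    return max_drop_prob * (depth - min_thresh) / (max_thresh - min_thresh)


def rio_should_drop(depth, in_profile, draw, max_drop_prob):
    """Decide whether to drop, given a uniform random draw in [0, 1)."""
    min_thresh, max_thresh = RIO_THRESHOLDS[in_profile]
    return draw < red_drop_probability(depth, min_thresh, max_thresh,
                                       max_drop_prob)


def threshold_should_drop(depth, trigger=20 * KBYTE):
    """Threshold dropper: discard whenever the depth exceeds the trigger."""
    return depth > trigger


p_out = red_drop_probability(15 * KBYTE, 10 * KBYTE, 20 * KBYTE, 0.5)
```

At a depth of 15 kbyte, in-profile packets are never dropped while out-of-profile packets sit halfway up their ramp, which is the differentiation RIO is intended to provide.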
7.2. Sharing load among traffic streams using queuing
Queues are used, in Differentiated Services, for a number of
purposes. In essence, they are simply places to store traffic until
it is transmitted. However, when several queues are used together in
a queuing system, they can also achieve effects beyond that for given
traffic streams. They can be used to limit variation in delay or
impose a maximum rate (shaping), to permit several streams to share a
link in a semi-predictable fashion (load sharing), or to move
variation in delay from some streams to other streams.
Traffic shaping is often used to condition traffic, such that packets
arriving in a burst will be "smoothed" and deemed conforming by
subsequent downstream meters in this or other nodes. In [DSARCH] a
shaper is described as a queuing element controlled by a meter which
defines its temporal profile. However, this representation of a
shaper differs substantially from typical shaper implementations.
In the model described here, a shaper is realized by using a non-
work-conserving Scheduler. Some implementations may elect to have
queues whose sole purpose is shaping, while others may integrate the
shaping function with other buffering, discarding, and scheduling
associated with access to a resource. Shapers operate by delaying
the departure of packets that would be deemed non-conforming by a
meter configured to the shaper's maximum service rate profile. The
packet is scheduled to depart no sooner than such time that it would
become conforming.
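The delay computation performed by a shaper might be sketched as follows, using a token bucket as the maximum service rate profile. The class and method names are hypothetical, the bucket is assumed to start full, packets are assumed to be no longer than the burst size, and packets are assumed to depart in arrival order.

```python
class Shaper:
    """Computes the earliest conforming departure time for each packet (sketch)."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # assumption: the bucket starts full
        self.last_s = 0.0          # time of the previous departure

    def departure_time(self, arrival_s, length):
        # A packet cannot leave before it arrives or before its predecessor.
        start = max(arrival_s, self.last_s)
        tokens = min(self.burst,
                     self.tokens + (start - self.last_s) * self.rate)
        # Wait just long enough for the missing tokens to accumulate.
        wait = max(0.0, (length - tokens) / self.rate)
        depart = start + wait
        self.tokens = min(self.burst, tokens + wait * self.rate) - length
        self.last_s = depart
        return depart


shaper = Shaper(rate_bytes_per_s=1000, burst_bytes=1000)
departures = [shaper.departure_time(t, 1000) for t in (0, 0, 0, 5.0)]
```

Three back-to-back 1000-byte packets are spread one second apart, while the packet arriving after an idle period departs immediately because the bucket has refilled.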
7.2.1. Load Sharing
Load sharing is the traditional use of queues and was theoretically
explored by Floyd & Jacobson [FJ95], although it has been in use in
communications systems since the 1970s.
[DSARCH] discusses load sharing as dividing an interface among
traffic classes predictably, or applying a minimum rate to each of a
set of traffic classes, which might be measured as an absolute lower
bound on the rate a traffic stream achieves or a fraction of the rate
an interface offers. It is generally implemented as some form of
weighted queuing algorithm among a set of FIFO queues, i.e., a WFQ
scheme. This has interesting side-effects.
A key effect sought is to ensure that the mean rate the traffic in a
stream experiences is never lower than some threshold when there is
at least that much traffic to send. When there is less traffic than
this, the queue tends to be starved of traffic, meaning that the
queuing system will not delay its traffic by very much. When there
is significantly more traffic and the queue starts filling, packets
in this class will be delayed significantly more than traffic in
other classes that are under-using their available capacity. This
form of queuing system therefore tends to move delay and variation in
delay from under-used classes of traffic to heavier users, as well as
managing the rates of the traffic streams.
A side-effect of a WRR or WFQ implementation is that between any two
packets in a given traffic class, the scheduler may emit one or more
packets from each of the other classes in the queuing system. In
cases where average behavior is in view, this is perfectly
acceptable. In cases where traffic is very intolerant of jitter and
there are a number of competing classes, this may have undesirable
consequences.
7.2.2. Traffic Priority
Traffic Prioritization is a special case of load sharing, wherein a
certain traffic class is deemed so jitter-intolerant that if it has
traffic present, that traffic must be sent at the earliest possible
time. By extension, several priorities might be defined, such that
traffic in each of several classes is given preferential service over
any traffic of a lower class. It is the obvious implementation of IP
Precedence as described in [RFC 791], of 802.1p traffic classes
[802.1D], and other similar technologies.
Priority is often abused in real networks; people tend to think that
traffic which has a high business priority deserves this treatment
and talk more about the business imperatives than the actual
application requirements. This can have severe consequences;
networks have been configured which placed business-critical traffic
at a higher priority than routing-protocol traffic, resulting in
collapse of the network's management or control systems. However, it
may have a legitimate use for services based on an Expedited
Forwarding (EF) PHB, where it is absolutely sure, thanks to policing
at all possible traffic entry points, that a traffic stream does not
abuse its rate and that the application is indeed jitter-intolerant
enough to merit this type of handling. Note that, even in cases with
well-policed ingress points, there is still the possibility of
unexpected traffic loops within an un-policed core part of the
network causing such collapse.