DSARCH]. Diffserv network providers may choose to offer services to customers based on a temporal (i.e., rate) profile within which the customer submits traffic for the service. In this event, a meter might be used to trigger real-time traffic conditioning actions (e.g., marking) by routing a non-conforming packet through an appropriate next-stage action element. Alternatively, by counting conforming and/or non-conforming traffic using a Counter element downstream of the Meter, it might also be used to help in collecting data for out-of-band management functions such as billing applications.

Meters are logically 1:N (fan-out) devices (although a multiplexor can be used in front of a meter). Meters are parameterized by a temporal profile and by conformance levels, each of which is associated with a meter's output. Each output can be connected to another functional element.

Note that this model of a meter differs slightly from that described in [DSARCH]. In that description the meter is not a datapath element but is instead used to monitor the traffic stream and send control signals to action elements to dynamically modulate their behavior based on the conformance of the packet. This difference in the description does not change the function of a meter.

Figure 4 illustrates a meter with 3 levels of conformance. In some Diffserv examples (e.g., [AF-PHB]), three levels of conformance are discussed in terms of colors, with green representing conforming, yellow representing partially conforming and red representing non-conforming. These different conformance levels may be used to trigger different queuing, marking or dropping treatment later on in the processing. Other example meters use a binary notion of conformance; in the general case N levels of conformance can be supported.
In general there is no constraint on the type of functional datapath element following a meter output, but care must be taken not to inadvertently configure a datapath that results in packet reordering that is not consistent with the requirements of the relevant PHB specification.
      unmetered            metered
      traffic              traffic

                 +---------+
                 |         |--------> conformance A
       --------->|  meter  |--------> conformance B
                 |         |--------> conformance C
                 +---------+

             Figure 4.  A Generic Meter

A meter, according to this model, measures the rate at which packets making up a stream of traffic pass it, compares the rate to some set of thresholds, and produces some number of potential results (two or more): a given packet is said to be "conformant" to a level of the meter if, at the time that the packet is being examined, the stream appears to be within the rate limit for the profile associated with that level. A fuller discussion of conformance to meter profiles (and the associated requirements that this places on the schedulers upstream) is provided in Appendix A.
that pushes the count over 12 kbits would be deemed non-conforming. Thus, this Meter deems packets to correspond to one of two conformance levels: conforming or non-conforming, and sends them on for the appropriate subsequent treatment.
rate, R, up to a maximum credit which is the burst size, B. When a packet of length L arrives, a conformance test is applied. There are at least two such tests in widespread use:

Strict conformance
   Packets of length L bytes are considered conforming only if there are sufficient tokens available in the bucket at the time of packet arrival for the complete packet (i.e., the current depth is greater than or equal to L): no tokens may be borrowed from future token allocations. For examples of this approach, see [SRTCM] and [TRTCM].

Loose conformance
   Packets of length L bytes are considered conforming if any tokens are available in the bucket at the time of packet arrival: up to L bytes may then be borrowed from future token allocations. Packets are allowed to exceed the average rate in bursts up to the burst size.

For further discussion of loose and strict conformance to token bucket profiles, as well as system and implementation issues, see Appendix A.

A two-parameter TB meter has exactly two possible conformance levels (conforming, non-conforming). Such a meter might appear as follows:

   Meter3:
     Type:                SimpleTokenBucket
     Profile:             Profile3
     ConformanceType:     loose
     ConformingOutput:    Queue1
     NonConformingOutput: AbsoluteDropper1

   Profile3:
     Type:                SimpleTokenBucket
     AverageRate:         200 kbps
     BurstSize:           100 kbytes

SRTCM, TRTCM]; in some of these references, three levels of conformance are discussed in terms of colors with green representing conforming, yellow representing partially conforming, and red representing non-conforming. Note that
these multiple-conformance-level meters can sometimes be implemented using an appropriate sequence of multiple two-parameter TB meters. A profile for a multi-stage TB meter with three levels of conformance might look as follows:

   Meter4:
     Type:                TwoRateTokenBucket
     ProfileA:            Profile4
     ConformanceTypeA:    strict
     ConformingOutputA:   Queue1
     ProfileB:            Profile5
     ConformanceTypeB:    strict
     ConformingOutputB:   Marker1
     NonConformingOutput: AbsoluteDropper1

   Profile4:
     Type:                SimpleTokenBucket
     AverageRate:         100 kbps
     BurstSize:           20 kbytes

   Profile5:
     Type:                SimpleTokenBucket
     AverageRate:         100 kbps
     BurstSize:           100 kbytes
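As an illustrative sketch only (Python, with invented names; not part of the model itself), a two-parameter token bucket supporting the strict and loose conformance tests described above can be cascaded, echoing the Meter4 example, to yield three conformance levels:

```python
class TokenBucket:
    """Two-parameter token bucket sketch: fill rate R (from AverageRate)
    and maximum credit B (from BurstSize). Names are illustrative."""

    def __init__(self, average_rate_bps, burst_size_bytes):
        self.rate = average_rate_bps / 8.0    # token fill rate, bytes/sec
        self.burst = float(burst_size_bytes)  # maximum credit B
        self.tokens = self.burst              # start with a full bucket
        self.last = 0.0                       # time of last arrival

    def conforms(self, length, now, strict=True):
        """Apply the strict or loose conformance test to an L-byte packet."""
        # Credit tokens at rate R for the elapsed time, capped at B.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Strict: the whole packet must be covered by available tokens.
        # Loose: any available token admits; up to L bytes are borrowed.
        ok = self.tokens >= length if strict else self.tokens > 0
        if ok:
            self.tokens -= length             # may go negative under "loose"
        return ok


def two_stage_meter(bucket_a, bucket_b, length, now):
    """Three conformance levels from two strict TB stages, as in Meter4:
    conforming to Profile4 -> Queue1; failing Profile4 but conforming to
    Profile5 -> Marker1; failing both -> AbsoluteDropper1."""
    if bucket_a.conforms(length, now):
        return "conformingA"
    if bucket_b.conforms(length, now):
        return "conformingB"
    return "non-conforming"
```

For example, with a tight Profile4-like bucket and a larger Profile5-like bucket, a sufficiently long burst first exhausts stage A, is then passed by stage B (for marking), and is finally deemed non-conforming.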
- Multiplexing
- Counting
- Null action - do nothing

The corresponding action elements are described in the following sections.

Section 7.1.3). However, since this element's behavior is closely tied to the state of one or more queues, we choose to distinguish it as a separate functional datapath element.
- because of buffering limitations.
- because a buffer threshold is exceeded (including when shaping is performed).
- as a feedback control signal to reactive control protocols such as TCP.
- because a meter exceeds a configured profile (i.e., policing).

The queuing elements in this model represent a logical abstraction of a queuing system which is used to configure PHB-related parameters. The model can be used to represent a broad variety of possible implementations. However, it need not necessarily map one-to-one with physical queuing systems in a specific router implementation. Implementors should map the configurable parameters of the implementation's queuing systems to these queuing element parameters as appropriate to achieve equivalent behaviors.

section 8. The remainder of this section discusses FIFO Queues: typically, the Queue element of this model will be implemented as a FIFO data structure. However, this does not preclude implementations which are not strictly FIFO, in that they also support operations that remove or examine packets (e.g., for use by discarders) other than at the head or tail. However, such operations must not have the effect of reordering packets belonging to the same microflow.

Note that the term FIFO has multiple different common usages: it is sometimes taken to mean, among other things, a data structure that permits items to be removed only in the order in which they were inserted, or a service discipline which is non-reordering.
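A minimal sketch (Python; the class and method names are invented for illustration) of the FIFO Queue element as a non-reordering data structure whose depth can serve as a trigger for other elements:

```python
from collections import deque

class FifoQueue:
    """FIFO Queue element sketch: packets depart in arrival order, and
    the current depth (in bytes) is exposed for use by other elements
    such as droppers and schedulers."""

    def __init__(self):
        self._pkts = deque()
        self.depth = 0                    # current depth, bytes

    def enqueue(self, pkt, length):
        """Add a packet of `length` bytes at the tail."""
        self._pkts.append((pkt, length))
        self.depth += length

    def dequeue(self):
        """Remove and return the packet at the head (oldest first)."""
        pkt, length = self._pkts.popleft()
        self.depth -= length
        return pkt

    def __len__(self):
        return len(self._pkts)
```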
DSARCH] defines PHBs without specifying required scheduling algorithms. However, PHBs such as the class selectors [DSFIELD], EF [EF-PHB] and AF [AF-PHB] have descriptions or configuration parameters which strongly suggest the sort of scheduling discipline needed to implement them. This document discusses a minimal set of queue parameters to enable realization of these PHBs. It does not attempt to specify an all-embracing set of parameters to cover all possible implementation models. A minimal set includes:

a) a minimum service rate profile which allows rate guarantees for each traffic stream as required by EF and AF without specifying the details of how excess bandwidth between these traffic streams is shared. Additional parameters to control this behavior should be made available, but are dependent on the particular scheduling algorithm implemented.
b) a service priority, used only after the minimum rate profiles of all inputs have been satisfied, to decide how to allocate any remaining bandwidth.

c) a maximum service rate profile, for use only with a non-work-conserving service discipline.

Any one of these profiles is composed, for the purposes of this model, of both a rate (in suitable units of bits, bytes or larger chunks in some unit of time) and a burst size, as discussed further in Appendix A.

By way of example, for an implementation of the EF PHB using a strict priority scheduling algorithm that assumes that the aggregate EF rate has been appropriately bounded by upstream policing to avoid starvation of other BAs, the service rate profiles are not used: the minimum service rate profile would be defaulted to zero and the maximum service rate profile would effectively be the "line rate". Such an implementation, with multiple priority classes, could also be used for the Diffserv class selectors [DSFIELD].

Alternatively, setting the service priority values for each input to the scheduler to the same value enables the scheduler to satisfy the minimum service rates for each input, so long as the sum of all minimum service rates is less than or equal to the line rate.

For example, a non-work-conserving scheduler, allocating spare bandwidth equally between all its inputs, might be represented using the following parameters:

   Scheduler1:
     Type:             Scheduler2Input
     Input1:
       MaxRateProfile: Profile1
       MinRateProfile: Profile2
       Priority:       none
     Input2:
       MaxRateProfile: Profile3
       MinRateProfile: Profile4
       Priority:       none

A work-conserving scheduler might be represented using the following parameters:
   Scheduler2:
     Type:             Scheduler3Input
     Input1:
       MaxRateProfile: WorkConserving
       MinRateProfile: Profile5
       Priority:       1
     Input2:
       MaxRateProfile: WorkConserving
       MinRateProfile: Profile6
       Priority:       2
     Input3:
       MaxRateProfile: WorkConserving
       MinRateProfile: none
       Priority:       3
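A sketch (Python; the field names are invented and not part of the model) of the selection logic such a scheduler might apply: inputs still below their minimum service rate profile are served first, and only then is any remaining bandwidth allocated by service priority:

```python
def pick_input(inputs, now):
    """Select the next scheduler input to serve, or None if all are empty.

    Each input is a dict with:
      'queue'    - list of queued packet lengths (non-empty = backlogged)
      'min_rate' - minimum service rate profile, bytes/sec (0 = none)
      'served'   - bytes served so far (since time 0)
      'priority' - service priority (lower value = served earlier)
    """
    backlogged = [i for i in inputs if i["queue"]]
    if not backlogged:
        return None
    # Minimum rate profiles come first: an input whose delivered bytes
    # lag behind min_rate * elapsed_time is still owed guaranteed service.
    owed = [i for i in backlogged
            if i["min_rate"] > 0 and i["served"] < i["min_rate"] * now]
    candidates = owed if owed else backlogged
    # Only after all minimum rates are satisfied does the service
    # priority decide how remaining bandwidth is allocated.
    return min(candidates, key=lambda i: i["priority"])
```

For instance, a low-priority input that has not yet received its minimum rate is served ahead of a higher-priority input that has.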
Other packet selection methods could be added to this model in the form of a different type of datapath element.

The Algorithmic Dropper is modeled as having a single input. It is possible that packets which were classified differently by a Classifier in this TCB will end up passing through the same dropper. The dropper's algorithm may need to apply different calculations based on characteristics of the incoming packet (e.g., its DSCP). So there is a need, in implementations of this model, to be able to relate information about which classifier element was matched by a packet from a Classifier to an Algorithmic Dropper. In the rare cases where this is required, the chosen model is to insert another Classifier element at this point in the flow and for it to feed into multiple Algorithmic Dropper elements, each one implementing a drop calculation that is independent of any classification keys of the packet: this will likely require the creation of a new TCB to contain the Classifier and the Algorithmic Dropper elements.

NOTE: There are many other formulations of a model that could represent this linkage that are different from the one described above: one formulation would have been to have a pointer from one of the drop probability calculation algorithms inside the dropper to the original Classifier element that selects this algorithm. Another way would have been to have multiple "inputs" to the Algorithmic Dropper element fed from the preceding elements, leading eventually back to the Classifier elements that matched the packet. Yet another formulation might have been for the Classifier to (logically) include some sort of "classification identifier" along with the packet along its path, for use by any subsequent element. And yet another could have been to include a classifier inside the dropper, in order for it to pick out the drop algorithm to be applied.
These other approaches could be used by implementations but were deemed to be less clear than the approach taken here.

An Algorithmic Dropper, an example of which is illustrated in Figure 5, has one or more triggers that cause it to make a decision whether or not to drop one (or possibly more than one) packet. A trigger may be internal (the arrival of a packet at the input to the dropper) or it may be external (resulting from one or more state changes at another element, such as a FIFO Queue depth crossing a threshold or a scheduling event). It is likely that an instantaneous FIFO depth will need to be smoothed over some averaging interval before being used as a useful trigger. Some dropping algorithms may require several trigger inputs feeding back from events elsewhere in the system (e.g., depth-smoothing functions that calculate averages over more than one time interval).
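As a sketch (Python; parameter names are illustrative) of such a smoothed trigger feeding a RED-style drop decision, the instantaneous FIFO depth is smoothed with an EWMA, and the resulting average drives a drop probability that rises linearly between two thresholds:

```python
import random

class SmoothedRedTrigger:
    """RED-style dropper trigger sketch: EWMA depth smoothing plus a
    linear drop-probability ramp between min_th and max_th (bytes)."""

    def __init__(self, min_th, max_th, sample_weight, max_drop_prob):
        self.min_th = min_th
        self.max_th = max_th
        self.w = sample_weight        # EWMA weight for new depth samples
        self.max_p = max_drop_prob
        self.avg = 0.0                # smoothed FIFO depth

    def should_drop(self, fifo_depth, rand=random.random):
        # Smooth the instantaneous depth before using it as a trigger.
        self.avg += self.w * (fifo_depth - self.avg)
        if self.avg < self.min_th:
            return False              # below min threshold: never drop
        if self.avg >= self.max_th:
            return True               # above max threshold: always drop
        # Between thresholds: drop probability rises linearly to max_p.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return rand() < p
```

A sample weight near 1 tracks the instantaneous depth closely; a small weight (e.g., the .002 used in the examples below) averages over many packet arrivals.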
             +------------------+      +-----------+
             | +-------+        |  n   |smoothing  |
             | |trigger|<----------/---|function(s)|
             | |calc.  |        |      |(optional) |
             | +-------+        |      +-----------+
             |     |            |            ^
             |     v            |            |Depth
    Input    | +-------+  no    |      ------------+  to Scheduler
    --------->||discard|------------->  |x|x|x|x|------->
             | |   ?   |        |      ------------+
             | +-------+        |         FIFO
             |     |yes         |
             |     |            |
             |     v            |
             |   count +        |
             |  bit-bucket      |
             +------------------+
             Algorithmic Dropper

    Figure 5. Example of Algorithmic Dropper from Tail of a Queue

A trigger may be a boolean combination of events (e.g., a FIFO depth exceeding a threshold OR a buffer pool depth falling below a threshold). It takes as its input some set of dynamic parameters (e.g., smoothed or instantaneous FIFO depth), and some set of static parameters (e.g., thresholds), and possibly other parameters associated with the packet. It may also have internal state (e.g., history of its past actions). Note that, although an Algorithmic Dropper may require knowledge of data fields in a packet, as discovered by a Classifier in the same TCB, it may not modify the packet (i.e., it is not a marker).

The result of the trigger calculation is that the dropping algorithm makes a decision on whether to forward or to discard a packet. The discarding function is likely to keep counters regarding the discarded packets (there is no appropriate place here to include a Counter Action element).

The example in Figure 5 also shows a FIFO Queue element from whose tail the dropping is to take place and whose depth characteristics are used by this Algorithmic Dropper. It also shows where a depth-smoothing function might be included: smoothing functions are outside the scope of this document and are not modeled explicitly here; we merely indicate where they might be added.

RED, RED-on-In-and-Out (RIO) and Drop-on-threshold are examples of dropping algorithms. Tail-dropping and head-dropping are effected by the location of the Algorithmic Dropper element relative to the FIFO
Queue element.

As an example, a dropper using a RIO algorithm might be represented using 2 Algorithmic Droppers with the following parameters:

   AlgorithmicDropper1: (for in-profile traffic)
     Type:         AlgorithmicDropper
     Discipline:   RED
     Trigger:      Internal
     Output:       Fifo1
     MinThresh:    Fifo1.Depth > 20 kbyte
     MaxThresh:    Fifo1.Depth > 30 kbyte
     SampleWeight: .002
     MaxDropProb:  1%

   AlgorithmicDropper2: (for out-of-profile traffic)
     Type:         AlgorithmicDropper
     Discipline:   RED
     Trigger:      Internal
     Output:       Fifo1
     MinThresh:    Fifo1.Depth > 10 kbyte
     MaxThresh:    Fifo1.Depth > 20 kbyte
     SampleWeight: .002
     MaxDropProb:  2%

Another form of Algorithmic Dropper, a threshold-dropper, might be represented using the following parameters:

   AlgorithmicDropper3:
     Type:         AlgorithmicDropper
     Discipline:   Drop-on-threshold
     Trigger:      Fifo2.Depth > 20 kbyte
     Output:       Fifo1

DSARCH] a shaper is described as a queuing element controlled by a meter which
defines its temporal profile. However, this representation of a shaper differs substantially from typical shaper implementations.

In the model described here, a shaper is realized by using a non-work-conserving Scheduler. Some implementations may elect to have queues whose sole purpose is shaping, while others may integrate the shaping function with other buffering, discarding, and scheduling associated with access to a resource. Shapers operate by delaying the departure of packets that would be deemed non-conforming by a meter configured to the shaper's maximum service rate profile. The packet is scheduled to depart no sooner than such time that it would become conforming.

FJ95], although it has been in use in communications systems since the 1970's.

[DSARCH] discusses load sharing as dividing an interface among traffic classes predictably, or applying a minimum rate to each of a set of traffic classes, which might be measured as an absolute lower bound on the rate a traffic stream achieves or a fraction of the rate an interface offers. It is generally implemented as some form of weighted queuing algorithm among a set of FIFO queues, i.e., a WFQ scheme. This has interesting side-effects.

A key effect sought is to ensure that the mean rate the traffic in a stream experiences is never lower than some threshold when there is at least that much traffic to send. When there is less traffic than this, the queue tends to be starved of traffic, meaning that the queuing system will not delay its traffic by very much. When there is significantly more traffic and the queue starts filling, packets in this class will be delayed significantly more than traffic in other classes that are under-using their available capacity. This form of queuing system therefore tends to move delay and variation in delay from under-used classes of traffic to heavier users, as well as managing the rates of the traffic streams.
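The shaping operation described earlier in this section (holding a packet until it would become conforming to the maximum service rate profile) might be sketched as follows; this is illustrative Python with invented names, not a definitive implementation:

```python
class Shaper:
    """Shaper sketch: a token bucket (rate R bytes/sec, burst B bytes)
    assigns each packet the earliest departure time at which it would
    conform to the maximum service rate profile."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0        # bytes per second
        self.burst = float(burst_bytes)
        self.tokens = self.burst          # start with a full bucket
        self.last = 0.0

    def departure_time(self, length, arrival):
        """Return the time at which an L-byte packet may depart."""
        # Credit tokens for the time elapsed since the last departure.
        self.tokens = min(self.burst,
                          self.tokens + (arrival - self.last) * self.rate)
        if self.tokens >= length:
            self.tokens -= length
            self.last = arrival
            return arrival                # already conforming: no delay
        # Non-conforming: delay the packet until enough tokens accrue.
        wait = (length - self.tokens) / self.rate
        self.tokens = 0.0
        self.last = arrival + wait
        return arrival + wait
```

For example, with an 8 kbps profile (1000 bytes/sec) and a 500-byte burst, back-to-back 500-byte packets depart 0.5 seconds apart: the scheduler is non-work-conserving, leaving the line idle even though packets are queued.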
A side-effect of a WRR or WFQ implementation is that between any two packets in a given traffic class, the scheduler may emit one or more packets from each of the other classes in the queuing system. In cases where only the average behavior is of interest, this is perfectly acceptable. In cases where traffic is very intolerant of jitter and there are a number of competing classes, this may have undesirable consequences.
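A small demonstration of this interleaving (Python sketch; a simple WRR serving one packet per unit of weight per round, names invented for illustration):

```python
from collections import deque

def weighted_round_robin(queues, weights, rounds):
    """Serve each backlogged queue `weight` packets per round and return
    the emission order. Between consecutive packets of any one class,
    packets of every other backlogged class may be emitted."""
    out = []
    for _ in range(rounds):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if q:                      # skip queues with nothing to send
                    out.append(q.popleft())
    return out
```

With three equally-weighted classes, a packet of class B and one of class C are emitted between the first and second packets of class A, which is the jitter-relevant behavior described above.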
RFC 791], of 802.1p traffic classes [802.1D], and other similar technologies.

Priority is often abused in real networks; people tend to think that traffic which has a high business priority deserves this treatment and talk more about the business imperatives than the actual application requirements. This can have severe consequences: networks have been configured which placed business-critical traffic at a higher priority than routing-protocol traffic, resulting in collapse of the network's management or control systems.

However, priority may have a legitimate use for services based on an Expedited Forwarding (EF) PHB, where it is absolutely sure, thanks to policing at all possible traffic entry points, that a traffic stream does not abuse its rate and that the application is indeed jitter-intolerant enough to merit this type of handling. Note that, even in cases with well-policed ingress points, there is still the possibility of unexpected traffic loops within an un-policed core part of the network causing such collapse.