Internet Engineering Task Force (IETF) B. Constantine Request for Comments: 7640 JDSU Category: Informational R. Krishnan ISSN: 2070-1721 Dell Inc. September 2015 Traffic Management Benchmarking
AbstractThis framework describes a practical methodology for benchmarking the traffic management capabilities of networking devices (i.e., policing, shaping, etc.). The goals are to provide a repeatable test method that objectively compares performance of the device's traffic management capabilities and to specify the means to benchmark traffic management with representative application traffic. Status of This Memo This document is not an Internet Standards Track specification; it is published for informational purposes. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7640. Copyright Notice Copyright (c) 2015 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
1. Introduction ....................................................3 1.1. Traffic Management Overview ................................3 1.2. Lab Configuration and Testing Overview .....................5 2. Conventions Used in This Document ...............................6 3. Scope and Goals .................................................7 4. Traffic Benchmarking Metrics ...................................10 4.1. Metrics for Stateless Traffic Tests .......................10 4.2. Metrics for Stateful Traffic Tests ........................12 5. Tester Capabilities ............................................13 5.1. Stateless Test Traffic Generation .........................13 5.1.1. Burst Hunt with Stateless Traffic ..................14 5.2. Stateful Test Pattern Generation ..........................14 5.2.1. TCP Test Pattern Definitions .......................15 6. Traffic Benchmarking Methodology ...............................17 6.1. Policing Tests ............................................17 6.1.1. Policer Individual Tests ...........................18 6.1.2. Policer Capacity Tests .............................19 126.96.36.199. Maximum Policers on Single Physical Port ..20 188.8.131.52. Single Policer on All Physical Ports ......22 184.108.40.206. Maximum Policers on All Physical Ports ....22 6.2. Queue/Scheduler Tests .....................................23 6.2.1. Queue/Scheduler Individual Tests ...................23 220.127.116.11. Testing Queue/Scheduler with Stateless Traffic .........................23 18.104.22.168. Testing Queue/Scheduler with Stateful Traffic ..........................25 6.2.2. Queue/Scheduler Capacity Tests .....................28 22.214.171.124. Multiple Queues, Single Port Active .......28 126.96.36.199.1. Strict Priority on Egress Port ....................28 188.8.131.52.2. Strict Priority + WFQ on Egress Port ....................29 184.108.40.206. Single Queue per Port, All Ports Active ...30 220.127.116.11. Multiple Queues per Port, All Ports Active ..............................31 6.3. Shaper Tests ..............................................32 6.3.1. Shaper Individual Tests ............................32 18.104.22.168. Testing Shaper with Stateless Traffic .....33 22.214.171.124. Testing Shaper with Stateful Traffic ......34 6.3.2. Shaper Capacity Tests ..............................36 126.96.36.199. Single Queue Shaped, All Physical Ports Active ..............................37 188.8.131.52. All Queues Shaped, Single Port Active .....37 184.108.40.206. All Queues Shaped, All Ports Active .......39
6.4. Concurrent Capacity Load Tests ............................40 7. Security Considerations ........................................40 8. References .....................................................41 8.1. Normative References ......................................41 8.2. Informative References ....................................42 Appendix A. Open Source Tools for Traffic Management Testing ......44 Appendix B. Stateful TCP Test Patterns ............................45 Acknowledgments ...................................................51 Authors' Addresses ................................................51 Section 1.1. This document provides a framework to conduct repeatable traffic management benchmarks for devices and systems in a lab environment. Specifically, this framework defines the methods to characterize the capacity of the following traffic management features in network devices: classification, policing, queuing/scheduling, and traffic shaping. This benchmarking framework can also be used as a test procedure to assist in the tuning of traffic management parameters before service activation. In addition to Layer 2/3 (Ethernet/IP) benchmarking, Layer 4 (TCP) test patterns are proposed by this document in order to more realistically benchmark end-user traffic.
- Traffic scheduling: provides traffic classification within the network device by directing packets to various types of queues and applies a dispatching algorithm to assign the forwarding sequence of packets. - Traffic shaping: controls traffic by actively buffering and smoothing the output rate in an attempt to adapt bursty traffic to the configured limits. - Active Queue Management (AQM): involves monitoring the status of internal queues and proactively dropping (or remarking) packets, which causes hosts using congestion-aware protocols to "back off" and in turn alleviate queue congestion [RFC7567]. On the other hand, classic traffic management techniques reactively drop (or remark) packets based on queue-full conditions. The benchmarking scenarios for AQM are different and are outside the scope of this testing framework. Even though AQM is outside the scope of this framework, it should be noted that the TCP metrics and TCP test patterns (defined in Sections 4.2 and 5.2, respectively) could be useful to test new AQM algorithms (targeted to alleviate "bufferbloat"). Examples of these algorithms include Controlled Delay [CoDel] and Proportional Integral controller Enhanced [PIE]. The following diagram is a generic model of the traffic management capabilities within a network device. It is not intended to represent all variations of manufacturer traffic management capabilities, but it provides context for this test framework. |----------| |----------------| |--------------| |----------| | | | | | | | | |Interface | |Ingress Actions | |Egress Actions| |Interface | |Ingress | |(classification,| |(scheduling, | |Egress | |Queues | | marking, | | shaping, | |Queues | | |-->| policing, or |-->| active queue |-->| | | | | shaping) | | management, | | | | | | | | remarking) | | | |----------| |----------------| |--------------| |----------| Figure 1: Generic Traffic Management Capabilities of a Network Device Ingress actions such as classification are defined in [RFC4689] and include IP addresses, port numbers, and DSCP. In terms of marking, [RFC2697] and [RFC2698] define a Single Rate Three Color Marker and a Two Rate Three Color Marker, respectively.
The Metro Ethernet Forum (MEF) specifies policing and shaping in terms of ingress and egress subscriber/provider conditioning functions as described in MEF 12.2 [MEF-12.2], as well as ingress and bandwidth profile attributes as described in MEF 10.3 [MEF-10.3] and MEF 26.1 [MEF-26.1]. +--------------+ +-------+ +----------+ +-----------+ | Transmitting | | | | | | Receiving | | Test Host | | | | | | Test Host | | |-----| Device|---->| Network |--->| | | | | Under | | Delay | | | | | | Test | | Emulator | | | | |<----| |<----| |<---| | | | | | | | | | +--------------+ +-------+ +----------+ +-----------+ Figure 2: Lab Setup for Traffic Management Tests As shown in the test diagram, the framework supports unidirectional and bidirectional traffic management tests (where the transmitting and receiving roles would be reversed on the return path). This testing framework describes the tests and metrics for each of the following traffic management functions: - Classification - Policing - Queuing/scheduling - Shaping The tests are divided into individual and rated capacity tests. The individual tests are intended to benchmark the traffic management functions according to the metrics defined in Section 4. The capacity tests verify traffic management functions under the load of many simultaneous individual tests and their flows. This involves concurrent testing of multiple interfaces with the specific traffic management function enabled, and increasing the load to the capacity limit of each interface.
For example, a device is specified to be capable of shaping on all of its egress ports. The individual test would first be conducted to benchmark the specified shaping function against the metrics defined in Section 4. Then, the capacity test would be executed to test the shaping function concurrently on all interfaces and with maximum traffic load. The Network Delay Emulator (NDE) is required for TCP stateful tests in order to allow TCP to utilize a TCP window of significant size in its control loop. Note also that the NDE SHOULD be passive in nature (e.g., a fiber spool). This is recommended to eliminate the potential effects that an active delay element (i.e., test impairment generator) may have on the test flows. In the case where a fiber spool is not practical due to the desired latency, an active NDE MUST be independently verified to be capable of adding the configured delay without loss. In other words, the Device Under Test (DUT) would be removed and the NDE performance benchmarked independently. Note that the NDE SHOULD be used only as emulated delay. Most NDEs allow for per-flow delay actions, emulating QoS prioritization. For this framework, the NDE's sole purpose is simply to add delay to all packets (emulate network latency). So, to benchmark the performance of the NDE, the maximum offered load should be tested against the following frame sizes: 128, 256, 512, 768, 1024, 1500, and 9600 bytes. The delay accuracy at each of these packet sizes can then be used to calibrate the range of expected Bandwidth-Delay Product (BDP) for the TCP stateful tests. RFC2119]. The following acronyms are used: AQM: Active Queue Management BB: Bottleneck Bandwidth BDP: Bandwidth-Delay Product BSA: Burst Size Achieved CBS: Committed Burst Size
CIR: Committed Information Rate DUT: Device Under Test EBS: Excess Burst Size EIR: Excess Information Rate NDE: Network Delay Emulator QL: Queue Length QoS: Quality of Service RTT: Round-Trip Time SBB: Shaper Burst Bytes SBI: Shaper Burst Interval SP: Strict Priority SR: Shaper Rate SSB: Send Socket Buffer SUT: System Under Test Ti: Transmission Interval TTP: TCP Test Pattern TTPET: TCP Test Pattern Execution Time
Essentially, any network device that performs traffic management as defined in Section 1.1 can be benchmarked or tested with this framework. The primary goal is to assess the maximum forwarding performance deemed to be within the provisioned traffic limits that a network device can sustain without dropping or impairing packets, and without compromising the accuracy of multiple instances of traffic management functions. This is the benchmark for comparison between devices. Within this framework, the metrics are defined for each traffic management test but do not include pass/fail criteria, which are not within the charter of the BMWG. This framework provides the test methods and metrics to conduct repeatable testing, which will provide the means to compare measured performance between DUTs. As mentioned in Section 1.2, these methods describe the individual tests and metrics for several management functions. It is also within scope that this framework will benchmark each function in terms of overall rated capacity. This involves concurrent testing of multiple interfaces with the specific traffic management function enabled, up to the capacity limit of each interface. It is not within the scope of this framework to specify the procedure for testing multiple configurations of traffic management functions concurrently. The multitudes of possible combinations are almost unbounded, and the ability to identify functional "break points" would be almost impossible. However, Section 6.4 provides suggestions for some profiles of concurrent functions that would be useful to benchmark. The key requirement for any concurrent test function is that tests MUST produce reliable and repeatable results. Also, it is not within scope to perform conformance testing. Tests defined in this framework benchmark the traffic management functions according to the metrics defined in Section 4 and do not address any conformance to standards related to traffic management. The current specifications don't specify exact behavior or implementation, and the specifications that do exist (cited in Section 1.1) allow implementations to vary with regard to short-term rate accuracy and other factors. This is a primary driver for this framework: to provide an objective means to compare vendor traffic management functions.
Another goal is to devise methods that utilize flows with congestion- aware transport (TCP) as part of the traffic load and still produce repeatable results in the isolated test environment. This framework will derive stateful test patterns (TCP or application layer) that can also be used to further benchmark the performance of applicable traffic management techniques such as queuing/scheduling and traffic shaping. In cases where the network device is stateful in nature (i.e., firewall, etc.), stateful test pattern traffic is important to test, along with stateless UDP traffic in specific test scenarios (i.e., applications using TCP transport and UDP VoIP, etc.). As mentioned earlier in this document, repeatability of test results is critical, especially considering the nature of stateful TCP traffic. To this end, the stateful tests will use TCP test patterns to emulate applications. This framework also provides guidelines for application modeling and open source tools to achieve the repeatable stimulus. Finally, TCP metrics from [RFC6349] MUST be measured for each stateful test and provide the means to compare each repeated test. Even though this framework targets the testing of TCP applications (i.e., web, email, database, etc.), it could also be applied to the Stream Control Transmission Protocol (SCTP) in terms of test patterns. WebRTC, Signaling System 7 (SS7) signaling, and 3GPP are SCTP-based applications that could be modeled with this framework to benchmark SCTP's effect on traffic management performance. Note that at the time of this writing, this framework does not address tcpcrypt (encrypted TCP) test patterns, although the metrics defined in Section 4.2 can still be used because the metrics are based on TCP retransmission and RTT measurements (versus any of the payload). Thus, if tcpcrypt becomes popular, it would be natural for benchmarkers to consider encrypted TCP patterns and include them in test cases.
RFC4737] and [RFC4689] provide recommendations for sequence tracking, along with definitions of in-sequence and out-of-order packets. The following metrics MUST be measured during the stateless traffic benchmarking components of the tests: - Burst Size Achieved (BSA): For the traffic policing and network queue tests, the tester will be configured to send bursts to test either the Committed Burst Size (CBS) or Excess Burst Size (EBS) of a policer or the queue/buffer size configured in the DUT. The BSA metric is a measure of the actual burst size received at the egress port of the DUT with no lost packets. For example, the configured CBS of a DUT is 64 KB, and after the burst test, only a 63 KB burst can be achieved without packet loss. Then, 63 KB is the BSA. Also, the average Packet Delay Variation (PDV) (see below) as experienced by the packets sent at the BSA burst size should be recorded. This metric SHALL be reported in units of bytes, KB, or MB. - Lost Packets (LP): For all traffic management tests, the tester will transmit the test packets into the DUT ingress port, and the number of packets received at the egress port will be measured. The difference between packets transmitted into the ingress port and received at the egress port is the number of lost packets as measured at the egress port. These packets must have unique identifiers such that only the test packets are measured. For cases where multiple flows are transmitted from the ingress port to the egress port (e.g., IP conversations), each flow must have sequence numbers within the stream of test packets.
[RFC6703] and [RFC2680] describe the need to establish the time threshold to wait before a packet is declared as lost. This threshold MUST be reported, with the results reported as an integer number that cannot be negative. - Out-of-Sequence (OOS): In addition to the LP metric, the test packets must be monitored for sequence. [RFC4689] defines the general function of sequence tracking, as well as definitions for in-sequence and out-of-order packets. Out-of-order packets will be counted per [RFC4737]. This metric SHALL be reported as an integer number that cannot be negative. - Packet Delay (PD): The PD metric is the difference between the timestamp of the received egress port packets and the packets transmitted into the ingress port, as specified in [RFC1242]. The transmitting host and receiving host time must be in time sync (achieved by using NTP, GPS, etc.). This metric SHALL be reported as a real number of seconds, where a negative measurement usually indicates a time synchronization problem between test devices. - Packet Delay Variation (PDV): The PDV metric is the variation between the timestamp of the received egress port packets, as specified in [RFC5481]. Note that per [RFC5481], this PDV is the variation of one-way delay across many packets in the traffic flow. Per the measurement formula in [RFC5481], select the high percentile of 99%, and units of measure will be a real number of seconds (a negative value is not possible for the PDV and would indicate a measurement error). - Shaper Rate (SR): The SR represents the average DUT output rate (bps) over the test interval. The SR is only applicable to the traffic-shaping tests. - Shaper Burst Bytes (SBB): A traffic shaper will emit packets in "trains" of different sizes; these frames are emitted "back-to- back" with respect to the mandatory interframe gap. This metric characterizes the method by which the shaper emits traffic. Some shapers transmit larger bursts per interval, and a burst of one packet would apply to the less common case of a shaper sending a constant-bitrate stream of single packets. This metric SHALL be reported in units of bytes, KB, or MB. The SBB metric is only applicable to the traffic-shaping tests. - Shaper Burst Interval (SBI): The SBI is the time between bursts emitted by the shaper and is measured at the DUT egress port. This metric SHALL be reported as a real number of seconds. The SBI is only applicable to the traffic-shaping tests.
RFC6349] TCP metrics and MUST include: - TCP Test Pattern Execution Time (TTPET): [RFC6349] defined the TCP Transfer Time for bulk transfers, which is simply the measured time to transfer bytes across single or concurrent TCP connections. The TCP test patterns used in traffic management tests will include bulk transfer and interactive applications. The interactive patterns include instances such as HTTP business applications and database applications. The TTPET will be the measure of the time for a single execution of a TCP Test Pattern (TTP). Average, minimum, and maximum times will be measured or calculated and expressed as a real number of seconds. An example would be an interactive HTTP TTP session that should take 5 seconds on a GigE network with 0.5-millisecond latency. During ten (10) executions of this TTP, the TTPET results might be an average of 6.5 seconds, a minimum of 5.0 seconds, and a maximum of 7.9 seconds. - TCP Efficiency: After the execution of the TTP, TCP Efficiency represents the percentage of bytes that were not retransmitted. Transmitted Bytes - Retransmitted Bytes TCP Efficiency % = --------------------------------------- X 100 Transmitted Bytes "Transmitted Bytes" is the total number of TCP bytes to be transmitted, including the original bytes and the retransmitted bytes. To avoid any misinterpretation that a reordered packet is a retransmitted packet (as may be the case with packet decode interpretation), these retransmitted bytes should be recorded from the perspective of the sender's TCP/IP stack. - Buffer Delay: Buffer Delay represents the increase in RTT during a TCP test versus the baseline DUT RTT (non-congested, inherent latency). RTT and the technique to measure RTT (average versus baseline) are defined in [RFC6349]. Referencing [RFC6349], the average RTT is derived from the total of all measured RTTs during the actual test sampled at every second divided by the test duration in seconds.
Total RTTs during transfer Average RTT during transfer = ------------------------------ Transfer duration in seconds Average RTT during transfer - Baseline RTT Buffer Delay % = ------------------------------------------ X 100 Baseline RTT Note that even though this was not explicitly stated in [RFC6349], retransmitted packets should not be used in RTT measurements. Also, the test results should record the average RTT in milliseconds across the entire test duration, as well as the number of samples. RFC4689] (e.g., IP address, DSCP, classification of Type of Service). The open source tool "iperf" can be used to generate stateless UDP traffic and is discussed in Appendix A. Since iperf is a software- based tool, there will be performance limitations at higher link speeds (e.g., 1 GigE, 10 GigE). Careful calibration of any test environment using iperf is important. At higher link speeds, using hardware-based packet test equipment is recommended.
RFC6349]. The TCP test device may be a standard computer or a dedicated communications test instrument. In both cases, it must be capable of emulating both a client and a server. For any test using stateful TCP test traffic, the Network Delay Emulator (the NDE function as shown in the lab setup diagram in Section 1.2) must be used in order to provide a meaningful BDP. As discussed in Section 1.2, the target traffic rate and configured RTT MUST be verified independently, using just the NDE for all stateful tests (to ensure that the NDE can add delay without inducing any packet loss). The TCP test host MUST be capable of generating and receiving stateful TCP test traffic at the full link speed of the DUT. As a general rule of thumb, testing TCP throughput at rates greater than 500 Mbps may require high-performance server hardware or dedicated hardware-based test tools.
The TCP test host MUST allow the adjustment of both Send and Receive Socket Buffer sizes. The Socket Buffers must be large enough to fill the BDP for bulk transfer of TCP test application traffic. Measuring RTT and retransmissions per connection will generally require a dedicated communications test instrument. In the absence of dedicated hardware-based test tools, these measurements may need to be conducted with packet capture tools; i.e., conduct TCP throughput tests, and analyze RTT and retransmissions in packet captures. The TCP implementation used by the test host MUST be specified in the test results (e.g., TCP New Reno, TCP options supported). Additionally, the test results SHALL provide specific congestion control algorithm details, as per [RFC3148]. While [RFC6349] defined the means to conduct throughput tests of TCP bulk transfers, the traffic management framework will extend TCP test execution into interactive TCP application traffic. Examples include email, HTTP, and business applications. This interactive traffic is bidirectional and can be chatty, meaning many turns in traffic communication during the course of a transaction (versus the relatively unidirectional flow of bulk transfer applications). The test device must not only support bulk TCP transfer application traffic but MUST also support chatty traffic. A valid stress test SHOULD include both traffic types. This is due to the non-uniform, bursty nature of chatty applications versus the relatively uniform nature of bulk transfers (the bulk transfer smoothly stabilizes to equilibrium state under lossless conditions). While iperf is an excellent choice for TCP bulk transfer testing, the "netperf" open source tool provides the ability to control client and server request/response behavior. The netperf-wrapper tool is a Python script that runs multiple simultaneous netperf instances and aggregates the results. Appendix A provides an overview of netperf/netperf-wrapper, as well as iperf. As with any software- based tool, the performance must be qualified to the link speed to be tested. Hardware-based test equipment should be considered for reliable results at higher link speeds (e.g., 1 GigE, 10 GigE).
An application could be fully emulated up to Layer 7; however, this framework proposes that stateful TCP test patterns be used in order to provide granular and repeatable control for the benchmarks. The following diagram illustrates a simple web-browsing application (HTTP). GET URL Client -------------------------> Web | Web 200 OK 100 ms | | Browser <------------------------- Server Figure 3: Simple Flow Diagram for a Web Application In this example, the Client Web Browser (client) requests a URL, and then the Web Server delivers the web page content to the client (after a server delay of 100 milliseconds). This asynchronous "request/response" behavior is intrinsic to most TCP-based applications, such as email (SMTP), file transfers (FTP and Server Message Block (SMB)), database (SQL), web applications (SOAP), and Representational State Transfer (REST). The impact on the network elements is due to the multitudes of clients and the variety of bursty traffic, which stress traffic management functions. The actual emulation of the specific application protocols is not required, and TCP test patterns can be defined to mimic the application network traffic flows and produce repeatable results. Application modeling techniques have been proposed in [3GPP2-C_R1002-A], which provides examples to model the behavior of HTTP, FTP, and Wireless Application Protocol (WAP) applications at the TCP layer. The models have been defined with various mathematical distributions for the request/response bytes and inter-request gap times. The model definition formats described in [3GPP2-C_R1002-A] are the basis for the guidelines provided in Appendix B and are also similar to formats used by network modeling tools. Packet captures can also be used to characterize application traffic and specify some of the test patterns listed in Appendix B. This framework does not specify a fixed set of TCP test patterns but does provide test cases that SHOULD be performed; see Appendix B. Some of these examples reflect those specified in [CA-Benchmark], which suggests traffic mixes for a variety of representative application profiles. Other examples are simply well-known application traffic types such as HTTP.