Network Working Group                                          M. Mathis
Request for Comments: 4898                                    J. Heffner
Category: Standards Track               Pittsburgh Supercomputing Center
                                                         R. Raghunarayan
                                                           Cisco Systems
                                                                May 2007

TCP Extended Statistics MIB

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The IETF Trust (2007).

Abstract

This document describes extended performance statistics for TCP. They are designed to use TCP's ideal vantage point to diagnose performance problems in both the network and the application. If a network-based application is performing poorly, TCP can determine if the bottleneck is in the sender, the receiver, or the network itself. If the bottleneck is in the network, TCP can provide specific information about its nature.

Table of Contents

1. Introduction
2. The Internet-Standard Management Framework
3. Overview
   3.1. MIB Initialization and Persistence
   3.2. Relationship to TCP Standards
   3.3. Diagnosing SYN-Flood Denial-of-Service Attacks
4. TCP Extended Statistics MIB
5. Security Considerations
6. IANA Considerations
7. Normative References
8. Informative References
9. Contributors
10. Acknowledgments

1. Introduction

This document describes extended performance statistics for TCP. They are designed to use TCP's ideal vantage point to diagnose performance problems in both the network and the application. If a network-based application is performing poorly, TCP can determine if the bottleneck is in the sender, the receiver, or the network itself. If the bottleneck is in the network, TCP can provide specific information about its nature.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

The Simple Network Management Protocol (SNMP) objects defined in this document extend the TCP MIB, as specified in RFC 4022 [RFC4022]. In addition to defining several new scalars and other objects, this document augments two tables and makes one clarification to RFC 4022. Existing management stations for the TCP MIB are expected to be fully compatible with this clarification.

2. The Internet-Standard Management Framework

For a detailed overview of the documents that describe the current Internet-Standard Management Framework, please refer to section 7 of RFC 3410 [RFC3410].

Managed objects are accessed via a virtual information store, termed the Management Information Base or MIB. MIB objects are generally accessed through the Simple Network Management Protocol (SNMP). Objects in the MIB are defined using the mechanisms defined in the Structure of Management Information (SMI). This memo specifies a MIB module that is compliant to the SMIv2, which is described in STD 58, RFC 2578 [RFC2578], STD 58, RFC 2579 [RFC2579] and STD 58, RFC 2580 [RFC2580].

3. Overview

The TCP-ESTATS-MIB defined in this memo consists of two groups of scalars, seven tables, and two notifications:

* The first group of scalars contains statistics of the TCP protocol engine not covered in RFC 4022.
This group consists of the single scalar tcpEStatsListenerTableLastChange, which provides management stations with an easier mechanism to validate their listener caches.

* The second group of scalars consists of knobs to enable and disable information collection by the tables containing connection-related statistics/information. For example, the tcpEStatsControlPath object controls the activation of the tcpEStatsPathTable. The tcpEStatsConnTableLatency object determines how long connection table rows are retained after a TCP connection transitions into the closed state.

* The tcpEStatsListenerTable augments tcpListenerTable in TCP-MIB [RFC4022] to provide additional information on the active TCP listeners on a device. It supports objects to monitor and diagnose SYN-flood denial-of-service attacks as described below.

* The tcpEStatsConnectIdTable augments the tcpConnectionTable in TCP-MIB [RFC4022] to provide a mapping between connection 4-tuples (which index tcpConnectionTable) and an integer connection index, tcpEStatsConnectIndex. The connection index is used to index into the five remaining tables in this MIB module, and is designed to facilitate rapid polling of multiple objects associated with one TCP connection.

* The tcpEStatsPerfTable contains objects that are useful for measuring TCP performance and for first-check problem diagnosis.

* The tcpEStatsPathTable contains objects that can be used to infer detailed behavior of the Internet path, such as the extent of segment loss or reordering.

* The tcpEStatsStackTable contains objects that are most useful for determining how well the TCP control algorithms are coping with this particular path.

* The tcpEStatsAppTable provides objects that are useful for determining if the application using TCP is limiting TCP performance.

* The tcpEStatsTuneTable provides per-connection controls that can be used to work around a number of common problems that plague TCP over some paths.

* The two notifications defined in this MIB module are tcpEStatsEstablishNotification, indicating that a new connection has been accepted (or established, see below), and tcpEStatsCloseNotification, indicating that an existing connection has recently closed.
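
The relationship between the connection 4-tuple, the integer connection index, and the five per-connection tables can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name, the short table names, and the treatment of the retention latency as a number of seconds are hypothetical and are not taken from the MIB definition.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# (local address, local port, remote address, remote port)
FourTuple = Tuple[str, int, str, int]

# Hypothetical short names standing in for the five per-connection tables.
PER_CONN_TABLES = ("perf", "path", "stack", "app", "tune")


def _empty_tables() -> Dict[str, Dict[int, dict]]:
    return {name: {} for name in PER_CONN_TABLES}


@dataclass
class EStatsStore:
    """Toy model: one integer index keys a row in every per-connection table."""
    conn_table_latency: float = 0.0  # assumed to be seconds retained after close
    next_index: int = 1
    connect_id: Dict[FourTuple, int] = field(default_factory=dict)
    tables: Dict[str, Dict[int, dict]] = field(default_factory=_empty_tables)
    closed_at: Dict[int, float] = field(default_factory=dict)

    def open(self, conn: FourTuple) -> int:
        """Assign a connection index and create one row per table."""
        index = self.next_index
        self.next_index += 1
        self.connect_id[conn] = index
        for rows in self.tables.values():
            rows[index] = {}
        return index

    def close(self, conn: FourTuple, now: float) -> None:
        """Record the close time; rows stay readable until they expire."""
        self.closed_at[self.connect_id[conn]] = now

    def expire(self, now: float) -> None:
        """Delete rows of connections closed at least conn_table_latency ago."""
        for index, closed in list(self.closed_at.items()):
            if now - closed < self.conn_table_latency:
                continue
            for rows in self.tables.values():
                rows.pop(index, None)
            self.connect_id = {
                c: i for c, i in self.connect_id.items() if i != index
            }
            del self.closed_at[index]
```

A poller that has looked up the index once can then read rows from all five tables with that single integer, rather than repeating the 4-tuple in every request.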
3.1. MIB Initialization and Persistence

The TCP protocol itself is specifically designed not to preserve any state whatsoever across system reboots, and enforces this by requiring randomized Initial Sequence numbers and ephemeral ports under any conditions where segments from old connections might corrupt new connections following a reboot.

All of the objects in the MIB MUST have the same persistence properties as the underlying TCP implementation. On a reboot, all zero-based counters MUST be cleared, all dynamically created table rows MUST be deleted, and all read-write objects MUST be restored to their default values. It is assumed that all TCP implementations have some initialization code (if nothing else, to set IP addresses) that has the opportunity to adjust tcpEStatsConnTableLatency and other read-write scalars controlling the creation of the various tables, before establishing the first TCP connection. Implementations MAY also choose to make these control scalars persist across reboots.

The ZeroBasedCounter32 and ZeroBasedCounter64 objects in the listener and connection tables are initialized to zero when the table row is created.

The tcpEStatsConnTableLatency object determines how long connection table rows are retained after a TCP connection transitions into the closed state, to permit reading final connection completion statistics. In the discussion of tcpConnectionTable row latency in RFC 4022 (TCP-MIB, page 9), the words "soon after" are understood to mean after tcpEStatsConnTableLatency, such that all rows of all tables associated with one connection are retained at least tcpEStatsConnTableLatency after connection close. This clarification to RFC 4022 only applies when TCP-ESTATS-MIB is implemented. If TCP-ESTATS-MIB is not implemented, RFC 4022 permits an unspecified delay between connection close and row deletion.

3.2. Relationship to TCP Standards

There are more than 70 RFCs and other documents that specify various aspects of the Transmission Control Protocol (TCP) [RFC4614]. While most protocols are completely specified in one or two documents, this has not proven to be feasible for TCP. TCP implements a reliable end-to-end data transport service over a very weakly constrained IP datagram service. The essential problem that TCP has to solve is balancing the application's need for fast and reliable data transport against the need to make fair, efficient, and equitable use of network resources, with only sparse information about the state of the network or its capabilities.

TCP maintains this balance through the use of many estimators and heuristics that regulate various aspects of the protocol. For example, RFC 2988 describes how to calculate the retransmission timer (RTO) from the average and variance of the network round-trip time (RTT), as estimated from the round-trip time sampled on some data segments. Although these algorithms are standardized, they are a compromise that is optimal only for common Internet environments. Other estimators might yield better results (higher performance or more efficient use of the network) in some environments, particularly under uncommon conditions.

It is the consensus of the community that nearly all of the estimators and heuristics used in TCP might be improved through further research and development. For this reason, nearly all TCP documents leave some latitude for future improvements, for example, by the use of "SHOULD" instead of "MUST" [RFC2119]. Even standard algorithms that are required because they critically affect fairness or the dynamic stability of Internet congestion control include some latitude for evolution. As a consequence, there is considerable diversity in the details of the TCP implementations actually in use today.

The fact that the underlying algorithms are not uniform makes it difficult to tightly specify a MIB. We could have chosen the point of view that the MIB should publish precisely defined metrics of the network path, even if they are different from the estimators in use by TCP. This would make the MIB more useful as a measurement tool, but less useful for understanding how any specific TCP implementation is interacting with the network path and upper protocol layers. We chose instead to have the MIB expose the estimators and important state variables of the algorithms in use, without constraining the TCP implementation.

As a consequence, the MIB objects are defined in terms of fairly abstract descriptions (e.g., round-trip time), but are intended to expose the actual estimators or other state variables as they are used in TCP implementations, possibly transformed (e.g., scaled or otherwise adjusted) to match the spirit of the object descriptions in this document.

This means that MIB objects may not be exactly comparable between two different TCP implementations. A general management station can only assume the abstract descriptions, which are useful for a general assessment of how TCP is functioning. To a TCP implementer with detailed knowledge about the TCP implementation on a specific host, this MIB might be useful for debugging or evaluating the algorithms in their implementation.

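
As a concrete instance of the kind of estimator discussed above, the RTO calculation of RFC 2988 can be sketched as follows. The class and attribute names are hypothetical, and the default clock granularity is an arbitrary assumption. The constants (gains of 1/8 and 1/4, a multiplier of 4, an initial RTO of 3 seconds, and a 1-second minimum) are those given in RFC 2988. Note that the "variance" term is in fact a smoothed mean deviation. An implementation is free to keep these values in other units or with other scaling, which is why the corresponding MIB objects are described abstractly.

```python
class RtoEstimator:
    """Sketch of the RFC 2988 retransmission timer computation (seconds)."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variation
    K = 4           # multiplier applied to the RTT variation
    MIN_RTO = 1.0   # lower bound on the computed RTO

    def __init__(self, clock_granularity: float = 0.1):
        self.g = clock_granularity  # assumed timer granularity, in seconds
        self.srtt = None            # smoothed round-trip time
        self.rttvar = None          # round-trip time variation
        self.rto = 3.0              # initial RTO before any RTT sample

    def sample(self, r: float) -> float:
        """Fold one RTT measurement r into the estimator; return the new RTO."""
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.MIN_RTO, self.srtt + max(self.g, self.K * self.rttvar))
        return self.rto
```

Two hosts running this same algorithm with different clock granularities would already report different RTO values for identical paths, which illustrates why values read from two implementations may not be exactly comparable.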
Under no conditions is this MIB intended to constrain TCP to use (or exclude) any particular estimator, heuristic, algorithm, or implementation.

3.3. Diagnosing SYN-Flood Denial-of-Service Attacks

The tcpEStatsListenerTable is specifically designed to provide information that is useful for diagnosing SYN-flood Denial-of-Service attacks, where a server is overwhelmed by forged or otherwise malicious connection attempts. There are several different techniques that can be used to defend against SYN-flooding but none are standardized [Edd06]. These different techniques all have the same basic characteristics that are instrumentable with a common set of objects, even though the techniques differ greatly in the details.

All SYN-flood defenses avoid allocating significant resources (memory or CPU) to incoming (passive open) connections until the connections meet some liveness criteria (to defend against forged IP source addresses) and the server has sufficient resources to process the incoming request. Note that allocating resources is an implementation-specific event that may not correspond to an observable protocol event (e.g., segments on the wire). There are two general concepts that can be applied to all known SYN-flood defenses. There is generally a well-defined event when a connection is allocated full resources, and a "backlog" -- a queue of embryonic connections that have been allocated only partial resources.

In many implementations, incoming TCP connections are allocated resources as a side effect of the POSIX [POSIX] accept() call. For this reason we use the terminology "accepting a connection" to refer to this event: committing sufficient network resources to process the incoming request. Accepting a connection typically entails allocating memory for the protocol control block [RFC793], the per-connection table rows described in this MIB, and CPU resources, such as process table entries or threads.

Note that it is not useful to accept connections before they are ESTABLISHED, because this would create an easy opportunity for Denial-of-Service attacks, using forged source IP addresses. The backlog consists of connections that are in SYN-RCVD or ESTABLISHED states, that have not been accepted. For purposes of this MIB, we assume that these connections have been allocated some resources (e.g., an embryonic protocol control block), but not full resources (e.g., do not yet have MIB table rows).
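
The backlog and accept concepts described above can be modeled with a small sketch. This is a toy model under stated assumptions: the class, method, and counter names are hypothetical and are not the object names defined in this MIB, and dropping an incoming SYN when the backlog is full is only one of several possible throttling policies.

```python
from collections import OrderedDict


class Listener:
    """Toy model of a listener backlog with event counters."""

    def __init__(self, max_backlog: int):
        self.max_backlog = max_backlog
        # Connections holding only partial resources, in arrival order.
        # Values are "SYN-RCVD" or "ESTABLISHED".
        self.backlog = OrderedDict()
        # Connections that have been allocated full resources.
        self.accepted = set()
        self.counters = {
            "syn_rcvd": 0,        # SYN segments received
            "established": 0,     # handshakes completed
            "accepted": 0,        # connections given full resources
            "exceed_backlog": 0,  # SYNs discarded because the backlog was full
        }

    def on_syn(self, conn) -> bool:
        """Handle an incoming SYN; return False if it was discarded."""
        self.counters["syn_rcvd"] += 1
        if len(self.backlog) >= self.max_backlog:
            self.counters["exceed_backlog"] += 1
            return False
        self.backlog[conn] = "SYN-RCVD"
        return True

    def on_final_ack(self, conn) -> bool:
        """Handle the ACK that completes the three-way handshake."""
        if self.backlog.get(conn) != "SYN-RCVD":
            return False
        self.backlog[conn] = "ESTABLISHED"
        self.counters["established"] += 1
        return True

    def accept(self):
        """Accept the oldest ESTABLISHED connection in the backlog, if any."""
        conn = next(
            (c for c, state in self.backlog.items() if state == "ESTABLISHED"),
            None,
        )
        if conn is not None:
            del self.backlog[conn]
            self.accepted.add(conn)
            self.counters["accepted"] += 1
        return conn
```

In this model a growing gap between the SYN counter and the established counter, together with a rising discard counter, is the kind of signature one would expect during a flood of forged connection attempts.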

Note that some SYN-flood defenses dispense with explicit SYN-RCVD state by cryptographically encoding the state in the ISS (initial sequence number sent) of the SYN-ACK (sometimes called a syn-cookie), and then using the sequence number of the first ACK to reconstruct the SYN-RCVD state before transitioning to the ESTABLISHED state. For these implementations there is no explicit representation of the SYN-RCVD state, and the backlog only consists of connections that are ESTABLISHED and are waiting to be ACCEPTED.

Furthermore, most SYN-flood defenses have some mechanism to throttle connections that might otherwise overwhelm this endpoint. They generally use some combination of discarding incoming SYNs and discarding connections already in the backlog. This does not cause all connections from legitimate clients to fail, as long as the clients retransmit the SYN or first ACK as specified in RFC 793. Most diversity in SYN-flood defenses arises from variations in these algorithms to limit load, and therefore cannot be instrumented with a common standard MIB.

The Listen Table instruments all passively opened TCP connections in terms of observable protocol events (e.g., sent and received segments) and resource allocation events (entering the backlog and being accepted). This approach eases generalization to SYN-flood mechanisms that use alternate TCP state transition diagrams and implicit mechanisms to encode some states.
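
The stateless technique mentioned earlier in this section, in which the SYN-RCVD state is encoded in the ISS, can be illustrated with a deliberately simplified sketch. It is an assumption-laden toy and not a description of any deployed syn-cookie format: the function names, the use of HMAC-SHA-256, the coarse time counter, and the secret are all hypothetical, and real schemes also encode negotiated options such as the MSS. The sketch relies only on the fact that the first ACK acknowledges ISS + 1 and carries the client's initial sequence number plus one.

```python
import hashlib
import hmac
import struct

MOD = 2 ** 32  # TCP sequence numbers are 32-bit


def make_iss(conn, client_isn: int, counter: int, secret: bytes) -> int:
    """Derive a 32-bit ISS from the 4-tuple, client ISN, and a time counter."""
    msg = repr(conn).encode() + struct.pack("!IQ", client_isn % MOD, counter)
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def check_first_ack(conn, seq: int, ack: int, counter: int, secret: bytes) -> bool:
    """Validate the first ACK without any stored SYN-RCVD state.

    seq and ack are the sequence and acknowledgment numbers of the ACK.
    The current and previous counter values are tried so that a handshake
    straddling a counter tick still succeeds.
    """
    client_isn = (seq - 1) % MOD
    iss = (ack - 1) % MOD
    for c in (counter, counter - 1):
        if c >= 0 and make_iss(conn, client_isn, c, secret) == iss:
            return True
    return False
```

With a scheme of this shape the listener keeps nothing between sending the SYN-ACK and receiving a valid first ACK, so only ESTABLISHED connections ever occupy the backlog.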