Network Working Group                                           B. Aboba
Request for Comments: 2975                         Microsoft Corporation
Category: Informational                                         J. Arkko
                                                                Ericsson
                                                           D. Harrington
                                                  Cabletron Systems Inc.
                                                            October 2000

                 Introduction to Accounting Management

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2000). All Rights Reserved.
Abstract

The field of Accounting Management is concerned with the collection of resource consumption data for the purposes of capacity and trend analysis, cost allocation, auditing, and billing. This document describes each of these problems, and discusses the issues involved in design of modern accounting systems. Since accounting applications do not have uniform security and reliability requirements, it is not possible to devise a single accounting protocol and set of security services that will meet all needs. Thus the goal of accounting management is to provide a set of tools that can be used to meet the requirements of each application. This document describes the currently available tools as well as the state of the art in accounting protocol design. A companion document, RFC 2924, reviews the state of the art in accounting attributes and record formats.
Table of Contents

1.  Introduction
    1.1  Requirements language
    1.2  Terminology
    1.3  Accounting management architecture
    1.4  Accounting management objectives
    1.5  Intra-domain and inter-domain accounting
    1.6  Accounting record production
    1.7  Requirements summary
2.  Scaling and reliability
    2.1  Fault resilience
    2.2  Resource consumption
    2.3  Data collection models
3.  Review of Accounting Protocols
    3.1  RADIUS
    3.2  TACACS+
    3.3  SNMP
4.  Review of Accounting Data Transfer
    4.1  SMTP
    4.2  Other protocols
5.  Summary
6.  Security Considerations
7.  Acknowledgments
8.  References
9.  Authors' Addresses
10. Intellectual Property Statement
11. Full Copyright Statement
the recommended process. Accomplishing this may require security services such as authentication and integrity protection.

Cost Allocation
   The act of allocating costs between entities. Note that cost allocation and rating are fundamentally different processes. In cost allocation the objective is typically to allocate a known cost among several entities. In rating the objective is to determine the amount to be charged for use of a resource. In cost allocation, the cost per unit of resource may need to be determined; in rating, this is typically a given.

Interim accounting
   Interim accounting provides a snapshot of usage during a user's session. This may be useful in the event of a device reboot or other network problem that prevents the reception or generation of a session summary packet or session record. Interim accounting records can always be summarized without the loss of information. Note that interim accounting records may be stored internally on the device (such as in non-volatile storage) so as to survive a reboot and thus may not always be transmitted over the wire.

Session record
   A session record represents a summary of the resource consumption of a user over the entire session. Accounting gateways creating the session record may do so by processing interim accounting events or accounting events from several devices serving the same user.

Accounting Protocol
   A protocol used to convey data for accounting purposes.

Intra-domain accounting
   Intra-domain accounting involves the collection of information on resource usage within an administrative domain, for use within that domain. In intra-domain accounting, accounting packets and session records typically do not cross administrative boundaries.

Inter-domain accounting
   Inter-domain accounting involves the collection of information on resource usage within an administrative domain, for use within another administrative domain. In inter-domain accounting, accounting packets and session records will typically cross administrative boundaries.

Real-time accounting
   Real-time accounting involves the processing of information on resource usage within a defined time window. Time constraints are typically imposed in order to limit financial risk.

Accounting server
   The accounting server receives accounting data from devices and translates it into session records. The accounting server may also take responsibility for the routing of session records to interested parties.

8], the distinction can be made by examining the domain portion of the NAI. If the domain portion is absent or corresponds to the local domain, then the session record is treated as an intra-domain accounting event. Otherwise, it is treated as an inter-domain accounting event.
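The classification rule just described (absent or local domain portion means intra-domain, anything else means inter-domain) can be sketched in a few lines of Python. This is only an illustration: the function name and the case-insensitive comparison of domain names are assumptions, not something this memo specifies.

```python
def classify_session(nai: str, local_domain: str) -> str:
    """Classify a session by the domain portion of the user's NAI.

    Hypothetical helper: returns "intra-domain" when the domain portion is
    absent or matches local_domain, and "inter-domain" otherwise.
    """
    # Split on the last '@'; if there is no '@', the separator is empty.
    _user, separator, domain = nai.rpartition("@")
    if not separator or not domain:
        return "intra-domain"          # no domain portion present
    if domain.lower() == local_domain.lower():   # assumed case-insensitive
        return "intra-domain"          # domain portion is the local domain
    return "inter-domain"              # record belongs to another domain
```

For example, with a local domain of "example.com", the identities "fred" and "fred@example.com" would be treated as intra-domain, while "fred@partner.example.net" would be routed as an inter-domain accounting event.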
Intra-domain accounting events are typically routed to the local billing server, while inter-domain accounting events will be routed to accounting servers operating within other administrative domains. While it is not required that session record formats used in inter and intra-domain accounting be the same, this is desirable, since it eliminates translations that would otherwise be required. Where a proxy forwarder is employed, domain-based access controls may be employed by the proxy forwarder, rather than by the devices themselves. The network device will typically speak an accounting protocol to the proxy forwarder, which may then either convert the accounting packets to session records, or forward the accounting packets to another domain. In either case, domain separation is typically achieved by having the proxy forwarder sort the session records or accounting messages by destination. Where the accounting proxy is not trusted, it may be difficult to verify that the proxy is issuing correct session records based on the accounting messages it receives, since the original accounting messages typically are not forwarded along with the session records. Therefore where trust is an issue, the proxy typically forwards the accounting packets themselves. Assuming that the accounting protocol supports data object security, this allows the end-points to verify that the proxy has not modified the data in transit or snooped on the packet contents.
The diagram below illustrates the accounting management architecture:

     +------------+
     |            |
     |   Network  |
     |   Device   |
     |            |
     +------------+
           |
Accounting |
Protocol   |
           |
           V
     +------------+                               +------------+
     |            |                               |            |
     |   Org B    |  Inter-domain session records |  Org A     |
     |   Acctg.   |<----------------------------->|  Acctg.    |
     |Proxy/Server|  or accounting protocol       |  Server    |
     |            |                               |            |
     +------------+                               +------------+
           |                                            |
           |                                            |
Transfer   | Intra-domain                               |
Protocol   | Session records                            |
           |                                            |
           V                                            V
     +------------+                               +------------+
     |            |                               |            |
     |  Org B     |                               |  Org A     |
     |  Billing   |                               |  Billing   |
     |  Server    |                               |  Server    |
     |            |                               |            |
     +------------+                               +------------+
requirements while still providing the forecast with the desired statistical accuracy, it may be possible to tolerate high packet loss as long as bias is not introduced. The security requirements for trend analysis and capacity planning depend on the circumstances of data collection and the sensitivity of the data. Additional security services may be required when data is being transferred between administrative domains. For example, when information is being collected and analyzed within the same administrative domain, integrity protection and authentication may be used in order to guard against collection of invalid data. In inter-domain applications confidentiality may be desirable to guard against snooping by third parties.
Whether these techniques will be useful varies by application since the degree of financial exposure is application-dependent. For dial-up Internet access from a local provider, charges are typically low and therefore the risk of loss is small. However, in the case of dial-up roaming or voice over IP, time-based charges may be substantial and therefore the risk of fraud is larger. In such situations it is highly desirable to quickly detect unusual account activity, and it may be desirable for authorization to depend on ability to pay. In situations where valuable resources can be reserved, or where charges can be high, very large bills may be rung up quickly, and processing may need to be completed within a defined time window in order to limit exposure. Since in usage-sensitive systems, accounting data translates into revenue, the security and reliability requirements are greater. Due to financial and legal requirements such systems need to be able to survive an audit. Thus security services such as authentication, integrity and replay protection are frequently required and confidentiality and data object integrity may also be desirable. Application-layer acknowledgments are also often required so as to guard against accounting server failures.
21]- and activity-based costing techniques described in  are typically based on detailed analysis of usage data, and as a result they are almost always usage-sensitive. Whether these techniques are applied to allocation of costs between partners in a venture or to allocation of costs between departments in a single firm, cost allocation models often have profound behavioral and financial impacts. As a result, systems developed for this purpose are typically as concerned with reliable data collection and security as are billing applications. Due to financial and legal requirements, archival accounting practices are frequently required in this application.
Since inter-domain accounting applications involve transfers of accounting data between domains, additional security measures may be desirable. In addition to authentication, replay and integrity protection, it may be desirable to deploy security services such as confidentiality and data object integrity. In inter-domain accounting each involved party also typically requires a copy of each accounting event for invoice generation and auditing.
control the generation of accounting records. This is of importance in inter-domain accounting or when network devices do not have tariff information. The centralized control of accounting record production can be realized, for instance, by having authorization servers require re-authorization at certain times and requiring the production of accounting records upon each re-authorization.

In conclusion, in some cases it is necessary to produce multiple accounting records from a single session. It must be possible to do this without requiring the user to start a new session or to re-authenticate. The production of multiple records can be controlled either by the network device or by the AAA server. The requirements for timeliness, security and reliability in multiple record sessions are the same as for single-record sessions.
18], "once the cable is cut you don't need more retransmissions, you need a *lot* more voltage." Thus, the choice of transport has no impact on resilience against faults such as network partition, accounting server failures or device reboots. What does provide resilience against these faults is non-volatile storage. The importance of non-volatile storage in design of reliable accounting systems cannot be over-emphasized. Without non-volatile storage, event-driven systems will lose data once the transmission timeout has been exceeded, and batching designs will experience data loss once the internal memory used for accounting data storage has been exceeded. Via use of non-volatile storage, and internally stored interim records, most of these data losses can be avoided. It may even be argued that non-volatile storage is more important to accounting reliability than network connectivity, since for many years reliable accounting systems were implemented based solely on physical storage, without any network connectivity. For example,
phone usage data used to be stored on paper, film, or magnetic media and carried from the place of collection to a central location for bill processing.

25] and in TACACS+. While interim accounting can provide resilience against packet loss, server failures, short-duration network failures, or device reboot, its applicability is limited. Transmission of interim accounting data over the wire should not be thought of as a mainstream reliability improvement technique since it increases use of network bandwidth in normal operation, while providing benefits only in the event of a fault.

Since most packet loss on the Internet is due to congestion, sending interim accounting data over the wire can make the problem worse by increasing bandwidth usage. Therefore on-the-wire interim accounting is best restricted to high-value accounting data such as information on long-lived sessions. To protect against loss of data on such sessions, the interim reporting interval is typically set several standard deviations larger than the average session duration. This ensures that most sessions will not result in generation of interim accounting events and the additional bandwidth consumed by interim accounting will be limited. However, as the interim accounting interval decreases toward the average session time, the additional bandwidth consumed by interim accounting increases markedly, and as a result, the interval must be set with caution.

Where non-volatile storage is unavailable, interim accounting can also result in excessive consumption of memory that could be better allocated to storage of session data. As a result, implementors should be careful to ensure that new interim accounting data overwrites previous data rather than accumulating additional interim records in memory, thereby worsening the buffer exhaustion problem.

Given the increasing popularity of non-volatile storage for use in consumer devices such as digital cameras, such storage is rapidly declining in price. This makes it increasingly feasible for network devices to include built-in support for non-volatile storage. This can be accomplished, for example, by support for compact PCMCIA cards.
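The effect of the interim reporting interval on traffic can be estimated from a sample of session durations. The sketch below is an illustration only: the function names, the choice of three standard deviations as a default, and the assumption that a session emits one interim record each time a full interval elapses are all hypothetical.

```python
import statistics

def suggested_interval(durations, k=3.0):
    """Mean session duration plus k standard deviations (k is an assumption)."""
    return statistics.mean(durations) + k * statistics.pstdev(durations)

def interim_events(durations, interval):
    """Count interim records, assuming one per full interval within a session."""
    return sum(int(d // interval) for d in durations)

# Hypothetical session durations in seconds: mostly short, one long-lived.
durations = [10, 20, 30, 40, 600]
few = interim_events(durations, 300)   # only the long session reports
many = interim_events(durations, 30)   # interval close to typical durations
```

With these sample durations, an interval of 300 seconds produces interim records only for the long-lived session, while an interval of 30 seconds produces many more records for the same set of sessions, which is the bandwidth growth the text warns about.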
Where non-volatile storage is available, this can be used to store interim accounting data. Stored interim events are then replaced by updated interim events or by session data when the session completes. The session data can itself be erased once the data has been transmitted and acknowledged at the application layer. This approach avoids interim data being transmitted over the wire except in the case of a device reboot. When a device reboots, internally stored interim records are transferred to the accounting server.
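A minimal sketch of this store-and-overwrite behavior follows, assuming one small JSON file per session on a local file system standing in for non-volatile storage. The class name, the file layout, and the assumption that session identifiers are safe to use as file names are illustrative choices, not part of this memo.

```python
import json
import os
import tempfile

class InterimStore:
    """Hypothetical per-session record store backed by a directory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, session_id):
        return os.path.join(self.directory, f"{session_id}.json")

    def save(self, session_id, record):
        """Store an interim or session record, replacing any earlier one."""
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w") as f:
            json.dump(record, f)
            f.flush()
            os.fsync(f.fileno())       # force the data to stable storage
        # Atomic rename: the old record is overwritten, never accumulated.
        os.replace(tmp, self._path(session_id))

    def acknowledged(self, session_id):
        """Erase a record once the server has acknowledged it at the application layer."""
        try:
            os.remove(self._path(session_id))
        except FileNotFoundError:
            pass

    def pending(self):
        """Records still on disk, e.g. those to transfer after a reboot."""
        records = {}
        for name in sorted(os.listdir(self.directory)):
            if name.endswith(".json"):
                with open(os.path.join(self.directory, name)) as f:
                    records[name[:-5]] = json.load(f)
        return records
```

In this sketch a device would call save() on each interim update and again with the final session data, call acknowledged() only after an application-layer acknowledgment, and call pending() at start-up to find records that survived a reboot.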
(NAPs) where packet loss may be substantial. Resilience against packet loss can be accomplished via implementation of a retry mechanism on top of UDP, or use of TCP or SCTP. On-the-wire interim accounting provides only limited benefits in mitigating the effects of packet loss.

UDP-based transport is frequently used in accounting applications. However, this is not appropriate in all cases. Where accounting data will not fit within a single UDP packet without fragmentation, use of TCP or SCTP transport may be preferred to use of multiple round-trips in UDP. As noted in  and , this may be an issue in the retrieval of large tables.

In addition, in cases where congestion is likely, such as in inter-domain accounting, TCP or SCTP congestion control and round-trip time estimation will be very useful, optimizing throughput. In applications which require maintenance of session state, such as simultaneous usage control, TCP and application-layer keep alive packets or SCTP with its built-in heartbeat capabilities provide a mechanism for keeping track of session state.

When implementing UDP retransmission, there are a number of issues to keep in mind:

   Data model
   Retry behavior
   Congestion control
   Timeout behavior

Accounting reliability can be influenced by how the data is modeled. For example, it is almost always preferable to use cumulative variables rather than expressing accounting data in terms of a change from a previous data item. With cumulative data, the current state can be recovered by a successful retrieval, even after many packets have been lost. However, if the data is transmitted as a change then the state will not be recovered until the next cumulative update is sent. Thus, such implementations are much more vulnerable to packet loss, and should be avoided wherever possible.
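The difference between cumulative and change-based data models can be illustrated with a small simulation. The helper names and the sample counter values below are hypothetical; each list entry stands for one accounting packet, and the lost set holds the indexes of packets that never arrive.

```python
def total_from_deltas(deltas, lost):
    """Receiver's view when each packet carries the change since the last one."""
    total = 0
    for index, delta in enumerate(deltas):
        if index not in lost:
            total += delta             # a lost delta is gone for good
    return total

def total_from_cumulative(values, lost):
    """Receiver's view when each packet carries the running total."""
    latest = 0
    for index, value in enumerate(values):
        if index not in lost:
            latest = value             # any later packet restores the state
    return latest

cumulative = [100, 250, 400, 700]      # octets used so far, per report
deltas = [100, 150, 150, 300]          # the same usage expressed as changes
lost = {1, 2}                          # second and third packets are dropped

seen_with_deltas = total_from_deltas(deltas, lost)
seen_with_cumulative = total_from_cumulative(cumulative, lost)
```

Here the receiver of cumulative data ends with the true total of 700 octets once the fourth packet arrives, whereas the receiver of change-based data ends with 400 and has no way to detect or repair the shortfall.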
In designing a UDP retry mechanism, it is important that the retry timers relate to the round-trip time, so that retransmissions will not typically occur within the period in which acknowledgments may be expected to arrive. Accounting bandwidth may be significant in some circumstances, so that the added traffic due to unnecessary retransmissions may increase congestion levels.
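One way to tie the retry timer to the round-trip time is to borrow the smoothed estimator used by TCP. The sketch below is not taken from any accounting protocol; the class name, the default bounds, and the use of the conventional TCP constants (gains of 1/8 and 1/4, and a variance multiplier of 4) are assumptions made for illustration.

```python
class RetryTimer:
    """Hypothetical retransmission timer derived from round-trip samples."""

    ALPHA = 1 / 8    # gain for the smoothed round-trip time
    BETA = 1 / 4     # gain for the round-trip time variation
    K = 4            # variation multiplier

    def __init__(self, initial=1.0, minimum=1.0, maximum=60.0):
        self.srtt = None
        self.rttvar = None
        self.timeout = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_sample(self, rtt):
        """Update the timer from one measured round trip, in seconds.

        Samples should come only from packets that were not retransmitted,
        since an acknowledgment of a retransmitted packet is ambiguous.
        """
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        candidate = self.srtt + self.K * self.rttvar
        self.timeout = min(self.maximum, max(self.minimum, candidate))
        return self.timeout
```

Because the timeout stays above the smoothed round-trip time by a margin that tracks its variation, a retransmission is unlikely to be sent while the acknowledgment for the original packet is still in flight.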
Congestion control in accounting data transfer is a somewhat controversial issue. Since accounting traffic is often considered mission-critical, it has been argued that congestion control is not a requirement; better to let other less-critical traffic back off in response to congestion. Moreover, without non-volatile storage, congestive back-off in accounting applications can result in data loss due to buffer exhaustion.

However, it can also be argued that in modern accounting implementations, it is possible to implement congestion control while improving throughput and maintaining high reliability. In circumstances where there is sustained packet loss, there simply is not sufficient capacity to maintain existing transmission rates. Thus, aggregate throughput will actually improve if congestive back-off is implemented. This is due to elimination of retransmissions and the ability to utilize techniques such as RED to desynchronize flows. In addition, with QoS mechanisms such as differentiated services, it is possible to mark accounting packets for preferential handling so as to provide for lower packet loss if desired. Thus considerable leeway is available to the network administrator in controlling the treatment of accounting packets and hard coding inelastic behavior is unnecessary. Typically, systems implementing non-volatile storage allow for backlogged accounting data to be placed in non-volatile storage pending transmission, so that buffer exhaustion resulting from congestive back-off need not be a concern.

Since UDP is not really a transport protocol, UDP-based accounting protocols such as  often do not prescribe timeout behavior. Thus implementations may exhibit widely different behavior.
For example, one implementation may drop accounting data after three constant duration retries to the same server, while another may implement exponential back-off to a given server, then switch to another server, up to a total timeout interval of twelve hours, while storing the untransmitted data on non-volatile storage. The practical difference between these approaches is substantial; the former approach will not satisfy archival accounting requirements while the latter may. More predictable behavior can be achieved via use of SCTP or TCP transport.
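The second behavior described above (exponential back-off to one server, then a switch to another, bounded by a total timeout interval) can be sketched as a generator of retry attempts. The function name, parameter names, and the policy of cycling through the server list are assumptions for illustration; the untransmitted record is presumed to remain in non-volatile storage throughout.

```python
def retry_schedule(servers, base_timeout, max_timeout, tries_per_server, total_budget):
    """Yield (server, timeout) pairs until total_budget seconds are used up.

    Hypothetical policy: back off exponentially against one server, then
    move to the next, cycling through the list until the budget is spent.
    """
    if not servers or base_timeout <= 0:
        return
    elapsed = 0.0
    while True:
        for server in servers:
            timeout = base_timeout
            for _ in range(tries_per_server):
                if elapsed + timeout > total_budget:
                    return             # give up for now; data stays stored
                yield server, timeout
                elapsed += timeout
                timeout = min(timeout * 2, max_timeout)   # exponential back-off

attempts = list(retry_schedule(["primary", "secondary"], 1, 4, 3, 20))
```

With these sample parameters the schedule tries the primary server with timeouts of 1, 2 and 4 seconds, repeats the pattern against the secondary server, and then returns to the primary until the 20 second budget is exhausted. A twelve hour budget, as in the example in the text, would be expressed as total_budget=43200.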
For protocols based on TCP, it is possible for the device to maintain connections to both the primary and secondary accounting servers, using the secondary connection after expiration of a timer on the primary connection. Alternatively, it is possible to open a connection to the secondary accounting server after a timeout or loss of the primary connection, or on expiration of a timer. Thus, accounting protocols based on TCP are capable of responding more rapidly to connectivity failures than TCP timeouts would otherwise allow, at the expense of an increased risk of duplicates.

With SCTP, it is possible to control transport layer timeout behavior, and therefore it is not necessary for the accounting application to maintain its own timers. SCTP also enables multiplexing of multiple connections within a single transport connection, all maintaining the same congestion control state, avoiding the "head of line blocking" issues that can occur with TCP. However, since SCTP is not widely available, use of this transport can impose an additional implementation burden on the designer.

For protocols using UDP, transmission to the secondary server can occur after a number of retries or timer expiration. For compatibility with congestion avoidance, it is advisable to incorporate techniques such as round-trip-time estimation, slow start and congestive back-off. Thus the accounting protocol designer utilizing UDP is often led to re-invent techniques already existing in TCP and SCTP. As a result, the use of raw UDP transport in accounting applications is not recommended.

With any transport it is possible for the primary and secondary accounting servers to receive duplicate packets, so support for duplicate elimination is required. Since accounting server failures can result in data accumulation on accounting clients, use of non-volatile storage can ensure against data loss due to transmission timeouts or buffer exhaustion.
On-the-wire interim accounting provides only limited benefits in mitigating the effects of accounting server failures.
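Duplicate elimination can be sketched as a bounded table of record identifiers that have already been accepted. This assumes that every accounting record carries a device identifier and a record identifier that is unique per device, which this memo does not mandate; the class name and the fixed-size window are also illustrative choices.

```python
from collections import OrderedDict

class DuplicateFilter:
    """Hypothetical filter that remembers the most recently seen records."""

    def __init__(self, capacity=100000):
        self.capacity = capacity
        self.seen = OrderedDict()

    def is_new(self, device_id, record_id):
        """Return True the first time a record is seen, False for repeats."""
        key = (device_id, record_id)
        if key in self.seen:
            self.seen.move_to_end(key)       # keep recently repeated keys
            return False
        self.seen[key] = True
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)    # forget the oldest entry
        return True
```

A bounded window keeps memory use predictable, at the cost that a duplicate arriving after its entry has been evicted would go undetected. Where primary and secondary servers both receive records, the filter would need to be applied wherever their outputs are merged.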
In such cases it is desirable to distinguish between transport layer acknowledgment and application layer acknowledgment. Even though both acknowledgments may be sent within the same packet (such as a TCP segment carrying an application layer acknowledgment along with a piggy-backed ACK), the semantics are different. A transport-layer acknowledgment means "the transport layer has taken responsibility for delivering the data to the application", while an application-layer acknowledgment means "the application has taken responsibility for the data".

A common misconception is that use of TCP transport guarantees that data is delivered to the application. However, as noted in RFC 793:

   An acknowledgment by TCP does not guarantee that the data has been delivered to the end user, but only that the receiving TCP has taken the responsibility to do so.

Therefore, if receiving TCP fails after sending the ACK, the application may not receive the data. Similarly, if the application fails prior to committing the data to stable storage, the data may be lost. In order for a sending application to be sure that the data it sent was received by the receiving application, either a graceful close of the TCP connection or an application-layer acknowledgment is required. In order to protect against data loss, it is necessary that the application-layer acknowledgment imply that the data has been written to stable storage or suitably processed so as to guard against loss.

In the case of partial failures, it is possible for the transport layer to acknowledge receipt via transport layer acknowledgment, without having delivered the data to the application. Similarly, the application may not complete the tasks necessary to take responsibility for the data.

For example, an accounting server may receive data from the transport layer but be incapable of storing it due to a back end database problem or disk fault.
In this case it should not send an application layer acknowledgment, even though a transport layer acknowledgment is appropriate. Rather, an application layer error message should be sent indicating the source of the problem, such as "Backend store unavailable". Thus application-layer acknowledgment capability requires not only the ability to acknowledge when the application has taken responsibility for the data, but also the ability to indicate when the application has not taken responsibility for the data, and why.
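The acknowledgment rule above can be sketched as a server-side handler that replies positively only after the record has reached stable storage, and otherwise reports why it has not taken responsibility. The function names, the dictionary-shaped replies, and the append-only log file are hypothetical and do not correspond to any particular accounting protocol.

```python
import json
import os

def durable_log(path):
    """Return a store function that appends records to a file and syncs it."""
    def store(record):
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())       # return only once the data is on disk
    return store

def handle_record(record, store):
    """Build the application-layer reply for one received accounting record."""
    try:
        store(record)
    except OSError as exc:
        # The transport may already have acknowledged the bytes, but the
        # application has not taken responsibility for the data.
        return {"status": "error",
                "reason": f"Backend store unavailable: {exc}"}
    return {"status": "ack", "record_id": record["record_id"]}
```

A client following this sketch would erase its stored copy of a record only on receiving the "ack" reply, and would retain and later retransmit the record on an error reply or on no reply at all.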
accounting proxy responsible for delivering accounting packets. If the accounting proxy involves moving parts (e.g. a disk drive) while the devices do not, overall system reliability can be reduced. Store and forward accounting proxies only add value in situations where the accounting subsystem is unreliable. For example, where devices do not implement non-volatile storage and the accounting protocol lacks transport and application layer reliability, locating the accounting proxy (with its stable storage) close to the device can reduce the risk of data loss. However, such systems are inherently unreliable so that they are only appropriate for use in capacity planning or non-usage sensitive billing applications. If archival accounting reliability is desired, it is necessary to engineer a reliable accounting system from the start using the techniques described in this document, rather than attempting to patch an inherently unreliable system by adding store and forward accounting proxies.
compression algorithms are only typically applied to session records so as to enable implementation of interim data overwrite.
4], transfer only one accounting event per packet, which is inefficient. Without non-volatile storage, a pure event-driven model typically stores accounting events that have not yet been delivered only until the timeout interval expires. As a result this model has the smallest memory requirements. Once the timeout interval has expired, the accounting event is lost, even if the device has sufficient buffer space to continue to store it. As a result, the event-driven model is the least reliable, since accounting data loss will occur due to device reboots, sustained packet loss, or network failures of duration greater than the timeout interval. In event-driven protocols without a "keep alive" message, accounting servers cannot assume a device failure should no messages arrive for an extended period. Thus, event-driven accounting systems are typically not useful in monitoring of device health. The event-driven model is frequently used in shared use networks and roaming, since this model sends data to the recipient domains without requiring them to poll a large number of devices, most of which have no relevant data. Since the event-driven model typically does not support batching, it permits accounting records to be sent with low processing delay, enabling application of fraud prevention techniques. However, because roaming accounting events are frequently of high value, the poor reliability of this model is an issue. As a result, the event-driven polling model may be more appropriate. Per-session state is typical of event-driven systems without batching. As a result, the event-driven approach scales poorly. However, event-driven systems offer the lowest processing delay since events are processed immediately and there is no possibility of an event requiring low processing delay being caught behind a batch transfer.
An event-driven system with batching will store accounting events that have not yet been delivered up to the limits of memory. As a result, accounting data loss will occur due to device reboots, but not due to packet loss or network failures of sufficiently short duration to be handled within available memory. Note that while transfer efficiency will increase with batch size, without non-volatile storage, the potential data loss from a device reboot will also increase.

Where event-driven systems with batching have a keep-alive interval and run over reliable transport, the accounting server can assume that a failure has occurred if no messages are received within the keep-alive interval. Thus, such implementations can be useful in monitoring of device health. When used for this purpose the average time delay prior to failure detection is one half the keep-alive interval.

Through implementation of a scheduling algorithm, event-driven systems with batching can deliver appropriate service to accounting events that require low processing delay. For example, high-value inter-domain accounting events could be sent immediately, thus enabling use of fraud-prevention techniques, while all other events would be batched. However, there is a possibility that an event requiring low processing delay will be caught behind a batch transfer in progress. Thus the maximum processing delay is proportional to the maximum batch size divided by the link speed.

Event-driven systems with batching scale with the number of active devices. As a result this approach scales better than the pure event-driven approach, or even the polling approach, and is equivalent in terms of scaling to the event-driven polling approach. However, the event-driven batching approach has lower processing delay than the event-driven polling approach, since delivery of accounting data requires fewer round-trips and events requiring low processing delay can be accommodated if a scheduling algorithm is employed.
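The scheduling idea and the two delay estimates above can be sketched as follows. The class and function names are hypothetical, the scheduler is deliberately simplistic (urgent events bypass the batch, everything else waits until the batch is full), and the worst-case figure accounts only for the time to transmit one full batch on the link.

```python
class BatchingSender:
    """Hypothetical scheduler: urgent events go at once, others are batched."""

    def __init__(self, max_batch):
        self.max_batch = max_batch
        self.batch = []

    def submit(self, event, urgent=False):
        """Return the list of events to transmit now (possibly empty)."""
        if urgent:
            return [event]             # e.g. a high-value inter-domain event
        self.batch.append(event)
        if len(self.batch) >= self.max_batch:
            return self.flush()
        return []

    def flush(self):
        """Hand over whatever is batched, e.g. on a timer or at shutdown."""
        out, self.batch = self.batch, []
        return out

def average_detection_delay(keepalive_interval):
    """Average delay before a failure is noticed: half the keep-alive interval."""
    return keepalive_interval / 2

def worst_case_wait(max_batch_bytes, link_bits_per_second):
    """Seconds an urgent event may wait behind one full batch in progress."""
    return max_batch_bytes * 8 / link_bits_per_second
```

As a worked example under these assumptions, a 64000 byte batch on a 64000 bit/s link can delay an urgent event by up to 8 seconds, and a 60 second keep-alive interval gives an average failure detection delay of 30 seconds.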
Without non-volatile storage, an event-driven polling model will lose data due to device reboots, but not due to packet loss or network partitions of short duration. Unless a minimum delivery interval is set, event-driven polling systems are not useful in monitoring of device health.

The event-driven polling model can be suitable for use in roaming since it permits accounting data to be sent to the roaming partners with low processing delay. At the same time non-roaming accounting can be handled via more efficient polling techniques, thereby providing the best of both worlds.

Where batching can be implemented, the state required in event-driven polling can be reduced to scale with the number of active devices. If portions of the network vary widely in usage, then this state may actually be less than that of the polling approach. Note that processing delay in this approach is higher than in event-driven accounting with batching since at least two round-trips are required to deliver data: one for the event notification, and one for the resulting poll.