Network Working Group T. Speakman Request for Comments: 3208 Cisco Systems Category: Experimental J. Crowcroft UCL J. Gemmell Microsoft D. Farinacci Procket Networks S. Lin Juniper Networks D. Leshchiner TIBCO Software M. Luby Digital Fountain T. Montgomery Talarian Corporation L. Rizzo University of Pisa A. Tweedly N. Bhaskar R. Edmonstone R. Sumanasekera L. Vicisano Cisco Systems December 2001 PGM Reliable Transport Protocol Specification Status of this Memo This memo defines an Experimental Protocol for the Internet community. It does not specify an Internet standard of any kind. Discussion and suggestions for improvement are requested. Distribution of this memo is unlimited. Copyright Notice Copyright (C) The Internet Society (2001). All Rights Reserved. Abstract Pragmatic General Multicast (PGM) is a reliable multicast transport protocol for applications that require ordered or unordered, duplicate-free, multicast data delivery from multiple sources to multiple receivers. PGM guarantees that a receiver in the group either receives all data packets from transmissions and repairs, or is able to detect unrecoverable data packet loss. PGM is
specifically intended as a workable solution for multicast applications with basic reliability requirements. Its central design goal is simplicity of operation with due regard for scalability and network efficiency. Table of Contents 1. Introduction and Overview .................................. 3 2. Architectural Description .................................. 9 3. Terms and Concepts ......................................... 12 4. Procedures - General ....................................... 18 5. Procedures - Sources ....................................... 19 6. Procedures - Receivers ..................................... 22 7. Procedures - Network Elements .............................. 27 8. Packet Formats ............................................. 31 9. Options .................................................... 40 10. Security Considerations .................................... 56 11. Appendix A - Forward Error Correction ...................... 58 12. Appendix B - Support for Congestion Control ................ 72 13. Appendix C - SPM Requests .................................. 79 14. Appendix D - Poll Mechanism ................................ 82 15. Appendix E - Implosion Prevention .......................... 92 16. Appendix F - Transmit Window Example ....................... 98 17 Appendix G - Applicability Statement ....................... 103 18. Abbreviations .............................................. 105 19. Acknowledgments ............................................ 106 20. References ................................................. 106 21. Authors' Addresses.......................................... 108 22. Full Copyright Statement ................................... 111 Nota Bene: The publication of this specification is intended to freeze the definition of PGM in the interest of fostering both ongoing and prospective experimentation with the protocol. The intent of that experimentation is to provide experience with the implementation and deployment of a reliable multicast protocol of this class so as to be able to feed that experience back into the longer-term standardization process underway in the Reliable Multicast Transport Working Group of the IETF. Appendix G provides more specific detail on the scope and status of some of this experimentation. Reports of experiments include [16-23]. Additional results and new experimentation are encouraged.
1. Introduction and Overview A variety of reliable protocols have been proposed for multicast data delivery, each with an emphasis on particular types of applications, network characteristics, or definitions of reliability (, , , ). In this tradition, Pragmatic General Multicast (PGM) is a reliable transport protocol for applications that require ordered or unordered, duplicate-free, multicast data delivery from multiple sources to multiple receivers. PGM is specifically intended as a workable solution for multicast applications with basic reliability requirements rather than as a comprehensive solution for multicast applications with sophisticated ordering, agreement, and robustness requirements. Its central design goal is simplicity of operation with due regard for scalability and network efficiency. PGM has no notion of group membership. It simply provides reliable multicast data delivery within a transmit window advanced by a source according to a purely local strategy. Reliable delivery is provided within a source's transmit window from the time a receiver joins the group until it departs. PGM guarantees that a receiver in the group either receives all data packets from transmissions and repairs, or is able to detect unrecoverable data packet loss. PGM supports any number of sources within a multicast group, each fully identified by a globally unique Transport Session Identifier (TSI), but since these sources/sessions operate entirely independently of each other, this specification is phrased in terms of a single source and extends without modification to multiple sources. More specifically, PGM is not intended for use with applications that depend either upon acknowledged delivery to a known group of recipients, or upon total ordering amongst multiple sources. Rather, PGM is best suited to those applications in which members may join and leave at any time, and that are either insensitive to unrecoverable data packet loss or are prepared to resort to application recovery in the event. Through its optional extensions, PGM provides specific mechanisms to support applications as disparate as stock and news updates, data conferencing, low-delay real-time video transfer, and bulk data transfer. In the following text, transport-layer originators of PGM data packets are referred to as sources, transport-layer consumers of PGM data packets are referred to as receivers, and network-layer entities in the intervening network are referred to as network elements.
Unless otherwise specified, the term "repair" will be used to indicate both the actual retransmission of a copy of a missing packet or the transmission of an FEC repair packet. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119  and indicate requirement levels for compliant PGM implementations. 1.1. Summary of Operation PGM runs over a datagram multicast protocol such as IP multicast . In the normal course of data transfer, a source multicasts sequenced data packets (ODATA), and receivers unicast selective negative acknowledgments (NAKs) for data packets detected to be missing from the expected sequence. Network elements forward NAKs PGM-hop-by- PGM-hop to the source, and confirm each hop by multicasting a NAK confirmation (NCF) in response on the interface on which the NAK was received. Repairs (RDATA) may be provided either by the source itself or by a Designated Local Repairer (DLR) in response to a NAK. Since NAKs provide the sole mechanism for reliability, PGM is particularly sensitive to their loss. To minimize NAK loss, PGM defines a network-layer hop-by-hop procedure for reliable NAK forwarding. Upon detection of a missing data packet, a receiver repeatedly unicasts a NAK to the last-hop PGM network element on the distribution tree from the source. A receiver repeats this NAK until it receives a NAK confirmation (NCF) multicast to the group from that PGM network element. That network element responds with an NCF to the first occurrence of the NAK and any further retransmissions of that same NAK from any receiver. In turn, the network element repeatedly forwards the NAK to the upstream PGM network element on the reverse of the distribution path from the source of the original data packet until it also receives an NCF from that network element. Finally, the source itself receives and confirms the NAK by multicasting an NCF to the group. While NCFs are multicast to the group, they are not propagated by PGM network elements since they act as hop-by-hop confirmations.
To avoid NAK implosion, PGM specifies procedures for subnet-based NAK suppression amongst receivers and NAK elimination within network elements. The usual result is the propagation of just one copy of a given NAK along the reverse of the distribution path from any network with directly connected receivers to a source. The net effect is that unicast NAKs return from a receiver to a source on the reverse of the path on which ODATA was forwarded, that is, on the reverse of the distribution tree from the source. More specifically, they return through exactly the same sequence of PGM network elements through which ODATA was forwarded, but in reverse. The reasons for handling NAKs this way will become clear in the discussion of constraining repairs, but first it's necessary to describe the mechanisms for establishing the requisite source path state in PGM network elements. To establish source path state in PGM network elements, the basic data transfer operation is augmented by Source Path Messages (SPMs) from a source, periodically interleaved with ODATA. SPMs function primarily to establish source path state for a given TSI in all PGM network elements on the distribution tree from the source. PGM network elements use this information to address returning unicast NAKs directly to the upstream PGM network element toward the source, and thereby insure that NAKs return from a receiver to a source on the reverse of the distribution path for the TSI. SPMs are sent by a source at a rate that serves to maintain up-to- date PGM neighbor information. In addition, SPMs complement the role of DATA packets in provoking further NAKs from receivers, and maintaining receive window state in the receivers. As a further efficiency, PGM specifies procedures for the constraint of repairs by network elements so that they reach only those network segments containing group members that did not receive the original transmission. As NAKs traverse the reverse of the ODATA path (upward), they establish repair state in the network elements which is used in turn to constrain the (downward) forwarding of the corresponding RDATA. Besides procedures for the source to provide repairs, PGM also specifies options and procedures that permit designated local repairers (DLRs) to announce their availability and to redirect repair requests (NAKs) to themselves rather than to the original source. In addition to these conventional procedures for loss recovery through selective ARQ, Appendix A specifies Forward Error Correction (FEC) procedures for sources to provide and receivers to request general error correcting parity packets rather than selective retransmissions.
Finally, since PGM operates without regular return traffic from receivers, conventional feedback mechanisms for transport flow and congestion control cannot be applied. Appendix B specifies a TCP- friendly, NE-based solution for PGM congestion control, and cites a reference to a TCP-friendly, end-to-end solution for PGM congestion control. In its basic operation, PGM relies on a purely rate-limited transmission strategy in the source to bound the bandwidth consumed by PGM transport sessions and to define the transmit window maintained by the source. PGM defines four basic packet types: three that flow downstream (SPMs, DATA, NCFs), and one that flows upstream (NAKs). 1.2. Design Goals and Constraints PGM has been designed to serve that broad range of multicast applications that have relatively simple reliability requirements, and to do so in a way that realizes the much advertised but often unrealized network efficiencies of multicast data transfer. The usual impediments to realizing these efficiencies are the implosion of negative and positive acknowledgments from receivers to sources, repair latency from the source, and the propagation of repairs to disinterested receivers. 1.2.1. Reliability. Reliable data delivery across an unreliable network is conventionally achieved through an end-to-end protocol in which a source (implicitly or explicitly) solicits receipt confirmation from a receiver, and the receiver responds positively or negatively. While the frequency of negative acknowledgments is a function of the reliability of the network and the receiver's resources (and so, potentially quite low), the frequency of positive acknowledgments is fixed at at least the rate at which the transmit window is advanced, and usually more often. Negative acknowledgments primarily determine repairs and reliability. Positive acknowledgments primarily determine transmit buffer management. When these principles are extended without modification to multicast protocols, the result, at least for positive acknowledgments, is a burden of positive acknowledgments transmitted to the source that quickly threatens to overwhelm it as the number of receivers grows. More succinctly, ACK implosion keeps ACK-based reliable multicast protocols from scaling well.
One of the goals of PGM is to get as strong a definition of reliability as possible from as simple a protocol as possible. ACK implosion can be addressed in a variety of effective but complicated ways, most of which require re-transmit capability from other than the original source. An alternative is to dispense with positive acknowledgments altogether, and to resort to other strategies for buffer management while retaining negative acknowledgments for repairs and reliability. The approach taken in PGM is to retain negative acknowledgments, but to dispense with positive acknowledgments and resort instead to timeouts at the source to manage transmit resources. The definition of reliability with PGM is a direct consequence of this design decision. PGM guarantees that a receiver either receives all data packets from transmissions and repairs, or is able to detect unrecoverable data packet loss. PGM includes strategies for repeatedly provoking NAKs from receivers, and for adding reliability to the NAKs themselves. By reinforcing the NAK mechanism, PGM minimizes the probability that a receiver will detect a missing data packet so late that the packet is unavailable for repair either from the source or from a designated local repairer (DLR). Without ACKs and knowledge of group membership, however, PGM cannot eliminate this possibility. 1.2.2. Group Membership A second consequence of eliminating ACKs is that knowledge of group membership is neither required nor provided by the protocol. Although a source may receive some PGM packets (NAKs for instance) from some receivers, the identity of the receivers does not figure in the processing of those packets. Group membership MAY change during the course of a PGM transport session without the knowledge of or consequence to the source or the remaining receivers. 1.2.3. Efficiency While PGM avoids the implosion of positive acknowledgments simply by dispensing with ACKs, the implosion of negative acknowledgments is addressed directly. Receivers observe a random back-off prior to generating a NAK during which interval the NAK is suppressed (i.e. it is not sent, but the receiver acts as if it had sent it) by the receiver upon receipt of a matching NCF. In addition, PGM network elements eliminate duplicate NAKs received on different interfaces on the same network element.
The combination of these two strategies usually results in the source receiving just a single NAK for any given lost data packet. Whether a repair is provided from a DLR or the original source, it is important to constrain that repair to only those network segments containing members that negatively acknowledged the original transmission rather than propagating it throughout the group. PGM specifies procedures for network elements to use the pattern of NAKs to define a sub-tree within the group upon which to forward the corresponding repair so that it reaches only those receivers that missed it in the first place. 1.2.4. Simplicity PGM is designed to achieve the greatest improvement in reliability (as compared to the usual UDP) with the least complexity. As a result, PGM does NOT address conference control, global ordering amongst multiple sources in the group, nor recovery from network partitions. 1.2.5. Operability PGM is designed to function, albeit with less efficiency, even when some or all of the network elements in the multicast tree have no knowledge of PGM. To that end, all PGM data packets can be conventionally multicast routed by non-PGM network elements with no loss of functionality, but with some inefficiency in the propagation of RDATA and NCFs. In addition, since NAKs are unicast to the last-hop PGM network element and NCFs are multicast to the group, NAK/NCF operation is also consistent across non-PGM network elements. Note that for NAK suppression to be most effective, receivers should always have a PGM network element as a first hop network element between themselves and every path to every PGM source. If receivers are several hops removed from the first PGM network element, the efficacy of NAK suppression may degrade. 1.3. Options In addition to the basic data transfer operation described above, PGM specifies several end-to-end options to address specific application requirements. PGM specifies options to support fragmentation, late joining, redirection, Forward Error Correction (FEC), reachability, and session synchronization/termination/reset. Options MAY be appended to PGM data packet headers only by their original transmitters. While they MAY be interpreted by network elements, options are neither added nor removed by network elements.
All options are receiver-significant (i.e., they must be interpreted by receivers). Some options are also network-significant (i.e., they must be interpreted by network elements). Fragmentation MAY be used in conjunction with data packets to allow a transport-layer entity at the source to break up application-layer data packets into multiple PGM data packets to conform with the maximum transmission unit (MTU) supported by the network layer. Late joining allows a source to indicate whether or not receivers may request all available repairs when they initially join a particular transport session. Redirection MAY be used in conjunction with Poll Responses to allow a DLR to respond to normal NCFs or POLLs with a redirecting POLR advertising its own address as an alternative re-transmitter to the original source. FEC techniques MAY be applied by receivers to use source-provided parity packets rather than selective retransmissions to effect loss recovery. 2. Architectural Description As an end-to-end transport protocol, PGM specifies packet formats and procedures for sources to transmit and for receivers to receive data. To enhance the efficiency of this data transfer, PGM also specifies packet formats and procedures for network elements to improve the reliability of NAKs and to constrain the propagation of repairs. The division of these functions is described in this section and expanded in detail in the next section. 2.1. Source Functions Data Transmission Sources multicast ODATA packets to the group within the transmit window at a given transmit rate. Source Path State Sources multicast SPMs to the group, interleaved with ODATA if present, to establish source path state in PGM network elements.
NAK Reliability Sources multicast NCFs to the group in response to any NAKs they receive. Repairs Sources multicast RDATA packets to the group in response to NAKs received for data packets within the transmit window. Transmit Window Advance Sources MAY advance the trailing edge of the window according to one of a number of strategies. Implementations MAY support automatic adjustments such as keeping the window at a fixed size in bytes, a fixed number of packets or a fixed real time duration. In addition, they MAY optionally delay window advancement based on NAK-silence for a certain period. Some possible strategies are outlined later in this document. 2.2. Receiver Functions Source Path State Receivers use SPMs to determine the last-hop PGM network element for a given TSI to which to direct their NAKs. Data Reception Receivers receive ODATA within the transmit window and eliminate any duplicates. Repair Requests Receivers unicast NAKs to the last-hop PGM network element (and MAY optionally multicast a NAK with TTL of 1 to the local group) for data packets within the receive window detected to be missing from the expected sequence. A receiver MUST repeatedly transmit a given NAK until it receives a matching NCF. NAK Suppression Receivers suppress NAKs for which a matching NCF or NAK is received during the NAK transmit back-off interval.
Receive Window Advance Receivers immediately advance their receive windows upon receipt of any PGM data packet or SPM within the transmit window that advances the receive window. 2.3. Network Element Functions Network elements forward ODATA without intervention. Source Path State Network elements intercept SPMs and use them to establish source path state for the corresponding TSI before multicast forwarding them in the usual way. NAK Reliability Network elements multicast NCFs to the group in response to any NAK they receive. For each NAK received, network elements create repair state recording the transport session identifier, the sequence number of the NAK, and the input interface on which the NAK was received. Constrained NAK Forwarding Network elements repeatedly unicast forward only the first copy of any NAK they receive to the upstream PGM network element on the distribution path for the TSI until they receive an NCF in response. In addition, they MAY optionally multicast this NAK upstream with TTL of 1. Nota Bene: Once confirmed by an NCF, network elements discard NAK packets; NAKs are NOT retained in network elements beyond this forwarding operation, but state about the reception of them is stored. NAK Elimination Network elements discard exact duplicates of any NAK for which they already have repair state (i.e., that has been forwarded either by themselves or a neighboring PGM network element), and respond with a matching NCF.
Constrained RDATA Forwarding Network elements use NAKs to maintain repair state consisting of a list of interfaces upon which a given NAK was received, and they forward the corresponding RDATA only on these interfaces. NAK Anticipation If a network element hears an upstream NCF (i.e., on the upstream interface for the distribution tree for the TSI), it establishes repair state without outgoing interfaces in anticipation of responding to and eliminating duplicates of the NAK that may arrive from downstream.