RFC 3208

PGM Reliable Transport Protocol Specification

Pages: 111
Experimental
RFC3208 - Page 1

Network Working Group                                        T. Speakman
Request for Comments: 3208                                 Cisco Systems
Category: Experimental                                      J. Crowcroft
                                                                     UCL
                                                              J. Gemmell
                                                               Microsoft
                                                            D. Farinacci
                                                        Procket Networks
                                                                  S. Lin
                                                        Juniper Networks
                                                           D. Leshchiner
                                                          TIBCO Software
                                                                 M. Luby
                                                        Digital Fountain
                                                           T. Montgomery
                                                    Talarian Corporation
                                                                L. Rizzo
                                                      University of Pisa
                                                              A. Tweedly
                                                              N. Bhaskar
                                                           R. Edmonstone
                                                         R. Sumanasekera
                                                             L. Vicisano
                                                           Cisco Systems
                                                           December 2001


             PGM Reliable Transport Protocol Specification

Status of this Memo

   This memo defines an Experimental Protocol for the Internet
   community.  It does not specify an Internet standard of any kind.
   Discussion and suggestions for improvement are requested.
   Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

Abstract

   Pragmatic General Multicast (PGM) is a reliable multicast transport
   protocol for applications that require ordered or unordered,
   duplicate-free, multicast data delivery from multiple sources to
   multiple receivers.  PGM guarantees that a receiver in the group
   either receives all data packets from transmissions and repairs, or
   is able to detect unrecoverable data packet loss.  PGM is
   specifically intended as a workable solution for multicast
   applications with basic reliability requirements.  Its central design
   goal is simplicity of operation with due regard for scalability and
   network efficiency.

Table of Contents

   1.  Introduction and Overview ..................................    3
   2.  Architectural Description ..................................    9
   3.  Terms and Concepts .........................................   12
   4.  Procedures - General .......................................   18
   5.  Procedures - Sources .......................................   19
   6.  Procedures - Receivers .....................................   22
   7.  Procedures - Network Elements ..............................   27
   8.  Packet Formats .............................................   31
   9.  Options ....................................................   40
   10. Security Considerations ....................................   56
   11. Appendix A - Forward Error Correction ......................   58
   12. Appendix B - Support for Congestion Control ................   72
   13. Appendix C - SPM Requests ..................................   79
   14. Appendix D - Poll Mechanism ................................   82
   15. Appendix E - Implosion Prevention ..........................   92
   16. Appendix F - Transmit Window Example .......................   98
   17. Appendix G - Applicability Statement .......................  103
   18. Abbreviations ..............................................  105
   19. Acknowledgments ............................................  106
   20. References .................................................  106
   21. Authors' Addresses .........................................  108
   22. Full Copyright Statement ...................................  111

Nota Bene:

   The publication of this specification is intended to freeze the
   definition of PGM in the interest of fostering both ongoing and
   prospective experimentation with the protocol.  The intent of that
   experimentation is to provide experience with the implementation and
   deployment of a reliable multicast protocol of this class so as to be
   able to feed that experience back into the longer-term
   standardization process underway in the Reliable Multicast Transport
   Working Group of the IETF.  Appendix G provides more specific detail
   on the scope and status of some of this experimentation.  Reports of
   experiments include [16-23].  Additional results and new
   experimentation are encouraged.

1.  Introduction and Overview

   A variety of reliable protocols have been proposed for multicast data
   delivery, each with an emphasis on particular types of applications,
   network characteristics, or definitions of reliability ([1], [2],
   [3], [4]).  In this tradition, Pragmatic General Multicast (PGM) is a
   reliable transport protocol for applications that require ordered or
   unordered, duplicate-free, multicast data delivery from multiple
   sources to multiple receivers.

   PGM is specifically intended as a workable solution for multicast
   applications with basic reliability requirements rather than as a
   comprehensive solution for multicast applications with sophisticated
   ordering, agreement, and robustness requirements.  Its central design
   goal is simplicity of operation with due regard for scalability and
   network efficiency.

   PGM has no notion of group membership.  It simply provides reliable
   multicast data delivery within a transmit window advanced by a source
   according to a purely local strategy.  Reliable delivery is provided
   within a source's transmit window from the time a receiver joins the
   group until it departs.  PGM guarantees that a receiver in the group
   either receives all data packets from transmissions and repairs, or
   is able to detect unrecoverable data packet loss.  PGM supports any
   number of sources within a multicast group, each fully identified by
   a globally unique Transport Session Identifier (TSI), but since these
   sources/sessions operate entirely independently of each other, this
   specification is phrased in terms of a single source and extends
   without modification to multiple sources.

   More specifically, PGM is not intended for use with applications that
   depend either upon acknowledged delivery to a known group of
   recipients, or upon total ordering amongst multiple sources.

   Rather, PGM is best suited to those applications in which members may
   join and leave at any time, and that are either insensitive to
   unrecoverable data packet loss or are prepared to resort to
   application recovery in that event.  Through its optional extensions,
   PGM provides specific mechanisms to support applications as disparate
   as stock and news updates, data conferencing, low-delay real-time
   video transfer, and bulk data transfer.

   In the following text, transport-layer originators of PGM data
   packets are referred to as sources, transport-layer consumers of PGM
   data packets are referred to as receivers, and network-layer entities
   in the intervening network are referred to as network elements.

   Unless otherwise specified, the term "repair" will be used to
   indicate either the actual retransmission of a copy of a missing
   packet or the transmission of an FEC repair packet.

Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [14] and
   indicate requirement levels for compliant PGM implementations.

1.1.  Summary of Operation

   PGM runs over a datagram multicast protocol such as IP multicast [5].
   In the normal course of data transfer, a source multicasts sequenced
   data packets (ODATA), and receivers unicast selective negative
   acknowledgments (NAKs) for data packets detected to be missing from
   the expected sequence.  Network elements forward NAKs PGM-hop-by-
   PGM-hop to the source, and confirm each hop by multicasting a NAK
   confirmation (NCF) in response on the interface on which the NAK was
   received.  Repairs (RDATA) may be provided either by the source
   itself or by a Designated Local Repairer (DLR) in response to a NAK.

   Since NAKs provide the sole mechanism for reliability, PGM is
   particularly sensitive to their loss.  To minimize NAK loss, PGM
   defines a network-layer hop-by-hop procedure for reliable NAK
   forwarding.

   Upon detection of a missing data packet, a receiver repeatedly
   unicasts a NAK to the last-hop PGM network element on the
   distribution tree from the source.  A receiver repeats this NAK until
   it receives a NAK confirmation (NCF) multicast to the group from that
   PGM network element.  That network element responds with an NCF to
   the first occurrence of the NAK and any further retransmissions of
   that same NAK from any receiver.  In turn, the network element
   repeatedly forwards the NAK to the upstream PGM network element on
   the reverse of the distribution path from the source of the original
   data packet until it also receives an NCF from that network element.
   Finally, the source itself receives and confirms the NAK by
   multicasting an NCF to the group.

   While NCFs are multicast to the group, they are not propagated by PGM
   network elements since they act as hop-by-hop confirmations.
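
   As an illustration of the repeat-until-confirmed rule described
   above, the following Python sketch models NAK repetition across a
   single hop.  The class name, the retry cap, and the boolean trace
   standing in for the link are assumptions made for this example; none
   of them is defined by this section.

```python
from dataclasses import dataclass


@dataclass
class NakRetry:
    """Hypothetical per-sequence-number NAK state at a receiver or hop."""
    sqn: int
    max_attempts: int = 5   # assumed cap; this section does not set one
    attempts: int = 0
    confirmed: bool = False

    def on_timer(self) -> bool:
        """Repeat timer fired: return True if a NAK should be (re)sent."""
        if self.confirmed or self.attempts >= self.max_attempts:
            return False
        self.attempts += 1
        return True

    def on_ncf(self, sqn: int) -> None:
        """A matching NCF ends the repetition for this sequence number."""
        if sqn == self.sqn:
            self.confirmed = True


def drive(state: NakRetry, link_delivers) -> int:
    """Replay one hop and return the number of NAK copies sent.

    Each entry of link_delivers says whether that NAK copy reached the
    upstream PGM hop, which is assumed to answer with an NCF at once.
    """
    sent = 0
    for delivered in link_delivers:
        if not state.on_timer():
            break
        sent += 1
        if delivered:
            state.on_ncf(state.sqn)
    return sent
```

   With a trace in which the first two NAK copies are lost, three
   copies are sent before the NCF stops the repetition.  The same state
   machine would apply at each PGM network element toward the source.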

   To avoid NAK implosion, PGM specifies procedures for subnet-based NAK
   suppression amongst receivers and NAK elimination within network
   elements.  The usual result is the propagation of just one copy of a
   given NAK along the reverse of the distribution path from any network
   with directly connected receivers to a source.

   The net effect is that unicast NAKs return from a receiver to a
   source on the reverse of the path on which ODATA was forwarded, that
   is, on the reverse of the distribution tree from the source.  More
   specifically, they return through exactly the same sequence of PGM
   network elements through which ODATA was forwarded, but in reverse.
   The reasons for handling NAKs this way will become clear in the
   discussion of constraining repairs, but first it's necessary to
   describe the mechanisms for establishing the requisite source path
   state in PGM network elements.

   To establish source path state in PGM network elements, the basic
   data transfer operation is augmented by Source Path Messages (SPMs)
   from a source, periodically interleaved with ODATA.  SPMs function
   primarily to establish source path state for a given TSI in all PGM
   network elements on the distribution tree from the source.  PGM
   network elements use this information to address returning unicast
   NAKs directly to the upstream PGM network element toward the source,
   and thereby ensure that NAKs return from a receiver to a source on
   the reverse of the distribution path for the TSI.
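
   The following Python sketch shows one way source path state could be
   kept.  It assumes that each SPM carries the address of the PGM hop
   that last forwarded it and that a network element substitutes its
   own address before forwarding; the names Spm, path_addr, and
   NetworkElement are hypothetical and chosen for the example only.

```python
from typing import NamedTuple, Optional


class Spm(NamedTuple):
    tsi: str         # transport session identifier
    path_addr: str   # assumed field: the PGM hop that last forwarded it


class NetworkElement:
    """Hypothetical PGM network element keeping per-TSI path state."""

    def __init__(self, addr: str) -> None:
        self.addr = addr
        self.upstream = {}   # TSI -> address of the upstream PGM hop

    def on_spm(self, spm: Spm) -> Spm:
        """Record the upstream hop, then return the SPM to forward."""
        self.upstream[spm.tsi] = spm.path_addr
        return spm._replace(path_addr=self.addr)

    def nak_next_hop(self, tsi: str) -> Optional[str]:
        """Where to unicast a NAK for this TSI, or None if unknown."""
        return self.upstream.get(tsi)
```

   Passing one SPM from a source "S" through elements "A" and then "B"
   leaves "A" pointing at "S" and "B" pointing at "A", which is the
   reverse of the distribution path that NAKs are meant to follow.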

   SPMs are sent by a source at a rate that serves to maintain up-to-
   date PGM neighbor information.  In addition, SPMs complement the role
   of DATA packets in provoking further NAKs from receivers, and
   maintaining receive window state in the receivers.

   As a further efficiency, PGM specifies procedures for the constraint
   of repairs by network elements so that they reach only those network
   segments containing group members that did not receive the original
   transmission.  As NAKs traverse the reverse of the ODATA path
   (upward), they establish repair state in the network elements which
   is used in turn to constrain the (downward) forwarding of the
   corresponding RDATA.

   Besides procedures for the source to provide repairs, PGM also
   specifies options and procedures that permit designated local
   repairers (DLRs) to announce their availability and to redirect
   repair requests (NAKs) to themselves rather than to the original
   source.  In addition to these conventional procedures for loss
   recovery through selective ARQ, Appendix A specifies Forward Error
   Correction (FEC) procedures for sources to provide and receivers to
   request general error correcting parity packets rather than selective
   retransmissions.

   Finally, since PGM operates without regular return traffic from
   receivers, conventional feedback mechanisms for transport flow and
   congestion control cannot be applied.  Appendix B specifies a TCP-
   friendly, NE-based solution for PGM congestion control, and cites a
   reference to a TCP-friendly, end-to-end solution for PGM congestion
   control.

   In its basic operation, PGM relies on a purely rate-limited
   transmission strategy in the source to bound the bandwidth consumed
   by PGM transport sessions and to define the transmit window
   maintained by the source.
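
   This section does not prescribe a particular rate limiter.  As a
   familiar example only, the Python sketch below uses a token bucket
   to bound the bytes a source may send; the class name, parameters,
   and the caller-supplied clock are assumptions of the example.

```python
class TokenBucket:
    """Token bucket limiter; one possible way to rate-limit a source."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last = 0.0             # time of the previous call, seconds

    def try_send(self, size: int, now: float) -> bool:
        """Return True and spend tokens if size bytes may be sent at now."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

   A source would call try_send before transmitting each packet and
   defer the packet when it returns False.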

   PGM defines four basic packet types:  three that flow downstream
   (SPMs, DATA, NCFs), and one that flows upstream (NAKs).

1.2.  Design Goals and Constraints

   PGM has been designed to serve that broad range of multicast
   applications that have relatively simple reliability requirements,
   and to do so in a way that realizes the much advertised but often
   unrealized network efficiencies of multicast data transfer.  The
   usual impediments to realizing these efficiencies are the implosion
   of negative and positive acknowledgments from receivers to sources,
   repair latency from the source, and the propagation of repairs to
   disinterested receivers.

1.2.1.  Reliability

   Reliable data delivery across an unreliable network is conventionally
   achieved through an end-to-end protocol in which a source (implicitly
   or explicitly) solicits receipt confirmation from a receiver, and the
   receiver responds positively or negatively.  While the frequency of
   negative acknowledgments is a function of the reliability of the
   network and the receiver's resources (and so, potentially quite low),
   the frequency of positive acknowledgments is at least the rate at
   which the transmit window is advanced, and usually higher.

   Negative acknowledgments primarily determine repairs and reliability.
   Positive acknowledgments primarily determine transmit buffer
   management.

   When these principles are extended without modification to multicast
   protocols, the result, at least for positive acknowledgments, is a
   burden of positive acknowledgments transmitted to the source that
   quickly threatens to overwhelm it as the number of receivers grows.
   More succinctly, ACK implosion keeps ACK-based reliable multicast
   protocols from scaling well.

   One of the goals of PGM is to get as strong a definition of
   reliability as possible from as simple a protocol as possible.  ACK
   implosion can be addressed in a variety of effective but complicated
   ways, most of which require re-transmit capability from other than
   the original source.

   An alternative is to dispense with positive acknowledgments
   altogether, and to resort to other strategies for buffer management
   while retaining negative acknowledgments for repairs and reliability.
   The approach taken in PGM is to retain negative acknowledgments, but
   to dispense with positive acknowledgments and resort instead to
   timeouts at the source to manage transmit resources.

   The definition of reliability with PGM is a direct consequence of
   this design decision.  PGM guarantees that a receiver either receives
   all data packets from transmissions and repairs, or is able to detect
   unrecoverable data packet loss.

   PGM includes strategies for repeatedly provoking NAKs from receivers,
   and for adding reliability to the NAKs themselves.  By reinforcing
   the NAK mechanism, PGM minimizes the probability that a receiver will
   detect a missing data packet so late that the packet is unavailable
   for repair either from the source or from a designated local repairer
   (DLR).  Without ACKs and knowledge of group membership, however, PGM
   cannot eliminate this possibility.

1.2.2.  Group Membership

   A second consequence of eliminating ACKs is that knowledge of group
   membership is neither required nor provided by the protocol.
   Although a source may receive some PGM packets (NAKs for instance)
   from some receivers, the identity of the receivers does not figure in
   the processing of those packets.  Group membership MAY change during
   the course of a PGM transport session without the knowledge of or
   consequence to the source or the remaining receivers.

1.2.3.  Efficiency

   While PGM avoids the implosion of positive acknowledgments simply by
   dispensing with ACKs, the implosion of negative acknowledgments is
   addressed directly.

   Receivers observe a random back-off interval before generating a NAK.
   If a matching NCF arrives during that interval, the receiver
   suppresses the NAK (i.e., it does not send the NAK, but acts as if it
   had sent it).  In addition, PGM network elements eliminate duplicate
   NAKs received on different interfaces on the same network element.

   The combination of these two strategies usually results in the source
   receiving just a single NAK for any given lost data packet.
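
   The effect of random back-off on a single subnet can be sketched in
   Python as follows.  The model assumes that every receiver detects
   the loss at the same instant, that back-off times are drawn
   uniformly, and that the NCF reaches all receivers a fixed delay
   after the first NAK is sent.  These are simplifications made for the
   example, and the function and parameter names are hypothetical.

```python
import random


def simulate_subnet(num_receivers: int, backoff_max: float,
                    ncf_delay: float, seed: int = 0) -> int:
    """Return how many receivers transmit a NAK for one lost packet."""
    if num_receivers <= 0:
        return 0
    rng = random.Random(seed)
    # Each receiver picks its own back-off before sending a NAK.
    timers = sorted(rng.uniform(0.0, backoff_max)
                    for _ in range(num_receivers))
    # The earliest timer always fires; its NAK provokes the NCF.
    ncf_arrival = timers[0] + ncf_delay
    # Receivers whose timers expire before the NCF arrives also send;
    # the rest see the NCF first and suppress their NAKs.
    return 1 + sum(1 for t in timers[1:] if t < ncf_arrival)
```

   In this model an NCF delay of zero yields exactly one NAK, and the
   count grows as the NCF delay approaches the back-off range.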

   Whether a repair is provided from a DLR or the original source, it is
   important to constrain that repair to only those network segments
   containing members that negatively acknowledged the original
   transmission rather than propagating it throughout the group.  PGM
   specifies procedures for network elements to use the pattern of NAKs
   to define a sub-tree within the group upon which to forward the
   corresponding repair so that it reaches only those receivers that
   missed it in the first place.

1.2.4.  Simplicity

   PGM is designed to achieve the greatest improvement in reliability
   (as compared to the usual UDP) with the least complexity.  As a
   result, PGM does NOT address conference control, global ordering
   amongst multiple sources in the group, or recovery from network
   partitions.

1.2.5.  Operability

   PGM is designed to function, albeit with less efficiency, even when
   some or all of the network elements in the multicast tree have no
   knowledge of PGM.  To that end, all PGM data packets can be
   conventionally multicast routed by non-PGM network elements with no
   loss of functionality, but with some inefficiency in the propagation
   of RDATA and NCFs.

   In addition, since NAKs are unicast to the last-hop PGM network
   element and NCFs are multicast to the group, NAK/NCF operation is
   also consistent across non-PGM network elements.  Note that for NAK
   suppression to be most effective, receivers should always have a PGM
   network element as a first hop network element between themselves and
   every path to every PGM source.  If receivers are several hops
   removed from the first PGM network element, the efficacy of NAK
   suppression may degrade.

1.3.  Options

   In addition to the basic data transfer operation described above, PGM
   specifies several end-to-end options to address specific application
   requirements.  PGM specifies options to support fragmentation, late
   joining, redirection, Forward Error Correction (FEC), reachability,
   and session synchronization/termination/reset.  Options MAY be
   appended to PGM data packet headers only by their original
   transmitters.  While they MAY be interpreted by network elements,
   options are neither added nor removed by network elements.

   All options are receiver-significant (i.e., they must be interpreted
   by receivers).  Some options are also network-significant (i.e., they
   must be interpreted by network elements).

   Fragmentation MAY be used in conjunction with data packets to allow a
   transport-layer entity at the source to break up application-layer
   data packets into multiple PGM data packets to conform with the
   maximum transmission unit (MTU) supported by the network layer.

   Late joining allows a source to indicate whether or not receivers may
   request all available repairs when they initially join a particular
   transport session.

   Redirection MAY be used in conjunction with Poll Responses to allow a
   DLR to respond to normal NCFs or POLLs with a redirecting POLR
   advertising its own address as an alternative re-transmitter to the
   original source.

   FEC techniques MAY be applied by receivers to use source-provided
   parity packets rather than selective retransmissions to effect loss
   recovery.
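
   To illustrate the fragmentation option described above, the Python
   sketch below splits an application-layer data unit into pieces that
   fit a payload limit and reassembles them in any arrival order.  The
   (offset, total length, chunk) tuple is an assumption made for the
   example and is not the option's wire format.

```python
from typing import List, Tuple

Fragment = Tuple[int, int, bytes]   # (offset, total length, chunk)


def fragment(apdu: bytes, max_payload: int) -> List[Fragment]:
    """Split apdu into chunks of at most max_payload bytes each."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    total = len(apdu)
    return [(off, total, apdu[off:off + max_payload])
            for off in range(0, total, max_payload)]


def reassemble(frags: List[Fragment]) -> bytes:
    """Rebuild the original data; fragments may arrive in any order."""
    if not frags:
        return b""
    total = frags[0][1]
    buf = bytearray(total)
    for off, _, chunk in frags:
        buf[off:off + len(chunk)] = chunk
    return bytes(buf)
```

   The sketch assumes that every fragment has arrived; a receiver
   would first recover any missing fragment through the NAK procedure.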

2.  Architectural Description

   As an end-to-end transport protocol, PGM specifies packet formats and
   procedures for sources to transmit and for receivers to receive data.
   To enhance the efficiency of this data transfer, PGM also specifies
   packet formats and procedures for network elements to improve the
   reliability of NAKs and to constrain the propagation of repairs.  The
   division of these functions is described in this section and expanded
   in detail in the next section.

2.1.  Source Functions

      Data Transmission

         Sources multicast ODATA packets to the group within the
         transmit window at a given transmit rate.

      Source Path State

         Sources multicast SPMs to the group, interleaved with ODATA if
         present, to establish source path state in PGM network
         elements.

      NAK Reliability

         Sources multicast NCFs to the group in response to any NAKs
         they receive.

      Repairs

         Sources multicast RDATA packets to the group in response to
         NAKs received for data packets within the transmit window.

      Transmit Window Advance

         Sources MAY advance the trailing edge of the window according
         to one of a number of strategies.  Implementations MAY support
         automatic adjustments such as keeping the window at a fixed
         size in bytes, a fixed number of packets or a fixed real time
         duration.  In addition, they MAY optionally delay window
         advancement based on NAK-silence for a certain period.  Some
         possible strategies are outlined later in this document.
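
   As one example of the strategies mentioned above, the Python sketch
   below keeps the transmit window at a fixed number of packets.  It
   uses plain integers for sequence numbers and ignores wraparound; the
   class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class TransmitWindow:
    """Fixed-size transmit window, measured in packets (a sketch)."""

    def __init__(self, max_packets: int) -> None:
        if max_packets < 1:
            raise ValueError("max_packets must be at least 1")
        self.max_packets = max_packets
        self._buf = OrderedDict()   # sequence number -> payload
        self.lead = -1              # newest sequence number sent so far

    @property
    def trail(self) -> int:
        """Oldest sequence number still available for repair."""
        return next(iter(self._buf), self.lead + 1)

    def send(self, payload: bytes) -> int:
        """Buffer new data, advancing the trailing edge if needed."""
        self.lead += 1
        self._buf[self.lead] = payload
        while len(self._buf) > self.max_packets:
            self._buf.popitem(last=False)   # drop the oldest packet
        return self.lead

    def repair(self, sqn: int) -> Optional[bytes]:
        """Payload for a NAKed packet, or None if outside the window."""
        return self._buf.get(sqn)
```

   A NAK for a sequence number behind the trailing edge finds no
   payload, which corresponds to the unrecoverable loss a receiver must
   be able to detect.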

2.2.  Receiver Functions

      Source Path State

         Receivers use SPMs to determine the last-hop PGM network
         element for a given TSI to which to direct their NAKs.

      Data Reception

         Receivers receive ODATA within the transmit window and
         eliminate any duplicates.

      Repair Requests

         Receivers unicast NAKs to the last-hop PGM network element (and
         MAY optionally multicast a NAK with TTL of 1 to the local
         group) for data packets within the receive window detected to
         be missing from the expected sequence.  A receiver MUST
         repeatedly transmit a given NAK until it receives a matching
         NCF.

      NAK Suppression

         Receivers suppress NAKs for which a matching NCF or NAK is
         received during the NAK transmit back-off interval.

      Receive Window Advance

         Receivers immediately advance their receive windows upon
         receipt of any PGM data packet or SPM within the transmit
         window that advances the receive window.
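
   The receiver functions above can be sketched in Python as follows.
   The sketch assumes that data packets carry the source's trailing
   edge and that SPMs carry both trailing and leading edges; it uses
   plain integers for sequence numbers and ignores wraparound.  All
   names are hypothetical.

```python
from typing import List


class ReceiveWindow:
    """Tracks received, missing, and unrecoverable sequence numbers."""

    def __init__(self, trail: int = 0) -> None:
        self.trail = trail        # oldest sequence number still repairable
        self.lead = trail - 1     # newest sequence number known to exist
        self.received = set()
        self.lost: List[int] = [] # left the window without being repaired

    def missing(self) -> List[int]:
        """Sequence numbers inside the window that would need NAKs."""
        return [s for s in range(self.trail, self.lead + 1)
                if s not in self.received]

    def on_data(self, sqn: int, trail: int) -> bool:
        """Handle ODATA or RDATA; return False for duplicates or stale."""
        self._advance(trail)
        if sqn < self.trail or sqn in self.received:
            return False
        self.received.add(sqn)
        self.lead = max(self.lead, sqn)
        return True

    def on_spm(self, trail: int, lead: int) -> None:
        """An SPM may reveal both window movement and unseen packets."""
        self._advance(trail)
        self.lead = max(self.lead, lead)

    def _advance(self, new_trail: int) -> None:
        if new_trail <= self.trail:
            return
        for s in range(self.trail, new_trail):
            if s not in self.received:
                self.lost.append(s)   # unrecoverable loss detected
            self.received.discard(s)
        self.trail = new_trail
        self.lead = max(self.lead, new_trail - 1)
```

   Entries in lost are reported to the application rather than NAKed,
   since the source no longer holds them for repair.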

2.3.  Network Element Functions

      Network elements forward ODATA without intervention.

      Source Path State

         Network elements intercept SPMs and use them to establish
         source path state for the corresponding TSI before multicast
         forwarding them in the usual way.

      NAK Reliability

         Network elements multicast NCFs to the group in response to any
         NAK they receive.  For each NAK received, network elements
         create repair state recording the transport session identifier,
         the sequence number of the NAK, and the input interface on
         which the NAK was received.

      Constrained NAK Forwarding

         Network elements repeatedly unicast forward only the first copy
         of any NAK they receive to the upstream PGM network element on
         the distribution path for the TSI until they receive an NCF in
         response.  In addition, they MAY optionally multicast this NAK
         upstream with TTL of 1.

      Nota Bene: Once confirmed by an NCF, network elements discard NAK
      packets; NAKs are NOT retained in network elements beyond this
      forwarding operation, but state about the reception of them is
      stored.

      NAK Elimination

         Network elements discard exact duplicates of any NAK for which
         they already have repair state (i.e., that has been forwarded
         either by themselves or a neighboring PGM network element), and
         respond with a matching NCF.

      Constrained RDATA Forwarding

         Network elements use NAKs to maintain repair state consisting
         of a list of interfaces upon which a given NAK was received,
         and they forward the corresponding RDATA only on these
         interfaces.

      NAK Anticipation

         If a network element hears an upstream NCF (i.e., on the
         upstream interface for the distribution tree for the TSI), it
         establishes repair state without outgoing interfaces in
         anticipation of responding to and eliminating duplicates of the
         NAK that may arrive from downstream.
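
   The network element functions above (NAK elimination, constrained
   RDATA forwarding, and NAK anticipation) can be sketched together in
   Python.  The class, its method names, and the use of strings for
   interfaces are assumptions made for the example.  Sending the NCF
   for each received NAK and discarding state on a timer are left to
   the caller.

```python
from typing import Dict, Set, Tuple


class RepairState:
    """Per-(TSI, sequence number) repair state in a network element."""

    def __init__(self) -> None:
        self._state: Dict[Tuple[str, int], Set[str]] = {}

    def on_nak(self, tsi: str, sqn: int, iface: str) -> bool:
        """Record the NAK's interface.

        Returns True only for the first copy, which is the one to be
        forwarded upstream; later copies are eliminated.
        """
        key = (tsi, sqn)
        first = key not in self._state
        self._state.setdefault(key, set()).add(iface)
        return first

    def on_upstream_ncf(self, tsi: str, sqn: int) -> None:
        """NAK anticipation: create state with no outgoing interfaces."""
        self._state.setdefault((tsi, sqn), set())

    def on_rdata(self, tsi: str, sqn: int) -> Set[str]:
        """Interfaces on which to forward the repair; state is removed.

        An empty set means no NAK was recorded on any interface.
        """
        return self._state.pop((tsi, sqn), set())
```

   After an upstream NCF has been heard for a sequence number, a NAK
   arriving from downstream is recorded but not forwarded, since some
   other element has already carried the request toward the source.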
