RFC 3208

PGM Reliable Transport Protocol Specification

Pages: 111
Experimental
Part 4 of 5 – Pages 72 to 103

RFC3208 - Page 72

12.  Appendix B - Support for Congestion Control

12.1.  Introduction

   A source MUST implement strategies for congestion avoidance, aimed at
   providing overall network stability, fairness among competing PGM
   flows, and some degree of fairness towards coexisting TCP flows [13].
   In order to do this, the source must be provided with feedback on the
   status of the network in terms of traffic load.  This appendix
   specifies NE procedures that provide such feedback to the source in a
   scalable way.  (An alternative TCP-friendly scheme for congestion
   control that does not require NE support can be found in [16]).

   The procedures specified in this section enable the collection and
   selective forwarding of three types of feedback to the source:

      o Worst link load as measured in network elements.

      o Worst end-to-end path load as measured in network elements.

      o Worst end-to-end path load as reported by receivers.

   This specification defines in detail NE procedures, receiver
   procedures and packet formats.  It also defines basic procedures in
   receivers for generating congestion reports.  This specification does
   not define the procedures used by PGM sources to adapt their
   transmission rates in response to congestion reports.  Those
   procedures depend upon the specific congestion control scheme.

   PGM defines a header option that PGM receivers may append to NAKs
   (OPT_CR).  OPT_CR carries congestion reports in NAKs that propagate
   upstream towards the source.

   During the process of hop-by-hop reverse NAK forwarding, NEs examine
   OPT_CR and possibly modify its contents prior to forwarding the NAK
   upstream.  Forwarding CRs also has the side effect of creating
   congestion report state in the NE.  The presence of OPT_CR and its
   contents also influences the normal NAK suppression rules.  Both the
   modification performed on the congestion report and the additional
   suppression rules depend on the content of the congestion report and
   on the congestion report state recorded in the NE as detailed below.

   OPT_CR contains the following fields:

   OPT_CR_NE_WL   Reports the load in the worst link as detected through
                  NE internal measurements

   OPT_CR_NE_WP   Reports the load in the worst end-to-end path as
                  detected through NE internal measurements

   OPT_CR_RX_WP   Reports the load in the worst end-to-end path as
                  detected by receivers

   A load report is either a packet drop rate (as measured at an NE's
   interfaces) or a packet loss rate (as measured in receivers).  Its
   value is linearly encoded in the range 0-0xFFFF, where 0xFFFF
   represents a 100% loss/drop rate.  Receivers that send a NAK bearing
   OPT_CR determine which of the three report fields are being reported.
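
   The linear encoding above can be sketched as follows.  This is an
   illustrative sketch only: the function names are hypothetical, rates
   are taken as fractions in [0.0, 1.0], and rounding to the nearest
   code point is an assumption, since the text does not say how
   intermediate values are rounded.

```python
MAX_REPORT = 0xFFFF  # encodes a 100% loss/drop rate


def encode_load(rate: float) -> int:
    """Map a loss/drop rate in [0.0, 1.0] onto the 16-bit report range."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be within [0.0, 1.0]")
    # Rounding to nearest is an assumption of this sketch.
    return round(rate * MAX_REPORT)


def decode_load(value: int) -> float:
    """Map a 16-bit load report back to a fraction in [0.0, 1.0]."""
    if not 0 <= value <= MAX_REPORT:
        raise ValueError("value must be within 0..0xFFFF")
    return value / MAX_REPORT
```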

   OPT_CR also contains the following fields:

   OPT_CR_NEL     A bit indicating that OPT_CR_NE_WL is being reported.

   OPT_CR_NEP     A bit indicating that OPT_CR_NE_WP is being reported.

   OPT_CR_RXP     A bit indicating that OPT_CR_RX_WP is being reported.

   OPT_CR_LEAD    A SQN in the ODATA space that serves as a temporal
                  reference for the load report values.  This is
                  initialized by receivers with the leading edge of the
                  transmit window as known at the moment of transmitting
                  the NAK.  This value MAY be advanced in NEs that
                  modify the content of OPT_CR.

   OPT_CR_RCVR    The identity of the receiver that generated the worst
                  OPT_CR_RX_WP.

   The complete format of the option is specified later.

12.2.  NE-Based Worst Link Report

   To permit network elements to report a worst link, receivers append
   OPT_CR to a NAK with bit OPT_CR_NEL set and OPT_CR_NE_WL set to zero.
   NEs receiving NAKs that contain OPT_CR_NE_WL process the option and
   update per-TSI state related to it as described below.  The ultimate
   result of the NEs' actions ensures that when a NAK leaves a sub-tree,
   OPT_CR_NE_WL contains a congestion report that reflects the load of
   the worst link in that sub-tree.  To achieve this, NEs rewrite
   OPT_CR_NE_WL with the worst value among the loads measured on the
   local (outgoing) links for the session and the congestion reports
   received from those links.

   Note that the mechanism described in this sub-section does not permit
   the monitoring of the load on (outgoing) links at non-PGM-capable
   multicast routers.  For this reason, NE-Based Worst Link Reports
   SHOULD be used in pure PGM topologies only.  Otherwise, this
   mechanism might fail in detecting congestion.  To overcome this
   limitation PGM sources MAY use a heuristic that combines NE-Based
   Worst Link Reports and Receiver-Based Reports.

12.3.  NE-Based Worst Path Report

   To permit network elements to report a worst path, receivers append
   OPT_CR to a NAK with bit OPT_CR_NEP set and OPT_CR_NE_WP set to zero.
   The processing of this field is similar to that of OPT_CR_NE_WL with
   the difference that, on the reception of a NAK, the value of
   OPT_CR_NE_WP is adjusted with the load measured on the interface on
   which the NAK was received according to the following formula:

   OPT_CR_NE_WP = if_load + OPT_CR_NE_WP * (100% - if_loss_rate)

   The worst among the adjusted OPT_CR_NE_WP is then written in the
   outgoing NAK.  This results in a hop-by-hop accumulation of link loss
   rates into a path loss rate.
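
   As a worked illustration of the accumulation, the sketch below
   applies the formula hop by hop.  It assumes that if_load and
   if_loss_rate denote the same per-interface loss rate, expressed as a
   fraction so that 100% is 1.0; the function names are hypothetical.

```python
def adjust_path_load(reported: float, if_loss_rate: float) -> float:
    """Fold one interface's loss rate into the reported path loss rate."""
    return if_loss_rate + reported * (1.0 - if_loss_rate)


def accumulate(link_loss_rates):
    """Apply the adjustment at each NE a NAK crosses on its way upstream.

    The receiver initializes the report to zero.
    """
    path = 0.0
    for rate in link_loss_rates:
        path = adjust_path_load(path, rate)
    return path
```

   For two links dropping 10% and 20% of packets, accumulate([0.10,
   0.20]) yields 0.28, which equals 1 - (0.9 * 0.8), the loss rate of
   the two links in series.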

   As with OPT_CR_NE_WL, the congestion report in OPT_CR_NE_WP may be
   invalid if the multicast distribution tree includes non-PGM-capable
   routers.

12.4.  Receiver-Based Worst Report

   To report a packet loss rate, receivers append OPT_CR to a NAK with
   bit OPT_CR_RXP set and OPT_CR_RX_WP set to the packet loss rate.  NEs
   receiving NAKs that contain OPT_CR_RX_WP process the option and
   update per-TSI state related to it as described below.  The ultimate
   result of the NEs' actions ensures that when a NAK leaves a sub-tree,
   OPT_CR_RX_WP contains a congestion report that reflects the load of
   the worst receiver in that sub-tree.  To achieve this, NEs rewrite
   OPT_CR_RX_WP with the worst value among the congestion reports
   received on their outgoing links for the session.  In addition to
   this, OPT_CR_RCVR MUST contain the NLA of the receiver that
   originally measured the value of OPT_CR_RX_WP being forwarded.

12.5.  Procedures - Receivers

   To enable the generation of any type of congestion report, receivers
   MUST insert OPT_CR in each NAK they generate and provide the
   corresponding field (OPT_CR_NE_WL, OPT_CR_NE_WP, OPT_CR_RX_WP).  The
   specific fields to be reported will be advertised to receivers in
   OPT_CRQST on the session's SPMs.  Receivers MUST provide only those
   options requested in OPT_CRQST.

   Receivers MUST initialize OPT_CR_NE_WL and OPT_CR_NE_WP to 0 and they
   MUST initialize OPT_CR_RCVR to their NLA.  At the moment of sending
   the NAK, they MUST also initialize OPT_CR_LEAD to the leading edge of
   the transmission window.

   Additionally, if a receiver generates a NAK with OPT_CR with
   OPT_CR_RX_WP, it MUST initialize OPT_CR_RX_WP to the proper value,
   internally computed.

12.6.  Procedures - Network Elements

   Network elements start processing each OPT_CR by selecting a
   reference SQN in the ODATA space.  The reference SQN selected is the
   highest SQN known to the NE.  Usually this is OPT_CR_LEAD contained
   in the NAK received.

   They use the selected SQN to age the value of load measurement as
   follows:

      o  locally measured load values (e.g. interface loads) are
         considered up-to-date

      o  load values carried in OPT_CR are considered up-to-date and are
         not aged so as to be independent of variance in round-trip
         times from the network element to the receivers

      o  old load values recorded in the NE are exponentially aged
         according to the difference between the selected reference SQN
         and the reference SQN associated with the old load value.

   The exponential aging is computed so that a recorded value gets
   scaled down by a factor exp(-1/2) each time the expected inter-NAK
   time elapses.  Hence the aging formula must include the current loss
   rate as follows:

      aged_loss_rate = loss_rate * exp(-SQN_difference * loss_rate / 2)

   Note that the quantity 1/loss_rate is the expected SQN_lag between
   two NAKs, hence the formula above can also be read as:

      aged_loss_rate = loss_rate * exp(-1/2 * SQN_difference / SQN_lag)

   which equates to (loss_rate * exp(-1/2)) when the SQN_difference is
   equal to expected SQN_lag between two NAKs.

   All the subsequent computations refer to the aged load values.
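
   The aging formula can be sketched as below.  This is an illustration
   under stated assumptions: loss_rate is a fraction in [0.0, 1.0],
   SQN_difference is a non-negative count of sequence numbers, and
   sequence number wrap-around is not handled.

```python
import math


def age_loss_rate(loss_rate: float, sqn_difference: int) -> float:
    """Exponentially age a recorded loss rate by elapsed sequence numbers."""
    if sqn_difference < 0:
        raise ValueError("sqn_difference must be non-negative")
    return loss_rate * math.exp(-sqn_difference * loss_rate / 2.0)
```

   With loss_rate = 0.01 the expected SQN_lag between two NAKs is 100,
   so age_loss_rate(0.01, 100) returns 0.01 * exp(-1/2).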

   Network elements process OPT_CR by handling the three possible types
   of congestion reports independently.

   For each congestion report in an incoming NAK, a new value is
   computed to be used in the outgoing NAK:

      o  The new value for OPT_CR_NE_WL is the maximum of the load
         measured on the outgoing interfaces for the session, the value
         of OPT_CR_NE_WL of the incoming NAK, and the value previously
         sent upstream (recorded in the NE).  All these values are as
         obtained after the aging process.

      o  The new value for OPT_CR_NE_WP is the maximum of the value
         previously sent upstream (after aging) and the value of
         OPT_CR_NE_WP in the incoming NAK adjusted with the load on the
         interface upon which the NAK was received (as described above).

      o  The new value for OPT_CR_RX_WP is the maximum of the value
         previously sent upstream (after aging) and the value of
         OPT_CR_RX_WP in the incoming NAK.

      o  If OPT_CR_RX_WP was selected from the incoming NAK, the new
         value for OPT_CR_RCVR is the one in the incoming NAK.
         Otherwise it is the value previously sent upstream.

      o  The new value for OPT_CR_LEAD is the reference SQN selected for
         the aging procedure.
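
   The per-report rules above can be sketched as a single update step.
   The Report type and the update_report function are hypothetical
   names; loads are fractions in [0.0, 1.0]; the previously sent values
   are assumed to have been aged already; and keeping the previous
   receiver on a tie is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Report:
    ne_wl: float   # OPT_CR_NE_WL
    ne_wp: float   # OPT_CR_NE_WP
    rx_wp: float   # OPT_CR_RX_WP
    rcvr: str      # OPT_CR_RCVR
    lead: int      # OPT_CR_LEAD


def update_report(incoming, previous_aged, out_if_loads, in_if_loss, ref_sqn):
    """Compute the congestion report to place in the outgoing NAK."""
    # Worst link: local outgoing links, incoming report, previous report.
    ne_wl = max(max(out_if_loads, default=0.0),
                incoming.ne_wl, previous_aged.ne_wl)
    # Worst path: adjust the incoming value with the receiving interface.
    adjusted = in_if_loss + incoming.ne_wp * (1.0 - in_if_loss)
    ne_wp = max(previous_aged.ne_wp, adjusted)
    # Receiver report: OPT_CR_RCVR follows whichever value is selected.
    if incoming.rx_wp > previous_aged.rx_wp:
        rx_wp, rcvr = incoming.rx_wp, incoming.rcvr
    else:
        rx_wp, rcvr = previous_aged.rx_wp, previous_aged.rcvr
    return Report(ne_wl, ne_wp, rx_wp, rcvr, ref_sqn)
```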

12.6.1.  Overriding Normal Suppression Rules

   Normal suppression rules hold to determine if a NAK should be
   forwarded upstream or not.  However if any of the outgoing congestion
   reports has changed by more than 5% relative to the one previously
   sent upstream, this new NAK is not suppressed.
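
   A minimal sketch of the 5% test follows.  The function name is
   hypothetical, and treating any change away from a previous value of
   zero as significant is an assumption, since a relative change is
   undefined in that case.

```python
def overrides_suppression(new: float, previous: float,
                          threshold: float = 0.05) -> bool:
    """Return True if the report changed enough to bypass suppression."""
    if previous == 0.0:
        # Assumption: any non-zero report after a zero one is significant.
        return new != 0.0
    return abs(new - previous) / previous > threshold
```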

12.6.2.  Link Load Measurement

   PGM routers monitor the load on all their outgoing links and record
   it in the form of per-interface loss rate statistics. "load" MUST be
   interpreted as the percentage of the packets meant to be forwarded on
   the interface that were dropped.  Load statistics refer to the
   aggregate traffic on the links and not to PGM traffic only.

   This document does not specify the algorithm to be used to collect
   such statistics, but requires that such algorithm provide both
   accuracy and responsiveness in the measurement process.  As far as
   accuracy is concerned, the expected measurement error SHOULD be
   upper-limited (e.g. less than 10%).  As far as responsiveness is
   concerned, the measured load SHOULD converge to the actual value in a
   limited time (e.g. converge to 90% of the actual value in less than
   200 milliseconds), when the instantaneous offered load changes.
   Whenever both requirements cannot be met at the same time, accuracy
   SHOULD be traded for responsiveness.

12.7.  Packet Formats

12.7.1.  OPT_CR - Packet Extension Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E| Option Type | Option Length |Reserved |F|OPX|U|        L P R|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Congestion Report Reference SQN                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        NE Worst Link          |        NE Worst Path          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Rcvr Worst Path         |          Reserved             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            NLA AFI            |          Reserved             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Worst Receiver's NLA                ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+

   Option Type = 0x10

   Option Length = 20 octets + NLA length

      L OPT_CR_NEL bit : set indicates OPT_CR_NE_WL is being reported

      P OPT_CR_NEP bit : set indicates OPT_CR_NE_WP is being reported

      R OPT_CR_RXP bit : set indicates OPT_CR_RX_WP is being reported

   Congestion Report Reference SQN (OPT_CR_LEAD).

      A SQN in the ODATA space that serves as a temporal reference point
      for the load report values.

   NE Worst Link (OPT_CR_NE_WL).

      Reports the load in the worst link as detected through NE internal
      measurements

   NE Worst Path (OPT_CR_NE_WP).

      Reports the load in the worst end-to-end path as detected through
      NE internal measurements

   Rcvr Worst Path (OPT_CR_RX_WP).

      Reports the load in the worst end-to-end path as detected by
      receivers

   Worst Receiver's NLA (OPT_CR_RCVR).

      The unicast address of the receiver that generated the worst
      OPT_CR_RX_WP.

   OPT_CR MAY be appended only to NAKs.

   OPT_CR is network-significant.
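
   The layout above can be exercised with the following sketch, which
   packs and parses OPT_CR for an IPv4 receiver.  Several details are
   assumptions read off the diagram rather than stated in the text: the
   L, P and R bits are taken to be the three low-order bits of the
   fourth octet, the E bit the high-order bit of the first octet, and
   reserved fields are sent as zero.  The AFI value 1 is the IANA
   address family number for IPv4.  Function names are hypothetical.

```python
import socket
import struct

OPT_CR_TYPE = 0x10
OPT_END = 0x80            # E bit (assumed position)
OPT_CR_NEL, OPT_CR_NEP, OPT_CR_RXP = 0x04, 0x02, 0x01  # assumed positions
AFI_IPV4 = 1
_FIXED = struct.Struct("!BBBBIHHHHHH")  # the 20 octets before the NLA


def pack_opt_cr(flags, lead, ne_wl, ne_wp, rx_wp, rcvr_ipv4, last=False):
    """Build OPT_CR; Option Length is 20 octets plus the NLA length."""
    nla = socket.inet_aton(rcvr_ipv4)
    opt_type = OPT_CR_TYPE | (OPT_END if last else 0)
    return _FIXED.pack(opt_type, _FIXED.size + len(nla), 0, flags,
                       lead, ne_wl, ne_wp, rx_wp, 0, AFI_IPV4, 0) + nla


def unpack_opt_cr(data):
    """Parse an IPv4 OPT_CR into a dict of its fields."""
    (opt_type, length, _, flags, lead,
     ne_wl, ne_wp, rx_wp, _, afi, _) = _FIXED.unpack_from(data)
    if (opt_type & 0x7F) != OPT_CR_TYPE or afi != AFI_IPV4:
        raise ValueError("not an IPv4 OPT_CR")
    return {"flags": flags & 0x07, "lead": lead, "ne_wl": ne_wl,
            "ne_wp": ne_wp, "rx_wp": rx_wp,
            "rcvr": socket.inet_ntoa(data[_FIXED.size:length])}
```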

12.7.2.  OPT_CRQST - Packet Extension Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E| Option Type | Option Length |Reserved |F|OPX|U|        L P R|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Option Type = 0x11

   Option Length = 4 octets

      L OPT_CRQST_NEL bit : set indicates OPT_CR_NE_WL is being
      requested

      P OPT_CRQST_NEP bit : set indicates OPT_CR_NE_WP is being
      requested

      R OPT_CRQST_RXP bit : set indicates OPT_CR_RX_WP is being
      requested

   OPT_CRQST MAY be appended only to SPMs.

   OPT_CRQST is network-significant.

13.  Appendix C - SPM Requests

13.1.  Introduction

   SPM Requests (SPMRs) MAY be used to solicit an SPM from a source in a
   non-implosive way.  The typical application is for late-joining
   receivers to solicit SPMs directly from a source in order to be able
   to NAK for missing packets without having to wait for a regularly
   scheduled SPM from that source.

13.2.  Overview

   Allowing for SPMR implosion protection procedures, a receiver MAY
   unicast an SPMR to a source to solicit the most current session,
   window, and path state from that source any time after the receiver
   has joined the group.  A receiver may learn the TSI and source to
   which to direct the SPMR from any other PGM packet it receives in the
   group, or by any other means such as from local configuration or
   directory services.  The receiver MUST use the usual SPM procedures
   to glean the unicast address to which it should direct its NAKs from
   the solicited SPM.

13.3.  Packet Contents

   This section just provides enough short-hand to make the Procedures
   intelligible.  For the full details of packet contents, please refer
   to Packet Formats below.

13.3.1.  SPM Requests

   SPMRs are transmitted by receivers to solicit SPMs from a source.

   SPMRs are unicast to a source and contain:

   SPMR_TSI       the source-assigned TSI for the session to which the
                  SPMR corresponds

13.4.  Procedures - Sources

   A source MUST respond immediately to an SPMR with the corresponding
   SPM rate limited to once per IHB_MIN per TSI.  The corresponding SPM
   matches SPM_TSI to SPMR_TSI and SPM_DPORT to SPMR_DPORT.
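
   The per-TSI rate limit can be sketched as follows.  The class and
   method names are hypothetical, IHB_MIN is passed in rather than
   assumed, and time is supplied by the caller in the same unit as
   IHB_MIN.

```python
class SpmrResponder:
    """Decide whether an SPMR may trigger an immediate SPM."""

    def __init__(self, ihb_min: float):
        self.ihb_min = ihb_min
        self._last_sent = {}  # TSI -> time of the last solicited SPM

    def should_send_spm(self, tsi: bytes, now: float) -> bool:
        last = self._last_sent.get(tsi)
        if last is not None and now - last < self.ihb_min:
            return False  # an SPM went out less than IHB_MIN ago
        self._last_sent[tsi] = now
        return True
```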

13.5.  Procedures - Receivers

   To moderate the potentially implosive behavior of SPMRs at least on a
   densely populated subnet, receivers MUST use the following back-off
   and suppression procedure based on multicasting the SPMR with a TTL
   of 1 ahead of and in addition to unicasting the SPMR to the source.
   The role of the multicast SPMR is to suppress the transmission of
   identical SPMRs from the subnet.

   More specifically, before unicasting a given SPMR, receivers MUST
   choose a random delay on SPMR_BO_IVL (~250 msecs) during which they
   listen for a multicast of an identical SPMR.  If a receiver does not
   see a matching multicast SPMR within its chosen random interval, it
   MUST first multicast its own SPMR to the group with a TTL of 1 before
   then unicasting its own SPMR to the source.  If a receiver does see a
   matching multicast SPMR within its chosen random interval, it MUST
   refrain from unicasting its SPMR and wait instead for the
   corresponding SPM.

   In addition, receipt of the corresponding SPM within this random
   interval SHOULD cancel transmission of an SPMR.

   In either case, the receiver MUST wait at least SPMR_SPM_IVL before
   attempting to repeat the SPMR by choosing another delay on
   SPMR_BO_IVL and repeating the procedure above.

   The corresponding SPMR matches SPMR_TSI to SPMR_TSI and SPMR_DPORT to
   SPMR_DPORT.  The corresponding SPM matches SPM_TSI to SPMR_TSI and
   SPM_DPORT to SPMR_DPORT.
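
   The receiver's decision at the end of its random delay can be
   sketched as below.  The names are hypothetical, times are seconds
   measured from the start of the back-off, the delay is assumed to be
   drawn uniformly, and SPMR_BO_IVL uses the approximate 250 msec value
   quoted above.

```python
import random
from enum import Enum

SPMR_BO_IVL = 0.250  # seconds, approximate value from the text


class Action(Enum):
    SEND = "multicast with TTL 1, then unicast to the source"
    SUPPRESS = "matching multicast SPMR seen; wait for the SPM"
    CANCEL = "corresponding SPM already received"


def choose_delay(rng=random) -> float:
    """Pick a random delay on SPMR_BO_IVL (uniform is an assumption)."""
    return rng.uniform(0.0, SPMR_BO_IVL)


def decide(delay, multicast_spmr_at=None, spm_at=None) -> Action:
    """Resolve the back-off given when, if ever, matching packets arrived."""
    if spm_at is not None and spm_at <= delay:
        return Action.CANCEL
    if multicast_spmr_at is not None and multicast_spmr_at <= delay:
        return Action.SUPPRESS
    return Action.SEND
```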

13.6.  SPM Requests

      SPMR:

         SPM Requests are sent by receivers to request the immediate
         transmission of an SPM for the given TSI from a source.

   The network-header source address of an SPMR is the unicast NLA of
   the entity that originates the SPMR.

   The network-header destination address of an SPMR is the unicast NLA
   of the source from which the corresponding SPM is requested.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Source Port           |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |    Options    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Global Source ID                   ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...    Global Source ID       |           TSDU Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Option Extensions when present ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ...

   Source Port:

      SPMR_SPORT

      Data-Destination Port

   Destination Port:

      SPMR_DPORT

      Data-Source Port, together with Global Source ID forms SPMR_TSI

   Type:

      SPMR_TYPE =  0x0C

   Global Source ID:

      SPMR_GSI

      Together with Destination Port forms

         SPMR_TSI
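
   A sketch of packing the fixed SPMR header shown above.  The function
   name is hypothetical.  The Options, Checksum and TSDU Length fields
   are left as zero placeholders here: computing the PGM checksum is
   outside this sketch, and a zero TSDU Length is an assumption.

```python
import struct

SPMR_TYPE = 0x0C
_HEADER = struct.Struct("!HHBBH6sH")  # 16 octets, network byte order


def pack_spmr(data_dport: int, data_sport: int, gsi: bytes) -> bytes:
    """Build an SPMR header.

    Source Port carries the data-destination port and Destination Port
    carries the data-source port, as described above.
    """
    if len(gsi) != 6:
        raise ValueError("Global Source ID must be 6 octets")
    # Options, Checksum and TSDU Length are zero placeholders.
    return _HEADER.pack(data_dport, data_sport, SPMR_TYPE, 0, 0, gsi, 0)
```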

14.  Appendix D - Poll Mechanism

14.1.  Introduction

      These procedures provide PGM network elements and sources with the
      ability to poll their downstream PGM neighbors to solicit replies
      in an implosion-controlled way.

      Both general polls and specific polls are possible.  The former
      provide a PGM (parent) node with a way to check if there are any
      PGM (children) nodes connected to it, both network elements and
      receivers, and to estimate their number.  The latter may be used
      by PGM parent nodes to search for nodes with specific properties
      among their PGM children.  An example application is DLR
      discovery.

      Polling is implemented using two additional PGM packets:

   POLL  a Poll Request that PGM parent nodes multicast to the group to
         perform the poll.  Similarly to NCFs, POLL packets stop at the
         first PGM node they reach, as they are not forwarded by PGM
         network elements.

   POLR  a Poll Response that PGM children nodes (either network elements
         or receivers) use to reply to a Poll Request by addressing it
         to the NLA of the interface from which the triggering POLL was
         sent.

   The polling mechanism dictates that PGM children nodes that receive a
   POLL packet reply to it only if certain conditions are satisfied and
   ignore the POLL otherwise.  Two types of condition are possible: a
   random condition that defines a probability of replying for the
   polled child, and a deterministic condition.  Both the random
   condition and the deterministic condition are controlled by the
   polling PGM parent node by specifying the probability of replying and
   defining the deterministic condition(s) respectively.  Random-only
   poll, deterministic-only poll or a combination of the two are
   possible.

   The random condition in polls allows the prevention of implosion of
   replies by controlling their number.  Given a probability of replying
   P and assuming that each receiver makes an independent decision, the
   number of expected replies to a poll is P*N where N is the number of
   PGM children relative to the polling PGM parent.  The polling node
   can control the number of expected replies by specifying P in the
   POLL packet.

14.2.  Packet Contents

   This section just provides enough short-hand to make the Procedures
   intelligible.  For the full details of packet contents, please refer
   to Packet Formats below.

14.2.1.  POLL (Poll Request)

   POLL_SQN       a sequence number assigned sequentially by the polling
                  parent in unit increments and scoped by POLL_PATH and
                  the TSI of the session.

   POLL_ROUND     a poll round sequence number.  Multiple poll rounds
                  are possible within a POLL_SQN.

   POLL_S_TYPE    the sub-type of the poll request

   POLL_PATH      the network-layer address (NLA) of the interface on
                  the PGM network element or source on which the POLL is
                  transmitted

   POLL_BO_IVL    the back-off interval that MUST be used to compute the
                  random back-off time to wait before sending the
                  response to a poll.  POLL_BO_IVL is expressed in
                  microseconds.

   POLL_RAND      a random string used to implement the randomness in
                  replying

   POLL_MASK      a bit-mask used to determine the probability of random
                  replies

   Poll request MAY also contain options which specify deterministic
   conditions for the reply.  No options are currently defined.

14.2.2.  POLR (Poll Response)

   POLR_SQN       POLL_SQN of the poll request for which this is a reply

   POLR_ROUND     POLL_ROUND of the poll request for which this is a
                  reply

   Poll response MAY also contain options.  No options are currently
   defined.

14.3.  Procedures - General

14.3.1.  General Polls

   General Polls may be used to check for and count PGM children that
   are 1 PGM hop downstream of an interface of a given node.  They have
   POLL_S_TYPE equal to PGM_POLL_GENERAL.  PGM children that receive a
   general poll decide whether to reply to it only based on the random
   condition present in the POLL.

   To prevent response implosion, PGM parents that initiate a general
   poll SHOULD establish the probability of replying to the poll, P, so
   that the expected number of replies is contained.  The expected
   number of replies is N * P, where N is the number of children.  To be
   able to compute this number, PGM parents SHOULD already have a rough
   estimate of the number of children.  If they do not have a recent
   estimate of this number, they SHOULD send the first poll with a very
   low probability of replying and increase it in subsequent polls in
   order to get the desired number of replies.

   To prevent poll-response implosion caused by a sudden increase in the
   children population occurring between two consecutive polls with
   increasing probability of replying, PGM parents SHOULD use poll
   rounds.  Poll rounds allow PGM parents to "freeze" the size of the
   children population when they start a poll and to maintain it
   constant across multiple polls (with the same POLL_SQN but different
   POLL_ROUND).  This works by allowing PGM children to respond to a
   poll only if its POLL_ROUND is zero or if they have previously
   received a poll with the same POLL_SQN and POLL_ROUND equal to zero.

   In addition to this PGM children MUST observe a random back-off in
   replying to a poll.  This spreads out the replies in time and allows
   a PGM parent to abort the poll if too many replies are being
   received.  To abort an ongoing poll a PGM parent MUST initiate
   another poll with different POLL_SQN.  PGM children that receive a
   POLL MUST cancel any pending reply for POLLs with POLL_SQN different
   from the one of the last POLL received.

   For a given poll with probability of replying P, a PGM parent
   estimates the number of children as M / P, where M is the number of
   responses received.  PGM parents SHOULD keep polling periodically and
   use some average of the result of recent polls as their estimate for
   the number of children.
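
   The estimate M / P and its averaging over recent polls can be
   sketched as below.  The class name is hypothetical, and a simple
   mean over a fixed window is only one possible choice of "some
   average".

```python
from collections import deque
from statistics import fmean


class ChildEstimator:
    """Track an averaged estimate of the number of PGM children."""

    def __init__(self, window: int = 4):
        self.samples = deque(maxlen=window)  # per-poll estimates M / P

    def add_poll(self, responses: int, p: float) -> float:
        """Record one poll's result and return the current average."""
        if not 0.0 < p <= 1.0:
            raise ValueError("p must be within (0.0, 1.0]")
        self.samples.append(responses / p)
        return fmean(self.samples)
```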

14.3.2.  Specific Polls

   Specific polls provide a way to search for PGM children that meet
   specific requirements.  For example, a specific poll could be used to
   search for down-stream DLRs.  A specific poll is characterized by a
   POLL_S_TYPE different from PGM_POLL_GENERAL.  PGM children decide
   whether to reply to a specific poll or not based on the POLL_S_TYPE,
   on the random condition and on options possibly present in the POLL.
   The way options should be interpreted is defined by POLL_S_TYPE.  The
   random condition MUST be interpreted as an additional condition to be
   satisfied.  To disable the random condition PGM parents MUST specify
   a probability of replying P equal to 1.

   PGM children MUST ignore a POLL packet if they do not understand
   POLL_S_TYPE.  Some specific POLL_S_TYPE may also require that the
   children ignore a POLL if they do not fully understand all the PGM
   options present in the packet.

14.4.  Procedures - PGM Parents (Sources or Network Elements)

   A PGM parent (source or network element), that wants to poll the
   first PGM-hop children connected to one of its outgoing interfaces
   MUST send a POLL packet on that interface with:

   POLL_SQN       equal to POLL_SQN of the last POLL sent incremented by
                  one.  If poll rounds are used, this must be equal to
                  POLL_SQN of the last group of rounds incremented by
                  one.

   POLL_ROUND     The round of the poll.  If the poll has a single
                  round, this must be zero.  If the poll has multiple
                  rounds, this must be one plus the last POLL_ROUND for
                  the same POLL_SQN, or zero if this is the first round
                  within this POLL_SQN.

   POLL_S_TYPE    the type of the poll.  For general poll use
                  PGM_POLL_GENERAL

   POLL_PATH      set to the NLA of the outgoing interface

   POLL_BO_IVL    set to the wanted reply back-off interval.  As far as
                  the choice of this is concerned, using NAK_BO_IVL is
                  usually a conservative option, however a smaller value
                  MAY be used, if the number of expected replies can be
                  determined with a good confidence or if a
                  conservatively low probability of reply (P) is being
                  used (see POLL_MASK next).  When the number of
                  expected replies is unknown, a large POLL_BO_IVL
                  SHOULD be used, so that the poll can be effectively
                  aborted if the number of replies being received is too
                  large.

   POLL_RAND      MUST be a random string re-computed each time a new
                  poll is sent on a given interface

   POLL_MASK      determines the probability of replying, P,  according
                  to the relationship P = 1 / ( 2 ^ B ), where B is the
                  number of bits set in POLL_MASK [15].  If this is a
                  deterministic poll, B MUST be 0, i.e. POLL_MASK MUST
                  be an all-zeroes bit-mask.

      Nota Bene: POLLs transmitted by network elements MUST bear the
      ODATA source's network-header source address, not the network
      element's NLA.  POLLs MUST also be transmitted with the IP Router
      Alert Option [6], to allow PGM network elements to intercept
      them.
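
   To illustrate the relationship P = 1 / ( 2 ^ B ), the sketch below
   chooses B so that the expected number of replies N * P does not
   exceed a target, and builds a mask with B bits set.  The function
   names are hypothetical, and setting the low-order bits is an
   arbitrary choice of this sketch: only the number of bits set
   determines P.

```python
import math


def mask_bits_for(n_children: int, target_replies: int) -> int:
    """Smallest B such that n_children / 2**B <= target_replies."""
    if n_children <= target_replies:
        return 0
    return math.ceil(math.log2(n_children / target_replies))


def make_poll_mask(bits: int) -> int:
    """Build a POLL_MASK with the given number of bits set."""
    return (1 << bits) - 1


def reply_probability(poll_mask: int) -> float:
    """P = 1 / 2**B, where B is the number of bits set in POLL_MASK."""
    return 1.0 / (2 ** bin(poll_mask).count("1"))
```

   For an estimated 1000 children and a target of 10 replies,
   mask_bits_for returns 7, giving P = 1/128 and about 7.8 expected
   replies.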

   A PGM parent that has started a poll by sending a POLL packet SHOULD
   wait at least POLL_BO_IVL before starting another poll.  During this
   interval it SHOULD collect all the valid responses (those with
   POLR_SQN and POLR_ROUND matching the outstanding poll) and process
   them at the end of the collection interval.

   A PGM parent SHOULD observe the rules mentioned in the description of
   general procedures, to prevent implosion of responses.  These rules
   should in general be observed both for general polls and specific
   polls.  The latter however can be performed using deterministic polls
   (with no implosion prevention) if the expected number of replies is
   known to be small.  A PGM parent that issues a general poll with the
   intent of estimating the children population size SHOULD use poll
   rounds to "freeze" the children that are involved in the measurement
   process.  This allows the sender to "open the door wider" across
   subsequent rounds (by increasing the probability of response),
   without fear of being flooded by late joiners.  Note the use of
   rounds has the drawback of introducing additional delay in the
   estimate of the population size, as the estimate obtained at the end
   of a round-group refers to the condition present at the time of the
   first round.

   A PGM parent that has started a poll SHOULD monitor the number of
   replies during the collection phase.  If this becomes too large, the
   PGM parent SHOULD abort the poll by immediately starting a new poll
   (different POLL_SQN) and specifying a very low probability of
   replying.


   When polling is being used to estimate the receiver population for
   the purpose of calculating NAK_BO_IVL, OPT_NAK_BO_IVL (see 16.4.1
   below) MUST be appended to SPMs, MAY be appended to NCFs and POLLs,
   and in all cases MUST have NAK_BO_IVL_SQN set to POLL_SQN of the most
   recent complete round of polls, and MUST bear that round's
   corresponding derived value of NAK_BO_IVL.  In this way,
   OPT_NAK_BO_IVL provides a current value for NAK_BO_IVL at the same
   time as information is being gathered for the calculation of a future
   value of NAK_BO_IVL.

14.5.  Procedures - PGM Children (Receivers or Network Elements)

   PGM receivers and network elements MUST compute a 32-bit random node
   identifier (RAND_NODE_ID) at startup time.  When a PGM child
   (receiver or network element) receives a POLL it MUST use its
   RAND_NODE_ID to match POLL_RAND of incoming POLLs.  The match is
   limited to the bits specified by POLL_MASK.  If the incoming POLL
   contains a POLL_MASK made of all zeroes, the match is successful
   regardless of the content of POLL_RAND (deterministic reply).  If the
   match
   fails, then the receiver or network element MUST discard the POLL
   without any further action, otherwise it MUST check the fields
   POLL_ROUND, POLL_S_TYPE and any PGM option included in the POLL to
   determine whether it SHOULD reply to the poll.
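
   The matching rule above can be sketched as follows.  This is a
   non-normative illustration: the function names are hypothetical and
   not part of PGM, and the reply-probability helper additionally
   assumes that RAND_NODE_ID values are uniformly distributed.

```python
import secrets


def new_rand_node_id() -> int:
    """Pick a 32-bit RAND_NODE_ID once at startup (illustrative helper)."""
    return secrets.randbits(32)


def poll_matches(rand_node_id: int, poll_rand: int, poll_mask: int) -> bool:
    """Return True if the POLL selects this node.

    Only the bits set in POLL_MASK are compared; an all-zero mask
    always matches (deterministic reply).
    """
    return ((rand_node_id ^ poll_rand) & poll_mask & 0xFFFFFFFF) == 0


def reply_probability(poll_mask: int) -> float:
    """Assumption: with uniformly random IDs, each mask bit halves the odds."""
    return 2.0 ** -bin(poll_mask & 0xFFFFFFFF).count("1")
```

   Under that assumption, a parent that wants roughly one child in
   eight to answer would set three bits in POLL_MASK.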

   If POLL_ROUND is non-zero and the PGM receiver has not received a
   previous poll with the same POLL_SQN and a zero POLL_ROUND, it MUST
   discard the poll without further action.

   If POLL_S_TYPE is equal to PGM_POLL_GENERAL, the PGM child MUST
   schedule a reply to the POLL regardless of any PGM options present
   on the POLL packet.

   If POLL_S_TYPE is different from PGM_POLL_GENERAL, the decision on
   whether a reply should be scheduled depends on the actual type and on
   the options possibly present in the POLL.

   If POLL_S_TYPE is unknown to the recipient of the POLL, it MUST NOT
   reply and MUST ignore the poll.  Currently the only POLL_S_TYPE
   values defined are PGM_POLL_GENERAL and PGM_POLL_DLR.

   If a PGM receiver or network element has decided to reply to a POLL,
   it MUST schedule the transmission of a single POLR at a random time
   in the future.  The random delay is chosen in the interval [0,
   POLL_BO_IVL].  POLL_BO_IVL is the one contained in the POLL received.
   When this timer expires, it MUST send a POLR using POLL_PATH of the
   received POLL as destination address.  POLR_SQN MUST be equal to
   POLL_SQN and POLR_ROUND MUST be equal to POLL_ROUND.  The POLR MAY
   contain PGM options according to the semantics of POLL_S_TYPE or the
   semantics of any PGM options present in the POLL.  If POLL_S_TYPE is
   PGM_POLL_GENERAL, no option is REQUIRED.

   A PGM receiver or network element MUST cancel any pending
   transmission of POLRs if a new POLL is received with POLL_SQN
   different from POLR_SQN of the poll that scheduled POLRs.
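
   A minimal sketch of the reply scheduling described above is given
   below.  The class and field names are hypothetical, time is
   modelled as an integer microsecond clock supplied by the caller,
   and the decision whether to reply is passed in as a flag.  Letting
   a later round of the same poll replace a still-pending POLR is a
   simplification of this sketch, not a rule stated by this
   specification.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingPolr:
    polr_sqn: int      # copied from POLL_SQN
    polr_round: int    # copied from POLL_ROUND
    dest_nla: str      # POLL_PATH of the received POLL
    send_at_us: int    # absolute send time in microseconds


class PolrScheduler:
    """Hypothetical child-side state: at most one pending POLR."""

    def __init__(self, rng: Optional[random.Random] = None):
        self.rng = rng or random.Random()
        self.pending: Optional[PendingPolr] = None

    def on_poll(self, now_us, poll_sqn, poll_round, poll_path,
                poll_bo_ivl_us, reply):
        # A POLL with a different POLL_SQN cancels any pending POLR.
        if self.pending is not None and self.pending.polr_sqn != poll_sqn:
            self.pending = None
        if reply:
            # Random delay chosen in [0, POLL_BO_IVL], both ends included.
            delay = self.rng.randint(0, poll_bo_ivl_us)
            self.pending = PendingPolr(poll_sqn, poll_round, poll_path,
                                       now_us + delay)

    def poll_timer(self, now_us):
        """Return the POLR to send if its timer has expired, else None."""
        if self.pending is not None and now_us >= self.pending.send_at_us:
            polr, self.pending = self.pending, None
            return polr
        return None
```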

14.6.  Constant Definition

   The POLL_S_TYPE values currently defined are:

      PGM_POLL_GENERAL  0

      PGM_POLL_DLR      1

14.7.  Packet Formats

   The packet formats described in this section are transport-layer
   headers that MUST immediately follow the network-layer header in the
   packet.

   The descriptions of Data-Source Port, Data-Destination Port, Options,
   Checksum, Global Source ID (GSI), and TSDU Length are those provided
   in Section 8.

14.7.1.  Poll Request

   POLLs are sent by PGM parents (sources or network elements) to
   initiate a poll among their first PGM-hop children.

   POLLs are sent to the ODATA multicast group.  The network-header
   source address of a POLL is the ODATA source's NLA.  POLLs MUST be
   transmitted with the IP Router Alert Option.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Source Port           |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |    Options    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Global Source ID                   ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...    Global Source ID       |           TSDU Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    POLL's Sequence Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         POLL's Round          |       POLL's Sub-type         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            NLA AFI            |          Reserved             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Path NLA                     ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...-+-+
   |                  POLL's  Back-off Interval                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Random String                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Matching Bit-Mask                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Option Extensions when present ...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Source Port:

      POLL_SPORT

      Data-Source Port, together with POLL_GSI forms POLL_TSI

   Destination Port:

      POLL_DPORT

      Data-Destination Port

   Type:

      POLL_TYPE = 0x01

   Global Source ID:

      POLL_GSI

      Together with POLL_SPORT forms POLL_TSI

   POLL's Sequence Number

      POLL_SQN

      The sequence number assigned to the POLL by the originator.

   POLL's Round

      POLL_ROUND

      The round number within the poll sequence number.

   POLL's Sub-type

      POLL_S_TYPE

      The sub-type of the poll request.

   Path NLA:

      POLL_PATH

      The NLA of the interface on the source or network element on which
      this POLL was forwarded.

   POLL's Back-off Interval

      POLL_BO_IVL

      The back-off interval used to compute a random back-off for the
      reply, expressed in microseconds.

   Random String

      POLL_RAND

      A random string used to implement the random condition in
      replying.

   Matching Bit-Mask

      POLL_MASK

      A  bit-mask used to determine the probability of random replies.
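
   As a non-normative illustration of the POLL layout above, the
   following sketch packs and parses the fixed fields with Python's
   struct module.  It assumes an IPv4 Path NLA (4 bytes, with the NLA
   AFI set to 1, the IANA address family number for IPv4), carries no
   option extensions, and leaves Checksum and TSDU Length at zero as
   placeholders.  The function and dictionary key names are
   hypothetical.

```python
import socket
import struct

# Fixed part of a POLL with a 4-byte (IPv4) Path NLA, network byte order.
POLL_FMT = "!HHBBH6sHIHHHH4sIII"
POLL_TYPE = 0x01
AFI_IPV4 = 1  # assumption: IANA address family number for IPv4


def pack_poll(sport, dport, gsi, sqn, rnd, s_type, path_ipv4,
              bo_ivl_us, rand, mask):
    """Build a POLL without options; checksum and TSDU length are left 0."""
    return struct.pack(
        POLL_FMT,
        sport, dport, POLL_TYPE, 0, 0,   # ports, type, options, checksum
        gsi, 0,                          # 6-byte GSI, TSDU length
        sqn, rnd, s_type,                # POLL_SQN, POLL_ROUND, POLL_S_TYPE
        AFI_IPV4, 0,                     # NLA AFI, reserved
        socket.inet_aton(path_ipv4),     # POLL_PATH
        bo_ivl_us, rand, mask,           # POLL_BO_IVL, POLL_RAND, POLL_MASK
    )


def unpack_poll(data):
    """Parse the fixed fields back into a dict (illustrative key names)."""
    f = struct.unpack(POLL_FMT, data[:struct.calcsize(POLL_FMT)])
    return {
        "sport": f[0], "dport": f[1], "type": f[2], "gsi": f[5],
        "sqn": f[7], "round": f[8], "s_type": f[9],
        "path": socket.inet_ntoa(f[12]),
        "bo_ivl_us": f[13], "rand": f[14], "mask": f[15],
    }
```

   With these assumptions the fixed part of a POLL occupies 44 octets.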

14.7.2.  Poll Response

   POLRs are sent by PGM children (receivers or network elements) to
   reply to a POLL.

   The network-header source address of a POLR is the unicast NLA of the
   entity that originates the POLR.  The network-header destination
   address of a POLR is set by the originator of the POLR to the unicast
   NLA of the upstream PGM element (source or network element), taken
   from POLL_PATH of the POLL that triggered the POLR.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Source Port           |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |    Options    |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Global Source ID                   ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...    Global Source ID       |           TSDU Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    POLR's Sequence Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         POLR's Round          |           reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Option Extensions when present ...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- ... -+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Source Port:

      POLR_SPORT

      Data-Destination Port

   Destination Port:

      POLR_DPORT

      Data-Source Port, together with Global Source ID forms POLR_TSI

   Type:

      POLR_TYPE = 0x02

   Global Source ID:

      POLR_GSI

      Together with POLR_DPORT forms POLR_TSI

   POLR's Sequence Number

      POLR_SQN

      The sequence number (POLL_SQN) of the POLL packet for which this
      is a reply.

   POLR's Round

      POLR_ROUND

      The round number (POLL_ROUND) of the POLL packet for which this is
      a reply.
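
   The sketch below, which is illustrative only, builds a POLR from
   the fields of the POLL being answered.  It shows the port reversal
   (POLR_SPORT carries the Data-Destination Port and POLR_DPORT the
   Data-Source Port) and the echoing of POLL_SQN and POLL_ROUND.  The
   function name is hypothetical; Checksum and TSDU Length are left at
   zero as placeholders and no option extensions are included.

```python
import struct

POLR_FMT = "!HHBBH6sHIHH"   # common header, POLR_SQN, POLR_ROUND, reserved
POLR_TYPE = 0x02


def pack_polr(poll_sport, poll_dport, gsi, poll_sqn, poll_round):
    """Build a POLR answering a POLL; checksum and TSDU length left 0."""
    return struct.pack(
        POLR_FMT,
        poll_dport,        # POLR_SPORT carries the Data-Destination Port
        poll_sport,        # POLR_DPORT carries the Data-Source Port
        POLR_TYPE, 0, 0,   # type, options, checksum
        gsi, 0,            # 6-byte GSI, TSDU length
        poll_sqn,          # POLR_SQN echoes POLL_SQN
        poll_round,        # POLR_ROUND echoes POLL_ROUND
        0,                 # reserved
    )
```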

15.  Appendix E - Implosion Prevention

15.1.  Introduction

   These procedures are intended to prevent NAK implosion and to limit
   its extent in case of the loss of all or part of the suppressing
   multicast distribution tree.  They also provide a means to adaptively
   tune the NAK back-off interval, NAK_BO_IVL.

   The PGM virtual topology is established and refreshed by SPMs.
   Between one SPM and the next, PGM nodes may have an out-of-date view
   of the PGM topology due to multicast routing changes, flapping, or a
   link/router failure.  If any of the above happens relative to a PGM
   parent node, a potential NAK implosion problem arises because the
   parent node is unable to suppress the generation of duplicate NAKs as
   it cannot reach its children using NCFs.  The procedures described
   below introduce an alternative way of performing suppression in this
   case.  They also attempt to prevent implosion by adaptively tuning
   NAK_BO_IVL.

15.2.  Tuning NAK_BO_IVL

   Sources and network elements continuously monitor the number of
   duplicated NAKs received and use this observation to tune the NAK
   back-off interval (NAK_BO_IVL) for the first PGM-hop receivers
   connected to them.  Receivers learn the current value of NAK_BO_IVL
   through OPT_NAK_BO_IVL appended to NCFs or SPMs.

15.2.1.  Procedures - Sources and Network Elements

   For each TSI, sources and network elements advertise the value of
   NAK_BO_IVL that their first PGM-hop receivers should use.  They
   advertise a separate value on all the outgoing interfaces for the TSI
   and keep track of the last values advertised.

   For each interface and TSI, sources and network elements count the
   number of NAKs received for a specific repair state (i.e., per
   sequence number per TSI) from the time the interface was first added
   to the repair state list until the time the repair state is
   discarded.  Then they use this number to tune the current value of
   NAK_BO_IVL as follows:

      Increase the current value of NAK_BO_IVL when the first duplicate
      NAK is received for a given SQN on a particular interface.

      Decrease the value of NAK_BO_IVL if no duplicate NAKs are received
      on a particular interface for the last NAK_PROBE_NUM measurements,
      where each measurement corresponds to the creation of a new repair
      state.

   An upper and lower limit are defined for the possible value of
   NAK_BO_IVL at any time.  These are NAK_BO_IVL_MAX and NAK_BO_IVL_MIN
   respectively.  The initial value that should be used as a starting
   point to tune NAK_BO_IVL is NAK_BO_IVL_DEFAULT.  The policies
   RECOMMENDED for increasing and decreasing NAK_BO_IVL are multiplying
   by two and dividing by two respectively.
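
   The tuning policy above can be sketched as follows, using the
   RECOMMENDED doubling and halving and clamping the result to
   [NAK_BO_IVL_MIN, NAK_BO_IVL_MAX].  The class and method names are
   hypothetical.  As a simplifying assumption, a measurement is
   counted when the repair state is discarded, since only then is it
   known whether duplicates arrived for it.

```python
class NakBoIvlTuner:
    """Per-interface, per-TSI tuning state (names are illustrative)."""

    def __init__(self, default_us, min_us, max_us, probe_num):
        self.ivl_us = default_us       # starts at NAK_BO_IVL_DEFAULT
        self.min_us, self.max_us = min_us, max_us
        self.probe_num = probe_num     # NAK_PROBE_NUM
        self.clean_states = 0          # consecutive states without duplicates
        self.nak_count = {}            # SQN -> NAKs seen for that repair state

    def on_nak(self, sqn):
        self.nak_count[sqn] = self.nak_count.get(sqn, 0) + 1
        if self.nak_count[sqn] == 2:   # first duplicate NAK for this SQN
            self.ivl_us = min(self.ivl_us * 2, self.max_us)
            self.clean_states = 0

    def on_repair_state_discarded(self, sqn):
        if self.nak_count.pop(sqn, 0) <= 1:   # no duplicates were seen
            self.clean_states += 1
            if self.clean_states >= self.probe_num:
                self.ivl_us = max(self.ivl_us // 2, self.min_us)
                self.clean_states = 0
```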

   Sources and network elements advertise the current value of
   NAK_BO_IVL through the OPT_NAK_BO_IVL that they append to NCFs.  They
   MAY also append OPT_NAK_BO_IVL to outgoing SPMs.

   In order to avoid forwarding the NAK_BO_IVL advertised by the parent,
   network elements must be able to recognize OPT_NAK_BO_IVL.  Network
   elements that receive SPMs containing OPT_NAK_BO_IVL MUST either
   remove the option or over-write its content (NAK_BO_IVL) with the
   current value of NAK_BO_IVL for the outgoing interface(s), before
   forwarding the SPMs.

   Sources MAY advertise the value of NAK_BO_IVL_MAX and NAK_BO_IVL_MIN
   to the session by appending an OPT_NAK_BO_RNG to SPMs.

15.2.2.  Procedures - Receivers

   Receivers learn the value of NAK_BO_IVL to use through the option
   OPT_NAK_BO_IVL, when this is present in NCFs or SPMs.  A value for
   NAK_BO_IVL learned from OPT_NAK_BO_IVL MUST NOT be used by a receiver
   unless either NAK_BO_IVL_SQN is zero, or the receiver has seen
   POLL_ROUND == 0 for POLL_SQN <= NAK_BO_IVL_SQN within half the
   sequence number space.  The initial value of NAK_BO_IVL is set to
   NAK_BO_IVL_DEFAULT.
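
   One possible reading of the acceptance rule above is sketched
   below.  It interprets "within half the sequence number space" as a
   modular comparison of 32-bit sequence numbers; that interpretation,
   the function names, and the way the receiver remembers the polls it
   has seen are all assumptions of this sketch.

```python
SQN_MOD = 1 << 32
HALF = 1 << 31


def sqn_lte(a: int, b: int) -> bool:
    """True if a <= b when compared within half the 32-bit sequence space."""
    return ((b - a) % SQN_MOD) < HALF


def may_use_nak_bo_ivl(nak_bo_ivl_sqn: int, round0_poll_sqns) -> bool:
    """Apply the acceptance rule to a received OPT_NAK_BO_IVL.

    round0_poll_sqns: POLL_SQNs for which this receiver saw a POLL with
    POLL_ROUND equal to zero (how long they are kept is a local choice).
    """
    if nak_bo_ivl_sqn == 0:
        return True   # not derived from polling; usable immediately
    return any(sqn_lte(s, nak_bo_ivl_sqn) for s in round0_poll_sqns)
```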

   Receivers that receive an SPM containing OPT_NAK_BO_RNG MUST use its
   content to set the local values of NAK_BO_IVL_MAX and NAK_BO_IVL_MIN.

15.2.3.  Adjusting NAK_BO_IVL in the absence of NAKs

   Monitoring the number of duplicate NAKs provides a means to track
   indirectly the change in the size of first PGM-hop receiver
   population and adjust NAK_BO_IVL accordingly.  Note that the number
   of duplicate NAKs for a given SQN is related to the number of first
   PGM-hop children that scheduled (or forwarded) a NAK and not to the
   absolute number of first PGM-hop children.  This mechanism, however,
   does not work in the absence of packet loss, hence a large number of
   duplicate NAKs is possible after a period without NAKs, if many new
   receivers have joined the session in the meanwhile.  To address this
   issue, PGM Sources and network elements SHOULD periodically poll the
   number of first PGM-hop children using the "general poll" procedures
   described in Appendix D.  If the result of the polls shows that the
   population size has increased significantly during a period without
   NAKs, they SHOULD increase NAK_BO_IVL as a safety measure.

15.3.  Containing Implosion in the Presence of Network Failures

15.3.1.  Detecting Network Failures

   In some cases PGM (parent) network elements can promptly detect the
   loss of all or part of the suppressing multicast distribution tree
   (due to network failures or route changes) by checking their
   multicast connectivity, when they receive NAKs.  In some other cases
   this is not possible as the connectivity problem might occur at some
   other non-PGM node downstream or might take time to reflect in the
   multicast routing table.  To address these latter cases, PGM uses a
   simple heuristic: a failure is assumed for a TSI when the count of
   duplicated NAKs received for a repair state reaches the value
   DUP_NAK_MAX in one of the interfaces.
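
   The DUP_NAK_MAX heuristic can be sketched as a per-interface
   counter, as below.  The names are hypothetical, the first NAK for a
   repair state is not counted as a duplicate, and clearing the
   assumed failure on receipt of a regular SPM is an assumption of
   this sketch.

```python
from collections import defaultdict


class FailureDetector:
    """Counts duplicate NAKs per (interface, SQN) repair state for one TSI."""

    def __init__(self, dup_nak_max: int):
        self.dup_nak_max = dup_nak_max
        self.naks = defaultdict(int)    # (interface, sqn) -> NAKs received
        self.failed_interfaces = set()

    def on_nak(self, interface: str, sqn: int) -> bool:
        """Record a NAK; return True if a failure is assumed on interface."""
        self.naks[(interface, sqn)] += 1
        duplicates = self.naks[(interface, sqn)] - 1   # first NAK is not one
        if duplicates >= self.dup_nak_max:
            self.failed_interfaces.add(interface)
        return interface in self.failed_interfaces

    def on_regular_spm(self):
        """Assumption: a regular SPM from the source clears the state."""
        self.failed_interfaces.clear()
```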

15.3.2.  Containing Implosion

   When a PGM source or network element detects or assumes a failure for
   which it loses multicast connectivity to down-stream PGM agents
   (either receivers or other network elements), it sends unicast NCFs
   to them in response to NAKs.  Downstream PGM network elements which
   receive unicast NCFs and have multicast connectivity to the multicast
   session send special SPMs to prevent further NAKs until a regular SPM
   sent by the source refreshes the PGM tree.

   Procedures - Sources and Network Elements

   PGM sources or network elements which detect or assume a failure that
   prevents them from reaching down-stream PGM agents through multicast
   NCFs revert to confirming NAKs through unicast NCFs for a given TSI
   on a given interface.  If the PGM agent is the source itself, then it
   MUST generate an SPM for the TSI, in addition to sending the unicast
   NCF.

   Network elements MUST keep using unicast NCFs until they receive a
   regular SPM from the source.

   When a unicast NCF is sent for the reasons described above, it MUST
   contain the OPT_NBR_UNREACH option and the OPT_PATH_NLA option.
   OPT_NBR_UNREACH indicates that the sender is unable to use multicast
   to reach downstream PGM agents.  OPT_PATH_NLA carries the network
   layer address of the NCF sender, namely the NLA of the interface
   leading to the unreachable subtree.

   When a PGM network element receives an NCF containing the
   OPT_NBR_UNREACH option, it MUST ignore it if OPT_PATH_NLA specifies
   an upstream neighbor different from the one currently known to be
   the upstream neighbor for the TSI.  If OPT_PATH_NLA matches the
   address of the current upstream neighbor, the network element MUST
   stop forwarding NAKs for the TSI until it receives a regular SPM for
   the TSI.  In addition, it MUST also generate a special SPM to prevent
   downstream receivers from sending more NAKs.  This special SPM MUST
   contain the OPT_NBR_UNREACH option and SHOULD have an SPM_SQN equal
   to SPM_SQN of the last regular SPM forwarded.  The OPT_NBR_UNREACH
   option invalidates the windowing information in SPMs (SPM_TRAIL and
   SPM_LEAD).  The PGM network element that adds the OPT_NBR_UNREACH
   option SHOULD invalidate the windowing information by setting
   SPM_TRAIL to 0 and SPM_LEAD to 0x80000000.

   PGM network elements which receive an SPM containing the
   OPT_NBR_UNREACH option and whose SPM_PATH matches the currently known
   PGM parent MUST forward it in the normal way and MUST stop
   forwarding NAKs for the TSI until they receive a regular SPM for the
   TSI.  If the SPM_PATH does not match the currently known PGM parent,
   the SPM containing the OPT_NBR_UNREACH option MUST be ignored.

   Procedures - Receivers

   PGM receivers which receive either an NCF or an SPM containing the
   OPT_NBR_UNREACH option MUST stop sending NAKs until a regular SPM is
   received for the TSI.

   On reception of a unicast NCF containing the OPT_NBR_UNREACH option
   receivers MUST generate a multicast copy of the packet with TTL set
   to one on the RPF interface for the data source.  This will prevent
   other receivers in the same subnet from generating NAKs.

   Receivers MUST ignore windowing information in SPMs which contain the
   OPT_NBR_UNREACH option.

   Receivers MUST ignore NCFs containing the OPT_NBR_UNREACH option if
   the OPT_PATH_NLA specifies a neighbor different from the one
   currently known to be the PGM parent neighbor.  Similarly, receivers
   MUST ignore SPMs containing the OPT_NBR_UNREACH option if SPM_PATH
   does not match the current PGM parent.
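
   The receiver procedures above can be summarized by the following
   sketch.  It is illustrative only: the class and method names are
   hypothetical, addresses are compared as plain strings, and packet
   transmission is left to the caller.

```python
class ReceiverNakGate:
    """Tracks whether a receiver may send NAKs for one TSI (illustrative)."""

    def __init__(self, parent_nla: str):
        self.parent_nla = parent_nla   # currently known PGM parent
        self.naks_allowed = True

    def on_ncf(self, nbr_unreach: bool, path_nla: str, unicast: bool) -> bool:
        """Process an NCF; return True if a TTL-1 multicast copy is needed."""
        if not nbr_unreach:
            return False
        if path_nla != self.parent_nla:
            return False               # OPT_PATH_NLA names another neighbor
        self.naks_allowed = False      # stop NAKs until a regular SPM arrives
        return unicast                 # only unicast NCFs are re-multicast

    def on_spm(self, nbr_unreach: bool, spm_path: str) -> bool:
        """Process an SPM; return True if its window information is usable."""
        if nbr_unreach:
            if spm_path == self.parent_nla:
                self.naks_allowed = False
            return False               # SPM_TRAIL and SPM_LEAD are invalid
        self.naks_allowed = True       # a regular SPM re-enables NAKs
        return True
```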

15.4.  Packet Formats

15.4.1.  OPT_NAK_BO_IVL - Packet Extension Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E| Option Type | Option Length |Reserved |F|OPX|U|             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     NAK Back-Off Interval                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   NAK Back-Off Interval SQN                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Option Type = 0x04

   NAK Back-Off Interval

      The value of NAK-generation Back-Off Interval in microseconds.

   NAK Back-Off Interval Sequence Number

      The POLL_SQN to which this value of NAK_BO_IVL corresponds.  Zero
      is reserved and means NAK_BO_IVL is NOT being determined through
      polling (see Appendix D) and may be used immediately.  Otherwise,
      NAK_BO_IVL MUST NOT be used unless the receiver has also seen
      POLL_ROUND == 0 for POLL_SQN <= NAK_BO_IVL_SQN within half the
      sequence number space.

   OPT_NAK_BO_IVL MAY be appended to NCFs, SPMs, or POLLs.

   OPT_NAK_BO_IVL is network-significant.
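
   A non-normative encoding sketch for this option follows.  It
   assumes that the E bit is the most significant bit of the first
   octet, that Option Length counts the whole option in octets (12
   here), and that the reserved, F, OPX and U bits can be sent as
   zero.  None of these assumptions is established by this section,
   and the function names are hypothetical.

```python
import struct

OPT_NAK_BO_IVL = 0x04
OPT_END = 0x80   # assumption: the E bit is the top bit of the first octet


def pack_opt_nak_bo_ivl(nak_bo_ivl_us: int, nak_bo_ivl_sqn: int,
                        last: bool = True) -> bytes:
    """Encode OPT_NAK_BO_IVL with all reserved and flag bits cleared."""
    first = OPT_NAK_BO_IVL | (OPT_END if last else 0)
    length = 12   # 4-octet option header plus two 32-bit fields (assumed)
    return struct.pack("!BBHII", first, length, 0,
                       nak_bo_ivl_us, nak_bo_ivl_sqn)


def unpack_opt_nak_bo_ivl(data: bytes):
    """Return (NAK_BO_IVL in microseconds, NAK_BO_IVL_SQN)."""
    first, length, _flags, ivl, sqn = struct.unpack("!BBHII", data[:12])
    if (first & 0x7F) != OPT_NAK_BO_IVL or length != 12:
        raise ValueError("not an OPT_NAK_BO_IVL option")
    return ivl, sqn
```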

15.4.2.  OPT_NAK_BO_RNG - Packet Extension Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E| Option Type | Option Length |Reserved |F|OPX|U|             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Maximum  NAK Back-Off Interval                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Minimum  NAK Back-Off Interval                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Option Type = 0x05

   Maximum NAK Back-Off Interval

      The maximum value of NAK-generation Back-Off Interval in
      microseconds.

   Minimum NAK Back-Off Interval

      The minimum value of NAK-generation Back-Off Interval in
      microseconds.

   OPT_NAK_BO_RNG MAY be appended to SPMs.

   OPT_NAK_BO_RNG is network-significant.

15.4.3.  OPT_NBR_UNREACH - Packet Extension Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E| Option Type | Option Length |Reserved |F|OPX|U|             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Option Type = 0x0B

      When present in SPMs, it invalidates the windowing information.

   OPT_NBR_UNREACH MAY be appended to SPMs and NCFs.

   OPT_NBR_UNREACH is network-significant.

15.4.4.  OPT_PATH_NLA - Packet Extension Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E| Option Type | Option Length |Reserved |F|OPX|U|             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Path NLA                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Option Type = 0x0C

   Path NLA

      The NLA of the interface on the originating PGM network element
      that it uses to send multicast SPMs to the recipient of the packet
      containing this option.

   OPT_PATH_NLA MAY be appended to NCFs.

   OPT_PATH_NLA is network-significant.

16.  Appendix F - Transmit Window Example

      Nota Bene: The concept of and all references to the increment
      window (TXW_INC) and the window increment (TXW_ADV_SECS)
      throughout this document are for illustrative purposes only.  They
      provide the shorthand with which to describe the concept of
      advancing the transmit window without also implying any particular
      implementation or policy of advancement.

   The size of the transmit window in seconds is simply TXW_SECS.  The
   size of the transmit window in bytes (TXW_BYTES) is (TXW_MAX_RTE *
   TXW_SECS).  The size of the transmit window in sequence numbers
   (TXW_SQNS) is (TXW_BYTES / bytes-per-packet).

   The fraction of the transmit window size (in seconds of data) by
   which the transmit window is advanced (TXW_ADV_SECS) is called the
   window increment.  The trailing (oldest) such fraction of the
   transmit window itself is called the increment window.

   In terms of sequence numbers, the increment window is the range of
   sequence numbers that will be the first to be expired from the
   transmit window.  The trailing (or left) edge of the increment window
   is just TXW_TRAIL, the trailing (or left) edge of the transmit
   window.  The leading (or right) edge of the increment window
   (TXW_INC) is defined as one less than the sequence number of the
   first data packet transmitted by the source TXW_ADV_SECS after
   transmitting TXW_TRAIL.

   A data packet is described as being "in" the transmit or increment
   window, respectively, if its sequence number is in the range defined
   by the transmit or increment window, respectively.

   The transmit window is advanced across the increment window by the
   source when it increments TXW_TRAIL to TXW_INC.  When the transmit
   window is advanced across the increment window, the increment window
   is emptied (i.e., TXW_TRAIL is momentarily equal to TXW_INC), begins
   to refill immediately as transmission proceeds, is full again
   TXW_ADV_SECS later (i.e., TXW_TRAIL is separated from TXW_INC by
   TXW_ADV_SECS of data), at which point the transmit window is advanced
   again, and so on.

16.1.  Advancing across the Increment Window

   In anticipation of advancing the transmit window, the source starts a
   timer TXW_ADV_IVL_TMR which runs for time period TXW_ADV_IVL.
   TXW_ADV_IVL has a value in the range (0, TXW_ADV_SECS).  The value
   MAY be configurable or MAY be determined statically by the strategy
   used for advancing the transmit window.

   When TXW_ADV_IVL_TMR is running, a source MAY reset TXW_ADV_IVL_TMR
   if NAKs are received for packets in the increment window.  In
   addition, a source MAY transmit RDATA in the increment window with
   priority over other data within the transmit window.

   When TXW_ADV_IVL_TMR expires, a source SHOULD advance the trailing
   edge of the transmit window from TXW_TRAIL to TXW_INC.

   Once the transmit window is advanced across the increment window,
   SPM_TRAIL, OD_TRAIL and RD_TRAIL are set to the new value of
   TXW_TRAIL in all subsequent transmitted packets, until the next
   window advancement.

   PGM does not constrain the strategies that a source may use for
   advancing the transmit window.  The source MAY implement any scheme
   or number of schemes.  Three suggested strategies are outlined here.

   Consider the following example:

      Assuming a constant transmit rate of 128kbps and a constant data
      packet size of 1500 bytes, if a source maintains the past 30
      seconds of data for repair and increments its transmit window in 5
      second increments, then

         TXW_MAX_RTE = 16kBps
         TXW_ADV_SECS = 5 seconds,
         TXW_SECS = 35 seconds,
         TXW_BYTES = 560kB,
         TXW_SQNS = 383 (rounded up),

      and the size of the increment window in sequence numbers
      (TXW_MAX_RTE * TXW_ADV_SECS / 1500) = 54 (rounded down).

   Continuing this example, the following is a diagram of the transmit
   window and the increment window therein in terms of sequence numbers.


       TXW_TRAIL                                     TXW_LEAD
          |                                             |
          |                                             |
       |--|--------------- Transmit Window -------------|----|
       v  |                                             |    v
          v                                             v
   n-1 |  n  | n+1 | ... | n+53 | n+54 | ... | n+381 | n+382 | n+383
                            ^
       ^                    |   ^
       |--- Increment Window|---|
                            |
                            |
                         TXW_INC

      So the values of the sequence numbers defining these windows are:

         TXW_TRAIL = n
         TXW_INC = n+53
         TXW_LEAD = n+382

      Nota Bene: In this example the window sizes in terms of sequence
      numbers can be determined only because of the assumption of a
      constant data packet size of 1500 bytes.  When the data packet
      sizes are variable, more or fewer sequence numbers MAY be consumed
      transmitting the same amount (TXW_BYTES) of data.
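
   The figures in this example can be reproduced with the short
   calculation below.  It assumes that "k" denotes 1024, which is the
   reading under which the stated values of 383 and 54 are obtained
   (with k = 1000 the same formulas give 374 and 53).  The starting
   sequence number n is arbitrary.

```python
import math

KB = 1024                            # assumption: "k" means 1024 here

txw_max_rte = 128 * KB // 8          # 128 kbps -> 16 kBps, in bytes/second
txw_adv_secs = 5
txw_secs = 30 + txw_adv_secs         # 30 s kept for repair plus one increment
packet_size = 1500

txw_bytes = txw_max_rte * txw_secs                       # 560 kB
txw_sqns = math.ceil(txw_bytes / packet_size)            # rounded up
inc_sqns = (txw_max_rte * txw_adv_secs) // packet_size   # rounded down

n = 1000                             # arbitrary TXW_TRAIL for illustration
txw_trail = n
txw_inc = n + inc_sqns - 1           # n+53
txw_lead = n + txw_sqns - 1          # n+382

print(txw_bytes // KB, txw_sqns, inc_sqns)   # 560 383 54
```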

   So, for a given transport session identified by a TSI, a source
   maintains:

   TXW_MAX_RTE    a maximum transmit rate in kBytes per second, the
                  cumulative transmit rate of some combination of SPMs,
                  ODATA, and RDATA depending on the transmit window
                  advancement strategy

   TXW_TRAIL      the sequence number defining the trailing edge of the
                  transmit window, the sequence number of the oldest
                  data packet available for repair

   TXW_LEAD       the sequence number defining the leading edge of the
                  transmit window, the sequence number of the most
                  recently transmitted ODATA packet

   TXW_INC        the sequence number defining the leading edge of the
                  increment window, the sequence number of the most
                  recently transmitted data packet amongst those that
                  will expire upon the next increment of the transmit
                  window

   PGM does not constrain the strategies that a source may use for
   advancing the transmit window.  A source MAY implement any scheme or
   number of schemes.  This is possible because a PGM receiver must obey
   the window provided by the source in its packets.  Three strategies
   are suggested within this document.

   In the first, called "Advance with Time", the transmit window
   maintains the last TXW_SECS of data in real-time, regardless of
   whether any data was sent in that real time period or not.  The
   actual number of bytes maintained at any instant in time will vary
   between 0 and TXW_BYTES, depending on traffic during the last
   TXW_SECS.  In this case, TXW_MAX_RTE is the cumulative transmit rate
   of SPMs and ODATA.

   In the second, called "Advance with Data", the transmit window
   maintains the last TXW_BYTES bytes of data for repair.  That is, it
   maintains the theoretical maximum amount of data that could be
   transmitted in the time period TXW_SECS, regardless of when they were
   transmitted.  In this case, TXW_MAX_RTE is the cumulative transmit
   rate of SPMs, ODATA, and RDATA.

   The third strategy leaves control of the window in the hands of the
   application.  The API provided by a source implementation for this
   could allow the application to control the window in terms of APDUs
   and to manually step the window.  This gives a form of Application
   Level Framing (ALF).  In this case, TXW_MAX_RTE is the cumulative
   transmit rate of SPMs, ODATA, and RDATA.

16.2.  Advancing with Data

   In this strategy, TXW_MAX_RTE is calculated from SPMs and both
   ODATA and RDATA, and NAKs reset TXW_ADV_IVL_TMR.  In this mode of
   operation the transmit window maintains the last TXW_BYTES bytes of
   data for repair.  That is, it maintains the theoretical maximum
   amount of data that could be transmitted in the time period TXW_SECS.
   This means that the following timers are not treated as real-time
   timers, instead they are "data driven".  That is, they expire when
   the amount of data that could be sent in the time period they define
   is sent.  They are the SPM ambient time interval, TXW_ADV_SECS,
   TXW_SECS, TXW_ADV_IVL, TXW_ADV_IVL_TMR and the join interval.  Note
   that the SPM heartbeat timers still run in real-time.

   While TXW_ADV_IVL_TMR is running, a source uses the receipt of a NAK
   for ODATA within the increment window to reset timer TXW_ADV_IVL_TMR
   to TXW_ADV_IVL so that transmit window advancement is delayed until
   no NAKs for data in the increment window are seen for TXW_ADV_IVL
   seconds.  If the transmit window should fill in the meantime, further
   transmissions would be suspended until the transmit window can be
   advanced.

   A source MUST advance the transmit window across the increment window
   only upon expiry of TXW_ADV_IVL_TMR.
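
   The interaction between NAKs and TXW_ADV_IVL_TMR described above
   can be sketched as follows.  The class is hypothetical and
   non-normative.  The clock value "now" is supplied by the caller, so
   in this mode it could be a measure of data sent rather than real
   time.  Sequence number wrap-around, restarting the timer, and
   computing the next TXW_INC after an advance are omitted.

```python
class TxwAdvanceTimer:
    """Sketch of TXW_ADV_IVL_TMR; 'now' uses whatever unit the timer runs on."""

    def __init__(self, txw_trail: int, txw_inc: int,
                 txw_adv_ivl: float, now: float):
        self.txw_trail, self.txw_inc = txw_trail, txw_inc
        self.txw_adv_ivl = txw_adv_ivl
        self.expires_at = now + txw_adv_ivl

    def on_nak(self, sqn: int, now: float) -> None:
        # A NAK for data in the increment window restarts the timer.
        if self.txw_trail <= sqn <= self.txw_inc:
            self.expires_at = now + self.txw_adv_ivl

    def poll(self, now: float) -> bool:
        """Advance TXW_TRAIL to TXW_INC on expiry; return True if advanced."""
        if now < self.expires_at:
            return False
        self.txw_trail = self.txw_inc
        return True
```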

   This mode of operation is intended for non-real-time, messaging
   applications based on the receipt of complete data at the expense of
   delay.

16.3.  Advancing with Time

   This strategy advances the transmit window in real-time.  In this
   mode of operation, TXW_MAX_RTE is calculated from SPMs and ODATA only
   to maintain a constant data throughput rate by consuming extra
   bandwidth for repairs.  TXW_ADV_IVL has the value 0 which advances
   the transmit window without regard for whether NAKs for data in the
   increment window are still being received.

   In this mode of operation, all timers are treated as real-time
   timers.

   This mode of operation is intended for real-time, streaming
   applications based on the receipt of timely data at the expense of
   completeness.

16.4.  Advancing under explicit application control

   Some applications may want more explicit control of the transmit
   window than that provided by the advance with data / time strategies
   above.  An implementation MAY provide this mode of operation and
   allow an application to explicitly control the window in terms of
   APDUs.
