6. PMSI Instantiation
This section provides the procedures for using P-tunnels to
instantiate a PMSI. It describes the procedures for setting up and
maintaining the P-tunnels as well as for sending and receiving C-data
and/or C-control messages on the P-tunnels. However, procedures for
binding particular C-flows to particular P-tunnels are discussed in
Section 7.
PMSIs can be instantiated either by P-multicast trees or by PE-PE
unicast tunnels. In the latter case, the PMSI is said to be
instantiated by "ingress replication".
This specification supports a number of different methods for setting
up P-multicast trees: these are detailed below. A P-tunnel may
support a single VPN (a non-aggregated P-multicast tree) or multiple
VPNs (an aggregated P-multicast tree).
6.1. Use of the Intra-AS I-PMSI A-D Route
6.1.1. Sending Intra-AS I-PMSI A-D Routes
When a PE is provisioned to have one or more VRFs that provide MVPN
support, the PE announces its MVPN membership information using
Intra-AS I-PMSI A-D routes, as discussed in Section 4 and detailed in
Section 9.1.1 of [MVPN-BGP]. (Under certain conditions, detailed in
[MVPN-BGP], the Intra-AS I-PMSI A-D route may be omitted.)
Generally, the Intra-AS I-PMSI A-D route will have a PMSI Tunnel
attribute that identifies a P-tunnel that is being used to
instantiate the I-PMSI. Section 9.1.1 of [MVPN-BGP] details certain
conditions under which the PMSI Tunnel attribute may be omitted (or
in which a PMSI Tunnel attribute with the "no tunnel information
present" bit may be sent).
As a special case, when (a) C-PIM control messages are to be sent
through an MI-PMSI and (b) the MI-PMSI is instantiated by a P-tunnel
technique for which each PE needs to know only a single P-tunnel
identifier per VPN, then the use of the Intra-AS I-PMSI A-D routes
MAY be omitted, and static configuration of the tunnel identifier
used instead. However, this is not recommended for long-term use,
and in all other cases, the Intra-AS I-PMSI A-D routes MUST be used.
The PMSI Tunnel attribute MAY contain an upstream-assigned MPLS
label, assigned by the PE originating the Intra-AS I-PMSI A-D route.
If this label is present, the P-tunnel can be carrying data from
several MVPNs. The label is used on the data packets traveling
through the tunnel to identify the MVPN to which those data packets
belong. (The specified label identifies the packet as belonging to
the MVPN that is identified by the RTs of the Intra-AS I-PMSI A-D
route.)
See Section 12.2 for details on how to place the label in the
packet's label stack.
The Intra-AS I-PMSI A-D route may contain a "PE Distinguisher Labels"
attribute. This contains a set of bindings between upstream-assigned
labels and PE addresses. The PE that originated the route may use
this to bind an upstream-assigned label to one or more of the other
PEs that belong to the same MVPN. The way in which PE Distinguisher
Labels are used is discussed in Sections 6.4.1, 6.4.3, 11.2.2, and
12.3. Other uses of the PE Distinguisher Labels attribute are
outside the scope of this document.
6.1.2. Receiving Intra-AS I-PMSI A-D Routes
The action to be taken when a PE receives an Intra-AS I-PMSI A-D
route for a particular MVPN depends on the particular P-tunnel
technology that is being used by that MVPN. If the P-tunnel
technology requires tunnels to be built by means of receiver-
initiated joins, the PE SHOULD join the tunnel immediately.
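The technology-dependent reaction can be sketched as a simple dispatch. This is a minimal illustration, not a normative procedure. It assumes a hypothetical `TunnelType` enumeration and treats PIM trees and mLDP P2MP LSPs as receiver-initiated (as Section 7.1.2 describes for joining P2). It treats RSVP-TE P2MP LSPs as built from the head end (as Section 6.4.1 describes).

```python
from enum import Enum, auto

# Illustrative sketch; the enumeration and function names are hypothetical.
class TunnelType(Enum):
    PIM_SSM = auto()
    PIM_SM = auto()
    BIDIR_PIM = auto()
    MLDP_P2MP = auto()
    RSVP_TE_P2MP = auto()
    INGRESS_REPLICATION = auto()

# Technologies in which the receiving PE itself signals to join the tunnel.
RECEIVER_INITIATED = {
    TunnelType.PIM_SSM,
    TunnelType.PIM_SM,
    TunnelType.BIDIR_PIM,
    TunnelType.MLDP_P2MP,
}

def action_on_intra_as_ipmsi_route(tunnel_type):
    """Return the action a receiving PE takes for an Intra-AS I-PMSI A-D route."""
    if tunnel_type in RECEIVER_INITIATED:
        return "join"  # SHOULD join immediately (P-PIM Join or mLDP signaling)
    if tunnel_type is TunnelType.RSVP_TE_P2MP:
        return "await-head-end-signaling"  # the head end adds this PE as a leaf
    return "none"  # e.g., ingress replication: there is no tree to join
```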
6.2. When C-flows Are Specifically Bound to P-Tunnels
This situation is discussed in Section 7.
6.3. Aggregating Multiple MVPNs on a Single P-Tunnel
When a P-multicast tree is shared across multiple MVPNs, it is termed
an "Aggregate Tree". The procedures described in this document allow
a single SP multicast tree to be shared across multiple MVPNs.
Unless otherwise specified, P-multicast tree technology supports
aggregation.
All procedures that are specific to multi-MVPN aggregation are
OPTIONAL and are explicitly pointed out.
Aggregate Trees allow a single P-multicast tree to be used across
multiple MVPNs so that state in the SP core grows per set of MVPNs
and not per MVPN. Depending on the congruence of the aggregated
MVPNs, this may result in trading off optimality of multicast
routing.
An Aggregate Tree can be used by a PE to provide a UI-PMSI or MI-PMSI
service for more than one MVPN. When this is the case, the Aggregate
Tree is said to have an inclusive mapping.
6.3.1. Aggregate Tree Leaf Discovery
BGP MVPN membership discovery (Section 4) allows a PE to determine
the different Aggregate Trees that it should create and the MVPNs
that should be mapped onto each such tree. The leaves of an
Aggregate Tree are determined by the PEs, supporting aggregation,
that belong to all the MVPNs that are mapped onto the tree.
If an Aggregate Tree is used to instantiate one or more S-PMSIs, then
it may be desirable for the PE at the root of the tree to know which
PEs (in its MVPN) are receivers on that tree. This enables the PE to
decide when to aggregate two S-PMSIs, based on congruence (as
discussed in the next section). Thus, explicit tracking may be
required. Since the procedures for disseminating C-multicast routes
do not provide explicit tracking, a type of A-D route known as a
"Leaf A-D route" is used. The PE that wants to assign a particular
C-multicast flow to a particular Aggregate Tree can send an A-D
route, which elicits Leaf A-D routes from the PEs that need to
receive that C-multicast flow. This provides the explicit tracking
information needed to support the aggregation methodology discussed
in the next section. For more details on Leaf A-D routes, please
refer to [MVPN-BGP].
6.3.2. Aggregation Methodology
This document does not specify the mandatory implementation of any
particular set of rules for determining whether or not the PMSIs of
two particular MVPNs are to be instantiated by the same Aggregate
Tree. This determination can be made by implementation-specific
heuristics, by configuration, or even perhaps by the use of offline
tools.
It is the intention of this document that the control procedures will
always result in all the PEs of an MVPN agreeing on the PMSIs that
are to be used and on the tunnels used to instantiate those PMSIs.
This section discusses potential methodologies with respect to
aggregation.
The "congruence" of aggregation is defined by the amount of overlap
in the leaves of the customer trees that are aggregated on an SP
tree. For Aggregate Trees with an inclusive mapping, the congruence
depends on the overlap in the membership of the MVPNs that are
aggregated on the tree. If there is complete overlap, i.e., all
MVPNs have exactly the same sites, aggregation is perfectly
congruent. As the overlap between the MVPNs that are aggregated
reduces, i.e., the number of sites that are common across all the
MVPNs reduces, the congruence reduces.
If aggregation is done such that it is not perfectly congruent, a PE
may receive traffic for MVPNs to which it doesn't belong. As the
amount of multicast traffic in these unwanted MVPNs increases,
aggregation becomes less optimal with respect to delivered traffic.
Hence, there is a trade-off between reducing state and delivering
unwanted traffic.
An implementation should provide knobs to control the congruence of
aggregation. These knobs are implementation dependent. Configuring
the percentage of sites that MVPNs must have in common to be
aggregated is an example of such a knob. This will allow an SP to
deploy aggregation depending on the MVPN membership and traffic
profiles in its network. If different PEs or servers are setting up
Aggregate Trees, this will also allow a service provider to engineer
the maximum number of unwanted MVPNs for which a particular PE may
receive traffic.
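As an illustration of such a knob, the sketch below models each MVPN's membership as the set of PEs attached to its sites. It measures congruence as the number of PEs common to every candidate MVPN divided by the number of PEs in any of them. It permits aggregation only when that fraction meets a configured threshold. The metric and the names `congruence`, `may_aggregate`, `min_common_fraction`, and `unwanted_mvpns` are hypothetical. This document deliberately leaves the rule implementation dependent.

```python
from functools import reduce

# Illustrative sketch; the congruence metric and all names are hypothetical.
def congruence(memberships):
    """Fraction of PEs common to all MVPNs, relative to all PEs of those MVPNs.

    `memberships` maps an MVPN name to the set of PEs attached to its sites.
    Identical memberships yield 1.0 (perfect congruence); disjoint ones 0.0.
    """
    pe_sets = list(memberships.values())
    if not pe_sets:
        raise ValueError("need at least one MVPN")
    common = reduce(set.intersection, pe_sets)
    union = reduce(set.union, pe_sets)
    return len(common) / len(union) if union else 1.0

def may_aggregate(memberships, min_common_fraction):
    """Hypothetical knob: aggregate only if congruence meets the threshold."""
    return congruence(memberships) >= min_common_fraction

def unwanted_mvpns(pe, memberships):
    """MVPNs whose traffic `pe` would receive on the tree without belonging to them."""
    leaves = set().union(*memberships.values())
    if pe not in leaves:
        return set()
    return {vpn for vpn, pes in memberships.items() if pe not in pes}
```

For two MVPNs A = {PE1, PE2, PE3} and B = {PE1, PE2, PE4}, the congruence is 2/4 = 0.5. PE3 would receive B's traffic without being a member of B.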
6.3.3. Demultiplexing C-Multicast Traffic
If a P-multicast tree is associated with only one MVPN, determining
the P-multicast tree on which a packet was received is sufficient to
determine the packet's MVPN. All that the egress PE needs to know is
the MVPN with which the P-multicast tree is associated.
When multiple MVPNs are aggregated onto one P-multicast tree,
determining the tree over which the packet is received is not
sufficient to determine the MVPN to which the packet belongs. The
packet must also carry some demultiplexing information to allow the
egress PEs to determine the MVPN to which the packet belongs. Since
the packet has been multicast through the P-network, any given
demultiplexing value must have the same meaning to all the egress
PEs. The demultiplexing value is an MPLS label that corresponds to
the multicast VRF to which the packet belongs. This label is placed
by the ingress PE immediately beneath the P-multicast tree header.
Each of the egress PEs must be able to associate this MPLS label with
the same MVPN. If downstream-assigned labels were used, this would
require all the egress PEs in the MVPN to agree on a common label for
the MVPN. Instead, the MPLS label is upstream-assigned
[MPLS-UPSTREAM-LABEL]. The label bindings are advertised via BGP
Updates originated by the ingress PEs.
This procedure requires each egress PE to support a separate label
space for every other PE. The egress PEs create a forwarding entry
for the upstream-assigned MPLS label, allocated by the ingress PE, in
this label space. Hence, when the egress PE receives a packet over
an Aggregate Tree, it first determines the tree over which the packet
was received. The tree identifier determines the label space in
which the upstream-assigned MPLS label lookup has to be performed.
The same label space may be used for all P-multicast trees rooted at
the same ingress PE or an implementation may decide to use a separate
label space for every P-multicast tree.
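The two-step lookup described above can be sketched as a pair of tables. The tree identifier selects a label space, and the upstream-assigned label is then looked up within that space. This is a minimal sketch with hypothetical class and method names. A tree identifier is treated as an opaque hashable value, for example a (root PE, P2MP ID) pair. The `bind_tree` call shows the permitted choice of sharing one label space among all trees rooted at the same ingress PE.

```python
# Illustrative sketch; class, method, and identifier formats are hypothetical.
class UpstreamLabelSpaces:
    """Egress-PE demultiplexing state for upstream-assigned labels."""

    def __init__(self):
        self._space_of_tree = {}  # tree identifier -> label-space key
        self._entries = {}        # (label-space key, label) -> VRF name

    def bind_tree(self, tree_id, space_key):
        # Trees rooted at the same ingress PE may share one label space,
        # or each tree may get its own; the choice is local.
        self._space_of_tree[tree_id] = space_key

    def install(self, space_key, label, vrf):
        # Entry learned from a BGP Update originated by the ingress PE.
        self._entries[(space_key, label)] = vrf

    def lookup(self, tree_id, label):
        """Return the VRF for a packet received on `tree_id`, or None to discard."""
        space_key = self._space_of_tree.get(tree_id)
        if space_key is None:
            return None
        return self._entries.get((space_key, label))
```

In the usage below, the same label value (16) means different MVPNs depending on which ingress PE's tree the packet arrived on. That is exactly why a separate label space per upstream PE is needed.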
A full specification of the procedures to support aggregation on
shared trees or on MP2MP LSPs is outside the scope of this document.
The encapsulation format is either MPLS or MPLS-in-something (e.g.,
MPLS-in-GRE [MPLS-IP]). When MPLS is used, this label will appear
immediately below the label that identifies the P-multicast tree.
When MPLS-in-GRE is used, this label will be the top MPLS label that
appears when the GRE header is stripped off.
When IP encapsulation is used for the P-multicast tree, whatever
information that particular encapsulation format uses for identifying
a particular tunnel is used to determine the label space in which the
MPLS label is looked up.
If the P-multicast tree uses MPLS encapsulation, the P-multicast tree
is itself identified by an MPLS label. The egress PE MUST NOT
advertise IMPLICIT NULL or EXPLICIT NULL for that tree. Once the
label representing the tree is popped off the MPLS label stack, the
next label is the demultiplexing information that allows the proper
MVPN to be determined.
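The egress processing above can be sketched using the standard 4-octet MPLS label stack entry. Each entry holds a 20-bit label, a 3-bit Traffic Class, a 1-bit bottom-of-stack flag, and an 8-bit TTL. The helper names are hypothetical. The sketch assumes the tree label is actually present on the packet, which is what forbidding IMPLICIT NULL and EXPLICIT NULL guarantees. As a guard, it rejects a reserved value (0-15) in the tree label position.

```python
import struct

# Illustrative sketch; helper names are hypothetical.
def encode_lse(label, tc=0, bos=0, ttl=64):
    """Encode one MPLS label stack entry (20-bit label, TC, S, TTL)."""
    return struct.pack("!I", (label << 12) | (tc << 9) | (bos << 8) | ttl)

def parse_lse(data, offset=0):
    """Decode one 4-octet MPLS label stack entry starting at `offset`."""
    (word,) = struct.unpack_from("!I", data, offset)
    return {
        "label": word >> 12,
        "tc": (word >> 9) & 0x7,
        "bos": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }

def pop_tree_label(packet):
    """Pop the P-multicast tree label; return (tree_label, demux_label, rest).

    `rest` begins with the demultiplexing label, now at the top of the stack.
    """
    top = parse_lse(packet, 0)
    if top["label"] < 16:
        raise ValueError("reserved label where the tree label was expected")
    if top["bos"]:
        raise ValueError("no demultiplexing label below the tree label")
    demux = parse_lse(packet, 4)
    return top["label"], demux["label"], packet[4:]
```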
This specification requires that, to support this sort of
aggregation, there be at least one upstream-assigned label per MVPN.
It does not require that there be only one. For example, an ingress
PE could assign a unique label to each (C-S,C-G). (This could be
done using the same technique that is used to assign a particular
(C-S,C-G) to an S-PMSI, see Section 7.4.)
When an egress PE receives a C-multicast data packet over a
P-multicast tree, it needs to forward the packet to the CEs that have
receivers in the packet's C-multicast group. In order to do this,
the egress PE needs to determine the P-tunnel on which the packet was
received. The PE can then determine the MVPN that the packet belongs
to and, if needed, do any further lookups that are needed to forward
the packet.
6.4. Considerations for Specific Tunnel Technologies
While it is believed that the architecture specified in this document
places no limitations on the protocols used for setting up and
maintaining P-tunnels, the only protocols that have been explicitly
considered are PIM-SM (both the SSM and ASM service models are
considered, as are bidirectional trees), RSVP-TE, mLDP, and BGP.
(BGP's role in the setup and maintenance of P-tunnels is to "stitch"
together the intra-AS segments of a segmented inter-AS P-tunnel.)
6.4.1. RSVP-TE P2MP LSPs
If an I-PMSI is to be instantiated as one or more non-segmented
P-tunnels, where the P-tunnels are RSVP-TE P2MP LSPs, then only the
PEs that are at the head ends of those LSPs will ever include the
PMSI Tunnel attribute in their Intra-AS I-PMSI A-D routes. (These
will be the PEs in the "Sender Sites set".)
If an I-PMSI is to be instantiated as one or more segmented
P-tunnels, where some of the intra-AS segments of these tunnels are
RSVP-TE P2MP LSPs, then only a PE or ASBR that is at the head end of
one of these LSPs will ever include the PMSI Tunnel attribute in its
Inter-AS I-PMSI A-D route.
Other PEs send Intra-AS I-PMSI A-D routes without PMSI Tunnel
attributes. (These will be the PEs that are in the "Receiver Sites
set" but not in the "Sender Sites set".) As each "Sender Site" PE
receives an Intra-AS I-PMSI A-D route from a PE in the Receiver Sites
set, it adds the PE originating that Intra-AS I-PMSI A-D route to the
set of receiving PEs for the P2MP LSP. The PE at the head end MUST
then use RSVP-TE [RSVP-P2MP] signaling to add the receiver PEs to the
P-tunnel.
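The head-end bookkeeping described above might be sketched as follows. This is an illustrative model only. `SenderSitePE` and its method are hypothetical, and an Intra-AS I-PMSI A-D route from a Receiver Sites PE is reduced to its originator address. The RSVP-TE [RSVP-P2MP] signaling is represented by a placeholder list of pending leaf additions rather than real protocol messages.

```python
from dataclasses import dataclass, field

# Illustrative sketch; the class and method names are hypothetical.
@dataclass
class SenderSitePE:
    """Head end of an RSVP-TE P2MP LSP instantiating an I-PMSI."""
    address: str
    leaves: set = field(default_factory=set)
    pending_signaling: list = field(default_factory=list)

    def on_receiver_site_route(self, originator):
        # Called for an Intra-AS I-PMSI A-D route from a Receiver Sites PE.
        if originator == self.address or originator in self.leaves:
            return  # ignore our own route and duplicate announcements
        self.leaves.add(originator)
        # Placeholder for the RSVP-TE signaling that adds this leaf to the LSP.
        self.pending_signaling.append(("add-leaf", originator))
```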
When RSVP-TE P2MP LSPs are used to instantiate S-PMSIs, and a
particular C-flow is to be bound to the LSP, it is necessary to use
explicit tracking so that the head end of the LSP knows which PEs
need to receive data from the specified C-flow. If the binding is
done using S-PMSI A-D routes (see Section 7.4.1), the "Leaf
Information Required" bit MUST be set in the PMSI Tunnel attribute.
RSVP-TE P2MP LSPs can optionally support aggregation of multiple
MVPNs.
If an RSVP-TE P2MP LSP Tunnel is used for only a single MVPN, the
mapping between the LSP and the MVPN can either be configured or be
deduced from the procedures used to announce the LSP (e.g., from the
RTs in the A-D route that announced the LSP). If the LSP is used for
multiple MVPNs, the set of MVPNs using it (and the corresponding MPLS
labels) is inferred from the PMSI Tunnel attributes that specify the
LSP.
If an RSVP-TE P2MP LSP is being used to carry a set of C-flows
traveling along a bidirectional C-tree, using the procedures of
Section 11.2, the head end MUST include the PE Distinguisher Labels
attribute in its Intra-AS I-PMSI A-D route or S-PMSI A-D route, and
it MUST provide an upstream-assigned label for each PE that it has
selected as the Upstream PE for the C-tree's RPA (Rendezvous Point
Address). See Section 11.2 for details.
A PMSI Tunnel attribute specifying an RSVP-TE P2MP LSP contains the
following information:
- The tunnel type, set to RSVP-TE P2MP Tunnel.
- The RSVP-TE P2MP Tunnel's SESSION Object.
- Optionally, the RSVP-TE P2MP LSP's SENDER_TEMPLATE Object. This
object is included when it is desired to identify a particular
P2MP TE LSP.
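The on-the-wire encoding of the PMSI Tunnel attribute is defined in [MVPN-BGP], not here. The sketch below reflects our understanding of that layout, and every layout detail should be checked against [MVPN-BGP]. The assumed layout is:

- a 1-octet Flags field whose low-order bit is "Leaf Information Required";
- a 1-octet Tunnel Type;
- a 3-octet MPLS Label field carrying the label in its high-order 20 bits;
- a variable-length Tunnel Identifier.

The RSVP-TE P2MP type code of 1 is likewise an assumption. The SESSION (and optional SENDER_TEMPLATE) content is treated as opaque bytes.

```python
import struct

# Illustrative sketch; constant values and function names are assumptions.
LEAF_INFO_REQUIRED = 0x01     # assumed: low-order bit of the Flags octet
TUNNEL_TYPE_RSVP_TE_P2MP = 1  # assumed type code; see [MVPN-BGP]

def encode_pmsi_tunnel_attr(tunnel_type, tunnel_id, label=0, leaf_info_required=False):
    """Build a PMSI Tunnel attribute value: Flags, Tunnel Type, Label, Tunnel ID."""
    if not 0 <= label < (1 << 20):
        raise ValueError("MPLS label must fit in 20 bits")
    flags = LEAF_INFO_REQUIRED if leaf_info_required else 0
    label_field = (label << 4).to_bytes(3, "big")  # label in high-order 20 bits
    return struct.pack("!BB", flags, tunnel_type) + label_field + bytes(tunnel_id)

def decode_pmsi_tunnel_attr(value):
    """Inverse of encode_pmsi_tunnel_attr; the Tunnel Identifier stays opaque."""
    flags, tunnel_type = struct.unpack_from("!BB", value, 0)
    return {
        "leaf_info_required": bool(flags & LEAF_INFO_REQUIRED),
        "tunnel_type": tunnel_type,
        "label": int.from_bytes(value[2:5], "big") >> 4,
        "tunnel_id": value[5:],
    }
```

Setting `leaf_info_required=True` corresponds to the S-PMSI case above, where the head end needs explicit tracking. A non-zero `label` corresponds to an LSP that may be shared among several MVPNs.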
Demultiplexing the C-multicast data packets at the egress PE follows
procedures described in Section 6.3.3. As specified in Section
6.3.3, an egress PE MUST NOT advertise IMPLICIT NULL or EXPLICIT NULL
for an RSVP-TE P2MP LSP that is carrying traffic for one or more
MVPNs.
If (and only if) a particular RSVP-TE P2MP LSP is possibly carrying
data from multiple MVPNs, the following special procedures apply:
- A packet in a particular MVPN, when transmitted into the LSP,
must carry the MPLS label specified in the PMSI Tunnel attribute
that announced that LSP as a P-tunnel for that MVPN.
- Demultiplexing the C-multicast data packets at the egress PE is
done by means of the MPLS label that rises to the top of the
stack after the label corresponding to the P2MP LSP is popped
off.
It is possible that at the time a PE learns, via an A-D route with a
PMSI Tunnel attribute, that it needs to receive traffic on a
particular RSVP-TE P2MP LSP, the signaling to set up the LSP will not
have been completed. In this case, the PE needs to wait for the
RSVP-TE signaling to take place before it can modify its forwarding
tables as directed by the A-D route.
It is also possible that the signaling to set up an RSVP-TE P2MP LSP
will be completed before a given PE learns, via a PMSI Tunnel
attribute, of the use to which that LSP will be put. The PE MUST
discard any traffic received on that LSP until that time.
In order for the egress PE to be able to discard such traffic, it
needs to know that the LSP is associated with an MVPN and that the
A-D route that binds the LSP to an MVPN or to a particular C-flow
has not yet been received. This is provided by extending [RSVP-P2MP]
with [RSVP-OOB].
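The two possible orderings can be reconciled by a small table at the egress PE. Either the A-D route arrives before LSP signaling completes, or signaling completes before the A-D route. A minimal sketch with hypothetical names: LSPs are keyed by an opaque identifier, and "signaled" stands for completed RSVP-TE signaling (with the [RSVP-OOB] indication that the LSP belongs to an MVPN). Traffic is forwarded only when the LSP is both signaled and bound by an A-D route. Otherwise it is discarded.

```python
# Illustrative sketch; class and method names are hypothetical.
class EgressLspState:
    """Egress-PE view of RSVP-TE P2MP LSPs used as P-tunnels."""

    def __init__(self):
        self.signaled = set()  # LSPs whose RSVP-TE signaling has completed
        self.bindings = {}     # LSP id -> MVPN or C-flow, from A-D routes

    def on_lsp_signaled(self, lsp_id):
        self.signaled.add(lsp_id)

    def on_ad_route(self, lsp_id, binding):
        # Recorded immediately; takes effect only once the LSP is signaled.
        self.bindings[lsp_id] = binding

    def classify(self, lsp_id):
        """Return the binding to forward with, or None to discard."""
        if lsp_id not in self.signaled:
            return None  # A-D route seen but LSP not yet set up: keep waiting
        return self.bindings.get(lsp_id)  # signaled but unbound -> None (discard)
```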
6.4.2. PIM Trees
When the P-tunnels are PIM trees, the PMSI Tunnel attribute contains
enough information to allow each other PE in the same MVPN to use
P-PIM signaling to join the P-tunnel.
If an I-PMSI is to be instantiated as one or more PIM trees, then the
PE that is at the root of a given PIM tree sends an Intra-AS I-PMSI
A-D route containing a PMSI Tunnel attribute that contains all the
information needed for other PEs to join the tree.
If PIM trees are to be used to instantiate an MI-PMSI, each PE in the
MVPN must send an Intra-AS I-PMSI A-D route containing such a PMSI
Tunnel attribute.
If a PMSI is to be instantiated via a shared tree, the PMSI Tunnel
attribute identifies the P-group address. The RP or RPA
corresponding to the P-group address is not specified. It must, of
course, be known to all the PEs. It is presupposed that the PEs use
one of the methods for automatically learning the RP-to-group
correspondences (e.g., Bootstrap Router Protocol [BSR]), or else that
the correspondence is configured.
If a PMSI is to be instantiated via a source-specific tree, the PMSI
Tunnel attribute identifies the PE router that is the root of the
tree, as well as a P-group address. The PMSI Tunnel attribute always
specifies whether the PIM tree is to be a unidirectional shared tree,
a bidirectional shared tree, or a source-specific tree.
If PIM trees are being used to instantiate S-PMSIs, the above
procedures assume that each PE router has a set of group P-addresses
that it can use for setting up the PIM trees. Each PE must be
configured with this set of P-addresses. If the P-tunnels are
source-specific trees, then the PEs may be configured with
overlapping sets of group P-addresses. If the trees are not source-
specific, then each PE must be configured with a unique set of group
P-addresses (i.e., having no overlap with the set configured at any
other PE router). The management of this set of addresses is thus
greatly simplified when source-specific trees are used, so the use of
source-specific trees is strongly recommended whenever unidirectional
trees are desired.
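The configuration rule above can be checked mechanically. The sketch below uses Python's standard `ipaddress` module to report every pair of PEs whose group P-address sets overlap when the trees are not source-specific. Representing each PE's set as a list of prefixes, and the `source_specific` switch, are assumptions made for illustration.

```python
import ipaddress
from itertools import combinations

# Illustrative sketch; the function name and input format are hypothetical.
def pool_conflicts(pools, source_specific):
    """Return pairs of PEs whose group P-address pools overlap where not allowed.

    `pools` maps a PE name to a list of multicast prefixes (strings). Overlap
    is acceptable for source-specific trees, since the root PE plus P-group
    identifies the tree; otherwise each PE needs a disjoint set.
    """
    if source_specific:
        return []
    nets = {pe: [ipaddress.ip_network(p) for p in prefixes]
            for pe, prefixes in pools.items()}
    conflicts = []
    for (pe_a, nets_a), (pe_b, nets_b) in combinations(sorted(nets.items()), 2):
        if any(a.overlaps(b) for a in nets_a for b in nets_b):
            conflicts.append((pe_a, pe_b))
    return conflicts
```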
Specification of the full set of procedures for using bidirectional
PIM trees to instantiate S-PMSIs is outside the scope of this
document.
Details for constructing the PMSI Tunnel attribute identifying a PIM
tree can be found in [MVPN-BGP].
6.4.3. mLDP P2MP LSPs
When the P-tunnels are mLDP P2MP trees, each Intra-AS I-PMSI A-D
route has a PMSI Tunnel attribute containing enough information to
allow each other PE in the same MVPN to use mLDP signaling to join
the P-tunnel. The tunnel identifier consists of a P2MP Forwarding
Equivalence Class (FEC) Element [mLDP].
An mLDP P2MP LSP may be used to carry the traffic of multiple VPNs,
if the PMSI Tunnel attribute specifying it contains a non-zero MPLS
label.
If an mLDP P2MP LSP is being used to carry the set of flows traveling
along a particular bidirectional C-tree, using the procedures of
Section 11.2, the root of the LSP MUST include the PE Distinguisher
Labels attribute in its Intra-AS I-PMSI A-D route or S-PMSI A-D
route, and it MUST provide an upstream-assigned label for the PE that
it has selected to be the Upstream PE for the C-tree's RPA. See
Section 11.2 for details.
6.4.4. mLDP MP2MP LSPs
The specification of the procedures for assigning C-flows to mLDP
MP2MP LSPs that serve as P-tunnels is outside the scope of this
document.
6.4.5. Ingress Replication
As described in Section 3, a PMSI can be instantiated using Unicast
Tunnels between the PEs that are participating in the MVPN. In this
mechanism, the ingress PE replicates a C-multicast data packet
belonging to a particular MVPN and sends a copy to all or a subset of
the PEs that belong to the MVPN. A copy of the packet is tunneled to
a remote PE over a Unicast Tunnel to the remote PE. IP/GRE Tunnels
or MPLS LSPs are examples of unicast tunnels that may be used. The
same Unicast Tunnel can be used to transport packets belonging to
different MVPNs.
In order for a PE to use Unicast P-tunnels to send a C-multicast data
packet for a particular MVPN to a set of remote PEs, the remote PEs
must be able to correctly decapsulate such packets and to assign each
one to the proper MVPN. This requires that the encapsulation used
for sending packets through the P-tunnel have demultiplexing
information that the receiver can associate with a particular MVPN.
If ingress replication is being used to instantiate the PMSIs for an
MVPN, the PEs announce this as part of the BGP-based MVPN membership
auto-discovery process, described in Section 4. The PMSI Tunnel
attribute specifies ingress replication; it also specifies a
downstream-assigned MPLS label. This label will be used to identify
that a particular packet belongs to the MVPN that the Intra-AS I-PMSI
A-D route belongs to (as inferred from its RTs). If PE1 specifies a
particular label value for a particular MVPN, then any other PE
sending PE1 a packet for that MVPN through a unicast P-tunnel must
put that label on the packet's label stack. PE1 then treats that
label as the demultiplexor value identifying the MVPN in question.
Ingress replication may be used to instantiate any kind of PMSI.
When ingress replication is done, it is RECOMMENDED, except in the
one particular case mentioned in the next paragraph, that explicit
tracking be done and that the data packets of a particular C-flow
only get sent to those PEs that need to see the packets of that
C-flow. There is never any need to use the procedures of Section 7.4
for binding particular C-flows to particular P-tunnels.
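The sending side of ingress replication with explicit tracking might be sketched as below. The names are hypothetical. `advertised_labels` records, per (remote PE, MVPN), the downstream-assigned label from that PE's PMSI Tunnel attribute. `interested` is the set of PEs that explicit tracking has shown need the C-flow. The unicast tunnel encapsulation (IP/GRE or an MPLS LSP) is left abstract.

```python
# Illustrative sketch; function and parameter names are hypothetical.
def replicate(packet, mvpn, interested, advertised_labels):
    """Return one (remote PE, label stack, payload) copy per interested PE.

    The sender pushes exactly the label that the receiving PE advertised for
    this MVPN, so one unicast tunnel can carry several MVPNs unambiguously.
    """
    copies = []
    for pe in sorted(interested):
        label = advertised_labels.get((pe, mvpn))
        if label is None:
            continue  # no Intra-AS I-PMSI A-D route from this PE for the MVPN
        copies.append((pe, [label], packet))
    return copies
```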
The particular case in which there is no need for explicit tracking
is the case where ingress replication is being used to create a
one-hop ASBR-ASBR inter-AS segment of a segmented inter-AS P-tunnel.
Section 9.1 specifies three different methods that can be used to
prevent duplication of multicast data packets. Any given deployment
must use at least one of those methods. Note that the method
described in Section 9.1.1 ("Discarding Packets from Wrong PE")
presupposes that the egress PE of a P-tunnel can, upon receiving a
packet from the P-tunnel, determine the identity of the PE that
transmitted the packet into the P-tunnel. SPs that use ingress
replication to instantiate their PMSIs are cautioned against using,
for this purpose, unicast P-tunnel technologies that do not allow
the egress PE to identify the ingress PE (e.g., MP2P LSPs for which
penultimate-hop-popping is done). Deployment of ingress replication
with such P-tunnel technology MUST NOT be done unless it is known
that the deployment relies entirely on the procedures of Sections
9.1.2 or 9.1.3 for duplicate prevention.
7. Binding Specific C-Flows to Specific P-Tunnels
As discussed previously, Intra-AS I-PMSI A-D routes may (or may not)
have PMSI Tunnel attributes, identifying P-tunnels that can be used
as the default P-tunnels for carrying C-multicast traffic, i.e., for
carrying C-multicast traffic that has not been specifically bound to
another P-tunnel.
If none of the Intra-AS I-PMSI A-D routes originated by a particular
PE for a particular MVPN carry PMSI Tunnel attributes at all (or if
the only PMSI Tunnel attributes they carry have type "No tunnel
information present"), then there are no default P-tunnels for that
PE to use when transmitting C-multicast traffic in that MVPN to other
PEs. In that case, all such C-flows must be assigned to specific
P-tunnels using one of the mechanisms specified in Section 7.4. That
is, all such C-flows are carried on P-tunnels that instantiate
S-PMSIs.
There are other cases where it may be either necessary or desirable
to use the mechanisms of Section 7.4 to identify specific C-flows and
bind them to or unbind them from specific P-tunnels. Some possible
cases are as follows:
- The policy for a particular MVPN is to send all C-data on
S-PMSIs, even if the Intra-AS I-PMSI A-D routes carry PMSI Tunnel
attributes. (This is another case where all C-data is carried on
S-PMSIs; presumably, the I-PMSIs are used for control
information.)
- It is desired to optimize the routing of the particular C-flow,
which may already be traveling on an I-PMSI, by sending it
instead on an S-PMSI.
- If a particular C-flow is traveling on an S-PMSI, it may be
considered desirable to move it to an I-PMSI (i.e., optimization
of the routing for that flow may no longer be considered
desirable).
- It is desired to change the encapsulation used to carry the
C-flow, e.g., because one now wants to aggregate it on a P-tunnel
with flows from other MVPNs.
Note that if Full PIM Peering over an MI-PMSI (Section 5.2) is being
used, then from the perspective of the PIM state machine, the
"interface" connecting the PEs to each other is the MI-PMSI, even if
some or all of the C-flows are being sent on S-PMSIs. That is, from
the perspective of the C-PIM state machine, when a C-flow is being
sent or received on an S-PMSI, the output or input interface
(respectively) is considered to be the MI-PMSI.
Section 7.1 discusses certain general considerations that apply
whenever a specified C-flow is bound to a specified P-tunnel using
the mechanisms of Section 7.4. This includes the case where the
C-flow is moved from one P-tunnel to another as well as the case
where the C-flow is initially bound to an S-PMSI P-tunnel.
Section 7.2 discusses the specific case of using the mechanisms of
Section 7.4 as a way of optimizing multicast routing by switching
specific flows from one P-tunnel to another.
Section 7.3 discusses the case where the mechanisms of Section 7.4
are used to announce the presence of "unsolicited flooded data" and
to assign such data to a particular P-tunnel.
Section 7.4 specifies the protocols for assigning specific C-flows to
specific P-tunnels. These protocols may be used to assign a C-flow
to a P-tunnel initially or to switch a flow from one P-tunnel to
another.
Procedures for binding to a specified P-tunnel the set of C-flows
traveling along a specified C-tree (or for so binding a set of
C-flows that share some relevant characteristic), without identifying
each flow individually, are outside the scope of this document.
7.1. General Considerations
7.1.1. At the PE Transmitting the C-Flow on the P-Tunnel
The decision to bind a particular C-flow (designated as (C-S,C-G)) to
a particular P-tunnel, or to switch a particular C-flow to a
particular P-tunnel, is always made by the PE that is to transmit the
C-flow onto the P-tunnel.
Whenever a PE moves a particular C-flow from one P-tunnel, say P1, to
another, say P2, care must be taken to ensure that there is no
steady-state duplication of traffic. At any given time, the PE transmits
the C-flow either on P1 or on P2, but not on both.
When a particular PE, say PE1, decides to bind a particular C-flow to
a particular P-tunnel, say P2, the following procedures MUST be
applied:
- PE1 must issue the required control plane information to signal
that the specified C-flow is now bound to P-tunnel P2 (see
Section 7.4).
- If P-tunnel P2 needs to be constructed from the root downwards,
PE1 must initiate the signaling to construct P2. This is only
required if P2 is an RSVP-TE P2MP LSP.
- If the specified C-flow is currently bound to a different
P-tunnel, say P1, then:
* PE1 MUST wait for a "switch-over" delay before sending
traffic of the C-flow on P-tunnel P2. It is RECOMMENDED to
allow this delay to be configurable.
* Once the "switch-over" delay has elapsed, PE1 MUST send
traffic for the C-flow on P2 and MUST NOT send it on P1. In
no case is any C-flow packet sent on both P-tunnels.
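The transmitting-PE rules above can be sketched as a small state holder driven by a clock. This is illustrative only. The class name, the use of plain numeric timestamps, and the `switchover_delay` parameter (the configurable delay recommended above) are assumptions. The property the sketch preserves is that, at any instant, exactly one P-tunnel is selected for the C-flow. An initial binding, when no other P-tunnel carries the flow, takes effect immediately.

```python
# Illustrative sketch; class, method, and parameter names are hypothetical.
class FlowBinding:
    """Which P-tunnel a PE transmits one C-flow on."""

    def __init__(self, tunnel, switchover_delay):
        self.current = tunnel            # P-tunnel now carrying the flow, or None
        self.switchover_delay = switchover_delay
        self._pending = None             # (new tunnel, earliest switch time)

    def rebind(self, new_tunnel, now):
        # The control-plane announcement (Section 7.4) is assumed to be sent here.
        if self.current is None:
            # Initial binding: no other P-tunnel carries the flow, so no delay.
            self.current, self._pending = new_tunnel, None
            return
        self._pending = (new_tunnel, now + self.switchover_delay)

    def tunnel_for_packet(self, now):
        """Return the single P-tunnel to send on; never more than one."""
        if self._pending is not None and now >= self._pending[1]:
            self.current, self._pending = self._pending[0], None
        return self.current
```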
When a C-flow is switched from one P-tunnel to another, the purpose
of running a switch-over timer is to minimize packet loss without
introducing packet duplication. However, jitter may be introduced
due to the difference in transit delays between the old and new
P-tunnels.
For best effect, the switch-over timer should be configured to a
value that is "just long enough" (a) to allow all the PEs to learn
about the new binding of C-flow to P-tunnel and (b) to allow the PEs
to construct the P-tunnel, if it doesn't already exist.
If, after such a switch, the "old" P-tunnel P1 is no longer needed,
it SHOULD be torn down and the resources supporting it freed. The
procedures for "tearing down" a P-tunnel are specific to the P-tunnel
technology.
Procedures for binding sets of C-flows traveling along specified
C-trees (or sets of C-flows sharing any other characteristic) to a
specified P-tunnel (or for moving them from one P-tunnel to another)
are outside the scope of this document.
7.1.2. At the PE Receiving the C-flow from the P-Tunnel
Suppose that a particular PE, say PE1, learns, via the procedures of
Section 7.4, that some other PE, say PE2, has bound a particular
C-flow, designated as (C-S,C-G), to a particular P-tunnel, say P2.
Then, PE1 must determine whether it needs to receive (C-S,C-G)
traffic from PE2.
If BGP is being used to distribute C-multicast routing information
from PE to PE, the conditions under which PE1 needs to receive
(C-S,C-G) traffic from PE2 are specified in Section 12.3 of
[MVPN-BGP].
If PIM over an MI-PMSI is being used to distribute C-multicast
routing from PE to PE, PE1 needs to receive (C-S,C-G) traffic from
PE2 if one or more of the following conditions holds:
- PE1 has (C-S,C-G) state such that PE2 is PE1's Upstream PE for
(C-S,C-G), and PE1 has downstream neighbors ("non-null olist")
for the (C-S,C-G) state.
- PE1 has (C-*,C-G) state with an Upstream PE (not necessarily PE2)
and with downstream neighbors ("non-null olist"), but PE1 does
not have (C-S,C-G) state.
- Native PIM methods are being used to prevent steady-state packet
duplication, and PE1 has either (C-*,C-G) or (C-S,C-G) state such
that the MI-PMSI is one of the downstream interfaces. Note that
this includes the case where PE1 is itself sending (C-S,C-G)
traffic on an S-PMSI. (In this case, PE1 needs to receive the
(C-S,C-G) traffic from PE2 in order to allow the PIM Assert
mechanism to function properly.)
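The three conditions above can be expressed as a single predicate. This is a simplified sketch. C-PIM state is reduced to a hypothetical `CState` record holding the Upstream PE, whether the olist is non-null, and whether the MI-PMSI is among the downstream interfaces. An absent (C-S,C-G) or (C-*,C-G) entry is represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch; the record and function names are hypothetical.
@dataclass
class CState:
    upstream_pe: Optional[str]
    olist_nonempty: bool
    mi_pmsi_downstream: bool = False

def needs_flow_from(pe2, sg, star_g, native_pim_dup_prevention):
    """Decide whether PE1 needs (C-S,C-G) traffic from PE2 (PIM over an MI-PMSI).

    `sg` and `star_g` are PE1's (C-S,C-G) and (C-*,C-G) states, or None.
    """
    # Condition 1: PE2 is the Upstream PE for (C-S,C-G) and the olist is non-null.
    if sg is not None and sg.upstream_pe == pe2 and sg.olist_nonempty:
        return True
    # Condition 2: (C-*,C-G) with an Upstream PE and non-null olist, no (C-S,C-G).
    if (sg is None and star_g is not None
            and star_g.upstream_pe is not None and star_g.olist_nonempty):
        return True
    # Condition 3: native PIM duplicate prevention with the MI-PMSI downstream.
    return native_pim_dup_prevention and any(
        s is not None and s.mi_pmsi_downstream for s in (sg, star_g))
```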
Irrespective of whether BGP or PIM is being used to distribute
C-multicast routing information, once PE1 determines that it needs to
receive (C-S,C-G) traffic from PE2, the following procedures MUST be
applied:
- PE1 MUST take all necessary steps to be able to receive the
(C-S,C-G) traffic on P2.
* If P2 is a PIM tunnel or an mLDP LSP, PE1 will need to use
PIM or mLDP (respectively) to join P2 (unless it is already
joined to P2).
* PE1 may need to modify the forwarding state for (C-S,C-G) to
indicate that (C-S,C-G) traffic is to be accepted on P2. If
P2 is an Aggregate Tree, this also implies setting up the
demultiplexing forwarding entries based on the inner label as
described in Section 6.3.3.
- If PE1 was previously receiving the (C-S,C-G) C-flow on another
P-tunnel, say P1, then:
* PE1 MAY run a switch-over timer, and until it expires, SHOULD
accept traffic for the given C-flow on both P1 and P2;
* If, after such a switch, the "old" P-tunnel P1 is no longer
needed, it SHOULD be torn down and the resources supporting
it freed. The procedures for "tearing down" a P-tunnel are
specific to the P-tunnel technology.
- If PE1 later determines that it no longer needs to receive any of
the C-multicast data that is being sent on a particular P-tunnel,
it may initiate signaling (specific to the P-tunnel technology)
to remove itself from that tunnel.
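On the receiving side, the rule is the mirror image of the transmitter's. During the optional switch-over period, PE1 accepts the C-flow on both the old and the new P-tunnel. A minimal sketch with hypothetical names and numeric timestamps; a zero timer means the old P-tunnel is no longer accepted at all after the switch.

```python
# Illustrative sketch; class, method, and parameter names are hypothetical.
class AcceptedTunnels:
    """P-tunnels on which PE1 accepts one C-flow."""

    def __init__(self, tunnel):
        self.new = tunnel
        self.old = None
        self.old_until = float("-inf")  # time after which `old` is rejected

    def switch(self, new_tunnel, now, switchover_timer=0.0):
        # Keep accepting on the previous tunnel until the timer expires.
        self.old, self.new = self.new, new_tunnel
        self.old_until = now + switchover_timer

    def accepts(self, tunnel, now):
        if tunnel == self.new:
            return True
        return tunnel == self.old and now < self.old_until
```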
7.2. Optimizing Multicast Distribution via S-PMSIs
Whenever a particular multicast stream is being sent on an I-PMSI, it
is likely that the data of that stream is being sent to PEs that do
not require it. If a particular stream has a significant amount of
traffic, it may be beneficial to move it to an S-PMSI that includes
only those PEs that are transmitters and/or receivers (or at least
includes fewer PEs that are neither).
If explicit tracking is being done, S-PMSI creation can also be
triggered on other criteria. For instance, there could be a "pseudo-
wasted bandwidth" criterion: switching to an S-PMSI would be done if
the bandwidth multiplied by the number of uninterested PEs (PEs that
are receiving the stream but have no receivers) is above a specified
threshold. The motivation is that (a) the total bandwidth wasted by
many sparsely subscribed low-bandwidth groups may be large and (b)
there is no point in moving a high-bandwidth group to an S-PMSI if all
the PEs have receivers for it.
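The "pseudo-wasted bandwidth" criterion can be written as a one-line test. In this sketch the function names are hypothetical, the PE counts are assumed to come from explicit tracking, and the threshold is an operator-configured value.

```python
def pseudo_wasted_bandwidth(bandwidth_bps: float, pes_receiving: int,
                            pes_with_receivers: int) -> float:
    """Stream bandwidth times the number of PEs that receive it but have no receivers."""
    uninterested = pes_receiving - pes_with_receivers
    if uninterested < 0:
        raise ValueError("more interested PEs than receiving PEs")
    return bandwidth_bps * uninterested


def should_switch_to_s_pmsi(bandwidth_bps: float, pes_receiving: int,
                            pes_with_receivers: int, threshold_bps: float) -> bool:
    # Switch only when the wasted bandwidth is above the configured threshold.
    waste = pseudo_wasted_bandwidth(bandwidth_bps, pes_receiving, pes_with_receivers)
    return waste > threshold_bps
```

For example, a 1 Mb/s stream delivered to 50 PEs of which only 2 have receivers wastes 48 Mb/s and would be moved with a 10 Mb/s threshold, while a 100 Mb/s stream for which all 10 receiving PEs have receivers wastes nothing and stays put, matching motivations (a) and (b).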
Switching a (C-S,C-G) stream to an S-PMSI may require the root of the
S-PMSI to determine the egress PEs that need to receive the (C-S,C-G)
traffic. This is true in the following cases:
- If the P-tunnel is a source-initiated tree, such as an RSVP-TE
P2MP Tunnel, the PE needs to know the leaves of the tree before
it can instantiate the S-PMSI.
- If a PE instantiates multiple S-PMSIs, belonging to different
MVPNs, using one P-multicast tree, such a tree is termed an
Aggregate Tree with a selective mapping. The setting up of such
an Aggregate Tree requires the ingress PE to know all the other
PEs that have receivers for multicast groups that are mapped onto
the tree.
The above two cases require that explicit tracking be done for the
(C-S,C-G) stream. The root of the S-PMSI MAY decide to do explicit
tracking of this stream only after it has determined to move the
stream to an S-PMSI, or it MAY have been doing explicit tracking all
along.
If the S-PMSI is instantiated by a P-multicast tree, the PE at the
root of the tree must signal the leaves of the tree that the
(C-S,C-G) stream is now bound to the S-PMSI. Note that the PE could
create the identity of the P-multicast tree prior to the actual
instantiation of the P-tunnel.
If the S-PMSI is instantiated by a source-initiated P-multicast tree
(e.g., an RSVP-TE P2MP tunnel), the PE at the root of the tree must
establish the source-initiated P-multicast tree to the leaves. This
tree MAY have been established before the leaves receive the S-PMSI
binding, or it MAY be established after the leaves receive the
binding. The leaves MUST NOT switch to the S-PMSI until they receive
both the binding and the tree signaling message.
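Because the binding and the tree signaling can arrive in either order, a leaf needs a small gate per (C-S,C-G). The sketch below is hypothetical; the two events stand for, e.g., receipt of an S-PMSI A-D route and receipt of the source-initiated tree's signaling (such as an RSVP-TE P2MP Path message).

```python
from dataclasses import dataclass


@dataclass
class LeafSwitchGate:
    """Hypothetical per-(C-S,C-G) gate on a leaf PE of a source-initiated tree."""
    binding_received: bool = False  # S-PMSI binding has been received
    tree_signaled: bool = False     # tree signaling message has been received

    def on_binding(self) -> bool:
        self.binding_received = True
        return self.may_switch()

    def on_tree_signaling(self) -> bool:
        self.tree_signaled = True
        return self.may_switch()

    def may_switch(self) -> bool:
        # The leaf MUST NOT switch to the S-PMSI until both have been received.
        return self.binding_received and self.tree_signaled
```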
7.3. Announcing the Presence of Unsolicited Flooded Data
A PE may receive "unsolicited" data from a CE, where the data is
intended to be flooded to the other PEs of the same MVPN and then on
to other CEs. By "unsolicited", we mean that the data is to be
delivered to all the other PEs of the MVPN, even though those PEs may
not have sent any control information indicating that they need to
receive that data.
For example, if the BSR [BSR] is being used within the MVPN, BSR
control messages may be received by a PE from a CE. These need to be
forwarded to other PEs, even though no PE ever issues any kind of
explicit signal saying that it wants to receive BSR messages.
If a PE receives a BSR message from a CE, and if the CE's MVPN has an
MI-PMSI, then the PE can just send BSR messages on the appropriate
P-tunnel. Otherwise, the PE MUST announce the binding of a
particular C-flow to a particular P-tunnel, using the procedures of
Section 7.4. The particular C-flow in this case would be
(C-IPaddress_of_PE, ALL-PIM-ROUTERS). The P-tunnel identified by the
procedures of Section 7.4 may or may not be one that was previously
identified in the PMSI Tunnel attribute of an I-PMSI A-D route.
Further procedures for handling BSR may be found in Sections 5.2.1
and 5.3.4.
Analogous procedures may be used for announcing the presence of other
sorts of unsolicited flooded data, e.g., dense mode data or data from
proprietary protocols that presume messages can be flooded. However,
a full specification of the procedures for traffic other than BSR
traffic is outside the scope of this document.
7.4. Protocols for Binding C-Flows to P-Tunnels
We describe two protocols for binding C-flows to P-tunnels.
These protocols can be used for moving C-flows from I-PMSIs to
S-PMSIs, as long as the S-PMSI is instantiated by a P-multicast tree.
(If the S-PMSI is instantiated by means of ingress replication, the
procedures of Section 6.4.5 suffice.)
These protocols can also be used for other cases in which it is
necessary to bind specific C-flows to specific P-tunnels.
7.4.1. Using BGP S-PMSI A-D Routes
Notwithstanding the name of the mechanism "S-PMSI A-D routes", the
mechanism to be specified in this section may be used any time it is
necessary to advertise a binding of a C-flow to a particular
P-tunnel.
7.4.1.1. Advertising C-Flow Binding to P-Tunnel
The ingress PE informs all the PEs that are on the path to receivers
of the (C-S,C-G) that the P-tunnel is bound to the (C-S,C-G).
The BGP announcement is done by sending an update for the MCAST-VPN
address family. An S-PMSI A-D route is used, containing the
following information:
1. The IP address of the originating PE.
2. The RD configured locally for the MVPN. This is required to
uniquely identify the (C-S,C-G) as the addresses could overlap
between different MVPNs. This is the same RD value used in the
auto-discovery process.
3. The C-S address.
4. The C-G address.
5. A PE MAY use a single P-tunnel to aggregate two or more
S-PMSIs. If the PE already advertised unaggregated S-PMSI A-D
routes for these S-PMSIs, then a decision to aggregate them
requires the PE to re-advertise these routes. The re-
advertised routes MUST be the same as the original ones, except
for the PMSI Tunnel attribute. If the PE has not previously
advertised S-PMSI A-D routes for these S-PMSIs, then the
aggregation requires the PE to advertise (new) S-PMSI A-D
routes for these S-PMSIs. The PMSI Tunnel attribute in the
newly advertised/re-advertised routes MUST carry the identity
of the P-tunnel that aggregates the S-PMSIs.
If all these aggregated S-PMSIs belong to the same MVPN, and
this MVPN uses PIM as its C-multicast routing protocol, then
the corresponding S-PMSI A-D routes MAY carry an MPLS upstream-
assigned label [MPLS-UPSTREAM-LABEL]. Moreover, in this case,
the labels MUST be distinct on a per-MVPN basis, and MAY be
distinct on a per-route basis.
If all these aggregated S-PMSIs belong to the MVPN(s) that use
mLDP as their C-multicast routing protocol, then the
corresponding S-PMSI A-D routes MUST carry an MPLS upstream-
assigned label [MPLS-UPSTREAM-LABEL], and these labels MUST be
distinct on a per-route (per-mLDP-FEC) basis, irrespective of
whether the aggregated S-PMSIs belong to the same or different
MVPNs.
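The labeling rules for aggregated S-PMSIs can be illustrated with a per-tunnel allocator. This is a sketch, not a prescribed implementation: the class is hypothetical, `route_key` is an opaque string standing in for the route's flow (or mLDP FEC), and for PIM it takes the permitted option of sharing one label per MVPN rather than allocating one per route. Allocation starts at 16 because MPLS label values 0-15 are reserved.

```python
import itertools
from typing import Dict, Tuple


class UpstreamLabelAllocator:
    """Hypothetical allocator for the upstream-assigned labels of ONE aggregating P-tunnel."""

    def __init__(self, first_label: int = 16) -> None:
        self._counter = itertools.count(first_label)
        self._per_mvpn: Dict[str, int] = {}
        self._per_route: Dict[Tuple[str, str], int] = {}

    def label_for(self, mvpn: str, route_key: str, c_mcast_protocol: str) -> int:
        """Return the label to carry in the S-PMSI A-D route identified by (mvpn, route_key)."""
        if c_mcast_protocol == "mLDP":
            # MUST be distinct per route (per mLDP FEC), whatever MVPN it belongs to.
            key = (mvpn, route_key)
            if key not in self._per_route:
                self._per_route[key] = next(self._counter)
            return self._per_route[key]
        if c_mcast_protocol == "PIM":
            # MUST be distinct per MVPN; one shared label per MVPN is one permitted choice.
            if mvpn not in self._per_mvpn:
                self._per_mvpn[mvpn] = next(self._counter)
            return self._per_mvpn[mvpn]
        raise ValueError(f"unsupported C-multicast protocol: {c_mcast_protocol}")
```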
When a PE distributes this information via BGP, it must include the
following:
1. An identifier for the particular P-tunnel to which the stream
is to be bound. This identifier is a structured field that
includes the following information:
* The type of tunnel
* An identifier for the tunnel. The form of the identifier
will depend upon the tunnel type. The combination of
tunnel identifier and tunnel type should contain enough
information to enable all the PEs to "join" the tunnel and
receive messages from it.
2. Route Target Extended Communities attribute. This is used as
described in Section 4.
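The information carried by an S-PMSI A-D route can be modeled abstractly as below. This is a hypothetical in-memory model, not the BGP wire encoding defined in [MVPN-BGP]; tunnel types are illustrative strings, the tunnel identifier is left opaque, and the optional upstream-assigned label is omitted. The helper captures the rule that a route re-advertised for aggregation differs from the original only in its PMSI Tunnel attribute.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address
from typing import FrozenSet


@dataclass(frozen=True)
class TunnelIdentifier:
    """Structured P-tunnel identifier: a tunnel type plus a type-specific identifier."""
    tunnel_type: str   # illustrative, e.g. "RSVP-TE P2MP" or "PIM-SSM"
    identifier: bytes  # opaque here; its format depends on tunnel_type


@dataclass(frozen=True)
class SPmsiADRoute:
    """Abstract contents of an S-PMSI A-D route."""
    originating_pe: IPv4Address
    rd: str                         # same RD as used in auto-discovery
    c_source: IPv4Address
    c_group: IPv4Address
    pmsi_tunnel: TunnelIdentifier   # PMSI Tunnel attribute
    route_targets: FrozenSet[str]   # Route Target Extended Communities


def readvertise_for_aggregation(route: SPmsiADRoute,
                                aggregate: TunnelIdentifier) -> SPmsiADRoute:
    # Everything except the PMSI Tunnel attribute stays identical.
    return replace(route, pmsi_tunnel=aggregate)
```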
7.4.1.2. Explicit Tracking
If the PE wants to enable explicit tracking for the specified flow,
it also indicates this in the A-D route it uses to bind the flow to a
particular P-tunnel. Then, any PE that receives the A-D route will
respond with a "Leaf A-D route" in which it identifies itself as a
receiver of the specified flow. The Leaf A-D route will be withdrawn
when the PE is no longer a receiver for the flow.
If the PE needs to enable explicit tracking for a flow without also
binding the flow to a specific P-tunnel, it can do so by
sending an S-PMSI A-D route whose NLRI identifies the flow and whose
PMSI Tunnel attribute has its tunnel type value set to "no tunnel
information present" and its "leaf information required" bit set to
1. This will elicit the Leaf A-D routes. This is useful when the PE
needs to know the receivers before selecting a P-tunnel.
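A root PE doing explicit tracking needs to build the "tracking only" PMSI Tunnel attribute and keep a per-flow set of leaves from the Leaf A-D routes it receives. The sketch below assumes the PMSI Tunnel attribute layout of [MVPN-BGP] (Flags, Tunnel Type, 3-octet MPLS Label, Tunnel Identifier), with the "leaf information required" flag as the low-order bit of the Flags octet and tunnel type 0 meaning "no tunnel information present"; the class and function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Set, Tuple

LEAF_INFO_REQUIRED = 0x01    # assumed: low-order bit of the Flags octet
NO_TUNNEL_INFO_PRESENT = 0   # assumed Tunnel Type code point

Flow = Tuple[str, str]  # (C-S, C-G)


def tracking_only_pmsi_tunnel_attr() -> bytes:
    """PMSI Tunnel attribute value that elicits Leaf A-D routes without binding a tunnel."""
    # Flags (1 octet) | Tunnel Type (1 octet) | MPLS Label (3 octets) | no Tunnel Identifier
    return bytes([LEAF_INFO_REQUIRED, NO_TUNNEL_INFO_PRESENT, 0, 0, 0])


class ExplicitTracker:
    """Root-side record of which PEs announced themselves as receivers of each flow."""

    def __init__(self) -> None:
        self._leaves: DefaultDict[Flow, Set[str]] = defaultdict(set)

    def leaf_ad_received(self, flow: Flow, pe: str) -> None:
        self._leaves[flow].add(pe)

    def leaf_ad_withdrawn(self, flow: Flow, pe: str) -> None:
        self._leaves[flow].discard(pe)

    def receivers(self, flow: Flow) -> FrozenSet[str]:
        return frozenset(self._leaves.get(flow, ()))
```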
7.4.2. UDP-Based Protocol
This procedure carries its control messages in UDP and requires that
the MVPN have an MI-PMSI that can be used to carry the control
messages.
7.4.2.1. Advertising C-Flow Binding to P-Tunnel
In order for a given PE to move a particular C-flow to a particular
P-tunnel, an "S-PMSI Join message" is sent periodically on the
MI-PMSI. (Notwithstanding the name of the mechanism, the mechanism
may be used to bind a flow to any P-tunnel.) The S-PMSI Join message
is a UDP-encapsulated message whose destination address is
ALL-PIM-ROUTERS (224.0.0.13) and whose destination port is 3232.
The S-PMSI Join message contains the following information:
- An identifier for the particular multicast stream that is to be
bound to the P-tunnel. This can be represented as an (S,G) pair.
- An identifier for the particular P-tunnel to which the stream is
to be bound. This identifier is a structured field that includes
the following information:
* The type of tunnel used to instantiate the S-PMSI.
* An identifier for the tunnel. The form of the identifier
will depend upon the tunnel type. The combination of tunnel
identifier and tunnel type should contain enough information
to enable all the PEs to "join" the tunnel and receive
messages from it.
* If (and only if) the identified P-tunnel is aggregating
several S-PMSIs, any demultiplexing information needed by the
tunnel encapsulation protocol to identify a particular
S-PMSI.
If the policy for the MVPN is that traffic is sent/received by
default over an MI-PMSI, then traffic for a particular C-flow can be
switched back to the MI-PMSI simply by ceasing to send S-PMSI Joins
for that C-flow.
Note that an S-PMSI Join that is not received over a PMSI (e.g., one
that is received directly from a CE) is an illegal packet that MUST
be discarded.
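The acceptance rule for received S-PMSI Joins can be sketched as a small classifier. The datagram metadata type and the `via_pmsi` flag are hypothetical; how a PE learns whether a packet arrived over a PMSI is implementation-specific.

```python
from dataclasses import dataclass

ALL_PIM_ROUTERS = "224.0.0.13"
S_PMSI_JOIN_UDP_PORT = 3232


@dataclass(frozen=True)
class ReceivedUdp:
    """Hypothetical metadata a PE keeps about a received UDP datagram."""
    dst_addr: str
    dst_port: int
    via_pmsi: bool  # False if it arrived some other way, e.g. directly from a CE


def classify(d: ReceivedUdp) -> str:
    """Return "process", "discard", or "not-s-pmsi-join"."""
    if d.dst_addr != ALL_PIM_ROUTERS or d.dst_port != S_PMSI_JOIN_UDP_PORT:
        return "not-s-pmsi-join"
    # An S-PMSI Join not received over a PMSI is illegal and MUST be discarded.
    return "process" if d.via_pmsi else "discard"
```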
7.4.2.2. Packet Formats and Constants
The S-PMSI Join message is encapsulated within UDP and has the
following type/length/value (TLV) encoding:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |            Length             |     Value     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               .                               |
|                               .                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type (8 bits)
Length (16 bits): the total number of octets in the Type, Length, and
Value fields combined
Value (variable length)
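Note that Length counts the whole TLV, including its own 3-octet Type/Length header, so the Value occupies Length - 3 octets. A minimal parser sketch follows; it assumes, for illustration, that a UDP payload may hold one or more TLVs back to back, and the function name is hypothetical.

```python
import struct
from typing import Iterator, Tuple

TLV_HEADER = struct.Struct("!BH")  # Type (8 bits), Length (16 bits), network byte order


def iter_tlvs(payload: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs; Length covers the Type, Length, and Value fields."""
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < TLV_HEADER.size:
            raise ValueError("truncated TLV header")
        tlv_type, length = TLV_HEADER.unpack_from(payload, offset)
        if length < TLV_HEADER.size or offset + length > len(payload):
            raise ValueError(f"bad TLV length {length} at offset {offset}")
        # The Value is whatever follows the 3-octet header, up to Length octets in total.
        yield tlv_type, payload[offset + TLV_HEADER.size:offset + length]
        offset += length
```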
In this specification, only one type of S-PMSI Join is defined. A
Type 1 S-PMSI Join is used when the S-PMSI tunnel is a PIM tunnel
that is used to carry a single multicast stream, where the packets of
that stream have IPv4 source and destination IP addresses.
The S-PMSI Join format to use when the C-source and C-group are IPv6
addresses will be defined in a follow-on document.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |            Length             |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           C-source                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            C-group                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            P-group                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type (8 bits): 1
Length (16 bits): 16
Reserved (8 bits): This field SHOULD be zero when transmitted, and
MUST be ignored when received.
C-source (32 bits): the IPv4 address of the traffic source in the
VPN.
C-group (32 bits): the IPv4 address of the multicast traffic
destination address in the VPN.
P-group (32 bits): the IPv4 group address that the PE router is going
to use to encapsulate the flow (C-source, C-group).
The P-group identifies the S-PMSI P-tunnel, and the (C-S,C-G)
identifies the multicast flow that is carried in the P-tunnel.
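The Type 1 layout maps directly onto a fixed 16-octet structure: 1 + 2 + 1 octets of header and Reserved, followed by three IPv4 addresses. The encoder/decoder below is a sketch with hypothetical names; it sends Reserved as zero and ignores it on receipt, as specified above.

```python
import struct
from ipaddress import IPv4Address
from typing import NamedTuple

# Type (8) | Length (16) | Reserved (8) | C-source (32) | C-group (32) | P-group (32)
TYPE1_JOIN = struct.Struct("!BHB4s4s4s")
TYPE1 = 1
TYPE1_LENGTH = 16


class SPmsiJoinType1(NamedTuple):
    c_source: IPv4Address
    c_group: IPv4Address
    p_group: IPv4Address


def encode_type1(join: SPmsiJoinType1) -> bytes:
    # Reserved is transmitted as zero.
    return TYPE1_JOIN.pack(TYPE1, TYPE1_LENGTH, 0, join.c_source.packed,
                           join.c_group.packed, join.p_group.packed)


def decode_type1(data: bytes) -> SPmsiJoinType1:
    if len(data) < TYPE1_JOIN.size:
        raise ValueError("too short for a Type 1 S-PMSI Join")
    tlv_type, length, _reserved, src, grp, pgrp = TYPE1_JOIN.unpack_from(data)
    if tlv_type != TYPE1 or length != TYPE1_LENGTH:
        raise ValueError(f"not a Type 1 S-PMSI Join (type={tlv_type}, length={length})")
    # Reserved is ignored on receipt.
    return SPmsiJoinType1(IPv4Address(src), IPv4Address(grp), IPv4Address(pgrp))
```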
The protocol uses the following constants.
[S-PMSI_DELAY]:
Once an S-PMSI Join message has been sent, the PE router that is
to transmit onto the S-PMSI will delay this amount of time before
it begins using the S-PMSI. The default value is 3 seconds.
[S-PMSI_TIMEOUT]:
If a PE (other than the transmitter) does not receive any packets
over the S-PMSI P-tunnel for this amount of time, the PE will
prune itself from the S-PMSI P-tunnel, and will expect (C-S,C-G)
packets to arrive on an I-PMSI. The default value is 3 minutes.
This value must be consistent among PE routers.
[S-PMSI_HOLDOWN]:
If the PE that transmits onto the S-PMSI does not see any
(C-S,C-G) packets for this amount of time, it will resume sending
(C-S,C-G) packets on an I-PMSI.
This is used to avoid oscillation when traffic is bursty. The
default value is 1 minute.
[S-PMSI_INTERVAL]:
The interval the transmitting PE router uses to periodically send
the S-PMSI Join message. The default value is 60 seconds.
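One way to combine the four constants on the transmitting and receiving PEs is sketched below, using the default values. The classes are hypothetical and time is passed in by the caller in seconds. In this reading, the hold-down is detected lazily: when a packet arrives after S-PMSI_HOLDOWN of silence, the transmitter falls back to the I-PMSI and stops sending Joins.

```python
from dataclasses import dataclass
from typing import Optional

# Default values of the protocol constants, in seconds.
S_PMSI_DELAY = 3.0
S_PMSI_TIMEOUT = 180.0
S_PMSI_HOLDOWN = 60.0
S_PMSI_INTERVAL = 60.0


@dataclass
class SPmsiSender:
    """Hypothetical per-(C-S,C-G) state on the PE that transmits onto the S-PMSI."""
    first_join: Optional[float] = None  # when the current run of Joins started
    last_join: Optional[float] = None
    last_packet: Optional[float] = None

    def send_join(self, now: float) -> None:
        if self.first_join is None:
            self.first_join = now
        self.last_join = now

    def join_due(self, now: float) -> bool:
        # Joins are repeated every S-PMSI_INTERVAL while the binding is active.
        return self.last_join is not None and now - self.last_join >= S_PMSI_INTERVAL

    def pmsi_for_packet(self, now: float) -> str:
        """Pick the PMSI for a (C-S,C-G) packet that arrives at time `now`."""
        quiet = self.last_packet is not None and now - self.last_packet >= S_PMSI_HOLDOWN
        self.last_packet = now
        if quiet:
            # No packets for S-PMSI_HOLDOWN: resume on the I-PMSI and stop the Joins.
            self.first_join = self.last_join = None
            return "I-PMSI"
        if self.first_join is None or now - self.first_join < S_PMSI_DELAY:
            # Not bound, or still within S-PMSI_DELAY of the first Join.
            return "I-PMSI"
        return "S-PMSI"


@dataclass
class SPmsiReceiver:
    """Hypothetical state on a PE that has joined the S-PMSI P-tunnel."""
    last_packet: float

    def should_prune(self, now: float) -> bool:
        # Prune and expect (C-S,C-G) on an I-PMSI after S-PMSI_TIMEOUT of silence.
        return now - self.last_packet >= S_PMSI_TIMEOUT
```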
7.4.3. Aggregation
S-PMSIs can be aggregated on a P-multicast tree. The S-PMSI to
(C-S,C-G) binding advertisement supports aggregation. Furthermore,
the aggregation procedures of Section 6.3 apply. It is also possible
to aggregate both S-PMSIs and I-PMSIs on the same P-multicast tree.
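On an egress PE, a tree that aggregates several PMSIs (S-PMSIs, I-PMSIs, or both) is demultiplexed on the inner upstream-assigned label. A minimal sketch, assuming the label maps to the VRF in which the inner (C-S,C-G) is then looked up; the class is hypothetical, and with per-route labels the value could instead be a more specific forwarding context. Because upstream-assigned labels are chosen by the tree's root, the lookup key includes the P-tunnel as well as the label.

```python
from typing import Dict, Optional, Tuple


class AggregateTreeDemux:
    """Hypothetical egress-PE table for P-multicast trees that aggregate PMSIs."""

    def __init__(self) -> None:
        # The same label value on two different trees may identify different MVPNs.
        self._table: Dict[Tuple[str, int], str] = {}

    def install(self, p_tunnel: str, label: int, vrf: str) -> None:
        self._table[(p_tunnel, label)] = vrf

    def remove(self, p_tunnel: str, label: int) -> None:
        self._table.pop((p_tunnel, label), None)

    def lookup(self, p_tunnel: str, label: int) -> Optional[str]:
        """Return the VRF for the inner (C-S,C-G) lookup, or None to drop the packet."""
        return self._table.get((p_tunnel, label))
```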