RFC 6372

MPLS Transport Profile (MPLS-TP) Survivability Framework

Pages: 56
Informational

Part 3 of 3 – Pages 37 to 56

5.  Applicability and Scope of Survivability in MPLS-TP

   The MPLS-TP network can be viewed as two layers (the MPLS LSP layer
   and the PW layer).  The MPLS-TP network operates over data-link
   connections and data-link networks whereby the MPLS-TP links are
   provided by individual data links or by connections in a lower-layer
   network.  The MPLS LSP layer is a mandatory part of the MPLS-TP
   network, while the PW layer is an optional addition for supporting
   specific services.

   MPLS-TP survivability provides recovery from failure of the links and
   nodes in the MPLS-TP network.  The link defects and failures are
   typically caused by defects or failures in the underlying data-link
   connections and networks, but this section is only concerned with
   recovery actions performed in the MPLS-TP network, which must recover
   from the manifestation of any problem as a defect or failure in the
   MPLS-TP network.

   This section lists the recovery elements (see Section 1) supported in
   each of the two layers that can recover from defects or failures of
   nodes or links in the MPLS-TP network.

   +--------------+---------------------+------------------------------+
   | Recovery     | MPLS LSP Layer      | PW Layer                     |
   | Element      |                     |                              |
   +--------------+---------------------+------------------------------+
   | Link         | MPLS LSP recovery   | The PW layer is not aware of |
   | Recovery     | can be used to      | the underlying network.      |
   |              | survive the failure | This function is not         |
   |              | of an MPLS-TP link. | supported.                   |
   +--------------+---------------------+------------------------------+
   | Segment/Span | An individual LSP   | For an SS-PW, segment        |
   | Recovery     | segment can be      | recovery is the same as      |
   |              | recovered to        | end-to-end recovery.         |
   |              | survive the failure | Segment recovery for an MS-PW|
   |              | of an MPLS-TP link. | is for future study, and     |
   |              |                     | this function is now         |
   |              |                     | provided using end-to-end    |
   |              |                     | recovery.                    |
   +--------------+---------------------+------------------------------+
   | Concatenated | A concatenated LSP  | Concatenated segment         |
   | Segment      | segment can be      | recovery (in an MS-PW) is for|
   | Recovery     | recovered to        | future study, and this       |
   |              | survive the failure | function is now provided     |
   |              | of an MPLS-TP link  | using end-to-end recovery.   |
   |              | or node.            |                              |
   +--------------+---------------------+------------------------------+
   | End-to-End   | An end-to-end LSP   | End-to-end PW recovery can   |
   | Recovery     | can be recovered to | be applied to survive any    |
   |              | survive any node or | node (including S-PE) or     |
   |              | link failure,       | link failure, except for     |
   |              | except for the      | failure of the ingress or    |
   |              | failure of the      | egress T-PE.                 |
   |              | ingress or egress   |                              |
   |              | node.               |                              |
   +--------------+---------------------+------------------------------+
   | Service      | The MPLS LSP layer  | PW-layer service recovery    |
   | Recovery     | is service-         | requires surviving faults in |
   |              | agnostic.  This     | T-PEs or on Attachment       |
   |              | function is not     | Circuits (ACs).  This is     |
   |              | supported.          | currently out of scope for   |
   |              |                     | MPLS-TP.                     |
   +--------------+---------------------+------------------------------+

                 Table 1: Recovery Elements Supported
                  by the MPLS LSP Layer and PW Layer

   Section 6 provides a description of mechanisms for MPLS-TP-LSP
   survivability.  Section 7 provides a brief overview of mechanisms for
   MPLS-TP-PW survivability.

6.  Mechanisms for Providing Survivability for MPLS-TP LSPs

   This section describes the existing mechanisms that provide LSP
   protection within MPLS-TP networks and highlights areas where new
   work is required.

6.1.  Management Plane

   As described above, a fundamental requirement of MPLS-TP is that
   recovery mechanisms should be capable of functioning in the absence
   of a control plane.  Recovery may be triggered by MPLS-TP OAM fault
   management functions or by external requests (e.g., an operator's
   request for manual control of protection switching).  Recovery LSPs
   (and in particular Restoration LSPs) may be provisioned through the
   management plane.

   The management plane may be used to configure the recovery domain by
   setting the reference end points (which control the recovery
   actions), the working and the recovery entities, and the recovery
   type (e.g., 1:1 bidirectional linear protection, ring protection,
   etc.).

   Additional parameters associated with the recovery process (such as
   WTR and hold-off timers, revertive/non-revertive operation, etc.) may
   also be configured.

   In addition, the management plane may initiate manual control of the
   recovery function.  A priority should be set for the fault conditions
   and the operator's requests.

   Since provisioning the recovery domain involves the selection of a
   number of options, mismatches may occur at the different reference
   points.  The MPLS-TP protocol to coordinate protection state, which
   is specified in [MPLS-TP-LP], may be used as an in-band (i.e., data-
   plane-based) control protocol to coordinate the protection states
   between the end points of the recovery domain and to check the
   consistency of configured parameters (such as timers and revertive/
   non-revertive behavior).  Any inconsistencies that are discovered
   are reported to the operator.

   It should also be possible for the management plane to track the
   recovery status by receiving reports or by issuing polls.

6.1.1.  Configuration of Protection Operation

   To implement the protection-switching mechanisms, the following
   entities and information should be configured and provisioned:

   o  The end points of a recovery domain.  As described above, these
      end points border on the element of recovery to which recovery is
      applied.

   o  The protection group, which, depending on the required protection
      scheme, consists of a recovery entity and one or more working
      entities.  In 1:1 or 1+1 P2P protection, the paths of the working
      entity and the recovery entities must be physically diverse in
      every respect (i.e., not share any resources or physical
      locations), in order to guarantee protection.

   o  As defined in Section 4.8, the SPME must be supported in order to
      implement data-plane-based LSP segment recovery, since related
      control messages (e.g., for OAM, Protection Path Coordination,
      etc.) can be initiated and terminated at the edges of a path where
      push and pop operations are enabled.  The SPME is an end-to-end
      LSP that in this context corresponds to the recovery entities
      (working and protection) and makes use of the MPLS construct of
      hierarchical nested LSP, as defined in [RFC3031].  OAM messages
      and messages to coordinate protection state can be initiated at
      the edge of the SPME and sent over G-ACH to the peer edge of the
      SPME.  It is necessary to configure the related SPMEs and map
      between the LSP segments being protected and the SPME.  Mapping
      can be 1:1 or 1:N to allow scalable protection of a set of LSP
      segments traversing the part of the network in which a protection
      domain is defined.

      Note that each of these LSPs can be initiated or terminated at
      different end points in the network, but that they all traverse
      the protection domain and share similar constraints (such as
      requirements for QoS, terms of protection, etc.).

   o  The protection type should be defined (e.g., unidirectional 1:1,
      bidirectional 1+1, etc.).

   o  Revertive/non-revertive behavior should be configured.

   o  Timers (such as WTR, hold-off timer, etc.) should be set.
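
   As an illustration only, the items listed above might be captured in
   a single configuration record, together with a check that the
   working and recovery paths share no resources.  The Python sketch
   below is not part of any MPLS-TP specification: the class names, the
   field names, the default timer values, and the representation of a
   path as a set of resource identifiers are all assumptions made for
   this example.  The end points of the recovery domain, which the paths
   necessarily share, are assumed to be left out of each resource set.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Path:
    name: str
    # Hypothetical identifiers for links, nodes, and physical locations.
    resources: FrozenSet[str]


@dataclass
class ProtectionGroup:
    working: List[Path]
    recovery: Path
    protection_type: str = "1:1 bidirectional"
    revertive: bool = True
    wtr_seconds: int = 300   # illustrative value only
    hold_off_ms: int = 0     # illustrative value only

    def shared_resources(self) -> FrozenSet[str]:
        """Resources used by both the recovery path and any working path."""
        used_by_working = frozenset().union(*(p.resources for p in self.working))
        return used_by_working & self.recovery.resources

    def is_diverse(self) -> bool:
        """True when no working path shares a resource with the recovery path."""
        return not self.shared_resources()
```

   A provisioning tool could refuse to commit a 1:1 or 1+1 protection
   group for which is_diverse() returns False, and report the result of
   shared_resources() to the operator.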

6.1.2.  External Manual Commands

   The following external, manual commands may be provided for manual
   control of the protection-switching operation.  These commands apply
   to a protection group; they are listed in descending order of
   priority:

   o  Blocked protection action - a manual command to prevent data
      traffic from switching to the recovery entity.  This command
      effectively disables the protection group.

   o  Force protection action - a manual command that forces a switch of
      normal data traffic to the recovery entity.

   o  Manual protection action - a manual command that forces a switch
      of data traffic to the recovery entity only when there is no
      defect in the recovery entity.

   o  Clear switching command - the operator may request that a
      previous administrative switch command (manual or force switch)
      be cleared.
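
   The effect of the first three commands can be sketched as a small
   arbitration function.  The Python fragment below is a minimal
   illustration under stated assumptions: the names Command,
   highest_priority, and selected_entity are hypothetical, the clear
   command is modeled simply as removing a command from the set of
   outstanding commands, and switching triggered by fault conditions is
   left out because this section does not define how fault conditions
   rank against operator requests.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Command(IntEnum):
    """Operator commands; a lower value means a higher priority."""
    BLOCKED = 0  # prevent any switch to the recovery entity
    FORCE = 1    # switch to the recovery entity unconditionally
    MANUAL = 2   # switch only if the recovery entity has no defect


def highest_priority(commands: Iterable[Command]) -> Optional[Command]:
    """Return the outstanding command with the highest priority, if any."""
    return min(commands, default=None)


def selected_entity(commands: Iterable[Command], recovery_defect: bool) -> str:
    """Return "working" or "recovery" as the entity carrying normal traffic.

    Fault-triggered switching is deliberately left out of this sketch.
    """
    command = highest_priority(commands)
    if command is Command.FORCE:
        return "recovery"
    if command is Command.MANUAL and not recovery_defect:
        return "recovery"
    # BLOCKED, MANUAL with a defective recovery entity, or no command.
    return "working"
```

   For example, when both a blocked and a force command are
   outstanding, the blocked command wins and traffic stays on the
   working entity.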

6.2.  Fault Detection

   Fault detection is a fundamental part of recovery and survivability.
   In all schemes, with the exception of some types of 1+1 protection,
   the actions required for the recovery of traffic delivery depend on
   the discovery of some kind of fault.  In 1+1 protection, the selector
   (at the receiving end) may simply be configured to choose the better
   signal; thus, it does not detect a fault or degradation of itself,
   but simply identifies the path that is better for data delivery.

   Faults may be detected in a number of ways depending on the traffic
   pattern and the underlying hardware.  End-to-end faults may be
   reported by the application or by knowledge of the application's data
   pattern, but this is an unusual approach.  There are two more common
   mechanisms for detecting faults in the MPLS-TP layer:

   o  Faults reported by the lower layers.

   o  Faults detected by protocols within the MPLS-TP layer.

   In an IP/MPLS network, the second mechanism may utilize control-plane
   protocols (such as the routing protocols) to detect a failure of
   adjacency between neighboring nodes.  In an MPLS-TP network, it is
   possible that no control plane will be present.  Even if a control
   plane is present, it will be a GMPLS control plane [RFC3945], which
   logically separates control channels from data channels; thus, no
   conclusion about the health of a data channel can be drawn from the
   failure of an associated control channel.  MPLS-TP-layer faults are,
   therefore, only detected through the use of OAM protocols, as
   described in Section 6.4.1.

   Faults may, however, be reported by a lower layer.  These generally
   show up as interface failures or data-link failures (sometimes known
   as connectivity failures) within the MPLS-TP network.  For example,
   an underlying optical link may detect loss of light and report a
   failure of the MPLS-TP link that uses it.  Alternatively, an
   interface card failure may be reported to the MPLS-TP layer.

   Faults reported by lower layers are only visible in specific nodes
   within the MPLS-TP network (i.e., at the adjacent end points of the
   MPLS-TP link).  This would only allow recovery to be performed
   locally, so, to enable recovery to be performed by nodes that are not
   immediately local to the fault, the fault must be reported (Sections
   6.4.3 and 6.5.4).

6.3.  Fault Localization

   If an MPLS-TP node detects that there is a fault in an LSP (that is,
   not a network fault reported from a lower layer, but a fault detected
   by examining the LSP), it can immediately perform a recovery action.
   However, unless the location of the fault is known, the only
   practical options are:

   o  Perform end-to-end recovery.

   o  Perform some other recovery as a speculative act.

   Since the speculative acts are not guaranteed to achieve the desired
   results and could consume resources unnecessarily, and since end-to-
   end recovery can require a lot of network resources, it is important
   to be able to localize the fault.

   Fault localization may be achieved by dividing the network into
   protection domains.  End-to-end protection is thereby operated on LSP
   segments, depending on the domain in which the fault is discovered.
   This necessitates monitoring of the LSP at the domain edges.

   Alternatively, a proactive mechanism of fault localization through
   OAM (Section 6.4.3) or through the control plane (Section 6.5.3) is
   required.

   Fault localization is particularly important for restoration because
   a new path must be selected that avoids the fault.  It may not be
   practical or desirable to select a path that avoids the entire failed
   working path, and it is therefore necessary to isolate the fault's
   location.

6.4.  OAM Signaling

   MPLS-TP provides a comprehensive set of OAM tools for fault
   management and performance monitoring at different nested levels
   (end-to-end, a portion of a path (LSP or PW), and at the link level)
   [RFC6371].

   These tools support proactive and on-demand fault management (for
   fault detection and fault localization) as well as performance
   monitoring (to measure the quality of the signals and detect
   degradation).

   To support fast recovery, it is useful to use some of the proactive
   tools to detect fault conditions (e.g., link/node failure or
   degradation) and to trigger the recovery action.

   The MPLS-TP OAM messages run in-band with the traffic and support
   unidirectional and bidirectional P2P paths as well as P2MP paths.

   As described in [RFC6371], MPLS-TP OAM operates in the context of a
   Maintenance Entity that bounds the OAM responsibilities and
   represents the portion of a path between two points that is monitored
   and maintained, and along which OAM messages are exchanged.
   [RFC6371] also refers to a Maintenance Entity Group (MEG), which is a
   collection of one or more Maintenance Entities (MEs) that belong to
   the same transport path (e.g., P2MP transport path) and which are
   maintained and monitored as a group.

   An ME includes two MEPs (Maintenance Entity Group End Points) that
   reside at the boundaries of an ME, and a set of zero or more MIPs
   (Maintenance Entity Group Intermediate Points) that reside within the
   Maintenance Entity along the path.  A MEP is capable of initiating
   and terminating OAM messages, and as such can only be located at the
   edges of a path where push and pop operations are supported.  In
   order to define an ME over a portion of path, it is necessary to
   support SPMEs.

   The SPME is an end-to-end LSP that in this context corresponds to the
   ME; it uses the MPLS construct of hierarchical nested LSPs, which is
   defined in [RFC3031].  OAM messages can be initiated at the edge of
   the SPME and sent over G-ACH to the peer edge of the SPME.

   The related SPMEs must be configured, and mapping must be performed
   between the LSP segments being monitored and the SPME.  Mapping can
   be 1:1 or 1:N to allow scalable operation.  Note that each of these
   LSPs can be initiated or terminated at different end points in the
   network and can share similar constraints (such as requirements for
   QoS, terms of protection, etc.).

   With regard to recovery, where MPLS-TP OAM is supported, an OAM
   Maintenance Entity Group is defined for each of the working and
   protection entities.

6.4.1.  Fault Detection

   MPLS-TP OAM tools may be used proactively to detect the following
   fault conditions between MEPs:

   o  Loss of continuity and misconnectivity - the proactive Continuity
      Check (CC) function is used to detect loss of continuity between
      two MEPs in an MEG.  The proactive Connectivity Verification (CV)
      allows a sink MEP to detect a misconnectivity defect (e.g.,
      mismerge or misconnection) with its peer source MEP when the
      received packet carries an incorrect ME identifier.  For
      protection switching, it is common to run a CC-V (Continuity Check
      and Connectivity Verification) message every 3.33 ms.  In the
      absence of three consecutive CC-V messages, loss of continuity is
      declared and is notified locally to the edge of the recovery
      domain in order to trigger a recovery action.  In some cases, when
      a slower recovery time is acceptable, it is also possible to
      lengthen the transmission interval (i.e., lower the transmission
      rate).

   o  Signal degradation - notification from OAM performance monitoring
      indicating degradation in the working entity may also be used as a
      trigger for protection switching.  In the event of degradation,
      switching to the recovery entity is necessary only if the recovery
      entity can guarantee better conditions.  Degradation can be
      measured by proactively activating MPLS-TP OAM packet loss
      measurement or delay measurement.

   o  Remote defect indication - a MEP can receive a Remote Defect
      Indication (RDI) from its peer sink MEP and locally notify the end
      point of the recovery domain regarding the fault condition, in
      order to trigger the recovery action.
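
   The loss-of-continuity rule in the first item above lends itself to
   a short worked example: with a CC-V message every 3.33 ms and a
   limit of three missing messages, the defect is declared roughly
   10 ms after the last message was received.  The Python sketch below
   illustrates that arithmetic and a simple timer-based detector.  The
   function and class names are assumptions made for this example, and
   real implementations may use a slightly different detection window.

```python
from typing import Optional


def detection_time_ms(interval_ms: float, missed_limit: int = 3) -> float:
    """Approximate time to declare loss of continuity after the last CC-V."""
    return interval_ms * missed_limit


class ContinuityMonitor:
    """Declares loss of continuity after `missed_limit` silent intervals."""

    def __init__(self, interval_ms: float, missed_limit: int = 3) -> None:
        self.timeout_ms = detection_time_ms(interval_ms, missed_limit)
        self.last_rx_ms: Optional[float] = None

    def on_ccv(self, now_ms: float) -> None:
        """Record the arrival time of a CC-V message."""
        self.last_rx_ms = now_ms

    def loss_of_continuity(self, now_ms: float) -> bool:
        """True once no CC-V has arrived for the whole detection window."""
        if self.last_rx_ms is None:
            return False  # nothing received yet; start-up is out of scope
        return now_ms - self.last_rx_ms >= self.timeout_ms
```

   Lengthening the interval to 100 ms, for instance, would stretch the
   same three-message window to about 300 ms, which shows the trade-off
   between OAM load and recovery time.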

6.4.2.  Testing for Faults

   The management plane may be used to initiate the testing of links,
   LSP segments, or entire LSPs.

   MPLS-TP provides OAM tools that may be manually invoked on-demand for
   a limited period, in order to troubleshoot links, LSP segments, or
   entire LSPs (e.g., diagnostics, connectivity verification, packet
   loss measurements, etc.).  On-demand monitoring covers a combination
   of "in-service" and "out-of-service" monitoring functions.  Out-of-
   service testing is supported by the OAM on-demand lock operation.
   The lock operation temporarily disables the transport entity (LSP,
   LSP segment, or link), preventing the transmission of all types of
   traffic, with the exceptions of test traffic and OAM (dedicated to
   the locked entity).

   [RFC6371] describes the operations of the OAM functions that may be
   initiated on-demand and provides some considerations.

   MPLS-TP also supports in-service and out-of-service testing of the
   recovery (protection and restoration) mechanism, the integrity of the
   protection/recovery transport paths, and the coordination protocol
   between the end points of the recovery domain.  The testing operation
   emulates a protection-switching request but does not perform the
   actual switching action.

6.4.3.  Fault Localization

   MPLS-TP provides OAM tools to locate a fault and determine its
   precise location.  Fault detection often only takes place at key
   points in the network (such as at LSP end points or at MEPs).  This
   means that a fault may be located anywhere within a segment of the
   relevant LSP.  Finer information granularity is needed to implement
   optimal recovery actions or to diagnose the fault.  On-demand tools
   like trace-route, loopback, and on-demand CC-V can be used to
   localize a fault.
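
   One way such on-demand tools might be combined is to probe
   intermediate points in turn until the first one that fails to answer
   is found.  The Python sketch below uses a binary search to keep the
   number of probes small.  It is an illustration only: loopback_ok is a
   hypothetical callback standing in for an on-demand loopback to the
   i-th point along the path, and the search assumes a single fault, so
   that every point before the fault answers and every point after it
   does not.

```python
from typing import Callable, Optional


def first_unreachable(num_points: int,
                      loopback_ok: Callable[[int], bool]) -> Optional[int]:
    """Return the index of the first point that fails loopback, or None.

    Points are numbered 0..num_points-1 from the source MEP.
    """
    # Invariant: points below `lo` answer; points at or above `hi` do not.
    lo, hi = 0, num_points
    while lo < hi:
        mid = (lo + hi) // 2
        if loopback_ok(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < num_points else None
```

   Under these assumptions, the fault lies between the returned point
   and the point immediately before it.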

   This information may be passed locally to the end point of the
   recovery domain to allow an optimal recovery action to be
   implemented.  This may be useful for the re-calculation of a recovery
   path.

   The information should also be reported to network management for
   diagnostic purposes.

6.4.4.  Fault Reporting

   The end points of a recovery domain should be able to detect fault
   conditions in the recovery domain and to notify the management plane.

   In addition, a node within a recovery domain that detects a fault
   condition should also be able to report this to network management.
   Network management should be capable of correlating the fault reports
   and identifying the source of the fault.

   MPLS-TP OAM tools support a function where an intermediate node along
   a path is able to send an alarm report message to the MEP, indicating
   the presence of a fault condition in the server layer that connects
   it to its adjacent node.  This capability allows a MEP to suppress
   alarms that may be generated as a result of a failure condition in
   the server layer.

6.4.5.  Coordination of Recovery Actions

   As described above, in some cases (such as in bidirectional
   protection switching, etc.) it is necessary to coordinate the
   protection states between the edges of the recovery domain.
   [MPLS-TP-LP] defines procedures, protocol messages, and elements for
   this purpose.

   The protocol is also used to signal administrative requests (e.g.,
   manual switch, etc.), but only when these are provisioned at the edge
   of the recovery domain.

   The protocol also enables mismatches to be detected between the
   configurations at the ends of the protection domain (such as timers,
   revertive/non-revertive behavior); these mismatches can subsequently
   be reported to the management plane.
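
   The mismatch check described above amounts to comparing the
   parameters configured at one end of the protection domain with those
   learned from the other end.  The Python sketch below illustrates the
   comparison only; it does not model the protocol messages of
   [MPLS-TP-LP], and the function name and the parameter keys used in
   the usage note are assumptions made for this example.

```python
from typing import Any, Dict, List, Tuple


def config_mismatches(local: Dict[str, Any],
                      remote: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """List (parameter, local value, remote value) for every disagreement."""
    mismatches = []
    # A parameter present at only one end is compared against None.
    for key in sorted(set(local) | set(remote)):
        local_value = local.get(key)
        remote_value = remote.get(key)
        if local_value != remote_value:
            mismatches.append((key, local_value, remote_value))
    return mismatches
```

   For instance, if one end is configured with {"revertive": True} and
   the other with {"revertive": False}, the function returns a single
   entry for the "revertive" parameter, which could then be reported to
   the management plane.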

   In the absence of suitable coordination (owing to failures in the
   delivery or processing of the coordination protocol messages),
   protection switching will fail.  This means that the operation of the
   protocol that coordinates the protection state is a fundamental part
   of protection switching.

6.5.  Control Plane

   The GMPLS control plane has been proposed as the control plane for
   MPLS-TP [RFC5317].  Since GMPLS was designed for use in transport
   networks, and since it has been implemented and deployed in many
   networks, it is not surprising that it contains many features that
   support a high degree of survivability.

   The signaling elements of the GMPLS control plane utilize extensions
   to the Resource Reservation Protocol (RSVP) (as described in a series
   of documents commencing with [RFC3471] and [RFC3473]), which are
   themselves based on [RFC3209] and [RFC2205].  The architecture for
   GMPLS is provided in [RFC3945], while [RFC4426] gives a functional
   description of the protocol extensions needed to support GMPLS-based
   recovery (i.e., protection and restoration).

   A further control-plane protocol called the Link Management Protocol
   (LMP) [RFC4204] is part of the GMPLS protocol family and can be used
   to coordinate fault localization and reporting.

   Clearly, the control-plane techniques described here only apply where
   an MPLS-TP control plane is deployed and operated.  All mandatory
   MPLS-TP survivability features must be enabled, even in the absence
   of the control plane.  However, when present, the control plane may
   be used to provide alternative mechanisms that may be desirable,
   since they offer simple automation or a richer feature set.

6.5.1.  Fault Detection

   The control plane is unable to detect data-plane faults.  However, it
   does provide mechanisms that detect control-plane faults, and these
   can be used to recognize data-plane faults when it is evident that
   the control and data planes are fate-sharing.  Although [RFC5654]
   specifies that MPLS-TP must support an out-of-band control channel,
   it does not insist that it be used exclusively.  This means that
   there may be deployments where an in-band (or at least an in-fiber)
   control channel is used.  In this scenario, failure of the control
   channel can be used to infer that there is a failure of the data
   channel, or, at least, it can be used to trigger an investigation of
   the health of the data channel.

   Both RSVP and LMP provide a control channel "keep-alive" mechanism
   (called the Hello message in both cases).  Failure to receive a
   message in the configured/negotiated time period indicates a control-
   plane failure.  GMPLS routing protocols ([RFC4203] and [RFC5307])
   also include keep-alive mechanisms designed to detect routing
   adjacency failures.  Although these keep-alive mechanisms tend to
   operate at a relatively low frequency (on the order of seconds), it
   is still possible that the first indication of a control-plane fault
   will be received through the routing protocol.

   Note, however, that care must be taken to ascertain that a specific
   failure is not caused by a problem in the control-plane software or
   in a processor component at the far end of a link.

   Because of the various issues involved, it is not recommended that
   the control plane be used as the primary mechanism for fault
   detection in an MPLS-TP network.

6.5.2.  Testing for Faults

   The control plane may be used to initiate and coordinate the testing
   of links, LSP segments, or entire LSPs.  This is important in some
   technologies where it is necessary to halt data transmission while
   testing, but it may also be useful where testing needs to be
   specifically enabled or configured.

   LMP provides a control-plane mechanism to test the continuity and
   connectivity (and naming) of individual links.  A single management
   operation is required to initiate the test at one end of the link,
   while the LMP handles the coordination with the other end of the
   link.  The test mechanism for an MPLS packet link relies on the LMP
   Test message inserted into the data stream at one end of the link and
   extracted at the other end of the link.  This mechanism need not
   disrupt data flowing over the link.

   Note that a link in the LMP may, in fact, be an LSP tunnel used to
   form a link in the MPLS-TP network.

   GMPLS signaling (RSVP) offers two mechanisms that may also assist
   with fault testing.  The first mechanism [RFC3473] defines the
   Admin_Status object that allows an LSP to be set into "testing mode".
   The interpretation of this mode is implementation-specific and could
   be documented more precisely for MPLS-TP.  The mode sets the whole
   LSP into a state where it can be tested; this need not be disruptive
   to data traffic.

   The second mechanism provided by GMPLS to support testing is
   described in [GMPLS-OAM].  This protocol extension supports the
   configuration (including enabling and disabling) of OAM mechanisms
   for a specific LSP.

6.5.3.  Fault Localization

   Fault localization is the process whereby the exact location of a
   fault is determined.  Fault detection often only takes place at key
   points in the network (such as at LSP end points or at MEPs).  This
   means that a fault may be located anywhere within a segment of the
   relevant LSP.

   If segment or end-to-end protection is in use, this level of
   information is often sufficient to repair the LSP.  However, if finer
   information granularity is required (either to implement optimal
   recovery actions or to diagnose a fault), it is necessary to localize
   the specific fault.

   LMP provides a cascaded test-and-propagate mechanism that is designed
   specifically for this purpose.

6.5.4.  Fault Status Reporting

   GMPLS signaling uses the Notify message to report fault status
   [RFC3473].  The Notify message can apply to a single LSP or can carry
   fault information for a set of LSPs, in order to improve the
   scalability of fault notification.

   Since the Notify message is targeted at a specific node, it can be
   delivered rapidly without requiring hop-by-hop processing.  It can be
   targeted at LSP end points or at segment end points (such as MEPs).
   The target points for Notify messages can be manually configured
   within the network, or they may be signaled when the LSP is set up.
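
   The scalability benefit of carrying several LSPs in one notification
   can be pictured as a grouping step: the LSPs affected by a fault are
   collected per target node, and one message is built for each target.
   The Python sketch below shows only that grouping.  It does not
   construct RSVP Notify messages, and the function name and the
   (lsp_id, notify_target) input format are assumptions made for this
   example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_notify_target(
        failed_lsps: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each target node to the LSPs to report in one notification.

    `failed_lsps` holds (lsp_id, notify_target) pairs.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for lsp_id, target in failed_lsps:
        grouped[target].append(lsp_id)
    return dict(grouped)
```

   With this grouping, a link failure that affects many LSPs whose
   notifications share a target results in one message to that target
   rather than one message per LSP.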

   This enables the process to be made consistent with segment
   protection as well as with the concept of Maintenance Entities.

   GMPLS signaling also provides a slower mechanism for reporting
   individual LSP faults on a hop-by-hop basis using PathErr and ResvErr
   messages.

   [RFC4783] provides a mechanism to coordinate alarms and other event
   or fault information through GMPLS signaling.  This mechanism is
   useful for understanding the status of the resources used by an LSP
   and for providing information as to why an LSP is not functioning;
   however, it is not intended to replace other fault-reporting
   mechanisms.

   GMPLS routing protocols [RFC4203] and [RFC5307] are used to advertise
   link availability and capabilities within a GMPLS-enabled network.
   Thus, the routing protocols can also provide indirect information
   about network faults; that is, the protocol may stop advertising or
   may withdraw the advertisement for a failed link, or it may advertise
   that the link is about to be shut down gracefully [RFC5817].  This
   mechanism is, however, not normally considered to be fast enough for
   use as a trigger for protection switching.

6.5.5.  Coordination of Recovery Actions

   Fault coordination is an important feature for certain protection
   mechanisms (such as bidirectional 1:1 protection).  The use of the
   GMPLS Notify message for this purpose is described in [RFC4426];
   however, specific message field values have not yet been defined for
   this operation.

   Further work is needed in GMPLS for control and configuration of
   reversion behavior for end-to-end and segment protection, and the
   coordination of timer values.

6.5.6.  Establishment of Protection and Restoration LSPs

   The management plane may be used to set up protection and recovery
   LSPs, but, when present, the control plane may be used instead.

   Several protocol extensions exist that simplify this process:

   o  [RFC4872] provides features that support end-to-end protection
      switching.

   o  [RFC4873] describes the establishment of a single, segment-
      protected LSP.  Note that end-to-end protection is a special case
      of segment protection, and [RFC4872] can also be used to provide
      end-to-end protection.

   o  [RFC4874] allows an LSP to be signaled with a request that its
      path exclude specified resources such as links, nodes, and shared
      risk link groups (SRLGs).  This allows a disjoint protection path
      to be requested or a recovery path to be set up to avoid failed
      resources.

   o  Lastly, it should be noted that [RFC5298] provides an overview of
      the GMPLS techniques available to achieve protection in multi-
      domain environments.
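
   The idea of requesting a path that excludes specified resources can
   be illustrated with a simple path computation.  The Python sketch
   below is not the signaling mechanism of [RFC4874]; it only shows how
   a path computation function might honor a list of excluded nodes and
   links, for example to find a protection path disjoint from the
   working path or a restoration path that avoids a failed link.  The
   function name and the graph representation are assumptions made for
   this example, and SRLGs are omitted for brevity.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set


def path_excluding(graph: Dict[str, List[str]], src: str, dst: str,
                   excluded_nodes: Set[str],
                   excluded_links: Set[FrozenSet[str]]) -> Optional[List[str]]:
    """Breadth-first search for a fewest-hop path avoiding excluded resources."""
    if src in excluded_nodes or dst in excluded_nodes:
        return None
    previous: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor in previous or neighbor in excluded_nodes:
                continue
            if frozenset((node, neighbor)) in excluded_links:
                continue
            previous[neighbor] = node
            queue.append(neighbor)
    return None
```

   Passing the intermediate nodes and the links of the working path as
   the excluded sets yields a candidate protection path that shares
   none of them, or None when no such path exists.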

7.  Pseudowire Recovery Considerations

   Pseudowires provide end-to-end connectivity over the MPLS-TP network
   and may comprise a single pseudowire segment, or multiple segments
   "stitched" together to provide end-to-end connectivity.

   The pseudowire may, itself, require protection, in order to meet the
   service-level guarantees of its SLA.  This protection could be
   provided by the MPLS-TP LSPs that support the pseudowire, or could be
   a feature of the pseudowire layer itself.

   As indicated above, the functional architecture described in this
   document applies to both LSPs and pseudowires.  However, the recovery
   mechanisms for pseudowires are for further study and will be defined
   in a separate document by the PWE3 working group.

7.1.  Utilization of Underlying MPLS-TP Recovery

   MPLS-TP PWs are carried across the network inside MPLS-TP LSPs.
   Therefore, an obvious way to provide protection for a PW is to
   protect the LSP that carries it.  Such protection can take any of the
   forms described in this document.  The choice of recovery scheme will
   depend on the required speed of recovery and the traffic loss that is
   acceptable for the SLA that the PW is providing.

   If the PW is a Multi-Segment PW, then LSP recovery can only protect
   the PW in individual segments.  This means that a single LSP recovery
   action cannot protect against a failure of a PW switching point (an
   S-PE), nor can it protect more than one segment at a time, since the
   LSP tunnel is terminated at each S-PE.  In this respect, LSP
   protection of a PW is very similar to link-level protection offered
   to the MPLS-TP LSP layer by an underlying network layer (see Section
   4.9).
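
   The limitation described above can be made concrete with a small
   model.  The sketch below assumes that a Multi-Segment PW is a list
   of segments, each carried by one LSP tunnel between two PEs, and
   classifies a single failed resource as recoverable by the LSP
   recovery of one segment or as beyond the reach of LSP recovery.
   The Segment type, the function classify_failure, the outcome labels,
   and the node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    ingress_pe: str           # PE where this segment's LSP tunnel starts
    egress_pe: str            # PE where this segment's LSP tunnel ends
    lsp_resources: frozenset  # links and transit LSRs inside the tunnel


def classify_failure(segments, failed):
    """Classify one failed resource against a Multi-Segment PW.

    Returns ("lsp-recovery", index) when the LSP of segment `index` can
    recover, ("beyond-lsp-recovery", None) when the failed resource is a
    PE that terminates an LSP tunnel, and ("unaffected", None) otherwise.
    """
    pes = {s.ingress_pe for s in segments} | {s.egress_pe for s in segments}
    if failed in pes:
        # The LSP tunnel terminates at the PE, so no LSP recovery action
        # can route around it.
        return ("beyond-lsp-recovery", None)
    for index, segment in enumerate(segments):
        if failed in segment.lsp_resources:
            return ("lsp-recovery", index)
    return ("unaffected", None)


# Hypothetical two-segment PW: T-PE1 -> S-PE1 -> T-PE2.
MS_PW = [
    Segment("T-PE1", "S-PE1", frozenset({"P1", "T-PE1--P1", "P1--S-PE1"})),
    Segment("S-PE1", "T-PE2", frozenset({"P2", "S-PE1--P2", "P2--T-PE2"})),
]

for resource in ("P1", "P2--T-PE2", "S-PE1"):
    print(resource, classify_failure(MS_PW, resource))
```

   Each failure inside a tunnel maps to exactly one segment, and a
   failure of S-PE1 maps to none, which is why such faults need
   recovery in the PW layer.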

7.2.  Recovery in the Pseudowire Layer

   Recovery in the PW layer can be provided by simply running separate
   PWs end-to-end.  Other recovery mechanisms in the PW layer, such as
   segment or concatenated segment recovery, or service-level recovery
   involving survivability of T-PE or AC faults, will be described in a
   separate document.

   As with any recovery mechanism, it is important to coordinate between
   layers.  This coordination is necessary to ensure that actions
   associated with recovery mechanisms are only performed in one layer
   at a time (that is, the recovery of an underlying LSP needs to be
   coordinated with the recovery of the PW itself).  It also makes sure
   that the working and protection PWs do not both use the same MPLS
   resources within the network (for example, by running over the same
   LSP tunnel; see also Section 4.9).
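
   The resource check mentioned above could be sketched as follows.
   The sketch assumes that a management system knows which LSP tunnels
   carry each PW and which links each LSP traverses.  The function
   shared_mpls_resources, the LSP names, and the link names are
   hypothetical.

```python
def shared_mpls_resources(working_lsps, protection_lsps, lsp_links):
    """Return (shared LSP tunnels, shared links) between two PWs.

    working_lsps and protection_lsps list the LSP tunnels that carry the
    working and protection PWs.  lsp_links maps an LSP name to the set of
    links it traverses.  Two empty sets mean the PWs are disjoint.
    """
    shared_lsps = set(working_lsps) & set(protection_lsps)
    working_links = set().union(*(lsp_links[lsp] for lsp in working_lsps))
    protection_links = set().union(
        *(lsp_links[lsp] for lsp in protection_lsps))
    return shared_lsps, working_links & protection_links


# Hypothetical inventory: lsp-1 and lsp-3 both use link A-B.
LSP_LINKS = {
    "lsp-1": {"A-B", "B-D"},
    "lsp-2": {"A-C", "C-D"},
    "lsp-3": {"A-B", "B-E", "E-D"},
}

print(shared_mpls_resources(["lsp-1"], ["lsp-2"], LSP_LINKS))
print(shared_mpls_resources(["lsp-1"], ["lsp-3"], LSP_LINKS))
```

   The second call reports a shared link even though the two PWs use
   different LSP tunnels, which is the case that a check on tunnel
   names alone would miss.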

8.  Manageability Considerations

   Manageability of MPLS-TP networks and their functions is discussed in
   [RFC5950].  OAM features are discussed in [RFC6371].

   Survivability has some key interactions with management, as described
   in this document.  In particular:

   o  Recovery domains may be configured in a way that prevents one-to-
      one correspondence between the MPLS-TP network and the recovery
      domains.

   o  Survivability policies may be configured per network, per recovery
      domain, or per LSP.

   o  Configuration of OAM may involve the selection of MEPs; enabling
      OAM on network segments, spans, and links; and the operation of
      OAM on LSPs, concatenated LSP segments, and LSP segments.

   o  Manual commands may be used to control recovery functions,
      including forcing recovery and locking recovery actions.

   See also the considerations regarding security for management and OAM
   in Section 9 of this document.

9.  Security Considerations

   This framework does not introduce any new security considerations;
   general issues relating to MPLS security can be found in [RFC5920].

   However, several points about MPLS-TP survivability should be noted
   here.

   o  If an attacker is able to force a protection switch-over, this may
      result in a small perturbation to user traffic and could result in
      extra traffic being preempted or displaced from the protection
      resources.  In the case of 1:n protection or shared mesh
      protection, this may result in other traffic becoming unprotected.
      Therefore, it is important that OAM protocols for detecting or
      notifying faults use adequate security to prevent them from being
      used (through the insertion of bogus messages or through the
      capture of legitimate messages) to falsely trigger a recovery
      event.

   o  If manual commands are modified, captured, or simulated (including
      replay), it might be possible for an attacker to perform forced
      recovery actions or to impose lock-out.  These actions could
      impact the capability to provide the recovery function and could
      also affect the normal operation of the network for other traffic.
      Therefore, management protocols used to perform manual commands
      must allow the operator to use appropriate security mechanisms.
      This includes verification that the user who performs the commands
      has appropriate authorization.

   o  If the control plane is used to configure or operate recovery
      mechanisms, the control-plane protocols must also be capable of
      providing adequate security.
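
   As an illustration of the second point only, the sketch below shows
   one way a receiver of manual commands might reject modified,
   replayed, or unauthorized requests.  It is not a mechanism defined
   by this framework or by any MPLS-TP management protocol.  The
   message layout, the single shared key, the AUTHORIZED policy table,
   and the names sign and CommandVerifier are assumptions.

```python
import hashlib
import hmac

# Hypothetical authorization policy: which user may issue which command.
AUTHORIZED = {
    "alice": {"forced-switch", "lockout"},
    "bob": {"lockout"},
}


def sign(key, user, command, sequence):
    """Compute an HMAC-SHA256 tag over the user, command, and sequence."""
    message = f"{user}|{command}|{sequence}".encode()
    return hmac.new(key, message, hashlib.sha256).digest()


class CommandVerifier:
    """Accepts a manual command only if it is authentic, fresh, and allowed."""

    def __init__(self, key):
        self._key = key
        self._last_sequence = {}  # highest sequence number seen per user

    def accept(self, user, command, sequence, tag):
        expected = sign(self._key, user, command, sequence)
        if not hmac.compare_digest(expected, tag):
            return False  # modified or simulated command
        if sequence <= self._last_sequence.get(user, -1):
            return False  # captured and replayed command
        if command not in AUTHORIZED.get(user, set()):
            return False  # user lacks authorization for this command
        self._last_sequence[user] = sequence
        return True


key = b"example-shared-key"  # assumption: provisioned out of band
verifier = CommandVerifier(key)
tag = sign(key, "alice", "forced-switch", 1)
print(verifier.accept("alice", "forced-switch", 1, tag))  # first use
print(verifier.accept("alice", "forced-switch", 1, tag))  # replay
```

   A real deployment would more likely rely on the security services
   of the management protocol in use; the sketch only separates the
   three checks (integrity, freshness, authorization) named above.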

10.  Acknowledgments

   Thanks to the following people for useful comments and discussions:
   Italo Busi, David McWalter, Lou Berger, Yaacov Weingarten, Stewart
   Bryant, Dan Frost, Lieven Levrau, Xuehui Dai, Liu Guoman, Xiao Min,
   Daniele Ceccarelli, Scott Bradner, Francesco Fondelli, Curtis
   Villamizar, Maarten Vissers, and Greg Mirsky.

   The Editors would like to thank the participants in ITU-T Study Group
   15 for their detailed review.

   Some figures and text on shared mesh protection were borrowed from
   [MPLS-TP-MESH] with thanks to Tae-sik Cheung and Jeong-dong Ryoo.

11.  References

11.1.  Normative References

   [G.806]        ITU-T, "Characteristics of transport equipment -
                  Description methodology and generic functionality",
                  Recommendation G.806, January 2009.

   [G.808.1]      ITU-T, "Generic Protection Switching - Linear trail
                  and subnetwork protection", Recommendation G.808.1,
                  December 2003.

   [G.841]        ITU-T, "Types and Characteristics of SDH Network
                  Protection Architectures", Recommendation G.841,
                  October 1998.

   [RFC2205]      Braden, R., Ed., Zhang, L., Berson, S., Herzog, S.,
                  and S. Jamin, "Resource ReSerVation Protocol (RSVP) --
                  Version 1 Functional Specification", RFC 2205,
                  September 1997.

   [RFC3209]      Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan,
                  V., and G. Swallow, "RSVP-TE: Extensions to RSVP for
                  LSP Tunnels", RFC 3209, December 2001.

   [RFC3471]      Berger, L., Ed., "Generalized Multi-Protocol Label
                  Switching (GMPLS) Signaling Functional Description",
                  RFC 3471, January 2003.

   [RFC3473]      Berger, L., Ed., "Generalized Multi-Protocol Label
                  Switching (GMPLS) Signaling Resource ReserVation
                  Protocol-Traffic Engineering (RSVP-TE) Extensions",
                  RFC 3473, January 2003.

   [RFC3945]      Mannie, E., Ed., "Generalized Multi-Protocol Label
                  Switching (GMPLS) Architecture", RFC 3945, October
                  2004.

   [RFC4203]      Kompella, K., Ed., and Y. Rekhter, Ed., "OSPF
                  Extensions in Support of Generalized Multi-Protocol
                  Label Switching (GMPLS)", RFC 4203, October 2005.

   [RFC4204]      Lang, J., Ed., "Link Management Protocol (LMP)", RFC
                  4204, October 2005.

   [RFC4427]      Mannie, E., Ed., and D. Papadimitriou, Ed., "Recovery
                  (Protection and Restoration) Terminology for
                  Generalized Multi-Protocol Label Switching (GMPLS)",
                  RFC 4427, March 2006.

   [RFC4428]      Papadimitriou, D., Ed., and E. Mannie, Ed., "Analysis
                  of Generalized Multi-Protocol Label Switching
                  (GMPLS)-based Recovery Mechanisms (including
                  Protection and Restoration)", RFC 4428, March 2006.

   [RFC4873]      Berger, L., Bryskin, I., Papadimitriou, D., and A.
                  Farrel, "GMPLS Segment Recovery", RFC 4873, May 2007.

   [RFC5307]      Kompella, K., Ed., and Y. Rekhter, Ed., "IS-IS
                  Extensions in Support of Generalized Multi-Protocol
                  Label Switching (GMPLS)", RFC 5307, October 2008.

   [RFC5317]      Bryant, S., Ed., and L. Andersson, Ed., "Joint Working
                  Team (JWT) Report on MPLS Architectural Considerations
                  for a Transport Profile", RFC 5317, February 2009.

   [RFC5586]      Bocci, M., Ed., Vigoureux, M., Ed., and S. Bryant,
                  Ed., "MPLS Generic Associated Channel", RFC 5586, June
                  2009.

   [RFC5654]      Niven-Jenkins, B., Ed., Brungard, D., Ed., Betts, M.,
                  Ed., Sprecher, N., and S. Ueno, "Requirements of an
                  MPLS Transport Profile", RFC 5654, September 2009.

   [RFC5921]      Bocci, M., Ed., Bryant, S., Ed., Frost, D., Ed.,
                  Levrau, L., and L. Berger, "A Framework for MPLS in
                  Transport Networks", RFC 5921, July 2010.

   [RFC5950]      Mansfield, S., Ed., Gray, E., Ed., and K. Lam, Ed.,
                  "Network Management Framework for MPLS-based Transport
                  Networks", RFC 5950, September 2010.

   [RFC6371]      Busi, I., Ed., and D. Allan, Ed., "Operations,
                  Administration, and Maintenance Framework for
                  MPLS-Based Transport Networks", RFC 6371, September
                  2011.

11.2.  Informative References

   [GMPLS-OAM]    Takacs, A., Fedyk, D., and J. He, "GMPLS RSVP-TE
                  extensions for OAM Configuration", Work in Progress,
                  July 2011.

   [MPLS-TP-LP]   Bryant, S., Osborne, E., Sprecher, N., Fulignoli,
                  A., Ed., and Y. Weingarten, Ed., "MPLS-TP Linear
                  Protection", Work in Progress, August 2011.

   [MPLS-TP-MESH] Cheung, T. and J. Ryoo, "MPLS-TP Shared Mesh
                  Protection", Work in Progress, April 2011.

   [RFC3031]      Rosen, E., Viswanathan, A., and R. Callon,
                  "Multiprotocol Label Switching Architecture", RFC
                  3031, January 2001.

   [RFC3386]      Lai, W., Ed., and D. McDysan, Ed., "Network Hierarchy
                  and Multilayer Survivability", RFC 3386, November
                  2002.

   [RFC3469]      Sharma, V., Ed., and F. Hellstrand, Ed., "Framework
                  for Multi-Protocol Label Switching (MPLS)-based
                  Recovery", RFC 3469, February 2003.

   [RFC4397]      Bryskin, I. and A. Farrel, "A Lexicography for the
                  Interpretation of Generalized Multiprotocol Label
                  Switching (GMPLS) Terminology within the Context of
                  the ITU-T's Automatically Switched Optical Network
                  (ASON) Architecture", RFC 4397, February 2006.

   [RFC4426]      Lang, J., Ed., Rajagopalan, B., Ed., and D.
                  Papadimitriou, Ed., "Generalized Multi-Protocol Label
                  Switching (GMPLS) Recovery Functional Specification",
                  RFC 4426, March 2006.

   [RFC4726]      Farrel, A., Vasseur, J.-P., and A. Ayyangar, "A
                  Framework for Inter-Domain Multiprotocol Label
                  Switching Traffic Engineering", RFC 4726, November
                  2006.

   [RFC4783]      Berger, L., Ed., "GMPLS - Communication of Alarm
                  Information", RFC 4783, December 2006.

   [RFC4872]      Lang, J., Ed., Rekhter, Y., Ed., and D. Papadimitriou,
                  Ed., "RSVP-TE Extensions in Support of End-to-End
                  Generalized Multi-Protocol Label Switching (GMPLS)
                  Recovery", RFC 4872, May 2007.

   [RFC4874]      Lee, CY., Farrel, A., and S. De Cnodder, "Exclude
                  Routes - Extension to Resource ReserVation Protocol-
                  Traffic Engineering (RSVP-TE)", RFC 4874, April 2007.

   [RFC5212]      Shiomoto, K., Papadimitriou, D., Le Roux, JL.,
                  Vigoureux, M., and D. Brungard, "Requirements for
                  GMPLS-Based Multi-Region and Multi-Layer Networks
                  (MRN/MLN)", RFC 5212, July 2008.

   [RFC5298]      Takeda, T., Ed., Farrel, A., Ed., Ikejiri, Y., and JP.
                  Vasseur, "Analysis of Inter-Domain Label Switched Path
                  (LSP) Recovery", RFC 5298, August 2008.

   [RFC5817]      Ali, Z., Vasseur, JP., Zamfir, A., and J. Newton,
                  "Graceful Shutdown in MPLS and Generalized MPLS
                  Traffic Engineering Networks", RFC 5817, April 2010.

   [RFC5920]      Fang, L., Ed., "Security Framework for MPLS and GMPLS
                  Networks", RFC 5920, July 2010.

   [RFC6373]      Andersson, L., Ed., Berger, L., Ed., Fang, L., Ed.,
                  Bitar, N., Ed., and E. Gray, Ed., "MPLS-TP Control
                  Plane Framework", RFC 6373, September 2011.

   [RFC6291]      Andersson, L., van Helvoort, H., Bonica, R.,
                  Romascanu, D., and S. Mansfield, "Guidelines for the
                  Use of the "OAM" Acronym in the IETF", BCP 161, RFC
                  6291, June 2011.

   [ROSETTA]      Van Helvoort, H., Ed., Andersson, L., Ed., and N.
                  Sprecher, Ed., "A Thesaurus for the Terminology used
                  in Multiprotocol Label Switching Transport Profile
                  (MPLS-TP) drafts/RFCs and ITU-T's Transport Network
                  Recommendations", Work in Progress, June 2011.

Authors' Addresses

   Nurit Sprecher (editor)
   Nokia Siemens Networks
   3 Hanagar St.
   Neve Ne'eman B Hod
   Hasharon, 45241 Israel

   EMail: nurit.sprecher@nsn.com


   Adrian Farrel (editor)
   Juniper Networks

   EMail: adrian@olddog.co.uk