Network Working Group                                     V. Sharma, Ed.
Request for Comments: 3469                                Metanoia, Inc.
Category: Informational                               F. Hellstrand, Ed.
                                                         Nortel Networks
                                                           February 2003

   Framework for Multi-Protocol Label Switching (MPLS)-based Recovery

Status of this Memo

   This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.
Abstract

   Multi-protocol label switching (MPLS) integrates the label swapping forwarding paradigm with network layer routing. To deliver reliable service, MPLS requires a set of procedures to provide protection of the traffic carried on different paths. This requires that the label switching routers (LSRs) support fault detection, fault notification, and fault recovery mechanisms, and that MPLS signaling support the configuration of recovery. With these objectives in mind, this document specifies a framework for MPLS-based recovery. Restart issues are not included in this framework.

Table of Contents

   1. Introduction
      1.1. Background
      1.2. Motivation for MPLS-Based Recovery
      1.3. Objectives/Goals
   2. Overview
      2.1. Recovery Models
           2.1.1 Rerouting
           2.1.2 Protection Switching
      2.2. The Recovery Cycles
           2.2.1 MPLS Recovery Cycle Model
           2.2.2 MPLS Reversion Cycle Model
           2.2.3 Dynamic Re-routing Cycle Model
           2.2.4 Example Recovery Cycle
      2.3. Definitions and Terminology
           2.3.1 General Recovery Terminology
           2.3.2 Failure Terminology
      2.4. Abbreviations
   3. MPLS-based Recovery Principles
      3.1. Configuration of Recovery
      3.2. Initiation of Path Setup
      3.3. Initiation of Resource Allocation
           3.3.1 Subtypes of Protection Switching
      3.4. Scope of Recovery
           3.4.1 Topology
           3.4.2 Path Mapping
           3.4.3 Bypass Tunnels
           3.4.4 Recovery Granularity
           3.4.5 Recovery Path Resource Use
      3.5. Fault Detection
      3.6. Fault Notification
      3.7. Switch-Over Operation
           3.7.1 Recovery Trigger
           3.7.2 Recovery Action
      3.8. Post Recovery Operation
           3.8.1 Fixed Protection Counterparts
           3.8.2 Dynamic Protection Counterparts
           3.8.3 Restoration and Notification
           3.8.4 Reverting to Preferred Path (or Controlled Rearrangement)
      3.9. Performance
   4. MPLS Recovery Features
   5. Comparison Criteria
   6. Security Considerations
   7. Intellectual Property Considerations
   8. Acknowledgements
   9. References
      9.1 Normative References
      9.2 Informative References
   10. Contributing Authors
   11. Editors' Addresses
   12. Full Copyright Statement
At points in the document, we provide some thoughts about the operation or viability of certain recovery objectives. These should be viewed as the opinions of the authors, and not the consolidated views of the IETF. The document is informational, and it is expected that a standards track document will be developed in the future to describe a subset of this document so as to meet the needs currently specified by the TE WG.

Multi-protocol label switching [RFC3031], on the other hand, by integrating forwarding based on label-swapping of a link-local label with network layer routing, allows flexibility in the delivery of new routing services. MPLS allows for using such media-specific forwarding mechanisms as label swapping. This enables some sophisticated features such as quality-of-service (QoS) and traffic engineering [RFC2702] to be implemented more effectively. An important component of providing QoS, however, is the ability to transport data reliably and efficiently. Although the current routing algorithms are robust and survivable, the amount of time they take to recover from a fault can be significant, on the order of several seconds (for interior gateway protocols (IGPs)) or minutes (for exterior gateway protocols, such as the Border Gateway Protocol (BGP)), causing disruption of service for some applications in the interim. This is unacceptable in situations where the aim is to provide a highly reliable service, with recovery times on the order of seconds down to tens of milliseconds. IP routing may also not be able to provide bandwidth recovery, where the objective is to provide not only an alternative path, but also bandwidth equivalent to that available on the original path. (For some recent work on bandwidth recovery schemes, the reader is referred to [MPLS-BACKUP].)
Examples of such applications are Virtual Leased Line services, stock exchange data services, voice traffic, and video services, that is, any application for which a disruption in service long enough to breach service agreements or the required level of quality is unacceptable. MPLS recovery may be motivated by the notion that there are limitations to improving the recovery times of current routing algorithms. Additional improvement can be obtained by augmenting these algorithms with MPLS recovery mechanisms [MPLS-PATH]. Since MPLS is a possible technology of choice in future IP-based transport networks, it is useful for MPLS to be able to provide protection and restoration of traffic. MPLS may facilitate the convergence of network functionality on a common control and management plane. Further, a protection priority could be used as a differentiating mechanism for premium services that require high reliability, such as Virtual Leased Line services, and high priority voice and video traffic.

The remainder of this document provides a framework for MPLS-based recovery. It is focused on a conceptual level and is meant to address motivation, objectives and requirements. Issues of mechanism, policy, routing plans and characteristics of traffic carried by recovery paths are beyond the scope of this document.
restoration. In networks where the latter class of traffic is dominant, providing fast restoration to all classes of traffic may not be cost effective from a service provider's perspective.

VI. MPLS has desirable attributes when applied to the purpose of recovery for connectionless networks. Specifically, an LSP is source routed, so a forwarding path for recovery can be "pinned" and is not affected by transient instability in SPF routing brought on by failure scenarios.

VII. Establishing interoperability of protection mechanisms between routers/LSRs from different vendors in IP or MPLS networks is desired, to enable recovery mechanisms to work in a multivendor environment and to enable the transition of certain protected services to an MPLS core.
on an individual path, or for all traffic on a group of paths. Note that "path" is used as a general term and includes the notion of a link, IP route or LSP.

VI. MPLS-based recovery techniques may be applicable for an entire end-to-end path or for segments of an end-to-end path.

VII. MPLS-based recovery mechanisms should aim to take into consideration the recovery actions of lower layers. MPLS-based mechanisms should not trigger lower layer protection switching, nor should MPLS-based mechanisms be triggered when lower layer switching has occurred or may be about to occur.

VIII. MPLS-based recovery mechanisms should aim to minimize the loss of data and packet reordering during recovery operations. (The current MPLS specification itself has no explicit requirement on reordering.)

IX. MPLS-based recovery mechanisms should aim to minimize the state overhead incurred for each recovery path maintained.

X. MPLS-based recovery mechanisms should aim to minimize the signaling overhead to set up and maintain recovery paths and to notify failures.

XI. MPLS-based recovery mechanisms should aim to preserve the constraints on traffic after switchover, if desired. That is, if desired, the recovery path should meet the resource requirements of, and achieve the same performance characteristics as, the working path.

We observe that some of the above are conflicting goals, and real deployment will often involve engineering compromises based on a variety of factors such as cost, end-user application requirements, network efficiency, complexity involved, and revenue considerations. Thus, these goals are subject to tradeoffs based on the above considerations.
resources consumed. Therefore, it is expected that network operators will offer a spectrum of service levels. MPLS-based recovery should give operators the flexibility to select the recovery mechanism, to choose the granularity at which traffic is protected, and also to choose the specific types of traffic that are protected, giving them more control over that tradeoff. With MPLS-based recovery, it is possible to provide different levels of protection for different classes of service, based on their service requirements. For example, using approaches outlined below, a Virtual Leased Line (VLL) service or real-time applications like Voice over IP (VoIP) may be supported using link/node protection together with pre-established, pre-reserved path protection. Best effort traffic, on the other hand, may use path protection that is established on demand or may simply rely on IP re-route or higher layer recovery mechanisms. As another example of their range of application, MPLS-based recovery strategies may be used to protect traffic not originally flowing on label switched paths, such as IP traffic that is normally routed hop-by-hop, as well as traffic forwarded on label switched paths.

Section 3.8. In terms of the principles defined in Section 3, reroute recovery employs paths established-on-demand with resources reserved-on-demand.
Section 3, protection switching employs pre-established recovery paths and, if resource reservation is required on the recovery path, pre-reserved resources. The various sub-types of protection switching are detailed in Section 3.3.1 of this document.

Figure 1. Definitions and a key to abbreviations follow.

   Network Impairment
   |      Fault Detected
   |      |      Start of Notification
   |      |      |      Start of Recovery Operation
   |      |      |      |      Recovery Operation Complete
   |      |      |      |      |      Path Traffic Recovered
   |      |      |      |      |      |
   v      v      v      v      v      v
   -----------------------------------------------------------
   |  T1  |  T2  |  T3  |  T4  |  T5  |

                 Figure 1. MPLS Recovery Cycle Model
The various timing measures used in the model are described below.

   T1   Fault Detection Time
   T2   Fault Hold-Off Time
   T3   Fault Notification Time
   T4   Recovery Operation Time
   T5   Traffic Recovery Time

Definitions of the recovery cycle times are as follows:

Fault Detection Time

   The time between the occurrence of a network impairment and the moment the fault is detected by MPLS-based recovery mechanisms. This time may be highly dependent on lower layer protocols.

Fault Hold-Off Time

   The configured waiting time between the detection of a fault and taking MPLS-based recovery action, to allow time for lower layer protection to take effect. The Fault Hold-Off Time may be zero.

   Note: The Fault Hold-Off Time may occur after the Fault Notification Time interval if the node responsible for the switchover, the Path Switch LSR (PSL), rather than the detecting LSR, is configured to wait.

Fault Notification Time

   The time between initiation of a Fault Indication Signal (FIS) by the LSR detecting the fault and the time at which the Path Switch LSR (PSL) begins the recovery operation. This is zero if the PSL detects the fault itself or infers a fault from such events as an adjacency failure.

   Note: If the PSL detects the fault itself, there still may be a Fault Hold-Off Time period between detection and the start of the recovery operation.

Recovery Operation Time

   The time between the first and last recovery actions. This may include message exchanges between the PSL and PML (Path Merge LSR) to coordinate recovery actions.

Traffic Recovery Time

   The time between the last recovery action and the time that the traffic (if present) is completely recovered. This interval is intended to account for the time required for traffic to once again arrive at the point in the network that experienced disrupted or degraded service due to the occurrence of the fault (e.g., the PML). This time may depend on the location of the fault, the recovery mechanism, and the propagation delay along the recovery path.

Figure 2. Note that the cycle shown below comes after the recovery cycle shown in Fig. 1.

   Network Impairment Repaired
   |      Fault Cleared
   |      |      Path Available
   |      |      |      Start of Reversion Operation
   |      |      |      |      Reversion Operation Complete
   |      |      |      |      |      Traffic Restored on Preferred Path
   |      |      |      |      |      |
   v      v      v      v      v      v
   -----------------------------------------------------------
   |  T7  |  T8  |  T9  |  T10 |  T11 |

                 Figure 2. MPLS Reversion Cycle Model

The various timing measures used in the model are described below.

   T7   Fault Clearing Time
   T8   Clear Hold-Off Time
   T9   Clear Notification Time
   T10  Reversion Operation Time
   T11  Traffic Reversion Time

Note that time T6 (not shown above) is the time for which the network impairment is not repaired and traffic is flowing on the recovery path.
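As a rough illustration of the timing model, the recovery cycle of Figure 1 (T1 to T5) and the reversion cycle of Figure 2 (T7 to T11) can each be treated as a sum of consecutive intervals. The following is a minimal Python sketch, assuming every interval is a non-negative duration in milliseconds; the class and field names are hypothetical and are not defined by this framework, and T6 (the time spent on the recovery path while the impairment persists) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class RecoveryCycle:
    """Intervals of the MPLS recovery cycle (Figure 1), in milliseconds."""
    t1_fault_detection: float
    t2_fault_hold_off: float      # may be zero
    t3_fault_notification: float  # zero if the PSL detects the fault itself
    t4_recovery_operation: float
    t5_traffic_recovery: float

    def total(self) -> float:
        # The hold-off may fall before or after notification, but the
        # total duration of the cycle is the same either way.
        return (self.t1_fault_detection + self.t2_fault_hold_off
                + self.t3_fault_notification + self.t4_recovery_operation
                + self.t5_traffic_recovery)


@dataclass
class ReversionCycle:
    """Intervals of the MPLS reversion cycle (Figure 2), in milliseconds."""
    t7_fault_clearing: float
    t8_clear_hold_off: float      # may be zero; damps intermittent faults
    t9_clear_notification: float  # zero if the PSL clears the fault itself
    t10_reversion_operation: float
    t11_traffic_reversion: float

    def total(self) -> float:
        return (self.t7_fault_clearing + self.t8_clear_hold_off
                + self.t9_clear_notification + self.t10_reversion_operation
                + self.t11_traffic_reversion)


# Hypothetical figures, chosen only to illustrate the arithmetic.
recovery = RecoveryCycle(10.0, 0.0, 5.0, 20.0, 15.0)
reversion = ReversionCycle(10.0, 5000.0, 5.0, 20.0, 1.0)
print(recovery.total())   # 50.0
print(reversion.total())  # 5036.0
```

The deliberately long clear hold-off in the example reflects the point made for the reversion cycle: waiting for a stable repaired path is usually preferable to switching back immediately.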
Definitions of the reversion cycle times are as follows:

Fault Clearing Time

   The time between the repair of a network impairment and the time that MPLS-based mechanisms learn that the fault has been cleared. This time may be highly dependent on lower layer protocols.

Clear Hold-Off Time

   The configured waiting time between the clearing of a fault and MPLS-based recovery action(s). Waiting time may be needed to ensure that the path is stable and to avoid flapping in cases where a fault is intermittent. The Clear Hold-Off Time may be zero.

   Note: The Clear Hold-Off Time may occur after the Clear Notification Time interval if the PSL is configured to wait.

Clear Notification Time

   The time between initiation of a Fault Recovery Signal (FRS) by the LSR clearing the fault and the time at which the PSL begins the reversion operation. This is zero if the PSL clears the fault itself.

   Note: If the PSL clears the fault itself, there still may be a Clear Hold-Off Time period between fault clearing and the start of the reversion operation.

Reversion Operation Time

   The time between the first and last reversion actions. This may include message exchanges between the PSL and PML to coordinate reversion actions.

Traffic Reversion Time

   The time between the last reversion action and the time that traffic (if present) is completely restored on the preferred path. This interval is expected to be quite small since both paths are working and care may be taken to limit the traffic disruption (e.g., using "make before break" techniques and synchronous switch-over).

In practice, the most interesting times in the reversion cycle are the Clear Hold-Off Time and the Reversion Operation Time together with the Traffic Reversion Time (or some other measure of traffic disruption). The first interval serves to ensure stability of the repaired path, and the latter two should be kept short to minimize disruption while the reversion action is in progress. Given that both paths are available, it is better to wait for a well-controlled switch-back with minimal disruption than to perform an immediate operation that may introduce new faults (except, perhaps, when the recovery path is unable to offer a quality of service comparable to the preferred path).

Figure 3. Note that the cycle shown below may be overlaid on the recovery cycle shown in Fig. 1 or the reversion cycle shown in Fig. 2, or both (in the event that both the recovery cycle and the reversion cycle take place before the routing protocols converge), and occurs if, after the convergence of the routing protocols, it is determined (based on on-line algorithms or off-line traffic engineering tools, network configuration, or a variety of other possible criteria) that there is a better route for the working path.

   Network Enters a Semi-stable State after an Impairment
   |      Dynamic Routing Protocols Converge
   |      |      Initiate Setup of New Working Path between PSL
   |      |      |  and PML
   |      |      |      Switchover Operation Complete
   |      |      |      |      Traffic Moved to New Working Path
   |      |      |      |      |
   v      v      v      v      v
   ----------------------------------------------------
   |  T12 |  T13 |  T14 |  T15 |

                 Figure 3. Dynamic Rerouting Cycle Model

The various timing measures used in the model are described below.

   T12  Network Route Convergence Time
   T13  Hold-down Time (optional)
   T14  Switchover Operation Time
   T15  Traffic Restoration Time
Network Route Convergence Time

   We define the network route convergence time as the time taken for the network routing protocols to converge and for the network to reach a stable state.

Hold-down Time

   We define the hold-down period as a bounded time for which a recovery path must be used. In some scenarios it may be difficult to determine if the working path is stable. In these cases, a hold-down time may be used to prevent excess flapping of traffic between a working and a recovery path.

Switchover Operation Time

   The time between the first and last switchover actions. This may include message exchanges between the PSL and PML to coordinate the switchover actions.

Traffic Restoration Time

   The time between the last restoration action and the time that traffic (if present) is completely restored on the new preferred path.

Section 2.1.1).

VIII. A new working path is established between the PSL and the PML (the assumption is that the PSL and PML have not changed).

IX. Traffic is switched over to the new working path.
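The hold-down period defined for the dynamic rerouting cycle can be pictured as a guard that refuses to move traffic off the recovery path until the path has been in use for a bounded time. This is a minimal Python sketch under assumed names (HoldDownGuard, hold_down in seconds). The framework defines no such interface, and the current time is passed in explicitly rather than read from a real clock so that the behaviour is deterministic.

```python
class HoldDownGuard:
    """Hypothetical hold-down timer: once traffic is moved to the recovery
    path, it must stay there for at least `hold_down` seconds."""

    def __init__(self, hold_down: float) -> None:
        self.hold_down = hold_down
        self.on_recovery_path = False
        self.switched_at = 0.0

    def switch_to_recovery(self, now: float) -> None:
        # Record when the recovery path started carrying traffic.
        self.on_recovery_path = True
        self.switched_at = now

    def may_leave_recovery(self, now: float) -> bool:
        # Leaving is only allowed after the full hold-down period; this
        # damps flapping between the working and the recovery path.
        return self.on_recovery_path and now - self.switched_at >= self.hold_down

    def switch_to_working(self, now: float) -> bool:
        """Attempt a switch-back; returns True only if it was allowed."""
        if not self.may_leave_recovery(now):
            return False
        self.on_recovery_path = False
        return True


guard = HoldDownGuard(hold_down=30.0)
guard.switch_to_recovery(now=100.0)
print(guard.switch_to_working(now=110.0))  # False: still inside hold-down
print(guard.switch_to_working(now=131.0))  # True
```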
RFC3031], and, in addition, introduces the following new terms.
Protection Counterpart

   The "other" path when discussing pre-planned protection switching schemes. The protection counterpart for the working path is the recovery path and vice versa.

Path Switch LSR (PSL)

   An LSR that is responsible for switching or replicating the traffic between the working path and the recovery path.

Path Merge LSR (PML)

   An LSR that is responsible for receiving the recovery path traffic, and either merging the traffic back onto the working path, or, if it is itself the destination, passing the traffic on to the higher layer protocols.

Point of Repair (POR)

   An LSR that is set up for performing MPLS recovery. In other words, an LSR that is responsible for effecting the repair of an LSP. The POR, for example, can be a PSL or a PML, depending on the type of recovery scheme employed.

Intermediate LSR

   An LSR on a working or recovery path that is neither a PSL nor a PML for that path.

Path Group (PG)

   A logical bundling of multiple working paths, each of which is routed identically between a Path Switch LSR and a Path Merge LSR.

Protected Path Group (PPG)

   A path group that requires protection.

Protected Traffic Portion (PTP)

   The portion of the traffic on an individual path that requires protection. For example, code points in the EXP bits of the shim header may identify a protected portion.
Bypass Tunnel

   A path that serves to back up a set of working paths using the label stacking approach [RFC3031]. The working paths and the bypass tunnel must all share the same Path Switch LSR (PSL) and Path Merge LSR (PML).

Switch-Over

   The process of switching the traffic from the path that the traffic is flowing on onto one or more alternate paths. This may involve moving traffic from a working path onto one or more recovery paths, or may involve moving traffic from one or more recovery paths onto one or more more optimal working paths.

Switch-Back

   The process of returning the traffic from one or more recovery paths back to the working path(s).

Revertive Mode

   A recovery mode in which traffic is automatically switched back from the recovery path to the original working path upon the restoration of the working path to a fault-free condition. This assumes a failed working path does not automatically surrender resources to the network.

Non-revertive Mode

   A recovery mode in which traffic is not automatically switched back to the original working path after this path is restored to a fault-free condition. (Depending on the configuration, the original working path may, upon moving to a fault-free condition, become the recovery path, or it may be used for new working traffic and no longer be associated with its original recovery path, i.e., it is surrendered to the network.)

MPLS Protection Domain

   The set of LSRs over which a working path and its corresponding recovery path are routed.

MPLS Protection Plan

   The set of all LSP protection paths and the mapping from working to protection paths deployed in an MPLS protection domain at a given time.
Liveness Message

   A message exchanged periodically between two adjacent LSRs that serves as a link probing mechanism. It provides an integrity check of the forward and the backward directions of the link between the two LSRs as well as a check of neighbor aliveness.

Path Continuity Test

   A test that verifies the integrity and continuity of a path or path segment. The details of such a test are beyond the scope of this document. (This could be accomplished, for example, by transmitting a control message along the same links and nodes as the data traffic, or it could be measured by the absence of traffic and by providing feedback.)
periodically by the node/nodes closest to the point of failure, for some configurable length of time or until the transmitting node receives an acknowledgement from its neighbor.

Fault Recovery Signal (FRS)

   A signal that indicates a fault along a working path has been repaired. Again, like the FIS, it is relayed by each intermediate LSR to its upstream or downstream neighbor, until it reaches the LSR that performs recovery of the original path. The FRS is transmitted periodically by the node/nodes closest to the point of failure, for some configurable length of time or until the transmitting node receives an acknowledgement from its neighbor.
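The transmission rule shared by the FIS and the FRS (send periodically until the neighbor acknowledges, or until a configurable length of time has elapsed) can be sketched as a simple retransmission loop. This is a minimal Python illustration; the function name, the `send` callback, and the use of simulated time (no real timers or sockets) are assumptions made for the example only.

```python
from typing import Callable, Optional


def transmit_periodically(send: Callable[[int], bool],
                          interval: float,
                          max_duration: float) -> Optional[int]:
    """Retransmit a signal (e.g. an FIS or FRS) every `interval` seconds
    until the neighbor acknowledges it or `max_duration` has elapsed.

    `send(attempt)` transmits one copy and returns True if an
    acknowledgement arrived before the next transmission was due.
    Time is simulated by accumulating `interval`; nothing sleeps.
    Returns the 1-based attempt that was acknowledged, or None.
    """
    attempt = 0
    elapsed = 0.0
    while elapsed <= max_duration:
        attempt += 1
        if send(attempt):
            return attempt
        elapsed += interval
    return None


# A neighbor that acknowledges the third copy of the signal.
print(transmit_periodically(lambda n: n >= 3, interval=0.5, max_duration=2.0))  # 3
```

With an interval of 0.5 s and a limit of 2 s, an unacknowledged signal is sent five times (at 0, 0.5, 1.0, 1.5 and 2.0 s) before the sender gives up.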
RFC2205][RFC3209] or CR-LDP [RFC3212], or by any other means including SNMP.

Pre-established:

   This is the same as the protection switching option. Here, one or more recovery paths are established prior to any failure on the working path. The path selection can either be determined by an administrative centralized tool, or chosen based on some algorithm implemented at the PSL and possibly intermediate nodes. To guard against the situation where the pre-established recovery path fails before or at the same time as the working path, the recovery path should have secondary configuration options as explained in Section 3.3 below.

Pre-Qualified:

   A pre-established path need not be created; it may be pre-qualified. A pre-qualified recovery path is not created expressly for protecting the working path, but instead is a path created for other purposes that is designated as a recovery path after determining that it is an acceptable alternative for carrying the working path traffic. Variants include the case where an optical path or trail is configured, but no switches are set.
Established-on-Demand:

   This is the same as the rerouting option. Here, a recovery path is established after a failure on its working path has been detected and notified to the PSL. The recovery path may be pre-computed or computed on demand, which influences recovery times.
Link Recovery/Restoration

   In this case, the recovery path may be configured to route around a certain link deemed to be unreliable. If protection switching is used, several recovery paths may be configured for one working path, depending on the specific faulty link that each protects against. Alternatively, if rerouting is used, upon the occurrence of a fault on the specified link, each path is rebuilt such that it detours around the faulty link.

   In this case, the recovery path need only be disjoint from its working path at a particular link on the working path, and may have overlapping segments with the working path. Traffic on the working path is switched over to an alternate path at the upstream LSR that connects to the failed link. Link recovery can potentially perform the switchover the fastest, and can be effective in situations where certain path components are much more unreliable than others.

Node Recovery/Restoration

   In this case, the recovery path may be configured to route around a neighbor node deemed to be unreliable. Thus, the recovery path is disjoint from the working path only at a particular node and at the links associated with the working path at that node. Once again, the traffic on the primary path is switched over to the recovery path at the upstream LSR that directly connects to the failed node, and the recovery path shares overlapping portions with the working path.
However, it may, in some cases, be slower than local repair since the fault notification message must now travel to the POR to trigger the recovery action.
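The difference between the disjointness needed for link recovery and for node recovery can be expressed as two small predicates. This is a minimal Python sketch, assuming paths are given as lists of LSR names and links are undirected; the function names are hypothetical. A link recovery path only has to avoid the protected link, while a node recovery path has to avoid the protected node and therefore every link attached to it.

```python
from typing import List, Tuple

Link = Tuple[str, str]


def links_of(path: List[str]) -> List[Link]:
    """Undirected links traversed by a path, each as a sorted name pair."""
    return [tuple(sorted(pair)) for pair in zip(path, path[1:])]


def avoids_link(recovery: List[str], link: Link) -> bool:
    """Link recovery: the recovery path need only avoid the protected link."""
    return tuple(sorted(link)) not in links_of(recovery)


def avoids_node(recovery: List[str], node: str) -> bool:
    """Node recovery: the recovery path must not traverse the protected
    node, which also excludes every link attached to it."""
    return node not in recovery


# Working path A-B-C-D; detours start at the upstream LSR B.
print(avoids_link(["B", "E", "C"], ("B", "C")))   # True: bypasses link B-C
print(avoids_node(["B", "E", "C", "D"], "C"))     # False: still uses node C
print(avoids_node(["B", "E", "D"], "C"))          # True: bypasses node C
```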
Section 3.3.1.

1-to-1 Protection

   In 1-to-1 protection, the working path has a designated recovery path that is only to be used to recover that specific working path.

n-to-1 Protection

   In n-to-1 protection, up to n working paths are protected using only one recovery path. If the intent is to protect against any single fault on any of the working paths, the n working paths should be diversely routed between the same PSL and PML. In some cases, handshaking between PSL and PML may be required to complete the recovery, the details of which are beyond the scope of this document.

n-to-m Protection

   In n-to-m protection, up to n working paths are protected using m recovery paths. Once again, if the intent is to protect against any single fault on any of the n working paths, the n working paths and the m recovery paths should be diversely routed between the same PSL and PML. In some cases, handshaking between PSL and PML may be required to complete the recovery, the details of which are beyond the scope of this document. n-to-m protection is for further study.

Split Path Protection

   In split path protection, multiple recovery paths are allowed to carry the traffic of a working path based on a certain configurable load splitting ratio. This is especially useful when no single recovery path can be found that can carry the entire traffic of the working path in case of a fault. Split path protection may require handshaking between the PSL and the PML(s), and may require the PML(s) to correlate the traffic arriving on multiple recovery paths with the working path. Although this is an attractive option, the details of split path protection are beyond the scope of this document.

RFC2702]. In this case, one LSP (the tunnel) is established between the PSL and PML following an acceptable route, and a number of recovery paths can be supported through the tunnel via label stacking. It is not necessary to apply label stacking when using a bypass tunnel. A bypass tunnel can be used with any of the path mapping options discussed in the previous section.

As with recovery paths, the bypass tunnel may or may not have resource reservations sufficient to provide recovery without service degradation. It is possible that the bypass tunnel may have sufficient resources to recover some number of working paths, but not all at the same time. If the number of recovery paths carrying traffic in the tunnel at any given time is restricted, this is similar to the n-to-1 or n-to-m protection cases mentioned in Section 3.4.2.
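To illustrate a bypass tunnel that has resources for some, but not all, of the working paths it backs up, the following minimal Python sketch admits affected working paths greedily until the tunnel's capacity is used up. The function name, the first-come ordering, and the bandwidth units are assumptions; the framework does not specify how an operator chooses which paths are admitted (a real policy might, for example, use protection priorities).

```python
from typing import Dict, List


def admit_to_bypass(demands: Dict[str, int], tunnel_capacity: int) -> List[str]:
    """Greedily admit affected working paths into a bypass tunnel.

    `demands` maps each working path to the bandwidth it needs (any
    consistent unit). Paths are considered in insertion order, and a path
    is admitted only if its full demand still fits. Paths left out must
    rely on other recovery means, so the tunnel behaves like the n-to-1 or
    n-to-m cases of Section 3.4.2.
    """
    admitted: List[str] = []
    remaining = tunnel_capacity
    for path, bandwidth in demands.items():
        if bandwidth <= remaining:
            admitted.append(path)
            remaining -= bandwidth
    return admitted


print(admit_to_bypass({"lsp1": 40, "lsp2": 50, "lsp3": 30}, tunnel_capacity=100))
# ['lsp1', 'lsp2']
```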
(PPG). When a fault occurs on the working path carrying the PPG, the PPG as a whole can be protected either by being switched to a bypass tunnel or by being switched to a recovery path.

MPLS-PATH]. For either a link probing mechanism or a path continuity test to be effective, the test message must be guaranteed to follow the same route as the working or recovery path over the segment being tested. In addition, the path continuity test must take the path merge points into consideration. In the case of a bi-directional link implemented as two unidirectional links, path failure could mean that either one or both unidirectional links are damaged.

Path Degraded (PD) is a fault that indicates to MPLS-based recovery schemes/mechanisms that the path has connectivity, but that the quality of the connection is unacceptable. This may be detected by a path performance monitoring mechanism, or some other mechanism for determining the error rate on the path or some portion of the path. This is local to the LSR and consists of excessive discarding of packets at an interface, either due to label mismatch or due to TTL errors, for example.

Link Failure (LF) is an indication from a lower layer that the link over which the path is carried has failed. If the lower layer supports detection and reporting of this fault (that is, any fault that indicates link failure, e.g., SONET LOS (Loss of Signal)), this may be used by the MPLS recovery mechanism. In some cases, using LF indications may provide faster fault detection than using only MPLS-based fault detection mechanisms.

Link Degraded (LD) is an indication from a lower layer that the link over which the path is carried is performing below an acceptable level. If the lower layer supports detection and reporting of this fault, it may be used by the MPLS recovery mechanism. In some cases, using LD indications may provide faster fault detection than using only MPLS-based fault detection mechanisms.
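The four fault types above can be illustrated with a small classifier that maps observed conditions to Path Failure, Path Degraded, Link Failure and Link Degraded. This is a minimal Python sketch; the input signals, the error-rate threshold and the precedence rules (a failure supersedes a degradation on the same path or link) are assumptions chosen for illustration, not requirements of this framework.

```python
from enum import Enum
from typing import List


class Fault(Enum):
    PATH_FAILURE = "PF"
    PATH_DEGRADED = "PD"
    LINK_FAILURE = "LF"
    LINK_DEGRADED = "LD"


def classify(lower_layer_los: bool,
             lower_layer_degraded: bool,
             continuity_test_failed: bool,
             error_rate: float,
             max_error_rate: float) -> List[Fault]:
    """Map observed conditions to fault types.

    Lower-layer indications (LF, LD) are reported first because they may be
    available sooner than MPLS-based detection. `max_error_rate` is a
    hypothetical operator-chosen threshold for declaring a degraded path.
    """
    faults: List[Fault] = []
    if lower_layer_los:                  # e.g. SONET LOS reported by the lower layer
        faults.append(Fault.LINK_FAILURE)
    elif lower_layer_degraded:           # degradation is moot once the link has failed
        faults.append(Fault.LINK_DEGRADED)
    if continuity_test_failed:           # path has lost connectivity
        faults.append(Fault.PATH_FAILURE)
    elif error_rate > max_error_rate:    # connected, but quality unacceptable
        faults.append(Fault.PATH_DEGRADED)
    return faults


print(classify(False, True, False, error_rate=0.05, max_error_rate=0.01))
```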
Since the FIS is a control message, it should be transmitted with high priority to ensure that it propagates rapidly towards the affected POR(s). Depending on how fault notification is configured in the LSRs of an MPLS domain, the FIS could be sent either as a Layer 2 or Layer 3 packet [MPLS-PATH]. The use of a Layer 2-based notification requires a Layer 2 path direct to the POR. An example of an FIS could be the liveness message sent by a downstream LSR to its upstream neighbor, with an optional fault notification field set, or the FIS could be implicitly denoted by a teardown message. Alternatively, it could be a separate fault notification packet. The intermediate LSR should identify which of its incoming links to propagate the FIS on.
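To show how an intermediate LSR might identify the incoming link on which to propagate an FIS, the following minimal Python sketch walks each affected LSP upstream until it reaches that LSP's POR. It assumes that every LSR keeps, per LSP, the upstream neighbor from which the LSP arrives. The data layout and the propagate_fis name are hypothetical, and a real LSR would more likely aggregate notifications per incoming link than send one per LSP.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-LSR state: lsr -> {lsp: upstream neighbor, or None at the head end}.
Upstream = Dict[str, Dict[str, Optional[str]]]


def propagate_fis(upstream: Upstream, detecting_lsr: str,
                  affected_lsps: List[str],
                  por: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Relay an FIS hop by hop towards the POR of each affected LSP.

    Returns the (from_lsr, to_lsr, lsp) hops the FIS travels. At every
    hop, the LSR looks up the incoming link of the LSP in its own state,
    which is the link on which it must propagate the FIS.
    """
    hops: List[Tuple[str, str, str]] = []
    for lsp in affected_lsps:
        node = detecting_lsr
        while node != por[lsp]:
            prev = upstream[node][lsp]
            if prev is None:  # head end reached without meeting the POR
                break
            hops.append((node, prev, lsp))
            node = prev
    return hops


# lsp1 runs A-B-C-D, lsp2 runs E-B-C-D; C detects a fault on link C-D.
state = {"A": {"lsp1": None}, "E": {"lsp2": None},
         "B": {"lsp1": "A", "lsp2": "E"}, "C": {"lsp1": "B", "lsp2": "B"}}
for hop in propagate_fis(state, "C", ["lsp1", "lsp2"], {"lsp1": "A", "lsp2": "E"}):
    print(hop)
```

In the example, LSR B receives the FIS for both LSPs on the same link but must forward it on two different incoming links (towards A and towards E).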
- upon failure and after traffic has been moved to the recovery path, the resources associated with the original path remain reserved.
Section 3.8.1, without leaving the recovery path unprotected.
path to a working path, once the working path becomes operational following a fault.

IV. A PSL may be capable of performing either a switch back to the original working path after the fault is corrected or a switchover to a new working path, upon the discovery or establishment of a more optimal working path.

V. The recovery model should take into consideration path merging at intermediate LSRs. If a fault affects the merged segment, all the paths sharing that merged segment should be able to recover. Similarly, if a fault affects a non-merged segment, only the path that is affected by the fault should be recovered.
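The switch-back capability described above, combined with the revertive and non-revertive modes defined earlier, can be sketched as a tiny state holder for one protected flow at a PSL. This is a minimal Python illustration with hypothetical class and method names. The Clear Hold-Off Time and all signaling are deliberately not modeled, and the non-revertive branch shows only one of the configurations the definition allows (the repaired path becomes the new recovery path).

```python
from enum import Enum


class Mode(Enum):
    REVERTIVE = "revertive"
    NON_REVERTIVE = "non-revertive"


class PathSwitchLSR:
    """Toy PSL tracking which of two paths carries one protected flow."""

    def __init__(self, mode: Mode, working: str, recovery: str) -> None:
        self.mode = mode
        self.working = working    # currently designated working path
        self.recovery = recovery  # its protection counterpart
        self.active = working     # path the traffic is flowing on

    def on_fault(self) -> None:
        """Switch-over: move traffic off the failed working path."""
        if self.active == self.working:
            self.active = self.recovery

    def on_fault_cleared(self) -> None:
        """Called once the failed path is fault-free again."""
        if self.active != self.recovery:
            return  # no switch-over is in effect; nothing to do
        if self.mode is Mode.REVERTIVE:
            # Switch-back to the original working path.
            self.active = self.working
        else:
            # Traffic stays put; the repaired path becomes the recovery
            # path for the path that is now carrying the traffic.
            self.working, self.recovery = self.recovery, self.working


psl = PathSwitchLSR(Mode.NON_REVERTIVE, working="P1", recovery="P2")
psl.on_fault()
psl.on_fault_cleared()
print(psl.active, psl.working, psl.recovery)  # P2 P2 P1
```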
Local Repair schemes have a topological correlation that cuts across working paths, and Network Plan approaches have a correlation that impacts the entire network.

Backup Capacity
Recovery schemes may require differing amounts of "backup capacity" in the event of a fault. This capacity will depend on the traffic characteristics of the network. However, it may also depend on the particular protection plan selection algorithms, as well as on the signaling and re-routing methods.

Additive Latency
Recovery schemes may introduce additive latency for traffic. For example, a recovery path may take many more hops than the working path. This may depend on the recovery path selection algorithms.

Quality of Protection
Recovery schemes can be considered to encompass a spectrum of "packet survivability", which may range from "relative" to "absolute". Relative survivability may mean that the packet is on an equal footing with other traffic of, for example, the same diff-serv code point (DSCP) in contending for the resources of the portion of the network that survives the failure. Absolute survivability may mean that the survivability of the protected traffic has explicit guarantees.

Re-ordering
Recovery schemes may introduce re-ordering of packets. The action of putting traffic back on preferred paths may also cause packet re-ordering.

State Overhead
As the number of recovery paths in a protection plan grows, the state required to maintain them also grows. Schemes may require differing numbers of paths to maintain certain levels of coverage. The state required may also depend on the particular scheme used for recovery. The state overhead may be a function of several parameters, for example, the number of recovery paths and the number of protected facilities (links, nodes, or shared risk link groups (SRLGs)).
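Additive latency can be quantified by comparing the hop count and the accumulated link delay of the recovery path with those of the working path. The following is a minimal sketch assuming paths are given as sequences of LSRs and that per-link delays are known; all names and delay values are hypothetical.

```python
from __future__ import annotations


def path_delay(path: list[str], link_delay_ms: dict[tuple[str, str], float]) -> float:
    """Sum the per-link delays along a path given as a sequence of LSRs."""
    return sum(link_delay_ms[(a, b)] for a, b in zip(path, path[1:]))


def additive_latency(working: list[str], recovery: list[str],
                     link_delay_ms: dict[tuple[str, str], float]) -> tuple[int, float]:
    """Return (extra hops, extra delay in ms) of the recovery path over the working path."""
    extra_hops = len(recovery) - len(working)  # hop count = number of LSRs - 1
    extra_delay = path_delay(recovery, link_delay_ms) - path_delay(working, link_delay_ms)
    return extra_hops, extra_delay


delays = {("A", "B"): 1.0, ("B", "C"): 1.0,
          ("A", "D"): 2.0, ("D", "E"): 1.5, ("E", "F"): 1.5, ("F", "C"): 2.0}
print(additive_latency(["A", "B", "C"], ["A", "D", "E", "F", "C"], delays))  # (2, 5.0)
```

Here the recovery path takes two more hops than the working path and adds 5 ms of delay; queuing delay on the surviving links, which also changes after a failure, is not modeled.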
Loss
Recovery schemes may introduce a certain amount of packet loss during switchover to a recovery path. Schemes that introduce loss during recovery can quantify this loss by evaluating the recovery time in proportion to the link speed: the longer the recovery time and the faster the link, the more traffic is lost. In the case of a link or node failure, some packet loss is inevitable.

Coverage
Recovery schemes may offer various types of failover coverage. The total coverage may be defined in terms of several metrics:

I. Fault types: recovery schemes may account for link faults only, for both node and link faults, or also for degraded service. For example, a scheme may require more recovery paths to take node faults into account.

II. Number of concurrent faults: depending on the layout of recovery paths in the protection plan, it may be possible to recover from multiple fault scenarios.

III. Number of recovery paths: for a given fault, there may be one or more recovery paths.

IV. Percentage of coverage: depending on a scheme and its implementation, a certain percentage of faults may be covered. This may be subdivided into percentage of link faults and percentage of node faults.

V. Number of protected paths: the number of protected paths may affect how fast the total set of paths affected by a fault can be recovered. The ratio of protection is n/N, where n is the number of protected paths and N is the total number of paths.
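Two of the quantities above lend themselves to simple arithmetic: the loss incurred during switchover, taken in proportion to recovery time and link speed, and the ratio of protection n/N. The sketch below assumes, as an upper bound, that the failed link was fully loaded for the whole recovery time; the function names and the example values (50 ms, 10 Gb/s, 30 of 40 paths) are hypothetical.

```python
def worst_case_loss_bytes(recovery_time_ms: float, link_speed_bps: float) -> float:
    """Upper bound on traffic lost during switchover.

    Assumes the failed link was fully loaded for the entire recovery time,
    so loss = recovery time x link speed.
    """
    return recovery_time_ms * link_speed_bps / 8000  # ms * bit/s -> bytes


def protection_ratio(n_protected: int, n_total: int) -> float:
    """Ratio of protection n/N."""
    if n_total <= 0:
        raise ValueError("total number of paths must be positive")
    if not 0 <= n_protected <= n_total:
        raise ValueError("n must lie between 0 and N")
    return n_protected / n_total


# A 50 ms recovery on a fully loaded 10 Gb/s link loses up to 62.5 MB.
print(worst_case_loss_bytes(50, 10_000_000_000))  # 62500000.0
print(protection_ratio(30, 40))                    # 0.75
```

Actual loss is usually lower, since links are rarely fully loaded; dividing the byte count by a typical packet size gives an approximate packet count.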
[RFC3031] Rosen, E., Viswanathan, A. and R. Callon, "Multiprotocol Label Switching Architecture", RFC 3031, January 2001.

[RFC2702] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and J. McManus, "Requirements for Traffic Engineering Over MPLS", RFC 2702, September 1999.

[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V. and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP Tunnels", RFC 3209, December 2001.

[RFC3212] Jamoussi, B. (Ed.), Andersson, L., Callon, R., Dantu, R., Wu, L., Doolan, P., Worster, T., Feldman, N., Fredette, A., Girish, M., Gray, E., Heinanen, J., Kilty, T. and A. Malis, "Constraint-Based LSP Setup using LDP", RFC 3212, January 2002.
[MPLS-BACKUP] Vasseur, J. P., Charny, A., LeFaucheur, F. and Achirica, "MPLS Traffic Engineering Fast reroute: backup tunnel path computation for bandwidth protection", Work in Progress.

[MPLS-PATH] Huang, C., Sharma, V., Owens, K. and S. Makam, "Building Reliable MPLS Networks Using a Path Protection Mechanism", IEEE Communications Magazine, Vol. 40, Issue 3, March 2002, pp. 156-162.

[RFC2205] Braden, R., Zhang, L., Berson, S., Herzog, S. and S. Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional Specification", RFC 2205, September 1997.

Section 11, and is not repeated below.)

Ben Mack-Crane
Tellabs Operations, Inc.
1415 West Diehl Road
Naperville, IL 60563
Phone: (630) 798-6197
EMail: Ben.Mack-Crane@tellabs.com

Srinivas Makam
Eshernet, Inc.
1712 Ada Ct.
Naperville, IL 60540
Phone: (630) 308-3213
EMail: Smakam60540@yahoo.com
Ken Owens
Edward Jones Investments
201 Progress Parkway
St. Louis, MO 63146
Phone: (314) 515-3431
EMail: email@example.com

Changcheng Huang
Carleton University
Minto Center, Rm. 3082
1125 Colonel By Drive
Ottawa, Ont. K1S 5B6
Canada
Phone: (613) 520-2600 x2477
EMail: Changcheng.Huang@sce.carleton.ca

Jon Weil

Brad Cain
Storigen Systems
650 Suffolk Street
Lowell, MA 01854
Phone: (978) 323-4454
EMail: firstname.lastname@example.org

Loa Andersson
EMail: email@example.com

Bilel Jamoussi
Nortel Networks
3 Federal Street, BL3-03
Billerica, MA 01821, USA
Phone: (978) 288-4506
EMail: firstname.lastname@example.org
Angela Chiu
AT&T Labs-Research
200 Laurel Ave. Rm A5-1F13
Middletown, NJ 07748
Phone: (732) 420-9061
EMail: email@example.com

Seyhan Civanlar
Lemur Networks, Inc.
135 West 20th Street, 5th Floor
New York, NY 10011
Phone: (212) 367-7676
EMail: firstname.lastname@example.org
Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.