Internet Engineering Task Force (IETF) M. Shand Request for Comments: 5714 S. Bryant Category: Informational Cisco Systems ISSN: 2070-1721 January 2010 IP Fast Reroute Framework
AbstractThis document provides a framework for the development of IP fast- reroute mechanisms that provide protection against link or router failure by invoking locally determined repair paths. Unlike MPLS fast-reroute, the mechanisms are applicable to a network employing conventional IP routing and forwarding. Status of This Memo This document is not an Internet Standards Track specification; it is published for informational purposes. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5714. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Scope and Applicability . . . . . . . . . . . . . . . . . . . 5 4. Problem Analysis . . . . . . . . . . . . . . . . . . . . . . . 5 5. Mechanisms for IP Fast-Reroute . . . . . . . . . . . . . . . . 7 5.1. Mechanisms for Fast Failure Detection . . . . . . . . . . 7 5.2. Mechanisms for Repair Paths . . . . . . . . . . . . . . . 8 5.2.1. Scope of Repair Paths . . . . . . . . . . . . . . . . 9 5.2.2. Analysis of Repair Coverage . . . . . . . . . . . . . 9 5.2.3. Link or Node Repair . . . . . . . . . . . . . . . . . 10 5.2.4. Maintenance of Repair Paths . . . . . . . . . . . . . 10 5.2.5. Local Area Networks . . . . . . . . . . . . . . . . . 11 5.2.6. Multiple Failures and Shared Risk Link Groups . . . . 11 5.3. Mechanisms for Micro-Loop Prevention . . . . . . . . . . . 12 6. Management Considerations . . . . . . . . . . . . . . . . . . 12 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 9. Informative References . . . . . . . . . . . . . . . . . . . . 14
employed by MPLS fast-reroute [RFC4090], but the mechanisms employed for the backup routes in pure IP networks are necessarily very different. This document provides a framework for the development of this approach. Note that in order to further minimize the impact on user applications, it may be necessary to design the network such that backup paths with suitable characteristics (for example, capacity and/or delay) are available for the algorithms to select. Such considerations are outside the scope of this document.
LFA Loop-Free Alternate. A neighbor N, that is not a primary neighbor E, whose shortest path to the destination D does not go back through the router S. The neighbor N must meet the following condition: Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D) Loop-Free Neighbor A neighbor N_i, which is not the particular primary neighbor E_k under discussion, and whose shortest path to D does not traverse S. For example, if there are two primary neighbors E_1 and E_2, E_1 is a loop-free neighbor with regard to E_2, and vice versa. Loop-Free Link-Protecting Alternate A path via a Loop-Free Neighbor N_i that reaches destination D without going through the particular link of S that is being protected. In some cases, the path to D may go through the primary neighbor E. Loop-Free Node-Protecting Alternate A path via a Loop-Free Neighbor N_i that reaches destination D without going through the particular primary neighbor (E) of S that is being protected. N_i The ith neighbor of S. Primary Neighbor A neighbor N_i of S which is one of the next hops for destination D in S's FIB prior to any failure. R_i_j The jth neighbor of N_i. Repair Path The path used by a repairing node to send traffic that it is unable to send via the normal path owing to a failure. Routing Transition The process whereby routers converge on a new topology. In conventional networks, this process frequently causes some disruption to packet delivery.
RPF Reverse Path Forwarding, i.e., checking that a packet is received over the interface that would be used to send packets addressed to the source address of the packet. S Used to denote a router that is the source of a repair that is computed in anticipation of the failure of a neighboring router denoted as E, or of the link between S and E. It is the viewpoint from which IP fast-reroute is described. SPF Shortest Path First, e.g., Dijkstra's algorithm. SPT Shortest path tree Upstream Forwarding Loop A forwarding loop that involves a set of routers, none of which is directly connected to the link that has caused the topology change that triggered a new SPF in any of the routers.
2. The time taken for the local router to react to the failure. This will typically involve generating and flooding new routing updates, perhaps after some hold-down delay, and re-computing the router's FIB. 3. The time taken to pass the information about the failure to other routers in the network. In the absence of routing protocol packet loss, this is typically between 10 milliseconds and 100 milliseconds per hop. 4. The time taken to re-compute the forwarding tables. This is typically a few milliseconds for a link state protocol using Dijkstra's algorithm. 5. The time taken to load the revised forwarding tables into the forwarding hardware. This time is very implementation dependent and also depends on the number of prefixes affected by the failure, but may be several hundred milliseconds. The disruption will last until the routers adjacent to the failure have completed steps 1 and 2, and until all the routers in the network whose paths are affected by the failure have completed the remaining steps. The initial packet loss is caused by the router(s) adjacent to the failure continuing to attempt to transmit packets across the failure until it is detected. This loss is unavoidable, but the detection time can be reduced to a few tens of milliseconds as described in Section 5.1. In some topologies, subsequent packet loss may be caused by the "micro-loops" which may form as a result of temporary inconsistencies between routers' forwarding tables [RFC5715]. These inconsistencies are caused by steps 3, 4, and 5 above, and in many routers it is step 5 that is both the largest factor and that has the greatest variance between routers. The large variance arises from implementation differences and from the differing impact that a failure has on each individual router. For example, the number of prefixes affected by the failure may vary dramatically from one router to another. In order to reduce packet disruption times to a duration commensurate with the failure detection times, two mechanisms may be required: a. A mechanism for the router(s) adjacent to the failure to rapidly invoke a repair path, which is unaffected by any subsequent re- convergence.
b. In topologies that are susceptible to micro-loops, a micro-loop control mechanism may be required [RFC5715]. Performing the first task without the second may result in the repair path being starved of traffic and hence being redundant. Performing the second without the first will result in traffic being discarded by the router(s) adjacent to the failure. Repair paths may always be used in isolation where the failure is short-lived. In this case, the repair paths can be kept in place until the failure is repaired, therefore there is no need to advertise the failure to other routers. Similarly, micro-loop avoidance may be used in isolation to prevent loops arising from pre-planned management action. In which case the link or node being shut down can remain in service for a short time after its removal has been announced into the network, and hence it can function as its own "repair path". Note that micro-loops may also occur when a link or node is restored to service, and thus a micro-loop avoidance mechanism may be required for both link up and link down cases. BFD]. 3. Routing protocol detection; for example, use of "fast Hellos". When configuring packet-based failure detection mechanisms it is important that consideration be given to the likelihood and consequences of false indications of failure. The incidence of false indication of failure may be minimized by appropriately prioritizing the transmission, reception, and processing of the packets used to detect link or node failure. Note that this is not an issue that is specific to IPFRR.
RFC5286]) offer the simplest repair paths and would normally be used when they are available. It is anticipated that around 80% of failures (see Section 5.2.2) can be repaired using these basic methods alone. Multi-hop repair paths are more complex, both in the computations required to determine their existence, and in the mechanisms required to invoke them. They can be further classified as: a. Mechanisms where one or more alternate FIBs are pre-computed in all routers, and the repaired packet is instructed to be forwarded using a "repair FIB" by some method of per-packet signaling such as detecting a "U-turn" [UTURN], [FIFR] or by marking the packet [SIMULA]. b. Mechanisms functionally equivalent to a loose source route that is invoked using the normal FIB. These include tunnels [TUNNELS], alternative shortest paths [ALT-SP], and label-based mechanisms. c. Mechanisms employing special addresses or labels that are installed in the FIBs of all routers with routes pre-computed to avoid certain components of the network. For example, see [NOTVIA].
In many cases, a repair path that reaches two hops away from the router detecting the failure will suffice, and it is anticipated that around 98% of failures (see Section 5.2.2) can be repaired by this method. However, to provide complete repair coverage, some use of longer multi-hop repair paths is generally necessary.
the coverage is less than 100%, it is important for the purposes of comparisons between different proposed repair strategies to define what is meant by such a percentage. There are four possibilities: 1. The percentage of links (or nodes) that can be fully protected (i.e., for all destinations). This is appropriate where the requirement is to protect all traffic, but some percentage of the possible failures may be identified as being un-protectable. 2. The percentage of destinations that can be protected for all link (or node) failures. This is appropriate where the requirement is to protect against all possible failures, but some percentage of destinations may be identified as being un-protectable. 3. For all destinations (d) and for all failures (f), the percentage of the total potential failure cases (d*f) that are protected. This is appropriate where the requirement is an overall "best- effort" protection. 4. The percentage of packets normally passing though the network that will continue to reach their destination. This requires a traffic matrix for the network as part of the analysis. BFD]. This determination may be made prior to invoking any repairs, but this will increase the period of packet loss following a failure unless the determination can be performed as part of the failure detection mechanism itself. Alternatively, a subsequent determination can be used to optimize an already invoked default strategy.
until they are no longer required. This will normally be when the routing protocol has re-converged on the new topology taking into account the failure, and traffic will no longer be using the repair paths. The repair paths have the property that they are unaffected by any topology changes resulting from the failure that caused their instantiation. Therefore, there is no need to re-compute them during the convergence period. They may be affected by an unrelated simultaneous topology change, but such events are out of scope of this work (see Section 5.2.6). Once the routing protocol has re-converged, it is necessary for all repair paths to take account of the new topology. Various optimizations may permit the efficient identification of repair paths that are unaffected by the change, and hence do not require full re- computation. Since the new repair paths will not be required until the next failure occurs, the re-computation may be performed as a background task and be subject to a hold-down, but excessive delay in completing this operation will increase the risk of a new failure occurring before the repair paths are in place.
RFC5715]. The following factors are significant in their classification: 1. Partial or complete protection against micro-loops. 2. Convergence delay. 3. Tolerance of multiple failures (from node failures, and in general). 4. Computational complexity (pre-computed or real time). 5. Applicability to scheduled events. 6. Applicability to link/node reinstatement. 7. Topological constraints.
2. Monitoring and operational support A. Notification of links/nodes/destinations that cannot be protected. B. Notification of pre-computed repair paths, and anticipated traffic patterns. C. Counts of failure detections, protection invocations, and packets forwarded over repair paths. D. Testing repairs.
[ALT-SP] Tian, A., "Fast Reroute using Alternative Shortest Paths", Work in Progress, July 2004. [BFD] Katz, D. and D. Ward, "Bidirectional Forwarding Detection", Work in Progress, January 2010. [FIFR] Nelakuditi, S., Lee, S., Lu, Y., Zhang, Z., and C. Chuah, "Fast Local Rerouting for Handling Transient Link Failures", IEEE/ACM Transactions on Networking, Vol. 15, No. 2, DOI 10.1109/TNET.2007.892851, available from http://www.ieeexplore.ieee.org, April 2007. [NOTVIA] Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute Using Not-via Addresses", Work in Progress, July 2009. [RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute Extensions to RSVP-TE for LSP Tunnels", RFC 4090, May 2005. [RFC5286] Atlas, A. and A. Zinin, "Basic Specification for IP Fast Reroute: Loop-Free Alternates", RFC 5286, September 2008. [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free Convergence", RFC 5715, January 2010. [SIMULA] Kvalbein, A., Hansen, A., Cicic, T., Gjessing, S., and O. Lysne, "Fast IP Network Recovery using Multiple Routing Configurations", Infocom 10.1109/INFOCOM.2006.227, available from http://www.ieeexplore.ieee.org, April 2006. [TUNNELS] Bryant, S., Filsfils, C., Previdi, S., and M. Shand, "IP Fast Reroute using tunnels", Work in Progress, November 2007. [UTURN] Atlas, A., "U-turn Alternates for IP/LDP Fast-Reroute", Work in Progress, February 2006.