Network Working Group                                     D. Meyer, Ed.
Request for Comments: 4984                                L. Zhang, Ed.
Category: Informational                                    K. Fall, Ed.
                                                         September 2007

         Report from the IAB Workshop on Routing and Addressing

Status of This Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Abstract

This document reports the outcome of the Routing and Addressing Workshop that was held by the Internet Architecture Board (IAB) on October 18-19, 2006, in Amsterdam, Netherlands. The primary goal of the workshop was to develop a shared understanding of the problems that large backbone operators are facing regarding the scalability of today's Internet routing system. The key workshop findings include an analysis of the major factors that are driving routing table growth, constraints in router technology, and the limitations of today's Internet addressing architecture. It is hoped that these findings will serve as input to the IETF community and help identify next steps towards effective solutions. Note that this document is a report on the proceedings of the workshop. The views and positions documented in this report are those of the workshop participants and not of the IAB. Furthermore, note that work on issues related to this workshop report is continuing, and this document does not intend to reflect the increased understanding of issues nor to discuss the range of potential solutions that may be the outcome of this ongoing work.
   1. Introduction
   2. Key Findings from the Workshop
      2.1. Problem #1: The Scalability of the Routing System
           2.1.1. Implications of DFZ RIB Growth
           2.1.2. Implications of DFZ FIB Growth
      2.2. Problem #2: The Overloading of IP Address Semantics
      2.3. Other Concerns
      2.4. How Urgent Are These Problems?
   3. Current Stresses on the Routing and Addressing System
      3.1. Major Factors Driving Routing Table Growth
           3.1.1. Avoiding Renumbering
           3.1.2. Multihoming
           3.1.3. Traffic Engineering
      3.2. IPv6 and Its Potential Impact on Routing Table Size
   4. Implications of Moore's Law on the Scaling Problem
      4.1. Moore's Law
           4.1.1. DRAM
           4.1.2. Off-chip SRAM
      4.2. Forwarding Engines
      4.3. Chip Costs
      4.4. Heat and Power
      4.5. Summary
   5. What Is on the Horizon
      5.1. Continual Growth
      5.2. Large Numbers of Mobile Networks
      5.3. Orders of Magnitude Increase in Mobile Edge Devices
   6. What Approaches Have Been Investigated
      6.1. Lessons from MULTI6
      6.2. SHIM6: Pros and Cons
      6.3. GSE/Indirection Solutions: Costs and Benefits
      6.4. Future for Indirection
   7. Problem Statements
      7.1. Problem #1: Routing Scalability
      7.2. Problem #2: The Overloading of IP Address Semantics
           7.2.1. Definition of Locator and Identifier
           7.2.2. Consequence of Locator and Identifier Overloading
           7.2.3. Traffic Engineering and IP Address Semantics Overload
      7.3. Additional Issues
           7.3.1. Routing Convergence
           7.3.2. Misaligned Costs and Benefits
           7.3.3. Other Concerns
      7.4. Problem Recognition
   8. Criteria for Solution Development
      8.1. Criteria on Scalability
      8.2. Criteria on Incentives and Economics
      8.3. Criteria on Timing
      8.4. Consideration on Existing Systems
      8.5. Consideration on Security
      8.6. Other Criteria
      8.7. Understanding the Tradeoff
   9. Workshop Recommendations
   10. Security Considerations
   11. Acknowledgments
   12. Informative References
   Appendix A. Suggestions for Specific Steps
   Appendix B. Workshop Participants
   Appendix C. Workshop Agenda
   Appendix D. Presentations

[DFZ][BGT04]. While it has long been recognized that the existing routing architecture may have serious scalability problems, effective solutions have yet to be identified, developed, and deployed. As a first step towards tackling these long-standing concerns, the IAB held a "Routing and Addressing Workshop" in Amsterdam, Netherlands, on October 18-19, 2006. The main objectives of the workshop were to identify existing and potential factors that have major impacts on routing scalability, and to develop a concise problem statement that may serve as input to a set of follow-on activities. This document reports on the outcome of that workshop.

The remainder of the document is organized as follows: Section 2 provides an executive summary of the workshop findings. Section 3 describes the sources of stress in the current global routing and addressing system. Section 4 discusses the relationship between Moore's law and our ability to build large routers. Section 5 describes a few foreseeable factors that may exacerbate the current problems outlined in Section 2.
Section 6 describes previous work in this area. Section 7 describes the problem statements in more detail, and Section 8 discusses the criteria that constrain the solution space. Finally, Section 9 summarizes the recommendations made by the workshop participants.
The workshop participant list is attached in Appendix B. The agenda can be found in Appendix C, and Appendix D provides pointers to the presentations from the workshop. Finally, note that this document is a report on the outcome of the workshop, not an official document of the IAB. Any opinions expressed are those of the workshop participants and not of the IAB.

[H03]. There have been various hypotheses regarding the sources of this growth. The workshop identified the following factors as the main driving forces behind the rapid growth of the DFZ RIB:

   o  Multihoming,

   o  Traffic engineering,

   o  Non-aggregatable address allocations (a big portion of which is inherited from historical allocations), and

   o  Business events, such as mergers and acquisitions.
All of the above factors can lead to prefix de-aggregation and/or the injection of unaggregatable prefixes into the DFZ RIB. Prefix de-aggregation leads to uncontrolled DFZ RIB growth because, absent some non-topologically based routing technology (for example, Routing On Flat Labels [ROFL] or any name-independent compact routing algorithm, e.g., [CNIR]), topological aggregation is the only known practical approach to controlling the growth of the DFZ RIB. The following section reviews the workshop discussion of the implications of the growth of the DFZ RIB.

[DFZ]. While this has obvious effects on the requirements for RIB and FIB memory sizes, the growth driven by prefix de-aggregation also exposes the core of the network to the dynamic nature of the edges, i.e., de-aggregation leads to an increased number of BGP UPDATE messages injected into the DFZ (frequently referred to as "UPDATE churn"). Consequently, additional processing is required to maintain state for the longer prefixes and to update the FIB. Note that, although the size of the RIB is bounded by the size of the address space and the number of reachable hosts (i.e., O(m*2^32) for IPv4, where <m> is the average number of peers each BGP router may have), the amount of protocol activity required to distribute dynamic topological changes is not. That is, the amount of BGP UPDATE churn that the network can experience is essentially unbounded. It was also noted that the UPDATE churn, as currently measured, is heavy-tailed [ATNAC2006]. That is, a relatively small number of Autonomous Systems (ASs) or prefixes are responsible for a disproportionately large fraction of the UPDATE churn that we observe today. Furthermore, much of the churn may turn out to be unnecessary information, possibly due to instability of edge ASs being injected into the global routing system [DynPrefix], or arbitrage of some bandwidth pricing model (see [GIH], for example, or the discussion of the behavior of AS 9121 in [BGP2005]).
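To make the "heavy-tailed" observation concrete, the following Python sketch measures what share of an UPDATE stream comes from the most active prefixes. It is a minimal illustration: the log format (one prefix string per UPDATE) and every prefix in it are hypothetical, and the sketch does not parse real BGP data.

```python
from collections import Counter

def top_share(update_prefixes, top_fraction):
    """Return the fraction of UPDATEs attributable to the most active
    `top_fraction` of distinct prefixes (at least one prefix is counted)."""
    counts = Counter(update_prefixes)
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical UPDATE log: one entry per UPDATE, keyed by the prefix it carries.
log = (["192.0.2.0/24"] * 90 + ["198.51.100.0/24"] * 4 +
       ["203.0.113.0/24"] * 3 + ["10.1.0.0/16"] * 2 + ["10.2.0.0/16"] * 1)

# 1 of the 5 prefixes (20%) accounts for 90 of the 100 UPDATEs.
print(f"{top_share(log, 0.2):.0%}")  # 90%
```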
Finally, it was noted by the workshop participants that the UPDATE churn situation may be exacerbated by the current Regional Internet Registry (RIR) policy in which end sites are allocated Provider-Independent (PI) addresses. These addresses are not topologically aggregatable, and as such, they bring the churn problem described above into the core routing system. Of course, as noted by several participants, the RIRs have no real choice in this matter, as many enterprises demand PI addresses that allow them to multihome without the "provider lock" that Provider-Allocated (PA) [PIPA] address space creates. Some enterprises also find the renumbering cost associated with PA address assignments unacceptable.
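The difference between aggregatable PA space and PI space can be illustrated with Python's standard `ipaddress` module. In this sketch, `collapse_addresses()` merges prefixes purely by address arithmetic, which stands in for topological aggregation; in practice, aggregation also requires that the merged prefixes share the same routing path. All prefixes below are hypothetical.

```python
import ipaddress

def table_size(prefixes):
    """Number of routing entries left after merging adjacent or covered prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return len(list(ipaddress.collapse_addresses(nets)))

# Hypothetical PA customers: four /24s carved from one provider's 10.0.0.0/22.
pa = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
# Hypothetical PI customers: four /24s with no common topological parent.
pi = ["10.0.0.0/24", "172.16.5.0/24", "192.168.9.0/24", "198.51.100.0/24"]

print(table_size(pa))  # 1: the provider can announce a single /22
print(table_size(pi))  # 4: each PI prefix needs its own DFZ entry
```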
[ML] and our ability to build cost-effective, high-performance routers (see Appendix D). "Moore's Law" is the empirical observation that the transistor density of integrated circuits, with respect to minimum component cost, doubles roughly every 24 months. The commonly held wisdom is that Moore's law will save the day by ensuring that technology continues to scale at historical rates that surpass the growth rate of routing information handled by core router hardware. However, Li pointed out that Moore's Law does not apply to building high-end routers as far as cost is concerned. Moore's Law applies specifically to the high-volume portion of the semiconductor industry, while the low-volume, customized silicon used in core routing is well off Moore's Law's cost curve. In particular, off-chip SRAM is commonly used for storing FIB data, and the driver for low-latency, high-capacity SRAM used to be PC cache memory. However, cache memory has recently been migrating directly onto the processor die, and cell phones are now the primary driver for off-chip SRAM. Given that cell phones require low-power, small-capacity parts that are not applicable to high-end routers, the SRAMs that are favored for router design are not volume parts and do not track with Moore's law.

[GSE], where the address structure was designed specifically to enable "aggressive topological aggregation" to scale the routing system. Noel Chiappa has also written extensively on this topic (see, e.g., [EID]).

There is, however, a difficulty in creating (and maintaining) the kind of congruence envisioned by Rekhter's Law in today's Internet. The difficulty arises from the overloading of addressing with the semantics of both "who" (endpoint identifier, as used by the transport layer) and "where" (locators for the routing system); some might add that IP addresses are also overloaded with "how" [GIH]. In any event, this kind of overloading is felt to have had deep implications for the scalability of the global routing system.

A refinement to Rekhter's Law, then, is that for the Internet routing system to scale, an IP address must be assigned in such a way that it is congruent with the Internet's topology. However, because identifiers are typically assigned based upon organizational (not topological) structure and have stability as a desirable property, a "natural incongruence" arises. As a result, it is difficult (if not impossible) to make a single number space serve both purposes efficiently.

Following the logic of the previous paragraphs, workshop participants concluded that the so-called "locator/identifier overload" of the IP address semantics is one of the causes of the routing scalability problem as we see it today. Thus, a "split" seems necessary to scale the routing system, although how to actually architect and implement such a split was not explored in detail.

In addition to the problems described in Section 2.1 and Section 2.2, the workshop participants also identified the following three pressing, but "second tier", issues. The first one is a general concern with IPv6 deployment. It is commonly believed that the IPv4 address space has put an effective constraint on IPv4 RIB growth. Once this constraint is lifted by the deployment of IPv6, and in the absence of a scalable routing strategy, today's rapid growth of the DFZ RIB could be further exacerbated by IPv6's much larger address space. The only routing paradigm available today for IPv6 is a combination of Classless Inter-Domain Routing (CIDR) [RFC4632] and Provider-Independent (PI) address allocation strategies [PIPA] (and possibly SHIM6 [SHIM6] when that technology is developed and deployed). Thus, the opportunity exists to create a "swamp" (unaggregatable address space) that can be many orders of magnitude larger than what we faced with IPv4.
In short, the advent of IPv6 and its larger address space further underscores both the concerns raised in Section 2.1 and the importance of resolving the architectural issue raised in Section 2.2. The second issue is slow routing convergence. In particular, the concern was that growth in the number of routes that service providers must carry will cause routing convergence to become a significant problem.
The third issue is the misalignment of costs and benefits in today's routing system. While the IETF does not typically consider the "business model" impacts of various technology choices, many participants felt that perhaps the time has come to review that philosophy.

The problems described in Section 2.1 and Section 2.2 need immediate attention. This was not because the participants perceived a looming, well-defined "hit the wall" date, but rather because these are difficult problems that to date have resisted solution, are likely to become more unwieldy as IPv6 deployment proceeds, and because developing and deploying an effective solution will necessarily take at least a few years.

[BGP2005]; this number had reached 200,000 as of October 2006 [CIDRRPT], and is projected to increase to 370,000 or more within 5 years [Fuller]. Some workshop participants projected that the DFZ could reach 2 million entries within 15 years, and there might be as many as 10 million multihomed sites by 2050. Another related concern was the number of prefixes changed, added, and withdrawn as a function of time (i.e., BGP UPDATE churn). This has a detrimental impact on routing convergence, since UPDATEs frequently necessitate a re-computation and download of the FIB. For example, a BGP router may observe up to 500,000 BGP updates in a single day [DynPrefix], with peak arrival rates of over 1,000 updates per second. Such UPDATE churn problems are not limited to DFZ routes; indeed, the number of internal routes carried by large ISPs also threatens convergence times, given that such internal routes include more-specifics, Virtual Private Network (VPN) routes, and other routes that do not appear in the DFZ [ATNAC2006].
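The projections quoted above can be turned into implied annual growth rates, and the UPDATE figures into an average-versus-peak comparison. This is plain arithmetic on the numbers in the text; treating the 2-million-entry projection as starting from the October 2006 figure of 200,000 is an assumption made here for illustration.

```python
def annual_growth(start, end, years):
    """Compound annual growth rate implied by growing from start to end."""
    return (end / start) ** (1 / years) - 1

# DFZ size figures from the text (200,000 entries as of October 2006).
print(f"{annual_growth(200_000, 370_000, 5):.1%}")     # 13.1%
print(f"{annual_growth(200_000, 2_000_000, 15):.1%}")  # 16.6% (assumed 2006 baseline)

# UPDATE load: up to 500,000 updates per day versus peaks over 1,000 per second.
avg_rate = 500_000 / 86_400  # seconds per day
print(f"{avg_rate:.1f} updates/s on average; 1,000/s peak is ~{1000 / avg_rate:.0f}x that")
```

The gap between average and peak rates is the point: a router sized for the daily average would be overwhelmed by the bursts.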
prefixes). In this section, we discuss in more detail why this trend is accelerating and may be cause for concern. An increasing fraction of the more-specific prefixes found in the DFZ is due to deliberate action on the part of operators [ATNAC2006]. Motivations to advertise these more-specifics include:

   o  Traffic Engineering, where load is balanced across multiple links through selective advertisement of more-specific routes on different links to adjust the amount of traffic received on each; and

   o  Attempts to prevent prefix hijacking by other operators who might advertise more-specifics to steer traffic toward them; there are several known instances of this behavior today [BHB06].

[RFC4192], for others, the necessary changes are so difficult as to make renumbering effectively impossible. For these reasons, PI address space is sought by a growing number of customers. Current RIR policy reflects this trend: PI prefixes are allocated to all customers who claim a need. Routing PI prefixes requires additional entries in the DFZ routing and forwarding tables. At present, ISPs do not typically charge to route PI prefixes. Therefore, the "costs" of the additional prefixes, in terms of routing table entries and processing overhead, are borne by the global routing system as a whole, rather than directly by the users of PI space. The workshop participants observed that no strong disincentive exists to discourage the increasing use of PI address space.
[RFC4116]. There are several reasons for the observed increase in multihoming, including the increased reliance on the Internet for mission- and business-critical applications and the general decrease in the cost of obtaining Internet connectivity. Multihoming provides backup routing, that is, redundancy of Internet connections; in some circumstances, multihoming is mandatory due to contract or law. Multihoming can be accomplished using either PI or PA address space, and multihomed sites generally have their own AS numbers (although some do not; this generally occurs when such customers are statically routed). A multihomed site using PI address space has its prefixes present in the forwarding and routing tables of each of its providers. For PA space, each prefix allocated from one provider's address allocation will be aggregatable for that provider but not for the others. If the addresses are allocated from a 'primary' ISP (i.e., one that the site uses for routing unless a failure occurs), then the additional routing table entries only appear during path failures to that primary ISP.

A problem with multihoming arises when a customer's PA IP prefixes are advertised by AS(es) other than their 'primary' ISP's. Because of the longest-matching-prefix forwarding rule, in this case the customer's traffic will be directed through the non-primary AS(es). In response, the primary ISP is forced to de-aggregate the customer's prefix in order to keep the customer's traffic flowing through it instead of the non-primary AS(es).
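The longest-match effect described above can be sketched in a few lines of Python. This is a simplified model, not a BGP implementation: each route is just a (prefix, origin) pair, the AS numbers come from the documentation range, and the addresses are hypothetical.

```python
import ipaddress

def best_route(routes, dst):
    """Longest-prefix match over (prefix, origin) pairs; None if nothing covers dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(prefix, origin) for prefix, origin in routes if addr in prefix]
    if not matches:
        return None
    # max() keeps the first entry among equally long prefixes.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

net = ipaddress.ip_network
# Hypothetical: AS64500 is the customer's primary ISP and owns the PA block
# 10.0.0.0/16; the customer's 10.0.42.0/24 is also announced via AS64501.
routes = [
    (net("10.0.0.0/16"), "AS64500"),   # primary ISP's aggregate
    (net("10.0.42.0/24"), "AS64501"),  # customer /24 via the non-primary ISP
]
print(best_route(routes, "10.0.42.7"))  # AS64501: the more-specific wins
print(best_route(routes, "10.0.7.1"))   # AS64500: only the aggregate covers it

# De-aggregation: the primary ISP now also announces the customer's /24 so it
# competes at equal prefix length; the DFZ carries one more entry than before.
routes.append((net("10.0.42.0/24"), "AS64500"))
print(len(routes))  # 3
```

Once both /24 announcements exist, which one wins is decided by BGP's other path-selection rules, which this sketch deliberately does not model.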
   o  Finally, TE is sometimes deployed to enforce certain forms of policy (e.g., Canadian government traffic may not be permitted to transit through the United States).

Few tools exist for inter-domain traffic engineering today. Network operators usually achieve traffic engineering by "tweaking" the processing of routing protocols to achieve desired results. At the BGP level, if the address range requiring TE is a portion of a larger PA address aggregate, network operators implementing TE are forced to de-aggregate otherwise aggregatable prefixes in order to steer the traffic of the particular address range to specific paths. In today's highly competitive environment, providers require TE to maintain good performance and low cost in their networks. However, the current practice of TE deployment results in an increase of the DFZ RIB; although individual operators may gain from doing TE, it leads to an overall increased cost for the Internet routing infrastructure as a whole.

[ARIN] has relaxed its policy for allocation of such space and has been allocating /48 prefixes when customers request PI prefixes. Thus, the same pressures affecting IPv4 address allocations also affect IPv6 allocations.

Appendix D. It is worth noting that this information has generated quite a bit of discussion since the workshop, and as such requires further community input.] The workshop heard from Tony Li about the relationship between Moore's law and the ability to build cost-effective, high-performance routers. The scalability of the current routing subsystem manifests itself in the forwarding table (FIB) and routing table (RIB) of the routers in the core of the Internet. The implementation choices for FIB storage are on-chip SRAM, off-chip SRAM, or DRAM. DRAM is commonly used in lower-end devices. RIB storage is done via DRAM.
[Editor's note: The exact implementation of a high-performance router's RIB and FIB memories is the subject of much debate; it is also possible that alternative designs may appear in the future.] The scalability question then becomes whether these memory technologies can scale faster than the size of the full routing table. Intrinsic in this statement is the assumption that core routers will be continually and indefinitely upgraded on a periodic basis to keep up with the technology curve, and that the costs of those upgrades will be passed along to the general Internet community.

[ML]. The semiconductor industry has been following this density trend for the last 40 or so years. The commonly held wisdom is that Moore's law will save the day by ensuring that technology continues to scale at a historical rate that surpasses the growth rate of routing information. However, it is vital to understand that Moore's law comes out of the high-volume portion of the semiconductor industry, where the costs of silicon are dominated by the actual fabrication costs. The customized silicon used in core routers is produced in far lower volume, typically 1,000-10,000 parts per year, whereas microprocessors are produced in the tens of millions per year. This places router silicon well off the cost curve: it does not directly inherit the economies of scale, nor the yield improvements that come from the best current practices. Thus, router silicon benefits from the technological advances made in semiconductors, but does not follow Moore's law from a cost perspective. To date, this cost difference has not been clearly visible. However, the growth in bandwidth of the Internet and the steady climb of the speed of individual links have forced router manufacturers to apply more sophisticated silicon technology continuously.
Successive generations of router hardware have grown at about 4x the bandwidth every three years, and increases in routing table size have been absorbed by these new generations of hardware. Now that router hardware is nearing the practical limits of per-lambda bandwidth, it is possible that upgrades made solely to keep up with forwarding table scaling will become more visible.
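To put the two growth figures quoted in this section on a common footing, they can be converted into annual factors. The arithmetic below shows that the router bandwidth trend (about 4x every three years) is steeper than the density trend of Moore's law (2x every 24 months).

```python
def annual_factor(multiple, period_years):
    """Convert 'grows by `multiple` every `period_years`' into a per-year factor."""
    return multiple ** (1 / period_years)

moore = annual_factor(2, 2)      # transistor density: 2x every 24 months
bandwidth = annual_factor(4, 3)  # router bandwidth: about 4x every three years
print(f"Moore's law: {moore - 1:.1%}/yr, router bandwidth: {bandwidth - 1:.1%}/yr")
# Moore's law: 41.4%/yr, router bandwidth: 58.7%/yr
```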
[DRAM] [Molinero]. This is an issue because BGP convergence time is limited by DRAM access speeds. In processing a BGP update, a BGP speaker receives a path and must compare it to all of the other paths it has stored for the prefix. It then iterates over all of the prefixes in the update stream. This results in a memory access pattern that has proven to limit the effectiveness of processor caching. As a result, BGP convergence time degrades at the routing table growth rate, divided by the speed improvement rate of DRAM. In the long run, this is likely to become a significant issue.
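The relationship stated above (convergence time degrades at the table growth rate divided by the DRAM speed improvement rate) compounds year over year. The Python sketch below uses illustrative yearly factors only: the table growth figure roughly matches the 200,000-to-370,000-in-5-years projection, while the DRAM speed-up is a hypothetical placeholder, not a measured value.

```python
def relative_convergence_time(table_growth, dram_speedup, years):
    """Convergence time after `years`, relative to today, under the model
    that time scales with table size and inversely with DRAM speed."""
    return (table_growth / dram_speedup) ** years

TABLE_GROWTH = 1.13  # assumed: roughly the projected DFZ growth per year
DRAM_SPEEDUP = 1.07  # hypothetical DRAM access-speed improvement per year

for years in (5, 10, 15):
    factor = relative_convergence_time(TABLE_GROWTH, DRAM_SPEEDUP, years)
    print(f"after {years:2d} years: {factor:.2f}x today's convergence time")
```

Whenever the table grows faster than DRAM speeds up, the ratio exceeds 1 and the degradation is exponential in time, which is why the text expects it to become significant in the long run.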
alternative. If this choice is selected, then growth in the available FIB is tightly coupled to process technology improvements, which are driven by the general-purpose CPU market. While this growth rate should generally suffice, the forwarding engine market is decidedly off the high-volume price curve, resulting in spiraling costs to support basic forwarding. Moreover, if Moore's law changes or the rate of processor technology evolution slows, the forwarding engine could quickly end up driving the leading edge of silicon technology. This would rapidly make forwarding technology prohibitively expensive.
A key metric for system evaluation is now forwarding bandwidth per watt, measured in (Mb/s)/W. About 60% of the power goes to the forwarding engine circuits, with the rest divided among the memories, route processors, and interconnect. Using parallelization to achieve higher bandwidths can aggravate the situation, due to increased power and cooling demands. [Editor's note: Many in the community have commented that heat, power consumption, and the attendant heat dissipation, along with size limitations of fabrication processes for high-speed parallel I/O interfaces, are the current limiting factors.]

[CIDRRPT], with bursts that even exceed Moore's law, the trend is for the costs of technology refresh to continue to grow indefinitely, even in constant dollars.
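As a worked example of the (Mb/s)/W metric discussed in this section, the sketch below evaluates a hypothetical line card. The 40 Gb/s and 400 W figures are invented for illustration; only the roughly 60% share of power going to the forwarding engine circuits comes from the text.

```python
def mbps_per_watt(forwarding_gbps, watts):
    """Forwarding bandwidth per watt, in (Mb/s)/W."""
    return forwarding_gbps * 1000 / watts

# Hypothetical line card: 40 Gb/s of forwarding capacity drawing 400 W.
total_watts = 400
print(mbps_per_watt(40, total_watts), "(Mb/s)/W")  # 100.0 (Mb/s)/W

# Per the text, about 60% of the power goes to the forwarding engine circuits.
forwarding_watts = 0.60 * total_watts
print(f"{forwarding_watts:.0f} W to forwarding engines")  # 240 W to forwarding engines
```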
[CIDRRPT], several thousand mobile networks, each represented by a single prefix announcement, may not necessarily raise serious routing scalability or stability concerns. However, there is an open question as to whether this number could become substantially larger if other types of mobile networks, such as networks on trains or ships, come into play. If such mobile networks become commonplace, then their impact on the global routing system needs to be assessed.

[RFC3775]), handle the mobility by one level of indirection through home agents; mobile hosts do not appear any different, from a routing perspective, from stationary hosts. If we follow the same approach, new mobile devices should not present challenges beyond the increase in the size of the host population. The workshop participants recognized that the increase in the number of mobile devices can be significant, and that if a scalable routing system supporting generic identity-locator separation were developed and introduced, billions of mobile gadgets could be supported without undue impact on global routing scalability and stability. Further investigation is needed to gain a complete understanding of the implications on the global routing system of connecting many new mobile hand-held devices (including mobile sensor networks) to the Internet.
[IDR-REQS]. To benefit from the insights obtained from these past results, the workshop reviewed several major previous and ongoing IETF efforts:

   1. The MULTI6 working group's exploration of the solution space and the lessons learned,

   2. The solution to multihoming being developed by the SHIM6 Working Group, and its pros and cons,

   3. The GSE proposal made by O'Dell in 1997, and its pros and cons, and

   4. Map-and-Encap [RFC1955], a general indirection-based solution to scalable multihoming support.
the ability to control the traffic flow of the entire site. Conversely, handling multiple addresses by individual hosts offers each host the flexibility to choose different policies for selecting a provider; it also implies changes to all the hosts of a multihomed site. During the process of evaluating all the proposals, two major lessons were learned:

   o  Changing anything in the current practice is hard: for example, inserting an additional header into the protocol would impact IP fragmentation processing, and current congestion control assumes that each TCP connection follows a single routing path. In addition, operators ask for the ability to perform traffic engineering on a per-site basis, and specification of site policy is often interdependent with the IP address structure.

   o  The IP address has been used as an identifier and has been codified into many Internet applications that manipulate IP addresses directly or include IP addresses within the application-layer data stream. IP addresses have also been used as identifiers in configuring network policies. Changing the semantics of an IP address, for example, using only the last 64 bits as identifiers as proposed by GSE, would require changes to all such applications and network devices.
prefixes, one from each of its multiple providers, to facilitate provider-based prefix aggregation. However, this gain comes with several significant costs. First, SHIM6 requires modifications to all host stack implementations to support the shim processing. Second, the shim layer must maintain the mapping between the identifier and the multiple locators returned from IPv6 AAAA name resolution, and must take responsibility for trying multiple locators if failures occur during end-to-end communication. At this time, the host has little information to determine the order in which it should try the locators of a multihomed destination; however, there is ongoing effort to address this issue. Furthermore, as a host-based approach, SHIM6 gives the service provider little control for effective traffic engineering. At the same time, it also imposes additional state on the host regarding the multiple locators of the remote end. Such state may not be a significant issue for individual user hosts, but it can lead to larger resource demands on large application servers that handle hundreds of thousands of simultaneous TCP connections.

Yet another major issue with the SHIM6 solution is the need for renumbering when a site changes providers. Although a multihomed site is assigned multiple address blocks, none of them can be treated as a persistent identifier for the site. When the site changes one of its providers, it must purge the address block of that provider from the entire site. The current practice of using the IP address as both an identifier and a locator has been strengthened by the use of IP addresses in access control lists present in various types of policy-enforcement devices (e.g., firewalls). If SHIM6's ULIDs are to be used for policy enforcement, a change of providers may necessitate the re-configuration of many such devices.

[GSE] and indirection approaches, such as Map-and-Encap [RFC1955], in general.
The GSE proposal changes the IPv6 address structure to bear the semantics of both an identifier and a locator. The first n bytes of the 16-byte IPv6 address are called the Routing Goop (RG), and are used by the routing system exclusively as a locator. The last 8 bytes of the IPv6 address specify an interface on an end-system. The middle (16 - n - 8) bytes are used to identify site-local topology. The border routers of a site rewrite the source RG of each outgoing packet to make the source address part of the source provider's address aggregate; they also rewrite the destination RG of each incoming packet to hide the site's RG from all the internal routers and hosts. Although GSE designates the lower 8 bytes of the IPv6 address as identifiers, the extent to which GSE could be made compatible with increasingly popular cryptographically generated addresses (CGA) remains to be determined [dGSE].

All identifier/locator split proposals require a mapping service that can return a set of locators corresponding to a given identifier. In addition, these proposals must also address the problem of detecting locator failures and redirecting data flows to the remaining locators of a multihomed site. The Map-and-Encap proposal did not address these issues. GSE proposed to use DNS for providing the mapping service, but it did not offer an effective means for locator failure recovery. GSE also requires host stack modifications, as the upper layers and applications are only allowed to use the lower 8 bytes, rather than the entire IPv6 address.

[RFC1955] represents a more general form of this indirection solution, which uses tunneling, instead of locator rewriting, to cross the DFZ and support provider-based prefix aggregation. This class of solutions avoids the provider and customer conflicts regarding PA and PI prefixes by putting each in a separate name space, so that ISPs can use topologically aggregatable addresses while customers can have their globally unique and provider-independent identifiers. Thus, it supports scalable multihoming, and requires no changes to the end systems when the encapsulation is performed by the border routers of a site. It also requires no changes to current practices in either applications or backbone operations. However, the gains of any effective solution come with associated costs. As stated earlier in this section, a mapping service must be provided. This mapping service not only brings with it the associated complexity and cost, but it also adds another point of failure and could also be a potential target for malicious attacks.
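The GSE address layout described above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration of the byte-level split and the border-router RG rewrite as summarized in this section, not the GSE proposal's own specification; the RG length n = 6 and all addresses are hypothetical choices.

```python
import ipaddress

IFACE_BYTES = 8  # last 8 bytes: interface on an end-system

def gse_split(addr, n):
    """Split an IPv6 address into (RG, site topology, interface) byte strings,
    using the first n bytes as the Routing Goop."""
    if not 0 < n <= 16 - IFACE_BYTES:
        raise ValueError("RG length must leave room for the 8-byte interface part")
    b = ipaddress.IPv6Address(addr).packed
    return b[:n], b[n:16 - IFACE_BYTES], b[16 - IFACE_BYTES:]

def rewrite_rg(addr, new_rg):
    """Border-router step: replace the leading RG bytes, keep everything else."""
    b = ipaddress.IPv6Address(addr).packed
    return ipaddress.IPv6Address(new_rg + b[len(new_rg):])

N = 6                                 # hypothetical RG length
inside = "2001:db8:0:1::abcd"         # hypothetical address as used within the site
provider_rg = ipaddress.IPv6Address("2001:db8:ffff::").packed[:N]

rg, site, iface = gse_split(inside, N)
outside = rewrite_rg(inside, provider_rg)  # outbound source-RG rewrite
print(outside)                             # 2001:db8:ffff:1::abcd
print(gse_split(outside, N)[2] == iface)   # True: the identifier part is unchanged
```

The inbound direction is the mirror image: the border router replaces the destination RG with the site's internal RG before forwarding the packet inside.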
Any solution to routing scalability is necessarily a cost/benefit tradeoff. Given its high potential gains, this indirection approach deserves special attention in our search for scalable routing solutions.
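The mapping-service requirement discussed above (return a set of locators for an identifier, detect locator failures, and redirect flows to a remaining locator) can be sketched in Python as a toy map-and-encap step performed by a site border router. Everything here is hypothetical: the MappingService class, the reachability callback, and the addresses are illustrative, and a real design would also have to distribute, secure, and cache the mapping state.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str        # identifier of the sending host
    dst: str        # identifier of the receiving host
    payload: bytes


@dataclass(frozen=True)
class Tunneled:
    outer_src: str  # locator of the encapsulating border router
    outer_dst: str  # locator chosen for the destination site
    inner: Packet   # original packet, carried unmodified across the DFZ


class MappingService:
    """Hypothetical identifier-prefix -> locator-list table (not a real API)."""

    def __init__(self):
        self._table = {}

    def register(self, id_prefix, locators):
        self._table[ipaddress.ip_network(id_prefix)] = list(locators)

    def lookup(self, identifier):
        """Return the locators of the first registered prefix covering identifier."""
        addr = ipaddress.ip_address(identifier)
        for prefix, locators in self._table.items():
            if addr in prefix:
                return locators
        raise LookupError(f"no mapping for {identifier}")


def encapsulate(pkt, local_locator, mapping, is_reachable):
    """Tunnel pkt to the first reachable locator of its destination site."""
    for locator in mapping.lookup(pkt.dst):
        if is_reachable(locator):
            return Tunneled(local_locator, locator, pkt)
    raise LookupError(f"all locators for {pkt.dst} have failed")


ms = MappingService()
ms.register("2001:db8:100::/48", ["192.0.2.1", "198.51.100.1"])
pkt = Packet("2001:db8:200::5", "2001:db8:100::7", b"hello")
down = {"192.0.2.1"}  # pretend the first locator has failed
t = encapsulate(pkt, "203.0.113.9", ms, lambda loc: loc not in down)
print(t.outer_dst)  # 198.51.100.1
```

The identifiers inside the packet never change; only the outer header depends on the mapping lookup and on which locators are currently believed to be reachable, which is also why the mapping service becomes an additional point of failure.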
[DFZ]. Given that the IPv6 routing architecture is the same as the IPv4 architecture (with a substantially larger address space), if and when IPv6 becomes widely deployed, it is natural to predict that IPv6 routing table growth will only exacerbate the situation.

The increasing deployment of Virtual Private Network/Virtual Routing and Forwarding (VPN/VRF) is considered another major factor driving routing system growth. However, views differ on whether this factor has a direct impact on the DFZ RIB. A common practice is to assign specific routers to handle VPN connections, so backbone routers do not necessarily hold state for individual VPNs. Nevertheless, VPNs do present scalability challenges in network operations.

Section 3, multihoming and traffic engineering appear to be the major factors driving the growth of the DFZ RIB. Below, we elaborate on their impact on the DFZ RIB.
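As a rough illustration of why multihoming adds DFZ entries, the following Python sketch counts the routes a single provider would inject into the DFZ under a simplified model: single-homed customers' PA blocks are hidden inside the provider's aggregate, while a multihomed customer's block must also be announced as a more-specific route, since another provider announces it too and longest-prefix matching would otherwise draw all of that customer's traffic toward the other provider. The function name and prefixes (documentation addresses) are hypothetical, and the model ignores PI space, route filtering, and TE-driven deaggregation.

```python
import ipaddress


def dfz_routes(provider_aggregate, customer_blocks, multihomed):
    """Toy model of the routes one provider injects into the DFZ.

    Single-homed customers' PA blocks are covered by the provider aggregate.
    A multihomed customer's block is also announced via another provider,
    so it must appear here as a separate, more-specific route.
    """
    agg = ipaddress.ip_network(provider_aggregate)
    mh = {ipaddress.ip_network(b) for b in multihomed}
    routes = {agg}
    for block in map(ipaddress.ip_network, customer_blocks):
        if not block.subnet_of(agg):
            raise ValueError(f"{block} is not inside {agg}")
        if block in mh:
            routes.add(block)
    return sorted(routes)


customers = ["198.51.100.0/26", "198.51.100.64/26", "198.51.100.128/26"]
print(len(dfz_routes("198.51.100.0/24", customers, multihomed=[])))  # 1
print(len(dfz_routes("198.51.100.0/24", customers,
                     multihomed=["198.51.100.64/26"])))  # 2
```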
[PathExp], and convergence delay is largely determined by the minimum route advertisement interval (MRAI) timer [RFC4098], except in cases where a route is withdrawn. Route withdrawals tend to suffer from path exploration and hence slow convergence; one participant's experience suggests that withdrawal delays often last up to a couple of minutes. One may argue that if the destination has become unreachable, a long convergence delay does no further damage to applications. However, there are often cases where a more specific route (a longer prefix) has failed, yet the destination can still be reached through an aggregate route (a shorter prefix). In these cases, the long convergence delay does impact application performance.

While IGPs are designed to converge, and do converge, more quickly than BGP, the workshop participants were concerned that, in addition to the various special-purpose routes that IGPs must carry, the rapid growth of the DFZ RIB can effectively slow down IGP convergence. The IGP convergence delay can be due to multiple factors, including:

1. Delays in detecting physical failures,
2. The delay in loading updated information into the FIB, and
3. The large size of the internal RIB, often twice as big as the DFZ RIB, which can lead to both longer route computation time and longer FIB loading time.

The workshop participants held different views regarding (1) the severity of the routing convergence problem and (2) whether it is an architectural problem or an implementation issue. However, there was general agreement that solving the routing scalability problem would certainly help reduce convergence delay, or make the problem much easier to handle, because fewer routes would need to be processed.
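The case in which a more specific route fails while an aggregate still covers the destination follows from longest-prefix matching. In this hypothetical Python sketch (the table contents and peer names are assumptions), traffic keeps following the stale /25 until its withdrawal has converged, even though the /24 aggregate could have carried it.

```python
import ipaddress


def longest_match(rib, dst):
    """Return (prefix, next_hop) of the longest prefix in rib covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in rib.items():
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, next_hop)
    return best


rib = {
    ipaddress.ip_network("192.0.2.0/24"): "peer-A",    # aggregate route
    ipaddress.ip_network("192.0.2.128/25"): "peer-B",  # more specific route
}
print(longest_match(rib, "192.0.2.200")[1])  # peer-B

# Only after the withdrawal of the /25 has propagated does traffic fall
# back to the aggregate; until then it follows the failed more specific.
del rib[ipaddress.ip_network("192.0.2.128/25")]
print(longest_match(rib, "192.0.2.200")[1])  # peer-A
```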
routing system. These discussions were covered in Section 5 of this report. Routing security is another issue that was raised a number of times during the workshop. The consensus of the workshop participants was that, however important routing security may be, it was out of scope for this workshop, whose main goal was to produce a problem statement about addressing and routing scalability. Participants duly recognized that security must be one of the top design goals once we reach the solution development stage. It was also noted that if we continue to allow the routing table to grow indefinitely, it may become impossible to add security enhancements in the future.

Section 4, as well as pressure for shorter depreciation cycles, which in turn also translates to cost increases.

RFC 2547 VPN [RFC2547] deployment, the solution must enable the routing system to scale gracefully, as measured by the number of

o DFZ Internet routes, and
o Internal routes.

In addition, scalable support for traffic engineering (TE) must be considered a business necessity, not an option. Capacity planning involves placing circuits based on traffic demand over a relatively long time scale, while TE must act on a much shorter time scale to match the traffic load to the existing capacity and to meet routing policy requirements. It was recognized that different parties in the Internet may have different TE requirements. For example:

o End site TE: based on locally determined performance or cost policies, end sites may wish to control the volume of traffic exiting to, or entering from, specific providers.

o Small ISP to transit ISP TE: operators may face tight resource constraints and wish to influence the volume of incoming traffic from both customers and providers along specific routing paths to make the best use of limited resources.

o Large ISP TE: given the densely connected nature of the Internet topology, a given destination can normally be reached through different routing paths. An operator may wish to adjust the volume of traffic sent to each of its peers based on its business relationships with neighboring ASes.

At this time, it remains an open issue whether a scalable TE solution must necessarily be part of the routing protocol, or can be accomplished through means external to the routing system.
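One way an end site might apply a locally determined policy to its outbound traffic is to split flows across its providers in proportion to configured weights. The Python sketch below only illustrates that idea and is not a mechanism discussed at the workshop: the function name, provider names, weights, and flow-key format are hypothetical. Hashing each flow key, rather than choosing per packet, keeps all packets of a flow on one exit. Controlling inbound traffic is harder, since it depends on route selection by remote networks.

```python
import hashlib


def pick_exit(flow_key, weights):
    """Map a flow to an exit provider with probability proportional to weight.

    weights: provider name -> non-negative integer weight (hypothetical
    local policy). The same flow_key always maps to the same provider.
    """
    if any(w < 0 for w in weights.values()) or sum(weights.values()) == 0:
        raise ValueError("weights must be non-negative with a positive sum")
    digest = hashlib.sha256(flow_key.encode()).digest()
    point = int.from_bytes(digest[:8], "big") % sum(weights.values())
    for provider, weight in sorted(weights.items()):
        if point < weight:
            return provider
        point -= weight
    raise AssertionError("unreachable: point is always below the total weight")


policy = {"isp-a": 3, "isp-b": 1}  # aim to send roughly 3/4 of flows via isp-a
flows = [f"10.0.0.{i % 250}:{1024 + i}->203.0.113.5:443" for i in range(4000)]
share_a = sum(pick_exit(f, policy) == "isp-a" for f in flows) / len(flows)
print(f"share via isp-a: {share_a:.2f}")
```

Note that weighting flows rather than bytes only approximates a traffic-volume target, since individual flows can differ greatly in size.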
the new solution, there should be measurable benefits to balance the costs. Independent of what kind of solution the IETF develops, if any, it is unlikely that the resulting routing system would stay constant in size. Instead, the workshop participants believed that the routing system will continue to grow and that ISPs will continue to go through system and hardware upgrade cycles. Many attendees expressed the desire that the system's scaling properties allow hardware to keep up with Internet growth at costs comparable to today's, for example, allowing operators to keep a 5-year hardware depreciation cycle, as opposed to a situation where scaling leads to accelerated cost increases.
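The depreciation-cycle goal can be turned into simple arithmetic. Assuming, for illustration only, that the routing table grows by a constant fraction g each year, a router bought today with an N-route table must hold N(1 + g)^5 routes by the end of a 5-year cycle. The Python sketch below computes this; the function names, growth rate, and table sizes are hypothetical, not workshop data.

```python
def required_capacity(current_routes, annual_growth, years=5):
    """Table size needed at the end of a depreciation cycle, assuming the
    table grows by the fraction annual_growth every year (compounded)."""
    return current_routes * (1 + annual_growth) ** years


def lasts_full_cycle(fib_capacity, current_routes, annual_growth, years=5):
    """True if a FIB sized at fib_capacity still fits the projected table."""
    return fib_capacity >= required_capacity(current_routes, annual_growth, years)


# Hypothetical numbers: 200,000 routes today, 15% growth per year.
print(round(required_capacity(200_000, 0.15)))   # 402271
print(lasts_full_cycle(400_000, 200_000, 0.15))  # False
print(lasts_full_cycle(500_000, 200_000, 0.15))  # True
```

Under these assumptions, a FIB sized at twice today's table falls just short of a 5-year cycle; faster growth either shortens the cycle or forces larger, costlier hardware up front, which is the kind of accelerated cost increase the participants wanted to avoid.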
that. This approach enables us to fully understand the tradeoffs, and what gains, if any, we may achieve by relaxing the backward-compatibility concerns. As a rule of thumb, a new design has a higher chance of successful deployment if it makes fewer changes to the existing system.
presented in Section 6, where we examined the gains and costs of a few different approaches to scalable multihoming support (SHIM6, GSE, and a general tunneling approach). A major task in solution development is to understand who may have to give up what, and whether that makes a worthwhile tradeoff.

Before ending this discussion of the solution criteria, it is worth mentioning the shortest presentation at the workshop, made by Tony Li (the presentation slides can be found in Appendix D). He asked a fundamental question: what is at stake? The answer: the Internet itself. If the routing system does not scale with the continued growth of the Internet, costs might eventually spiral out of control, the digital divide might widen, and Internet growth might slow down, stop, or retreat. Compared to this problem, he considered none of the criteria mentioned so far (except solving the problem itself) important enough to block the development and deployment of an effective solution.
should lead the investigation into both how to make this architectural change and the overall impact of the change.

Fourth, given the goal of developing a long-term solution, and the fact that development and deployment cycles will necessarily take some time, it may be helpful (or even necessary) to buy some time by engineering feasible short- or intermediate-term solutions (e.g., FIB compression).

Fifth, the workshop participants believe the next step is to develop a roadmap from here to solution deployment. The IAB and IESG are expected to take on the leadership role in developing this roadmap and to leverage the momentum from this successful workshop to move forward quickly. The roadmap should provide clearly defined short-, medium-, and long-term objectives to guide the solution development process, so that the community as a whole can proceed in an orchestrated way, knowing exactly where we are going while engineering necessary short-term fixes.

Finally, the workshop participants also made a number of suggestions that the IETF might consider when examining the solution space. These suggestions are captured in Appendix A.

[RFC1955] Hinden, R., "New Scheme for Internet Routing and Addressing (ENCAPS) for IPNG", RFC 1955, June 1996.
[RFC2547] Rosen, E. and Y. Rekhter, "BGP/MPLS VPNs", RFC 2547, March 1999.
[RFC3775] Johnson, D., Perkins, C., and J. Arkko, "Mobility Support in IPv6", RFC 3775, June 2004.
[RFC4098] Berkowitz, H., Davies, E., Hares, S., Krishnaswamy, P., and M. Lepp, "Terminology for Benchmarking BGP Device Convergence in the Control Plane", RFC 4098, June 2005.
[RFC4116] Abley, J., Lindqvist, K., Davies, E., Black, B., and V. Gill, "IPv4 Multihoming Practices and Limitations", RFC 4116, July 2005.
[RFC4192] Baker, F., Lear, E., and R. Droms, "Procedures for Renumbering an IPv6 Network without a Flag Day", RFC 4192, September 2005.
[RFC4632] Fuller, V. and T. Li, "Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan", BCP 122, RFC 4632, August 2006.
[IDR-REQS] Doria, A. and E. Davies, "Analysis of IDR requirements and History", Work in Progress, February 2007.
[ARIN] "American Registry for Internet Numbers", http://www.arin.net/index.shtml.
[PIPA] Karrenberg, D., "IPv4 Address Allocation and Assignment Policies for the RIPE NCC Service Region", RIPE-387, http://www.ripe.net/docs/ipv4-policies.html, 2006.
[SHIM6] "Site Multihoming by IPv6 Intermediation (shim6)", http://www.ietf.org/html.charters/shim6-charter.html.
[EID] Chiappa, J., "Endpoints and Endpoint Names: A Proposed Enhancement to the Internet Architecture", http://www.chiappa.net/~jnc/tech/endpoints.txt, 1999.
[GSE] O'Dell, M., "GSE - An Alternate Addressing Architecture for IPv6", Work in Progress, 1997.
[dGSE] Zhang, L., "An Overview of Multihoming and Open Issues in GSE", IETF Journal, http://www.isoc.org/tools/blogs/ietfjournal/?p=98#more-98, 2006.
[PathExp] Oliveira, R., et al., "Quantifying Path Exploration in the Internet", Internet Measurement Conference (IMC) 2006, http://www.cs.ucla.edu/~rveloso/papers/imc175f-oliveira.pdf.
[DynPrefix] Oliveira, R., et al., "Measurement of Highly Active Prefixes in BGP", IEEE GLOBECOM 2005, http://www.cs.ucla.edu/~rveloso/papers/activity.pdf.
[BHB06] Boothe, P., Hielbert, J., and R. Bush, "Short-Lived Prefix Hijacking on the Internet", NANOG 36, http://www.nanog.org/mtg-0602/pdf/boothe.pdf, 2006.
[ROFL] Caesar, M., et al., "ROFL: Routing on Flat Labels", SIGCOMM 2006, http://www.sigcomm.org/sigcomm2006/discussion/showpaper.php?paper_id=34, 2006.
[CNIR] Abraham, I., et al., "Compact Name-Independent Routing with Minimum Stretch", ACM Symposium on Parallel Algorithms and Architectures, http://citeseer.ist.psu.edu/710757.html, 2004.
[BGT04] Bu, T., Gao, L., and D. Towsley, "On Characterizing BGP Routing Table Growth", J. Computer and Telecomm Networking V45N1, 2004.
[Fuller] Fuller, V., "Scaling issues with ipv6 routing+ multihoming", http://www.iab.org/about/workshops/routingandaddressing/vaf-iab-raws.pdf, 2006.
[H03] Huston, G., "Analyzing the Internet's BGP Routing Table", http://www.potaroo.net/papers/ipj/2001-v4-n1-bgp/bgp.pdf, 2003.
[BGP2005] Huston, G., "2005 -- A BGP Year in Review", http://www.apnic.net/meetings/21/docs/sigs/routing/routing-pres-huston-routing-update.pdf.
[DFZ] Huston, G., "Growth of the BGP Table - 1994 to Present", http://bgp.potaroo.net, 2006.
[GIH] Huston, G., "Wither Routing?", http://www.potaroo.net/ispcol/2006-11/raw.html, 2006.
[ATNAC2006] Huston, G. and G. Armitage, "Projecting Future IPv4 Router Requirements from Trends in Dynamic BGP Behaviour", http://www.potaroo.net/papers/phd/atnac-2006/bgp-atnac2006.pdf, 2006.
[CIDRRPT] "The CIDR Report", http://www.cidr-report.org.
[ML] "Moore's Law", Wikipedia, http://en.wikipedia.org/wiki/Moore's_law, 2006.
[Molinero] Molinero-Fernandez, P., "Technology trends in routers and switches", PhD thesis, Stanford University, http://klamath.stanford.edu/~molinero/thesis/html/pmf_thesis_node5.html, 2005.
[DRAM] Landler, P., "DRAM Productivity and Capacity/Demand Model", Global Economic Workshop, http://www.sematech.org/meetings/archives/GES/19990514/docs/07_econ.pdf, 1999.
Russ Housley (IESG)
Geoff Huston
Daniel Karrenberg
Dorian Kim
Olaf Kolkman (IAB)
Darrel Lewis
Tony Li
Kurtis Lindqvist (IAB)
Peter Lothberg
David Meyer (IAB)
Christopher Morrow
Dave Oran (IAB)
Phil Roberts (IAB Executive Director)
Jason Schiller
Peter Schoenmaker
Ted Seely
Mark Townsley (IESG)
Iljitsch van Beijnum
Ruediger Volk
Magnus Westerlund (IESG)
Lixia Zhang (IAB)
1015-1030: Coffee Break
1200-1300: Lunch
1330-1730: Afternoon session: What are the top 3 routing problems in your network?
           Moderator: Kurt Erik Lindqvist
1500-1530: Coffee Break
Dinner at Indrapura (http://www.indrapura.nl), sponsored by Cisco
---------
DAY 2: The proposed goal is to formulate a problem statement
0800-0830: Welcome
0830-1000: Morning session: What's on the table
           Moderator: Elwyn Davies
           - shim6
           - GSE
1000-1030: Coffee Break
1030-1200: Problem Statement session #1: document the problems
           Moderator: David Meyer
1200-1300: Lunch
1300-1500: Problem Statement session #2, cont.
           Moderator: Dino Farinacci
           - Constraints on solutions
1500-1530: Coffee Break
1530-1730: Summary and Wrap-up
           Moderator: Leslie Daigle

http://www.iab.org/about/workshops/routingandaddressing
Full Copyright Statement

Copyright (C) The IETF Trust (2007).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at firstname.lastname@example.org.