Network Working Group                                           E. Rosen
Request for Comments: 2547                                    Y. Rekhter
Category: Informational                              Cisco Systems, Inc.
                                                              March 1999

                             BGP/MPLS VPNs

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1999). All Rights Reserved.
Abstract

This document describes a method by which a Service Provider with an IP backbone may provide VPNs (Virtual Private Networks) for its customers. MPLS (Multiprotocol Label Switching) is used for forwarding packets over the backbone, and BGP (Border Gateway Protocol) is used for distributing routes over the backbone. The primary goal of this method is to support the outsourcing of IP backbone services for enterprise networks. It does so in a manner which is simple for the enterprise, while still scalable and flexible for the Service Provider, and while allowing the Service Provider to add value. These techniques can also be used to provide a VPN which itself provides IP service to customers.

Table of Contents

1        Introduction
1.1      Virtual Private Networks
1.2      Edge Devices
1.3      VPNs with Overlapping Address Spaces
1.4      VPNs with Different Routes to the Same System
1.5      Multiple Forwarding Tables in PEs
1.6      SP Backbone Routers
1.7      Security
2        Sites and CEs
3        Per-Site Forwarding Tables in the PEs
3.1      Virtual Sites
4        VPN Route Distribution via BGP
4.1      The VPN-IPv4 Address Family
4.2      Controlling Route Distribution
4.2.1    The Target VPN Attribute
4.2.2    Route Distribution Among PEs by BGP
4.2.3    The VPN of Origin Attribute
4.2.4    Building VPNs using Target and Origin Attributes
5        Forwarding Across the Backbone
6        How PEs Learn Routes from CEs
7        How CEs learn Routes from PEs
8        What if the CE Supports MPLS?
8.1      Virtual Sites
8.2      Representing an ISP VPN as a Stub VPN
9        Security
9.1      Point-to-Point Security Tunnels between CE Routers
9.2      Multi-Party Security Associations
10       Quality of Service
11       Scalability
12       Intellectual Property Considerations
13       Security Considerations
14       Acknowledgments
15       Authors' Addresses
16       References
17       Full Copyright Statement
whether a particular collection of sites is a VPN are the policies of the customers. Some customers will want the implementation of these policies to be entirely the responsibility of the SP. Other customers may want to implement these policies themselves, or to share with the SP the responsibility for implementing these policies. In this document, we are primarily discussing mechanisms that may be used to implement these policies. The mechanisms we describe are general enough to allow these policies to be implemented either by the SP alone, or by a VPN customer together with the SP. Most of the discussion is focused on the former case, however. The mechanisms discussed in this document allow the implementation of a wide range of policies. For example, within a given VPN, we can allow every site to have a direct route to every other site ("full mesh"), or we can restrict certain pairs of sites from having direct routes to each other ("partial mesh"). In this document, we are particularly interested in the case where the common backbone offers an IP service. We are primarily concerned with the case in which an enterprise is outsourcing its backbone to a service provider, or perhaps to a set of service providers, with which it maintains contractual relationships. We are not focused on providing VPNs over the public Internet. In the rest of this introduction, we specify some properties which VPNs should have. The remainder of this document outlines a VPN model which has all these properties. The VPN Model of this document appears to be an instance of the framework described in .
other sites. Routers at different sites do not directly exchange routing information with each other; in fact, they do not even need to know of each other at all (except in the case where this is necessary for security purposes, see section 9). As a consequence, very large VPNs (i.e., VPNs with a very large number of sites) are easily supported, while the routing strategy for each individual site is greatly simplified. It is important to maintain clear administrative boundaries between the SP and its customers (cf. ). The PE and P routers should be administered solely by the SP, and the SP's customers should not have any management access to them. The CE devices should be administered solely by the customer (unless the customer has contracted the management services out to the SP).
instead to the firewall at site B. If the firewall allows the traffic to pass, it then appears to be traffic coming from site B, and follows the route to site A.

[9]) packets only if those packets have been labeled by trusted sources. We also assume that it is possible for label switched paths to cross the boundary between service providers.
different VPNs, it should not be possible for systems in one VPN to gain access to systems in another VPN. It should also be possible to deploy standard security procedures.
As an example, let PE1, PE2, and PE3 be three PE routers, and let CE1, CE2, and CE3 be three CE routers. Suppose that PE1 learns, from CE1, the routes which are reachable at CE1's site. If PE2 and PE3 are attached respectively to CE2 and CE3, and there is some VPN V containing CE1, CE2, and CE3, then PE1 uses BGP to distribute to PE2 and PE3 the routes which it has learned from CE1. PE2 and PE3 use these routes to populate the forwarding tables which they associate respectively with the sites of CE2 and CE3. Routes from sites which are not in VPN V do not appear in these forwarding tables, which means that packets from CE2 or CE3 cannot be sent to sites which are not in VPN V. If a site is in multiple VPNs, the forwarding table associated with that site can contain routes from the full set of VPNs of which the site is a member. A PE generally maintains only one forwarding table per site, even if it is multiply connected to that site. Also, different sites can share the same forwarding table if they are meant to use exactly the same set of routes. Suppose a packet is received by a PE router from a particular directly attached site, but the packet's destination address does not match any entry in the forwarding table associated with that site. If the SP is not providing Internet access for that site, then the packet is discarded as undeliverable. If the SP is providing Internet access for that site, then the PE's Internet forwarding table will be consulted. This means that in general, only one forwarding table per PE need ever contain routes from the Internet, even if Internet access is provided. 
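The lookup behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table layout (a dictionary from prefix string to next hop), the function names `longest_match` and `forward_from_site`, and the site and next-hop names are all hypothetical and do not come from this document.

```python
import ipaddress


def longest_match(table, destination):
    """Return the next hop of the most specific matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    best_network, best_hop = None, None
    for prefix, next_hop in table.items():
        network = ipaddress.ip_network(prefix)
        if address in network and (
            best_network is None or network.prefixlen > best_network.prefixlen
        ):
            best_network, best_hop = network, next_hop
    return best_hop


def forward_from_site(site, destination, site_tables, internet_table, internet_sites):
    """Pick a next hop for a packet that arrived from a directly attached site."""
    # The per-site table is always consulted first.
    next_hop = longest_match(site_tables[site], destination)
    if next_hop is not None:
        return next_hop
    # Fall back to the PE's Internet table only if the SP provides
    # Internet access for this site.
    if site in internet_sites:
        return longest_match(internet_table, destination)
    # Otherwise the packet is discarded as undeliverable.
    return None


# Hypothetical PE state: two sites in VPN V, one of which has Internet access.
site_tables = {
    "site-CE2": {"10.1.0.0/16": "PE1"},
    "site-CE3": {"10.1.0.0/16": "PE1"},
}
internet_table = {"0.0.0.0/0": "internet-gateway"}
internet_sites = {"site-CE3"}
```

Note that a single Internet table is shared by every site that has Internet access, which matches the observation that only one forwarding table per PE need contain Internet routes.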
To maintain proper isolation of one VPN from another, it is important that no router in the backbone accept a labeled packet from any adjacent non-backbone device unless (a) the label at the top of the label stack was actually distributed by the backbone router to the non-backbone device, and (b) the backbone router can determine that use of that label will cause the packet to leave the backbone before any labels lower in the stack will be inspected, and before the IP header will be inspected. These restrictions are necessary in order to prevent packets from entering a VPN where they do not belong. The per-site forwarding tables in a PE are ONLY used for packets which arrive from a site which is directly attached to the PE. They are not used for routing packets which arrive from other routers that belong to the SP backbone. As a result, there may be multiple different routes to the same system, where the route followed by a given packet is determined by the site from which the packet enters the backbone. E.g., one may have one route to a given system for
packets from the extranet (where the route leads to a firewall), and a different route to the same system for packets from the intranet (including packets that have already passed through the firewall). Section 8 contains a brief discussion of how the CE might support multiple virtual sites if it does support MPLS.
that we need to allow BGP to install and distribute multiple routes to a single IP address prefix. Further, we must ensure that POLICY is used to determine which sites can use which routes; given that several such routes are installed by BGP, only one such must appear in any particular per-site forwarding table. We meet these goals by the use of a new address family, as specified below.

The BGP Multiprotocol Extensions [3] allow BGP to carry routes from multiple "address families". We introduce the notion of the "VPN-IPv4 address family". A VPN-IPv4 address is a 12-byte quantity, beginning with an 8-byte "Route Distinguisher (RD)" and ending with a 4-byte IPv4 address. If two VPNs use the same IPv4 address prefix, the PEs translate these into unique VPN-IPv4 address prefixes. This ensures that if the same address is used in two different VPNs, it is possible to install two completely different routes to that address, one for each VPN. The RD does not by itself impose any semantics; it contains no information about the origin of the route or about the set of VPNs to which the route is to be distributed. The purpose of the RD is solely to allow one to create distinct routes to a common IPv4 address prefix. Other means are used to determine where to redistribute the route (see section 4.2). The RD can also be used to create multiple different routes to the very same system. In section 3, we gave an example where the route to a particular server had to be different for intranet traffic than for extranet traffic. This can be achieved by creating two different VPN-IPv4 routes that have the same IPv4 part, but different RDs. This allows BGP to install multiple different routes to the same system, and allows policy to be used (see section 4.2.3) to decide which packets use which route. The RDs are structured so that every service provider can administer its own "numbering space" (i.e., can make its own assignments of RDs), without conflicting with the RD assignments made by any other service provider.
An RD consists of a two-byte type field, an administrator field, and an assigned number field. The value of the type field determines the lengths of the other two fields, as well as the semantics of the administrator field. The administrator field identifies an assigned number authority, and the assigned number field contains a number which has been assigned, by the identified authority, for a particular purpose. For example, one could have an RD whose administrator field contains an Autonomous System number
(ASN), and whose (4-byte) number field contains a number assigned by the SP to whom IANA has assigned that ASN. RDs are given this structure in order to ensure that an SP which provides VPN backbone service can always create a unique RD when it needs to do so. However, the structuring provides no semantics. When BGP compares two such address prefixes, it ignores the structure entirely. If the Administrator subfield and the Assigned Number subfield of a VPN-IPv4 address are both set to all zeroes, the VPN-IPv4 address is considered to have exactly the same meaning as the corresponding globally unique IPv4 address. In particular, this VPN-IPv4 address and the corresponding globally unique IPv4 address will be considered comparable by BGP. In all other cases, a VPN-IPv4 address and its corresponding globally unique IPv4 address will be considered noncomparable by BGP. A given per-site forwarding table will only have one VPN-IPv4 route for any given IPv4 address prefix. When a packet's destination address is matched against a VPN-IPv4 route, only the IPv4 part is actually matched. A PE needs to be configured to associate routes which lead to a particular CE with a particular RD. The PE may be configured to associate all routes leading to the same CE with the same RD, or it may be configured to associate different routes with different RDs, even if they lead to the same CE.
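As a rough illustration of the address layout described above, the following Python sketch packs an 8-byte RD (two-byte type, two-byte ASN administrator field, four-byte assigned number) in front of a 4-byte IPv4 address. The type code value `0` for the ASN-based layout, and the helper names `make_rd`, `vpn_ipv4_address`, and `ipv4_part`, are assumptions made for this example; the text above does not assign numeric type codes.

```python
import ipaddress
import struct

# Assumed type code for "2-byte ASN administrator + 4-byte assigned number".
RD_TYPE_ASN = 0


def make_rd(asn, assigned_number, rd_type=RD_TYPE_ASN):
    """Build an 8-byte RD: 2-byte type, 2-byte ASN, 4-byte assigned number."""
    return struct.pack("!HHI", rd_type, asn, assigned_number)


def vpn_ipv4_address(rd, ipv4):
    """Build the 12-byte VPN-IPv4 address: 8-byte RD followed by the IPv4 address."""
    if len(rd) != 8:
        raise ValueError("an RD must be exactly 8 bytes")
    return rd + ipaddress.IPv4Address(ipv4).packed


def ipv4_part(vpn_address):
    """Return the trailing IPv4 address, the only part matched during forwarding."""
    return str(ipaddress.IPv4Address(vpn_address[8:]))


# Two VPNs that both use 10.0.0.1 yield distinct VPN-IPv4 addresses.
address_in_vpn_a = vpn_ipv4_address(make_rd(65000, 1), "10.0.0.1")
address_in_vpn_b = vpn_ipv4_address(make_rd(65000, 2), "10.0.0.1")
```

Because BGP compares the 12-byte values as opaque quantities, `address_in_vpn_a` and `address_in_vpn_b` are distinct routes even though `ipv4_part` returns the same address for both.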
In essence, a Target VPN attribute identifies a set of sites. Associating a particular Target VPN attribute with a route allows that route to be placed in the per-site forwarding tables that are used for routing traffic which is received from the corresponding sites. There is a set of Target VPNs that a PE router attaches to a route received from site S. And there is a set of Target VPNs that a PE router uses to determine whether a route received from another PE router could be placed in the forwarding table associated with site S. The two sets are distinct, and need not be the same. The function performed by the Target VPN attribute is similar to that performed by the BGP Communities Attribute. However, the format of the latter is inadequate, since it allows only a two-byte numbering space. It would be fairly straightforward to extend the BGP Communities Attribute to provide a larger numbering space. It should also be possible to structure the format, similar to what we have described for RDs (see section 4.1), so that a type field defines the length of an administrator field, and the remainder of the attribute is a number from the specified administrator's numbering space. When a BGP speaker has received two routes to the same VPN-IPv4 prefix, it chooses one, according to the BGP rules for route preference. Note that a route can only have one RD, but it can have multiple Target VPNs. In BGP, scalability is improved if one has a single route with multiple attributes, as opposed to multiple routes. One could eliminate the Target VPN attribute by creating more routes (i.e., using more RDs), but the scaling properties would be less favorable. How does a PE determine which Target VPN attributes to associate with a given route? There are a number of different possible ways. The PE might be configured to associate all routes that lead to a particular site with a particular Target VPN. 
Or the PE might be configured to associate certain routes leading to a particular site with one Target VPN, and certain with another. Or the CE router, when it distributes these routes to the PE (see section 6), might specify one or more Target VPNs for each route. The latter method shifts the control of the mechanisms used to implement the VPN policies from the SP to the customer. If this method is used, it may still be desirable to have the PE eliminate any Target VPNs that, according to its own configuration, are not allowed, and/or to add in some Target VPNs that according to its own configuration are mandatory.
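The last option, in which the CE suggests Target VPNs and the PE removes disallowed ones and adds mandatory ones, amounts to simple set arithmetic. The sketch below assumes Target VPNs can be modeled as opaque strings; the function name `targets_to_attach` and the sample values are hypothetical.

```python
def targets_to_attach(ce_suggested, allowed, mandatory):
    """Compute the Target VPNs a PE attaches to a route learned from a CE.

    Suggestions that the PE's configuration does not allow are dropped,
    and targets that the PE's configuration makes mandatory are added.
    """
    return (set(ce_suggested) & set(allowed)) | set(mandatory)


# Hypothetical configuration: the CE asks for two targets, only one is permitted.
attached = targets_to_attach(
    ce_suggested={"vpn-red", "vpn-blue"},
    allowed={"vpn-red"},
    mandatory={"vpn-mgmt"},
)
```

When the CE suggests nothing, the same function reduces to the purely PE-configured case: the result is just the mandatory set.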
It might be more accurate, if less suggestive, to call this attribute the "Route Target" attribute instead of the "VPN Target" attribute. It really identifies only a set of sites which will be able to use the route, without prejudice to whether those sites constitute what might intuitively be called a VPN.

[8]) When the PE processes a received packet that has this label at the top of the stack, the PE will pop the stack, and send the packet directly to the site to which the route leads. This will usually mean that it just sends the packet to the CE router from which it learned the route. The label may also determine the data link encapsulation. In most cases, the label assigned by a PE will cause the packet to be sent directly to a CE, and the PE which receives the labeled packet will not look up the packet's destination address in any forwarding table. However, it is also possible for the PE to assign a label which implicitly identifies a particular forwarding table. In this case, the PE receiving a packet with that label would look up the packet's destination address in one of its forwarding tables. While this can
be very useful in certain circumstances, we do not consider it further in this paper. Note that the MPLS label that is distributed in this way is only usable if there is a label switched path between the router that installs a route and the BGP next hop of that route. We do not make any assumption about the procedure used to set up that label switched path. It may be set up on a pre-established basis, or it may be set up when a route which would need it is installed. It may be a "best effort" route, or it may be a traffic engineered route. Between a particular PE router and its BGP next hop for a particular route there may be one LSP, or there may be several, perhaps with different QoS characteristics. All that matters for the VPN architecture is that some label switched path between the router and its BGP next hop exists. All the usual techniques for using route reflectors  to improve scalability, e.g., route reflector hierarchies, are available. If route reflectors are used, there is no need to have any one route reflector know all the VPN-IPv4 routes for all the VPNs supported by the backbone. One can have separate route reflectors, which do not communicate with each other, each of which supports a subset of the total set of VPNs. If a given PE router is not attached to any of the Target VPNs of a particular route, it should not receive that route; the other PE or route reflector which is distributing routes to it should apply outbound filtering to avoid sending it unnecessary routes. Of course, if a PE router receives a route via BGP, and that PE is not attached to any of the route's target VPNs, the PE should apply inbound filtering to the route, neither installing nor redistributing it. A router which is not attached to any VPN, i.e., a P router, never installs any VPN-IPv4 routes at all. These distribution rules ensure that there is no one box which needs to know all the VPN-IPv4 routes that are supported over the backbone. 
As a result, the total number of such routes that can be supported over the backbone is not bound by the capacity of any single device, and therefore can increase virtually without bound.
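The inbound filtering rule can be sketched as follows. The sketch assumes each attached site has a configured set of Target VPNs it is willing to import; the dictionary layouts and the names `importing_sites` and `receive_route` are hypothetical, chosen only for this example.

```python
def importing_sites(route_targets, import_targets):
    """Return the attached sites whose import set shares a Target VPN with the route."""
    return sorted(
        site
        for site, wanted in import_targets.items()
        if set(route_targets) & set(wanted)
    )


def receive_route(route, import_targets, site_tables):
    """Install a received route for every matching site.

    Returns False when no attached site matches, in which case the route
    is neither installed nor redistributed.
    """
    sites = importing_sites(route["targets"], import_targets)
    for site in sites:
        # Per-site tables are keyed by the IPv4 prefix: at most one
        # VPN-IPv4 route per IPv4 prefix appears in any one table.
        site_tables.setdefault(site, {})[route["prefix"]] = route["next_hop"]
    return bool(sites)


# Hypothetical PE with two attached sites.
import_targets = {
    "site-S": {"vpn-red"},
    "site-T": {"vpn-red", "vpn-blue"},
}
site_tables = {}
blue_route = {"prefix": "10.2.0.0/16", "next_hop": "PE7", "targets": {"vpn-blue"}}
green_route = {"prefix": "10.3.0.0/16", "next_hop": "PE9", "targets": {"vpn-green"}}
```

A P router corresponds to the degenerate case of an empty `import_targets` mapping: `receive_route` always returns False and nothing is installed.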
identify the enterprise which owns the site where the route leads, or to identify the site's intranet. However, other uses are also possible. This attribute could be encoded as an extended BGP communities attribute. In situations in which it is necessary to identify the source of a route, it is this attribute, not the RD, which must be used. This attribute may be used when "constructing" VPNs, as described below. It might be more accurate, if less suggestive, to call this attribute the "Route Origin" attribute instead of the "VPN of Origin" attribute. It really identifies the route only as having come from one of a particular set of sites, without prejudice as to whether that particular set of sites really constitutes a VPN.
section 8 for some discussion of the case where the CE desires to receive labeled packets.) When a packet enters the backbone from a particular site via a particular PE router, the packet's route is determined by the contents of the forwarding table which that PE router associates with that site. The forwarding tables of the PE router where the packet
leaves the backbone are not relevant. As a result, one may have multiple routes to the same system, where the particular route chosen for a particular packet is based on the site from which the packet enters the backbone. Note that it is the two-level labeling that makes it possible to keep all the VPN routes out of the P routers, and this in turn is crucial to ensuring the scalability of the model. The backbone does not even need to have routes to the CEs, only to the PEs.
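The role of two-level labeling can be illustrated with a toy label stack. This is a sketch under several assumptions that are not spelled out in the text above: the outer label is taken to identify the label switched path toward the egress PE, the inner label is the one the egress PE distributed with the VPN route, and the egress PE is assumed to receive the packet with its own outer label still present. All function names and label values are hypothetical.

```python
def ingress_pe_push(ip_packet, vpn_label, tunnel_label):
    """Ingress PE: push the inner (VPN) label, then the outer (tunnel) label."""
    return {"labels": [tunnel_label, vpn_label], "payload": ip_packet}


def p_router_swap(labeled, swap_table):
    """P router: swap only the top label; it never inspects the inner label."""
    top, *rest = labeled["labels"]
    return {"labels": [swap_table[top], *rest], "payload": labeled["payload"]}


def egress_pe_deliver(labeled, local_tunnel_label, vpn_label_to_ce):
    """Egress PE: pop the outer label, then use the inner label to select the CE."""
    top, vpn_label = labeled["labels"]
    if top != local_tunnel_label:
        raise ValueError("packet did not arrive on the expected label switched path")
    return vpn_label_to_ce[vpn_label], labeled["payload"]


# Hypothetical walk through one P router.
packet = {"dst": "10.1.2.3"}
in_backbone = ingress_pe_push(packet, vpn_label=17, tunnel_label=100)
after_p = p_router_swap(in_backbone, swap_table={100: 200})
delivered = egress_pe_deliver(after_p, local_tunnel_label=200, vpn_label_to_ce={17: "CE2"})
```

The P router's `swap_table` contains only labels for paths toward PE routers, which is why no VPN route ever needs to be installed there.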
R1, and as a result distributes an IPv4 route R2 to a CE, then R2 must not be distributed back from that CE's site to a PE router, say PE2 (where PE1 and PE2 may be the same router or different routers), unless PE2 maps R2 to a VPN-IPv4 route which is different than (i.e., contains a different RD than) R1.

3. The PE and CE routers may be OSPF peers. In this case, the site should be a single OSPF area, the CE should be an ABR in that area, and the PE should be an ABR which is not in that area. Also, the PE should report no router links other than those to the CEs which are at the same site. (This technique should be used only in stub VPNs.)

4. The PE and CE routers may be BGP peers, and the CE router may use BGP (in particular, EBGP) to tell the PE router the set of address prefixes which are at the CE router's site. (This technique can be used in stub VPNs or transit VPNs.)

   From a purely technical perspective, this is by far the best technique:

   a) Unlike the IGP alternatives, this does not require the PE to run multiple routing algorithm instances in order to talk to multiple CEs.

   b) BGP is explicitly designed for just this function: passing routing information between systems run by different administrations.

   c) If the site contains "BGP backdoors", i.e., routers with BGP connections to routers other than PE routers, this procedure will work correctly in all circumstances. The other procedures may or may not work, depending on the precise circumstances.

   d) Use of BGP makes it easy for the CE to pass attributes of the routes to the PE. For example, the CE may suggest a particular Target for each route, from among the Target attributes that the PE is authorized to attach to the route.

   On the other hand, using BGP is likely to be something new for the CE administrators, except in the case where the customer itself is already an Internet Service Provider (ISP).
If a site is not in a transit VPN, note that it need not have a unique Autonomous System Number (ASN). Every CE whose site is not in a transit VPN can use the same ASN. This can be chosen from the private ASN space, and it will be stripped out by the PE. Routing loops are prevented by use of the Site of Origin Attribute (see below).

If a set of sites constitute a transit VPN, it is convenient to represent them as a BGP Confederation, so that the internal structure of the VPN is hidden from any router which is not within the VPN. In this case, each site in the VPN would need two BGP connections to the backbone, one which is internal to the confederation and one which is external to it. The usual intra-confederation procedures would have to be slightly modified in order to take account of the fact that the backbone and the sites may have different policies. The backbone is a member of the confederation on one of the connections, but is not a member on the other. These techniques may be useful if the customer for the VPN service is an ISP. This technique allows a customer that is an ISP to obtain VPN backbone service from one of its ISP peers.

(However, if a VPN customer is itself an ISP, and its CE routers support MPLS, a much simpler technique can be used, wherein the ISP is regarded as a stub VPN. See section 8.)

When we do not need to distinguish among the different ways in which a PE can be informed of the address prefixes which exist at a given site, we will simply say that the PE has "learned" the routes from that site.

Before a PE can redistribute a VPN-IPv4 route learned from a site, it must assign certain attributes to the route. There are three such attributes:

- Site of Origin

  This attribute uniquely identifies the site from which the PE router learned the route. All routes learned from a particular site must be assigned the same Site of Origin attribute, even if a site is multiply connected to a single PE, or is connected to multiple PEs. Distinct Site of Origin attributes must be used for distinct sites. This attribute could be encoded as an extended BGP communities attribute (section 4.2.1).

- VPN of Origin

  See section 4.2.3.

- Target VPN

  See section 4.2.1.
the CE, it would know which forwarding table to look in; the label placed on the packet by the CE would identify only the virtual site from which the packet is coming.

[5]. This is discussed in the remainder of this section.
[6]. Every VPN-IPv4 route can have an attribute which identifies the next CE router that will be traversed if that route is followed. If this information is provided to all the CE routers in the VPN, standard IPSEC Tunnel Mode can be used. If the CE and PE are BGP peers, it is natural to present this information as a BGP attribute. Each CE that is to use IPSEC should also be configured with a set of address prefixes, such that it is prohibited from sending insecure traffic to any of those addresses. This prevents the CE from sending insecure traffic if, for some reason, it fails to obtain the necessary information. When MPLS is used to carry packets between the two endpoints of an IPSEC tunnel, the IPSEC outer header does not really perform any function. It might be beneficial to develop a form of IPSEC tunnel mode which allows the outer header to be omitted when MPLS is used.
With such a scheme, standard Tunnel Mode IPSEC could not be used, because there is no way to fill in the IP destination address field of the "outer header". However, when MPLS is used for forwarding, there is no real need for this outer header anyway; the PE router can use MPLS to get a packet to a tunnel endpoint without even knowing the IP address of that endpoint; it only needs to see the IP destination address of the "inner header". A significant advantage of a scheme like this is that it makes routing changes (in particular, a change of egress CE for a particular address prefix) transparent to the security mechanism. This could be particularly important in the case of multi-provider VPNs, where the need to distribute information about such routing changes simply to support the security mechanisms could result in scalability issues. Another advantage is that it eliminates the need for the outer IP header, since the MPLS encapsulation performs its role.

[10], or, where ATM is used as the backbone, through the use of ATM QoS capabilities. The traffic engineering work discussed in  is also directly applicable to MPLS/BGP VPNs. Traffic engineering could even be used to establish LSPs with particular QoS characteristics between particular pairs of sites, if that is desirable. Where an MPLS/BGP VPN spans multiple SPs, the architecture described in  may be useful. An SP may apply either intserv or diffserv capabilities to a particular VPN, as appropriate.
P routers do not maintain any VPN routes. In order to properly forward VPN traffic, the P routers need only maintain routes to the PE routers and the ASBRs. The use of two levels of labeling is what makes it possible to keep the VPN routes out of the P routers. A PE router maintains VPN routes, but only for those VPNs to which it is directly attached. Route reflectors and ASBRs can be partitioned among VPNs so that each partition carries routes for only a subset of the VPNs provided by the Service Provider. Thus no single Route Reflector or ASBR is required to maintain routes for all the VPNs. As a result, no single component within the Service Provider network has to maintain all the routes for all the VPNs. So the total capacity of the network to support increasing numbers of VPNs is not limited by the capacity of any individual component.
[1] Awduche, Berger, Gan, Li, Swallow, and Srinavasan, "Extensions to RSVP for LSP Tunnels", Work in Progress.

[2] Bates, T. and R. Chandrasekaran, "BGP Route Reflection: An alternative to full mesh IBGP", RFC 1966, June 1996.

[3] Bates, T., Chandra, R., Katz, D. and Y. Rekhter, "Multiprotocol Extensions for BGP4", RFC 2283, February 1998.

[4] Gleeson, Heinanen, and Armitage, "A Framework for IP Based Virtual Private Networks", Work in Progress.

[5] Kent and Atkinson, "Security Architecture for the Internet Protocol", RFC 2401, November 1998.

[6] Li, "CPE based VPNs using MPLS", October 1998, Work in Progress.

[7] Li, T. and Y. Rekhter, "A Provider Architecture for Differentiated Services and Traffic Engineering (PASTE)", RFC 2430, October 1998.

[8] Rekhter and Rosen, "Carrying Label Information in BGP4", Work in Progress.

[9] Rosen, Viswanathan, and Callon, "Multiprotocol Label Switching Architecture", Work in Progress.

[10] Rosen, Rekhter, Tappan, Farinacci, Fedorkow, Li, and Conta, "MPLS Label Stack Encoding", Work in Progress.