RFC 8014

An Architecture for Data-Center Network Virtualization over Layer 3 (NVO3)

Pages: 35
Informational

5.  Tenant System Types

   This section describes a number of special Tenant System types and
   how they fit into an NVO3 system.

5.1.  Overlay-Aware Network Service Appliances

   Some Network Service Appliances [NVE-NVA] (virtual or physical)
   provide tenant-aware services.  That is, the specific service they
   provide depends on the identity of the tenant making use of the
   service.  For example, firewalls are now becoming available that
   support multitenancy where a single firewall provides virtual
   firewall service on a per-tenant basis, using per-tenant
   configuration rules and maintaining per-tenant state.  Such
   appliances will be aware of the VN an activity corresponds to while
   processing requests.  Unlike server virtualization, which shields VMs
   from needing to know about multitenancy, a Network Service Appliance
   may explicitly support multitenancy.  In such cases, the Network
   Service Appliance itself will be aware of network virtualization and
   either embed an NVE directly or implement a split-NVE as described in
   Section 4.2.  Unlike server virtualization, however, the Network
   Service Appliance may not be running a hypervisor, and the VM
   orchestration system may not interact with the Network Service
   Appliance.  The NVE on such appliances will need to support a control
   plane to obtain the information needed to fully participate in an NV
   Domain.

5.2.  Bare Metal Servers

   Many data centers will continue to have at least some servers
   operating as non-virtualized (or "bare metal") machines running a
   traditional operating system and workload.  In such systems, there
   will be no NVE functionality on the server, and the server will have
   no knowledge of NVO3 (including whether overlays are even in use).
   The NVE functionality can instead reside on the first-hop physical
   switch.  The network administrator would (manually) configure the
   switch to enable the appropriate NVO3 functionality on the switch
   port connecting the server and associate that port with a specific
   virtual network.  Such configuration would
   typically be static, since the server is not virtualized and, once
   configured, is unlikely to change frequently.  Consequently, this
   scenario does not require any protocol or standards work.
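
   As an illustration only, the static binding described above can be
   pictured as a small table held on the first-hop switch.  The sketch
   below is not part of the NVO3 architecture: the names PortBinding,
   STATIC_BINDINGS, and vn_for_port, the port name, and the use of a
   numeric VN identifier are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PortBinding:
    port: str   # switch port facing the bare metal server
    vn_id: int  # hypothetical numeric identifier of the virtual network


# Entered manually by the administrator and expected to change rarely.
STATIC_BINDINGS: Dict[str, PortBinding] = {
    "eth1/7": PortBinding(port="eth1/7", vn_id=5001),
}


def vn_for_port(port: str) -> Optional[int]:
    """Return the VN a port is bound to, or None if no binding exists."""
    binding = STATIC_BINDINGS.get(port)
    return binding.vn_id if binding else None
```

   A port without an entry simply has no NVO3 function enabled, which
   matches the case where the server is unaware that overlays exist.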

5.3.  Gateways

   Gateways on VNs relay traffic onto and off of a virtual network.
   Tenant Systems use gateways to reach destinations outside of the
   local VN.  Gateways receive encapsulated traffic from one VN, remove
   the encapsulation header, and send the native packet out onto the
   data-center network for delivery.  Outside traffic enters a VN in a
   reverse manner.

   Gateways can be either virtual (i.e., implemented as a VM) or
   physical (i.e., a standalone physical device).  For performance
   reasons, standalone hardware gateways may be desirable in some cases.
   Such gateways could consist of a simple switch forwarding traffic
   from a VN onto the local data-center network or could embed router
   functionality.  On such gateways, network interfaces connecting to
   virtual networks will (at least conceptually) embed NVE (or split-
   NVE) functionality within them.  As in the case with Network Service
   Appliances, gateways may not support a hypervisor and will need an
   appropriate control-plane protocol to obtain the information needed
   to provide NVO3 service.

   Gateways handle several different use cases.  For example, one use
   case consists of systems supporting overlays together with systems
   that do not (e.g., bare metal servers).  Gateways could be used to
   connect legacy systems supporting, e.g., L2 VLANs, to specific
   virtual networks, effectively making them part of the same virtual
   network.  Gateways could also forward traffic between a virtual
   network and other hosts on the data-center network or relay traffic
   between different VNs.  Finally, gateways can provide external
   connectivity such as Internet or VPN access.

5.3.1.  Gateway Taxonomy

   As can be seen from the discussion above, there are several types of
   gateways that can exist in an NVO3 environment.  This section breaks
   them down into the various types that could be supported.  Note that
   each of the types below could be either implemented in a centralized
   manner or distributed to coexist with the NVEs.

5.3.1.1.  L2 Gateways (Bridging)

   L2 Gateways act as Layer 2 bridges to forward Ethernet frames based
   on the MAC addresses present in them.

   L2 VN to Legacy L2:  This type of gateway bridges traffic between L2
      VNs and other legacy L2 networks such as VLANs or L2 VPNs.

   L2 VN to L2 VN:  The main motivation for this type of gateway is to
      create separate groups of Tenant Systems using L2 VNs such that
      the gateway can enforce network policies between each L2 VN.

5.3.1.2.  L3 Gateways (Only IP Packets)

   L3 Gateways forward IP packets based on the IP addresses present in
   the packets.

   L3 VN to Legacy L2:  This type of gateway forwards packets between L3
      VNs and legacy L2 networks such as VLANs or L2 VPNs.  The original
      sender's destination MAC address in any frames that the gateway
      forwards from a legacy L2 network would be the MAC address of the
      gateway.

   L3 VN to Legacy L3:  This type of gateway forwards packets between L3
      VNs and legacy L3 networks.  These legacy L3 networks could be
      local to the data center, be in the WAN, or be an L3 VPN.

   L3 VN to L2 VN:  This type of gateway forwards packets between L3 VNs
      and L2 VNs.  The original sender's destination MAC address in any
      frames that the gateway forwards from an L2 VN would be the MAC
      address of the gateway.

   L2 VN to L2 VN:  This type of gateway acts like a traditional
      router that forwards between L2 interfaces.  The original sender's
      destination MAC address in any frames that the gateway forwards
      from any of the L2 VNs would be the MAC address of the gateway.

   L3 VN to L3 VN:  The main motivation for this type of gateway is to
      create separate groups of Tenant Systems using L3 VNs such that
      the gateway can enforce network policies between each L3 VN.
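
   The difference between the two gateway families can be sketched as
   two lookup functions.  This is a minimal illustration under stated
   assumptions, not a description of any implementation: the Frame
   type, the table layouts, and the output labels are hypothetical.
   The L2 gateway selects an output from the destination MAC address
   alone, while the L3 gateway first requires that the frame be
   addressed to the gateway's own MAC address and then performs a
   longest-prefix match on the destination IP address.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    dst_ip: str


def l2_gateway_next_hop(frame: Frame,
                        mac_table: Dict[str, str]) -> Optional[str]:
    """Bridging: the destination MAC address alone selects the output."""
    return mac_table.get(frame.dst_mac)


def l3_gateway_next_hop(frame: Frame, gateway_mac: str,
                        routes: Dict[Network, str]) -> Optional[str]:
    """Routing: only frames sent to the gateway's MAC are forwarded,
    using a longest-prefix match on the destination IP address."""
    if frame.dst_mac != gateway_mac:
        return None
    addr = ipaddress.ip_address(frame.dst_ip)
    matches = [net for net in routes
               if net.version == addr.version and addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]
```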

5.4.  Distributed Inter-VN Gateways

   The relaying of traffic from one VN to another deserves special
   consideration.  Whether traffic is permitted to flow from one VN to
   another is a matter of policy and would not (by default) be allowed
   unless explicitly enabled.  In addition, NVAs are the logical place
   to maintain policy information about allowed inter-VN communication.
   Policy enforcement for inter-VN communication can be handled in (at
   least) two different ways.  Explicit gateways could be the central
   point for such enforcement, with all inter-VN traffic forwarded to
   such gateways for processing.  Alternatively, the NVA can provide
   such information directly to NVEs by either providing a mapping for a
   target Tenant System (TS) on another VN or indicating that such
   communication is disallowed by policy.

   When inter-VN gateways are centralized, traffic between TSs on
   different VNs can take suboptimal paths, i.e., triangular routing
   results in paths that always traverse the gateway.  In the worst
   case, traffic between two TSs connected to the same NVE can be hair-
   pinned through an external gateway.  As an optimization, individual
   NVEs can be part of a distributed gateway that performs such
   relaying, reducing or completely eliminating triangular routing.  In
   a distributed gateway, each ingress NVE can perform such relaying
   activity directly so long as it has access to the policy information
   needed to determine whether cross-VN communication is allowed.
   Having individual NVEs be part of a distributed gateway allows them
   to tunnel traffic directly to the destination NVE without the need to
   take suboptimal paths.

   The NVO3 architecture supports distributed gateways for the case of
   inter-VN communication.  Such support requires that NVO3 control
   protocols include mechanisms for the maintenance and distribution of
   policy information about what type of cross-VN communication is
   allowed so that NVEs acting as distributed gateways can tunnel
   traffic from one VN to another as appropriate.
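
   The decision an ingress NVE makes when acting as part of a
   distributed gateway can be sketched as follows.  The function name
   relay_decision, the representation of policy as a set of directional
   (source VN, destination VN) pairs, and the shape of the mapping
   table are assumptions for illustration; the architecture does not
   define how policy is encoded.

```python
from typing import Dict, Optional, Set, Tuple

# (source VN, destination VN) pairs explicitly enabled by policy.
AllowedPairs = Set[Tuple[int, int]]
# (VN, TS address) -> underlay address of the NVE the TS is attached to.
Mappings = Dict[Tuple[int, str], str]


def relay_decision(src_vn: int, dst_vn: int, dst_ts: str,
                   allowed: AllowedPairs,
                   mappings: Mappings) -> Optional[str]:
    """Return the egress NVE to tunnel to, or None if the packet must
    not be relayed (policy denies it or no mapping is known)."""
    if src_vn != dst_vn and (src_vn, dst_vn) not in allowed:
        return None  # inter-VN traffic is denied unless explicitly enabled
    return mappings.get((dst_vn, dst_ts))
```

   Because the check runs at the ingress NVE, permitted traffic is
   tunneled straight to the destination NVE rather than through a
   central gateway.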

   Distributed gateways could also be used to distribute other
   traditional router services to individual NVEs.  The NVO3
   architecture does not preclude such implementations but does not
   define or require them as they are outside the scope of the NVO3
   architecture.

5.5.  ARP and Neighbor Discovery

   Strictly speaking, for an L2 service, special processing of the
   Address Resolution Protocol (ARP) [RFC826] and IPv6 Neighbor
   Discovery (ND) [RFC4861] is not required.  ARP requests are
   broadcast, and an NVO3 network can deliver ARP requests to all
   members of a given L2 virtual network just as it does for any packet
   sent to an L2 broadcast address.  Similarly, ND requests are sent via
   IP multicast, which an NVO3 network can support by delivering them
   via L2 multicast.  However, as a
   performance optimization, an NVE can intercept ARP (or ND) requests
   from its attached TSs and respond to them directly using information
   in its mapping tables.  Since an NVE will have mechanisms for
   determining the NVE address associated with a given TS, the NVE can
   leverage the same mechanisms to suppress sending ARP and ND requests
   for a given TS to other members of the VN.  The NVO3 architecture
   supports such a capability.
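
   A minimal sketch of this optimization follows.  It assumes the NVE
   already holds a table, populated from the NVA, that maps a (VN, IP
   address) pair to a MAC address; the table layout and the function
   name handle_arp_request are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (VN identifier, TS IP address) -> TS MAC address.
ArpTable = Dict[Tuple[int, str], str]


def handle_arp_request(vn_id: int, target_ip: str,
                       table: ArpTable) -> Tuple[str, Optional[str]]:
    """Decide how the NVE treats an ARP request from an attached TS.

    Returns ("reply", mac) when the NVE can answer locally, or
    ("flood", None) when the request must be delivered to the VN.
    """
    mac = table.get((vn_id, target_ip))
    if mac is not None:
        return ("reply", mac)
    return ("flood", None)
```

   The lookup is keyed by VN so that identical tenant addresses in
   different virtual networks are never confused.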

6.  NVE-NVE Interaction

   Individual NVEs will interact with each other for the purposes of
   tunneling and delivering traffic to remote TSs.  At a minimum, a
   control protocol may be needed for tunnel setup and maintenance.  For
   example, tunneled traffic may need to be encrypted or integrity
   protected, in which case it will be necessary to set up appropriate
   security associations between NVE peers.  It may also be desirable to
   perform tunnel maintenance (e.g., continuity checks) on a tunnel in
   order to detect when a remote NVE becomes unreachable.  Such generic
   tunnel setup and maintenance functions are not generally
   NVO3-specific.  Hence, the NVO3 architecture expects to leverage
   existing tunnel maintenance protocols rather than defining new ones.

   Some NVE-NVE interactions may be specific to NVO3 (in particular, be
   related to information kept in mapping tables) and agnostic to the
   specific tunnel type being used.  For example, when tunneling traffic
   for TS-X to a remote NVE, it is possible that TS-X is not presently
   associated with the remote NVE.  Normally, this should not happen,
   but there could be race conditions where the information an NVE has
   learned from the NVA is out of date relative to actual conditions.
   In such cases, the remote NVE could return an error or warning
   indication, allowing the sending NVE to attempt a recovery or
   otherwise attempt to mitigate the situation.

   The NVE-NVE interaction could signal a range of indications, for
   example:

   o  "No such TS here", upon a receipt of a tunneled packet for an
      unknown TS

   o  "TS-X not here, try the following NVE instead" (i.e., a redirect)

   o  "Delivered to correct NVE but could not deliver packet to TS-X"

   When an NVE receives information from a remote NVE that conflicts
   with the information it has in its own mapping tables, it should
   consult with the NVA to resolve those conflicts.  In particular, it
   should confirm that the information it has is up to date, and it
   might indicate the error to the NVA so as to nudge the NVA into
   following up (as appropriate).  While it might make sense for an NVE
   to update its mapping table temporarily in response to an error from
   a remote NVE, any changes must be handled carefully as doing so can
   raise security considerations if the received information cannot be
   authenticated.  That said, a sending NVE might still take steps to
   mitigate a problem, such as applying rate limiting to data traffic
   towards a particular NVE or TS.
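
   One cautious way for a sending NVE to react to such indications is
   sketched below.  The Indication names and the query_nva callback are
   hypothetical, since no NVE-NVE signaling is defined here.  The
   sketch never installs an address taken from the remote NVE; it
   refreshes the mapping from the NVA instead.

```python
from enum import Enum, auto
from typing import Callable, Dict, Optional


class Indication(Enum):
    NO_SUCH_TS = auto()       # "No such TS here"
    REDIRECT = auto()         # "TS-X not here, try the following NVE"
    DELIVERY_FAILED = auto()  # reached the correct NVE, TS not reachable


def on_indication(ts: str, indication: Indication,
                  table: Dict[str, str],
                  query_nva: Callable[[str], Optional[str]]) -> None:
    """Update the local mapping for ts using only NVA-supplied data."""
    if indication in (Indication.NO_SUCH_TS, Indication.REDIRECT):
        fresh = query_nva(ts)
        if fresh is None:
            table.pop(ts, None)  # the NVA no longer knows the TS
        else:
            table[ts] = fresh
    # DELIVERY_FAILED: the mapping points at the right NVE, so it is kept.
```

   Ignoring the redirect target itself is a deliberate choice in this
   sketch, reflecting the concern about unauthenticated information.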

7.  Network Virtualization Authority (NVA)

   Before sending traffic to and receiving traffic from a virtual
   network, an NVE must obtain the information needed to build its
   internal forwarding tables and state as listed in Section 4.3.  An
   NVE can obtain such information from a Network Virtualization
   Authority (NVA).

   The NVA is the entity that is expected to provide address mapping and
   other information to NVEs.  NVEs can interact with an NVA to obtain
   any required information they need in order to properly forward
   traffic on behalf of tenants.  The term "NVA" refers to the overall
   system, without regard to its scope or how it is implemented.

7.1.  How an NVA Obtains Information

   There are two primary ways in which an NVA can obtain the address
   dissemination information it manages: from the VM orchestration
   system and/or directly from the NVEs themselves.

   On virtualized systems, the NVA may be able to obtain the address-
   mapping information associated with VMs from the VM orchestration
   system itself.  If the VM orchestration system contains a master
   database for all the virtualization information, having the NVA
   obtain information directly from the orchestration system would be a
   natural approach.  Indeed, the NVA could effectively be co-located
   with the VM orchestration system itself.  In such systems, the VM
   orchestration system communicates with the NVE indirectly through the
   hypervisor.

   However, as described in Section 4, not all NVEs are associated with
   hypervisors.  In such cases, NVAs cannot leverage VM orchestration
   protocols to interact with an NVE and will instead need to peer
   directly with them.  By peering directly with an NVE, NVAs can obtain
   information about the TSs connected to that NVE and can distribute
   information to the NVE about the VNs those TSs are associated with.
   For example, whenever a Tenant System attaches to an NVE, that NVE
   would notify the NVA that the TS is now associated with that NVE.
   Likewise, when a TS detaches from an NVE, that NVE would inform the
   NVA.  By communicating directly with NVEs, both the NVA and the NVE
   are able to maintain up-to-date information about all active tenants
   and the NVEs to which they are attached.
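
   The attach and detach notifications can be sketched as updates to a
   location table kept by the NVA.  The class NvaRegistry and its
   method names are assumptions for illustration.  The handling of a
   late detach from a previous NVE is likewise a choice made for this
   sketch and is not taken from the architecture.

```python
from typing import Dict, Optional, Tuple


class NvaRegistry:
    """Tracks which NVE each TS is currently attached to."""

    def __init__(self) -> None:
        # (VN identifier, TS name) -> underlay address of the NVE
        self._location: Dict[Tuple[int, str], str] = {}

    def ts_attached(self, vn_id: int, ts: str, nve: str) -> None:
        self._location[(vn_id, ts)] = nve

    def ts_detached(self, vn_id: int, ts: str, nve: str) -> None:
        # Remove the entry only if it still points at the reporting NVE,
        # so a late detach cannot erase a newer attachment.
        if self._location.get((vn_id, ts)) == nve:
            del self._location[(vn_id, ts)]

    def locate(self, vn_id: int, ts: str) -> Optional[str]:
        return self._location.get((vn_id, ts))
```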

7.2.  Internal NVA Architecture

   For reliability and fault tolerance reasons, an NVA would be
   implemented in a distributed or replicated manner without single
   points of failure.  How the NVA is implemented, however, is not
   important to an NVE so long as the NVA provides a consistent and
   well-defined interface to the NVE.  For example, an NVA could be
   implemented via database techniques whereby a server stores address-
   mapping information in a traditional (possibly replicated) database.
   Alternatively, an NVA could be implemented in a distributed fashion
   using an existing (or modified) routing protocol to maintain and
   distribute mappings.  So long as there is a clear interface between
   the NVE and NVA, how an NVA is architected and implemented is not
   important to an NVE.

   A number of architectural approaches could be used to implement NVAs
   themselves.  NVAs manage address bindings and distribute them to
   where they need to go.  One approach would be to use the Border
   Gateway Protocol (BGP) [RFC4364] (possibly with extensions) and route
   reflectors.  Another approach could use a transaction-based database
   model with replicated servers.  Because the implementation details
   are local to an NVA, there is no need to pick exactly one solution
   technology, so long as the external interfaces to the NVEs (and
   remote NVAs) are sufficiently well defined to achieve
   interoperability.

7.3.  NVA External Interface

   Conceptually, from the perspective of an NVE, an NVA is a single
   entity.  An NVE interacts with the NVA, and it is the NVA's
   responsibility to ensure that interactions between the NVE and NVA
   result in consistent behavior across the NVA and all other NVEs using
   the same NVA.  Because an NVA is built from multiple internal
   components, an NVA will have to ensure that information flows to all
   internal NVA components appropriately.

   One architectural question is how the NVA presents itself to the NVE.
   For example, an NVA could be required to provide access via a single
   IP address.  If NVEs only have one IP address to interact with, it
   would be the responsibility of the NVA to handle NVA component
   failures, e.g., by using a "floating IP address" that migrates among
   NVA components to ensure that the NVA can always be reached via the
   one address.  Having all NVA accesses through a single IP address,
   however, adds constraints to implementing robust failover, load
   balancing, etc.

   In the NVO3 architecture, an NVA is accessed through one or more IP
   addresses (or an IP address/port combination).  If multiple IP
   addresses are used, each IP address provides equivalent
   functionality, meaning that an NVE can use any of the provided
   addresses to interact with the NVA.  Should one address stop working,
   an NVE is expected to fail over to another.  While the different
   addresses result in equivalent functionality, one address may respond
   more quickly than another, e.g., due to network conditions, load on
   the server, etc.

   To provide some control over load balancing, NVA addresses may have
   an associated priority.  Addresses are used in order of priority,
   with no explicit preference among NVA addresses having the same
   priority.  To provide basic load balancing among NVAs of equal
   priorities, NVEs could use some randomization input to select among
   equal-priority NVAs.  Such a priority scheme facilitates failover and
   load balancing, for example, by allowing a network operator to
   specify a set of primary and backup NVAs.
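
   The selection rule can be sketched as follows.  The text does not
   say whether a numerically lower or higher value is preferred; this
   sketch assumes that a lower number means higher priority.  The
   function names and the caller-maintained set of unreachable
   addresses are also assumptions.

```python
import random
from typing import List, Optional, Set, Tuple


def order_nva_addresses(addresses: List[Tuple[str, int]],
                        rng: Optional[random.Random] = None) -> List[str]:
    """Order (address, priority) pairs by priority, breaking ties
    randomly.  Assumption: a lower number is a higher priority."""
    rng = rng or random.Random()
    keyed = [(priority, rng.random(), addr) for addr, priority in addresses]
    keyed.sort()
    return [addr for _, _, addr in keyed]


def pick_nva(addresses: List[Tuple[str, int]], unreachable: Set[str],
             rng: Optional[random.Random] = None) -> Optional[str]:
    """Return the most preferred address not currently marked
    unreachable, or None if every address has failed."""
    for addr in order_nva_addresses(addresses, rng):
        if addr not in unreachable:
            return addr
    return None
```

   Random tie-breaking spreads NVEs across equal-priority addresses,
   and skipping unreachable entries provides the failover to backups.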

   It may be desirable to have individual NVA addresses responsible for
   a subset of information about an NV Domain.  In such a case, NVEs
   would use different NVA addresses for obtaining or updating
   information about particular VNs or TS bindings.  Key questions with
   such an approach are how information would be partitioned and how an
   NVE could determine which address to use to get the information it
   needs.

   Another possibility is to treat the information on which NVA
   addresses to use as cached (soft-state) information at the NVEs, so
   that any NVA address can be used to obtain any information, but NVEs
   are informed of preferences for which addresses to use for particular
   information on VNs or TS bindings.  That preference information would
   be cached for future use to improve behavior, e.g., if all requests
   for a specific subset of VNs are forwarded to a specific NVA
   component, the NVE can optimize future requests within that subset by
   sending them directly to that NVA component via its address.

8.  NVE-NVA Protocol

   As outlined in Section 4.3, an NVE needs certain information in order
   to perform its functions.  To obtain such information from an NVA, an
   NVE-NVA protocol is needed.  The NVE-NVA protocol provides two
   functions.  First, it allows an NVE to obtain information about the
   location and status of other TSs with which it needs to communicate.
   Second, the NVE-NVA protocol provides a way for NVEs to provide
   updates to the NVA about the TSs attached to that NVE (e.g., when a
   TS attaches or detaches from the NVE) or about communication errors
   encountered when sending traffic to remote NVEs.  For example, an NVE
   could indicate that a destination it is trying to reach at a
   destination NVE is unreachable for some reason.

   While having a direct NVE-NVA protocol might seem straightforward,
   the presence of existing VM orchestration systems complicates the
   choices an NVE has for interacting with the NVA.

8.1.  NVE-NVA Interaction Models

   An NVE interacts with an NVA in at least two (quite different) ways:

   o  NVEs embedded within the same server as the hypervisor can obtain
      necessary information entirely through the hypervisor-facing side
      of the NVE.  Such an approach is a natural extension to existing
      VM orchestration systems supporting server virtualization because
      a protocol between the hypervisor and VM orchestration system
      already exists and can be leveraged to obtain any needed
      information.  Specifically, VM orchestration systems used to
      create, terminate, and migrate VMs already use well-defined
      (though typically proprietary) protocols to handle the
      interactions between the hypervisor and VM orchestration system.
      For such systems, it is a natural extension to leverage the
      existing orchestration protocol as a sort of proxy protocol for
      handling the interactions between an NVE and the NVA.  Indeed,
      existing implementations can already do this.

   o  Alternatively, an NVE can obtain needed information by interacting
      directly with an NVA via a protocol operating over the data-center
      underlay network.  Such an approach is needed to support NVEs that
      are not associated with systems performing server virtualization
      (e.g., as in the case of a standalone gateway) or where the NVE
      needs to communicate directly with the NVA for other reasons.

   The NVO3 architecture will focus on support for the second model
   above.  Existing virtualization environments are already using the
   first model, but that model is not sufficient to cover the case of
   standalone gateways -- such gateways may not support virtualization
   and do not interface with existing VM orchestration systems.

8.2.  Direct NVE-NVA Protocol

   An NVE can interact directly with an NVA via an NVE-NVA protocol.
   Such a protocol can be either independent of the NVA internal
   protocol or an extension of it.  Using a purpose-specific protocol
   would provide architectural separation and independence between the
   NVE and NVA.  The NVE and NVA interact in a well-defined way, and
   changes in the NVA (or NVE) do not need to impact each other.  Using
   a dedicated protocol also ensures that both NVE and NVA
   implementations can evolve independently and without dependencies on
   each other.  Such independence is important because the upgrade path
   for NVEs and NVAs is quite different.  Upgrading all the NVEs at a
   site will likely be more difficult in practice than upgrading NVAs
   because of their large number -- one on each end device.  In
   practice, it would be prudent to assume that once an NVE has been
   implemented and deployed, it may be challenging to get subsequent NVE
   extensions and changes implemented and deployed, whereas an NVA (and
   its associated internal protocols) is more likely to evolve over time
   as experience is gained from usage and upgrades will involve fewer
   nodes.

   Requirements for a direct NVE-NVA protocol can be found in [NVE-NVA].

8.3.  Propagating Information Between NVEs and NVAs

   Information flows between NVEs and NVAs in both directions.  The NVA
   maintains information about all VNs in the NV Domain so that NVEs do
   not need to do so themselves.  NVEs obtain information from the NVA
   about where a given remote TS destination resides.  NVAs, in turn,
   obtain information from NVEs about the individual TSs attached to
   those NVEs.

   While the NVA could push information relevant to every virtual
   network to every NVE, such an approach scales poorly and is
   unnecessary.  In practice, a given NVE will only need and want to
   know about VNs to which it is attached.  Thus, an NVE should be able
   to subscribe to updates for only those virtual networks in which it
   is interested.  The NVO3 architecture supports
   a model where an NVE is not required to have full mapping tables for
   all virtual networks in an NV Domain.
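
   The subscription model can be sketched as a per-VN publish and
   subscribe table on the NVA side.  The class NvaPublisher, the
   callback interface, and the update tuple layout are hypothetical;
   no wire protocol is implied.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# (VN identifier, TS name, underlay address of the NVE)
Update = Tuple[int, str, str]


class NvaPublisher:
    """Delivers mapping updates only to NVEs subscribed to the VN."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[
            int, List[Callable[[Update], None]]] = defaultdict(list)

    def subscribe(self, vn_id: int,
                  callback: Callable[[Update], None]) -> None:
        self._subscribers[vn_id].append(callback)

    def publish(self, update: Update) -> None:
        vn_id = update[0]
        for callback in self._subscribers.get(vn_id, []):
            callback(update)
```

   An NVE that never subscribes to a VN receives no state for it, so
   its mapping tables grow only with the VNs it actually serves.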

   Before sending unicast traffic to a remote TS (or TSs for broadcast
   or multicast traffic), an NVE must know where the remote TS(s)
   currently reside.  When a TS attaches to a virtual network, the NVE
   obtains information about that VN from the NVA.  The NVA can provide
   that information to the NVE at the time the TS attaches to the VN,
   either because the NVE requests the information when the attach
   operation occurs or because the VM orchestration system has initiated
   the attach operation and provides associated mapping information to
   the NVE at the same time.

   There are scenarios where an NVE may wish to query the NVA about
   individual mappings within a VN.  For example, when sending traffic
   to a remote TS on a remote NVE, that TS may become unavailable (e.g.,
   because it has migrated elsewhere or has been shut down, in which
   case the remote NVE may return an error indication).  In such
   situations, the NVE may need to query the NVA to obtain updated
   mapping information for a specific TS or to verify that the
   information is still correct despite the error condition.  Note that
   such a query could also be used by the NVA as an indication that
   there may be an inconsistency in the network and that it should take
   steps to verify that the information it has about the current state
   and location of a specific TS is still correct.

   For very large virtual networks, the amount of state an NVE needs to
   maintain for a given virtual network could be significant.  Moreover,
   an NVE may only be communicating with a small subset of the TSs on
   such a virtual network.  In such cases, the NVE may find it desirable
   to maintain state only for those destinations it is actively
   communicating with.  In such scenarios, an NVE may not want to
   maintain full mapping information about all destinations on a VN.
   However, if it needs to communicate with a destination for which it
   does not have mapping information, it will need to be able to query
   the NVA on demand for the missing information on a per-destination
   basis.
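
   On-demand resolution can be sketched as a cache that queries the
   NVA only on a miss.  The class MappingCache and the query_nva
   callback signature are assumptions for illustration; entry aging
   and error handling are omitted.

```python
from typing import Callable, Dict, Optional, Tuple


class MappingCache:
    """Holds mappings only for destinations the NVE has looked up."""

    def __init__(self,
                 query_nva: Callable[[int, str], Optional[str]]) -> None:
        self._query_nva = query_nva
        self._entries: Dict[Tuple[int, str], str] = {}

    def resolve(self, vn_id: int, ts: str) -> Optional[str]:
        """Return the NVE address for ts, asking the NVA on a miss."""
        key = (vn_id, ts)
        if key not in self._entries:
            answer = self._query_nva(vn_id, ts)
            if answer is None:
                return None  # unknown destinations are not cached
            self._entries[key] = answer
        return self._entries[key]

    def invalidate(self, vn_id: int, ts: str) -> None:
        """Drop an entry, e.g., after an error from the remote NVE."""
        self._entries.pop((vn_id, ts), None)
```

   Calling invalidate() before the next resolve() forces a fresh query,
   which corresponds to re-verifying a mapping after an error.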

   The NVO3 architecture will need to support a range of operations
   between the NVE and NVA.  Requirements for those operations can be
   found in [NVE-NVA].

9.  Federated NVAs

   An NVA provides service to the set of NVEs in its NV Domain.  Each
   NVA manages network virtualization information for the virtual
   networks within its NV Domain.  An NV Domain is administered by a
   single entity.

   In some cases, it will be necessary to expand the scope of a specific
   VN or even an entire NV Domain beyond a single NVA.  For example, an
   administrator managing multiple data centers may wish to operate all
   of its data centers as a single NV Region.  Such cases are handled by
   having different NVAs peer with each other to exchange mapping
   information about specific VNs.  NVAs operate in a federated manner
   with a set of NVAs operating as a loosely coupled federation of
   individual NVAs.  If a virtual network spans multiple NVAs (e.g.,
   located at different data centers), and an NVE needs to deliver
   tenant traffic to an NVE that is part of a different NV Domain, it
   still interacts only with its NVA, even when obtaining mappings for
   NVEs associated with a different NV Domain.

   Figure 3 shows a scenario where two separate NV Domains (A and B)
   share information about a VN.  VM1 and VM2 both connect to the same
   VN, even though the two VMs are in separate NV Domains.  There are
   two cases to consider.  In the first case, NV Domain B does not allow
   NVE-A to tunnel traffic directly to NVE-B.  There could be a number
   of reasons for this.  For example, NV Domains A and B may not share a
   common address space (i.e., traversal through a NAT device is
   required), or for policy reasons, a domain might require that all
   traffic between separate NV Domains be funneled through a particular
   device (e.g., a firewall).  In such cases, NVA-2 will advertise to
   NVA-1 that VM2 on the VN is available and direct that traffic between
   the two nodes be forwarded via IP-G (an IP Gateway).  IP-G would then
   decapsulate received traffic from one NV Domain, translate it
   appropriately for the other domain, and re-encapsulate the packet for
   delivery.

                    xxxxxx                          xxxx        +-----+
   +-----+     xxxxxx    xxxxxx               xxxxxx    xxxxx   | VM2 |
   | VM1 |    xx              xx            xxx             xx  |-----|
   |-----|   xx                x          xx                 x  |NVE-B|
   |NVE-A|   x                 x  +----+  x                   x +-----+
   +--+--+   x   NV Domain A   x  |IP-G|--x                    x    |
      +-------x               xx--+    | x                     xx   |
              x              x    +----+ x     NV Domain B      x   |
           +---x           xx            xx                     x---+
           |    xxxx      xx           +->xx                   xx
           |       xxxxxxxx            |   xx                 xx
       +---+-+                         |     xx              xx
       |NVA-1|                      +--+--+    xx         xxx
       +-----+                      |NVA-2|     xxxx   xxxx
                                    +-----+        xxxxx

               Figure 3: VM1 and VM2 in Different NV Domains
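
   The first case can be sketched from the point of view of the local
   NVA, which is the only NVA that a local NVE consults.  The types
   Advertisement and LocalNva, and the optional gateway field, are
   hypothetical; the content of inter-NVA advertisements is not
   specified here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Advertisement:
    ts: str                        # TS in the remote NV Domain
    remote_nve: str                # NVE the TS is attached to
    gateway: Optional[str] = None  # set when direct tunneling is barred


class LocalNva:
    """Answers local NVEs using mappings imported from a peer NVA."""

    def __init__(self) -> None:
        self._remote: Dict[str, Advertisement] = {}

    def import_advertisement(self, adv: Advertisement) -> None:
        self._remote[adv.ts] = adv

    def tunnel_endpoint(self, ts: str) -> Optional[str]:
        """Return where a local NVE should tunnel traffic for ts."""
        adv = self._remote.get(ts)
        if adv is None:
            return None
        return adv.gateway if adv.gateway is not None else adv.remote_nve
```

   With a gateway present, the local NVE tunnels to the gateway (IP-G
   in Figure 3) and never needs to know the remote NVE's address.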

   NVAs at one site share information and interact with NVAs at other
   sites, but only in a controlled manner.  It is expected that policy
   and access control will be applied at the boundaries between
   different sites (and NVAs) so as to minimize dependencies on external
   NVAs that could negatively impact the operation within a site.  It is
   an architectural principle that operations involving NVAs at one site
   not be immediately impacted by failures or errors at another site.
   (Of course, communication between NVEs in different NV Domains may be
   impacted by such failures or errors.)  It is a strong requirement
   that an NVA continue to operate properly for local NVEs even if
   external communication is interrupted (e.g., should communication
   between a local and remote NVA fail).

   At a high level, a federation of interconnected NVAs has some
   analogies to BGP and Autonomous Systems.  Like an Autonomous System,
   NVAs at one site are managed by a single administrative entity and do
   not interact with external NVAs except as allowed by policy.
   Likewise, the interface between NVAs at different sites is well
   defined so that the internal details of operations at one site are
   largely hidden to other sites.  Finally, an NVA only peers with other
   NVAs that it has a trusted relationship with, i.e., where a VN is
   intended to span multiple NVAs.

   Reasons for using a federated model include:

   o  Provide isolation among NVAs operating at different sites at
      different geographic locations.

   o  Control the quantity and rate of information updates that flow
      (and must be processed) between different NVAs in different data
      centers.

   o  Control the set of external NVAs (and external sites) a site peers
      with.  A site will only peer with other sites that are cooperating
      in providing an overlay service.

   o  Allow policy to be applied between sites.  A site will want to
      carefully control what information it exports (and to whom) as
      well as what information it is willing to import (and from whom).

   o  Allow different protocols and architectures to be used for intra-
      NVA vs. inter-NVA communication.  For example, within a single
      data center, a replicated transaction server using database
      techniques might be an attractive implementation option for an
      NVA, and protocols optimized for intra-NVA communication would
      likely be different from protocols involving inter-NVA
      communication between different sites.

   o  Allow for optimized protocols rather than using a one-size-fits-
      all approach.  Within a data center, networks tend to have lower
      latency, higher speed, and higher redundancy when compared with
      WAN links interconnecting data centers.  The design constraints
      and trade-offs for a protocol operating within a data-center
      network are different from those operating over WAN links.  While
      a single protocol could be used for both cases, there could be
      advantages to using different and more specialized protocols for
      the intra- and inter-NVA case.
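
   As a minimal sketch of the federated model described above, the
   following Python fragment shows one way a site's NVA might apply
   export policy toward its peers while continuing to serve local
   lookups when a peer becomes unreachable.  The class, method, and
   field names (FederatedNva, export_policy, peer_lost, and so on) are
   hypothetical and are not defined by this architecture; dropping
   imported state on peer loss is an assumption, not a requirement.

```python
from dataclasses import dataclass, field


@dataclass
class FederatedNva:
    """Hypothetical model of one site's NVA; not an NVO3-defined structure."""
    site: str
    # Local mappings: (vn_id, inner_addr) -> outer NVE address.
    local: dict = field(default_factory=dict)
    # Mappings learned from peers, keyed by peer site name.
    imported: dict = field(default_factory=dict)
    # Export policy: peer site -> set of VN IDs that may be shared with it.
    export_policy: dict = field(default_factory=dict)

    def export_to(self, peer: str) -> dict:
        """Return only the local mappings that policy allows this peer to see."""
        allowed = self.export_policy.get(peer, set())
        return {k: v for k, v in self.local.items() if k[0] in allowed}

    def import_from(self, peer: "FederatedNva") -> None:
        """Store the mappings a trusted peer is willing to export to this site."""
        self.imported[peer.site] = peer.export_to(self.site)

    def peer_lost(self, peer_site: str) -> None:
        """Assumed behavior: forget a failed peer's state; local state is untouched."""
        self.imported.pop(peer_site, None)

    def lookup(self, vn_id: int, inner_addr: str):
        """Resolve an inner address, preferring locally held information."""
        key = (vn_id, inner_addr)
        if key in self.local:
            return self.local[key]
        for mappings in self.imported.values():
            if key in mappings:
                return mappings[key]
        return None


# Site A shares VN 100 with site B but keeps VN 200 private.
site_a = FederatedNva("site-a", export_policy={"site-b": {100}})
site_a.local[(100, "10.0.0.1")] = "192.0.2.1"
site_a.local[(200, "10.0.0.2")] = "192.0.2.2"
site_b = FederatedNva("site-b")
site_b.import_from(site_a)
```

   In this sketch, site B learns only the VN 100 mapping, and a failure
   of site B has no effect on lookups that site A answers from its own
   local state.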

9.1.  Inter-NVA Peering

   To support peering between different NVAs, an inter-NVA protocol is
   needed.  The inter-NVA protocol defines what information is exchanged
   between NVAs.  It is assumed that the protocol will be used to share
   addressing information between data centers and that it must scale
   well over WAN links.

10.  Control Protocol Work Areas

   The NVO3 architecture consists of two major distinct entities: NVEs
   and NVAs.  In order to provide isolation and independence between
   these two entities, the NVO3 architecture calls for well-defined
   protocols for interfacing between them.  For an individual NVA, the
   architecture calls for a logically centralized entity that could be
   implemented in a distributed or replicated fashion.  While the IETF
   may choose to define one or more specific architectural approaches to
   building individual NVAs, there is little need to pick exactly one
   approach to the exclusion of others.  An NVA for a single domain will
   likely be deployed as a single vendor product; thus, there is little
   benefit in standardizing the internal structure of an NVA.

   Individual NVAs peer with each other in a federated manner.  The NVO3
   architecture calls for a well-defined interface between NVAs.

   Finally, a hypervisor-NVE protocol is needed to cover the split-NVE
   scenario described in Section 4.2.

11.  NVO3 Data-Plane Encapsulation

   When tunneling tenant traffic, NVEs add an encapsulation header to
   the original tenant packet.  The exact encapsulation to use for NVO3
   does not seem to be critical.  The main requirement is that the
   encapsulation support a Context ID of sufficient size.  A number of
   encapsulations already exist that provide a VN Context of sufficient
   size for NVO3.  For example, Virtual eXtensible Local Area Network
   (VXLAN) [RFC7348] has a 24-bit VXLAN Network Identifier (VNI).
   Network Virtualization using Generic Routing Encapsulation (NVGRE)
   [RFC7637] has a 24-bit Tenant Network ID (TNI).  MPLS-over-GRE
   provides a 20-bit label field.  While there is widespread recognition
   that a 12-bit VN Context would be too small (only 4096 distinct
   values), it is generally agreed that 20 bits (1 million distinct
   values) and 24 bits (16.8 million distinct values) are sufficient for
   a wide variety of deployment scenarios.
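
   The sizes quoted above follow directly from the field widths:
   2^12 = 4096, 2^20 = 1,048,576, and 2^24 = 16,777,216.  As an
   illustration of a 24-bit VN Context on the wire, the following
   Python sketch packs and parses the 8-octet VXLAN header laid out in
   RFC 7348 (8 flag bits with the I flag set, 24 reserved bits, the
   24-bit VNI, and 8 reserved bits).  The function names are
   hypothetical, and the sketch omits the outer Ethernet, IP, and UDP
   headers.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value
VNI_BITS = 24


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-octet VXLAN header for the given VNI."""
    if not 0 <= vni < (1 << VNI_BITS):
        raise ValueError("VNI does not fit in 24 bits")
    word1 = VXLAN_FLAG_VNI_VALID << 24  # flags (8 bits) + reserved (24 bits)
    word2 = vni << 8                    # VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", word1, word2)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 octets of a VXLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return word2 >> 8


def context_space(bits: int) -> int:
    """Number of distinct VN Context values available in a field of this width."""
    return 1 << bits
```

   A 12-bit field such as an IEEE 802.1Q VLAN ID would be rejected by
   the same range check if it were asked to carry more than 4096
   contexts, which is the limitation the text above refers to.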

12.  Operations, Administration, and Maintenance (OAM)

   The simplicity of operating and debugging overlay networks will be
   critical for successful deployment.

   Overlay networks are based on tunnels between NVEs, so the
   Operations, Administration, and Maintenance (OAM) [RFC6291] framework
   for overlay networks can draw from prior IETF OAM work for tunnel-
   based networks, specifically L2VPN OAM [RFC6136].  RFC 6136 focuses
   on Fault Management and Performance Management as fundamental to
   L2VPN service delivery, leaving the Configuration Management,
   Accounting Management, and Security Management components of the Open
   Systems Interconnection (OSI) Fault, Configuration, Accounting,
   Performance, and Security (FCAPS) taxonomy [M.3400] for further
   study.  This section does likewise for NVO3 OAM, but those three
   areas continue to be important parts of complete OAM functionality
   for NVO3.

   The relationship between the overlay and underlay networks is a
   consideration for fault and performance management -- a fault in the
   underlay may manifest as fault and/or performance issues in the
   overlay.  Diagnosing and fixing such issues are complicated by NVO3
   abstracting the underlay network away from the overlay network (e.g.,
   intermediate nodes on the underlay network path between NVEs are
   hidden from overlay VNs).

   NVO3-specific OAM techniques, protocol constructs, and tools are
   needed to provide visibility beyond this abstraction to diagnose and
   correct problems that appear in the overlay.  Two examples are
   underlay-aware traceroute [TRACEROUTE-VXLAN] and ping protocol
   constructs for overlay networks [VXLAN-FAILURE] [NVO3-OVERLAY].

   NVO3-specific tools and techniques are best viewed as complements to
   (i.e., not as replacements for) single-network tools that apply to
   the overlay and/or underlay networks.  Coordination among the
   individual network tools (for the overlay and underlay networks) and
   NVO3-aware, dual-network tools is required to achieve effective
   monitoring and fault diagnosis.  For example, the defect detection
   intervals and performance measurement intervals ought to be
   coordinated among all tools involved in order to provide consistency
   and comparability of results.
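
   One possible reading of "coordinated" intervals, offered only as an
   assumption, is that every tool's measurement interval should divide
   evenly into a shared reporting window so that results can be
   compared over the same period.  The Python sketch below computes
   such a window; the tool names and interval values are hypothetical,
   and math.lcm requires Python 3.9 or later.

```python
import math


def common_window_ms(intervals_ms: dict) -> int:
    """Smallest window that is a whole multiple of every tool's interval."""
    if not intervals_ms or any(i <= 0 for i in intervals_ms.values()):
        raise ValueError("at least one positive interval is required")
    return math.lcm(*intervals_ms.values())


def samples_per_window(intervals_ms: dict) -> dict:
    """How many measurements each tool contributes to one shared window."""
    window = common_window_ms(intervals_ms)
    return {tool: window // i for tool, i in intervals_ms.items()}


# Hypothetical tools: one underlay-only, one overlay-only, one NVO3-aware.
intervals = {"underlay-probe": 300, "overlay-ping": 1000, "nvo3-trace": 1500}
window = common_window_ms(intervals)
counts = samples_per_window(intervals)
```

   With these illustrative values, all three tools can report over a
   common 3000 ms window without partial samples.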

   For further discussion of NVO3 OAM requirements, see [NVO3-OAM].

13.  Summary

   This document presents the overall architecture for NVO3.  The
   architecture calls for three main areas of protocol work:

   1.  A hypervisor-NVE protocol to support split-NVEs as discussed in
       Section 4.2

   2.  An NVE-NVA protocol for disseminating VN information (e.g., inner
       to outer address mappings)

   3.  An NVA-NVA protocol for exchange of information about specific
       virtual networks between federated NVAs

   It should be noted that existing protocols or extensions of existing
   protocols are applicable.
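
   To make the second item concrete, the following Python sketch
   models an NVE that asks an NVA for an inner-to-outer address mapping
   on a cache miss.  It illustrates the kind of information an NVE-NVA
   protocol would carry, not any specific protocol; the class and
   method names (Nva, Nve, resolve, outer_address_for) are hypothetical.

```python
class Nva:
    """Hypothetical stand-in for an NVA's mapping database."""

    def __init__(self):
        # (vn_context, inner_addr) -> outer (underlay) address of the egress NVE
        self._mappings = {}

    def register(self, vn_context, inner_addr, outer_addr):
        self._mappings[(vn_context, inner_addr)] = outer_addr

    def resolve(self, vn_context, inner_addr):
        return self._mappings.get((vn_context, inner_addr))


class Nve:
    """Hypothetical NVE that caches mappings obtained from its NVA."""

    def __init__(self, nva):
        self._nva = nva
        self._cache = {}
        self.queries = 0  # number of requests actually sent to the NVA

    def outer_address_for(self, vn_context, inner_addr):
        key = (vn_context, inner_addr)
        if key not in self._cache:
            self.queries += 1
            outer = self._nva.resolve(vn_context, inner_addr)
            if outer is None:
                return None  # unknown destination; handling is left to policy
            self._cache[key] = outer
        return self._cache[key]


nva = Nva()
# The same inner address may appear in two VNs and map to different NVEs.
nva.register(100, "10.0.0.5", "192.0.2.10")
nva.register(200, "10.0.0.5", "192.0.2.20")
nve = Nve(nva)
```

   Keying the table on the VN Context as well as the inner address is
   what allows tenants to use overlapping address spaces.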

14.  Security Considerations

   The data plane and control plane described in this architecture will
   need to address potential security threats.

   For the data plane, tunneled application traffic may need protection
   against being misdelivered, being modified, or having its content
   exposed to an inappropriate third party.  In all cases, encryption
   between authenticated tunnel endpoints (e.g., via use of IPsec
   [RFC4301]) and enforcing policies that control which endpoints and
   VNs are permitted to exchange traffic can be used to mitigate risks.

   For the control plane, a combination of authentication and encryption
   can be used between NVAs, between the NVA and NVE, as well as between
   different components of the split-NVE approach.  All entities will
   need to properly authenticate with each other and enable encryption
   for their interactions as appropriate to protect sensitive
   information.

   Leakage of sensitive information about users or other entities
   associated with VMs whose traffic is virtualized can also be
   mitigated by using encryption for the control-plane protocols and
   enforcing policies that control which NVO3 components are permitted
   to exchange control-plane traffic.

   Control-plane elements such as NVEs and NVAs need to collect
   performance and other data in order to carry out their functions.
   This data can sometimes be unexpectedly sensitive, for example,
   allowing non-obvious inferences of activity within a VM.  This
   provides a reason to minimize the data collected in some environments
   in order to limit potential exposure of sensitive information.  As
   noted briefly in RFC 6973 [RFC6973] and RFC 7258 [RFC7258], there is
   an inevitable tension between being privacy sensitive and taking into
   account network operations in NVO3 protocol development.

   See the NVO3 framework security considerations in RFC 7365 [RFC7365]
   for further discussion.

15.  Informative References

   [FRAMEWORK-MCAST]
              Ghanwani, A., Dunbar, L., McBride, M., Bannai, V., and R.
              Krishnan, "A Framework for Multicast in Network
              Virtualization Overlays", Work in Progress,
              draft-ietf-nvo3-mcast-framework-05, May 2016.

   [IEEE.802.1Q]
              IEEE, "IEEE Standard for Local and metropolitan area
              networks--Bridges and Bridged Networks", IEEE 802.1Q-2014,
              DOI 10.1109/ieeestd.2014.6991462,
              <http://ieeexplore.ieee.org/servlet/opac?punumber=6991460>.

   [M.3400]   ITU-T, "TMN management functions", ITU-T
              Recommendation M.3400, February 2000,
              <https://www.itu.int/rec/T-REC-M.3400-200002-I/>.

   [NVE-NVA]  Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network
              Virtualization NVE to NVA Control Protocol Requirements",
              Work in Progress, draft-ietf-nvo3-nve-nva-cp-req-05, March
              2016.

   [NVO3-OAM] Chen, H., Ed., Ashwood-Smith, P., Xia, L., Iyengar, R.,
              Tsou, T., Sajassi, A., Boucadair, M., Jacquenet, C.,
              Daikoku, M., Ghanwani, A., and R. Krishnan, "NVO3
              Operations, Administration, and Maintenance Requirements",
              Work in Progress, draft-ashwood-nvo3-oam-requirements-04,
              October 2015.

   [NVO3-OVERLAY]
              Kumar, N., Pignataro, C., Rao, D., and S. Aldrin,
              "Detecting NVO3 Overlay Data Plane failures", Work in
              Progress, draft-kumar-nvo3-overlay-ping-01, January 2014.

   [RFC826]   Plummer, D., "Ethernet Address Resolution Protocol: Or
              Converting Network Protocol Addresses to 48.bit Ethernet
              Address for Transmission on Ethernet Hardware", STD 37,
              RFC 826, DOI 10.17487/RFC0826, November 1982,
              <http://www.rfc-editor.org/info/rfc826>.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
              December 2005, <http://www.rfc-editor.org/info/rfc4301>.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006, <http://www.rfc-editor.org/info/rfc4364>.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              DOI 10.17487/RFC4861, September 2007,
              <http://www.rfc-editor.org/info/rfc4861>.

   [RFC6136]  Sajassi, A., Ed. and D. Mohan, Ed., "Layer 2 Virtual
              Private Network (L2VPN) Operations, Administration, and
              Maintenance (OAM) Requirements and Framework", RFC 6136,
              DOI 10.17487/RFC6136, March 2011,
              <http://www.rfc-editor.org/info/rfc6136>.

   [RFC6291]  Andersson, L., van Helvoort, H., Bonica, R., Romascanu,
              D., and S. Mansfield, "Guidelines for the Use of the "OAM"
              Acronym in the IETF", BCP 161, RFC 6291,
              DOI 10.17487/RFC6291, June 2011,
              <http://www.rfc-editor.org/info/rfc6291>.

   [RFC6973]  Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
              Morris, J., Hansen, M., and R. Smith, "Privacy
              Considerations for Internet Protocols", RFC 6973,
              DOI 10.17487/RFC6973, July 2013,
              <http://www.rfc-editor.org/info/rfc6973>.

   [RFC7258]  Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an
              Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May
              2014, <http://www.rfc-editor.org/info/rfc7258>.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
              <http://www.rfc-editor.org/info/rfc7348>.

   [RFC7364]  Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L.,
              Kreeger, L., and M. Napierala, "Problem Statement:
              Overlays for Network Virtualization", RFC 7364,
              DOI 10.17487/RFC7364, October 2014,
              <http://www.rfc-editor.org/info/rfc7364>.

   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "Framework for Data Center (DC) Network
              Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
              2014, <http://www.rfc-editor.org/info/rfc7365>.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, DOI 10.17487/RFC7637, September 2015,
              <http://www.rfc-editor.org/info/rfc7637>.

   [TRACEROUTE-VXLAN]
              Nordmark, E., Appanna, C., Lo, A., Boutros, S., and A.
              Dubey, "Layer-Transcending Traceroute for Overlay Networks
              like VXLAN", Work in Progress,
              draft-nordmark-nvo3-transcending-traceroute-03, July 2016.

   [USECASES]
              Yong, L., Dunbar, L., Toy, M., Isaac, A., and V. Manral,
              "Use Cases for Data Center Network Virtualization Overlay
              Networks", Work in Progress, draft-ietf-nvo3-use-case-15,
              December 2016.

   [VXLAN-FAILURE]
              Jain, P., Singh, K., Balus, F., Henderickx, W., and V.
              Bannai, "Detecting VXLAN Segment Failure", Work in
              Progress, draft-jain-nvo3-vxlan-ping-00, June 2013.

Acknowledgements

   Helpful comments and improvements to this document have come from
   Alia Atlas, Abdussalam Baryun, Spencer Dawkins, Linda Dunbar, Stephen
   Farrell, Anton Ivanov, Lizhong Jin, Suresh Krishnan, Mirja Kuehlwind,
   Greg Mirsky, Carlos Pignataro, Dennis (Xiaohong) Qin, Erik Smith,
   Takeshi Takahashi, Ziye Yang, and Lucy Yong.

Authors' Addresses

   David Black
   Dell EMC

   Email: david.black@dell.com

   Jon Hudson
   Independent

   Email: jon.hudson@gmail.com


   Lawrence Kreeger
   Independent

   Email: lkreeger@gmail.com


   Marc Lasserre
   Independent

   Email: mmlasserre@gmail.com


   Thomas Narten
   IBM

   Email: narten@us.ibm.com