Group: BESS

RFC 8584

Framework for Ethernet VPN Designated Forwarder Election Extensibility

Pages: 32
Proposed STD
Updates:  7432

Internet Engineering Task Force (IETF)                   J. Rabadan, Ed.
Request for Comments: 8584                                         Nokia
Updates: 7432                                            S. Mohanty, Ed.
Category: Standards Track                                     A. Sajassi
ISSN: 2070-1721                                                    Cisco
                                                                J. Drake
                                                                 Juniper
                                                              K. Nagaraj
                                                            S. Sathappan
                                                                   Nokia
                                                              April 2019


 Framework for Ethernet VPN Designated Forwarder Election Extensibility

Abstract

   An alternative to the default Designated Forwarder (DF) selection
   algorithm in Ethernet VPNs (EVPNs) is defined.  The DF is the
   Provider Edge (PE) router responsible for sending Broadcast, Unknown
   Unicast, and Multicast (BUM) traffic to a multihomed Customer Edge
   (CE) device on a given VLAN on a particular Ethernet Segment (ES).
   In addition, the ability to influence the DF election result for a
   VLAN based on the state of the associated Attachment Circuit (AC) is
   specified.  This document clarifies the DF election Finite State
   Machine in EVPN services.  Therefore, it updates the EVPN
   specification (RFC 7432).

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8584.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions and Terminology
      1.2. Default Designated Forwarder (DF) Election in EVPN
           Services
      1.3. Problem Statement
           1.3.1. Unfair Load Balancing and Service Disruption
           1.3.2. Traffic Black-Holing on Individual AC Failures
      1.4. The Need for Extending the Default DF Election in
           EVPN Services
   2. Designated Forwarder Election Protocol and BGP Extensions
      2.1. The DF Election Finite State Machine (FSM)
      2.2. The DF Election Extended Community
           2.2.1. Backward Compatibility
   3. The Highest Random Weight DF Election Algorithm
      3.1. HRW and Consistent Hashing
      3.2. HRW Algorithm for EVPN DF Election
   4. The AC-Influenced DF Election Capability
      4.1. AC-Influenced DF Election Capability for
           VLAN-Aware Bundle Services
   5. Solution Benefits
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   Acknowledgments
   Contributors
   Authors' Addresses

1.  Introduction

   The Designated Forwarder (DF) in Ethernet VPNs (EVPNs) is the
   Provider Edge (PE) router responsible for sending Broadcast, Unknown
   Unicast, and Multicast (BUM) traffic to a multihomed Customer Edge
   (CE) device on a given VLAN on a particular Ethernet Segment (ES).
   The DF is elected from the set of multihomed PEs attached to a given
   ES, each of which advertises an ES route for the ES as identified by
   its Ethernet Segment Identifier (ESI).  By default, the EVPN uses a
   DF election algorithm referred to as "service carving".  The DF
   election algorithm is based on a modulus function (V mod N) that
   takes the number of PEs in the ES (N) and the VLAN value (V) as
   input.  This document addresses inefficiencies in the default DF
   election algorithm by defining a new DF election algorithm and an
   ability to influence the DF election result for a VLAN, depending on
   the state of the associated Attachment Circuit (AC).  In order to
   avoid any ambiguity with the identifier used in the DF election
   algorithm, this document uses the term "Ethernet Tag" instead of
   "VLAN".  This document also creates a registry with IANA for future
   DF election algorithms and capabilities (see Section 7).  It also
   presents a formal definition and clarification of the DF election
   Finite State Machine (FSM).  Therefore, this document updates
   [RFC7432], and EVPN implementations MUST conform to the
   prescribed FSM.

   The procedures described in this document apply to DF election in all
   EVPN solutions, including those described in [RFC7432] and [RFC8214].
   Apart from the formal description of the FSM, this document does not
   intend to update other procedures described in [RFC7432]; it only
   aims to improve the behavior of the DF election on PEs that are
   upgraded to follow the procedures described in this document.

1.1.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   o  AC: Attachment Circuit.  An AC has an Ethernet Tag associated
      with it.

   o  ACS: Attachment Circuit Status.

   o  BUM: Broadcast, unknown unicast, and multicast.

   o  DF: Designated Forwarder.

   o  NDF: Non-Designated Forwarder.

   o  BDF: Backup Designated Forwarder.

   o  Ethernet A-D per ES route: Refers to Route Type 1 as defined in
      [RFC7432] or to Auto-discovery per Ethernet Segment route.

   o  Ethernet A-D per EVI route: Refers to Route Type 1 as defined in
      [RFC7432] or to Auto-discovery per EVPN Instance route.

   o  ES: Ethernet Segment.

   o  ESI: Ethernet Segment Identifier.

   o  EVI: EVPN Instance.

   o  MAC-VRF: A Virtual Routing and Forwarding table for Media Access
      Control (MAC) addresses on a PE.

   o  BD: Broadcast Domain.  An EVI may be comprised of one BD
      (VLAN-based or VLAN Bundle services) or multiple BDs (VLAN-aware
      Bundle services).

   o  Bridge table: An instantiation of a BD on a MAC-VRF.

   o  HRW: Highest Random Weight.

   o  VID: VLAN Identifier.

   o  CE-VID: Customer Edge VLAN Identifier.

   o  Ethernet Tag: Used to represent a BD that is configured on a given
      ES for the purpose of DF election.  Note that any of the following
      may be used to represent a BD: VIDs (including Q-in-Q tags),
      configured IDs, VNIs (Virtual Extensible Local Area Network
      (VXLAN) Network Identifiers), normalized VIDs, I-SIDs (Service
      Instance Identifiers), etc., as long as the representation of the
      BDs is configured consistently across the multihomed PEs attached
      to that ES.  The Ethernet Tag value MUST be different from zero.

   o  Ethernet Tag ID: Refers to the identifier used in the EVPN routes
      defined in [RFC7432].  Its value may be the same as the Ethernet
      Tag value (see the definition for Ethernet Tag) when advertising
      routes for VLAN-aware Bundle services.  Note that in the case of
      VLAN-based or VLAN Bundle services, the Ethernet Tag ID is zero.

   o  DF election procedure: Also called "DF election".  Refers to the
      process in its entirety, including the discovery of the PEs in the
      ES, the creation and maintenance of the PE candidate list, and the
      selection of a PE.

   o  DF algorithm: A component of the DF election procedure.  Strictly
      refers to the selection of a PE for a given <ES, Ethernet Tag>.

   o  RR: Route Reflector.  A network routing component for BGP
      [RFC4456].  It offers an alternative to the logical full-mesh
      requirement of the Internal Border Gateway Protocol (IBGP).  The
      purpose of the RR is concentration.  Multiple BGP routers can peer
      with a central point, the RR -- acting as a route reflector server
      -- rather than peer with every other router in a full mesh.  This
      results in an O(N) peering as opposed to O(N^2).

   o  TTL: Time To Live.

   This document also assumes that the reader is familiar with the
   terminology provided in [RFC7432].

1.2.  Default Designated Forwarder (DF) Election in EVPN Services

   [RFC7432] defines the DF as the EVPN PE responsible for:

   o  Flooding BUM traffic on a given Ethernet Tag on a particular ES to
      the CE.  This is valid for Single-Active and All-Active EVPN
      multihoming.

   o  Sending unicast traffic on a given Ethernet Tag on a particular ES
      to the CE.  This is valid for Single-Active multihoming.

   Figure 1 illustrates an example that we will use to explain the DF
   function.

                        +---------------+
                        |   IP/MPLS     |
                        |   Core        |
          +----+ ES1 +----+           +----+
          | CE1|-----|    |           |    |____ES2
          +----+     | PE1|           | PE2|    \
                     |    |           +----+     \+----+
                     +----+             |         | CE2|
                        |             +----+     /+----+
                        |             |    |____/   |
                        |             | PE3|    ES2 /
                        |             +----+       /
                        |               |         /
                        +-------------+----+     /
                                      | PE4|____/ES2
                                      |    |
                                      +----+

                        Figure 1: EVPN Multihoming

   Figure 1 illustrates a case where there are two ESes: ES1 and ES2.
   PE1 is attached to CE1 via ES1, whereas PE2, PE3, and PE4 are
   attached to CE2 via ES2, i.e., PE2, PE3, and PE4 form a redundancy
   group.  Since CE2 is multihomed to different PEs on the same ES, it
   is necessary for PE2, PE3, and PE4 to agree on a DF to satisfy the
   above-mentioned requirements.

   The effect of forwarding loops in a Layer 2 network is particularly
   severe because of the broadcast nature of Ethernet traffic and the
   lack of a TTL.  Therefore, it is very important that, in the case of
   a multihomed CE, only one of the PEs be used to send BUM traffic
   to it.

   One of the prerequisites for this support is that participating PEs
   must agree amongst themselves as to who would act as the DF.  This
   needs to be achieved through a distributed algorithm in which each
   participating PE independently and unambiguously selects one of the
   participating PEs as the DF, and the result should be consistent and
   unanimous.

   The default algorithm for DF election defined by [RFC7432] at the
   granularity of (ESI, EVI) is referred to as "service carving".  In
   this document, service carving and the default DF election algorithm
   are used interchangeably.  With service carving, it is possible to
   elect multiple DFs per ES (one per EVI) in order to perform load
   balancing of traffic destined to a given ES.  The objective is that
   the load-balancing procedures should carve up the BD space among the
   redundant PE nodes evenly, in such a way that every PE is the DF for
   a distinct set of EVIs.

   The DF election algorithm (as described in [RFC7432], Section 8.5) is
   based on a modulus operation.  The PEs to which the ES (for which DF
   election is to be carried out per EVI) is multihomed form an ordered
   (ordinal) list in ascending order by PE IP address value.  For
   example, there are N PEs: PE0, PE1,... PE(N-1) ranked as per
   increasing IP addresses in the ordinal list; then, for each VLAN with
   Ethernet Tag V, configured on ES1, PEx is the DF for VLAN V on ES1
   when x equals (V mod N).  In the case of a VLAN Bundle, only the
   lowest VLAN is used.  When the planned density is high (meaning
   there are a significant number of VLANs and the Ethernet Tags are
   uniformly distributed), the expectation is that the DF role will be
   spread across the PEs hosting that ES and good load balancing can
   be achieved.
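
   The modulus-based selection described above can be sketched in a few
   lines of Python.  This is only an illustration under stated
   assumptions, not a reference implementation: the function name
   default_df and the addresses are hypothetical, all candidate
   addresses are assumed to belong to one address family, and the
   candidate list is assumed to have already been built from the ES
   routes.

```python
import ipaddress


def default_df(pe_addresses, ethernet_tag):
    """Pick the DF for one Ethernet Tag with the (V mod N) rule."""
    # Ordinal list: candidate PEs in ascending order of IP address value.
    # Assumes all addresses belong to the same address family.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    return ordered[ethernet_tag % len(ordered)]


# Hypothetical addresses standing in for PE2, PE3, and PE4 on ES2.
es2_pes = ["192.0.2.4", "192.0.2.2", "192.0.2.3"]
for tag in (10, 11, 12):
    print(tag, default_df(es2_pes, tag))
```

   Because every PE sorts the same candidate list and applies the same
   function, each PE reaches the same result independently.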

   However, the described default DF election algorithm has some
   undesirable properties and, in some cases, can be somewhat disruptive
   and unfair.  This document describes some of those issues and defines
   mechanisms for dealing with them.  These mechanisms do involve
   changes to the default DF election algorithm, but they do not require
   any changes to the EVPN route exchange, and changes in the EVPN
   routes will be minimal.

   In addition, there is a need to extend the DF election procedures so
   that new algorithms and capabilities are possible.  A single
   algorithm (the default DF election algorithm) may not meet the
   requirements in all the use cases.

   Note that while [RFC7432] elects a DF per <ES, EVI>, this document
   elects a DF per <ES, BD>.  This means that unlike [RFC7432], where
   for a VLAN-aware Bundle service EVI there is only one DF for the EVI,
   this document specifies that there will be multiple DFs, one for each
   BD configured in that EVI.

1.3.  Problem Statement

   This section describes some potential issues with the default DF
   election algorithm.

1.3.1.  Unfair Load Balancing and Service Disruption

   There are three fundamental problems with the current default DF
   election algorithm.

   1.  The algorithm will not perform well when the Ethernet Tag follows
       a non-uniform distribution -- for instance, when the Ethernet
       Tags are all even or all odd.  In such a case, let us assume that
       the ES is multihomed to two PEs; one of the PEs will be elected
       as the DF for all of the VLANs.  This is highly suboptimal.  It
       defeats the purpose of service carving, as the DFs are not
       evenly spread across the PEs hosting the ES.  In fact, in this
       particular case, one of the PEs is never elected as the DF, so
       it takes on none of the DF responsibilities.  As another
       example, referring to Figure 1, assume that (1) PE2, PE3, and
       PE4 are listed in ascending order by IP address and (2) each
       VLAN configured on ES2 is associated with an Ethernet Tag of the
       form (3x+1), where x is an integer.  Since (3x+1) mod 3 = 1, the
       PE at ordinal 1 is chosen every time, so PE3 is always selected
       as the DF.

   2.  The Ethernet Tag that identifies the BD can be as large as 2^24;
       however, it is not guaranteed that the tenant BD on the ES will
       conform to a uniform distribution.  In fact, it is up to the
       customer what BDs they will configure on the ES.  Quoting
       [Knuth]:

          In general, we want to avoid values of M that divide r^k+a or
          r^k-a, where k and a are small numbers and r is the radix of
          the alphabetic character set (usually r=64, 256 or 100), since
          a remainder modulo such a value of M tends to be largely a
          simple superposition of key digits.  Such considerations
          suggest that we choose M to be a prime number such that
           r^k != a (modulo M) or r^k != -a (modulo M) for small k & a.

       In our case, N is the number of PEs (Section 8.5 of [RFC7432]).
       N corresponds to M above.  Since N, N-1, or N+1 need not satisfy
       the primality properties of M, as per the modulo-based DF
       assignment [RFC7432], whenever a PE goes down or a new PE boots
       up (attached to the same ES), the modulo scheme will not
       necessarily map BDs to PEs uniformly.

   3.  Disruption is another problem.  Consider a case when the same ES
       is multihomed to a set of PEs.  When the ES is DOWN in one of the
       PEs, say PE1, or PE1 itself reboots, or the BGP process goes down
       or the connectivity between PE1 and an RR goes down, the
       effective number of PEs in the system now becomes N-1, and DFs
       are computed for all the VLANs that are configured on that ES.
       In general, if the DF for a VLAN V happens not to be PE1, but
       some other PE, say PE2, it is likely that some other PE
       (different from PE1 and PE2) will become the new DF.  This is not
       desirable.  Similarly, when a new PE hosts the same ES, the
       mapping again changes because of the modulus operation.  This
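       results in needless churn.  Again referring to Figure 1, say V1,
       V2, and V3 are VLANs configured on ES2 with associated Ethernet
       Tags of values 999, 1000, and 1001, respectively.  Taking PE1,
       PE2, and PE3 in this example to denote the PEs at ordinals 0, 1,
       and 2 of the ordered list (999 mod 3 = 0, 1000 mod 3 = 1, and
       1001 mod 3 = 2), PE1, PE2, and PE3 are the DFs for V1, V2, and
       V3, respectively.  Now when PE3 goes down, N becomes 2: PE2 will
       become the DF for V1 (999 mod 2 = 1) and PE1 will become the DF
       for V2 (1000 mod 2 = 0), even though neither of their DFs
       failed.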
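
   The skew and churn effects listed above can be reproduced with a
   short Python sketch.  It works with ordinal positions in the ordered
   PE list rather than with PE names, and the helper name df_ordinal
   and the tag ranges are assumptions chosen for illustration only.

```python
from collections import Counter


def df_ordinal(ethernet_tag, n):
    """Ordinal (position in the ordered PE list) of the DF: V mod N."""
    return ethernet_tag % n


# Problem 1: all-even Ethernet Tags on an ES with two PEs.
even_load = Counter(df_ordinal(v, 2) for v in range(2, 202, 2))

# Problem 1: tags of the form 3x+1 on an ES with three PEs.
skewed_load = Counter(df_ordinal(3 * x + 1, 3) for x in range(100))

# Problem 3: the PE at ordinal 2 goes down, so N changes from 3 to 2.
tags = (999, 1000, 1001)
before = {v: df_ordinal(v, 3) for v in tags}
after = {v: df_ordinal(v, 2) for v in tags}

print("even tags, N=2:", dict(even_load))
print("3x+1 tags, N=3:", dict(skewed_load))
print("before:", before)
print("after: ", after)
```

   In the first two cases, a single ordinal receives every VLAN.  In
   the third case, the VLANs with tags 999 and 1000 change DF even
   though their original DFs are still up.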

   One point to note is that the default DF election algorithm assumes
   that all the PEs who are multihomed to the same ES (and interested in
   the DF election by exchanging EVPN routes) use an Originating
   Router's IP address [RFC7432] of the same family.  This does not need
   to be the case, as the EVPN address family can be carried over an
   IPv4 or IPv6 peering, and the PEs attached to the same ES may use an
   address of either family.

   Mathematically, a conventional hash function maps a key k to a number
   i representing one of m hash buckets through a function h(k), i.e.,
   i = h(k).  In the EVPN case, h is simply a modulo-m hash function
   viz. h(V) = V mod N, where N is the number of PEs that are multihomed
   to the ES in question.  It is well known that for good hash
   distribution using the modulus operation, the modulus N should be a
   prime number not too close to a power of 2 [CLRS2009].  When the
   effective number of PEs changes from N to N-1 (or vice versa), all
   the objects (VLAN V) will be remapped except those for which V mod N
   and V mod (N-1) refer to the same PE in the previous and subsequent
   ordinal rankings, respectively.  From a forwarding perspective, this
   causes churn, as it results in reprogramming the PE ports as either
   blocking or non-blocking at the PEs where the DF state changes.
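
   To give a feel for the scale of this remapping, the following Python
   sketch counts how many Ethernet Tags change DF when one of four PEs
   leaves the ES.  The PE names, the tag range, and the helper df_map
   are hypothetical, and string ordering stands in for ordering by IP
   address.

```python
def df_map(pes, tags):
    """Map each Ethernet Tag to its DF using (V mod N)."""
    ordered = sorted(pes)  # stand-in for ordering by IP address
    return {v: ordered[v % len(ordered)] for v in tags}


pes = ["PE-a", "PE-b", "PE-c", "PE-d"]
tags = range(1, 1201)

before = df_map(pes, tags)
after = df_map([pe for pe in pes if pe != "PE-d"], tags)

# Tags whose DF changed, and the subset whose old DF was still alive.
moved = [v for v in tags if before[v] != after[v]]
needless = [v for v in moved if before[v] != "PE-d"]

print(len(moved), "of", len(tags), "tags changed DF")
print(len(needless), "of those did not have PE-d as their DF")
```

   Only the tags that PE-d served strictly need a new DF; in this
   sketch, the modulus operation also moves many tags whose DF never
   failed.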

   This document addresses this problem and furnishes a solution to this
   undesirable behavior.

1.3.2.  Traffic Black-Holing on Individual AC Failures

   The default DF election algorithm defined by [RFC7432] takes into
   account only two variables in the modulus function for a given ES:
   the existence of the PE's IP address in the candidate list and the
   locally provisioned Ethernet Tags.

   If the DF for an <ESI, EVI> fails (due to physical link/node
   failures), an ES route withdrawal will make the NDF PEs re-elect the
   DF for that <ESI, EVI> and the service will be recovered.

   However, the default DF election procedure does not provide
   protection against "logical" failures or human errors that may occur
   at the service level on the DF, while the list of active PEs for a
   given ES does not change.  These failures may have an impact not only
   on the local PE where the issue happens but also on the rest of the
   PEs of the ES.  Some examples of such logical failures are listed
   below:

   (a)  A given individual AC defined in an ES is accidentally shut down
        or is not provisioned yet (hence, the ACS is DOWN), while the ES
        is operationally active (since the ES route is active).

   (b)  A given MAC-VRF with a defined ES is either shut down or not
        provisioned yet, while the ES is operationally active (since the
        ES route is active).  In this case, the ACS of all the ACs
        defined in that MAC-VRF is considered to be DOWN.

   Neither (a) nor (b) will trigger the DF re-election on the remote
   multihomed PEs for a given ES, since the ACS is not taken into
   account in the DF election procedures.  While the ACS is used as a DF
   election tiebreaker and trigger in Virtual Private LAN Service (VPLS)
   multihoming procedures [VPLS-MH], there is no procedure defined in
   the EVPN specification [RFC7432] to trigger the DF re-election based
   on the ACS change on the DF.

   Figure 2 shows an example of logical AC failure.

                               +---+
                               |CE4|
                               +---+
                                 |
                            PE4  |
                           +-----+-----+
           +---------------|  +-----+  |---------------+
           |               |  | BD-1|  |               |
           |               +-----------+               |
           |                                           |
           |                   EVPN                    |
           |                                           |
           | PE1               PE2                PE3  |
           | (NDF)             (DF)               (NDF)|
       +-----------+       +-----------+       +-----------+
       |  | BD-1|  |       |  | BD-1|  |       |  | BD-1|  |
       |  +-----+  |-------|  +-----+  |-------|  +-----+  |
       +-----------+       +-----------+       +-----------+
              AC1\   ES12   /AC2  AC3\   ES23   /AC4
                  \        /          \        /
                   \      /            \      /
                    +----+              +----+
                    |CE12|              |CE23|
                    +----+              +----+

          Figure 2: Default DF Election and Traffic Black-Holing

   BD-1 is defined in PE1, PE2, PE3, and PE4.  CE12 is a multihomed CE
   connected to ES12 in PE1 and PE2.  Similarly, CE23 is multihomed to
   PE2 and PE3 using ES23.  Both CE12 and CE23 are connected to BD-1
   through VLAN-based service interfaces: CE12-VID 1 (VID 1 on CE12) is
   associated with AC1 and AC2 in BD-1, whereas CE23-VID 1 is associated
   with AC3 and AC4 in BD-1.  Assume that, although not represented,
   there are other ACs defined on these ESes mapped to different BDs.

   After executing the default DF election algorithm as described in
   [RFC7432], PE2 turns out to be the DF for ES12 and ES23 in BD-1.  The
   following issues may arise:

   (a)  If AC2 is accidentally shut down or is not configured yet, CE12
        traffic will be impacted.  In the case of All-Active
        multihoming, the BUM traffic to CE12 will be "black-holed",
        whereas for Single-Active multihoming, all the traffic to/from
        CE12 will be discarded.  This is because a logical failure in
        PE2's AC2 may not trigger an ES route withdrawal for ES12 (since
        there are still other ACs active on ES12); therefore, PE1 will
        not rerun the DF election procedures.

   (b)  If the bridge table for BD-1 is administratively shut down or is
        not configured yet on PE2, CE12 and CE23 will both be impacted:
        BUM traffic to both CEs will be discarded in the case of
        All-Active multihoming, and all traffic will be discarded
        to/from the CEs in the case of Single-Active multihoming.  This
        is because PE1 and PE3 will not rerun the DF election procedures
        and will keep assuming that PE2 is the DF.
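
   Case (a) can be illustrated with a small Python sketch for All-Active
   multihoming.  The addresses, the Ethernet Tag value, and the names
   default_df, ac_up, and bum_black_holed are all hypothetical; the
   addresses are chosen so that PE2 is elected on both ESes, as in
   Figure 2.  The point of the sketch is that the election only looks
   at the candidate list built from ES routes, so the AC status never
   enters the computation.

```python
import ipaddress


def default_df(candidates, ethernet_tag):
    """Default (V mod N) election over the ES route candidate list."""
    ordered = sorted(candidates, key=lambda a: int(ipaddress.ip_address(a)))
    return ordered[ethernet_tag % len(ordered)]


# Hypothetical addresses, chosen so that PE2 wins on both ESes.
PE1, PE2, PE3 = "192.0.2.1", "192.0.2.3", "192.0.2.2"
es_candidates = {"ES12": [PE1, PE2], "ES23": [PE2, PE3]}
BD1_TAG = 1  # hypothetical Ethernet Tag for BD-1

# Local AC status for BD-1; AC2 (PE2 on ES12) has been shut down.
# The ES routes are still advertised, so the candidate lists are unchanged.
ac_up = {("ES12", PE1): True, ("ES12", PE2): False,
         ("ES23", PE2): True, ("ES23", PE3): True}


def bum_black_holed(es):
    """True if the elected DF for BD-1 on this ES has its AC down."""
    df = default_df(es_candidates[es], BD1_TAG)
    return not ac_up[(es, df)]


for es in es_candidates:
    print(es, default_df(es_candidates[es], BD1_TAG), bum_black_holed(es))
```

   PE2 remains the DF for ES12 although AC2 is down, so BUM traffic
   toward CE12 is dropped while PE1 stays NDF.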

   Quoting [RFC7432], "When an Ethernet tag is decommissioned on an
   Ethernet segment, then the PE MUST withdraw the Ethernet A-D per EVI
   route(s) announced for the <ESI, Ethernet tags> that are impacted by
   the decommissioning."  However, while this A-D per EVI route
   withdrawal is used at the remote PEs performing aliasing or backup
   procedures, it is not used to influence the DF election for the
   affected EVIs.

   This document adds an optional modification of the DF election
   procedure so that the ACS may be taken into account as a variable in
   the DF election; therefore, EVPN can provide protection against
   logical failures.

1.4.  The Need for Extending the Default DF Election in EVPN Services

   Section 1.3 describes some of the issues that exist in the default DF
   election procedures.  In order to address those issues, this document
   introduces a new DF election framework.  This framework allows the
   PEs to agree on a common DF election algorithm, as well as the
   capabilities to enable during the DF election procedure.  Generally,
   "DF election algorithm" refers to the algorithm by which a number of
   input parameters are used to determine the DF PE, while "DF election
   capability" refers to an additional feature that can be used prior to
   the invocation of the DF election algorithm, such as modifying the
   inputs (or list of candidate PEs).

   Within this framework, this document defines a new DF election
   algorithm and a new capability that can influence the DF election
   result:

   o  The new DF election algorithm is referred to as "Highest Random
      Weight" (HRW).  The HRW procedures are described in Section 3.

   o  The new DF election capability is referred to as "AC-Influenced DF
      election" (AC-DF).  The AC-DF procedures are described in
      Section 4.

   o  HRW and AC-DF mechanisms are independent of each other.
      Therefore, a PE may support either HRW or AC-DF independently or
      may support both of them together.  A PE may also support the
      AC-DF capability along with the default DF election algorithm per
      [RFC7432].

   In addition, this document defines a way to indicate the support of
   HRW and/or AC-DF along with the EVPN ES routes advertised for a given
   ES.  Refer to Section 2.2 for more details.
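
   As a rough illustration of the split between "DF election algorithm"
   and "DF election capability" described in this section, the Python
   sketch below applies capabilities to the inputs first and then
   invokes the algorithm.  All names (run_election, default_algorithm,
   exclude_pe) are hypothetical.  The filter shown is only a
   placeholder for a capability; it is not the AC-DF procedure of
   Section 4, and HRW is not shown.

```python
def default_algorithm(candidates, ethernet_tag):
    """Default (V mod N) selection over the ordered candidate list."""
    ordered = sorted(candidates)  # stand-in for ordering by IP address
    return ordered[ethernet_tag % len(ordered)]


def run_election(candidates, ethernet_tag, algorithm, capabilities=()):
    """Apply each capability to the inputs, then invoke the algorithm."""
    for capability in capabilities:
        candidates = capability(candidates, ethernet_tag)
    if not candidates:
        return None  # no PE is eligible for this Ethernet Tag
    return algorithm(candidates, ethernet_tag)


def exclude_pe(name):
    """Hypothetical capability: drop one PE from the candidate list."""
    def capability(candidates, ethernet_tag):
        return [pe for pe in candidates if pe != name]
    return capability


pes = ["PE-a", "PE-b", "PE-c"]
print(run_election(pes, 7, default_algorithm))
print(run_election(pes, 7, default_algorithm, [exclude_pe("PE-b")]))
```

   Keeping the two roles separate is what allows a capability to be
   combined with either the default algorithm or a new one.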

2.  Designated Forwarder Election Protocol and BGP Extensions

   This section describes the BGP extensions required to support the new
   DF election procedures.  In addition, since the EVPN specification
   [RFC7432] leaves several questions open as to the precise FSM
   behavior of the DF election, Section 2.1 precisely describes the
   intended behavior.

2.1.  The DF Election Finite State Machine (FSM)

   Per [RFC7432], the FSM shown in Figure 3 is executed per <ES, VLAN>
   in the case of VLAN-based service or <ES, [VLANs in VLAN Bundle]> in
   the case of a VLAN Bundle on each participating PE.  Note that the
   FSM is conceptual.  Any design or implementation MUST comply with
   behavior that is equivalent to the behavior outlined in this FSM.

                     VLAN_CHANGE                VLAN_CHANGE
                     RCVD_ES                    RCVD_ES
                     LOST_ES                    LOST_ES
                     +----+                     +-------+
                     |    |                     |       v
                     |  +-+----+   ES_UP       ++-------++
                     +->+ INIT +-------------->+ DF_WAIT |
                        ++-----+               +-------+-+
                         ^                             |
     +-----------+       |                             |DF_TIMER
     | ANY_STATE +-------+         VLAN_CHANGE         |
     +-----------+ ES_DOWN    +-----------------+      |
                              |    RCVD_ES      v      v
                     +--------++   LOST_ES     ++------+-+
                     | DF_DONE +<--------------+ DF_CALC +<-+
                     +---------+   CALCULATED  +-------+-+  |
                                                       |    |
                                                       +----+
                                                       VLAN_CHANGE
                                                       RCVD_ES
                                                       LOST_ES

                Figure 3: DF Election Finite State Machine

   Observe that each EVI is locally configured on each of the multihomed
   PEs attached to a given ES and that the FSM does not provide any
   protection against inconsistent configuration between these PEs.
   Such an inconsistency arises when, for a given EVI, one or more of
   the PEs are inadvertently configured with a different set of VLANs
   for a VLAN-aware Bundle service or with different VLANs for a
   VLAN-based service.

   The states and events shown in Figure 3 are defined as follows.

   States:

   1.  INIT: Initial state.

   2.  DF_WAIT: State in which the participant waits for enough
       information to perform the DF election for the EVI/ESI/VLAN
       combination.

   3.  DF_CALC: State in which the new DF is recomputed.

   4.  DF_DONE: State in which the corresponding DF for the EVI/ESI/VLAN
       combination has been elected.

   5.  ANY_STATE: Refers to any of the above states.

   Events:

   1.  ES_UP: The ES has been locally configured as "UP".

   2.  ES_DOWN: The ES has been locally configured as "DOWN".

   3.  VLAN_CHANGE: The VLANs configured in a bundle (that uses the ES)
       changed.  This event is necessary for VLAN Bundles only.

   4.  DF_TIMER: DF timer [RFC7432] (referred to as "Wait timer" in this
       document) has expired.

   5.  RCVD_ES: A new or changed ES route is received in an Update
       message with an MP_REACH_NLRI.  Receiving an unchanged Update
       MUST NOT trigger this event.

   6.  LOST_ES: An Update message with an MP_UNREACH_NLRI for a
       previously received ES route has been received.  If such a
       message is seen for a route that has not been advertised
       previously, the event MUST NOT be triggered.

   7.  CALCULATED: DF has been successfully calculated.

   Corresponding actions when transitions are performed or states are
   entered/exited:

   1.   ANY_STATE on ES_DOWN:
        (i) Stop the DF Wait timer.
        (ii) Assume an NDF for the local PE.

   2.   INIT on ES_UP: Transition to DF_WAIT.

   3.   INIT on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do nothing.

   4.   DF_WAIT on entering the state:
        (i) Start the DF Wait timer if it has not already been started
        or if it has expired.
        (ii) Assume an NDF for the local PE.

   5.   DF_WAIT on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do nothing.

   6.   DF_WAIT on DF_TIMER: Transition to DF_CALC.

   7.   DF_CALC on entering or re-entering the state:
        (i) Rebuild the candidate list, perform a hash, and perform the
        election.
        (ii) Afterwards, the FSM generates a CALCULATED event against
        itself.

   8.   DF_CALC on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do as prescribed in
        Transition 7.

   9.   DF_CALC on CALCULATED: Mark the election result for the VLAN or
        bundle, and transition to DF_DONE.

   10.  DF_DONE on exiting the state: If a new DF election is triggered
        and the current DF is lost, then assume an NDF for the local PE
        for the VLAN or VLAN Bundle.

   11.  DF_DONE on VLAN_CHANGE, RCVD_ES, or LOST_ES: Transition to
        DF_CALC.

   The above events and transitions are defined for the default DF
   election algorithm.  As described in Section 4, the use of the AC-DF
   capability introduces additional events and transitions.
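
   As an illustration only, the transitions listed above can be
   sketched in Python.  The class DfElectionFsm, its attributes, and the
   "elect" callback are hypothetical names.  Two simplifications are
   assumed: ES_DOWN returns the FSM to INIT (the list above only states
   the actions for ES_DOWN), and the election runs synchronously, so
   Transition 10 is not modeled as a separate step.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    INIT = auto()
    DF_WAIT = auto()
    DF_CALC = auto()
    DF_DONE = auto()


class Event(Enum):
    ES_UP = auto()
    ES_DOWN = auto()
    VLAN_CHANGE = auto()
    DF_TIMER = auto()
    RCVD_ES = auto()
    LOST_ES = auto()
    CALCULATED = auto()


RECALC_EVENTS = frozenset({Event.VLAN_CHANGE, Event.RCVD_ES, Event.LOST_ES})


class DfElectionFsm:
    def __init__(self, elect: Callable[[], bool]) -> None:
        # elect() is a hypothetical callback that rebuilds the candidate
        # list and returns True if the local PE is elected DF.
        self._elect = elect
        self._pending = False
        self.state = State.INIT
        self.timer_running = False
        self.is_df = False  # False means the local PE acts as NDF

    def handle(self, event: Event) -> None:
        if event is Event.ES_DOWN:                      # Transition 1
            self.timer_running = False
            self.is_df = False
            self.state = State.INIT                     # assumption, see lead-in
        elif self.state is State.INIT and event is Event.ES_UP:
            self._enter_df_wait()                       # Transition 2
        elif self.state is State.DF_WAIT and event is Event.DF_TIMER:
            self.timer_running = False
            self._enter_df_calc()                       # Transition 6
        elif self.state is State.DF_CALC and event is Event.CALCULATED:
            self.is_df = self._pending                  # Transition 9
            self.state = State.DF_DONE
        elif (self.state in (State.DF_CALC, State.DF_DONE)
              and event in RECALC_EVENTS):
            self._enter_df_calc()                       # Transitions 8 and 11
        # Every other combination does nothing (Transitions 3 and 5).

    def _enter_df_wait(self) -> None:                   # Transition 4
        self.state = State.DF_WAIT
        if not self.timer_running:
            self.timer_running = True
        self.is_df = False

    def _enter_df_calc(self) -> None:                   # Transition 7
        self.state = State.DF_CALC
        self._pending = self._elect()
        self.handle(Event.CALCULATED)
```

   In this sketch, a caller drives the FSM by passing each event to
   handle() and reads is_df to learn the current role of the local PE.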

2.2.  The DF Election Extended Community

   For the DF election procedures to be consistent and unanimous, it is
   necessary that all the participating PEs agree on the DF election
   algorithm and capabilities to be used.  For instance, it is not
   possible for some PEs to continue to use the default DF election
   algorithm while some PEs use HRW.  For brownfield deployments and for
   interoperability with legacy PEs, it is important that all PEs have
   the ability to fall back on the default DF election.  A PE can
   indicate its willingness to support HRW and/or AC-DF by signaling a
   DF Election Extended Community along with the ES route (Route
   Type 4).

   The DF Election Extended Community is a new BGP transitive Extended
   Community attribute [RFC4360] that is defined to identify the DF
   election procedure to be used for the ES.  Figure 4 shows the
   encoding of the DF Election Extended Community.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     | Type = 0x06   | Sub-Type(0x06)| RSV |  DF Alg |    Bitmap     ~
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     ~     Bitmap    |            Reserved                           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 4: DF Election Extended Community

   Where:

   o  Type: 0x06, as registered with IANA (Section 7) for EVPN Extended
      Communities.

   o  Sub-Type: 0x06.  "DF Election Extended Community", as registered
      with IANA.

   o  RSV/Reserved: Reserved bits for information that is specific to
      DF Alg.

   o  DF Alg (5 bits): Encodes the DF election algorithm values (between
      0 and 31) that the advertising PE desires to use for the ES.  This
      document creates an IANA registry called "DF Alg" (Section 7),
      which contains the following values:

      -  Type 0: Default DF election algorithm, or modulus-based
         algorithm as defined in [RFC7432].

      -  Type 1: HRW Algorithm (Section 3).

      -  Types 2-30: Unassigned.

      -  Type 31: Reserved for Experimental Use.

   o  Bitmap (2 octets): Encodes "capabilities" to use with the DF
      election algorithm in the DF Alg field.  This document creates an
      IANA registry (Section 7) for the Bitmap field, with values 0-15.
      This registry is called "DF Election Capabilities" and includes
      the bit values listed below.

                              1 1 1 1 1 1
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         | |A|                           |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 5: Bitmap Field in the DF Election Extended Community

      -  Bit 0 (corresponds to Bit 24 of the DF Election Extended
         Community): Unassigned.

      -  Bit 1: AC-DF Capability (AC-Influenced DF election; see
         Section 4).  When set to 1, it indicates the desire to use
         AC-DF with the rest of the PEs in the ES.

      -  Bits 2-15: Unassigned.
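
   The layout in Figures 4 and 5 can be illustrated with a short Python
   sketch that packs and unpacks the 8-octet Extended Community.  The
   function and constant names are hypothetical.  The sketch assumes
   that bit 0 is the most significant bit of each field, as in the
   figures, and that the RSV and Reserved bits are sent as zero and
   ignored on receipt.

```python
import struct
from typing import NamedTuple

EVPN_EC_TYPE = 0x06
DF_ELECTION_SUB_TYPE = 0x06
DF_ALG_DEFAULT = 0
DF_ALG_HRW = 1
# Bitmap bit 1 (AC-DF), counting from the most significant bit of the
# 2-octet Bitmap field.
AC_DF_MASK = 1 << (15 - 1)


class DfElectionEC(NamedTuple):
    df_alg: int
    bitmap: int

    @property
    def ac_df(self) -> bool:
        return bool(self.bitmap & AC_DF_MASK)


def encode_df_election_ec(df_alg: int, ac_df: bool = False) -> bytes:
    """Build the 8-octet DF Election Extended Community."""
    if not 0 <= df_alg <= 31:
        raise ValueError("DF Alg is a 5-bit field (0-31)")
    bitmap = AC_DF_MASK if ac_df else 0
    # Octet 2 holds RSV (3 bits, zero here) followed by DF Alg (5 bits).
    return struct.pack("!BBBH3s", EVPN_EC_TYPE, DF_ELECTION_SUB_TYPE,
                       df_alg, bitmap, bytes(3))


def decode_df_election_ec(data: bytes) -> DfElectionEC:
    """Parse an 8-octet DF Election Extended Community."""
    if len(data) != 8:
        raise ValueError("an Extended Community is 8 octets long")
    ec_type, sub_type, rsv_alg, bitmap, _reserved = struct.unpack(
        "!BBBH3s", data)
    if (ec_type, sub_type) != (EVPN_EC_TYPE, DF_ELECTION_SUB_TYPE):
        raise ValueError("not a DF Election Extended Community")
    return DfElectionEC(df_alg=rsv_alg & 0x1F, bitmap=bitmap)
```

   For example, under these assumptions a PE that prefers HRW with
   AC-DF would send the octets 06 06 01 40 00 00 00 00.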

   The DF Election Extended Community is used as follows:

   o  A PE SHOULD attach the DF Election Extended Community to any
      advertised ES route, and the Extended Community MUST be sent if
      the ES is locally configured with a DF election algorithm other
      than the default DF election algorithm or if a capability is
      required to be used.  In the Extended Community, the PE indicates
      the desired "DF Alg" algorithm and "Bitmap" capabilities to be
      used for the ES.

      -  Only one DF Election Extended Community can be sent along with
         an ES route.  Note that the intent is not for the advertising
         PE to indicate all the supported DF election algorithms and
         capabilities but to signal the preferred one.

      -  DF Alg values 0 and 1 can both be used with Bit 1 (AC-DF) set
         to 0 or 1.

      -  In general, a specific DF Alg SHOULD determine the use of the
         reserved bits in the Extended Community, which may be used in a
         different way for a different DF Alg.  In particular, for DF
         Alg values 0 and 1, the reserved bits are not set by the
         advertising PE and SHOULD be ignored by the receiving PE.

   o  When a PE receives the ES routes from all the other PEs for the ES
      in question, it checks to see if all the advertisements have the
      Extended Community with the same DF Alg and Bitmap:

      -  If they do, this particular PE MUST follow the procedures for
         the advertised DF Alg and capabilities.  For instance, if all
         ES routes for a given ES indicate DF Alg HRW and AC-DF set
         to 1, then the PEs attached to the ES will perform the DF
         election as per the HRW algorithm and following the AC-DF
         procedures.

      -  Otherwise, if even a single advertisement for Route Type 4 is
         received without the locally configured DF Alg and capability,
         the default DF election algorithm MUST be used as prescribed in
         [RFC7432].  This procedure handles the case where participating
         PEs in the ES disagree about the DF algorithm and capability to
         be applied.

      -  The absence of the DF Election Extended Community or the
         presence of multiple DF Election Extended Communities (in the
         same route) MUST be interpreted by a receiving PE as an
         indication of the default DF election algorithm on the sending
         PE -- that is, DF Alg 0 and no DF election capabilities.

   o  When all the PEs in an ES advertise DF Type 31, they will rely on
      the local policy to decide how to proceed with the DF election.

   o  For any new capability defined in the future, the applicability/
      compatibility of this new capability to/with the existing DF Alg
      values must be assessed on a case-by-case basis.

   o  Likewise, for any new DF Alg defined in the future, its
      applicability/compatibility to/with the existing capabilities must
      be assessed on a case-by-case basis.
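
   The agreement check described above can be sketched in Python as
   follows.  The function name and the (df_alg, bitmap) tuple
   representation are hypothetical.  The sketch reads "default DF
   election algorithm" on disagreement as DF Alg 0 with no capabilities,
   and it does not model the local-policy handling of Type 31.

```python
from typing import Iterable, Optional, Tuple

# (df_alg, bitmap) as carried in the Extended Community.  None stands
# for an ES route with no DF Election Extended Community, or with more
# than one, which is interpreted as DF Alg 0 and no capabilities.
Advert = Optional[Tuple[int, int]]

DEFAULT_ELECTION: Tuple[int, int] = (0, 0)


def agreed_df_election(local: Tuple[int, int],
                       received: Iterable[Advert]) -> Tuple[int, int]:
    """Return the (df_alg, bitmap) that the local PE should apply."""
    for advert in received:
        effective = DEFAULT_ELECTION if advert is None else advert
        if effective != local:
            # A single mismatch forces the default DF election.
            return DEFAULT_ELECTION
    return local
```

   With this sketch, three PEs that all advertise HRW with AC-DF keep
   that combination, whereas one legacy PE on the ES causes every PE to
   fall back to the default election.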

2.2.1.  Backward Compatibility

   Implementations that comply with [RFC7432] only (i.e.,
   implementations that predate this specification) will not advertise
   the DF Election Extended Community.  That means that all other
   participating PEs in the ES will not receive DF preferences and will
   revert to the default DF election algorithm without AC-DF.

   Similarly, an implementation that complies with [RFC7432] only and
   that receives a DF Election Extended Community will ignore it and
   will continue to use the default DF election algorithm.


