RFC 8156

DHCPv6 Failover Protocol

Pages: 96
Proposed Standard

Part 4 of 5 – Pages 66 to 85

RFC8156 - Page 66

8.  Endpoint States

8.1.  State Machine Operation

   Each server (or, more accurately, failover endpoint) can take on a
   variety of failover states.  These states play a crucial role in
   determining the actions that a server will perform when processing a
   request from a DHCP client as well as dealing with changing external
   conditions (e.g., loss of connection to a failover partner).

   The failover state in which a server is running controls the
   following behaviors:

   o  Responsiveness - the server is either responsive to DHCP client
      requests, renew responsive, or unresponsive.

   o  Allocation Pool - which pool of addresses (or prefixes) can be
      used for advertisement on receipt of a SOLICIT or allocation on
      receipt of a REQUEST, RENEW, or REBIND message.

   o  MCLT - ensure that valid lifetimes are not beyond what the partner
      has acked plus the MCLT (unless the failover state doesn't require
      this restriction).

   A server will transition from one failover state to another based on
   the specific values held by the following state variables:

   o  Current failover state.

   o  Communications status ("OK" or not "OK").

   o  Partner's failover state (if known).

   Whenever any of the above state variables changes, the state
   machine is invoked, which may then trigger a change in the current
   failover state.  For example, a change in the communications status
   always invokes state machine processing, but that processing may or
   may not result in a change in the current failover state.

   Whenever a server transitions to a new failover state, the new state
   MUST be communicated to its failover partner in a STATE message if
   the communications status is "OK".  In addition, whenever a server
   makes a transition into a new state, it MUST record the new state,
   its current understanding of its partner's state, and the time at
   which it entered the new state in stable storage.
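
   The rules above can be illustrated with a minimal, non-normative
   Python sketch.  The Endpoint class, its field names, and the use of
   in-memory lists in place of a real STATE message and real stable
   storage are assumptions made for illustration only; they are not
   defined by this document.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Endpoint:
    """Hypothetical failover endpoint; all names are illustrative only."""
    state: str
    comms_ok: bool = False
    partner_state: Optional[str] = None
    # Stand-ins for STATE messages sent and for stable storage records.
    sent: List[str] = field(default_factory=list)
    stable: List[Tuple[str, Optional[str], float]] = field(default_factory=list)

    def transition(self, new_state: str, now: Optional[float] = None) -> None:
        """Enter new_state, record it, and notify the partner if reachable."""
        if new_state == self.state:
            return  # the state machine ran, but no transition resulted
        entered = time.time() if now is None else now
        self.state = new_state
        # Record the new state, the partner's last known state, and entry time.
        self.stable.append((new_state, self.partner_state, entered))
        if self.comms_ok:
            self.sent.append(new_state)  # stands in for sending a STATE message
```

   In this sketch, a transition made while communications are not "OK"
   is still recorded, but no STATE message is queued.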

   The state transition diagram below (Figure 6) gives a condensed view
   of the state machine.  If there are any differences between text
   describing a particular state and the information shown in Figure 6,
   the text should be considered authoritative.

   In Figure 6, the terms "responsive", "r-responsive", and
   "unresponsive" appear in the states and refer to whether the server
   in the indicated state is allowed to be responsive, renew responsive,
   or unresponsive, respectively.  The "+", "-", or "*" in the upper
   right corner of each state is a notation about whether communication
   is ongoing with the other server, with "+" meaning that
   communications are "OK", "-" meaning that communications are
   interrupted, and "*" meaning that communications may be either "OK"
   or interrupted.

       +---------------+  V  +--------------+
       |    RECOVER  * |  |  |   STARTUP  - |
       |(unresponsive) |  +->+(unresponsive)|
       +------+--------+     +--------------+
       +-Comm. OK             +-----------------+
       |     Other State:     |  PARTNER-DOWN - +<---------------------+
       |    RESOLUTION-INTER. | (responsive)    |                      ^
      All     POTENTIAL-      +----+------------+                      |
     Others   CONFLICT------------ | --------+                         |
       |      CONFLICT-DONE     Comm. OK     |     +--------------+    |
    UPDREQ or                 Other State:   |  +--+ RESOLUTION - |    |
    UPDREQALL                  |       |     |  |  | INTERRUPTED  |    |
    Rcv UPDDONE             RECOVER    All   |  |  | (responsive) |    |
       |  +---------------+    |      Others |  |  +------+-----+-+    |
       +->+RECOVER-WAIT * | RECOVER-   |     |  |         ^     |      |
          |(unresponsive) |  WAIT or   |     |  Comm.     |    Ext.    |
          +-----------+---+  DONE      |     |  OK     Comm.   Cmd---->+
   Comm.---+     Wait MCLT     |       V     V  V     Failed           |
   Changed |          V    +---+   +---+-----+--+-+       |            |
    |  +---+----------++   |       | POTENTIAL  + +-------+            |
    |  |RECOVER-DONE * |  Wait     | CONFLICT     +------+             |
    +->+(unresponsive) |  for      |(unresponsive)|   Primary          |
       +------+--------+  Other  +>+----+--------++   resolve    Comm. |
        Comm. OK          State: |      |        ^    conflict  Changed|
   +---Other State:-+   RECOVER- |   Secondary   |       V       V   | |
   |    |           |     DONE   |   resolve     |  +----+-------+--++ |
   | All Others:  POTENT.  |     |   conflict    |  |CONFLICT-DONE * | |
   | Wait for    CONFLICT--|-----+      |        |  | (responsive)   | |
   | Other State:          V            V        |  +-------+--------+ |
   | NORMAL or RECOVER-   ++------------+---+    | Other State: NORMAL |
   |    |       DONE      |     NORMAL    + +<--------------+          |
   |    +--+----------+-->+ pri: responsive +-------External Command-->+
   |       ^          ^   |sec: r-responsive|    |                     |
   |       |          |   +--------+--------+    |                     |
   |       |          |            |             |                     |
   |   Wait for   Comm. OK  Comm. Failed         |             External
   |    Other      Other           |             |             Command
   |    State:     State:     Start Auto         |                or
   | RECOVER-DONE  NORMAL    Partner Down     Comm. OK           Auto
   |       |     COMM.-INT.      Timer       Other State:       Partner
   |    Comm. OK      |            V          All Others         Down
   |   Other State:   |  +---------+--------+    |            expiration
   |     RECOVER      +--+ COMMUNICATIONS - +----+                     |
   |       +-------------+   INTERRUPTED    |                          |
   RECOVER               |  (responsive)    +------------------------->+
   RECOVER-WAIT--------->+------------------+

                 Figure 6: Failover Endpoint State Machine

8.2.  State Machine Initialization

   The state machine is characterized by storage (in stable storage) of
   at least the following information:

   o  Current failover state.

   o  Previous failover state.

   o  Start time of current failover state.

   o  Partner's failover state.

   o  Start time of partner's failover state.

   o  Time most recent message received from partner.

   The state machine is initialized by reading these data items from
   stable storage and restoring their values from the information saved.
   If there is no information in stable storage concerning these items,
   then they should be initialized as follows:

   o  Current failover state: Primary: PARTNER-DOWN, Secondary: RECOVER.

   o  Previous failover state: None.

   o  Start time of current failover state: Current time.

   o  Partner's failover state: None until reception of STATE message.

   o  Start time of partner's failover state: None until reception of
      STATE message.

   o  Time most recent message received from partner: None until message
      received.
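
   A non-normative sketch of this initialization follows.  The
   function name, the dictionary keys, and the use of a dictionary to
   stand in for stable storage are hypothetical; only the default
   values are taken from the list above.

```python
import time
from typing import Any, Dict, Optional


def initial_state(stored: Optional[Dict[str, Any]],
                  is_primary: bool,
                  now: Optional[float] = None) -> Dict[str, Any]:
    """Return the state machine variables, using defaults if none are stored."""
    if stored:
        return dict(stored)  # restore the saved values unchanged
    return {
        "current_state": "PARTNER-DOWN" if is_primary else "RECOVER",
        "previous_state": None,
        "current_state_start": time.time() if now is None else now,
        "partner_state": None,         # none until a STATE message arrives
        "partner_state_start": None,   # none until a STATE message arrives
        "last_partner_message": None,  # none until any message arrives
    }
```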

8.3.  STARTUP State

   The STARTUP state affords an opportunity for a server to probe its
   partner server before starting to service DHCP clients.  When in the
   STARTUP state, a server attempts to learn its partner's state and
   determine (using that information if it is available) what state it
   should enter.

   The STARTUP state is not shown with any specific state transitions in
   the state machine diagram (Figure 6) because the processing during
   the STARTUP state can cause the server to transition to any of the
   other states, so that specific state transition arcs would only
   obscure other information.

8.3.1.  Operation in STARTUP State

   The server MUST NOT be responsive to DHCP clients in STARTUP state.

   Whenever a STATE message is sent to the partner while in STARTUP
   state, the STARTUP flag MUST be set in the OPTION_F_SERVER_FLAGS
   option and the previously recorded failover state MUST be placed in
   the OPTION_F_SERVER_STATE option, each of which is included in the
   STATE message.

8.3.2.  Transition out of STARTUP State

   The algorithm below is followed every time the server initializes
   itself and enters STARTUP state.

   The variables PREVIOUS-STATE and CURRENT-STATE are defined for use in
   the algorithm description below.  PREVIOUS-STATE is simply for
   storage of a state, while CURRENT-STATE not only stores the current
   state but also changes the current state of the failover endpoint to
   whatever state is set in CURRENT-STATE.

   Step 1: If there is any record of a previous failover state in stable
           storage for this server, then set the PREVIOUS-STATE to the
           last recorded value in stable storage and the TIME-OF-FAILURE
           to the time the server failed or a time beyond which the
           server could not have been operating, and go to Step 2.

           If there is no record of any previous failover state in
           stable storage for this server, then set the PREVIOUS-STATE
           to RECOVER, and set the TIME-OF-FAILURE to 0.  This will
           allow two servers that already have lease information to
           synchronize themselves prior to operating.

           In some cases, an existing server will be commissioned as a
           failover server and brought back into operation when its
           partner is not yet available.  In this case, the newly
           commissioned failover server will not operate until its
           partner comes online -- but it has operational
           responsibilities as a DHCP server nonetheless.  To properly
           handle this situation, a server SHOULD be configurable in
           such a way as to move directly into PARTNER-DOWN state after
           the startup period expires if it has been unable to contact
           its partner during the startup period.

   Step 2: Implementations will differ in the ways that they deal with
           the state machine for failover endpoint states.  In many
           cases, state transitions will occur when communications go
           from "OK" to failed or from failed to "OK", and some
           implementations will implement a portion of their state
           machine processing based on these changes.

           In these cases, during startup, if the PREVIOUS-STATE is one
           where communications were "OK", then set the PREVIOUS-STATE
           to the state that is the result of the communication failed
           state transition when in that state (if such a transition
           exists -- some states don't have a communication failed state
           transition, since they allow both "communications OK" and
           "failed").

   Step 3: Start the STARTUP state timer.  The time that a server
           remains in the STARTUP state (absent any communications with
           its partner) is implementation dependent but SHOULD be short.
           It SHOULD be long enough for a TCP connection to a heavily
           loaded partner to be created across a slow network.

   Step 4: If the server is a primary server, attempt to create a TCP
           connection to the failover partner.  If the server is a
           secondary server, listen on the failover port and wait for
           the primary server to connect.  See Section 6.1.

   Step 5: Wait for "communications OK".

           When and if communications become "OK", clear the STARTUP
           flag, and set the CURRENT-STATE to the PREVIOUS-STATE.

           If the partner is in PARTNER-DOWN state and if the time at
           which it entered PARTNER-DOWN state (as received in the
           OPTION_F_START_TIME_OF_STATE option in the STATE message) is
           later than the last recorded time of operation of this
           server, then set CURRENT-STATE to RECOVER.  If the time at
           which it entered PARTNER-DOWN state is earlier than the last
           recorded time of operation of this server, then set
           CURRENT-STATE to POTENTIAL-CONFLICT.

           Then, transition to the CURRENT-STATE and take the
           "communications OK" state transition based on the
           CURRENT-STATE of this server and the partner.

   Step 6: If the startup time expires prior to communications becoming
           "OK", the server SHOULD transition to PREVIOUS-STATE.

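   The decision made in Steps 5 and 6 can be sketched as follows.
   This is a non-normative illustration: the function and parameter
   names are hypothetical, the sketch stops before the subsequent
   "communications OK" state transition is taken, and it leaves
   CURRENT-STATE unchanged when the two times are equal, a case the
   steps above do not address.

```python
from typing import Optional


def state_after_startup(previous_state: str,
                        comms_ok: bool,
                        partner_state: Optional[str] = None,
                        partner_state_start: Optional[float] = None,
                        last_operation_time: Optional[float] = None) -> str:
    """Pick the state to enter when leaving STARTUP (Steps 5 and 6)."""
    if not comms_ok:
        # Step 6: the startup timer expired without reaching the partner.
        return previous_state
    current = previous_state  # Step 5: start from the previous state
    if (partner_state == "PARTNER-DOWN"
            and partner_state_start is not None
            and last_operation_time is not None):
        if partner_state_start > last_operation_time:
            # Partner went PARTNER-DOWN after this server last operated.
            current = "RECOVER"
        elif partner_state_start < last_operation_time:
            # This server was still operating after the partner went
            # PARTNER-DOWN, so both may have allocated independently.
            current = "POTENTIAL-CONFLICT"
    return current
```
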
8.4.  PARTNER-DOWN State

   PARTNER-DOWN state is a state either server can enter.  When in this
   state, the server assumes that it is the only server operating and
   serving the client base.  If one server is in PARTNER-DOWN state, the
   other server MUST NOT be operating.

   A server can enter PARTNER-DOWN state as a result of either
   (1) operator intervention (when an operator determines that the
   server's partner is, indeed, down) or (2) an optional
   auto-partner-down capability where PARTNER-DOWN state is entered
   automatically after a server has been in COMMUNICATIONS-INTERRUPTED
   state for a predetermined period of time.

8.4.1.  Operation in PARTNER-DOWN State

   The server MUST be responsive in PARTNER-DOWN state, regardless of
   whether it is primary or secondary.

   It will allow renewal of all outstanding leases.

   For delegable prefixes, the server will allocate leases from its own
   pool, and after a fixed period of time (the MCLT interval) has
   elapsed from entry into PARTNER-DOWN state, it may allocate delegable
   prefixes from the set of all available pools.  The server MUST fully
   deplete its own pool before starting allocations from its downed
   partner's pool.

   IPv6 addresses available for independent allocation by the other
   server (upon entering PARTNER-DOWN state) SHOULD NOT be allocated to
   a client.  If a server elects to do so anyway, such addresses
   MUST NOT be allocated to a new client until the MCLT beyond the
   entry into PARTNER-DOWN state has elapsed.

   A server in PARTNER-DOWN state MUST NOT allocate a lease to a DHCP
   client different from the client to which it was allocated at the
   time of entry into PARTNER-DOWN state until the MCLT beyond the
   maximum of the following times: client expiration time, most recently
   transmitted partner-lifetime, most recently received ack of the
   partner-lifetime from the partner, and most recently acked
   partner-lifetime to the partner.  If this time would be earlier than
   the current time plus the MCLT, then the time the server entered
   PARTNER-DOWN state plus the MCLT is used.
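
   One literal reading of the rule above is sketched below.  The
   sketch is non-normative; the function and parameter names are
   hypothetical, and all times are assumed to be absolute timestamps
   in seconds.

```python
from typing import Iterable


def earliest_reallocation_time(candidate_times: Iterable[float],
                               entered_partner_down: float,
                               now: float,
                               mclt: float) -> float:
    """Earliest time a lease may be given to a different client.

    candidate_times holds the four times named in the text: the client
    expiration time and the three partner-lifetime values.
    """
    limit = max(candidate_times) + mclt
    if limit < now + mclt:
        # Fallback named in the text: entry into PARTNER-DOWN plus the MCLT.
        limit = entered_partner_down + mclt
    return limit
```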

   The server is not restricted by the MCLT when offering valid
   lifetimes while in PARTNER-DOWN state.

   In the unlikely case when there are two servers operating in
   PARTNER-DOWN state, there is a chance that duplicate leases for the
   same prefix could be assigned.  This leads to a POTENTIAL-CONFLICT
   (unresponsive) state when the servers reestablish contact.  This
   issue of duplicate leases can be prevented as long as the server
   grants new leases from its own pool; therefore, the server operating
   in PARTNER-DOWN state MUST use its own pool first for new leases
   before assigning any leases from its downed partner's pool.

8.4.2.  Transition out of PARTNER-DOWN State

   When a server in PARTNER-DOWN state succeeds in establishing a
   connection to its partner, its actions are conditional on the state
   and flags received in the STATE message from the other server as part
   of the process of establishing the connection.

   If the STARTUP bit is set in the OPTION_F_SERVER_FLAGS option of a
   received STATE message, a server in PARTNER-DOWN state MUST NOT take
   any state transitions based on reestablishing communications.  If a
   server is in PARTNER-DOWN state, it ignores all STATE messages from
   its partner that have the STARTUP bit set in the
   OPTION_F_SERVER_FLAGS option of the STATE message.

   If the STARTUP bit is not set in the OPTION_F_SERVER_FLAGS option of
   a STATE message received from its partner, then a server in
   PARTNER-DOWN state takes the following actions, based on the state of
   the partner as received in a STATE message (either immediately after
   establishing communications or at any time later when a new state is
   received):

   o  If the partner is in NORMAL, COMMUNICATIONS-INTERRUPTED,
      PARTNER-DOWN, POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or
      CONFLICT-DONE state, then transition to POTENTIAL-CONFLICT state.

   o  If the partner is in RECOVER or RECOVER-WAIT state, then stay in
      PARTNER-DOWN state.

   o  If the partner is in RECOVER-DONE state, then transition to
      NORMAL state.
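
   The rules in this section can be summarized in a short,
   non-normative sketch.  The function name and the choice to raise an
   error for a state not listed above are assumptions made for
   illustration.

```python
TO_POTENTIAL_CONFLICT = {
    "NORMAL", "COMMUNICATIONS-INTERRUPTED", "PARTNER-DOWN",
    "POTENTIAL-CONFLICT", "RESOLUTION-INTERRUPTED", "CONFLICT-DONE",
}


def next_state_from_partner_down(partner_state: str, startup_flag: bool) -> str:
    """Next state for a server in PARTNER-DOWN that receives a STATE message."""
    if startup_flag:
        # STATE messages with the STARTUP bit set are ignored.
        return "PARTNER-DOWN"
    if partner_state in TO_POTENTIAL_CONFLICT:
        return "POTENTIAL-CONFLICT"
    if partner_state in ("RECOVER", "RECOVER-WAIT"):
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    raise ValueError(f"state not covered by this sketch: {partner_state}")
```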

8.5.  RECOVER State

   This state indicates that the server has no information in its stable
   storage or that it is reintegrating with a server in PARTNER-DOWN
   state after it has been down.  A server in this state MUST attempt to
   refresh its stable storage from the other server.

8.5.1.  Operation in RECOVER State

   The server MUST NOT be responsive in RECOVER state.

   A server in RECOVER state will attempt to reestablish communications
   with the other server.

8.5.2.  Transition out of RECOVER State

   If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED,
   or CONFLICT-DONE state when communications are reestablished, then
   the server in RECOVER state will move itself to POTENTIAL-CONFLICT
   state.

   If the other server is in any other state, then the server in RECOVER
   state will request an update of missing binding information by
   sending an UPDREQ message.  If the server has determined that it has
   lost its stable storage because it has no record of ever having
   talked to its partner even though its partner does have a record of
   communicating with it, it MUST send an UPDREQALL message; otherwise,
   it MUST send an UPDREQ message.

   It will wait for an UPDDONE message, and upon receipt of that message
   it will transition to RECOVER-WAIT state.
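
   The choice described above can be sketched as follows.  This is a
   non-normative illustration; the function name, the parameter names,
   and the tuple return value are hypothetical.

```python
from typing import Tuple

CONFLICT_STATES = {"POTENTIAL-CONFLICT", "RESOLUTION-INTERRUPTED",
                   "CONFLICT-DONE"}


def recover_action(partner_state: str,
                   has_record_of_partner: bool,
                   partner_has_record_of_us: bool) -> Tuple[str, str]:
    """First action of a server in RECOVER once communications are "OK"."""
    if partner_state in CONFLICT_STATES:
        return ("transition", "POTENTIAL-CONFLICT")
    if not has_record_of_partner and partner_has_record_of_us:
        # Stable storage was evidently lost: request every binding.
        return ("send", "UPDREQALL")
    return ("send", "UPDREQ")
```

   After sending either message, the server waits for UPDDONE before
   moving to RECOVER-WAIT state; that wait is not modeled here.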

   If communication fails during the reception of the results of the
   UPDREQ or UPDREQALL message, the server will remain in RECOVER state
   and will reissue the UPDREQ or UPDREQALL message when communications
   are reestablished.

   If an UPDDONE message isn't received within an implementation-
   dependent amount of time and no BNDUPD messages are being received,
   the connection SHOULD be dropped.

                   A                                        B
                 Server                                  Server

                   |                                        |
                RECOVER                               PARTNER-DOWN
                   |                                        |
                   | >--UPDREQ-------------------->         |
                   |                                        |
                   |        <---------------------BNDUPD--< |
                   | >--BNDREPLY------------------>         |
                  ...                                      ...
                   |                                        |
                   |        <---------------------BNDUPD--< |
                   | >--BNDREPLY------------------>         |
                   |                                        |
                   |        <--------------------UPDDONE--< |
                   |                                        |
              RECOVER-WAIT                                  |
                   |                                        |
                   | >--STATE-(RECOVER-WAIT)------>         |
                   |                                        |
                   |                                        |
          Wait MCLT from last known                         |
             time of failover operation                     |
                   |                                        |
              RECOVER-DONE                                  |
                   |                                        |
                   | >--STATE-(RECOVER-DONE)------>         |
                   |                                     NORMAL
                   |        <-------------(NORMAL)-STATE--< |
                NORMAL                                      |
                   | >--STATE-(NORMAL)------------>         |
                   |                                        |
                   |                                        |

                 Figure 7: Transition out of RECOVER State

   If at any time while a server is in RECOVER state communication
   fails, the server will stay in RECOVER state.  When communications
   are restored, it will restart the process of transitioning out of
   RECOVER state.

8.6.  RECOVER-WAIT State

   This state indicates that the server has sent an UPDREQ or UPDREQALL
   message and has received the UPDDONE message indicating that it has
   received all outstanding binding update information.  In the
   RECOVER-WAIT state, the server will wait for the MCLT in order to
   ensure that any processing that this server might have done prior to
   losing its stable storage will not cause future difficulties.

8.6.1.  Operation in RECOVER-WAIT State

   The server MUST NOT be responsive in RECOVER-WAIT state.

8.6.2.  Transition out of RECOVER-WAIT State

   Upon entry into RECOVER-WAIT state, the server MUST start a timer
   whose expiration is set to a time equal to the time the server went
   down (the TIME-OF-FAILURE from Section 8.3.2), if known, or the time
   the server started (if the TIME-OF-FAILURE is unknown), plus the
   MCLT.  When this timer expires, the server will transition into
   RECOVER-DONE state.
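
   The timer expiration can be computed as in the following
   non-normative sketch.  The function name is hypothetical, and
   representing an unknown TIME-OF-FAILURE as None is an assumption of
   the sketch.

```python
from typing import Optional


def recover_wait_expiry(time_of_failure: Optional[float],
                        server_start_time: float,
                        mclt: float) -> float:
    """Absolute time at which the RECOVER-WAIT timer expires."""
    # Use the time the server went down if known, else its start time.
    base = server_start_time if time_of_failure is None else time_of_failure
    return base + mclt
```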

   This allows clients holding IPv6 addresses or prefixes that were
   allocated by this server prior to the loss of its client binding
   information in stable storage to contact the other server, or
   allows those leases to time out.

   If the server has never before run failover, then there is no need to
   wait in this state, and the server MAY transition immediately to
   RECOVER-DONE state.  However, to determine if this server has run
   failover, it is vital that the information provided by the partner be
   utilized, since the stable storage of this server may have been lost.

   If communication fails while a server is in RECOVER-WAIT state, it
   has no effect on the operation of this state.  The server SHOULD
   continue to operate its timer, and if the timer expires during the
   period where communications with the other server have failed, then
   the server SHOULD transition to RECOVER-DONE state.  This is rare --
   failover state transitions are not usually made while communications
   are interrupted, but in this case there is no reason to inhibit this
   transition.

8.7.  RECOVER-DONE State

   This state exists to allow an interlocked transition for one server
   from RECOVER state and another server from PARTNER-DOWN or
   COMMUNICATIONS-INTERRUPTED state into NORMAL state.

8.7.1.  Operation in RECOVER-DONE State

   A server in RECOVER-DONE state SHOULD be renew responsive and MAY
   respond to RENEW requests but MUST only change the state of a lease
   that appears in the RENEW request.  It MUST NOT allocate any
   additional leases when in RECOVER-DONE state and should only respond
   to RENEW requests where it already has a record of the lease.

8.7.2.  Transition out of RECOVER-DONE State

   When a server in RECOVER-DONE state determines that its partner
   server has entered NORMAL or RECOVER-DONE state, it will transition
   into NORMAL state.

   If the partner server enters RECOVER or RECOVER-WAIT state, this
   server transitions to COMMUNICATIONS-INTERRUPTED.

   If the partner server enters POTENTIAL-CONFLICT state, this server
   enters POTENTIAL-CONFLICT state as well.

   If communication fails while in RECOVER-DONE state, a server will
   stay in RECOVER-DONE state.
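
   These transitions can be sketched as follows (non-normative).  The
   function name is hypothetical, None stands for "communications
   failed", and remaining in RECOVER-DONE state for partner states not
   listed above is an assumption of the sketch.

```python
from typing import Optional


def next_state_from_recover_done(partner_state: Optional[str]) -> str:
    """Next state for a server in RECOVER-DONE, given the partner's state."""
    if partner_state in ("NORMAL", "RECOVER-DONE"):
        return "NORMAL"
    if partner_state in ("RECOVER", "RECOVER-WAIT"):
        return "COMMUNICATIONS-INTERRUPTED"
    if partner_state == "POTENTIAL-CONFLICT":
        return "POTENTIAL-CONFLICT"
    # Communication failure (None) or an unlisted state: stay put.
    return "RECOVER-DONE"
```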

8.8.  NORMAL State

   NORMAL state is the state used by a server when it is communicating
   with the other server and any required resynchronization has been
   performed.  While some binding database synchronization is performed
   in NORMAL state, potential conflicts are resolved prior to entry into
   NORMAL state, as is binding database data loss.

   When entering NORMAL state, a server will send to the other server
   all currently unacknowledged binding updates as BNDUPD messages.

   When the above process is complete, if the server entering NORMAL
   state is a secondary server, then it will request delegable prefixes
   for allocation using the POOLREQ message.

8.8.1.  Operation in NORMAL State

   The primary server is responsive in NORMAL state.  The secondary is
   renew responsive in NORMAL state.

   When in NORMAL state, a primary server will operate in the following
   manner:

   Valid lifetime calculations
      As discussed in Section 4.4, the lease interval given to a DHCP
      client can never be more than the MCLT greater than the most
      recently acknowledged partner lifetime received from the failover
      partner or the current time, whichever is later.

      As long as a server adheres to this constraint, the specifics of
      the lease interval that it gives to a DHCP client or the value of
      the partner lifetime sent to its failover partner are
      implementation dependent.

   Lazy update of partner server
      After sending a REPLY that includes a lease update to a client,
      the server servicing a DHCP client request attempts to update its
      partner with the new binding information.  See Section 4.3.

   Reallocation of leases between clients
      Whenever a client binding is released or expires, a BNDUPD message
      must be sent to the partner, setting the binding state to RELEASED
      or EXPIRED.  However, until a BNDREPLY is received for this
      message, the lease cannot be allocated to another client.  It
      cannot be allocated to the same client again if a BNDUPD message
      was sent; otherwise, it can.  See Section 4.2.2.1 for details.

   In NORMAL state, each server receives binding updates from its
   partner server in BNDUPD messages (see Section 7.5.5).  It records
   these in its binding database in stable storage and then sends a
   corresponding BNDREPLY message to its partner server (see
   Section 7.6).
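
   The valid lifetime constraint described in this section can be
   illustrated with a non-normative sketch.  The function names are
   hypothetical, and all values are assumed to be absolute timestamps
   in seconds except the MCLT, which is a duration.

```python
def latest_allowed_expiry(acked_partner_lifetime: float,
                          now: float,
                          mclt: float) -> float:
    """Latest lease expiry the server may give a client."""
    # MCLT beyond the acked partner lifetime or now, whichever is later.
    return max(acked_partner_lifetime, now) + mclt


def clamp_expiry(desired_expiry: float,
                 acked_partner_lifetime: float,
                 now: float,
                 mclt: float) -> float:
    """Limit a desired lease expiry to the latest allowed expiry."""
    return min(desired_expiry,
               latest_allowed_expiry(acked_partner_lifetime, now, mclt))
```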

8.8.2.  Transition out of NORMAL State

   If a server in NORMAL state receives an external command informing it
   that its partner is down, it will transition immediately into
   PARTNER-DOWN state.  Generally, this would be an unusual situation,
   where some external agency knew the partner server was down prior to
   the failover server discovering it on its own.

   If a server in NORMAL state fails to receive acks to messages sent to
   its partner for an implementation-dependent period of time, it MAY
   move into COMMUNICATIONS-INTERRUPTED state.  This situation might
   occur if the partner server was capable of maintaining the TCP
   connection between the servers and also capable of sending a CONTACT
   message periodically but was (for some reason) incapable of
   processing BNDUPD messages.

   If it is determined that communications are not "OK" (as defined in
   Section 6.6), then the server should transition into
   COMMUNICATIONS-INTERRUPTED state.

   If a server in NORMAL state receives any messages from its partner
   where the partner has changed state from that expected by the server
   in NORMAL state, then the server should transition into
   COMMUNICATIONS-INTERRUPTED state and take the appropriate state
   transition from there.  For example, it would be expected that the
   partner would transition from POTENTIAL-CONFLICT state into NORMAL
   state but not that the partner would transition from NORMAL state
   into POTENTIAL-CONFLICT state.

   If a server in NORMAL state receives a DISCONNECT message from its
   partner, then the server should transition into
   COMMUNICATIONS-INTERRUPTED state.

8.9.  COMMUNICATIONS-INTERRUPTED State

   A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
   unable to communicate with its partner.  Primary and secondary
   servers cycle automatically (without administrative intervention)
   between NORMAL state and COMMUNICATIONS-INTERRUPTED state as the
   network connection between them fails and recovers, or as the partner
   server cycles between operational and non-operational.  No allocation
   of duplicate leases can occur while the servers cycle between these
   states.

   When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
   configured to support an automatic transition out of
   COMMUNICATIONS-INTERRUPTED state and into PARTNER-DOWN state (i.e.,
   auto-partner-down has been configured), then a timer is started for
   the length of the configured auto-partner-down period.

   A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
   the NORMAL state SHOULD raise an alarm condition to alert
   administrative staff to a potential problem in the DHCP subsystem.

8.9.1.  Operation in COMMUNICATIONS-INTERRUPTED State

   In this state, a server MUST respond to all DHCP client requests.
   When allocating new leases, each server allocates from its own pool,
   where the primary MUST allocate only FREE delegable prefixes and the
   secondary MUST allocate only FREE-BACKUP delegable prefixes, and each
   server allocates from its own independent IPv6 address ranges.  When
   responding to RENEW messages, each server will allow continued
   renewal of a DHCP client's current lease, regardless of whether that
   lease was given out by the receiving server or not, although the
   renewal period MUST NOT exceed the MCLT beyond the later of (1) the
   partner lifetime already acknowledged by the other server or (2) now.

   However, since the server cannot communicate with its partner in this
   state, the acknowledged partner lifetime will not be updated, despite
   continued RENEW message processing.  This is likely to eventually
   cause the actual lifetimes to converge to the MCLT (unless this is
   greater than the desired lease time, which would be unusual).
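
   The convergence described above can be seen in a small,
   non-normative simulation.  The function name and the sample numbers
   are hypothetical; times are in seconds, and the acknowledged
   partner lifetime is held fixed because the partner cannot be
   reached.

```python
from typing import Iterable, List


def simulate_renewals(acked_partner_lifetime: float,
                      desired_lifetime: float,
                      mclt: float,
                      renew_times: Iterable[float]) -> List[float]:
    """Lifetime granted at each renewal while the partner is unreachable."""
    granted = []
    for now in renew_times:
        # Latest allowed expiry: MCLT beyond the later of the acked
        # partner lifetime and the current time.
        cap = max(acked_partner_lifetime, now) + mclt
        expiry = min(now + desired_lifetime, cap)
        granted.append(expiry - now)
    return granted
```

   With an acknowledged partner lifetime of 10000, a desired lifetime
   of 86400, and an MCLT of 3600, renewals at times 0, 5000, 10000,
   and 20000 are granted lifetimes of 13600, 8600, 3600, and 3600,
   respectively.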

   The server should continue to try to establish a connection with its
   partner.

8.9.2.  Transition out of COMMUNICATIONS-INTERRUPTED State

   If the auto-partner-down timer expires while a server is in
   COMMUNICATIONS-INTERRUPTED state, it will transition immediately into
   PARTNER-DOWN state.

   If a server in COMMUNICATIONS-INTERRUPTED state receives an external
   command informing it that its partner is down, it will transition
   immediately into PARTNER-DOWN state.

   If communications with the other server are restored, then the server
   in COMMUNICATIONS-INTERRUPTED state will transition into another
   state based on the state of the partner:

   o  NORMAL or COMMUNICATIONS-INTERRUPTED: Transition into
      NORMAL state.

   o  RECOVER: Stay in COMMUNICATIONS-INTERRUPTED state.

   o  RECOVER-DONE: Transition into NORMAL state.

   o  PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or
      RESOLUTION-INTERRUPTED: Transition into POTENTIAL-CONFLICT state.
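
   The list above maps directly onto a lookup, as in the following
   non-normative sketch.  The function name is hypothetical, and
   raising an error for a partner state that is not in the list is an
   assumption of the sketch.

```python
def next_state_after_comms_restored(partner_state: str) -> str:
    """Next state for a server in COMMUNICATIONS-INTERRUPTED state."""
    if partner_state in ("NORMAL", "COMMUNICATIONS-INTERRUPTED",
                         "RECOVER-DONE"):
        return "NORMAL"
    if partner_state == "RECOVER":
        return "COMMUNICATIONS-INTERRUPTED"
    if partner_state in ("PARTNER-DOWN", "POTENTIAL-CONFLICT",
                         "CONFLICT-DONE", "RESOLUTION-INTERRUPTED"):
        return "POTENTIAL-CONFLICT"
    raise ValueError(f"state not covered by this sketch: {partner_state}")
```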

   Figure 8 illustrates the transition from NORMAL state to
   COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.

             Primary                                Secondary
              Server                                  Server

              NORMAL                                  NORMAL
                | >--CONTACT------------------->         |
                |        <--------------------CONTACT--< |
                |         [TCP connection broken]        |
           COMMUNICATIONS-         :              COMMUNICATIONS-
             INTERRUPTED           :                INTERRUPTED
                |      [attempt new TCP connection]      |
                |         [connection succeeds]          |
                |                                        |
                | >--CONNECT------------------->         |
                |        <---------------CONNECTREPLY--< |
                | >--STATE--------------------->         |
                |                                     NORMAL
                |        <-------------------STATE-----< |
              NORMAL                                     |
                |                                        |
                | >--BNDUPD-------------------->         |
                |        <-------------------BNDREPLY--< |
                |                                        |
                |        <---------------------BNDUPD--< |
                | >------BNDREPLY-------------->         |
               ...                                      ...
                |                                        |
                |        <--------------------POOLREQ--< |
                | >--POOLRESP------------------>         |
                |                                        |
                | >--BNDUPD-(#1)--------------->         |
                |        <-------------------BNDREPLY--< |
                |                                        |
                | >--BNDUPD-(#2)--------------->         |
                |        <-------------------BNDREPLY--< |
                |                                        |

                  Figure 8: Transition from NORMAL State
               to COMMUNICATIONS-INTERRUPTED State and Back

8.10.  POTENTIAL-CONFLICT State

   This state indicates that the two servers are attempting to
   reintegrate with each other but at least one of them was running in a
   state that did not guarantee that automatic reintegration would be
   possible.  In POTENTIAL-CONFLICT state, the servers may determine
   that the same lease has been offered and accepted by two different
   clients.

   A goal of the failover protocol is to minimize the possibility that
   POTENTIAL-CONFLICT state is ever entered.

   When a primary server enters POTENTIAL-CONFLICT state, it should
   request that the secondary send it all updates that the primary
   server has not yet acknowledged by sending an UPDREQ message to the
   secondary server.

   A secondary server entering POTENTIAL-CONFLICT state will wait for
   the primary to send it an UPDREQ message.

8.10.1.  Operation in POTENTIAL-CONFLICT State

   Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming
   DHCP requests.

8.10.2.  Transition out of POTENTIAL-CONFLICT State

   If communication with the partner fails while in POTENTIAL-CONFLICT
   state, then the server will transition to RESOLUTION-INTERRUPTED
   state.

   Whenever either server receives an UPDDONE message from its partner
   while in POTENTIAL-CONFLICT state, it MUST transition to a new state.
   The primary MUST transition to CONFLICT-DONE state, and the secondary
   MUST transition to NORMAL state.  This will cause the primary server
   to leave POTENTIAL-CONFLICT state prior to the secondary, since the
   primary sends an UPDREQ message and receives an UPDDONE message
   before the secondary sends an UPDREQ message and receives its UPDDONE
   message.

   When a secondary server receives an indication that the primary
   server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE
   state, it SHOULD send an UPDREQ message to the primary server.
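
   The role-dependent behavior in POTENTIAL-CONFLICT state can be
   sketched as a single step function.  The event labels and the
   function name are hypothetical.  The STATE message in the output
   reflects the general rule in Section 8.1 that a new state is
   communicated to the partner when communications are "OK".

```python
from typing import List, Tuple

POTENTIAL_CONFLICT = "POTENTIAL-CONFLICT"
RESOLUTION_INTERRUPTED = "RESOLUTION-INTERRUPTED"
CONFLICT_DONE = "CONFLICT-DONE"
NORMAL = "NORMAL"


def potential_conflict_step(role: str, event: str) -> Tuple[str, List[str]]:
    """Return (next_state, messages_to_send) for a server in POTENTIAL-CONFLICT.

    role  is "primary" or "secondary".
    event is one of the hypothetical labels handled below.
    """
    if role not in ("primary", "secondary"):
        raise ValueError(f"unknown role: {role!r}")

    if event == "entered-state":
        # The primary asks for unacknowledged updates; the secondary waits.
        return POTENTIAL_CONFLICT, (["UPDREQ"] if role == "primary" else [])

    if event == "comms-failed":
        # Nothing can be sent while communications are down.
        return RESOLUTION_INTERRUPTED, []

    if event == "upddone-received":
        next_state = CONFLICT_DONE if role == "primary" else NORMAL
        # The new state is announced to the partner in a STATE message.
        return next_state, ["STATE"]

    if event == "partner-entered-conflict-done" and role == "secondary":
        # The secondary now requests the primary's updates (a SHOULD).
        return POTENTIAL_CONFLICT, ["UPDREQ"]

    # Any other event leaves the server in POTENTIAL-CONFLICT.
    return POTENTIAL_CONFLICT, []
```

   Feeding the primary "entered-state" and then "upddone-received",
   followed by the secondary receiving "partner-entered-conflict-done"
   and "upddone-received", reproduces the order of state changes shown
   in Figure 9.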

             Primary                                Secondary
             Server                                  Server

               |                                        |
         POTENTIAL-CONFLICT                    POTENTIAL-CONFLICT
               |                                        |
               | >--UPDREQ-------------------->         |
               |                                        |
               |        <---------------------BNDUPD--< |
               | >--BNDREPLY------------------>         |
              ...                                      ...
               |                                        |
               |        <---------------------BNDUPD--< |
               | >--BNDREPLY------------------>         |
               |                                        |
               |        <--------------------UPDDONE--< |
         CONFLICT-DONE                                  |
               | >--STATE--(CONFLICT-DONE)---->         |
               |        <---------------------UPDREQ--< |
               |                                        |
               | >--BNDUPD-------------------->         |
               |        <-------------------BNDREPLY--< |
              ...                                      ...
               | >--BNDUPD-------------------->         |
               |        <-------------------BNDREPLY--< |
               |                                        |
               | >--UPDDONE------------------->         |
               |                                     NORMAL
               |        <------------STATE--(NORMAL)--< |
            NORMAL                                      |
               | >--STATE--(NORMAL)----------->         |
               |                                        |
               |        <--------------------POOLREQ--< |
               | >------POOLRESP-------------->         |
               |                                        |

           Figure 9: Transition out of POTENTIAL-CONFLICT State

8.11.  RESOLUTION-INTERRUPTED State

   This state indicates that the two servers were attempting to
   reintegrate with each other in POTENTIAL-CONFLICT state but
   communication failed prior to completion of reintegration.

   The RESOLUTION-INTERRUPTED state exists because servers are not
   responsive in POTENTIAL-CONFLICT state, and if one server drops out
   of service while both servers are in POTENTIAL-CONFLICT state, the
   server that remains in service will not be able to process DHCP
   client requests and there will be no DHCP server available to process
   client requests.  The RESOLUTION-INTERRUPTED state is the state that
   a server moves to if its partner disappears while it is in
   POTENTIAL-CONFLICT state.

   When a server enters RESOLUTION-INTERRUPTED state, it SHOULD raise an
   alarm condition to alert administrative staff of a problem in the
   DHCP subsystem.

8.11.1.  Operation in RESOLUTION-INTERRUPTED State

   In this state, a server MUST respond to all DHCP client requests.
   When allocating new leases, each server SHOULD allocate from its own
   pool (if that can be determined), where the primary SHOULD allocate
   only FREE leases and the secondary SHOULD allocate only FREE-BACKUP
   leases.  When responding to renewal requests, each server will allow
   continued renewal of a DHCP client's current lease, independent of
   whether that lease was given out by the receiving server or not,
   although the renewal period MUST NOT exceed the MCLT beyond the
   later of (1) the partner lifetime already acknowledged by the other
   server or (2) now.

   However, since the server cannot communicate with its partner in this
   state, the acknowledged partner lifetime will not be updated in any
   new bindings.
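
   The pool selection for new leases in this state can be sketched as
   a filter on the lease state.  The Lease type, the function name,
   and the sample addresses are hypothetical.  The sketch assumes the
   server's own pool can be determined and treats the SHOULD-level
   guidance above as the default behavior.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Lease:
    address: str
    state: str  # "FREE", "FREE-BACKUP", or any other binding state


def allocatable_leases(role: str, leases: Iterable[Lease]) -> List[Lease]:
    """Leases a server should draw on for new allocations.

    The primary draws only on FREE leases; the secondary draws only on
    FREE-BACKUP leases.
    """
    if role == "primary":
        wanted = "FREE"
    elif role == "secondary":
        wanted = "FREE-BACKUP"
    else:
        raise ValueError(f"unknown role: {role!r}")
    return [lease for lease in leases if lease.state == wanted]


if __name__ == "__main__":
    pool = [
        Lease("2001:db8::1", "FREE"),
        Lease("2001:db8::2", "FREE-BACKUP"),
        Lease("2001:db8::3", "OTHER"),  # placeholder for any in-use state
    ]
    print([lease.address for lease in allocatable_leases("primary", pool)])
    print([lease.address for lease in allocatable_leases("secondary", pool)])
```

   Keeping the two pools disjoint in this way lets both servers hand
   out new leases without contacting each other.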

8.11.2.  Transition out of RESOLUTION-INTERRUPTED State

   If a server in RESOLUTION-INTERRUPTED state receives an external
   command informing it that its partner is down, it will transition
   immediately into PARTNER-DOWN state.

   If communications with the other server are restored, then the server
   in RESOLUTION-INTERRUPTED state will transition into
   POTENTIAL-CONFLICT state.

8.12.  CONFLICT-DONE State

   This state indicates that, while the two servers are attempting to
   reintegrate with each other, the primary server has received all of
   the updates from the secondary server.  The primary makes a
   transition into CONFLICT-DONE state so that it can be fully
   responsive to the client load.  There is no operational difference
   between CONFLICT-DONE and NORMAL for the primary server, as in both
   states it responds to all clients' requests.  The distinction between
   CONFLICT-DONE and NORMAL states is necessary in the event that a
   load-balancing extension is ever defined.

8.12.1.  Operation in CONFLICT-DONE State

   A primary server in CONFLICT-DONE state is fully responsive to all
   DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED
   state).

   If communication fails, the server remains in CONFLICT-DONE state.
   If communication becomes "OK", the server remains in CONFLICT-DONE
   state until the conditions for transition out of CONFLICT-DONE state
   are satisfied.

8.12.2.  Transition out of CONFLICT-DONE State

   If communication with the partner fails while in CONFLICT-DONE state,
   then the server will remain in CONFLICT-DONE state.

   When a primary server determines that the secondary server has made a
   transition into NORMAL state, the primary server will also transition
   into NORMAL state.
