RFC 7530


Network File System (NFS) Version 4 Protocol

Part 7 of 14, p. 119 to 139


9.5.  Lease Renewal

   The purpose of a lease is to allow a server to remove stale locks
   that are held by a client that has crashed or is otherwise
   unreachable.  It is not a mechanism for cache consistency, and lease
   renewals may not be denied if the lease interval has not expired.

   The client can implicitly provide a positive indication that it is
   still active and that the associated state held at the server, for
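   the client, is still valid.  Any operation made with a valid clientid
   (DELEGPURGE, LOCK, LOCKT, OPEN, RELEASE_LOCKOWNER, or RENEW) or a
   valid stateid (CLOSE, DELEGRETURN, LOCK, LOCKU, OPEN, OPEN_CONFIRM,
   OPEN_DOWNGRADE, READ, SETATTR, or WRITE) informs the server to renew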
   all of the leases for that client (i.e., all those sharing a given
   client ID).  In the latter case, the stateid must not be one of the
   special stateids (anonymous stateid or READ bypass stateid).

   Note that if the client had restarted or rebooted, the client would
   not be making these requests without issuing the SETCLIENTID/
   SETCLIENTID_CONFIRM sequence.  The use of the SETCLIENTID/
   SETCLIENTID_CONFIRM sequence (one that changes the client verifier)
   notifies the server to drop the locking state associated with the
   client.  SETCLIENTID/SETCLIENTID_CONFIRM never renews a lease.

   If the server has rebooted, the stateids (NFS4ERR_STALE_STATEID
   error) or the client ID (NFS4ERR_STALE_CLIENTID error) will not be
   valid, hence preventing spurious renewals.

   This approach allows for low-overhead lease renewal, which scales
   well.  In the typical case, no extra RPCs are required for lease
   renewal, and in the worst case, one RPC is required every lease
   period (i.e., a RENEW operation).  The number of locks held by the
   client is not a factor since all state for the client is involved
   with the lease renewal action.

   Since all operations that create a new lease also renew existing
   leases, the server must maintain a common lease expiration time for
   all valid leases for a given client.  This lease time can then be
   easily updated upon implicit lease renewal actions.
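
   As a minimal sketch of the common expiration time described above,
   a server could keep one deadline per client ID and push it forward
   on every renewing operation.  The class name, method names, and the
   injectable clock are assumptions made for illustration; they are not
   part of the protocol.

```python
import time


class LeaseTable:
    """Tracks one common lease expiration time per client ID (hypothetical helper)."""

    def __init__(self, lease_time, clock=time.monotonic):
        self.lease_time = lease_time  # lease period in seconds
        self.clock = clock            # injectable clock, which eases testing
        self.expiry = {}              # client ID -> common expiration time

    def renew(self, client_id):
        # Called for any operation carrying a valid client ID or a valid
        # (non-special) stateid: one update renews every lease of the client.
        self.expiry[client_id] = self.clock() + self.lease_time

    def expired(self, client_id):
        # A client with no recorded lease is treated as expired.
        deadline = self.expiry.get(client_id)
        return deadline is None or self.clock() >= deadline
```

   Because the deadline is stored per client rather than per lock, the
   cost of a renewal does not depend on how many locks the client holds.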

9.6.  Crash Recovery

   The important requirement in crash recovery is that both the client
   and the server know when the other has failed.  Additionally, it is
   required that a client sees a consistent view of data across server
   restarts or reboots.  All READ and WRITE operations that may have
   been queued within the client or network buffers must wait until the
   client has successfully recovered the locks protecting the READ and
   WRITE operations.

9.6.1.  Client Failure and Recovery

   In the event that a client fails, the server may recover the client's
   locks when the associated leases have expired.  Conflicting locks
   from another client may only be granted after this lease expiration.
   If the client is able to restart or reinitialize within the lease
   period, the client may be forced to wait the remainder of the lease
   period before obtaining new locks.

   To minimize client delay upon restart, open and lock requests are
   associated with an instance of the client by a client-supplied
   verifier.  This verifier is part of the initial SETCLIENTID call made
   by the client.  The server returns a client ID as a result of the
   SETCLIENTID operation.  The client then confirms the use of the
   client ID with SETCLIENTID_CONFIRM.  The client ID in combination
   with an opaque owner field is then used by the client to identify the
   open-owner for OPEN.  This chain of associations is then used to
   identify all locks for a particular client.

   Since the verifier will be changed by the client upon each
   initialization, the server can compare a new verifier to the verifier
   associated with currently held locks and determine that they do not
   match.  This signifies the client's new instantiation and subsequent
   loss of locking state.  As a result, the server is free to release
   all locks held that are associated with the old client ID that was
   derived from the old verifier.

   Note that the verifier must have the same uniqueness properties as
   the verifier for the COMMIT operation.
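
   The verifier comparison can be sketched as follows.  This is a
   simplified illustration, not a complete SETCLIENTID implementation:
   it collapses the SETCLIENTID/SETCLIENTID_CONFIRM exchange into one
   call, and the class, its fields, and the string lock descriptions
   are all hypothetical.

```python
class ClientRegistry:
    """Maps a client's id string to its verifier and held locks (illustrative only)."""

    def __init__(self):
        self.verifier = {}  # client id string -> verifier from the last SETCLIENTID
        self.locks = {}     # client id string -> set of lock descriptions

    def setclientid(self, id_string, verifier):
        """Record the verifier and return the locks released by a client restart."""
        old = self.verifier.get(id_string)
        released = set()
        if old is not None and old != verifier:
            # A changed verifier signals a new client instantiation, so
            # locks tied to the old instance may be released at once
            # instead of waiting for the lease to expire.
            released = self.locks.pop(id_string, set())
        self.verifier[id_string] = verifier
        self.locks.setdefault(id_string, set())
        return released
```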

9.6.2.  Server Failure and Recovery

   If the server loses locking state (usually as a result of a restart
   or reboot), it must allow clients time to discover this fact and
   re-establish the lost locking state.  The client must be able to
   re-establish the locking state without having the server deny valid
   requests because the server has granted conflicting access to another
   client.  Likewise, if there is the possibility that clients have
   not yet re-established their locking state for a file, the server
   must disallow READ and WRITE operations for that file.  The duration
   of this recovery period is equal to the duration of the lease period.

   A client can determine that server failure (and thus loss of locking
   state) has occurred, when it receives one of two errors.  The
   NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a
   reboot or restart.  The NFS4ERR_STALE_CLIENTID error indicates a
   client ID invalidated by reboot or restart.  When either of these is
   received, the client must establish a new client ID (see
   Section 9.1.1) and re-establish the locking state as discussed below.

   The period of special handling of locking and READs and WRITEs, equal
   in duration to the lease period, is referred to as the "grace
   period".  During the grace period, clients recover locks and the
   associated state by reclaim-type locking requests (i.e., LOCK
   requests with reclaim set to TRUE and OPEN operations with a claim
   type of either CLAIM_PREVIOUS or CLAIM_DELEGATE_PREV).  During the
   grace period, the server must reject READ and WRITE operations and
   non-reclaim locking requests (i.e., other LOCK and OPEN operations)
   with an error of NFS4ERR_GRACE.

   If the server can reliably determine that granting a non-reclaim
   request will not conflict with reclamation of locks by other clients,
   the NFS4ERR_GRACE error does not have to be returned and the
   non-reclaim client request can be serviced.  For the server to be
   able to service READ and WRITE operations during the grace period, it
   must again be able to guarantee that no possible conflict could arise
   between an impending reclaim locking request and the READ or WRITE
   operation.  If the server is unable to offer that guarantee, the
   NFS4ERR_GRACE error must be returned to the client.

   For a server to provide simple, valid handling during the grace
   period, the easiest method is to simply reject all non-reclaim
   locking requests and READ and WRITE operations by returning the
   NFS4ERR_GRACE error.  However, a server may keep information about
   granted locks in stable storage.  With this information, the server
   could determine if a regular lock or READ or WRITE operation can be
   safely processed.

   For example, if a count of locks on a given file is available in
   stable storage, the server can track reclaimed locks for the file,
   and when all reclaims have been processed, non-reclaim locking
   requests may be processed.  This way, the server can ensure that
   non-reclaim locking requests will not conflict with potential reclaim
   requests.  With respect to I/O requests, if the server is able to
   determine that there are no outstanding reclaim requests for a file
   by information from stable storage or another similar mechanism, the
   processing of I/O requests could proceed normally for the file.

   To reiterate, for a server that allows non-reclaim lock and I/O
   requests to be processed during the grace period, it MUST determine
   that no lock subsequently reclaimed will be rejected and that no lock
   subsequently reclaimed would have prevented any I/O operation
   processed during the grace period.
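
   The admission rule for requests arriving during the grace period can
   be sketched as below.  The function name and the
   provably_conflict_free flag are assumptions: the flag stands for
   whatever evidence (for example, lock records in stable storage) lets
   a server guarantee that no later reclaim can conflict.  A return
   value of NFS4_OK here only means that the request may go on to
   normal processing.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_GRACE = "NFS4ERR_GRACE"


def admit_during_grace(is_reclaim, provably_conflict_free=False):
    """Decide how a server that is in its grace period answers a request."""
    if is_reclaim:
        # Reclaim-type LOCK or OPEN: this is what the grace period is for.
        return NFS4_OK
    if provably_conflict_free:
        # Non-reclaim lock or I/O that cannot conflict with any reclaim.
        return NFS4_OK
    # Simplest valid behavior: reject everything else.
    return NFS4ERR_GRACE
```

   A server that keeps no lock information in stable storage would
   never set provably_conflict_free and so would reject every
   non-reclaim request until the grace period ends.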

   Clients should be prepared for the return of NFS4ERR_GRACE errors for
   non-reclaim lock and I/O requests.  In this case, the client should
   employ a retry mechanism for the request.  A delay (on the order of
   several seconds) between retries should be used to avoid overwhelming
   the server.  Further discussion of the general issue is included in
   [Floyd].  The client must account for the server that is able to
   perform I/O and non-reclaim locking requests within the grace period
   as well as those that cannot do so.
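
   A client-side retry loop along these lines might look as follows.
   The helper name, the 5-second default delay, and the retry cap are
   illustrative assumptions; the text above only asks for a delay on
   the order of several seconds.  The send argument is a hypothetical
   callable that issues the request and returns its status.

```python
import time

NFS4ERR_GRACE = "NFS4ERR_GRACE"


def call_with_grace_retry(send, delay=5.0, max_tries=20, sleep=time.sleep):
    """Call send() again while the server answers NFS4ERR_GRACE.

    max_tries is assumed to be at least 1.
    """
    status = NFS4ERR_GRACE
    for attempt in range(max_tries):
        status = send()
        if status != NFS4ERR_GRACE:
            return status
        if attempt + 1 < max_tries:
            # Pause between retries to avoid overwhelming the server.
            sleep(delay)
    return status
```

   Since any other status is returned immediately, the same loop works
   against servers that can service the request during the grace
   period and against those that cannot.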

   A reclaim-type locking request outside the server's grace period can
   only succeed if the server can guarantee that no conflicting lock or
   I/O request has been granted since reboot or restart.

   A server may, upon restart, establish a new value for the lease
   period.  Therefore, clients should, once a new client ID is
   established, refetch the lease_time attribute and use it as the basis
   for lease renewal for the lease associated with that server.
   However, the server must establish, for this restart event, a grace
   period at least as long as the lease period for the previous server
   instantiation.  This allows the client state obtained during the
   previous server instance to be reliably re-established.

9.6.3.  Network Partitions and Recovery

   If the duration of a network partition is greater than the lease
   period provided by the server, the server will not have received a
   lease renewal from the client.  If this occurs, the server may cancel
   the lease and free all locks held for the client.  As a result, all
   stateids held by the client will become invalid or stale.  Once the
   client is able to reach the server after such a network partition,
   all I/O submitted by the client with the now invalid stateids will
   fail with the server returning the error NFS4ERR_EXPIRED.  Once this
   error is received, the client will suitably notify the application
   that held the lock.

9.6.3.1.  Courtesy Locks

   As a courtesy to the client or as an optimization, the server may
   continue to hold locks, including delegations, on behalf of a client
   for which recent communication has extended beyond the lease period,
   delaying the cancellation of the lease.  If the server receives a
   lock or I/O request that conflicts with one of these courtesy locks
   or if it runs out of resources, the server MAY cause lease
   cancellation to occur at that time and henceforth return
   NFS4ERR_EXPIRED when any of the stateids associated with the freed
   locks is used.  If lease cancellation has not occurred and the server
   receives a lock or I/O request that conflicts with one of the
   courtesy locks, the requirements are as follows:

   o  In the case of a courtesy lock that is not a delegation, it MUST
      free the courtesy lock and grant the new request.

   o  In the case of a lock or an I/O request that conflicts with a
      delegation that is being held as a courtesy lock, the server MAY
      delay resolution of the request but MUST NOT reject the request
      and MUST free the delegation and grant the new request eventually.

   o  In the case of a request for a delegation that conflicts with a
      delegation that is being held as a courtesy lock, the server MAY
      grant the new request or not as it chooses, but if it grants the
      conflicting request, the delegation held as a courtesy lock MUST
      be freed.
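
   The three cases above can be summarized in a small decision
   function.  This is a sketch under stated assumptions: the Resolution
   tuple, the function name, and the server_grants_delegation
   parameter (the server's free choice in the third case) are
   hypothetical names introduced here.

```python
from typing import NamedTuple


class Resolution(NamedTuple):
    free_courtesy_lock: bool  # the courtesy lock is (eventually) freed
    grant_request: bool       # the conflicting request is (eventually) granted
    may_delay: bool           # the server MAY delay resolving the request


def resolve_courtesy_conflict(held_is_delegation, request_is_delegation,
                              server_grants_delegation=False):
    """Resolve a request that conflicts with a courtesy lock."""
    if not held_is_delegation:
        # Courtesy lock that is not a delegation: MUST free and grant.
        return Resolution(True, True, False)
    if not request_is_delegation:
        # Lock or I/O request against a courtesy delegation: MAY delay,
        # MUST NOT reject, MUST free the delegation and grant eventually.
        return Resolution(True, True, True)
    # Delegation request against a courtesy delegation: the server chooses,
    # but granting requires freeing the courtesy delegation.
    granted = server_grants_delegation
    return Resolution(granted, granted, False)
```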

   If the server does not reboot or cancel the lease before the network
   partition is healed, when the original client tries to access a
   courtesy lock that was freed, the server SHOULD send back an
   NFS4ERR_BAD_STATEID to the client.  If the client tries to access a
   courtesy lock that was not freed, then the server SHOULD mark all of
   the courtesy locks as implicitly being renewed.

9.6.3.2.  Lease Cancellation

   As a result of lease expiration, leases may be canceled, either
   immediately upon expiration or subsequently, depending on the
   occurrence of a conflicting lock or extension of the period of
   partition beyond what the server will tolerate.

   When a lease is canceled, all locking state associated with it is
   freed, and the use of any of the associated stateids will result in
   NFS4ERR_EXPIRED being returned.  Similarly, the use of the associated
   clientid will result in NFS4ERR_EXPIRED being returned.

   The client should recover from this situation by using SETCLIENTID
   followed by SETCLIENTID_CONFIRM, in order to establish a new
   clientid.  Once a lock is obtained using this clientid, a lease will
   be established.

9.6.3.3.  Client's Reaction to a Freed Lock

   There is no way for a client to predetermine how a given server is
   going to behave during a network partition.  When the partition
   heals, the client still has either all of its locks, some of its
   locks, or none of them.  The client will be able to examine the
   various error return values to determine its response.


   NFS4ERR_EXPIRED:
      All locks have been freed as a result of a lease cancellation that
      occurred during the partition.  The client should use a
      SETCLIENTID to recover.

   NFS4ERR_ADMIN_REVOKED:
      The current lock has been revoked before, during, or after the
      partition.  The client SHOULD handle this error as it normally
      would.

   NFS4ERR_BAD_STATEID:
      The current lock has been revoked/released during the partition,
      and the server did not reboot.  Other locks MAY still be renewed.
      The client need not do a SETCLIENTID and instead SHOULD probe via
      a RENEW call.

   NFS4ERR_RECLAIM_BAD:
      The current lock has been revoked during the partition, and the
      server rebooted.  The server might have no information on the
      other locks.  They may still be renewable.

   NFS4ERR_NO_GRACE:
      The client's locks have been revoked during the partition, and the
      server rebooted.  None of the client's locks will be renewable.

   NFS4ERR_OLD_STATEID:
      The server has not rebooted.  The client SHOULD handle this error
      as it normally would.

9.6.3.4.  Edge Conditions

   When a network partition is combined with a server reboot, then both
   the server and client have responsibilities to ensure that the client
   does not reclaim a lock that it should no longer be able to access.
   Briefly, those are:

   o  Client's responsibility: A client MUST NOT attempt to reclaim any
      locks that it did not hold at the end of its most recent
      successfully established client lease.

   o  Server's responsibility: A server MUST NOT allow a client to
      reclaim a lock unless it knows that it could not have since
      granted a conflicting lock.  However, in deciding whether a
      conflicting lock could have been granted, it is permitted to
      assume that its clients are responsible, as above.

   A server may consider a client's lease "successfully established"
   once it has received an OPEN operation from that client.

   The above are directed to CLAIM_PREVIOUS reclaims and not to
   CLAIM_DELEGATE_PREV reclaims, which generally do not involve a server
   reboot.  However, when a server persistently stores delegation
   information to support CLAIM_DELEGATE_PREV across a period in which
   both client and server are down at the same time, similar strictures
   apply.

   The next sections give examples showing what can go wrong if these
   responsibilities are neglected and also provide examples of server
   implementation strategies that could meet a server's
   responsibilities.

9.6.3.4.1.  First Server Edge Condition

   The first edge condition has the following scenario:

   1.  Client A acquires a lock.

   2.  Client A and the server experience mutual network partition, such
       that client A is unable to renew its lease.

   3.  Client A's lease expires, so the server releases the lock.

   4.  Client B acquires a lock that would have conflicted with that of
       client A.

   5.  Client B releases the lock.

   6.  The server reboots.

   7.  The network partition between client A and the server heals.

   8.  Client A issues a RENEW operation and gets back an
       NFS4ERR_STALE_CLIENTID.

   9.  Client A reclaims its lock within the server's grace period.

   Thus, at the final step, the server has erroneously granted
   client A's lock reclaim.  If client B modified the object the lock
   was protecting, client A will experience object corruption.

9.6.3.4.2.  Second Server Edge Condition

   The second known edge condition follows:

   1.   Client A acquires a lock.

   2.   The server reboots.

   3.   Client A and the server experience mutual network partition,
        such that client A is unable to reclaim its lock within the
        grace period.

   4.   The server's reclaim grace period ends.  Client A has no locks
        recorded on the server.

   5.   Client B acquires a lock that would have conflicted with that of
        client A.

   6.   Client B releases the lock.

   7.   The server reboots a second time.

   8.   The network partition between client A and the server heals.

   9.   Client A issues a RENEW operation and gets back an
        NFS4ERR_STALE_CLIENTID.

   10.  Client A reclaims its lock within the server's grace period.

   As with the first edge condition, the final step of the scenario of
   the second edge condition has the server erroneously granting
   client A's lock reclaim.

9.6.3.4.3.  Handling Server Edge Conditions

   In both of the above examples, the client attempts reclaim of a lock
   that it held at the end of its most recent successfully established
   lease; thus, it has fulfilled its responsibility.

   The server, however, has failed, by granting a reclaim, despite
   having granted a conflicting lock since the reclaimed lock was last
   granted.

   Solving these edge conditions requires that the server either (1)
   assume after it reboots that an edge condition occurs, and thus
   return NFS4ERR_NO_GRACE for all reclaim attempts, or (2) record some
   information in stable storage.  The amount of information the server
   records in stable storage is in inverse proportion to how harsh the
   server wants to be whenever the edge conditions occur.  The server
   that is completely tolerant of all edge conditions will record in
   stable storage every lock that is acquired, removing the lock record
   from stable storage only when the lock is unlocked by the client and
   the lock's owner advances the sequence number such that the lock
   release is not the last stateful event for the owner's sequence.  For
   the two aforementioned edge conditions, the harshest a server can be,
   and still support a grace period for reclaims, requires that the
   server record in stable storage some minimal information.  For
   example, a server implementation could, for each client, save in
   stable storage a record containing:

   o  the client's id string.

   o  a boolean that indicates if the client's lease expired or if there
      was administrative intervention (see Section 9.8) to revoke a
      byte-range lock, share reservation, or delegation.

   o  a timestamp that is updated the first time after a server boot or
      reboot the client acquires byte-range locking, share reservation,
      or delegation state on the server.  The timestamp need not be
      updated on subsequent lock requests until the server reboots.

   The server implementation would also record in stable storage the
   timestamps from the two most recent server reboots.

   Assuming the above record keeping, for the first edge condition,
   after the server reboots, the record that client A's lease expired
   means that another client could have acquired a conflicting record
   lock, share reservation, or delegation.  Hence, the server must
   reject a reclaim from client A with the error NFS4ERR_NO_GRACE or
   NFS4ERR_RECLAIM_BAD.

   For the second edge condition, after the server reboots for a second
   time, the record that the client had an unexpired record lock, share
   reservation, or delegation established before the server's previous
   incarnation means that the server must reject a reclaim from client A
   with the error NFS4ERR_NO_GRACE or NFS4ERR_RECLAIM_BAD.
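
   One possible reading of the record keeping above is sketched below.
   The record layout mirrors the three items listed earlier; the field
   names, the function name, and the choice to always answer with
   NFS4ERR_NO_GRACE (NFS4ERR_RECLAIM_BAD is equally permitted) are
   assumptions.  The previous_boot_time argument is taken to be the
   older of the two reboot timestamps kept in stable storage, that is,
   the start of the previous server incarnation.

```python
from dataclasses import dataclass

NFS4_OK = "NFS4_OK"
NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


@dataclass
class ClientRecord:
    id_string: str            # the client's id string
    expired_or_revoked: bool  # lease expired or state administratively revoked
    first_state_time: float   # first state acquisition after a server boot


def check_reclaim(record, previous_boot_time):
    """Decide whether a reclaim in the current grace period may proceed."""
    if record is None:
        # Nothing is known about this client, so err on the harsh side.
        return NFS4ERR_NO_GRACE
    if record.expired_or_revoked:
        # First edge condition: a conflicting lock may have been granted.
        return NFS4ERR_NO_GRACE
    if record.first_state_time < previous_boot_time:
        # Second edge condition: the state predates the previous incarnation
        # and was never re-established during it.
        return NFS4ERR_NO_GRACE
    return NFS4_OK
```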

   Regardless of the level and approach to record keeping, the server
   MUST implement one of the following strategies (which apply to
   reclaims of share reservations, byte-range locks, and delegations):

   1.  Reject all reclaims with NFS4ERR_NO_GRACE.  This is extremely
       harsh but is necessary if the server does not want to record lock
       state in stable storage.

   2.  Record sufficient state in stable storage to meet its
       responsibilities.  In doubt, the server should err on the side of
       being harsh.

       In the event that, after a server reboot, the server determines
       that there is unrecoverable damage or corruption to stable
       storage, then for all clients and/or locks affected, the server
       MUST return NFS4ERR_NO_GRACE.

9.6.3.4.4.  Client Edge Condition

   A third edge condition affects the client and not the server.  If the
   server reboots in the middle of the client reclaiming some locks and
   then a network partition is established, the client might be in the
   situation of having reclaimed some, but not all, locks.  In that
   case, a conservative client would assume that the non-reclaimed locks
   were revoked.

   The third known edge condition follows:

   1.   Client A acquires a lock 1.

   2.   Client A acquires a lock 2.

   3.   The server reboots.

   4.   Client A issues a RENEW operation and gets back an
        NFS4ERR_STALE_CLIENTID.

   5.   Client A reclaims its lock 1 within the server's grace period.

   6.   Client A and the server experience mutual network partition,
        such that client A is unable to reclaim its remaining locks
        within the grace period.

   7.   The server's reclaim grace period ends.

   8.   Client B acquires a lock that would have conflicted with
        client A's lock 2.

   9.   Client B releases the lock.

   10.  The server reboots a second time.

   11.  The network partition between client A and the server heals.

   12.  Client A issues a RENEW operation and gets back an
        NFS4ERR_STALE_CLIENTID.

   13.  Client A reclaims both lock 1 and lock 2 within the server's
        grace period.

   At the last step, the client reclaims lock 2 as if it had held that
   lock continuously, when in fact a conflicting lock was granted to
   client B.

   This occurs because the client failed its responsibility, by
   attempting to reclaim lock 2 even though it had not held that lock at
   the end of the lease that was established by the SETCLIENTID after
   the first server reboot.  (The client did hold lock 2 on a previous
   lease, but it is only the most recent lease that matters.)
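
   The bookkeeping a client needs in order to meet this responsibility
   can be sketched as follows.  The class and method names are
   hypothetical, and locks are represented by plain strings.  The key
   point is that the set of reclaimable locks is rebuilt from scratch
   at every server reboot.

```python
class ReclaimTracker:
    """Client-side record of locks held under the most recent lease only."""

    def __init__(self):
        self.held = set()  # locks held under the current lease

    def acquired(self, lock):
        # A lock newly granted, or successfully reclaimed, under this lease.
        self.held.add(lock)

    def released(self, lock):
        self.held.discard(lock)

    def server_rebooted(self):
        # Only locks held at the end of the most recent lease may be
        # reclaimed.  Locks from an earlier lease that were never
        # reclaimed are absent from self.held and so are dropped here.
        eligible, self.held = self.held, set()
        return eligible
```

   Replaying the scenario above: after the first reboot, both locks are
   eligible; only lock 1 is reclaimed before the partition; so after
   the second reboot, only lock 1 is eligible.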

   A server could avoid this situation by rejecting the reclaim of
   lock 2.  However, to do so accurately, it would have to ensure that
   additional information about individual locks held survives a reboot.
   Server implementations are not required to do that, so the client
   must not assume that the server will.

   Instead, a client MUST reclaim only those locks that it successfully
   acquired from the previous server instance, omitting any that it
   failed to reclaim before a new reboot.  Thus, in the last step above,
   client A should reclaim only lock 1.

9.6.3.4.5.  Client's Handling of Reclaim Errors

   A mandate for the client's handling of the NFS4ERR_NO_GRACE and
   NFS4ERR_RECLAIM_BAD errors is outside the scope of this
   specification, since the strategies for such handling are very
   dependent on the client's operating environment.  However, one
   potential approach is described below.

   When the client's reclaim fails, it could examine the change
   attribute of the objects the client is trying to reclaim state for,
   and use that to determine whether to re-establish the state via
   normal OPEN or LOCK requests.  This is acceptable, provided the
   client's operating environment allows it.  In other words, the client
   implementer is advised to document the behavior for his users.  The
   client could also inform the application that its byte-range lock or
   share reservations (whether they were delegated or not) have been
   lost, such as via a UNIX signal, a GUI pop-up window, etc.  See
   Section 10.5 for a discussion of what the client should do for
   dealing with unreclaimed delegations on client state.

   For further discussion of revocation of locks, see Section 9.8.

9.7.  Recovery from a Lock Request Timeout or Abort

   In the event a lock request times out, a client may decide to not
   retry the request.  The client may also abort the request when the
   process for which it was issued is terminated (e.g., in UNIX due to a
   signal).  It is possible, though, that the server received the
   request and acted upon it.  This would change the state on the server
   without the client being aware of the change.  It is paramount that
   the client resynchronize state with the server before it attempts any
   other operation that takes a seqid and/or a stateid with the same
   state-owner.  This is straightforward to do without a special
   resynchronize operation.

   Since the server maintains the last lock request and response
   received on the state-owner, for each state-owner, the client should
   cache the last lock request it sent such that the lock request did
   not receive a response.  From this, the next time the client does a
   lock operation for the state-owner, it can send the cached request,
   if there is one, and if the request was one that established state
   (e.g., a LOCK or OPEN operation), the server will return the cached
   result or, if it never saw the request, perform it.  The client can
   follow up with a request to remove the state (e.g., a LOCKU or CLOSE
   operation).  With this approach, the sequencing and stateid
   information on the client and server for the given state-owner will
   resynchronize, and in turn the lock state will resynchronize.

9.8.  Server Revocation of Locks

   At any point, the server can revoke locks held by a client and the
   client must be prepared for this event.  When the client detects that
   its locks have been or may have been revoked, the client is
   responsible for validating the state information between itself and
   the server.  Validating locking state for the client means that it
   must verify or reclaim state for each lock currently held.

   The first instance of lock revocation is upon server reboot or
   re-initialization.  In this instance, the client will receive an
   error (NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID), and the
   client will proceed with normal crash recovery as described in the
   previous section.

   The second lock revocation event is the inability to renew the lease
   before expiration.  While this is considered a rare or unusual event,
   the client must be prepared to recover.  Both the server and client
   will be able to detect the failure to renew the lease and are capable
   of recovering without data corruption.  For the server, it tracks the
   last renewal event serviced for the client and knows when the lease
   will expire.  Similarly, the client must track operations that will
   renew the lease period.  Using the time that each such request was
   sent and the time that the corresponding reply was received, the
   client should bound the time that the corresponding renewal could
   have occurred on the server and thus determine if it is possible that
   a lease period expiration could have occurred.
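
   The bounding argument can be made concrete with a small calculation.
   The renewal took effect on the server at some moment between the
   time the request was sent and the time the reply was received, so
   the lease cannot run out before send_time + lease_time.  The
   function name and the three result strings are assumptions, as is
   the premise that all times come from a single client clock that
   advances at roughly the same rate as the server's.

```python
def lease_status(send_time, reply_time, now, lease_time):
    """Classify the lease from the client's point of view.

    send_time and reply_time belong to the most recent renewing request.
    """
    if now < send_time + lease_time:
        # Even if the server renewed at the earliest possible moment,
        # the lease period has not yet run out.
        return "valid"
    if now < reply_time + lease_time:
        # Expired if the server renewed early in the window, still
        # valid if it renewed late: expiration is possible.
        return "possibly expired"
    # Even the latest possible renewal is more than one lease period old.
    return "expired"
```

   For example, with a 90-second lease and a request sent at t=100
   whose reply arrived at t=102, the client can rely on the lease only
   until t=190, and it should treat its locks as unvalidated from then
   on.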

   The third lock revocation event can occur as a result of
   administrative intervention within the lease period.  While this is
   considered a rare event, it is possible that the server's
   administrator has decided to release or revoke a particular lock held
   by the client.  As a result of revocation, the client will receive an
   error of NFS4ERR_ADMIN_REVOKED.  In this instance, the client may
   assume that only the state-owner's locks have been lost.  The client
   notifies the lock holder appropriately.  The client cannot assume
   that the lease period has been renewed as a result of a failed
   operation.

   When the client determines the lease period may have expired, the
   client must mark all locks held for the associated lease as
   "unvalidated".  This means the client has been unable to re-establish
   or confirm the appropriate lock state with the server.  As described
   in Section 9.6, there are scenarios in which the server may grant
   conflicting locks after the lease period has expired for a client.
   When it is possible that the lease period has expired, the client
   must validate each lock currently held to ensure that a conflicting
   lock has not been granted.  The client may accomplish this task by
   issuing an I/O request; if there is no relevant I/O pending, a
   zero-length read specifying the stateid associated with the lock in
   question can be synthesized to trigger the renewal.  If the response
   to the request is success, the client has validated all of the locks
   governed by that stateid and re-established the appropriate state
   between itself and the server.

   If the I/O request is not successful, then one or more of the locks
   associated with the stateid were revoked by the server, and the
   client must notify the owner.

9.9.  Share Reservations

   A share reservation is a mechanism to control access to a file.  It
   is a separate and independent mechanism from byte-range locking.
   When a client opens a file, it issues an OPEN operation to the server
   specifying the type of access required (READ, WRITE, or BOTH) and the
   type of access to deny others (OPEN4_SHARE_DENY_NONE,
   OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or
   OPEN4_SHARE_DENY_BOTH).  If the OPEN fails, the client will fail the
   application's open request.

   Pseudo-code definition of the semantics:

     if (request.access == 0)
             return (NFS4ERR_INVAL)
     else if ((request.access & file_state.deny) ||
         (request.deny & file_state.access))
             return (NFS4ERR_DENIED)

   This checking of share reservations on OPEN is done with no exception
   for an existing OPEN for the same open-owner.

   The constants used for the OPEN and OPEN_DOWNGRADE operations for the
   access and deny fields are as follows:

   const OPEN4_SHARE_ACCESS_READ   = 0x00000001;
   const OPEN4_SHARE_ACCESS_WRITE  = 0x00000002;
   const OPEN4_SHARE_ACCESS_BOTH   = 0x00000003;

   const OPEN4_SHARE_DENY_NONE     = 0x00000000;
   const OPEN4_SHARE_DENY_READ     = 0x00000001;
   const OPEN4_SHARE_DENY_WRITE    = 0x00000002;
   const OPEN4_SHARE_DENY_BOTH     = 0x00000003;
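
   As a runnable restatement of the pseudo-code, using the constants
   just listed, the check might be written as below.  The function name
   and its flat argument list are illustrative, and the NFS4_OK return
   for the non-conflicting case is an assumption, since the pseudo-code
   shows only the two failure paths.  The file_access and file_deny
   arguments stand for the union of the access and deny bits of all
   existing OPENs on the file.

```python
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003

OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_READ = 0x00000001
OPEN4_SHARE_DENY_WRITE = 0x00000002
OPEN4_SHARE_DENY_BOTH = 0x00000003


def check_share(request_access, request_deny, file_access, file_deny):
    """Apply the share reservation test to a new OPEN request."""
    if request_access == 0:
        return "NFS4ERR_INVAL"
    # Conflict if the request wants access that is already denied, or
    # wants to deny access that is already held.
    if (request_access & file_deny) or (request_deny & file_access):
        return "NFS4ERR_DENIED"
    return "NFS4_OK"
```

   Two opens for read that each deny write are compatible, for
   instance, because neither one's access bits intersect the other's
   deny bits.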

9.10.  OPEN/CLOSE Operations

   To provide correct share semantics, a client MUST use the OPEN
   operation to obtain the initial filehandle and indicate the desired
   access and what access, if any, to deny.  Even if the client intends
   to use one of the special stateids (anonymous stateid or READ bypass
   stateid), it must still obtain the filehandle for the regular file
   with the OPEN operation so the appropriate share semantics can be

   applied.  Clients that do not have a deny mode built into their
   programming interfaces for opening a file should request a deny mode
   of OPEN4_SHARE_DENY_NONE.

   The OPEN operation with the CREATE flag also subsumes the CREATE
   operation for regular files as used in previous versions of the NFS
   protocol.  This allows a create with a share to be done atomically.

   The CLOSE operation removes all share reservations held by the
   open-owner on that file.  If byte-range locks are held, the client
   SHOULD release all locks before issuing a CLOSE.  The server MAY free
   all outstanding locks on CLOSE, but some servers may not support the
   CLOSE of a file that still has byte-range locks held.  The server
   MUST return failure, NFS4ERR_LOCKS_HELD, if any locks would exist
   after the CLOSE.

   The LOOKUP operation will return a filehandle without establishing
   any lock state on the server.  Without a valid stateid, the server
   will assume that the client has the least access.  For example, if
   one client opened a file with OPEN4_SHARE_DENY_BOTH and another
   client accesses the file via a filehandle obtained through LOOKUP,
   the second client could only read the file using the special READ
   bypass stateid.  The second client could not WRITE the file at all
   because it would not have a valid stateid from OPEN and the special
   anonymous stateid would not be allowed access.

9.10.1.  Close and Retention of State Information

   Since a CLOSE operation requests deallocation of a stateid, dealing
   with retransmission of the CLOSE may pose special difficulties, since
   the state information, which normally would be used to determine the
   state of the open file being designated, might be deallocated,
   resulting in an NFS4ERR_BAD_STATEID error.

   Servers may deal with this problem in a number of ways.  To provide
   the greatest degree of assurance that the protocol is being used
   properly, a server should, rather than deallocate the stateid, mark
   it as close-pending, and retain the stateid with this status, until
   later deallocation.  In this way, a retransmitted CLOSE can be
   recognized since the stateid points to state information with this
   distinctive status, so that it can be handled without error.

   When adopting this strategy, a server should retain the state
   information until the earliest of:

   o  Another validly sequenced request for the same open-owner, that is
      not a retransmission.

   o  The time that an open-owner is freed by the server due to a period
      with no activity.

   o  All locks for the client are freed as a result of a SETCLIENTID.

   Servers may avoid this complexity, at the cost of less complete
   protocol error checking, by simply responding NFS4_OK in the event of
   a CLOSE for a deallocated stateid, on the assumption that this case
   must be caused by a retransmitted close.  When adopting this
   approach, it is desirable to at least log an error when returning a
   no-error indication in this situation.  If the server maintains a
   reply-cache mechanism, it can verify that the CLOSE is indeed a
   retransmission and avoid error logging in most cases.
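
   The first strategy above (retaining the stateid in a close-pending
   status) might be sketched as follows.  This is a minimal model, not
   a server implementation: the class CloseTracker, its method names,
   and the dictionary layout are all hypothetical, and seqid checking
   and the other two release conditions are left out.

```python
class CloseTracker:
    """Hypothetical server-side table mapping stateid -> [open_owner, status]."""

    def __init__(self):
        self._states = {}

    def open(self, stateid, owner):
        # A new, validly sequenced request for this open-owner ends the
        # retention of any close-pending state it still has.
        self.release_close_pending(owner)
        self._states[stateid] = [owner, "open"]

    def close(self, stateid):
        entry = self._states.get(stateid)
        if entry is None:
            return "NFS4ERR_BAD_STATEID"
        # Retain the entry instead of deleting it, so that a retransmitted
        # CLOSE still finds it and can be answered without error.
        entry[1] = "close-pending"
        return "NFS4_OK"

    def release_close_pending(self, owner):
        """Deallocate retained entries that belong to the given open-owner."""
        self._states = {
            sid: entry
            for sid, entry in self._states.items()
            if not (entry[0] == owner and entry[1] == "close-pending")
        }
```

   In this model, a repeated CLOSE for the same stateid returns NFS4_OK
   until the open-owner issues its next request, after which the
   stateid is no longer recognized.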

9.11.  Open Upgrade and Downgrade

   When an OPEN is done for a file and the open-owner for which the open
   is being done already has the file open, the result is to upgrade the
   open file status maintained on the server to include the access and
   deny bits specified by the new OPEN as well as those for the existing
   OPEN.  The result is that there is one open file, as far as the
   protocol is concerned, and it includes the union of the access and
   deny bits for all of the OPEN requests completed.  Only a single
   CLOSE will be done to reset the effects of both OPENs.  Note that the
   client, when issuing the OPEN, may not know that the same file is in
   fact being opened.  The above only applies if both OPENs result in
   the OPENed object being designated by the same filehandle.

   When the server chooses to export multiple filehandles corresponding
   to the same file object and returns different filehandles on two
   different OPENs of the same file object, the server MUST NOT "OR"
   together the access and deny bits and coalesce the two open files.
   Instead, the server must maintain separate OPENs with separate
   stateids and will require separate CLOSEs to free them.

   When multiple open files on the client are merged into a single open
   file object on the server, the close of one of the open files (on the
   client) may necessitate change of the access and deny status of the
   open file on the server.  This is because the union of the access and
   deny bits for the remaining opens may be smaller (i.e., a proper
   subset) than previously.  The OPEN_DOWNGRADE operation is used to
   make the necessary change, and the client should use it to update the
   server so that share reservation requests by other clients are
   handled properly.  The stateid returned has the same "other" field as
   that passed to the server.  The seqid value in the returned stateid
   MUST be incremented (Section 9.1.4), even in situations in which
   there has been no change to the access and deny bits for the file.
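
   The client-side bookkeeping implied above can be sketched as
   follows.  The helper names union_bits and downgrade_after_close are
   hypothetical, and each local open is modeled simply as an
   (access, deny) pair of the bit values defined in Section 9.9.

```python
def union_bits(opens):
    """Return the (access, deny) union over a list of (access, deny) pairs."""
    access = 0
    deny = 0
    for open_access, open_deny in opens:
        access |= open_access
        deny |= open_deny
    return access, deny


def downgrade_after_close(opens, closing_index):
    """Decide what the client should send when one local open is closed.

    Returns the (access, deny) pair to pass to OPEN_DOWNGRADE, or None
    when no downgrade is needed.
    """
    remaining = opens[:closing_index] + opens[closing_index + 1:]
    if not remaining:
        return None  # last local open: the client would send CLOSE instead
    before = union_bits(opens)
    after = union_bits(remaining)
    return after if after != before else None
```

   With one read-only open (1, 0) and one write open that denies writes
   (2, 2), closing the second leaves a union of (1, 0), so an
   OPEN_DOWNGRADE to those bits is called for.  Closing one of two
   identical opens changes nothing and yields None.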

9.12.  Short and Long Leases

   When determining the time period for the server lease, the usual
   lease trade-offs apply.  Short leases are good for fast server
   recovery at a cost of increased RENEW or READ (with zero length)
   requests.  Longer leases are certainly kinder and gentler to servers
   trying to handle very large numbers of clients.  The number of RENEW
   requests drops in proportion to the lease time.  The disadvantages of
   long leases are slower recovery after server failure (the server must
   wait for the leases to expire and the grace period to elapse before
   granting new lock requests) and increased file contention (if the
   client fails to transmit an unlock request, then the server must wait
   for lease expiration before granting new locks).

   Long leases are usable if the server is able to store lease state in
   non-volatile memory.  Upon recovery, the server can reconstruct the
   lease state from its non-volatile memory and continue operation with
   its clients, and therefore long leases would not be an issue.

9.13.  Clocks, Propagation Delay, and Calculating Lease Expiration

   To avoid the need for synchronized clocks, lease times are granted by
   the server as a time delta.  However, there is a requirement that the
   client and server clocks do not drift excessively over the duration
   of the lock.  There is also the issue of propagation delay across the
   network -- which could easily be several hundred milliseconds -- as
   well as the possibility that requests will be lost and need to be
   retransmitted.

   To take propagation delay into account, the client should subtract it
   from lease times (e.g., if the client estimates the one-way
   propagation delay as 200 msec, then it can assume that the lease is
   already 200 msec old when it gets it).  In addition, it will take
   another 200 msec to get a response back to the server.  So the client
   must send a lock renewal or write data back to the server 400 msec
   before the lease would expire.
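
   The arithmetic in this example can be written out as a short sketch.
   The function name and the choice of integer milliseconds on the
   client's local clock are assumptions made for illustration.

```python
def latest_renewal_send_time(received_at_ms, lease_time_ms, one_way_delay_ms):
    """Latest local time (ms) at which a renewal can still be sent.

    The lease is already one propagation delay old on arrival, and the
    renewal needs one more propagation delay to reach the server.
    """
    return received_at_ms + lease_time_ms - 2 * one_way_delay_ms
```

   With a 90-second lease received at local time 0 and an estimated
   one-way delay of 200 msec, the renewal has to leave the client by
   89,600 msec, that is, 400 msec before the nominal expiration.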

   The server's lease period configuration should take into account the
   network distance of the clients that will be accessing the server's
   resources.  It is expected that the lease period will take into
   account the network propagation delays and other network delay
   factors for the client population.  Since the protocol does not allow
   for an automatic method to determine an appropriate lease period, the
   server's administrator may have to tune the lease period.

9.14.  Migration, Replication, and State

   When responsibility for handling a given file system is transferred
   to a new server (migration) or the client chooses to use an
   alternative server (e.g., in response to server unresponsiveness) in
   the context of file system replication, the appropriate handling of
   state shared between the client and server (i.e., locks, leases,
   stateids, and client IDs) is as described below.  The handling
   differs between migration and replication.  For a related discussion
   of file server state and recovery of same, see the subsections of
   Section 9.6.

   In cases in which one server is expected to accept opaque values from
   the client that originated from another server, the servers SHOULD
   encode the opaque values in big-endian byte order.  If this is done,
   the new server will be able to parse values like stateids, directory
   cookies, filehandles, etc. even if their native byte order is
   different from that of other servers cooperating in the replication
   and migration of the file system.
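
   As a sketch of what fixed big-endian encoding looks like in
   practice, the following packs a 12-byte value such as the "other"
   field of a stateid.  The internal layout (a 32-bit boot counter
   followed by a 64-bit table index) is purely hypothetical; the
   protocol treats these bytes as opaque and each server defines its
   own layout.

```python
import struct

# ">" selects big-endian byte order with no padding: 4 + 8 = 12 bytes.
_OTHER_LAYOUT = struct.Struct(">IQ")


def pack_other(boot_counter, table_index):
    """Encode the hypothetical fields in big-endian byte order."""
    return _OTHER_LAYOUT.pack(boot_counter, table_index)


def unpack_other(data):
    """Decode the fields; the result is the same on any host byte order."""
    return _OTHER_LAYOUT.unpack(data)
```

   Because the byte order is fixed by the format string rather than
   taken from the host, a cooperating server of either endianness
   decodes the same field values.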

9.14.1.  Migration and State

   In the case of migration, the servers involved in the migration of a
   file system SHOULD transfer all server state from the original server
   to the new server.  This must be done in a way that is transparent to
   the client.  This state transfer will ease the client's transition
   when a file system migration occurs.  If the servers are successful
   in transferring all state, the client will continue to use stateids
   assigned by the original server.  Therefore, the new server must
   recognize these stateids as valid.  This holds true for the client ID
   as well.  Since responsibility for an entire file system is
   transferred with a migration event, there is no possibility that
   conflicts will arise on the new server as a result of the transfer of
   locks.

   As part of the transfer of information between servers, leases would
   be transferred as well.  The leases being transferred to the new
   server will typically have a different expiration time from those for
   the same client, previously on the old server.  To maintain the
   property that all leases on a given server for a given client expire
   at the same time, the server should advance the expiration time to
   the later of the leases being transferred or the leases already
   present.  This allows the client to maintain lease renewal of both
   classes without special effort.
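
   The rule for combining expiration times amounts to taking the later
   of the two values.  A minimal sketch, in which the function name and
   the use of None for "this client has no leases on the new server
   yet" are assumptions:

```python
def merged_lease_expiry(existing_expiry, transferred_expiry):
    """Return the single expiration time to use for all of a client's leases.

    existing_expiry may be None when the client held no leases on the
    destination server before the transfer.
    """
    if existing_expiry is None:
        return transferred_expiry
    return max(existing_expiry, transferred_expiry)
```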

   The servers may choose not to transfer the state information upon
   migration.  However, this choice is discouraged.  In this case, when
   the client presents state information from the original server (e.g.,
   in a RENEW operation or a READ operation of zero length), the client
   must be prepared to receive either NFS4ERR_STALE_CLIENTID or
   NFS4ERR_STALE_STATEID from the new server.  The client should then
   recover its state information as it normally would in response to a
   server failure.  The new server must take care to allow for the
   recovery of state information as it would in the event of server
   restart.

   A client SHOULD re-establish new callback information with the new
   server as soon as possible, according to sequences described in
   Sections 16.33 and 16.34.  This ensures that server operations are
   not blocked by the inability to recall delegations.

9.14.2.  Replication and State

   Since client switch-over in the case of replication is not under
   server control, the handling of state is different.  In this case,
   leases, stateids, and client IDs do not have validity across a
   transition from one server to another.  The client must re-establish
   its locks on the new server.  This can be compared to the
   re-establishment of locks by means of reclaim-type requests after a
   server reboot.  The difference is that the server has no provision to
   distinguish requests reclaiming locks from those obtaining new locks
   or to defer the latter.  Thus, a client re-establishing a lock on the
   new server (by means of a LOCK or OPEN request), may have the
   requests denied due to a conflicting lock.  Since replication is
   intended for read-only use of file systems, such denial of locks
   should not pose large difficulties in practice.  When an attempt to
   re-establish a lock on a new server is denied, the client should
   treat the situation as if its original lock had been revoked.
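
   The client behavior described here can be outlined as below.  The
   function reestablish_locks and the try_lock callable (standing in
   for whatever sends the LOCK or OPEN request to the new server and
   returns a status name) are hypothetical.

```python
def reestablish_locks(locks, try_lock):
    """Re-request each lock on the new server after a replica switch.

    Returns (kept, revoked): locks that were granted again, and locks
    the client should treat as if they had been revoked.
    """
    kept = []
    revoked = []
    for lock in locks:
        if try_lock(lock) == "NFS4_OK":
            kept.append(lock)
        else:
            revoked.append(lock)
    return kept, revoked
```

   The new server sees these as ordinary requests, which is why a
   conflicting lock held by another client leads to a denial rather
   than to any deferral.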

9.14.3.  Notification of Migrated Lease

   In the case of lease renewal, the client may not be submitting
   requests for a file system that has been migrated to another server.
   This can occur because of the implicit lease renewal mechanism.  The
   client renews leases for all file systems when submitting a request
   to any one file system at the server.

   In order for the client to schedule renewal of leases that may have
   been relocated to the new server, the client must find out about
   lease relocation before those leases expire.  To accomplish this, all
   operations that implicitly renew leases for a client (such as OPEN,
   CLOSE, READ, WRITE, RENEW, LOCK, and others) will return the error
   NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
   renewed has been transferred to a new server.  This condition will
   continue until the client receives an NFS4ERR_MOVED error and the
   server receives the subsequent GETATTR(fs_locations) for an access to
   each file system for which a lease has been moved to a new server.
   By convention, the compound including the GETATTR(fs_locations)
   SHOULD append a RENEW operation to permit the server to identify the
   client doing the access.

   Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports
   file system migration MUST probe all file systems from that server on
   which it holds open state.  Once the client has successfully probed
   all those file systems that are migrated, the server MUST resume
   normal handling of stateful requests from that client.

   In order to support legacy clients that do not handle the
   NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after
   a wait of at least two lease periods, at which time it will resume
   normal handling of stateful requests from all clients.  If a client
   attempts to access the migrated files, the server MUST reply with
   NFS4ERR_MOVED.

   When the client receives an NFS4ERR_MOVED error, the client can
   follow the normal process to obtain the new server information
   (through the fs_locations attribute) and perform renewal of those
   leases on the new server.  If the server has not had state
   transferred to it transparently, the client will receive either
   NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server,
   as described above.  The client can then recover state information as
   it does in the event of server failure.
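
   The client's reaction to NFS4ERR_LEASE_MOVED might be outlined as
   follows.  This is a sketch under stated assumptions: the probe and
   fetch_locations callables are hypothetical stand-ins for an access
   to the file system and for GETATTR(fs_locations), respectively, and
   error handling other than NFS4ERR_MOVED is omitted.

```python
def find_moved_filesystems(filesystems, probe, fetch_locations):
    """Probe every file system on which the client holds open state.

    Returns a dict mapping each migrated file system to the location
    information obtained for it.
    """
    moved = {}
    for fs in filesystems:
        if probe(fs) == "NFS4ERR_MOVED":
            # Learn where the file system went so that its leases can
            # be renewed on the new server.
            moved[fs] = fetch_locations(fs)
    return moved
```

   The client would then renew its leases, or recover its state, on
   each server named in the result.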

9.14.4.  Migration and the lease_time Attribute

   In order that the client may appropriately manage its leases in the
   case of migration, the destination server must establish proper
   values for the lease_time attribute.

   When state is transferred transparently, that state should include
   the correct value of the lease_time attribute.  The lease_time
   attribute on the destination server must never be less than that on
   the source since this would result in premature expiration of leases
   granted by the source server.  Upon migration, in which state is
   transferred transparently, the client is under no obligation to
   refetch the lease_time attribute and may continue to use the value
   previously fetched (on the source server).

   If state has not been transferred transparently (i.e., the client
   sees a real or simulated server reboot), the client should fetch the
   value of lease_time on the new (i.e., destination) server and use it
   for subsequent locking requests.  However, the server must respect a
   grace period at least as long as the lease_time on the source server,
   in order to ensure that clients have ample time to reclaim their
   locks before potentially conflicting non-reclaimed locks are granted.
   The means by which the new server obtains the value of lease_time on
   the old server is left to the server implementations.  It is not
   specified by the NFSv4 protocol.
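
   The two constraints on the destination server in this section reduce
   to lower bounds, as the following sketch shows.  The function names
   and the notion of a "configured" value are assumptions, and how the
   source server's lease_time is learned is deliberately left out, as
   it is in the protocol.

```python
def destination_lease_time(source_lease_time, configured_lease_time):
    """lease_time to advertise when state was transferred transparently.

    It is never less than the source value, so leases granted by the
    source server do not expire early.
    """
    return max(source_lease_time, configured_lease_time)


def destination_grace_period(source_lease_time, configured_grace_period):
    """Grace period to observe when state was not transferred.

    It is at least as long as the source server's lease_time.
    """
    return max(source_lease_time, configured_grace_period)
```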
