9.5. Lease Renewal
The purpose of a lease is to allow a server to remove stale locks
that are held by a client that has crashed or is otherwise
unreachable. It is not a mechanism for cache consistency, and lease
renewals may not be denied if the lease interval has not expired.
The client can implicitly provide a positive indication that it is
still active and that the associated state held at the server, for
the client, is still valid. Any operation made with a valid clientid
(DELEGPURGE, LOCK, LOCKT, OPEN, RELEASE_LOCKOWNER, or RENEW) or a
valid stateid (CLOSE, DELEGRETURN, LOCK, LOCKU, OPEN, OPEN_CONFIRM,
OPEN_DOWNGRADE, READ, SETATTR, or WRITE) informs the server to renew
all of the leases for that client (i.e., all those sharing a given
client ID). In the latter case, the stateid must not be one of the
special stateids (anonymous stateid or READ bypass stateid).
Note that if the client had restarted or rebooted, the client would
not be making these requests without issuing the SETCLIENTID/
SETCLIENTID_CONFIRM sequence. The use of the SETCLIENTID/
SETCLIENTID_CONFIRM sequence (one that changes the client verifier)
notifies the server to drop the locking state associated with the
client. SETCLIENTID/SETCLIENTID_CONFIRM never renews a lease.
If the server has rebooted, the stateids (NFS4ERR_STALE_STATEID
error) or the client ID (NFS4ERR_STALE_CLIENTID error) will not be
valid, hence preventing spurious renewals.
This approach allows for low-overhead lease renewal, which scales
well. In the typical case, no extra RPCs are required for lease
renewal, and in the worst case, one RPC is required every lease
period (i.e., a RENEW operation). The number of locks held by the
client is not a factor since all state for the client is involved
with the lease renewal action.
Since all operations that create a new lease also renew existing
leases, the server must maintain a common lease expiration time for
all valid leases for a given client. This lease time can then be
easily updated upon implicit lease renewal actions.
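As a minimal sketch of that bookkeeping, the Python below keeps one expiration time per client ID, so an implicit renewal from any qualifying operation is a single assignment no matter how many locks the client holds. The class and method names, the `special` flag, and the injectable clock are hypothetical illustrations, not protocol elements.

```python
import time
from typing import Callable, Dict


class LeaseTable:
    """Hypothetical server-side lease state: one expiry time per client ID."""

    def __init__(self, lease_time: float, clock: Callable[[], float] = time.monotonic):
        self.lease_time = lease_time
        self.clock = clock
        self.expiry: Dict[int, float] = {}  # client ID -> common lease expiration

    def renew(self, clientid: int) -> None:
        # Renews every lease for the client at once; the lock count is irrelevant.
        self.expiry[clientid] = self.clock() + self.lease_time

    def on_stateid_op(self, clientid: int, special: bool) -> None:
        # Special stateids (anonymous or READ bypass) do not renew anything.
        if not special:
            self.renew(clientid)

    def expired(self, clientid: int) -> bool:
        return self.clock() >= self.expiry.get(clientid, float("-inf"))
```

Because the table holds a single timestamp per client, renewal cost stays constant as the number of locks grows.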
9.6. Crash Recovery
The important requirement in crash recovery is that both the client
and the server know when the other has failed. Additionally, it is
required that a client sees a consistent view of data across server
restarts or reboots. All READ and WRITE operations that may have
been queued within the client or network buffers must wait until the
client has successfully recovered the locks protecting the READ and
WRITE operations.
9.6.1. Client Failure and Recovery
In the event that a client fails, the server may recover the client's
locks when the associated leases have expired. Conflicting locks
from another client may only be granted after this lease expiration.
If the client is able to restart or reinitialize within the lease
period, the client may be forced to wait the remainder of the lease
period before obtaining new locks.
To minimize client delay upon restart, open and lock requests are
associated with an instance of the client by a client-supplied
verifier. This verifier is part of the initial SETCLIENTID call made
by the client. The server returns a client ID as a result of the
SETCLIENTID operation. The client then confirms the use of the
client ID with SETCLIENTID_CONFIRM. The client ID in combination
with an opaque owner field is then used by the client to identify the
open-owner for OPEN. This chain of associations is then used to
identify all locks for a particular client.
Since the verifier will be changed by the client upon each
initialization, the server can compare a new verifier to the verifier
associated with currently held locks and determine that they do not
match. This signifies the client's new instantiation and subsequent
loss of locking state. As a result, the server is free to release
all locks held that are associated with the old client ID that was
derived from the old verifier.
Note that the verifier must have the same uniqueness properties as
the verifier for the COMMIT operation.
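The verifier comparison can be sketched as follows, assuming a server that keys clients by their id string and frees the old instance's locks only when the new client ID is confirmed. `ClientRegistry`, its fields, and the simplified same-verifier path (which just returns the existing client ID) are illustrative assumptions; the full SETCLIENTID rules, such as callback updates and principal checks, are omitted.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ClientRecord:
    verifier: bytes
    clientid: int
    locks: Set[str] = field(default_factory=set)


class ClientRegistry:
    """Hypothetical server table mapping a client's id string to its current instance."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.confirmed: Dict[str, ClientRecord] = {}
        self.pending: Dict[int, Tuple[str, ClientRecord]] = {}

    def setclientid(self, name: str, verifier: bytes) -> int:
        current = self.confirmed.get(name)
        if current is not None and current.verifier == verifier:
            return current.clientid  # same client instance (simplified)
        rec = ClientRecord(verifier, next(self._ids))
        self.pending[rec.clientid] = (name, rec)
        return rec.clientid

    def setclientid_confirm(self, clientid: int) -> Set[str]:
        """Confirm a client ID; return the locks freed because the client restarted."""
        entry = self.pending.pop(clientid, None)
        if entry is None:
            return set()
        name, rec = entry
        old = self.confirmed.get(name)
        self.confirmed[name] = rec
        if old is not None and old.verifier != rec.verifier:
            # A new verifier means a new client instance: old locking state is dead.
            return old.locks
        return set()
```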
9.6.2. Server Failure and Recovery
If the server loses locking state (usually as a result of a restart
or reboot), it must allow clients time to discover this fact and
re-establish the lost locking state. The client must be able to
re-establish the locking state without having the server deny valid
requests because the server has granted conflicting access to another
client. Likewise, if there is the possibility that clients have
not yet re-established their locking state for a file, the server
must disallow READ and WRITE operations for that file. The duration
of this recovery period is equal to the duration of the lease period.
A client can determine that server failure (and thus loss of locking
state) has occurred when it receives one of two errors. The
NFS4ERR_STALE_STATEID error indicates a stateid invalidated by a
reboot or restart. The NFS4ERR_STALE_CLIENTID error indicates a
client ID invalidated by reboot or restart. When either of these is
received, the client must establish a new client ID (see
Section 9.1.1) and re-establish the locking state as discussed below.
The period of special handling of locking and READs and WRITEs, equal
in duration to the lease period, is referred to as the "grace
period". During the grace period, clients recover locks and the
associated state by reclaim-type locking requests (i.e., LOCK
requests with reclaim set to TRUE and OPEN operations with a claim
type of either CLAIM_PREVIOUS or CLAIM_DELEGATE_PREV). During the
grace period, the server must reject READ and WRITE operations and
non-reclaim locking requests (i.e., other LOCK and OPEN operations)
with an error of NFS4ERR_GRACE.
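The simplest grace-period policy, rejecting every non-reclaim OPEN or LOCK and every READ or WRITE, might look like the sketch below. The `Request` shape, the string error codes, and the time arguments are assumptions made for illustration; a real server works with decoded XDR arguments and numeric nfsstat4 values.

```python
from dataclasses import dataclass
from typing import Optional

RECLAIM_CLAIMS = {"CLAIM_PREVIOUS", "CLAIM_DELEGATE_PREV"}
GATED_OPS = {"OPEN", "LOCK", "READ", "WRITE"}


@dataclass
class Request:
    op: str                    # e.g. "OPEN", "LOCK", "READ", "WRITE"
    reclaim: bool = False      # LOCK: the reclaim argument
    claim: str = "CLAIM_NULL"  # OPEN: the claim type


def is_reclaim(req: Request) -> bool:
    if req.op == "LOCK":
        return req.reclaim
    if req.op == "OPEN":
        return req.claim in RECLAIM_CLAIMS
    return False


def grace_check(req: Request, now: float, grace_end: float) -> Optional[str]:
    """Return "NFS4ERR_GRACE" to reject, or None to let the request proceed."""
    if now >= grace_end or is_reclaim(req):
        return None
    # Conservative policy: no attempt to prove that no conflict is possible.
    return "NFS4ERR_GRACE" if req.op in GATED_OPS else None
```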
If the server can reliably determine that granting a non-reclaim
request will not conflict with reclamation of locks by other clients,
the NFS4ERR_GRACE error does not have to be returned and the
non-reclaim client request can be serviced. For the server to be
able to service READ and WRITE operations during the grace period, it
must again be able to guarantee that no possible conflict could arise
between an impending reclaim locking request and the READ or WRITE
operation. If the server is unable to offer that guarantee, the
NFS4ERR_GRACE error must be returned to the client.
For a server to provide simple, valid handling during the grace
period, the easiest method is to simply reject all non-reclaim
locking requests and READ and WRITE operations by returning the
NFS4ERR_GRACE error. However, a server may keep information about
granted locks in stable storage. With this information, the server
could determine if a regular lock or READ or WRITE operation can be
safely processed.
For example, if a count of locks on a given file is available in
stable storage, the server can track reclaimed locks for the file,
and when all reclaims have been processed, non-reclaim locking
requests may be processed. This way, the server can ensure that
non-reclaim locking requests will not conflict with potential reclaim
requests. With respect to I/O requests, if the server is able to
determine that there are no outstanding reclaim requests for a file
by information from stable storage or another similar mechanism, the
processing of I/O requests could proceed normally for the file.
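A sketch of the lock-count approach described above, assuming the server persisted, per file, how many locks were held at the time of the restart. The filehandle strings and method names are hypothetical, and what error to return for an unexpected reclaim is left to the caller.

```python
from typing import Dict


class ReclaimTracker:
    """Hypothetical per-file view of locks recorded in stable storage before reboot."""

    def __init__(self, recorded: Dict[str, int]) -> None:
        self.outstanding = dict(recorded)  # filehandle -> locks not yet reclaimed

    def on_reclaim(self, fh: str) -> bool:
        """Count one reclaim; False means stable storage did not expect it."""
        if self.outstanding.get(fh, 0) <= 0:
            return False
        self.outstanding[fh] -= 1
        return True

    def may_serve_during_grace(self, fh: str) -> bool:
        # Non-reclaim locks and I/O are safe only once no reclaim can still arrive.
        return self.outstanding.get(fh, 0) == 0
```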
To reiterate, for a server that allows non-reclaim lock and I/O
requests to be processed during the grace period, it MUST determine
that no lock subsequently reclaimed will be rejected and that no lock
subsequently reclaimed would have prevented any I/O operation
processed during the grace period.
Clients should be prepared for the return of NFS4ERR_GRACE errors for
non-reclaim lock and I/O requests. In this case, the client should
employ a retry mechanism for the request. A delay (on the order of
several seconds) between retries should be used to avoid overwhelming
the server. Further discussion of the general issue is included in
[Floyd]. The client must account for the server that is able to
perform I/O and non-reclaim locking requests within the grace period
as well as those that cannot do so.
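On the client side, the retry loop can be as simple as this sketch. `send` stands in for whatever function issues the RPC and returns its status as a string (an assumption); the five-second default and the attempt cap are illustrative values, not protocol requirements.

```python
import time
from typing import Callable


def call_with_grace_retry(
    send: Callable[[], str],
    delay: float = 5.0,
    max_tries: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry a non-reclaim request while the server answers NFS4ERR_GRACE."""
    status = "NFS4ERR_GRACE"
    for attempt in range(max_tries):
        status = send()
        if status != "NFS4ERR_GRACE":
            break
        if attempt + 1 < max_tries:
            sleep(delay)  # several seconds, so retries do not swamp the server
    return status
```

The same loop works against servers that never return NFS4ERR_GRACE for these requests: the first reply simply ends it.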
A reclaim-type locking request outside the server's grace period can
only succeed if the server can guarantee that no conflicting lock or
I/O request has been granted since reboot or restart.
A server may, upon restart, establish a new value for the lease
period. Therefore, clients should, once a new client ID is
established, refetch the lease_time attribute and use it as the basis
for lease renewal for the lease associated with that server.
However, the server must establish, for this restart event, a grace
period at least as long as the lease period for the previous server
instantiation. This allows the client state obtained during the
previous server instance to be reliably re-established.
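One way a server might size the grace period, assuming it persisted the previous instance's lease_time, is to take the larger of the old and new lease values. The function name is hypothetical, and choosing the maximum is only one policy that satisfies the "at least as long" requirement.

```python
def grace_period_after_restart(prev_lease_time: float, new_lease_time: float) -> float:
    """Length of the grace period for this restart, in seconds."""
    # Clients were renewing against the old lease_time, so the grace period
    # must not be shorter than it, even if the new lease_time is smaller.
    return max(prev_lease_time, new_lease_time)
```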
9.6.3. Network Partitions and Recovery
If the duration of a network partition is greater than the lease
period provided by the server, the server will not have received a
lease renewal from the client. If this occurs, the server may cancel
the lease and free all locks held for the client. As a result, all
stateids held by the client will become invalid or stale. Once the
client is able to reach the server after such a network partition,
all I/O submitted by the client with the now invalid stateids will
fail with the server returning the error NFS4ERR_EXPIRED. Once this
error is received, the client will suitably notify the application
that held the lock.
9.6.3.1. Courtesy Locks
As a courtesy to the client or as an optimization, the server may
continue to hold locks, including delegations, on behalf of a client
that has not communicated with the server for longer than the lease period,
delaying the cancellation of the lease. If the server receives a
lock or I/O request that conflicts with one of these courtesy locks
or if it runs out of resources, the server MAY cause lease
cancellation to occur at that time and henceforth return
NFS4ERR_EXPIRED when any of the stateids associated with the freed
locks is used. If lease cancellation has not occurred and the server
receives a lock or I/O request that conflicts with one of the
courtesy locks, the requirements are as follows:
o In the case of a courtesy lock that is not a delegation, the server MUST
free the courtesy lock and grant the new request.
o In the case of a lock or an I/O request that conflicts with a
delegation that is being held as a courtesy lock, the server MAY
delay resolution of the request but MUST NOT reject the request
and MUST free the delegation and grant the new request eventually.
o In the case of a request for a delegation that conflicts with a
delegation that is being held as a courtesy lock, the server MAY
grant the new request or not as it chooses, but if it grants the
conflicting request, the delegation held as a courtesy lock MUST
be freed.
If the server does not reboot or cancel the lease before the network
partition is healed, when the original client tries to access a
courtesy lock that was freed, the server SHOULD send back an
NFS4ERR_BAD_STATEID to the client. If the client tries to access a
courtesy lock that was not freed, then the server SHOULD mark all of
the courtesy locks as implicitly being renewed.
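The conflict rules for courtesy locks reduce to a small decision function, sketched below under the assumption that the lease has not yet been canceled (once it has, stateids simply yield NFS4ERR_EXPIRED). The enum, its labels, and the `RENEW_ALL_COURTESY_LOCKS` marker string are hypothetical.

```python
from enum import Enum


class Resolution(Enum):
    FREE_AND_GRANT = "free the courtesy lock and grant the request"
    DELAY_THEN_GRANT = "may delay, must not reject; free delegation and grant eventually"
    SERVER_CHOICE = "grant or refuse; if granted, free the courtesy delegation"


def resolve_courtesy_conflict(courtesy_is_delegation: bool,
                              request_is_delegation: bool) -> Resolution:
    """Conflict with a courtesy lock while the lease is still uncanceled."""
    if not courtesy_is_delegation:
        return Resolution.FREE_AND_GRANT
    if request_is_delegation:
        return Resolution.SERVER_CHOICE
    return Resolution.DELAY_THEN_GRANT


def on_original_client_access(lock_was_freed: bool) -> str:
    # After the partition heals, with no reboot and no lease cancellation.
    return "NFS4ERR_BAD_STATEID" if lock_was_freed else "RENEW_ALL_COURTESY_LOCKS"
```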
9.6.3.2. Lease Cancellation
As a result of lease expiration, leases may be canceled, either
immediately upon expiration or subsequently, depending on the
occurrence of a conflicting lock or extension of the period of
partition beyond what the server will tolerate.
When a lease is canceled, all locking state associated with it is
freed, and the use of any of the associated stateids will result in
NFS4ERR_EXPIRED being returned. Similarly, the use of the associated
clientid will result in NFS4ERR_EXPIRED being returned.
The client should recover from this situation by using SETCLIENTID
followed by SETCLIENTID_CONFIRM, in order to establish a new
clientid. Once a lock is obtained using this clientid, a lease will
be established.
9.6.3.3. Client's Reaction to a Freed Lock
There is no way for a client to predetermine how a given server is
going to behave during a network partition. When the partition
heals, the client still has either all of its locks, some of its
locks, or none of them. The client will be able to examine the
various error return values to determine its response.
NFS4ERR_EXPIRED:
   All locks have been freed as a result of a lease cancellation that
   occurred during the partition. The client should use a
   SETCLIENTID to recover.
NFS4ERR_ADMIN_REVOKED:
   The current lock has been revoked before, during, or after the
   partition. The client SHOULD handle this error as it normally
   would.
NFS4ERR_BAD_STATEID:
   The current lock has been revoked/released during the partition,
   and the server did not reboot. Other locks MAY still be renewed.
   The client need not do a SETCLIENTID and instead SHOULD probe via
   a RENEW call.
NFS4ERR_RECLAIM_BAD:
   The current lock has been revoked during the partition, and the
   server rebooted. The server might have no information on the
   other locks. They may still be renewable.
NFS4ERR_NO_GRACE:
   The client's locks have been revoked during the partition, and the
   server rebooted. None of the client's locks will be renewable.
NFS4ERR_OLD_STATEID:
   The server has not rebooted. The client SHOULD handle this error
   as it normally would.
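A client might encode these cases as a lookup table from error code to reaction, as in the sketch below. The `Reaction` type, the short action strings, and the treatment of unlisted status codes are hypothetical; a real client would attach recovery routines rather than strings.

```python
from typing import NamedTuple


class Reaction(NamedTuple):
    lost: str       # which locks are known to be gone: "all", "this", "none", "unknown"
    next_step: str


REACTIONS = {
    "NFS4ERR_EXPIRED": Reaction("all", "SETCLIENTID, SETCLIENTID_CONFIRM, then re-acquire"),
    "NFS4ERR_ADMIN_REVOKED": Reaction("this", "normal revocation handling"),
    "NFS4ERR_BAD_STATEID": Reaction("this", "probe remaining locks with RENEW"),
    "NFS4ERR_RECLAIM_BAD": Reaction("this", "other locks may still be renewable"),
    "NFS4ERR_NO_GRACE": Reaction("all", "no lock is renewable"),
    "NFS4ERR_OLD_STATEID": Reaction("none", "normal handling; server did not reboot"),
}


def react(status: str) -> Reaction:
    return REACTIONS.get(status, Reaction("unknown", "not partition-specific; handle normally"))
```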
9.6.3.4. Edge Conditions
When a network partition is combined with a server reboot, then both
the server and client have responsibilities to ensure that the client
does not reclaim a lock that it should no longer be able to access.
Briefly, those are:
o Client's responsibility: A client MUST NOT attempt to reclaim any
locks that it did not hold at the end of its most recent
successfully established client lease.
o Server's responsibility: A server MUST NOT allow a client to
reclaim a lock unless it knows that it could not have since
granted a conflicting lock. However, in deciding whether a
conflicting lock could have been granted, it is permitted to
assume that its clients are responsible, as above.
A server may consider a client's lease "successfully established"
once it has received an OPEN operation from that client.
The above are directed to CLAIM_PREVIOUS reclaims and not to
CLAIM_DELEGATE_PREV reclaims, which generally do not involve a server
reboot. However, when a server persistently stores delegation
information to support CLAIM_DELEGATE_PREV across a period in which
both client and server are down at the same time, similar strictures
apply.
The next sections give examples showing what can go wrong if these
responsibilities are neglected and also provide examples of server
implementation strategies that could meet a server's
responsibilities.
9.6.3.4.1. First Server Edge Condition
The first edge condition has the following scenario:
1. Client A acquires a lock.
2. Client A and the server experience mutual network partition, such
that client A is unable to renew its lease.
3. Client A's lease expires, so the server releases the lock.
4. Client B acquires a lock that would have conflicted with that of
client A.
5. Client B releases the lock.
6. The server reboots.
7. The network partition between client A and the server heals.
8. Client A issues a RENEW operation and gets back an
NFS4ERR_STALE_CLIENTID.
9. Client A reclaims its lock within the server's grace period.
Thus, at the final step, the server has erroneously granted
client A's lock reclaim. If client B modified the object the lock
was protecting, client A will experience object corruption.
9.6.3.4.2. Second Server Edge Condition
The second known edge condition follows:
1. Client A acquires a lock.
2. The server reboots.
3. Client A and the server experience mutual network partition,
such that client A is unable to reclaim its lock within the
grace period.
4. The server's reclaim grace period ends. Client A has no locks
recorded on the server.
5. Client B acquires a lock that would have conflicted with that of
client A.
6. Client B releases the lock.
7. The server reboots a second time.
8. The network partition between client A and the server heals.
9. Client A issues a RENEW operation and gets back an
NFS4ERR_STALE_CLIENTID.
10. Client A reclaims its lock within the server's grace period.
As with the first edge condition, the final step of the scenario of
the second edge condition has the server erroneously granting
client A's lock reclaim.
9.6.3.4.3. Handling Server Edge Conditions
In both of the above examples, the client attempts reclaim of a lock
that it held at the end of its most recent successfully established
lease; thus, it has fulfilled its responsibility.
The server, however, has failed by granting a reclaim despite
having granted a conflicting lock since the reclaimed lock was last
asserted.
Solving these edge conditions requires that the server either (1)
assume after it reboots that an edge condition occurs, and thus
return NFS4ERR_NO_GRACE for all reclaim attempts, or (2) record some
information in stable storage. The amount of information the server
records in stable storage is in inverse proportion to how harsh the
server wants to be whenever the edge conditions occur. The server
that is completely tolerant of all edge conditions will record in
stable storage every lock that is acquired, removing the lock record
from stable storage only when the lock is unlocked by the client and
the lock's owner advances the sequence number such that the lock
release is not the last stateful event for the owner's sequence. For
the two aforementioned edge conditions, the harshest a server can be,
and still support a grace period for reclaims, requires that the
server record in stable storage some minimal information. For
example, a server implementation could, for each client, save in
stable storage a record containing:
o the client's id string.
o a boolean that indicates if the client's lease expired or if there
was administrative intervention (see Section 9.8) to revoke a
byte-range lock, share reservation, or delegation.
o a timestamp that is updated the first time after a server boot or
reboot the client acquires byte-range locking, share reservation,
or delegation state on the server. The timestamp need not be
updated on subsequent lock requests until the server reboots.
The server implementation would also record in stable storage the
timestamps from the two most recent server reboots.
Assuming the above record keeping, for the first edge condition,
after the server reboots, the record that client A's lease expired
means that another client could have acquired a conflicting byte-range
lock, share reservation, or delegation. Hence, the server must
reject a reclaim from client A with the error NFS4ERR_NO_GRACE or
NFS4ERR_RECLAIM_BAD.
For the second edge condition, after the server reboots for a second
time, the record that the client had an unexpired byte-range lock, share
reservation, or delegation established before the server's previous
incarnation means that the server must reject a reclaim from client A
with the error NFS4ERR_NO_GRACE or NFS4ERR_RECLAIM_BAD.
Regardless of the level and approach to record keeping, the server
MUST implement one of the following strategies (which apply to
reclaims of share reservations, byte-range locks, and delegations):
1. Reject all reclaims with NFS4ERR_NO_GRACE. This is extremely
harsh but is necessary if the server does not want to record lock
state in stable storage.
2. Record sufficient state in stable storage to meet its
responsibilities. If in doubt, the server should err on the side of
being harsh.
In the event that, after a server reboot, the server determines
that there is unrecoverable damage or corruption to stable
storage, then for all clients and/or locks affected, the server
MUST return NFS4ERR_NO_GRACE.
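Using the per-client record and boot timestamps listed above, a server's reclaim decision might be sketched as follows. The field names, the timestamp comparison rule, and the choice of NFS4ERR_RECLAIM_BAD over NFS4ERR_NO_GRACE for the edge conditions are assumptions (the text permits either error there). The check is meant to run against the record as loaded at restart, and the sketch does not show how the expired/revoked flag is cleared once a client establishes a fresh lease.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientStableRecord:
    id_string: str
    expired_or_revoked: bool   # lease expired, or a lock was administratively revoked
    first_state_time: float    # set the first time, after each boot, the client acquires state


@dataclass
class BootHistory:
    previous_boot: float       # start of the instance that just ended
    current_boot: float        # start of this instance (becomes previous_boot next time)


def check_reclaim(rec: Optional[ClientStableRecord], boots: BootHistory,
                  storage_ok: bool = True, keeps_state: bool = True) -> Optional[str]:
    """Return None to allow a reclaim, else the error to send back."""
    if not keeps_state or not storage_ok:
        return "NFS4ERR_NO_GRACE"      # strategy 1, or damaged stable storage
    if rec is None or rec.expired_or_revoked:
        return "NFS4ERR_RECLAIM_BAD"   # covers the first edge condition
    if rec.first_state_time < boots.previous_boot:
        return "NFS4ERR_RECLAIM_BAD"   # second edge condition: state predates last instance
    return None
```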
9.6.3.4.4. Client Edge Condition
A third edge condition affects the client and not the server. If the
server reboots in the middle of the client reclaiming some locks and
then a network partition is established, the client might be in the
situation of having reclaimed some, but not all, locks. In that
case, a conservative client would assume that the non-reclaimed locks
were revoked.
The third known edge condition follows:
1. Client A acquires a lock 1.
2. Client A acquires a lock 2.
3. The server reboots.
4. Client A issues a RENEW operation and gets back an
NFS4ERR_STALE_CLIENTID.
5. Client A reclaims its lock 1 within the server's grace period.
6. Client A and the server experience mutual network partition,
such that client A is unable to reclaim its remaining locks
within the grace period.
7. The server's reclaim grace period ends.
8. Client B acquires a lock that would have conflicted with
client A's lock 2.
9. Client B releases the lock.
10. The server reboots a second time.
11. The network partition between client A and the server heals.
12. Client A issues a RENEW operation and gets back an
NFS4ERR_STALE_CLIENTID.
13. Client A reclaims both lock 1 and lock 2 within the server's
grace period.
At the last step, the client reclaims lock 2 as if it had held that
lock continuously, when in fact a conflicting lock was granted to
client B.
This occurs because the client failed its responsibility by
attempting to reclaim lock 2 even though it had not held that lock at
the end of the lease that was established by the SETCLIENTID after
the first server reboot. (The client did hold lock 2 on a previous
lease, but it is only the most recent lease that matters.)
A server could avoid this situation by rejecting the reclaim of
lock 2. However, to do so accurately, it would have to ensure that
additional information about individual locks held survives a reboot.
Server implementations are not required to do that, so the client
must not assume that the server will.
Instead, a client MUST reclaim only those locks that it successfully
acquired from the previous server instance, omitting any that it
failed to reclaim before a new reboot. Thus, in the last step above,
client A should reclaim only lock 1.
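The client's side of this rule can be sketched as a small ledger: on each server reboot, the reclaim candidates are exactly the locks held under the most recent lease. `ReclaimBook` and its methods are hypothetical names; lock identifiers are plain strings for illustration.

```python
from typing import Set


class ReclaimBook:
    """Hypothetical client-side record of which locks may be reclaimed."""

    def __init__(self) -> None:
        self.held: Set[str] = set()         # held under the most recent lease
        self.reclaimable: Set[str] = set()  # candidates after a server reboot

    def acquire(self, lock: str) -> None:
        self.held.add(lock)

    def release(self, lock: str) -> None:
        self.held.discard(lock)

    def on_server_reboot(self) -> None:
        # Only locks held at the end of the most recent lease qualify;
        # leftovers from an earlier, unfinished reclaim round are dropped.
        self.reclaimable = set(self.held)
        self.held.clear()

    def on_reclaimed(self, lock: str) -> None:
        self.reclaimable.discard(lock)
        self.held.add(lock)

    def on_grace_end(self) -> Set[str]:
        """Locks not reclaimed in time; a conservative client treats them as revoked."""
        lost, self.reclaimable = self.reclaimable, set()
        return lost
```

Replaying the third edge condition against this ledger leaves only lock 1 eligible after the second reboot.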
9.6.3.4.5. Client's Handling of Reclaim Errors
A mandate for the client's handling of the NFS4ERR_NO_GRACE and
NFS4ERR_RECLAIM_BAD errors is outside the scope of this
specification, since the strategies for such handling are very
dependent on the client's operating environment. However, one
potential approach is described below.
When the client's reclaim fails, it could examine the change
attribute of the objects the client is trying to reclaim state for,
and use that to determine whether to re-establish the state via
normal OPEN or LOCK requests. This is acceptable, provided the
client's operating environment allows it. In other words, the client
implementer is advised to document the behavior for their users. The
client could also inform the application that its byte-range lock or
share reservations (whether they were delegated or not) have been
lost, such as via a UNIX signal, a GUI pop-up window, etc. See
Section 10.5 for a discussion of what the client should do for
dealing with unreclaimed delegations on client state.
For further discussion of revocation of locks, see Section 9.8.
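A sketch of the change-attribute heuristic, assuming the client remembered the object's change attribute while it still held the lock. This is only one policy the text allows: an unchanged attribute suggests, but does not prove, that no conflicting access occurred. The names are hypothetical.

```python
from enum import Enum


class Recovery(Enum):
    REESTABLISH = "re-acquire with a normal OPEN or LOCK"
    NOTIFY_APP = "tell the application its lock or share reservation is lost"


def after_failed_reclaim(change_when_locked: int, change_now: int) -> Recovery:
    # An unchanged change attribute is taken as evidence that nobody modified
    # the object while the client's lock was not in force.
    if change_now == change_when_locked:
        return Recovery.REESTABLISH
    return Recovery.NOTIFY_APP
```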
9.7. Recovery from a Lock Request Timeout or Abort
In the event a lock request times out, a client may decide to not
retry the request. The client may also abort the request when the
process for which it was issued is terminated (e.g., in UNIX due to a
signal). It is possible, though, that the server received the
request and acted upon it. This would change the state on the server
without the client being aware of the change. It is paramount that
the client resynchronize state with the server before it attempts any
other operation that takes a seqid and/or a stateid with the same
state-owner. This is straightforward to do without a special
resynchronize operation.
Since the server maintains, for each state-owner, the last lock
request and response received on that state-owner, the client should
cache, for each state-owner, the last lock request it sent that did
not receive a response. Then, the next time the client does a
lock operation for the state-owner, it can send the cached request,
if there is one, and if the request was one that established state
(e.g., a LOCK or OPEN operation), the server will return the cached
result or, if it never saw the request, perform it. The client can
follow up with a request to remove the state (e.g., a LOCKU or CLOSE
operation). With this approach, the sequencing and stateid
information on the client and server for the given state-owner will
resynchronize, and in turn the lock state will resynchronize.
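The replay-then-undo approach can be sketched for a single state-owner as below. `Req`, `StateOwner`, the `UNDO` table, and the convention that `send` raises `TimeoutError` when no reply arrives are hypothetical; a real client would also carry the stateid from the replayed reply into the undo request, which the sketch abstracts away.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Requests that establish state, and the request that removes that state.
UNDO = {"LOCK": "LOCKU", "OPEN": "CLOSE"}


@dataclass
class Req:
    op: str
    seqid: int


class StateOwner:
    """Hypothetical client-side sequencing for a single state-owner."""

    def __init__(self, send: Callable[[Req], str]) -> None:
        self.send = send                    # raises TimeoutError if no reply arrives
        self.seqid = 0
        self.unanswered: Optional[Req] = None

    def _issue(self, req: Req) -> str:
        self.unanswered = req
        reply = self.send(req)              # on exception, req stays cached
        self.unanswered = None
        return reply

    def call(self, op: str) -> str:
        if self.unanswered is not None:
            old = self.unanswered
            # Same seqid: the server replays its cached reply or performs it now.
            self._issue(old)
            if old.op in UNDO:
                # Nobody is waiting for that state any more, so remove it.
                self.seqid += 1
                self._issue(Req(UNDO[old.op], self.seqid))
        self.seqid += 1
        return self._issue(Req(op, self.seqid))
```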
9.8. Server Revocation of Locks
At any point, the server can revoke locks held by a client and the
client must be prepared for this event. When the client detects that
its locks have been or may have been revoked, the client is
responsible for validating the state information between itself and
the server. Validating locking state for the client means that it
must verify or reclaim state for each lock currently held.
The first instance of lock revocation is upon server reboot or
re-initialization. In this instance, the client will receive an
error (NFS4ERR_STALE_STATEID or NFS4ERR_STALE_CLIENTID) and the
client will proceed with normal crash recovery as described in
Section 9.6.
The second lock revocation event is the inability to renew the lease
before expiration. While this is considered a rare or unusual event,
the client must be prepared to recover. Both the server and client
will be able to detect the failure to renew the lease and are capable
of recovering without data corruption. The server tracks the
last renewal event serviced for the client and knows when the lease
will expire. Similarly, the client must track operations that will
renew the lease period. Using the time that each such request was
sent and the time that the corresponding reply was received, the
client should bound the time that the corresponding renewal could
have occurred on the server and thus determine if it is possible that
a lease period expiration could have occurred.
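The client-side bound can be sketched by remembering when each renewing request was sent and, on a successful reply, assuming the server renewed at the earliest possible moment (the send time). Class and method names are hypothetical, and clock skew between client and server is ignored here; a cautious client might subtract a safety margin.

```python
import time
from typing import Callable, Optional


class ClientLeaseTracker:
    """Hypothetical client-side bound on when the server-side lease may expire."""

    def __init__(self, lease_time: float, clock: Callable[[], float] = time.monotonic):
        self.lease_time = lease_time
        self.clock = clock
        self.latest_send: Optional[float] = None  # send time of newest confirmed renewal

    def before_send(self) -> float:
        """Call just before sending any lease-renewing request; keep the result."""
        return self.clock()

    def on_success(self, sent_at: float) -> None:
        # The server renewed somewhere between sent_at and now; assuming the
        # earliest possible moment keeps the estimate conservative.
        if self.latest_send is None or sent_at > self.latest_send:
            self.latest_send = sent_at

    def may_have_expired(self) -> bool:
        if self.latest_send is None:
            return True
        return self.clock() >= self.latest_send + self.lease_time
```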
The third lock revocation event can occur as a result of
administrative intervention within the lease period. While this is
considered a rare event, it is possible that the server's
administrator has decided to release or revoke a particular lock held
by the client. As a result of revocation, the client will receive an
error of NFS4ERR_ADMIN_REVOKED. In this instance, the client may
assume that only the state-owner's locks have been lost. The client
notifies the lock holder appropriately. The client cannot assume
that the lease period has been renewed as a result of a failed
operation.
When the client determines the lease period may have expired, the
client must mark all locks held for the associated lease as
"unvalidated". This means the client has been unable to re-establish
or confirm the appropriate lock state with the server. As described
in Section 9.6, there are scenarios in which the server may grant
conflicting locks after the lease period has expired for a client.
When it is possible that the lease period has expired, the client
must validate each lock currently held to ensure that a conflicting
lock has not been granted. The client may accomplish this task by
issuing an I/O request; if there is no relevant I/O pending, a
zero-length read specifying the stateid associated with the lock in
question can be synthesized to trigger the renewal. If the response
to the request is success, the client has validated all of the locks
governed by that stateid and re-established the appropriate state
between itself and the server.
If the I/O request is not successful, then one or more of the locks
associated with the stateid were revoked by the server, and the
client must notify the owner.
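The validation pass can be sketched as a loop that starts with every lock marked unvalidated and issues a zero-length READ per stateid. `read` and `notify` are hypothetical callables standing in for the client's RPC layer and its application notification path.

```python
from typing import Callable, Iterable, Set


def validate_locks(
    stateids: Iterable[str],
    read: Callable[[str, int, int], str],
    notify: Callable[[str, str], None],
) -> Set[str]:
    """Validate every lock after a possible lease expiration.

    read(stateid, offset, count) returns a status string; notify(stateid,
    status) tells the lock owner it lost a lock. Returns the stateids that
    could not be validated.
    """
    unvalidated = set(stateids)  # everything starts out "unvalidated"
    for sid in sorted(unvalidated):
        status = read(sid, 0, 0)  # zero-length READ exercises the stateid only
        if status == "NFS4_OK":
            unvalidated.discard(sid)
        else:
            notify(sid, status)
    return unvalidated
```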
9.9. Share Reservations
A share reservation is a mechanism to control access to a file. It
is a separate and independent mechanism from byte-range locking.
When a client opens a file, it issues an OPEN operation to the server
specifying the type of access required (READ, WRITE, or BOTH) and the
type of access to deny others (OPEN4_SHARE_DENY_NONE,
OPEN4_SHARE_DENY_READ, OPEN4_SHARE_DENY_WRITE, or
OPEN4_SHARE_DENY_BOTH). If the OPEN fails, the client will fail the
application's open request.
Pseudo-code definition of the semantics:
   if (request.access == 0)
       return (NFS4ERR_INVAL)
   else if ((request.access & file_state.deny) ||
            (request.deny & file_state.access))
       return (NFS4ERR_DENIED)
This checking of share reservations on OPEN is done with no exception
for an existing OPEN for the same open-owner.
The constants used for the OPEN and OPEN_DOWNGRADE operations for the
access and deny fields are as follows:
const OPEN4_SHARE_ACCESS_READ = 0x00000001;
const OPEN4_SHARE_ACCESS_WRITE = 0x00000002;
const OPEN4_SHARE_ACCESS_BOTH = 0x00000003;
const OPEN4_SHARE_DENY_NONE = 0x00000000;
const OPEN4_SHARE_DENY_READ = 0x00000001;
const OPEN4_SHARE_DENY_WRITE = 0x00000002;
const OPEN4_SHARE_DENY_BOTH = 0x00000003;
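A runnable rendering of the share check, using the constants above. `FileShareState` is a hypothetical per-file record that keeps only the union of bits over all current opens (including the same open-owner's, since there is no exemption); a real server keeps per-open bits so that CLOSE and OPEN_DOWNGRADE can recompute the union.

```python
OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_ACCESS_BOTH = 0x00000003
OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_READ = 0x00000001
OPEN4_SHARE_DENY_WRITE = 0x00000002
OPEN4_SHARE_DENY_BOTH = 0x00000003


class FileShareState:
    """Hypothetical per-file share state: union of bits over all current opens."""

    def __init__(self) -> None:
        self.access = 0
        self.deny = 0

    def check_open(self, access: int, deny: int) -> str:
        if access == 0:
            return "NFS4ERR_INVAL"
        # No exemption for the same open-owner: its own opens are in the union too.
        if (access & self.deny) or (deny & self.access):
            return "NFS4ERR_DENIED"
        self.access |= access
        self.deny |= deny
        return "NFS4_OK"
```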
9.10. OPEN/CLOSE Operations
To provide correct share semantics, a client MUST use the OPEN
operation to obtain the initial filehandle and indicate the desired
access and what access, if any, to deny. Even if the client intends
to use one of the special stateids (anonymous stateid or READ bypass
stateid), it must still obtain the filehandle for the regular file
with the OPEN operation so the appropriate share semantics can be
applied. Clients that do not have a deny mode built into their
programming interfaces for opening a file should request a deny mode
of OPEN4_SHARE_DENY_NONE.
The OPEN operation with the CREATE flag also subsumes the CREATE
operation for regular files as used in previous versions of the NFS
protocol. This allows a create with a share to be done atomically.
The CLOSE operation removes all share reservations held by the
open-owner on that file. If byte-range locks are held, the client
SHOULD release all locks before issuing a CLOSE. The server MAY free
all outstanding locks on CLOSE, but some servers may not support the
CLOSE of a file that still has byte-range locks held. The server
MUST return failure, NFS4ERR_LOCKS_HELD, if any locks would exist
after the CLOSE.
The LOOKUP operation will return a filehandle without establishing
any lock state on the server. Without a valid stateid, the server
will assume that the client has the least access. For example, if
one client opened a file with OPEN4_SHARE_DENY_BOTH and another
client accesses the file via a filehandle obtained through LOOKUP,
the second client could only read the file using the special READ
bypass stateid. The second client could not WRITE the file at all
because it would not have a valid stateid from OPEN and the special
anonymous stateid would not be allowed access.
9.10.1. Close and Retention of State Information
Since a CLOSE operation requests deallocation of a stateid, dealing
with retransmission of the CLOSE may pose special difficulties, since
the state information, which normally would be used to determine the
state of the open file being designated, might be deallocated,
resulting in an NFS4ERR_BAD_STATEID error.
Servers may deal with this problem in a number of ways. To provide
the greatest degree of assurance that the protocol is being used
properly, a server should, rather than deallocate the stateid, mark
it as close-pending, and retain the stateid with this status, until
later deallocation. In this way, a retransmitted CLOSE can be
recognized since the stateid points to state information with this
distinctive status, so that it can be handled without error.
When adopting this strategy, a server should retain the state
information until the earliest of:
o Another validly sequenced request for the same open-owner, that is
not a retransmission.
o The time that an open-owner is freed by the server due to a period
with no activity.
o All locks for the client are freed as a result of a SETCLIENTID.
Servers may avoid this complexity, at the cost of less complete
protocol error checking, by simply responding NFS4_OK in the event of
a CLOSE for a deallocated stateid, on the assumption that this case
must be caused by a retransmitted close. When adopting this
approach, it is desirable to at least log an error when returning a
no-error indication in this situation. If the server maintains a
reply-cache mechanism, it can verify that the CLOSE is indeed a
retransmission and avoid error logging in most cases.
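The close-pending strategy can be sketched as follows. `CloseTracker` and `OpenEntry` are hypothetical; the sketch omits seqid checking, so it cannot tell a true retransmission from a new, erroneous CLOSE, and it does not reject other operations that use a close-pending stateid.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OpenEntry:
    owner: str
    close_pending: bool = False


class CloseTracker:
    """Hypothetical server table implementing the close-pending strategy."""

    def __init__(self) -> None:
        self.opens: Dict[str, OpenEntry] = {}  # stateid -> entry

    def open(self, stateid: str, owner: str) -> None:
        self.opens[stateid] = OpenEntry(owner)

    def close(self, stateid: str) -> str:
        entry = self.opens.get(stateid)
        if entry is None:
            return "NFS4ERR_BAD_STATEID"
        # First CLOSE marks the state; a retransmitted CLOSE finds the mark.
        entry.close_pending = True
        return "NFS4_OK"

    def release_pending(self, owner: str) -> None:
        """Free close-pending stateids for owner: call on its next new request,
        when the open-owner is freed for inactivity, or on SETCLIENTID."""
        doomed = [sid for sid, e in self.opens.items()
                  if e.owner == owner and e.close_pending]
        for sid in doomed:
            del self.opens[sid]
```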
9.11. Open Upgrade and Downgrade
When an OPEN is done for a file and the open-owner for which the open
is being done already has the file open, the result is to upgrade the
open file status maintained on the server to include the access and
deny bits specified by the new OPEN as well as those for the existing
OPEN. The result is that there is one open file, as far as the
protocol is concerned, and it includes the union of the access and
deny bits for all of the OPEN requests completed. Only a single
CLOSE will be done to reset the effects of both OPENs. Note that the
client, when issuing the OPEN, may not know that the same file is in
fact being opened. The above only applies if both OPENs result in
the OPENed object being designated by the same filehandle.
When the server chooses to export multiple filehandles corresponding
to the same file object and returns different filehandles on two
different OPENs of the same file object, the server MUST NOT "OR"
together the access and deny bits and coalesce the two open files.
Instead, the server must maintain separate OPENs with separate
stateids and will require separate CLOSEs to free them.
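A minimal sketch of open upgrade, assuming a server that keys its open state by the pair (open-owner, filehandle). The share constants are the OPEN4_SHARE_ACCESS_* and OPEN4_SHARE_DENY_* values from the protocol's XDR; OpenTable and OpenState are hypothetical names. Because the key includes the filehandle, two OPENs that arrive with different filehandles for the same file object stay separate, as required.

```python
from dataclasses import dataclass

OPEN4_SHARE_ACCESS_READ = 0x00000001
OPEN4_SHARE_ACCESS_WRITE = 0x00000002
OPEN4_SHARE_DENY_NONE = 0x00000000
OPEN4_SHARE_DENY_WRITE = 0x00000002


@dataclass
class OpenState:
    access: int
    deny: int


class OpenTable:
    def __init__(self) -> None:
        # Keyed by (open-owner, filehandle): opens reached through different
        # filehandles are never coalesced, even for the same file object.
        self.opens: dict[tuple[str, bytes], OpenState] = {}

    def open(self, owner: str, fh: bytes, access: int, deny: int) -> OpenState:
        key = (owner, fh)
        state = self.opens.get(key)
        if state is None:
            state = OpenState(access, deny)
            self.opens[key] = state
        else:
            # Upgrade: the single open carries the union of all access/deny bits.
            state.access |= access
            state.deny |= deny
        return state
```

Only one CLOSE is then needed per (open-owner, filehandle) entry, however many OPENs were merged into it.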
When multiple open files on the client are merged into a single open
file object on the server, the close of one of the open files (on the
client) may necessitate change of the access and deny status of the
open file on the server. This is because the union of the access and
deny bits for the remaining opens may be smaller (i.e., a proper
subset) than previously. The OPEN_DOWNGRADE operation is used to
make the necessary change, and the client should use it to update the
server so that share reservation requests by other clients are
handled properly. The stateid returned has the same "other" field as
that passed to the server. The seqid value in the returned stateid
MUST be incremented (Section 9.1.4), even in situations in which
there has been no change to the access and deny bits for the file.
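The OPEN_DOWNGRADE rule can be sketched as follows, assuming the server can obtain the access and deny bits of the opens that remain merged into one server open (that bookkeeping, and the names Stateid4 and open_downgrade, are illustrative assumptions). The returned stateid keeps the same "other" field and always carries an incremented seqid; wrap-around of the 32-bit seqid is not handled in this sketch.

```python
from dataclasses import dataclass, replace
from functools import reduce
from operator import or_


@dataclass(frozen=True)
class Stateid4:
    seqid: int    # incremented on every change to the open
    other: bytes  # 12 opaque bytes, fixed for the life of the open


def union(bits):
    """Bitwise OR of all share bits (0 if there are none)."""
    return reduce(or_, bits, 0)


def open_downgrade(stateid, remaining_access, remaining_deny):
    """Return (new_stateid, new_access, new_deny) after a client-side close.

    remaining_access / remaining_deny: share bits of the client opens
    still merged into this server open (hypothetical bookkeeping).
    """
    new_access = union(remaining_access)
    new_deny = union(remaining_deny)
    # Same "other"; seqid MUST be incremented even if the bits are unchanged.
    new_stateid = replace(stateid, seqid=stateid.seqid + 1)
    return new_stateid, new_access, new_deny
```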
9.12. Short and Long Leases
When determining the time period for the server lease, the usual
lease trade-offs apply. Short leases are good for fast server
recovery at a cost of increased RENEW or READ (with zero length)
requests. Longer leases are certainly kinder and gentler to servers
trying to handle very large numbers of clients. The number of RENEW
requests is inversely proportional to the lease time. The disadvantages of
long leases are slower recovery after server failure (the server must
wait for the leases to expire and the grace period to elapse before
granting new lock requests) and increased file contention (if the
client fails to transmit an unlock request, then the server must wait
for lease expiration before granting new locks).
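As a rough worked example of the RENEW trade-off, assume the worst case in which every client is otherwise idle and sends one RENEW per lease period. The aggregate rate is then the number of clients divided by the lease time. The function name and its parameters are illustrative only.

```python
def renews_per_second(num_clients: int, lease_seconds: float,
                      renews_per_lease: float = 1.0) -> float:
    """Worst-case RENEW load on the server when all clients are idle.

    Assumes each idle client renews renews_per_lease times per lease period.
    """
    return num_clients * renews_per_lease / lease_seconds
```

With 10,000 idle clients, a 90-second lease gives about 111 RENEW requests per second, while a 30-second lease gives about 333.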
Long leases are usable if the server is able to store lease state in
non-volatile memory. Upon recovery, the server can reconstruct the
lease state from its non-volatile memory and continue operation with
its clients, and therefore long leases would not be an issue.
9.13. Clocks, Propagation Delay, and Calculating Lease Expiration
To avoid the need for synchronized clocks, lease times are granted by
the server as a time delta. However, there is a requirement that the
client and server clocks do not drift excessively over the duration
of the lock. There is also the issue of propagation delay across the
network -- which could easily be several hundred milliseconds -- as
well as the possibility that requests will be lost and need to be
retransmitted.
To take propagation delay into account, the client should subtract it
from lease times (e.g., if the client estimates the one-way
propagation delay as 200 msec, then it can assume that the lease is
already 200 msec old when it gets it). In addition, it will take
another 200 msec for the client's next request to reach the server.
So the client must send a lock renewal or write data back to the
server 400 msec before the lease would expire.
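The client-side arithmetic above reduces to subtracting twice the estimated one-way delay from the lease period. In this sketch, lease_granted_at is assumed to be the client's local time when the reply granting or renewing the lease arrived; because only time deltas are used, the client and server clocks need not be synchronized. The function and parameter names are hypothetical.

```python
def renewal_deadline(lease_granted_at: float, lease_seconds: float,
                     one_way_delay: float) -> float:
    """Latest local time at which the client should send a renewal.

    The lease is assumed to be one_way_delay old when the reply arrives,
    and the renewal needs another one_way_delay to reach the server.
    """
    return lease_granted_at + lease_seconds - 2 * one_way_delay
```

For a 90-second lease and a 200 msec one-way delay, a lease granted at local time 1000.0 must be renewed by 1089.6.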
The server's lease period configuration should take into account the
network distance of the clients that will be accessing the server's
resources. It is expected that the lease period will take into
account the network propagation delays and other network delay
factors for the client population. Since the protocol does not allow
for an automatic method to determine an appropriate lease period, the
server's administrator may have to tune the lease period.
9.14. Migration, Replication, and State
When responsibility for handling a given file system is transferred
to a new server (migration) or the client chooses to use an
alternative server (e.g., in response to server unresponsiveness) in
the context of file system replication, the appropriate handling of
state shared between the client and server (i.e., locks, leases,
stateids, and client IDs) is as described below. The handling
differs between migration and replication. For a related discussion
of file server state and recovery of same, see the subsections of
In cases in which one server is expected to accept opaque values from
the client that originated from another server, the servers SHOULD
encode the opaque values in big-endian byte order. If this is done,
the new server will be able to parse values like stateids, directory
cookies, filehandles, etc. even if their native byte order is
different from that of other servers cooperating in the replication
and migration of the file system.
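One way for a server to follow this recommendation is to pack the internal fields of its opaque values with an explicit big-endian format rather than the host's native order. The 12-byte layout of the stateid "other" field below (a 4-byte boot instance followed by an 8-byte counter) is purely hypothetical; the protocol leaves the contents of "other" to the server.

```python
import struct

# Hypothetical internal layout of the 12-byte stateid "other" field.
# ">" selects big-endian order with no padding: 4 + 8 = 12 bytes.
OTHER_FMT = ">IQ"


def encode_other(boot_instance: int, counter: int) -> bytes:
    """Pack server-internal fields into "other" in big-endian order."""
    return struct.pack(OTHER_FMT, boot_instance, counter)


def decode_other(other: bytes) -> tuple:
    """Unpack "other"; gives the same result on any host byte order."""
    return struct.unpack(OTHER_FMT, other)
```

Because ">" fixes both byte order and alignment, a cooperating server on a host of different endianness decodes the same bytes to the same values.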
9.14.1. Migration and State
In the case of migration, the servers involved in the migration of a
file system SHOULD transfer all server state from the original server
to the new server. This must be done in a way that is transparent to
the client. This state transfer will ease the client's transition
when a file system migration occurs. If the servers are successful
in transferring all state, the client will continue to use stateids
assigned by the original server. Therefore, the new server must
recognize these stateids as valid. This holds true for the client ID
as well. Since responsibility for an entire file system is
transferred with a migration event, there is no possibility that
conflicts will arise on the new server as a result of the transfer of
locks.
As part of the transfer of information between servers, leases would
be transferred as well. The leases being transferred to the new
server will typically have a different expiration time from those for
the same client, previously on the old server. To maintain the
property that all leases on a given server for a given client expire
at the same time, the server should advance the expiration time to
the later of the leases being transferred or the leases already
present. This allows the client to maintain lease renewal of both
classes without special effort.
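The expiration-time merge can be sketched as taking the later of the two times. This assumes both expiration times have already been expressed on the destination server's clock; ClientLease and merge_transferred_lease are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientLease:
    client_id: int
    expires_at: float  # one common expiration time for all of this client's leases


def merge_transferred_lease(existing: Optional[ClientLease],
                            transferred: ClientLease) -> ClientLease:
    """Combine a transferred lease with any lease already held on this server."""
    if existing is None:
        return ClientLease(transferred.client_id, transferred.expires_at)
    # Advance to the later expiration; never shorten a lease already granted.
    return ClientLease(existing.client_id,
                       max(existing.expires_at, transferred.expires_at))
```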
The servers may choose not to transfer the state information upon
migration. However, this choice is discouraged. In this case, when
the client presents state information from the original server (e.g.,
in a RENEW operation or a READ operation of zero length), the client
must be prepared to receive either NFS4ERR_STALE_CLIENTID or
NFS4ERR_STALE_STATEID from the new server. The client should then
recover its state information as it normally would in response to a
server failure. The new server must take care to allow for the
recovery of state information as it would in the event of server
restart.
A client SHOULD re-establish new callback information with the new
server as soon as possible, according to sequences described in
Sections 16.33 and 16.34. This ensures that server operations are
not blocked by the inability to recall delegations.
9.14.2. Replication and State
Since client switch-over in the case of replication is not under
server control, the handling of state is different. In this case,
leases, stateids, and client IDs do not have validity across a
transition from one server to another. The client must re-establish
its locks on the new server. This can be compared to the
re-establishment of locks by means of reclaim-type requests after a
server reboot. The difference is that the server has no provision to
distinguish requests reclaiming locks from those obtaining new locks
or to defer the latter. Thus, a client re-establishing a lock on the
new server (by means of a LOCK or OPEN request) may have the
request denied due to a conflicting lock. Since replication is
intended for read-only use of file systems, such denial of locks
should not pose large difficulties in practice. When an attempt to
re-establish a lock on a new server is denied, the client should
treat the situation as if its original lock had been revoked.
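A client's handling of a switch to a replica might look like the sketch below: it re-issues each lock as an ordinary (non-reclaim) request and treats any denial as a revocation. The try_lock callback, which would send the LOCK or OPEN to the new server and report whether it was granted, is a hypothetical stand-in, as is the reestablish_locks function.

```python
from typing import Callable, Iterable, List, Tuple


def reestablish_locks(held_locks: Iterable[str],
                      try_lock: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Re-acquire locks on a replica; denials are treated as revocations.

    Returns (kept, revoked).
    """
    kept: List[str] = []
    revoked: List[str] = []
    for lock in held_locks:
        if try_lock(lock):
            kept.append(lock)
        else:
            # No reclaim semantics on a replica: a conflict means the lock is lost.
            revoked.append(lock)
    return kept, revoked
```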
9.14.3. Notification of Migrated Lease
In the case of lease renewal, the client may not be submitting
requests for a file system that has been migrated to another server.
This can occur because of the implicit lease renewal mechanism. The
client renews leases for all file systems when submitting a request
to any one file system at the server.
In order for the client to schedule renewal of leases that may have
been relocated to the new server, the client must find out about
lease relocation before those leases expire. To accomplish this, all
operations that implicitly renew leases for a client (such as OPEN,
CLOSE, READ, WRITE, RENEW, LOCK, and others) will return the error
NFS4ERR_LEASE_MOVED if responsibility for any of the leases to be
renewed has been transferred to a new server. This condition will
continue until the client receives an NFS4ERR_MOVED error and the
server receives the subsequent GETATTR(fs_locations) for an access to
each file system for which a lease has been moved to a new server.
By convention, the compound including the GETATTR(fs_locations)
SHOULD append a RENEW operation to permit the server to identify the
client doing the access.
Upon receiving the NFS4ERR_LEASE_MOVED error, a client that supports
file system migration MUST probe all file systems from that server on
which it holds open state. Once the client has successfully probed
all those file systems that are migrated, the server MUST resume
normal handling of stateful requests from that client.
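On the server side, the rule above can be sketched as a per-client set of migrated file systems that the client has not yet probed. While the set is non-empty, operations that implicitly renew leases return NFS4ERR_LEASE_MOVED. The class and method names are hypothetical, error codes are shown as strings, and the timeout for legacy clients is not modeled.

```python
class LeaseMovedTracker:
    """Per-client record of migrated file systems not yet probed (sketch)."""

    def __init__(self) -> None:
        self.unprobed: dict = {}  # client_id -> set of migrated fsids

    def file_system_migrated(self, client_id: int, fsid: str) -> None:
        self.unprobed.setdefault(client_id, set()).add(fsid)

    def renewal_status(self, client_id: int) -> str:
        # Any implicitly renewing operation (OPEN, READ, RENEW, ...) gets
        # LEASE_MOVED while some migrated file system remains unprobed.
        return "NFS4ERR_LEASE_MOVED" if self.unprobed.get(client_id) else "NFS4_OK"

    def probed(self, client_id: int, fsid: str) -> None:
        # Called when GETATTR(fs_locations), with a RENEW identifying the
        # client, arrives for a migrated file system.
        pending = self.unprobed.get(client_id)
        if pending is not None:
            pending.discard(fsid)
            if not pending:
                del self.unprobed[client_id]
```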
In order to support legacy clients that do not handle the
NFS4ERR_LEASE_MOVED error correctly, the server SHOULD time out after
a wait of at least two lease periods, at which time it will resume
normal handling of stateful requests from all clients. If a client
attempts to access the migrated files, the server MUST reply with
When the client receives an NFS4ERR_MOVED error, the client can
follow the normal process to obtain the new server information
(through the fs_locations attribute) and perform renewal of those
leases on the new server. If the server has not had state
transferred to it transparently, the client will receive either
NFS4ERR_STALE_CLIENTID or NFS4ERR_STALE_STATEID from the new server,
as described above. The client can then recover state information as
it does in the event of server failure.
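The client-side reactions described in this section can be summarized as a mapping from error to next step. This is an illustrative sketch: errors are represented by their names as strings, and the Action values and next_action function are hypothetical labels, not protocol elements.

```python
from enum import Enum, auto


class Action(Enum):
    CONTINUE = auto()
    PROBE_FILE_SYSTEMS = auto()  # probe every file system with open state on this server
    FETCH_FS_LOCATIONS = auto()  # GETATTR(fs_locations), then renew leases on the new server
    RECOVER_STATE = auto()       # recover as in the event of server failure


_NEXT_ACTION = {
    "NFS4ERR_LEASE_MOVED": Action.PROBE_FILE_SYSTEMS,
    "NFS4ERR_MOVED": Action.FETCH_FS_LOCATIONS,
    "NFS4ERR_STALE_CLIENTID": Action.RECOVER_STATE,
    "NFS4ERR_STALE_STATEID": Action.RECOVER_STATE,
}


def next_action(error: str) -> Action:
    """Return the client's next step for a migration-related error name."""
    return _NEXT_ACTION.get(error, Action.CONTINUE)
```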
9.14.4. Migration and the lease_time Attribute
In order that the client may appropriately manage its leases in the
case of migration, the destination server must establish proper
values for the lease_time attribute.
When state is transferred transparently, that state should include
the correct value of the lease_time attribute. The lease_time
attribute on the destination server must never be less than that on
the source since this would result in premature expiration of leases
granted by the source server. Upon migration, in which state is
transferred transparently, the client is under no obligation to
refetch the lease_time attribute and may continue to use the value
previously fetched (on the source server).
If state has not been transferred transparently (i.e., the client
sees a real or simulated server reboot), the client should fetch the
value of lease_time on the new (i.e., destination) server and use it
for subsequent locking requests. However, the server must respect a
grace period at least as long as the lease_time on the source server,
in order to ensure that clients have ample time to reclaim their
locks before potentially conflicting non-reclaimed locks are granted.
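The two lease_time constraints can be expressed as simple checks, with all values in seconds as in the lease_time attribute. The function names are hypothetical, and the sketch assumes the destination server already knows the source server's lease_time.

```python
def check_transferred_lease_time(source_lease: int, dest_lease: int) -> None:
    """Transparent transfer: destination lease_time must not be shorter."""
    if dest_lease < source_lease:
        raise ValueError("destination lease_time is shorter than the source's; "
                         "leases granted by the source would expire early")


def grace_period_after_migration(configured_grace: int, source_lease: int) -> int:
    """Non-transparent transfer: grace lasts at least the source lease_time."""
    return max(configured_grace, source_lease)
```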
The means by which the new server obtains the value of lease_time on
the old server is left to the server implementations. It is not
specified by the NFSv4 protocol.