Tech-invite3GPPspaceIETFspace
959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 7530

Network File System (NFS) Version 4 Protocol

Pages: 323
Proposed Standard
Errata
Obsoletes:  3530
Updated by:  79318587
Part 6 of 14 – Pages 98 to 118
First   Prev   Next

Top   ToC   RFC7530 - Page 98   prevText

9. File Locking and Share Reservations

Integrating locking into the NFS protocol necessarily causes it to be stateful. With the inclusion of share reservations, the protocol becomes substantially more dependent on state than the traditional combination of NFS and NLM (Network Lock Manager) [xnfs]. There are three components to making this state manageable: o clear division between client and server o ability to reliably detect inconsistency in state between client and server o simple and robust recovery mechanisms In this model, the server owns the state information. The client requests changes in locks, and the server responds with the changes made. Non-client-initiated changes in locking state are infrequent. The client receives prompt notification of such changes and can adjust its view of the locking state to reflect the server's changes. Individual pieces of state created by the server and passed to the client at its request are represented by 128-bit stateids. These stateids may represent a particular open file, a set of byte-range locks held by a particular owner, or a recallable delegation of privileges to access a file in particular ways or at a particular location. In all cases, there is a transition from the most general information that represents a client as a whole to the eventual lightweight stateid used for most client and server locking interactions. The details of this transition will vary with the type of object, but it always starts with a client ID. To support Win32 share reservations, it is necessary to atomically OPEN or CREATE files and apply the appropriate locks in the same operation. Having a separate share/unshare operation would not allow correct implementation of the Win32 OpenFile API. In order to correctly implement share semantics, the previous NFS protocol mechanisms used when a file is opened or created (LOOKUP, CREATE, ACCESS) need to be replaced. The NFSv4 protocol has an OPEN operation that subsumes the NFSv3 methodology of LOOKUP, CREATE, and ACCESS. However, because many operations require a filehandle, the traditional LOOKUP is preserved to map a filename to a filehandle without establishing state on the server. The policy of granting access or modifying files is managed by the server based on the client's state. These mechanisms can implement policy ranging from advisory only locking to full mandatory locking.
Top   ToC   RFC7530 - Page 99

9.1. Opens and Byte-Range Locks

It is assumed that manipulating a byte-range lock is rare when compared to READ and WRITE operations. It is also assumed that server restarts and network partitions are relatively rare. Therefore, it is important that the READ and WRITE operations have a lightweight mechanism to indicate if they possess a held lock. A byte-range lock request contains the heavyweight information required to establish a lock and uniquely define the owner of the lock. The following sections describe the transition from the heavyweight information to the eventual stateid used for most client and server locking and lease interactions.

9.1.1. Client ID

For each LOCK request, the client must identify itself to the server. This is done in such a way as to allow for correct lock identification and crash recovery. A sequence of a SETCLIENTID operation followed by a SETCLIENTID_CONFIRM operation is required to establish the identification onto the server. Establishment of identification by a new incarnation of the client also has the effect of immediately breaking any leased state that a previous incarnation of the client might have had on the server, as opposed to forcing the new client incarnation to wait for the leases to expire. Breaking the lease state amounts to the server removing all lock, share reservation, and, where the server is not supporting the CLAIM_DELEGATE_PREV claim type, all delegation state associated with the same client with the same identity. For a discussion of delegation state recovery, see Section 10.2.1. Owners of opens and owners of byte-range locks are separate entities and remain separate even if the same opaque arrays are used to designate owners of each. The protocol distinguishes between open-owners (represented by open_owner4 structures) and lock-owners (represented by lock_owner4 structures). Both sorts of owners consist of a clientid and an opaque owner string. For each client, the set of distinct owner values used with that client constitutes the set of owners of that type, for the given client. Each open is associated with a specific open-owner, while each byte-range lock is associated with a lock-owner and an open-owner, the latter being the open-owner associated with the open file under which the LOCK operation was done.
Top   ToC   RFC7530 - Page 100
   Client identification is encapsulated in the following structure:

   struct nfs_client_id4 {
           verifier4       verifier;
           opaque          id<NFS4_OPAQUE_LIMIT>;
   };

   The first field, verifier, is a client incarnation verifier that is
   used to detect client reboots.  Only if the verifier is different
   from that which the server has previously recorded for the client (as
   identified by the second field of the structure, id) does the server
   start the process of canceling the client's leased state.

   The second field, id, is a variable-length string that uniquely
   defines the client.

   There are several considerations for how the client generates the id
   string:

   o  The string should be unique so that multiple clients do not
      present the same string.  The consequences of two clients
      presenting the same string range from one client getting an error
      to one client having its leased state abruptly and unexpectedly
      canceled.

   o  The string should be selected so the subsequent incarnations
      (e.g., reboots) of the same client cause the client to present the
      same string.  The implementer is cautioned against an approach
      that requires the string to be recorded in a local file because
      this precludes the use of the implementation in an environment
      where there is no local disk and all file access is from an NFSv4
      server.

   o  The string should be different for each server network address
      that the client accesses, rather than common to all server network
      addresses.  The reason is that it may not be possible for the
      client to tell if the same server is listening on multiple network
      addresses.  If the client issues SETCLIENTID with the same id
      string to each network address of such a server, the server will
      think it is the same client, and each successive SETCLIENTID will
      cause the server to begin the process of removing the client's
      previous leased state.

   o  The algorithm for generating the string should not assume that the
      client's network address won't change.  This includes changes
      between client incarnations and even changes while the client is
      still running in its current incarnation.  This means that if the
      client includes just the client's and server's network address in
Top   ToC   RFC7530 - Page 101
      the id string, there is a real risk, after the client gives up the
      network address, that another client, using a similar algorithm
      for generating the id string, will generate a conflicting id
      string.

   Given the above considerations, an example of a well-generated id
   string is one that includes:

   o  The server's network address.

   o  The client's network address.

   o  For a user-level NFSv4 client, it should contain additional
      information to distinguish the client from other user-level
      clients running on the same host, such as a universally unique
      identifier (UUID).

   o  Additional information that tends to be unique, such as one or
      more of:

      *  The client machine's serial number (for privacy reasons, it is
         best to perform some one-way function on the serial number).

      *  A MAC address (for privacy reasons, it is best to perform some
         one-way function on the MAC address).

      *  The timestamp of when the NFSv4 software was first installed on
         the client (though this is subject to the previously mentioned
         caution about using information that is stored in a file,
         because the file might only be accessible over NFSv4).

      *  A true random number.  However, since this number ought to be
         the same between client incarnations, this shares the same
         problem as that of using the timestamp of the software
         installation.

   As a security measure, the server MUST NOT cancel a client's leased
   state if the principal that established the state for a given id
   string is not the same as the principal issuing the SETCLIENTID.

   Note that SETCLIENTID (Section 16.33) and SETCLIENTID_CONFIRM
   (Section 16.34) have a secondary purpose of establishing the
   information the server needs to make callbacks to the client for the
   purpose of supporting delegations.  It is permitted to change this
   information via SETCLIENTID and SETCLIENTID_CONFIRM within the same
   incarnation of the client without removing the client's leased state.
Top   ToC   RFC7530 - Page 102
   Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has successfully
   completed, the client uses the shorthand client identifier, of type
   clientid4, instead of the longer and less compact nfs_client_id4
   structure.  This shorthand client identifier (a client ID) is
   assigned by the server and should be chosen so that it will not
   conflict with a client ID previously assigned by the server.  This
   applies across server restarts or reboots.  When a client ID is
   presented to a server and that client ID is not recognized, as would
   happen after a server reboot, the server will reject the request with
   the error NFS4ERR_STALE_CLIENTID.  When this happens, the client must
   obtain a new client ID by use of the SETCLIENTID operation and then
   proceed to any other necessary recovery for the server reboot case
   (see Section 9.6.2).

   The client must also employ the SETCLIENTID operation when it
   receives an NFS4ERR_STALE_STATEID error using a stateid derived from
   its current client ID, since this also indicates a server reboot,
   which has invalidated the existing client ID (see Section 9.6.2 for
   details).

   See the detailed descriptions of SETCLIENTID (Section 16.33.4) and
   SETCLIENTID_CONFIRM (Section 16.34.4) for a complete specification of
   the operations.

9.1.2. Server Release of Client ID

If the server determines that the client holds no associated state for its client ID, the server may choose to release the client ID. The server may make this choice for an inactive client so that resources are not consumed by those intermittently active clients. If the client contacts the server after this release, the server must ensure that the client receives the appropriate error so that it will use the SETCLIENTID/SETCLIENTID_CONFIRM sequence to establish a new identity. It should be clear that the server must be very hesitant to release a client ID since the resulting work on the client to recover from such an event will be the same burden as if the server had failed and restarted. Typically, a server would not release a client ID unless there had been no activity from that client for many minutes. Note that if the id string in a SETCLIENTID request is properly constructed, and if the client takes care to use the same principal for each successive use of SETCLIENTID, then, barring an active denial-of-service attack, NFS4ERR_CLID_INUSE should never be returned.
Top   ToC   RFC7530 - Page 103
   However, client bugs, server bugs, or perhaps a deliberate change of
   the principal owner of the id string (such as the case of a client
   that changes security flavors, and under the new flavor there is no
   mapping to the previous owner) will in rare cases result in
   NFS4ERR_CLID_INUSE.

   In that event, when the server gets a SETCLIENTID for a client ID
   that currently has no state, or it has state but the lease has
   expired, rather than returning NFS4ERR_CLID_INUSE, the server MUST
   allow the SETCLIENTID and confirm the new client ID if followed by
   the appropriate SETCLIENTID_CONFIRM.

9.1.3. Use of Seqids

In several contexts, 32-bit sequence values called "seqids" are used as part of managing locking state. Such values are used: o To provide an ordering of locking-related operations associated with a particular lock-owner or open-owner. See Section 9.1.7 for a detailed explanation. o To define an ordered set of instances of a set of locks sharing a particular set of ownership characteristics. See Section 9.1.4.2 for a detailed explanation. Successive seqid values for the same object are normally arrived at by incrementing the current value by one. This pattern continues until the seqid is incremented past NFS4_UINT32_MAX, in which case one (rather than zero) is to be the next seqid value. When two seqid values are to be compared to determine which of the two is later, the possibility of wraparound needs to be considered. In many cases, the values are such that simple numeric comparisons can be used. For example, if the seqid values to be compared are both less than one million, the higher value can be considered the later. On the other hand, if one of the values is at or near NFS_UINT32_MAX and the other is less than one million, then implementations can reasonably decide that the lower value has had one more wraparound and is thus, while numerically lower, actually later. Implementations can compare seqids in the presence of potential wraparound by adopting the reasonable assumption that the chain of increments from one to the other is shorter than 2**31. So, if the difference between the two seqids is less than 2**31, then the lower seqid is to be treated as earlier. If, however, the difference
Top   ToC   RFC7530 - Page 104
   between the two seqids is greater than or equal to 2**31, then it can
   be assumed that the lower seqid has encountered one more wraparound
   and can be treated as later.

9.1.4. Stateid Definition

When the server grants a lock of any type (including opens, byte-range locks, and delegations), it responds with a unique stateid that represents a set of locks (often a single lock) for the same file, of the same type, and sharing the same ownership characteristics. Thus, opens of the same file by different open-owners each have an identifying stateid. Similarly, each set of byte-range locks on a file owned by a specific lock-owner has its own identifying stateid. Delegations also have associated stateids by which they may be referenced. The stateid is used as a shorthand reference to a lock or set of locks, and given a stateid, the server can determine the associated state-owner or state-owners (in the case of an open-owner/lock-owner pair) and the associated filehandle. When stateids are used, the current filehandle must be the one associated with that stateid. All stateids associated with a given client ID are associated with a common lease that represents the claim of those stateids and the objects they represent to be maintained by the server. See Section 9.5 for a discussion of the lease. Each stateid must be unique to the server. Many operations take a stateid as an argument but not a clientid, so the server must be able to infer the client from the stateid.
9.1.4.1. Stateid Types
With the exception of special stateids (see Section 9.1.4.3), each stateid represents locking objects of one of a set of types defined by the NFSv4 protocol. Note that in all these cases, where we speak of a guarantee, it is understood there are situations such as a client restart, or lock revocation, that allow the guarantee to be voided. o Stateids may represent opens of files. Each stateid in this case represents the OPEN state for a given client ID/open-owner/filehandle triple. Such stateids are subject to change (with consequent incrementing of the stateid's seqid) in response to OPENs that result in upgrade and OPEN_DOWNGRADE operations.
Top   ToC   RFC7530 - Page 105
   o  Stateids may represent sets of byte-range locks.

      All locks held on a particular file by a particular owner and all
      gotten under the aegis of a particular open file are associated
      with a single stateid, with the seqid being incremented whenever
      LOCK and LOCKU operations affect that set of locks.

   o  Stateids may represent file delegations, which are recallable
      guarantees by the server to the client that other clients will not
      reference, or will not modify, a particular file until the
      delegation is returned.

      A stateid represents a single delegation held by a client for a
      particular filehandle.

9.1.4.2. Stateid Structure
Stateids are divided into two fields: a 96-bit "other" field identifying the specific set of locks and a 32-bit "seqid" sequence value. Except in the case of special stateids (see Section 9.1.4.3), a particular value of the "other" field denotes a set of locks of the same type (for example, byte-range locks, opens, or delegations), for a specific file or directory, and sharing the same ownership characteristics. The seqid designates a specific instance of such a set of locks, and is incremented to indicate changes in such a set of locks, by either the addition or deletion of locks from the set, a change in the byte-range they apply to, or an upgrade or downgrade in the type of one or more locks. When such a set of locks is first created, the server returns a stateid with a seqid value of one. On subsequent operations that modify the set of locks, the server is required to advance the seqid field by one whenever it returns a stateid for the same state-owner/file/type combination and the operation is one that might make some change in the set of locks actually designated. In this case, the server will return a stateid with an "other" field the same as previously used for that state-owner/file/type combination, with an incremented seqid field. Seqids will be compared, by both the client and the server. The client uses such comparisons to determine the order of operations, while the server uses them to determine whether the NFS4ERR_OLD_STATEID error is to be returned. In all cases, the possibility of seqid wraparound needs to be taken into account, as discussed in Section 9.1.3.
Top   ToC   RFC7530 - Page 106
9.1.4.3. Special Stateids
Stateid values whose "other" field is either all zeros or all ones are reserved. They MUST NOT be assigned by the server but have special meanings defined by the protocol. The particular meaning depends on whether the "other" field is all zeros or all ones and the specific value of the seqid field. The following combinations of "other" and seqid are defined in NFSv4: Anonymous Stateid: When "other" and seqid are both zero, the stateid is treated as a special anonymous stateid, which can be used in READ, WRITE, and SETATTR requests to indicate the absence of any open state associated with the request. When an anonymous stateid value is used, and an existing open denies the form of access requested, then access will be denied to the request. READ Bypass Stateid: When "other" and seqid are both all ones, the stateid is a special READ bypass stateid. When this value is used in WRITE or SETATTR, it is treated like the anonymous value. When used in READ, the server MAY grant access, even if access would normally be denied to READ requests. If a stateid value is used that has all zeros or all ones in the "other" field but does not match one of the cases above, the server MUST return the error NFS4ERR_BAD_STATEID. Special stateids, unlike other stateids, are not associated with individual client IDs or filehandles and can be used with all valid client IDs and filehandles.
9.1.4.4. Stateid Lifetime and Validation
Stateids must remain valid until either a client restart or a server restart, or until the client returns all of the locks associated with the stateid by means of an operation such as CLOSE or DELEGRETURN. If the locks are lost due to revocation, as long as the client ID is valid, the stateid remains a valid designation of that revoked state. Stateids associated with byte-range locks are an exception. They remain valid even if a LOCKU frees all remaining locks, so long as the open file with which they are associated remains open. It should be noted that there are situations in which the client's locks become invalid, without the client requesting they be returned. These include lease expiration and a number of forms of lock revocation within the lease period. It is important to note that in these situations, the stateid remains valid and the client can use it to determine the disposition of the associated lost locks.
Top   ToC   RFC7530 - Page 107
   An "other" value must never be reused for a different purpose (i.e.,
   different filehandle, owner, or type of locks) within the context of
   a single client ID.  A server may retain the "other" value for the
   same purpose beyond the point where it may otherwise be freed, but if
   it does so, it must maintain seqid continuity with previous values.

   One mechanism that may be used to satisfy the requirement that the
   server recognize invalid and out-of-date stateids is for the server
   to divide the "other" field of the stateid into two fields:

   o  An index into a table of locking-state structures.

   o  A generation number that is incremented on each allocation of a
      table entry for a particular use.

   And then store the following in each table entry:

   o  The client ID with which the stateid is associated.

   o  The current generation number for the (at most one) valid stateid
      sharing this index value.

   o  The filehandle of the file on which the locks are taken.

   o  An indication of the type of stateid (open, byte-range lock, file
      delegation).

   o  The last seqid value returned corresponding to the current "other"
      value.

   o  An indication of the current status of the locks associated with
      this stateid -- in particular, whether these have been revoked
      and, if so, for what reason.

   With this information, an incoming stateid can be validated and the
   appropriate error returned when necessary.  Special and non-special
   stateids are handled separately.  (See Section 9.1.4.3 for a
   discussion of special stateids.)

   When a stateid is being tested, and the "other" field is all zeros or
   all ones, a check that the "other" and seqid fields match a defined
   combination for a special stateid is done and the results determined
   as follows:

   o  If the "other" and seqid fields do not match a defined combination
      associated with a special stateid, the error NFS4ERR_BAD_STATEID
      is returned.
Top   ToC   RFC7530 - Page 108
   o  If the combination is valid in general but is not appropriate to
      the context in which the stateid is used (e.g., an all-zero
      stateid is used when an open stateid is required in a LOCK
      operation), the error NFS4ERR_BAD_STATEID is also returned.

   o  Otherwise, the check is completed and the special stateid is
      accepted as valid.

   When a stateid is being tested, and the "other" field is neither all
   zeros nor all ones, the following procedure could be used to validate
   an incoming stateid and return an appropriate error, when necessary,
   assuming that the "other" field would be divided into a table index
   and an entry generation.  Note that the terms "earlier" and "later"
   used in connection with seqid comparison are to be understood as
   explained in Section 9.1.3.

   o  If the table index field is outside the range of the associated
      table, return NFS4ERR_BAD_STATEID.

   o  If the selected table entry is of a different generation than that
      specified in the incoming stateid, return NFS4ERR_BAD_STATEID.

   o  If the selected table entry does not match the current filehandle,
      return NFS4ERR_BAD_STATEID.

   o  If the stateid represents revoked state or state lost as a result
      of lease expiration, then return NFS4ERR_EXPIRED,
      NFS4ERR_BAD_STATEID, or NFS4ERR_ADMIN_REVOKED, as appropriate.

   o  If the stateid type is not valid for the context in which the
      stateid appears, return NFS4ERR_BAD_STATEID.  Note that a stateid
      may be valid in general but invalid for a particular operation,
      as, for example, when a stateid that doesn't represent byte-range
      locks is passed to the non-from_open case of LOCK or to LOCKU, or
      when a stateid that does not represent an open is passed to CLOSE
      or OPEN_DOWNGRADE.  In such cases, the server MUST return
      NFS4ERR_BAD_STATEID.

   o  If the seqid field is not zero and it is later than the current
      sequence value corresponding to the current "other" field, return
      NFS4ERR_BAD_STATEID.

   o  If the seqid field is earlier than the current sequence value
      corresponding to the current "other" field, return
      NFS4ERR_OLD_STATEID.
Top   ToC   RFC7530 - Page 109
   o  Otherwise, the stateid is valid, and the table entry should
      contain any additional information about the type of stateid and
      information associated with that particular type of stateid, such
      as the associated set of locks (e.g., open-owner and lock-owner
      information), as well as information on the specific locks
      themselves, such as open modes and byte ranges.

9.1.4.5. Stateid Use for I/O Operations
Clients performing Input/Output (I/O) operations need to select an appropriate stateid based on the locks (including opens and delegations) held by the client and the various types of state-owners sending the I/O requests. SETATTR operations that change the file size are treated like I/O operations in this regard. The following rules, applied in order of decreasing priority, govern the selection of the appropriate stateid. In following these rules, the client will only consider locks of which it has actually received notification by an appropriate operation response or callback. o If the client holds a delegation for the file in question, the delegation stateid SHOULD be used. o Otherwise, if the entity corresponding to the lock-owner (e.g., a process) sending the I/O has a byte-range lock stateid for the associated open file, then the byte-range lock stateid for that lock-owner and open file SHOULD be used. o If there is no byte-range lock stateid, then the OPEN stateid for the current open-owner, i.e., the OPEN stateid for the open file in question, SHOULD be used. o Finally, if none of the above apply, then a special stateid SHOULD be used. Ignoring these rules may result in situations in which the server does not have information necessary to properly process the request. For example, when mandatory byte-range locks are in effect, if the stateid does not indicate the proper lock-owner, via a lock stateid, a request might be avoidably rejected. The server, however, should not try to enforce these ordering rules and should use whatever information is available to properly process I/O requests. In particular, when a client has a delegation for a given file, it SHOULD take note of this fact in processing a request, even if it is sent with a special stateid.
Top   ToC   RFC7530 - Page 110
9.1.4.6. Stateid Use for SETATTR Operations
In the case of SETATTR operations, a stateid is present. In cases other than those that set the file size, the client may send either a special stateid or, when a delegation is held for the file in question, a delegation stateid. While the server SHOULD validate the stateid and may use the stateid to optimize the determination as to whether a delegation is held, it SHOULD note the presence of a delegation even when a special stateid is sent, and MUST accept a valid delegation stateid when sent.

9.1.5. Lock-Owner

When requesting a lock, the client must present to the server the client ID and an identifier for the owner of the requested lock. These two fields comprise the lock-owner and are defined as follows: o A client ID returned by the server as part of the client's use of the SETCLIENTID operation. o A variable-length opaque array used to uniquely define the owner of a lock managed by the client. This may be a thread id, process id, or other unique value. When the server grants the lock, it responds with a unique stateid. The stateid is used as a shorthand reference to the lock-owner, since the server will be maintaining the correspondence between them.

9.1.6. Use of the Stateid and Locking

All READ, WRITE, and SETATTR operations contain a stateid. For the purposes of this section, SETATTR operations that change the size attribute of a file are treated as if they are writing the area between the old and new size (i.e., the range truncated or added to the file by means of the SETATTR), even where SETATTR is not explicitly mentioned in the text. The stateid passed to one of these operations must be one that represents an OPEN (e.g., via the open-owner), a set of byte-range locks, or a delegation, or it may be a special stateid representing anonymous access or the READ bypass stateid. If the state-owner performs a READ or WRITE in a situation in which it has established a lock or share reservation on the server (any OPEN constitutes a share reservation), the stateid (previously returned by the server) must be used to indicate what locks, including both byte-range locks and share reservations, are held by the state-owner. If no state is established by the client -- either
Top   ToC   RFC7530 - Page 111
   byte-range lock or share reservation -- the anonymous stateid is
   used.  Regardless of whether an anonymous stateid or a stateid
   returned by the server is used, if there is a conflicting share
   reservation or mandatory byte-range lock held on the file, the server
   MUST refuse to service the READ or WRITE operation.

   Share reservations are established by OPEN operations and by their
   nature are mandatory in that when the OPEN denies READ or WRITE
   operations, that denial results in such operations being rejected
   with error NFS4ERR_LOCKED.  Byte-range locks may be implemented by
   the server as either mandatory or advisory, or the choice of
   mandatory or advisory behavior may be determined by the server on the
   basis of the file being accessed (for example, some UNIX-based
   servers support a "mandatory lock bit" on the mode attribute such
   that if set, byte-range locks are required on the file before I/O is
   possible).  When byte-range locks are advisory, they only prevent the
   granting of conflicting lock requests and have no effect on READs or
   WRITEs.  Mandatory byte-range locks, however, prevent conflicting I/O
   operations.  When they are attempted, they are rejected with
   NFS4ERR_LOCKED.  When the client gets NFS4ERR_LOCKED on a file it
   knows it has the proper share reservation for, it will need to issue
   a LOCK request on the region of the file that includes the region the
   I/O was to be performed on, with an appropriate locktype (i.e.,
   READ*_LT for a READ operation, WRITE*_LT for a WRITE operation).

   With NFSv3, there was no notion of a stateid, so there was no way to
   tell if the application process of the client sending the READ or
   WRITE operation had also acquired the appropriate byte-range lock on
   the file.  Thus, there was no way to implement mandatory locking.
   With the stateid construct, this barrier has been removed.

   Note that for UNIX environments that support mandatory file locking,
   the distinction between advisory and mandatory locking is subtle.  In
   fact, advisory and mandatory byte-range locks are exactly the same
   insofar as the APIs and requirements on implementation are concerned.
   If the mandatory lock attribute is set on the file, the server checks
   to see if the lock-owner has an appropriate shared (read) or
   exclusive (write) byte-range lock on the region it wishes to read or
   write to.  If there is no appropriate lock, the server checks if
   there is a conflicting lock (which can be done by attempting to
   acquire the conflicting lock on behalf of the lock-owner and, if
   successful, release the lock after the READ or WRITE is done), and if
   there is, the server returns NFS4ERR_LOCKED.

   For Windows environments, there are no advisory byte-range locks, so
   the server always checks for byte-range locks during I/O requests.
Top   ToC   RFC7530 - Page 112
   Thus, the NFSv4 LOCK operation does not need to distinguish between
   advisory and mandatory byte-range locks.  It is the NFSv4 server's
   processing of the READ and WRITE operations that introduces the
   distinction.

   Every stateid other than the special stateid values noted in this
   section, whether returned by an OPEN-type operation (i.e., OPEN,
   OPEN_DOWNGRADE) or by a LOCK-type operation (i.e., LOCK or LOCKU),
   defines an access mode for the file (i.e., READ, WRITE, or
   READ-WRITE) as established by the original OPEN that began the
   stateid sequence, and as modified by subsequent OPENs and
   OPEN_DOWNGRADEs within that stateid sequence.  When a READ, WRITE, or
   SETATTR that specifies the size attribute is done, the operation is
   subject to checking against the access mode to verify that the
   operation is appropriate given the OPEN with which the operation is
   associated.

   In the case of WRITE-type operations (i.e., WRITEs and SETATTRs that
   set size), the server must verify that the access mode allows writing
   and return an NFS4ERR_OPENMODE error if it does not.  In the case of
   READ, the server may perform the corresponding check on the access
   mode, or it may choose to allow READ on opens for WRITE only, to
   accommodate clients whose write implementation may unavoidably do
   reads (e.g., due to buffer cache constraints).  However, even if
   READs are allowed in these circumstances, the server MUST still check
   for locks that conflict with the READ (e.g., another open specifying
   denial of READs).  Note that a server that does enforce the access
   mode check on READs need not explicitly check for conflicting share
   reservations since the existence of OPEN for read access guarantees
   that no conflicting share reservation can exist.

   A READ bypass stateid MAY allow READ operations to bypass locking
   checks at the server.  However, WRITE operations with a READ bypass
   stateid MUST NOT bypass locking checks and are treated exactly the
   same as if an anonymous stateid were used.

   A lock may not be granted while a READ or WRITE operation using one
   of the special stateids is being performed and the range of the lock
   request conflicts with the range of the READ or WRITE operation.  For
   the purposes of this paragraph, a conflict occurs when a shared lock
   is requested and a WRITE operation is being performed, or an
   exclusive lock is requested and either a READ or a WRITE operation is
   being performed.  A SETATTR that sets size is treated similarly to a
   WRITE as discussed above.
Top   ToC   RFC7530 - Page 113

9.1.7. Sequencing of Lock Requests

Locking is different than most NFS operations as it requires "at-most-one" semantics that are not provided by ONC RPC. ONC RPC over a reliable transport is not sufficient because a sequence of locking requests may span multiple TCP connections. In the face of retransmission or reordering, lock or unlock requests must have a well-defined and consistent behavior. To accomplish this, each lock request contains a sequence number that is a consecutively increasing integer. Different state-owners have different sequences. The server maintains the last sequence number (L) received and the response that was returned. The server SHOULD assign a seqid value of one for the first request issued for any given state-owner. Subsequent values are arrived at by incrementing the seqid value, subject to wraparound as described in Section 9.1.3. Note that for requests that contain a sequence number, for each state-owner, there should be no more than one outstanding request. When a request is received, its sequence number (r) is compared to that of the last one received (L). Only if it has the correct next sequence, normally L + 1, is the request processed beyond the point of seqid checking. Given a properly functioning client, the response to (r) must have been received before the last request (L) was sent. If a duplicate of last request (r == L) is received, the stored response is returned. If the sequence value received is any other value, it is rejected with the return of error NFS4ERR_BAD_SEQID. Sequence history is reinitialized whenever the SETCLIENTID/ SETCLIENTID_CONFIRM sequence changes the client verifier. It is critical that the server maintain the last response sent to the client to provide a more reliable cache of duplicate non-idempotent requests than that of the traditional cache described in [Chet]. The traditional duplicate request cache uses a least recently used algorithm for removing unneeded requests. However, the last lock request and response on a given state-owner must be cached as long as the lock state exists on the server. The client MUST advance the sequence number for the CLOSE, LOCK, LOCKU, OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE operations. This is true even in the event that the previous operation that used the sequence number received an error. The only exception to this rule is if the previous operation received one of the following errors: NFS4ERR_STALE_CLIENTID, NFS4ERR_STALE_STATEID, NFS4ERR_BAD_STATEID, NFS4ERR_BAD_SEQID, NFS4ERR_BADXDR, NFS4ERR_RESOURCE, NFS4ERR_NOFILEHANDLE, or NFS4ERR_MOVED.
Top   ToC   RFC7530 - Page 114

9.1.8. Recovery from Replayed Requests

As described above, the sequence number is per state-owner. As long as the server maintains the last sequence number received and follows the methods described above, there are no risks of a Byzantine router re-sending old requests. The server need only maintain the (state-owner, sequence number) state as long as there are open files or closed files with locks outstanding. LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and CLOSE each contain a sequence number, and therefore the risk of the replay of these operations resulting in undesired effects is non-existent while the server maintains the state-owner state.

9.1.9. Interactions of Multiple Sequence Values

Some operations may have multiple sources of data for request sequence checking and retransmission determination. Some operations have multiple sequence values associated with multiple types of state-owners. In addition, such operations may also have a stateid with its own seqid value, that will be checked for validity. As noted above, there may be multiple sequence values to check. The following rules should be followed by the server in processing these multiple sequence values within a single operation. o When a sequence value associated with a state-owner is unavailable for checking because the state-owner is unknown to the server, it takes no part in the comparison. o When any of the state-owner sequence values are invalid, NFS4ERR_BAD_SEQID is returned. When a stateid sequence is checked, NFS4ERR_BAD_STATEID or NFS4ERR_OLD_STATEID is returned as appropriate, but NFS4ERR_BAD_SEQID has priority. o When any one of the sequence values matches a previous request, for a state-owner, it is treated as a retransmission and not re-executed. When the type of the operation does not match that originally used, NFS4ERR_BAD_SEQID is returned. When the server can determine that the request differs from the original, it may return NFS4ERR_BAD_SEQID. o When multiple sequence values match previous operations but the operations are not the same, NFS4ERR_BAD_SEQID is returned.
Top   ToC   RFC7530 - Page 115
   o  When there are no sequence values available for comparison and the
      operation is an OPEN, the server indicates to the client that an
      OPEN_CONFIRM is required, unless it can conclusively determine
      that confirmation is not required (e.g., by knowing that no
      open-owner state has ever been released for the current clientid).

9.1.10. Releasing State-Owner State

When a particular state-owner no longer holds open or file locking state at the server, the server may choose to release the sequence number state associated with the state-owner. The server may make this choice based on lease expiration, the reclamation of server memory, or other implementation-specific details. Note that when this is done, a retransmitted request, normally identified by a matching state-owner sequence, may not be correctly recognized, so that the client will not receive the original response that it would have if the state-owner state was not released. If the server were able to be sure that a given state-owner would never again be used by a client, such an issue could not arise. Even when the state-owner state is released and the client subsequently uses that state-owner, retransmitted requests will be detected as invalid and the request not executed, although the client may have a recovery path that is more complicated than simply getting the original response back transparently. In any event, the server is able to safely release state-owner state (in the sense that retransmitted requests will not be erroneously acted upon) when the state-owner is not currently being utilized by the client (i.e., there are no open files associated with an open-owner and no lock stateids associated with a lock-owner). The server may choose to hold the state-owner state in order to simplify the recovery path, in the case in which retransmissions of currently active requests are received. However, the period for which it chooses to hold this state is implementation specific. In the case that a LOCK, LOCKU, OPEN_DOWNGRADE, or CLOSE is retransmitted after the server has previously released the state-owner state, the server will find that the state-owner has no files open and an error will be returned to the client. If the state-owner does have a file open, the stateid will not match and again an error is returned to the client.
Top   ToC   RFC7530 - Page 116

9.1.11. Use of Open Confirmation

In the case that an OPEN is retransmitted and the open-owner is being used for the first time or the open-owner state has been previously released by the server, the use of the OPEN_CONFIRM operation will prevent incorrect behavior. When the server observes the use of the open-owner for the first time, it will direct the client to perform the OPEN_CONFIRM for the corresponding OPEN. This sequence establishes the use of an open-owner and associated sequence number. Since the OPEN_CONFIRM sequence connects a new open-owner on the server with an existing open-owner on a client, the sequence number may have any valid (i.e., non-zero) value. The OPEN_CONFIRM step assures the server that the value received is the correct one. (See Section 16.18 for further details.) There are a number of situations in which the requirement to confirm an OPEN would pose difficulties for the client and server, in that they would be prevented from acting in a timely fashion on information received, because that information would be provisional, subject to deletion upon non-confirmation. Fortunately, these are situations in which the server can avoid the need for confirmation when responding to open requests. The two constraints are: o The server must not bestow a delegation for any open that would require confirmation. o The server MUST NOT require confirmation on a reclaim-type open (i.e., one specifying claim type CLAIM_PREVIOUS or CLAIM_DELEGATE_PREV). These constraints are related in that reclaim-type opens are the only ones in which the server may be required to send a delegation. For CLAIM_NULL, sending the delegation is optional, while for CLAIM_DELEGATE_CUR, no delegation is sent. Delegations being sent with an open requiring confirmation are troublesome because recovering from non-confirmation adds undue complexity to the protocol, while requiring confirmation on reclaim- type opens poses difficulties in that the inability to resolve the status of the reclaim until lease expiration may make it difficult to have timely determination of the set of locks being reclaimed (since the grace period may expire). Requiring open confirmation on reclaim-type opens is avoidable because of the nature of the environments in which such opens are done. For CLAIM_PREVIOUS opens, this is immediately after server reboot, so there should be no time for open-owners to be created, found to be unused, and recycled. For CLAIM_DELEGATE_PREV opens,
Top   ToC   RFC7530 - Page 117
   we are dealing with either a client reboot situation or a network
   partition resulting in deletion of lease state (and returning
   NFS4ERR_EXPIRED).  A server that supports delegations can be sure
   that no open-owners for that client have been recycled since client
   initialization or deletion of lease state and thus can be confident
   that confirmation will not be required.

9.2. Lock Ranges

The protocol allows a lock-owner to request a lock with a byte range and then either upgrade or unlock a sub-range of the initial lock. It is expected that this will be an uncommon type of request. In any case, servers or server file systems may not be able to support sub-range lock semantics. In the event that a server receives a locking request that represents a sub-range of current locking state for the lock-owner, the server is allowed to return the error NFS4ERR_LOCK_RANGE to signify that it does not support sub-range lock operations. Therefore, the client should be prepared to receive this error and, if appropriate, report the error to the requesting application. The client is discouraged from combining multiple independent locking ranges that happen to be adjacent into a single request, since the server may not support sub-range requests, and for reasons related to the recovery of file locking state in the event of server failure. As discussed in Section 9.6.2 below, the server may employ certain optimizations during recovery that work effectively only when the client's behavior during lock recovery is similar to the client's locking behavior prior to server failure.

9.3. Upgrading and Downgrading Locks

If a client has a write lock on a record, it can request an atomic downgrade of the lock to a read lock via the LOCK request, by setting the type to READ_LT. If the server supports atomic downgrade, the request will succeed. If not, it will return NFS4ERR_LOCK_NOTSUPP. The client should be prepared to receive this error and, if appropriate, report the error to the requesting application. If a client has a read lock on a record, it can request an atomic upgrade of the lock to a write lock via the LOCK request by setting the type to WRITE_LT or WRITEW_LT. If the server does not support atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP. If the upgrade can be achieved without an existing conflict, the request will succeed. Otherwise, the server will return either NFS4ERR_DENIED or NFS4ERR_DEADLOCK. The error NFS4ERR_DEADLOCK is returned if the client issued the LOCK request with the type set to WRITEW_LT and the
Top   ToC   RFC7530 - Page 118
   server has detected a deadlock.  The client should be prepared to
   receive such errors and, if appropriate, report them to the
   requesting application.

9.4. Blocking Locks

Some clients require the support of blocking locks. The NFSv4 protocol must not rely on a callback mechanism and therefore is unable to notify a client when a previously denied lock has been granted. Clients have no choice but to continually poll for the lock. This presents a fairness problem. Two new lock types are added, READW and WRITEW, and are used to indicate to the server that the client is requesting a blocking lock. The server should maintain an ordered list of pending blocking locks. When the conflicting lock is released, the server may wait the lease period for the first waiting client to re-request the lock. After the lease period expires, the next waiting client request is allowed the lock. Clients are required to poll at an interval sufficiently small that it is likely to acquire the lock in a timely manner. The server is not required to maintain a list of pending blocked locks, as it is not used to provide correct operation but only to increase fairness. Because of the unordered nature of crash recovery, storing of lock state to stable storage would be required to guarantee ordered granting of blocking locks. Servers may also note the lock types and delay returning denial of the request to allow extra time for a conflicting lock to be released, allowing a successful return. In this way, clients can avoid the burden of needlessly frequent polling for blocking locks. The server should take care with the length of delay in the event that the client retransmits the request. If a server receives a blocking lock request, denies it, and then later receives a non-blocking request for the same lock, which is also denied, then it should remove the lock in question from its list of pending blocking locks. Clients should use such a non-blocking request to indicate to the server that this is the last time they intend to poll for the lock, as may happen when the process requesting the lock is interrupted. This is a courtesy to the server, to prevent it from unnecessarily waiting a lease period before granting other lock requests. However, clients are not required to perform this courtesy, and servers must not depend on them doing so. Also, clients must be prepared for the possibility that this final locking request will be accepted.


(next page on part 7)

Next Section