RFC 7530

Network File System (NFS) Version 4 Protocol

9.  File Locking and Share Reservations

   Integrating locking into the NFS protocol necessarily causes it to be
   stateful.  With the inclusion of share reservations, the protocol
   becomes substantially more dependent on state than the traditional
   combination of NFS and NLM (Network Lock Manager) [xnfs].  There are
   three components to making this state manageable:

   o  clear division between client and server

   o  ability to reliably detect inconsistency in state between client
      and server

   o  simple and robust recovery mechanisms

   In this model, the server owns the state information.  The client
   requests changes in locks, and the server responds with the changes
   made.  Non-client-initiated changes in locking state are infrequent.
   The client receives prompt notification of such changes and can
   adjust its view of the locking state to reflect the server's changes.

   Individual pieces of state created by the server and passed to the
   client at its request are represented by 128-bit stateids.  These
   stateids may represent a particular open file, a set of byte-range
   locks held by a particular owner, or a recallable delegation of
   privileges to access a file in particular ways or at a particular
   location.

   In all cases, there is a transition from the most general information
   that represents a client as a whole to the eventual lightweight
   stateid used for most client and server locking interactions.  The
   details of this transition will vary with the type of object, but it
   always starts with a client ID.

   To support Win32 share reservations, it is necessary to atomically
   OPEN or CREATE files and apply the appropriate locks in the same
   operation.  Having a separate share/unshare operation would not allow
   correct implementation of the Win32 OpenFile API.  In order to
   correctly implement share semantics, the previous NFS protocol
   mechanisms used when a file is opened or created (LOOKUP, CREATE,
   ACCESS) need to be replaced.  The NFSv4 protocol has an OPEN
   operation that subsumes the NFSv3 methodology of LOOKUP, CREATE, and
   ACCESS.  However, because many operations require a filehandle, the
   traditional LOOKUP is preserved to map a filename to a filehandle
   without establishing state on the server.  The policy of granting
   access or modifying files is managed by the server based on the
   client's state.  These mechanisms can implement policy ranging from
   advisory-only locking to full mandatory locking.

9.1.  Opens and Byte-Range Locks

   It is assumed that manipulating a byte-range lock is rare when
   compared to READ and WRITE operations.  It is also assumed that
   server restarts and network partitions are relatively rare.
   Therefore, it is important that the READ and WRITE operations have a
   lightweight mechanism to indicate if they possess a held lock.  A
   byte-range lock request contains the heavyweight information required
   to establish a lock and uniquely define the owner of the lock.

   The following sections describe the transition from the heavyweight
   information to the eventual stateid used for most client and server
   locking and lease interactions.

9.1.1.  Client ID

   For each LOCK request, the client must identify itself to the server.
   This is done in such a way as to allow for correct lock
   identification and crash recovery.  A sequence of a SETCLIENTID
   operation followed by a SETCLIENTID_CONFIRM operation is required to
   establish the identification on the server.  Establishment of
   identification by a new incarnation of the client also has the effect
   of immediately breaking any leased state that a previous incarnation
   of the client might have had on the server, as opposed to forcing the
   new client incarnation to wait for the leases to expire.  Breaking
   the lease state amounts to the server removing all lock, share
   reservation, and, where the server is not supporting the
   CLAIM_DELEGATE_PREV claim type, all delegation state associated with
   the same client with the same identity.  For a discussion of
   delegation state recovery, see Section 10.2.1.

   Owners of opens and owners of byte-range locks are separate entities
   and remain separate even if the same opaque arrays are used to
   designate owners of each.  The protocol distinguishes between
   open-owners (represented by open_owner4 structures) and lock-owners
   (represented by lock_owner4 structures).

   Both sorts of owners consist of a clientid and an opaque owner
   string.  For each client, the set of distinct owner values used with
   that client constitutes the set of owners of that type, for the given
   client.

   Each open is associated with a specific open-owner, while each
   byte-range lock is associated with a lock-owner and an open-owner,
   the latter being the open-owner associated with the open file under
   which the LOCK operation was done.

   Client identification is encapsulated in the following structure:

   struct nfs_client_id4 {
           verifier4       verifier;
           opaque          id<NFS4_OPAQUE_LIMIT>;
   };

   The first field, verifier, is a client incarnation verifier that is
   used to detect client reboots.  Only if the verifier is different
   from that which the server has previously recorded for the client (as
   identified by the second field of the structure, id) does the server
   start the process of canceling the client's leased state.

   The second field, id, is a variable-length string that uniquely
   defines the client.

   There are several considerations for how the client generates the id
   string:

   o  The string should be unique so that multiple clients do not
      present the same string.  The consequences of two clients
      presenting the same string range from one client getting an error
      to one client having its leased state abruptly and unexpectedly
      canceled.

   o  The string should be selected so the subsequent incarnations
      (e.g., reboots) of the same client cause the client to present the
      same string.  The implementer is cautioned against an approach
      that requires the string to be recorded in a local file because
      this precludes the use of the implementation in an environment
      where there is no local disk and all file access is from an NFSv4
      server.

   o  The string should be different for each server network address
      that the client accesses, rather than common to all server network
      addresses.  The reason is that it may not be possible for the
      client to tell if the same server is listening on multiple network
      addresses.  If the client issues SETCLIENTID with the same id
      string to each network address of such a server, the server will
      think it is the same client, and each successive SETCLIENTID will
      cause the server to begin the process of removing the client's
      previous leased state.

   o  The algorithm for generating the string should not assume that the
      client's network address won't change.  This includes changes
      between client incarnations and even changes while the client is
      still running in its current incarnation.  This means that if the
      client includes just the client's and server's network address in
      the id string, there is a real risk, after the client gives up the
      network address, that another client, using a similar algorithm
      for generating the id string, will generate a conflicting id
      string.

   Given the above considerations, an example of a well-generated id
   string is one that includes:

   o  The server's network address.

   o  The client's network address.

   o  For a user-level NFSv4 client, it should contain additional
      information to distinguish the client from other user-level
      clients running on the same host, such as a universally unique
      identifier (UUID).

   o  Additional information that tends to be unique, such as one or
      more of:

      *  The client machine's serial number (for privacy reasons, it is
         best to perform some one-way function on the serial number).

      *  A MAC address (for privacy reasons, it is best to perform some
         one-way function on the MAC address).

      *  The timestamp of when the NFSv4 software was first installed on
         the client (though this is subject to the previously mentioned
         caution about using information that is stored in a file,
         because the file might only be accessible over NFSv4).

      *  A true random number.  However, since this number ought to be
         the same between client incarnations, this shares the same
         problem as that of using the timestamp of the software
         installation.
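
   The considerations above can be illustrated with a minimal sketch.
   It assumes a hypothetical helper, build_client_id_string, that joins
   the server address, the client address, and a one-way hash of a MAC
   address.  The "/" separator, the choice of SHA-256, and the 16-digit
   truncation are illustrative assumptions, not requirements of the
   protocol.  The 1024-byte limit reflects the value of
   NFS4_OPAQUE_LIMIT in the protocol's XDR description.

```python
import hashlib

NFS4_OPAQUE_LIMIT = 1024  # limit on the id field, per the protocol's XDR


def one_way(value: str) -> str:
    """Hash a hardware identifier so the raw value is never sent."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def build_client_id_string(server_addr, client_addr, mac_address,
                           instance_uuid=None):
    """Assemble an id string from the components listed above.

    The layout is hypothetical; only the choice of inputs follows the text.
    """
    parts = [server_addr, client_addr, one_way(mac_address)]
    if instance_uuid is not None:
        # Only needed to tell user-level clients on one host apart.
        parts.append(str(instance_uuid))
    id_string = "/".join(parts).encode("utf-8")
    if len(id_string) > NFS4_OPAQUE_LIMIT:
        raise ValueError("id string exceeds NFS4_OPAQUE_LIMIT")
    return id_string
```

   Because the server address is an input, the same client presents a
   different string to each server network address, while repeated
   calls with the same inputs (for example, after a reboot) yield the
   same string.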

   As a security measure, the server MUST NOT cancel a client's leased
   state if the principal that established the state for a given id
   string is not the same as the principal issuing the SETCLIENTID.
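
   As a rough illustration of how the verifier, the id string, and the
   principal interact, the following sketch classifies an incoming
   SETCLIENTID against the server's recorded information.  The
   ClientRecord type, the function name, and the returned labels are
   hypothetical.  The sketch does not model SETCLIENTID_CONFIRM or the
   specific errors a server would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientRecord:
    id_string: bytes   # the "id" field of nfs_client_id4
    verifier: bytes    # the "verifier" field of nfs_client_id4
    principal: str     # principal that issued SETCLIENTID


def setclientid_action(recorded, incoming):
    """Return a label describing how the server might treat the request."""
    if recorded is None or recorded.id_string != incoming.id_string:
        return "new-client"
    if recorded.principal != incoming.principal:
        # Leased state MUST NOT be canceled on behalf of another principal.
        return "principal-mismatch"
    if recorded.verifier != incoming.verifier:
        # Same id string, new incarnation: start canceling leased state.
        return "cancel-state"
    # Same incarnation, e.g., an update of callback information.
    return "keep-state"
```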

   Note that SETCLIENTID (Section 16.33) and SETCLIENTID_CONFIRM
   (Section 16.34) have a secondary purpose of establishing the
   information the server needs to make callbacks to the client for the
   purpose of supporting delegations.  It is permitted to change this
   information via SETCLIENTID and SETCLIENTID_CONFIRM within the same
   incarnation of the client without removing the client's leased state.

   Once a SETCLIENTID and SETCLIENTID_CONFIRM sequence has successfully
   completed, the client uses the shorthand client identifier, of type
   clientid4, instead of the longer and less compact nfs_client_id4
   structure.  This shorthand client identifier (a client ID) is
   assigned by the server and should be chosen so that it will not
   conflict with a client ID previously assigned by the server.  This
   applies across server restarts or reboots.  When a client ID is
   presented to a server and that client ID is not recognized, as would
   happen after a server reboot, the server will reject the request with
   the error NFS4ERR_STALE_CLIENTID.  When this happens, the client must
   obtain a new client ID by use of the SETCLIENTID operation and then
   proceed to any other necessary recovery for the server reboot case
   (see Section 9.6.2).

   The client must also employ the SETCLIENTID operation when it
   receives an NFS4ERR_STALE_STATEID error using a stateid derived from
   its current client ID, since this also indicates a server reboot,
   which has invalidated the existing client ID (see Section 9.6.2 for
   details).

   See the detailed descriptions of SETCLIENTID (Section 16.33.4) and
   SETCLIENTID_CONFIRM (Section 16.34.4) for a complete specification of
   the operations.

9.1.2.  Server Release of Client ID

   If the server determines that the client holds no associated state
   for its client ID, the server may choose to release the client ID.
   The server may make this choice for an inactive client so that
   resources are not consumed by intermittently active clients.
   If the client contacts the server after this release, the server must
   ensure that the client receives the appropriate error so that it will
   use the SETCLIENTID/SETCLIENTID_CONFIRM sequence to establish a new
   identity.  It should be clear that the server must be very hesitant
   to release a client ID since the resulting work on the client to
   recover from such an event will be the same burden as if the server
   had failed and restarted.  Typically, a server would not release a
   client ID unless there had been no activity from that client for many
   minutes.

   Note that if the id string in a SETCLIENTID request is properly
   constructed, and if the client takes care to use the same principal
   for each successive use of SETCLIENTID, then, barring an active
   denial-of-service attack, NFS4ERR_CLID_INUSE should never be
   returned.

   However, client bugs, server bugs, or perhaps a deliberate change of
   the principal owner of the id string (such as the case of a client
   that changes security flavors, and under the new flavor there is no
   mapping to the previous owner) will in rare cases result in
   NFS4ERR_CLID_INUSE.

   In that event, when the server gets a SETCLIENTID for a client ID
   that currently has no state, or it has state but the lease has
   expired, rather than returning NFS4ERR_CLID_INUSE, the server MUST
   allow the SETCLIENTID and confirm the new client ID if followed by
   the appropriate SETCLIENTID_CONFIRM.

9.1.3.  Use of Seqids

   In several contexts, 32-bit sequence values called "seqids" are used
   as part of managing locking state.  Such values are used:

   o  To provide an ordering of locking-related operations associated
      with a particular lock-owner or open-owner.  See Section 9.1.7 for
      a detailed explanation.

   o  To define an ordered set of instances of a set of locks sharing a
      particular set of ownership characteristics.  See Section 9.1.4.2
      for a detailed explanation.

   Successive seqid values for the same object are normally arrived at
   by incrementing the current value by one.  This pattern continues
   until the seqid is incremented past NFS4_UINT32_MAX, in which case
   one (rather than zero) is to be the next seqid value.
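
   The increment rule can be sketched as follows.  The function name
   next_seqid is an illustrative assumption.

```python
NFS4_UINT32_MAX = 0xFFFFFFFF


def next_seqid(seqid: int) -> int:
    """Return the seqid that follows the given one, skipping zero on wrap."""
    if seqid == NFS4_UINT32_MAX:
        return 1
    return seqid + 1
```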

   When two seqid values are to be compared to determine which of the
   two is later, the possibility of wraparound needs to be considered.
   In many cases, the values are such that simple numeric comparisons
   can be used.  For example, if the seqid values to be compared are
   both less than one million, the higher value can be considered the
   later.  On the other hand, if one of the values is at or near
   NFS4_UINT32_MAX and the other is less than one million, then
   implementations can reasonably decide that the lower value has had
   one more wraparound and is thus, while numerically lower, actually
   later.

   Implementations can compare seqids in the presence of potential
   wraparound by adopting the reasonable assumption that the chain of
   increments from one to the other is shorter than 2**31.  So, if the
   difference between the two seqids is less than 2**31, then the lower
   seqid is to be treated as earlier.  If, however, the difference
   between the two seqids is greater than or equal to 2**31, then it can
   be assumed that the lower seqid has encountered one more wraparound
   and can be treated as later.
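
   A minimal sketch of this comparison, assuming both values are valid
   32-bit seqids, might look like the following.  The function name
   seqid_is_later is hypothetical.

```python
HALF_RANGE = 2 ** 31


def seqid_is_later(a: int, b: int) -> bool:
    """Return True if seqid a is to be treated as later than seqid b."""
    if a == b:
        return False
    low, high = min(a, b), max(a, b)
    if high - low < HALF_RANGE:
        later = high   # no wraparound assumed: higher value is later
    else:
        later = low    # lower value has wrapped around once more
    return a == later
```

   For example, a seqid of 10 compared against 0xFFFFFFF0 differs by
   more than 2**31, so 10 is treated as the later of the two.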

9.1.4.  Stateid Definition

   When the server grants a lock of any type (including opens,
   byte-range locks, and delegations), it responds with a unique stateid
   that represents a set of locks (often a single lock) for the same
   file, of the same type, and sharing the same ownership
   characteristics.  Thus, opens of the same file by different
   open-owners each have an identifying stateid.  Similarly, each set of
   byte-range locks on a file owned by a specific lock-owner has its own
   identifying stateid.  Delegations also have associated stateids by
   which they may be referenced.  The stateid is used as a shorthand
   reference to a lock or set of locks, and given a stateid, the server
   can determine the associated state-owner or state-owners (in the case
   of an open-owner/lock-owner pair) and the associated filehandle.
   When stateids are used, the current filehandle must be the one
   associated with that stateid.

   All stateids associated with a given client ID are associated with a
   common lease that represents the claim of those stateids and the
   objects they represent to be maintained by the server.  See
   Section 9.5 for a discussion of the lease.

   Each stateid must be unique to the server.  Many operations take a
   stateid as an argument but not a clientid, so the server must be able
   to infer the client from the stateid.

9.1.4.1.  Stateid Types

   With the exception of special stateids (see Section 9.1.4.3), each
   stateid represents locking objects of one of a set of types defined
   by the NFSv4 protocol.  Note that in all these cases, where we speak
   of a guarantee, it is understood there are situations such as a
   client restart, or lock revocation, that allow the guarantee to be
   voided.

   o  Stateids may represent opens of files.

      Each stateid in this case represents the OPEN state for a given
      client ID/open-owner/filehandle triple.  Such stateids are subject
      to change (with consequent incrementing of the stateid's seqid) in
      response to OPENs that result in upgrade and OPEN_DOWNGRADE
      operations.

   o  Stateids may represent sets of byte-range locks.

      All locks held on a particular file by a particular owner and
      acquired under a particular open file are associated with a
      single stateid, with the seqid being incremented whenever LOCK
      and LOCKU operations affect that set of locks.

   o  Stateids may represent file delegations, which are recallable
      guarantees by the server to the client that other clients will not
      reference, or will not modify, a particular file until the
      delegation is returned.

      A stateid represents a single delegation held by a client for a
      particular filehandle.

9.1.4.2.  Stateid Structure

   Stateids are divided into two fields: a 96-bit "other" field
   identifying the specific set of locks and a 32-bit "seqid" sequence
   value.  Except in the case of special stateids (see Section 9.1.4.3),
   a particular value of the "other" field denotes a set of locks of the
   same type (for example, byte-range locks, opens, or delegations), for
   a specific file or directory, and sharing the same ownership
   characteristics.  The seqid designates a specific instance of such a
   set of locks, and is incremented to indicate changes in such a set of
   locks, by either the addition or deletion of locks from the set, a
   change in the byte-range they apply to, or an upgrade or downgrade in
   the type of one or more locks.

   When such a set of locks is first created, the server returns a
   stateid with a seqid value of one.  On subsequent operations that
   modify the set of locks, the server is required to advance the
   seqid field by one whenever it returns a stateid for the same
   state-owner/file/type combination and the operation is one that might
   make some change in the set of locks actually designated.  In this
   case, the server will return a stateid with an "other" field the same
   as previously used for that state-owner/file/type combination, with
   an incremented seqid field.
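
   The two-field layout can be illustrated with a short sketch that
   encodes and decodes a 128-bit stateid and advances its seqid.  It
   assumes the wire order used by the protocol's XDR stateid4 type (a
   big-endian 32-bit seqid followed by the 12-byte "other" field), which
   is not shown in this section.  The Stateid type and the function
   names are hypothetical.

```python
import struct
from typing import NamedTuple

NFS4_OTHER_SIZE = 12          # 96 bits
NFS4_UINT32_MAX = 0xFFFFFFFF


class Stateid(NamedTuple):
    seqid: int     # 32-bit sequence value
    other: bytes   # 96-bit identifier of the set of locks


def encode_stateid(s: Stateid) -> bytes:
    if len(s.other) != NFS4_OTHER_SIZE:
        raise ValueError("'other' must be exactly 12 bytes")
    return struct.pack(">I", s.seqid) + s.other


def decode_stateid(data: bytes) -> Stateid:
    if len(data) != 4 + NFS4_OTHER_SIZE:
        raise ValueError("a stateid is exactly 16 bytes")
    (seqid,) = struct.unpack(">I", data[:4])
    return Stateid(seqid, data[4:])


def advance(s: Stateid) -> Stateid:
    """Same 'other' field, seqid incremented (wrapping to one, not zero)."""
    seqid = 1 if s.seqid == NFS4_UINT32_MAX else s.seqid + 1
    return Stateid(seqid, s.other)
```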

   Seqids are compared by both the client and the server.  The
   client uses such comparisons to determine the order of operations,
   while the server uses them to determine whether the
   NFS4ERR_OLD_STATEID error is to be returned.  In all cases, the
   possibility of seqid wraparound needs to be taken into account, as
   discussed in Section 9.1.3.

9.1.4.3.  Special Stateids

   Stateid values whose "other" field is either all zeros or all ones
   are reserved.  They MUST NOT be assigned by the server but have
   special meanings defined by the protocol.  The particular meaning
   depends on whether the "other" field is all zeros or all ones and the
   specific value of the seqid field.

   The following combinations of "other" and seqid are defined in NFSv4:

   Anonymous Stateid:  When "other" and seqid are both zero, the stateid
      is treated as a special anonymous stateid, which can be used in
      READ, WRITE, and SETATTR requests to indicate the absence of any
      open state associated with the request.  When an anonymous stateid
      value is used, and an existing open denies the form of access
      requested, then access will be denied to the request.

   READ Bypass Stateid:  When "other" and seqid are both all ones, the
      stateid is a special READ bypass stateid.  When this value is used
      in WRITE or SETATTR, it is treated like the anonymous value.  When
      used in READ, the server MAY grant access, even if access would
      normally be denied to READ requests.

   If a stateid value is used that has all zeros or all ones in the
   "other" field but does not match one of the cases above, the server
   MUST return the error NFS4ERR_BAD_STATEID.

   Special stateids, unlike other stateids, are not associated with
   individual client IDs or filehandles and can be used with all valid
   client IDs and filehandles.
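
   The combinations above can be sketched as a small classifier.  The
   function name and the returned labels are illustrative assumptions.
   A seqid that is "all ones" is taken to mean NFS4_UINT32_MAX.

```python
ALL_ZEROS = bytes(12)
ALL_ONES = b"\xff" * 12
NFS4_UINT32_MAX = 0xFFFFFFFF


def classify_stateid(seqid: int, other: bytes) -> str:
    """Classify a stateid as special, ordinary, or invalid."""
    if other == ALL_ZEROS:
        # Only seqid zero is defined with an all-zeros "other" field.
        return "anonymous" if seqid == 0 else "NFS4ERR_BAD_STATEID"
    if other == ALL_ONES:
        # Only an all-ones seqid is defined with an all-ones "other" field.
        if seqid == NFS4_UINT32_MAX:
            return "read-bypass"
        return "NFS4ERR_BAD_STATEID"
    return "ordinary"
```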

9.1.4.4.  Stateid Lifetime and Validation

   Stateids must remain valid until either a client restart or a server
   restart, or until the client returns all of the locks associated with
   the stateid by means of an operation such as CLOSE or DELEGRETURN.
   If the locks are lost due to revocation, as long as the client ID is
   valid, the stateid remains a valid designation of that revoked state.
   Stateids associated with byte-range locks are an exception.  They
   remain valid even if a LOCKU frees all remaining locks, so long as
   the open file with which they are associated remains open.

   It should be noted that there are situations in which the client's
   locks become invalid, without the client requesting they be returned.
   These include lease expiration and a number of forms of lock
   revocation within the lease period.  It is important to note that in
   these situations, the stateid remains valid and the client can use it
   to determine the disposition of the associated lost locks.

   An "other" value must never be reused for a different purpose (i.e.,
   different filehandle, owner, or type of locks) within the context of
   a single client ID.  A server may retain the "other" value for the
   same purpose beyond the point where it may otherwise be freed, but if
   it does so, it must maintain seqid continuity with previous values.

   One mechanism that may be used to satisfy the requirement that the
   server recognize invalid and out-of-date stateids is for the server
   to divide the "other" field of the stateid into two fields:

   o  An index into a table of locking-state structures.

   o  A generation number that is incremented on each allocation of a
      table entry for a particular use.

   The server would then store the following in each table entry:

   o  The client ID with which the stateid is associated.

   o  The current generation number for the (at most one) valid stateid
      sharing this index value.

   o  The filehandle of the file on which the locks are taken.

   o  An indication of the type of stateid (open, byte-range lock, file
      delegation).

   o  The last seqid value returned corresponding to the current "other"
      value.

   o  An indication of the current status of the locks associated with
      this stateid -- in particular, whether these have been revoked
      and, if so, for what reason.

   With this information, an incoming stateid can be validated and the
   appropriate error returned when necessary.  Special and non-special
   stateids are handled separately.  (See Section 9.1.4.3 for a
   discussion of special stateids.)

   When a stateid is being tested, and the "other" field is all zeros or
   all ones, a check that the "other" and seqid fields match a defined
   combination for a special stateid is done and the results determined
   as follows:

   o  If the "other" and seqid fields do not match a defined combination
      associated with a special stateid, the error NFS4ERR_BAD_STATEID
      is returned.

   o  If the combination is valid in general but is not appropriate to
      the context in which the stateid is used (e.g., an all-zero
      stateid is used when an open stateid is required in a LOCK
      operation), the error NFS4ERR_BAD_STATEID is also returned.

   o  Otherwise, the check is completed and the special stateid is
      accepted as valid.

   When a stateid is being tested, and the "other" field is neither all
   zeros nor all ones, the following procedure could be used to validate
   an incoming stateid and return an appropriate error, when necessary,
   assuming that the "other" field would be divided into a table index
   and an entry generation.  Note that the terms "earlier" and "later"
   used in connection with seqid comparison are to be understood as
   explained in Section 9.1.3.

   o  If the table index field is outside the range of the associated
      table, return NFS4ERR_BAD_STATEID.

   o  If the selected table entry is of a different generation than that
      specified in the incoming stateid, return NFS4ERR_BAD_STATEID.

   o  If the selected table entry does not match the current filehandle,
      return NFS4ERR_BAD_STATEID.

   o  If the stateid represents revoked state or state lost as a result
      of lease expiration, then return NFS4ERR_EXPIRED,
      NFS4ERR_BAD_STATEID, or NFS4ERR_ADMIN_REVOKED, as appropriate.

   o  If the stateid type is not valid for the context in which the
      stateid appears, return NFS4ERR_BAD_STATEID.  Note that a stateid
      may be valid in general but invalid for a particular operation,
      as, for example, when a stateid that doesn't represent byte-range
      locks is passed to the non-from_open case of LOCK or to LOCKU, or
      when a stateid that does not represent an open is passed to CLOSE
      or OPEN_DOWNGRADE.  In such cases, the server MUST return
      NFS4ERR_BAD_STATEID.

   o  If the seqid field is not zero and it is later than the current
      sequence value corresponding to the current "other" field, return
      NFS4ERR_BAD_STATEID.

   o  If the seqid field is earlier than the current sequence value
      corresponding to the current "other" field, return
      NFS4ERR_OLD_STATEID.

   o  Otherwise, the stateid is valid, and the table entry should
      contain any additional information about the type of stateid and
      information associated with that particular type of stateid, such
      as the associated set of locks (e.g., open-owner and lock-owner
      information), as well as information on the specific locks
      themselves, such as open modes and byte ranges.
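
   The procedure for non-special stateids can be sketched as follows,
   assuming the table-index and generation layout described above.  The
   Entry type, the validate function, its parameters, and the use of
   None for a free table slot are all hypothetical.  The checks are
   applied in the order in which the text lists them.

```python
from dataclasses import dataclass
from typing import Optional

BAD = "NFS4ERR_BAD_STATEID"
OLD = "NFS4ERR_OLD_STATEID"


@dataclass
class Entry:
    client_id: int
    generation: int
    filehandle: bytes
    kind: str                             # "open", "lock", or "delegation"
    last_seqid: int
    revoked_error: Optional[str] = None   # e.g., "NFS4ERR_EXPIRED"


def _later(a: int, b: int) -> bool:
    """Wraparound-aware 'a is later than b' (see Section 9.1.3)."""
    if a == b:
        return False
    low, high = min(a, b), max(a, b)
    return a == (high if high - low < 2 ** 31 else low)


def validate(table, index, generation, seqid, current_fh, allowed_kinds):
    """Return "NFS4_OK" or the name of the error to be returned."""
    if not 0 <= index < len(table):
        return BAD
    entry = table[index]
    if entry is None or entry.generation != generation:
        return BAD
    if entry.filehandle != current_fh:
        return BAD
    if entry.revoked_error is not None:
        return entry.revoked_error
    if entry.kind not in allowed_kinds:
        return BAD
    if seqid != 0 and _later(seqid, entry.last_seqid):
        return BAD
    if _later(entry.last_seqid, seqid):
        return OLD
    return "NFS4_OK"
```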

9.1.4.5.  Stateid Use for I/O Operations

   Clients performing Input/Output (I/O) operations need to select an
   appropriate stateid based on the locks (including opens and
   delegations) held by the client and the various types of state-owners
   sending the I/O requests.  SETATTR operations that change the file
   size are treated like I/O operations in this regard.

   The following rules, applied in order of decreasing priority, govern
   the selection of the appropriate stateid.  In following these rules,
   the client will only consider locks of which it has actually received
   notification by an appropriate operation response or callback.

   o  If the client holds a delegation for the file in question, the
      delegation stateid SHOULD be used.

   o  Otherwise, if the entity corresponding to the lock-owner (e.g., a
      process) sending the I/O has a byte-range lock stateid for the
      associated open file, then the byte-range lock stateid for that
      lock-owner and open file SHOULD be used.

   o  If there is no byte-range lock stateid, then the OPEN stateid for
      the current open-owner, i.e., the OPEN stateid for the open file
      in question, SHOULD be used.

   o  Finally, if none of the above apply, then a special stateid SHOULD
      be used.
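
   The priority order above can be sketched as a simple selection
   function.  The function name, its parameters, and the representation
   of a stateid as a (seqid, other) tuple are illustrative assumptions.
   The anonymous stateid stands in for "a special stateid" in the final
   rule.

```python
ANONYMOUS_STATEID = (0, bytes(12))


def select_io_stateid(delegation=None, lock_stateid=None, open_stateid=None):
    """Pick the stateid for an I/O request, highest priority first.

    Each argument is a stateid the client has actually been notified
    of, or None if the client holds no such state.
    """
    for candidate in (delegation, lock_stateid, open_stateid):
        if candidate is not None:
            return candidate
    return ANONYMOUS_STATEID
```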

   Ignoring these rules may result in situations in which the server
   does not have information necessary to properly process the request.
   For example, when mandatory byte-range locks are in effect, if the
   stateid does not indicate the proper lock-owner, via a lock stateid,
   a request might be avoidably rejected.

   The server, however, should not try to enforce these ordering rules
   and should use whatever information is available to properly process
   I/O requests.  In particular, when a client has a delegation for a
   given file, the server SHOULD take note of this fact in processing a
   request, even if it is sent with a special stateid.

9.1.4.6.  Stateid Use for SETATTR Operations

   In the case of SETATTR operations, a stateid is present.  In cases
   other than those that set the file size, the client may send either a
   special stateid or, when a delegation is held for the file in
   question, a delegation stateid.  While the server SHOULD validate the
   stateid and may use the stateid to optimize the determination as to
   whether a delegation is held, it SHOULD note the presence of a
   delegation even when a special stateid is sent, and MUST accept a
   valid delegation stateid when sent.

9.1.5.  Lock-Owner

   When requesting a lock, the client must present to the server the
   client ID and an identifier for the owner of the requested lock.
   These two fields comprise the lock-owner and are defined as follows:

   o  A client ID returned by the server as part of the client's use of
      the SETCLIENTID operation.

   o  A variable-length opaque array used to uniquely define the owner
      of a lock managed by the client.

      This may be a thread id, process id, or other unique value.

   When the server grants the lock, it responds with a unique stateid.
   The stateid is used as a shorthand reference to the lock-owner, since
   the server will be maintaining the correspondence between them.

9.1.6.  Use of the Stateid and Locking

   All READ, WRITE, and SETATTR operations contain a stateid.  For the
   purposes of this section, SETATTR operations that change the size
   attribute of a file are treated as if they are writing the area
   between the old and new size (i.e., the range truncated or added to
   the file by means of the SETATTR), even where SETATTR is not
   explicitly mentioned in the text.  The stateid passed to one of these
   operations must be one that represents an OPEN (e.g., via the
   open-owner), a set of byte-range locks, or a delegation, or it may be
   a special stateid representing anonymous access or the READ bypass
   stateid.

   If the state-owner performs a READ or WRITE in a situation in which
   it has established a lock or share reservation on the server (any
   OPEN constitutes a share reservation), the stateid (previously
   returned by the server) must be used to indicate what locks,
   including both byte-range locks and share reservations, are held by
   the state-owner.  If no state is established by the client -- either
   byte-range lock or share reservation -- the anonymous stateid is
   used.  Regardless of whether an anonymous stateid or a stateid
   returned by the server is used, if there is a conflicting share
   reservation or mandatory byte-range lock held on the file, the server
   MUST refuse to service the READ or WRITE operation.

   Share reservations are established by OPEN operations and by their
   nature are mandatory in that when the OPEN denies READ or WRITE
   operations, that denial results in such operations being rejected
   with error NFS4ERR_LOCKED.  Byte-range locks may be implemented by
   the server as either mandatory or advisory, or the choice of
   mandatory or advisory behavior may be determined by the server on the
   basis of the file being accessed (for example, some UNIX-based
   servers support a "mandatory lock bit" on the mode attribute such
   that if set, byte-range locks are required on the file before I/O is
   possible).  When byte-range locks are advisory, they only prevent the
   granting of conflicting lock requests and have no effect on READs or
   WRITEs.  Mandatory byte-range locks, however, prevent conflicting I/O
   operations; when such operations are attempted, they are rejected
   with NFS4ERR_LOCKED.  When the client gets NFS4ERR_LOCKED on a file
   for which it knows it holds the proper share reservation, it will
   need to issue a LOCK request on a region of the file that includes
   the region on which the I/O was to be performed, with an appropriate
   locktype (i.e., READ*_LT for a READ operation, WRITE*_LT for a WRITE
   operation).

   With NFSv3, there was no notion of a stateid, so there was no way to
   tell if the application process of the client sending the READ or
   WRITE operation had also acquired the appropriate byte-range lock on
   the file.  Thus, there was no way to implement mandatory locking.
   With the stateid construct, this barrier has been removed.

   Note that for UNIX environments that support mandatory file locking,
   the distinction between advisory and mandatory locking is subtle.  In
   fact, advisory and mandatory byte-range locks are exactly the same
   insofar as the APIs and requirements on implementation are concerned.
   If the mandatory lock attribute is set on the file, the server checks
   to see if the lock-owner has an appropriate shared (read) or
   exclusive (write) byte-range lock on the region it wishes to read or
   write to.  If there is no appropriate lock, the server checks if
   there is a conflicting lock (which can be done by attempting to
   acquire the conflicting lock on behalf of the lock-owner and, if
   successful, release the lock after the READ or WRITE is done), and if
   there is, the server returns NFS4ERR_LOCKED.
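
   A minimal sketch of such a server-side check is shown below.  It is
   an assumption-laden simplification: the Lock tuple, the overlaps and
   check_io helpers, and the use of finite lengths (no "to end of file"
   ranges) are all hypothetical.  Because a lock held by the requesting
   owner excludes conflicting locks by others, the sketch only looks for
   conflicting locks held by other owners.

```python
from collections import namedtuple

# Hypothetical in-memory record of one byte-range lock.
Lock = namedtuple("Lock", "owner exclusive offset length")


def overlaps(a_off, a_len, b_off, b_len):
    """True if the two half-open byte ranges intersect."""
    return a_off < b_off + b_len and b_off < a_off + a_len


def check_io(op, owner, offset, length, locks, mandatory):
    """Return the status a server might give a READ or WRITE."""
    if not mandatory:
        return "NFS4_OK"  # advisory locks never affect READ or WRITE
    for lk in locks:
        if lk.owner == owner:
            continue  # an owner's own locks do not conflict with its I/O
        if not overlaps(offset, length, lk.offset, lk.length):
            continue
        # Another owner's exclusive lock conflicts with any I/O;
        # another owner's shared lock conflicts only with a WRITE.
        if lk.exclusive or op == "WRITE":
            return "NFS4ERR_LOCKED"
    return "NFS4_OK"
```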

   For Windows environments, there are no advisory byte-range locks, so
   the server always checks for byte-range locks during I/O requests.

   Thus, the NFSv4 LOCK operation does not need to distinguish between
   advisory and mandatory byte-range locks.  It is the NFSv4 server's
   processing of the READ and WRITE operations that introduces the
   distinction.

   Every stateid other than the special stateid values noted in this
   section, whether returned by an OPEN-type operation (i.e., OPEN,
   OPEN_DOWNGRADE) or by a LOCK-type operation (i.e., LOCK or LOCKU),
   defines an access mode for the file (i.e., READ, WRITE, or
   READ-WRITE) as established by the original OPEN that began the
   stateid sequence, and as modified by subsequent OPENs and
   OPEN_DOWNGRADEs within that stateid sequence.  When a READ, WRITE, or
   SETATTR that specifies the size attribute is done, the operation is
   subject to checking against the access mode to verify that the
   operation is appropriate given the OPEN with which the operation is
   associated.

   In the case of WRITE-type operations (i.e., WRITEs and SETATTRs that
   set size), the server must verify that the access mode allows writing
   and return an NFS4ERR_OPENMODE error if it does not.  In the case of
   READ, the server may perform the corresponding check on the access
   mode, or it may choose to allow READ on opens for WRITE only, to
   accommodate clients whose write implementation may unavoidably do
   reads (e.g., due to buffer cache constraints).  However, even if
   READs are allowed in these circumstances, the server MUST still check
   for locks that conflict with the READ (e.g., another open specifying
   denial of READs).  Note that a server that does enforce the access
   mode check on READs need not explicitly check for conflicting share
   reservations since the existence of OPEN for read access guarantees
   that no conflicting share reservation can exist.
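
   The access mode check can be sketched as follows.  The function name
   check_access_mode, the "SETATTR_SIZE" label, and the
   allow_read_on_write_only switch are hypothetical; the two bit values
   are those of the OPEN4_SHARE_ACCESS_* constants.  The sketch covers
   only the access mode check and does not replace the check for
   conflicting locks that the server MUST still perform.

```python
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2


def check_access_mode(op, access, allow_read_on_write_only=False):
    """Check an operation against the access mode of its stateid."""
    can_read = bool(access & OPEN4_SHARE_ACCESS_READ)
    can_write = bool(access & OPEN4_SHARE_ACCESS_WRITE)
    if op in ("WRITE", "SETATTR_SIZE"):
        # WRITE-type operations must always be checked.
        ok = can_write
    elif op == "READ":
        # A server may tolerate READs on write-only opens.
        ok = can_read or (allow_read_on_write_only and can_write)
    else:
        raise ValueError("unsupported operation: %r" % (op,))
    return "NFS4_OK" if ok else "NFS4ERR_OPENMODE"
```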

   A READ bypass stateid MAY allow READ operations to bypass locking
   checks at the server.  However, WRITE operations with a READ bypass
   stateid MUST NOT bypass locking checks and are treated exactly the
   same as if an anonymous stateid were used.

   A lock may not be granted while a READ or WRITE operation using one
   of the special stateids is being performed and the range of the lock
   request conflicts with the range of the READ or WRITE operation.  For
   the purposes of this paragraph, a conflict occurs when a shared lock
   is requested and a WRITE operation is being performed, or an
   exclusive lock is requested and either a READ or a WRITE operation is
   being performed.  A SETATTR that sets size is treated similarly to a
   WRITE as discussed above.
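
   The conflict rule of the preceding paragraph might be expressed as
   the predicate below.  The function name and argument layout are
   hypothetical, and ranges are assumed to be finite.

```python
def lock_conflicts_with_io(exclusive, lock_off, lock_len,
                           io_op, io_off, io_len):
    """True if a lock request conflicts with an in-progress READ, WRITE,
    or size-setting SETATTR that uses one of the special stateids."""
    overlap = lock_off < io_off + io_len and io_off < lock_off + lock_len
    if not overlap:
        return False
    if exclusive:
        return True  # an exclusive lock conflicts with READ and WRITE
    # A shared lock conflicts only with WRITE-type operations.
    return io_op != "READ"
```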

9.1.7.  Sequencing of Lock Requests

   Locking differs from most NFS operations in that it requires
   "at-most-once" semantics that are not provided by ONC RPC.  ONC RPC
   over a reliable transport is not sufficient because a sequence of
   locking requests may span multiple TCP connections.  In the face of
   retransmission or reordering, lock or unlock requests must have a
   well-defined and consistent behavior.  To accomplish this, each lock
   request contains a sequence number that is a consecutively increasing
   integer.  Different state-owners have different sequences.  The
   server maintains the last sequence number (L) received and the
   response that was returned.  The server SHOULD assign a seqid value
   of one for the first request issued for any given state-owner.
   Subsequent values are arrived at by incrementing the seqid value,
   subject to wraparound as described in Section 9.1.3.

   Note that for requests that contain a sequence number, for each
   state-owner, there should be no more than one outstanding request.

   When a request is received, its sequence number (r) is compared to
   that of the last one received (L).  Only if it has the correct next
   sequence, normally L + 1, is the request processed beyond the point
   of seqid checking.  If a duplicate of the last request (r == L) is
   received, the stored response is returned.  If the sequence value
   received is any other value, it is rejected with the return of error
   NFS4ERR_BAD_SEQID.  In particular, given a properly functioning
   client, the response to any earlier request (r < L) must have been
   received before the last request (L) was sent, so an earlier
   sequence value cannot be a needed retransmission.  Sequence history
   is reinitialized whenever the SETCLIENTID/SETCLIENTID_CONFIRM
   sequence changes the client verifier.
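
   The seqid check described above can be sketched as follows.  The
   class and function names are hypothetical, and the sketch assumes
   that wraparound skips the value zero (0xFFFFFFFF is followed by 1).
   It also ignores the errors for which the client does not advance the
   sequence number.

```python
def next_seqid(seqid):
    """Increment a 32-bit seqid, assuming wraparound skips zero."""
    return 1 if seqid == 0xFFFFFFFF else seqid + 1


class StateOwnerSeq:
    """Per-state-owner record of the last seqid (L) and its response."""

    def __init__(self, last_seqid, last_reply):
        self.last_seqid = last_seqid
        self.last_reply = last_reply

    def handle(self, seqid, execute):
        """Process a request with sequence number r = seqid; `execute`
        is a callable that performs the operation and returns a reply."""
        if seqid == self.last_seqid:
            return self.last_reply  # r == L: replay the cached response
        if seqid != next_seqid(self.last_seqid):
            return "NFS4ERR_BAD_SEQID"  # neither L nor the next value
        self.last_reply = execute()  # correct next sequence: run it
        self.last_seqid = seqid
        return self.last_reply
```

   Keeping the last reply inside the per-owner record, rather than in a
   least-recently-used cache, means it survives for as long as the
   record itself does.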

   It is critical that the server maintain the last response sent to the
   client to provide a more reliable cache of duplicate non-idempotent
   requests than that of the traditional cache described in [Chet].  The
   traditional duplicate request cache uses a least recently used
   algorithm for removing unneeded requests.  However, the last lock
   request and response on a given state-owner must be cached as long as
   the lock state exists on the server.

   The client MUST advance the sequence number for the CLOSE, LOCK,
   LOCKU, OPEN, OPEN_CONFIRM, and OPEN_DOWNGRADE operations.  This is
   true even in the event that the previous operation that used the
   sequence number received an error.  The only exception to this rule
   is if the previous operation received one of the following errors:
   NFS4ERR_STALE_CLIENTID, NFS4ERR_STALE_STATEID, NFS4ERR_BAD_STATEID,
   NFS4ERR_BAD_SEQID, NFS4ERR_BADXDR, NFS4ERR_RESOURCE,
   NFS4ERR_NOFILEHANDLE, or NFS4ERR_MOVED.
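
   On the client side, the rule above might look like the sketch below.
   The function name seqid_after_reply is hypothetical, status values
   are represented as strings for readability, and wraparound is assumed
   to skip zero.

```python
# Errors after which the client does NOT advance the sequence number.
NO_ADVANCE_ERRORS = frozenset({
    "NFS4ERR_STALE_CLIENTID", "NFS4ERR_STALE_STATEID",
    "NFS4ERR_BAD_STATEID", "NFS4ERR_BAD_SEQID", "NFS4ERR_BADXDR",
    "NFS4ERR_RESOURCE", "NFS4ERR_NOFILEHANDLE", "NFS4ERR_MOVED",
})


def seqid_after_reply(seqid, status):
    """Return the seqid to use for the state-owner's next request."""
    if status in NO_ADVANCE_ERRORS:
        return seqid  # the request did not consume the sequence number
    # Success and every other error advance the sequence number.
    return 1 if seqid == 0xFFFFFFFF else seqid + 1
```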

9.1.8.  Recovery from Replayed Requests

   As described above, the sequence number is per state-owner.  As long
   as the server maintains the last sequence number received and follows
   the methods described above, there is no risk of harm from a Byzantine
   router re-sending old requests.  The server need only maintain the
   (state-owner, sequence number) state as long as there are open files
   or closed files with locks outstanding.

   LOCK, LOCKU, OPEN, OPEN_DOWNGRADE, and CLOSE each contain a sequence
   number, and therefore the risk of the replay of these operations
   resulting in undesired effects is non-existent while the server
   maintains the state-owner state.

9.1.9.  Interactions of Multiple Sequence Values

   Some operations may have multiple sources of data for request
   sequence checking and retransmission determination.  Some operations
   have multiple sequence values associated with multiple types of
   state-owners.  In addition, such operations may also have a stateid
   with its own seqid value, that will be checked for validity.

   As noted above, there may be multiple sequence values to check.  The
   following rules should be followed by the server in processing these
   multiple sequence values within a single operation.

   o  When a sequence value associated with a state-owner is unavailable
      for checking because the state-owner is unknown to the server, it
      takes no part in the comparison.

   o  When any of the state-owner sequence values are invalid,
      NFS4ERR_BAD_SEQID is returned.  When a stateid sequence is
      checked, NFS4ERR_BAD_STATEID or NFS4ERR_OLD_STATEID is returned as
      appropriate, but NFS4ERR_BAD_SEQID has priority.

   o  When any one of the sequence values matches a previous request,
      for a state-owner, it is treated as a retransmission and not
      re-executed.  When the type of the operation does not match that
      originally used, NFS4ERR_BAD_SEQID is returned.  When the server
      can determine that the request differs from the original, it may
      return NFS4ERR_BAD_SEQID.

   o  When multiple sequence values match previous operations but the
      operations are not the same, NFS4ERR_BAD_SEQID is returned.

   o  When there are no sequence values available for comparison and the
      operation is an OPEN, the server indicates to the client that an
      OPEN_CONFIRM is required, unless it can conclusively determine
      that confirmation is not required (e.g., by knowing that no
      open-owner state has ever been released for the current clientid).

9.1.10.  Releasing State-Owner State

   When a particular state-owner no longer holds open or file locking
   state at the server, the server may choose to release the sequence
   number state associated with the state-owner.  The server may make
   this choice based on lease expiration, the reclamation of server
   memory, or other implementation-specific details.  Note that when
   this is done, a retransmitted request, normally identified by a
   matching state-owner sequence, may not be correctly recognized, so
   that the client will not receive the original response that it would
   have if the state-owner state had not been released.

   If the server were able to be sure that a given state-owner would
   never again be used by a client, such an issue could not arise.  Even
   when the state-owner state is released and the client subsequently
   uses that state-owner, retransmitted requests will be detected as
   invalid and the request not executed, although the client may have a
   recovery path that is more complicated than simply getting the
   original response back transparently.

   In any event, the server is able to safely release state-owner state
   (in the sense that retransmitted requests will not be erroneously
   acted upon) when the state-owner is not currently being utilized by
   the client (i.e., there are no open files associated with an
   open-owner and no lock stateids associated with a lock-owner).  The
   server may choose to hold the state-owner state in order to simplify
   the recovery path, in the case in which retransmissions of currently
   active requests are received.  However, the period for which it
   chooses to hold this state is implementation specific.

   In the case that a LOCK, LOCKU, OPEN_DOWNGRADE, or CLOSE is
   retransmitted after the server has previously released the
   state-owner state, the server will find that the state-owner has no
   files open and an error will be returned to the client.  If the
   state-owner does have a file open, the stateid will not match and
   again an error is returned to the client.

9.1.11.  Use of Open Confirmation

   In the case that an OPEN is retransmitted and the open-owner is being
   used for the first time or the open-owner state has been previously
   released by the server, the use of the OPEN_CONFIRM operation will
   prevent incorrect behavior.  When the server observes the use of the
   open-owner for the first time, it will direct the client to perform
   the OPEN_CONFIRM for the corresponding OPEN.  This sequence
   establishes the use of an open-owner and associated sequence number.
   Since the OPEN_CONFIRM sequence connects a new open-owner on the
   server with an existing open-owner on a client, the sequence number
   may have any valid (i.e., non-zero) value.  The OPEN_CONFIRM step
   assures the server that the value received is the correct one.  (See
   Section 16.18 for further details.)

   There are a number of situations in which the requirement to confirm
   an OPEN would pose difficulties for the client and server, in that
   they would be prevented from acting in a timely fashion on
   information received, because that information would be provisional,
   subject to deletion upon non-confirmation.  Fortunately, these are
   situations in which the server can avoid the need for confirmation
   when responding to open requests.  The two constraints are:

   o  The server must not bestow a delegation for any open that would
      require confirmation.

   o  The server MUST NOT require confirmation on a reclaim-type open
      (i.e., one specifying claim type CLAIM_PREVIOUS or
      CLAIM_DELEGATE_PREV).

   These constraints are related in that reclaim-type opens are the only
   ones in which the server may be required to send a delegation.  For
   CLAIM_NULL, sending the delegation is optional, while for
   CLAIM_DELEGATE_CUR, no delegation is sent.
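
   A conservative server policy consistent with these constraints might
   be sketched as follows.  The function name open_policy and the
   owner_known flag are hypothetical.  The sketch always requires
   confirmation for an unknown open-owner on a non-reclaim open, whereas
   a real server may skip it when it can conclusively determine that
   confirmation is unnecessary.

```python
RECLAIM_CLAIMS = frozenset({"CLAIM_PREVIOUS", "CLAIM_DELEGATE_PREV"})


def open_policy(claim, owner_known):
    """Return (confirm_required, delegation_allowed) for an OPEN."""
    if claim in RECLAIM_CLAIMS:
        confirm = False  # MUST NOT require confirmation on a reclaim
    else:
        confirm = not owner_known  # first use of this open-owner
    # No delegation on an open that requires confirmation, and none is
    # sent for CLAIM_DELEGATE_CUR.
    delegation = (not confirm) and claim != "CLAIM_DELEGATE_CUR"
    return confirm, delegation
```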

   Delegations being sent with an open requiring confirmation are
   troublesome because recovering from non-confirmation adds undue
   complexity to the protocol, while requiring confirmation on reclaim-
   type opens poses difficulties in that the inability to resolve the
   status of the reclaim until lease expiration may make it difficult to
   have timely determination of the set of locks being reclaimed (since
   the grace period may expire).

   Requiring open confirmation on reclaim-type opens is avoidable
   because of the nature of the environments in which such opens are
   done.  For CLAIM_PREVIOUS opens, this is immediately after server
   reboot, so there should be no time for open-owners to be created,
   found to be unused, and recycled.  For CLAIM_DELEGATE_PREV opens,
   we are dealing with either a client reboot situation or a network
   partition resulting in deletion of lease state (and returning
   NFS4ERR_EXPIRED).  A server that supports delegations can be sure
   that no open-owners for that client have been recycled since client
   initialization or deletion of lease state and thus can be confident
   that confirmation will not be required.

9.2.  Lock Ranges

   The protocol allows a lock-owner to request a lock with a byte range
   and then either upgrade or unlock a sub-range of the initial lock.
   It is expected that this will be an uncommon type of request.  In any
   case, servers or server file systems may not be able to support
   sub-range lock semantics.  In the event that a server receives a
   locking request that represents a sub-range of current locking state
   for the lock-owner, the server is allowed to return the error
   NFS4ERR_LOCK_RANGE to signify that it does not support sub-range lock
   operations.  Therefore, the client should be prepared to receive this
   error and, if appropriate, report the error to the requesting
   application.
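
   As a sketch, a server that lacks sub-range support might classify a
   request against the requesting lock-owner's existing ranges as shown
   below.  The function names and the (offset, length) tuples are
   hypothetical, and only strict containment is treated as a sub-range.

```python
def classify_range(req_off, req_len, held):
    """Compare a requested range with ranges already held by the same
    lock-owner; `held` is a list of (offset, length) tuples."""
    req_end = req_off + req_len
    for off, length in held:
        end = off + length
        if (req_off, req_end) == (off, end):
            return "exact"
        if off <= req_off and req_end <= end:
            return "sub-range"
    return "other"


def lock_range_status(req_off, req_len, held, supports_subrange):
    """Status for a LOCK or LOCKU with respect to sub-range support."""
    kind = classify_range(req_off, req_len, held)
    if kind == "sub-range" and not supports_subrange:
        return "NFS4ERR_LOCK_RANGE"
    return "NFS4_OK"
```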

   The client is discouraged from combining multiple independent locking
   ranges that happen to be adjacent into a single request, since the
   server may not support sub-range requests, and for reasons related to
   the recovery of file locking state in the event of server failure.
   As discussed in Section 9.6.2 below, the server may employ certain
   optimizations during recovery that work effectively only when the
   client's behavior during lock recovery is similar to the client's
   locking behavior prior to server failure.

9.3.  Upgrading and Downgrading Locks

   If a client has a write lock on a record, it can request an atomic
   downgrade of the lock to a read lock via the LOCK request, by setting
   the type to READ_LT.  If the server supports atomic downgrade, the
   request will succeed.  If not, it will return NFS4ERR_LOCK_NOTSUPP.
   The client should be prepared to receive this error and, if
   appropriate, report the error to the requesting application.

   If a client has a read lock on a record, it can request an atomic
   upgrade of the lock to a write lock via the LOCK request by setting
   the type to WRITE_LT or WRITEW_LT.  If the server does not support
   atomic upgrade, it will return NFS4ERR_LOCK_NOTSUPP.  If the upgrade
   can be achieved without an existing conflict, the request will
   succeed.  Otherwise, the server will return either NFS4ERR_DENIED or
   NFS4ERR_DEADLOCK.  The error NFS4ERR_DEADLOCK is returned if the
   client issued the LOCK request with the type set to WRITEW_LT and the
   server has detected a deadlock.  The client should be prepared to
   receive such errors and, if appropriate, report them to the
   requesting application.
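
   The possible outcomes of an upgrade request can be summarized by the
   sketch below.  The function name upgrade_status and its boolean
   inputs are hypothetical stand-ins for the server's internal state.

```python
def upgrade_status(locktype, supports_atomic_upgrade, conflict,
                   deadlock_detected=False):
    """Status a server might return for a read-to-write lock upgrade."""
    if locktype not in ("WRITE_LT", "WRITEW_LT"):
        raise ValueError("an upgrade uses WRITE_LT or WRITEW_LT")
    if not supports_atomic_upgrade:
        return "NFS4ERR_LOCK_NOTSUPP"
    if not conflict:
        return "NFS4_OK"
    # Deadlock is reported only for the blocking lock type.
    if locktype == "WRITEW_LT" and deadlock_detected:
        return "NFS4ERR_DEADLOCK"
    return "NFS4ERR_DENIED"
```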

9.4.  Blocking Locks

   Some clients require the support of blocking locks.  The NFSv4
   protocol must not rely on a callback mechanism and therefore is
   unable to notify a client when a previously denied lock has been
   granted.  Clients have no choice but to continually poll for the
   lock.  This presents a fairness problem.  Two new lock types are
   added, READW and WRITEW, and are used to indicate to the server that
   the client is requesting a blocking lock.  The server should maintain
   an ordered list of pending blocking locks.  When the conflicting lock
   is released, the server may wait the lease period for the first
   waiting client to re-request the lock.  After the lease period
   expires, the next waiting client request is allowed the lock.
   Clients are required to poll at an interval sufficiently small that
   they are likely to acquire the lock in a timely manner.  The server is
   not required to maintain a list of pending blocked locks, as it is
   not used to provide correct operation but only to increase fairness.
   Because of the unordered nature of crash recovery, storing of lock
   state to stable storage would be required to guarantee ordered
   granting of blocking locks.

   Servers may also note the lock types and delay returning denial of
   the request to allow extra time for a conflicting lock to be
   released, allowing a successful return.  In this way, clients can
   avoid the burden of needlessly frequent polling for blocking locks.
   The server should take care with the length of delay in the event
   that the client retransmits the request.

   If a server receives a blocking lock request, denies it, and then
   later receives a non-blocking request for the same lock, which is
   also denied, then it should remove the lock in question from its list
   of pending blocking locks.  Clients should use such a non-blocking
   request to indicate to the server that this is the last time they
   intend to poll for the lock, as may happen when the process
   requesting the lock is interrupted.  This is a courtesy to the
   server, to prevent it from unnecessarily waiting a lease period
   before granting other lock requests.  However, clients are not
   required to perform this courtesy, and servers must not depend on
   them doing so.  Also, clients must be prepared for the possibility
   that this final locking request will be accepted.
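
   One way a server might keep the optional list of pending blocking
   locks is sketched below.  The class PendingLocks and its methods are
   hypothetical, time is passed in explicitly as a number of seconds,
   and the sketch tracks a single contended lock.  Each waiter at the
   head of the list is given one lease period to re-request the lock
   before the next waiter becomes eligible.

```python
from collections import deque


class PendingLocks:
    """Fairness queue for one contended lock (illustrative only)."""

    def __init__(self, lease_period):
        self.lease_period = lease_period
        self.waiters = deque()  # clients in order of first denial
        self.freed_at = None    # None while the lock is held

    def denied(self, client, blocking):
        """Record a denied request from `client`."""
        if blocking:
            if client not in self.waiters:
                self.waiters.append(client)
        elif client in self.waiters:
            # A denied non-blocking request signals the final poll.
            self.waiters.remove(client)

    def lock_released(self, now):
        self.freed_at = now

    def try_grant(self, client, now):
        """Decide whether a poll from `client` at time `now` succeeds."""
        if self.freed_at is None:
            return False  # the lock is still held
        # Skip waiters that failed to return within a lease period.
        while self.waiters and now - self.freed_at >= self.lease_period:
            self.waiters.popleft()
            self.freed_at += self.lease_period
        if self.waiters and self.waiters[0] != client:
            return False  # an earlier waiter still has priority
        if self.waiters:
            self.waiters.popleft()
        self.freed_at = None  # the lock is held again
        return True
```

   Because the list exists only to improve fairness, the sketch keeps it
   in memory and makes no attempt to preserve ordering across a server
   restart.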

