RFC 7530

Network File System (NFS) Version 4 Protocol

Part 11 of 14, p. 206 to 237

14.  NFSv4 Requests

   For the NFSv4 RPC program, there are two traditional RPC procedures:
   NULL and COMPOUND.  All other functionality is defined as a set of
   operations, and these operations are defined in normal XDR/RPC syntax
   and semantics.  However, these operations are encapsulated within the
   COMPOUND procedure.  This requires that the client combine one or
   more of the NFSv4 operations into a single request.

   The NFS4_CALLBACK program is used to provide server-to-client
   signaling and is constructed in a fashion similar to the NFSv4
   program.  The procedures CB_NULL and CB_COMPOUND are defined in the
   same way as NULL and COMPOUND are within the NFS program.  The
   CB_COMPOUND request also encapsulates the remaining operations of the
   NFS4_CALLBACK program.  There is no predefined RPC program number for
   the NFS4_CALLBACK program.  It is up to the client to specify a
   program number in the "transient" program range.  The program and
   port numbers of the NFS4_CALLBACK program are provided by the client
   as part of the SETCLIENTID/SETCLIENTID_CONFIRM sequence.  The program
   and port can be changed by another SETCLIENTID/SETCLIENTID_CONFIRM
   sequence, and it is possible to use the sequence to change them
   within a client incarnation without removing relevant leased client
   state.
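
   As an illustration of the "transient" range mentioned above, the
   following Python sketch checks and picks a callback program number.
   The bounds 0x40000000 to 0x5FFFFFFF are the ONC RPC transient range
   defined in the RPC specification (RFC 5531); the helper names and the
   use of a random choice are assumptions of this sketch, not
   requirements of NFSv4.

```python
import random

# ONC RPC "transient" program-number range (RFC 5531).
TRANSIENT_FIRST = 0x40000000
TRANSIENT_LAST = 0x5FFFFFFF


def is_transient(prog: int) -> bool:
    """True if prog lies in the transient program-number range."""
    return TRANSIENT_FIRST <= prog <= TRANSIENT_LAST


def pick_callback_program(rng: random.Random) -> int:
    """Hypothetical helper: choose an NFS4_CALLBACK program number.

    The client would then advertise this number, together with its
    callback port, to the server in SETCLIENTID.
    """
    return rng.randint(TRANSIENT_FIRST, TRANSIENT_LAST)
```

   A real client would also check that the chosen number is not already
   in use locally; that check is omitted here.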

14.1.  COMPOUND Procedure

   The COMPOUND procedure provides the opportunity for better
   performance within high-latency networks.  The client can avoid
   cumulative latency of multiple RPCs by combining multiple dependent
   operations into a single COMPOUND procedure.  A COMPOUND operation
   may provide for protocol simplification by allowing the client to
   combine basic procedures into a single request that is customized for
   the client's environment.

   The CB_COMPOUND procedure precisely parallels the features of
   COMPOUND as described above.

   The basic structure of the COMPOUND procedure is:

   +-----+--------------+--------+-----------+-----------+-----------+--
   | tag | minorversion | numops | op + args | op + args | op + args |
   +-----+--------------+--------+-----------+-----------+-----------+--

   and the reply's structure is:

     +------------+-----+--------+-----------------------+--
     |last status | tag | numres | status + op + results |
     +------------+-----+--------+-----------------------+--

   The numops and numres fields, used in the depiction above, represent
   the count for the counted array encoding used to signify the number
   of arguments or results encoded in the request and response.  As per
   the XDR encoding, these counts must match exactly the number of
   operation arguments or results encoded.
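
   The following Python sketch shows the request side of this layout:
   the tag is encoded as a variable-length XDR string (length, bytes,
   zero padding to a 4-byte boundary), followed by minorversion, the
   numops count, and each operation's 32-bit opcode and arguments.  The
   helper names are hypothetical, argument bodies are assumed to be
   already XDR-encoded, and the two opcode values are taken from the
   NFSv4 XDR definition.

```python
import struct
from typing import List, Tuple

OP_GETFH = 10       # opcode values from the NFSv4 XDR definition
OP_PUTROOTFH = 24


def xdr_uint32(value: int) -> bytes:
    """Encode an XDR unsigned int: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_opaque(data: bytes) -> bytes:
    """Encode variable-length XDR data: length, bytes, zero padding."""
    padding = (4 - len(data) % 4) % 4
    return xdr_uint32(len(data)) + data + b"\x00" * padding


def encode_compound_args(tag: str, minorversion: int,
                         ops: List[Tuple[int, bytes]]) -> bytes:
    """Encode COMPOUND4args; each op is (opcode, encoded argument bytes)."""
    out = xdr_opaque(tag.encode("utf-8"))
    out += xdr_uint32(minorversion)
    out += xdr_uint32(len(ops))           # numops must match the ops below
    for opcode, args in ops:
        out += xdr_uint32(opcode) + args   # nfs_argop4 discriminant, then arm
    return out
```

   For example, a PUTROOTFH plus GETFH request with an empty tag and
   minorversion 0 encodes to five 32-bit words: 0, 0, 2, 24, 10.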

14.2.  Evaluation of a COMPOUND Request

   The server will process the COMPOUND procedure by evaluating each of
   the operations within the COMPOUND procedure in order.  Each
   component operation consists of a 32-bit operation code, followed by
   the argument of length determined by the type of operation.  The
   results of each operation are encoded in sequence into a reply
   buffer.  The results of each operation are preceded by the opcode and
   a status code (normally zero).  If an operation results in a non-zero
   status code, the status will be encoded, evaluation of the COMPOUND
   sequence will halt, and the reply will be returned.  Note that
   evaluation stops even in the event of "non-error" conditions such as
   NFS4ERR_SAME.
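
   A minimal server-side sketch of this evaluation loop in Python,
   assuming each operation is dispatched to a handler that returns a
   (status, result) pair.  The handler table and the execution-context
   dictionary are hypothetical; NFS4ERR_SAME (10009) is the value from
   the NFSv4 XDR definition.

```python
from typing import Any, Callable, Dict, List, Tuple

NFS4_OK = 0
NFS4ERR_SAME = 10009  # "non-error" status that still halts evaluation

# A handler receives the shared execution context and its decoded
# arguments, and returns (status, result).
Handler = Callable[[Dict[str, Any], Any], Tuple[int, Any]]


def evaluate_compound(
    ops: List[Tuple[int, Any]], handlers: Dict[int, Handler]
) -> Tuple[int, List[Tuple[int, int, Any]]]:
    """Evaluate ops in order; stop after the first non-zero status."""
    context: Dict[str, Any] = {"current_fh": None, "saved_fh": None}
    resarray: List[Tuple[int, int, Any]] = []
    for opcode, args in ops:
        status, result = handlers[opcode](context, args)
        # Each result is preceded by its opcode and status.
        resarray.append((opcode, status, result))
        if status != NFS4_OK:
            break  # halts even for NFS4ERR_SAME
    # The COMPOUND status is the status of the last evaluated operation.
    overall = resarray[-1][1] if resarray else NFS4_OK
    return overall, resarray
```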

   There are no atomicity requirements for the operations contained
   within the COMPOUND procedure.  The operations being evaluated as
   part of a COMPOUND request may be evaluated simultaneously with other
   COMPOUND requests that the server receives.

   A COMPOUND is not a transaction, and it is the client's
   responsibility to recover from any partially completed COMPOUND
   procedure.  These may occur at any point due to errors such as
   NFS4ERR_RESOURCE and NFS4ERR_DELAY.  Note that these errors can occur
   in an otherwise valid operation string.  Further, a server reboot
   that occurs in the middle of processing a COMPOUND procedure may
   leave the client with the difficult task of determining how far
   COMPOUND processing has proceeded.  Therefore, the client should
   avoid overly complex COMPOUND procedures so that recovery from the
   failure of an operation within the procedure remains manageable.
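
   When a reply does arrive, the client can tell how far evaluation got
   from the result array, because evaluation stops at the first
   non-zero status.  A Python sketch, assuming results are
   (opcode, status, result) tuples as in a decoded COMPOUND4res; the
   function name is hypothetical.  It cannot help when no reply arrives
   at all (for example, after a server reboot), which is the case the
   paragraph above warns about.

```python
from typing import Any, List, Sequence, Tuple

NFS4_OK = 0


def split_progress(ops: Sequence[Tuple[int, Any]],
                   resarray: List[Tuple[int, int, Any]]
                   ) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, Any]]]:
    """Split ops into (completed, remaining) using a COMPOUND reply.

    Operations with an NFS4_OK result completed.  The first failed
    operation, if any, and every operation after it did not complete
    and are left to the client's recovery logic.
    """
    completed = 0
    for _opcode, status, _result in resarray:
        if status != NFS4_OK:
            break
        completed += 1
    return list(ops[:completed]), list(ops[completed:])
```

   Re-sending the remaining operations in a new COMPOUND generally
   requires re-establishing the current filehandle first (for example,
   with PUTFH), since the filehandle context belongs to a single
   COMPOUND request.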

   Each operation assumes a current filehandle and a saved filehandle
   that are available as part of the execution context of the COMPOUND
   request.  Operations may set, change, or return the current
   filehandle.  The saved filehandle is used for temporary storage of a
   filehandle value and as operands for the RENAME and LINK operations.

14.3.  Synchronous Modifying Operations

   NFSv4 operations that modify the file system are synchronous.  When
   an operation is successfully completed at the server, the client can
   trust that any data associated with the request is now in stable
   storage (the one exception is in the case of the file data in a WRITE
   operation with the UNSTABLE4 option specified).

   This implies that any previous operations within the same COMPOUND
   request are also reflected in stable storage.  This behavior enables
   the client to recover from a partially executed COMPOUND request that
   may have resulted from the failure of the server.  For
   example, if a COMPOUND request contains operations A and B and the
   server is unable to send a response to the client, then depending on
   the progress the server made in servicing the request, the result of
   both operations may be reflected in stable storage or just
   operation A may be reflected.  The server must not have just the
   results of operation B in stable storage.
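
   The guarantee can be phrased as an invariant: among the modifying
   operations of one COMPOUND, those whose effects are in stable storage
   after a failure form a prefix of the request.  A hypothetical
   test-harness check in Python (not part of the protocol), assuming
   operations are identified by simple labels:

```python
from typing import Sequence


def is_permitted_durable_state(compound_ops: Sequence[str],
                               durable_ops: Sequence[str]) -> bool:
    """True if the durable operations form a prefix of the COMPOUND.

    compound_ops lists the modifying operations in request order;
    durable_ops lists those whose effects survived the failure, in order.
    """
    n = len(durable_ops)
    return n <= len(compound_ops) and list(durable_ops) == list(compound_ops[:n])
```

   With operations A and B, the states [], [A], and [A, B] are
   permitted, while [B] alone is not.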

14.4.  Operation Values

   The operations encoded in the COMPOUND procedure are identified by
   operation values.  To avoid overlap with the RPC procedure numbers,
   operations 0 (zero) and 1 are not defined.  Operation 2 is not
   defined but is reserved for future use with minor versioning.

15.  NFSv4 Procedures

15.1.  Procedure 0: NULL - No Operation

15.1.1.  SYNOPSIS

     <null>

15.1.2.  ARGUMENT

     void;

15.1.3.  RESULT

     void;

15.1.4.  DESCRIPTION

   Standard NULL procedure.  Void argument, void response.  This
   procedure has no functionality associated with it.  Because of this,
   it is sometimes used to measure the overhead of processing a service
   request.  Therefore, the server should ensure that no unnecessary
   work is done in servicing this procedure.

15.2.  Procedure 1: COMPOUND - COMPOUND Operations

15.2.1.  SYNOPSIS

     compoundargs -> compoundres

15.2.2.  ARGUMENT

     union nfs_argop4 switch (nfs_opnum4 argop) {
             case <OPCODE>: <argument>;
             ...
     };

   struct COMPOUND4args {
           utf8str_cs      tag;
           uint32_t        minorversion;
           nfs_argop4      argarray<>;
   };

15.2.3.  RESULT

     union nfs_resop4 switch (nfs_opnum4 resop) {
             case <OPCODE>: <result>;
             ...
     };

   struct COMPOUND4res {
           nfsstat4        status;
           utf8str_cs      tag;
           nfs_resop4      resarray<>;
   };

15.2.4.  DESCRIPTION

   The COMPOUND procedure is used to combine one or more of the NFS
   operations into a single RPC request.  The main NFS RPC program has
   two main procedures: NULL and COMPOUND.  All other operations use the
   COMPOUND procedure as a wrapper.

   The COMPOUND procedure is used to combine individual operations into
   a single RPC request.  The server interprets each of the operations
   in turn.  If an operation is executed by the server and the status of
   that operation is NFS4_OK, then the next operation in the COMPOUND
   procedure is executed.  The server continues this process until there
   are no more operations to be executed or one of the operations has a
   status value other than NFS4_OK.

   In the processing of the COMPOUND procedure, the server may find that
   it does not have the available resources to execute any or all of the
   operations within the COMPOUND sequence.  In this case, the error
   NFS4ERR_RESOURCE will be returned for the particular operation within
   the COMPOUND procedure where the resource exhaustion occurred.  This
   assumes that all previous operations within the COMPOUND sequence
   have been evaluated successfully.  The results for all of the
   evaluated operations must be returned to the client.

   The server will generally choose between two methods of decoding the
   client's request.  The first would be the traditional one-pass XDR
   decode, in which decoding of the entire COMPOUND precedes execution
   of any operation within it.  If there is an XDR decoding error in
   this case, an RPC XDR decode error would be returned.  The second
   method would be to make an initial pass to decode the basic COMPOUND
   request and then to XDR decode each of the individual operations, as
   the server is ready to execute it.  In this case, the server may
   encounter an XDR decode error during such an operation decode, after
   previous operations within the COMPOUND have been executed.  In this
   case, the server would return the error NFS4ERR_BADXDR to signify the
   decode error.
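
   The second strategy can be sketched in Python as follows.  Each
   operation's opcode and arguments are decoded only when the server is
   ready to run it, so a truncated or malformed argument is detected
   after earlier operations have already been executed.  The decoder
   table is hypothetical (the sketch assumes every opcode it meets has a
   decoder), execution is reduced to recording NFS4_OK, and
   NFS4ERR_BADXDR (10036) is the value from the NFSv4 XDR definition.

```python
import struct
from typing import Callable, Dict, List, Optional, Tuple

NFS4_OK = 0
NFS4ERR_BADXDR = 10036

# A decoder takes (buffer, offset) and returns (args, new_offset); it
# raises struct.error if the buffer is too short.
Decoder = Callable[[bytes, int], Tuple[object, int]]


def decode_uint32(buf: bytes, off: int) -> Tuple[int, int]:
    """Decode one big-endian XDR unsigned int at off."""
    (value,) = struct.unpack_from(">I", buf, off)
    return value, off + 4


def run_incremental(buf: bytes, off: int, numops: int,
                    decoders: Dict[int, Decoder]
                    ) -> List[Tuple[Optional[int], int]]:
    """Decode, then execute, each operation; return (opcode, status) pairs."""
    results: List[Tuple[Optional[int], int]] = []
    for _ in range(numops):
        opcode: Optional[int] = None
        try:
            opcode, off = decode_uint32(buf, off)
            _args, off = decoders[opcode](buf, off)
        except struct.error:
            # Earlier operations have already run; report the decode error.
            results.append((opcode, NFS4ERR_BADXDR))
            break
        results.append((opcode, NFS4_OK))  # execution would happen here
    return results
```

   For example, with ACCESS (opcode 3, whose argument is a single
   uint32 mask) decoded by decode_uint32, a buffer holding one complete
   ACCESS followed by a truncated one yields
   [(3, NFS4_OK), (3, NFS4ERR_BADXDR)].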

   The COMPOUND arguments contain a minorversion field.  The initial and
   default value for this field is 0 (zero).  This field will be used by
   future minor versions such that the client can communicate to the
   server what minor version is being requested.  If the server receives
   a COMPOUND procedure with a minorversion field value that it does not
   support, the server MUST return an error of
   NFS4ERR_MINOR_VERS_MISMATCH and a zero-length result array (resarray).

   Contained within the COMPOUND results is a status field.  If the
   results array length is non-zero, this status must be equivalent to
   the status of the last operation that was executed within the
   COMPOUND procedure.  Therefore, if an operation incurred an error,
   then the status value will be the same error value as is being
   returned for the operation that failed.

   Note that operations 0 (zero), 1 (one), and 2 (two) are not defined
   for the COMPOUND procedure.  It is possible that the server receives
   a request that contains an operation that is less than the first
   legal operation (OP_ACCESS) or greater than the last legal operation
   (OP_RELEASE_LOCKOWNER).  In this case, the server's response will
   encode the opcode OP_ILLEGAL rather than the illegal opcode of the
   request.  The status field in the ILLEGAL return results will be set
   to NFS4ERR_OP_ILLEGAL.  The COMPOUND procedure's return results will
   also be NFS4ERR_OP_ILLEGAL.
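
   The minor-version and opcode-range checks described above can be
   sketched in Python as follows.  The constants are the values in the
   NFSv4.0 XDR definition (OP_ILLEGAL and NFS4ERR_OP_ILLEGAL are both
   10044); the function is hypothetical and reports NFS4_OK for legal
   operations as a stand-in for actually executing them.

```python
from typing import List, Tuple

NFS4_OK = 0
NFS4ERR_MINOR_VERS_MISMATCH = 10021
NFS4ERR_OP_ILLEGAL = 10044
OP_ACCESS = 3               # first legal NFSv4.0 operation
OP_RELEASE_LOCKOWNER = 39   # last legal NFSv4.0 operation
OP_ILLEGAL = 10044
SUPPORTED_MINOR_VERSIONS = {0}


def precheck_compound(minorversion: int,
                      opcodes: List[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Apply only the minorversion and opcode-range rules.

    Returns (COMPOUND status, [(reply opcode, status), ...]).
    """
    if minorversion not in SUPPORTED_MINOR_VERSIONS:
        return NFS4ERR_MINOR_VERS_MISMATCH, []   # zero-length result array
    results: List[Tuple[int, int]] = []
    for op in opcodes:
        if op < OP_ACCESS or op > OP_RELEASE_LOCKOWNER:
            # Reply with OP_ILLEGAL, not the opcode that was received.
            results.append((OP_ILLEGAL, NFS4ERR_OP_ILLEGAL))
            break
        results.append((op, NFS4_OK))            # stand-in for execution
    status = results[-1][1] if results else NFS4_OK
    return status, results
```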

   The definition of the "tag" in the request is left to the
   implementer.  It may be used to summarize the content of the COMPOUND
   request for the benefit of packet sniffers and engineers debugging
   implementations.  However, the value of "tag" in the response SHOULD
   be the same value as the value provided in the request.  This applies
   to the tag field of the CB_COMPOUND procedure as well.

15.2.4.1.  Current Filehandle

   The current filehandle and the saved filehandle are used throughout
   the protocol.  Most operations implicitly use the current filehandle
   as an argument, and many set the current filehandle as part of the
   results.  The combination of client-specified sequences of operations
   and current and saved filehandle arguments and results allows for
   greater protocol flexibility.  The best or easiest example of current
   filehandle usage is a sequence like the following:

                        PUTFH fh1              {fh1}
                        LOOKUP "compA"         {fh2}
                        GETATTR                {fh2}
                        LOOKUP "compB"         {fh3}
                        GETATTR                {fh3}
                        LOOKUP "compC"         {fh4}
                        GETATTR                {fh4}
                        GETFH

                    Figure 1: Filehandle Usage Example

   In this example, the PUTFH (Section 16.20) operation explicitly sets
   the current filehandle value, while the result of each LOOKUP
   operation sets the current filehandle value to the resultant file
   system object.  Also, the client is able to insert GETATTR operations
   using the current filehandle as an argument.

   The PUTROOTFH (Section 16.22) and PUTPUBFH (Section 16.21) operations
   also set the current filehandle.  The above example would replace
   "PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in
   order to achieve the same effect (on the assumption that "compA" is
   directly below the root of the namespace).

   Along with the current filehandle, there is a saved filehandle.
   While the current filehandle is set as the result of operations like
   LOOKUP, the saved filehandle must be set directly with the use of the
   SAVEFH operation.  The SAVEFH operation copies the current filehandle
   value to the saved value.  The saved filehandle value is used in
   combination with the current filehandle value for the LINK and RENAME
   operations.  The RESTOREFH operation will copy the saved filehandle
   value to the current filehandle value; as a result, the saved
   filehandle value may be used as a sort of "scratch" area for the
   client's series of operations.
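
   The filehandle bookkeeping can be modeled in a few lines of Python.
   This toy class (hypothetical, not an NFS client or server API) holds
   the current and saved filehandles for one COMPOUND and resolves
   LOOKUP through a dictionary standing in for the namespace; error
   statuses other than a missing current filehandle are omitted.

```python
from typing import Dict, Optional, Tuple


class FhContext:
    """Current and saved filehandles for one COMPOUND (toy model)."""

    def __init__(self, namespace: Dict[Tuple[str, str], str]) -> None:
        self.namespace = namespace   # (directory fh, name) -> child fh
        self.current: Optional[str] = None
        self.saved: Optional[str] = None

    def putfh(self, fh: str) -> None:
        self.current = fh            # explicitly sets the current filehandle

    def lookup(self, name: str) -> None:
        if self.current is None:
            # The protocol reports NFS4ERR_NOFILEHANDLE in this case.
            raise ValueError("no current filehandle")
        self.current = self.namespace[(self.current, name)]

    def savefh(self) -> None:
        self.saved = self.current    # copy current -> saved

    def restorefh(self) -> None:
        self.current = self.saved    # copy saved -> current

    def getfh(self) -> Optional[str]:
        return self.current
```

   Replaying Figure 1 (without the GETATTRs) leaves fh4 as the current
   filehandle; a SAVEFH after the first LOOKUP followed by a RESTOREFH
   at the end would return the current filehandle to fh2.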

15.2.5.  IMPLEMENTATION

   Since an error of any type may occur after only a portion of the
   operations have been evaluated, the client must be prepared to
   recover from any failure.  If the source of an NFS4ERR_RESOURCE error
   was a complex or lengthy set of operations, it is likely that if the
   number of operations were reduced the server would be able to
   evaluate them successfully.  Therefore, the client is responsible for
   dealing with this type of complexity in recovery.

   A single compound should not contain multiple operations that have
   different values for the clientid field used in OPEN, LOCK, or RENEW.
   This can cause confusion in cases in which operations that do not
   contain clientids have potential interactions with operations that
   do.  When only a single clientid has been used, it is clear what
   client is being referenced.  For a particular example involving the
   interaction of OPEN and GETATTR, see Section 16.16.6.

16.  NFSv4 Operations

16.1.  Operation 3: ACCESS - Check Access Rights

16.1.1.  SYNOPSIS

     (cfh), accessreq -> supported, accessrights

16.1.2.  ARGUMENT

   const ACCESS4_READ      = 0x00000001;
   const ACCESS4_LOOKUP    = 0x00000002;
   const ACCESS4_MODIFY    = 0x00000004;
   const ACCESS4_EXTEND    = 0x00000008;
   const ACCESS4_DELETE    = 0x00000010;
   const ACCESS4_EXECUTE   = 0x00000020;

   struct ACCESS4args {
           /* CURRENT_FH: object */
           uint32_t        access;
   };

16.1.3.  RESULT

   struct ACCESS4resok {
           uint32_t        supported;
           uint32_t        access;
   };

   union ACCESS4res switch (nfsstat4 status) {
    case NFS4_OK:
            ACCESS4resok   resok4;
    default:
            void;
   };

16.1.4.  DESCRIPTION

   ACCESS determines the access rights that a user, as identified by the
   credentials in the RPC request, has with respect to the file system
   object specified by the current filehandle.  The client encodes the
   set of access rights that are to be checked in the bitmask "access".
   The server checks the permissions encoded in the bitmask.  If a
   status of NFS4_OK is returned, two bitmasks are included in the
   response.  The first, "supported", represents the access rights that
   the server can reliably verify.  The second, "access", represents the
   access rights available to the user for the filehandle provided.

   Note that the supported field will contain only as many values as
   were originally sent in the arguments.  For example, if the client
   sends an ACCESS operation with only the ACCESS4_READ value set and
   the server supports this value, the server will return only
   ACCESS4_READ even if it could have reliably checked other values.

   The results of this operation are necessarily advisory in nature.  A
   return status of NFS4_OK and the appropriate bit set in the bitmask
   do not imply that such access will be allowed to the file system
   object in the future.  This is because access rights can be revoked
   by the server at any time.

   The following access permissions may be requested:

   ACCESS4_READ:  Read data from file or read a directory.

   ACCESS4_LOOKUP:  Look up a name in a directory (no meaning for
      non-directory objects).

   ACCESS4_MODIFY:  Rewrite existing file data or modify existing
      directory entries.

   ACCESS4_EXTEND:  Write new data or add directory entries.

   ACCESS4_DELETE:  Delete an existing directory entry.

   ACCESS4_EXECUTE:  Execute file (no meaning for a directory).

   On success, the current filehandle retains its value.
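
   A client-side sketch in Python of how the two reply bitmasks
   combine, using the ACCESS4_* values defined above.  The function name
   and its three-valued result (granted, denied, or not checked) are
   illustrative assumptions.  A right absent from "supported" is treated
   as unknown regardless of its bit in "access", which is how a client
   would handle a server that cannot check ACCESS4_DELETE on a file.  As
   noted above, any answer is advisory only.

```python
from typing import Dict, Optional

ACCESS4_READ = 0x00000001
ACCESS4_LOOKUP = 0x00000002
ACCESS4_MODIFY = 0x00000004
ACCESS4_EXTEND = 0x00000008
ACCESS4_DELETE = 0x00000010
ACCESS4_EXECUTE = 0x00000020

RIGHT_NAMES = {
    ACCESS4_READ: "READ",
    ACCESS4_LOOKUP: "LOOKUP",
    ACCESS4_MODIFY: "MODIFY",
    ACCESS4_EXTEND: "EXTEND",
    ACCESS4_DELETE: "DELETE",
    ACCESS4_EXECUTE: "EXECUTE",
}


def interpret_access(requested: int, supported: int,
                     access: int) -> Dict[str, Optional[bool]]:
    """Map each requested right to True (granted), False (denied),
    or None (the server could not verify it)."""
    verdict: Dict[str, Optional[bool]] = {}
    for bit, name in RIGHT_NAMES.items():
        if not requested & bit:
            continue                      # not asked for
        if not supported & bit:
            verdict[name] = None          # ignore the "access" bit
        else:
            verdict[name] = bool(access & bit)
    return verdict
```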

16.1.5.  IMPLEMENTATION

   In general, it is not sufficient for the client to attempt to deduce
   access permissions by inspecting the uid, gid, and mode fields in the
   file attributes or by attempting to interpret the contents of the ACL
   attribute.  This is because the server may perform uid or gid mapping
   or enforce additional access control restrictions.  It is also
   possible that the server may not be in the same ID space as the
   client.  In these cases (and perhaps others), the client cannot
   reliably perform an access check with only current file attributes.

   In the NFSv2 protocol, the only reliable way to determine whether an
   operation was allowed was to try it and see if it succeeded or
   failed.  Using the ACCESS operation in the NFSv4 protocol, the client
   can ask the server to indicate whether or not one or more classes of
   operations are permitted.  The ACCESS operation is provided to allow
   clients to check before doing a series of operations that might
   result in an access failure.  The OPEN operation provides a point
   where the server can verify access to the file object and the method
   to return that information to the client.  The ACCESS operation is
   still useful for directory operations or for use in the case where
   the UNIX API "access" is used on the client.

   The information returned by the server in response to an ACCESS call
   is not permanent.  It was correct at the exact time that the server
   performed the checks, but not necessarily afterward.  The server can
   revoke access permission at any time.

   The client should use the effective credentials of the user to build
   the authentication information in the ACCESS request used to
   determine access rights.  It is the effective user and group
   credentials that are used in subsequent READ and WRITE operations.

   Many implementations do not directly support the ACCESS4_DELETE
   permission.  Operating systems like UNIX will ignore the
   ACCESS4_DELETE bit if set on an access request on a non-directory
   object.  In these systems, delete permission on a file is determined
   by the access permissions on the directory in which the file resides,
   instead of being determined by the permissions of the file itself.
   Therefore, the mask returned enumerating which access rights can be
   supported will have the ACCESS4_DELETE value set to 0.  This
   indicates to the client that the server was unable to check that
   particular access right.  The ACCESS4_DELETE bit in the access mask
   returned will then be ignored by the client.

16.2.  Operation 4: CLOSE - Close File

16.2.1.  SYNOPSIS

     (cfh), seqid, open_stateid -> open_stateid

16.2.2.  ARGUMENT

   struct CLOSE4args {
           /* CURRENT_FH: object */
           seqid4          seqid;
           stateid4        open_stateid;
   };

16.2.3.  RESULT

   union CLOSE4res switch (nfsstat4 status) {
    case NFS4_OK:
            stateid4       open_stateid;
    default:
            void;
   };

16.2.4.  DESCRIPTION

   The CLOSE operation releases share reservations for the regular or
   named attribute file as specified by the current filehandle.  The
   share reservations and other state information released at the server
   as a result of this CLOSE are only associated with the supplied
   stateid.  The sequence id provides for the correct ordering.  State
   associated with other OPENs is not affected.

   If byte-range locks are held, the client SHOULD release all locks
   before issuing a CLOSE.  The server MAY free all outstanding locks on
   CLOSE, but some servers may not support the CLOSE of a file that
   still has byte-range locks held.  The server MUST return failure if
   any locks would exist after the CLOSE.
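
   The lock rules above can be summarized with a toy server-side model
   in Python.  The text only requires that CLOSE fail; this sketch uses
   NFS4ERR_LOCKS_HELD (10037), the error the protocol defines for a
   CLOSE that would leave locks in place.  The OpenState class and the
   frees_locks_on_close policy flag are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

NFS4_OK = 0
NFS4ERR_LOCKS_HELD = 10037


@dataclass
class OpenState:
    """Toy per-open state: the open stateid and its locked byte ranges."""
    stateid: str
    locks: Set[Tuple[int, int]] = field(default_factory=set)


def server_close(state: OpenState, frees_locks_on_close: bool) -> int:
    """Return the CLOSE status for one open under the rules above."""
    if state.locks and frees_locks_on_close:
        state.locks.clear()          # the server MAY free outstanding locks
    if state.locks:
        return NFS4ERR_LOCKS_HELD    # it MUST fail if locks would remain
    return NFS4_OK                   # share reservations would be released here
```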

   On success, the current filehandle retains its value.

16.2.5.  IMPLEMENTATION

   Even though CLOSE returns a stateid, this stateid is not useful to
   the client and should be treated as deprecated.  CLOSE "shuts down"
   the state associated with all OPENs for the file by a single
   open-owner.  As noted above, CLOSE will either release all file
   locking state or return an error.  Therefore, the stateid returned by
   CLOSE is not useful for the operations that follow.

16.3.  Operation 5: COMMIT - Commit Cached Data

16.3.1.  SYNOPSIS

     (cfh), offset, count -> verifier

16.3.2.  ARGUMENT

   struct COMMIT4args {
           /* CURRENT_FH: file */
           offset4         offset;
           count4          count;
   };

16.3.3.  RESULT

   struct COMMIT4resok {
           verifier4       writeverf;
   };

   union COMMIT4res switch (nfsstat4 status) {
    case NFS4_OK:
            COMMIT4resok   resok4;
    default:
            void;
   };

16.3.4.  DESCRIPTION

   The COMMIT operation forces or flushes data to stable storage for the
   file specified by the current filehandle.  The flushed data is that
   which was previously written with a WRITE operation that had the
   stable field set to UNSTABLE4.

   The offset specifies the position within the file where the flush is
   to begin.  An offset value of 0 (zero) means to flush data starting
   at the beginning of the file.  The count specifies the number of
   bytes of data to flush.  If count is 0 (zero), a flush from the
   offset to the end of the file is done.
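
   The offset and count rule can be written as a small Python helper
   that returns the half-open byte range a COMMIT covers.  The helper
   name is hypothetical, and passing the server's current file size in
   as a parameter is an assumption of this sketch.

```python
from typing import Tuple


def commit_range(offset: int, count: int, file_size: int) -> Tuple[int, int]:
    """Return the byte range [start, end) covered by COMMIT(offset, count).

    A count of 0 means "from offset to the end of the file".
    """
    if count == 0:
        return offset, max(offset, file_size)
    return offset, offset + count
```

   With an 8192-byte file, COMMIT(0, 0) covers [0, 8192) and
   COMMIT(4096, 1024) covers [4096, 5120).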

   The server returns a write verifier upon successful completion of the
   COMMIT.  The write verifier is used by the client to determine if the
   server has restarted or rebooted between the initial WRITE(s) and the
   COMMIT.  The client does this by comparing the write verifier
   returned from the initial writes and the verifier returned by the
   COMMIT operation.  The server must vary the value of the write
   verifier at each server event or instantiation that may lead to a
   loss of uncommitted data.  Most commonly, this occurs when the server
   is rebooted; however, other events at the server may result in
   uncommitted data loss as well.

   On success, the current filehandle retains its value.

16.3.5.  IMPLEMENTATION

   The COMMIT operation is similar in operation and semantics to the
   POSIX fsync() [fsync] system call that synchronizes a file's state
   with the disk (file data and metadata are flushed to disk or stable
   storage).  COMMIT performs the same operation for a client, flushing
   any unsynchronized data and metadata on the server to the server's
   disk or stable storage for the specified file.  Like fsync(), it may
   be that there is some modified data or no modified data to
   synchronize.  The data may have been synchronized by the server's
   normal periodic buffer synchronization activity.  COMMIT should
   return NFS4_OK, unless there has been an unexpected error.

   COMMIT differs from fsync() in that it is possible for the client to
   flush a range of the file (most likely triggered by a buffer-
   reclamation scheme on the client before the file has been completely
   written).

   The server implementation of COMMIT is reasonably simple.  If the
   server receives a full file COMMIT request that is starting at offset
   0 and count 0, it should do the equivalent of fsync()'ing the file.
   Otherwise, it should arrange to have the cached data in the range
   specified by offset and count flushed to stable storage.  In
   both cases, any metadata associated with the file must be flushed to
   stable storage before returning.  It is not an error for there to be
   nothing to flush on the server.  This means that the data and
   metadata that needed to be flushed have already been flushed or lost
   during the last server failure.

   The client implementation of COMMIT is a little more complex.  There
   are two reasons for wanting to commit a client buffer to stable
   storage.  The first is that the client wants to reuse a buffer.  In
   this case, the offset and count of the buffer are sent to the server
   in the COMMIT request.  The server then flushes any cached data based
   on the offset and count, and flushes any metadata associated with the
   file.  It then returns the status of the flush and the write
   verifier.  The other reason for the client to generate a COMMIT is
   for a full file flush, such as may be done at CLOSE.  In this case,
   the client would gather all of the buffers for this file that contain
   uncommitted data, do the COMMIT operation with an offset of 0 and
   count of 0, and then free all of those buffers.  Any other dirty
   buffers would be sent to the server in the normal fashion.

   After a buffer is written by the client with the stable parameter set
   to UNSTABLE4, the buffer must be considered modified by the client
   until the buffer has been either flushed via a COMMIT operation or
   written via a WRITE operation with the stable parameter set to
   FILE_SYNC4 or DATA_SYNC4.  This is done to prevent the buffer from
   being freed and reused before the data can be flushed to stable
   storage on the server.

   When a response is returned from either a WRITE or a COMMIT operation
   and it contains a write verifier that differs from the one previously
   returned by the server, the client will need to retransmit all of the
   buffers containing uncommitted cached data to the server.  How this
   is to be done is up to the implementer.  If there is only one buffer
   of interest, then it should probably be sent back over in a WRITE
   request with the appropriate stable parameter.  If there is more than
   one buffer, it might be worthwhile to retransmit all of the buffers
   in WRITE requests with the stable parameter set to UNSTABLE4 and then
   retransmit the COMMIT operation to flush all of the data on the
   server to stable storage.  The timing of these retransmissions is
   left to the implementer.
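
   A Python sketch of the client-side bookkeeping described above,
   assuming a full-file COMMIT and buffers keyed by file offset.  The
   class names are hypothetical.  Each buffer remembers the verifier
   returned by its UNSTABLE4 WRITE; after a COMMIT, buffers whose
   verifier matches may be freed, while the rest must be written again.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UnstableBuffer:
    offset: int
    data: bytes
    verifier: bytes   # write verifier returned by the UNSTABLE4 WRITE


@dataclass
class UncommittedCache:
    """Client-side record of data written with UNSTABLE4 (toy model)."""
    buffers: Dict[int, UnstableBuffer] = field(default_factory=dict)

    def record_unstable_write(self, offset: int, data: bytes,
                              verifier: bytes) -> None:
        # The buffer stays "modified" until a COMMIT confirms it.
        self.buffers[offset] = UnstableBuffer(offset, data, verifier)

    def on_commit(self, commit_verifier: bytes) -> List[UnstableBuffer]:
        """Drop committed buffers; return those that must be retransmitted."""
        resend = [b for b in self.buffers.values()
                  if b.verifier != commit_verifier]
        # Matching buffers are on stable storage and can be freed;
        # mismatched ones may have been lost by a server restart.
        self.buffers = {b.offset: b for b in resend}
        return resend
```

   How the returned buffers are resent (a single stable WRITE, or
   UNSTABLE4 WRITEs followed by another COMMIT) remains an
   implementation choice, as described above.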

   The above description applies to page-cache-based systems as well as
   buffer-cache-based systems.  In page-cache-based systems, the virtual
   memory system will need to be modified instead of the buffer cache.

16.4.  Operation 6: CREATE - Create a Non-regular File Object

16.4.1.  SYNOPSIS

     (cfh), name, type, attrs -> (cfh), cinfo, attrset

16.4.2.  ARGUMENT

   union createtype4 switch (nfs_ftype4 type) {
    case NF4LNK:
            linktext4 linkdata;
    case NF4BLK:
    case NF4CHR:
            specdata4 devdata;
    case NF4SOCK:
    case NF4FIFO:
    case NF4DIR:
            void;
    default:
            void;  /* server should return NFS4ERR_BADTYPE */
   };

   struct CREATE4args {
           /* CURRENT_FH: directory for creation */
           createtype4     objtype;
           component4      objname;
           fattr4          createattrs;
   };

16.4.3.  RESULT

   struct CREATE4resok {
           change_info4    cinfo;
           bitmap4         attrset;        /* attributes set */
   };

   union CREATE4res switch (nfsstat4 status) {
    case NFS4_OK:
            CREATE4resok resok4;
    default:
            void;
   };

16.4.4.  DESCRIPTION

   The CREATE operation creates a non-regular file object in a directory
   with a given name.  The OPEN operation is used to create a regular
   file.

   The objname specifies the name for the new object.  The objtype
   determines the type of object to be created: directory, symlink, etc.

   If an object of the same name already exists in the directory, the
   server will return the error NFS4ERR_EXIST.

   For the directory where the new file object was created, the server
   returns change_info4 information in cinfo.  With the atomic field of
   the change_info4 struct, the server will indicate if the before and
   after change attributes were obtained atomically with respect to the
   file object creation.

   If the objname is of zero length, NFS4ERR_INVAL will be returned.
   The objname is also subject to the normal UTF-8, character support,
   and name checks.  See Section 12.7 for further discussion.

   The current filehandle is replaced by that of the new object.
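
   The name and type checks above can be sketched with a toy in-memory
   directory in Python.  Only the statuses named in this section are
   modeled, plus NFS4ERR_BADTYPE (10007) for types that fall into the
   default arm of createtype4, such as a regular file; UTF-8 and name
   validation are omitted.  The class, the filehandle format, and the
   string type names are hypothetical.

```python
from typing import Dict, Optional, Tuple

NFS4_OK = 0
NFS4ERR_EXIST = 17
NFS4ERR_INVAL = 22
NFS4ERR_BADTYPE = 10007

# Object types with an arm in createtype4 (NF4REG, for example, is absent).
CREATE_TYPES = {"NF4LNK", "NF4BLK", "NF4CHR", "NF4SOCK", "NF4FIFO", "NF4DIR"}


class ToyDirectory:
    """In-memory directory: name -> (object type, filehandle)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[str, str]] = {}
        self._next_id = 0

    def create(self, objname: str, objtype: str) -> Tuple[int, Optional[str]]:
        """Return (status, filehandle of the new object or None on error)."""
        if objtype not in CREATE_TYPES:
            return NFS4ERR_BADTYPE, None     # e.g. NF4REG: use OPEN instead
        if not objname:
            return NFS4ERR_INVAL, None       # zero-length name
        if objname in self.entries:
            return NFS4ERR_EXIST, None
        self._next_id += 1
        fh = f"fh-{self._next_id}"
        self.entries[objname] = (objtype, fh)
        return NFS4_OK, fh                   # becomes the current filehandle
```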

   The createattrs field specifies the initial set of attributes for the
   object.  The set of attributes may include any writable attribute
   valid for the object type.  When the operation is successful, the
   server will return to the client an attribute mask signifying which
   attributes were successfully set for the object.

   If createattrs includes neither the owner attribute nor an ACL with
   an ACE for the owner, and if the server's file system both supports
   and requires an owner attribute (or an owner ACE), then the server
   MUST derive the owner (or the owner ACE).  This would typically be
   from the principal indicated in the RPC credentials of the call, but
   the server's operating environment or file system semantics may
   dictate other methods of derivation.  Similarly, if createattrs
   includes neither the group attribute nor a group ACE, and if the
   server's file system both supports and requires the notion of a group
   attribute (or group ACE), the server MUST derive the group attribute
   (or the corresponding group ACE) for the file.  This could be from
   the RPC's credentials, such as the group principal if the credentials
   include it (such as with AUTH_SYS), from the group identifier
   associated with the principal in the credentials (e.g., POSIX systems
   have a user database [getpwnam] that has the group identifier for
   every user identifier), inherited from the directory the object is
   created in, or whatever else the server's operating environment
   or file system semantics dictate.  This applies to the OPEN
   operation too.
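
   One possible derivation policy, sketched in Python.  The attribute
   names "owner" and "owner_group" are the NFSv4 attribute names, but
   the fallback order used here (explicit attribute, then a group from
   the credentials, then the parent directory's group) is only an
   example of what a server's operating environment might dictate; ACE
   handling is omitted and the function name is hypothetical.

```python
from typing import Dict, Optional


def derive_owner_group(createattrs: Dict[str, str], cred_principal: str,
                       cred_group: Optional[str],
                       parent_group: str) -> Dict[str, str]:
    """Fill in owner and owner_group when the client did not supply them."""
    # Owner: explicit attribute, else the principal in the RPC credentials.
    owner = createattrs.get("owner") or cred_principal
    # Group: explicit attribute, else a group carried in (or derived from)
    # the credentials, else inherited from the parent directory.
    group = createattrs.get("owner_group") or cred_group or parent_group
    return {"owner": owner, "owner_group": group}
```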

   Conversely, it is possible the client will specify in createattrs an
   owner attribute, group attribute, or ACL that the principal indicated
   in the RPC's credentials does not have permission to create files for.
   The error to be returned in this instance is NFS4ERR_PERM.  This
   applies to the OPEN operation too.

16.4.5.  IMPLEMENTATION

   If the client desires to set attribute values after the create, a
   SETATTR operation can be added to the COMPOUND request so that the
   appropriate attributes will be set.
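   A COMPOUND of this kind might be assembled as in the following
   sketch.  It models only the order of operations, using the operation
   numbers from nfs_opnum4; the helper `build_create_with_setattr` is
   hypothetical, and the trailing GETFH is an optional addition for
   clients that want the new object's filehandle.

```c
#include <stddef.h>

/* Subset of the nfs_opnum4 operation numbers used in this sketch. */
enum nfs_opnum4 {
    OP_CREATE  = 6,
    OP_GETFH   = 10,
    OP_PUTFH   = 22,
    OP_SETATTR = 34
};

/*
 * Fill ops[] with the operations of a COMPOUND that creates an object
 * and then sets further attributes on it. Returns the number of
 * operations written, or 0 if ops[] is too small.
 */
size_t build_create_with_setattr(enum nfs_opnum4 ops[], size_t cap)
{
    static const enum nfs_opnum4 seq[] = {
        OP_PUTFH,    /* current FH := parent directory                   */
        OP_CREATE,   /* current FH := new object; createattrs applied    */
        OP_SETATTR,  /* applies to the new object, since CREATE replaced */
                     /* the current filehandle                           */
        OP_GETFH     /* optional: return the new object's filehandle     */
    };
    size_t n = sizeof seq / sizeof seq[0];
    if (cap < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        ops[i] = seq[i];
    return n;
}
```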

16.5.  Operation 7: DELEGPURGE - Purge Delegations Awaiting Recovery

16.5.1.  SYNOPSIS

     clientid ->

16.5.2.  ARGUMENT

   struct DELEGPURGE4args {
           clientid4       clientid;
   };

16.5.3.  RESULT

   struct DELEGPURGE4res {
           nfsstat4        status;
   };

16.5.4.  DESCRIPTION

   DELEGPURGE purges all of the delegations awaiting recovery for a
   given client.  This is useful for clients that do not commit
   delegation information to stable storage, to indicate that
   conflicting requests need not be delayed by the server awaiting
   recovery of delegation information.

   This operation is provided to support clients that record delegation
   information in stable storage on the client.  In this case,
   DELEGPURGE should be issued immediately after doing delegation
   recovery (using CLAIM_DELEGATE_PREV) on all delegations known to the
   client.  Doing so will notify the server that no additional
   delegations for the client will be recovered, allowing it to free
   resources and avoid delaying other clients who make requests that
   conflict with the unrecovered delegations.  All clients SHOULD use
   DELEGPURGE as part of recovery once it is known that no further
   CLAIM_DELEGATE_PREV recovery will be done.  This includes clients
   that do not record delegation information in stable storage, who
   would then do a DELEGPURGE immediately after SETCLIENTID_CONFIRM.
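   The recovery ordering described above can be sketched as a
   client-side plan.  This sketch assumes a hypothetical
   `recovery_step` enumeration and helper `plan_delegation_recovery`;
   each reclaim of a stored delegation (an OPEN with
   CLAIM_DELEGATE_PREV) is modeled as a single step, and error handling
   is omitted.

```c
#include <stddef.h>

/* Hypothetical step kinds for a client's post-restart recovery plan. */
enum recovery_step {
    STEP_SETCLIENTID,
    STEP_SETCLIENTID_CONFIRM,
    STEP_RECLAIM_DELEG_PREV,   /* OPEN with CLAIM_DELEGATE_PREV, one per stored delegation */
    STEP_DELEGPURGE
};

/*
 * Build the recovery plan for a client that has n_stored delegations
 * recorded in stable storage (0 for a client that records none).
 * DELEGPURGE is always last: it tells the server that no further
 * CLAIM_DELEGATE_PREV recovery will be done.
 * Returns the number of steps written, or 0 if plan[] is too small.
 */
size_t plan_delegation_recovery(enum recovery_step plan[], size_t cap,
                                size_t n_stored)
{
    size_t need = 3 + n_stored;
    if (cap < need)
        return 0;
    size_t i = 0;
    plan[i++] = STEP_SETCLIENTID;
    plan[i++] = STEP_SETCLIENTID_CONFIRM;
    for (size_t d = 0; d < n_stored; d++)
        plan[i++] = STEP_RECLAIM_DELEG_PREV;
    plan[i++] = STEP_DELEGPURGE;
    return i;
}
```

   For a client that records no delegations (n_stored of 0), the plan
   reduces to SETCLIENTID, SETCLIENTID_CONFIRM, and DELEGPURGE.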

   The set of delegations known to the server and the client may be
   different.  The reasons for this include:

   o  A client may fail after making a request that resulted in
      delegation but before it received the results and committed them
      to the client's stable storage.

   o  A client may fail after deleting its indication that a delegation
      exists but before the delegation return is fully processed by the
      server.

   o  In the case in which the server and the client restart, the server
      may have limited persistent recording of delegations to a subset
      of those in existence.

   o  A client may have only persistently recorded information about a
      subset of delegations.

   The server MAY support DELEGPURGE, but its support or non-support
   should match that of CLAIM_DELEGATE_PREV:

   o  A server may support both DELEGPURGE and CLAIM_DELEGATE_PREV.

   o  A server may support neither DELEGPURGE nor CLAIM_DELEGATE_PREV.

   This fact allows a client starting up to determine if the server is
   prepared to support persistent storage of delegation information and
   thus whether it may use write-back caching to local persistent
   storage, relying on CLAIM_DELEGATE_PREV recovery to allow such
   changed data to be flushed safely to the server in the event of
   client restart.

16.6.  Operation 8: DELEGRETURN - Return Delegation

16.6.1.  SYNOPSIS

     (cfh), stateid ->

16.6.2.  ARGUMENT

   struct DELEGRETURN4args {
           /* CURRENT_FH: delegated file */
           stateid4        deleg_stateid;
   };

16.6.3.  RESULT

   struct DELEGRETURN4res {
           nfsstat4        status;
   };

16.6.4.  DESCRIPTION

   DELEGRETURN returns the delegation represented by the current
   filehandle and stateid.

   Delegations may be returned when recalled or voluntarily (i.e.,
   before the server has recalled them).  In either case, the client
   must properly propagate state changed under the context of the
   delegation to the server before returning the delegation.
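   One way a client might enforce this rule is to pick its next action
   from the state still pending for the delegation, as in the sketch
   below.  The `deleg_state` record and `next_return_action` function
   are hypothetical; the sketch assumes the pending state consists of
   dirty cached data and byte-range locks granted locally (see Section
   16.10.5), and the choice to send WRITEs before LOCKs is arbitrary.

```c
#include <stddef.h>

/* Hypothetical client-side bookkeeping for one delegated file. */
struct deleg_state {
    size_t dirty_bytes;         /* data modified locally, not yet written to the server */
    size_t local_locks_unsent;  /* byte-range locks granted locally, not yet sent via LOCK */
};

/* Next operation the client should send for this delegation. */
enum deleg_action { SEND_WRITE, SEND_LOCK, SEND_DELEGRETURN };

enum deleg_action next_return_action(const struct deleg_state *s)
{
    if (s->dirty_bytes > 0)
        return SEND_WRITE;         /* flush modified data */
    if (s->local_locks_unsent > 0)
        return SEND_LOCK;          /* re-establish locally granted locks */
    return SEND_DELEGRETURN;       /* nothing left to propagate */
}
```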

16.7.  Operation 9: GETATTR - Get Attributes

16.7.1.  SYNOPSIS

     (cfh), attrbits -> attrbits, attrvals

16.7.2.  ARGUMENT

   struct GETATTR4args {
           /* CURRENT_FH: directory or file */
           bitmap4         attr_request;
   };

16.7.3.  RESULT

   struct GETATTR4resok {
           fattr4          obj_attributes;
   };

   union GETATTR4res switch (nfsstat4 status) {
    case NFS4_OK:
            GETATTR4resok  resok4;
    default:
            void;
   };

16.7.4.  DESCRIPTION

   The GETATTR operation will obtain attributes for the file system
   object specified by the current filehandle.  The client sets a bit in
   the bitmap argument for each attribute value that it would like the
   server to return.  The server returns an attribute bitmap that
   indicates the attribute values for which it was able to return
   values, followed by the attribute values ordered lowest attribute
   number first.

   The server MUST return a value for each attribute that the client
   requests if the attribute is supported by the server.  If the server
   does not support an attribute or cannot approximate a useful value,
   then it MUST NOT return the attribute value and MUST NOT set the
   attribute bit in the result bitmap.  The server MUST return an error
   if it supports an attribute on the target but cannot obtain its
   value.  In that case, no attribute values will be returned.
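   The bitmap rules above can be sketched as follows, assuming the
   bitmap4 layout in which attribute n occupies bit (n % 32) of word
   (n / 32).  The function `getattr_result_bitmap` is hypothetical; it
   intersects the requested bitmap with a set of supported attributes
   and lists the attribute numbers in the order their values would be
   encoded.  It does not model the error case in which a supported
   attribute's value cannot be obtained.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the GETATTR result bitmap: a bit is set only if the attribute
 * was requested AND is supported. Also record attribute numbers in
 * encoding order (lowest attribute number first). Returns the total
 * number of attributes to encode; at most order_cap are stored.
 */
size_t getattr_result_bitmap(const uint32_t *requested,
                             const uint32_t *supported,
                             uint32_t *result, size_t nwords,
                             unsigned *order, size_t order_cap)
{
    size_t count = 0;
    for (size_t w = 0; w < nwords; w++) {
        result[w] = requested[w] & supported[w];
        for (unsigned b = 0; b < 32; b++) {
            if ((result[w] >> b) & 1u) {
                if (count < order_cap)
                    order[count] = (unsigned)(w * 32 + b);
                count++;
            }
        }
    }
    return count;
}
```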

   File systems that are absent should be treated as having support for
   a very small set of attributes as described in Section 8.3.1 -- even
   if previously, when the file system was present, more attributes were
   supported.

   All servers MUST support the REQUIRED attributes, as specified in
   Section 5, for all file systems, with the exception of absent file
   systems.

   On success, the current filehandle retains its value.

16.7.5.  IMPLEMENTATION

   Suppose there is an OPEN_DELEGATE_WRITE delegation held by another
   client for the file in question, and size and/or change are among the
   set of attributes being interrogated.  The server has two choices.
   First, the server can obtain the actual current value of these
   attributes from the client holding the delegation by using the
   CB_GETATTR callback.  Second, the server, particularly when the
   delegated client is unresponsive, can recall the delegation in
   question.  The GETATTR MUST NOT proceed until one of the following
   occurs:

   o  The requested attribute values are returned in the response to
      CB_GETATTR.

   o  The OPEN_DELEGATE_WRITE delegation is returned.

   o  The OPEN_DELEGATE_WRITE delegation is revoked.

   Unless one of the above happens very quickly, one or more
   NFS4ERR_DELAY errors will be returned while a delegation is
   outstanding.
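   The server's decision can be sketched as follows.  The `write_deleg`
   summary and `getattr_check_write_deleg` function are hypothetical;
   the CB_GETATTR callback and the recall themselves are not modeled,
   only the rule that GETATTR waits (replying NFS4ERR_DELAY) while size
   or change is requested and none of the three listed events has
   occurred.  The attribute numbers for change (3) and size (4) follow
   the specification's attribute table.

```c
#include <stdbool.h>
#include <stdint.h>

#define FATTR4_CHANGE 3
#define FATTR4_SIZE   4

/* Hypothetical summary of an OPEN_DELEGATE_WRITE delegation on the file. */
struct write_deleg {
    bool held_by_other_client;
    bool cb_getattr_replied;   /* CB_GETATTR returned the requested values */
    bool returned_or_revoked;  /* delegation returned or revoked */
};

enum getattr_decision { GETATTR_PROCEED, GETATTR_DELAY };

/* word0 is the first word of the GETATTR attr_request bitmap. */
enum getattr_decision getattr_check_write_deleg(uint32_t word0,
                                                const struct write_deleg *d)
{
    uint32_t sensitive = (1u << FATTR4_CHANGE) | (1u << FATTR4_SIZE);
    if (!d->held_by_other_client || (word0 & sensitive) == 0)
        return GETATTR_PROCEED;
    if (d->cb_getattr_replied || d->returned_or_revoked)
        return GETATTR_PROCEED;
    return GETATTR_DELAY;   /* reply NFS4ERR_DELAY; the client retries */
}
```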

16.8.  Operation 10: GETFH - Get Current Filehandle

16.8.1.  SYNOPSIS

     (cfh) -> filehandle

16.8.2.  ARGUMENT

     /* CURRENT_FH: */
     void;

16.8.3.  RESULT

   struct GETFH4resok {
           nfs_fh4         object;
   };

   union GETFH4res switch (nfsstat4 status) {
    case NFS4_OK:
            GETFH4resok     resok4;
    default:
            void;
   };

16.8.4.  DESCRIPTION

   This operation returns the current filehandle value.

   On success, the current filehandle retains its value.

16.8.5.  IMPLEMENTATION

   Operations that change the current filehandle, like LOOKUP or CREATE,
   do not automatically return the new filehandle as a result.  For
   instance, if a client needs to look up a directory entry and obtain
   its filehandle, then the following request is needed.

     PUTFH  (directory filehandle)
     LOOKUP (entry name)
     GETFH
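   The current-filehandle behavior of this sequence can be modeled with
   a toy sketch.  Filehandles are plain integers, the namespace is a
   hypothetical lookup table, and `op_putfh`, `op_lookup`, and
   `op_getfh` are illustrative names, not a real server API.  The point
   shown is that LOOKUP changes the current filehandle without
   returning it, so GETFH is needed to read it back.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status values for this sketch. */
enum fh_status { FH_OK, FH_ERR_NOFILEHANDLE, FH_ERR_NOENT };

/* Toy namespace: (directory fh, name) -> object fh. */
struct name_entry { uint64_t dir; const char *name; uint64_t obj; };

/* Per-COMPOUND state: the current filehandle, if any. */
struct compound_state { int has_cfh; uint64_t cfh; };

void op_putfh(struct compound_state *s, uint64_t fh)
{
    s->cfh = fh;
    s->has_cfh = 1;
}

/* LOOKUP replaces the current filehandle but does not return it. */
enum fh_status op_lookup(struct compound_state *s,
                         const struct name_entry *ns, size_t n,
                         const char *name)
{
    if (!s->has_cfh)
        return FH_ERR_NOFILEHANDLE;
    for (size_t i = 0; i < n; i++) {
        if (ns[i].dir == s->cfh && strcmp(ns[i].name, name) == 0) {
            s->cfh = ns[i].obj;
            return FH_OK;
        }
    }
    return FH_ERR_NOENT;
}

/* GETFH returns the current filehandle and leaves it unchanged. */
enum fh_status op_getfh(const struct compound_state *s, uint64_t *out)
{
    if (!s->has_cfh)
        return FH_ERR_NOFILEHANDLE;
    *out = s->cfh;
    return FH_OK;
}
```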

16.9.  Operation 11: LINK - Create Link to a File

16.9.1.  SYNOPSIS

     (sfh), (cfh), newname -> (cfh), cinfo

16.9.2.  ARGUMENT

   struct LINK4args {
           /* SAVED_FH: source object */
           /* CURRENT_FH: target directory */
           component4      newname;
   };

16.9.3.  RESULT

   struct LINK4resok {
           change_info4    cinfo;
   };

   union LINK4res switch (nfsstat4 status) {
    case NFS4_OK:
            LINK4resok resok4;
    default:
            void;
   };

16.9.4.  DESCRIPTION

   The LINK operation creates an additional name, newname, for the file
   represented by the saved filehandle, as set by the SAVEFH operation,
   in the directory represented by the current filehandle.  The existing
   file and the target directory must reside within the same file system
   on the server.  On success, the current filehandle will continue to
   be the target directory.  If an object exists in the target directory
   with the same name as newname, the server must return NFS4ERR_EXIST.

   For the target directory, the server returns change_info4 information
   in cinfo.  With the atomic field of the change_info4 struct, the
   server will indicate if the before and after change attributes were
   obtained atomically with respect to the link creation.
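   A client might use cinfo to keep a cached view of the target
   directory, as in the sketch below.  It assumes a hypothetical
   `dir_cache` record holding the change attribute value the cache
   reflects; the policy (adopt `after` only when the values are atomic
   and `before` matches the cached value, otherwise invalidate) is an
   illustration, not a requirement stated here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t changeid4;

struct change_info4 {
    bool      atomic;
    changeid4 before;
    changeid4 after;
};

/* Hypothetical client-side cache state for one directory. */
struct dir_cache {
    bool      valid;
    changeid4 change;   /* directory change attribute the cache reflects */
};

/*
 * After a successful LINK: if the server reports atomic values and
 * "before" matches the cache, no other change intervened, so the client
 * can apply its own modification and adopt "after". Otherwise the cache
 * is invalidated. Returns true if the cache remains valid.
 */
bool dir_cache_apply_cinfo(struct dir_cache *c, const struct change_info4 *ci)
{
    if (c->valid && ci->atomic && ci->before == c->change) {
        c->change = ci->after;
        return true;
    }
    c->valid = false;
    return false;
}
```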

   If newname has a length of 0 (zero), or if newname does not obey the
   UTF-8 definition, the error NFS4ERR_INVAL will be returned.

16.9.5.  IMPLEMENTATION

   Changes to any property of the "hard" linked files are reflected in
   all of the linked files.  When a link is made to a file, the
   attributes for the file should have a value for numlinks that is one
   greater than the value before the LINK operation.

   The statement "file and the target directory must reside within the
   same file system on the server" means that the fsid fields in the
   attributes for the objects are the same.  If they reside on different
   file systems, the error NFS4ERR_XDEV is returned.  This error may be
   returned by some servers when there is an internal partitioning of a
   file system that the LINK operation would violate.

   On some servers, "." and ".." are illegal values for newname, and the
   error NFS4ERR_BADNAME will be returned if they are specified.

   When the current filehandle designates a named attribute directory
   and the object to be linked (the saved filehandle) is not a named
   attribute for the same object, the error NFS4ERR_XDEV MUST be
   returned.  When the saved filehandle designates a named attribute and
   the current filehandle is not the appropriate named attribute
   directory, the error NFS4ERR_XDEV MUST also be returned.

   When the current filehandle designates a named attribute directory
   and the object to be linked (the saved filehandle) is a named
   attribute within that directory, the server MAY return the error
   NFS4ERR_NOTSUPP.

   In the case that newname is already linked to the file represented by
   the saved filehandle, the server will return NFS4ERR_EXIST.
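   Several of the error rules above can be combined into a server-side
   precheck, sketched below.  The status enumeration, `link_req` record,
   and `link_precheck` function are hypothetical, and the order in which
   the checks are applied is an assumption.  The UTF-8 validation and
   the named-attribute rules are omitted; `reject_dot_names` models
   servers on which "." and ".." are illegal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the nfsstat4 values named in the text. */
enum link_status { LINK_OK, LINK_ERR_INVAL, LINK_ERR_BADNAME,
                   LINK_ERR_XDEV, LINK_ERR_EXIST };

struct fsid4 { uint64_t major; uint64_t minor; };

/* Hypothetical summary of the objects involved in a LINK. */
struct link_req {
    struct fsid4 src_fsid;     /* fsid of the saved filehandle's object */
    struct fsid4 dir_fsid;     /* fsid of the target directory (current FH) */
    const char  *newname;
    size_t       newname_len;
    bool         name_exists;  /* newname already present in the directory */
};

enum link_status link_precheck(const struct link_req *r, bool reject_dot_names)
{
    if (r->newname_len == 0)
        return LINK_ERR_INVAL;
    if (reject_dot_names &&
        ((r->newname_len == 1 && r->newname[0] == '.') ||
         (r->newname_len == 2 && r->newname[0] == '.' && r->newname[1] == '.')))
        return LINK_ERR_BADNAME;
    if (r->src_fsid.major != r->dir_fsid.major ||
        r->src_fsid.minor != r->dir_fsid.minor)
        return LINK_ERR_XDEV;      /* different file systems */
    if (r->name_exists)
        return LINK_ERR_EXIST;     /* includes newname already linked to the file */
    return LINK_OK;
}
```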

   Note that symbolic links are created with the CREATE operation.

16.10.  Operation 12: LOCK - Create Lock

16.10.1.  SYNOPSIS

     (cfh) locktype, reclaim, offset, length, locker -> stateid

16.10.2.  ARGUMENT

   enum nfs_lock_type4 {
           READ_LT         = 1,
           WRITE_LT        = 2,
           READW_LT        = 3,    /* blocking read */
           WRITEW_LT       = 4     /* blocking write */
   };

   /*
    * For LOCK, transition from open_owner to new lock_owner
    */
   struct open_to_lock_owner4 {
           seqid4          open_seqid;
           stateid4        open_stateid;
           seqid4          lock_seqid;
           lock_owner4     lock_owner;
   };

   /*
    * For LOCK, existing lock_owner continues to request file locks
    */
   struct exist_lock_owner4 {
           stateid4        lock_stateid;
           seqid4          lock_seqid;
   };

   union locker4 switch (bool new_lock_owner) {
    case TRUE:
            open_to_lock_owner4     open_owner;
    case FALSE:
            exist_lock_owner4       lock_owner;
   };

   /*
    * LOCK/LOCKT/LOCKU: Record lock management
    */
   struct LOCK4args {
           /* CURRENT_FH: file */
           nfs_lock_type4  locktype;
           bool            reclaim;
           offset4         offset;
           length4         length;
           locker4         locker;
   };

16.10.3.  RESULT

   struct LOCK4denied {
           offset4         offset;
           length4         length;
           nfs_lock_type4  locktype;
           lock_owner4     owner;
   };

   struct LOCK4resok {
           stateid4        lock_stateid;
   };

   union LOCK4res switch (nfsstat4 status) {
    case NFS4_OK:
            LOCK4resok     resok4;
    case NFS4ERR_DENIED:
            LOCK4denied    denied;
    default:
            void;
   };

16.10.4.  DESCRIPTION

   The LOCK operation requests a byte-range lock for the byte range
   specified by the offset and length parameters.  The lock type is also
   specified to be one of the nfs_lock_type4s.  If this is a reclaim
   request, the reclaim parameter will be TRUE.

   Bytes in a file may be locked even if those bytes are not currently
   allocated to the file.  To lock the file from a specific offset
   through the end-of-file (no matter how long the file actually is),
   use a length field with all bits set to 1 (one).  If the length is
   zero, or if a length that is not all bits set to one is specified and
   the length when added to the offset exceeds the maximum 64-bit
   unsigned integer value, the error NFS4ERR_INVAL will result.

   32-bit servers are servers that support locking for byte offsets that
   fit within 32 bits (i.e., less than or equal to NFS4_UINT32_MAX).  If
   the client specifies a range that overlaps one or more bytes beyond
   offset NFS4_UINT32_MAX but does not end at offset NFS4_UINT64_MAX,
   then such a 32-bit server MUST return the error NFS4ERR_BAD_RANGE.
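   The range rules for LOCK can be sketched as a single check.  The
   function `check_lock_range` and its status enumeration are
   hypothetical stand-ins for the NFS4ERR_INVAL and NFS4ERR_BAD_RANGE
   results; the constants match NFS4_UINT32_MAX and NFS4_UINT64_MAX.
   The range's last byte is computed without overflow, and an all-ones
   length is treated as extending to NFS4_UINT64_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

#define NFS4_UINT32_MAX 0xffffffffU
#define NFS4_UINT64_MAX 0xffffffffffffffffULL

/* Hypothetical stand-ins for the nfsstat4 values named in the text. */
enum range_status { RANGE_OK, RANGE_ERR_INVAL, RANGE_ERR_BAD_RANGE };

enum range_status check_lock_range(uint64_t offset, uint64_t length,
                                   bool server_is_32bit)
{
    uint64_t last;  /* last byte covered by the range */

    if (length == 0)
        return RANGE_ERR_INVAL;
    if (length == NFS4_UINT64_MAX) {
        last = NFS4_UINT64_MAX;             /* lock through end-of-file */
    } else {
        if (length > NFS4_UINT64_MAX - offset)
            return RANGE_ERR_INVAL;         /* offset + length exceeds 64 bits */
        last = offset + length - 1;
    }
    if (server_is_32bit && last > NFS4_UINT32_MAX && last != NFS4_UINT64_MAX)
        return RANGE_ERR_BAD_RANGE;
    return RANGE_OK;
}
```

   For example, on a 32-bit server, offset NFS4_UINT32_MAX with length
   2 yields the NFS4ERR_BAD_RANGE stand-in, while offset 0 with an
   all-ones length is accepted.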

   If the lock is denied, the owner, offset, length, and lock type of a
   conflicting lock are returned in LOCK4denied.

   On success, the current filehandle retains its value.

16.10.5.  IMPLEMENTATION

   If the server is unable to determine the exact offset and length of
   the conflicting lock, the same offset and length that were provided
   in the arguments should be returned in the denied results.  Section 9
   contains a full description of this and the other file locking
   operations.

   LOCK operations are subject to permission checks and to checks
   against the access type of the associated file.  However, the
   specific rights and modes required for various types of locks
   reflect the semantics of the server-exported file system, and are not
   specified by the protocol.  For example, Windows 2000 allows a write
   lock of a file open for READ, while a POSIX-compliant system
   does not.

   When the client makes a lock request that corresponds to a range that
   the lock-owner has locked already (with the same or different lock
   type), or to a sub-region of such a range, or to a region that
   includes multiple locks already granted to that lock-owner, in whole
   or in part, and the server does not support such locking operations
   (i.e., does not support POSIX locking semantics), the server will
   return the error NFS4ERR_LOCK_RANGE.  In that case, the client may
   return an error, or it may emulate the required operations, using
   only LOCK for ranges that do not include any bytes already locked by
   that lock-owner and LOCKU of locks held by that lock-owner
   (specifying an exactly matching range and type).  Similarly, when the
   client makes a lock request that amounts to upgrading (changing from
   a read lock to a write lock) or downgrading (changing from a write
   lock to a read lock) an existing record lock and the server does not
   support such a lock, the server will return NFS4ERR_LOCK_NOTSUPP.
   Such operations may not perfectly reflect the required semantics in
   the face of conflicting lock requests from other clients.
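   The LOCK half of this client-side emulation can be sketched as
   computing the parts of a request not already covered by the
   lock-owner's own locks; each such gap could then be sent as a
   separate LOCK.  The `byte_range` type and `unlocked_gaps` function
   are hypothetical, ranges are inclusive, and the lock-owner's
   existing locks are assumed sorted and non-overlapping.  Type changes
   (which would need LOCKU of an exactly matching range) are not
   modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive byte range [first, last]. */
struct byte_range { uint64_t first; uint64_t last; };

/*
 * held[] holds the lock-owner's existing locks, sorted by first byte and
 * non-overlapping. Store in gaps[] the sub-ranges of req that contain no
 * byte already locked by that lock-owner. Returns the number of gaps
 * found (at most gap_cap are stored).
 */
size_t unlocked_gaps(const struct byte_range *held, size_t nheld,
                     struct byte_range req,
                     struct byte_range *gaps, size_t gap_cap)
{
    size_t ngaps = 0;
    uint64_t cursor = req.first;   /* first byte not yet examined */

    for (size_t i = 0; i < nheld; i++) {
        if (held[i].last < cursor)
            continue;                  /* entirely before the remaining range */
        if (held[i].first > req.last)
            break;                     /* entirely after the request */
        if (held[i].first > cursor) {
            if (ngaps < gap_cap)
                gaps[ngaps] = (struct byte_range){ cursor, held[i].first - 1 };
            ngaps++;
        }
        if (held[i].last >= req.last)
            return ngaps;              /* rest of the request is already locked */
        cursor = held[i].last + 1;
    }
    if (ngaps < gap_cap)
        gaps[ngaps] = (struct byte_range){ cursor, req.last };
    return ngaps + 1;
}
```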

   When a client holds an OPEN_DELEGATE_WRITE delegation, the client
   holding that delegation is assured that there are no opens by other
   clients.  Thus, there can be no conflicting LOCK operations from such
   clients.  Therefore, the client may be handling locking requests
   locally, without doing LOCK operations on the server.  If it does
   that, it must be prepared to update the lock status on the server by
   sending appropriate LOCK and LOCKU operations before returning the
   delegation.

   When one or more clients hold OPEN_DELEGATE_READ delegations, any
   LOCK operation where the server is implementing mandatory locking
   semantics MUST result in the recall of all such delegations.  The
   LOCK operation may not be granted until all such delegations are
   returned or revoked.  Except where this happens very quickly, one or
   more NFS4ERR_DELAY errors will be returned to requests made while the
   delegation remains outstanding.

   The locker argument specifies the lock-owner that is associated with
   the LOCK request.  The locker4 structure is a switched union that
   indicates whether the client has already created byte-range locking
   state associated with the current open file and lock-owner.  There
   are multiple cases to be considered, corresponding to possible
   combinations of whether locking state has been created for the
   current open file and lock-owner, and whether the boolean
   new_lock_owner is set.  In all of the cases, there is a lock_seqid
   specified, whether the lock-owner is specified explicitly or
   implicitly.  This seqid value is used for checking lock-owner
   sequencing/replay issues.  When the given lock-owner is not known to
   the server, this establishes an initial sequence value for the new
   lock-owner.

   o  In the case in which the state has been created and the boolean is
      false, the only part of the argument other than lock_seqid is a
      stateid representing the set of locks associated with that open
      file and lock-owner.

   o  In the case in which the state has been created and the boolean is
      true, the server rejects the request with the error
      NFS4ERR_BAD_SEQID.  The only exception is where there is a
      retransmission of a previous request in which the boolean was
      true.  In this case, the lock_seqid will match the original
      request, and the response will reflect the final case, below.

   o  In the case where no byte-range locking state has been established
      and the boolean is true, the argument contains an
      open_to_lock_owner structure that specifies the stateid of the
      open file and the lock-owner to be used for the lock.  Note that
      although the open-owner is not given explicitly, the open_seqid
      associated with it is used to check for open-owner sequencing
      issues.  This case provides a method to use the established state
      of the open_stateid to transition to the use of a lock stateid.
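   The cases above can be summarized as a dispatch function.  The
   `locker_action` enumeration and `classify_locker` function are
   hypothetical, and detecting a retransmission is assumed to have been
   done already (by comparing lock_seqid with the original request).
   The combination of no existing state and new_lock_owner set to FALSE
   is not among the cases listed, so the sketch reports it separately
   rather than guessing a result.

```c
#include <stdbool.h>

/* How a server might dispatch the locker4 cases described above. */
enum locker_action {
    USE_LOCK_STATEID,     /* state exists, new_lock_owner == FALSE        */
    REJECT_BAD_SEQID,     /* state exists, new_lock_owner == TRUE         */
    REPLAY_CACHED_REPLY,  /* retransmission of the original TRUE request  */
    CREATE_FROM_OPEN,     /* no state yet, new_lock_owner == TRUE         */
    NOT_COVERED           /* no state yet, new_lock_owner == FALSE        */
};

enum locker_action classify_locker(bool new_lock_owner,
                                   bool lock_state_exists,
                                   bool is_retransmission)
{
    if (lock_state_exists) {
        if (!new_lock_owner)
            return USE_LOCK_STATEID;
        /* A replay gets the response of the final (open_to_lock_owner) case. */
        return is_retransmission ? REPLAY_CACHED_REPLY : REJECT_BAD_SEQID;
    }
    return new_lock_owner ? CREATE_FROM_OPEN : NOT_COVERED;
}
```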

16.11.  Operation 13: LOCKT - Test for Lock

16.11.1.  SYNOPSIS

     (cfh) locktype, offset, length, owner -> {void, NFS4ERR_DENIED ->
     owner}

16.11.2.  ARGUMENT

   struct LOCKT4args {
           /* CURRENT_FH: file */
           nfs_lock_type4  locktype;
           offset4         offset;
           length4         length;
           lock_owner4     owner;
   };

16.11.3.  RESULT

   union LOCKT4res switch (nfsstat4 status) {
    case NFS4ERR_DENIED:
            LOCK4denied    denied;
    case NFS4_OK:
            void;
    default:
            void;
   };

16.11.4.  DESCRIPTION

   The LOCKT operation tests the lock as specified in the arguments.  If
   a conflicting lock exists, the owner, offset, length, and type of the
   conflicting lock are returned; if there is no conflicting lock,
   nothing other than NFS4_OK is returned.  Lock types READ_LT and
   READW_LT are processed in the same way, in that a conflicting lock
   test is done without regard to blocking or non-blocking.  The same is
   true for WRITE_LT and WRITEW_LT.

   The ranges are specified as for LOCK.  The NFS4ERR_INVAL and
   NFS4ERR_BAD_RANGE errors are returned under the same circumstances as
   for LOCK.

   On success, the current filehandle retains its value.

16.11.5.  IMPLEMENTATION

   If the server is unable to determine the exact offset and length of
   the conflicting lock, the same offset and length that were provided
   in the arguments should be returned in the denied results.  Section 9
   contains further discussion of the file locking mechanisms.

   LOCKT uses a lock_owner4, rather than a stateid4 as is used in LOCK,
   to identify the owner.  This is because the client does not have to
   open the file to test for the existence of a lock, so a stateid may
   not be available.

   The test for conflicting locks SHOULD exclude locks for the current
   lock-owner.  Note that since such locks are not examined, the possible
   existence of overlapping ranges may not affect the results of LOCKT.
   If the server does examine locks that match the lock-owner for the
   purpose of range checking, NFS4ERR_LOCK_RANGE may be returned.  In
   the event that it returns NFS4_OK, clients may do a LOCK and receive
   NFS4ERR_LOCK_RANGE on the LOCK request because of the flexibility
   provided to the server.
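   A LOCKT conflict test consistent with the rules above might look like
   the following sketch.  The `held_lock` record and `lockt_conflicts`
   function are hypothetical, with an integer `owner_id` standing in for
   lock_owner4.  It treats READW_LT like READ_LT and WRITEW_LT like
   WRITE_LT, skips locks of the testing lock-owner, and assumes the
   usual rule that two locks conflict only if their ranges overlap and
   at least one is a write lock.

```c
#include <stdbool.h>
#include <stdint.h>

#define NFS4_UINT64_MAX 0xffffffffffffffffULL

enum nfs_lock_type4 { READ_LT = 1, WRITE_LT = 2, READW_LT = 3, WRITEW_LT = 4 };

/* Hypothetical in-memory lock record; owner_id stands in for lock_owner4. */
struct held_lock {
    uint64_t offset, length;
    enum nfs_lock_type4 type;
    unsigned owner_id;
};

/* Blocking and non-blocking variants are tested the same way. */
static bool is_write(enum nfs_lock_type4 t)
{
    return t == WRITE_LT || t == WRITEW_LT;
}

/* Last byte of a range; an all-ones length means "through end-of-file". */
static uint64_t range_last(uint64_t offset, uint64_t length)
{
    return length == NFS4_UINT64_MAX ? NFS4_UINT64_MAX : offset + length - 1;
}

/*
 * Return true if the tested range conflicts with lock h. Locks held by
 * the testing lock-owner are skipped. Assumes both ranges already passed
 * the LOCK range checks (nonzero length, no 64-bit overflow).
 */
bool lockt_conflicts(const struct held_lock *h, uint64_t offset, uint64_t length,
                     enum nfs_lock_type4 type, unsigned tester_owner)
{
    if (h->owner_id == tester_owner)
        return false;
    if (!is_write(type) && !is_write(h->type))
        return false;              /* read locks are shared */
    uint64_t a_last = range_last(offset, length);
    uint64_t b_last = range_last(h->offset, h->length);
    return offset <= b_last && h->offset <= a_last;
}
```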

   When a client holds an OPEN_DELEGATE_WRITE delegation, it may choose
   (see Section 16.10.5) to handle LOCK requests locally.  In such a
   case, LOCKT requests will similarly be handled locally.

