RFC 3010


NFS version 4 Protocol

Part 4 of 8, p. 78 to 102

9.4.  Open Delegation

   When a file is being OPENed, the server may delegate further handling
   of opens and closes for that file to the opening client.  Any such
   delegation is recallable, since the circumstances that allowed for
   the delegation are subject to change.  In particular, if the server
   receives a conflicting OPEN from another client, it must recall the
   delegation before deciding whether the OPEN from the other client may
   be granted.  Making a delegation is up to the server and
   clients should not assume that any particular OPEN either will or
   will not result in an open delegation.  The following is a typical
   set of conditions that servers might use in deciding whether OPEN
   should be delegated:

   o  The client must be able to respond to the server's callback
      requests.  The server will use the CB_NULL procedure for a test of
      callback ability.

   o  The client must have responded properly to previous recalls.

   o  There must be no current open conflicting with the requested
      delegation.

   o  There should be no current delegation that conflicts with the
      delegation being requested.

   o  The probability of future conflicting open requests should be low
      based on the recent history of the file.

   o  There should be no server-specific semantics of OPEN/CLOSE that
      would make the required handling incompatible with the prescribed
      handling that the delegated client would apply (see below).

   There are two types of open delegations, read and write.  A read open
   delegation allows a client to handle, on its own, requests to open a
   file for reading that do not deny read access to others.  Multiple
   read open delegations may be outstanding simultaneously and do not
   conflict.  A write open delegation allows the client to handle, on
   its own, all opens.  Only one write open delegation may exist for a
   given file at a given time and it is inconsistent with any read open
   delegations.

   When a client has a read open delegation, it may not make any changes
   to the contents or attributes of the file but it is assured that no
   other client may do so.  When a client has a write open delegation,
   it may modify the file data since no other client will be accessing
   the file's data.  The client holding a write delegation may only
   affect file attributes which are intimately connected with the file
   data:  object_size, time_modify, change.
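
   The compatibility rule for the two delegation types can be sketched
   as follows.  This is a minimal illustration, not part of the
   protocol: the class and method names are hypothetical, and a real
   server would also weigh current opens and the other conditions
   listed earlier before delegating.

```python
from dataclasses import dataclass, field

READ, WRITE = "read", "write"


@dataclass
class FileDelegations:
    """Delegations outstanding for one file (hypothetical bookkeeping)."""
    holders: dict = field(default_factory=dict)  # client id -> READ or WRITE

    def can_grant(self, client_id, kind):
        others = [k for c, k in self.holders.items() if c != client_id]
        if kind == READ:
            # Read delegations coexist, but never with a write delegation.
            return all(k == READ for k in others)
        # A write delegation must be the only delegation on the file.
        return not others

    def grant(self, client_id, kind):
        if not self.can_grant(client_id, kind):
            return False
        self.holders[client_id] = kind
        return True
```

   With this sketch, two clients can each be granted READ, after which
   a WRITE request from a third client is refused until the read
   delegations are gone.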

   When a client has an open delegation, it does not send OPENs or
   CLOSEs to the server but updates the appropriate status internally.
   For a read open delegation, opens that cannot be handled locally
   (opens for write or that deny read access) must be sent to the
   server.

   When an open delegation is made, the response to the OPEN contains an
   open delegation structure which specifies the following:

   o  the type of delegation (read or write)

   o  space limitation information to control flushing of data on close
      (write open delegation only, see the section "Open Delegation and
      Data Caching")

   o  an nfsace4 specifying read and write permissions

   o  a stateid to represent the delegation for READ and WRITE

   The stateid is separate and distinct from the stateid for the OPEN
   proper.  The standard stateid, unlike the delegation stateid, is
   associated with a particular nfs_lockowner and will continue to be
   valid after the delegation is recalled and the file remains open.

   When a request internal to the client is made to open a file and open
   delegation is in effect, it will be accepted or rejected solely on
   the basis of the following conditions.  Any requirement for other
   checks to be made by the delegate should result in open delegation
   being denied so that the checks can be made by the server itself.

   o  The access and deny bits for the request and the file as described
      in the section "Share Reservations".

   o  The read and write permissions as determined below.

   The nfsace4 passed with delegation can be used to avoid frequent
   ACCESS calls.  The permission check should be as follows:

   o  If the nfsace4 indicates that the open may be done, then it should
      be granted without reference to the server.

   o  If the nfsace4 indicates that the open may not be done, then an
      ACCESS request must be sent to the server to obtain the definitive
      answer.

   The server may return an nfsace4 that is more restrictive than the
   actual ACL of the file.  This includes an nfsace4 that specifies
   denial of all access.  Note that some common practices such as
   mapping the traditional user "root" to the user "nobody" may make it
   incorrect to return the actual ACL of the file in the delegation
   response.

   The use of delegation together with various other forms of caching
   creates the possibility that no server authentication will ever be
   performed for a given user since all of the user's requests might be
   satisfied locally.  Where the client is depending on the server for
   authentication, the client should be sure authentication occurs for
   each user by use of the ACCESS operation.  This should be the case
   even if an ACCESS operation would not be required otherwise.  As
   mentioned before, the server may enforce frequent authentication by
   returning an nfsace4 denying all access with every open delegation.
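
   The local decision for an open made under delegation, as described
   in this section, might look like the sketch below.  It is an
   illustration under assumptions: the bit values, the function names,
   and the reduction of the nfsace4 to a mask of allowed access bits
   are all hypothetical simplifications, and the share reservation
   conflict test is the usual access-versus-deny comparison.

```python
from enum import Enum

ACCESS_READ, ACCESS_WRITE = 1, 2   # illustrative share access/deny bits


class Decision(Enum):
    GRANT_LOCALLY = "grant"
    REJECT_SHARE_CONFLICT = "reject"
    ASK_SERVER = "access"           # send ACCESS for the definitive answer


def conflicts(req_access, req_deny, open_access, open_deny):
    """True if the request collides with an existing open's share bits."""
    return bool(req_access & open_deny) or bool(req_deny & open_access)


def local_open(req_access, req_deny, existing_opens, ace_allowed_bits):
    # Condition 1: access and deny bits against opens already held locally.
    for open_access, open_deny in existing_opens:
        if conflicts(req_access, req_deny, open_access, open_deny):
            return Decision.REJECT_SHARE_CONFLICT
    # Condition 2: permissions.  A positive answer from the delegation's
    # nfsace4 is trusted; a negative one is not final.
    if (req_access & ~ace_allowed_bits) == 0:
        return Decision.GRANT_LOCALLY
    return Decision.ASK_SERVER
```

   A server that returns an nfsace4 denying all access corresponds to
   ace_allowed_bits being zero here, which sends every open through
   the ASK_SERVER path.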

9.4.1.  Open Delegation and Data Caching

   OPEN delegation allows much of the message overhead associated with
   the opening and closing of files to be eliminated.  An open when an
   open delegation is in effect does not require that a validation
   message be sent to the server.  The continued endurance of the "read
   open delegation" provides a guarantee that no OPEN for write and thus
   no write has occurred.  Similarly, when closing a file opened for
   write and if write open delegation is in effect, the data written
   does not have to be flushed to the server until the open delegation
   is recalled.  The continued endurance of the open delegation provides
   a guarantee that no open and thus no read or write has been done by
   another client.

   For the purposes of open delegation, READs and WRITEs done without an
   OPEN are treated as the functional equivalents of a corresponding
   type of OPEN.  This refers to the READs and WRITEs that use the
   special stateids consisting of all zero bits or all one bits.
   Therefore, READs or WRITEs with a special stateid done by another
   client will force the server to recall a write open delegation.  A
   WRITE with a special stateid done by another client will force a
   recall of read open delegations.
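
   The recall rule for special stateids can be expressed compactly.
   The sketch below is illustrative only; the function names and the
   representation of a stateid as a byte string are assumptions.

```python
def is_special_stateid(stateid: bytes) -> bool:
    """True for a stateid of all zero bits or all one bits."""
    return all(b == 0x00 for b in stateid) or all(b == 0xFF for b in stateid)


def delegations_to_recall(op, delegations, requester):
    """Clients whose delegation must be recalled.

    op          -- "READ" or "WRITE", issued with a special stateid
    delegations -- mapping of client id to "read" or "write"
    requester   -- client id issuing the operation
    """
    recall = []
    for client, kind in delegations.items():
        if client == requester:
            continue
        # Any such READ or WRITE conflicts with a write delegation;
        # only a WRITE conflicts with read delegations.
        if kind == "write" or op == "WRITE":
            recall.append(client)
    return sorted(recall)
```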

   With delegations, a client is able to avoid writing data to the
   server when the CLOSE of a file is serviced.  The CLOSE operation is
   the usual point at which the client is notified of a lack of stable
   storage for the modified file data generated by the application.  At
   the CLOSE, file data is written to the server and through normal
   accounting the server is able to determine if the available file
   system space for the data has been exceeded (i.e. server returns
   NFS4ERR_NOSPC or NFS4ERR_DQUOT).  This accounting includes quotas.
   The introduction of delegations requires that an alternative method
   be in place for the same type of communication to occur between
   client and server.

   In the delegation response, the server provides either the limit of
   the size of the file or the number of modified blocks and associated
   block size.  The server must ensure that the client will be able to
   flush data to the server of a size equal to that provided in the
   original delegation.  The server must make this assurance for all
   outstanding delegations.  Therefore, the server must be careful in
   its management of available space for new or modified data taking
   into account available file system space and any applicable quotas.
   The server can recall delegations as a result of managing the
   available file system space.  The client should abide by the server's
   stated space limits for delegations.  If the client exceeds the
   stated limits for the delegation, the server's behavior is undefined.

   Based on server conditions, quotas or available file system space,
   the server may grant write open delegations with very restrictive
   space limitations.  The limitations may be defined in a way that will
   always force modified data to be flushed to the server on close.
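
   A client-side check against the space limitation might be sketched
   as follows.  The type and field names are hypothetical and do not
   reproduce the protocol's XDR definitions; the sketch only assumes
   that exactly one of the two forms of limit is supplied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpaceLimit:
    """One of the two forms of limit a server may provide (hypothetical)."""
    max_file_size: Optional[int] = None  # limit on the size of the file, bytes
    max_blocks: Optional[int] = None     # limit on the number of modified blocks
    block_size: Optional[int] = None     # block size that goes with max_blocks


def may_defer_flush(limit, file_size, modified_bytes):
    """True if modified data may stay cached past close within the limit."""
    if limit.max_file_size is not None:
        return file_size <= limit.max_file_size
    return modified_bytes <= limit.max_blocks * limit.block_size
```

   A limit of zero makes may_defer_flush() false for any non-empty
   file, which matches the case of a server forcing a flush on every
   close.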

   With respect to authentication, flushing modified data to the server
   after a CLOSE has occurred may be problematic.  For example, the user
   of the application may have logged off of the client and unexpired
   authentication credentials may not be present.  In this case, the
   client may need to take special care to ensure that local unexpired
   credentials will in fact be available.  This may be accomplished by
   tracking the expiration time of credentials and flushing data well in
   advance of their expiration or by making private copies of
   credentials to assure their availability when needed.

9.4.2.  Open Delegation and File Locks

   When a client holds a write open delegation, lock operations are
   performed locally.  This includes those required for mandatory file
   locking.  This can be done since the delegation implies that there
   can be no conflicting locks.  Similarly, all of the revalidations
   that would normally be associated with obtaining locks and the
   flushing of data associated with the releasing of locks need not be
   done.

9.4.3.  Recall of Open Delegation

   The following events necessitate recall of an open delegation:

   o  Potentially conflicting OPEN request (or READ/WRITE done with
      "special" stateid)

   o  SETATTR issued by another client

   o  REMOVE request for the file

   o  RENAME request for the file as either source or target of the
      RENAME

   Whether a RENAME of a directory in the path leading to the file
   results in recall of an open delegation depends on the semantics of
   the server file system.  If that file system denies such RENAMEs when
   a file is open, the recall must be performed to determine whether the
   file in question is, in fact, open.

   In addition to the situations above, the server may choose to recall
   open delegations at any time if resource constraints make it
   advisable to do so.  Clients should always be prepared for the
   possibility of recall.

   The server needs to employ special handling for a GETATTR where the
   target is a file that has a write open delegation in effect.  In this
   case, the client holding the delegation needs to be interrogated.
   The server will use a CB_GETATTR callback, if the GETATTR attribute
   bits include any of the attributes that a write open delegate may
   modify (object_size, time_modify, change).

   When a client receives a recall for an open delegation, it needs to
   update state on the server before returning the delegation.  These
   same updates must be done whenever a client chooses to return a
   delegation voluntarily.  The following items of state need to be
   dealt with:

   o  If the file associated with the delegation is no longer open and
      no previous CLOSE operation has been sent to the server, a CLOSE
      operation must be sent to the server.

   o  If a file has other open references at the client, then OPEN
      operations must be sent to the server.  The appropriate stateids
      will be provided by the server for subsequent use by the client
      since the delegation stateid will no longer be valid.  These OPEN
      requests are done with the claim type of CLAIM_DELEGATE_CUR.  This
      will allow the presentation of the delegation stateid so that the
      client can establish the appropriate rights to perform the OPEN.
      (see the section "Operation 18: OPEN" for details.)

   o  If there are granted file locks, the corresponding LOCK operations
      need to be performed.  This applies to the write open delegation
      case only.

   o  For a write open delegation, if at the time of recall the file is
      not open for write, all modified data for the file must be flushed
      to the server.  If the delegation had not existed, the client
      would have done this data flush before the CLOSE operation.

   o  For a write open delegation when a file is still open at the time
      of recall, any modified data for the file needs to be flushed to
      the server.

   o  With the write open delegation in place, it is possible that the
      file was truncated during the duration of the delegation.  For
      example, the truncation could have occurred as a result of an OPEN
      UNCHECKED with an object_size attribute value of zero.  Therefore,
      if a truncation of the file has occurred and this operation has
      not been propagated to the server, the truncation must occur
      before any modified data is written to the server.
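
   The state updates listed above can be arranged into an ordered plan
   before the delegation is returned.  The sketch below is one possible
   ordering under assumptions: the step labels TRUNCATE, FLUSH, and
   RETURN_DELEGATION are hypothetical placeholders rather than protocol
   operations, and the only ordering the text requires is that
   truncation precede the writing of modified data and that the flush
   precede the CLOSE.

```python
def recall_plan(delegation_type, open_refs, close_sent,
                held_locks, dirty, truncated):
    """Ordered steps a client performs before returning a delegation."""
    steps = []
    # Files still open locally need real opens on the server.
    for ref in open_refs:
        steps.append(("OPEN", ref, "CLAIM_DELEGATE_CUR"))
    if delegation_type == "write":
        # Locks granted locally must now be established on the server.
        for lock in held_locks:
            steps.append(("LOCK", lock))
        # Truncation must reach the server before any modified data.
        if truncated:
            steps.append(("TRUNCATE",))
        if dirty:
            steps.append(("FLUSH",))
    # No longer open and never closed on the server: send the CLOSE.
    if not open_refs and not close_sent:
        steps.append(("CLOSE",))
    steps.append(("RETURN_DELEGATION",))
    return steps
```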

   In the case of write open delegation, file locking imposes some
   additional requirements.  The flushing of any modified data in any
   region for which a write lock was released while the write open
   delegation was in effect is what is required to precisely maintain
   the associated invariant.  However, because the write open delegation
   implies no other locking by other clients, a simpler implementation
   is to flush all modified data for the file (as described just above)
   if any write lock has been released while the write open delegation
   was in effect.

9.4.4.  Delegation Revocation

   At the point a delegation is revoked, if there are associated opens
   on the client, the applications holding these opens need to be
   notified.  This notification usually occurs by returning errors for
   READ/WRITE operations or when a close is attempted for the open file.

   If no opens exist for the file at the point the delegation is
   revoked, then notification of the revocation is unnecessary.
   However, if there is modified data present at the client for the
   file, the user of the application should be notified.  Unfortunately,
   it may not be possible to notify the user since active applications
   may not be present at the client.  See the section "Revocation
   Recovery for Write Open Delegation" for additional details.

9.5.  Data Caching and Revocation

   When locks and delegations are revoked, the assumptions upon which
   successful caching depend are no longer guaranteed.  The owner of the
   locks or share reservations which have been revoked needs to be
   notified.  This notification includes applications with a file open
   that has a corresponding delegation which has been revoked.  Cached
   data associated with the revocation must be removed from the client.
   In the case of modified data existing in the client's cache, that
   data must be removed from the client without it being written to the
   server.  As mentioned, the assumptions made by the client are no
   longer valid at the point when a lock or delegation has been revoked.
   For example, another client may have been granted a conflicting lock
   after the revocation of the lock at the first client.  Therefore, the
   data within the lock range may have been modified by the other
   client.  Obviously, the first client is unable to guarantee to the
   application what has occurred to the file in the case of revocation.

   Notification to a lock owner will in many cases consist of simply
   returning an error on the next and all subsequent READs/WRITEs to the
   open file or on the close.  Where the methods available to a client
   make such notification impossible because errors for certain
   operations may not be returned, more drastic action such as signals
   or process termination may be appropriate.  The justification for
   this is that an invariant on which an application depends may be
   violated.  Depending on how errors are typically treated for the
   client operating environment, further levels of notification
   including logging, console messages, and GUI pop-ups may be
   appropriate.

9.5.1.  Revocation Recovery for Write Open Delegation

   Revocation recovery for a write open delegation poses the special
   issue of modified data in the client cache while the file is not
   open.  In this situation, any client which does not flush modified
   data to the server on each close must ensure that the user receives
   appropriate notification of the failure as a result of the
   revocation.  Since such situations may require human action to
   correct problems, notification schemes in which the appropriate user
   or administrator is notified may be necessary.  Logging and console
   messages are typical examples.

   If there is modified data on the client, it must not be flushed
   normally to the server.  A client may attempt to provide a copy of
   the file data as modified during the delegation under a different
   name in the file system name space to ease recovery.  Unless the
   client can determine that the file has not been modified by any other
   client, this technique must be limited to situations in which a
   client has a complete cached copy of the file in question.  Use of
   such a technique may be limited to files under a certain size or may
   only be used when sufficient disk space is guaranteed to be available
   within the target file system and when the client has sufficient
   buffering resources to keep the cached copy available until it is
   properly stored to the target file system.

9.6.  Attribute Caching

   The attributes discussed in this section do not include named
   attributes.  Individual named attributes are analogous to files and
   caching of the data for these needs to be handled just as data
   caching is for ordinary files.  Similarly, LOOKUP results from an
   OPENATTR directory are to be cached on the same basis as any other
   pathnames and similarly for directory contents.

   Clients may cache file attributes obtained from the server and use
   them to avoid subsequent GETATTR requests.  Such caching is write
   through in that modification to file attributes is always done by
   means of requests to the server and should not be done locally and
   cached.  The exceptions to this are modifications to attributes that
   are intimately connected with data caching.  Therefore, extending a
   file by writing data to the local data cache is reflected immediately
   in the object_size as seen on the client without this change being
   immediately reflected on the server.  Normally such changes are not
   propagated directly to the server but when the modified data is
   flushed to the server, analogous attribute changes are made on the
   server.  When open delegation is in effect, the modified attributes
   may be returned to the server in the response to a CB_RECALL call.

   The result of local caching of attributes is that the attribute
   caches maintained on individual clients will not be coherent. Changes
   made in one order on the server may be seen in a different order on
   one client and in a third order on a different client.

   The typical file system application programming interfaces do not
   provide means to atomically modify or interrogate attributes for
   multiple files at the same time.  The following rules provide an
   environment where the potential incoherences mentioned above can be
   reasonably managed.  These rules are derived from the practice of
   previous NFS protocols.

   o  All attributes for a given file (per-fsid attributes excepted) are
      cached as a unit at the client so that no non-serializability can
      arise within the context of a single file.

   o  An upper time boundary is maintained on how long a client cache
      entry can be kept without being refreshed from the server.

   o  When operations are performed that change attributes at the
      server, the updated attribute set is requested as part of the
      containing RPC.  This includes directory operations that update
      attributes indirectly.  This is accomplished by following the
      modifying operation with a GETATTR operation and then using the
      results of the GETATTR to update the client's cached attributes.

   Note that if the full set of attributes to be cached is requested by
   READDIR, the results can be cached by the client on the same basis as
   attributes obtained via GETATTR.

   A client may validate its cached version of attributes for a file by
   fetching only the change attribute and assuming that if the change
   attribute has the same value as it did when the attributes were
   cached, then no attributes have changed.  The possible exception is
   the attribute time_access.
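
   The combination of an upper time bound and validation through the
   change attribute can be sketched as a small cache.  This is an
   illustration under assumptions: the class name, the callables
   standing in for GETATTR requests, and the dictionary representation
   of attributes are all hypothetical.

```python
import time


class AttrCache:
    """Per-file attribute cache revalidated through the change attribute."""

    def __init__(self, max_age, fetch_change, fetch_all, clock=time.monotonic):
        self.max_age = max_age            # upper time bound, in seconds
        self.fetch_change = fetch_change  # stands in for GETATTR of change only
        self.fetch_all = fetch_all        # stands in for a full GETATTR
        self.clock = clock
        self.entries = {}                 # file handle -> (attrs, checked_at)

    def get(self, fh):
        now = self.clock()
        entry = self.entries.get(fh)
        if entry is not None:
            attrs, checked_at = entry
            if now - checked_at < self.max_age:
                return attrs              # still inside the time bound
            if self.fetch_change(fh) == attrs["change"]:
                # Unchanged on the server; time_access may still differ.
                self.entries[fh] = (attrs, now)
                return attrs
        attrs = self.fetch_all(fh)        # refresh all attributes as a unit
        self.entries[fh] = (attrs, now)
        return attrs
```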

9.7.  Name Caching

   The results of LOOKUP and READDIR operations may be cached to avoid
   the cost of subsequent LOOKUP operations.  Just as in the case of
   attribute caching, inconsistencies may arise among the various client
   caches.  To mitigate the effects of these inconsistencies and given
   the context of typical file system APIs, the following rules should
   be followed:

   o  The results of unsuccessful LOOKUPs should not be cached, unless
      they are specifically reverified at the point of use.

   o  An upper time boundary is maintained on how long a client name
      cache entry can be kept without verifying that the entry has not
      been made invalid by a directory change operation performed by
      another client.

   When a client is not making changes to a directory for which there
   exist name cache entries, the client needs to periodically fetch
   attributes for that directory to ensure that it is not being
   modified.  After determining that no modification has occurred, the
   expiration time for the associated name cache entries may be updated
   to be the current time plus the name cache staleness bound.

   When a client is making changes to a given directory, it needs to
   determine whether there have been changes made to the directory by
   other clients.  It does this by using the change attribute as
   reported before and after the directory operation in the associated
   change_info4 value returned for the operation.  The server is able to
   communicate to the client whether the change_info4 data is provided
   atomically with respect to the directory operation.  If the change
   values are provided atomically, the client is then able to compare
   the pre-operation change value with the change value in the client's
   name cache.  If the comparison indicates that the directory was
   updated by another client, the name cache associated with the
   modified directory is purged from the client.  If the comparison
   indicates no modification, the name cache can be updated on the
   client to reflect the directory operation and the associated timeout
   extended.  The post-operation change value needs to be saved as the
   basis for future change_info4 comparisons.
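
   The comparison described above might be sketched as follows.  The
   class and method names are hypothetical; the three arguments atomic,
   before, and after stand for the contents of the change_info4 value
   returned by the directory operation.

```python
class DirNameCache:
    """Name cache for one directory (hypothetical structure)."""

    def __init__(self, change):
        self.change = change   # change value the cached names are based on
        self.names = {}        # component name -> file handle

    def after_own_operation(self, atomic, before, after, apply_update):
        """Fold the client's own directory operation into the cache."""
        if atomic and before == self.change:
            # No other client changed the directory in between.
            apply_update(self.names)
            self.change = after
        else:
            # Another client changed the directory, or the values were
            # not reported atomically: purge the cached names.
            self.names.clear()
            self.change = after if atomic else None
```

   Setting the saved change value to None after a non-atomic report is
   a conservative choice made for this sketch; it forces the next
   comparison to fail rather than trusting a value that may not match
   the directory contents.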

   As demonstrated by the scenario above, name caching requires that the
   client revalidate name cache data by inspecting the change attribute
   of a directory at the point when the name cache item was cached.
   This requires that the server update the change attribute for
   directories when the contents of the corresponding directory are
   modified.  For a client to use the change_info4 information
   appropriately and correctly, the server must report the pre and post
   operation change attribute values atomically.  When the server is
   unable to report the before and after values atomically with respect
   to the directory operation, the server must indicate that fact in the
   change_info4 return value.  When the information is not atomically
   reported, the client should not assume that other clients have not
   changed the directory.

9.8.  Directory Caching

   The results of READDIR operations may be used to avoid subsequent
   READDIR operations.  Just as in the cases of attribute and name
   caching, inconsistencies may arise among the various client caches.
   To mitigate the effects of these inconsistencies, and given the
   context of typical file system APIs, the following rules should be
   followed:

   o  Cached READDIR information for a directory which is not obtained
      in a single READDIR operation must always be a consistent snapshot
      of directory contents.  This is determined by using a GETATTR
      before the first READDIR and after the last READDIR that
      contributes to the cache.

   o  An upper time boundary is maintained to indicate the length of
      time a directory cache entry is considered valid before the client
      must revalidate the cached information.

   The revalidation technique parallels that discussed in the case of
   name caching.  When the client is not changing the directory in
   question, checking the change attribute of the directory with GETATTR
   is adequate.  The lifetime of the cache entry can be extended at
   these checkpoints.  When a client is modifying the directory, the
   client needs to use the change_info4 data to determine whether there
   are other clients modifying the directory.  If it is determined that
   no other client modifications are occurring, the client may update
   its directory cache to reflect its own changes.
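
   The consistent-snapshot rule for a directory read with several
   READDIR operations can be sketched as a retry loop.  The function
   and parameter names are hypothetical: get_change stands for a
   GETATTR of the directory's change attribute, and readdir_chunks for
   the sequence of READDIR operations.

```python
def read_dir_snapshot(get_change, readdir_chunks, max_tries=3):
    """Return (change, entries) for a consistent snapshot, or None."""
    for _ in range(max_tries):
        before = get_change()                 # GETATTR before first READDIR
        entries = [e for chunk in readdir_chunks() for e in chunk]
        if get_change() == before:            # GETATTR after last READDIR
            return before, entries
        # The directory changed while it was being read; try again.
    return None
```

   The returned change value is the one to compare against later when
   the cache entry is revalidated.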

   As demonstrated previously, directory caching requires that the
   client revalidate directory cache data by inspecting the change
   attribute of a directory at the point when the directory was cached.
   This requires that the server update the change attribute for
   directories when the contents of the corresponding directory are
   modified.  For a client to use the change_info4 information
   appropriately and correctly, the server must report the pre and post
   operation change attribute values atomically.  When the server is
   unable to report the before and after values atomically with respect
   to the directory operation, the server must indicate that fact in the
   change_info4 return value.  When the information is not atomically
   reported, the client should not assume that other clients have not
   changed the directory.

10.  Minor Versioning

   To address the requirement of an NFS protocol that can evolve as the
   need arises, the NFS version 4 protocol contains the rules and
   framework to allow for future minor changes or versioning.

   The base assumption with respect to minor versioning is that any
   future accepted minor version must follow the IETF process and be
   documented in a standards track RFC.  Therefore, each minor version
   number will correspond to an RFC.  Minor version zero of the NFS
   version 4 protocol is represented by this RFC.  The COMPOUND
   procedure will support the encoding of the minor version being
   requested by the client.

   The following items represent the basic rules for the development of
   minor versions.  Note that a future minor version may decide to
   modify or add to the following rules as part of the minor version
   definition.

   1    Procedures are not added or deleted

        To maintain the general RPC model, NFS version 4 minor versions
        will not add or delete procedures from the NFS program.

   2    Minor versions may add operations to the COMPOUND and
        CB_COMPOUND procedures.

        The addition of operations to the COMPOUND and CB_COMPOUND
        procedures does not affect the RPC model.

   2.1  Minor versions may append attributes to GETATTR4args, bitmap4,
        and GETATTR4res.

        This allows for the expansion of the attribute model to allow
        for future growth or adaptation.

   2.2  Minor version X must append any new attributes after the last
        documented attribute.

        Since attribute results are specified as an opaque array of
        per-attribute XDR encoded results, the complexity of adding new
        attributes in the midst of the current definitions will be too
        burdensome.

   3    Minor versions must not modify the structure of an existing
        operation's arguments or results.

        Again the complexity of handling multiple structure definitions
        for a single operation is too burdensome.  New operations should
        be added instead of modifying existing structures for a minor
        version.

        This rule does not preclude the following adaptations in a minor
        version:

        o  adding bits to flag fields such as new attributes to
           GETATTR's bitmap4 data type

        o  adding bits to existing attributes like ACLs that have flag
           words

        o  extending enumerated types (including NFS4ERR_*) with new
           values

   4    Minor versions may not modify the structure of existing
        attributes.

   5    Minor versions may not delete operations.

        This prevents the potential reuse of a particular operation
        "slot" in a future minor version.

   6    Minor versions may not delete attributes.

   7    Minor versions may not delete flag bits or enumeration values.

   8    Minor versions may declare an operation as mandatory to NOT
        implement.

        Specifying an operation as "mandatory to not implement" is
        equivalent to obsoleting an operation.  For the client, it means
        that the operation should not be sent to the server.  For the
        server, an NFS error can be returned as opposed to "dropping"
        the request as an XDR decode error.  This approach allows for
        the obsolescence of an operation while maintaining its structure
        so that a future minor version can reintroduce the operation.

   8.1  Minor versions may declare attributes mandatory to NOT
        implement.

   8.2  Minor versions may declare flag bits or enumeration values as
        mandatory to NOT implement.

   9    Minor versions may downgrade features from mandatory to
        recommended, or recommended to optional.

   10   Minor versions may upgrade features from optional to recommended
        or recommended to mandatory.

   11   A client and server that support minor version X must support
        minor versions 0 (zero) through X-1 as well.

   12   No new features may be introduced as mandatory in a minor
        version.

        This rule allows for the introduction of new functionality and
        forces the use of implementation experience before designating a
        feature as mandatory.

   13   A client MUST NOT attempt to use a stateid, file handle, or
        similar returned object from the COMPOUND procedure with minor
        version X for another COMPOUND procedure with minor version Y,
        where X != Y.

11.  Internationalization

   The primary area in which NFS needs to deal with
   internationalization, or I18n, is file names and other strings as
   used within the protocol.  The choice of string
   representation must allow reasonable name/string access to clients
   which use various languages.  The UTF-8 encoding of the UCS as
   defined by [ISO10646] allows for this type of access and follows the
   policy described in "IETF Policy on Character Sets and Languages",
   [RFC2277].  This choice is explained further in the following.

11.1.  Universal Versus Local Character Sets

   [RFC1345] describes a table of 16 bit characters for many different
   languages (the bit encodings match Unicode, though of course RFC1345
   is somewhat out of date with respect to current Unicode assignments).
   Each character from each language has a unique 16 bit value in the 16
   bit character set.  Thus this table can be thought of as a universal
   character set.  [RFC1345] then talks about groupings of subsets of
   the entire 16 bit character set into "Charset Tables".  For example
   one might take all the Greek characters from the 16 bit table (which
   are consecutively allocated), and normalize their offsets to a table
   that fits in 7 bits.  Thus it is determined that "lower case alpha"
   is in the same position as "upper case a" in the US-ASCII table, and
   "upper case alpha" is in the same position as "lower case a" in the
   US-ASCII table.

   These normalized subset character sets can be thought of as "local
   character sets", suitable for an operating system locale.

   Local character sets are not suitable for the NFS protocol.  Consider
   someone who creates a file with a name in a Swedish character set.
   If someone else later goes to access the file with their locale set
   to the Swedish language, then there are no problems.  But if someone
   in, say, the US-ASCII locale goes to access the file, the file name
   will look very different, because the Swedish characters in the 7 bit
   table will now be represented in US-ASCII characters on the display.
   It would be preferable to give the US-ASCII user a way to display the
   file name using Swedish glyphs. In order to do that, the NFS protocol
   would have to include the locale with the file name on each operation
   to create a file.

   But then what of the situation when there is a path name on the
   server like:


   Each component could have been created with a different locale.  If
   one issues CREATE with a multi-component path name, and if some of the
   leading components already exist, what is to be done with the
   existing components?  Is the current locale attribute replaced with
   the user's current one?  These types of situations quickly become too
   complex when there is an alternate solution.

   If the NFS version 4 protocol used a universal 16 bit or 32 bit
   character set (or an encoding of a 16 bit or 32 bit character set
   into octets), then the server and client need not care if the locale
   of the user accessing the file is different than the locale of the
   user who created the file.  The unique 16 bit or 32 bit encoding of
   the character allows for determination of what language the character
   is from and also how to display that character on the client.  The
   server need not know what locales are used.
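
   The difference between local and universal character sets can be
   sketched as follows.  This is only an illustration: the file name is
   arbitrary, and the 8 bit ISO 8859-1 and ISO 8859-7 character sets
   are used as stand-ins for the 7 bit local tables discussed above.

```python
# Arbitrary example name ("räksmörgås"); not taken from the protocol.
name = "r\u00e4ksm\u00f6rg\u00e5s"

# Local character set: the octets carry no indication of which table
# was used, so a reader in another locale decodes different characters.
local_octets = name.encode("iso8859_1")
seen_in_other_locale = local_octets.decode("iso8859_7")

# Universal character set: every reader decodes the same characters,
# whatever locale the creator or the reader is using.
utf8_octets = name.encode("utf-8")
seen_by_anyone = utf8_octets.decode("utf-8")

print(seen_in_other_locale == name)  # locale-dependent reading
print(seen_by_anyone == name)        # locale-independent reading
```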

11.2.  Overview of Universal Character Set Standards

   The previous section makes a case for using a universal character
   set.  This section makes the case for using UTF-8 as the specific
   universal character set for the NFS version 4 protocol.

   [RFC2279] discusses UTF-* (UTF-8 and other UTF-XXX encodings),
   Unicode, and UCS-*.  There are two standards bodies managing
   universal code sets:

   o  ISO/IEC which has the standard 10646-1

   o  Unicode which has the Unicode standard

   Both standards bodies have pledged to track each other's assignments
   of character codes.

   The following is a brief analysis of the various standards.

   UCS       Universal Character Set.  This is ISO/IEC 10646-1: "a
             multi-octet character set called the Universal Character
             Set (UCS), which encompasses most of the world's writing
             systems."

   UCS-2     a two octet per character encoding that addresses the first
             2^16 characters of UCS. Currently there are no UCS
             characters beyond that range.

   UCS-4     a four octet per character encoding that permits the
             encoding of up to 2^31 characters.

   UTF       UTF is an abbreviation of the term "UCS transformation
             format" and is used in the naming of various standards for
             encoding of UCS characters as described below.

   UTF-1     Only historical interest; it has been removed from 10646-1.

   UTF-7     Encodes the entire "repertoire" of UCS "characters using
             only octets with the higher order bit clear".  [RFC2152]
             describes UTF-7. UTF-7 accomplishes this by reserving one
             of the 7bit US-ASCII characters as a "shift" character to
             indicate non-US-ASCII characters.

   UTF-8     Unlike UTF-7, uses all 8 bits of the octets. US-ASCII
             characters are encoded as before unchanged. Any octet with
             the high bit cleared can only mean a US-ASCII character.
             The high bit set means that a UCS character is being
             encoded.

   UTF-16    Encodes UCS-4 characters into UCS-2 characters using a
             reserved range in UCS-2.

   Unicode   Unicode and UCS-2 are the same; [RFC2279] states:

             Up to the present time, changes in Unicode and amendments
             to ISO/IEC 10646 have tracked each other, so that the
             character repertoires and code point assignments have
             remained in sync.  The relevant standardization committees
             have committed to maintain this very useful synchronism.

11.3.  Difficulties with UCS-4, UCS-2, Unicode

   Adapting existing applications and file systems to multi-octet
   schemes like UCS and Unicode can be difficult.  A significant amount
   of code has been written to process streams of bytes. Also there are
   many existing stored objects described with 7 bit or 8 bit
   characters. Doubling or quadrupling the bandwidth and storage
   requirements seems like an expensive way to accomplish I18N.
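
   The cost of fixed-width encodings can be seen with a small
   comparison.  This sketch assumes an arbitrary US-ASCII file name and
   uses Python's UTF-16BE and UTF-32BE codecs as stand-ins for UCS-2
   and UCS-4, which produce the same octet counts for such a name.

```python
name = "README.txt"  # arbitrary 10-character US-ASCII file name

sizes = {
    "UTF-8": len(name.encode("utf-8")),
    "UCS-2 (via UTF-16BE)": len(name.encode("utf-16-be")),
    "UCS-4 (via UTF-32BE)": len(name.encode("utf-32-be")),
}

for label, octets in sizes.items():
    print(f"{label}: {octets} octets")
```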

   UCS-2 and Unicode are "only" 16 bits long.  That might seem to be
   enough but, according to [Unicode1], 49,194 Unicode characters are
   already assigned.  According to [Unicode2] there are still more
   languages that need to be added.

11.4.  UTF-8 and its solutions

   UTF-8 solves problems for NFS that exist with the use of UCS and
   Unicode.  UTF-8 will encode 16 bit and 32 bit characters in a way
   that will be compact for most users. The encoding table from UCS-4 to
   UTF-8, as copied from [RFC2279]:

      UCS-4 range (hex.)           UTF-8 octet sequence (binary)
    0000 0000-0000 007F   0xxxxxxx
    0000 0080-0000 07FF   110xxxxx 10xxxxxx
    0000 0800-0000 FFFF   1110xxxx 10xxxxxx 10xxxxxx
    0001 0000-001F FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    0020 0000-03FF FFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    0400 0000-7FFF FFFF   1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

   See [RFC2279] for precise encoding and decoding rules.  Note that,
   because of UTF-16, the algorithm from Unicode/UCS-2 to UTF-8 needs to
   account for the reserved range between D800 and DFFF.

   Note that the 16 bit UCS or Unicode characters require no more than 3
   octets to encode into UTF-8.
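
   The table above can be turned into a short encoder.  The following
   is a minimal sketch, not the normative algorithm: the function name
   utf8_encode_rfc2279 is hypothetical, and the sketch does not treat
   the reserved range D800 to DFFF specially.

```python
# (upper bound of UCS-4 range, number of octets, leading-octet marker)
_RANGES = [
    (0x0000007F, 1, 0x00),
    (0x000007FF, 2, 0xC0),
    (0x0000FFFF, 3, 0xE0),
    (0x001FFFFF, 4, 0xF0),
    (0x03FFFFFF, 5, 0xF8),
    (0x7FFFFFFF, 6, 0xFC),
]


def utf8_encode_rfc2279(code_point: int) -> bytes:
    """Encode one UCS-4 value using the octet layouts in the table."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("value outside the 31 bit UCS-4 range")
    length, marker = next(
        (n, m) for upper, n, m in _RANGES if code_point <= upper
    )
    octets = []
    # Fill the trailing 10xxxxxx octets from the low-order bits upward.
    for _ in range(length - 1):
        octets.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # The remaining high-order bits go into the leading octet.
    octets.append(marker | code_point)
    return bytes(reversed(octets))
```

   For values that Python can also encode, the result can be compared
   with chr(value).encode("utf-8").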

   Interestingly, UTF-8 has room to handle characters larger than 31
   bits, because the leading octet of form:

         1111111x

   is not defined. If needed, ISO could either use that octet to
   indicate a sequence of an encoded 8 octet character, or perhaps use
   11111110 to permit the next octet to indicate an even more expandable
   character set.

   So using UTF-8 to represent character encodings means never having to
   run out of room.

11.5.  Normalization

   The client and server operating environments may differ in their
   policies and operational methods with respect to character
   normalization (see [Unicode1] for a discussion of normalization
   forms).  This difference may also exist between applications on the
   same client.  This adds to the difficulty of providing a single
   normalization policy for the protocol that allows for maximal
   interoperability.  This issue is similar to the character case issues
   where the server may or may not support case insensitive file name
   matching and may or may not preserve the character case when storing
   file names.  The protocol does not mandate a particular behavior but
   allows for the various permutations.

   The NFS version 4 protocol does not mandate the use of a particular
   normalization form at this time.  A later revision of this
   specification may specify a particular normalization form.
   Therefore, the server and client can expect that they may receive
   unnormalized characters within protocol requests and responses.  If
   the operating environment requires normalization, then the
   implementation must normalize the various UTF-8 encoded strings
   within the protocol before presenting the information to an
   application (at the client) or local file system (at the server).

12.  Error Definitions

   NFS error numbers are assigned to failed operations within a compound
   request.  A compound request contains a number of NFS operations that
   have their results encoded in sequence in a compound reply.  The
   results of successful operations will consist of an NFS4_OK status
   followed by the encoded results of the operation.  If an NFS
   operation fails, an error status will be entered in the reply and the
   compound request will be terminated.

   A description of each defined error follows:

   NFS4_OK               Indicates the operation completed successfully.

   NFS4ERR_ACCES         Permission denied. The caller does not have the
                         correct permission to perform the requested
                         operation. Contrast this with NFS4ERR_PERM,
                         which restricts itself to owner or privileged
                         user permission failures.

   NFS4ERR_BADHANDLE     Illegal NFS file handle. The file handle failed
                         internal consistency checks.

   NFS4ERR_BADTYPE       An attempt was made to create an object of a
                         type not supported by the server.

   NFS4ERR_BAD_COOKIE    READDIR cookie is stale.

   NFS4ERR_BAD_SEQID     The sequence number in a locking request is
                         neither the next expected number nor the last
                         number processed.

   NFS4ERR_BAD_STATEID   A stateid generated by the current server
                         instance, but which does not designate any
                         locking state (either current or superseded)
                         for a current lockowner-file pair, was used.

   NFS4ERR_CLID_INUSE    The SETCLIENTID procedure has found that a
                         client id is already in use by another client.

   NFS4ERR_DELAY         The server initiated the request, but was not
                         able to complete it in a timely fashion. The
                         client should wait and then try the request
                         with a new RPC transaction ID.  For example,
                         this error should be returned from a server
                         that supports hierarchical storage and receives
                         a request to process a file that has been
                         migrated. In this case, the server should start
                         the immigration process and respond to the
                         client with this error.  This error may also
                         occur when a necessary delegation recall makes
                         processing a request in a timely fashion
                         impossible.

   NFS4ERR_DENIED        An attempt to lock a file is denied.  Since
                         this may be a temporary condition, the client
                         is encouraged to retry the lock request until
                         the lock is accepted.

   NFS4ERR_DQUOT         Resource (quota) hard limit exceeded. The
                         user's resource limit on the server has been
                         exceeded.

   NFS4ERR_EXIST         File exists. The file specified already exists.

   NFS4ERR_EXPIRED       A lease has expired that is being used in the
                         current procedure.

   NFS4ERR_FBIG          File too large. The operation would have caused
                         a file to grow beyond the server's limit.

   NFS4ERR_FHEXPIRED     The file handle provided is volatile and has
                         expired at the server.

   NFS4ERR_GRACE         The server is in its recovery or grace period
                         which should match the lease period of the
                         server.

   NFS4ERR_INVAL         Invalid argument or unsupported argument for an
                         operation. Two examples are attempting a
                         READLINK on an object other than a symbolic
                         link or attempting to SETATTR a time field on a
                         server that does not support this operation.

   NFS4ERR_IO            I/O error. A hard error (for example, a disk
                         error) occurred while processing the requested
                         operation.

   NFS4ERR_ISDIR         Is a directory. The caller specified a
                         directory in a non-directory operation.

   NFS4ERR_LEASE_MOVED   A lease being renewed is associated with a file
                         system that has been migrated to a new server.

   NFS4ERR_LOCKED        A read or write operation was attempted on a
                         locked file.

   NFS4ERR_LOCK_RANGE    A lock request is operating on a sub-range of a
                         current lock for the lock owner and the server
                         does not support this type of request.

   NFS4ERR_MINOR_VERS_MISMATCH
                         The server has received a request that
                         specifies an unsupported minor version.  The
                         server must return a COMPOUND4res with a zero
                         length operations result array.

   NFS4ERR_MLINK         Too many hard links.

   NFS4ERR_MOVED         The filesystem which contains the current
                         filehandle object has been relocated or
                         migrated to another server.  The client may
                         obtain the new filesystem location by obtaining
                         the "fs_locations" attribute for the current
                         filehandle.  For further discussion, refer to
                         the section "Filesystem Migration or
                         Relocation".

   NFS4ERR_NAMETOOLONG   The filename in an operation was too long.

   NFS4ERR_NODEV         No such device.

   NFS4ERR_NOENT         No such file or directory. The file or
                         directory name specified does not exist.

   NFS4ERR_NOFILEHANDLE  The logical current file handle value has not
                         been set properly.  This may be a result of a
                         malformed COMPOUND operation (i.e. no PUTFH or
                         PUTROOTFH before an operation that requires the
                         current file handle be set).

   NFS4ERR_NOSPC         No space left on device. The operation would
                         have caused the server's file system to exceed
                         its limit.

   NFS4ERR_NOTDIR        Not a directory. The caller specified a non-
                         directory in a directory operation.

   NFS4ERR_NOTEMPTY      An attempt was made to remove a directory that
                         was not empty.

   NFS4ERR_NOTSUPP       Operation is not supported.

   NFS4ERR_NOT_SAME      This error is returned by the VERIFY operation
                         to signify that the attributes compared were
                         not the same as provided in the client's
                         request.

   NFS4ERR_NXIO          I/O error. No such device or address.

   NFS4ERR_OLD_STATEID   A stateid which designates the locking state
                         for a lockowner-file at an earlier time was
                         used.

   NFS4ERR_PERM          Not owner. The operation was not allowed
                         because the caller is either not a privileged
                         user (root) or not the owner of the target of
                         the operation.

   NFS4ERR_READDIR_NOSPC The encoded response to a READDIR request
                         exceeds the size limit set by the initial
                         request.

   NFS4ERR_RESOURCE      For the processing of the COMPOUND procedure,
                         the server may exhaust available resources and
                         cannot continue processing operations within
                         the COMPOUND procedure.  This error will be
                         returned from the server in those instances of
                         resource exhaustion related to the processing
                         of the COMPOUND procedure.

   NFS4ERR_ROFS          Read-only file system. A modifying operation
                         was attempted on a read-only file system.

   NFS4ERR_SAME          This error is returned by the NVERIFY operation
                         to signify that the attributes compared were
                         the same as provided in the client's request.

   NFS4ERR_SERVERFAULT   An error occurred on the server which does not
                         map to any of the legal NFS version 4 protocol
                         error values.  The client should translate this
                         into an appropriate error.  UNIX clients may
                         choose to translate this to EIO.

   NFS4ERR_SHARE_DENIED  An attempt to OPEN a file with a share
                         reservation has failed because of a share
                         conflict.

   NFS4ERR_STALE         Invalid file handle. The file handle given in
                         the arguments was invalid. The file referred to
                         by that file handle no longer exists or access
                         to it has been revoked.

   NFS4ERR_STALE_CLIENTID A clientid not recognized by the server was
                         used in a locking or SETCLIENTID_CONFIRM
                         request.

   NFS4ERR_STALE_STATEID A stateid generated by an earlier server
                         instance was used.

   NFS4ERR_SYMLINK       The current file handle provided for a LOOKUP
                         is not a directory but a symbolic link.  Also
                         used if the final component of the OPEN path is
                         a symbolic link.

   NFS4ERR_TOOSMALL      Buffer or request is too small.

   NFS4ERR_WRONGSEC      The security mechanism being used by the client
                         for the procedure does not match the server's
                         security policy.  The client should change the
                         security mechanism being used and retry the
                         operation.

   NFS4ERR_XDEV          Attempt to do a cross-device hard link.
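
   Several of the definitions above tell the client what to do next.
   The following sketch gathers those hints into one function.  The
   function name client_action and the returned action labels are
   hypothetical, and errors are identified by name because their
   numeric values are not listed in this section.

```python
import errno
from typing import Optional

# Errors whose descriptions above suggest waiting and trying again.
_RETRY_AFTER_WAIT = {"NFS4ERR_DELAY", "NFS4ERR_DENIED"}


def client_action(status: str) -> str:
    """Return a coarse next step for a status name (illustrative only)."""
    if status == "NFS4_OK":
        return "done"
    if status in _RETRY_AFTER_WAIT:
        return "wait-and-retry"
    if status == "NFS4ERR_WRONGSEC":
        return "change-security-mechanism-and-retry"
    return "report-failure"


def unix_errno_for(status: str) -> Optional[int]:
    # Only the translation mentioned in the text is encoded here:
    # UNIX clients may choose to map NFS4ERR_SERVERFAULT to EIO.
    return errno.EIO if status == "NFS4ERR_SERVERFAULT" else None
```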

13.  NFS Version 4 Requests

   For the NFS version 4 RPC program, there are two traditional RPC
   procedures: NULL and COMPOUND.  All other functionality is defined as
   a set of operations and these operations are defined in normal
   XDR/RPC syntax and semantics.  However, these operations are
   encapsulated within the COMPOUND procedure.  This requires that the
   client combine one or more of the NFS version 4 operations into a
   single request.

   The NFS4_CALLBACK program is used to provide server to client
   signaling and is constructed in a similar fashion as the NFS version
   4 program.  The procedures CB_NULL and CB_COMPOUND are defined in the
   same way as NULL and COMPOUND are within the NFS program.  The
   CB_COMPOUND request also encapsulates the remaining operations of the
   NFS4_CALLBACK program.  There is no predefined RPC program number for
   the NFS4_CALLBACK program.  It is up to the client to specify a
   program number in the "transient" program range.  The program and
   port number of the NFS4_CALLBACK program are provided by the client
   as part of the SETCLIENTID operation and therefore are fixed for the
   life of the client instantiation.

13.1.  Compound Procedure

   The COMPOUND procedure provides the opportunity for better
   performance within high latency networks.  The client can avoid
   cumulative latency of multiple RPCs by combining multiple dependent
   operations into a single COMPOUND procedure.  A compound operation
   may provide for protocol simplification by allowing the client to
   combine basic procedures into a single request that is customized for
   the client's environment.

   The CB_COMPOUND procedure precisely parallels the features of
   COMPOUND as described above.

   The basic construction of the COMPOUND procedure is:

                  | op + args | op + args | op + args |

   and the reply looks like this:

      |last status | status + op + results | status + op + results |

13.2.  Evaluation of a Compound Request

   The server will process the COMPOUND procedure by evaluating each of
   the operations within the COMPOUND procedure in order.  Each
   component operation consists of a 32 bit operation code, followed by
   the argument of length determined by the type of operation. The
   results of each operation are encoded in sequence into a reply
   buffer.  The results of each operation are preceded by the opcode and
   a status code (normally zero).  If an operation results in a non-zero
   status code, the status will be encoded and evaluation of the
   compound sequence will halt and the reply will be returned.  Note
   that evaluation stops even in the event of "non error" conditions
   such as NFS4ERR_SAME.
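
   The evaluation order described above can be sketched as a simple
   loop.  This is an illustration only: evaluate_compound, the handler
   callables, and the opcode and status numbers used in the example
   are hypothetical, apart from zero meaning success.

```python
from typing import Callable, List, Tuple

NFS4_OK = 0  # "normally zero" status for a successful operation

# An operation is modeled as (opcode, handler); the handler returns
# (status, result).  Real operations carry XDR-encoded arguments.
Operation = Tuple[int, Callable[[], Tuple[int, object]]]
Result = Tuple[int, int, object]  # (status, opcode, result)


def evaluate_compound(ops: List[Operation]) -> Tuple[int, List[Result]]:
    """Evaluate operations in order, halting at the first non-zero status."""
    results: List[Result] = []
    last_status = NFS4_OK
    for opcode, handler in ops:
        status, result = handler()
        last_status = status
        results.append((status, opcode, result))
        if status != NFS4_OK:
            break  # later operations are not evaluated
    return last_status, results


# Example: the second operation fails, so the third is never run.
request = [
    (100, lambda: (NFS4_OK, "first")),
    (101, lambda: (7, None)),
    (102, lambda: (NFS4_OK, "third")),
]
status, results = evaluate_compound(request)
```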

   There are no atomicity requirements for the operations contained
   within the COMPOUND procedure.  The operations being evaluated as
   part of a COMPOUND request may be evaluated simultaneously with other
   COMPOUND requests that the server receives.

   The client is responsible for recovering from any partially
   completed COMPOUND procedure.  Partially completed COMPOUND
   procedures may occur at any point due to errors such as
   NFS4ERR_RESOURCE and NFS4ERR_DELAY.  This may occur even given
   an otherwise valid operation string.  Further, a server reboot which
   occurs in the middle of processing a COMPOUND procedure may leave the
   client with the difficult task of determining how far COMPOUND
   processing has proceeded.  Therefore, the client should avoid overly
   complex COMPOUND procedures, so that recovery remains manageable if
   an operation within the procedure fails.

   Each operation assumes a "current" and "saved" filehandle that are
   available as part of the execution context of the compound request.
   Operations may set, change, or return the current filehandle.  The
   "saved" filehandle is used for temporary storage of a filehandle
   value and as operands for the RENAME and LINK operations.
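
   The execution context can be pictured as a small object holding the
   two filehandles.  In this sketch the class CompoundContext and its
   method names are hypothetical; only the notions of a current and a
   saved filehandle, and the NFS4ERR_NOFILEHANDLE condition, come from
   this document.

```python
from typing import Optional, Tuple


class CompoundContext:
    """Per-request state shared by the operations of one COMPOUND."""

    def __init__(self) -> None:
        self.current_fh: Optional[bytes] = None
        self.saved_fh: Optional[bytes] = None

    def require_current(self) -> bytes:
        # Corresponds to the NFS4ERR_NOFILEHANDLE condition.
        if self.current_fh is None:
            raise LookupError("NFS4ERR_NOFILEHANDLE")
        return self.current_fh

    def put_fh(self, fh: bytes) -> None:
        # An operation that sets the current filehandle.
        self.current_fh = fh

    def save_fh(self) -> None:
        # Copy the current filehandle into temporary storage.
        self.saved_fh = self.require_current()

    def two_handle_operands(self) -> Tuple[bytes, bytes]:
        # Operations in the style of RENAME or LINK read both values.
        if self.saved_fh is None:
            raise LookupError("NFS4ERR_NOFILEHANDLE")
        return self.saved_fh, self.require_current()
```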

13.3.  Synchronous Modifying Operations

   NFS version 4 operations that modify the file system are synchronous.
   When an operation is successfully completed at the server, the client
   can rely on any data associated with the request being on stable
   storage (the one exception is in the case of the file data in a WRITE
   operation with the UNSTABLE option specified).

   This implies that any previous operations within the same compound
   request are also reflected in stable storage.  This behavior enables
   the client to recover from a partially executed compound
   request which may have resulted from the failure of the server.  For
   example, if a compound request contains operations A and B and the
   server is unable to send a response to the client, depending on the
   progress the server made in servicing the request the result of both
   operations may be reflected in stable storage or just operation A may
   be reflected.  The server must not have just the results of operation
   B in stable storage.
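
   Put differently, the operations reflected in stable storage must
   form a leading portion of the request.  A minimal check of that
   property, with the hypothetical name is_allowed_durable_state, might
   look like this:

```python
from typing import Sequence


def is_allowed_durable_state(
    request_ops: Sequence[str], durable_ops: Sequence[str]
) -> bool:
    """True if durable_ops is a leading portion (prefix) of request_ops."""
    return list(durable_ops) == list(request_ops[: len(durable_ops)])


request = ["A", "B"]
print(is_allowed_durable_state(request, ["A"]))       # only A is durable
print(is_allowed_durable_state(request, ["A", "B"]))  # both are durable
print(is_allowed_durable_state(request, ["B"]))       # B without A
```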

13.4.  Operation Values

   The operations encoded in the COMPOUND procedure are identified by
   operation values.  To avoid overlap with the RPC procedure numbers,
   operations 0 (zero) and 1 are not defined.  Operation 2 is not
   defined but reserved for future use with minor versioning.
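
   A decoder might screen operation values before dispatching them.
   The sketch below encodes only the three special values named above;
   the function name and the returned labels are hypothetical, and the
   assignment of values 3 and higher is outside this excerpt.

```python
def classify_operation_value(value: int) -> str:
    """Classify an operation value from a COMPOUND request."""
    if value < 0:
        raise ValueError("operation values are unsigned")
    if value in (0, 1):
        # Left undefined to avoid overlap with the RPC procedure numbers.
        return "undefined"
    if value == 2:
        # Not defined, but reserved for future use with minor versioning.
        return "reserved"
    # Whether the value names a real operation must be checked against
    # the operation list for the minor version in use.
    return "possibly-assigned"
```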
