Tech-invite3GPPspaceIETFspace
959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 7530

Network File System (NFS) Version 4 Protocol

Pages: 323
Proposed Standard
Errata
Obsoletes:  3530
Updated by:  79318587
Part 9 of 14 – Pages 162 to 178
First   Prev   Next

Top   ToC   RFC7530 - Page 162   prevText

10.5. Data Caching and Revocation

When locks and delegations are revoked, the assumptions upon which successful caching depend are no longer guaranteed. For any locks or share reservations that have been revoked, the corresponding owner needs to be notified. This notification includes applications with a file open that has a corresponding delegation that has been revoked.
Top   ToC   RFC7530 - Page 163
   Cached data associated with the revocation must be removed from the
   client.  In the case of modified data existing in the client's cache,
   that data must be removed from the client without it being written to
   the server.  As mentioned, the assumptions made by the client are no
   longer valid at the point when a lock or delegation has been revoked.
   For example, another client may have been granted a conflicting lock
   after the revocation of the lock at the first client.  Therefore, the
   data within the lock range may have been modified by the other
   client.  Obviously, the first client is unable to guarantee to the
   application what has occurred to the file in the case of revocation.

   Notification to a lock-owner will in many cases consist of simply
   returning an error on the next and all subsequent READs/WRITEs to the
   open file or on the close.  Where the methods available to a client
   make such notification impossible because errors for certain
   operations may not be returned, more drastic action, such as signals
   or process termination, may be appropriate.  The justification for
   this is that an invariant on which an application depends may be
   violated.  Depending on how errors are typically treated for the
   client operating environment, further levels of notification,
   including logging, console messages, and GUI pop-ups, may be
   appropriate.

10.5.1. Revocation Recovery for Write Open Delegation

Revocation recovery for an OPEN_DELEGATE_WRITE delegation poses the special issue of modified data in the client cache while the file is not open. In this situation, any client that does not flush modified data to the server on each close must ensure that the user receives appropriate notification of the failure as a result of the revocation. Since such situations may require human action to correct problems, notification schemes in which the appropriate user or administrator is notified may be necessary. Logging and console messages are typical examples. If there is modified data on the client, it must not be flushed normally to the server. A client may attempt to provide a copy of the file data as modified during the delegation under a different name in the file system namespace to ease recovery. Note that when the client can determine that the file has not been modified by any other client, or when the client has a complete cached copy of the file in question, such a saved copy of the client's view of the file may be of particular value for recovery. In other cases, recovery using a copy of the file, based partially on the client's cached data and partially on the server copy as modified by other clients, will be anything but straightforward, so clients may avoid saving file contents in these situations or mark the results specially to warn users of possible problems.
Top   ToC   RFC7530 - Page 164
   The saving of such modified data in delegation revocation situations
   may be limited to files of a certain size or might be used only when
   sufficient disk space is available within the target file system.
   Such saving may also be restricted to situations when the client has
   sufficient buffering resources to keep the cached copy available
   until it is properly stored to the target file system.

10.6. Attribute Caching

The attributes discussed in this section do not include named attributes. Individual named attributes are analogous to files, and caching of the data for these needs to be handled just as data caching is for regular files. Similarly, LOOKUP results from an OPENATTR directory are to be cached on the same basis as any other pathnames and similarly for directory contents. Clients may cache file attributes obtained from the server and use them to avoid subsequent GETATTR requests. This cache is write through caching in that any modifications to the file attributes are always done by means of requests to the server, which means the modifications should not be done locally and should not be cached. Exceptions to this are modifications to attributes that are intimately connected with data caching. Therefore, extending a file by writing data to the local data cache is reflected immediately in the size as seen on the client without this change being immediately reflected on the server. Normally, such changes are not propagated directly to the server, but when the modified data is flushed to the server, analogous attribute changes are made on the server. When open delegation is in effect, the modified attributes may be returned to the server in the response to a CB_GETATTR call. The result of local caching of attributes is that the attribute caches maintained on individual clients will not be coherent. Changes made in one order on the server may be seen in a different order on one client and in a third order on a different client. The typical file system application programming interfaces do not provide means to atomically modify or interrogate attributes for multiple files at the same time. The following rules provide an environment where the potential incoherency mentioned above can be reasonably managed. These rules are derived from the practice of previous NFS protocols. o All attributes for a given file (per-fsid attributes excepted) are cached as a unit at the client so that no non-serializability can arise within the context of a single file.
Top   ToC   RFC7530 - Page 165
   o  An upper time boundary is maintained on how long a client cache
      entry can be kept without being refreshed from the server.

   o  When operations are performed that modify attributes at the
      server, the updated attribute set is requested as part of the
      containing RPC.  This includes directory operations that update
      attributes indirectly.  This is accomplished by following the
      modifying operation with a GETATTR operation and then using the
      results of the GETATTR to update the client's cached attributes.

   Note that if the full set of attributes to be cached is requested by
   READDIR, the results can be cached by the client on the same basis as
   attributes obtained via GETATTR.

   A client may validate its cached version of attributes for a file by
   only fetching both the change and time_access attributes and assuming
   that if the change attribute has the same value as it did when the
   attributes were cached, then no attributes other than time_access
   have changed.  The time_access attribute is also fetched because many
   servers operate in environments where the operation that updates
   change does not update time_access.  For example, POSIX file
   semantics do not update access time when a file is modified by the
   write system call.  Therefore, the client that wants a current
   time_access value should fetch it with change during the attribute
   cache validation processing and update its cached time_access.

   The client may maintain a cache of modified attributes for those
   attributes intimately connected with data of modified regular files
   (size, time_modify, and change).  Other than those three attributes,
   the client MUST NOT maintain a cache of modified attributes.
   Instead, attribute changes are immediately sent to the server.

   In some operating environments, the equivalent to time_access is
   expected to be implicitly updated by each read of the content of the
   file object.  If an NFS client is caching the content of a file
   object, whether it is a regular file, directory, or symbolic link,
   the client SHOULD NOT update the time_access attribute (via SETATTR
   or a small READ or READDIR request) on the server with each read that
   is satisfied from cache.  The reason is that this can defeat the
   performance benefits of caching content, especially since an explicit
   SETATTR of time_access may alter the change attribute on the server.
   If the change attribute changes, clients that are caching the content
   will think the content has changed and will re-read unmodified data
   from the server.  Nor is the client encouraged to maintain a modified
   version of time_access in its cache, since this would mean that the
   client either will eventually have to write the access time to the
   server with bad performance effects or would never update the
   server's time_access, thereby resulting in a situation where an
Top   ToC   RFC7530 - Page 166
   application that caches access time between a close and open of the
   same file observes the access time oscillating between the past and
   present.  The time_access attribute always means the time of last
   access to a file by a READ that was satisfied by the server.  This
   way, clients will tend to see only time_access changes that go
   forward in time.

10.7. Data and Metadata Caching and Memory-Mapped Files

Some operating environments include the capability for an application to map a file's content into the application's address space. Each time the application accesses a memory location that corresponds to a block that has not been loaded into the address space, a page fault occurs and the file is read (or if the block does not exist in the file, the block is allocated and then instantiated in the application's address space). As long as each memory-mapped access to the file requires a page fault, the relevant attributes of the file that are used to detect access and modification (time_access, time_metadata, time_modify, and change) will be updated. However, in many operating environments, when page faults are not required, these attributes will not be updated on reads or updates to the file via memory access (regardless of whether the file is a local file or is being accessed remotely). A client or server MAY fail to update attributes of a file that is being accessed via memory-mapped I/O. This has several implications: o If there is an application on the server that has memory mapped a file that a client is also accessing, the client may not be able to get a consistent value of the change attribute to determine whether its cache is stale or not. A server that knows that the file is memory mapped could always pessimistically return updated values for change so as to force the application to always get the most up-to-date data and metadata for the file. However, due to the negative performance implications of this, such behavior is OPTIONAL. o If the memory-mapped file is not being modified on the server and instead is just being read by an application via the memory-mapped interface, the client will not see an updated time_access attribute. However, in many operating environments, neither will any process running on the server. Thus, NFS clients are at no disadvantage with respect to local processes. o If there is another client that is memory mapping the file and if that client is holding an OPEN_DELEGATE_WRITE delegation, the same set of issues as discussed in the previous two bullet items apply. So, when a server does a CB_GETATTR to a file that the client has
Top   ToC   RFC7530 - Page 167
      modified in its cache, the response from CB_GETATTR will not
      necessarily be accurate.  As discussed earlier, the client's
      obligation is to report that the file has been modified since the
      delegation was granted, not whether it has been modified again
      between successive CB_GETATTR calls, and the server MUST assume
      that any file the client has modified in cache has been modified
      again between successive CB_GETATTR calls.  Depending on the
      nature of the client's memory management system, this weak
      obligation may not be possible.  A client MAY return stale
      information in CB_GETATTR whenever the file is memory mapped.

   o  The mixture of memory mapping and file locking on the same file is
      problematic.  Consider the following scenario, where the page size
      on each client is 8192 bytes.

      *  Client A memory maps first page (8192 bytes) of file X.

      *  Client B memory maps first page (8192 bytes) of file X.

      *  Client A write locks first 4096 bytes.

      *  Client B write locks second 4096 bytes.

      *  Client A, via a STORE instruction, modifies part of its locked
         region.

      *  Simultaneous to client A, client B issues a STORE on part of
         its locked region.

   Here, the challenge is for each client to resynchronize to get a
   correct view of the first page.  In many operating environments, the
   virtual memory management systems on each client only know a page is
   modified, not that a subset of the page corresponding to the
   respective lock regions has been modified.  So it is not possible for
   each client to do the right thing, which is to only write to the
   server that portion of the page that is locked.  For example, if
   client A simply writes out the page, and then client B writes out the
   page, client A's data is lost.

   Moreover, if mandatory locking is enabled on the file, then we have a
   different problem.  When clients A and B issue the STORE
   instructions, the resulting page faults require a byte-range lock on
   the entire page.  Each client then tries to extend their locked range
   to the entire page, which results in a deadlock.

   Communicating the NFS4ERR_DEADLOCK error to a STORE instruction is
   difficult at best.
Top   ToC   RFC7530 - Page 168
   If a client is locking the entire memory-mapped file, there is no
   problem with advisory or mandatory byte-range locking, at least until
   the client unlocks a region in the middle of the file.

   Given the above issues, the following are permitted:

   o  Clients and servers MAY deny memory mapping a file they know there
      are byte-range locks for.

   o  Clients and servers MAY deny a byte-range lock on a file they know
      is memory mapped.

   o  A client MAY deny memory mapping a file that it knows requires
      mandatory locking for I/O.  If mandatory locking is enabled after
      the file is opened and mapped, the client MAY deny the application
      further access to its mapped file.

10.8. Name Caching

The results of LOOKUP and READDIR operations may be cached to avoid the cost of subsequent LOOKUP operations. Just as in the case of attribute caching, inconsistencies may arise among the various client caches. To mitigate the effects of these inconsistencies and given the context of typical file system APIs, an upper time boundary is maintained on how long a client name cache entry can be kept without verifying that the entry has not been made invalid by a directory change operation performed by another client. When a client is not making changes to a directory for which there exist name cache entries, the client needs to periodically fetch attributes for that directory to ensure that it is not being modified. After determining that no modification has occurred, the expiration time for the associated name cache entries may be updated to be the current time plus the name cache staleness bound. When a client is making changes to a given directory, it needs to determine whether there have been changes made to the directory by other clients. It does this by using the change attribute as reported before and after the directory operation in the associated change_info4 value returned for the operation. The server is able to communicate to the client whether the change_info4 data is provided atomically with respect to the directory operation. If the change values are provided atomically, the client is then able to compare the pre-operation change value with the change value in the client's name cache. If the comparison indicates that the directory was updated by another client, the name cache associated with the modified directory is purged from the client. If the comparison indicates no modification, the name cache can be updated on the
Top   ToC   RFC7530 - Page 169
   client to reflect the directory operation and the associated timeout
   extended.  The post-operation change value needs to be saved as the
   basis for future change_info4 comparisons.

   As demonstrated by the scenario above, name caching requires that the
   client revalidate name cache data by inspecting the change attribute
   of a directory at the point when the name cache item was cached.
   This requires that the server update the change attribute for
   directories when the contents of the corresponding directory are
   modified.  For a client to use the change_info4 information
   appropriately and correctly, the server must report the pre- and
   post-operation change attribute values atomically.  When the server
   is unable to report the before and after values atomically with
   respect to the directory operation, the server must indicate that
   fact in the change_info4 return value.  When the information is not
   atomically reported, the client should not assume that other clients
   have not changed the directory.

10.9. Directory Caching

The results of READDIR operations may be used to avoid subsequent READDIR operations. Just as in the cases of attribute and name caching, inconsistencies may arise among the various client caches. To mitigate the effects of these inconsistencies, and given the context of typical file system APIs, the following rules should be followed: o Cached READDIR information for a directory that is not obtained in a single READDIR operation must always be a consistent snapshot of directory contents. This is determined by using a GETATTR before the first READDIR and after the last READDIR that contributes to the cache. o An upper time boundary is maintained to indicate the length of time a directory cache entry is considered valid before the client must revalidate the cached information. The revalidation technique parallels that discussed in the case of name caching. When the client is not changing the directory in question, checking the change attribute of the directory with GETATTR is adequate. The lifetime of the cache entry can be extended at these checkpoints. When a client is modifying the directory, the client needs to use the change_info4 data to determine whether there are other clients modifying the directory. If it is determined that no other client modifications are occurring, the client may update its directory cache to reflect its own changes.
Top   ToC   RFC7530 - Page 170
   As demonstrated previously, directory caching requires that the
   client revalidate directory cache data by inspecting the change
   attribute of a directory at the point when the directory was cached.
   This requires that the server update the change attribute for
   directories when the contents of the corresponding directory are
   modified.  For a client to use the change_info4 information
   appropriately and correctly, the server must report the pre- and
   post-operation change attribute values atomically.  When the server
   is unable to report the before and after values atomically with
   respect to the directory operation, the server must indicate that
   fact in the change_info4 return value.  When the information is not
   atomically reported, the client should not assume that other clients
   have not changed the directory.

11. Minor Versioning

To address the requirement of an NFS protocol that can evolve as the need arises, the NFSv4 protocol contains the rules and framework to allow for future minor changes or versioning. The base assumption with respect to minor versioning is that any future accepted minor version must follow the IETF process and be documented in a Standards Track RFC. Therefore, each minor version number will correspond to an RFC. Minor version 0 of the NFSv4 protocol is represented by this RFC. The COMPOUND and CB_COMPOUND procedures support the encoding of the minor version being requested by the client. Future minor versions will extend, rather than replace, the XDR for the preceding minor version, as had been done in moving from NFSv2 to NFSv3 and from NFSv3 to NFSv4.0. Specification of detailed rules for the construction of minor versions will be addressed in documents defining early minor versions or, more desirably, in an RFC establishing a versioning framework for NFSv4 as a whole.

12. Internationalization

12.1. Introduction

Internationalization is a complex topic with its own set of terminology (see [RFC6365]). The topic is made more complex in NFSv4.0 by the tangled history and state of NFS implementations. This section describes what we might call "NFSv4.0 internationalization" (i.e., internationalization as implemented by existing clients and servers) as the basis upon which NFSv4.0 clients may implement internationalization support.
Top   ToC   RFC7530 - Page 171
   This section is based on the behavior of existing implementations.
   Note that the behaviors described are each demonstrated by a
   combination of an NFSv4 server implementation proper and a
   server-side physical file system.  It is common for servers and
   physical file systems to be configurable as to the behavior shown.
   In the discussion below, each configuration that shows different
   behavior is considered separately.

   Note that in this section, the key words "MUST", "SHOULD", and "MAY"
   retain their normal meanings.  However, in deriving this
   specification from implementation patterns, we document below how the
   normative terms used derive from the behavior of existing
   implementations, in those situations in which existing implementation
   behavior patterns can be determined.

   o  Behavior implemented by all existing clients or servers is
      described using "MUST", since new implementations need to follow
      existing ones to be assured of interoperability.  While it is
      possible that different behavior might be workable, we have found
      no case where this seems reasonable.

      The converse holds for "MUST NOT": if a type of behavior poses
      interoperability problems, it MUST NOT be implemented by any
      existing clients or servers.

   o  Behavior implemented by most existing clients or servers, where
      that behavior is more desirable than any alternative, is described
      using "SHOULD", since new implementations need to follow that
      existing practice unless there are strong reasons to do otherwise.

      The converse holds for "SHOULD NOT".

   o  Behavior implemented by some, but not all, existing clients or
      servers is described using "MAY", indicating that new
      implementations have a choice as to whether they will behave in
      that way.  Thus, new implementations will have the same
      flexibility that existing ones do.

   o  Behavior implemented by all existing clients or servers, so far as
      is known -- but where there remains some uncertainty as to details
      -- is described using "should".  Such cases primarily concern
      details of error returns.  New implementations should follow
      existing practice even though such situations generally do not
      affect interoperability.

   There are also cases in which certain server behaviors, while not
   known to exist, cannot be reliably determined not to exist.  In part,
   this is a consequence of the long period of time that has elapsed
Top   ToC   RFC7530 - Page 172
   since the publication of [RFC3530], resulting in a situation in which
   those involved in the implementation may no longer be involved in or
   aware of working group activities.

   In the case of possible server behavior that is neither known to
   exist nor known not to exist, we use "SHOULD NOT" and "MUST NOT" as
   follows, and similarly for "SHOULD" and "MUST".

   o  In some cases, the potential behavior is not known to exist but is
      of such a nature that, if it were in fact implemented,
      interoperability difficulties would be expected and reported,
      giving us cause to conclude that the potential behavior is not
      implemented.  For such behavior, we use "MUST NOT".  Similarly, we
      use "MUST" to apply to the contrary behavior.

   o  In other cases, potential behavior is not known to exist but the
      behavior, while undesirable, is not of such a nature that we are
      able to draw any conclusions about its potential existence.  In
      such cases, we use "SHOULD NOT".  Similarly, we use "SHOULD" to
      apply to the contrary behavior.

   In the case of a "MAY", "SHOULD", or "SHOULD NOT" that applies to
   servers, clients need to be aware that there are servers that may or
   may not take the specified action, and they need to be prepared for
   either eventuality.

12.2. Limitations on Internationalization-Related Processing in the NFSv4 Context

There are a number of noteworthy circumstances that limit the degree to which internationalization-related processing can be made universal with regard to NFSv4 clients and servers: o The NFSv4 client is part of an extensive set of client-side software components whose design and internal interfaces are not within the IETF's purview, limiting the degree to which a particular character encoding may be made standard. o Server-side handling of file component names is typically implemented within a server-side physical file system, whose handling of character encoding and normalization is not specifiable by the IETF. o Typical implementation patterns in UNIX systems result in the NFSv4 client having no knowledge of the character encoding being used, which may even vary between processes on the same client system.
Top   ToC   RFC7530 - Page 173
   o  Users may need access to files stored previously with non-UTF-8
      encodings, or with UTF-8 encodings that do not match any
      particular normalization form.

12.3. Summary of Server Behavior Types

As mentioned in Section 12.6, servers MAY reject component name strings that are not valid UTF-8. This leads to a number of types of valid server behavior, as outlined below. When these are combined with the valid normalization-related behaviors as described in Section 12.4, this leads to the combined behaviors outlined below. o Servers that limit file component names to UTF-8 strings exist with normalization-related handling as described in Section 12.4. These are best described as "UTF-8-only servers". o Servers that do not limit file component names to UTF-8 strings are very common and are necessary to deal with clients/ applications not oriented to the use of UTF-8. Such servers ignore normalization-related issues, and there is no way for them to implement either normalization or representation-independent lookups. These are best described as "UTF-8-unaware servers", since they treat file component names as uninterpreted strings of bytes and have no knowledge of the characters represented. See Section 12.7 for details. o It is possible for a server to allow component names that are not valid UTF-8, while still being aware of the structure of UTF-8 strings. Such servers could implement either normalization or representation-independent lookups but apply those techniques only to valid UTF-8 strings. Such servers are not common, but it is possible to configure at least one known server to have this behavior. This behavior SHOULD NOT be used due to the possibility that a filename using one character set may, by coincidence, have the appearance of a UTF-8 filename; the results of UTF-8 normalization or representation-independent lookups are unlikely to be correct in all cases with respect to the other character set.

12.4. String Encoding

Strings that potentially contain characters outside the ASCII range [RFC20] are generally represented in NFSv4 using the UTF-8 encoding [RFC3629] of Unicode [UNICODE]. See [RFC3629] for precise encoding and decoding rules.
Top   ToC   RFC7530 - Page 174
   Some details of the protocol treatment depend on the type of string:

   o  For strings that are component names, the preferred encoding for
      any non-ASCII characters is the UTF-8 representation of Unicode.

      In many cases, clients have no knowledge of the encoding being
      used, with the encoding done at the user level under the control
      of a per-process locale specification.  As a result, it may be
      impossible for the NFSv4 client to enforce the use of UTF-8.  The
      use of non-UTF-8 encodings can be problematic, since it may
      interfere with access to files stored using other forms of name
      encoding.  Also, normalization-related processing (see
      Section 12.5) of a string not encoded in UTF-8 could result in
      inappropriate name modification or aliasing.  In cases in which
      one has a non-UTF-8 encoded name that accidentally conforms to
      UTF-8 rules, substitution of canonically equivalent strings can
      change the non-UTF-8 encoded name drastically.

      The kinds of modification and aliasing mentioned here can lead to
      both false negatives and false positives, depending on the strings
      in question, which can result in security issues such as elevation
      of privilege and denial of service (see [RFC6943] for further
      discussion).

   o  For strings based on domain names, non-ASCII characters MUST be
      represented using the UTF-8 encoding of Unicode, and additional
      string format restrictions apply.  See Section 12.6 for details.

   o  The contents of symbolic links (of type linktext4 in the XDR) MUST
      be treated as opaque data by NFSv4 servers.  Although UTF-8
      encoding is often used, it need not be.  In this respect, the
      contents of symbolic links are like the contents of regular files
      in that their encoding is not within the scope of this
      specification.

   o  For other sorts of strings, any non-ASCII characters SHOULD be
      represented using the UTF-8 encoding of Unicode.

12.5. Normalization

The client and server operating environments may differ in their policies and operational methods with respect to character normalization (see [UNICODE] for a discussion of normalization forms). This difference may also exist between applications on the same client. This adds to the difficulty of providing a single normalization policy for the protocol that allows for maximal interoperability. This issue is similar to the issues of character case where the server may or may not support case-insensitive
Top   ToC   RFC7530 - Page 175
   filename matching and may or may not preserve the character case when
   storing filenames.  The protocol does not mandate a particular
   behavior but allows for a range of useful behaviors.

   The NFSv4 protocol does not mandate the use of a particular
   normalization form at this time.  A subsequent minor version of the
   NFSv4 protocol might specify a particular normalization form.
   Therefore, the server and client can expect that they may receive
   unnormalized characters within protocol requests and responses.  If
   the operating environment requires normalization, then the
   implementation will need to normalize the various UTF-8 encoded
   strings within the protocol before presenting the information to an
   application (at the client) or local file system (at the server).

   Server implementations MAY normalize filenames to conform to a
   particular normalization form before using the resulting string when
   looking up or creating a file.  Servers MAY also perform
   normalization-insensitive string comparisons without modifying the
   names to match a particular normalization form.  Except in cases in
   which component names are excluded from normalization-related
   handling because they are not valid UTF-8 strings, a server MUST make
   the same choice (as to whether to normalize or not, the target form
   of normalization, and whether to do normalization-insensitive string
   comparisons) in the same way for all accesses to a particular file
   system.  Servers SHOULD NOT reject a filename because it does not
   conform to a particular normalization form, as this may deny access
   to clients that use a different normalization form.

12.6. Types with Processing Defined by Other Internet Areas

There are two types of strings that NFSv4 deals with that are based on domain names. Processing of such strings is defined by other Internet standards, and hence the processing behavior for such strings should be consistent across all server operating systems and server file systems. These are as follows: o Server names as they appear in the fs_locations attribute. Note that for most purposes, such server names will only be sent by the server to the client. The exception is the use of the fs_locations attribute in a VERIFY or NVERIFY operation. o Principal suffixes that are used to denote sets of users and groups, and are in the form of domain names.
Top   ToC   RFC7530 - Page 176
   The general rules for handling all of these domain-related strings
   are similar and independent of the role of the sender or receiver as
   client or server, although the consequences of failure to obey these
   rules may be different for client or server.  The server can report
   errors when it is sent invalid strings, whereas the client will
   simply ignore invalid string or use a default value in their place.

   The string sent SHOULD be in the form of one or more U-labels as
   defined by [RFC5890].  If that is impractical, it can instead be in
   the form of one or more LDH labels [RFC5890] or a UTF-8 domain name
   that contains labels that are not properly formatted U-labels.  The
   receiver needs to be able to accept domain and server names in any of
   the formats allowed.  The server MUST reject, using the error
   NFS4ERR_INVAL, a string that is not valid UTF-8, or that contains an
   ASCII label that is not a valid LDH label, or that contains an
   XN-label (begins with "xn--") for which the characters after "xn--"
   are not valid output of the Punycode algorithm [RFC3492].

   When a domain string is part of id@domain or group@domain, there are
   two possible approaches:

   1.  The server treats the domain string as a series of U-labels.  In
       cases where the domain string is a series of A-labels or
       Non-Reserved LDH (NR-LDH) labels, it converts them to U-labels
       using the Punycode algorithm [RFC3492].  In cases where the
       domain string is a series of other sorts of LDH labels, the
       server can use the ToUnicode function defined in [RFC3490] to
       convert the string to a series of labels that generally conform
       to the U-label syntax.  In cases where the domain string is a
       UTF-8 string that contains non-U-labels, the server can attempt
       to use the ToASCII function defined in [RFC3490] and then the
       ToUnicode function on the string to convert it to a series of
       labels that generally conform to the U-label syntax.  As a
       result, the domain string returned within a user id on a GETATTR
       may not match that sent when the user id is set using SETATTR,
       although when this happens, the domain will be in the form that
       generally conforms to the U-label syntax.

   2.  The server does not attempt to treat the domain string as a
       series of U-labels; specifically, it does not map a domain string
       that is not a U-label into a U-label using the methods described
       above.  As a result, the domain string returned on a GETATTR of
       the user id MUST be the same as that used when setting the
       user id by the SETATTR.

   A server SHOULD use the first method.
Top   ToC   RFC7530 - Page 177
   For VERIFY and NVERIFY, additional string processing requirements
   apply to verification of the owner and owner_group attributes; see
   Section 5.9.

12.7. Errors Related to UTF-8

Where the client sends an invalid UTF-8 string, the server MAY return an NFS4ERR_INVAL error. This includes cases in which inappropriate prefixes are detected and where the count includes trailing bytes that do not constitute a full Universal Multiple-Octet Coded Character Set (UCS) character. Requirements for server handling of component names that are not valid UTF-8, when a server does not return NFS4ERR_INVAL in response to receiving them, are described in Section 12.8. Where the string supplied by the client is not rejected with NFS4ERR_INVAL but contains characters that are not supported by the server as a value for that string (e.g., names containing slashes, or characters that do not fit into 16 bits when converted from UTF-8 to a Unicode codepoint), the server should return an NFS4ERR_BADCHAR error. Where a UTF-8 string is used as a filename, and the file system, while supporting all of the characters within the name, does not allow that particular name to be used, the server should return the error NFS4ERR_BADNAME. This includes such situations as file system prohibitions of "." and ".." as filenames for certain operations, and similar constraints.

12.8. Servers That Accept File Component Names That Are Not Valid UTF-8 Strings

As stated previously, servers MAY accept, on all or on some subset of the physical file systems exported, component names that are not valid UTF-8 strings. A typical pattern is for a server to use UTF-8-unaware physical file systems that treat component names as uninterpreted strings of bytes, rather than having any awareness of the character set being used. Such servers SHOULD NOT change the stored representation of component names from those received on the wire and SHOULD use an octet-by- octet comparison of component name strings to determine equivalence (as opposed to any broader notion of string comparison). This is because the server has no knowledge of the character encoding being used.
Top   ToC   RFC7530 - Page 178
   Nonetheless, when such a server uses a broader notion of string
   equivalence than what is recommended in the preceding paragraph, the
   following considerations apply:

   o  Outside of 7-bit ASCII, string processing that changes string
      contents is usually specific to a character set and hence is
      generally unsafe when the character set is unknown.  This
      processing could change the filename in an unexpected fashion,
      rendering the file inaccessible to the application or client that
      created or renamed the file and to others expecting the original
      filename.  Hence, such processing should not be performed, because
      doing so is likely to result in incorrect string modification or
      aliasing.

   o  Unicode normalization is particularly dangerous, as such
      processing assumes that the string is UTF-8.  When that assumption
      is false because a different character set was used to create the
      filename, normalization may corrupt the filename with respect to
      that character set, rendering the file inaccessible to the
      application that created it and others expecting the original
      filename.  Hence, Unicode normalization SHOULD NOT be performed,
      because it may cause incorrect string modification or aliasing.

   When the above recommendations are not followed, the resulting string
   modification and aliasing can lead to both false negatives and false
   positives, depending on the strings in question, which can result in
   security issues such as elevation of privilege and denial of service
   (see [RFC6943] for further discussion).



(page 178 continued on part 10)

Next Section