1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0, is described in [30]. NFSv4.1 generally follows the
guidelines for minor versioning that are listed in Section 10 of RFC
3530. However, it diverges from guidelines 11 ("a client and server
that support minor version X must support minor versions 0 through
X-1") and 12 ("no new features may be introduced as mandatory in a
minor version"). These divergences are due to the introduction of
the sessions model for managing non-idempotent operations and the
RECLAIM_COMPLETE operation. These two new features are
infrastructural in nature and simplify implementation of existing and
other new features. Making them anything but REQUIRED would add
undue complexity to protocol definition and implementation. NFSv4.1
accordingly updates the minor versioning guidelines (Section 2.7).
As a minor version, NFSv4.1 is consistent with the overall goals for
NFSv4, but extends the protocol so as to better meet those goals,
based on experiences with NFSv4.0. In addition, NFSv4.1 has adopted
some additional goals, which motivate some of the major extensions in
NFSv4.1.
1.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [1].
1.3. Scope of This Document
This document describes the NFSv4.1 protocol. With respect to
NFSv4.0, this document does not:
o describe the NFSv4.0 protocol, except where needed to contrast
with NFSv4.1.
o modify the specification of the NFSv4.0 protocol.
o clarify the NFSv4.0 protocol.
1.4. NFSv4 Goals
The NFSv4 protocol is a further revision of the NFS protocol already
defined by NFSv3 [31]. It retains the essential characteristics of
previous versions: easy recovery; independence of transport
protocols, operating systems, and file systems; simplicity; and good
performance. NFSv4 has the following goals:
o Improved access and good performance on the Internet
The protocol is designed to transit firewalls easily, perform well
where latency is high and bandwidth is low, and scale to very
large numbers of clients per server.
o Strong security with negotiation built into the protocol
The protocol builds on the work of the ONCRPC working group in
supporting the RPCSEC_GSS protocol. Additionally, the NFSv4.1
protocol provides a mechanism that allows clients and servers to
negotiate security, and it requires clients and servers to
support a minimal set of security schemes.
o Good cross-platform interoperability
The protocol features a file system model that provides a useful,
common set of features that does not unduly favor one file system
or operating system over another.
o Designed for protocol extensions
The protocol is designed to accept standard extensions within a
framework that enables and encourages backward compatibility.
1.5. NFSv4.1 Goals
NFSv4.1 has the following goals, within the framework established by
the overall NFSv4 goals.
o To correct significant structural weaknesses and oversights
discovered in the base protocol.
o To add clarity and specificity to areas left unaddressed or not
addressed in sufficient detail in the base protocol. However, as
stated in Section 1.3, it is not a goal to clarify the NFSv4.0
protocol in the NFSv4.1 specification.
o To add specific features based on experience with the existing
protocol and recent industry developments.
o To provide protocol support to take advantage of clustered server
deployments including the ability to provide scalable parallel
access to files distributed among multiple servers.
1.6. General Definitions
The following definitions provide an appropriate context for the
reader.
Byte: In this document, a byte is an octet, i.e., a datum exactly 8
bits in length.
Client: The client is the entity that accesses the NFS server's
resources. The client may be an application that contains the
logic to access the NFS server directly. The client may also be
the traditional operating system client that provides remote file
system services for a set of applications.
A client is uniquely identified by a client owner.
With reference to byte-range locking, the client is also the
entity that maintains a set of locks on behalf of one or more
applications. This client is responsible for crash or failure
recovery for those locks it manages.
Note that multiple clients may share the same transport and
connection and multiple clients may exist on the same network
node.
Client ID: The client ID is a 64-bit quantity used as a unique,
short-hand reference to a client-supplied verifier and client
owner. The server is responsible for supplying the client ID.
Client Owner: The client owner is a unique string, opaque to the
server, that identifies a client. Multiple network connections
and source network addresses originating from those connections
may share a client owner. The server is expected to treat
requests from connections with the same client owner as coming
from the same client.
File System: The file system is the collection of objects on a
server (as identified by the major identifier of a server owner,
which is defined later in this section) that share the same fsid
attribute (see Section 5.8.1.9).
Lease: A lease is an interval of time defined by the server for
which the client is irrevocably granted locks. At the end of a
lease period, locks may be revoked if the lease has not been
extended. A lock must be revoked if a conflicting lock has been
granted after the lease interval.
A server grants a client a single lease for all state.
Lock: The term "lock" is used to refer to byte-range (in UNIX
environments, also known as record) locks, share reservations,
delegations, or layouts unless specifically stated otherwise.
Secret State Verifier (SSV): The SSV is a unique secret key shared
between a client and server. The SSV serves as the secret key for
an internal (that is, internal to NFSv4.1) Generic Security
Services (GSS) mechanism (the SSV GSS mechanism; see
Section 2.10.9). The SSV GSS mechanism uses the SSV to compute
message integrity code (MIC) and Wrap tokens. See
Section 2.10.8.3 for more details on how NFSv4.1 uses the SSV and
the SSV GSS mechanism.
Server: The server is the entity responsible for coordinating client
access to a set of file systems and is identified by a server
owner. A server can span multiple network addresses.
Server Owner: The server owner identifies the server to the client.
The server owner consists of a major identifier and a minor
identifier. When the client has two connections each to a peer
with the same major identifier, the client assumes that both peers
are the same server (the server namespace is the same via each
connection) and that lock state is sharable across both
connections. When each peer has both the same major and minor
identifiers, the client assumes that each connection might be
associable with the same session.
Stable Storage: Stable storage is storage from which data stored by
an NFSv4.1 server can be recovered without data loss from multiple
power failures (including cascading power failures, that is,
several power failures in quick succession), operating system
failures, and/or hardware failure of components other than the
storage medium itself (such as disk, nonvolatile RAM, flash
memory, etc.).
Some examples of stable storage that are allowable for an NFS
server include:
1. Media commit of data; that is, the modified data has been
successfully written to the disk media, for example, the disk
platter.
2. An immediate reply disk drive with battery-backed, on-drive
intermediate storage or uninterruptible power system (UPS).
3. Server commit of data with battery-backed intermediate storage
and recovery software.
4. Cache commit with uninterruptible power system (UPS) and
recovery software.
Stateid: A stateid is a 128-bit quantity returned by a server that
uniquely defines the open and locking states provided by the
server for a specific open-owner or lock-owner/open-owner pair for
a specific file and type of lock.
Verifier: A verifier is a 64-bit quantity generated by the client
that the server can use to determine if the client has restarted
and lost all previous lock state.
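As an illustration of the Server Owner definition above, the client's reasoning about two connections can be written as a small decision rule. This is a sketch only: the class name, function name, and return strings are hypothetical, and the identifier types are simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    """Hypothetical stand-in for a server owner: a major and a minor identifier."""
    major_id: bytes
    minor_id: int


def relate_peers(a: ServerOwner, b: ServerOwner) -> str:
    """Return what a client may assume about two peers it is connected to."""
    if a.major_id != b.major_id:
        # Nothing in the definition lets the client treat these as one server.
        return "no-assumption"
    if a.minor_id != b.minor_id:
        # Same server and namespace; lock state is sharable across connections.
        return "same-server"
    # Same major and minor: the connections might be associable with one session.
    return "same-server-possibly-same-session"
```

Note that the last case is only a possibility ("might be associable"); the sketch does not attempt to model how that association is verified.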
1.7. Overview of NFSv4.1 Features
The major features of the NFSv4.1 protocol will be reviewed in brief.
This will be done to provide an appropriate context for both the
reader who is familiar with the previous versions of the NFS protocol
and the reader who is new to the NFS protocols. Even a reader new
to the NFS protocols is expected to have some fundamental
knowledge. The reader should be familiar with the External
Data Representation (XDR) and Remote Procedure Call (RPC) protocols
as described in [2] and [3]. A basic knowledge of file systems and
distributed file systems is expected as well.
In general, this specification of NFSv4.1 will not distinguish those
features added in minor version 1 from those present in the base
protocol but will treat NFSv4.1 as a unified whole. See Section 1.8
for a summary of the differences between NFSv4.0 and NFSv4.1.
1.7.1. RPC and Security
As with previous versions of NFS, the External Data Representation
(XDR) and Remote Procedure Call (RPC) mechanisms used for the NFSv4.1
protocol are those defined in [2] and [3]. To meet end-to-end
security requirements, the RPCSEC_GSS framework [4] is used to extend
the basic RPC security. With the use of RPCSEC_GSS, various
mechanisms can be provided to offer authentication, integrity, and
privacy to the NFSv4 protocol. Kerberos V5 is used as described in
[5] to provide one security framework. With the use of RPCSEC_GSS,
other mechanisms may also be specified and used for NFSv4.1 security.
To enable in-band security negotiation, the NFSv4.1 protocol has
operations that provide the client a method of querying the server
about its policies regarding which security mechanisms must be used
for access to the server's file system resources. With this, the
client can securely match the security mechanism that meets the
policies specified at both the client and server.
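The negotiation described above can be pictured as the client choosing, from what the server says it requires, the first option that the client's own policy also permits. The sketch below is illustrative only: it assumes the server's answer is a list of (mechanism, service) pairs in the server's order of preference, and the function name and data shapes are hypothetical rather than part of the protocol definition.

```python
def choose_security(server_offers, client_accepts):
    """Pick the first server-offered (mechanism, service) pair the client allows.

    server_offers: list of pairs, assumed to be in server preference order.
    client_accepts: set of pairs permitted by client policy.
    Returns the chosen pair, or None when the two policies do not intersect.
    """
    for offer in server_offers:
        if offer in client_accepts:
            return offer
    return None


# Example: the server prefers privacy, but this client only allows integrity.
server_offers = [("krb5", "privacy"), ("krb5", "integrity"), ("krb5", "none")]
client_accepts = {("krb5", "integrity"), ("krb5", "none")}
chosen = choose_security(server_offers, client_accepts)
```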
NFSv4.1 introduces parallel access (see Section 1.7.2.2), which is
called pNFS. The security framework described in this section is
significantly modified by the introduction of pNFS (see
Section 12.9), because data access is sometimes not over RPC. The
level of significance varies with the storage protocol (see
Section 12.2.5) and can be as low as zero impact (see Section 13.12).
1.7.2. Protocol Structure
1.7.2.1. Core Protocol
Unlike NFSv3, which used a series of ancillary protocols (e.g., NLM
(Network Lock Manager), NSM (Network Status Monitor), MOUNT), within
all minor versions of
NFSv4 a single RPC protocol is used to make requests to the server.
Facilities that had been separate protocols, such as locking, are now
integrated within a single unified protocol.
1.7.2.2. Parallel Access
Minor version 1 supports high-performance data access to a clustered
server implementation by enabling a separation of metadata access and
data access, with the latter done to multiple servers in parallel.
Such parallel data access is controlled by recallable objects known
as "layouts", which are integrated into the protocol locking model.
Clients direct requests for data access to a set of data servers
specified by the layout via a data storage protocol which may be
NFSv4.1 or may be another protocol.
Because the protocols used for parallel data access are not
necessarily RPC-based, the RPC-based security model (Section 1.7.1)
is necessarily affected (see Section 12.9). The degree of impact
varies with the storage protocol (see Section 12.2.5) used for data
access, and can be as low as zero (see Section 13.12).
1.7.3. File System Model
The general file system model used for the NFSv4.1 protocol is the
same as previous versions. The server file system is hierarchical
with the regular files contained within being treated as opaque byte
streams. In a slight departure, file and directory names are encoded
with UTF-8 to deal with the basics of internationalization.
The NFSv4.1 protocol does not require a separate protocol to provide
for the initial mapping between path name and filehandle. All file
systems exported by a server are presented as a tree so that all file
systems are reachable from a special per-server global root
filehandle. This allows LOOKUP operations to be used to perform
functions previously provided by the MOUNT protocol. The server
provides any necessary pseudo file systems to bridge gaps that arise
from unexported portions of the namespace lying between exported
file systems.
1.7.3.1. Filehandles
As in previous versions of the NFS protocol, opaque filehandles are
used to identify individual files and directories. Lookup-type and
create operations translate file and directory names to filehandles,
which are then used to identify objects in subsequent operations.
The NFSv4.1 protocol provides support for persistent filehandles,
guaranteed to be valid for the lifetime of the file system object
designated. In addition, it provides support to servers to provide
filehandles with more limited validity guarantees, called volatile
filehandles.
1.7.3.2. File Attributes
The NFSv4.1 protocol has a rich and extensible file object attribute
structure, which is divided into REQUIRED, RECOMMENDED, and named
attributes (see Section 5).
Several (but not all) of the REQUIRED attributes are derived from the
attributes of NFSv3 (see the definition of the fattr3 data type in
[31]). An example of a REQUIRED attribute is the file object's type
(Section 5.8.1.2) so that regular files can be distinguished from
directories (also known as folders in some operating environments)
and other types of objects. REQUIRED attributes are discussed in
Section 5.1.
Three examples of RECOMMENDED attributes are acl, sacl, and dacl.
These attributes define an Access Control List (ACL) on a file object
(Section 6). An ACL provides directory and file access control
beyond the model used in NFSv3. The ACL definition allows
distinct sets of permissions to be specified for individual users
and groups. In addition, ACL inheritance allows propagation of
access permissions and restrictions down a directory tree as file
system objects are created. RECOMMENDED attributes are discussed in
Section 5.2.
A named attribute is an opaque byte stream that is associated with a
directory or file and referred to by a string name. Named attributes
are meant to be used by client applications as a method to associate
application-specific data with a regular file or directory. NFSv4.1
modifies named attributes relative to NFSv4.0 by tightening the
allowed operations in order to prevent the development of non-
interoperable implementations. Named attributes are discussed in
Section 5.3.
1.7.3.3. Multi-Server Namespace
NFSv4.1 contains a number of features to allow implementation of
namespaces that cross server boundaries and that allow and facilitate
a non-disruptive transfer of support for individual file systems
between servers. They are all based upon attributes that allow one
file system to specify alternate or new locations for that file
system.
These attributes may be used together with the concept of absent file
systems, which provide specifications for additional locations but no
actual file system content. This allows a number of important
facilities:
o Location attributes may be used with absent file systems to
implement referrals whereby one server may direct the client to a
file system provided by another server. This allows extensive
multi-server namespaces to be constructed.
o Location attributes may be provided for present file systems to
provide the locations of alternate file system instances or
replicas to be used in the event that the current file system
instance becomes unavailable.
o Location attributes may be provided when a previously present file
system becomes absent. This allows non-disruptive migration of
file systems to alternate servers.
1.7.4. Locking Facilities
As mentioned previously, NFSv4.1 is a single protocol that includes
locking facilities. These locking facilities include support for
many types of locks, including several kinds of recallable locks.
Recallable locks such as delegations allow the client to be assured
that certain events will not occur so long as that lock is held.
When circumstances change, the lock is recalled via a callback
request. The assurances provided by delegations allow more extensive
caching to be done safely when circumstances allow it.
The types of locks are:
o Share reservations as established by OPEN operations.
o Byte-range locks.
o File delegations, which are recallable locks that assure the
holder that inconsistent opens and file changes cannot occur so
long as the delegation is held.
o Directory delegations, which are recallable locks that assure the
holder that inconsistent directory modifications cannot occur so
long as the delegation is held.
o Layouts, which are recallable objects that assure the holder that
access to the file data may be performed directly by the
client and that no change to the data's location that is
inconsistent with that access may be made so long as the layout is
held.
All locks for a given client are tied together under a single client-
wide lease. All requests made on sessions associated with the client
renew that lease. When the client's lease is not promptly renewed,
the client's locks are subject to revocation. In the event of server
restart, clients have the opportunity to safely reclaim their locks
within a special grace period.
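The single client-wide lease described above can be modeled in a few lines. The following is a minimal sketch under stated assumptions: the class name, the method names, and the lease time of 90 used in the example are all hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
class ClientLease:
    """Toy model: one lease covers every lock held by a client."""

    def __init__(self, lease_time, now):
        self.lease_time = lease_time
        self.expires_at = now + lease_time
        self.locks = set()  # all lock types share this one lease

    def on_request(self, now):
        """Any request on a session associated with the client renews the lease."""
        self.expires_at = now + self.lease_time

    def locks_revocable(self, now):
        """True once the lease has lapsed; the locks are then subject to revocation."""
        return now >= self.expires_at


lease = ClientLease(lease_time=90, now=0)
lease.locks.update({"share-reservation", "byte-range-lock"})
lease.on_request(now=60)  # renewal moves expiry from 90 to 150
```

The sketch deliberately leaves out the grace period after server restart, which is a separate mechanism.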
1.8. Differences from NFSv4.0
The following summarizes the major differences between minor version
1 and the base protocol:
o Implementation of the sessions model (Section 2.10).
o Parallel access to data (Section 12).
o Addition of the RECLAIM_COMPLETE operation to better structure the
lock reclamation process (Section 18.51).
o Enhanced delegation support as follows.
* Delegations on directories and other file types in addition to
regular files (Section 18.39, Section 18.49).
* Operations to optimize acquisition of recalled or denied
delegations (Section 18.49, Section 20.5, Section 20.7).
* Notifications of changes to files and directories
(Section 18.39, Section 20.4).
* A method to allow a server to indicate that it is recalling one
or more delegations for resource management reasons, and thus a
method to allow the client to pick which delegations to return
(Section 20.6).
o Attributes can be set atomically during exclusive file create via
the OPEN operation (see the new EXCLUSIVE4_1 creation method in
Section 18.16).
o Open files can be preserved if removed and the hard link count
("hard link" is defined in an Open Group [6] standard) goes to
zero, thus obviating the need for clients to rename deleted files
to partially hidden names -- colloquially called "silly rename"
(see the new OPEN4_RESULT_PRESERVE_UNLINKED reply flag in
Section 18.16).
o Improved compatibility with Microsoft Windows for Access Control
Lists (Section 6.2.3, Section 6.2.2, Section 6.4.3.2).
o Data retention (Section 5.13).
o Identification of the implementation of the NFS client and server
(Section 18.35).
o Support for notification of the availability of byte-range locks
(see the new OPEN4_RESULT_MAY_NOTIFY_LOCK reply flag in
Section 18.16 and see Section 20.11).
o In NFSv4.1, LIPKEY and SPKM-3 are not required security mechanisms
[32].
2. Core Infrastructure
2.1. Introduction
NFSv4.1 relies on core infrastructure common to nearly every
operation. This core infrastructure is described in the remainder of
this section.
2.2. RPC and XDR
The NFSv4.1 protocol is a Remote Procedure Call (RPC) application
that uses RPC version 2 and the corresponding eXternal Data
Representation (XDR) as defined in [3] and [2].
2.2.1. RPC-Based Security
Previous NFS versions have been thought of as having a host-based
authentication model, where the NFS server authenticates the NFS
client, and trusts the client to authenticate all users. Actually,
NFS has always depended on RPC for authentication. One of the first
forms of RPC authentication, AUTH_SYS, had no strong authentication
and required a host-based authentication approach. NFSv4.1 also
depends on RPC for basic security services and mandates RPC support
for a user-based authentication model. The user-based authentication
model has user principals authenticated by a server, and in turn the
server authenticated by user principals. RPC provides some basic
security services that are used by NFSv4.1.
2.2.1.1. RPC Security Flavors
As described in Section 7.2 ("Authentication") of [3], RPC security
is encapsulated in the RPC header, via a security or authentication
flavor, and information specific to the specified security flavor.
Every RPC header conveys information used to identify and
authenticate a client and server. As discussed in Section 2.2.1.1.1,
some security flavors provide additional security services.
NFSv4.1 clients and servers MUST implement RPCSEC_GSS. (This
requirement to implement is not a requirement to use.) Other
flavors, such as AUTH_NONE and AUTH_SYS, MAY be implemented as well.
2.2.1.1.1. RPCSEC_GSS and Security Services
RPCSEC_GSS [4] uses the functionality of GSS-API [7]. This allows
for the use of various security mechanisms by the RPC layer without
the additional implementation overhead of adding RPC security
flavors.
2.2.1.1.1.1. Identification, Authentication, Integrity, Privacy
Via the GSS-API, RPCSEC_GSS can be used to identify and authenticate
users on clients to servers, and servers to users. It can also
perform integrity checking on the entire RPC message, including the
RPC header, and on the arguments or results. Finally, privacy,
usually via encryption, is a service available with RPCSEC_GSS.
Privacy is performed on the arguments and results. Note that if
privacy is selected, integrity, authentication, and identification
are enabled. If privacy is not selected, but integrity is selected,
authentication and identification are enabled. If integrity and
privacy are not selected, but authentication is enabled,
identification is enabled. RPCSEC_GSS does not provide
identification as a separate service.
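The implication chain in the paragraph above (privacy implies integrity, which implies authentication, which implies identification) can be expressed as a small lookup. This is an illustrative sketch; the names are hypothetical and the string labels are not protocol constants.

```python
# Ordered from weakest to strongest; each level implies all levels before it.
_LEVELS = ["identification", "authentication", "integrity", "privacy"]

# Identification is not offered as a separate service, so it is not selectable.
SELECTABLE = ("authentication", "integrity", "privacy")


def enabled_services(selected):
    """Return the set of services in effect when `selected` is chosen."""
    if selected not in SELECTABLE:
        raise ValueError(f"not a selectable service: {selected!r}")
    return set(_LEVELS[: _LEVELS.index(selected) + 1])
```

For instance, `enabled_services("integrity")` yields identification, authentication, and integrity, but not privacy.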
Although GSS-API has an authentication service distinct from its
privacy and integrity services, GSS-API's authentication service is
not used for RPCSEC_GSS's authentication service. Instead, each RPC
request and response header is integrity protected with the GSS-API
integrity service, and this allows RPCSEC_GSS to offer per-RPC
authentication and identity. See [4] for more information.
NFSv4.1 clients and servers MUST support RPCSEC_GSS's integrity and
authentication services. NFSv4.1 servers MUST support RPCSEC_GSS's
privacy service. NFSv4.1 clients SHOULD support RPCSEC_GSS's privacy
service.
2.2.1.1.1.2. Security Mechanisms for NFSv4.1
RPCSEC_GSS, via GSS-API, normalizes access to mechanisms that provide
security services. Therefore, NFSv4.1 clients and servers MUST
support the Kerberos V5 security mechanism.
The use of RPCSEC_GSS requires selection of mechanism, quality of
protection (QOP), and service (authentication, integrity, privacy).
For the mandated security mechanisms, NFSv4.1 specifies that a QOP of
zero is used, leaving it up to the mechanism or the mechanism's
configuration to map QOP zero to an appropriate level of protection.
Each mandated mechanism specifies a minimum set of cryptographic
algorithms for implementing integrity and privacy. NFSv4.1 clients
and servers MUST be implemented on operating environments that comply
with the REQUIRED cryptographic algorithms of each REQUIRED
mechanism.
2.2.1.1.1.2.1. Kerberos V5
The Kerberos V5 GSS-API mechanism as described in [5] MUST be
implemented with the RPCSEC_GSS services as specified in the
following table:
column descriptions:
1 == number of pseudo flavor
2 == name of pseudo flavor
3 == mechanism's OID
4 == RPCSEC_GSS service
5 == NFSv4.1 clients MUST support
6 == NFSv4.1 servers MUST support
1      2     3                    4                     5   6
------------------------------------------------------------------
390003 krb5  1.2.840.113554.1.2.2 rpc_gss_svc_none      yes yes
390004 krb5i 1.2.840.113554.1.2.2 rpc_gss_svc_integrity yes yes
390005 krb5p 1.2.840.113554.1.2.2 rpc_gss_svc_privacy    no yes
Note that the number and name of the pseudo flavor are presented here
as a mapping aid to the implementor. Because the NFSv4.1 protocol
includes a method to negotiate security and it understands the
GSS-API mechanism, the pseudo flavor is not needed. The pseudo
flavor is needed for NFSv3 since the security negotiation is done
via the MOUNT protocol as described in [33].
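For implementors who want the mapping aid in machine-readable form, the table above can be transcribed as a lookup keyed by pseudo flavor number. The values are copied from the table; the type and variable names are hypothetical.

```python
from typing import NamedTuple

KRB5_OID = "1.2.840.113554.1.2.2"


class PseudoFlavor(NamedTuple):
    name: str          # column 2
    oid: str           # column 3
    service: str       # column 4
    client_must: bool  # column 5
    server_must: bool  # column 6


PSEUDO_FLAVORS = {
    390003: PseudoFlavor("krb5", KRB5_OID, "rpc_gss_svc_none", True, True),
    390004: PseudoFlavor("krb5i", KRB5_OID, "rpc_gss_svc_integrity", True, True),
    390005: PseudoFlavor("krb5p", KRB5_OID, "rpc_gss_svc_privacy", False, True),
}

# Services a client is obliged to support, derived from column 5.
client_required = sorted(f.service for f in PSEUDO_FLAVORS.values() if f.client_must)
```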
At the time NFSv4.1 was specified, the Advanced Encryption Standard
(AES) with HMAC-SHA1 was a REQUIRED algorithm set for Kerberos V5.
In contrast, when NFSv4.0 was specified, weaker algorithm sets were
REQUIRED for Kerberos V5, and were REQUIRED in the NFSv4.0
specification, because the Kerberos V5 specification at the time did
not specify stronger algorithms. The NFSv4.1 specification does not
specify REQUIRED algorithms for Kerberos V5, and instead, the
implementor is expected to track the evolution of the Kerberos V5
standard if and when stronger algorithms are specified.
2.2.1.1.1.2.1.1. Security Considerations for Cryptographic Algorithms
in Kerberos V5
When deploying NFSv4.1, the strength of the security achieved depends
on the existing Kerberos V5 infrastructure. The algorithms of
Kerberos V5 are not directly exposed to or selectable by the client
or server, so there is some due diligence required by the user of
NFSv4.1 to ensure that security is acceptable where needed.
2.2.1.1.1.3. GSS Server Principal
Regardless of what security mechanism under RPCSEC_GSS is being used,
the NFS server MUST identify itself in GSS-API via a
GSS_C_NT_HOSTBASED_SERVICE name type. GSS_C_NT_HOSTBASED_SERVICE
names are of the form:
service@hostname
For NFS, the "service" element is
nfs
Implementations of security mechanisms will convert nfs@hostname to
various different forms. For Kerberos V5, the following form is
RECOMMENDED:
nfs/hostname
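The conversion from the host-based service form to the Kerberos V5 form shown above amounts to a simple string rewrite. The helper below is a hypothetical sketch of that rewrite only; in practice the security mechanism's own name import routines perform the conversion, and a complete Kerberos principal also carries a realm, which is omitted here as it is in the text.

```python
def hostbased_to_krb5(name):
    """Rewrite 'service@hostname' as 'service/hostname'."""
    service, sep, hostname = name.partition("@")
    if not sep or not service or not hostname:
        raise ValueError(f"not of the form service@hostname: {name!r}")
    return f"{service}/{hostname}"


# The hostname here is a made-up example.
principal = hostbased_to_krb5("nfs@server.example.com")
```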
2.3. COMPOUND and CB_COMPOUND
A significant departure from the versions of the NFS protocol before
NFSv4 is the introduction of the COMPOUND procedure. For the NFSv4
protocol, in all minor versions, there are exactly two RPC
procedures, NULL and COMPOUND. The COMPOUND procedure is defined as
a series of individual operations and these operations perform the
sorts of functions performed by traditional NFS procedures.
The operations combined within a COMPOUND request are evaluated in
order by the server, without any atomicity guarantees. A limited set
of facilities exists to pass results from one operation to another.
Once an operation returns a failing result, the evaluation ends and
the results of all evaluated operations are returned to the client.
With the use of the COMPOUND procedure, the client is able to build
simple or complex requests. These COMPOUND requests allow for a
reduction in the number of RPCs needed for logical file system
operations. For example, multi-component lookup requests can be
constructed by combining multiple LOOKUP operations. Those can be
further combined with operations such as GETATTR, READDIR, or OPEN
plus READ to do more complicated sets of operations without incurring
additional latency.
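The two ideas above, building a multi-component lookup out of several LOOKUP operations and stopping evaluation at the first failure, can be illustrated with a toy model. This sketch is not a wire-level encoding: the function names and the tuple representation are hypothetical, the executor is a stand-in for a real server, and the session-related operation that would lead a real NFSv4.1 COMPOUND is omitted for brevity.

```python
def build_lookup_compound(path):
    """Turn '/a/b/c' into one COMPOUND's worth of operations."""
    ops = [("PUTROOTFH", None)]  # start at the server's root filehandle
    ops += [("LOOKUP", name) for name in path.split("/") if name]
    ops.append(("GETFH", None))  # fetch the resulting filehandle
    return ops


def evaluate_compound(ops, execute):
    """Evaluate ops in order; stop after the first failing result.

    Results of every evaluated operation, including the failing one,
    are returned. No atomicity is implied.
    """
    results = []
    for opname, arg in ops:
        status = execute(opname, arg)
        results.append((opname, status))
        if status != "NFS4_OK":
            break
    return results


def toy_execute(opname, arg, existing=frozenset({"export", "home"})):
    """Stand-in server: a LOOKUP fails unless the name is in `existing`."""
    if opname == "LOOKUP" and arg not in existing:
        return "NFS4ERR_NOENT"
    return "NFS4_OK"


results = evaluate_compound(build_lookup_compound("/export/missing/file"), toy_execute)
```

In this run the third operation fails, so the remaining LOOKUP and the GETFH are never evaluated, yet the whole attempt cost a single round trip.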
NFSv4.1 also contains a considerable set of callback operations in
which the server makes an RPC directed at the client. Callback RPCs
have a similar structure to that of the normal server requests. In
all minor versions of the NFSv4 protocol, there are two callback RPC
procedures: CB_NULL and CB_COMPOUND. The CB_COMPOUND procedure is
defined in an analogous fashion to that of COMPOUND with its own set
of callback operations.
The addition of new server and callback operations within the
COMPOUND and CB_COMPOUND request framework provides a means of
extending the protocol in subsequent minor versions.
Except for a small number of operations needed for session creation,
server requests and callback requests are performed within the
context of a session. Sessions provide a client context for every
request and support robust replay protection for non-idempotent
requests.
2.4. Client Identifiers and Client Owners
For each operation that obtains or depends on locking state, the
specific client needs to be identifiable by the server.
Each distinct client instance is represented by a client ID. A
client ID is a 64-bit identifier representing a specific client at a
given time. The client ID is changed whenever the client re-
initializes, and may change when the server re-initializes. Client
IDs are used to support lock identification and crash recovery.
During steady state operation, the client ID associated with each
operation is derived from the session (see Section 2.10) on which the
operation is sent. A session is associated with a client ID when the
session is created.
Unlike NFSv4.0, the only NFSv4.1 operations possible before a client
ID is established are those needed to establish the client ID.
A sequence of an EXCHANGE_ID operation followed by a CREATE_SESSION
operation using that client ID (eir_clientid as returned from
EXCHANGE_ID) is required to establish and confirm the client ID on
the server. Establishment of identification by a new incarnation of
the client also has the effect of immediately releasing any locking
state that a previous incarnation of that same client might have had
on the server. Such released state would include all byte-range
lock, share reservation, layout state, and -- where the server
supports neither the CLAIM_DELEGATE_PREV nor CLAIM_DELEG_CUR_FH claim
types -- all delegation state associated with the same client with
the same identity. For discussion of delegation state recovery, see
Section 10.2.1. For discussion of layout state recovery, see
Section 12.7.1.
Releasing such state requires that the server be able to determine
that one client instance is the successor of another. Where this
cannot be done, for any of a number of reasons, the locking state
will remain for a time subject to lease expiration (see Section 8.3)
and the new client will need to wait for such state to be removed, if
it makes conflicting lock requests.
Client identification is encapsulated in the following client owner
data type:
struct client_owner4 {
verifier4 co_verifier;
opaque co_ownerid<NFS4_OPAQUE_LIMIT>;
};
The first field, co_verifier, is a client incarnation verifier. The
server will start the process of canceling the client's leased state
if co_verifier differs from what the server has previously
recorded for the identified client (as specified in the co_ownerid
field).
The second field, co_ownerid, is a variable length string that
uniquely defines the client so that subsequent instances of the same
client bear the same co_ownerid with a different verifier.
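The roles of the two fields can be illustrated with a toy server-side table keyed by co_ownerid. This is a deliberately simplified sketch: the class name, method name, and return strings are hypothetical, and it ignores the confirmation step and everything else a real server must do before discarding state.

```python
class ClientTable:
    """Toy record of the last co_verifier seen for each co_ownerid."""

    def __init__(self):
        self._verifier_by_owner = {}

    def present(self, co_verifier: bytes, co_ownerid: bytes) -> str:
        """Classify a presented client owner against what was recorded."""
        previous = self._verifier_by_owner.get(co_ownerid)
        self._verifier_by_owner[co_ownerid] = co_verifier
        if previous is None:
            return "unknown-client"
        if previous == co_verifier:
            return "same-incarnation"
        # Same owner string, different verifier: the client has restarted,
        # so a server would begin cancelling the old leased state.
        return "new-incarnation"


table = ClientTable()
first = table.present(b"\x00" * 7 + b"\x01", b"client-A")
again = table.present(b"\x00" * 7 + b"\x01", b"client-A")
rebooted = table.present(b"\x00" * 7 + b"\x02", b"client-A")
```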
There are several considerations for how the client generates the
co_ownerid string:
o The string should be unique so that multiple clients do not
present the same string. The consequences of two clients
presenting the same string range from one client getting an error
to one client having its leased state abruptly and unexpectedly
cancelled.
o The string should be selected so that subsequent incarnations
(e.g., restarts) of the same client cause the client to present
the same string. The implementor is cautioned against an approach
that requires the string to be recorded in a local file because
this precludes the use of the implementation in an environment
where there is no local disk and all file access is from an
NFSv4.1 server.
o The string should be the same for each server network address that
the client accesses. This way, if a server has multiple
interfaces, the client can trunk traffic over multiple network
paths as described in Section 2.10.5. (Note: the precise opposite
was advised in the NFSv4.0 specification [30].)
o The algorithm for generating the string should not assume that the
client's network address will not change, unless the client
implementation knows it is using statically assigned network
addresses. This includes changes between client incarnations and
even changes while the client is still running in its current
incarnation. Thus, with dynamic address assignment, if the client
includes just the client's network address in the co_ownerid
string, there is a real risk that after the client gives up the
network address, another client, using a similar algorithm for
generating the co_ownerid string, would generate a conflicting
co_ownerid string.
Given the above considerations, an example of a well-generated
co_ownerid string is one that includes:
o If applicable, the client's statically assigned network address.
o Additional information that tends to be unique, such as one or
more of:
* The client machine's serial number (for privacy reasons, it is
best to perform some one-way function on the serial number).
* A Media Access Control (MAC) address (again, a one-way function
should be performed).
* The timestamp of when the NFSv4.1 software was first installed
on the client (though this is subject to the previously
mentioned caution about using information that is stored in a
file, because the file might only be accessible over NFSv4.1).
* A true random number. However, since this number ought to be
the same between client incarnations, this shares the same
problem as that of using the timestamp of the software
installation.
o For a user-level NFSv4.1 client, it should contain additional
information to distinguish the client from other user-level
clients running on the same host, such as a process identifier or
other unique sequence.
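The considerations above can be illustrated with a minimal sketch.
It is not part of the protocol: the function name make_co_ownerid,
the field layout of the string, and the choice of SHA-256 as the
one-way function are all assumptions made for illustration. The
value 1024 for NFS4_OPAQUE_LIMIT is taken from the protocol's XDR
description.

```python
import hashlib
from typing import Optional

# Upper bound on co_ownerid length (1024 in the protocol's XDR description).
NFS4_OPAQUE_LIMIT = 1024


def _one_way(value: str) -> bytes:
    """Hash an identifier so the raw value is never sent to the server."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest().encode("ascii")


def make_co_ownerid(serial: str, mac: str,
                    static_addr: Optional[str] = None,
                    user_level_tag: Optional[str] = None) -> bytes:
    """Build a co_ownerid from identifiers that survive client restarts.

    Hypothetical layout: 'key=value' fields joined by ';'. The server's
    address is deliberately excluded so the same string is presented to
    every server network address.
    """
    parts = []
    if static_addr is not None:
        # Only include the client's address when it is statically assigned.
        parts.append(b"addr=" + static_addr.encode("ascii"))
    parts.append(b"serial=" + _one_way(serial))
    parts.append(b"mac=" + _one_way(mac.lower()))
    if user_level_tag is not None:
        # Distinguishes user-level clients running on the same host.
        parts.append(b"user=" + user_level_tag.encode("ascii"))
    owner = b";".join(parts)
    if len(owner) > NFS4_OPAQUE_LIMIT:
        raise ValueError("co_ownerid exceeds NFS4_OPAQUE_LIMIT")
    return owner
```

Because every input is stable across restarts, successive
incarnations of the client produce the same string, while the
co_verifier is expected to change.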
The client ID is assigned by the server (the eir_clientid result from
EXCHANGE_ID) and should be chosen so that it will not conflict with a
client ID previously assigned by the server. This applies across
server restarts.
In the event of a server restart, a client may find out that its
current client ID is no longer valid when it receives an
NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on
the characteristics of the sessions involved, specifically whether
the session is persistent (see Section 2.10.6.5), but in each case
the client will receive this error when it attempts to establish a
new session with the existing client ID. The error indicates that a
new client ID needs to be obtained via EXCHANGE_ID and that the new
session needs to be established with that client ID.
When a session is not persistent, the client will find out that it
needs to create a new session as a result of getting an
NFS4ERR_BADSESSION, since the session in question was lost as part of
a server restart. When the existing client ID is presented to a
server as part of creating a session and that client ID is not
recognized, as would happen after a server restart, the server will
reject the request with the error NFS4ERR_STALE_CLIENTID.
In the case of the session being persistent, the client will re-
establish communication using the existing session after the restart.
This session will be associated with the existing client ID but may
only be used to retransmit operations that the client previously
transmitted and did not see replies to. Replies to operations that
the server previously performed will come from the reply cache;
otherwise, NFS4ERR_DEADSESSION will be returned. Hence, such a
session is referred to as "dead". In this situation, in order to
perform new operations, the client needs to establish a new session.
If an attempt is made to establish this new session with the existing
client ID, the server will reject the request with
NFS4ERR_STALE_CLIENTID.
When NFS4ERR_STALE_CLIENTID is received in either of these
situations, the client needs to obtain a new client ID by use of the
EXCHANGE_ID operation, then use that client ID as the basis of a new
session, and then proceed to any other necessary recovery for the
server restart case (see Section 8.4.2).
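The recovery sequence described above can be sketched with a toy
model. The RestartedServer class and the recover_session function
are hypothetical names; the model ignores RPC transport, security,
and the requirement that new client IDs not conflict with those
assigned before the restart. It only shows the order of operations a
client would follow.

```python
class RestartedServer:
    """Toy server that has just restarted and remembers no client IDs."""

    def __init__(self):
        self._known = set()
        self._next_id = 1000  # arbitrary starting value for the sketch

    def exchange_id(self):
        """Assign and remember a new client ID."""
        clientid = self._next_id
        self._next_id += 1
        self._known.add(clientid)
        return clientid

    def create_session(self, clientid):
        """Reject client IDs that were not assigned since the restart."""
        if clientid not in self._known:
            return "NFS4ERR_STALE_CLIENTID", None
        return "NFS4_OK", "session-%d" % clientid


def recover_session(server, clientid):
    """Try the existing client ID first; fall back to EXCHANGE_ID."""
    trace = []
    status, session = server.create_session(clientid)
    trace.append(("CREATE_SESSION", status))
    if status == "NFS4ERR_STALE_CLIENTID":
        clientid = server.exchange_id()
        trace.append(("EXCHANGE_ID", "NFS4_OK"))
        status, session = server.create_session(clientid)
        trace.append(("CREATE_SESSION", status))
    # Any further restart recovery (Section 8.4.2) would follow here.
    return clientid, session, trace
```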
See the descriptions of EXCHANGE_ID (Section 18.35) and
CREATE_SESSION (Section 18.36) for a complete specification of these
operations.
2.4.1. Upgrade from NFSv4.0 to NFSv4.1
To facilitate upgrade from NFSv4.0 to NFSv4.1, a server may compare a
value of data type client_owner4 in an EXCHANGE_ID with a value of
data type nfs_client_id4 that was established using the SETCLIENTID
operation of NFSv4.0. A server that does so will allow an upgraded
client to avoid waiting until the lease (i.e., the lease established
by the NFSv4.0 instance of the client) expires. This requires that the
value of data type client_owner4 be constructed the same way as the
value of data type nfs_client_id4. If the latter's contents included
the server's network address (per the recommendations of the NFSv4.0
specification [30]), and the NFSv4.1 client does not wish to use a
client ID that prevents trunking, it should send two EXCHANGE_ID
operations. The first EXCHANGE_ID will have a client_owner4 equal to
the nfs_client_id4. This will clear the state created by the NFSv4.0
client. The second EXCHANGE_ID will not have the server's network
address. The state created for the second EXCHANGE_ID will not have
to wait for lease expiration, because there will be no state to
expire.
2.4.2. Server Release of Client ID
NFSv4.1 introduces a new operation called DESTROY_CLIENTID
(Section 18.50), which the client SHOULD use to destroy a client ID
it no longer needs. This permits graceful, bilateral release of a
client ID. The operation cannot be used if there are sessions
associated with the client ID, or state with an unexpired lease.
If the server determines that the client holds no associated state
for its client ID (associated state includes unrevoked sessions,
opens, locks, delegations, layouts, and wants), the server MAY choose
to unilaterally release the client ID in order to conserve resources.
If the client contacts the server after this release, the server MUST
ensure that the client receives the appropriate error so that it will
use the EXCHANGE_ID/CREATE_SESSION sequence to establish a new client
ID. The server ought to be very hesitant to release a client ID
since the resulting work on the client to recover from such an event
will be the same burden as if the server had failed and restarted.
Typically, a server would not release a client ID unless there had
been no activity from that client for many minutes. As long as there
are sessions, opens, locks, delegations, layouts, or wants, the
server MUST NOT release the client ID. See Section 2.10.13.1.4 for
discussion on releasing inactive sessions.
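A server's release decision might be sketched as follows. The
ClientState record, the function name, and the 600-second idle
threshold are assumptions chosen for illustration; the text above
says only that a server would typically wait "many minutes".

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    """Hypothetical per-client-ID bookkeeping kept by a server."""
    sessions: int = 0
    opens: int = 0
    locks: int = 0
    delegations: int = 0
    layouts: int = 0
    wants: int = 0
    idle_seconds: float = 0.0


# Assumed policy value; the specification does not define a number.
IDLE_THRESHOLD_SECONDS = 600.0


def may_release_client_id(state: ClientState,
                          idle_threshold: float = IDLE_THRESHOLD_SECONDS) -> bool:
    """Return True only if unilateral release is permitted and prudent."""
    holds_state = any((state.sessions, state.opens, state.locks,
                       state.delegations, state.layouts, state.wants))
    if holds_state:
        # The server MUST NOT release the client ID while state exists.
        return False
    # Release is optional (MAY); be hesitant and require a long idle period.
    return state.idle_seconds >= idle_threshold
```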
2.4.3. Resolving Client Owner Conflicts
When the server gets an EXCHANGE_ID for a client owner that currently
has no state, or that has state but the lease has expired, the server
MUST allow the EXCHANGE_ID and confirm the new client ID if followed
by the appropriate CREATE_SESSION.
When the server gets an EXCHANGE_ID for a new incarnation of a client
owner that currently has an old incarnation with state and an
unexpired lease, the server is allowed to dispose of the state of the
previous incarnation of the client owner if one of the following is
true:
o The principal that created the client ID for the client owner is
the same as the principal that is sending the EXCHANGE_ID
operation. Note that if the client ID was created with
SP4_MACH_CRED state protection (Section 18.35), the principal MUST
be based on RPCSEC_GSS authentication, the RPCSEC_GSS service used
MUST be integrity or privacy, and the same GSS mechanism and
principal MUST be used as that used when the client ID was
created.
o The client ID was established with SP4_SSV protection
(Section 18.35, Section 2.10.8.3) and the client sends the
EXCHANGE_ID with the security flavor set to RPCSEC_GSS using the
GSS SSV mechanism (Section 2.10.9).
o The client ID was established with SP4_SSV protection, and under
the conditions described herein, the EXCHANGE_ID was sent with
SP4_MACH_CRED state protection. Because the SSV might not persist
across client and server restart, and because the first time a
client sends EXCHANGE_ID to a server it does not have an SSV, the
client MAY send the subsequent EXCHANGE_ID without an SSV
RPCSEC_GSS handle. Instead, as with SP4_MACH_CRED protection, the
principal MUST be based on RPCSEC_GSS authentication, the
RPCSEC_GSS service used MUST be integrity or privacy, and the same
GSS mechanism and principal MUST be used as that used when the
client ID was created.
If none of the above situations apply, the server MUST return
NFS4ERR_CLID_INUSE.
If the server accepts the principal and co_ownerid as matching that
which created the client ID, and the co_verifier in the EXCHANGE_ID
differs from the co_verifier used when the client ID was created,
then after the server receives a CREATE_SESSION that confirms the
client ID, the server deletes state. If the co_verifier values are
the same (e.g., the client either is updating properties of the
client ID (Section 18.35) or is attempting trunking (Section 2.10.5)),
the server MUST NOT delete state.
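The decision logic of this section can be summarized in a short
sketch. It is a simplification: the three principal-matching
conditions listed above are collapsed into a single boolean,
principal_ok, and the ClientRecord type and the returned action
labels are hypothetical names, not protocol elements (apart from
NFS4ERR_CLID_INUSE).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientRecord:
    """Hypothetical server record for an existing client owner."""
    co_verifier: bytes
    has_state: bool
    lease_expired: bool


def resolve_exchange_id(record: Optional[ClientRecord],
                        co_verifier: bytes,
                        principal_ok: bool) -> str:
    """Classify an incoming EXCHANGE_ID against any existing record."""
    if record is None or not record.has_state or record.lease_expired:
        # No state, or state with an expired lease: MUST allow.
        return "ALLOW"
    if not principal_ok:
        # None of the principal-matching conditions holds.
        return "NFS4ERR_CLID_INUSE"
    if co_verifier != record.co_verifier:
        # New incarnation: old state is deleted once CREATE_SESSION
        # confirms the client ID.
        return "ALLOW_AND_DELETE_STATE_ON_CONFIRM"
    # Same incarnation (property update or trunking): keep state.
    return "ALLOW_AND_KEEP_STATE"
```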
2.5. Server Owners
The server owner is similar to a client owner (Section 2.4), but
unlike the client owner, there is no shorthand server ID. The server
owner is defined in the following data type:
struct server_owner4 {
uint64_t so_minor_id;
opaque so_major_id<NFS4_OPAQUE_LIMIT>;
};
The server owner is returned from EXCHANGE_ID. When the so_major_id
fields are the same in two EXCHANGE_ID results, the connections that
each EXCHANGE_ID was sent over can be assumed to address the same
server (as defined in Section 1.6). If the so_minor_id fields are
also the same, then not only do both connections connect to the same
server, but the session can be shared across both connections. The
reader is cautioned that multiple servers may deliberately or
accidentally claim to have the same so_major_id or so_major_id/
so_minor_id; the reader should examine Sections 2.10.5 and 18.35 in
order to avoid acting on falsely matching server owner values.
The considerations for generating a so_major_id are similar to those
for generating a co_ownerid string (see Section 2.4). The
consequences of two servers generating conflicting so_major_id values
are less dire than they are for co_ownerid conflicts because the
client can use RPCSEC_GSS to compare the authenticity of each server
(see Section 2.10.5).
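A client's comparison of two server owners might be sketched as
follows. The ServerOwner class mirrors server_owner4, while the
function name and the returned labels are hypothetical. The result
is only a preliminary indication: as cautioned above, matching
values still need to be verified as described in Sections 2.10.5 and
18.35 before the client acts on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerOwner:
    """Python stand-in for the server_owner4 structure."""
    so_minor_id: int
    so_major_id: bytes


def compare_server_owners(a: ServerOwner, b: ServerOwner) -> str:
    """Classify what two EXCHANGE_ID results claim about their servers."""
    if a.so_major_id != b.so_major_id:
        return "different-server"
    if a.so_minor_id == b.so_minor_id:
        # Same server, and a session can be shared across connections.
        return "same-server-session-shareable"
    return "same-server"
```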
2.6. Security Service Negotiation
With the NFSv4.1 server potentially offering multiple security
mechanisms, the client needs a method to determine or negotiate which
mechanism is to be used for its communication with the server. The
NFS server may have multiple points within its file system namespace
that are available for use by NFS clients. These points can be
considered security policy boundaries, and, in some NFS
implementations, are tied to NFS export points. In turn, the NFS
server may be configured such that each of these security policy
boundaries may have different or multiple security mechanisms in use.
The security negotiation between client and server SHOULD be done
with a secure channel to eliminate the possibility of a third party
intercepting the negotiation sequence and forcing the client and
server to choose a lower level of security than required or desired.
See Section 21 for further discussion.
2.6.1. NFSv4.1 Security Tuples
An NFS server can assign one or more "security tuples" to each
security policy boundary in its namespace. Each security tuple
consists of a security flavor (see Section 2.2.1.1) and, if the
flavor is RPCSEC_GSS, a GSS-API mechanism Object Identifier (OID), a
GSS-API quality of protection, and an RPCSEC_GSS service.
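As an illustration only, a security tuple could be modeled as an
immutable record in which the GSS-related fields are present exactly
when the flavor is RPCSEC_GSS. The class name, the field names, and
the use of strings for the flavor and service are assumptions of
this sketch, not part of the protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecurityTuple:
    """Hypothetical in-memory form of a security tuple."""
    flavor: str                        # e.g., "AUTH_SYS" or "RPCSEC_GSS"
    gss_oid: Optional[str] = None      # GSS-API mechanism OID
    gss_qop: Optional[int] = None      # GSS-API quality of protection
    gss_service: Optional[str] = None  # RPCSEC_GSS service

    def __post_init__(self):
        gss_fields = (self.gss_oid, self.gss_qop, self.gss_service)
        if self.flavor == "RPCSEC_GSS":
            if any(field is None for field in gss_fields):
                raise ValueError("RPCSEC_GSS needs OID, QOP, and service")
        elif any(field is not None for field in gss_fields):
            raise ValueError("GSS fields apply only to RPCSEC_GSS")


# Example values: Kerberos V5 mechanism OID with the integrity service.
KRB5_INTEGRITY = SecurityTuple("RPCSEC_GSS", "1.2.840.113554.1.2.2",
                               0, "integrity")
```

Because the record is frozen, instances are hashable and a security
policy can be represented as a set of tuples.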
2.6.2. SECINFO and SECINFO_NO_NAME
The SECINFO and SECINFO_NO_NAME operations allow the client to
determine, on a per-filehandle basis, what security tuple is to be
used for server access. In general, the client will not have to use
either operation except during initial communication with the server
or when the client crosses security policy boundaries at the server.
However, the server's policies may also change at any time and force
the client to negotiate a new security tuple.
Where the use of different security tuples would affect the type of
access that would be allowed if a request was sent over the same
connection used for the SECINFO or SECINFO_NO_NAME operation (e.g.,
read-only vs. read-write access), security tuples that allow greater
access should be presented first. Where the general level of access
is the same and different security flavors limit the range of
principals whose privileges are recognized (e.g., allowing or
disallowing root access), flavors supporting the greatest range of
principals should be listed first.
2.6.3. Security Error
Based on the assumption that each NFSv4.1 client and server MUST
support a minimum set of security (i.e., Kerberos V5 under
RPCSEC_GSS), the NFS client will initiate file access to the server
with one of the minimal security tuples. During communication with
the server, the client may receive an NFS error of NFS4ERR_WRONGSEC.
This error allows the server to notify the client that the security
tuple currently being used contravenes the server's security policy.
The client is then responsible for determining (see Section 2.6.3.1)
what security tuples are available at the server and choosing one
that is appropriate for the client.
2.6.3.1. Using NFS4ERR_WRONGSEC, SECINFO, and SECINFO_NO_NAME
This section explains the mechanics of NFSv4.1 security negotiation.
2.6.3.1.1. Put Filehandle Operations
The term "put filehandle operation" refers to PUTROOTFH, PUTPUBFH,
PUTFH, and RESTOREFH. Each of the subsections herein describes how
the server handles a subseries of operations that starts with a put
filehandle operation.
2.6.3.1.1.1. Put Filehandle Operation + SAVEFH
The client is saving a filehandle for a future RESTOREFH, LINK, or
RENAME. SAVEFH MUST NOT return NFS4ERR_WRONGSEC. To determine
whether or not the put filehandle operation returns NFS4ERR_WRONGSEC,
the server implementation pretends SAVEFH is not in the series of
operations and examines which of the situations described in the
other subsections of Section 2.6.3.1.1 apply.
2.6.3.1.1.2. Two or More Put Filehandle Operations
For a series of N put filehandle operations, the server MUST NOT
return NFS4ERR_WRONGSEC to the first N-1 put filehandle operations.
The Nth put filehandle operation is handled as if it is the first in
a subseries of operations. For example, if the server received a
COMPOUND request with this series of operations -- PUTFH, PUTROOTFH,
LOOKUP -- then the PUTFH operation is ignored for NFS4ERR_WRONGSEC
purposes, and the PUTROOTFH, LOOKUP subseries is processed
according to Section 2.6.3.1.1.3.
2.6.3.1.1.3. Put Filehandle Operation + LOOKUP (or OPEN of an Existing
Name)
This situation also applies to a put filehandle operation followed by
a LOOKUP or an OPEN operation that specifies an existing component
name.
In this situation, the client is potentially crossing a security
policy boundary, and the set of security tuples the parent directory
supports may differ from those of the child. The server
implementation may decide whether to impose any restrictions on
security policy administration. There are at least three approaches
(sec_policy_child is the tuple set of the child export,
sec_policy_parent is that of the parent).
(a) sec_policy_child <= sec_policy_parent (<= for subset). This
means that the set of security tuples specified on the security
policy of a child directory is always a subset of its parent
directory.
(b) sec_policy_child ^ sec_policy_parent != {} (^ for intersection,
{} for the empty set). This means that the set of security
tuples specified on the security policy of a child directory
always has a non-empty intersection with that of the parent.
(c) sec_policy_child ^ sec_policy_parent == {}. This means that the
set of security tuples specified on the security policy of a
child directory may not intersect with that of the parent. In
other words, there are no restrictions on how the system
administrator may set up these tuples.
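The three approaches correspond directly to set operations. The
sketch below, which uses short strings as stand-ins for security
tuples and a hypothetical function name, reports the most
restrictive approach that a given parent/child pair satisfies.

```python
def strictest_approach(sec_policy_child: set, sec_policy_parent: set) -> str:
    """Return 'a', 'b', or 'c' for a parent/child pair of tuple sets."""
    if sec_policy_child <= sec_policy_parent:
        # (a): the child's tuples are a subset of the parent's.
        return "a"
    if sec_policy_child & sec_policy_parent:
        # (b): not a subset, but the intersection is non-empty.
        return "b"
    # (c): no common tuple; only an unrestricted policy allows this.
    return "c"
```

For example, a child policy of {"krb5i"} under a parent policy of
{"krb5", "krb5i"} satisfies approach (a), whereas a child policy of
{"krb5p"} under a parent policy of {"sys"} is possible only under
approach (c).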
In order for a server to support approaches (b) (for the case when a
client chooses a flavor that is not a member of sec_policy_parent)
and (c), the put filehandle operation cannot return NFS4ERR_WRONGSEC
when there is a security tuple mismatch. Instead, it should be
returned from the LOOKUP (or OPEN by existing component name) that
follows.
Since the above guideline does not contradict approach (a), it should
be followed in general. Even if approach (a) is implemented, it is
possible for the security tuple used to be acceptable for the target
of LOOKUP but not for the filehandles used in the put filehandle
operation. The put filehandle operation could be a PUTROOTFH or
PUTPUBFH, where the client cannot know the security tuples for the
root or public filehandle. Or the security policy for the filehandle
used by the put filehandle operation could have changed since the
time the filehandle was obtained.
Therefore, an NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC in
response to the put filehandle operation if the operation is
immediately followed by a LOOKUP or an OPEN by component name.
2.6.3.1.1.4. Put Filehandle Operation + LOOKUPP
Since SECINFO only works its way down, there is no way LOOKUPP can
return NFS4ERR_WRONGSEC without SECINFO_NO_NAME. SECINFO_NO_NAME
solves this issue via style SECINFO_STYLE4_PARENT, which works in the
opposite direction from SECINFO. As with Section 2.6.3.1.1.3, a put
filehandle operation that is followed by a LOOKUPP MUST NOT return
NFS4ERR_WRONGSEC. If the server does not support SECINFO_NO_NAME,
the client's only recourse is to send the put filehandle operation,
LOOKUPP, GETFH sequence of operations with every security tuple it
supports.
Regardless of whether SECINFO_NO_NAME is supported, an NFSv4.1 server
MUST NOT return NFS4ERR_WRONGSEC in response to a put filehandle
operation if the operation is immediately followed by a LOOKUPP.
2.6.3.1.1.5. Put Filehandle Operation + SECINFO/SECINFO_NO_NAME
A security-sensitive client is allowed to choose a strong security
tuple when querying a server to determine a file object's permitted
security tuples. The security tuple chosen by the client does not
have to be included in the tuple list of the security policy of
either the parent directory indicated in the put filehandle operation
or the child file object indicated in SECINFO (or any parent
directory indicated in SECINFO_NO_NAME). Of course, the server has
to be configured for whatever security tuple the client selects;
otherwise, the request will fail at the RPC layer with an appropriate
authentication error.
In theory, there is no connection between the security flavor used by
SECINFO or SECINFO_NO_NAME and those supported by the security
policy. But in practice, the client may start looking for strong
flavors from those supported by the security policy, followed by
those in the REQUIRED set.
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to a put
filehandle operation that is immediately followed by SECINFO or
SECINFO_NO_NAME. The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC
from SECINFO or SECINFO_NO_NAME.
2.6.3.1.1.6. Put Filehandle Operation + Nothing
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC.
2.6.3.1.1.7. Put Filehandle Operation + Anything Else
"Anything Else" includes OPEN by filehandle.
The security policy enforcement applies to the filehandle specified
in the put filehandle operation. Therefore, the put filehandle
operation MUST return NFS4ERR_WRONGSEC when there is a security tuple
mismatch. This avoids the complexity of adding NFS4ERR_WRONGSEC as
an allowable error to every other operation.
A COMPOUND containing the series put filehandle operation +
SECINFO_NO_NAME (style SECINFO_STYLE4_CURRENT_FH) is an efficient way
for the client to recover from NFS4ERR_WRONGSEC.
The NFSv4.1 server MUST NOT return NFS4ERR_WRONGSEC to any operation
other than a put filehandle operation, LOOKUP, LOOKUPP, and OPEN (by
component name).
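The rules of Sections 2.6.3.1.1.1 through 2.6.3.1.1.7 can be
condensed into a small sketch that decides whether a given put
filehandle operation is the one that enforces the security policy.
Operations are represented by name only; the labels OPEN_BY_NAME and
OPEN_BY_FH are hypothetical and distinguish the two forms of OPEN,
and the function name is likewise an assumption.

```python
PUT_FH_OPS = {"PUTROOTFH", "PUTPUBFH", "PUTFH", "RESTOREFH"}

# Operations that take over the NFS4ERR_WRONGSEC decision (LOOKUP,
# OPEN by name, LOOKUPP) or that are exempt from it (SECINFO,
# SECINFO_NO_NAME).
DEFERRING_OPS = {"LOOKUP", "OPEN_BY_NAME", "LOOKUPP",
                 "SECINFO", "SECINFO_NO_NAME"}


def put_fh_enforces_policy(ops, index):
    """Return True if ops[index], a put filehandle operation, must
    return NFS4ERR_WRONGSEC when there is a security tuple mismatch."""
    if ops[index] not in PUT_FH_OPS:
        raise ValueError("ops[index] is not a put filehandle operation")
    nxt = index + 1
    # Pretend SAVEFH is not in the series (Section 2.6.3.1.1.1).
    while nxt < len(ops) and ops[nxt] == "SAVEFH":
        nxt += 1
    if nxt == len(ops):
        return False  # Put filehandle operation + nothing
    if ops[nxt] in PUT_FH_OPS:
        return False  # One of the first N-1 in a series of N
    if ops[nxt] in DEFERRING_OPS:
        return False  # The decision is deferred or exempted
    return True       # Put filehandle operation + anything else
```

For the example series PUTFH, PUTROOTFH, LOOKUP, the function
returns False for both put filehandle operations, leaving any
NFS4ERR_WRONGSEC to be returned by LOOKUP.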
2.6.3.1.1.8. Operations after SECINFO and SECINFO_NO_NAME
Suppose a client sends a COMPOUND procedure containing the series
SEQUENCE, PUTFH, SECINFO_NO_NAME, READ, and suppose the security tuple
used does not match that required for the target file. By rule (see
Section 2.6.3.1.1.5), neither PUTFH nor SECINFO_NO_NAME can return
NFS4ERR_WRONGSEC. By rule (see Section 2.6.3.1.1.7), READ cannot
return NFS4ERR_WRONGSEC. The issue is resolved by the fact that
SECINFO and SECINFO_NO_NAME consume the current filehandle (note that
this is a change from NFSv4.0). This leaves no current filehandle
for READ to use, and READ returns NFS4ERR_NOFILEHANDLE.
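The effect of SECINFO and SECINFO_NO_NAME consuming the current
filehandle can be seen in a toy COMPOUND evaluator. The evaluator is
hypothetical and models only the current filehandle; it performs no
security checks and no actual I/O.

```python
def run_compound(ops):
    """Evaluate (operation, argument) pairs, stopping at the first error."""
    current_fh = None
    results = []
    for op, arg in ops:
        status = "NFS4_OK"
        if op == "PUTFH":
            current_fh = arg
        elif op in ("SECINFO", "SECINFO_NO_NAME"):
            if current_fh is None:
                status = "NFS4ERR_NOFILEHANDLE"
            else:
                current_fh = None  # the current filehandle is consumed
        elif op == "READ":
            if current_fh is None:
                status = "NFS4ERR_NOFILEHANDLE"
        # Other operations (e.g., SEQUENCE) are treated as successful.
        results.append((op, status))
        if status != "NFS4_OK":
            break
    return results
```

Running the series from the text, with "fh1" as a placeholder
filehandle, yields NFS4_OK for SEQUENCE, PUTFH, and SECINFO_NO_NAME,
followed by NFS4ERR_NOFILEHANDLE for READ.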
2.6.3.1.2. LINK and RENAME
The LINK and RENAME operations use both the current and saved
filehandles. Technically, the server MAY return NFS4ERR_WRONGSEC
from LINK or RENAME if the security policy of the saved filehandle
rejects the security flavor used in the COMPOUND request's
credentials. If the server does so, then if there is no intersection
between the security policies of saved and current filehandles, this
means that it will be impossible for the client to perform the
intended LINK or RENAME operation.
For example, suppose the client sends this COMPOUND request:
SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH, RENAME "c" "d", where
filehandles bFH and aFH refer to different directories. Suppose no
common security tuple exists between the security policies of aFH and
bFH. If the client sends the request using credentials acceptable to
bFH's security policy but not aFH's policy, then the PUTFH aFH
operation will fail with NFS4ERR_WRONGSEC. After a SECINFO_NO_NAME
request, the client sends SEQUENCE, PUTFH bFH, SAVEFH, PUTFH aFH,
RENAME "c" "d", using credentials acceptable to aFH's security policy
but not bFH's policy. The server returns NFS4ERR_WRONGSEC on the
RENAME operation.
To prevent a client from entering an endless cycle in which a request
containing LINK or RENAME is followed by a request containing
SECINFO_NO_NAME or SECINFO, the server MUST detect when the security
policies of the
current and saved filehandles have no mutually acceptable security
tuple, and MUST NOT return NFS4ERR_WRONGSEC from LINK or RENAME in
that situation. Instead the server MUST do one of two things:
o The server can return NFS4ERR_XDEV.
o The server can allow the security policy of the current filehandle
to override that of the saved filehandle, and so return NFS4_OK.
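The server-side check for LINK and RENAME might look like the sketch
below. Security policies are modeled as sets of short strings
standing in for security tuples, and the function name and the
allow_override switch are hypothetical. The sketch assumes that the
tuple in use was already accepted for the current filehandle by the
preceding put filehandle operation, and "NFS4_OK" here means only
that the security check does not block the operation.

```python
def link_rename_security_status(saved_policy: set,
                                current_policy: set,
                                used_tuple: str,
                                allow_override: bool = False) -> str:
    """Security outcome for LINK or RENAME given both filehandle policies."""
    if not (saved_policy & current_policy):
        # No mutually acceptable tuple: NFS4ERR_WRONGSEC MUST NOT be
        # returned, since the client could never satisfy both policies.
        if allow_override:
            return "NFS4_OK"  # current filehandle's policy overrides
        return "NFS4ERR_XDEV"
    if used_tuple not in saved_policy:
        # A common tuple exists, so the client can recover via
        # SECINFO or SECINFO_NO_NAME; the server MAY return this error.
        return "NFS4ERR_WRONGSEC"
    return "NFS4_OK"
```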