RFC 8166

Remote Direct Memory Access Transport for Remote Procedure Call Version 1

Pages: 55
Proposed Standard
Obsoletes: 5666
Updated by: 8797

Part 3 of 3 – Pages 44 to 55

RFC8166 - Page 44

7.  Protocol Extensibility

   The RPC-over-RDMA header format is specified using XDR, unlike the
   message header used with RPC-over-TCP.  To maintain a high degree of
   interoperability among implementations of RPC-over-RDMA, any change
   to this XDR requires a protocol version number change.  New versions
   of RPC-over-RDMA may be published as separate protocol specifications
   without updating this document.

   The first four fields in every RPC-over-RDMA header must remain
   aligned at the same fixed offsets for all versions of the RPC-over-
   RDMA protocol.  The version number must be in a fixed place to enable
   implementations to detect protocol version mismatches.

   For version mismatches to be reported in a fashion that all future
   version implementations can reliably decode, the rdma_proc field must
   remain in a fixed place, the value of ERR_VERS must always remain the
   same, and the field placement in struct rpc_rdma_errvers must always
   remain the same.
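
   The fixed-offset requirement above can be illustrated with a minimal
   sketch.  It assumes the field order (rdma_xid, rdma_vers,
   rdma_credit, rdma_proc) and the enum values RDMA_ERROR = 4 and
   ERR_VERS = 1 from the XDR definition given earlier in this document.
   The function names are hypothetical and are not part of the
   protocol.

```python
import struct

# Values from the rdma_proc and rpc_rdma_errcode enums in this document's XDR.
RDMA_ERROR = 4
ERR_VERS = 1

# rdma_xid, rdma_vers, rdma_credit, rdma_proc: four big-endian 32-bit words.
FIXED_WORDS = struct.Struct(">4I")
# rdma_vers_low, rdma_vers_high from struct rpc_rdma_errvers.
ERRVERS_WORDS = struct.Struct(">2I")


def parse_fixed_header(buf):
    """Decode the four XDR words at fixed offsets 0, 4, 8, and 12."""
    if len(buf) < FIXED_WORDS.size:
        raise ValueError("buffer shorter than the fixed header fields")
    xid, vers, credit, proc = FIXED_WORDS.unpack_from(buf, 0)
    return {"xid": xid, "vers": vers, "credit": credit, "proc": proc}


def parse_version_error(buf):
    """Return (low, high) if buf carries an ERR_VERS report, else None."""
    hdr = parse_fixed_header(buf)
    if hdr["proc"] != RDMA_ERROR or len(buf) < 20:
        return None
    (err,) = struct.unpack_from(">I", buf, 16)
    if err != ERR_VERS or len(buf) < 28:
        return None
    return ERRVERS_WORDS.unpack_from(buf, 20)
```

   Because only fixed offsets are consulted, a receiver written for one
   protocol version could still extract the supported version range
   from an error reported by a peer that implements a different
   version.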

7.1.  Conventional Extensions

   Introducing new capabilities to RPC-over-RDMA version 1 is limited to
   the adoption of conventions that make use of existing XDR (defined in
   this document) and allowed abstract RDMA operations.  Because no
   mechanism for detecting optional features exists in RPC-over-RDMA
   version 1, implementations must rely on ULPs to communicate the
   existence of such extensions.

   Such extensions must be specified in a Standards Track RFC with
   appropriate review by the NFSv4 Working Group and the IESG.  An
   example of a conventional extension to RPC-over-RDMA version 1 is the
   specification of backward direction message support to enable NFSv4.1
   callback operations, described in [RFC8167].

8.  Security Considerations

8.1.  Memory Protection

   A primary consideration is the protection of the integrity and
   confidentiality of local memory by an RPC-over-RDMA transport.  The
   use of an RPC-over-RDMA transport protocol MUST NOT introduce
   vulnerabilities to system memory contents nor to memory owned by user
   processes.

   It is REQUIRED that any RDMA provider used for RPC transport be
   conformant to the requirements of [RFC5042] in order to satisfy these
   protections.  These protections are provided by the RDMA layer
   specifications, and in particular, their security models.

8.1.1.  Protection Domains

   The use of Protection Domains to limit the exposure of memory regions
   to a single connection is critical.  Any attempt by an endpoint not
   participating in that connection to reuse memory handles needs to
   result in immediate failure of that connection.  Because ULP security
   mechanisms rely on this aspect of Reliable Connection behavior,
   strong authentication of remote endpoints is recommended.

8.1.2.  Handle Predictability

   Unpredictable memory handles should be used for any operation
   requiring advertised memory regions.  Advertising a continuously
   registered memory region allows a remote host to read or write to
   that region even when an RPC involving that memory is not under way.
   Therefore, implementations should avoid advertising persistently
   registered memory.

8.1.3.  Memory Protection

   Requesters should register memory regions for remote access only when
   they are about to be the target of an RPC operation that involves an
   RDMA Read or Write.

   Registered memory regions should be invalidated as soon as related
   RPC operations are complete.  Invalidation and DMA unmapping of
   memory regions should be complete before message integrity checking
   is done and before the RPC consumer is allowed to continue execution
   and use or alter the contents of a memory region.

   An RPC transaction on a Requester might be terminated before a reply
   arrives if the RPC consumer exits unexpectedly (for example, it is
   signaled or a segmentation fault occurs).  When an RPC terminates
   abnormally, memory regions associated with that RPC should be
   invalidated appropriately before the regions are released to be
   reused for other purposes on the Requester.
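
   The ordering described above (register just before the RPC,
   invalidate on every exit path, and only then let the RPC consumer
   touch the buffer) might be sketched as follows.  The Region class
   and the remote_access flag are hypothetical stand-ins for a
   provider's registration and invalidation operations.  No real RDMA
   API is being called.

```python
from contextlib import contextmanager


class Region:
    """Hypothetical stand-in for a memory region; not a real verbs object."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.remote_access = False


@contextmanager
def advertised(region):
    """Expose a region to the peer only for the lifetime of one RPC."""
    region.remote_access = True       # stand-in for registration
    try:
        yield region
    finally:
        # Runs on normal completion and on abnormal termination alike.
        region.remote_access = False  # stand-in for invalidation


def run_rpc(region, do_rpc):
    """Run one RPC, returning the buffer only after invalidation."""
    with advertised(region):
        do_rpc(region)
    # Invalidation has completed; the consumer may now use the contents.
    return bytes(region.buf)
```

   Placing the invalidation in the "finally" clause is one way to cover
   the abnormal termination case: the region is no longer remotely
   accessible by the time it can be reused for another purpose.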

8.1.4.  Denial of Service

   A detailed discussion of denial-of-service exposures that can result
   from the use of an RDMA transport is found in Section 6.4 of
   [RFC5042].

   A Responder is not obliged to pull Read chunks that are unreasonably
   large.  The Responder can use an RDMA_ERROR response to terminate
   RPCs with unreadable Read chunks.  If a Responder transmits more data
   than a Requester is prepared to receive in a Write or Reply chunk,
   the RDMA Network Interface Cards (RNICs) typically terminate the
   connection.  For further discussion, see Section 4.5.  Such repeated
   chunk errors can deny service to other users sharing the connection
   from the errant Requester.

   An RPC-over-RDMA transport implementation is not responsible for
   throttling the RPC request rate, other than to keep the number of
   concurrent RPC transactions at or under the number of credits granted
   per connection.  This is explained in Section 3.3.1.  A sender can
   trigger a self denial of service by exceeding the credit grant
   repeatedly.
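
   As an illustration of the credit rule above, a Requester might track
   in-flight RPCs against the most recent grant.  This is a minimal
   sketch, not a prescribed design: the class and method names are
   hypothetical, and the starting grant of one credit is an assumption
   made for the example.

```python
class CreditLimiter:
    """Keep concurrent RPCs at or under the current credit grant."""

    def __init__(self, initial_grant=1):
        self.grant = initial_grant   # assumed starting value
        self.in_flight = 0

    def try_start(self):
        """Claim a credit for a new RPC; return False if none is available."""
        if self.in_flight >= self.grant:
            return False
        self.in_flight += 1
        return True

    def on_reply(self, granted):
        """Release one credit and adopt the grant carried in the reply."""
        self.in_flight -= 1
        self.grant = granted
```

   A caller that receives False from try_start() would queue the RPC
   rather than send it, which avoids exceeding the credit grant.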

   When an RPC has been canceled due to a signal or premature exit of an
   application process, a Requester may invalidate the RPC's Write and
   Reply chunks.  Invalidation prevents the subsequent arrival of the
   Responder's reply from altering the memory regions associated with
   those chunks after the memory has been reused.

   On the Requester, a malfunctioning application or a malicious user
   can create a situation where RPCs are continuously initiated and then
   aborted, resulting in Responder replies that terminate the underlying
   RPC-over-RDMA connection repeatedly.  Such situations can deny
   service to other users sharing the connection from that Requester.

8.2.  RPC Message Security

   ONC RPC provides cryptographic security via the RPCSEC_GSS framework
   [RFC7861].  RPCSEC_GSS implements message authentication
   (rpc_gss_svc_none), per-message integrity checking
   (rpc_gss_svc_integrity), and per-message confidentiality
   (rpc_gss_svc_privacy) in the layer above RPC-over-RDMA.  The latter
   two services require significant computation and movement of data on
   each endpoint host.  Some performance benefits enabled by RDMA
   transports can be lost.

8.2.1.  RPC-over-RDMA Protection at Lower Layers

   For any RPC transport, utilizing RPCSEC_GSS integrity or privacy
   services has performance implications.  Protection below the RPC
   transport is often more appropriate in performance-sensitive
   deployments, especially if it, too, can be offloaded.  Certain
   configurations of IPsec can be co-located in RDMA hardware, for
   example, without change to RDMA consumers and little loss of data
   movement efficiency.  Such arrangements can also provide a higher
   degree of privacy by hiding endpoint identity or altering the
   frequency at which messages are exchanged, at a performance cost.

   The use of protection in a lower layer MAY be negotiated through the
   use of an RPCSEC_GSS security flavor defined in [RFC7861] in
   conjunction with the Channel Binding mechanism [RFC5056] and IPsec
   Channel Connection Latching [RFC5660].  Use of such mechanisms is
   REQUIRED where integrity or confidentiality is desired and where
   efficiency is required.

8.2.2.  RPCSEC_GSS on RPC-over-RDMA Transports

   Not all RDMA devices and fabrics support the above protection
   mechanisms.  Also, per-message authentication is still required on
   NFS clients where multiple users access NFS files.  In these cases,
   RPCSEC_GSS can protect NFS traffic conveyed on RPC-over-RDMA
   connections.

   RPCSEC_GSS extends the ONC RPC protocol [RFC5531] without changing
   the format of RPC messages.  By observing the conventions described
   in this section, an RPC-over-RDMA transport can convey RPCSEC_GSS-
   protected RPC messages interoperably.

   As part of the ONC RPC protocol, protocol elements of RPCSEC_GSS that
   appear in the Payload stream of an RPC-over-RDMA message (such as
   control messages exchanged as part of establishing or destroying a
   security context or data items that are part of RPCSEC_GSS
   authentication material) MUST NOT be reduced.

8.2.2.1.  RPCSEC_GSS Context Negotiation

   Some NFS client implementations use a separate connection to
   establish a Generic Security Service (GSS) context for NFS operation.
   These clients use TCP and the standard NFS port (2049) for context
   establishment.  To enable the use of RPCSEC_GSS with NFS/RDMA, an NFS
   server MUST also provide a TCP-based NFS service on port 2049.

8.2.2.2.  RPC-over-RDMA with RPCSEC_GSS Authentication

   The RPCSEC_GSS authentication service has no impact on the DDP-
   eligibility of data items in a ULP.

   However, RPCSEC_GSS authentication material appearing in an RPC
   message header can be larger than, say, an AUTH_SYS authenticator.
   In particular, when an RPCSEC_GSS pseudoflavor is in use, a Requester
   needs to accommodate a larger RPC credential when marshaling RPC Call
   messages and needs to provide for a maximum size RPCSEC_GSS verifier
   when allocating reply buffers and Reply chunks.

   RPC messages, and thus Payload streams, are made larger as a result.
   ULP operations that fit in a Short Message when a simpler form of
   authentication is in use might need to be reduced, or conveyed via a
   Long Message, when RPCSEC_GSS authentication is in use.  It is more
   likely that a Requester provides both a Read list and a Reply chunk
   in the same RPC-over-RDMA header to convey a Long Call and provision
   a receptacle for a Long Reply.  More frequent use of Long Messages
   can impact transport efficiency.
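
   The effect of a larger credential on message sizing can be shown
   with simple arithmetic.  In this sketch, the 1024-octet inline
   threshold is assumed to be the version 1 default, and the 28-octet
   transport header corresponds to the four fixed fields plus three
   empty chunk lists.  The RPC header and argument sizes are invented
   for illustration only and do not describe any real credential.

```python
def call_fits_inline(transport_hdr_len, rpc_hdr_len, args_len,
                     inline_threshold=1024):
    """Return True if the whole Call can be sent as a Short Message."""
    return transport_hdr_len + rpc_hdr_len + args_len <= inline_threshold


TRANSPORT_HDR = 28   # four fixed words plus three empty chunk lists
ARGS = 700           # illustrative argument size

# Illustrative RPC header sizes: a small credential vs. a larger one.
small_cred_call = call_fits_inline(TRANSPORT_HDR, 120, ARGS)
large_cred_call = call_fits_inline(TRANSPORT_HDR, 400, ARGS)

print(small_cred_call)  # True:  28 + 120 + 700 = 848
print(large_cred_call)  # False: 28 + 400 + 700 = 1128
```

   With the larger header, the same arguments no longer fit inline, so
   the Call would need to be reduced or conveyed as a Long Message.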

8.2.2.3.  RPC-over-RDMA with RPCSEC_GSS Integrity or Privacy

   The RPCSEC_GSS integrity service enables endpoints to detect
   modification of RPC messages in flight.  The RPCSEC_GSS privacy
   service prevents all but the intended recipient from viewing the
   cleartext content of RPC arguments and results.  RPCSEC_GSS integrity
   and privacy services are end-to-end.  They protect RPC arguments and
   results from application to server endpoint, and back.

   The RPCSEC_GSS integrity and encryption services operate on whole RPC
   messages after they have been XDR encoded for transmit, and before
   they have been XDR decoded after receipt.  Both sender and receiver
   endpoints use intermediate buffers to prevent exposure of encrypted
   data or unverified cleartext data to RPC consumers.  After
   verification, encryption, and message wrapping have been performed,
   the transport layer MAY use RDMA data transfer between these
   intermediate buffers.

   The process of reducing a DDP-eligible data item removes the data
   item and its XDR padding from the encoded XDR stream.  XDR padding of
   a reduced data item is not transferred in an RPC-over-RDMA message.
   After reduction, the Payload stream contains fewer octets than the
   whole XDR stream did beforehand.  XDR padding octets are often zero
   bytes, but they need not be.  Thus, reducing DDP-eligible items
   affects the result of message integrity verification or encryption.

   Therefore, a sender MUST NOT reduce a Payload stream when RPCSEC_GSS
   integrity or encryption services are in use.  Effectively, no data
   item is DDP-eligible in this situation, and Chunked Messages cannot
   be used.  In this mode, an RPC-over-RDMA transport operates in the
   same manner as a transport that does not support DDP.
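
   The interaction between reduction and integrity checking can be
   demonstrated with a small worked example.  The helper names are
   hypothetical, the stream layout is a toy one, and SHA-256 merely
   stands in for whatever checksum the GSS mechanism actually computes.

```python
import hashlib


def xdr_pad_len(n):
    """Octets of padding XDR adds so an n-octet item ends on a 4-octet boundary."""
    return (4 - n % 4) % 4


def encode_opaque(data, pad_octet=b"\x00"):
    """Encode variable-length opaque data: length word, data, then padding."""
    pad = pad_octet * xdr_pad_len(len(data))
    return len(data).to_bytes(4, "big") + data + pad


def reduce_item(stream, offset, length):
    """Drop a data item and its XDR padding, as reduction does."""
    end = offset + length + xdr_pad_len(length)
    return stream[:offset] + stream[end:]


# A 4-octet placeholder field followed by a 5-octet opaque data item.
xdr_stream = (7).to_bytes(4, "big") + encode_opaque(b"hello")
payload_stream = reduce_item(xdr_stream, offset=8, length=5)

print(len(xdr_stream), len(payload_stream))  # 16 8
same = hashlib.sha256(xdr_stream).digest() == hashlib.sha256(payload_stream).digest()
print(same)  # False
```

   The receiver cannot recompute the original digest from the Payload
   stream alone, because the removed padding octets are never
   transferred and are not guaranteed to be zero.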

   When an RPCSEC_GSS integrity or privacy service is in use, a
   Requester provides both a Read list and a Reply chunk in the same
   RPC-over-RDMA header to convey a Long Call and provision a receptacle
   for a Long Reply.

8.2.2.4.  Protecting RPC-over-RDMA Transport Headers

   Like the base fields in an ONC RPC message (XID, call direction, and
   so on), the contents of an RPC-over-RDMA message's Transport stream
   are not protected by RPCSEC_GSS.  This exposes XIDs, connection
   credit limits, and chunk lists (but not the content of the data items
   they refer to) to malicious behavior, which could redirect data that
   is transferred by the RPC-over-RDMA message, result in spurious
   retransmits, or trigger connection loss.

   In particular, if an attacker alters the information contained in the
   chunk lists of an RPC-over-RDMA header, data contained in those
   chunks can be redirected to other registered memory regions on
   Requesters.  An attacker might alter the arguments of RDMA Read and
   RDMA Write operations on the wire to similar effect.  If such
   alterations occur, the use of RPCSEC_GSS integrity or privacy
   services enables a Requester to detect unexpected material in a
   received RPC message.

   Encryption at lower layers, as described in Section 8.2.1, protects
   the content of the Transport stream.  To address attacks on RDMA
   protocols themselves, RDMA transport implementations should conform
   to [RFC5042].

9.  IANA Considerations

   A set of RPC netids for resolving RPC-over-RDMA services is specified
   by this document.  This is unchanged from [RFC5666].

   The RPC-over-RDMA transport has been assigned an RPC netid, which is
   an rpcbind [RFC1833] string used to describe the underlying protocol
   in order for RPC to select the appropriate transport framing, as well
   as the format of the service addresses and ports.

   The following netid registry strings are defined for this purpose:

      NC_RDMA "rdma"
      NC_RDMA6 "rdma6"

   The "rdma" netid is to be used when IPv4 addressing is employed by
   the underlying transport, and "rdma6" for IPv6 addressing.  The netid
   assignment policy and registry are defined in [RFC5665].

   These netids MAY be used for any RDMA network that satisfies the
   requirements of Section 2.3.2 and that is able to identify service
   endpoints using IP port addressing, possibly through use of a
   translation service as described in Section 5.

   The use of the RPC-over-RDMA protocol has no effect on RPC Program
   numbers or existing registered port numbers.  However, new port
   numbers MAY be registered for use by RPC-over-RDMA-enabled services,
   as appropriate to the new networks over which the services will
   operate.

   For example, the NFS/RDMA service defined in [RFC5667] has been
   assigned the port 20049 in the "Service Name and Transport Protocol
   Port Number Registry".  This is distinct from the port number defined
   for NFS on TCP, which is assigned the port 2049 in the same registry.
   NFS clients use the same RPC Program number for NFS (100003) when
   using either transport [RFC5531] (see the "Remote Procedure Call
   (RPC) Program Numbers" registry).
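
   The netid and port assignments above might be applied as in the
   following sketch.  The constant and function names are hypothetical.
   The values are those given in this section.

```python
import ipaddress

NFS_PROGRAM = 100003    # same RPC Program number on either transport
NFS_TCP_PORT = 2049     # NFS on TCP
NFS_RDMA_PORT = 20049   # NFS/RDMA


def rdma_netid(address):
    """Pick the RPC-over-RDMA netid that matches the address family."""
    version = ipaddress.ip_address(address).version
    return "rdma" if version == 4 else "rdma6"
```

   For example, rdma_netid("192.0.2.10") selects "rdma", while
   rdma_netid("2001:db8::1") selects "rdma6".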

10.  References

10.1.  Normative References

   [RFC1833]  Srinivasan, R., "Binding Protocols for ONC RPC Version 2",
              RFC 1833, DOI 10.17487/RFC1833, August 1995,
              <http://www.rfc-editor.org/info/rfc1833>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC4506]  Eisler, M., Ed., "XDR: External Data Representation
              Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May
              2006, <http://www.rfc-editor.org/info/rfc4506>.

   [RFC5042]  Pinkerton, J. and E. Deleganes, "Direct Data Placement
              Protocol (DDP) / Remote Direct Memory Access Protocol
              (RDMAP) Security", RFC 5042, DOI 10.17487/RFC5042, October
              2007, <http://www.rfc-editor.org/info/rfc5042>.

   [RFC5056]  Williams, N., "On the Use of Channel Bindings to Secure
              Channels", RFC 5056, DOI 10.17487/RFC5056, November 2007,
              <http://www.rfc-editor.org/info/rfc5056>.

   [RFC5531]  Thurlow, R., "RPC: Remote Procedure Call Protocol
              Specification Version 2", RFC 5531, DOI 10.17487/RFC5531,
              May 2009, <http://www.rfc-editor.org/info/rfc5531>.

   [RFC5660]  Williams, N., "IPsec Channels: Connection Latching",
              RFC 5660, DOI 10.17487/RFC5660, October 2009,
              <http://www.rfc-editor.org/info/rfc5660>.

   [RFC5665]  Eisler, M., "IANA Considerations for Remote Procedure Call
              (RPC) Network Identifiers and Universal Address Formats",
              RFC 5665, DOI 10.17487/RFC5665, January 2010,
              <http://www.rfc-editor.org/info/rfc5665>.

   [RFC7861]  Adamson, A. and N. Williams, "Remote Procedure Call (RPC)
              Security Version 3", RFC 7861, DOI 10.17487/RFC7861,
              November 2016, <http://www.rfc-editor.org/info/rfc7861>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <http://www.rfc-editor.org/info/rfc8174>.

10.2.  Informative References

   [IBARCH]   InfiniBand Trade Association, "InfiniBand Architecture
              Specification Volume 1", Release 1.3, March 2015,
              <http://www.infinibandta.org/content/
              pages.php?pg=technology_download>.

   [RFC768]   Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              DOI 10.17487/RFC0768, August 1980,
              <http://www.rfc-editor.org/info/rfc768>.

   [RFC793]   Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981,
              <http://www.rfc-editor.org/info/rfc793>.

   [RFC1094]  Nowicki, B., "NFS: Network File System Protocol
              specification", RFC 1094, DOI 10.17487/RFC1094, March
              1989, <http://www.rfc-editor.org/info/rfc1094>.

   [RFC1813]  Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
              Version 3 Protocol Specification", RFC 1813,
              DOI 10.17487/RFC1813, June 1995,
              <http://www.rfc-editor.org/info/rfc1813>.

   [RFC5040]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
              Garcia, "A Remote Direct Memory Access Protocol
              Specification", RFC 5040, DOI 10.17487/RFC5040, October
              2007, <http://www.rfc-editor.org/info/rfc5040>.

   [RFC5041]  Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct
              Data Placement over Reliable Transports", RFC 5041,
              DOI 10.17487/RFC5041, October 2007,
              <http://www.rfc-editor.org/info/rfc5041>.

   [RFC5532]  Talpey, T. and C. Juszczak, "Network File System (NFS)
              Remote Direct Memory Access (RDMA) Problem Statement",
              RFC 5532, DOI 10.17487/RFC5532, May 2009,
              <http://www.rfc-editor.org/info/rfc5532>.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010,
              <http://www.rfc-editor.org/info/rfc5661>.

   [RFC5662]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              External Data Representation Standard (XDR) Description",
              RFC 5662, DOI 10.17487/RFC5662, January 2010,
              <http://www.rfc-editor.org/info/rfc5662>.

   [RFC5666]  Talpey, T. and B. Callaghan, "Remote Direct Memory Access
              Transport for Remote Procedure Call", RFC 5666,
              DOI 10.17487/RFC5666, January 2010,
              <http://www.rfc-editor.org/info/rfc5666>.

   [RFC5667]  Talpey, T. and B. Callaghan, "Network File System (NFS)
              Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667,
              January 2010, <http://www.rfc-editor.org/info/rfc5667>.

   [RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
              (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
              March 2015, <http://www.rfc-editor.org/info/rfc7530>.

   [RFC8167]  Lever, C., "Bidirectional Remote Procedure Call on RPC-
              over-RDMA Transports", RFC 8167, DOI 10.17487/RFC8167,
              June 2017, <http://www.rfc-editor.org/info/rfc8167>.

Appendix A.  Changes from RFC 5666

A.1.  Changes to the Specification

   The following alterations have been made to the RPC-over-RDMA version
   1 specification.  The section numbers below refer to [RFC5666].

   o  Section 2 has been expanded to introduce and explain key RPC
      [RFC5531], XDR [RFC4506], and RDMA [RFC5040] terminology.  These
      terms are now used consistently throughout the specification.

   o  Section 3 has been reorganized and split into subsections to help
      readers locate specific requirements and definitions.

   o  Sections 4 and 5 have been combined to improve the organization of
      this information.

   o  The optional Connection Configuration Protocol has never been
      implemented.  The specification of CCP has been deleted from this
      specification.

   o  A section consolidating requirements for ULBs has been added.

   o  An XDR extraction mechanism is provided, along with full
      copyright, matching the approach used in [RFC5662].

   o  The "Security Considerations" section has been expanded to include
      a discussion of how RPC-over-RDMA security depends on features of
      the underlying RDMA transport.

   o  A subsection describing the use of RPCSEC_GSS [RFC7861] with RPC-
      over-RDMA version 1 has been added.

A.2.  Changes to the Protocol

   Although the protocol described herein interoperates with existing
   implementations of [RFC5666], the following changes have been made
   relative to the protocol described in that document:

   o  Support for the Read-Read transfer model has been removed.  Read-
      Read is a slower transfer model than Read-Write.  As a result,
      implementers have chosen not to support it.  Removal of Read-Read
      simplifies explanatory text, and the RDMA_DONE procedure is no
      longer part of the protocol.

   o  The specification of RDMA_MSGP in [RFC5666] is not adequate,
      although some incomplete implementations exist.  Even if an
      adequate specification were provided and an implementation were
      produced, benefit for protocols such as NFSv4.0 [RFC7530] is
      doubtful.  Therefore, the RDMA_MSGP message type is no longer
      supported.

   o  Technical issues with regard to handling RPC-over-RDMA header
      errors have been corrected.

   o  Specific requirements related to implicit XDR roundup and complex
      XDR data types have been added.

   o  Explicit guidance is provided related to sizing Write chunks,
      managing multiple chunks in the Write list, and handling unused
      Write chunks.

   o  Clear guidance about Send and Receive buffer sizes has been
      introduced.  This enables better decisions about when a Reply
      chunk must be provided.

Acknowledgments

   The editor gratefully acknowledges the work of Brent Callaghan and
   Tom Talpey on the original RPC-over-RDMA Version 1 specification
   [RFC5666].

   Dave Noveck provided excellent review, constructive suggestions, and
   consistent navigational guidance throughout the process of drafting
   this document.  Dave also contributed much of the organization and
   content of Section 7 and helped the authors understand the
   complexities of XDR extensibility.

   The comments and contributions of Karen Deitke, Dai Ngo, Chunli
   Zhang, Dominique Martinet, and Mahesh Siddheshwar are accepted with
   great thanks.  The editor also wishes to thank Bill Baker, Greg
   Marsden, and Matt Benjamin for their support of this work.

   The extract.sh shell script and formatting conventions were first
   described by the authors of the NFSv4.1 XDR specification [RFC5662].

   Special thanks go to Transport Area Director Spencer Dawkins, NFSv4
   Working Group Chair and Document Shepherd Spencer Shepler, and NFSv4
   Working Group Secretary Thomas Haynes for their support.

Authors' Addresses

   Charles Lever (editor)
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI  48104
   United States of America

   Phone: +1 248 816 6463
   Email: chuck.lever@oracle.com


   William Allen Simpson
   Red Hat
   1384 Fontaine
   Madison Heights, MI  48071
   United States of America

   Email: william.allen.simpson@gmail.com


   Tom Talpey
   Microsoft Corp.
   One Microsoft Way
   Redmond, WA  98052
   United States of America

   Phone: +1 425 704-9945
   Email: ttalpey@microsoft.com