RFC 5040

Proposed STD
Pages: 66

A Remote Direct Memory Access Protocol Specification


Updated by:    7146


Network Working Group                                           R. Recio
Request for Comments: 5040                                    B. Metzler
Category: Standards Track                                IBM Corporation
                                                               P. Culley
                                                              J. Hilland
                                                 Hewlett-Packard Company
                                                               D. Garcia
                                                            October 2007


          A Remote Direct Memory Access Protocol Specification

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   This document defines a Remote Direct Memory Access Protocol (RDMAP)
   that operates over the Direct Data Placement Protocol (DDP protocol).
   RDMAP provides read and write services directly to applications and
   enables data to be transferred directly into Upper Layer Protocol
   (ULP) Buffers without intermediate data copies.  It also enables a
   kernel bypass implementation.

Table of Contents

   1. Introduction ....................................................4
      1.1. Architectural Goals ........................................4
      1.2. Protocol Overview ..........................................5
      1.3. RDMAP Layering .............................................7
   2. Glossary ........................................................8
      2.1. General ....................................................8
      2.2. LLP .......................................................10
      2.3. Direct Data Placement (DDP) ...............................11
      2.4. Remote Direct Memory Access (RDMA) ........................13
   3. ULP and Transport Attributes ...................................15
      3.1. Transport Requirements and Assumptions ....................15
      3.2. RDMAP Interactions with the ULP ...........................16
   4. Header Format ..................................................19
      4.1. RDMAP Control and Invalidate STag Field ...................20
      4.2. RDMA Message Definitions ..................................23
      4.3. RDMA Write Header .........................................24
      4.4. RDMA Read Request Header ..................................24
      4.5. RDMA Read Response Header .................................26
      4.6. Send Header and Send with Solicited Event Header ..........26
      4.7. Send with Invalidate Header and Send with SE and
           Invalidate Header .........................................26
      4.8. Terminate Header ..........................................26
   5. Data Transfer ..................................................32
      5.1. RDMA Write Message ........................................32
      5.2. RDMA Read Operation .......................................33
           5.2.1. RDMA Read Request Message ..........................33
           5.2.2. RDMA Read Response Message .........................35
      5.3. Send Message Type .........................................36
      5.4. Terminate Message .........................................37
      5.5. Ordering and Completions ..................................38
   6. RDMAP Stream Management ........................................41
      6.1. Stream Initialization .....................................41
      6.2. Stream Teardown ...........................................42
           6.2.1. RDMAP Abortive Termination .........................43
   7. RDMAP Error Management .........................................43
      7.1. RDMAP Error Surfacing .....................................44
      7.2. Errors Detected at the Remote Peer on Incoming
           RDMA Messages .............................................45
   8. Security Considerations ........................................46
      8.1. Summary of RDMAP-Specific Security Requirements ...........46
           8.1.1. RDMAP (RNIC) Requirements ..........................47
           8.1.2. Privileged Resource Manager Requirements ...........48
      8.2. Security Services for RDMAP ...............................49
           8.2.1. Available Security Services ........................49
           8.2.2. Requirements for IPsec Services for RDMAP ..........50
   9. IANA Considerations ............................................51
   10. References ....................................................52
      10.1. Normative References .....................................52
      10.2. Informative References ...................................53
   Appendix A. DDP Segment Formats for RDMA Messages .................54
      A.1. DDP Segment for RDMA Write ................................54
      A.2. DDP Segment for RDMA Read Request .........................55
      A.3. DDP Segment for RDMA Read Response ........................56
      A.4. DDP Segment for Send and Send with Solicited Event ........56
      A.5. DDP Segment for Send with Invalidate and Send with SE and
           Invalidate ................................................57
      A.6. DDP Segment for Terminate .................................58
   Appendix B. Ordering and Completion Table .........................59
   Appendix C. Contributors ..........................................61

Table of Figures

   Figure 1: RDMAP Layering ...........................................7
   Figure 2: Example of MPA, DDP, and RDMAP Header Alignment over TCP .8
   Figure 3: DDP Control, RDMAP Control, and Invalidate STag Fields ..20
   Figure 4: RDMA Usage of DDP Fields ................................22
   Figure 5: RDMA Message Definitions ................................23
   Figure 6: RDMA Read Request Header Format .........................24
   Figure 7: Terminate Header Format .................................27
   Figure 8: Terminate Control Field .................................27
   Figure 9: Terminate Control Field Values ..........................29
   Figure 10: Error Type to RDMA Message Mapping .....................32
   Figure 11: RDMA Write, DDP Segment Format .........................54
   Figure 12: RDMA Read Request, DDP Segment Format ..................55
   Figure 13: RDMA Read Response, DDP Segment Format .................56
   Figure 14: Send and Send with Solicited Event, DDP Segment Format .56
   Figure 15: Send with Invalidate and Send with SE and Invalidate,
              DDP Segment Format .....................................57
   Figure 16: Terminate, DDP Segment Format ..........................58
   Figure 17: Operation Ordering .....................................59

1.  Introduction

   Today, communications over TCP/IP typically require copy operations,
   which add latency and consume significant CPU and memory resources.
   The Remote Direct Memory Access Protocol (RDMAP) enables removal of
   data copy operations and enables reduction in latencies by allowing a
   local application to read or write data on a remote computer's memory
   with minimal demands on memory bus bandwidth and CPU processing
   overhead, while preserving memory protection semantics.

   RDMAP is layered on top of Direct Data Placement (DDP) and uses the
   two buffer models available from DDP.  DDP-related terminology is
   discussed in Section 2.3.  As RDMAP builds on DDP, the reader is
   advised to become familiar with [DDP].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.1.  Architectural Goals

   RDMAP has been designed with the following high-level architectural
   goals:

   *  Provide a data transfer operation that allows a Local Peer to
      transfer up to 2^32 - 1 octets directly into a previously
      Advertised Buffer (i.e., Tagged Buffer) located at a Remote Peer
      without requiring a copy operation.  This is referred to as the
      RDMA Write data transfer operation.

   *  Provide a data transfer operation that allows a Local Peer to
      retrieve up to 2^32 - 1 octets directly from a previously
      Advertised Buffer (i.e., Tagged Buffer) located at a Remote Peer
      without requiring a copy operation.  This is referred to as the
      RDMA Read data transfer operation.

   *  Provide a data transfer operation that allows a Local Peer to send
      up to 2^32 - 1 octets directly into a buffer located at a Remote
      Peer that has not been explicitly Advertised.  This is referred to
      as the Send (Send with Invalidate, Send with Solicited Event, and
      Send with Solicited Event and Invalidate) data transfer operation.

   *  Enable the local ULP to use the Send Operation Type (includes
      Send, Send with Invalidate, Send with Solicited Event, and Send
      with Solicited Event and Invalidate) to signal to the remote ULP
      the Completion of all previous Messages initiated by the local
      ULP.

   *  Provide for all operations on a single RDMAP Stream to be reliably
      transmitted in the order that they were submitted.

   *  Provide RDMAP capabilities independently for each Stream when the
      LLP supports multiple data Streams within an LLP connection.

1.2.  Protocol Overview

   RDMAP provides seven data transfer operations.  Except for the RDMA
   Read operation, each operation generates exactly one RDMA Message.
   Following is a brief overview of the RDMA Operations and RDMA
   Messages:

   1.  Send - A Send operation uses a Send Message to transfer data from
       the Data Source into a buffer that has not been explicitly
       Advertised by the Data Sink.  The Send Message uses the DDP
       Untagged Buffer Model to transfer the ULP Message into the Data
       Sink's Untagged Buffer.

   2.  Send with Invalidate - A Send with Invalidate operation uses a
       Send with Invalidate Message to transfer data from the Data
       Source into a buffer that has not been explicitly Advertised by
       the Data Sink.  The Send with Invalidate Message includes all
       functionality of the Send Message, with one addition: an STag
       field is included in the Send with Invalidate Message.  After the
       message has been Placed and Delivered at the Data Sink, the
       Remote Peer's buffer identified by the STag can no longer be
       accessed remotely until the Remote Peer's ULP re-enables access
       and Advertises the buffer.

   3.  Send with Solicited Event (Send with SE) - A Send with Solicited
       Event operation uses a Send with Solicited Event Message to
       transfer data from the Data Source into an Untagged Buffer at the
       Data Sink.  The Send with Solicited Event Message is similar to
       the Send Message, with one addition: when the Send with Solicited
       Event Message has been Placed and Delivered, an Event may be
       generated at the recipient, if the recipient is configured to
       generate such an Event.

   4.  Send with Solicited Event and Invalidate (Send with SE and
       Invalidate) - A Send with Solicited Event and Invalidate
       operation uses a Send with Solicited Event and Invalidate Message
       to transfer data from the Data Source into a buffer that has not
       been explicitly Advertised by the Data Sink.  The Send with
       Solicited Event and Invalidate Message is similar to the Send
       with Invalidate Message, with one addition: when the Send with
       Solicited Event and Invalidate Message has been Placed and
       Delivered, an Event may be generated at the recipient, if the
       recipient is configured to generate such an Event.

   5.  Remote Direct Memory Access Write - An RDMA Write operation uses
       an RDMA Write Message to transfer data from the Data Source to a
       previously Advertised Buffer at the Data Sink.

       The ULP at the Remote Peer, which in this case is the Data Sink,
       enables the Data Sink Tagged Buffer for access and Advertises the
       buffer's size (length), location (Tagged Offset), and Steering
       Tag (STag) to the Data Source through a ULP-specific mechanism.
       The ULP at the Local Peer, which in this case is the Data Source,
       initiates the RDMA Write operation.  The RDMA Write Message uses
       the DDP Tagged Buffer Model to transfer the ULP Message into the
       Data Sink's Tagged Buffer.  Note: the STag associated with the
       Tagged Buffer remains valid until the ULP at the Remote Peer
       invalidates it or the ULP at the Local Peer invalidates it
       through a Send with Invalidate or Send with Solicited Event and
       Invalidate.

   6.  Remote Direct Memory Access Read - The RDMA Read operation
       transfers data to a Tagged Buffer at the Local Peer, which in
       this case is the Data Sink, from a Tagged Buffer at the Remote
       Peer, which in this case is the Data Source.  The ULP at the Data
       Source enables the Data Source Tagged Buffer for access and
       Advertises the buffer's size (length), location (Tagged Offset),
       and Steering Tag (STag) to the Data Sink through a ULP-specific
       mechanism.  The ULP at the Data Sink enables the Data Sink Tagged
       Buffer for access and initiates the RDMA Read operation.  The
       RDMA Read operation consists of a single RDMA Read Request
       Message and a single RDMA Read Response Message, and the latter
       may be segmented into multiple DDP Segments.

       The RDMA Read Request Message uses the DDP Untagged Buffer Model
       to Deliver the STag, starting Tagged Offset, and length for both
       the Data Source and Data Sink Tagged Buffers to the Remote Peer's
       RDMA Read Request Queue.

       The RDMA Read Response Message uses the DDP Tagged Buffer Model
       to Deliver the Data Source's Tagged Buffer to the Data Sink,
       without any involvement from the ULP at the Data Source.

       Note: the Data Source STag associated with the Tagged Buffer
       remains valid until the ULP at the Data Source invalidates it or
       the ULP at the Data Sink invalidates it through a Send with
       Invalidate or Send with Solicited Event and Invalidate.  The Data
       Sink STag associated with the Tagged Buffer remains valid until
       the ULP at the Data Sink invalidates it.

   7.  Terminate - A Terminate operation uses a Terminate Message to
       transfer to the Remote Peer information associated with an error
       that occurred at the Local Peer.  The Terminate Message uses the
       DDP Untagged Buffer Model to transfer the Message into the Data
       Sink's Untagged Buffer.
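
   The relationship between the seven operations, the RDMA Messages
   they generate, and the DDP buffer model each Message uses can be
   summarized in a small lookup table.  The following is an
   illustrative sketch only; the names BufferModel, OPERATIONS, and
   messages_for are hypothetical and are not defined by this
   specification.

```python
from enum import Enum


class BufferModel(Enum):
    TAGGED = "tagged"
    UNTAGGED = "untagged"


# RDMA Operation -> ordered list of (RDMA Message, DDP buffer model),
# following the overview in Section 1.2.  Names are illustrative only.
OPERATIONS = {
    "Send": [("Send", BufferModel.UNTAGGED)],
    "Send with Invalidate": [("Send with Invalidate", BufferModel.UNTAGGED)],
    "Send with SE": [("Send with SE", BufferModel.UNTAGGED)],
    "Send with SE and Invalidate": [
        ("Send with SE and Invalidate", BufferModel.UNTAGGED)
    ],
    "RDMA Write": [("RDMA Write", BufferModel.TAGGED)],
    # The only operation that generates more than one RDMA Message.
    "RDMA Read": [
        ("RDMA Read Request", BufferModel.UNTAGGED),
        ("RDMA Read Response", BufferModel.TAGGED),
    ],
    "Terminate": [("Terminate", BufferModel.UNTAGGED)],
}


def messages_for(operation: str) -> list:
    """Return the names of the RDMA Messages generated by an operation."""
    return [name for name, _model in OPERATIONS[operation]]
```

   For example, messages_for("RDMA Read") yields the Request followed
   by the Response, while every other key maps to a single Message.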

1.3.  RDMAP Layering

   RDMAP is dependent on DDP, subject to the requirements defined in
   Section 3.1, "Transport Requirements and Assumptions".  Figure 1,
   "RDMAP Layering", depicts the relationship between Upper Layer
   Protocols (ULPs), RDMAP, DDP protocol, the framing layer, and the
   transport.  For the protocol definition of each LLP, see [MPA],
   [TCP], and [SCTP].

                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                 |                                     |
                 |     Upper Layer Protocol (ULP)      |
                 |                                     |
                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                 |                                     |
                 |              RDMAP                  |
                 |                                     |
                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                 |                                     |
                 |           DDP protocol              |
                 |                                     |
                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                 |                 |                   |
                 |       MPA       |                   |
                 |                 |                   |
                 +-+-+-+-+-+-+-+-+-+       SCTP        |
                 |                 |                   |
                 |       TCP       |                   |
                 |                 |                   |
                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 1: RDMAP Layering

   If RDMAP is layered over DDP/MPA/TCP, then the respective headers and
   ULP Payload are arranged as follows (Note: For clarity, MPA header
   and CRC fields are included but MPA markers are not shown):

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    //                           TCP Header                        //
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         MPA Header            |                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
    |                                                               |
    //                        DDP Header                           //
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    //                        RDMA Header                          //
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    //                        ULP Payload                          //
    //                 (shown with no pad bytes)                   //
    //                                                             //
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           MPA CRC                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2: Example of MPA, DDP, and RDMAP Header Alignment over TCP
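
   The nesting shown in Figure 2 can be sketched as simple byte
   concatenation.  This is a rough illustration under stated
   assumptions, not an MPA implementation: the DDP and RDMA headers
   are treated as opaque byte strings, the 16-bit MPA header is
   assumed to carry the length of what follows it, zlib.crc32 is used
   only as a stand-in for the 32-bit MPA CRC (MPA itself specifies a
   different CRC polynomial), and markers and pad bytes are omitted,
   as in the figure.  The function name build_fpdu is hypothetical.

```python
import struct
import zlib


def build_fpdu(ddp_header: bytes, rdma_header: bytes, ulp_payload: bytes) -> bytes:
    """Concatenate the layers in the order shown in Figure 2.

    Assumptions (not taken from this document): the MPA header is a
    16-bit length of the bytes between it and the CRC, and zlib.crc32
    stands in for the real MPA CRC.  No markers, no padding.
    """
    ulpdu = ddp_header + rdma_header + ulp_payload
    mpa_header = struct.pack("!H", len(ulpdu))    # 16 bits, network byte order
    body = mpa_header + ulpdu
    mpa_crc = struct.pack("!I", zlib.crc32(body))  # 32-bit placeholder checksum
    return body + mpa_crc
```

   The TCP header is not built here; the returned bytes correspond to
   everything below the TCP header in the figure.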

2.  Glossary

2.1.  General

   Advertisement (Advertised, Advertise, Advertisements, Advertises) -
       the act of informing a Remote Peer that a local RDMA Buffer is
       available to it.  A Node makes available an RDMA Buffer for
       incoming RDMA Read or RDMA Write access by informing its RDMA/DDP
       peer of the Tagged Buffer identifiers (STag, base address, and
       buffer length).  This Advertisement of Tagged Buffer information
       is not defined by RDMA/DDP and is left to the ULP.  A typical
       method would be for the Local Peer to embed the Tagged Buffer's
       Steering Tag, base address, and length in a Send Message destined
       for the Remote Peer.
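
   As an illustration of the "typical method" above, a ULP could
   serialize the Tagged Buffer identifiers into the payload of a Send
   Message.  The wire layout below is purely hypothetical, since
   Advertisement is left to the ULP; the field widths (32-bit STag,
   64-bit base address, 32-bit length) and the names Advertisement,
   encode_advertisement, and decode_advertisement are assumptions made
   for this sketch.

```python
import struct
from typing import NamedTuple

# Hypothetical ULP-defined layout, network byte order:
# STag (32 bits), base address (64 bits), buffer length (32 bits).
ADVERT_FORMAT = "!IQI"


class Advertisement(NamedTuple):
    stag: int
    base_address: int
    length: int


def encode_advertisement(ad: Advertisement) -> bytes:
    """Build a Send Message payload that Advertises one Tagged Buffer."""
    return struct.pack(ADVERT_FORMAT, ad.stag, ad.base_address, ad.length)


def decode_advertisement(payload: bytes) -> Advertisement:
    """Recover the Tagged Buffer identifiers at the Remote Peer."""
    return Advertisement(*struct.unpack(ADVERT_FORMAT, payload))
```

   The Remote Peer would decode the payload and then use the STag,
   address, and length as the target of a later RDMA Write or RDMA
   Read.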

   Completion - Refer to "RDMA Completion" in Section 2.4.

   Completed - See "RDMA Completion" in Section 2.4.

   Complete - See "RDMA Completion" in Section 2.4.

   Completes - See "RDMA Completion" in Section 2.4.

   Data Sink - The peer receiving a data payload.  Note that the Data
       Sink can be required to both send and receive RDMA/DDP Messages
       to transfer a data payload.

   Data Source - The peer sending a data payload.  Note that the Data
       Source can be required to both send and receive RDMA/DDP Messages
       to transfer a data payload.

   Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined
       as the process of informing the ULP or consumer that a particular
       Message is available for use.  This is specifically different
       from "Placement", which may generally occur in any order, while
       the order of "Delivery" is strictly defined.  See "Data
       Placement" in Section 2.3.
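
   The distinction between Placement and Delivery can be sketched
   with a toy receiver for Untagged DDP Messages: segments are written
   into their buffer as soon as they arrive, in any order, but a
   Message is only reported to the ULP once it is fully Placed and all
   earlier Messages have been Delivered.  This is a simplified model
   under stated assumptions: the class UntaggedReceiver is
   hypothetical, Messages are numbered from 1, each Message has its
   own fixed-size buffer, and the LLP never duplicates a segment.

```python
class UntaggedReceiver:
    """Places segments on arrival; Delivers Messages in sequence order."""

    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.buffers = {}    # msn -> bytearray being filled
        self.placed = {}     # msn -> octets Placed so far
        self.total = {}      # msn -> Message length, known once the last segment arrives
        self.next_msn = 1    # assumption: sequence numbers start at 1
        self.delivered = []  # (msn, data) in the order reported to the ULP

    def on_segment(self, msn: int, offset: int, payload: bytes, last: bool) -> None:
        end = offset + len(payload)
        if end > self.buffer_size:
            raise ValueError("segment exceeds the Untagged Buffer")
        buf = self.buffers.setdefault(msn, bytearray(self.buffer_size))
        buf[offset:end] = payload                      # Placement: any order
        self.placed[msn] = self.placed.get(msn, 0) + len(payload)
        if last:
            self.total[msn] = end
        self._deliver_ready()

    def _deliver_ready(self) -> None:
        # Delivery: strictly in sequence order, only for fully Placed Messages.
        while (self.next_msn in self.total
               and self.placed[self.next_msn] == self.total[self.next_msn]):
            msn = self.next_msn
            data = bytes(self.buffers.pop(msn)[: self.total[msn]])
            self.delivered.append((msn, data))
            self.next_msn += 1
```

   If all of Message 2 arrives before any part of Message 1, Message 2
   is Placed immediately but is not Delivered until Message 1 has been
   Delivered.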

   Delivery - See Data Delivery in Section 2.1.

   Delivered - See Data Delivery in Section 2.1.

   Delivers - See Data Delivery in Section 2.1.

   Fabric - The collection of links, switches, and routers that connect
       a set of Nodes with RDMA/DDP protocol implementations.

   Fence (Fenced, Fences) - To block the current RDMA Operation from
       executing until prior RDMA Operations have Completed.
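
   A Fence can be pictured as a gate on a submission queue.  The
   following sketch is illustrative only; the class SendQueue and its
   methods are hypothetical, and it assumes operations are started in
   the order they were posted.

```python
from collections import deque


class SendQueue:
    """Toy submission queue in which a Fenced operation waits for prior ones."""

    def __init__(self):
        self.pending = deque()  # (name, fenced) not yet started
        self.in_flight = set()  # started but not yet Completed
        self.started = []       # start order, for inspection

    def post(self, name: str, fenced: bool = False) -> None:
        self.pending.append((name, fenced))
        self._start_ready()

    def complete(self, name: str) -> None:
        self.in_flight.remove(name)
        self._start_ready()

    def _start_ready(self) -> None:
        while self.pending:
            name, fenced = self.pending[0]
            if fenced and self.in_flight:
                break  # Fence: hold until all prior operations have Completed
            self.pending.popleft()
            self.in_flight.add(name)
            self.started.append(name)
```

   Posting an RDMA Read, then a Fenced Send, then an RDMA Write starts
   only the Read; the Send (and the Write queued behind it) start once
   the Read Completes.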

   iWARP - A suite of wire protocols comprising RDMAP, DDP, and MPA.
       The iWARP protocol suite may be layered above TCP, SCTP, or other
       transport protocols.

   Local Peer - The RDMA/DDP protocol implementation on the local end of
       the connection.  Used to refer to the local entity when
       describing a protocol exchange or other interaction between two
       Nodes.

   Node - A computing device attached to one or more links of a Fabric
       (network).  A Node in this context does not refer to a specific
       application or protocol instantiation running on the computer.  A
       Node may consist of one or more RNICs installed in a host
       computer.

   Placement - See "Data Placement" in Section 2.3.

   Placed - See "Data Placement" in Section 2.3.

   Places - See "Data Placement" in Section 2.3.

   Remote Peer - The RDMA/DDP protocol implementation on the opposite
       end of the connection.  Used to refer to the remote entity when
       describing protocol exchanges or other interactions between two
       Nodes.

   RNIC - RDMA Network Interface Controller.  In this context, this
       would be a network I/O adapter or embedded controller with iWARP
       and Verbs functionality.

   RNIC Interface (RI) - The presentation of the RNIC to the Verbs
       Consumer as implemented through the combination of the RNIC and
       the RNIC driver.

   Termination - See "RDMAP Abortive Termination" in Section 2.4.

   Terminated - See "RDMAP Abortive Termination" in Section 2.4.

   Terminate - See "RDMAP Abortive Termination" in Section 2.4.

   Terminates - See "RDMAP Abortive Termination" in Section 2.4.

   ULP - Upper Layer Protocol.  The protocol layer above the one
       currently being referenced.  The ULP for RDMA/DDP is expected to
       be an OS, Application, adaptation layer, or proprietary device.
       The RDMA/DDP documents do not specify a ULP -- they provide a set
       of semantics that allow a ULP to be designed to utilize RDMA/DDP.

   ULP Payload - The ULP data that is contained within a single protocol
       segment or packet (e.g., a DDP Segment).

   Verbs - An abstract description of the functionality of an RNIC
       Interface.  The OS may expose some or all of this functionality
       via one or more APIs to applications.  The OS will also use some
       of the functionality to manage the RNIC Interface.

2.2.  LLP

   LLP - Lower Layer Protocol.  The protocol layer beneath the protocol
       layer currently being referenced.  For example, for DDP, the LLP
       is SCTP, MPA, or other transport protocols.  For RDMA, the LLP is
       DDP.

   LLP Connection - Corresponds to an LLP transport-level connection
       between the peer LLP layers on two Nodes.

   LLP Stream - Corresponds to a single LLP transport-level Stream
       between the peer LLP layers on two Nodes.  One or more LLP
       Streams may map to a single transport-level LLP connection.  For
       transport protocols that support multiple Streams per connection
       (e.g., SCTP), an LLP Stream corresponds to one transport-level
       Stream.

   MULPDU - Maximum ULPDU.  The current maximum size of the record that
       is acceptable for DDP to pass to the LLP for transmission.

   ULPDU - Upper Layer Protocol Data Unit.  The data record defined by
       the layer above MPA.

2.3.  Direct Data Placement (DDP)

   Data Placement (Placement, Placed, Places) - For DDP, this term is
       specifically used to indicate the process of writing to a data
       buffer by a DDP implementation.  DDP Segments carry Placement
       information, which may be used by the receiving DDP
       implementation to perform Data Placement of the DDP Segment ULP
       Payload.  See "Data Delivery".

   DDP Abortive Teardown - The act of closing a DDP Stream without
       attempting to Complete in-progress and pending DDP Messages.

   DDP Graceful Teardown - The act of closing a DDP Stream such that all
       in-progress and pending DDP Messages are allowed to Complete
       successfully.

   DDP Control Field - A fixed 16-bit field in the DDP Header.  The DDP
       Control Field contains an 8-bit field whose contents are reserved
       for use by the ULP.

   DDP Header - The header present in all DDP segments.  The DDP Header
       contains control and Placement fields that are used to define the
       final Placement location for the ULP Payload carried in a DDP
       Segment.

   DDP Message - A ULP-defined unit of data interchange, which is
       subdivided into one or more DDP segments.  This segmentation may
       occur for a variety of reasons, including segmentation to respect
       the maximum segment size of the underlying transport protocol.

   DDP Segment - The smallest unit of data transfer for the DDP
       protocol.  It includes a DDP Header and ULP Payload (if present).
       A DDP Segment should be sized to fit within the underlying
       transport protocol MULPDU.
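
   Segmentation of a DDP Message so that each DDP Segment fits within
   the MULPDU can be sketched as follows.  The function
   segment_message and the header_len parameter are hypothetical;
   the actual DDP Header sizes are defined in [DDP], and the boolean
   in each tuple simply marks the final segment of the Message.

```python
def segment_message(payload: bytes, mulpdu: int, header_len: int) -> list:
    """Split one DDP Message into (offset, is_last, chunk) tuples.

    header_len is an assumed per-segment header size; each chunk is
    sized so that header plus chunk does not exceed mulpdu.
    """
    room = mulpdu - header_len
    if room <= 0:
        raise ValueError("MULPDU leaves no room for ULP Payload")
    if not payload:
        # A zero-length Message still needs one (header-only) segment.
        return [(0, True, b"")]
    segments = []
    for offset in range(0, len(payload), room):
        chunk = payload[offset:offset + room]
        is_last = offset + len(chunk) == len(payload)
        segments.append((offset, is_last, chunk))
    return segments
```

   With a 10-octet payload, a MULPDU of 8, and an assumed 4-octet
   header, this produces three segments carrying 4, 4, and 2 octets.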

   DDP Stream - A sequence of DDP Messages whose ordering is defined by
       the LLP.  For SCTP, a DDP Stream maps directly to an SCTP Stream.
       For MPA, a DDP Stream maps directly to a TCP connection, and a
       single DDP Stream is supported.  Note that DDP has no ordering
       guarantees between DDP Streams.

   Direct Data Placement - A mechanism whereby ULP data contained within
       DDP Segments may be Placed directly into its final destination in
       memory without processing of the ULP.  This may occur even when
       the DDP Segments arrive out of order.  Out-of-order Placement
       support may require the Data Sink to implement the LLP and DDP as
       one functional block.

   Direct Data Placement Protocol (DDP) - A wire protocol that supports
       Direct Data Placement by associating explicit memory buffer
       placement information with the LLP payload units.

   Message Offset (MO) - For the DDP Untagged Buffer Model, specifies
       the offset, in bytes, from the start of a DDP Message.

   Message Sequence Number (MSN) - For the DDP Untagged Buffer Model,
       specifies a sequence number that is increasing with each DDP
       Message.

   Queue Number (QN) - For the DDP Untagged Buffer Model, identifies a
       destination Data Sink queue for a DDP Segment.

   Steering Tag - An identifier of a Tagged Buffer on a Node, valid as
       defined within a protocol specification.

   STag - Steering Tag

   Tagged Buffer - A buffer that is explicitly Advertised to the Remote
       Peer through exchange of an STag, Tagged Offset, and length.

   Tagged Buffer Model - A DDP data transfer model used to transfer
       Tagged Buffers from the Local Peer to the Remote Peer.

   Tagged DDP Message - A DDP Message that targets a Tagged Buffer.

   Tagged Offset (TO) - The offset within a Tagged Buffer on a Node.

   Untagged Buffer - A buffer that is not explicitly Advertised to the
       Remote Peer.  Untagged Buffers support one of the two available
       data transfer mechanisms called the Untagged Buffer Model.  An
       Untagged Buffer is used to send asynchronous control messages to
       the Remote Peer for RDMA Read, Send, and Terminate requests.
       Untagged Buffers handle Untagged DDP Messages.

   Untagged Buffer Model - A DDP data transfer model used to transfer
       Untagged Buffers from the Local Peer to the Remote Peer.

   Untagged DDP Message - A DDP Message that targets an Untagged Buffer.

2.4.  Remote Direct Memory Access (RDMA)

   Completion Queues (CQs) - Logical components of the RNIC Interface
       that conceptually represent how an RNIC notifies the ULP about
       the completion of the transmission of data, or the completion of
       the reception of data; see [RDMASEC].

   Event - An indication provided by the RDMAP layer to the ULP to
       indicate a Completion or other condition requiring immediate
       attention.

   Invalidate STag - A mechanism used to prevent the Remote Peer from
       reusing a previously Advertised STag until the Local
       Peer makes it available through a subsequent explicit
       Advertisement.  The STag cannot be accessed remotely until it is
       explicitly Advertised again.
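
   The lifecycle described above can be modeled as a per-STag access
   flag.  This is a minimal sketch; the class STagTable and its method
   names are hypothetical, and real implementations track considerably
   more state per STag.

```python
class STagTable:
    """Tracks whether each STag is currently enabled for remote access."""

    def __init__(self):
        self._remote_access = {}  # stag -> bool

    def advertise(self, stag: int) -> None:
        # The local ULP enables remote access and Advertises the buffer.
        self._remote_access[stag] = True

    def invalidate(self, stag: int) -> None:
        # For example, on receipt of a Send with Invalidate carrying this STag.
        if stag not in self._remote_access:
            raise KeyError(f"unknown STag {stag:#x}")
        self._remote_access[stag] = False

    def remote_access_allowed(self, stag: int) -> bool:
        return self._remote_access.get(stag, False)
```

   After invalidate(), remote access checks fail until the ULP calls
   advertise() for that STag again.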

   RDMA Completion (Completion, Completed, Complete, Completes) - For
       RDMA, Completion is defined as the process of informing the ULP
       that a particular RDMA Operation has performed all functions
       specified for the RDMA Operations, including Placement and
       Delivery.  The Completion semantic of each RDMA Operation is
       distinctly defined.

   RDMA Message - A data transfer mechanism used to fulfill an RDMA
       Operation.

   RDMA Operation - A sequence of RDMA Messages, including control
       Messages, to transfer data from a Data Source to a Data Sink.
       The following RDMA Operations are defined: RDMA Write, RDMA
       Read, Send, Send with Invalidate, Send with Solicited Event, Send
       with Solicited Event and Invalidate, and Terminate.

   RDMA Protocol (RDMAP) - A wire protocol that supports RDMA Operations
       to transfer ULP data between a Local Peer and the Remote Peer.

   RDMAP Abortive Termination (Termination, Terminated, Terminate,
       Terminates) - The act of closing an RDMAP Stream without
       attempting to Complete in-progress and pending RDMA Operations.

   RDMAP Graceful Termination - The act of closing an RDMAP Stream such
       that all in-progress and pending RDMA Operations are allowed to
       Complete successfully.

   RDMA Read - An RDMA Operation used by the Data Sink to transfer the
       contents of a source RDMA buffer from the Remote Peer to the
       Local Peer.  An RDMA Read operation consists of a single RDMA
       Read Request Message and a single RDMA Read Response Message.

   RDMA Read Request - An RDMA Message used by the Data Sink to request
       the Data Source to transfer the contents of an RDMA buffer.  The
       RDMA Read Request Message describes both the Data Source and Data
       Sink RDMA buffers.

   RDMA Read Request Queue - The queue used for processing RDMA Read
       Requests.  The RDMA Read Request Queue has a DDP Queue Number of
       1.

   RDMA Read Response - An RDMA Message used by the Data Source to
       transfer the contents of an RDMA buffer to the Data Sink, in
       response to an RDMA Read Request.  The RDMA Read Response Message
       only describes the Data Sink RDMA buffer.

   RDMAP Stream - An association between a pair of RDMAP
       implementations, possibly on different Nodes, which transfer ULP
       data using RDMA Operations.  There may be multiple RDMAP Streams
       on a single Node.  An RDMAP Stream maps directly to a single DDP
       Stream.

   RDMA Write - An RDMA Operation that transfers the contents of a
       source RDMA Buffer from the Local Peer to a destination RDMA
       Buffer at the Remote Peer using RDMA.  The RDMA Write Message
       only describes the Data Sink RDMA buffer.

   Remote Direct Memory Access (RDMA) - A method of accessing memory on
       a remote system in which the local system specifies the remote
       location of the data to be transferred.  Employing an RNIC in the
       remote system allows the access to take place without
       interrupting the processing of the CPU(s) on the system.

   Send - An RDMA Operation that transfers the contents of a ULP Buffer
       from the Local Peer to an Untagged Buffer at the Remote Peer.

   Send Message Type - A Send Message, Send with Invalidate Message,
       Send with Solicited Event Message, or Send with Solicited Event
       and Invalidate Message.

   Send Operation Type - A Send Operation, Send with Invalidate
       Operation, Send with Solicited Event Operation, or Send with
       Solicited Event and Invalidate Operation.

   Solicited Event (SE) - A facility by which an RDMA Operation sender
       may cause an Event to be generated at the recipient, if the
       recipient is configured to generate such an Event, when a Send
       with Solicited Event Message or Send with Solicited Event and
       Invalidate Message is received.  Note: The Local Peer's ULP can
       use the Solicited Event mechanism to ensure that Messages
       designated as important to the ULP are handled in an expeditious
       manner by the Remote Peer's ULP.  The ULP at the Local Peer can
       indicate a given Send Message Type is important by using the Send
       with Solicited Event Message or Send with Solicited Event and
       Invalidate Message.  The ULP at the Remote Peer can choose to
       only be notified when valid Send with Solicited Event Messages
       and/or Send with Solicited Event and Invalidate Messages arrive
       and handle other valid incoming Send Messages or Send with
       Invalidate Messages at its leisure.
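
   The Solicited Event behaviour described above can be sketched as
   follows.  This is an illustration only, not part of the
   specification: the names SendType, NotifyMode, and generates_event
   are hypothetical, and the two notification modes are an assumed way
   of modelling a recipient that is "configured to generate such an
   Event".

```python
from enum import Enum, auto


class SendType(Enum):
    """The four Send Message Types named in the glossary."""
    SEND = auto()
    SEND_INVALIDATE = auto()
    SEND_SE = auto()
    SEND_SE_INVALIDATE = auto()


class NotifyMode(Enum):
    """Hypothetical recipient configuration (an assumption of this sketch)."""
    ALL_SENDS = auto()       # raise an Event for every valid Send Message Type
    SOLICITED_ONLY = auto()  # raise an Event only for Solicited Event variants


# The two Message Types that carry the Solicited Event indication.
SOLICITED = {SendType.SEND_SE, SendType.SEND_SE_INVALIDATE}


def generates_event(msg: SendType, mode: NotifyMode) -> bool:
    """Return True if a valid incoming message should raise an Event."""
    if mode is NotifyMode.ALL_SENDS:
        return True
    return msg in SOLICITED
```

   With SOLICITED_ONLY, a plain Send raises no Event and can be handled
   by the Remote Peer's ULP later, while a Send with Solicited Event
   is reported at once.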

   Terminate - An RDMA Message used by a Node to pass an error
       indication to the peer Node on an RDMAP Stream.  This operation
       is for RDMAP use only.

   ULP Buffer - A buffer owned above the RDMAP layer and Advertised to
       the RDMAP layer either as a Tagged Buffer or an Untagged ULP
       Buffer.

   ULP Message - The ULP data that is handed to a specific protocol
       layer for transmission.  Data boundaries are preserved as they
       are transmitted through iWARP.

3.  ULP and Transport Attributes

3.1.  Transport Requirements and Assumptions

   RDMAP MUST be layered on top of the Direct Data Placement Protocol
   [DDP].

   RDMAP requires the following DDP support:

   *  RDMAP uses three queues for Untagged Buffers:

      *  Queue Number 0 (used by RDMAP for Send, Send with Invalidate,
         Send with Solicited Event, and Send with Solicited Event and
         Invalidate operations).

      *  Queue Number 1 (used by RDMAP for RDMA Read operations).

      *  Queue Number 2 (used by RDMAP for Terminate operations).

   *  DDP maps a single RDMA Message to a single DDP Message.

   *  DDP uses the STag and Tagged Offset provided by the RDMAP for
      Tagged Buffer Messages (i.e., RDMA Write and RDMA Read Response).

   *  When the DDP layer Delivers an Untagged DDP Message to the RDMAP
      layer, DDP provides the length of the DDP Message.  This ensures
      that RDMAP does not have to carry a length field in its header.

   *  When the RDMAP layer provides an RDMA Message to the DDP layer,
      DDP must insert the RsvdULP field value provided by the RDMAP
      layer into the associated DDP Message.

   *  When the DDP layer Delivers a DDP Message to the RDMAP layer, DDP
      provides the RsvdULP field.

   *  The RsvdULP field must be 1 octet for DDP Tagged Messages and 5
      octets for DDP Untagged Messages.

   *  DDP propagates to RDMAP all operation or protection errors (used
      by RDMAP Terminate) and, when appropriate, the DDP Header fields
      of the DDP Segment that encountered the error.

   *  If an RDMA Operation is aborted by DDP or a lower layer, the
      contents of the Data Sink buffers associated with the operation
      are considered indeterminate.

   *  DDP, in conjunction with the lower layers, provides reliable, in-
      order Delivery.
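
   Two of the requirements above are simple constants and can be
   written down directly.  The following is a minimal sketch; the names
   UntaggedQueue and check_rsvdulp are hypothetical and do not come
   from this specification or from any real RDMA library.

```python
from enum import IntEnum


class UntaggedQueue(IntEnum):
    """DDP Queue Numbers that RDMAP uses for Untagged Buffers."""
    SEND = 0       # Send, Send with Invalidate, and both Solicited Event variants
    RDMA_READ = 1  # RDMA Read operations
    TERMINATE = 2  # Terminate operations


# RsvdULP field sizes in octets, as stated in the list above.
RSVDULP_TAGGED_OCTETS = 1
RSVDULP_UNTAGGED_OCTETS = 5


def check_rsvdulp(rsvd_ulp: bytes, tagged: bool) -> None:
    """Raise ValueError if rsvd_ulp has the wrong length for the message kind."""
    expected = RSVDULP_TAGGED_OCTETS if tagged else RSVDULP_UNTAGGED_OCTETS
    if len(rsvd_ulp) != expected:
        raise ValueError(
            f"RsvdULP must be {expected} octet(s), got {len(rsvd_ulp)}"
        )
```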

3.2.  RDMAP Interactions with the ULP

   RDMAP provides the ULP with access to the following RDMA Operations
   as defined in this specification:

   *  Send

   *  Send with Solicited Event

   *  Send with Invalidate

   *  Send with Solicited Event and Invalidate

   *  RDMA Write

   *  RDMA Read

   For Send Operation Types, the following are the interactions between
   the RDMAP layer and the ULP:

   *  At the Data Source:

      *  The ULP passes to the RDMAP layer the following:

         *  ULP Message Length

         *  ULP Message

         *  An indication of the Send Operation Type, where the valid
            types are: Send, Send with Solicited Event, Send with
            Invalidate, or Send with Solicited Event and Invalidate.

         *  An Invalidate STag, if the Send Operation Type was Send with
            Invalidate or Send with Solicited Event and Invalidate.

      *  When the Send Operation Type Completes, the RDMAP layer passes
         to the ULP an indication of the Completion results.

   *  At the Data Sink:

      *  If the Send Operation Type Completed successfully, the RDMAP
         layer passes the following information to the ULP Layer:

         *  ULP Message Length

         *  ULP Message

         *  An Event, if the Data Sink is configured to generate an
            Event.

         *  An Invalidated STag, if the Send Operation Type was Send
            with Invalidate or Send with Solicited Event and Invalidate.

      *  If the Send Operation Type Completed in error, the Data Sink
         RDMAP layer will pass up the corresponding error information to
         the Data Sink ULP and send a Terminate Message to the Data
         Source RDMAP layer.  The Data Source RDMAP layer will then pass
         up the Terminate Message to the ULP.
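
   The Data Source inputs for a Send Operation Type can be modelled as
   a small record.  This is a sketch under stated assumptions: the
   SendRequest type and its field names are hypothetical, and rejecting
   an Invalidate STag on a non-invalidating Send is a design choice of
   this example rather than a rule stated above.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class SendType(Enum):
    SEND = auto()
    SEND_SE = auto()
    SEND_INVALIDATE = auto()
    SEND_SE_INVALIDATE = auto()


# Send Operation Types that must be accompanied by an Invalidate STag.
INVALIDATING = {SendType.SEND_INVALIDATE, SendType.SEND_SE_INVALIDATE}


@dataclass(frozen=True)
class SendRequest:
    """What the Data Source ULP hands to the RDMAP layer (hypothetical shape)."""
    message: bytes
    send_type: SendType
    invalidate_stag: Optional[int] = None

    def __post_init__(self) -> None:
        needs_stag = self.send_type in INVALIDATING
        if needs_stag and self.invalidate_stag is None:
            raise ValueError("Invalidate STag required for this Send Operation Type")
        # Assumption of this sketch: a stray STag is treated as a caller error.
        if not needs_stag and self.invalidate_stag is not None:
            raise ValueError("Invalidate STag not used by this Send Operation Type")

    @property
    def message_length(self) -> int:
        """ULP Message Length, derived from the ULP Message itself."""
        return len(self.message)
```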

   For RDMA Write operations, the following are the interactions between
   the RDMAP layer and the ULP:

   *  At the Data Source:

      *  The ULP passes to the RDMAP layer the following:

         *  ULP Message Length

         *  ULP Message

         *  Data Sink STag

         *  Data Sink Tagged Offset

      *  When the RDMA Write operation Completes, the RDMAP layer passes
         to the ULP an indication of the Completion results.

   *  At the Data Sink:

      *  If the RDMA Write completed successfully, the RDMAP layer does
         not Deliver the RDMA Write to the ULP.  It does Place the ULP
         Message transferred through the RDMA Write Message into the ULP
         Buffer.

      *  If the RDMA Write completed in error, the Data Sink RDMAP layer
         will pass up the corresponding error information to the Data
         Sink ULP and send a Terminate Message to the Data Source RDMAP
         layer.  The Data Source RDMAP layer will then pass up the
         Terminate Message to the ULP.
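
   The Data Sink side of an RDMA Write, where data is Placed but
   nothing is Delivered to the ULP, can be sketched as follows.  The
   names RdmaWrite and place_rdma_write are hypothetical, the error
   strings are placeholders rather than the error codes defined by this
   specification, and the Tagged Offset is assumed, for simplicity, to
   be a zero-based index into the buffer.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RdmaWrite:
    sink_stag: int
    sink_tagged_offset: int
    message: bytes


def place_rdma_write(buffers: Dict[int, bytearray], write: RdmaWrite) -> Optional[str]:
    """Place the payload into the Tagged Buffer named by the Data Sink STag.

    Returns None on success, in which case nothing is Delivered to the ULP.
    Otherwise returns an error string that the caller would pass up to the
    Data Sink ULP and report to the Data Source in a Terminate Message.
    """
    buf = buffers.get(write.sink_stag)
    if buf is None:
        return "invalid STag"
    start = write.sink_tagged_offset
    end = start + len(write.message)
    if start < 0 or end > len(buf):
        return "base or bounds violation"
    buf[start:end] = write.message  # Placement into the ULP Buffer
    return None
```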

   For RDMA Read operations, the following are the interactions between
   the RDMAP layer and the ULP:

   *  At the Data Sink:

      *  The ULP passes to the RDMAP layer the following:

         *  ULP Message Length

         *  Data Source STag

         *  Data Sink STag

         *  Data Source Tagged Offset

         *  Data Sink Tagged Offset

      *  When the RDMA Read operation Completes, the RDMAP layer passes
         to the ULP an indication of the Completion results.

   *  At the Data Source:

      *  If no error occurred while processing the RDMA Read Request,
         the Data Source will not pass up any information to the ULP.

      *  If an error occurred while processing the RDMA Read Request,
         the Data Source RDMAP layer will pass up the corresponding
         error information to the Data Source ULP and send a Terminate
         Message to the Data Sink RDMAP layer.  The Data Sink RDMAP
         layer will then pass up the Terminate Message to the ULP.
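
   The RDMA Read exchange can be sketched with two records: a request
   that describes both buffers and a response that describes only the
   Data Sink buffer.  All names here are hypothetical, the Tagged
   Offset is assumed to be a zero-based index into the source buffer,
   and the exceptions stand in for the error path described above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RdmaReadRequest:
    """The five values the Data Sink ULP passes to the RDMAP layer."""
    length: int
    source_stag: int
    source_tagged_offset: int
    sink_stag: int
    sink_tagged_offset: int


@dataclass(frozen=True)
class RdmaReadResponse:
    """Describes only the Data Sink buffer, plus the data to be Placed."""
    sink_stag: int
    sink_tagged_offset: int
    payload: bytes


def serve_read(buffers: Dict[int, bytes], req: RdmaReadRequest) -> RdmaReadResponse:
    """Data Source processing of one RDMA Read Request (illustrative only)."""
    src = buffers.get(req.source_stag)
    if src is None:
        raise LookupError("unknown Data Source STag")
    start = req.source_tagged_offset
    end = start + req.length
    if start < 0 or req.length < 0 or end > len(src):
        raise ValueError("read outside the source buffer")
    # On success nothing is passed up to the Data Source ULP.
    return RdmaReadResponse(req.sink_stag, req.sink_tagged_offset, bytes(src[start:end]))
```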

   For STags made available to the RDMAP layer, the following are the
   interactions between the RDMAP layer and the ULP:

   *  If the ULP enables an STag, the ULP passes the following to the
      RDMAP layer:

      *  STag;

      *  range of Tagged Offsets that are associated with a given STag;

      *  remote access rights (read, write, or read and write)
         associated with a given, valid STag; and

      *  association between a given STag and a given RDMAP Stream.

   *  If the ULP disables an STag, the ULP passes to the RDMAP layer the
      STag.
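
   The information passed when an STag is enabled is enough to build a
   simple access check.  The sketch below is one possible model, not a
   described implementation: STagTable, STagEntry, Access, and the
   integer stream_id are hypothetical, and the offset range is assumed
   to be inclusive at both ends.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Dict


class Access(Flag):
    """Remote access rights: read, write, or read and write."""
    READ = auto()
    WRITE = auto()


@dataclass(frozen=True)
class STagEntry:
    first_offset: int   # lowest valid Tagged Offset
    last_offset: int    # highest valid Tagged Offset (inclusive, by assumption)
    access: Access
    stream_id: int      # the RDMAP Stream this STag is associated with


class STagTable:
    def __init__(self) -> None:
        self._entries: Dict[int, STagEntry] = {}

    def enable(self, stag: int, entry: STagEntry) -> None:
        self._entries[stag] = entry

    def disable(self, stag: int) -> None:
        # Disabling needs only the STag itself.
        self._entries.pop(stag, None)

    def permits(self, stag: int, stream_id: int, offset: int,
                length: int, wanted: Access) -> bool:
        """Check STag validity, Stream association, range, and rights."""
        e = self._entries.get(stag)
        if e is None or e.stream_id != stream_id or length < 0:
            return False
        in_range = e.first_offset <= offset and offset + length - 1 <= e.last_offset
        return in_range and (wanted & e.access) == wanted
```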

   If an error occurs at the RDMAP layer, the RDMAP layer may pass back
   error information (e.g., the content of a Terminate Message) to the
   ULP.


