RFC 7145

Proposed STD
Pages: 91

Internet Small Computer System Interface (iSCSI) Extensions for the Remote Direct Memory Access (RDMA) Specification


Obsoletes:    5046


Internet Engineering Task Force (IETF)                             M. Ko
Request for Comments: 7145
Obsoletes: 5046                                             A. Nezhinsky
Category: Standards Track                                       Mellanox
ISSN: 2070-1721                                               April 2014


      Internet Small Computer System Interface (iSCSI) Extensions
        for the Remote Direct Memory Access (RDMA) Specification

Abstract

   Internet Small Computer System Interface (iSCSI) Extensions for
   Remote Direct Memory Access (RDMA) provides the RDMA data transfer
   capability to iSCSI by layering iSCSI on top of an RDMA-Capable
   Protocol.  An RDMA-Capable Protocol provides RDMA Read and Write
   services, which enable data to be transferred directly into SCSI I/O
   Buffers without intermediate data copies.  This document describes
   the extensions to the iSCSI protocol to support RDMA services as
   provided by an RDMA-Capable Protocol.

   This document obsoletes RFC 5046.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7145.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction ....................................................5
      1.1. Motivation .................................................5
      1.2. iSCSI/iSER Layering ........................................6
      1.3. Architectural Goals ........................................7
      1.4. Protocol Overview ..........................................7
      1.5. RDMA Services and iSER .....................................9
           1.5.1. STag ................................................9
           1.5.2. Send ...............................................10
           1.5.3. RDMA Write .........................................11
           1.5.4. RDMA Read ..........................................11
      1.6. SCSI Read Overview ........................................11
      1.7. SCSI Write Overview .......................................12
   2. Definitions and Acronyms .......................................12
      2.1. Definitions ...............................................12
      2.2. Acronyms ..................................................18
      2.3. Conventions ...............................................20
   3. Upper-Layer Interface Requirements .............................20
      3.1. Operational Primitives offered by iSER ....................21
           3.1.1. Send_Control .......................................21
           3.1.2. Put_Data ...........................................21
           3.1.3. Get_Data ...........................................22
           3.1.4. Allocate_Connection_Resources ......................22
           3.1.5. Deallocate_Connection_Resources ....................23
           3.1.6. Enable_Datamover ...................................23
           3.1.7. Connection_Terminate ...............................23
           3.1.8. Notice_Key_Values ..................................24
           3.1.9. Deallocate_Task_Resources ..........................24
      3.2. Operational Primitives Used by iSER .......................24
           3.2.1. Control_Notify .....................................25
           3.2.2. Data_Completion_Notify .............................25
           3.2.3. Data_ACK_Notify ....................................25
           3.2.4. Connection_Terminate_Notify ........................26
      3.3. iSCSI Protocol Usage Requirements .........................26
   4. Lower-Layer Interface Requirements .............................27
      4.1. Interactions with the RCaP Layer ..........................27
      4.2. Interactions with the Transport Layer .....................28
   5. Connection Setup and Termination ...............................28
      5.1. iSCSI/iSER Connection Setup ...............................28
           5.1.1. Initiator Behavior .................................30
           5.1.2. Target Behavior ....................................31
           5.1.3. iSER Hello Exchange ................................33
      5.2. iSCSI/iSER Connection Termination .........................36
           5.2.1. Normal Connection Termination at the Initiator .....36
           5.2.2. Normal Connection Termination at the Target ........36
           5.2.3. Termination without Logout Request/Response PDUs ...37
   6. Login/Text Operational Keys ....................................38
      6.1. HeaderDigest and DataDigest ...............................38
      6.2. MaxRecvDataSegmentLength ..................................38
      6.3. RDMAExtensions ............................................39
      6.4. TargetRecvDataSegmentLength ...............................40
      6.5. InitiatorRecvDataSegmentLength ............................41
      6.6. OFMarker and IFMarker .....................................41
      6.7. MaxOutstandingUnexpectedPDUs ..............................41
      6.8. MaxAHSLength ..............................................42
      6.9. TaggedBufferForSolicitedDataOnly ..........................43
      6.10. iSERHelloRequired ........................................43
   7. iSCSI PDU Considerations .......................................44
      7.1. iSCSI Data-Type PDU .......................................44
      7.2. iSCSI Control-Type PDU ....................................45
      7.3. iSCSI PDUs ................................................45
           7.3.1. SCSI Command .......................................45
           7.3.2. SCSI Response ......................................47
           7.3.3. Task Management Function Request/Response ..........49
           7.3.4. SCSI Data-out ......................................50
           7.3.5. SCSI Data-in .......................................51
           7.3.6. Ready To Transfer (R2T) ............................53
           7.3.7. Asynchronous Message ...............................55
           7.3.8. Text Request and Text Response .....................55
           7.3.9. Login Request and Login Response ...................55
           7.3.10. Logout Request and Logout Response ................56
           7.3.11. SNACK Request .....................................56
           7.3.12. Reject ............................................56
           7.3.13. NOP-Out and NOP-In ................................57
   8. Flow Control and STag Management ...............................57
      8.1. Flow Control for RDMA Send Messages .......................57
           8.1.1. Flow Control for Control-Type PDUs from the
                  Initiator ..........................................58
           8.1.2. Flow Control for Control-Type PDUs from the
                  Target .............................................60
      8.2. Flow Control for RDMA Read Resources ......................61
      8.3. STag Management ...........................................62
           8.3.1. Allocation of STags ................................62
           8.3.2. Invalidation of STags ..............................62
   9. iSER Control and Data Transfer .................................64
      9.1. iSER Header Format ........................................64
      9.2. iSER Header Format for iSCSI Control-Type PDU .............65
      9.3. iSER Header Format for iSER Hello Message .................67
      9.4. iSER Header Format for iSER HelloReply Message ............68
      9.5. SCSI Data Transfer Operations .............................69
           9.5.1. SCSI Write Operation ...............................69
           9.5.2. SCSI Read Operation ................................70
           9.5.3. Bidirectional Operation ............................70
   10. iSER Error Handling and Recovery ..............................71
      10.1. Error Handling ...........................................71
           10.1.1. Errors in the Transport Layer .....................71
           10.1.2. Errors in the RCaP Layer ..........................72
           10.1.3. Errors in the iSER Layer ..........................73
           10.1.4. Errors in the iSCSI Layer .........................75
      10.2. Error Recovery ...........................................76
           10.2.1. PDU Recovery ......................................77
           10.2.2. Connection Recovery ...............................77
   11. Security Considerations .......................................78
   12. IANA Considerations ...........................................79
   13. References ....................................................79
      13.1. Normative References .....................................79
      13.2. Informative References ...................................80
   Appendix A. Summary of Changes from RFC 5046 ......................81
   Appendix B. Message Format for iSER ...............................83
   B.1. iWARP Message Format for iSER Hello Message ..................83
   B.2. iWARP Message Format for iSER HelloReply Message .............84
   B.3. iSER Header Format for SCSI Read Command PDU .................85
   B.4. iSER Header Format for SCSI Write Command PDU ................86
   B.5. iSER Header Format for SCSI Response PDU .....................87
   Appendix C. Architectural discussion of iSER over InfiniBand ......88
   C.1. Host Side of iSCSI and iSER Connections in InfiniBand ........88
   C.2. Storage Side of iSCSI and iSER Mixed Network Environment .....89
   C.3. Discovery Processes for an InfiniBand Host ...................89
   C.4. IBTA Connection Specifications ...............................90
   Appendix D. Acknowledgments .......................................90

Table of Figures

   Figure 1. Example of iSCSI/iSER Layering in Full Feature Phase .....6
   Figure 2. iSER Header Format ......................................64
   Figure 3. iSER Header Format for iSCSI Control-Type PDU ...........65
   Figure 4. iSER Header Format for iSER Hello Message ...............67
   Figure 5. iSER Header Format for iSER HelloReply Message ..........68
   Figure 6. SendSE Message Containing an iSER Hello Message .........83
   Figure 7. SendSE Message Containing an iSER HelloReply Message ....84
   Figure 8. iSER Header Format for SCSI Read Command PDU ............85
   Figure 9. iSER Header Format for SCSI Write Command PDU ...........86
   Figure 10. iSER Header Format for SCSI Response PDU ...............87
   Figure 11. iSCSI and iSER on IB ...................................88
   Figure 12. Storage Controller with TCP, iWARP, and IB Connections .89

1.  Introduction

1.1.  Motivation

   The iSCSI protocol ([iSCSI]) is a mapping of the SCSI Architecture
   Model (see [SAM5] and [iSCSI-SAM]) over the TCP protocol.  SCSI
   commands are carried by iSCSI requests, and SCSI responses and status
   are carried by iSCSI responses.  Other iSCSI protocol exchanges and
   SCSI Data are also transported in iSCSI PDUs.

   Out-of-order TCP segments in the Traditional iSCSI model have to be
   stored and reassembled before the iSCSI protocol layer within an end
   node can place the data in the iSCSI buffers.  This reassembly is
   required because not every TCP segment is likely to contain an iSCSI
   header to enable its placement and TCP itself does not have a built-
   in mechanism for signaling ULP (Upper Level Protocol) message
   boundaries to aid placement of out-of-order segments.  This TCP
   reassembly at high network speeds is quite counterproductive for the
   following reasons: wasted memory bandwidth in data copying, need for
   reassembly memory, wasted CPU cycles in data copying, and the general
   store-and-forward latency from an application perspective.

   The generic term RDMA-Capable Protocol (RCaP) is used to refer to
   protocol stacks that provide the Remote Direct Memory Access (RDMA)
   functionality, such as iWARP and InfiniBand.

   With the availability of RDMA-Capable Controllers within a host
   system, it is appropriate for iSCSI to be able to exploit the direct
   data placement function of the RDMA-Capable Controller like other
   applications.

   iSCSI Extensions for RDMA (iSER) is designed precisely to take
   advantage of generic RDMA technologies -- iSER's goal is to permit
   iSCSI to employ direct data placement and RDMA capabilities using a
   generic RDMA-Capable Controller.  In summary, the iSCSI/iSER protocol
   stack is designed to enable scaling to high speeds by relying on a
   generic data placement process and RDMA technologies and products
   that enable direct data placement of both in-order and out-of-order
   data.

   This document describes iSER as a protocol extension to iSCSI, both
   for convenience of description and also because it is true in a very
   strict protocol sense.  However, it is to be noted that iSER is in
   reality extending the connectivity of the iSCSI protocol defined in
   [iSCSI], and the name "iSER" reflects this reality.

   When the iSCSI protocol as defined in [iSCSI] (i.e., without the iSER
   enhancements) is intended in the rest of the document, the term
   "Traditional iSCSI" is used to make the intention clear.

   This document obsoletes RFC 5046.  See Appendix A for the list of
   changes from RFC 5046.

1.2.  iSCSI/iSER Layering

   iSCSI Extensions for RDMA (iSER) is layered between the iSCSI layer
   and the RCaP layer.

         +--------------------------------------------------------+
         |                        SCSI                            |
         +--------------------------------------------------------+
         |                        iSCSI                           |
   DI -> +--------------------------------------------------------+
         |                         iSER                           |
         +-------+--------------------------+---------------------+
         | RDMAP |                          |                     |
         +-------+      InfiniBand          |                     |
         |  DDP  |       Reliable           |       Other         |
         +-------+       Connected          |        RDMA         |
         |  MPA  |       Transport          |       Capable       |
         +-------+        Service           |       Protocol      |
         |  TCP  |                          |                     |
         +-------+--------------------------+---------------------+
         |  IP   | InfiniBand Network Layer | Other Network Layer |
         +-------+--------------------------+---------------------+

    Figure 1: Example of iSCSI/iSER Layering in Full Feature Phase

   Figure 1 shows an example of the relationship between SCSI, iSCSI,
   iSER, and the different RCaP layers.  For TCP, the RCaP is iWARP.
   For InfiniBand, the RCaP is the Reliable Connected Transport Service.
   Note that the iSCSI layer as described here supports the RDMA
   Extensions as used in iSER.

1.3.  Architectural Goals

   This section summarizes the architectural goals that guided the
   design of iSER.

   1.  Provide an RDMA data transfer model for iSCSI that enables direct
       in-order or out-of-order data placement of SCSI data into pre-
       allocated SCSI buffers while maintaining in-order data delivery.

   2.  Do not require any major changes to the SCSI Architecture Model
       [SAM5] and SCSI command set standards.

   3.  Utilize the existing iSCSI infrastructure (sometimes referred to
       as "iSCSI ecosystem") including but not limited to MIB,
       bootstrapping, negotiation, naming and discovery, and security.

   4.  Enable a session to operate in the Traditional iSCSI data
       transfer mode if iSER is not supported by either the initiator or
       the target.  (Do not require iSCSI Full Feature Phase
       interoperability between an end node operating in Traditional
       iSCSI mode and an end node operating in iSER-assisted mode.)

   5.  Allow initiator and target implementations to utilize generic
       RDMA-Capable Controllers such as RNICs or to implement iSCSI and
       iSER in software.  (Do not require iSCSI- or iSER-specific
       assists in the RCaP implementation or RDMA-Capable Controller.)

   6.  Implement a lightweight Datamover protocol for iSCSI with minimal
       state maintenance.

1.4.  Protocol Overview

   Consistent with the architectural goals stated in Section 1.3, the
   iSER protocol does not require changes in the iSCSI ecosystem or any
   related SCSI specifications.  The iSER protocol defines the mapping
   of iSCSI PDUs to RCaP Messages in such a way that it is entirely
   feasible to realize iSCSI/iSER implementations that are based on
   generic RDMA-Capable Controllers.  The iSER protocol layer requires
   minimal state maintenance to assist a connection during the iSCSI
   Full Feature Phase, besides being oblivious to the notion of an iSCSI
   session.  The crucial protocol aspects of iSER may be summarized as
   follows:

   1.  iSER-assisted mode is negotiated during the iSCSI login in the
       leading connection for each session, and an entire iSCSI session
       can only operate in one mode (i.e., a connection in a session
       cannot operate in iSER-assisted mode if a different connection of
       the same session is already in Full Feature Phase in the
       Traditional iSCSI mode).

   2.  Once in iSER-assisted mode, all iSCSI interactions on that
       connection use RCaP Messages.

   3.  A Send Message is used for carrying an iSCSI control-type PDU
       preceded by an iSER header.  See Section 7.2 for more details on
       iSCSI control-type PDUs.

   4.  RDMA Write, RDMA Read Request, and RDMA Read Response Messages
       are used for carrying control and all data information associated
       with the iSCSI data-type PDUs (i.e., SCSI Data-In PDUs and R2T
       PDUs).  iSER does not use SCSI Data-Out PDUs for solicited data,
       and SCSI Data-Out PDUs for unsolicited data are not treated as
       iSCSI data-type PDUs by iSER because RDMA is not used.  See
       Section 7.1 for more details on iSCSI data-type PDUs.

   5.  The target drives all data transfer (with the exception of iSCSI
       unsolicited data) for SCSI writes and SCSI reads, by issuing RDMA
       Read Requests and RDMA Writes, respectively.

   6.  RCaP is responsible for ensuring data integrity.  (For example,
       iWARP includes a CRC-enhanced framing layer called MPA on top of
       TCP, and for InfiniBand, the CRCs are included in the Reliable
       Connection mode.)  For this reason, iSCSI header and data digests
       are negotiated to "None" for iSCSI/iSER sessions.

   7.  The iSCSI error recovery hierarchy defined in [iSCSI] is fully
       supported by iSER.  (However, see Section 7.3.11 on the handling
       of SNACK Request PDUs.)

   8.  iSER requires no changes to iSCSI security and text mode
       negotiation mechanisms.

   Note that Traditional iSCSI implementations may have to be adapted to
   employ iSER.  Where adaptation is required, it is expected to center
   on the upper-layer interface requirements of iSER (Section 3).
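
   As an informal illustration of items 3 through 5 above, the
   following Python sketch maps an iSCSI PDU to the kind of RCaP
   Message that carries it.  The names Carrier and carrier_for, and the
   use of plain strings for PDU names, are assumptions made for this
   example only and are not defined by this document.  The pairing of
   SCSI Data-In with RDMA Write and of R2T with RDMA Read follows from
   item 5; Section 7.3 gives the normative per-PDU rules.

```python
from enum import Enum
from typing import Optional


class Carrier(Enum):
    SEND = "Send Message (iSER header + iSCSI PDU)"
    RDMA_WRITE = "RDMA Write"
    RDMA_READ = "RDMA Read Request/Response"


def carrier_for(pdu: str, solicited: bool = False) -> Optional[Carrier]:
    """Return the RCaP Message kind that carries the given iSCSI PDU.

    `pdu` is a plain string chosen for this sketch. `solicited` only
    matters for SCSI Data-Out PDUs.
    """
    if pdu == "SCSI Data-In":
        return Carrier.RDMA_WRITE   # target places read data at the initiator
    if pdu == "R2T":
        return Carrier.RDMA_READ    # target fetches write data itself
    if pdu == "SCSI Data-Out":
        # iSER does not use solicited Data-Out PDUs at all. Unsolicited
        # ones are sent like control-type PDUs because no RDMA is used.
        return None if solicited else Carrier.SEND
    return Carrier.SEND             # every other PDU is control-type
```

   For instance, carrier_for("SCSI Command") yields Carrier.SEND, while
   a solicited SCSI Data-Out PDU yields None because iSER never puts it
   on the wire.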

1.5.  RDMA Services and iSER

   iSER is designed to work with software and/or hardware protocol
   stacks providing the protocol services defined in RCaP documents such
   as [RDMAP], [IB], etc.  The following subsections describe the key
   protocol elements of RCaP services on which iSER relies.

1.5.1.  STag

   An STag is the identifier of an I/O Buffer unique to an RDMA-Capable
   Controller that the iSER layer Advertises to the remote iSCSI/iSER
   node in order to complete a SCSI I/O.

   In iSER, Advertisement is the act by which the initiator informs the
   target that an I/O Buffer is available at the initiator for RDMA Read
   or RDMA Write access by the target.  The initiator Advertises the I/O
   Buffer by including the STag and the Base Offset in the header of an
   iSER Message that carries the SCSI Command PDU to the target.  The
   buffer length is as specified in the SCSI Command PDU.

   The iSER layer at the initiator Advertises the STag and the Base
   Offset for the I/O Buffer of each SCSI I/O to the iSER layer at the
   target in the iSER header of a Send Message containing the SCSI
   Command PDU, unless the I/O can be completely satisfied by
   unsolicited data alone.  The SendSE Message should be used if
   supported by the RCaP layer (e.g., iWARP).
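
   The Advertisement rule above can be sketched in Python as follows.
   This is a minimal illustration under stated assumptions: the
   Advertisement class, the advertise function, and its parameter names
   are hypothetical, and the actual wire layout of the iSER header is
   defined in Section 9, not here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Advertisement:
    stag: int         # identifies the I/O Buffer to the RDMA-Capable Controller
    base_offset: int  # added to a Buffer Offset to form the Tagged Offset


def advertise(stag: int, base_offset: int,
              transfer_len: int, unsolicited_len: int = 0) -> Optional[Advertisement]:
    """Return the Advertisement to place in the iSER header, or None.

    No Advertisement is made when unsolicited data alone covers the
    whole transfer. The buffer length is not part of the Advertisement
    here because the target takes it from the SCSI Command PDU.
    """
    if unsolicited_len >= transfer_len:
        return None
    return Advertisement(stag, base_offset)
```

   A SCSI Read always has unsolicited_len equal to zero in this model,
   so it is always Advertised; a small SCSI Write whose data fits
   entirely in unsolicited data is not.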

   The iSER layer at the target provides the STag for the I/O Buffer
   that is the Data Sink of an RDMA Read Operation (Section 1.5.4) to
   the RCaP layer on the initiator node -- i.e., this is completely
   transparent to the iSER layer at the initiator.

   The iSER layer at the initiator SHOULD invalidate the Advertised STag
   upon a normal completion of the associated task.  The Send with
   Invalidate Message, if supported by the RCaP layer (e.g., iWARP), can
   be used for automatic invalidation when it is used to carry the SCSI
   Response PDU.  There are two exceptions to this automatic
   invalidation -- bidirectional commands and abnormal completion of a
   command.  The iSER layer at the initiator SHOULD explicitly
   invalidate the STag in these two cases.  That iSER layer MUST check
   that STag invalidation has occurred whenever receipt of a Send with
   Invalidate message is the expected means of causing an STag to be
   invalidated, and it MUST perform the STag invalidation if the STag
   has not already been invalidated (e.g., because a Send Message was
   used instead of Send with Invalidate).

   If the Advertised STag is not invalidated as recommended in the
   foregoing paragraph (e.g., in order to cache the STag for future
   reuse), the I/O Buffer remains exposed to the network for access by
   the RCaP.  Such an I/O Buffer is capable of being read or written by
   the RCaP outside the scope of the iSCSI operation for which it was
   originally established; this fact has both robustness and security
   considerations.  The robustness considerations are that the system
   containing the iSER initiator may react poorly to an unexpected
   modification of its memory.  For the security considerations, see
   Section 11.
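
   The invalidation check required of the initiator in this section
   can be illustrated with a short Python sketch.  StagTable and
   on_response are hypothetical names; a real implementation would
   query and invalidate STags through its RCaP interface rather than
   through an in-memory set.

```python
class StagTable:
    """Tracks which Advertised STags are still valid at the initiator."""

    def __init__(self) -> None:
        self._valid = set()

    def advertise(self, stag: int) -> None:
        self._valid.add(stag)

    def invalidate(self, stag: int) -> None:
        # Also used here to model the RCaP layer invalidating the STag
        # automatically on receipt of a Send with Invalidate Message.
        self._valid.discard(stag)

    def is_valid(self, stag: int) -> bool:
        return stag in self._valid


def on_response(table: StagTable, stag: int) -> str:
    """Run when the SCSI Response PDU for a task has been received.

    Returns "explicit" if the initiator had to invalidate the STag
    itself, or "automatic" if it was already invalid.
    """
    if table.is_valid(stag):
        # A plain Send was used, or no automatic invalidation occurred:
        # the initiator performs the invalidation itself.
        table.invalidate(stag)
        return "explicit"
    return "automatic"
```

   The point of the sketch is that the check is unconditional: the
   initiator never assumes that the peer used Send with Invalidate.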

1.5.2.  Send

   Send is the RDMA Operation that is not addressed to an Advertised
   buffer and uses Untagged buffers as the message is received.

   The iSER layer at the initiator uses the Send Operation to transmit
   any iSCSI control-type PDU to the target.  As an example, the
   initiator uses Send Operations to transfer iSER Messages containing
   SCSI Command PDUs to the iSER layer at the target.

   An iSER layer at the target uses the Send Operation to transmit any
   iSCSI control-type PDU to the initiator.  As an example, the target
   uses Send Operations to transfer iSER Messages containing SCSI
   Response PDUs to the iSER layer at the initiator.

   For interoperability, iSER implementations SHOULD accept and
   correctly process SendSE and SendInvSE messages.  However, SendSE and
   SendInvSE messages are to be regarded as optimizations or
   enhancements to the basic Send Message, and their support may vary by
   RCaP protocol and specific implementation.  In general, these
   messages SHOULD NOT be used, unless the RCaP requires support for
   them in all implementations.  If these messages are used, the
   implementation SHOULD be capable of reverting to use of Send in order
   to work with a receiver that does not support these messages.
   Attempted use of these messages with a peer that does not support
   them may result in a fatal error that closes the RCaP connection.
   For example, these messages SHOULD NOT be used with the InfiniBand
   RCaP because InfiniBand does not require support for them in all
   cases.  New iSER implementations SHOULD use Send (and not SendSE or
   SendInvSE) unless there are compelling reasons for doing otherwise.
   Similarly, iSER implementations SHOULD NOT rely on events triggered
   by SendSE and SendInvSE, as these messages may not be used.
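
   One way to express the sender-side guidance above is the following
   Python sketch.  The function name choose_send and both of its
   parameters are assumptions made for illustration; whether a given
   RCaP mandates SendSE support in all implementations must be taken
   from that RCaP's own specification.

```python
def choose_send(rcap_mandates_se: bool, force_plain_send: bool = False) -> str:
    """Pick the Send variant for a control-type PDU.

    rcap_mandates_se: True only if the RCaP requires every
        implementation to support SendSE/SendInvSE.
    force_plain_send: local override that reverts to plain Send, for a
        receiver that does not handle the optimized variants.
    """
    if rcap_mandates_se and not force_plain_send:
        return "SendSE"
    return "Send"   # the safe default recommended for new implementations
```

   With an RCaP that does not mandate support, such as InfiniBand in
   the example given in the text, the function always returns "Send".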

1.5.3.  RDMA Write

   RDMA Write is the RDMA Operation that is used to place data into an
   Advertised buffer at the Data Sink.  The Data Source addresses the
   Message using an STag and a Tagged Offset that are valid on the Data
   Sink.

   The iSER layer at the target uses the RDMA Write Operation to
   transfer the contents of a local I/O Buffer to an Advertised I/O
   Buffer at the initiator.  The iSER layer at the target uses the RDMA
   Write to transfer all or part of the data required to complete a
   SCSI Read command.

   The iSER layer at the initiator does not employ RDMA Writes.

1.5.4.  RDMA Read

   RDMA Read is the RDMA Operation that is used to retrieve data from an
   Advertised buffer at the Data Source.  The sender of the RDMA Read
   Request addresses the Message using an STag and a Tagged Offset that
   are valid on the Data Source in addition to providing a valid local
   STag and Tagged Offset that identify the Data Sink.

   The iSER layer at the target uses the RDMA Read Operation to transfer
   the contents of an Advertised I/O Buffer at the initiator to a local
   I/O Buffer at the target.  The iSER layer at the target uses the RDMA
   Read to fetch all or part of the data required to complete a SCSI
   Write command.

   The iSER layer at the initiator does not employ RDMA Reads.
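
   Sections 1.5.2 through 1.5.4 together state which RDMA Operations
   each iSER role issues.  The following Python sketch records that
   division as a lookup table; the names ALLOWED and may_issue, and the
   string labels, are assumptions made for this example.

```python
# Which RDMA Operations the iSER layer in each role issues, per
# Sections 1.5.2 (Send), 1.5.3 (RDMA Write), and 1.5.4 (RDMA Read).
ALLOWED = {
    "initiator": frozenset({"Send"}),
    "target": frozenset({"Send", "RDMA Write", "RDMA Read"}),
}


def may_issue(role: str, operation: str) -> bool:
    """Return True if the iSER layer in `role` issues `operation`."""
    return operation in ALLOWED[role]
```

   The asymmetry reflects that the target drives all RDMA data
   transfer, while the initiator only Advertises buffers and sends
   control-type PDUs.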

1.6.  SCSI Read Overview

   The iSER layer at the initiator receives the SCSI Command PDU from
   the iSCSI layer.  The iSER layer at the initiator generates an STag
   for the I/O Buffer of the SCSI Read and Advertises the buffer by
   including the STag and the Base Offset as part of the iSER header for
   the PDU.  The iSER Message is transferred to the target using a Send
   Message.  The SendSE Message should be used if supported by the RCaP
   layer (e.g., iWARP).

   The iSER layer at the target uses one or more RDMA Writes to transfer
   the data required to complete the SCSI Read.

   The iSER layer at the target uses a Send Message to transfer the SCSI
   Response PDU back to the iSER layer at the initiator.  The iSER layer
   at the initiator invalidates the STag and notifies the iSCSI layer of
   the availability of the SCSI Response PDU.  The Send with Invalidate
   Message, if supported by the RCaP layer (e.g., iWARP), can be used
   for automatic invalidation of the STag.
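
   The SCSI Read sequence described in this section can be written out
   as an ordered trace.  The Python sketch below is illustrative only:
   scsi_read_trace and the step labels are hypothetical, and the
   verification step on the Send with Invalidate path reflects the
   check required in Section 1.5.1.

```python
from typing import List, Tuple


def scsi_read_trace(rdma_writes: int = 1,
                    send_with_invalidate: bool = False) -> List[Tuple[str, str]]:
    """Return (actor, action) steps for one SCSI Read in iSER-assisted mode."""
    if rdma_writes < 1:
        raise ValueError("a SCSI Read uses one or more RDMA Writes")
    steps = [("initiator",
              "Send: SCSI Command PDU, iSER header with STag and Base Offset")]
    steps += [("target", "RDMA Write")] * rdma_writes
    if send_with_invalidate:
        steps.append(("target", "Send with Invalidate: SCSI Response PDU"))
        steps.append(("initiator", "verify STag was invalidated"))
    else:
        steps.append(("target", "Send: SCSI Response PDU"))
        steps.append(("initiator", "invalidate STag"))
    steps.append(("initiator", "notify iSCSI layer of SCSI Response PDU"))
    return steps
```

   Note that both paths end with the STag invalid before the iSCSI
   layer is notified; only the party performing the invalidation
   differs.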

1.7.  SCSI Write Overview

   The iSER layer at the initiator receives the SCSI Command PDU from
   the iSCSI layer.  If solicited data transfer is involved, the iSER
   layer at the initiator generates an STag for the I/O Buffer of the
   SCSI Write and Advertises the buffer by including the STag and the
   Base Offset as part of the iSER header for the PDU.  The iSER Message
   is transferred to the target using a Send Message.  The SendSE
   Message should be used if supported by the RCaP layer (e.g., iWARP).

   The iSER layer at the initiator may optionally send one or more non-
   immediate unsolicited data PDUs to the target using Send Messages.

   If solicited data transfer is involved, the iSER layer at the target
   uses one or more RDMA Reads to transfer the data required to complete
   the SCSI Write.

   The iSER layer at the target uses a Send Message to transfer the SCSI
   Response PDU back to the iSER layer at the initiator.  The iSER layer
   at the initiator invalidates the STag and notifies the iSCSI layer of
   the availability of the SCSI Response PDU.  The Send with Invalidate
   Message, if supported by the RCaP layer (e.g., iWARP), can be used
   for automatic invalidation of the STag.
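
   For a SCSI Write, the target fetches whatever the unsolicited data
   did not cover by issuing one or more RDMA Reads against the
   Advertised buffer.  The Python sketch below shows one way a target
   might split that remainder into RDMA Reads.  The function name, its
   parameters, and in particular the max_read_len limit are assumptions
   for illustration; this document does not define such a limit here.
   Each Tagged Offset is the Base Offset plus the Buffer Offset, as
   defined in Section 2.1.

```python
from typing import List, Tuple


def rdma_reads_for_write(total_len: int, base_offset: int,
                         unsolicited_len: int = 0,
                         max_read_len: int = 65536) -> List[Tuple[int, int]]:
    """Return (tagged_offset, length) for each RDMA Read the target issues.

    Data already delivered as unsolicited data (via Send Messages) is
    skipped; an empty list means no solicited transfer is needed.
    """
    if not 0 <= unsolicited_len <= total_len:
        raise ValueError("unsolicited_len must be within the transfer")
    if max_read_len < 1:
        raise ValueError("max_read_len must be positive")
    reads = []
    buffer_offset = unsolicited_len
    while buffer_offset < total_len:
        length = min(max_read_len, total_len - buffer_offset)
        # Tagged Offset = Base Offset + Buffer Offset
        reads.append((base_offset + buffer_offset, length))
        buffer_offset += length
    return reads
```

   For a 10000-byte write with 2000 bytes of unsolicited data, a Base
   Offset of 0x1000, and a hypothetical 4096-byte limit, the sketch
   yields two RDMA Reads of 4096 and 3904 bytes.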

2.  Definitions and Acronyms

2.1.  Definitions

   Advertisement (Advertised, Advertise, Advertisements, Advertises) -
      The act of informing a remote iSER (iSCSI Extensions for RDMA)
      layer that a local node's buffer is available to it.  A node makes
      a buffer available for incoming RDMA Read Request Message or
      incoming RDMA Write Message access by informing the remote iSER
      layer of the Tagged Buffer identifiers (STag, Base Offset, and
      buffer length).  Note that this Advertisement of Tagged Buffer
      information is the responsibility of the iSER layer on either end
      and is not defined by the RDMA-Capable Protocol.  A typical method
      would be for the iSER layer to embed the Tagged Buffer's STag,
      Base Offset, and buffer length in a message destined for the
      remote iSER layer.

   Base Offset - A value that, when added to the Buffer Offset, forms
      the Tagged Offset.

   Completion (Completed, Complete, Completes) - Completion is defined
      as the process by which the RDMA-Capable Protocol layer informs
      the iSER layer that a particular RDMA Operation has performed all
      functions specified for the RDMA Operation.

   Connection - A connection is a logical bidirectional communication
      channel between the initiator and the target, e.g., a TCP
      connection.  Communication between the initiator and the target
      occurs over one or more connections.  The connections carry
      control messages, SCSI commands, parameters, and data within iSCSI
      Protocol Data Units (iSCSI PDUs).

   Connection Handle - An information element that identifies the
      particular iSCSI connection and is unique for a given iSCSI layer
      and the underlying iSER layer.  Every invocation of an Operational
      Primitive is qualified with the Connection Handle.

   Data Sink - The peer receiving a data payload.  Note that the Data
      Sink can be required to both send and receive RCaP (RDMA-Capable
      Protocol) Messages to transfer a data payload.

   Data Source - The peer sending a data payload.  Note that the Data
      Source can be required to both send and receive RCaP Messages to
      transfer a data payload.

   Datamover Interface (DI) - The interface between the iSCSI layer and
      the Datamover Layer as described in [DA].

   Datamover Layer - A layer that is directly below the iSCSI layer and
      above the underlying transport layers.  This layer exposes and
      uses a set of transport-independent Operational Primitives for the
      communication between the iSCSI layer and itself.  The Datamover
      layer, operating in conjunction with the transport layers, moves
      the control and data information on the iSCSI connection.  In this
      specification, the iSER layer is the Datamover layer.

   Datamover Protocol - A Datamover protocol is the wire protocol that
      is defined to realize the Datamover-layer functionality.  In this
      specification, the iSER protocol is the Datamover protocol.

   Inbound RDMA Read Queue Depth (IRD) - The maximum number of incoming
      outstanding RDMA Read Requests that the RDMA-Capable Controller
      can handle on a particular RCaP Stream at the Data Source.  For
      some RDMA-Capable Protocol layers, the term "IRD" may be known by
      a different name.  For example, for InfiniBand, the equivalent to
      IRD is the Responder Resources.

   I/O Buffer - A buffer that is used in a SCSI Read or Write operation
      so SCSI data may be sent from or received into that buffer.

   iSCSI - The iSCSI protocol as defined in [iSCSI] is a mapping of the
      SCSI Architecture Model of SAM-5 over TCP.

   iSCSI control-type PDU - Any iSCSI PDU that is not an iSCSI data-
      type PDU and also not a SCSI Data-Out PDU carrying solicited data
      is defined as an iSCSI control-type PDU.  Specifically, it is to
      be noted that SCSI Data-Out PDUs for unsolicited data are defined
      as iSCSI control-type PDUs.

   iSCSI data-type PDU - An iSCSI data-type PDU is defined as an iSCSI
      PDU that causes data transfer via RDMA operations at the iSER
      layer, transparent to the remote iSCSI layer, to take place
      between the peer iSCSI nodes on a Full Feature Phase iSCSI
      connection.  An iSCSI data-type PDU, when requested for
      transmission by the sender iSCSI layer, results in the associated
      data transfer without the participation of the remote iSCSI layer,
      i.e., the PDU itself is not delivered as-is to the remote iSCSI
      layer.  The following iSCSI PDUs constitute the set of iSCSI data-
      type PDUs -- SCSI Data-In PDU and R2T PDU.
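
      The two definitions above partition iSCSI PDUs into three
      groups, which the following sketch makes explicit.  The enum,
      the function name, and the use of PDU names as strings are
      assumptions made for illustration.

```python
from enum import Enum


class PduClass(Enum):
    DATA_TYPE = "iSCSI data-type PDU"
    CONTROL_TYPE = "iSCSI control-type PDU"
    NEITHER = "SCSI Data-Out PDU carrying solicited data"


def classify_pdu(pdu_name: str, solicited: bool = False) -> PduClass:
    """Classify a PDU according to the two definitions above.

    pdu_name is an illustrative string such as "SCSI Data-In".  The
    solicited flag only matters for "SCSI Data-Out".
    """
    if pdu_name in ("SCSI Data-In", "R2T"):
        return PduClass.DATA_TYPE
    if pdu_name == "SCSI Data-Out" and solicited:
        # Excluded from the control-type set by definition.
        return PduClass.NEITHER
    # Everything else, including unsolicited SCSI Data-Out.
    return PduClass.CONTROL_TYPE
```

      Note that a SCSI Data-Out PDU lands in a different group
      depending only on whether the data it carries was solicited.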

   iSCSI Layer - A layer in the protocol stack implementation within an
      end node that implements the iSCSI protocol and interfaces with
      the iSER layer via the Datamover Interface.

   iSCSI PDU (iSCSI Protocol Data Unit) - The iSCSI layer at the
      initiator and the iSCSI layer at the target divide their
      communications into messages.  The term "iSCSI Protocol Data Unit"
      (iSCSI PDU) is used for these messages.

   iSCSI/iSER Connection - An iSER-assisted iSCSI connection.  An iSCSI
      connection that is not iSER assisted always maps onto a TCP
      connection at the transport level.  But an iSER-assisted iSCSI
      connection may not have an underlying TCP connection.  For some
      RCaP implementations (e.g., iWARP), an iSER-assisted iSCSI
      connection has an underlying TCP connection.  For other RCaP
      implementations (e.g., InfiniBand), there is no underlying TCP
      connection.  (In the specific example of InfiniBand [IB], an iSER-
      assisted iSCSI connection is directly mapped onto the InfiniBand
      Reliable Connection-based (RC) channel.)

   iSCSI/iSER Session - An iSER-assisted iSCSI session.  All connections
      of an iSCSI/iSER session are iSCSI/iSER connections.

   iSER - iSCSI Extensions for RDMA, the protocol defined in this
      document.

   iSER-assisted - A term generally used to describe the operation of
      iSCSI when the iSER functionality is also enabled below the iSCSI
      layer for the specific iSCSI/iSER connection in question.

   iSER-IRD - This variable represents the maximum number of incoming
      outstanding RDMA Read Requests that the iSER layer at the
      initiator grants on a particular RCaP Stream.

   iSER-ORD - This variable represents the maximum number of outstanding
      RDMA Read Requests that the iSER layer can initiate on a
      particular RCaP Stream.  This variable is maintained only by the
      iSER layer at the target.
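
      One way an iSER layer at the target might honor iSER-ORD is to
      count outstanding RDMA Read Requests and refuse to issue more
      once the limit is reached.  The sketch below assumes that
      approach; the class and method names are hypothetical, and how
      the value of iSER-ORD is established is outside its scope.

```python
class ReadRequestLimiter:
    """Caps outstanding RDMA Read Requests on one RCaP Stream."""

    def __init__(self, iser_ord: int) -> None:
        if iser_ord < 0:
            raise ValueError("iSER-ORD must be non-negative")
        self.iser_ord = iser_ord
        self.outstanding = 0

    def try_issue(self) -> bool:
        """Reserve a slot for a new RDMA Read Request, if one is free."""
        if self.outstanding >= self.iser_ord:
            return False  # caller must wait for a Completion
        self.outstanding += 1
        return True

    def complete(self) -> None:
        """Release a slot when an RDMA Read Operation Completes."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding RDMA Read Request")
        self.outstanding -= 1
```

      A caller would invoke try_issue() before posting each RDMA Read
      Request and complete() on each corresponding Completion.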

   iSER Layer - The layer that implements the iSCSI Extensions for RDMA
      (iSER) protocol.

   iWARP - A suite of wire protocols comprising [RDMAP], [DDP], and
      [MPA] when layered above [TCP].  [RDMAP] and [DDP] may be layered
      above SCTP or other transport protocols.

   Local Mapping - A task state record maintained by the iSER layer that
      associates the Initiator Task Tag to the Local STag(s).  The
      specifics of the record structure are implementation dependent.

   Local Peer - The implementation of the RDMA-Capable Protocol on the
      local end of the connection.  Used to refer to the local entity
      when describing protocol exchanges or other interactions between
      two nodes.

   Node - A computing device attached to one or more links of a network.
      A node in this context does not refer to a specific application or
      protocol instantiation running on the computer.  A node may
      consist of one or more RDMA-Capable Controllers installed in a
      host computer.

   Operational Primitive - An Operational Primitive is an abstract
      functional interface procedure that requests another layer to
      perform a specific action on the requestor's behalf or notifies
      the other layer of some event.  The Datamover Interface between an
      iSCSI layer and a Datamover layer within an iSCSI end node uses a
      set of Operational Primitives to define the functional interface
      between the two layers.  Note that not every invocation of an
      Operational Primitive may elicit a response from the requested
      layer.  A full discussion of the Operational Primitive types and
      request-response semantics available to iSCSI and iSER can be
      found in [DA].

   Outbound RDMA Read Queue Depth (ORD) - The maximum number of
      outstanding RDMA Read Requests that the RDMA-Capable Controller
      can initiate on a particular RCaP Stream at the Data Sink.  For
      some RDMA-Capable Protocol layers, the term "ORD" may be known by a
      different name.  For example, for InfiniBand, the equivalent to
      ORD is the Initiator Depth.

   Phase Collapse - Refers to the optimization in iSCSI where the SCSI
      status is transferred along with the final SCSI Data-In PDU from a
      target.  See Section 4.2 in [iSCSI].

   RCaP Message - One or more packets of the network layer that
      constitute a single RDMA operation or a part of an RDMA Read
      Operation of the RDMA-Capable Protocol.  For iWARP, an RCaP
      Message is known as an RDMAP Message.

   RCaP Stream - A single bidirectional association between the peer
      RDMA-Capable Protocol layers on two nodes over a single transport-
      level stream.  For iWARP, an RCaP Stream is known as an RDMAP
      Stream, and the association is created following a successful
      Login Phase during which iSER support is negotiated.

   RDMA-Capable Protocol (RCaP) - The protocol or protocol suite that
      provides a reliable RDMA transport functionality, e.g., iWARP,
      InfiniBand, etc.

   RDMA-Capable Controller - A network I/O adapter or embedded
      controller with RDMA functionality.  For example, for iWARP, this
      could be an RNIC, and for InfiniBand, this could be an HCA (Host
      Channel Adapter) or TCA (Target Channel Adapter).

   RDMA-enabled Network Interface Controller (RNIC) - A network I/O
      adapter or embedded controller with iWARP functionality.

   RDMA Operation - A sequence of RCaP Messages, including control
      messages, to transfer data from a Data Source to a Data Sink.  The
      following RDMA Operations are defined -- RDMA Write Operation,
      RDMA Read Operation, and Send Operation.

   RDMA Protocol (RDMAP) - A wire protocol that supports RDMA Operations
      to transfer ULP data between a Local Peer and the Remote Peer as
      described in [RDMAP].

   RDMA Read Operation - An RDMA Operation used by the Data Sink to
      transfer the contents of a Data Source buffer from the Remote Peer
      to a Data Sink buffer at the Local Peer.  An RDMA Read Operation
      consists of a single RDMA Read Request Message and a single RDMA
      Read Response Message.

   RDMA Read Request - An RCaP Message used by the Data Sink to request
      the Data Source to transfer the contents of a buffer.  The RDMA
      Read Request Message describes both the Data Source and the Data
      Sink buffers.

   RDMA Read Response - An RCaP Message used by the Data Source to
      transfer the contents of a buffer to the Data Sink, in response to
      an RDMA Read Request.  The RDMA Read Response Message only
      describes the Data Sink buffer.
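
      The asymmetry between the two messages (the request describes
      both buffers, the response only the Data Sink buffer) can be
      seen in a toy model.  Everything below is an illustrative
      sketch: the class names, the field names, and the dictionary
      standing in for registered memory are assumptions, and no wire
      format is implied.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReadRequest:
    # Describes both the Data Source and the Data Sink buffers.
    source_stag: int
    source_offset: int
    sink_stag: int
    sink_offset: int
    length: int


@dataclass
class ReadResponse:
    # Describes only the Data Sink buffer, plus the payload itself.
    sink_stag: int
    sink_offset: int
    payload: bytes


Memory = Dict[int, bytearray]  # STag -> buffer (toy stand-in)


def serve_read_request(source_mem: Memory, req: ReadRequest) -> ReadResponse:
    """Data Source side: read the requested bytes, build the response."""
    buf = source_mem[req.source_stag]
    end = req.source_offset + req.length
    if end > len(buf):
        raise ValueError("read exceeds Data Source buffer")
    return ReadResponse(req.sink_stag, req.sink_offset,
                        bytes(buf[req.source_offset:end]))


def place_read_response(sink_mem: Memory, resp: ReadResponse) -> None:
    """Data Sink side: place the payload into the described buffer."""
    buf = sink_mem[resp.sink_stag]
    end = resp.sink_offset + len(resp.payload)
    if end > len(buf):
        raise ValueError("payload exceeds Data Sink buffer")
    buf[resp.sink_offset:end] = resp.payload
```

      Because the response carries the Data Sink buffer description,
      the Data Sink in this model can place the payload without
      looking up the original request.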

   RDMA Write Operation - An RDMA Operation used by the Data Source to
      transfer the contents of a Data Source buffer from the Local Peer
      to a Data Sink buffer at the Remote Peer.  The RDMA Write Message
      only describes the Data Sink buffer.

   Remote Direct Memory Access (RDMA) - A method of accessing memory on
      a remote system in which the local system specifies the remote
      location of the data to be transferred.  Employing an RDMA-
      Capable Controller in the remote system allows the access to take
      place without interrupting the processing of the CPU(s) on the
      system.

   Remote Mapping - A task state record maintained by the iSER layer
      that associates the Initiator Task Tag to the Advertised STag(s)
      and the Base Offset(s).  The specifics of the record structure are
      implementation dependent.
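
      Since the record structure is implementation dependent, the
      following is only one possible shape for the Remote Mapping,
      shown together with the Local Mapping defined earlier in this
      section.  All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LocalMapping:
    """Initiator Task Tag -> Local STag(s)."""
    local_stags: List[int] = field(default_factory=list)


@dataclass
class RemoteMapping:
    """Initiator Task Tag -> Advertised (STag, Base Offset) pairs."""
    advertised: List[Tuple[int, int]] = field(default_factory=list)


class TaskStateTable:
    """Hypothetical container for both kinds of task state record."""

    def __init__(self) -> None:
        self.local: Dict[int, LocalMapping] = {}
        self.remote: Dict[int, RemoteMapping] = {}

    def add_local_stag(self, itt: int, stag: int) -> None:
        self.local.setdefault(itt, LocalMapping()).local_stags.append(stag)

    def add_advertised(self, itt: int, stag: int, base_offset: int) -> None:
        entry = self.remote.setdefault(itt, RemoteMapping())
        entry.advertised.append((stag, base_offset))

    def release(self, itt: int) -> None:
        """Drop all state recorded for a task."""
        self.local.pop(itt, None)
        self.remote.pop(itt, None)
```

      Keying both records by Initiator Task Tag lets the iSER layer
      find every STag associated with a task in a single lookup.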

   Remote Peer - The implementation of the RDMA-Capable Protocol on the
      opposite end of the connection.  Used to refer to the remote
      entity when describing protocol exchanges or other interactions
      between two nodes.

   SCSI Layer - This layer builds/receives SCSI CDBs (Command Descriptor
      Blocks) and sends/receives them with the remaining command execute
      [SAM5] parameters to/from the iSCSI layer.

   Send - An RDMA Operation that transfers the content of a buffer from
      the Local Peer to an untagged buffer at the Remote Peer.

   SendInvSE Message - A Send with Solicited Event and Invalidate
      Message.

   SendSE Message - A Send with Solicited Event Message.

   Sequence Number (SN) - DataSN for a SCSI Data-In PDU and R2TSN for an
      R2T PDU.  The semantics for both types of sequence numbers are as
      defined in [iSCSI].

   Session, iSCSI Session - The group of connections that link an
      initiator SCSI port with a target SCSI port form an iSCSI session
      (equivalent to a SCSI Initiator-Target (I-T) nexus).  Connections
      can be added to and removed from a session even while the I-T
      nexus is intact.  Across all connections within a session, an
      initiator sees one and the same target.

   Steering Tag (STag) - An identifier of a Tagged Buffer on a node
      (Local or Remote) as defined in [RDMAP] and [DDP].  For other
      RDMA-Capable Protocols, the Steering Tag may be known by different
      names but will be referred to herein as STags.  For example, for
      InfiniBand, a Remote STag is known as an R-Key, and a Local STag
      is known as an L-Key, and both will be considered STags.

   Tagged Buffer - A buffer that is explicitly Advertised to the iSER
      layer at the remote node through the exchange of an STag, Base
      Offset, and length.

   Tagged Offset - The offset within a Tagged Buffer.

   Traditional iSCSI - Refers to the iSCSI protocol as defined in
      [iSCSI] (i.e., without the iSER enhancements).

   Untagged Buffer - A buffer that is not explicitly Advertised to the
      iSER layer at the remote node.

2.2.  Acronyms

   Acronym        Definition

   --------------------------------------------------------------

   AHS            Additional Header Segment

   BHS            Basic Header Segment

   CO             Connection Only

   CRC            Cyclic Redundancy Check

   DDP            Direct Data Placement Protocol

   DI             Datamover Interface

   HCA            Host Channel Adapter

   IANA           Internet Assigned Numbers Authority

   IB             InfiniBand

   IETF           Internet Engineering Task Force

   I/O            Input - Output

   IO             Initialize Only

   IP             Internet Protocol

   IPoIB          IP over InfiniBand

   IPsec          Internet Protocol Security

   iSER           iSCSI Extensions for RDMA

   ITT            Initiator Task Tag

   LO             Leading Only

   MPA            Marker PDU Aligned Framing for TCP

   NOP            No Operation

   NSG            Next Stage (during the iSCSI Login Phase)

   PDU            Protocol Data Unit

   R2T            Ready To Transfer

   R2TSN          Ready To Transfer Sequence Number

   RCaP           RDMA-Capable Protocol

   RDMA           Remote Direct Memory Access

   RDMAP          Remote Direct Memory Access Protocol

   RFC            Request For Comments

   RNIC           RDMA-enabled Network Interface Controller

   SAM5           SCSI Architecture Model - 5

   SCSI           Small Computer System Interface

   SNACK          Selective Negative Acknowledgment - also
                  Sequence Number Acknowledgement for data

   STag           Steering Tag

   SW             Session Wide

   TCA            Target Channel Adapter

   TCP            Transmission Control Protocol

   TMF            Task Management Function

   TTT            Target Transfer Tag

   ULP            Upper Level Protocol

2.3.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].


