RFC 5046

Internet Small Computer System Interface (iSCSI) Extensions for Remote Direct Memory Access (RDMA)

6.  Login/Text Operational Keys

   Certain iSCSI login/text operational keys have restricted usage in
   iSER, and additional keys are used to support the iSER protocol
   functionality.  All other keys defined in [RFC3720] and not discussed
   in this section may be used on iSCSI/iSER connections with the same
   semantics.

6.1.  HeaderDigest and DataDigest

   Irrelevant when: RDMAExtensions=Yes

   A negotiation resulting in RDMAExtensions=Yes for a session implies
   HeaderDigest=None and DataDigest=None for all connections in that
   session and overrides both the default and an explicit setting.

6.2.  MaxRecvDataSegmentLength

   For an iSCSI connection belonging to a session in which
   RDMAExtensions=Yes was negotiated on the leading connection of the
   session, MaxRecvDataSegmentLength need not be declared in the Login
   Phase.  Instead, InitiatorRecvDataSegmentLength (as described in
   Section 6.5) and TargetRecvDataSegmentLength (as described in Section
   6.4) keys are negotiated.  The values of the local and remote
   MaxRecvDataSegmentLength are derived from the
   InitiatorRecvDataSegmentLength and TargetRecvDataSegmentLength keys
   even if the MaxRecvDataSegmentLength is declared during the Login
   Phase.

   In the Full Feature Phase, the initiator MUST consider the value of
   its local MaxRecvDataSegmentLength (that it would have declared to
   the target) as having the value of InitiatorRecvDataSegmentLength,
   and the value of the remote MaxRecvDataSegmentLength (that would have
   been declared by the target) as having the value of
   TargetRecvDataSegmentLength.  Similarly, the target MUST consider the
   value of its local MaxRecvDataSegmentLength (that it would have
   declared to the initiator) as having the value of
   TargetRecvDataSegmentLength, and the value of the remote
   MaxRecvDataSegmentLength (that would have been declared by the
   initiator) as having the value of InitiatorRecvDataSegmentLength.
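
   The derivation above can be sketched as follows.  This is a minimal
   illustration only; the function name, the result type, and the role
   strings are hypothetical and are not defined by this document.

```python
from typing import NamedTuple


class EffectiveLimits(NamedTuple):
    local_max_recv: int   # largest control-type data segment this node accepts
    remote_max_recv: int  # largest control-type data segment the peer accepts


def effective_max_recv(role: str,
                       initiator_recv_dsl: int,
                       target_recv_dsl: int) -> EffectiveLimits:
    """Derive the local/remote MaxRecvDataSegmentLength for one node.

    `role` is "initiator" or "target" (labels assumed for this sketch).
    The two lengths are the negotiated InitiatorRecvDataSegmentLength and
    TargetRecvDataSegmentLength values.
    """
    if role == "initiator":
        return EffectiveLimits(initiator_recv_dsl, target_recv_dsl)
    if role == "target":
        return EffectiveLimits(target_recv_dsl, initiator_recv_dsl)
    raise ValueError(f"unknown role: {role!r}")
```

   Both nodes therefore see the same pair of numbers, with the local
   and remote meanings swapped.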

   The MaxRecvDataSegmentLength key is applicable only for iSCSI
   control-type PDUs.

6.3.  RDMAExtensions

   Use: LO (leading only)

   Senders: Initiator and Target

   Scope: SW (session-wide)

   RDMAExtensions=<boolean-value>

   Irrelevant when: SessionType=Discovery

   Default is No

   Result function is AND

   This key is used by the initiator and the target to negotiate support
   for iSER-assisted mode.  To enable the use of iSER-assisted mode,
   both the initiator and the target MUST exchange RDMAExtensions=Yes.

   iSER-assisted mode MUST NOT be used if either the initiator or the
   target offers RDMAExtensions=No.
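
   A minimal sketch of the AND result function for this key is shown
   below.  The helper names are hypothetical, and treating an absent
   key as the default value (No) is an assumption of the sketch.

```python
from typing import Optional


def parse_boolean_key(value: Optional[str]) -> bool:
    """Map a <boolean-value> to bool; None stands for 'key not exchanged'."""
    if value is None:
        return False  # the default for RDMAExtensions is No
    if value == "Yes":
        return True
    if value == "No":
        return False
    raise ValueError(f"not a boolean-value: {value!r}")


def iser_assisted_mode(initiator_value: Optional[str],
                       target_value: Optional[str]) -> bool:
    """Result function AND: both sides must have sent RDMAExtensions=Yes."""
    initiator_yes = parse_boolean_key(initiator_value)
    target_yes = parse_boolean_key(target_value)
    return initiator_yes and target_yes
```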

   An iSER-enabled node is not required to initiate the RDMAExtensions
   key exchange if it prefers to operate in the Traditional iSCSI mode.
   However, if the RDMAExtensions key is to be negotiated, an initiator
   MUST offer the key in the first Login Request PDU in the
   LoginOperationalNegotiation stage of the leading connection, and a
   target MUST offer the key in the first Login Response PDU with which
   it is allowed to do so (i.e., the first Login Response PDU issued
   after the first Login Request PDU with the C bit set to 0) in the
   LoginOperationalNegotiation stage of the leading connection.  In
   response to the offered key=value pair of RDMAExtensions=Yes, an
   initiator MUST respond in the next Login Request PDU with which it is
   allowed to do so, and a target MUST respond in the next Login
   Response PDU with which it is allowed to do so.

   Negotiating the RDMAExtensions key first enables a node to negotiate
   the optimal value for other keys.  Certain iSCSI keys such as
   MaxBurstLength, MaxOutstandingR2T, ErrorRecoveryLevel, InitialR2T,
   ImmediateData, etc., may be negotiated differently depending on
   whether the connection is in Traditional iSCSI mode or iSER-assisted
   mode.

6.4.  TargetRecvDataSegmentLength

   Use: IO (Initialize only)

   Senders: Initiator and Target

   Scope: CO (connection-only)

   Irrelevant when: RDMAExtensions=No

   TargetRecvDataSegmentLength=<numerical-value-512-to-(2**24-1)>

   Default is 8192 bytes

   Result function is minimum

   This key is relevant only for the iSCSI connection of an iSCSI
   session if RDMAExtensions=Yes is negotiated on the leading connection
   of the session.  It is used by the initiator and target to negotiate
   the maximum size of the data segment that an initiator may send to
   the target in an iSCSI control-type PDU in the Full Feature Phase.
   For SCSI Command PDUs and SCSI Data-out PDUs containing non-immediate
   unsolicited data to be sent by the initiator, the initiator MUST send
   all non-Final PDUs with a data segment size of exactly
   TargetRecvDataSegmentLength whenever the PDUs constitute a data
   sequence whose size is larger than TargetRecvDataSegmentLength.

6.5.  InitiatorRecvDataSegmentLength

   Use: IO (Initialize only)

   Senders: Initiator and Target

   Scope: CO (connection-only)

   Irrelevant when: RDMAExtensions=No

   InitiatorRecvDataSegmentLength=<numerical-value-512-to-(2**24-1)>

   Default is 8192 bytes

   Result function is minimum

   This key is relevant only for the iSCSI connection of an iSCSI
   session if RDMAExtensions=Yes is negotiated on the leading connection
   of the session.  It is used by the initiator and target to negotiate
   the maximum size of the data segment that a target may send to the
   initiator in an iSCSI control-type PDU in the Full Feature Phase.
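
   The "minimum" result function shared by TargetRecvDataSegmentLength
   (Section 6.4) and InitiatorRecvDataSegmentLength can be illustrated
   with the short sketch below.  The constant and function names are
   hypothetical; only the numeric range and the default come from the
   key definitions above.

```python
MIN_RECV_DSL = 512
MAX_RECV_DSL = 2**24 - 1
DEFAULT_RECV_DSL = 8192  # bytes, used when the key is not negotiated


def negotiate_recv_dsl(offered: int, responded: int) -> int:
    """Apply the 'minimum' result function to one offer/response pair."""
    for value in (offered, responded):
        if not MIN_RECV_DSL <= value <= MAX_RECV_DSL:
            raise ValueError(f"value out of range: {value}")
    return min(offered, responded)
```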

6.6.  OFMarker and IFMarker

   Irrelevant when: RDMAExtensions=Yes

   A negotiation resulting in RDMAExtensions=Yes for a session implies
   OFMarker=No and IFMarker=No for all connections in that session and
   overrides both the default and an explicit setting.

6.7.  MaxOutstandingUnexpectedPDUs

   Use: LO (leading only), Declarative

   Senders: Initiator and Target

   Scope: SW (session-wide)

   Irrelevant when: RDMAExtensions=No

   MaxOutstandingUnexpectedPDUs=<numerical-value-from-2-to-(2**32-1) |
   0>

   Default is 0

   This key is used by the initiator and the target to declare the
   maximum number of outstanding "unexpected" iSCSI control-type PDUs
   that it can receive in the Full Feature Phase.  It is intended to
   allow the receiving side to determine the amount of buffer resources
   needed beyond the normal flow control mechanism available in iSCSI.
   An initiator or target should select a value such that it would not
   impose an unnecessary constraint on the iSCSI layer under normal
   circumstances.  The value of 0 is defined to indicate that the
   declarer has no limit on the maximum number of outstanding
   "unexpected" iSCSI control-type PDUs that it can receive.  See
   Sections 8.1.1 and 8.1.2 for the usage of this key.  Note that iSER
   Hello and HelloReply Messages are not iSCSI control-type PDUs and are
   not affected by this key.
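
   As a small illustration of the value space of this key, the sketch
   below validates a declared value and treats 0 as "no limit".  The
   function name and the use of None for "no limit" are assumptions of
   the sketch; the flow control rules themselves are given in Sections
   8.1.1 and 8.1.2 and are not modeled here.

```python
from typing import Optional

MAX_DECLARED = 2**32 - 1


def unexpected_pdu_limit(declared: int) -> Optional[int]:
    """Return the declarer's limit, or None when 0 ('no limit') was declared."""
    if declared == 0:
        return None
    if 2 <= declared <= MAX_DECLARED:
        return declared
    # 1 and anything above 2**32-1 fall outside the defined value space.
    raise ValueError(f"invalid MaxOutstandingUnexpectedPDUs: {declared}")
```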

7.  iSCSI PDU Considerations

   When a connection is in the iSER-assisted mode, two types of message
   transfers are allowed between the iSCSI layer at the initiator and
   the iSCSI layer at the target.  These are known as the iSCSI data-
   type PDUs and the iSCSI control-type PDUs, and these terms are
   described in the following sections.

7.1.  iSCSI Data-Type PDU

   An iSCSI data-type PDU is defined as an iSCSI PDU that causes data
   transfer, transparent to the remote iSCSI layer, to take place
   between the peer iSCSI nodes in the Full Feature Phase of an
   iSCSI/iSER connection.  An iSCSI data-type PDU, when requested for
   transmission by the iSCSI layer in the sending node, results in the
   data being transferred without the participation of the iSCSI layers
   at the sending and the receiving nodes.  This is due to the fact that
   the PDU itself is not delivered as-is to the iSCSI layer in the
   receiving node.  Instead, the data transfer operations are
   transformed into the appropriate RDMA operations that are handled by
   the RDMA-Capable Controller.  The set of iSCSI data-type PDUs
   consists of SCSI Data-in PDUs and R2T PDUs.

   If the invocation of the Operational Primitive by the iSCSI layer to
   request that the iSER layer process an iSCSI data-type PDU is
   qualified with Notify_Enable set, then upon completing the RDMA
   operation, the iSER layer at the target MUST notify the iSCSI layer
   at the target by invoking the Data_Completion_Notify Operational
   Primitive qualified with ITT and SN.  There is no data completion
   notification at the initiator since the RDMA operations are
   completely handled by the RDMA-Capable Controller at the initiator
   and the iSER layer at the initiator is not involved with the data
   transfer associated with iSCSI data-type PDUs.

   If the invocation of the Operational Primitive by the iSCSI layer to
   request that the iSER layer process an iSCSI data-type PDU is
   qualified with Notify_Enable cleared, then upon completing the RDMA
   operation, the iSER layer at the target MUST NOT notify the iSCSI
   layer at the target and MUST NOT invoke the Data_Completion_Notify
   Operational Primitive.

   If an operation associated with an iSCSI data-type PDU fails for any
   reason, the contents of the Data Sink buffers associated with the
   operation are considered indeterminate.

7.2.  iSCSI Control-Type PDU

   Any iSCSI PDU that is not an iSCSI data-type PDU and also not a SCSI
   Data-out PDU carrying solicited data is defined as an iSCSI control-
   type PDU.  The iSCSI layer invokes the Send_Control Operational
   Primitive to request that the iSER layer process an iSCSI control-
   type PDU.  iSCSI control-type PDUs are transferred using Send Message
   Types of RCaP.  Specifically, note that SCSI Data-out PDUs carrying
   unsolicited data are defined as iSCSI control-type PDUs.  See Section
   7.3.4 on the treatment of SCSI Data-out PDUs.

   When the iSER layer receives an iSCSI control-type PDU, it MUST
   notify the iSCSI layer by invoking the Control_Notify Operational
   Primitive qualified with the iSCSI control-type PDU.
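
   The classification in Sections 7.1 and 7.2 can be summarized with
   the sketch below.  The enum, the function, and the use of PDU names
   as strings are hypothetical conveniences, not part of the protocol.

```python
from enum import Enum


class PduKind(Enum):
    DATA_TYPE = "data-type"
    CONTROL_TYPE = "control-type"
    NEITHER = "solicited SCSI Data-out (neither category)"


def classify_pdu(pdu_name: str, solicited: bool = False) -> PduKind:
    """Classify an iSCSI PDU for an iSCSI/iSER connection.

    `solicited` is only meaningful for "SCSI Data-out".
    """
    if pdu_name in ("SCSI Data-in", "R2T"):
        return PduKind.DATA_TYPE
    if pdu_name == "SCSI Data-out" and solicited:
        # Excluded from both definitions; solicited data moves via RDMA Read.
        return PduKind.NEITHER
    return PduKind.CONTROL_TYPE
```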

7.3.  iSCSI PDUs

   This section describes the handling of each of the iSCSI PDU types by
   the iSER layer.  The iSCSI layer requests that the iSER layer process
   the iSCSI PDU by invoking the appropriate Operational Primitive.  A
   Connection_Handle MUST qualify each of these invocations.  In
   addition, BHS and the optional AHS of the iSCSI PDU as defined in
   [RFC3720] MUST qualify each of the invocations.  The qualifying
   Connection_Handle, the BHS, and the AHS are not explicitly listed in
   the subsequent sections.

7.3.1.  SCSI Command

      Type:  control-type PDU

      PDU-specific qualifiers (for SCSI write or bidirectional command):
      ImmediateDataSize, UnsolicitedDataSize, DataDescriptorOut

      PDU-specific qualifiers (for SCSI read or bidirectional command):
      DataDescriptorIn

   The iSER layer at the initiator MUST send the SCSI command in a
   SendSE Message to the target.

   For a SCSI write or bidirectional command, the iSCSI layer at the
   initiator MUST invoke the Send_Control Operational Primitive as
   follows:

   *  If there is immediate data to be transferred for the SCSI Write or
      bidirectional command, the qualifier ImmediateDataSize MUST be
      used to define the number of bytes of immediate unsolicited data
      to be sent with the Write or bidirectional command, and the
      qualifier DataDescriptorOut MUST be used to define the initiator's
      I/O Buffer containing the SCSI Write data.

   *  If there is unsolicited data to be transferred for the SCSI Write
      or bidirectional command, the qualifier UnsolicitedDataSize MUST
      be used to define the number of bytes of immediate and non-
      immediate unsolicited data for the command.  The iSCSI layer will
      issue one or more SCSI Data-out PDUs for the non-immediate
      unsolicited data.  See Section 7.3.4 on SCSI Data-out.

   *  If there is solicited data to be transferred for the SCSI write or
      bidirectional command, as indicated by the Expected Data Transfer
      Length in the SCSI Command PDU exceeding the value of
      UnsolicitedDataSize, the iSER layer at the initiator MUST do the
      following:

         a.  It MUST allocate a Write STag for the I/O Buffer defined by
             the qualifier DataDescriptorOut.  The DataDescriptorOut
             describes the I/O buffer starting with the immediate
             unsolicited data (if any), followed by the non-immediate
             unsolicited data (if any) and solicited data.  This means
             that the BufferOffset for the SCSI Data-out for this
             command is equal to the TO.  This implies that a zero TO
             for this STag points to the beginning of this I/O Buffer.

         b.  It MUST establish a Local Mapping that associates the
             Initiator Task Tag (ITT) to the Write STag.

         c.  It MUST Advertise the Write STag to the target by sending
             it as the Write STag in the iSER header of the iSER Message
             (the payload of the SendSE Message of RCaP) containing the
             SCSI write or bidirectional command PDU.  See Section 9.2
             on iSER Header Format for the iSCSI Control-Type PDU.

   For a SCSI read or bidirectional command, the iSCSI layer at the
   initiator MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorIn, which defines the initiator's I/O
   Buffer for receiving the SCSI Read data.  The iSER layer at the
   initiator MUST do the following:

         a.  It MUST allocate a Read STag for the I/O Buffer.

         b.  It MUST establish a Local Mapping that associates the
             Initiator Task Tag (ITT) to the Read STag.

         c.  It MUST Advertise the Read STag to the target by sending it
             as the Read STag in the iSER header of the iSER Message
             (the payload of the SendSE Message of RCaP) containing the
             SCSI read or bidirectional command PDU.  See Section 9.2 on
             iSER Header Format for the iSCSI Control-Type PDU.

   If the amount of unsolicited data to be transferred in a SCSI command
   exceeds TargetRecvDataSegmentLength, then the iSCSI layer at the
   initiator MUST segment the data into multiple iSCSI control-type
   PDUs, with the data segment length in all PDUs generated except the
   last one having exactly the size TargetRecvDataSegmentLength.  The
   data segment length of the last iSCSI control-type PDU carrying the
   unsolicited data can be up to TargetRecvDataSegmentLength.

   When the iSER layer at the target receives the SCSI command, it MUST
   establish a Remote Mapping that associates the ITT to the Advertised
   Write STag and the Read STag if present in the iSER header.  The
   Write STag is used by the iSER layer at the target in handling the
   data transfer associated with the R2T PDU(s) as described in Section
   7.3.6.  The Read STag is used in handling the SCSI Data-in PDU(s)
   from the iSCSI layer at the target as described in Section 7.3.5.
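
   The STag bookkeeping described in this section can be sketched as
   follows.  Everything here is illustrative: the class and function
   names are hypothetical, STag allocation is replaced by a simple
   counter, and the mappings are plain dictionaries keyed by ITT.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional

_next_stag = count(1)  # stand-in for STag allocation by the RDMA-Capable Controller


@dataclass
class STagPair:
    write_stag: Optional[int] = None
    read_stag: Optional[int] = None


def initiator_prepare_command(local_map: Dict[int, STagPair], itt: int,
                              expected_write_len: int, unsolicited_size: int,
                              is_write: bool, is_read: bool) -> STagPair:
    """Allocate STags for a command and record the Local Mapping."""
    stags = STagPair()
    # Solicited data exists only when the Expected Data Transfer Length
    # exceeds UnsolicitedDataSize; only then is a Write STag needed.
    if is_write and expected_write_len > unsolicited_size:
        stags.write_stag = next(_next_stag)
    if is_read:
        stags.read_stag = next(_next_stag)
    if stags.write_stag is not None or stags.read_stag is not None:
        local_map[itt] = stags
    return stags  # these are the values Advertised in the iSER header


def target_receive_command(remote_map: Dict[int, STagPair], itt: int,
                           advertised: STagPair) -> None:
    """Record the Remote Mapping from the ITT to the Advertised STag(s)."""
    if advertised.write_stag is not None or advertised.read_stag is not None:
        remote_map[itt] = advertised
```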

7.3.2.  SCSI Response

      Type:  control-type PDU

      PDU-specific qualifiers:  DataDescriptorStatus

   The iSCSI layer at the target MUST invoke the Send_Control
   Operational Primitive qualified with DataDescriptorStatus, which
   defines the buffer containing the sense and response information.
   The iSCSI layer at the target MUST always return the SCSI status for
   a SCSI command in a separate SCSI Response PDU.  "Phase collapse" for
   transferring SCSI status in a SCSI Data-in PDU MUST NOT be used.  The
   iSER layer at the target sends the SCSI Response PDU according to the
   following rules:

   *  If no STags are Advertised by the initiator in the iSER Message
      containing the SCSI command PDU, then the iSER layer at the target
      MUST send a SendSE Message containing the SCSI Response PDU.

   *  If the initiator Advertised a Read STag in the iSER Message
      containing the SCSI Command PDU, then the iSER layer at the target
      MUST send a SendInvSE Message containing the SCSI Response PDU.
      The header of the SendInvSE Message MUST carry the Read STag to be
      invalidated at the initiator.

   *  If the initiator Advertised only the Write STag in the iSER
      Message containing the SCSI Command PDU, then the iSER layer at
      the target MUST send a SendInvSE Message containing the SCSI
      Response PDU.  The header of the SendInvSE Message MUST carry the
      Write STag to be invalidated at the initiator.
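
   The three rules above reduce to a small decision function, sketched
   here.  The function name and the tuple-shaped result are
   hypothetical; None stands for an STag that was not Advertised.

```python
from typing import Optional, Tuple


def response_message(read_stag: Optional[int],
                     write_stag: Optional[int]) -> Tuple[str, Optional[int]]:
    """Return (message type, STag carried for invalidation at the initiator)."""
    if read_stag is not None:
        # Applies whether or not a Write STag was also Advertised.
        return ("SendInvSE", read_stag)
    if write_stag is not None:
        return ("SendInvSE", write_stag)
    return ("SendSE", None)
```

   When both STags were Advertised, only the Read STag is carried, so
   the Write STag still has to be invalidated by the initiator itself.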

   When the iSCSI layer at the target invokes the Send_Control
   Operational Primitive to send the SCSI Response PDU, the iSER layer
   at the target MUST invalidate the Remote Mapping that associates the
   ITT to the Advertised STag(s) before transferring the SCSI Response
   PDU to the initiator.

   Upon receiving the SendInvSE Message containing the SCSI Response PDU
   from the target, the RCaP layer at the initiator will invalidate the
   STag specified in the header.  The iSER layer at the initiator MUST
   ensure that the correct STag is invalidated.  If both the Read and
   the Write STags are Advertised earlier by the initiator, then the
   iSER layer at the initiator MUST explicitly invalidate the Write STag
   upon receiving the SendInvSE Message because the header of the
   SendInvSE Message can only carry one STag (in this case, the Read
   STag) to be invalidated.

   The iSER layer at the initiator MUST ensure the invalidation of the
   STag(s) used in a command before notifying the iSCSI layer at the
   initiator by invoking the Control_Notify Operational Primitive
   qualified with the SCSI Response.  This prevents the STag(s) from
   being used after the command completes, which could otherwise cause
   data corruption.

   When the iSER layer at the initiator receives the SendSE or the
   SendInvSE Message containing the SCSI Response PDU, it SHOULD
   invalidate the Local Mapping that associates the ITT to the local
   STag(s).  The iSER layer MUST ensure that all local STag(s)
   associated with the ITT are invalidated before notifying the iSCSI
   layer of the SCSI Response PDU by invoking the Control_Notify
   Operational Primitive qualified with the SCSI Response PDU.

7.3.3.  Task Management Function Request/Response

      Type:  control-type PDU

      PDU-specific qualifiers (for TMF Request):  DataDescriptorOut,
      DataDescriptorIn

   The iSER layer MUST use a SendSE Message to send the Task Management
   Function Request/Response PDU.

   For the Task Management Function Request with the TASK REASSIGN
   function, the iSER layer at the initiator MUST do the following:

   *  It MUST use the ITT as specified in the Referenced Task Tag from
      the Task Management Function Request PDU to locate the existing
      STag(s), if any, in the Local Mapping(s) that associates the ITT
      to the local STag(s).

   *  It MUST invalidate the existing STag(s), if any, and the Local
      Mapping(s) that associates the ITT to the local STag(s).

   *  It MUST allocate a Read STag for the I/O Buffer as defined by the
      qualifier DataDescriptorIn if the Send_Control Operational
      Primitive invocation is qualified with DataDescriptorIn.

   *  It MUST allocate a Write STag for the I/O Buffer as defined by the
      qualifier DataDescriptorOut if the Send_Control Operational
      Primitive invocation is qualified with DataDescriptorOut.

   *  If STags are allocated, it MUST establish new Local Mapping(s)
      that associate the ITT to the allocated STag(s).

   *  It MUST Advertise the STags, if allocated, to the target in the
      iSER header of the SendSE Message carrying the iSCSI PDU, as
      described in Section 9.2.

   For the Task Management Function Request with the TASK REASSIGN
   function for a SCSI read or bidirectional command, the iSCSI layer at
   the initiator MUST set ExpDataSN to 0 since the data transfer and
   acknowledgements happen transparently to the iSCSI layer at the
   initiator.  This provides the flexibility to the iSCSI layer at the
   target to request transmission of only the unacknowledged data as
   specified in [RFC3720].

   When the iSER layer at the target receives the Task Management
   Function Request with the TASK REASSIGN function, it MUST do the
   following:

   *  It MUST use the ITT as specified in the Referenced Task Tag from
      the Task Management Function Request PDU to locate the mappings
      that associate the ITT to the Advertised STag(s) and the local
      STag(s), if any.

   *  It MUST invalidate the local STag(s), if any, associated with the
      ITT.

   *  It MUST replace the Advertised STag(s) in the Remote Mapping that
      associates the ITT to the Advertised STag(s) with the Write STag
      and the Read STag if present in the iSER header.  The Write STag
      is used in the handling of the R2T PDU(s) from the iSCSI layer at
      the target as described in Section 7.3.6.  The Read STag is used
      in the handling of the SCSI Data-in PDU(s) from the iSCSI layer at
      the target as described in Section 7.3.5.

7.3.4.  SCSI Data-Out

      Type:  control-type PDU

      PDU-specific qualifiers:  DataDescriptorOut

   The iSCSI layer at the initiator MUST invoke the Send_Control
   Operational Primitive qualified with DataDescriptorOut, which defines
   the initiator's I/O Buffer containing unsolicited SCSI Write data.

   If the amount of unsolicited data to be transferred as SCSI Data-out
   exceeds TargetRecvDataSegmentLength, then the iSCSI layer at the
   initiator MUST segment the data into multiple iSCSI control-type
   PDUs, with the DataSegmentLength having the value of
   TargetRecvDataSegmentLength in all PDUs generated except the last
   one.  The DataSegmentLength of the last iSCSI control-type PDU
   carrying the unsolicited data can be up to
   TargetRecvDataSegmentLength.  The iSCSI layer at the target MUST
   perform the reassembly function for the unsolicited data.

   For unsolicited data, if the F bit is set to 0 in a SCSI Data-out
   PDU, the iSER layer at the initiator MUST use a Send Message to send
   the SCSI Data-out PDU.  If the F bit is set to 1, the iSER layer at
   the initiator MUST use a SendSE Message to send the SCSI Data-out
   PDU.
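
   The segmentation and message-type rules for unsolicited data can be
   sketched together as shown below.  The function name and the tuple
   layout are hypothetical, and the sketch assumes a single unsolicited
   data sequence in which only the last PDU has the F bit set to 1.

```python
from typing import List, Tuple


def plan_unsolicited_data_out(total_len: int,
                              target_recv_dsl: int) -> List[Tuple[int, int, str]]:
    """Return (DataSegmentLength, F bit, message type) for each Data-out PDU."""
    if total_len <= 0 or target_recv_dsl <= 0:
        raise ValueError("lengths must be positive")
    plan = []
    remaining = total_len
    while remaining > 0:
        # Every PDU except the last carries exactly TargetRecvDataSegmentLength.
        segment = min(remaining, target_recv_dsl)
        remaining -= segment
        final = remaining == 0
        plan.append((segment, 1 if final else 0, "SendSE" if final else "Send"))
    return plan
```

   For example, 20000 bytes with a TargetRecvDataSegmentLength of 8192
   yields two 8192-byte PDUs sent in Send Messages and a final
   3616-byte PDU sent in a SendSE Message.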

   Note that for solicited data, the SCSI Data-out PDUs are not used
   since R2T PDUs are not delivered to the iSCSI layer at the initiator;
   instead, R2T PDUs are transformed by the iSER layer at the target
   into RDMA Read operations.  (See Section 7.3.6.)

7.3.5.  SCSI Data-In

      Type:  data-type PDU

      PDU-specific qualifiers:  DataDescriptorIn

   When the iSCSI layer at the target is ready to return the SCSI Read
   data to the initiator, it MUST invoke the Put_Data Operational
   Primitive qualified with DataDescriptorIn, which defines the SCSI
   Data-in buffer.  See Section 7.1 on the general requirement on the
   handling of iSCSI data-type PDUs.  SCSI Data-in PDU(s) are used in
   SCSI Read data transfer as described in Section 9.5.2.

   The iSER layer at the target MUST do the following for each
   invocation of the Put_Data Operational Primitive:

   1.  It MUST use the ITT in the SCSI Data-in PDU to locate the remote
       Read STag in the Remote Mapping that associates the ITT to
       Advertised STag(s).  The Remote Mapping was established earlier
       by the iSER layer at the target when the SCSI read command was
       received from the initiator.

   2.  It MUST generate and send an RDMA Write Message containing the
       read data to the initiator.

       a.  It MUST use the remote Read STag as the Data Sink STag of the
           RDMA Write Message.

       b.  It MUST use the Buffer Offset from the SCSI Data-in PDU as
           the Data Sink Tagged Offset of the RDMA Write Message.

       c.  It MUST use DataSegmentLength from the SCSI Data-in PDU to
           determine the amount of data to be sent in the RDMA Write
           Message.

   3.  It MUST associate DataSN and ITT from the SCSI Data-in PDU with
       the RDMA Write operation.  If the Put_Data Operational Primitive
       invocation was qualified with Notify_Enable set, then when the
       iSER layer at the target receives a completion from the RCaP
       layer for the RDMA Write Message, the iSER layer at the target
       MUST notify the iSCSI layer by invoking the
       Data_Completion_Notify Operational Primitive qualified with
       DataSN and ITT.  Conversely, if the Put_Data Operational
       Primitive invocation was qualified with Notify_Enable cleared,
       then the iSER layer at the target MUST NOT notify the iSCSI layer
       on completion and MUST NOT invoke the Data_Completion_Notify
       Operational Primitive.
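
   Steps 1 to 3 above amount to a field-by-field mapping from the SCSI
   Data-in PDU to an RDMA Write Message, sketched below.  The data
   classes and the put_data function are hypothetical stand-ins; the
   Remote Mapping is reduced to a dictionary from ITT to Read STag, and
   Notify_Enable handling is not modeled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataInPdu:
    itt: int
    data_sn: int
    buffer_offset: int
    data_segment_length: int


@dataclass(frozen=True)
class RdmaWrite:
    sink_stag: int           # remote Read STag (step 2a)
    sink_tagged_offset: int  # Buffer Offset of the PDU (step 2b)
    length: int              # DataSegmentLength of the PDU (step 2c)
    itt: int                 # kept with DataSN for completion reporting (step 3)
    data_sn: int


def put_data(remote_read_stags: Dict[int, int], pdu: DataInPdu) -> RdmaWrite:
    """Build the RDMA Write description for one Put_Data invocation."""
    read_stag = remote_read_stags[pdu.itt]  # step 1: look up by ITT
    return RdmaWrite(sink_stag=read_stag,
                     sink_tagged_offset=pdu.buffer_offset,
                     length=pdu.data_segment_length,
                     itt=pdu.itt,
                     data_sn=pdu.data_sn)
```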

   When the A-bit is set to 1 in the SCSI Data-in PDU, the iSER layer at
   the target MUST notify the iSCSI layer at the target when the data
   transfer is complete at the initiator.  To perform this additional
   function, the iSER layer at the target can take advantage of the
   operational ErrorRecoveryLevel if previously disclosed by the iSCSI
   layer via an earlier invocation of the Notice_Key_Values Operational
   Primitive.  There are two approaches that can be taken:

   1.  If the iSER layer at the target knows that the operational
       ErrorRecoveryLevel is 2, or if the iSER layer at the target does
       not know the operational ErrorRecoveryLevel, then the iSER layer
       at the target MUST issue a zero-length RDMA Read Request Message
       following the RDMA Write Message.  When the iSER layer at the
       target receives a completion for the RDMA Read Request Message
       from the RCaP layer, implying that the RDMA-Capable Controller at
       the initiator has completed processing the RDMA Write Message due
       to the completion ordering semantics of RCaP, the iSER layer at
       the target MUST notify the iSCSI layer at the target by invoking
       the Data_Ack_Notify Operational Primitive qualified with ITT and
       DataSN (see Section 3.2.3).

   2.  If the iSER layer at the target knows that the operational
       ErrorRecoveryLevel is 1, then the iSER layer at the target MUST
       do one of the following:

       a.  It MUST notify the iSCSI layer at the target by invoking the
           Data_Ack_Notify Operational Primitive qualified with ITT and
           DataSN (see Section 3.2.3) when it receives the local
           completion from the RCaP layer for the RDMA Write Message.
           This is allowed since digest errors do not occur in iSER (see
           Section 10.1.4.2) and a CRC error will cause the connection
           to be terminated and the task to be terminated anyway.  The
           local RDMA Write completion from the RCaP layer guarantees
           that the RCaP layer will not access the I/O Buffer again to
           transfer the data associated with that RDMA Write operation.

       b.  Alternatively, it MUST use the same procedure for handling
           the data transfer completion at the initiator as for
           ErrorRecoveryLevel 2.

   Note that the iSCSI layer at the target cannot set the A-bit to 1 if
   the ErrorRecoveryLevel=0.
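
   The choice between the two approaches can be expressed as a small
   selection function.  This is a sketch under stated assumptions: the
   names are hypothetical, None represents an ErrorRecoveryLevel that
   was never disclosed to the iSER layer, and the flag
   allow_local_completion models the implementation's choice between
   options 2a and 2b.

```python
from typing import Optional

ZERO_LENGTH_READ = "zero-length RDMA Read Request after the RDMA Write"
LOCAL_COMPLETION = "local completion of the RDMA Write"


def data_ack_trigger(error_recovery_level: Optional[int],
                     allow_local_completion: bool = True) -> str:
    """Pick the event that triggers Data_Ack_Notify for an A-bit Data-in PDU."""
    if error_recovery_level is None or error_recovery_level == 2:
        return ZERO_LENGTH_READ
    if error_recovery_level == 1:
        return LOCAL_COMPLETION if allow_local_completion else ZERO_LENGTH_READ
    if error_recovery_level == 0:
        raise ValueError("the A-bit cannot be set to 1 when ErrorRecoveryLevel=0")
    raise ValueError(f"invalid ErrorRecoveryLevel: {error_recovery_level}")
```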

   The SCSI status MUST always be returned in a separate SCSI Response
   PDU.  The S bit in the SCSI Data-in PDU MUST always be set to 0.
   There MUST NOT be a "phase collapse" in the SCSI Data-in PDU.

   Since the RDMA Write Message only transfers the data portion of the
   SCSI Data-in PDU but not the control information in the header, such
   as ExpCmdSN, if timely updates of such information are crucial, the
   iSCSI layer at the initiator MAY issue NOP-Out PDUs to request that
   the iSCSI layer at the target respond with the information using NOP-
   In PDUs.

7.3.6.  Ready to Transfer (R2T)

      Type:  data-type PDU

      PDU-specific qualifiers:  DataDescriptorOut

   In order to send an R2T PDU, the iSCSI layer at the target MUST
   invoke the Get_Data Operational Primitive qualified with
   DataDescriptorOut, which defines the I/O Buffer for receiving the
   SCSI Write data from the initiator.  See Section 7.1 on the general
   requirements on the handling of iSCSI data-type PDUs.

   The iSER layer at the target MUST do the following for each
   invocation of the Get_Data Operational Primitive:

   1.  It MUST ensure a valid local STag for the I/O Buffer and a valid
       Local Mapping that associates the Initiator Task Tag (ITT) to the
       local STag.  This may involve allocating a valid local STag and
       establishing a Local Mapping.

   2.  It MUST use the ITT in the R2T to locate the remote Write STag in
       the Remote Mapping that associates the ITT to Advertised STag(s).
       The Remote Mapping is established earlier by the iSER layer at
       the target when the iSER Message containing the Advertised Write
       STag and the SCSI Command PDU for a SCSI write or bidirectional
       command is received from the initiator.

   3.  If the iSER-ORD value at the target is set to 0, the iSER layer
       at the target MUST terminate the connection and free up the
       resources associated with the connection (as described in Section
       5.2.3) if it receives the R2T PDU from the iSCSI layer at the
       target.  Upon termination of the connection, the iSER layer at
       the target MUST notify the iSCSI layer at the target by invoking
       the Connection_Terminate_Notify Operational Primitive.

   4.  If the iSER-ORD value at the target is set to greater than 0, the
       iSER layer at the target MUST transform the R2T PDU into an RDMA
       Read Request Message.  While transforming the R2T PDU, the iSER
       layer at the target MUST ensure that the number of outstanding
       RDMA Read Request Messages does not exceed the iSER-ORD value.
       To transform the R2T PDU, the iSER layer at the target:

       a.  MUST derive the local STag and local Tagged Offset from the
           DataDescriptorOut that qualified the Get_Data invocation.

       b.  MUST use the local STag as the Data Sink STag of the RDMA
           Read Request Message.

       c.  MUST use the local Tagged Offset as the Data Sink Tagged
           Offset of the RDMA Read Request Message.

       d.  MUST use the Desired Data Transfer Length from the R2T PDU as
           the RDMA Read Message Size of the RDMA Read Request Message.

       e.  MUST use the remote Write STag as the Data Source STag of the
           RDMA Read Request Message.

       f.  MUST use the Buffer Offset from the R2T PDU as the Data
           Source Tagged Offset of the RDMA Read Request Message.

   5.  The iSER layer at the target MUST associate the R2TSN and ITT
       from the R2T PDU with the RDMA Read operation.  If the Get_Data
       Operational Primitive invocation is qualified with Notify_Enable
       set, then when the iSER layer at the target receives a completion
       from the RCaP layer for the RDMA Read operation, the iSER layer
       at the target MUST notify the iSCSI layer by invoking the
       Data_Completion_Notify Operational Primitive qualified with R2TSN
       and ITT.  Conversely, if the Get_Data Operational Primitive
       invocation is qualified with Notify_Enable cleared, then the iSER
       layer at the target MUST NOT notify the iSCSI layer on completion
       and MUST NOT invoke the Data_Completion_Notify Operational
       Primitive.
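
   The field mapping in steps 3 and 4 can be sketched as follows.  This
   is an illustration only, not part of the protocol: the class names,
   the transform_r2t function, and the assumption that the
   DataDescriptorOut already carries a registered (STag, Tagged Offset)
   pair are all hypothetical, and returning None merely stands in for
   "wait until an outstanding RDMA Read completes".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class R2T:
    itt: int
    r2tsn: int
    buffer_offset: int
    desired_data_transfer_length: int


@dataclass(frozen=True)
class DataDescriptorOut:
    # Hypothetical shape: the local buffer is assumed to be registered
    # already, so the descriptor yields the local STag and offset (4a).
    local_stag: int
    local_tagged_offset: int


@dataclass(frozen=True)
class RdmaReadRequest:
    data_sink_stag: int
    data_sink_tagged_offset: int
    rdma_read_message_size: int
    data_source_stag: int
    data_source_tagged_offset: int


class ConnectionTerminated(Exception):
    """Stands in for terminating the connection when iSER-ORD is 0."""


def transform_r2t(r2t, desc, remote_write_stag, iser_ord, outstanding_reads):
    if iser_ord == 0:
        # Step 3: no RDMA Read resources were negotiated.
        raise ConnectionTerminated("R2T received while iSER-ORD is 0")
    if outstanding_reads >= iser_ord:
        # Step 4: issuing now would exceed iSER-ORD; the caller must wait.
        return None
    return RdmaReadRequest(
        data_sink_stag=desc.local_stag,                          # 4b
        data_sink_tagged_offset=desc.local_tagged_offset,        # 4c
        rdma_read_message_size=r2t.desired_data_transfer_length, # 4d
        data_source_stag=remote_write_stag,                      # 4e
        data_source_tagged_offset=r2t.buffer_offset,             # 4f
    )
```

   Step 5 (associating R2TSN and ITT with the operation) is left to the
   caller in this sketch, for example by keeping the R2T object
   alongside the returned request until the completion arrives.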

   When the RCaP layer at the initiator receives a valid RDMA Read
   Request Message, it will return an RDMA Read Response Message
   containing the solicited write data to the target.  When the RCaP
   layer at the target receives the RDMA Read Response Message from the
   initiator, it will place the solicited data in the I/O Buffer
   referenced by the Data Sink STag in the RDMA Read Response Message.

   Since the RDMA Read Request Message from the target does not transfer
   the control information in the R2T PDU, such as ExpCmdSN, if timely
   updates of such information are crucial, the iSCSI layer at the
   initiator MAY issue NOP-Out PDUs to request that the iSCSI layer at
   the target respond with the information using NOP-In PDUs.

   Similarly, since the RDMA Read Response Message from the initiator
   only transfers the data but not the control information normally
   found in the SCSI Data-out PDU, such as ExpStatSN, if timely updates
   of such information are crucial, the iSCSI layer at the target MAY
   issue NOP-In PDUs to request that the iSCSI layer at the initiator
   respond with the information using NOP-Out PDUs.

7.3.7.  Asynchronous Message

      Type:  control-type PDU

      PDU-specific qualifiers:  DataDescriptorSense

   The iSCSI layer MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorSense, which defines the buffer
   containing the sense and iSCSI Event information.  The iSER layer
   MUST use a SendSE Message to send the Asynchronous Message PDU.

7.3.8.  Text Request and Text Response

      Type:  control-type PDU

      PDU-specific qualifiers:  DataDescriptorTextOut (for Text
      Request), DataDescriptorIn (for Text Response)

   The iSCSI layer MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorTextOut (or DataDescriptorIn), which
   defines the Text Request (or Text Response) buffer.  The iSER layer
   MUST use SendSE Messages to send the Text Request (or Text Response)
   PDUs.

7.3.9.  Login Request and Login Response

   During the login negotiation, the iSCSI layer interacts with the
   transport layer directly and the iSER layer is not involved.  See
   Section 5.1 on iSCSI/iSER connection setup.  If the underlying
   transport is TCP, the Login Request PDUs and the Login Response PDUs
   are exchanged when the connection between the initiator and the
   target is still in the byte stream mode.

   The iSCSI layer MUST NOT send a Login Request (or a Login Response)
   PDU during the Full Feature Phase.  A Login Request (or a Login
   Response) PDU, if used, MUST be treated as an iSCSI protocol error.
   The iSER layer MAY reject such a PDU from the iSCSI layer with an
   appropriate error code.  If a Login Request PDU is received by the
   iSCSI layer at the target, it MUST respond with a Reject PDU with a
   reason code of "protocol error".

7.3.10.  Logout Request and Logout Response

      Type:  control-type PDU

      PDU-specific qualifiers:  None

   The iSER layer MUST use a SendSE Message to send the Logout Request
   or Logout Response PDU.  Sections 5.2.1 and 5.2.2 describe the
   handling of the Logout Request and the Logout Response at the
   initiator and the target and the interactions between the initiator
   and the target to terminate a connection.

7.3.11.  SNACK Request

   Since HeaderDigest and DataDigest must be negotiated to "None", there
   are no digest errors when the connection is in iSER-assisted mode.
   Also, since RCaP delivers all messages in the order they were sent,
   there are no sequence errors when the connection is in iSER-assisted
   mode.  Therefore, the iSCSI layer MUST NOT send SNACK Request PDUs.
   A SNACK Request PDU, if used, MUST be treated as an iSCSI protocol
   error.  The iSER layer MAY reject such a PDU from the iSCSI layer
   with an appropriate error code.  If a SNACK Request PDU is received
   by the iSCSI layer at the target, it MUST respond with a Reject PDU
   with a reason code of "protocol error".

7.3.12.  Reject

      Type:  control-type PDU

      PDU-specific qualifiers:  DataDescriptorReject

   The iSCSI layer MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorReject, which defines the Reject buffer.
   The iSER layer MUST use a SendSE Message to send the Reject PDU.

7.3.13.  NOP-Out and NOP-In

      Type:  control-type PDU

      PDU-specific qualifiers:  DataDescriptorNOPOut (for NOP-Out),
      DataDescriptorNOPIn (for NOP-In)

   The iSCSI layer MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorNOPOut (or DataDescriptorNOPIn), which
   defines the Ping (or Return Ping) data buffer.  The iSER layer MUST
   use SendSE Messages to send the NOP-Out (or NOP-In) PDU.
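
   The handling described in Sections 7.3.7 through 7.3.13 can be
   summarized as a lookup table.  The sketch below is illustrative
   only; the table and function names are hypothetical, and it covers
   just the PDU types discussed in those sections for a connection in
   the Full Feature Phase.

```python
SEND_SE = "SendSE"
PROTOCOL_ERROR = "protocol error"

# PDU type -> how the PDU is handled in the Full Feature Phase.
FULL_FEATURE_PHASE_HANDLING = {
    "Asynchronous Message": SEND_SE,   # 7.3.7
    "Text Request": SEND_SE,           # 7.3.8
    "Text Response": SEND_SE,          # 7.3.8
    "Login Request": PROTOCOL_ERROR,   # 7.3.9
    "Login Response": PROTOCOL_ERROR,  # 7.3.9
    "Logout Request": SEND_SE,         # 7.3.10
    "Logout Response": SEND_SE,        # 7.3.10
    "SNACK Request": PROTOCOL_ERROR,   # 7.3.11
    "Reject": SEND_SE,                 # 7.3.12
    "NOP-Out": SEND_SE,                # 7.3.13
    "NOP-In": SEND_SE,                 # 7.3.13
}


def handling_for(pdu_type):
    """Return SEND_SE or PROTOCOL_ERROR for a PDU type named above."""
    return FULL_FEATURE_PHASE_HANDLING[pdu_type]
```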

8.  Flow Control and STag Management

8.1.  Flow Control for RDMA Send Message Types

   Send Message Types in RCaP are used by the iSER layer to transfer
   iSCSI control-type PDUs.  Each Send Message Type in RCaP consumes an
   Untagged Buffer at the Data Sink.  However, neither the RCaP layer
   nor the iSER layer provides an explicit flow control mechanism for
   the Send Message Types.  Therefore, the iSER layer SHOULD provision
   enough Untagged buffers for handling incoming Send Message Types to
   prevent buffer exhaustion at the RCaP layer.  If buffer exhaustion
   occurs, it may result in the termination of the connection.

   An implementation may choose to satisfy the buffer requirement by
   using a common buffer pool shared across multiple connections, with
   usage limits on a per-connection basis and usage limits on the buffer
   pool itself.  In such an implementation, exceeding the buffer usage
   limit for a connection or the buffer pool itself may trigger
   interventions from the iSER layer to replenish the buffer pool and/or
   to isolate the connection causing the problem.
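
   One possible shape for such a shared pool is sketched below.  It is
   an assumption-laden illustration, not a prescribed design: the
   SharedBufferPool class, its method names, and the string results are
   hypothetical, and the "interventions" mentioned above are reduced to
   reporting which limit was hit.

```python
class SharedBufferPool:
    """Counts Untagged Buffers handed out from one pool to many connections."""

    def __init__(self, pool_limit, per_connection_limit):
        self.pool_limit = pool_limit
        self.per_connection_limit = per_connection_limit
        self.in_use = {}  # connection id -> buffers currently held

    def acquire(self, conn_id):
        used = self.in_use.get(conn_id, 0)
        if used >= self.per_connection_limit:
            # This connection is the one to examine or isolate.
            return "connection_limit"
        if sum(self.in_use.values()) >= self.pool_limit:
            # The pool as a whole needs replenishing.
            return "pool_limit"
        self.in_use[conn_id] = used + 1
        return "ok"

    def release(self, conn_id):
        used = self.in_use.get(conn_id, 0)
        if used <= 1:
            self.in_use.pop(conn_id, None)
        else:
            self.in_use[conn_id] = used - 1
```

   Checking the per-connection limit first lets the caller distinguish
   a single misbehaving connection from general pool pressure.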

   iSER also provides the MaxOutstandingUnexpectedPDUs key to be used by
   the initiator and the target to declare the maximum number of
   outstanding "unexpected" control-type PDUs that it can receive.  It
   is intended to allow the receiving side to determine the amount of
   buffer resources needed beyond the normal flow control mechanism
   available in iSCSI.

   The buffer resources required at both the initiator and the target as
   a result of control-type PDUs sent by the initiator are described in
   Section 8.1.1.  The buffer resources required at both the initiator
   and the target as a result of control-type PDUs sent by the target
   are described in Section 8.1.2.

8.1.1.  Flow Control for Control-Type PDUs from the Initiator

   The control-type PDUs that can be sent by an initiator to a target
   can be grouped into the following categories:

   1.  Regulated:  Control-type PDUs in this category are regulated by
       the iSCSI CmdSN window mechanism and the immediate flag is not
       set.

   2.  Unregulated but Expected:  Control-type PDUs in this category are
       not regulated by the iSCSI CmdSN window mechanism but are
       expected by the target.

   3.  Unregulated and Unexpected:  Control-type PDUs in this category
       are not regulated by the iSCSI CmdSN window mechanism and are
       "unexpected" by the target.

8.1.1.1.  Control-Type PDUs from the Initiator in the Regulated Category

   Control-type PDUs that can be sent by the initiator in this category
   are regulated by the iSCSI CmdSN window mechanism and the immediate
   flag is not set.

   The queuing capacity required of the iSCSI layer at the target is
   described in Section 3.2.2.1 of [RFC3720].  For each of the control-
   type PDUs that can be sent by the initiator in this category, the
   initiator MUST provision for the buffer resources required for the
   corresponding control-type PDU sent as a response from the target.
   The following is a list of the PDUs that can be sent by the initiator
   and the PDUs that are sent by the target in response:

       a.  When an initiator sends a SCSI Command PDU, it expects a SCSI
           Response PDU from the target.

       b.  When the initiator sends a Task Management Function Request
           PDU, it expects a Task Management Function Response PDU from
           the target.

       c.  When the initiator sends a Text Request PDU, it expects a
           Text Response PDU from the target.

       d.  When the initiator sends a Logout Request PDU, it expects a
           Logout Response PDU from the target.

       e.  When the initiator sends a NOP-Out PDU as a ping request with
           ITT != 0xffffffff and TTT = 0xffffffff, it expects a NOP-In
           PDU from the target with the same ITT and TTT as in the ping
           request.

   The response from the target for any of the PDUs enumerated here may
   alternatively be a Reject PDU sent before the task is active, as
   described in Section 6.3 of [RFC3720].

8.1.1.2.  Control-Type PDUs from the Initiator in the Unregulated but
          Expected Category

   For the control-type PDUs in the Unregulated but Expected category,
   the amount of buffering resources required at the target can be
   predetermined.  The following is a list of the PDUs in this category:

       a.  SCSI Data-out PDUs are used by the initiator to send
           unsolicited data.  The amount of buffer resources required by
           the target can be determined using FirstBurstLength.  Note
           that SCSI Data-out PDUs are not used for solicited data since
           the R2T PDU that is used for solicitation is transformed into
           RDMA Read operations by the iSER layer at the target.  See
           Section 7.3.4.

       b.  A NOP-Out PDU with TTT != 0xffffffff is sent as a ping
           response by the initiator to the NOP-In PDU sent as a ping
           request by the target.

8.1.1.3.  Control-Type PDUs from the Initiator in the Unregulated and
          Unexpected Category

   PDUs in the Unregulated and Unexpected category are PDUs with the
   immediate flag set.  The number of PDUs in this category that can be
   sent by an initiator is controlled by the value of
   MaxOutstandingUnexpectedPDUs declared by the target (see Section
   6.7).  After a PDU in this category is sent by the initiator, it is
   outstanding until it is retired.  At any time, the number of
   outstanding unexpected PDUs MUST NOT exceed the value of
   MaxOutstandingUnexpectedPDUs declared by the target.

   The target uses the value of MaxOutstandingUnexpectedPDUs that it
   declared to determine the amount of buffer resources required for
   control-type PDUs in this category that can be sent by an initiator.
   For the initiator, for each of the control-type PDUs that can be sent
   in this category, the initiator MUST provision for the buffer
   resources if required for the corresponding control-type PDU that can
   be sent as a response from the target.

   An outstanding PDU in this category is retired as follows.  If the
   CmdSN of the PDU sent by the initiator in this category is x, the PDU
   is outstanding until the initiator sends a non-immediate control-type
   PDU on the same connection with CmdSN = y (where y is at least x) and
   the target responds with a control-type PDU on any connection where
   ExpCmdSN is at least y+1.

   When the number of outstanding unexpected control-type PDUs equals
   MaxOutstandingUnexpectedPDUs, the iSCSI layer at the initiator MUST
   NOT generate any unexpected PDUs that it would otherwise have
   generated, even if they are intended for immediate delivery.
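
   The retirement rule for the initiator can be sketched as a small
   per-connection tracker.  This is only an illustration: the class and
   method names are hypothetical, and for brevity the sketch compares
   sequence numbers as plain integers rather than using the 32-bit
   serial number arithmetic that [RFC3720] requires.

```python
class UnexpectedPduTracker:
    """Tracks unexpected (immediate) PDUs sent on one connection."""

    def __init__(self, max_outstanding):
        # MaxOutstandingUnexpectedPDUs as declared by the target.
        self.max_outstanding = max_outstanding
        self.outstanding = []    # CmdSN x of each unexpected PDU sent
        self.non_immediate = []  # CmdSN y of non-immediate PDUs sent

    def can_send_unexpected(self):
        return len(self.outstanding) < self.max_outstanding

    def sent_unexpected(self, cmd_sn):
        if not self.can_send_unexpected():
            raise RuntimeError("MaxOutstandingUnexpectedPDUs reached")
        self.outstanding.append(cmd_sn)

    def sent_non_immediate(self, cmd_sn):
        self.non_immediate.append(cmd_sn)

    def received_exp_cmd_sn(self, exp_cmd_sn):
        # A non-immediate PDU with CmdSN y is acknowledged once the
        # target reports ExpCmdSN >= y + 1 (on any connection).
        acked = [y for y in self.non_immediate if exp_cmd_sn >= y + 1]
        if not acked:
            return
        y_max = max(acked)
        # An unexpected PDU with CmdSN x retires if some acknowledged
        # y satisfies y >= x, which is equivalent to x <= y_max.
        self.outstanding = [x for x in self.outstanding if x > y_max]
        self.non_immediate = [y for y in self.non_immediate if y > y_max]
```

   For example, an immediate PDU sent with CmdSN 5 stays outstanding
   until a non-immediate PDU with CmdSN 5 or later has been sent and
   the target has reported an ExpCmdSN beyond that PDU's CmdSN.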

8.1.2.  Flow Control for Control-Type PDUs from the Target

   Control-type PDUs that can be sent by a target and are expected by
   the initiator are listed in the Regulated category (see Section
   8.1.1.1).

   For the control-type PDUs that can be sent by a target and are
   unexpected by the initiator, the number is controlled by
   MaxOutstandingUnexpectedPDUs declared by the initiator (see Section
   6.7).  After a PDU in this category is sent by a target, it is
   outstanding until it is retired.  At any time, the number of
   outstanding unexpected PDUs MUST NOT exceed the value of
   MaxOutstandingUnexpectedPDUs declared by the initiator.  The
   initiator uses the value of MaxOutstandingUnexpectedPDUs that it
   declared to determine the amount of buffer resources required for
   control-type PDUs in this category that can be sent by a target.  The
   following is a list of the PDUs in this category and the conditions
   for retiring the outstanding PDU:

       a.  For an Asynchronous Message PDU with StatSN = x, the PDU is
           outstanding until the initiator sends a control-type PDU with
           ExpStatSN set to at least x+1.

       b.  For a Reject PDU with StatSN = x that is sent after a task is
           active, the PDU is outstanding until the initiator sends a
           control-type PDU with ExpStatSN set to at least x+1.

       c.  For a NOP-In PDU with ITT = 0xffffffff and StatSN = x, the
           PDU is outstanding until the initiator responds with a
           control-type PDU on the same connection where ExpStatSN is at
           least x+1.  But if the NOP-In PDU is sent as a ping request
           with TTT != 0xffffffff, the PDU can also be retired when the
           initiator sends a NOP-Out PDU with the same ITT and TTT as in
           the ping request.  Note that when a target sends a NOP-In PDU
           as a ping request, it must provision a buffer for the NOP-Out
           PDU sent as a ping response from the initiator.

   When the number of outstanding unexpected control-type PDUs equals
   MaxOutstandingUnexpectedPDUs, the iSCSI layer at the target MUST NOT
   generate any unexpected PDUs that otherwise it would have generated,
   even if its intent is to indicate an iSCSI error condition (e.g.,
   Asynchronous Message, Reject).  Task timeouts, as in the initiator
   waiting for a command completion or other connection and session
   level exceptions, will ensure that correct operational behavior will
   result in these cases despite not generating the PDU.  This rule
   overrides any other requirements elsewhere that require that a Reject
   PDU MUST be sent.

   (Implementation note:  A SCSI task timeout and recovery can be a
   lengthy process and hence SHOULD be avoided by proper provisioning of
   resources.)

   (Implementation note:  To ensure that the initiator has a means to
   inform the target that outstanding PDUs have been retired, the target
   should reserve the last unexpected control-type PDU allowable by the
   value of MaxOutstandingUnexpectedPDUs declared by the initiator for
   sending a NOP-In ping request with TTT != 0xffffffff to allow the
   initiator to return the NOP-Out ping response with the current
   ExpStatSN.)
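
   The target-side accounting, including the second implementation
   note, might look like the sketch below.  The class and method names
   are hypothetical, sequence numbers are compared as plain integers
   rather than with serial number arithmetic, and the alternative
   retirement of a ping request by a matching NOP-Out is not modeled.

```python
class TargetUnexpectedPdus:
    """Tracks unexpected PDUs a target has sent on one connection."""

    def __init__(self, max_outstanding):
        # MaxOutstandingUnexpectedPDUs as declared by the initiator.
        self.max_outstanding = max_outstanding
        self.outstanding = []  # StatSN x of each unexpected PDU sent

    def may_send(self, is_nop_in_ping=False):
        # Keep the last allowable slot for a NOP-In ping request so the
        # initiator can always be prompted for its current ExpStatSN.
        if is_nop_in_ping:
            limit = self.max_outstanding
        else:
            limit = self.max_outstanding - 1
        return len(self.outstanding) < limit

    def sent(self, stat_sn, is_nop_in_ping=False):
        if not self.may_send(is_nop_in_ping):
            raise RuntimeError("unexpected PDU would exceed the limit")
        self.outstanding.append(stat_sn)

    def received_exp_stat_sn(self, exp_stat_sn):
        # A PDU with StatSN x retires once ExpStatSN is at least x + 1.
        self.outstanding = [x for x in self.outstanding if exp_stat_sn < x + 1]
```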

8.2.  Flow Control for RDMA Read Resources

   The total number of RDMA Read operations that can be active
   simultaneously on an iSCSI/iSER connection depends on the amount of
   resources allocated as declared in the iSER Hello exchange described
   in Section 5.1.3.  Exceeding the number of RDMA Read operations
   allowed on a connection will result in the connection being
   terminated by the RCaP layer.  The iSER layer at the target maintains
   the iSER-ORD to keep track of the maximum number of RDMA Read
   Requests that can be issued by the iSER layer on a particular RCaP
   Stream.

   During connection setup (see Section 5.1), iSER-IRD is known at the
   initiator and iSER-ORD is known at the target after the iSER layers
   at the initiator and the target have respectively allocated the
   connection resources necessary to support RCaP, as directed by the
   Allocate_Connection_Resources Operational Primitive from the iSCSI
   layer before the end of the iSCSI Login Phase.  In the Full Feature
   Phase, the first message sent by the initiator is the iSER Hello
   Message (see Section 9.3), which contains the value of iSER-IRD.  In
   response to the iSER Hello Message, the target sends the iSER
   HelloReply Message (see Section 9.4), which contains the value of
   iSER-ORD.  The iSER layer at both the initiator and the target MAY
   adjust (lower) the resources associated with iSER-IRD and iSER-ORD
   respectively to match the iSER-ORD value declared in the HelloReply
   Message.  The iSER layer at the target MUST flow control the RDMA
   Read Request Messages to not exceed the iSER-ORD value at the target.
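
   One way for the iSER layer at the target to enforce this limit is to
   queue requests that would exceed iSER-ORD and release them as
   earlier RDMA Reads complete.  The sketch below is a hypothetical
   illustration: the class, its methods, and the "issued" list (which
   stands in for posting a request to the RCaP layer) are assumptions,
   and iSER-ORD is assumed to be greater than 0.

```python
from collections import deque


class RdmaReadFlowControl:
    def __init__(self, iser_ord):
        self.iser_ord = iser_ord
        self.active = 0         # RDMA Reads issued and not yet completed
        self.pending = deque()  # requests held back by the limit
        self.issued = []        # stand-in for posting to the RCaP layer

    def submit(self, request):
        if self.active < self.iser_ord:
            self._issue(request)
        else:
            self.pending.append(request)

    def on_completion(self):
        # Called when the RCaP layer reports an RDMA Read completion.
        self.active -= 1
        if self.pending:
            self._issue(self.pending.popleft())

    def _issue(self, request):
        self.active += 1
        self.issued.append(request)
```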

8.3.  STag Management

   An STag, as defined in [RDMAP], is an identifier of a Tagged Buffer
   used in an RDMA operation.  The allocation and the subsequent
   invalidation of the STags are specified in this document if the STags
   are exposed on the wire by being Advertised in the iSER header or
   declared in the header of an RCaP Message.

8.3.1.  Allocation of STags

   When the iSCSI layer at the initiator invokes the Send_Control
   Operational Primitive to request that the iSER layer at the initiator
   process a SCSI command, zero, one, or two STags may be allocated by
   the iSER layer.  See Section 7.3.1 for details.  The number of STags
   allocated depends on whether the command is unidirectional or
   bidirectional and whether or not solicited write data transfer is
   involved.

   When the iSCSI layer at the initiator invokes the Send_Control
   Operational Primitive to request that the iSER layer at the initiator
   process a Task Management Function Request with the TASK REASSIGN
   function, besides allocating zero, one, or two STags, the iSER layer
   MUST invalidate the existing STags, if any, associated with the ITT.
   See Section 7.3.3 for details.

   The iSER layer at the target allocates a local Data Sink STag when
   the iSCSI layer at the target invokes the Get_Data Operational
   Primitive to request that the iSER layer process an R2T PDU.  See
   Section 7.3.6 for details.

8.3.2.  Invalidation of STags

   The invalidation of the STags at the initiator at the completion of a
   unidirectional or bidirectional command when the associated SCSI
   Response PDU is sent by the target is described in Section 7.3.2.

   When a unidirectional or bidirectional command concludes without the
   associated SCSI Response PDU being sent by the target, the iSCSI
   layer at the initiator MUST request that the iSER layer at the
   initiator invalidate the STags by invoking the
   Deallocate_Task_Resources Operational Primitive qualified with ITT.
   In response, the iSER layer at the initiator MUST locate the STag(s)
   (if any) in the Local Mapping that associates the ITT to the local
   STag(s).  The iSER layer at the initiator MUST invalidate the STag(s)
   (if any) and the Local Mapping.
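
   The Local Mapping and its teardown can be pictured as a dictionary
   keyed by ITT.  The sketch below is illustrative only; the class and
   method names are hypothetical, and appending to the "invalidated"
   list stands in for the actual STag invalidation performed through
   the RCaP layer.

```python
class LocalMapping:
    """Associates each ITT with the local STag(s) allocated for it."""

    def __init__(self):
        self._stags = {}       # ITT -> list of local STags
        self.invalidated = []  # stand-in for real STag invalidation

    def allocate(self, itt, stags):
        # Zero, one, or two STags may be recorded for a command.
        self._stags[itt] = list(stags)

    def deallocate_task_resources(self, itt):
        # Locate the STag(s), if any, then invalidate both the STag(s)
        # and the mapping entry itself.
        stags = self._stags.pop(itt, [])
        self.invalidated.extend(stags)
        return stags

    def has_mapping(self, itt):
        return itt in self._stags
```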

   For an RDMA Read operation used to realize a SCSI Write data
   transfer, the iSER layer at the target SHOULD invalidate the Data
   Sink STag at the conclusion of the RDMA Read operation referencing
   the Data Sink STag (to permit the immediate reuse of buffer
   resources).

   For an RDMA Write operation used to realize a SCSI Read data
   transfer, the Data Source STag at the target is not declared to the
   initiator and is not exposed on the wire.  Invalidation of the STag
   is thus not specified.

   When a unidirectional or bidirectional command concludes without the
   associated SCSI Response PDU being sent by the target, the iSCSI
   layer at the target MUST request that the iSER layer at the target
   invalidate the STags by invoking the Deallocate_Task_Resources
   Operational Primitive qualified with ITT.  In response, the iSER
   layer at the target MUST locate the local STag(s) (if any) in the
   Local Mapping that associates the ITT to the local STag(s).  The iSER
   layer at the target MUST invalidate the local STag(s) (if any) and
   the mapping.


