Tech-invite3GPPspaceIETFspace
959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 7143

Internet Small Computer System Interface (iSCSI) Protocol (Consolidated)

Pages: 295
Proposed Standard
Obsoletes:  3720398048505048
Updates:  3721
Part 6 of 10 – Pages 132 to 169
First   Prev   Next

Top   ToC   RFC7143 - Page 132   prevText

9.1. iSCSI Security Mechanisms

The entities involved in iSCSI security are the initiator, target, and the IP communication endpoints. iSCSI scenarios in which multiple initiators or targets share a single communication endpoint are expected. To accommodate such scenarios, iSCSI supports two separate security mechanisms: in-band authentication between the initiator and the target at the iSCSI connection level (carried out by exchange of iSCSI Login PDUs), and packet protection (integrity, authentication, and confidentiality) by IPsec at the IP level. The two security mechanisms complement each other. The in-band authentication provides end-to-end trust (at login time) between the iSCSI initiator and the target, while IPsec provides a secure channel between the IP communication endpoints. iSCSI can be used to access sensitive information for which significant security protection is appropriate. As further specified in the rest of this security considerations section, both iSCSI security mechanisms are mandatory to implement (MUST). The use of in-band authentication is strongly recommended (SHOULD). In contrast, the use of IPsec is optional (MAY), as the security risks that it addresses may only be present over a subset of the networks used by an iSCSI connection or a session; a specific example is that when an iSCSI session spans data centers, IPsec VPN gateways at the data center boundaries to protect the WAN connectivity between data centers may be appropriate in combination with in-band iSCSI authentication. Further details on typical iSCSI scenarios and the relationship between the initiators, targets, and the communication endpoints can be found in [RFC3723].

9.2. In-Band Initiator-Target Authentication

During login, the target MAY authenticate the initiator and the initiator MAY authenticate the target. The authentication is performed on every new iSCSI connection by an exchange of iSCSI Login PDUs using a negotiated authentication method. The authentication method cannot assume an underlying IPsec protection, because IPsec is optional to use. An attacker should gain as little advantage as possible by inspecting the authentication phase PDUs. Therefore, a method using cleartext (or equivalent) passwords MUST NOT be used; on the other hand, identity protection is not strictly required. The authentication mechanism protects against an unauthorized login to storage resources by using a false identity (spoofing). Once the authentication phase is completed, if the underlying IPsec is not used, all PDUs are sent and received in the clear. The
Top   ToC   RFC7143 - Page 133
   authentication mechanism alone (without underlying IPsec) should only
   be used when there is no risk of eavesdropping or of message
   insertion, deletion, modification, and replaying.

   Section 12 defines several authentication methods and the exact steps
   that must be followed in each of them, including the iSCSI-text-keys
   and their allowed values in each step.  Whenever an iSCSI initiator
   gets a response whose keys, or their values, are not according to the
   step definition, it MUST abort the connection.

   Whenever an iSCSI target gets a request or response whose keys, or
   their values, are not according to the step definition, it MUST
   answer with a Login reject with the "Initiator Error" or "Missing
   Parameter" status.  These statuses are not intended for
   cryptographically incorrect values such as the CHAP response, for
   which the "Authentication Failure" status MUST be specified.  The
   importance of this rule can be illustrated in CHAP with target
   authentication (see Section 12.1.3), where the initiator would have
   been able to conduct a reflection attack by omitting its response key
   (CHAP_R), using the same CHAP challenge as the target and reflecting
   the target's response back to the target.  In CHAP, this is prevented
   because the target must answer the missing CHAP_R key with a
   Login reject with the "Missing Parameter" status.

   For some of the authentication methods, a key specifies the identity
   of the iSCSI initiator or target for authentication purposes.  The
   value associated with that key MAY be different from the iSCSI name
   and SHOULD be configurable (CHAP_N: see Section 12.1.3; SRP_U: see
   Section 12.1.2).  For this reason, iSCSI implementations SHOULD
   manage authentication in a way that impersonation across iSCSI names
   via these authentication identities is not possible.  Specifically,
   implementations SHOULD allow configuration of an authentication
   identity for a Name if different, and authentication credentials for
   that identity.  During the login time, implementations SHOULD verify
   the Name-to-identity relationship in addition to authenticating the
   identity through the negotiated authentication method.

   When an iSCSI session has multiple TCP connections, either
   concurrently or sequentially, the authentication method and
   identities should not vary among the connections.  Therefore, all
   connections in an iSCSI session SHOULD use the same authentication
   method, iSCSI name, and authentication identity (for authentication
   methods that use an authentication identity).  Implementations SHOULD
   check this and cause an authentication failure on a new connection
   that uses a different authentication method, iSCSI name, or
   authentication identity from those already used in the session.  In
Top   ToC   RFC7143 - Page 134
   addition, implementations SHOULD NOT support both authenticated and
   unauthenticated TCP connections in the same iSCSI session, added
   either concurrently or sequentially to the session.

9.2.1. CHAP Considerations

Compliant iSCSI initiators and targets MUST implement the CHAP authentication method [RFC1994] (according to Section 12.1.3, including the target authentication option). When CHAP is performed over a non-encrypted channel, it is vulnerable to an off-line dictionary attack. Implementations MUST support the use of up to 128-bit random CHAP secrets, including the means to generate such secrets and to accept them from an external generation source. Implementations MUST NOT provide secret generation (or expansion) means other than random generation. An administrative entity of an environment in which CHAP is used with a secret that has less than 96 random bits MUST enforce IPsec encryption (according to the implementation requirements in Section 9.3.2) to protect the connection. Moreover, in this case, IKE authentication with group pre-shared cryptographic keys SHOULD NOT be used unless it is not essential to protect group members against off-line dictionary attacks by other members. CHAP secrets MUST be an integral number of bytes (octets). A compliant implementation SHOULD NOT continue with the login step in which it should send a CHAP response (CHAP_R; see Section 12.1.3) unless it can verify that the CHAP secret is at least 96 bits or that IPsec encryption is being used to protect the connection. Any CHAP secret used for initiator authentication MUST NOT be configured for authentication of any target, and any CHAP secret used for target authentication MUST NOT be configured for authentication of any initiator. If the CHAP response received by one end of an iSCSI connection is the same as the CHAP response that the receiving endpoint would have generated for the same CHAP challenge, the response MUST be treated as an authentication failure and cause the connection to close (this ensures that the same CHAP secret is not used for authentication in both directions). Also, if an iSCSI implementation can function as both initiator and target, different CHAP secrets and identities MUST be configured for these two roles. The following is an example of the attacks prevented by the above requirements: a) "Rogue" wants to impersonate "Storage" to Alice and knows that a single secret is used for both directions of Storage-Alice authentication.
Top   ToC   RFC7143 - Page 135
      b) Rogue convinces Alice to open two connections to itself and
         identifies itself as Storage on both connections.

      c) Rogue issues a CHAP challenge on Connection 1, waits for Alice
         to respond, and then reflects Alice's challenge as the initial
         challenge to Alice on Connection 2.

      d) If Alice doesn't check for the reflection across connections,
         Alice's response on Connection 2 enables Rogue to impersonate
         Storage on Connection 1, even though Rogue does not know the
         Alice-Storage CHAP secret.

   Originators MUST NOT reuse the CHAP challenge sent by the responder
   for the other direction of a bidirectional authentication.
   Responders MUST check for this condition and close the iSCSI TCP
   connection if it occurs.

   The same CHAP secret SHOULD NOT be configured for authentication of
   multiple initiators or multiple targets, as this enables any of them
   to impersonate any other one of them, and compromising one of them
   enables the attacker to impersonate any of them.  It is recommended
   that iSCSI implementations check for the use of identical CHAP
   secrets by different peers when this check is feasible and take
   appropriate measures to warn users and/or administrators when this is
   detected.

   When an iSCSI initiator or target authenticates itself to
   counterparts in multiple administrative domains, it SHOULD use a
   different CHAP secret for each administrative domain to avoid
   propagating security compromises across domains.

   Within a single administrative domain:

      - A single CHAP secret MAY be used for authentication of an
        initiator to multiple targets.

      - A single CHAP secret MAY be used for an authentication of a
        target to multiple initiators when the initiators use an
        external server (e.g., RADIUS [RFC2865]) to verify the target's
        CHAP responses and do not know the target's CHAP secret.

   If an external response verification server (e.g., RADIUS) is not
   used, employing a single CHAP secret for authentication of a target
   to multiple initiators requires that all such initiators know that
   target's secret.  Any of these initiators can impersonate the target
   to any other such initiator, and compromise of such an initiator
   enables an attacker to impersonate the target to all such initiators.
   Targets SHOULD use separate CHAP secrets for authentication to each
Top   ToC   RFC7143 - Page 136
   initiator when such risks are of concern; in this situation, it may
   be useful to configure a separate logical iSCSI target with its own
   iSCSI Node Name for each initiator or group of initiators among which
   such separation is desired.

   The above requirements strengthen the security properties of CHAP
   authentication for iSCSI by comparison to the basic CHAP
   authentication mechanism [RFC1994].  It is very important to adhere
   to these requirements, especially the requirements for strong (large
   randomly generated) CHAP secrets, as iSCSI implementations and
   deployments that fail to use strong CHAP secrets are likely to be
   highly vulnerable to off-line dictionary attacks on CHAP secrets.

   Replacement of CHAP with a better authentication mechanism is
   anticipated in a future version of iSCSI.  The FC-SP-2 standard
   [FC-SP-2] has specified the Extensible Authentication Protocol -
   Generalized Pre-Shared Key (EAP-GPSK) authentication mechanism
   [RFC5433] as an alternative to (and possible future replacement for)
   Fibre Channel's similar usage of strengthened CHAP.  Another possible
   replacement for CHAP is a secure password mechanism, e.g., an updated
   version of iSCSI's current SRP authentication mechanism.

9.2.2. SRP Considerations

The strength of the SRP authentication method (specified in [RFC2945]) is dependent on the characteristics of the group being used (i.e., the prime modulus N and generator g). As described in [RFC2945], N is required to be a Sophie Germain prime (of the form N = 2q + 1, where q is also prime) and the generator g is a primitive root of GF(N). In iSCSI authentication, the prime modulus N MUST be at least 768 bits. The list of allowed SRP groups is provided in [RFC3723].

9.2.3. Kerberos Considerations

iSCSI uses raw Kerberos V5 [RFC4120] for authenticating a client (iSCSI initiator) principal to a service (iSCSI target) principal. Note that iSCSI does not use the Generic Security Service Application Program Interface (GSS-API) [RFC2743] or the Kerberos V5 GSS-API security mechanism [RFC4121]. This means that iSCSI implementations supporting the KRB5 AuthMethod (Section 12.1) are directly involved in the Kerberos protocol. When Kerberos V5 is used for authentication, the following actions MUST be performed as specified in [RFC4120]: - The target MUST validate KRB_AP_REQ to ensure that the initiator can be trusted.
Top   ToC   RFC7143 - Page 137
      - When mutual authentication is selected, the initiator MUST
        validate KRB_AP_REP to determine the outcome of mutual
        authentication.

   As Kerberos V5 is capable of providing mutual authentication,
   implementations SHOULD support mutual authentication by default for
   login authentication.

   Note, however, that Kerberos authentication only assures that the
   server (iSCSI target) can be trusted by the Kerberos client
   (initiator) and vice versa; an initiator should employ appropriately
   secured service discovery techniques (e.g., iSNS; see Section 4.2.7)
   to ensure that it is talking to the intended target principal.

   iSCSI does not use Kerberos v5 for either integrity or
   confidentiality protection of the iSCSI protocol.  iSCSI uses IPsec
   for those purposes as specified in Section 9.3.

9.3. IPsec

iSCSI uses the IPsec mechanism for packet protection (cryptographic integrity, authentication, and confidentiality) at the IP level between the iSCSI communicating endpoints. The following sections describe the IPsec protocols that must be implemented for data authentication and integrity; confidentiality; and cryptographic key management. An iSCSI initiator or target may provide the required IPsec support fully integrated or in conjunction with an IPsec front-end device. In the latter case, the compliance requirements with regard to IPsec support apply to the "combined device". Only the "combined device" is to be considered an iSCSI device. Detailed considerations and recommendations for using IPsec for iSCSI are provided in [RFC3723] as updated by [RFC7146]. The IPsec requirements are reproduced here for convenience and are intended to match those in [RFC7146]; in the event of a discrepancy, the requirements in [RFC7146] apply.

9.3.1. Data Authentication and Integrity

Data authentication and integrity are provided by a cryptographic keyed Message Authentication Code in every sent packet. This code protects against message insertion, deletion, and modification. Protection against message replay is realized by using a sequence counter.
Top   ToC   RFC7143 - Page 138
   An iSCSI-compliant initiator or target MUST provide data
   authentication and integrity by implementing IPsec v2 [RFC2401] with
   ESPv2 [RFC2406] in tunnel mode, SHOULD provide data authentication
   and integrity by implementing IPsec v3 [RFC4301] with ESPv3 [RFC4303]
   in tunnel mode, and MAY provide data authentication and integrity by
   implementing either IPsec v2 or v3 with the appropriate version of
   ESP in transport mode.  The IPsec implementation MUST fulfill the
   following iSCSI-specific requirements:

      - HMAC-SHA1 MUST be implemented in the specific form of
        HMAC-SHA-1-96 [RFC2404].

      - AES CBC MAC with XCBC extensions using 128-bit keys SHOULD be
        implemented [RFC3566].

      - Implementations that support IKEv2 [RFC5996] SHOULD also
        implement AES Galois Message Authentication Code (GMAC)
        [RFC4543] using 128-bit keys.

   The ESP anti-replay service MUST also be implemented.

   At the high speeds at which iSCSI is expected to operate, a single
   IPsec SA could rapidly exhaust the ESP 32-bit sequence number space,
   requiring frequent rekeying of the SA, as rollover of the ESP
   sequence number within a single SA is prohibited for both ESPv2
   [RFC2406] and ESPv3 [RFC4303].  In order to provide the means to
   avoid this potentially undesirable frequent rekeying, implementations
   that are capable of operating at speeds of 1 gigabit/second or higher
   MUST implement extended (64-bit) sequence numbers for ESPv2 (and
   ESPv3, if supported) and SHOULD use extended sequence numbers for all
   iSCSI traffic.  Extended sequence number negotiation as part of
   security association establishment is specified in [RFC4304] for
   IKEv1 and [RFC5996] for IKEv2.

9.3.2. Confidentiality

Confidentiality is provided by encrypting the data in every packet. When confidentiality is used, it MUST be accompanied by data authentication and integrity to provide comprehensive protection against eavesdropping and against message insertion, deletion, modification, and replaying. An iSCSI-compliant initiator or target MUST provide confidentiality by implementing IPsec v2 [RFC2401] with ESPv2 [RFC2406] in tunnel mode, SHOULD provide confidentiality by implementing IPsec v3 [RFC4301] with ESPv3 [RFC4303] in tunnel mode, and MAY provide
Top   ToC   RFC7143 - Page 139
   confidentiality by implementing either IPsec v2 or v3 with the
   appropriate version of ESP in transport mode, with the following
   iSCSI-specific requirements that apply to IPsec v2 and IPsec v3:

      - 3DES in CBC mode MAY be implemented [RFC2451].

      - AES in CBC mode with 128-bit keys MUST be implemented [RFC3602];
        other key sizes MAY be supported.

      - AES in Counter mode MAY be implemented [RFC3686].

      - Implementations that support IKEv2 [RFC5996] SHOULD also
        implement AES Galois/Counter Mode (GCM) with 128-bit keys
        [RFC4106]; other key sizes MAY be supported.

   Due to its inherent weakness, DES in CBC mode MUST NOT be used.

   The NULL encryption algorithm MUST also be implemented.

9.3.3. Policy, Security Associations, and Cryptographic Key Management

A compliant iSCSI implementation MUST meet the cryptographic key management requirements of the IPsec protocol suite. Authentication, security association negotiation, and cryptographic key management MUST be provided by implementing IKE [RFC2409] using the IPsec DOI [RFC2407] and SHOULD be provided by implementing IKEv2 [RFC5996], with the following iSCSI-specific requirements: a) Peer authentication using a pre-shared cryptographic key MUST be supported. Certificate-based peer authentication using digital signatures MAY be supported. For IKEv1 ([RFC2409]), peer authentication using the public key encryption methods outlined in Sections 5.2 and 5.3 of [RFC2409] SHOULD NOT be used. b) When digital signatures are used to achieve authentication, an IKE negotiator SHOULD use IKE Certificate Request Payload(s) to specify the certificate authority. IKE negotiators SHOULD check certificate validity via the pertinent Certificate Revocation List (CRL) or via the use of the Online Certificate Status Protocol (OCSP) [RFC6960] before accepting a PKI certificate for use in IKE authentication procedures. OCSP support within the IKEv2 protocol is specified in [RFC4806]. These checks may not be needed in environments where a small number of certificates are statically configured as trust anchors.
Top   ToC   RFC7143 - Page 140
      c) Conformant iSCSI implementations of IKEv1 MUST support Main
         Mode and SHOULD support Aggressive Mode.  Main Mode with a
         pre-shared key authentication method SHOULD NOT be used when
         either the initiator or the target uses dynamically assigned
         addresses.  While in many cases pre-shared keys offer good
         security, situations in which dynamically assigned addresses
         are used force the use of a group pre-shared key, which creates
         vulnerability to a man-in-the-middle attack.

      d) In the IKEv1 Phase 2 Quick Mode, in exchanges for creating the
         Phase 2 SA, the Identification Payload MUST be present.

      e) The following identification type requirements apply to IKEv1:
         ID_IPV4_ADDR, ID_IPV6_ADDR (if the protocol stack supports
         IPv6), and ID_FQDN Identification Types MUST be supported;
         ID_USER_FQDN SHOULD be supported.  The IP Subnet, IP Address
         Range, ID_DER_ASN1_DN, and ID_DER_ASN1_GN Identification Types
         SHOULD NOT be used.  The ID_KEY_ID Identification Type MUST NOT
         be used.

      f) If IKEv2 is supported, the following identification
         requirements apply:  ID_IPV4_ADDR, ID_IPV6_ADDR (if the
         protocol stack supports IPv6), and ID_FQDN Identification Types
         MUST be supported; ID_RFC822_ADDR SHOULD be supported.  The
         ID_DER_ASN1_DN and ID_DER_ASN1_GN Identification Types SHOULD
         NOT be used.  The ID_KEY_ID Identification Type MUST NOT be
         used.

   The reasons for the "MUST NOT" and "SHOULD NOT" for identification
   type requirements in preceding bullets e) and f) are:

      - IP Subnet and IP Address Range are too broad to usefully
        identify an iSCSI endpoint.

      - The DN and GN types are X.500 identities; it is usually better
        to use an identity from subjectAltName in a PKI certificate.

      - ID_KEY_ID is not interoperable as specified.

   Manual cryptographic keying MUST NOT be used, because it does not
   provide the necessary rekeying support.

   When Diffie-Hellman (DH) groups are used, a DH group of at least
   2048 bits SHOULD be offered as a part of all proposals to create
   IPsec security associations to protect iSCSI traffic, with both IKEv1
   and IKEv2.
Top   ToC   RFC7143 - Page 141
   When IPsec is used, the receipt of an IKEv1 Phase 2 delete message or
   an IKEv2 INFORMATIONAL exchange that deletes the SA SHOULD NOT be
   interpreted as a reason for tearing down the iSCSI TCP connection.
   If additional traffic is sent on it, a new IKE SA will be created to
   protect it.

   The method used by the initiator to determine whether the target
   should be connected using IPsec is regarded as an issue of IPsec
   policy administration and thus not defined in the iSCSI standard.

   The method used by an initiator that supports both IPsec v2 and v3 to
   determine which versions of IPsec are supported by the target is also
   regarded as an issue of IPsec policy administration and thus not
   defined in the iSCSI standard.  If both IPsec v2 and v3 are supported
   by both the initiator and target, the use of IPsec v3 is recommended.

   If an iSCSI target is discovered via a SendTargets request in a
   Discovery session not using IPsec, the initiator should assume that
   it does not need IPsec to establish a session to that target.  If an
   iSCSI target is discovered using a Discovery session that does use
   IPsec, the initiator SHOULD use IPsec when establishing a session to
   that target.

9.4. Security Considerations for the X#NodeArchitecture Key

The security considerations in this section are specific to the X#NodeArchitecture discussed in Section 13.26. This extension key transmits specific implementation details about the node that sends it; such details may be considered sensitive in some environments. For example, if a certain software or firmware version is known to contain security weaknesses, announcing the presence of that version via this key may not be desirable. The countermeasures for this security concern are: a) sending less detailed information in the key values, b) not sending the extension key, or c) using IPsec ([RFC4303]) to provide confidentiality for the iSCSI connection on which the key is sent. To support the first and second countermeasures, all implementations of this extension key MUST provide an administrative mechanism to disable sending the key. In addition, all implementations SHOULD provide an administrative mechanism to configure a verbosity level of the key value, thereby controlling the amount of information sent.
Top   ToC   RFC7143 - Page 142
   For example, a lower verbosity level might enable transmission of
   node architecture component names only, but no version numbers.  The
   choice of which countermeasure is most appropriate depends on the
   environment.  However, sending less detailed information in the key
   values may be an acceptable countermeasure in many environments,
   since it provides a compromise between sending too much information
   and the other more complete countermeasures of not sending the key at
   all or using IPsec.

   In addition to security considerations involving transmission of the
   key contents, any logging method(s) used for the key values MUST keep
   the information secure from intruders.  For all implementations, the
   requirements to address this security concern are as follows:

      a) Display of the log MUST only be possible with administrative
         rights to the node.

      b) Options to disable logging to disk and to keep logs for a fixed
         duration SHOULD be provided.

   Finally, it is important to note that different nodes may have
   different levels of risk, and these differences may affect the
   implementation.  The components of risk include assets, threats, and
   vulnerabilities.  Consider the following example iSCSI nodes, which
   demonstrate differences in assets and vulnerabilities of the nodes,
   and, as a result, differences in implementation:

      a) One iSCSI target based on a special-purpose operating system:
         Since the iSCSI target controls access to the data storage
         containing company assets, the asset level is seen as very
         high.  Also, because of the special-purpose operating system,
         in which vulnerabilities are less well known, the vulnerability
         level is viewed as low.

      b) Multiple iSCSI initiators in a blade farm, each running a
         general-purpose operating system: The asset level of each node
         is viewed as low, since blades are replaceable and low cost.
         However, the vulnerability level is viewed as high, since there
         may be many well-known vulnerabilities to that general-purpose
         operating system.  For this target, an appropriate
         implementation might be the logging of received key values but
         no transmission of the key.  For this initiator, an appropriate
         implementation might be transmission of the key but no logging
         of received key values.
Top   ToC   RFC7143 - Page 143

9.5. SCSI Access Control Considerations

iSCSI is a SCSI transport protocol and as such does not apply any access controls on SCSI-level operations such as SCSI task management functions (e.g., LU reset; see Section 11.5.1). SCSI-level access controls (e.g., ACCESS CONTROL OUT; see [SPC3]) have to be appropriately deployed in practice to address SCSI-level security considerations, in addition to security via iSCSI connection and packet protection mechanisms that were already discussed in preceding sections.

10. Notes to Implementers

This section notes some of the performance and reliability considerations of the iSCSI protocol. This protocol was designed to allow efficient silicon and software implementations. The iSCSI task tag mechanism was designed to enable Direct Data Placement (DDP -- a DMA form) at the iSCSI level or lower. The guiding assumption made throughout the design of this protocol is that targets are resource constrained relative to initiators. Implementers are also advised to consider the implementation consequences of the iSCSI-to-SCSI mapping model as outlined in Section 4.4.3.

10.1. Multiple Network Adapters

The iSCSI protocol allows multiple connections, not all of which need to go over the same network adapter. If multiple network connections are to be utilized with hardware support, the iSCSI protocol command- data-status allegiance to one TCP connection ensures that there is no need to replicate information across network adapters or otherwise require them to cooperate. However, some task management commands may require some loose form of cooperation or replication at least on the target.

10.1.1. Conservative Reuse of ISIDs

Historically, the SCSI model (and implementations and applications based on that model) has assumed that SCSI ports are static, physical entities. Recent extensions to the SCSI model have taken advantage of persistent worldwide unique names for these ports. In iSCSI, however, the SCSI initiator ports are the endpoints of dynamically created sessions, so the presumptions of "static and physical" do not apply. In any case, the "model" sections (particularly,
Top   ToC   RFC7143 - Page 144
   Section 4.4.1) provide for persistent, reusable names for the
   iSCSI-type SCSI initiator ports even though there does not need to be
   any physical entity bound to these names.

   To both minimize the disruption of legacy applications and better
   facilitate the SCSI features that rely on persistent names for SCSI
   ports, iSCSI implementations SHOULD attempt to provide a stable
   presentation of SCSI initiator ports (both to the upper OS layers and
   the targets to which they connect).  This can be achieved in an
   initiator implementation by conservatively reusing ISIDs.  In other
   words, the same ISID should be used in the login process to multiple
   target portal groups (of the same iSCSI target or different iSCSI
   targets).  The ISID RULE (Section 4.4.3) only prohibits reuse to the
   same target portal group.  It does not "preclude" reuse to other
   target portal groups.  The principle of conservative reuse
   "encourages" reuse to other target portal groups.  When a SCSI target
   device sees the same (InitiatorName, ISID) pair in different sessions
   to different target portal groups, it can identify the underlying
   SCSI initiator port on each session as the same SCSI port.  In
   effect, it can recognize multiple paths from the same source.

10.1.2. iSCSI Name, ISID, and TPGT Use

The designers of the iSCSI protocol are aware that legacy SCSI transports rely on initiator identity to assign access to storage resources. Although newer techniques that simplify access control are available, support for configuration and authentication schemes that are based on initiator identity is deemed important in order to support legacy systems and administration software. iSCSI thus supports the notion that it should be possible to assign access to storage resources based on "initiator device" identity. When there are multiple hardware or software components coordinated as a single iSCSI node, there must be some (logical) entity that represents the iSCSI node that makes the iSCSI Node Name available to all components involved in session creation and login. Similarly, this entity that represents the iSCSI node must be able to coordinate session identifier resources (the ISID for initiators) to enforce both the ISID RULE and the TSIH RULE (see Section 4.4.3). For targets, because of the closed environment, implementation of this entity should be straightforward. However, vendors of iSCSI hardware (e.g., NICs or HBAs) intended for targets SHOULD provide mechanisms for configuration of the iSCSI Node Name across the portal groups instantiated by multiple instances of these components within a target.
Top   ToC   RFC7143 - Page 145
   However, complex targets making use of multiple Target Portal Group
   Tags may reconfigure them to achieve various quality goals.  The
   initiators have two mechanisms at their disposal to discover and/or
   check reconfiguring targets -- the Discovery session type and a key
   returned by the target during login to confirm the TPGT.  An
   initiator should attempt to "rediscover" the target configuration
   whenever a session is terminated unexpectedly.

   For initiators, in the long term, it is expected that operating
   system vendors will take on the role of this entity and provide
   standard APIs that can inform components of their iSCSI Node Name and
   can configure and/or coordinate ISID allocation, use, and reuse.

   Recognizing that such initiator APIs are not available today, other
   implementations of the role of this entity are possible.  For
   example, a human may instantiate the (common) node name as part of
   the installation process of each iSCSI component involved in session
   creation and login.  This may be done by pointing the component to
   either a vendor-specific location for this datum or a system-wide
   location.  The structure of the ISID namespace (see Section 11.12.5
   and [RFC3721]) facilitates implementation of the ISID coordination by
   allowing each component vendor to independently (of other vendor's
   components) coordinate allocation, use, and reuse of its own
   partition of the ISID namespace in a vendor-specific manner.
   Partitioning of the ISID namespace within initiator portal groups
   managed by that vendor allows each such initiator portal group to act
   independently of all other portal groups when selecting an ISID for a
   login; this facilitates enforcement of the ISID RULE (see
   Section 4.4.3) at the initiator.

   A vendor of iSCSI hardware (e.g., NICs or HBAs) intended for use in
   initiators MUST implement a mechanism for configuring the iSCSI Node
   Name.  Vendors and administrators must ensure that iSCSI Node Names
   are worldwide unique.  It is therefore important that when one
   chooses to reuse the iSCSI Node Name of a disabled unit one does not
   reassign that name to the original unit unless its worldwide
   uniqueness can be ascertained again.

   In addition, a vendor of iSCSI hardware must implement a mechanism to
   configure and/or coordinate ISIDs for all sessions managed by
   multiple instances of that hardware within a given iSCSI node.  Such
   configuration might be either permanently preassigned at the factory
   (in a necessarily globally unique way), statically assigned (e.g.,
   partitioned across all the NICs at initialization in a locally unique
   way), or dynamically assigned (e.g., on-line allocator, also in a
   locally unique way).  In the latter two cases, the configuration may
Top   ToC   RFC7143 - Page 146
   be via public APIs (perhaps driven by an independent vendor's
   software, such as the OS vendor) or private APIs driven by the
   vendor's own software.

   The process of name assignment and coordination has to be as
   encompassing and automated as possible, as years of legacy usage have
   shown that it is highly error-prone.  It should be mentioned that
   today SCSI has alternative schemes of access control that can be used
   by all transports, and their security is not dependent on strict
   naming coordination.

10.2. Autosense and Auto Contingent Allegiance (ACA)

"Autosense" refers to the automatic return of sense data to the initiator in cases where a command did not complete successfully. iSCSI initiators and targets MUST support and use Autosense. ACA helps preserve ordered command execution in the presence of errors. As there can be many commands in-flight between an initiator and a target, SCSI initiator functionality in some operating systems depends on ACA to enforce ordered command execution during error recovery, and hence iSCSI initiator implementations for those operating systems need to support ACA. In order to support error recovery for these operating systems and iSCSI initiators, iSCSI targets SHOULD support ACA.

10.3. iSCSI Timeouts

iSCSI recovery actions are often dependent on iSCSI timeouts being recognized and acted upon before SCSI timeouts. Determining the right timeouts to use for various iSCSI actions (command acknowledgments expected, status acknowledgments, etc.) is very much dependent on infrastructure (e.g., hardware, links, TCP/IP stack, iSCSI driver). As a guide, the implementer may use an average NOP-Out/NOP-In turnaround delay multiplied by a "safety factor" (e.g., 4) as a good estimate for the basic delay of the iSCSI stack for a given connection. The safety factor should account for network load variability. For connection teardown, the implementer may want to also consider TCP common practice for the given infrastructure. Text negotiations MAY also be subject to either time limits or limits in the number of exchanges. Those limits SHOULD be generous enough to avoid affecting interoperability (e.g., allowing each key to be negotiated on a separate exchange). The relationship between iSCSI timeouts and SCSI timeouts should also be considered. SCSI timeouts should be longer than iSCSI timeouts plus the time required for iSCSI recovery whenever iSCSI recovery is
Top   ToC   RFC7143 - Page 147
   planned.  Alternatively, an implementer may choose to interlock iSCSI
   timeouts and recovery with SCSI timeouts so that SCSI recovery will
   become active only where iSCSI is not planned to, or failed to,
   recover.

   The implementer may also want to consider the interaction between
   various iSCSI exception events -- such as a digest failure -- and
   subsequent timeouts.  When iSCSI error recovery is active, a digest
   failure is likely to result in discovering a missing command or data
   PDU.  In these cases, an implementer may want to lower the timeout
   values to enable faster initiation for recovery procedures.

10.4. Command Retry and Cleaning Old Command Instances

To avoid having old, retried command instances appear in a valid command window after a command sequence number wraparound, the protocol requires (see Section 4.2.2.1) that on every connection on which a retry has been issued a non-immediate command be issued and acknowledged within an interval of 2**31 - 1 commands from the CmdSN of the retried command. This requirement can be fulfilled by an implementation in several ways. The simplest technique to use is to send a (non-retry) non-immediate SCSI command (or a NOP if no SCSI command is available for a while) after every command retry on the connection on which the retry was attempted. Because errors are deemed rare events, this technique is probably the most effective, as it does not involve additional checks at the initiator when issuing commands.

10.5. Sync and Steering Layer, and Performance

While a Sync and Steering layer is optional, an initiator/target that does not have it working against a target/initiator that demands sync and steering may experience performance degradation caused by packet reordering and loss. Providing a sync and steering mechanism is recommended for all high-speed implementations.

10.6. Considerations for State-Dependent Devices and Long-Lasting SCSI Operations

Sequential access devices operate on the principle that the position of the device is based on the last command processed. As such, command processing order, and knowledge of whether or not the previous command was processed, are of the utmost importance to maintain data integrity. For example, inadvertent retries of SCSI commands when it is not known if the previous SCSI command was processed is a potential data integrity risk.
Top   ToC   RFC7143 - Page 148
   For a sequential access device, consider the scenario in which a SCSI
   SPACE command to backspace one filemark is issued and then reissued
   due to no status received for the command.  If the first SPACE
   command was actually processed, the reissued SPACE command, if
   processed, will cause the position to change.  Thus, a subsequent
   write operation will write data to the wrong position, and any
   previous data at that position will be overwritten.

   For a medium changer device, consider the scenario in which an
   EXCHANGE MEDIUM command (the SOURCE ADDRESS and DESTINATION ADDRESS
   are the same, thus performing a swap) is issued and then reissued due
   to no status received for the command.  If the first EXCHANGE MEDIUM
   command was actually processed, the reissued EXCHANGE MEDIUM command,
   if processed, will perform the swap again.  The net effect is that no
   swap was performed, thus putting data integrity at risk.

   All commands that change the state of the device (e.g., SPACE
   commands for sequential access devices and EXCHANGE MEDIUM commands
   for medium changer devices) MUST be issued as non-immediate commands
   for deterministic and ordered delivery to iSCSI targets.

   For many of those state-changing commands, the execution model also
   assumes that the command is executed exactly once.  Devices
   implementing READ POSITION and LOCATE provide a means for SCSI-level
   command recovery, and new tape-class devices should support those
   commands.  In their absence, a retry at the SCSI level is difficult,
   and error recovery at the iSCSI level is advisable.

   Devices operating on long-latency delivery subsystems and performing
   long-lasting SCSI operations may need mechanisms that enable
   connection replacement while commands are running (e.g., during an
   extended copy operation).

10.6.1. Determining the Proper ErrorRecoveryLevel

The implementation and use of a specific ErrorRecoveryLevel should be determined based on the deployment scenarios of a given iSCSI implementation. Generally, the following factors must be considered before deciding on the proper level of recovery: a) Application resilience to I/O failures. b) Required level of availability in the face of transport connection failures.
Top   ToC   RFC7143 - Page 149
      c) Probability of transport-layer "checksum escape" (message error
         undetected by TCP checksum -- see [RFC3385] for related
         discussion).  This in turn decides the iSCSI digest failure
         frequency and thus the criticality of iSCSI-level error
         recovery.  The details of estimating this probability are
         outside the scope of this document.

   A consideration of the above factors for SCSI tape devices as an
   example suggests that implementations SHOULD use ErrorRecoveryLevel=1
   when transport connection failure is not a concern and SCSI-level
   recovery is unavailable, and ErrorRecoveryLevel=2 when there is a
   high likelihood of connection failure during a backup/retrieval.

   For extended copy operations, implementations SHOULD use
   ErrorRecoveryLevel=2 whenever there is a relatively high likelihood
   of connection failure.

10.7. Multi-Task Abort Implementation Considerations

Multi-task abort operations are typically issued in emergencies, such as clearing a device lock-up, HA failover/failback, etc. In these circumstances, it is desirable to rapidly go through the error- handling process as opposed to the target waiting on multiple third- party initiators that may not even be functional anymore -- especially if this emergency is triggered because of one such initiator failure. Therefore, both iSCSI target and initiator implementations SHOULD support FastAbort multi-task abort semantics (Section 4.2.3.4). Note that in both standard semantics (Section 4.2.3.3) and FastAbort semantics (Section 4.2.3.4) there may be outstanding data transfers even after the TMF completion is reported on the issuing session. In the case of iSCSI/iSER [RFC7145], these would be tagged data transfers for STags not owned by any active tasks. Whether or not real buffers support these data transfers is implementation dependent. However, the data transfers logically MUST be silently discarded by the target iSCSI layer in all cases. A target MAY, on an implementation-defined internal timeout, also choose to drop the connections on which it did not receive the expected Data-Out sequences (Section 4.2.3.3) or NOP-Out acknowledgments (Section 4.2.3.4) so as to reclaim the associated buffer, STag, and TTT resources as appropriate.
Top   ToC   RFC7143 - Page 150

11. iSCSI PDU Formats

All multi-byte integers that are specified in formats defined in this document are to be represented in network byte order (i.e., big-endian). Any field that appears in this document assumes that the most significant byte is the lowest numbered byte and the most significant bit (within byte or field) is the lowest numbered bit unless specified otherwise. Any compliant sender MUST set all bits not defined and all reserved fields to 0, unless specified otherwise. Any compliant receiver MUST ignore any bit not defined and all reserved fields unless specified otherwise. Receipt of reserved code values in defined fields MUST be reported as a protocol error. Reserved fields are marked by the word "reserved", some abbreviation of "reserved", or by "." for individual bits when no other form of marking is technically feasible.

11.1. iSCSI PDU Length and Padding

iSCSI PDUs are padded to the closest integer number of 4-byte words. The padding bytes SHOULD be sent as 0.

11.2. PDU Template, Header, and Opcodes

All iSCSI PDUs have one or more header segments and, optionally, a data segment. After the entire header segment group, a header digest MAY follow. The data segment MAY also be followed by a data digest. The Basic Header Segment (BHS) is the first segment in all of the iSCSI PDUs. The BHS is a fixed-length 48-byte header segment. It MAY be followed by Additional Header Segments (AHS), a Header-Digest, a Data Segment, and/or a Data-Digest.
Top   ToC   RFC7143 - Page 151
   The overall structure of an iSCSI PDU is as follows:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0/ Basic Header Segment (BHS)                                    /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
   48/ Additional Header Segment 1 (AHS) (optional)                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
     / Additional Header Segment 2 (AHS) (optional)                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
     +---------------+---------------+---------------+---------------+
     / Additional Header Segment n (AHS) (optional)                  /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
    k/ Header-Digest (optional)                                      /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
    l/ Data Segment (optional)                                       /
    +/                                                               /
     +---------------+---------------+---------------+---------------+
    m/ Data-Digest (optional)                                        /
    +/                                                               /
     +---------------+---------------+---------------+---------------+

   All PDU segments and digests are padded to the closest integer number
   of 4-byte words.  For example, all PDU segments and digests start at
   a 4-byte word boundary, and the padding ranges from 0 to 3 bytes.
   The padding bytes SHOULD be sent as 0.

   iSCSI Response PDUs do not have AH Segments.
Top   ToC   RFC7143 - Page 152

11.2.1. Basic Header Segment (BHS)

The BHS is 48 bytes long. The Opcode and DataSegmentLength fields appear in all iSCSI PDUs. In addition, when used, the Initiator Task Tag and Logical Unit Number always appear in the same location in the header. The format of the BHS is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0|.|I| Opcode |F| Opcode-specific fields | +---------------+---------------+---------------+---------------+ 4|TotalAHSLength | DataSegmentLength | +---------------+---------------+---------------+---------------+ 8| LUN or Opcode-specific fields | + + 12| | +---------------+---------------+---------------+---------------+ 16| Initiator Task Tag | +---------------+---------------+---------------+---------------+ 20/ Opcode-specific fields / +/ / +---------------+---------------+---------------+---------------+ 48
11.2.1.1. I (Immediate) Bit
For Request PDUs, the I bit set to 1 is an immediate delivery marker.
11.2.1.2. Opcode
The Opcode indicates the type of iSCSI PDU the header encapsulates. The Opcodes are divided into two categories: initiator Opcodes and target Opcodes. Initiator Opcodes are in PDUs sent by the initiator (Request PDUs). Target Opcodes are in PDUs sent by the target (Response PDUs). Initiators MUST NOT use target Opcodes, and targets MUST NOT use initiator Opcodes.
Top   ToC   RFC7143 - Page 153
   Initiator Opcodes defined in this specification are:

      0x00 NOP-Out

      0x01 SCSI Command (encapsulates a SCSI Command Descriptor
           Block)

      0x02 SCSI Task Management Function Request

      0x03 Login Request

      0x04 Text Request

      0x05 SCSI Data-Out (for write operations)

      0x06 Logout Request

      0x10 SNACK Request

      0x1c-0x1e Vendor-specific codes

   Target Opcodes are:

      0x20 NOP-In

      0x21 SCSI Response - contains SCSI status and possibly sense
           information or other response information

      0x22 SCSI Task Management Function Response

      0x23 Login Response

      0x24 Text Response

      0x25 SCSI Data-In (for read operations)

      0x26 Logout Response

      0x31 Ready To Transfer (R2T) - sent by target when it is ready
           to receive data

      0x32 Asynchronous Message - sent by target to indicate certain
           special conditions

      0x3c-0x3e Vendor-specific codes

      0x3f Reject
Top   ToC   RFC7143 - Page 154
   All other Opcodes are unassigned.

11.2.1.3. F (Final) Bit
When set to 1 it indicates the final (or only) PDU of a sequence.
11.2.1.4. Opcode-Specific Fields
These fields have different meanings for different Opcode types.
11.2.1.5. TotalAHSLength
This is the total length of all AHS header segments in units of 4-byte words, including padding, if any. The TotalAHSLength is only used in PDUs that have an AHS and MUST be 0 in all other PDUs.
11.2.1.6. DataSegmentLength
This is the data segment payload length in bytes (excluding padding). The DataSegmentLength MUST be 0 whenever the PDU has no data segment.
11.2.1.7. LUN
Some Opcodes operate on a specific LU. The Logical Unit Number (LUN) field identifies which LU. If the Opcode does not relate to a LU, this field is either ignored or may be used in an Opcode-specific way. The LUN field is 64 bits and should be formatted in accordance with [SAM2]. For example, LUN[0] from [SAM2] is BHS byte 8 and so on up to LUN[7] from [SAM2], which is BHS byte 15.
11.2.1.8. Initiator Task Tag
The initiator assigns a task tag to each iSCSI task it issues. While a task exists, this tag MUST uniquely identify the task session-wide. SCSI may also use the Initiator Task Tag as part of the SCSI task identifier when the timespan during which an iSCSI Initiator Task Tag must be unique extends over the timespan during which a SCSI task tag must be unique. However, the iSCSI Initiator Task Tag must exist and be unique even for untagged SCSI commands. An ITT value of 0xffffffff is reserved and MUST NOT be assigned for a task by the initiator. The only instance in which it may be seen on the wire is in a target-initiated NOP-In PDU (Section 11.19) and in the initiator response to that PDU, if necessary.
Top   ToC   RFC7143 - Page 155

11.2.2. Additional Header Segment (AHS)

The general format of an AHS is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| AHSLength | AHSType | AHS-Specific | +---------------+---------------+---------------+---------------+ 4/ AHS-Specific / +/ / +---------------+---------------+---------------+---------------+ x
11.2.2.1. AHSType
The AHSType field is coded as follows: bit 0-1 - Reserved bit 2-7 - AHS code 0 - Reserved 1 - Extended CDB 2 - Bidirectional Read Expected Data Transfer Length 3 - 63 Reserved
11.2.2.2. AHSLength
This field contains the effective length in bytes of the AHS, excluding AHSType and AHSLength and padding, if any. The AHS is padded to the smallest integer number of 4-byte words (i.e., from 0 up to 3 padding bytes).
Top   ToC   RFC7143 - Page 156
11.2.2.3. Extended CDB AHS
The format of the Extended CDB AHS is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| AHSLength (CDBLength - 15) | 0x01 | Reserved | +---------------+---------------+---------------+---------------+ 4/ ExtendedCDB...+padding / +/ / +---------------+---------------+---------------+---------------+ x This type of AHS MUST NOT be used if the CDBLength is less than 17. The length includes the reserved byte 3.
11.2.2.4. Bidirectional Read Expected Data Transfer Length AHS
The format of the Bidirectional Read Expected Data Transfer Length AHS is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0| AHSLength (0x0005) | 0x02 | Reserved | +---------------+---------------+---------------+---------------+ 4| Bidirectional Read Expected Data Transfer Length | +---------------+---------------+---------------+---------------+ 8

11.2.3. Header Digest and Data Digest

Optional header and data digests protect the integrity of the header and data, respectively. The digests, if present, are located, respectively, after the header and PDU-specific data and cover, respectively, the header and the PDU data, each including the padding bytes, if any. The existence and type of digests are negotiated during the Login Phase.
Top   ToC   RFC7143 - Page 157
   The separation of the header and data digests is useful in iSCSI
   routing applications, in which only the header changes when a message
   is forwarded.  In this case, only the header digest should be
   recalculated.

   Digests are not included in data or header length fields.

   A zero-length Data Segment also implies a zero-length Data-Digest.

11.2.4. Data Segment

The (optional) Data Segment contains PDU-associated data. Its payload effective length is provided in the BHS field -- DataSegmentLength. The Data Segment is also padded to an integer number of 4-byte words.
Top   ToC   RFC7143 - Page 158

11.3. SCSI Command

The format of the SCSI Command PDU is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0|.|I| 0x01 |F|R|W|. .|ATTR | Reserved | +---------------+---------------+---------------+---------------+ 4|TotalAHSLength | DataSegmentLength | +---------------+---------------+---------------+---------------+ 8| Logical Unit Number (LUN) | + + 12| | +---------------+---------------+---------------+---------------+ 16| Initiator Task Tag | +---------------+---------------+---------------+---------------+ 20| Expected Data Transfer Length | +---------------+---------------+---------------+---------------+ 24| CmdSN | +---------------+---------------+---------------+---------------+ 28| ExpStatSN | +---------------+---------------+---------------+---------------+ 32/ SCSI Command Descriptor Block (CDB) / +/ / +---------------+---------------+---------------+---------------+ 48/ AHS (optional) / +---------------+---------------+---------------+---------------+ x/ Header-Digest (optional) / +---------------+---------------+---------------+---------------+ y/ (DataSegment, Command Data) (optional) / +/ / +---------------+---------------+---------------+---------------+ z/ Data-Digest (optional) / +---------------+---------------+---------------+---------------+
Top   ToC   RFC7143 - Page 159

11.3.1. Flags and Task Attributes (Byte 1)

The flags for a SCSI Command PDU are: bit 0 (F) is set to 1 when no unsolicited SCSI Data-Out PDUs follow this PDU. When F = 1 for a write and if Expected Data Transfer Length is larger than the DataSegmentLength, the target may solicit additional data through R2T. bit 1 (R) is set to 1 when the command is expected to input data. bit 2 (W) is set to 1 when the command is expected to output data. bit 3-4 Reserved. bit 5-7 contains Task Attributes. Task Attributes (ATTR) have one of the following integer values (see [SAM2] for details): 0 - Untagged 1 - Simple 2 - Ordered 3 - Head of queue 4 - ACA 5-7 - Reserved At least one of the W and F bits MUST be set to 1. Either or both of R and W MAY be 1 when the Expected Data Transfer Length and/or the Bidirectional Read Expected Data Transfer Length are 0, but they MUST NOT both be 0 when the Expected Data Transfer Length and/or Bidirectional Read Expected Data Transfer Length are not 0 (i.e., when some data transfer is expected, the transfer direction is indicated by the R and/or W bit).

11.3.2. CmdSN - Command Sequence Number

The CmdSN enables ordered delivery across multiple connections in a single session.
Top   ToC   RFC7143 - Page 160

11.3.3. ExpStatSN

Command responses up to ExpStatSN - 1 (modulo 2**32) have been received (acknowledges status) on the connection.

11.3.4. Expected Data Transfer Length

For unidirectional operations, the Expected Data Transfer Length field contains the number of bytes of data involved in this SCSI operation. For a unidirectional write operation (W flag set to 1 and R flag set to 0), the initiator uses this field to specify the number of bytes of data it expects to transfer for this operation. For a unidirectional read operation (W flag set to 0 and R flag set to 1), the initiator uses this field to specify the number of bytes of data it expects the target to transfer to the initiator. It corresponds to the SAM-2 byte count. For bidirectional operations (both R and W flags are set to 1), this field contains the number of data bytes involved in the write transfer. For bidirectional operations, an additional header segment MUST be present in the header sequence that indicates the Bidirectional Read Expected Data Transfer Length. The Expected Data Transfer Length field and the Bidirectional Read Expected Data Transfer Length field correspond to the SAM-2 byte count. If the Expected Data Transfer Length for a write and the length of the immediate data part that follows the command (if any) are the same, then no more data PDUs are expected to follow. In this case, the F bit MUST be set to 1. If the Expected Data Transfer Length is higher than the FirstBurstLength (the negotiated maximum amount of unsolicited data the target will accept), the initiator MUST send the maximum amount of unsolicited data OR ONLY the immediate data, if any. Upon completion of a data transfer, the target informs the initiator (through residual counts) of how many bytes were actually processed (sent and/or received) by the target.

11.3.5. CDB - SCSI Command Descriptor Block

There are 16 bytes in the CDB field to accommodate the commonly used CDBs. Whenever the CDB is larger than 16 bytes, an Extended CDB AHS MUST be used to contain the CDB spillover.
Top   ToC   RFC7143 - Page 161

11.3.6. Data Segment - Command Data

Some SCSI commands require additional parameter data to accompany the SCSI command. This data may be placed beyond the boundary of the iSCSI header in a data segment. Alternatively, user data (e.g., from a write operation) can be placed in the data segment (both cases are referred to as immediate data). These data are governed by the rules for solicited vs. unsolicited data outlined in Section 4.2.5.2.

11.4. SCSI Response

The format of the SCSI Response PDU is: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0|.|.| 0x21 |1|. .|o|u|O|U|.| Response | Status | +---------------+---------------+---------------+---------------+ 4|TotalAHSLength | DataSegmentLength | +---------------+---------------+---------------+---------------+ 8| Reserved | + + 12| | +---------------+---------------+---------------+---------------+ 16| Initiator Task Tag | +---------------+---------------+---------------+---------------+ 20| SNACK Tag or Reserved | +---------------+---------------+---------------+---------------+ 24| StatSN | +---------------+---------------+---------------+---------------+ 28| ExpCmdSN | +---------------+---------------+---------------+---------------+ 32| MaxCmdSN | +---------------+---------------+---------------+---------------+ 36| ExpDataSN or Reserved | +---------------+---------------+---------------+---------------+ 40| Bidirectional Read Residual Count or Reserved | +---------------+---------------+---------------+---------------+ 44| Residual Count or Reserved | +---------------+---------------+---------------+---------------+ 48| Header-Digest (optional) | +---------------+---------------+---------------+---------------+ / Data Segment (optional) / +/ / +---------------+---------------+---------------+---------------+ | Data-Digest (optional) | +---------------+---------------+---------------+---------------+
Top   ToC   RFC7143 - Page 162

11.4.1. Flags (Byte 1)

bit 1-2 Reserved. bit 3 - (o) set for Bidirectional Read Residual Overflow. In this case, the Bidirectional Read Residual Count indicates the number of bytes that were not transferred to the initiator because the initiator's Bidirectional Read Expected Data Transfer Length was not sufficient. bit 4 - (u) set for Bidirectional Read Residual Underflow. In this case, the Bidirectional Read Residual Count indicates the number of bytes that were not transferred to the initiator out of the number of bytes expected to be transferred. bit 5 - (O) set for Residual Overflow. In this case, the Residual Count indicates the number of bytes that were not transferred because the initiator's Expected Data Transfer Length was not sufficient. For a bidirectional operation, the Residual Count contains the residual for the write operation. bit 6 - (U) set for Residual Underflow. In this case, the Residual Count indicates the number of bytes that were not transferred out of the number of bytes that were expected to be transferred. For a bidirectional operation, the Residual Count contains the residual for the write operation. bit 7 - (0) Reserved. Bits O and U and bits o and u are mutually exclusive (i.e., having both o and u or O and U set to 1 is a protocol error). For a response other than "Command Completed at Target", bits 3-6 MUST be 0.
Top   ToC   RFC7143 - Page 163

11.4.2. Status

The Status field is used to report the SCSI status of the command (as specified in [SAM2]) and is only valid if the response code is Command Completed at Target. Some of the status codes defined in [SAM2] are: 0x00 GOOD 0x02 CHECK CONDITION 0x08 BUSY 0x18 RESERVATION CONFLICT 0x28 TASK SET FULL 0x30 ACA ACTIVE 0x40 TASK ABORTED See [SAM2] for the complete list and definitions. If a SCSI device error is detected while data from the initiator is still expected (the command PDU did not contain all the data and the target has not received a data PDU with the Final bit set), the target MUST wait until it receives a data PDU with the F bit set in the last expected sequence before sending the Response PDU.

11.4.3. Response

This field contains the iSCSI service response. iSCSI service response codes defined in this specification are: 0x00 - Command Completed at Target 0x01 - Target Failure 0x80-0xff - Vendor specific All other response codes are reserved. The Response field is used to report a service response. The mapping of the response code into a SCSI service response code value, if needed, is outside the scope of this document. However, in symbolic terms, response value 0x00 maps to the SCSI service response (see
Top   ToC   RFC7143 - Page 164
   [SAM2] and [SPC3]) of TASK COMPLETE or LINKED COMMAND COMPLETE.  All
   other Response values map to the SCSI service response of SERVICE
   DELIVERY OR TARGET FAILURE.

   If a SCSI Response PDU does not arrive before the session is
   terminated, the SCSI service response is SERVICE DELIVERY OR TARGET
   FAILURE.

   A non-zero response field indicates a failure to execute the command,
   in which case the Status and Flag fields are undefined and MUST be
   ignored on reception.

11.4.4. SNACK Tag

This field contains a copy of the SNACK Tag of the last SNACK Tag accepted by the target on the same connection and for the command for which the response is issued. Otherwise, it is reserved and should be set to 0. After issuing a R-Data SNACK, the initiator must discard any SCSI status unless contained in a SCSI Response PDU carrying the same SNACK Tag as the last issued R-Data SNACK for the SCSI command on the current connection. For a detailed discussion on R-Data SNACK, see Section 11.16.3.

11.4.5. Residual Count

11.4.5.1. Field Semantics
The Residual Count field MUST be valid in the case where either the U bit or the O bit is set. If neither bit is set, the Residual Count field MUST be ignored on reception and SHOULD be set to 0 when sending. Targets may set the residual count, and initiators may use it when the response code is Command Completed at Target (even if the status returned is not GOOD). If the O bit is set, the Residual Count indicates the number of bytes that were not transferred because the initiator's Expected Data Transfer Length was not sufficient. If the U bit is set, the Residual Count indicates the number of bytes that were not transferred out of the number of bytes expected to be transferred.
11.4.5.2. Residuals Concepts Overview
"SCSI-Presented Data Transfer Length (SPDTL)" is the term this document uses (see Section 2.2 for definition) to represent the aggregate data length that the target SCSI layer attempts to transfer using the local iSCSI layer for a task. "Expected Data Transfer
Top   ToC   RFC7143 - Page 165
   Length (EDTL)" is the iSCSI term that represents the length of data
   that the iSCSI layer expects to transfer for a task.  EDTL is
   specified in the SCSI Command PDU.

   When SPDTL = EDTL for a task, the target iSCSI layer completes the
   task with no residuals.  Whenever SPDTL differs from EDTL for a task,
   that task is said to have a residual.

   If SPDTL > EDTL for a task, iSCSI Overflow MUST be signaled in the
   SCSI Response PDU as specified in Section 11.4.5.1.  The Residual
   Count MUST be set to the numerical value of (SPDTL - EDTL).

   If SPDTL < EDTL for a task, iSCSI Underflow MUST be signaled in the
   SCSI Response PDU as specified in Section 11.4.5.1.  The Residual
   Count MUST be set to the numerical value of (EDTL - SPDTL).

   Note that the Overflow and Underflow scenarios are independent of
   Data-In and Data-Out.  Either scenario is logically possible in
   either direction of data transfer.

11.4.5.3. SCSI REPORT LUNS Command and Residual Overflow
This section discusses the residual overflow issues, citing the example of the SCSI REPORT LUNS command. Note, however, that there are several SCSI commands (e.g., INQUIRY) with ALLOCATION LENGTH fields following the same underlying rules. The semantics in the rest of the section apply to all such SCSI commands. The specification of the SCSI REPORT LUNS command requires that the SCSI target limit the amount of data transferred to a maximum size (ALLOCATION LENGTH) provided by the initiator in the REPORT LUNS CDB. If the Expected Data Transfer Length (EDTL) in the iSCSI header of the SCSI Command PDU for a REPORT LUNS command is set to at least as large as that ALLOCATION LENGTH, the SCSI-layer truncation prevents an iSCSI Residual Overflow from occurring. A SCSI initiator can detect that such truncation has occurred via other information at the SCSI layer. The rest of the section elaborates on this required behavior. The SCSI REPORT LUNS command requests a target SCSI layer to return a LU inventory (LUN list) to the initiator SCSI layer (see Clause 6.21 of [SPC3]). The size of this LUN list may not be known to the initiator SCSI layer when it issues the REPORT LUNS command; to avoid transferring more LUN list data than the initiator is prepared for, the REPORT LUNS CDB contains an ALLOCATION LENGTH field to specify the maximum amount of data to be transferred to the initiator for this command. If the initiator SCSI layer has underestimated the
Top   ToC   RFC7143 - Page 166
   number of LUs at the target, it is possible that the complete LU
   inventory does not fit in the specified ALLOCATION LENGTH.  In this
   situation, Clause 4.3.4.6 of [SPC3] requires that the target SCSI
   layer "shall terminate transfers to the Data-In Buffer" when the
   number of bytes specified by the ALLOCATION LENGTH field have been
   transferred.

   Therefore, in response to a REPORT LUNS command, the SCSI layer at
   the target presents at most ALLOCATION LENGTH bytes of data (LU
   inventory) to iSCSI for transfer to the initiator.  For a REPORT LUNS
   command, if the iSCSI EDTL is at least as large as the ALLOCATION
   LENGTH, the SCSI truncation ensures that the EDTL will accommodate
   all of the data to be transferred.  If all of the LU inventory data
   presented to the iSCSI layer -- i.e., the data remaining after any
   SCSI truncation -- is transferred to the initiator by the iSCSI
   layer, an iSCSI Residual Overflow has not occurred and the iSCSI (O)
   bit MUST NOT be set in the SCSI Response or final SCSI Data-Out PDU.
   Note that this behavior is implied in Section 11.4.5.1, along with
   the specification of the REPORT LUNS command in [SPC3].  However, if
   the iSCSI EDTL is larger than the ALLOCATION LENGTH in this scenario,
   note that the iSCSI Underflow MUST be signaled in the SCSI Response
   PDU.  An iSCSI Underflow MUST also be signaled when the iSCSI EDTL is
   equal to the ALLOCATION LENGTH but the LU inventory data presented to
   the iSCSI layer is smaller than the ALLOCATION LENGTH.

   The LUN LIST LENGTH field in the LU inventory (the first field in the
   inventory) is not affected by truncation of the inventory to fit in
   ALLOCATION LENGTH; this enables a SCSI initiator to determine that
   the received inventory is incomplete by noticing that the LUN LIST
   LENGTH in the inventory is larger than the ALLOCATION LENGTH that was
   sent in the REPORT LUNS CDB.  A common initiator behavior in this
   situation is to reissue the REPORT LUNS command with a larger
   ALLOCATION LENGTH.

11.4.6. Bidirectional Read Residual Count

The Bidirectional Read Residual Count field MUST be valid in the case where either the u bit or the o bit is set. If neither bit is set, the Bidirectional Read Residual Count field is reserved. Targets may set the Bidirectional Read Residual Count, and initiators may use it when the response code is Command Completed at Target. If the o bit is set, the Bidirectional Read Residual Count indicates the number of bytes that were not transferred to the initiator because the initiator's Bidirectional Read Expected Data Transfer Length was not sufficient. If the u bit is set, the Bidirectional Read Residual Count indicates the number of bytes that were not transferred to the initiator out of the number of bytes expected to be transferred.
Top   ToC   RFC7143 - Page 167

11.4.7. Data Segment - Sense and Response Data Segment

iSCSI targets MUST support and enable Autosense. If Status is CHECK CONDITION (0x02), then the data segment MUST contain sense data for the failed command. For some iSCSI responses, the response data segment MAY contain some response-related information (e.g., for a target failure, it may contain a vendor-specific detailed description of the failure). If the DataSegmentLength is not 0, the format of the data segment is as follows: Byte/ 0 | 1 | 2 | 3 | / | | | | |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7| +---------------+---------------+---------------+---------------+ 0|SenseLength | Sense Data | +---------------+---------------+---------------+---------------+ x/ Sense Data / +---------------+---------------+---------------+---------------+ y/ Response Data / / / +---------------+---------------+---------------+---------------+
11.4.7.1. SenseLength
This field indicates the length of Sense Data.
Top   ToC   RFC7143 - Page 168
11.4.7.2. Sense Data
The Sense Data contains detailed information about a CHECK CONDITION. [SPC3] specifies the format and content of the Sense Data. Certain iSCSI conditions result in the command being terminated at the target (response code of Command Completed at Target) with a SCSI CHECK CONDITION Status as outlined in the next table: +--------------------------+-----------+---------------------------+ | iSCSI Condition |Sense | Additional Sense Code and | | |Key | Qualifier | +--------------------------+-----------+---------------------------+ | Unexpected unsolicited |Aborted | ASC = 0x0c ASCQ = 0x0c | | data |Command-0B | Write Error | +--------------------------+-----------+---------------------------+ | Incorrect amount of data |Aborted | ASC = 0x0c ASCQ = 0x0d | | |Command-0B | Write Error | +--------------------------+-----------+---------------------------+ | Protocol Service CRC |Aborted | ASC = 0x47 ASCQ = 0x05 | | error |Command-0B | CRC Error Detected | +--------------------------+-----------+---------------------------+ | SNACK rejected |Aborted | ASC = 0x11 ASCQ = 0x13 | | |Command-0B | Read Error | +--------------------------+-----------+---------------------------+ The target reports the "Incorrect amount of data" condition if, during data output, the total data length to output is greater than FirstBurstLength and the initiator sent unsolicited non-immediate data but the total amount of unsolicited data is different than FirstBurstLength. The target reports the same error when the amount of data sent as a reply to an R2T does not match the amount requested.

11.4.8. ExpDataSN

This field indicates the number of Data-In (read) PDUs the target has sent for the command. This field MUST be 0 if the response code is not Command Completed at Target or the target sent no Data-In PDUs for the command.

11.4.9. StatSN - Status Sequence Number

The StatSN is a sequence number that the target iSCSI layer generates per connection and that in turn enables the initiator to acknowledge status reception. The StatSN is incremented by 1 for every response/status sent on a connection, except for responses sent as a
Top   ToC   RFC7143 - Page 169
   result of a retry or SNACK.  In the case of responses sent due to a
   retransmission request, the StatSN MUST be the same as the first time
   the PDU was sent, unless the connection has since been restarted.

11.4.10. ExpCmdSN - Next Expected CmdSN from This Initiator

The ExpCmdSN is a sequence number that the target iSCSI returns to the initiator to acknowledge command reception. It is used to update a local variable with the same name. An ExpCmdSN equal to MaxCmdSN + 1 indicates that the target cannot accept new commands.

11.4.11. MaxCmdSN - Maximum CmdSN from This Initiator

The MaxCmdSN is a sequence number that the target iSCSI returns to the initiator to indicate the maximum CmdSN the initiator can send. It is used to update a local variable with the same name. If the MaxCmdSN is equal to ExpCmdSN - 1, this indicates to the initiator that the target cannot receive any additional commands. When the MaxCmdSN changes at the target while the target has no pending PDUs to convey this information to the initiator, it MUST generate a NOP-In to carry the new MaxCmdSN.


(next page on part 7)

Next Section