Network Working Group P. Culley
Request for Comments: 5044 Hewlett-Packard Company
Category: Standards Track U. Elzur
October 2007
Marker PDU Aligned Framing for TCP Specification
Status of This Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Marker PDU Aligned Framing (MPA) is designed to work as an
"adaptation layer" between TCP and the Direct Data Placement protocol
(DDP) as described in RFC 5041. It preserves the reliable, in-order
delivery of TCP, while adding the preservation of higher-level
protocol record boundaries that DDP requires. MPA is fully compliant
with applicable TCP RFCs and can be utilized with existing TCP
implementations. MPA also supports integrated implementations that
combine TCP, MPA and DDP to reduce buffering requirements in the
implementation and improve performance at the system level.
A.5.2. TCP Reassembly Buffers
Appendix B. Analysis of MPA over TCP Operations
  B.1. Assumptions
    B.1.1. MPA Is Layered beneath DDP
    B.1.2. MPA Preserves DDP Message Framing
    B.1.3. The Size of the ULPDU Passed to MPA Is Less Than
           EMSS Under Normal Conditions
    B.1.4. Out-of-Order Placement but NO Out-of-Order Delivery
  B.2. The Value of FPDU Alignment
    B.2.1. Impact of Lack of FPDU Alignment on the Receiver
           Computational Load and Complexity
    B.2.2. FPDU Alignment Effects on TCP Wire Protocol
Appendix C. IETF Implementation Interoperability with RDMA
            Consortium Protocols
  C.1. Negotiated Parameters
  C.2. RDMAC RNIC and Non-Permissive IETF RNIC
    C.2.1. RDMAC RNIC Initiator
    C.2.2. Non-Permissive IETF RNIC Initiator
    C.2.3. RDMAC RNIC and Permissive IETF RNIC
    C.2.4. RDMAC RNIC Initiator
    C.2.5. Permissive IETF RNIC Initiator
  C.3. Non-Permissive IETF RNIC and Permissive IETF RNIC
Normative References
Informative References
Contributors
Table of Figures
Figure 1: ULP MPA TCP Layering
Figure 2: FPDU Format
Figure 3: Marker Format
Figure 4: Example FPDU Format with Marker
Figure 5: Annotated Hex Dump of an FPDU
Figure 6: Annotated Hex Dump of an FPDU with Marker
Figure 7: Fully Layered Implementation
Figure 8: MPA Request/Reply Frame
Figure 9: Example Delayed Startup Negotiation
Figure 10: Example Immediate Startup Negotiation
Figure 11: Optimized MPA/TCP Implementation
Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream
Figure 13: Aligned FPDU Placed Immediately after TCP Header
Figure 14: Connection Parameters for the RNIC Types
Figure 15: MPA Negotiation between an RDMAC RNIC and a
           Non-Permissive IETF RNIC
Figure 16: MPA Negotiation between an RDMAC RNIC and a Permissive
           IETF RNIC
Figure 17: MPA Negotiation between a Non-Permissive IETF RNIC and
           a Permissive IETF RNIC
1. Introduction
This section discusses the reason for creating MPA on TCP and a
general overview of the protocol.
1.1. Motivation
The Direct Data Placement protocol [DDP], when used with TCP
[RFC793], requires a mechanism to detect record boundaries. The DDP
records are referred to as Upper Layer Protocol Data Units by this
document. The ability to locate the Upper Layer Protocol Data Unit
(ULPDU) boundary is useful to a hardware network adapter that uses
DDP to directly place the data in the application buffer based on the
control information carried in the ULPDU header. This may be done
without requiring that the packets arrive in order. Potential
benefits of this capability are the avoidance of the memory copy
overhead and a smaller memory requirement for handling out-of-order
or dropped packets.
Many approaches have been proposed for a generalized framing
mechanism. Some are probabilistic in nature and others are
deterministic. An example probabilistic approach is characterized by
a detectable value embedded in the octet stream, with no method of
preventing that value elsewhere within user data. It is
probabilistic because under some conditions the receiver may
incorrectly interpret application data as the detectable value.
Under these conditions, the protocol may fail with unacceptable
frequency. One deterministic approach is characterized by embedded
controls at known locations in the octet stream. Because the
receiver can guarantee it will only examine the data stream at
locations that are known to contain the embedded control, the
protocol can never misinterpret application data as being embedded
control data. For unambiguous handling of an out-of-order packet, a
deterministic approach is preferred.
The MPA protocol provides a framing mechanism for DDP running over
TCP using the deterministic approach. It allows the location of the
ULPDU to be determined in the TCP stream even if the TCP segments
arrive out of order.
1.2. Protocol Overview
The layering of PDUs with MPA is shown in Figure 1, below.
+------------------+
|    ULP client    |
+------------------+  <- Consumer messages
|       DDP        |
+------------------+  <- ULPDUs
|       MPA*       |
+------------------+  <- FPDUs (containing ULPDUs)
|       TCP*       |
+------------------+  <- TCP Segments (containing FPDUs)
|     IP etc.      |
+------------------+
* These may be fully layered or optimized together.
Figure 1: ULP MPA TCP Layering
MPA is described as an extra layer above TCP and below DDP. The
operation sequence is:
1. A TCP connection is established by ULP action. This is done
using methods not described by this specification. The ULP may
exchange some amount of data in streaming mode prior to starting
MPA, but is not required to do so.
2. The Consumer negotiates the use of DDP and MPA at both ends of a
connection. The mechanisms to do this are not described in this
specification. The negotiation may be done in streaming mode, or
by some other mechanism (such as a pre-arranged port number).
3. The ULP activates MPA on each end in the Startup Phase, either as
an Initiator or a Responder, as determined by the ULP. This mode
verifies the usage of MPA, specifies the use of CRC and Markers,
and allows the ULP to communicate some additional data via a
Private Data exchange. See Section 7.1, Connection Setup, for
more details on the startup process.
4. At the end of the Startup Phase, the ULP puts MPA (and DDP) into
Full Operation and begins sending DDP data as further described
below. In this document, DDP data chunks are called ULPDUs. For
a description of the DDP data, see [DDP].
Following is a description of data transfer when MPA is in Full
Operation:
1. DDP determines the Maximum ULPDU (MULPDU) size by querying MPA
for this value. MPA derives this information from TCP or IP,
when it is available, or chooses a reasonable value.
2. DDP creates ULPDUs of MULPDU size or smaller, and hands them to
MPA at the sender.
3. MPA creates a Framed Protocol Data Unit (FPDU) by prepending a
header, optionally inserting Markers, and appending a CRC field
after the ULPDU and PAD (if any). MPA delivers the FPDU to TCP.
4. The TCP sender puts the FPDUs into the TCP stream. If the sender
is optimized MPA/TCP, it segments the TCP stream in such a way
that a TCP Segment boundary is also the boundary of an FPDU. TCP
then passes each segment to the IP layer for transmission.
5. The receiver may or may not be optimized. If it is optimized
MPA/TCP, it may separate passing the TCP payload to MPA from
passing the TCP payload ordering information to MPA. In either
case, RFC-compliant TCP wire behavior is observed at both the
sender and receiver.
6. The MPA receiver locates and assembles complete FPDUs within the
stream, verifies their integrity, and removes MPA Markers (when
present), ULPDU_Length, PAD, and the CRC field.
7. MPA then provides the complete ULPDUs to DDP. MPA may also
separate passing MPA payload to DDP from passing the MPA payload
ordering information to DDP.
A fully layered MPA on TCP is implemented as a data stream ULP for
TCP and is therefore RFC compliant.
An optimized DDP/MPA/TCP uses a TCP layer that potentially contains
some additional behaviors as suggested in this document. When
DDP/MPA/TCP are cross-layer optimized, the behavior of TCP
(especially sender segmentation) may change from that of the
un-optimized implementation, but the changes are within the bounds
permitted by the TCP RFC specifications, and will interoperate with
an un-optimized TCP. The additional behaviors are described in
Appendix A and are not normative; they are described at a TCP
interface layer as a convenience. Implementations may achieve the
described functionality using any method, including cross-layer
optimizations between TCP, MPA, and DDP.
An optimized DDP/MPA/TCP sender is able to segment the data stream
such that TCP segments begin with FPDUs (FPDU Alignment). This has
significant advantages for receivers. When segments arrive with
aligned FPDUs, the receiver usually need not buffer any portion of
the segment, allowing DDP to place it in its destination memory
immediately, thus avoiding copies from intermediate buffers (DDP's
reason for existence).
An optimized DDP/MPA/TCP receiver allows a DDP on MPA implementation
to locate the start of ULPDUs that may be received out of order. It
also allows the implementation to determine if the entire ULPDU has
been received. As a result, MPA can pass out-of-order ULPDUs to DDP
for immediate use. This enables a DDP on MPA implementation to save
a significant amount of intermediate storage by placing the ULPDUs in
the right locations in the application buffers when they arrive,
rather than waiting until full ordering can be restored.
The ability of a receiver to recover out-of-order ULPDUs is optional
and declared to the transmitter during startup. When the receiver
declares that it does not support out-of-order recovery, the
transmitter does not add the control information to the data stream
needed for out-of-order recovery.
If the receiver is fully layered, then MPA receives a strictly
ordered stream of data and does not deal with out-of-order ULPDUs.
In this case, MPA passes each ULPDU to DDP when the last bytes arrive
from TCP, along with the indication that they are in order.
MPA implementations that support recovery of out-of-order ULPDUs MUST
support a mechanism to indicate the ordering of ULPDUs as the sender
transmitted them and indicate when missing intermediate segments
arrive. These mechanisms allow DDP to reestablish record ordering
and report Delivery of complete messages (groups of records).
MPA also addresses enhanced data integrity. Some users of TCP have
noted that the TCP checksum is not as strong as could be desired (see
[CRCTCP]). Studies such as [CRCTCP] have shown that the TCP checksum
indicates segments in error at a much higher rate than the underlying
link characteristics would indicate. With these higher error rates,
the chance that an error will escape detection, when using only the
TCP checksum for data integrity, becomes a concern. A stronger
integrity check can reduce the chance of data errors being missed.
MPA includes a CRC check to increase the ULPDU data integrity to the
level provided by other modern protocols, such as SCTP [RFC4960]. It
is possible to disable this CRC check; however, CRCs MUST be enabled
unless it is clear that the end-to-end connection through the network
has data integrity at least as good as an MPA with CRC enabled (for
example, when IPsec is implemented end to end). DDP's ULP expects
this level of data integrity and therefore the ULP does not have to
provide its own duplicate data integrity and error recovery for lost
data.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Consumer - the ULPs or applications that lie above MPA and DDP. The
Consumer is responsible for making TCP connections, starting MPA
and DDP connections, and generally controlling operations.
CRC - Cyclic Redundancy Check.
Delivery - (Delivered, Delivers) - For MPA, Delivery is defined as
the process of informing DDP that a particular PDU is ordered for
use. A PDU is Delivered in the exact order that it was sent by
the original sender; MPA uses TCP's byte stream ordering to
determine when Delivery is possible. This is specifically
different from "passing the PDU to DDP", which may generally
occur in any order, while the order of Delivery is strictly
defined.
EMSS - Effective Maximum Segment Size. EMSS is the smaller of the
TCP maximum segment size (MSS) as defined in RFC 793 [RFC793],
and the current path Maximum Transmission Unit (MTU) [RFC1191].
FPDU - Framed Protocol Data Unit. The unit of data created by an MPA
sender.
FPDU Alignment - The property that an FPDU is Header Aligned with the
TCP segment, and the TCP segment includes an integer number of
FPDUs. A TCP segment with an FPDU Alignment allows immediate
processing of the contained FPDUs without waiting on other TCP
segments to arrive or combining with prior segments.
FPDU Pointer (FPDUPTR) - This field of the Marker is used to indicate
the beginning of an FPDU.
Full Operation (Full Operation Phase) - After the completion of the
Startup Phase, MPA begins exchanging FPDUs.
Header Alignment - The property that a TCP segment begins with an
FPDU. The FPDU is Header Aligned when the FPDU header is exactly
at the start of the TCP segment (right behind the TCP headers on
the wire).
Initiator - The endpoint of a connection that sends the MPA Request
Frame, i.e., the first to actually send data (which may not be
the one that sends the TCP SYN).
Marker - A four-octet field that is placed in the MPA data stream at
fixed octet intervals (every 512 octets).
MPA-aware TCP - A TCP implementation that is aware of the receiver
efficiencies of MPA FPDU Alignment and is capable of sending TCP
segments that begin with an FPDU.
MPA-enabled - MPA is enabled if the MPA protocol is visible on the
wire. When the sender is MPA-enabled, it is inserting framing
and Markers. When the receiver is MPA-enabled, it is
interpreting framing and Markers.
MPA Request Frame - Data sent from the MPA Initiator to the MPA
Responder during the Startup Phase.
MPA Reply Frame - Data sent from the MPA Responder to the MPA
Initiator during the Startup Phase.
MPA - Marker-based ULP PDU Aligned Framing for TCP protocol. This
document defines the MPA protocol.
MULPDU - Maximum ULPDU. The current maximum size of the record that
is acceptable for DDP to pass to MPA for transmission.
Node - A computing device attached to one or more links of a network.
A Node in this context does not refer to a specific application
or protocol instantiation running on the computer. A Node may
consist of one or more MPA on TCP devices installed in a host
computer.
PAD - A 1-3 octet group of zeros used to fill an FPDU to an exact
modulo 4 size.
PDU - Protocol Data Unit.
Private Data - A block of data exchanged between MPA endpoints during
initial connection setup.
Protection Domain - An RDMA concept (see [VERBS-RDMA] and [RDMASEC])
that ties use of various endpoint resources (memory access, etc.)
to the specific RDMA/DDP/MPA connection.
RDDP - A suite of protocols including MPA, [DDP], [RDMAP], an overall
security document [RDMASEC], a problem statement [RFC4297], an
architecture document [RFC4296], and an applicability document
RDMA - Remote Direct Memory Access; a protocol that uses DDP and MPA
to enable applications to transfer data directly from memory
buffers. See [RDMAP].
Remote Peer - The MPA protocol implementation on the opposite end of
the connection. Used to refer to the remote entity when
describing protocol exchanges or other interactions between two
Nodes.
Responder - The connection endpoint that responds to an incoming MPA
connection request (the MPA Request Frame). This may not be the
endpoint that awaited the TCP SYN.
Startup Phase - The initial exchanges of an MPA connection that
serve to more fully identify MPA endpoints to each other and
pass connection-specific setup information to each other.
ULP - Upper Layer Protocol. The protocol layer above the protocol
layer currently being referenced. The ULP for MPA is DDP [DDP].
ULPDU - Upper Layer Protocol Data Unit. The data record defined by
the layer above MPA (DDP). ULPDU corresponds to DDP's DDP
Segment.
ULPDU_Length - A field in the FPDU describing the length of the
ULPDU.
3. MPA's Interactions with DDP
DDP requires MPA to maintain DDP record boundaries from the sender to
the receiver. When using MPA on TCP to send data, DDP provides
records (ULPDUs) to MPA. MPA will use the reliable transmission
abilities of TCP to transmit the data, and will insert appropriate
additional information into the TCP stream to allow the MPA receiver
to locate the record boundary information.
As such, MPA accepts complete records (ULPDUs) from DDP at the sender
and returns them to DDP at the receiver.
MPA MUST encapsulate the ULPDU such that there is exactly one ULPDU
contained in one FPDU.
MPA over a standard TCP stack can usually provide FPDU Alignment with
the TCP Header if the FPDU is equal to TCP's EMSS. An optimized
MPA/TCP stack can also maintain alignment as long as the FPDU is less
than or equal to TCP's EMSS. Since FPDU Alignment is generally
desired by the receiver, DDP cooperates with MPA to ensure FPDUs'
lengths do not exceed the EMSS under normal conditions. This is done
with the MULPDU mechanism.
MPA MUST provide information to DDP on the current maximum size of
the record that is acceptable to send (MULPDU). DDP SHOULD limit
each record size to MULPDU. The range of MULPDU values MUST be
between 128 octets and 64768 octets, inclusive.
The sending DDP MUST NOT post a ULPDU larger than 64768 octets to
MPA. DDP MAY post a ULPDU of any size between one and 64768 octets;
however, MPA is not REQUIRED to support a ULPDU Length that is
greater than the current MULPDU.
While the maximum theoretical length supported by the MPA header
ULPDU_Length field is 65535, TCP over IP requires the IP datagram
maximum length to be 65535 octets. To enable MPA to support FPDU
Alignment, the maximum size of the FPDU must fit within an IP
datagram. Thus, the ULPDU limit of 64768 octets was derived by
taking the maximum IP datagram length, subtracting from it the
maximum total length of the sum of the IPv4 header, TCP header, IPv4
options, TCP options, and the worst-case MPA overhead, and then
rounding the result down to a 128-octet boundary.
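The derivation above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only. The 60-octet maxima for the IPv4 and TCP headers (each including options) and the breakdown of the worst-case MPA overhead are assumptions made for this example; the text above does not itemize them.

```python
IP_DATAGRAM_MAX = 65535   # maximum IP datagram length, in octets
IPV4_HEADER_MAX = 60      # assumption: IPv4 header plus options
TCP_HEADER_MAX = 60       # assumption: TCP header plus options
MARKER_INTERVAL = 512     # one Marker every 512 octets of TCP stream
MARKER_SIZE = 4

def ulpdu_limit():
    """Reproduce the 64768-octet ULPDU limit under the stated assumptions."""
    # Largest TCP payload that still fits in a single IP datagram.
    payload = IP_DATAGRAM_MAX - IPV4_HEADER_MAX - TCP_HEADER_MAX
    # Assumed worst-case MPA overhead: 2-octet ULPDU_Length, 4-octet CRC,
    # 3 pad octets, and one Marker per (possibly partial) 512-octet interval.
    markers = (payload + MARKER_INTERVAL - 1) // MARKER_INTERVAL
    overhead = 2 + 4 + 3 + MARKER_SIZE * markers
    # Round the result down to a 128-octet boundary.
    return (payload - overhead) // 128 * 128

print(ulpdu_limit())
```

Under these assumptions the intermediate value is 64894 octets, which rounds down to the 64768-octet limit stated above.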
Note that MULPDU will be significantly smaller than the theoretical
maximum in most implementations for most circumstances, due to link
MTUs, use of extra headers such as required for IPsec, etc.
On receive, MPA MUST pass each ULPDU with its length to DDP when it
has been validated.
If an MPA implementation supports passing out-of-order ULPDUs to DDP,
the MPA implementation SHOULD:
* Pass each ULPDU with its length to DDP as soon as it has been
fully received and validated.
* Provide a mechanism to indicate the ordering of ULPDUs as the
sender transmitted them. One possible mechanism might be
providing the TCP sequence number for each ULPDU.
* Provide a mechanism to indicate when a given ULPDU (and prior
ULPDUs) are complete (Delivered to DDP). One possible mechanism
might be to allow DDP to see the current outgoing TCP ACK
sequence number.
* Provide an indication to DDP that the TCP has closed or has begun
to close the connection (e.g., received a FIN).
MPA MUST provide the protocol version negotiated with its peer to
DDP. DDP will use this version to set the version in its header and
to report the version to [RDMAP].
4. MPA Full Operation Phase
The following sections describe the main semantics of the Full
Operation Phase of MPA.
4.1. FPDU Format
MPA senders create FPDUs out of ULPDUs. The format of an FPDU shown
below MUST be used for all MPA FPDUs. For purposes of clarity,
Markers are not shown in Figure 2.
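 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          ULPDU_Length         |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
~                                                               ~
~                             ULPDU                             ~
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |        PAD (0-3 octets)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              CRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+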
Figure 2: FPDU Format
ULPDU_Length: 16 bits (unsigned integer). This is the number of
octets of the contained ULPDU. It does not include the length of the
FPDU header itself, the pad, the CRC, or of any Markers that fall
within the ULPDU. The 16-bit ULPDU Length field is large enough to
support the largest IP datagrams for IPv4 or IPv6.
PAD: The PAD field trails the ULPDU and contains between 0 and 3
octets of data. The pad data MUST be set to zero by the sender and
ignored by the receiver (except for CRC checking). The length of the
pad is set so as to make the size of the FPDU an integral multiple of
four.
CRC: 32 bits. When CRCs are enabled, this field contains a CRC32c
check value, which is used to verify the entire contents of the FPDU,
using CRC32c. See Section 4.4, CRC Calculation. When CRCs are not
enabled, this field is still present, may contain any value, and MUST
NOT be checked.
The FPDU adds a minimum of 6 octets to the length of the ULPDU. In
addition, the total length of the FPDU will include the length of any
Markers and from 0 to 3 pad octets added to round up the ULPDU size.
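As an illustration of the size rules above, the following Python sketch computes the pad and total length of a Marker-free FPDU and extracts the ULPDU from one. The function names are hypothetical, the ULPDU_Length field is assumed to be in network byte order, and the CRC field is deliberately not verified here.

```python
import struct

HEADER_LEN = 2   # ULPDU_Length field
CRC_LEN = 4      # CRC field, present even when CRCs are not enabled

def pad_len(ulpdu_len):
    """PAD octets needed so that header + ULPDU + PAD is a multiple of 4."""
    return -(HEADER_LEN + ulpdu_len) % 4

def fpdu_len(ulpdu_len):
    """Total length of a Marker-free FPDU carrying ulpdu_len octets."""
    return HEADER_LEN + ulpdu_len + pad_len(ulpdu_len) + CRC_LEN

def split_fpdu(fpdu):
    """Return the ULPDU from exactly one Marker-free FPDU (CRC not checked)."""
    (ulpdu_len,) = struct.unpack_from("!H", fpdu, 0)
    if len(fpdu) != fpdu_len(ulpdu_len):
        raise ValueError("buffer is not exactly one FPDU")
    return fpdu[HEADER_LEN:HEADER_LEN + ulpdu_len]
```

For example, a 16-octet ULPDU needs 2 pad octets and yields a 24-octet FPDU, while a 42-octet ULPDU needs no pad and yields a 48-octet FPDU. Any Markers that fall inside the FPDU add 4 octets each.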
4.2. Marker Format
The format of a Marker MUST be as specified in Figure 3:
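 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            RESERVED           |            FPDUPTR            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+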
Figure 3: Marker Format
RESERVED: The Reserved field MUST be set to zero on transmit and
ignored on receive (except for CRC calculation).
FPDUPTR: The FPDU Pointer is a relative pointer, 16 bits long,
interpreted as an unsigned integer that indicates the number of
octets in the TCP stream from the beginning of the ULPDU Length field
to the first octet of the entire Marker. The least significant two
bits MUST always be set to zero at the transmitter, and the receivers
MUST always treat these as zero for calculations.
4.3. MPA Markers
MPA Markers are used to identify the start of FPDUs when packets are
received out of order. This is done by locating the Markers at fixed
intervals in the data stream (which is correlated to the TCP sequence
number) and using the Marker value to locate the preceding FPDU
start.
All MPA Markers are included in the containing FPDU CRC calculation
(when both CRCs and Markers are in use).
The MPA receiver's ability to locate out-of-order FPDUs and pass the
ULPDUs to DDP is implementation dependent. MPA/DDP allows those
receivers that are able to deal with out-of-order FPDUs in this way
to require the insertion of Markers in the data stream. When the
receiver cannot deal with out-of-order FPDUs in this way, it may
disable the insertion of Markers at the sender. All MPA senders MUST
be able to generate Markers when their use is declared by the
opposing receiver (see Section 7.1, Connection Setup).
When Markers are enabled, MPA senders MUST insert a Marker into the
data stream at a 512-octet periodic interval in the TCP Sequence
Number Space. The Marker contains a 16-bit unsigned integer referred
to as the FPDUPTR (FPDU Pointer).
If the FPDUPTR's value is non-zero, the FPDU Pointer is a 16-bit
relative back-pointer. FPDUPTR MUST contain the number of octets in
the TCP stream from the beginning of the ULPDU Length field to the
first octet of the Marker, unless the Marker falls between FPDUs.
Thus, the location of the first octet of the previous FPDU header can
be determined by subtracting the value of the given Marker from the
current octet-stream sequence number (i.e., TCP sequence number) of
the first octet of the Marker. Note that this computation MUST take
into account that the TCP sequence number could have wrapped between
the Marker and the header.
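The back-pointer computation, including sequence number wrap, can be sketched as follows. This is a minimal Python illustration; the function name is hypothetical. A zero FPDUPTR is treated as meaning that the FPDU header immediately follows the Marker.

```python
SEQ_MODULUS = 1 << 32   # TCP sequence numbers are 32 bits wide and wrap
MARKER_SIZE = 4

def fpdu_header_seq(marker_seq, fpduptr):
    """TCP sequence number of the ULPDU_Length field a Marker refers to.

    marker_seq is the sequence number of the first octet of the Marker.
    """
    fpduptr &= 0xFFFC   # the two least significant bits are treated as zero
    if fpduptr == 0:
        # Marker fell between FPDUs: the header follows the Marker.
        return (marker_seq + MARKER_SIZE) % SEQ_MODULUS
    # Modular subtraction handles a wrap between the header and the Marker.
    return (marker_seq - fpduptr) % SEQ_MODULUS
```

For instance, a Marker at sequence number 8 with an FPDUPTR of 0x0014 refers to a header at 0xFFFFFFF4, 20 octets earlier and on the other side of the wrap.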
An FPDUPTR value of 0x0000 is a special case -- it is used when the
Marker falls exactly between FPDUs (between the preceding FPDU CRC
field and the next FPDU's ULPDU Length field). In this case, the
Marker is considered to be contained in the following FPDU; the
Marker MUST be included in the CRC calculation of the FPDU following
the Marker (if CRCs are being generated or checked). Thus, an
FPDUPTR value of 0x0000 means that immediately following the Marker
is an FPDU header (the ULPDU Length field).
Since all FPDUs are integral multiples of 4 octets, the bottom two
bits of the FPDUPTR as calculated by the sender are zero. MPA
reserves these bits so they MUST be treated as zero for computation
at the receiver.
When Markers are enabled (see Section 7.1, Connection Setup), the MPA
Markers MUST be inserted immediately preceding the first FPDU of Full
Operation Phase, and at every 512th octet of the TCP octet stream
thereafter. As a result, the first Marker has an FPDUPTR value of
0x0000. If the first Marker begins at octet sequence number
SeqStart, then Markers are inserted such that the first octet of the
Marker is at octet sequence number SeqNum if the remainder of (SeqNum
- SeqStart) mod 512 is zero. Note that SeqNum can wrap.
For example, if the TCP sequence number were used to calculate the
insertion point of the Marker, the starting TCP sequence number is
unlikely to be zero, and 512-octet multiples are unlikely to fall on
a modulo 512 of zero. If the MPA connection is started at TCP
sequence number 11, then the 1st Marker will begin at 11, and
subsequent Markers will begin at 523, 1035, etc.
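The placement rule can be expressed directly in code. The Python sketch below uses hypothetical helper names and assumes only what is stated above: Markers start at SeqStart and recur every 512 octets of TCP sequence space, which wraps at 2^32.

```python
MARKER_INTERVAL = 512
SEQ_MODULUS = 1 << 32   # TCP sequence space wraps at 2**32

def is_marker_start(seq_num, seq_start):
    """True if a Marker begins at seq_num, given the first Marker at seq_start."""
    return ((seq_num - seq_start) % SEQ_MODULUS) % MARKER_INTERVAL == 0

def marker_starts(seq_start, count):
    """Sequence numbers of the first `count` Markers."""
    return [(seq_start + i * MARKER_INTERVAL) % SEQ_MODULUS for i in range(count)]

print(marker_starts(11, 3))   # [11, 523, 1035]
```

Because 2^32 is a multiple of 512, the modular subtraction keeps the 512-octet spacing intact across a sequence number wrap.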
If an FPDU is large enough to contain multiple Markers, they MUST all
point to the same point in the TCP stream: the first octet of the
ULPDU Length field for the FPDU.
If a Marker interval contains multiple FPDUs (the FPDUs are small),
the Marker MUST point to the start of the ULPDU Length field for the
FPDU containing the Marker unless the Marker falls between FPDUs, in
which case the Marker MUST be zero.
The following example shows an FPDU containing a Marker.
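 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     ULPDU Length (0x0010)     |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
+                                                               +
|                      ULPDU (octets 0-9)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           (0x0000)            |       FPDU ptr (0x000C)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     ULPDU (octets 10-15)                      |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |      PAD (2 octets:0,0)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              CRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+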
Figure 4: Example FPDU Format with Marker
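The layout in Figure 4 can be reproduced with a small sender-side sketch. The Python function below is hypothetical and simplified: it interleaves Markers into FPDUs that were built without them, assumes network byte order for the Marker fields, and leaves the CRC field as a placeholder. A real sender has to compute each CRC over the Markers contained in the FPDU, so Marker insertion and CRC generation cannot be separated this way in practice.

```python
import struct

MARKER_INTERVAL = 512   # octets from the start of one Marker to the next
MARKER_SIZE = 4

def insert_markers(fpdus, octets_until_marker=0):
    """Interleave Markers with Marker-free FPDUs (each a multiple of 4 octets).

    octets_until_marker is the number of stream octets left before the next
    Marker is due; 0 means a Marker is due immediately.
    """
    out = bytearray()
    countdown = octets_until_marker
    for fpdu in fpdus:
        header_pos = None                 # stream offset of ULPDU_Length
        for offset, octet in enumerate(fpdu):
            if countdown == 0:
                # Between FPDUs the pointer is zero; otherwise it is the
                # distance back to this FPDU's ULPDU_Length field.
                ptr = 0 if header_pos is None else len(out) - header_pos
                out += struct.pack("!HH", 0, ptr)   # RESERVED, FPDUPTR
                countdown = MARKER_INTERVAL - MARKER_SIZE
            if offset == 0:
                header_pos = len(out)
            out.append(octet)
            countdown -= 1
    # A Marker that becomes due exactly at the end of the input is emitted
    # with the next FPDU, which is the FPDU that contains it.
    return bytes(out)
```

With the 24-octet FPDU of Figure 4 (ULPDU Length 0x0010, 2 pad octets) and 12 octets remaining before the next Marker, the Marker lands after ULPDU octet 9 and carries an FPDUPTR of 0x000C.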
MPA Receivers MUST preserve ULPDU boundaries when passing data to
DDP. MPA Receivers MUST pass the ULPDU data and the ULPDU Length to
DDP and not the Markers, headers, and CRC.
4.4. CRC Calculation
An MPA implementation MUST implement CRC support and MUST either:
(1) always use CRCs; the MPA provider is not REQUIRED to support an
administrator's request that CRCs not be used.
(2a) only indicate a preference not to use CRCs on the explicit
request of the system administrator, via an interface not
defined in this spec. The default configuration for a
connection MUST be to use CRCs.
(2b) disable CRC checking (and possibly generation) if both the local
and remote endpoints indicate preference not to use CRCs.
An administrative decision to have a host request CRC suppression
SHOULD NOT be made unless there is assurance that the TCP connection
involved provides protection from undetected errors that is at least
as strong as an end-to-end CRC32c. End-to-end usage of an IPsec
cryptographic integrity check is among the ways to provide such
protection, and the use of channel bindings [NFSv4CHANNEL] by the ULP
can provide a high level of assurance that the IPsec protection scope
is end-to-end with respect to the ULP.
The process MUST be invisible to the ULP.
After receipt of an MPA startup declaration indicating that its peer
requires CRCs, an MPA instance MUST continue generating and checking
CRCs until the connection terminates. If an MPA instance has
declared that it does not require CRCs, it MUST turn off CRC checking
immediately after receipt of an MPA mode declaration indicating that
its peer also does not require CRCs. It MAY continue generating
CRCs. See Section 7.1, Connection Setup, for details on the MPA
startup.
When sending an FPDU, the sender MUST include a CRC field. When CRCs
are enabled, the CRC field in the MPA FPDU MUST be computed using the
CRC32c polynomial in the manner described in the iSCSI Protocol
[iSCSI] document for Header and Data Digests.
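For reference, a bit-at-a-time CRC32c can be sketched in a few lines of Python. This is an illustrative, unoptimized routine using the commonly published reflected form of the CRC32c (Castagnoli) polynomial. It computes only the 32-bit check value; how that value is serialized into the CRC field follows [iSCSI] and is not shown here.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78   # reflected form of polynomial 0x1EDC6F41

def crc32c(data):
    """Return the CRC32c check value of `data` as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF                       # initial value: all ones
    for octet in data:
        crc ^= octet
        for _ in range(8):                 # process one bit at a time
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                # final inversion

print(hex(crc32c(b"123456789")))   # 0xe3069283
```

The value 0xE3069283 for the ASCII string "123456789" is the standard check value for CRC32c. Production implementations would normally use a table-driven or hardware-assisted routine instead.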
The fields which MUST be included in the CRC calculation when sending
an FPDU are as follows:
1) If a Marker does not immediately precede the ULPDU Length field,
the CRC-32c is calculated from the first octet of the ULPDU
Length field, through all the ULPDU and Markers (if present), to
the last octet of the PAD (if present), inclusive. If there is a
Marker immediately following the PAD, the Marker is included in
the CRC calculation for this FPDU.
2) If a Marker immediately precedes the first octet of the ULPDU
Length field of the FPDU, (i.e., the Marker fell between FPDUs,
and thus is required to be included in the second FPDU), the
CRC-32c is calculated from the first octet of the Marker, through
the ULPDU Length header, through all the ULPDU and Markers (if
present), to the last octet of the PAD (if present), inclusive.
3) After calculating the CRC-32c, the resultant value is placed into
the CRC field at the end of the FPDU.
When an FPDU is received, and CRC checking is enabled, the receiver
MUST first perform the following:
1) Calculate the CRC of the incoming FPDU in the same fashion as
defined above.
2) Verify that the calculated CRC-32c value is the same as the
received CRC-32c value found in the FPDU CRC field. If not, the
receiver MUST treat the FPDU as an invalid FPDU.
The procedure for handling invalid FPDUs is covered in Section 8.
The following is an annotated hex dump of an example FPDU sent as the
first FPDU on the stream. As such, it starts with a Marker. The
FPDU contains a 42-octet ULPDU (an example DDP segment), which in turn
carries a 24-octet data load that is all zeros. The CRC32c has been
correctly calculated and can be
used as a reference. See the [DDP] and [RDMAP] specification for
definitions of the DDP Control field, Queue, MSN, MO, and Send Data.
The following is an example sent as the second FPDU of the stream
where the first FPDU (which is not shown here) had a length of 492
octets and was also a Send to Queue 0 with Last Flag set. This
example contains a Marker.
Octet Contents Annotation
01ec 00 Length
01ee 41 DDP Control Field: Send with Last Flag set
01f0 00 Reserved (DDP STag position with no STag)
01f4 00 DDP Queue = 0
01f8 00 DDP MSN = 2
01fc 00 DDP MO = 0
0200 00 Marker: Reserved
0202 00 Marker: FPDUPTR
0204 00 DDP Send Data (24 octets of zeros)
021c 84 CRC32c
Figure 6: Annotated Hex Dump of an FPDU with Marker
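The offsets in Figure 6 are consistent with the Marker rules of Section 4.3, as the following Python sketch shows. It is derived from the octet offsets in the annotation column rather than from the dump contents, and it assumes that the first Marker of the stream sits at offset 0.

```python
MARKER_INTERVAL = 512
MARKER_SIZE = 4

def figure6_layout():
    """Derive the FPDUPTR and the CRC offset from the offsets in Figure 6."""
    fpdu_start = 0x01EC             # offset of the ULPDU_Length field
    ddp_header = 0x0200 - 0x01EE    # 18 octets between Length and the Marker
    send_data = 24                  # octets of zeros after the Marker
    ulpdu_len = ddp_header + send_data
    # First Marker position at or after the FPDU start (Markers at 0, 512, ...).
    marker = -(-fpdu_start // MARKER_INTERVAL) * MARKER_INTERVAL
    fpduptr = marker - fpdu_start
    pad = -(2 + ulpdu_len) % 4
    crc_offset = fpdu_start + 2 + ulpdu_len + pad + MARKER_SIZE
    return ulpdu_len, marker, fpduptr, crc_offset

print([hex(v) for v in figure6_layout()])
```

Under these assumptions the ULPDU is 42 octets long, the Marker starts at 0x0200 with an FPDUPTR of 0x0014, and the CRC field starts at 0x021c, matching the offsets listed in the figure.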
4.5. FPDU Size Considerations
MPA defines the Maximum Upper Layer Protocol Data Unit (MULPDU) as
the size of the largest ULPDU fitting in an FPDU. For an empty TCP
Segment, MULPDU is EMSS minus the FPDU overhead (6 octets) minus
space for Markers and pad octets.
The maximum ULPDU Length for a single ULPDU when Markers are
present MUST be computed as:
MULPDU = EMSS - (6 + 4 * Ceiling(EMSS / 512) + EMSS mod 4)
The formula above accounts for the worst-case number of Markers.
The maximum ULPDU Length for a single ULPDU when Markers are NOT
present MUST be computed as:
MULPDU = EMSS - (6 + EMSS mod 4)
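The two formulas translate directly into code. The Python sketch below is illustrative; the function name and parameter names are hypothetical, and it implements only the formulas as written, without the 128-octet lower bound discussed later in this section.

```python
def mulpdu(emss, markers_enabled=True):
    """MULPDU for an empty TCP segment of size `emss`, per the formulas above."""
    overhead = 6 + emss % 4                    # header + CRC, plus pad allowance
    if markers_enabled:
        overhead += 4 * ((emss + 511) // 512)  # 4 * Ceiling(EMSS / 512)
    return emss - overhead

print(mulpdu(1460), mulpdu(1460, markers_enabled=False))   # 1442 1454
```

For an EMSS of 1460 octets, three Markers are budgeted, giving a MULPDU of 1442 octets with Markers and 1454 octets without.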
As a further optimization of the wire efficiency an MPA
implementation MAY dynamically adjust the MULPDU (see Section 5 for
latency and wire efficiency trade-offs). When one or more FPDUs are
already packed into a TCP Segment, MULPDU MAY be reduced accordingly.
DDP SHOULD provide ULPDUs that are as large as possible, but less
than or equal to MULPDU.
If the TCP implementation needs to adjust EMSS to support MTU changes
or changing TCP options, the MULPDU value is changed accordingly.
In certain rare situations, the EMSS may shrink below 128 octets in
size. If this occurs, the MPA on TCP sender MUST NOT shrink the
MULPDU below 128 octets and is not required to follow the
segmentation rules in Section 5.1 and Appendix A.
If one or more FPDUs are already packed into a TCP segment, such that
the remaining room is less than 128 octets, MPA MUST NOT provide a
MULPDU smaller than 128. In this case, MPA would typically provide a
MULPDU for the next full sized segment, but may still pack the next
FPDU into the small remaining room, provided that the next FPDU is
small enough to fit.
The value 128 is chosen so as to allow DDP designers room for the DDP
Header and some user data.