RFC 8540

Stream Control Transmission Protocol: Errata and Issues in RFC 4960

Pages: 94
Obsoleted by: 9260

3.26.  Increasing the cwnd in the Congestion Avoidance Phase

3.26.1.  Description of the Problem

   Section 7.2.2 of [RFC4960] prescribes that cwnd be increased by 1*MTU
   per RTT if the sender has cwnd or more bytes of data outstanding to
   the corresponding address in the congestion avoidance phase.
   However, this is described without normative language.  Moreover,
   Section 7.2.2 of [RFC4960] includes an algorithm that specifies how
   an implementation can achieve this, but this algorithm is
   underspecified and actually allows increasing cwnd by more than 1*MTU
   per RTT.

3.26.2.  Text Changes to the Document

   ---------
   Old text: (Section 7.2.2)
   ---------

   When cwnd is greater than ssthresh, cwnd should be incremented by
   1*MTU per RTT if the sender has cwnd or more bytes of data
   outstanding for the corresponding transport address.

   ---------
   New text: (Section 7.2.2)
   ---------

   When cwnd is greater than ssthresh, cwnd SHOULD be incremented by
   1*MTU per RTT if the sender has cwnd or more bytes of data
   outstanding for the corresponding transport address.  The basic
   guidelines for incrementing cwnd during congestion avoidance are as
   follows:

   o  SCTP MAY increment cwnd by 1*MTU.

   o  SCTP SHOULD increment cwnd by 1*MTU once per RTT when the sender
      has cwnd or more bytes of data outstanding for the corresponding
      transport address.

   o  SCTP MUST NOT increment cwnd by more than 1*MTU per RTT.

   This text is in final form and is not further updated in this
   document.

   ---------
   Old text: (Section 7.2.2)
   ---------

   o  Whenever cwnd is greater than ssthresh, upon each SACK arrival
      that advances the Cumulative TSN Ack Point, increase
      partial_bytes_acked by the total number of bytes of all new chunks
      acknowledged in that SACK including chunks acknowledged by the new
      Cumulative TSN Ack and by Gap Ack Blocks.

   o  When partial_bytes_acked is equal to or greater than cwnd and
      before the arrival of the SACK the sender had cwnd or more bytes
      of data outstanding (i.e., before arrival of the SACK, flightsize
      was greater than or equal to cwnd), increase cwnd by MTU, and
      reset partial_bytes_acked to (partial_bytes_acked - cwnd).

   ---------
   New text: (Section 7.2.2)
   ---------

   o  Whenever cwnd is greater than ssthresh, upon each SACK arrival,
      increase partial_bytes_acked by the total number of bytes of all
      new chunks acknowledged in that SACK, including chunks
      acknowledged by the new Cumulative TSN Ack, by Gap Ack Blocks,
      and by the number of bytes of duplicated chunks reported in
      Duplicate TSNs.

   o  (1) when partial_bytes_acked is greater than cwnd and (2) before
      the arrival of the SACK the sender had less than cwnd bytes of
      data outstanding (i.e., before the arrival of the SACK, flightsize
      was less than cwnd), reset partial_bytes_acked to cwnd.

   o  (1) when partial_bytes_acked is equal to or greater than cwnd and
      (2) before the arrival of the SACK the sender had cwnd or more
      bytes of data outstanding (i.e., before the arrival of the SACK,
      flightsize was greater than or equal to cwnd), partial_bytes_acked
      is reset to (partial_bytes_acked - cwnd).  Next, cwnd is increased
      by 1*MTU.

   This text has been modified by multiple errata.  It includes
   modifications from Sections 3.12 and 3.22.  It is in final form and
   is not further updated in this document.

3.26.3.  Solution Description

   The basic guidelines for incrementing cwnd during the congestion
   avoidance phase are added into Section 7.2.2.  The guidelines include
   the normative language and are aligned with [RFC5681].

   The algorithm from Section 7.2.2 is improved and now does not allow
   increasing cwnd by more than 1*MTU per RTT.
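
   The revised bullets can be read as a small per-SACK state update.
   The following Python sketch is illustrative only: the PathState
   container, the function name, and the argument names are
   hypothetical, and the caller is assumed to have already summed the
   bytes newly acknowledged by the SACK (including bytes reported in
   Duplicate TSNs).

```python
from dataclasses import dataclass


@dataclass
class PathState:
    """Per-destination congestion state (hypothetical container)."""
    cwnd: int
    ssthresh: int
    mtu: int
    partial_bytes_acked: int = 0


def on_sack_in_congestion_avoidance(path, newly_acked_bytes,
                                    flightsize_before_sack):
    """Apply the revised Section 7.2.2 bullets for one SACK arrival.

    newly_acked_bytes is assumed to cover the Cumulative TSN Ack,
    Gap Ack Blocks, and Duplicate TSNs of this SACK.
    """
    if path.cwnd <= path.ssthresh:
        return  # not in congestion avoidance; slow start is out of scope

    path.partial_bytes_acked += newly_acked_bytes

    if flightsize_before_sack < path.cwnd:
        # The window was not fully used: cap the counter, do not grow.
        if path.partial_bytes_acked > path.cwnd:
            path.partial_bytes_acked = path.cwnd
    elif path.partial_bytes_acked >= path.cwnd:
        # Subtract the old cwnd first, then grow cwnd by exactly 1*MTU.
        path.partial_bytes_acked -= path.cwnd
        path.cwnd += path.mtu
```

   Because the old cwnd is subtracted before cwnd grows, a single SACK
   raises cwnd by at most one MTU in this sketch, however many bytes it
   acknowledges.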

3.27.  Refresh of cwnd and ssthresh after Idle Period

3.27.1.  Description of the Problem

   [RFC4960] prescribes that cwnd be adjusted per RTO while the
   endpoint does not transmit data on a given transport address.  In
   addition, it prescribes that cwnd be set to the initial value after
   a sufficiently long idle period.  The latter is excessive.  Moreover,
   what is considered a sufficiently long idle period is unclear.

   [RFC4960] doesn't specify the handling of ssthresh in the idle case.
   If ssthresh is reduced due to packet loss, ssthresh is never
   recovered.  So, traffic can end up in congestion avoidance all the
   time, resulting in a low sending rate and bad performance.  The
   problem is even more serious for SCTP: in a multi-homed SCTP
   association, traffic that switches back to the previously failed
   primary path will also lead to the situation where traffic ends up in
   congestion avoidance.

3.27.2.  Text Changes to the Document

   ---------
   Old text: (Section 7.2.1)
   ---------

   o  The initial cwnd before DATA transmission or after a sufficiently
      long idle period MUST be set to min(4*MTU, max (2*MTU, 4380
      bytes)).

   ---------
   New text: (Section 7.2.1)
   ---------

   o  The initial cwnd before data transmission MUST be set to
      min(4*MTU, max (2*MTU, 4380 bytes)).

   ---------
   Old text: (Section 7.2.1)
   ---------

   o  When the endpoint does not transmit data on a given transport
      address, the cwnd of the transport address should be adjusted to
      max(cwnd/2, 4*MTU) per RTO.

   ---------
   New text: (Section 7.2.1)
   ---------

   o  While the endpoint does not transmit data on a given transport
      address, the cwnd of the transport address SHOULD be adjusted to
      max(cwnd/2, 4*MTU) once per RTO.  Before the first cwnd
      adjustment, the ssthresh of the transport address SHOULD be set to
      the cwnd.

   This text is in final form and is not further updated in this
   document.

3.27.3.  Solution Description

   A rule about cwnd adjustment after a sufficiently long idle period is
   removed.

   The text is updated to describe the handling of ssthresh.  When the
   idle period is detected, the cwnd value is copied to ssthresh.
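
   As an illustration, the initial cwnd formula and the idle-period
   rule can be sketched in Python as follows.  The function names are
   hypothetical, integer division for cwnd/2 is an assumption, and the
   caller is assumed to count how many RTO intervals have elapsed
   without any data being sent on the address.

```python
def initial_cwnd(mtu):
    """Initial cwnd from Section 7.2.1: min(4*MTU, max(2*MTU, 4380))."""
    return min(4 * mtu, max(2 * mtu, 4380))


def idle_decay(cwnd, ssthresh, mtu, idle_rtos):
    """Return (cwnd, ssthresh) after idle_rtos idle RTO intervals.

    The formula max(cwnd/2, 4*MTU) is applied literally, once per RTO.
    """
    for i in range(idle_rtos):
        if i == 0:
            # Before the first adjustment, remember the cwnd in ssthresh
            # so slow start can later grow back to the old operating point.
            ssthresh = cwnd
        cwnd = max(cwnd // 2, 4 * mtu)
    return cwnd, ssthresh
```

   For example, with an MTU of 1500 bytes, a cwnd of 40000 bytes decays
   to 20000, 10000, and then 6000 bytes over three idle RTOs, while
   ssthresh is set to 40000 bytes once.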

3.28.  Window Updates after Receiver Window Opens Up

3.28.1.  Description of the Problem

   The sending of SACK chunks for window updates is only indirectly
   referenced in Section 6.2 of [RFC4960], which states that an SCTP
   receiver must not generate more than one SACK for every incoming
   packet, other than to update the offered window.

   However, to avoid performance problems, it is necessary to send the
   window updates when the receiver window opens up.

3.28.2.  Text Changes to the Document

   ---------
   Old text: (Section 6.2)
   ---------

   An SCTP receiver MUST NOT generate more than one SACK for every
   incoming packet, other than to update the offered window as the
   receiving application consumes new data.

   ---------
   New text: (Section 6.2)
   ---------

   An SCTP receiver MUST NOT generate more than one SACK for every
   incoming packet, other than to update the offered window as the
   receiving application consumes new data.  When the window opens up,
   an SCTP receiver SHOULD send additional SACK chunks to update the
   window even if no new data is received.  The receiver MUST avoid
   sending a large number of window updates -- in particular, large
   bursts of them.  One way to achieve this is to send a window update
   only if the window can be increased by at least a quarter of the
   receive buffer size of the association.

   This text is in final form and is not further updated in this
   document.

3.28.3.  Solution Description

   The new text makes it clear that additional SACK chunks for window
   updates should be sent as long as excessive bursts are avoided.
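
   The quarter-of-the-buffer heuristic mentioned in the new text could
   be checked as in the following Python sketch.  The function and
   argument names are hypothetical; the sketch assumes that the
   receiver remembers the window it last advertised in a SACK.

```python
def should_send_window_update(last_advertised_rwnd, current_rwnd,
                              rcvbuf_size):
    """Return True if a SACK should be sent only to update the window.

    A window update is sent when the window can grow by at least a
    quarter of the association's receive buffer size.
    """
    increase = current_rwnd - last_advertised_rwnd
    # Compare 4*increase with the buffer size to avoid rounding a
    # fractional quarter.
    return increase > 0 and 4 * increase >= rcvbuf_size
```

   This is only one way to limit the number of window updates; other
   rate-limiting schemes are equally compatible with the text.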

3.29.  Path of DATA and Reply Chunks

3.29.1.  Description of the Problem

   Section 6.4 of [RFC4960] describes the transmission policy for
   multi-homed SCTP endpoints.  However, this policy has the following
   issues:

   o  It states that a SACK should be sent to the source address of an
      incoming DATA.  However, it is known that other SACK policies
      (e.g., always sending SACKs to the primary path) may be more
      beneficial in some situations.

   o  Also, it initially states that an endpoint should always transmit
      DATA chunks to the primary path but then states that the rule for
      the transmittal of reply chunks should also be followed if the
      endpoint is bundling DATA chunks together with the reply chunk.
      The second statement contradicts the first statement.  Some
      implementations were having problems with it and sent DATA chunks
      bundled with reply chunks to a different destination address than
      the primary path, causing many gaps.

3.29.2.  Text Changes to the Document

   ---------
   Old text: (Section 6.4)
   ---------

   An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
   etc.) to the same destination transport address from which it
   received the DATA or control chunk to which it is replying.  This
   rule should also be followed if the endpoint is bundling DATA chunks
   together with the reply chunk.

   However, when acknowledging multiple DATA chunks received in packets
   from different source addresses in a single SACK, the SACK chunk may
   be transmitted to one of the destination transport addresses from
   which the DATA or control chunks being acknowledged were received.

   ---------
   New text: (Section 6.4)
   ---------

   An endpoint SHOULD transmit reply chunks (e.g., INIT ACK, COOKIE ACK,
   HEARTBEAT ACK) in response to control chunks to the same destination
   transport address from which it received the control chunk to which
   it is replying.

   The selection of the destination transport address for packets
   containing SACK chunks is implementation dependent.  However, an
   endpoint SHOULD NOT vary the destination transport address of a SACK
   when it receives DATA chunks coming from the same source address.

   When acknowledging multiple DATA chunks received in packets from
   different source addresses in a single SACK, the SACK chunk MAY be
   transmitted to one of the destination transport addresses from which
   the DATA or control chunks being acknowledged were received.

   This text is in final form and is not further updated in this
   document.

3.29.3.  Solution Description

   The SACK transmission policy is left implementation dependent, but
   the new text now specifies that the policy not vary the destination
   address of a packet containing a SACK chunk unless there are reasons
   for doing so, as varying the destination address may negatively
   impact RTT measurement.

   The new text also removes a confusing statement that prescribes
   following the rule for transmittal of reply chunks when the endpoint
   is bundling DATA chunks together with the reply chunk.

3.30.  "Outstanding Data", "Flightsize", and "Data in Flight" Key Terms

3.30.1.  Description of the Problem

   [RFC4960] uses the key terms "outstanding data", "flightsize", and
   "data in flight" in formulas and statements, but Section 1.3
   ("Key Terms") of [RFC4960] does not provide their definitions.
   Furthermore, outstanding data does not include DATA chunks that are
   classified as lost but that have not yet been retransmitted, yet a
   paragraph in Section 6.1 of [RFC4960] contradicts this by referring
   to outstanding DATA chunks that are marked for retransmission.

3.30.2.  Text Changes to the Document

   ---------
   Old text: (Section 1.3)
   ---------

   o  Congestion window (cwnd): An SCTP variable that limits the data,
      in number of bytes, a sender can send to a particular destination
      transport address before receiving an acknowledgement.

   ...

   o  Outstanding TSN (at an SCTP endpoint): A TSN (and the associated
      DATA chunk) that has been sent by the endpoint but for which it
      has not yet received an acknowledgement.

   ---------
   New text: (Section 1.3)
   ---------

   o  Congestion window (cwnd): An SCTP variable that limits outstanding
      data, in number of bytes, that a sender can send to a particular
      destination transport address before receiving an acknowledgement.

   ...

   o  Flightsize: The amount of bytes of outstanding data to a
      particular destination transport address at any given time.

   ...

   o  Outstanding data (or "data outstanding" or "data in flight"): The
      total amount of the DATA chunks associated with outstanding TSNs.
      A retransmitted DATA chunk is counted once in outstanding data.  A
      DATA chunk that is classified as lost but that has not yet been
      retransmitted is not in outstanding data.

   o  Outstanding TSN (at an SCTP endpoint): A TSN (and the associated
      DATA chunk) that has been sent by the endpoint but for which it
      has not yet received an acknowledgement.

   This text is in final form and is not further updated in this
   document.

   ---------
   Old text: (Section 6.1)
   ---------

   C) When the time comes for the sender to transmit, before sending new
      DATA chunks, the sender MUST first transmit any outstanding DATA
      chunks that are marked for retransmission (limited by the current
      cwnd).

   ---------
   New text: (Section 6.1)
   ---------

   C) When the time comes for the sender to transmit, before sending new
      DATA chunks, the sender MUST first transmit any DATA chunks that
      are marked for retransmission (limited by the current cwnd).

   This text is in final form and is not further updated in this
   document.

3.30.3.  Solution Description

   Section 1.3 is corrected to include explanations of the key terms
   "outstanding data", "data in flight", and "flightsize".  Section 6.1
   is corrected to now use "any DATA chunks" instead of "any outstanding
   DATA chunks".
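
   To illustrate the new definitions, the following Python sketch
   computes flightsize from a sender's list of DATA chunks.  The
   ChunkState and DataChunk types are hypothetical, as is the choice
   of what "size" counts for a chunk.  Each chunk is kept as a single
   record, so a retransmitted chunk is counted once, and a chunk that
   is marked for retransmission but not yet resent is excluded.

```python
from dataclasses import dataclass
from enum import Enum


class ChunkState(Enum):
    SENT = 1                # transmitted, no acknowledgement received yet
    MARKED_FOR_RETRANS = 2  # classified as lost, not yet retransmitted
    ACKED = 3               # acknowledged by the peer


@dataclass
class DataChunk:
    tsn: int
    size: int          # bytes counted for this chunk (assumed unit)
    destination: str   # address the latest transmission was sent to
    state: ChunkState


def flightsize(chunks, destination):
    """Bytes of outstanding data towards one destination address."""
    return sum(
        c.size
        for c in chunks
        if c.destination == destination and c.state is ChunkState.SENT
    )
```

   Once a chunk marked for retransmission is actually resent, its
   state returns to SENT and it is counted again, once, against the
   destination it was resent to.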

3.31.  Degradation of cwnd due to Max.Burst

3.31.1.  Description of the Problem

   Some implementations were experiencing a degradation of cwnd because
   of the Max.Burst limit.  This was due to misinterpretation of the
   suggestion in Section 6.1 of [RFC4960] regarding how to use the
   Max.Burst parameter when calculating the number of packets to
   transmit.

3.31.2.  Text Changes to the Document

   ---------
   Old text: (Section 6.1)
   ---------

   D) When the time comes for the sender to transmit new DATA chunks,
      the protocol parameter Max.Burst SHOULD be used to limit the
      number of packets sent.  The limit MAY be applied by adjusting
      cwnd as follows:

      if((flightsize + Max.Burst*MTU) < cwnd) cwnd = flightsize +
      Max.Burst*MTU

      Or it MAY be applied by strictly limiting the number of packets
      emitted by the output routine.

   ---------
   New text: (Section 6.1)
   ---------

   D) When the time comes for the sender to transmit new DATA chunks,
      the protocol parameter Max.Burst SHOULD be used to limit the
      number of packets sent.  The limit MAY be applied by adjusting
      cwnd temporarily, as follows:

      if ((flightsize + Max.Burst*MTU) < cwnd)
          cwnd = flightsize + Max.Burst*MTU

      Or, it MAY be applied by strictly limiting the number of packets
      emitted by the output routine.  When calculating the number of
      packets to transmit, and particularly when using the formula
      above, cwnd SHOULD NOT be changed permanently.

   This text is in final form and is not further updated in this
   document.

3.31.3.  Solution Description

   The new text clarifies that cwnd should not be changed permanently
   when applying the Max.Burst limit.  This mitigates packet bursts
   related to the reception of SACK chunks but not bursts related to an
   application sending a burst of user messages.
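
   One way to keep the adjustment temporary is to compute a limited
   window for the current send opportunity and leave the stored cwnd
   untouched.  The Python sketch below assumes this approach; the
   function name effective_cwnd and its arguments are hypothetical.

```python
def effective_cwnd(cwnd, flightsize, max_burst, mtu):
    """Return the window to use for this send opportunity only.

    Mirrors the formula in rule D without modifying the stored cwnd:
    if flightsize + Max.Burst*MTU is below cwnd, that smaller value
    limits the burst; otherwise cwnd applies unchanged.
    """
    burst_limit = flightsize + max_burst * mtu
    return min(cwnd, burst_limit)
```

   The caller keeps using its stored cwnd for all congestion control
   updates, so the Max.Burst limit cannot degrade cwnd over time.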

3.32.  Reduction of RTO.Initial

3.32.1.  Description of the Problem

   [RFC4960] uses 3 seconds as the default value for RTO.Initial in
   accordance with Section 4.2.3.1 of [RFC1122].  [RFC6298] updates
   [RFC1122] and lowers the initial value of the retransmission timer
   from 3 seconds to 1 second.

3.32.2.  Text Changes to the Document

   ---------
   Old text: (Section 15)
   ---------

   The following protocol parameters are RECOMMENDED:

      RTO.Initial - 3 seconds
      RTO.Min - 1 second
      RTO.Max - 60 seconds
      Max.Burst - 4
      RTO.Alpha - 1/8
      RTO.Beta - 1/4
      Valid.Cookie.Life - 60 seconds
      Association.Max.Retrans - 10 attempts
      Path.Max.Retrans - 5 attempts (per destination address)
      Max.Init.Retransmits - 8 attempts
      HB.interval - 30 seconds
      HB.Max.Burst - 1

   ---------
   New text: (Section 15)
   ---------

   The following protocol parameters are RECOMMENDED:

      RTO.Initial: 1 second
      RTO.Min: 1 second
      RTO.Max: 60 seconds
      Max.Burst: 4
      RTO.Alpha: 1/8
      RTO.Beta: 1/4
      Valid.Cookie.Life: 60 seconds
      Association.Max.Retrans: 10 attempts
      Path.Max.Retrans: 5 attempts (per destination address)
      Max.Init.Retransmits: 8 attempts
      HB.interval: 30 seconds
      HB.Max.Burst: 1
      SACK.Delay: 200 milliseconds

   This text has been modified by multiple errata.  It includes
   modifications from Section 3.24.  It is in final form and is not
   further updated in this document.

3.32.3.  Solution Description

   The default value for RTO.Initial has been lowered to 1 second to be
   in tune with [RFC6298].

3.33.  Ordering of Bundled SACK and ERROR Chunks

3.33.1.  Description of the Problem

   When an SCTP endpoint receives a DATA chunk with an invalid stream
   identifier, it shall acknowledge it by sending a SACK chunk and
   indicate that the stream identifier was invalid by sending an ERROR
   chunk.  These two chunks may be bundled.  However, in the case of
   bundling, [RFC4960] requires that the ERROR chunk follow the SACK
   chunk.  This restriction regarding the ordering of the chunks is not
   necessary and might limit interoperability.

3.33.2.  Text Changes to the Document

   ---------
   Old text: (Section 6.5)
   ---------

   Every DATA chunk MUST carry a valid stream identifier.  If an
   endpoint receives a DATA chunk with an invalid stream identifier, it
   shall acknowledge the reception of the DATA chunk following the
   normal procedure, immediately send an ERROR chunk with cause set to
   "Invalid Stream Identifier" (see Section 3.3.10), and discard the
   DATA chunk.  The endpoint may bundle the ERROR chunk in the same
   packet as the SACK as long as the ERROR follows the SACK.

   ---------
   New text: (Section 6.5)
   ---------

   Every DATA chunk MUST carry a valid stream identifier.  If an
   endpoint receives a DATA chunk with an invalid stream identifier, it
   SHOULD acknowledge the reception of the DATA chunk following the
   normal procedure, immediately send an ERROR chunk with cause set to
   "Invalid Stream Identifier" (see Section 3.3.10), and discard the
   DATA chunk.  The endpoint MAY bundle the ERROR chunk and the SACK
   chunk in the same packet.

   This text is in final form and is not further updated in this
   document.

3.33.3.  Solution Description

   The unnecessary restriction regarding the ordering of the SACK and
   ERROR chunks has been removed.

3.34.  Undefined Parameter Returned by RECEIVE Primitive

3.34.1.  Description of the Problem

   [RFC4960] provides a description of an abstract API.  In the
   definition of the RECEIVE primitive, an optional parameter with name
   "delivery number" is mentioned.  However, no definition of this
   parameter is given in [RFC4960], and the parameter is unnecessary.

3.34.2.  Text Changes to the Document

   ---------
   Old text: (Section 10.1 G))
   ---------

   G) Receive

   Format: RECEIVE(association id, buffer address, buffer size
           [,stream id])
   -> byte count [,transport address] [,stream id] [,stream sequence
      number] [,partial flag] [,delivery number] [,payload protocol-id]

   ---------
   New text: (Section 10.1 G))
   ---------

   G) Receive

   Format: RECEIVE(association id, buffer address, buffer size
           [,stream id])
   -> byte count [,transport address] [,stream id] [,stream sequence
      number] [,partial flag] [,payload protocol-id]

   This text is in final form and is not further updated in this
   document.

3.34.3.  Solution Description

   The undefined parameter has been removed.

3.35.  DSCP Changes

3.35.1.  Description of the Problem

   The upper layer can change the Differentiated Services Code Point
   (DSCP) used for packets being sent.  Changing the DSCP can result in
   packets hitting different queues on the path.  Therefore, the
   congestion control parameters should be reset to their initial
   values when the DSCP is changed by the upper layer.  This is not
   described in [RFC4960].

3.35.2.  Text Changes to the Document

   ---------
   New text: (Section 7.2.5)
   ---------

   7.2.5.  Making Changes to Differentiated Services Code Points

      SCTP implementations MAY allow an application to configure the
      Differentiated Services Code Point (DSCP) used for sending
      packets.  If a DSCP change might result in outgoing packets being
      queued in different queues, the congestion control parameters for
      all affected destination addresses MUST be reset to their initial
      values.

   This text is in final form and is not further updated in this
   document.

   ---------
   Old text: (Section 10.1 M))
   ---------

   Mandatory attributes:

   o  association id - local handle to the SCTP association.

   o  protocol parameter list - the specific names and values of the
      protocol parameters (e.g., Association.Max.Retrans; see
      Section 15) that the SCTP user wishes to customize.

   ---------
   New text: (Section 10.1 M))
   ---------

   Mandatory attributes:

   o  association id - local handle to the SCTP association.

   o  protocol parameter list - the specific names and values of the
      protocol parameters (e.g., Association.Max.Retrans (see
      Section 15), or other parameters like the DSCP) that the SCTP user
      wishes to customize.

   This text is in final form and is not further updated in this
   document.

3.35.3.  Solution Description

   Text describing the required action for DSCP changes has been added.
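
   A reset of the congestion control parameters on a DSCP change might
   look like the following Python sketch.  The PathCongestionState
   type and the function names are hypothetical.  Which variables
   count as "congestion control parameters" and the initial ssthresh
   value passed in by the caller are assumptions of this sketch, not
   statements of the new text.

```python
from dataclasses import dataclass


def initial_cwnd(mtu):
    """Initial cwnd from Section 7.2.1: min(4*MTU, max(2*MTU, 4380))."""
    return min(4 * mtu, max(2 * mtu, 4380))


@dataclass
class PathCongestionState:
    mtu: int
    cwnd: int
    ssthresh: int
    partial_bytes_acked: int = 0


def on_dscp_change(paths, affected, initial_ssthresh):
    """Reset congestion state for every affected destination address.

    paths maps a destination address to its PathCongestionState;
    affected lists the addresses whose packets may now be queued
    differently; initial_ssthresh is chosen by the caller (assumed).
    """
    for address in affected:
        state = paths[address]
        state.cwnd = initial_cwnd(state.mtu)
        state.ssthresh = initial_ssthresh
        state.partial_bytes_acked = 0
```

   Destinations that are not affected by the DSCP change keep their
   current congestion state.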
