3.6. Offer/Answer Model Extensions
In this section, we define extensions to the offer/answer model
defined in [RFC3264] to allow for potential configurations to be
included in an offer, where they constitute alternative offers that
may be accepted by the answerer instead of the actual
configuration(s) included in the "m=" line(s).
The procedures defined in the following subsections apply to both
unicast and multicast streams.
3.6.1. Generating the Initial Offer
An offerer that wants to use the SDP Capability Negotiation defined
in this document MUST include the following in the offer:
o Zero or more attribute capability attributes. There MUST be an
attribute capability attribute ("a=acap") as defined in Section
3.4.1 for each attribute name and associated value (if any) that
needs to be indicated as a capability in the offer. Attribute
capabilities may be included irrespective of whether or not they
are referenced by a potential configuration.
Session-level attributes and associated values MUST be provided in
attribute capabilities only at the session level, whereas media-
level attributes and associated values can be provided in
attribute capabilities at either the media level or session level.
Attributes that are allowed at either the session or media level
can be provided in attribute capabilities at either level.
o Zero or more transport protocol capability attributes. There MUST
be transport protocol capabilities as defined in Section 3.4.2
with values for each transport protocol that needs to be indicated
as a capability in the offer.
Transport protocol capabilities may be included irrespective of
whether or not they are referenced by a potential configuration.
Transport protocols that apply to multiple media descriptions
SHOULD be provided as transport protocol capabilities at the
session level whereas transport protocols that apply only to a
specific media description ("m=" line), SHOULD be provided as
transport protocol capabilities within that particular media
description. In either case, there MUST NOT be more than a single
"a=tcap" attribute at the session level and a single "a=tcap"
attribute in each media description.
o Zero or more extension capability attributes. There MUST be one
or more extension capability attributes (as outlined in Section
3.4.3) for each extension capability that is referenced by a
potential configuration. Extension capability attributes that are
not referenced by a potential configuration can be provided as
well.
o Zero or more potential configuration attributes. There MUST be
one or more potential configuration attributes ("a=pcfg"), as
defined in Section 3.5.1, in each media description where
alternative potential configurations are to be negotiated. Each
potential configuration attribute MUST adhere to the rules
provided in Section 3.5.1 and the additional rules provided below.
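The single-"a=tcap" rule in the list above lends itself to a
mechanical check. The following Python sketch is illustrative only:
the helper names split_sections and tcap_violations are hypothetical,
and it assumes the SDP session description is available as plain
text with one field per line.

```python
def split_sections(sdp: str) -> list[list[str]]:
    """Return [session_lines, media_1_lines, ...].

    A new section starts at every "m=" line; section 0 is the
    session level.
    """
    sections: list[list[str]] = [[]]
    for raw in sdp.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("m="):
            sections.append([])
        sections[-1].append(line)
    return sections


def tcap_violations(sdp: str) -> list[int]:
    """Indexes of sections that carry more than one "a=tcap" attribute."""
    return [
        index
        for index, section in enumerate(split_sections(sdp))
        if sum(line.startswith("a=tcap:") for line in section) > 1
    ]
```

An empty result means that the session level and every media
description each carry at most one "a=tcap" attribute.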
If the offerer requires support for one or more extensions (besides
the base protocol defined here), then the offerer MUST include one or
more "a=creq" attributes as follows:
o If support for one or more capability negotiation extensions is
required for the entire session description, then option tags for
those extensions MUST be included in a single session-level "creq"
attribute.
o For each media description that requires support for one or more
capability negotiation extensions not listed at the session level,
a single "creq" attribute containing all the required extensions
for that media description MUST be included within the media
description (in accordance with Section 3.3.2).
Note that extensions that only need to be supported by a particular
potential configuration can use the "mandatory" extension prefix
("+") within the potential configuration (see Section 3.5.1).
The offerer SHOULD furthermore include the following:
o A supported capability negotiation extension attribute ("a=csup")
at the session level and/or media level as defined in Section
3.3.2 for each capability negotiation extension supported by the
offerer and not included in a corresponding "a=creq" attribute
(i.e., at the session level or in the same media description).
Option tags provided in a "a=csup" attribute at the session level
indicate extensions supported for the entire session description,
whereas option tags provided in a "a=csup" attribute in a media
description indicate extensions supported for only that particular
media description.
Capabilities provided in an offer merely indicate what the offerer is
capable of doing. They do not constitute a commitment or even an
indication to use them. In contrast, each potential configuration
constitutes an alternative offer that the offerer would like to use.
The potential configurations MUST be used by the answerer to
negotiate and establish the session.
The offerer MUST include one or more potential configuration
attributes ("a=pcfg") in each media description where the offerer
wants to provide alternative offers (in the form of potential
configurations). Each potential configuration attribute in a given
media description MUST contain a unique configuration number and
zero, one or more potential configuration lists, as described in
Section 3.5.1. Each potential configuration list MUST refer to
capabilities that are provided at the session level or within that
particular media description; otherwise, the potential configuration
is considered invalid. The base SDP Capability Negotiation framework
REQUIRES that potential configurations not reference any session-
level attribute capabilities that contain media-level-only
attributes; however, extensions may modify this behavior, as long as
it is fully backwards compatible with the base specification.
Furthermore, it is RECOMMENDED that potential configurations avoid
use of session-level capabilities whenever possible; refer to Section
The current actual configuration is included in the "m=" line (as
defined by [RFC3264]) and any associated parameters for the media
description (e.g., attribute ("a=") and bandwidth ("b=") lines).
Note that the actual configuration is by default the least-preferred
configuration, and hence the answerer will seek to negotiate use of
one of the potential configurations instead. If the offerer wishes a
different preference for the actual configuration, the offerer MUST
include a corresponding potential configuration with the relevant
configuration number (which indicates the relative preference between
potential configurations); this corresponding potential configuration
should simply duplicate the actual configuration.
This can either be done implicitly (by not referencing any
capabilities), or explicitly (by providing and using capabilities
for the transport protocol and all the attributes that are part of
the actual configuration). The latter may help detect
intermediaries that modify the actual configuration but are not
SDP Capability Negotiation aware.
Per [RFC3264], once the offerer generates the offer, he must be
prepared to receive incoming media in accordance with that offer.
That rule applies here as well, but only for the actual
configurations provided in the offer: Media received by the offerer
according to one of the potential configurations MAY be discarded,
until the offerer receives an answer indicating what the actual
selected configuration is. Once that answer is received, incoming
media MUST be processed in accordance with the actual selected
configuration indicated and the answer received (provided the
offer/answer exchange completed successfully).
The above rule assumes that the offerer can determine whether
incoming media adheres to the actual configuration offered or one of
the potential configurations instead; this may not always be the
case. If the offerer wants to ensure he does not play out any
garbage, the offerer SHOULD discard all media received before the
answer SDP session description is received. Conversely, if the
offerer wants to avoid clipping, he SHOULD attempt to play any
incoming media as soon as it is received (at the risk of playing out
garbage). In either case, please note that this document does not
place any requirements on the offerer to process and play media
before answer. For further details, please refer to Section 3.9.
3.6.2. Generating the Answer
When receiving an offer, the answerer MUST check for the presence of
a required capability negotiation extension attribute ("a=creq")
provided at the session level. If one is found, then capability
negotiation MUST be performed. If none is found, then the answerer
MUST check each offered media description for the presence of a
required capability negotiation extension attribute ("a=creq") and
one or more potential configuration attributes ("a=pcfg").
Capability negotiation MUST be performed for each media description
where either of those is present in accordance with the procedures
described below.
The answerer MUST first ensure that it supports any required
capability negotiation extensions:
o If a session-level "creq" attribute is provided, and it contains
an option tag that the answerer does not support, then the
answerer MUST NOT use any of the potential configuration
attributes provided for any of the media descriptions. Instead,
the normal offer/answer procedures MUST continue as per [RFC3264].
Furthermore, the answerer MUST include a session-level supported
capability negotiation extensions attribute ("a=csup") with option
tags for the capability negotiation extensions supported by the
answerer.
o If a media-level "creq" attribute is provided, and it contains an
option tag that the answerer does not support, then the answerer
MUST NOT use any of the potential configuration attributes
provided for that particular media description. Instead, the
offer/answer procedures for that media description MUST continue
as per [RFC3264] (SDP Capability Negotiation is still performed
for other media descriptions in the SDP session description).
Furthermore, the answerer MUST include a supported capability
negotiation extensions attribute ("a=csup") in that media
description with option tags for the capability negotiation
extensions supported by the answerer for that media description.
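The two "creq" rules above can be sketched as a small decision
function. This is a sketch under stated assumptions, not a definitive
implementation: the comma-separated option-tag syntax is assumed here
(the attribute itself is defined in Section 3.3.2), the option tags
"foo", "bar", and "baz" are hypothetical, and the function names are
invented for illustration.

```python
def parse_option_tags(line: str) -> set[str]:
    """Parse e.g. 'a=creq:foo,bar' into {'foo', 'bar'} (assumed syntax)."""
    _, _, value = line.partition(":")
    return {tag.strip() for tag in value.split(",") if tag.strip()}


def negotiation_allowed(
    session_creq: set[str],
    media_creqs: list[set[str]],
    supported: set[str],
) -> list[bool]:
    """Return one flag per media description.

    True means the potential configurations of that media description
    may be used; False means plain offer/answer processing applies.
    """
    # An unsupported session-level requirement disables capability
    # negotiation for every media description.
    if not session_creq <= supported:
        return [False] * len(media_creqs)
    # Otherwise, each media description is judged on its own "creq".
    return [creq <= supported for creq in media_creqs]
```

Wherever the result is False, the answerer would also add the
"a=csup" attribute described above at the corresponding level.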
Assuming all required capability negotiation extensions are
supported, the answerer now proceeds as follows.
For each media description where capability negotiation is to be
performed (i.e., all required capability negotiation extensions are
supported and at least one valid potential configuration attribute is
present), the answerer MUST perform capability negotiation by using
the most preferred potential configuration that is valid to the
answerer, subject to any local policies. A potential configuration
is valid to the answerer if:
1. It is in accordance with the syntax and semantics provided in
2. It contains a configuration number that is unique within that
media description.
3. All attribute capabilities referenced by the potential
configuration are valid themselves (as defined in Section 3.4.1)
and each of them is provided either at the session level or within
this particular media description.
For session-level attribute capabilities referenced, the
attributes contained inside them MUST NOT be media-level-only
attributes. Note that the answerer can only determine this for
attributes supported by the answerer. If an attribute is not
supported, it will simply be ignored by the answerer and hence
will not trigger an "invalid" potential configuration.
4. All transport protocol capabilities referenced by the potential
configuration are valid themselves (as defined in Section 3.4.2)
and each of them is furthermore provided either at the session
level or within this particular media description.
5. All extension capabilities referenced by the potential
configuration and supported by the answerer are valid themselves
(as defined by that particular extension) and each of them is
furthermore provided either at the session level or within this
particular media description. Unknown or unsupported extension
capabilities MUST be ignored, unless they are prefixed with the
plus ("+") sign, which indicates that the extension MUST be
supported in order to use that potential configuration. If the
extension is not supported, that potential configuration is not
valid to the answerer.
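As a rough illustration of conditions 2 through 5 above, the sketch
below works on an already parsed potential configuration. The
PotentialConfig class and its field names are hypothetical; condition
1 (syntax and semantics) and the media-level-only check for
session-level attribute capabilities are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class PotentialConfig:
    number: int
    attr_caps: list[int] = field(default_factory=list)       # referenced acap numbers
    transport_caps: list[int] = field(default_factory=list)  # referenced tcap numbers
    mandatory_ext: list[str] = field(default_factory=list)   # "+"-prefixed extensions


def is_valid(
    cfg: PotentialConfig,
    all_numbers: list[int],
    visible_acaps: set[int],
    visible_tcaps: set[int],
    supported_ext: set[str],
) -> bool:
    """Check uniqueness, capability visibility, and mandatory extensions.

    visible_acaps and visible_tcaps hold the capability numbers that
    are provided at the session level or within this media description.
    """
    if all_numbers.count(cfg.number) != 1:
        return False
    if not set(cfg.attr_caps) <= visible_acaps:
        return False
    if not set(cfg.transport_caps) <= visible_tcaps:
        return False
    # Unsupported optional extensions are ignored; only mandatory
    # ("+") extensions can invalidate the configuration.
    return set(cfg.mandatory_ext) <= supported_ext
```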
The most preferred valid potential configuration in a media
description is the valid potential configuration with the lowest
configuration number. The answerer MUST now process the offer for
that media stream based on the most preferred valid potential
configuration. Conceptually, this entails the answerer constructing
an (internal) offer as follows. First, all capability negotiation
parameters from the offer SDP session description are removed,
thereby yielding an offer SDP session description with the actual
configuration as if SDP Capability Negotiation was not done in the
first place. Secondly, this actual configuration SDP session
description is modified as follows for each media stream offered,
based on the capability negotiation parameters included originally:
o If a transport protocol capability is included in the potential
configuration, then it replaces the transport protocol provided in
the "m=" line for that media description.
o If attribute capabilities are present with a delete-attributes
session indication ("-s") or media and session indication ("-ms"),
then all session-level attributes from the actual configuration
SDP session description MUST be deleted in the resulting potential
configuration SDP session description in accordance with the
procedures in Section 3.5.1. If attribute capabilities are
present with a delete-attributes media indication ("-m") or media
and session indication ("-ms"), then all attributes from the
actual configuration SDP session description inside this media
description MUST be deleted.
o If a session-level attribute capability is included, the attribute
(and its associated value, if any) contained in it MUST be added
to the resulting SDP session description. All such added session-
level attributes MUST be listed before the session-level
attributes that were initially present in the SDP session
description. Furthermore, the added session-level attributes MUST
be added in the order they were provided in the potential
configuration (see also Section 3.5.1).
This allows for attributes with implicit preference ordering to
be added in the desired order; the "crypto" attribute [RFC4568]
is one such example.
o If a media-level attribute capability is included, then the
attribute (and its associated value, if any) MUST be added to the
resulting SDP session description within the media description in
question. All such added media-level attributes MUST be listed
before the media-level attributes that were initially present in
the media description in question. Furthermore, the added media-
level attributes MUST be added in the order they were provided in
the potential configuration (see also Section 3.5.1).
o If a supported extension capability is included, then it MUST be
processed in accordance with the rules provided for that
particular extension capability.
The above steps MUST be performed exactly once per potential
configuration, i.e., there MUST NOT be any recursive processing of
any additional capability negotiation parameters that may (illegally)
have been nested inside capabilities themselves.
As an example of this, consider the (illegal) attribute capability
a=acap:1 acap:2 foo:a
The resulting potential configuration SDP session description will,
after the above processing has been done, contain the attribute
capability
a=acap:2 foo:a
However, since we do not perform any recursive processing of
capability negotiation parameters, this second attribute capability
parameter will not be processed by the offer/answer procedure.
Instead, it will simply appear as a (useless) attribute in the SDP
session description that will be ignored by further processing.
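The media-level part of the construction described above can be
sketched as follows. This is a minimal illustration, assuming that
the capabilities selected by the potential configuration have already
been resolved into a transport protocol string and a list of
attribute lines; the function name and the "a=rtpmap" value used in
the usage note are hypothetical. Session-level attributes and
extension capabilities are not modeled.

```python
from typing import Optional


def apply_potential_config(
    m_line: str,
    media_attrs: list[str],
    transport: Optional[str],
    added_attrs: list[str],
    delete_media: bool = False,
) -> tuple[str, list[str]]:
    """Return (m_line, attribute_lines) as the answerer would "see" them."""
    fields = m_line.split()
    if transport is not None:
        # m=<media> <port> <proto> <fmt> ...: replace the <proto> field.
        fields[2] = transport
    # A delete-attributes indication removes the original media-level
    # attributes before the new ones are added.
    kept = [] if delete_media else list(media_attrs)
    # Added attributes come first, in the order given by the potential
    # configuration. This is a single pass: added attributes are never
    # re-processed as capabilities.
    return " ".join(fields), list(added_attrs) + kept
```

For instance, applying transport "RTP/SAVP" and the added attribute
"a=acap:2 foo:a" to "m=audio 59000 RTP/AVP 98" with an existing
"a=rtpmap:98 AMR/8000" attribute leaves "a=acap:2 foo:a" in the
result as an ordinary (and useless) attribute line, matching the
no-recursion rule above.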
Note that a transport protocol from the potential configuration
replaces the transport protocol in the actual configuration, but an
attribute capability from the potential configuration is simply added
to the actual configuration. In some cases, this can result in
having one or more meaningless attributes in the resulting potential
configuration SDP session description, or worse, ambiguous or
potentially even illegal attributes. Use of delete-attributes for
the session- and/or media-level attributes MUST be done to avoid such
scenarios. Nevertheless, it is RECOMMENDED that implementations
ignore meaningless attributes that may result from potential
configurations.
For example, if the actual configuration was using Secure RTP and
included an "a=crypto" attribute for the SRTP keying material,
then use of a potential configuration that uses plain RTP would
make the "crypto" attribute meaningless. The answerer may or may
not ignore such a meaningless attribute. The offerer can here
ensure correct operation by using delete-attributes to remove the
"crypto" attribute (but will then need to provide attribute
capabilities to reconstruct the SDP session description with the
necessary attributes deleted, e.g., rtpmaps).
Also note that, while it is permissible to include media-level
attribute capabilities at the session level, the base SDP Capability
Negotiation framework defined here does not define any procedures for
use of them, i.e., the answerer effectively ignores them.
Please refer to Section 3.6.2.1 for examples of how the answerer may
conceptually "see" the resulting offered alternative potential
configurations.
The answerer MUST check that he supports all mandatory attribute
capabilities from the potential configuration (if any), the transport
protocol capability (if any) from the potential configuration, and
all mandatory extension capabilities from the potential configuration
(if any). If he does not, the answerer MUST proceed to the second
most preferred valid potential configuration for the media
description, etc.
o In the case of attribute capabilities, support implies that the
attribute name contained in the capability is supported and it can
(and will) be negotiated successfully in the offer/answer exchange
with the value provided. This does not necessarily imply that the
value provided is supported in its entirety. For example, the
"a=fmtp" parameter is often provided with one or more values in a
list, where the offerer and answerer negotiate use of some subset
of the values provided. Other attributes may include mandatory
and optional parts to their values; support for the mandatory part
is all that is required here.
A side effect of the above rule is that whenever an "fmtp" or
"rtpmap" parameter is provided as a mandatory attribute
capability, the corresponding media format (codec) must be
supported and use of it negotiated successfully. If this is
not the offerer's intent, the corresponding attribute
capabilities must be listed as optional instead.
o In the case of transport protocol capabilities, support implies
that the transport protocol contained in the capability is
supported and the transport protocol can (and will) be negotiated
successfully in the offer/answer exchange.
o In the case of extension capabilities, the extension MUST define
the rules for when the extension capability is considered
supported and those rules MUST be satisfied.
If the answerer has exhausted all potential configurations for the
media description, without finding a valid one that is also
supported, then the answerer MUST process the offered media stream
based on the actual configuration plus any session-level attributes
added by a valid and supported potential configuration from another
media description in the offered SDP session description.
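The per-media-description selection loop, including the fallback to
the actual configuration, can be sketched as below. The function name
and the dictionary representation of a potential configuration are
assumptions made for illustration; local policy and session-level
attributes added by other media descriptions are not modeled.

```python
from typing import Callable, Optional


def select_configuration(
    configs: list[dict],
    is_valid: Callable[[dict], bool],
    is_supported: Callable[[dict], bool],
) -> Optional[int]:
    """Return the chosen configuration number, or None.

    None means that every potential configuration was exhausted and
    the actual configuration from the "m=" line is used instead.
    """
    # The lowest configuration number is the most preferred.
    for cfg in sorted(configs, key=lambda c: c["number"]):
        if is_valid(cfg) and is_supported(cfg):
            return cfg["number"]
    return None
```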
The above process describes potential configuration selection as a
per-media-stream process. Inter-media stream coordination of
selected potential configurations, however, is required in some cases.
First of all, session-level attributes added by a potential
configuration for one media description MUST NOT cause any problems
for potential configurations selected by other media descriptions in
the offer SDP session description. If the session-level attributes
are mandatory, then those session-level attributes MUST furthermore
be supported by the session as a whole (i.e., all the media
descriptions if relevant). As mentioned earlier, this adds
additional complexity to the overall processing and hence it is
RECOMMENDED not to use session-level attribute capabilities in
potential configurations, unless absolutely necessary.
Once the answerer has selected a valid and supported offered
potential configuration for all of the media streams (or has fallen
back to the actual configuration plus any added session attributes),
the answerer MUST generate a valid virtual answer SDP session
description based on the selected potential configuration SDP session
description, as "seen" by the answerer using normal offer/answer
rules (see Section 3.6.2.1 for examples). The actual answer SDP
session description is formed from the virtual answer SDP session
description as follows: if the answerer selected one of the potential
configurations in a media description, the answerer MUST include an
actual configuration attribute ("a=acfg") within that media
description. The "a=acfg" attribute MUST identify the configuration
number for the selected potential configuration as well as the actual
parameters that were used from that potential configuration; if the
potential configuration included alternatives, the selected
alternatives only MUST be included. Only the known and supported
parameters will be included. Unknown or unsupported parameters MUST
NOT be included in the actual configuration attribute. In the case
of attribute capabilities, only the known and supported capabilities
are included; unknown or unsupported attribute capabilities MUST NOT
be included.
If the answerer supports one or more capability negotiation
extensions that were not included in a required capability
negotiation extensions attribute in the offer, then the answerer
SHOULD furthermore include a supported capability negotiation
attribute ("a=csup") at the session level with option tags for the
extensions supported across media streams. Also, if the answerer
supports one or more capability negotiation extensions for only
particular media descriptions, then a supported capability
negotiation attribute with those option tags SHOULD be included
within each relevant media description. The required capability
negotiation attribute ("a=creq") MUST NOT be used in an answer.
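The construction of the "a=acfg" attribute described earlier in this
subsection can be illustrated with a short sketch. It assumes that
the "a=acfg" syntax mirrors the "t=" and "a=" parameter lists of
"a=pcfg" and that attribute capability numbers are comma-separated;
the attribute syntax itself is defined elsewhere in the
specification, and the function name is hypothetical.

```python
from typing import Optional


def build_acfg(
    number: int,
    transport_cap: Optional[int],
    attr_caps: list[int],
    supported_attr_caps: set[int],
) -> str:
    """Build the "a=acfg" line for the selected potential configuration."""
    parts = [f"a=acfg:{number}"]
    if transport_cap is not None:
        parts.append(f"t={transport_cap}")
    # Only known and supported attribute capabilities are echoed back.
    used = [str(cap) for cap in attr_caps if cap in supported_attr_caps]
    if used:
        parts.append("a=" + ",".join(used))
    return " ".join(parts)
```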
The offerer's originally provided actual configuration is contained
in the offer media description's "m=" line (and associated
parameters). The answerer MAY send media to the offerer in
accordance with that actual configuration as soon as it receives the
offer; however, it MUST NOT send media based on that actual
configuration if it selects an alternative potential configuration.
If the answerer selects one of the potential configurations, then the
answerer MAY immediately start to send media to the offerer in
accordance with the selected potential configuration; however, the
offerer MAY discard such media or play out garbage until the offerer
receives the answer. Please refer to Section 3.9 for additional
considerations and possible alternative solutions outside the base
SDP Capability Negotiation framework.
If the answerer selected a potential configuration instead of the
actual configuration, then it is RECOMMENDED that the answerer send
back an answer SDP session description as soon as possible. This
minimizes the risk of having media discarded or played out as garbage
by the offerer. In the case of SIP [RFC3261] without any extensions,
this implies that if the offer was received in an INVITE message,
then the answer SDP session description should be provided in the
first non-100 provisional response sent back (per RFC 3261, the
answer would need to be repeated in the 200 response as well, unless
a relevant extension such as [RFC3262] is being used).
3.6.2.1. Example Views of Potential Configurations
The following examples illustrate how the answerer may conceptually
"see" a potential configuration. Consider the following offered SDP
session description:
o=alice 2891092738 2891092738 IN IP4 lost.example.com
c=IN IP4 lost.example.com
a=acap:1 key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...
a=tcap:1 RTP/SAVP RTP/AVP
m=audio 59000 RTP/AVP 98
a=acap:2 crypto:1 AES_CM_128_HMAC_SHA1_32
a=pcfg:1 t=1 a=1|2
m=video 52000 RTP/AVP 31
a=acap:3 crypto:1 AES_CM_128_HMAC_SHA1_80
a=pcfg:1 t=1 a=1|3
This particular SDP session description offers an audio stream and a
video stream, each of which can either use plain RTP (actual
configuration) or Secure RTP (potential configuration). Furthermore,
two different keying mechanisms are offered, namely session-level Key
Management Extensions using MIKEY (attribute capability 1) and media-
level SDP security descriptions (attribute capabilities 2 and 3).
There are several potential configurations here; below, we
show the one the answerer "sees" when using potential configuration 1
for both audio and video, and furthermore using attribute capability
1 (MIKEY) for both (we have removed all the capability negotiation
attributes for clarity):
o=alice 2891092738 2891092738 IN IP4 lost.example.com
c=IN IP4 lost.example.com
m=audio 59000 RTP/SAVP 98
m=video 52000 RTP/SAVP 31
Note that the transport protocol in the media descriptions indicates
use of Secure RTP.
Below, we show the offer the answerer "sees" when using potential
configuration 1 for both audio and video and furthermore using
attribute capability 2 and 3, respectively, (SDP security
descriptions) for the audio and video stream -- note the order in
which the resulting attributes are provided:
o=alice 2891092738 2891092738 IN IP4 lost.example.com
c=IN IP4 lost.example.com
m=audio 59000 RTP/SAVP 98
m=video 52000 RTP/SAVP 31
Again, note that the transport protocol in the media descriptions
indicates use of Secure RTP.
And finally, we show the offer the answerer "sees" when using
potential configuration 1 with attribute capability 1 (MIKEY) for the
audio stream, and potential configuration 1 with attribute capability
3 (SDP security descriptions) for the video stream:
o=alice 2891092738 2891092738 IN IP4 lost.example.com
c=IN IP4 lost.example.com
m=audio 59000 RTP/SAVP 98
m=video 52000 RTP/SAVP 31
3.6.3. Offerer Processing of the Answer
When the offerer attempted to use SDP Capability Negotiation in the
offer, the offerer MUST examine the answer for actual use of SDP
Capability Negotiation.
For each media description where the offerer included a potential
configuration attribute ("a=pcfg"), the offerer MUST first examine
that media description for the presence of a valid actual
configuration attribute ("a=acfg"). An actual configuration
attribute is valid if:
o it refers to a potential configuration that was present in the
corresponding offer, and
o it contains the actual parameters that were used from that
potential configuration; if the potential configuration included
alternatives, the selected alternatives only MUST be included.
Note that the answer will include only parameters and attribute
capabilities that are known and supported by the answerer, as
described in Section 3.6.2.
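The two validity conditions above can be sketched as a check of the
received "a=acfg" line against the offered "a=pcfg" lines. This is a
simplified sketch under stated assumptions: alternatives are assumed
to be separated by "|", optional capabilities and delete-attributes
are not handled, and the selected alternative is therefore expected
to be echoed verbatim. The function names are hypothetical.

```python
def parse_cfg(line: str) -> tuple[int, dict[str, list[str]]]:
    """Parse 'a=pcfg:1 t=1 a=1|2' into (1, {'t': ['1'], 'a': ['1', '2']})."""
    head, *params = line.split()
    number = int(head.split(":", 1)[1])
    lists: dict[str, list[str]] = {}
    for param in params:
        key, _, value = param.partition("=")
        lists[key] = value.split("|")
    return number, lists


def acfg_is_valid(acfg: str, offered_pcfgs: list[str]) -> bool:
    """Check an "a=acfg" line against the offered "a=pcfg" lines."""
    number, chosen = parse_cfg(acfg)
    offered = dict(parse_cfg(pcfg) for pcfg in offered_pcfgs)
    # The answer must refer to a configuration that was offered.
    if number not in offered:
        return False
    for key, alternatives in chosen.items():
        # Exactly one alternative may be selected, and it must be one
        # of the alternatives offered for that parameter.
        if len(alternatives) != 1:
            return False
        if alternatives[0] not in offered[number].get(key, []):
            return False
    return True
```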
If a valid actual configuration attribute is not present in a media
description, then the offerer MUST process the answer SDP session
description for that media stream per the normal offer/answer rules
defined in [RFC3264]. However, if a valid one is found, the offerer
MUST instead process the answer as follows:
o The actual configuration attribute specifies which of the
potential configurations was used by the answerer to generate the
answer for this media stream. This includes all the supported
attribute capabilities and the transport capabilities referenced
by the potential configuration selected, where the attribute
capabilities have any associated delete-attributes included.
Extension capabilities supported by the answerer are included as
well.
o The offerer MUST now process the answer in accordance with the
rules in [RFC3264], except that it must be done as if the offer
consisted of the selected potential configuration instead of the
original actual configuration, including any transport protocol
changes in the media ("m=") line(s), attributes added and deleted
by the potential configuration at the media and session level, and
any extensions used. If this derived answer is not a valid answer
to the potential configuration offer selected by the answerer, the
offerer MUST instead continue further processing as it would have
for a regular offer/answer exchange, where the answer received
does not adhere to the rules of [RFC3264].
If the offer/answer exchange was successful, and if the answerer
selected one of the potential configurations from the offer as the
actual configuration, and the selected potential configuration
differs from the actual configuration in the offer (the "m=", "a=",
etc., lines), then the offerer SHOULD initiate another offer/answer
exchange. This second offer/answer exchange will not modify the
session in any way; however, it will help intermediaries (e.g.,
middleboxes), which look at the SDP session description but do not
support the capability negotiation extensions, understand the details
of the media stream(s) that were actually negotiated. This new offer
MUST contain the selected potential configuration as the actual
configuration, i.e., with the actual configuration used in the "m="
line and any other relevant attributes, bandwidth parameters, etc.
Note that, per normal offer/answer rules, the second offer/answer
exchange still needs to update the version number in the "o=" line
(<sess-version> in [RFC4566]). Attribute lines carrying keying
material SHOULD repeat the keys from the previous offer, unless
re-keying is necessary, e.g., due to a previously forked SIP INVITE
request. Please refer to Section 3.12 for additional considerations
related to intermediaries.
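For the second offer, the "o=" line update noted above amounts to
incrementing <sess-version>. A minimal sketch, assuming the "o=" line
follows the field order of [RFC4566] (username, sess-id,
sess-version, nettype, addrtype, unicast-address); the function name
is hypothetical:

```python
def bump_session_version(o_line: str) -> str:
    """Return the "o=" line with <sess-version> incremented by one."""
    fields = o_line.split()
    # fields: o=<username>, <sess-id>, <sess-version>, <nettype>, ...
    fields[2] = str(int(fields[2]) + 1)
    return " ".join(fields)
```

The rest of the second offer would carry the selected potential
configuration in the "m=" line and associated attributes, without
changing the session itself.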
3.6.4. Modifying the Session
Capabilities and potential configurations may be included in
subsequent offers as defined in [RFC3264], Section 8. The procedure
for doing so is similar to that described above with the answer
including an indication of the actual selected configuration used by
the answerer.
If the answer indicates use of a potential configuration from the
offer, then the guidelines provided in Section 3.6.3 for doing a
second offer/answer exchange using that potential configuration as
the actual configuration apply.
3.7. Interactions with ICE
Interactive Connectivity Establishment (ICE) [RFC5245] provides a
mechanism for verifying connectivity between two endpoints by sending
Session Traversal Utilities for NAT (STUN) messages directly between
the media endpoints. The basic ICE specification [RFC5245] is only
defined to support UDP-based connectivity; however, it allows for
extensions to support other transport protocols, such as TCP, which
is being specified in [ICETCP]. ICE defines a new "a=candidate"
attribute, which, among other things, indicates the possible
transport protocol(s) to use and then associates a priority with each
of them. The most preferred transport protocol that *successfully*
verifies connectivity will end up being used.
When using ICE, it is thus possible that the transport protocol that
will be used differs from what is specified in the "m=" line. Since
both ICE and SDP Capability Negotiation may specify alternative
transport protocols, there is a potentially unintended interaction
when using these together.
We provide the following guidelines for addressing that.
There are two basic scenarios to consider:
1) A particular media stream can run over different transport
protocols (e.g., UDP, TCP, or TCP/TLS), and the intent is simply
to use the one that works (in the preference order specified).
2) A particular media stream can run over different transport
protocols (e.g., UDP, TCP, or TCP/TLS) and the intent is to have
the negotiation process decide which one to use (e.g., T.38 over
TCP or UDP).
In scenario 1, there should be ICE "a=candidate" attributes for UDP,
TCP, etc., but otherwise nothing special in the potential
configuration attributes to indicate the desire to use different
transport protocols (e.g., UDP, or TCP). The ICE procedures
essentially cover the capability negotiation required (by having the
answerer select something it supports and then use of trial and error
connectivity checks).
Scenario 2 does not require support for, or use of, ICE. Instead,
we simply use transport protocol capabilities and potential
configuration attributes to indicate the desired outcome.
The scenarios may be combined, e.g., by offering potential
configuration alternatives where some of them can support only one
transport protocol (e.g., UDP), whereas others can support multiple
transport protocols (e.g., UDP or TCP). In that case, there is a
need for tight control over the ICE candidates that will be used for
a particular configuration, yet the actual configuration may want to
use all of the ICE candidates. To achieve this, the ICE candidate
attributes can be defined as attribute capabilities and the relevant
ones should then be included in the proper potential configurations
(for example, candidate attributes for UDP only for potential
configurations that are restricted to UDP, whereas there could be
candidate attributes for UDP, TCP, and TCP/TLS for potential
configurations that can use all three). Furthermore, the
delete-attributes in a potential configuration can be used to ensure
that ICE will not end up using a transport protocol that is not
desired for a particular configuration.
SDP Capability Negotiation recommends use of a second offer/answer
exchange when the negotiated actual configuration was one of the
potential configurations from the offer (see Section 3.6.3).
Similarly, ICE requires use of a second offer/answer exchange if the
chosen candidate is not the same as the one in the m/c-line from the
offer. When ICE and capability negotiation are used at the same
time, the two secondary offer/answer exchanges SHOULD be combined into
a single one.
3.8. Interactions with SIP Option Tags
SIP [RFC3261] allows for SIP extensions to define a SIP option tag
that identifies the SIP extension. Support for one or more such
extensions can be indicated by use of the SIP Supported header, and
required support for one or more such extensions can be indicated by
use of the SIP Require header. The "a=csup" and "a=creq" attributes
defined by the SDP Capability Negotiation framework are similar,
except that support for these two attributes by themselves cannot be
guaranteed (since they are specified as extensions to the SDP
specification [RFC4566] itself).
SIP extensions with associated option tags can introduce enhancements
to not only SIP, but also SDP. This is, for example, the case for SIP
preconditions defined in [RFC3312]. When using SDP Capability
Negotiation, some potential configurations may include certain SDP
extensions, whereas others may not. Since the purpose of the SDP
Capability Negotiation is to negotiate a session based on the
features supported by both sides, use of the SIP Require header for
such extensions may not produce the desired result. For example, if
one potential configuration requires SIP preconditions support,
another does not, and the answerer does not support preconditions,
then use of the SIP Require header for preconditions would result in
a session failure, in spite of the fact that a valid and supported
potential configuration was included in the offer.
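The failure mode described above can be illustrated with a small
Python sketch. It is a toy model and not an implementation of SIP:
the Config type, the negotiate function, and the configuration names
are hypothetical and introduced only for illustration. The one SIP
behavior assumed is that a request whose Require header lists an
unsupported option tag is rejected before the SDP is considered.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Config:
    """Hypothetical model of one offered configuration."""
    name: str
    needed_tags: FrozenSet[str]  # SIP extensions this configuration relies on


def negotiate(require: Iterable[str],
              answerer_supports: Iterable[str],
              configs: Iterable[Config]) -> Optional[str]:
    """Return the chosen configuration name, or None on session failure."""
    supported = frozenset(answerer_supports)
    # A Require header listing an unsupported option tag fails the
    # request outright, regardless of what the SDP contains.
    if not frozenset(require) <= supported:
        return None
    # Otherwise pick the first configuration (in preference order)
    # whose extensions the answerer supports.
    for cfg in configs:
        if cfg.needed_tags <= supported:
            return cfg.name
    return None


offered = [
    Config("pcfg-with-preconditions", frozenset({"precondition"})),
    Config("pcfg-without-preconditions", frozenset()),
]

# The answerer in this example does not support preconditions.
with_require = negotiate({"precondition"}, set(), offered)
without_require = negotiate(set(), set(), offered)
```

With the Require header present the session fails (None), whereas
without it the second potential configuration is selected.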
In general, this can be alleviated by use of mandatory and optional
attribute capabilities in a potential configuration. There are,
however, cases where permissible SDP values are tied to the use of the
SIP Require header. SIP preconditions [RFC3312] is one such example,
where preconditions with a "mandatory" strength-tag can only be used
when a SIP Require header with the SIP option tag "precondition" is
included. Future SIP extensions that may want to use the SDP
Capability Negotiation framework should avoid such coupling.
3.9. Processing Media before Answer
The offer/answer model [RFC3264] requires an offerer to be able to
receive media in accordance with the offer prior to receiving the
answer. This property is retained with the SDP Capability
Negotiation extensions defined here, but only when the actual
configuration is selected by the answerer. If a potential
configuration is chosen, the offerer may decide not to process any
media received before the answer is received. This may lead to
clipping. Consequently, the SDP Capability Negotiation framework
recommends sending back an answer SDP session description as soon as
possible.
The issue can be resolved by introducing a three-way handshake. In
the case of SIP, this can, for example, be done by defining a
precondition [RFC3312] for capability negotiation (or by using an
existing precondition that is known to generate a second offer/answer
exchange before proceeding with the session). However, preconditions
are often viewed as complicated to implement and they may add to
overall session establishment delay by requiring an extra
offer/answer exchange.
An alternative three-way handshake can be performed by use of ICE
[RFC5245]. When ICE is being used, and the answerer receives a STUN
Binding Request for any one of the accepted media streams from the
offerer, the answerer knows the offerer has received his answer. At
that point, the answerer knows that the offerer will be able to
process incoming media according to the negotiated configuration and
hence he can start sending media without the risk of the offerer
either discarding it or playing garbage.
Please note that, the above considerations notwithstanding, this
document does not place any requirements on the offerer to process
and play media before answer; it merely provides recommendations for
how to ensure that media sent by the answerer and received by the
offerer prior to receiving the answer can in fact be rendered by the
offerer.
In some use cases, a three-way handshake is not needed. An example
is when the offerer does not need information from the answer, such
as keying material in the SDP session description, in order to
process incoming media. The SDP Capability Negotiation framework
does not define any such solutions; however, extensions may do so.
For example, one technique proposed for best-effort SRTP in [BESRTP]
is to provide different RTP payload type mappings for different
transport protocols used, outside of the actual configuration, while
still allowing them to be used by the answerer (exchange of keying
material is still needed, e.g., inband). The basic SDP Capability
Negotiation framework defined here does not include the ability to do
so; however, extensions that enable that may be defined.
3.10. Indicating Bandwidth Usage
The amount of bandwidth used for a particular media stream depends on
the negotiated codecs, transport protocol and other parameters. For
example, the use of Secure RTP [RFC3711] with integrity protection
requires more bandwidth than plain RTP [RFC3551]. SDP defines the
bandwidth ("b=") parameter to indicate the proposed bandwidth for the
session or media stream.
In SDP, as defined by [RFC4566], each media description contains one
transport protocol and one or more codecs. When specifying the
proposed bandwidth, the worst case scenario must be taken into
account, i.e., use of the highest bandwidth codec provided, the
transport protocol indicated, and the worst case (bandwidth-wise)
parameters that can be negotiated (e.g., a 32-bit Hashed Message
Authentication Code (HMAC) or an 80-bit HMAC).
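The worst-case rule above can be sketched in Python as follows. This
is an illustration under stated assumptions and not a normative
calculation: the function names are hypothetical, the 40-byte header
figure assumes IPv4 (20) + UDP (8) + RTP (12), the only Secure RTP
overhead modeled is the authentication tag, and the payload sizes and
packet rate are example values chosen for this sketch.

```python
def stream_kbps(payload_bytes: int, packets_per_second: int,
                header_bytes: int = 40, auth_tag_bits: int = 0) -> float:
    """Estimate the bandwidth of one stream in kbit/s.

    header_bytes=40 is an assumption of this sketch (IPv4 + UDP + RTP).
    auth_tag_bits models only the per-packet authentication tag.
    """
    packet_bytes = payload_bytes + header_bytes + auth_tag_bits // 8
    return packet_bytes * 8 * packets_per_second / 1000


# Assumed example: two codecs with 160-byte and 20-byte payloads, both
# sent at 50 packets per second, each with a 32-bit or 80-bit tag.
candidates = [
    stream_kbps(payload, 50, auth_tag_bits=tag)
    for payload in (160, 20)
    for tag in (32, 80)
]

# The proposed bandwidth must cover the most expensive combination.
proposed_kbps = max(candidates)
```

In this example the largest payload combined with the 80-bit tag
determines the value to be indicated.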
The base SDP Capability Negotiation framework does not provide a way
to negotiate bandwidth parameters. The issue thus remains; however,
it is potentially worse than with SDP per [RFC4566], since it is
easier to negotiate additional codecs, and furthermore possible to
negotiate different transport protocols. The recommended approach
for addressing this is the same as for plain SDP; the worst case (now
including potential configurations) needs to be taken into account
when specifying the bandwidth parameters in the actual configuration.
This can make the bandwidth value less accurate than in SDP per
[RFC4566] (due to potentially greater variability in the potential
configuration bandwidth use). Extensions can be defined to address
this shortcoming.
Note that when using RTP retransmission [RFC4588] with the RTCP-
based feedback profile [RFC4585] (RTP/AVPF), the retransmitted
packets are part of the media stream bandwidth when using
synchronization source (SSRC) multiplexing. If a feedback-based
protocol is offered as the actual configuration transport protocol, a
non-feedback-based protocol is offered as a potential configuration
transport protocol and ends up being used, the actual bandwidth usage
may be lower than the indicated bandwidth value in the offer (and
vice versa).
3.11. Dealing with Large Number of Potential Configurations
When using the SDP Capability Negotiation, it is easy to generate
offers that contain a large number of potential configurations. For
example, in the offer:
o=- 25678 753849 IN IP4 192.0.2.1
c=IN IP4 192.0.2.1
m=audio 53456 RTP/AVP 0 18
a=tcap:1 RTP/SAVPF RTP/SAVP RTP/AVPF
a=acap:1 crypto:1 AES_CM_128_HMAC_SHA1_80
a=acap:2 key-mgmt:mikey AQAFgM0XflABAAAAAAAAAAAAAAsAyO...
a=acap:3 rtcp-fb:0 nack
a=pcfg:1 t=1 a=1,3|2,3
a=pcfg:2 t=2 a=1|2
a=pcfg:3 t=3 a=3
we have 5 potential configurations on top of the actual configuration
for a single media stream. Adding an extension capability with just
two alternatives for each would double that number (to 10), and doing
the equivalent with two media streams would again double that number
(to 20). While it is easy (and inexpensive) for the offerer to
generate such offers, processing them at the answering side may not
be. Consequently, it is RECOMMENDED that offerers do not create
offers with an unnecessarily large number of potential configurations in
them.
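The count of 5 given above can be reproduced with a short Python
sketch. It assumes that each "a=pcfg" line expands to the product of
the number of "|"-separated alternatives in its "t=" and "a="
parameters; the function names are hypothetical, and extension
parameters are ignored.

```python
def count_alternatives(pcfg_line: str) -> int:
    """Count the configurations that one a=pcfg line expands to."""
    # Skip the leading "a=pcfg:<number>" token; the rest are parameters.
    params = pcfg_line.strip().split()[1:]
    count = 1
    for param in params:
        name, _, value = param.partition("=")
        if name in ("t", "a"):
            # "|" separates alternatives within a parameter.
            count *= len(value.split("|"))
    return count


def count_potential_configurations(sdp: str) -> int:
    """Sum the alternatives over all a=pcfg lines in the given text."""
    return sum(count_alternatives(line)
               for line in sdp.splitlines()
               if line.startswith("a=pcfg:"))


# The three potential configuration lines from the offer above.
PCFG_LINES = """\
a=pcfg:1 t=1 a=1,3|2,3
a=pcfg:2 t=2 a=1|2
a=pcfg:3 t=3 a=3
"""

total = count_potential_configurations(PCFG_LINES)
```

Here the three lines contribute 2, 2, and 1 alternatives,
respectively.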
On the answering side, implementers MUST take care to avoid excessive
memory and CPU consumption. For example, a naive implementation that
first generates all the valid potential configuration SDP session
descriptions internally could find itself memory exhausted,
especially if it supports a large number of endpoints. Similarly, a
naive implementation that simply performs iterative trial-and-error
processing on each possible potential configuration SDP session
description (in the preference order specified) could find itself
being CPU constrained. An alternative strategy is to prune the
search space first by discarding the set of offered potential
configurations where the transport protocol indicated (if any) is not
supported, and/or one or more mandatory attribute capabilities (if
any) are either not supported or not valid. Potential configurations
with unsupported mandatory extension configurations in them can be
discarded as well.
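The pruning strategy can be sketched in Python as shown below. This
is a minimal sketch and not a complete answerer: the PotentialConfig
type and the prune function are hypothetical, each configuration is
assumed to have been expanded to a single transport protocol and a
set of mandatory attribute names already, and only the "supported"
check is modeled (attribute validity and extension configurations are
left out).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class PotentialConfig:
    """Hypothetical, already-expanded potential configuration."""
    label: str
    transport: Optional[str]         # None if no transport is indicated
    mandatory_attrs: FrozenSet[str]  # names of mandatory attributes


def prune(configs: Iterable[PotentialConfig],
          supported_transports: Iterable[str],
          supported_attrs: Iterable[str]) -> List[PotentialConfig]:
    """Drop configurations that cannot be accepted; keep offer order."""
    transports = frozenset(supported_transports)
    attrs = frozenset(supported_attrs)
    kept = []
    for cfg in configs:
        # Discard if a transport is indicated and it is unsupported.
        if cfg.transport is not None and cfg.transport not in transports:
            continue
        # Discard if any mandatory attribute capability is unsupported.
        if not cfg.mandatory_attrs <= attrs:
            continue
        kept.append(cfg)
    return kept


# The five alternatives from the example offer, expanded by hand.
offered = [
    PotentialConfig("1a", "RTP/SAVPF", frozenset({"crypto", "rtcp-fb"})),
    PotentialConfig("1b", "RTP/SAVPF", frozenset({"key-mgmt", "rtcp-fb"})),
    PotentialConfig("2a", "RTP/SAVP", frozenset({"crypto"})),
    PotentialConfig("2b", "RTP/SAVP", frozenset({"key-mgmt"})),
    PotentialConfig("3", "RTP/AVPF", frozenset({"rtcp-fb"})),
]

# An assumed answerer that supports RTP/AVP and RTP/SAVP with "crypto".
survivors = prune(offered, {"RTP/AVP", "RTP/SAVP"}, {"crypto"})
labels = [cfg.label for cfg in survivors]
```

Only the survivors then need the more expensive per-configuration
processing.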
3.12. SDP Capability Negotiation and Intermediaries
An intermediary is here defined as an entity between a SIP user agent
A and a SIP user agent B, that needs to perform some kind of
processing on the SDP session descriptions exchanged between A and B,
in order for the session establishment to operate as intended.
Examples of such intermediaries include Session Border Controllers
(SBCs) that may perform media relaying, Proxy Call Session Control
Functions (P-CSCFs) that may authorize use of a certain amount of
network resources (bandwidth), etc. The presence and design of such
intermediaries may not follow the "Internet" model or the SIP
requirements for proxies (which are not supposed to look in message
bodies such as SDP session descriptions); however, they are a fact of
life in some deployment scenarios and hence deserve consideration.
If the intermediary needs to understand the characteristics of the
media sessions being negotiated, e.g., the amount of bandwidth used
or the transport protocol negotiated, then use of the SDP Capability
Negotiation framework may impact them. For example, some
intermediaries are known to disallow answers where the transport
protocol differs from the one in the offer. Use of the SDP
Capability Negotiation framework in the presence of such
intermediaries could lead to session failures. Intermediaries that
need to authorize use of network resources based on the negotiated
media stream parameters are affected as well. If they inspect only
the offer, then they may authorize parameters assuming a different
transport protocol, codecs, etc., than what is actually being
negotiated. For these, and other, reasons it is RECOMMENDED that
implementers of intermediaries add support for the SDP Capability
Negotiation framework.
The SDP Capability Negotiation framework itself attempts to help out
these intermediaries as well, by recommending a second offer/answer
exchange when use of a potential configuration has been negotiated
(see Section 3.6.3). However, there are several limitations with
this approach. First of all, although the second offer/answer
exchange is RECOMMENDED, it is not required and hence may not be
performed. Secondly, the intermediary may refuse the initial answer,
e.g., due to perceived transport protocol mismatch. Thirdly, the
strategy is not foolproof since the offer/answer procedures [RFC3264]
leave the original offer/answer exchange in effect when a subsequent
one fails. Consider the following example:
1. Offerer generates an SDP session description offer with the actual
configuration specifying a low-bandwidth configuration (e.g.,
plain RTP) and a potential configuration specifying a high(er)
bandwidth configuration (e.g., Secure RTP with integrity).
2. An intermediary (e.g., an SBC or P-CSCF), that does not support
SDP Capability Negotiation, authorizes the session based on the
actual configuration it sees in the SDP session description.
3. The answerer chooses the high(er) bandwidth potential
configuration and generates an answer SDP session description
based on that.
4. The intermediary passes through the answer SDP session
description.
5. The offerer sees the accepted answer, and generates an updated
offer that contains the selected potential configuration as the
actual configuration. In other words, the high(er) bandwidth
configuration (which has already been negotiated successfully) is
now the actual configuration in the offer SDP session description.
6. The intermediary sees the new offer; however, it does not
authorize the use of the high(er) bandwidth configuration, and
consequently generates a rejection message to the offerer.
7. The offerer receives the rejected offer.
After step 7, per RFC 3264, the offer/answer exchange that completed
in step 5 remains in effect; however, the intermediary may not have
authorized the necessary network resources and hence the media stream
may experience quality issues. The solution to this problem is to
upgrade the intermediary to support the SDP Capability Negotiation
framework.
3.13. Considerations for Specific Attribute Capabilities
3.13.1. The "rtpmap" and "fmtp" Attributes
The base SDP Capability Negotiation framework defines transport
capabilities and attribute capabilities. Media capabilities, which
can be used to describe media formats and their associated
parameters, are not defined in this document; however, the "rtpmap"
and "fmtp" attributes can nevertheless be used as attribute
capabilities. Using such attribute capabilities in a potential
configuration requires a bit of care though.
The rtpmap parameter binds an RTP payload type to a media format
(e.g., codec). While it is possible to provide rtpmaps for payload
types not found in the corresponding "m=" line, such rtpmaps provide
no value in normal offer/answer exchanges, since only the payload
types found in the "m=" line are part of the offer (or answer). This
applies to the base SDP Capability Negotiation framework as well.
Only the media formats (e.g., RTP payload types) provided in the "m="
line are actually offered; inclusion of "rtpmap" attributes with
other RTP payload types in a potential configuration does not change
this fact and hence they do not provide any useful information there.
They may still be useful as pure capabilities though (outside a
potential configuration) in order to inform a peer of additional
codecs supported.
It is possible to provide an "rtpmap" attribute capability with a
payload type mapping to a different codec than a corresponding actual
configuration "rtpmap" attribute for the media description has. Such
practice is permissible as a way of indicating a capability. If that
capability is included in a potential configuration, then delete-
attributes (see Section 3.5.1) MUST be used to ensure that there are
not multiple "rtpmap" attributes for the same payload type in a given
media description (which would not be allowed by SDP [RFC4566]).
Similar considerations and rules apply to the "fmtp" attribute. An
"fmtp" attribute capability for a media format not included in the
"m=" line is useless in a potential configuration (but may be useful
as a capability by itself). An "fmtp" attribute capability in a
potential configuration for a media format that already has an "fmtp"
attribute in the actual configuration may lead to multiple fmtp
format parameters for that media format and that is not allowed by
SDP [RFC4566]. The delete-attributes MUST be used to ensure that
there are not multiple "fmtp" attributes for a given media format in
a media description.
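The need for delete-attributes can be illustrated with the following
Python sketch. It is a simplified model under stated assumptions:
attributes are represented as plain strings without the "a=" prefix,
deletion is modeled as removing all media-level attributes of the
actual configuration, and the function names and payload type
mappings are hypothetical examples.

```python
from collections import Counter
from typing import Iterable, List


def apply_potential_config(actual_attrs: Iterable[str],
                           added_attrs: Iterable[str],
                           delete_media_attrs: bool) -> List[str]:
    """Build the media-level attribute list for a potential configuration."""
    # Deletion is modeled as all-or-nothing for media-level attributes.
    base = [] if delete_media_attrs else list(actual_attrs)
    return base + list(added_attrs)


def duplicated_formats(attrs: Iterable[str], name: str) -> List[str]:
    """Return formats that occur in more than one '<name>:' attribute."""
    prefix = name + ":"
    counts = Counter(attr[len(prefix):].split()[0]
                     for attr in attrs if attr.startswith(prefix))
    return sorted(fmt for fmt, n in counts.items() if n > 1)


# Example values only: the actual configuration and the attribute
# capability map payload type 96 to different codecs.
actual = ["rtpmap:96 iLBC/8000"]
capability = ["rtpmap:96 AMR/8000"]

without_delete = apply_potential_config(actual, capability, False)
with_delete = apply_potential_config(actual, capability, True)

conflict = duplicated_formats(without_delete, "rtpmap")
no_conflict = duplicated_formats(with_delete, "rtpmap")
```

The same check applies to "fmtp" by passing that attribute name
instead. Note that in this model any other attribute removed by the
deletion has to be provided again through attribute capabilities.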
Extensions to the base SDP Capability Negotiation framework may
change the above behavior.
3.13.2. Direction Attributes
SDP defines the "inactive", "sendonly", "recvonly", and "sendrecv"
direction attributes. The direction attributes can be applied at
either the session level or the media level. In either case, it is
possible to define attribute capabilities for these direction
attributes; if used by a potential configuration, the normal
offer/answer procedures still apply. For example, if an offered
potential configuration includes the "sendonly" direction attribute,
and it is selected as the actual configuration, then the answer MUST
include a corresponding "recvonly" (or "inactive") attribute.
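The rule above can be expressed as a small lookup table in Python.
Only the "sendonly" case is stated in this section; the remaining
entries reflect the general offer/answer rules of [RFC3264] as
understood here, and the table and function names are hypothetical.

```python
# Offered direction -> directions permitted in the answer.
ALLOWED_ANSWER_DIRECTIONS = {
    "sendonly": {"recvonly", "inactive"},
    "recvonly": {"sendonly", "inactive"},
    "sendrecv": {"sendrecv", "sendonly", "recvonly", "inactive"},
    "inactive": {"inactive"},
}


def answer_direction_is_valid(offered: str, answered: str) -> bool:
    """Check an answer direction against the offered direction."""
    return answered in ALLOWED_ANSWER_DIRECTIONS[offered]


# A selected potential configuration that includes "sendonly":
ok = answer_direction_is_valid("sendonly", "recvonly")
not_ok = answer_direction_is_valid("sendonly", "sendrecv")
```

The check is the same whether the direction attribute came from the
actual configuration or from a selected potential configuration.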
3.14. Relationship to RFC 3407
RFC 3407 defines capability descriptions with limited abilities to
describe attributes, bandwidth parameters, transport protocols and
media formats. RFC 3407 does not define any negotiation procedures
for actually using those capability descriptions.
This document defines new attributes for describing attribute
capabilities and transport capabilities. It also defines procedures
for using those capabilities as part of an offer/answer exchange. In
contrast to RFC 3407, this document does not define bandwidth
parameters, and it also does not define how to express ranges of
values. Extensions to this document may be defined in order to fully
cover all the capabilities provided by RFC 3407 (for example, more
general media capabilities).
It is RECOMMENDED that implementations use the attributes and
procedures defined in this document instead of those defined in
[RFC3407]. If capability description interoperability with legacy
RFC 3407 implementations is desired, implementations MAY include both
RFC 3407 capability descriptions and capabilities defined by this
document. The offer/answer negotiation procedures defined in this
document will not use the RFC 3407 capability descriptions.