Appendix D. Use of SDP for RTSP Session Descriptions
The Session Description Protocol (SDP, [RFC4566]) may be used to
describe streams or presentations in RTSP. This description is
typically returned in reply to a DESCRIBE request on a URI from a
server to a client or received via HTTP from a server to a client.
This appendix describes how an SDP file determines the operation of
an RTSP session. Thus, it is worth pointing out that the
interpretation of the SDP is done in the context of the SDP receiver,
which is the one being configured. This is the same as in SAP
[RFC2974]; this differs from SDP Offer/Answer [RFC3264] where each
SDP is interpreted in the context of the agent providing it.
SDP as is provides no mechanism by which a client can distinguish,
without human guidance, between several media streams to be rendered
simultaneously and a set of alternatives (e.g., two audio streams
spoken in different languages). The SDP extension found in "The
Session Description Protocol (SDP) Grouping Framework" [RFC5888]
provides such functionality to some degree. Appendix D.4 describes
the usage of SDP media line grouping for RTSP.
The terms "session-level", "media-level", and other key/attribute
names and values used in this appendix are to be used as defined in
D.1.1. Control URI
The "a=control" attribute is used to convey the control URI. This
attribute is used both for the session and media descriptions. If
used for individual media, it indicates the URI to be used for
controlling that particular media stream. If found at the session
level, the attribute indicates the URI for aggregate control
(presentation URI). The session-level URI MUST be different from any
media-level URI. The presence of a session-level control attribute
MUST be interpreted as support for aggregated control. The control
attribute MUST be present on the media level unless the presentation
only contains a single media stream; in which case, the attribute MAY
be present on the session level only and then also apply to that
single media stream.
ABNF for the attribute is defined in Section 20.3.
This attribute MAY contain either relative or absolute URIs,
following the rules and conventions set out in RFC 3986 [RFC3986].
Implementations MUST look for a base URI in the following order:
1. the RTSP Content-Base field;
2. the RTSP Content-Location field;
3. the RTSP Request-URI.
If this attribute contains only an asterisk (*), then the URI MUST be
treated as if it were an empty embedded URI; thus, it will inherit
the entire base URI.
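The base URI lookup order and the handling of "*" can be sketched as follows. This is a minimal, non-normative illustration: the function names and the example URI "rtsp://example.com/movie/" are assumptions made for this sketch. Python's urllib.parse.urljoin is used as a stand-in for RFC 3986 reference resolution.

```python
from urllib.parse import urljoin


def pick_base_uri(content_base, content_location, request_uri):
    """Return the first available base URI, in the order listed above."""
    for candidate in (content_base, content_location, request_uri):
        if candidate:
            return candidate
    raise ValueError("no base URI available")


def resolve_control(base_uri, control_value):
    """Resolve an a=control value against the base URI."""
    if control_value == "*":
        # An asterisk is treated as an empty embedded URI,
        # so the result is the base URI itself.
        control_value = ""
    return urljoin(base_uri, control_value)


# Hypothetical values: no Content-Base or Content-Location was received,
# so the Request-URI serves as the base.
base = pick_base_uri(None, None, "rtsp://example.com/movie/")
print(resolve_control(base, "trackID=1"))
print(resolve_control(base, "*"))
```

An absolute URI in the attribute is returned unchanged by this resolution, since it does not depend on the base.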
Note: RFC 2326 was very unclear on the processing of relative URIs
and several RTSP 1.0 implementations at the point of publishing
this document did not perform RFC 3986 processing to determine the
resulting URI; instead, simple concatenation is common. To avoid
this issue completely, it is recommended to use absolute URIs in
The URI handling for SDPs from container files needs special
consideration. For example, let's assume that a container file has
the URI: "rtsp://example.com/container.mp4". Let's further assume
this URI is the base URI and that there is an absolute media-level
URI: "rtsp://example.com/container.mp4/trackID=2". A relative media-
level URI that resolves in accordance with RFC 3986 [RFC3986] to the
above given media URI is "container.mp4/trackID=2". It is usually
not desirable to need to include or modify the SDP stored within the
container file with the server local name of the container file. To
avoid this, one can modify the base URI used to include a trailing
slash, e.g., "rtsp://example.com/container.mp4/". In this case, the
relative URI for the media will only need to be "trackID=2".
However, this will also mean that using "*" in the SDP will result in
the control URI including the trailing slash, i.e.,
Note: the usage of TrackID in the above is not a standardized
form, but one example out of several similar strings such as
TrackID, Track_ID, StreamID that is used by different server
vendors to indicate a particular piece of media inside a container
D.1.2. Media Streams
The "m=" field is used to enumerate the streams. It is expected that
all the specified streams will be rendered with appropriate
synchronization. If the session is over multicast, the port number
indicated SHOULD be used for reception. The client MAY try to
override the destination port, through the Transport header. The
servers MAY allow this: the response will indicate whether or not
this is allowed. If the session is unicast, the port numbers are the
ones RECOMMENDED by the server to the client, about which receiver
ports to use; the client MUST still include its receiver ports in its
SETUP request. The client MAY ignore this recommendation. If the
server has no preference, it SHOULD set the port number value to
The "m=" lines contain information about which transport protocol,
profile, and possibly lower-layer are to be used for the media
stream. For each combination of transport, profile, and lower layer, like
RTP/AVP/UDP, its use with RTSP needs to be defined. The
currently defined combinations are discussed in Appendix C; further
combinations MAY be specified.
m=audio 0 RTP/AVP 31
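As an illustration, a client might split an "m=" line into its fields as sketched below. The MediaLine type and the parse_media_line function are hypothetical names introduced for this sketch; it is not a complete SDP parser.

```python
from typing import NamedTuple


class MediaLine(NamedTuple):
    media: str     # e.g., "audio" or "video"
    port: int      # port number from the m= line
    proto: list    # transport, profile, and optional lower layer
    formats: list  # payload type numbers, kept as strings


def parse_media_line(line):
    """Split an SDP "m=" line into its fields."""
    if not line.startswith("m="):
        raise ValueError("not an m= line")
    media, port, proto, *formats = line[2:].split()
    # SDP allows a "/<number of ports>" suffix; keep only the base port.
    base_port = int(port.split("/")[0])
    return MediaLine(media, base_port, proto.split("/"), formats)


m = parse_media_line("m=audio 0 RTP/AVP 31")
print(m.media, m.port, m.proto, m.formats)
```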
D.1.3. Payload Type(s)
The payload type or types are specified in the "m=" line. In case
the payload type is a static payload type from RFC 3551 [RFC3551], no
other information may be required. In case it is a dynamic payload
type, the media attribute "rtpmap" is used to specify what the media
is. The "encoding name" within the "rtpmap" attribute may be one of
those specified in [RFC4856], a media type registered with IANA
according to [RFC4855], or an experimental encoding as specified in
SDP [RFC4566]. Codec-specific parameters are not specified in this
field, but rather in the "fmtp" attribute described below.
The selection of the RTP payload type numbers used may be required to
consider RTP and RTCP Multiplexing [RFC5761], if that is to be
supported by the server.
D.1.4. Format-Specific Parameters
Format-specific parameters are conveyed using the "fmtp" media
attribute. The syntax of the "fmtp" attribute is specific to the
encoding(s) to which the attribute refers. Note that some of the
format-specific parameters may be specified outside of the "fmtp"
parameters, for example, the "ptime" attribute for most audio
D.1.5. Directionality of Media Stream
The SDP attributes "a=sendrecv", "a=recvonly", and "a=sendonly"
provide instructions about the direction the media streams flow
within a session. When using RTSP, the SDP can be delivered to a
client using either RTSP DESCRIBE or a number of RTSP external
methods, like HTTP, FTP, and email. Based on this, the SDP applies
to how the RTSP client will see the complete session. Thus, media
streams delivered from the RTSP server to the client would be given
the "a=recvonly" attribute.
"a=recvonly" in an SDP provided to the RTSP client indicates that
media delivery will only occur in the direction from the RTSP server
to the client. SDP provided to the RTSP client that lacks any of the
directionality attributes ("a=recvonly", "a=sendonly", "a=sendrecv")
would be interpreted as having "a=sendrecv". At the time of writing,
there exists no RTSP mode suitable for media traffic in the direction
from the RTSP client to the server. Thus, all RTSP SDP SHOULD have
an "a=recvonly" attribute when using the PLAY mode defined in this
document. If future modes are defined for media in the client-to-
server direction, then usage of "a=sendonly" or "a=sendrecv" may
become suitable to indicate intended media directions.
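The default described above can be sketched as a small helper. The function name and the list-of-lines input are assumptions made for this illustration.

```python
DIRECTION_ATTRIBUTES = ("sendrecv", "recvonly", "sendonly")


def effective_direction(attribute_lines):
    """Return the direction attribute found, or "sendrecv" if none is present."""
    for line in attribute_lines:
        name = line.strip()
        if name.startswith("a="):
            name = name[2:]
        if name in DIRECTION_ATTRIBUTES:
            return name
    # No directionality attribute: interpreted as "a=sendrecv".
    return "sendrecv"


print(effective_direction(["a=recvonly"]))
print(effective_direction(["a=rtpmap:96 H264/90000"]))
```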
D.1.6. Range of Presentation
The "a=range" attribute defines the total time range of the stored
session or an individual media. Live sessions that are not seekable
can be indicated as specified below; whereas the length of live
sessions can be deduced from the "t=" and "r=" SDP parameters.
The attribute is both a session- and a media-level attribute. For
presentations that contain media streams of the same duration, the
range attribute SHOULD only be used at the session level. In case of
different lengths, the range attribute MUST be given at media level
for all media and SHOULD NOT be given at the session level. If the
attribute is present at both media level and session level, the
media-level values MUST be used.
Note: usually one will specify the same length for all media, even if
there isn't media available for the full duration on all media.
However, that requires that the server accept PLAY requests within
Servers MUST take care to provide RTSP Range (see Section 18.40)
values that are consistent with what is presented in the SDP for the
content. There is no reason for non-dynamic content, like media
clips provided on demand, to have inconsistent values. Inconsistent
values between the SDP and the actual values for the content handled
by the server are likely to generate some failure, like 457 "Invalid
Range", in case the client uses PLAY requests with a Range header.
In case the content is dynamic in length and it is infeasible to
provide a correct value in the SDP, the server is recommended to
describe this as content that is not seekable (see below). The
server MAY override that property in the response to a PLAY request
using the correct values in the Range header.
The unit is specified first, followed by the value range. The units
and their values are as defined in Section 4.4.1, Section 4.4.2, and
Section 4.4.3 and MAY be extended with further formats. Any open-
ended range (start-), i.e., without stop range, is of unspecified
duration and MUST be considered as content that is not seekable
unless this property is overridden. Multiple instances carrying
different clock formats MAY be included at either session or media
ABNF for the attribute is defined in Section 20.3.
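The following sketch shows one way a client could interpret a range value: the unit comes first, an open-ended range is treated as not seekable, and a media-level value takes precedence over a session-level one. The function names and the sample values (such as "npt=0-") are illustrative assumptions. The sketch does not validate the individual time formats.

```python
def parse_range(value):
    """Split a range value such as "npt=0-" into (unit, start, stop)."""
    unit, sep, span = value.partition("=")
    if not sep or "-" not in span:
        raise ValueError(f"malformed range: {value!r}")
    start, _, stop = span.partition("-")
    # An empty stop part means the range is open-ended.
    return unit, start, stop or None


def is_seekable(value):
    """An open-ended range is considered not seekable (unless overridden)."""
    _, _, stop = parse_range(value)
    return stop is not None


def effective_range(session_range, media_range):
    """Media-level values are used when both levels carry the attribute."""
    return media_range if media_range is not None else session_range


print(parse_range("npt=0-34.4368"))
print(is_seekable("npt=0-"))
```

A server can still override the "not seekable" conclusion in its PLAY response, so a client would treat the result of is_seekable as a default rather than a final answer.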
Non-seekable stream of unknown duration:
D.1.7. Time of Availability
The "t=" field defines when the SDP is valid. For on-demand content,
the server SHOULD indicate a stop time value for which it guarantees
the description to be valid and a start time that is equal to or
before the time at which the DESCRIBE request was received. It MAY
also indicate start and stop times of 0, meaning that the session is
For sessions that are of live type, i.e., specific start time,
unknown stop time, likely not seekable, the "t=" and "r=" field
SHOULD be used to indicate the start time of the event. The stop
time SHOULD be given so that the live event will have ended at that
time, while still not being unnecessarily far into the future.
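A receiver could check a "t=" line against the current time roughly as sketched below. The sketch relies on the SDP convention that "t=" values are NTP timestamps in seconds (counted from 1900) and that a value of 0 leaves that end of the interval unbounded. The function names are hypothetical.

```python
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800


def parse_timing(line):
    """Return (start, stop) from an SDP "t=" line, in NTP seconds."""
    if not line.startswith("t="):
        raise ValueError("not a t= line")
    start, stop = (int(field) for field in line[2:].split())
    return start, stop


def is_valid_at(line, unix_now=None):
    """Check whether a Unix time falls inside the interval of the t= line."""
    start, stop = parse_timing(line)
    now_ntp = (time.time() if unix_now is None else unix_now) + NTP_UNIX_OFFSET
    after_start = start == 0 or now_ntp >= start  # 0 means no lower bound
    before_stop = stop == 0 or now_ntp <= stop    # 0 means no upper bound
    return after_start and before_stop


print(is_valid_at("t=0 0"))
```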
D.1.8. Connection Information
In SDP used with RTSP, the "c=" field contains the destination
address for the media stream. If a multicast address is specified,
the client SHOULD use this address in any SETUP request as
destination address, including any additional parameters, such as
TTL. For on-demand unicast streams and some multicast streams, the
destination address MAY be specified by the client via the SETUP
request, thus overriding any specified address. To identify streams
without a fixed destination address, where the client is required to
specify a destination address, the "c=" field SHOULD be set to a null
value. For addresses of type "IP4", this value MUST be "0.0.0.0";
and for type "IP6", this value MUST be "0:0:0:0:0:0:0:0" (can also be
written as "::"), i.e., the unspecified address according to RFC 4291
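Detecting the null address in a "c=" field can be sketched with Python's standard ipaddress module, which recognizes both spellings of the IPv6 unspecified address. The function name is an assumption made for this illustration.

```python
import ipaddress


def needs_client_destination(connection_line):
    """Return True when a "c=" line carries the unspecified (null) address."""
    if not connection_line.startswith("c="):
        raise ValueError("not a c= line")
    _nettype, _addrtype, address = connection_line[2:].split()
    # Multicast addresses may carry "/<ttl>" or "/<count>" suffixes; drop them.
    host = address.split("/")[0]
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal (for example, a host name), so not the null value.
        return False


print(needs_client_destination("c=IN IP4 0.0.0.0"))
print(needs_client_destination("c=IN IP6 ::"))
```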
D.1.9. Message Body Tag
The optional "a=mtag" attribute identifies a version of the session
description. It is opaque to the client. SETUP requests may include
this identifier in the If-Match field (see Section 18.24) to allow
session establishment only if this attribute value still corresponds
to that of the current description. The attribute value is opaque
and may contain any character allowed within SDP attribute values.
ABNF for the attribute is defined in Section 20.3.
One could argue that the "o=" field provides identical
functionality. However, it does so in a manner that would put
constraints on servers that need to support multiple session
description types other than SDP for the same piece of media
D.2. Aggregate Control Not Available
If a presentation does not support aggregate control, no session-
level "a=control" attribute is specified. For an SDP with multiple
media sections specified, each section will have its own control URI
specified via the "a=control" attribute.
o=- 2890844256 2890842807 IN IP4 192.0.2.56
s=I came from a web page
c=IN IP4 0.0.0.0
m=video 8002 RTP/AVP 31
m=audio 8004 RTP/AVP 3
Note that the position of the control URI in the description implies
that the client establishes separate RTSP control sessions to the
servers audio.example.com and video.example.com.
It is recommended that an SDP file contain the complete media-
initialization information even if it is delivered to the media
client through non-RTSP means. This is necessary as there is no
mechanism to indicate that the client should request more detailed
media stream information via DESCRIBE.
D.3. Aggregate Control Available
In this scenario, the server has multiple streams that can be
controlled as a whole. In this case, there are both a media-level
"a=control" attribute, which is used to specify the stream URIs, and
a session-level "a=control" attribute, which is used as the Request-
URI for aggregate control. If the media-level URI is relative, it is
resolved to absolute URIs according to Appendix D.1.1 above.
C->M: DESCRIBE rtsp://example.com/movie RTSP/2.0
M->C: RTSP/2.0 200 OK
Date: Wed, 23 Jan 2013 15:36:52 +0000
Expires: Wed, 23 Jan 2013 16:36:52 +0000
o=- 2890844256 2890842807 IN IP4 192.0.2.211
c=IN IP4 0.0.0.0
m=video 8002 RTP/AVP 31
m=audio 8004 RTP/AVP 3
In this example, the client is recommended to establish a single RTSP
session to the server, and it uses the URIs rtsp://example.com/movie/
trackID=1 and rtsp://example.com/movie/trackID=2 to set up the video
and audio streams, respectively. The URI rtsp://example.com/movie/,
which is resolved from the "*", controls the whole presentation
A client is not required to issue SETUP requests for all streams
within an aggregate object. Servers should allow the client to ask
for only a subset of the streams.
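The distinction between Appendices D.2 and D.3 comes down to whether an "a=control" attribute appears at the session level. A minimal sketch of that check follows. The SDP text in EXAMPLE_SDP is a hypothetical description written for this illustration, and the helper names are assumptions.

```python
def split_sdp(sdp_text):
    """Split SDP into session-level lines and a list of media sections."""
    session, media = [], []
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])    # each m= line opens a new media section
        elif media:
            media[-1].append(line)  # belongs to the latest media section
        else:
            session.append(line)    # everything before the first m= line
    return session, media


def control_value(lines):
    """Return the value of the first a=control attribute, or None."""
    prefix = "a=control:"
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def supports_aggregate_control(sdp_text):
    """A session-level a=control attribute signals aggregate control."""
    session, _ = split_sdp(sdp_text)
    return control_value(session) is not None


# Hypothetical description, not taken from a real server.
EXAMPLE_SDP = """\
v=0
o=- 2890844256 2890842807 IN IP4 192.0.2.211
s=Hypothetical session
c=IN IP4 0.0.0.0
t=0 0
a=control:*
m=video 8002 RTP/AVP 31
a=control:trackID=1
m=audio 8004 RTP/AVP 3
a=control:trackID=2
"""

_, sections = split_sdp(EXAMPLE_SDP)
print(supports_aggregate_control(EXAMPLE_SDP))
print([control_value(section) for section in sections])
```

A client that wants only a subset of the streams would issue SETUP for the chosen media-level values and skip the others.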
D.4. Grouping of Media Lines in SDP
For some types of media, it is desirable to express a relationship
between various media components, for instance, for lip
synchronization or Scalable Video Codec (SVC) [RFC5583]. This
relationship is expressed on the SDP level by grouping of media
lines, as described in [RFC5888], and can be exposed to RTSP.
For RTSP, it is mainly important to know how to handle grouped media
received by means of SDP, i.e., if the media are under aggregate
control (see Appendix D.3) or if aggregate control is not available
(see Appendix D.2).
It is RECOMMENDED that grouped media are handled by aggregate
control, to give the client the ability to control either the whole
presentation or single media.
D.5. RTSP External SDP Delivery
There are some considerations that need to be made when the session
description is delivered to the client outside of RTSP, for example
via HTTP or email.
First of all, the SDP needs to contain absolute URIs, since relative
will, in most cases, not work as the delivery will not correctly
forward the base URI.
The writing of the SDP session availability information, i.e., "t="
and "r=", needs to be carefully considered. When the SDP is fetched
by the DESCRIBE method, the probability that it is valid is very
high. However, the same is much less certain for SDPs distributed
using other methods. Therefore, the publisher of the SDP should take
care to follow the recommendations about availability in the SDP
specification [RFC4566] in Section 4.2.
Appendix E. RTSP Use Cases
This appendix describes the most important and considered use cases
for RTSP. They are listed in descending order of importance in
regard to ensuring that all necessary functionality is present. This
specification only fully supports usage of the first two. Also, in
these first two cases, there are special cases or exceptions that are
not supported without extensions, e.g., the redirection of media
delivery to an address other than the controlling agent's (client's).
E.1. On-Demand Playback of Stored Content
An RTSP-capable server stores content suitable for being streamed to
a client. A client desiring playback of any of the stored content
uses RTSP to set up the media transport required to deliver the
desired content. RTSP is then used to initiate, halt, and manipulate
the actual transmission (playout) of the content. RTSP is also
required to provide the necessary description and synchronization
information for the content.
The above high-level description can be broken down into a number of
functions of which RTSP needs to be capable.
Presentation Description: Provide initialization information about
the presentation (content); for example, which media codecs are
needed for the content. Other information that is important
includes the number of media streams the presentation contains,
the transport protocols used for the media streams, and
identifiers for these media streams. This information is
required before setup of the content is possible and to
determine if the client is even capable of using the content.
This information need not be sent using RTSP; other external
protocols can be used to transmit the transport presentation
descriptions. Two good examples are the use of HTTP [RFC7230]
or email to fetch or receive presentation descriptions like SDP
Setup: Set up some or all of the media streams in a presentation.
The setup itself consists of selecting the protocol for media
transport and the necessary parameters for the protocol, like
addresses and ports.
Control of Transmission: After the necessary media streams have been
established, the client can request the server to start
transmitting the content. The client must be allowed to start
or stop the transmission of the content at arbitrary times.
The client must also be able to start the transmission at any
point in the timeline of the presentation.
Synchronization: For media-transport protocols like RTP [RFC3550],
it might be beneficial to carry synchronization information
within RTSP. This may be due to either the lack of inter-media
synchronization within the protocol itself or the potential
delay before the synchronization is established (which is the
case for RTP when using RTCP).
Termination: Terminate the established contexts.
For this use case, there are a number of assumptions about how it
works. These are:
On-Demand content: The content is stored at the server and can be
accessed at any time during a time period when it is intended
to be available.
Independent sessions: A server is capable of serving a number of
clients simultaneously, including from the same piece of
content at different points in that presentation's timeline.
Unicast Transport: Content for each individual client is transmitted
to them using unicast traffic.
It is also possible to redirect the media traffic to a different
destination than that of the agent controlling the traffic. However,
allowing this without appropriate mechanisms for checking that the
destination approves of this allows for Distributed DoS (DDoS).
E.2. Unicast Distribution of Live Content
This use case is similar to the above on-demand content case (see
Appendix E.1); the difference is the nature of the content itself.
Live content is continuously distributed as it becomes available from
a source; i.e., the main difference from on-demand is that one starts
distributing content before the end of it has become available to the
In many cases, the consumer of live content is only interested in
consuming what actually happens "now"; i.e., very similar to
broadcast TV. However, in this case, it is assumed that there exists
no broadcast or multicast channel to the users, and instead the
server functions as a distribution node, sending the same content to
multiple receivers, using unicast traffic between server and client.
This unicast traffic and the transport parameters are individually
negotiated for each receiving client.
Another aspect of live content is that it often has a very limited
time of availability, as it is only available for the duration of the
event the content covers. An example of such live content could be a
music concert that lasts two hours and starts at a predetermined
time. Thus, there is a need to announce when and for how long the
live content is available.
In some cases, the server providing live content may be saving some
or all of the content to allow clients to pause the stream and resume
it from the paused point, or to "rewind" and play continuously from a
point earlier than the live point. Hence, this use case does not
necessarily exclude playing from other than the live point of the
stream, playing with scales other than 1.0, etc.
E.3. On-Demand Playback Using Multicast
It is possible to use RTSP to request that media be delivered to a
multicast group. The entity setting up the session (the controller)
will then control when and what media is delivered to the group.
This use case has some potential for DoS attacks by flooding a
multicast group. Therefore, a mechanism is needed to indicate that
the group actually accepts the traffic from the RTSP server.
An open issue in this use case is how one ensures that all receivers
listening to the multicast or broadcast receive the session
presentation configuring the receivers. This specification has to
rely on an external solution to solve this issue.
E.4. Inviting an RTSP Server into a Conference
If one has an established conference or group session, it is possible
to have an RTSP server distribute media to the whole group.
Transmission to the group is simplest when controlled by a single
participant or leader of the conference. Shared control might be
possible, but would require further investigation and possibly
This use case assumes that there exists either a multicast or a
conference focus that redistributes media to all participants.
This use case is intended to be able to handle the following
scenario: a conference leader or participant (hereafter called the
"controller") has some pre-stored content on an RTSP server that he
wants to share with the group. The controller sets up an RTSP
session at the streaming server for this content and retrieves the
session description for the content. The destination for the media
content is set to the shared multicast group or conference focus.
When desired by the controller, he/she can start and stop the
transmission of the media to the conference group.
There are several issues with this use case that are not solved by
this core specification for RTSP:
DoS: To prevent an RTSP server from being an unknowing participant in
a DoS attack, the server needs to be able to verify the
destination's acceptance of the media. Such a mechanism to
verify the approval of received media does not yet exist;
instead, only policies can be used, which can be made to work
in controlled environments.
Distributing the presentation description to all participants in the
To enable a media receiver to correctly decode the content,
the media configuration information needs to be distributed
reliably to all participants. This will most likely require
support from an external protocol.
Passing control of the session: If it is desired to pass control
of the RTSP session between the participants, some support
will be required by an external protocol to exchange state
information and possibly floor control of who is controlling
the RTSP session.
E.5. Live Content Using Multicast
This use case in its simplest form does not require any use of RTSP
at all; this is what multicast conferences being announced with SAP
[RFC2974] and SDP are intended to handle. However, in use cases
where more advanced features like access control to the multicast
session are desired, RTSP could be used for session establishment.
A client desiring to join a live multicasted media session with
cryptographic (encryption) access control could use RTSP in the
following way. The source of the session announces the session and
gives all interested an RTSP URI. The client connects to the server
and requests the presentation description, allowing configuration for
reception of the media. In this step, it is possible for the client
to use secured transport and any desired level of authentication; for
example, for billing or access control. An RTSP link also allows for
load balancing between multiple servers.
If these were the only goals, they could be achieved by simply using
HTTP. However, for cases where the sender wants to keep track of
each individual receiver of a session, and possibly use the session
as a side channel for distributing key-updates or other information
on a per-receiver basis, and the full set of receivers is not known
prior to the session start, the state establishment that RTSP
provides can be beneficial. In this case, a client would establish
an RTSP session for this multicast group with the RTSP server. The
RTSP server will not transmit any media, but instead will point to
the multicast group. The client and server will be able to keep the
session alive for as long as the receiver participates in the session
thus enabling, for example, the server to push updates to the client.
This use case will most likely not be able to be implemented without
some extensions to the server-to-client push mechanism. Here the
PLAY_NOTIFY method (see Section 13.5) with a suitable extension could
provide clear benefits.
Appendix F. Text Format for Parameters
A resource of type "text/parameters" consists of either 1) a list of
parameters (for a query) or 2) a list of parameters and associated
values (for a response or setting of the parameter). Each entry of
the list is a single line of text. Parameters are separated from
values by a colon. The parameter name MUST only use US-ASCII visible
characters while the values are UTF-8 text strings. The media type
registration form is in Section 22.16.
There is a potential interoperability issue for this format. It was
named in RFC 2326 but never defined, even though it was used in examples that
hint at the syntax. This format matches the purpose and its syntax
supports the examples provided. However, it goes further by allowing
UTF-8 in the value part; thus, usage of UTF-8 strings may not be
supported. However, as individual parameters are not defined, the
implementing application needs to have an out-of-band agreement or use a
feature tag anyway to determine if the endpoint supports the
The ABNF [RFC5234] grammar for "text/parameters" content is:
file = *((parameter / parameter-value) CRLF)
parameter = 1*visible-except-colon
parameter-value = parameter *WSP ":" value
visible-except-colon = %x21-39 / %x3B-7E ; VCHAR - ":"
value = *(TEXT-UTF8char / WSP)
TEXT-UTF8char = <as defined in Section 20.1>
WSP = <See RFC 5234> ; Space or HTAB
VCHAR = <See RFC 5234>
CRLF = <See RFC 5234>
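A parser following the grammar above might look like the sketch below. It is a non-normative illustration: the function name and the sample parameter names are assumptions. The value is returned exactly as it appears after the colon, including any leading whitespace, and the sketch does not check the value against TEXT-UTF8char.

```python
import re

# parameter: one or more visible US-ASCII characters, excluding ":".
_NAME = r"[\x21-\x39\x3B-\x7E]+"
_PARAMETER = re.compile(rf"({_NAME})")
_PARAMETER_VALUE = re.compile(rf"({_NAME})[ \t]*:(.*)")


def parse_text_parameters(body):
    """Parse a text/parameters body into a list of (name, value) pairs.

    The value is None for a bare parameter (the query form).
    """
    entries = []
    for line in body.split("\r\n"):
        if line == "":
            continue  # empty piece after the final CRLF (blank lines are skipped)
        match = _PARAMETER_VALUE.fullmatch(line)
        if match:
            entries.append((match.group(1), match.group(2)))
            continue
        match = _PARAMETER.fullmatch(line)
        if match:
            entries.append((match.group(1), None))
            continue
        raise ValueError(f"malformed line: {line!r}")
    return entries


# Hypothetical parameter names, used only for illustration.
print(parse_text_parameters("packets_received\r\njitter\r\n"))
print(parse_text_parameters("packets_received: 10\r\njitter: 0.3838\r\n"))
```

Because a parameter name cannot contain a colon, the first colon on a line always separates the name from the value; later colons are part of the value.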
Appendix G. Requirements for Unreliable Transport of RTSP
This appendix provides guidance for those who want to implement RTSP
messages over unreliable transports as has been defined in RTSP 1.0
[RFC2326]. RFC 2326 defined the "rtspu" URI scheme and provided some
basic information for the transport of RTSP messages over UDP. The
information is being provided here as there has been at least one
commercial implementation and compatibility with that should be
The following points should be considered for an interoperable
o Requests shall be acknowledged by the receiver. If there is no
acknowledgement, the sender may resend the same message after a
timeout of one round-trip time (RTT). Any retransmissions due to
lack of acknowledgement must carry the same sequence number as the
o The RTT can be estimated as in TCP (RFC 6298) [RFC6298], with an
initial round-trip value of 500 ms. An implementation may cache
the last RTT measurement as the initial value for future
o The Timestamp header (Section 18.53) is used to avoid the
retransmission ambiguity problem [Stevens98].
o The registered default port for RTSP over UDP for the server is
o RTSP messages can be carried over any lower-layer transport
protocol that is 8-bit clean.
o RTSP messages are vulnerable to bit errors and should not be
subjected to them.
o Source authentication, or at least validation that RTSP messages
come from the same entity, becomes extremely important, as session
hijacking may be substantially easier for RTSP message transport
using an unreliable protocol like UDP than for TCP.
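The retransmission timing in the points above can be sketched as follows. The smoothing constants (1/8 and 1/4) and the first-sample rule are those of RFC 6298. Using 500 ms until the first measurement arrives and comparing elapsed time against the smoothed RTT are assumptions of this sketch, as are the class and function names.

```python
class RttEstimator:
    """Smoothed RTT estimate in seconds, following the RFC 6298 update rules."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation

    def __init__(self, initial_rtt=0.5):
        self.srtt = initial_rtt  # initial round-trip value of 500 ms
        self.rttvar = None       # None until the first measurement

    def update(self, sample):
        """Fold one RTT measurement into the estimate and return it."""
        if self.rttvar is None:
            # First measurement replaces the initial value.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt


def retransmit_due(sent_at, now, estimator):
    """True when an unacknowledged request has waited one estimated RTT."""
    return now - sent_at >= estimator.srtt


estimator = RttEstimator()
print(retransmit_due(sent_at=10.0, now=10.6, estimator=estimator))
```

A retransmitted request would reuse the sequence number of the original, so this helper only decides when to resend, not what to send.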
There are two RTSP headers that are primarily intended for being used
by the unreliable handling of RTSP messages and which will be
o CSeq: See Section 18.20. It should be noted that the CSeq header
is also required to match requests and responses independent
of whether a reliable or unreliable transport is used.
o Timestamp: See Section 18.53
Appendix H. Backwards-Compatibility Considerations
This section contains notes on issues about backwards compatibility
with clients or servers being implemented according to RFC 2326
[RFC2326]. Note that there exists no requirement to implement RTSP
1.0; in fact, this document recommends against it as it is difficult
to do in an interoperable way.
A server implementing RTSP 2.0 MUST include an RTSP-Version of
"RTSP/2.0" in all responses to requests containing RTSP-Version value
of "RTSP/2.0". If a server receives an RTSP 1.0 request, it MAY
respond with an RTSP 1.0 response if it chooses to support RFC 2326.
If the server chooses not to support RFC 2326, it MUST respond with a
505 (RTSP Version Not Supported) status code. A server MUST NOT
respond to an RTSP 1.0 request with an RTSP 2.0 response.
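The server-side rules in the preceding paragraph can be sketched as a small decision function. The function name and the return shape are assumptions. Sending the 505 response in RTSP/1.0 form is inferred here from the rule that an RTSP 1.0 request is never answered with an RTSP 2.0 response.

```python
def choose_response(request_version, supports_rtsp_1_0):
    """Return (response_version, forced_status) for an incoming request.

    forced_status is None when the request can be processed normally.
    """
    if request_version == "RTSP/2.0":
        # RTSP 2.0 requests are always answered with RTSP/2.0.
        return "RTSP/2.0", None
    if request_version == "RTSP/1.0":
        if supports_rtsp_1_0:
            return "RTSP/1.0", None
        # RFC 2326 is not supported: 505 (RTSP Version Not Supported).
        return "RTSP/1.0", 505
    # Other version strings are outside the scope of this sketch.
    raise ValueError(f"unhandled version: {request_version!r}")


print(choose_response("RTSP/1.0", supports_rtsp_1_0=False))
```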
Clients implementing RTSP 2.0 MAY use an OPTIONS request with an
RTSP-Version of "RTSP/2.0" to determine whether a server supports
RTSP 2.0. If the server responds with either an RTSP-Version of
"RTSP/1.0" or a status code of 505 (RTSP Version Not Supported), the
client will have to use RTSP 1.0 requests if it chooses to support
H.1. Play Request in Play State
The behavior in the server when a PLAY request is received in Play state has
changed (Section 13.4). In RFC 2326, the new PLAY request would be
queued until the current PLAY request completed. Any new PLAY request now
takes effect immediately, replacing the previous request.
H.2. Using Persistent Connections
Some server implementations of RFC 2326 maintain a one-to-one
relationship between a connection and an RTSP session. Such
implementations require clients to use a persistent connection to
communicate with the server and when a client closes its connection,
the server may remove the RTSP session. This is worth noting if an
RTSP 2.0 client also supporting 1.0 connects to a 1.0 server.
Appendix I. Changes
This appendix briefly lists the differences between RTSP 1.0
[RFC2326] and RTSP 2.0 for informational purposes. For
implementers of RTSP 2.0, it is recommended to read carefully through
this memo and not to rely on the list of changes below to adapt from
RTSP 1.0 to RTSP 2.0, as RTSP 2.0 is not intended to be backwards
compatible with RTSP 1.0 [RFC2326] other than the version negotiation
I.1. Brief Overview
The following protocol elements were removed in RTSP 2.0 compared to
o the RECORD and ANNOUNCE methods and all related functionality
(including 201 (Created) and 250 (Low On Storage Space) status
o the use of UDP for RTSP message transport (due to lack of interest
and a broken specification);
o the use of PLAY method for keep-alive in Play state.
The following protocol elements were added or changed in RTSP 2.0
compared to RTSP 1.0:
o RTSP session TEARDOWN from the server to the client;
o IPv6 support;
o extended IANA registries (e.g., transport headers parameters,
transport-protocol, profile, lower-transport, and mode);
o request pipelining for quick session start-up;
o fully reworked state machine;
o RTSP messages now use URIs rather than URLs;
o incorporated much of related HTTP text ([RFC2616]) in this memo,
compared to just referencing the sections in HTTP, to avoid
o the REDIRECT method was expanded and diversified for different
o A new section describes how to set up different media-transport
alternatives and their profiles in addition to lower-layer
protocols. This caused the appendix on RTP interaction to be moved
to the new section instead of being in the part that describes
RTP. The section also includes guidelines on what to consider when
writing usage guidelines for new protocols and profiles.
o Added an asynchronous notification method PLAY_NOTIFY. This
method is used by the RTSP server to asynchronously notify clients
about session changes while in Play state. To a limited extent,
this is comparable with some implementations of ANNOUNCE in RTSP
1.0 that were not intended for recording.
I.2. Detailed List of Changes
The changes below have been made to RTSP 1.0 (RFC 2326) in defining
RTSP 2.0. Note that this list does not reflect minor changes in
wording or correction of typographical errors.
o The section on minimal implementation was deleted. Instead, the
main part of the specification defines the core of RTSP 2.0.
o The Transport header has been changed in the following ways:
* The ABNF has been changed to define that extensions are
possible and that unknown parameters result in servers ignoring
the transport specification.
* To prevent backwards compatibility issues, any extension or new
parameter requires the usage of a feature tag combined with the
* Syntax ambiguities with the Mode parameter have been resolved.
* Syntax error with ";" for multicast and unicast has been
* Two new addressing parameters have been defined: src_addr and
dest_addr. These replace the parameters "port", "client_port",
"server_port", "destination", and "source".
* Support for IPv6 explicit addresses in all address fields has
* To handle URI definitions that contain ";" or ",", a quoted-URI
format has been introduced and is required.
* IANA registries for the transport header parameters, transport-
protocol, profile, lower-transport, and mode have been defined.
* The text on the Transport header's interleaved parameter was made
stricter and uses formal requirement levels. It was also
clarified that the interleaved channels are symmetric and that
it is the server that sets the channel numbers.
* It has been clarified that the client cannot request that the
server use a certain RTP SSRC by sending a request with the
transport parameter SSRC.
* Syntax definition for SSRC has been clarified to require 8HEX.
It has also been extended to allow multiple values for clients
supporting this version.
* Clarified the text on the Transport header's "dest_addr"
parameters regarding what security precautions the server is
required to perform.
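As an illustration of the SSRC syntax change listed above (exactly
eight hexadecimal digits per value, with multiple values allowed), a
receiver might validate the parameter value along the following lines.
This is a sketch in Python under stated assumptions: the function name
parse_ssrc_param is hypothetical, and the use of "/" as the separator
between multiple values is an assumption that should be checked
against the ABNF in Section 20.

```python
import re

# One SSRC value: exactly eight hexadecimal digits ("8HEX").
_SSRC_VALUE = re.compile(r"[0-9A-Fa-f]{8}")


def parse_ssrc_param(value):
    """Return the SSRC values in `value` as a list of integers.

    `value` is the text after "ssrc=". Multiple values are assumed
    to be separated by "/". Raises ValueError on malformed input.
    """
    ssrcs = []
    for part in value.split("/"):
        if not _SSRC_VALUE.fullmatch(part):
            raise ValueError(f"SSRC must be 8 hex digits: {part!r}")
        ssrcs.append(int(part, 16))
    return ssrcs
```

A value shorter or longer than eight digits, or an empty element
between separators, is rejected rather than padded or truncated.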
o The Range formats have been changed in the following way:
* The NPT format has been given an initial NPT identifier that
must now be used.
* All formats now support initial open-ended ranges of the type
"npt=-10" and also format-only ranges such as "Range: smpte" for
use with GET_PARAMETER requests.
* The npt-hhmmss notation now follows ISO 8601 more strictly.
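The mandatory "npt=" identifier and the open-ended forms mentioned
above can be illustrated with a small parser. This is a simplified
sketch in Python, not a complete implementation of the Range syntax:
the function name parse_npt_range is hypothetical, only the
seconds-based notation is handled, and the hh:mm:ss notation, the
"now" keyword, and format-only ranges are deliberately left out.

```python
def parse_npt_range(spec):
    """Parse "npt=<start>-<end>" where either bound may be omitted.

    Returns a (start, end) tuple of floats in seconds; an omitted
    bound is returned as None. Raises ValueError on other input.
    """
    prefix = "npt="
    if not spec.startswith(prefix):
        raise ValueError("missing mandatory 'npt=' identifier")
    start_text, sep, end_text = spec[len(prefix):].partition("-")
    if not sep or (not start_text and not end_text):
        raise ValueError("expected '<start>-<end>' with at least one bound")
    # float() raises ValueError for text such as "now" or "0:10:00",
    # which this simplified sketch does not support.
    start = float(start_text) if start_text else None
    end = float(end_text) if end_text else None
    return start, end
```

With this sketch, "npt=-10" yields (None, 10.0) and "npt=10-" yields
(10.0, None), while a range written without the identifier, such as
"10-20", is rejected.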
o RTSP message handling has been changed in the following ways:
* RTSP messages now use URIs rather than URLs.
* It has been clarified that a 4xx message due to a missing CSeq
header shall be returned without a CSeq header.
* The 300 (Multiple Choices) response code has been removed.
* Rules for how to handle the timing out RTSP messages have been
* Extended Pipelining rules allowing for quick session startup.
* Sequence numbering and proxy handling of sequence numbers have
been defined, including cases when responses arrive out of
o The HTTP references have been updated, first to RFCs 2616 and 2617
and then to RFCs 7230-7235. Most of the text has been copied into
this specification and then altered to fit RTSP. The Public and
the Content-Base headers have also been imported from RFC 2068 so
that they are defined in the RTSP specification. Known effects on
RTSP due to HTTP clarifications:
* Content-Encoding header can include encoding of type
o The state machine section has been completely rewritten. It now
includes more details and is also clearer about the model used.
o An IANA section has been included that contains a number of
registries and their rules. This will allow us to use IANA to
keep track of RTSP extensions.
o The transport of RTSP messages has seen the following changes:
* The use of UDP for RTSP message transport has been deprecated
due to lack of interest and a broken specification.
* The rules for how TCP connections are to be handled have been
clarified. Now it is made clear that servers should not close
a TCP connection unless it has been unused for a significant
time.
* Strong recommendations on why servers and clients should use
persistent connections have also been added.
* There is now a requirement on the servers to handle non-
persistent connections as this provides fault tolerance.
* Added wording on the usage of Connection:Close for RTSP.
* Specified usage of TLS for RTSP messages, including a scheme to
approve a proxy's TLS connection to the next hop.
o The following header-related changes have been made:
* Accept-Ranges response-header has been added. This header
clarifies which range formats can be used for a resource.
* Fixed the missing definitions for the Cache-Control header.
Also added to the syntax definition the missing delta-seconds
for max-stale and min-fresh parameters.
* Added a requirement that the CSeq header value be increased by
one for each new RTSP request. A recommendation to start at 0
has also been added.
* Added a requirement that the Date header must be used for all
messages with a message body and the Server should always
* Removed the possibility of using the Range header with the Scale
header to indicate when it is to be activated, since it cannot
work as defined. Also, added a rule that lack of a Scale header in
a response indicates lack of support for the header. Feature
tags for scaled playback have been defined.
* The Speed header must now be responded to in order to indicate
support and the actual speed that will be used. A feature tag
is defined. Notes on congestion control were also added.
* The Supported header was borrowed from SIP [RFC3261] to help
with the feature negotiation in RTSP.
* Clarified that the Timestamp header can be used to resolve
* The Session header text has been expanded with an explanation
of keep-alive and which methods to use. SET_PARAMETER is now
the recommended method if only keep-alive within RTSP is desired.
* It has been clarified how the Range header formats are used to
indicate pause points in the PAUSE response.
* Clarified that RTP-Info URIs that are relative use the Request-
URI as base URI. Also clarified that the used URI must be the
one that was used in the SETUP request. The URIs are now also
required to be quoted. The header also expresses the SSRC for
the provided RTP timestamp and sequence number values.
* Added text that requires the Range to always be present in PLAY
responses. Clarified what should be sent in case of live
* The headers table has been updated using a structure borrowed
from SIP. Those tables convey much more information and should
provide a good overview of the available headers.
* It has been clarified that any message with a message body is
required to have a Content-Length header. This was the case in
RFC 2326, but could be misinterpreted.
* ETag has changed its name to MTag.
* To resolve functionality around MTag, the MTag and If-None-Match
headers have been added from HTTP with the necessary
clarifications in regard to RTSP operation.
* Imported the Public header from HTTP (RFC 2068 [RFC2068]) since
it has been removed from HTTP due to lack of use. Public is
used quite frequently in RTSP.
* Clarified rules for populating the Public header so that it is
an intersection of the capabilities of all the RTSP agents in a
* Added the Media-Range header for listing the current
availability of the media range.
* Added the Notify-Reason header for giving the reason when
sending PLAY_NOTIFY requests.
* A new header Seek-Style has been defined to direct and inform
how any seek operation should be, or has been, performed.
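Two of the header changes listed above, the CSeq value increasing by
one per request (recommended to start at 0) and the recommendation to
use SET_PARAMETER for keep-alive, can be combined in a short sketch.
The Python code below is illustrative only: the class name
RequestBuilder, the example URI, and the session identifier are all
hypothetical, and a real client would normally add further headers.

```python
import itertools


class RequestBuilder:
    """Builds RTSP 2.0 request text with a per-request CSeq value."""

    def __init__(self, first_cseq=0):
        # Starting at 0 is the recommendation noted in the list above.
        self._cseq = itertools.count(first_cseq)

    def keep_alive(self, uri, session_id):
        """Return a body-less SET_PARAMETER request used as keep-alive."""
        lines = [
            f"SET_PARAMETER {uri} RTSP/2.0",
            f"CSeq: {next(self._cseq)}",  # increases by one per request
            f"Session: {session_id}",
        ]
        # The header block is terminated by an empty line; because there
        # is no message body, no Content-Length header is included.
        return "\r\n".join(lines) + "\r\n\r\n"
```

Calling keep_alive() twice on the same builder produces requests with
"CSeq: 0" and "CSeq: 1" respectively.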
o The Protocol Syntax has been changed in the following way:
* All ABNF definitions are updated according to the rules defined
in RFC 5234 [RFC5234] and have been gathered in a separate
section (Section 20).
* The ABNF for the User-Agent and Server headers have been
* Some definitions in the introduction regarding the RTSP session
have been changed.
* The protocol has been made fully IPv6 capable.
* The CHAR rule has been changed to exclude NULL.
o The Status codes have been changed in the following ways:
* The use of status code 303 (See Other) has been deprecated as
it does not make sense to use in RTSP.
* The never-defined status code 411 "Length Required" has been
* When sending response 451 (Parameter Not Understood) and 458
(Parameter Is Read-Only), the response body should contain the
* Clarification on when a 3rr redirect status code can be
received has been added. This includes receiving 3rr as a
result of a request within an established session. This
clarifies previously unspecified behavior.
* Removed the 201 (Created) and 250 (Low On Storage Space) status
codes as they are only relevant to recording, which is
* Several new status codes have been defined: 464 (Data Transport
Not Ready Yet), 465 (Notification Reason Unknown), 470
(Connection Authorization Required), 471 (Connection
Credentials Not Accepted), and 472 (Failure to Establish Secure
o The following functionality has been deprecated from the protocol:
* The use of Queued Play.
* The use of PLAY method for keep-alive in Play state.
* The RECORD and ANNOUNCE methods and all related functionality.
Some of the syntax has been removed.
* The possibility to use timed execution of methods with the time
parameter in the Range header.
* The description of how rtspu works is not part of the core
specification and will require external description. Only its
existence is mentioned here, and some requirements for the
transport are provided.
o The following changes have been made in relation to methods:
* The OPTIONS method has been clarified with regard to the use of
the Public and Allow headers.
* Added text clarifying the usage of SET_PARAMETER for keep-alive
and usage without a body.
* The PLAY method may now be pipelined together with one or more
SETUP requests that follow the initial SETUP request, which
generates the session for aggregated control.
* REDIRECT has been expanded and diversified for different
* Added a new method PLAY_NOTIFY. This method is used by the
RTSP server to asynchronously notify clients about session
o Wrote a new section about how to set up different media-transport
alternatives and their profiles as well as lower-layer protocols.
This caused the appendix on RTP interaction to be moved to the new
section instead of being in the part that describes RTP. The new
section also includes guidelines on what to consider when writing
usage guidelines for new protocols and profiles.
o Setup and usage of independent TCP connections for transport of
RTP has been specified.
o Added a new section describing the available mechanisms to
determine if functionality is supported, called "Capability
Handling". Renamed option-tags to feature tags.
o Added a Contributors section with people who have contributed
actual text to the specification.
o Added a section "Use Cases" that describes the major use cases for
o Clarified the usage of a=range and how to indicate live content
that is not seekable with this attribute.
o Added text specifying the special behavior of PLAY for live content.
o Security features of RTSP have been clarified:
* HTTP-based authorization has been clarified, requiring both
Basic and Digest support
* TLS support has been mandated
* If one implements RTP, then SRTP and the defined MIKEY-based
key exchange must be supported
* Various minor mitigations discussed or resulted in protocol
Acknowledgements
This memorandum defines RTSP version 2.0, which is a revision of the
Proposed Standard RTSP version 1.0 defined in [RFC2326]. The authors
of RFC 2326 are Henning Schulzrinne, Anup Rao, and Robert Lanphier.
Both RTSP version 1.0 and RTSP version 2.0 borrow format and
descriptions from HTTP/1.1.
Robert Sparks and especially Elwyn Davies provided very valuable and
detailed reviews in the IETF Last Call that greatly improved the
document and resolved many issues, especially regarding consistency.
This document has benefited greatly from the comments of all those
participating in the MMUSIC WG. In addition to those already
mentioned, the following individuals have contributed to this
Rahul Agarwal, Claudio Allocchio, Jeff Ayars, Milko Boic, Torsten
Braun, Brent Browning, Bruce Butterfield, Steve Casner, Maureen
Chesire, Jinhang Choi, Francisco Cortes, Elwyn Davies, Spencer
Dawkins, Kelly Djahandari, Martin Dunsmuir, Adrian Farrel, Stephen
Farrell, Ross Finlayson, Eric Fleischman, Jay Geagan, Andy Grignon,
Christian Groves, V. Guruprasad, Peter Haight, Mark Handley, Brad
Hefta-Gaub, Volker Hilt, John K. Ho, Patrick Hoffman, Go Hori,
Philipp Hoschka, Anne Jones, Ingemar Johansson, Jae-Hwan Kim, Anders
Klemets, Ruth Lang, Barry Leiba, Stephanie Leif, Jonathan Lennox,
Eduardo F. Llach, Chris Lonvick, Xavier Marjou, Thomas Marshall, Rob
McCool, Martti Mela, David Oran, Joerg Ott, Joe Pallas, Maria
Papadopouli, Sujal Patel, Ema Patki, Alagu Periyannan, Colin Perkins,
Pekka Pessi, Igor Plotnikov, Pete Resnick, Peter Saint-Andre, Holger
Schmidt, Jonathan Sergent, Pinaki Shah, David Singer, Lior Sion, Jeff
Smith, Alexander Sokolsky, Dale Stammen, John Francis Stracke, Geetha
Srikantan, Scott Taylor, David Walker, Stephan Wenger, Dale R.
Worley, and Byungjo Yoon, and especially Flemming Andreasen.
Contributors
The following people have made written contributions that were
included in the specification:
o Tom Marshall contributed text on the usage of 3rr status codes.
o Thomas Zheng contributed text on the usage of the Range in PLAY
responses and proposed an earlier version of the PLAY_NOTIFY
method.
o Sean Sheedy contributed text on the timeout behavior of RTSP
messages and connections, the 463 (Destination Prohibited) status
code, and proposed an earlier version of the PLAY_NOTIFY method.
o Greg Sherwood proposed an earlier version of the PLAY_NOTIFY
method.
o Fredrik Lindholm contributed text about the RTSP security
o John Lazzaro contributed the text for RTP over Independent TCP.
o Aravind Narasimhan contributed by rewriting "Media-Transport
Alternatives" (Appendix C) and making editorial improvements in a
number of places in the specification.
o Torbjorn Einarsson has done some editorial improvements of the
Authors' Addresses
1214 Amsterdam Avenue
New York, NY 10027
United States of America
United States of America
San Francisco, CA
United States of America
Stockholm SE-164 80
Martin Stiemerling (editor)
University of Applied Sciences Darmstadt