
RFC 7826

Real-Time Streaming Protocol Version 2.0

Pages: 318
Proposed Standard
Obsoletes:  2326
Part 2 of 13 – Pages 11 to 35

Top   ToC   RFC7826 - Page 11   prevText

2. Protocol Overview

   This section provides an informative overview of the different
   mechanisms in the RTSP 2.0 protocol to give the reader a high-level
   understanding before getting into all the specific details.  In case
   of conflict between this description and the later sections, the
   later sections take precedence.  For more information about use
   cases considered for RTSP, see Appendix E.

   RTSP 2.0 is a bidirectional request and response protocol that first
   establishes a context including content resources (the media) and
   then controls the delivery of these content resources from the
   provider to the consumer.  RTSP has three fundamental parts: Session
   Establishment, Media Delivery Control, and an extensibility model
   described below.  The protocol is based on some assumptions about
   existing functionality to provide a complete solution for
   client-controlled real-time media delivery.

   RTSP uses text-based messages, requests and responses, that may
   contain a binary message body.  An RTSP request starts with a method
   line that identifies the method, the protocol and version, and the
   resource on which to act.  The resource is identified by a URI, and
   the hostname part of the URI is used by the RTSP client to resolve
   the IPv4 or IPv6 address of the RTSP server.  Following the method
   line are a number of RTSP headers, each on its own line terminated
   by a carriage return line feed (CRLF) character pair.  The header
   section is ended by an empty line, so two consecutive CRLF pairs
   mark its end.  The message body, if present, follows the two CRLF
   character pairs, and the body's length is described by a message
   header.  RTSP responses are similar, but they start with a response
   line with the protocol and version followed by a status code and a
   reason phrase.  RTSP messages are sent over a reliable transport
   protocol between the client and server.  RTSP 2.0 requires clients
   and servers to implement TCP and TLS over TCP as mandatory
   transports for RTSP messages.
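
   As a rough illustration of the message layout described above, the
   following Python sketch serializes a request (method line,
   CRLF-terminated header lines, an empty line, optional body) and
   splits a response line into protocol/version, status code, and
   reason phrase.  The function names and the example URI are
   hypothetical, CSeq is included only as a typical request header, and
   the sketch does not attempt to cover the full syntax of Section 20.

```python
from __future__ import annotations


def build_request(method: str, uri: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Serialize an RTSP/2.0 request (illustrative sketch only)."""
    # Method line: method, Request-URI, and protocol/version.
    lines = [f"{method} {uri} RTSP/2.0"]
    all_headers = dict(headers)
    if body:
        # The body's length is described by a message header.
        all_headers["Content-Length"] = str(len(body))
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    # Every line ends in CRLF; an empty line (a second CRLF) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8") + body


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split a response line into protocol/version, status code, and reason phrase."""
    first_line = response.split(b"\r\n", 1)[0].decode("utf-8")
    # maxsplit=2 keeps multi-word reason phrases such as "Not Implemented" intact.
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason


# Hypothetical example: an OPTIONS request without a body.
req = build_request("OPTIONS", "rtsp://media.example.com/", {"CSeq": "1"})
```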

2.1. Presentation Description

   RTSP exists to provide access to multimedia presentations and content
   but tries to be agnostic about the media type or the actual media
   delivery protocol that is used.

   To enable a client to implement a complete system, an RTSP-external
   mechanism for describing the presentation and the delivery
   protocol(s) is used.  RTSP assumes that this description is either
   delivered completely out of band or as a data object in the response
   to a client's request using the DESCRIBE method (Section 13.2).

   Parameters that commonly have to be included in the presentation
   description are the following:

   o  The number of media streams;

   o  the resource identifier for each media stream/resource that is to
      be controlled by RTSP;

   o  the protocol that will be used to deliver each media stream;

   o  the transport protocol parameters that are not negotiated or vary
      with each client;

   o  the media-encoding information enabling a client to correctly
      decode the media upon reception; and

   o  an aggregate control resource identifier.

   RTSP uses its own URI schemes ("rtsp" and "rtsps") to reference media
   resources and aggregates under common control (see Section 4.2).

   This specification describes in Appendix D how one uses SDP [RFC4566]
   for describing the presentation.
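
   The following Python sketch shows how a client might extract several
   of the parameters listed above from an SDP description: the number
   of media streams, each stream's delivery protocol and control
   identifier, and the aggregate control identifier.  It assumes the
   "a=control:" attribute convention for RTSP control URIs (Appendix D
   covers SDP usage in detail).  The example description is invented,
   and relative control values are not resolved against a base URI
   here.

```python
from __future__ import annotations

# Hypothetical SDP presentation description for an audio+video presentation.
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.1
s=Example presentation
t=0 0
a=control:rtsp://media.example.com/movie/
m=audio 0 RTP/AVP 0
a=control:audio
m=video 0 RTP/AVP 31
a=control:video
"""


def summarize_sdp(sdp: str) -> dict:
    """Collect the aggregate control URI and per-stream media, protocol, and control values."""
    summary: dict = {"aggregate_control": None, "streams": []}
    current = None  # None while parsing the session-level section
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # "m=<media> <port> <proto> <fmt> ..." starts a new media description.
            media, _port, proto = line[2:].split()[:3]
            current = {"media": media, "proto": proto, "control": None}
            summary["streams"].append(current)
        elif line.startswith("a=control:"):
            value = line[len("a=control:"):]
            if current is None:
                summary["aggregate_control"] = value  # session level: aggregate control
            else:
                current["control"] = value  # media level: per-stream control
    return summary
```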

2.2. Session Establishment

   The RTSP client can request the establishment of an RTSP session
   after having used the presentation description to determine which
   media streams are available, which media delivery protocol is used,
   and the resource identifiers of the media streams.  The RTSP session
   is a common context between the client and the server that consists
   of one or more media resources that are to be under common media
   delivery control.

   The client creates an RTSP session by sending a request using the
   SETUP method (Section 13.3) to the server.  In the Transport header
   (Section 18.54) of the SETUP request, the client also includes all
   the transport parameters necessary to enable the media delivery
   protocol to function.  This includes parameters that are
   preestablished by the presentation description but necessary for any
   middlebox to correctly handle the media delivery protocols.  The
   Transport header in a request may contain multiple alternatives for
   media delivery in a prioritized list, which the server can select
   from.  These alternatives are typically based on information in the
   presentation description.

   When receiving a SETUP request, the server determines if the media
   resource is available and if one or more of the transport parameter
   specifications are acceptable.  If that is successful, an RTSP
   session context is created, and the relevant parameters and state
   are stored.  An identifier is created for the RTSP session and
   included in the response in the Session header (Section 18.49).  The
   SETUP response includes a Transport header that specifies which of
   the alternatives has been selected and relevant parameters.
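
   A server's choice among the prioritized Transport alternatives can be
   sketched as "take the first acceptable entry".  In the Python sketch
   below, the offered transport specifications, the supported set, and
   the function name are hypothetical.  The comma split is deliberately
   naive (it assumes no commas inside parameter values), and checking
   whether the media resource is available is not shown; a real parser
   would follow the full syntax in Section 18.54.

```python
from __future__ import annotations

# Hypothetical Transport header listing two alternatives in priority order.
OFFER = ('RTP/AVP/UDP;unicast;dest_addr=":6970"/":6971", '
         'RTP/AVP/TCP;unicast;interleaved=0-1')


def select_transport(transport_header: str, supported: set[str]) -> str | None:
    """Return the first alternative whose transport identifier the server accepts."""
    # Naive split: assumes no commas appear inside parameter values.
    for spec in transport_header.split(","):
        spec = spec.strip()
        transport_id = spec.split(";", 1)[0]  # e.g. "RTP/AVP/TCP"
        if transport_id in supported:
            return spec  # basis for the Transport header in the SETUP response
    return None  # no acceptable alternative
```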

   A SETUP request that references an existing RTSP session but
   identifies a new media resource is a request to add that media
   resource under common control with the already-present media
   resources in an aggregated session.  A client can expect this to work
   for all media resources under RTSP control within a multimedia
   content container.  However, a server will likely refuse to aggregate
   resources from different content containers.  Even if an RTSP session
   contains only a single media stream, the RTSP session can be
   referenced by the aggregate control URI.

   To avoid an extra round trip in the session establishment of
   aggregated RTSP sessions, RTSP 2.0 supports pipelined requests; i.e.,
   the client can send multiple requests back-to-back without waiting
   first for the completion of any of them.  The client uses a client-
   selected identifier in the Pipelined-Requests header (Section 18.33)
   to instruct the server to bind multiple requests together as if they
   included the session identifier.
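
   A minimal sketch of pipelining, assuming the Pipelined-Requests value
   is a client-chosen decimal number (see Section 18.33 for the exact
   syntax): two SETUP requests and an aggregate PLAY are serialized
   back-to-back with the same identifier and no Session header, so they
   can be written to the connection in one go.  The URIs, the
   identifier 7654, and the helper name are illustrative only; a real
   client still has to process each response in order.

```python
from __future__ import annotations


def pipeline(requests: list[tuple[str, str, dict[str, str]]],
             pipeline_id: int, first_cseq: int = 1) -> bytes:
    """Concatenate requests bound together by one Pipelined-Requests value."""
    out = []
    for offset, (method, uri, headers) in enumerate(requests):
        hdrs = {"CSeq": str(first_cseq + offset), "Pipelined-Requests": str(pipeline_id)}
        hdrs.update(headers)
        lines = [f"{method} {uri} RTSP/2.0"] + [f"{k}: {v}" for k, v in hdrs.items()]
        out.append("\r\n".join(lines) + "\r\n\r\n")
    # Everything is sent in one write, without waiting for any response.
    return "".join(out).encode("utf-8")


batch = pipeline(
    [
        ("SETUP", "rtsp://media.example.com/movie/audio",
         {"Transport": "RTP/AVP/TCP;unicast;interleaved=0-1"}),
        ("SETUP", "rtsp://media.example.com/movie/video",
         {"Transport": "RTP/AVP/TCP;unicast;interleaved=2-3"}),
        ("PLAY", "rtsp://media.example.com/movie/", {"Range": "npt=0-"}),
    ],
    pipeline_id=7654,
)
```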

   The SETUP response also provides additional information about the
   established session in several headers.  The Media-Properties header
   (Section 18.29) includes a number of properties that apply to the
   aggregate and that are valuable when performing media delivery
   control and configuring the user interface.  The Accept-Ranges
   header (Section 18.5) informs the client about range formats that the
   server supports for these media resources.  The Media-Range header
   (Section 18.30) informs the client about the time range of the media
   currently available.

2.3. Media Delivery Control

   After having established an RTSP session, the client can start
   controlling the media delivery.  The basic operations are "begin
   playback", using the PLAY method (Section 13.4), and "suspend (pause)
   playback", using the PAUSE method (Section 13.6).  PLAY also allows
   for choosing the starting media position from which the server should
   deliver the media.  The positioning is done by using the Range header
   (Section 18.40) that supports several different time formats: Normal
   Play Time (NPT) (Section 4.4.2), Society of Motion Picture and
   Television Engineers (SMPTE) Timestamps (Section 4.4.1), and absolute
   time (Section 4.4.3).  The Range header also allows the client to
   specify a position where delivery should end, thus allowing a
   specific interval to be delivered.

   The support for positioning/searching within media content depends
   on the content's media properties.  Content exists in a number of
   different types, such as on-demand, live, and live with simultaneous
   recording.  Even within these categories, there are differences in
   how the content is generated and distributed, which affect how it can
   be accessed for playback.  The properties applicable for the RTSP
   session are provided by the server in the SETUP response using the
   Media-Properties header (Section 18.29).  These are expressed using
   one or several independent attributes.  A first attribute is
   Random-Access, which indicates whether positioning is possible, and
   with what granularity.  Another aspect is whether the content will
   change during the lifetime of the session.  While on-demand content
   will be provided in full from the beginning, a live stream being
   recorded results in the length of the accessible content growing as
   the session goes on.  There also exists content that is dynamically
   built by a protocol other than RTSP and, thus, also changes in steps
   during the session, but not necessarily continuously.  Furthermore,
   when content is recorded, there are cases where the complete content
   is not maintained, but, for example, only the last hour.  All of
   these properties result in the need for the mechanisms discussed
   below.

   When the client accesses on-demand content that allows random access,
   the client can issue the PLAY request for any point in the content
   between the start and the end.  The server will deliver media from
   the closest random access point prior to the requested point and
   indicate that in its PLAY response.  If the client issues a PAUSE,
   the delivery will be halted and the point at which the server stopped
   will be reported back in the response.  The client can later resume
   by sending a PLAY request without a Range header.  When the server is
   about to complete the PLAY request by delivering the end of the
   content or the requested range, the server will send a PLAY_NOTIFY
   request (Section 13.5) indicating this.
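
   The server-side rule "deliver from the closest random access point
   prior to the requested point" amounts to a binary search over the
   sorted positions of independently decodable points.  The Python
   sketch below assumes positions in NPT seconds and a hypothetical
   list of access points every 2 seconds; the position it returns is
   what the server would report in its PLAY response.

```python
from __future__ import annotations

import bisect


def start_position(requested: float, access_points: list[float]) -> float:
    """Return the closest random access point at or before `requested` (seconds)."""
    # access_points: sorted ascending, first entry at the start of the content.
    i = bisect.bisect_right(access_points, requested)
    return access_points[max(i - 1, 0)]


# Hypothetical on-demand content with an independently decodable frame every 2 s.
RANDOM_ACCESS_POINTS = [0.0, 2.0, 4.0, 6.0, 8.0]
```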

   When playing live content with no extra functions, such as recording,
   the client will receive the live media from the server after having
   sent a PLAY request.  Seeking in such content is not possible as the
   server does not store it, but only forwards it from the source of the
   session.  Thus, delivery continues until the client sends a PAUSE
   request, tears down the session, or the content ends.

   For live sessions that are being recorded, the client will need to
   keep track of how the recording progresses.  Upon session
   establishment, the client will learn the current duration of the
   recording from the Media-Range header.  Because the recording is
   ongoing, the content grows in direct relation to the time passed.
   Therefore, each server's response to a PLAY request will contain the
   current Media-Range header.  The server should also regularly send
   (approximately every 5 minutes) the current media range in a
   PLAY_NOTIFY request (Section 13.5.2).  If the live transmission ends,
   the server must send a PLAY_NOTIFY request with the updated Media-
   Properties indicating that the content stopped being a recorded live
   session and instead became on-demand content; the request also
   contains the final media range.  While the live delivery continues,
   the client can request to play the current live point by using the
   NPT timescale symbol "now", or it can request a specific point in the
   available content by an explicit range request for that point.  If
   the requested point is outside of the available interval, the server
   will adjust the position to the closest available point, i.e., either
   at the beginning or the end.

   A special case of recording is that where the recording is not
   retained longer than a specific time period; thus, as the live
   delivery continues, the client can access any media within a moving
   window that covers, for example, "now" to "now" minus 1 hour.  A
   client that pauses at a specific point within the content and waits
   too long before resuming from that pause point may find that the
   content is no longer available.  In this case, the pause point will
   be adjusted to the closest point in the available media.
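
   Both recorded-live cases above reduce to clamping a requested (or
   paused) position into the currently available interval.  The Python
   sketch below assumes all positions are seconds on one common
   timeline; the function names and the one-hour retention value used
   in the checks are hypothetical.  With no retention limit, the window
   grows from the start of the recording to "now"; with a limit, it
   becomes a moving window.

```python
from __future__ import annotations


def available_window(now: float, recording_start: float,
                     retention: float | None) -> tuple[float, float]:
    """Return (earliest, latest) accessible position for a recorded live session."""
    if retention is None:
        return recording_start, now  # full recording kept: window grows with time
    return max(recording_start, now - retention), now  # moving window


def adjust_position(requested: float, window: tuple[float, float]) -> float:
    """Move a requested or paused position to the closest available point."""
    earliest, latest = window
    return min(max(requested, earliest), latest)
```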

2.4. Session Parameter Manipulations

   A session may have additional state or functionality that affects how
   the server or client treats the session or content, how the session
   functions, or that provides feedback on how well the session works.
   Such state and functionality are not defined in this specification
   but may be defined by extensions.  RTSP has two methods for
   retrieving and setting parameter values on either the client or the
   server: GET_PARAMETER (Section 13.8) and SET_PARAMETER (Section 13.9).
   These methods carry the parameters in a message body of the
   appropriate format.  One can also use headers to query state with the
   GET_PARAMETER method.  As an example, clients needing to know the
   current media range for a time-progressing session can use the
   GET_PARAMETER method and include the Media-Range header
   (Section 18.30).  Furthermore, synchronization information can be
   requested by using a combination of RTP-Info (Section 18.45) and
   Range (Section 18.40).

   RTSP 2.0 does not have a strong mechanism for negotiating the headers
   or parameters and their formats.  However, responses will indicate
   request-headers or parameters that are not supported.  A priori
   determination of what features are available needs to be done through
   out-of-band mechanisms, like the session description, or through the
   usage of feature tags (Section 4.5).

2.5. Media Delivery

   This document specifies how media is delivered with RTP [RFC3550]
   over UDP [RFC768], TCP [RFC793], or the RTSP connection.  Additional
   protocols may be specified in the future as needed.

   The usage of RTP as a media delivery protocol requires some
   additional information to function well.  The PLAY response contains
   information that tells the client how to synchronize the different
   sources in the different RTP sessions.  It also provides a mapping
   between RTP timestamps and the content timescale.  When the server
   wants to notify the client about the completion of the media
   delivery, it sends a PLAY_NOTIFY request to the client.  The
   PLAY_NOTIFY request includes information about the stream end,
   including the last RTP sequence number for each stream, thus
   enabling the client to empty the buffer smoothly.
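
   The timestamp-to-content-time mapping can be sketched as follows,
   assuming the client knows one reference pair per stream (an RTP
   timestamp and the content position it corresponds to, e.g., derived
   from RTP-Info and Range in the PLAY response), the stream's RTP clock
   rate, and Scale = 1.  The function and parameter names are
   hypothetical, and 90000 Hz is used in the checks only as a typical
   video clock.  RTP timestamps are 32-bit and wrap, so the difference
   is taken modulo 2^32 and read as a signed value.

```python
from __future__ import annotations

RTP_TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit values that wrap around


def rtp_to_content_time(rtp_ts: int, ref_rtp_ts: int,
                        ref_position: float, clock_rate: int) -> float:
    """Map an RTP timestamp to a content position (seconds), assuming Scale = 1."""
    # Distance from the reference timestamp, computed modulo 2^32 and then
    # interpreted as signed so that timestamps just before the reference work too.
    delta = (rtp_ts - ref_rtp_ts) % RTP_TS_MODULUS
    if delta >= RTP_TS_MODULUS // 2:
        delta -= RTP_TS_MODULUS
    return ref_position + delta / clock_rate
```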

2.5.1. Media Delivery Manipulations

   The basic playback functionality of RTSP enables delivery of a range
   of requested content to the client at the pace intended by the
   content's creator.  However, RTSP can also manipulate the delivery to
   the client in two ways:

   Scale:  The ratio of media-content time delivered per unit of
      playback time.

   Speed:  The ratio of playback time delivered per unit of wallclock
      time.

   Both affect the media delivery per time unit.  However, they
   manipulate two independent timescales, and their effects can be
   combined.

   Scale (Section 18.46) is used for fast-forward or slow-motion control
   as it changes the amount of content timescale that should be played
   back per time unit.  Scale > 1.0 means fast forward; e.g., Scale =
   2.0 results in 2 seconds of content being played back for every
   second of playback.  Scale = 1.0 is the default value that is used if
   no scale is specified, i.e., playback at the content's original rate.
   Scale values between 0 and 1.0 provide for slow motion.  Scale can be
   negative to allow for reverse playback in either regular pace
   (scale = -1.0), fast backwards (scale < -1.0), or slow-motion
   backwards (-1.0 < scale < 0).  Scale = 0 would be equal to pause and
   is not allowed.

   In most cases, the realization of scale means server-side
   manipulation of the media to ensure that the client can actually play
   it back.  The nature of these media manipulations and when they are
   needed is highly media-type dependent.  Let's consider two common
   media types, audio and video.

   It is very difficult to modify the playback rate of audio.
   Typically, no more than a factor of two is possible while maintaining
   intelligibility by changing the pitch and rate of speech.  Music goes
   out of tune if one tries to manipulate the playback rate by
   resampling it.  This is a well-known problem, and audio is commonly
   muted or played back in short segments with skips to keep up with the
   current playback point.

   For video, it is possible to manipulate the frame rate, although the
   rendering capabilities are often limited to certain frame rates.
   Also, the allowed bitrates in decoding, the structure used in the
   encoding, and the dependency between frames and other capabilities of
   the rendering device limit the possible manipulations.  Therefore,
   the basic fast-forward capabilities often are implemented by
   selecting certain subsets of frames.

   Due to the media restrictions, the possible scale values are commonly
   restricted to the set of realizable scale ratios.  To enable the
   clients to select from the possible scale values, RTSP can signal the
   supported scale ratios for the content.  To support aggregated or
   dynamic content, where this may change during the ongoing session and
   dependent on the location within the content, a mechanism for
   updating the media properties and the scale factor currently in use,
   exists.
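
   Once the supported scale ratios are known, a client can pick the
   realizable value nearest to what the user asked for.  The Python
   sketch below assumes the supported ratios are available as a plain
   list of discrete values (the actual signaling may also express
   ranges, which this sketch does not handle).  Preferring a value with
   the same sign, so that fast forward never turns into rewind, is a
   design choice of this sketch, not a protocol rule; the list of scales
   is hypothetical.

```python
from __future__ import annotations


def closest_scale(requested: float, supported: list[float]) -> float:
    """Pick the realizable scale nearest to `requested`, preferring the same direction."""
    if requested == 0:
        raise ValueError("Scale = 0 is not allowed")
    # Prefer values with the same sign so that fast forward never becomes rewind.
    same_direction = [s for s in supported if (s > 0) == (requested > 0)]
    candidates = same_direction or supported
    # Ties are resolved in favor of the value listed first.
    return min(candidates, key=lambda s: abs(s - requested))


# Hypothetical set of scale ratios the server can realize for some content.
SUPPORTED_SCALES = [-4.0, -2.0, -1.0, 0.5, 1.0, 2.0, 4.0, 8.0]
```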

   Speed (Section 18.50) affects how much of the playback timeline is
   delivered in a given wallclock period.  The default is Speed = 1,
   which means delivering media at the same rate at which it is
   consumed.
   Speed > 1 means that the receiver will get content faster than it
   regularly would consume it.  Speed < 1 means that delivery is slower
   than the regular media rate.  Speed values of 0 or lower have no
   meaning and are not allowed.  This mechanism enables two general
   functionalities.  One is client-side scale operations, i.e., the
   client receives all the frames and makes the adjustment to the
   playback locally.  The second is delivery control for the buffering
   of media.  By specifying a speed over 1.0, the client can build up
   the amount of playback time it has present in its buffers to a level
   that is sufficient for its needs.
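
   Because Scale is content time per playback time and Speed is playback
   time per wallclock time, the content time covered in a wallclock
   interval is their product times that interval.  The Python sketch
   below expresses this, together with the value restrictions stated
   above (Scale must not be 0, Speed must be greater than 0).  The
   helper names are hypothetical.

```python
from __future__ import annotations


def validate(scale: float = 1.0, speed: float = 1.0) -> None:
    """Reject the values this section describes as not allowed."""
    if scale == 0:
        raise ValueError("Scale = 0 would equal pause and is not allowed")
    if speed <= 0:
        raise ValueError("Speed values of 0 or lower are not allowed")


def content_delivered(wallclock_s: float, scale: float = 1.0, speed: float = 1.0) -> float:
    """Content-time seconds covered in `wallclock_s` seconds of wallclock time."""
    validate(scale, speed)
    playback_s = speed * wallclock_s  # Speed: playback time per wallclock time
    return scale * playback_s         # Scale: content time per playback time
```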

   A naive implementation of Speed would only affect the transmission
   schedule of the media and would have a clear impact on the needed
   bandwidth, resulting in a data rate proportional to the speed
   factor.  Speed = 1.5, i.e., 50% faster than normal delivery, would
   result in a 50% increase in the data-transport rate.  Whether or not
   that can be supported depends solely on the underlying network path.
   Scale may also have some impact on the required bandwidth due to the
   manipulation of the content in the new playback schedule.  An example
   is fast forward where only the independently decodable intra-frames
   are included in the media stream.  This usage of solely intra-frames
   increases the data rate significantly compared to a normal sequence
   with the same number of frames, where most frames are encoded using
   prediction.

   This potential increase of the data rate needs to be handled by the
   media sender.  The client has requested that the media be delivered
   in a specific way, which should be honored.  However, the media
   sender cannot ignore the case where the network path between the
   sender and the receiver cannot handle the resulting media stream.  In
   that case, the media stream needs to be adapted to fit the available
   resources of the path.  This can result in reduced media quality.

   The need for bitrate adaptation becomes especially problematic in
   connection with the Speed semantics.  If the goal is to fill up the
   buffer, the client may not want to do that at the cost of reduced
   quality.  If the client wants to make local playout changes, then it
   may actually require that the requested speed be honored.  To resolve
   this issue, Speed uses a range so that both cases can be supported.
   The server is requested to use the highest possible speed value
   within the range that is compatible with the available bandwidth.
   As long as the server can maintain a speed value within the range, it
   shall not change the media quality, but instead modify the actual
   delivery rate in response to available bandwidth and reflect this in
   the Speed value in the response.  However, if this is not possible,
   the server should instead modify the media quality to respect the
   lowest speed value and the available bandwidth.
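
   The server behavior for a Speed range can be sketched with the naive
   bandwidth model described above, in which the required data rate is
   the nominal media bitrate multiplied by Speed (so Speed = 1.5 needs
   1.5 times the bitrate).  The sketch ignores Scale effects, packet
   overhead, and bandwidth estimation, and its function and parameter
   names are hypothetical.  If some speed within the range fits the
   path, the highest such speed is used at unchanged quality; otherwise
   the lower bound is honored and the media bitrate is reduced instead.

```python
from __future__ import annotations


def choose_delivery(speed_lo: float, speed_hi: float,
                    media_bps: float, path_bps: float) -> tuple[float, float]:
    """Return (speed, media_bitrate) following the Speed-range rules sketched above."""
    # Naive model: the data rate needed is proportional to the speed factor.
    max_speed = path_bps / media_bps
    if max_speed >= speed_lo:
        # A speed within the range fits: keep media quality, use the highest such speed.
        return min(speed_hi, max_speed), media_bps
    # Even the lowest speed does not fit: honor the lower bound and adapt quality instead.
    return speed_lo, path_bps / speed_lo
```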

   This functionality enables the local scaling implementation to use a
   tight range, or even a range where the lower bound equals the upper
   bound, to identify that it requires the server to deliver the
   requested amount of media time per delivery time, independent of how
   much it needs to adapt the media quality to fit within the available
   path bandwidth.  For buffer filling, it is suitable to use a range
   with a reasonable span and with a lower bound at the nominal media
   rate 1.0, such as 1.0 - 2.5.  If the client wants to reduce the
   buffer, it can specify an upper bound that is below 1.0 to force the
   server to deliver slower than the nominal media rate.

2.6. Session Maintenance and Termination

   The session context that has been established is kept alive by
   having the client show liveness.  This is done in two main ways:

   o  Media-transport protocol keep-alive.  RTP Control Protocol (RTCP)
      may be used when using RTP.

   o  Any RTSP request referencing the session context.

   Section 10.5 discusses the methods for showing liveness in more
   depth.  If the client fails to show liveness for more than the
   established session timeout value (normally 60 seconds), the server
   may terminate the context.  Other values may be selected by the
   server through the inclusion of the timeout parameter in the Session
   header.

   The session context is normally terminated by the client sending a
   TEARDOWN request (Section 13.7) to the server referencing the
   aggregate control URI.  An individual media resource can be removed
   from a session context by a TEARDOWN request referencing that
   particular media resource.  If all media resources are removed from a
   session context, the session context is terminated.

   A client may keep the session alive indefinitely if allowed by the
   server; however, a client is advised to release the session context
   when an extended period of time without media delivery activity has
   passed.  The client can re-establish the session context if required
   later.  What constitutes an extended period of time is dependent on
   the client, server, and their usage.  It is recommended that the
   client terminate the session before ten times the session timeout
   value has passed.  A server may terminate the session after one
   session timeout period without any client activity beyond keep-alive.
   When a server terminates the session context, it does so by sending a
   TEARDOWN request indicating the reason.
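
   Server-side liveness tracking can be sketched as remembering when the
   session last showed liveness and comparing the elapsed time against
   the session timeout.  The class below is hypothetical; the injectable
   clock exists only to make the sketch testable, and deciding which
   events count as liveness (RTSP requests naming the session, RTCP
   packets) is left to the caller.

```python
from __future__ import annotations

import time
from typing import Callable

DEFAULT_SESSION_TIMEOUT = 60.0  # seconds; the normal value mentioned above


class SessionLiveness:
    """Hypothetical server-side tracker of one session's liveness."""

    def __init__(self, timeout: float = DEFAULT_SESSION_TIMEOUT,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_seen = clock()

    def touch(self) -> None:
        """Record liveness, e.g., an RTSP request naming the session or an RTCP packet."""
        self.last_seen = self.clock()

    def expired(self) -> bool:
        """True once liveness has not been shown for more than the session timeout."""
        return self.clock() - self.last_seen > self.timeout
```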

   A server can also request that the client tear down the session and
   re-establish it at an alternative server, as may be needed for
   maintenance.  This is done by using the REDIRECT method
   (Section 13.10).  The Terminate-Reason header (Section 18.52) is used
   to indicate when and why.  The Location header indicates where it
   should connect if there is an alternative server available.  When the
   deadline expires, the server simply stops providing the service.  To
   achieve a clean closure, the client needs to initiate session
   termination prior to the deadline.  In case the server has no other
   server to redirect to, and it wants to close the session for
   maintenance, it shall use the TEARDOWN method with a Terminate-Reason
   header.

2.7. Extending RTSP

   RTSP is quite a versatile protocol that supports extensions in many
   different directions.  Even this core specification contains several
   blocks of functionality that are optional to implement.  The use case
   and need for the protocol deployment should determine what parts are
   implemented.  Allowing for extensions makes it possible for RTSP to
   address additional use cases.  However, extensions will affect the
   interoperability of the protocol; therefore, it is important that
   they can be added in a structured way.

   The client can learn the capability of a server by using the OPTIONS
   method (Section 13.1) and the Supported header (Section 18.51).  It
   can also try new methods and possibly fail, or require that
   particular features be supported by using the Require (Section 18.43)
   or Proxy-Require (Section 18.37) header.

   RTSP itself can be extended in three ways, listed here in increasing
   order of the magnitude of changes supported:

   o  Existing methods can be extended with new parameters, for example,
      headers, as long as these parameters can be safely ignored by the
      recipient.  If the client needs negative acknowledgment when a
      method extension is not supported, a tag corresponding to the
      extension may be added in the field of the Require or
      Proxy-Require headers.

   o  New methods can be added.  If the recipient of the message does
      not understand the request, it must respond with error code 501
      (Not Implemented) so that the sender can avoid using this method
      again.  A client may also use the OPTIONS method to inquire about
      methods supported by the server.  The server must list the methods
      it supports using the Public response-header.

   o  A new version of the protocol can be defined, allowing almost all
      aspects (except the position of the protocol version number) to
      change.  A new version of the protocol must be registered through
      a Standards Track document.
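
   The method-extension rules above can be sketched from both sides: a
   server answers unknown methods with 501 (Not Implemented) and lists
   its methods in the Public header of an OPTIONS response, and a client
   parses that header.  The set of implemented methods, the method name
   X_HYPOTHETICAL used in the checks, and the placeholder 200 OK for
   implemented methods are all hypothetical simplifications.

```python
from __future__ import annotations

# Hypothetical server that implements only this subset of methods.
IMPLEMENTED = ("OPTIONS", "SETUP", "PLAY", "PAUSE", "TEARDOWN")


def respond(method: str, cseq: str) -> str:
    """Build a minimal response: Public for OPTIONS, 501 for unknown methods."""
    if method not in IMPLEMENTED:
        status, headers = "501 Not Implemented", {"CSeq": cseq}
    elif method == "OPTIONS":
        status, headers = "200 OK", {"CSeq": cseq, "Public": ", ".join(IMPLEMENTED)}
    else:
        # Real processing of the implemented methods is outside this sketch.
        status, headers = "200 OK", {"CSeq": cseq}
    lines = [f"RTSP/2.0 {status}"] + [f"{k}: {v}" for k, v in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def public_methods(public_value: str) -> set[str]:
    """Client side: parse a Public header value into a set of method names."""
    return {m.strip() for m in public_value.split(",") if m.strip()}
```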

   The basic capability discovery mechanism can be used to both discover
   support for a certain feature and to ensure that a feature is
   available when performing a request.  For a detailed explanation of
   this, see Section 11.

   New media delivery protocols may be added and negotiated at session
   establishment, in addition to extensions to the core protocol.
   Certain types of protocol manipulations can be done through parameter
   formats using SET_PARAMETER and GET_PARAMETER.

3. Document Conventions

3.1. Notational Conventions

   All the mechanisms specified in this document are described in both
   prose and the Augmented Backus-Naur form (ABNF) described in detail
   in [RFC5234].

   Indented paragraphs are used to provide informative background and
   motivation.  This is intended to give readers who were not involved
   with the formulation of the specification an understanding of why
   things are the way they are in RTSP.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

   The word "unspecified" is used to indicate functionality or features
   that are not defined in this specification.  Such functionality
   cannot be used in a standardized manner without further definition in
   an extension specification to RTSP.

3.2. Terminology

   Aggregate control:  The concept of controlling multiple streams using
      a single timeline, generally one maintained by the server.  A
      client, for example, uses aggregate control when it issues a
      single play or pause message to simultaneously control both the
      audio and video in a movie.  A session that is under aggregate
      control is referred to as an "aggregated session".

   Aggregate control URI:  The URI used in an RTSP request to refer to
      and control an aggregated session.  It normally, but not always,
      corresponds to the presentation URI specified in the session
      description.  See Section 13.3 for more information.

   Client:  The client is the requester of media service from the media
      server.

   Connection:  A transport-layer virtual circuit established between
      two programs for the purpose of communication.

   Container file:  A file that may contain multiple media streams that
      often constitute a presentation when played together.  The concept
      of a container file is not embedded in the protocol.  However,
      RTSP servers may offer aggregate control on the media streams
      within these files.

   Continuous media:  Data where there is a timing relationship between
      source and sink; that is, the sink needs to reproduce the timing
      relationship that existed at the source.  The most common examples
      of continuous media are audio and motion video.  Continuous media
      can be real time (interactive or conversational), where there is a
      "tight" timing relationship between source and sink or it can be
      streaming where the relationship is less strict.

   Feature tag:  A tag representing a certain set of functionality,
      i.e., a feature.

   IRI:  An Internationalized Resource Identifier is similar to a URI
      but allows characters from the whole Universal Character Set
      (Unicode/ISO 10646) rather than only US-ASCII.  See [RFC3987]
      for more information.

   Live:  A live presentation or session originates media from an event
      taking place at the same time as the media delivery.  Live
      sessions often have an unbounded or only loosely defined duration
      and seek operations may not be possible.

   Media initialization:  The datatype- or codec-specific
      initialization.  This includes such things as clock rates, color
      tables, etc.  Any transport-independent information that is
      required by a client for playback of a media stream occurs in the
      media initialization phase of stream setup.

   Media parameter:  A parameter specific to a media type that may be
      changed before or during stream delivery.

   Media server:  The server providing media-delivery services for one
      or more media streams.  Different media streams within a
      presentation may originate from different media servers.  A media
      server may reside on the same host or on a different host from
      which the presentation is invoked.

   (Media) Stream:  A single media instance, e.g., an audio stream or a
      video stream as well as a single whiteboard or shared application
      group.  When using RTP, a stream consists of all RTP and RTCP
      packets created by a media source within an RTP session.

   Message:  The basic unit of RTSP communication, consisting of a
      structured sequence of octets matching the syntax defined in
      Section 20 and transmitted over a transport between RTSP agents.
      A message is either a request or a response.

   Message body:  The information transferred as the payload of a
      message (request or response).  A message body consists of meta-
      information in the form of message body headers and content in the
      form of an arbitrary number of data octets, as described in
      Section 9.

   Non-aggregated control:  Control of a single media stream.

   Presentation:  A set of one or more streams presented to the client
      as a complete media feed and described by a presentation
      description as defined below.  Presentations with more than one
      media stream are often handled in RTSP under aggregate control.

   Presentation description:  A presentation description contains
      information about one or more media streams within a presentation,
      such as the set of encodings, network addresses, and information
      about the content.  Other IETF protocols, such as SDP ([RFC4566]),
      use the term "session" for a presentation.  The presentation
      description may take several different formats, including but not
      limited to SDP format.

   Response:  An RTSP response to a request.  One type of RTSP message.
      If an HTTP response is meant, it is indicated explicitly.

   Request:  An RTSP request.  One type of RTSP message.  If an HTTP
      request is meant, it is indicated explicitly.

   Request-URI:  The URI used in a request to indicate the resource on
      which the request is to be performed.

   RTSP agent:  Either an RTSP client, an RTSP server, or an RTSP proxy.
      In this specification, there are many capabilities that are common
      to these three entities such as the capability to send requests or
      receive responses.  This term will be used when describing
      functionality that is applicable to all three of these entities.

   RTSP session:  A stateful abstraction upon which the main control
      methods of RTSP operate.  An RTSP session is a common context; it
      is created and maintained on a client's request and can be
      destroyed by either the client or server.  It is established by an
      RTSP server upon the completion of a successful SETUP request
      (when a 200 OK response is sent) and is labeled with a session
      identifier at that time.  The session exists until timed out by
      the server or explicitly removed by a TEARDOWN request.  An RTSP
      session is a stateful entity; an RTSP server maintains an explicit
      session state machine (see Appendix B) where most state
      transitions are triggered by client requests.  The existence of a
      session implies the existence of state about the session's media
      streams and their respective transport mechanisms.  A given
      session can have one or more media streams associated with it.  An
      RTSP server uses the session to aggregate control over multiple
      media streams.

   Origin server:  The server on which a given resource resides.

   Seeking:  Requesting playback from a particular point in the content
      time line.

   Transport initialization:  The negotiation of transport information
      (e.g., port numbers, transport protocols) between the client and
      the server.

   URI:  A Uniform Resource Identifier; see [RFC3986].  The URIs used
      in RTSP are generally URLs as they give a location for the
      resource.  As URLs are a subset of URIs, they will be referred to
      as URIs to also cover the cases when an RTSP URI would not be a
      URL.

   URL:  A Uniform Resource Locator is a URI that identifies the
      resource through its primary access mechanism rather than
      identifying the resource by name or by some other attribute(s) of
      that resource.

4. Protocol Parameters

4.1. RTSP Version

   This specification defines version 2.0 of RTSP.

   RTSP uses a "<major>.<minor>" numbering scheme to indicate versions
   of the protocol.  The protocol versioning policy is intended to allow
   the sender to indicate the format of a message and its capacity for
   understanding further RTSP communication rather than the features
   obtained via that communication.  No change is made to the version
   number for the addition of message components that do not affect
   communication behavior or that only add to extensible field values.

   The <minor> number is incremented when the changes made to the
   protocol add features that do not change the general message parsing
   algorithm but that may add to the message semantics and imply
   additional capabilities of the sender.  The <major> number is
   incremented when the format of a message within the protocol is
   changed.

   The version of an RTSP message is indicated by an RTSP-Version field
   in the first line of the message.  Note that the major and minor
   numbers MUST be treated as separate integers and that each MAY be
   incremented higher than a single digit.  Thus, RTSP/2.4 is a lower
   version than RTSP/2.13, which, in turn, is lower than RTSP/12.3.
   Leading zeros SHALL NOT be sent and MUST be ignored by recipients.
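
   Because the major and minor numbers are separate integers, comparing
   version tokens as plain strings gives the wrong order ("RTSP/2.13"
   sorts before "RTSP/2.4").  The Python sketch below, which is
   illustrative and not part of this specification, parses an
   RTSP-Version token into an integer pair so that tuple comparison
   yields the ordering described above.  The function name
   parse_rtsp_version and the exact token pattern are assumptions.

```python
import re

# RTSP-Version token such as "RTSP/2.0".  This pattern is an assumption
# derived from the "<major>.<minor>" description above.
_RTSP_VERSION = re.compile(r"RTSP/([0-9]+)\.([0-9]+)")


def parse_rtsp_version(token: str) -> tuple:
    """Return (major, minor) as integers so that versions compare numerically."""
    match = _RTSP_VERSION.fullmatch(token)
    if match is None:
        raise ValueError(f"not an RTSP version token: {token!r}")
    # int() discards leading zeros, so a received "RTSP/02.0" reads as (2, 0).
    return int(match.group(1)), int(match.group(2))
```

   With this sketch, parse_rtsp_version("RTSP/2.4") is less than
   parse_rtsp_version("RTSP/2.13"), which is less than
   parse_rtsp_version("RTSP/12.3").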

4.2. RTSP IRI and URI

   RTSP 2.0 defines and registers or updates three URI schemes: "rtsp",
   "rtsps", and "rtspu".  The usage of the last, "rtspu", is unspecified
   in RTSP 2.0, and it is defined here to register the URI scheme that
   was defined in RTSP 1.0.  The "rtspu" scheme indicates unspecified
   transport of the RTSP messages over unreliable transport means (UDP
   in RTSP 1.0).  An RTSP server MUST respond with an error code
   indicating that the "rtspu" scheme is not implemented (501) to a
   request that carries an "rtspu" URI scheme.

   The details of the syntax of "rtsp" and "rtsps" URIs have been
   changed from RTSP 1.0.  These changes include the addition of:

   o  Support for an IPv6 literal in the host part and future IP
      literals through a mechanism defined in [RFC3986].

   o  A new relative format to use in the RTSP elements that is not
      required to start with "/".

   Neither should have any significant impact on interoperability.  If
   IPv6 literals are needed in the RTSP URI, then that RTSP server must
   be IPv6 capable, and RTSP 1.0 is not a fully IPv6 capable protocol.
   If an RTSP 1.0 client attempts to process the URI, the URI will not
   match the allowed syntax, it will be considered invalid, and
   processing will be stopped.  This is clearly a failure to reach the
   resource; however, it is not a significant issue as RTSP 2.0
   support was needed anyway in both server and client.  Thus, failure
   will only occur in a later step when there is an RTSP version
   mismatch between client and server.  The second change will only
   occur inside RTSP message headers, as the Request-URI must be an
   absolute URI.  Thus, such usages will only occur after an agent has
   accepted and started processing RTSP 2.0 messages, and an agent using
   RTSP 1.0 only will not be required to parse such types of relative
   URIs.

   This specification also defines the format of RTSP IRIs [RFC3987]
   that can be used as RTSP resource identifiers and locators on web
   pages, user interfaces, on paper, etc.  However, the RTSP request
   message format only allows usage of the absolute URI format.  The
   RTSP IRI format MUST use the rules and transformation for IRIs to
   URIs, as defined in [RFC3987].  This allows a URI that matches the
   RTSP 2.0 specification, and so is suitable for use in a request, to
   be created from an RTSP IRI.

   The RTSP IRI and URI are both syntax restricted compared to the
   generic syntax defined in [RFC3986] and [RFC3987]:

   o  An absolute URI requires the authority part; i.e., a host identity
      MUST be provided.

   o  Parameters in the path element are prefixed with the reserved
      separator ";".

   The "scheme" and "host" parts of all URIs [RFC3986] and IRIs
   [RFC3987] are case insensitive.  All other parts of RTSP URIs and
   IRIs are case sensitive, and they MUST NOT be case mapped.
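
   As an illustration of these case rules, the following minimal Python
   sketch lowercases only the scheme and host of an RTSP URI and leaves
   every other component untouched, so that two URIs can be compared
   after normalization.  The helper name normalize_case is hypothetical,
   and percent-encoding normalization is deliberately not attempted.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(uri: str) -> str:
    """Lowercase only the scheme and host of an RTSP URI; keep the rest as-is."""
    parts = urlsplit(uri)  # urlsplit already lowercases the scheme
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if hostport.startswith("["):
        # IPv6 literal such as "[2001:DB8::1]:554": the host ends at "]".
        end = hostport.index("]") + 1
        host, rest = hostport[:end], hostport[end:]
    else:
        host, colon, port = hostport.partition(":")
        rest = colon + port
    netloc = userinfo + at + host.lower() + rest
    # Path, query, and fragment are case sensitive and are not case mapped.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

   For example, "RTSP://Media.Example.COM:554/Twister" normalizes to
   "rtsp://media.example.com:554/Twister", while a URI with the path
   "/twister" still names a different resource.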

   The fragment identifier is used as defined in Sections 3.5 and 4.3 of
   [RFC3986], i.e., the fragment is to be stripped from the IRI by the
   requester and not included in the Request-URI.  The user agent needs
   to interpret the value of the fragment based on the media type the
   request relates to, i.e., the media type indicated in the Content-Type
   header in the response to a DESCRIBE request.
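
   A client preparing a request from an RTSP IRI thus performs two
   steps: it maps the IRI to a URI and removes the fragment before the
   result is used as the Request-URI.  The Python sketch below is a
   simplified illustration under stated assumptions: the host name is
   already ASCII (internationalized host names would need IDNA
   processing, which is omitted), and only non-ASCII characters are
   percent-encoded as UTF-8, which is the main step of the RFC 3987
   IRI-to-URI mapping.  The function name is hypothetical.

```python
from urllib.parse import quote, urldefrag


def iri_to_request_uri(iri: str) -> tuple:
    """Map an RTSP IRI to (Request-URI, fragment).

    Assumes an ASCII host.  Non-ASCII characters are percent-encoded as
    UTF-8; the fragment is split off so it is never sent to the server,
    but it is returned so the client can interpret it locally.
    """
    ascii_form = "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in iri)
    request_uri, fragment = urldefrag(ascii_form)
    return request_uri, fragment
```

   The fragment is returned rather than discarded because the client
   interprets it according to the media type of the DESCRIBE response.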

   The syntax of any URI query string is unspecified and responder
   (usually the server) specific.  The query is, from the requester's
   perspective, an opaque string and needs to be handled as such.

   Please note that relative URIs with queries are difficult to handle
   due to the relative URI handling rules of RFC 3986.  Any change of
   the path element using a relative URI results in the stripping of the
   query, which means the relative part needs to contain the query.
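
   Python's standard urllib.parse.urljoin, which resolves relative
   references following RFC 3986, shows the effect.  The base URI and
   its "quality=hd" query are hypothetical, since the query syntax is
   server specific.

```python
from urllib.parse import urljoin

base = "rtsp://media.example.com/twister/audiotrack?quality=hd"

# A relative reference that changes the path drops the base query ...
print(urljoin(base, "videotrack"))
# -> rtsp://media.example.com/twister/videotrack

# ... so the relative reference must repeat the query itself.
print(urljoin(base, "videotrack?quality=hd"))
# -> rtsp://media.example.com/twister/videotrack?quality=hd
```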

   The URI scheme "rtsp" requires that commands be issued via a reliable
   protocol (within the Internet, TCP), while the scheme "rtsps"
   identifies a reliable transport using secure transport (TLS
   [RFC5246]); see Section 19.

   For the scheme "rtsp", if no port number is provided in the authority
   part of the URI, the port number 554 MUST be used.  For the scheme
   "rtsps", if no port number is provided in the authority part of the
   URI, the TCP port number 322 MUST be used.
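
   The scheme and default-port rules above can be combined into a small
   lookup.  The following Python sketch returns the host, the port, and
   whether TLS is needed for an absolute "rtsp" or "rtsps" URI.  The
   function name transport_target is hypothetical, and refusing "rtspu"
   on the client side is an assumption of this sketch (the text only
   requires servers to answer such requests with 501).

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Default ports from the text above: 554 for "rtsp", TCP 322 for "rtsps".
DEFAULT_PORTS = {"rtsp": 554, "rtsps": 322}


def transport_target(uri: str) -> tuple[str, int, bool]:
    """Return (host, port, use_tls) for an absolute "rtsp" or "rtsps" URI."""
    parts = urlsplit(uri)  # the scheme comes back lowercased
    if parts.scheme not in DEFAULT_PORTS:
        # Also rejects "rtspu", whose usage RTSP 2.0 leaves unspecified.
        raise ValueError(f"unsupported RTSP URI scheme: {parts.scheme!r}")
    if not parts.hostname:
        # An absolute RTSP URI must carry the authority part.
        raise ValueError("an absolute RTSP URI must include a host")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port, parts.scheme == "rtsps"
```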

   A presentation or a stream is identified by a textual media
   identifier, using the character set and escape conventions of URIs
   [RFC3986].  URIs may refer to a stream or an aggregate of streams;
   i.e., a presentation.  Accordingly, requests described in Section 13
   can apply to either the whole presentation or an individual stream
   within the presentation.  Note that some request methods can only be
   applied to streams, not presentations, and vice versa.

   For example, the RTSP URI:

      rtsp://media.example.com:554/twister/audiotrack

   may identify the audio stream within the presentation "twister",
   which can be controlled via RTSP requests issued over a TCP
   connection to port 554 of host media.example.com.

   Also, the RTSP URI:

      rtsp://media.example.com:554/twister

   identifies the presentation "twister", which may be composed of audio
   and video streams, but could also be something else, such as a random
   media redirector.

      This does not imply a standard way to reference streams in URIs.
      The presentation description defines the hierarchical
      relationships in the presentation and the URIs for the individual
      streams.  A presentation description may name a stream "a.mov" and
      the whole presentation "b.mov".

   The path components of the RTSP URI are opaque to the client and do
   not imply any particular file system structure for the server.

      This decoupling also allows presentation descriptions to be used
      with non-RTSP media control protocols simply by replacing the
      scheme in the URI.

4.3. Session Identifiers

   Session identifiers are strings of a length between 8 and 128
   characters.  A session identifier MUST be generated using methods
   that make it cryptographically random (see [RFC4086]).  It is
   RECOMMENDED that a session identifier contain 128 bits of entropy,
   i.e., approximately 22 characters from a high-quality generator (see
   Section 21).  However, note that the session identifier does not
   provide any security against session hijacking unless it is kept
   confidential by the client, server, and trusted proxies.
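
   One straightforward way to meet these requirements in Python is the
   standard secrets module, which draws from the operating system's
   cryptographically secure generator.  In the sketch below, 16 random
   bytes (128 bits) encoded as unpadded URL-safe base64 yield exactly 22
   characters.  The function name is hypothetical, and whether the
   base64url alphabet (letters, digits, "-", "_") fits the session-id
   grammar in Section 20 is an assumption worth checking there.

```python
import secrets


def new_session_id() -> str:
    """Generate an RTSP session identifier carrying 128 bits of entropy.

    16 random bytes encoded as URL-safe base64 without padding give a
    22-character string, inside the 8-128 character bound above.
    """
    return secrets.token_urlsafe(16)
```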

4.4. Media-Time Formats

   RTSP currently supports three different media-time formats, defined
   below.  Additional time formats may be specified in the future.
   These time formats can be used with the Range header (Section 18.40)
   to request playback and to specify at which media position protocol
   requests will take place or have actually taken place.  They are also
   used to describe the media's properties in the Media-Range header
   (Section 18.30).  The unqualified format identifier is used on its
   own in the Accept-Ranges header (Section 18.5) to declare supported
   time formats and also in the Range header (Section 18.40) to request
   the time format used in the response.

4.4.1. SMPTE-Relative Timestamps

   A timestamp may use a format derived from a Society of Motion Picture
   and Television Engineers (SMPTE) specification and expresses time
   offsets anchored at the start of the media clip.  Relative timestamps
   are expressed as SMPTE time codes [SMPTE-TC] for frame-level access
   accuracy.  The time code has the format:

      hours:minutes:seconds:frames.subframes

   with the origin at the start of the clip.  The default SMPTE format
   is "SMPTE 30 drop" format, with a frame rate of 29.97 frames per
   second.  Other SMPTE codes MAY be supported (such as "SMPTE 25")
   through the use of "smpte-type".  For SMPTE 30, the "frames" field in
   the time value can assume the values 0 through 29.  The difference
   between 30 and 29.97 frames per second is handled by dropping the
   first two frame indices (values 00 and 01) of every minute, except
   every tenth minute.  If the frame and the subframe values are zero,
   they may be omitted.  Subframes are measured in hundredths of a
   frame.

   Examples:

     smpte=10:12:33:20-
     smpte=10:07:33-
     smpte=10:07:00-10:07:33:05.01
     smpte-25=10:07:00-10:07:33:05.01
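
   To make the drop-frame rule concrete, the Python sketch below
   converts an "SMPTE 30 drop" time code into a frame count from the
   start of the clip.  It handles only the default SMPTE 30 drop format,
   and the function name is hypothetical.  One behavior is an assumption
   of this sketch rather than something this section specifies: a
   dropped index such as "00:01:00:00" (or the omitted-frame form
   "10:07:00" in the examples above) is mapped to the first frame that
   exists in that minute.

```python
def smpte30_drop_to_frames(timecode: str) -> float:
    """Frame count from clip start for an "SMPTE 30 drop" time code.

    Accepts "hh:mm:ss", "hh:mm:ss:ff", or "hh:mm:ss:ff.sub"; omitted
    frames and subframes are zero, and subframes are hundredths of a frame.
    """
    fields = timecode.split(":")
    if len(fields) not in (3, 4):
        raise ValueError(f"malformed time code: {timecode!r}")
    hours, minutes, seconds = (int(f) for f in fields[:3])
    frames, subframes = 0, 0
    if len(fields) == 4:
        frame_text, _, sub_text = fields[3].partition(".")
        frames = int(frame_text)
        subframes = int(sub_text) if sub_text else 0
    if minutes > 59 or seconds > 59 or frames > 29 or subframes > 99:
        raise ValueError(f"field out of range: {timecode!r}")

    total_minutes = 60 * hours + minutes
    # Indices 00 and 01 do not exist at the start of a minute unless the
    # minute is a multiple of ten.  Assumption: map them to index 02.
    if seconds == 0 and frames < 2 and total_minutes % 10 != 0:
        frames, subframes = 2, 0
    # Two indices were skipped at the start of every elapsed minute that
    # is not a multiple of ten.
    dropped = 2 * (total_minutes - total_minutes // 10)
    nominal = (3600 * hours + 60 * minutes + seconds) * 30 + frames
    return nominal - dropped + subframes / 100
```

   Assuming the usual NTSC rate of 30000/1001 frames per second (the
   "29.97" above), dividing the count by that rate gives an approximate
   offset in seconds; for example, "00:10:00:00" is frame 17982, or
   about 599.9994 seconds.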

4.4.2. Normal Play Time

   Normal Play Time (NPT) indicates the stream-absolute position
   relative to the beginning of the presentation.  The timestamp
   consists of two parts: the mandatory first part may be expressed in
   either seconds only or in hours, minutes, and seconds.  The optional
   second part consists of a decimal point and decimal figures and
   indicates fractions of a second.

   The beginning of a presentation corresponds to 0.0 seconds.  Negative
   values are not defined.

   The special constant "now" is defined as the current instant of a
   live event.  It MAY only be used for live events and MUST NOT be used
   for on-demand (i.e., non-live) content.

   NPT is defined as in Digital Storage Media Command and Control
   (DSM-CC) [ISO.13818-6.1995]:

      Intuitively, NPT is the clock the viewer associates with a
      program.  It is often digitally displayed on a DVD player.  NPT
      advances normally when in normal play mode (scale = 1), advances
      at a faster rate when in fast-scan forward (high positive scale
      ratio), decreases when in scan reverse (negative scale ratio), and
      is fixed in pause mode.  NPT is (logically) equivalent to SMPTE
      time codes.

   Examples:

     npt=123.45-125
     npt=12:05:35.3-
     npt=now-

   The syntax is based on ISO 8601 [ISO.8601.2000] and expresses the
   time elapsed since presentation start, with two different notations
   allowed:

   o  The npt-hhmmss notation uses an ISO 8601 extended complete
      representation of the time of the day format (Section 5.3.1.1 of
      [ISO.8601.2000] ) using colons (":") as separators between hours,
      minutes, and seconds (hh:mm:ss).  The hour counter is not limited
      to 0-24 hours; up to nineteen (19) hour digits are allowed.

      *  In accordance with the requirements of the ISO 8601 time
         format, the hours, minutes, and seconds MUST all be present,
         with two digits used for minutes and for seconds and with at
         least two digits for hours.  An NPT of 7 minutes and 0 seconds
         is represented as "00:07:00", and an NPT of 392 hours, 0
         minutes, and 6 seconds is represented as "392:00:06".

      *  RTSP 1.0 allowed NPT in the npt-hhmmss notation without any
         leading zeros to ensure that implementations don't fail; for
         backward compatibility, all RTSP 2.0 implementations are
         REQUIRED to support receiving NPT values, hours, minutes, or
         seconds, without leading zeros.

   o  The npt-sec notation expresses the time in seconds, using between
      one and nineteen (19) digits.

   Both notations allow decimal fractions of seconds as specified in
   Section 5.3.1.3 of [ISO.8601.2000], using at most nine digits, and
   allowing only "." (full stop) as the decimal separator.

   The npt-sec notation is optimized for automatic generation; the npt-
   hhmmss notation is optimized for consumption by human readers.  The
   "now" constant allows clients to request to receive the live feed
   rather than the stored or time-delayed version.  This is needed since
   neither absolute time nor zero time is appropriate for this case.
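
   The two NPT notations can be told apart by the presence of colons.
   The Python sketch below converts a single NPT value to seconds, using
   Decimal so that fractional digits stay exact.  The function name and
   regular expressions are illustrative assumptions; in particular,
   limiting minutes and seconds to 0-59 follows the ISO 8601 time-of-day
   representation rather than an explicit rule in this section.

```python
from __future__ import annotations

import re
from decimal import Decimal

# npt-sec: 1-19 digits.  npt-hhmmss: 1-19 hour digits and 1-2 digit
# minutes and seconds (receivers accept values without leading zeros).
# Both allow "." followed by at most nine fractional digits.
_NPT_SEC = re.compile(r"([0-9]{1,19})(?:\.([0-9]{1,9}))?")
_NPT_HHMMSS = re.compile(
    r"([0-9]{1,19}):([0-9]{1,2}):([0-9]{1,2})(?:\.([0-9]{1,9}))?"
)


def parse_npt(value: str) -> Decimal | str:
    """Return an NPT value in seconds as a Decimal, or the constant "now"."""
    if value == "now":
        return "now"
    match = _NPT_HHMMSS.fullmatch(value)
    if match:
        hours, minutes, seconds, fraction = match.groups()
        if int(minutes) > 59 or int(seconds) > 59:
            raise ValueError(f"minutes and seconds must be 0-59: {value!r}")
        whole = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    else:
        match = _NPT_SEC.fullmatch(value)
        if match is None:
            raise ValueError(f"not an NPT value: {value!r}")
        whole_text, fraction = match.groups()
        whole = int(whole_text)
    # Decimal keeps the fractional digits exact instead of rounding them.
    return Decimal(whole) + (Decimal("0." + fraction) if fraction else Decimal(0))
```

   For a range such as "npt=12:05:35.3-", each endpoint would be passed
   to parse_npt separately after splitting on "-".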

4.4.3. Absolute Time

   Absolute time is expressed using a timestamp based on ISO 8601
   [ISO.8601.2000].  The date is a complete representation of the
   calendar date in basic format (YYYYMMDD) without separators (per
   Section 5.2.1.1 of [ISO.8601.2000]).  The time of day is provided in
   the complete representation basic format (hhmmss) as specified in
   Section 5.3.1.1 of [ISO.8601.2000], allowing decimal fractions of
   seconds following Section 5.3.1.3, requiring "." (full stop) as the
   decimal separator, and limiting the number of fractional digits to
   no more than nine.  The time expressed MUST use UTC (GMT), i.e., no
   time zone offsets are allowed.  The full date and time specification
   is the eight-digit date followed by a "T" followed by the six-digit
   time value, optionally followed by a full stop followed by one to
   nine digits of fractional seconds, and ended by "Z", e.g.,
   YYYYMMDDThhmmss.ssZ.

      The reasons for this time format rather than using "Date and Time
      on the Internet: Timestamps" [RFC3339] are historic.  We continue
      to use the format specified in RTSP 1.0.  The motivations raised
      in RFC 3339 apply to why a selection from ISO 8601 was made;
      however, a different and even more restrictive selection was
      applied in this case.

   Below are three examples of the absolute time format: first, a clock
   format range request for a starting time of November 8, 1996 at
   14 h 37 min and 20 1/4 seconds UTC, playing for 10 min and 5 seconds;
   then a Media-Properties header's "Time-Limited" UTC property for the
   24th of December 2014 at 15 hours and 00 minutes; and finally a
   Terminate-Reason header "time" property for the 18th of June 2013 at
   16 hours, 12 minutes, and 56 seconds:

     clock=19961108T143720.25Z-19961108T144725.25Z
     Time-Limited=20141224T150000Z
     time=20130618T161256Z
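
   The absolute time format maps directly onto a timezone-aware
   datetime.  The Python sketch below (function name hypothetical)
   parses values such as the clock range above.  Because Python's
   datetime stores only microseconds, fractional digits beyond the sixth
   are truncated, which is a limitation of this sketch rather than of
   the format.

```python
import re
from datetime import datetime, timezone

# YYYYMMDD "T" hhmmss, an optional "." with 1-9 fractional digits, then "Z".
_UTC_CLOCK = re.compile(r"([0-9]{8})T([0-9]{6})(?:\.([0-9]{1,9}))?Z")


def parse_utc_clock(value: str) -> datetime:
    """Parse an RTSP absolute time into a timezone-aware UTC datetime."""
    match = _UTC_CLOCK.fullmatch(value)
    if match is None:
        raise ValueError(f"not an RTSP absolute time: {value!r}")
    date_part, time_part, fraction = match.groups()
    # datetime holds microseconds only, so digits past the sixth are dropped.
    micros = int((fraction or "").ljust(6, "0")[:6])
    return datetime(
        int(date_part[0:4]), int(date_part[4:6]), int(date_part[6:8]),
        int(time_part[0:2]), int(time_part[2:4]), int(time_part[4:6]),
        micros, tzinfo=timezone.utc,
    )
```

   For the clock range above, the parsed end time minus the parsed start
   time is 605 seconds, i.e., 10 minutes and 5 seconds.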

4.5. Feature Tags

   Feature tags are unique identifiers used to designate features in
   RTSP.  These tags are used in Require (Section 18.43), Proxy-Require
   (Section 18.37), Proxy-Supported (Section 18.38), Supported (Section
   18.51), and Unsupported (Section 18.55) header fields.

   A feature tag definition MUST indicate to which combination of
   clients, servers, or proxies it applies.

   The creator of a new RTSP feature tag should either prefix the
   feature tag with a reverse domain name (e.g.,
   "com.example.mynewfeature" is an apt name for a feature whose
   inventor can be reached at "example.com") or register the new feature
   tag with the Internet Assigned Numbers Authority (IANA).  (See
   Section 22, "IANA Considerations".)

   The usage of feature tags is further described in Section 11, which
   deals with capability handling.
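
   As a small illustration of how these headers interact, the Python
   sketch below computes which tags in a Require header value a receiver
   does not support; those are the tags it would list in an Unsupported
   header.  It assumes the header value is a comma-separated list of
   tags and that such a request is rejected with 551 (Option Not
   Supported), as we read Section 11 and the Require header definition.
   The function name is hypothetical.

```python
from __future__ import annotations


def unsupported_features(require_value: str, supported: set[str]) -> list[str]:
    """Tags listed in a Require header value that are not in `supported`.

    A non-empty result is what would be echoed in an Unsupported header
    alongside a 551 (Option Not Supported) response.
    """
    requested = [tag.strip() for tag in require_value.split(",")]
    return [tag for tag in requested if tag and tag not in supported]
```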

4.6. Message Body Tags

   Message body tags are opaque strings that are used to compare two
   message bodies from the same resource, for example, in caches or to
   optimize setup after a redirect.  Message body tags can be carried in
   the MTag header (see Section 18.31) or in SDP (see Appendix D.1.9).
   MTag is similar to ETag in HTTP/1.1 (see Section 3.11 of [RFC2068]).

   A message body tag MUST be unique across all versions of all message
   bodies associated with a particular resource.  A given message body
   tag value MAY be used for message bodies obtained by requests on
   different URIs.  The use of the same message body tag value in
   conjunction with message bodies obtained by requests on different
   URIs does not imply the equivalence of those message bodies.

   Message body tags are used in RTSP to make some methods conditional.
   The methods are made conditional through the inclusion of headers;
   see Section 18.24 and Section 18.26 for information on the If-Match
   and If-None-Match headers, respectively.  Note that RTSP message body
   tags apply to the complete presentation, i.e., both the presentation
   description and the individual media streams.  Thus, message body
   tags can be used with the If-Match header to verify, at setup time
   after a redirect, that the same session description applies to the
   media at the new location.
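
   To make the If-Match use concrete, the Python sketch below checks
   whether the MTag of the resource as currently served appears in an
   If-Match header value.  It assumes, as with HTTP entity tags, that
   tags are double-quoted strings compared exactly, and it does not
   handle escaped quotes; the helper name is hypothetical.  A request
   whose precondition fails would be answered, as in HTTP, with 412
   (Precondition Failed).

```python
import re

# Assumption: tags are double-quoted strings without escaped quotes.
_QUOTED_TAG = re.compile(r'"[^"]*"')


def if_match_satisfied(if_match_value: str, current_mtag: str) -> bool:
    """True if the resource's current MTag is listed in the If-Match value.

    Message body tags are opaque, so they are compared as exact strings,
    including their surrounding quotes.
    """
    return current_mtag in _QUOTED_TAG.findall(if_match_value)
```

   In the redirect case described above, the client would place the MTag
   received from the original server in an If-Match header of its SETUP
   request to the new location.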

4.7. Media Properties

   When an RTSP server handles media, it is important to consider the
   different properties a media instance for delivery and playback can
   have.  This specification considers the media properties listed below
   in its protocol operations.  They are derived from the differences
   between a number of supported usages.

   On-demand:  Media that has a fixed (given) duration that doesn't
      change during the lifetime of the RTSP session and is known at the
      time of the creation of the session.  It is expected that the
      content of the media will not change, even if the representation,
      such as encoding or quality, changes.  Generally, one can seek,
      i.e., request any range, within the media.

   Dynamic On-demand:  This is a variation of the on-demand case where
      external methods are used to manipulate the actual content of the
      media set up for the RTSP session.  The main example is content
      defined by a playlist.

   Live:  Live media represents a progressing content stream (such as
      broadcast TV) where the duration may or may not be known.  It is
      not seekable, only the content presently being delivered can be
      accessed.

   Live with Recording:  A live stream that is combined with a server-
      side capability to store and retain the content of the live
      session and allow for random access delivery within the part of
      the already-recorded content.  The actual behavior of the media
      stream is very much dependent on the retention policy for the
      media stream; either the server will be able to capture the
      complete media stream or it will have a limitation in how much
      will be retained.  The media range will dynamically change as the
      session progresses.  For servers with a limited amount of storage
      available for recording, there will typically be a sliding window
      that moves forward while new data is made available and older data
      is discarded.

   To cover the above usages, the following media properties with
   appropriate values are specified.

4.7.1. Random Access and Seeking

   Random access is the ability to specify and get media delivered
   starting from any time (instant) within the content, an operation
   called "seeking".  The Media-Properties header will indicate the
   general capability for a media resource to perform random access.

   Random-Access:  The media is seekable to any out of a large number of
      points within the media.  Due to media-encoding limitations, a
      particular point may not be reachable, but seeking to a point close
      by is enabled.  A floating-point number of seconds may be provided
      to express the worst-case distance between random access points.

   Beginning-Only:  Seeking is only possible to the beginning of the
      content.

   No-Seeking:  Seeking is not possible at all.

   If random access is possible, as indicated by the Media-Properties
   header, the actual behavior policy when seeking can be controlled
   using the Seek-Style header (Section 18.47).

4.7.2. Retention

   The following retention policies are used by media to limit possible
   protocol operations:

   Unlimited:  The media will not be removed as long as the RTSP session
      is in existence.

   Time-Limited:  The media will not be removed before the given
      wallclock time.  After that time, it may or may not be available
      anymore.

   Time-Duration:  The media (on fragment or unit basis) will be retained
      for the specified duration.

4.7.3. Content Modifications

   The media content and its timeline can be of different types, e.g.,
   pre-produced content on demand, a live source that is being generated
   as time progresses, or something that is dynamically altered or
   recomposed during playback.  Therefore, a media property for content
   modifications is needed, and the following initial values are
   defined:

   Immutable:  The content of the media will not change, even if the
      representation, such as encoding or quality, changes.

   Dynamic:  The content can change due to external methods or triggers,
      such as playlists, but this will be announced by explicit updates.

   Time-Progressing:  As time progresses, new content will become
      available.  If the content is also retained, it will become longer
      as everything between the start point and the point currently being
      made available can be accessed.  If the media server uses a
      sliding-window policy for retention, the start point will also
      change as time progresses.

4.7.4. Supported Scale Factors

   A particular media content item often supports only a limited set or
   range of scales when delivering the media.  To let the client know
   which scale values or ranges the whole content or the current
   position supports, a media properties attribute is defined that
   contains a list of the supported values or ranges.  The attribute is
   named "Scales".  The "Scales" attribute may be updated at any point
   in the content, because the content may consist of spliced pieces or
   may be dynamically updated by out-of-band mechanisms.
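
   A client can use the "Scales" attribute to check a scale before
   requesting it.  The Python sketch below assumes a value syntax that
   this section does not define: a double-quoted, comma-separated list
   in which each entry is either a single value or a "low:high" range,
   as we understand the Media-Properties header definition.  The
   function names are hypothetical.

```python
from __future__ import annotations


def parse_scales(value: str) -> list[tuple[float, float]]:
    """Parse a Scales value such as '"-20, -10, 0.5:1.5, 4"' into ranges.

    A single value v becomes (v, v); "low:high" becomes (low, high).
    """
    ranges = []
    for entry in value.strip().strip('"').split(","):
        low, sep, high = entry.strip().partition(":")
        ranges.append((float(low), float(high) if sep else float(low)))
    return ranges


def scale_supported(scale: float, ranges: list[tuple[float, float]]) -> bool:
    """True if the requested scale falls within one of the listed entries."""
    return any(low <= scale <= high for low, high in ranges)
```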

4.7.5. Mapping to the Attributes

   This section shows examples of how one would map the above usages to
   the properties and their values.

   Example of On-Demand:

      Random Access:  Random-Access=5.0
      Content Modifications:  Immutable
      Retention:  Unlimited or Time-Limited

   Example of Dynamic On-Demand:

      Random Access:  Random-Access=3.0
      Content Modifications:  Dynamic
      Retention:  Unlimited or Time-Limited

   Example of Live:

      Random Access:  No-Seeking
      Content Modifications:  Time-Progressing
      Retention:  Time-Duration=0.0

   Example of Live with Recording:

      Random Access:  Random-Access=3.0
      Content Modifications:  Time-Progressing
      Retention:  Time-Duration=7200.0
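
   Written as a Media-Properties header value, the On-Demand mapping
   might read "Random-Access=5.0, Immutable, Unlimited".  The Python
   sketch below parses such a value into a dictionary, splitting only on
   commas outside double quotes so that a quoted Scales list stays
   intact.  The comma-separated "name=value" layout is assumed from the
   Media-Properties header definition rather than stated in this
   section, and the function names are hypothetical.

```python
def split_outside_quotes(value: str) -> list:
    """Split on commas that are not inside a double-quoted string."""
    items, current, quoted = [], [], False
    for ch in value:
        if ch == '"':
            quoted = not quoted
        if ch == "," and not quoted:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        items.append(tail)
    return items


def parse_media_properties(value: str) -> dict:
    """Map each property name to its value, or to True for bare flags."""
    properties = {}
    for item in split_outside_quotes(value):
        name, sep, raw = item.partition("=")
        properties[name.strip()] = raw.strip().strip('"') if sep else True
    return properties
```

   Under these assumptions, the Live example parses to No-Seeking and
   Time-Progressing flags plus "Time-Duration" with the value "0.0".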

