RFC 4396

RTP Payload Format for 3rd Generation Partnership Project (3GPP) Timed Text

Pages: 66
Proposed Standard

5. Resilient Transport

   Apart from the basic fragmentation guidelines described in the
   section above, the simplest option for packet-loss-resilient
   transport is packet repetition.  This mechanism may consist of a
   strict window-based repetition mechanism or, simply, a repetition
   mechanism in a wider sense, where new and old packets are mixed, for
   example.

   A server MAY decide to use repetition as a measure for packet loss
   resilience.  Thereby, a server MAY send the same RTP payloads or just
   some of the units from the payloads.

   As for the case of complete payloads, single repeated units MUST
   exactly match the same units sent in the first transmission; i.e., if
   fragmentation is needed, it SHALL be performed only once for each
   text sample.  Only then can a receiver use the already received and
   the repeated units to reconstruct the original text samples.  Since
   the RTP timestamp is used to group together the fragments of a
   sample, care must be taken to preserve the timing of units when
   constructing new RTP packets.

   For example, if a text sample was originally sent as a single
   non-fragmented text sample (one TYPE 1 unit), a repetition of that
   sample MUST be sent also as a single non-fragmented text sample in
   one unit.  Likewise, if the original text sample was fragmented and
   spread over several RTP packets (say, a total of 3 units), then the
   repeated fragments SHALL also have the same byte boundaries and use
   the same unit headers and bytes per fragment.

   With repetition, repeated units resolve to the same timestamp as
   their originals.  Where redundant units are available, only one of
   them SHALL be used.
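   The rule above can be illustrated with a minimal Python sketch.  It
   assumes a hypothetical receiver that has already parsed each unit
   into a (timestamp, this_index, data) tuple, with this_index set to 0
   for non-fragmented units.  The tuple layout and the function name
   are illustrative assumptions and are not defined by this payload
   format.

```python
def drop_redundant_units(units):
    """Keep only the first copy of each unit.

    `units` is an iterable of (timestamp, this_index, data) tuples, a
    hypothetical parsed form; `this_index` is 0 for non-fragmented units.
    """
    seen = set()
    unique = []
    for timestamp, this_index, data in units:
        key = (timestamp, this_index)
        if key in seen:
            continue  # a copy of this unit was already accepted
        seen.add(key)
        unique.append((timestamp, this_index, data))
    return unique
```

   Keying on the timestamp reflects the statement that repeated units
   resolve to the same timestamp as their originals.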

   Regarding the RTP header fields:

   o If the whole RTP payload is repeated, all payload-specific fields
     in the RTP header (the M, TS and PT fields) MUST keep their
     original values.  The only exception is the sequence number, which
     MUST be incremented to comply with RTP (the TOTAL/THIS fields allow
     fragments with different sequence numbers to be re-assembled).

   o In packets containing single repeated units, the general rules in
     Section 3 for assigning values to the RTP header fields apply.
     Keeping the value of the RTP timestamp to preserve the timing of
     the units is particularly relevant here.
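
   For the first case (a whole payload is repeated), the header
   handling can be sketched as follows.  The RtpHeader class and the
   function name are hypothetical and only model the four fields
   discussed here; a real RTP stack carries more state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RtpHeader:
    marker: bool        # M bit
    payload_type: int   # PT
    sequence: int       # 16-bit sequence number
    timestamp: int      # TS


def repeat_whole_payload(original, next_sequence):
    """Header for a repeated payload: only the sequence number changes."""
    # RTP sequence numbers are 16 bits wide, so wrap around at 65536.
    return replace(original, sequence=next_sequence % 65536)
```

   The payload bytes themselves are resent unchanged; only the header
   copy receives the sender's next sequence number.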

   Apart from repetition, other mechanisms such as FEC [7],
   retransmission [11], or similar techniques could be used to cope with
   packet losses.

6. Congestion Control

   Congestion control for RTP SHALL be implemented in accordance with
   RTP [3] and the applicable RTP profile, e.g., RTP/AVP [17].

   When using this payload format, mainly two factors may affect the
   congestion control:

   o The use of (unit) aggregation may make the payload format more
     bandwidth efficient, by avoiding header overhead and thus reducing
     the used bitrate.

   o The use of resilient transport mechanisms: Although timed text
     applications typically operate at low bitrates, the increase due to
     resilient transport shall be considered for congestion control
     mechanisms.  This applies to all mechanisms but especially to less
     efficient ones like repetition.

7. Scene Description

7.1. Text Rendering Position and Composition

   In order to set up a timed text session, regardless of the stream
   being stored in a 3GP file or streamed live, some initial layout
   information is needed by the communicating peers.

   +-------------------------------------------+
   |      <-> tx                               |    +-------------+
   |     +-------------------------------+     |<---|Display Area |
   |  ^  |                               |     |    +-------------+
   |  :  |                               |     |
   |  :ty|                               |     |    +-------------+
   |  :  |                               |<---------|Video track  |
   |  :  |                               |     |    +-------------+
   |  :  |                               |     |
   |  :  |                               |     |
   |  v  |                               |     |
   |  -  |   x-------------------------+ |     |    +-------------+
   |h ^  |   |                         |<-----------|Text Track   |
   |e :  +---|-------------------------|-+     |    +-------------+
   |i :      | +---------------------+ |       |
   |g :      | |                     | |       |    +-------------+
   |h :      | |                     |<------------ |Text Box     |
   |t v      | +---------------------+ |       |    +-------------+
   |  -      +-------------------------+       |
   +-------------------------------------------+
             <........................>
                    w i d t h

   Figure 18.  Illustration of text rendering position and composition

   The parameters used for negotiating the position and size of the text
   track in the display area are shown in Figure 18.  These are the
   "width" and "height" of the text track, its translation values, "tx"
   and "ty", and its "layer" or proximity to the user.

   At the same time, the sender of the stream needs to know the
   receiver's capabilities.  In this case, these are the maximum
   allowable values for the text track height and width, "max-h" and
   "max-w", for the stream the receiver shall display.

   This layout information MUST be conveyed in a reliable form before
   the start of the session, e.g., during session announcement or in an
   Offer/Answer (O/A) exchange.  An example of a reliable transport may
   be the out-of-band channel used for SDP.  Sections 8 and 9 provide
   details on the mapping of these parameters to SDP descriptions and
   their usage in O/A.

   For stored content, the layout values expressing stream properties
   MUST be obtained from the Track Header Box.  See Section 7.3.

   For live streaming, appropriate values as negotiated during session
   setup shall be used.

7.2. SMIL Usage

   The attributes contained in the Track Header Boxes of a 3GP file only
   specify the spatial relationship of the tracks within the given 3GP
   file.

   If multiple 3GP files are sent, they require spatial synchronization.
   For example, for a text and video stream, the positions of the text
   and video tracks in Figure 18 shall be determined.  For this purpose,
   SMIL [9] MAY be used.

   SMIL assigns regions in the display to each of those files and places
   the tracks within those regions.  Generally, in SMIL, the position of
   one track (or stream) is expressed relative to another track.  This
   is different from the 3GP file, where the upper left corner is the
   reference for all translation offsets.  Hence, only if the position
   in SMIL is relative to the video track origin does this translation
   offset have the same value as (tx, ty) in the 3GP file.

   Note also that the original track header information is used for each
   track only within its region, as assigned by SMIL.  Therefore, even
   if SMIL scene description is used, the track header information
   pieces SHOULD be sent anyway, as they represent the intrinsic media
   properties.  See 3GPP SMIL Language Profile in [27] for details.

7.3. Finding Layout Values in a 3GP File

   In a 3GP file, within the Track Header Box (tkhd):

   o tx, ty: These values specify the translation offset of the (text)
     track relative to the upper left corner of the video track, if
     present.  They are the second but last and third but last values in
     the unity matrix; values are fixed-point 16.16 values, restricted
     to be (signed) integers (i.e., the lower 16 bits of each value
     shall be all zeros).  Therefore, only the first 16 bits are used
     for obtaining the value of the media type parameters.

   o width, height: They have the same name in the tkhd box.  All
     (unsigned) 32 bits are meaningful.

   o layer: All (signed) 16 bits are used.
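
   The conversion of a 16.16 fixed-point matrix entry into a "tx" or
   "ty" value can be sketched in Python as shown below.  The function
   name is hypothetical, and the sketch assumes the caller has already
   read the matrix entry from the tkhd box as a raw unsigned 32-bit
   integer; locating the entry inside the box is not shown.

```python
def translation_from_fixed(value):
    """Convert a raw 32-bit 16.16 fixed-point value to a signed integer.

    Only the upper 16 bits carry information; the lower 16 bits are
    required to be zero for tx and ty.
    """
    if value & 0xFFFF:
        raise ValueError("fractional part must be zero")
    upper = (value >> 16) & 0xFFFF
    # Interpret the upper half as a two's-complement 16-bit integer.
    return upper - 0x10000 if upper & 0x8000 else upper
```

   For instance, the raw value 0x00640000 yields 100 and 0xFFF60000
   yields -10.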

8. 3GPP Timed Text Media Type

   The media subtype for the 3GPP Timed Text codec is allocated from the
   standards tree.  The top-level media type under which this payload
   format is registered is 'video'.  This registration is done using the
   template defined in [29] and following RFC 3555 [28].

   The receiver MUST ignore any unrecognized parameter.

   Media type: video

   Media subtype: 3gpp-tt

   Required parameters:

        rate:
                Refer to Section 3 in RFC 4396.

        sver:
                The parameter "sver" contains a list of supported
                backwards-compatible versions of the timed text format
                specification (3GPP TS 26.245) that the sender is
                willing to receive (and that are the same that it would
                be willing to send).  The first value is the value
                preferred to receive (or preferred to send).  The first
                value MAY be followed by a comma-separated list of
                versions that SHOULD be used as alternatives.  The order
                is meaningful: the first entry is the most preferred and
                the last is the least preferred.  Each entry has the
                format Zi(xi*256+yi), where "Zi" is the number of the
                Release and "xi" and "yi" are taken from the 3GPP
                specification version (i.e., vZi.xi.yi).  For example,
                for 3GPP TS 26.245 v6.0.0, Zi(xi*256+yi)=6(0), the
                version value is "60".  (Note that "60" is the
                concatenation of the values Zi=6 and (xi*256+yi)=0 and
                not their product.)

                If no "sver" value is available, for example, when
                streaming out of a 3GP file, the default value "60",
                corresponding to the 3GPP Release 6 version of 3GPP TS
                26.245, SHALL be used.

   Optional parameters:

        tx:
                This parameter indicates the horizontal translation
                offset in pixels of the text track with respect to the
                origin of the video track.  This value is the decimal
                representation of a 16-bit signed integer.  Refer to TS
                3GPP 26.245 for an illustration of this parameter.

        ty:
                This parameter indicates the vertical translation offset
                in pixels of the text track with respect to the origin
                of the video track.  This value is the decimal
                representation of a 16-bit signed integer.  Refer to TS
                3GPP 26.245 for an illustration of this parameter.

        layer:
                This parameter indicates the proximity of the text track
                to the viewer.  More negative values mean closer to the
                viewer.  This parameter has no units.  This value is the
                decimal representation of a 16-bit signed integer.

        tx3g:
                This parameter MUST be used for conveying sample
                descriptions out-of-band.  It contains a comma-separated
                list of base64-encoded entries.  The entries of this
                list MAY follow any particular order and the list SHALL
                NOT be empty.  Each entry is the result of running
                base64 encoding over the concatenation of the (static)
                SIDX value as an 8-bit unsigned integer and the (static)
                sample description for that SIDX, in that order.  The
                format of a sample description entry can be found in
                3GPP TS 26.245 Release 6 and later releases.  All
                servers and clients MUST understand this parameter and
                MUST be capable of using the sample description(s)
                contained in it.  Please refer to RFC 3548 [6] for
                details on the base64 encoding.

        width:
                This parameter indicates the width in pixels of the text
                track or area of the text being sent.  This value is the
                decimal representation of a 32-bit unsigned integer.
                Refer to TS 3GPP 26.245 for an illustration of this
                parameter.

        height:
                This parameter indicates the height in pixels of the
                text track being sent.  This value is the decimal
                representation of a 32-bit unsigned integer.  Refer to
                TS 3GPP 26.245 for an illustration of this parameter.

        max-w:
                This parameter indicates display capabilities.  This is
                the maximum "width" value that the sender of this
                parameter supports.  This value is the decimal
                representation of a 32-bit unsigned integer.

        max-h:
                This parameter indicates display capabilities.  This is
                the maximum "height" value that the sender of this
                parameter supports.  This value is the decimal
                representation of a 32-bit unsigned integer.
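
   As an illustration of the "tx3g" parameter described above, the
   following Python sketch splits a parameter value into its entries
   and separates the leading SIDX byte from the sample description.
   The function name is hypothetical, and the sketch does not parse or
   validate the sample description itself, whose format is defined in
   3GPP TS 26.245.

```python
import base64


def parse_tx3g(value):
    """Split a tx3g parameter value into {SIDX: sample description bytes}."""
    descriptions = {}
    for entry in value.split(","):
        raw = base64.b64decode(entry.strip())
        if len(raw) < 2:
            raise ValueError("entry must hold a SIDX byte and a description")
        # First byte is the SIDX (8-bit unsigned); the rest is the
        # sample description for that SIDX.
        descriptions[raw[0]] = raw[1:]
    return descriptions
```

   An empty parameter value raises ValueError here, in line with the
   requirement that the list not be empty.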

   Encoding considerations:

        This media type is framed (see Section 4.8 in [29]) and
        partially contains binary data.

   Restrictions on usage:

        This media type depends on RTP framing, and hence is only
        defined for transfer via RTP [3].  Transport within other
        framing protocols is not defined at this time.

   Security considerations:

        Please refer to Section 11 of RFC 4396.

   Interoperability considerations:

        The 3GPP Timed Text media format and its file storage is
        specified in Release 6 of 3GPP TS 26.245, "Transparent end-to-
        end packet switched streaming service (PSS); Timed Text Format
        (Release 6)".  Note also that 3GPP may in future releases
        specify extensions or updates to the timed text media format in
        a backwards-compatible way, e.g., new modifier boxes or
        extensions to the sample descriptions.  The payload format
        defined in RFC 4396 allows for such extensions.  For future 3GPP
        Releases of the Timed Text Format, the parameter "sver" is used
        to identify the exact specification used.

        The defined storage format for 3GPP Timed Text format is the
        3GPP File Format (3GP) [30]. 3GP files may be transferred using
        the media type video/3gpp as registered by RFC 3839 [31].  The
        3GPP File Format is a container file that may contain, e.g.,
        audio and video that may be synchronized with the 3GPP Timed
        Text.

   Published specification: RFC 4396

   Applications which use this media type:

        Multimedia streaming applications.

   Additional information:

        The 3GPP Timed Text media format is specified in 3GPP TS 26.245,
        "Transparent end-to-end packet switched streaming service (PSS);
        Timed Text Format (Release 6)".  This document and future
        extensions to the 3GPP Timed Text format are publicly available
        at http://www.3gpp.org.

        Magic number(s): None.

        File extension(s): None.

        Macintosh File Type Code(s): None.

   Person & email address to contact for further information:

        Jose Rey, jose.rey@eu.panasonic.com
        Yoshinori Matsui, matsui.yoshinori@jp.panasonic.com
        Audio/Video Transport Working Group.

   Intended usage: COMMON

   Authors:
        Jose Rey
        Yoshinori Matsui

   Change controller: IETF Audio/Video Transport Working Group delegated
        from the IESG.

9. SDP Usage

9.1. Mapping to SDP

   The information carried in the media type specification has a
   specific mapping to fields in SDP [4].  If SDP is used to specify
   sessions using this payload format, the mapping is done as follows:

   o The media type ("video") goes in the SDP "m=" as the media name.

        m=video <port number> RTP/<RTP profile> <dynamic payload type>

   o The media subtype ("3gpp-tt") and the timestamp clockrate "rate"
     (the RECOMMENDED 1000 Hz or other value) go in SDP "a=rtpmap" line
     as the encoding name and rate, respectively:

        a=rtpmap:<payload type> 3gpp-tt/1000

   o The REQUIRED parameter "sver" goes in the SDP "a=fmtp" attribute by
     copying it directly from the media type string as a
     semicolon-separated parameter=value pair.

   o The OPTIONAL parameters "tx", "ty", "layer", "tx3g", "width",
     "height", "max-w" and "max-h" go in the SDP "a=fmtp" attribute by
     copying them directly from the media type string as a
     semicolon-separated list of parameter=value(s) pairs:

        a=fmtp:<dynamic payload type> <parameter name>=<value>[,<value>]
        [; <parameter name>=<value>]

   o Any parameter unknown to the device that uses the SDP SHALL be
     ignored.  For example, parameters added to the media format in
     later specifications MAY be copied into the SDP and SHALL be
     ignored by receivers that do not understand them.
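
   The "a=fmtp" mapping above can be sketched with a small Python
   helper.  The function name and the use of a dictionary as input are
   illustrative assumptions; the separator "; " follows the examples in
   Section 9.3.

```python
def fmtp_line(payload_type, params):
    """Build an a=fmtp line from a mapping of name -> value or list of values."""
    parts = []
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            # Multi-valued parameters (e.g. sver) are comma-separated.
            value = ",".join(str(v) for v in value)
        parts.append(f"{name}={value}")
    return f"a=fmtp:{payload_type} " + "; ".join(parts)
```

   For example, fmtp_line(98, {"tx": 100, "sver": ["6256", "60"]})
   returns "a=fmtp:98 tx=100; sver=6256,60".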

9.2. Parameter Usage in the SDP Offer/Answer Model

   In this section, the meaning of the SDP parameters defined in this
   document within the Offer/Answer [13] context is explained.

   In unicast, sender and receiver typically negotiate the streams,
   i.e., which codecs and parameter values are used in the session.
   This is also possible in multicast to a lesser extent.

   Additionally, the meaning of the parameters MAY vary depending on
   which direction is used.  In the following sections, a
   "<directionality> offer" means an offer that contains a stream set to
   <directionality>.  <directionality> may take the values sendrecv,
   sendonly, and recvonly.  Similar considerations apply for answers.
   For example, an answer to a sendonly offer is a recvonly answer.

9.2.1. Unicast Usage

   The following types of parameters are used in this payload format:

     1. Declarative parameters: Offerer and answerer declare the values
        they will use for the incoming (sendrecv/recvonly) or outgoing
        (sendonly) stream.  Offerer and answerer MAY use different
        values.

          a. "tx", "ty", and "layer": These are parameters describing
             where the received text track is placed.  Depending on the
             directionality:

               i. They MUST appear in all sendrecv offers and answers
                  and in all recvonly offers and answers (thus applying
                  to the incoming stream).  In the case of sendrecv
                  offers and answers and in recvonly offers, these
                  values SHOULD be used by the sender of the stream
                  unless it has a particular preference, in which case,
                  it MUST make sure that these different values do not
                  corrupt the presentation.  For recvonly answers, the
                  answerer MAY accept the proposed values for the
                  incoming stream (in a sendonly offer; see ii. below)
                  or respond with different ones.  The offerer MUST use
                  the returned values.

              ii. They MAY appear in sendonly offers and MUST appear in
                  sendonly answers.  In sendonly offers, they specify
                  the values that the offerer proposes for sending (see
                  example in Section 9.3).  In sendonly answers, these
                  values SHOULD be copied from the corresponding
                  recvonly offer upon accepting the stream, unless a
                  particular preference by the receiver of the stream
                  exists, as explained in the previous point.

     2. Parameters describing the display capabilities, "max-h" and
        "max-w", which indicate the maximum dimensions of the text track
        (text display area) for the incoming stream "tx" and "ty" values
        (see Figure 18).  "max-h" and "max-w" MUST be included in all
        offers and answers where "tx" and "ty" refer to the incoming
        stream, thus excluding sendonly offers and answers (see example
        in Section 9.3), where they SHALL NOT be present.

     3. Parameters describing the sent stream properties, i.e., the
        sender of the stream decides upon the values of these:

          a. "width" and "height" specify the text track dimensions.
             They SHALL ALWAYS be present in sendrecv and sendonly
             offers and answers.  For recvonly answers, the answerer
             MUST include the offered parameter values (if any) verbatim
             in the answer upon accepting the stream.

          b. "tx3g" contains static sample descriptions.  It MAY only be
             present in sendrecv and sendonly offers and answers.  This
             parameter applies to the stream that offerers or answerers
             send.

     4. Negotiable parameters, which MUST be agreed on.  This is the
        case of "sver".  This parameter MUST be present in every offer
        and answer.  The answerer SHALL choose one supported value from
        the offerer's list, or else it MUST remove the stream or reject
        the session.

     5. Symmetric parameters: "rate", timestamp clockrate, belongs to
        this class.  Symmetric parameters MUST be echoed verbatim in the
        answer.  Otherwise, the stream MUST be removed or the session
        rejected.

   The following table summarizes all options:

     +..---------------------------+----------+----------+----------+
     |   ``--..__  Directionality/ | sendrecv | recvonly | sendonly |
     + Type of   ``--..__   O or A +----------+----------+----------+
     |    Parameter      ``--..__  |   O/A    |   O/A    |   O/A    |
     +--------------+------------``+----------+----------+----------+
     | Declarative  |tx, ty, layer |   M/M    |   M/M    |   m/M    |
     |              |              |          |          |          |
     +--------------+--------------+----------+----------+----------+
     | Display      |max-h, max-w  |   M/M    |   M/M    |   -/-    |
     | Capabilities |              |          |          |          |
     +--------------+--------------+----------+----------+----------+
     | Stream       |height, width |   M/M    |   -/(M)  |   M/M    |
     | properties   |tx3g          |   m/m    |   -/-    |   m/m    |
     |              |              |          |          |          |
     +--------------+--------------+----------+----------+----------+
     |  Negotiable  |sver          |   M/M    |   M/M    |   M/M    |
     |              |              |          |          |          |
     +--------------+--------------+----------+----------+----------+
     |  Symmetric   |rate          |   M/M    |   M/M    |   M/M    |
     +--------------+--------------+----------+----------+----------+

          Table 1.  Parameter usage in Unicast Offer / Answer.

   KEY:
        o M means MUST be present.
        o m means MAY be present (such as proposed values).
        o (M) or (m) means MUST or MAY, if applicable.
        o a hyphen ("-") means the parameter MUST NOT be present.
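
   Table 1 can be turned into a simple presence check, as in the
   following Python sketch.  All names are hypothetical.  Two
   simplifications are assumed: "(M)" is treated like "m", since "if
   applicable" cannot be decided from the parameter names alone, and
   "rate" is left out because it is carried in "a=rtpmap" rather than
   "a=fmtp".

```python
GROUPS = {
    "position": ("tx", "ty", "layer"),
    "capability": ("max-h", "max-w"),
    "dimensions": ("height", "width"),
    "tx3g": ("tx3g",),
    "sver": ("sver",),
}


def _rules(position, capability, dimensions, tx3g):
    # "M" = MUST be present, "m" = MAY be present, "-" = MUST NOT be present.
    return {"position": position, "capability": capability,
            "dimensions": dimensions, "tx3g": tx3g, "sver": "M"}


RULES = {
    ("sendrecv", "offer"): _rules("M", "M", "M", "m"),
    ("sendrecv", "answer"): _rules("M", "M", "M", "m"),
    ("recvonly", "offer"): _rules("M", "M", "-", "-"),
    ("recvonly", "answer"): _rules("M", "M", "m", "-"),
    ("sendonly", "offer"): _rules("m", "-", "M", "m"),
    ("sendonly", "answer"): _rules("M", "-", "M", "m"),
}


def check_fmtp(direction, role, present):
    """Return a list of violations for the parameter names in `present`."""
    errors = []
    for group, level in RULES[(direction, role)].items():
        for name in GROUPS[group]:
            if level == "M" and name not in present:
                errors.append(f"{name} missing")
            elif level == "-" and name in present:
                errors.append(f"{name} not allowed")
    return errors
```

   The direction passed in is that of the description being checked;
   for example, the answer to a sendonly offer is checked as
   ("recvonly", "answer").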

   Other observations regarding parameter usage:

     o Translation and transparency values: In sendonly offers, "tx",
       "ty", and "layer" indicate proposed values.  This is useful for
       visually composed sessions where the different streams occupy
       different parts of the display, e.g., a video stream and the
       captions.  These are just suggested values; the peer rendering
       the text ultimately decides where to place the text track.

     o Text track (area) dimensions, "height" and "width": In the case
       of sendonly offers, an answerer accepting the offer MUST be
       prepared to render the stream using these values.  If any of
       these conditions are not met, the stream MUST be removed or the
       session rejected.

     o Display capabilities, "max-h" and "max-w": An answerer sending a
       stream SHALL ensure that the "height" and "width" values in the
       answer are compatible with the offerer's signaled capabilities.

     o Version handling via "sver": The idea is that offerer and
       answerer communicate using the same version.  This is achieved by
       letting the answerer choose from a list of supported versions,
       "sver".  For recvonly streams, the first value in the list is the
       preferred version to receive.  Consequently, for sendonly (and
       sendrecv) streams, the first value is the one preferred for
       sending (and receiving).  The answerer MUST choose one value and
       return it in the answer.  Upon receiving the answer, the offerer
       SHALL be prepared to send (sendonly and sendrecv) and receive
       (recvonly and sendrecv) a stream using that version.  If none of
       the versions in the list is supported, the stream MUST be removed
       or the session rejected.  Note that, if alternative non-
       compatible versions are offered, then this SHALL be done using
       different payload types.
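
   The "sver" handling can be sketched in Python as follows.  The first
   helper builds an entry in the Zi(xi*256+yi) format from Section 8;
   the second picks a version on the answerer side.  Both function
   names are hypothetical, and choosing the first supported entry in
   the offerer's order is an assumed policy: the text only requires
   that the answerer choose one supported value.

```python
def sver_entry(release, x, y):
    """Concatenate the Release number with x*256+y (not a product)."""
    return f"{release}{x * 256 + y}"


def choose_version(offered, supported):
    """Return the first offered version that is supported, or None.

    `offered` is the comma-separated "sver" value from the offer;
    `supported` is a set of version strings the answerer implements.
    None means the stream is to be removed or the session rejected.
    """
    for version in offered.split(","):
        version = version.strip()
        if version in supported:
            return version
    return None
```

   With these helpers, sver_entry(6, 0, 0) gives "60" and
   choose_version("6256,60", {"60"}) gives "60".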

9.2.2. Multicast Usage

   In multicast, the parameter usage is similar to the unicast case,
   except as follows:

   o The parameters "tx", "ty", and "layer" in multicast offers only
     have meaning for sendrecv and recvonly streams.  In order for all
     clients to have the same vision of the session, they MUST be used
     symmetrically.

   o For "height", "width", and "tx3g" (for sendrecv and sendonly),
     multicast offers specify which values of these parameters the
     participants MUST use for sending.  Thus, if the stream is
     accepted, the answerer MUST also include them verbatim in the
     answer (also "tx3g", if present).

   o The capability parameters, "max-h" and "max-w", SHALL NOT be used
     in multicast.  If the offered text track should change in size, a
     new offer SHALL be used instead.

   o Regarding version handling: In the case of multicast offers, an
     answerer MAY accept a multicast offer as long as one of the
     versions listed in the "sver" is supported.  Therefore, if the
     stream is accepted, the answerer MUST choose its preferred version,
     but, unlike in unicast, the offerer SHALL NOT change the offered
     stream to this chosen version because there may be other session
     participants that do support the newer extensions.  Consequently,
     different session participants may end up using different
     backwards-compatible media format versions.  It is RECOMMENDED that
     the multicast offer contains a limited number of versions, in order
     for all participants to have the same view of the session.  This is
     a responsibility of the session creator.  If none of the offered
     versions is supported, the stream SHALL be removed or the session
     rejected.  Also in this case, if alternative non-compatible
     versions are offered, then this SHALL be done using different
     payload types.

9.3. Offer/Answer Examples

   In these unicast O/A examples, the long lines are wrapped around.
   Static sample descriptions are shortened for clarity.

   For sendrecv:

   O -> A

   m=video <port> RTP/AVP 98
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=120;
   max-w=160; sver=6256,60; tx3g=81...
   a=sendrecv

   A -> O

   m=video <port> RTP/AVP 98..
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=95; layer=0; height=90; width=100; max-h=100;
   max-w=160; sver=60; tx3g=82...
   a=sendrecv

   In this example, the offerer is telling the answerer where it will
   place the received stream and what is the maximum height and width
   allowable for the stream that it will receive.  Also, it tells the
   answerer the dimensions of the text track for the stream sent and
   which sample description it shall use.  It offers two versions, 6256
   and 60.  The answerer responds with an equivalent set of parameters
   for the stream it receives.  In this case, the answerer's "max-h" and
   "max-w" are compatible with the offerer's "height" and "width".
   Otherwise, the answerer would have to remove this stream, and the
   offerer would have to issue a new offer taking the answerer's
   capabilities into account.  This is possible only if multiple payload
   types are present in the initial offer so that at least one of them
   matches the answerer's capabilities as expressed by "max-h" and
   "max-w" in the negative answer.  Note also that the answerer's text
   box dimensions fit within the maximum values signaled in the offer.
   Finally, the answerer chooses to use version 60 of the timed text
   format.

   For recvonly:

   Offerer -> Answerer

   m=video <port> RTP/AVP 98
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; max-h=120; max-w=160; sver=6256,60
   a=recvonly

   A -> O

   m=video <port> RTP/AVP 98..
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; height=90; width=100; sver=60;
   tx3g=82...
   a=sendonly

   In this case, the offer is different from the previous case: It does
   not include the stream properties "height", "width", and "tx3g".  The
   answerer copies the "tx", "ty", and "layer" values, thus
   acknowledging these.  "max-h" and "max-w" are not present in the
   answer because the "tx" and "ty" (and "layer") in this special case
   do not apply to the received stream, but to the sent stream.  Also,
   if offerer and answerer had very different display sizes, it would
   not be possible to express the answerer's capabilities.  In the
   example above and for an answerer with a 50x50 display, the
   translation values are already out of range.

   For sendonly:

   O -> A

   m=video <port> RTP/AVP 98
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100;
   sver=6256,60; tx3g=81...
   a=sendonly

   A -> O

   m=video <port> RTP/AVP 98..
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=100; ty=100; layer=0; height=80; width=100; max-h=100;
   max-w=160; sver=60
   a=recvonly

   Note that "max-h" and "max-w" are not present in the offer.  Also,
   with this answer, the answerer would accept the offer as is (thus
   echoing "tx", "ty", "height", "width", and "layer") and additionally
   inform the offerer about its capabilities: "max-h" and "max-w".

   Another possible answer for this case would be:

   A -> O

   m=video <port> RTP/AVP 98..
   a=rtpmap:98 3gpp-tt/1000
   a=fmtp:98 tx=120; ty=105; layer=0; max-h=95; max-w=150; sver=60
   a=recvonly

   In this case, the answerer does not accept the values offered.  The
   offerer MUST use these values or else remove the stream.
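
   The compatibility checks discussed in these examples can be sketched
   in Python.  Both function names are hypothetical, the "a=fmtp" line
   is assumed to have been unwrapped into a single line first, and
   "compatible" is interpreted here as "not larger than", which is an
   assumption of this sketch.

```python
def parse_fmtp(line):
    """Parse 'a=fmtp:<pt> name=value; name=value' into (pt, dict of strings)."""
    head, _, rest = line.partition(" ")
    payload_type = int(head.split(":", 1)[1])
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if item:
            name, _, value = item.partition("=")
            params[name.strip()] = value.strip()
    return payload_type, params


def fits_capabilities(stream, receiver):
    """True if the sent height/width fit within the receiver's max-h/max-w."""
    return (int(stream["height"]) <= int(receiver["max-h"])
            and int(stream["width"]) <= int(receiver["max-w"]))
```

   Applied to the sendrecv example, the offered height=80 and width=100
   fit within the answerer's max-h=100 and max-w=160, and the answered
   height=90 and width=100 fit within the offerer's max-h=120 and
   max-w=160.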

9.4. Parameter Usage outside of Offer/Answer

SDP may also be employed outside of the Offer/Answer context, for instance for multimedia sessions that are announced through the Session Announcement Protocol (SAP) [14] or streamed through the Real Time Streaming Protocol (RTSP) [15]. In this case, the receiver of a session description is required to support the parameters and given values for the streams, or else it MUST reject the session. It is the responsibility of the sender (or creator) of the session descriptions to define the session parameters so that the probability of unsuccessful session setup is minimized. This is out of the scope of this document.

10. IANA Considerations

IANA has registered the media subtype name "3gpp-tt" for the media type "video" as specified in Section 8 of this document.

11. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [3] and any applicable RTP profile, e.g., AVP [17].

   In particular, an attacker may invalidate the current set of active
   sample descriptions at the client by means of repeating a packet with
   an old sample description, i.e., a replay attack.  This would mean
   that the display of the text would be corrupted, if displayed at all.
   Another form of attack may consist of sending redundant fragments
   whose boundaries do not match the exact boundaries of the originals
   (as indicated by LEN) or fragments that carry different sample
   lengths (SLEN).  This may cause a decoder to crash.

   These types of attack may easily be avoided by using source
   authentication and integrity protection.

   Additionally, peers in a timed text session may desire to retain
   privacy in their communication, i.e., confidentiality.

   This payload format does not provide any mechanisms for achieving
   these.  Confidentiality, integrity protection, and authentication
   have to be solved by a mechanism external to this payload format,
   e.g., SRTP [10].

12. References

12.1. Normative References

   [1]  Transparent end-to-end packet switched streaming service (PSS);
        Timed Text Format (Release 6), TS 26.245 v 6.0.0, June 2004.

   [2]  ISO/IEC 14496-12:2004 Information technology - Coding of
        audio-visual objects - Part 12: ISO base media file format.

   [3]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
        RFC 3550, July 2003.

   [4]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.

   [5]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [6]  Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
        RFC 3548, July 2003.

12.2. Informative References

   [7]  Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
        Generic Forward Error Correction", RFC 2733, December 1999.

   [8]  Perkins, C. and O. Hodson, "Options for Repair of Streaming
        Media", RFC 2354, June 1998.

   [9]  W3C, "Synchronised Multimedia Integration Language (SMIL 2.0)",
        August, 2001.

   [10] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
        Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
        3711, March 2004.

   [11] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. Hakenberg,
        "RTP Retransmission Payload Format", Work in Progress, September
        2005.

   [12] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
        P. Gentric, "RTP Payload Format for Transport of MPEG-4
        Elementary Streams", RFC 3640, November 2003.

   [13] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
        Session Description Protocol (SDP)", RFC 3264, June 2002.

   [14] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
        Protocol", RFC 2974, October 2000.

   [15] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
        Protocol (RTSP)", RFC 2326, April 1998.

   [16] Transparent end-to-end packet switched streaming service (PSS);
        Protocols and codecs (Release 6), TS 26.234 v 6.1.0, September
        2004.

   [17] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
        Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

   [18] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD
        63, RFC 3629, November 2003.

   [19] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
        RFC 2781, February 2000.

   [20] Friedman, T., Caceres, R., and A. Clark, "RTP Control Protocol
        Extended Reports (RTCP XR)", RFC 3611, November 2003.

   [21] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
        "Extended RTP Profile for RTCP-based Feedback (RTP/AVPF)", Work
        in Progress, August 2004.

   [22] Hellstrom, G., "RTP Payload for Text Conversation", RFC 2793,
        May 2000.

   [23] Hellstrom, G. and P. Jones, "RTP Payload for Text Conversation",
        RFC 4103, June 2005.

   [24] ITU-T Recommendation T.140 (1998) - Text conversation protocol
        for multimedia application, with amendment 1, (2000).

   [25] ISO/IEC 10646-1: (1993), Universal Multiple Octet Coded
        Character Set.

   [26] ISO/IEC FCD 14496-17 Information technology - Coding of audio-
        visual objects - Part 17: Streaming text format, Work in
        progress, June 2004.

   [27] Transparent end-to-end Packet-switched Streaming Service (PSS);
        3GPP SMIL language profile, (Release 6), TS 26.246 v 6.0.0, June
        2004.

   [28] Casner, S. and P. Hoschka, "MIME Type Registration of RTP
        Payload Formats", RFC 3555, July 2003.

   [29] Freed, N. and J. Klensin, "Media Type Specifications and
        Registration Procedures", BCP 13, RFC 4288, December 2005.

   [30] Transparent end-to-end packet switched streaming service (PSS);
        3GPP file format (3GP) (Release 6), TS 26.244 V6.3. March 2005.

   [31] Castagno, R. and D. Singer, "MIME Type Registrations for 3rd
        Generation Partnership Project (3GPP) Multimedia files", RFC
        3839, July 2004.

13. Basics of the 3GP File Structure

This section provides a coarse overview of the 3GP file structure, which follows the ISO Base Media File Format [2].

Each 3GP file consists of "Boxes". In general, a 3GP file contains the File Type Box (ftyp), the Movie Box (moov), and the Media Data Box (mdat). The File Type Box identifies the type and properties of the 3GP file itself. The Movie Box and the Media Data Box, serving as containers, include their own boxes for each medium. Boxes start with a header, which indicates both size and type (these fields are named "size" and "type"). Additionally, each box type may contain a number of other boxes. In the following, only those boxes are mentioned that are useful for the purposes of this payload format.

The Movie Box (moov) contains one or more Track Boxes (trak), which include information about each track. A Track Box contains, among others, the Track Header Box (tkhd), the Media Header Box (mdhd), and the Media Information Box (minf).

The Track Header Box specifies the characteristics of a single track, where a track is, in this case, the streamed text during a session. Exactly one Track Header Box is present for a track. It contains information about the track, such as the spatial layout (width and height), the video transformation matrix, and the layer number. Since these pieces of information are essential and static (i.e., constant) for the duration of the session, they must be sent prior to the transmission of any text samples.

The Media Header Box contains the "timescale" or number of time units that pass in one second, i.e., cycles per second or Hertz.

The Media Information Box includes the Sample Table Box (stbl), which contains all the time and data indexing of the media samples in a track. Using this box, it is possible to locate samples in time and to determine their type, size, container, and offset into that container.
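
The box layout described above can be illustrated with a short sketch that walks the boxes in a byte buffer. It assumes the header layout of the ISO Base Media File Format [2]: a 32-bit big-endian "size" followed by a four-character "type", where a size of 1 signals a 64-bit size following the type and a size of 0 means the box extends to the end of the enclosing container. The function names iter_boxes and make_box are hypothetical helpers introduced for this illustration only.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # Every box starts with a 32-bit size and a four-character type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            if offset + 16 > end:
                raise ValueError("truncated 64-bit box size")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("box size out of range")
        yield box_type.decode("ascii", "replace"), offset + header, offset + size
        offset += size


def make_box(box_type, payload=b""):
    """Build a box with a plain 32-bit size (helper for the example only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# A toy buffer with the three top-level boxes named in this section.
toy_file = (
    make_box(b"ftyp", b"3gp6")
    + make_box(b"moov", make_box(b"trak"))
    + make_box(b"mdat", b"\x00\x02hi")
)
top_level = [name for name, _, _ in iter_boxes(toy_file)]
moov_start, moov_end = next(
    (start, stop) for name, start, stop in iter_boxes(toy_file) if name == "moov"
)
inside_moov = [name for name, _, _ in iter_boxes(toy_file, moov_start, moov_end)]
```

Because container boxes such as moov simply hold further boxes in their payload, the same function can be applied again to the payload range to descend one level.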
Inside the Sample Table Box, we can find the Sample Description Box (stsd, for finding sample descriptions), the Decoding Time to Sample Box (stts, for finding sample duration), the Sample Size Box (stsz), and the Sample to Chunk Box (stsc, for finding the sample description index).

Finally, the Media Data Box contains the media data itself. In timed text tracks, this box contains text samples. The equivalents for audio and video tracks are audio and video frames, respectively. The text sample consists of the text length, the text string, and one or several Modifier Boxes. The text length is the size of the text in bytes.
   The text string is plain text to render.  The Modifier Box is
   information to render in addition to the text, such as color, font,
   etc.
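
The text sample layout just described (text length, text string, Modifier Boxes) can be sketched as follows. This is a minimal illustration under assumptions that are not stated in this section: the text length is taken to be a 16-bit big-endian field, and the string is taken to be UTF-16 when it begins with a byte order mark and UTF-8 otherwise, as in the timed text format [1]. The function name parse_text_sample is hypothetical, and the "hlit" box in the usage lines serves only as example payload.

```python
import struct


def parse_text_sample(sample):
    """Split a text sample into (text, [(modifier_type, modifier_payload), ...])."""
    if len(sample) < 2:
        raise ValueError("sample too short for the text length field")
    # Assumption: the text length is a 16-bit big-endian byte count.
    (text_len,) = struct.unpack_from(">H", sample, 0)
    text_end = 2 + text_len
    if text_end > len(sample):
        raise ValueError("text length exceeds sample size")
    raw = sample[2:text_end]
    # Assumption: a byte order mark signals UTF-16; otherwise the text is UTF-8.
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8")
    modifiers = []
    offset = text_end
    while offset < len(sample):
        # Modifier Boxes use the usual box header: 32-bit size, 4-char type.
        if offset + 8 > len(sample):
            raise ValueError("truncated modifier box header")
        size, box_type = struct.unpack_from(">I4s", sample, offset)
        if size < 8 or offset + size > len(sample):
            raise ValueError("modifier box size out of range")
        payload = sample[offset + 8:offset + size]
        modifiers.append((box_type.decode("ascii", "replace"), payload))
        offset += size
    return text, modifiers


# Usage: a five-byte string followed by one modifier box with a 4-byte payload.
example = (
    struct.pack(">H", 5) + b"Hello"
    + struct.pack(">I4s", 12, b"hlit") + b"\x00\x00\x00\x05"
)
text, modifiers = parse_text_sample(example)
```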

14. Acknowledgements

The authors would like to thank Dave Singer, Jan van der Meer, Magnus Westerlund, and Colin Perkins for their comments and suggestions about this document. The authors would also like to thank Markus Gebhard for the free and publicly available JavE ASCII Editor (used for the ASCII drawings in this document) and Henrik Levkowetz for the Idnits web service.

Authors' Addresses

   Jose Rey
   Panasonic R&D Center Germany GmbH
   Monzastr. 4c
   D-63225 Langen, Germany

   EMail: jose.rey@eu.panasonic.com
   Phone: +49-6103-766-134
   Fax:   +49-6103-766-166

   Yoshinori Matsui
   Matsushita Electric Industrial Co., LTD.
   1006 Kadoma
   Kadoma-shi, Osaka, Japan

   EMail: matsui.yoshinori@jp.panasonic.com
   Phone: +81 6 6900 9689
   Fax:   +81 6 6900 9699

Full Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.  The IETF invites any interested party to
   bring to its attention any copyrights, patents or patent
   applications, or other proprietary rights that may cover technology
   that may be required to implement this standard.  Please address the
   information to the IETF at ietf-ipr@ietf.org.

Acknowledgement

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).