Internet Engineering Task Force (IETF)                      I. Johansson
Request for Comments: 6236                                   Ericsson AB
Category: Standards Track                                        K. Jung
ISSN: 2070-1721                            Samsung Electronics Co., Ltd.
                                                                May 2011


          Negotiation of Generic Image Attributes in the
                Session Description Protocol (SDP)
Abstract

This document proposes a new generic session setup attribute to make it possible to negotiate different image attributes such as image size. A possible use case is to make it possible for a low-end handheld terminal to display video without the need to rescale the image, something that may consume large amounts of memory and processing power. The document also helps to maintain an optimal bitrate for video, as only the image size that is desired by the receiver is transmitted.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6236.
Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements
   2. Conventions Used in This Document
   3. Specification of the 'imageattr' SDP Attribute
      3.1. Attribute Syntax
         3.1.1. Overall View of Syntax
      3.2. Considerations
         3.2.1. No imageattr in First Offer
         3.2.2. Different Payload Type Numbers in Offer and Answer
         3.2.3. Asymmetry
         3.2.4. sendonly and recvonly
         3.2.5. Sample Aspect Ratio
         3.2.6. SDPCapNeg Support
         3.2.7. Interaction with Codec Parameters
         3.2.8. Change of Display in Middle of Session
         3.2.9. Use with Layered Codecs
         3.2.10. Addition of Parameters
   4. Examples
      4.1. A High-Level Example
      4.2. Detailed Examples
         4.2.1. Example 1
         4.2.2. Example 2
         4.2.3. Example 3
         4.2.4. Example 4
   5. IANA Considerations
   6. Security Considerations
   7. Acknowledgements
   8. References
      8.1. Normative References
      8.2. Informative References
H.264]), the indication of the image attributes may still provide an optimal use of bandwidth, because the attribute gives the encoder a better indication of which image size is preferred and thus helps to avoid wasting bandwidth by encoding at an unnecessarily large resolution.

For implementers who are considering rescaling issues, it is worth noting that there are several benefits to rescaling on the sender side:

o Rescaling on the sender/encoder side is likely to be easier to do, as the camera-related software/hardware already contains the necessary functionality for zooming/cropping/trimming/sharpening the video signal. Moreover, rescaling is generally done in the RGB or YUV domain and should not depend on the codecs used.

o The encoder may be able to encode in a number of formats but, without the image attribute, may not know which format to choose, because it does not know the receiver's performance or preference.

o The quality drop due to digital-domain rescaling using interpolation is likely to be lower if it is done before the video encoding rather than after the decoding, especially when low-bitrate video coding is used.

o If low-complexity rescaling operations such as simple cropping must be performed, having this functionality on the sender side makes it possible to present a miniature "what you send" image on the display to help the user frame the image correctly.

Several of the existing standards ([H.263], [H.264], and [MPEG-4]) support different resolutions at different frame rates. The purpose of this document is to provide a generic mechanism that is targeted mainly at the negotiation of the image size. However, to make it more general, the attribute is named 'imageattr'.

This document is limited to point-to-point unicast communication scenarios. The attribute may also be used in centralized conferencing scenarios, but due to the abundance of configuration options, it may then be difficult to come up with a configuration that fits all parties.

H.264] or config in [MPEG-4].
REQ-4: Make the attribute generic, with as few codec-specific details/tricks as possible, in order to be codec agnostic.

In addition to the requirements above, the following optional requirement may be applicable:

OPT-1: The image attribute should support the description of image-related attributes for various types of media, including video, pictures, images, etc.

RFC2119].

Section 3.2.10.

RFC5234]:

   image-attr = "imageattr:" PT 1*2( 1*WSP ( "send" / "recv" )
                1*WSP attr-list )
   PT = 1*DIGIT / "*"
   attr-list = ( set *(1*WSP set) ) / "*"
     ;  WSP and DIGIT defined in [RFC5234]

   set= "[" "x=" xyrange "," "y=" xyrange *( "," key-value ) "]"
                ; x is the horizontal image size range (pixel count)
                ; y is the vertical image size range (pixel count)

   key-value = ( "sar=" srange )
             / ( "par=" prange )
             / ( "q=" qvalue )
                ; Key-value MAY be extended with other keyword
                ;  parameters.
                ; At most, one instance each of sar, par, or q
                ;  is allowed in a set.
                ;
                ; sar (sample aspect ratio) is the sample aspect ratio
                ;  associated with the set (optional, MAY be ignored)
                ; par (picture aspect ratio) is the allowed
                ;  ratio between the display's x and y physical
                ;  size (optional)
                ; q (optional, range [0.0..1.0], default value 0.5)
                ;  is the preference for the given set,
                ;  a higher value means a higher preference

   onetonine = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
                ; Digit between 1 and 9
   xyvalue = onetonine *5DIGIT
                ; Digit between 1 and 9 that is
                ; followed by 0 to 5 other digits
   step = xyvalue
   xyrange = ( "[" xyvalue ":" [ step ":" ] xyvalue "]" )
                ; Range between a lower and an upper value
                ; with an optional step, default step = 1
                ; The rightmost occurrence of xyvalue MUST have a
                ; higher value than the leftmost occurrence.
           / ( "[" xyvalue 1*( "," xyvalue ) "]" )
                ; Discrete values separated by ','
           / ( xyvalue )
                ; A single value
   spvalue = ( "0" "." onetonine *3DIGIT )
                ; Values between 0.1000 and 0.9999
           / ( onetonine "." 1*4DIGIT )
                ; Values between 1.0000 and 9.9999
   srange =  ( "[" spvalue 1*( "," spvalue ) "]" )
                ; Discrete values separated by ','.
                ; Each occurrence of spvalue MUST be
                ; greater than the previous occurrence.
           / ( "[" spvalue "-" spvalue "]" )
                ; Range between a lower and an upper level (inclusive)
                ; The second occurrence of spvalue MUST have a higher
                ; value than the first
           / ( spvalue )
                ; A single value

   prange =  ( "[" spvalue "-" spvalue "]" )
                ; Range between a lower and an upper level (inclusive)
                ; The second occurrence of spvalue MUST have a higher
                ; value than the first

   qvalue  = ( "0" "." 1*2DIGIT )
           / ( "1" "." 1*2("0") )
                ; Values between 0.00 and 1.00

o The attribute typically contains a "send" and a "recv" keyword. These specify the preferences for the media once the session is set up, in the send and receive directions respectively, from the point of view of the sender of the session description. One of the keywords ("send" or "recv") MAY be omitted; see Sections 3.2.2 and 3.2.4 for a description of cases when this may be appropriate.

o The "send" keyword and corresponding attribute list (attr-list) MUST NOT occur more than once per image attribute.

o The "recv" keyword and corresponding attribute list (attr-list) MUST NOT occur more than once per image attribute.

o PT is the payload type number; it MAY be set to "*" (wild card) to indicate that the attribute applies to all payload types in the media description.

o For sendrecv streams, both the send and recv directions SHOULD be present in the SDP.

o For inactive streams, it is RECOMMENDED that both the send and recv directions be present in the SDP.
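How a single set maps to concrete image sizes can be illustrated with a short sketch. The Python below is not part of this specification and its helper names (split_top_level, expand_xyrange, candidate_sizes) are hypothetical. It assumes well-formed input, handles only a single-value sar, and, as an assumption for illustration, treats the picture aspect ratio of a candidate size as x * sar / y, which is then checked against the par range.

```python
# Sketch: expand one 'imageattr' set into the concrete (x, y) sizes it allows.
# Assumptions (not from the specification): input is well formed, sar is a
# single value if present, and picture aspect ratio is computed as x * sar / y.


def split_top_level(body):
    """Split on commas that are not nested inside [...]."""
    items, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            items.append(body[start:i])
            start = i + 1
    items.append(body[start:])
    return items


def expand_xyrange(text):
    """Expand "[lo:hi]", "[lo:step:hi]", "[v1,v2,...]" or "v" into ints."""
    if not text.startswith("["):
        return [int(text)]                      # a single value
    body = text[1:-1]
    if ":" in body:                             # range, default step = 1
        parts = [int(p) for p in body.split(":")]
        lo, hi = parts[0], parts[-1]
        step = parts[1] if len(parts) == 3 else 1
        if hi <= lo:
            raise ValueError("upper value must exceed lower value")
        return list(range(lo, hi + 1, step))
    return [int(v) for v in body.split(",")]    # discrete values


def candidate_sizes(set_text):
    """Return the (x, y) pairs a set allows, honouring par if present."""
    params = dict(kv.split("=", 1) for kv in split_top_level(set_text[1:-1]))
    xs = expand_xyrange(params["x"])
    ys = expand_xyrange(params["y"])
    sar = float(params.get("sar", "1.0"))       # sar defaults to 1.0
    par = params.get("par")
    if par is not None:
        lo, hi = (float(v) for v in par[1:-1].split("-"))
    sizes = []
    for x in xs:
        for y in ys:
            if par is None or lo <= x * sar / y <= hi:
                sizes.append((x, y))
    return sizes


print(candidate_sizes("[x=[320:16:352],y=[240:16:272],par=[1.2-1.3]]"))
# [(320, 256), (336, 272), (352, 272)]
```

Splitting only on top-level commas matters because the same ',' character also separates discrete values inside an xyrange or srange.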
specify a payload type number with the 'imageattr' attribute. See Section 3.2.2 for a discussion of, and a recommendation on, how this is solved.

Preference (q): The preference for each set is 0.5 by default; setting the optional q parameter to another value makes it possible to give the sets different preferences. A higher value gives a higher preference for the given set.

sar: The sar (sample aspect ratio) parameter specifies the sample aspect ratio associated with the given range of x and y values. The sar parameter is defined as dx/dy, where dx and dy are the physical width and height of a pixel. Square pixels give sar=1.0. The parameter sar MAY be expressed as a range or as a single value. If this parameter is not present, a default sar value of 1.0 is assumed. The interpretation of sar differs between the send and the receive directions:

   * In the send direction, sar defines a specific sample aspect ratio associated with a given x and y image size (range).

   * In the recv direction, sar expresses that the receiver of the given medium prefers to receive a given x and y resolution with a given sample aspect ratio.

See Section 3.2.5 for a more detailed discussion. The sar parameter will likely not solve all the issues that are related to different sample aspect ratios, but it can help to solve them and reduce aspect ratio distortion.

The response MUST NOT include a sar parameter if none of the given values is acceptable. The reason is that the inclusion of a sar parameter in the response is interpreted as "sar parameter accepted", while removal of the sar parameter is treated as "sar parameter not accepted". For this reason, it is safer to remove an unacceptable sar parameter altogether.

par: The par (width/height = x/y ratio) parameter indicates a range of allowed ratios between the x and y physical size (picture aspect ratio). This is used to limit the number of x and y image size combinations; par is given as

   par=[ratio_min-ratio_max]
where ratio_min and ratio_max are the minimum and maximum allowed picture aspect ratios. If sar and the sample aspect ratio that the receiver actually uses in the display are the same (or close), the relation between the x and y pixel resolution and the physical size of the image is straightforward. If, however, sar differs from the sample aspect ratio of the receiver display, this must be taken into consideration when the x and y pixel resolution alternatives are sorted out. See Section 4.2.4 for an example of this.

RFC3264], offer/answer exchange of the image attribute is as follows.

o Offerer sending the offer:

   * The offerer must be able to support the image attributes that it offers, unless the offerer has expressed a wild card (*) in the attribute list.

   * It is recommended that a device that sees no reason to use the image attribute include the attribute anyway, with wild cards (*) in the attribute lists for the send and recv directions. An example of this is:

        a=imageattr:97 send * recv *

     This gives the answerer the possibility of expressing its preferences. The use of wild cards introduces a risk that the message size may increase in an uncontrolled way. To reduce this risk, these wild cards SHOULD be replaced only by as small a set as possible.

o Answerer receiving the offer and sending the answer:

   * The answerer may choose to keep the image attribute but is not required to do so.

   * The answerer may, for its receive and send directions, include one or more entries that it can support from the set of entries proposed in the offer.

   * The answerer may also, for its receive and send directions, replace the entries with a completely new set of entries different from those originally proposed by the offerer. Implementers of this feature should, however, be aware that this may cause extra offer/answer exchanges.

   * The answerer may also remove its send direction completely if it deems that it cannot support any of the proposed entries.

   * The answerer should not include an image attribute in the answer if it was not present in the offer.

o Offerer receiving the answer:

   * If the image attribute is not included in the SDP answer, the offerer SHOULD continue to process the answer as if this mechanism had not been offered.

   * If the image attribute is included in the SDP answer but none of the entries are usable or acceptable, the offerer MUST resort to other methods to determine the appropriate image size. In this case, the offerer must also issue a new offer/answer without the image attribute to avoid misunderstandings between the offerer and answerer. This avoids the risk of infinite negotiations.

the offer/answer rules above require the attribute to be absent in the answer. The reasons for this are:

o The offerer of the initial SDP is not likely to understand the image attribute if it did not include it in the offer, bearing in mind that Section 3.1.1 recommends that the offerer provide the attribute with wild-carded parameters if it has no preference.

o Inclusion of the image attribute in the answer may conflict with the offer/answer rules above, especially the rules that apply to the "offerer receiving the answer".

For the above reasons, it is RECOMMENDED that a device that sees no reason to use the image attribute include the attribute anyway, with wild cards (*) in the attribute lists for the send and recv directions.
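The "offerer receiving the answer" rules amount to a three-way decision. The Python sketch below is illustrative only: the function process_answer and the Outcome names are hypothetical, image attribute sets are simplified to (x, y) tuples, and the judgement of whether an entry is usable is delegated to a caller-supplied predicate.

```python
from enum import Enum


class Outcome(Enum):
    NOT_OFFERED = "process the answer as if imageattr had not been offered"
    USE_ENTRIES = "use the usable entries from the answer"
    FALL_BACK = "use other methods and re-offer without imageattr"


def process_answer(answer_sets, is_usable):
    """Classify an SDP answer; answer_sets is None if a=imageattr is absent."""
    if answer_sets is None:
        return Outcome.NOT_OFFERED, []
    usable = [s for s in answer_sets if is_usable(s)]
    if not usable:
        # None of the entries work: choose an image size by other means and
        # send a new offer without the attribute to avoid endless rounds.
        return Outcome.FALL_BACK, []
    return Outcome.USE_ENTRIES, usable


def fits_display(size):
    """Hypothetical check for a display limited to 320x240."""
    return size[0] <= 320 and size[1] <= 240


print(process_answer(None, fits_display))
print(process_answer([(640, 480), (320, 240)], fits_display))
print(process_answer([(640, 480)], fits_display))
```

Returning the usable subset rather than a single entry leaves the final choice (for instance by the q preference) to the caller.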
Of the alternatives listed above, the last one MUST be used, as it is the safest. The other alternatives MUST NOT be used.

H.264] with profile level 1.2 does not support a resolution higher than 352x288 (CIF). The offer/answer rules imply that the same profile level must be used in both directions. This means that in an asymmetric scenario where Alice wants an image size of 580x360 and Bob wants 150x120, profile level 2.2 is needed in both directions, even though profile level 1 would have been sufficient in one direction. Currently, the only solution to this problem is to specify two unidirectional media descriptions. Note, however, that the asymmetry issue for the H.264 codec is solved by means of the level-asymmetry-allowed parameter in [RFC6184].
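The level figures in this example can be checked by counting 16x16 macroblocks per frame and comparing the count with the maximum frame size (MaxFS) of each level in Annex A of [H.264]. The Python sketch below is a simplification: it considers MaxFS only, ignores the other level limits (macroblock rate, bitrate, and per-dimension constraints), omits level 1b, and its function names are hypothetical.

```python
import math

# MaxFS (maximum frame size in macroblocks) per H.264 level, Annex A.
# Level 1b is omitted for brevity.
MAX_FS = [("1", 99), ("1.1", 396), ("1.2", 396), ("1.3", 396), ("2", 396),
          ("2.1", 792), ("2.2", 1620), ("3", 1620), ("3.1", 3600),
          ("3.2", 5120), ("4", 8192), ("4.1", 8192), ("4.2", 8704),
          ("5", 22080), ("5.1", 36864)]


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover a width x height frame."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def lowest_level(width, height):
    """Lowest level whose MaxFS admits the frame (frame size only)."""
    mbs = macroblocks(width, height)
    for level, max_fs in MAX_FS:
        if mbs <= max_fs:
            return level
    raise ValueError("frame is too large for the listed levels")


print(macroblocks(580, 360), lowest_level(580, 360))  # 851 2.2
print(macroblocks(150, 120), lowest_level(150, 120))  # 80 1
```

For 580x360 the frame needs 37 x 23 = 851 macroblocks, more than the 792 allowed at level 2.1, so level 2.2 is the lowest that fits; 150x120 needs only 10 x 8 = 80 macroblocks, within the 99 allowed at level 1.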
To avoid this problem, Alice may specify a list of values for the sar parameter, like:

   a=imageattr:97 send [x=720,y=576,sar=[0.91,1.0,1.09,1.45]]

This means that Alice can encode with any of the listed sample aspect ratios, leaving Bob to decide which one he prefers.

RFC5939] framework, and its use is then specified using the "a=acap" attribute. An example is:

   a=acap:1 imageattr:97 send [x=720,y=576,sar=[0.91,1.0,1.09,1.45]]

For use with the SDP Media Capability Negotiation extension [SDPMedCapNeg], where it is no longer possible to specify payload type numbers, the parameter substitution rule can be used. An example of this is:

   ...
   a=mcap:1 video H264/90000
   a=acap:1 imageattr:%1% send [x=720,y=576,sar=[0.91,1.0,1.09,1.45]]
   ...

where %1% maps to media capability number 1. It is also possible to use the a=mscap attribute, as in the example below:

   ...
   a=mcap:1 video H264/90000
   a=mscap:1 imageattr send [x=720,y=576,sar=[0.91,1.0,1.09,1.45]]
   ...

A few possible solutions are outlined below, but this document does not make a recommendation for any of them.
H.263] is described in [RFC4629]. H.263 defines (on the fmtp line) a list of image sizes and their maximum frame rates (profiles) that the offerer can receive. The answerer is not allowed to modify this list and must reject a payload type that contains an unsupported profile. The CUSTOM profile may be used for image size negotiation, but support for asymmetry requires the specification of two unidirectional media descriptions using the sendonly/recvonly attributes.

H.264] is described in [RFC6184]. H.264 defines information related to image size in the fmtp line by means of sprop-parameter-sets. According to the specification, several sprop-parameter-sets may be defined for one payload type. The sprop-parameter-sets describe the image size (among other things) that the offerer sends in the stream and need not be complete. This means that sprop-parameter-sets do not represent any negotiation, and the answerer is not allowed to change them. This configuration may be changed later in-band if, for instance, image sizes need to be changed or added.

MPEG-4] is described in [RFC3016]. MPEG-4 defines a config parameter on the fmtp line, which is a hexadecimal representation of the MPEG-4 visual configuration information. This configuration does not represent any negotiation, and the answerer is not allowed to change the parameter. It is not possible to change the configuration using in-band signaling.
o Ignore payload format parameters: This may not work well in the presence of bad channel conditions, especially at the beginning of a session. Moreover, this is not a good option for MPEG-4.

o Second session-wide offer/answer round: In the second offer/answer, the parameters specific to the codec payload format are defined based on the outcome of the 'imageattr' negotiation. The drawback with this is that setup of the entire session (including audio) may be delayed considerably, especially as the 'imageattr' negotiation can itself already cost up to two offer/answer rounds. Also, the conflict between the 'imageattr' negotiation and the parameters specific to the payload format is still present after the first offer/answer round, and a fuzzy/buggy implementation may start media before the second offer/answer is completed, with unwanted results.

o Second session-wide offer/answer round only for video: This is similar to the alternative above, except that the setup time for audio is not increased; moreover, the port number for video is set to 0 during the first offer/answer round to avoid the flow of media. As a result, video will blend in some time after the audio has started (with a delay of up to 2 seconds). This alternative is likely the most clean-cut and failsafe. The drawback is that, as the port number in the first offer is always zero, the media startup will always be delayed, even though it would in fact have been possible to start media after the first offer/answer round. Note that according to [RFC3264], a port number of zero means that the whole media line is rejected, meaning that a new offer for the same port number should be treated as a completely new stream and not as an update.

The safest way to solve this problem is to use preconditions; this is, however, outside the scope of this document.

RFC3515] or SIP-UPDATE [RFC3311] methods. It is RECOMMENDED to negotiate the image size during this renegotiation.
In the first alternative, the recv direction may be a full list of desired image size formats. It may, however (and most likely will), be just a list with one alternative for the preferred x and y resolution.

If Bob supports an x and y resolution in at least one of the X and Y ranges given in the send attr-list and in the recv attr-list of the offer, the answer from Bob will look like:

   a=imageattr:PT send attr-list recv attr-list

and the offer/answer negotiation is done. Note that the attr-list will likely be pruned in the answer: while it may contain many different alternatives in the offer, it may in the end contain just one or two alternatives.

If Bob does not support any x and y resolution in one of the send or recv ranges given in the send attr-list or in the recv attr-list, the corresponding part is removed completely. For instance, if Bob does not support any of the offered alternatives in the recv attr-list in the offer, the answer from Bob would look like:

   a=imageattr:PT recv attr-list
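The pruning just described can be sketched as follows. In this hypothetical Python helper (build_answer is not defined anywhere in this document), each attr-list is simplified to a list of single (x, y) sizes, with ranges, sar, par, and q ignored, and the answerer's capabilities are passed in as predicates. Because directions are named from the point of view of the sender of the session description, the offerer's recv list is what the answerer would send, and vice versa.

```python
def build_answer(pt, offer, can_send, can_recv):
    """Return the answerer's a=imageattr line, or None if nothing is kept.

    offer maps "send"/"recv" (offerer's point of view) to lists of (x, y)
    sizes; can_send/can_recv are hypothetical answerer capability checks.
    """
    # Directions swap: the answerer sends what the offerer wants to receive.
    kept = {
        "send": [s for s in offer.get("recv", []) if can_send(s)],
        "recv": [s for s in offer.get("send", []) if can_recv(s)],
    }
    parts = []
    for direction in ("send", "recv"):
        if kept[direction]:  # an unsupported direction is removed completely
            sets = " ".join(f"[x={x},y={y}]" for x, y in kept[direction])
            parts.append(f"{direction} {sets}")
    if not parts:
        return None  # the syntax requires at least one direction
    return f"a=imageattr:{pt} " + " ".join(parts)


offer = {"send": [(640, 480), (320, 240)], "recv": [(336, 256)]}
print(build_answer(97, offer,
                   can_send=lambda s: s[0] <= 320,
                   can_recv=lambda s: s[0] <= 352))
# a=imageattr:97 recv [x=320,y=240]
```

Here the answerer cannot send 336x256, so its send part disappears and only the recv part survives, matching the "a=imageattr:PT recv attr-list" form above.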
There is, however, a possibility that "recv [x=330,y=250]" is not supported. If so, Bob may completely remove this part or replace it with a list of supported image sizes:

   a=imageattr:97 recv [x=800,y=640,sar=1.1] \
      send [x=[320:16:640],y=[240:16:480],par=[1.2-1.3]]

Alice can then select a valid image size that is closest to the one that was originally desired (336x256) and perform a second offer/answer:

   a=imageattr:97 send [x=800,y=640,sar=1.1] \
      recv [x=336,y=256]

Bob replies with:

   a=imageattr:97 recv [x=800,y=640,sar=1.1] \
      send [x=336,y=256]
RFC4566], the IANA is requested to register one new SDP attribute:

   Attribute name:      imageattr
   Long form name:      Image attribute
   Type of attribute:   Media-level
   Subject to charset:  No
   Purpose:             This attribute defines the ability to negotiate
                        various image attributes such as image sizes.
                        The attribute contains a number of parameters
                        which can be modified in an offer/answer
                        exchange.
   Appropriate values:  See Section 3.1.1 of RFC 6236
   Contact name:        Authors of RFC 6236
8.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.

[RFC5583] Schierl, T. and S. Wenger, "Signaling Media Decoding Dependency in the Session Description Protocol (SDP)", RFC 5583, July 2009.

[RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

8.2. Informative References

[H.263] ITU-T, ITU-T Recommendation H.263 (2005): "Video coding for low bit rate communication".

[H.264] ITU-T, ITU-T Recommendation H.264: "Advanced video coding for generic audiovisual services", <http://www.itu.int/rec/T-REC-H.264-200711-S/en>.

[MPEG-4] ISO/IEC, ISO/IEC 14496-2:2004: "Information technology - Coding of audio-visual objects - Part 2: Visual".

[RFC3016] Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and H. Kimata, "RTP Payload Format for MPEG-4 Audio/Visual Streams", RFC 3016, November 2000.

[RFC3311] Rosenberg, J., "The Session Initiation Protocol (SIP) UPDATE Method", RFC 3311, October 2002.

[RFC3515] Sparks, R., "The Session Initiation Protocol (SIP) Refer Method", RFC 3515, April 2003.

[RFC4629] Ott, J., Bormann, C., Sullivan, G., Wenger, S., and R. Even, "RTP Payload Format for ITU-T Rec. H.263 Video", RFC 4629, January 2007.

[RFC5939] Andreasen, F., "Session Description Protocol (SDP) Capability Negotiation", RFC 5939, September 2010.

[RFC6184] Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP Payload Format for H.264 Video", RFC 6184, May 2011.

[SDPMedCapNeg] Gilman, R., Even, R., and F. Andreasen, "SDP Media Capabilities Negotiation", Work in Progress, February 2011.