Network Working Group D. Willis, Ed. Request for Comments: 5373 Softarmor Systems Category: Standards Track A. Allen Research in Motion (RIM) November 2008 Requesting Answering Modes for the Session Initiation Protocol (SIP) Status of This Memo This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD1) for the standardization state and status of this protocol. Distribution of this memo is unlimited. Copyright Notice Copyright (c) 2008 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
AbstractThis document extends SIP with two header fields and associated option tags that can be used in INVITE requests to convey the requester's preference for user-interface handling related to answering of that request. The first header, "Answer-Mode", expresses a preference as to whether the target node's user interface waits for user input before accepting the request or, instead, accepts the request without waiting on user input. The second header, "Priv-Answer-Mode", is similar to the first, except that it requests administrative-level access and has consequent additional authentication and authorization requirements. These behaviors have applicability to applications such as push-to-talk and to diagnostics like loop-back. Usage of each header field in a response to indicate how the request was handled is also defined.
1. Background . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 2. Syntax of Header Fields and Option Tags . . . . . . . . . . . 5 3. Usage of the Answer-Mode and Priv-Answer-Mode Header Fields . 6 4. Usage of the Answer-Mode and Priv-Answer-Mode Header Fields in Requests . . . . . . . . . . . . . . . . . . . . . . 6 4.1. The Difference Between Answer-Mode and Priv-Answer-Mode . 7 4.2. The "require" Modifier . . . . . . . . . . . . . . . . . . 9 4.3. Procedures at User Agent Clients (UAC) . . . . . . . . . . 9 4.3.1. All Requests . . . . . . . . . . . . . . . . . . . . . 9 4.3.2. REGISTER Transactions . . . . . . . . . . . . . . . . 9 4.3.3. INVITE Transactions . . . . . . . . . . . . . . . . . 10 4.4. Procedures at Intermediate Proxies . . . . . . . . . . . . 12 4.4.1. General Proxy Behavior . . . . . . . . . . . . . . . . 12 4.4.2. Issues with Automatic Answering and Forking . . . . . 12 4.5. Procedures at User Agent Servers (UAS) . . . . . . . . . . 13 4.5.1. INVITE Transactions . . . . . . . . . . . . . . . . . 13 5. Usage of the Answer-Mode and Priv-Answer-Mode Header Fields in Responses . . . . . . . . . . . . . . . . . . . . . 14 5.1. Procedures at the UAS . . . . . . . . . . . . . . . . . . 14 5.2. Procedures at the UAC . . . . . . . . . . . . . . . . . . 15 6. Examples of Usage . . . . . . . . . . . . . . . . . . . . . . 15 6.1. REGISTER Request . . . . . . . . . . . . . . . . . . . . . 15 6.2. INVITE Request . . . . . . . . . . . . . . . . . . . . . . 16 6.3. 200 (OK) Response . . . . . . . . . . . . . . . . . . . . 16 7. Security Considerations . . . . . . . . . . . . . . . . . . . 16 7.1. Attack Sensitivity Depends on Media Characteristics . . . 17 7.2. Application Design Affects Attack Opportunity . . . . . . 19 7.3. Applying the Analysis . . . . . . . . . . . . . . . . . . 19 7.4. Minimal Policy Requirement . . . . . . . . . . . . . . . . 21 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 8.1. Registration of Header Fields . . . . . . . . . . . . . . 22 8.2. Registration of Header Field Parameters . . . . . . . . . 22 8.3. Registration of SIP Option Tags . . . . . . . . . . . . . 22 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23 10.1. Normative References . . . . . . . . . . . . . . . . . . . 23 10.2. Informative References . . . . . . . . . . . . . . . . . . 24
RFC3261]) involves 1) sending a request for a session (a SIP INVITE) and notifying the user receiving the request, 2) acceptance of the request and of the session by that user, and 3) the sending of a response (SIP 200 OK) back to the requester before the session is established. Some usage scenarios deviate from this model, specifically with respect to the notification and acceptance phase. While it has always been possible for the node receiving the request to skip the notification and acceptance phases, there has been no standard mechanism for the party sending the request to specifically indicate a desire (or requirement) for this sort of treatment. This document defines a SIP extension header field that can be used to request specific treatment related to the notification and acceptance phase. The first usage scenario is the requirement for diagnostic loopback calls. In this sort of scenario, a testing service sends an INVITE to a node being tested. The tested node accepts and a dialog is established. But rather than establishing a two-way media flow, the tested node loops back or "echoes" media received from the testing service back toward the testing service. The testing service can then analyze the media flow for quality and timing characteristics. Session Description Protocol (SDP) usage for this sort of flow is described in [LOOPBACK]. In this sort of application, it might not be necessary that the human using the tested node interact with the node in any way for the test to be satisfactorily executed. In some cases, it might be appropriate to alert the user to the ongoing test, and in other cases it might not be. The second scenario is that of push-to-talk applications, which have been specified by the Open Mobile Alliance. In this sort of environment, SIP is used to establish a dialog supporting asynchronous delivery of unidirectional media flow, providing a user experience like that of a traditional two-way radio. It is conventional for the INVITES used to be automatically accepted by the called UA (User Agent), and the media is commonly played out on a loudspeaker. The called party's UA's microphone is not engaged until the user presses the local "talk" button to respond. A third scenario is the Private Branch Exchange (PBX) attendant. Traditional office PBX systems often include intercom functionality. A typical use for the intercom function is to allow a receptionist to activate a loudspeaker on a desk telephone in order to announce a visitor. Not every caller can access the loudspeaker, only the
receptionist or operator, and it is not expected that these callers will always want "intercom" functionality -- they might instead want to make an ordinary call. There are presumably many more use cases for the extensions defined in this specification, but this document was developed to specifically meet the requirements of these scenarios, or others with essentially similar properties. These sorts of mechanisms are not required to provide the functionality of an "answering machine" or "voice mail recorder". Such a device knows that it is expected to answer and does not require a SIP extension to support its behavior. Much of the discussion of this topic in working group meetings and on the mailing list dealt with differentiating "answering mode" from "alerting mode". Some early work did not make this distinction. We therefore proceed with the following definitions: o Answering Mode includes behaviors in a SIP UA relating to acceptance or rejection of a request that are contingent on interaction between the UA and the user of that UA after the UA has received the request. We are principally concerned with the user interaction involved in accepting the request and initiating an active session. An example of this might be pressing the "yes" button on a mobile phone. o Alerting Mode includes behaviors in a SIP UA relating to informing the user of the UA that a request to initiate a session has been received. An example of this might be activating the ring tone of a mobile phone. This document deals only with "Answering Mode". Issues relating to "Alerting Mode" are outside its scope. This document defines two SIP extension header fields: "Answer-Mode" and "Priv-Answer-Mode". These two extensions take the same parameters and operate in the same general way. The distinction between Answer-Mode and Priv-Answer-Mode relates to the level of authorization claimed by the User Agent Client (UAC) and verified and policed by the User Agent Server (UAS). Requests are usually made using Answer-Mode. Requests made using Priv-Answer-Mode request "privileged" treatment from the UAS. This mechanism is discussed in greater detail below, in Section 4.1.
Priv-Answer-Mode is not an assertion of privilege. Instead, it is a request for privileged treatment. This is similar to the UNIX model, where a user might run a command normally or use "sudo" to request administrative privilege for the command. Including "Priv-" is equivalent to prefixing a UNIX command with "sudo". In other words, a separate policy table (like "/etc/sudoers") is consulted to determine whether the user may receive the requested treatment. This distinction is discussed in greater detail in Section 4.1. RFC2119]. RFC5234]. Further, it relies on the syntax for SIP defined in [RFC3261]. The syntax for the header fields defined in this document is: Answer-Mode = "Answer-Mode" HCOLON answer-mode-value *(SEMI answer-mode-param) Priv-Answer-Mode = "Priv-Answer-Mode" HCOLON answer-mode-value *(SEMI answer-mode-param) answer-mode-value = "Manual" / "Auto" / token answer-mode-param= "require" / generic-param The SIP option tag indicating support for this extension is "answermode". For implementors: SIP header field names and values are always compared in a case-insensitive manner. The pretty capitalization is just for readability. This syntax includes extension hooks ("token" for answer-mode values and "generic-param" for optional parameters) that could be defined in future. This specification defines only the behavior for the values given explicitly above. In order to provide forward compatibility, implementations MUST ignore unknown values.
Section 7.1 for further discussion of this issue.
parameter could be used to make sure that a test "loopback" call doesn't disturb a user who has configured her phone to manually answer even if the caller requests an automatic answer. The UAS is responsible for deciding how to honor this preference. In general, the UAS makes an authorization decision based on the authenticated identity presented in the request using authentication mechanisms such as SIP Digest Authentication [RFC3261], the SIP Identity mechanism [RFC4474], or (within the restricted networks for which it is suitable) the SIP mechanism for asserted identity within trusted networks [RFC3325]. When making an authorization decision, the UAS should also use authorization information or policy available to the UAS. This decision-making MUST consider the risk model of the media session corresponding to the request, and the UAS MUST NOT answer without user input in cases where the privacy or security of the user would be compromised as a result. Making this determination is a matter of system or application design, and cannot in general be addressed by having a set of functions that are configurable on or off. Specific discussion of media sessions and appropriate policy is discussed in Section 7.
A UAS SHOULD apply a stricter authorization policy to a request with Priv-Answer-Mode than it does to requests with Answer-Mode. The default policy SHOULD be to refuse requests containing Priv-Answer- Mode header fields unless the requester is authenticated and specifically authorized to make Priv-Answer-Mode requests. Failure to enforce such a policy leaves the user potentially vulnerable to abuses, as discussed in Section 7. The use case envisioned for Priv-Answer-Mode relates to handling urgent requests from authorized callers. For example, assume Larry is a limousine driver working with a fleet dispatcher. Larry likes to provide a quiet environment for his car, so his communicator is configured for manual answer mode for all non-privileged calls, including push-to-talk (Answer-Mode: Auto) calls. Each time he gets a call, Larry's communicator chimes softly to alert him to the call. If the circumstances permit it, Larry presses the communicator in order to accept the call, the communicator sends a 200 (OK) response, and the calling party's talk-burst is played out through the communicator's loudspeaker. This treatment is delivered to incoming requests that have an Answer-Mode header field having values of "Manual" or "Auto" (or no Answer-Mode header field at all), no matter who the caller is. Larry's fleet dispatch operator is familiar with this policy, and needs to inform Larry about a critical matter. The dispatch operator tries several times to push-to-talk call Larry (including Answer- Mode: Auto in the requests), but the calls aren't accepted because Larry has fallen asleep, and therefore isn't pressing his communicator to accept the call. The operator then presses his "urgent" button and calls Larry again. This time, the INVITE request carries a "Priv-Answer-Mode: Auto" header field. Larry's communicator checks the identity of the caller (using a SIP Identity assertion or functionally equivalent mechanism), and matches the operator's identity against the list of users allowed to do Priv-Answer-Mode. Since the operator is listed, the communicator immediately returns a 200 (OK) response accepting the call. The operator speaks, and the resulting talk-burst is summarily played out the loudspeaker on Larry's communicator, waking him up. The effect of requesting Priv-Answer-Mode is different than the effect of simply granting higher privilege to an Answer-Mode request based on the requester's identity and corresponding authorization level. This distinction is what allows the fleet operator to make polite (Answer-Mode: Auto) requests to Larry under normal conditions, and receive different handling (Priv-Answer-Mode: Auto) for a request having greater urgency.
In normal operations, only one of either Answer-Mode or Priv-Answer- Mode would be used in an INVITE request. If both are present, the UAS will first test the authorization of the requester for Priv- Answer-Mode and, if authorized, process the request as if only Priv- Answer-Mode had been included. If the requester is not authorized for Priv-Answer-Mode, then the UAS will process the request as if only Answer-Mode had been included. Section 4.5. RFC3840].
If a UA is dependent on support for callee capabilities in the registrar, it MAY include a Require header field with the value "pref" in its REGISTER request. This will cause the registrar to reject the request if the registrar does not support callee capabilities and caller preferences. Example: Require: pref
To require that the UAS either support this extension or reject the request, the UAC includes a Require header field having the value "answermode". This does not actually force the UAS to automatically answer, it just requires that the UAS either understand this extension or reject the request. We do not have a SIP negotiation technique to force specific behavior. Rather, the desired behavior is indicated in the SIP extension itself. Example: Require: answermode To request that retargeting proxies in the path preferentially select targets that have indicated support for this extension in their registration, a UAC includes an Accept-Contact header field with an extensions parameter having a value of "answermode". This usage of Accept-Contact is described in [RFC3841]. This would normally be used in conjunction with the "Require: answermode" header field as described above. Example: Require: answermode Accept-Contact: *;extensions="answermode";methods="INVITE" To request that retargeting proxies in the path do not select targets that have indicated non-support for this extension in their registration, a UAC includes an Accept-Contact header field with an extensions parameter having a value of "answermode" and an option field of "require". This usage of Accept-Contact is described in [RFC3841]. This would normally be used in conjunction with the "Require: answermode" header field as described above. Example: Require: answermode Accept-Contact: *;extensions="answermode"; methods="INVITE";require To request that retargeting proxies in the path exclusively select targets that have indicated support for this extension in their registration, a UAC includes an Accept-Contact header field extensions parameter having a value of "answermode" and options of "require" and "explicit". This usage of Accept-Contact is described in [RFC3841]. This would normally be used in conjunction with the "Require: answermode" header field as described above. Example: Require: answermode Accept-Contact: *;extensions="answermode"; methods="INVITE";require;explicit
Another alternative would be to use dynamic conferencing instead of forking. In this approach, instead of forking the request, a conference would be initiated and all registered UAs invited into that conference. The mixer attached to the conference would then mediate traffic flows appropriately. RFC3261] section 23.2, which in some cases requires user validation of certificates used for S/MIME. Since this places the same interrupt burden on the user as would manually answering the request, a UAS experiencing this requirement for user validation of a request that requires automatic answering SHOULD reject the request with a "403 (Forbidden)" response and MAY include a reason phrase of "certificate validation requires user input not compatible with automatic answer."
For a request having an Answer-Mode value of "Auto" and an Answer- Mode parameter of "require", the UAS SHOULD, if the calling party is authenticated and authorized for automatic answering, accept the request. The UAS MUST NOT allow "manual" answer of this request, but MAY reject it. If, for whatever reason, the UAS chooses not to accept the request automatically, the UAS MUST reject the request, SHOULD do so with a "403 (Forbidden)" response, and MAY include a reason phrase of "automatic answer forbidden". Similar behavior applies for Priv-Answer-Mode, except that the policy for authorization may be different (and generally more stringent).
conversation would also reveal this same information, there might be cases where this information might need to be protected. Consequently, UASs supporting this specification SHOULD include appropriately configurable policy mechanisms for making this determination, and the default configuration SHOULD be to exclude this header field from responses.
charge fraud, unsolicited push-to-talk communications (spam over push-to-talk (SPTT)), playout of obnoxious noises (the "whoopee cushion" attack), battery-rundown denial of service, "forced busy" denial of service, running up the victim's data transport bill, and phishing via session insertion (where an ongoing session is replaced by another without the victim's awareness). Since SIP implementations do not commonly implement end-to-end message protections, this specification is completely dependent on transitive security across SIP proxies. Any misbehaving proxy can insert, delete, and/or alter the contents of the Answer-Mode and Priv-Answer-Mode header fields, and in general can do so without being noticed by either the UAC or UAS. Consequently, it is critical that any proxies in the path be not only trusted, but worthy of that trust. While proxies do not generally intentionally insert, delete, or alter the Answer-Mode and Priv-Answer-Mode header fields, this specification does note a use case for such manipulation by proxies acting on behalf of the user of a UAC or UAS that has limited support for the authentication or policy enforcement needed to securely exercise these extensions. Proxies that perform such extension- sensitive manipulation MUST therefore provide complete policy enforcement, as per the minimal policy discussed in Section 7.4. The existing body of SIP work provides strong capabilities for authentication of requests, prevention of man-in-the-middle attacks, protection of the privacy and integrity of media flows, and so on (although as noted above, these capabilities usually rely on transitive trust across proxies). The behaviors added by the extensions in this document raise additional possibilities for attacks against media flows not completely addressed by existing SIP work, and therefore require analysis in this document. Media attacks can be loosely categorized as: Insertion: Media is inserted into and played out by the victim UA without consent of the UA's user. Interception: The victim UA's media acquisition facility (such as a microphone or camera) is activated, producing a media stream, without the consent of the UA's user.
unbounded, we might find it more effective to model loose categories of media modality rather than explicitly describing every possible scenario. Security analysis can then be applied per modality. The media modalities of interest appear to be: UAC-sourced (Inbound) Unidirectional Media Insertion: Sensitive media flows from the UAC and is rendered by the UAS, annoying the user of the UAS or disrupting the function of the UAS. We refer to this as the "whoopee-cushion" attack because of its utility in replicating the rude-noise-making seat cushion. The danger of this attack is quite literally amplified by a loudspeaker apparatus attached to the victim UAS. Media that has minimal secondary implication (such as sending a move in a chess game to a computer that isn't running a chess game) is related, but of far less significance. This sort of attack can also have other consequences, such as discharging the victim's battery or increasing charges for data transport to be paid by the victim. UAS-sourced (Outbound) Unidirectional Media Interception: Sensitive media flows from the UAS and is rendered by the UAC, violating the privacy of the user of the UAS. We refer to this as the "bug-my- phone" attack because that would appear to be the primary attack motivator. Bidirectional Media Insertion or Interception: Bidirectional media is the common case when SIP is used in a voice-over-IP scenario or "traditional phone call". Once a media flow is established, both ends send and receive media without further engagement. The media information is presumed to be sensitive -- that is, if intercepted it damages the victim's privacy, and if inserted, it annoys or interferes with the recipient. Attacks of this sort might produce either the "whoopee-cushion" or "bug-my-phone" scenarios, potentially even simultaneously. It seems reasonable to consider the "bug-my-phone" attack as being in a different class (potentially far more severe) than the "whoopee- cushion" attack. This distinction suggests that security policy could be established in different and presumably less restrictive fashion for inbound media flows than for outbound media flows. The set of callers from which a user would be willing to automatically accept inbound media is reasonably much broader than the set of callers to which a user would be willing to automatically grant outbound media access, although this may not be true in all environments, especially those where reception of unwanted media has unwanted financial consequences.
For example, assume a UA is designed such that it can be used to receive push-to-talk calls to a loudspeaker, and it can be used as a "baby monitor" (has an open mic and streams received audio to listeners). The policy for activating the push-to-talk loudspeaker would probably need to be reasonably broad (perhaps "all the user's buddies"). However, the policy for the baby monitor would need to be very narrow (perhaps "only the baby's mother") or even completely closed. The minimal policy defined in Section 7.4 explicitly forbids the "baby monitor" functionality.
restrictive policies might reasonably be used, but any policy less restrictive than the approach described below is very likely to result in significant security issues. From the previous discussion of risks, attacks, and vulnerabilities, we can derive five defensive mechanisms available at the application level: 1. Identity -- Know who the request came from. 2. Alerting -- Let the called user know what's happening. Some applications might use inbound media as an alert. 3. Acceptance -- Require called user to make run-time decision. Asking the user to make a run-time decision without alerting the user to the need to make a decision is generally infeasible. This will have implications for possible alerting options that are outside the scope of this document. 4. Limit the Input/Output (I/O) -- Turn off loudspeakers or microphone. This could be used to convert a bidirectional media session (very risky, possible "bug my phone") into a unidirectional, inbound-only (less risky, possible "spam" or "rundown", etc.) session while waiting for user acceptance. 5. Policy -- Rules about other factors, such as black- and whitelisting based on identity, disallowing acceptance without alerting, etc. Since SIP and related work already provide several mechanisms (including SIP Digest Authentication [RFC3261], the SIP Identity mechanism [RFC4474], and the SIP mechanism for asserted identity within private networks [RFC3325], in networks for which it is suitable) for establishing the identity of the originator of a request, we presume that an appropriately selected mechanism is available for UAs implementing the extensions described in this document. In short, UAs implementing these extensions MUST be equipped with and MUST exercise a request-identity mechanism. The analysis below proceeds from an assumption that the identity of the sender of each request is either known or is known to be unknown, and can therefore be considered in related policy considerations. Failure to meet this identity requirement either opens the door to a wide range of attacks or requires operational policy so tight as to make these extensions useless.
We previously established a class distinction between inbound and outbound media flows, and can model bidirectional flows as "worst case" sums of the risks of the other two classes. Given this distinction, it seems reasonable to provide separate directionality policy classes for: 1. Inbound media flows. 2. Outbound media flows. For each directionality policy class, we can divide the set of request identities into three classes: 1. Identities explicitly authorized for the class. 2. Identities explicitly denied for the class. 3. Identities for which we have no explicit policy and need the user to make a decision. Note that not all combinations of policies possible in this decomposition are generally useful. Specifically, a policy of "inbound media denied, outbound media allowed" equates to a "bug my phone" attack, and is disallowed by the minimal policy of Section 7.4, which as written excludes all cases of "Outbound media explicitly authorized".
enforcing this policy requires dialog-stateful inspection in the SIP UAS. In other words, if a session was initiated with automatic answering, the UAS MUST NOT transition to a mode that sends outbound media without explicit acceptance by the user of the UAS.
+------------+------------------------------------------+-----------+ | Name | Description | Reference | +------------+------------------------------------------+-----------+ | answermode | This option tag is for support of the | [RFC5373] | | | Answer-Mode and Priv-Answer-Mode | | | | extensions used to negotiate automatic | | | | or manual answering of a request. | | +------------+------------------------------------------+-----------+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002. [RFC3840] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating User Agent Capabilities in the Session Initiation Protocol (SIP)", RFC 3840, August 2004. [RFC3841] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller Preferences for the Session Initiation Protocol (SIP)", RFC 3841, August 2004. [RFC4474] Peterson, J. and C. Jennings, "Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP)", RFC 4474, August 2006.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008. [LOOPBACK] Hedayat, K., "An Extension to the Session Description Protocol (SDP) for Media Loopback", Work in Progress, August 2008. [RFC3325] Jennings, C., Peterson, J., and M. Watson, "Private Extensions to the Session Initiation Protocol (SIP) for Asserted Identity within Trusted Networks", RFC 3325, November 2002.