Internet Architecture Board (IAB)                              A. Cooper
Request for Comments: 6973                                           CDT
Category: Informational                                    H. Tschofenig
ISSN: 2070-1721                                   Nokia Siemens Networks
                                                                B. Aboba
                                                                   Skype
                                                             J. Peterson
                                                           NeuStar, Inc.
                                                               J. Morris
                                                               M. Hansen
                                                                     ULD
                                                                R. Smith
                                                                   Janet
                                                               July 2013

             Privacy Considerations for Internet Protocols
Abstract

This document offers guidance for developing privacy considerations for inclusion in protocol specifications. It aims to make designers, implementers, and users of Internet protocols aware of privacy-related design choices. It suggests that whether any individual RFC warrants a specific privacy considerations section will depend on the document's content.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Architecture Board (IAB) and represents information that the IAB has deemed valuable to provide for permanent record. It represents the consensus of the Internet Architecture Board (IAB). Documents approved for publication by the IAB are not a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6973.
Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Table of Contents

1. Introduction
2. Scope of Privacy Implications of Internet Protocols
3. Terminology
   3.1. Entities
   3.2. Data and Analysis
   3.3. Identifiability
4. Communications Model
5. Privacy Threats
   5.1. Combined Security-Privacy Threats
      5.1.1. Surveillance
      5.1.2. Stored Data Compromise
      5.1.3. Intrusion
      5.1.4. Misattribution
   5.2. Privacy-Specific Threats
      5.2.1. Correlation
      5.2.2. Identification
      5.2.3. Secondary Use
      5.2.4. Disclosure
      5.2.5. Exclusion
6. Threat Mitigations
   6.1. Data Minimization
      6.1.1. Anonymity
      6.1.2. Pseudonymity
      6.1.3. Identity Confidentiality
      6.1.4. Data Minimization within Identity Management
   6.2. User Participation
   6.3. Security
7. Guidelines
   7.1. Data Minimization
   7.2. User Participation
   7.3. Security
   7.4. General
8. Example
9. Security Considerations
10. Acknowledgements
11. IAB Members at the Time of Approval
12. Informative References
[RFC3552] provides detailed guidance to protocol designers about both how to consider security as part of protocol design and how to inform readers of protocol specifications about security issues. This document intends to provide a similar set of guidelines for considering privacy in protocol design.

Privacy is a complicated concept with a rich history that spans many disciplines. With regard to data, often it is a concept applied to "personal data", commonly defined as information relating to an identified or identifiable individual. Many sets of privacy principles and privacy design frameworks have been developed in different forums over the years. These include the Fair Information Practices [FIPs], a baseline set of privacy protections pertaining to the collection and use of personal data (often based on the principles established in [OECD], for example), and the Privacy by Design concept, which provides high-level privacy guidance for systems design (see [PbD] for one example). The guidance provided in this document is inspired by this prior work, but it aims to be more concrete, pointing protocol designers to specific engineering choices that can impact the privacy of the individuals that make use of Internet protocols.

Different people have radically different conceptions of what privacy means, both in general and as it relates to them personally [Westin]. Furthermore, privacy as a legal concept is understood differently in different jurisdictions. The guidance provided in this document is generic and can be used to inform the design of any protocol to be used anywhere in the world, without reference to specific legal frameworks.

Whether any individual document warrants a specific privacy considerations section will depend on the document's content. Documents whose entire focus is privacy may not merit a separate section (for example, "Private Extensions to the Session Initiation Protocol (SIP) for Asserted Identity within Trusted Networks" [RFC3325]).
For certain specifications, privacy considerations are a subset of security considerations and can be discussed explicitly in the security considerations section. Some documents will not require discussion of privacy considerations (for example, "Definition of the Opus Audio Codec" [RFC6716]). The guidance provided here can and should be used to assess the privacy considerations of protocol, architectural, and operational specifications and to decide whether those considerations are to be documented in a stand-alone section, within the security considerations section, or throughout the
document. The guidance provided here is meant to help the thought process of privacy analysis; it does not provide specific directions for how to write a privacy considerations section.

This document is organized as follows. Section 2 describes the extent to which the guidance offered here is applicable within the IETF and within the larger Internet community. Section 3 explains the terminology used in this document. Section 4 reviews typical communications architectures to understand at which points there may be privacy threats. Section 5 discusses threats to privacy as they apply to Internet protocols. Section 6 outlines mitigations of those threats. Section 7 provides the guidelines for analyzing and documenting privacy considerations within IETF specifications. Section 8 examines the privacy characteristics of an IETF protocol to demonstrate the use of the guidance framework.

Section 7 asks protocol designers to consider how their protocols are expected to interact with systems and information that exist outside the protocol bounds, but not to imagine every possible deployment scenario.

Furthermore, in many cases the privacy properties of a system are dependent upon the complete system design where various protocols are combined together to form a product solution; the implementation, which includes the user interface design; and operational deployment practices, including default privacy settings and security processes of the company doing the deployment. These details are specific to
particular instantiations and generally outside the scope of the work conducted in the IETF. The guidance provided here may be useful in making choices about these details, but its primary aim is to assist with the design, implementation, and operation of protocols.

Transparency of data collection and use -- often effectuated through user interface design -- is normally relied on (whether rightly or wrongly) as a key factor in determining the privacy impact of a system. Although most IETF activities do not involve standardizing user interfaces or user-facing communications, in some cases, understanding expected user interactions can be important for protocol design. Unexpected user behavior may have an adverse impact on security and/or privacy.

In sum, privacy issues, even those related to protocol development, go beyond the technical guidance discussed herein. As an example, consider HTTP [RFC2616], which was designed to allow the exchange of arbitrary data. A complete analysis of the privacy considerations for uses of HTTP might include what type of data is exchanged, how this data is stored, and how it is processed. Hence the analysis for an individual's static personal web page would differ from the analysis of HTTP used for exchanging health records. A protocol designer working on HTTP extensions (such as Web Distributed Authoring and Versioning (WebDAV) [RFC4918]) is not expected to describe the privacy risks derived from all possible usage scenarios, but rather the privacy properties specific to the extensions and any particular uses of the extensions that are expected and foreseen at design time.

As in [RFC4949], each entry is preceded by a dollar sign ($) and a space for automated searching. Note that this document does not attempt to define the term 'privacy' with a brief definition. Instead, privacy is the sum of what is contained in this document. We therefore follow the approach taken by [RFC3552].
Examples of several different brief definitions are provided in [RFC4949].
Section 4.

$ Attacker: An entity that works against one or more privacy protection goals. Unlike observers, attackers' behavior is unauthorized.

$ Eavesdropper: A type of attacker that passively observes an initiator's communications without the initiator's knowledge or authorization. See [RFC4949].

$ Enabler: A protocol entity that facilitates communication between an initiator and a recipient without being directly in the communications path.

$ Individual: A human being.

$ Initiator: A protocol entity that initiates communications with a recipient.

$ Intermediary: A protocol entity that sits between the initiator and the recipient and is necessary for the initiator and recipient to communicate. Unlike an eavesdropper, an intermediary is an entity that is part of the communication architecture and therefore at least tacitly authorized. For example, a SIP [RFC3261] proxy is an intermediary in the SIP architecture.

$ Observer: An entity that is able to observe and collect information from communications, potentially posing privacy threats, depending on the context. As defined in this document, initiators, recipients, intermediaries, and enablers can all be observers. Observers are distinguished from eavesdroppers by being at least tacitly authorized.

$ Recipient: A protocol entity that receives communications from an initiator.
[RFC4949].

$ Correlation: The combination of various pieces of information that relate to an individual or that obtain that characteristic when combined.

$ Fingerprint: A set of information elements that identifies a device or application instance.

$ Fingerprinting: The process of an observer or attacker uniquely identifying (with a sufficiently high probability) a device or application instance based on multiple information elements communicated to the observer or attacker. See [EFF].

$ Item of Interest (IOI): Any data item that an observer or attacker might be interested in. This includes attributes, identifiers, identities, communications content, and the fact that a communication interaction has taken place.

$ Personal Data: Any information relating to an individual who can be identified, directly or indirectly.

$ (Protocol) Interaction: A unit of communication within a particular protocol. A single interaction may be comprised of a single message between an initiator and recipient or multiple messages, depending on the protocol.

$ Traffic Analysis: The inference of information from observation of traffic flows (presence, absence, amount, direction, timing, packet size, packet composition, and/or frequency), even if flows are encrypted. See [RFC4949].

$ Undetectability: The inability of an observer or attacker to sufficiently distinguish whether an item of interest exists or not.

$ Unlinkability: Within a particular set of information, the inability of an observer or attacker to distinguish whether two items of interest are related or not (with a high enough degree of probability to be useful to the observer or attacker).
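The definitions of fingerprint and fingerprinting above can be illustrated with a small sketch. It is not taken from any specification: the information elements (`user_agent`, `timezone`, `font_count`), their values, and the function names are all hypothetical. It only shows how elements that are each shared by several devices can still single out one device instance when combined.

```python
from collections import Counter


def fingerprint(elements: dict) -> tuple:
    """Combine several information elements into one comparable value."""
    # Sort by element name so the same elements always yield the same tuple.
    return tuple(sorted(elements.items()))


def fingerprint_counts(observations: list) -> Counter:
    """Count how many observed device instances share each fingerprint."""
    return Counter(fingerprint(o) for o in observations)


# Hypothetical observations: every individual value is shared by at least
# two devices, but most combinations of values occur only once.
observations = [
    {"user_agent": "BrowserA/1.0", "timezone": "UTC+1", "font_count": 212},
    {"user_agent": "BrowserA/1.0", "timezone": "UTC+1", "font_count": 198},
    {"user_agent": "BrowserA/1.0", "timezone": "UTC-5", "font_count": 212},
    {"user_agent": "BrowserB/2.3", "timezone": "UTC-5", "font_count": 198},
    {"user_agent": "BrowserB/2.3", "timezone": "UTC+1", "font_count": 212},
    {"user_agent": "BrowserB/2.3", "timezone": "UTC+1", "font_count": 212},
]

counts = fingerprint_counts(observations)
# A fingerprint seen exactly once singles out one device instance.
uniquely_identified = [fp for fp, n in counts.items() if n == 1]
```

In this toy data set, the two devices that share a fingerprint remain indistinguishable from each other to the observer, while the others do not. Real fingerprinting is probabilistic and uses many more elements.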
[RFC4949]. Identifiers can be based upon natural names -- official names, personal names, and/or nicknames -- or can be artificial (for example, x9z32vb). However, identifiers are by definition unique within their context of use, while natural names are often not unique.

$ Identity: Any subset of an individual's attributes, including names, that identifies the individual within a given context. Individuals usually have multiple identities for use in different contexts.

$ Identity Confidentiality: A property of an individual where only the recipient can sufficiently identify the individual within a set of other individuals. This can be a desirable property of authentication protocols.

$ Identity Provider: An entity (usually an organization) that is responsible for establishing, maintaining, securing, and vouching for the identities associated with individuals.

$ Official Name: A personal name for an individual that is registered in some official context (for example, the name on an individual's birth certificate). Official names are often not unique.

$ Personal Name: A natural name for an individual. Personal names are often not unique and often comprise given names in combination with a family name. An individual may have multiple personal names at any time and over a lifetime, including official names. From a technological perspective, it cannot always be determined whether a given reference to an individual is, or is based upon, the individual's personal name(s) (see Pseudonym).

$ Pseudonym: A name assumed by an individual in some context, unrelated to the individual's personal names known by others in that context, with an intent of not revealing the individual's identities associated with his or her other names. Pseudonyms are often not unique.

$ Pseudonymity: The state of being pseudonymous.

$ Pseudonymous: A property of an individual in which the individual is identified by a pseudonym.

$ Real Name: See Personal Name and Official Name.

$ Relying Party: An entity that relies on assertions of individuals' identities from identity providers in order to provide services to individuals. In effect, the relying party delegates aspects of identity management to the identity provider(s). Such delegation requires protocol exchanges, trust, and a common understanding of semantics of information exchanged between the relying party and the identity provider.
Communications may be direct between the initiator and the recipient, or they may involve an application-layer intermediary (such as a proxy, cache, or relay) that is necessary for the two parties to communicate. In some cases, this intermediary stays in the communication path for the entire duration of the communication; sometimes it is only used for communication establishment, for either inbound or outbound communication. In some cases, there may be a series of intermediaries that are traversed. At lower layers, additional entities involved in packet forwarding may interfere with privacy protection goals as well. Some communications tasks require multiple protocol interactions with different entities. For example, a request to an HTTP server may be preceded by an interaction between the initiator and an Authentication, Authorization, and Accounting (AAA) server for network access and to a Domain Name System (DNS) server for name resolution. In this case, the HTTP server is the recipient and the other entities are enablers of the initiator-to-recipient communication. Similarly, a single communication with the recipient might generate further protocol interactions between either the initiator or the recipient and other entities, and the roles of the entities might change with each interaction. For example, an HTTP request might trigger interactions with an authentication server or with other resource servers wherein the recipient becomes an initiator in those later interactions. Thus, when conducting privacy analysis of an architecture that involves multiple communications phases, the entities involved may take on different -- or opposing -- roles from a privacy considerations perspective in each phase. Understanding the privacy implications of the architecture as a whole may require a separate analysis of each phase. 
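The point that an entity's role can change from one phase to the next can be made concrete with a minimal sketch. The entity names and the `Interaction` structure are hypothetical and chosen only for illustration. The two phases follow the HTTP example above: a request assisted by AAA and DNS enablers, followed by an interaction that the HTTP server itself initiates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One protocol interaction and the entities taking part in it."""
    protocol: str
    initiator: str
    recipient: str
    enablers: tuple = ()


def roles_of(entity: str, interactions: list) -> set:
    """Collect every role an entity plays across all phases."""
    roles = set()
    for i in interactions:
        if entity == i.initiator:
            roles.add("initiator")
        if entity == i.recipient:
            roles.add("recipient")
        if entity in i.enablers:
            roles.add("enabler")
    return roles


# Hypothetical entity names; the phases mirror the HTTP example in the text.
phases = [
    Interaction("HTTP", "browser", "web-server",
                enablers=("aaa-server", "dns-server")),
    Interaction("authentication", "web-server", "auth-server"),
]

web_server_roles = roles_of("web-server", phases)
```

Here `web_server_roles` contains both "recipient" and "initiator", which is why each phase may need its own privacy analysis.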
Protocol design is often predicated on the notion that recipients, intermediaries, and enablers are assumed to be authorized to receive and handle data from initiators. As [RFC3552] explains, "we assume that the end systems engaging in a protocol exchange have not themselves been compromised". However, privacy analysis requires questioning this assumption, since systems are often compromised for the purpose of obtaining personal data. Although recipients, intermediaries, and enablers may not generally be considered as attackers, they may all pose privacy threats (depending on the context) because they are able to observe, collect, process, and transfer privacy-relevant data. These entities are collectively described below as "observers" to distinguish them from traditional attackers. From a privacy perspective, one important
type of attacker is an eavesdropper: an entity that passively observes the initiator's communications without the initiator's knowledge or authorization.

The threat descriptions in the next section explain how observers and attackers might act to harm individuals' privacy. Different kinds of attacks may be feasible at different points in the communications path. For example, an observer could mount surveillance or identification attacks between the initiator and intermediary, or instead could surveil an enabler (e.g., by observing DNS queries from the initiator).

[Solove], as well as [CoE]), showing how each of them may cause individuals to incur privacy harms and providing examples of how these threats can exist on the Internet. This threat modeling is inspired by security threat analysis. Although it is not a perfect fit for assessing privacy risks in Internet protocols and systems, no better methodology has been developed to date.

Some privacy threats are already considered in Internet protocols as a matter of routine security analysis. Others are more pure privacy threats that existing security considerations do not usually address. The threats described here are divided into those that may also be considered security threats and those that are primarily privacy threats.
Note that an individual's awareness of and consent to the practices described below may change an individual's perception of and concern for the extent to which they threaten privacy. If an individual authorizes surveillance of his own activities, for example, the individual may be able to take actions to mitigate the harms associated with it or may consider the risk of harm to be tolerable. Section 3 of [RFC3552]) are necessary to prevent surveillance of the content of communications. To prevent traffic analysis or other surveillance of communications patterns, other measures may be necessary, such as [Tor].
[RFC6302]), it is important to recognize the potential for that information to be compromised and for that potential to be weighed against the benefits of data storage. Any recipient, intermediary, or enabler that stores data may be vulnerable to compromise. (Note that stored data compromise is distinct from purposeful disclosure, which is discussed in Section 5.2.4.)

Section 5.1.1. Unsolicited messages and denial-of-service attacks are the most common types of intrusion on the Internet. Intrusion can be perpetrated by any attacker that is capable of sending unwanted traffic to the initiator.

[RFC6269] notes, abuse mitigation is often conducted on the basis of the source IP address, such that connections from individual IP addresses may be prevented or temporarily blacklisted if abusive activity is determined to be sourced from those addresses. However, in the case
where a single IP address is shared by multiple individuals, those penalties may be suffered by all individuals sharing the address, even if they were not involved in the abuse. This threat can be mitigated by using identity management mechanisms with proper forms of authentication (ideally with cryptographic properties) so that actions can be attributed uniquely to an individual to provide the basis for accountability without generating false positives.

[RFC5246] or TLS session resumption without server-side state [RFC5077].

In RFC 5246 [RFC5246], a server provides the client with a session_id in the ServerHello message and caches the master_secret for later exchanges. When the client initiates a new connection with the server, it re-uses the previously obtained session_id in its ClientHello message. The server agrees to resume the session by using the same session_id and the previously stored master_secret for the generation of the TLS Record Layer security association. RFC 5077 [RFC5077] borrows from the session resumption design idea, but the server encapsulates all state information into a ticket instead of caching it. An attacker who is able to observe the protocol exchanges between the TLS client and the TLS server is able to link the initial exchange to subsequently resumed TLS sessions when the session_id and the ticket are exchanged in the clear (which is the case with data exchanged in the initial handshake messages).
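The linking attack described above can be sketched as follows. This is a simplified illustration and not a TLS parser: the record layout (`connection`, `source_ip`, `session_id`), the values, and the function name are all hypothetical. It assumes the observer has already extracted the cleartext session_id from each connection's hello messages.

```python
from collections import defaultdict


def link_by_session_id(handshakes: list) -> dict:
    """Group observed connections that carry the same cleartext session_id."""
    groups = defaultdict(list)
    for h in handshakes:
        if h["session_id"]:  # skip records where no identifier was seen
            groups[h["session_id"]].append(h["connection"])
    # Only identifiers seen on more than one connection link anything.
    return {sid: conns for sid, conns in groups.items() if len(conns) > 1}


# Hypothetical capture: each record is what a passive observer could read
# from the unencrypted hello messages of one connection.
observed = [
    {"connection": "conn-1", "source_ip": "192.0.2.10", "session_id": "ab12"},
    {"connection": "conn-2", "source_ip": "192.0.2.77", "session_id": "cd34"},
    {"connection": "conn-3", "source_ip": "198.51.100.5", "session_id": "ab12"},
]

linked = link_by_session_id(observed)
```

In this sketch, `conn-1` and `conn-3` are linked even though they arrive from different source addresses, which illustrates how a cleartext resumption identifier can enable correlation across network changes.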
In theory, any observer or attacker that receives an initiator's communications can engage in correlation. The extent of the potential for correlation will depend on what data the entity receives from the initiator and has access to otherwise. Often, intermediaries only require a small amount of information for message routing and/or security. In theory, protocol mechanisms could ensure that end-to-end information is not made accessible to these entities, but in practice the difficulty of deploying end-to-end security procedures, additional messaging or computational overhead, and other business or legal requirements often slow or prevent the deployment of end-to-end security mechanisms, giving intermediaries greater exposure to initiators' data than is strictly necessary from a technical point of view.
can generate uncertainty as to how one's information will be used in the future, potentially discouraging information exchange in the first place. Secondary use encompasses any use of data, including disclosure.

One example of secondary use would be an authentication server that uses a network access server's Access-Requests to track an initiator's location. Any observer or attacker could potentially make unwanted secondary uses of initiators' data. Protecting against secondary use is typically outside the scope of IETF protocols.

Section 5.2.5). A further example is provided by the IETF geolocation privacy architecture [RFC6280], which supports a way for users to express a preference that their location information not be disclosed beyond the intended recipient.

[RFC5025]), presence clients can authorize the specific conditions under which their presence information may be shared.
Exclusion is primarily considered problematic when the recipient fails to involve the initiator in decisions about data collection, handling, and use. Eavesdroppers engage in exclusion by their very nature, since their data collection and handling practices are covert. Section 5. This section describes three categories of relevant mitigations: (1) data minimization, (2) user participation, and (3) security. The privacy mitigations described in this section can loosely be mapped to existing privacy principles, such as the Fair Information Practices, but they have been adapted to fit the target audience of this document.
Data minimization mitigates the following threats: surveillance, stored data compromise, correlation, identification, secondary use, and disclosure.

RFC 3325 (P-Asserted-Identity (PAI)) [RFC3325], an extension for the Session Initiation Protocol (SIP) that allows an individual, such as a Voice over IP (VoIP) caller, to instruct an intermediary that he or she trusts not to populate the SIP From header field with the individual's authenticated and verified identity. The recipient of the call, as well as any other entity outside of the individual's trust domain, would therefore only learn that the SIP message (typically a SIP INVITE) was sent with a header field 'From: "Anonymous" <sip:anonymous@anonymous.invalid>' rather than the individual's address-of-record, which is typically thought of as the "public address" of the user. When PAI is used, the individual becomes anonymous within the initiator anonymity set that is populated by every individual making use of that specific intermediary.

Note that this example ignores the fact that the recipient may infer or obtain personal data from the other SIP payloads (e.g., SIP Via and Contact headers, the Session Description Protocol (SDP)). The implication is that PAI only attempts to address a particular threat, namely the disclosure of identity (in the From header) with respect to the recipient. This caveat makes the analysis of the specific protocol extension easier but cannot be assumed when conducting analysis of an entire architecture.
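The notion of an initiator anonymity set can be illustrated with a short sketch. It is not SIP code: the function names, user records, and domain names are hypothetical, and the anonymized header value is a placeholder assumed for illustration. As the caveat above notes, a real recipient may still learn identifying data from other header fields, which this sketch does not model.

```python
# Assumed placeholder value for an anonymized From header field.
ANONYMOUS_FROM = '"Anonymous" <sip:anonymous@anonymous.invalid>'


def outgoing_from(address_of_record: str, privacy_requested: bool) -> str:
    """Return the From value a trusted intermediary would send onward."""
    if privacy_requested:
        return ANONYMOUS_FROM
    return f"<{address_of_record}>"


def anonymity_set(users: list, intermediary: str) -> set:
    """Everyone whose calls leave through the given intermediary."""
    return {u["aor"] for u in users if u["intermediary"] == intermediary}


# Hypothetical users and the intermediaries they trust.
users = [
    {"aor": "sip:alice@a.example", "intermediary": "proxy.a.example"},
    {"aor": "sip:bob@a.example", "intermediary": "proxy.a.example"},
    {"aor": "sip:carol@b.example", "intermediary": "proxy.b.example"},
]

alice_view = outgoing_from("sip:alice@a.example", privacy_requested=True)
bob_view = outgoing_from("sip:bob@a.example", privacy_requested=True)
candidates = anonymity_set(users, "proxy.a.example")
```

Because `alice_view` and `bob_view` are identical, a recipient that sees only the From header field cannot tell which member of `candidates` placed the call. The larger the set of individuals using the intermediary, the stronger the anonymity.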
[RFC6350], for example). Pseudonymity is strengthened when less personal data can be linked to the pseudonym; when the same pseudonym is used less often and across fewer contexts; and when independently chosen pseudonyms are more frequently used for new actions (making them, from an observer's or attacker's perspective, unlinkable).

For Internet protocols, the following are important considerations: whether protocols allow pseudonyms to be changed without human interaction, the default length of pseudonym lifetimes, to whom pseudonyms are exposed, how individuals are able to control disclosure, how often pseudonyms can be changed, and the consequences of changing them.

[RFC3748]. EAP includes an identity exchange where the Identity Response is primarily used for routing purposes and selecting which EAP method to use. Since EAP Identity Requests and Identity Responses are sent in cleartext, eavesdroppers and intermediaries along the communication path between the EAP peer and the EAP server can snoop on the identity, which is encoded in the form of the Network Access Identifier (NAI) as defined in RFC 4282 [RFC4282]. To address this threat, as discussed in RFC 4282 [RFC4282], the username part of the NAI (but not the realm part) can be hidden from these eavesdroppers and intermediaries with the cryptographic support offered by EAP methods. Identity confidentiality has become a recommended design criterion for EAP (see [RFC4017]). The EAP method for 3rd Generation Authentication and Key Agreement (EAP-AKA) [RFC4187], for example, protects the EAP peer's identity against passive adversaries by utilizing temporal identities. The EAP-Internet Key Exchange
Protocol version 2 (EAP-IKEv2) method [RFC5106] is an example of an EAP method that offers protection against active attackers with regard to the individual's identity. Section 5.2.5, data collection and use that happen "in secret", without the individual's knowledge, are apt to violate the individual's expectation of privacy and may create incentives for misuse of data. As a result, privacy regimes tend to include
provisions to require informing individuals about data collection and use and involving them in decisions about the treatment of their data. In an engineering context, supporting the goal of user participation usually means providing ways for users to control the data that is shared about them. It may also mean providing ways for users to signal how they expect their data to be used and shared. Different protocol and architectural designs can make supporting user participation (for example, the ability to support a dialog box for user interaction) easier or harder; for example, OAuth-based services may have more natural hooks for user input than AAA services.

User participation mitigates the following threats: surveillance, secondary use, disclosure, and exclusion.

Section 2 of [RFC3552], a number of security goals also serve to enhance privacy:

o Confidentiality: Keeping data secret from unintended listeners.

o Peer entity authentication: Ensuring that the endpoint of a communication is the one that is intended (in support of maintaining confidentiality).

o Unauthorized usage: Limiting data access to only those users who are authorized. (Note that this goal also falls within data minimization.)

o Inappropriate usage: Limiting how authorized users can use data. (Note that this goal also falls within data minimization.)

Note that even when these goals are achieved, the existence of items of interest -- attributes, identifiers, identities, communications, actions (such as the sending or receiving of a communication), or anything else an attacker or observer might be interested in -- may still be detectable, even if they are not readable. Thus, undetectability, in which an observer or attacker cannot sufficiently distinguish whether an item of interest exists or not, may be considered as a further security goal (albeit one that can be extremely difficult to accomplish).
Detection of the protocols or applications in use via traffic analysis may be particularly difficult to defend against. As with the anonymity of individuals, achieving "protocol anonymity" requires that multiple protocols or applications exist that appear to have the
same attributes -- packet sizes, content, token locations, or inter-packet timing, for example. An attacker or observer will not be able to use traffic analysis to identify which protocol or application is in use if multiple protocols or applications are indistinguishable.

Defending against the threat of traffic analysis will be possible to different extents for different protocols, may depend on implementation- or use-specific details, and may depend on which other protocols already exist and whether they share similar traffic characteristics. The defenses will also vary relative to what the protocol is designed to do; for example, in some situations randomizing packet sizes, timing, or token locations will reduce the threat of traffic analysis, whereas in other situations (real-time communications, for example) holding some or all of those factors constant is a more appropriate defense. See "Guidelines for the Use of Variable Bit Rate Audio with Secure RTP" [RFC6562] for an example of how these kinds of trade-offs should be evaluated.

By providing proper security protection, the following threats can be mitigated: surveillance, stored data compromise, misattribution, secondary use, disclosure, and intrusion.

[RFC4101].

Note that the guidance provided in this section does not recommend specific practices. The range of protocols developed in the IETF is too broad to make recommendations about particular uses of data or how privacy might be balanced against other design goals. However, by carefully considering the answers to each question, document authors should be able to produce a comprehensive analysis that can serve as the basis for discussion of whether the protocol adequately protects against privacy threats. This guidance is meant to help the thought process of privacy analysis; it does not provide specific directions for how to write a privacy considerations section.
The framework is divided into four sections: three sections that address each of the mitigation classes from Section 6, plus a general section. Security is not fully elaborated, since substantial guidance already exists in [RFC3552].
g. Retention. Does the protocol or its anticipated uses require that the information discussed in (a) or (b) be retained by recipients, intermediaries, or enablers? If so, why? Is the retention expected to be persistent or temporary?
c. Intrusion. How do the protocol's security considerations prevent or mitigate intrusion, including denial-of-service attacks and unsolicited communications more generally?

d. Misattribution. How do the protocol's mechanisms for identifying and/or authenticating individuals prevent misattribution?

[RFC2778], allows users of a communications service to monitor one another's availability and disposition in order to make decisions about communicating. Presence information is highly dynamic and generally characterizes whether a user is online or offline, busy or idle, away from communications devices or nearby, and the like. Necessarily, this information has certain privacy implications, and from the start the IETF approached this work with the aim of providing users with the controls to determine how their presence information would be shared.

The Common Profile for Presence (CPP) [RFC3859] defines a set of logical operations for delivery of presence information. This abstract model is applicable to multiple presence systems. The SIP
for Instant Messaging and Presence Leveraging Extensions (SIMPLE) presence system [RFC3856] uses CPP as its baseline architecture, and the presence operations in the Extensible Messaging and Presence Protocol (XMPP) have also been mapped to CPP [RFC3922]. The fundamental architecture defined in RFC 2778 and RFC 3859 is a mediated one. Clients (presentities in RFC 2778 terms) publish their presence information to presence servers, which in turn distribute information to authorized watchers. Presence servers thus retain presence information for an interval of time, until it either changes or expires, so that it can be revealed to authorized watchers upon request. This architecture mirrors existing pre-standard deployment models. The integration of an explicit authorization mechanism into the presence architecture has been widely successful in involving the end users in the decision-making process before sharing information. Nearly all presence systems deployed today provide such a mechanism, typically through a reciprocal authorization system by which a pair of users, when they agree to be "buddies", consent to divulge their presence information to one another. Buddylists are managed by servers but controlled by end users. Users can also explicitly block one another through a similar interface, and in some deployments it is desirable to provide "polite blocking" of various kinds. From a perspective of privacy design, however, the classical presence architecture represents nearly a worst-case scenario. In terms of data minimization, presentities share their sensitive information with presence services, and while services only share this presence information with watchers authorized by the user, no technical mechanism constrains those watchers from relaying presence to further third parties. Any of these entities could conceivably log or retain presence information indefinitely. 
The sensitivity cannot be mitigated by rendering the user anonymous, as it is indeed the purpose of the system to facilitate communications between users who know one another. The identifiers employed by users are long-lived and often contain personal information, including personal names and the domains of service providers. While users do participate in the construction of buddylists and blacklists, they do so with little prospect for accountability: the user effectively throws their presence information over the wall to a presence server that in turn distributes the information to watchers. Users typically have no way to verify that presence is being distributed only to authorized watchers, especially as it is the server that authenticates watchers, not the end user. Moreover, connections between the server and all publishers and consumers of presence data are an attractive target for eavesdroppers and require strong confidentiality mechanisms, though again the end user has no way to verify what mechanisms are in place between the presence server and a watcher.
Additionally, the sensitivity of presence information is not limited to the disposition and capability to communicate. Capabilities can reveal the type of device that a user employs, for example, and since multiple devices can publish the same user's presence, there are significant risks of allowing attackers to correlate user devices. An important extension to presence was developed to enable the support for location sharing. The effort to standardize protocols for systems sharing geolocation was started in the GEOPRIV working group. During the initial requirements and privacy threat analysis in the process of chartering the working group, it became clear that the system would require an underlying communication mechanism supporting user consent to share location information. The resemblance of these requirements to the presence framework was quickly recognized, and this design decision was documented in [RFC4079]. Location information thus mingles with other presence information available through the system to intermediaries and to authorized watchers. Privacy concerns about presence information largely arise due to the built-in mediation of the presence architecture. The need for a presence server is motivated by two primary design requirements of presence: in the first place, the server can respond with an "offline" indication when the user is not online; in the second place, the server can compose presence information published by different devices under the user's control. Additionally, to facilitate the use of URIs as identifiers for entities, some service must operate a host with the domain name appearing in a presence URI, and in practical terms no commercial presence architecture would force end users to own and operate their own domain names. 
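The two design requirements above, together with watcher authorization, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class and method names, the status ranking, and the expiry handling are hypothetical and are not taken from CPP, SIMPLE, or XMPP.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical ranking used only in this sketch: higher means "more available".
RANK = {"away": 0, "busy": 1, "online": 2}


@dataclass
class Publication:
    status: str        # status reported by one device
    expires_at: float  # time after which the server ignores this publication


class PresenceServer:
    """Toy mediated presence server (illustrative; not a real protocol API)."""

    def __init__(self) -> None:
        # presentity -> device -> latest publication, held in cleartext
        self._pubs: Dict[str, Dict[str, Publication]] = {}
        # presentity -> watchers that the presentity has authorized
        self._watchers: Dict[str, Set[str]] = {}

    def authorize(self, presentity: str, watcher: str) -> None:
        self._watchers.setdefault(presentity, set()).add(watcher)

    def publish(self, presentity: str, device: str, status: str,
                ttl: float, now: float) -> None:
        pubs = self._pubs.setdefault(presentity, {})
        pubs[device] = Publication(status, now + ttl)

    def fetch(self, presentity: str, watcher: str, now: float) -> str:
        # The server, not the presentity, decides whether the watcher is allowed.
        if watcher not in self._watchers.get(presentity, set()):
            raise PermissionError("watcher not authorized by presentity")
        live = [p.status for p in self._pubs.get(presentity, {}).values()
                if p.expires_at > now]
        if not live:
            # First requirement: answer even when no device is publishing.
            return "offline"
        # Second requirement: compose one view from several devices.
        return max(live, key=lambda s: RANK.get(s, -1))


server = PresenceServer()
server.authorize("alice", "bob")
server.publish("alice", "phone", "away", ttl=60, now=0)
server.publish("alice", "laptop", "busy", ttl=30, now=0)
print(server.fetch("alice", "bob", now=10))   # composed from both devices
print(server.fetch("alice", "bob", now=100))  # all publications have expired
```

Even in this toy form, the server holds every device's publication in cleartext and performs the authorization check on the user's behalf, which is the source of the data minimization and accountability concerns described above.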
Many end users of applications like presence are behind NATs or firewalls and effectively cannot receive direct connections from the Internet -- the persistent bidirectional channel these clients open and maintain with a presence server is essential to the operation of the protocol. One must first ask if the trade-off of mediation for presence is worthwhile. Does a server need to be in the middle of all publications of presence information? It might seem that end-to-end encryption of the presence information could solve many of these problems. A presentity could encrypt the presence information with the public key of a watcher and only then send the presence information through the server. The IETF defined an object format for presence information called the Presence Information Data Format (PIDF), which for the purposes of conveying location information was extended to the PIDF Location Object (PIDF-LO) -- these XML objects were designed to accommodate an encrypted wrapper. Encrypting this data would have the added benefit of preventing stored cleartext presence information from being seized by an attacker who manages to compromise a presence server. This proposal, however, quickly runs
into usability problems. Discovering the public keys of watchers is the first difficulty, one that few Internet protocols have addressed successfully. This solution would then require the presentity to publish one encrypted copy of its presence information per authorized watcher to the presence service, regardless of whether or not a watcher is actively seeking presence information -- for a presentity with many watchers, this may place an unacceptable burden on the presence server, especially given the dynamism of presence information. Finally, it prevents the server from composing presence information reported by multiple devices under the same user's control. On the whole, these difficulties render object encryption of presence information a doubtful prospect. Some protocols that support presence information, such as SIP, can operate intermediaries in a redirecting mode rather than a publishing or proxying mode. Instead of sending presence information through the server, in other words, these protocols can merely redirect watchers to the presentity, and then presence information could pass directly and securely from the presentity to the watcher. It is worth noting that this would disclose the IP address of the presentity to the watcher, which has its own set of risks. In that case, the presentity can decide exactly what information it would like to share with the watcher in question, it can authenticate the watcher itself with whatever strength of credential it chooses, and with end-to-end encryption it can reduce the likelihood of any eavesdropping. In a redirection architecture, a presence server could still provide the necessary "offline" indication without requiring the presence server to observe and forward all information itself. This mechanism is more promising than encryption but also suffers from significant difficulties. 
It too does not provide for composition of presence information from multiple devices -- it in fact forces the watcher to perform this composition itself. The largest single impediment to this approach is, however, the difficulty of creating end-to-end connections between the presentity's device(s) and a watcher, as some or all of these endpoints may be behind NATs or firewalls that prevent peer-to-peer connections. While there are potential solutions for this problem, like Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN), they add complexity to the overall system. Consequently, mediation is a difficult feature of the presence architecture to remove. It is hard to minimize the data shared with intermediaries, especially due to the requirement for composition. Control over sharing with intermediaries must therefore come from some other explicit component of the architecture. As such, the presence work in the IETF focused on improving user participation in the activities of the presence server. This work began in the GEOPRIV working group, with controls on location privacy, as location
of users is perceived as having especially sensitive properties. With the aim of meeting the privacy requirements defined in [RFC2779], a set of usage indications, such as whether retransmission is allowed or when the retention period expires, have been added to the PIDF-LO such that they always travel with the location information itself. These privacy preferences apply not only to the intermediaries that store and forward presence information but also to the watchers who consume it. This approach very much follows the spirit of Creative Commons [CC], namely the usage of a limited number of conditions (such as 'Share Alike' [CC-SA]). Unlike Creative Commons, the GEOPRIV working group did not, however, initiate work to produce legal language or design graphical icons, since this would fall outside the scope of the IETF. In particular, the GEOPRIV rules state a preference on the retention and retransmission of location information; while GEOPRIV cannot force any entity receiving a PIDF-LO object to abide by those preferences, if users lack the ability to express them at all, we can guarantee their preferences will not be honored. The GEOPRIV rules can provide a means to establish accountability. The retention and retransmission elements were envisioned as the most essential examples of preference expression in sharing presence. The PIDF object was designed for extensibility, and the rulesets created for the PIDF-LO can also be extended to provide new expressions of user preference. Not all user preference information should be bound into a particular PIDF object, however; many forms of access control policy assumed by the presence architecture need to be provisioned in the presence server by some interface with the user. This requirement eventually triggered the standardization of a general access control policy language called the common policy framework (defined in [RFC4745]). 
This language allows one to express ways to control the distribution of information as simple conditions, actions, and transformation rules expressed in an XML format. Common Policy itself is an abstract format that needs to be instantiated: two examples can be found with the presence authorization rules [RFC5025] and the Geolocation Policy [RFC6772]. The former provides additional expressiveness for presence-based systems, while the latter defines syntax and semantics for location-based conditions and transformations.

Ultimately, the privacy work on presence represents a compromise between privacy principles and the needs of the architecture and marketplace. While it was not feasible to remove intermediaries from the architecture entirely or prevent their access to presence information, the IETF did provide a way for users to express their preferences and provision their controls at the presence service. Privacy mechanisms have not seen great success in implementations thus far, but by documenting and acknowledging the limitations of these mechanisms, the designers were able to provide implementers and end users with an informed perspective on the privacy properties of the IETF's presence protocols.
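As an illustration of usage indications that travel with the location information, the following sketch models a cooperating recipient that consults the retransmission and retention preferences before acting. The types and function names are hypothetical, the rules are modeled as plain Python values rather than parsed from a PIDF-LO document, and nothing here enforces the preferences against a recipient that chooses to ignore them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageRules:
    # Hypothetical in-memory model of two usage indications.
    retransmission_allowed: bool  # may the recipient pass the object on?
    retention_expiry: datetime    # when the recipient should stop retaining it


@dataclass(frozen=True)
class LocationObject:
    location: str      # opaque location value for this sketch
    rules: UsageRules  # the preferences stay attached to the data


def may_retransmit(obj: LocationObject) -> bool:
    """A cooperating recipient checks the preference before forwarding."""
    return obj.rules.retransmission_allowed


def must_discard(obj: LocationObject, now: datetime) -> bool:
    """A cooperating recipient drops the object once retention has expired."""
    return now >= obj.rules.retention_expiry


def forward(obj: LocationObject) -> LocationObject:
    """Forward the object unchanged, so that the rules reach the next hop."""
    if not may_retransmit(obj):
        raise PermissionError("sender asked that this object not be retransmitted")
    return obj


expiry = datetime(2013, 7, 1, tzinfo=timezone.utc)
private = LocationObject("example location", UsageRules(False, expiry))
shareable = LocationObject("example location", UsageRules(True, expiry))

print(may_retransmit(private))
print(forward(shareable).rules == shareable.rules)
print(must_discard(private, datetime(2013, 7, 2, tzinfo=timezone.utc)))
```

Because the rules are a field of the object itself, any hop that forwards the object also forwards the preferences; whether a recipient honors them remains a matter of policy and accountability rather than of technical enforcement.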
[CC-SA] Creative Commons, "Share Alike", 2012, <http://wiki.creativecommons.org/Share_Alike>.

[CC] Creative Commons, "Creative Commons", 2012, <http://creativecommons.org/>.

[CoE] Council of Europe, "Recommendation CM/Rec(2010)13 of the Committee of Ministers to member states on the protection of individuals with regard to automatic processing of personal data in the context of profiling", November 2010, <https://wcd.coe.int/ViewDoc.jsp?Ref=CM/Rec%282010%2913>.

[EFF] Electronic Frontier Foundation, "Panopticlick", 2013, <http://panopticlick.eff.org>.

[FIPs] Gellman, B., "Fair Information Practices: A Basic History", 2012, <http://bobgellman.com/rg-docs/rg-FIPShistory.pdf>.

[OECD] Organisation for Economic Co-operation and Development, "OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data", (adopted 1980), September 2010, <http://www.oecd.org/>.

[PbD] Office of the Information and Privacy Commissioner, Ontario, Canada, "Privacy by Design", 2013, <http://privacybydesign.ca/>.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC2778] Day, M., Rosenberg, J., and H. Sugano, "A Model for Presence and Instant Messaging", RFC 2778, February 2000.

[RFC2779] Day, M., Aggarwal, S., Mohr, G., and J. Vincent, "Instant Messaging / Presence Protocol Requirements", RFC 2779, February 2000.

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC3325] Jennings, C., Peterson, J., and M. Watson, "Private Extensions to the Session Initiation Protocol (SIP) for Asserted Identity within Trusted Networks", RFC 3325, November 2002.

[RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, July 2003.

[RFC3748] Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H. Levkowetz, "Extensible Authentication Protocol (EAP)", RFC 3748, June 2004.

[RFC3856] Rosenberg, J., "A Presence Event Package for the Session Initiation Protocol (SIP)", RFC 3856, August 2004.

[RFC3859] Peterson, J., "Common Profile for Presence (CPP)", RFC 3859, August 2004.

[RFC3922] Saint-Andre, P., "Mapping the Extensible Messaging and Presence Protocol (XMPP) to Common Presence and Instant Messaging (CPIM)", RFC 3922, October 2004.

[RFC4017] Stanley, D., Walker, J., and B. Aboba, "Extensible Authentication Protocol (EAP) Method Requirements for Wireless LANs", RFC 4017, March 2005.

[RFC4079] Peterson, J., "A Presence Architecture for the Distribution of GEOPRIV Location Objects", RFC 4079, July 2005.
[RFC4101] Rescorla, E. and IAB, "Writing Protocol Models", RFC 4101, June 2005.

[RFC4187] Arkko, J. and H. Haverinen, "Extensible Authentication Protocol Method for 3rd Generation Authentication and Key Agreement (EAP-AKA)", RFC 4187, January 2006.

[RFC4282] Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The Network Access Identifier", RFC 4282, December 2005.

[RFC4745] Schulzrinne, H., Tschofenig, H., Morris, J., Cuellar, J., Polk, J., and J. Rosenberg, "Common Policy: A Document Format for Expressing Privacy Preferences", RFC 4745, February 2007.

[RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed Authoring and Versioning (WebDAV)", RFC 4918, June 2007.

[RFC4949] Shirey, R., "Internet Security Glossary, Version 2", RFC 4949, August 2007.

[RFC5025] Rosenberg, J., "Presence Authorization Rules", RFC 5025, December 2007.

[RFC5077] Salowey, J., Zhou, H., Eronen, P., and H. Tschofenig, "Transport Layer Security (TLS) Session Resumption without Server-Side State", RFC 5077, January 2008.

[RFC5106] Tschofenig, H., Kroeselberg, D., Pashalidis, A., Ohba, Y., and F. Bersani, "The Extensible Authentication Protocol-Internet Key Exchange Protocol version 2 (EAP-IKEv2) Method", RFC 5106, February 2008.

[RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.

[RFC6269] Ford, M., Boucadair, M., Durand, A., Levis, P., and P. Roberts, "Issues with IP Address Sharing", RFC 6269, June 2011.

[RFC6280] Barnes, R., Lepinski, M., Cooper, A., Morris, J., Tschofenig, H., and H. Schulzrinne, "An Architecture for Location and Location Privacy in Internet Applications", BCP 160, RFC 6280, July 2011.

[RFC6302] Durand, A., Gashinsky, I., Lee, D., and S. Sheppard, "Logging Recommendations for Internet-Facing Servers", BCP 162, RFC 6302, June 2011.
[RFC6350] Perreault, S., "vCard Format Specification", RFC 6350, August 2011.

[RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of Variable Bit Rate Audio with Secure RTP", RFC 6562, March 2012.

[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the Opus Audio Codec", RFC 6716, September 2012.

[RFC6772] Schulzrinne, H., Tschofenig, H., Cuellar, J., Polk, J., Morris, J., and M. Thomson, "Geolocation Policy: A Document Format for Expressing Privacy Preferences for Location Information", RFC 6772, January 2013.

[Solove] Solove, D., "Understanding Privacy", March 2010.

[Tor] The Tor Project, Inc., "Tor", 2013, <https://www.torproject.org/>.

[Westin] Kumaraguru, P. and L. Cranor, "Privacy Indexes: A Survey of Westin's Studies", December 2005, <http://reports-archive.adm.cs.cmu.edu/anon/isri2005/CMU-ISRI-05-138.pdf>.

http://www.cdt.org/

Hannes Tschofenig
Nokia Siemens Networks
Linnoitustie 6
Espoo 02600
Finland

Phone: +358 (50) 4871445
EMail: Hannes.Tschofenig@gmx.net
URI: http://www.tschofenig.priv.at
Bernard Aboba
Skype

EMail: email@example.com

Jon Peterson
NeuStar, Inc.
1800 Sutter St. Suite 570
Concord, CA 94520
US

EMail: firstname.lastname@example.org

John B. Morris, Jr.

EMail: email@example.com

Marit Hansen
ULD

EMail: firstname.lastname@example.org

Rhys Smith
Janet

EMail: email@example.com