Network Working Group P. Saint-Andre, Ed. Request for Comments: 3920 Jabber Software Foundation Category: Standards Track October 2004 Extensible Messaging and Presence Protocol (XMPP): Core Status of this Memo This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited. Copyright Notice Copyright (C) The Internet Society (2004).
AbstractThis memo defines the core features of the Extensible Messaging and Presence Protocol (XMPP), a protocol for streaming Extensible Markup Language (XML) elements in order to exchange structured information in close to real time between any two network endpoints. While XMPP provides a generalized, extensible framework for exchanging XML data, it is used mainly for the purpose of building instant messaging and presence applications that meet the requirements of RFC 2779.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Generalized Architecture . . . . . . . . . . . . . . . . . . 3 3. Addressing Scheme . . . . . . . . . . . . . . . . . . . . . 5 4. XML Streams . . . . . . . . . . . . . . . . . . . . . . . . 7 5. Use of TLS . . . . . . . . . . . . . . . . . . . . . . . . . 19 6. Use of SASL . . . . . . . . . . . . . . . . . . . . . . . . 27 7. Resource Binding . . . . . . . . . . . . . . . . . . . . . . 37 8. Server Dialback . . . . . . . . . . . . . . . . . . . . . . 41 9. XML Stanzas . . . . . . . . . . . . . . . . . . . . . . . . 48 10. Server Rules for Handling XML Stanzas . . . . . . . . . . . 58 11. XML Usage within XMPP . . . . . . . . . . . . . . . . . . . 60 12. Core Compliance Requirements . . . . . . . . . . . . . . . . 62 13. Internationalization Considerations . . . . . . . . . . . . 64 14. Security Considerations . . . . . . . . . . . . . . . . . . 64 15. IANA Considerations . . . . . . . . . . . . . . . . . . . . 69 16. References . . . . . . . . . . . . . . . . . . . . . . . . . 71 A. Nodeprep . . . . . . . . . . . . . . . . . . . . . . . . . . 75 B. Resourceprep . . . . . . . . . . . . . . . . . . . . . . . . 76 C. XML Schemas . . . . . . . . . . . . . . . . . . . . . . . . 78 D. Differences Between Core Jabber Protocols and XMPP . . . . . 87 Contributors. . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Acknowledgements. . . . . . . . . . . . . . . . . . . . . . . . . 89 Author's Address. . . . . . . . . . . . . . . . . . . . . . . . . 89 Full Copyright Statement. . . . . . . . . . . . . . . . . . . . . 90 XML] protocol for near-real-time messaging, presence, and request-response services. The basic syntax and semantics were developed originally within the Jabber open-source community, mainly in 1999. In 2002, the XMPP WG was chartered with developing an adaptation of the Jabber protocol that would be suitable as an IETF instant messaging (IM) and presence technology. As a result of work by the XMPP WG, the current memo defines the core features of XMPP 1.0; the extensions required to provide the instant messaging and presence functionality defined in RFC 2779 [IMP-REQS] are specified in the Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence [XMPP-IM].
RFC 2119 [TERMS]. TCP] connection, and servers also communicate with each other over TCP connections. The following diagram provides a high-level overview of this architecture (where "-" represents communications that use XMPP and "=" represents communications that use any other protocol). C1----S1---S2---C3 | C2----+--G1===FN1===FC1 The symbols are as follows: o C1, C2, C3 = XMPP clients o S1, S2 = XMPP servers o G1 = A gateway that translates between XMPP and the protocol(s) used on a foreign (non-XMPP) messaging network o FN1 = A foreign messaging network o FC1 = A client on a foreign messaging network
o to route appropriately-addressed XML stanzas (Section 9) among such entities over XML streams Most XMPP-compliant servers also assume responsibility for the storage of data that is used by clients (e.g., contact lists for users of XMPP-based instant messaging and presence applications); in this case, the XML data is processed directly by the server itself on behalf of the client and is not routed to another entity. TCP] connection and use XMPP to take full advantage of the functionality provided by a server and any associated services. Multiple resources (e.g., devices or locations) MAY connect simultaneously to a server on behalf of each authorized client, with each resource differentiated by the resource identifier of an XMPP address (e.g., <node@domain/ home> vs. <node@domain/work>) as defined under Addressing Scheme (Section 3). The RECOMMENDED port for connections between a client and a server is 5222, as registered with the IANA (see Port Numbers (Section 15.9)). SMTP]), Internet Relay Chat (see [IRC]), SIMPLE (see [SIMPLE]), Short Message Service (SMS), and legacy instant messaging services such as AIM, ICQ, MSN Messenger, and Yahoo! Instant Messenger. Communications between gateways and servers, and between gateways and the foreign messaging system, are not defined in this document. SMTP]) that make use of network addressing standards. Communications between any two servers are OPTIONAL. If enabled, such communications SHOULD occur over XML streams that are bound to [TCP] connections. The RECOMMENDED port for connections between servers is 5269, as registered with the IANA (see Port Numbers (Section 15.9)).
RFC 2396 [URI]. For historical reasons, the address of an XMPP entity is called a Jabber Identifier or JID. A valid JID contains a set of ordered elements formed of a domain identifier, node identifier, and resource identifier. The syntax for a JID is defined below using the Augmented Backus-Naur Form as defined in [ABNF]. (The IPv4address and IPv6address rules are defined in Appendix B of [IPv6]; the allowable character sequences that conform to the node rule are defined by the Nodeprep profile of [STRINGPREP] as documented in Appendix A of this memo; the allowable character sequences that conform to the resource rule are defined by the Resourceprep profile of [STRINGPREP] as documented in Appendix B of this memo; and the sub-domain rule makes reference to the concept of an internationalized domain label as described in [IDNA].) jid = [ node "@" ] domain [ "/" resource ] domain = fqdn / address-literal fqdn = (sub-domain 1*("." sub-domain)) sub-domain = (internationalized domain label) address-literal = IPv4address / IPv6address All JIDs are based on the foregoing structure. The most common use of this structure is to identify an instant messaging user, the server to which the user connects, and the user's connected resource (e.g., a specific client) in the form of <user@host/resource>. However, node types other than clients are possible; for example, a specific chat room offered by a multi-user chat service could be addressed as <room@service> (where "room" is the name of the chat room and "service" is the hostname of the multi-user chat service) and a specific occupant of such a room could be addressed as <room@service/nick> (where "nick" is the occupant's room nickname). Many other JID types are possible (e.g., <domain/resource> could be a server-side script or service). Each allowable portion of a JID (node identifier, domain identifier, and resource identifier) MUST NOT be more than 1023 bytes in length, resulting in a maximum total size (including the '@' and '/' separators) of 3071 bytes.
DNS]). A domain identifier MUST be an "internationalized domain name" as defined in [IDNA], to which the Nameprep [NAMEPREP] profile of stringprep [STRINGPREP] can be applied without failing. Before comparing two domain identifiers, a server MUST (and a client SHOULD) first apply the Nameprep profile to the labels (as defined in [IDNA]) that make up each identifier. STRINGPREP] can be applied to it without failing. Before comparing two node identifiers, a server MUST (and a client SHOULD) first apply the Nodeprep profile to each identifier.
to both servers and other clients, and is typically defined by a client implementation when it provides the information necessary to complete Resource Binding (Section 7) (although it may be generated by a server on behalf of a client), after which it is referred to as a "connected resource". An entity MAY maintain multiple connected resources simultaneously, with each connected resource differentiated by a distinct resource identifier. A resource identifier MUST be formatted such that the Resourceprep profile of [STRINGPREP] can be applied without failing. Before comparing two resource identifiers, a server MUST (and a client SHOULD) first apply the Resourceprep profile to each identifier. SASL], if no authorization identity was specified during SASL negotiation (Section 6). For client-to-server communications, the "bare JID" (<node@domain>) SHOULD be the authorization identity, derived from the authentication identity, as defined in [SASL], if no authorization identity was specified during SASL negotiation (Section 6); the resource identifier portion of the "full JID" (<node@domain/resource>) SHOULD be the resource identifier negotiated by the client and server during Resource Binding (Section 7). The receiving entity MUST ensure that the resulting JID (including node identifier, domain identifier, resource identifier, and separator characters) conforms to the rules and formats defined earlier in this section; to meet this restriction, the receiving entity may need to replace the JID sent by the initiating entity with the canonicalized JID as determined by the receiving entity.
Definition of XML Stream: An XML stream is a container for the exchange of XML elements between any two entities over a network. The start of an XML stream is denoted unambiguously by an opening XML <stream> tag (with appropriate attributes and namespace declarations), while the end of the XML stream is denoted unambiguously by a closing XML </stream> tag. During the life of the stream, the entity that initiated it can send an unbounded number of XML elements over the stream, either elements used to negotiate the stream (e.g., to negotiate Use of TLS (Section 5) or use of SASL (Section 6)) or XML stanzas (as defined herein, <message/>, <presence/>, or <iq/> elements qualified by the default namespace). The "initial stream" is negotiated from the initiating entity (usually a client or server) to the receiving entity (usually a server), and can be seen as corresponding to the initiating entity's "session" with the receiving entity. The initial stream enables unidirectional communication from the initiating entity to the receiving entity; in order to enable information exchange from the receiving entity to the initiating entity, the receiving entity MUST negotiate a stream in the opposite direction (the "response stream"). Definition of XML Stanza: An XML stanza is a discrete semantic unit of structured information that is sent from one entity to another over an XML stream. An XML stanza exists at the direct child level of the root <stream/> element and is said to be well-balanced if it matches the production  content of [XML]. The start of any XML stanza is denoted unambiguously by the element start tag at depth=1 of the XML stream (e.g., <presence>), and the end of any XML stanza is denoted unambiguously by the corresponding close tag at depth=1 (e.g., </presence>). An XML stanza MAY contain child elements (with accompanying attributes, elements, and XML character data) as necessary in order to convey the desired information. The only XML stanzas defined herein are the <message/>, <presence/>, and <iq/> elements qualified by the default namespace for the stream, as described under XML Stanzas (Section 9); an XML element sent for the purpose of Transport Layer Security (TLS) negotiation (Section 5), Simple Authentication and Security Layer (SASL) negotiation (Section 6), or server dialback (Section 8) is not considered to be an XML stanza. Consider the example of a client's session with a server. In order to connect to a server, a client MUST initiate an XML stream by sending an opening <stream> tag to the server, optionally preceded by a text declaration specifying the XML version and the character encoding supported (see Inclusion of Text Declaration (Section 11.4); see also Character Encoding (Section 11.5)). Subject to local policies and service provisioning, the server SHOULD then reply with
a second XML stream back to the client, again optionally preceded by a text declaration. Once the client has completed SASL negotiation (Section 6), the client MAY send an unbounded number of XML stanzas over the stream to any recipient on the network. When the client desires to close the stream, it simply sends a closing </stream> tag to the server (alternatively, the stream may be closed by the server), after which both the client and server SHOULD terminate the underlying connection (usually a TCP connection) as well. Those who are accustomed to thinking of XML in a document-centric manner may wish to view a client's session with a server as consisting of two open-ended XML documents: one from the client to the server and one from the server to the client. From this perspective, the root <stream/> element can be considered the document entity for each "document", and the two "documents" are built up through the accumulation of XML stanzas sent over the two XML streams. However, this perspective is a convenience only; XMPP does not deal in documents but in XML streams and XML stanzas. In essence, then, an XML stream acts as an envelope for all the XML stanzas sent during a session. We can represent this in a simplistic fashion as follows: |--------------------| | <stream> | |--------------------| | <presence> | | <show/> | | </presence> | |--------------------| | <message to='foo'> | | <body/> | | </message> | |--------------------| | <iq to='bar'> | | <query/> | | </iq> | |--------------------| | ... | |--------------------| | </stream> | |--------------------|
TCP] connection (e.g., two entities could connect to each other via another mechanism such as polling over [HTTP]), this specification defines a binding of XMPP to TCP only. In the context of client-to-server communications, a server MUST allow a client to share a single TCP connection for XML stanzas sent from client to server and from server to client. In the context of server-to-server communications, a server MUST use one TCP connection for XML stanzas sent from the server to the peer and another TCP connection (initiated by the peer) for stanzas from the peer to the server, for a total of two TCP connections.
o from -- The 'from' attribute SHOULD be used only in the XML stream header from the receiving entity to the initiating entity, and MUST be set to a hostname serviced by the receiving entity that is granting access to the initiating entity. There SHOULD NOT be a 'from' attribute on the XML stream header sent from the initiating entity to the receiving entity; however, if a 'from' attribute is included, it SHOULD be silently ignored by the receiving entity. o id -- The 'id' attribute SHOULD be used only in the XML stream header from the receiving entity to the initiating entity. This attribute is a unique identifier created by the receiving entity to function as a session key for the initiating entity's streams with the receiving entity, and MUST be unique within the receiving application (normally a server). Note well that the stream ID may be security-critical and therefore MUST be both unpredictable and nonrepeating (see [RANDOM] for recommendations regarding randomness for security purposes). There SHOULD NOT be an 'id' attribute on the XML stream header sent from the initiating entity to the receiving entity; however, if an 'id' attribute is included, it SHOULD be silently ignored by the receiving entity. o xml:lang -- An 'xml:lang' attribute (as defined in Section 2.12 of [XML]) SHOULD be included by the initiating entity on the header for the initial stream to specify the default language of any human-readable XML character data it sends over that stream. If the attribute is included, the receiving entity SHOULD remember that value as the default for both the initial stream and the response stream; if the attribute is not included, the receiving entity SHOULD use a configurable default value for both streams, which it MUST communicate in the header for the response stream. For all stanzas sent over the initial stream, if the initiating entity does not include an 'xml:lang' attribute, the receiving entity SHOULD apply the default value; if the initiating entity does include an 'xml:lang' attribute, the receiving entity MUST NOT modify or delete it (see also xml:lang (Section 9.1.5)). The value of the 'xml:lang' attribute MUST be an NMTOKEN (as defined in Section 2.3 of [XML]) and MUST conform to the format defined in RFC 3066 [LANGTAGS]. o version -- The presence of the version attribute set to a value of at least "1.0" signals support for the stream-related protocols (including stream features) defined in this specification. Detailed rules regarding the generation and handling of this attribute are defined below.
We can summarize as follows: | initiating to receiving | receiving to initiating ---------+---------------------------+----------------------- to | hostname of receiver | silently ignored from | silently ignored | hostname of receiver id | silently ignored | session key xml:lang | default language | default language version | signals XMPP 1.0 support | signals XMPP 1.0 support
2. The receiving entity MUST set the value of the 'version' attribute on the response stream header to either the value supplied by the initiating entity or the highest version number supported by the receiving entity, whichever is lower. The receiving entity MUST perform a numeric comparison on the major and minor version numbers, not a string match on "<major>.<minor>". 3. If the version number included in the response stream header is at least one major version lower than the version number included in the initial stream header and newer version entities cannot interoperate with older version entities as described above, the initiating entity SHOULD generate an <unsupported-version/> stream error and terminate the XML stream and underlying TCP connection. 4. If either entity receives a stream header with no 'version' attribute, the entity MUST consider the version supported by the other entity to be "0.0" and SHOULD NOT include a 'version' attribute in the stream header it sends in reply. XML-NAMES]). For detailed information regarding the streams namespace and default namespace, see Namespace Names and Prefixes (Section 11.2). XMPP-IM]; however, the stream features functionality could be used to advertise other negotiable features in the future. If an entity does not understand or support some features, it SHOULD silently ignore them. If one or more security features (e.g., TLS and SASL) need to be successfully negotiated before a non-security-related feature (e.g., Resource Binding) can be offered, the non-security-related feature SHOULD NOT be included in the stream features that are advertised before the relevant security features have been negotiated.
o MAY contain a <text/> child containing XML character data that describes the error in more detail; this element MUST be qualified by the 'urn:ietf:params:xml:ns:xmpp-streams' namespace and SHOULD possess an 'xml:lang' attribute specifying the natural language of the XML character data o MAY contain a child element for an application-specific error condition; this element MUST be qualified by an application-defined namespace, and its structure is defined by that namespace The <text/> element is OPTIONAL. If included, it SHOULD be used only to provide descriptive or diagnostic information that supplements the meaning of a defined condition or application-specific condition. It SHOULD NOT be interpreted programmatically by an application. It SHOULD NOT be used as the error message presented to a user, but MAY be shown in addition to the error message associated with the included condition element (or elements).
o <host-unknown/> -- the value of the 'to' attribute provided by the initiating entity in the stream header does not correspond to a hostname that is hosted by the server. o <improper-addressing/> -- a stanza sent between two servers lacks a 'to' or 'from' attribute (or the attribute has no value). o <internal-server-error/> -- the server has experienced a misconfiguration or an otherwise-undefined internal error that prevents it from servicing the stream. o <invalid-from/> -- the JID or hostname provided in a 'from' address does not match an authorized JID or validated domain negotiated between servers via SASL or dialback, or between a client and a server via authentication and resource binding. o <invalid-id/> -- the stream ID or dialback ID is invalid or does not match an ID previously provided. o <invalid-namespace/> -- the streams namespace name is something other than "http://etherx.jabber.org/streams" or the dialback namespace name is something other than "jabber:server:dialback" (see XML Namespace Names and Prefixes (Section 11.2)). o <invalid-xml/> -- the entity has sent invalid XML over the stream to a server that performs validation (see Validation (Section 11.3)). o <not-authorized/> -- the entity has attempted to send data before the stream has been authenticated, or otherwise is not authorized to perform an action related to stream negotiation; the receiving entity MUST NOT process the offending stanza before sending the stream error. o <policy-violation/> -- the entity has violated some local service policy; the server MAY choose to specify the policy in the <text/> element or an application-specific condition element. o <remote-connection-failed/> -- the server is unable to properly connect to a remote entity that is required for authentication or authorization. o <resource-constraint/> -- the server lacks the system resources necessary to service the stream.
o <restricted-xml/> -- the entity has attempted to send restricted XML features such as a comment, processing instruction, DTD, entity reference, or unescaped character (see Restrictions (Section 11.1)). o <see-other-host/> -- the server will not provide service to the initiating entity but is redirecting traffic to another host; the server SHOULD specify the alternate hostname or IP address (which MUST be a valid domain identifier) as the XML character data of the <see-other-host/> element. o <system-shutdown/> -- the server is being shut down and all active streams are being closed. o <undefined-condition/> -- the error condition is not one of those defined by the other conditions in this list; this error condition SHOULD be used only in conjunction with an application-specific condition. o <unsupported-encoding/> -- the initiating entity has encoded the stream in an encoding that is not supported by the server (see Character Encoding (Section 11.5)). o <unsupported-stanza-type/> -- the initiating entity has sent a first-level child of the stream that is not supported by the server. o <unsupported-version/> -- the value of the 'version' attribute provided by the initiating entity in the stream header specifies a version of XMPP that is not supported by the server; the server MAY specify the version(s) it supports in the <text/> element. o <xml-not-well-formed/> -- the initiating entity has sent XML that is not well-formed as defined by [XML].
<stream:error> <xml-not-well-formed xmlns='urn:ietf:params:xml:ns:xmpp-streams'/> <text xml:lang='en' xmlns='urn:ietf:params:xml:ns:xmpp-streams'> Some special application diagnostic information! </text> <escape-your-data xmlns='application-ns'/> </stream:error> </stream:stream>
A "session" gone bad: C: <?xml version='1.0'?> <stream:stream to='example.com' xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams' version='1.0'> S: <?xml version='1.0'?> <stream:stream from='example.com' id='someid' xmlns='jabber:client' xmlns:stream='http://etherx.jabber.org/streams' version='1.0'> ... encryption, authentication, and resource binding ... C: <message xml:lang='en'> <body>Bad XML, no closing body tag! </message> S: <stream:error> <xml-not-well-formed xmlns='urn:ietf:params:xml:ns:xmpp-streams'/> </stream:error> S: </stream:stream>