XMPP provides a technology for the asynchronous, end-to-end exchange
of structured data by means of direct, persistent XML streams among a
distributed network of globally addressable, presence-aware clients
and servers. Because this architectural style involves ubiquitous
knowledge of network availability and a conceptually unlimited number
of concurrent information transactions in the context of a given
client-to-server or server-to-server session, we label it
"Availability for Concurrent Transactions" (ACT) to distinguish it
from the "Representational State Transfer" [REST] architectural style
familiar from the World Wide Web. Although the architecture of XMPP
is similar in important ways to that of email (see [EMAIL-ARCH]), it
introduces several modifications to facilitate communication in close
to real time. The salient features of this ACTive architectural
style are as follows.
2.1. Global Addresses
As with email, XMPP uses globally unique addresses (based on the
Domain Name System) in order to route and deliver messages over the
network. All XMPP entities are addressable on the network, most
particularly clients and servers but also various additional services
that can be accessed by clients and servers. In general, server
addresses are of the form <domainpart> (e.g., <im.example.com>),
accounts hosted at a server are of the form <localpart@domainpart>
(e.g., <juliet@im.example.com>, called a "bare JID"), and a
particular connected device or resource that is currently authorized
for interaction on behalf of an account is of the form
<localpart@domainpart/resourcepart> (e.g.,
<juliet@im.example.com/balcony>, called a "full JID"). For
historical reasons, XMPP addresses are often called Jabber IDs or
JIDs. Because the formal specification of the XMPP address format
depends on internationalization technologies that are in flux at the
time of writing, the format is defined in [XMPP-ADDR] instead of this
document. The terms "localpart", "domainpart", and "resourcepart"
are defined more formally in [XMPP-ADDR].
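Informational Note: As a non-normative illustration of the three
address forms described above, the following Python sketch splits an
address string into its parts. The names split_jid and Jid are
hypothetical, and the sketch assumes simple separator-based
splitting only; it performs none of the preparation or validation
defined in [XMPP-ADDR].

```python
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    """Hypothetical container for the three parts of an XMPP address."""
    localpart: Optional[str]
    domainpart: str
    resourcepart: Optional[str]


def split_jid(jid: str) -> Jid:
    """Split an address on its separators (no validation or normalization)."""
    # Everything after the first "/" is the resourcepart.
    bare, slash, resource = jid.partition("/")
    # Within the bare part, the localpart (if any) precedes the "@".
    if "@" in bare:
        local, _, domain = bare.partition("@")
    else:
        local, domain = None, bare
    if not domain:
        raise ValueError("an XMPP address needs a domainpart")
    return Jid(local, domain, resource if slash else None)
```

A server address yields only a domainpart, a bare JID yields a
localpart and a domainpart, and a full JID yields all three parts.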
2.2. Presence
XMPP includes the ability for an entity to advertise its network
availability or "presence" to other entities. In XMPP, this
availability for communication is signaled end-to-end by means of a
dedicated communication primitive: the <presence/> stanza. Although
knowledge of network availability is not strictly necessary for the
exchange of XMPP messages, it facilitates real-time interaction
because the originator of a message can know before initiating
communication that the intended recipient is online and available.
End-to-end presence is defined in [XMPP-IM].
2.3. Persistent Streams
Availability for communication is also built into each point-to-point
"hop" through the use of persistent XML streams over long-lived TCP
connections. These "always-on" client-to-server and server-to-server
streams enable each party to push data to the other party at any time
for immediate routing or delivery. XML streams are defined under
Section 4.
2.4. Structured Data
The basic protocol data unit in XMPP is not an XML stream (which
simply provides the transport for point-to-point communication) but
an XML "stanza", which is essentially a fragment of XML that is sent
over a stream. The root element of a stanza includes routing
attributes (such as "from" and "to" addresses), and the child
elements of the stanza contain a payload for delivery to the intended
recipient. XML stanzas are defined under Section 8.
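Informational Note: The following non-normative Python sketch shows
the shape of a stanza as described above: a root element carrying
routing attributes, with a child element holding the payload. The
function name build_message and the example addresses are
hypothetical, and the sketch omits the namespace declarations that
a real stream would supply.

```python
import xml.etree.ElementTree as ET


def build_message(from_jid: str, to_jid: str, body_text: str) -> str:
    """Serialize a minimal <message/> stanza (illustrative only)."""
    # Routing attributes live on the root element of the stanza.
    message = ET.Element("message", {"from": from_jid, "to": to_jid})
    # The payload travels in child elements.
    body = ET.SubElement(message, "body")
    body.text = body_text
    return ET.tostring(message, encoding="unicode")
```

Because the text is set through the element tree rather than by
string concatenation, characters such as "<" and "&" in the payload
are escaped by the serializer.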
2.5. Distributed Network of Clients and Servers
In practice, XMPP consists of a network of clients and servers that
inter-communicate (however, communication between any two given
deployed servers is strictly discretionary and a matter of local
service policy). Thus, for example, the user <juliet@im.example.com>
associated with the server <im.example.com> might be able to exchange
messages, presence, and other structured data with the user
<romeo@example.net> associated with the server <example.net>. This
pattern is familiar from messaging protocols that make use of global
addresses, such as the email network (see [SMTP] and [EMAIL-ARCH]).
As a result, end-to-end communication in XMPP is logically peer-to-
peer but physically client-to-server-to-server-to-client, as
illustrated in the following diagram.
example.net <--------------> im.example.com
Figure 1: Distributed Client-Server Architecture
Informational Note: Architectures that employ XML streams
(Section 4) and XML stanzas (Section 8) but that establish peer-
to-peer connections directly between clients using technologies
based on [LINKLOCAL] have been deployed, but such architectures
are not defined in this specification and are best described as
"XMPP-like"; for details, see [XEP-0174]. In addition, XML
streams can be established end-to-end over any reliable transport,
including extensions to XMPP itself; however, such methods are out
of scope for this specification.
The following paragraphs describe the responsibilities of clients and
servers on the network.
A client is an entity that establishes an XML stream with a server by
authenticating using the credentials of a registered account (via
SASL negotiation (Section 6)) and that then completes resource
binding (Section 7) in order to enable delivery of XML stanzas
between the server and the client over the negotiated stream. The
client then uses XMPP to communicate with its server, other clients,
and any other entities on the network, where the server is
responsible for delivering stanzas to other connected clients at the
same server or routing them to remote servers. Multiple clients can
connect simultaneously to a server on behalf of the same registered
account, where each client is differentiated by the resourcepart of
an XMPP address (e.g., <juliet@im.example.com/balcony> vs.
<juliet@im.example.com/chamber>), as defined under [XMPP-ADDR] and
Section 7.
A server is an entity whose primary responsibilities are to:
o Manage XML streams (Section 4) with connected clients and deliver
XML stanzas (Section 8) to those clients over the negotiated
streams; this includes responsibility for ensuring that a client
authenticates with the server before being granted access to the
XMPP network.
o Subject to local service policies on server-to-server
communication, manage XML streams (Section 4) with remote servers
and route XML stanzas (Section 8) to those servers over the
negotiated streams.
Depending on the application, the secondary responsibilities of an
XMPP server can include:
o Storing data that is used by clients (e.g., contact lists for
users of XMPP-based instant messaging and presence applications as
defined in [XMPP-IM]); in this case, the relevant XML stanza is
handled directly by the server itself on behalf of the client and
is not routed to a remote server or delivered to a connected
client.
o Hosting add-on services that also use XMPP as the basis for
communication but that provide additional functionality beyond
that defined in this document or in [XMPP-IM]; examples include
multi-user conferencing services as specified in [XEP-0045] and
publish-subscribe services as specified in [XEP-0060].
3. TCP Binding
As XMPP is defined in this specification, an initiating entity
(client or server) MUST open a Transmission Control Protocol [TCP]
connection to the receiving entity (server) before it negotiates XML
streams with the receiving entity. The parties then maintain that
TCP connection for as long as the XML streams are in use. The rules
specified in the following sections apply to the TCP binding.
Informational Note: There is no necessary coupling of XML streams
to TCP, and other transports are possible. For example, two
entities could connect to each other by means of [HTTP] as
specified in [XEP-0124] and [XEP-0206]. However, this
specification defines only a binding of XMPP to TCP.
3.2. Resolution of Fully Qualified Domain Names
Because XML streams are sent over TCP, the initiating entity needs to
determine the IPv4 or IPv6 address (and port) of the receiving entity
before it can attempt to open an XML stream. Typically this is done
by resolving the receiving entity's fully qualified domain name or
FQDN (see [DNS-CONCEPTS]).
3.2.1. Preferred Process: SRV Lookup
The preferred process for FQDN resolution is to use [DNS-SRV] records
as follows:
1. The initiating entity constructs a DNS SRV query whose inputs
are:
* a Service of "xmpp-client" (for client-to-server connections)
or "xmpp-server" (for server-to-server connections)
* a Proto of "tcp"
* a Name corresponding to the "origin domain" [TLS-CERTS] of the
XMPP service to which the initiating entity wishes to connect
(e.g., "example.net" or "im.example.com")
2. The result is a query such as "_xmpp-client._tcp.example.net." or
"_xmpp-server._tcp.im.example.com.".
3. If a response is received, it will contain one or more
combinations of a port and FQDN, each of which is weighted and
prioritized as described in [DNS-SRV]. (However, if the result
of the SRV lookup is a single resource record with a Target of
".", i.e., the root domain, then the initiating entity MUST abort
SRV processing at this point because according to [DNS-SRV] such
a Target "means that the service is decidedly not available at
this domain".)
4. The initiating entity chooses at least one of the returned FQDNs
to resolve (following the rules in [DNS-SRV]), which it does by
performing DNS "A" or "AAAA" lookups on the FQDN; this will
result in an IPv4 or IPv6 address.
5. The initiating entity uses the IP address(es) from the
successfully resolved FQDN (with the corresponding port number
returned by the SRV lookup) as the connection address for the
receiving entity.
6. If the initiating entity fails to connect using that IP address
but the "A" or "AAAA" lookups returned more than one IP address,
then the initiating entity uses the next resolved IP address for
that FQDN as the connection address.
7. If the initiating entity fails to connect using all resolved IP
addresses for a given FQDN, then it repeats the process of
resolution and connection for the next FQDN returned by the SRV
lookup based on the priority and weight as defined in [DNS-SRV].
8. If the initiating entity receives a response to its SRV query but
it is not able to establish an XMPP connection using the data
received in the response, it SHOULD NOT attempt the fallback
process described in the next section (this helps to prevent a
state mismatch between inbound and outbound connections).
9. If the initiating entity does not receive a response to its SRV
query, it SHOULD attempt the fallback process described in the
next section.
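Informational Note: The following non-normative Python sketch
illustrates the query construction and the ordering of SRV results
in the steps above. It performs no DNS lookups; it assumes the
caller has already obtained the SRV records. The names SrvRecord,
ServiceNotAvailable, srv_query_name, and order_srv_records are
hypothetical, and the weighted selection is one reading of the
procedure in [DNS-SRV] rather than a definitive implementation.

```python
import random
from typing import List, NamedTuple, Optional, Sequence


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


class ServiceNotAvailable(Exception):
    """The only SRV record has a Target of "." (the root domain)."""


def srv_query_name(origin_domain: str, service: str) -> str:
    """Build the SRV owner name, e.g. for service "xmpp-client"."""
    return f"_{service}._tcp.{origin_domain}."


def order_srv_records(records: Sequence[SrvRecord],
                      rng: Optional[random.Random] = None) -> List[SrvRecord]:
    """Return the records in the order in which they would be tried."""
    rng = rng or random.Random()
    # A single record with Target "." means SRV processing is aborted.
    if len(records) == 1 and records[0].target == ".":
        raise ServiceNotAvailable("service is not available at this domain")
    ordered: List[SrvRecord] = []
    # Lower priority values are tried first.
    for priority in sorted({record.priority for record in records}):
        group = [record for record in records if record.priority == priority]
        # Zero-weight records are placed first so they are rarely chosen.
        group.sort(key=lambda record: record.weight != 0)
        while group:
            threshold = rng.randint(0, sum(record.weight for record in group))
            running = 0
            for index, record in enumerate(group):
                running += record.weight
                if running >= threshold:
                    ordered.append(group.pop(index))
                    break
    return ordered
```

An initiating entity would then resolve each returned target in turn
and try every resulting IP address with the record's port before
moving on to the next record.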
3.2.2. Fallback Processes
The fallback process SHOULD be a normal "A" or "AAAA" address record
resolution to determine the IPv4 or IPv6 address of the origin
domain, where the port used is the "xmpp-client" port of 5222 for
client-to-server connections or the "xmpp-server" port of 5269 for
server-to-server connections (these are the default ports as
registered with the IANA as described under Section 14.7).
If connections via TCP are unsuccessful, the initiating entity might
attempt to find and use alternative connection methods such as the
HTTP binding (see [XEP-0124] and [XEP-0206]), which might be
discovered using [DNS-TXT] records as described in [XEP-0156].
3.2.3. When Not to Use SRV
If the initiating entity has been explicitly configured to associate
a particular FQDN (and potentially port) with the origin domain of
the receiving entity (say, to "hardcode" an association from an
origin domain of example.net to a configured FQDN of
apps.example.com), the initiating entity is encouraged to use the
configured name instead of performing the preferred SRV resolution
process on the origin domain.
3.2.4. Use of SRV Records with Add-On Services
Many XMPP servers are implemented in such a way that they can host
add-on services (beyond those defined in this specification and
[XMPP-IM]) at DNS domain names that typically are "subdomains" of the
main XMPP service (e.g., conference.example.net for a [XEP-0045]
service associated with the example.net XMPP service) or "subdomains"
of the first-level domain of the underlying service (e.g.,
muc.example.com for a [XEP-0045] service associated with the
im.example.com XMPP service). If an entity associated with a remote
XMPP server wishes to communicate with such an add-on service, it
would generate an appropriate XML stanza and the remote server would
attempt to resolve the add-on service's DNS domain name via an SRV
lookup on resource records such as "_xmpp-
server._tcp.conference.example.net." or "_xmpp-
server._tcp.muc.example.com.". Therefore, if the administrators of
an XMPP service wish to enable entities associated with remote
servers to access such add-on services, they need to advertise the
appropriate "_xmpp-server" SRV records in addition to the "_xmpp-
server" record for their main XMPP service. In case SRV records are
not available, the fallback methods described under Section 3.2.2 can
be used to resolve the DNS domain names of add-on services.
3.3. Reconnection
It can happen that an XMPP server goes offline unexpectedly while
servicing TCP connections from connected clients and remote servers.
Because the number of such connections can be quite large, the
reconnection algorithm employed by entities that seek to reconnect
can have a significant impact on software performance and network
congestion. If an entity chooses to reconnect, it:
o SHOULD set the number of seconds that expire before reconnecting
to an unpredictable number between 0 and 60 (this helps to ensure
that not all entities attempt to reconnect at exactly the same
number of seconds after being disconnected).
o SHOULD back off increasingly on the time between subsequent
reconnection attempts (e.g., in accordance with "truncated binary
exponential backoff" as described in [ETHERNET]) if the first
reconnection attempt does not succeed.
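Informational Note: The following non-normative Python sketch shows
one way to combine the two recommendations above. The first delay
is drawn at random from 0 to 60 seconds, and the window for each
later attempt doubles up to a cap. The function name
reconnect_delays and the one-hour cap are assumptions made for
illustration; the scheme is only in the spirit of truncated binary
exponential backoff and is not the algorithm of [ETHERNET].

```python
import random
from itertools import islice
from typing import Iterator, Optional


def reconnect_delays(base: float = 60.0, max_delay: float = 3600.0,
                     rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield successive waiting times in seconds before each attempt."""
    rng = rng or random.Random()
    window = base
    while True:
        # Pick an unpredictable delay inside the current window.
        yield rng.uniform(0.0, window)
        # Double the window for the next attempt, truncated at max_delay.
        window = min(max_delay, window * 2)


if __name__ == "__main__":
    # Show the first five delays an entity might wait.
    print(list(islice(reconnect_delays(), 5)))
```

Drawing each delay at random, rather than waiting for the full
window, keeps a large population of disconnected entities from
reconnecting at the same instant.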
It is RECOMMENDED to make use of TLS session resumption [TLS-RESUME]
when reconnecting. A future version of this document, or a separate
specification, might provide more detailed guidelines regarding
methods for speeding the reconnection process.
3.4. Reliability
The use of long-lived TCP connections in XMPP implies that the
sending of XML stanzas over XML streams can be unreliable, since the
parties to a long-lived TCP connection might not discover a
connectivity disruption in a timely manner. At the XMPP application
layer, long connectivity disruptions can result in undelivered
stanzas. Although the core XMPP technology defined in this
specification does not contain features to overcome this lack of
reliability, there exist XMPP extensions for doing so.
4. XML Streams
4.1. Stream Fundamentals
Two fundamental concepts make possible the rapid, asynchronous
exchange of relatively small payloads of structured information
between XMPP entities: XML streams and XML stanzas. These terms are
defined as follows.
Definition of XML Stream: An XML stream is a container for the
exchange of XML elements between any two entities over a network.
The start of an XML stream is denoted unambiguously by an opening
"stream header" (i.e., an XML <stream> tag with appropriate
attributes and namespace declarations), while the end of the XML
stream is denoted unambiguously by a closing XML </stream> tag.
During the life of the stream, the entity that initiated it can
send an unbounded number of XML elements over the stream, either
elements used to negotiate the stream (e.g., to complete TLS
negotiation (Section 5) or SASL negotiation (Section 6)) or XML
stanzas. The "initial stream" is negotiated from the initiating
entity (typically a client or server) to the receiving entity
(typically a server), and can be seen as corresponding to the
initiating entity's "connection to" or "session with" the
receiving entity. The initial stream enables unidirectional
communication from the initiating entity to the receiving entity;
in order to enable exchange of stanzas from the receiving entity
to the initiating entity, the receiving entity MUST negotiate a
stream in the opposite direction (the "response stream").
Definition of XML Stanza: An XML stanza is the basic unit of meaning
in XMPP. A stanza is a first-level element (at depth=1 of the
stream) whose element name is "message", "presence", or "iq" and
whose qualifying namespace is 'jabber:client' or 'jabber:server'.
By contrast, a first-level element qualified by any other
namespace is not an XML stanza (stream errors, stream features,
TLS-related elements, SASL-related elements, etc.), nor is a
<message/>, <presence/>, or <iq/> element that is qualified by the
'jabber:client' or 'jabber:server' namespace but that occurs at a
depth other than one (e.g., a <message/> element contained within
an extension element (Section 8.4) for reporting purposes), nor is
a <message/>, <presence/>, or <iq/> element that is qualified by a
namespace other than 'jabber:client' or 'jabber:server'. An XML
stanza typically contains one or more child elements (with
accompanying attributes, elements, and XML character data) as
necessary in order to convey the desired information, which MAY be
qualified by any XML namespace (see [XML-NAMES] as well as
Section 8.4 in this specification).
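Informational Note: The three conditions in the definition of an
XML stanza can be expressed as a single predicate. The following
non-normative Python sketch assumes the caller already knows the
element's local name, qualifying namespace, and depth within the
stream; the function and constant names are hypothetical.

```python
STANZA_NAMES = {"message", "presence", "iq"}
CONTENT_NAMESPACES = {"jabber:client", "jabber:server"}


def is_stanza(local_name: str, namespace: str, depth: int) -> bool:
    """Return True only for first-level message, presence, or iq elements."""
    return (
        depth == 1                           # direct child of the stream root
        and local_name in STANZA_NAMES       # one of the three stanza kinds
        and namespace in CONTENT_NAMESPACES  # qualified by a content namespace
    )
```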
There are three kinds of stanzas: message, presence, and IQ (short
for "Info/Query"). These stanza types provide three different
communication primitives: a "push" mechanism for generalized
messaging, a specialized "publish-subscribe" mechanism for
broadcasting information about network availability, and a "request-
response" mechanism for more structured exchanges of data (similar to
[HTTP]). Further explanations are provided under Section 8.2.1,
Section 8.2.2, and Section 8.2.3, respectively.
Consider the example of a client's connection to a server. The
client initiates an XML stream by sending a stream header to the
server, preferably preceded by an XML declaration specifying the XML
version and the character encoding supported (see Section 11.5 and
Section 11.6). Subject to local policies and service provisioning,
the server then replies with a second XML stream back to the client,
again preferably preceded by an XML declaration. Once the client has
completed SASL negotiation (Section 6) and resource binding
(Section 7), the client can send an unbounded number of XML stanzas
over the stream. When the client desires to close the stream, it
simply sends a closing </stream> tag to the server as further
described under Section 4.4.
In essence, then, one XML stream functions as an envelope for the XML
stanzas sent during a session and another XML stream functions as an
envelope for the XML stanzas received during a session. We can
represent this in a simplistic fashion as follows.
| INITIAL STREAM | RESPONSE STREAM |
| <stream> | |
| | <stream> |
| <presence> | |
| <show/> | |
| </presence> | |
| <message to='foo'> | |
| <body/> | |
| </message> | |
| <iq to='bar' | |
| type='get'> | |
| <query/> | |
| </iq> | |
| | <iq from='bar' |
| | type='result'> |
| | <query/> |
| | </iq> |
| [ ... ] | |
| | [ ... ] |
| </stream> | |
| | </stream> |
Figure 2: A Simplistic View of Two Streams
Those who are accustomed to thinking of XML in a document-centric
manner might find the following analogies useful:
o The two XML streams are like two "documents" (matching the
"document" production from [XML]) that are built up through the
accumulation of XML stanzas.
o The root <stream/> element is like the "document entity" for each
"document" (as described in Section 4.8 of [XML]).
o The XML stanzas sent over the streams are like "fragments" of the
"documents" (as described in [XML-FRAG]).
However, these descriptions are merely analogies, because XMPP does
not deal in documents and fragments but in streams and stanzas.
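Informational Note: Because a stream is an XML document that stays
open for the life of the session, a receiver typically parses it
incrementally instead of waiting for the closing tag. The following
non-normative Python sketch uses the standard library's
XMLPullParser to hand back each first-level element as soon as it is
complete. The class name StreamReader is hypothetical; the sketch
ignores namespaces, never discards processed elements, and, depending
on the underlying parser version, might report elements split across
chunks only after a later call to feed.

```python
import xml.etree.ElementTree as ET
from typing import List


class StreamReader:
    """Collect complete first-level elements from an open XML stream."""

    def __init__(self) -> None:
        self._parser = ET.XMLPullParser(events=("start", "end"))
        self._open_elements = 0  # 1 while only the stream root is open
        self.closed = False

    def feed(self, data: str) -> List[ET.Element]:
        """Feed received text; return the elements completed by it."""
        self._parser.feed(data)
        completed = []
        for event, element in self._parser.read_events():
            if event == "start":
                self._open_elements += 1
            else:
                self._open_elements -= 1
                if self._open_elements == 1:
                    # A direct child of the stream root has ended.
                    completed.append(element)
                elif self._open_elements == 0:
                    # The stream root itself has been closed.
                    self.closed = True
        return completed
```

A caller would pass each returned element to its stanza or
negotiation handling and stop reading once the closed flag is set.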
The remainder of this section defines the following aspects of XML
streams (along with related topics):
o How to open a stream (Section 4.2)
o The stream negotiation process (Section 4.3)
o How to close a stream (Section 4.4)
o The directionality of XML streams (Section 4.5)
o How to handle peers that are silent (Section 4.6)
o The XML attributes of a stream (Section 4.7)
o The XML namespaces of a stream (Section 4.8)
o Error handling related to XML streams (Section 4.9)
4.2. Opening a Stream
After connecting to the appropriate IP address and port of the
receiving entity, the initiating entity opens a stream by sending a
stream header (the "initial stream header") to the receiving entity.
I: <?xml version='1.0'?>
The receiving entity then replies by sending a stream header of its
own (the "response stream header") to the initiating entity.
R: <?xml version='1.0'?>
The entities can then proceed with the remainder of the stream
negotiation process.
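Informational Note: The following non-normative Python sketch
assembles an initial stream header as a string. The function name
initial_stream_header is hypothetical. The 'stream' prefix, the
stream namespace name, and the 'jabber:client' default shown here are
assumptions drawn from the wider specification rather than from this
section, so the normative sections on stream attributes and
namespaces take precedence.

```python
from typing import Optional
from xml.sax.saxutils import quoteattr

# Assumed stream namespace name; see the lead-in note.
STREAM_NAMESPACE = "http://etherx.jabber.org/streams"


def initial_stream_header(to_domain: str,
                          from_jid: Optional[str] = None,
                          content_namespace: str = "jabber:client") -> str:
    """Return an XML declaration followed by an opening stream tag."""
    attributes = []
    if from_jid is not None:
        attributes.append(("from", from_jid))
    attributes += [
        ("to", to_domain),
        ("version", "1.0"),
        ("xmlns", content_namespace),
        ("xmlns:stream", STREAM_NAMESPACE),
    ]
    # quoteattr escapes and quotes each attribute value.
    rendered = " ".join(f"{name}={quoteattr(value)}"
                        for name, value in attributes)
    # The tag is deliberately left open: the stream stays open until
    # one party sends the closing tag.
    return f"<?xml version='1.0'?><stream:stream {rendered}>"
```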
4.3. Stream Negotiation
4.3.1. Basic Concepts
Because the receiving entity for a stream acts as a gatekeeper to the
domains it services, it imposes certain conditions for connecting as
a client or as a peer server. At a minimum, the initiating entity
needs to authenticate with the receiving entity before it is allowed
to send stanzas to the receiving entity (for client-to-server streams
this means using SASL as described under Section 6). However, the
receiving entity can consider conditions other than authentication to
be mandatory-to-negotiate, such as encryption using TLS as described
under Section 5. The receiving entity informs the initiating entity
about such conditions by communicating "stream features": the set of
particular protocol interactions that the initiating entity needs to
complete before the receiving entity will accept XML stanzas from the
initiating entity, as well as any protocol interactions that are
voluntary-to-negotiate but that might improve the handling of an XML
stream (e.g., establishment of application-layer compression as
described in [XEP-0138]).
The existence of conditions for connecting implies that streams need
to be negotiated. The order of layers (TCP, then TLS, then SASL,
then XMPP as described under Section 13.3) implies that stream
negotiation is a multi-stage process. Further structure is imposed
by two factors: (1) a given stream feature might be offered only to
certain entities or only after certain other features have been
negotiated (e.g., resource binding is offered only after SASL
authentication), and (2) stream features can be either mandatory-to-
negotiate or voluntary-to-negotiate. Finally, for security reasons
the parties to a stream need to discard knowledge that they gained
during the negotiation process after successfully completing the
protocol interactions defined for certain features (e.g., TLS in all
cases and SASL in the case when a security layer might be
established, as defined in the specification for the relevant SASL
mechanism). This is done by flushing the old stream context and
exchanging new stream headers over the existing TCP connection.
4.3.2. Stream Features Format
If the initiating entity includes in the initial stream header the
'version' attribute set to a value of at least "1.0" (see
Section 4.7.5), after sending the response stream header the
receiving entity MUST send a <features/> child element (typically
prefixed by the stream namespace prefix as described under
Section 4.8.5) to the initiating entity in order to announce any
conditions for continuation of the stream negotiation process. Each
condition takes the form of a child element of the <features/>
element, qualified by a namespace that is different from the stream
namespace and the content namespace. The <features/> element can
contain one child, contain multiple children, or be empty.
Implementation Note: The order of child elements contained in any
given <features/> element is not significant.
If a particular stream feature is or can be mandatory-to-negotiate,
the definition of that feature needs to do one of the following:
1. Declare that the feature is always mandatory-to-negotiate (e.g.,
this is true of resource binding for XMPP clients); or
2. Specify a way for the receiving entity to flag the feature as
mandatory-to-negotiate for this interaction (e.g., for STARTTLS,
this is done by including an empty <required/> element in the
advertisement for that stream feature, but that is not a generic
format for all stream features); it is RECOMMENDED that stream
feature definitions for new mandatory-to-negotiate features do so
by including an empty <required/> element as is done for
STARTTLS.
Informational Note: Because there is no generic format for
indicating that a feature is mandatory-to-negotiate, it is
possible that a feature that is not understood by the initiating
entity might be considered mandatory-to-negotiate by the receiving
entity, resulting in failure of the stream negotiation process.
Although such an outcome would be undesirable, the working group
deemed it rare enough that a generic format was not needed.
For security reasons, certain stream features require the
initiating entity to send a new initial stream header upon successful
negotiation of the feature (e.g., TLS in all cases and SASL in the
case when a security layer might be established). If this is true of
a given stream feature, the definition of that feature needs to
specify that a stream restart is expected after negotiation of the
feature.
A <features/> element that contains at least one mandatory-to-
negotiate feature indicates that the stream negotiation is not
complete and that the initiating entity MUST negotiate further
features.
A <features/> element MAY contain more than one mandatory-to-
negotiate feature. This means that the initiating entity can choose
among the mandatory-to-negotiate features at this stage of the stream
negotiation process. As an example, perhaps a future technology will
perform roughly the same function as TLS, so the receiving entity
might advertise support for both TLS and the future technology at the
same stage of the stream negotiation process. However, this applies
only at a given stage of the stream negotiation process and does not
apply to features that are mandatory-to-negotiate at different stages
(e.g., the receiving entity would not advertise both STARTTLS and
SASL as mandatory-to-negotiate, or both SASL and resource binding as
mandatory-to-negotiate, because TLS would need to be negotiated
before SASL and because SASL would need to be negotiated before
resource binding).
A <features/> element that contains both mandatory-to-negotiate and
voluntary-to-negotiate features indicates that the negotiation is not
complete but that the initiating entity MAY complete the voluntary-
to-negotiate feature(s) before it attempts to negotiate the
mandatory-to-negotiate feature(s).
A <features/> element that contains only voluntary-to-negotiate
features indicates that the stream negotiation is complete and that
the initiating entity is cleared to send XML stanzas, but that the
initiating entity MAY negotiate further features if desired.
An empty <features/> element indicates that the stream negotiation is
complete and that the initiating entity is cleared to send XML
stanzas.
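Informational Note: The rules above for interpreting a <features/>
element reduce to a small decision table. The following
non-normative Python sketch assumes that the initiating entity has
already determined, for each advertised feature, whether it is
mandatory-to-negotiate; the names Feature and negotiation_status and
the returned labels are hypothetical.

```python
from typing import Iterable, NamedTuple


class Feature(NamedTuple):
    name: str
    mandatory: bool


def negotiation_status(features: Iterable[Feature]) -> str:
    """Classify one <features/> advertisement."""
    offered = list(features)
    if any(feature.mandatory for feature in offered):
        # At least one mandatory feature: negotiation is not complete.
        return "incomplete"
    if offered:
        # Only voluntary features: stanzas may be sent, and further
        # features may still be negotiated.
        return "complete-optional"
    # Empty <features/>: negotiation is complete.
    return "complete"
```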
4.3.3. Restarts
On successful negotiation of a feature that necessitates a stream
restart, both parties MUST consider the previous stream to be
replaced but MUST NOT send a closing </stream> tag and MUST NOT
terminate the underlying TCP connection; instead, the parties MUST
reuse the existing connection, which might be in a new state (e.g.,
encrypted as a result of TLS negotiation). The initiating entity
then MUST send a new initial stream header, which SHOULD be preceded
by an XML declaration as described under Section 11.5. When the
receiving entity receives the new initial stream header, it MUST
generate a new stream ID (instead of reusing the old stream ID)
before sending a new response stream header (which SHOULD be preceded
by an XML declaration as described under Section 11.5).
4.3.4. Resending Features
The receiving entity MUST send an updated list of stream features to
the initiating entity after a stream restart. The list of updated
features MAY be empty if there are no further features to be
advertised or MAY include any combination of features.
4.3.5. Completion of Stream Negotiation
The receiving entity indicates completion of the stream negotiation
process by sending to the initiating entity either an empty
<features/> element or a <features/> element that contains only
voluntary-to-negotiate features. After doing so, the receiving
entity MAY send an empty <features/> element (e.g., after negotiation
of such voluntary-to-negotiate features) but MUST NOT send additional
stream features to the initiating entity (if the receiving entity has
new features to offer, preferably limited to mandatory-to-negotiate
or security-critical features, it can simply close the stream with a
<reset/> stream error (Section 4.9.3.16) and then advertise the new
features when the initiating entity reconnects, preferably closing
existing streams in a staggered way so that not all of the initiating
entities reconnect at once). Once stream negotiation is complete,
the initiating entity is cleared to send XML stanzas over the stream
for as long as the stream is maintained by both parties.
Informational Note: Resource binding as specified under Section 7
is an historical exception to the foregoing rule, since it is
mandatory-to-negotiate for clients but uses XML stanzas for
negotiation purposes.
The initiating entity MUST NOT attempt to send XML stanzas
(Section 8) to entities other than itself (i.e., the client's
connected resource or any other authenticated resource of the
client's account) or the server to which it is connected until stream
negotiation has been completed. Even if the initiating entity does
attempt to do so, the receiving entity MUST NOT accept such stanzas
and MUST close the stream with a <not-authorized/> stream error
(Section 4.9.3.12). This rule applies to XML stanzas only (i.e.,
<message/>, <presence/>, and <iq/> elements qualified by the content
namespace) and not to XML elements used for stream negotiation (e.g.,
elements used to complete TLS negotiation (Section 5) or SASL
negotiation (Section 6)).
4.3.6. Determination of Addresses
After the parties to an XML stream have completed the appropriate
aspects of stream negotiation, the receiving entity for a stream MUST
determine the initiating entity's JID.
For client-to-server communication, both SASL negotiation (Section 6)
and resource binding (Section 7) MUST be completed before the server
can determine the client's address. The client's bare JID
(<localpart@domainpart>) MUST be the authorization identity (as
defined by [SASL]), either (1) as directly communicated by the client
during SASL negotiation (Section 6) or (2) as derived by the server
from the authentication identity if no authorization identity was
specified during SASL negotiation. The resourcepart of the full JID
(<localpart@domainpart/resourcepart>) MUST be the resource negotiated
by the client and server during resource binding (Section 7). A
client MUST NOT attempt to guess at its JID but instead MUST consider
its JID to be whatever the server returns to it during resource
binding. The server MUST ensure that the resulting JID (including
localpart, domainpart, resourcepart, and separator characters)
conforms to the canonical format for XMPP addresses defined in
[XMPP-ADDR]; to meet this restriction, the server MAY replace the JID
sent by the client with the canonicalized JID as determined by the
server and communicate that JID to the client during resource
binding.
For server-to-server communication, the initiating server's bare JID
(<domainpart>) MUST be the authorization identity (as defined by
[SASL]), either (1) as directly communicated by the initiating server
during SASL negotiation (Section 6) or (2) as derived by the
receiving server from the authentication identity if no authorization
identity was specified during SASL negotiation. In the absence of
SASL negotiation, the receiving server MAY consider the authorization
identity to be an identity negotiated within the relevant
verification protocol (e.g., the 'from' attribute of the <result/>
element in Server Dialback [XEP-0220]).
Security Warning: Because it is possible for a third party to
tamper with information that is sent over the stream before a
security layer such as TLS is successfully negotiated, it is
advisable for the receiving server to treat any such unprotected
information with caution; this applies especially to the 'from'
and 'to' addresses on the first initial stream header sent by the
initiating entity.
4.3.7. Flow Chart
We summarize the foregoing rules in the following non-normative flow
chart for the stream negotiation process, presented from the
perspective of the initiating entity.
4.4. Closing a Stream
An XML stream from one entity to another can be closed at any time,
either because a specific stream error (Section 4.9) has occurred or
in the absence of an error (e.g., when a client simply ends its
session).
A stream is closed by sending a closing </stream> tag.
If the parties are using either two streams over a single TCP
connection or two streams over two TCP connections, the entity that
sends the closing stream tag MUST behave as follows:
1. Wait for the other party to also close its outbound stream before
terminating the underlying TCP connection(s); this gives the
other party an opportunity to finish transmitting any outbound
data to the closing entity before the termination of the TCP
connection(s).
2. Refrain from sending any further data over its outbound stream to
the other entity, but continue to read data received from the
other entity (and, if necessary, process such data).
3. Consider both streams to be void if the other party does not send
its closing stream tag within a reasonable amount of time (where
the definition of "reasonable" is a matter of implementation or
deployment).
4. After receiving a reciprocal closing stream tag from the other
party or waiting a reasonable amount of time with no response,
terminate the underlying TCP connection(s).
Security Warning: In accordance with Section 7.2.1 of [TLS], to
help prevent a truncation attack the party that is closing the
stream MUST send a TLS close_notify alert and MUST receive a
responding close_notify alert from the other party before
terminating the underlying TCP connection(s).
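The closing handshake above can be sketched in Python as follows. This is a minimal, non-normative illustration: the send, receive, and terminate callables are hypothetical hooks supplied by the caller, the closing tag is assumed to be serialized as "</stream:stream>", and the TLS close_notify exchange described in the Security Warning is not modeled.

```python
import time

CLOSE_TAG = "</stream:stream>"  # assumed serialization of the closing tag

def close_stream(send, receive, terminate, timeout=5.0, clock=time.monotonic):
    """Close the outbound stream, then wait for the peer to close its own.

    send(data), receive(max_wait), and terminate() are hypothetical
    callables; receive returns None when nothing arrives in time.
    Returns (data received while waiting, whether the peer closed in time).
    """
    send(CLOSE_TAG)                 # step 2: nothing further is sent after this
    deadline = clock() + timeout
    inbound = []
    peer_closed = False
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            break                   # step 3: consider both streams void
        chunk = receive(remaining)
        if chunk is None:
            break                   # step 3: no response within the timeout
        if chunk == CLOSE_TAG:
            peer_closed = True      # step 1: the peer closed its stream too
            break
        inbound.append(chunk)       # step 2: keep handling inbound data
    terminate()                     # step 4: drop the TCP connection(s)
    return inbound, peer_closed
```

The timeout default is arbitrary; as the text notes, what counts as "reasonable" is left to the implementation or deployment.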
If the parties are using multiple streams over multiple TCP
connections, there is no defined pairing of streams and therefore the
behavior is a matter for implementation.
4.5. Directionality
An XML stream is always unidirectional, by which is meant that XML
stanzas can be sent in only one direction over the stream (either
from the initiating entity to the receiving entity or from the
receiving entity to the initiating entity).
Depending on the type of session that has been negotiated and the
nature of the entities involved, the entities might use:
o Two streams over a single TCP connection, where the security
context negotiated for the first stream is applied to the second
stream. This is typical for client-to-server sessions, and a
server MUST allow a client to use the same TCP connection for both
streams.
o Two streams over two TCP connections, where each stream is
separately secured. In this approach, one TCP connection is used
for the stream in which stanzas are sent from the initiating
entity to the receiving entity, and the other TCP connection is
used for the stream in which stanzas are sent from the receiving
entity to the initiating entity. This is typical for
server-to-server sessions.
o Multiple streams over two or more TCP connections, where each
stream is separately secured. This approach is sometimes used for
server-to-server communication between two large XMPP service
providers; however, this can make it difficult to maintain
coherence of data received over multiple streams in situations
described under Section 10.1, which is why a server MAY close the
stream with a <conflict/> stream error (Section 4.9.3.3) if a
remote server attempts to negotiate more than one stream (as
described under Section 22.214.171.124).
This concept of directionality applies only to stanzas and explicitly
does not apply to first-level children of the stream root that are
used to bootstrap or manage the stream (e.g., first-level elements
used for TLS negotiation, SASL negotiation, Server Dialback
[XEP-0220], and Stream Management [XEP-0198]).
The foregoing considerations imply that while completing STARTTLS
negotiation (Section 5) and SASL negotiation (Section 6) two servers
would use one TCP connection, but after the stream negotiation
process is done that original TCP connection would be used only for
the initiating server to send XML stanzas to the receiving server.
In order for the receiving server to send XML stanzas to the
initiating server, the receiving server would need to reverse the
roles and negotiate an XML stream from the receiving server to the
initiating server over a separate TCP connection. This separate TCP
connection is then secured using a new round of TLS and/or SASL
negotiation.
Implementation Note: For historical reasons, a server-to-server
session always uses two TCP connections. While that approach
remains the standard behavior described in this document,
extensions such as [XEP-0288] enable servers to negotiate the use
of a single TCP connection for bidirectional stanza exchange.
Informational Note: Although XMPP developers sometimes apply the
terms "unidirectional" and "bidirectional" to the underlying TCP
connection (e.g., calling the TCP connection for a client-to-
server session "bidirectional" and the TCP connection for a
server-to-server session "unidirectional"), strictly speaking a
stream is always unidirectional (because the initiating entity and
receiving entity always have a minimum of two streams, one in each
direction) and a TCP connection is always bidirectional (because
TCP traffic can be sent in both directions). Directionality
applies to the application-layer traffic sent over the TCP
connection, not to the transport-layer traffic sent over the TCP
connection.
4.6. Handling of Silent Peers
When an entity that is a party to a stream has not received any XMPP
traffic from its stream peer for some period of time, the peer might
appear to be silent. There are several reasons why this might
happen:
1. The underlying TCP connection is dead.
2. The XML stream is broken despite the fact that the underlying TCP
connection is alive.
3. The peer is idle and simply has not sent any XMPP traffic over
its XML stream to the entity.
These three conditions are best handled separately, as described in
the following sections.
Implementation Note: For the purpose of handling silent peers, we
treat two unidirectional TCP connections as conceptually
equivalent to a single bidirectional TCP connection (see
Section 4.5); however, implementers need to be aware that, in the
case of two unidirectional TCP connections, responses to traffic
at the XMPP application layer will come back from the peer on the
second TCP connection. In addition, the use of multiple streams
in each direction (which is a somewhat frequent deployment choice
for server-to-server connectivity among large XMPP service
providers) further complicates application-level checking of XMPP
streams and their underlying TCP connections, because there is no
necessary correlation between any given initial stream and any
given response stream.
4.6.1. Dead Connection
If the underlying TCP connection is dead, stream-level checks (e.g.,
[XEP-0199] and [XEP-0198]) are ineffective. Therefore, it is
unnecessary to close the stream with or without an error, and it is
appropriate instead to simply terminate the TCP connection.
One common method for checking the TCP connection is to send a space
character (U+0020) between XML stanzas, which is allowed for XML
streams as described under Section 11.7; the sending of such a space
character is properly called a "whitespace keepalive" (the term
"whitespace ping" is often used, despite the fact that it is not a
ping since no "pong" is possible). However, this is not allowed
during TLS negotiation or SASL negotiation, as described under
Section 5.3.3 and Section 6.3.5.
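A whitespace keepalive might be sent as in the following Python sketch, offered as a non-normative illustration. The function name and the negotiating flag are hypothetical; the caller is assumed to track whether TLS or SASL negotiation is in progress and to call the function only between stanzas.

```python
import socket

def send_whitespace_keepalive(sock, negotiating):
    """Send one U+0020 between stanzas; skip during TLS or SASL negotiation.

    Returns True if the space was written and False if it was withheld.
    """
    if negotiating:
        return False
    try:
        sock.sendall(b" ")
    except OSError:
        # The write failed, so treat the TCP connection as dead and
        # terminate it without attempting to close the stream.
        sock.close()
        raise
    return True
```

A successful write shows only that the local TCP stack accepted the byte; a dead connection typically surfaces as an error on a later write, which is one reason the keepalive is not a true ping.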
4.6.2. Broken Stream
Even if the underlying TCP connection is alive, the peer might never
respond to XMPP traffic that the entity sends, whether normal stanzas
or specialized stream-checking traffic such as the application-level
pings defined in [XEP-0199] or the more comprehensive Stream
Management protocol defined in [XEP-0198]. In this case, it is
appropriate for the entity to close a broken stream with a
<connection-timeout/> stream error (Section 4.9.3.4).
4.6.3. Idle Peer
Even if the underlying TCP connection is alive and the stream is not
broken, the peer might have sent no stanzas for a certain period of
time. In this case, the peer itself MAY close the stream (as
described under Section 4.4) rather than leaving an unused stream
open. If the idle peer does not close the stream, the other party
MAY either close the stream using the handshake described under
Section 4.4 or close the stream with a stream error (e.g.,
<resource-constraint/> (Section 4.9.3.17) if the entity has reached a
limit on the number of open TCP connections or <policy-violation/>
(Section 4.9.3.14) if the connection has exceeded a local timeout
policy). However, consistent with the order of layers (specified
under Section 13.3), the other party is advised to verify that the
underlying TCP connection is alive and the stream is unbroken (as
described above) before concluding that the peer is idle.
Furthermore, it is preferable to be liberal in accepting idle peers,
since experience has shown that doing so improves the reliability of
communication over XMPP networks and that it is typically more
efficient to maintain a stream between two servers than to
aggressively time out such a stream.
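The order in which the three conditions are ruled out can be summarized in a short Python sketch. This is a non-normative illustration; the enumeration and function names are hypothetical, and the two boolean inputs stand for the results of whatever connection-level and stream-level checks an implementation uses.

```python
from enum import Enum

class Action(Enum):
    TERMINATE_TCP = "terminate the TCP connection without closing the stream"
    CLOSE_WITH_TIMEOUT = "close the stream with <connection-timeout/>"
    TREAT_AS_IDLE = "keep the stream open, or close it per local policy"

def classify_silent_peer(tcp_alive, stream_responsive):
    """Map the conditions of Sections 4.6.1 to 4.6.3 to a suggested action."""
    if not tcp_alive:
        return Action.TERMINATE_TCP        # dead connection
    if not stream_responsive:
        return Action.CLOSE_WITH_TIMEOUT   # broken stream
    return Action.TREAT_AS_IDLE            # idle peer
```

The lower layer is checked first, so a peer is treated as idle only after the TCP connection and the stream have both been found healthy.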
4.6.4. Use of Checking Methods
Implementers are advised to support whichever stream-checking and
connection-checking methods they deem appropriate, but to carefully
weigh the network impact of such methods against the benefits of
discovering broken streams and dead TCP connections in a timely
manner. The length of time between the use of any particular check
is very much a matter of local service policy and depends strongly on
the network environment and usage scenarios of a given deployment and
connection type. At the time of writing, it is RECOMMENDED that any
such check be performed not more than once every 5 minutes and that,
ideally, such checks will be initiated by clients rather than
servers. Those who implement XMPP software and deploy XMPP services
are encouraged to seek additional advice regarding appropriate timing
of stream-checking and connection-checking methods, particularly when
power-constrained devices are being used (e.g., in mobile
environments).
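One way to respect the recommended minimum spacing between checks is a small throttle, sketched below in Python as a non-normative illustration. The class name and the injectable clock are hypothetical conveniences; only the five-minute figure comes from the text above.

```python
import time

MIN_CHECK_INTERVAL = 5 * 60  # seconds; the RECOMMENDED minimum spacing

class CheckThrottle:
    """Allow a stream or connection check at most once per interval."""

    def __init__(self, interval=MIN_CHECK_INTERVAL, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_check = None

    def may_check(self):
        """Return True and record the time if a check is allowed now."""
        now = self.clock()
        if self.last_check is not None and now - self.last_check < self.interval:
            return False
        self.last_check = now
        return True
```

A deployment with power-constrained clients would likely choose a longer interval than this floor.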
4.7. Stream Attributes
The attributes of the root <stream/> element are defined in the
following sections.
Security Warning: Until and unless the confidentiality and
integrity of the stream are protected via TLS as described under
Section 5 or an equivalent security layer (such as the SASL GSSAPI
mechanism), the attributes provided in a stream header could be
tampered with by an attacker.
Implementation Note: The attributes of the root <stream/> element
are not prepended by a namespace prefix because, as explained in
[XML-NAMES], "[d]efault namespace declarations do not apply
directly to attribute names; the interpretation of unprefixed
attributes is determined by the element on which they appear."
4.7.1. from
The 'from' attribute specifies an XMPP identity of the entity sending
the stream element.
For initial stream headers in client-to-server communication, the
'from' attribute is the XMPP identity of the principal controlling
the client, i.e., a JID of the form <localpart@domainpart>. The
client might not know the XMPP identity, e.g., because the XMPP
identity is assigned at a level other than the XMPP application layer
(as in the Generic Security Service Application Program Interface
[GSS-API]) or is derived by the server from information provided by
the client (as in some deployments of end-user certificates with the
SASL EXTERNAL mechanism). Furthermore, if the client considers the
XMPP identity to be private information then it is advised not to
include a 'from' attribute before the confidentiality and integrity
of the stream are protected via TLS or an equivalent security layer.
However, if the client knows the XMPP identity then it SHOULD include
the 'from' attribute after the confidentiality and integrity of the
stream are protected via TLS or an equivalent security layer.
I: <?xml version='1.0'?>
For initial stream headers in server-to-server communication, the
'from' attribute is one of the configured FQDNs of the server, i.e.,
a JID of the form <domainpart>. The initiating server might have
more than one XMPP identity, e.g., in the case of a server that
provides virtual hosting, so it will need to choose an identity that
is associated with this output stream (e.g., based on the 'to'
attribute of the stanza that triggered the stream negotiation
attempt). Because a server is a "public entity" on the XMPP network,
it MUST include the 'from' attribute after the confidentiality and
integrity of the stream are protected via TLS or an equivalent
security layer.
I: <?xml version='1.0'?>
For response stream headers in both client-to-server and server-to-
server communication, the receiving entity MUST include the 'from'
attribute and MUST set its value to one of the receiving entity's
FQDNs (which MAY be an FQDN other than that specified in the 'to'
attribute of the initial stream header, as described under
Section 220.127.116.11 and Section 18.104.22.168).
R: <?xml version='1.0'?>
Whether or not the 'from' attribute is included, each entity MUST
verify the identity of the other entity before exchanging XML stanzas
with it, as described under Section 13.5.
Interoperability Note: It is possible that implementations based
on [RFC3920] will not include the 'from' address on any stream
headers (even ones whose confidentiality and integrity are
protected); an entity SHOULD be liberal in accepting such stream
headers.
4.7.2. to
For initial stream headers in both client-to-server and server-to-
server communication, the initiating entity MUST include the 'to'
attribute and MUST set its value to a domainpart that the initiating
entity knows or expects the receiving entity to service. (The same
information can be provided in other ways, such as a Server Name
Indication during TLS negotiation as described in [TLS-EXT].)
I: <?xml version='1.0'?>
For response stream headers in client-to-server communication, if the
client included a 'from' attribute in the initial stream header then
the server MUST include a 'to' attribute in the response stream
header and MUST set its value to the bare JID specified in the 'from'
attribute of the initial stream header. If the client did not
include a 'from' attribute in the initial stream header then the
server MUST NOT include a 'to' attribute in the response stream
header.
R: <?xml version='1.0'?>
For response stream headers in server-to-server communication, the
receiving entity MUST include a 'to' attribute in the response stream
header and MUST set its value to the domainpart specified in the
'from' attribute of the initial stream header.
R: <?xml version='1.0'?>
Whether or not the 'to' attribute is included, each entity MUST
verify the identity of the other entity before exchanging XML stanzas
with it, as described under Section 13.5.
Interoperability Note: It is possible that implementations based
on [RFC3920] will not include the 'to' address on stream headers;
an entity SHOULD be liberal in accepting such stream headers.
4.7.3. id
The 'id' attribute specifies a unique identifier for the stream,
called a "stream ID". The stream ID MUST be generated by the
receiving entity when it sends a response stream header and MUST be
unique within the receiving application (normally a server).
Security Warning: The stream ID MUST be both unpredictable and
non-repeating because it can be security-critical when reused by
an authentication mechanism, as is the case for Server Dialback
[XEP-0220] and the "XMPP 0.9" authentication mechanism used before
RFC 3920 defined the use of SASL in XMPP; for recommendations
regarding randomness for security purposes, see [RANDOM].
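A receiving entity might generate stream IDs as in the following Python sketch, offered as a non-normative illustration. The class name is hypothetical, and the 16-byte token length is an assumption rather than a requirement of this specification.

```python
import secrets

class StreamIdFactory:
    """Issue stream IDs that are unpredictable and never repeat in-process."""

    def __init__(self):
        self._issued = set()

    def new_id(self):
        while True:
            # 16 random bytes from the operating system's CSPRNG, encoded
            # with characters that are safe inside an XML attribute value.
            candidate = secrets.token_urlsafe(16)
            if candidate not in self._issued:
                self._issued.add(candidate)
                return candidate
```

The set of issued IDs grows for the lifetime of the process; a long-running server would need some other way to bound that memory while still avoiding reuse.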
For initial stream headers, the initiating entity MUST NOT include
the 'id' attribute; however, if the 'id' attribute is included, the
receiving entity MUST ignore it.
For response stream headers, the receiving entity MUST include the
'id' attribute.
R: <?xml version='1.0'?>
Interoperability Note: In RFC 3920, the text regarding inclusion
of the 'id' attribute was ambiguous, leading some implementations
to leave the attribute off the response stream header.
4.7.4. xml:lang
The 'xml:lang' attribute specifies an entity's preferred or default
language for any human-readable XML character data to be sent over
the stream (an XML stanza can also possess an 'xml:lang' attribute,
as discussed under Section 8.1.5). The syntax of this attribute is
defined in Section 2.12 of [XML]; in particular, the value of the
'xml:lang' attribute MUST conform to the NMTOKEN datatype (as defined
in Section 2.3 of [XML]) and MUST conform to the language identifier
format defined in [LANGTAGS].
For initial stream headers, the initiating entity SHOULD include the
'xml:lang' attribute.
I: <?xml version='1.0'?>
For response stream headers, the receiving entity MUST include the
'xml:lang' attribute. The following rules apply:
o If the initiating entity included an 'xml:lang' attribute in its
initial stream header and the receiving entity supports that
language in the human-readable XML character data that it
generates and sends to the initiating entity (e.g., in the <text/>
element for stream and stanza errors), the value of the 'xml:lang'
attribute MUST be the identifier for the initiating entity's
preferred language (e.g., "de-CH").
o If the receiving entity supports a language that matches the
initiating entity's preferred language according to the "lookup
scheme" specified in Section 3.4 of [LANGMATCH] (e.g., "de"
instead of "de-CH"), then the value of the 'xml:lang' attribute
SHOULD be the identifier for the matching language.
o If the receiving entity does not support the initiating entity's
preferred language or a matching language according to the lookup
scheme (or if the initiating entity did not include the 'xml:lang'
attribute in its initial stream header), then the value of the
'xml:lang' attribute MUST be the identifier for the default
language of the receiving entity (e.g., "en").
R: <?xml version='1.0'?>
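The three rules for choosing the 'xml:lang' value of a response stream header can be sketched in Python as follows. This is a non-normative illustration with a hypothetical function name; it implements a simplified form of the [LANGMATCH] lookup scheme (progressive truncation of subtags) and compares tags without regard to case.

```python
def choose_response_lang(requested, supported, default):
    """Pick the 'xml:lang' value for a response stream header.

    requested: the initiating entity's 'xml:lang', or None if absent.
    supported: language tags the receiving entity can generate text in.
    default:   the receiving entity's default language.
    """
    by_lower = {tag.lower(): tag for tag in supported}
    if requested:
        candidate = requested.lower()
        while candidate:
            if candidate in by_lower:
                return by_lower[candidate]   # exact or truncated match
            # Lookup scheme: drop the last subtag, along with any
            # single-character subtag left dangling at the end.
            head, _, _ = candidate.rpartition("-")
            while len(head) >= 2 and head[-2] == "-":
                head = head[:-2]
            candidate = head
    return default                           # no match, or nothing requested
```

With supported languages "en" and "de" and a default of "en", a request for "de-CH" would yield "de" and a request for "fr" would yield "en".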
If the initiating entity included the 'xml:lang' attribute in its
initial stream header, the receiving entity SHOULD remember that
value as the default xml:lang for all stanzas sent by the initiating
entity over the current stream. As described under Section 8.1.5,
the initiating entity MAY include the 'xml:lang' attribute in any XML
stanzas it sends over the stream. If the initiating entity does not
include the 'xml:lang' attribute in any such stanza, the receiving
entity SHOULD add the 'xml:lang' attribute to the stanza when routing
it to a remote server or delivering it to a connected client, where
the value of the attribute MUST be the identifier for the language
preferred by the initiating entity (even if the receiving entity does
not support that language for human-readable XML character data it
generates and sends to the initiating entity, such as in stream or
stanza errors). If the initiating entity includes the 'xml:lang'
attribute in any such stanza, the receiving entity MUST NOT modify or
delete it when routing it to a remote server or delivering it to a
connected client.
4.7.5. version
The inclusion of the version attribute set to a value of at least
"1.0" signals support for the stream-related protocols defined in
this specification, including TLS negotiation (Section 5), SASL
negotiation (Section 6), stream features (Section 4.3.2), and stream
errors (Section 4.9).
The version of XMPP specified in this specification is "1.0"; in
particular, XMPP 1.0 encapsulates the stream-related protocols as
well as the basic semantics of the three defined XML stanza types
(<message/>, <presence/>, and <iq/> as described under Sections
8.2.1, 8.2.2, and 8.2.3, respectively).
The numbering scheme for XMPP versions is "<major>.<minor>". The
major and minor numbers MUST be treated as separate integers and each
number MAY be incremented higher than a single digit. Thus, "XMPP
2.4" would be a lower version than "XMPP 2.13", which in turn would
be lower than "XMPP 12.3". Leading zeros (e.g., "XMPP 6.01") MUST be
ignored by recipients and MUST NOT be sent.
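The numbering scheme can be illustrated with a short Python sketch that parses a version string into two integers so that versions compare numerically. This is a non-normative example and the function name is hypothetical.

```python
import re

_VERSION = re.compile(r"([0-9]+)\.([0-9]+)")

def parse_version(text):
    """Parse "<major>.<minor>" into a (major, minor) tuple of integers."""
    match = _VERSION.fullmatch(text)
    if match is None:
        raise ValueError("not a <major>.<minor> version: %r" % text)
    # int() discards leading zeros, so "6.01" is read as (6, 1).
    return int(match.group(1)), int(match.group(2))
```

Tuples compare element by element, so parse_version("2.4") sorts below parse_version("2.13"), whereas the raw strings would sort the other way around.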
The major version number will be incremented only if the stream and
stanza formats or obligatory actions have changed so dramatically
that an older version entity would not be able to interoperate with a
newer version entity if it simply ignored the elements and attributes
it did not understand and took the actions defined in the older
specification.
The minor version number will be incremented only if significant new
capabilities have been added to the core protocol (e.g., a newly
defined value of the 'type' attribute for message, presence, or IQ
stanzas). The minor version number MUST be ignored by an entity with
a smaller minor version number, but MAY be used for informational
purposes by the entity with the larger minor version number (e.g.,
the entity with the larger minor version number would simply note
that its correspondent would not be able to understand that value of
the 'type' attribute and therefore would not send it).
The following rules apply to the generation and handling of the
'version' attribute within stream headers:
1. The initiating entity MUST set the value of the 'version'
attribute in the initial stream header to the highest version
number it supports (e.g., if the highest version number it
supports is that defined in this specification, it MUST set the
value to "1.0").
2. The receiving entity MUST set the value of the 'version'
attribute in the response stream header to either the value
supplied by the initiating entity or the highest version number
supported by the receiving entity, whichever is lower. The
receiving entity MUST perform a numeric comparison on the major
and minor version numbers, not a string match on
"<major>.<minor>".
3. If the version number included in the response stream header is
at least one major version lower than the version number included
in the initial stream header and newer version entities cannot
interoperate with older version entities as described, the
initiating entity SHOULD close the stream with an
<unsupported-version/> stream error (Section 4.9.3.25).
4. If either entity receives a stream header with no 'version'
attribute, the entity MUST consider the version supported by the
other entity to be "0.9" and SHOULD NOT include a 'version'
attribute in the response stream header.
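Rules 2 and 4 above, as applied by the receiving entity, might look like the following Python sketch. It is a non-normative illustration: the function name is hypothetical, the default of (1, 0) assumes a receiver that implements only this specification, and a return value of None stands for omitting the attribute.

```python
def response_version(initial_version, highest_supported=(1, 0)):
    """Return the 'version' value for a response header, or None to omit it.

    initial_version: the 'version' string from the initial stream header,
    or None if the attribute was absent.
    """
    if initial_version is None:
        return None   # rule 4: treat the peer as "0.9" and omit 'version'
    major, minor = (int(part) for part in initial_version.split("."))
    # Rule 2: compare numerically and answer with the lower of the two.
    chosen = min((major, minor), highest_supported)
    return "%d.%d" % chosen
```

Because the comparison is made on integer pairs, a receiver supporting "2.4" answers an initial header of "2.13" with "2.4".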
4.7.6. Summary of Stream Attributes
The following table summarizes the attributes of the root <stream/>
element.
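+----------+-------------------------+-------------------------+
|          | initiating to receiving | receiving to initiating |
+----------+-------------------------+-------------------------+
| to       | JID of receiver         | JID of initiator        |
| from     | JID of initiator        | JID of receiver         |
| id       | ignored                 | stream identifier       |
| xml:lang | default language        | default language        |
| version  | XMPP 1.0+ supported     | XMPP 1.0+ supported     |
+----------+-------------------------+-------------------------+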
Figure 4: Stream Attributes
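The right-hand column of the table can be read as a recipe for building a response stream header, sketched here in Python as a non-normative illustration. The function name and the dictionary representation of header attributes are hypothetical. The sketch assumes a receiver that supports only XMPP 1.0, and it leaves out language matching and the choice among multiple FQDNs.

```python
import secrets

def response_header_attributes(initial, server_domain, default_lang="en"):
    """Derive response stream header attributes from the initial header.

    initial: dict of attributes parsed from the initial stream header.
    server_domain: the FQDN the receiving entity answers as.
    """
    attrs = {
        "from": server_domain,             # JID of the receiver
        "id": secrets.token_urlsafe(16),   # any 'id' sent by the peer is ignored
        "xml:lang": default_lang,          # language matching omitted here
    }
    if "from" in initial:
        attrs["to"] = initial["from"]      # JID of the initiator
    if "version" in initial:
        attrs["version"] = "1.0"           # assumes the peer offered 1.0 or higher
    return attrs
```

When the initial header carries no 'from' attribute, the result has no 'to' attribute, which matches the client-to-server rule in the section on the 'to' attribute.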