Network Working Group M. Handley
Request for Comments: 2543 ACIRI
Category: Standards Track H. Schulzrinne
March 1999
SIP: Session Initiation Protocol
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright (C) The Internet Society (1999). All Rights Reserved.
The IESG intends to charter, in the near future, one or more working
groups to produce standards for "name lookup", where such names would
include electronic mail addresses and telephone numbers, and the
result of such a lookup would be a list of attributes and
characteristics of the user or terminal associated with the name.
Groups which are in need of a "name lookup" protocol should follow
the development of these new working groups rather than using SIP for
this function. In addition it is anticipated that SIP will migrate
towards using such protocols, and SIP implementors are advised to
monitor these efforts.
The Session Initiation Protocol (SIP) is an application-layer control
(signaling) protocol for creating, modifying and terminating sessions
with one or more participants. These sessions include Internet
multimedia conferences, Internet telephone calls and multimedia
distribution. Members in a session can communicate via multicast or
via a mesh of unicast relations, or a combination of these.
SIP invitations used to create sessions carry session descriptions
which allow participants to agree on a set of compatible media types.
SIP supports user mobility by proxying and redirecting requests to
the user's current location. Users can register their current
location. SIP is not tied to any particular conference control
protocol. SIP is designed to be independent of the lower-layer
transport protocol and can be extended with additional capabilities.
Table of Contents
1 Introduction
1.1 Overview of SIP Functionality
1.2 Terminology
1.3 Definitions
1.4 Overview of SIP Operation
1.4.1 SIP Addressing
1.4.2 Locating a SIP Server
1.4.3 SIP Transaction
1.4.4 SIP Invitation
1.4.5 Locating a User
1.4.6 Changing an Existing Session
1.4.7 Registration Services
1.5 Protocol Properties
1.5.1 Minimal State
1.5.2 Lower-Layer-Protocol Neutral
1.5.3 Text-Based
2 SIP Uniform Resource Locators
3 SIP Message Overview
4 Request
4.1 Request-Line
4.2 Methods
4.2.1 INVITE
4.2.2 ACK
4.2.3 OPTIONS
4.2.4 BYE
4.2.5 CANCEL
4.2.6 REGISTER
4.3 Request-URI
4.3.1 SIP Version
4.4 Option Tags
4.4.1 Registering New Option Tags with IANA
5 Response
5.1 Status-Line
5.1.1 Status Codes and Reason Phrases
6 Header Field Definitions
6.1 General Header Fields
6.2 Entity Header Fields
6.3 Request Header Fields
1 Introduction
1.1 Overview of SIP Functionality
The Session Initiation Protocol (SIP) is an application-layer control
protocol that can establish, modify and terminate multimedia sessions
or calls. These multimedia sessions include multimedia conferences,
distance learning, Internet telephony and similar applications. SIP
can invite both persons and "robots", such as a media storage
service. SIP can invite parties to both unicast and multicast
sessions; the initiator does not necessarily have to be a member of
the session to which it is inviting. Media and participants can be
added to an existing session.
SIP can be used to initiate sessions as well as invite members to
sessions that have been advertised and established by other means.
Sessions can be advertised using multicast protocols such as SAP,
electronic mail, news groups, web pages or directories (LDAP), among
others.
SIP transparently supports name mapping and redirection services,
allowing the implementation of ISDN and Intelligent Network telephony
subscriber services. These facilities also enable personal mobility.
In the parlance of telecommunications intelligent network services,
this is defined as: "Personal mobility is the ability of end users to
originate and receive calls and access subscribed telecommunication
services on any terminal in any location, and the ability of the
network to identify end users as they move. Personal mobility is
based on the use of a unique personal identity (i.e., personal
number)." Personal mobility complements terminal mobility, i.e.,
the ability to maintain communications when moving a single end
system from one subnet to another.
SIP supports five facets of establishing and terminating multimedia
communications:
User location: determination of the end system to be used for
communication;
User capabilities: determination of the media and media parameters to
be used;
User availability: determination of the willingness of the called
party to engage in communications;
Call setup: "ringing", establishment of call parameters at both
called and calling party;
Call handling: including transfer and termination of calls.
SIP can also initiate multi-party calls using a multipoint control
unit (MCU) or fully-meshed interconnection instead of multicast.
Internet telephony gateways that connect Public Switched Telephone
Network (PSTN) parties can also use SIP to set up calls between them.
SIP is designed as part of the overall IETF multimedia data and
control architecture currently incorporating protocols such as RSVP
(RFC 2205) for reserving network resources, the real-time
transport protocol (RTP) (RFC 1889) for transporting real-time
data and providing QOS feedback, the real-time streaming protocol
(RTSP) (RFC 2326) for controlling delivery of streaming media,
the session announcement protocol (SAP) for advertising
multimedia sessions via multicast and the session description
protocol (SDP) (RFC 2327) for describing multimedia sessions.
However, the functionality and operation of SIP do not depend on
any of these protocols.
SIP can also be used in conjunction with other call setup and
signaling protocols. In that mode, an end system uses SIP exchanges
to determine the appropriate end system address and protocol from a
given address that is protocol-independent. For example, SIP could be
used to determine that the party can be reached via H.323, obtain
the H.245 gateway and user address and then use H.225.0 to
establish the call.
In another example, SIP might be used to determine that the callee is
reachable via the PSTN and indicate the phone number to be called,
possibly suggesting an Internet-to-PSTN gateway to be used.
SIP does not offer conference control services such as floor control
or voting and does not prescribe how a conference is to be managed,
but SIP can be used to introduce conference control protocols. SIP
does not allocate multicast addresses.
SIP can invite users to sessions with and without resource
reservation. SIP does not reserve resources, but can convey to the
invited system the information necessary to do this.
1.2 Terminology
In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" are to be interpreted as described in RFC 2119
and indicate requirement levels for compliant SIP implementations.
1.3 Definitions
This specification uses a number of terms to refer to the roles
played by participants in SIP communications. The definitions of
client, server and proxy are similar to those used by the Hypertext
Transfer Protocol (HTTP) (RFC 2068). The terms and generic
syntax of URI and URL are defined in RFC 2396. The following
terms have special significance for SIP.
Call: A call consists of all participants in a conference invited by
a common source. A SIP call is identified by a globally unique
call-id (Section 6.12). Thus, if a user is, for example, invited
to the same multicast session by several people, each of these
invitations will be a unique call. A point-to-point Internet
telephony conversation maps into a single SIP call. In a
multiparty conference unit (MCU) based call-in conference, each
participant uses a separate call to invite himself to the MCU.
Call leg: A call leg is identified by the combination of Call-ID, To
and From.
Client: An application program that sends SIP requests. Clients may
or may not interact directly with a human user. User agents and
proxies contain clients (and servers).
Conference: A multimedia session (see below), identified by a common
session description. A conference can have zero or more members
and includes the cases of a multicast conference, a full-mesh
conference and a two-party "telephone call", as well as
combinations of these. Any number of calls can be used to
create a conference.
Downstream: Requests sent in the direction from the caller to the
callee (i.e., user agent client to user agent server).
Final response: A response that terminates a SIP transaction, as
opposed to a provisional response that does not. All 2xx, 3xx,
4xx, 5xx and 6xx responses are final.
Initiator, calling party, caller: The party initiating a conference
invitation. Note that the calling party does not have to be the
same as the one creating the conference.
Invitation: A request sent to a user (or service) requesting
participation in a session. A successful SIP invitation consists
of two transactions: an INVITE request followed by an ACK
request.
Invitee, invited user, called party, callee: The person or service
that the calling party is trying to invite to a conference.
Isomorphic request or response: Two requests or responses are defined
to be isomorphic for the purposes of this document if they have
the same values for the Call-ID, To, From and CSeq header
fields. In addition, isomorphic requests have to have the same
Request-URI.
Location server: See location service.
Location service: A location service is used by a SIP redirect or
proxy server to obtain information about a callee's possible
location(s). Location services are offered by location servers.
Location servers MAY be co-located with a SIP server, but the
manner in which a SIP server requests location services is
beyond the scope of this document.
Parallel search: In a parallel search, a proxy issues several
requests to possible user locations upon receiving an incoming
request. Rather than issuing one request and then waiting for
the final response before issuing the next request as in a
sequential search, a parallel search issues requests without
waiting for the result of previous requests.
Provisional response: A response used by the server to indicate
progress, but that does not terminate a SIP transaction. 1xx
responses are provisional, other responses are considered final.
Proxy, proxy server: An intermediary program that acts as both a
server and a client for the purpose of making requests on behalf
of other clients. Requests are serviced internally or by passing
them on, possibly after translation, to other servers. A proxy
interprets, and, if necessary, rewrites a request message before
forwarding it.
Redirect server: A redirect server is a server that accepts a SIP
request, maps the address into zero or more new addresses and
returns these addresses to the client. Unlike a proxy server,
it does not initiate its own SIP request. Unlike a user agent
server, it does not accept calls.
Registrar: A registrar is a server that accepts REGISTER requests. A
registrar is typically co-located with a proxy or redirect
server and MAY offer location services.
Ringback: Ringback is the signaling tone produced by the calling
client's application indicating that a called party is being
alerted (ringing).
Server: A server is an application program that accepts requests in
order to service requests and sends back responses to those
requests. Servers are either proxy, redirect or user agent
servers or registrars.
Session: From the SDP specification: "A multimedia session is a set
of multimedia senders and receivers and the data streams flowing
from senders to receivers. A multimedia conference is an example
of a multimedia session." (RFC 2327) (A session as defined
for SDP can comprise one or more RTP sessions.) As defined, a
callee can be invited several times, by different calls, to the
same session. If SDP is used, a session is defined by the
concatenation of the user name, session id, network type,
address type and address elements in the origin field.
(SIP) transaction: A SIP transaction occurs between a client and a
server and comprises all messages from the first request sent
from the client to the server up to a final (non-1xx) response
sent from the server to the client. A transaction is identified
by the CSeq sequence number (Section 6.17) within a single call
leg. The ACK request has the same CSeq number as the
corresponding INVITE request, but comprises a transaction of its
own.
Upstream: Responses sent in the direction from the user agent server
to the user agent client.
URL-encoded: A character string encoded according to RFC 1738,
Section 2.2.
User agent client (UAC), calling user agent: A user agent client is a
client application that initiates the SIP request.
User agent server (UAS), called user agent: A user agent server is a
server application that contacts the user when a SIP request is
received and that returns a response on behalf of the user. The
response accepts, rejects or redirects the request.
User agent (UA): An application which contains both a user agent
client and user agent server.
An application program MAY be capable of acting both as a client and
a server. For example, a typical multimedia conference control
application would act as a user agent client to initiate calls or to
invite others to conferences and as a user agent server to accept
invitations. The properties of the different SIP server types are
summarized in Table 1.
property                   redirect  proxy   user agent  registrar
                           server    server  server
__________________________________________________________________
also acts as a SIP client  no        yes     no          no
returns 1xx status         yes       yes     yes         yes
returns 2xx status         no        yes     yes         yes
returns 3xx status         yes       yes     yes         yes
returns 4xx status         yes       yes     yes         yes
returns 5xx status         yes       yes     yes         yes
returns 6xx status         no        yes     yes         yes
inserts Via header         no        yes     no          no
accepts ACK                yes       yes     yes         no

Table 1: Properties of the different SIP server types
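As an informal illustration (not part of this specification), the provisional/final distinction from the definitions above and the status-class rows of Table 1 can be sketched in Python. The names is_provisional, is_final, RETURNS and may_return are hypothetical and chosen only for this example.

```python
# Illustrative sketch only; these names are not defined by SIP.

def is_provisional(status: int) -> bool:
    """1xx responses indicate progress and do not end a transaction."""
    return 100 <= status <= 199


def is_final(status: int) -> bool:
    """2xx, 3xx, 4xx, 5xx and 6xx responses terminate a transaction."""
    return 200 <= status <= 699


# Status classes each server type returns, transcribed from Table 1.
RETURNS = {
    "redirect": {1, 3, 4, 5},
    "proxy": {1, 2, 3, 4, 5, 6},
    "user agent": {1, 2, 3, 4, 5, 6},
    "registrar": {1, 2, 3, 4, 5, 6},
}


def may_return(server_type: str, status: int) -> bool:
    """True if Table 1 lists this status class for the server type."""
    return status // 100 in RETURNS[server_type]
```

For instance, may_return("redirect", 302) holds, while may_return("redirect", 200) does not, matching the "returns 2xx status" row.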
1.4 Overview of SIP Operation
This section explains the basic protocol functionality and operation.
Callers and callees are identified by SIP addresses, described in
Section 1.4.1. When making a SIP call, a caller first locates the
appropriate server (Section 1.4.2) and then sends a SIP request
(Section 1.4.3). The most common SIP operation is the invitation
(Section 1.4.4). Instead of directly reaching the intended callee, a
SIP request may be redirected or may trigger a chain of new SIP
requests by proxies (Section 1.4.5). Users can register their
location(s) with SIP servers (Section 4.2.6).
1.4.1 SIP Addressing
The "objects" addressed by SIP are users at hosts, identified by a
SIP URL. The SIP URL takes a form similar to a mailto or telnet URL,
i.e., user@host. The user part is a user name or a telephone number.
The host part is either a domain name or a numeric network address.
See Section 2 for a detailed discussion of SIP URLs.
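The user@host form described above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it handles only the "sip:user@host[:port]" shape, discards any URL parameters or headers, and does not handle bracketed IPv6 literals. The names SipAddress and parse_sip_url are hypothetical.

```python
from typing import NamedTuple, Optional


class SipAddress(NamedTuple):
    user: Optional[str]   # user name or telephone number, if present
    host: str             # domain name or numeric network address
    port: Optional[int]   # None when the URL carries no port


def parse_sip_url(url: str) -> SipAddress:
    """Split a simple 'sip:user@host[:port]' URL into its parts."""
    scheme, _, rest = url.partition(":")
    if scheme.lower() != "sip" or not rest:
        raise ValueError("not a SIP URL: %r" % url)
    # Drop URL parameters (";...") and headers ("?...") for this sketch.
    rest = rest.split(";", 1)[0].split("?", 1)[0]
    user, at, hostport = rest.rpartition("@")
    host, colon, port = hostport.partition(":")
    if not host:
        raise ValueError("missing host in %r" % url)
    return SipAddress(user if at else None, host, int(port) if colon else None)
```

The full grammar, including parameters such as the transport, is given in Section 2; a real parser would follow that grammar rather than this shortcut.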
A user's SIP address can be obtained out-of-band, can be learned via
existing media agents, can be included in some mailers' message
headers, or can be recorded during previous invitation interactions.
In many cases, a user's SIP URL can be guessed from their email
address.
A SIP URL address can designate an individual (possibly located at
one of several end systems), the first available person from a group
of individuals or a whole group. The form of the address, for
example, sip:firstname.lastname@example.org, is not sufficient, in general, to
determine the intent of the caller.
If a user or service chooses to be reachable at an address that is
guessable from the person's name and organizational affiliation, the
traditional method of ensuring privacy by having an unlisted "phone"
number is compromised. However, unlike traditional telephony, SIP
offers authentication and access control mechanisms and can avail
itself of lower-layer security mechanisms, so that client software
can reject unauthorized or undesired call attempts.
1.4.2 Locating a SIP Server
When a client wishes to send a request, the client either sends it to
a locally configured SIP proxy server (as in HTTP), independent of
the Request-URI, or sends it to the IP address and port corresponding
to the Request-URI.
For the latter case, the client must determine the protocol, port and
IP address of a server to which to send the request. A client SHOULD
follow the steps below to obtain this information, but MAY follow the
alternative, optional procedure defined in Appendix D. At each step,
unless stated otherwise, the client SHOULD try to contact a server at
the port number listed in the Request-URI. If no port number is
present in the Request-URI, the client uses port 5060. If the
Request-URI specifies a protocol (TCP or UDP), the client contacts
the server using that protocol. If no protocol is specified, the
client tries UDP (if UDP is supported). If the attempt fails, or if
the client doesn't support UDP but supports TCP, it then tries TCP.
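The port and transport selection rules in the preceding paragraph can be sketched as follows. This is a minimal illustration, assuming the port and transport have already been extracted from the Request-URI; the function name contact_plan and its parameters are hypothetical.

```python
DEFAULT_SIP_PORT = 5060


def contact_plan(uri_port=None, uri_transport=None,
                 supports_udp=True, supports_tcp=True):
    """Return (port, transports to try, in order) for a Request-URI."""
    # Use the port from the Request-URI, or 5060 when none is present.
    port = uri_port if uri_port is not None else DEFAULT_SIP_PORT
    # An explicit transport in the Request-URI is used as given.
    if uri_transport is not None:
        return port, [uri_transport.lower()]
    # Otherwise try UDP first (if supported), then TCP (if supported).
    order = []
    if supports_udp:
        order.append("udp")
    if supports_tcp:
        order.append("tcp")
    return port, order
```

A caller would walk the returned list in order, moving to the next transport only when the attempt with the current one fails.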
A client SHOULD be able to interpret explicit network notifications
(such as ICMP messages) which indicate that a server is not
reachable, rather than relying solely on timeouts. (For socket-based
programs: For TCP, connect() returns ECONNREFUSED if the client could
not connect to a server at that address. For UDP, the socket needs to
be bound to the destination address using connect() rather than
sendto() or similar so that a second write() fails with ECONNREFUSED
if there is no server listening.) If the client finds the server is
not reachable at a particular address, it SHOULD behave as if it had
received a 400-class error response to that request.
The client tries to find one or more addresses for the SIP server by
querying DNS. The procedure is as follows:
1. If the host portion of the Request-URI is an IP address,
the client contacts the server at the given address.
Otherwise, the client proceeds to the next step.
2. The client queries the DNS server for address records for
the host portion of the Request-URI. If the DNS server
returns no address records, the client stops, as it has
been unable to locate a server. By address record, we mean
A RR's, AAAA RR's, or other similar address records, chosen
according to the client's network protocol capabilities.
There are no mandatory rules on how to select a host name
for a SIP server. Users are encouraged to name their SIP
servers using the sip.domainname (e.g., sip.example.com)
convention, as specified in RFC 2219. Users may only
know an email address instead of a full SIP URL for a
callee, however. In that case, implementations may be able
to increase the likelihood of reaching a SIP server for
that domain by constructing a SIP URL from that email
address by prefixing the host name with "sip.". In the
future, this mechanism is likely to become unnecessary as
better DNS techniques, such as the one in Appendix D,
become widely available.
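The two-step procedure above, together with the "sip." prefix heuristic, might be sketched in Python as shown below. This is an illustration under assumptions: socket.getaddrinfo() goes through the system resolver (which may consult sources other than DNS), and the function names server_addresses and sip_url_from_email are hypothetical.

```python
import ipaddress
import socket


def is_ip_literal(host: str) -> bool:
    """True if the host portion is a numeric IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def server_addresses(host: str, port: int = 5060):
    """Step 1: use an IP address as given. Step 2: look up address records."""
    if is_ip_literal(host):
        return [host]
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)
    except socket.gaierror:
        return []  # no address records: the client stops
    addresses = []
    for info in infos:
        addr = info[4][0]  # sockaddr tuple starts with the address string
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def sip_url_from_email(email: str) -> str:
    """Guess a SIP URL from an email address by prefixing 'sip.' to the host."""
    user, _, domain = email.partition("@")
    return "sip:%s@sip.%s" % (user, domain)
```

An empty list from server_addresses() corresponds to the case where the client has been unable to locate a server.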
A client MAY cache a successful DNS query result. A successful query
is one which contained records in the answer, and a server was
contacted at one of the addresses from the answer. When the client
wishes to send a request to the same host, it MUST start the search
as if it had just received this answer from the name server. The
client MUST follow the procedures in RFC 1035 regarding DNS cache
invalidation when the DNS time-to-live expires.
1.4.3 SIP Transaction
Once the host part has been resolved to a SIP server, the client
sends one or more SIP requests to that server and receives one or
more responses from the server. A request (and its retransmissions)
together with the responses triggered by that request make up a SIP
transaction. All responses to a request contain the same values in
the Call-ID, CSeq, To, and From fields (with the possible addition of
a tag in the To field (section 6.37)). This allows responses to be
matched with requests. The ACK request following an INVITE is not
part of the transaction since it may traverse a different set of
hosts.
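The rule for matching responses with requests can be illustrated with a short Python sketch. It assumes, purely for illustration, that each message is represented as a dictionary of header values; the helper names strip_tag and matches are hypothetical, and the tag handling is a simplification of the header syntax.

```python
def strip_tag(value: str) -> str:
    """Drop a ';tag=...' parameter from a To header value."""
    parts = [p for p in value.split(";")
             if not p.strip().lower().startswith("tag=")]
    return ";".join(parts).strip()


def matches(request: dict, response: dict) -> bool:
    """True if the response carries the request's Call-ID, CSeq, From and To.

    The To field is compared with any tag removed, since a response
    may add a tag that the request did not contain.
    """
    same = all(request[h] == response[h] for h in ("Call-ID", "CSeq", "From"))
    return same and strip_tag(request["To"]) == strip_tag(response["To"])
```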
If TCP is used, request and responses within a single SIP transaction
are carried over the same TCP connection (see Section 10). Several
SIP requests from the same client to the same server MAY use the same
TCP connection or MAY use a new connection for each request.
If the client sent the request via unicast UDP, the response is sent
to the address contained in the next Via header field (Section 6.40)
of the response. If the request is sent via multicast UDP, the
response is directed to the same multicast address and destination
port. For UDP, reliability is achieved using retransmission (Section
The SIP message format and operation is independent of the transport
protocol.
1.4.4 SIP Invitation
A successful SIP invitation consists of two requests, INVITE followed
by ACK. The INVITE (Section 4.2.1) request asks the callee to join a
particular conference or establish a two-party conversation. After
the callee has agreed to participate in the call, the caller confirms
that it has received that response by sending an ACK (Section 4.2.2)
request. If the caller no longer wants to participate in the call, it
sends a BYE request instead of an ACK.
The INVITE request typically contains a session description, for
example written in SDP (RFC 2327) format, that provides the
called party with enough information to join the session. For
multicast sessions, the session description enumerates the media
types and formats that are allowed to be distributed to that session.
For a unicast session, the session description enumerates the media
types and formats that the caller is willing to use and where it
wishes the media data to be sent. In either case, if the callee
wishes to accept the call, it responds to the invitation by returning
a similar description listing the media it wishes to use. For a
multicast session, the callee SHOULD only return a session
description if it is unable to receive the media indicated in the
caller's description or wants to receive data via unicast.
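To make the description above concrete, the following Python sketch assembles an INVITE request carrying an SDP body for a unicast audio session. It is illustrative only: the header syntax is defined in later sections of this document, and the addresses, Call-ID, port and payload type used here are invented example values. The function name build_invite is hypothetical.

```python
def build_invite(caller, callee, call_id, cseq, via_host, sdp):
    """Return the bytes of a minimal INVITE request with an SDP body."""
    body = sdp.encode("utf-8")
    lines = [
        "INVITE %s SIP/2.0" % callee,        # Request-Line
        "Via: SIP/2.0/UDP %s" % via_host,
        "From: <%s>" % caller,
        "To: <%s>" % callee,
        "Call-ID: %s" % call_id,
        "CSeq: %d INVITE" % cseq,
        "Content-Type: application/sdp",
        "Content-Length: %d" % len(body),
    ]
    # An empty line separates the headers from the message body.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8") + body


# Example session description: where and in which format the caller
# wishes to receive audio (all values are placeholders).
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=Example call",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
]) + "\r\n"

message = build_invite("sip:alice@example.com", "sip:bob@example.org",
                       "12345@192.0.2.10", 1, "192.0.2.10", EXAMPLE_SDP)
```

A callee accepting the call would answer with a similar description listing the media it wishes to use.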
The protocol exchanges for the INVITE method are shown in Fig. 1 for
a proxy server and in Fig. 2 for a redirect server. (Note that the
messages shown in the figures have been abbreviated slightly.) In
Fig. 1, the proxy server accepts the INVITE request (step 1),
contacts the location service with all or parts of the address (step
2) and obtains a more precise location (step 3). The proxy server
then issues a SIP INVITE request to the address(es) returned by the
location service (step 4). The user agent server alerts the user
(step 5) and returns a success indication to the proxy server (step
6). The proxy server then returns the success result to the original
caller (step 7). The receipt of this message is confirmed by the
caller using an ACK request, which is forwarded to the callee (steps
8 and 9). Note that an ACK can also be sent directly to the callee,
bypassing the proxy. All requests and responses have the same Call-ID.
                                                     +....... cs.columbia.edu .......+
                                                     :                               :
                                                     : (~~~~~~~~~~)                  :
                                                     : ( location )                  :
                                                     : ( service  )                  :
                                                     : (~~~~~~~~~~)                  :
                                                     :     ^    |                    :
                                                     :     | hgs@lab                 :
                                                     :    2|   3|                    :
                                                     :     |    |                    :
                                                     : henning  |                    :
+........ cs.tu-berlin.de .......+ 1: INVITE         :     |    |                    :
:                                : email@example.com :     |   \/ 4: INVITE 5: ring  :
: firstname.lastname@example.org =======================>(~~~~~~)==========>(~~~~~~) :
:                                <.......................(      )<..........(      ) :
:                                : 7: 200 OK         :   (      ) 6: 200 OK (      ) :
:                                :                   :   ( work )           ( lab  ) :
:                                : 8: ACK            :   (      ) 9: ACK    (      ) :
:                                =======================>(~~~~~~)==========>(~~~~~~) :
+................................+                   +...............................+

  ====> SIP request
  ....> SIP response
  |     non-SIP protocols

Figure 1: Example of SIP proxy server
The redirect server shown in Fig. 2 accepts the INVITE request (step
1), contacts the location service as before (steps 2 and 3) and,
instead of contacting the newly found address itself, returns the
address to the caller (step 4), which is then acknowledged via an ACK
request (step 5). The caller issues a new request, with the same
call-ID but a higher CSeq, to the address returned by the first
server (step 6). In the example, the call succeeds (step 7). The
caller and callee complete the handshake with an ACK (step 8).
The next section discusses what happens if the location service
returns more than one possible alternative.
1.4.5 Locating a User
A callee may move between a number of different end systems over
time. These locations can be dynamically registered with the SIP
server (Sections 1.4.7, 4.2.6). A location server MAY also use one or
more other protocols, such as finger (RFC 1288), rwhois (RFC
2167), LDAP (RFC 1777), multicast-based protocols or
operating-system dependent mechanisms to actively determine the end
system where a user might be reachable. A location server MAY return
several locations because the user is logged in at several hosts
simultaneously or because the location server has (temporarily)
inaccurate information. The SIP server combines the results to yield
a list of zero or more locations.
The action taken on receiving a list of locations varies with the
type of SIP server. A SIP redirect server returns the list to the
client as Contact headers (Section 6.13). A SIP proxy server can
sequentially or in parallel try the addresses until the call is
successful (2xx response) or the callee has declined the call (6xx
response). With sequential attempts, a proxy server can implement an
"anycast" service.
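A sequential search of the kind described above can be sketched as follows. This is an illustration under assumptions: the callback try_location stands in for sending a request to one location and waiting for its final status code, and both names are hypothetical.

```python
def sequential_search(locations, try_location):
    """Try locations one at a time until a 2xx or 6xx response arrives.

    Returns (location, status) for the response that ended the search,
    or (None, last_status) if every location was tried without one.
    """
    last_status = None
    for location in locations:
        status = try_location(location)
        # 2xx: the call succeeded. 6xx: the callee declined the call.
        if 200 <= status <= 299 or 600 <= status <= 699:
            return location, status
        last_status = status
    return None, last_status
```

A parallel search would instead issue all requests without waiting for earlier results, at the cost of tracking several outstanding requests at once.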
If a proxy server forwards a SIP request, it MUST add itself to the
beginning of the list of forwarders noted in the Via (Section 6.40)
headers. The Via trace ensures that replies can take the same path
back, ensuring correct operation through compliant firewalls and
avoiding request loops. On the response path, each host MUST remove
its Via, so that routing internal information is hidden from the
callee and outside networks. A proxy server MUST check that it does
not generate a request to a host listed in the Via sent-by, via-
received or via-maddr parameters (Section 6.40). (Note: If a host has
several names or network addresses, this does not always work. Thus,
each host also checks if it is part of the Via list.)
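The Via handling described above can be sketched with a list of host names in which the first element is the most recent forwarder. This is a simplified illustration: real Via headers carry more than a host name (Section 6.40), and the function names forward_request and forward_response are hypothetical.

```python
def forward_request(via, me):
    """Add this proxy to the beginning of the Via list before forwarding."""
    # A host that already appears in the Via list indicates a loop.
    if me in via:
        raise ValueError("request loop: %s is already in the Via list" % me)
    return [me] + via


def forward_response(via, me):
    """Remove this host's own Via entry before passing a response back."""
    if not via or via[0] != me:
        raise ValueError("topmost Via entry does not name this host")
    return via[1:]
```

After a response has passed back through every forwarder, only the originating client's entry remains, so the response retraces the path of the request.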
A SIP invitation may traverse more than one SIP proxy server. If one
of these "forks" the request, i.e., issues more than one request in
response to receiving the invitation request, it is possible that a
client is reached, independently, by more than one copy of the
invitation request. Each of these copies bears the same Call-ID. The
user agent MUST return the same status response returned in the first
response. Duplicate requests are not an error.
1.4.6 Changing an Existing Session
In some circumstances, it is desirable to change the parameters of an
existing session. This is done by re-issuing the INVITE, using the
same Call-ID, but a new or different body or header fields to convey
the new information. This re-INVITE MUST have a higher CSeq than any
previous request from the client to the server.
For example, two parties may have been conversing and then want to
add a third party, switching to multicast for efficiency. One of the
participants invites the third party with the new multicast address
and simultaneously sends an INVITE to the second party, with the new
multicast session description, but with the old call identifier.
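The requirement that a re-INVITE reuse the Call-ID but carry a higher CSeq can be illustrated with a small Python sketch. The class name CallState, its attributes and the dictionary representation of a request are assumptions made for this example only.

```python
class CallState:
    """Tracks the Call-ID and the next CSeq value for one call."""

    def __init__(self, call_id, first_cseq=1):
        self.call_id = call_id
        self.next_cseq = first_cseq

    def new_invite(self, session_description):
        """Build an INVITE (or re-INVITE) with the next, higher CSeq."""
        request = {
            "method": "INVITE",
            "Call-ID": self.call_id,     # unchanged for the whole call
            "CSeq": self.next_cseq,      # strictly increasing
            "body": session_description,
        }
        self.next_cseq += 1
        return request


call = CallState("12345@192.0.2.10")
first = call.new_invite("unicast session description")
second = call.new_invite("multicast session description")
```

Here second plays the role of the re-INVITE: it has the same Call-ID as first, a higher CSeq, and a new body.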
1.4.7 Registration Services
The REGISTER request allows a client to let a proxy or redirect
server know at which address(es) it can be reached. A client MAY also
use it to install call handling features at the server.
1.5 Protocol Properties
1.5.1 Minimal State
A single conference session or call involves one or more SIP
request-response transactions. Proxy servers do not have to keep
state for a particular call; however, they MAY maintain state for a
single SIP transaction, as discussed in Section 12. For efficiency, a
server MAY cache the results of location service requests.
1.5.2 Lower-Layer-Protocol Neutral
SIP makes minimal assumptions about the underlying transport and
network-layer protocols. The lower layer can provide either a packet
or a byte stream service, with reliable or unreliable service.
In an Internet context, SIP is able to utilize both UDP and TCP as
transport protocols, among others. UDP allows the application to more
carefully control the timing of messages and their retransmission, to
perform parallel searches without requiring TCP connection state for
each outstanding request, and to use multicast. Routers can more
readily snoop SIP UDP packets. TCP allows easier passage through
existing firewalls.
When TCP is used, SIP can use one or more connections to attempt to
contact a user or to modify parameters of an existing conference.
Different SIP requests for the same SIP call MAY use different TCP
connections or a single persistent connection, as appropriate.
For concreteness, this document will only refer to Internet
protocols. However, SIP MAY also be used directly with protocols
such as ATM AAL5, IPX, frame relay or X.25. The necessary naming
conventions are beyond the scope of this document. User agents SHOULD
implement both UDP and TCP transport. Proxy, registrar, and redirect
servers MUST implement both UDP and TCP transport.
1.5.3 Text-Based
SIP is text-based, using ISO 10646 in UTF-8 encoding throughout. This
allows easy implementation in languages such as Java, Tcl and Perl,
allows easy debugging, and most importantly, makes SIP flexible and
extensible. As SIP is used for initiating multimedia conferences
rather than delivering media data, it is believed that the additional
overhead of using a text-based protocol is not significant.