9.5. Floor Control Using Sidebars
Floor control with sidebars can be used to realize conferencing
scenarios such as an analyst briefing. In this scenario, the
conference call has a panel of speakers who are allowed to talk in
the main conference. The other participants are the analysts, who
are not allowed to speak unless they have the floor. To request
access to the floor, they have to join a new sidebar with the
moderator and ask their question. The moderator can also whisper to
each analyst what their status/position in the floor control queue
is, similar to the example in Figure 15.
Figure 14 provides an example of the configuration involved for this
type of conference. As in the previous sidebar examples, there is
the main conference along with a sidebar. "Alice" and "Bob" are the
main participants in the conference, with "A1", "A2", and "A3"
representing the analysts. The sidebar remains active throughout the
conference, with the moderator, "Carol", serving as the chair. As
discussed previously, the sidebar conference is NOT independent of
the active conference (i.e., parent). The analysts are provided the
conference object ID associated with the active sidebar when they
join the main conference. The conferencing system also allocates a
conference ID to be used for any subsequent manipulations of the
sidebar conference. The conferencing system maintains the mapping
between this conference ID and the conference object ID associated
with the active sidebar conference through the conference instance.
The analysts are permanently muted while in the main conference. The
analysts are moved to the sidebar when they wish to speak. Only one
analyst is given the floor at a given time. All participants in the
main conference receive audio from the sidebar conference, as well as
audio provided by the panelists in the main conference.
When "A1" wishes to ask a question, he sends a Floor Request message
to the floor control server. Upon receipt of the request, the floor
control server notifies "Carol", the moderator of the active sidebar
conference, who is serving as the floor chair. Note that this
signaling flow is not shown in the diagram. Since no other analysts
have yet requested the floor, "Carol" indicates to the floor control
server that "A1" may be granted the floor.
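The floor handling described above can be sketched as a queue with
chair approval. This is a minimal illustration only and is not the
floor control protocol itself; the class FloorQueue and its method
names are hypothetical and do not appear in this framework.

```python
from collections import deque


class FloorQueue:
    """Toy model of a single floor moderated by a chair (not BFCP)."""

    def __init__(self, chair):
        self.chair = chair
        self.pending = deque()   # analysts waiting for the floor, in order
        self.holder = None       # analyst currently holding the floor

    def request(self, analyst):
        """Queue a floor request and return the analyst's position."""
        if analyst != self.holder and analyst not in self.pending:
            self.pending.append(analyst)
        return self.position(analyst)

    def position(self, analyst):
        """Position the moderator could whisper back; 0 means holder."""
        if analyst == self.holder:
            return 0
        return list(self.pending).index(analyst) + 1

    def grant_next(self, by):
        """Chair grants the floor to the head of the queue if it is free."""
        if by != self.chair:
            raise PermissionError("only the floor chair may grant the floor")
        if self.holder is None and self.pending:
            self.holder = self.pending.popleft()
        return self.holder

    def release(self, analyst):
        """The current holder gives up the floor."""
        if analyst == self.holder:
            self.holder = None


floor = FloorQueue(chair="Carol")
floor.request("A1")
floor.request("A2")
granted = floor.grant_next(by="Carol")   # "A1": the floor was free
```

Because only one analyst holds the floor at a time, a second call to
grant_next has no effect until the current holder releases the floor.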
9.6. Whispering or Private Messages
The case of private messages can be handled as a sidebar with just
two participants, similar to the example in Section 9.4.1, but rather
than using audio within the sidebar, "Alice" could add an additional
text-based media stream to the sidebar. The other case, referred to
in this document as a whisper, involves one-time media targeted to
specific user(s). An example of a whisper
would be an announcement injected only to the conference chair or to
a new participant joining a conference.
Figure 15 provides an example of one user, "Alice", who is chairing a
fixed-length conference with "Bob" and "Carol". The configuration is
such that only the chair is provided a warning when there are only
10 minutes left in the conference. At that time, "Alice" is moved
into a sidebar created by the conferencing system and only "Alice"
receives the announcement.
When the conferencing system determines that there are only 10
minutes left in the conference that "Alice" is chairing, rather than
creating a reservation as was done for the sidebar in Section 9.4.1,
the conferencing system directly creates an active sidebar
conference, based on the active conference associated with "Alice".
As discussed previously, the sidebar conference is NOT independent of
the active conference (i.e., parent). The conferencing system also
allocates a conference ID to be used for any subsequent manipulations
of the sidebar conference. The conferencing system maintains the
mapping between this conference ID and the conference object ID
associated with the active sidebar conference through the conference
instance.
Immediately upon creation of the active sidebar conference, the
announcement media is provided to "Alice". Depending upon the
policies, "Alice" may be notified of her addition to the sidebar via
the conference notification service. "Alice" continues to receive
the media from the main conference.
Upon completion of the announcement, "Alice" is removed from the
sidebar, and the sidebar conference is deleted. Depending upon the
policies, "Alice" may be notified of her removal from the sidebar via
the conference notification service.
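The whisper lifecycle above (create an active sidebar from the parent,
deliver the announcement, then delete the sidebar) can be sketched as
follows. This is an illustration under stated assumptions: the
ConferencingSystem class, its dictionaries, and the "obj-N"/"conf-N"
identifier formats are hypothetical, and media delivery is reduced to
appending to a list.

```python
import itertools


class ConferencingSystem:
    """Toy registry of conference objects; all names are illustrative."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.objects = {}       # conference object ID -> conference object
        self.conf_id_map = {}   # conference ID -> conference object ID
        self.delivered = []     # (user, announcement) pairs, for inspection

    def _store(self, parent_obj_id, members):
        n = next(self._ids)
        obj_id, conf_id = f"obj-{n}", f"conf-{n}"
        self.objects[obj_id] = {"parent": parent_obj_id,
                                "members": set(members)}
        self.conf_id_map[conf_id] = obj_id
        return conf_id

    def create_conference(self, members):
        return self._store(None, members)

    def create_sidebar(self, parent_conf_id, members):
        """Create an active sidebar that stays tied to its parent."""
        parent_obj_id = self.conf_id_map[parent_conf_id]
        if not set(members) <= self.objects[parent_obj_id]["members"]:
            raise ValueError("sidebar members must be in the parent")
        return self._store(parent_obj_id, members)

    def delete_conference(self, conf_id):
        del self.objects[self.conf_id_map.pop(conf_id)]

    def whisper(self, parent_conf_id, user, announcement):
        """One-time media to one user: create sidebar, play, tear down."""
        sidebar = self.create_sidebar(parent_conf_id, [user])
        self.delivered.append((user, announcement))
        self.delete_conference(sidebar)


system = ConferencingSystem()
main = system.create_conference(["Alice", "Bob", "Carol"])
system.whisper(main, "Alice", "10 minutes remaining")
```

After the whisper completes, only the main conference remains in the
registry, which mirrors the deletion of the sidebar described above.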
9.7. Conference Announcements and Recordings
Each participant can require a different type of announcement and/or
recording service from the system. For example, "Alice", the
conference chair, could be listening to a roll call while "Bob" may
be using a telephony user interface to create a sidebar. Some
announcements would apply to all the participants such as "This
conference will end in 10 minutes". Recording is often required to
capture the names of participants as they join a conference,
typically after the participant has entered an access code, as
discussed in Section 9.8. These recorded names are then announced to
all the participants as the new participant is added to the active
conference.
An example of a conferencing recording and announcement, along with
collecting the dual tone multi-frequency (DTMF), within the context
of this framework, is shown in Figure 16.
representing "Bob's" active conference. The conferencing system
determines that a password is required for this specific conference;
thus, an announcement asking "Alice" to enter the password is
provided to "Alice". Once "Alice" enters the password, it is
validated against the policies associated with "Bob's" active
conference. The conferencing system then connects to a server that
prompts and records "Alice's" name. The conferencing system must
also determine whether "Alice" is already a user of this conferencing
system or whether she is a new user.
If "Alice" is a new user for this conferencing system, a conference
user identifier is created for "Alice". Based upon the addressing
information provided by "Alice", the call signaling to add "Alice" to
the conference is instigated through the focus.
Once the call signaling indicates that "Alice" has been successfully
added to the specific conference, per updates to the state, and
depending upon the policies, other participants (e.g., "Bob") are
notified of the addition of "Alice" to the conference via the
conference notification service, and an announcement is provided to
all the participants indicating that "Alice" has joined the
conference.
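The join sequence described in this section (validate the password,
record the name, allocate a conference user identifier if needed, add
the user, then notify and announce) can be sketched as a single
function. This is a simplified illustration: the dictionary layout,
the "user-N" identifier format, and the function name join_conference
are assumptions, and call signaling through the focus is reduced to a
dictionary update.

```python
import hmac
import itertools

_user_ids = itertools.count(1)


def join_conference(conference, users, address, entered_password,
                    recorded_name):
    """Admit one caller following the steps in the text.

    `conference` is a dict with "password", "members", and "events";
    `users` maps signaling addresses to conference user identifiers.
    Returns the conference user identifier.
    """
    # Validate the collected password against the conference policy.
    if not hmac.compare_digest(entered_password.encode(),
                               conference["password"].encode()):
        raise PermissionError("password rejected")

    # Create a conference user identifier only for a new user.
    if address not in users:
        users[address] = f"user-{next(_user_ids)}"
    user_id = users[address]

    # Stand-in for the call signaling instigated through the focus.
    conference["members"][user_id] = recorded_name

    # Notify existing participants and announce the recorded name.
    conference["events"].append(("joined", user_id))
    conference["events"].append(("announce", f"{recorded_name} has joined"))
    return user_id


bobs_conference = {"password": "4321", "members": {}, "events": []}
known_users = {}
alice_id = join_conference(bobs_conference, known_users,
                           "sip:alice@example.com", "4321", "Alice")
```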
9.8. Monitoring for DTMF
The conferencing system also needs the capability to monitor for DTMF
from each individual participant. This would typically be used to
enter the identifier and/or access code for joining a specific
conference.
An example of DTMF monitoring, within the context of the framework
elements, is shown in Figure 16.
9.9. Observing and Coaching
The capability to observe a conference allows a participant with the
appropriate authority to listen to the conference, typically without
being an active participant and often as a hidden participant. When
such a capability is available on a conferencing system, there is
often an announcement provided to each participant as they join the
conference indicating the call may be monitored. This capability is
useful in the context of conferences that might be experiencing
technical difficulties, thus allowing a technician to listen in to
evaluate the type of problem.
This capability could also apply to call center applications as it
provides a mechanism for a supervisor to observe how the agent is
handling a particular call with a customer. This scenario can be
handled by a supervisor adding themselves to the existing active
conference, with a listen-only audio media path. Whether the agent
is aware of when the supervisor joins the call should be
configurable.
Taking the supervisor capability one step further introduces a
scenario whereby the agent can hear the supervisor, as well as the
customer. The customer can still only hear the agent. This scenario
would involve the creation of a sidebar involving the agent and the
supervisor. Both the agent and supervisor receive the audio from the
main conference. When the agent speaks, it is heard by the customer
in the main conference. When the supervisor speaks, it is heard only
by the agent in the sidebar conference.
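The audio routing in the coaching scenario can be summarized as a
small table of who hears whom. The sketch below is illustrative only;
the set and dictionary names and the listeners function are
hypothetical and are not part of the data model.

```python
# Membership of each conference; the sidebar is a child of the main one.
MAIN = {"agent", "customer"}
SIDEBAR = {"agent", "supervisor"}

# Where each speaker's outgoing audio is sent.
SENDS_TO = {
    "customer": "main",
    "agent": "main",          # the customer must keep hearing the agent
    "supervisor": "sidebar",  # coaching stays inside the sidebar
}


def listeners(speaker):
    """Return the set of participants who hear `speaker`."""
    if SENDS_TO[speaker] == "main":
        # Sidebar members also receive the main conference audio.
        audience = MAIN | SIDEBAR
    else:
        audience = set(SIDEBAR)
    return audience - {speaker}


heard_by_agent = sorted(p for p in SENDS_TO if "agent" in listeners(p))
heard_by_customer = sorted(p for p in SENDS_TO if "customer" in listeners(p))
```

Under these rules the agent hears both the customer and the
supervisor, while the customer hears only the agent.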
An example of observing and coaching is shown in Figure 17. In this
example, call center agent "Bob" is involved in a conference with
customer "Carol". Since "Bob" is a new agent and "Alice" sees that
he has been on the call with "Carol" for longer than normal, she
decides to observe the call and coach "Bob" as necessary.
Upon receipt of the conference control protocol request from "Alice"
to "reserve" a new sidebar conference, based upon the active
conference received in the request, the conferencing system uses the
received active conference to clone a conference reservation for the
sidebar. The conferencing system also reserves or allocates a
conference ID to be used for any subsequent protocol requests from
any of the members of the conference. The conferencing system
maintains the mapping between this conference ID and the conference
object ID associated with the sidebar reservation through the
conference instance.
Upon receipt of the conference control protocol response to reserve
the conference, "Alice" can now create an active conference using
that reservation or create additional reservations based upon the
existing reservations. In this example, "Alice" wants only "Bob" to
be involved in the sidebar; thus, she manipulates the membership.
"Alice" also wants the audio to be received by herself and "Bob" from
the original conference, but wants any outgoing audio from herself to
be restricted to the participants in the sidebar, whereas "Bob's"
outgoing audio should go to the main conference, so that both "Alice"
and the customer "Carol" hear the same audio from "Bob". "Alice"
sends a conference control protocol request to update the information
in the reservation and to create an active conference.
Upon receipt of the conference control protocol request to update the
reservation and to create an active conference for the sidebar, as
identified by the conference object ID, the conferencing system
ensures that "Alice" has the appropriate authority based on the
policies associated with that specific conference object to perform
the operation. Based upon the addressing information provided for
"Bob" by "Alice", the call signaling to add "Bob" to the sidebar with
the appropriate media characteristics is instigated through the
focus.
"Bob" is notified of his addition to the sidebar via the conference
notification service; thus, he is aware that "Alice", the supervisor,
is available for coaching him through this call.
10. Relationships between SIP and Centralized Conferencing Frameworks
The SIP Conferencing Framework [RFC4353] provides an overview of a
wide range of centralized conferencing solutions known today in the
conferencing industry. That document introduces terminology and
logical entities in order to systematize the overview and to show the
common core of many of these systems. The logical entities and the
listed scenarios in the SIP Conferencing Framework are used to
illustrate how SIP [RFC3261] can be used as a signaling means in
these conferencing systems. The SIP Conferencing Framework does not
define new conference control protocols to be used by the general
conferencing system. It uses only basic SIP [RFC3261], the SIP
Conferencing for User Agents [RFC4579], and the SIP Conference
Package [RFC4575] for basic SIP conferencing realization.
This centralized conferencing framework document defines a particular
centralized conferencing system and the logical entities implementing
it. It also defines a particular data model and refers to the set of
protocols (beyond call signaling means) to be used among the logical
entities for implementing advanced conferencing features. The
purpose of the XCON Working Group and this framework is to achieve
interoperability between the logical entities from different vendors
for controlling different aspects of advanced conferencing
applications.
The logical entities defined in the two frameworks are not intended
to be mapped one-to-one. The two frameworks differ in the
interpretation of the internal conferencing system decomposition and
the corresponding operations. Nevertheless, the basic SIP [RFC3261],
the SIP Conferencing for User Agents [RFC4579], and the SIP
Conference Package [RFC4575] are fully compatible with both framework
documents. The basis for compatibility is provided by including the
basic data elements defined in [RFC4575] in the Conference
Information Data Model for Centralized Conferencing (XCON)
[XCON-COMMON]. User agents that only support [RFC4579] and do not
support the Conferencing Control Protocol are still provided basic
SIP conferencing, but cannot take advantage of any of the advanced
features.
11. Security Considerations
There is a wide variety of potential attacks related to
conferencing, due to the natural involvement of multiple endpoints
and the many, often user-invoked, capabilities provided by the
conferencing system. Examples of attacks include the following: an
endpoint attempting to listen to conferences in which it is not
authorized to participate, an endpoint attempting to disconnect or
mute other users, and theft of service by an endpoint attempting
to create conferences it is not allowed to create.
There are several issues surrounding security of this conferencing
framework. One set of issues involves securing the actual protocols
and the associated authorization mechanisms. This first set of
issues should be addressed in the specifications specific to the
protocols described in Section 8 and policy control. The protocols
used for manipulation and retrieval of confidential information need
to support a confidentiality and integrity mechanism. Similar
requirements apply for the floor control protocols. Section 11.3
discusses an approach for client authentication of a floor control
server. It is RECOMMENDED that all the protocols that interface with
the conferencing system implement Transport Layer Security (TLS).
There are also security issues associated with the authorization to
perform actions on the conferencing system to invoke specific
capabilities. Section 5.2 discusses the policies associated with the
conference object to ensure that only authorized entities are able to
manipulate the data to access the capabilities. Another set of
issues involves the privacy and security of the identity of a user in
the conference, which is discussed in Section 11.2.
A final issue is related to Denial of Service (DoS) attacks on the
conferencing system itself. In order to minimize the potential for
DoS attacks, it is recommended that conferencing systems require user
authentication and authorization for any client participating in a
conference. It is recommended that the specific signaling and media
protocols include mechanisms to minimize the potential for DoS.
11.1. User Authentication and Authorization
Many policy authorization decisions are based on the identity of the
user or the role that a user may have. Conferencing systems
typically require authentication of users to validate their identity.
There are several ways that a user might authenticate its identity to
the system. For users joining a conference using one of the call
signaling protocols, the user authentication mechanisms for the
specific protocol often suffice. For the case of users joining the
conference via SIP signaling or using the conference control
protocol, TLS is RECOMMENDED.
The conferencing system may also know (e.g., out-of-band mechanisms)
about specific users and assign passwords to allow these users to be
authorized. In some cases (e.g., Public Switched Telephone Network
(PSTN) users), additional authorization may be required to allow the
user to participate in the conference. This may be in the form of an
Interactive Voice Response (IVR) system or other means. The users
may also be authorized by knowing a particular conference ID and a
Personal Identification Number (PIN) for it. Sometimes, a PIN is not
required and the conference ID is used as a shared secret.
In the cases where a user is authorized via multiple mechanisms, it
is up to the conferencing system to correlate (if desired) the
authorization of the call signaling interface with other
authorization mechanisms. A conferencing system can avoid the
problem with multiple mechanisms by restricting the methods by which
a conference can be joined. For example, many conferencing systems
that provide a web interface for conferences correlate the PSTN call
signaling by forcing a dial-out mode for joining the conference.
Thus, there is only the need for a single PIN or password to join the
conference.
When a conferencing system presents the identity of authorized users,
it may choose to provide information about the way the identity was
proven or verified by the system. A user may also enter the system
completely unauthenticated; this fact also needs to be communicated
to interested parties.
When guest users interact with the system, it is often in the context
of a particular conference. In this case, the user may provide a PIN
or a password that is specific to the conference and authorizes the
user to take on a certain role in that conference. The guest user
can then perform actions that are allowed to any user with that role.
The term password refers to the usual, reasonably sized and
hard-to-predict shared secret. Today, users often have passwords
containing
up to 30 bits (8-16 characters) of entropy. A PIN is a special
password case -- a shared secret that is only numeric and often
contains a fairly small number of bits (often as few as 10 bits or 3
digits). When conferencing systems are used for audio on the PSTN,
there is often a need to authenticate using a PIN. Typically, if the
user fails to provide the correct PIN a few times in a row, the PSTN
call is disconnected. The rate of making the calls and getting to
the point to enter a PIN makes it fairly hard to do an exhaustive
search of the PIN space even for 4-digit PINs. When using a
high-speed interface to connect to a conferencing system, it is often
possible to do thousands of attempts per second and the PIN space
could quickly be searched. Because of this, it is not appropriate to
use PINs for authorization on any of the interfaces that provide fast
queries or many simultaneous queries.
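The difference between the two cases can be made concrete with a
short calculation. The attempt rates below are assumptions chosen for
illustration (three guesses per one-minute PSTN call, and one thousand
guesses per second on a fast interface); they are not measurements,
and the function names are hypothetical.

```python
import math


def pin_bits(digits):
    """Entropy in bits of a uniformly random numeric PIN."""
    return digits * math.log2(10)


def seconds_to_exhaust(digits, attempts_per_second):
    """Worst-case time to try every PIN of the given length."""
    return 10 ** digits / attempts_per_second


bits_3 = pin_bits(3)   # just under 10 bits
bits_4 = pin_bits(4)   # a little over 13 bits

# Assumed rates, for illustration only.
pstn_rate = 3 / 60     # three guesses per one-minute call
fast_rate = 1000       # a fast network interface

pstn_hours = seconds_to_exhaust(4, pstn_rate) / 3600   # roughly 56 hours
fast_seconds = seconds_to_exhaust(4, fast_rate)        # 10 seconds
```

Under these assumed rates, a 4-digit PIN that would take more than two
days of continuous calling to exhaust over the PSTN falls in seconds
on a fast interface.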
Once a user is authenticated and authorized through the various
mechanisms available on the conferencing system, a conference user
identifier is associated with any signaling specific user identifiers
that may have been used for authentication and authorization. This
conference user identifier may be provided to a specific user through
the conference notification interface and will be provided to users
that interact with the conferencing system using the conference
control protocol. This conference user identifier is required for
any subsequent operations on the conference object.
11.2. Security and Privacy of Identity
The conferencing system has an idea of the identity of a user, but
this does not mean it can reveal this identity to other users, due to
privacy considerations. Users can select various options for
revealing their identity to other users. A user can be "hidden" such
that other users cannot see they are participants in the conference,
"anonymous" such that users can see that another user is there, but
not see the identity of the user, or they can be "public" where other
users can see their identity. If there are multiple "anonymous"
users, other parties will be able to see them as independent
"anonymous" parties and will be able to tell how many "anonymous"
parties are in the conference. Note that the visibility to other
participants is dependent on their roles. For example, users'
identities (including "anonymous" and "hidden") may be displayed to the
moderator or administrator, subject to a conferencing system's local
policies. "Hidden" status is often used by automated or machine
participants of a conference (e.g., call recording) and is also used
in many call center situations.
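The three visibility options can be sketched as a roster filter. This
is a minimal illustration: the function name roster_view, the role
string "moderator", and the choice to show every identity to a
moderator are assumptions standing in for a conferencing system's
local policy.

```python
def roster_view(participants, viewer_role):
    """Return the roster as seen by a viewer with the given role.

    `participants` is a list of (name, visibility) pairs, where
    visibility is "public", "anonymous", or "hidden". Showing
    everything to a moderator is an assumed local policy.
    """
    if viewer_role == "moderator":
        return [name for name, _ in participants]
    view = []
    for name, visibility in participants:
        if visibility == "public":
            view.append(name)
        elif visibility == "anonymous":
            # Listed individually so anonymous parties can be counted.
            view.append("anonymous")
        # "hidden" participants are omitted entirely.
    return view


roster = [("Alice", "public"), ("Bob", "anonymous"),
          ("Recorder", "hidden"), ("Carol", "anonymous")]
participant_view = roster_view(roster, "participant")
moderator_view = roster_view(roster, "moderator")
```

An ordinary participant sees one named user and two separate
"anonymous" entries, while the hidden recorder does not appear at all.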
Since a conferencing system based on this framework allocates a
unique conference user identifier for each user of the conferencing
system, it is not necessary to distribute any signaling specific user
identifier to other users or participants. Access to any signaling
specific user identifiers can be controlled by applying the
appropriate access control to the signaling specific user identifiers
in the data schema.
11.3. Floor Control Server Authentication
The floor control protocol contains mechanisms that clients can use
to authenticate servers, and that servers can use to authenticate
clients, as described in Section 9 of [RFC4582]. The precise
mechanisms used for such authentication can vary depending on the
call control protocol used. Clients using call control protocols
that employ an SDP offer/answer model, such as SIP, use the mechanism
described in Section 8 of [RFC4583]. Clients using other call
control protocols make use of the mechanisms described in the BFCP
Connection Establishment document [RFC5018].
12. Acknowledgements
This document is a result of architectural discussions among IETF
XCON Working Group participants. The authors would like to thank
Henning Schulzrinne for the "Conference Object Tree" proposal and
general feedback, Cullen Jennings for providing input for the
"Security Considerations" section, and Keith Lantz, Dave Morgan,
Oscar Novo, Roni Even, Umesh Chandra, Avshalom Houri, Sean Olson,
Rohan Mahy, Brian Rosen, Pierre Tane, Bob Braudes, Gregory Sperounes,
and Gonzalo Camarillo for their reviews and constructive input. In
addition, the authors would like to thank Scott Brim for his gen-art
review comments and Kurt Zeilenga for his secdir review comments.
13. References
13.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
13.2. Informative References
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP:
Session Description Protocol", RFC 4566, July 2006.
[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G.,
Johnston, A., Peterson, J., Sparks, R., Handley, M.,
and E. Schooler, "SIP: Session Initiation Protocol",
RFC 3261, June 2002.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
Model with Session Description Protocol (SDP)",
RFC 3264, June 2002.
[RFC3265] Roach, A., "Session Initiation Protocol (SIP)-Specific
Event Notification", RFC 3265, June 2002.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
[RFC2445] Dawson, F. and D. Stenerson, "Internet Calendaring
and Scheduling Core Object Specification (iCalendar)",
RFC 2445, November 1998.
[RFC4245] Levin, O. and R. Even, "High-Level Requirements for
Tightly Coupled SIP Conferencing", RFC 4245,
[RFC4353] Rosenberg, J., "A Framework for Conferencing with the
Session Initiation Protocol (SIP)", RFC 4353,
[RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, "A
Session Initiation Protocol (SIP) Event Package for
Conference State", RFC 4575, August 2006.
[RFC4376] Koskelainen, P., Ott, J., Schulzrinne, H., and X. Wu,
"Requirements for Floor Control Protocols", RFC 4376,
[RFC4597] Even, R. and N. Ismail, "Conferencing Scenarios",
RFC 4597, August 2006.
[RFC4579] Johnston, A. and O. Levin, "Session Initiation
Protocol (SIP) Call Control - Conferencing for User
Agents", BCP 119, RFC 4579, August 2006.
[RFC4582] Camarillo, G., Ott, J., and K. Drage, "The Binary
Floor Control Protocol (BFCP)", RFC 4582,
[RFC4574] Levin, O. and G. Camarillo, "The Session Description
Protocol (SDP) Label Attribute", RFC 4574,
[RFC4583] Camarillo, G., "Session Description Protocol (SDP)
Format for Binary Floor Control Protocol (BFCP)
Streams", RFC 4583, November 2006.
[XCON-COMMON] Novo, O., Camarillo, G., Morgan, D., and R. Even,
"Conference Information Data Model for Centralized
Conferencing (XCON)", Work in Progress, March 2008.
[RFC4975] Campbell, B., Mahy, R., and C. Jennings, "The Message
Session Relay Protocol (MSRP)", RFC 4975,
[RFC5018] Camarillo, G., "Connection Establishment in the Binary
Floor Control Protocol (BFCP)", RFC 5018,
Full Copyright Statement
Copyright (C) The IETF Trust (2008).