RFC 6231


An Interactive Voice Response (IVR) Control Package for the Media Control Channel Framework

6.  Examples

   This section provides examples of the IVR Control Package.

6.1.  AS-MS Dialog Interaction Examples

   The following examples assume a Control Channel has been established
   and synced as described in the Media Control Channel Framework
   [RFC6230].

   The XML messages are in angled brackets (with the root <mscivr>
   omitted); the REPORT status is in round brackets.  Other aspects of
   the protocol are omitted for readability.

6.1.1.  Starting an IVR Dialog

   An IVR dialog is started successfully, and a dialogexit notification
   <event> is sent from the MS to the AS when the dialog exits normally.

             Application Server (AS)                   Media Server (MS)
                |                                             |
                |       (1) CONTROL: <dialogstart>            |
                |  ---------------------------------------->  |
                |                                             |
                |       (2) 202                               |
                |  <---------------------------------------   |
                |                                             |
                |                                             |
                |       (3) REPORT: <response status="200"/>  |
                |                   (terminate)               |
                |  <----------------------------------------  |
                |                                             |
                |       (4) 200                               |
                |  ---------------------------------------->  |
                |                                             |
                |       (5) CONTROL: <event ... />            |
                |                                             |
                |  <----------------------------------------  |
                |                                             |
                |       (6) 200                               |
                |  ---------------------------------------->  |
                |                                             |

6.1.2.  IVR Dialog Fails to Start

   An IVR dialog fails to start due to an unknown dialog language.  The
   <response> is reported in a framework 200 message.

             Application Server (AS)                   Media Server (MS)
                |                                             |
                |       (1) CONTROL: <dialogstart>            |
                |  ---------------------------------------->  |
                |                                             |
                |       (2) 200: <response status="421"/>     |
                |  <----------------------------------------  |
                |                                             |

6.1.3.  Preparing and Starting an IVR Dialog

   An IVR dialog is prepared and started successfully, and then the
   dialog exits normally.

             Application Server (AS)                   Media Server (MS)
                |                                             |
                |       (1) CONTROL: <dialogprepare>          |
                |  ---------------------------------------->  |
                |                                             |
                |       (2) 202                               |
                |  <---------------------------------------   |
                |                                             |
                |       (3) REPORT: <response status="200"/>  |
                |                   (terminate)               |
                |  <----------------------------------------  |
                |                                             |
                |       (4) 200                               |
                |  ---------------------------------------->  |
                |                                             |
                |       (5) CONTROL: <dialogstart>            |
                |  ---------------------------------------->  |
                |                                             |
                |       (6) 202                               |
                |  <---------------------------------------   |
                |                                             |
                |       (7) REPORT: <response status="200"/>  |
                |                   (terminate)               |
                |  <----------------------------------------  |
                |                                             |
                |       (8) 200                               |
                |  ---------------------------------------->  |
                |                                             |
                |       (9) CONTROL: <event .../>             |
                |  <----------------------------------------  |
                |                                             |
                |       (10) 200                              |
                |  ---------------------------------------->  |
                |                                             |

6.1.4.  Terminating a Dialog

   An IVR dialog is started successfully, and then terminated by the AS.
   The dialogexit event is sent to the AS when the dialog exits.

             Application Server (AS)                   Media Server (MS)
                |                                             |
                |       (1) CONTROL: <dialogstart>            |
                |  ---------------------------------------->  |
                |                                             |
                |       (2) 202                               |
                |  <---------------------------------------   |
                |                                             |
                |       (3) REPORT: <response status="200"/>  |
                |                   (terminate)               |
                |  <----------------------------------------  |
                |                                             |
                |       (4) 200                               |
                |  ---------------------------------------->  |
                |                                             |
                |       (5) CONTROL: <dialogterminate>        |
                |  ---------------------------------------->  |
                |                                             |
                |       (6) 200: <response status="200"/>     |
                |  <----------------------------------------  |
                |                                             |
                |       (7) CONTROL: <event .../>             |
                |  <----------------------------------------  |
                |                                             |
                |       (8) 200                               |
                |  ---------------------------------------->  |
                |                                             |

   Note that in (6) the <response> payload to the <dialogterminate/>
   request is carried on a framework 200 response since the MS could
   complete the requested operation before the transaction timeout.
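
   The two response paths shown in the diagrams of this section can be
   sketched as follows.  This is a minimal illustration only: the
   function name, the tuple layout, and the default timeout value are
   assumptions made for the example and are not defined by this package.

```python
def frame_response(status, elapsed_s, transaction_timeout_s=10.0):
    """Return the framework messages the MS sends for one CONTROL request.

    status: package-level <response> status code (e.g. 200 or 421).
    elapsed_s: time the MS needs to finish the operation (hypothetical input).
    transaction_timeout_s: assumed transaction timeout, illustrative only.
    """
    payload = '<response status="%d"/>' % status
    if elapsed_s < transaction_timeout_s:
        # Fast path: the payload is carried on the framework 200 response.
        return [("200", payload)]
    # Slow path: acknowledge with 202, then deliver the payload in a REPORT.
    return [("202", None), ("REPORT terminate", payload)]


print(frame_response(200, 0.1))
print(frame_response(200, 30.0))
```

   The AS acknowledges the REPORT with its own framework 200, which the
   sketch leaves out because it carries no package payload.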

6.2.  IVR Dialog Examples

   The following examples show how <dialog> is used with
   <dialogprepare>, <dialogstart>, and <event> elements to play prompts,
   set runtime controls, collect DTMF input, and record user input.

   The examples do not specify all messages between the AS and MS.

6.2.1.  Playing Announcements

   This example prepares an announcement composed of two prompts where
   the dialog repeatCount is set to 2.

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
      <dialog repeatCount="2">
         <media loc=""/>
         <media loc=""/>

   If the dialog is prepared successfully, a <response> is returned with
   status 200 and a dialog identifier assigned by the MS:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <response status="200" dialogid="vxi78"/>
   </mscivr>

   The prepared dialog is then started on a conference playing the
   prompts twice:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart prepareddialogid="vxi78" conferenceid="conference11"/>
   </mscivr>

   In the case of a successful dialog, the output is provided in
   <event>; for example:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <event dialogid="vxi78">
       <dialogexit status="1">
          <promptinfo termmode="completed" duration="24000"/>
       </dialogexit>
    </event>
   </mscivr>

6.2.2.  Prompt and Collect

   In this example, a prompt is played and then the MS waits up to 30
   seconds for a two-digit sequence:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
      <media loc=""/>
     <collect timeout="30s" maxdigits="2"/>

   If no user input is collected within 30s, then the following
   notification event would be returned:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <event dialogid="vxi81">
       <dialogexit status="1">
          <promptinfo termmode="completed" duration="4000"/>
          <collectinfo termmode="noinput"/>
       </dialogexit>
    </event>
   </mscivr>

   The collect operation can be specified without a prompt.  Here the MS
   just waits for DTMF input from the user (the maxdigits attribute of
   <collect> defaults to 5):

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">

   If the dialog is successful, then the dialogexit <event> contains the
   DTMF collected in its result parameter:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <event dialogid="vxi80">
       <dialogexit status="1">
          <collectinfo dtmf="12345" termmode="match"/>
       </dialogexit>
    </event>
   </mscivr>
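
   An AS might read such a notification along the following lines.
   This is a sketch under stated assumptions, not a prescribed
   implementation: the function name and the returned tuple are
   hypothetical, and the event document is written out with all of its
   closing tags so that it parses.

```python
import xml.etree.ElementTree as ET

NS = {"ivr": "urn:ietf:params:xml:ns:msc-ivr"}

EVENT = """\
<mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
 <event dialogid="vxi80">
  <dialogexit status="1">
   <collectinfo dtmf="12345" termmode="match"/>
  </dialogexit>
 </event>
</mscivr>
"""


def read_collect_result(xml_text):
    """Return (dialogid, exit status, dtmf, termmode) from a dialogexit event."""
    root = ET.fromstring(xml_text)
    event = root.find("ivr:event", NS)
    dialogexit = event.find("ivr:dialogexit", NS)
    collectinfo = dialogexit.find("ivr:collectinfo", NS)
    # The dtmf attribute is absent when nothing was collected
    # (for example, termmode="noinput"), so get() may return None.
    dtmf = collectinfo.get("dtmf") if collectinfo is not None else None
    termmode = collectinfo.get("termmode") if collectinfo is not None else None
    return event.get("dialogid"), dialogexit.get("status"), dtmf, termmode


print(read_collect_result(EVENT))
```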

   And finally, in this example, one of the input parameters is invalid:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
     <dialog repeatCount="two">
      <prompt>
       <media loc=""/>
      </prompt>
      <collect cleardigitbuffer="true"
               timeout="4s" interdigittimeout="2s"
               termtimeout="0s" maxdigits="2"/>
     </dialog>
    </dialogstart>
   </mscivr>

   The error is reported in the response:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <response status="400" dialogid="vxi82"
     reason="repeatCount attribute value invalid: two"/>
   </mscivr>
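
   The check behind this error could look like the sketch below.  It
   assumes that repeatCount must be a non-negative integer; the function
   name and return shape are hypothetical, and only the reason string is
   taken from the example above.

```python
def check_repeat_count(value):
    """Validate a repeatCount attribute value.

    Returns (status, reason): (200, None) when the value is a non-negative
    integer, otherwise (400, reason) in the style of the error example.
    """
    # str.isdigit() alone accepts some non-ASCII digits, so require ASCII too.
    if value.isascii() and value.isdigit():
        return 200, None
    return 400, "repeatCount attribute value invalid: " + value


print(check_repeat_count("2"))
print(check_repeat_count("two"))
```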

6.2.3.  Prompt and Record

   In this example, the user is prompted, then their input is recorded
   for a maximum of 30 seconds.

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
   <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
          <media loc=""/>
         <record dtmfterm="false" maxtime="30s" beep="true"/>

   If successful and the recording is terminated by DTMF, the following
   is returned in a dialogexit <event>:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <event dialogid="vxi83">
     <dialogexit status="1">
      <recordinfo termmode="dtmf">
       <mediainfo type="audio/x-wav"

6.2.4.  Runtime Controls

   In this example, a prompt is played with the collect operation and
   runtime controls activated.

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
     <prompt bargein="true">
      <media loc=""/>
     <control ffkey="5" rwkey="6" speedupkey="3"
     <collect maxdigits="2"/>

   Once the dialog is active, the user can press keys 3, 4, 5, and 6 to
   execute runtime controls on the prompt queue.  The keys do not cause
   bargein to occur.  If the user presses any other key, then the prompt
   is interrupted and DTMF collect begins.  Note that runtime controls
   are not active during the collect operation.

   When the dialog is completed successfully, both control and collect
   information are reported.

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <event dialogid="vxi81">
       <dialogexit status="1">
          <promptinfo termmode="bargein"/>
          <controlinfo>
           <controlmatch dtmf="4" timestamp="2008-05-12T12:13:14Z"/>
           <controlmatch dtmf="3" timestamp="2008-05-12T12:13:15Z"/>
           <controlmatch dtmf="5" timestamp="2008-05-12T12:13:16Z"/>
          </controlinfo>
          <collectinfo termmode="match" dtmf="14"/>
       </dialogexit>
    </event>
   </mscivr>
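
   The control matches in such an event can be turned into an ordered
   list of key presses, as in the sketch below.  The function name is
   hypothetical, and the <controlinfo> wrapper in the sample document is
   an assumption; the code searches at any depth, so it does not depend
   on that wrapper.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

IVR = "{urn:ietf:params:xml:ns:msc-ivr}"

EVENT = """\
<mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
 <event dialogid="vxi81">
  <dialogexit status="1">
   <promptinfo termmode="bargein"/>
   <controlinfo>
    <controlmatch dtmf="4" timestamp="2008-05-12T12:13:14Z"/>
    <controlmatch dtmf="3" timestamp="2008-05-12T12:13:15Z"/>
    <controlmatch dtmf="5" timestamp="2008-05-12T12:13:16Z"/>
   </controlinfo>
   <collectinfo termmode="match" dtmf="14"/>
  </dialogexit>
 </event>
</mscivr>
"""


def control_matches(xml_text):
    """Return [(dtmf, UTC datetime), ...] in document order."""
    root = ET.fromstring(xml_text)
    matches = []
    for match in root.iter(IVR + "controlmatch"):
        # Timestamps in the example use the UTC "Z" form with whole seconds.
        stamp = datetime.strptime(match.get("timestamp"), "%Y-%m-%dT%H:%M:%SZ")
        matches.append((match.get("dtmf"), stamp.replace(tzinfo=timezone.utc)))
    return matches


for key, when in control_matches(EVENT):
    print(key, when.isoformat())
```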

6.2.5.  Subscriptions and Notifications

   In this example, a looped dialog is started with subscription for
   notifications each time the user input matches the collect grammar:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="7HDY839:HJKSkyHS">
     <dialog repeatCount="0">
      <collect maxdigits="2"/>
     </dialog>
     <subscribe>
      <dtmfsub matchmode="collect"/>
     </subscribe>
    </dialogstart>
   </mscivr>

   Each time the user inputs DTMF matching the grammar, the following
   notification event would be sent:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <event dialogid="vxi81">
       <dtmfnotify matchmode="collect" dtmf="12"

   If no user input was provided, or the input did not match the
   grammar, the dialog would continue to loop until terminated (or an
   error occurred).

6.2.6.  Dialog Repetition until DTMF Collection Complete

   This example is a prompt and collect dialog to collect the PIN from
   the user.  The repeatUntilComplete attribute in the <dialog> is set
   to true in this case so that when the grammar collection is complete,
   the MS automatically terminates the dialog repeat cycle and reports
   the results in a <dialogexit> event.

      <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
       <dialogstart connectionid="7HDY839:HJKSkyHS">
        <dialog repeatCount="3" repeatUntilComplete="true">
         <prompt bargein="true">
           <media loc=""/>
         </prompt>
         <collect maxdigits="4"/>
        </dialog>
       </dialogstart>
      </mscivr>

   If the user barges in on the prompt and <collect> receives DTMF input
   matching the grammar, the dialog cycle is considered complete and the
   MS returns the following:

      <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
       <event dialogid="vxi81">
         <dialogexit status="1">
           <promptinfo duration="3654" termmode="bargein"/>
           <collectinfo dtmf="1234" termmode="match"/>
         </dialogexit>
       </event>
      </mscivr>

   If no user input was provided, or the input did not match the
   grammar, the dialog would loop for a maximum of 3 times.
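
   The interaction between repeatCount and repeatUntilComplete described
   in this section and in Section 6.2.5 can be sketched as a small
   simulation.  The function and its inputs are hypothetical; each
   element of cycle_results stands for one pass through the dialog and
   is True when <collect> matched its grammar in that pass.

```python
def run_dialog(cycle_results, repeat_count, repeat_until_complete=False):
    """Simulate the dialog repeat cycle and return the cycles executed.

    cycle_results: iterable of booleans, one per cycle (test input only).
    repeat_count: value of repeatCount; 0 means loop until terminated.
    repeat_until_complete: value of repeatUntilComplete.
    """
    cycles = 0
    for matched in cycle_results:
        cycles += 1
        if repeat_until_complete and matched:
            break  # collection complete: stop repeating and exit
        if repeat_count != 0 and cycles >= repeat_count:
            break  # repeat limit reached
    # Running out of cycle_results models external termination of the dialog.
    return cycles


print(run_dialog([False, True, False], 3, repeat_until_complete=True))
print(run_dialog([False, False, False, False], 3, repeat_until_complete=True))
```

   In the first call the user matches on the second pass, so the dialog
   stops early; in the second call no input ever matches, so the dialog
   stops at the repeat limit.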

6.3.  Other Dialog Languages

   The following example requests that a VoiceXML dialog is started:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart dialogid="d2"
      <param name="prompt1">nfs://nas01/media1.3gp</param>
      <param name="prompt2">nfs://nas01/media2.3gp</param>

   If the MS does not support this dialog language, then the response
   would have the status code 421 (Section 4.5).  However, if it does
   support the VoiceXML dialog language, it would respond with a 200
   status, activate the VoiceXML dialog, and make the <params> available
   to the VoiceXML script as described in Section 9.

   When the VoiceXML dialog exits, exit namelist parameters are
   specified using <params> in the dialogexit event:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <event dialogid="d2">
      <dialogexit status="1">
        <param name="username">peter</param>
        <param name="pin">1234</param>
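
   On the AS side, the exit parameters can be collected into a simple
   mapping.  The sketch below is illustrative: the function name is
   hypothetical, and the event is written out in full with the <params>
   container and all closing tags so that it parses.

```python
import xml.etree.ElementTree as ET

IVR = "{urn:ietf:params:xml:ns:msc-ivr}"

EVENT = """\
<mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
 <event dialogid="d2">
  <dialogexit status="1">
   <params>
    <param name="username">peter</param>
    <param name="pin">1234</param>
   </params>
  </dialogexit>
 </event>
</mscivr>
"""


def exit_params(xml_text):
    """Return {name: value} for every <param> in the document."""
    root = ET.fromstring(xml_text)
    # An empty <param/> has text None; normalise that to an empty string.
    return {p.get("name"): (p.text or "") for p in root.iter(IVR + "param")}


print(exit_params(EVENT))
```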

6.4.  Foreign Namespace Attributes and Elements

   An MS can support attributes and elements from foreign namespaces
   within the <mscivr> element.  For example, the MS could support a
   <listen> element (in a foreign namespace) for speech recognition by
   analogy to how <collect> supports DTMF collection.

   In the following example, a prompt and collect request is extended
   with a <listen> element:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr"
    <dialogstart connectionid="7HDY839:HJKSkyHS~HUwkuh7ns">
      <media loc=""/>
     <collect timeout="30s" maxdigits="4"/>
     <ex:listen maxtimeout="30s" >
       <ex:grammar src=""/>

   In the <mscivr> root element, the xmlns:ex attribute declares that
   "ex" is associated with the foreign namespace URI
   "".  The <ex:listen>,
   its attributes, and child elements are associated with this
   namespace.  This <listen> could be defined so that it activates an
   SRGS grammar and listens for user input matching the grammar in a
   similar manner to DTMF collection.

   If an MS receives this request but does not support the <listen>
   element, then it would send a 431 response:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <response status="431" dialogid="d560"
     reason="unsupported foreign listen element"/>
   </mscivr>

   If the MS does support this foreign element, it would send a 200
   response and start the dialog with speech recognition.  When the
   dialog exits, it provides information about the <listen> execution
   within <dialogexit>, again using elements in a foreign namespace such
   as <listeninfo> below:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr"
    <event dialogid="d560">
      <dialogexit status="1">
       <ex:listeninfo speech="1 2 3 4" termmode="match"/>

   Note that in reply the AS sends a Control Framework 200 response even
   though the notification event contains an element in a foreign
   namespace that it might not understand.
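
   One way an MS could decide between the 431 and 200 outcomes above is
   sketched below.  The namespace URI bound to "ex", the function names,
   and the request document are assumptions made for the example; the
   sketch checks foreign elements only and does not look at foreign
   attributes.

```python
import xml.etree.ElementTree as ET

IVR_NS = "urn:ietf:params:xml:ns:msc-ivr"
EX_NS = "http://www.example.com/mediactrl/extensions/1"  # hypothetical URI

REQUEST = """\
<mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr"
        xmlns:ex="http://www.example.com/mediactrl/extensions/1">
 <dialogstart connectionid="c1">
  <dialog>
   <collect timeout="30s" maxdigits="4"/>
   <ex:listen maxtimeout="30s"/>
  </dialog>
 </dialogstart>
</mscivr>
"""


def namespace_of(tag):
    """Extract the namespace URI from an ElementTree '{uri}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def check_foreign_elements(xml_text, supported_namespaces):
    """Return (status, reason) for the first unsupported foreign element."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # document order
        ns = namespace_of(elem.tag)
        if ns != IVR_NS and ns not in supported_namespaces:
            local = elem.tag.split("}", 1)[-1]
            return 431, "unsupported foreign %s element" % local
    return 200, None


print(check_foreign_elements(REQUEST, set()))
print(check_foreign_elements(REQUEST, {EX_NS}))
```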

7.  Security Considerations

   As this Control Package processes XML markup, implementations MUST
   address the security considerations of [RFC3023].

   Implementations of this Control Package MUST address security,
   confidentiality, and integrity of messages transported over the
   Control Channel as described in Section 12 of "Media Control Channel
   Framework" [RFC6230], including Transport Level Protection, Control
   Channel Policy Management, and Session Establishment.  In addition,
   implementations MUST address security, confidentiality, and integrity
   of User Agent sessions with the MS, both in terms of SIP signaling
   and associated RTP media flow; see [RFC6230] for further details on
   this topic.  Finally, implementations MUST address security,
   confidentiality, and integrity of sessions where, following a URI
   scheme, an MS uploads recordings or retrieves documents and resources
   (e.g., fetching a grammar document from a web server using HTTPS).

   Adequate transport protection and authentication are critical,
   especially when the implementation is deployed in open networks.  If
   the implementation fails to correctly address these issues, it risks
   exposure to malicious attacks, including (but not limited to):

   Denial of Service:  An attacker could insert a request message into
      the transport stream causing specific dialogs on the MS to be
      terminated immediately.  For example, <dialogterminate
      dialogid="XXXX" immediate="true">, where the value of "XXXX" could
      be guessed or discovered by auditing active dialogs on the MS
      using an <audit> request.  Likewise, an attacker could impersonate
      the MS and insert error responses into the transport stream so
      denying the AS access to package capabilities.

   Resource Exhaustion:  An attacker could insert into the Control
      Channel new request messages (or modify existing ones) with, for
      instance, <dialogprepare> elements with a very long fetchtimeout
      attribute and a bogus source URL.  At some point, this will
      exhaust the number of connections that the MS is able to make.

   Phishing:  An attacker with access to the Control Channel could
      modify the "loc" attribute of the <media> element in a dialog to
      point to some other audio file that had different information from
      the original.  This modified file could include a different phone
      number for people to call if they want more information or need to
      provide additional information (such as governmental, corporate,
      or financial information).

   Data Theft:  An attacker could modify a <record> element in the
      Control Channel so as to add a new recording location:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
         <media type="audio/x-wav" loc="(Good URI)"/>
         <media type="audio/x-wav" loc="(Attacker's URI)"/>

   The recorded data would be uploaded to two locations indicated by the
   "(Good URI)" and the "(Attacker's URI)".  This allows the attacker to
   steal the recorded audio (which could include sensitive or
   confidential information) without the originator of the request
   necessarily being aware of the theft.

   The Media Control Channel Framework permits additional security
   policy management, including resource access and Control Channel
   usage, to be specified at the Control Package level beyond that
   specified for the Media Control Channel Framework (see Section 12.3
   of [RFC6230]).

   Since creation of IVR dialogs is associated with media processing
   resources (e.g., DTMF detectors, media playback and recording, etc.)
   on the MS, the security policy for this Control Package needs to
   address how such dialogs are securely managed across more than one
   Control Channel.  Such a security policy is only useful for secure,
   confidential, and integrity-protected channels.  The identity of
   Control Channels is determined by the channel identifier, i.e., the
   value of the cfw-id attribute in the SDP and 'Dialog-ID' header in
   the channel protocol (see [RFC6230]).  Channels are the same if they
   have the same identifier; otherwise, they are different.  This
   Control Package imposes the following additional security policies:

   Responses:  The MS MUST only send a response to a dialog management
      or audit request using the same Control Channel as the one used to
      send the request.

   Notifications:  The MS MUST only send notification events for a
      dialog using the same Control Channel as it received the request
      creating the dialog.

   Auditing:  The MS MUST only provide audit information about dialogs
      that have been created on the same Control Channel as the one upon
      which the <audit> request is sent.

   Rejection:  The MS SHOULD reject requests to audit or manipulate an
      existing dialog on the MS if the channel is not the same as the
      one used when the dialog was created.  The MS rejects a request by
      sending a Control Framework 403 response (see Section 7.4 and
      Section 12.3 of [RFC6230]).  For example, if a channel with
      identifier 'cfw1234' has been used to send a request to create a
      particular dialog and the MS receives on channel 'cfw98969' a
      request to audit or terminate the dialog, then the MS sends a 403
      framework response.
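
   The default policy above can be sketched as a small registry that
   remembers which channel created each dialog.  The class and method
   names are hypothetical, and the sketch implements only the default
   behavior (reject on a different channel); it does not model the
   exceptions discussed in the text that follows.

```python
class DialogRegistry:
    """Tracks which Control Channel created each dialog (illustrative)."""

    def __init__(self):
        self._owner = {}  # dialogid -> identifier of the creating channel

    def create(self, dialogid, channel_id):
        self._owner[dialogid] = channel_id

    def authorize(self, dialogid, channel_id):
        """Return 200 if channel_id created the dialog, else framework 403.

        Unknown dialog identifiers raise KeyError; handling them is
        outside the scope of this sketch.
        """
        return 200 if self._owner[dialogid] == channel_id else 403

    def audit(self, channel_id):
        """List only the dialogs created on the requesting channel."""
        return sorted(d for d, c in self._owner.items() if c == channel_id)


registry = DialogRegistry()
registry.create("d1", "cfw1234")
print(registry.authorize("d1", "cfw1234"))   # request on the creating channel
print(registry.authorize("d1", "cfw98969"))  # request on a different channel
```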

   There can be valid reasons why an implementation does not reject an
   audit or dialog manipulation request on a different channel from the
   one that created the dialog.  For example, a system administrator
   might require a separate channel to audit dialog resources created by
   system users and to terminate dialogs consuming excessive system
   resources.  Alternatively, a system monitor or resource broker might
   require a separate channel to audit dialogs managed by this package
   on an MS.  However, the full implications need to be understood by
   the implementation and carefully weighed before accepting these
   reasons as valid.  If the reasons are not valid in their particular
   circumstances, the MS rejects such requests.

   There can also be valid reasons for 'channel handover' including high
   availability support or where one AS needs to take over management of
   dialogs after the AS that created them has failed.  This could be
   achieved by the Control Channels using the same channel identifier,
   one after another.  For example, assume a channel is created with the
   identifier 'cfw1234' and the channel is used to create dialogs on the
   MS.  This channel (and associated SIP dialog) then terminates due to
   a failure on the AS.  As permitted by the Control Framework, the
   channel identifier 'cfw1234' could then be reused so that another
   channel is created with the same identifier 'cfw1234', allowing it to
   'take over' management of the dialogs on the MS.  Again, the
   implementation needs to understand the full implications and
   carefully weigh them before accepting these reasons as valid.  If
   the reasons are not valid for their particular circumstances, the MS
   uses the appropriate SIP mechanisms to prevent session establishment
   when the same channel identifier is used in setting up another
   Control Channel (see Section 4 of [RFC6230]).

8.  IANA Considerations

   IANA has registered a new Media Control Channel Framework Package, a
   new XML namespace, a new XML schema, and a new MIME type.

   IANA has further created a new registry for IVR prompt variable types.

8.1.  Control Package Registration

   This section registers a new Media Control Channel Framework package,
   per the instructions in Section 13.1 of [RFC6230].

      Package Name: msc-ivr/1.0
      Published Specification(s): RFC 6231
      Person & email address to contact for further information:
         IETF MEDIACTRL working group,
         Scott McGlashan

8.2.  URN Sub-Namespace Registration

   This section registers a new XML namespace,
   "urn:ietf:params:xml:ns:msc-ivr", per the guidelines in RFC 3688
   [RFC3688].

  URI: urn:ietf:params:xml:ns:msc-ivr
  Registrant Contact: IETF MEDIACTRL working group,
     Scott McGlashan
     <?xml version="1.0"?>
     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      <html xmlns="" xml:lang="en">
        <title>Media Control Channel Framework IVR
               Package attributes</title>
        <h1>Namespace for Media Control Channel
            Framework IVR Package attributes</h1>
          <p>See <a href="">
          RFC 6231</a>.</p>

8.3.  XML Schema Registration

   This section registers an XML schema as per the guidelines in RFC
   3688 [RFC3688].

  URI:  urn:ietf:params:xml:ns:msc-ivr
  Registrant Contact: IETF MEDIACTRL working group,
     Scott McGlashan
  Schema:  The XML for this schema can be found in Section 5 of this
     document.

8.4.  MIME Media Type Registration for application/msc-ivr+xml

   This section registers the application/msc-ivr+xml MIME type.

      Type name:  application

      Subtype name:  msc-ivr+xml

      Required parameters:  (none)

      Optional parameters:  charset
         Indicates the character encoding of enclosed XML.  Default is

      Encoding considerations:  Uses XML, which can employ 8-bit
         characters, depending on the character encoding used.  See RFC
         3023 [RFC3023], Section 3.2.

      Security considerations:  No known security considerations outside
         of those provided by the Media Control Channel Framework IVR
         package.

      Interoperability considerations:  This content type provides
         constructs for the Media Control Channel Framework IVR package.

      Published specification:  RFC 6231

      Applications that use this media type:  Implementations of
         the Media Control Channel Framework IVR package.

      Additional information:
         Magic number(s):  (none)
         File extension(s):  (none)
         Macintosh file type code(s):  (none)

      Person & email address to contact for further information:
         Scott McGlashan

      Intended usage:  LIMITED USE

      Author/Change controller:  The IETF

      Other information:  None.

8.5.  IVR Prompt Variable Type Registration Information

   This specification establishes an IVR Prompt Variable Type registry
   for Control Packages and initiates its population as follows.  New
   entries in this registry must be published in an RFC (either as an
   IETF submission or RFC Editor submission), using the IANA policy
   [RFC5226] "RFC Required".

   Variable Type      Control Package  Reference
   -------------      ---------------  ---------
       date            msc-ivr/1.0     [RFC6231]
       time            msc-ivr/1.0     [RFC6231]
       digits          msc-ivr/1.0     [RFC6231]

   The following information MUST be provided in an RFC in order to
   register a new prompt variable type:

   Variable Type:  The value for the <variable> type attribute
      (Section  The RFC MUST specify permitted values (if
      any) for the format attribute of <variable> and how the value
      attribute is rendered for different values of the format
      attribute.  The RFC MUST NOT weaken but MAY strengthen the valid
      values of <variable> attributes defined in Section of
      this specification.

   Reference:  The RFC number in which the variable type is registered.

   Control Package:  The Control Package associated with the IVR
      variable type.

   Person & address to contact for further information:

9.  Using VoiceXML as a Dialog Language

   The IVR Control Package allows, but does not require, the MS to
   support other dialog languages by referencing an external dialog
   document.  This section provides MS implementations that support the
   VoiceXML dialog language ([VXML20], [VXML21], [VXML30]) with
   additional details about using these dialogs in this package.  This
   section is normative for an MS that supports the VoiceXML dialog
   language.

   This section covers preparing (Section 9.1), starting (Section 9.2),
   terminating (Section 9.3), and exiting (Section 9.4) VoiceXML dialogs
   as well as handling VoiceXML call transfer (Section 9.5).

9.1.  Preparing a VoiceXML Dialog

   A VoiceXML dialog is prepared by sending the MS a request containing
   a <dialogprepare> element (Section 4.2.1).  The type attribute is set
   to "application/voicexml+xml" and the src attribute to the URI of the
   VoiceXML document that is to be prepared by the MS.  For example:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogprepare type="application/voicexml+xml"

   The VoiceXML dialog environment uses the <dialogprepare> request as
   an opportunity to fetch and validate the initial document indicated
   by the src attribute along with any resources referenced in the
   VoiceXML document marked as prefetchable.  The maxage and maxstale
   attributes, if specified, control how the initial VoiceXML document
   is fetched using HTTP (see [RFC2616]).  Note that the fetchtimeout
   attribute is not defined in VoiceXML for an initial document, but the
   MS MUST support this attribute in its VoiceXML environment.

   If a <params> child element of <dialogprepare> is specified, then the
   MS MUST map the parameter information into a VoiceXML session
   variable object as described in Section 9.2.3.

   The success or failure of the VoiceXML document preparation is
   reported in the MS response.  For example, if the VoiceXML document
   cannot be retrieved, then a 409 error response is returned.  If the
   document is syntactically invalid according to VoiceXML, then a 400
   response is returned.  If successful, the response includes a
   dialogid attribute whose value the AS can use in a <dialogstart>
   element to start the prepared dialog.
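
   As an illustration only, the following Python sketch builds a
   <dialogprepare> request of the kind described above using the
   standard library.  The helper name build_dialogprepare and the
   example URI are assumptions made for this sketch; they are not
   defined by this package.

```python
import xml.etree.ElementTree as ET

MSC_IVR_NS = "urn:ietf:params:xml:ns:msc-ivr"
VOICEXML_TYPE = "application/voicexml+xml"


def build_dialogprepare(src, **fetch_attrs):
    """Serialize an <mscivr> request holding one VoiceXML <dialogprepare>.

    fetch_attrs may carry the optional attributes named in the text
    (maxage, maxstale, fetchtimeout); values are copied as strings.
    """
    allowed = {"maxage", "maxstale", "fetchtimeout"}
    unknown = set(fetch_attrs) - allowed
    if unknown:
        raise ValueError(f"unsupported attributes: {sorted(unknown)}")

    # Emit the package namespace as the default namespace (no prefix).
    ET.register_namespace("", MSC_IVR_NS)
    root = ET.Element(f"{{{MSC_IVR_NS}}}mscivr", {"version": "1.0"})
    attrs = {"type": VOICEXML_TYPE, "src": src}
    attrs.update({name: str(value) for name, value in fetch_attrs.items()})
    ET.SubElement(root, f"{{{MSC_IVR_NS}}}dialogprepare", attrs)
    return ET.tostring(root, encoding="unicode")


# Hypothetical usage: the URI and timeout are placeholder values.
print(build_dialogprepare("http://www.example.com/mydoc.vxml",
                          fetchtimeout="15s"))
```

   The sketch only constructs the request body; sending it over the
   Control Channel and interpreting the MS response are out of scope.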

9.2.  Starting a VoiceXML Dialog

   A VoiceXML dialog is started by sending the MS a request containing a
   <dialogstart> element (Section 4.2.2).  If a VoiceXML dialog has
   already been prepared using <dialogprepare>, then the MS starts the
   dialog indicated by the prepareddialogid attribute.  Otherwise, a new
   VoiceXML dialog can be started by setting the type attribute to
   "application/voicexml+xml" and the src attribute to the URI of the
   VoiceXML document.  For example:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="ssd3r3:sds345b"
                 type="application/voicexml+xml"
                 src="http://www.example.com/mydoc.vxml"/>
   </mscivr>

   The maxage and maxstale attributes, if specified, control how the
   initial VoiceXML document is fetched using HTTP (see [RFC2616]).
   Note that the fetchtimeout attribute is not defined in VoiceXML for
   an initial document, but the MS MUST support this attribute in its
   VoiceXML environment.  Note also that support for <dtmfsub>
   subscriptions and their associated dialog notification events is
   not defined in VoiceXML.  If such a subscription is specified in a
   <dialogstart> request, then the MS sends a 439 error response (see
   Section 4.5).

   The success or failure of starting a VoiceXML dialog is reported in
   the MS response as described in Section 4.2.2.
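
   The <dtmfsub> rule above can be sketched as a small check on an
   incoming request.  This is a minimal sketch, not MS behavior
   defined by this package: the function name is hypothetical, only
   requests whose type attribute names VoiceXML are inspected (a
   dialog started via prepareddialogid is not covered), and <dtmfsub>
   is searched for anywhere below <dialogstart>.

```python
import xml.etree.ElementTree as ET

NS = "{urn:ietf:params:xml:ns:msc-ivr}"
VOICEXML_TYPE = "application/voicexml+xml"


def check_voicexml_dialogstart(xml_text):
    """Return "439" if a VoiceXML <dialogstart> carries a <dtmfsub>.

    Returns None when the request has no such conflict.
    """
    root = ET.fromstring(xml_text)
    start = root.find(f"{NS}dialogstart")
    if start is None or start.get("type") != VOICEXML_TYPE:
        return None
    # Look for a <dtmfsub> at any depth below <dialogstart>.
    if start.find(f".//{NS}dtmfsub") is not None:
        return "439"
    return None
```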

   When the MS starts a VoiceXML dialog, the MS MUST map session
   information into a VoiceXML session variable object.  There are
   three types of session information: protocol information
   (Section 9.2.1), media stream information (Section 9.2.2), and
   parameter information (Section 9.2.3).

9.2.1.  Session Protocol Information

   If the connectionid attribute is specified, the MS assigns protocol
   information from the SIP dialog associated with the connection to the
   following session variables in VoiceXML:

   session.connection.local.uri  Evaluates to the SIP URI specified in
      the 'To:' header of the initial INVITE.

   session.connection.remote.uri  Evaluates to the SIP URI specified in
      the 'From:' header of the initial INVITE.

   session.connection.originator  Evaluates to the value of
      session.connection.remote (MS receives inbound connections but
      does not create outbound connections).

   session.connection.protocol.name  Evaluates to "sip".  Note that this
      is intended to reflect the use of SIP in general, and does not
      distinguish between whether the connection accesses the MS via SIP
      or SIP Secure (SIPS) procedures.

   session.connection.protocol.version  Evaluates to "2.0".

   session.connection.redirect  This array is populated by information
      contained in the 'History-Info' header [RFC4244] in the initial
      INVITE or is otherwise undefined.  Each entry (hi-entry) in the
      'History-Info' header is mapped, in the order it appeared in the
      'History-Info' header, into an element of the
      session.connection.redirect array.  Properties of each element of
      the array are determined as follows:

      uri    Set to the hi-targeted-to-uri value of the History-Info
             entry

      pi     Set to 'true' if hi-targeted-to-uri contains a
             'Privacy=history' parameter, or if the INVITE 'Privacy'
             header includes 'history'; 'false' otherwise

      si     Set to the value of the 'si' parameter if it exists;
             undefined otherwise

      reason Set verbatim to the value of the 'Reason' parameter of
             hi-targeted-to-uri

   session.connection.aai  Evaluates to the value of a SIP header with
      the name "aai" if present; undefined otherwise.

   session.connection.protocol.sip.requesturi  This is an associative
      array where the array keys and values are formed from the URI
      parameters on the SIP Request-URI of the initial INVITE.  The
      array key is the URI parameter name.  The corresponding array
      value is obtained by evaluating the URI parameter value as a
      string.  In addition, the array's toString() function returns the
      full SIP Request-URI.

   session.connection.protocol.sip.headers  This is an associative array
      where each key in the array is the non-compact name of a SIP
      header in the initial INVITE converted to lowercase (note the case
      conversion does not apply to the header value).  If multiple
      header fields of the same field name are present, the values are
      combined into a single comma-separated value.  Implementations
      MUST at a minimum include the 'Call-ID' header and MAY include
      other headers.  For example,
      session.connection.protocol.sip.headers["call-id"] evaluates to
      the Call-ID of the SIP dialog.
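
   The two associative arrays described above can be sketched in
   Python as follows.  This is an illustration under stated
   assumptions: the function names are hypothetical, header fields
   are assumed to arrive as (name, value) pairs with non-compact
   names, repeated values are joined with a bare comma, and a URI
   parameter without a value maps to the empty string.  The URI
   parsing is deliberately simplistic.

```python
def map_sip_headers(header_fields):
    """Build the headers array from (name, value) pairs of the INVITE."""
    headers = {}
    for name, value in header_fields:
        key = name.lower()          # only the name is lowercased
        if key in headers:
            # Repeated header fields collapse into one comma-separated value.
            headers[key] = headers[key] + "," + value
        else:
            headers[key] = value
    return headers


def map_request_uri_params(request_uri):
    """Build the requesturi array from the URI parameters."""
    uri = request_uri.split("?", 1)[0]      # drop any URI headers
    hostpart = uri.rsplit("@", 1)[-1]       # parameters follow the host part
    params = {}
    for item in hostpart.split(";")[1:]:
        name, _, value = item.partition("=")
        params[name] = value
    return params
```

   A real implementation would use a full SIP URI parser; the sketch
   only shows the shape of the resulting arrays.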

   If a conferenceid attribute is specified, then the MS populates a
   VoiceXML session variable that evaluates to the value of the
   conferenceid attribute.

9.2.2.  Session Media Stream Information

   The media streams of the connection or conference to use for the
   dialog are described in Section 4.2.2, including use of <stream>
   elements if specified.  The MS maps media stream information into
   one VoiceXML session variable for a connection and another for a
   conference.  In both variables, the value of the variable is an
   array where each array element is an object with the following
   properties:

   type  This required property indicates the type of the media
      associated with the stream (see the <stream> type attribute
      definition).

   direction  This required property indicates the directionality of the
      media relative to the endpoint of the dialog (see the <stream>
      direction attribute definition).

   format  This property is optional.  If defined, the value of the
      property is an array.  Each array element is an object that
      specifies information about one format of the media stream.  The
      object contains at least one property called name whose value is
      the subtype name of the media format [RFC4855].  Other properties
      may be defined with string values; these correspond to required
      and, if defined, optional parameters of the format.

   As a consequence of this definition, when a connectionid is specified
   there is an array entry in the connection's session variable for
   each media stream used by the VoiceXML dialog.  For example,
   consider a connection with bidirectional G.711 mu-law audio sampled
   at 8 kHz where the dialog is started with:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="ssd3r3:sds345b"
                 type="application/voicexml+xml"
                 src="http://www.example.com/mydoc.vxml">
     <stream media="audio" direction="recvonly"/>
    </dialogstart>
   </mscivr>

   In this case, the first array element's type property evaluates to
   "audio", its direction property evaluates to "recvonly" (i.e., the
   endpoint only receives media from the dialog -- the endpoint does
   not send media to the dialog), its format[0].name property
   evaluates to "PCMU", and its format[0].rate property evaluates to
   "8000".

   Note that the session variable is updated if the connection or
   conference media session characteristics for the VoiceXML dialog
   change (e.g., due to a SIP re-INVITE).
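
   The array structure described in this section can be modeled with
   plain Python dictionaries.  The following is a sketch only: the
   helper media_stream_entry and the variable connection_media are
   hypothetical names chosen for illustration, and the values mirror
   the G.711 mu-law example above.

```python
def media_stream_entry(media_type, direction, formats=None):
    """Return one array element describing a media stream."""
    if not media_type or not direction:
        raise ValueError("type and direction are required properties")
    entry = {"type": media_type, "direction": direction}
    if formats is not None:          # the format property is optional
        checked = []
        for fmt in formats:
            if "name" not in fmt:
                raise ValueError("each format needs a name property")
            # Format properties are defined with string values.
            checked.append({key: str(value) for key, value in fmt.items()})
        entry["format"] = checked
    return entry


# One entry per media stream used by the dialog (hypothetical variable).
connection_media = [
    media_stream_entry("audio", "recvonly", [{"name": "PCMU", "rate": 8000}]),
]
```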

9.2.3.  Session Parameter Information

   Parameter information is specified in the <params> child element of
   <dialogprepare> and <dialogstart> elements, where each parameter is
   specified using a <param> element.  The MS maps parameter information
   into VoiceXML session variables as follows:

   session.values  This is an associative array mapped to the <params>
      element.  It is undefined if no <params> element is specified.  If
      a <params> element is specified in both <dialogprepare> and
      <dialogstart> elements for the same dialog, then the array is
      first initialized with the <params> specified in the
      <dialogprepare> element and then updated with the <params>
      specified in the <dialogstart> element; in cases of conflict, the
      <dialogstart> parameter value takes priority.  Array keys and
      values are formed from <param> children of the <params> element.
      Each array key is the value of the name attribute of a <param>
      element.  If the same name is used in more than one <param>
      element, then the array key is associated with the last <param> in
      document order.  The corresponding value for each key is an object
      with two required properties: a "type" property evaluating to the
      value of the type attribute, and a "content" property evaluating
      to the content of the <param>.  In addition, this object's
      toString() function returns the value of the "content" property as
      a string.

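   For example, a VoiceXML dialog started with one parameter:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
    <dialogstart connectionid="ssd3r3:sds345b"
                 type="application/voicexml+xml"
                 src="http://www.example.com/mydoc.vxml">
     <params>
      <param name="mode">playannouncement</param>
     </params>
    </dialogstart>
   </mscivr>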

   In this case, session.values would be defined with one item in the
   array where session.values['mode'].type evaluates to "text/plain"
   (the default value), session.values['mode'].content evaluates to
   "playannouncement", and session.values['mode'].toString() also
   evaluates to "playannouncement".

   The MS sends an error response (see Section 4.2.2) if a <param> is
   not supported by the MS (e.g., the parameter type is not supported).
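
   The merge rules for session.values can be sketched as follows.
   This is a minimal illustration, assuming each <params> element has
   already been parsed into a list of (name, type, content) tuples in
   document order, with None standing for an omitted type attribute.
   The class and function names are hypothetical.

```python
class ParamValue:
    """Object with the two required properties, type and content."""

    def __init__(self, type_, content):
        self.type = type_
        self.content = content

    def __str__(self):
        # Mirrors toString(): returns the content property as a string.
        return self.content


def build_session_values(prepare_params=None, start_params=None):
    """Merge <params> from <dialogprepare> and <dialogstart>.

    Returns None (standing in for "undefined") if neither is given.
    """
    if prepare_params is None and start_params is None:
        return None
    values = {}
    # <dialogprepare> first, then <dialogstart>, so the latter wins on
    # conflict; within one list the last <param> of a given name wins.
    for params in (prepare_params, start_params):
        for name, type_, content in params or []:
            values[name] = ParamValue(type_ or "text/plain", content)
    return values
```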

9.3.  Terminating a VoiceXML Dialog

   When the MS receives a request with a <dialogterminate> element
   (Section 4.2.3), the MS throws a 'connection.disconnect.hangup' event
   into the specified VoiceXML dialog.  Note that if the immediate
   attribute has the value true, then the MS MUST NOT return <params>
   information when the VoiceXML dialog exits (even if the VoiceXML
   dialog provides such information) -- see Section 9.4.

   If the connection or conference associated with the VoiceXML dialog
   terminates, then the MS throws a 'connection.disconnect.hangup' event
   into the specified VoiceXML dialog.

9.4.  Exiting a VoiceXML Dialog

   The MS sends a <dialogexit> notification event when the VoiceXML
   dialog is complete, has been terminated, or exits due to an error.
   The <dialogexit> status attribute specifies the status of the
   VoiceXML dialog when it exits and its <params> child element
   specifies information, if any, returned from the VoiceXML dialog.

   A VoiceXML dialog exits when it processes a <disconnect> element, an
   <exit> element, or an implicit exit according to the VoiceXML form
   interpretation algorithm (FIA).  If the VoiceXML dialog executes a
   <disconnect> and then subsequently executes an <exit> with namelist
   information, the namelist information from the <exit> element is
   discarded.

   The MS reports namelist variables in the <params> element of the
   <dialogexit>.  Each <param> reports on a namelist variable.  The MS
   sets the <param> name attribute to the name of the VoiceXML variable.
   The MS sets the <param> type attribute according to the type of the
   VoiceXML variable.  The MS sets the <param> type to 'text/plain' when
   the VoiceXML variable is a simple ECMAScript value.  If the VoiceXML
   variable is a recording, the MS sets the <param> type to the MIME
   media type of the recording and encodes the recorded content as CDATA
   in the <param>.  If the VoiceXML variable is a complex ECMAScript
   value (e.g., object, array, etc.), the MS sets the <param> type to
   'application/json' and converts the variable value to its JSON value
   equivalent [RFC4627].  The behavior resulting from specifying an
   ECMAScript object with circular references is not defined.
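
   The type selection described above can be sketched in Python, with
   Python values standing in for ECMAScript values.  This is an
   assumption-laden illustration: the Recording tuple and the function
   name are hypothetical, the recording content is assumed to be
   already encoded as text, and number formatting follows Python
   rather than ECMAScript.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for a VoiceXML recording variable.
Recording = namedtuple("Recording", ["media_type", "encoded_content"])


def param_for_variable(name, value):
    """Return (name, type, content) for one namelist variable."""
    if isinstance(value, Recording):
        # Recordings carry their own media type.
        return (name, value.media_type, value.encoded_content)
    if isinstance(value, (dict, list)):
        # Complex values are converted to their JSON equivalent.
        return (name, "application/json", json.dumps(value))
    if isinstance(value, bool):
        # Booleans are rendered in ECMAScript spelling.
        return (name, "text/plain", "true" if value else "false")
    return (name, "text/plain", str(value))
```

   Because circular references are left undefined by the text, the
   sketch does nothing special for them and simply lets json.dumps
   raise its own error.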

   If the expr attribute is specified on the VoiceXML <exit> element
   instead of the namelist attribute, the MS creates a <param> element
   with the reserved name '__exit'.  If the value is an ECMAScript
   literal, the <param> type is 'text/plain' and the content is the
   literal value.  If the value is a variable, the <param> type and
   content are set in the same way as a namelist variable; for example,
   an expr attribute referencing a variable with a simple ECMAScript
   value has the type 'text/plain' and the content is set to the
   ECMAScript value.  To allow the AS to differentiate between a
   <dialogexit> notification event resulting from a VoiceXML
   <disconnect> and one resulting from an <exit>, the MS creates a
   <param> with the reserved name '__reason', the type 'text/plain', and
   a value of "disconnect" (without the quotation marks) to reflect the
   use of VoiceXML's <disconnect> element, or a value of "exit" (without
   the quotation marks) to reflect an explicit <exit> in the VoiceXML
   dialog.  If the VoiceXML session terminates for other reasons (such
   as encountering an error), this parameter MAY be omitted or take on
   platform-specific values prefixed with an underscore.

   Table 2 provides some examples of VoiceXML <exit> usage and the
   corresponding <params> element in the <dialogexit> notification
   event.  It assumes the following VoiceXML variable names and values:
   userAuthorized=true, pin=1234, and errors=0.  The <param> type
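   attributes ('text/plain') are omitted for clarity.

   +------------------------+------------------------------------------+
   | <exit> Usage           | <params> Result                          |
   +------------------------+------------------------------------------+
   | <exit>                 | <params> <param                          |
   |                        | name="__reason">exit</param> </params>   |
   +------------------------+------------------------------------------+
   | <exit expr="5">        | <params> <param                          |
   |                        | name="__reason">exit</param> <param      |
   |                        | name="__exit">5</param> </params>        |
   +------------------------+------------------------------------------+
   | <exit expr="'done'">   | <params> <param                          |
   |                        | name="__reason">exit</param> <param      |
   |                        | name="__exit">'done'</param> </params>   |
   +------------------------+------------------------------------------+
   | <exit                  | <params> <param                          |
   | expr="userAuthorized"> | name="__reason">exit</param> <param      |
   |                        | name="__exit">true</param> </params>     |
   +------------------------+------------------------------------------+
   | <exit namelist="pin    | <params> <param                          |
   | errors">               | name="__reason">exit</param> <param      |
   |                        | name="pin">1234</param> <param           |
   |                        | name="errors">0</param> </params>        |
   +------------------------+------------------------------------------+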

                 Table 2: VoiceXML <exit> Mapping Examples
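
   The rows of Table 2 can be reproduced with a short Python sketch.
   It is a simplification, not an ECMAScript evaluator: an expr value
   that matches a known variable name is treated as a variable
   reference, and anything else is treated as a literal and copied as
   written.  The function names are hypothetical, and <param> types
   are omitted as in the table.

```python
def to_text(value):
    """Render a simple value the way the table shows it."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def params_for_exit(variables, expr=None, namelist=None):
    """Return the (name, content) pairs reported for a VoiceXML <exit>."""
    params = [("__reason", "exit")]
    if expr is not None:
        if expr in variables:
            # expr references a variable: report its value.
            params.append(("__exit", to_text(variables[expr])))
        else:
            # Otherwise treat expr as a literal and copy it as written.
            params.append(("__exit", expr))
    elif namelist:
        # namelist is a space-separated list of variable names.
        for name in namelist.split():
            params.append((name, to_text(variables[name])))
    return params


variables = {"userAuthorized": True, "pin": 1234, "errors": 0}
for usage in ({}, {"expr": "5"}, {"expr": "userAuthorized"},
              {"namelist": "pin errors"}):
    print(usage, params_for_exit(variables, **usage))
```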

9.5.  Call Transfer

   While VoiceXML is at its core a dialog language, it also provides
   optional call transfer capability.  It is NOT RECOMMENDED to use
   VoiceXML's call transfer capability in networks involving application
   servers.  Rather, the AS itself can provide call routing
   functionality by taking signaling actions based on the data returned
   to it, either through VoiceXML's own data submission mechanisms or
   through the mechanism described in Section 9.4.  If the MS encounters
   a VoiceXML dialog requesting call transfer capability, the MS SHOULD
   raise an error event in the VoiceXML dialog execution context: an
   error.unsupported.transfer.blind event if blind transfer is
   requested, error.unsupported.transfer.bridge if bridge transfer is
   requested, or error.unsupported.transfer.consultation if consultation
   transfer is requested.
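
   The mapping from requested transfer type to error event can be
   written as a simple lookup.  This is a sketch only; the function
   name and the use of short strings for the transfer type are
   assumptions made for illustration.

```python
TRANSFER_ERROR_EVENTS = {
    "blind": "error.unsupported.transfer.blind",
    "bridge": "error.unsupported.transfer.bridge",
    "consultation": "error.unsupported.transfer.consultation",
}


def transfer_error_event(transfer_type):
    """Return the error event to raise for a requested transfer type."""
    try:
        return TRANSFER_ERROR_EVENTS[transfer_type]
    except KeyError:
        raise ValueError(f"unknown transfer type: {transfer_type!r}")
```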

10.  Contributors

   Asher Shiratzky provided valuable support and contributions to the
   early versions of this document.

   The authors would like to thank the IVR design team consisting of
   Roni Even, Lorenzo Miniero, Adnan Saleem, Diego Besprosvan, Mary
   Barnes, and Steve Buko, who provided valuable feedback, input, and
   text to this document.

11.  Acknowledgments

   The authors would like to thank Adnan Saleem, Gene Shtirmer, Dave
   Burke, Dan York, Steve Buko, Jean-Francois Bertrand, Henry Lum, and
   Lorenzo Miniero for expert reviews of this work.

   Ben Campbell carried out the RAI expert review on this specification
   and provided a great deal of invaluable input.  Donald Eastlake
   carried out a thorough security review.

12.  References

12.1.  Normative References

   [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
              Extensions (MIME) Part One: Format of Internet Message
              Bodies", RFC 2045, November 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC2818]  Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

   [RFC3023]  Murata, M., St. Laurent, S., and D. Kohn, "XML Media
              Types", RFC 3023, January 2001.

   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
              January 2004.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
              Registration Procedures", BCP 13, RFC 4288, December 2005.

   [RFC4574]  Levin, O. and G. Camarillo, "The Session Description
              Protocol (SDP) Label Attribute", RFC 4574, August 2006.

   [RFC4627]  Crockford, D., "The application/json Media Type for
              JavaScript Object Notation (JSON)", RFC 4627, July 2006.

   [RFC4647]  Phillips, A. and M. Davis, "Matching of Language Tags",
              BCP 47, RFC 4647, September 2006.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
              Languages", BCP 47, RFC 5646, September 2009.

   [RFC6230]  Boulton, C., Melanchuk, T., and S. McGlashan, "Media
              Control Channel Framework", RFC 6230, May 2011.

   [SRGS]     Hunt, A. and S. McGlashan, "Speech Recognition Grammar
              Specification Version 1.0", W3C Recommendation,
              March 2004.

   [VXML20]   McGlashan, S., Burnett, D., Carter, J., Danielsen, P.,
              Ferrans, J., Hunt, A., Lucas, B., Porter, B., Rehor, K.,
              and S. Tryphonas, "Voice Extensible Markup Language
              (VoiceXML) Version 2.0", W3C Recommendation, March 2004.

   [VXML21]   Oshry, M., Auburn, RJ., Baggia, P., Bodell, M., Burke, D.,
              Burnett, D., Candell, E., Carter, J., McGlashan, S., Lee,
              A., Porter, B., and K. Rehor, "Voice Extensible Markup
              Language (VoiceXML) Version 2.1", W3C Recommendation,
              June 2007.

              Jansen, J., Layaida, N., Michel, T., Grassel, G.,
              Koivisto, A., Bulterman, D., Mullender, S., and D. Zucker,
              "Synchronized Multimedia Integration Language (SMIL 2.1)",
              World Wide Web Consortium Recommendation
              REC-SMIL2-20051213, December 2005.

   [XML]      Bray, T., Paoli, J., Sperberg-McQueen, C M., Maler, E.,
              and F. Yergeau, "Extensible Markup Language (XML) 1.0
              (Third Edition)", W3C Recommendation, February 2004.

              Biron, P. and A. Malhotra, "XML Schema Part 2: Datatypes
              Second Edition", W3C Recommendation, October 2004.

12.2.  Informative References

   [CCXML10]  Auburn, R J., "Voice Browser Call Control: CCXML Version
              1.0", W3C Candidate Recommendation (work in progress),
              April 2010.

   [H.248.9]  "Gateway control protocol: Advanced media server
              packages", ITU-T Recommendation H.248.9.

   [IANA]     IANA, "RTP Payload Types".

              IANA, "MIME Media Types".

              McGlashan, S., Melanchuk, T., and C. Boulton, "A Mixer
              Control Package for the Media Control Channel Framework",
              Work in Progress, January 2011.

   [RFC2897]  Cromwell, D., "Proposal for an MGCP Advanced Audio
              Package", RFC 2897, August 2000.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC4240]  Burger, E., Van Dyke, J., and A. Spitzer, "Basic Network
              Media Services with SIP", RFC 4240, December 2005.

   [RFC4244]  Barnes, M., "An Extension to the Session Initiation
              Protocol (SIP) for Request History Information", RFC 4244,
              November 2005.

   [RFC4267]  Froumentin, M., "The W3C Speech Interface Framework Media
              Types: application/voicexml+xml, application/ssml+xml,
              application/srgs, application/srgs+xml, application/
              ccxml+xml, and application/pls+xml", RFC 4267,
              November 2005.

   [RFC4281]  Gellens, R., Singer, D., and P. Frojdh, "The Codecs
              Parameter for "Bucket" Media Types", RFC 4281,
              November 2005.

   [RFC4730]  Burger, E. and M. Dolly, "A Session Initiation Protocol
              (SIP) Event Package for Key Press Stimulus (KPML)",
              RFC 4730, November 2006.

   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
              December 2006.

   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
              Formats", RFC 4855, February 2007.

   [RFC5022]  Van Dyke, J., Burger, E., and A. Spitzer, "Media Server
              Control Markup Language (MSCML) and Protocol", RFC 5022,
              September 2007.

   [RFC5167]  Dolly, M. and R. Even, "Media Server Control Protocol
              Requirements", RFC 5167, March 2008.

   [RFC5707]  Saleem, A., Xin, Y., and G. Sharratt, "Media Server Markup
              Language (MSML)", RFC 5707, February 2010.

   [VXML30]   McGlashan, S., Burnett, D., Akolkar, R., Auburn, RJ.,
              Baggia, P., Barnett, J., Bodell, M., Carter, J., Oshry,
              M., Rehor, K., Young, M., and R. Hosn, "Voice Extensible
              Markup Language (VoiceXML) Version 3.0", W3C Working
              Draft, August 2010.

              Novo, O., Camarillo, G., Morgan, D., and J. Urpalainen,
              "Conference Information Data Model for Centralized
              Conferencing (XCON)", Work in Progress, April 2011.

Authors' Addresses

   Scott McGlashan


   Tim Melanchuk


   Chris Boulton