
RFC 6231

An Interactive Voice Response (IVR) Control Package for the Media Control Channel Framework

Pages: 134
Proposed Standard
Updated by:  6623
Part 3 of 6 – Pages 34 to 61

4.3. IVR Dialog Elements

   This section describes the IVR dialog language defined as part of
   this specification.  The MS MUST support this dialog language.

   The <dialog> element is an execution container for operations of
   playing prompts (Section 4.3.1.1), runtime controls
   (Section 4.3.1.2), collecting DTMF (Section 4.3.1.3), and recording
   user input (Section 4.3.1.4).  Results of the dialog execution
   (Section 4.3.2) are reported in a dialogexit notification event.

   Using these elements, three common dialog models are supported:

   playannouncements:  only a <prompt> element is specified in the
      container.  The prompt media resources are played in sequence.

   promptandcollect:  a <collect> element is specified and, optionally,
      a <prompt> element.  If a <prompt> element is specified and
      bargein is enabled, playing of the prompt is terminated when
      bargein occurs, and DTMF collection is initiated; otherwise, the
      prompt is played to completion before DTMF collection is
      initiated.  If no prompt element is specified, DTMF collection is
      initiated immediately.

   promptandrecord:  a <record> element is specified and, optionally, a
      <prompt> element.  If a <prompt> element is specified and bargein
      is enabled, playing of the prompt is terminated when bargein
      occurs, and recording is initiated; otherwise, the prompt is
      played to completion before recording is initiated.  If no prompt
      element is specified, recording is initiated immediately.

   In addition, this dialog language supports runtime ('VCR') controls
   enabling a user to control prompt playback using DTMF.
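
   As an illustration, a promptandcollect request could be sketched as
   follows.  The connection identifier, the media URI, and the explicit
   timeout values are assumptions made for this example (the timeouts
   simply repeat the defaults given in Section 4.3.1.3):

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="c1">
       <dialog>
         <prompt bargein="true">
           <media type="audio/x-wav"
                  loc="http://www.example.com/media/enterpin.wav"/>
         </prompt>
         <collect timeout="5s" interdigittimeout="2s"/>
       </dialog>
     </dialogstart>
   </mscivr>

   Omitting the <collect> element would turn this sketch into a
   playannouncements dialog.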

   Each of the core elements -- <prompt>, <control>, <collect>, and
   <record> -- is specified so that its execution and reporting are
   largely self-contained.  This facilitates their reuse in other dialog
   container elements.  Note that DTMF and bargein behavior affects
   multiple elements and is addressed in the relevant element
   definitions.

   Execution results are reported in the <dialogexit> notification event
   with child elements defined in Section 4.3.2.  If the dialog
   terminated normally (i.e., not due to an error or to a
   <dialogterminate> request), then the MS MUST report the results for
   the operations specified in the dialog:

   <prompt>:  <promptinfo> (see Section 4.3.2.1) with at least the
      termmode attribute specified.

   <control>:  <controlinfo> (see Section 4.3.2.2) if any runtime
      controls are matched.

   <collect>:  <collectinfo> (see Section 4.3.2.3) with the dtmf and
      termmode attributes specified.

   <record>:  <recordinfo> (see Section 4.3.2.4) with at least the
      termmode attribute and one <mediainfo> element specified.
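
   For illustration only, the results of a promptandcollect dialog that
   terminated normally after bargein might be reported along the
   following lines.  The <event> wrapper and its dialogid attribute are
   assumed from the notification syntax of Section 4.2.5, and the
   identifier and digit values are hypothetical:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <event dialogid="d1">
       <dialogexit status="1">
         <promptinfo termmode="bargein"/>
         <collectinfo dtmf="1234" termmode="match"/>
       </dialogexit>
     </event>
   </mscivr>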

   The media format requirements for IVR dialogs are undefined.  This
   package is agnostic to the media types and codecs for media resources
   and recording that need to be supported by an implementation.  For
   example, an MS implementation might only support audio and in
   particular the 'audio/basic' codec for media playback and recording.
   However, when executing a dialog, if an MS encounters a media type or
   codec that it cannot process, the MS MUST stop further processing and
   report the error using the dialogexit notification.

4.3.1. <dialog>

   An IVR dialog to play prompts to the user, allow runtime controls,
   collect DTMF, or record input.  The dialog is specified using a
   <dialog> element.

   A <dialog> element has the following attributes:

   repeatCount:  number of times the dialog is to be executed.  A valid
      value is a non-negative integer (see Section 4.6.4).  A value of 0
      indicates that the dialog is repeated until halted by other means.
      The attribute is optional.  The default value is 1.

   repeatDur:  maximum duration for dialog execution.  A valid value is
      a time designation (see Section 4.6.7).  If no value is specified,
      then there is no limit on the duration of the dialog.  The
      attribute is optional.  There is no default value.

   repeatUntilComplete:  indicates whether the MS terminates dialog
      execution when an input operation is completed successfully.  A
      valid value is a boolean (see Section 4.6.1).  A value of true
      indicates that dialog execution is terminated when an input
      operation associated with its child elements is completed
      successfully (see execution model below for precise conditions).
      A value of false indicates that dialog execution is terminated by
      other means.  The attribute is optional.  The default value is
      false.

   The repeatDur attribute takes priority over the repeatCount attribute
   in determining maximum duration of the dialog.  See 'repeatCount' and
   'repeatDur' in the Synchronized Multimedia Integration Language
   (SMIL) [W3C.REC-SMIL2-20051213] for further information.  In the
   situation where a dialog is repeated more than once, only the results
   of operations in the last dialog iteration are reported.

   The <dialog> element has the following sequence of child elements (at
   least one, any order):

   <prompt>:  defines media resources to play in sequence (see
      Section 4.3.1.1).  The element is optional.

   <control>:  defines how DTMF is used for runtime controls (see
      Section 4.3.1.2).  The element is optional.

   <collect>:  defines how DTMF is collected (see Section 4.3.1.3).  The
      element is optional.

   <record>:  defines how recording takes place (see Section 4.3.1.4).
      The element is optional.

   Although the behavior when both <collect> and <record> elements are
   specified in a request is not defined in this Control Package, the MS
   MAY support this configuration.  If the MS does not support this
   configuration, the MS sends a <response> with a 433 status code.

   The MS has the following execution model for the IVR dialog after
   initialization (initialization errors are reported by the MS in the
   response):

   1.  If an error occurs during execution, then the MS terminates the
       dialog and reports the error in the <dialogexit> event by setting
       the status attribute (see Section 4.3.2).  Details about the
       error are specified in the reason attribute.

   2.  The MS initializes a counter to 0.

   3.  The MS starts a duration timer for the value of the repeatDur
       attribute.  If the timer expires before the dialog is complete,
       then the MS terminates the dialog and sends a dialogexit whose
       status attribute is set to 3 (see Section 4.2.5.1).  The MS MAY
       report information in the dialogexit gathered in the last
       execution cycle (if any).

   4.  The MS initiates a dialog execution cycle.  Each cycle executes
       the operations associated with the child elements of the dialog.
       If a <prompt> element is specified, then execute the element's
       prompt playing operation and activate any controls (if the
       <control> element is specified).  If no <prompt> is specified or
       when a specified <prompt> terminates, then start the collect
       operation or the record operation if the <collect> or <record>
       elements, respectively, are specified.  If subscriptions are
       specified for the dialog, then the MS sends a notification event
       when the specified event occurs.  If execution of a child element
       results in an error, the MS terminates dialog execution (and
       stops other child element operations) and the MS sends a
       dialogexit status event, reporting any information gathered.

   5.  If the dialog execution cycle completes successfully, then the MS
       increments the counter by one.  The MS terminates dialog
       execution if either of the following conditions is true:

       *  the value of the repeatCount attribute is greater than zero,
          and the counter is equal to the value of the repeatCount
          attribute.

       *  the value of the repeatUntilComplete attribute is true and one
          of the following conditions is true:

          +  <collect> reports termination status of 'match' or
             'stopped'.

          +  <record> reports termination status of 'stopped', 'dtmf',
             'maxtime', or 'finalsilence'.

       When the MS terminates dialog execution, it sends a dialogexit
       (with a status of 1) reporting operation information collected in
       the last dialog execution cycle only.  Otherwise, another dialog
       execution cycle is initiated.
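
   As a sketch of this model (the attribute values and the media URI
   are illustrative assumptions), the following request runs at most
   three execution cycles:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="c1">
       <dialog repeatCount="3" repeatDur="60s"
               repeatUntilComplete="true">
         <prompt>
           <media type="audio/x-wav"
                  loc="http://www.example.com/media/enterpin.wav"/>
         </prompt>
         <collect/>
       </dialog>
     </dialogstart>
   </mscivr>

   Under the rules above, execution ends after the first cycle in which
   <collect> reports 'match' or 'stopped', or after the third cycle if
   none does; in both cases, the dialogexit has a status of 1 and
   carries the results of the last cycle only.  If 60 seconds elapse
   first, the dialog is terminated with a status of 3.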

4.3.1.1. <prompt>

   The <prompt> element specifies a sequence of media resources to play
   back in document order.

   A <prompt> element has the following attributes:

   xml:base:  A string declaring the base URI from which relative URIs
      in child elements are resolved prior to fetching.  A valid value
      is a URI (see Section 4.6.9).  The attribute is optional.  There
      is no default value.

   bargein:  Indicates whether user input stops prompt playback unless
      the input is associated with a specified runtime <control>
      operation (input matching control operations never interrupts
      prompt playback).  A valid value is a boolean (see Section 4.6.1).
      A value of true indicates that bargein is permitted and prompt
      playback is stopped.  A value of false indicates that bargein is
      not permitted: user input does not terminate prompt playback.  The
      attribute is optional.  The default value is true.

   The <prompt> element has the following child elements (at least one,
   any order, multiple occurrences of elements permitted):

   <media>:  specifies a media resource (see Section 4.3.1.5) to play.
      The element is optional.

   <variable>:  specifies a variable media announcement (see
      Section 4.3.1.1.1) to play.  The element is optional.

   <dtmf>:  generates one or more DTMF tones (see Section 4.3.1.1.2) to
      play.  The element is optional.

   <par>:  specifies media resources to play in parallel (see
      Section 4.3.1.1.3).  The element is optional.

   If the MS does not support the configuration required for prompt
   playback to the output media streams and a more specific error code
   is not defined for its child elements, the MS sends a <response> with
   a 429 status code (Section 4.5).  The MS MAY support transcoding
   between the media resource format and the output stream format.

   The MS has the following execution model for prompt playing after
   initialization:

   1.  The MS initiates prompt playback playing its child elements
       (<media>, <variable>, <dtmf>, and <par>) one after another in
       document order.

   2.  If any error (including fetching and rendering errors) occurs
       during prompt execution, then the MS terminates playback and
       reports its error status to the dialog container (see
       Section 4.3) with a <promptinfo> (see Section 4.3.2.1) where the
       termmode attribute is set to stopped and any additional
       information is set.

   3.  If DTMF input is received and the value of the bargein attribute
       is true, then the MS terminates prompt playback and reports its
       execution status to the dialog container (see Section 4.3) with a
       <promptinfo> (see Section 4.3.2.1) where the termmode attribute
       is set to bargein and any additional information is set.

   4.  If prompt playback is stopped by the dialog container, then the
       MS reports its execution status to the dialog container (see
       Section 4.3) with a <promptinfo> (see Section 4.3.2.1) where the
       termmode attribute is set to stopped and any additional
       information is set.

   5.  If prompt playback completes successfully, then the MS reports
       its execution status to the dialog container (see Section 4.3)
       with a <promptinfo> (see Section 4.3.2.1) where the termmode
       attribute is set to completed and any additional information is
       set.
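
   A minimal sketch of a prompt that cannot be interrupted and that
   resolves relative media URIs against a base URI follows; the host
   and file names are hypothetical:

   <prompt xml:base="http://www.example.com/media/" bargein="false">
     <media type="audio/x-wav" loc="welcome.wav"/>
     <media type="audio/x-wav" loc="menu.wav"/>
   </prompt>

   Here, the two resources would be fetched from
   http://www.example.com/media/welcome.wav and
   http://www.example.com/media/menu.wav and played one after the other.
   If both play to completion, the <promptinfo> termmode is completed.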

4.3.1.1.1. <variable>

   The <variable> element specifies variable announcements using
   predefined media resources.  Each variable has at least a type (e.g.,
   date) and a value (e.g., 2008-02-25).  The value is rendered
   according to the prompt variable type (e.g., 2008-02-25 is rendered
   as the date 25th February 2008).  The precise mechanism for
   generating variable announcements (including the location of
   associated media resources) is implementation specific.

   A <variable> element has the following attributes:

   type:  specifies the type of prompt variable to render.  This
      specification defines three values -- date
      (Section 4.3.1.1.1.1), time (Section 4.3.1.1.1.2), and digits
      (Section 4.3.1.1.1.3).  All other valid but undefined values are
      reserved for future use,
      where new values are assigned as described in Section 8.5.  A
      valid value is a string (see Section 4.6.6).  The attribute is
      mandatory.

   value:  specifies a string to be rendered according to the prompt
      variable type.  A valid value is a string (see Section 4.6.6).
      The attribute is mandatory.

   format:  specifies format information that the prompt variable type
      uses to render the value attribute.  A valid value is a string
      (see Section 4.6.6).  The attribute is optional.  There is no
      default value.

   gender:  specifies the gender that the prompt variable type uses to
      render the value attribute.  Valid values are "male" or "female".
      The attribute is optional.  There is no default value.

   xml:lang:  specifies the language that the prompt variable type uses
      to render the value attribute.  A valid value is a language
      identifier (see Section 4.6.11).  The attribute is optional.
      There is no default value.

   The <variable> element has no children.

   This specification is agnostic to the type and codec of media
   resources into which variables are rendered as well as the rendering
   mechanism itself.  For example, an MS implementation supporting audio
   rendering could map the <variable> into one or more audio media
   resources.

   This package is agnostic to which <variable> types are supported by
   an implementation.  If a <variable> element configuration specified
   in a request is not supported by the MS, the MS sends a <response>
   with a 425 status code (Section 4.5).

4.3.1.1.1.1.  Date Type

   The date variable type provides a mechanism for dynamically rendering
   a date prompt.

   The <variable> type attribute MUST have the value "date".

   The <variable> format attribute MUST be one of the following values
   and comply with its rendering of the value attribute:

   mdy  indicating that the <variable> value attribute is to be rendered
        as sequence composed of month, then day, then year.

   ymd  indicating that the <variable> value attribute is to be rendered
        as sequence composed of year, then month, then day.

   dym  indicating that the <variable> value attribute is to be rendered
        as sequence composed of day, then year, then month.

   dm   indicating that the <variable> value attribute is to be rendered
        as sequence composed of day then month.

   The <variable> value attribute MUST comply with a lexical
   representation of date where

   yyyy '-' mm '-' dd

   as defined in Section 3.2.9 of [XMLSchema:Part2].

   For example,

     <variable type="date" format="dmy" value="2010-11-25"
     xml:lang="en" gender="male"/>

   describes a variable date prompt where the date can be rendered in
   audio as "twenty-fifth of November two thousand and ten" using a list
   of <media> resources:

   <media loc="nfs://voicebase/en/male/25th.wav"/>
   <media loc="nfs://voicebase/en/male/of.wav"/>
   <media loc="nfs://voicebase/en/male/november.wav"/>
   <media loc="nfs://voicebase/en/male/2000.wav"/>
   <media loc="nfs://voicebase/en/male/and.wav"/>
   <media loc="nfs://voicebase/en/male/10.wav"/>

4.3.1.1.1.2.  Time Type

   The time variable type provides a mechanism for dynamically rendering
   a time prompt.

   The <variable> type attribute MUST have the value "time".

   The <variable> format attribute MUST be one of the following values
   and comply with its rendering of the value attribute:

   t12  indicating that the <variable> value attribute is to be rendered
        as a time in traditional 12-hour format using am or pm (for
        example, "twenty-five minutes past 2 pm" for "14:25").

   t24  indicating that the <variable> value attribute is to be rendered
        as a time in 24-hour format (for example, "fourteen twenty-five"
        for "14:25").

   The <variable> value attribute MUST comply with a lexical
   representation of time where

   hh ':' mm ( ':' ss )?

   as defined in Section 3.2.8 of [XMLSchema:Part2].
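
   For example, assuming an MS that supports the time type with English
   audio resources,

     <variable type="time" format="t12" value="14:25" xml:lang="en"/>

   describes a variable time prompt that could be rendered as
   "twenty-five minutes past 2 pm".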

4.3.1.1.1.3.  Digits Type

   The digits variable type provides a mechanism for dynamically
   rendering a digit sequence.

   The <variable> type attribute MUST have the value "digits".

   The <variable> format attribute MUST be one of the following values
   and comply with its rendering of the value attribute:

   gen  indicating that the <variable> value attribute is to be rendered
        as a general digit string (for example, "one two three" for
        "123").

   crn  indicating that the <variable> value attribute is to be rendered
        as a cardinal number (for example, "one hundred and twenty-
        three" for "123").

   ord  indicating that the <variable> value attribute is to be rendered
        as an ordinal number (for example, "one hundred and twenty-
        third" for "123").

   The <variable> value attribute MUST comply with the lexical
   representation

      d+

   i.e., one or more digits.
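
   For example, assuming an MS that supports the digits type,

     <variable type="digits" format="gen" value="123"/>

   could be rendered as "one two three", whereas the same value with
   format="ord" could be rendered as "one hundred and twenty-third".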

4.3.1.1.2. <dtmf>

   The <dtmf> element specifies a sequence of DTMF tones for output.

   DTMF tones could be generated using <media> resources where the
   output is transported as RTP audio packets.  However, <media>
   resources are not sufficient for cases where DTMF tones are to be
   transported as DTMF RTP [RFC4733] or in event packages.

   A <dtmf> element has the following attributes:

   digits:  specifies the DTMF sequence to output.  A valid value is a
      DTMF string (see Section 4.6.3).  The attribute is mandatory.

   level:  used to define the power level for which the DTMF tones will
      be generated.  Values are expressed in dBm0.  A valid value is an
      integer in the range of 0 to -96 (dBm0).  Larger negative values
      express lower power levels.  Note that values lower than -55 dBm0
      will be rejected by most receivers (TR-TSY-000181, ITU-T Q.24A).
      The attribute is optional.  The default value is -6 (dBm0).

   duration:  specifies the duration for which each DTMF tone is
      generated.  A valid value is a time designation (see
      Section 4.6.7).  The MS MAY round the value if it only supports
      discrete durations.  The attribute is optional.  The default value
      is 100 ms.

   interval:  specifies the duration of a silence interval following
      each generated DTMF tone.  A valid value is a time designation
      (see Section 4.6.7).  The MS MAY round the value if it only
      supports discrete durations.  The attribute is optional.  The
      default value is 100 ms.

   The <dtmf> element has no children.

   If a <dtmf> element configuration is not supported, the MS sends a
   <response> with a 426 status code (Section 4.5).
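
   As a sketch, a prompt that plays an announcement and then outputs the
   DTMF sequence "1234#" at a reduced power level might look as follows.
   The media URI and the attribute values are illustrative assumptions;
   the time values assume the designation syntax of Section 4.6.7:

   <prompt>
     <media type="audio/x-wav"
            loc="http://www.example.com/media/connecting.wav"/>
     <dtmf digits="1234#" level="-10" duration="150ms" interval="50ms"/>
   </prompt>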

4.3.1.1.3. <par>

   The <par> element allows media resources to be played in parallel.
   Each of its child elements specifies a media resource (or a sequence
   of media resources using the <seq> element).  When playback of the
   <par> element is initiated, the MS begins playback of all its child
   elements at the same time.  This element is modeled after the <par>
   element in SMIL [W3C.REC-SMIL2-20051213].

   The <par> element has the following attributes:

   endsync:  indicates when playback of the element is complete.  Valid
      values are "first" (indicates that the element is complete when
      any child element reports that it is complete) and "last"
      (indicates it is complete when all child elements are complete).
      The attribute is optional.  The default value is "last".

      If the value is "first", then playback of other child elements is
      stopped when one child element reports it is complete.

   The <par> element has the following child elements (at least one, any
   order, multiple occurrences of each element permitted):

   <seq>:  specifies a sequence of media resources to play in parallel
      with other <par> child elements (see Section 4.3.1.1.3.1).  The
      element is optional.

   <media>:  specifies a media resource (see Section 4.3.1.5) to play.
      The MS is responsible for assigning the appropriate media
      stream(s) when more than one is available.  The element is
      optional.

   <variable>:  specifies a variable media announcement (see
      Section 4.3.1.1.1) to play.  The element is optional.

   <dtmf>:  generates one or more DTMF tones (see Section 4.3.1.1.2) to
      play.  The element is optional.

   It is RECOMMENDED that a <par> element contains only one <media>
   element of the same media type (i.e., same type-name as defined in
   Section 4.6.10).  If a <par> element configuration is not supported,
   the MS sends a <response> with a 435 status code (Section 4.5).

   Runtime <control>s (Section 4.3.1.2) apply to each child element
   playing in parallel.  For example, pause and resume controls cause
   all child elements to be paused and resumed, respectively.

   If the <par> element is stopped by the prompt container (e.g.,
   bargein or dialog termination), then playback of all child elements
   is stopped.  The playback duration (Section 4.3.2.1) reported for the
   <par> element is the duration of parallel playback, not the
   cumulative duration of each child element played in parallel.

   For example, a request to playback audio and video media in parallel:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
   <dialogstart connectionid="c1">
     <dialog>
      <prompt>
       <par>
        <media type="audio/x-wav"
               loc="http://www.example.com/media/comments.wav"/>
        <media type="video/3gpp;codecs='s263'"
               loc="http://www.example.com/media/camera.3gp"/>
       </par>
      </prompt>
     </dialog>
    </dialogstart>
   </mscivr>

   When the <prompt> element is executed, it begins playback of its
   child element in document-order sequence.  In this case, there is
   only one child element, a <par> element itself containing audio and
   video <media> child elements.  Consequently, playback of both audio
   and video media resources is initiated at the same time.  Since the
   endsync attribute is not specified, the default value "last" applies.
   The <par> element playback is complete when the media resource with
   the longest duration is complete.

4.3.1.1.3.1.  <seq>

   The <seq> element specifies media resources to be played back in
   sequence.  This allows a sequence of media resources to be played at
   the same time as other children of a <par> element are played in
   parallel, for example, a sequence of audio resources while a video
   resource is played in parallel.  This element is modeled after the
   <seq> element in SMIL [W3C.REC-SMIL2-20051213].

   The <seq> element has no attributes.

   The <seq> element has the following child elements (at least one, any
   order, multiple occurrences of each element permitted):

   <media>:  specifies a media resource (see Section 4.3.1.5) to play.
      The element is optional.

   <variable>:  specifies a variable media announcement (see
      Section 4.3.1.1.1) to play.  The element is optional.

   <dtmf>:  generates one or more DTMF tones (see Section 4.3.1.1.2) to
      play.  The element is optional.

   Playback of a <seq> element is complete when all child elements in
   the sequence are complete.  If the <seq> element is stopped by the
   <par> container, then playback of the current child element is
   stopped (remaining child elements in the sequence are not played).

   For example, a request to play a sequence of audio resources in
   parallel with a video media:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
   <dialogstart connectionid="c1">
     <dialog>
      <prompt>
       <par endsync="first">
        <seq>
          <media type="audio/x-wav"
               loc="http://www.example.com/media/date.wav"/>
          <media type="audio/x-wav"
               loc="http://www.example.com/media/intro.wav"/>
          <media type="audio/x-wav"
               loc="http://www.example.com/media/main.wav"/>
          <media type="audio/x-wav"
               loc="http://www.example.com/media/end.wav"/>
        </seq>
        <media type="video/3gpp;codecs='s263'"
               loc="rtsp://www.example.com/media/camera.3gp"/>
       </par>
      </prompt>
     </dialog>
    </dialogstart>
   </mscivr>

   When the <prompt> element is executed, it begins playback of the
   <par> element containing a <seq> element and a video <media> element.
   The <seq> element itself contains a sequence of audio <media>
   elements.  Consequently, playback of the video media resource is
   initiated at the same time as playback of the sequence of the audio
   media resources is initiated.  Each audio resource is played back
   after the previous one completes.  Since the endsync attribute is set
   to "first", the <par> element playback is complete when either all
   the audio resources in <seq> have been played to completion or the
   video <media> is complete, whichever occurs first.

4.3.1.2. <control>

   The <control> element defines how DTMF input is mapped to runtime
   controls, including prompt playback controls.

   DTMF input matching these controls MUST NOT cause prompt playback to
   be interrupted (i.e., no prompt bargein), but causes the appropriate
   operation to be applied, for example, speeding up prompt playback.

   DTMF input matching these controls has priority over <collect> input
   for the duration of prompt playback.  If an incoming DTMF character
   matches a specified runtime control, then the DTMF character is
   consumed: it is not added to the digit buffer and so is not available
   to the <collect> operation.  Once prompt playback is complete,
   runtime controls are no longer active.

   The <control> element has the following attributes:

   gotostartkey:  maps a DTMF key to skip directly to the start of the
      prompt.  A valid value is a DTMF character (see Section 4.6.2).
      The attribute is optional.  There is no default value.

   gotoendkey:  maps a DTMF key to skip directly to the end of the
      prompt.  A valid value is a DTMF character (see Section 4.6.2).
      The attribute is optional.  There is no default value.

   skipinterval:  indicates how far an MS skips backwards or forwards
      through prompt playback when the rewind key (rwkey) or fast
      forward key (ffkey) is pressed.  A valid value is a Time
      Designation (see Section 4.6.7).  The attribute is optional.  The
      default value is 6s.

   ffkey:  maps a DTMF key to a fast forward operation equal to the
      value of 'skipinterval'.  A valid value is a DTMF character (see
      Section 4.6.2).  The attribute is optional.  There is no default
      value.

   rwkey:  maps a DTMF key to a rewind operation equal to the value of
      'skipinterval'.  A valid value is a DTMF character (see
      Section 4.6.2).  The attribute is optional.  There is no default
      value.

   pauseinterval:  indicates how long an MS pauses prompt playback when
      the pausekey is pressed.  A valid value is a Time Designation (see
      Section 4.6.7).  The attribute is optional.  The default value is
      10s.

   pausekey:  maps a DTMF key to a pause operation equal to the value of
      'pauseinterval'.  A valid value is a DTMF character (see
      Section 4.6.2).  The attribute is optional.  There is no default
      value.

   resumekey:  maps a DTMF key to a resume operation.  A valid value is
      a DTMF character (see Section 4.6.2).  The attribute is optional.
      There is no default value.

   volumeinterval:  indicates the increase or decrease in playback
      volume (relative to the current volume) when the volupkey or
      voldnkey is pressed.  A valid value is a percentage (see
      Section 4.6.8).  The attribute is optional.  The default value is
      10%.

   volupkey:  maps a DTMF key to a volume increase operation equal to
      the value of 'volumeinterval'.  A valid value is a DTMF character
      (see Section 4.6.2).  The attribute is optional.  There is no
      default value.

   voldnkey:  maps a DTMF key to a volume decrease operation equal to
      the value of 'volumeinterval'.  A valid value is a DTMF character
      (see Section 4.6.2).  The attribute is optional.  There is no
      default value.

   speedinterval:  indicates the increase or decrease in playback speed
      (relative to the current speed) when the speedupkey or speeddnkey
      is pressed.  A valid value is a percentage (see Section 4.6.8).
      The attribute is optional.  The default value is 10%.

   speedupkey:  maps a DTMF key to a speed increase operation equal to
      the value of the speedinterval attribute.  A valid value is a DTMF
      character (see Section 4.6.2).  The attribute is optional.  There
      is no default value.

   speeddnkey:  maps a DTMF key to a speed decrease operation equal to
      the value of the speedinterval attribute.  A valid value is a DTMF
      character (see Section 4.6.2).  The attribute is optional.  There
      is no default value.

   external:  allows one or more DTMF keys to be declared as external
      controls (for example, video camera controls); the MS can send
      notifications when a matching key is activated using <dtmfnotify>
      (Section 4.2.5.2).  A valid value is a DTMF string (see
      Section 4.6.3).  The attribute is optional.  There is no default
      value.

   If the same DTMF is specified in more than one DTMF key control
   attribute -- except the pausekey and resumekey attributes -- the MS
   sends a <response> with a 413 status code (Section 4.5).

   The MS has the following execution model for runtime control after
   initialization:

   1.  If an error occurs during execution, then the MS terminates
       runtime control and the error is reported to the dialog
       container.  The MS MAY report controls executed successfully
       before the error in <controlinfo> (see Section 4.3.2.2).

   2.  Runtime controls are active only during prompt playback (if no
       <prompt> element is specified, then runtime controls are
       ignored).  If DTMF input matches any specified keys (for example,
       the ffkey), then the MS applies the appropriate operation
       immediately.  If a seek operation (ffkey, rwkey) attempts to go
       beyond the beginning or end of the prompt queue, then the MS
       automatically truncates it to the prompt queue beginning or end,
       respectively.  If a volume operation (voldnkey, volupkey)
       attempts to go beyond the minimum or maximum volume supported by
       the platform, then the MS automatically limits the operation to
       minimum or maximum supported volume, respectively.  If a speed
       operation (speeddnkey, speedupkey) attempts to go beyond the
       minimum or maximum playback speed supported by the platform, then
       the MS automatically limits the operation to minimum or maximum
       supported speed, respectively.  If the pause operation attempts
       to pause output when it is already paused, then the operation is
       ignored.  If the resume operation attempts to resume when the
       prompts are not paused, then the operation is ignored.  If a
       seek, volume, or speed operation is applied when output is
       paused, then the MS also resumes output automatically.

   3.  If DTMF control subscription has been specified for the dialog,
       then each DTMF match of a control operation is reported in a
       <dtmfnotify> notification event (Section 4.2.5.2).

   4.  When the dialog exits, all control matches are reported in a
       <controlinfo> element (Section 4.3.2.2).
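   The limiting and pause rules in step 2 can be sketched in Python as
   follows.  This is an illustration only: the Playback class, the
   operation names, the percentage units, and the platform limits are
   all assumptions made for the example and are not defined by this
   specification.

```python
from dataclasses import dataclass


@dataclass
class Playback:
    position: float = 0.0       # seconds into the prompt queue
    queue_length: float = 60.0  # total prompt queue duration in seconds
    volume: int = 100           # percent (assumed unit)
    speed: int = 100            # percent (assumed unit)
    paused: bool = False


# Hypothetical platform limits; real limits are platform specific.
VOLUME_RANGE = (0, 200)
SPEED_RANGE = (50, 200)


def clamp(value, low, high):
    return max(low, min(high, value))


def apply_control(pb, op, amount=0):
    """Apply one matched control operation to the playback state."""
    if op == "pause":
        pb.paused = True        # no effect if already paused
        return pb
    if op == "resume":
        pb.paused = False       # no effect if not paused
        return pb
    if op in ("ff", "rw"):
        step = amount if op == "ff" else -amount
        pb.position = clamp(pb.position + step, 0.0, pb.queue_length)
    elif op in ("volup", "voldn"):
        step = amount if op == "volup" else -amount
        pb.volume = clamp(pb.volume + step, *VOLUME_RANGE)
    elif op in ("speedup", "speeddn"):
        step = amount if op == "speedup" else -amount
        pb.speed = clamp(pb.speed + step, *SPEED_RANGE)
    else:
        raise ValueError(f"unknown control operation: {op}")
    pb.paused = False           # seek, volume, and speed resume output
    return pb
```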

4.3.1.3. <collect>

   The <collect> element defines how DTMF input is collected.

   The <collect> element has the following attributes:

   cleardigitbuffer:  indicates whether the digit buffer is to be
      cleared.  A valid value is a boolean (see Section 4.6.1).  A
      value of true indicates that the digit buffer is to be cleared.
      A value of false indicates that the digit buffer is not to be
      cleared.  The attribute is optional.  The default value is true.

   timeout:  indicates the maximum time to wait for user input to
      begin.  A valid value is a Time Designation (see Section 4.6.7).
      The attribute is optional.  The default value is 5s.

   interdigittimeout:  indicates the maximum time to wait for another
      DTMF when the collected input is incomplete with respect to the
      grammar.  A valid value is a Time Designation (see Section 4.6.7).
      The attribute is optional.  The default value is 2s.

   termtimeout:  indicates the maximum time to wait for the termchar
      character when the collected input is complete with respect to the
      grammar.  A valid value is a Time Designation (see Section 4.6.7).
      The attribute is optional.  The default value is 0s (no delay).

   escapekey:  specifies a DTMF key that indicates collected grammar
      matches are discarded and the DTMF collection is to be re-
      initiated.  A valid value is a DTMF character (see Section 4.6.2).
      The attribute is optional.  There is no default value.

   termchar:  specifies a DTMF character for terminating DTMF input
      collection using the internal grammar.  It is ignored when a
      custom grammar is specified.  A valid value is a DTMF character
      (see Section 4.6.2).  To disable termination by a conventional
      DTMF character, set the parameter to an unconventional character
      like 'A'.  The attribute is optional.  The default value is '#'.

   maxdigits:  The maximum number of digits to collect using an internal
      digits (0-9 only) grammar.  It is ignored when a custom grammar is
      specified.  A valid value is a positive integer (see
      Section 4.6.5).  The attribute is optional.  The default value is
      5.

   The following matching priority is defined for incoming DTMF:
   termchar attribute, escapekey attribute, and then as part of a
   grammar.  For example, if "1" is defined as the escapekey attribute
   and as part of a grammar, then its interpretation as an escapekey
   takes priority.

   The <collect> element has the following child element:

   <grammar>:  indicates a custom grammar format (see
      Section 4.3.1.3.1).  The element is optional.

   The custom grammar takes priority over the internal grammar.  If a
   <grammar> element is specified, the MS MUST use it for DTMF
   collection.

   The MS has the following execution model for DTMF collection after
   initialization:

   1.  The DTMF collection buffer MUST NOT receive DTMF input matching
       <control> operations (see Section 4.3.1.2).

   2.  If an error occurs during execution, then the MS terminates
       collection and reports the error to the dialog container (see
       Section 4.3).  The MS MAY report DTMF collected before the error
       in <collectinfo> (see Section 4.3.2.3).

   3.  The MS clears the digit buffer if the value of the
       cleardigitbuffer attribute is true.

   4.  The MS activates an initial timer with the duration of the value
       of the timeout attribute.  If the initial timer expires before
       any DTMF input is received, then collection execution terminates,
       the <collectinfo> (see Section 4.3.2.3) has the termmode
       attribute set to noinput and the execution status is reported to
       the dialog container.

   5.  When the first DTMF collect input is received, the initial timer
       is canceled and DTMF collection begins.  Each DTMF input is
       collected unless it matches the value of the escapekey attribute
       or the termchar attribute when the internal grammar is used.
       Collected input is matched against the grammar to determine if it
       is valid and, if valid, whether collection is complete.  Valid
       DTMF patterns are either a simple digit string where the maximum
       length is determined by the maxdigits attribute and that can be
       optionally terminated by the character in the termchar attribute,
       or a custom DTMF grammar specified with the <grammar> element.

   6.  After escapekey input, or a valid input that does not complete
       the grammar, the MS activates a timer for the value of the
       interdigittimeout attribute or the termtimeout attribute.  The MS
       only uses the termtimeout value when the grammar does not allow
       any additional input; otherwise, the MS uses the
       interdigittimeout.

   7.  If DTMF collect input matches the value of the escapekey
       attribute, then the MS re-initializes DTMF collection: i.e., the
       MS discards collected DTMFs already matched against the grammar,
       and the MS attempts to match incoming DTMF (including any pending
       in the digit buffer) as described in Step 5 above.

   8.  If the collect input is not valid with respect to the grammar or
       an interdigittimeout timer expires, the MS terminates collection
       execution and reports execution status to the dialog container
       with a <collectinfo> (see Section 4.3.2.3) where the termmode
       attribute is set to nomatch.

   9.  If the collect input completes the grammar or if a termtimeout
       timer expires, then the MS terminates collection execution and
       reports execution status to the dialog container with
       <collectinfo> (see Section 4.3.2.3) where the termmode attribute
       is set to match.
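   The timer and matching rules in steps 4 to 9 can be sketched as a
   small simulation for the internal digits grammar.  The sketch below
   is not a normative implementation: it takes a time-ordered list of
   (seconds, key) events instead of running real timers, it omits the
   digit buffer, <control> filtering, and custom grammars, and it
   assumes that a termchar received before any digit is a nomatch.
   The names CollectConfig and simulate_collect are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

DIGITS = set("0123456789")


@dataclass
class CollectConfig:
    timeout: float = 5.0            # seconds, defaults from this section
    interdigittimeout: float = 2.0
    termtimeout: float = 0.0
    escapekey: Optional[str] = None
    termchar: str = "#"
    maxdigits: int = 5


def simulate_collect(events, cfg=None):
    """Return (termmode, digits) for time-ordered (seconds, key) events."""
    cfg = cfg or CollectConfig()
    digits = ""
    deadline, on_expiry = cfg.timeout, "noinput"       # step 4
    for at, key in events:
        if at >= deadline:                             # timer expired first
            break
        if key == cfg.termchar:                        # highest priority
            # Assumption: termchar with no digits collected is a nomatch.
            return ("match" if digits else "nomatch", digits)
        if key == cfg.escapekey:                       # step 7: restart
            digits = ""
            deadline, on_expiry = at + cfg.interdigittimeout, "nomatch"
        elif key in DIGITS and len(digits) < cfg.maxdigits:
            digits += key
            if len(digits) == cfg.maxdigits:           # grammar is full
                deadline, on_expiry = at + cfg.termtimeout, "match"
            else:                                      # more input allowed
                deadline, on_expiry = at + cfg.interdigittimeout, "nomatch"
        else:                                          # step 8: invalid
            return ("nomatch", digits)
    return (on_expiry, digits)                         # running timer expires
```

   For instance, with the default attribute values, the events
   [(1, "1"), (2, "2"), (3, "#")] yield ("match", "12"), while an
   empty event list yields ("noinput", "").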

4.3.1.3.1. <grammar>

   The <grammar> element allows a custom grammar, inline or external, to
   be specified.  Custom grammars permit the full range of DTMF
   characters including '*' and '#' to be specified for DTMF pattern
   matching.

   The <grammar> element has the following attributes:

   src:  specifies the location of an external grammar document.  A
      valid value is a URI (see Section 4.6.9).  The MS MUST support
      both HTTP [RFC2616] and HTTPS [RFC2818] schemes and the MS MAY
      support other schemes.  If the URI scheme is unsupported, the MS
      sends a <response> with a 420 status code (Section 4.5).  If the
      resource cannot be retrieved within the timeout interval, the MS
      sends a <response> with a 409 status code.  If the grammar format
      is not supported, the MS sends a <response> with a 424 status
      code.  The attribute is optional.  There is no default value.

   type:  identifies the preferred type of the grammar document
      identified by the src attribute.  A valid value is a MIME media
      type (see Section 4.6.10).  If the URI scheme used in the src
      attribute defines a mechanism for establishing the authoritative
      MIME media type of the media resource, the value returned by that
      mechanism takes precedence over this attribute.  The attribute is
      optional.  There is no default value.

   fetchtimeout:  the maximum interval to wait when fetching a grammar
      resource.  A valid value is a Time Designation (see
      Section 4.6.7).  The attribute is optional.  The default value is
      30s.

   The <grammar> element allows inline grammars to be specified.  XML
   grammar formats MUST use a namespace other than the one used in this
   specification.  Non-XML grammar formats MAY use a CDATA section.

   The MS MUST support the Speech Recognition Grammar Specification
   [SRGS] XML grammar format ("application/srgs+xml") and the MS MAY
   support the Key Press Markup Language (KPML) [RFC4730] or other
   grammar formats.  If the grammar format is not supported by the MS,
   then the MS sends a <response> with a 424 status code (Section 4.5).

   For example, the following fragment shows DTMF collection with an
   inline SRGS grammar:

   <collect cleardigitbuffer="false" timeout="20s"
            interdigittimeout="1s">
     <grammar>
       <grammar xmlns="http://www.w3.org/2001/06/grammar"
                version="1.0" mode="dtmf">
         <rule id="digit">
           <one-of>
             <item>0</item>
             <item>1</item>
             <item>2</item>
             <item>3</item>
             <item>4</item>
             <item>5</item>
             <item>6</item>
             <item>7</item>
             <item>8</item>
             <item>9</item>
           </one-of>
         </rule>

         <rule id="pin" scope="public">
           <one-of>
             <item>
               <item repeat="4">
                 <ruleref uri="#digit"/>
               </item>#</item>
             <item>* 9</item>
           </one-of>
         </rule>
       </grammar>
     </grammar>
   </collect>

   The same grammar could also be referenced externally (and take
   advantage of HTTP caching):

   <collect cleardigitbuffer="false" timeout="20s">
      <grammar type="application/srgs+xml"
               src="http://example.org/pin.grxml"/>
   </collect>
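
   To make the "pin" rule of the example grammar concrete, the DTMF
   strings it accepts can be approximated with a regular expression:
   four digits followed by '#', or '*' followed by '9'.  The Python
   sketch below is only an approximation for illustration (an MS would
   evaluate the SRGS grammar itself), and the names PIN_PATTERN and
   matches_pin are hypothetical.

```python
import re

# Approximation of the "pin" rule in the example grammar:
# four digits followed by '#', or '*' followed by '9'.
PIN_PATTERN = re.compile(r"(?:[0-9]{4}#|\*9)")


def matches_pin(dtmf):
    """Return True if a DTMF string (no spaces) completes the pin rule."""
    return PIN_PATTERN.fullmatch(dtmf) is not None
```

   Under this approximation, "1234#" and "*9" complete the rule, while
   "123#" does not.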

4.3.1.4. <record>

   The <record> element specifies how media input is recorded.

   The <record> element has the following attributes:

   timeout:  indicates the time to wait for user input to begin.  A
      valid value is a Time Designation (see Section 4.6.7).  The
      attribute is optional.  The default value is 5s.

   vadinitial:  controls whether Voice Activity Detection (VAD) is used
      to initiate the recording operation.  A valid value is a boolean
      (see Section 4.6.1).  A value of true indicates the MS MUST
      initiate recording if the VAD detects voice on the configured
      inbound audio streams.  A value of false indicates that the MS
      MUST NOT initiate recording using VAD.  The attribute is optional.
      The default value is false.

   vadfinal:  controls whether VAD is used to terminate the recording
      operation.  A valid value is a boolean (see Section 4.6.1).  A
      value of true indicates the MS MUST terminate recording if the VAD
      detects a period of silence (whose duration is specified by the
      finalsilence attribute) on configured inbound audio streams.  A
      value of false indicates that the MS MUST NOT terminate recording
      using VAD.  The attribute is optional.  The default value is
      false.

   dtmfterm:  indicates whether the recording operation is terminated by
      DTMF input.  A valid value is a boolean (see Section 4.6.1).  A
      value of true indicates that recording is terminated by DTMF
      input.  A value of false indicates that recording is not
      terminated by DTMF input.  The attribute is optional.  The default
      value is true.

   maxtime:  indicates the maximum duration of the recording.  A valid
      value is a Time Designation (see Section 4.6.7).  The attribute is
      optional.  The default value is 15s.

   beep:  indicates whether a 'beep' is to be played immediately prior
      to initiation of the recording operation.  A valid value is a
      boolean (see Section 4.6.1).  The attribute is optional.  The
      default value is false.

   finalsilence:  indicates the interval of silence that indicates the
      end of voice input.  This interval is not part of the recording
      itself.  This parameter is ignored if the vadfinal attribute has
      the value false.  A valid value is a Time Designation (see
      Section 4.6.7).  The attribute is optional.  The default value is
      5s.

   append:  indicates whether recorded data is appended or not to a
      recording location if a resource already exists.  A valid value is
      a boolean (see Section 4.6.1).  A value of true indicates that
      recorded data is appended to the existing resource at a recording
      location.  A value of false indicates that recorded data is to
      overwrite the existing resource.  The attribute is optional.  The
      default value is false.

      When a recording location is specified using the HTTP or HTTPS
      protocol, the recording operation SHOULD be performed using the
      HTTP GET and PUT methods, unless the HTTP server provides a
      special interface for recording uploads and appends (e.g., using
      POST).  When the append attribute has the value false, the
      recording data is uploaded to the specified location using HTTP
      PUT and replaces any data at that location on the HTTP origin
      server.  When append has the value true, the existing data (if
      any) is first downloaded from the specified location using HTTP
      GET, then the recording data is appended to the existing recording
      (note that this might require codec conversion and modification to
      the existing data), then the combined recording is uploaded to the
      specified location using HTTP PUT.  HTTP errors are handled as
      described in [RFC2616].

      When the recording location is specified using protocols other
      than HTTP or HTTPS, the mapping of the append operation onto the
      upload protocol scheme is implementation specific.
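
   The HTTP append behavior described above (GET the existing data,
   combine, then PUT) can be sketched as follows.  The http_get and
   http_put parameters are hypothetical callables supplied by the
   caller, not part of any real HTTP library, and plain byte
   concatenation is only a placeholder: as noted above, a real MS
   might need codec conversion and changes to the existing data.

```python
def upload_recording(loc, recorded, append, http_get, http_put):
    """Upload recorded bytes to loc and return the uploaded size.

    http_get(loc) is assumed to return the existing bytes, or None if
    no resource exists; http_put(loc, data) is assumed to store data.
    """
    if append:
        existing = http_get(loc) or b""
        # Placeholder only: real media formats usually need more than
        # byte concatenation (headers, codec conversion).
        recorded = existing + recorded
    http_put(loc, recorded)     # replaces any data at that location
    return len(recorded)
```

   In a test, a dictionary can stand in for the origin server by
   passing its get and __setitem__ methods as the two callables.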

   If either the vadinitial or vadfinal attribute is set to true and the
   MS does not support VAD, the MS sends a <response> with a 434 status
   code (Section 4.5).

   The <record> element has the following child element (0 or more
   occurrences):

   <media>:  specifies the location and type of the media resource for
      uploading recorded data (see Section 4.3.1.5).  The MS MUST
      support both HTTP [RFC2616] and HTTPS [RFC2818] schemes for
      uploading recorded data and the MS MAY support other schemes.  The
      MS uploads recorded data to this resource as soon as possible
      after recording is complete.  The element is optional.

   If multiple <media> elements are specified, then media input is to be
   recorded in parallel to multiple resource locations.

   If no <media> child element is specified, the MS MUST record media
   input but the recording location and the recording format are
   implementation specific (e.g., the MS records audio in the WAV format
   to a local disk accessible by HTTP).  The recording location and
   format are reported in <recordinfo> (Section 4.3.2.4) when the dialog
   terminates.  The recording MUST be available from this location until
   the connection or conference associated with the dialog on the MS
   terminates.

   If the MS does not support the configuration required for recording
   from the input media streams to one or more <media> elements and a
   more specific error code is not defined for its child elements, the
   MS sends a <response> with a 423 status code (Section 4.5).

   Note that an MS MAY support uploading recorded data to recording
   locations at the same time the recording operation takes place.  Such
   implementations need to be aware of the requirements of certain
   recording formats (e.g., WAV) for metadata at the beginning of the
   uploaded file, that the finalsilence interval is not part of the
   recording and how these requirements interact with the URI scheme.

   The MS has the following execution model for recording after
   initialization:

   1.  If an error occurs during execution (e.g., authentication or
       communication error when trying to upload to a recording
       location), then the MS terminates record execution and reports
       the error to the dialog container (see Section 4.3).  The MS MAY
       report data recorded before the error in <recordinfo> (see
       Section 4.3.2.4).

   2.  If DTMF input (not matching a <control> operation) is received
       during prompt playback and the prompt bargein attribute is set to
       true, then the MS activates the record execution.  Otherwise, the
       MS activates it after the completion of prompt playback.

   3.  If a beep attribute with the value of true is specified, then the
       MS plays a beep tone.

   4.  The MS activates a timer with the duration of the value of the
       timeout attribute.  If the timer expires before the recording
       operation begins, then the MS terminates the recording execution
       and reports the status to the dialog container with <recordinfo>
       (see Section 4.3.2.4) where the termmode attribute is set to
       noinput.

   5.  Initiation of the recording operation depends on the value of the
       vadinitial attribute.  If vadinitial has the value false, then
       the recording operation is initiated immediately.  Otherwise, the
       recording operation is initiated when voice activity is detected.

   6.  When the recording operation is initiated, a timer is started for
       the value of the maxtime attribute (maximum duration of the
       recording).  If the timer expires before the recording operation
       is complete, then the MS terminates recording execution and
       reports the execution status to the dialog container with
       <recordinfo> (see Section 4.3.2.4) where the termmode attribute
       is set to maxtime.

   7.  During the record operation, input media streams are recorded to
       a location and format specified in one or more <media> child
       elements.  If no <media> child element is specified, the MS
       records input to an implementation-specific location and format.

   8.  If the dtmfterm attribute has the value true and DTMF input is
       detected during the record operation, then the MS terminates
       recording and its status is reported to the dialog container with
       a <recordinfo> (see Section 4.3.2.4) where the termmode attribute
       is set to dtmf.

   9.  If the vadfinal attribute has the value true, then the MS
       terminates the recording operation when a period of silence,
       with the duration specified by the value of the finalsilence
       attribute, is detected.  This period of silence is not part of
       the final recording.  The status is reported to the dialog
       container with a <recordinfo> (see Section 4.3.2.4) where the
       termmode attribute is set to finalsilence.
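
   The way steps 4 to 9 decide the termmode value can be sketched in
   Python.  This is a simplified model under stated assumptions: times
   are seconds measured from the start of the timeout timer in step 4,
   at most one DTMF input and one silence period are considered, DTMF
   received before recording begins is ignored, and errors and the
   'stopped' mode are not modeled.  RecordConfig and record_outcome
   are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RecordConfig:
    timeout: float = 5.0        # seconds, defaults from this section
    vadinitial: bool = False
    vadfinal: bool = False
    dtmfterm: bool = True
    maxtime: float = 15.0
    finalsilence: float = 5.0


def record_outcome(cfg, voice_start=None, dtmf_at=None, silence_at=None):
    """Return (termmode, recorded_seconds) for one recording attempt."""
    if cfg.vadinitial:
        # Steps 4 and 5: wait for voice, give up when the timer expires.
        if voice_start is None or voice_start >= cfg.timeout:
            return ("noinput", 0.0)
        start = voice_start
    else:
        start = 0.0             # step 5: recording starts immediately
    # Candidate endings as (end_time, termmode, recorded_seconds).
    endings = [(start + cfg.maxtime, "maxtime", cfg.maxtime)]   # step 6
    if cfg.dtmfterm and dtmf_at is not None and dtmf_at >= start:
        endings.append((dtmf_at, "dtmf", dtmf_at - start))      # step 8
    if cfg.vadfinal and silence_at is not None and silence_at >= start:
        # Step 9: the silence interval is not part of the recording.
        endings.append((silence_at + cfg.finalsilence, "finalsilence",
                        silence_at - start))
    _, termmode, seconds = min(endings)   # earliest ending wins
    return (termmode, seconds)
```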

   For example, a request to record audio and video input to separate
   locations:

   <mscivr version="1.0" xmlns="urn:ietf:params:xml:ns:msc-ivr">
     <dialogstart connectionid="c1">
       <dialog>
         <record maxtime="30s" vadinitial="false" vadfinal="false">
           <media type="audio/x-wav"
                  loc="http://www.example.com/upload/audio.wav"/>
           <media type="video/3gpp;codecs='s263'"
                  loc="http://www.example.com/upload/video.3gp"/>
         </record>
       </dialog>
     </dialogstart>
   </mscivr>

   When the <record> element is executed, it immediately begins
   recording of the audio and video (since vadinitial is false) where
   the destination locations are specified in the <media> child
   elements.  Recording is completed when the duration reaches 30s or
   the connection is terminated.

4.3.1.5. <media>

   The <media> element specifies a media resource to play back from (see
   Section 4.3.1.1) or record to (see Section 4.3.1.4).  In the playback
   case, the resource is retrieved and in the recording case, recording
   data is uploaded to the resource location.

   A <media> element has the following attributes:

   loc:  specifies the location of the media resource.  A valid value is
      a URI (see Section 4.6.9).  The MS MUST support both HTTP
      [RFC2616] and HTTPS [RFC2818] schemes and the MS MAY support other
      schemes.  If the URI scheme is not supported by the MS, the MS
      sends a <response> with a 420 status code (Section 4.5).  If the
      resource is to be retrieved but the MS cannot retrieve it within
      the timeout interval, the MS sends a <response> with a 409 status
      code.  If the format of the media resource is not supported, the
      MS sends a <response> with a 429 status code.  The attribute is
      mandatory.

   type:  specifies the type of the media resource indicated in the loc
      attribute.  A valid value is a MIME media type (see
      Section 4.6.10) that, depending on its definition, can include
      additional parameters (e.g., [RFC4281]).  If the URI scheme used
      in the loc attribute defines a mechanism for establishing the
      authoritative MIME media type of the media resource, the value
      returned by that mechanism takes precedence over this attribute.
      If additional media parameters are specified, the MS MUST use them
      to determine media processing.  For example, [RFC4281] defines a
      'codecs' parameter for media types like video/3gpp that would
      determine which media streams are played or recorded.  The
      attribute is optional.  There is no default value.

   fetchtimeout:  the maximum interval to wait when fetching a media
      resource.  A valid value is a Time Designation (see
      Section 4.6.7).  The attribute is optional.  The default value is
      30s.

   soundLevel:  playback soundLevel (volume) for the media resource.  A
      valid value is a percentage (see Section 4.6.8).  The value
      indicates increase or decrease relative to the original recorded
      volume of the media.  A value of 100% (the default) plays the
      media at its recorded volume, a value of 200% plays the media at
      twice its recorded volume, a value of 50% plays it at half its
      recorded volume, a value of 0% plays the media silently, and so
      on.  See 'soundLevel' in SMIL [W3C.REC-SMIL2-20051213] for further
      information.  The attribute is optional.  The default value is
      100%.

   clipBegin:  offset from start of media resource to begin playback.  A
      valid value is a Time Designation (see Section 4.6.7).  The offset
      is measured in normal media playback time from the beginning of
      the media resource.  If the clipBegin offset is after the end of
      media (or the clipEnd offset), no media is played.  See
      'clipBegin' in SMIL [W3C.REC-SMIL2-20051213] for further
      information.  The attribute is optional.  The default value is 0s.

   clipEnd:  offset from start of media resource to end playback.  A
      valid value is a Time Designation (see Section 4.6.7).  The offset
      is measured in normal media playback time from the beginning of
      the media resource.  If the clipEnd offset is after the end of
      media, then the media is played to the end.  If clipBegin is after
      clipEnd, then no media is played.  See 'clipEnd' in SMIL
      [W3C.REC-SMIL2-20051213] for further information.  The attribute
      is optional.  There is no default value.

   The fetchtimeout, soundLevel, clipBegin, and clipEnd attributes are
   only relevant in the playback use case.  The MS ignores these
   attributes when using the <media> for recording.

   The <media> element has no children.
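
   The clipBegin, clipEnd, and soundLevel rules above can be sketched
   in Python.  The function names are hypothetical, offsets are plain
   seconds instead of Time Designation strings, and treating a
   clipBegin equal to the effective end as "no media played" is an
   assumption.

```python
def playback_window(media_length, clip_begin=0.0, clip_end=None):
    """Return (start, end) in seconds, or None when no media is played."""
    # A clipEnd beyond the end of the media means "play to the end".
    end = media_length if clip_end is None else min(clip_end, media_length)
    # A clipBegin at or after the effective end plays nothing.
    if clip_begin >= end:
        return None
    return (clip_begin, end)


def volume_factor(sound_level_percent):
    """Convert a soundLevel percentage to a factor of recorded volume."""
    return sound_level_percent / 100.0
```

   For a 10-second resource, clipBegin=2s with clipEnd=20s would give
   the window (2.0, 10.0) under this sketch.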

4.3.2. Exit Information

   When the dialog exits, information about the specified operations is
   reported in a <dialogexit> notification event (Section 4.2.5.1).

4.3.2.1. <promptinfo>

   The <promptinfo> element reports the information about prompt
   execution.  It has the following attributes:

   duration:  indicates the duration of prompt playback in milliseconds.
      A valid value is a non-negative integer (see Section 4.6.4).  The
      attribute is optional.  There is no default value.

   termmode:  indicates how playback was terminated.  Valid values are
      'stopped', 'completed', or 'bargein'.  The attribute is mandatory.

   The <promptinfo> element has no child elements.

4.3.2.2. <controlinfo>

   The <controlinfo> element reports information about control
   execution.

   The <controlinfo> element has no attributes and has 0 or more
   <controlmatch> child elements each describing an individual runtime
   control match.

4.3.2.2.1. <controlmatch>

   The <controlmatch> element has the following attributes:

   dtmf:  DTMF input triggering the runtime control.  A valid value is a
      DTMF string (see Section 4.6.3) with no space between characters.
      The attribute is mandatory.

   timestamp:  indicates the time (on the MS) at which the control was
      triggered.  A valid value is a dateTime expression
      (Section 4.6.12).  The attribute is mandatory.

   The <controlmatch> element has no child elements.

4.3.2.3. <collectinfo>

   The <collectinfo> element reports the information about collect
   execution.

   The <collectinfo> element has the following attributes:

   dtmf:  DTMF input collected from the user.  A valid value is a DTMF
      string (see Section 4.6.3) with no space between characters.  The
      attribute is optional.  There is no default value.

   termmode:  indicates how collection was terminated.  Valid values are
      'stopped', 'match', 'noinput', or 'nomatch'.  The attribute is
      mandatory.

   The <collectinfo> element has no child elements.

4.3.2.4. <recordinfo>

   The <recordinfo> element reports information about record execution
   (Section 4.3.1.4).

   The <recordinfo> element has the following attributes:

   termmode:  indicates how recording was terminated.  Valid values are
      'stopped', 'noinput', 'dtmf', 'maxtime', and 'finalsilence'.  The
      attribute is mandatory.

   duration:  indicates the duration of the recording in milliseconds.
      A valid value is a non-negative integer (see Section 4.6.4).  The
      attribute is optional.  There is no default value.

   The <recordinfo> element has the following child element (0 or more
   occurrences):

   <mediainfo>:  indicates information about a recorded media resource
      (see Section 4.3.2.4.1).  The element is optional.

   When the record operation is successful, the MS MUST specify a
   <mediainfo> element for each recording location.  For example, if the
   <record> element contained three <media> child elements, then the
   <recordinfo> would contain three <mediainfo> child elements.

4.3.2.4.1. <mediainfo>

   The <mediainfo> element reports information about a recorded media
   resource.

   The <mediainfo> element has the following attributes:

   loc:  indicates the location of the media resource.  A valid value is
      a URI (see Section 4.6.9).  The attribute is mandatory.

   type:  indicates the format of the media resource.  A valid value is
      a MIME media type (see Section 4.6.10).  The attribute is
      mandatory.

   size:  indicates the size of the media resource in bytes.  A valid
      value is a non-negative integer (see Section 4.6.4).  The
      attribute is optional.  There is no default value.
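
   As an illustration of how a client might read these attributes, the
   following Python sketch parses a standalone <recordinfo> fragment
   with the standard xml.etree.ElementTree module.  The SAMPLE
   fragment, its attribute values, and the parse_recordinfo function
   are hypothetical; in practice the element arrives inside a
   <dialogexit> notification event.

```python
import xml.etree.ElementTree as ET

NS = {"ivr": "urn:ietf:params:xml:ns:msc-ivr"}

# Hypothetical fragment, made up for this example.
SAMPLE = """<recordinfo xmlns="urn:ietf:params:xml:ns:msc-ivr"
            termmode="dtmf" duration="4200">
  <mediainfo loc="http://www.example.com/upload/audio.wav"
             type="audio/x-wav" size="67200"/>
  <mediainfo loc="http://www.example.com/upload/video.3gp"
             type="video/3gpp"/>
</recordinfo>"""


def parse_recordinfo(xml_text):
    """Extract termmode, duration, and media entries from <recordinfo>."""
    root = ET.fromstring(xml_text)
    duration = root.get("duration")          # optional attribute
    media = []
    for info in root.findall("ivr:mediainfo", NS):
        size = info.get("size")              # optional attribute
        media.append({
            "loc": info.get("loc"),
            "type": info.get("type"),
            "size": int(size) if size is not None else None,
        })
    return {
        "termmode": root.get("termmode"),
        "duration_ms": int(duration) if duration is not None else None,
        "media": media,
    }
```

   Optional attributes that are absent are returned as None, so a
   caller can tell "not reported" apart from a reported value of 0.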

