RFC 4463

A Media Resource Control Protocol (MRCP) Developed by Cisco, Nuance, and Speechworks

Pages: 86
Informational

Part 2 of 4 – Pages 21 to 41

RFC4463 - Page 21

7.  Speech Synthesizer Resource

   This resource converts text provided by the client into a speech
   stream in real time.  Depending on the implementation and capability
   of this resource, the client can control parameters such as voice
   characteristics, speaking rate, etc.

   The synthesizer resource is controlled by MRCP requests from the
   client.  The resource responds to these requests and can also
   generate asynchronous events to the client to indicate certain
   conditions during the processing of the stream.

7.1.  Synthesizer State Machine

   The synthesizer maintains states because it needs to correlate MRCP
   requests from the client.  The state transitions shown below describe
   the states of the synthesizer and reflect the request at the head of
   the queue.  A SPEAK request in the PENDING state can be deleted or
   stopped by a STOP request and does not affect the state of the
   resource.

        Idle                   Speaking                  Paused
        State                  State                     State
        |                       |                          |
        |----------SPEAK------->|                 |--------|
        |<------STOP------------|             CONTROL      |
        |<----SPEAK-COMPLETE----|                 |------->|
        |<----BARGE-IN-OCCURRED-|                          |
        |              |--------|                          |
        |          CONTROL      |-----------PAUSE--------->|
        |              |------->|<----------RESUME---------|
        |                       |               |----------|
        |                       |              PAUSE       |
        |                       |               |--------->|
        |              |--------|----------|               |
        |     BARGE-IN-OCCURRED |      SPEECH-MARKER       |
        |              |------->|<---------|               |
        |----------|            |             |------------|
        |         STOP          |          SPEAK           |
        |          |            |             |----------->|
        |<---------|                                       |
        |<-------------------STOP--------------------------|
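
   The transitions in the diagram can be read as a lookup table.  The
   following Python sketch is illustrative only: the names State,
   TRANSITIONS, and next_state are assumptions and not part of MRCP.
   It also assumes that the two BARGE-IN-OCCURRED edges leaving the
   Speaking state are distinguished by whether kill-on-barge-in is
   enabled, which the diagram itself does not state.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    SPEAKING = "speaking"
    PAUSED = "paused"


# (current state, method or event) -> next state, read off the diagram.
TRANSITIONS = {
    (State.IDLE, "SPEAK"): State.SPEAKING,
    (State.IDLE, "STOP"): State.IDLE,
    (State.SPEAKING, "STOP"): State.IDLE,
    (State.SPEAKING, "SPEAK-COMPLETE"): State.IDLE,
    (State.SPEAKING, "CONTROL"): State.SPEAKING,
    (State.SPEAKING, "SPEECH-MARKER"): State.SPEAKING,
    (State.SPEAKING, "PAUSE"): State.PAUSED,
    (State.PAUSED, "RESUME"): State.SPEAKING,
    (State.PAUSED, "PAUSE"): State.PAUSED,
    (State.PAUSED, "CONTROL"): State.PAUSED,
    (State.PAUSED, "SPEAK"): State.PAUSED,
    (State.PAUSED, "STOP"): State.IDLE,
}


def next_state(state, message, kill_on_barge_in=True):
    """Return the state after `message`; raise ValueError if no edge exists."""
    if state is State.SPEAKING and message == "BARGE-IN-OCCURRED":
        # Assumption: barge-in ends the SPEAK only when kill-on-barge-in
        # is enabled; otherwise the resource keeps speaking.
        return State.IDLE if kill_on_barge_in else State.SPEAKING
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(
            f"no transition from {state.name} on {message}"
        ) from None
```

   The table models only the request at the head of the queue, as the
   diagram does; queued (PENDING) requests are not represented.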

7.2.  Synthesizer Methods

   The synthesizer supports the following methods.

     synthesizer-method  =  "SET-PARAMS"
                         /  "GET-PARAMS"
                         /  "SPEAK"
                         /  "STOP"
                         /  "PAUSE"
                         /  "RESUME"
                         /  "BARGE-IN-OCCURRED"
                         /  "CONTROL"

7.3.  Synthesizer Events

   The synthesizer may generate the following events.

     synthesizer-event   =  "SPEECH-MARKER"
                         /  "SPEAK-COMPLETE"

7.4.  Synthesizer Header Fields

   A synthesizer message may contain header fields containing request
   options and information to augment the Request, Response, or Event of
   the message with which it is associated.

     synthesizer-header  =  jump-target       ; Section 7.4.1
                         /  kill-on-barge-in  ; Section 7.4.2
                         /  speaker-profile   ; Section 7.4.3
                         /  completion-cause  ; Section 7.4.4
                         /  voice-parameter   ; Section 7.4.5
                         /  prosody-parameter ; Section 7.4.6
                         /  vendor-specific   ; Section 7.4.7
                         /  speech-marker     ; Section 7.4.8
                         /  speech-language   ; Section 7.4.9
                         /  fetch-hint        ; Section 7.4.10
                         /  audio-fetch-hint  ; Section 7.4.11
                         /  fetch-timeout     ; Section 7.4.12
                         /  failed-uri        ; Section 7.4.13
                         /  failed-uri-cause  ; Section 7.4.14
                         /  speak-restart     ; Section 7.4.15
                         /  speak-length      ; Section 7.4.16

     Parameter           Support        Methods/Events/Response

     jump-target         MANDATORY      SPEAK, CONTROL
     logging-tag         MANDATORY      SET-PARAMS, GET-PARAMS
     kill-on-barge-in    MANDATORY      SPEAK
     speaker-profile     OPTIONAL       SET-PARAMS, GET-PARAMS,
                                        SPEAK, CONTROL
     completion-cause    MANDATORY      SPEAK-COMPLETE
     voice-parameter     MANDATORY      SET-PARAMS, GET-PARAMS,
                                        SPEAK, CONTROL
     prosody-parameter   MANDATORY      SET-PARAMS, GET-PARAMS,
                                        SPEAK, CONTROL
     vendor-specific     MANDATORY      SET-PARAMS, GET-PARAMS
     speech-marker       MANDATORY      SPEECH-MARKER
     speech-language     MANDATORY      SET-PARAMS, GET-PARAMS, SPEAK
     fetch-hint          MANDATORY      SET-PARAMS, GET-PARAMS, SPEAK
     audio-fetch-hint    MANDATORY      SET-PARAMS, GET-PARAMS, SPEAK
     fetch-timeout       MANDATORY      SET-PARAMS, GET-PARAMS, SPEAK
     failed-uri          MANDATORY      Any
     failed-uri-cause    MANDATORY      Any
     speak-restart       MANDATORY      CONTROL
     speak-length        MANDATORY      SPEAK, CONTROL

7.4.1.  Jump-Target

   This parameter MAY be specified in a CONTROL method and controls the
   jump size to move forward or rewind backward on an active SPEAK
   request.  A + or - indicates a value relative to what is currently
   being played.  This MAY be specified in a SPEAK request to indicate
   an offset into the speech markup that the SPEAK request should start
   speaking from.  The speech length units supported depend on the
   synthesizer implementation.  If it does not support a unit or the
   operation, the resource SHOULD respond with a status code of 404
   "Illegal or Unsupported value for parameter".

     jump-target         =    "Jump-Size" ":" speech-length-value CRLF
     speech-length-value =    numeric-speech-length
                         /    text-speech-length
     text-speech-length  =    1*ALPHA SP "Tag"
     numeric-speech-length=   ("+" / "-") 1*DIGIT SP
                              numeric-speech-unit
     numeric-speech-unit =    "Second"
                         /    "Word"
                         /    "Sentence"
                         /    "Paragraph"
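
   As a rough illustration of the grammar above, the following Python
   sketch parses a speech-length-value.  The function name
   parse_speech_length and the tuple shapes it returns are assumptions
   made for this example; only the syntax comes from the ABNF.

```python
import re

# numeric-speech-length: a mandatory sign, digits, one space, a unit.
NUMERIC = re.compile(r"([+-])([0-9]+) (Second|Word|Sentence|Paragraph)")
# text-speech-length: a tag name made of letters, one space, "Tag".
TEXT = re.compile(r"([A-Za-z]+) Tag")


def parse_speech_length(value):
    """Parse a speech-length-value into a tagged tuple.

    Returns ("numeric", signed_amount, unit) or ("tag", tag_name).
    """
    match = NUMERIC.fullmatch(value)
    if match:
        sign, digits, unit = match.groups()
        amount = int(digits)
        return ("numeric", -amount if sign == "-" else amount, unit)
    match = TEXT.fullmatch(value)
    if match:
        return ("tag", match.group(1))
    raise ValueError(f"not a speech-length-value: {value!r}")
```

   For example, parse_speech_length("-2 Sentence") yields a backward
   jump of two sentences, while a value without a sign, such as
   "15 Word", is rejected because the grammar requires "+" or "-".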

7.4.2.  Kill-On-Barge-In

   This parameter MAY be sent as part of the SPEAK method to enable
   kill-on-barge-in support.  If enabled, the SPEAK method is
   interrupted by DTMF input detected by a signal detector resource or
   by the start of speech sensed or recognized by the speech recognizer
   resource.

     kill-on-barge-in    =    "Kill-On-Barge-In" ":" boolean-value CRLF
     boolean-value       =    "true" / "false"

   If the recognizer or signal detector resource is on the same server
   as the synthesizer, the server should be intelligent enough to
   recognize their interactions by their common RTSP session-id and work
   with each other to provide kill-on-barge-in support.  The client
   needs to send a BARGE-IN-OCCURRED method to the synthesizer resource
   when it receives a barge-in-able event from the recognizer resource
   or signal detector resource.  These resources MAY be local or
   distributed.  If this field is not specified, the value defaults to
   "true".

7.4.3.  Speaker Profile

   This parameter MAY be part of the SET-PARAMS/GET-PARAMS or SPEAK
   request from the client to the server and specifies the profile of
   the speaker by a URI, which may be a set of voice parameters like
   gender, accent, etc.

     speaker-profile     =    "Speaker-Profile" ":" uri CRLF

7.4.4.  Completion Cause

   This header field MUST be specified in a SPEAK-COMPLETE event coming
   from the synthesizer resource to the client.  This indicates the
   reason behind the SPEAK request completion.

     completion-cause    =    "Completion-Cause" ":" 1*DIGIT SP 1*ALPHA
                             CRLF

   Cause-Code  Cause-Name     Description
     000       normal         SPEAK completed normally.
     001       barge-in       SPEAK request was terminated because
                              of barge-in.
     002       parse-failure  SPEAK request terminated because of a
                              failure to parse the speech markup text.
     003       uri-failure    SPEAK request terminated because access
                              to one of the URIs failed.
     004       error          SPEAK request terminated prematurely due
                              to synthesizer error.
     005       language-unsupported
                              Language not supported.
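
   A client might decode this header field along the following lines.
   This is a sketch under stated assumptions: the names CAUSE_NAMES and
   parse_completion_cause are hypothetical, and the parser accepts the
   hyphenated cause names from the table even though the grammar above
   lists the name as 1*ALPHA.

```python
# Cause codes and names copied from the table above.
CAUSE_NAMES = {
    0: "normal",
    1: "barge-in",
    2: "parse-failure",
    3: "uri-failure",
    4: "error",
    5: "language-unsupported",
}


def parse_completion_cause(value):
    """Split a Completion-Cause value into (code, name).

    Raises ValueError if the value is malformed or if the name does not
    match the table entry for a known code.
    """
    code_text, _, name = value.strip().partition(" ")
    if not (code_text.isascii() and code_text.isdigit()) or not name:
        raise ValueError(f"malformed Completion-Cause: {value!r}")
    code = int(code_text)
    expected = CAUSE_NAMES.get(code)
    if expected is not None and name != expected:
        raise ValueError(f"cause {code:03d} should be named {expected!r}")
    return code, name
```

   Unknown codes are passed through unchanged so that the caller can
   decide how to treat them.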

7.4.5.  Voice-Parameters

   This set of parameters defines the voice of the speaker.

     voice-parameter     =    "Voice-" voice-param-name ":"
                              voice-param-value CRLF

   voice-param-name is any one of the attribute names under the voice
   element specified in W3C's Speech Synthesis Markup Language
   Specification [9].  The voice-param-value is any one of the value
   choices of the corresponding voice element attribute specified in the
   above section.

   These header fields MAY be sent in a SET-PARAMS/GET-PARAMS request to
   define/get default values for the entire session or MAY be sent in
   the SPEAK request to define default values for that SPEAK request.
   Furthermore, these attributes can be part of the speech text marked
   up in Speech Synthesis Markup Language (SSML).

   These voice parameter header fields can also be sent in a CONTROL
   method to affect a SPEAK request in progress and change its behavior
   on the fly.  If the synthesizer resource does not support this
   operation, it should respond back to the client with a status of
   unsupported.

7.4.6.  Prosody-Parameters

   This set of parameters defines the prosody of the speech.

     prosody-parameter   =    "Prosody-" prosody-param-name ":"
                              prosody-param-value CRLF

   prosody-param-name is any one of the attribute names under the
   prosody element specified in W3C's Speech Synthesis Markup Language
   Specification [9].  The prosody-param-value is any one of the value
   choices of the corresponding prosody element attribute specified in
   the above section.

   These header fields MAY be sent in a SET-PARAMS/GET-PARAMS request to
   define/get default values for the entire session or MAY be sent in
   the SPEAK request to define default values for that SPEAK request.
   Furthermore, these attributes can be part of the speech text marked
   up in SSML.

   The prosody parameter header fields in the SET-PARAMS or SPEAK
   request only apply if the speech data is of type text/plain and does
   not use a speech markup format.

   These prosody parameter header fields MAY also be sent in a CONTROL
   method to affect a SPEAK request in progress and to change its
   behavior on the fly.  If the synthesizer resource does not support
   this operation, it should respond back to the client with a status of
   unsupported.

7.4.7.  Vendor-Specific Parameters

   This set of headers allows for the client to set vendor-specific
   parameters.

     vendor-specific         = "Vendor-Specific-Parameters" ":"
                               vendor-specific-av-pair
                               *[";" vendor-specific-av-pair] CRLF

     vendor-specific-av-pair = vendor-av-pair-name "="
                               vendor-av-pair-value

   This header MAY be sent in the SET-PARAMS/GET-PARAMS method and is
   used to set vendor-specific parameters on the server side.  The
   vendor-av-pair-name can be any vendor-specific field name and
   conforms to the XML vendor-specific attribute naming convention.  The
   vendor-av-pair-value is the value to set the attribute to and needs
   to be quoted.

   When asking the server to get the current value of these parameters,
   this header can be sent in the GET-PARAMS method with the list of
   vendor-specific attribute names to get separated by a semicolon.
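
   The attribute-value list can be split as sketched below.  The
   function name parse_vendor_params is hypothetical, and the sketch
   assumes that quoted values contain no embedded double quotes, since
   this section does not define an escaping rule.

```python
import re

# One name="value" pair, followed by ";" or the end of the string.
AV_PAIR = re.compile(r'\s*([^=;"\s]+)="([^"]*)"\s*(?:;|$)')


def parse_vendor_params(value):
    """Parse a Vendor-Specific-Parameters value into a dict."""
    params = {}
    pos = 0
    while pos < len(value):
        match = AV_PAIR.match(value, pos)
        if not match:
            raise ValueError(f"malformed av-pair at offset {pos}")
        params[match.group(1)] = match.group(2)
        pos = match.end()
    return params
```

   Matching pair by pair, rather than splitting on ";", keeps a
   semicolon inside a quoted value from being treated as a separator.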

7.4.8.  Speech Marker

   This header field contains a marker tag that may be embedded in the
   speech data.  Most speech markup formats provide mechanisms to embed
   marker fields between speech texts.  The synthesizer will generate
   SPEECH-MARKER events when it reaches these marker fields.  This field
   SHOULD be part of the SPEECH-MARKER event and will contain the marker
   tag values.

     speech-marker =          "Speech-Marker" ":" 1*ALPHA CRLF

7.4.9.  Speech Language

   This header field specifies the default language of the speech data
   if it is not specified in the speech data.  The value of this header
   field should follow RFC 3066 [16].  This MAY occur in a SPEAK,
   SET-PARAMS, or GET-PARAMS request.

     speech-language          =    "Speech-Language" ":" 1*ALPHA CRLF

7.4.10.  Fetch Hint

   When the synthesizer needs to fetch documents or other resources like
   speech markup or audio files, etc., this header field controls URI
   access properties.  This defines when the synthesizer should retrieve
   content from the server.  A value of "prefetch" indicates a file may
   be downloaded when the request is received, whereas "safe" indicates
   a file that should only be downloaded when actually needed.  The
   default value is "prefetch".  This header field MAY occur in SPEAK,
   SET-PARAMS, or GET-PARAMS requests.

     fetch-hint               =    "Fetch-Hint" ":" 1*ALPHA CRLF

7.4.11.  Audio Fetch Hint

   When the synthesizer needs to fetch documents or other resources like
   speech audio files, etc., this header field controls URI access
   properties.  This defines whether or not the synthesizer can attempt
   to optimize speech by pre-fetching audio.  The value is either "safe"
   to say that audio is only fetched when it is needed, never before;
   "prefetch" to permit, but not require the platform to pre-fetch the
   audio; or "stream" to allow it to stream the audio fetches.  The
   default value is "prefetch".  This header field MAY occur in SPEAK,
   SET-PARAMS, or GET-PARAMS requests.

     audio-fetch-hint         =    "Audio-Fetch-Hint" ":" 1*ALPHA CRLF

7.4.12.  Fetch Timeout

   When the synthesizer needs to fetch documents or other resources like
   speech audio files, etc., this header field controls URI access
   properties.  This defines the synthesizer timeout for resources the
   media server may need to fetch from the network.  This is specified
   in milliseconds.  The default value is platform-dependent.  This
   header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS.

     fetch-timeout            =    "Fetch-Timeout" ":" 1*DIGIT CRLF
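
   The three fetch-related header fields and their defaults can be
   summarized in code.  In this sketch, the function name
   fetch_settings and the keys of the returned dict are assumptions; a
   missing Fetch-Timeout is reported as None because its default is
   platform-dependent.

```python
FETCH_HINTS = {"prefetch", "safe"}
AUDIO_FETCH_HINTS = {"prefetch", "safe", "stream"}


def fetch_settings(headers):
    """Validate fetch-related header fields given as a dict of strings."""
    fetch_hint = headers.get("Fetch-Hint", "prefetch")
    audio_hint = headers.get("Audio-Fetch-Hint", "prefetch")
    if fetch_hint not in FETCH_HINTS:
        raise ValueError(f"unknown Fetch-Hint: {fetch_hint!r}")
    if audio_hint not in AUDIO_FETCH_HINTS:
        raise ValueError(f"unknown Audio-Fetch-Hint: {audio_hint!r}")
    timeout = headers.get("Fetch-Timeout")
    if timeout is not None:
        if not (timeout.isascii() and timeout.isdigit()):
            raise ValueError(f"Fetch-Timeout must be digits: {timeout!r}")
        timeout = int(timeout)  # milliseconds
    return {
        "fetch_hint": fetch_hint,
        "audio_fetch_hint": audio_hint,
        "fetch_timeout_ms": timeout,
    }
```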

7.4.13.  Failed URI

   When a synthesizer method needs a synthesizer to fetch or access a
   URI, and the access fails, the media server SHOULD provide the failed
   URI in this header field in the method response.

     failed-uri               =    "Failed-URI" ":" Url CRLF

7.4.14.  Failed URI Cause

   When a synthesizer method needs a synthesizer to fetch or access a
   URI, and the access fails, the media server SHOULD provide the URI
   specific or protocol-specific response code through this header field
   in the method response.  This field has been defined as alphanumeric
   to accommodate all protocols, some of which might have a response
   string instead of a numeric response code.

     failed-uri-cause         =    "Failed-URI-Cause" ":" 1*ALPHA CRLF

7.4.15.  Speak Restart

   When a CONTROL jump backward request is issued to a currently
   speaking synthesizer resource and the jump goes beyond the start of
   the speech, the current SPEAK request restarts from the beginning of
   its speech data and the response to the CONTROL request contains
   this header indicating a restart.  This header MAY occur in the
   CONTROL response.

     speak-restart       =    "Speak-Restart" ":" boolean-value CRLF

7.4.16.  Speak Length

   This parameter MAY be specified in a CONTROL method to control the
   length of speech to speak, relative to the current speaking point in
   the currently active SPEAK request.  A "-" value is illegal in this
   field.  If a field with a Tag unit is specified, then the synthesizer
   must speak until the tag is reached or the SPEAK request completes,
   whichever comes first.  This MAY be specified in a SPEAK request to
   indicate the length to speak in the speech data and is relative to
   the point in speech where the SPEAK request starts.  The speech
   length units supported depend on the synthesizer implementation.  If
   it does not support a unit or the operation, the resource SHOULD
   respond with a status code of 404 "Illegal or Unsupported value for
   parameter".

     speak-length        =    "Speak-Length" ":" speech-length-value
                              CRLF

7.5.  Synthesizer Message Body

   A synthesizer message may contain additional information associated
   with the Method, Response, or Event in its message body.

7.5.1.  Synthesizer Speech Data

   Marked-up text for the synthesizer to speak is specified as a MIME
   entity in the message body.  The message to be spoken by the
   synthesizer can be specified inline (by embedding the data in the
   message body) or by reference (by providing the URI to the data).  In
   either case, the data and the format used to mark up the speech need
   to be supported by the media server.

   All media servers MUST support plain text speech data and W3C's
   Speech Synthesis Markup Language [9] at a minimum and, hence, MUST
   support the MIME types text/plain and application/synthesis+ssml at a
   minimum.

   If the speech data needs to be specified by URI reference, the MIME
   type text/uri-list is used to specify the one or more URIs that will
   list what needs to be spoken.  If a list of speech URIs is specified,
   speech data provided by each URI must be spoken in the order in which
   the URIs are specified.

   If the data to be spoken consists of a mix of URI and inline speech
   data, the multipart/mixed MIME-type is used and embedded with the
   MIME-blocks for text/uri-list, application/synthesis+ssml or
   text/plain.  The character set and encoding used in the speech data
   may be specified according to standard MIME-type definitions.  The
   multi-part MIME-block can contain actual audio data in .wav or Sun
   audio format.  This is used when the client has audio clips that it
   may have recorded, then stored in memory or a local device, and that
   it currently needs to play as part of the SPEAK request.  The audio
   MIME-parts can be sent by the client as part of the multi-part MIME-
   block.  This audio will be referenced in the speech markup data that
   will be another part in the multi-part MIME-block according to the
   multipart/mixed MIME-type specification.

   Example 1:
       Content-Type:text/uri-list
       Content-Length:176

       http://www.cisco.com/ASR-Introduction.sml
       http://www.cisco.com/ASR-Document-Part1.sml
       http://www.cisco.com/ASR-Document-Part2.sml
       http://www.cisco.com/ASR-Conclusion.sml

   Example 2:
       Content-Type:application/synthesis+ssml
       Content-Length:104

       <?xml version="1.0"?>
       <speak>
       <paragraph>
                <sentence>You have 4 new messages.</sentence>
                <sentence>The first is from <say-as
                type="name">Stephanie Williams</say-as>
                and arrived at <break/>
                <say-as type="time">3:45pm</say-as>.</sentence>

                <sentence>The subject is <prosody
                rate="-20%">ski trip</prosody></sentence>
       </paragraph>
       </speak>

   Example 3:
       Content-Type:multipart/mixed; boundary="break"

       --break
       Content-Type:text/uri-list
       Content-Length:176

       http://www.cisco.com/ASR-Introduction.sml
       http://www.cisco.com/ASR-Document-Part1.sml
       http://www.cisco.com/ASR-Document-Part2.sml
       http://www.cisco.com/ASR-Conclusion.sml

       --break
       Content-Type:application/synthesis+ssml
       Content-Length:104

       <?xml version="1.0"?>
       <speak>
       <paragraph>
                <sentence>You have 4 new messages.</sentence>
                <sentence>The first is from <say-as
                type="name">Stephanie Williams</say-as>
                and arrived at <break/>
                <say-as type="time">3:45pm</say-as>.</sentence>

                <sentence>The subject is <prosody
                rate="-20%">ski trip</prosody></sentence>
       </paragraph>
       </speak>
       --break--
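
   When a client builds such an entity itself, the Content-Length can
   be computed from the encoded body instead of being written by hand.
   The following Python sketch does this for a text/uri-list body.  The
   function name uri_list_entity is hypothetical, and the use of CRLF
   line endings and UTF-8 encoding is an assumption of this example.

```python
def uri_list_entity(uris):
    """Return header fields plus a text/uri-list body as bytes."""
    # One URI per line; the order is the order in which they are spoken.
    body = "".join(uri + "\r\n" for uri in uris).encode("utf-8")
    headers = (
        "Content-Type:text/uri-list\r\n"
        f"Content-Length:{len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body
```

   Measuring the length after encoding matters when the body contains
   non-ASCII characters, because the count is then in octets rather
   than characters.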

7.6.  SET-PARAMS

   The SET-PARAMS method, from the client to server, tells the
   synthesizer resource to define default synthesizer context
   parameters, like voice characteristics and prosody, etc.  If the
   server accepted and set all parameters, it MUST return a Response-
   Status of 200.  If it chose to ignore some optional parameters, it
   MUST return 201.

   If some of the parameters being set are unsupported or have illegal
   values, the server accepts and sets the remaining parameters and MUST
   respond with a Response-Status of 403 or 404, and MUST include in the
   response the header fields that could not be set.

   Example:
     C->S:SET-PARAMS 543256 MRCP/1.0
         Voice-gender:female
         Voice-category:adult
         Voice-variant:3

     S->C:MRCP/1.0 543256 200 COMPLETE
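
   A client could interpret the SET-PARAMS response roughly as follows.
   The names ResponseLine, parse_response_line, and set_params_outcome
   are hypothetical; the field order of the response line is taken from
   the example above, and the status code meanings from the text of
   this section.

```python
from typing import NamedTuple


class ResponseLine(NamedTuple):
    version: str
    request_id: int
    status_code: int
    request_state: str


def parse_response_line(line):
    """Parse a line such as 'MRCP/1.0 543256 200 COMPLETE'."""
    version, request_id, status_code, request_state = line.split()
    return ResponseLine(
        version, int(request_id), int(status_code), request_state
    )


def set_params_outcome(status_code):
    """Describe a SET-PARAMS status code as this section defines it."""
    if status_code == 200:
        return "all parameters set"
    if status_code == 201:
        return "some optional parameters ignored"
    if status_code in (403, 404):
        return "some parameters not set; see returned header fields"
    return "other"
```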

7.7.  GET-PARAMS

   The GET-PARAMS method, from the client to server, asks the
   synthesizer resource for its current synthesizer context parameters,
   like voice characteristics and prosody, etc.  The client SHOULD send
   the list of parameters it wants to read from the server by listing a
   set of empty parameter header fields.  If a specific list is not
   specified then the server SHOULD return all the settable parameters
   including vendor-specific parameters and their current values.  Such
   a wildcard request can be expensive because the number of settable
   parameters can be large depending on the vendor.  Hence, it is
   RECOMMENDED that the client not use the wildcard GET-PARAMS
   operation very often.

   Example:
     C->S:GET-PARAMS 543256 MRCP/1.0
          Voice-gender:
          Voice-category:
          Voice-variant:
          Vendor-Specific-Parameters:com.mycorp.param1;
                      com.mycorp.param2

     S->C:MRCP/1.0 543256 200 COMPLETE
          Voice-gender:female
          Voice-category:adult
          Voice-variant:3
          Vendor-Specific-Parameters:com.mycorp.param1="Company Name";
                         com.mycorp.param2="124324234@mycorp.com"
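
   The request side of this exchange can be sketched as a small
   builder.  The function name build_get_params is hypothetical, and
   the CRLF line endings and terminating blank line are assumptions of
   this example rather than something shown in the message above.

```python
def build_get_params(request_id, names, version="MRCP/1.0"):
    """Build a GET-PARAMS request listing empty parameter header fields.

    An empty `names` list produces the wildcard form of the request.
    """
    lines = [f"GET-PARAMS {request_id} {version}"]
    lines += [f"{name}:" for name in names]
    return "\r\n".join(lines) + "\r\n\r\n"
```

   Passing an explicit list of names avoids the wildcard form, which
   the text above recommends using sparingly.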

7.8.  SPEAK

   The SPEAK method from the client to the server provides the
   synthesizer resource with the speech text and initiates speech
   synthesis and streaming.  The SPEAK method can carry voice and
   prosody header fields that define the behavior of the voice being
   synthesized, as well as the actual marked-up text to be spoken.  If
   specific voice and prosody parameters are specified as part of the
   speech markup text, they take precedence over the values specified
   in the header fields and those set using a previous SET-PARAMS
   request.

   When applying voice parameters, there are 3 levels of scope.  The
   highest precedence are those specified within the speech markup text,
   followed by those specified in the header fields of the SPEAK request
   and, hence, apply for that SPEAK request only, followed by the
   session default values that can be set using the SET-PARAMS request
   and apply for the whole session moving forward.
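
   The three levels of scope amount to a layered lookup.  The sketch
   below uses Python's collections.ChainMap, which searches its
   mappings in order; the function name effective_voice_params and the
   representation of each level as a dict are assumptions of this
   example.

```python
from collections import ChainMap


def effective_voice_params(markup, speak_headers, session_defaults):
    """Merge the three scopes, highest precedence first.

    markup:           values given inside the speech markup text
    speak_headers:    header fields of this SPEAK request
    session_defaults: values set earlier with SET-PARAMS
    """
    return dict(ChainMap(markup, speak_headers, session_defaults))
```

   A parameter absent from the markup falls back to the SPEAK header
   fields, and one absent from both falls back to the session default.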

   If the resource is idle and the SPEAK request is being actively
   processed, the resource will respond with a success status code and a
   request-state of IN-PROGRESS.

   If the resource is in the speaking or paused states (i.e., it is in
   the middle of processing a previous SPEAK request), the status
   returns success and a request-state of PENDING.  This means that this
   SPEAK request is in queue and will be processed after the currently
   active SPEAK request is completed.

   For the synthesizer resource, this is the only request that can
   return a request-state of IN-PROGRESS or PENDING.  When the text to
   be synthesized is complete, the resource will issue a SPEAK-COMPLETE
   event with the request-id of the SPEAK message and a request-state of
   COMPLETE.

   Example:
     C->S:SPEAK 543257 MRCP/1.0
          Voice-gender:neutral
          Voice-category:teenager
          Prosody-volume:medium
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
          <paragraph>
            <sentence>You have 4 new messages.</sentence>
            <sentence>The first is from <say-as
            type="name">Stephanie Williams</say-as>
            and arrived at <break/>
            <say-as type="time">3:45pm</say-as>.</sentence>

            <sentence>The subject is <prosody
            rate="-20%">ski trip</prosody></sentence>
          </paragraph>
          </speak>

     S->C:MRCP/1.0 543257 200 IN-PROGRESS

     S->C:SPEAK-COMPLETE 543257 COMPLETE MRCP/1.0
          Completion-Cause:000 normal

7.9.  STOP

   The STOP method from the client to the server tells the resource to
   stop speaking if it is speaking something.

   The STOP request can be sent with an active-request-id-list header
   field to stop the zero or more specific SPEAK requests that may be in
   queue and return a response code of 200(Success).  If no active-
   request-id-list header field is sent in the STOP request, it will
   terminate all outstanding SPEAK requests.

   If a STOP request successfully terminated one or more PENDING or
   IN-PROGRESS SPEAK requests, then the response message contains an
   active-request-id-list header field listing the SPEAK request-ids
   that were terminated.  Otherwise, there will be no
   active-request-id-list header field in the response.  No
   SPEAK-COMPLETE events will be sent for these terminated requests.

   If a SPEAK request that was IN-PROGRESS and speaking was stopped, the
   next pending SPEAK request, if any, would become IN-PROGRESS and move
   to the speaking state.

   If a SPEAK request that was IN-PROGRESS and in the paused state was
   stopped, the next pending SPEAK request, if any, would become
   IN-PROGRESS and move to the paused state.
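
   The queueing of SPEAK requests and the effect of STOP can be
   modelled with a simple queue.  This is a minimal sketch, not a
   server implementation: the class name SpeakQueue and its attributes
   are hypothetical, and audio streaming is left out entirely.

```python
from collections import deque


class SpeakQueue:
    """Toy model of a synthesizer's SPEAK queue."""

    def __init__(self):
        self.requests = deque()  # head is the IN-PROGRESS request
        self.paused = False

    def speak(self, request_id):
        """Queue a SPEAK request and return its request-state."""
        self.requests.append(request_id)
        return "IN-PROGRESS" if len(self.requests) == 1 else "PENDING"

    def stop(self, request_ids=None):
        """Terminate the listed requests, or all of them if none listed.

        Returns the request-ids actually terminated, which is what the
        active-request-id-list of the response would carry.
        """
        if request_ids is None:
            targets = set(self.requests)
        else:
            targets = set(request_ids)
        terminated = [r for r in self.requests if r in targets]
        self.requests = deque(
            r for r in self.requests if r not in targets
        )
        if not self.requests:
            self.paused = False  # nothing left: resource is idle
        return terminated
```

   Because the paused flag is kept while requests remain, the next
   pending request inherits the speaking or paused state of the one
   that was stopped, as described above.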

   Example:
     C->S:SPEAK 543258 MRCP/1.0
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
          <paragraph>
            <sentence>You have 4 new messages.</sentence>
            <sentence>The first is from <say-as
            type="name">Stephanie Williams</say-as>
            and arrived at <break/>
            <say-as type="time">3:45pm</say-as>.</sentence>

            <sentence>The subject is <prosody
            rate="-20%">ski trip</prosody></sentence>
          </paragraph>
          </speak>

     S->C:MRCP/1.0 543258 200 IN-PROGRESS

     C->S:STOP 543259 MRCP/1.0

     S->C:MRCP/1.0 543259 200 COMPLETE
          Active-Request-Id-List:543258

7.10.  BARGE-IN-OCCURRED

   The BARGE-IN-OCCURRED method is a mechanism for the client to
   communicate a barge-in-able event it detects to the speech resource.

   This method is useful in two scenarios:

   1.  The client has detected some events like DTMF digits or other
       barge-in-able events and wants to communicate that to the
       synthesizer.

   2.  The recognizer resource and the synthesizer resource are on
       different servers.  In this case, the client MUST act as a proxy:
       it receives the event from the recognition resource and then
       sends a BARGE-IN-OCCURRED method to the synthesizer.  In such
       cases, the BARGE-IN-OCCURRED method would also have a
       proxy-sync-id header field received from the resource generating
       the original event.

   If a SPEAK request is active with kill-on-barge-in enabled, and the
   BARGE-IN-OCCURRED method is received, the synthesizer should stop
   streaming out audio.  It should also terminate any speech requests
   queued behind the current active one, irrespective of whether they
   have barge-in enabled or not.  If a barge-in-able prompt was playing
   and it was terminated, the response MUST contain the request-ids of
   all SPEAK requests that were terminated in its
   active-request-id-list.  There will be no SPEAK-COMPLETE events
   generated for these requests.

   If the synthesizer and the recognizer are on the same server, they
   could be optimized for a quicker kill-on-barge-in response by having
   them interact directly based on a common RTSP session-id.  In these
   cases, the client MUST still proxy the recognition event through a
   BARGE-IN-OCCURRED method, but the synthesizer resource may have
   already stopped and sent a SPEAK-COMPLETE event with a barge-in
   completion cause code.  If there were no SPEAK requests terminated as
   a result of the BARGE-IN-OCCURRED method, the response would still be
   a 200 success, but MUST NOT contain an active-request-id-list header
   field.
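
   The server-side rules for BARGE-IN-OCCURRED can be sketched as
   follows.  The function name handle_barge_in and the representation
   of the queue as a list of (request-id, kill-on-barge-in) tuples are
   assumptions of this example, as is the comma-separated rendering of
   the active-request-id-list value.

```python
def handle_barge_in(queue):
    """Apply BARGE-IN-OCCURRED to a SPEAK queue, head first.

    `queue` is a list of (request_id, kill_on_barge_in) tuples and is
    modified in place.  Returns (status_code, response_headers).
    """
    headers = {}
    if queue and queue[0][1]:
        # The active prompt is barge-in-able: terminate it and every
        # request queued behind it, whatever their own settings are.
        terminated = [request_id for request_id, _ in queue]
        queue.clear()
        headers["Active-Request-Id-List"] = ",".join(
            str(request_id) for request_id in terminated
        )
    # The status is 200 either way; the header field is present only
    # when at least one SPEAK request was terminated.
    return 200, headers
```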

     C->S:SPEAK 543258 MRCP/1.0
          Voice-gender:neutral
          Voice-category:teenager
          Prosody-volume:medium
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
          <paragraph>
            <sentence>You have 4 new messages.</sentence>
            <sentence>The first is from <say-as
            type="name">Stephanie Williams</say-as>
            and arrived at <break/>
            <say-as type="time">3:45pm</say-as>.</sentence>
            <sentence>The subject is <prosody
            rate="-20%">ski trip</prosody></sentence>
          </paragraph>
          </speak>

     S->C:MRCP/1.0 543258 200 IN-PROGRESS

     C->S:BARGE-IN-OCCURRED 543259 MRCP/1.0
          Proxy-Sync-Id:987654321

     S->C:MRCP/1.0 543259 200 COMPLETE
          Active-Request-Id-List:543258

7.11.  PAUSE

   The PAUSE method from the client to the server tells the resource to
   pause speech, if it is speaking something.  If a PAUSE method is
   issued on a session when a SPEAK is not active, the server SHOULD
   respond with a status of 402, "Method not valid in this state".  If
   a PAUSE method is issued on a session when a SPEAK is active and
   already paused, the server SHOULD respond with a status of 200,
   "Success".  If a SPEAK request was active, the server MUST return an
   active-request-id-list header with the request-id of the SPEAK
   request that was paused.

     C->S:SPEAK 543258 MRCP/1.0
          Voice-gender:neutral
          Voice-category:teenager
          Prosody-volume:medium
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
          <paragraph>
            <sentence>You have 4 new messages.</sentence>
            <sentence>The first is from <say-as
            type="name">Stephanie Williams</say-as>
            and arrived at <break/>
            <say-as type="time">3:45pm</say-as>.</sentence>

            <sentence>The subject is <prosody
            rate="-20%">ski trip</prosody></sentence>
          </paragraph>
          </speak>

     S->C:MRCP/1.0 543258 200 IN-PROGRESS

     C->S:PAUSE 543259 MRCP/1.0

     S->C:MRCP/1.0 543259 200 COMPLETE
          Active-Request-Id-List:543258

7.12.  RESUME

   The RESUME method from the client to the server tells a paused
   synthesizer resource to continue speaking.  If a RESUME method is
   issued on a session when a SPEAK is not active, the server SHOULD
   respond with a status of 402, "Method not valid in this state".  If
   a RESUME method is issued on a session when a SPEAK is active and
   speaking (i.e., not paused), the server SHOULD respond with a status
   of 200, "Success".  If a SPEAK request was active, the server MUST
   return an active-request-id-list header with the request-id of the
   SPEAK request that was resumed.

   Example:
     C->S:SPEAK 543258 MRCP/1.0
          Voice-gender:neutral
          Voice-category:teenager
          Prosody-volume:medium
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
          <paragraph>
              <sentence>You have 4 new messages.</sentence>
              <sentence>The first is from <say-as
              type="name">Stephanie Williams</say-as>
              and arrived at <break/>
              <say-as type="time">3:45pm</say-as>.</sentence>

              <sentence>The subject is <prosody
              rate="-20%">ski trip</prosody></sentence>
          </paragraph>
          </speak>

     S->C:MRCP/1.0 543258 200 IN-PROGRESS

     C->S:PAUSE 543259 MRCP/1.0

     S->C:MRCP/1.0 543259 200 COMPLETE
          Active-Request-Id-List:543258

     C->S:RESUME 543260 MRCP/1.0

     S->C:MRCP/1.0 543260 200 COMPLETE
          Active-Request-Id-List:543258
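
   The PAUSE and RESUME status rules can be sketched as a small state
   model.  This is only an illustration under stated assumptions: the
   class SynthesizerModel and its methods are hypothetical names, the
   SPEAK queue is ignored, and the COMPLETE request-state on the 402
   response is assumed rather than taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional

IDLE, SPEAKING, PAUSED = "idle", "speaking", "paused"


@dataclass
class SynthesizerModel:
    """Hypothetical model of the pause/resume state of one resource."""
    state: str = IDLE
    active_speak_id: Optional[int] = None  # request-id of the active SPEAK

    def speak(self, request_id: int) -> str:
        # Simplification: no queueing, the SPEAK becomes active at once.
        self.state = SPEAKING
        self.active_speak_id = request_id
        return f"MRCP/1.0 {request_id} 200 IN-PROGRESS"

    def _transition(self, request_id: int, new_state: str) -> str:
        if self.state == IDLE:
            # No active SPEAK: "Method not valid in this state".
            # (COMPLETE as the request-state here is an assumption.)
            return f"MRCP/1.0 {request_id} 402 COMPLETE"
        # PAUSE while paused and RESUME while speaking still succeed.
        self.state = new_state
        return (f"MRCP/1.0 {request_id} 200 COMPLETE\n"
                f"Active-Request-Id-List:{self.active_speak_id}")

    def pause(self, request_id: int) -> str:
        return self._transition(request_id, PAUSED)

    def resume(self, request_id: int) -> str:
        return self._transition(request_id, SPEAKING)
```

   Calling speak(543258), pause(543259), and resume(543260) in that
   order yields the same response lines as the example above.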

7.13.  CONTROL

   The CONTROL method from the client to the server tells a synthesizer
   that is speaking to modify what it is speaking on the fly.  This
   method is used to make the synthesizer jump forward or backward in
   what is being spoken, change speaker rate and speaker parameters,
   etc.  It affects the active or IN-PROGRESS SPEAK request.  Depending
   on its implementation and capability, the synthesizer resource may
   support this operation in full or only for some of its parameters.

   When a CONTROL to jump forward is issued and the operation goes
   beyond the end of the active SPEAK method's text, the request
   succeeds.  A SPEAK-COMPLETE event follows the response to the CONTROL
   method.  If there are more SPEAK requests in the queue, the
   synthesizer resource will continue to process the next SPEAK method.
   When a CONTROL to jump backwards is issued and the operation jumps to
   the beginning of the speech data of the active SPEAK request, the
   response to the CONTROL request contains the speak-restart header.

   These two behaviors can be used to rewind or fast-forward across
   multiple speech requests, if the client wants to break up a speech
   markup text into multiple SPEAK requests.

   If a SPEAK request was active when the CONTROL method was received,
   the server MUST return an active-request-id-list header with the
   request-id of the SPEAK request that was active.

   Example:
     C->S:SPEAK 543258 MRCP/1.0
          Voice-gender:neutral
          Voice-category:teenager
          Prosody-volume:medium
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
          <paragraph>
            <sentence>You have 4 new messages.</sentence>
            <sentence>The first is from <say-as
            type="name">Stephanie Williams</say-as>
            and arrived at <break/>
            <say-as type="time">3:45pm</say-as>.</sentence>

            <sentence>The subject is <prosody
            rate="-20%">ski trip</prosody></sentence>
          </paragraph>
          </speak>

     S->C:MRCP/1.0 543258 200 IN-PROGRESS

     C->S:CONTROL 543259 MRCP/1.0
          Prosody-rate:fast

     S->C:MRCP/1.0 543259 200 COMPLETE
          Active-Request-Id-List:543258

     C->S:CONTROL 543260 MRCP/1.0
          Jump-Size:-15 Words

     S->C:MRCP/1.0 543260 200 COMPLETE
          Active-Request-Id-List:543258
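
   The jump behavior described above can be sketched as follows.  This
   is a minimal illustration, not a definitive implementation: the
   names JumpResult and apply_jump are hypothetical, positions are
   counted in words only, and treating a jump that lands exactly on the
   end of the text as "beyond the end" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class JumpResult:
    position: int          # new word offset within the active SPEAK text
    speak_complete: bool   # True if a SPEAK-COMPLETE event should follow
    speak_restart: bool    # True if the response carries speak-restart


def apply_jump(position: int, total_words: int, jump_words: int) -> JumpResult:
    """Apply a relative jump, measured in words, to the current position."""
    target = position + jump_words
    if jump_words > 0 and target >= total_words:
        # Forward jump past the end: the CONTROL request still succeeds
        # and the active SPEAK completes.
        return JumpResult(total_words, True, False)
    if jump_words < 0 and target <= 0:
        # Backward jump that reaches the beginning of the speech data.
        return JumpResult(0, False, True)
    # Ordinary jump: clamp defensively to the bounds of the text.
    return JumpResult(max(0, min(target, total_words)), False, False)
```

   For instance, a jump of -15 words issued 10 words into the text
   returns to position 0 and sets the speak-restart flag, which lets a
   client rewind into a preceding SPEAK request if it chooses to.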

7.14.  SPEAK-COMPLETE

   This is an Event message from the synthesizer resource to the client
   indicating that the SPEAK request was completed.  The request-id
   header field will match the request-id of the SPEAK request that
   initiated the speech that just completed.  The request-state field
   should be COMPLETE, indicating that this is the last Event with that
   request-id and that the request with that request-id is now
   complete.  The completion-cause header field specifies the cause code
   pertaining to the status and reason of request completion, for
   example, whether the SPEAK completed normally, because of an error,
   or because of kill-on-barge-in.

   Example:
     C->S:SPEAK 543260 MRCP/1.0
          Voice-gender:neutral
          Voice-category:teenager
          Prosody-volume:medium
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
          <paragraph>
            <sentence>You have 4 new messages.</sentence>
            <sentence>The first is from <say-as
            type="name">Stephanie Williams</say-as>
            and arrived at <break/>
            <say-as type="time">3:45pm</say-as>.</sentence>

            <sentence>The subject is <prosody
            rate="-20%">ski trip</prosody></sentence>
          </paragraph>
          </speak>

     S->C:MRCP/1.0 543260 200 IN-PROGRESS

     S->C:SPEAK-COMPLETE 543260 COMPLETE MRCP/1.0
          Completion-Cause:000 normal
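
   A client can correlate this event with its SPEAK request by
   comparing request-ids.  The sketch below assumes the four-field
   event start-line layout used in the example above; EventLine,
   parse_event_line, and completes_speak are hypothetical names chosen
   for illustration.

```python
from typing import NamedTuple


class EventLine(NamedTuple):
    event_name: str
    request_id: int
    request_state: str
    version: str


def parse_event_line(line: str) -> EventLine:
    """Split an event start-line such as
    'SPEAK-COMPLETE 543260 COMPLETE MRCP/1.0' into its four fields."""
    name, request_id, state, version = line.split()
    return EventLine(name, int(request_id), state, version)


def completes_speak(line: str, speak_request_id: int) -> bool:
    """Return True if the event ends the SPEAK with the given request-id."""
    event = parse_event_line(line)
    return (event.event_name == "SPEAK-COMPLETE"
            and event.request_id == speak_request_id
            and event.request_state == "COMPLETE")
```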

7.15.  SPEECH-MARKER

   This is an event generated by the synthesizer resource to the client
   when it hits a marker tag in the speech markup it is currently
   processing.  The request-id field in the header matches the
   request-id of the SPEAK request that initiated the speech.  The
   request-state field should be IN-PROGRESS, as the speech is not yet
   complete and there is more to be spoken.  The actual speech marker
   tag hit, describing where the synthesizer is in the speech markup,
   is returned in the speech-marker header field.

   Example:
     C->S:SPEAK 543261 MRCP/1.0
          Voice-gender:neutral
          Voice-category:teenager
          Prosody-volume:medium
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
          <paragraph>
            <sentence>You have 4 new messages.</sentence>
            <sentence>The first is from <say-as
            type="name">Stephanie Williams</say-as>
            and arrived at <break/>
            <say-as type="time">3:45pm</say-as>.</sentence>
            <mark name="here"/>
            <sentence>The subject is
               <prosody rate="-20%">ski trip</prosody>
            </sentence>
            <mark name="ANSWER"/>
          </paragraph>
          </speak>

     S->C:MRCP/1.0 543261 200 IN-PROGRESS

     S->C:SPEECH-MARKER 543261 IN-PROGRESS MRCP/1.0
          Speech-Marker:here

     S->C:SPEECH-MARKER 543261 IN-PROGRESS MRCP/1.0
          Speech-Marker:ANSWER

     S->C:SPEAK-COMPLETE 543261 COMPLETE MRCP/1.0
          Completion-Cause:000 normal
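
   The order of SPEECH-MARKER events a client should expect can be
   derived from the markup itself.  The following sketch uses Python's
   standard xml.etree.ElementTree module and assumes un-namespaced
   markup like that in the example; expected_markers and marker_events
   are hypothetical helper names, not part of MRCP.

```python
import xml.etree.ElementTree as ET
from typing import List


def expected_markers(ssml: str) -> List[str]:
    """Return the names of all <mark> elements in document order."""
    root = ET.fromstring(ssml.strip())
    return [el.get("name") for el in root.iter("mark")
            if el.get("name") is not None]


def marker_events(ssml: str, request_id: int) -> List[str]:
    """Format the SPEECH-MARKER events expected for one SPEAK request."""
    return [f"SPEECH-MARKER {request_id} IN-PROGRESS MRCP/1.0\n"
            f"Speech-Marker:{name}"
            for name in expected_markers(ssml)]
```

   Applied to the markup in the example above, expected_markers yields
   "here" followed by "ANSWER", matching the two events shown.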