RFC 4463

Informational
Pages: 86

A Media Resource Control Protocol (MRCP) Developed by Cisco, Nuance, and Speechworks

Network Working Group                                      S. Shanmugham
Request for Comments: 4463                           Cisco Systems, Inc.
Category: Informational                                        P. Monaco
                                                   Nuance Communications
                                                              B. Eberman
                                                        Speechworks Inc.
                                                              April 2006


                A Media Resource Control Protocol (MRCP)
              Developed by Cisco, Nuance, and Speechworks

Status of This Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2006).

IESG Note

   This RFC is not a candidate for any level of Internet Standard.  The
   IETF disclaims any knowledge of the fitness of this RFC for any
   purpose and in particular notes that the decision to publish is not
   based on IETF review for such things as security, congestion control,
   or inappropriate interaction with deployed protocols.  The RFC Editor
   has chosen to publish this document at its discretion.  Readers of
   this document should exercise caution in evaluating its value for
   implementation and deployment.  See RFC 3932 for more information.

   Note that this document uses a MIME type 'application/mrcp' which has
   not been registered with the IANA, and is therefore not recognized as
   a standard IETF MIME type.  The historical value of this document as
   an ancestor to ongoing standardization in this space, however, makes
   the publication of this document meaningful.

Abstract

   This document describes a Media Resource Control Protocol (MRCP) that
   was developed jointly by Cisco Systems, Inc., Nuance Communications,
   and Speechworks, Inc.  It is published as an RFC as input for further
   IETF development in this area.

   MRCP controls media service resources like speech synthesizers,
   recognizers, signal generators, signal detectors, fax servers, etc.,
   over a network.  This protocol is designed to work with streaming
   protocols like RTSP (Real Time Streaming Protocol) or SIP (Session
   Initiation Protocol), which help establish control connections to
   external media streaming devices, and media delivery mechanisms like
   RTP (Real-time Transport Protocol).

Table of Contents

   1. Introduction
   2. Architecture
      2.1. Resources and Services
      2.2. Server and Resource Addressing
   3. MRCP Protocol Basics
      3.1. Establishing Control Session and Media Streams
      3.2. MRCP over RTSP
      3.3. Media Streams and RTP Ports
   4. Notational Conventions
   5. MRCP Specification
      5.1. Request
      5.2. Response
      5.3. Event
      5.4. Message Headers
   6. Media Server
      6.1. Media Server Session
   7. Speech Synthesizer Resource
      7.1. Synthesizer State Machine
      7.2. Synthesizer Methods
      7.3. Synthesizer Events
      7.4. Synthesizer Header Fields
      7.5. Synthesizer Message Body
      7.6. SET-PARAMS
      7.7. GET-PARAMS
      7.8. SPEAK
      7.9. STOP
      7.10. BARGE-IN-OCCURRED
      7.11. PAUSE
      7.12. RESUME
      7.13. CONTROL
      7.14. SPEAK-COMPLETE
      7.15. SPEECH-MARKER
   8. Speech Recognizer Resource
      8.1. Recognizer State Machine
      8.2. Recognizer Methods
      8.3. Recognizer Events
      8.4. Recognizer Header Fields
      8.5. Recognizer Message Body
      8.6. SET-PARAMS
      8.7. GET-PARAMS
      8.8. DEFINE-GRAMMAR
      8.9. RECOGNIZE
      8.10. STOP
      8.11. GET-RESULT
      8.12. START-OF-SPEECH
      8.13. RECOGNITION-START-TIMERS
      8.14. RECOGNITION-COMPLETE
      8.15. DTMF Detection
   9. Future Study
   10. Security Considerations
   11. RTSP-Based Examples
   12. Informative References
   Appendix A. ABNF Message Definitions
   Appendix B. Acknowledgements

1.  Introduction

   The Media Resource Control Protocol (MRCP) is designed to provide a
   mechanism for a client device requiring audio/video stream processing
   to control processing resources on the network.  These media
   processing resources may be speech recognizers (a.k.a. Automatic-
   Speech-Recognition (ASR) engines), speech synthesizers (a.k.a. Text-
   To-Speech (TTS) engines), fax, signal detectors, etc.  MRCP allows
   implementation of distributed Interactive Voice Response platforms,
   for example VoiceXML [6] interpreters.  The MRCP protocol defines the
   requests, responses, and events needed to control the media
   processing resources.  The MRCP protocol defines the state machine
   for each resource and the required state transitions for each request
   and server-generated event.

   The MRCP protocol does not address how the control session is
   established with the server and relies on the Real Time Streaming
   Protocol (RTSP) [2] to establish and maintain the session.  The
   session control protocol is also responsible for establishing the
   media connection from the client to the network server.  The MRCP
   protocol and its messaging are designed to be carried over RTSP or
   another protocol as a MIME-type similar to the Session Description
   Protocol (SDP) [5].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [8].

2.  Architecture

   The system consists of a client that requires media streams generated
   or needs media streams processed and a server that has the resources
   or devices to process or generate the streams.  The client
   establishes a control session with the server for media processing
   using a protocol such as RTSP.  This will also set up and establish
   the RTP stream between the client and the server or another RTP
   endpoint.  Each resource needed in processing or generating the
   stream is addressed or referred to by a URL.  The client can now use
   MRCP messages to control the media resources and affect how they
   process or generate the media stream.

     |--------------------|
     ||------------------||                   |----------------------|
     || Application Layer||                   ||--------------------||
     ||------------------||                   || TTS  | ASR  | Fax  ||
     ||  ASR/TTS API     ||                   ||Plugin|Plugin|Plugin||
     ||------------------||                   ||  on  |  on  |  on  ||
     ||    MRCP Core     ||                   || MRCP | MRCP | MRCP ||
     ||  Protocol Stack  ||                   ||--------------------||
     ||------------------||                   ||   RTSP Stack       ||
     ||   RTSP Stack     ||                   ||                    ||
     ||------------------||                   ||--------------------||
     ||   TCP/IP Stack   ||========IP=========||  TCP/IP Stack      ||
     ||------------------||                   ||--------------------||
     |--------------------|                   |----------------------|

        MRCP client                             Real-time Streaming MRCP
                                                 media server

2.1.  Resources and Services

   The server is set up to offer a certain set of resources and services
   to the client.  These resources are of three types.

   Transmission Resources

   These are resources that are capable of generating real-time streams,
   like signal generators that generate tones and sounds of certain
   frequencies and patterns, and speech synthesizers that generate
   spoken audio streams, etc.

   Reception Resources

   These are resources that receive and process streaming data like
   signal detectors and speech recognizers.

   Dual Mode Resources

   These are resources that both send and receive data like a fax
   resource, capable of sending or receiving fax through a two-way RTP
   stream.

2.2.  Server and Resource Addressing

   The server as a whole is addressed using a container URL, and the
   individual resources the server has to offer are reached by
   individual resource URLs within the container URL.

   RTSP Example:

   A media server or container URL like,

     rtsp://mediaserver.com/media/

   may contain one or more resource URLs of the form,

     rtsp://mediaserver.com/media/speechrecognizer/
     rtsp://mediaserver.com/media/speechsynthesizer/
     rtsp://mediaserver.com/media/fax/

3.  MRCP Protocol Basics

   The message format for MRCP is text based, with mechanisms to carry
   embedded binary data.  This allows data like recognition grammars,
   recognition results, synthesizer speech markup, etc., to be carried
   in the MRCP message between the client and the server resource.  The
   protocol does not address session control management, media
   management, reliable sequencing and delivery, or server or resource
   addressing.  These are left to a protocol like SIP or RTSP.  MRCP
   addresses the issue of controlling and communicating with the
   resource processing the stream, and defines the requests, responses,
   and events needed to do that.

3.1.  Establishing Control Session and Media Streams

   The control session between the client and the server is established
   using a protocol like RTSP.  This protocol will also set up the
   appropriate RTP streams between the server and the client, allocating
   ports and setting up transport parameters as needed.  Each control
   session is identified by a unique session-id.  The format, usage, and
   life cycle of the session-id are in accordance with the RTSP protocol.
   The resources within the session are addressed by the individual
   resource URLs.

   The MRCP protocol is designed to work with and tunnel through another
   protocol like RTSP, and augment its capabilities.  MRCP relies on
   RTSP headers for sequencing, reliability, and addressing to make sure
   that messages get delivered reliably and in the correct order and to
   the right resource.  The MRCP messages are carried in the RTSP
   message body.  The media server delivers the MRCP message to the
   appropriate resource or device by looking at the session-level
   message headers and URL information.  Another protocol, such as SIP
   [4], could be used for tunneling MRCP messages.

3.2.  MRCP over RTSP

   RTSP supports both TCP and UDP mechanisms for the client to talk to
   the server; the transport in use is distinguished by the RTSP URL.
   All MRCP based media servers MUST support TCP for transport and MAY
   support UDP.

   In RTSP, the ANNOUNCE method/response MUST be used to carry MRCP
   request/responses between the client and the server.  MRCP messages
   MUST NOT be communicated in the RTSP SETUP or TEARDOWN messages.

   Currently all RTSP messages are request/responses and there is no
   support for asynchronous events in RTSP.  This is because RTSP was
   designed to work over TCP or UDP and, hence, could not assume
   reliability in the underlying protocol.  Hence, when using MRCP over
   RTSP, an asynchronous event from the MRCP server is packaged in a
   server-initiated ANNOUNCE method/response communication.  A future
   RTSP extension to send asynchronous events from the server to the
   client would provide an alternate vehicle to carry such asynchronous
   MRCP events from the server.

   An RTSP session is created when an RTSP SETUP message is sent from
   the client to a server and is addressed to a server URL or any one of
   its resource URLs without specifying a session-id.  The server will
   establish a session context and will respond with a session-id to the
   client.  This sequence will also set up the RTP transport parameters
   between the client and the server, and then the server will be ready
   to receive or send media streams.  If the client wants to attach an
   additional resource to an existing session, the client should send
   that session's ID in the subsequent SETUP message.

   When a media server implementing MRCP over RTSP receives a PLAY,
   RECORD, or PAUSE RTSP method for an MRCP resource URL, it should
   respond with an RTSP 405 "Method Not Allowed" response.  For these
   resources, the only allowed RTSP methods are SETUP, TEARDOWN,
   DESCRIBE, and ANNOUNCE.

   Example 1:

   C->S:  ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
          CSeq:4
          Session:12345678
          Content-Type:application/mrcp
          Content-Length:223

          SPEAK 543257 MRCP/1.0
          Voice-gender:neutral
          Voice-category:teenager
          Prosody-volume:medium
          Content-Type:application/synthesis+ssml
          Content-Length:104

          <?xml version="1.0"?>
          <speak>
           <paragraph>
             <sentence>You have 4 new messages.</sentence>
             <sentence>The first is from <say-as
             type="name">Stephanie Williams</say-as>
             and arrived at <break/>
             <say-as type="time">3:45pm</say-as>.</sentence>

             <sentence>The subject is <prosody
             rate="-20%">ski trip</prosody></sentence>
           </paragraph>
          </speak>

   S->C:  RTSP/1.0 200 OK
          CSeq: 4
          Session:12345678
          RTP-Info:url=rtsp://media.server.com/media/synthesizer;
                    seq=9810092;rtptime=3450012
          Content-Type:application/mrcp
          Content-Length:52

          MRCP/1.0 543257 200 IN-PROGRESS

   S->C:  ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
          CSeq:6
          Session:12345678
          Content-Type:application/mrcp
          Content-Length:123

          SPEAK-COMPLETE 543257 COMPLETE MRCP/1.0

   C->S:  RTSP/1.0 200 OK
          CSeq:6

   For the sake of brevity, most examples from here on show only the
   MRCP messages and do not show the RTSP message and headers in which
   they are tunneled.  RTSP messages, such as responses, that do not
   carry an MRCP message are also left out.
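
   The tunneling shown in Example 1 can be sketched in Python as
   follows.  This is a minimal illustration, not part of the protocol
   definition: the function name wrap_in_announce and the sample STOP
   message are assumptions made for the example, and the Content-Length
   is computed from the encoded body rather than copied from Example 1.

```python
def wrap_in_announce(url: str, cseq: int, session: str, mrcp_message: str) -> bytes:
    """Tunnel one MRCP message in the body of an RTSP ANNOUNCE request."""
    body = mrcp_message.encode("utf-8")
    head = "\r\n".join([
        f"ANNOUNCE {url} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session}",
        "Content-Type: application/mrcp",
        # Content-Length counts the octets of the encoded body, not characters.
        f"Content-Length: {len(body)}",
    ])
    # An empty line separates the RTSP headers from the tunneled MRCP message.
    return (head + "\r\n\r\n").encode("ascii") + body


# Hypothetical MRCP request used only to exercise the wrapper.
mrcp = "STOP 543258 MRCP/1.0\r\nActive-Request-Id-List:543257\r\n\r\n"
packet = wrap_in_announce(
    "rtsp://media.server.com/media/synthesizer", 5, "12345678", mrcp
)
print(packet.decode("utf-8"))
```

   The RTSP layer carries the addressing (URL and Session) and the
   sequencing (CSeq), while the MRCP message itself travels unchanged
   as the body.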

3.3.  Media Streams and RTP Ports

   A single set of RTP/RTCP ports is negotiated and shared between the
   MRCP client and server when multiple media processing resources, such
   as automatic speech recognition (ASR) engines and text to speech
   (TTS) engines, are used for a single session.  The individual
   resource instances allocated on the server under a common session
   identifier will feed from/to that single RTP stream.

   The client can send multiple media streams towards the server,
   differentiated by using different synchronization source (SSRC)
   identifier values.  Similarly, the server can use multiple SSRC
   identifier values to differentiate media streams originating from
   the individual transmission resource URLs if more than one exists.
   The individual resources may, on the other
   hand, work together to send just one stream to the client.  This is
   up to the implementation of the media server.

4.  Notational Conventions

   Since many of the definitions and syntax are identical to HTTP/1.1,
   this specification only points to the section where they are defined
   rather than copying it.  For brevity, [HX.Y] refers to Section X.Y of
   the current HTTP/1.1 specification (RFC 2616 [1]).

   All the mechanisms specified in this document are described in both
   prose and an augmented Backus-Naur form (ABNF) similar to that used
   in [H2.1].  It is described in detail in RFC 4234 [3].

   The ABNF provided along with the descriptive text is informative in
   nature and may not be complete.  The complete message format in ABNF
   form is provided in Appendix A and is the normative format
   definition.

5.  MRCP Specification

   The MRCP PDU is textual using an ISO 10646 character set in the UTF-8
   encoding (RFC 3629 [12]) to allow many different languages to be
   represented.  However, to assist in compact representations, MRCP
   also allows other character sets such as ISO 8859-1 to be used when
   desired.  The MRCP protocol headers and field names use only the
   US-ASCII subset of UTF-8.  Internationalization only applies to
   certain fields like grammar, results, speech markup, etc., and not to
   MRCP as a whole.

   Lines are terminated by CRLF, but receivers SHOULD be prepared to
   also interpret CR and LF by themselves as line terminators.  Also,
   some parameters in the PDU may contain binary data or a record
   spanning multiple lines.  Such fields have a length value associated
   with the parameter, which indicates the number of octets immediately
   following the parameter.
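
   A tolerant receiver for the line-termination rule above might split
   header text as sketched below.  The helper name split_lines is an
   assumption for illustration.  It should be applied only to the
   start-line and header portion of a PDU, because a length-delimited
   body may contain arbitrary octets.

```python
import re

# CRLF is listed first so that it is consumed as one terminator
# rather than as a CR followed by an LF.
_LINE_END = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list:
    """Split header text on CRLF, bare CR, or bare LF."""
    return _LINE_END.split(text)


print(split_lines("SPEAK 543257 MRCP/1.0\r\nVoice-gender:neutral\nProsody-volume:medium"))
```

   A regular expression is used instead of str.splitlines() because
   the latter also splits on other characters, such as form feed.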

   The whole MRCP PDU is encoded in the body of the session level
   message as a MIME entity of type application/mrcp.  The individual
   MRCP messages do not have addressing information regarding which
   resource the request/response are to/from.  Instead, the MRCP message
   relies on the header of the session level message carrying it to
   deliver the request to the appropriate resource, or to figure out who
   the response or event is from.

   The MRCP message set consists of requests from the client to the
   server, responses from the server to the client and asynchronous
   events from the server to the client.  All these messages consist of
   a start-line, one or more header fields (also known as "headers"), an
   empty line (i.e., a line with nothing preceding the CRLF) indicating
   the end of the header fields, and an optional message body.

          generic-message =   start-line
                              message-header
                              CRLF
                              [ message-body ]

          message-body    =   *OCTET

          start-line      =   request-line / status-line / event-line

   The message-body contains resource-specific and message-specific data
   that needs to be carried between the client and server as a MIME
   entity.  The information contained here and the actual MIME-types
   used to carry the data are specified later when addressing the
   specific messages.

   If a message contains data in the message body, the header fields
   will contain content-headers indicating the MIME-type and encoding of
   the data in the message body.
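
   The generic-message layout can be illustrated with a small parser.
   The sketch below is a simplification under stated assumptions:
   lines end in CRLF only, header fields are neither folded nor
   repeated, and the name parse_message is hypothetical.

```python
def parse_message(raw: bytes):
    """Split a raw MRCP message into (start_line, headers, body)."""
    # The first empty line marks the end of the header fields.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line terminating the header fields")
    lines = head.decode("utf-8").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        # Field names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    # The body is only as long as Content-Length says; absent means no body.
    length = int(headers.get("content-length", 0))
    return start_line, headers, body[:length]


ssml = b"<speak>You have 4 new messages.</speak>"
raw = (
    b"SPEAK 543257 MRCP/1.0\r\n"
    b"Voice-gender:neutral\r\n"
    b"Content-Type:application/synthesis+ssml\r\n"
    b"Content-Length:" + str(len(ssml)).encode("ascii") + b"\r\n"
    b"\r\n" + ssml
)
start_line, headers, body = parse_message(raw)
print(start_line, headers, body)
```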

5.1.  Request

   An MRCP request consists of a Request line followed by zero or more
   parameters as part of the message headers and an optional message
   body containing data specific to the request message.

   The Request message from a client to the server includes, within the
   first line, the method to be applied, a method tag for that request,
   and the version of protocol in use.

     request-line   =    method-name SP request-id SP
                         mrcp-version CRLF

   The request-id field is a unique identifier created by the client and
   sent to the server.  The server resource should use this identifier
   in its response to this request.  If the request does not complete
   with the response, future asynchronous events associated with this
   request MUST carry the request-id.

     request-id    =    1*DIGIT

   The method-name field identifies the specific request that the client
   is making to the server.  Each resource supports a certain list of
   requests or methods that can be issued to it, and will be addressed
   in later sections.

     method-name    =    synthesizer-method
                    /    recognizer-method

   The mrcp-version field is the MRCP protocol version that is being
   used by the client.

     mrcp-version   =    "MRCP" "/" 1*DIGIT "." 1*DIGIT
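
   As an illustration of the request-line grammar, the following
   sketch parses a request line with a regular expression.  The
   pattern for method-name (upper-case letters and hyphens) is an
   assumption based on the method names used in this document; the
   names RequestLine and parse_request_line are hypothetical.

```python
import re
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    request_id: int
    version: str


# method-name SP request-id SP mrcp-version (the CRLF is stripped beforehand)
_REQUEST_LINE = re.compile(r"([A-Z][A-Z-]*) (\d+) (MRCP/\d+\.\d+)")


def parse_request_line(line: str) -> RequestLine:
    """Parse an MRCP request line that has had its CRLF removed."""
    match = _REQUEST_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"not a valid request line: {line!r}")
    method, request_id, version = match.groups()
    return RequestLine(method, int(request_id), version)


print(parse_request_line("SPEAK 543257 MRCP/1.0"))
```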

5.2.  Response

   After receiving and interpreting the request message, the server
   resource responds with an MRCP response message.  It consists of a
   status line optionally followed by a message body.

     response-line  =    mrcp-version SP request-id SP status-code SP
                         request-state CRLF

   The mrcp-version field used here is similar to the one used in the
   Request Line and indicates the version of MRCP protocol running on
   the server.

   The request-id used in the response MUST match the one sent in the
   corresponding request message.

   The status-code field is a 3-digit code representing the success or
   failure or other status of the request.

   The request-state field indicates if the job initiated by the Request
   is PENDING, IN-PROGRESS, or COMPLETE.  The COMPLETE status means that
   the Request was processed to completion and that there will be no
   more events from that resource to the client with that request-id.
   The PENDING status means that the job has been placed on a queue and
   will be processed in first-in-first-out order.  The IN-PROGRESS
   status means that the request is being processed and is not yet
   complete.  A PENDING or IN-PROGRESS status indicates that further
   Event messages will be delivered with that request-id.

     request-state    =  "COMPLETE"
                      /  "IN-PROGRESS"
                      /  "PENDING"
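
   A client can use the request-state to decide whether to keep
   waiting for events.  The sketch below parses a response line and
   applies that rule; the function names parse_response_line and
   expects_events are assumptions for illustration.

```python
import re

# mrcp-version SP request-id SP status-code SP request-state
_RESPONSE_LINE = re.compile(
    r"(MRCP/\d+\.\d+) (\d+) (\d{3}) (COMPLETE|IN-PROGRESS|PENDING)"
)


def parse_response_line(line: str) -> dict:
    """Parse an MRCP response line that has had its CRLF removed."""
    match = _RESPONSE_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"not a valid response line: {line!r}")
    version, request_id, status_code, state = match.groups()
    return {
        "version": version,
        "request_id": int(request_id),
        "status_code": int(status_code),
        "request_state": state,
    }


def expects_events(request_state: str) -> bool:
    """PENDING and IN-PROGRESS mean further events will follow."""
    return request_state in ("PENDING", "IN-PROGRESS")


response = parse_response_line("MRCP/1.0 543257 200 IN-PROGRESS")
print(response, expects_events(response["request_state"]))
```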

5.2.1.  Status Codes

   The status codes are classified under the Success (2xx) codes and the
   Failure (4xx) codes.

5.2.1.1.  Success 2xx

      200       Success
      201       Success with some optional parameters ignored.

5.2.1.2.  Failure 4xx

      401       Method not allowed
      402       Method not valid in this state
      403       Unsupported Parameter
      404       Illegal Value for Parameter
      405       Not found (e.g., Resource URI not initialized
                or doesn't exist)
      406       Mandatory Parameter Missing
      407       Method or Operation Failed (e.g., Grammar compilation
                failed in the recognizer.  Detailed cause codes MAY be
                available through a resource specific header field.)
      408       Unrecognized or unsupported message entity
      409       Unsupported Parameter Value
      421-499   Resource specific Failure codes
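
   The status codes listed above can be held in a simple lookup
   table.  The sketch below is illustrative only; the names
   STATUS_TEXT, describe_status, and is_success are assumptions, and
   codes outside the listed values are reported as unknown rather than
   given any invented meaning.

```python
STATUS_TEXT = {
    200: "Success",
    201: "Success with some optional parameters ignored",
    401: "Method not allowed",
    402: "Method not valid in this state",
    403: "Unsupported Parameter",
    404: "Illegal Value for Parameter",
    405: "Not found",
    406: "Mandatory Parameter Missing",
    407: "Method or Operation Failed",
    408: "Unrecognized or unsupported message entity",
    409: "Unsupported Parameter Value",
}


def describe_status(code: int) -> str:
    """Return the text for a status code from the tables above."""
    if code in STATUS_TEXT:
        return STATUS_TEXT[code]
    if 421 <= code <= 499:
        return "Resource specific failure code"
    return "Unknown status code"


def is_success(code: int) -> bool:
    """Codes in the 2xx class indicate success."""
    return 200 <= code <= 299


print(describe_status(407), is_success(407))
```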

5.3.  Event

   The server resource may need to communicate a change in state or the
   occurrence of a certain event to the client.  These messages are used
   when a request does not complete immediately and the response returns
   a status of PENDING or IN-PROGRESS.  The intermediate results and
   events of the request are indicated to the client through the event
   message from the server.  Each event carries the request-id of the
   in-progress request that generated it, together with a status value.
   The status value is COMPLETE if the request is done and this was the
   last event, else it is IN-PROGRESS.

     event-line       =  event-name SP request-id SP request-state SP
                         mrcp-version CRLF

   The mrcp-version used here is identical to the one used in the
   Request/Response Line and indicates the version of MRCP protocol
   running on the server.

   The request-id used in the event should match the one sent in the
   request that caused this event.

   The request-state indicates if the Request/Command causing this event
   is complete or still in progress, and is the same as the one
   mentioned in Section 5.2.  The final event will contain a COMPLETE
   status indicating the completion of the request.

   The event-name identifies the nature of the event generated by the
   media resource.  The set of valid event names is dependent on the
   resource generating it, and will be addressed in later sections.

     event-name       =  synthesizer-event
                      /  recognizer-event
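
   The event-line places the request-state before the version, unlike
   the response line.  A minimal sketch of an event-line parser
   follows.  The event-name pattern (upper-case letters and hyphens)
   and the name parse_event_line are assumptions for illustration.

```python
import re

# event-name SP request-id SP request-state SP mrcp-version
_EVENT_LINE = re.compile(
    r"([A-Z][A-Z-]*) (\d+) (COMPLETE|IN-PROGRESS|PENDING) (MRCP/\d+\.\d+)"
)


def parse_event_line(line: str) -> dict:
    """Parse an MRCP event line that has had its CRLF removed."""
    match = _EVENT_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"not a valid event line: {line!r}")
    name, request_id, state, version = match.groups()
    return {
        "event": name,
        "request_id": int(request_id),
        "request_state": state,
        "version": version,
        # A COMPLETE state marks the last event for this request-id.
        "final": state == "COMPLETE",
    }


print(parse_event_line("SPEAK-COMPLETE 543257 COMPLETE MRCP/1.0"))
```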

5.4.  Message Headers

   MRCP header fields, which include general-header (Section 5.4) and
   resource-specific-header (Sections 7.4 and 8.4), follow the same
   generic format as that given in Section 2.1 of RFC 2822 [7].  Each
   header field consists of a name followed by a colon (":") and the
   field value.  Field names are case-insensitive.  The field value MAY
   be preceded by any amount of linear whitespace (LWS), though a single
   SP is preferred.  Header fields can be extended over multiple lines
   by preceding each extra line with at least one SP or HT.

          message-header =    1*(generic-header / resource-header)
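
   The folding rule described above (continuation lines begin with SP
   or HT) can be undone before the fields are interpreted.  The sketch
   below replaces each fold with a single SP, which is the HTTP/1.1
   convention and an assumption here; the name unfold_headers is
   hypothetical.

```python
def unfold_headers(lines):
    """Join continuation lines (starting with SP or HT) onto the previous field."""
    unfolded = []
    for line in lines:
        if line[:1] in (" ", "\t") and unfolded:
            # A continuation line extends the previous header field.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)
    return unfolded


print(unfold_headers([
    "Active-Request-Id-List:543257,",
    " 543258",
    "Logging-Tag:abc",
]))
```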

   The order in which header fields with differing field names are
   received is not significant.  However, it is "good practice" to send
   general-header fields first, followed by request-header or response-
   header fields, and ending with the entity-header fields.

   Multiple message-header fields with the same field-name MAY be
   present in a message if and only if the entire field value for that
   header field is defined as a comma-separated list (i.e., #(values)).

   It MUST be possible to combine the multiple header fields into one
   "field-name:field-value" pair, without changing the semantics of the
   message, by appending each subsequent field-value to the first, each
   separated by a comma.  Therefore, the order in which header fields
   with the same field-name are received is significant to the
   interpretation of the combined field value, and thus a proxy MUST NOT
   change the order of these field values when a message is forwarded.
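
   The combination rule for repeated header fields can be sketched as
   follows.  The function name combine_headers is an assumption, and
   the sketch does not check whether a given field is actually defined
   as a comma-separated list; a real implementation would need to.

```python
def combine_headers(pairs):
    """Merge repeated header fields into one comma-separated value each.

    `pairs` is a sequence of (field-name, field-value) tuples in the
    order they were received.  That order is preserved in the result.
    """
    combined = {}
    for name, value in pairs:
        key = name.lower()  # field names are case-insensitive
        if key in combined:
            combined[key] = combined[key] + "," + value
        else:
            combined[key] = value
    return combined


print(combine_headers([
    ("Active-Request-Id-List", "543257"),
    ("Logging-Tag", "abc"),
    ("active-request-id-list", "543258"),
]))
```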

   Generic Headers

     generic-header      =    active-request-id-list
                         /    proxy-sync-id
                         /    content-id
                         /    content-type
                         /    content-length
                         /    content-base
                         /    content-location
                         /    content-encoding
                         /    cache-control
                         /    logging-tag

   All headers in MRCP will be case insensitive, consistent with HTTP
   and RTSP protocol header definitions.

5.4.1.  Active-Request-Id-List

   In a request, this field indicates the list of request-ids to which
   it should apply.  This is useful when there are multiple Requests
   that are PENDING or IN-PROGRESS and you want this request to apply to
   one or more of these specifically.

   In a response, this field returns the list of request-ids that the
   operation modified, or that were in progress or had just completed.
   There
   could be one or more requests that returned a request-state of
   PENDING or IN-PROGRESS.  When a method affecting one or more PENDING
   or IN-PROGRESS requests is sent from the client to the server, the
   response MUST contain the list of request-ids that were affected in
   this header field.

   The active-request-id-list is only used in requests and responses,
   not in events.

   For example, if a STOP request with no active-request-id-list is sent
   to a synthesizer resource (a wildcard STOP) that has one or more
   SPEAK requests in the PENDING or IN-PROGRESS state, all SPEAK
   requests MUST be cancelled, including the one IN-PROGRESS.  In
   addition, the response to the STOP request would contain the
   request-id of all the SPEAK requests that were terminated in the
   active-request-id-list.  In this case, no SPEAK-COMPLETE or
   RECOGNITION-COMPLETE events will be sent for these terminated
   requests.

     active-request-id-list  =  "Active-Request-Id-List" ":" request-id
                                 *("," request-id) CRLF
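
   The wildcard STOP behavior described above can be modelled with a
   small sketch.  The queue representation (a dict mapping request-id
   to request-state) and the function names are assumptions made for
   illustration and say nothing about how a real media server stores
   its requests.

```python
def parse_active_request_id_list(value: str) -> list:
    """Parse the field value of Active-Request-Id-List into integers."""
    return [int(item) for item in value.split(",")]


def format_active_request_id_list(request_ids) -> str:
    """Build the header line (without CRLF) for the given request-ids."""
    return "Active-Request-Id-List:" + ",".join(str(rid) for rid in request_ids)


def stop(queue: dict, targets=None) -> list:
    """Cancel active requests and return the request-ids that were affected.

    With targets=None this models a wildcard STOP: every PENDING or
    IN-PROGRESS request is cancelled.
    """
    active = [rid for rid, state in queue.items()
              if state in ("PENDING", "IN-PROGRESS")]
    affected = active if targets is None else [r for r in active if r in targets]
    for rid in affected:
        del queue[rid]
    return affected


queue = {543257: "IN-PROGRESS", 543258: "PENDING", 543259: "PENDING"}
affected = stop(queue)
print(format_active_request_id_list(affected))
```

   The returned list is what the response to the STOP request would
   carry in its Active-Request-Id-List header field.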

5.4.2.  Proxy-Sync-Id

   When any server resource generates a barge-in-able event, it will
   generate a unique Tag and send it as a header field in an event to
   the client.  The client then acts as a proxy to the server resource
   and sends a BARGE-IN-OCCURRED method (Section 7.10) to the
   synthesizer server resource with the Proxy-Sync-Id it received from
   the server resource.  When the recognizer and synthesizer resources
   are part of the same session, they may choose to work together to
   achieve quicker interaction and response.  Here, the proxy-sync-id
   helps the resource receiving the event, proxied by the client, to
   decide if this event has been processed through a direct interaction
   of the resources.

     proxy-sync-id    =  "Proxy-Sync-Id" ":" 1*ALPHA CRLF

5.4.3.  Accept-Charset

   See [H14.2].  This specifies the acceptable character set for
   entities returned in the response or events associated with this
   request.  This is useful in specifying the character set to use in
   the Natural Language Semantics Markup Language (NLSML) results of a
   RECOGNITION-COMPLETE event.

5.4.4.  Content-Type

   See [H14.17].  Note that the content types suitable for MRCP are
   restricted to speech markup, grammar, recognition results, etc., and
   are specified later in this document.  The multipart content type
   "multipart/mixed" is supported to communicate several of the
   above-mentioned content types in one message, in which case the body
   parts cannot contain any MRCP-specific headers.

5.4.5.  Content-Id

   This field contains an ID or name for the content, by which it can be
   referred to.  The definition of this field conforms to RFC 2392 [14],
   RFC 2822 [7], and RFC 2046 [13], and is needed in multi-part
   messages.  In MRCP, whenever the content needs to be stored, by
   either the client or the server, it is stored in association with
   this ID.  Such content can
   be referenced during the session in URI form using the session:URI
   scheme described in a later section.

5.4.6.  Content-Base

   The content-base entity-header field may be used to specify the base
   URI for resolving relative URLs within the entity.

     content-base      = "Content-Base" ":" absoluteURI CRLF

   Note, however, that the base URI of the contents within the entity-
   body may be redefined within that entity-body.  An example of this
   would be a multi-part MIME entity, which in turn can have multiple
   entities within it.

5.4.7.  Content-Encoding

   The content-encoding entity-header field is used as a modifier to the
   media-type.  When present, its value indicates what additional
   content coding has been applied to the entity-body, and thus what
   decoding mechanisms must be applied in order to obtain the media-type
   referenced by the content-type header field.  Content-encoding is
   primarily used to allow a document to be compressed without losing
   the identity of its underlying media type.

          content-encoding =  "Content-Encoding" ":"
                              *WSP content-coding
                              *(*WSP "," *WSP content-coding *WSP )
                              CRLF

          content-coding   =  token

          token            =  1*(alphanum / "-" / "." / "!" / "%" / "*"
                              / "_" / "+" / "`" / "'" / "~" )

   Content coding is defined in [H3.5].  An example of its use is

     Content-Encoding:gzip

   If multiple encodings have been applied to an entity, the content
   codings MUST be listed in the order in which they were applied.
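   Because codings are listed in the order applied, a receiver undoes
   them in reverse order.  The sketch below illustrates this with
   Python's standard gzip and zlib modules.  The function names and the
   small set of supported codings are assumptions made for the example,
   not requirements of this document.

```python
import gzip
import zlib
from typing import List

# Decoders for a few HTTP content codings; "deflate" is the zlib format.
DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
    "identity": lambda data: data,
}


def parse_content_encoding(value: str) -> List[str]:
    """Split a Content-Encoding value into codings, in the order applied."""
    return [c.strip().lower() for c in value.split(",") if c.strip()]


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo the listed codings, last applied first."""
    for coding in reversed(parse_content_encoding(content_encoding)):
        if coding not in DECODERS:
            raise ValueError(f"unsupported content-coding: {coding!r}")
        body = DECODERS[coding](body)
    return body
```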

5.4.8.  Content-Location

   The content-location entity-header field MAY be used to supply the
   resource location for the entity enclosed in the message when that
   entity is accessible from a location separate from the requested
   resource's URI.

     content-location =  "Content-Location" ":" ( absoluteURI /
                             relativeURI ) CRLF

   The content-location value is a statement of the location of the
   resource corresponding to this particular entity at the time of the
   request.  The media server MAY use this header field to optimize
   certain operations.  When providing this header field, the entity
   being sent should not have been modified from what was retrieved from
   the content-location URI.

   For example, if the client provided a grammar markup inline, and it
   had previously retrieved it from a certain URI, that URI can be
   provided as part of the entity, using the content-location header
   field.  This allows a resource like the recognizer to look into its
   cache to see if this grammar was previously retrieved, compiled, and
   cached, in which case it might optimize by using the previously
   compiled grammar object.

   If the content-location is a relative URI, the relative URI is
   interpreted relative to the content-base URI.
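   A minimal sketch of this resolution, using the standard
   urllib.parse.urljoin function, follows.  The helper name and the
   example.com URIs are illustrative assumptions.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_content_location(content_location: str,
                             content_base: Optional[str] = None) -> str:
    """Resolve a Content-Location value against an optional Content-Base."""
    if content_base is None:
        return content_location
    # urljoin returns an absolute reference unchanged and resolves a
    # relative reference against the base URI.
    return urljoin(content_base, content_location)
```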

5.4.9.  Content-Length

   This field contains the length of the content of the message body
   (i.e., after the double CRLF following the last header field).
   Unlike HTTP, it MUST be included in all messages that carry content
   beyond the header portion of the message.  If it is missing, a
   default value of zero is assumed.  It is interpreted according to
   [H14.13].
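   The following sketch shows one way a receiver might apply this rule
   when separating a message body from a byte stream.  The function
   name is hypothetical, and the sketch assumes the whole header block
   is already buffered.

```python
from typing import Tuple


def split_message(raw: bytes) -> Tuple[str, bytes, bytes]:
    """Return (start line, body, remaining bytes) for one buffered message."""
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header block is not terminated by a double CRLF")
    lines = head.split(b"\r\n")
    start_line = lines[0].decode("ascii")
    length = 0  # default when Content-Length is missing
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip().decode("ascii"))
    if len(rest) < length:
        raise ValueError("body is shorter than Content-Length")
    return start_line, rest[:length], rest[length:]
```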

5.4.10.  Cache-Control

   If the media server plans on implementing caching, it MUST adhere to
   the cache correctness rules of HTTP 1.1 (RFC2616) when accessing and
   caching HTTP URIs.  In particular, the expires and cache-control
   headers of the cached URI or document must be honored and will always
   take precedence over the Cache-Control defaults set by this header
   field.  The cache-control directives are used to define the default
   caching algorithms on the media server for the session or request.
   The scope of the directives is based on the method they are sent on.
   If the directives are sent on a SET-PARAMS method, they SHOULD apply
   to all requests for documents the media server may make in that
   session.  If the directives are sent on any other message, they MUST
   apply only to document requests the media server needs to make for
   that method.  An empty cache-control header on the GET-PARAMS method
   is a request for the media server to return the current cache-control
   directive settings on the server.

          cache-control  =    "Cache-Control" ":" *WSP cache-directive
                              *( *WSP "," *WSP cache-directive *WSP )
                              CRLF

          cache-directive =   "max-age" "=" delta-seconds
                          /   "max-stale" "=" delta-seconds
                          /   "min-fresh" "=" delta-seconds

          delta-seconds       = 1*DIGIT

   Here, delta-seconds is a time value to be specified as an integer
   number of seconds, represented in decimal, after the time that the
   message response or data was received by the media server.

   These directives allow the media server to override the basic
   expiration mechanism.

   max-age

      Indicates that the client is OK with the media server using a
      response whose age is no greater than the specified time in
      seconds.  Unless a max-stale directive is also included, the
      client is not willing to accept the media server using a stale
      response.

   min-fresh

      Indicates that the client is willing to accept the media server
      using a response whose freshness lifetime is no less than its
      current age plus the specified time in seconds.  That is, the
      client wants the media server to use a response that will still be
      fresh for at least the specified number of seconds.

   max-stale

      Indicates that the client is willing to accept the media server
      using a response that has exceeded its expiration time.  If max-
      stale is assigned a value, then the client is willing to accept
      the media server using a response that has exceeded its expiration
      time by no more than the specified number of seconds.  If no value
      is assigned to max-stale, then the client is willing to accept the
      media server using a stale response of any age.

   The media server cache MAY be requested to use stale response/data
   without validation, but only if this does not conflict with any
   "MUST"-level requirements concerning cache validation (e.g., a
   "must-revalidate" cache-control directive) in the HTTP 1.1
   specification pertaining to the URI.

   If both the MRCP cache-control directive and the cached entry on the
   media server include "max-age" directives, then the lesser of the two
   values is used for determining the freshness of the cached entry for
   that request.
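   One possible reading of these rules is sketched below.  It is not a
   complete HTTP cache.  The function names are hypothetical, and
   freshness_lifetime stands for whatever lifetime the cached entry's
   own HTTP headers imply.  The parser follows the grammar above, which
   requires a value for every directive.

```python
from typing import Dict

ALLOWED = ("max-age", "max-stale", "min-fresh")


def parse_cache_control(value: str) -> Dict[str, int]:
    """Parse a Cache-Control value into {directive: delta-seconds}."""
    directives: Dict[str, int] = {}
    if not value.strip():
        return directives  # empty header, e.g. on GET-PARAMS
    for item in value.split(","):
        name, sep, seconds = item.strip().partition("=")
        name = name.strip().lower()
        seconds = seconds.strip()
        valid = seconds.isascii() and seconds.isdigit()
        if name not in ALLOWED or not sep or not valid:
            raise ValueError(f"bad cache-directive: {item.strip()!r}")
        directives[name] = int(seconds)
    return directives


def is_usable(age: int, freshness_lifetime: int,
              directives: Dict[str, int]) -> bool:
    """Decide whether a cached entry may be used without revalidation."""
    lifetime = freshness_lifetime
    if "max-age" in directives:
        # The lesser of the two max-age values determines freshness.
        lifetime = min(lifetime, directives["max-age"])
    min_fresh = directives.get("min-fresh", 0)
    max_stale = directives.get("max-stale", 0)
    return age + min_fresh <= lifetime + max_stale
```

   With a cached lifetime of 100 seconds and "max-age=60", an entry
   aged 70 seconds is rejected, but it is accepted once "max-stale=10"
   is added.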

5.4.11.  Logging-Tag

   This header field MAY be sent as part of a SET-PARAMS/GET-PARAMS
   method to set the logging tag for logs generated by the media server.
   Once set, the value persists until a new value is set or the session
   is ended.  The MRCP server should provide a mechanism to subset its
   output logs so that system administrators can examine or extract only
   the log file portion during which the logging tag was set to a
   certain value.

   MRCP clients using this feature should take care to ensure that no
   two clients specify the same logging tag.  In the event that two
   clients specify the same logging tag, the effect on the MRCP server's
   output logs is undefined.

     logging-tag    =    "Logging-Tag" ":" 1*ALPHA CRLF

6.  Media Server

   The capability of media server resources can be found using the RTSP
   DESCRIBE mechanism.  When a client issues an RTSP DESCRIBE method for
   a media resource URI, the media server response MUST contain an SDP
   description in its body describing the capabilities of the media
   server resource.  The SDP description MUST contain at a minimum the
   media header (m-line) describing the codec and other media related
   features it supports.  It MAY contain other SDP headers as well, but
   support for them is optional.

   The usage of SDP messages in the RTSP message body and its
   application follows the SIP RFC 2543 [4], but is limited to media-
   related negotiation and description.

6.1.  Media Server Session

   As discussed in Section 3.2, a client/server should share one RTSP
   session-id for the different resources it may use under the same
   session.  The client MUST allocate a set of client RTP/RTCP ports for
   a new session and MUST NOT send a Session-ID in the SETUP message for
   the first resource.  The server then creates a Session-ID and
   allocates a set of server RTP/RTCP ports and responds to the SETUP
   message.

   If the client wants to open more resources with the same server under
   the same session, it will send the session-id (that it got in the
   earlier SETUP response) in the SETUP for the new resource.  A SETUP
   message with an existing session-id tells the server that this new
   resource will feed from/into the same RTP/RTCP stream of that
   existing session.

   If the client wants to open a resource from a media server that is
   not where the first resource came from, it will send separate SETUP
   requests with no session-id header field in them.  Each server will
   allocate its own session-id and return it in the response.  Each of
   them will also come back with their own set of RTP/RTCP ports.  This
   would be the case when the synthesizer engine and the recognition
   engine are on different servers.
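   The session-id handling described above could be tracked on the
   client roughly as follows.  The class is a hypothetical sketch.  It
   keys sessions by the host part of the resource URI and assumes the
   client uses consecutive RTP/RTCP ports.  The SDP body is omitted.

```python
from urllib.parse import urlsplit


class SetupBuilder:
    """Hypothetical client helper that builds RTSP SETUP requests."""

    def __init__(self) -> None:
        self._sessions = {}  # server host -> session-id from SETUP response
        self._cseq = 0

    def build_setup(self, resource_uri: str, client_rtp_port: int) -> str:
        host = urlsplit(resource_uri).netloc
        self._cseq += 1
        lines = [f"SETUP {resource_uri} RTSP/1.0", f"CSeq:{self._cseq}"]
        session_id = self._sessions.get(host)
        if session_id is not None:
            # Reuse the session so the new resource shares the RTP stream.
            lines.append(f"Session:{session_id}")
        lines.append("Transport:RTP/AVP;unicast;client_port="
                     f"{client_rtp_port}-{client_rtp_port + 1}")
        return "\r\n".join(lines) + "\r\n\r\n"

    def record_session(self, resource_uri: str, session_id: str) -> None:
        """Store the session-id the server returned for this host."""
        self._sessions[urlsplit(resource_uri).netloc] = session_id
```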

   The RTSP SETUP method SHOULD contain an SDP description of the media
   stream being set up.  The RTSP SETUP response MUST contain an SDP
   description of the media stream that it expects to receive and send
   on that session.

   The SDP description in the SETUP method from the client SHOULD
   describe the required media parameters like codec, Named Signaling
   Event (NSE) payload types, etc.  This could have multiple media
   headers (i.e., m-lines) to allow the client to provide the media
   server with more than one option to choose from.

   The SDP description in the SETUP response should reflect the media
   parameters that the media server will be using for the stream.  It
   should be within the choices that were specified in the SDP of the
   SETUP method, if one was provided.

   Example:

     C->S:

       SETUP rtsp://media.server.com/recognizer/ RTSP/1.0
       CSeq:1
       Transport:RTP/AVP;unicast;client_port=46456-46457
       Content-Type:application/sdp
       Content-Length:190

       v=0
       o=- 123 456 IN IP4 10.0.0.1
       s=Media Server
       p=+1-888-555-1212
       c=IN IP4 0.0.0.0
       t=0 0
       m=audio 46456 RTP/AVP 0 96
       a=rtpmap:0 pcmu/8000
       a=rtpmap:96 telephone-event/8000
       a=fmtp:96 0-15

     S->C:

       RTSP/1.0 200 OK
       CSeq:1
       Session:0a030258_00003815_3bc4873a_0001_0000
       Transport:RTP/AVP;unicast;client_port=46456-46457;
                  server_port=46460-46461
       Content-Length:190
       Content-Type:application/sdp

       v=0
       o=- 3211724219 3211724219 IN IP4 10.3.2.88
       s=Media Server
       c=IN IP4 0.0.0.0
       t=0 0
       m=audio 46460 RTP/AVP 0 96
       a=rtpmap:0 pcmu/8000
       a=rtpmap:96 telephone-event/8000
       a=fmtp:96 0-15
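   The rule that the response stays within the choices offered in the
   SETUP request can be sketched as below.  The function names and the
   set of server-supported encodings are assumptions for illustration.
   The sketch reads only the m=audio and a=rtpmap lines and does not
   apply the static payload type table.

```python
from typing import Dict, List, Set, Tuple


def parse_offer(sdp: str) -> Tuple[List[int], Dict[int, str]]:
    """Collect offered payload types and their rtpmap encodings."""
    payloads: List[int] = []
    rtpmap: Dict[int, str] = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio "):
            # m=<media> <port> <proto> <fmt> ...
            payloads.extend(int(f) for f in line.split()[3:])
        elif line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = encoding.strip().lower()
    return payloads, rtpmap


def choose_payloads(offer_sdp: str, supported: Set[str]) -> List[int]:
    """Keep only the offered payload types the server supports."""
    payloads, rtpmap = parse_offer(offer_sdp)
    return [pt for pt in payloads if rtpmap.get(pt) in supported]
```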

   If an SDP description was not provided in the RTSP SETUP method, then
   the media server may decide on parameters of the stream but MUST
   specify what it chooses in the SETUP response.  An SDP announcement
   is only returned in a response to a SETUP message that does not
   specify a Session.  That is, the server will not return an SDP
   announcement for the synthesizer SETUP of a session already
   established with a recognizer.

     C->S:

       SETUP rtsp://media.server.com/recognizer/ RTSP/1.0
       CSeq:1
       Transport:RTP/AVP;unicast;client_port=46498

     S->C:

       RTSP/1.0 200 OK
       CSeq:1
       Session:0a030258_000039dc_3bc48a13_0001_0000
       Transport:RTP/AVP;unicast;client_port=46498;
                  server_port=46502-46503
       Content-Length:193
       Content-Type:application/sdp

       v=0
       o=- 3211724947 3211724947 IN IP4 10.3.2.88
       s=Media Server
       c=IN IP4 0.0.0.0
       t=0 0
       m=audio 46502 RTP/AVP 0 101
       a=rtpmap:0 pcmu/8000
       a=rtpmap:101 telephone-event/8000
       a=fmtp:101 0-15


