RFC 4695

RTP Payload Format for MIDI

Pages: 169
Obsoleted by: 6295

Part 6 of 7 – Pages 121 to 156

C.5.  Configuration Tools: Stream Description

   As we discussed in Section 2.1, a party may send several RTP MIDI
   streams in the same RTP session, and several RTP sessions that carry
   MIDI may appear in a multimedia session.

   By default, the MIDI name space (16 channels + systems) of each RTP
   stream sent by a party in a multimedia session is independent.  By
   independent, we mean three distinct things:

     o  If a party sends two RTP MIDI streams (A and B), MIDI voice
        channel 0 in stream A is a different "channel 0" than MIDI voice
        channel 0 in stream B.

     o  MIDI voice channel 0 in stream B is not considered to be
        "channel 16" of a 32-channel MIDI voice channel space whose
        "channel 0" is channel 0 of stream A.

     o  Streams sent by different parties over different RTP sessions,
        or over the same RTP session but with different payload type
        numbers, do not share the association that is shared by a MIDI
        cable pair that cross-connects two devices in a MIDI 1.0 DIN
        network.  By default, this association is only held by streams
        sent by different parties in the same RTP session that use the
        same payload type number.

   In this appendix, we show how to express that specific RTP MIDI
   streams in a multimedia session are not independent but instead are
   related in one of the three ways defined above.  We use two tools to
   express these relations:

     o  The musicport parameter.  This parameter is assigned a non-
        negative integer value between 0 and 4294967295.  It appears in
        the fmtp lines of payload types.

     o  The FID grouping attribute [RFC3388] signals that several RTP
        sessions in a multimedia session are using the musicport
        parameter to express an inter-session relationship.

   If a multimedia session has several payload types whose musicport
   parameters are assigned the same integer value, streams using these
   payload types share an "identity relationship" (including streams
   that use the same payload type).  Streams in an identity relationship
   share two properties:

     o  Identity relationship streams sent by the same party target the
        same MIDI name space.  Thus, if streams A and B share an
        identity relationship, voice channel 0 in stream A is the same
        "channel 0" as voice channel 0 in stream B.

     o  Pairs of identity relationship streams that are sent by
        different parties share the association that is shared by a MIDI
        cable pair that cross-connects two devices in a MIDI 1.0 DIN
        network.

   A party MUST NOT send two RTP MIDI streams that share an identity
   relationship in the same RTP session.  Instead, each stream MUST be
   in a separate RTP session.  As explained in Section 2.1, this
   restriction is necessary to support the RTP MIDI method for the
   synchronization of streams that share a MIDI name space.

   If a multimedia session has several payload types whose musicport
   parameters are assigned sequential values (i.e., i, i+1, ... i+k),
   the streams using the payload types share an "ordered relationship".
   For example, if payload type A assigns 2 to musicport and payload
   type B assigns 3 to musicport, A and B are in an ordered
   relationship.

   Streams in an ordered relationship that are sent by the same party
   are considered by renderers to form a single larger MIDI space.  For
   example, if stream A has a musicport value of 2 and stream B has a
   musicport value of 3, MIDI voice channel 0 in stream B is considered
   to be voice channel 16 in the larger MIDI space formed by the
   relationship.  Note that it is possible for streams to participate in
   both an identity relationship and an ordered relationship.
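   As an illustration only, the Python sketch below groups musicport
   values into ordered relationships and maps a voice channel into the
   larger MIDI space that an ordered relationship forms.  The function
   names are hypothetical, and the sketch assumes the musicport values
   have already been parsed from the fmtp lines.

```python
from typing import Iterable, List


def ordered_runs(musicports: Iterable[int]) -> List[List[int]]:
    """Group distinct musicport values into maximal runs i, i+1, ..., i+k.

    Equal values collapse into one entry: payload types that share a
    value are in an identity relationship, not a longer ordered run.
    """
    runs: List[List[int]] = []
    for value in sorted(set(musicports)):
        if runs and value == runs[-1][-1] + 1:
            runs[-1].append(value)
        else:
            runs.append([value])
    return runs


def merged_channel(musicport: int, run: List[int], channel: int) -> int:
    """Map voice channel 0-15 of a stream onto the larger MIDI space
    formed by the ordered relationship `run` (as returned above)."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI voice channels are numbered 0-15")
    if musicport not in run:
        raise ValueError("musicport is not part of this relationship")
    return (musicport - run[0]) * 16 + channel


# The example from the text: stream A (musicport=2), stream B (musicport=3).
run = ordered_runs([2, 3])[0]
print(merged_channel(3, run, 0))  # voice channel 0 of stream B -> 16
```

   Deduplicating the values first keeps an identity relationship (two
   payload types with the same value) from being mistaken for a longer
   ordered run.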

   We now state several rules for using musicport:

     o  If streams from several RTP sessions in a multimedia session use
        the musicport parameter, the RTP sessions MUST be grouped using
        the FID grouping attribute defined in [RFC3388].

     o  An ordered or identity relationship MUST NOT contain both native
        RTP MIDI streams and mpeg4-generic RTP MIDI streams.  An
        exception applies if a relationship consists of sendonly and
        recvonly (but not sendrecv) streams.  In this case, the sendonly
        streams MUST NOT contain both types of streams, and the recvonly
        streams MUST NOT contain both types of streams.

     o  It is possible to construct identity relationships that violate
        the recovery journal mandate (for example, sending NoteOns for a
        voice channel on stream A and NoteOffs for the same voice
        channel on stream B).  Parties MUST NOT generate (or accept)
        session descriptions that exhibit this flaw.

     o  Other payload formats MAY define musicport media type
        parameters.  Formats would define these parameters so that their
        sessions could be bundled into RTP MIDI name spaces.  The
        parameter definitions MUST be compatible with the musicport
        semantics defined in this appendix.
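   A receiver could check the stream-type rule above with a sketch like
   the one below.  Each stream is modeled as a hypothetical (kind,
   direction) pair, where kind is "native" or "mpeg4-generic" and
   direction is the stream's sendrecv, sendonly, or recvonly attribute;
   this representation is an assumption made for illustration.

```python
from typing import Iterable, List, Tuple

Stream = Tuple[str, str]  # hypothetical model: (kind, direction)


def _single_kind(streams: List[Stream]) -> bool:
    """True if the streams do not mix native and mpeg4-generic kinds."""
    return len({kind for kind, _ in streams}) <= 1


def stream_kinds_ok(relationship: Iterable[Stream]) -> bool:
    """Check the native/mpeg4-generic mixing rule for one relationship."""
    streams = list(relationship)
    directions = {direction for _, direction in streams}
    if directions <= {"sendonly", "recvonly"}:
        # Exception: with no sendrecv streams, the sendonly and recvonly
        # streams are checked separately.
        sendonly = [s for s in streams if s[1] == "sendonly"]
        recvonly = [s for s in streams if s[1] == "recvonly"]
        return _single_kind(sendonly) and _single_kind(recvonly)
    return _single_kind(streams)
```

   For example, a native sendonly stream paired with an mpeg4-generic
   recvonly stream passes the check, while two sendrecv streams of
   different kinds do not.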

   As a rule, at most one payload type in a relationship may specify a
   MIDI renderer.  An exception to the rule applies to relationships
   that contain sendonly and recvonly streams but no sendrecv streams.
   In this case, one sendonly session and one recvonly session may each
   define a renderer.

   Renderer specification in a relationship may be done using the tools
   described in Appendix C.6.  These tools work for both native streams
   and mpeg4-generic streams.  An mpeg4-generic stream that uses the
   Appendix C.6 tools MUST set all "config" parameters to the empty
   string ("").

   Alternatively, for mpeg4-generic streams, renderer specification may
   be done by setting one "config" parameter in the relationship to the
   renderer configuration string, and all other config parameters to the
   empty string ("").

   We now define sender and receiver rules that apply when a party sends
   several streams that target the same MIDI name space.

   Senders MAY use the subsetting parameters (Appendix C.1) to predefine
   the partitioning of commands between streams, or they MAY use a
   dynamic partitioning strategy.

   Receivers that merge identity relationship streams into a single MIDI
   command stream MUST maintain the structural integrity of the MIDI
   commands coded in each stream during the merging process, in the same
   way that software that merges traditional MIDI 1.0 DIN cable flows is
   responsible for creating a merged command flow compatible with
   [MIDI].

   Senders MUST partition the name space so that the rendered MIDI
   performance does not contain indefinite artifacts (as defined in
   Section 4).  This responsibility holds even if all streams are sent
   over reliable transport, as different stream latencies may yield
   indefinite artifacts.  For example, stuck notes may occur in a
   performance split over two TCP streams, if NoteOn commands are sent
   on one stream and NoteOff commands are sent on the other.

   Senders MUST NOT split a Registered Parameter Name (RPN) or Non-
   Registered Parameter Name (NRPN) transaction appearing on a MIDI
   channel across multiple identity relationship sessions.  Receivers
   MUST assume that the RPN/NRPN transactions that appear on different
   identity relationship sessions are independent and MUST preserve
   transactional integrity during the MIDI merge.

   A simple way to safely partition voice channel commands is to place
   all MIDI commands for a particular voice channel into the same
   session.  Safe partitioning of MIDI Systems commands may be more
   complicated for sessions that extensively use System Exclusive.
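   The channel-based strategy can be sketched as a static routing
   table.  The table, the function names, and the choice to send every
   System command on a single stream are assumptions for illustration;
   a sender using dynamic partitioning would also need to move a channel
   only when no notes or RPN/NRPN transactions are pending on it, which
   this sketch does not model.  Because every command for a channel,
   including the Control Change commands that make up an RPN or NRPN
   transaction, takes the same route, NoteOn/NoteOff pairs and parameter
   transactions are never split across streams.

```python
from typing import Callable, Dict, Optional


def voice_channel(status: int) -> Optional[int]:
    """Return the voice channel (0-15) of a MIDI status byte, or None
    for System commands (status 0xF0-0xFF)."""
    if not 0x80 <= status <= 0xFF:
        raise ValueError("not a MIDI status byte")
    return None if status >= 0xF0 else status & 0x0F


def make_router(channel_to_stream: Dict[int, int],
                systems_stream: int) -> Callable[[int], int]:
    """Build a function that picks the stream index for a command."""
    if set(channel_to_stream) != set(range(16)):
        raise ValueError("every voice channel needs exactly one stream")

    def route(status: int) -> int:
        channel = voice_channel(status)
        if channel is None:
            return systems_stream
        return channel_to_stream[channel]

    return route


# Channels 0-7 on stream 0; channels 8-15 and all System commands on 1.
route = make_router({ch: 0 if ch < 8 else 1 for ch in range(16)}, 1)
```

   With this table, a NoteOn (0x90) and NoteOff (0x80) on channel 0 both
   go to stream 0, and a SysEx start byte (0xF0) goes to stream 1.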

   We now show several session description examples that use the
   musicport parameter.

   Our first session description example shows two RTP MIDI streams that
   drive the same General MIDI decoder.  The sender partitions MIDI
   commands between the streams dynamically.  The musicport values
   indicate that the streams share an identity relationship.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.94
   m=audio 5004 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:1
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
   config=7A0A0000001A4D546864000000060000000100604D54726B0
   000000600FF2F000; musicport=12
   m=audio 5006 RTP/AVP 96
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:2
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; musicport=12

   (The a=fmtp lines have been wrapped to fit the page to accommodate
    memo formatting restrictions; they comprise single lines in SDP.)

   Recall that Section 2.1 defines rules for streams that target the
   same MIDI name space.  Those rules, implemented in the example above,
   require that each stream reside in a separate RTP session, and that
   the grouping mechanisms defined in [RFC3388] signal an inter-session
   relationship.  The "group" and "mid" attribute lines implement this
   grouping mechanism.
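   A receiver might check this grouping rule with a sketch such as the
   one below.  It is a simplification, not a full SDP parser: it assumes
   the fmtp lines are unwrapped (as they are in real SDP), extracts
   musicport values with a regular expression, and requires all media
   descriptions that use musicport to appear together in a single FID
   group.  The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

_MUSICPORT = re.compile(r"(?:^|;)\s*musicport=(\d+)")


def musicport_summary(sdp: str) -> Tuple[List[List[str]], Dict[str, List[int]]]:
    """Return (FID groups, {mid: musicport values}) for a description.

    Media descriptions without an a=mid line are keyed "#<index>".
    """
    groups: List[List[str]] = []
    sections: List[dict] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=group:FID"):
            groups.append(line.split()[1:])          # the mid tokens
        elif line.startswith("m="):
            sections.append({"mid": "#%d" % len(sections), "ports": []})
        elif sections and line.startswith("a=mid:"):
            sections[-1]["mid"] = line[len("a=mid:"):]
        elif sections and line.startswith("a=fmtp:"):
            _, _, params = line.partition(" ")       # drop "a=fmtp:<pt>"
            sections[-1]["ports"] += [int(v) for v in _MUSICPORT.findall(params)]
    return groups, {s["mid"]: s["ports"] for s in sections}


def fid_grouping_ok(sdp: str) -> bool:
    """Simplified check: if more than one media description uses
    musicport, all of them must appear together in one FID group."""
    groups, ports = musicport_summary(sdp)
    users = {mid for mid, values in ports.items() if values}
    return len(users) < 2 or any(users <= set(g) for g in groups)
```

   Applied to the session description above, the check passes; deleting
   the a=group:FID line makes it fail.  A single media description that
   carries several musicport payload types (several streams in one RTP
   session) needs no FID grouping and also passes.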

   A variant on this example, whose session description is not shown,
   would use two streams in an identity relationship driving the same
   MIDI renderer, each with a different transport type.  One stream
   would use UDP and would be dedicated to real-time messages.  A second
   stream would use TCP [RFC4571] and would be used for SysEx bulk data
   messages.

   In the next example, two mpeg4-generic streams form an ordered
   relationship to drive a Structured Audio decoder with 32 MIDI voice
   channels.  Both streams reside in the same RTP session.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
   s=Example
   t=0 0
   m=audio 5006 RTP/AVP 96 97
   c=IN IP6 2001:DB8::7F2E:172A:1E24
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=13; musicport=5
   a=rtpmap:97 mpeg4-generic/44100
   a=fmtp:97 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=13; musicport=6; render=synthetic;
   rinit="audio/asc";
   url="http://example.com/cardinal.asc";
   cid="azsldkaslkdjqpwojdkmsldkfpe"

   (The a=fmtp lines have been wrapped to fit the page to accommodate
    memo formatting restrictions; they comprise single lines in SDP.)

   The sequential musicport values for the two payload types establish
   the ordered relationship.  The musicport=5 stream maps to Structured
   Audio extended channel range 0-15; the musicport=6 stream maps to
   Structured Audio extended channel range 16-31.

   Both config strings are empty.  The configuration data is specified
   by parameters that appear in the fmtp line of the second media
   description.  We define this configuration method in Appendix C.6.

   The next example shows two RTP MIDI streams (one recvonly, one
   sendonly) that form a "virtual sendrecv" session.  Each stream
   resides in a different RTP session (a requirement because sendonly
   and recvonly are RTP session attributes).

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.94
   m=audio 5004 RTP/AVP 96
   a=sendonly
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:1
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
   config=7A0A0000001A4D546864000000060000000100604D54726B0
   000000600FF2F000; musicport=12
   m=audio 5006 RTP/AVP 96
   a=recvonly
   a=rtpmap:96 mpeg4-generic/44100
   a=mid:2
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
   config=7A0A0000001A4D546864000000060000000100604D54726B0
   000000600FF2F000; musicport=12

   (The a=fmtp lines have been wrapped to fit the page to accommodate
    memo formatting restrictions; they comprise single lines in SDP.)

   To signal the "virtual sendrecv" semantics, the two streams assign
   musicport to the same value (12).  As defined earlier in this
   section, pairs of identity relationship streams that are sent by
   different parties share the association that is shared by a MIDI
   cable pair that cross-connects two devices in a MIDI 1.0 DIN
   network.  We use the term "virtual sendrecv" because streams sent by
   different parties in a true sendrecv session also have this
   property.

   As discussed in the preamble to Appendix C, the primary advantage of
   the virtual sendrecv configuration is that each party can customize
   the properties of the stream it receives.  In the example above, each
   stream defines its own "config" string that could customize the
   rendering algorithm for each party (in fact, the particular strings
   shown in this example are identical, because General MIDI is not a
   configurable MPEG 4 renderer).

C.6.  Configuration Tools: MIDI Rendering

   This appendix defines the session configuration tools for rendering.

   The "render" parameter specifies a rendering method for a stream.
   The parameter is assigned a token value that signals the top-level
   rendering class.  This memo defines four token values for render:
   "unknown", "synthetic", "api", and "null":

     o  An "unknown" renderer is a renderer whose nature is unspecified.
        It is the default renderer for native RTP MIDI streams.

     o  A "synthetic" renderer transforms the MIDI stream into audio
        output (or sometimes into stage lighting changes or other
        actions).  It is the default renderer for mpeg4-generic RTP MIDI
        streams.

     o  An "api" renderer presents the command stream to applications
        via an Application Programming Interface (API).

     o  The "null" renderer discards the MIDI stream.

   The "null" render value plays special roles during Offer/Answer
   negotiations [RFC3264].  A party uses the "null" value in an answer
   to reject an offered renderer.  Note that rejecting a renderer is
   independent of rejecting a payload type (coded by removing the
   payload type from a media line) and rejecting a media stream (coded
   by zeroing the port of a media line that uses the renderer).

   Other render token values MAY be registered with IANA.  The token
   value MUST adhere to the ABNF for render tokens defined in Appendix
   D.  Registrations MUST include a complete specification of parameter
   value usage, similar in depth to the specifications that appear
   throughout Appendix C.6 for "synthetic" and "api" render values.  If
   a party is offered a session description that uses a render token
   value that is not known to the party, the party MUST NOT accept the
   renderer.  Options include rejecting the renderer (using the "null"
   value), the payload type, the media stream, or the session
   description.

   Other parameters MAY follow a render parameter in a parameter list.
   The additional parameters act to define the exact nature of the
   renderer.  For example, the "subrender" parameter (defined in
   Appendix C.6.2) specifies the exact nature of the renderer.

   Special rules apply to using the render parameter in an mpeg4-generic
   stream.  We define these rules in Appendix C.6.5.

C.6.1.  The multimode Parameter

   A media description MAY contain several render parameters.  By
   default, if a parameter list includes several render parameters, a
   receiver MUST choose exactly one renderer from the list to render the
   stream.  The "multimode" parameter may be used to override this
   default.  We define two token values for multimode: "one" and "all":

     o  The default "one" value requests rendering by exactly one of the
        listed renderers.

     o  The "all" value requests the synchronized rendering of the RTP
        MIDI stream by all listed renderers, if possible.

   If the multimode parameter appears in a parameter list, it MUST
   appear before the first render parameter assignment.

   Render parameters appear in the parameter list in order of decreasing
   priority.  A receiver MAY use the priority ordering to decide which
   renderer(s) to retain in a session.
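   An answerer's use of the priority ordering might look like the sketch
   below.  The parameter list is modeled as hypothetical (name, value)
   pairs, and a renderer is judged by its render token alone (a fuller
   check would also consider subrender and the parameters that follow
   it).  Every render value the answerer does not retain, including any
   unknown token, is set to "null" in the answer.

```python
from typing import List, Set, Tuple

Param = Tuple[str, str]

# Render tokens this hypothetical receiver supports ("null" never counts).
SUPPORTED = {"unknown", "synthetic", "api"}


def answer_renderers(offer: List[Param],
                     supported: Set[str] = SUPPORTED) -> List[Param]:
    """Build the answer's parameter list from an offer's parameter list.

    Render parameters are scanned in priority order.  With multimode
    "one" (the default) only the first supported renderer is kept; with
    "all" every supported renderer is kept.  Every other render value
    becomes "null".
    """
    multimode = next((v for n, v in offer if n == "multimode"), "one")
    answer: List[Param] = []
    kept = 0
    for name, value in offer:
        if name == "render":
            keep = value in supported and (multimode == "all" or kept == 0)
            if keep:
                kept += 1
            answer.append((name, value if keep else "null"))
        else:
            answer.append((name, value))
    return answer
```

   For an offer listing render=api before render=synthetic, a receiver
   that supports both keeps api and nulls synthetic; a receiver that
   supports only synthetic nulls api instead.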

   If the "offer" in an Offer/Answer-style negotiation [RFC3264]
   contains a parameter list with one or more render parameters, the
   "answer" MUST set the render parameters of all unchosen renderers to
   "null".

C.6.2.  Renderer Specification

   The render parameter (Appendix C.6 preamble) specifies, in a broad
   sense, what a renderer does with a MIDI stream.  In this appendix, we
   describe the "subrender" parameter.  The token value assigned to
   subrender defines the exact nature of the renderer.  Thus, "render"
   and "subrender" combine to define a renderer, in the same way as MIME
   types and MIME subtypes combine to define a type of media [RFC2045].

   If the subrender parameter is used for a renderer definition, it MUST
   appear immediately after the render parameter in the parameter list.
   At most one subrender parameter may appear in a renderer definition.

   This document defines one value for subrender: the value "default".
   The "default" token specifies the use of the default renderer for the
   stream type (native or mpeg4-generic).  The default renderer for
   native RTP MIDI streams is a renderer whose nature is unspecified
   (see point 6 in Section 6.1 for details).  The default renderer for
   mpeg4-generic RTP MIDI streams is an MPEG 4 Audio Object Type whose
   ID number is 13, 14, or 15 (see Section 6.2 for details).

   If a renderer definition does not use the subrender parameter, the
   value "default" is assumed for subrender.
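   A receiver might split a parameter list into renderer definitions as
   in the sketch below, filling in "default" when subrender is absent
   and enforcing that subrender immediately follows render.  The
   (name, value) pair model is an assumption, and the sketch simplifies
   by skipping parameters that precede the first render parameter and
   attaching every later parameter to the most recent renderer.

```python
from typing import Dict, List, Optional, Tuple

Param = Tuple[str, str]


def renderer_definitions(params: List[Param]) -> List[Dict]:
    """Split a parameter list into renderer definitions.

    A definition starts at each "render" parameter and collects the
    parameters that follow it, up to the next "render".
    """
    definitions: List[Dict] = []
    previous: Optional[str] = None
    for name, value in params:
        if name == "render":
            definitions.append(
                {"render": value, "subrender": "default", "params": []})
        elif name == "subrender":
            # Also rejects a second subrender, since its predecessor
            # would be "subrender" rather than "render".
            if previous != "render":
                raise ValueError("subrender must immediately follow render")
            definitions[-1]["subrender"] = value
        elif definitions:
            definitions[-1]["params"].append((name, value))
        previous = name
    return definitions
```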

   Other subrender token values may be registered with IANA.  We now
   discuss guidelines for registering subrender values.

   A subrender value is registered for a specific stream type (native or
   mpeg4-generic) and a specific render value (excluding "null" and
   "unknown").  Registrations for mpeg4-generic subrender values are
   restricted to new MPEG 4 Audio Object Types that accept MIDI input.
   The syntax of the token MUST adhere to the token definition in
   Appendix D.

   For "render=synthetic" renderers, a subrender value registration
   specifies an exact method for transforming the MIDI stream into audio
   (or sometimes into video or control actions, such as stage lighting).
   For standardized renderers, this specification is usually a pointer
   to a standards document, perhaps supplemented by RTP-MIDI-specific
   information.  For commercial products and open-source projects, this
   specification usually takes the form of instructions for interfacing
   the RTP MIDI stream with the product or project software.  A
   "render=synthetic" registration MAY specify additional Reset State
   commands for the renderer (Appendix A.1).

   A "render=api" subrender value registration specifies how an RTP MIDI
   stream interfaces with an API (Application Programming Interface).
   This specification is usually a pointer to programmer's documentation
   for the API, perhaps supplemented by RTP-MIDI-specific information.

   A subrender registration MAY specify an initialization file (referred
   to in this document as an initialization data object) for the stream.
   The initialization data object MAY be encoded in the parameter list
   (verbatim or by reference) using the coding tools defined in Appendix
   C.6.3.  An initialization data object MUST have a registered
   [RFC4288] media type and subtype [RFC2045].

   For "render=synthetic" renderers, the data object usually encodes
   initialization data for the renderer (sample files, synthesis patch
   parameters, reverberation room impulse responses, etc.).

   For "render=api" renderers, the data object usually encodes data
   about the stream used by the API (for example, for an RTP MIDI stream
   generated by a piano keyboard controller, the manufacturer and model
   number of the keyboard, for use in GUI presentation).

   Usually, only one initialization object is encoded for a renderer.
   If a renderer uses multiple data objects, the correct receiver
   interpretation of multiple data objects MUST be defined in the
   subrender registration.

   A subrender value registration may also specify additional
   parameters, to appear in the parameter list immediately after
   subrender.  These parameter names MUST begin with the subrender
   value, followed by an underscore ("_"), to avoid name space
   collisions with future RTP MIDI parameter names (for example, a
   parameter "foo_bar" defined for subrender value "foo").
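   The naming rule for subrender-specific parameters reduces to a prefix
   test, sketched below with a hypothetical function name.

```python
def subrender_param_name_ok(subrender: str, name: str) -> bool:
    """True if `name` is a legal subrender-specific parameter name:
    the subrender value, an underscore, then at least one character."""
    prefix = subrender + "_"
    return name.startswith(prefix) and len(name) > len(prefix)
```

   Under this test, "foo_bar" is a legal parameter name for subrender
   value "foo", while "foobar" and "foo_" are not.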

   We now specify guidelines for interpreting the subrender parameter
   during session configuration.

   If a party is offered a session description that uses a renderer
   whose subrender value is not known to the party, the party MUST NOT
   accept the renderer.  Options include rejecting the renderer (using
   the "null" value), the payload type, the media stream, or the session
   description.

   Receivers MUST be aware of the Reset State commands (Appendix A.1)
   for the renderer specified by the subrender parameter and MUST ensure
   that the renderer does not experience indefinite artifacts due to the
   presence (or the loss) of a Reset State command.

C.6.3.  Renderer Initialization

   If the renderer for a stream uses an initialization data object, an
   "rinit" parameter MUST appear in the parameter list immediately after
   the "subrender" parameter.  If the renderer parameter list does not
   include a subrender parameter (recall the semantics for "default" in
   Appendix C.6.2), the "rinit" parameter MUST appear immediately after
   the "render" parameter.

   The value assigned to the rinit parameter MUST be the media
   type/subtype [RFC2045] for the initialization data object.  If an
   initialization object type is registered with several media types,
   including audio, the assignment to rinit MUST use the audio media
   type.

   RTP MIDI supports several parameters for encoding initialization data
   objects for renderers in the parameter list: "inline", "url", and
   "cid".

   If the "inline", "url", and/or "cid" parameters are used by a
   renderer, these parameters MUST immediately follow the "rinit"
   parameter.

   If a "url" parameter appears for a renderer, an "inline" parameter
   MUST NOT appear.  If an "inline" parameter appears for a renderer, a
   "url" parameter MUST NOT appear.  However, neither "url" nor "inline"
   is required to appear.  If neither a "url" nor an "inline" parameter
   follows "rinit", the "cid" parameter MUST follow "rinit".
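   The ordering rules of this appendix can be checked mechanically, as
   in the sketch below.  It takes the parameter names of one renderer
   (those after render and subrender).  The ALLOWED table is our reading
   of the rules: after each "rinit", either "inline" or "url" may
   appear, optionally followed immediately by "cid", or "cid" may
   appear alone.  The sketch does not check whether the subrender
   registration permits multiple data objects.

```python
from typing import List, Tuple

ENCODING_PARAMS = ("inline", "url", "cid")
# Permitted sequences of encoding parameters after one "rinit".
ALLOWED = {("inline",), ("inline", "cid"), ("url",), ("url", "cid"), ("cid",)}


def data_object_encodings(names: List[str]) -> List[Tuple[str, ...]]:
    """Validate the rinit/inline/url/cid parameter names of one renderer
    and return the encoding parameters used for each data object."""
    encodings: List[Tuple[str, ...]] = []
    i = 0
    while i < len(names):
        if names[i] == "rinit":
            j = i + 1
            while j < len(names) and names[j] in ENCODING_PARAMS:
                j += 1
            encoding = tuple(names[i + 1:j])
            if encoding not in ALLOWED:
                raise ValueError("invalid encoding after rinit: %r" % (encoding,))
            encodings.append(encoding)
            i = j
        elif names[i] in ENCODING_PARAMS:
            raise ValueError("%s must follow rinit" % names[i])
        else:
            i += 1
    return encodings
```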

   The "inline" parameter supports the inline encoding of the data
   object.  The parameter is assigned a double-quoted Base64 [RFC2045]
   encoding of the binary data object, with no line breaks.  Appendix
   E.4 shows an example that constructs an inline parameter value.
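   In Python, an inline value can be produced with the standard base64
   module, as sketched below (function names are hypothetical).  Note
   that base64.b64encode() emits no line breaks, whereas
   base64.encodebytes() wraps its output and so is unsuitable here.

```python
import base64


def inline_value(data: bytes) -> str:
    """Encode a data object as a double-quoted Base64 string with no
    line breaks."""
    return '"' + base64.b64encode(data).decode("ascii") + '"'


def decode_inline(value: str) -> bytes:
    """Reverse of inline_value(); rejects non-Base64 characters."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("inline value must be double-quoted")
    return base64.b64decode(value[1:-1], validate=True)


print(inline_value(b"MThd"))  # first four bytes of an SMF -> "TVRoZA=="
```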

   The "url" parameter is assigned a double-quoted string representation
   of a Uniform Resource Locator (URL) for the data object.  The string
   MUST specify a Hypertext Transfer Protocol URL (HTTP, [RFC2616]).
   HTTP MAY be used over TCP or MAY be used over a secure network
   transport, such as the method described in [RFC2818].  The media
   type/subtype for the data object SHOULD be specified in the
   appropriate HTTP transport header.

   The "cid" parameter supports data object caching.  The parameter is
   assigned a double-quoted string value that encodes a globally unique
   identifier for the data object.

   A cid parameter MAY immediately follow an inline parameter, in which
   case the cid identifier value MUST be associated with the inline data
   object.

   If a url parameter is present, and if the data object for the URL is
   expected to be unchanged for the life of the URL, a cid parameter MAY
   immediately follow the url parameter.  The cid identifier value MUST
   be associated with the data object for the URL.  A cid parameter
   assigned to the same identifier value SHOULD be specified following
   the data object type/subtype in the appropriate HTTP transport
   header.

   If a url parameter is present, and if the data object for the URL is
   expected to change during the life of the URL, a cid parameter MUST
   NOT follow the url parameter.  A receiver interprets the presence of
   a cid parameter as an indication that it is safe to use a cached copy
   of the url data object; the absence of a cid parameter is an
   indication that it is not safe to use a cached copy, as it may
   change.

   Finally, the cid parameter MAY be used without the inline and url
   parameters.  In this case, the identifier references a local or
   distributed catalog of data objects.
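   The caching rules above suggest a receiver-side lookup such as the
   sketch below.  The `cache` dictionary stands in for a local cache or
   catalog keyed by cid, and `fetch_url` stands in for an HTTP client;
   both, like the function name, are assumptions of this sketch.

```python
import base64
from typing import Callable, Dict, List, Tuple


def resolve_data_object(encoding: List[Tuple[str, str]],
                        cache: Dict[str, bytes],
                        fetch_url: Callable[[str], bytes]) -> bytes:
    """Obtain the data object described by the parameters after one rinit."""
    values = {name: value.strip('"') for name, value in encoding}
    cid = values.get("cid")
    if cid is not None and cid in cache:
        return cache[cid]              # a cid marks a cached copy as safe
    if "inline" in values:
        data = base64.b64decode(values["inline"], validate=True)
    elif "url" in values:
        data = fetch_url(values["url"])
    else:
        raise LookupError("cid %r not found in the local catalog" % cid)
    if cid is not None:
        cache[cid] = data              # only objects with a cid are cached
    return data
```

   A url without a cid is therefore fetched on every use, because the
   object behind it may change.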

   In most cases, only one data object is coded in the parameter list
   for each renderer.  For example, the default renderer for mpeg4-
   generic streams uses a single data object (see Appendix C.6.5 for
   example usage).

   However, a subrender registration MAY permit the use of multiple data
   objects for a renderer.  If multiple data objects are encoded for a
   renderer, each object encoding begins with an "rinit" parameter,
   followed by "inline", "url", and/or "cid" parameters.

   Initialization data objects MAY encapsulate a Standard MIDI File
   (SMF).  By default, the SMFs that are encapsulated in a data object
   MUST be ignored by an RTP MIDI receiver.  We define parameters to
   override this default in Appendix C.6.4.

   To end this section, we offer guidelines for registering media types
   for initialization data objects.  These guidelines are in addition to
   the information in [RFC4288] [RFC4289].

   Some initialization data objects are also capable of encoding MIDI
   note information and thus complete audio performances.  These objects
   SHOULD be registered under both the "audio" media type, so that the
   objects may also be used for store-and-forward rendering, and the
   "application" media type, to support editing tools.  Initialization
   objects without note storage, or initialization objects for non-audio
   renderers, SHOULD be registered only for an "application" media type.

C.6.4.  MIDI Channel Mapping

   In this appendix, we specify how to map MIDI name spaces (16 voice
   channels + systems) onto a renderer.

   In the general case:

     o  A session may define an ordered relationship (Appendix C.5) that
        presents more than one MIDI name space to a renderer.

     o  A renderer may accept an arbitrary number of MIDI name spaces,
        or it may expect a specific number of MIDI name spaces.

   A session description SHOULD provide a compatible MIDI name space to
   each renderer in the session.  If a receiver detects that a session
   description has too many or too few MIDI name spaces for a renderer,
   MIDI data from extra stream name spaces MUST be discarded, and extra
   renderer name spaces MUST NOT be driven with MIDI data (except as
   described in Appendix C.6.4.1, below).
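   The discard rule above can be sketched as follows.  Identifying each
   name space by its musicport value, and describing the renderer by a
   name-space count (None for a renderer that accepts any number), are
   assumptions of this sketch; the Appendix C.6.4.1 exception for SMF
   data is not modeled.

```python
from typing import List, Optional, Tuple


def assign_name_spaces(stream_spaces: List[int],
                       renderer_capacity: Optional[int]
                       ) -> Tuple[List[int], List[int], int]:
    """Fit the name spaces a session presents (in relationship order)
    to a renderer.

    Returns (used, discarded, idle): the name spaces that drive the
    renderer, the name spaces whose MIDI data is discarded, and the
    number of renderer name spaces left undriven.
    """
    if renderer_capacity is None:
        return list(stream_spaces), [], 0
    used = stream_spaces[:renderer_capacity]
    discarded = stream_spaces[renderer_capacity:]
    idle = max(0, renderer_capacity - len(stream_spaces))
    return used, discarded, idle
```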

   If a parameter list defines several renderers and assigns the "all"
   token value to the multimode parameter, the same name space is
   presented to each renderer.  However, the "chanmask" parameter may be
   used to mask out selected voice channels for each renderer.  We define
   "chanmask" and other MIDI management parameters in the sub-sections
   below.

C.6.4.1.  The smf_info Parameter

   The smf_info parameter defines the use of the SMFs encapsulated in
   renderer data objects (if any).  The smf_info parameter also defines
   the use of SMFs coded in the smf_inline, smf_url, and smf_cid
   parameters (defined in Appendix C.6.4.2).

   The smf_info parameter describes the "render" parameter that most
   recently precedes it in the parameter list.  The smf_info parameter
   MUST NOT appear in parameter lists that do not use the "render"
   parameter, and MUST NOT appear before the first use of "render" in
   the parameter list.

   We define three token values for smf_info: "ignore", "sdp_start", and
   "identity":

     o  The "ignore" value indicates that the SMFs MUST be discarded.
        This behavior is the default SMF rendering behavior.

     o  The "sdp_start" value indicates that SMFs MUST be rendered, and that
        the rendering MUST begin upon the acceptance of the session
        description.  If a receiver is offered a session description
        with a renderer that uses an smf_info parameter set to
        sdp_start, and if the receiver does not support rendering SMFs,
        the receiver MUST NOT accept the renderer associated with the
        smf_info parameter.  Options include rejecting the renderer (by
        setting the "render" parameter to "null"), the payload type, the
        media stream, or the entire session description.

     o  The "identity" value indicates that the SMFs code the identity
        of the renderer.  The value is meant for use with the "unknown"
        renderer (see Appendix C.6 preamble).  The MIDI commands coded
        in the SMF are informational in nature and MUST NOT be presented
        to a renderer for audio presentation.  In typical use, the SMF
        would use SysEx Identity Reply commands (F0 7E nn 06 02, as
        defined in [MIDI]) to identify devices, and use device-specific
        SysEx commands to describe the current state of the devices (patch
        memory contents, etc.).

   Other smf_info token values MAY be registered with IANA.  The token
   value MUST adhere to the ABNF for render tokens defined in Appendix
   D.  Registrations MUST include a complete specification of parameter
   usage, similar in depth to the specifications that appear in this
   appendix for "sdp_start" and "identity".

   If a party is offered a session description that uses an smf_info
   parameter value that is not known to the party, the party MUST NOT
   accept the renderer associated with the smf_info parameter.  Options
   include rejecting the renderer, the payload type, the media stream,
   or the entire session description.
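   A receiver's handling of smf_info values might be organized as in
   the sketch below.  The function names and the returned action labels
   are hypothetical; the Identity Reply test checks only the F0 7E nn
   06 02 prefix and does not parse the manufacturer or model bytes that
   follow it.

```python
def smf_action(smf_info: str = "ignore", can_render_smfs: bool = True) -> str:
    """Decide what a receiver does with SMFs, per the smf_info value.

    Returns "discard", "render", "inspect" (informational only, never
    presented for audio), or "reject" (do not accept the renderer:
    answer render=null, or reject the payload type, media stream, or
    session description).
    """
    if smf_info == "ignore":
        return "discard"
    if smf_info == "sdp_start":
        return "render" if can_render_smfs else "reject"
    if smf_info == "identity":
        return "inspect"
    return "reject"  # token not known to this receiver


def is_identity_reply(sysex: bytes) -> bool:
    """True for a SysEx Identity Reply: F0 7E <device ID> 06 02 ..."""
    return (len(sysex) >= 5 and sysex[:2] == b"\xf0\x7e"
            and sysex[3:5] == b"\x06\x02")
```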

   We now define the rendering semantics for the "sdp_start" token value
   in detail.

   The SMFs and RTP MIDI streams in a session description share the same
   MIDI name space(s).  In the simple case of a single RTP MIDI stream
   and a single SMF, the SMF MIDI commands and RTP MIDI commands are
   merged into a single name space and presented to the renderer.  The
   indefinite artifact responsibilities for merged MIDI streams defined
   in Appendix C.5 also apply to merging RTP and SMF MIDI data.

   If a payload type codes multiple SMFs, the SMF name spaces are
   presented as an ordered entity to the renderer.  To determine the
   ordering of SMFs for a renderer (which SMF is "first", which is
   "second", etc.), use the following rules:

     o  If the renderer uses a single data object, the order of
        appearance of the SMFs in the object's internal structure
        defines the order of the SMFs (the earliest SMF in the object is
        "first", the next SMF in the object is "second", etc.).

     o  If multiple data objects are encoded for a renderer, the
        appearance of each data object in the parameter list sets the
        relative order of the SMFs encoded in each data object (SMFs
        encoded in parameters that appear earlier in the list are
        ordered before SMFs encoded in parameters that appear later in
        the list).

     o  If SMFs are encoded in data object parameters and in the
        parameters defined in Appendix C.6.4.2, the relative order of the
        data object parameters and the Appendix C.6.4.2 parameters in the
        parameter list sets the relative order of SMFs (SMFs encoded in
        parameters that appear earlier in the list are ordered before
        SMFs in parameters that appear later in the list).

   Given this ordering of SMFs, we now define the mapping of SMFs to
   renderer name spaces.  The SMF that appears first for a renderer maps
   to the first renderer name space.  The SMF that appears second for a
   renderer maps to the second renderer name space, etc.  If the
   associated RTP MIDI streams also form an ordered relationship, the
   first SMF is merged with the first name space of the relationship,
   the second SMF is merged to the second name space of the
   relationship, etc.
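
   The ordering and mapping rules above can be illustrated with a short
   sketch.  This is a minimal, non-normative Python example under stated
   assumptions: the SMF-carrying parameters of one renderer are given as
   (parameter name, list of SMF labels) pairs in parameter-list order,
   SMFs inside one data object are listed in their internal order, and
   renderer name spaces are numbered from 0.  The function names are
   hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: the SMF-carrying parameters of ONE renderer, in the
# order they appear in the parameter list.  A data object parameter may
# hold several SMFs (listed in their order inside the object); an
# smf_inline, smf_url, or smf_cid parameter holds exactly one SMF.
SmfParam = Tuple[str, List[str]]


def order_smfs(params: List[SmfParam]) -> List[str]:
    """Parameter-list position sets the outer order; object order the inner."""
    ordered: List[str] = []
    for _name, smfs in params:
        ordered.extend(smfs)
    return ordered


def smf_name_space_map(params: List[SmfParam]) -> Dict[int, str]:
    """Map renderer name space i (0-based in this sketch) to the i-th SMF."""
    return dict(enumerate(order_smfs(params)))


params = [
    ("inline", ["obj-smf-1", "obj-smf-2"]),  # two SMFs inside one data object
    ("smf_url", ["url-smf"]),
    ("smf_inline", ["inline-smf"]),
]
print(smf_name_space_map(params))
# {0: 'obj-smf-1', 1: 'obj-smf-2', 2: 'url-smf', 3: 'inline-smf'}
```

   If the associated RTP MIDI streams form an ordered relationship, the
   SMF at index i in this map would be merged into the stream name space
   at the same position in that relationship.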

   Unless the streams and the SMFs both use MIDI Time Code, the time
   offset between SMF and stream data is unspecified.  This restriction
   limits the use of SMFs to applications where synchronization is not
   critical, such as the transport of System Exclusive commands for
   renderer initialization, or human-SMF interactivity.

   Finally, we note that each SMF in the sdp_start discussion above
   encodes exactly one MIDI name space (16 voice channels + systems).
   Thus, the use of the Device Name SMF meta event to specify several
   MIDI name spaces in an SMF is not supported for sdp_start.

C.6.4.2.  The smf_inline, smf_url, and smf_cid Parameters

   In some applications, the renderer data object may not encapsulate
   SMFs, but an application may wish to use SMFs in the manner defined
   in Appendix C.6.4.1.

   The "smf_inline", "smf_url", and "smf_cid" parameters address this
   situation.  These parameters use the syntax and semantics of the
   inline, url, and cid parameters defined in Appendix C.6.3, except
   that the encoded data object is an SMF.

   The "smf_inline", "smf_url", and "smf_cid" parameters belong to the
   "render" parameter that most recently precedes them in the session
   description.  The "smf_inline", "smf_url", and "smf_cid" parameters
   MUST NOT appear in parameter lists that do not use the "render"
   parameter and MUST NOT appear before the first use of "render" in the
   parameter list.  If several "smf_inline", "smf_url", or "smf_cid"
   parameters appear for a renderer, the order of the parameters defines
   the SMF name space ordering.
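
   The attachment rule above (each SMF parameter belongs to the most
   recent preceding "render" parameter) amounts to a single pass over
   the parameter list.  The following Python sketch assumes the list has
   already been parsed into ordered (name, value) pairs; the Renderer
   class and function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SMF_PARAMS = {"smf_inline", "smf_url", "smf_cid"}


@dataclass
class Renderer:
    render: str
    # SMF parameters in parameter-list order; this order defines the
    # SMF name space ordering for the renderer.
    smf_params: List[Tuple[str, str]] = field(default_factory=list)


def attach_smf_params(params: List[Tuple[str, str]]) -> List[Renderer]:
    renderers: List[Renderer] = []
    for name, value in params:
        if name == "render":
            renderers.append(Renderer(render=value))
        elif name in SMF_PARAMS:
            if not renderers:
                # Covers both "no render parameter at all" and
                # "SMF parameter before the first render parameter".
                raise ValueError(f"{name} appears before any render parameter")
            renderers[-1].smf_params.append((name, value))
    return renderers
```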

C.6.4.3.  The chanmask Parameter

   The chanmask parameter instructs the renderer to ignore all MIDI
   voice commands for certain channel numbers.  The parameter value is a
   concatenated string of "1" and "0" digits.  Each string position maps
   to a MIDI voice channel number (system channels may not be masked).
   A "1" instructs the renderer to process the voice channel; a "0"
   instructs the renderer to ignore the voice channel.

   The string length of the chanmask parameter value MUST be 16 (for a
   single stream or an identity relationship) or a multiple of 16 (for
   an ordered relationship).

   The chanmask parameter describes the "render" parameter that most
   recently precedes it in the session description; chanmask MUST NOT
   appear in parameter lists that do not use the "render" parameter and
   MUST NOT appear before the first use of "render" in the parameter
   list.

   The chanmask parameter describes the final MIDI name spaces presented
   to the renderer.  The SMF and stream components of the MIDI name
   spaces may not be independently masked.

   If a receiver is offered a session description with a renderer that
   uses the chanmask parameter, and if the receiver does not implement
   the semantics of the chanmask parameter, the receiver MUST NOT accept
   the renderer unless the chanmask parameter value contains only "1"s.
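
   The chanmask rules above can be sketched as follows.  This Python
   example assumes that string position i maps to voice channel i,
   counting channels from 0 within each 16-position group, and that the
   caller knows whether the renderer uses an ordered relationship (the
   "ordered" flag is hypothetical).  System commands are never masked,
   so only voice commands should be passed to the filter.

```python
from typing import List, Set


def parse_chanmask(value: str, ordered: bool = False) -> List[Set[int]]:
    """Return one set of processed voice channels (0-15) per name space."""
    if set(value) - {"0", "1"}:
        raise ValueError("chanmask may contain only '0' and '1'")
    if ordered:
        if not value or len(value) % 16 != 0:
            raise ValueError("ordered relationship: length must be a multiple of 16")
    elif len(value) != 16:
        raise ValueError("single stream or identity relationship: length must be 16")
    return [
        {ch for ch in range(16) if value[base + ch] == "1"}
        for base in range(0, len(value), 16)
    ]


def process_voice_command(masks: List[Set[int]], name_space: int, channel: int) -> bool:
    """True if a voice command on this channel should reach the renderer."""
    return channel in masks[name_space]


def acceptable_without_chanmask_support(value: str) -> bool:
    """A receiver lacking chanmask semantics may accept only all-'1' masks."""
    return value != "" and set(value) == {"1"}
```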

C.6.5.  The audio/asc Media Type

   In Section 11.3, we register the audio/asc media type.  The data
   object for audio/asc is a binary encoding of the AudioSpecificConfig
   data block used to initialize mpeg4-generic streams (Section 6.2 and
   [MPEGAUDIO]).

   An mpeg4-generic parameter list MAY use the render, subrender, and
   rinit parameters with the audio/asc media type for renderer
   configuration.  Several restrictions apply to the use of these
   parameters in mpeg4-generic parameter lists:

     o  An mpeg4-generic media description that uses the render
        parameter MUST assign the empty string ("") to the mpeg4-generic
        "config" parameter.  The use of the streamtype, mode, and
        profile-level-id parameters MUST follow the normative text in
        Section 6.2.

     o  Sessions that use identity or ordered relationships MUST follow
        the mpeg4-generic configuration restrictions in Appendix C.5.

     o  The render parameter MUST be assigned the value "synthetic",
        "unknown", "null", or a render value that has been added to the
        IANA repository for use with mpeg4-generic RTP MIDI streams.
        The "api" token value for render MUST NOT be used.

     o  If a subrender parameter is present, it MUST immediately follow
        the render parameter, and it MUST be assigned the token value
        "default" or assigned a subrender value added to the IANA
        repository for use with mpeg4-generic RTP MIDI streams.  A
        subrender parameter assignment may be left out of the renderer
        configuration, in which case the implied value of subrender is
        the default value of "default".

     o  If the render parameter is assigned the value "synthetic" and
        the subrender parameter has the value "default" (assigned or
        implied), the rinit parameter MUST be assigned the value
        "audio/asc", and an AudioSpecificConfig data object MUST be
        encoded using the mechanisms defined in Appendices C.6.2 and
        C.6.3.  The AudioSpecificConfig data MUST encode one of the
        MPEG 4 Audio Object Types defined for use with mpeg4-generic in
        Section 6.2.
        If the subrender value is other than "default", refer to the
        subrender registration for information on the use of "audio/asc"
        with the renderer.

     o  If the render parameter is assigned the value "null" or
        "unknown", the data object MAY be omitted.

   Several general restrictions apply to the use of the audio/asc media
   type in RTP MIDI:

     o  A native stream MUST NOT assign "audio/asc" to rinit.  The
        audio/asc media type is not intended to be a general-purpose
        container for rendering systems outside of MPEG usage.

     o  The audio/asc media type defines a stored object type; it does
        not define semantics for RTP streams.  Thus, audio/asc MUST NOT
        appear on an rtpmap line of a session description.
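
   The mpeg4-generic renderer restrictions listed above can be checked
   mechanically.  The Python sketch below examines one render block (the
   ordered (name, value) pairs from a render parameter up to the next
   one, with enclosing quotes removed).  It assumes empty IANA
   registries, treats the presence of "inline" or "url" as evidence that
   a data object is encoded, and does not check streamtype, mode,
   profile-level-id, or the Appendix C.5 restrictions.  All names are
   hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical registries: render and subrender values IANA has added for
# use with mpeg4-generic RTP MIDI streams (empty in this sketch).
REGISTERED_RENDER: Set[str] = set()
REGISTERED_SUBRENDER: Set[str] = set()


def check_mpeg4_generic_renderer(config: str, block: List[Tuple[str, str]]) -> List[str]:
    """Return rule violations found in one render block (empty list: none found)."""
    problems: List[str] = []
    if config != "":
        problems.append('config must be "" when render is used')
    if not block or block[0][0] != "render":
        return problems + ["block must start with render"]
    render = block[0][1]
    if render == "api":
        problems.append('render="api" must not be used')
    elif render not in {"synthetic", "unknown", "null"} | REGISTERED_RENDER:
        problems.append(f"unregistered render value {render!r}")

    names = [name for name, _ in block]
    subrender = "default"  # implied value when subrender is absent
    if "subrender" in names:
        position = names.index("subrender")
        if position != 1:
            problems.append("subrender must immediately follow render")
        subrender = block[position][1]
        if subrender != "default" and subrender not in REGISTERED_SUBRENDER:
            problems.append(f"unregistered subrender value {subrender!r}")

    if render == "synthetic" and subrender == "default":
        params = dict(block)
        if params.get("rinit") != "audio/asc":
            problems.append('rinit must be "audio/asc"')
        if "inline" not in params and "url" not in params:
            problems.append("no AudioSpecificConfig data object (inline or url)")
    return problems
```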

   Below, we show session description examples for audio/asc.  The
   session description below uses the inline parameter to code the
   AudioSpecificConfig block for an mpeg4-generic General MIDI stream.
   We derive the value assigned to the inline parameter in Appendix E.4.
   The subrender token value of "default" is implied by the absence of
   the subrender parameter in the parameter list.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 192.0.2.94
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; render=synthetic; rinit="audio/asc";
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"

   (The a=fmtp line has been wrapped to fit the page to accommodate
    memo formatting restrictions; it comprises a single line in SDP.)
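
   As a sanity check on the example above, the inline value can be
   base64-decoded and its leading AudioSpecificConfig fields read.  This
   Python sketch reads only the first three fields of the MPEG-4 Audio
   AudioSpecificConfig layout (5-bit audioObjectType, 4-bit
   samplingFrequencyIndex, 4-bit channelConfiguration); it does not
   handle escaped object types or explicit sampling frequencies, and it
   does not parse the General MIDI configuration that Appendix E.4
   derives.  The function name is hypothetical.

```python
import base64

# Sampling rates indexed by samplingFrequencyIndex (MPEG-4 Audio table).
SAMPLING_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                  22050, 16000, 12000, 11025, 8000, 7350]


def asc_leading_fields(inline_value: str) -> dict:
    """Decode the first three AudioSpecificConfig fields from an inline value."""
    data = base64.b64decode(inline_value)
    object_type = data[0] >> 3                               # top 5 bits
    if object_type == 31:
        raise NotImplementedError("escaped audioObjectType not handled")
    freq_index = ((data[0] & 0x07) << 1) | (data[1] >> 7)    # next 4 bits
    if freq_index >= len(SAMPLING_RATES):
        raise NotImplementedError("explicit or reserved frequency index not handled")
    channel_config = (data[1] >> 3) & 0x0F                   # next 4 bits
    return {
        "audio_object_type": object_type,
        "sampling_rate": SAMPLING_RATES[freq_index],
        "channel_configuration": channel_config,
    }


print(asc_leading_fields("egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"))
# {'audio_object_type': 15, 'sampling_rate': 44100, 'channel_configuration': 1}
```

   Object type 15 is the MPEG-4 General MIDI object type, and the
   44100 Hz sampling rate matches the clock rate on the rtpmap line.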

   The session description below uses the url parameter to code the
   AudioSpecificConfig block for the same General MIDI stream:

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 192.0.2.94
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; render=synthetic; rinit="audio/asc";
   url="http://example.net/oski.asc";
   cid="xjflsoeiurvpa09itnvlduihgnvet98pa3w9utnuighbuk"

   (The a=fmtp line has been wrapped to fit the page to accommodate
    memo formatting restrictions; it comprises a single line in SDP.)

C.7.  Interoperability

   In this appendix, we define interoperability guidelines for two
   application areas:

     o  MIDI content-streaming applications.  RTP MIDI is added to
        RTSP-based content-streaming servers, so that viewers may
        experience MIDI performances (produced by a specified client-
        side renderer) in synchronization with other streams (video,
        audio).

     o  Long-distance network musical performance applications.  RTP
        MIDI is added to SIP-based voice chat or videoconferencing
        programs, as an alternative, or as an addition, to audio and/or
        video RTP streams.

   For each application, we define a core set of functionality that all
   implementations MUST implement.

   The applications we address in this section are not an exhaustive
   list of potential RTP MIDI uses.  We expect framework documents for
   other applications to be developed, within the IETF or within other
   organizations.  We discuss other potential application areas for RTP
   MIDI in Section 1 of the main text of this memo.

C.7.1.  MIDI Content Streaming Applications

   In content-streaming applications, a user invokes an RTSP client to
   initiate a request to an RTSP server to view a multimedia session.
   For example, clicking on a web page link for an Internet Radio
   channel launches an RTSP client that uses the link's RTSP URL to
   contact the RTSP server hosting the radio channel.

   The content may be pre-recorded (for example, on-demand replay of
   yesterday's football game) or "live" (for example, football game
   coverage as it occurs), but in either case the user is usually an
   "audience member" as opposed to a "participant" (as the user would be
   in telephony).

   Note that these examples describe the distribution of audio content
   to an audience member.  The interoperability guidelines in this
   appendix address RTP MIDI applications of this nature, not
   applications such as the transmission of raw MIDI command streams for
   use in a professional environment (recording studio, performance
   stage, etc.).

   In an RTSP session, a client accesses a session description that is
   "declared" by the server, either via the RTSP DESCRIBE method, or via
   other means, such as HTTP or email.  The session description defines
   the session from the perspective of the client.  For example, if a
   media line in the session description contains a non-zero port
   number, it encodes the server's preference for the client's port
   numbers for RTP and RTCP reception.  Once media flow begins, the
   server sends an RTP MIDI stream to the client, which renders it for
   presentation, perhaps in synchrony with video or other audio streams.

   We now define the interoperability text for content-streaming RTSP
   applications.

   In most cases, server interoperability responsibilities are described
   in terms of limits on the "reference" session description a server
   provides for a performance if it has no information about the
   capabilities of the client.  The reference session is a "lowest
   common denominator" session that maximizes the odds that a client
   will be able to view the session.  If a server is aware of the
   capabilities of the client, the server is free to provide a session
   description customized for the client in the DESCRIBE reply.

   Clients MUST support unicast UDP RTP MIDI streams that use the
   recovery journal with the closed-loop or the anchor sending policies.
   Clients MUST be able to interpret stream subsetting and chapter
   inclusion parameters in the session description that qualify the
   sending policies.  Client support of enhanced Chapter C encoding is
   OPTIONAL.

   The reference session description offered by a server MUST send all
   RTP MIDI UDP streams as unicast streams that use the recovery journal
   and the closed-loop or anchor sending policies.  Servers SHOULD use
   the stream subsetting and chapter inclusion parameters in the
   reference session description, to simplify the rendering task of the
   client.  Server support of enhanced Chapter C encoding is OPTIONAL.

   Clients and servers MUST support the use of RTSP interleaved mode (a
   method for interleaving RTP onto the RTSP TCP transport).

   Clients MUST be able to interpret the timestamp semantics signalled
   by the "comex" value of the tsmode parameter (i.e., the timestamp
   semantics of Standard MIDI Files [MIDI]).  Servers MUST use the
   "comex" value for the "tsmode" parameter in the reference session
   description.

   Clients MUST be able to process an RTP MIDI stream whose packets
   encode an arbitrary temporal duration ("media time").  Thus, in
   practice, clients MUST implement a MIDI playout buffer.  Clients MUST
   NOT depend on the presence of rtp_ptime, rtp_maxptime, and guardtime
   parameters in the session description in order to process packets,
   but they SHOULD be able to use these parameters to improve packet
   processing.
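
   A MIDI playout buffer can be as simple as a priority queue that holds
   each command until its media time plus a playout delay has passed.
   The Python sketch below is illustrative only: it assumes command
   times and the current time are already expressed in RTP clock units
   on the receiver's timeline, uses a fixed delay, and ignores timestamp
   wraparound.  The class and method names are hypothetical.

```python
import heapq
import itertools
from typing import List, Tuple


class PlayoutBuffer:
    """Releases MIDI commands once media_time + delay_units <= now."""

    def __init__(self, delay_units: int) -> None:
        self.delay_units = delay_units
        self._heap: List[Tuple[int, int, bytes]] = []
        # Sequence numbers break ties so equal-time commands keep arrival order.
        self._seq = itertools.count()

    def push(self, media_time: int, command: bytes) -> None:
        heapq.heappush(self._heap, (media_time, next(self._seq), command))

    def pop_due(self, now: int) -> List[bytes]:
        """Return every command that is due, in media-time order."""
        due: List[bytes] = []
        while self._heap and self._heap[0][0] + self.delay_units <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


buf = PlayoutBuffer(delay_units=2205)   # 50 ms at a 44.1 kHz clock rate
buf.push(100, b"\x90\x3c\x40")          # NoteOn, arrives first, later media time
buf.push(50, b"\x90\x40\x40")           # NoteOn, arrives second, earlier media time
print(buf.pop_due(2300))                # only the media-time-50 command is due
```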

   Servers SHOULD strive to send RTP MIDI streams in the same way media
   servers send conventional audio streams: a sequence of packets that
   either all code the same temporal duration (non-normative example: 50
   ms packets) or that code one of an integral number of temporal
   durations (non-normative example: 50 ms, 100 ms, 250 ms, or 500 ms
   packets).  Servers SHOULD encode information about the packetization
   method in the rtp_ptime and rtp_maxptime parameters in the session
   description.
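
   The rtp_ptime, rtp_maxptime, and guardtime parameters are expressed
   in RTP clock units, so a server targeting the packet durations above
   converts milliseconds at the stream's clock rate.  A minimal Python
   sketch (function names hypothetical) that uses exact fractions and
   rejects durations that are not a whole number of clock units:

```python
from fractions import Fraction
from typing import Union


def ms_to_clock_units(ms: Union[int, str, Fraction], clock_rate: int) -> int:
    """Convert a duration in milliseconds to RTP clock units."""
    units = Fraction(ms) * clock_rate / 1000
    if units.denominator != 1:
        raise ValueError(f"{ms} ms is not a whole number of units at {clock_rate} Hz")
    return int(units)


def clock_units_to_ms(units: int, clock_rate: int) -> Fraction:
    """Convert RTP clock units back to exact milliseconds."""
    return Fraction(units * 1000, clock_rate)


for ms in (50, 100, 250, 500):
    print(ms, "ms ->", ms_to_clock_units(ms, 44100), "units at 44.1 kHz")
# 2205, 4410, 11025, 22050 units respectively
```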

   Clients MUST be able to examine the render and subrender parameters,
   to determine if a multimedia session uses a renderer it supports.
   Clients MUST be able to interpret the default "one" value of the
   "multimode" parameter, to identify supported renderers from a list of
   renderer descriptions.  Clients MUST be able to interpret the
   musicport parameter, to the degree that it is relevant to the
   renderers it supports.  Clients MUST be able to interpret the
   chanmask parameter.

   Clients supporting renderers whose data object (as encoded by a
   parameter value for "inline") could exceed 300 octets in size MUST
   support the url and cid parameters and thus must implement the HTTP
   protocol in addition to RTSP.

   Servers MUST specify complete rendering systems for RTP MIDI streams.
   Note that a minimal RTP MIDI native stream does not meet this
   requirement (Section 6.1), as the rendering method for such streams
   is "not specified".

   At the time of this memo, the only way for servers to specify a
   complete rendering system is to specify an mpeg4-generic RTP MIDI
   stream in mode rtp-midi (Section 6.2 and Appendix C.6.5).  As a
   consequence, the only rendering systems that may be presently used
   are General MIDI [MIDI], DLS 2 [DLS2], or Structured Audio [MPEGSA].
   Note that the maximum inline value for General MIDI is well under 300
   octets (and thus clients need not support the "url" parameter), and
   that the maximum inline values for DLS 2 and Structured Audio may be
   much larger than 300 octets (and thus clients MUST support the url
   parameter).

   We anticipate that the owners of rendering systems (both standardized
   and proprietary) will register subrender parameters for their
   renderers.  Once registration occurs, native RTP MIDI sessions may
   use render and subrender (Appendix C.6.2) to specify complete
   rendering systems for RTSP content-streaming multimedia sessions.

   Servers MUST NOT use the sdp_start value for the smf_info parameter
   in the reference session description, as this use would require that
   clients be able to parse and render Standard MIDI Files.

   Clients MUST support mpeg4-generic mode rtp-midi General MIDI (GM)
   sessions, at a polyphony limited by the hardware capabilities of the
   client.  This requirement provides a "lowest common denominator"
   rendering system for content providers to target.  Note that this
   requirement does not force implementors of a non-GM renderer (such as
   DLS 2 or Structured Audio) to add a second rendering engine.
   Instead, a client may satisfy the requirement by including a set of
   voice patches that implement the GM instrument set, and using this
   emulation for mpeg4-generic GM sessions.

   It is RECOMMENDED that servers use General MIDI as the renderer for
   the reference session description, because clients are REQUIRED to
   support it.  We do not require General MIDI as the reference
   renderer, because for normative applications it is an inappropriate
   choice.  Servers using General MIDI as a "lowest common denominator"
   renderer SHOULD use the Universal Real-Time SysEx MIP message
   [SPMIDI] to communicate the priority of voices to polyphony-limited
   clients.

C.7.2.  MIDI Network Musical Performance Applications

   In Internet telephony and videoconferencing applications, parties
   interact over an IP network as they would face-to-face.  Good user
   experiences require low end-to-end audio latency and tight
   audiovisual synchronization (for "lip-sync").  The Session Initiation
   Protocol (SIP, [RFC3261]) is used for session management.

   In this appendix section, we define interoperability guidelines for
   using RTP MIDI streams in interactive SIP applications.  Our primary
   interest is supporting Network Musical Performances (NMP), where
   musicians in different locations interact over the network as if they
   were in the same room.  See [NMP] for background information on NMP,
   and see [RFC4696] for a discussion of low-latency RTP MIDI
   implementation techniques for NMP.

   Note that the goal of NMP applications is telepresence: the parties
   should hear audio that is close to what they would hear if they were
   in the same room.  The interoperability guidelines in this appendix
   address RTP MIDI applications of this nature, not applications such
   as the transmission of raw MIDI command streams for use in a
   professional environment (recording studio, performance stage, etc.).

   We focus on session management for two-party unicast sessions that
   specify a renderer for RTP MIDI streams.  Within this limited scope,
   the guidelines defined here are sufficient to let applications
   interoperate.  We define the REQUIRED capabilities of RTP MIDI
   senders and receivers in NMP sessions and define how session
   descriptions exchanged are used to set up network musical performance
   sessions.

   SIP lets parties negotiate details of the session, using the
   Offer/Answer protocol [RFC3264].  However, RTP MIDI has so many
   parameters that "blind" negotiations between two parties using
   different applications might not yield a common session
   configuration.

   Thus, we now define a set of capabilities that NMP parties MUST
   support.  Session description offers whose options lie outside the
   envelope of REQUIRED party behavior risk negotiation failure.  We
   also define session description idioms that the RTP MIDI part of an
   offer MUST follow, in order to structure the offer for simpler
   analysis.

   We use the term "offerer" for the party making a SIP offer, and
   "answerer" for the party answering the offer.  Finally, we note that
   unless it is qualified by the adjective "sender" or "receiver", a
   statement that a party MUST support X implies that it MUST support X
   for both sending and receiving.

   If an offerer wishes to define a "sendrecv" RTP MIDI stream, it may
   use a true sendrecv session or the "virtual sendrecv" construction
   described in the preamble to Appendix C and in Appendix C.5.  A true
   sendrecv session indicates that the offerer wishes to participate in
   a session where both parties use identically configured renderers.  A
   virtual sendrecv session indicates that the offerer is willing to
   participate in a session where the two parties may be using different
   renderer configurations.  Thus, parties MUST be prepared to see both
   true and virtual sendrecv sessions in an offer.

   Parties MUST support unicast UDP transport of RTP MIDI streams.
   These streams MUST use the recovery journal with the closed-loop or
   anchor sending policies.  These streams MUST use the stream
   subsetting and chapter inclusion parameters to declare the types of
   MIDI commands that will be sent on the stream (for sendonly streams)
   or will be processed (for recvonly streams), including the size
   limits on System Exclusive commands.  Support of enhanced Chapter C
   encoding is OPTIONAL.

   Note that both TCP and multicast UDP support are OPTIONAL.  We make
   TCP OPTIONAL because we expect NMP renderers to rely on data objects
   (signalled by "rinit" and associated parameters) for initialization
   at the start of the session, and only to use System Exclusive
   commands for interactive control during the session.  These
   interactive commands are small enough to be protected via the
   recovery journal mechanism of RTP MIDI UDP streams.

   We now discuss timestamps, packet timing, and packet sending
   algorithms.

   Recall that the tsmode parameter controls the semantics of command
   timestamps in the MIDI list of RTP packets.

   Parties MUST support clock rates of 44.1 kHz, 48 kHz, 88.2 kHz, and
   96 kHz.  Parties MUST support streams using the "comex", "async", and
   "buffer" tsmode values.  Recvonly offers MUST offer the default
   "comex".

   Parties MUST support a wide range of packet temporal durations: from
   rtp_ptime and rtp_maxptime values of 0, to rtp_ptime and rtp_maxptime
   values that code 100 ms.  Thus, receivers MUST be able to implement a
   playout buffer.

   Offers and answers MUST present rtp_ptime, rtp_maxptime, and
   guardtime values that support the latency that users would expect in
   the application, subject to bandwidth constraints.  As senders MUST
   abide by values set for these parameters in a session description, a
   receiver SHOULD use these values to size its playout buffer to
   produce the lowest reliable latency for a session.  Implementers
   should refer to [RFC4696] for information on packet sending
   algorithms for latency-sensitive applications.  Parties MUST be able
   to implement the semantics of the guardtime parameter, for times from
   5 ms to 5000 ms.
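
   The numeric envelope that parties are REQUIRED to support (clock
   rates, tsmode values, packet durations up to 100 ms, and guardtime
   from 5 ms to 5000 ms) can be checked before an offer is sent.  The
   Python sketch below covers only those ranges; it does not model the
   rule that recvonly offers use "comex", nor bandwidth or latency
   trade-offs.  The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional

REQUIRED_CLOCK_RATES = {44100, 48000, 88200, 96000}
REQUIRED_TSMODES = {"comex", "async", "buffer"}


def to_ms(units: int, clock_rate: int) -> Fraction:
    return Fraction(units * 1000, clock_rate)


def outside_required_envelope(clock_rate: int, tsmode: str, rtp_ptime: int,
                              rtp_maxptime: int,
                              guardtime: Optional[int] = None) -> List[str]:
    """List values that interoperable parties are not REQUIRED to support."""
    issues: List[str] = []
    if clock_rate not in REQUIRED_CLOCK_RATES:
        issues.append(f"clock rate {clock_rate}")
    if tsmode not in REQUIRED_TSMODES:
        issues.append(f"tsmode={tsmode}")
    for name, units in (("rtp_ptime", rtp_ptime), ("rtp_maxptime", rtp_maxptime)):
        if not 0 <= to_ms(units, clock_rate) <= 100:
            issues.append(f"{name}={units}")
    if guardtime is not None and not 5 <= to_ms(guardtime, clock_rate) <= 5000:
        issues.append(f"guardtime={guardtime}")
    return issues
```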

   We now discuss the use of the render parameter.

   Sessions MUST specify complete rendering systems for all RTP MIDI
   streams.  Note that a minimal RTP MIDI native stream does not meet
   this requirement (Section 6.1), as the rendering method for such
   streams is "not specified".

   At the time of this writing, the only way for parties to specify a
   complete rendering system is to specify an mpeg4-generic RTP MIDI
   stream in mode rtp-midi (Section 6.2 and Appendix C.6.5).  We
   anticipate that the owners of rendering systems (both standardized
   and proprietary) will register subrender values for their renderers.
   Once IANA registration occurs, native RTP MIDI sessions may use
   render and subrender (Appendix C.6.2) to specify complete rendering
   systems for SIP network musical performance multimedia sessions.

   All parties MUST support General MIDI (GM) sessions, at a polyphony
   limited by the hardware capabilities of the party.  This requirement
   provides a "lowest common denominator" rendering system, without
   which practical interoperability will be quite difficult.  When using
   GM, parties SHOULD use the Universal Real-Time SysEx MIP message
   [SPMIDI] to communicate the priority of voices to polyphony-limited
   parties.

   Note that this requirement does not force implementors of a non-GM
   renderer (for mpeg4-generic sessions, DLS 2, or Structured Audio) to
   add a second rendering engine.  Instead, a client may satisfy the
   requirement by including a set of voice patches that implement the GM
   instrument set, and using this emulation for mpeg4-generic GM
   sessions.  We require GM support so that an offerer that wishes to
   maximize interoperability may do so by offering GM if its preferred
   renderer is not accepted by the answerer.

   Offerers MUST NOT present several renderers as options in a session
   description by listing several payload types on a media line, as
   Section 2.1 uses this construct to let a party send several RTP MIDI
   streams in the same RTP session.

   Instead, an offerer wishing to present rendering options SHOULD offer
   a single payload type that offers several renderers.  In this
   construct, the parameter list codes a list of render parameters (each
   followed by its support parameters).  As discussed in Appendix C.6.1,
   the order of renderers in the list declares the offerer's preference.
   The "unknown" and "null" values MUST NOT appear in the offer.  The
   answer MUST set all render values except the desired renderer to
   "null".  Thus, "unknown" MUST NOT appear in the answer.

   We use SHOULD instead of MUST in the first sentence in the paragraph
   above, because this technique does not work in all situations
   (example:  an offerer wishes to offer both mpeg4-generic renderers
   and native RTP MIDI renderers as options).  In this case, the offerer
   MUST present a series of session descriptions, each offering a single
   renderer, until the answerer accepts a session description.
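
   The single-payload-type construct described above, in which the
   answer keeps one renderer and sets every other render value to
   "null", can be sketched in Python.  This sketch assumes each render
   block is a dictionary of its parameters, that the answerer simply
   picks the first renderer it supports (honoring the offerer's order of
   preference), and that support is decided by a caller-supplied
   predicate; the names and the "x-hypothetical" subrender value are
   illustrative only.

```python
from typing import Callable, Dict, List

# One render block from the offer, e.g. {"render": "synthetic", "rinit": ...}.
Renderer = Dict[str, str]


def check_offer(renderers: List[Renderer]) -> None:
    for r in renderers:
        if r.get("render") in ("unknown", "null"):
            raise ValueError(f"render={r['render']} must not appear in the offer")


def build_answer(renderers: List[Renderer],
                 supports: Callable[[Renderer], bool]) -> List[Renderer]:
    check_offer(renderers)
    # Offer order declares preference; this sketch takes the first supported one.
    chosen = next((i for i, r in enumerate(renderers) if supports(r)), None)
    if chosen is None:
        raise LookupError("no supported renderer in this offer")
    # Keep the chosen renderer; set every other render value to "null".
    return [dict(r) if i == chosen else {**r, "render": "null"}
            for i, r in enumerate(renderers)]


offer = [
    {"render": "synthetic", "subrender": "x-hypothetical"},
    {"render": "synthetic", "rinit": "audio/asc"},
]
print(build_answer(offer, lambda r: r.get("subrender", "default") == "default"))
```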

   Parties MUST support the musicport, chanmask, subrender, rinit, and
   inline parameters.  Parties supporting renderers whose data object
   (as encoded by a parameter value for "inline") could exceed 300
   octets in size MUST support the url and cid parameters and thus must
   implement the HTTP protocol.  Note that in mpeg4-generic, General
   MIDI data objects cannot exceed 300 octets, but DLS 2 and Structured
   Audio data objects may.  Support for the other rendering parameters
   (smf_cid, smf_info, smf_inline, smf_url) is OPTIONAL.

   Thus far in this document, our discussion has assumed that the only
   MIDI flows that drive a renderer are the network flows described in
   the session description.  In NMP applications, this assumption would
   require two rendering engines: one for local use by a party, a second
   for the remote party.

   In practice, applications may wish to have both parties share a
   single rendering engine.  In this case, the session description MUST
   use a virtual sendrecv session and MUST use the stream subsetting and
   chapter inclusion parameters to allocate which MIDI channels are
   intended for use by a party.  If two parties are sharing a MIDI
   channel, the application MUST ensure that appropriate MIDI merging
   occurs at the input to the renderer.

   We now discuss the use of (non-MIDI) audio streams in the session.

   Audio streams may be used for two purposes: as a "talkback" channel
   for parties to converse, or as a way to conduct a performance that
   includes MIDI and audio channels.  In the latter case, offers MUST
   use sample rates and packet temporal durations for the audio and
   MIDI streams that support low-latency synchronized rendering.

   We now show an example of an offer/answer exchange in a network
   musical performance application.  Below, we show an offer that
   complies with the interoperability text in this appendix section.

   v=0
   o=first 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.94
   m=audio 16112 RTP/AVP 96
   a=recvonly
   a=mid:1
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=2NPTW;
   cm_used=2C0.1.7.10.11.64.121.123; cm_used=2M0.1.2;
   cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
   ch_default=2NPTW; ch_default=2C0.1.7.10.11.64.121.123;
   ch_default=2M0.1.2; ch_default=X0-16;
   rtp_ptime=0; rtp_maxptime=0; guardtime=44100;
   musicport=1; render=synthetic; rinit="audio/asc";
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
   m=audio 16114 RTP/AVP 96
   a=sendonly
   a=mid:2
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=1NPTW;
   cm_used=1C0.1.7.10.11.64.121.123; cm_used=1M0.1.2;
   cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
   ch_default=1NPTW; ch_default=1C0.1.7.10.11.64.121.123;
   ch_default=1M0.1.2; ch_default=X0-16;
   rtp_ptime=0; rtp_maxptime=0; guardtime=44100;
   musicport=1; render=synthetic; rinit="audio/asc";
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"

   (The a=fmtp lines have been wrapped to fit the page to accommodate
    memo formatting restrictions; they comprise single lines in SDP.)

   The owner line (o=) identifies the session owner as "first".

   The session description defines two MIDI streams: a recvonly stream
   on which "first" receives a performance, and a sendonly stream that
   "first" uses to send a performance.  The recvonly port number encodes
   the ports on which "first" wishes to receive RTP (16112) and RTCP
   (16113) media at IP4 address 192.0.2.94.  The sendonly port number
   encodes the port on which "first" wishes to receive RTCP for the
   stream (16115).

   The musicport parameters code that the two streams share an identity
   relationship and thus form a virtual sendrecv stream.

   Both streams are mpeg4-generic RTP MIDI streams that specify a
   General MIDI renderer.  The stream subsetting parameters code that
   the recvonly stream uses MIDI channel 1 exclusively for voice
   commands, and that the sendonly stream uses MIDI channel 2
   exclusively for voice commands.  This mapping permits the application
   software to share a single renderer for local and remote performers.

   We now show the answer to the offer.

   v=0
   o=second 2520644554 2838152170 IN IP4 second.example.net
   s=Example
   t=0 0
   a=group:FID 1 2
   c=IN IP4 192.0.2.105
   m=audio 5004 RTP/AVP 96
   a=sendonly
   a=mid:1
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=2NPTW;
   cm_used=2C0.1.7.10.11.64.121.123; cm_used=2M0.1.2;
   cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
   ch_default=2NPTW; ch_default=2C0.1.7.10.11.64.121.123;
   ch_default=2M0.1.2; ch_default=X0-16;
   rtp_ptime=0; rtp_maxptime=882; guardtime=44100;
   musicport=1; render=synthetic; rinit="audio/asc";
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"
   m=audio 5006 RTP/AVP 96
   a=recvonly
   a=mid:2
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; config="";
   profile-level-id=12; cm_unused=ABCFGHJKMNPQTVWXYZ; cm_used=1NPTW;
   cm_used=1C0.1.7.10.11.64.121.123; cm_used=1M0.1.2;
   cm_used=X0-16; ch_never=ABCDEFGHJKMNPQTVWXYZ;
   ch_default=1NPTW; ch_default=1C0.1.7.10.11.64.121.123;
   ch_default=1M0.1.2; ch_default=X0-16;
   rtp_ptime=0; rtp_maxptime=0; guardtime=88200;
   musicport=1; render=synthetic; rinit="audio/asc";
   inline="egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA"

   (The a=fmtp lines have been wrapped to fit the page to accommodate
    memo formatting restrictions; they comprise single lines in SDP.)

   The owner line (o=) identifies the session owner as "second".

   The port numbers for both media streams are non-zero; thus, "second"
   has accepted the session description.  The stream marked "sendonly"
   in the offer is marked "recvonly" in the answer, and vice versa,
   coding the different view of the session held by "second".  The IPv4
   address (192.0.2.105) and the RTP (5004 and 5006) and RTCP (5005 and
   5007) port numbers have been changed by "second" to match its
   transport wishes.
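   The two mechanical changes in the answer can be sketched in Python as
   follows.  This is a simplified, non-normative illustration.  The
   direction attributes are mirrored, as in the SDP offer/answer model.
   The RTCP port is one above the RTP port, following the RTP
   convention (RFC 3550) when no explicit RTCP port is signaled.  The
   function names are hypothetical.  The sketch passes other direction
   attributes through unchanged, which is a simplification.

```python
# Mirror the direction attributes, as in the example: a stream the
# offerer marks sendonly is recvonly in the answer, and vice versa.
# Other directions (sendrecv, inactive) are passed through unchanged
# in this simplified sketch.
MIRRORED = {"sendonly": "recvonly", "recvonly": "sendonly"}


def answer_direction(offer_direction: str) -> str:
    return MIRRORED.get(offer_direction, offer_direction)


def rtcp_port(rtp_port: int) -> int:
    # Without an explicit RTCP port, RTCP uses the next higher port
    # number after the RTP port.
    return rtp_port + 1


for rtp in (5004, 5006):
    print(rtp, rtcp_port(rtp))  # prints 5004 5005, then 5006 5007
```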

   In addition, "second" has made several parameter changes:
   rtp_maxptime for the sendonly stream has been changed to code 20 ms
   (882 in clock units), and the guardtime for the recvonly stream has
   been doubled.  As these parameter modifications request capabilities
   that are REQUIRED to be implemented by interoperable parties,
   "second" can make these changes with confidence that "first" can
   abide by them.
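   The rtp_maxptime and guardtime values in the answer are in units of
   the RTP clock, which runs at 44100 Hz according to the a=rtpmap
   lines.  The short Python sketch below converts between clock units
   and milliseconds.  The helper names are hypothetical.  At this rate,
   rtp_maxptime=882 is 20 ms, and the guardtime values of 44100 and
   88200 are 1 and 2 seconds.

```python
CLOCK_RATE = 44100  # from "a=rtpmap:96 mpeg4-generic/44100" in the answer


def clock_to_ms(units: int, rate: int = CLOCK_RATE) -> float:
    """Convert a duration in RTP clock units to milliseconds."""
    return units * 1000 / rate


def ms_to_clock(ms: float, rate: int = CLOCK_RATE) -> int:
    """Convert milliseconds to the nearest whole number of clock units."""
    return round(ms * rate / 1000)


print(clock_to_ms(882))    # sendonly rtp_maxptime: prints 20.0
print(clock_to_ms(44100))  # sendonly guardtime:    prints 1000.0
print(clock_to_ms(88200))  # recvonly guardtime:    prints 2000.0
```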

D.  Parameter Syntax Definitions

   In this appendix, we define the syntax for the RTP MIDI media type
   parameters in Augmented Backus-Naur Form (ABNF, [RFC4234]).  When
   using these parameters with SDP, all parameters MUST appear on a
   single fmtp attribute line of an RTP MIDI media description.  For
   mpeg4-generic RTP MIDI streams, this line MUST also include any
   mpeg4-generic parameters (usage described in Section 6.2).  An fmtp
   attribute line may be defined (after [RFC3640]) as:

   ;
   ; SDP fmtp line definition
   ;

   fmtp = "a=fmtp:" token SP param-assign 0*(";" SP param-assign) CRLF

   where <token> codes the RTP payload type.  Note that white space MUST
   NOT appear between the "a=fmtp:" and the RTP payload type.
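   As a non-normative aid, the Python sketch below checks the start of
   an fmtp line against this rule.  The prefix "a=fmtp:" must be
   followed immediately by a token, then by exactly one SP, and then by
   the first parameter.  The token character set is taken from the
   token-char rule later in this appendix.  The function name is
   hypothetical.  The 0-127 range check is an extra assumption based on
   the 7-bit RTP payload type field; it is not part of the ABNF.

```python
import re

# token-char from the generic rules of this appendix (copied from RFC 4566).
TOKEN = r"[\x21\x23-\x27\x2A\x2B\x2D\x2E\x30-\x39\x41-\x5A\x5E-\x7E]+"
# "a=fmtp:" token SP, followed by the first character of a param-assign.
FMTP_HEAD = re.compile(r"a=fmtp:(" + TOKEN + r") (?=\S)")


def fmtp_payload_type(line: str) -> int:
    """Return the payload type of an fmtp line, rejecting white space
    between "a=fmtp:" and the payload type."""
    m = FMTP_HEAD.match(line)
    if m is None:
        raise ValueError(f"malformed fmtp line: {line!r}")
    token = m.group(1)
    # Extra check (assumption): RTP payload types are 7-bit values.
    if not token.isdigit() or int(token) > 127:
        raise ValueError(f"not an RTP payload type: {token!r}")
    return int(token)


print(fmtp_payload_type("a=fmtp:96 streamtype=5; mode=rtp-midi"))  # prints 96
```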

   We now define the syntax of the parameters defined in Appendix C.
   The definition takes the form of the incremental assembly of the
   <param-assign> token.  See [RFC3640] for the syntax of the
   mpeg4-generic parameters discussed in Section 6.2.

   ;
   ;
   ; top-level definition for all parameters
   ;
   ;

   ;
   ; Parameters defined in Appendix C.1

   param-assign =   ("cm_unused="  (([channel-list] command-type
                                     [f-list]) / sysex-data))

   param-assign =/  ("cm_used="    (([channel-list] command-type
                                     [f-list]) / sysex-data))

   ;
   ; Parameters defined in Appendix C.2

   param-assign =/  ("j_sec="      ("none" / "recj" / *ietf-extension))

   param-assign =/  ("j_update="   ("anchor" / "closed-loop" /
                                    "open-loop" / *ietf-extension))

   param-assign =/  ("ch_default=" (([channel-list] chapter-list
                                     [f-list]) / sysex-data))

   param-assign =/  ("ch_never="   (([channel-list] chapter-list
                                     [f-list]) / sysex-data))

   param-assign =/  ("ch_anchor="  (([channel-list] chapter-list
                                     [f-list]) / sysex-data))

   ;
   ; Parameters defined in Appendix C.3

   param-assign =/  ("tsmode="     ("comex" / "async" / "buffer"))

   param-assign =/  ("linerate="    nonzero-four-octet)

   param-assign =/  ("octpos="      ("first" / "last"))

   param-assign =/  ("mperiod="     nonzero-four-octet)

   ;
   ; Parameters defined in Appendix C.4

   param-assign =/  ("guardtime="     nonzero-four-octet)

   param-assign =/  ("rtp_ptime="     four-octet)

   param-assign =/  ("rtp_maxptime="  four-octet)

   ;
   ; Parameters defined in Appendix C.5

   param-assign =/  ("musicport="     four-octet)

   ;
   ; Parameters defined in Appendix C.6

   param-assign =/  ("chanmask="     ( 1*( 16( "0" / "1" ) )))

   param-assign =/  ("cid="          double-quote cid-block
                                     double-quote)

   param-assign =/  ("inline="       double-quote base-64-block
                                     double-quote)

   param-assign =/  ("multimode="    ("all" / "one"))

   param-assign =/  ("render="       ("synthetic" / "api" / "null" /
                                      "unknown" / *extension))

   param-assign =/  ("rinit="        mime-type "/" mime-subtype)

   param-assign =/  ("smf_cid="      double-quote cid-block
                                     double-quote)

   param-assign =/  ("smf_info="     ("ignore" / "identity" /
                                     "sdp_start" / *extension))

   param-assign =/  ("smf_inline="   double-quote base-64-block
                                     double-quote)

   param-assign =/  ("smf_url="      double-quote uri-element
                                     double-quote)

   param-assign =/  ("subrender="    ("default" / *extension))

   param-assign =/  ("url="          double-quote uri-element
                                     double-quote)

   ;
   ; list definitions for the cm_ command-type
   ;

   command-type    = command-part1 command-part2 command-part3

   command-part1   = (*1"A") (*1"B") (*1"C") (*1"F") (*1"G") (*1"H")

   command-part2   = (*1"J") (*1"K") (*1"M") (*1"N") (*1"P") (*1"Q")

   command-part3   = (*1"T") (*1"V") (*1"W") (*1"X") (*1"Y") (*1"Z")

   ;
   ; list definitions for the ch_ chapter-list
   ;

   chapter-list  =  ch-part1 ch-part2 ch-part3

   ch-part1  = (*1"A") (*1"B") (*1"C") (*1"D") (*1"E") (*1"F") (*1"G")

   ch-part2  = (*1"H") (*1"J") (*1"K") (*1"M") (*1"N") (*1"P") (*1"Q")

   ch-part3  = (*1"T") (*1"V") (*1"W") (*1"X") (*1"Y") (*1"Z")

   ;
   ; list definitions for the ch_ channel-list
   ;

   channel-list       = midi-chan-element *("." midi-chan-element)

   midi-chan-element  = midi-chan / midi-chan-range

   midi-chan-range    = midi-chan "-" midi-chan

                      ; decimal value of left midi-chan
                      ; MUST be strictly less than decimal
                      ; value of right midi-chan

   midi-chan          = %d0-15

   ;
   ; list definitions for the ch_ field list (f-list)
   ;

   f-list             = midi-field-element *("." midi-field-element)

   midi-field-element = midi-field / midi-field-range

   midi-field-range   = midi-field "-" midi-field
                      ;
                      ; decimal value of left midi-field
                      ; MUST be strictly less than decimal
                      ; value of right midi-field

   midi-field         = four-octet
                      ;
                      ; large range accommodates Chapter M
                      ; RPN (0-16383) and NRPN (16384-32767)
                      ; parameters, and Chapter X octet sizes.

   ;
   ; definitions for ch_ sysex-data
   ;

   sysex-data         = "__"  h-list *("_" h-list) "__"

   h-list             = hex-field-element *("." hex-field-element)

   hex-field-element  = hex-octet / hex-field-range

   hex-field-range    = hex-octet "-" hex-octet
                      ;
                      ; hexadecimal value of left hex-octet
                      ; MUST be strictly less than hexadecimal
                      ; value of right hex-octet

   hex-octet          = 2("0" / "1" / "2"/ "3" / "4" /
                          "5" / "6" / "7" / "8" / "9" /
                          "A" / "B" / "C" / "D" / "E" / "F")
                      ;
                      ; rewritten version of hex-octet in [RFC2045]
                      ; (page 23).
                      ; note that a-f are not permitted, only A-F.
                      ; hex-octet values MUST NOT exceed 7F.

   ;
   ; definitions for rinit parameter
   ;

   mime-type          = "audio" / "application"

   mime-subtype       = token
                      ;
                      ; See Appendix C.6.2 for registration
                      ; requirements for rinit type/subtypes.

   ;
   ; definitions for base64 encoding
   ; copied from [RFC4566]

   base-64-block      = *base64-unit [base64-pad]

   base64-unit        =  4base64-char

   base64-pad         =  2base64-char "==" / 3base64-char "="

   base64-char        =  %x41-5A / %x61-7A / %x30-39 / "+" / "/"
                      ;  A-Z, a-z, 0-9, "+" and "/"

   ;
   ; generic rules
   ;

   ietf-extension     = token
                      ;
                      ; ietf-extension may only be defined in
                      ; standards-track RFCs.

   extension          = token
                      ;
                      ; extension may be defined by filing
                      ; a registration with IANA.

   four-octet         = %d0-4294967295
                      ; unsigned 32-bit integer

   nonzero-four-octet = %d1-4294967295
                      ; unsigned 32-bit integer, excluding zero

   uri-element        = URI-reference
                      ; as defined in [RFC3986]

   double-quote       = %x22
                      ; the double-quote (") character

   token              =  1*token-char
                      ; copied from [RFC4566]

   token-char         =  %x21 / %x23-27 / %x2A-2B / %x2D-2E /
                         %x30-39 / %x41-5A / %x5E-7E
                      ; copied from [RFC4566]

   cid-block          = 1*cid-char

   cid-char           =  token-char
   cid-char           =/  "@"
   cid-char           =/  ","
   cid-char           =/  ";"
   cid-char           =/  ":"
   cid-char           =/  "\"
   cid-char           =/  "/"
   cid-char           =/  "["
   cid-char           =/  "]"
   cid-char           =/  "?"
   cid-char           =/  "="

                      ;
                      ; add back in the tspecials [RFC2045], except for
                      ; double-quote and the non-email safe () <>
                      ; note that "cid" defined above ensures that
                      ; cid-block is enclosed with double-quotes

   ; external references
   ; URI-reference: from [RFC3986]

   ;
   ; End of ABNF
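   As a non-normative aid, the Python sketch below parses cm_ and ch_
   parameter values according to the channel-list, command-type,
   chapter-list, f-list, and sysex-data rules above.  The names are
   hypothetical, and the sketch makes several simplifying assumptions.
   The non-sysex form must contain at least one letter.  Lists are
   returned as (low, high) ranges rather than being expanded.  The
   meaning of f-list values for each chapter is not checked.  Letters
   must appear at most once and in the order given by the
   command-part and ch-part rules.

```python
import re

CM_LETTERS = "ABCFGHJKMNPQTVWXYZ"    # command-type letter order
CH_LETTERS = "ABCDEFGHJKMNPQTVWXYZ"  # chapter-list letter order
FORM = re.compile(r"([0-9][0-9.\-]*)?([A-Z]+)([0-9][0-9.\-]*)?")
HEX_OCTET = re.compile(r"[0-9A-F]{2}")  # uppercase only, as the ABNF requires


def hex_octet(text: str) -> int:
    if not HEX_OCTET.fullmatch(text):
        raise ValueError(f"{text!r} is not an uppercase hex-octet")
    return int(text, 16)


def parse_ranges(text: str, upper: int, number=int) -> list:
    """Parse 'a.b-c...' into [(lo, hi), ...]; ranges must be strictly
    increasing and no value may exceed upper."""
    ranges = []
    for element in text.split("."):
        left, dash, right = element.partition("-")
        lo = number(left)
        hi = number(right) if dash else lo
        if dash and not lo < hi:
            raise ValueError(f"range {element!r} must be strictly increasing")
        if hi > upper:
            raise ValueError(f"{element!r} exceeds {upper}")
        ranges.append((lo, hi))
    return ranges


def parse_value(value: str, letters: str):
    """Return ('sysex', [h-list ranges, ...]) or
    ('list', channel_ranges, letter_string, field_ranges)."""
    if value.startswith("__"):
        if len(value) < 6 or not value.endswith("__"):
            raise ValueError("malformed sysex-data")
        h_lists = value[2:-2].split("_")
        return ("sysex", [parse_ranges(h, 0x7F, hex_octet) for h in h_lists])
    m = FORM.fullmatch(value)
    if m is None:
        raise ValueError(f"malformed value {value!r}")
    chans, kinds, fields = m.groups()
    positions = [letters.find(k) for k in kinds]
    if -1 in positions or positions != sorted(set(positions)):
        raise ValueError(f"letters {kinds!r} out of order or not allowed")
    return ("list",
            parse_ranges(chans, 15) if chans else [],
            kinds,
            parse_ranges(fields, 4294967295) if fields else [])


print(parse_value("1M0.1.2", CH_LETTERS))
# prints: ('list', [(1, 1)], 'M', [(0, 0), (1, 1), (2, 2)])
```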


   The mpeg4-generic RTP payload [RFC3640] defines a "mode" parameter
   that signals the type of MPEG stream in use.  We add a new mode
   value, "rtp-midi", using the ABNF rule below:

   ;
   ; mpeg4-generic mode parameter extension
   ;

   mode              =/ "rtp-midi"
                     ; as described in Section 6.2 of this memo
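   A receiver can use this mode value to distinguish RTP MIDI from
   other mpeg4-generic streams.  The Python sketch below shows one such
   check.  It assumes the fmtp parameters have already been parsed into
   an ordered list of (name, value) pairs, and its function name is
   hypothetical.  It compares the media subtype name case-insensitively
   and the parameter name exactly.  As a conservative choice, it
   rejects a line that carries more than one mode parameter.

```python
def is_rtp_midi(encoding_name: str, params) -> bool:
    """Return True if an rtpmap encoding name plus a list of (name, value)
    fmtp parameters describe an mpeg4-generic RTP MIDI stream."""
    # Media subtype names are case-insensitive; parameter names are
    # compared exactly here for simplicity.
    if encoding_name.lower() != "mpeg4-generic":
        return False
    modes = [value for name, value in params if name == "mode"]
    return modes == ["rtp-midi"]


print(is_rtp_midi("mpeg4-generic", [("streamtype", "5"), ("mode", "rtp-midi")]))
# prints: True
```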
