RFC 6295

Proposed STD
Pages: 171
Group: PAYLOAD

RTP Payload Format for MIDI


Obsoletes:    4695


Internet Engineering Task Force (IETF)                        J. Lazzaro
Request for Comments: 6295                                  J. Wawrzynek
Obsoletes: 4695                                              UC Berkeley
Category: Standards Track                                      June 2011
ISSN: 2070-1721


                      RTP Payload Format for MIDI

Abstract

   This memo describes a Real-time Transport Protocol (RTP) payload
   format for the MIDI (Musical Instrument Digital Interface) command
   language.  The format encodes all commands that may legally appear on
   a MIDI 1.0 DIN cable.  The format is suitable for interactive
   applications (such as network musical performance) and content-
   delivery applications (such as file streaming).  The format may be
   used over unicast and multicast UDP and TCP, and it defines tools for
   graceful recovery from packet loss.  Stream behavior, including the
   MIDI rendering method, may be customized during session setup.  The
   format also serves as a mode for the mpeg4-generic format, to support
   the MPEG 4 Audio Object Types for General MIDI, Downloadable Sounds
   Level 2, and Structured Audio.  This document obsoletes RFC 4695.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Further information on
   Internet Standards is available in Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc6295.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Terminology
      1.2. Bitfield Conventions
   2. Packet Format
      2.1. RTP Header
      2.2. MIDI Payload
   3. MIDI Command Section
      3.1. Timestamps
      3.2. Command Coding
   4. The Recovery Journal System
   5. Recovery Journal Format
   6. Session Description Protocol
      6.1. Session Descriptions for Native Streams
      6.2. Session Descriptions for mpeg4-generic Streams
      6.3. Parameters
   7. Extensibility
   8. Congestion Control
   9. Security Considerations
   10. Acknowledgements
   11. IANA Considerations
      11.1. rtp-midi Media Type Registration
           11.1.1. Repository Request for audio/rtp-midi
      11.2. mpeg4-generic Media Type Registration
           11.2.1. Repository Request for Mode rtp-midi for
                   mpeg4-generic
      11.3. asc Media Type Registration
   12. Changes from RFC 4695
   Appendix A. The Recovery Journal Channel Chapters
      A.1. Recovery Journal Definitions
      A.2. Chapter P: MIDI Program Change
      A.3. Chapter C: MIDI Control Change
           A.3.1. Log Inclusion Rules
           A.3.2. Controller Log Format
           A.3.3. Log List Coding Rules
           A.3.4. The Parameter System
      A.4. Chapter M: MIDI Parameter System
           A.4.1. Log Inclusion Rules
           A.4.2. Log Coding Rules
                A.4.2.1. The Value Tool
                A.4.2.2. The Count Tool
      A.5. Chapter W: MIDI Pitch Wheel
      A.6. Chapter N: MIDI NoteOff and NoteOn
           A.6.1. Header Structure
           A.6.2. Note Structures
      A.7. Chapter E: MIDI Note Command Extras
           A.7.1. Note Log Format
           A.7.2. Log Inclusion Rules
      A.8. Chapter T: MIDI Channel Aftertouch
      A.9. Chapter A: MIDI Poly Aftertouch
   Appendix B. The Recovery Journal System Chapters
      B.1. System Chapter D: Simple System Commands
           B.1.1. Undefined System Commands
      B.2. System Chapter V: Active Sense Command
      B.3. System Chapter Q: Sequencer State Commands
           B.3.1. Non-Compliant Sequencers
      B.4. System Chapter F: MIDI Time Code Tape Position
           B.4.1. Partial Frames
      B.5. System Chapter X: System Exclusive
           B.5.1. Chapter Format
           B.5.2. Log Inclusion Semantics
           B.5.3. TCOUNT and COUNT Fields
   Appendix C. Session Configuration Tools
      C.1. Configuration Tools: Stream Subsetting
      C.2. Configuration Tools: The Journalling System
           C.2.1. The j_sec Parameter
           C.2.2. The j_update Parameter
                C.2.2.1. The anchor Sending Policy
                C.2.2.2. The closed-loop Sending Policy
                C.2.2.3. The open-loop Sending Policy
           C.2.3. Recovery Journal Chapter Inclusion Parameters
      C.3. Configuration Tools: Timestamp Semantics
           C.3.1. The comex Algorithm
           C.3.2. The async Algorithm
           C.3.3. The buffer Algorithm
      C.4. Configuration Tools: Packet Timing Tools
           C.4.1. Packet Duration Tools
           C.4.2. The guardtime Parameter
      C.5. Configuration Tools: Stream Description
      C.6. Configuration Tools: MIDI Rendering
           C.6.1. The multimode Parameter
           C.6.2. Renderer Specification
           C.6.3. Renderer Initialization
           C.6.4. MIDI Channel Mapping
                C.6.4.1. The smf_info Parameter
                C.6.4.2. The smf_inline, smf_url, and smf_cid Parameters
                C.6.4.3. The chanmask Parameter
           C.6.5. The audio/asc Media Type
      C.7. Interoperability
           C.7.1. MIDI Content-Streaming Applications
           C.7.2. MIDI Network Musical Performance Applications
   Appendix D. Parameter Syntax Definitions
   Appendix E. A MIDI Overview for Networking Specialists
      E.1. Commands Types
      E.2. Running Status
      E.3. Command Timing
      E.4. AudioSpecificConfig Templates for MMA Renderers
   References
      Normative References
      Informative References

1.  Introduction

   This document obsoletes [RFC4695].

   The Internet Engineering Task Force (IETF) has developed a set of
   focused tools for multimedia networking ([RFC3550] [RFC4566]
   [RFC3261] [RFC2326]).  These tools can be combined in different ways
   to support a variety of real-time applications over Internet Protocol
   (IP) networks.

   For example, a telephony application might use the Session Initiation
   Protocol (SIP, [RFC3261]) to set up a phone call.  Call setup would
   include negotiations to agree on a common audio codec [RFC3264].
   Negotiations would use the Session Description Protocol (SDP,
   [RFC4566]) to describe candidate codecs.

   After a call is set up, audio data would flow between the parties
   using the Real-time Transport Protocol (RTP, [RFC3550]) under any
   applicable profile (for example, the Audio/Visual Profile (AVP,
   [RFC3551])).
   The tools used in this telephony example (SIP, SDP, and RTP) might be
   combined in a different way to support a content-streaming
   application, perhaps in conjunction with other tools, such as the
   Real Time Streaming Protocol (RTSP, [RFC2326]).

   The MIDI (Musical Instrument Digital Interface) command language
   [MIDI] is widely used in musical applications that are analogous to
   the examples described above.  On stage and in the recording studio,
   MIDI is used for the interactive remote control of musical
   instruments, an application similar in spirit to telephony.  On web
   pages, Standard MIDI Files (SMFs, [MIDI]) rendered using the General
   MIDI standard [MIDI] provide a low-bandwidth substitute for audio
   streaming.

   [RFC4695] was motivated by a simple premise: if MIDI performances
   could be sent as RTP streams that are managed by IETF session tools,
   a hybridization of the MIDI and IETF application domains might occur.

   For example, interoperable MIDI networking might foster network music
   performance applications, in which a group of musicians located at
   different physical locations interact over a network to perform as
   they would if they were located in the same room [NMP].  As a second
   example, the streaming community might begin to use MIDI for low-
   bitrate audio coding, perhaps in conjunction with normative sound-
   synthesis methods [MPEGSA].

   Five years after [RFC4695], these applications have not yet reached
   the mainstream.  However, experiments in academia and industry
   continue.  This memo, which obsoletes [RFC4695] and fixes minor
   errata (see Section 12), has been written in service of these
   experiments.

   To enable MIDI applications to use RTP, this memo defines an RTP
   payload format and its media type.  Sections 2-5 and Appendices A and
   B define the RTP payload format.  Section 6 and Appendices C and D
   define the media types identifying the payload format, the parameters
   needed for configuration, and the utilization of the parameters in
   SDP.

   Appendix C also includes interoperability guidelines for the example
   applications described above: network musical performance using SIP
   (Appendix C.7.2) and content streaming using RTSP (Appendix C.7.1).

   Another potential application area for RTP MIDI is MIDI networking
   for professional audio equipment and electronic musical instruments.
   We do not offer interoperability guidelines for this application in
   this memo.  However, RTP MIDI has been designed with stage and studio
   applications in mind, and we expect that efforts to define a stage
   and studio framework will rely on RTP MIDI for MIDI transport
   services.

   Some applications may require MIDI media delivery at a certain
   service quality level (latency, jitter, packet loss, etc.).  RTP
   itself does not provide service guarantees.  However, applications
   may use lower-layer network protocols to configure the quality of the
   transport services that RTP uses.  These protocols may act to reserve
   network resources for RTP flows [RFC2205] or may simply direct RTP
   traffic onto a dedicated "media network" in a local installation.
   Note that RTP and the MIDI payload format do provide tools that
   applications may use to achieve the best possible real-time
   performance at a given service level.

   This memo normatively defines the syntax and semantics of the MIDI
   payload format.  However, this memo does not define algorithms for
   sending and receiving packets.  An ancillary document [RFC4696]
   provides informative guidance on algorithms.  Supplemental
   information may be found in related conference publications [NMP]
   [GRAME].

   Throughout this memo, the phrase "native stream" refers to a stream
   that uses the rtp-midi media type.  The phrase "mpeg4-generic stream"
   refers to a stream that uses the mpeg4-generic media type (in mode
   rtp-midi) to operate in an MPEG 4 environment [RFC3640].  Section 6
   describes this distinction in detail.

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

1.2.  Bitfield Conventions

   Several bitfield coding idioms are used in this document.  As most of
   these idioms only appear in Appendices A and B, we define them in
   Appendix A.1.

   However, a few of these idioms also appear in the main text of this
   document.  For convenience, we describe them below:

   o  R flag bit.  R flag bits are reserved for future use.  Senders
      MUST set R bits to 0.  Receivers MUST ignore R bit values.

   o  LENGTH field.  Each field named LENGTH (as distinct from LEN)
      codes the number of octets in the structure that contains it,
      including the header it resides in and all hierarchical levels
      below it.  If a structure contains a LENGTH field, a receiver MUST
      use the LENGTH field value to advance past the structure during
      parsing, rather than use knowledge about the internal format of
      the structure.

2.  Packet Format

   In this section, we introduce the format of RTP MIDI packets.  The
   description includes some background information on RTP for the
   benefit of MIDI implementors new to IETF tools.  Implementors should
   consult [RFC3550] for an authoritative description of RTP.

   This memo assumes that the reader is familiar with MIDI syntax and
   semantics.  Appendix E provides a MIDI overview, at a level of detail
   sufficient to understand most of this memo.  Implementors should
   consult [MIDI] for an authoritative description of MIDI.

   The MIDI payload format maps a MIDI command stream (16 voice channels
   + systems) onto an RTP stream.  An RTP media stream is a sequence of
   logical packets that share a common format.  Each packet consists of
   two parts: the RTP header and the MIDI payload.  Figure 1 shows this
   format (vertical space delineates the header and payload).

   We describe RTP packets as "logical" packets to highlight the fact
   that RTP itself is not a network-layer protocol.  Instead, RTP
   packets are mapped onto network protocols (such as unicast UDP,
   multicast UDP, or TCP) by an application [ALF].  The interleaved mode
   of the Real Time Streaming Protocol (RTSP, [RFC2326]) is an example
   of an RTP mapping to TCP transport, as is [RFC4571].

2.1.  RTP Header

   [RFC3550] provides a complete description of the RTP header fields.
   In this section, we clarify the role of a few RTP header fields for
   MIDI applications.  All fields are coded in network byte order (big-
   endian).

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | V |P|X|  CC   |M|     PT      |        Sequence number        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           Timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             SSRC                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                     MIDI command section ...                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Journal section ...                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 1 -- Packet Format

   The behavior of the 1-bit M field depends on the media type of the
   stream.  For native streams, the M bit MUST be set to 1 if the MIDI
   command section has a non-zero LEN field and MUST be set to 0
   otherwise.  For mpeg4-generic streams, the M bit MUST be set to 1 for
   all packets (to conform to [RFC3640]).
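
   As an informal illustration (not part of the normative text), the
   following Python sketch packs the 12-octet fixed RTP header shown in
   Figure 1 and applies the M-bit rule above.  The function and
   parameter names are hypothetical, and the sketch assumes a header
   with no padding, no extension, and no CSRC entries.

```python
import struct


def m_bit(midi_list_len: int, mpeg4_generic: bool = False) -> int:
    """Return the M bit for a packet, following the rule in the text."""
    if mpeg4_generic:
        return 1                       # always 1 for mpeg4-generic streams
    return 1 if midi_list_len > 0 else 0   # native: 1 only if LEN is non-zero


def pack_rtp_header(pt: int, seq: int, timestamp: int, ssrc: int,
                    marker: int) -> bytes:
    """Pack a fixed RTP header (V=2, P=0, X=0, CC=0) in network byte order."""
    byte0 = 2 << 6                                 # version 2, other bits zero
    byte1 = ((marker & 1) << 7) | (pt & 0x7F)      # M bit, then 7-bit PT
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
```

   The payload type value passed to pack_rtp_header() would be whatever
   dynamic payload type the session description binds to the stream.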

   In an RTP MIDI stream, the 16-bit sequence number field is
   initialized to a randomly chosen value and is incremented by one
   (modulo 2^16) for each packet sent in the stream.  A related
   quantity, the 32-bit extended packet sequence number, may be computed
   by tracking rollovers of the 16-bit sequence number.  Note that
   different receivers of the same stream may compute different extended
   packet sequence numbers, depending on when the receiver joined the
   session.
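
   A minimal sketch of rollover tracking follows.  It is illustrative
   only: the class name is hypothetical, and the code assumes packets
   are examined in sending order.  A deployed receiver would also need
   to handle reordered and duplicated packets; [RFC3550] gives an
   authoritative treatment.

```python
class ExtendedSeqTracker:
    """Derive a 32-bit extended sequence number from 16-bit values."""

    def __init__(self) -> None:
        self.cycles = 0      # rollovers observed since this receiver joined
        self.last = None     # most recent 16-bit sequence number

    def update(self, seq: int) -> int:
        seq &= 0xFFFF
        if self.last is not None:
            # A large backward jump is interpreted as a wrap through 2^16.
            if seq < self.last and (self.last - seq) > 0x8000:
                self.cycles += 1
        self.last = seq
        return ((self.cycles << 16) | seq) & 0xFFFFFFFF
```

   Because the cycle count starts at zero when the receiver joins, two
   receivers that join at different times can report different
   extended values for the same packet, as noted above.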

   The 32-bit timestamp field sets the base timestamp value for the
   packet.  The payload codes MIDI command timing relative to this
   value.  The timestamp units are set by the clock rate parameter.  For
   example, if the clock rate has a value of 44100 Hz, two packets whose
   base timestamp values differ by 2 seconds have RTP timestamp fields
   that differ by 88200.
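
   The arithmetic in this example can be sketched as follows.  The
   helper names are hypothetical; the only facts taken from the text
   are that timestamp units are ticks of the clock rate and that the
   field is 32 bits wide.

```python
MOD32 = 1 << 32


def timestamp_delta(seconds: float, clock_rate_hz: int) -> int:
    """Convert a duration in seconds to RTP timestamp units."""
    return round(seconds * clock_rate_hz) % MOD32


def advance_timestamp(base: int, seconds: float, clock_rate_hz: int) -> int:
    """Return the base timestamp advanced by a duration, modulo 2^32."""
    return (base + timestamp_delta(seconds, clock_rate_hz)) % MOD32
```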

   Note that the clock rate parameter is not encoded within each RTP
   MIDI packet.  A receiver of an RTP MIDI stream becomes aware of the
   clock rate as part of the session setup process.  For example, if a
   session management tool uses the Session Description Protocol (SDP,
   [RFC4566]) to describe a media session, the clock rate parameter is
   set using the rtpmap attribute.  We show examples of session setup in
   Section 6.
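
   As a hedged illustration of how a receiver might learn the clock
   rate, the sketch below parses an rtpmap attribute of the general
   form defined in [RFC4566].  The function name and the payload type
   number used in the usage note are assumptions made for this example.

```python
def parse_rtpmap(line: str):
    """Parse 'a=rtpmap:<pt> <encoding>/<clock rate>[/<params>]'.

    Returns a (payload_type, encoding_name, clock_rate_hz) tuple.
    """
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    pt_text, _, encoding = line[len(prefix):].strip().partition(" ")
    parts = encoding.strip().split("/")
    if len(parts) < 2:
        raise ValueError("rtpmap attribute lacks a clock rate")
    return int(pt_text), parts[0], int(parts[1])
```

   For instance, parse_rtpmap("a=rtpmap:96 rtp-midi/44100") yields the
   payload type 96, the encoding name rtp-midi, and a clock rate of
   44100 Hz.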

   For RTP MIDI streams destined to be rendered into audio, the clock
   rate SHOULD be an audio sample rate of 32 kHz or higher.  This
   recommendation is due to the sensitivity of human musical perception
   to small timing errors in musical note sequences and due to the
   timbral changes that occur when two near-simultaneous MIDI NoteOns
   are rendered with a different timing than that desired by the content
   author due to clock rate quantization.  RTP MIDI streams that are not
   destined for audio rendering (such as MIDI streams that control stage
   lighting) MAY use a lower clock rate but SHOULD use a clock rate high
   enough to avoid timing artifacts in the application.

   For RTP MIDI streams destined to be rendered into audio, the clock
   rate SHOULD be chosen from rates in common use in professional audio
   applications or in consumer audio distribution.  At the time of this
   writing, these rates include 32 kHz, 44.1 kHz, 48 kHz, 64 kHz, 88.2
   kHz, 96 kHz, 176.4 kHz, and 192 kHz.  If the RTP MIDI session is a
   part of a synchronized media session that includes another (non-MIDI)
   RTP audio stream with a clock rate of 32 kHz or higher, the RTP MIDI
   stream SHOULD use a clock rate that matches the clock rate of the
   other audio stream.  However, if the RTP MIDI stream is destined to
   be rendered into audio, the RTP MIDI stream SHOULD NOT use a clock
   rate lower than 32 kHz, even if this second stream has a clock rate
   lower than 32 kHz.
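
   One possible reading of these recommendations, for a stream that is
   destined to be rendered into audio, is sketched below.  The function
   name and the 44100 Hz fallback are assumptions made for this
   example; the text does not designate a preferred default rate.

```python
MIN_AUDIO_RATE_HZ = 32000


def choose_clock_rate(companion_rate_hz=None, fallback_hz=44100):
    """Pick a clock rate for an RTP MIDI stream rendered into audio."""
    if fallback_hz < MIN_AUDIO_RATE_HZ:
        raise ValueError("fallback rate must be at least 32 kHz")
    if companion_rate_hz is not None and companion_rate_hz >= MIN_AUDIO_RATE_HZ:
        return companion_rate_hz   # match the synchronized audio stream
    return fallback_hz             # stay at or above 32 kHz otherwise
```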

   Timestamps of consecutive packets do not necessarily increment at a
   fixed rate because RTP MIDI packets are not necessarily sent at a
   fixed rate.  The degree of packet transmission regularity reflects
   the underlying application dynamics.  Interactive applications may
   vary the packet-sending rate to track the gestural rate of a human
   performer, whereas content-streaming applications may send packets at
   a fixed rate.

   Therefore, the timestamps for two sequential RTP packets may be
   identical, or the second packet may have a timestamp arbitrarily
   larger than the first packet (modulo 2^32).  Section 3 places
   additional restrictions on the RTP timestamps for two sequential RTP
   packets, as does the guardtime parameter (Appendix C.4.2).

   We use the term "media time" to denote the temporal duration of the
   media coded by an RTP packet.  The media time coded by a packet is
   computed by subtracting the RTP timestamp from the last command
   timestamp in the MIDI command section (modulo 2^32).  If the MIDI
   list of the MIDI command section of a packet is empty, the media time
   coded by the packet is 0 ms.  Appendix C.4.1 discusses media time
   issues in detail.
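
   A minimal sketch of the media time calculation follows.  It assumes
   that each command timestamp falls at or after the packet's RTP
   timestamp, so that the duration is the last command timestamp minus
   the RTP timestamp, taken modulo 2^32.  The function names are
   hypothetical.

```python
MOD32 = 1 << 32


def media_time_ticks(rtp_timestamp: int, last_command_timestamp=None) -> int:
    """Media time in clock ticks; 0 when the MIDI list is empty."""
    if last_command_timestamp is None:     # empty MIDI list
        return 0
    return (last_command_timestamp - rtp_timestamp) % MOD32


def media_time_ms(rtp_timestamp: int, last_command_timestamp,
                  clock_rate_hz: int) -> float:
    """Media time converted to milliseconds for a given clock rate."""
    ticks = media_time_ticks(rtp_timestamp, last_command_timestamp)
    return 1000.0 * ticks / clock_rate_hz
```

   The modular subtraction keeps the result correct when the timestamp
   space wraps between the RTP timestamp and the last command.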

   We now define RTP session semantics, in the context of sessions
   specified using the Session Description Protocol [RFC4566].  A
   session description media line ("m=") specifies an RTP session.  An
   RTP session has an independent space of 2^32 synchronization sources.
   Synchronization source identifiers are coded in the SSRC header field
   of RTP session packets.  The payload types that may appear in the PT
   header field of RTP session packets are listed at the end of the
   media line.

   Several RTP MIDI streams may appear in an RTP session.  Each stream
   is distinguished by a unique SSRC value and has a unique sequence
   number and RTP timestamp space.  Multiple streams in the RTP session
   may be sent by a single party.  Multiple parties may send streams in
   the RTP session.  An RTP MIDI stream encodes data for a single MIDI
   command name space (16 voice channels + systems).

   Streams in an RTP session may use different payload types or they may
   use the same payload type.  However, each party may send, at most,
   one RTP MIDI stream for each payload type mapped to an RTP MIDI
   payload format in an RTP session.  Recall that dynamic binding of
   payload type numbers in [RFC4566] lets a party map many payload type
   numbers to the RTP MIDI payload format; thus, a party may send many
   RTP MIDI streams in a single RTP session.  Pairs of streams (unicast
   or multicast) that communicate between two parties in an RTP session
   and that share a payload type have the same association as a MIDI
   cable pair that cross-connects two devices in a MIDI 1.0 DIN network.

   The RTP session architecture described above is efficient in its use
   of network ports, as one RTP session (using a port pair per party)
   supports the transport of many MIDI name spaces (16 MIDI channels +
   systems).  We define tools for grouping and labelling MIDI name
   spaces across streams and sessions in Appendix C.5 of this memo.

   The RTP header timestamps for each stream in an RTP session have
   separately and randomly chosen initialization values.  Receivers use
   the timing fields encoded in the RTP Control Protocol (RTCP,
   [RFC3550]) sender reports to synchronize the streams sent by a party.
   The SSRC values for each stream in an RTP session are also separately
   and randomly chosen, as described in [RFC3550].  Receivers use the
   CNAME field encoded in RTCP sender reports to verify that streams
   were sent by the same party and to detect SSRC collisions, as
   described in [RFC3550].

   In some applications, a receiver renders MIDI commands into audio (or
   into control actions, such as the rewind of a tape deck or the
   dimming of stage lights).  In other applications, a receiver presents
   a MIDI stream to software programs via an Application Programming
   Interface (API).  Appendix C.6 defines session configuration tools to
   specify what receivers should do with a MIDI command stream.

   If a multimedia session uses different RTP MIDI streams to send
   different classes of media, the streams MUST be sent over different
   RTP sessions.  For example, if a multimedia session uses one MIDI
   stream for audio and a second MIDI stream to control a lighting
   system, the audio and lighting streams MUST be sent over different
   RTP sessions, each with its own media line.

   Session description tools defined in Appendix C.5 let a sending party
   split a single MIDI name space (16 voice channels + systems) over
   several RTP MIDI streams.  Split transport of a MIDI command stream
   is a delicate task because correct command stream reconstruction by a
   receiver depends on exact timing synchronization across the streams.

   To support split name spaces, we define the following requirements:

   o  A party MUST NOT send several RTP MIDI streams that share a MIDI
      name space in the same RTP session.  Instead, each stream MUST be
      sent in a different RTP session.

   o  If several RTP MIDI streams sent by a party share a MIDI name
      space, all streams MUST use the same SSRC value and MUST use the
      same randomly chosen RTP timestamp initialization value.

   These rules let a receiver identify streams that share a MIDI name
   space (by matching SSRC values) and also let a receiver accurately
   reconstruct the source MIDI command stream (by using RTP timestamps
   to interleave commands from the two streams).  Care MUST be taken by
   senders to ensure that SSRC changes due to collisions are reflected
   in both streams.  Receivers MUST regularly examine the RTCP CNAME
   fields associated with the linked streams to ensure that the assumed
   link is legitimate and not the result of an SSRC collision by another
   sender.

   Except for the special cases described above, a party may send many
   RTP MIDI streams in the same session.  However, it is sometimes
   advantageous for two RTP MIDI streams to be sent over different RTP
   sessions.  For example, two streams may need different values for RTP
   session-level attributes (such as the sendonly and recvonly
   attributes).  As a second example, two RTP sessions may be needed to
   send two unicast streams in a multimedia session that originate on
   different computers (with different IP addresses).  Two RTP sessions
   are needed in this case because transport addresses are specified on
   the RTP-session or multimedia-session level, not on a payload type
   level.

   On a final note, in some uses of MIDI, parties send bidirectional
   traffic to conduct transactions (such as file exchange).  The
   commands used for these transactions were designed to work over MIDI
   1.0 DIN cable networks, which may be configured in a multicast
   topology that uses pure "party-line" signalling.  Thus, if a
   multimedia session ensures a multicast connection between all
   parties, bidirectional MIDI commands will work without additional
   support from the RTP MIDI payload format.

2.2.  MIDI Payload

   The payload (Figure 1) MUST begin with the MIDI command section.  The
   MIDI command section codes a (possibly empty) list of timestamped
   MIDI commands and provides the essential service of the payload
   format.

   The payload MAY also contain a journal section.  The journal section
   provides resiliency by coding the recent history of the stream.  A
   flag in the MIDI command section codes the presence of a journal
   section in the payload.

   Section 3 defines the MIDI command section.  Sections 4 and 5 and
   Appendices A and B define the recovery journal, the default format
   for the journal section.  Here, we describe how these payload
   sections operate in a stream in an RTP session.

   The journalling method for a stream is set at the start of a session
   and MUST NOT be changed thereafter.  A stream may be set to use the
   recovery journal, to use an alternative journal format (none are
   defined in this memo), or not to use a journal.

   The default journalling method of a stream is inferred from its
   transport type.  Streams that use unreliable transport (such as UDP)
   default to using the recovery journal.  Streams that use reliable
   transport (such as TCP) default to not using a journal.  Appendix
   C.2.1 defines session configuration tools for overriding these
   defaults.  For all types of transport, a sender MUST transmit an RTP
   packet stream with consecutive sequence numbers (modulo 2^16).

   If a stream uses the recovery journal, every payload in the stream
   MUST include a journal section.  If a stream does not use
   journalling, a journal section MUST NOT appear in a stream payload.
   If a stream uses an alternative journal format, the specification for
   the journal format defines an inclusion policy.
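
   The default and inclusion rules above can be summarized in a short
   sketch.  The function names and the string labels "recovery" and
   "none" are illustrative choices for this example, not values defined
   by this memo, and the sketch ignores any session configuration that
   overrides the defaults.

```python
def default_journal_method(transport: str) -> str:
    """Infer the default journalling method from the transport type."""
    t = transport.strip().lower()
    if t == "udp":
        return "recovery"      # unreliable transport: recovery journal
    if t == "tcp":
        return "none"          # reliable transport: no journal
    raise ValueError(f"unknown transport: {transport!r}")


def journal_section_expected(method: str):
    """True if every payload carries a journal, False if none may.

    Returns None for an alternative journal format, whose own
    specification defines the inclusion policy.
    """
    if method == "recovery":
        return True
    if method == "none":
        return False
    return None
```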

   If a stream is sent over UDP transport, the Maximum Transmission Unit
   (MTU) of the underlying network limits the practical size of the
   payload section (for example, an Ethernet MTU is 1500 octets) for
   applications where predictable and minimal packet transmission
   latency is critical.  A sender SHOULD NOT create RTP MIDI UDP packets
   whose sizes exceed the MTU of the underlying network.  Instead, the
   sender SHOULD take steps to keep the maximum packet size under the
   MTU limit.

   These steps may take many forms.  The default closed-loop recovery
   journal sending policy (defined in Appendix C.2.2.2) uses RTP Control
   Protocol (RTCP, [RFC3550]) feedback to manage the RTP MIDI packet
   size.  In addition, Section 3.2 and Appendix B.5.2 provide specific
   tools for managing the size of packets that code MIDI System
   Exclusive (0xF0) commands.  Appendix C.5 defines session
   configuration tools that may be used to split a dense MIDI name space
   into several UDP streams (each sent in a different RTP session, per
   Section 2.1) so that the payload fits comfortably into an MTU.
   Another option is to use TCP.  Section 4.3 of [RFC4696] provides non-
   normative advice for packet size management.

3.  MIDI Command Section

   Figure 2 shows the format of the MIDI command section.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |B|J|Z|P|LEN... |  MIDI list ...                                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 2 -- MIDI Command Section

   The MIDI command section begins with a variable-length header.

   The header field LEN codes the number of octets in the MIDI list that
   follow the header.  If the header flag B is 0, the header is one
   octet long, and LEN is a 4-bit field, supporting a maximum MIDI list
   length of 15 octets.

   If B is 1, the header is two octets long, and LEN is a 12-bit field,
   supporting a maximum MIDI list length of 4095 octets.  LEN is coded
   in network byte order (big-endian): the 4 bits of LEN that appear in
   the first header octet code the most significant 4 bits of the 12-bit
   LEN value.

   A LEN value of 0 is legal, and it codes an empty MIDI list.

   If the J header bit is set to 1, a journal section MUST appear after
   the MIDI command section in the payload.  If the J header bit is set
   to 0, the payload MUST NOT contain a journal section.

   We define the semantics of the P header bit in Section 3.2.
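
   As a non-normative illustration, the header layout in Figure 2 can
   be decoded as sketched below.  The function name and the shape of
   the return value are assumptions made for this example; they are
   not defined by this memo.

```python
def parse_command_section_header(payload: bytes):
    """Decode the MIDI command section header shown in Figure 2.

    Returns (flags, length, header_size), where flags maps the header
    bit names to 0 or 1, length is the decoded LEN value, and
    header_size is the number of header octets (1 or 2).
    """
    if not payload:
        raise ValueError("payload is empty")
    first = payload[0]
    flags = {
        "B": (first >> 7) & 1,  # bit 0 of Figure 2 is the most significant bit
        "J": (first >> 6) & 1,
        "Z": (first >> 5) & 1,
        "P": (first >> 4) & 1,
    }
    length = first & 0x0F       # 4-bit LEN, or the top 4 bits of a 12-bit LEN
    header_size = 1
    if flags["B"]:
        if len(payload) < 2:
            raise ValueError("two-octet header is truncated")
        length = (length << 8) | payload[1]  # big-endian 12-bit LEN
        header_size = 2
    return flags, length, header_size
```

   The MIDI list then occupies the `length` octets that start at
   offset `header_size`; if J is 1, the journal section follows it.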

   If the LEN header field is nonzero, the MIDI list has the structure
   shown in Figure 3.

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Delta Time 0     (1-4 octets long, or 0 octets if Z = 0)     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MIDI Command 0   (1 or more octets long)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Delta Time 1     (1-4 octets long)                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MIDI Command 1   (1 or more octets long)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Delta Time N     (1-4 octets long)                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MIDI Command N   (0 or more octets long)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 3 -- MIDI List Structure

   If the header flag Z is 1, the MIDI list begins with a complete MIDI
   command (coded in the MIDI Command 0 field in Figure 3) preceded by a
   delta time (coded in the Delta Time 0 field).  If Z is 0, the Delta
   Time 0 field is not present in the MIDI list, and the command coded
   in the MIDI Command 0 field has an implicit delta time of 0.

   The MIDI list structure may also encode a list of N
   additional complete MIDI commands, each coded in a MIDI Command K
   field.  Each additional command MUST be preceded by a Delta Time K
   field, which codes the command's delta time.  We discuss exceptions
   to the "command fields code complete MIDI commands" rule in Section
   3.2.

   The final MIDI command field (i.e., the MIDI Command N field, shown
   in Figure 3) in the MIDI list MAY be empty.  Moreover, a MIDI list
   MAY consist of a single delta time (encoded in the Delta Time 0
   field) without an associated command (which would have been encoded
   in the MIDI Command 0 field).  These rules enable MIDI coding
   features that are explained in Section 3.1.  We delay the
   explanations because an understanding of RTP MIDI timestamps is
   necessary to describe the features.

3.1.  Timestamps

   In this section, we describe how RTP MIDI encodes a timestamp for
   each MIDI list command.  Command timestamps have the same units as
   RTP packet header timestamps (described in Section 2.1 and
   [RFC3550]).  Recall that RTP timestamps have units of seconds, whose
   scaling is set during session configuration (see Section 6.1 and
   [RFC4566]).

   As shown in Figure 3, the MIDI list encodes time using a compact
   delta time format.  The RTP MIDI delta time syntax is a modified form
   of the MIDI File delta time syntax [MIDI].  RTP MIDI delta times use
   1-4 octet fields to encode 32-bit unsigned integers.  Figure 4 shows
   the encoded and decoded forms of delta times.  Note that delta time
   values may be legally encoded in multiple formats; for example, there
   are four legal ways to encode the zero delta time (0x00, 0x8000,
   0x808000, 0x80808000).

   RTP MIDI uses delta times to encode a timestamp for each MIDI
   command.  The timestamp for MIDI Command K is the summation (modulo
   2^32) of the RTP timestamp and decoded delta times 0 through K.  This
   cumulative coding technique, borrowed from MIDI File delta time
   coding, is efficient because it reduces the number of multi-octet
   delta times.
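
   The cumulative rule can be sketched as follows.  This is a
   non-normative illustration; the function name is an assumption of
   this example.  When Z is 0, the caller is assumed to supply 0 as
   the first list entry to stand for the implicit delta time of MIDI
   Command 0.

```python
def command_timestamps(rtp_timestamp: int, delta_times):
    """Return the timestamp of each command in a MIDI list.

    delta_times holds the decoded delta times 0 through N, in order.
    The timestamp of command K is the RTP timestamp plus delta times
    0 through K, reduced modulo 2^32.
    """
    timestamps = []
    current = rtp_timestamp
    for delta in delta_times:
        current = (current + delta) % (1 << 32)
        timestamps.append(current)
    return timestamps
```

   For example, an RTP timestamp of 1000 with delta times 0, 10, and 5
   yields command timestamps 1000, 1010, and 1015.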

   All command timestamps in a packet MUST be less than or equal to the
   RTP timestamp of the next packet in the stream (modulo 2^32).

   This restriction ensures that a particular RTP MIDI packet in a
   stream is uniquely responsible for encoding time, starting at the
   moment after the RTP timestamp encoded in the RTP packet header and
   ending at the moment before the final command timestamp encoded in
   the MIDI list.  The "moment before" and "moment after" qualifiers
   acknowledge the "less than or equal" semantics (as opposed to
   "strictly less than") in the sentence above this paragraph.

   Note that it is possible to "pad" the end of an RTP MIDI packet with
   time that is guaranteed to be void of MIDI commands, by setting the
   "Delta Time N" field of the MIDI list to the end of the void time and
   by omitting its corresponding "MIDI Command N" field (a syntactic
   construction the preamble of Section 3 expressly made legal).

   In addition, it is possible to code an RTP MIDI packet to express
   that a period of time in the stream is void of MIDI commands.  The
   RTP timestamp in the header would code the start of the void time.
   The MIDI list of this packet would consist of a "Delta Time 0" field
   that coded the end of the void time.  No other fields would be
   present in the MIDI list (a syntactic construction the preamble of
   Section 3 also expressly made legal).

   By default, a command timestamp indicates the execution time for the
   command.  The difference between two timestamps indicates the time
   delay between the execution of the commands.  This difference may be
   zero, coding simultaneous execution.  In this memo, we refer to this
   interpretation of timestamps as "comex" (COMmand EXecution)
   semantics.  We formally define comex semantics in Appendix C.3.

   The comex interpretation of timestamps works well for transcoding a
   Standard MIDI File (SMF) into an RTP MIDI stream, as SMFs code a
   timestamp for each MIDI command stored in the file.  To transcode an
   SMF that uses metric time markers, use the SMF tempo map (encoded in
   the SMF as meta-events) to convert metric SMF timestamp units into
   seconds-based RTP timestamp units.

   The comex interpretation also works well for MIDI hardware
   controllers that are coding raw sensor data directly onto an RTP MIDI
   stream.  Note that this controller design is preferable to a design
   that converts raw sensor data into a MIDI 1.0 cable command stream
   and then transcodes the stream onto an RTP MIDI stream.

   The comex interpretation of timestamps is usually not the best
   timestamp interpretation for transcoding a MIDI source that uses
   implicit command timing (such as MIDI 1.0 DIN cables) into an RTP
   MIDI stream.  Appendix C.3 defines alternatives to comex semantics
   and describes session configuration tools for selecting the timestamp
   interpretation semantics for a stream.

        One-Octet Delta Time:

           Encoded form: 0ddddddd
           Decoded form: 00000000 00000000 00000000 0ddddddd

        Two-Octet Delta Time:

           Encoded form: 1ccccccc 0ddddddd
           Decoded form: 00000000 00000000 00cccccc cddddddd

        Three-Octet Delta Time:

           Encoded form: 1bbbbbbb 1ccccccc 0ddddddd
           Decoded form: 00000000 000bbbbb bbcccccc cddddddd

        Four-Octet Delta Time:

           Encoded form: 1aaaaaaa 1bbbbbbb 1ccccccc 0ddddddd
           Decoded form: 0000aaaa aaabbbbb bbcccccc cddddddd

                  Figure 4 -- Decoding Delta Time Formats
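
   A minimal, non-normative sketch of the formats in Figure 4 follows.
   The function names are assumptions of this example.  The encoder
   always emits the shortest form, which is one choice among the legal
   encodings; the decoder accepts any of the 1-4 octet forms.  Note
   that the four-octet form in Figure 4 carries 28 value bits.

```python
def decode_delta_time(data: bytes, offset: int = 0):
    """Decode one delta time; return (value, octets_consumed)."""
    value = 0
    for count in range(4):
        if offset + count >= len(data):
            raise ValueError("delta time is truncated")
        octet = data[offset + count]
        value = (value << 7) | (octet & 0x7F)
        if octet & 0x80 == 0:       # a clear top bit marks the final octet
            return value, count + 1
    raise ValueError("delta time exceeds four octets")


def encode_delta_time(value: int) -> bytes:
    """Encode a delta time using the fewest octets."""
    if not 0 <= value < (1 << 28):
        raise ValueError("value does not fit in a four-octet delta time")
    groups = [value & 0x7F]         # final octet has its top bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))
```

   Under this sketch, decode_delta_time applied to the octets 0x80
   0x00 returns the value 0 and a length of 2, matching one of the
   alternate encodings of the zero delta time.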

3.2.  Command Coding

   Each non-empty MIDI Command field in the MIDI list codes one of the
   MIDI command types that may legally appear on a MIDI 1.0 DIN cable.
   Standard MIDI File meta-events do not fit this definition and MUST
   NOT appear in the MIDI list.  As a rule, each MIDI Command field
   codes a complete command, in the binary command format defined in
   [MIDI].  In the remainder of this section, we describe exceptions to
   this rule.

   The first MIDI channel command in the MIDI list MUST include a status
   octet.  Running status coding, as defined in [MIDI], MAY be used for
   all subsequent MIDI channel commands in the list.  As in [MIDI],
   System Common and System Exclusive messages (0xF0 ... 0xF7) cancel
   the running status state, but System Real-Time messages (0xF8 ...
   0xFF) do not affect the running status state.  All system commands in
   the MIDI list MUST include a status octet.
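
   The running status rules above can be sketched as a small state
   update.  This is a non-normative illustration; the function name
   and the use of None to represent "no running status" are
   assumptions of this example.

```python
from typing import Optional


def update_running_status(current: Optional[int], status_octet: int) -> Optional[int]:
    """Return the running status state after a status octet is seen."""
    if not 0x80 <= status_octet <= 0xFF:
        raise ValueError("not a status octet")
    if status_octet < 0xF0:
        return status_octet   # channel command: becomes the running status
    if status_octet <= 0xF7:
        return None           # System Common or SysEx: cancels running status
    return current            # System Real-Time: state is unaffected
```

   A parser would consult this state when a MIDI Command field for a
   channel command begins with a data octet rather than a status octet.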

   As we note above, the first channel command in the MIDI list MUST
   include a status octet.  However, the corresponding command in the
   original MIDI source data stream might not have a status octet (in
   this case, the source would be coding the command using running
   status).  If the status octet of the first channel command in the
   MIDI list does not appear in the source data stream, the P (phantom)
   header bit MUST be set to 1.  In all other cases, the P bit MUST be
   set to 0.

   Note that the P bit describes the MIDI source data stream, not the
   MIDI list encoding; regardless of the state of the P bit, the MIDI
   list MUST include the status octet.

   As receivers MUST be able to decode running status, sender
   implementors should feel free to use running status to improve
   bandwidth efficiency.  However, senders SHOULD NOT introduce timing
   jitter into an existing MIDI command stream through an inappropriate
   use or removal of running status coding.  This warning primarily
   applies to senders whose RTP MIDI streams may be transcoded onto a
   MIDI 1.0 DIN cable [MIDI] by the receiver: both the timestamps and
   the command coding (running status or not) must comply with the
   physical restrictions of implicit time coding over a slow serial
   line.

   On a MIDI 1.0 DIN cable [MIDI], a System Real-Time command may be
   embedded inside of another "host" MIDI command.  This syntactic
   construction is not supported in the payload format: a MIDI Command
   field in the MIDI list codes exactly one MIDI command (partially or
   completely).

   To encode an embedded System Real-Time command, senders MUST extract
   the command from its host and code it in the MIDI list as a separate
   command.  The host command and System Real-Time command SHOULD appear
   in the same MIDI list.  The delta time of the System Real-Time
   command SHOULD result in a command timestamp that encodes the System
   Real-Time command placement in its original embedded position.
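
   The extraction step can be sketched as follows.  This non-normative
   example assumes that the sender holds the raw octets of one host
   command as they appeared on the cable and relies on the fact that
   System Real-Time commands are single octets in the range 0xF8 to
   0xFF.  The function name and return shape are assumptions of this
   example.

```python
def extract_real_time(raw: bytes):
    """Separate embedded System Real-Time octets from a host command.

    Returns (host_command, embedded), where embedded is a list of
    (index, octet) pairs giving each System Real-Time octet and its
    position in the raw input.
    """
    host = bytearray()
    embedded = []
    for index, octet in enumerate(raw):
        if octet >= 0xF8:
            embedded.append((index, octet))
        else:
            host.append(octet)
    return bytes(host), embedded
```

   The recorded positions let the sender choose delta times that
   reflect where each System Real-Time command sat inside its host.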

   Two methods are provided for encoding MIDI System Exclusive (SysEx)
   commands in the MIDI list.  A SysEx command may be encoded in a MIDI
   Command field verbatim: a 0xF0 octet, followed by an arbitrary number
   of data octets, followed by a 0xF7 octet.

   Alternatively, a SysEx command may be encoded as multiple segments.
   The command is divided into two or more SysEx command segments; each
   segment is encoded in its own MIDI Command field in the MIDI list.

   The payload format supports segmentation in order to encode SysEx
   commands that encode information in the temporal pattern of data
   octets.  By encoding these commands as a series of segments, each
   data octet may be associated with a distinct delta time.
   Segmentation also supports the coding of large SysEx commands across
   several packets.

   To segment a SysEx command, first partition its data octet list into
   two or more sublists.  The last sublist MAY be empty (i.e., contain
   no octets); all other sublists MUST contain at least one data octet.
   To complete the segmentation, add the status octets defined in Figure
   5 to the head and tail of the first, last, and any "middle" sublists.
   Figure 6 shows example segmentations of a SysEx command.

   A sender MAY cancel a segmented SysEx command transmission that is in
   progress by sending the "cancel" sublist shown in Figure 5.  A
   "cancel" sublist MAY follow a "first" or "middle" sublist in the
   transmission but MUST NOT follow a "last" sublist.  The cancel MUST
   be empty (thus, 0xF7 0xF4 is the only legal cancel sublist).

   The cancellation feature is needed because Appendix C.1 defines
   configuration tools that let session parties exclude certain SysEx
   commands in the stream.  Senders that transcode a MIDI source onto an
   RTP MIDI stream under these constraints have the responsibility of
   excluding undesired commands from the RTP MIDI stream.

   The cancellation feature lets a sender start the transmission of a
   command before the MIDI source has sent the entire command.  If a
   sender determines that the command whose transmission is in progress
   should not appear on the RTP stream, it cancels the command.  Without
   a method for cancelling a SysEx command transmission, senders would
   be forced to use a high-latency store-and-forward approach to
   transcoding SysEx commands onto RTP MIDI packets, in order to
   validate each SysEx command before transmission.

   The recommended receiver reaction to a cancellation depends on the
   capabilities of the receiver.  For example, a sound synthesizer that
   is directly parsing RTP MIDI packets and rendering them to audio will
   be aware of the fact that SysEx commands may be cancelled in RTP
   MIDI.  These receivers SHOULD detect a SysEx cancellation in the MIDI
   list and act as if they had never received the SysEx command.

   As a second example, a synthesizer may be receiving MIDI data from an
   RTP MIDI stream via a MIDI DIN cable (or a software API emulation of
   a MIDI DIN cable).  In this case, an RTP-MIDI-aware system receives
   the RTP MIDI stream and transcodes it onto the MIDI DIN cable (or its
   emulation).  Upon the receipt of the cancel sublist, the RTP-MIDI-
   aware transcoder might have already sent the first part of the SysEx
   command on the MIDI DIN cable to the receiver.

   Unfortunately, the MIDI DIN cable protocol cannot directly code
   "cancel SysEx in progress" semantics.  However, MIDI DIN cable
   receivers begin SysEx processing after the complete command arrives.
   The receiver checks to see if it recognizes the command (coded in the
   first few octets) and then checks to see if the command is the
   correct length.  Thus, in practice, a transcoder can cancel a SysEx
   command by sending an 0xF7 to (prematurely) end the SysEx command --
   the receiver will detect the incorrect command length and discard the
   command.

   Appendix C.1 defines configuration tools that may be used to prohibit
   SysEx command cancellation.

   The relative ordering of SysEx command segments in a MIDI list must
   match the relative ordering of the sublists in the original SysEx
   command.  By default, commands other than System Real-Time MIDI
   commands MUST NOT appear between SysEx command segments (Appendix C.1
   defines configuration tools to change this default to let other
   command types appear between segments).  If the command segments of
   a SysEx command are placed in the MIDI lists of two or more RTP
   packets, the segment ordering rules apply to the concatenation of all
   affected MIDI lists.

          -----------------------------------------------------------
         | Sublist Position |  Head Status Octet | Tail Status Octet |
         |-----------------------------------------------------------|
         |    first         |       0xF0         |       0xF0        |
         |-----------------------------------------------------------|
         |    middle        |       0xF7         |       0xF0        |
         |-----------------------------------------------------------|
         |    last          |       0xF7         |       0xF7        |
         |-----------------------------------------------------------|
         |    cancel        |       0xF7         |       0xF4        |
          -----------------------------------------------------------

               Figure 5 -- Command Segmentation Status Octets

   [MIDI] permits 0xF7 octets that are not part of a (0xF0, 0xF7) pair
   to appear on a MIDI 1.0 DIN cable.  Unpaired 0xF7 octets have no
   semantic meaning in MIDI apart from cancelling running status.

   Unpaired 0xF7 octets MUST NOT appear in the MIDI list of the MIDI
   Command section.  We impose this restriction to avoid interference
   with the command segmentation coding defined in Figure 5.

   SysEx commands carried on a MIDI 1.0 DIN cable may use the "dropped
   0xF7" construction [MIDI].  In this coding method, the 0xF7 octet is
   dropped from the end of the SysEx command, and the status octet of
   the next MIDI command acts both to terminate the SysEx command and
   start the next command.  To encode this construction in the payload
   format, follow these steps:

   o  Determine the appropriate delta times for the SysEx command and
      the command that follows the SysEx command.

   o  Insert the "dropped" 0xF7 octet at the end of the SysEx command to
      form the standard SysEx syntax.

   o  Code both commands into the MIDI list using the rules above.

   o  Replace the 0xF7 octet that terminates the verbatim SysEx encoding
      or the last segment of the segmented SysEx encoding with a 0xF5
      octet.  This substitution informs the receiver of the original
      "dropped 0xF7" coding.

   [MIDI] reserves the undefined System Common commands 0xF4 and 0xF5
   and the undefined System Real-Time commands 0xF9 and 0xFD for future
   use.  By default, undefined commands MUST NOT appear in a MIDI
   Command field in the MIDI list, with the exception of the 0xF5 octets
   used to code the "dropped 0xF7" construction and the 0xF4 octets used
   by SysEx "cancel" sublists.
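
   The "dropped 0xF7" steps listed earlier in this section can be
   sketched for the verbatim SysEx encoding as follows.  This is a
   non-normative illustration: the function name is an assumption, and
   the input is assumed to be the source SysEx command without its
   terminating 0xF7 and with any embedded System Real-Time octets
   already extracted.

```python
def encode_dropped_f7_sysex(source: bytes) -> bytes:
    """Build the MIDI Command field for a SysEx that dropped its 0xF7."""
    if not source or source[0] != 0xF0:
        raise ValueError("SysEx command must begin with 0xF0")
    if any(octet >= 0x80 for octet in source[1:]):
        raise ValueError("expected only data octets after 0xF0")
    standard = source + b"\xf7"     # restore the standard SysEx syntax
    return standard[:-1] + b"\xf5"  # mark the original coding with 0xF5
```

   The command that follows the SysEx command is coded in its own MIDI
   Command field with its own delta time.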

   During session configuration, a stream may be customized to transport
   undefined commands (Appendix C.1).  For this case, we now define how
   senders encode undefined commands in the MIDI list.

   An undefined System Real-Time command MUST be coded using the System
   Real-Time rules.

   If the undefined System Common commands are put to use in a future
   version of [MIDI], the command will begin with an 0xF4 or 0xF5 status
   octet, followed by an arbitrary number of data octets (i.e., zero or
   more data bytes).  To encode these commands, senders MUST terminate
   the command with an 0xF7 octet and place the modified command into
   the MIDI Command field.

   Unfortunately, non-compliant uses of the undefined System Common
   commands may appear in MIDI implementations.  To model these
   commands, we assume that the command begins with an 0xF4 or 0xF5
   status octet, followed by zero or more data octets, followed by zero
   or more trailing 0xF7 status octets.  To encode the command, senders
   MUST first remove all trailing 0xF7 status octets from the command.
   Then, senders MUST terminate the command with an 0xF7 octet and place
   the modified command into the MIDI Command field.

   Note that we include the trailing octets in our model as a cautionary
   measure: if such commands appeared in a non-compliant use of an
   undefined System Common command, an RTP MIDI encoding of the command
   that did not remove trailing octets could be mistaken for an encoding
   of the "middle" or "last" sublist of a segmented SysEx command
   (Figure 5) under certain packet loss conditions.
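
   The encoding rule for undefined System Common commands, including
   the removal of trailing 0xF7 octets, can be sketched as follows.
   This is a non-normative illustration; the function name is an
   assumption of this example.

```python
def encode_undefined_system_common(command: bytes) -> bytes:
    """Encode an undefined System Common command for the MIDI list."""
    if not command or command[0] not in (0xF4, 0xF5):
        raise ValueError("command must begin with 0xF4 or 0xF5")
    body = command.rstrip(b"\xf7")  # remove all trailing 0xF7 status octets
    return body + b"\xf7"           # terminate with exactly one 0xF7
```

   A compliant command that has no trailing 0xF7 octets passes through
   the same path and simply gains its terminator.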

          Original SysEx command:

              0xF0 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

          A two-segment segmentation:

              0xF0 0x01 0x02 0x03 0x04 0xF0

              0xF7 0x05 0x06 0x07 0x08 0xF7

          A different two-segment segmentation:

              0xF0 0x01 0xF0

              0xF7 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

          A three-segment segmentation:

              0xF0 0x01 0x02 0xF0

              0xF7 0x03 0x04 0xF0

              0xF7 0x05 0x06 0x07 0x08 0xF7

          The segmentation with the largest number of segments:

              0xF0 0x01 0xF0

              0xF7 0x02 0xF0

              0xF7 0x03 0xF0

              0xF7 0x04 0xF0

              0xF7 0x05 0xF0

              0xF7 0x06 0xF0

              0xF7 0x07 0xF0

              0xF7 0x08 0xF0

              0xF7 0xF7


                     Figure 6 -- Example Segmentations
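
   The segmentation procedure of Section 3.2 and the status octets of
   Figure 5 can be sketched as follows.  This is a non-normative
   illustration: the function name and the cut_points parameter (the
   indices at which the data octet list is partitioned) are
   assumptions of this example.

```python
def segment_sysex(command: bytes, cut_points):
    """Split a verbatim SysEx command into SysEx command segments."""
    if len(command) < 2 or command[0] != 0xF0 or command[-1] != 0xF7:
        raise ValueError("expected a verbatim SysEx command")
    data = command[1:-1]
    bounds = [0, *cut_points, len(data)]
    if bounds != sorted(bounds):
        raise ValueError("cut points must be ordered and within the data")
    sublists = [data[a:b] for a, b in zip(bounds, bounds[1:])]
    if len(sublists) < 2:
        raise ValueError("segmentation needs two or more sublists")
    if any(len(sub) == 0 for sub in sublists[:-1]):
        raise ValueError("only the last sublist may be empty")
    segments = []
    last = len(sublists) - 1
    for index, sub in enumerate(sublists):
        head = 0xF0 if index == 0 else 0xF7     # "first" vs. later sublists
        tail = 0xF7 if index == last else 0xF0  # "last" vs. earlier sublists
        segments.append(bytes([head]) + sub + bytes([tail]))
    return segments


# The only legal "cancel" sublist, per Figure 5.
CANCEL_SUBLIST = bytes([0xF7, 0xF4])
```

   With the original command of Figure 6, cut points of [4], [1], and
   [2, 4] reproduce the two- and three-segment examples, and cut points
   1 through 8 reproduce the segmentation with the largest number of
   segments, whose last sublist is empty.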


