RFC 4695


RTP Payload Format for MIDI

E.  A MIDI Overview for Networking Specialists

   This appendix presents an overview of the MIDI standard, for the
   benefit of networking specialists new to musical applications.
   Implementors should consult [MIDI] for a normative description of
   MIDI.

   Musicians make music by performing a controlled sequence of physical
   movements.  For example, a pianist plays by coordinating a series of
   key presses, key releases, and pedal actions.  MIDI represents a
   musical performance by encoding these physical gestures as a sequence
   of MIDI commands.  This high-level musical representation is compact
   but fragile: one lost command may be catastrophic to the performance.

   MIDI commands have much in common with the machine instructions of a
   microprocessor.  MIDI commands are defined as binary elements.
   Bitfields within a MIDI command have a regular structure and a
   specialized purpose.  For example, the upper nibble of the first
   command octet (the opcode field) codes the command type.  MIDI
   commands may consist of an arbitrary number of complete octets, but
   most MIDI commands are 1, 2, or 3 octets in length.

       |     Channel Voice Messages     |      Bitfield Pattern      |
       | NoteOff (end a note)           | 1000cccc 0nnnnnnn 0vvvvvvv |
       | NoteOn (start a note)          | 1001cccc 0nnnnnnn 0vvvvvvv |
       | PTouch (Polyphonic Aftertouch) | 1010cccc 0nnnnnnn 0aaaaaaa |
       | CControl (Controller Change)   | 1011cccc 0xxxxxxx 0yyyyyyy |
       | PChange (Program Change)       | 1100cccc 0ppppppp          |
       | CTouch (Channel Aftertouch)    | 1101cccc 0aaaaaaa          |
       | PWheel (Pitch Wheel)           | 1110cccc 0xxxxxxx 0yyyyyyy |

                 Figure E.1 -- MIDI Channel Messages

       |      System Common Messages    |     Bitfield Pattern       |
       | System Exclusive               | 11110000, followed by a    |
       |                                | list of 0xxxxxxx octets,   |
       |                                | followed by 11110111       |
       | MIDI Time Code Quarter Frame   | 11110001 0xxxxxxx          |
       | Song Position Pointer          | 11110010 0xxxxxxx 0yyyyyyy |
       | Song Select                    | 11110011 0xxxxxxx          |
       | Undefined                      | 11110100                   |
       | Undefined                      | 11110101                   |
       | Tune Request                   | 11110110                   |
       | System Exclusive End Marker    | 11110111                   |

       |    System Realtime Messages    |     Bitfield Pattern       |
       | Clock                          | 11111000                   |
       | Undefined                      | 11111001                   |
       | Start                          | 11111010                   |
       | Continue                       | 11111011                   |
       | Stop                           | 11111100                   |
       | Undefined                      | 11111101                   |
       | Active Sense                   | 11111110                   |
       | System Reset                   | 11111111                   |

                      Figure E.2 -- MIDI System Messages

   Figures E.1 and E.2 show the MIDI command family.  There are three
   major classes of commands: voice commands (opcode field values in the
   range 0x8 through 0xE), system common commands (opcode field 0xF,
   commands 0xF0 through 0xF7), and system real-time commands (opcode
   field 0xF, commands 0xF8 through 0xFF).  Voice commands code the
   musical gestures for each timbre in a composition.  System commands
   perform functions that usually affect all voice channels, such as
   System Reset (0xFF).
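
   The three command classes can be told apart from the first octet
   alone.  The following Python fragment is a minimal sketch of that
   test; the function name classify_status is hypothetical and is not
   taken from [MIDI].

```python
def classify_status(status: int) -> str:
    """Classify the first octet of a MIDI command into its command class."""
    if not 0x80 <= status <= 0xFF:
        raise ValueError("not a first octet: the leading bit must be 1")
    opcode = status >> 4            # upper nibble: the opcode field
    if opcode != 0xF:
        return "voice"              # opcode field 0x8 through 0xE
    if status <= 0xF7:
        return "system common"      # commands 0xF0 through 0xF7
    return "system real-time"       # commands 0xF8 through 0xFF


print(classify_status(0x93))   # a NoteOn command
print(classify_status(0xF2))   # Song Position Pointer
print(classify_status(0xFF))   # System Reset
```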

E.1.  Command Types

   A voice command executes on one of 16 MIDI channels, as coded by its
   4-bit channel field (field cccc in Figure E.1).  In most
   applications, notes for different timbres are assigned to different
   channels.  To support applications that require more than 16
   channels, MIDI systems use several MIDI command streams in parallel,
   to yield 32, 48, or 64 MIDI channels.

   As an example of a voice command, consider a NoteOn command (opcode
   0x9), with binary encoding 1001cccc 0nnnnnnn 0vvvvvvv.  This command
   signals the start of a musical note on MIDI channel cccc.  The note
   has a pitch coded by the note number nnnnnnn, and an onset amplitude
   coded by note velocity vvvvvvv.
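
   As an illustration, the sketch below unpacks the three octets of a
   NoteOn command into its fields.  The names NoteOn and decode_note_on
   are hypothetical, and the channel is reported as the raw value of
   the 4-bit field (0 through 15).

```python
from typing import NamedTuple


class NoteOn(NamedTuple):
    channel: int    # value of the 4-bit channel field, 0 through 15
    note: int       # 7-bit note number
    velocity: int   # 7-bit note velocity


def decode_note_on(octets: bytes) -> NoteOn:
    """Decode a complete 3-octet NoteOn command (no running status)."""
    if len(octets) != 3:
        raise ValueError("a NoteOn command is 3 octets long")
    status, note, velocity = octets
    if status >> 4 != 0x9:
        raise ValueError("opcode field is not 0x9 (NoteOn)")
    if note & 0x80 or velocity & 0x80:
        raise ValueError("the leading bit of a data octet must be 0")
    return NoteOn(channel=status & 0x0F, note=note, velocity=velocity)


print(decode_note_on(bytes([0x92, 60, 100])))
```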

   Other voice commands signal the end of notes (NoteOff, opcode 0x8),
   map a specific timbre to a MIDI channel (PChange, opcode 0xC), or set
   the value of parameters that modulate the timbral quality (all other
   voice commands).  The exact meaning of most voice channel commands
   depends on the rendering algorithms the MIDI receiver uses to
   generate sound.  In most applications, a MIDI sender has a model (in
   some sense) of the rendering method used by the receiver.

   System commands perform a variety of global tasks in the stream,
   including "sequencer" playback control of pre-recorded MIDI commands
   (the Song Position Pointer, Song Select, Clock, Start, Continue, and
   Stop messages), SMPTE time code (the MIDI Time Code Quarter Frame
   command), and the communication of device-specific data (the System
   Exclusive messages).

E.2.  Running Status

   All MIDI command bitfields share a special structure: the leading bit
   of the first octet is set to 1, and the leading bit of all subsequent
   octets is set to 0.  This structure supports a data compression
   system, called running status [MIDI], that improves the coding
   efficiency of MIDI.

   In running status coding, the first octet of a MIDI voice command may
   be dropped if it is identical to the first octet of the previous MIDI
   voice command.  This rule, in combination with a convention to
   consider NoteOn commands with a null third octet as NoteOff commands,
   supports the coding of note sequences using two octets per command.

   Running status coding is only used for voice commands.  The presence
   of a system common message in the stream cancels running status mode
   for the next voice command.  However, system real-time messages do
   not cancel running status mode.
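
   The rules above can be sketched as a small decoder that restores the
   dropped first octets.  This is an illustration under stated
   assumptions, not a complete MIDI parser: the names
   expand_running_status and is_note_off are hypothetical, System
   Exclusive commands are not handled, and the data lengths are read
   off Figures E.1 and E.2.

```python
from typing import List, Optional

# Number of data octets that follow the first octet of each command.
VOICE_DATA_LEN = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}
COMMON_DATA_LEN = {0xF1: 1, 0xF2: 2, 0xF3: 1, 0xF4: 0, 0xF5: 0,
                   0xF6: 0, 0xF7: 0}


def expand_running_status(stream: bytes) -> List[bytes]:
    """Split an octet stream into complete commands, restoring any
    first octets that running status coding dropped."""
    commands: List[bytes] = []
    running: Optional[int] = None   # first octet available for reuse
    current = bytearray()           # command being assembled
    needed = 0                      # data octets still missing from it

    for octet in stream:
        if octet >= 0xF8:
            # System real-time: one octet, does not cancel running status.
            commands.append(bytes([octet]))
            continue
        if octet & 0x80:
            # A leading 1 bit marks the first octet of a new command.
            if octet == 0xF0:
                raise NotImplementedError("System Exclusive not handled")
            if needed:
                raise ValueError("new command inside an incomplete one")
            current = bytearray([octet])
            if octet < 0xF0:
                running = octet
                needed = VOICE_DATA_LEN[octet >> 4]
            else:
                running = None      # system common cancels running status
                needed = COMMON_DATA_LEN[octet]
        else:
            # A data octet; reuse the running status if a command just ended.
            if needed == 0:
                if running is None:
                    raise ValueError("data octet with no status in effect")
                current = bytearray([running])
                needed = VOICE_DATA_LEN[running >> 4]
            current.append(octet)
            needed -= 1
        if needed == 0:
            commands.append(bytes(current))
    return commands


def is_note_off(command: bytes) -> bool:
    """True for NoteOff, and for NoteOn with a null third octet."""
    opcode = command[0] >> 4
    return opcode == 0x8 or (opcode == 0x9 and command[2] == 0)


# Two notes started and ended using two octets per command after the first.
stream = bytes([0x90, 60, 100, 64, 100, 0xF8, 60, 0, 64, 0])
for command in expand_running_status(stream):
    print(command.hex(), is_note_off(command))
```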

E.3.  Command Timing

   The bitfield formats in Figures E.1 and E.2 do not encode the
   execution time for a command.  Timing information is not a part of
   the MIDI command syntax itself; different applications of the MIDI
   command language use different methods to encode timing.

   For example, the MIDI command set acts as the transport layer for
   MIDI 1.0 DIN cables [MIDI].  MIDI cables are short asynchronous
   serial lines that facilitate the remote operation of musical
   instruments and audio equipment.  Timestamps are not sent over a MIDI
   1.0 DIN cable.  Instead, the standard uses an implicit "time of
   arrival" code.  Receivers execute MIDI commands at the moment of
   arrival.

   In contrast, Standard MIDI Files (SMFs, [MIDI]), a file format for
   representing complete musical performances, add an explicit timestamp
   to each MIDI command, using a delta encoding scheme that is optimized
   for statistics of musical performance.  SMF timestamps usually code
   timing using the metric notation of a musical score.  SMF meta-events
   are used to add a tempo map to the file, so that score beats may be
   accurately converted into units of seconds during rendering.

E.4.  AudioSpecificConfig Templates for MMA Renderers

   In Section 6.2 and Appendix C.6.5, we describe how session
   descriptions include an AudioSpecificConfig data block to specify a
   MIDI rendering algorithm for mpeg4-generic RTP MIDI streams.

   The bitfield format of AudioSpecificConfig is defined in [MPEGAUDIO].
   StructuredAudioSpecificConfig, a key data structure coded in
   AudioSpecificConfig, is defined in [MPEGSA].

   For implementors wishing to specify Structured Audio renderers, a
   full understanding of [MPEGSA] and [MPEGAUDIO] is essential.
   However, many implementors will limit their rendering options to the
   two MIDI Manufacturers Association renderers that may be specified in
   AudioSpecificConfig: General MIDI (GM, [MIDI]) and Downloadable
   Sounds 2 (DLS 2, [DLS2]).

   To aid these implementors, we reproduce the AudioSpecificConfig
   bitfield formats for a GM renderer and a DLS 2 renderer below.  We
   have checked these bitfields carefully and believe they are correct.
   However, we stress that the material below is informative, and that
   [MPEGAUDIO] and [MPEGSA] are the normative definitions for
   AudioSpecificConfig.

   As described in Section 6.2, a minimal mpeg4-generic session
   description encodes the AudioSpecificConfig binary bitfield as a
   hexadecimal string (whose format is defined in [RFC3640]) that is
   assigned to the "config" parameter.  As described in Appendix C.6.3,
   a session description that uses the render parameter encodes the
   AudioSpecificConfig binary bitfield as a Base64-encoded string
   assigned to the "inline" parameter, or in the body of an HTTP URL
   assigned to the "url" parameter.

   Below, we show a simplified binary AudioSpecificConfig bitfield
   format, suitable for sending and receiving GM and DLS 2 data:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      | AOTYPE  |FREQIDX|CHANNEL|SACNK|  FILE_BLK 1 (required) ...    |
      |1|SACNK|              FILE_BLK 2 (optional) ...                |
      |  ...  |1|SACNK| FILE_BLK N (optional) ...                     |
      |0|0|        (first "0" bit terminates FILE_BLK list)

                  Figure E.3 -- Simplified AudioSpecificConfig

   The 5-bit AOTYPE field specifies the Audio Object Type as an unsigned
   integer.  The legal values for use with mpeg4-generic RTP MIDI
   streams are "15" (General MIDI), "14" (DLS 2), and "13" (Structured
   Audio).  Thus, receivers that do not support all three mpeg4-generic
   renderers may parse the first 5 bits of an AudioSpecificConfig coded
   in a session description and reject sessions that specify unsupported
   renderers.
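
   A receiver might perform this check on the hexadecimal "config"
   string as sketched below.  The function names are hypothetical, and
   the sample strings are illustrative prefixes rather than complete
   AudioSpecificConfig values.

```python
# Legal AOTYPE values for mpeg4-generic RTP MIDI streams.
RENDERERS = {15: "General MIDI", 14: "DLS 2", 13: "Structured Audio"}


def audio_object_type(config_hex: str) -> int:
    """Return the 5-bit AOTYPE coded at the start of a "config" string."""
    first_octet = int(config_hex[:2], 16)   # first 8 bits of the bitfield
    return first_octet >> 3                 # keep the upper 5 bits


def session_acceptable(config_hex: str, supported: set) -> bool:
    """True if the renderer named by AOTYPE is one the receiver supports."""
    return audio_object_type(config_hex) in supported


# A receiver that implements only the General MIDI renderer.
print(session_acceptable("7A0A", {15}))
print(session_acceptable("720A", {15}))
```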

   The 4-bit FREQIDX field specifies the sampling rate of the renderer.
   We show the mapping of FREQIDX values to sampling rates in Figure
   E.4.  Senders MUST specify a sampling frequency that matches the RTP
   clock rate, if possible; if not, senders MUST specify the escape
   value.  Receivers MUST consult the RTP clock parameter for the true
   sampling rate if the escape value is specified.

                       FREQIDX    Sampling Frequency

                         0x0            96000
                         0x1            88200
                         0x2            64000
                         0x3            48000
                         0x4            44100
                         0x5            32000
                         0x6            24000
                         0x7            22050
                         0x8            16000
                         0x9            12000
                         0xa            11025
                         0xb             8000
                         0xc          reserved
                         0xd          reserved
                         0xe          reserved
                         0xf         escape value

                     Figure E.4 -- FreqIdx encoding
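
   The sender and receiver rules for FREQIDX can be sketched as a pair
   of lookups over the table in Figure E.4.  The names
   freqidx_for_clock_rate and sampling_rate are hypothetical.

```python
# Figure E.4: FREQIDX values that name a sampling frequency in Hz.
FREQIDX_TO_HZ = {
    0x0: 96000, 0x1: 88200, 0x2: 64000, 0x3: 48000,
    0x4: 44100, 0x5: 32000, 0x6: 24000, 0x7: 22050,
    0x8: 16000, 0x9: 12000, 0xA: 11025, 0xB: 8000,
}
ESCAPE = 0xF   # 0xC through 0xE are reserved


def freqidx_for_clock_rate(rtp_clock_rate: int) -> int:
    """Sender side: the matching FREQIDX, or the escape value if none."""
    for freqidx, hz in FREQIDX_TO_HZ.items():
        if hz == rtp_clock_rate:
            return freqidx
    return ESCAPE


def sampling_rate(freqidx: int, rtp_clock_rate: int) -> int:
    """Receiver side: the true sampling rate for a parsed FREQIDX."""
    if freqidx == ESCAPE:
        return rtp_clock_rate           # consult the RTP clock parameter
    if freqidx in FREQIDX_TO_HZ:
        return FREQIDX_TO_HZ[freqidx]
    raise ValueError("reserved FREQIDX value")


print(freqidx_for_clock_rate(44100))
print(sampling_rate(ESCAPE, 50000))
```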

   The 4-bit CHANNEL field specifies the number of audio channels for
   the renderer.  The values 0x1 to 0x5 specify 1 to 5 audio channels;
   the value 0x6 specifies 5+1 surround sound, and the value 0x7
   specifies 7+1 surround sound.  If the rtpmap line in the session
   description specifies one of these formats, CHANNEL MUST be set to
   the corresponding value.  Otherwise, CHANNEL MUST be set to 0x0.

   The CHANNEL field is followed by a list of one or more binary file
   data blocks.  The 3-bit SACNK field (the chunk_type field in class
   StructuredAudioSpecificConfig, defined in [MPEGSA]) specifies the
   type of each data block.

   For General MIDI, only Standard MIDI Files may appear in the list
   (SACNK field value 2).  For DLS 2, only Standard MIDI Files and DLS 2
   RIFF files (SACNK field value 4) may appear.  For both of these file
   types, the FILE_BLK field has the format shown in Figure E.5: a 32-
   bit unsigned integer value (FILE_LEN) coding the number of bytes in
   the SMF or RIFF file, followed by FILE_LEN bytes coding the file
   data.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      |     FILE_LEN (32-bit byte count of the SMF or RIFF file)      |
      |  FILE_DATA (file contents, a list of FILE_LEN bytes) ...      |

                  Figure E.5 -- The FILE_BLK field format

   Note that several files may follow the CHANNEL field.  The "1"
   constant fields in Figure E.3 code the presence of another file; the
   "0" constant field codes the end of the list.  The final "0" bit in
   Figure E.3 codes the absence of special coding tools (see [MPEGAUDIO]
   for details).  Senders not using these tools MUST append this "0"
   bit; receivers that do not understand these coding tools MUST ignore
   all data following a "1" in this position.

   The StructuredAudioSpecificConfig bitfield structure requires the
   presence of one FILE_BLK.  For mpeg4-generic RTP MIDI use of DLS 2,
   FILE_BLKs MUST code RIFF files or SMF files.  For mpeg4-generic RTP
   MIDI use of General MIDI, FILE_BLKs MUST code SMF files.  By default,
   this SMF will be ignored (Appendix C.6.4.1).  In this default case, a
   GM StructuredAudioSpecificConfig bitfield SHOULD code a FILE_BLK
   whose FILE_LEN is 0, and whose FILE_DATA is empty.
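
   To make the layout of Figures E.3 and E.5 concrete, the sketch below
   parses the simplified bitfield from a byte string.  It is
   informative only: the names BitReader and parse_config are
   hypothetical, and it assumes every FILE_BLK has the FILE_LEN and
   FILE_DATA form of Figure E.5.  The sample input is the default
   General MIDI case described above, with an empty FILE_BLK.

```python
class BitReader:
    """Reads unsigned big-endian bit fields from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8

    def read(self, nbits: int) -> int:
        if nbits > self.remaining:
            raise ValueError("bitfield is truncated")
        self.remaining -= nbits
        return (self.value >> self.remaining) & ((1 << nbits) - 1)


def parse_config(data: bytes) -> dict:
    """Parse the simplified AudioSpecificConfig of Figure E.3."""
    bits = BitReader(data)
    config = {
        "aotype": bits.read(5),
        "freqidx": bits.read(4),
        "channel": bits.read(4),
        "files": [],
    }
    while True:
        sacnk = bits.read(3)                  # type of this data block
        file_len = bits.read(32)              # FILE_LEN, a byte count
        file_data = bytes(bits.read(8) for _ in range(file_len))
        config["files"].append((sacnk, file_data))
        if bits.read(1) == 0:                 # "0" ends the FILE_BLK list
            break
    config["tools"] = bits.read(1)            # 0: no special coding tools
    return config


# AOTYPE 15, FREQIDX 4, CHANNEL 1, SACNK 2, FILE_LEN 0, then "0" "0",
# padded with zero bits to a whole number of octets.
print(parse_config(bytes.fromhex("7A0A0000000000")))
```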

   To complete this appendix, we derive the
   StructuredAudioSpecificConfig that we use in the General MIDI session
   examples in this memo.  Referring to Figure E.3, we note that for GM,
   AOTYPE = 15.  Our examples use a 44,100 Hz sample rate (FREQIDX = 4)
   and are in mono (CHANNEL = 1).  For GM, a single SMF is encoded
   (SACNK = 2), using the SMF shown in Figure E.6 (a 26-byte file).

              |  MIDI File = <Header Chunk> <Track Chunk>  |

   <Header Chunk> = <chunk type> <length>     <format> <ntrks> <divsn>
                    4D 54 68 64  00 00 00 06  00 00    00 01   00 60

   <Track Chunk> = <chunk type>  <length>     <delta-time> <end-event>
                   4D 54 72 6B   00 00 00 04  00           FF 2F 00

            Figure E.6 -- SMF file encoded in the example

   Placing these constants in binary format into the data structure
   shown in Figure E.3 yields the constant shown in Figure E.7.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      |0 1 1 1 1|0 1 0 0|0 0 0 1|0 1 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
      |0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0|0 1 0 0|1 1 0 1|0 1 0 1|0 1 0 0|
      |0 1 1 0|1 0 0 0|0 1 1 0|0 1 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|
      |0 0 0 0|0 0 0 0|0 0 0 0|0 1 1 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|
      |0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 1|0 0 0 0|0 0 0 0|0 1 1 0|0 0 0 0|
      |0 1 0 0|1 1 0 1|0 1 0 1|0 1 0 0|0 1 1 1|0 0 1 0|0 1 1 0|1 0 1 1|
      |0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 1 1 0|
      |0 0 0 0|0 0 0 0|1 1 1 1|1 1 1 1|0 0 1 0|1 1 1 1|0 0 0 0|0 0 0 0|
      |0|0|

            Figure E.7 -- AudioSpecificConfig used in GM examples

   Expressing this bitfield as an ASCII hexadecimal string yields:

   7A0A0000001A4D546864000000060000000100604D54726B0000000600FF2F000

   This string is assigned to the "config" parameter in the minimal
   mpeg4-generic General MIDI examples in this memo (such as the example
   in Section 6.2).  Expressing this string in Base64 [RFC2045] yields:

   egoAAAAaTVRoZAAAAAYAAAABAGBNVHJrAAAABgD/LwAA

   This string is assigned to the "inline" parameter in the General MIDI
   example shown in Appendix C.6.5.
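
   The two conversions can be reproduced mechanically.  The sketch
   below transcribes the bit rows of Figure E.7, followed by the two
   terminating "0" bits of Figure E.3, and derives both strings.  It
   assumes the bitfield is padded with zero bits to a whole hexadecimal
   digit for "config" and to a whole octet for "inline"; the helper
   names are hypothetical.

```python
import base64

# Figure E.7, one string per 32-bit row, then the two final "0" bits.
ROWS = [
    "01111 0100 0001 010 0000000000000000",
    "0000000000011010 0100 1101 0101 0100",
    "0110 1000 0110 0100 0000 0000 0000 0000",
    "0000 0000 0000 0110 0000 0000 0000 0000",
    "0000 0000 0000 0001 0000 0000 0110 0000",
    "0100 1101 0101 0100 0111 0010 0110 1011",
    "0000 0000 0000 0000 0000 0000 0000 0110",
    "0000 0000 1111 1111 0010 1111 0000 0000",
    "0 0",
]
BITS = "".join(ROWS).replace(" ", "")


def pad_bits(bits: str, unit: int) -> str:
    """Append zero bits until the length is a multiple of unit."""
    return bits + "0" * (-len(bits) % unit)


def to_config_hex(bits: str) -> str:
    """Hexadecimal form, as assigned to the "config" parameter."""
    padded = pad_bits(bits, 4)
    return "%0*X" % (len(padded) // 4, int(padded, 2))


def to_inline_base64(bits: str) -> str:
    """Base64 form, as assigned to the "inline" parameter."""
    padded = pad_bits(bits, 8)
    raw = int(padded, 2).to_bytes(len(padded) // 8, "big")
    return base64.b64encode(raw).decode("ascii")


print(to_config_hex(BITS))
print(to_inline_base64(BITS))
```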


Normative References

   [MIDI]      MIDI Manufacturers Association.  "The Complete MIDI 1.0
               Detailed Specification", 1996.

   [RFC3550]   Schulzrinne, H., Casner, S., Frederick, R., and V.
               Jacobson, "RTP: A Transport Protocol for Real-Time
               Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]   Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
               Video Conferences with Minimal Control", STD 65, RFC
               3551, July 2003.

   [RFC3640]   van der Meer, J., Mackie, D., Swaminathan, V., Singer,
               D., and P. Gentric, "RTP Payload Format for Transport of
               MPEG-4 Elementary Streams", RFC 3640, November 2003.

   [MPEGSA]    International Standards Organization.  "ISO/IEC 14496
               MPEG-4", Part 3 (Audio), Subpart 5 (Structured Audio),
               2001.

   [RFC4566]   Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
               Description Protocol", RFC 4566, July 2006.

   [MPEGAUDIO] International Standards Organization.  "ISO 14496
               MPEG-4", Part 3 (Audio), 2001.

   [RFC2045]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
               Extensions (MIME) Part One: Format of Internet Message
               Bodies", RFC 2045, November 1996.

   [DLS2]      MIDI Manufacturers Association.  "The MIDI Downloadable
               Sounds Specification", v98.2, 1998.

   [RFC4234]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
               Specifications: ABNF", RFC 4234, October 2005.

   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3711]   Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
               Norrman, "The Secure Real-time Transport Protocol
               (SRTP)", RFC 3711, March 2004.

   [RFC3264]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
               with Session Description Protocol (SDP)", RFC 3264, June
               2002.

   [RFC3986]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
               Resource Identifier (URI): Generic Syntax", STD 66, RFC
               3986, January 2005.

   [RFC2616]   Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
               Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
               Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3388]   Camarillo, G., Eriksson, G., Holler, J., and H.
               Schulzrinne, "Grouping of Media Lines in the Session
               Description Protocol (SDP)", RFC 3388, December 2002.

   [RP015]     MIDI Manufacturers Association.  "Recommended Practice
               015 (RP-015): Response to Reset All Controllers", 11/98.

   [RFC4288]   Freed, N. and J. Klensin, "Media Type Specifications and
               Registration Procedures", BCP 13, RFC 4288, December
               2005.

   [RFC3555]   Casner, S. and P. Hoschka, "MIME Type Registration of RTP
               Payload Formats", RFC 3555, July 2003.

Informative References

   [NMP]       Lazzaro, J. and J. Wawrzynek.  "A Case for Network
               Musical Performance", 11th International Workshop on
               Network and Operating Systems Support for Digital Audio
               and Video (NOSSDAV 2001) June 25-26, 2001, Port
               Jefferson, New York.

   [GRAME]     Fober, D., Orlarey, Y. and S. Letz.  "Real Time Musical
               Events Streaming over Internet", Proceedings of the
               International Conference on WEB Delivering of Music 2001,
               pages 147-154.

   [RFC3261]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
               A., Peterson, J., Sparks, R., Handley, M., and E.
               Schooler, "SIP: Session Initiation Protocol", RFC 3261,
               June 2002.

   [RFC2326]   Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
               Streaming Protocol (RTSP)", RFC 2326, April 1998.

   [ALF]       Clark, D. D. and D. L. Tennenhouse. "Architectural
               considerations for a new generation of protocols",
               SIGCOMM Symposium on Communications Architectures and
               Protocols, (Philadelphia, Pennsylvania), pp. 200-208,
               IEEE, Sept. 1990.

   [RFC4696]   Lazzaro, J. and J. Wawrzynek, "An Implementation Guide
               for RTP MIDI", RFC 4696, November 2006.

   [RFC2205]   Braden, R., Zhang, L., Berson, S., Herzog, S., and S.
               Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
               Functional Specification", RFC 2205, September 1997.

   [RFC4289]   Freed, N. and J. Klensin, "Multipurpose Internet Mail
               Extensions (MIME) Part Four: Registration Procedures",
               BCP 13, RFC 4289, December 2005.

   [RFC4571]   Lazzaro, J. "Framing Real-time Transport Protocol (RTP)
               and RTP Control Protocol (RTCP) Packets over Connection-
               Oriented Transport", RFC 4571, July 2006.

   [RFC2818]   Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

   [SPMIDI]    MIDI Manufacturers Association.  "Scalable Polyphony
               MIDI, Specification and Device Profiles", Document
               Version 1.0a, 2002.

   [LCP]       Apple Computer. "Logic 7 Dedicated Control Surface
               Support", Appendix B.  Product manual available from

Authors' Addresses

   John Lazzaro (corresponding author)
   UC Berkeley
   CS Division
   315 Soda Hall
   Berkeley CA 94720-1776


   John Wawrzynek
   UC Berkeley
   CS Division
   631 Soda Hall
   Berkeley CA 94720-1776


Full Copyright Statement

   Copyright (C) The IETF Trust (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at


   Funding for the RFC Editor function is currently provided by the
   Internet Society.