MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies

7    The Predefined Content-Type Values

            This document defines seven initial Content-Type values  and
            an  extension  mechanism  for private or experimental types.
            Further standard types must  be  defined  by  new  published
            specifications.   It is expected that most innovation in new
            types of mail will take place as subtypes of the seven types
            defined  here.   The  most  essential characteristics of the
            seven content-types are summarized in Appendix G.

7.1  The Text Content-Type

            The text Content-Type is intended for sending material which
            is  principally textual in form.  It is the default Content-
            Type.  A "charset" parameter may be  used  to  indicate  the
            character set of the body text.  The primary subtype of text
            is "plain".  This indicates plain (unformatted)  text.   The
            default  Content-Type  for  Internet  mail  is  "text/plain;
            charset=us-ascii".

            Beyond plain text, there are many formats  for  representing
            what might be known as "extended text" -- text with embedded
            formatting and  presentation  information.   An  interesting
            characteristic of many such representations is that they are
            to some extent  readable  even  without  the  software  that
            interprets  them.   It is useful, then, to distinguish them,
            at the highest level, from such unreadable data  as  images,
            audio,  or  text  represented in an unreadable form.  In the
            absence  of  appropriate  interpretation  software,  it   is
            reasonable to show subtypes of text to the user, while it is
            not reasonable to do so with most nontextual data.

            Such formatted textual  data  should  be  represented  using
            subtypes  of text.  Plausible subtypes of text are typically
            given by the common name of the representation format, e.g.,
            "text/richtext".

7.1.1     The charset parameter

            A critical parameter that may be specified in  the  Content-
            Type  field  for  text  data  is the character set.  This is
            specified with a "charset" parameter, as in:

                 Content-type: text/plain; charset=us-ascii

            Unlike some  other  parameter  values,  the  values  of  the
            charset  parameter  are  NOT  case  sensitive.   The default
            character set, which must be assumed in  the  absence  of  a
            charset parameter, is US-ASCII.
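
            As an illustration only, a receiving agent might apply the
            case-folding and defaulting rules above roughly as in the
            following Python sketch. It is not part of this
            specification: the helper name effective_charset is
            hypothetical, and its parameter parsing is naive (it ignores
            quoted semicolons and comments).

```python
def effective_charset(content_type: str) -> str:
    """Return the charset of a text Content-Type value, lowercased.

    Hypothetical helper; parsing is naive (no quoted ";" or comments).
    """
    # Skip the type/subtype and walk the ";"-separated parameters.
    for param in content_type.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            # Charset values are NOT case sensitive, so normalise them.
            return value.strip().strip('"').lower()
    # No charset parameter present: US-ASCII must be assumed.
    return "us-ascii"
```

            For example, "text/plain; charset=US-ASCII" and a bare
            "text/plain" both yield "us-ascii".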

            An initial list of predefined character  set  names  can  be
            found at the end of this section.  Additional character sets
            may be registered with IANA  as  described  in  Appendix  F,
            although the standardization of their use requires the usual
            IAB  review  and  approval.  Note  that  if  the   specified
            character  set  includes  8-bit  data,  a  Content-Transfer-
            Encoding header field and a corresponding  encoding  on  the
            data  are  required  in  order to transmit the body via some
            mail transfer protocols, such as SMTP.

            The default character set, US-ASCII, has been the subject of
            some  confusion  and  ambiguity  in the past.  Not only were
            there some ambiguities in the definition,  there  have  been
            wide  variations  in  practice.   In order to eliminate such
            ambiguity and variations  in  the  future,  it  is  strongly
            recommended  that  new  user  agents  explicitly  specify  a
            character set via the Content-Type header field.  "US-ASCII"
            does not indicate an arbitrary seven-bit character code, but
            specifies that the body uses character coding that uses  the
            exact  correspondence  of  codes  to characters specified in
            ASCII.  National use variations of ISO 646 [ISO-646] are NOT
            ASCII   and   their  use  in  Internet  mail  is  explicitly
            discouraged. The omission of the ISO 646  character  set  is
            deliberate  in  this regard.  The character set name of "US-
            ASCII" explicitly refers  to ANSI X3.4-1986 [US-ASCII] only.
            The  character  set name "ASCII" is reserved and must not be
            used for any purpose.

            NOTE: RFC 821 explicitly specifies "ASCII",  and  references
            an earlier version of the American Standard.  Insofar as one
            of the purposes of specifying a Content-Type  and  character
            set is to permit the receiver to unambiguously determine how
            the sender intended the coded  message  to  be  interpreted,
            assuming  anything  other than "strict ASCII" as the default
            would risk unintentional and  incompatible  changes  to  the
            semantics  of  messages  now being transmitted.    This also
            implies that messages containing characters coded  according
            to  national  variations on ISO 646, or using code-switching
            procedures (e.g., those of ISO 2022), as well  as  8-bit  or
            multiple   octet character encodings MUST use an appropriate
            character set  specification  to  be  consistent  with  this
            specification.

            The complete US-ASCII character set is listed in [US-ASCII].
            Note  that  the control characters including DEL (0-31, 127)
            have no defined meaning  apart  from  the  combination  CRLF
            (ASCII  values 13 and 10) indicating a new line.  Two of the
            characters have de facto meanings in wide use: FF (12) often
            means  "start  subsequent  text  on  the  beginning of a new
            page"; and TAB or HT (9) often  (though  not  always)  means
            "move  the  cursor  to  the  next available column after the
            current position where the column number is a multiple of  8
            (counting  the  first column as column 0)." Apart from this,
            any use of the control characters or DEL in a body  must  be
            part   of   a  private  agreement  between  the  sender  and
            recipient.  Such  private  agreements  are  discouraged  and
            should  be  replaced  by  the  other  capabilities  of  this
            document.

            NOTE:   Beyond  US-ASCII,  an  enormous   proliferation   of
            character  sets  is  possible. It is the opinion of the IETF
            working group that a large number of character sets is NOT a
            good  thing.   We would prefer to specify a single character
            set that can be used universally for representing all of the
            world's   languages   in  electronic  mail.   Unfortunately,
            existing practice in several communities seems to  point  to
            the  continued  use  of  multiple character sets in the near
            future.  For this reason, we define names for a small number
            of  character  sets  for  which  a  strong  constituent base
            exists.    It is our hope  that  ISO  10646  or  some  other
            effort  will  eventually define a single world character set
            which can then be specified for use in Internet mail, but in
            advance  of  that definition we cannot specify the use of
            ISO  10646,  Unicode,  or  any  other  character  set  whose
            definition is, as of this writing, incomplete.

            The defined charset values are:

                 US-ASCII -- as defined in [US-ASCII].

                 ISO-8859-X -- where "X"  is  to  be  replaced,  as
                      necessary,  for  the  parts of ISO-8859 [ISO-
                      8859].  Note that the ISO 646 character  sets
                      have  deliberately  been  omitted in favor of
                      their  8859  replacements,  which   are   the
                      designated  character sets for Internet mail.
                      As of the publication of this  document,  the
                      legitimate  values  for  "X" are the digits 1
                      through 9.

            Note that the character set used,  if  anything  other  than
            US-ASCII,   must  always  be  explicitly  specified  in  the
            Content-Type field.

            No other character set name may be  used  in  Internet  mail
            without  the  publication  of a formal specification and its
            registration with IANA as described in  Appendix  F,  or  by
            private agreement, in which case the character set name must
            begin with "X-".
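
            The naming rules above can be expressed as a small check.
            The following Python sketch is illustrative only: the name
            is_permitted_charset is hypothetical, and its "registered"
            argument is merely a stand-in for names registered with IANA
            under Appendix F, since no registry lookup is modelled.

```python
# Names defined by this section: US-ASCII and ISO-8859-1 through ISO-8859-9.
DEFINED_CHARSETS = {"us-ascii"} | {f"iso-8859-{x}" for x in range(1, 10)}


def is_permitted_charset(name: str, registered: frozenset = frozenset()) -> bool:
    """Return True if `name` may be used as a charset value in mail.

    `registered` is a hypothetical stand-in for IANA-registered names.
    """
    lowered = name.lower()  # charset values are not case sensitive
    if lowered == "ascii":
        return False  # the name "ASCII" is reserved and must not be used
    if lowered in DEFINED_CHARSETS:
        return True
    if lowered in {r.lower() for r in registered}:
        return True
    # Anything else requires a private agreement and an "X-" prefix.
    return lowered.startswith("x-")
```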

            Implementors are discouraged  from  defining  new  character
            sets for mail use unless absolutely necessary.

            The "charset" parameter has been defined primarily  for  the
            purpose  of  textual  data, and is described in this section
            for that reason.   However,  it  is  conceivable  that  non-
            textual  data might also wish to specify a charset value for
            some purpose, in which  case  the  same  syntax  and  values
            should be used.

            In general, mail-sending  software  should  always  use  the
            "lowest  common  denominator"  character  set possible.  For
            example, if a body contains  only  US-ASCII  characters,  it
            should be marked as being in the US-ASCII character set, not
            ISO-8859-1, which, like all the ISO-8859 family of character
            sets,  is  a  superset  of  US-ASCII.   More generally, if a
            widely-used character set is a subset of  another  character
            set,  and a body contains only characters in the widely-used
            subset, it should be labeled as being in that  subset.  This
            will increase the chances that the recipient will be able to
            view the mail correctly.
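
            A sending agent might apply this "lowest common denominator"
            rule as in the following Python sketch. It is a simplified
            illustration under stated assumptions: only US-ASCII and
            ISO-8859-1 are considered, Python's built-in "ascii" and
            "latin-1" codecs do the checking, and the hypothetical helper
            lowest_charset returns None when neither set suffices.

```python
from typing import Optional


def lowest_charset(body: str) -> Optional[str]:
    """Return the narrowest charset label considered here that covers body."""
    # Try the narrowest set first: a pure US-ASCII body must be labelled
    # us-ascii even though ISO-8859-1 would also cover it.
    for codec, label in (("ascii", "us-ascii"), ("latin-1", "iso-8859-1")):
        try:
            body.encode(codec)
        except UnicodeEncodeError:
            continue
        return label
    return None  # neither set covers the body; another label is needed
```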

7.1.2     The Text/plain subtype

            The primary subtype of text   is  "plain".   This  indicates
            plain  (unformatted)  text.  The  default  Content-Type  for
            Internet  mail,  "text/plain;  charset=us-ascii",  describes
            existing  Internet practice, that is, it is the type of body
            defined by RFC 822.

7.1.3     The Text/richtext subtype

            In order to promote the  wider  interoperability  of  simple
            formatted  text,  this  document defines an extremely simple
            subtype of "text", the "richtext" subtype.  This subtype was
            designed to meet the following criteria:

                 1.  The syntax must be extremely simple to  parse,
                 so  that  even  teletype-oriented mail systems can
                 easily strip away the formatting  information  and
                 leave only the readable text.

                 2.  The syntax must be extensible to allow for new
                 formatting commands that are deemed essential.

                 3.  The capabilities must be extremely limited, to
                 ensure  that  it  can  represent  no  more than is
                 likely to be representable by the  user's  primary
                 word  processor.   While  this  limits what can be
                 sent, it increases the  likelihood  that  what  is
                 sent can be properly displayed.

                 4.  The syntax must be compatible  with  SGML,  so
                 that,  with  an  appropriate  DTD  (Document  Type
                 Definition, the standard mechanism for defining  a
                 document  type  using SGML), a general SGML parser
                 could be made to parse richtext.  However, despite
                 this  compatibility,  the  syntax  should  be  far
                 simpler than full SGML, so that no SGML  knowledge
                 is required in order to implement it.

            The syntax of "richtext" is very simple.  It is assumed,  at
            the  top-level,  to be in the US-ASCII character set, unless
            of course a different charset parameter was specified in the
            Content-type  field.   All  characters represent themselves,
            with the exception of the "<" character (ASCII 60), which is
            used   to  mark  the  beginning  of  a  formatting  command.

            Formatting  instructions  consist  of  formatting   commands
            surrounded  by angle brackets ("<>", ASCII 60 and 62).  Each
            formatting command may be no  more  than  40  characters  in
            length,  all in US-ASCII, restricted to the alphanumeric and
            hyphen ("-") characters. Formatting commands may be preceded
            by  a  forward slash or solidus ("/", ASCII 47), making them
            negations, and such negations must always exist  to  balance
            the  initial opening commands, except as noted below.  Thus,
            if the formatting command "<bold>" appears  at  some  point,
            there  must  later  be a "</bold>" to balance it.  There are
            only three exceptions to this "balancing" rule:  First,  the
            command "<lt>" is used to represent a literal "<" character.
            Second, the command "<nl>" is used to represent  a  required
            line  break.   (Otherwise,  CRLFs in the data are treated as
            equivalent to  a  single  SPACE  character.)   Finally,  the
            command  "<np>"  is  used to represent a page break.  (NOTE:
            The 40 character  limit  on  formatting  commands  does  not
            include  the  "<",  ">",  or  "/"  characters  that might be
            attached to such commands.)
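
            The lexical rules for a single formatting command can be
            captured by a regular expression. In this illustrative Python
            sketch, COMMAND and parse_command are hypothetical names, and
            command names are lowercased on the assumption that they are
            compared case-insensitively (the list that follows writes
            "Bold" where the examples use "<bold>").

```python
import re

# "<", an optional "/", then 1 to 40 alphanumeric or hyphen characters, ">".
# The 40-character limit does not count "<", ">" or "/".
COMMAND = re.compile(r"<(/?)([A-Za-z0-9-]{1,40})>")


def parse_command(token: str):
    """Return (name, is_negation) for a well-formed command, or None."""
    match = COMMAND.fullmatch(token)
    if match is None:
        return None
    # Lowercasing assumes command names are compared case-insensitively.
    return match.group(2).lower(), match.group(1) == "/"
```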

            Initially defined formatting commands, not all of which will
            be implemented by all richtext implementations, include:

                 Bold -- causes the subsequent text  to  be  in  a  bold
                      font.
                 Italic -- causes the subsequent text to be in an italic
                      font.
                 Fixed -- causes the subsequent text to be  in  a  fixed
                      width font.
                 Smaller -- causes  the  subsequent  text  to  be  in  a
                      smaller font.
                 Bigger -- causes the subsequent text to be in a  bigger
                      font.
                 Underline  --  causes  the  subsequent   text   to   be
                      underlined.
                 Center -- causes the subsequent text to be centered.
                 FlushLeft -- causes the  subsequent  text  to  be  left
                      justified.
                 FlushRight -- causes the subsequent text  to  be  right
                      justified.
                 Indent -- causes the subsequent text to be indented  at
                      the left margin.
                 IndentRight  --  causes  the  subsequent  text  to   be
                      indented at the right margin.
                 Outdent -- causes the subsequent text to  be  outdented
                      at the left margin.
                 OutdentRight  --  causes  the  subsequent  text  to  be
                      outdented at the right margin.
                 SamePage -- causes the subsequent text to  be  grouped,
                      if possible, on one page.
                 Subscript  --  causes  the  subsequent   text   to   be
                      interpreted as a subscript.
                 Superscript  --  causes  the  subsequent  text  to   be
                      interpreted as a superscript.
                 Heading -- causes the subsequent text to be interpreted
                      as a page heading.
                 Footing -- causes the subsequent text to be interpreted
                      as a page footing.
                 ISO-8859-X  (for any value of X  that  is  legal  as  a
                      "charset" parameter) -- causes the subsequent text
                      to be  interpreted  as  text  in  the  appropriate
                      character set.
                 US-ASCII  --  causes  the   subsequent   text   to   be
                      interpreted as text in the US-ASCII character set.
                 Excerpt -- causes the subsequent text to be interpreted
                      as   a   textual   excerpt  from  another  source.
                      Typically this will be displayed using indentation
                      and  an  alternate font, but such decisions are up
                      to the viewer.
                 Paragraph  --  causes  the  subsequent   text   to   be
                      interpreted    as   a   single   paragraph,   with
                      appropriate  paragraph  breaks  (typically   blank
                      space) before and after.
                 Signature  --  causes  the  subsequent   text   to   be
                      interpreted  as  a  "signature".  Some systems may
                      wish to display signatures in a  smaller  font  or
                      otherwise set them apart from the main text of the
                      message.
                 Comment -- causes the subsequent text to be interpreted
                      as a comment, and hence not shown to the reader.
                 No-op -- has no effect on the subsequent text.
                 lt -- <lt> is replaced by a literal "<" character.   No
                      balancing </lt> is allowed.
                 nl -- <nl> causes a line break.  No balancing </nl>  is
                      allowed.
                 np -- <np> causes a page break.  No balancing </np>  is
                      allowed.

            Each positive formatting command affects all subsequent text
            until  the matching negative formatting command.  Such pairs
            of formatting commands must be properly balanced and nested.
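            Thus, a proper way to describe text in bold italics is:

                 <bold><italic>the-text</italic></bold>

                 or, alternately,

                 <italic><bold>the-text</bold></italic>

                 but,  in  particular,  the  following  is  illegal
                 richtext:

                 <bold><italic>the-text</bold></italic>
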
            NOTE:   The  nesting  requirement  for  formatting  commands
            imposes  a  slightly  higher  burden  upon  the composers of
            richtext  bodies,  but   potentially   simplifies   richtext
            displayers  by  allowing  them  to be stack-based.  The main
            goal of richtext is to be simple enough to  make  multifont,
            formatted  email  widely  readable,  so  that those with the
            capability of  sending  it  will  be  able  to  do  so  with
            confidence.   Thus  slightly  increased  complexity  in  the
            composing software was  deemed  a  reasonable  tradeoff  for
            simplified  reading  software.  Nonetheless, implementors of
            richtext  readers  are  encouraged  to  follow  the  general
            Internet  guidelines  of being conservative in what you send
            and liberal in what you accept.  Those implementations  that
            can  do so are encouraged to deal reasonably with improperly
            nested richtext.
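
            Because properly nested commands can be checked with a stack,
            a composer or a strict reader could verify nesting as in this
            Python sketch. It is illustrative only: is_properly_nested is
            a hypothetical name, command names are compared
            case-insensitively (an assumption), and "<lt>", "<nl>", and
            "<np>" are treated as the only commands that take no
            negation.

```python
import re

COMMAND = re.compile(r"<(/?)([A-Za-z0-9-]{1,40})>")
UNBALANCED = {"lt", "nl", "np"}  # commands that never take a negation


def is_properly_nested(richtext: str) -> bool:
    """Check that command pairs are balanced and properly nested."""
    stack = []
    for match in COMMAND.finditer(richtext):
        negated, name = match.group(1) == "/", match.group(2).lower()
        if name in UNBALANCED:
            if negated:
                return False  # e.g. "</nl>" is not allowed
            continue
        if not negated:
            stack.append(name)  # opening command: remember it
        elif not stack or stack.pop() != name:
            return False  # closes something other than the innermost command
    return not stack  # every opened command must have been closed
```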

            Implementations  must  regard  any  unrecognized  formatting
            command  as  equivalent to "No-op", thus facilitating future
            extensions to "richtext".  Private extensions may be defined
            using  formatting  commands that begin with "X-", by analogy
            to Internet mail header field names.

            It is worth noting that no special behavior is required  for
            the TAB (HT) character. It is recommended, however, that, at
            least  when  fixed-width  fonts  are  in  use,  the   common
            semantics  of  the  TAB  (HT)  character should be observed,
            namely that it moves to the next column position that  is  a
            multiple  of  8.   (In  other words, if a TAB (HT) occurs in
            column n, where the leftmost column is column 0,  then  that
            TAB   (HT)   should   be  replaced  by  8-(n  mod  8)  SPACE
            characters.)
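
            The recommended TAB rule can be sketched in Python as
            follows, assuming every other character occupies exactly one
            column. The name expand_tabs is hypothetical; for a single
            line, Python's built-in str.expandtabs(8) gives the same
            result.

```python
def expand_tabs(line: str) -> str:
    """Replace each TAB at 0-based column n with 8 - (n mod 8) spaces."""
    out = []
    for ch in line:
        if ch == "\t":
            n = len(out)  # current column; the leftmost column is 0
            out.extend(" " * (8 - n % 8))
        else:
            out.append(ch)  # assumes one column per character
    return "".join(out)
```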

            Richtext also differentiates between "hard" and "soft"  line
            breaks.   A line break (CRLF) in the richtext data stream is
            interpreted as a "soft" line break,  one  that  is  included
            only for purposes of mail transport, and is to be treated as
            white space by richtext interpreters.  To include  a  "hard"
            line  break (one that must be displayed as such), the "<nl>"
            or "<paragraph>" formatting constructs  should  be  used.  In
            general, a soft line break should be treated as white space,
            but when soft line breaks immediately follow  a  <nl>  or  a
            </paragraph>  tag they should be ignored rather than treated
            as white space.

            Putting all this  together,  the  following  "text/richtext"
            body fragment:

                      <bold>Now</bold> is the time for
                      <italic>all</italic> good men
                       <smaller>(and <lt>women>)</smaller> to
                      <ignoreme></ignoreme> come

                      to the aid of their

                      beloved <nl><nl>country. <comment> Stupid
                      quote! </comment> -- the end

            represents the following  formatted  text  (which  will,  no
            doubt,  look  cryptic  in  the  text-only  version  of  this
            document):

                 Now is the time for all good men (and <women>)  to
                 come to the aid of their
                 beloved

                 country. -- the end

            Richtext conformance:  A minimal richtext implementation  is
            one  that  simply  converts "<lt>" to "<", converts CRLFs to
            SPACE, converts <nl> to a newline according to local newline
            convention,  removes  everything between a <comment> command
            and the next balancing </comment> command, and  removes  all
            other  formatting  commands  (all  text  enclosed  in  angle
            brackets).

            Richtext is decidedly not SGML, and must not be used to transport
            arbitrary SGML  documents.   Those  who  wish  to  use  SGML
            document  types as a mail transport format must define a new
            text or application subtype, e.g.,  "text/sgml-dtd-whatever"
            or   "application/sgml-dtd-whatever",   depending   on   the
            perceived readability  of  the  DTD  in  use.   Richtext  is
            designed  to  be  compatible  with SGML, and specifically so
            that it will be possible to define a richtext DTD if one  is
            needed.   However,  this  does not imply that arbitrary SGML
            can be called richtext, nor that richtext implementors  have
            any  need  to  understand  SGML;  the  description  in  this
            document is a complete definition of richtext, which is  far
            simpler than complete SGML.
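
            The minimal conformance rules above might be sketched in
            Python as follows. This is not the C program of Appendix D:
            richtext_to_plain is a hypothetical name, the input is
            assumed to use CRLF line breaks, and nested <comment> pairs
            are handled with a simple depth counter. Soft line breaks
            following <nl> are not suppressed, since the minimal
            conformance level does not require it.

```python
import re

COMMAND = re.compile(r"<(/?)([A-Za-z0-9-]{1,40})>")


def richtext_to_plain(src: str, newline: str = "\n") -> str:
    """Minimal richtext-to-plain-text conversion (hypothetical helper)."""
    src = src.replace("\r\n", " ")  # CRLFs are soft breaks: SPACE
    out = []
    comment_depth = 0  # > 0 while inside <comment> ... </comment>
    pos = 0
    for match in COMMAND.finditer(src):
        if comment_depth == 0:
            out.append(src[pos:match.start()])  # literal text before command
        pos = match.end()
        negated, name = match.group(1) == "/", match.group(2).lower()
        if name == "comment":
            comment_depth = max(comment_depth + (-1 if negated else 1), 0)
        elif comment_depth:
            pass  # commands inside a comment are dropped with its text
        elif name == "lt":
            out.append("<")
        elif name == "nl":
            out.append(newline)
        # Every other command, recognized or not, is simply removed.
    if comment_depth == 0:
        out.append(src[pos:])
    return "".join(out)
```

            Applied to the example fragment above, this yields the same
            words as the formatted text, with two newlines where the two
            <nl> commands appear.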

            NOTE ON THE INTENDED USE OF RICHTEXT:  It is recognized that
            implementors  of  future  mail  systems  will want rich text
            functionality  far  beyond  that   currently   defined   for
            richtext.   The  intent  of  richtext is to provide a common
            format for expressing that functionality in a form in  which
            much  of  it, at least, will be understood by interoperating
            software.  Thus,  in  particular,  software  with  a  richer
            notion  of  formatted  text  than  richtext  can  still  use
            richtext as its basic representation, but can extend it with
            new  formatting  commands and by hiding information specific
            to that software  system  in  richtext  comments.   As  such
            systems  evolve,  it  is  expected  that  the  definition of
            richtext  will  be  further  refined  by  future   published
            specifications,  but  richtext  as  defined  here provides a
            platform on which evolutionary refinements can be based.

            IMPLEMENTATION NOTE:  In  some  environments,  it  might  be
            impossible  to combine certain richtext formatting commands,
            whereas in  others  they  might  be  combined  easily.   For
            example,  the  combination  of  <bold>  and  <italic>  might
            produce bold italics on systems that support such fonts, but
            there  exist  systems that can make text bold or italicized,
            but not both.  In  such  cases,  the  most  recently  issued
            recognized formatting command should be preferred.

            One of the major goals in the design of richtext was to make
            it  so  simple  that  even  text-only mailers will implement
            richtext-to-plain-text  translators,  thus  increasing   the
            likelihood  that  multifont  text  will become "safe" to use
            very widely.  To demonstrate this simplicity,  an  extremely
            simple  35-line  C program that converts richtext input into
            plain text output is included in Appendix D.

7.2  The Multipart Content-Type

            In the case of multiple part messages, in which one or  more
            different  sets  of  data  are  combined in a single body, a
            "multipart" Content-Type field must appear in  the  entity's
            header. The body must then contain one or more "body parts,"
            each preceded by an encapsulation boundary, and the last one
            followed  by  a  closing boundary.  Each part starts with an
            encapsulation  boundary,  and  then  contains  a  body  part
            consisting of a header area, a blank line, and a body area.
            Thus a body part is similar to an RFC 822 message in syntax,
            but different in meaning.

            A body part is NOT to be interpreted as  actually  being  an
            RFC  822  message.   To  begin  with,  NO  header fields are
            actually required in body parts.  A body  part  that  starts
            with  a blank line, therefore, is allowed and is a body part
            for which all default values are to be assumed.  In  such  a
            case,  the  absence  of  a Content-Type header field implies
            that the encapsulation is plain  US-ASCII  text.   The  only
            header  fields  that have defined meaning for body parts are
            those the names of which begin with "Content-".   All  other
            header  fields  are  generally  to be ignored in body parts.
            Although  they  should  generally  be   retained   in   mail
            processing,  they may be discarded by gateways if necessary.
            Such other fields are permitted to appear in body parts  but
            should  not  be  depended on. "X-" fields may be created for
            experimental or private purposes, with the recognition  that
            the information they contain may be lost at some gateways.

            The distinction between an RFC 822 message and a  body  part
            is  subtle,  but  important.  A gateway between Internet and
            X.400 mail, for example, must be able to tell the difference
            between  a  body part that contains an image and a body part
            that contains an encapsulated message, the body of which  is
            an  image.   In order to represent the latter, the body part
            must have "Content-Type: message", and its body  (after  the
            blank  line)  must be the encapsulated message, with its own
            "Content-Type: image" header  field.   The  use  of  similar
            syntax facilitates the conversion of messages to body parts,
            and vice versa, but the distinction between the two must  be
            understood  by implementors.  (For the special case in which
            all parts actually are messages, a "digest" subtype is also
            defined.)

            As stated previously, each  body  part  is  preceded  by  an
            encapsulation boundary.  The encapsulation boundary MUST NOT
            appear inside any of the encapsulated parts.   Thus,  it  is
            crucial  that  the  composing  agent  be  able to choose and
            specify the unique boundary that will separate the parts.

            All present and future subtypes of the "multipart" type must
            use  an  identical  syntax.  Subtypes  may  differ  in their
            semantics, and may impose additional restrictions on syntax,
            but  must  conform  to the required syntax for the multipart
            type.  This requirement ensures  that  all  conformant  user
            agents  will  at least be able to recognize and separate the
            parts of any  multipart  entity,  even  of  an  unrecognized
            subtype.

            As stated in the definition of the Content-Transfer-Encoding
            field, no encoding other than "7bit", "8bit", or "binary" is
            permitted for entities of type "multipart".   The  multipart
            delimiters  and  header fields are always 7-bit ASCII in any
            case, and data within the body parts can  be  encoded  on  a
            part-by-part  basis,  with  Content-Transfer-Encoding fields
            for each appropriate body part.

            Mail gateways, relays, and other mail  handling  agents  are
            commonly  known  to alter the top-level header of an RFC 822
            message.   In particular, they frequently  add,  remove,  or
            reorder  header  fields.   Such  alterations  are explicitly
            forbidden for the body part headers embedded in  the  bodies
            of messages of type "multipart."

7.2.1     Multipart:  The common syntax

            All subtypes of "multipart" share a common  syntax,  defined
            in  this  section.   A simple example of a multipart message
            also appears in this section.  An example of a more  complex
            multipart message is given in Appendix C.

            The Content-Type field for multipart  entities requires  one
            parameter,   "boundary",   which  is  used  to  specify  the
            encapsulation  boundary.   The  encapsulation  boundary   is
            defined   as  a  line  consisting  entirely  of  two  hyphen
            characters ("-", decimal code 45) followed by  the  boundary
            parameter value from the Content-Type header field.
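
            Deriving the encapsulation boundary line from the header
            field is straightforward. This Python sketch is illustrative
            only: delimiter_line is a hypothetical name, and its
            parameter parsing assumes the boundary value contains no
            semicolon.

```python
def delimiter_line(content_type: str) -> str:
    """Build the encapsulation boundary line from a multipart Content-Type.

    Naive parsing: assumes an unquoted or double-quoted boundary value
    that contains no ";".
    """
    for param in content_type.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "boundary":
            # Two hyphens followed by the boundary parameter value.
            return "--" + value.strip().strip('"')
    raise ValueError("multipart Content-Type requires a boundary parameter")
```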

            NOTE:  The hyphens are  for  rough  compatibility  with  the
            earlier  RFC  934  method  of message encapsulation, and for
            ease   of   searching   for   the   boundaries    in    some
            implementations.  However, it should be noted that multipart
            messages  are  NOT  completely  compatible  with   RFC   934
            encapsulations;  in  particular,  they  do  not obey RFC 934
            quoting conventions  for  embedded  lines  that  begin  with
            hyphens.   This  mechanism  was  chosen  over  the  RFC  934
            mechanism because the latter causes lines to grow with  each
            level  of  quoting.  The combination of this growth with the
            fact that SMTP implementations  sometimes  wrap  long  lines
            made  the  RFC 934 mechanism unsuitable for use in the event
            that deeply-nested multipart structuring is ever desired.

            Thus, a typical multipart Content-Type  header  field  might
            look like this:

                 Content-Type: multipart/mixed;

            This indicates that the entity consists  of  several  parts,
            each itself with a structure that is syntactically identical
            to an RFC 822 message, except that the header area might  be
            completely  empty,  and  that the parts are each preceded by
            the line


            Note that the  encapsulation  boundary  must  occur  at  the
            beginning  of  a line, i.e., following a CRLF, and that that
            initial CRLF is considered to be part of  the  encapsulation
            boundary  rather  than  part  of  the preceding part.    The
            boundary must be followed immediately either by another CRLF
            and the header fields for the next part, or by two CRLFs, in
            which case there are no header fields for the next part (and
            it is therefore assumed to be of Content-Type text/plain).

            NOTE:   The  CRLF  preceding  the  encapsulation   line   is
            considered  part  of  the boundary so that it is possible to
            have a part that does not end with  a  CRLF  (line   break).
            Body  parts that must be considered to end with line breaks,
            therefore, should have two CRLFs preceding the encapsulation
            line, the first of which is part of the preceding body part,
            and the  second  of  which  is  part  of  the  encapsulation
            boundary.

            The requirement that the encapsulation boundary begins  with
            a  CRLF  implies  that  the  body of a multipart entity must
            itself begin with a CRLF before the first encapsulation line
            --  that  is, if the "preamble" area is not used, the entity
            headers must be followed by TWO CRLFs.  This is  indeed  how
            such  entities  should be composed.  A tolerant mail reading
            program, however, may interpret a  body  of  type  multipart
            that  begins  with  an encapsulation line NOT initiated by a
            CRLF  as  also  being  an  encapsulation  boundary,  but   a
            compliant  mail  sending  program  must  not  generate  such
            bodies.

            Encapsulation  boundaries  must  not   appear   within   the
            encapsulations,  and  must  be no longer than 70 characters,
            not counting the two leading hyphens.

            The encapsulation boundary following the last body part is a
            distinguished  delimiter that indicates that no further body
            parts will follow.  Such a delimiter  is  identical  to  the
            previous  delimiters,  with the addition of two more hyphens
            at the end of the line:


            There appears to be room for additional information prior to
            the  first  encapsulation  boundary  and following the final
            boundary.  These areas should generally be left  blank,  and
            implementations  should  ignore anything that appears before
            the first boundary or after the last one.

            NOTE:  These "preamble" and "epilogue" areas  are  not  used
            because  of the lack of proper typing of these parts and the
            lack  of  clear  semantics  for  handling  these  areas   at
            gateways, particularly X.400 gateways.

            NOTE:  Because encapsulation boundaries must not  appear  in
            the  body  parts  being  encapsulated,  a  user  agent  must
            exercise care to choose a unique boundary.  The boundary  in
            the example above could have been the result of an algorithm
            designed to produce boundaries with a very  low  probability
            of  already  existing in the data to be encapsulated without
            having to prescan  the  data.   Alternate  algorithms  might
            result in more 'readable' boundaries for a recipient with an
            old user agent, but would  require  more  attention  to  the
            possibility   that   the   boundary   might  appear  in  the
            encapsulated  part.   The  simplest  boundary  possible   is
            something like "---", with a closing boundary of "-----".
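
            As an illustration of the kind of algorithm described above,
            the following Python sketch draws a random boundary.  The
            function name make_boundary, the 22-character default length,
            and the restriction to letters and digits (a subset of the
            bcharsnospace characters defined later in this section) are
            assumptions of the sketch, not requirements.  The final
            containment check is a cheap safety net; with a random
            boundary of this length a collision is already very unlikely.

```python
import secrets
import string

# Assumption: only letters and digits are used, a subset of bcharsnospace.
ALPHABET = string.ascii_letters + string.digits


def make_boundary(parts, length=22):
    """Return a random boundary that does not occur in any of `parts`.

    `parts` is a list of already-encoded body parts (strings).  The
    boundary length must be between 1 and 70 characters.
    """
    if not 1 <= length <= 70:
        raise ValueError("boundary must be 1 to 70 characters long")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # The boundary must never appear inside an encapsulated part.
        if not any(candidate in part for part in parts):
            return candidate


print(make_boundary(["first part", "second part"]))
```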

            As a very simple example, the  following  multipart  message
            has  two  parts,  both  of  them  plain  text,  one  of them
            explicitly typed and one of them implicitly typed:

                 From: Nathaniel Borenstein <>
                 To:  Ned Freed <>
                 Subject: Sample message
                 MIME-Version: 1.0
                 Content-type: multipart/mixed; boundary="simple boundary"

                 This is the preamble.  It is to be ignored, though it
                 is a handy place for mail composers to include an
                 explanatory note to non-MIME compliant readers.
                 --simple boundary

                 This is implicitly typed plain ASCII text.
                 It does NOT end with a linebreak.
                 --simple boundary
                 Content-type: text/plain; charset=us-ascii

                 This is explicitly typed plain ASCII text.
                 It DOES end with a linebreak.

                 --simple boundary--
                 This is the epilogue.  It is also to be ignored.
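
            The body of the example above can be produced mechanically.
            This Python sketch uses a hypothetical compose_multipart
            helper that takes (headers, body) pairs whose bodies already
            use CRLF line breaks.  Because each delimiter supplies its own
            leading CRLF, the first part ends without a line break and
            the second keeps its final CRLF, as in the example.

```python
CRLF = "\r\n"


def compose_multipart(boundary, parts, preamble="", epilogue=""):
    """Build a multipart body from (headers, body) pairs.

    `headers` is a list of "Name: value" strings, possibly empty, and
    `body` is the part's content with CRLF line breaks.
    """
    out = preamble
    for headers, body in parts:
        # The CRLF before "--" belongs to the delimiter, not to the
        # preceding part (or preamble).
        out += CRLF + "--" + boundary + CRLF
        # Header fields, then the blank line ending the header area.  With
        # no headers this gives the two CRLFs that imply text/plain.
        out += "".join(h + CRLF for h in headers) + CRLF
        out += body
    # Close delimiter: the delimiter with two extra hyphens.
    out += CRLF + "--" + boundary + "--" + CRLF + epilogue
    return out


body = compose_multipart(
    "simple boundary",
    [([], "This is implicitly typed plain ASCII text.\r\n"
          "It does NOT end with a linebreak."),
     (["Content-type: text/plain; charset=us-ascii"],
      "This is explicitly typed plain ASCII text.\r\n"
      "It DOES end with a linebreak.\r\n")],
)
print(body)
```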

            The use of a Content-Type of multipart in a body part within
            another  multipart  entity  is explicitly allowed.   In such
            cases, for obvious reasons, care must  be  taken  to  ensure
            that  each  nested  multipart  entity  uses  a  different
            boundary delimiter. See Appendix C for an example of  nested
            multipart entities.

            The use of the multipart Content-Type  with  only  a  single
            body  part  may  be  useful  in  certain  contexts,  and  is
            explicitly permitted.

            The only mandatory parameter for the multipart  Content-Type
            is  the  boundary  parameter,  which  consists  of  1  to 70
            characters from a set of characters known to be very  robust
            through  email  gateways,  and  NOT ending with white space.
            (If a boundary appears to end with white  space,  the  white
            space  must be presumed to have been added by a gateway, and
            should  be  deleted.)   It  is  formally  specified  by  the
            following BNF:

            boundary := 0*69<bchars> bcharsnospace

            bchars := bcharsnospace / " "

            bcharsnospace :=    DIGIT / ALPHA / "'" / "(" / ")" / "+"  / "_"
                           / "," / "-" / "." / "/" / ":" / "=" / "?"
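
            The boundary grammar can also be checked with a regular
            expression.  In this Python sketch (is_valid_boundary is a
            hypothetical name), the character class mirrors bcharsnospace,
            and the space character is accepted everywhere except in the
            final position.

```python
import re

# bcharsnospace: DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_"
#                / "," / "-" / "." / "/" / ":" / "=" / "?"
BCHARSNOSPACE = r"0-9A-Za-z'()+_,\-./:=?"

# boundary := 0*69<bchars> bcharsnospace, where bchars also allows " ".
BOUNDARY_RE = re.compile("[" + BCHARSNOSPACE + " ]{0,69}[" + BCHARSNOSPACE + "]")


def is_valid_boundary(value):
    """True if `value` matches the boundary grammar."""
    return BOUNDARY_RE.fullmatch(value) is not None


print(is_valid_boundary("simple boundary"))   # True
print(is_valid_boundary("trailing space "))   # False
```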

            Overall, the body of a multipart entity may be specified  as
            follows:

            multipart-body := preamble 1*encapsulation
                           close-delimiter epilogue

            encapsulation := delimiter CRLF body-part

            delimiter := CRLF "--" boundary   ; taken from  Content-Type
                                              ;   field when content-type
                                              ;   is multipart.
                                              ; There must be no space
                                              ; between "--" and boundary.

            close-delimiter := delimiter "--" ; Again, no  space  before
                                              ; "--".

            preamble :=  *text                  ;  to  be  ignored  upon
                                                ;  receipt.

            epilogue :=  *text                  ;  to  be  ignored  upon
                                                ;  receipt.

            body-part := <"message" as defined in RFC 822,
                     with all header fields optional, and with the
                     specified delimiter not occurring anywhere in
                     the message body, either on a line by itself
                     or as a substring anywhere.  Note that the
                     semantics of a part differ from the semantics
                     of a message, as described in the text.>
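
            As a rough guide to reading this grammar, the Python sketch
            below splits a multipart body into (header area, content)
            pairs and discards the preamble and epilogue.  It is a
            hypothetical helper rather than a complete parser: it assumes
            the body is a string with CRLF line breaks, leaves header
            lines unparsed, and, like the tolerant reader described
            earlier, also accepts a body that begins directly with "--"
            and the boundary.

```python
CRLF = "\r\n"


def split_multipart(body, boundary):
    """Split a multipart body into (header_area, content) pairs.

    The preamble and epilogue are discarded.  Header lines are returned
    unparsed, and folded (continued) header lines are not handled.
    """
    delimiter = CRLF + "--" + boundary
    if body.startswith("--" + boundary):
        # Tolerate a body whose first delimiter lacks the leading CRLF.
        body = CRLF + body
    chunks = body.split(delimiter)
    parts = []
    # chunks[0] is the preamble; each later chunk follows a delimiter.
    for chunk in chunks[1:]:
        if chunk.startswith("--"):
            break  # close-delimiter reached; the rest is the epilogue
        if not chunk.startswith(CRLF):
            raise ValueError("delimiter must be followed by CRLF")
        part = chunk[len(CRLF):]
        if part.startswith(CRLF):
            # Two CRLFs after the boundary: no header fields at all.
            parts.append(("", part[len(CRLF):]))
        else:
            headers, _, content = part.partition(CRLF + CRLF)
            parts.append((headers, content))
    return parts


message_body = (
    "This is the preamble.\r\n"
    "--simple boundary\r\n\r\n"
    "Implicitly typed.\r\n"
    "--simple boundary\r\n"
    "Content-type: text/plain; charset=us-ascii\r\n\r\n"
    "Explicitly typed.\r\n\r\n"
    "--simple boundary--\r\n"
    "This is the epilogue.\r\n"
)
for headers, content in split_multipart(message_body, "simple boundary"):
    print(repr(headers), repr(content))
```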

            NOTE:  Conspicuously missing from the multipart  type  is  a
            notion  of  structured,  related body parts.  In general, it
            seems premature to try to  standardize  interpart  structure
            yet.  It is recommended that those wishing to provide a more
            structured or integrated multipart messaging facility should
            define   a   subtype  of  multipart  that  is  syntactically
            identical, but  that  always  expects  the  inclusion  of  a
            distinguished part that can be used to specify the structure
            and integration of the other parts,  probably  referring  to
            them  by  their Content-ID field.  If this approach is used,
            other implementations will not recognize  the  new  subtype,
            but  will  treat it as the primary subtype (multipart/mixed)
            and will thus be able to show the user the  parts  that  are
            recognized.

7.2.2     The Multipart/mixed (primary) subtype

            The primary subtype for multipart, "mixed", is intended  for
            use  when  the body parts are independent and intended to be
            displayed  serially.   Any  multipart   subtypes   that   an
            implementation does not recognize should be treated as being
            of subtype "mixed".
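
            A reader might apply this fallback as in the following Python
            sketch.  The function name is hypothetical, and it assumes
            the reader recognizes exactly the four subtypes defined in
            this document; real Content-Type parsing needs more care
            with quoting and comments.

```python
# Assumption: the reader recognizes exactly the subtypes defined here.
KNOWN_SUBTYPES = {"mixed", "alternative", "digest", "parallel"}


def effective_subtype(content_type):
    """Return the multipart subtype a reader should act on."""
    media_type, _, rest = content_type.partition("/")
    if media_type.strip().lower() != "multipart":
        raise ValueError("not a multipart Content-Type")
    subtype = rest.split(";", 1)[0].strip().lower()
    # Unrecognized multipart subtypes are treated as "mixed".
    return subtype if subtype in KNOWN_SUBTYPES else "mixed"


print(effective_subtype("multipart/x-unknown; boundary=42"))  # mixed
```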

7.2.3     The Multipart/alternative subtype

            The multipart/alternative type is syntactically identical to
            multipart/mixed,   but  the  semantics  are  different.   In
            particular, each of the parts is an "alternative" version of
            the same information.  User agents should recognize that the
            content of the various parts are interchangeable.  The  user
            agent  should  either  choose  the  "best" type based on the
            user's environment and preferences, or offer  the  user  the
            available  alternatives.  In general, choosing the best type
            means displaying only the LAST part that can  be  displayed.
            This  may be used, for example, to send mail in a fancy text
            format in such  a  way  that  it  can  easily  be  displayed
            on older systems:

            From:  Nathaniel Borenstein <>
            To: Ned Freed <>
            Subject: Formatted text mail
            MIME-Version: 1.0
            Content-Type: multipart/alternative; boundary=boundary42


            --boundary42
            Content-Type: text/plain; charset=us-ascii

            ...plain text version of message goes here....

            --boundary42
            Content-Type: text/richtext

            .... richtext version of same message goes here ...
            --boundary42
            Content-Type: text/x-whatever

            .... fanciest formatted version of same  message  goes  here
            --boundary42--

            In this example, users  whose  mail  system  understood  the
            "text/x-whatever"  format  would see only the fancy version,
            while other users would see only the richtext or plain  text
            version, depending on the capabilities of their system.

            In general, user agents that  compose  multipart/alternative
            entities  should place the body parts in increasing order of
            preference, that is, with the  preferred  format  last.  For
            fancy  text,  the sending user agent should put the plainest
            format first and the richest format  last.   Receiving  user
            agents  should  pick  and  display  the last format they are
            capable of  displaying.   In  the  case  where  one  of  the
            alternatives  is  itself  of  type  "multipart" and contains
            unrecognized sub-parts, the user agent may choose either  to
            show that alternative, an earlier alternative, or both.
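
            The receiving rule amounts to scanning the parts from last
            to first.  In this Python sketch, part_types lists each
            part's content type in order and can_display is a
            hypothetical predicate supplied by the user agent.

```python
def choose_alternative(part_types, can_display):
    """Index of the part to show from a multipart/alternative, or None.

    Parts are ordered from plainest to richest, so the LAST part the
    reader can display is the "best" one.
    """
    for index in reversed(range(len(part_types))):
        if can_display(part_types[index]):
            return index
    return None


types = ["text/plain", "text/richtext", "text/x-whatever"]
supported = {"text/plain", "text/richtext"}
print(choose_alternative(types, lambda t: t in supported))  # 1
```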

            NOTE:  From an implementor's perspective, it might seem more
            sensible  to  reverse  this  ordering, and have the plainest
            alternative last.  However, placing the plainest alternative
            first    is    the    friendliest   possible   option   when
            multipart/alternative entities are viewed using a  non-MIME-
            compliant mail reader.  While this approach does impose some
            burden on  compliant  mail  readers,  interoperability  with
            older  mail  readers was deemed to be more important in this
            case.

            It may be the case  that  some  user  agents,  if  they  can
            recognize more than one of the formats, will prefer to offer
            the user the choice of which format  to  view.   This  makes
            sense, for example, if mail includes both a nicely-formatted
            image version and an easily-edited text  version.   What  is
            most  critical,  however, is that the user not automatically
            be shown multiple versions of the  same  data.   Either  the
            user  should  be shown the last recognized version or should
            explicitly be given the choice.

7.2.4     The Multipart/digest subtype

            This document defines a "digest" subtype  of  the  multipart
            Content-Type.   This  type  is  syntactically  identical  to
            multipart/mixed,  but  the  semantics  are  different.    In
            particular,  in a digest, the default Content-Type value for
            a   body   part   is   changed    from    "text/plain"    to
            "message/rfc822".   This  is  done  to allow a more readable
            digest format that is largely  compatible  (except  for  the
            quoting convention) with RFC 934.
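
            A reader can fold this rule into its per-part Content-Type
            lookup, as in the Python sketch below.  The function name is
            hypothetical, field names are matched case-insensitively, and
            folded (multi-line) header fields are not handled.

```python
def part_content_type(header_area, multipart_subtype):
    """Content-Type of a body part inside a multipart entity.

    `header_area` holds the part's header lines separated by CRLF (it may
    be empty).  Folded header lines are not handled in this sketch.
    """
    for line in header_area.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            return value.strip()
    # No Content-Type field: the default depends on the enclosing subtype.
    if multipart_subtype.lower() == "digest":
        return "message/rfc822"
    return "text/plain"


print(part_content_type("", "digest"))  # message/rfc822
```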

            A digest in this format might,  then,  look  something  like
            this:

            From: Moderator-Address
            MIME-Version: 1.0
            Subject:  Internet Digest, volume 42
            Content-Type: multipart/digest;
                 boundary="---- next message ----"

            ------ next message ----

            From: someone-else
            Subject: my opinion

            ...body goes here ...

            ------ next message ----

            From: someone-else-again
            Subject: my different opinion

            ... another body goes here...

            ------ next message ------

7.2.5     The Multipart/parallel subtype

            This document defines a "parallel" subtype of the  multipart
            Content-Type.   This  type  is  syntactically  identical  to
            multipart/mixed,  but  the  semantics  are  different.    In
            particular,  in  a  parallel  entity,  all  of the parts are
            intended to be presented in parallel, i.e.,  simultaneously,
            on  hardware  and  software  that  are  capable of doing so.
            Composing agents should be aware that many mail readers will
            lack this capability and will show the parts serially in any
            event.

7.3  The Message Content-Type

            It is frequently desirable, in sending mail, to  encapsulate
            another  mail  message. For this common operation, a special
            Content-Type, "message", is defined.  The  primary  subtype,
            message/rfc822,  has  no required parameters in the Content-
            Type field.  Additional subtypes, "partial"  and  "External-
            body",  do  have  required  parameters.   These subtypes are
            explained below.

            NOTE:  It has been suggested that subtypes of message  might
            be  defined  for  forwarded  or rejected messages.  However,
            forwarded and rejected messages can be handled as  multipart
            messages  in  which  the  first part contains any control or
            descriptive  information,  and  a  second  part,   of   type
            message/rfc822,   is  the  forwarded  or  rejected  message.
            Composing rejection and forwarding messages in  this  manner
            will  preserve  the type information on the original message
            and allow it to be correctly presented to the recipient, and
            hence is strongly encouraged.

            As stated in the definition of the Content-Transfer-Encoding
            field, no encoding other than "7bit", "8bit", or "binary" is
            permitted for messages  or  parts  of  type  "message".  The
            message  header  fields are always US-ASCII in any case, and
            data within the body can still be encoded, in which case the
            Content-Transfer-Encoding  header  field in the encapsulated
            message will reflect this.  Non-ASCII text in the headers of
            an   encapsulated   message   can  be  specified  using  the
            mechanisms described in [RFC-1342].

            Mail gateways, relays, and other mail  handling  agents  are
            commonly  known  to alter the top-level header of an RFC 822
            message.   In particular, they frequently  add,  remove,  or
            reorder  header  fields.   Such  alterations  are explicitly
            forbidden for  the  encapsulated  headers  embedded  in  the
            bodies of messages of type "message."

7.3.1     The Message/rfc822 (primary) subtype

            A Content-Type of "message/rfc822" indicates that  the  body
            contains  an encapsulated message, with the syntax of an RFC
            822 message.

7.3.2     The Message/Partial subtype

            A subtype of message, "partial",  is  defined  in  order  to
            allow  large  objects  to  be  delivered as several separate
            pieces  of  mail  and  automatically  reassembled   by   the
            receiving  user  agent.   (The  concept  is  similar  to  IP
            fragmentation/reassembly in the basic  Internet  Protocols.)
            This  mechanism  can  be  used  when  intermediate transport
            agents limit the size of individual  messages  that  can  be
            sent.   Content-Type  "message/partial"  thus indicates that
            the body contains a fragment of a larger message.

            Three parameters must be specified in the Content-Type field
            of  type  message/partial:  The  first,  "id",  is  a unique
            identifier,  as  close  to  a  world-unique  identifier   as
            possible,  to  be  used  to  match  the parts together.  (In
            general, the identifier  is  essentially  a  message-id;  if
            placed  in  double  quotes,  it  can  be  any message-id, in
            accordance with the BNF for  "parameter"  given  earlier  in
            this  specification.)   The second, "number", an integer, is
            the part number, which indicates where this part  fits  into
            the  sequence  of  fragments.   The  third, "total", another
            integer, is the total number of parts. This  third  subfield
            is  required  on  the  final  part,  and  is optional on the
            earlier parts. Note also that these parameters may be  given
            in any order.

            Thus, part 2 of a 3-part message  may  have  either  of  the
            following header fields:

                 Content-Type: Message/Partial;
                      number=2; total=3;

                 Content-Type: Message/Partial;
                      number=2

            But part 3 MUST specify the total number of parts:

                 Content-Type: Message/Partial;
                      number=3; total=3;

            Note that part numbering begins with 1, not 0.
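
            The parameter rules above suggest how a receiving agent might
            validate fragments and decide when reassembly can begin.  In
            this Python sketch the parameters are assumed to be already
            parsed into a dictionary of strings; the function names and
            the bookkeeping structure (part number mapped to declared
            total) are illustrative assumptions.

```python
def parse_partial_params(params):
    """Validate message/partial parameters, given as a dict of strings.

    Returns (id, number, total); total is None when it was omitted.
    """
    p = {name.lower(): value for name, value in params.items()}
    if "id" not in p or "number" not in p:
        raise ValueError("message/partial requires id and number")
    number = int(p["number"])
    if number < 1:
        raise ValueError("part numbering begins with 1")
    total = int(p["total"]) if "total" in p else None
    return p["id"], number, total


def ready_to_reassemble(received):
    """True once parts 1..total of one message have all arrived.

    `received` maps each part number seen for a given id to the total
    declared in that part (None if it declared none).
    """
    totals = {t for t in received.values() if t is not None}
    if len(totals) != 1:
        return False  # total not yet known, or parts disagree
    total = totals.pop()
    return set(received) == set(range(1, total + 1))


print(ready_to_reassemble({1: None, 2: None, 3: 3}))  # True
```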

            When the parts of a message broken up in this manner are put
            together,  the  result is a complete RFC 822 format message,
            which may have its own Content-Type header field,  and  thus
            may contain any other data type.

            Message fragmentation and reassembly:  The  semantics  of  a
            reassembled  partial  message  must  be those of the "inner"
            message, rather than  of  a  message  containing  the  inner
            message.   This  makes  it  possible, for example, to send a
            large audio message as several partial messages,  and  still
            have  it  appear  to the recipient as a simple audio message
            rather than as an encapsulated message containing  an  audio
            message.   That  is,  the  encapsulation  of  the message is
            considered to be "transparent".

            When  generating   and   reassembling   the   parts   of   a
            message/partial  message,  the  headers  of the encapsulated
            message must be merged with the  headers  of  the  enclosing
            entities.  In  this  process  the  following  rules  must be
            observed:

                 (1) All of the headers from the initial  enclosing
                 entity  (part  one),  except those that start with
                 "Content-" and "Message-ID", must  be  copied,  in
                 order, to the new message.

                 (2) Only those headers  in  the  enclosed  message
                 which  start with "Content-" and "Message-ID" must
                 be appended, in order, to the headers of  the  new
                 message.   Any  headers  in  the  enclosed message
                 which do not start  with  "Content-"  (except  for
                 "Message-ID") will be ignored.

                 (3) All of the headers from  the  second  and  any
                 subsequent messages will be ignored.

            For example, if an audio message is broken into  two  parts,
            the first part might look something like this:

                 X-Weird-Header-1: Foo
                 Subject: Audio mail
                 MIME-Version: 1.0
                 Content-type: message/partial;
                      number=1; total=2

                 X-Weird-Header-1: Bar
                 X-Weird-Header-2: Hello
                 Content-type: audio/basic
                 Content-transfer-encoding: base64

                 ... first half of encoded audio data goes here...

            and the second half might look something like this:

                 Subject: Audio mail
                 MIME-Version: 1.0
                 Content-type: message/partial;
                      id=""; number=2; total=2

                 ... second half of encoded audio data goes here...

            Then,  when  the  fragmented  message  is  reassembled,  the
            resulting  message  to  be displayed to the user should look
            something like this:

                 X-Weird-Header-1: Foo
                 Subject: Audio mail
                 MIME-Version: 1.0
                 Content-type: audio/basic
                 Content-transfer-encoding: base64

                 ... first half of encoded audio data goes here...
                 ... second half of encoded audio data goes here...
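
            The three merge rules can be written directly in code.  This
            Python sketch (merge_partial_headers is a hypothetical name)
            represents headers as ordered (name, value) pairs; applied to
            the two header blocks of the first part in the audio example,
            it yields the header of the reassembled message shown above.

```python
def merge_partial_headers(outer, inner):
    """Merge headers when reassembling a message/partial message.

    `outer` holds the enclosing entity's headers from part one and
    `inner` the encapsulated message's headers, each as an ordered list
    of (name, value) pairs.  Headers of later parts are not needed,
    because rule (3) ignores them.
    """
    def content_or_id(name):
        lowered = name.lower()
        return lowered.startswith("content-") or lowered == "message-id"

    # Rule (1): copy outer headers except Content-* and Message-ID.
    merged = [(n, v) for n, v in outer if not content_or_id(n)]
    # Rule (2): append only inner Content-* and Message-ID headers.
    merged += [(n, v) for n, v in inner if content_or_id(n)]
    return merged


outer = [("X-Weird-Header-1", "Foo"), ("Subject", "Audio mail"),
         ("MIME-Version", "1.0"),
         ("Content-type", "message/partial; number=1; total=2")]
inner = [("X-Weird-Header-1", "Bar"), ("X-Weird-Header-2", "Hello"),
         ("Content-type", "audio/basic"),
         ("Content-transfer-encoding", "base64")]
for name, value in merge_partial_headers(outer, inner):
    print(f"{name}: {value}")
```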

            It should be  noted  that,  because  some  message  transfer
            agents  may choose to automatically fragment large messages,
            and because such  agents  may  use  different  fragmentation
            thresholds,  it  is  possible  that  the pieces of a partial
            message, upon reassembly, may prove themselves to comprise a
            partial message.  This is explicitly permitted.

            It should also be noted that the inclusion of a "References"
            field  in the headers of the second and subsequent pieces of
            a fragmented message that references the Message-Id  on  the
            previous  piece  may  be  of  benefit  to  mail readers that
            understand and track references. However, the generation  of
            such "References" fields is entirely optional.

7.3.3     The Message/External-Body subtype

            The external-body subtype indicates  that  the  actual  body
            data are not included, but merely referenced.  In this case,
            the  parameters  describe  a  mechanism  for  accessing  the
            external data.

            When  a   message   body   or   body   part   is   of   type
            "message/external-body",   it  consists  of  a  header,  two
            consecutive  CRLFs,  and  the   message   header   for   the
            encapsulated  message.  If another pair of consecutive CRLFs
            appears, this of course ends  the  message  header  for  the
            encapsulated   message.   However,  since  the  encapsulated
            message's body is itself external, it does NOT appear in the
            area  that  follows.   For  example,  consider the following
            message:

                 Content-type: message/external-body; access-

                 Content-type:  image/gif

                 THIS IS NOT REALLY THE BODY!

            The area at the end, which  might  be  called  the  "phantom
            body", is ignored for most external-body messages.  However,
            it may be used to contain auxiliary information for some
            such  messages,  as  indeed  it  is  when the access-type is
            "mail-server".   Of  the  access-types   defined   by   this
            document, the phantom body is used only when the access-type
            is "mail-server".  In all other cases, the phantom  body  is
            ignored.

            The only always-mandatory  parameter  for  message/external-
            body  is  "access-type";  all of the other parameters may be
            mandatory or optional depending on the value of access-type.

                 ACCESS-TYPE -- One or more case-insensitive words,
                 comma-separated,   indicating   supported   access
                 mechanisms by  which  the  file  or  data  may  be
                 obtained.  Values include, but are not limited to,
                 "FTP", "ANON-FTP",  "TFTP",  "AFS",  "LOCAL-FILE",
                 and   "MAIL-SERVER".  Future  values,  except  for
                 experimental values beginning with "X-",  must  be
                 registered with IANA, as described in Appendix F.

            In addition, the following three parameters are optional for
            ALL access-types:

                 EXPIRATION -- The date (in the RFC 822 "date-time"
                 syntax, as extended by RFC 1123 to permit 4 digits
                 in the date field) after which  the  existence  of
                 the external data is not guaranteed.

                 SIZE -- The size (in octets)  of  the  data.   The
                 intent  of this parameter is to help the recipient
                 decide whether or  not  to  expend  the  necessary
                 resources to retrieve the external data.

                 PERMISSION -- A field that  indicates  whether  or
                 not it is expected that clients might also attempt
                 to  overwrite  the  data.   By  default,   or   if
                 permission  is "read", the assumption is that they
                 are not, and that if the data is  retrieved  once,
                 it  is never needed again. If PERMISSION is "read-
                 write", this assumption is invalid, and any  local
                 copy  must  be  considered  no  more than a cache.
                 "Read"  and  "Read-write"  are  the  only  defined
                 values of permission.
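
            As an illustration of how a recipient might interpret these
            optional parameters, the Python sketch below reads EXPIRATION
            with the standard library function
            email.utils.parsedate_to_datetime.  The function name, the
            returned dictionary, and the case-insensitive comparison of
            permission values are assumptions of this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def interpret_optional_params(params, now=None):
    """Interpret EXPIRATION, SIZE and PERMISSION for message/external-body.

    `params` maps parameter names (any case) to string values.
    """
    p = {name.lower(): value for name, value in params.items()}
    if now is None:
        now = datetime.now(timezone.utc)
    expired = False
    if "expiration" in p:
        # RFC 822 date-time; the trailing "(EDT)" comment is ignored.
        expired = parsedate_to_datetime(p["expiration"]) < now
    size = int(p["size"]) if "size" in p else None
    # Assumption: permission values are compared case-insensitively.
    permission = p.get("permission", "read").lower()
    if permission not in ("read", "read-write"):
        raise ValueError("permission must be read or read-write")
    # With read-write permission, a local copy is no more than a cache.
    return {"expired": expired, "size": size,
            "cache_only": permission == "read-write"}


print(interpret_optional_params(
    {"expiration": "Fri, 14 Jun 1991 19:13:14 -0400 (EDT)", "size": "4096"}))
```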

            The precise semantics of the access-types defined  here  are
            described in the sections that follow.

The "ftp" and "tftp" access-types

            An access-type of FTP or TFTP  indicates  that  the  message
            body is accessible as a file using the FTP [RFC-959] or TFTP
            [RFC-783] protocols, respectively.  For these  access-types,
            the following additional parameters are mandatory:

                 NAME -- The name of the  file  that  contains  the
                 actual body data.

                 SITE -- A machine  from  which  the  file  may  be
                 obtained, using the given protocol.

            Before the data is retrieved,  using  these  protocols,  the
            user  will  generally need to be asked to provide a login id
            and a password for the machine named by the site parameter.

            In addition, the  following  optional  parameters  may  also
            appear when the access-type is FTP or ANON-FTP:

                 DIRECTORY -- A directory from which the data named
                 by NAME should be retrieved.

                 MODE  --  A  transfer  mode  for  retrieving   the
                 information, e.g. "image".

The "anon-ftp" access-type

            The "anon-ftp" access-type is identical to the "ftp"  access
            type,  except  that  the user need not be asked to provide a
            name and password for the specified site.  Instead, the  ftp
            protocol  will be used with login "anonymous" and a password
            that corresponds to the user's email address.

The "local-file" and "afs" access-types

            An access-type of "local-file"  indicates  that  the  actual
            body  is  accessible  as  a  file  on the local machine.  An
            access-type of "afs" indicates that the file  is  accessible
            via  the  global  AFS  file  system.   In both cases, only a
            single parameter is required:

                 NAME -- The name of the  file  that  contains  the
                 actual body data.

            The following optional parameter may be used to describe the
            locality  of  reference  for  the data, that is, the site or
            sites at which the file is expected to be visible:

                 SITE -- A domain specifier for a machine or set of
                 machines that are known to have access to the data
                 file.  Asterisks may be used for wildcard matching
                 to   a   part   of   a   domain   name,   such  as
                 "*", to indicate a set of machines on
                 which the data should be directly visible, while a
                 single asterisk may be used  to  indicate  a  file
                 that  is  expected  to  be  universally available,
                 e.g., via a global file system.

The "mail-server" access-type

            The "mail-server" access-type indicates that the actual body
            is  available  from  a mail server.  The mandatory parameter
            for this access-type is:

                 SERVER -- The email address  of  the  mail  server
                 from which the actual body data can be obtained.

            Because mail servers accept a variety  of  syntax,  some  of
            which  is  multiline,  the full command to be sent to a mail
            server is not included as a parameter  on  the  content-type
            line.   Instead,  it  may  be provided as the "phantom body"
            when  the  content-type  is  message/external-body  and  the
            access-type is mail-server.
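
            Collecting the mandatory parameters listed for each
            access-type, a receiving agent might check which of the
            offered mechanisms it can actually use.  In this hypothetical
            Python sketch, anon-ftp takes its requirements from ftp, to
            which it is identical apart from the login, and unrecognized
            values, including experimental "X-" ones, are skipped.  The
            file and site names in the usage line are placeholders.

```python
# Mandatory parameters per access-type, as listed in this section.
REQUIRED_PARAMS = {
    "ftp": {"name", "site"},
    "tftp": {"name", "site"},
    "anon-ftp": {"name", "site"},  # identical to "ftp" except for login
    "local-file": {"name"},
    "afs": {"name"},
    "mail-server": {"server"},
}


def usable_access_types(params):
    """Access-types listed in `params` whose mandatory parameters are present.

    `params` maps parameter names (any case) to values.  Access-types not
    in REQUIRED_PARAMS, such as experimental "X-" values, are skipped.
    """
    p = {name.lower(): value for name, value in params.items()}
    if "access-type" not in p:
        raise ValueError("access-type is always mandatory")
    usable = []
    # ACCESS-TYPE may list several comma-separated, case-insensitive words.
    for word in p["access-type"].split(","):
        access = word.strip().lower()
        required = REQUIRED_PARAMS.get(access)
        if required is not None and required.issubset(p):
            usable.append(access)
    return usable


print(usable_access_types({"access-type": "ANON-FTP", "name": "doc.ps",
                           "site": "ftp.example.org"}))  # ['anon-ftp']
```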

            Note that  MIME  does  not  define  a  mail  server  syntax.
            Rather,  it  allows  the  inclusion of arbitrary mail server
            commands  in  the  phantom  body.   Implementations   should
            include the phantom body in the body of the message they send
            to the mail server address to retrieve the relevant data.

            With  the  emerging  possibility  of  very  wide-area   file
            systems,  it becomes very hard to know in advance the set of
            machines where a  file  will  and  will  not  be  accessible
            directly  from the file system.  Therefore it may make sense
            to provide both a file name, to be tried directly,  and  the
            name of one or more sites from which the file is known to be
            accessible.  An implementation can try  to  retrieve  remote
            files  using FTP or any other protocol, using anonymous file
            retrieval or prompting the user for the necessary  name  and
            password.   If  an  external body is accessible via multiple
            mechanisms, the sender may include multiple  parts  of  type
            message/external-body    within    an    entity    of   type
            multipart/alternative.

            However, the external-body mechanism is not intended  to  be
            limited  to  file  retrieval,  as  shown  by the mail-server
            access-type.  Beyond this, one  can  imagine,  for  example,
            using a video server for external references to video clips.

            If an entity is of type  "message/external-body",  then  the
            body  of  the  entity  will contain the header fields of the
            encapsulated message.  The body itself is to be found in the
            external  location.   This  means  that  if  the body of the
            "message/external-body"  message  contains  two  consecutive
            CRLFs, everything after the first such pair is NOT part of the
            message itself.  For  most  message/external-body  messages,
            this trailing area must simply be ignored.  However, it is a
            convenient place for additional data that cannot be included
            in  the  content-type  header field.   In particular, if the
            "access-type" value is "mail-server", then the trailing area
            must  contain  commands to be sent to the mail server at the
            address given by NAME@SITE, where  NAME  and  SITE  are  the
            values of the NAME and SITE parameters, respectively.
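            The header/trailing-area split described above can be sketched
            as follows.  The sketch assumes the body is available as raw
            bytes with CRLF line endings, and that the Content-Type
            parameters have already been parsed into a dictionary with
            lowercase keys.  Both function names and the dictionary layout
            are hypothetical.

```python
from typing import Dict, Tuple


def split_external_body(body: bytes) -> Tuple[bytes, bytes]:
    """Split a message/external-body body at its first CRLF CRLF.

    Returns (embedded header fields, trailing area).  The trailing
    area is not part of the referenced message.
    """
    headers, sep, trailing = body.partition(b"\r\n\r\n")
    if not sep:
        # No blank line at all: the whole body is header information.
        return body, b""
    return headers, trailing


def mail_server_request(params: Dict[str, str],
                        body: bytes) -> Tuple[str, bytes]:
    """Return the NAME@SITE address and the commands to send there."""
    if params.get("access-type", "").lower() != "mail-server":
        raise ValueError("access-type is not mail-server")
    _, commands = split_external_body(body)
    # For mail-server, the trailing area carries the server commands.
    return f"{params['name']}@{params['site']}", commands
```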

            The embedded message header fields which appear in the  body
            of the message/external-body data can be used to declare the
            Content-type  of  the  external  body.   Thus   a   complete
            message/external-body  message,  referring  to a document in
            PostScript format, might look like this:

                 From: Whomever
                 Subject: whatever
                 MIME-Version: 1.0
                 Content-Type: multipart/alternative; boundary=42

                 --42
                 Content-Type: message/external-body;
                      expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

                 Content-type: application/postscript

                 --42
                 Content-Type: message/external-body;
                      expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

                 Content-type: application/postscript

                 --42
                 Content-Type: message/external-body;
                      expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

                 Content-type: application/postscript

                 get rfc-xxxx doc

                 --42--

            Like the  message/partial  type,  the  message/external-body
            type  is  intended to be transparent, that is, to convey the
            data type in the external  body  rather  than  to  convey  a
            message  with  a body of that type.  Thus the headers on the
            outer and inner parts must be merged using the same rules as
            for  message/partial.   In  particular,  this means that the
            Content-type header is overridden, but the From and  Subject
            headers are preserved.
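            One way to picture the merge is the sketch below.  It assumes
            the message/partial rule takes the Content-* fields and
            Message-ID from the inner header and all other fields from the
            outer header, each group kept in its original order.  Headers
            are modelled as ordered lists of (name, value) pairs.  The
            helper names are hypothetical.

```python
from typing import List, Tuple

Header = Tuple[str, str]  # (field name, field value)


def _from_inner(name: str) -> bool:
    # Assumed rule: Content-* fields and Message-ID come from the inner header.
    lowered = name.lower()
    return lowered.startswith("content-") or lowered == "message-id"


def merge_headers(outer: List[Header], inner: List[Header]) -> List[Header]:
    """Merge outer and inner header fields of a transparent message type."""
    # Outer fields such as From and Subject are kept, in order ...
    merged = [field for field in outer if not _from_inner(field[0])]
    # ... and the inner Content-* fields replace the outer ones.
    merged += [field for field in inner if _from_inner(field[0])]
    return merged
```

            Applied to one part of the example above, the outer
            Content-Type of message/external-body is dropped and the
            inner application/postscript type takes its place, while From
            and Subject survive unchanged.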

            Note that since the external bodies are not  transported  as
            mail,  they  need  not  conform to the 7-bit and line length
            requirements, but might in fact be  binary  files.   Thus  a
            Content-Transfer-Encoding is not generally necessary, though
            it is permitted.

            Note that the body of a message of  type  "message/external-
            body"  is  governed  by  the  basic  syntax  for  an RFC 822
            message.   In  particular,   anything   before   the   first
            consecutive  pair  of  CRLFs  is  header  information, while
            anything after it is body information, which is ignored  for
            most access-types.

