Network Working Group P. Resnick, Ed.
Request for Comments: 5322 Qualcomm Incorporated
Obsoletes: 2822 October 2008
Category: Standards Track
Internet Message Format
Status of This Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Abstract
This document specifies the Internet Message Format (IMF), a syntax
for text messages that are sent between computer users, within the
framework of "electronic mail" messages. This specification is a
revision of Request For Comments (RFC) 2822, which itself superseded
Request For Comments (RFC) 822, "Standard for the Format of ARPA
Internet Text Messages", updating it to reflect current practice and
incorporating incremental changes that were specified in other RFCs.
1. Introduction
1.1. Scope
This document specifies the Internet Message Format (IMF), a syntax
for text messages that are sent between computer users, within the
framework of "electronic mail" messages. This specification is an
update to [RFC2822], which itself superseded [RFC0822], updating it
to reflect current practice and incorporating incremental changes
that were specified in other RFCs such as [RFC1123].
This document specifies a syntax only for text messages. In
particular, it makes no provision for the transmission of images,
audio, or other sorts of structured data in electronic mail messages.
There are several extensions published, such as the MIME document
series ([RFC2045], [RFC2046], [RFC2049]), which describe mechanisms
for the transmission of such data through electronic mail, either by
extending the syntax provided here or by structuring such messages to
conform to this syntax. Those mechanisms are outside of the scope of
this specification.
In the context of electronic mail, messages are viewed as having an
envelope and contents. The envelope contains whatever information is
needed to accomplish transmission and delivery. (See [RFC5321] for a
discussion of the envelope.) The contents comprise the object to be
delivered to the recipient. This specification applies only to the
format and some of the semantics of message contents. It contains no
specification of the information in the envelope.
However, some message systems may use information from the contents
to create the envelope. It is intended that this specification
facilitate the acquisition of such information by programs.
This specification is intended as a definition of what message
content format is to be passed between systems. Though some message
systems locally store messages in this format (which eliminates the
need for translation between formats) and others use formats that
differ from the one specified in this specification, local storage is
outside of the scope of this specification.
Note: This specification is not intended to dictate the internal
formats used by sites, the specific message system features that
they are expected to support, or any of the characteristics of
user interface programs that create or read messages. In
addition, this document does not specify an encoding of the
characters for either transport or storage; that is, it does not
specify the number of bits used or how those bits are specifically
transferred over the wire or stored on disk.
1.2. Notational Conventions
1.2.1. Requirements Notation
This document occasionally uses terms that appear in capital letters.
When the terms "MUST", "SHOULD", "RECOMMENDED", "MUST NOT", "SHOULD
NOT", and "MAY" appear capitalized, they are being used to indicate
particular requirements of this specification. A discussion of the
meanings of these terms appears in [RFC2119].
1.2.2. Syntactic Notation
This specification uses the Augmented Backus-Naur Form (ABNF)
[RFC5234] notation for the formal definitions of the syntax of
messages. Characters will be specified either by a decimal value
(e.g., the value %d65 for uppercase A and %d97 for lowercase A) or by
a case-insensitive literal value enclosed in quotation marks (e.g.,
"A" for either uppercase or lowercase A).
1.2.3. Structure of This Document
This document is divided into several sections.
This section, section 1, is a short introduction to the document.
Section 2 lays out the general description of a message and its
constituent parts. This is an overview to help the reader understand
some of the general principles used in the later portions of this
document. Any examples in this section MUST NOT be taken as
specification of the formal syntax of any part of a message.
Section 3 specifies formal ABNF rules for the structure of each part
of a message (the syntax) and describes the relationship between
those parts and their meaning in the context of a message (the
semantics). That is, it lays out the actual rules for the structure
of each part of a message (the syntax) as well as a description of
the parts and instructions for their interpretation (the semantics).
This includes analysis of the syntax and semantics of subparts of
messages that have specific structure. The syntax included in
section 3 represents messages as they MUST be created. There are
also notes in section 3 to indicate if any of the options specified
in the syntax SHOULD be used over any of the others.
Both sections 2 and 3 describe messages that are legal to generate
for purposes of this specification.
Section 4 of this document specifies an "obsolete" syntax. There are
references in section 3 to these obsolete syntactic elements. The
rules of the obsolete syntax are elements that have appeared in
earlier versions of this specification or have previously been widely
used in Internet messages. As such, these elements MUST be
interpreted by parsers of messages in order to be conformant to this
specification. However, since items in this syntax have been
determined to be non-interoperable or to cause significant problems
for recipients of messages, they MUST NOT be generated by creators of
conformant messages.
Section 5 details security considerations to take into account when
implementing this specification.
Appendix A lists examples of different sorts of messages. These
examples are not exhaustive of the types of messages that appear on
the Internet, but give a broad overview of certain syntactic forms.
Appendix B lists the differences between this specification and
earlier specifications for Internet messages.
Appendix C contains acknowledgements.
2. Lexical Analysis of Messages
2.1. General Description
At the most basic level, a message is a series of characters. A
message that is conformant with this specification is composed of
characters with values in the range of 1 through 127 and interpreted
as US-ASCII [ANSI.X3-4.1986] characters. For brevity, this document
sometimes refers to this range of characters as simply "US-ASCII
characters".
Note: This document specifies that messages are made up of
characters in the US-ASCII range of 1 through 127. There are
other documents, specifically the MIME document series ([RFC2045],
[RFC2046], [RFC2047], [RFC2049], [RFC4288], [RFC4289]), that
extend this specification to allow for values outside of that
range. Discussion of those mechanisms is not within the scope of
this specification.
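A receiver can check the character range described above with a single pass over the octets. The following Python sketch assumes the message is available as raw bytes; the function name is_imf_ascii is hypothetical.

```python
def is_imf_ascii(data: bytes) -> bool:
    """True if every octet is in the range 1 through 127 (US-ASCII, no NUL)."""
    return all(1 <= octet <= 127 for octet in data)


print(is_imf_ascii(b"Subject: hello\r\n"))  # True
print(is_imf_ascii(b"caf\xc3\xa9"))         # False: UTF-8 octets exceed 127
```

Data that fails this check is not conformant with this specification alone; it would need one of the MIME mechanisms cited above.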
Messages are divided into lines of characters. A line is a series of
characters that is delimited with the two characters carriage-return
and line-feed; that is, the carriage return (CR) character (ASCII
value 13) followed immediately by the line feed (LF) character (ASCII
value 10). (The carriage return/line feed pair is usually written in
this document as "CRLF".)
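Because only the CR LF pair delimits a line, a lone CR or lone LF must not be treated as a line break. A minimal Python sketch, assuming the message has already been decoded to a str; the function name split_lines is hypothetical.

```python
def split_lines(message: str) -> list[str]:
    """Split a message into lines on CRLF only.

    A lone CR or LF is NOT treated as a delimiter; it stays inside the line.
    """
    lines = message.split("\r\n")
    # A message that ends in CRLF produces a trailing empty string; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


print(split_lines("From: a\r\nTo: b\r\n"))  # ['From: a', 'To: b']
```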
A message consists of header fields (collectively called "the header
section of the message") followed, optionally, by a body. The header
section is a sequence of lines of characters with special syntax as
defined in this specification. The body is simply a sequence of
characters that follows the header section and is separated from the
header section by an empty line (i.e., a line with nothing preceding
the CRLF).
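The empty line appears in the character stream as the CRLF that ends the last header field immediately followed by a second CRLF. The following Python sketch splits on the first such occurrence. The function name split_header_body is hypothetical, and the sketch assumes a header section with at least one field (as section 3 requires) and a str input.

```python
from typing import Optional, Tuple


def split_header_body(message: str) -> Tuple[str, Optional[str]]:
    """Split a message at the first empty line.

    Returns (header_section, body). body is None when there is no empty
    line, i.e. the optional body is absent.
    """
    idx = message.find("\r\n\r\n")
    if idx == -1:
        return message, None
    # The first CRLF ends the last header field; the second is the empty line.
    return message[: idx + 2], message[idx + 4 :]


header, body = split_header_body("From: a@example.com\r\n\r\nHello\r\n")
print(repr(body))  # 'Hello\r\n'
```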
Note: Common parlance and earlier versions of this specification
use the term "header" to either refer to the entire header section
or to refer to an individual header field. To avoid ambiguity,
this document does not use the terms "header" or "headers" in
isolation, but instead always uses "header field" to refer to the
individual field and "header section" to refer to the entire
2.1.1. Line Length Limits
There are two limits that this specification places on the number of
characters in a line. Each line of characters MUST be no more than
998 characters, and SHOULD be no more than 78 characters, excluding
the CRLF.
The 998 character limit is due to limitations in many implementations
that send, receive, or store IMF messages which simply cannot handle
more than 998 characters on a line. Receiving implementations would
do well to handle an arbitrarily large number of characters in a line
for robustness' sake. However, because so many implementations (in
compliance with the transport requirements of [RFC5321]) do not accept
messages containing more than 1000 characters per line, including the
CR and LF, it is important for implementations not to create such
messages.
The more conservative 78 character recommendation is to accommodate
the many implementations of user interfaces that display these
messages which may truncate, or disastrously wrap, the display of
more than 78 characters per line, in spite of the fact that such
implementations are non-conformant to the intent of this
specification (and that of [RFC5321] if they actually cause
information to be lost). Again, even though this limitation is put
on messages, it is incumbent upon implementations that display
messages to handle an arbitrarily large number of characters in a
line (certainly at least up to the 998 character limit) for the sake
of robustness.
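A generator can check both limits before sending. The following Python sketch reports each line that exceeds either limit, counting characters without the CRLF; the function name check_line_lengths and the severity labels are hypothetical.

```python
HARD_LIMIT = 998  # MUST: maximum characters per line, excluding CRLF
SOFT_LIMIT = 78   # SHOULD: recommended maximum, excluding CRLF


def check_line_lengths(message: str) -> list[tuple[int, int, str]]:
    """Return (line_number, length, severity) for every over-long line."""
    problems = []
    for number, line in enumerate(message.split("\r\n"), start=1):
        if len(line) > HARD_LIMIT:
            problems.append((number, len(line), "MUST"))
        elif len(line) > SOFT_LIMIT:
            problems.append((number, len(line), "SHOULD"))
    return problems


msg = "Subject: ok\r\n" + "x" * 100 + "\r\n" + "y" * 999 + "\r\n"
print(check_line_lengths(msg))  # [(2, 100, 'SHOULD'), (3, 999, 'MUST')]
```

For US-ASCII text, one character is one octet, so counting characters and counting octets give the same result.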
2.2. Header Fields
Header fields are lines beginning with a field name, followed by a
colon (":"), followed by a field body, and terminated by CRLF. A
field name MUST be composed of printable US-ASCII characters (i.e.,
characters that have values between 33 and 126, inclusive), except
colon. A field body may be composed of printable US-ASCII characters
as well as the space (SP, ASCII value 32) and horizontal tab (HTAB,
ASCII value 9) characters (together known as the white space
characters, WSP). A field body MUST NOT include CR and LF except
when used in "folding" and "unfolding", as described in section
2.2.3. All field bodies MUST conform to the syntax described in
sections 3 and 4 of this specification.
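The character rules for the field name and field body can be applied to an unfolded header field as follows. This Python sketch checks only the lexical rules in this section, not the field-specific syntax of sections 3 and 4; the function name parse_header_field is hypothetical.

```python
def parse_header_field(line: str) -> tuple[str, str]:
    """Split an unfolded header field into (field_name, field_body).

    Raises ValueError if the name is empty or has a character outside
    33..126, or if the body has anything but printable US-ASCII and WSP.
    """
    # Splitting on the first colon guarantees the name contains no colon.
    name, sep, body = line.partition(":")
    if not sep:
        raise ValueError("no colon in header field")
    if not name or any(not (33 <= ord(c) <= 126) for c in name):
        raise ValueError(f"invalid field name: {name!r}")
    if any(not (33 <= ord(c) <= 126 or c in " \t") for c in body):
        raise ValueError("invalid character in field body")
    return name, body


print(parse_header_field("Subject: hello"))  # ('Subject', ' hello')
```

The body is returned with its leading white space intact; whether that space is significant depends on the field's syntax in section 3.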
2.2.1. Unstructured Header Field Bodies
Some field bodies in this specification are defined simply as
"unstructured" (which is specified in section 3.2.5 as any printable
US-ASCII characters plus white space characters) with no further
restrictions. These are referred to as unstructured field bodies.
Semantically, unstructured field bodies are simply to be treated as a
single line of characters with no further processing (except for
"folding" and "unfolding" as described in section 2.2.3).
2.2.2. Structured Header Field Bodies
Some field bodies in this specification have a syntax that is more
restrictive than the unstructured field bodies described above.
These are referred to as "structured" field bodies. Structured field
bodies are sequences of specific lexical tokens as described in
sections 3 and 4 of this specification. Many of these tokens are
allowed (according to their syntax) to be preceded or followed by
comments (as described in section 3.2.2) as well as the white space
characters, and those white space characters are subject to "folding"
and "unfolding" as described in section 2.2.3. Semantic analysis of
structured field bodies is given along with their syntax.
2.2.3. Long Header Fields
Each header field is logically a single line of characters comprising
the field name, the colon, and the field body. For convenience
however, and to deal with the 998/78 character limitations per line,
the field body portion of a header field can be split into a
multiple-line representation; this is called "folding". The general
rule is that wherever this specification allows for folding white
space (not simply WSP characters), a CRLF may be inserted before any
WSP.
For example, the header field:
Subject: This is a test
can be represented as:
Subject: This
 is a test
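Folding can be sketched as a greedy search for the last WSP that keeps the current line within the limit, with the CRLF inserted before that WSP. This Python sketch is an assumption-laden illustration: the function name fold_header is hypothetical, it treats every WSP as a legal fold point (true for unstructured field bodies, but not for every position in a structured one), and it does not implement the preference for higher-level syntactic breaks described in the note below.

```python
def fold_header(field: str, limit: int = 78) -> str:
    """Fold an unfolded header field by inserting CRLF before WSP.

    Greedy sketch: each physical line is kept within `limit` characters
    when a usable WSP exists; otherwise the remainder is left long.
    """
    lines = []
    rest = field
    while len(rest) > limit:
        # Look for WSP at positions 1..limit; the break goes before it, so
        # the emitted line rest[:cut] has at most `limit` characters.
        cut = max(rest.rfind(" ", 1, limit + 1), rest.rfind("\t", 1, limit + 1))
        # Stop if there is no WSP, or if the line would be only WSP.
        if cut <= 0 or not rest[:cut].strip(" \t"):
            break
        lines.append(rest[:cut])
        rest = rest[cut:]  # the continuation line starts with the WSP
    lines.append(rest)
    return "\r\n".join(lines)


print(repr(fold_header("Subject: This is a test", limit=13)))
# 'Subject: This\r\n is a test'
```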
Note: Though structured field bodies are defined in such a way
that folding can take place between many of the lexical tokens
(and even within some of the lexical tokens), folding SHOULD be
limited to placing the CRLF at higher-level syntactic breaks. For
instance, if a field body is defined as comma-separated values, it
is recommended that folding occur after the comma separating the
structured items in preference to other places where the field
could be folded, even if it is allowed elsewhere.
The process of moving from this folded multiple-line representation
of a header field to its single line representation is called
"unfolding". Unfolding is accomplished by simply removing any CRLF
that is immediately followed by WSP. Each header field should be
treated in its unfolded form for further syntactic and semantic
evaluation. An unfolded header field has no length restriction and
therefore may be indeterminately long.
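Since unfolding removes only a CRLF that is immediately followed by WSP, it can be expressed as one regular-expression substitution with a lookahead, which leaves the WSP itself in place. A minimal Python sketch; the name unfold is hypothetical.

```python
import re

# A CRLF immediately followed by WSP (space or horizontal tab).
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(field: str) -> str:
    """Remove every CRLF that is immediately followed by WSP."""
    return _FOLD.sub("", field)


print(unfold("Subject: This\r\n is a test"))  # Subject: This is a test
```

A CRLF that is not followed by WSP is left alone, since it ends the header field rather than folding it.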
2.3. Body
The body of a message is simply lines of US-ASCII characters. The
only two limitations on the body are as follows:
o CR and LF MUST only occur together as CRLF; they MUST NOT appear
independently in the body.
o Lines of characters in the body MUST be limited to 998 characters,
and SHOULD be limited to 78 characters, excluding the CRLF.
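Both body rules can be checked mechanically. In this Python sketch (the function name check_body and the message strings are hypothetical), every well-formed CRLF pair is removed first; any CR or LF that remains must have appeared on its own.

```python
def check_body(body: str) -> list[str]:
    """Report violations of the two body rules listed above."""
    problems = []
    # After removing every CRLF pair, any remaining CR or LF is a bare one.
    leftover = body.replace("\r\n", "")
    if "\r" in leftover or "\n" in leftover:
        problems.append("bare CR or LF")
    for number, line in enumerate(body.split("\r\n"), start=1):
        if len(line) > 998:
            problems.append(f"line {number}: {len(line)} > 998 (MUST)")
        elif len(line) > 78:
            problems.append(f"line {number}: {len(line)} > 78 (SHOULD)")
    return problems


print(check_body("Hello\r\nWorld\r\n"))  # []
print(check_body("Hello\nWorld"))        # ['bare CR or LF']
```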
Note: As was stated earlier, there are other documents,
specifically the MIME documents ([RFC2045], [RFC2046], [RFC2049],
[RFC4288], [RFC4289]), that extend (and limit) this specification
to allow for different sorts of message bodies. Again, these
mechanisms are beyond the scope of this document.