Network Working Group                                       G. Vaudreuil
Request for Comments: 3801                           Lucent Technologies
Obsoletes: 2421                                               G. Parsons
Category: Standards Track                                Nortel Networks
                                                               June 2004

          Voice Profile for Internet Mail - version 2 (VPIMv2)

Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2004).
Abstract
This document specifies a restricted profile of the Internet
multimedia messaging protocols for use between voice processing
server platforms. The profile is referred to as the Voice Profile
for Internet Mail (VPIM) in this document. These platforms have
historically been special-purpose computers and often do not have the
same facilities normally associated with a traditional Internet
Email-capable computer. As a result, VPIM also specifies additional
functionality, as it is needed. This profile is intended to specify
the minimum common set of features to allow interworking between
conforming systems.
This document obsoletes RFC 2421 and describes version 2 of the
profile with greater precision. No protocol changes were made in
this revision. A list of changes from RFC 2421 is noted in Appendix
F. Appendix A summarizes the protocol profiles of this version of
VPIM.
1. Introduction
MIME is the Internet multipurpose, multimedia-messaging standard.
This document explicitly recognizes its capabilities and provides a
mechanism for the exchange of various messaging technologies,
primarily voice and facsimile.
Voice messaging evolved from a telephone answering service into a full
send, receive, and forward messaging paradigm with unique message
features, semantics and usage patterns. Voice messaging was
introduced on special purpose computers that interface to a telephone
switch and provide call answering and voice messaging services.
Traditionally, messages sent from one voice messaging system to
another were transported using analog networking protocols based on
DTMF signaling and analog voice playback. As the demand for
networking increased, there was a need for a standard high-quality
digital protocol to connect these machines. VPIM has successfully
demonstrated its usefulness as this new standard. VPIM is widely
implemented and is seeing deployment in customer networks. This
document clarifies ambiguities found in the earlier specification and
is consistent with implementation practice. The profile is referred
to as Voice Profile for Internet Mail (VPIM) in this document.
This document specifies a restricted profile of the Internet
multimedia messaging protocols for use between voice processing
server platforms. These platforms have historically been special-
purpose computers and often do not have the same facilities normally
associated with a traditional Internet Email-capable computer. As a
result, VPIM also specifies additional functionality, as it is
needed. This profile is intended to specify the minimum common set
of features to allow interworking between conforming systems.
This document obsoletes RFC 2421 and describes version 2 of VPIM with
greater precision. No protocol changes were made in this revision.
A list of changes from RFC 2421 is noted in Appendix F. Appendix A
summarizes the protocol profiles of this version of VPIM.
1.1. Voice Messaging System Limitations
The following are typical limitations of voice messaging platforms
that were considered in creating this baseline profile.
1) Text messages are not normally received and often cannot be
easily displayed or viewed. They can often be processed only via
text-to-speech or text-to-fax features not currently present in
many of these machines.
2) Voice mail machines usually act as an integrated Message
Transfer Agent, Message Store and User Agent. There is typically
no relaying of messages. RFC822 header fields may have limited
use in the context of the limited messaging features currently
deployed.
3) Voice mail message stores are generally not capable of
preserving the full semantics of an Internet message. As such,
use of a voice mail machine for gatewaying is not supported. In
particular, storage of recipient lists, "Received:" lines, and
"Message-ID:" may be limited.
4) Internet-style distribution/exploder mailing lists are not
typically supported. Voice mail machines often implement only
local alias lists, with error-to-sender and reply-to-sender
behavior. Reply-all capabilities using a Cc list are not generally
available.
5) Error reports must be machine-parsable so that helpful
responses can be voiced to users whose only access mechanism is a
telephone.
6) The voice mail systems generally limit address entry to 16 or
fewer numeric characters, and normally do not support alphanumeric
mailbox names. Alpha characters are not generally used for
mailbox identification, as they cannot be easily entered from a
telephone keypad.
It should be noted that newer systems are based natively on SMTP/MIME
and do not suffer these limitations. In particular, some systems may
support media other than voice and fax.
1.2. Design Goals
It is a goal of this profile to make as few restrictions and
additions to the existing Internet mail protocols as possible while
satisfying the requirements for interoperability with current
generation voice messaging systems. This goal is motivated by the
desire to increase the accessibility to digital messaging by enabling
the use of proven existing networking software for rapid development.
This specification is intended for use on a TCP/IP network; however,
it is possible to use the SMTP protocol suite over other transport
protocols. The necessary protocol parameters for such use are
outside the scope of this document.
This profile is intended to be robust enough to be used in an
environment, such as the global Internet, with installed-base
gateways that do not understand MIME. Full functionality, such as
reliable error messages and binary transport, will require careful
selection of gateways (e.g., via MX records) to be used as VPIM
forwarding agents. Nothing in this document precludes use of
general-purpose MIME email packages to read and compose VPIM
messages. While no special configuration is required to receive VPIM
conforming messages, some may be required to originate conforming
messages.
It is expected that a system administrator who can perform TCP/IP
network configuration will manage a VPIM messaging system. When
using facsimile or multiple voice encodings, it is suggested that the
system administrator maintain a list of the capabilities of the
networked mail machines to reduce the sending of undeliverable
messages due to lack of feature support. Configuration,
implementation and management of these directory-listing capabilities
are local matters.
1.3. Applicability for VPIM
VPIM is intended for the exchange of voice messages between
traditional voice messaging systems and for systems that need to
interoperate with such systems. VPIM is intended to connect
voice-messaging systems into special-purpose voice messaging networks.
VPIM may also be used between message store servers and VPIM-aware
clients such as web servers, TUI, and GUI clients. VPIM is not
intended or optimized for downloading to, or sending from, commercial
email clients.
Internet Voice Messaging (IVM), the subject of a separate standards
initiative, is intended to enable general-purpose email clients to
send and receive voice content through general-purpose message stores
in an interoperable way. IVM may also be a suitable format for
downloading voice messages from a VPIM server to a commercial email
client. It may also be a suitable format for submission of a voice
message from a general-purpose client into a VPIM system.
2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [REQ].
3. Protocol Restrictions
This protocol does not limit the number of recipients per message.
Where possible, server implementations should not restrict the number
of recipients in a single message. It is recognized that no
implementation supports unlimited recipients, and that the number of
supported recipients may be quite low.
This protocol does not limit the maximum message length.
Implementers should understand that some machines will be unable to
accept excessively long messages. A mechanism is defined in [SIZE]
to declare the maximum message size supported.
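As an illustration only, the size declaration can be read from the reply to EHLO before a large voice message is offered. The sketch below assumes the reply lines are already available as strings; the function names and the sample limit are hypothetical and are not defined by this profile.

```python
from typing import List, Optional


def advertised_max_size(ehlo_lines: List[str]) -> Optional[int]:
    """Return the limit declared with the SIZE keyword, or None.

    Each element is one reply line, e.g. "250-SIZE 10485760".
    A missing or zero argument is treated as "no fixed limit declared".
    """
    for line in ehlo_lines:
        # Drop the "250-" or "250 " reply prefix, then split keyword/argument.
        keyword, _, arg = line[4:].strip().partition(" ")
        if keyword.upper() == "SIZE":
            arg = arg.strip()
            if arg.isdigit() and int(arg) > 0:
                return int(arg)
            return None
    return None


def fits(message: bytes, limit: Optional[int]) -> bool:
    """True when the message may be offered to the receiving system."""
    return limit is None or len(message) <= limit
```

A sender that finds fits() false can report the failure to the originator at once instead of attempting a transfer that the receiver will refuse.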
The following sections describe the restrictions and additions to
Internet mail protocols that are required to be conforming with this
VPIM v2 profile. Though various SMTP, ESMTP and MIME features are
described here, the implementer is referred to the relevant RFCs for
complete details. The table in Appendix A summarizes the protocol
details of this profile.
4. Voice Message Interchange Format
The voice message interchange format is a profile of the Internet
Mail Protocol Suite. Any Internet Mail message containing the format
defined in this section is referred to as a VPIM Message in this
document. As a result, this document assumes an understanding of the
Internet Mail specifications. Specifically, VPIM references
components from the message format standard for Internet messages
[RFC822], the Multipurpose Internet Message Extensions [MIME1-5], the
X.400 gateway specification [X.400], and the delivery status and
message disposition notifications [REPORT][DSN][DRPT][STATUS][MDN].
MIME, introduced in [MIME1], is a general-purpose message body format
that is extensible to carry a wide range of body parts. It provides
for encoding binary data so that it can be transported over the 7-bit
text-oriented SMTP protocol. This transport encoding (denoted by the
"Content-Transfer-Encoding:" MIME field) is in addition to the audio
encoding required to generate a binary object.
MIME defines two transport-encoding mechanisms to transform binary
data into a 7-bit representation, one designed for text-like data
("Quoted-Printable"), and one for arbitrary binary data ("Base64").
While Base64 is dramatically more efficient for audio data, either
will work. Where binary transport is available, no transport
encoding is needed, and the data can be labeled as "Binary".
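The difference in efficiency between the two transport encodings can be seen with a small experiment. This is a sketch using the Python standard library; the byte pattern is a stand-in for encoded audio, not real voice data.

```python
import base64
import quopri

# Stand-in for a binary audio object: every byte value, repeated.
audio = bytes(range(256)) * 16

b64 = base64.encodebytes(audio)   # Base64 with MIME line breaks
qp = quopri.encodestring(audio)   # Quoted-Printable

# Sizes of the raw data and of each 7-bit representation.
print(len(audio), len(b64), len(qp))

# Base64 restores the original bytes exactly.
restored = base64.decodebytes(b64)
```

Base64 expands the data by roughly one third, while Quoted-Printable spends three characters on every octet that is not printable US-ASCII, which is most of a binary audio object.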
4.1. VPIM Message Addressing Formats
VPIM addresses SHALL use the RFC 822 format based on the Domain Name
System. This naming system has two components: the local part, used
for username or mailbox identification; and the host part, used for
global machine identification.
4.1.1. VPIM Addresses
The local part of the address shall be a US-ASCII string uniquely
identifying a mailbox on a destination system. For voice messaging,
the local part SHALL be a printable string containing the mailbox ID
of the originator or recipient. While alpha characters and long
mailbox identifiers MAY be permitted, short numeric local parts
SHOULD be used as most voice mail networks rely on numeric mailbox
identifiers to retain compatibility with the limited 10-digit
telephone keypad. As a result, some voice messaging systems may only
be able to handle a numeric local part. The reception of
alphanumeric local parts on these systems may result in the address
being mapped to some locally unique (but confusing to the recipient)
number or, in the worst case, the address could be deleted, making the
message unreplyable. Additionally, it may be difficult to create
messages on these systems with an alphanumeric local part without
complex key sequences or some form of directory lookup (see 6). The
use of the Domain Name System should be transparent to the user. It
is the responsibility of the voice mail machine to look up the
fully-qualified domain name (FQDN) based on the address entered by the
user.
In the absence of a global directory, specification of the local part
is expected to conform to international or private telephone
numbering plans. It is likely that private numbering plans will
prevail and these are left for local definition. However, it is
RECOMMENDED that public telephone numbers be noted according to the
international numbering plan described in [E.164]. The indication
that the local part is a public telephone number is given by a
preceding "+" (the "+" would not be entered from a telephone keypad,
it is added by the system as a flag). Since the primary information
in the numeric scheme is contained by the digits, other character
separators (e.g., "-") may be ignored (i.e., to allow parsing of the
numeric local mailbox) or may be used to recognize distinct portions
of the telephone number (e.g., country code). The specification of
the local part of a VPIM address can be split into the four groups
described below:
1) mailbox number
- for use as a private numbering plan (any number of digits)
- e.g., firstname.lastname@example.org
2) mailbox number+extension
- for use as a private numbering plan with extensions
any number of digits, use of "+" as separator
- e.g., 2722+111@Lucent.com
3) +international number
- for international telephone numbers conforming to E.164
maximum of 15 digits
- e.g., +email@example.com
4) +international number+extension
- for international telephone numbers conforming to E.164
maximum of 15 digits, with an extension (e.g., behind a
PBX) that has a maximum of 15 digits.
- e.g., +firstname.lastname@example.org
Note that this address format is designed to be compatible with
current usage within the voice messaging industry. It is not
compatible with the addressing formats of RFCs 2303-2304. It is
expected that as telephony services become more widespread on the
Internet, these addressing formats will converge.
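The four groups of local parts can be recognized mechanically. The following is a minimal sketch, assuming that "-" is the only separator to be ignored; the function name, the group labels, and the sample numbers used with it are illustrative and not part of the profile.

```python
import re
from typing import Optional, Tuple

# Optional leading "+", digits, then an optional "+extension".
_LOCAL_PART = re.compile(r"^(\+?)([0-9]+)(?:\+([0-9]+))?$")


def classify_local_part(local: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (group, number, extension), or None if not a numeric form."""
    # Separators carry no information here, so drop them before matching.
    m = _LOCAL_PART.match(local.replace("-", ""))
    if m is None:
        return None
    plus, number, ext = m.groups()
    if plus:
        # E.164 numbers and their extensions are limited to 15 digits each.
        if len(number) > 15 or (ext is not None and len(ext) > 15):
            return None
        group = "international+extension" if ext else "international"
    else:
        group = "mailbox+extension" if ext else "mailbox"
    return group, number, ext
```

For example, classify_local_part("2722+111") yields the group "mailbox+extension" with number "2722" and extension "111", while an alphanumeric local part yields None and would need the mapping or directory lookup discussed above.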
4.1.2. Special Addresses
Special addresses to represent the sender are provided for
compatibility with the conventions of Internet mail. These addresses
do not use numeric local addresses, both to conform to current
Internet practice and to avoid conflict with existing numeric
addressing plans. Two special addresses are RESERVED for use as
follows:
By convention, a special mailbox named "postmaster" MUST exist on all
systems. This address is used for diagnostics and should be checked
regularly by the system manager. This mailbox is particularly likely
to receive text messages, which is not normal on a voice-processing
platform. The specific handling of these messages is an individual
implementation choice.
If a reply to a message is not possible, such as a telephone-
answering message, then the special address "non-mail-user" SHOULD be
used as the originator's address. Any text name such as "Telephone
Answering", or the telephone number if it is available, is permitted.
This special address is used as a token to indicate an unreachable
originator. A conforming implementation MUST NOT permit a reply to an
address from "non-mail-user". For compatibility with the installed
base of mail user agents, implementations MUST reject the message
when a message addressed to "non-mail-user" is received. The status
code for such NDNs is 5.1.1 "Mailbox does not exist".
Example:
From: Telephone Answering <non-mail-user@example.com>
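The two rules for "non-mail-user" can be sketched as a pair of checks. This is an illustration under the assumption that the local part is compared without regard to case; the function names are hypothetical.

```python
from typing import Optional

NON_MAIL_USER = "non-mail-user"


def _local_part(address: str) -> str:
    # Everything before the last "@" is the local part.
    return address.rsplit("@", 1)[0].lower()


def reply_allowed(from_address: str) -> bool:
    """A reply MUST NOT be permitted when the originator is non-mail-user."""
    return _local_part(from_address) != NON_MAIL_USER


def recipient_status(rcpt_address: str) -> Optional[str]:
    """Return the status code to reject with, or None to accept."""
    if _local_part(rcpt_address) == NON_MAIL_USER:
        return "5.1.1"  # "Mailbox does not exist"
    return None
```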
4.1.3. Distribution Lists
There are many ways to handle distribution list (DL) expansions and
none are 'standard'. A VPIM implementation MAY support DLs. Using a
simple alias is a behavior closest to what many voice mail systems do
today and what is to be used with VPIM messages. A couple of
important features that need special care when DLs are used are:
Reply to the originator - (Address in the RFC822 "Reply-To:" or
                           "From:" field)
Errors to the submitter - (Address in the MAIL FROM field of the
                           ESMTP exchange or the "Return-Path:"
                           RFC822 field)
Some proprietary voice messaging protocols include only the recipient
of the particular copy in the envelope and include no "header fields"
except date and per-message features. Most voice messaging systems
do not provide for "Header Information" in their messaging queues and
only include delivery information. As a result, recipient
information MAY be in either the "To:" or "Cc:" header fields. If all
recipients cannot be presented then the recipient header fields
SHOULD be omitted to indicate that an accurate list of recipients
(e.g., for use with a reply-all capability) is not known.
4.2. Message Header Fields
Internet messages contain a header information block. This header
block contains information required to identify the sender, the list
of recipients, the message send time, and other information intended
for user presentation. Except for specialized gateway and mailing
list cases, header fields do not indicate delivery options for the
transport of messages.
Distribution list processors are noted for modifying or adding to the
header fields of messages that pass through them. VPIM systems MUST
be able to accept and ignore header fields that are not defined here.
The following header lines are permitted for use with VPIM messages:
4.2.1. From
The originator's fully qualified domain address (a mailbox address
followed by the fully qualified domain name) MUST be present.
Systems conforming with this profile SHOULD provide the text personal
name of the voice message originator in a quoted phrase, if the name
is available. Text names of corporate or positional mailboxes MAY be
provided as a simple string. From: [RFC822]
From: "Joe S. User" <firstname.lastname@example.org>
From: Technical Support <email@example.com>
Because voice mail machines may not be able to support separate
attributes for the "From:" header field and the SMTP MAIL FROM,
VPIM-conforming systems SHOULD set these values to the same address.
Use of
addresses different than those present in the "From:" header field
address may result in unanticipated behavior.
The user listed in the "From:" field MUST be presented in the voice
message envelope of the voice messaging system as the originator of
the message, though the exact presentation is an implementation
decision (e.g., the mailbox ID or the text name MAY be presented).
The "From:" address SHOULD be used for replies (see 4.9).
4.2.2. To
The "To:" field contains the recipient's fully-qualified domain
address.
There MAY be one or more "To:" fields in any message. Systems SHOULD
provide a list of recipients only if all recipients are available.
Systems, such as gateways from protocols or legacy platforms that do
not indicate the complete list of recipients, MAY provide a "To:"
line. Because these systems cannot accurately enumerate all
recipients in the "To:" headers, recipients SHOULD NOT be enumerated.
Systems conforming to this profile MAY discard the addresses in the
"To:" fields if they are unable to store the information. This
would, of course, make a reply-to-all capability impossible. If
present, the addresses in the "To:" field MAY be used for a reply
message to all recipients.
4.2.3. Cc
The "Cc:" field contains additional recipients' fully qualified
domain addresses. Many voice mail systems maintain only sufficient
envelope information for message delivery and are not capable of
storing or providing a complete list of additional recipients.
Conforming implementations MAY send "Cc:" lists if all recipients are
known at the time of origination. If not, systems SHOULD omit the
"Cc:" fields to indicate that the full list of recipients is unknown
or otherwise unavailable. The list of disclosed recipients MUST NOT
include undisclosed recipients (i.e., those sent via a blind copy).
Systems conforming to this profile MAY add all the addresses in the
"Cc:" field to the "To:" field; others MAY discard the addresses in
the "Cc:" fields. If a list of "Cc:" addresses is present, these
addresses MAY be used for a reply message to all recipients.
4.2.4. Date
The "Date:" field contains the date and time the message was sent by
the originator.
The sending system MUST report the time the message was sent. The
time zone MUST be present and SHOULD be represented in a four-digit
time zone offset, such as -0500 for North American Eastern Standard
Time. This MAY be supplemented by a time zone name in parentheses,
e.g., "-0700 (PDT)".
Date: Wed, 28 Jul 96 10:08:49 -0800 (PST)
If the VPIM sender is relaying a message from a system that does not
provide a time stamp, the time of arrival at the gateway system
SHOULD be used as the date.
Conforming implementations SHOULD be able to convert [RFC822] date
and time stamps into local time.
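A date stamp with a four-digit offset, and its conversion to the recipient's local time, might look as follows. This sketch uses the Python standard library; the send time and both time zones are made-up values for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical send time in a zone five hours behind UTC.
sender_zone = timezone(timedelta(hours=-5))
sent = datetime(2004, 6, 1, 10, 8, 49, tzinfo=sender_zone)

# Sending side: the header value carries the numeric offset.
date_header = format_datetime(sent)

# Receiving side: parse the stamp, then convert to the recipient's zone
# (assumed here to be one hour ahead of UTC).
parsed = parsedate_to_datetime(date_header)
recipient_zone = timezone(timedelta(hours=1))
local_time = parsed.astimezone(recipient_zone)
```

Because the offset travels with the stamp, the receiving system can announce the time of the message in the recipient's own zone without knowing where the sender is located.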
4.2.5. Sender
The "Sender:" field contains the actual address of the originator if
an agent on behalf of the author indicated in the "From:" field sends
the message.
This header field MAY be sent by VPIM-conforming systems.
If the address in the "Sender:" field cannot be preserved in the
recipient's message queues or in the next-hop protocol from a
gateway, the field MAY be silently discarded.
4.2.6. Return Path
The "Return-path:" field is added by the final delivering SMTP
server. If present, it contains the address from the MAIL FROM
parameter of the ESMTP exchange (see [RFC822]). Any error messages
resulting from the delivery failure MUST be sent to this address.
Note that if the "Return-path:" is null ("<>") (e.g., a call answer
message would have no return path) delivery status notifications MUST
NOT be sent.
The originating system MUST NOT add this header.
If the receiving system is incapable of storing the return path (or
MAIL FROM) to be used for subsequent delivery errors (i.e., it is a
gateway to a legacy system or protocol), the receiving system must
otherwise ensure that further delivery errors don't happen. Systems
that do not support the return path MUST ensure that at the time the
message is acknowledged (i.e., when a DSN would be sent), the message
is delivered to the recipient's ultimate mailbox. Non-Delivery
notifications SHOULD NOT be sent after that final delivery.
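The rule for a null return path reduces to a simple guard before any notification is generated. A minimal sketch, with hypothetical function names:

```python
from typing import Optional


def may_send_dsn(return_path: str) -> bool:
    """Notifications go to the return path and never to a null one."""
    return return_path.strip() not in ("", "<>")


def dsn_recipient(return_path: str) -> Optional[str]:
    """Return the bare address to notify, or None when none may be sent."""
    if not may_send_dsn(return_path):
        return None
    return return_path.strip().strip("<>")
```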
4.2.7. Message-id
The "Message-Id:" field contains a globally unique per-message
identifier.
A globally unique message-id MUST be generated for each message sent
from a VPIM-conforming implementation.
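One way to obtain such an identifier is the helper in the Python standard library, shown here as a sketch. The domain name is a placeholder; an implementation would use the fully qualified domain name of the sending system.

```python
from email.utils import make_msgid

# "vm.example.com" is a placeholder for the sending system's own FQDN.
first = make_msgid(domain="vm.example.com")
second = make_msgid(domain="vm.example.com")
```

Each call combines the current time, the process identifier, and random bits, so two messages sent in quick succession still receive distinct values.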
When provided in the original message, it MUST be used when sending an
MDN. This identifier MAY be used for tracking and auditing. From
4.2.8. Reply-To
If present, the "Reply-To:" header provides a preferred address to
which reply messages should be sent (see 4.9). Typically, voice mail
systems can only support one originator of a message so it is likely
that this field will be ignored by the receiving system. From:
A conforming system SHOULD NOT send a "Reply-To:" header.
If a "Reply-To:" field is present, a reply-to-sender message MAY be
sent to the address specified (that is, in lieu of the address in the
"From:" field). If the receiving system (e.g., multi-protocol
gateway) only supports one address for the originator, then the
address in the "From:" field MUST be used and the "Reply-To:" field
MAY be silently discarded.
4.2.9. Received
The "Received:" field contains trace information added to the
beginning of an RFC822 message by MTAs. This is the only field that
may be added by an MTA. Information in this header is useful for
debugging when using a US-ASCII message reader or a header-parsing
tool. From: [RFC822]
A VPIM-conforming system MUST add a "Received:" field. When acting
as a gateway, information about the system from which the message was
received SHOULD be included.
A VPIM-conforming system MUST NOT remove any "Received:" fields when
relaying messages to other MTAs or gateways. These header fields MAY
be ignored or deleted when the message is received at the final
destination.
4.2.10. MIME Version
The "MIME-Version:" field MUST be present to indicate that the
message conforms to [MIME1-5]. Systems conforming with this
specification SHOULD include a comment with the words "(Voice 2.0)".
[VPIM1] defines an earlier version of this profile and uses the token
(Voice 1.0). Example:
MIME-Version: 1.0 (Voice 2.0)
This identifier is intended for information only and SHOULD NOT be
used to semantically identify the message as being a VPIM message.
Instead, the presence of the multipart/voice-message content type
defined in section 18.2 SHOULD be used if identification is
necessary.
4.2.11. Content-Type
The "Content-Type:" header MUST be present to declare the type of
content enclosed in the message. The typical top-level content in a
VPIM Message SHOULD be Multipart/Voice-Message. The allowable
contents are detailed starting in section 4.4 of this document.
4.2.12. Content-Transfer-Encoding
Because Internet mail was initially specified to carry only 7-bit
US-ASCII text, it may be necessary to encode voice and fax data into
a representation suitable for that environment. The
"Content-Transfer-Encoding:" header describes this transformation if it
is needed.
An implementation in conformance with this profile SHOULD send audio
and/or facsimile data in "Binary" form when binary message transport
is available (see section 5). When binary transport is not
available, implementations MUST encode the audio and/or facsimile
data as "Base64".
Conforming implementations MUST recognize and decode the standard
encodings, "Binary" (when binary support is available), "7bit",
"8bit", "Base64" and "Quoted-Printable" per [MIME1]. The detection
and decoding of "Quoted-Printable", "7bit", and "8bit" MUST be
supported in order to meet MIME requirements and to preserve
interoperability with the fullest range of possible devices.
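The send and receive rules for transfer encodings can be sketched as two small functions. This is an illustration using the Python standard library; the function names are hypothetical, and a real implementation would take the encoding name from the body part's "Content-Transfer-Encoding:" header.

```python
import base64
import quopri


def choose_encoding(binary_transport: bool) -> str:
    """Pick the encoding for outgoing audio or facsimile data."""
    return "binary" if binary_transport else "base64"


def decode_body(payload: bytes, encoding: str) -> bytes:
    """Undo the transfer encoding named in the header."""
    name = encoding.strip().lower()
    if name in ("7bit", "8bit", "binary"):
        return payload  # identity encodings: nothing to undo
    if name == "base64":
        return base64.b64decode(payload)
    if name == "quoted-printable":
        return quopri.decodestring(payload)
    raise ValueError("unsupported Content-Transfer-Encoding: " + name)
```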
4.2.13. Sensitivity
The "Sensitivity:" field, if present, indicates the requested privacy
level. If no privacy is requested, this field is omitted. The
header definition is as follows:
Sensitivity := "Sensitivity" ":" Sensitivity-value
Sensitivity-value := "Personal" / "Private" / "Company-Confidential"
A VPIM-conforming implementation MAY include this header to indicate
the sensitivity of a message. If a user marks a message "Private", a
conforming implementation MUST send only the "Private" sensitivity
level. There are no VPIM-specific semantics defined for the values
"Personal" or "Company-Confidential". A conforming implementation
SHOULD NOT send the values "Personal" or "Company-Confidential". If
the message is of "Normal" sensitivity, this field SHOULD be omitted.
If a "Sensitivity:" field with a value of "Private" is present in the
message, a conforming system MUST prohibit the recipient from
forwarding this message to any other user. A conforming system,
however, SHOULD allow the responder to reply to a sensitive message,
but SHOULD NOT include the original message content. The responder
MAY set the sensitivity of the reply message.
A receiving system MAY ignore sensitivity values of "Personal" and
"Company-Confidential".
If the receiving system does not support privacy and the sensitivity
is "Private", a negative delivery status notification MUST be sent to
the originator with the appropriate status code (5.6.0) "Other or
undefined protocol status" indicating that privacy could not be
assured. The message contents SHOULD be returned to the sender to
allow for a voice context with the notification. A non-delivery
notification to a private message SHOULD NOT be tagged private since
it will be sent to the originator. From: [X.400]
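The receive rules for sensitivity can be summarized as a small decision function. The sketch below assumes the header fields are available as a dictionary and that values are compared without regard to case; the names and the returned action strings are illustrative.

```python
def effective_sensitivity(headers: dict) -> str:
    """Map the header to the only level with VPIM-specific semantics."""
    value = headers.get("Sensitivity", "Normal").strip().lower()
    # "Personal" and "Company-Confidential" may be ignored by a receiver.
    return "private" if value == "private" else "normal"


def delivery_action(headers: dict, supports_privacy: bool) -> str:
    """Decide whether to deliver or to return a negative notification."""
    if effective_sensitivity(headers) == "private" and not supports_privacy:
        return "reject 5.6.0"  # privacy could not be assured
    return "deliver"


def may_forward(headers: dict) -> bool:
    """A private message must not be forwarded to any other user."""
    return effective_sensitivity(headers) != "private"
```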
A message with no privacy explicitly noted (i.e., no header) or with
"Normal" sensitivity has no special treatment.
4.2.14. Importance
The "Importance:" field indicates the requested importance to be given
by the receiving
system. If no special importance is requested, this header MAY be
omitted and the value of the absent header assumed to be "normal".
Importance := "Importance" ":" importance-value
Importance-value := "low" / "normal" / "high"
Conforming implementations MAY include this header to indicate the
importance of a message.
If the receiving system does not support "Importance:", the attribute
MAY be silently dropped.
4.2.15. Subject
The "Subject:" field is often provided by email systems but is not
widely supported on voice mail platforms. From: [RFC822]
For compatibility with text-based mailbox interfaces, a text subject
field SHOULD be generated by a conforming implementation. It is
RECOMMENDED that voice-messaging systems that do not support any text
user interfaces (e.g., access only by a telephone) insert a generic
subject header of "VPIM Message" or "Voice Message" for the benefit
of GUI-enabled recipients.
It is anticipated that many voice-only systems will be incapable of
storing the subject line. The subject MAY be discarded by a
receiving system.
4.3. MIME Audio Content Descriptions
4.3.1. Content-Description
This field MAY be present to facilitate the text identification of
these body parts in simple email readers. Any values may be used.
Content-Description: Big Telco Voice Message
This field MAY be added to a voice body part to offer a freeform
description of the voice content. It is useful to incorporate the
values for Content-Disposition with additional descriptions. For
example, this can be used to indicate product name or transcoding
This field MAY be displayed to the recipient. However, since it is
only informative it MAY be ignored.
4.3.2. Content-Disposition
This field MUST be present to allow the parsable identification of
body parts within a VPIM voice message. This is especially useful
if, as is typical, more than one Audio/* body occurs within a single
level (e.g., Multipart/Voice-Message). Since a VPIM voice message is
intended to be automatically played in the order in which the audio
contents occur, the audio contents MUST always be of disposition
inline. However, it is still useful to include a filename value, so
this SHOULD be present if this information is available. From:
In order to distinguish between the various types of audio contents
in a VPIM voice message a new disposition parameter "voice" is
defined with IANA (see section 18.1) with the parameter values below
to be used as appropriate:
Audio-Type := "voice" "=" Audio-type-value
Audio-type-value := "Voice-Message" / "Voice-Message-Notification" /
"Originator-Spoken-Name" / "Recipient-Spoken-Name" / "Spoken-Subject"
Voice-Message - the primary voice message,
Voice-Message-Notification - a spoken delivery notification
or spoken disposition notification,
Originator-Spoken-Name - the spoken name of the originator,
Recipient-Spoken-Name - the spoken name of the recipient(s) if
available to the originator
Spoken-Subject - the spoken subject of the message, typically
spoken by the originator
Note that there SHOULD only be one instance of each of these types of
audio contents per message level. Additional instances of a given
type (i.e., parameter value) MAY occur within an attached forwarded
or reply voice message. If there are multiple recipients for a given
message, recipient-spoken-name MUST NOT be used.
Implementations SHOULD use this header. However, those that do not
understand the "voice" parameter (or the "Content-Disposition:"
header) can safely ignore it, and will present the audio body parts
in order (but will not be able to distinguish between them). If more
than one instance of the "voice" parameter type value is encountered
at one level (e.g., multiple 'Voice-Message' tagged contents) then
they SHOULD be presented together.
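The following sketch shows one way to attach and read back the "voice" disposition parameter with the Python standard library. The audio subtype, the placeholder bytes, and the filename are assumptions made for the example and are not taken from this profile.

```python
from email.message import EmailMessage

part = EmailMessage()
# Placeholder bytes and subtype; a real system supplies its encoded audio.
part.set_content(b"\x00" * 8, maintype="audio", subtype="basic")

# Audio contents are always inline; "voice" names the role of this part.
part.add_header("Content-Disposition", "inline",
                voice="Voice-Message", filename="msg1.wav")


def voice_type(body_part: EmailMessage) -> str:
    """Return the "voice" parameter, or "" when it is absent."""
    value = body_part.get_param("voice", failobj="",
                                header="Content-Disposition")
    return value if isinstance(value, str) else ""
```

A receiver that does not understand the parameter sees an ordinary inline audio part, which matches the fallback behavior described above.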
4.3.3. Content-Duration
The "Content-Duration:" header provides an indication of the audio
length in seconds of the segment.
This field MAY be present to allow the specification of the length of
the audio body part in seconds.
The use of this field on reception is a local implementation issue.
4.3.4. Content-Language
This field MAY be present to allow the specification of the spoken
language of the audio body part. The encoding is defined in [LANG].
Example for UK English:
A sending system MAY add this field to indicate the language of the
voice. The determination of this (e.g., automated or user-selected)
is a local implementation issue.
The use of this field on reception is a local implementation issue.
It MAY be used as a hint to the recipient (e.g., end-user or an
automated translation process) as to the language of the voice
message.
4.4. Voice Message Content Types
The content types described in this section are identified for use
within the Multipart/Voice-Message content. This content is referred
to as a "VPIM voice message" in this document and is the fundamental part
of a "VPIM message".
Only the contents profiled can be sent within a VPIM voice message
construct (i.e., the Multipart/Voice-Message content type) to form a
simple or a more complex structure (several examples are given in
Appendix B). The presence of other contents within a VPIM voice
message is not permitted. In the absence of a bilateral agreement,
conforming implementations MUST NOT create a message containing
prohibited contents. In the spirit of liberal acceptance, a
conforming implementation MAY accept and render prohibited content.
Systems unable to accept or render prohibited contents MAY discard
the prohibited contents as necessary to deliver the acceptable
content. When multiple contents are present within the
Multipart/Voice-Message, they SHOULD be presented to the user in the
order that they appear in the message.
Some deployed implementations based on a common interpretation of the
original VPIM v2 specification reject messages with prohibited
content rather than discard the unsupported contents. For
interoperability with these systems, it is especially important that
prohibited contents not be sent within a Multipart/Voice-Message.
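A receiver that discards prohibited contents in order to deliver the acceptable ones could be sketched as follows. The allow-list (Audio/*, Image/*, and Message/RFC822) is taken from the content types profiled in this section; the function names and the sample message are assumptions, and the additional contents of section 4.5 are deliberately ignored here.

```python
from email import message_from_string

ALLOWED_MAINTYPES = {"audio", "image"}   # Audio/* and Image/*
ALLOWED_TYPES = {"message/rfc822"}       # forwarded or replied-to messages

def is_profiled(part):
    """True if the part's content type is one profiled for a voice message."""
    return (part.get_content_maintype() in ALLOWED_MAINTYPES
            or part.get_content_type() in ALLOWED_TYPES)

def split_contents(voice_message):
    """Return (kept, dropped) lists of direct children, in message order."""
    kept, dropped = [], []
    for part in voice_message.get_payload():
        (kept if is_profiled(part) else dropped).append(part)
    return kept, dropped

# Hypothetical message containing one prohibited text part.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/voice-message; version="2.0"; boundary="b1"

--b1
Content-Type: audio/32kadpcm
Content-Transfer-Encoding: base64

AAAA
--b1
Content-Type: text/plain

hello
--b1
Content-Type: image/tiff; application=faxbw
Content-Transfer-Encoding: base64

AAAA
--b1--
"""
kept, dropped = split_contents(message_from_string(SAMPLE))
```

A sending system could run the same check before submission and refuse to create the message if the dropped list is non-empty.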
This MIME multipart structure provides a mechanism for packaging a
voice message into one container that is tagged as VPIM v2
conforming. The sub-type is identical in semantics and syntax to
multipart/mixed, as defined in [MIME2]. As such, it may be safely
interpreted as a multipart/mixed by systems that do not understand
the sub-type (only the identification as a voice message would be
lost).
In addition to the MIME required boundary parameter, a version
parameter is also required for this sub-type. This is to distinguish
this refinement of the sub-type from the previous definition in
[VPIM1]. The value of the version parameter is "2.0" if the content
conforms to the requirements of this specification. Should there be
further revisions of this content type, there MUST be backwards
compatibility (i.e., systems implementing version n can read version
2, and systems implementing version 2 can read version 2 contents
within a version n).
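The version parameter can be set and checked with the standard Python email package, as in the minimal sketch below. The function names are hypothetical; the boundary parameter is generated by the library when the message is serialized.

```python
from email.mime.multipart import MIMEMultipart

def new_voice_message():
    """Create an empty Multipart/Voice-Message container tagged version 2.0."""
    return MIMEMultipart("voice-message", version="2.0")

def is_vpim_v2(msg):
    """True if msg is a Multipart/Voice-Message carrying version "2.0"."""
    return (msg.get_content_type() == "multipart/voice-message"
            and msg.get_param("version") == "2.0")

container = new_voice_message()
plain_mixed = MIMEMultipart("mixed")
```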
The Multipart/Voice-Message content-type MUST only contain the
profiled media and content types specified in this section (i.e.,
Audio/*, Image/*, and Message/RFC822). The most common will be:
spoken name, spoken subject, the message itself, and an attached fax.
Forwarded messages are created by simply using the Message/RFC822
construct.
Conformant implementations MUST use Multipart/Voice-Message in a VPIM
message. In most cases, this Multipart/Voice-Message Content-Type
will be the top level but may be included within a Message/RFC822 if
the message is forwarded or within a multipart/mixed when more than
one message is being forwarded.
Conformant implementations MUST recognize the Multipart/Voice-Message
content (whether it is a top-level content or contained in a
Multipart/Mixed) and MUST be able to separate the contents (e.g.,
spoken name or spoken subject).
The semantics of Multipart/Voice-Message (defined in section 18.2) are
identical to those of Multipart/Mixed, and the content may be
interpreted as Multipart/Mixed by systems that do not recognize this
content-type.
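Recognizing the content whether it is top-level or nested can be done by walking the MIME tree. The following sketch is one possible approach, with a hypothetical function name; note that walk() also descends into Message/RFC822 parts, so voice messages inside forwarded messages are reported as well.

```python
from email.mime.audio import MIMEAudio
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def find_voice_messages(msg):
    """Return every Multipart/Voice-Message part in msg, in document order."""
    return [part for part in msg.walk()
            if part.get_content_type() == "multipart/voice-message"]

# Hypothetical Multipart/Mixed wrapper around a voice message.
voice = MIMEMultipart("voice-message", version="2.0")
voice.attach(MIMEAudio(b"\x00\x01", "32kadpcm"))
mixed = MIMEMultipart("mixed")
mixed.attach(MIMEText("cover note"))
mixed.attach(voice)
```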
MIME requires support of the Message/RFC822 message encapsulation
body part. This body part SHOULD be used within a Multipart/Voice-
Message to forward complete messages (see 4.8) or to reply with
original content (see 4.9). From: [MIME2]
The receiving system MUST accept this format and SHOULD treat this
attachment as a forwarded message. The receiving system MAY flatten
the forwarding structure (i.e., remove this construct to leave
multiple voice contents or even concatenate the voice contents to fit
in a recipient's mailbox), if necessary.
An implementation conforming to this profile MUST send Audio/32KADPCM
by default for voice [ADPCM]. This encoding is a moderately-
compressed encoding with a data rate of 32 kbits/second using
moderate processing resources. Typically, this body contains several
minutes of message content; however, if used for spoken name or
subject the content is expected to be considerably shorter (i.e.,
about 5 and 10 seconds respectively).
Receivers MUST be able to accept and decode Audio/32KADPCM. If an
implementation can only handle one voice body, then multiple voice
bodies (if present) SHOULD be concatenated, and MUST NOT be
discarded. If concatenated, the contents SHOULD be in the same order
they appeared in the multipart.
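A system limited to one voice body first needs the audio bodies in their original order. The sketch below only collects the decoded payloads; whether the raw bytes can then be joined directly depends on the codec and is left to the audio layer. The function name and the sample message are assumptions.

```python
from email import message_from_string

def audio_payloads(voice_message):
    """Return decoded audio bodies in the order they appear in the multipart."""
    return [
        part.get_payload(decode=True)
        for part in voice_message.get_payload()
        if part.get_content_maintype() == "audio"
    ]

# Hypothetical message with two short audio bodies (placeholder bytes).
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/voice-message; version="2.0"; boundary="b1"

--b1
Content-Type: audio/32kadpcm
Content-Transfer-Encoding: base64

AQID
--b1
Content-Type: audio/32kadpcm
Content-Transfer-Encoding: base64

BAUG
--b1--
"""
bodies = audio_payloads(message_from_string(SAMPLE))
```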
A common image encoding for facsimile, known as TIFF-F, is a
derivative of the Tag Image File Format (TIFF) and is described in
several documents. For the purposes of VPIM, the F Profile of TIFF
for Facsimile (TIFF-F) is defined in [TIFF-F], and the Image/TIFF
MIME content-type is defined in [TIFFREG]. While there are several
formats of TIFF, only TIFF-F is profiled for use within
Multipart/Voice-Message. Further, since the TIFF-F file format is
used in a store-and-forward mode with VPIM, the image MUST be encoded
so that there is only one image strip per facsimile page.
All VPIM implementations that support facsimile MUST generate TIFF-F
compatible facsimile contents in the Image/TIFF subtype using the
application=faxbw encoding by default. If the VPIM message is a
voice-annotated fax, the implementation SHOULD send this fax content
in Multipart/Voice-Message. If the message is a simple fax, an
implementation MAY send it without using the Multipart/Voice-Message
to be more compatible with fax-only (RFC 2305) implementations.
While any valid MIME body header MAY be used (e.g., Content-
Disposition to indicate the filename), none are specified to have
special semantics for VPIM and MAY be ignored. Note that the
content-type parameter application=faxbw MUST be included in outbound
messages.
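An outbound fax body part carrying the application=faxbw parameter might be built as shown below. This is only a sketch: the function name is hypothetical, and the caller is assumed to supply image data that is already TIFF-F encoded with one strip per page, which this code does not verify.

```python
from email import encoders
from email.mime.base import MIMEBase

def fax_part(tiff_bytes):
    """Wrap TIFF-F data in an Image/TIFF part with application=faxbw."""
    part = MIMEBase("image", "tiff", application="faxbw")
    part.set_payload(tiff_bytes)
    encoders.encode_base64(part)  # also sets Content-Transfer-Encoding
    return part

# Placeholder bytes only; not a valid TIFF-F image.
part = fax_part(b"II*\x00")
```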
Not all VPIM systems support fax, but all SHOULD accept it within the
multipart/voice-message. Within a Multipart/Voice-Message, a
receiving system that cannot render fax content SHOULD accept the
voice content of a VPIM message and discard the fax content. Outside
a Multipart/Voice-Message, a recipient system MAY reject (with
appropriate NDN) the entire message if it cannot store or is not
capable of rendering a message with fax attachments. VPIM conforming
systems MAY support fax outside of (or without) the
Multipart/Voice-Message.
Some deployed implementations based on a common interpretation of the
original VPIM V2 specification reject messages with fax content
within the Multipart/Voice-Message rather than discard the
unsupported contents. These systems will return the message to the
sender with an NDN indicating lack of support for fax.
4.5. Other MIME Contents
The following MIME contents (with the exception of multipart/mixed in
section 4.5.1) MAY be included within a multipart/voice message.
Other contents MUST NOT be included. Their handling is a local
implementation issue. Multipart/mixed is included to promote
interoperability with a wider range of systems and also to allow the
creation of more complex multimedia messages (with a VPIM message as
one part).
This common MIME content-type allows the enclosing of several body
parts in a single message.
A VPIM voice message (i.e., multipart/voice-message) MAY be included
within a message with a Multipart/Mixed top-level content type.
Typically, this would only be used when mixing non-voice and non-fax
contents with a voice message.
Such a message is not itself a VPIM message and the handling of such
a construct is outside the scope of the VPIM profile. However, in
the spirit of liberal acceptance, a conforming implementation MUST
accept and render a VPIM voice message contained in a
Multipart/Mixed.
This content was profiled in the original specification of VPIM v2 as
a means of transporting contact information from the sender to the
recipient. This usage did not find widespread adoption and is no
longer a feature of VPIM V2. Conforming implementations SHOULD NOT
send the Text/Directory content type.
For compatibility with an earlier specification of VPIM v2, the
Text/Directory content type MUST be accepted by a conforming
implementation, but need not be stored, processed, or rendered to the
user.
4.5.3. Proprietary Voice or Fax Formats
Use of any other encoding except the required codecs reduces
interoperability in the absence of explicit knowledge about the
capabilities of the recipient. A conforming implementation SHOULD
NOT use any other encoding unless a unique identifier is registered
with the IANA prior to use (see [MIME4]). The voice encodings SHOULD
be registered as subtypes of Audio. The fax encodings SHOULD be
registered as subtypes of Image.
Proprietary voice encoding formats or other standard formats SHOULD
NOT be sent under this profile unless the sender has a reasonable
expectation that the recipient will accept the encoding. In
practice, this requires explicit per-destination configuration
information maintained either in a directory, personal address book,
or gateway configuration tables.
Systems MAY accept other Audio/* or Image/* content types if they can
decode them. Systems which receive Audio/* or Image/* content types
which they are unable to deposit or unable to render MUST return the
message (and SHOULD include the original content) to the originator
with an NDN indicating media not supported.
MIME requires support of the basic Text/Plain content type (with the
US-ASCII character set). This content type has limited applicability
within the voice-messaging environment. However, because VPIM is a
MIME profile, MIME requirements SHOULD be met.
Conforming VPIM implementations SHOULD NOT send the Text/Plain
content-type. Implementations MAY send the Text/Plain content-type
outside the Multipart/Voice-Message.
Within a Multipart/Voice-Message, the Text/Plain content-type MAY be
dropped from the message, if necessary, to deliver the audio/fax
components. The recipient SHOULD NOT reject the entire message if
the text component cannot be accepted or rendered.
Outside a Multipart/Voice-Message, conforming implementations MUST
accept Text/Plain; however, specific handling is left as an
implementation decision. From: [MIME2]
Some deployed implementations based on a common interpretation of the
original VPIM V2 specification reject messages with any text content
rather than discard the unsupported contents. These systems will
return the message to the sender with an NDN indicating lack of
support for text.
4.6. Delivery Status Notification (DSN)
A DSN is a notification of delivery (positive DSN), non-delivery
(negative DSN), or temporary delivery delay (delayed DSN). The top-
level content-type of a DSN is Multipart/Report, which is defined in
[REPORT]. The content-type which distinguishes DSN's from other
types of notifications is Message/Delivery-Status, which is defined
in [DSN].
A VPIM-compliant implementation MUST be able to send DSN's that
conform to [REPORT] and [DSN]. Unless requested otherwise, a
non-delivery DSN MUST be sent when any form of non-delivery of a
message occurs.
A VPIM-compliant implementation SHOULD provide a spoken delivery
status in the "human-readable" body part of the DSN, but MAY provide
a textual status.
A VPIM-compliant implementation MUST be able to receive DSN's that
conform to [REPORT] and [DSN].
A VPIM-compliant implementation MUST be able to receive a DSN whose
"human-readable" body part contains a spoken delivery status phrase
or a textual description. Though subsequent use of the phrase or
text is a local implementation issue, the intent of the DSN MUST be
presented to the end user.
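To present the intent of a DSN, a receiver first has to recognize one and extract the per-recipient outcome. The sketch below relies on the standard Python email parser, which splits a Message/Delivery-Status body into blocks of header fields. The function name, addresses, and sample report are assumptions for illustration.

```python
from email import message_from_string

def dsn_actions(msg):
    """Return the Action values of a DSN, or [] if msg is not a DSN."""
    if msg.get_content_type() != "multipart/report":
        return []
    if (msg.get_param("report-type") or "").lower() != "delivery-status":
        return []
    actions = []
    for part in msg.walk():
        if part.get_content_type() == "message/delivery-status":
            # Each block is parsed as a set of header fields.
            for block in part.get_payload():
                if "Action" in block:
                    actions.append(block["Action"].strip().lower())
    return actions

# Hypothetical non-delivery report with a textual human-readable part.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="r1"

--r1
Content-Type: text/plain

Delivery failed.
--r1
Content-Type: message/delivery-status

Reporting-MTA: dns; vpim.example.com

Final-Recipient: rfc822; 12145551212@example.net
Action: failed
Status: 5.1.1

--r1--
"""
actions = dsn_actions(message_from_string(SAMPLE))
```

An action of "failed" would map to a non-delivery announcement, while "delivered" or "delayed" would map to the positive and delayed cases described above.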
4.7. Message Disposition Notification (MDN)
An MDN is a notification indicating what happens to a message after
it is deposited in the recipient's mailbox. An MDN can be positive
(message was read/played/rendered/etc.) or negative (message was
deleted before recipient could see it, etc.). The top-level
content-type of an MDN is Multipart/Report, which is defined in
[REPORT]. The content-type which distinguishes MDN's from other
types of notifications is Message/Disposition-Notification, which is
defined in [MDN].
A VPIM-compliant implementation SHOULD support the ability to request
MDNs. This is done via the use of the "Disposition-Notification-To:"
header field as defined in [MDN].
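Requesting an MDN amounts to adding one header field to the outgoing message. A minimal sketch, with a hypothetical function name and example address:

```python
from email.mime.multipart import MIMEMultipart

def request_mdn(msg, address):
    """Ask the recipient system to send an MDN to the given address."""
    del msg["Disposition-Notification-To"]  # avoid duplicate header fields
    msg["Disposition-Notification-To"] = address
    return msg

# Hypothetical sender address.
outgoing = request_mdn(
    MIMEMultipart("voice-message", version="2.0"),
    "12145551212@vm.example.com",
)
```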
A VPIM-compliant implementation SHOULD support the ability to send
MDNs, but these MDNs MUST conform to [REPORT] and [MDN].
When sending an MDN, a VPIM-compliant implementation SHOULD provide a
spoken message disposition in the "human-readable" body part of the
MDN, but MAY provide a textual status.
A VPIM-compliant implementation SHOULD respond to an MDN request with
an MDN response.
A VPIM-compliant implementation MUST be able to receive MDNs that
conform to [REPORT] and [MDN], if it is capable of requesting MDNs.
If a VPIM-compliant implementation is capable of receiving MDNs, it
MUST be able to receive an MDN whose "human-readable" body part
contains a spoken message disposition phrase or a textual disposition
description. Though subsequent use of the phrase or text is a local
implementation issue, the intent of the MDN MUST be presented to the
end user.
4.8. Forwarded Messages
VPIM v2 explicitly supports the forwarding of voice and fax content
with voice or fax annotation. However, only the two constructs
described below are acceptable in a VPIM message. Since only the
first (i.e., Message/RFC822) can be recognized as a forwarded message
(or even multiple forwarded messages), it is RECOMMENDED that this
construct be used whenever possible.
Forwarded VPIM messages SHOULD be sent as a Multipart/Voice-Message
with the entire original message enclosed in a Message/RFC822
content-type and the annotation as a separate Audio/* or Image/* body
part. If the RFC822 header fields are not available for the
forwarded content, simulated header fields with available information
SHOULD be constructed to indicate the original sending timestamp, and
the original sender as indicated in the "From:" field. Note that at
least one of "From:", "Subject:", or "Date:" MUST be present. As
well, the Message/RFC822 content MUST include at least the "MIME-
Version:", and "Content-Type:" header fields. From: [MIME2]
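The preferred forwarding construct can be sketched with the standard Python email package as follows. The function name is hypothetical, and placing the annotation before the enclosed message is an assumption of this example. The checks mirror the header requirements stated above for the enclosed message.

```python
from email.mime.audio import MIMEAudio
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart

def forward(original, annotation_bytes):
    """Wrap original in Message/RFC822 beside a voice annotation."""
    for name in ("MIME-Version", "Content-Type"):
        if name not in original:
            raise ValueError("enclosed message lacks " + name)
    if not any(name in original for name in ("From", "Subject", "Date")):
        raise ValueError("need one of From, Subject, or Date")
    outer = MIMEMultipart("voice-message", version="2.0")
    outer.attach(MIMEAudio(annotation_bytes, "32kadpcm"))
    outer.attach(MIMEMessage(original))
    return outer

# Hypothetical original message and placeholder audio bytes.
original = MIMEMultipart("voice-message", version="2.0")
original["From"] = "12145551212@vm1.example.com"
original.attach(MIMEAudio(b"\x00\x01", "32kadpcm"))
forwarded = forward(original, b"\x02\x03")
```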
In the event that forwarding information is lost, the entire audio
content MAY be sent as a single Audio/* segment without including any
forwarding semantics. An example of this loss is an AMIS message
being forwarded through an AMIS-to-VPIM gateway.
4.9. Reply Messages
VPIM v2 explicitly supports replying to received messages.
Support of multiple originator header fields in a reply message is
often not possible on voice messaging systems, so it may be necessary
to choose only one when gatewaying a VPIM message to another voice
message system. However, implementers should note that this may make
it impossible to send DSN's, MDN's, and replies to their proper
destinations.
In some cases, replying to a message is not possible, such as with a
message created by telephone answering (i.e., classic voice mail).
In this case, the From field SHOULD contain the special address
non-mail-user@domain (see 4.1.2). The recipient's VPIM system SHOULD NOT
offer the option to reply to this kind of message (unless an
outcalling feature is offered - which is out of scope for VPIM).
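A user interface could decide whether to offer a reply by inspecting the local part of the originator address, as in this sketch. The function name and example addresses are assumptions; outcalling features are not considered.

```python
from email import message_from_string
from email.utils import parseaddr

def can_reply(msg):
    """False when From is missing or names the special non-mail-user address."""
    _name, address = parseaddr(msg.get("From", ""))
    local_part = address.partition("@")[0]
    return bool(address) and local_part.lower() != "non-mail-user"

# Hypothetical originators.
answering = message_from_string(
    "From: Telephone Answering <non-mail-user@vm.example.com>\n\n"
)
subscriber = message_from_string("From: 12145551212@vm.example.com\n\n")
```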