Network Working Group                                       G. Vaudreuil
Request for Comments: 3801                           Lucent Technologies
Obsoletes: 2421                                               G. Parsons
Category: Standards Track                                Nortel Networks
                                                               June 2004

         Voice Profile for Internet Mail - version 2 (VPIMv2)

Status of this Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2004).
Abstract

This document specifies a restricted profile of the Internet multimedia messaging protocols for use between voice processing server platforms. The profile is referred to as the Voice Profile for Internet Mail (VPIM) in this document. These platforms have historically been special-purpose computers and often do not have the same facilities normally associated with a traditional Internet Email-capable computer. As a result, VPIM also specifies additional functionality, as it is needed. This profile is intended to specify the minimum common set of features to allow interworking between conforming systems.

This document obsoletes RFC 2421 and describes version 2 of the profile with greater precision. No protocol changes were made in this revision. A list of changes from RFC 2421 is given in Appendix F. Appendix A summarizes the protocol profiles of this version of VPIM.
Table of Contents

1.  Introduction
    1.1.  Voice Messaging System Limitations
    1.2.  Design Goals
    1.3.  Applicability for VPIM
2.  Requirements Language
3.  Protocol Restrictions
4.  Voice Message Interchange Format
    4.1.  VPIM Message Addressing Formats
    4.2.  Message Header Fields
    4.3.  MIME Audio Content Descriptions
    4.4.  Voice Message Content Types
    4.5.  Other MIME Contents
    4.6.  Delivery Status Notification (DSN)
    4.7.  Message Disposition Notification (MDN)
    4.8.  Forwarded Messages
    4.9.  Reply Messages
5.  Message Transport Protocol
    5.1.  Base SMTP Protocol
    5.2.  SMTP Service Extensions
    5.3.  ESMTP - SMTP Downgrading
6.  Directory Address Resolution
7.  Management Protocols
    7.1.  Network Management
8.  Conformance Requirements
9.  Security Considerations
    9.1.  General Directive
    9.2.  Threats and Problems
    9.3.  Security Techniques
10. Normative References
11. Acknowledgments
12. Appendix A - VPIM Requirements Summary
13. Appendix B - Example Voice Messages
14. Appendix C - Example Error Voice Processing Error Codes
15. Appendix D - Example Voice Processing Disposition Types
16. Appendix E - IANA Registrations
    16.1.  Voice Content-Disposition Parameter Definition
    16.2.  Multipart/Voice-Message MIME Media Type Definition
17. Appendix F - Change History: RFC 2421 (VPIM V2) To This Doc
18. Authors' Addresses
19. Full Copyright Statement
2) Voice mail machines usually act as an integrated Message Transfer Agent, Message Store and User Agent. There is typically no relaying of messages. RFC822 header fields may have limited use in the context of the limited messaging features currently deployed.

3) Voice mail message stores are generally not capable of preserving the full semantics of an Internet message. As such, use of a voice mail machine for gatewaying is not supported. In particular, storage of recipient lists, "Received:" lines, and "Message-ID:" may be limited.

4) Internet-style distribution/exploder mailing lists are not typically supported. Voice mail machines often implement only local alias lists, with error-to-sender and reply-to-sender behavior. Reply-all capabilities using a Cc list are not generally available.

5) Error reports must be machine-parsable so that helpful responses can be voiced to users whose only access mechanism is a telephone.

6) The voice mail systems generally limit address entry to 16 or fewer numeric characters, and normally do not support alphanumeric mailbox names. Alpha characters are not generally used for mailbox identification, as they cannot be easily entered from a telephone terminal.

It should be noted that newer systems are based natively on SMTP/MIME and do not suffer these limitations. In particular, some systems may support media other than voice and fax.
This profile is intended to be robust enough to be used in an environment, such as the global Internet, with installed-base gateways that do not understand MIME. Full functionality, such as reliable error messages and binary transport, will require careful selection of gateways (e.g., via MX records) to be used as VPIM forwarding agents. Nothing in this document precludes use of general-purpose MIME email packages to read and compose VPIM messages. While no special configuration is required to receive VPIM conforming messages, some may be required to originate conforming structures. It is expected that a system administrator who can perform TCP/IP network configuration will manage a VPIM messaging system. When using facsimile or multiple voice encodings, it is suggested that the system administrator maintain a list of the capabilities of the networked mail machines to reduce the sending of undeliverable messages due to lack of feature support. Configuration, implementation and management of these directory-listing capabilities are local matters. REQ].
SIZE] to declare the maximum message size supported. The following sections describe the restrictions and additions to Internet mail protocols that are required to be conforming with this VPIM v2 profile. Though various SMTP, ESMTP and MIME features are described here, the implementer is referred to the relevant RFCs for complete details. The table in Appendix A summarizes the protocol details of this profile. RFC822], the Multipurpose Internet Message Extensions [MIME1-5], the X.400 gateway specification [X.400], and the delivery status and message disposition notifications [REPORT][DSN][DRPT][STATUS][MDN]. MIME, introduced in [MIME1], is a general-purpose message body format that is extensible to carry a wide range of body parts. It provides for encoding binary data so that it can be transported over the 7-bit text-oriented SMTP protocol. This transport encoding (denoted by the "Content-Transfer-Encoding:" MIME field) is in addition to the audio encoding required to generate a binary object. MIME defines two transport-encoding mechanisms to transform binary data into a 7-bit representation, one designed for text-like data ("Quoted-Printable"), and one for arbitrary binary data ("Base64"). While Base64 is dramatically more efficient for audio data, either will work. Where binary transport is available, no transport encoding is needed, and the data can be labeled as "Binary".
RFC 822 format based on the Domain Name System. This naming system has two components: the local part, used for username or mailbox identification; and the host part, used for global machine identification.
2) mailbox number+extension - for use as a private numbering plan with extensions any number of digits, use of "+" as separator
   - e.g., 2722+111@Lucent.com

3) +international number - for international telephone numbers conforming to E.164 maximum of 15 digits
   - e.g., +email@example.com

4) +international number+extension - for international telephone numbers conforming to E.164 maximum of 15 digits, with an extension (e.g., behind a PBX) that has a maximum of 15 digits.
   - e.g., +firstname.lastname@example.org

Note that this address format is designed to be compatible with current usage within the voice messaging industry. It is not compatible with the addressing formats of RFCs 2303-2304. It is expected that as telephony services become more widespread on the Internet, these addressing formats will converge.
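The numeric local-part forms listed above lend themselves to a mechanical check. The following Python fragment is a minimal sketch of such a check and is not part of the profile. The function name `classify_vpim_local_part`, the labels it returns, and the treatment of a bare digit string as a plain mailbox number are assumptions made for illustration only; the 15-digit limits are taken from the text above.

```python
import re
from typing import Optional

# Patterns for the numeric local-part forms. More specific forms come first.
# The labels are hypothetical names chosen for this sketch.
_PATTERNS = [
    ("international+extension", re.compile(r"\+\d{1,15}\+\d{1,15}")),
    ("international", re.compile(r"\+\d{1,15}")),
    ("mailbox+extension", re.compile(r"\d+\+\d+")),
    ("mailbox", re.compile(r"\d+")),  # assumption: bare digits are a mailbox number
]


def classify_vpim_local_part(address: str) -> Optional[str]:
    """Return a label for the numeric address form, or None if none matches."""
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        return None
    for label, pattern in _PATTERNS:
        if pattern.fullmatch(local_part):
            return label
    return None
```

Under these assumptions, `classify_vpim_local_part("2722+111@Lucent.com")` yields `"mailbox+extension"`, while an alphabetic local part yields `None`.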
address from "non-mail-user". For compatibility with the installed base of mail user agents, implementations MUST reject the message when a message addressed to "non-mail-user" is received. The status code for such NDN's is 5.1.1 "Mailbox does not exist". Example: From: Telephone Answering <email@example.com> RFC822 "Reply-To:" or "From" field) Errors to the submitter - (Address in the MAIL FROM field of the ESMTP exchange or the "Return-Path:" RFC822 field) Some proprietary voice messaging protocols include only the recipient of the particular copy in the envelope and include no "header fields" except date and per-message features. Most voice messaging systems do not provide for "Header Information" in their messaging queues and only include delivery information. As a result, recipient information MAY be in either the "To:" or "Cc:" header fields. If all recipients cannot be presented then the recipient header fields SHOULD be omitted to indicate that an accurate list of recipients (e.g., for use with a reply-all capability) is not known.
RFC822]

Example:

From: "Joe S. User" <firstname.lastname@example.org>
From: Technical Support <email@example.com>
From: Nonfirstname.lastname@example.org

Because voice mail machines may not be able to support separate attributes for the "From:" header field and the SMTP MAIL FROM, VPIM-conforming systems SHOULD set these values to the same address. Use of an address different from the one present in the "From:" header field may result in unanticipated behavior.

RECEIVE RULES

The user listed in the "From:" field MUST be presented in the voice message envelope of the voice messaging system as the originator of the message, though the exact presentation is an implementation decision (e.g., the mailbox ID or the text name MAY be presented). The "From:" address SHOULD be used for replies (see 4.9).
Systems, such as gateways from protocols or legacy platforms that do not indicate the complete list of recipients, MAY provide a "To:" line. Because these systems cannot accurately enumerate all recipients in the "To:" headers, recipients SHOULD NOT be enumerated. RECEIVE RULES Systems conforming to this profile MAY discard the addresses in the "To:" fields if they are unable to store the information. This would, of course, make a reply-to-all capability impossible. If present, the addresses in the "To:" field MAY be used for a reply message to all recipients.
time zone offset, such as -0500 for North American Eastern Standard Time. This MAY be supplemented by a time zone name in parentheses, e.g., "-0700 (PDT)". Example: Date: Wed, 28 Jul 96 10:08:49 -0800 (PST) If the VPIM sender is relaying a message from a system that does not provide a time stamp, the time of arrival at the gateway system SHOULD be used as the date. RECEIVE RULES Conforming implementations SHOULD be able to convert [RFC822] date and time stamps into local time RFC822]). Any error messages resulting from the delivery failure MUST be sent to this address. Note that if the "Return-path:" is null ("<>") (e.g., a call answer message would have no return path) delivery status notifications MUST NOT be sent. SEND RULES The originating system MUST NOT add this header.
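As an illustration of the "Date:" receive rule above, the following Python sketch parses an [RFC822] style date and converts it to UTC or to the local zone of the receiving system. It relies on `email.utils.parsedate_to_datetime` from the standard library. The function names, and the choice to treat a date without a usable zone as UTC, are assumptions of this sketch rather than requirements of the profile.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def date_header_to_utc(value: str) -> datetime:
    """Parse an RFC 822 style date string and normalize it to UTC."""
    parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # Assumption: a date without a usable zone is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def date_header_to_local(value: str) -> datetime:
    """Convert the parsed date to the local zone of the receiving system."""
    return date_header_to_utc(value).astimezone()
```

The parenthesized zone name is a comment and the numeric offset is what determines the conversion, so the example date given above, with its -0800 offset, corresponds to 18:08:49 UTC on the same day.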
RECEIVE RULES If the receiving system is incapable of storing the return path (or MAIL FROM) to be used for subsequent delivery errors (i.e., it is a gateway to a legacy system or protocol), the receiving system must otherwise ensure that further delivery errors don't happen. Systems that do not support the return path MUST ensure that at the time the message is acknowledged (i.e., when a DSN would be sent), the message is delivered to the recipient's ultimate mailbox. Non-Delivery notifications SHOULD NOT be sent after that final delivery. RFC822] RFC822] SEND RULES A conforming system SHOULD NOT send a "Reply-To:" header. RECEIVE RULES If a "Reply-To:" field is present, a reply-to-sender message MAY be sent to the address specified (that is, in lieu of the address in the "From:" field). If the receiving system (e.g., multi-protocol
gateway) only supports one address for the originator, then the address in the "From:" field MUST be used and the "Reply-To:" field MAY be silently discarded.

RFC822 message by MTAs. This is the only field that may be added by an MTA. Information in this header is useful for debugging when using a US-ASCII message reader or a header-parsing tool.

From: [RFC822]

SEND RULES

A VPIM-conforming system MUST add a "Received:" field. When acting as a gateway, information about the system from which the message was received SHOULD be included.

RECEIVE RULES

A VPIM-conforming system MUST NOT remove any "Received:" fields when relaying messages to other MTAs or gateways. These header fields MAY be ignored or deleted when the message is received at the final destination.

VPIM1] defines an earlier version of this profile and uses the token (Voice 1.0).

Example:

MIME-Version: 1.0 (Voice 2.0)

This identifier is intended for information only and SHOULD NOT be used to semantically identify the message as being a VPIM message. Instead, the presence of the multipart/voice-message content type defined in section 16.2 SHOULD be used if identification is necessary.

section 4.4 of this document.

From: [MIME2]
section 5). When binary transport is not available, implementations MUST encode the audio and/or facsimile data as "Base64".

RECEIVE RULES

Conforming implementations MUST recognize and decode the standard encodings, "Binary" (when binary support is available), "7bit", "8bit", "Base64" and "Quoted-Printable" per [MIME1]. The detection and decoding of "Quoted-Printable", "7bit", and "8bit" MUST be supported in order to meet MIME requirements and to preserve interoperability with the fullest range of possible devices.

X.400]
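The receive rule for transfer encodings above can be sketched as a small dispatch on the "Content-Transfer-Encoding:" value. The Python fragment below is illustrative only; the function name `decode_transfer_encoding` is an assumption, and the decoding itself uses the standard `base64` and `quopri` modules.

```python
import base64
import quopri


def decode_transfer_encoding(payload: bytes, encoding: str) -> bytes:
    """Undo a MIME Content-Transfer-Encoding and return the raw body bytes."""
    name = encoding.strip().lower()
    if name == "base64":
        # Line breaks inside the encoded body are skipped by the decoder.
        return base64.b64decode(payload)
    if name == "quoted-printable":
        return quopri.decodestring(payload)
    if name in ("7bit", "8bit", "binary"):
        # Identity encodings: the body already is the raw data.
        return payload
    raise ValueError(f"unsupported Content-Transfer-Encoding: {encoding!r}")
```

The encoding name is compared case-insensitively, so "Base64" and "base64" are handled alike.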
RECEIVE RULES If a "Sensitivity:" field with a value of "Private" is present in the message, a conforming system MUST prohibit the recipient from forwarding this message to any other user. A conforming system, however, SHOULD allow the responder to reply to a sensitive message, but SHOULD NOT include the original message content. The responder MAY set the sensitivity of the reply message. A receiving system MAY ignore sensitivity values of "Personal" and "Company Confidential". If the receiving system does not support privacy and the sensitivity is "Private", a negative delivery status notification MUST be sent to the originator with the appropriate status code (5.6.0) "Other or undefined protocol status" indicating that privacy could not be assured. The message contents SHOULD be returned to the sender to allow for a voice context with the notification. A non-delivery notification to a private message SHOULD NOT be tagged private since it will be sent to the originator. From: [X.400] A message with no privacy explicitly noted (i.e., no header) or with "Normal" sensitivity has no special treatment. X.400] Importance := "Importance" ":" importance-value Importance-value := "low" / "normal" / "high" SEND RULES Conforming implementations MAY include this header to indicate the importance of a message. RECEIVE RULES If the receiving system does not support "Importance:", the attribute MAY be silently dropped.
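The "Sensitivity:" and "Importance:" receive rules above amount to a small decision table. The following Python sketch shows one way to express it. The enumeration, the function names, and the `supports_privacy` flag are hypothetical names introduced for this illustration; the behavior follows the rules as stated above.

```python
from enum import Enum
from typing import Optional


class SensitivityAction(Enum):
    DELIVER = "deliver"                        # no special treatment
    DELIVER_NO_FORWARD = "deliver-no-forward"  # deliver, but prohibit forwarding
    REJECT = "reject"                          # return an NDN with status 5.6.0


def sensitivity_action(header_value: Optional[str],
                       supports_privacy: bool) -> SensitivityAction:
    """Map a Sensitivity value to the handling described in the receive rules."""
    value = (header_value or "Normal").strip().lower()
    if value == "private":
        if supports_privacy:
            return SensitivityAction.DELIVER_NO_FORWARD
        return SensitivityAction.REJECT
    # "Personal" and "Company Confidential" MAY be ignored; "Normal" or an
    # absent header has no special treatment.
    return SensitivityAction.DELIVER


def parse_importance(header_value: str) -> Optional[str]:
    """Return "low", "normal" or "high", or None if the value is not recognized."""
    value = header_value.strip().lower()
    return value if value in ("low", "normal", "high") else None
```

Returning `None` from `parse_importance` lets a caller drop an unrecognized value silently, in the same way a system without "Importance:" support may drop the attribute.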
RFC822] SEND RULES For compatibility with text-based mailbox interfaces, a text subject field SHOULD be generated by a conforming implementation. It is RECOMMENDED that voice-messaging systems that do not support any text user interfaces (e.g., access only by a telephone) insert a generic subject header of "VPIM Message" or "Voice Message" for the benefit of GUI-enabled recipients. RECEIVE RULES It is anticipated that many voice-only systems will be incapable of storing the subject line. The subject MAY be discarded by a receiving system.
DISP]

SEND RULES

In order to distinguish between the various types of audio contents in a VPIM voice message, a new disposition parameter "voice" is defined with IANA (see section 16.1) with the parameter values below to be used as appropriate:

Audio-Type := "voice" "=" Audio-type-value

Audio-type-value := "Voice-Message"
                  / "Voice-Message-Notification"
                  / "Originator-Spoken-Name"
                  / "Recipient-Spoken-Name"
                  / "Spoken-Subject"

Voice-Message - the primary voice message,

Voice-Message-Notification - a spoken delivery notification or spoken disposition notification,

Originator-Spoken-Name - the spoken name of the originator,

Recipient-Spoken-Name - the spoken name of the recipient(s) if available to the originator,

Spoken-Subject - the spoken subject of the message, typically spoken by the originator.

Note that there SHOULD only be one instance of each of these types of audio contents per message level. Additional instances of a given type (i.e., parameter value) MAY occur within an attached forwarded or reply voice message. If there are multiple recipients for a given message, recipient-spoken-name MUST NOT be used.

RECEIVE RULES

Implementations SHOULD use this header. However, those that do not understand the "voice" parameter (or the "Content-Disposition:" header) can safely ignore it, and will present the audio body parts in order (but will not be able to distinguish between them). If more than one instance of the "voice" parameter type value is encountered at one level (e.g., multiple 'Voice-Message' tagged contents) then they SHOULD be presented together.
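The "voice" disposition parameter described above can be read with a general-purpose MIME library. The Python sketch below uses `Message.get_param` from the standard `email` package. The function name `voice_type` and the choice to return the value in lower case are assumptions of this sketch; unknown or missing values yield `None`, which mirrors the rule that the parameter can safely be ignored.

```python
from email.message import Message
from typing import Optional

# Parameter values listed above, in lower case for case-insensitive matching.
VOICE_TYPES = {
    "voice-message",
    "voice-message-notification",
    "originator-spoken-name",
    "recipient-spoken-name",
    "spoken-subject",
}


def voice_type(part: Message) -> Optional[str]:
    """Return the lower-cased "voice" parameter of a body part, if recognized."""
    value = part.get_param("voice", header="content-disposition")
    if not isinstance(value, str):
        # Header or parameter absent, or an encoded form this sketch ignores.
        return None
    value = value.strip().lower()
    return value if value in VOICE_TYPES else None
```

A receiver could group body parts by the returned value in order to present multiple parts of the same type together.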
DUR] LANG].

Example for UK English:

Content-Language: en-GB

SEND RULES

A sending system MAY add this field to indicate the language of the voice. The determination of this (e.g., automated or user-selected) is a local implementation issue.

RECEIVE RULES

The use of this field on reception is a local implementation issue. It MAY be used as a hint to the recipient (e.g., end-user or an automated translation process) as to the language of the voice message.
Only the contents profiled can be sent within a VPIM voice message construct (i.e., the Multipart/Voice-Message content type) to form a simple or a more complex structure (several examples are given in Appendix B). The presence of other contents within a VPIM voice message is not permitted. In the absence of a bilateral agreement, conforming implementations MUST NOT create a message containing prohibited contents. In the spirit of liberal acceptance, a conforming implementation MAY accept and render prohibited content. Systems unable to accept or render prohibited contents MAY discard the prohibited contents as necessary to deliver the acceptable content. When multiple contents are present within the Multipart/Voice-Message, they SHOULD be presented to the user in the order that they appear in the message. Some deployed implementations based on a common interpretation of the original VPIM v2 specification reject messages with prohibited content rather than discard the unsupported contents. For interoperability with these systems, it is especially important that prohibited contents not be sent within a Multipart/Voice-Message. MIME2]. As such, it may be safely interpreted as a multipart/mixed by systems that do not understand the sub-type (only the identification as a voice message would be lost). In addition to the MIME required boundary parameter, a version parameter is also required for this sub-type. This is to distinguish this refinement of the sub-type from the previous definition in [VPIM1]. The value of the version parameter is "2.0" if the content conforms to the requirements of this specification. Should there be further revisions of this content type, there MUST be backwards compatibility (i.e., systems implementing version n can read version 2, and systems implementing version 2 can read version 2 contents within a version n). 
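As a minimal sketch of the content type described above, the following Python fragment builds a Multipart/Voice-Message body carrying the required version parameter and a single audio part. It uses the standard `email.mime` classes. The function name `build_voice_body` is an assumption, as is the use of the Audio/32KADPCM subtype and the "voice" disposition parameter for the single part; message header fields such as "From:" and "Date:" are left to the caller.

```python
from email.mime.audio import MIMEAudio
from email.mime.multipart import MIMEMultipart


def build_voice_body(adpcm_bytes: bytes) -> MIMEMultipart:
    """Build a multipart/voice-message container with one audio body part."""
    # The version parameter distinguishes this definition from the earlier one.
    container = MIMEMultipart("voice-message", version="2.0")

    # MIMEAudio applies Base64 transfer encoding by default.
    audio = MIMEAudio(adpcm_bytes, _subtype="32KADPCM")
    audio.add_header("Content-Disposition", "inline", voice="Voice-Message")
    container.attach(audio)
    return container
```

The boundary parameter is generated by the library when the message is serialized, for example with `as_string()`.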
SEND RULES The Multipart/Voice-Message content-type MUST only contain the profiled media and content types specified in this section (i.e., Audio/*, Image/*, and Message/RFC822). The most common will be: spoken name, spoken subject, the message itself, and an attached fax. Forwarded messages are created by simply using the Message/RFC822 construct.
Conformant implementations MUST use Multipart/Voice-Message in a VPIM message. In most cases, this Multipart/Voice-Message Content-Type will be the top level but may be included within a Message/RFC822 if the message is forwarded or within a multipart/mixed when more than one message is being forwarded.

RECEIVE RULES

Conformant implementations MUST recognize the Multipart/Voice-Message content (whether it is a top-level content or contained in a Multipart/Mixed) and MUST be able to separate the contents (e.g., spoken name or spoken subject). The semantics of Multipart/Voice-Message (defined in section 16.2) are identical to those of Multipart/Mixed, and it may be interpreted as such by systems that do not recognize this content-type.

RFC822 message encapsulation body part. This body part SHOULD be used within a Multipart/Voice-Message to forward complete messages (see 4.8) or to reply with original content (see 4.9).

From: [MIME2]

RECEIVE RULES

The receiving system MUST accept this format and SHOULD treat this attachment as a forwarded message. The receiving system MAY flatten the forwarding structure (i.e., remove this construct to leave multiple voice contents or even concatenate the voice contents to fit in a recipient's mailbox), if necessary.

ADPCM]. This encoding is a moderately-compressed encoding with a data rate of 32 kbits/second using moderate processing resources. Typically, this body contains several minutes of message content; however, if used for spoken name or subject the content is expected to be considerably shorter (i.e., about 5 and 10 seconds respectively).
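The 32 kbits/second data rate stated above gives a simple size estimate for audio body parts. The Python sketch below works through the arithmetic, assuming that "k" denotes 1000 and ignoring the line breaks that a Base64 encoder adds; the function names are chosen for illustration.

```python
ADPCM_BITS_PER_SECOND = 32_000  # 32 kbits/second, taking k = 1000 (assumption)


def adpcm_bytes(seconds: float) -> int:
    """Raw size in bytes of an audio segment of the given duration."""
    return int(seconds * ADPCM_BITS_PER_SECOND / 8)


def base64_chars(raw_bytes: int) -> int:
    """Base64 output length without line breaks: 4 characters per 3 bytes."""
    return 4 * ((raw_bytes + 2) // 3)
```

Under these assumptions a 5-second spoken name occupies 20,000 bytes before transfer encoding and 26,668 characters after Base64 encoding, while a 3-minute message occupies 720,000 bytes before encoding.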
RECEIVE RULES

Receivers MUST be able to accept and decode Audio/32KADPCM. If an implementation can only handle one voice body, then multiple voice bodies (if present) SHOULD be concatenated, and MUST NOT be discarded. If concatenated, the contents SHOULD be in the same order they appeared in the multipart.

TIFF-F], and the Image/TIFF MIME content-type is defined in [TIFFREG]. While there are several formats of TIFF, only TIFF-F is profiled for use within Multipart/Voice-Message. Further, since the TIFF-F file format is used in a store-and-forward mode with VPIM, the image MUST be encoded so that there is only one image strip per facsimile page.

SEND RULES

All VPIM implementations that support facsimile MUST generate TIFF-F compatible facsimile contents in the Image/TIFF subtype using the application=faxbw encoding by default. If the VPIM message is a voice-annotated fax, the implementation SHOULD send this fax content in Multipart/Voice-Message. If the message is a simple fax, an implementation MAY send it without using the Multipart/Voice-Message to be more compatible with fax-only (RFC 2305) implementations.

While any valid MIME body header MAY be used (e.g., Content-Disposition to indicate the filename), none are specified to have special semantics for VPIM and MAY be ignored. Note that the content-type parameter application=faxbw MUST be included in outbound messages.

RECEIVE RULES

Not all VPIM systems support fax, but all SHOULD accept it within the multipart/voice-message. Within a Multipart/Voice-Message, a receiving system that cannot render fax content SHOULD accept the voice content of a VPIM message and discard the fax content. Outside a Multipart/Voice-Message, a recipient system MAY reject (with appropriate NDN) the entire message if it cannot store or is not capable of rendering a message with fax attachments. VPIM conforming systems MAY support fax outside of (or without) the Multipart/Voice-Message.
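The receive rules above for audio and fax contents can be sketched as follows in Python. The function names and the `fax_supported` flag are hypothetical. Only the immediate children of the Multipart/Voice-Message are examined, and joining the decoded bytes end to end is a simplifying assumption of this sketch: a real system may need to concatenate at the audio level rather than at the byte level.

```python
from email.message import Message
from typing import List, Tuple


def split_voice_message(container: Message,
                        fax_supported: bool) -> Tuple[List[Message], List[Message]]:
    """Return (audio_parts, fax_parts) in the order they appear."""
    audio: List[Message] = []
    fax: List[Message] = []
    if not container.is_multipart():
        return audio, fax
    for part in container.get_payload():
        ctype = part.get_content_type()
        if ctype.startswith("audio/"):
            audio.append(part)
        elif ctype == "image/tiff" and fax_supported:
            fax.append(part)
        # Fax content is discarded here when the system cannot render it.
    return audio, fax


def concatenate_audio(parts: List[Message]) -> bytes:
    """Join decoded audio bodies in message order (naive byte-level join)."""
    return b"".join(part.get_payload(decode=True) for part in parts)
```

With `fax_supported=False`, the voice content is kept and the fax part is dropped, which corresponds to the behavior suggested above for systems that cannot render fax.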
Some deployed implementations based on a common interpretation of the original VPIM V2 specification reject messages with fax content within the Multipart/Voice-Message rather than discard the unsupported contents. These systems will return the message to the sender with an NDN indicating lack of support for fax. section 4.5.1) MAY be included within a multipart/voice message. Other contents MUST NOT be included. Their handling is a local implementation issue. Multipart/mixed is included to promote interoperability with a wider range of systems and also to allow the creation of more complex multimedia messages (with a VPIM message as one part).
RECEIVE RULES For compatibility with an earlier specification of VPIM v2, the Text/Directory content type MUST be accepted by a conforming implementation, but need not be stored, processed, or rendered to the recipient. MIME4]). The voice encodings SHOULD be registered as subtypes of Audio. The fax encodings SHOULD be registered as subtypes of Image. SEND RULES Proprietary voice encoding formats or other standard formats SHOULD NOT be sent under this profile unless the sender has a reasonable expectation that the recipient will accept the encoding. In practice, this requires explicit per-destination configuration information maintained either in a directory, personal address book, or gateway configuration tables. RECEIVE RULES Systems MAY accept other Audio/* or Image/* content types if they can decode them. Systems which receive Audio/* or Image/* content types which they are unable to deposit or unable to render MUST return the message (and SHOULD include the original content) to the originator with an NDN indicating media not supported.
RECEIVE RULES

Within a Multipart/Voice-Message, the Text/Plain content-type MAY be dropped from the message, if necessary, to deliver the audio/fax components. The recipient SHOULD NOT reject the entire message if the text component cannot be accepted or rendered. Outside a Multipart/Voice-Message, conforming implementations MUST accept Text/Plain; however, specific handling is left as an implementation decision.

From: [MIME2]

Some deployed implementations based on a common interpretation of the original VPIM V2 specification reject messages with any text content rather than discard the unsupported contents. These systems will return the message to the sender with an NDN indicating lack of support for text.

REPORT]. The content-type which distinguishes DSN's from other types of notifications is Message/Delivery-Status, which is defined in [DSN].

SEND RULES

A VPIM-compliant implementation MUST be able to send DSN's that conform to [REPORT] and [DSN]. Unless requested otherwise, a non-delivery DSN MUST be sent when any form of non-delivery of a message occurs. A VPIM-compliant implementation SHOULD provide a spoken delivery status in the "human-readable" body part of the DSN, but MAY provide a textual status.

RECEIVE RULES

A VPIM-compliant implementation MUST be able to receive DSN's that conform to [REPORT] and [DSN]. A VPIM-compliant implementation MUST be able to receive a DSN whose "human-readable" body part contains a spoken delivery status phrase or a textual description. Though subsequent use of the phrase or text is a local implementation issue, the intent of the DSN MUST be presented to the end user.
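A receiver needs to recognize a DSN before it can present its intent to the user. The Python sketch below identifies a DSN by the Message/Delivery-Status body part named above, together with the multipart/report container and its report-type parameter from [REPORT], and then collects the per-recipient "Status:" fields. The function names are assumptions, and the sketch relies on the standard `email` parser representing each block of delivery-status fields as a nested message object.

```python
from email.message import Message
from typing import List


def is_delivery_status_notification(msg: Message) -> bool:
    """True if the message is a multipart/report carrying a delivery status."""
    if msg.get_content_type() != "multipart/report":
        return False
    report_type = msg.get_param("report-type")
    if not isinstance(report_type, str) or report_type.lower() != "delivery-status":
        return False
    return any(part.get_content_type() == "message/delivery-status"
               for part in msg.walk())


def status_codes(msg: Message) -> List[str]:
    """Collect the Status field values found in message/delivery-status parts."""
    codes: List[str] = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        for block in part.get_payload():
            status = block.get("Status")
            if status:
                codes.append(status.strip())
    return codes
```

The collected codes, such as 5.1.1, could then be mapped to spoken phrases by a telephone user interface; that mapping is a local implementation matter.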
REPORT]. The content-type which distinguishes MDN's from other types of notifications is Message/Disposition-Notification, which is defined in [MDN]. SEND RULES A VPIM-compliant implementation SHOULD support the ability to request MDNs. This is done via the use of the "Disposition-Notification-To:" header field as defined in [MDN]. A VPIM-compliant implementation SHOULD support the ability to send MDNs, but these MDNs MUST conform to [REPORT] and [MDN]. When sending an MDN, a VPIM-compliant implementation SHOULD provide a spoken message disposition in the "human-readable" body part of the MDN, but MAY provide a textual status. RECEIVE RULES A VPIM-compliant implementation SHOULD respond to an MDN request with an MDN response. A VPIM-compliant implementation MUST be able to receive MDNs that conform to [REPORT] and [MDN], if it is capable of requesting MDNs. If a VPIM-compliant implementation is capable of receiving MDNs, it MUST be able to receive a MDN whose "human-readable" body part contains a spoken message disposition phrase or a textual disposition description. Though subsequent use of the phrase or text is a local implementation issue, the intent of the MDN MUST be presented to the end user. RFC822) can be recognized as a forwarded message (or even multiple forwarded messages), it is RECOMMENDED that this construct be used whenever possible.
Forwarded VPIM messages SHOULD be sent as a Multipart/Voice-Message with the entire original message enclosed in a Message/RFC822 content-type and the annotation as a separate Audio/* or Image/* body part. If the RFC822 header fields are not available for the forwarded content, simulated header fields with available information SHOULD be constructed to indicate the original sending timestamp, and the original sender as indicated in the "From:" field. Note that at least one of "From:", "Subject:", or "Date:" MUST be present. As well, the Message/RFC822 content MUST include at least the "MIME-Version:", and "Content-Type:" header fields.

From: [MIME2]

In the event that forwarding information is lost, the entire audio content MAY be sent as a single Audio/* segment without including any forwarding semantics. An example of this loss is an AMIS message being forwarded through an AMIS-to-VPIM gateway.
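The forwarding structure described above can be sketched with the standard `email.mime` classes. In the Python fragment below, the function names are hypothetical, the annotation is assumed to be Audio/32KADPCM, and placing the annotation before the enclosed original is a choice made for this sketch.

```python
from email.message import Message
from email.mime.audio import MIMEAudio
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart


def has_required_forward_headers(original: Message) -> bool:
    """Check the header fields the enclosed Message/RFC822 content must carry."""
    has_origin = any(name in original for name in ("From", "Subject", "Date"))
    return has_origin and "MIME-Version" in original and "Content-Type" in original


def forward_voice_message(original: Message,
                          annotation_adpcm: bytes) -> MIMEMultipart:
    """Wrap the original message and a voice annotation for forwarding."""
    outer = MIMEMultipart("voice-message", version="2.0")

    note = MIMEAudio(annotation_adpcm, _subtype="32KADPCM")
    note.add_header("Content-Disposition", "inline", voice="Voice-Message")
    outer.attach(note)

    # The entire original message is enclosed as Message/RFC822.
    outer.attach(MIMEMessage(original))
    return outer
```

A gateway that has no header fields for the original content would construct simulated ones first, so that `has_required_forward_headers` holds before the original is enclosed.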