13. IANA Considerations
This specification defines a new SDP [RFC4566] attribute in
Contact name: Philip Zimmermann <email@example.com>
Attribute name: "zrtp-hash"
Type of attribute: Media level
Subject to charset: Not
Purpose of attribute: The 'zrtp-hash' indicates that a UA supports
the ZRTP protocol and provides a hash of the
ZRTP Hello message. The ZRTP protocol
version number is also specified.
Allowed attribute values: Hex
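As a non-normative illustration, the following Python sketch shows how a user agent might build such an attribute value and later check a received Hello against it. The exact line syntax is defined elsewhere in this specification, not in this section. The form "a=zrtp-hash:<version> <hex>", the choice of SHA-256, the default version string, and all function names are assumptions made for this sketch.

```python
import hashlib
import hmac
import re

# Assumed line syntax for this sketch: "a=zrtp-hash:<version> <hex digits>".
_ATTR = re.compile(r"a=zrtp-hash:(\d+\.\d+) ([0-9a-fA-F]+)")


def make_zrtp_hash_attribute(hello_message: bytes, version: str = "1.10") -> str:
    """Return a media-level attribute line carrying a hex hash of the Hello."""
    digest = hashlib.sha256(hello_message).hexdigest()  # SHA-256 is an assumption
    return f"a=zrtp-hash:{version} {digest}"


def parse_zrtp_hash_attribute(line: str) -> tuple:
    """Split an attribute line into (version, raw hash bytes)."""
    match = _ATTR.fullmatch(line.strip())
    if match is None:
        raise ValueError("not a zrtp-hash attribute: %r" % line)
    return match.group(1), bytes.fromhex(match.group(2))


def hello_matches_attribute(line: str, hello_message: bytes) -> bool:
    """Check a Hello received on the media path against the signaled hash."""
    _, signaled = parse_zrtp_hash_attribute(line)
    actual = hashlib.sha256(hello_message).digest()
    return hmac.compare_digest(signaled, actual)
```

A receiver would call hello_matches_attribute() with the Hello that arrives on the media path. A False result means that Hello is not the one described in the signaling.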
14. Media Security Requirements
This section discusses how ZRTP meets all RTP security requirements
discussed in the Media Security Requirements [RFC5479] document
without any dependencies on other protocols or extensions, unlike
DTLS-SRTP [RFC5764] which requires additional protocols and
R-FORK-RETARGET is met since ZRTP is a media path key agreement
protocol.
R-DISTINCT is met since ZRTP uses ZIDs and allows multiple
independent ZRTP exchanges to proceed.
R-HERFP is met since ZRTP is a media path key agreement protocol.
R-REUSE is met using the Multistream and Preshared modes.
R-AVOID-CLIPPING is met since ZRTP is a media path key agreement
protocol.
R-RTP-CHECK is met since the ZRTP packet format does not pass the
RTP validity check.
R-ASSOC is met using the a=zrtp-hash SDP attribute in INVITEs and
responses (Section 8.1).
R-NEGOTIATE is met using the Commit message.
R-PSTN is met since ZRTP can be implemented in Gateways.
R-PFS is met using ZRTP Diffie-Hellman key agreement methods.
R-COMPUTE is met using the Hello/Commit ZRTP exchange.
R-CERTS is met using the verbal comparison of the SAS.
R-FIPS is met since ZRTP uses only FIPS-approved algorithms in all
relevant categories. The authors believe ZRTP is compliant with
[NIST-SP800-56A], [NIST-SP800-108], [FIPS-198-1], [FIPS-180-3],
[NIST-SP800-38A], [FIPS-197], and [NSA-Suite-B], which should meet
the FIPS-140 validation requirements set by [FIPS-140-2-Annex-A]
R-DOS is met since ZRTP does not introduce any new denial-of-
R-EXISTING is met since ZRTP can support the use of certificates
R-AGILITY is met since the set of hash, cipher, SRTP
authentication tag type, key agreement method, SAS type, and
signature type can all be extended and negotiated.
R-DOWNGRADE is met since ZRTP has protection against downgrade
R-PASS-MEDIA is met since ZRTP prevents a passive adversary with
access to the media path from gaining access to keying material
used to protect SRTP media packets.
R-PASS-SIG is met since ZRTP prevents a passive adversary with
access to the signaling path from gaining access to keying
material used to protect SRTP media packets.
R-SIG-MEDIA is met using the a=zrtp-hash SDP attribute in INVITEs
R-ID-BINDING is met using the a=zrtp-hash SDP attribute
R-ACT-ACT is met using the a=zrtp-hash SDP attribute in INVITEs
R-BEST-SECURE is met since ZRTP utilizes the RTP/AVP profile and
hence best effort SRTP in every case.
R-OTHER-SIGNALING is met since ZRTP can utilize modes in which
there is no dependency on the signaling path.
R-RECORDING is met using the ZRTP Disclosure flag.
R-TRANSCODER is met if the transcoder operates as a trusted MitM
(i.e., a PBX).
R-ALLOW-RTP is met due to ZRTP's best effort encryption.
15. Security Considerations
This document is all about securely keying SRTP sessions. As such,
security is discussed in every section.
Most secure phones rely on a Diffie-Hellman exchange to agree on a
common session key. But since DH is susceptible to a MiTM attack, it
is common practice to provide a way to authenticate the DH exchange.
In some military systems, this is done by depending on digital
signatures backed by a centrally managed PKI. A decade of industry
experience has shown that deploying centrally managed PKIs can be a
painful and often futile experience. PKIs are just too messy and
require too much activation energy to get them started. Setting up a
PKI requires somebody to run it, which is not practical for an
equipment provider. A service provider, like a carrier, might
venture down this path, but even then you have to deal with cross-
carrier authentication, certificate revocation lists, and other
complexities. It is much simpler to avoid PKIs altogether,
especially when developing secure commercial products. It is
therefore more common for commercial secure phones in the PSTN world
to augment the DH exchange with a Short Authentication String (SAS)
combined with a hash commitment at the start of the key exchange, to
shorten the length of SAS material that must be read aloud. No PKI
is required for this approach to authenticating the DH exchange. The
AT&T TSD 3600, Eric Blossom's COMSEC secure phones [comsec],
[PGPfone], and the GSMK CryptoPhone are all examples of products that
took this simpler lightweight approach. The main problem with this
approach is inattentive users who may not execute the voice
Some questions have been raised about voice spoofing during the short
authentication string (SAS) comparison. But it is a mistake to think
this is simply an exercise in voice impersonation (perhaps this could
be called the "Rich Little" attack). Although there are digital
signal processing techniques for changing a person's voice, that does
not mean a MiTM attacker can safely break into a phone conversation
and inject his own SAS at just the right moment. He doesn't know
exactly when or in what manner the users will choose to read aloud
the SAS, or in what context they will bring it up or say it, or even
which of the two speakers will say it, or if indeed they both will
say it. In addition, some methods of rendering the SAS involve using
a list of words such as the PGP word list [Juola2], in a manner
analogous to how pilots use the NATO phonetic alphabet to convey
information. This can make it even more complicated for the
attacker, because these words can be worked into the conversation in
unpredictable ways. If the session also includes video (an
increasingly common usage scenario), the MiTM may be further deterred
by the difficulty of making the lips sync with the voice-spoofed SAS.
The PGP word list is designed to make each word phonetically
distinct, which also tends to create distinctive lip movements.
Remember that the attacker places a very high value on not being
detected, and if he makes a mistake, he doesn't get to do it over.
A question has been raised regarding the safety of the SAS procedure
for people who don't know each other's voices, because it may allow
an attack from a MiTM even if he lacks voice impersonation
capabilities. This is not as much of a problem as it seems, because
it isn't necessary that users recognize each other by their voice.
It is only necessary that they detect that the voice used for the SAS
procedure doesn't match the voice in the rest of the phone
Special consideration must be given to secure phone calls with
automated systems that cannot perform a verbal SAS comparison between
two humans (e.g., a voice mail system). If a well-functioning PKI is
available to all parties, it is recommended that credentials be
provisioned at the automated system sufficient to use one of the
automatic MiTM detection mechanisms from Section 8.1.1 or
Section 7.2. Alternatively, the automated system can rely on a
previously established cached shared secret (pbxsecret or rs1 or
both), backed by a human-executed SAS comparison during an initial
call. Note that it is worse than
useless and absolutely unsafe to rely on a robot voice from the
remote endpoint to compare the SAS, because a robot voice can be
trivially forged by a MiTM. However, a robot voice may be safe to
use strictly locally for a different purpose. A ZRTP user agent may
render its locally computed SAS to the local user via a robot voice
if no visual display is available, provided the user can readily
determine that the robot voice is generated locally, not from the
A popular and field-proven approach to MiTM protection is used by SSH
(Secure Shell) [RFC4251], which Peter Gutmann likes to call the "baby
duck" security model. SSH establishes a relationship by exchanging
public keys in the initial session, when we assume no attacker is
present, and this makes it possible to authenticate all subsequent
sessions. A successful MiTM attacker has to have been present in all
sessions all the way back to the first one, which is assumed to be
difficult for the attacker. ZRTP's key continuity features are
actually better than those of SSH, at least for VoIP, for reasons
described in
Section 15.1. All this is accomplished without resorting to a
centrally managed PKI.
We use an analogous baby duck security model to authenticate the DH
exchange in ZRTP. We don't need to exchange persistent public keys;
we can simply cache a shared secret and reuse it to authenticate a
long series of DH exchanges for secure phone calls over a long period
of time. If we verbally compare just one SAS, and then cache a
shared secret for later calls to use for authentication, no new voice
authentication rituals need to be executed. We just have to remember
we did one already.
If one party ever loses this cached shared secret, it is no longer
available for authentication of DH exchanges. This cache mismatch
situation is easy to detect by the party that still has a surviving
shared secret cache entry. If it fails to match, either there is a
MiTM attack or one side has lost their shared secret cache entry.
The user agent that discovers the cache mismatch must alert the user
that a cache mismatch has been detected, and that a verbal
comparison of the SAS is needed to determine whether the mismatch is
due to a MiTM attack or to the other party losing its cache (normative
language is in Section 4.3.2). Voice confirmation is absolutely
essential in this situation. From that point on, the two parties
start over with a new cached shared secret. Then, they can go back
to omitting the voice authentication on later calls.
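The decision a user agent faces at the start of each call can be summarized in a short sketch. This is a simplification for illustration only; the normative behavior is given in Section 4.3.2, and the function name, parameter names, and the three-way outcome are assumptions of this example rather than protocol elements.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    PROCEED = "secure the call without prompting the user"
    ASK_SAS = "ask the user to compare the SAS verbally"
    ALERT_AND_ASK_SAS = "alert the user to a cache mismatch and require SAS comparison"


def action_for_call(cached_secret: Optional[bytes],
                    peer_secret_matches: bool,
                    sas_previously_verified: bool) -> Action:
    """Pick what the user agent should do, given the state of its secret cache."""
    if cached_secret is None:
        # No local cache entry: nothing to authenticate against yet.
        return Action.ASK_SAS
    if not peer_secret_matches:
        # Either a MiTM is present or the peer lost its cache entry.
        # Only a verbal SAS comparison can tell these apart.
        return Action.ALERT_AND_ASK_SAS
    if not sas_previously_verified:
        # Secrets match, but no SAS comparison was ever completed.
        return Action.ASK_SAS
    # Key continuity holds and the SAS was verified on an earlier call.
    return Action.PROCEED
```

After a successful SAS comparison in the mismatch case, both parties would store a new shared secret, and later calls would again return PROCEED.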
Precautions must be observed when using a trusted MiTM device such as
a trusted PBX, as described in Section 7.3. Make sure you really
trust that this PBX will never be compromised before establishing it
as a trusted MiTM, because it is in a position to wiretap calls for
any phone that trusts it. It is "licensed" to be in a position to
wiretap. You are safer to try to arrange the connection topology to
route the media directly between the two ZRTP peers, not through a
trusted PBX. Real end-to-end encryption is preferred.
The security of the SAS mechanism depends on the user verifying it
verbally with his peer at the other endpoint. There is some risk the
user will not be so diligent and may ignore the SAS. For a
discussion on how users become habituated to security warnings in the
PKI certificate world, see [Sunshine]. Some of the problems
discussed in that paper stem from the habituation syndrome common to
most warning messages, and some stem from the fact that users
simply don't understand trust models. Fortunately, ZRTP doesn't need
a trust model to use the SAS mechanism, so it's easier for the user
to grasp the idea of comparing the SAS verbally with the other party;
it's easier than understanding a trust model, at least. The
verbal comparison of the SAS also gets both users involved, and they
will notice a mismatch of the SAS. In addition, the ZRTP user agent
will know
when the SAS has been previously verified because of the SAS verified
flag (V) (Section 7.1), and only ask the user to verify it when
needed. After it has been verified once, the key continuity features
make it unnecessary to verify it again.
15.1. Self-Healing Key Continuity Feature
The key continuity features of ZRTP are analogous to those provided
by SSH (Secure Shell) [RFC4251], but they differ in one respect. SSH
caches public signature keys that never change, and uses a permanent
private signature key that must be guarded from disclosure. If
someone steals your SSH private signature key, they can impersonate
you in all future sessions and can mount a successful MiTM attack any
time they want.
ZRTP caches symmetric key material used to compute secret session
keys, and these values change with each session. If someone steals
your ZRTP shared secret cache, they only get one chance to mount a
MiTM attack, in the very next session. If they miss that chance, the
retained shared secret is refreshed with a new value, and the window
of vulnerability heals itself, which means they are locked out of any
future opportunities to mount a MiTM attack. This gives ZRTP a
"self-healing" feature if any cached key material is compromised.
A MiTM attacker must always be in the media path. This presents a
significant operational burden for the attacker in many VoIP usage
scenarios, because being in the media path for every call is often
harder than being in the signaling path. This will likely create
coverage gaps in the attacker's opportunities to mount a MiTM attack.
ZRTP's self-healing key continuity features are better than those of
SSH at
exploiting any temporary gaps in MiTM attack opportunities. Thus,
ZRTP quickly recovers from any disclosure of cached key material.
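The self-healing property can be illustrated with a simplified model. The sketch below is not the key derivation defined by this specification: the derivation functions, the "retained secret" label as used here, and the use of random bytes in place of a real Diffie-Hellman result are all assumptions chosen to keep the example short. It shows only that a stolen cache entry becomes stale once a single session completes without the attacker in the media path.

```python
import hashlib
import hmac
import os


def derive_session_key(dh_result: bytes, retained_secret: bytes) -> bytes:
    """Mix the fresh DH result with the cached secret (simplified stand-in)."""
    return hashlib.sha256(dh_result + retained_secret).digest()


def next_retained_secret(session_key: bytes) -> bytes:
    """Derive the replacement cache entry from this session's key."""
    return hmac.new(session_key, b"retained secret", hashlib.sha256).digest()


def run_session(retained_secret: bytes) -> bytes:
    """Run one call with no attacker in the media path; return the new cache entry."""
    dh_result = os.urandom(32)  # stand-in for a real DH shared secret
    return next_retained_secret(derive_session_key(dh_result, retained_secret))


cache = os.urandom(32)        # secret currently shared by both endpoints
stolen_copy = cache           # attacker copies the cache at this moment
cache = run_session(cache)    # attacker misses the very next call

# The attacker never saw dh_result, so the stolen value no longer matches.
attacker_still_in_sync = hmac.compare_digest(stolen_copy, cache)
```

Because each new cache entry depends on a DH result the attacker did not observe, the stolen value cannot be rolled forward to the current one.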
In systems that use a persistent private signature key, such as SSH,
the stored signature key is usually protected from disclosure by
encryption that requires a user-supplied high-entropy passphrase.
This arrangement may be acceptable for a diligent user with a desktop
computer sitting in an office with a full ASCII keyboard. But it
would be prohibitively inconvenient and unsafe to type a high-entropy
passphrase on a mobile phone's numeric keypad while driving a car.
Users will reject any scheme that requires the use of a passphrase on
such a platform, which means mobile phones carry an elevated risk of
compromise of stored key material, and thus would especially benefit
from the self-healing aspects of ZRTP's key continuity features.
The infamous Debian OpenSSL weak key vulnerability [dsa-1571]
(discovered and patched in May 2008) offers a real-world example of
why ZRTP's self-healing scheme is a good way to do key continuity.
The Debian bug resulted in the production of a lot of weak SSH (and
TLS/SSL) keys, which continued to compromise security even after the
bug had been patched. In contrast, ZRTP's key continuity scheme adds
new entropy to the cached key material with every call, so old
deficiencies in entropy are washed away with each new session.
It should be noted that the addition of shared secret entropy from
previous sessions can extend the strength of the new session key to
AES-256 levels, even if the new session uses Diffie-Hellman keys no
larger than DH-3072 or ECDH-256, provided the cached shared secrets
were initially established when the wiretapper was not present. This
is why AES-256 MAY be used with the smaller DH key sizes in
Section 5.1.5, despite the key strength comparisons in Table 2 of
Caching shared symmetric key material is also less CPU intensive
compared with using digital signatures, which may be important for
low-power mobile platforms.
Unlike the long-lived non-updated key material used by SSH, the
dynamically updated shared secrets of ZRTP may lose sync if
traditional backup/restore mechanisms are used. This limitation is a
consequence of the otherwise beneficial aspects of this approach to
key continuity, and it is partially mitigated by ZRTP's built-in
cache backup logic (Section 4.6.1).
16. Acknowledgments
The authors would like to thank Bryce "Zooko" Wilcox-O'Hearn and
Colin Plumb for their contributions to the design of this protocol.
Also, thanks to Hal Finney, Viktor Krikun, Werner Dittmann, Dan Wing,
Sagar Pai, David McGrew, Colin Perkins, Dan Harkins, David Black, Tim
Polk, Richard Harris, Roni Even, Jon Peterson, and Robert Sparks for
their helpful comments and suggestions. Thanks to Lily Chen at NIST
for her assistance in ensuring compliance with NIST SP800-56A and
The use of one-way hash chains to key HMACs in ZRTP is similar to
Adrian Perrig's TESLA protocol [TESLA].
17. References
17.1. Normative References
[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
Hashing for Message Authentication", RFC 2104,
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3526] Kivinen, T. and M. Kojo, "More Modular Exponential (MODP)
Diffie-Hellman groups for Internet Key Exchange (IKE)",
RFC 3526, May 2003.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, March 2004.
[RFC4231] Nystrom, M., "Identifiers and Test Vectors for HMAC-SHA-
224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512",
RFC 4231, December 2005.
[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
Description Protocol", RFC 4566, July 2006.
[RFC4880] Callas, J., Donnerhacke, L., Finney, H., Shaw, D., and R.
Thayer, "OpenPGP Message Format", RFC 4880, November 2007.
[RFC4960] Stewart, R., "Stream Control Transmission Protocol",
RFC 4960, September 2007.
[RFC5114] Lepinski, M. and S. Kent, "Additional Diffie-Hellman
Groups for Use with IETF Standards", RFC 5114,
[RFC5479] Wing, D., Fries, S., Tschofenig, H., and F. Audet,
"Requirements and Analysis of Media Security Management
Protocols", RFC 5479, April 2009.
[RFC5759] Solinas, J. and L. Zieglar, "Suite B Certificate and
Certificate Revocation List (CRL) Profile", RFC 5759,
[RFC6188] McGrew, D., "The Use of AES-192 and AES-256 in Secure
RTP", RFC 6188, March 2011.
[FIPS-140-2-Annex-A]
"Annex A: Approved Security Functions for FIPS PUB 140-2",
NIST FIPS PUB 140-2 Annex A, January 2011.
[FIPS-140-2-Annex-D]
"Annex D: Approved Key Establishment Techniques for FIPS
PUB 140-2", NIST FIPS PUB 140-2 Annex D, January 2011.
[FIPS-180-3]
"Secure Hash Standard (SHS)", NIST FIPS PUB 180-3, October
[FIPS-186-3]
"Digital Signature Standard (DSS)", NIST FIPS PUB 186-3,
June 2009.
[FIPS-197] "Advanced Encryption Standard (AES)", NIST FIPS PUB
197, November 2001.
[FIPS-198-1]
"The Keyed-Hash Message Authentication Code (HMAC)", NIST
FIPS PUB 198-1, July 2008.
[NIST-SP800-38A]
Dworkin, M., "Recommendation for Block Cipher Modes of
Operation", NIST Special Publication 800-38A, 2001
[NIST-SP800-56A]
Barker, E., Johnson, D., and M. Smid, "Recommendation for
Pair-Wise Key Establishment Schemes Using Discrete
Logarithm Cryptography", NIST Special Publication
800-56A Revision 1, March 2007.
[NIST-SP800-90]
Barker, E. and J. Kelsey, "Recommendation for Random
Number Generation Using Deterministic Random Bit
Generators", NIST Special Publication 800-90 (Revised),
[NIST-SP800-108]
Chen, L., "Recommendation for Key Derivation Using
Pseudorandom Functions", NIST Special Publication
800-108, October 2009.
17.2. Informative References
[RFC4474] Peterson, J. and C. Jennings, "Enhancements for
Authenticated Identity Management in the Session
Initiation Protocol (SIP)", RFC 4474, August 2006.
[RFC4475] Sparks, R., Hawrylyshen, A., Johnston, A., Rosenberg, J.,
and H. Schulzrinne, "Session Initiation Protocol (SIP)
Torture Test Messages", RFC 4475, May 2006.
[RFC4567] Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E.
Carrara, "Key Management Extensions for Session
Description Protocol (SDP) and Real Time Streaming
Protocol (RTSP)", RFC 4567, July 2006.
[RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session
Description Protocol (SDP) Security Descriptions for Media
Streams", RFC 4568, July 2006.
[RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol
(SIP) Call Control - Conferencing for User Agents",
BCP 119, RFC 4579, August 2006.
[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
[RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
(ICE): A Protocol for Network Address Translator (NAT)
Traversal for Offer/Answer Protocols", RFC 5245,
[RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer
Security (DTLS) Extension to Establish Keys for the Secure
Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.
[RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand
Key Derivation Function (HKDF)", RFC 5869, May 2010.
[RFC6090] McGrew, D., Igoe, K., and M. Salter, "Fundamental Elliptic
Curve Cryptography Algorithms", RFC 6090, February 2011.
McGrew, D., "AES-GCM and AES-CCM Authenticated Encryption
in Secure RTP (SRTP)", Work in Progress, January 2011.
Jivsov, A., "ECC in OpenPGP", Work in Progress,
Perkins, C. and J. Valin, "Guidelines for the use of
Variable Bit Rate Audio with Secure RTP", Work
in Progress, December 2010.
Wing, D. and H. Kaplan, "SIP Identity using Media Path",
Work in Progress, February 2008.
Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid,
"Recommendation for Key Management - Part 1: General
(Revised)", NIST Special Publication 800-57 - Part
1 Revised March 2007.
Barker, E. and A. Roginsky, "Recommendation for the
Transitioning of Cryptographic Algorithms and Key
Lengths", NIST Special Publication 800-131A January 2011.
[SHA-3] "Cryptographic Hash Algorithm Competition", NIST Computer
Security Resource Center Cryptographic Hash Project.
[Skein1] "The Skein Hash Function Family - Web site",
[XEP-0262] Saint-Andre, P., "Use of ZRTP in Jingle RTP Sessions", XSF
XEP 0262, August 2010.
[Ferguson] Ferguson, N. and B. Schneier, "Practical Cryptography",
Wiley Publishing, 2003.
[Juola1] Juola, P. and P. Zimmermann, "Whole-Word Phonetic
Distances and the PGPfone Alphabet", Proceedings of the
International Conference of Spoken Language Processing
[Juola2] Juola, P., "Isolated Word Confusion Metrics and the
PGPfone Alphabet", Proceedings of New Methods in Language
[PGPfone] Zimmermann, P., "PGPfone", July 1996,
[Zfone] Zimmermann, P., "Zfone Project", 2006,
Santa Cruz, California
Alan Johnston (editor)
St. Louis, MO 63124