RFC 3711

The Secure Real-time Transport Protocol (SRTP)

Pages: 56
Proposed Standard
Updated by: 5506 6904 9335

4.  Pre-Defined Cryptographic Transforms

   While there are numerous encryption and message authentication
   algorithms that can be used in SRTP, below we define default
   algorithms in order to avoid the complexity of specifying the
   encodings for the signaling of algorithm and parameter identifiers.
   The defined algorithms have been chosen as they fulfill the goals
   listed in Section 2.  Recommendations on how to extend SRTP with new
   transforms are given in Section 6.

4.1.  Encryption

   The following parameters are common to both pre-defined, non-NULL,
   encryption transforms specified in this section.

   *  BLOCK_CIPHER-MODE indicates the block cipher used and its mode of
      operation
   *  n_b is the bit-size of the block for the block cipher
   *  k_e is the session encryption key
   *  n_e is the bit-length of k_e
   *  k_s is the session salting key
   *  n_s is the bit-length of k_s
   *  SRTP_PREFIX_LENGTH is the octet length of the keystream prefix, a
      non-negative integer, specified by the message authentication code
      in use.

   The distinct session keys and salts for SRTP/SRTCP are by default
   derived as specified in Section 4.3.

   The encryption transforms defined in SRTP map the SRTP packet index
   and secret key into a pseudo-random keystream segment.  Each
   keystream segment encrypts a single RTP packet.  The process of
   encrypting a packet consists of generating the keystream segment
   corresponding to the packet, and then bitwise exclusive-oring that
   keystream segment onto the payload of the RTP packet to produce the
   Encrypted Portion of the SRTP packet.  In case the payload size is
   not an integer multiple of n_b bits, the excess (least significant)
   bits of the keystream are simply discarded.  Decryption is done the
   same way, but swapping the roles of the plaintext and ciphertext.

   +----+   +------------------+---------------------------------+
   | KG |-->| Keystream Prefix |          Keystream Suffix       |---+
   +----+   +------------------+---------------------------------+   |
                                                                     |
                               +---------------------------------+   v
                               |     Payload of RTP Packet       |->(*)
                               +---------------------------------+   |
                                                                     |
                               +---------------------------------+   |
                               | Encrypted Portion of SRTP Packet|<--+
                               +---------------------------------+

   Figure 3: Default SRTP Encryption Processing.  Here KG denotes the
   keystream generator, and (*) denotes bitwise exclusive-or.

   The definition of how the keystream is generated, given the index,
   depends on the cipher and its mode of operation.  Below, two such
   keystream generators are defined.  The NULL cipher is also defined,
   to be used when encryption of RTP is not required.

   The SRTP definition of the keystream is illustrated in Figure 3.  The
   initial octets of each keystream segment MAY be reserved for use in a
   message authentication code, in which case the keystream used for
   encryption starts immediately after the last reserved octet.  The
   initial reserved octets are called the "keystream prefix" (not to be
   confused with the "encryption prefix" of [RFC3550, Section 6.1]), and
   the remaining octets are called the "keystream suffix".  The
   keystream prefix MUST NOT be used for encryption.
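
   As an informal illustration of Figure 3 (not part of the
   specification), the following Python sketch XORs a payload with the
   keystream suffix.  The name srtp_xor and its parameters are
   hypothetical, and the keystream is a stand-in byte string rather
   than real cipher output.

```python
def srtp_xor(payload: bytes, keystream: bytes, prefix_len: int = 0) -> bytes:
    """XOR the payload with the keystream suffix (the keystream minus its prefix)."""
    suffix = keystream[prefix_len:]          # the prefix is never used for encryption
    if len(suffix) < len(payload):
        raise ValueError("keystream segment too short for payload")
    # zip() stops at the end of the payload, so excess keystream is discarded.
    return bytes(p ^ k for p, k in zip(payload, suffix))


payload = b"hello"
keystream = bytes(range(16))                 # stand-in for one 128-bit keystream block
ciphertext = srtp_xor(payload, keystream)
recovered = srtp_xor(ciphertext, keystream)  # decryption is the same operation
```

   Because the operation is a plain XOR, applying it twice with the
   same keystream segment returns the original payload.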

   The number of octets in the keystream prefix is denoted as
   SRTP_PREFIX_LENGTH.  The presence of a keystream prefix is indicated
   by a non-zero value of SRTP_PREFIX_LENGTH.  This means that, even if
   confidentiality is not to be provided, the keystream generator output
   may still need to be computed for packet authentication, in which
   case the default keystream generator (mode) SHALL be used.

   The default cipher is the Advanced Encryption Standard (AES) [AES],
   and we define two modes of running AES, (1) Segmented Integer Counter
   Mode AES and (2) AES in f8-mode.  In the remainder of this section,
   let E(k,x) be AES applied to key k and input block x.

4.1.1.  AES in Counter Mode

   Conceptually, counter mode [AES-CTR] consists of encrypting
   successive integers.  The actual definition is somewhat more
   complicated, in order to randomize the starting point of the integer
   sequence.  Each packet is encrypted with a distinct keystream
   segment, which SHALL be computed as follows.

   A keystream segment SHALL be the concatenation of the 128-bit output
   blocks of the AES cipher in the encrypt direction, using key k = k_e,
   in which the block indices are in increasing order.  Symbolically,
   each keystream segment looks like

      E(k, IV) || E(k, IV + 1 mod 2^128) || E(k, IV + 2 mod 2^128) ...

   where the 128-bit integer value IV SHALL be defined by the SSRC, the
   SRTP packet index i, and the SRTP session salting key k_s, as below.

      IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16)

   Each of the three terms in the XOR-sum above is padded with as many
   leading zeros as needed to make the operation well-defined,
   considered as a 128-bit value.
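
   The IV formula and the block sequence above can be sketched in
   Python as follows.  This is an illustration only: the names
   aes_cm_iv and aes_cm_keystream are hypothetical, and encrypt_block
   is an assumed caller-supplied function that AES-encrypts one 16-octet
   block under k_e (no AES implementation is included here).  The salt
   value in the usage lines is purely illustrative.

```python
from typing import Callable

MASK_128 = (1 << 128) - 1


def aes_cm_iv(k_s: int, ssrc: int, index: int) -> int:
    """IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), as a 128-bit integer."""
    return ((k_s << 16) ^ (ssrc << 64) ^ (index << 16)) & MASK_128


def aes_cm_keystream(encrypt_block: Callable[[bytes], bytes],
                     iv: int, n_octets: int) -> bytes:
    """Concatenate E(k, IV), E(k, IV + 1), ... and truncate to n_octets."""
    n_blocks = -(-n_octets // 16)            # ceiling division
    if n_blocks > 1 << 16:
        raise ValueError("more than 2^16 blocks requested for one IV")
    out = bytearray()
    for ctr in range(n_blocks):
        block_in = ((iv + ctr) & MASK_128).to_bytes(16, "big")
        out += encrypt_block(block_in)       # hypothetical AES call under k_e
    return bytes(out[:n_octets])


# Illustrative values: 112-bit salt, SSRC = 0, packet index = 0.
iv = aes_cm_iv(0xF0F1F2F3F4F5F6F7F8F9FAFBFCFD, ssrc=0, index=0)
```

   With SSRC and index both zero, the IV is simply the salt shifted
   left by 16 bits, which leaves the low 16 bits free for the block
   counter.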

   The inclusion of the SSRC allows the use of the same key to protect
   distinct SRTP streams within the same RTP session, see the security
   caveats in Section 9.1.

   In the case of SRTCP, the SSRC of the first header of the compound
   packet MUST be used, i SHALL be the 31-bit SRTCP index and k_e, k_s
   SHALL be replaced by the SRTCP encryption session key and salt.

   Note that the initial value, IV, is fixed for each packet and is
   formed by "reserving" 16 zeros in the least significant bits for the
   purpose of the counter.  The number of blocks of keystream generated
   for any fixed value of IV MUST NOT exceed 2^16 to avoid keystream
   re-use, see below.  The AES has a block size of 128 bits, so 2^16
   output blocks are sufficient to generate the 2^23 bits of keystream
   needed to encrypt the largest possible RTP packet (except for IPv6
   "jumbograms" [RFC2675], which are not likely to be used for RTP-based
   multimedia traffic).  This restriction on the maximum bit-size of the
   packet that can be encrypted ensures the security of the encryption
   method by limiting the effectiveness of probabilistic attacks [BDJR].

   For a particular Counter Mode key, each IV value used as an input
   MUST be distinct, in order to avoid the security exposure of a two-
   time pad situation (Section 9.1).  To satisfy this constraint, an
   implementation MUST ensure that the combination of the SRTP packet
   index (ROC || SEQ) and the SSRC used in the construction of the IV
   is distinct for any particular key.  The failure to ensure this
   uniqueness could be catastrophic for Secure RTP.  This is in contrast
   to the situation for RTP itself, which may be able to tolerate such
   failures.  It is RECOMMENDED that, if a dedicated security module is
   present, the RTP sequence numbers and SSRC either be generated or
   checked by that module (i.e., sequence-number and SSRC processing in
   an SRTP system needs to be protected as well as the key).

4.1.2.  AES in f8-mode

   To encrypt UMTS (Universal Mobile Telecommunications System, a 3G
   network standard) data, a solution (see [f8-a] [f8-b]) known as the
   f8-algorithm has been developed.  At a high level, the proposed
   scheme is a variant of Output Feedback Mode (OFB) [HAC], with a more
   elaborate initialization and feedback function.  As in normal OFB,
   the core consists of a block cipher.  We also define here the use of
   AES as a block cipher to be used in what we shall call "f8-mode of
   operation" RTP encryption.  The AES f8-mode SHALL use the same
   default sizes for session key and salt as AES counter mode.

   Figure 4 shows the structure of block cipher, E, running in f8-mode.

                    IV
                    |
                    v
                +------+
                |      |
           +--->|  E   |
           |    +------+
           |        |
     m -> (*)       +-----------+-------------+--  ...     ------+
           |    IV' |           |             |                  |
           |        |   j=1 -> (*)    j=2 -> (*)   ...  j=L-1 ->(*)
           |        |           |             |                  |
           |        |      +-> (*)       +-> (*)   ...      +-> (*)
           |        |      |    |        |    |             |    |
           |        v      |    v        |    v             |    v
           |    +------+   | +------+    | +------+         | +------+
    k_e ---+--->|  E   |   | |  E   |    | |  E   |         | |  E   |
                |      |   | |      |    | |      |         | |      |
                +------+   | +------+    | +------+         | +------+
                    |      |    |        |    |             |    |
                    +------+    +--------+    +--  ...  ----+    |
                    |           |             |                  |
                    v           v             v                  v
                   S(0)        S(1)          S(2)  . . .       S(L-1)

   Figure 4: f8-mode of operation (asterisk, (*), denotes bitwise XOR).
   The figure represents the KG in Figure 3, when AES-f8 is used.

4.1.2.1.  f8 Keystream Generation

   The Initialization Vector (IV) SHALL be determined as described in
   Section 4.1.2.2 (and in Section 4.1.2.3 for SRTCP).

   Let IV', S(j), and m denote n_b-bit blocks.  The keystream,
   S(0) ||... || S(L-1), for an N-bit message SHALL be defined by
   setting IV' = E(k_e XOR m, IV), and S(-1) = 00..0.  For
   j = 0,1,..,L-1 where L = N/n_b (rounded up to nearest integer if it
   is not already an integer) compute

            S(j) = E(k_e, IV' XOR j XOR S(j-1))

   Notice that the IV is not used directly.  Instead it is fed through E
   under another key to produce an internal, "masked" value (denoted
   IV') to prevent an attacker from gaining known input/output pairs.

   The role of the internal counter, j, is to prevent short keystream
   cycles.  The value of the key mask m SHALL be

           m = k_s || 0x555..5,

   i.e., the session salting key, appended by the binary pattern 0101..
   to fill out the entire desired key size, n_e.
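
   The keystream recurrence and the key mask above can be sketched in
   Python as follows.  This is a structural illustration only: the
   names are hypothetical, and E is an assumed caller-supplied function
   E(key, block) that performs one AES block encryption.  The block
   counter j is encoded here as a big-endian 128-bit value.

```python
from typing import Callable

BlockCipher = Callable[[bytes, bytes], bytes]   # (key, 16-octet block) -> 16-octet block


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def f8_key_mask(k_s: bytes, n_e_octets: int) -> bytes:
    """m = k_s || 0x55 0x55 ..., filled out to the key size n_e."""
    return k_s + b"\x55" * (n_e_octets - len(k_s))


def f8_keystream(E: BlockCipher, k_e: bytes, k_s: bytes,
                 iv: bytes, n_bits: int) -> bytes:
    n_b = 128                                   # AES block size in bits
    m = f8_key_mask(k_s, len(k_e))
    iv_masked = E(xor_bytes(k_e, m), iv)        # IV' = E(k_e XOR m, IV)
    s_prev = bytes(16)                          # S(-1) = 00..0
    n_blocks = -(-n_bits // n_b)                # L = N / n_b, rounded up
    blocks = []
    for j in range(n_blocks):
        x = xor_bytes(xor_bytes(iv_masked, j.to_bytes(16, "big")), s_prev)
        s_prev = E(k_e, x)                      # S(j) = E(k_e, IV' XOR j XOR S(j-1))
        blocks.append(s_prev)
    return b"".join(blocks)
```

   Passing the cipher in as a parameter keeps the sketch independent of
   any particular AES library; a real implementation would supply an
   AES single-block encryption routine for E.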

   The sender SHOULD NOT generate more than 2^32 blocks, which is
   sufficient to generate 2^39 bits of keystream.  Unlike counter mode,
   there is no absolute threshold above (below) which f8 is guaranteed
   to be insecure (secure).  The above bound has been chosen to limit,
   with sufficient security margin, the probability of degenerative
   behavior in the f8 keystream generation.

4.1.2.2.  f8 SRTP IV Formation

   The purpose of the following IV formation is to provide a feature
   which we call implicit header authentication (IHA), see Section 9.5.

   The SRTP IV for 128-bit block AES-f8 SHALL be formed in the following
   way:

        IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC

   M, PT, SEQ, TS, SSRC SHALL be taken from the RTP header; ROC is from
   the cryptographic context.
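
   As an illustration, the IV layout above can be assembled directly
   from the 12-octet fixed RTP header, since octets 1 to 11 of that
   header carry M, PT, SEQ, TS, and SSRC in the required order.  The
   function name f8_srtp_iv and the sample header and ROC values are
   assumptions made for this sketch.

```python
import struct


def f8_srtp_iv(rtp_header: bytes, roc: int) -> bytes:
    """IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC (128 bits)."""
    if len(rtp_header) < 12:
        raise ValueError("need the 12-octet fixed RTP header")
    # Octet 0 (V, P, X, CC) is replaced by 0x00; octets 1..11 are copied as-is.
    return b"\x00" + rtp_header[1:12] + struct.pack("!I", roc)


header = bytes.fromhex("806e5cba50681de55c621599")   # illustrative RTP header
iv = f8_srtp_iv(header, 0xD462564A)
print(iv.hex())   # 006e5cba50681de55c621599d462564a
```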

   The presence of the SSRC as part of the IV allows AES-f8 to be used
   when a master key is shared between multiple streams within the same
   RTP session, see Section 9.1.

4.1.2.3.  f8 SRTCP IV Formation

   The SRTCP IV for 128-bit block AES-f8 SHALL be formed in the
   following way:

   IV = 0..0 || E || SRTCP index || V || P || RC || PT || length || SSRC

   where V, P, RC, PT, length, SSRC SHALL be taken from the first header
   in the RTCP compound packet.  E and SRTCP index are the 1-bit and
   31-bit fields added to the packet.
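
   A sketch of this layout in Python follows.  It assumes that the
   leading "0..0" is 32 zero bits, which is what remains of the 128-bit
   block once the other fields (1 + 31 + 2 + 1 + 5 + 8 + 16 + 32 bits)
   are accounted for.  The function name and sample values are
   hypothetical.

```python
import struct


def f8_srtcp_iv(e_flag: int, srtcp_index: int, rtcp_header: bytes) -> bytes:
    """IV = 0..0 || E || SRTCP index || V || P || RC || PT || length || SSRC."""
    if len(rtcp_header) < 8:
        raise ValueError("need the first 8 octets of the first RTCP header")
    if not 0 <= srtcp_index < 1 << 31:
        raise ValueError("SRTCP index must fit in 31 bits")
    e_and_index = ((e_flag & 1) << 31) | srtcp_index
    # The first 8 RTCP octets hold V, P, RC, PT, length and SSRC in order.
    return bytes(4) + struct.pack("!I", e_and_index) + rtcp_header[:8]


iv = f8_srtcp_iv(1, 1, bytes.fromhex("80c80006deadbeef"))   # illustrative values
```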

4.1.3.  NULL Cipher

   The NULL cipher is used when no confidentiality for RTP/RTCP is
   requested.  The keystream can be thought of as "000..0", i.e., the
   encryption SHALL simply copy the plaintext input into the ciphertext
   output.

4.2.  Message Authentication and Integrity

   Throughout this section, M will denote data to be integrity
   protected.  In the case of SRTP, M SHALL consist of the Authenticated
   Portion of the packet (as specified in Figure 1) concatenated with
   the ROC, M = Authenticated Portion || ROC; in the case of SRTCP, M
   SHALL consist of the Authenticated Portion (as specified in Figure 2)
   only.

   Common parameters:

   *  AUTH_ALG is the authentication algorithm
   *  k_a is the session message authentication key
   *  n_a is the bit-length of the authentication key
   *  n_tag is the bit-length of the output authentication tag
   *  SRTP_PREFIX_LENGTH is the octet length of the keystream prefix as
      defined above, a parameter of AUTH_ALG

   The distinct session authentication keys for SRTP/SRTCP are by
   default derived as specified in Section 4.3.

   The values of n_a, n_tag, and SRTP_PREFIX_LENGTH MUST be fixed for
   any particular fixed value of the key.

   We describe the process of computing authentication tags as follows.
   The sender computes the tag of M and appends it to the packet.  The
   SRTP receiver verifies a message/authentication tag pair by computing
   a new authentication tag over M using the selected algorithm and key,
   and then compares it to the tag associated with the received message.
   If the two tags are equal, then the message/tag pair is valid;
   otherwise, it is invalid and the error audit message "AUTHENTICATION
   FAILURE" MUST be returned.

4.2.1.  HMAC-SHA1

   The pre-defined authentication transform for SRTP is HMAC-SHA1
   [RFC2104].  With HMAC-SHA1, the SRTP_PREFIX_LENGTH (Figure 3) SHALL
   be 0.  For SRTP (respectively SRTCP), the HMAC SHALL be applied to
   the session authentication key and M as specified above, i.e.,
   HMAC(k_a, M).  The HMAC output SHALL then be truncated to the n_tag
   left-most bits.
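
   The tag computation and verification described in Section 4.2 can
   be sketched with the Python standard library as follows.  The
   function names are hypothetical, the sketch assumes n_tag is a
   multiple of 8, and the use of a constant-time comparison is an
   implementation choice of this example rather than a requirement
   stated here.  For SRTCP, M would omit the ROC.

```python
import hashlib
import hmac


def srtp_auth_tag(k_a: bytes, authenticated_portion: bytes, roc: int,
                  n_tag_bits: int = 80) -> bytes:
    """HMAC-SHA1(k_a, Authenticated Portion || ROC), left-most n_tag bits."""
    m = authenticated_portion + roc.to_bytes(4, "big")
    return hmac.new(k_a, m, hashlib.sha1).digest()[: n_tag_bits // 8]


def srtp_verify(k_a: bytes, authenticated_portion: bytes, roc: int,
                tag: bytes) -> bool:
    """Recompute the tag over M and compare it with the received tag."""
    expected = srtp_auth_tag(k_a, authenticated_portion, roc, len(tag) * 8)
    return hmac.compare_digest(expected, tag)


k_a = bytes(20)                               # stand-in 160-bit key
tag = srtp_auth_tag(k_a, b"header+payload", roc=0)
ok = srtp_verify(k_a, b"header+payload", 0, tag)
```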

4.3.  Key Derivation

4.3.1.  Key Derivation Algorithm

   Regardless of the encryption or message authentication transform that
   is employed (it may be an SRTP pre-defined transform or newly
   introduced according to Section 6), interoperable SRTP
   implementations MUST use the SRTP key derivation to generate session
   keys.  Once the key derivation rate is properly signaled at the start
   of the session, there is no need for extra communication between the
   parties that use SRTP key derivation.

                         packet index ---+
                                         |
                                         v
               +-----------+ master  +--------+ session encr_key
               | ext       | key     |        |---------->
               | key mgmt  |-------->|  key   | session auth_key
               | (optional |         | deriv  |---------->
               | rekey)    |-------->|        | session salt_key
               |           | master  |        |---------->
               +-----------+ salt    +--------+

   Figure 5: SRTP key derivation.

   At least one initial key derivation SHALL be performed by SRTP, i.e.,
   the first key derivation is REQUIRED.  Further applications of the
   key derivation MAY be performed, according to the
   "key_derivation_rate" value in the cryptographic context.  The key
   derivation function SHALL initially be invoked before the first
   packet and then, when key_derivation_rate > 0, a key derivation is
   performed whenever index mod key_derivation_rate equals zero.  This
   can be thought of as "refreshing" the session keys.  The value of
   "key_derivation_rate" MUST be kept fixed for the lifetime of the
   associated master key.

   Interoperable SRTP implementations MAY also derive session salting
   keys for encryption transforms, as is done in both of the pre-
   defined transforms.

   Let m and n be positive integers.  A pseudo-random function family is
   a set of keyed functions {PRF_n(k,x)} such that for the (secret)
   random key k, given m-bit x, PRF_n(k,x) is an n-bit string,
   computationally indistinguishable from random n-bit strings, see
   [HAC].  For the purpose of key derivation in SRTP, a secure PRF with
   m = 128 (or more) MUST be used, and a default PRF transform is
   defined in Section 4.3.3.

   Let "a DIV t" denote integer division of a by t, rounded down, and
   with the convention that "a DIV 0 = 0" for all a.  We also make the
   convention of treating "a DIV t" as a bit string of the same length
   as a, and thus "a DIV t" will in general have leading zeros.

   Key derivation SHALL be defined as follows in terms of <label>, an
   8-bit constant (see below), master_salt and key_derivation_rate, as
   determined in the cryptographic context, and index, the packet index
   (i.e., the 48-bit ROC || SEQ for SRTP):

   *  Let r = index DIV key_derivation_rate (with DIV as defined above).

   *  Let key_id = <label> || r.

   *  Let x = key_id XOR master_salt, where key_id and master_salt are
      aligned so that their least significant bits agree (right-
      alignment).

   <label> MUST be unique for each type of key to be derived.  We
   currently define <label> 0x00 to 0x05 (see below), and future
   extensions MAY specify new values in the range 0x06 to 0xff for other
   purposes.  The n-bit SRTP key (or salt) for this packet SHALL then be
   derived from the master key, k_master as follows:

      PRF_n(k_master, x).

   (The PRF may internally specify additional formatting and padding of
   x, see e.g., Section 4.3.3 for the default PRF.)

   The session keys and salt SHALL now be derived using:

   - k_e (SRTP encryption): <label> = 0x00, n = n_e.

   - k_a (SRTP message authentication): <label> = 0x01, n = n_a.

   - k_s (SRTP salting key): <label> = 0x02, n = n_s.

   where n_e, n_s, and n_a are from the cryptographic context.
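
   The construction of the PRF input x for the three SRTP labels can
   be sketched as follows, treating all quantities as integers so that
   right-alignment is implicit.  The function name and the sample
   master salt are assumptions of this example; r is taken to be a
   48-bit string, as for the SRTP index.

```python
def key_derivation_input(label: int, index: int,
                         key_derivation_rate: int, master_salt: int) -> int:
    """x = (<label> || r) XOR master_salt, with least significant bits aligned."""
    # "a DIV t" rounds down, with the convention a DIV 0 = 0.
    r = index // key_derivation_rate if key_derivation_rate else 0
    key_id = (label << 48) | r                # 8-bit label followed by 48-bit r
    return key_id ^ master_salt


LABELS = {"k_e": 0x00, "k_a": 0x01, "k_s": 0x02}
master_salt = 0x0EC675AD498AFEEBB6960B3AABE6   # illustrative 112-bit salt
inputs = {name: key_derivation_input(label, 0, 0, master_salt)
          for name, label in LABELS.items()}
```

   With index 0 and a key derivation rate of 0, the three inputs differ
   only in the octet where the label is XORed into the salt.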

   The master key and master salt MUST be random, but the master salt
   MAY be public.

   Note that for a key_derivation_rate of 0, the application of the key
   derivation SHALL take place exactly once.

   The definition of DIV above is purely for notational convenience.
   For a non-zero t among the set of allowed key derivation rates, "a
   DIV t" can be implemented as a right-shift by the base-2 logarithm of
   t.  The derivation operation is further facilitated if the rates are
   chosen to be powers of 256, but that granularity was considered too
   coarse to be a requirement of this specification.
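
   The equivalence between DIV and a right shift can be checked with a
   short Python sketch.  The function names are hypothetical, and the
   shift variant assumes that t is either zero or a power of two.

```python
def srtp_div(a: int, t: int) -> int:
    """'a DIV t': integer division rounded down, with a DIV 0 = 0."""
    return a // t if t else 0


def srtp_div_shift(a: int, t: int) -> int:
    """Same result for t = 0 or t a power of two, using a right shift."""
    if t == 0:
        return 0
    if t & (t - 1):
        raise ValueError("t must be a power of two")
    return a >> (t.bit_length() - 1)          # shift by log2(t)


index = 0x0000_1234_5678
same = all(srtp_div(index, t) == srtp_div_shift(index, t)
           for t in (0, 1, 2, 256, 1 << 24))
```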

   The upper limit on the number of packets that can be secured using
   the same master key (see Section 9.2) is independent of the key
   derivation.

4.3.2.  SRTCP Key Derivation

   SRTCP SHALL by default use the same master key (and master salt) as
   SRTP.  To do this securely, the following changes SHALL be made to
   the definitions in Section 4.3.1 when applying session key derivation
   for SRTCP.

   Replace the SRTP index by the 32-bit quantity: 0 || SRTCP index
   (i.e., excluding the E-bit, replacing it with a fixed 0-bit), and use
   <label> = 0x03 for the SRTCP encryption key, <label> = 0x04 for the
   SRTCP authentication key, and, <label> = 0x05 for the SRTCP salting
   key.
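
   As a small illustration, the SRTCP labels and the replacement index
   can be written as follows.  The names are hypothetical; the function
   takes the 32-bit word "E || SRTCP index" as it appears in the packet
   and clears the E-bit.

```python
SRTCP_LABELS = {"encryption": 0x03, "authentication": 0x04, "salting": 0x05}


def srtcp_derivation_index(e_and_index: int) -> int:
    """Return the 32-bit quantity 0 || SRTCP index (E-bit replaced by 0)."""
    return e_and_index & 0x7FFFFFFF


idx = srtcp_derivation_index(0x80000005)      # E = 1, SRTCP index = 5
```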

4.3.3.  AES-CM PRF

   The currently defined PRF, keyed by 128, 192, or 256 bit master key,
   has input block size m = 128 and can produce n-bit outputs for n up
   to 2^23.  PRF_n(k_master,x) SHALL be AES in Counter Mode as described
   in Section 4.1.1, applied to key k_master, and IV equal to (x*2^16),
   and with the output keystream truncated to the n first (left-most)
   bits.  (Requiring n/128, rounded up, applications of AES.)
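
   The PRF can be sketched in Python as follows.  As before, this is
   an illustration under stated assumptions: aes_cm_prf is a
   hypothetical name, encrypt_block is an assumed caller-supplied
   function that AES-encrypts one 16-octet block under k_master, n is
   assumed to be a multiple of 8, and x is assumed to fit in 112 bits
   so that x*2^16 fits in one block.

```python
from typing import Callable


def aes_cm_prf(encrypt_block: Callable[[bytes], bytes],
               x: int, n_bits: int) -> bytes:
    """Counter-mode keystream with IV = x * 2^16, truncated to n_bits."""
    iv = x << 16
    n_blocks = -(-n_bits // 128)              # n/128 rounded up
    out = b"".join(encrypt_block((iv + ctr).to_bytes(16, "big"))
                   for ctr in range(n_blocks))
    return out[: n_bits // 8]                 # keep the left-most n bits


# Deriving a 160-bit key needs two block encryptions.
calls = []
def counting_stub(block: bytes) -> bytes:     # stand-in, NOT a real cipher
    calls.append(block)
    return block

key = aes_cm_prf(counting_stub, x=1, n_bits=160)
```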

5.  Default and mandatory-to-implement Transforms

   The default transforms also are mandatory-to-implement transforms in
   SRTP.  Of course, "mandatory-to-implement" does not imply
   "mandatory-to-use".  Table 1 summarizes the pre-defined transforms.
   The default values below are valid for the pre-defined transforms.

                         mandatory-to-impl.   optional     default

   encryption            AES-CM, NULL         AES-f8       AES-CM
   message integrity     HMAC-SHA1              -          HMAC-SHA1
   key derivation (PRF)  AES-CM                 -          AES-CM

   Table 1: Mandatory-to-implement, optional and default transforms in
   SRTP and SRTCP.

5.1.  Encryption: AES-CM and NULL

   AES running in Segmented Integer Counter Mode, as defined in Section
   4.1.1, SHALL be the default encryption algorithm.  The default key
   lengths SHALL be 128-bit for the session encryption key (n_e).  The
   default session salt key-length (n_s) SHALL be 112 bits.

   The NULL cipher SHALL also be mandatory-to-implement.

5.2.  Message Authentication/Integrity: HMAC-SHA1

   HMAC-SHA1, as defined in Section 4.2.1, SHALL be the default message
   authentication code.  The default session authentication key-length
   (n_a) SHALL be 160 bits, the default authentication tag length
   (n_tag) SHALL be 80 bits, and the SRTP_PREFIX_LENGTH SHALL be zero
   for HMAC-SHA1.  In addition, for SRTCP, the pre-defined HMAC-SHA1
   MUST NOT be applied with values of n_tag or n_a that are smaller
   than these defaults.  For SRTP, smaller values are NOT RECOMMENDED,
   but MAY be used after careful consideration of the issues in
   Sections 7.5 and 9.5.

5.3.  Key Derivation: AES-CM PRF

   The AES Counter Mode based key derivation and PRF defined in Sections
   4.3.1 to 4.3.3, using a 128-bit master key, SHALL be the default
   method for generating session keys.  The default master salt length
   SHALL be 112 bits and the default key-derivation rate SHALL be zero.

6.  Adding SRTP Transforms

   Section 4 provides examples of the level of detail needed for
   defining transforms.  Whenever a new transform is to be added to
   SRTP, a companion standards track RFC MUST be written to exactly
   define how the new transform can be used with SRTP (and SRTCP).  Such
   a companion RFC SHOULD avoid overlap with the SRTP protocol document.
   Note however, that it MAY be necessary to extend the SRTP or SRTCP
   cryptographic context definition with new parameters (including fixed
   or default values), add steps to the packet processing, or even add
   fields to the SRTP/SRTCP packets.  The companion RFC SHALL explain
   any known issues regarding interactions between the transform and
   other aspects of SRTP.

   Each new transform document SHOULD specify its key attributes, e.g.,
   size of keys (minimum, maximum, recommended), format of keys,
   recommended/required processing of input keying material,
   requirements/recommendations on key lifetime, re-keying and key
   derivation, whether sharing of keys between SRTP and SRTCP is allowed
   or not, etc.

   An added message integrity transform SHOULD define a minimum
   acceptable key/tag size for SRTCP, equivalent in strength to the
   minimum values as defined in Section 5.2.

7.  Rationale

   This section explains the rationale behind several important features
   of SRTP.

7.1.  Key derivation

   Key derivation reduces the burden on the key establishment.  As many
   as six different keys are needed per crypto context (SRTP and SRTCP
   encryption keys and salts, SRTP and SRTCP authentication keys), but
   these are derived from a single master key in a cryptographically
   secure way.  Thus, the key management protocol needs to exchange only
   one master key (plus master salt when required), and then SRTP itself
   derives all the necessary session keys (via the first, mandatory
   application of the key derivation function).

   Multiple applications of the key derivation function are optional,
   but will give security benefits when enabled.  They prevent an
   attacker from obtaining large amounts of ciphertext produced by a
   single fixed session key.  If the attacker was able to collect a
   large amount of ciphertext for a certain session key, he might be
   helped in mounting certain attacks.

   Multiple applications of the key derivation function provide
   backwards and forward security in the sense that a compromised
   session key does not compromise other session keys derived from the
   same master key.  This means that an attacker who is able to recover
   a certain session key is still unable to access messages secured
   under previous and later session keys (derived from the same master
   key).  (Note that, of course, a leaked master key reveals all the
   session keys derived from it.)

   Considerations arise with high-rate key refresh, especially in large
   multicast settings, see Section 11.

7.2.  Salting key

   The master salt guarantees security against off-line key-collision
   attacks on the key derivation that might otherwise reduce the
   effective key size [MF00].

   The derived session salting key used in the encryption, has been
   introduced to protect against some attacks on additive stream
   ciphers, see Section 9.2.  The explicit inclusion method of the salt
   in the IV has been selected for ease of hardware implementation.

7.3.  Message Integrity from Universal Hashing

   The particular definition of the keystream given in Section 4.1 (the
   keystream prefix) is to give provision for particular universal hash
   functions, suitable for message authentication in the Wegman-Carter
   paradigm [WC81].  Such functions are provably secure, simple, quick,
   and especially appropriate for Digital Signal Processors and other
   processors with a fast multiply operation.

   No authentication transforms are currently provided in SRTP other
   than HMAC-SHA1.  Future transforms, like the above mentioned
   universal hash functions, MAY be added following the guidelines in
   Section 6.

7.4.  Data Origin Authentication Considerations

   Note that in pair-wise communications, integrity and data origin
   authentication are provided together.  However, in group scenarios
   where the keys are shared between members, the MAC tag only proves
   that a member of the group sent the packet, but does not protect
   against one member impersonating another.  Data origin authentication
   (DOA) for multicast and group RTP sessions is a hard problem that
   needs a solution; while some promising proposals are being
   investigated [PCST1] [PCST2], more work is needed to rigorously
   specify these technologies.  Thus SRTP data origin authentication in
   groups is for further study.

   DOA can alternatively be provided using signatures.  However, this
   has a high impact in terms of bandwidth and processing time;
   therefore we do not offer this form of authentication in the
   pre-defined packet-integrity transform.

   The presence of mixers and translators does not allow data origin
   authentication in case the RTP payload and/or the RTP header are
   manipulated.  Note that these types of middle entities also disrupt
   end-to-end confidentiality (as the IV formation depends e.g., on the
   RTP header preservation).  A certain trust model may choose to trust
   the mixers/translators to decrypt/re-encrypt the media (this would
   imply breaking the end-to-end security, with related security
   implications).

7.5.  Short and Zero-length Message Authentication

   As shown in Figure 1, the authentication tag is RECOMMENDED in SRTP.
   A full 80-bit authentication-tag SHOULD be used, but a shorter tag or
   even a zero-length tag (i.e., no message authentication) MAY be used
   under certain conditions to support either of the following two
   application environments.

      1. Strong authentication can be impractical in environments where
         bandwidth preservation is imperative.  An important special
         case is wireless communication systems, in which bandwidth is a
         scarce and expensive resource.  Studies have shown that for
         certain applications and link technologies, additional bytes
         may result in a significant decrease in spectrum efficiency
         [SWO].  Considerable effort has been made to design IP header
         compression techniques to improve spectrum efficiency
         [RFC3095].  A typical voice application produces 20 byte
         samples, and the RTP, UDP and IP headers need to be jointly
         compressed to one or two bytes on average in order to obtain
         acceptable wireless bandwidth economy [RFC3095].  In this case,
         strong authentication would impose nearly fifty percent
         overhead.

      2. Authentication is impractical for applications that use data
         links with fixed-width fields that cannot accommodate the
         expansion due to the authentication tag.  This is the case for
         some important existing wireless channels.  For example, zero-
         byte header compression is used to adapt EVRC/SMV voice with
         the legacy IS-95 bearer channel in CDMA2000 VoIP services.  It
         was found that not a single additional octet could be added to
         the data, which motivated the creation of a zero-byte profile
         for ROHC [RFC3242].

   A short tag is secure for a restricted set of applications.  Consider
   a voice telephony application, for example, such as a G.729 audio
   codec with a 20-millisecond packetization interval, protected by a
   32-bit message authentication tag.  The likelihood of any given
   packet being successfully forged is only one in 2^32.  Thus an
   adversary can control no more than 20 milliseconds of audio output
   during a 994-day period, on average.  In contrast, the effect of a
   single forged packet can be much larger if the application is
   stateful.  A codec that uses relative or predictive compression
   across packets will propagate the maliciously generated state,
   affecting a longer duration of output.
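
   The 994-day figure can be reproduced with a short calculation.  The
   sketch below assumes one forgery attempt per 20-millisecond packet
   interval and a success probability of 2^-32 per attempt, so that on
   average 2^32 attempts are needed per successful forgery.  The
   variable names are hypothetical.

```python
tag_bits = 32
packet_interval_s = 0.020                     # 20-millisecond packetization
expected_attempts = 2 ** tag_bits             # mean attempts per successful forgery
days = expected_attempts * packet_interval_s / 86_400
print(f"about {days:.0f} days per forged packet")
```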

   Certainly not all SRTP or telephony applications meet the criteria
   for short or zero-length authentication tags.  Section 9.5.1
   discusses the risks of weak or no message authentication, and Section
   9.5 describes the circumstances when it is acceptable and when it is
   unacceptable.

8.  Key Management Considerations

   There are emerging key management standards [MIKEY] [KEYMGT] [SDMS]
   for establishing an SRTP cryptographic context (e.g., an SRTP master
   key).  Both proprietary and open-standard key management methods are
   likely to be used for telephony applications [MIKEY] [KINK] and
   multicast applications [GDOI].  This section provides guidance for
   key management systems that service SRTP sessions.

   For initialization, an interoperable SRTP implementation SHOULD be
   given the SSRC and MAY be given the initial RTP sequence number for
   the RTP stream by key management (thus, key management has a
   dependency on RTP operational parameters).  Sending the RTP sequence
   number in the key management may be useful e.g., when the initial
   sequence number is close to wrapping (to avoid synchronization
   problems), and to communicate the current sequence number to a
   joining endpoint (to properly initialize its replay list).

   If the pre-defined transforms are used, SRTP allows sharing of the
   same master key between SRTP/SRTCP streams belonging to the same RTP
   session.

   First, sharing between SRTP streams belonging to the same RTP session
   is secure if the design of the synchronization mechanism, i.e., the
   IV, avoids keystream re-use (the two-time pad, Section 9.1).  This is
   taken care of by the fact that RTP provides for unique SSRCs for
   streams belonging to the same RTP session.  See Section 9.1 for
   further discussion.

   Second, sharing between SRTP and the corresponding SRTCP is secure.
   The fact that an SRTP stream and its associated SRTCP stream both
   carry the same SSRC does not constitute a problem for the two-time
   pad due to the key derivation.  Thus, SRTP and SRTCP corresponding to
   one RTP session MAY share master keys (as they do by default).

   Note that message authentication also has a dependency on SSRC
   uniqueness that is unrelated to the problem of keystream reuse: SRTP
   streams authenticated under the same key MUST have a distinct SSRC in
   order to identify the sender of the message.  This requirement is
   needed because the SSRC is the cryptographically authenticated field
   used to distinguish between different SRTP streams.  Were two streams
   to use identical SSRC values, then an adversary could substitute
   messages from one stream into the other without detection.
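
   As an illustration of the SSRC uniqueness requirement above, key
   management could check its stream bookkeeping before handing out a
   shared master key.  The following is a minimal sketch only: the
   function name, the (master_key_id, ssrc) pair representation, and the
   key identifiers are assumptions made for this example and are not
   defined by SRTP.

```python
from collections import defaultdict


def find_ssrc_collisions(streams):
    """Report SSRC values used by more than one stream under one master key.

    `streams` is an iterable of (master_key_id, ssrc) pairs, one pair per
    SRTP stream.  Both names are hypothetical bookkeeping for this sketch.
    Returns a dict mapping master_key_id to a sorted list of colliding SSRCs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for key_id, ssrc in streams:
        counts[key_id][ssrc] += 1

    collisions = {}
    for key_id, per_ssrc in counts.items():
        duplicated = sorted(s for s, n in per_ssrc.items() if n > 1)
        if duplicated:
            collisions[key_id] = duplicated
    return collisions
```

   An empty result means that every stream sharing a master key carries
   a distinct SSRC.  The same SSRC appearing under two different master
   keys is not reported, since the requirement applies per key.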

   SRTP/SRTCP MUST NOT share master keys under any other circumstances
   than the ones given above, i.e., between SRTP and its corresponding
   SRTCP, and between streams belonging to the same RTP session.

8.1.  Re-keying

   The recommended way for a particular key management system to provide
   re-keying within SRTP is by associating a master key in a crypto
   context with an MKI.

   This provides for easy master key retrieval (see Scenarios in Section
   11), but has the disadvantage of adding extra bits to each packet.
   As noted in Section 7.5, some wireless links do not cater for added
   bits; therefore, SRTP also defines a more economical way of
   triggering re-keying, via use of <From, To>, which works in some
   specific, simple scenarios (see Section 8.1.1).

   SRTP senders SHALL count the amount of SRTP and SRTCP traffic being
   used for a master key and invoke key management to re-key if needed
   (Section 9.2).  These interactions are defined by the key management
   interface to SRTP and are not defined by this protocol specification.
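
   The counting requirement above can be sketched as a pair of
   per-master-key counters.  This is an illustration under stated
   assumptions, not a defined interface: the class and method names are
   hypothetical, and the `rekey_fraction` early-warning threshold is an
   assumption of this sketch.  Only the two maximum values are taken
   from this specification (Section 8.2).

```python
SRTP_MAX_PACKETS = 2**48   # SRTP-packets-max-lifetime (Section 8.2)
SRTCP_MAX_PACKETS = 2**31  # SRTCP-packets-max-lifetime (Section 8.2)


class MasterKeyUsage:
    """Counts packets protected under one master key (hypothetical helper)."""

    def __init__(self, srtp_max=SRTP_MAX_PACKETS, srtcp_max=SRTCP_MAX_PACKETS,
                 rekey_fraction=0.9):
        self.srtp_max = srtp_max
        self.srtcp_max = srtcp_max
        # Assumption: ask for a new key once this fraction of a limit is used.
        self.rekey_fraction = rekey_fraction
        self.srtp_sent = 0
        self.srtcp_sent = 0

    def record_srtp(self):
        """Count one SRTP packet; refuse to go past the maximum lifetime."""
        if self.srtp_sent >= self.srtp_max:
            raise RuntimeError("SRTP packet limit reached; re-key first")
        self.srtp_sent += 1

    def record_srtcp(self):
        """Count one SRTCP packet; refuse to go past the maximum lifetime."""
        if self.srtcp_sent >= self.srtcp_max:
            raise RuntimeError("SRTCP packet limit reached; re-key first")
        self.srtcp_sent += 1

    def needs_rekey(self):
        """True once either counter has reached the early-warning threshold."""
        return (self.srtp_sent >= self.rekey_fraction * self.srtp_max
                or self.srtcp_sent >= self.rekey_fraction * self.srtcp_max)
```

   A sender would call record_srtp() or record_srtcp() for each
   protected packet and invoke key management when needs_rekey()
   becomes true.  How that invocation happens is left to the key
   management interface.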

8.1.1.  Use of the <From, To> for re-keying

   In addition to the use of the MKI, SRTP defines another optional
   mechanism for master key retrieval, the <From, To>.  The <From, To>
   specifies the range of SRTP indices (a pair of sequence number and
   ROC) within which a certain master key is valid, and is (when used)
   part of the crypto context.  By looking at the 48-bit SRTP index of
   the current SRTP packet, the corresponding master key can be found by
   determining which From-To interval it belongs to.  For SRTCP, the
   most recently observed/used SRTP index (which can be obtained from
   the cryptographic context) is used for this purpose, even though
   SRTCP has its own (31-bit) index (see caveat below).
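
   The <From, To> lookup described above can be sketched as follows.
   The 48-bit index is formed as 2^16 * ROC + SEQ.  The type and
   function names, the string key identifiers, the use of inclusive
   interval bounds, and the use of None for an open-ended "To" are
   assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def srtp_index(roc: int, seq: int) -> int:
    """Return the 48-bit SRTP index, 2^16 * ROC + SEQ."""
    if not (0 <= roc < 2**32 and 0 <= seq < 2**16):
        raise ValueError("ROC must fit in 32 bits and SEQ in 16 bits")
    return (roc << 16) | seq


@dataclass(frozen=True)
class KeyInterval:
    """One <From, To> entry in a crypto context (hypothetical layout)."""
    from_index: int
    to_index: Optional[int]  # None models an open-ended upper bound
    master_key_id: str


def find_master_key(intervals: Sequence[KeyInterval],
                    index: int) -> Optional[str]:
    """Return the id of the master key whose interval contains `index`.

    Bounds are treated as inclusive here, which is an assumption of this
    sketch.  Returns None when no interval matches.
    """
    for interval in intervals:
        below_upper = interval.to_index is None or index <= interval.to_index
        if interval.from_index <= index and below_upper:
            return interval.master_key_id
    return None
```

   For an SRTCP packet, the caller would pass the most recently
   observed SRTP index from the cryptographic context instead of the
   SRTCP index.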

   This method, compared to the MKI, has the advantage of identifying
   the master key and defining its lifetime without adding extra bits to
   each packet.  This could be useful, as already noted, for some
   wireless links that do not cater for added bits.  However, its use
   SHOULD be limited to specific, very simple scenarios.  We recommend
   limiting its use to cases where the RTP session is a simple
   unidirectional or bi-directional stream.  This is because, in the
   case of multiple streams,
   it is difficult to trigger the re-key based on the <From, To> of a
   single RTP stream.  For example, if several streams share a master
   key, there is no simple one-to-one correspondence between the index
   sequence space of a certain stream, and the index sequence space on
   which the <From, To> values are based.  Consequently, when a master
   key is shared between streams, one of these streams MUST be
   designated by key management as the one whose index space defines the
   re-keying points.  Also, the re-key triggering on SRTCP is based on
   the corresponding SRTP stream, i.e., when the SRTP stream changes the
   master key, so does the corresponding SRTCP.  This obviously becomes
   more and more complex with multiple streams.

   The default values for the <From, To> are "from the first observed
   packet" and "until further notice".  However, the maximum limit of
   SRTP/SRTCP packets that are sent under each given master/session key
   (Section 9.2) MUST NOT be exceeded.

   If the <From, To> is used for key retrieval, then the MKI is not
   inserted in the packet (and its indicator in the crypto context is
   zero).  However, using the MKI does not exclude using the <From, To>
   key lifetime simultaneously.  This can, for instance, be useful to
   signal at the sender side at which point in time an MKI is to be made
   active.

8.2.  Key Management parameters

   The table below lists all SRTP parameters that key management can
   supply.  For reference, it also provides a summary of the default and
   mandatory-to-support values for an SRTP implementation as described
   in Section 5.

   Parameter                     Mandatory-to-support    Default
   ---------                     --------------------    -------

   SRTP and SRTCP encr transf.       AES_CM, NULL         AES_CM
   (Other possible values: AES_f8)

   SRTP and SRTCP auth transf.       HMAC-SHA1           HMAC-SHA1

   SRTP and SRTCP auth params:
     n_tag (tag length)                 80                 80
     SRTP prefix_length                  0                  0

   Key derivation PRF                 AES_CM              AES_CM

   Key material params
   (for each master key):
     master key length                 128                128
     n_e (encr session key length)     128                128
     n_a (auth session key length)     160                160
     master salt key
     length of the master salt         112                112
     n_s (session salt key length)     112                112
     key derivation rate                 0                  0

     key lifetime
        SRTP-packets-max-lifetime      2^48               2^48
        SRTCP-packets-max-lifetime     2^31               2^31
        from-to-lifetime <From, To>
     MKI indicator                       0                 0
     length of the MKI                   0                 0
     value of the MKI

   Crypto context index params:
     SSRC value
     ROC
     SEQ
     SRTCP Index
     Transport address
     Port number

   Relation to other RTP profiles:
     sender's order between FEC
     and SRTP (see Section 10)       FEC-SRTP            FEC-SRTP
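
   For implementers, the default column of the table above could be
   captured in a single immutable record that key management overrides
   field by field.  This is a sketch only: the class and field names are
   hypothetical, and parameters for which the table lists no default
   (e.g., the MKI value, SSRC, ROC, SEQ) are deliberately left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SrtpDefaults:
    """Default values from the Section 8.2 table (field names are made up)."""
    encryption_transform: str = "AES_CM"
    auth_transform: str = "HMAC-SHA1"
    n_tag_bits: int = 80
    srtp_prefix_length: int = 0
    key_derivation_prf: str = "AES_CM"
    master_key_bits: int = 128
    n_e_bits: int = 128           # encryption session key length
    n_a_bits: int = 160           # authentication session key length
    master_salt_bits: int = 112
    n_s_bits: int = 112           # session salt key length
    key_derivation_rate: int = 0
    srtp_packets_max_lifetime: int = 2**48
    srtcp_packets_max_lifetime: int = 2**31
    mki_indicator: int = 0
    mki_length: int = 0
    fec_order: str = "FEC-SRTP"


def with_overrides(defaults: SrtpDefaults, **signaled) -> SrtpDefaults:
    """Return a copy of `defaults` with key-management-supplied values applied.

    Unknown field names raise TypeError, courtesy of dataclasses.replace.
    """
    return replace(defaults, **signaled)
```

   For example, with_overrides(SrtpDefaults(), n_tag_bits=32) yields a
   record that differs from the defaults only in the tag length.
   Whether a given override is permitted is a matter for the rest of
   this specification, not for this record.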