The format of the RTP header is specified in [8]. This payload format uses the fields of the header in a manner consistent with that specification.

The RTP timestamp corresponds to the sampling instant of the first sample encoded for the first frame-block in the packet. The timestamp clock frequency is the same as the sampling frequency, so the timestamp unit is in samples.
The duration of one speech frame-block is 20 ms for both AMR and AMR-WB. For AMR, the sampling frequency is 8 kHz, corresponding to 160 encoded speech samples per frame from each channel. For AMR-WB, the sampling frequency is 16 kHz, corresponding to 320 samples per frame from each channel. Thus, the timestamp is increased by 160 for AMR and 320 for AMR-WB for each consecutive frame-block.

A packet may contain multiple frame-blocks of encoded speech or comfort noise parameters. If interleaving is employed, the frame-blocks encapsulated into a payload are picked according to the interleaving rules as defined in Section 4.4.1. Otherwise, each packet covers a period of one or more contiguous 20 ms frame-block intervals. In case the data from all the channels for a particular frame-block in the period is missing (for example, at a gateway from some other transport format), it is possible to indicate that no data is present for that frame-block rather than breaking a multi-frame-block packet into two, as explained in Section 4.3.2.

To allow for error resiliency through redundant transmission, the periods covered by multiple packets MAY overlap in time. A receiver MUST be prepared to receive any speech frame multiple times, in exact duplicates, in different AMR rate modes, or with data present in one packet and not present in another. If multiple versions of the same speech frame are received, it is RECOMMENDED that the mode with the highest rate be used by the speech decoder. A given frame MUST NOT be encoded as speech in one packet and comfort noise parameters in another.

The payload length is always made an integral number of octets by padding with zero bits if necessary. If additional padding is required to bring the payload length to a larger multiple of octets or for some other purpose, then the P bit in the RTP header may be set and padding appended as specified in [8].
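The timestamp arithmetic described at the start of this section can be illustrated with a short sketch. It assumes a non-interleaved stream without redundant transmission in which every packet carries the same number of contiguous frame-blocks. The function and constant names are illustrative only and are not defined by this specification.

```python
# Samples per 20 ms frame-block at the codec sampling rate (8 kHz / 16 kHz).
SAMPLES_PER_FRAME_BLOCK = {"AMR": 160, "AMR-WB": 320}


def packet_timestamps(first_timestamp, frame_blocks_per_packet, num_packets, codec="AMR"):
    """Return the RTP timestamps of consecutive packets.

    Assumption of this sketch: contiguous, non-interleaved frame-blocks and
    no redundant transmission, so every packet advances by a fixed step.
    """
    step = SAMPLES_PER_FRAME_BLOCK[codec] * frame_blocks_per_packet
    # RTP timestamps are 32-bit unsigned values and wrap around.
    return [(first_timestamp + i * step) % 2**32 for i in range(num_packets)]
```

For example, an AMR-WB sender that places two frame-blocks in each packet would advance the timestamp by 640 from one packet to the next.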
The RTP header marker bit (M) SHALL be set to 1 if the first frame-block carried in the packet contains a speech frame which is the first in a talkspurt. For all other packets the marker bit SHALL be set to zero (M=0).

The assignment of an RTP payload type for this new packet format is outside the scope of this document, and will not be specified here. It is expected that the RTP profile under which this payload format is being used will assign a payload type for this encoding or specify that the payload type is to be bound dynamically.
2], or 0-8 for AMR-WB, as defined in Table 1a in . CMR value 15 indicates that no mode request is present, and other values are for future use. The codec mode request received in the CMR field is valid until the next codec mode request is received, i.e., a newly received CMR value corresponding to a speech mode, or NO_DATA overrides the previously received CMR value corresponding to a speech mode or NO_DATA. Therefore, if a terminal continuously wishes to receive frames in the
same mode X, it needs to set CMR=X for all its outbound payloads, and if a terminal has no preference in which mode to receive, it SHOULD set CMR=15 in all its outbound payloads.

If receiving a payload with a CMR value that is not a speech mode or NO_DATA, the CMR MUST be ignored by the receiver.

In a multi-channel session, the codec mode request SHOULD be interpreted by the receiver of the payload as the desired encoding mode for all the channels in the session.

An IP end-point SHOULD NOT set the codec mode request based on packet losses or other congestion indications, for several reasons:

- The other end of the IP path may be a gateway to a non-IP network (such as a radio link) that needs to set the CMR field to optimize performance on that network.

- Congestion on the IP network is managed by the IP sender, in this case, at the other end of the IP path. Feedback about congestion SHOULD be provided to that IP sender through RTCP or other means, and then the sender can choose to avoid congestion using the most appropriate mechanism. That may include adjusting the codec mode, but also includes adjusting the level of redundancy or number of frames per packet.

The encoder SHOULD follow a received codec mode request, but MAY change to a lower-numbered mode if it so chooses, for example, to control congestion.

The CMR field MUST be set to 15 for packets sent to a multicast group. The encoder in the speech sender SHOULD ignore codec mode requests when sending speech to a multicast session but MAY use RTCP feedback information as a hint that a codec mode change is needed.

The codec mode selection MAY be restricted by a session parameter to a subset of the available modes. If so, the requested mode MUST be among the signalled subset (see Section 8). If the received CMR value is outside the signalled subset of modes, it MUST be ignored.
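The receiver-side CMR rules above can be summarized in a small sketch. It tracks only which request is currently in effect; whether the encoder then follows the request or picks a lower-numbered mode is left out. The function name, the use of None for "no request", and the mode_set argument are assumptions made for illustration.

```python
NO_MODE_REQUEST = 15  # CMR value meaning that no mode request is present


def update_mode_request(cmr, current_request, wideband=False, mode_set=None):
    """Return the mode request in effect after receiving `cmr`.

    `current_request` is the previously valid request (None = no request).
    `mode_set` is an optional collection of modes signalled for the session.
    """
    highest_speech_mode = 8 if wideband else 7
    if cmr == NO_MODE_REQUEST:
        return None  # overrides any earlier request
    if not 0 <= cmr <= highest_speech_mode:
        return current_request  # not a speech mode: ignore the CMR
    if mode_set is not None and cmr not in mode_set:
        return current_request  # outside the signalled subset: ignore
    return cmr
```

A sender to a multicast group would not call this at all for incoming requests, since it is expected to ignore them.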
In bandwidth-efficient mode, a ToC entry takes the following format:

    0 1 2 3 4 5
   +-+-+-+-+-+-+
   |F|  FT   |Q|
   +-+-+-+-+-+-+

F (1 bit): If set to 1, indicates that this frame is followed by another speech frame in this payload; if set to 0, indicates that this frame is the last frame in this payload.

FT (4 bits): Frame type index, indicating either the AMR or AMR-WB speech coding mode or comfort noise (SID) mode of the corresponding frame carried in this payload.

The value of FT is defined in Table 1a in  for AMR and in Table 1a in  for AMR-WB. FT=14 (SPEECH_LOST, only available for AMR-WB) and FT=15 (NO_DATA) are used to indicate frames that are either lost or not being transmitted in this payload, respectively.

NO_DATA (FT=15) frame could mean either that no data for that frame has been produced by the speech encoder or that no data for that frame is transmitted in the current payload (i.e., valid data for that frame could be sent in either an earlier or later packet).

If receiving a ToC entry with a FT value in the range 9-14 for AMR or 10-13 for AMR-WB, the whole packet SHOULD be discarded. This is to avoid the loss of data synchronization in the depacketization process, which can result in a huge degradation in speech quality.

Note that packets containing only NO_DATA frames SHOULD NOT be transmitted in any payload format configuration, except in the case of interleaving. Also, frame-blocks containing only NO_DATA frames at the end of a packet SHOULD NOT be transmitted in any payload format configuration, except in the case of interleaving. The AMR SCR/DTX is described in  and AMR-WB SCR/DTX in .

The extra comfort noise frame types specified in table 1a in  (i.e., GSM-EFR CN, IS-641 CN, and PDC-EFR CN) MUST NOT be used in this payload format because the standardized AMR codec is only required to implement the general AMR SID frame type and not those that are native to the incorporated encodings.

Q (1 bit): Frame quality indicator.
If set to 0, indicates the corresponding frame is severely damaged, and the receiver should set the RX_TYPE (see ) to either SPEECH_BAD or SID_BAD depending on the frame type (FT).
The frame quality indicator is included for interoperability with the ATM payload format described in ITU-T I.366.2, the UMTS Iu interface , as well as other transport formats. The frame quality indicator enables damaged frames to be forwarded to the speech decoder for error concealment. This can improve the speech quality more than dropping the damaged frames. See Section 188.8.131.52 for more details.

For multi-channel sessions, the ToC entries of all frames from a frame-block are placed in the ToC in consecutive order as defined in Section 4.1 in . When multiple frame-blocks are present in a packet in bandwidth-efficient mode, they will be placed in the packet in order of their creation time.

Therefore, with N channels and K speech frame-blocks in a packet, there MUST be N*K entries in the ToC, and the first N entries will be from the first frame-block, the second N entries will be from the second frame-block, and so on.

The following figure shows an example of a ToC of three entries in a single-channel session using bandwidth-efficient mode.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|  FT   |Q|1|  FT   |Q|0|  FT   |Q|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Below is an example of how the ToC entries will appear in the ToC of a packet carrying three consecutive frame-blocks in a session with two channels (L and R).

   +----+----+----+----+----+----+
   | 1L | 1R | 2L | 2R | 3L | 3R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 1   Block 2   Block 3
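As an illustration of the ToC layout described above, the following sketch reads the 4-bit CMR and the 6-bit ToC entries from the start of a bandwidth-efficient payload, taking bit 0 of each octet as the most significant bit, as in the figures. The function name, its return value, and the use of exceptions to signal a packet that should be discarded are choices made for this sketch only.

```python
def parse_header_and_toc(payload, channels=1, wideband=False):
    """Parse the CMR and ToC of a bandwidth-efficient payload.

    Returns (cmr, entries, bit_offset): entries is a list of (ft, q)
    tuples and bit_offset is the position of the first speech bit.
    """
    def read_bits(offset, count):
        if offset + count > len(payload) * 8:
            raise ValueError("payload too short")
        value = 0
        for i in range(offset, offset + count):
            value = (value << 1) | ((payload[i // 8] >> (7 - i % 8)) & 1)
        return value

    # FT values for which the whole packet should be discarded.
    invalid_ft = range(10, 14) if wideband else range(9, 15)
    cmr = read_bits(0, 4)
    offset, entries, more = 4, [], True
    while more:
        more = read_bits(offset, 1) == 1      # F: another entry follows
        ft = read_bits(offset + 1, 4)         # FT: frame type index
        q = read_bits(offset + 5, 1)          # Q: frame quality indicator
        if ft in invalid_ft:
            raise ValueError("invalid FT %d, packet should be discarded" % ft)
        entries.append((ft, q))
        offset += 6
    if len(entries) % channels != 0:
        raise ValueError("ToC does not hold N*K entries")
    return cmr, entries, offset
```

With N channels, consecutive groups of N entries in the returned list belong to the same frame-block.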
Each speech frame represents 20 ms of speech encoded with the mode indicated in the FT field of the corresponding ToC entry. The length of the speech frame is implicitly defined by the mode indicated in the FT field. The order and numbering notation of the bits are as specified for Interface Format 1 (IF1) in  for AMR and  for AMR-WB. As specified there, the bits of speech frames have been rearranged in order of decreasing sensitivity, while the bits of comfort noise frames are in the order produced by the encoder. The resulting bit sequence for a frame of length K bits is denoted d(0), d(1), ..., d(K-1). Section 184.108.40.206 for an example).
In the payload, no specific mode is requested (CMR=15), the speech frame is not damaged at the IP origin (Q=1), and the coding mode is AMR 7.4 kbps (FT=4). The encoded speech bits, d(0) to d(147), are arranged in descending sensitivity order according to . Finally, two padding bits (P) are added to the end to make the payload octet aligned.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=15|0| FT=4  |1|d(0)                                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                     d(147)|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4]. (Note, no speech bits are present for the third frame.) Finally, seven zero bits are padded to the end to make the payload octet aligned.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=1 |1| FT=0  |1|1| FT=9  |1|1| FT=15 |1|0| FT=1  |1|d(0)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                         d(131)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |g(0)                                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          g(39)|h(0)                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                           h(176)|P|P|P|P|P|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
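The two single-channel examples above can be reproduced with a minimal packing sketch: the CMR, then all ToC entries, then the speech bits of each frame in ToC order, then zero padding to an octet boundary. The function name and the representation of speech bits as a list of 0/1 integers are assumptions of this sketch, and the caller is expected to supply the correct number of bits for each frame type.

```python
def pack_bandwidth_efficient(cmr, frames):
    """Pack one single-channel bandwidth-efficient payload.

    `frames` is a list of (ft, q, speech_bits) tuples, where speech_bits
    is a sequence of 0/1 integers already ordered d(0)..d(K-1).
    """
    bits = [(cmr >> i) & 1 for i in (3, 2, 1, 0)]
    for index, (ft, q, _) in enumerate(frames):
        f = 0 if index == len(frames) - 1 else 1  # F=0 marks the last entry
        bits.append(f)
        bits.extend((ft >> i) & 1 for i in (3, 2, 1, 0))
        bits.append(q)
    for _, _, speech_bits in frames:
        bits.extend(speech_bits)
    bits.extend([0] * (-len(bits) % 8))  # zero padding to an octet boundary
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
```

Packing a single AMR 7.4 kbps frame (148 bits) with CMR=15 gives 4 + 6 + 148 = 158 bits plus two padding bits, that is, 20 octets. Packing the four-frame AMR-WB example (132, 40, 0, and 177 speech bits) gives 377 bits plus seven padding bits, that is, 48 octets.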
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=15|1|1L FT=4|1|1|1R FT=4|1|1|2L FT=4|1|1|2R FT=4|1|1|3L FT|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |4|1|0|3R FT=4|1|d1L(0)                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                               d1L(147)|d1R(0) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       d1R(147)|d2L(0)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |d2L(147|d2R(0)                                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                       d2R(147)|d3L(0)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               d3L(147)|d3R(0)                                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                       d3R(147)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Section 4.3.1.

R: is a reserved bit that MUST be set to zero. All R bits MUST be ignored by the receiver.

ILL (4 bits, unsigned integer): This is an OPTIONAL field that is present only if interleaving is signalled out-of-band for the session. ILL=L indicates to the receiver that the interleaving length is L+1, in number of frame-blocks.

ILP (4 bits, unsigned integer): This is an OPTIONAL field that is present only if interleaving is signalled. ILP MUST take a value between 0 and ILL, inclusive, indicating the interleaving index for frame-blocks in this payload in the interleaving group. If the value of ILP is found greater than ILL, the payload SHOULD be discarded.

ILL and ILP fields MUST be present in each packet in a session if interleaving is signalled for the session. Interleaving MUST be performed on a frame-block basis (i.e., NOT on a frame basis) in a multi-channel session.

The following example illustrates the arrangement of speech frame-blocks in an interleaving group during an interleaving session. Here we assume ILL=L for the interleaving group that starts at speech frame-block n. We also assume that the first payload packet of the interleaving group is s, and the number of speech frame-blocks carried in each payload is N. Then we will have:
Payload s (the first packet of this interleaving group):
   ILL=L, ILP=0,
   Carry frame-blocks: n, n+(L+1), n+2*(L+1), ..., n+(N-1)*(L+1)

Payload s+1 (the second packet of this interleaving group):
   ILL=L, ILP=1,
   frame-blocks: n+1, n+1+(L+1), n+1+2*(L+1), ..., n+1+(N-1)*(L+1)
   ...

Payload s+L (the last packet of this interleaving group):
   ILL=L, ILP=L,
   frame-blocks: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1)

The next interleaving group will start at frame-block n+N*(L+1).

There will be no interleaving effect unless the number of frame-blocks per packet (N) is at least 2. Moreover, the number of frame-blocks per payload (N) and the value of ILL MUST NOT be changed inside an interleaving group. In other words, all payloads in an interleaving group MUST have the same ILL and MUST contain the same number of speech frame-blocks.

The sender of the payload MUST only apply interleaving if the receiver has signalled its use through out-of-band means. Since interleaving will increase buffering requirements at the receiver, the receiver uses media type parameter "interleaving=I" to set the maximum number of frame-blocks allowed in an interleaving group to I.

When performing interleaving, the sender MUST use a proper number of frame-blocks per payload (N) and ILL so that the resulting size of an interleaving group is less than or equal to I, that is, N*(L+1)<=I.
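The index pattern above can be written out as a short sketch. The function names are illustrative; the arguments n, ill, and blocks_per_payload correspond to n, L, and N in the text, and interleaving_limit corresponds to I.

```python
def interleaving_group(n, ill, blocks_per_payload):
    """Return, for ILP = 0..ILL, the frame-block numbers carried by each payload."""
    length = ill + 1  # interleaving length in frame-blocks
    return [
        [n + ilp + k * length for k in range(blocks_per_payload)]
        for ilp in range(length)
    ]


def fits_receiver_limit(ill, blocks_per_payload, interleaving_limit):
    """Check N*(L+1) <= I for the receiver's "interleaving=I" parameter."""
    return blocks_per_payload * (ill + 1) <= interleaving_limit
```

For ILL=2 and three frame-blocks per payload starting at frame-block 1, the three payloads carry frame-blocks (1, 4, 7), (2, 5, 8), and (3, 6, 9), and the next group starts at frame-block 10.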
The list of ToC entries is organized in the same way as described for bandwidth-efficient mode in Section 4.3.2, with the following exception: when interleaving is used, the frame-blocks in the ToC will almost never be placed consecutively in time. Instead, the presence and order of the frame-blocks in a packet will follow the pattern described in Section 4.4.1.

The following example shows the ToC of three consecutive packets, each carrying three frame-blocks, in an interleaved two-channel session. Here, the two channels are left (L) and right (R) with L coming before R, and the interleaving length is 3 (i.e., ILL=2). This results in the interleaving group size of 9 frame-blocks.

   Packet #1
   ---------

   ILL=2, ILP=0:
   +----+----+----+----+----+----+
   | 1L | 1R | 4L | 4R | 7L | 7R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 1   Block 4   Block 7

   Packet #2
   ---------

   ILL=2, ILP=1:
   +----+----+----+----+----+----+
   | 2L | 2R | 5L | 5R | 8L | 8R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 2   Block 5   Block 8

   Packet #3
   ---------

   ILL=2, ILP=2:
   +----+----+----+----+----+----+
   | 3L | 3R | 6L | 6R | 9L | 9R |
   +----+----+----+----+----+----+
   |<------->|<------->|<------->|
     Frame-    Frame-    Frame-
     Block 3   Block 6   Block 9
A ToC entry takes the following format in octet-aligned mode:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |F|  FT   |Q|P|P|
   +-+-+-+-+-+-+-+-+

F (1 bit): see definition in Section 4.3.2.

FT (4 bits, unsigned integer): see definition in Section 4.3.2.

Q (1 bit): see definition in Section 4.3.2.

P bits: padding bits, MUST be set to zero, and MUST be ignored on reception.

The list of CRCs is OPTIONAL. It only exists if the use of CRC is signalled out-of-band for the session. When present, each CRC in the list is 8 bits long and corresponds to a speech frame (NOT a frame-block) carried in the payload. Calculation and use of the CRC is specified in the next section.

Section 3.6.

This section provides more details on how to use the frame CRC in the octet-aligned payload header together with a partial transport layer checksum to achieve UED.

To achieve UED, one SHOULD use a transport layer checksum (for example, the one defined in UDP-Lite ) to protect the IP, transport protocol (e.g., UDP-Lite), and RTP headers, as well as the payload header and the table of contents in the payload. The frame CRC, when used, MUST be calculated only over all class A bits in the AMR or AMR-WB frame. Class B and C bits in the AMR or AMR-WB frame MUST NOT be included in the CRC calculation and SHOULD NOT be covered by the transport checksum.

Note, the number of class A bits for various coding modes in AMR codec is specified as informative in  and is therefore copied into Table 1 in Section 3.6 to make it normative for this payload format. The number of class A bits for various coding modes in AMR-WB codec is specified as normative in Table 2 in , and the SID frame (FT=9) has 40 class A bits. These definitions of class A bits MUST be used for this payload format.
If the transport layer checksum or link layer checksum detects any errors within the protected (sensitive) part, it is assumed that the complete packet will be discarded as defined by UDP-Lite .

The receiver of the payload SHOULD examine the data integrity of the received class A bits by re-calculating the CRC over the received class A bits and comparing the result to the value found in the received payload header. If the two values mismatch, the receiver SHALL consider the class A bits in the receiver frame damaged and MUST clear the Q flag of the frame (i.e., set it to 0). This will subsequently cause the frame to be marked as SPEECH_BAD, if the FT of the frame is 0..7 for AMR or 0..8 for AMR-WB, or SID_BAD if the FT of the frame is 8 for AMR or 9 for AMR-WB, before it is passed to the speech decoder. See  and  for more details.

The following example shows an octet-aligned ToC with a CRC list for a payload containing 3 speech frames from a single-channel session (assuming none of the FTs is equal to 14 or 15):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1| FT#1  |Q|P|P|1| FT#2  |Q|P|P|0| FT#3  |Q|P|P|     CRC#1     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     CRC#2     |     CRC#3     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Each of the CRCs takes 8 bits

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   | c0| c1| c2| c3| c4| c5| c6| c7|
   +---+---+---+---+---+---+---+---+
    (MSB)                     (LSB)

and is calculated by the cyclic generator polynomial,

   C(x) = 1 + x^2 + x^3 + x^4 + x^8

where ^ is the exponentiation operator.

In binary form, the polynomial appears as follows: 101110001 (MSB..LSB).

The actual calculation of the CRC is made as follows: First, an 8-bit CRC register is reset to zero: 00000000. For each bit over which the CRC shall be calculated, an XOR operation is made between the rightmost (LSB) bit of the CRC register and the bit. The CRC
register is then right-shifted one step (each bit's significance is reduced by one), inputting a "0" as the leftmost bit (MSB). If the result of the XOR operation mentioned above is a "1", then "10111000" is bit-wise XOR-ed into the CRC register. This operation is repeated for each bit that the CRC should cover. In this case, the first bit would be d(0) of the speech frame that the CRC should cover. When the last bit (e.g., d(54) for AMR 5.9 according to Table 1 in Section 3.6) has been used in this CRC calculation, the contents of the CRC register should simply be copied to the corresponding field in the list of CRCs.

Fast calculation of the CRC on a general-purpose CPU is possible using a table-driven algorithm.

Section 4.3.3, with the following exceptions:

- The last octet of each speech frame MUST be padded with zero bits at the end if all bits in the octet are not used. The padding bits MUST be ignored on reception. In other words, each speech frame MUST be octet-aligned.

- When multiple speech frames are present in the speech data (i.e., compound payload), the speech frames are arranged either one whole frame after another as usual, or with the octets of all frames interleaved together at the octet level, depending on the media type parameters negotiated for the payload type. Since the bits within each frame are ordered with the most error-sensitive bits first, interleaving the octets collects those sensitive bits from all frames to be nearer the beginning of the packet. This is called "robust sorting order" which allows the application of UED (such as UDP-Lite ) or UEP (such as the ULP ) mechanisms to the payload data. The details of assembling the payload are given in the next section.

The use of robust sorting order for a payload type MUST be agreed via out-of-band means. Section 8 specifies a media type parameter for this purpose.
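The bit-serial CRC procedure described earlier (register reset to zero, XOR of the register LSB with the input bit, right shift, conditional XOR with "10111000") can be sketched as follows. This is the plain bit-by-bit form, not the table-driven variant; the function name and the representation of the class A bits as a sequence of 0/1 integers are assumptions of this sketch.

```python
def frame_crc(class_a_bits):
    """Compute the 8-bit frame CRC over a sequence of 0/1 class A bits."""
    crc = 0  # 8-bit register, reset to zero
    for bit in class_a_bits:
        feedback = (crc & 1) ^ (bit & 1)  # XOR of register LSB and input bit
        crc >>= 1                         # right shift, "0" enters at the MSB
        if feedback:
            crc ^= 0b10111000             # XOR "10111000" into the register
    return crc
```

For an AMR 5.9 frame, the input would be the bits d(0) through d(54), and the returned value is the octet placed in the corresponding field of the CRC list. A receiver would recompute the value over the received class A bits and clear the Q flag on a mismatch.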
Note, robust sorting order MUST only be performed on the frame level and thus is independent of interleaving, which is at the frame-block level, as described in Section 4.4.1. In other words, robust sorting can be applied to either non-interleaved or interleaved payload types.
The first two frames in the payload are the L and R channel speech frames of frame-block #1, consisting of bits f1L(0..158) and f1R(0..158), respectively. The next two frames are the L and R channel frames of frame-block #3, consisting of bits f3L(0..158) and f3R(0..158), respectively, due to interleaving. For each of the four speech frames, a CRC is calculated as CRC1L(0..7), CRC1R(0..7), CRC3L(0..7), and CRC3R(0..7), respectively. Finally, the payload is robust sorted.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | CMR=6 |R|R|R|R| ILL=1 | ILP=0 |1|FT#1L=5|Q|P|P|1|FT#1R=5|Q|P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|FT#3L=5|Q|P|P|0|FT#3R=5|Q|P|P|     CRC1L     |     CRC1R     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     CRC3L     |     CRC3R     |   f1L(0..7)   |   f1R(0..7)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   f3L(0..7)   |   f3R(0..7)   |  f1L(8..15)   |  f1R(8..15)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  f3L(8..15)   |  f3R(8..15)   |  f1L(16..23)  |  f1R(16..23)  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                              ...                              :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | f3L(144..151) | f3R(144..151) |f1L(152..158)|P|f1R(152..158)|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |f3L(152..158)|P|f3R(152..158)|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Note, in the above example, the last octet in all four speech frames is padded with one zero bit to make it octet-aligned.
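The octet-level interleaving visible in the example above can be sketched as follows. The function name is hypothetical, and the handling of frames of unequal length (a frame that has run out of octets is simply skipped) is an assumption of this sketch rather than something stated in the text above.

```python
def robust_sort(frames):
    """Interleave octet-aligned speech frames at the octet level.

    `frames` is a list of bytes objects in ToC order. Assumption: frames
    may differ in length, and a frame with no octets left is skipped.
    """
    out = bytearray()
    longest = max((len(frame) for frame in frames), default=0)
    for position in range(longest):
        for frame in frames:
            if position < len(frame):
                out.append(frame[position])
    return bytes(out)
```

Applied to the four 20-octet frames of the example (159 speech bits plus one padding bit each), the output starts with the first octet of f1L, f1R, f3L, and f3R, followed by the second octet of each, and so on for 80 octets.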
channel. The other operating modes: interleaving, robust sorting, and frame-wise CRC (in both single and multi-channel) are OPTIONAL to implement.

The mode-change-period, mode-change-capability, and mode-change-neighbor parameters are intended for signaling with GSM endpoints. When interoperability with GSM is desired, encoders SHOULD only perform codec mode changes to neighboring modes and in integer multiples of 40 ms (two frame-blocks), but decoders SHOULD accept codec mode changes at any time, i.e., for every frame-block. The encoder may arbitrarily select the initial phase (odd or even frame-block) where codec mode changes are performed, but then SHOULD stick to that phase as far as possible. However, in rare cases, handovers or other events (e.g., call forwarding) may change this phase and may also cause mode changes to non-neighboring modes. The decoder SHALL therefore be prepared to accept changes also in the other phase and to other modes.

Section 8 specifies the usage of the parameters mode-change-period and mode-change-capability to indicate the desired behavior in applications.

See 3GPP TS 26.103  for preferred AMR and AMR-WB configurations for operation in GSM and 3GPP UMTS networks. In gateway scenarios, encoders can be requested through the "mode-set" parameter to use a limited mode-set that is supported by the link beyond the gateway. Further, to avoid congestion on that link, the encoder SHOULD limit the initial codec mode for a session to a lower mode, until at least one frame-block is received with rate control information.
Note, the "\n" is an important part of the magic numbers and MUST be included in the comparison, since, otherwise, the single-channel magic numbers above will become indistinguishable from those of the multi-channel files defined in the next section. Section 4.1 in .
Section 4.3.2. The P bits are padding and MUST be set to 0, and MUST be ignored.

Following this one octet header come the speech bits as defined in 4.4.3. The last octet of each frame is padded with zeroes, if needed, to achieve octet alignment.

The following example shows an AMR frame in 5.9 kbps coding mode (with 118 speech bits) in the storage format.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |P| FT=2  |Q|P|P|                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +          Speech bits for frame-block n, channel k             +
   |                                                               |
   +                                                           +-+-+
   |                                                           |P|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Non-received speech frames or frame-blocks between SID updates during non-speech periods MUST be stored as NO_DATA frames (frame type 15, as defined in  and ). Frames or frame-blocks lost in transmission MUST be stored as NO_DATA frames or SPEECH_LOST (frame type 14, only available for AMR-WB) in complete frame-blocks to keep synchronization with the original media.

Comfort noise frames of other types than AMR SID (FT=8) (i.e., frame type 9, 10, and 11 for AMR) SHALL NOT be used in the AMR file format.
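The stored frame layout shown in the example above (one header octet with the pattern P, FT, Q, P, P, followed by the speech bits padded to an octet boundary) can be sketched as follows. The function name and the representation of speech bits as 0/1 integers are assumptions of this sketch, and the Q value is whatever the caller supplies.

```python
def storage_frame(ft, q, speech_bits):
    """Build one stored frame: a one-octet header followed by the speech bits."""
    # Header layout, MSB first: P | FT (4 bits) | Q | P | P, with P bits zero.
    header = ((ft & 0x0F) << 3) | ((q & 1) << 2)
    bits = list(speech_bits)
    bits.extend([0] * (-len(bits) % 8))  # pad the last octet with zeroes
    body = bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
    return bytes([header]) + body
```

For the AMR 5.9 kbps example (FT=2, 118 speech bits), this yields one header octet plus 15 speech octets, 16 octets in total. A frame without speech bits, such as NO_DATA (FT=15), consists of the header octet only.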