3.6.  Encoding the Remaining Samples

   A dynamic codebook is used to encode 1) the 23/22 remaining samples
   in the two sub-blocks containing the start state; 2) the sub-blocks
   after the start state in time; and 3) the sub-blocks before the
   start state in time.  Thus, the encoding target can be either the
   23/22 samples remaining of the 2 sub-blocks containing the start
   state, or a 40-sample sub-block.  This target can consist of samples
   that are indexed forward in time or backward in time, depending on
   the location of the start state.  The length of the target is
   denoted by lTarget.

   The coding is based on an adaptive codebook that is built from a
   codebook memory that contains decoded LPC excitation samples from
   the already encoded part of the block.  These samples are indexed in
   the same time direction as is the target vector and end at the
   sample instant prior to the first sample instant represented in the
   target vector.  The codebook memory has length lMem, which is equal
   to CB_MEML=147 for the two/four 40-sample sub-blocks and 85 for the
   23/22-sample sub-block.

   The following figure shows an overview of the encoding procedure.

      +------------+    +---------------+    +-------------+
   -> | 1. Decode  | -> | 2. Mem setup  | -> | 3. Perc. W. | ->
      +------------+    +---------------+    +-------------+

      +------------+    +----------------+
   -> | 4. Search  | -> | 5. Upd. Target | ------------------>
   |  +------------+    +----------------+         |
   -----<-------------<-----------<----------------
                  stage=0..2

      +----------------+
   -> | 6. Recalc G[0] | ---------------> gains and CB indices
      +----------------+

   Figure 3.4.  Flow chart of the codebook search in the iLBC encoder.

   1. Decode the part of the residual that has been encoded so far,
      using the codebook without perceptual weighting.

   2. Set up the memory by taking data from the decoded residual.  This
      memory is used to construct codebooks.  For blocks preceding the
      start state, both the decoded residual and the target are time
      reversed (section 3.6.1).

   3. Filter the memory + target with the perceptual weighting filter
      (section 3.6.2).

   4. Search for the best match between the target and the codebook
      vector.  Compute the optimal gain for this match and quantize
      that gain (section 3.6.4).

   5. Update the perceptually weighted target by subtracting the
      contribution from the selected codebook vector from the
      perceptually weighted memory (quantized gain times selected
      vector).  Repeat 4 and 5 for the two additional stages.

   6. Calculate the energy loss due to encoding of the residual.  If
      needed, compensate for this loss by an upscaling and
      requantization of the gain for the first stage (section 3.7).

   The following sections provide an in-depth description of the
   different blocks of Figure 3.4.

3.6.1.  Codebook Memory

   The codebook memory is based on the already encoded sub-blocks, so
   the available data for encoding increases for each new sub-block
   that has been encoded.  Until enough sub-blocks have been encoded to
   fill the codebook memory with data, it is padded with zeros.

   The following figure shows an example of the order in which the
   sub-blocks are encoded for the 30 ms frame size if the start state
   is located in the last 58 samples of sub-blocks 2 and 3.

   +-----------------------------------------------------+
   |    5    |    1    |///|////////|   2   |   3   |  4  |
   +-----------------------------------------------------+

   Figure 3.5.  The order from 1 to 5 in which the sub-blocks are
   encoded.  The slashed area is the start state.

   The first target sub-block to be encoded is number 1, and the
   corresponding codebook memory is shown in the following figure.  As
   the target vector comes before the start state in time, the codebook
   memory and target vector are time reversed; thus, after the block
   has been time reversed the search algorithm can be reused.  As only
   the start state has been encoded so far, the last samples of the
   codebook memory are padded with zeros.

   +-------------------------+
   |zeros|\\\\\\\\|\\\\|  1  |
   +-------------------------+

   Figure 3.6.  The codebook memory, length lMem=85 samples, and the
   target vector 1, length 22 samples.

   The next step is to encode sub-block 2 by using the memory that now
   has increased since sub-block 1 has been encoded.  The following
   figure shows the codebook memory for encoding of sub-block 2.

   +-----------------------------------+
   | zeros |  1  |///|////////|   2    |
   +-----------------------------------+

   Figure 3.7.  The codebook memory, length lMem=147 samples, and the
   target vector 2, length 40 samples.

   The next step is to encode sub-block 3 by using the memory, which
   has increased yet again since sub-blocks 1 and 2 have been encoded,
   but the memory still has to be padded with a few zeros.  The
   following figure shows the codebook memory for encoding of
   sub-block 3.

   +------------------------------------------+
   |zeros|  1  |///|////////|   2    |   3    |
   +------------------------------------------+

   Figure 3.8.  The codebook memory, length lMem=147 samples, and the
   target vector 3, length 40 samples.

   The next step is to encode sub-block 4 by using the memory, which
   now has increased yet again since sub-blocks 1, 2, and 3 have been
   encoded.  This time, the memory does not have to be padded with
   zeros.  The following figure shows the codebook memory for encoding
   of sub-block 4.

   +------------------------------------------+
   | 1 |///|////////|   2    |   3    |   4   |
   +------------------------------------------+

   Figure 3.9.  The codebook memory, length lMem=147 samples, and the
   target vector 4, length 40 samples.

   The final target sub-block to be encoded is number 5, and the
   following figure shows the corresponding codebook memory.  As the
   target vector comes before the start state in time, the codebook
   memory and target vector are time reversed.

   +-------------------------------------------+
   |   3    |   2    |\\\\\\\\|\\\\|  1  |  5  |
   +-------------------------------------------+

   Figure 3.10.  The codebook memory, length lMem=147 samples, and the
   target vector 5, length 40 samples.

   For the case of 20 ms frames, the encoding procedure looks almost
   exactly the same.  The only differences are that the size of the
   start state is 57 samples and that there are only three sub-blocks
   to be encoded.  The encoding order is the same as above, starting
   with the 23-sample target and then encoding the two remaining
   40-sample sub-blocks, first going forward in time and then going
   backward in time relative to the start state.

3.6.2.  Perceptual Weighting of Codebook Memory and Target

   To provide a perceptual weighting of the coding error, a
   concatenation of the codebook memory and the target to be coded is
   all-pole filtered with the perceptual weighting filter specified in
   section 3.4.  The filter state of the weighting filter is set to
   zero.

   in(0..(lMem-1)) = unweighted codebook memory
   in(lMem..(lMem+lTarget-1)) = unweighted target signal

   in -> Wk(z) -> filtered,
      where Wk(z) is taken from the sub-block of the target

   weighted codebook memory = filtered(0..(lMem-1))
   weighted target signal = filtered(lMem..(lMem+lTarget-1))

   The codebook search is done with the weighted codebook memory and
   the weighted target, whereas the decoding and the codebook memory
   update use the unweighted codebook memory.
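As a non-normative sketch (not the reference code of the appendices), the weighting step can be written as a zero-state all-pole filtering of the concatenated memory and target.  The array `a` is assumed to hold the weighting-filter denominator coefficients with a[0]=1; all names here are illustrative.

```c
#include <string.h>

/* Sketch: filter the concatenation of codebook memory and target with
   the all-pole weighting filter, starting from a zero filter state,
   then split the output back into weighted memory and weighted target.
   Assumes lMem + lTarget <= 256 (147 + 40 at most in iLBC). */
static void weight_mem_and_target(const double *mem, int lMem,
                                  const double *target, int lTarget,
                                  const double *a, int order,
                                  double *wMem, double *wTarget)
{
    int i, j, n = lMem + lTarget;
    double in, out;
    double buf[256];

    for (i = 0; i < n; i++) {
        in = (i < lMem) ? mem[i] : target[i - lMem];
        out = in;
        /* all-pole recursion: out(i) = in(i) - sum_j a(j)*out(i-j) */
        for (j = 1; j <= order && j <= i; j++)
            out -= a[j] * buf[i - j];
        buf[i] = out;
    }
    memcpy(wMem, buf, (size_t)lMem * sizeof(double));
    memcpy(wTarget, buf + lMem, (size_t)lTarget * sizeof(double));
}
```

Because the state is zeroed once for the whole concatenation, the target sees the filter ringing produced by the memory, which is what makes the weighted-domain search consistent.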

3.6.3.  Codebook Creation

   The codebook for the search is created from the perceptually
   weighted codebook memory.  It consists of two sections, where the
   first is referred to as the base codebook and the second as the
   expanded codebook, as it is created by linear combinations of the
   first.  Each of these two sections also has a subsection referred to
   as the augmented codebook.  The augmented codebook is only created
   and used for the coding of the 40-sample sub-blocks and not for the
   23/22-sample sub-block case.  The codebook sizes used for the
   different sub-blocks and different stages are summarized in the
   table below.

                               Stage
                          1               2 & 3
           ---------------------------------------------
             22       128 (64+0)*2     128 (64+0)*2
    Sub-   1:st 40    256 (108+20)*2   128 (44+20)*2
    Blocks 2:nd 40    256 (108+20)*2   256 (108+20)*2
           3:rd 40    256 (108+20)*2   256 (108+20)*2
           4:th 40    256 (108+20)*2   256 (108+20)*2

   Table 3.1.  Codebook sizes for the 30 ms mode.

   Table 3.1 shows the codebook size for the different sub-blocks and
   stages for 30 ms frames.  Inside the parentheses it shows how the
   number of codebook vectors is distributed, within the two sections,
   between the base/expanded codebook and the augmented base/expanded
   codebook.  It should be interpreted in the following way:
   (base/expanded cb + augmented base/expanded cb).  The total number
   of codebook vectors for a specific sub-block and stage is given by
   the following formula:

   Tot. cb vectors = base cb + aug. base cb + exp. cb + aug. exp. cb

   The corresponding values to Table 3.1 for 20 ms frames are only
   slightly modified.  The short sub-block is 23 instead of 22 samples,
   and the 3:rd and 4:th sub-blocks are not present.

3.6.3.1.  Creation of a Base Codebook

   The base codebook is given by the perceptually weighted codebook
   memory that is described in section 3.6.2.  The different codebook
   vectors are given by sliding a window of length 23/22 or 40, given
   by the variable lTarget, over the lMem-long perceptually weighted
   codebook memory.
   The indices are ordered so that the codebook vector containing
   samples (lMem-lTarget-n) to (lMem-n-1) of the codebook memory vector
   has index n, where n=0..lMem-lTarget.  Thus the total number of base
   codebook vectors is lMem-lTarget+1, and the indices are ordered from
   sample delay lTarget (23/22 or 40) to lMem (85 or 147).

3.6.3.2.  Codebook Expansion

   The base codebook is expanded by a factor of 2, creating an
   additional section in the codebook.  This new section is obtained by
   filtering the base codebook, base_cb, with a FIR filter with filter
   length CB_FILTERLEN=8.  The construction of the expanded codebook
   compensates for the delay of four samples introduced by the FIR
   filter.

   cbfiltersTbl[CB_FILTERLEN]={-0.033691,  0.083740, -0.144043,
                                0.713379,  0.806152, -0.184326,
                                0.108887, -0.034180};

                  ___
                  \
      exp_cb(k) =  >   cbfiltersTbl(i)*x(k-i+4)
                  /__
               i=0...(CB_FILTERLEN-1)

      where x(j) = base_cb(j) for j=0..lMem-1 and 0 otherwise

   The individual codebook vectors of the new filtered codebook,
   exp_cb, and their indices are obtained in the same fashion as
   described above for the base codebook.

3.6.3.3.  Codebook Augmentation

   For cases where entire sub-blocks are encoded, i.e., cbveclen=40,
   the base and expanded codebooks are augmented to increase codebook
   richness.  The codebooks are augmented by vectors produced by
   interpolation of segments.  The base and expanded codebooks,
   constructed above, consist of vectors corresponding to sample delays
   in the range from cbveclen to lMem.  The codebook augmentation
   attempts to augment these codebooks with vectors corresponding to
   sample delays from 20 to 39.  However, not all of these samples are
   present in the base codebook and expanded codebook, respectively.
   Therefore, the augmentation vectors are constructed as linear
   combinations between samples corresponding to sample delays in the
   range 20 to 39.  The general idea of this procedure is presented in
   the following figures and text.  The procedure is performed for both
   the base codebook and the expanded codebook.
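Before turning to the augmentation details, the base and expanded constructions of sections 3.6.3.1 and 3.6.3.2 can be sketched as follows.  This is an illustrative sketch, not the reference code: `base_cb_vec` extracts window n of the weighted memory, and `exp_cb_sample` evaluates one sample of the FIR-filtered section with the 4-sample delay compensated.

```c
#include <stddef.h>

#define CB_FILTERLEN 8

/* Vector n of the base codebook is the lTarget-sample window ending n
   samples before the end of the lMem-long weighted memory. */
static void base_cb_vec(const double *mem, int lMem, int lTarget,
                        int n, double *cbvec)
{
    int i, start = lMem - lTarget - n;
    for (i = 0; i < lTarget; i++)
        cbvec[i] = mem[start + i];
}

static const double cbfiltersTbl[CB_FILTERLEN] = {
    -0.033691,  0.083740, -0.144043,  0.713379,
     0.806152, -0.184326,  0.108887, -0.034180 };

/* exp_cb(k) = sum_i cbfiltersTbl(i)*x(k-i+4), with x zero outside the
   memory, i.e., x(j) = mem(j) only for 0 <= j < lMem. */
static double exp_cb_sample(const double *mem, int lMem, int k)
{
    int i, j;
    double acc = 0.0;
    for (i = 0; i < CB_FILTERLEN; i++) {
        j = k - i + 4;
        if (j >= 0 && j < lMem)
            acc += cbfiltersTbl[i] * mem[j];
    }
    return acc;
}
```

Feeding a unit impulse through `exp_cb_sample` returns the filter taps centered around the compensated delay, which is an easy way to check the `k-i+4` indexing.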

   - - ------------------------------------------|
                  codebook memory                |
   - - ------------------------------------------|
                            |-5-|---15---|-5-|
                            pi  pp          po
                                 |
                                 |      Codebook vector
                            |---15---|-5-|-----20-----|  <- corresponding to
                                i     ii      iii        sample delay 20

   Figure 3.11.  Generation of the first augmented codebook.

   Figure 3.11 shows the codebook memory with pointers pi, pp, and po,
   where pi points to sample 25, pp to sample 20, and po to sample 5
   (all counted backward from the end of the memory).  Below the
   codebook memory, the augmented codebook vector corresponding to
   sample delay 20 is drawn.  Segment i consists of fifteen samples
   from pointer pp and forward in time.  Segment ii consists of five
   interpolated samples from pi and forward and from po and forward.
   The samples are linearly interpolated with weights [0.0, 0.2, 0.4,
   0.6, 0.8] for pi and weights [1.0, 0.8, 0.6, 0.4, 0.2] for po.
   Segment iii consists of twenty samples from pp and forward.

   The augmented codebook vector corresponding to sample delay 21 is
   produced by moving pointers pp and pi one sample backward in time.
   This gives us the following figure.

   - - ------------------------------------------|
                  codebook memory                |
   - - ------------------------------------------|
                            |-5-|---16---|-5-|
                            pi  pp          po
                                 |
                                 |      Codebook vector
                            |---16---|-5-|-----19-----|  <- corresponding to
                                i     ii      iii        sample delay 21

   Figure 3.12.  Generation of the second augmented codebook.

   Figure 3.12 shows the codebook memory with pointers pi, pp, and po,
   where pi points to sample 26, pp to sample 21, and po to sample 5.
   Below the codebook memory, the augmented codebook vector
   corresponding to sample delay 21 is drawn.  Segment i now consists
   of sixteen samples from pp and forward.  Segment ii consists of five
   interpolated samples from pi and forward and from po and forward,
   and the interpolation weights are the same throughout the procedure.
   Segment iii consists of nineteen samples from pp and forward.

   The same procedure of moving the two pointers is continued until the
   last augmented vector, corresponding to sample delay 39, has been
   created.
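Assuming the pointer arithmetic described above (pp at delay d, pi five samples further back in time, po at the newest five samples), the construction of one augmented vector can be sketched as follows.  `augment_vec` is an illustrative name, not the reference code.

```c
#define SUBL 40

/* Sketch: build the 40-sample augmented vector for sample delay d,
   d = 20..39, from a codebook section cb of length lMem.  The first
   d-5 samples copy the delay-d segment, the next 5 interpolate between
   its continuation (pi) and the newest samples (po), and the last 40-d
   samples repeat the vector start, keeping d-sample periodicity. */
static void augment_vec(const double *cb, int lMem, int d, double *out)
{
    static const double wPi[5] = {0.0, 0.2, 0.4, 0.6, 0.8};
    /* corresponding po weights are 1.0, 0.8, 0.6, 0.4, 0.2 */
    const double *pp = cb + lMem - d;       /* delay-d segment        */
    const double *po = cb + lMem - 5;       /* newest five samples    */
    const double *pi = cb + lMem - d - 5;   /* five samples before pp */
    int i;

    for (i = 0; i < d - 5; i++)             /* segment i   */
        out[i] = pp[i];
    for (i = 0; i < 5; i++)                 /* segment ii  */
        out[d - 5 + i] = wPi[i] * pi[i] + (1.0 - wPi[i]) * po[i];
    for (i = 0; i < SUBL - d; i++)          /* segment iii */
        out[d + i] = pp[i];
}
```

For d=20 the three segments have lengths 15, 5, and 20, matching Figure 3.11; for d=21 they become 16, 5, and 19, matching Figure 3.12.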
   This gives a total of twenty new codebook vectors for each of the
   two sections.  Thus the total number of codebook vectors for each of
   the two sections, when the augmented codebook is included, becomes
   lMem-SUBL+1+SUBL/2.  This is provided that augmentation is invoked,
   i.e., that lTarget=SUBL.

3.6.4.  Codebook Search

   The codebook search uses the codebooks described in the sections
   above to find the best match of the perceptually weighted target
   (see section 3.6.2).  The search method is a multi-stage gain-shape
   matching performed as follows.  At each stage the best shape vector
   is identified, then the gain is calculated and quantized, and
   finally the target is updated in preparation for the next codebook
   search stage.  The number of stages is CB_NSTAGES=3.

   If the target is the 23/22-sample vector, the codebooks are indexed
   so that the base codebook is followed by the expanded codebook.  If
   the target is 40 samples, the order is as follows: base codebook,
   augmented base codebook, expanded codebook, and augmented expanded
   codebook.  The size of each codebook section and its corresponding
   augmented section is given by Table 3.1 in section 3.6.3.

   For example, when the second 40-sample sub-block is coded, indices
   0 - 107 correspond to the base codebook, 108 - 127 correspond to the
   augmented base codebook, 128 - 235 correspond to the expanded
   codebook, and indices 236 - 255 correspond to the augmented expanded
   codebook.  The indices are divided in the same fashion for all
   stages in the example.  Only in the case of coding the first
   40-sample sub-block is there a difference between stages (see Table
   3.1).

3.6.4.1.  Codebook Search at Each Stage

   The codebooks are searched to find the best match to the target at
   each stage.  When the best match is found, the target is updated and
   the next-stage search is started.  The three chosen codebook vectors
   and their corresponding gains constitute the encoded sub-block.  The
   best match is decided by the following three criteria:

   1.
Compute the measure (target*cbvec)^2 / ||cbvec||^2 for all codebook vectors, cbvec, and choose the codebook vector maximizing the measure. The expression (target*cbvec) is the dot product between the target vector to be coded and the codebook vector for which we compute the measure. The norm, ||x||, is defined as the square root of (x*x).

   2. The absolute value of the gain, corresponding to the chosen
      codebook vector, cbvec, must be smaller than a fixed limit,
      CB_MAXGAIN=1.3:

         |gain| < CB_MAXGAIN

      where the gain is computed in the following way:

         gain = (target*cbvec) / ||cbvec||^2

   3. For the first stage, the dot product of the chosen codebook
      vector and target must be positive:

         target*cbvec > 0

   In practice the above criteria are used in a sequential search
   through all codebook vectors.  The best match is found by
   registering a new max measure and index whenever the previously
   registered max measure is surpassed and all other criteria are
   fulfilled.  If none of the codebook vectors fulfills (2) and (3),
   the first codebook vector is selected.

3.6.4.2.  Gain Quantization at Each Stage

   The gain follows as a result of the computation

      gain = (target*cbvec) / ||cbvec||^2

   for the optimal codebook vector found by the procedure in section
   3.6.4.1.

   The three stages quantize the gain, using 5, 4, and 3 bits,
   respectively.  In the first stage, the gain is limited to positive
   values.  This gain is quantized by finding the nearest value in the
   quantization table gain_sq5Tbl.

   gain_sq5Tbl[32]={0.037476, 0.075012, 0.112488, 0.150024, 0.187500,
                    0.224976, 0.262512, 0.299988, 0.337524, 0.375000,
                    0.412476, 0.450012, 0.487488, 0.525024, 0.562500,
                    0.599976, 0.637512, 0.674988, 0.712524, 0.750000,
                    0.787476, 0.825012, 0.862488, 0.900024, 0.937500,
                    0.974976, 1.012512, 1.049988, 1.087524, 1.125000,
                    1.162476, 1.200012}

   The gains of the subsequent two stages can be either positive or
   negative.  The gains are quantized by using a quantization table
   times a scale factor.  The second stage uses the table gain_sq4Tbl,
   and the third stage uses gain_sq3Tbl.  The scale factor equals 0.1
   or the absolute value of the quantized gain representation value
   obtained in the previous stage, whichever is larger.  Again, the
   resulting gain index is the index to the nearest value of the
   quantization table times the scale factor.

      gainQ = scaleFact * gain_sqXTbl[index]

   gain_sq4Tbl[16]={-1.049988, -0.900024, -0.750000, -0.599976,
                    -0.450012, -0.299988, -0.150024,  0.000000,
                     0.150024,  0.299988,  0.450012,  0.599976,
                     0.750000,  0.900024,  1.049988,  1.200012}

   gain_sq3Tbl[8]={-1.000000, -0.659973, -0.330017, 0.000000,
                    0.250000,  0.500000,  0.750000, 1.000000}

3.6.4.3.  Preparation of Target for Next Stage

   Before the search for the next stage is performed, the perceptually
   weighted target vector is updated by subtracting from it the
   selected codebook vector (from the perceptually weighted codebook)
   times the corresponding quantized gain.

      target[i] = target[i] - gainQ * selected_vec[i];

   A reference implementation of the codebook encoding is found in
   Appendix A.34.

3.7.  Gain Correction Encoding

   The start state is quantized in a relatively model-independent
   manner, using 3 bits per sample.  In contrast, the remaining parts
   of the block are encoded by using an adaptive codebook.  This
   codebook will produce high matching accuracy whenever there is a
   high correlation between the target and the best codebook vector.
   For unvoiced speech segments and background noises, this is not
   necessarily so, which, due to the nature of the squared error
   criterion, results in a coded signal with less power than the target
   signal.  As the coded start state has good power matching to the
   target, the result is a power fluctuation within the encoded frame.
   Perceptually, the main problem with this is that the time envelope
   of the signal energy becomes unsteady.  To overcome this problem,
   the gains for the codebooks are re-scaled after the codebook
   encoding by searching for a new gain factor for the first stage
   codebook that provides better power matching.
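The per-stage search criteria of section 3.6.4.1 and the nearest-value gain quantization of section 3.6.4.2 can be sketched together as follows.  This is a non-normative illustration with invented names (`search_stage`, `quantize_gain`), not the reference code of Appendix A.34.

```c
#include <math.h>

#define CB_MAXGAIN 1.3

/* Search nVec codebook vectors of length len (stored row-wise in cb)
   for the one maximizing (target*cbvec)^2/||cbvec||^2, subject to
   |gain| < CB_MAXGAIN and, in the first stage, target*cbvec > 0.
   Falls back to vector 0 if no vector fulfills the constraints. */
static int search_stage(const double *target, const double *cb,
                        int nVec, int len, int firstStage,
                        double *bestGain)
{
    int n, i, best = 0;
    double cross, energy, gain, measure, bestMeasure = -1.0;

    *bestGain = 0.0;
    for (n = 0; n < nVec; n++) {
        const double *v = cb + n * len;
        cross = energy = 0.0;
        for (i = 0; i < len; i++) {
            cross  += target[i] * v[i];
            energy += v[i] * v[i];
        }
        if (energy <= 0.0)
            continue;
        gain = cross / energy;
        if (fabs(gain) >= CB_MAXGAIN)          /* criterion 2 */
            continue;
        if (firstStage && cross <= 0.0)        /* criterion 3 */
            continue;
        measure = cross * cross / energy;      /* criterion 1 */
        if (measure > bestMeasure) {
            bestMeasure = measure;
            best = n;
            *bestGain = gain;
        }
    }
    return best;
}

/* Nearest-value gain quantization: scaleFact is 1.0 in the first stage
   and max(0.1, |previous quantized gain|) in stages 2 and 3. */
static int quantize_gain(double gain, const double *tbl, int tblLen,
                         double scaleFact, double *gainQ)
{
    int i, best = 0;
    double err, bestErr = fabs(gain - scaleFact * tbl[0]);

    for (i = 1; i < tblLen; i++) {
        err = fabs(gain - scaleFact * tbl[i]);
        if (err < bestErr) {
            bestErr = err;
            best = i;
        }
    }
    *gainQ = scaleFact * tbl[best];
    return best;
}
```

In an encoder loop, `search_stage` and `quantize_gain` would run once per stage, with the quantized gain of one stage feeding the scale factor of the next.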
   First, the energy for the target signal, tene, is computed along
   with the energy for the coded signal, cene, given by the addition of
   the three gain-scaled codebook vectors.  Because the gains of the
   second and third stages scale with the gain of the first stage, when
   the first stage gain is changed from gain[0] to gain_sq5Tbl[i] the
   energy of the coded signal changes from cene to

      cene*(gain_sq5Tbl[i]*gain_sq5Tbl[i])/(gain[0]*gain[0])

   where gain[0] is the gain for the first stage found in the original
   codebook search.  A refined search is performed by testing the gain
   indices i=0 to 31, and as long as the new codebook energy as given
   above is less than tene, the gain index for stage 1 is increased.  A
   restriction is applied so that the new gain value for stage 1 cannot
   be more than two times higher than the original value found in the
   codebook search.  Note that by using this method we do not change
   the shape of the encoded vector, only the gain or amplitude.

3.8.  Bitstream Definition

   The total number of bits used to describe one frame of 20 ms speech
   is 304, which fits in 38 bytes and results in a bit rate of 15.20
   kbit/s.  For the case of a frame length of 30 ms speech, the total
   number of bits used is 400, which fits in 50 bytes and results in a
   bit rate of 13.33 kbit/s.

   In the bitstream definition, the bits are distributed into three
   classes according to their bit error or loss sensitivity.  The most
   sensitive bits (class 1) are placed first in the bitstream for each
   frame.  The less sensitive bits (class 2) are placed after the class
   1 bits.  The least sensitive bits (class 3) are placed at the end of
   the bitstream for each frame.

   In the 20/30 ms frame length cases for each class, the following
   hold true: the class 1 bits occupy a total of 6/8 bytes (48/64
   bits), the class 2 bits occupy 8/12 bytes (64/96 bits), and the
   class 3 bits occupy 24/30 bytes (192/240 bits).  This distribution
   of the bits enables the use of unequal level protection (ULP) as is
   exploited in the payload format definition for iLBC [1].  The
   detailed bit allocation is shown in the table below.
   When a quantization index is distributed over more than one class,
   the more significant bits belong to the lower (better protected)
   class.

   Bitstream structure:

   ------------------------------------------------------------------+
   Parameter                         |       Bits Class <1,2,3>      |
                                     |  20 ms frame  |  30 ms frame  |
   ----------------------------------+---------------+---------------+
                          Split 1    |   6 <6,0,0>   |   6 <6,0,0>   |
             LSF 1        Split 2    |   7 <7,0,0>   |   7 <7,0,0>   |
   LSF                    Split 3    |   7 <7,0,0>   |   7 <7,0,0>   |
                        -------------+---------------+---------------+
                          Split 1    | NA (Not Appl.)|   6 <6,0,0>   |
             LSF 2        Split 2    |      NA       |   7 <7,0,0>   |
                          Split 3    |      NA       |   7 <7,0,0>   |
                        -------------+---------------+---------------+
             Sum                     |  20 <20,0,0>  |  40 <40,0,0>  |
   ----------------------------------+---------------+---------------+
   Block Class                       |   2 <2,0,0>   |   3 <3,0,0>   |
   ----------------------------------+---------------+---------------+
   Position 22 sample segment        |   1 <1,0,0>   |   1 <1,0,0>   |
   ----------------------------------+---------------+---------------+
   Scale Factor State Coder          |   6 <6,0,0>   |   6 <6,0,0>   |
   ----------------------------------+---------------+---------------+
               Sample 0              |   3 <0,1,2>   |   3 <0,1,2>   |
   Quantized   Sample 1              |   3 <0,1,2>   |   3 <0,1,2>   |
   Residual       :                  |       :       |       :       |
   State          :                  |       :       |       :       |
   Samples        :                  |       :       |       :       |
               Sample 56             |   3 <0,1,2>   |   3 <0,1,2>   |
               Sample 57             |      NA       |   3 <0,1,2>   |
                        -------------+---------------+---------------+
             Sum                     | 171 <0,57,114>| 174 <0,58,116>|
   ----------------------------------+---------------+---------------+
                          Stage 1    |   7 <6,0,1>   |   7 <4,2,1>   |
   CB for 22/23           Stage 2    |   7 <0,0,7>   |   7 <0,0,7>   |
   sample block           Stage 3    |   7 <0,0,7>   |   7 <0,0,7>   |
                        -------------+---------------+---------------+
             Sum                     |  21 <6,0,15>  |  21 <4,2,15>  |
   ----------------------------------+---------------+---------------+
                          Stage 1    |   5 <2,0,3>   |   5 <1,1,3>   |
   Gain for 22/23         Stage 2    |   4 <1,1,2>   |   4 <1,1,2>   |
   sample block           Stage 3    |   3 <0,0,3>   |   3 <0,0,3>   |
                        -------------+---------------+---------------+
             Sum                     |  12 <3,1,8>   |  12 <2,2,8>   |
   ----------------------------------+---------------+---------------+
                          Stage 1    |   8 <7,0,1>   |   8 <6,1,1>   |
              sub-block 1 Stage 2    |   7 <0,0,7>   |   7 <0,0,7>   |
                          Stage 3    |   7 <0,0,7>   |   7 <0,0,7>   |
                        -------------+---------------+---------------+

                          Stage 1    |   8 <0,0,8>   |   8 <0,7,1>   |
              sub-block 2 Stage 2    |   8 <0,0,8>   |   8 <0,0,8>   |
   Indices                Stage 3    |   8 <0,0,8>   |   8 <0,0,8>   |
   for CB               -------------+---------------+---------------+
   sub-blocks             Stage 1    |      NA       |   8 <0,7,1>   |
              sub-block 3 Stage 2    |      NA       |   8 <0,0,8>   |
                          Stage 3    |      NA       |   8 <0,0,8>   |
                        -------------+---------------+---------------+
                          Stage 1    |      NA       |   8 <0,7,1>   |
              sub-block 4 Stage 2    |      NA       |   8 <0,0,8>   |
                          Stage 3    |      NA       |   8 <0,0,8>   |
                        -------------+---------------+---------------+
             Sum                     |  46 <7,0,39>  |  94 <6,22,66> |
   ----------------------------------+---------------+---------------+
                          Stage 1    |   5 <1,2,2>   |   5 <1,2,2>   |
              sub-block 1 Stage 2    |   4 <1,1,2>   |   4 <1,2,1>   |
                          Stage 3    |   3 <0,0,3>   |   3 <0,0,3>   |
                        -------------+---------------+---------------+
                          Stage 1    |   5 <1,1,3>   |   5 <0,2,3>   |
              sub-block 2 Stage 2    |   4 <0,2,2>   |   4 <0,2,2>   |
                          Stage 3    |   3 <0,0,3>   |   3 <0,0,3>   |
   Gains for            -------------+---------------+---------------+
   sub-blocks             Stage 1    |      NA       |   5 <0,1,4>   |
              sub-block 3 Stage 2    |      NA       |   4 <0,1,3>   |
                          Stage 3    |      NA       |   3 <0,0,3>   |
                        -------------+---------------+---------------+
                          Stage 1    |      NA       |   5 <0,1,4>   |
              sub-block 4 Stage 2    |      NA       |   4 <0,1,3>   |
                          Stage 3    |      NA       |   3 <0,0,3>   |
                        -------------+---------------+---------------+
             Sum                     |  24 <3,6,15>  |  48 <2,12,34> |
   ----------------------------------+---------------+---------------+
   Empty frame indicator             |   1 <0,0,1>   |   1 <0,0,1>   |
   -------------------------------------------------------------------
   SUM                                 304 <48,64,192> 400 <64,96,240>

   Table 3.2.  The bitstream definition for iLBC for both the 20 ms
   frame size mode and the 30 ms frame size mode.

   When packetized into the payload, the bits MUST be sorted as
   follows: all the class 1 bits in the order (from top to bottom) as
   specified in the table, then all the class 2 bits (from top to
   bottom), and then all the class 3 bits in the same sequential order.
   The last bit, the empty frame indicator, SHOULD be set to zero by
   the encoder.  If this bit is set to 1, the decoder SHOULD treat the
   data as a lost frame.
   For example, this bit can be set to 1 to indicate a lost frame in a
   file storage format, as in [1].
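The class-ordered packing rule can be illustrated with a hypothetical helper (not part of the reference code) that writes the class 1, class 2, and class 3 bit streams in sequence, MSB first within each byte.  For 20 ms frames the 48+64+192 = 304 bits fill exactly 38 bytes.

```c
#include <string.h>

/* Illustrative sketch: concatenate the three sensitivity classes into
   the payload, most significant bit of each byte first. */
static void pack_bits(const unsigned char *c1, int n1,
                      const unsigned char *c2, int n2,
                      const unsigned char *c3, int n3,
                      unsigned char *bytes, int nBytes)
{
    int i, pos = 0;
    memset(bytes, 0, (size_t)nBytes);
    for (i = 0; i < n1; i++, pos++)
        bytes[pos >> 3] |= (unsigned char)((c1[i] & 1) << (7 - (pos & 7)));
    for (i = 0; i < n2; i++, pos++)
        bytes[pos >> 3] |= (unsigned char)((c2[i] & 1) << (7 - (pos & 7)));
    for (i = 0; i < n3; i++, pos++)
        bytes[pos >> 3] |= (unsigned char)((c3[i] & 1) << (7 - (pos & 7)));
}
```

Because the classes are contiguous, an unequal-level-protection scheme can protect a byte-aligned prefix of the payload and automatically cover the most sensitive bits first.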

4.  Decoder Principles

   This section describes the principles of each component of the
   decoder algorithm.

             +-------------+    +--------+    +---------------+
   payload-> | 1. Get para | -> | 2. LPC | -> | 3. Sc Dequant | ->
             +-------------+    +--------+    +---------------+

      +-------------+    +------------------+
   -> | 4. Mem setup| -> | 5. Construct res | ------->
   |  +-------------+    +------------------+    |
   ---------<-----------<-----------<-------------
            Sub-frame 0...2/4 (20 ms/30 ms)

      +----------------+    +----------+
   -> | 6. Enhance res | -> | 7. Synth | ------------>
      +----------------+    +----------+

      +-----------------+
   -> | 8. Post Process | ----------------> decoded speech
      +-----------------+

   Figure 4.1.  Flow chart of the iLBC decoder.

   If a frame was lost, steps 1 to 5 SHOULD be replaced by a PLC
   algorithm.

   1. Extract the parameters from the bitstream.

   2. Decode the LPC and interpolate (section 4.1).

   3. Construct the 57/58-sample start state (section 4.2).

   4. Set up the memory by using data from the decoded residual.  This
      memory is used for codebook construction.  For blocks preceding
      the start state, both the decoded residual and the target are
      time reversed.  Sub-frames are decoded in the same order as they
      were encoded.

   5. Construct the residual of this sub-frame (gain[0]*cbvec[0] +
      gain[1]*cbvec[1] + gain[2]*cbvec[2]).  Repeat 4 and 5 until the
      residual of all sub-blocks has been constructed.

   6. Enhance the residual with the post filter (section 4.6).

   7. Synthesize speech from the residual (section 4.7).

   8. Post process with the HP filter, if desired (section 4.8).
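Step 5 above is a plain gain-weighted sum of the three decoded codebook vectors; a sketch with illustrative names:

```c
/* Sketch of decoder step 5: the sub-frame residual is
   gain[0]*cbvec[0] + gain[1]*cbvec[1] + gain[2]*cbvec[2],
   evaluated sample by sample over the sub-frame length. */
static void construct_residual(const double gain[3],
                               const double *cbvec0,
                               const double *cbvec1,
                               const double *cbvec2,
                               int len, double *res)
{
    int i;
    for (i = 0; i < len; i++)
        res[i] = gain[0] * cbvec0[i]
               + gain[1] * cbvec1[i]
               + gain[2] * cbvec2[i];
}
```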

4.1.  LPC Filter Reconstruction

   The decoding of the LP filter parameters is very straightforward.
   For a set of three/six indices, the corresponding LSF vector(s) are
   found by simple table lookup.  For each of the LSF vectors, the
   three split vectors are concatenated to obtain qlsf1 and qlsf2,
   respectively (in the 20 ms mode only one LSF vector, qlsf, is
   constructed).  The next step is the stability check described in
   section 3.2.5, followed by the interpolation scheme described in
   section 3.2.6 (3.2.7 for 20 ms frames).  The only difference is
   that only the quantized LSFs are known at the decoder, and hence
   the unquantized LSFs are not processed.

   A reference implementation of the LPC filter reconstruction is
   given in Appendix A.36.

4.2.  Start State Reconstruction

   The scalar encoded STATE_SHORT_LEN=58 (STATE_SHORT_LEN=57 in the
   20 ms mode) state samples are reconstructed by

   1) forming a set of samples (by table lookup) from the index stream
      idxVec[n],

   2) multiplying the set with 1/scal=(10^qmax)/4.5,

   3) time reversing the 57/58 samples,

   4) filtering the time reversed block with the dispersion (all-pass)
      filter used in the encoder (as described in section 3.5.2),
      which compensates for the phase distortion of the earlier filter
      operation, and

   5) time reversing the 57/58 samples from the previous step.

   in(0..(STATE_SHORT_LEN-1)) = time reversed samples from table
      look-up, idxVecDec((STATE_SHORT_LEN-1)..0)

   in(STATE_SHORT_LEN..(2*STATE_SHORT_LEN-1)) = 0

   Pk(z) = A~rk(z)/A~k(z), where

                                        ___
                                        \
   A~rk(z) = z^(-LPC_FILTERORDER) +      >  a~ki*z^(i-(LPC_FILTERORDER-1))
                                        /__
                                    i=0...(LPC_FILTERORDER-1)

   and A~k(z) is taken from the block where the start state begins

   in -> Pk(z) -> filtered

   out(k) = filtered(STATE_SHORT_LEN-1-k) +
            filtered(2*STATE_SHORT_LEN-1-k),
            k=0..(STATE_SHORT_LEN-1)
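The final combination step, which merges the two halves of the all-pass filter output while undoing the time reversal, can be sketched as follows.  This is illustrative: the table lookup, scaling, and filtering of the earlier steps are assumed to have already produced the 2*STATE_SHORT_LEN-sample array `filtered`.

```c
/* Sketch of the out(k) formula above: combine the first and second
   halves of the all-pass filter output in reversed order,
   out(k) = filtered(stateLen-1-k) + filtered(2*stateLen-1-k). */
static void combine_state(const double *filtered, int stateLen,
                          double *out)
{
    int k;
    for (k = 0; k < stateLen; k++)
        out[k] = filtered[stateLen - 1 - k]
               + filtered[2 * stateLen - 1 - k];
}
```

Appending the zero tail before filtering lets the filter ring out, and adding the two reversed halves folds that ringing back into the state, which is what cancels the dispersion filter's phase distortion.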

   The remaining 23/22 samples in the state are reconstructed by the
   same adaptive codebook technique described in section 4.3.  The
   location bit determines whether these are the first or the last
   23/22 samples of the 80-sample state vector.  If the remaining
   23/22 samples are the first samples, then the scalar encoded
   STATE_SHORT_LEN state samples are time reversed before
   initialization of the adaptive codebook memory vector.

   A reference implementation of the start state reconstruction is
   given in Appendix A.44.

4.3.  Excitation Decoding Loop

   The decoding of the LPC excitation vector proceeds in the same
   order in which the residual was encoded at the encoder.  That is,
   after the decoding of the entire 80-sample state vector, the
   forward sub-blocks (corresponding to samples occurring after the
   state vector samples) are decoded, and then the backward sub-blocks
   (corresponding to samples occurring before the state vector) are
   decoded, resulting in a fully decoded block of excitation signal
   samples.

   In particular, each sub-block is decoded by using the multistage
   adaptive codebook decoding module described in section 4.4.  This
   module relies upon an adaptive codebook memory constructed before
   each run of the adaptive codebook decoding.  The construction of
   the adaptive codebook memory in the decoder is identical to the
   method outlined in section 3.6.3, except that it is done on the
   codebook memory without perceptual weighting.

   For the initial forward sub-block, the last STATE_LEN=80 samples of
   the length CB_MEML=147 adaptive codebook memory are filled with the
   samples of the state vector.  For subsequent forward sub-blocks,
   the first SUBL=40 samples of the adaptive codebook memory are
   discarded, the remaining samples are shifted by SUBL samples toward
   the beginning of the vector, and the newly decoded SUBL=40 samples
   are placed at the end of the adaptive codebook memory.
For backward sub-blocks, the construction is similar, except that every vector of samples involved is first time reversed. A reference implementation of the excitation decoding loop is found in Appendix A.5.
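The memory update between consecutive forward sub-blocks amounts to a shift-and-append; a sketch under the constants above (illustrative, not the reference code):

```c
#include <string.h>

#define CB_MEML 147
#define SUBL    40

/* Sketch: discard the oldest SUBL samples of the adaptive codebook
   memory, shift the rest toward the beginning, and append the newly
   decoded sub-block at the end. */
static void update_cb_memory(double *mem, const double *decoded)
{
    memmove(mem, mem + SUBL, (CB_MEML - SUBL) * sizeof(double));
    memcpy(mem + CB_MEML - SUBL, decoded, SUBL * sizeof(double));
}
```

`memmove` is used for the shift because source and destination overlap within the same buffer.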

4.4.  Multistage Adaptive Codebook Decoding

   The multistage adaptive codebook decoding module is used at both
   the sender (encoder) and the receiver (decoder) ends to produce a
   synthetic signal in the residual domain that is eventually used to
   produce synthetic speech.  The module takes the index values and
   uses them to construct vectors that are scaled and summed together
   to produce a synthetic signal that is the output of the module.

4.4.1.  Construction of the Decoded Excitation Signal

   The unpacked index values provided at the input to the module are
   references to extended codebooks, which are constructed as
   described in section 3.6.3, except that they are based on the
   codebook memory without the perceptual weighting.  The three
   unpacked codebook indices are used to look up three codebook
   vectors.  The three unpacked gain indices are used to decode the
   corresponding three gains.  In this decoding, the successive
   rescaling described in section 3.6.4.2 is applied.

   A reference implementation of the adaptive codebook decoding is
   listed in Appendix A.32.

4.5.  Packet Loss Concealment

   If packet loss occurs, the decoder receives a signal saying that
   information regarding a block is lost.  For such blocks it is
   RECOMMENDED to use a Packet Loss Concealment (PLC) unit to create a
   decoded signal that masks the effect of that packet loss.  In the
   following we describe an example of a PLC unit that can be used
   with the iLBC codec.  As the PLC unit is used only at the decoder,
   it does not affect interoperability between implementations.  Other
   PLC implementations MAY therefore be used.

   The PLC described here operates on the LP filters and the
   excitation signals and is based on the following principles:

4.5.1.  Block Received Correctly and Previous Block Also Received

   If the block is received correctly, the PLC only records state
   information of the current block that can be used in case the next
   block is lost.
The LP filter coefficients for each sub-block and the entire decoded excitation signal are all saved in the decoder state structure. All of this information will be needed if the following block is lost.

4.5.2. Block Not Received

If the block is not received, the block substitution is based on a pitch-synchronous repetition of the excitation signal, which is filtered by the last LP filter of the previous block. The previous block's information is stored in the decoder state structure.

A correlation analysis is performed on the previous block's excitation signal in order to detect the amount of pitch periodicity and a pitch value. The correlation measure is also used to decide on the voicing level (the degree to which the previous block's excitation was a voiced or roughly periodic signal). The excitation in the previous block is used to create an excitation for the block to be substituted, such that the pitch of the previous block is maintained. Therefore, the new excitation is constructed in a pitch-synchronous manner. In order to avoid a buzzy-sounding substituted block, a random excitation is mixed with the new pitch-periodic excitation, and the relative use of the two components is computed from the correlation measure (voicing level).

For the block to be substituted, the newly constructed excitation signal is then passed through the LP filter to produce the speech that will be substituted for the lost block.

For several consecutive lost blocks, the packet loss concealment continues in a similar manner. The correlation measure of the last block received is still used, along with the same pitch value, and the LP filters of the last block received are used again. The energy of the substituted excitation for consecutive lost blocks is decreased, leading to a dampened excitation, and therefore to dampened speech.

4.5.3. Block Received Correctly When Previous Block Not Received

For the case in which a block is received correctly when the previous block was not, the correctly received block's directly decoded speech (based solely on the received block) is not used as the actual output.
The reason for this is that the directly decoded speech does not necessarily merge smoothly into the synthetic speech generated for the previous lost block. If the two signals are not smoothly merged, an audible discontinuity can be produced. Therefore, a correlation analysis between the two blocks of excitation signal (the excitation of the previous concealed block and that of the current received block) is performed to find the best phase match. Then a simple overlap-add procedure is performed to merge the previous excitation smoothly into the current block's excitation.
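The overlap-add merge described above can be sketched as follows. The linear (triangular) cross-fade ramps and the function name are illustrative assumptions; the window shape and overlap length used by the reference PLC in Appendix A.14 may differ:

```c
#include <stddef.h>
#include <math.h>
#include <assert.h>

/* Cross-fade the tail of the concealed excitation into the start of
   the newly received block's excitation over `olap` samples, using
   complementary linear ramps.  The received block is assumed to have
   been phase-aligned already by the correlation analysis. */
static void overlap_add(float *received, const float *concealed,
                        size_t olap)
{
    for (size_t n = 0; n < olap; n++) {
        float w = (float)(n + 1) / (float)(olap + 1); /* ramp up */
        received[n] = w * received[n] + (1.0f - w) * concealed[n];
    }
}
```

Only the first `olap` samples of the received block are modified; the rest of the block is left untouched.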

The exact implementation of the packet loss concealment does not influence interoperability of the codec. A reference implementation of the packet loss concealment is suggested in Appendix A.14. Exact compliance with this suggested algorithm is not needed for a reference implementation to be fully compatible with the overall codec specification.

4.6. Enhancement

The decoder contains an enhancement unit that operates on the reconstructed excitation signal. The enhancement unit increases the perceptual quality of the reconstructed signal by reducing the speech-correlated noise in the voiced speech segments. Compared to traditional postfilters, the enhancer has the advantage that it can modify the excitation signal only slightly. This means that there is no risk of over-enhancement.

The enhancer works very similarly for both the 20 ms and the 30 ms frame size modes. For the 20 ms frame size mode, the enhancer uses a memory of six 80-sample excitation blocks prior in time plus the two new 80-sample excitation blocks. For each block of 160 new unenhanced excitation samples, 160 enhanced excitation samples are produced. The enhanced excitation is delayed by 40 samples compared to the unenhanced excitation, as the enhancer algorithm uses lookahead.

For the 30 ms frame size mode, the enhancer uses a memory of five 80-sample excitation blocks prior in time plus the three new 80-sample excitation blocks. For each block of 240 new unenhanced excitation samples, 240 enhanced excitation samples are produced. The enhanced excitation is delayed by 80 samples compared to the unenhanced excitation, as the enhancer algorithm uses lookahead.

Outline of Enhancer

The speech enhancement unit operates on sub-blocks of 80 samples, which means that there are two/three 80-sample sub-blocks per frame. Each of these sub-blocks is enhanced separately, but in an analogous manner.

              unenhanced residual
                     |
                     v
         +---------------+     +--------------+
      +->| 1. Pitch Est  | --> | 2. Find PSSQ |
      |  +---------------+     +--------------+
      |                               |
      | enh block 0..1/2              v
      |                        +------------+
      |                        | 3. Smooth  |
      |                        +------------+
      |                               |
      |                               v
      |                              /\
      |                             /  \   Already
      |                            / 4. \  Fulfilled
      |                            \Crit/------------->----+
      |                             \? /                   |
      |                              \/                    |
      |                               | Not Fulfilled      |
      |                               v                    |
      |                  +-----------------+  +---------+  |
      |                  | 5. Use Constr.  |->| 6. Mix  |  |
      |                  +-----------------+  +---------+  |
      |                                            |       |
      +---------<----------------<-----------------+---<---+
                                                   |
                                                   v
                                           enhanced residual

Figure 4.2. Flow chart of the enhancer.

1. Pitch estimation of each of the two/three new 80-sample blocks.

2. Find the pitch-period-synchronous sequence n (for block k) by a search around the estimated pitch value. Do this for n=1,2,3,-1,-2,-3.

3. Calculate the smoothed residual generated by the six pitch-period-synchronous sequences from the prior step.

4. Check whether the smoothed residual satisfies the criterion (section 4.6.4).

5. Use the constraint to calculate the mixing factor (section 4.6.5).

6. Mix the smoothed signal with the unenhanced residual (pssq(n), n=0).

The main idea of the enhancer is to find three 80-sample blocks before and three 80-sample blocks after the analyzed unenhanced sub-block and to use these to improve the quality of the excitation in that sub-block. The six blocks are chosen so that they have the highest possible correlation with the unenhanced sub-block that is being enhanced. In other words, the six blocks are pitch-period-synchronous sequences to the unenhanced sub-block.

A linear combination of the six pitch-period-synchronous sequences is calculated that approximates the sub-block. If the squared error between the approximation and the unenhanced sub-block is small enough, the enhanced residual is set equal to this approximation. For the cases when the squared error criterion is not fulfilled, a linear combination of the approximation and the unenhanced residual forms the enhanced residual.

4.6.1. Estimating the Pitch

Pitch estimates are needed to determine the locations of the pitch-period-synchronous sequences in a complexity-efficient way. For each of the two/three new sub-blocks, a pitch estimate is calculated by finding the maximum correlation in the range from lag 20 to lag 120. These pitch estimates are used to narrow down the search for the best possible pitch-period-synchronous sequences.

4.6.2. Determination of the Pitch-Synchronous Sequences

Upon receiving the pitch estimates from the prior step, the enhancer analyzes and enhances one 80-sample sub-block at a time. The pitch-period-synchronous sequences pssq(n) can be viewed as vectors of length 80 samples, each shifted n*lag samples from the current sub-block. The six pitch-period-synchronous sequences, pssq(-3) to pssq(-1) and pssq(1) to pssq(3), are found one at a time by the steps below:

1) Calculate the estimate of the position of pssq(n). For pssq(n) in front of pssq(0) (n > 0), the location of pssq(n) is estimated by moving one pitch estimate forward in time from the exact location of pssq(n-1). Similarly, pssq(n) behind pssq(0) (n < 0) is estimated by moving one pitch estimate backward in time from the exact location of pssq(n+1). If the estimated pssq(n) vector location lies entirely within the enhancer memory (Figure 4.3), steps 2, 3, and 4 are performed; otherwise pssq(n) is set to zeros.

2) Compute the correlation between the unenhanced excitation and vectors around the estimated location interval of pssq(n).
The correlation is calculated in the interval estimated location +/- 2 samples. This results in five correlation values.

3) The five correlation values are upsampled by a factor of 4, using four simple upsampling filters (MA filters with coefficients upsFilter1..upsFilter4). Among the upsampled values, the maximum is found; it specifies the best pitch period with a resolution of a quarter of a sample.

upsFilter1[7]={ 0.000000  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000}
upsFilter2[7]={ 0.015625 -0.076904  0.288330  0.862061 -0.106445  0.018799 -0.015625}
upsFilter3[7]={ 0.023682 -0.124268  0.601563  0.601563 -0.124268  0.023682 -0.023682}
upsFilter4[7]={ 0.018799 -0.106445  0.862061  0.288330 -0.076904  0.015625 -0.018799}

4) Generate the pssq(n) vector by upsampling the excitation memory and extracting the sequence that corresponds to the lag delay calculated in the prior step.

With the steps above, all the pssq(n) can be found in an iterative manner, first moving backward in time from pssq(0) and then forward in time from pssq(0).

    0          159         319         479         639
    +---------------------------------------------------------------+
    |   -5   |  -4  |  -3  |  -2  |  -1  |   0   |   1   |   2   |
    +---------------------------------------------------------------+
                                    |pssq 0 |
                            |pssq -1|       |pssq 1 |
                    |pssq -2|                       |pssq 2 |
            |pssq -3|                                       |pssq 3 |

Figure 4.3. Enhancement for 20 ms frame size.

Figure 4.3 depicts the pitch-period-synchronous sequences in the enhancement of the first 80-sample block in the 20 ms frame size mode. The unenhanced signal input is stored in the last two sub-blocks (1 - 2), and the six other sub-blocks contain unenhanced residual prior in time. The enhancement algorithm is performed on two blocks of 80 samples, where the first of the two blocks consists of the last 40 samples of sub-block 0 and the first 40 samples of sub-block 1. The second 80-sample block consists of the last 40 samples of sub-block 1 and the first 40 samples of sub-block 2.
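One plausible reading of the quarter-sample maximum search in step 3 can be sketched as follows, using the upsFilter coefficients above. The alignment convention (centering each 7-tap filter on tap 3 and zero-padding outside the five correlation values) and the function name are assumptions for illustration; the reference implementation may align and pad differently:

```c
#include <assert.h>

static const float upsFilter[4][7] = {
    { 0.000000f,  0.000000f, 0.000000f, 1.000000f,  0.000000f, 0.000000f,  0.000000f},
    { 0.015625f, -0.076904f, 0.288330f, 0.862061f, -0.106445f, 0.018799f, -0.015625f},
    { 0.023682f, -0.124268f, 0.601563f, 0.601563f, -0.124268f, 0.023682f, -0.023682f},
    { 0.018799f, -0.106445f, 0.862061f, 0.288330f, -0.076904f, 0.015625f, -0.018799f}
};

/* Upsample the five correlation values by a factor of 4 and return
   the location of the maximum on the quarter-sample grid
   (integer position *best_k plus quarter-sample phase *best_phase).
   Values outside the five-point window are treated as zero. */
static void find_max_quarter(const float corr[5], int *best_k, int *best_phase)
{
    float best = -1e30f;
    *best_k = 0;
    *best_phase = 0;
    for (int k = 0; k < 5; k++) {
        for (int p = 0; p < 4; p++) {
            float v = 0.0f;
            for (int j = 0; j < 7; j++) {
                int idx = k + j - 3;   /* filter centered on tap 3 */
                if (idx >= 0 && idx < 5)
                    v += corr[idx] * upsFilter[p][j];
            }
            if (v > best) { best = v; *best_k = k; *best_phase = p; }
        }
    }
}
```

Note that upsFilter1 is a pure delta, so phase 0 reproduces the original correlation values exactly.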

    0          159         319         479         639
    +---------------------------------------------------------------+
    |  -4  |  -3  |  -2  |  -1  |   0   |   1   |   2   |   3   |
    +---------------------------------------------------------------+
                                |pssq 0 |
                        |pssq -1|       |pssq 1 |
                |pssq -2|                       |pssq 2 |
        |pssq -3|                                       |pssq 3 |

Figure 4.4. Enhancement for 30 ms frame size.

Figure 4.4 depicts the pitch-period-synchronous sequences in the enhancement of the first 80-sample block in the 30 ms frame size mode. The unenhanced signal input is stored in the last three sub-blocks (1 - 3). The five other sub-blocks contain unenhanced residual prior in time. The enhancement algorithm is performed on the three 80-sample sub-blocks 0, 1, and 2.

4.6.3. Calculation of the Smoothed Excitation

A linear combination of the six pssq(n) (n != 0) forms a smoothed approximation, z, of pssq(0). Most of the weight is put on the sequences that are close to pssq(0), as these are likely to be most similar to pssq(0). The smoothed vector is also rescaled so that the energy of z is the same as the energy of pssq(0).

         ___
         \
    y =   >    pssq(i) * pssq_weight(i)
         /__
      i=-3,-2,-1,1,2,3

    pssq_weight(i) = 0.5*(1 - cos(2*pi*(i+4)/(2*3+2)))

    z = C * y, where C = ||pssq(0)|| / ||y||

4.6.4. Enhancer Criterion

The criterion of the enhancer is that the enhanced excitation is not allowed to differ much from the unenhanced excitation. This criterion is checked for each 80-sample sub-block.

    e < (b * ||pssq(0)||^2), where b = 0.05    (Constraint 1)

and

    e = (pssq(0)-z)*(pssq(0)-z), where "*" means the dot product

4.6.5. Enhancing the Excitation

From the criterion in the previous section, it is clear that the excitation is not allowed to change much. The purpose of this constraint is to prevent the creation of an enhanced signal significantly different from the original signal. This also means that the constraint limits the numerical size of the errors that the enhancement procedure can make. That is especially important in unvoiced segments and background noise segments, for which increased periodicity could lead to lower perceived quality.

When the constraint in the prior section is not met, the enhanced residual is instead calculated through a constrained optimization by using the Lagrange multiplier technique. The new constraint is that

    e = (b * ||pssq(0)||^2)    (Constraint 2)

We distinguish two solution regions for the optimization: 1) the region where the first constraint is fulfilled and 2) the region where the first constraint is not fulfilled and the second constraint must be used.

In the first case, where the second constraint is not needed, the optimized re-estimated vector is simply z, the energy-scaled version of y.

In the second case, where the second constraint is activated and becomes an equality constraint, we have

    z = A*y + B*pssq(0)

where

    A = sqrt((b - b^2/4)*(w00*w00) / (w11*w00 + w10*w10))

and

    w11 = pssq(0)*pssq(0)
    w00 = y*y
    w10 = y*pssq(0)    ("*" symbolizes the dot product)

and

    B = 1 - b/2 - A * w10/w00

Appendix A.16 contains a listing of a reference implementation for the enhancement method.

4.7. Synthesis Filtering

Upon decoding or PLC of the LP excitation block, the decoded speech block is obtained by running the decoded LP synthesis filter, 1/A~k(z), over the block. The synthesis filters have to be shifted to compensate for the delay in the enhancer. For the 20 ms frame size mode, they SHOULD be shifted one 40-sample sub-block, and for the 30 ms frame size mode, they SHOULD be shifted two 40-sample sub-blocks. The LP coefficients SHOULD be changed at the first sample of every sub-block while keeping the filter state. For PLC blocks, one solution is to apply the last LP coefficients of the last decoded speech block for all sub-blocks. The reference implementation for the synthesis filtering can be found in Appendix A.48.

4.8. Post Filtering

If desired, the decoded block can be filtered by a high-pass filter. This removes the low frequencies of the decoded signal. A reference implementation of this, with a cutoff at 65 Hz, is shown in Appendix A.30.

5. Security Considerations

This algorithm for the coding of speech signals is not subject to any known security consideration; however, its RTP payload format [1] is subject to several considerations, which are addressed there. Confidentiality of the media streams is achieved by encryption; therefore, external mechanisms, such as SRTP [5], MAY be used for that purpose.

6. Evaluation of the iLBC Implementations

It is possible, and suggested, to evaluate an iLBC implementation by utilizing the methodology and tools available at http://www.ilbcfreeware.org/evaluation.html

7. References

7.1. Normative References

[1] Duric, A. and S. Andersen, "Real-time Transport Protocol (RTP) Payload Format for internet Low Bit Rate Codec (iLBC) Speech", RFC 3952, December 2004.

[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[3] PacketCable(TM) Audio/Video Codecs Specification, Cable Television Laboratories, Inc.

7.2. Informative References

[4] ITU-T Recommendation G.711, available online from the ITU bookstore at http://www.itu.int.

[5] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

8. Acknowledgements

This extensive work, besides listed authors, has the following authors, who could not have been listed among "official" authors (due to IESG restrictions in the number of authors who can be listed): Manohar N. Murthi (Department of Electrical and Computer Engineering, University of Miami), Fredrik Galschiodt, Julian Spittka, and Jan Skoglund (Global IP Sound).

The authors are deeply indebted to the following people and thank them sincerely: Henry Sinnreich, Patrik Faltstrom, Alan Johnston, and Jean-Francois Mule for great support of the iLBC initiative and for valuable feedback and comments; and Peter Vary, Frank Mertz, and Christoph Erdmann (RWTH Aachen), Vladimir Cuperman (Niftybox LLC), Thomas Eriksson (Chalmers Univ of Tech), and Gernot Kubin (TU Graz) for thorough review of the iLBC document and their valuable feedback and remarks.