# VP8 Data Format and Decoding Guide

Pages: 304
Informational
Errata
Part 2 of 11 – Pages 16 to 41

```7.  Boolean Entropy Decoder

As discussed in the overview above, essentially the entire VP8 data
stream is encoded using a boolean entropy coder.

An understanding of the bool_decoder is critical to the
implementation of a VP8 decompressor, so we discuss the bool_decoder
in detail.  It is easier to comprehend the bool_decoder in
conjunction with the bool_encoder used by the compressor to write the
compressed data partitions.

The bool_encoder encodes (and the bool_decoder decodes) one bool
(zero-or-one boolean value) at a time.  Its purpose is to losslessly
compress a sequence of bools for which the probability of their being
zero or one can be well-estimated (via constant or previously coded
information) at the time they are written, using identical
corresponding probabilities at the time they are read.

As the reader is probably aware, if a bool is much more likely to be
zero than one (for instance), it can, on average, be faithfully
encoded using much less than one bit per value.  The bool_encoder
exploits this.

In the 1940s, [Shannon] proved that there is a lower bound for the
average datarate of a faithful encoding of a sequence of bools (whose
probability distributions are known and are independent of each
other) and also that there are encoding algorithms that approximate
this lower bound as closely as one wishes.

If we encode a sequence of bools whose probability of being zero is p
(and whose probability of being 1 is 1-p), the lowest possible
datarate per value is

plog(p) + (1-p)log(1-p);

taking the logarithms to the base 1/2 expresses the datarate in bits/
value.

We give two simple examples.  At one extreme, if p = 1/2, then log(p)
= log(1-p) = 1, and the lowest possible datarate per bool is 1/2 +
1/2 = 1; that is, we cannot do any better than simply literally
writing out bits.  At another extreme, if p is very small, say p =
1/1024, then log(p)=10, log(1-p) is roughly .0014, and the lowest
possible datarate is approximately 10/1024 + .0014, roughly 1/100 of
a bit per bool.
```
```   Because most of the bools in the VP8 datastream have zero-
probabilities nowhere near 1/2, the compression provided by the
bool_encoder is critical to the performance of VP8.

The boolean coder used by VP8 is a variant of an arithmetic coder.
An excellent discussion of arithmetic coding (and other lossless
compression techniques) can be found in [Bell].

7.1.  Underlying Theory of Coding

The basic idea used by the boolean coder is to consider the entire
data stream (either of the partitions in our case) as the binary
expansion of a single number x with 0 <= x < 1.  The bits (or bytes)
in x are of course written from high to low order, and if b[j] (B[j])
is the j^(th) bit (byte) in the partition, the value x is simply the
sum (starting with j = 1) of pow(2, -j) * b[j] or pow(256, -j) *
B[j].

Before the first bool is coded, all values of x are possible.

The coding of each bool restricts the possible values of x in
proportion to the probability of what is coded.  If p1 is the
probability of the first bool being zero and a zero is coded, the
range of possible values of x is restricted to 0 <= x < p1.  If a one
is coded, the range becomes p1 <= x < 1.

The coding continues by repeating the same idea.  At every stage,
there is an interval a <= x < b of possible values of x.  If p is the
probability of a zero being coded at this stage and a zero is coded,
the interval becomes a <= x < a + (p(b-a)).  If a one is coded, the
possible values of x are restricted to a + (p(b-a)) <= x < b.

Assuming that only finitely many values are to be coded, after the
encoder has received the last bool, it can write as its output any
value x that lies in the final interval.  VP8 simply writes the left
endpoint of the final interval.  Consequently, the output it would
make if encoding were to stop at any time either increases or stays
the same as each bool is encoded.

Decoding parallels encoding.  The decoder is presented with the
number x, which has only the initial restriction 0 <= x < 1.  To
decode the first bool, the decoder is given the first probability p1.
If x < p1, a zero is decoded; if x >= p1, a one is decoded.  In
either case, the new restriction on x -- that is, the interval of
possible values of x -- is remembered.
```
```   Decoding continues in exactly the same way: If a <= x < b is the
current interval and we are to decode a bool with zero-probability p,
we return a zero if a <= x < a + (p(b-a)) and a one if a + (p(b-a))
<= x < b.  In either case, the new restriction is remembered in
preparation for decoding the next bool.

The process outlined above uses real numbers of infinite precision to
express the probabilities and ranges.  It is true that, if one could
actualize this process and coded a large number of bools whose
supplied probabilities matched their value distributions, the
datarate achieved would approach the theoretical minimum as the
number of bools encoded increased.

Unfortunately, computers operate at finite precision, and an
approximation to the theoretically perfect process described above is
necessary.  Such approximation increases the datarate but, at quite
moderate precision and for a wide variety of data sets, this increase
is negligible.

The only conceptual limitations are, first, that coder probabilities
must be expressed at finite precision and, second, that the decoder
be able to detect each individual modification to the value interval
via examination of a fixed amount of input.  As a practical matter,
many of the implementation details stem from the fact that the coder
can function using only a small "window" to incrementally read or
write the arbitrarily precise number x.

7.2.  Practical Algorithm Description

VP8's boolean coder works with 8-bit probabilities p.  The range of
such p is 0 <= p <= 255; the actual probability represented by p is
p/256.  Also, the coder is designed so that decoding of a bool
requires no more than an 8-bit comparison, and so that the state of
both the encoder and decoder can be easily represented using a small
number of unsigned 16-bit integers.

The details are most easily understood if we first describe the
algorithm using bit-at-a-time input and output.  Aside from the
ability to maintain a position in this bitstream and write/read bits,
the encoder also needs the ability to add 1 to the bits already
output; after writing n bits, adding 1 to the existing output is the
same thing as adding pow(2, -n) to x.
```
```   Together with the bit position, the encoder must maintain two
unsigned 8-bit numbers, which we call "bottom" and "range".  Writing
w for the n bits already written and S = pow(2, - n - 8) for the
scale of the current bit position one byte out, we have the following
constraint on all future values v of w (including the final value
v = x):

w + ( S * bottom ) <= v < w + ( S * ( bottom + range ) )

Thus, appending bottom to the already-written bits w gives the left
endpoint of the interval of possible values, appending bottom + range
gives the right endpoint, and range itself (scaled to the current
output position) is the length of the interval.

So that our probabilistic encodings are reasonably accurate, we do
not let range vary by more than a factor of two: It stays within the
bounds 128 <= range <= 255.

The process for encoding a boolean value val whose probability of
being zero is prob / 256 -- and whose probability of being one is
( 256 - prob ) / 256 -- with 1 <= prob <= 255 is as follows.

Using an unsigned 16-bit multiply followed by an unsigned right
shift, we calculate an unsigned 8-bit split value:

split = 1 + (((range - 1) * probability)]] >> 8)

split is approximately ( prob / 256 ) * range and lies within the
bounds 1 <= split <= range - 1.  These bounds ensure the correctness
of the decoding procedure described below.

If the incoming boolean val to be encoded is false, we leave the left
interval endpoint bottom alone and reduce range, replacing it by
split.  If the incoming val is true, we move up the left endpoint to
bottom + split, propagating any carry to the already-written value w
(this is where we need the ability to add 1 to w), and reduce range
to range - split.

Regardless of the value encoded, range has been reduced and now has
the bounds 1 <= range <= 254.  If range < 128, the encoder doubles it
and shifts the high-order bit out of bottom to the output as it also
doubles bottom, repeating this process one bit at a time until 128 <=
range <= 255.  Once this is completed, the encoder is ready to accept
another bool, maintaining the constraints described above.

After encoding the last bool, the partition may be completed by
appending bottom to the bitstream.
```
```   The decoder mimics the state of the encoder.  It maintains, together
with an input bit position, two unsigned 8-bit numbers, a range
identical to that maintained by the encoder and a value.  Decoding
one bool at a time, the decoder (in effect) tracks the same left
interval endpoint as does the encoder and subtracts it from the
remaining input.  Appending the unread portion of the bitstream to
the 8-bit value gives the difference between the actual value encoded
and the known left endpoint.

The decoder is initialized by setting range = 255 and reading the
first 16 input bits into value.  The decoder maintains range and
calculates split in exactly the same way as does the encoder.

To decode a bool, it compares value to split; if value < split, the
bool is zero, and range is replaced with split.  If value >= split,
the bool is one, range is replaced with range - split, and value is
replaced with value - split.

Again, range is doubled one bit at a time until it is at least 128.
The value is doubled in parallel, shifting a new input bit into the
bottom each time.

Writing Value for value together with the unread input bits and Range
for range extended indefinitely on the right by zeros, the condition
Value < Range is maintained at all times by the decoder.  In
particular, the bits shifted out of value as it is doubled are always
zero.

7.3.  Actual Implementation

The C code below gives complete implementations of the encoder and
decoder described above.  While they are logically identical to the
"bit-at-a-time" versions, they internally buffer a couple of extra
bytes of the bitstream.  This allows I/O to be done (more
practically) a byte at a time and drastically reduces the number of
carries the encoder has to propagate into the already-written data.

Another (logically equivalent) implementation may be found in the
reference decoder file bool_decoder.h (Section 20.2).
```
```   ---- Begin code block --------------------------------------

/* Encoder first */

typedef struct {
uint8 *output;  /* ptr to next byte to be written */
uint32 range;   /* 128 <= range <= 255 */
uint32 bottom;  /* minimum value of remaining output */
int bit_count;  /* # of shifts before an output byte
is available */
} bool_encoder;

/* Must set initial state of encoder before writing any bools. */

void init_bool_encoder(bool_encoder *e, uint8 *start_partition)
{
e->output = start_partition;
e->range = 255;
e->bottom = 0;
e->bit_count = 24;
}

/* Encoding very rarely produces a carry that must be propagated
to the already-written output.  The arithmetic guarantees that
the propagation will never go beyond the beginning of the
output.  Put another way, the encoded value x is always less
than one. */

{
while (*--q == 255)
*q = 0;
++*q;
}

/* Main function writes a bool_value whose probability of being
zero is (expected to be) prob/256. */

void write_bool(bool_encoder *e, Prob prob, int bool_value)
{
/* split is approximately (range * prob) / 256 and,
crucially, is strictly bigger than zero and strictly
smaller than range */

uint32 split = 1 + (((e->range - 1) * prob) >> 8);
```
```     if (bool_value) {
e->bottom += split; /* move up bottom of interval */
e->range -= split;  /* with corresponding decrease in range */
} else
e->range = split;   /* decrease range, leaving bottom alone */

while (e->range < 128)
{
e->range <<= 1;

if (e->bottom & (1 << 31))  /* detect carry */

e->bottom <<= 1;        /* before shifting bottom */

if (!--e->bit_count) {  /* write out high byte of bottom ... */

*e->output++ = (uint8) (e->bottom >> 24);

e->bottom &= (1 << 24) - 1;  /* ... keeping low 3 bytes */

e->bit_count = 8;            /* 8 shifts until next output */
}
}
}

/* Call this function (exactly once) after encoding the last
bool value for the partition being written */

void flush_bool_encoder(bool_encoder *e)
{
int c = e->bit_count;
uint32 v = e->bottom;

if (v & (1 << (32 - c)))   /* propagate (unlikely) carry */
v <<= c & 7;               /* before shifting remaining output */
c >>= 3;                   /* to top of internal buffer */
while (--c >= 0)
v <<= 8;
c = 4;
while (--c >= 0) {    /* write remaining data, possibly padded */
*e->output++ = (uint8) (v >> 24);
v <<= 8;
}
}
```
```   /* Decoder state exactly parallels that of the encoder.
"value", together with the remaining input, equals the
complete encoded number x less the left endpoint of the
current coding interval. */

typedef struct {
uint8   *input;     /* pointer to next compressed data byte */
uint32  range;      /* always identical to encoder's range */
uint32  value;      /* contains at least 8 significant bits */
int     bit_count;  /* # of bits shifted out of
value, at most 7 */
} bool_decoder;

/* Call this function before reading any bools from the
partition. */

void init_bool_decoder(bool_decoder *d, uint8 *start_partition)
{
{
int i = 0;
d->value = 0;           /* value = first 2 input bytes */
while (++i <= 2)

d->value = (d->value << 8)  |  *start_partition++;
}

d->input = start_partition;  /* ptr to next byte to be read */
d->range = 255;           /* initial range is full */
d->bit_count = 0;         /* have not yet shifted out any bits */
}

/* Main function reads a bool encoded at probability prob/256,
which of course must agree with the probability used when the
bool was written. */

{
/* range and split are identical to the corresponding values
used by the encoder when this bool was written */

uint32  split = 1 + (((d->range - 1) * prob) >> 8);
uint32  SPLIT = split << 8;
int     retval;           /* will be 0 or 1 */
```
```     if (d->value >= SPLIT) {  /* encoded a one */
retval = 1;
d->range -= split;  /* reduce range */
d->value -= SPLIT;  /* subtract off left endpoint of interval */
} else {              /* encoded a zero */
retval = 0;
d->range = split;  /* reduce range, no change in left endpoint */
}

while (d->range < 128) {  /* shift out irrelevant value bits */
d->value <<= 1;
d->range <<= 1;
if (++d->bit_count == 8) {  /* shift in new bits 8 at a time */
d->bit_count = 0;
d->value |= *d->input++;
}
}
return retval;
}

/* Convenience function reads a "literal", that is, a "num_bits"-
wide unsigned value whose bits come high- to low-order, with
each bit encoded at probability 128 (i.e., 1/2). */

{
uint32 v = 0;

while (num_bits--)
v = (v << 1) + read_bool(d, 128);
return v;
}

/* Variant reads a signed number */

{
int32 v = 0;
if (!num_bits)
return 0;
v = -1;
while (--num_bits)
v = (v << 1) + read_bool(d, 128);
return v;
}

---- End code block ----------------------------------------
```
```8.  Compressed Data Components

At the lowest level, VP8's compressed data is simply a sequence of
probabilistically encoded bools.  Most of this data is composed of
(slightly) larger semantic units fashioned from bools, which we
describe here.

We sometimes use these descriptions in C expressions within data
format specifications.  In this context, they refer to the return
value of a call to an appropriate bool_decoder d, reading (as always)
from its current reference point.

+--------------+-------+--------------------------------------------+
| Call         | Alt.  | Return                                     |
+--------------+-------+--------------------------------------------+
| Bool(p)      | B(p)  | Bool with probability p/256 of being 0.    |
|              |       | Return value of read_bool(d, p).           |
|              |       |                                            |
| Flag         | F     | A one-bit flag (same thing as a B(128) or  |
|              |       | an L(1)).  Abbreviated F.  Return value of |
|              |       | read_bool(d, 128).                         |
|              |       |                                            |
| Lit(n)       | L(n)  | Unsigned n-bit number encoded as n flags   |
|              |       | (a "literal").  Abbreviated L(n).  The     |
|              |       | bits are read from high to low order.      |
|              |       | Return value of read_literal(d, n).        |
|              |       |                                            |
| SignedLit(n) |       | Signed n-bit number encoded similarly to   |
|              |       | an L(n).  Return value of                  |
|              |       | read_signed_literal(d, n).  These are      |
|              |       | rare.                                      |
|              |       |                                            |
| P(8)         |       | An 8-bit probability.  No different from   |
|              |       | an L(8), but we sometimes use this         |
|              |       | notation to emphasize that a probability   |
|              |       | is being coded.                            |
|              |       |                                            |
| P(7)         |       | A 7-bit specification of an 8-bit          |
|              |       | probability.  Coded as an L(7) number x;   |
|              |       | the resulting 8-bit probability is x ? x   |
|              |       | << 1 : 1.                                  |
|              |       |                                            |
| F?  X        |       | A flag that, if true, is followed by a     |
|              |       | piece of data X.                           |
|              |       |                                            |
```
```   | F?  X:Y      |       | A flag that, if true, is followed by X     |
|              |       | and, if false, is followed by Y.  Also     |
|              |       | used to express a value where Y is an      |
|              |       | implicit default (not encoded in the data  |
|              |       | stream), as in F?  P(8):255, which         |
|              |       | expresses an optional probability: If the  |
|              |       | flag is true, the probability is specified |
|              |       | as an 8-bit literal, while if the flag is  |
|              |       | false, the probability defaults to 255.    |
|              |       |                                            |
| B(p)?  X     | B(p)? | Variants of the above using a boolean      |
|              | X:Y   | indicator whose probability is not         |
|              |       | necessarily 128.                           |
|              |       |                                            |
| T            |       | Tree-encoded value from small alphabet.    |
+--------------+-------+--------------------------------------------+

The last type requires elaboration.  We often wish to encode
something whose value is restricted to a small number of
possibilities (the alphabet).

This is done by representing the alphabet as the leaves of a small
binary tree.  The (non-leaf) nodes of the tree have associated
probabilities p and correspond to calls to read_bool(d, p).  We think
of a zero as choosing the left branch below the node and a one as
choosing the right branch.

Thus, every value (leaf) whose tree depth is x is decoded after

A tree representing an encoding of an alphabet of n possible values
always contains n-1 non-leaf nodes, regardless of its shape (this is
easily seen by induction on n).

There are many ways that a given alphabet can be so represented.  The
choice of tree has little impact on datarate but does affect decoder
performance.  The trees used by VP8 are chosen to (on average)
minimize the number of calls to read_bool.  This amounts to shaping
the tree so that values that are more probable have smaller tree
depth than do values that are less probable.

Readers familiar with Huffman coding will notice that, given an
alphabet together with probabilities for each value, the associated
Huffman tree minimizes the expected number of calls to read_bool.
```
```   Such readers will also realize that the coding method described here
never results in higher datarates than does the Huffman method and,
indeed, often results in much lower datarates.  Huffman coding is, in
fact, nothing more than a special case of this method in which each
node probability is fixed at 128 (i.e., 1/2).

8.1.  Tree Coding Implementation

We give a suggested implementation of a tree data structure followed
by a couple of actual examples of its usage by VP8.

It is most convenient to represent the values using small positive
integers, typically an enum counting up from zero.  The largest
alphabet (used to code DCT coefficients, described in Section 13)
that is tree-coded by VP8 has only 12 values.  The tree for this
alphabet adds 11 interior nodes and so has a total of 23 positions.
Thus, an 8-bit number easily accommodates both a tree position and a
return value.

A tree may then be compactly represented as an array of (pairs of)
8-bit integers.  Each (even) array index corresponds to an interior
node of the tree; the 0th index of course corresponds to the root of
the tree.  The array entries come in pairs corresponding to the left
(0) and right (1) branches of the subtree below the interior node.
We use the convention that a positive (even) branch entry is the
index of a deeper interior node, while a nonpositive entry v
corresponds to a leaf whose value is -v.

The node probabilities associated to a tree-coded value are stored in
an array whose indices are half the indices of the corresponding tree
positions.  The length of the probability array is one less than the
size of the alphabet.

Here is C code implementing the foregoing.  The advantages of our
data structure should be noted.  Aside from the smallness of the
structure itself, the tree-directed reading algorithm is essentially
a single line of code.
```
```   ---- Begin code block --------------------------------------

/* A tree specification is simply an array of 8-bit integers. */

typedef int8 tree_index;
typedef const tree_index Tree[];

/* Read and return a tree-coded value at the current decoder
position. */

bool_decoder * const d, /* bool_decoder always returns a 0 or 1 */
Tree t,                 /* tree specification */
const Prob p[]     /* corresponding interior node probabilities */
) {
register tree_index i = 0;   /* begin at root */

/* Descend tree until leaf is reached */

while ((i = t[ i + read_bool(d, p[i>>1])]) > 0) {}

return -i;     /* return value is negation of nonpositive index */
}

---- End code block ----------------------------------------

Tree-based decoding is implemented in the reference decoder file
bool_decoder.h (Section 20.2).

8.2.  Tree Coding Example

As a multi-part example, without getting too far into the semantics
of macroblock decoding (which is of course taken up below), we look
at the "mode" coding for intra-predicted macroblocks.

It so happens that, because of a difference in statistics, the Y (or
luma) mode encoding uses two different trees: one for key frames and
another for interframes.  This is the only instance in VP8 of the
same dataset being coded by different trees under different
circumstances.  The UV (or chroma) modes are a proper subset of the Y
modes and, as such, have their own decoding tree.
```
```   ---- Begin code block --------------------------------------

typedef enum
{
DC_PRED, /* predict DC using row above and column to the left */
V_PRED,  /* predict rows using row above */
H_PRED,  /* predict columns using column to the left */
TM_PRED, /* propagate second differences a la "True Motion" */

B_PRED,  /* each Y subblock is independently predicted */

num_uv_modes = B_PRED,  /* first four modes apply to chroma */
num_ymodes   /* all modes apply to luma */
}
intra_mbmode;

/* The aforementioned trees together with the implied codings as
Actual (i.e., positive) indices are always even.
Value (i.e., nonpositive) indices are arbitrary. */

const tree_index ymode_tree [2 * (num_ymodes - 1)] =
{
-DC_PRED, 2,        /* root: DC_PRED = "0", "1" subtree */
4, 6,              /* "1" subtree has 2 descendant subtrees */
-V_PRED, -H_PRED, /* "10" subtree: V_PRED = "100",
H_PRED = "101" */
-TM_PRED, -B_PRED /* "11" subtree: TM_PRED = "110",
B_PRED = "111" */
};

const tree_index kf_ymode_tree [2 * (num_ymodes - 1)] =
{
-B_PRED, 2,            /* root: B_PRED = "0", "1" subtree */
4, 6,                 /* "1" subtree has 2 descendant subtrees */
-DC_PRED, -V_PRED,   /* "10" subtree: DC_PRED = "100",
V_PRED = "101" */
-H_PRED, -TM_PRED    /* "11" subtree: H_PRED = "110",
TM_PRED = "111" */
};
```
```   const tree_index uv_mode_tree [2 * (num_uv_modes - 1)] =
{
-DC_PRED, 2,          /* root: DC_PRED = "0", "1" subtree */
-V_PRED, 4,          /* "1" subtree:  V_PRED = "10",
"11" subtree */
-H_PRED, -TM_PRED   /* "11" subtree: H_PRED = "110",
TM_PRED = "111" */
};

/* Given a bool_decoder d, a Y mode might be decoded as follows. */

const Prob pretend_its_huffman [num_ymodes - 1] =
{ 128, 128, 128, 128};

pretend_its_huffman);

---- End code block ----------------------------------------

Since it greatly facilitates re-use of reference code, and since
there is no real reason to do otherwise, it is strongly suggested
that any decoder implementation use exactly the same enumeration
values and probability table layouts as those described in this
document (and in the reference code) for all tree-coded data in VP8.

The uncompressed data chunk at the start of each frame and at the
first part of the first data partition contains information
pertaining to the frame as a whole.  We list the fields in the order
of occurrence.  Most of the header decoding occurs in the reference
decoder file dixie.c (Section 20.4).

9.1.  Uncompressed Data Chunk

The uncompressed data chunk comprises a common (for key frames and
interframes) 3-byte frame tag that contains four fields, as follows:

1.  A 1-bit frame type (0 for key frames, 1 for interframes).

2.  A 3-bit version number (0 - 3 are defined as four different
profiles with different decoding complexity; other values may be
defined for future variants of the VP8 data format).
```
```   3.  A 1-bit show_frame flag (0 when current frame is not for display,
1 when current frame is for display).

4.  A 19-bit field containing the size of the first data partition in
bytes.

The version number setting enables or disables certain features in
the bitstream, as follows:

+---------+-------------------------+-------------+
| Version | Reconstruction Filter   | Loop Filter |
+---------+-------------------------+-------------+
| 0       | Bicubic                 | Normal      |
|         |                         |             |
| 1       | Bilinear                | Simple      |
|         |                         |             |
| 2       | Bilinear                | None        |
|         |                         |             |
| 3       | None                    | None        |
|         |                         |             |
| Other   | Reserved for future use |             |
+---------+-------------------------+-------------+

The reference software also adjusts the loop filter based on version
number, as per the table above.  Version number 1 implies a "simple"
loop filter, and version numbers 2 and 3 imply no loop filter.
However, the "simple" filter setting in this context has no effect
whatsoever on the decoding process, and the "no loop filter" setting
only forces the reference encoder to set filter level equal to 0.
Neither affect the decoding process.  In decoding, the only loop
filter settings that matter are those in the frame header.

For key frames, the frame tag is followed by a further 7 bytes of
uncompressed data, as follows:

---- Begin code block --------------------------------------

Start code byte 0     0x9d
Start code byte 1     0x01
Start code byte 2     0x2a

16 bits      :     (2 bits Horizontal Scale << 14) | Width (14 bits)
16 bits      :     (2 bits Vertical Scale << 14) | Height (14 bits)

---- End code block ----------------------------------------
```
```   The following source code segment illustrates validation of the start
code and reading the width, height, and scale factors for a key
frame.

---- Begin code block --------------------------------------

unsigned char *c = pbi->source+3;

// vet via sync code
if (c[0]!=0x9d||c[1]!=0x01||c[2]!=0x2a)
return -1;

---- End code block ----------------------------------------

Where pbi->source points to the beginning of the frame.

The following code reads the image dimension from the bitstream:

---- Begin code block --------------------------------------

pc->Width      = swap2(*(unsigned short*)(c+3))&0x3fff;
pc->horiz_scale = swap2(*(unsigned short*)(c+3))>>14;
pc->Height     = swap2(*(unsigned short*)(c+5))&0x3fff;
pc->vert_scale  = swap2(*(unsigned short*)(c+5))>>14;

---- End code block ----------------------------------------

Where the swap2 macro takes care of the endian on a different
platform:

---- Begin code block --------------------------------------

#if defined(__ppc__) || defined(__ppc64__)
# define swap2(d)  \
((d&0x000000ff)<<8) |  \
((d&0x0000ff00)>>8)
#else
# define swap2(d) d
#endif

---- End code block ----------------------------------------

While each frame is encoded as a raster scan of 16x16 macroblocks,
the frame dimensions are not necessarily evenly divisible by 16.  In
this case, write ew = 16 - (width & 15) and eh = 16 - (height & 15)
for the excess width and height, respectively.  Although they are
```
```   encoded, the last ew columns and eh rows are not actually part of the
image and should be discarded before final output.  However, these
"excess pixels" should be maintained in the internal reconstruction
buffer used to predict ensuing frames.

The scaling specifications for each dimension are encoded as follows.

+-------+--------------------------------------+
| Value | Scaling                              |
+-------+--------------------------------------+
| 0     | No upscaling (the most common case). |
|       |                                      |
| 1     | Upscale by 5/4.                      |
|       |                                      |
| 2     | Upscale by 5/3.                      |
|       |                                      |
| 3     | Upscale by 2.                        |
+-------+--------------------------------------+

Upscaling does not affect the reconstruction buffer, which should be
maintained at the encoded resolution.  Any reasonable method of
upsampling (including any that may be supported by video hardware in
the playback environment) may be used.  Since scaling has no effect
on decoding, we do not discuss it any further.

As discussed in Section 5, allocation (or re-allocation) of data
structures (such as the reconstruction buffer) whose size depends on
dimension will be triggered here.

9.2.  Color Space and Pixel Type (Key Frames Only)

+-------+------------------------------------------+
| Field | Value                                    |
+-------+------------------------------------------+
| L(1)  | 1-bit color space type specification     |
|       |                                          |
| L(1)  | 1-bit pixel value clamping specification |
+-------+------------------------------------------+

The color space type bit is encoded as follows:

o  0 - YUV color space similar to the YCrCb color space defined in
[ITU-R_BT.601]

o  1 - Reserved for future use
```
```   The pixel value clamping type bit is encoded as follows:

o  0 - Decoders are required to clamp the reconstructed pixel values
to between 0 and 255 (inclusive).

o  1 - Reconstructed pixel values are guaranteed to be between 0 and
255; no clamping is necessary.

Information in this subsection does not appear in interframes.

This subsection contains probability and value information for
behavior.  The data in this subsection is used in the decoding of the
ensuing per-segment information and applies to the entire frame.
be assigned a segment ID.  Macroblocks with the same segment ID
over default baseline values for the frame.  The adjustments can be
quantizer level or loop filter strength.

The context for decoding this feature at the macroblock level is
provided by a subsection in the frame header, which contains:

1.  A segmentation_enabled flag that enables the feature for this
frame if set to 1, and disables it if set to 0.  The following
fields occur if the feature is enabled.

2.  L(1) indicates if the segment map is updated for the current
frame (update_mb_segmentation_map).

3.  L(1) indicates if the segment feature data items are updated for
the current frame (update_segment_feature_data).

4.  If Item 3 above (update_segment_feature_data) is 1, the following
fields occur:

a.  L(1), the mode of segment feature data
(segment_feature_mode), can be absolute-value mode (0) or
delta value mode (1).
```
```       b.  Segment feature data items are decoded segment by segment for
each segment feature.  For every data item, a one-bit flag
indicates whether the item is 0, or a non-zero value to be
decoded.  If the value is non-zero, then the value is decoded
as a magnitude L(n), followed by a one-bit sign (L(1) -- 0
for positive and 1 for negative).  The length n can be looked
up from a pre-defined length table for all feature data.

5.  If the L(1) flag as noted in Item 2 above is set to 1, the
probabilities of the decoding tree for the segment map are
decoded from the bitstream.  Each probability is decoded with a
one-bit flag indicating whether the probability is the default
value of 255 (flag is set to 0), or an 8-bit value, L(8), from
the bitstream.

The layout and semantics supporting this feature at the macroblock
level are described in Section 10.

9.4.  Loop Filter Type and Levels

VP8 supports two types of loop filters having different computational
complexity.  The following bits occur in the header to support the
selection of the baseline type, strength, and sharpness behavior of
the loop filter used for the current frame.

+-------+-------------------+
| Index | Description       |
+-------+-------------------+
| L(1)  | filter_type       |
|       |                   |
| L(6)  | loop_filter_level |
|       |                   |
| L(3)  | sharpness_level   |
+-------+-------------------+

The meaning of these numbers will be further explained in Section 15.

VP8 has a feature in the bitstream that enables adjustment of the
loop filter level based on a macroblock's prediction mode and
reference frame.  The per-macroblock adjustment is done through delta
values against the default loop filter level for the current frame.
This subsection contains flag and value information for implementing
per-macroblock loop filter level adjustment to default decoder
behavior.  The data in this section is used in the decoding of the
ensuing per-macroblock information and applies to the entire frame.
```
```   L(1) is a one-bit flag indicating if the macroblock loop filter
adjustment is on for the current frame.  0 means that such a feature
is not supported in the current frame, and 1 means this feature is
enabled for the current frame.

Whether the adjustment is based on a reference frame or encoding
mode, the adjustment of the loop filter level is done via a delta
value against a baseline loop filter value.  The delta values are
updated for the current frame if an L(1) bit,
mode_ref_lf_delta_update, takes the value 1.  There are two groups of
delta values: One group of delta values is for reference frame-based
number of delta values in the two groups is MAX_REF_LF_DELTAS and
MAX_MODE_LF_DELTAS, respectively.  For every value within the two
groups, there is a one-bit L(1) to indicate if the particular value
is updated.  When one is updated (1), it is transmitted as a six-bit-
magnitude L(6) followed by a one-bit sign flag (L(1) -- 0 for
positive and 1 for negative).

9.5.  Token Partition and Partition Data Offsets

VP8 allows DCT coefficients to be packed into multiple partitions,
besides the first partition with header and per-macroblock prediction
information, so the decoder can perform parallel decoding in an
efficient manner.  A two-bit L(2) is used to indicate the number of
coefficient data partitions within a compressed frame.  The two bits
are defined in the following table:

+-------+-------+----------------------+
| Bit 1 | Bit 0 | Number of Partitions |
+-------+-------+----------------------+
| 0     | 0     | 1                    |
|       |       |                      |
| 0     | 1     | 2                    |
|       |       |                      |
| 1     | 0     | 4                    |
|       |       |                      |
| 1     | 1     | 8                    |
+-------+-------+----------------------+

Offsets are embedded in the bitstream to provide the decoder direct
greater than 1, the size of each partition (except the last) is
written in 3 bytes (24 bits).  The size of the last partition is the
remainder of the data not used by any of the previous partitions.
```
```   The partitioned data are consecutive in the bitstream, so the size
can also be used to calculate the offset of each partition.  The
following pseudocode illustrates how the size/offset is defined by
the three bytes in the bitstream.

---- Begin code block --------------------------------------

Offset/size  =  (uint32)(byte0) + ((uint32)(byte1)<<8)
+ ((uint32)(byte2)<<16);

---- End code block ----------------------------------------

9.6.  Dequantization Indices

All residue signals are specified via a quantized 4x4 DCT applied to
the Y, U, V, or Y2 subblocks of a macroblock.  As detailed in
Section 14, before inverting the transform, each decoded coefficient
is multiplied by one of six dequantization factors, the choice of
which depends on the plane (Y, chroma = U or V, Y2) and coefficient
position (DC = coefficient 0, AC = coefficients 1-15).  The six
values are specified using 7-bit indices into six corresponding fixed
tables (the tables are given in Section 14).

The first 7-bit index gives the dequantization table index for
Y-plane AC coefficients, called yac_qi.  It is always coded and acts
as a baseline for the other 5 quantization indices, each of which is
represented by a delta from this baseline index.  Pseudocode for

---- Begin code block --------------------------------------

yac_qi     = L(7);           /* Y ac index always specified */
ydc_delta  = F? delta(): 0;  /* Y dc delta specified if
flag is true */

y2dc_delta = F? delta(): 0;  /* Y2 dc delta specified if
flag is true */
y2ac_delta = F? delta(): 0;  /* Y2 ac delta specified if
flag is true */

uvdc_delta = F? delta(): 0;  /* chroma dc delta specified
if flag is true */
uvac_delta = F? delta(): 0;  /* chroma ac delta specified
if flag is true */

---- End code block ----------------------------------------
```
```   Where delta() is the process to read 5 bits from the bitstream to
determine a signed delta value:

+-------+--------------------------------------------------+
| Index | Description                                      |
+-------+--------------------------------------------------+
| L(4)  | Magnitude of delta                               |
|       |                                                  |
| L(1)  | Sign of delta, 0 for positive and 1 for negative |
+-------+--------------------------------------------------+

9.7.  Refresh Golden Frame and Altref Frame

For key frames, both the golden frame and the altref frame are
refreshed/ replaced by the current reconstructed frame, by default.
For non-key frames, VP8 uses two bits to indicate whether the two
frame buffers are refreshed, using the reconstructed current frame:

+-------+----------------------------------------------------------+
| Index | Description                                              |
+-------+----------------------------------------------------------+
| L(1)  | Whether golden frame is refreshed (0 for no, 1 for yes). |
|       |                                                          |
| L(1)  | Whether altref frame is refreshed (0 for no, 1 for yes). |
+-------+----------------------------------------------------------+

When the flag for the golden frame is 0, VP8 uses 2 more bits in the
bitstream to indicate whether the buffer (and which buffer) is copied
to the golden frame, or if no buffer is copied:

+-------+------------------------------------------+
| Index | Description                              |
+-------+------------------------------------------+
| L(2)  | Buffer copy flag for golden frame buffer |
+-------+------------------------------------------+

Where:

o  0 means no buffer is copied to the golden frame

o  1 means last_frame is copied to the golden frame

o  2 means alt_ref_frame is copied to the golden frame

Similarly, when the flag for altref is 0, VP8 uses 2 bits in the
bitstream to indicate which buffer is copied to alt_ref_frame.
```
```           +-------+------------------------------------------+
| Index | Description                              |
+-------+------------------------------------------+
| L(2)  | Buffer copy flag for altref frame buffer |
+-------+------------------------------------------+

Where:

o  0 means no buffer is copied to the altref frame

o  1 means last_frame is copied to the altref frame

o  2 means golden_frame is copied to the altref frame

Two bits are transmitted for ref_frame_sign_bias for golden_frame and
alt_ref_frame, respectively.

+-------+---------------------------------+
| Index | Description                     |
+-------+---------------------------------+
| L(1)  | Sign bias flag for golden frame |
|       |                                 |
| L(1)  | Sign bias flag for altref frame |
+-------+---------------------------------+

These values are used to control the sign of the motion vectors when
a golden frame or an altref frame is used as the reference frame for
a macroblock.

9.8.  Refresh Last Frame Buffer

VP8 uses one bit, L(1), to indicate if the last frame reference
buffer is refreshed using the constructed current frame.  On a key
frame, this bit is overridden, and the last frame buffer is always
refreshed.

9.9.  DCT Coefficient Probability Update

This field contains updates to the probability tables used to decode
DCT coefficients.  For each of the probabilities in the tables, there
is an L(1) flag indicating if the probability is updated for the
current frame, and if the L(1) flag is set to 1, there follows an
additional 8-bit value representing the new probability value.  These
tables are maintained across interframes but are of course replaced
with their defaults at the beginning of every key frame.

The layout and semantics of this field will be taken up in
Section 13.
```
```9.10.  Remaining Frame Header Data (Non-Key Frame)

+-------+-----------------------------------------------------------+
| Index | Description                                               |
+-------+-----------------------------------------------------------+
| L(1)  | mb_no_skip_coeff.  This flag indicates at the frame level |
|       | if skipping of macroblocks with no non-zero coefficients  |
|       | is enabled.  If it is set to 0, then prob_skip_false is   |
|       | not read and mb_skip_coeff is forced to 0 for all         |
|       | macroblocks (see Sections 11.1 and 12.1).                 |
|       |                                                           |
| L(8)  | prob_skip_false = probability used for decoding a         |
|       | macroblock-level flag, which indicates if a macroblock    |
|       | has any non-zero coefficients.  Only read if              |
|       | mb_no_skip_coeff is 1.                                    |
|       |                                                           |
| L(8)  | prob_intra = probability that a macroblock is "intra"     |
|       | predicted (that is, predicted from the already-encoded    |
|       | portions of the current frame), as opposed to "inter"     |
|       | predicted (that is, predicted from the contents of a      |
|       | prior frame).                                             |
|       |                                                           |
| L(8)  | prob_last = probability that an inter-predicted           |
|       | macroblock is predicted from the immediately previous     |
|       | frame, as opposed to the most recent golden frame or      |
|       | altref frame.                                             |
|       |                                                           |
| L(8)  | prob_gf = probability that an inter-predicted macroblock  |
|       | is predicted from the most recent golden frame, as        |
|       | opposed to the altref frame.                              |
|       |                                                           |
| F     | If true, followed by four L(8)s updating the              |
|       | probabilities for the different types of intra-prediction |
|       | for the Y plane.  These probabilities correspond to the   |
|       | four interior nodes of the decoding tree for intra-Y      |
|       | modes in an interframe, that is, the even positions in    |
|       | the ymode_tree array given above.                         |
|       |                                                           |
| F     | If true, followed by three L(8)s updating the             |
|       | probabilities for the different types of intra-prediction |
|       | for the chroma planes.  These probabilities correspond to |
|       | the even positions in the uv_mode_tree array given above. |
|       |                                                           |
| X     | Motion vector probability update.  Details are given in   |
|       | Section 17.2, "Probability Updates".                      |
+-------+-----------------------------------------------------------+
```
```   Decoding of this portion of the frame header is handled in the
reference decoder file dixie.c (Section 20.4).

9.11.  Remaining Frame Header Data (Key Frame)

+-------+-----------------------------------------------------------+
| Index | Description                                               |
+-------+-----------------------------------------------------------+
| L(1)  | mb_no_skip_coeff.  This flag indicates at the frame level |
|       | if skipping of macroblocks with no non-zero coefficients  |
|       | is enabled.  If it is set to 0, then prob_skip_false is   |
|       | not read and mb_skip_coeff is forced to 0 for all         |
|       | macroblocks (see Sections 11.1 and 12.1).                 |
|       |                                                           |
| L(8)  | prob_skip_false = Probability used for decoding a         |
|       | macroblock-level flag, which indicates if a macroblock    |
|       | has any non-zero coefficients.  Only read if              |
|       | mb_no_skip_coeff is 1.                                    |
+-------+-----------------------------------------------------------+

Decoding of this portion of the frame header is handled in the
reference decoder file modemv.c (Section 20.11).

This completes the layout of the frame header.  The remainder of the
first data partition consists of macroblock-level prediction data.

After the frame header is processed, all probabilities needed to
decode the prediction and residue data are known and will not change
until the next frame.

```

(page 41 continued on part 3)