5. RaptorQ FEC Code Specification

5.1. Background

For the purpose of the RaptorQ FEC code specification in this section, the following definitions, symbols, and abbreviations apply. A basic understanding of linear algebra, matrix operations, and finite fields is assumed in this section. In particular, matrix multiplication and matrix inversion operations over a mixture of the finite fields GF[2] and GF[256] are used. A basic familiarity with sparse linear equations, and efficient implementations of algorithms that take advantage of sparse linear equations, is also quite beneficial to an implementer of this specification.

5.1.1. Definitions

o Source block: a block of K source symbols that are considered together for RaptorQ encoding and decoding purposes.

o Extended Source Block: a block of K' source symbols, where K' >= K, constructed from a source block and zero or more padding symbols.

o Symbol: a unit of data. The size, in octets, of a symbol is known as the symbol size. The symbol size is always a positive integer.

o Source symbol: the smallest unit of data used during the encoding process. All source symbols within a source block have the same size.

o Padding symbol: a symbol with all zero bits that is added to the source block to form the extended source block.

o Encoding symbol: a symbol that can be sent as part of the encoding of a source block. The encoding symbols of a source block consist of the source symbols of the source block and the repair symbols generated from the source block. Repair symbols generated from a source block have the same size as the source symbols of that source block.

o Repair symbol: the encoding symbols of a source block that are not source symbols. The repair symbols are generated based on the source symbols of a source block.

o Intermediate symbols: symbols generated from the source symbols using an inverse encoding process based on pre-coding relationships. The repair symbols are then generated directly from the intermediate symbols. The encoding symbols do not include the intermediate symbols, i.e., intermediate symbols are not sent as part of the encoding of a source block. The intermediate symbols are partitioned into LT symbols and PI symbols for the purposes of the encoding process.

o LT symbols: a process similar to that described in [LTCodes] is used to generate part of the contribution to each generated encoding symbol from the portion of the intermediate symbols designated as LT symbols.

o PI symbols: a process even simpler than that described in [LTCodes] is used to generate the other part of the contribution to each generated encoding symbol from the portion of the intermediate symbols designated as PI symbols. In the decoding algorithm suggested in Section 5.4, the PI symbols are inactivated at the start, i.e., are placed into the matrix U at the beginning of the first phase of the decoding algorithm. Because the symbols corresponding to the columns of U are sometimes called the "inactivated" symbols, and since the PI symbols are inactivated at the beginning, they are considered "permanently inactivated".

o HDPC symbols: there is a small subset of the intermediate symbols that are HDPC symbols. Each HDPC symbol has a pre-coding relationship with a large fraction of the other intermediate symbols. HDPC means "High Density Parity Check".

o LDPC symbols: there is a moderate-sized subset of the intermediate symbols that are LDPC symbols. Each LDPC symbol has a pre-coding relationship with a small fraction of the other intermediate symbols. LDPC means "Low Density Parity Check".

o Systematic code: a code in which all source symbols are included as part of the encoding symbols of a source block. The RaptorQ code as described herein is a systematic code.

o Encoding Symbol ID (ESI): information that uniquely identifies each encoding symbol associated with a source block for sending and receiving purposes.

o Internal Symbol ID (ISI): information that uniquely identifies each symbol associated with an extended source block for encoding and decoding purposes.

o Arithmetic operations on octets and symbols and matrices: the operations that are used to produce encoding symbols from source symbols and vice versa. See Section 5.7.

5.1.2. Symbols

i, j, u, v, h, d, a, b, d1, a1, b1, v, m, x, y represent values or variables of one type or another, depending on the context.

X denotes a non-negative integer value that is either an ISI value or an ESI value, depending on the context.

ceil(x) denotes the smallest integer that is greater than or equal to x, where x is a real value.

floor(x) denotes the largest integer that is less than or equal to x, where x is a real value.

min(x,y) denotes the minimum value of the values x and y, and in general the minimum value of all the argument values.

max(x,y) denotes the maximum value of the values x and y, and in general the maximum value of all the argument values.

i % j denotes i modulo j.

i + j denotes the sum of i and j. If i and j are octets or symbols, this designates the arithmetic on octets or symbols, respectively, as defined in Section 5.7. If i and j are integers, then it denotes the usual integer addition.

i * j denotes the product of i and j. If i and j are octets, this designates the arithmetic on octets, as defined in Section 5.7. If i is an octet and j is a symbol, this denotes the multiplication of a symbol by an octet, as also defined in Section 5.7. Finally, if i and j are integers, i * j denotes the usual product of integers.

a ^^ b denotes the operation a raised to the power b. If a is an octet and b is a non-negative integer, this is understood to mean a*a*...*a (b terms), with '*' being the octet product as defined in Section 5.7.

u ^ v denotes, for equal-length bit strings u and v, the bitwise exclusive-or of u and v.

Transpose[A] denotes the transposed matrix of matrix A. In this specification, all matrices have entries that are octets.

A^^-1 denotes the inverse matrix of matrix A. In this specification, all the matrices have octets as entries, so it is understood that the operations of the matrix entries are to be done as stated in Section 5.7 and A^^-1 is the matrix inverse of A with respect to octet arithmetic.

K denotes the number of symbols in a single source block.

K' denotes the number of source plus padding symbols in an extended source block. For the majority of this specification, the padding symbols are considered to be additional source symbols.

K'_max denotes the maximum number of source symbols that can be in a single source block. Set to 56403.


L denotes the number of intermediate symbols for a single extended source block.

S denotes the number of LDPC symbols for a single extended source block. These are LT symbols. For each value of K' shown in Table 2 in Section 5.6, the corresponding value of S is a prime number.

H denotes the number of HDPC symbols for a single extended source block. These are PI symbols.

B denotes the number of intermediate symbols that are LT symbols excluding the LDPC symbols.

W denotes the number of intermediate symbols that are LT symbols. For each value of K' in Table 2 shown in Section 5.6, the corresponding value of W is a prime number.

P denotes the number of intermediate symbols that are PI symbols. These contain all HDPC symbols.

P1 denotes the smallest prime number greater than or equal to P.

U denotes the number of non-HDPC intermediate symbols that are PI symbols.

C denotes an array of intermediate symbols, C[0], C[1], C[2], ..., C[L-1].

C' denotes an array of the symbols of the extended source block, where C'[0], C'[1], C'[2], ..., C'[K-1] are the source symbols of the source block and C'[K], C'[K+1], ..., C'[K'-1] are padding symbols.

V0, V1, V2, V3 denote four arrays of 32-bit unsigned integers, V0[0], V0[1], ..., V0[255]; V1[0], V1[1], ..., V1[255]; V2[0], V2[1], ..., V2[255]; and V3[0], V3[1], ..., V3[255] as shown in Section 5.5.

Rand[y, i, m] denotes a pseudo-random number generator.

Deg[v] denotes a degree generator.

Enc[K', C, (d, a, b, d1, a1, b1)] denotes an encoding symbol generator.

Tuple[K', X] denotes a tuple generator function.

T denotes the symbol size in octets.

J(K') denotes the systematic index associated with K'.

G denotes any generator matrix.

I_S denotes the S x S identity matrix.

5.2. Overview

This section defines the systematic RaptorQ FEC code.

Symbols are the fundamental data units of the encoding and decoding process. For each source block, all symbols are the same size, referred to as the symbol size T. The atomic operations performed on symbols for both encoding and decoding are the arithmetic operations defined in Section 5.7.

The basic encoder is described in Section 5.3. The encoder first derives a block of intermediate symbols from the source symbols of a source block. This intermediate block has the property that both source and repair symbols can be generated from it using the same process. The encoder produces repair symbols from the intermediate block using an efficient process, where each such repair symbol is the exclusive-or of a small number of intermediate symbols from the block. Source symbols can also be reproduced from the intermediate block using the same process. The encoding symbols are the combination of the source and repair symbols.

An example of a decoder is described in Section 5.4. The process for producing source and repair symbols from the intermediate block is designed so that the intermediate block can be recovered from any sufficiently large set of encoding symbols, independent of the mix of source and repair symbols in the set. Once the intermediate block is recovered, missing source symbols of the source block can be recovered using the encoding process.

Requirements for a RaptorQ-compliant decoder are provided in Section 5.8. A number of decoding algorithms are possible to achieve these requirements. An efficient decoding algorithm to achieve these requirements is provided in Section 5.4.

The construction of the intermediate and repair symbols is based in part on a pseudo-random number generator described in Section 5.3.5.1. This generator is based on a fixed set of 1024 random numbers that must be available to both sender and receiver. These numbers are provided in Section 5.5. Encoding and decoding operations for RaptorQ use operations on octets. Section 5.7 describes how to perform these operations.

Finally, the construction of the intermediate symbols from the source symbols is governed by "systematic indices", values of which are provided in Section 5.6 for specific extended source block sizes between 6 and K'_max = 56403 source symbols. Thus, the RaptorQ code supports source blocks with between 1 and 56403 source symbols.

5.3. Systematic RaptorQ Encoder

5.3.1. Introduction

For a given source block of K source symbols, for encoding and decoding purposes, the source block is augmented with K'-K additional padding symbols, where K' is the smallest value that is at least K in the systematic index Table 2 of Section 5.6. The reason for padding out a source block to K' symbols is to enable faster encoding and decoding and to minimize the amount of table information that needs to be stored in the encoder and decoder.

For purposes of transmitting and receiving data, the value of K is used to determine the number of source symbols in a source block, and thus K needs to be known at the sender and the receiver. In this case, the sender and receiver can compute K' from K and the K'-K padding symbols can be automatically added to the source block without any additional communication. The encoding symbol ID (ESI) is used by a sender and receiver to identify the encoding symbols of a source block, where the encoding symbols of a source block consist of the source symbols and the repair symbols associated with the source block. For a source block with K source symbols, the ESIs for the source symbols are 0, 1, 2, ..., K-1, and the ESIs for the repair symbols are K, K+1, K+2, .... Using the ESI for identifying encoding symbols in transport ensures that the ESI values continue consecutively between the source and repair symbols.

For purposes of encoding and decoding data, the value of K' derived from K is used as the number of source symbols of the extended source block upon which encoding and decoding operations are performed, where the K' source symbols consist of the original K source symbols and an additional K'-K padding symbols. The Internal Symbol ID (ISI) is used by the encoder and decoder to identify the symbols associated with the extended source block, i.e., for generating encoding symbols and for decoding. For a source block with K original source symbols, the ISIs for the original source symbols are 0, 1, 2, ..., K-1, the ISIs for the K'-K padding symbols are K, K+1, K+2, ..., K'-1, and the ISIs for the repair symbols are K', K'+1, K'+2, .... Using the ISI for encoding and decoding allows the padding symbols of the extended source block to be treated the same way as other source symbols of the extended source block. Also, it ensures that a given prefix of repair symbols are generated in a consistent way for a given number K' of source symbols in the extended source block, independent of K.

The relationship between the ESIs and the ISIs is simple: the ESIs and the ISIs for the original K source symbols are the same, the K'-K padding symbols have an ISI but do not have a corresponding ESI (since they are symbols that are neither sent nor received), and a repair symbol ISI is simply the repair symbol ESI plus K'-K. The translation between ESIs (used to identify encoding symbols sent and received) and the corresponding ISIs (used for encoding and decoding), as well as determining the proper padding of the extended source block with padding symbols (used for encoding and decoding), is the internal responsibility of the RaptorQ encoder/decoder.

5.3.2. Encoding Overview

The systematic RaptorQ encoder is used to generate any number of repair symbols from a source block that consists of K source symbols placed into an extended source block C'. Figure 4 shows the encoding overview.

The first step of encoding is to construct an extended source block by adding zero or more padding symbols such that the total number of symbols, K', is one of the values listed in Section 5.6. Each padding symbol consists of T octets where the value of each octet is zero. K' MUST be selected as the smallest value of K' from the table of Section 5.6 that is greater than or equal to K.
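The K' selection, padding, and ESI/ISI translation described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the sorted list of supported K' values is passed in by the caller because Table 2 of Section 5.6 is not reproduced in this section.

```python
from bisect import bisect_left


def select_k_prime(k, k_prime_values):
    """Return the smallest supported K' with K' >= K.

    k_prime_values is assumed to be the sorted K' column of Table 2
    (Section 5.6), supplied by the caller.
    """
    idx = bisect_left(k_prime_values, k)
    if k < 1 or idx == len(k_prime_values):
        raise ValueError("K is outside the supported range")
    return k_prime_values[idx]


def extend_source_block(source_symbols, k_prime, t):
    """Append K'-K padding symbols, each consisting of T zero octets."""
    padding = [bytes(t)] * (k_prime - len(source_symbols))
    return list(source_symbols) + padding


def esi_to_isi(esi, k, k_prime):
    """Source symbols keep their ID; a repair ISI is the ESI plus K'-K."""
    return esi if esi < k else esi + (k_prime - k)


def isi_to_esi(isi, k, k_prime):
    """Inverse mapping; padding symbols (K <= ISI < K') have no ESI."""
    if isi < k:
        return isi
    if isi < k_prime:
        return None  # padding symbols are neither sent nor received
    return isi - (k_prime - k)
```

Keeping the translation in two small functions mirrors the statement that it is an internal responsibility of the encoder/decoder: only ESIs ever cross the transport boundary.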

    +----------------------------------------------------------+
    |                                                          |
    |    +-----------+    +--------------+    +-------------+  |
C'  |    |           | C' | Intermediate | C  |             |  |
----+--->|  Padding  |--->|    Symbol    |--->|  Encoding   |--+-->
K   |    |           | K' |  Generation  | L  |             |  |
    |    +-----------+    +--------------+    +-------------+  |
    |          |           (d,a,b,                 ^           |
    |          |            d1,a1,b1)              |           |
    |          |                               +------------+  |
    |          |             K'                |   Tuple    |  |
    |          +------------------------------>|            |  |
    |                                          | Generation |  |
    |                                          +------------+  |
    |                                                ^         |
    +------------------------------------------------+---------+
                                                     |
                                                   ISI X

                   Figure 4: Encoding Overview

Let C'[0], ..., C'[K-1] denote the K source symbols.

Let C'[K], ..., C'[K'-1] denote the K'-K padding symbols, which are all set to zero bits. Then, C'[0], ..., C'[K'-1] are the symbols of the extended source block upon which encoding and decoding are performed.

In the remainder of this description, these padding symbols will be considered as additional source symbols and referred to as such. However, these padding symbols are not part of the encoding symbols, i.e., they are not sent as part of the encoding. At a receiver, the value of K' can be computed based on K, then the receiver can insert K'-K padding symbols at the end of a source block of K' source symbols and recover the remaining K source symbols of the source block from received encoding symbols.

The second step of encoding is to generate a number, L > K', of intermediate symbols from the K' source symbols. In this step, K' source tuples (d[0], a[0], b[0], d1[0], a1[0], b1[0]), ..., (d[K'-1], a[K'-1], b[K'-1], d1[K'-1], a1[K'-1], b1[K'-1]) are generated using the Tuple[] generator as described in Section 5.3.5.4. The K' source tuples and the ISIs associated with the K' source symbols are used to determine L intermediate symbols C[0], ..., C[L-1] from the source symbols using an inverse encoding process. This process can be realized by a RaptorQ decoding process.

Certain "pre-coding relationships" must hold within the L intermediate symbols. Section 5.3.3.3 describes these relationships. Section 5.3.3.4 describes how the intermediate symbols are generated from the source symbols.

Once the intermediate symbols have been generated, repair symbols can be produced. For a repair symbol with ISI X >= K', the tuple of non-negative integers (d, a, b, d1, a1, b1) can be generated, using the Tuple[] generator as described in Section 5.3.5.4. Then, the (d, a, b, d1, a1, b1) tuple and the ISI X are used to generate the corresponding repair symbol from the intermediate symbols using the Enc[] generator described in Section 5.3.5.3. The corresponding ESI for this repair symbol is then X-(K'-K). Note that source symbols of the extended source block can also be generated using the same process, i.e., for any X < K', the symbol generated using this process has the same value as C'[X].

5.3.3. First Encoding Step: Intermediate Symbol Generation

5.3.3.1. General

This encoding step is a pre-coding step to generate the L intermediate symbols C[0], ..., C[L-1] from the source symbols C'[0], ..., C'[K'-1], where L > K' is defined in Section 5.3.3.3. The intermediate symbols are uniquely defined by two sets of constraints:

1. The intermediate symbols are related to the source symbols by a set of source symbol tuples and by the ISIs of the source symbols. The generation of the source symbol tuples is defined in Section 5.3.3.2 using the Tuple[] generator as described in Section 5.3.5.4.

2. A number of pre-coding relationships hold within the intermediate symbols themselves. These are defined in Section 5.3.3.3.

The generation of the L intermediate symbols is then defined in Section 5.3.3.4.

5.3.3.2. Source Symbol Tuples

Each of the K' source symbols is associated with a source symbol tuple (d[X], a[X], b[X], d1[X], a1[X], b1[X]) for 0 <= X < K'.
The source symbol tuples are determined using the Tuple[] generator defined in Section 5.3.5.4 as:

   For each X, 0 <= X < K'

      (d[X], a[X], b[X], d1[X], a1[X], b1[X]) = Tuple[K', X]

5.3.3.3. Pre-Coding Relationships

The pre-coding relationships amongst the L intermediate symbols are defined by requiring that a set of S+H linear combinations of the intermediate symbols evaluate to zero. There are S LDPC and H HDPC symbols, and thus L = K'+S+H. Another partition of the L intermediate symbols is into two sets, one set of W LT symbols and another set of P PI symbols, and thus it is also the case that L = W+P. The P PI symbols are treated differently than the W LT symbols in the encoding process. The P PI symbols consist of the H HDPC symbols together with a set of U = P-H of the other K' intermediate symbols. The W LT symbols consist of the S LDPC symbols together with W-S of the other K' intermediate symbols. The values of these parameters are determined from K' as described below, where H(K'), S(K'), and W(K') are derived from Table 2 in Section 5.6.

Let

o S = S(K')

o H = H(K')

o W = W(K')

o L = K' + S + H

o P = L - W

o P1 denote the smallest prime number greater than or equal to P.

o U = P - H

o B = W - S

o C[0], ..., C[B-1] denote the intermediate symbols that are LT symbols but not LDPC symbols.

o C[B], ..., C[B+S-1] denote the S LDPC symbols that are also LT symbols.

o C[W], ..., C[W+U-1] denote the intermediate symbols that are PI symbols but not HDPC symbols.

o C[L-H], ..., C[L-1] denote the H HDPC symbols that are also PI symbols.
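The derived parameters listed above follow mechanically from K', S, H, and W. A minimal sketch, assuming the caller has already looked up S(K'), H(K'), and W(K') in Table 2 of Section 5.6 (the class and function names are illustrative, not part of this specification):

```python
from dataclasses import dataclass


def is_prime(n):
    """Trial division; adequate for the small values of P involved."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def smallest_prime_at_least(n):
    """Return the smallest prime number greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


@dataclass(frozen=True)
class Params:
    K_prime: int
    S: int
    H: int
    W: int
    L: int
    P: int
    P1: int
    U: int
    B: int


def derive_params(k_prime, s, h, w):
    """Compute L, P, P1, U, B from K' and its Table 2 entries S, H, W."""
    l = k_prime + s + h
    p = l - w
    return Params(K_prime=k_prime, S=s, H=h, W=w, L=l, P=p,
                  P1=smallest_prime_at_least(p), U=p - h, B=w - s)
```

A quick consistency check on any result is that B + S + U + H equals L, which restates the two partitions L = K'+S+H and L = W+P.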

The first set of pre-coding relations, called LDPC relations, is described below and requires that at the end of this process the set of symbols D[0], ..., D[S-1] are all zero:

o Initialize the symbols D[0] = C[B], ..., D[S-1] = C[B+S-1].

o For i = 0, ..., B-1 do

  * a = 1 + floor(i/S)
  * b = i % S
  * D[b] = D[b] + C[i]
  * b = (b + a) % S
  * D[b] = D[b] + C[i]
  * b = (b + a) % S
  * D[b] = D[b] + C[i]

o For i = 0, ..., S-1 do

  * a = i % P
  * b = (i+1) % P
  * D[i] = D[i] + C[W+a] + C[W+b]

Recall that the addition of symbols is to be carried out as specified in Section 5.7.

Note that the LDPC relations as defined in the algorithm above are linear, so there exists an S x B matrix G_LDPC,1 and an S x P matrix G_LDPC,2 such that

   G_LDPC,1 * Transpose[(C[0], ..., C[B-1])]
      + G_LDPC,2 * Transpose[(C[W], ..., C[W+P-1])]
      + Transpose[(C[B], ..., C[B+S-1])] = 0

(The matrix G_LDPC,1 is defined by the first loop in the above algorithm, and G_LDPC,2 can be deduced from the second loop.)

The second set of relations among the intermediate symbols C[0], ..., C[L-1] are the HDPC relations and they are defined as follows:

Let

o alpha denote the octet represented by integer 2 as defined in Section 5.7.

o MT denote an H x (K' + S) matrix of octets, where for j=0, ..., K'+S-2, the entry MT[i,j] is the octet represented by the integer 1 if i = Rand[j+1,6,H] or i = (Rand[j+1,6,H] + Rand[j+1,7,H-1] + 1) % H, and MT[i,j] is the zero element for all other values of i, and for j=K'+S-1, MT[i,j] = alpha^^i for i=0, ..., H-1.

o GAMMA denote a (K'+S) x (K'+S) matrix of octets, where

     GAMMA[i,j] = alpha ^^ (i-j) for i >= j,
                  0 otherwise.

Then, the relationship between the first K'+S intermediate symbols C[0], ..., C[K'+S-1] and the H HDPC symbols C[K'+S], ..., C[K'+S+H-1] is given by:

   Transpose[C[K'+S], ..., C[K'+S+H-1]] + MT * GAMMA * Transpose[C[0], ..., C[K'+S-1]] = 0,

where '*' represents standard matrix multiplication utilizing the octet multiplication to define the multiplication between a matrix of octets and a matrix of symbols (in particular, the column vector of symbols), and '+' denotes addition over octet vectors.

5.3.3.4. Intermediate Symbols

5.3.3.4.1. Definition

Given the K' source symbols C'[0], C'[1], ..., C'[K'-1] the L intermediate symbols C[0], C[1], ..., C[L-1] are the uniquely defined symbol values that satisfy the following conditions:

1. The K' source symbols C'[0], C'[1], ..., C'[K'-1] satisfy the K' constraints

      C'[X] = Enc[K', (C[0], ..., C[L-1]), (d[X], a[X], b[X], d1[X], a1[X], b1[X])], for all X, 0 <= X < K',

   where (d[X], a[X], b[X], d1[X], a1[X], b1[X]) = Tuple[K',X], Tuple[] is defined in Section 5.3.5.4, and Enc[] is described in Section 5.3.5.3.

2. The L intermediate symbols C[0], C[1], ..., C[L-1] satisfy the pre-coding relationships defined in Section 5.3.3.3.

5.3.3.4.2. Example Method for Calculation of Intermediate Symbols

This section describes a possible method for calculation of the L intermediate symbols C[0], C[1], ..., C[L-1] satisfying the constraints in Section 5.3.3.4.1.

The L intermediate symbols can be calculated as follows:

Let

o C denote the column vector of the L intermediate symbols, C[0], C[1], ..., C[L-1].

o D denote the column vector consisting of S+H zero symbols followed by the K' source symbols C'[0], C'[1], ..., C'[K'-1].

Then, the above constraints define an L x L matrix A of octets such that:

   A*C = D

The matrix A can be constructed as follows:

Let

o G_LDPC,1 and G_LDPC,2 be S x B and S x P matrices as defined in Section 5.3.3.3.

o G_HDPC be the H x (K'+S) matrix such that

     G_HDPC * Transpose[(C[0], ..., C[K'+S-1])] = Transpose[(C[K'+S], ..., C[L-1])],

  i.e., G_HDPC = MT*GAMMA

o I_S be the S x S identity matrix

o I_H be the H x H identity matrix

o G_ENC be the K' x L matrix such that

     G_ENC * Transpose[(C[0], ..., C[L-1])] = Transpose[(C'[0], C'[1], ..., C'[K'-1])],

  i.e., G_ENC[i,j] = 1 if and only if C[j] is included in the symbols that are summed to produce Enc[K', (C[0], ..., C[L-1]), (d[i], a[i], b[i], d1[i], a1[i], b1[i])] and G_ENC[i,j] = 0 otherwise.

Then

o The first S rows of A are equal to G_LDPC,1 | I_S | G_LDPC,2.

o The next H rows of A are equal to G_HDPC | I_H.

o The remaining K' rows of A are equal to G_ENC.

The matrix A is depicted in Figure 5 below:

                  B               S         U         H
       +-----------------------+-------+------------------+
       |                       |       |                  |
     S |        G_LDPC,1       |  I_S  |     G_LDPC,2     |
       |                       |       |                  |
       +-----------------------+-------+----------+-------+
       |                                          |       |
     H |                G_HDPC                    |  I_H  |
       |                                          |       |
       +------------------------------------------+-------+
       |                                                  |
       |                                                  |
    K' |                      G_ENC                       |
       |                                                  |
       |                                                  |
       +--------------------------------------------------+

                      Figure 5: The Matrix A

The intermediate symbols can then be calculated as:

   C = (A^^-1)*D

The source tuples are generated such that for any K' matrix A has full rank and is therefore invertible. This calculation can be realized by applying a RaptorQ decoding process to the K' source symbols C'[0], C'[1], ..., C'[K'-1] to produce the L intermediate symbols C[0], C[1], ..., C[L-1].

To efficiently generate the intermediate symbols from the source symbols, it is recommended that an efficient decoder implementation such as that described in Section 5.4 be used.
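As an illustration of how the first S rows of A (G_LDPC,1 | I_S | G_LDPC,2) follow from the two LDPC loops of Section 5.3.3.3, the sketch below builds those rows as lists of 0/1 entries. The function name and the dense list-of-lists representation are assumptions made for readability; a practical implementation would keep these rows sparse.

```python
def ldpc_rows(S, B, P):
    """Build the S LDPC rows of A as dense lists of bits of length L = W + P.

    Entries are toggled with XOR so that a symbol added twice to the same
    D[b] cancels, matching symbol addition in the LDPC relations.
    """
    W = B + S
    L = W + P
    rows = [[0] * L for _ in range(S)]

    # I_S: D[i] is initialized to the LDPC symbol C[B+i].
    for i in range(S):
        rows[i][B + i] = 1

    # G_LDPC,1: each C[i], i < B, is added into three of the D[b].
    for i in range(B):
        a = 1 + i // S
        b = i % S
        for _ in range(3):
            rows[b][i] ^= 1
            b = (b + a) % S

    # G_LDPC,2: D[i] receives two consecutive PI symbols.
    for i in range(S):
        rows[i][W + (i % P)] ^= 1
        rows[i][W + ((i + 1) % P)] ^= 1

    return rows
```

The low density of these rows (three entries per LT column, two PI entries per row) is what makes the sparse handling mentioned in Section 5.4 worthwhile.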

5.3.4. Second Encoding Step: Encoding

In the second encoding step, the repair symbol with ISI X (X >= K') is generated by applying the generator Enc[K', (C[0], C[1], ..., C[L-1]), (d, a, b, d1, a1, b1)] defined in Section 5.3.5.3 to the L intermediate symbols C[0], C[1], ..., C[L-1] using the tuple (d, a, b, d1, a1, b1)=Tuple[K',X].

5.3.5. Generators

5.3.5.1. Random Number Generator

The random number generator Rand[y, i, m] is defined as follows, where y is a non-negative integer, i is a non-negative integer less than 256, and m is a positive integer, and the value produced is an integer between 0 and m-1. Let V0, V1, V2, and V3 be the arrays provided in Section 5.5.

Let

o x0 = (y + i) mod 2^^8

o x1 = (floor(y / 2^^8) + i) mod 2^^8

o x2 = (floor(y / 2^^16) + i) mod 2^^8

o x3 = (floor(y / 2^^24) + i) mod 2^^8

Then

   Rand[y, i, m] = (V0[x0] ^ V1[x1] ^ V2[x2] ^ V3[x3]) % m

5.3.5.2. Degree Generator

The degree generator Deg[v] is defined as follows, where v is a non-negative integer that is less than 2^^20 = 1048576. Given v, find index d in Table 1 such that f[d-1] <= v < f[d], and set Deg[v] = min(d, W-2). Recall that W is derived from K' as described in Section 5.3.3.3.
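The two generators above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the four arrays of Section 5.5 are passed in as a tuple because they are not reproduced here, and the thresholds in F are the f[d] values of Table 1.

```python
from bisect import bisect_right

# f[0], ..., f[30] as listed in Table 1.
F = [0, 5243, 529531, 704294, 791675, 844104, 879057, 904023,
     922747, 937311, 948962, 958494, 966438, 973160, 978921, 983914,
     988283, 992138, 995565, 998631, 1001391, 1003887, 1006157,
     1008229, 1010129, 1011876, 1013490, 1014983, 1016370, 1017662,
     1048576]


def rand(y, i, m, tables):
    """Rand[y, i, m]; tables is (V0, V1, V2, V3), each of length 256."""
    V0, V1, V2, V3 = tables
    x0 = (y + i) % 256
    x1 = ((y >> 8) + i) % 256    # floor(y / 2^^8)
    x2 = ((y >> 16) + i) % 256   # floor(y / 2^^16)
    x3 = ((y >> 24) + i) % 256   # floor(y / 2^^24)
    return (V0[x0] ^ V1[x1] ^ V2[x2] ^ V3[x3]) % m


def deg(v, W):
    """Deg[v]: the index d with f[d-1] <= v < f[d], capped at W-2."""
    if not 0 <= v < 1 << 20:
        raise ValueError("v must satisfy 0 <= v < 2^^20")
    d = bisect_right(F, v)  # number of thresholds that are <= v
    return min(d, W - 2)
```

Because f[] is sorted, a binary search replaces the linear scan suggested by "find index d"; the result is the same index.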

+---------+---------+---------+---------+
| Index d | f[d]    | Index d | f[d]    |
+---------+---------+---------+---------+
| 0       | 0       | 1       | 5243    |
+---------+---------+---------+---------+
| 2       | 529531  | 3       | 704294  |
+---------+---------+---------+---------+
| 4       | 791675  | 5       | 844104  |
+---------+---------+---------+---------+
| 6       | 879057  | 7       | 904023  |
+---------+---------+---------+---------+
| 8       | 922747  | 9       | 937311  |
+---------+---------+---------+---------+
| 10      | 948962  | 11      | 958494  |
+---------+---------+---------+---------+
| 12      | 966438  | 13      | 973160  |
+---------+---------+---------+---------+
| 14      | 978921  | 15      | 983914  |
+---------+---------+---------+---------+
| 16      | 988283  | 17      | 992138  |
+---------+---------+---------+---------+
| 18      | 995565  | 19      | 998631  |
+---------+---------+---------+---------+
| 20      | 1001391 | 21      | 1003887 |
+---------+---------+---------+---------+
| 22      | 1006157 | 23      | 1008229 |
+---------+---------+---------+---------+
| 24      | 1010129 | 25      | 1011876 |
+---------+---------+---------+---------+
| 26      | 1013490 | 27      | 1014983 |
+---------+---------+---------+---------+
| 28      | 1016370 | 29      | 1017662 |
+---------+---------+---------+---------+
| 30      | 1048576 |         |         |
+---------+---------+---------+---------+

Table 1: Defines the Degree Distribution for Encoding Symbols

5.3.5.3. Encoding Symbol Generator

The encoding symbol generator Enc[K', (C[0], C[1], ..., C[L-1]), (d, a, b, d1, a1, b1)] takes the following inputs:

o K' is the number of source symbols for the extended source block. Let L, W, B, S, P, and P1 be derived from K' as described in Section 5.3.3.3.

o (C[0], C[1], ..., C[L-1]) is the array of L intermediate symbols (sub-symbols) generated as described in Section 5.3.3.4.

o (d, a, b, d1, a1, b1) is a source tuple determined from ISI X using the Tuple[] generator defined in Section 5.3.5.4, whereby

  * d is a positive integer denoting an encoding symbol LT degree
  * a is a positive integer between 1 and W-1 inclusive
  * b is a non-negative integer between 0 and W-1 inclusive
  * d1 is a positive integer that has value either 2 or 3 denoting an encoding symbol PI degree
  * a1 is a positive integer between 1 and P1-1 inclusive
  * b1 is a non-negative integer between 0 and P1-1 inclusive

The encoding symbol generator produces a single encoding symbol as output (referred to as result), according to the following algorithm:

o result = C[b]

o For j = 1, ..., d-1 do

  * b = (b + a) % W
  * result = result + C[b]

o While (b1 >= P) do b1 = (b1+a1) % P1

o result = result + C[W+b1]

o For j = 1, ..., d1-1 do

  * b1 = (b1 + a1) % P1
  * While (b1 >= P) do b1 = (b1+a1) % P1
  * result = result + C[W+b1]

o Return result
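The algorithm above can be sketched in a few lines. The sketch assumes that symbols are represented as equal-length bytes objects and that symbol addition is the octet-wise exclusive-or (consistent with the overview in Section 5.2, where each repair symbol is the exclusive-or of intermediate symbols); the function names are illustrative.

```python
def xor_symbols(a, b):
    """Symbol addition, taken here to be octet-wise exclusive-or."""
    return bytes(x ^ y for x, y in zip(a, b))


def enc(C, W, P, P1, tup):
    """Enc[K', C, (d, a, b, d1, a1, b1)] with W, P, P1 derived from K'.

    C is the list of L intermediate symbols (bytes objects of size T).
    """
    d, a, b, d1, a1, b1 = tup

    # LT part: d intermediate symbols taken from C[0..W-1].
    result = C[b]
    for _ in range(1, d):
        b = (b + a) % W
        result = xor_symbols(result, C[b])

    # PI part: d1 symbols from C[W..W+P-1]. Indices are stepped modulo the
    # prime P1, and values that fall in P..P1-1 are skipped.
    while b1 >= P:
        b1 = (b1 + a1) % P1
    result = xor_symbols(result, C[W + b1])
    for _ in range(1, d1):
        b1 = (b1 + a1) % P1
        while b1 >= P:
            b1 = (b1 + a1) % P1
        result = xor_symbols(result, C[W + b1])

    return result
```

Stepping modulo the primes W and P1 guarantees that a nonzero step a (or a1) visits every residue before repeating, so the skip loop always terminates.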

5.3.5.4. Tuple Generator

The tuple generator Tuple[K',X] takes the following inputs:

o K': the number of source symbols in the extended source block

o X: an ISI

Let

o L be determined from K' as described in Section 5.3.3.3

o J = J(K') be the systematic index associated with K', as defined in Table 2 in Section 5.6

The output of the tuple generator is a tuple, (d, a, b, d1, a1, b1), determined as follows:

o A = 53591 + J*997

o if (A % 2 == 0) { A = A + 1 }

o B = 10267*(J+1)

o y = (B + X*A) % 2^^32

o v = Rand[y, 0, 2^^20]

o d = Deg[v]

o a = 1 + Rand[y, 1, W-1]

o b = Rand[y, 2, W]

o If (d < 4) { d1 = 2 + Rand[X, 3, 2] } else { d1 = 2 }

o a1 = 1 + Rand[X, 4, P1-1]

o b1 = Rand[X, 5, P1]

5.4. Example FEC Decoder

5.4.1. General

This section describes an efficient decoding algorithm for the RaptorQ code introduced in this specification. Note that each received encoding symbol is a known linear combination of the intermediate symbols. So, each received encoding symbol provides a linear equation among the intermediate symbols, which, together with the known linear pre-coding relationships amongst the intermediate symbols, gives a system of linear equations. Thus, any algorithm for solving systems of linear equations can successfully decode the intermediate symbols and hence the source symbols. However, the algorithm chosen has a major effect on the computational efficiency of the decoding.

5.4.2. Decoding an Extended Source Block

5.4.2.1. General

It is assumed that the decoder knows the structure of the source block it is to decode, including the symbol size, T, and the number K of symbols in the source block and the number K' of source symbols in the extended source block.

From the algorithms described in Section 5.3, the RaptorQ decoder can calculate the total number L = K'+S+H of intermediate symbols and determine how they were generated from the extended source block to be decoded. In this description, it is assumed that the received encoding symbols for the extended source block to be decoded are passed to the decoder. Furthermore, for each such encoding symbol, it is assumed that the number and set of intermediate symbols whose sum is equal to the encoding symbol are passed to the decoder. In the case of source symbols, including padding symbols, the source symbol tuples described in Section 5.3.3.2 indicate the number and set of intermediate symbols that sum to give each source symbol.

Let N >= K' be the number of received encoding symbols to be used for decoding, including padding symbols for an extended source block, and let M = S+H+N. Then, with the notation of Section 5.3.3.4.2, we have A*C = D.

Decoding an extended source block is equivalent to decoding C from known A and D. It is clear that C can be decoded if and only if the rank of A is L.
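To determine which intermediate symbols sum to a received symbol, a decoder first regenerates the symbol's tuple with the tuple generator of Section 5.3.5.4. A minimal sketch of that generator follows. The function name is hypothetical, and Rand[] and Deg[] are passed in as callables (rand(y, i, m) and deg(v)) so that the sketch does not depend on the tables of Section 5.5 or on Table 1.

```python
def tuple_gen(J, W, P1, X, rand, deg):
    """Tuple[K', X], given J = J(K') and the W, P1 derived from K'.

    rand(y, i, m) and deg(v) are assumed to implement Rand[] and Deg[].
    """
    A = 53591 + J * 997
    if A % 2 == 0:
        A += 1  # force A to be odd
    B = 10267 * (J + 1)
    y = (B + X * A) % 2**32

    v = rand(y, 0, 2**20)
    d = deg(v)
    a = 1 + rand(y, 1, W - 1)
    b = rand(y, 2, W)

    # The PI degree is randomized only for low LT degrees.
    d1 = 2 + rand(X, 3, 2) if d < 4 else 2
    a1 = 1 + rand(X, 4, P1 - 1)
    b1 = rand(X, 5, P1)
    return (d, a, b, d1, a1, b1)
```

Note that the LT components (d, a, b) are seeded by y, which mixes the systematic index J with X, while the PI components (d1, a1, b1) are seeded by X alone.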
Once C has been decoded, missing source symbols can be obtained by using the source symbol tuples to determine the number and set of intermediate symbols that must be summed to obtain each missing source symbol. The first step in decoding C is to form a decoding schedule. In this step, A is converted using Gaussian elimination (using row operations and row and column reorderings) and after discarding M - L rows, into the L x L identity matrix. The decoding schedule consists of the sequence of row operations and row and column reorderings during the Gaussian elimination process, and it only depends on A and not on D.
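The separation between forming a decoding schedule (which depends only on A) and applying it to D can be illustrated with a deliberately simplified sketch. It assumes that every entry of A is 0 or 1, so that row operations reduce to exclusive-or, whereas the real matrix also contains general octets in its HDPC rows and needs the octet arithmetic of Section 5.7. It also uses plain Gaussian elimination without column reorderings, not the efficient ordering described in the remainder of this section. All names are illustrative.

```python
def form_schedule(A):
    """Reduce a binary M x L matrix to the identity; return the schedule.

    Returns a list of ("swap", i, i2) and ("add", i, i2) steps, where
    ("add", i, i2) means row i is added to row i2, or None if rank(A) < L.
    """
    A = [row[:] for row in A]
    M, L = len(A), len(A[0])
    schedule = []
    for col in range(L):
        pivot = next((r for r in range(col, M) if A[r][col]), None)
        if pivot is None:
            return None  # rank deficient: C cannot be decoded
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            schedule.append(("swap", col, pivot))
        for r in range(M):
            if r != col and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
                schedule.append(("add", col, r))
    return schedule


def apply_schedule(schedule, D, L):
    """Replay the schedule on the symbols D (bytes objects); return C."""
    D = list(D)
    d = list(range(len(D)))  # d[i]: which symbol currently sits in row i
    for op, i, i2 in schedule:
        if op == "swap":
            d[i], d[i2] = d[i2], d[i]
        else:
            D[d[i2]] = bytes(x ^ y for x, y in zip(D[d[i2]], D[d[i]]))
    # No column exchanges were made, so c[j] = j and C[j] = D[d[j]].
    return [D[d[j]] for j in range(L)]
```

Only the "add" steps touch symbol data, which is why the number of row operations, not exchanges, determines the symbol-level cost of decoding.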

The decoding of C from D can take place concurrently with the forming of the decoding schedule, or the decoding can take place afterwards based on the decoding schedule.

The correspondence between the decoding schedule and the decoding of C is as follows. Let c[0] = 0, c[1] = 1, ..., c[L-1] = L-1 and d[0] = 0, d[1] = 1, ..., d[M-1] = M-1 initially.

o  Each time a multiple, beta, of row i of A is added to row i' in the decoding schedule, then in the decoding process the symbol beta*D[d[i]] is added to symbol D[d[i']].

o  Each time a row i of A is multiplied by an octet beta, then in the decoding process the symbol D[d[i]] is also multiplied by beta.

o  Each time row i is exchanged with row i' in the decoding schedule, then in the decoding process the value of d[i] is exchanged with the value of d[i'].

o  Each time column j is exchanged with column j' in the decoding schedule, then in the decoding process the value of c[j] is exchanged with the value of c[j'].

From this correspondence, it is clear that the total number of operations on symbols in the decoding of the extended source block is the number of row operations (not exchanges) in the Gaussian elimination. Since A is the L x L identity matrix after the Gaussian elimination and after discarding the last M - L rows, it is clear at the end of successful decoding that the L symbols D[d[0]], D[d[1]], ..., D[d[L-1]] are the values of the L symbols C[c[0]], C[c[1]], ..., C[c[L-1]].

The order in which Gaussian elimination is performed to form the decoding schedule has no bearing on whether or not the decoding is successful. However, the speed of the decoding depends heavily on the order in which Gaussian elimination is performed. (Furthermore, maintaining a sparse representation of A is crucial, although this is not described here.) The remainder of this section describes an order in which Gaussian elimination could be performed that is relatively efficient.

5.4.2.2. First Phase

In the first phase of the Gaussian elimination, the matrix A is conceptually partitioned into submatrices and, additionally, a matrix X is created. This matrix has as many rows and columns as A, and it will be a lower triangular matrix throughout the first phase. At the beginning of this phase, the matrix A is copied into the matrix X.

The submatrix sizes are parameterized by non-negative integers i and u, which are initialized to 0 and P, the number of PI symbols, respectively. The submatrices of A are:

1.  The submatrix I defined by the intersection of the first i rows and first i columns. This is the identity matrix at the end of each step in the phase.

2.  The submatrix defined by the intersection of the first i rows and all but the first i columns and last u columns. All entries of this submatrix are zero.

3.  The submatrix defined by the intersection of the first i columns and all but the first i rows. All entries of this submatrix are zero.

4.  The submatrix U defined by the intersection of all the rows and the last u columns.

5.  The submatrix V formed by the intersection of all but the first i columns and the last u columns and all but the first i rows.

Figure 6 illustrates the submatrices of A. At the beginning of the first phase, V consists of the first L-P columns of A, and U consists of the last P columns corresponding to the PI symbols. In each step, a row of A is chosen.

      +-----------+-----------------+---------+
      |           |                 |         |
      |     I     |    All Zeros    |         |
      |           |                 |         |
      +-----------+-----------------+    U    |
      |           |                 |         |
      |           |                 |         |
      | All Zeros |        V        |         |
      |           |                 |         |
      |           |                 |         |
      +-----------+-----------------+---------+

           Figure 6: Submatrices of A in the First Phase

The following graph defined by the structure of V is used in determining which row of A is chosen. The columns that intersect V are the nodes in the graph, and the rows that have exactly 2 nonzero entries in V and are not HDPC rows are the edges of the graph that connect the two columns (nodes) in the positions of the two ones. A component in this graph is a maximal set of nodes (columns) and edges (rows) such that there is a path between each pair of nodes/edges in the graph. The size of a component is the number of nodes (columns) in the component.

There are at most L steps in the first phase. The phase ends successfully when i + u = L, i.e., when V and the all zeros submatrix above V have disappeared, and A consists of I, the all zeros submatrix below I, and U. The phase ends unsuccessfully in decoding failure if at some step before V disappears there is no nonzero row in V to choose in that step. In each step, a row of A is chosen as follows:

o  If all entries of V are zero, then no row is chosen and decoding fails.

o  Let r be the minimum integer such that at least one row of A has exactly r nonzeros in V.

   *  If r != 2, then choose a row with exactly r nonzeros in V with minimum original degree among all such rows, except that HDPC rows should not be chosen until all non-HDPC rows have been processed.

   *  If r = 2 and there is a row with exactly 2 ones in V, then choose any row with exactly 2 ones in V that is part of a maximum size component in the graph described above that is defined by V.

   *  If r = 2 and there is no row with exactly 2 ones in V, then choose any row with exactly 2 nonzeros in V.

After the row is chosen in this step, the first row of A that intersects V is exchanged with the chosen row so that the chosen row is the first row that intersects V. The columns of A among those that intersect V are reordered so that one of the r nonzeros in the chosen row appears in the first column of V and so that the remaining r-1 nonzeros appear in the last columns of V. The same row and column operations are also performed on the matrix X. Then, an appropriate multiple of the chosen row is added to all the other rows of A below the chosen row that have a nonzero entry in the first column of V.
Specifically, if a row below the chosen row has entry beta in the first column of V, and the chosen row has entry alpha in the first column of V, then beta/alpha multiplied by the chosen row is added to this row to leave a zero value in the first column of V. Finally, i is incremented by 1 and u is incremented by r-1, which completes the step.
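
The row-selection rule of the first phase can be sketched in Python as follows. This is an illustrative reading of the rule, not a normative implementation. The function name choose_row, the dense list-of-lists representation of A, and the caller-supplied is_hdpc and orig_degree lists (indexed by current row position) are all assumptions. Rows with no nonzero entry in V are skipped when computing r, and the HDPC exception is interpreted here as a tie-break that prefers non-HDPC rows; a complete decoder might instead defer HDPC rows entirely.

```python
def choose_row(A, i, u, is_hdpc, orig_degree):
    """Return the index of the row to process next, or None on failure."""
    L = len(A[0])
    v_cols = range(i, L - u)               # columns that intersect V
    # Nonzero column positions inside V for each row that intersects V.
    nz = {}
    for r in range(i, len(A)):
        cols = [c for c in v_cols if A[r][c] != 0]
        if cols:
            nz[r] = cols
    if not nz:
        return None                        # all entries of V are zero
    r_min = min(len(cols) for cols in nz.values())
    rows = [r for r, cols in nz.items() if len(cols) == r_min]
    if r_min != 2:
        # Prefer non-HDPC rows, then minimum original degree.
        return min(rows, key=lambda r: (is_hdpc[r], orig_degree[r]))
    two_ones = [r for r in rows if all(A[r][c] == 1 for c in nz[r])]
    if not two_ones:
        return rows[0]                     # any row with 2 nonzeros in V
    edges = [r for r in two_ones if not is_hdpc[r]]
    if not edges:
        return two_ones[0]
    # Union-find over the columns (nodes) joined by the edge rows.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r in edges:
        a, b = nz[r]
        parent[find(a)] = find(b)
    # Component size is the number of nodes (columns) it contains.
    size = {}
    for c in list(parent):
        root = find(c)
        size[root] = size.get(root, 0) + 1
    return max(edges, key=lambda r: size[find(nz[r][0])])
```

Rebuilding the graph at every step, as done here, is simple but slow; a practical decoder would maintain the nonzero counts and the component structure incrementally.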

Note that efficiency can be improved if the row operations identified above are not actually performed until the affected row is itself chosen during the decoding process. This avoids processing of row operations for rows that are not eventually used in the decoding process, and in particular this avoids those rows for which beta != 1 until they are actually required. Furthermore, the row operations required for the HDPC rows may be performed for all such rows in one process, by using the algorithm described in Section 5.3.3.3.

5.4.2.3. Second Phase

At this point, all the entries of X outside the first i rows and i columns are discarded, so that X has lower triangular form. The last i rows and columns of X are discarded, so that X now has i rows and i columns. The submatrix U is further partitioned into the first i rows, U_upper, and the remaining M - i rows, U_lower. Gaussian elimination is performed in the second phase on U_lower either to determine that its rank is less than u (decoding failure) or to convert it into a matrix whose first u rows form the identity matrix (success of the second phase). Call this u x u identity matrix I_u. The M - L rows of A that intersect U_lower - I_u are discarded. After this phase, A has L rows and L columns.

5.4.2.4. Third Phase

After the second phase, the only portion of A that needs to be zeroed out to finish converting A into the L x L identity matrix is U_upper. The number of rows i of the submatrix U_upper is generally much larger than the number of columns u of U_upper. Moreover, at this time, the matrix U_upper is typically dense, i.e., the number of nonzero entries of this matrix is large. To reduce this matrix to a sparse form, the sequence of operations performed to obtain the matrix U_lower needs to be inverted. To this end, the matrix X is multiplied with the submatrix of A consisting of the first i rows of A.
After this operation, the submatrix of A consisting of the intersection of the first i rows and columns equals X, whereas the matrix U_upper is transformed to a sparse form.

5.4.2.5. Fourth Phase

For each of the first i rows of U_upper, do the following: if the row has a nonzero entry at position j, and if the value of that nonzero entry is b, then add to this row b times row j of I_u.

After this step, the submatrix of A consisting of the intersection of the first i rows and columns is equal to X, the submatrix U_upper consists of zeros, the submatrix consisting of the intersection of the last u rows and the first i columns consists of zeros, and the submatrix consisting of the last u rows and columns is the matrix I_u.

5.4.2.6. Fifth Phase

For j from 1 to i, perform the following operations:

1.  If A[j,j] is not one, then divide row j of A by A[j,j].

2.  For l from 1 to j-1, if A[j,l] is nonzero, then add A[j,l] multiplied with row l of A to row j of A.

After this phase, A is the L x L identity matrix and a complete decoding schedule has been successfully formed. Then, the corresponding decoding consisting of summing known encoding symbols can be executed to recover the intermediate symbols based on the decoding schedule.

The tuples associated with all source symbols are computed according to Section 5.3.3.2. The tuples for received source symbols are used in the decoding. The tuples for missing source symbols are used to determine which intermediate symbols need to be summed to recover the missing source symbols.
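
The two operations of the fifth phase can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. Octet arithmetic is assumed to use the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), which should be checked against the octet arithmetic defined elsewhere in this specification. The function names are hypothetical, indices are 0-based rather than 1-based, and the symbols D are indexed directly by row, without the d[] indirection of Section 5.4.2.1. Each row operation on A is applied to the corresponding symbol immediately.

```python
def gf_mul(a, b):
    """Multiply two octets in GF(256), reducing by 0x11D (assumed polynomial)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return p


def gf_inv(a):
    """Multiplicative inverse computed as a^254 (a must be nonzero)."""
    result, power, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        e >>= 1
    return result


def add_scaled(dst, src, beta):
    """dst += beta * src, element-wise over GF(256) (addition is XOR)."""
    for k in range(len(dst)):
        dst[k] ^= gf_mul(beta, src[k])


def fifth_phase(A, D, i):
    """Reduce the leading i x i lower triangular block of A to the identity,
    applying every row operation to the symbols in D as well."""
    for j in range(i):
        if A[j][j] != 1:
            # Operation 1: divide row j by its diagonal entry.
            inv = gf_inv(A[j][j])
            A[j] = [gf_mul(inv, x) for x in A[j]]
            D[j] = [gf_mul(inv, x) for x in D[j]]
        for l in range(j):
            # Operation 2: cancel each entry left of the diagonal.
            beta = A[j][l]
            if beta:
                add_scaled(D[j], D[l], beta)
                add_scaled(A[j], A[l], beta)
```

Operation 2 works because every row l < j has already been reduced to a unit row by the time row j is processed, so adding A[j,l] times row l clears exactly the entry A[j,l].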