Section 9.4.9). For a stream-based transport, two strategies are therefore possible for the decompressor dispatcher: 1) The dispatcher collects a complete SigComp message and then invokes the UDVM. The advantage is that, even in implementations that have multiple incoming compressed streams, only one instance of the UDVM is ever required. 2) The dispatcher collects the SigComp header (see Section 7) and invokes the UDVM; the UDVM stays active while the rest of the message arrives. The advantage is that there is no need to buffer up the rest of the message; the message can be decompressed as it arrives, and any decompressed output can be relayed to the application immediately. In general, which of the strategies is used is an implementation choice. However, the compressor may want to take advantage of strategy 2 by expecting that some of the application message is passed on to the application before the SigComp message is terminated, e.g., by keeping the UDVM active while expecting the application to continuously receive decompressed output. This approach ("continuous mode") invalidates some assumptions of the SigComp security model and can only be used if the transport itself can provide the required protection against denial of service attacks. Also, since only strategy 2 works in this approach, the use of continuous mode requires previous agreement between the two endpoints.
Occurs in data stream: Action: 0xFF 00 one 0xFF byte in the data stream 0xFF 01 same, but the next byte is quoted (could be another 0xFF) : : 0xFF 7F same, but the next 127 bytes are quoted 0xFF 80 to 0xFF FE (reserved for future standardization) 0xFF FF end of SigComp message The combinations 0xFF01 to 0xFF7F are useful to limit the worst case expansion of the record marking scheme: the 1 (0xFF01) to 127 (0xFF7F) bytes following the byte combination are copied literally by the decompressor without taking any special action on 0xFF. (Note that 0xFF00 is just a special case of this, where zero following bytes are copied literally.) In UDVM version 0x01, any occurrence of the combinations 0xFF80 to 0xFFFE that are not protected by quoting causes decompression failure; the decompressor SHOULD close the stream-based transport in this case. Section 9.4.9). The application uses a suitable authentication mechanism to determine whether the decompressed message belongs to a legitimate compartment or not. If the application fails to authenticate the message with sufficient confidence to allow state to be saved or feedback data to be trusted, it supplies a "no valid compartment" error to the dispatcher and the UDVM is terminated without creating any state or forwarding any feedback data.
If present, the requested feedback item SHOULD be copied unmodified into the returned_feedback_item field provided in the SigComp message. Note that there is no need to transmit any requested feedback item more than once. The compressor SHOULD also upload the local SigComp parameters to the remote endpoint, unless the endpoint has indicated that it does not wish to receive these parameters or the compressor determines that the parameters have already successfully arrived (see Section 5.1 for details of how this can be achieved). The SigComp parameters are uploaded to the UDVM memory at the remote endpoint as described in Section 9.4.9. RFC-3321] for further details). The compressor must also ensure that the state item it wishes to access has not been rejected due to a lack of state memory. This can be accomplished by checking the state_memory_size parameter using the SigComp feedback mechanism (see Section 9.4.9 for further details).
If a compression failure occurs then the compressor informs the dispatcher and takes no further action. The dispatcher MUST report this failure to the application so that it can try other methods to deliver the message.
UDVM can also request the creation of a new state item by using the STATE-CREATE and END-MESSAGE instructions (see Chapter 9 for further details). Section 9.4.9. 3. If the state creation request contains a state_identifier that already exists then the state handler checks whether the requested state item is identical to the established state item and counts the state creation request as successful if this is the case. If not then the state creation request is unsuccessful (although the probability that this will occur is vanishingly small).
4. If the state creation request exceeds the state memory allocated to the compartment, sufficient items of state created by the same compartment are freed until enough memory is available to accommodate the new state. When a state item is freed, it is removed from the list of states created by the compartment and the memory cost of the state item no longer counts towards the total cost for the compartment. Note, however, that identical state items may be created by several different compartments, so a state item must not be physically deleted unless the state handler determines that it is no longer required by any compartment. 5. The order in which the existing state items are freed is determined by the state_retention_priority, which is set when the state items are created. The state_retention_priority of 65535 is reserved for locally available states; these states must always be freed first. Apart from this special case, states with the lowest state_retention_priority are always freed first. In the event of a tie, then the state item created first in the compartment is also the first to be freed. The state_retention_priority is always stored on a per-compartment basis as part of the list of state items created by each compartment. In particular, the same state item might have several priority values if it has been created by several different compartments. Note that locally available state items (as described in Section 3.3.3) need not be mapped to any particular compartment. However, if they are created on a per-compartment basis, then they must not interfere with the state created at the request of the remote endpoint. The special state_retention_priority of 65535 is reserved for locally available state items to ensure that this is the case. The UDVM may also explicitly request the state handler to free a specific state item in a compartment. In this case, the state handler deletes the state item from the list of state items created by the compartment (as before the state item itself must not be physically deleted unless the state handler determines that it is not longer required by any compartment). The application should indicate to the state handler when it wishes to close a particular compartment, so that the resources taken by the corresponding state can be reclaimed.
A SigComp message takes one of two forms depending on whether it accesses a state item at the receiving endpoint. The two variants of a SigComp message are given in Figure 3. (The T-bit controls the format of the returned feedback item and is defined in Section 7.1.) 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | 1 1 1 1 1 | T | len | | 1 1 1 1 1 | T | 0 | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | | | : returned feedback item : : returned feedback item : | | | | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | | code_len | : partial state identifier : +---+---+---+---+---+---+---+---+ | | | code_len | destination | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | | | : remaining SigComp message : : uploaded UDVM bytecode : | | | | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | : remaining SigComp message : | | +---+---+---+---+---+---+---+---+ Figure 3: Format of a SigComp message Decompression failure occurs if the SigComp message is too short to contain the expected fields (see Section 8.7 for further details). The fields except for the "remaining SigComp message" are referred to as the "SigComp header" (note that this may include the uploaded UDVM bytecode). Figure 4.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | 0 | returned_feedback_field | | 1 | returned_feedback_length | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | : returned_feedback_field : | | +---+---+---+---+---+---+---+---+ Figure 4: Format of returned feedback item Note that the returned feedback length specifies the size of the returned feedback field (from 0 to 127 bytes). So the total size of the returned feedback item lies between 1 and 128 bytes. The returned feedback item is not copied to the UDVM memory; instead, it is buffered until the UDVM has successfully decompressed the SigComp message. It is then forwarded to the state handler with the rest of the feedback data (see Section 9.4.9 for further details). Section 3.3.3, but it is possible to transmit as few as 6 bytes from the identifier if the sender believes that this is sufficient to match a unique state item at the receiving endpoint. The len field encodes the number of transmitted bytes as follows: Encoding: Length of partial state identifier 01 6 bytes 10 9 bytes 11 12 bytes The partial state identifier is passed to the state handler, which compares it with the most significant bytes of the state_identifier in every currently stored state item. Decompression failure occurs if no state item is matched or if more than one state item is matched.
Decompression failure also occurs if exactly one state item is matched but the state item contains a minimum_access_length greater than the length of the partial state identifier. This prevents especially sensitive state items from being accessed maliciously by brute force guessing of the state_identifier. If a state item is successfully accessed then the state_value byte string is copied into the UDVM memory beginning at state_address. The first 32 bytes of UDVM memory are then initialized to special values as illustrated in Figure 5. 0 7 8 15 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | UDVM_memory_size | 0 - 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cycles_per_bit | 2 - 3 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SigComp_version | 4 - 5 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | partial_state_ID_length | 6 - 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | state_length | 8 - 9 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : reserved : 10 - 31 | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5: Initializing Useful Values in UDVM memory The first five 2-byte words are initialized to contain some values that might be useful to the UDVM bytecode (Useful Values). Note that these values are for information only and can be overwritten when executing the UDVM bytecode without any effect on the endpoint. The MSBs of each 2-byte word are stored preceding the LSBs. Addresses 0 to 5 indicate the resources available to the receiving endpoint. The UDVM memory size is expressed in bytes modulo 2^16, so in particular, it is set to 0 if the UDVM memory size is 65536 bytes. The cycles_per_bit is expressed as a 2-byte integer taking the value 16, 32, 64 or 128. The SigComp_version is expressed as a 2-byte value as per Section 3.3.2. Addresses 6 to 9 are initialized to the length of the partial state identifier, followed by the state_length from the retrieved state item. Both are expressed as 2-byte values.
Addresses 10 to 31 are reserved and are initialized to 0 for Version 0x01 of SigComp. Future versions of SigComp can use these locations for additional Useful Values, so a decompressor MUST NOT rely on these values being zero. Any remaining addresses in the UDVM memory that have not yet been initialized MUST be set to 0. The UDVM then begins executing instructions at the memory address contained in state_instruction (which is part of the retrieved item of state). Note that the remaining SigComp message is held by the decompressor dispatcher until requested by the UDVM. (Note that the Useful Values are only set at UDVM startup; there is no special significance to this memory area afterwards. This means that the UDVM bytecode is free to use these locations for any other purpose a memory location might be used for; it just has to be aware they are not necessarily initialized to zero.)
The UDVM memory is initialized as per Figure 5, except that addresses 6 to 9 inclusive are set to 0 because no state item has been accessed. The UDVM then begins executing instructions at the memory address specified by the destination field. As above, the remaining SigComp message is held by the decompressor dispatcher until needed by the UDVM. Figure 6 gives a detailed view of the interfaces between the UDVM and its environment.
+----------------+ +----------------+ | | Request compressed data | | | |-------------------------------->| | | |<--------------------------------| | | | Provide compressed data | | | | | | | | Output decompressed data | Decompressor | | |-------------------------------->| dispatcher | | | | | | | Indicate end of message | | | |-------------------------------->| | | |<--------------------------------| | | UDVM | Provide compartment identifier | | | | +----------------+ | | | | +----------------+ | | Request state information | | | |-------------------------------->| | | |<--------------------------------| | | | Provide state information | State | | | | handler | | | Make state creation request | | | |-------------------------------->| | | | Forward feedback information | | +----------------+ +----------------+ Figure 6: Interfaces between the UDVM and its environment Note that once the UDVM has been initialized, additional compressed data and state information are only provided at the request of a specific UDVM instruction. This chapter describes the basic features of the UDVM including the UDVM registers and the format of UDVM bytecode. Figure 7.
0 7 8 15 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | byte_copy_left | 64 - 65 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | byte_copy_right | 66 - 67 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | input_bit_order | 68 - 69 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | stack_location | 70 - 71 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 7: Memory addresses of the UDVM registers The MSBs of each register are always stored before the LSBs. So, for example, the MSBs of byte_copy_left are stored at Address 64 whilst the LSBs are stored at Address 65. The use of each UDVM register is defined in the following sections. (Note that the UDVM registers start at Address 64, that is 32 bytes after the area reserved for Useful Values. The intention is that the gap, i.e., the area between Address 32 and Address 63, will often be used as scratch-pad memory that is guaranteed to be zero at UDVM startup and is efficiently addressable in operand types reference ($) and multitype (%).)
The P-bit controls the order in which bits are passed from the dispatcher to the INPUT instructions. If set to 0, it indicates that the bits within an individual byte are passed to the INPUT instructions in MSB to LSB order. If it is set to 1, the bits are passed in LSB to MSB order. Note that the input_bit_order register cannot change the order in which the bytes themselves are passed to the INPUT instructions (bytes are always passed in the same order as they occur in the SigComp message). The following diagram illustrates the order in which bits are passed to the INPUT instructions for both cases: MSB LSB MSB LSB MSB LSB MSB LSB +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 2 3 4 5 6 7|8 9 ... | |7 6 5 4 3 2 1 0| ... 9 8| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Byte 0 Byte 1 Byte 0 Byte 1 P = 0 P = 1 Note that after one or more INPUT instructions the dispatcher may hold a fraction of a byte (what used to be the LSBs if P = 0, or, the MSBs, if P = 1). If an INPUT instruction is encountered and the P- bit has changed since the last INPUT instruction, any fraction of a byte still held by the dispatcher MUST be discarded (even if the INPUT instruction requests zero bits). The first bit passed to the INPUT instruction is taken from the subsequent byte. When an INPUT instruction requests n bits of compressed data, it interprets the received bits as an integer between 0 and 2^n - 1. The F-bit and the H-bit specify whether the bits in these integers are considered to arrive in MSB to LSB order (bit set to 0) or in LSB to MSB order (bit set to 1). If the F-bit is set to 0, the INPUT-BITS instruction interprets the received bits as arriving MSBs first, and if it is set to 1, it interprets the bits as arriving LSBs first. The H-bit performs the same function for the INPUT-HUFFMAN instruction. Note that it is possible to set these two bits to different values in order to use different bit orders for the two instructions (certain algorithms actually require this, e.g., DEFLATE [RFC-1951]). (Note that there are no special considerations for changing the F- or H-bit between INPUT instructions, unlike the discard rule for the P-bit described above.)
Decompression failure occurs if an INPUT-BITS or an INPUT-HUFFMAN instruction is encountered and the input_bit_order register does not lie between 0 and 7 inclusive.
To reduce the size of typical UDVM bytecode, each operand for a UDVM instruction is compressed using variable-length encoding. The aim is to store more common operand values using fewer bytes than rarely occurring values. Four different types of operand are available: the literal, the reference, the multitype and the address. Chapter 9 gives a complete list of UDVM instructions and the operand types that follow each instruction. The UDVM bytecode for each operand type is illustrated in Figure 8 to Figure 10, together with the integer values represented by the bytecode. Note that the MSBs in the bytecode are illustrated as preceding the LSBs. Also, any string of bits marked with k consecutive "n"s is to be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the MSBs of n illustrated as preceding the LSBs). The decoded integer value of the bytecode can be interpreted in two ways. In some cases it is taken to be the actual value of the operand. In other cases it is taken to be a memory address at which the 2-byte operand value can be found (MSBs found at the specified address, LSBs found at the following address). The latter cases are denoted by memory[X] where X is the address and memory[X] is the 2- byte value starting at Address X. The simplest operand type is the literal (#), which encodes a constant integer from 0 to 65535 inclusive. A literal operand may require between 1 and 3 bytes depending on its value. Bytecode: Operand value: Range: 0nnnnnnn N 0 - 127 10nnnnnn nnnnnnnn N 0 - 16383 11000000 nnnnnnnn nnnnnnnn N 0 - 65535 Figure 8: Bytecode for a literal (#) operand The second operand type is the reference ($), which is always used to access a 2-byte value located elsewhere in the UDVM memory. The bytecode for a reference operand is decoded to be a constant integer from 0 to 65535 inclusive, which is interpreted as the memory address containing the actual value of the operand.
Bytecode: Operand value: Range: 0nnnnnnn memory[2 * N] 0 - 65535 10nnnnnn nnnnnnnn memory[2 * N] 0 - 65535 11000000 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 9: Bytecode for a reference ($) operand Note that the range of a reference operand is always 0 - 65535 independently of how many bits are used to encode the reference, because the operand always references a 2-byte value in the memory. The third kind of operand is the multitype (%), which can be used to encode both actual values and memory addresses. The multitype operand also offers efficient encoding for small integer values (both positive and negative) and for powers of 2. Bytecode: Operand value: Range: 00nnnnnn N 0 - 63 01nnnnnn memory[2 * N] 0 - 65535 1000011n 2 ^ (N + 6) 64 , 128 10001nnn 2 ^ (N + 8) 256 , ... , 32768 111nnnnn N + 65504 65504 - 65535 1001nnnn nnnnnnnn N + 61440 61440 - 65535 101nnnnn nnnnnnnn N 0 - 8191 110nnnnn nnnnnnnn memory[N] 0 - 65535 10000000 nnnnnnnn nnnnnnnn N 0 - 65535 10000001 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 10: Bytecode for a multitype (%) operand The fourth operand type is the address (@). This operand is decoded as a multitype operand followed by a further step: the memory address of the UDVM instruction containing the address operand is added to obtain the correct operand value. So if the operand value from Figure 10 is D then the actual operand value of an address is calculated as follows: operand_value = (memory_address_of_instruction + D) modulo 2^16 Address operands are always used in instructions that control program flow, because they ensure that the UDVM bytecode is position- independent code (i.e., it will run independently of where it is placed in the UDVM memory).
Section 8.7). To ensure that a SigComp message cannot consume excessive processing resources, SigComp limits the number of "UDVM cycles" allocated to each message. The number of available UDVM cycles is initialized to 1000 plus the number of bits in the SigComp header (as described in Section 7); this sum is then multiplied by cycles_per_bit. Each time an instruction is executed the number of available UDVM cycles is decreased by the amount specified in Chapter 9. Additionally, if the UDVM successfully requests n bits of compressed data using one of the INPUT instructions then the number of available UDVM cycles is increased by n * cycles_per_bit once the instruction has been executed. This means that the maximum number of UDVM cycles available for processing an n-byte SigComp message is given by the formula: maximum_UDVM_cycles = (8 * n + 1000) * cycles_per_bit The reason that this total is not allocated to the UDVM when it is invoked is that the UDVM can begin to decompress a message that has only been partially received. So the total message size may not be known when the UDVM is initialized. Note that the number of UDVM cycles MUST NOT be increased if a request for additional compressed data fails. The UDVM stops executing instructions when it encounters an END- MESSAGE instruction or if decompression failure occurs (see Section 8.7 for further details).
Reasons for decompression failure include the following: 1. A SigComp message contains an invalid header as per Chapter 7. 2. A SigComp message is larger than the decompression_memory_size. 3. An instruction costs more than the number of remaining UDVM cycles. 4. The UDVM attempts to read from or write to a memory address beyond its memory size. 5. An unknown instruction is encountered. 6. An unknown operand is encountered. 7. An instruction is encountered that cannot be processed successfully by the UDVM (for example a RETURN instruction when no CALL instruction has previously been encountered). 8. A request to access some state information fails. 9. A manual decompression failure is triggered using the DECOMPRESSION-FAILURE instruction. If a decompression failure occurs when decompressing a message then the UDVM informs the dispatcher and takes no further action. It is the responsibility of the dispatcher to decide how to cope with the decompression failure. In general a dispatcher SHOULD discard the compressed message (or the compressed stream if the transport is stream-based) and any decompressed data that has been outputted but not yet passed to the application.