This clause introduces the concepts and mechanisms involved in the compression and decompression of a stream of data.
Central to the compression of a stream of data and the subsequent recovery of the original data is the that both sender and receiver have information that not only describes the content of the data stream, but how the stream is encoded.
For example, a simple rule such as "it's 8 bit data" is enough to transport any character value in the range 0 to 255 with 8 bits being required for each and every character. In contrast if both sender and receive know that some characters are more frequent than others, then the more frequent might be encoded in fewer bits while the less frequent in more - resulting in a net reduction of the total number of bits used to express the data stream.
This knowledge of the nature of the data stream can be established in two ways. Either both sender and receiver can agree some key aspects of the data stream prior to it being processed or key aspects of the data can be garnered dynamically during its processing.
The disadvantage of an approach based on "prior information" is that it must be known. It can either be carried as a header to the data stream, in which case it adds to the net size of the compressed stream. Or it can be fixed and known to the (de)compression algorithm itself in which case compression performance degrades as a given stream diverges in nature from these fixed and known states. In contrast, the disadvantage of "dynamic information" is that it must be discovered; typically this means a greater processing requirement for the (de)compressor. It also implies that compression performance is initially poor as the algorithm has to "learn" about the data stream before it can apply this knowledge. It will also require greater working memory to store its knowledge about the data stream.
The choice of compression algorithms is always a balancing of compression rate (in terms of fewer output bits), working memory requirements of the (de)compressor and CPU bandwidth. For the compression of SMS messages, there is the additional requirement that it should work well (in terms of compression rate) even on short data streams.
Compression / Decompression is an optional feature but when implemented, the only mandatory requirement is 'Raw Untrained Dynamic Huffman' . The default initialisation for the Huffman Encoder / Decoder operating in the Raw Untrained Dynamic Huffman mode are defined in annex R. (See also subclause 4.1.)
i.e. There is no need for any pre-defined attributes such as language dependency to be included. This is of particular significance for entities such as an MS which may have memory storage constraints.
The present document introduces the concepts and mechanisms involved in the compression and decompression of a stream of data.
The following documents contain provisions which, through reference in this text, constitute provisions of the present document.
References are either specific (identified by date of publication, edition number, version number, etc.) or non specific.
For a specific reference, subsequent revisions do not apply.
For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same Release as the present document.
For the purposes of the present document, the following abbreviations apply.
Compressed Data Stream
Compressed Data Stream Length
Character Group ID
Compression Language Context
Huffman initialization ID
Keyword Dictionary ID