Input files read by the encoder and output files written by the decoder contain one 16-bit integer word per data sample. The byte order within each word depends on the host architecture (e.g. least significant byte first on PCs). Both the encoder and the decoder programs process complete frames corresponding to multiples of 20 ms. The remaining samples are discarded.
The encoder pads the last frame to an integer multiple of 20 ms, i.e. n speech frames are produced from an input file whose length lies in the interval [(n-1)*20 ms + 1 sample; n*20 ms]. The files produced by the decoder always have a length of n*20 ms.
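The padding rule above amounts to a ceiling division of the input length by the frame length. The following is a minimal sketch in Python; the function names and the explicit sampling-rate parameter are assumptions made for illustration and are not part of the encoder program.

```python
FRAME_MS = 20  # frame duration stated in the text


def samples_per_frame(sample_rate_hz: int) -> int:
    """Number of samples in one 20 ms frame at the given sampling rate."""
    return sample_rate_hz * FRAME_MS // 1000


def frames_produced(num_samples: int, sample_rate_hz: int) -> int:
    """Frames produced when the last partial frame is padded (ceiling division)."""
    frame_len = samples_per_frame(sample_rate_hz)
    return -(-num_samples // frame_len)


def padded_length(num_samples: int, sample_rate_hz: int) -> int:
    """Length in samples after padding to a whole number of frames."""
    return frames_produced(num_samples, sample_rate_hz) * samples_per_frame(sample_rate_hz)
```

With this sketch, any input length from (n-1) frames plus one sample up to exactly n frames maps to n frames, which matches the interval given above.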
The encoder program can optionally read in a rate switching profile file which specifies the encoding bitrate for each frame of the input data. The rate switching profile is a binary file, generated by the 'gen-rate-profile' tool, which is part of STL 2009, as contained in ITU-T G.191 [10]. The rate switching profile contains 32-bit integer words, where each word represents the encoding bitrate for one particular frame. The rate switching profile is recycled if it contains fewer entries than the total number of frames in the input file. The rate switching profile can contain EVS primary mode bitrates and AMR-WB IO mode bitrates in arbitrary order, i.e. switching between the two modes can be specified by the rate switching profile.
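The recycling behaviour can be illustrated with a short Python sketch. It assumes signed 32-bit words in little-endian byte order (the text does not state the byte order or signedness), and both function names are hypothetical.

```python
import struct
from itertools import cycle, islice


def read_rate_profile(data: bytes, byteorder: str = "<") -> list:
    """Decode a profile into a list of bitrates, one per 32-bit word.

    byteorder "<" (little-endian) is an assumption, not taken from the text.
    """
    if len(data) % 4 != 0:
        raise ValueError("profile length is not a multiple of 4 bytes")
    count = len(data) // 4
    return list(struct.unpack(f"{byteorder}{count}i", data))


def bitrate_per_frame(profile: list, num_frames: int) -> list:
    """Assign one bitrate to each frame, restarting the profile when it runs out."""
    if not profile:
        raise ValueError("empty rate switching profile")
    return list(islice(cycle(profile), num_frames))
```

A profile with three entries applied to five frames therefore yields the three entries followed by the first two again.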
The files produced by the speech/audio encoder and expected by the speech decoder contain an arbitrary number of frames in the following available formats.
The fields have the following size and meaning:
- Packet size: 32 bit unsigned integer (= 12 + 2 + DATA_LENGTH).
- Arrival time: 32 bit unsigned integer in ms.
- RTP header: 96 bits (see RFC 3550), including RTP timestamp and SSRC.
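As an illustration of the fields listed above, the following Python sketch walks through a sequence of such records. Several points are assumptions rather than statements from the text: the fields appear in the listed order, the packet size counts the bytes after the arrival time field, and the packet size and arrival time are stored in little-endian byte order. The RTP header is unpacked in network byte order as laid out in RFC 3550. The bytes following the RTP header are returned unparsed, and the names `Packet` and `read_packets` are hypothetical.

```python
import struct
from typing import NamedTuple

RTP_HEADER_LEN = 12  # 96 bits


class Packet(NamedTuple):
    arrival_time_ms: int
    sequence_number: int
    timestamp: int
    ssrc: int
    body: bytes  # the 2 + DATA_LENGTH bytes after the RTP header, left unparsed


def read_packets(data: bytes, byteorder: str = "<") -> list:
    """Split a byte string into packets; byteorder of the first two fields is assumed."""
    packets = []
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated packet size / arrival time fields")
        size, arrival = struct.unpack_from(f"{byteorder}II", data, pos)
        pos += 8
        if size < RTP_HEADER_LEN or pos + size > len(data):
            raise ValueError("invalid or truncated packet")
        # RFC 3550 fixed header: V/P/X/CC, M/PT, sequence number, timestamp, SSRC
        _vpxcc, _mpt, seq, ts, ssrc = struct.unpack_from("!BBHII", data, pos)
        body = data[pos + RTP_HEADER_LEN:pos + size]
        packets.append(Packet(arrival, seq, ts, ssrc, body))
        pos += size
    return packets
```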
The encoder program can optionally read in a bandwidth switching profile, which specifies the encoding bandwidth for each frame of speech processed. The file is a text file where each line contains "nb_frames B". B specifies the signal bandwidth, which is one of the four supported bandwidths, i.e. NB, WB, SWB or FB. "nb_frames" is an integer number of frames that specifies how long the accompanying signal bandwidth B remains active.
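A bandwidth switching profile in the "nb_frames B" format described above could be read as in the following Python sketch. The function names are hypothetical, and skipping blank lines, requiring a positive frame count and matching the bandwidth labels case-sensitively are assumptions, not requirements taken from the text.

```python
VALID_BANDWIDTHS = ("NB", "WB", "SWB", "FB")


def parse_bandwidth_profile(text: str) -> list:
    """Return a list of (nb_frames, bandwidth) pairs, one per non-blank line."""
    segments = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # assumption: blank lines are ignored
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'nb_frames B'")
        nb_frames, bandwidth = int(parts[0]), parts[1]
        if nb_frames <= 0 or bandwidth not in VALID_BANDWIDTHS:
            raise ValueError(f"line {lineno}: invalid entry {line!r}")
        segments.append((nb_frames, bandwidth))
    return segments


def bandwidth_per_frame(segments: list) -> list:
    """Expand the segments into one bandwidth label per frame."""
    return [bw for nb_frames, bw in segments for _ in range(nb_frames)]
```

For example, a file with the lines "2 WB" and "1 SWB" expands to WB for the first two frames and SWB for the third.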
The encoder program can optionally read in a configuration file which specifies the values of the FEC indicator p and the FEC offset o. The FEC indicator p is either LO or HI, and the FEC offset o is 2, 3, 5, or 7, expressed in number of frames. Each line of the configuration file contains the values of p and o separated by a space.
The channel-aware configuration file is meant to simulate channel feedback from a receiver to a sender: the decoder would generate FEC indicator and FEC offset values for receiver feedback that correspond to the current transmission channel characteristics. The encoder, when in the channel-aware mode, applies the FEC offset and FEC indicator and can thereby optimize the transmission.
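The line format of the channel-aware configuration file ("p o" per line) can be validated with a short Python sketch such as the one below. The function name is hypothetical, and ignoring blank lines is an assumption.

```python
VALID_INDICATORS = ("LO", "HI")
VALID_OFFSETS = (2, 3, 5, 7)


def parse_fec_config(text: str) -> list:
    """Return a list of (fec_indicator, fec_offset) pairs, one per non-blank line."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # assumption: blank lines are ignored
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'p o'")
        indicator, offset = parts[0], int(parts[1])
        if indicator not in VALID_INDICATORS or offset not in VALID_OFFSETS:
            raise ValueError(f"line {lineno}: invalid entry {line!r}")
        entries.append((indicator, offset))
    return entries
```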
The decoder can generate a JBM trace file with the -Tracefile switch as a by-product of the decoder operation in case of JBM operation (which is triggered with the -VOIP switch on the decoder side).
The trace file is a CSV file with semi-colon as separator. The trace file starts with one header line that contains the column names in the following order:
rtpSeqNo;rtpTs;rcvTime;playtime;active
For each played-out speech frame one entry is written to the trace file. The interval of the playtime values is usually 20 ms, but may differ, depending on the JBM operation. Each entry is a line in the trace file that contains values as specified in Table 2.
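A trace file with the header line given above can be read with a generic CSV parser. The Python sketch below checks the header and computes the intervals between consecutive playtime values. It assumes that playtime is a numeric value in ms, and the function names are hypothetical.

```python
import csv
import io

EXPECTED_COLUMNS = ["rtpSeqNo", "rtpTs", "rcvTime", "playtime", "active"]


def read_trace(text: str) -> list:
    """Parse the semicolon-separated trace into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


def playtime_intervals(rows: list) -> list:
    """Differences between consecutive playtime values (assumed numeric, in ms)."""
    times = [float(row["playtime"]) for row in rows]
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Intervals that deviate from 20 ms point to frames whose playout the JBM adjusted.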