is reprinted from the Fraunhofer
MPEG Audio Layer-3
In 1987, the Fraunhofer IIS-A started work on perceptual audio coding
within the EUREKA project EU147, Digital Audio Broadcasting (DAB). In
cooperation with the University of Erlangen (Prof. Dieter Seitzer), the
Fraunhofer IIS-A devised a very powerful algorithm that is standardized
as ISO-MPEG Audio Layer-3 (IS 11172-3 and IS 13818-3).
Without data reduction, digital audio signals typically consist
of 16-bit samples recorded at a sampling rate more than twice the actual
audio bandwidth (e.g. 44.1 kHz for Compact Discs). So you end up with
more than 1.4 Mbit (44,100 samples x 16 bits x 2 channels = 1,411,200 bits)
to represent just one second of stereo music in CD quality. By using MPEG
audio coding, you may shrink the original sound data from a CD by a factor
of 12 without perceptible loss of sound quality. Factors of 24 and even
more still maintain a sound quality that is significantly better than what
you get by just reducing the sampling rate and the resolution of your
samples. Basically, this is realized by perceptual coding techniques that
exploit the way the human ear perceives sound waves.
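The arithmetic behind these figures can be checked in a few lines. The following is a minimal sketch; the function names are hypothetical, and the CD parameters (44.1 kHz, 16 bits, two channels) are the ones quoted in the text.

```python
# CD parameters as quoted in the text above.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def raw_bitrate(sample_rate_hz=SAMPLE_RATE_HZ,
                bits_per_sample=BITS_PER_SAMPLE,
                channels=CHANNELS):
    """Bits needed for one second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def compressed_bitrate(factor):
    """Bitrate in bit/s after reducing the raw CD rate by `factor`."""
    return raw_bitrate() / factor


def reduction_factor(bitrate_bps):
    """How many times smaller `bitrate_bps` is than the raw CD rate."""
    return raw_bitrate() / bitrate_bps


if __name__ == "__main__":
    print(raw_bitrate())              # 1411200 bits per second
    print(compressed_bitrate(12))     # about 117.6 kbit/s for stereo
    print(compressed_bitrate(24))     # about 58.8 kbit/s for stereo
    print(reduction_factor(128_000))  # about 11
```

A factor of 12 therefore lands close to the 112..128 kbps range usually quoted for stereo Layer-3 material.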
Using MPEG audio, one may achieve a typical data reduction of
- 1:4 by Layer 1 (corresponds to 384 kbps for a stereo signal),
- 1:6 to 1:8 by Layer 2 (corresponds to 256..192 kbps for a stereo signal),
- 1:10 to 1:12 by Layer 3 (corresponds to 128..112 kbps for a stereo signal),
still maintaining the original CD sound quality.
By exploiting stereo effects and by limiting the audio bandwidth, the coding
schemes may achieve an acceptable sound quality at even lower bitrates.
MPEG Layer-3 is the most powerful member of the MPEG audio coding family.
For a given sound quality level, it requires the lowest bitrate - or for
a given bitrate, it achieves the highest sound quality.
Some typical performance data of MPEG Layer-3:
- 8 kbps *
- better than short-wave
- better than AM radio
- similar to FM radio
*) Fraunhofer uses a
non-ISO extension of MPEG Layer-3 for enhanced performance ("MPEG
In all international listening tests,
MPEG Layer-3 impressively proved its superior performance, maintaining
the original sound quality at a data reduction of 1:12 (around 64 kbit/s
per audio channel). If an application can tolerate a limited bandwidth of
around 10 kHz, reasonable sound quality for stereo signals can be achieved
even at a reduction of 1:24.
For the use of low bit-rate audio coding schemes in broadcast applications
at bitrates of 60 kbit/s per audio channel, the ITU-R recommends MPEG
Layer-3 (ITU-R doc. BS.1115).
The filter bank used in MPEG Layer-3 is a hybrid filter bank which consists
of a polyphase filter bank and a Modified Discrete Cosine Transform (MDCT).
This hybrid form was chosen for reasons of compatibility with its
predecessors, Layer-1 and Layer-2.
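To illustrate the MDCT stage on its own, here is a minimal sketch of a generic windowed MDCT with overlap-add reconstruction. It is not the Layer-3 hybrid filter bank: the polyphase stage is omitted, and the sine window, the block size of 18 coefficients and all function names are assumptions chosen for illustration. It shows that overlapping blocks of 2N samples yield only N coefficients each, yet the signal can be recovered once neighbouring blocks are added together.

```python
import math


def sine_window(half):
    """Sine window of length 2*half; satisfies w[n]^2 + w[n+half]^2 = 1."""
    length = 2 * half
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window):
    """Windowed MDCT: 2*half input samples -> half coefficients."""
    half = len(block) // 2
    shift = 0.5 + half / 2.0
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / half * (n + shift) * (k + 0.5))
            for n in range(2 * half))
        for k in range(half)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: half coefficients -> 2*half aliased samples."""
    half = len(coeffs)
    shift = 0.5 + half / 2.0
    return [
        window[n] * (2.0 / half)
        * sum(coeffs[k] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
              for k in range(half))
        for n in range(2 * half)
    ]


def analysis_synthesis(signal, half):
    """Transform 50%-overlapping blocks and overlap-add the inverse blocks."""
    window = sine_window(half)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        block = signal[start:start + 2 * half]
        restored = imdct(mdct(block, window), window)
        for n, value in enumerate(restored):
            output[start + n] += value
    return output


if __name__ == "__main__":
    half = 18  # illustrative block size (assumption)
    signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t)
              for t in range(half * 6)]
    restored = analysis_synthesis(signal, half)
    # Only samples covered by two blocks are fully reconstructed.
    worst = max(abs(restored[t] - signal[t])
                for t in range(half, len(signal) - half))
    print("largest reconstruction error:", worst)
```

The first and last half-blocks are covered by one block only, so they still contain aliasing; a real codec handles the stream edges separately.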
The perceptual model largely determines the quality of a given encoder
implementation. It either uses a separate filter bank or combines the
calculation of energy values (for the masking calculations) with the main
filter bank. The output of the perceptual model consists of values for
the masking threshold or the allowed noise for each coder partition. If
the quantization noise can be kept below the masking threshold, then the
compressed result should be indistinguishable from the original signal.
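The comparison between quantization noise and masking threshold can be expressed as a noise-to-mask ratio per partition. The sketch below assumes that noise and threshold are both given as energies per partition; the function names and the sample numbers are hypothetical, and computing the thresholds themselves (the actual perceptual model) is not shown.

```python
import math


def noise_to_mask_db(noise_energy, mask_threshold):
    """Noise-to-mask ratio in dB; negative values mean the noise is masked."""
    return 10.0 * math.log10(noise_energy / mask_threshold)


def audible_partitions(noise, thresholds):
    """Indices of partitions whose noise exceeds the allowed noise."""
    return [i for i, (n, t) in enumerate(zip(noise, thresholds)) if n > t]


if __name__ == "__main__":
    noise = [0.5, 2.0, 0.1]        # hypothetical noise energies
    thresholds = [1.0, 1.0, 1.0]   # hypothetical allowed noise
    print(audible_partitions(noise, thresholds))
    print([round(noise_to_mask_db(n, t), 1)
           for n, t in zip(noise, thresholds)])
```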
Joint stereo coding takes advantage of the fact that both channels of
a stereo channel pair contain largely the same information. These
stereophonic irrelevancies and redundancies are exploited to reduce the
total bitrate. Joint stereo is used in cases where only low bitrates are
available but stereo signals are desired.
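One common way to exploit this redundancy is mid/side (sum and difference) coding. The text above does not name a specific technique, so the following is only an illustrative sketch with hypothetical function names: when left and right are similar, almost all of the energy ends up in the mid channel and the side channel needs very few bits.

```python
import math

SQRT2 = math.sqrt(2.0)


def ms_encode(left, right):
    """Convert left/right samples to mid (sum) and side (difference)."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode; the transform itself is lossless."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


if __name__ == "__main__":
    # Hypothetical stereo pair: the right channel is nearly a copy of the left.
    left = [math.sin(0.1 * t) for t in range(100)]
    right = [0.9 * l + 0.01 * math.cos(t) for t, l in enumerate(left)]
    mid, side = ms_encode(left, right)
    print("side/mid energy ratio:", energy(side) / energy(mid))
```

The scaling by the square root of two keeps the total energy of the channel pair unchanged, which makes the two representations directly comparable.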
Quantization and Coding
A system of two nested iteration loops is the common solution for quantization
and coding in a Layer-3 encoder.
Quantization is done via a power-law quantizer. In this way, larger values
are automatically coded with less accuracy and some noise shaping is already
built into the quantization process.
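The effect of a power-law quantizer can be seen in a small sketch. The exponent 3/4 is an assumption here (it is the value commonly associated with Layer-3), the rounding offset used by real encoders is omitted, and the function names are hypothetical. Because the reconstruction levels grow as the 4/3 power of the index, the spacing between neighbouring levels widens for large values, so large values are coded less accurately.

```python
import math


def quantize(value, step):
    """Power-law quantizer: compress the magnitude, then round to an integer."""
    return int(math.copysign(round((abs(value) / step) ** 0.75), value))


def dequantize(index, step):
    """Inverse mapping used by a decoder: expand the magnitude again."""
    return math.copysign(abs(index) ** (4.0 / 3.0) * step, index)


def level_spacing(index, step):
    """Distance between reconstruction levels `index` and `index + 1`."""
    return dequantize(index + 1, step) - dequantize(index, step)


if __name__ == "__main__":
    step = 1.0
    for index in (1, 10, 100):
        print(index, round(level_spacing(index, step), 2))
```

The printed spacing increases with the index, which is the built-in noise shaping mentioned above: the absolute error is allowed to grow where the signal is strong.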
The quantized values are coded by Huffman coding. As a specific method
for entropy coding, Huffman coding is lossless. It is therefore called
noiseless coding because no noise is added to the audio signal.
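The lossless property is easy to demonstrate with a generic Huffman coder. This sketch builds its code table from the data itself, which is an assumption made for brevity; a Layer-3 encoder selects from predefined tables instead. The function names and the sample values are hypothetical.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols):
    """Return a dict mapping each symbol to its bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table).
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, table1 = heapq.heappop(heap)
        count2, _, table2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in table1.items()}
        merged.update({sym: "1" + code for sym, code in table2.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of all symbols."""
    return "".join(code[sym] for sym in symbols)


def decode(bits, code):
    """Recover the symbols; works because Huffman codes are prefix-free."""
    inverse = {word: sym for sym, word in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    return decoded


if __name__ == "__main__":
    # Hypothetical quantized values: small magnitudes dominate.
    values = [0] * 20 + [1] * 8 + [-1] * 6 + [2] * 3 + [5]
    code = build_huffman_code(values)
    bits = encode(values, code)
    print(len(bits), "bits;", "lossless:", decode(bits, code) == values)
```

Frequent values receive short code words and rare ones long code words, while decoding returns exactly the quantized values that were encoded.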
The process to find the optimum gain and scalefactors for a given block,
bit-rate and output from the perceptual model is usually done by two nested
iteration loops in an analysis-by-synthesis way:
- Inner iteration loop (rate loop)
The Huffman code tables assign shorter code words to (more frequent)
smaller quantized values. If the number of bits resulting from the coding
operation exceeds the number of bits available to code a given block
of data, this can be corrected by adjusting the global gain to result
in a larger quantization step size, leading to smaller quantized values.
This operation is repeated with different quantization step sizes until
the resulting bit demand for Huffman coding is small enough. The loop
is called the rate loop because it modifies the overall coder rate until
it is small enough.
- Outer iteration loop (noise control/distortion loop)
To shape the quantization noise according to the masking threshold,
scalefactors are applied to each scalefactor band. The system starts
with a default factor of 1.0 for each band. If the quantization noise
in a given band is found to exceed the masking threshold (allowed noise)
as supplied by the perceptual model, the scalefactor for this band is
adjusted to reduce the quantization noise. Since achieving a smaller
quantization noise requires a larger number of quantization steps and
thus a higher bitrate, the rate adjustment loop has to be repeated every
time new scalefactors are used. In other words, the rate loop is nested
within the noise control loop. The outer (noise control) loop is executed
until the actual noise (computed from the difference of the original
spectral values minus the quantized spectral values) is below the masking
threshold for every scalefactor band (i.e. critical band).
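The two nested loops described above can be sketched as follows. This is a simplified illustration under stated assumptions, not an actual encoder: the bit cost function is a hypothetical stand-in for the Huffman tables, the step-size and scalefactor increments (2^(1/4) and 2^(1/2)) are assumed values, noise is measured as summed squared error per band, and all names are made up for the example.

```python
import math


def quantize(value, step):
    """Power-law quantizer (assumed exponent 3/4, no rounding offset)."""
    return int(math.copysign(round((abs(value) / step) ** 0.75), value))


def dequantize(index, step):
    """Inverse of the power-law mapping."""
    return math.copysign(abs(index) ** (4.0 / 3.0) * step, index)


def total_bits(quantized):
    """Hypothetical bit cost that grows with magnitude (not real Huffman)."""
    return sum(1 + 2 * abs(q).bit_length() for q in quantized)


def rate_loop(values, budget, step=1.0):
    """Inner loop: enlarge the step size until the block fits the budget."""
    if budget < len(values):
        raise ValueError("budget too small: every value costs at least 1 bit")
    while True:
        quantized = [quantize(v, step) for v in values]
        if total_bits(quantized) <= budget:
            return quantized, step
        step *= 2 ** 0.25  # assumed step-size increment


def band_noise(spectrum, quantized, step, gain, lo, hi):
    """Squared error between original and reconstructed values in one band."""
    return sum((spectrum[i] - dequantize(quantized[i], step) / gain) ** 2
               for i in range(lo, hi))


def noise_control_loop(spectrum, bands, allowed_noise, budget,
                       max_iterations=20):
    """Outer loop: amplify noisy bands, rerunning the rate loop each time."""
    gains = [1.0] * len(bands)  # default factor of 1.0 for each band
    for iteration in range(max_iterations):
        scaled = list(spectrum)
        for gain, (lo, hi) in zip(gains, bands):
            for i in range(lo, hi):
                scaled[i] = spectrum[i] * gain
        quantized, step = rate_loop(scaled, budget)
        noisy = [b for b, (lo, hi) in enumerate(bands)
                 if band_noise(spectrum, quantized, step, gains[b], lo, hi)
                 > allowed_noise[b]]
        # Stop when all bands pass, when every band is noisy (amplifying
        # all of them gains nothing), or when the iteration limit is hit.
        if (not noisy or len(noisy) == len(bands)
                or iteration == max_iterations - 1):
            return quantized, step, gains, not noisy
        for b in noisy:
            gains[b] *= 2 ** 0.5  # assumed scalefactor increment


if __name__ == "__main__":
    spectrum = [100.0, -80.0, 60.0, 40.0, 5.0, -4.0, 3.0, 2.0]
    bands = [(0, 4), (4, 8)]
    quantized, step, gains, ok = noise_control_loop(
        spectrum, bands, allowed_noise=[50.0, 1.0], budget=40)
    print(quantized, round(step, 3), gains, ok)
```

The final flag reports whether every band ended up below its allowed noise. With a tight bit budget this may not be achievable, which is why the sketch includes explicit stop conditions rather than looping until the noise target is met.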