This article is reprinted from the Fraunhofer website.

Basics about MPEG Perceptual Audio Coding


The purpose of audio compression

There is a lot of confusion surrounding the terms audio compression, audio encoding, and audio decoding. This section gives an overview of what audio coding (yet another of these terms...) is all about.
 
Before the advent of audio compression, high-quality digital audio data took a lot of hard disk space to store (or channel bandwidth to transmit).
 
Let us go through a short example. You want to sample your favorite 1-minute song and store it on your hard disk. Because you want CD quality, you sample at 44.1 kHz, stereo, with 16 bits per sample.

44,100 Hz means that you have 44,100 values per second coming in from your sound card (or input file). Multiply that by two because you have two channels. Multiply by another factor of two because you have two bytes per value (that's what 16 bits means). The song will take up

44,100 samples/s * 2 channels * 2 bytes/sample * 60 s = 10,584,000 bytes, or around 10 MBytes

of storage space on your hard disk. If you wanted to download that over the internet with an average 28.8 kbit/s modem, it would take you

10,584,000 bytes * 8 bits/byte / (28,800 bits/s * 60 s/min) = around 49 minutes.

Just to download one minute of stereo music!
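The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes uncompressed 16-bit PCM and an idealized modem that sustains its nominal 28.8 kbit/s with no protocol overhead; the helper names `pcm_bytes` and `download_minutes` are made up for illustration. Note that the 49-minute figure follows from the exact size of 10,584,000 bytes; rounding down to 10,000,000 bytes first would give about 46 minutes.

```python
def pcm_bytes(sample_rate_hz: int, channels: int, bytes_per_sample: int, seconds: float) -> float:
    """Size of uncompressed PCM audio in bytes."""
    return sample_rate_hz * channels * bytes_per_sample * seconds


def download_minutes(num_bytes: float, link_bits_per_s: float) -> float:
    """Transfer time in minutes over a link with a constant bit rate (no overhead assumed)."""
    return num_bytes * 8 / link_bits_per_s / 60


size = pcm_bytes(44_100, 2, 2, 60)          # one minute of CD-quality stereo
minutes = download_minutes(size, 28_800)    # idealized 28.8 kbit/s modem

print(f"{size:,.0f} bytes")                       # 10,584,000 bytes
print(f"{minutes:.1f} minutes at 28.8 kbit/s")    # 49.0 minutes at 28.8 kbit/s
```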

Digital audio coding, which in this context is also called digital audio compression, is the art of minimizing storage space (or channel bandwidth) requirements for audio data. Modern perceptual audio coding techniques (like MPEG Layer-3 or MPEG-2 AAC) exploit the properties of the human ear (the perception of sound) to achieve a size reduction by a factor of 12 with little or no perceptible loss of quality.
 
Therefore, such schemes are the key technology for high-quality, low-bitrate applications such as soundtracks for CD-ROM games, solid-state sound memories, Internet audio, digital audio broadcasting systems, and the like.

 

Compression ratios, bitrate and quality

One point has not been stated explicitly so far: what you end up with after encoding and decoding is not the same sound file anymore. All superfluous information has been squeezed out, so to speak (more precisely, the redundant and irrelevant parts of the sound signal). The reconstructed WAVE file differs from the original WAVE file, but it will sound the same, more or less, depending on how much compression has been applied.
 
Because compression ratio is a somewhat unwieldy measure, experts use the term bitrate when speaking of the strength of compression. Bitrate denotes the average number of bits that one second of audio data consumes. The usual unit is kbps, i.e. kilobits per second, or 1000 bits/s.

For a digital audio signal from a CD, the bitrate is 1411.2 kbps. With MPEG-2 AAC, CD-like sound quality is achieved at 96 kbps.
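The CD bitrate follows directly from the sampling parameters of the earlier example (44.1 kHz, 2 channels, 16 bits per sample), and dividing it by a coded bitrate gives the compression factor. A small sketch, with the function name `pcm_bitrate_kbps` chosen for illustration only:

```python
def pcm_bitrate_kbps(sample_rate_hz: int, channels: int, bits_per_sample: int) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bits/s)."""
    return sample_rate_hz * channels * bits_per_sample / 1000


cd_kbps = pcm_bitrate_kbps(44_100, 2, 16)
print(f"CD audio: {cd_kbps:.1f} kbps")                 # CD audio: 1411.2 kbps

# Coded bitrate implied by a compression factor of 12 ...
print(f"factor 12 -> {cd_kbps / 12:.1f} kbps")         # factor 12 -> 117.6 kbps

# ... and the compression factor implied by AAC at 96 kbps.
print(f"AAC at 96 kbps -> factor {cd_kbps / 96:.1f}")  # AAC at 96 kbps -> factor 14.7
```

By this measure, the factor of 12 mentioned earlier corresponds to roughly 118 kbps, and AAC at 96 kbps compresses CD audio by a factor of about 14.7.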

 

How does it work?

Audio compression really consists of two parts. The first part, called encoding, transforms the digital audio data that resides, say, in a WAVE file into a highly compressed form called a bitstream (or coded audio data). To play the bitstream on your sound card, you need the second part, called decoding. Decoding takes the bitstream and reconstructs a WAVE file from it.
 
The highest coding efficiency is achieved with algorithms that exploit signal redundancies and irrelevancies in the frequency domain, based on a model of the human auditory system.
 
All of these coders use the same basic structure. The coding scheme can be described as "perceptual noise shaping" or "perceptual subband/transform coding". The encoder analyzes the spectral components of the audio signal by calculating a filterbank (transform) and applies a psychoacoustic model to estimate the just-noticeable noise level. In its quantization and coding stage, the encoder tries to allocate the available data bits so as to meet both the bitrate and the masking requirements.
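The bit allocation step can be illustrated with a greedy loop in the spirit of the allocation used in MPEG-1 Layers I and II. This is an illustrative assumption, not the procedure Layer-3 or AAC encoders use (those adjust quantizer step sizes in iteration loops). The sketch further assumes that each extra bit per sample improves the signal-to-noise ratio by about 6.02 dB, that the per-band signal-to-mask ratios (SMRs) are already known from a psychoacoustic model, and it counts the budget in abstract one-bit increments; `allocate_bits` and the SMR values are hypothetical.

```python
def allocate_bits(smr_db: list[float], budget: int, max_bits: int = 15) -> list[int]:
    """Greedily hand out `budget` one-bit increments across frequency bands.

    smr_db[i] is band i's signal-to-mask ratio in dB. With b bits per sample
    the quantization SNR is taken to be about 6.02*b dB, so the noise-to-mask
    ratio is NMR = SMR - 6.02*b. Each round, the band whose noise is most
    audible (largest NMR) receives one more bit.
    """
    bits = [0] * len(smr_db)

    def nmr(i: int) -> float:
        return smr_db[i] - 6.02 * bits[i]

    for _ in range(budget):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=nmr)
        if nmr(worst) <= 0:
            break  # noise is already below the mask in every band; stop spending bits
        bits[worst] += 1
    return bits


smr = [20.0, 5.0, -3.0, 12.0]           # hypothetical per-band SMRs in dB
print(allocate_bits(smr, budget=6))     # [3, 1, 0, 2]
print(allocate_bits(smr, budget=10))    # [4, 1, 0, 2]
```

With a budget of 6 increments the loop returns [3, 1, 0, 2]: band 2 is already masked and gets nothing, while band 0 still has audible noise because the budget ran out. With a larger budget the loop stops early once every band's noise lies below its mask.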
 
The decoder is much less complex. Its only task is to synthesize an audio signal out of the coded spectral components.
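To make the encoder/decoder split concrete, here is a toy transform coder in plain Python. It is a sketch under stated assumptions: an orthonormal block DCT on an 8-sample frame stands in for the filterbank (Layer-3 and AAC actually use overlapped, MDCT-based filterbanks), the per-coefficient step sizes are invented rather than derived from a psychoacoustic model, and no entropy coding is done. Each quantized coefficient is off by at most half its step size, and because the transform is orthonormal, the total error energy of the decoded frame is bounded by the sum of the squared half-step sizes. The decoder only rescales and inverse-transforms, which is why it is so much cheaper than the encoder.

```python
import math


def dct(x: list[float]) -> list[float]:
    """Orthonormal DCT-II; stands in for the encoder's filterbank."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        coeffs.append(s * math.sqrt((1 if k == 0 else 2) / n))
    return coeffs


def idct(c: list[float]) -> list[float]:
    """Inverse of dct() (orthonormal DCT-III); the decoder's synthesis step."""
    n = len(c)
    return [
        sum(c[k] * math.sqrt((1 if k == 0 else 2) / n)
            * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def encode(frame: list[float], steps: list[float]) -> list[int]:
    """Quantize coefficient k with step size steps[k] (error <= steps[k] / 2)."""
    return [round(c / q) for c, q in zip(dct(frame), steps)]


def decode(indices: list[int], steps: list[float]) -> list[float]:
    """Rescale the integer indices and synthesize the time-domain frame."""
    return idct([i * q for i, q in zip(indices, steps)])


# Hypothetical step sizes: fine for the low coefficients, coarse for the high
# ones, as if a masking model tolerated more noise there.
frame = [math.sin(2 * math.pi * i / 8) for i in range(8)]
steps = [0.05] * 4 + [0.5] * 4
indices = encode(frame, steps)
decoded = decode(indices, steps)
print(indices)
print(max(abs(a - b) for a, b in zip(frame, decoded)))
```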

 

 
   

Psychoacoustics

The term psychoacoustics describes the characteristics of the human auditory system on which modern audio coding technology is based.
 
The sensitivity of the human auditory system to audio signals is one of its most significant characteristics. It varies with frequency: for example, sensitivity is high for frequencies between 2.5 and 5 kHz and decreases above and below this band. The sensitivity is represented by the threshold in quiet. Any tone below this threshold will not be perceived.
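The threshold in quiet is often modeled with an analytic approximation attributed to Terhardt, which gives the threshold in dB SPL as a function of frequency. The sketch below uses that formula as an assumption; actual MPEG encoders use their own tabulated values, and the function names are illustrative. A scan over the audible range puts the formula's minimum, i.e. the most sensitive frequency, inside the 2.5 to 5 kHz band described above.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate threshold in quiet in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible_in_quiet(level_db: float, freq_hz: float) -> bool:
    """A lone tone is perceived only if it exceeds the threshold in quiet."""
    return level_db > threshold_in_quiet_db(freq_hz)


# Scan 20 Hz to 20 kHz in 10 Hz steps for the most sensitive frequency.
most_sensitive_hz = min(range(20, 20_001, 10), key=threshold_in_quiet_db)
print(most_sensitive_hz)               # a value between 3300 and 3400 Hz
print(is_audible_in_quiet(10, 3000))   # True
print(is_audible_in_quiet(10, 50))     # False: about 40 dB SPL is needed at 50 Hz
```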
 
The most important psychoacoustic fact is the masking effect of spectral sound elements in an audio signal, such as tones and noise. For every tone in the audio signal, a masking threshold can be calculated. If another tone lies below this masking threshold, it is masked by the louder tone and remains inaudible as well.

 

 
 

[Figure: masking]

These inaudible elements of an audio signal are irrelevant to human perception and can therefore be eliminated by the encoder.
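A very rough single-masker model can be written on the critical-band (Bark) scale, using the Zwicker-style approximation for converting Hz to Bark. Everything else here is a hypothetical simplification chosen for illustration: a fixed 10 dB offset below the masker and straight-line slopes of 27 dB/Bark toward lower frequencies and 10 dB/Bark toward higher ones (masking spreads further upward in frequency). Real psychoacoustic models use level-dependent spreading functions, distinguish tonal from noise-like maskers, and combine several maskers with the threshold in quiet.

```python
import math


def bark(freq_hz: float) -> float:
    """Zwicker-style approximation of the critical-band (Bark) scale."""
    return 13 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500) ** 2)


# Hypothetical, simplified spreading parameters (real models are level-dependent).
OFFSET_DB = 10.0     # threshold lies this far below the masker at its own frequency
SLOPE_BELOW = 27.0   # dB per Bark toward lower frequencies
SLOPE_ABOVE = 10.0   # dB per Bark toward higher frequencies


def masking_threshold_db(masker_hz: float, masker_db: float, probe_hz: float) -> float:
    """Masking threshold produced by one tone, evaluated at probe_hz."""
    dz = bark(probe_hz) - bark(masker_hz)
    slope = SLOPE_ABOVE if dz >= 0 else SLOPE_BELOW
    return masker_db - OFFSET_DB - slope * abs(dz)


def is_masked(masker_hz: float, masker_db: float, probe_hz: float, probe_db: float) -> bool:
    """True if the probe tone lies below the masker's threshold (inaudible)."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


print(is_masked(1000, 70, 1100, 40))  # True: quiet tone close to a loud one
print(is_masked(1000, 70, 1100, 60))  # False: loud enough to be heard
print(is_masked(1000, 70, 4000, 40))  # False: too far away in frequency
```

An encoder following this idea would simply not spend bits on components for which `is_masked` returns True.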
 
The quality of the psychoacoustic model used by an audio encoder is of prime importance for the audio quality of the coded and decoded signal. The audio coding schemes developed by Fraunhofer engineers are among the best in the world.

 
 
    Copyright ©1998-2001 Fraunhofer-Gesellschaft