Audio Coding
C.1.4 SMPTE 302M: Mapping of AES3 Data into an MPEG-2 Transport Stream
Although not itself a coding standard, SMPTE 302M defines a method of carrying
AES3 uncompressed audio streams in an MPEG-2 transport stream. The AES3
stream can contain non-audio data as well as uncompressed audio, so this
mechanism can be used to carry either linear PCM audio or Dolby E data.
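As a rough sizing aid, the raw payload of linear PCM audio carried this way can be estimated from the channel count, word length and sample rate. The sketch below is illustrative only: the function name is hypothetical, 48 kHz sampling is assumed, and the AES3 status bits and all PES/transport stream overhead are ignored, so the real stream rate will be higher.

```python
def pcm_payload_bps(channels, bits_per_sample, sample_rate_hz=48_000):
    """Raw linear PCM payload in bits per second (no container overhead)."""
    return channels * bits_per_sample * sample_rate_hz

# One AES3 pair (2 channels) of 20-bit audio at the assumed 48 kHz rate.
print(pcm_payload_bps(2, 20))  # 1920000
```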
C.1.5 Dolby E
Dolby E was developed by Dolby Laboratories. It allows up to 8 channels of
compressed audio to be distributed over an existing 2-channel digital infrastructure.
The compression applied is less than that for consumer codecs (e.g. Dolby Digital),
so the quality is higher and the audio can tolerate being decoded and re-encoded
several times. The Dolby E stream can also include metadata and timecode.
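To see why the compression can be lighter than in a consumer codec, it helps to work out how much capacity each coded channel could have at most. The sketch below assumes a 2-channel carrier with 20-bit words at 48 kHz and assumes the whole capacity is shared equally by 8 coded channels; these figures and the function name are illustrative assumptions, and a real Dolby E stream also has to fit metadata, timecode and framing into the same space, so the result is an upper bound.

```python
def per_channel_budget_kbps(carrier_channels=2, word_bits=20,
                            sample_rate_hz=48_000, coded_channels=8):
    """Upper bound on the bit rate available to each coded channel, in kbps."""
    carrier_bps = carrier_channels * word_bits * sample_rate_hz
    return carrier_bps / coded_channels / 1000

print(per_channel_budget_kbps())  # 240.0
```

Under these assumptions each channel has several times the 64 kbps per channel quoted for AAC in C.1.6.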
The Dolby E frame duration is either equal to or double the duration of a video
frame: for interlaced formats it matches the video frame duration, and for progressive
formats it is double the video frame duration. Dolby E frames are generally aligned to
video frames, which makes editing video and audio together in the digital domain easier.
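The frame duration rule above can be written as a small calculation. This is a minimal sketch that simply applies the rule as stated; the function names are hypothetical and the 48 kHz sample rate is an assumption. Exact fractions are used so that rates such as 30000/1001 frames per second do not pick up rounding error.

```python
from fractions import Fraction

def dolby_e_frame_duration(video_fps, progressive):
    """Dolby E frame duration in seconds, per the rule stated in the text.

    One video frame for interlaced formats, two video frames for
    progressive formats.
    """
    video_frame = 1 / Fraction(video_fps)
    return 2 * video_frame if progressive else video_frame

def samples_per_dolby_e_frame(video_fps, progressive, sample_rate_hz=48_000):
    """Audio samples spanned by one Dolby E frame (may be fractional)."""
    return sample_rate_hz * dolby_e_frame_duration(video_fps, progressive)

print(float(dolby_e_frame_duration(25, progressive=False)))  # 0.04
print(float(dolby_e_frame_duration(50, progressive=True)))   # 0.04
print(samples_per_dolby_e_frame(25, progressive=False))      # 1920
```

For a 30000/1001 frames per second interlaced format, pass `Fraction(30000, 1001)`; the sample count per frame is then not a whole number.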
C.1.6 AAC (Advanced Audio Coding)
AAC was designed without backwards compatibility with earlier MPEG audio coders
so that it could achieve high audio quality at a rate of 64 kbps/channel for 5.1 systems.
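The 64 kbps/channel figure can be put in context with some simple arithmetic. The sketch below makes two assumptions that are not in the text: only the 5 full-bandwidth channels of a 5.1 system are counted (the low-frequency channel is ignored), and the uncompressed reference is 16-bit PCM at 48 kHz. The function names are illustrative.

```python
def aac_total_kbps(full_channels, kbps_per_channel=64):
    """Total bit rate when every full-bandwidth channel gets the same rate."""
    return full_channels * kbps_per_channel

def compression_ratio(kbps_per_channel=64, bits_per_sample=16,
                      sample_rate_hz=48_000):
    """Ratio of the assumed uncompressed PCM rate to the coded rate."""
    pcm_kbps = bits_per_sample * sample_rate_hz / 1000
    return pcm_kbps / kbps_per_channel

print(aac_total_kbps(5))    # 320
print(compression_ratio())  # 12.0
```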
AAC consists of several tools other than those shown in the basic model:
• Pre-processing – the signal is split into 4 equally sized frequency bands and
the level of each band is adjusted.
• Filter bank – an MDCT filter bank is used.
• Temporal Noise Shaping (TNS) – pre-echo removal.
• Intensity stereo coding / coupling stereo coding.
• Prediction – coding of the intensity difference between the previous and
current frames.
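The filter bank item in the list above can be illustrated with a direct, unoptimised MDCT. The sketch below is not the AAC filter bank itself: the block size of 8 coefficients, the sine window and the function names are all assumptions chosen to keep the example small, and a real encoder would use a fast algorithm and much longer blocks. It shows the property that makes the MDCT useful here: blocks overlap by 50%, each block of 2N samples produces only N coefficients, and overlap-adding the inverse transforms cancels the time-domain aliasing so the signal is recovered.

```python
import math

def sine_window(n_half):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    length = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def _basis(n, k, n_half):
    """MDCT cosine basis for time index n and frequency index k."""
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))

def mdct(block, window):
    """Window 2N time samples and transform them to N coefficients."""
    n_half = len(block) // 2
    return [
        sum(window[n] * block[n] * _basis(n, k, n_half)
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct(coeffs, window):
    """Transform N coefficients back to 2N windowed, time-aliased samples."""
    n_half = len(coeffs)
    return [
        window[n] * (2.0 / n_half)
        * sum(coeffs[k] * _basis(n, k, n_half) for k in range(n_half))
        for n in range(2 * n_half)
    ]

def round_trip(signal, n_half=8):
    """MDCT each 50%-overlapped block, invert it, and overlap-add.

    n_half (N) is assumed to be even. The signal is zero-padded so that
    every input sample is covered by exactly two blocks.
    """
    window = sine_window(n_half)
    tail = (-len(signal)) % n_half
    padded = [0.0] * n_half + list(signal) + [0.0] * (tail + n_half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        block = padded[start:start + 2 * n_half]
        for n, value in enumerate(imdct(mdct(block, window), window)):
            out[start + n] += value
    return out[n_half:n_half + len(signal)]

signal = [math.sin(0.3 * i) for i in range(40)]
error = max(abs(a - b) for a, b in zip(signal, round_trip(signal)))
print(error < 1e-9)  # True
```

In a codec the coefficients would be quantised between `mdct` and `imdct`, which is where the loss is introduced; here they are passed through unchanged, so the reconstruction is exact to within floating-point error.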
There are three profiles (or versions) available:
• Main (MP) - includes all of the tools that improve encoding efficiency.
• Low Complexity (LC) - used for broadcast. The pre-processing and prediction
tools are discarded and the TNS complexity is reduced. Disallowing some tools
and restricting others in this way enables the algorithm to fit into the
broadcast space.
• Scalable Sample Rate (SSR) - maximises temporal resolution (getting the high
frequency sounds at the right time) at the expense of coding efficiency.