Digital Audio Recording Systems: Video PCM formats


In 1985, a new consumer video format was launched, called Video 8. Sony expected this format to gradually replace the older Betamax and VHS consumer video formats. The Video 8 format uses a much smaller cassette than older video formats, enabling construction of very small video recorders. Almost all major manufacturers in the field of consumer electronics are supporting this new format.

Audio information can be recorded on Video 8 recorders as either an FM signal, along with the picture, or as a PCM signal written in a section of the tape where no picture information is recorded.

In some recorders, however, PCM data can be recorded in the video area, instead of the picture signal. By doing this, six channels of high-quality audio can be recorded on a tape.

The specification of the Video 8 PCM standard is summarized in Table 1.

Table 1 Specification of Video 8 format

FIG. 1 Video 8 cassette.

FIG. 2 Block diagram of Video 8 signal processing.

FIG. 3 Noise reduction system characteristics.

FIG. 4 Non-linear encoder characteristics.

A/D-D/A conversion

As only 8 bits per channel are used, audio characteristics would be poor if special measures were not taken. These measures include:

• audio compression and expansion for noise reduction purposes;

• 10-bit sampling;

• non-linear quantization by 10-bit to 8-bit compression and expansion.

These measures are illustrated in the block diagram of FIG. 2. The characteristics of the noise reduction (NR) system are shown in FIG. 3, while FIG. 4 shows the characteristics of the non-linear encoder. The frequency response of the system is limited to 15 625 Hz, i.e., half of the sampling frequency of 31 250 Hz.
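The idea behind 10-bit to 8-bit non-linear quantization can be sketched as a piecewise-linear compander: small magnitudes keep their full resolution, while larger magnitudes are quantized in coarser steps. The breakpoints and step sizes in `SEGMENTS` below are hypothetical and chosen only for illustration; they are not read from FIG. 4, and the function names are likewise assumptions.

```python
# Illustrative 10-bit -> 8-bit compander. The segment table is HYPOTHETICAL
# and is not the curve of FIG. 4.
# Each entry: (first input magnitude, step size, first output code)
SEGMENTS = [(0, 1, 0), (16, 2, 16), (80, 4, 48), (208, 8, 80)]


def compress(sample: int) -> int:
    """Map a signed 10-bit sample (-512..511) to a signed 8-bit code."""
    sign = -1 if sample < 0 else 1
    mag = min(abs(sample), 511)  # clip so that -512 stays in range
    for start, step, code0 in reversed(SEGMENTS):
        if mag >= start:
            return sign * (code0 + (mag - start) // step)


def expand(code: int) -> int:
    """Map a signed 8-bit code back to the lower edge of its 10-bit interval."""
    sign = -1 if code < 0 else 1
    mag = abs(code)
    for start, step, code0 in reversed(SEGMENTS):
        if mag >= code0:
            return sign * (start + (mag - code0) * step)
```

With this table, magnitudes below 16 pass through unchanged, and the round-trip error grows to at most 7 counts in the coarsest segment, which is the trade-off that makes an 8-bit word usable for a 10-bit signal.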

With these measures, typical audio characteristics of Video 8 PCM audio recordings are:

• frequency response: 20-15 000 Hz;

• dynamic range: more than 88 dB;

• sampling frequency: 31.25 kHz (PAL) or approximately 31.5 kHz (NTSC);

• quantization: 8-bit non-linear;

• wow and flutter: less than 0.005%.

Description of the format

One track on the section of tape used to record PCM audio information holds 157 blocks of data for a PAL machine and 132 blocks for an NTSC machine. Each block contains eight 8-bit data words, two 8-bit parity words (P and Q), one 16-bit error-detection word, one 8-bit address word and three sync bits. So, one block comprises 64 + 16 + 16 + 8 + 3 = 107 bits, and each track comprises 157 × 107 = 16 799 bits in PAL mode and 132 × 107 = 14 124 bits in NTSC mode.
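The block and track sizes follow directly from the field widths listed above. A minimal sketch of that arithmetic, in which the dictionary and function names are illustrative only:

```python
# Field widths in bits for one Video 8 PCM block, as listed in the text.
BLOCK_FIELDS = {
    "data": 8 * 8,     # eight 8-bit data words
    "parity": 2 * 8,   # P and Q parity words
    "crcc": 16,        # error-detection word
    "address": 8,      # address word
    "sync": 3,         # sync bits
}

# Blocks per PCM track for each television system, as given in the text.
BLOCKS_PER_TRACK = {"PAL": 157, "NTSC": 132}


def bits_per_block() -> int:
    """Total number of bits in one block."""
    return sum(BLOCK_FIELDS.values())


def bits_per_track(system: str) -> int:
    """Total number of bits in one PCM track for 'PAL' or 'NTSC'."""
    return BLOCKS_PER_TRACK[system] * bits_per_block()
```

Multiplying the block counts by the block length gives the per-track totals for both systems.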

Error-correction and detection words are added as shown in FIG. 5.

FIG. 5 Error-correction encoding prior to recording.

FIG. 6 Data interleaving in Video 8 PCM.

The error-correction code adopted for Video 8 PCM is a modified cross-interleaved code (MCIC) in which the code is composed of blocks which are related to the video fields. The version used is called improved MCIC, in which ICIC, the initial value necessary for parity calculation, can be any value and has numerous applications, such as identification words.

As eight audio data words are combined with two parity words, the Video 8 system is often called an 8w-2p coding system. A CRCC word is also added as an error detector.

In encoding, the sequence can be expressed as follows:

P-parity sequence

Q-parity sequence

... where n is the block number (0 < n < 157 for PAL recordings, and 0 < n < 132 for NTSC), D is the delay of the P-parity sequence which converts a burst error into random errors (17 for PAL, 1514 for NTSC) and d is the Q-parity sequence delay behind the P-parity sequence (three for PAL and 32 for NTSC).

The error-detection code is a 16-bit CRCC and its polynomial is given by:

In decoding, the pointer method is used, which corrects an erroneous word using a pointer flag.
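The pointer method can be illustrated with a deliberately simplified model. The sketch below assumes a single parity word formed as the bitwise XOR of eight data words in one sequence, and assumes that a failed CRCC check has already flagged (pointed at) the suspect word. The real Video 8 decoder works on the interleaved P and Q sequences with the delays D and d, which are not modelled here; all names are hypothetical.

```python
from functools import reduce
from operator import xor


def make_parity(words):
    """Parity word as the XOR of all data words (simplifying assumption)."""
    return reduce(xor, words, 0)


def correct_with_pointer(words, parity, flags):
    """Rebuild the single word whose pointer flag is set.

    words  -- received data words (ints 0..255)
    parity -- received parity word for this sequence
    flags  -- one bool per word, True where error detection flagged it
    """
    flagged = [i for i, bad in enumerate(flags) if bad]
    if not flagged:
        return list(words)
    if len(flagged) > 1:
        raise ValueError("one parity word can rebuild only one flagged word")
    i = flagged[0]
    # XOR of the parity with every trusted word yields the missing word.
    others = reduce(xor, (w for j, w in enumerate(words) if j != i), 0)
    fixed = list(words)
    fixed[i] = others ^ parity
    return fixed
```

Because the pointer says where the error is, the parity only has to supply the value, which is why one parity word suffices per flagged word. Having both P and Q sequences gives a second chance when one sequence contains more than one flagged word.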

The redundancy of the Video 8 format is as follows:

There are 8 × 8 = 64 audio data bits and (2 × 8) + 16 = 32 error-correction and detection bits, plus 1 × 8 = 8 address bits. So, redundancy R is:
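Taking R as the share of non-audio bits among the bits counted above (an assumption, as is leaving the three sync bits out of the total), the calculation can be written as:

```latex
R = \frac{32 + 8}{64 + 32 + 8} = \frac{40}{104} \approx 38.5\,\%
```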

Words are interleaved onto the PCM section of tape as shown in FIG. 6.

FIG. 7 Audio encoding modes.

Table 2 Audio modes available in DV format

DV-PCM format

From the mid-1990s, the analog 8 mm and Hi8 video formats were at first complemented, and have since been largely superseded, by digital video formats such as DV and D8. The audio encoding of these digital video formats is a logical evolution of the PCM formats used in their analog predecessors.

Recording format

In the DV recording format, one frame of video signal is recorded on 12 tracks for a PAL system and 10 tracks for NTSC. Each track consists of Insert and Track Information (ITI), Audio, Video and Sub Code.

The audio signal is recorded on two audio blocks, each of which consists of six consecutive tracks for a PAL set. For the NTSC set, each block consists of five consecutive tracks.

Encoding modes are defined in each audio block. They are classified by the type of sampling frequency, quantization and the number of channels in the audio block. Four types of audio mode are available in the DV format (Table 2).

FIG. 8 Audio channel allocation.

Table 3 Audio channels

FIG. 9 Audio frame modes.

Block and byte allocation

In the 48k, 44.1k and 32k modes, one audio channel signal is recorded in an audio block at its respective sampling frequency.

The encoded data are represented as 16-bit linear samples.

The 16-bit encoded data are divided into two bytes and form an SD-2ch audio block.

In the 32k-2ch mode, two channels of the audio signal are recorded in an audio block at a sampling frequency of 32 kHz. The encoded data are expressed as pairs of 12-bit non-linear samples.

Each pair of 12-bit encoded data is divided into three bytes and forms an SD-4ch audio block, as shown in FIG. 8.
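The two byte layouts described above can be sketched as follows. The byte order (high byte first) and the placement of the two low nibbles in the third byte are assumptions made for illustration; the text only states how many bytes each sample or pair of samples occupies. Function names are hypothetical.

```python
def split_16bit(sample: int) -> bytes:
    """Split one 16-bit two's-complement sample into two bytes, high byte first."""
    return (sample & 0xFFFF).to_bytes(2, "big")


def pack_12bit_pair(y: int, z: int) -> bytes:
    """Pack two 12-bit codes into three bytes.

    Assumed layout: byte 0 = upper 8 bits of y, byte 1 = upper 8 bits of z,
    byte 2 = low nibble of y (upper half) and low nibble of z (lower half).
    """
    y &= 0xFFF
    z &= 0xFFF
    return bytes([y >> 4, z >> 4, ((y & 0xF) << 4) | (z & 0xF)])


def unpack_12bit_pair(packed: bytes) -> tuple[int, int]:
    """Inverse of pack_12bit_pair: recover the two 12-bit codes."""
    y = (packed[0] << 4) | (packed[2] >> 4)
    z = (packed[1] << 4) | (packed[2] & 0xF)
    return y, z
```

Both layouts carry 24 bits per sample period in one audio block, either as 16-bit samples of one channel at 48 kHz or as a 12-bit pair from two channels at 32 kHz.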

The audio blocks are allocated to channels CH1 and CH2 (Table 3). The track positions for CH1 and CH2 are different for PAL and NTSC systems.

Two frame modes are available in the DV format: unlock mode and lock mode. They differ in the synchronization between the audio signal and the video signal.

Table 4 Lock mode

Table 5 Unlock mode

Lock mode

The audio clock used in the sampling process of the audio signal is precisely synchronized to the video clock. The main advantage is that the exact same number of audio samples is recorded per frame. Lock mode is only available for sampling frequencies of 48 and 32 kHz. The number of audio samples per frame is different for PAL and NTSC systems (Table 4).
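The relation between sampling frequency and samples per frame can be checked by dividing one by the other. This sketch does not reproduce Table 4. It assumes a frame rate of exactly 25 frames/s for PAL and 30000/1001 frames/s for NTSC; the NTSC figure is an assumption not stated in the text, and the names are illustrative.

```python
from fractions import Fraction

# Assumed frame rates; the NTSC value of 30000/1001 is not given in the text.
FRAME_RATE = {"PAL": Fraction(25), "NTSC": Fraction(30000, 1001)}


def samples_per_frame(fs_hz: int, system: str) -> Fraction:
    """Average number of audio samples per video frame, as an exact fraction."""
    return Fraction(fs_hz) / FRAME_RATE[system]
```

For PAL the quotient is a whole number at both 48 kHz and 32 kHz. Under the assumed NTSC frame rate it is not, which suggests that the locked count repeats over a short sequence of frames rather than being identical in every frame.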

FIG. 10 Audio sector format.

Table 6 A AUX source pack

Table 7 A AUX source control pack

FIG. 11 DV signal processing.

Unlock mode

In unlock mode, there is no precise synchronization between the audio and video signals. Instead of an exact number of samples per frame as in lock mode, the audio sample clock tracks the video clock with a maximum variation of ±25 audio samples per frame. Unlock mode is available for every sampling frequency; the variation in audio samples per frame is shown in Table 5.

A complete frame (Video, Audio and Subcode data) in DV format consists of 175 blocks of 90 bytes. Each block is composed of 2 bytes of synchronization (SYNC), 3 bytes of identification code (ID), a 1-byte header (H), 76 bytes of data and 8 bytes of parity.

The audio sector consists of nine blocks of 76 data bytes, giving a maximum data rate of 12 (10) tracks/frame × 9 blocks/track × 76 bytes/block × 8 bits/byte × 25 (30) frames/s = 1.6416 Mbps for a PAL (NTSC) system. The maximum audio rate of 1.536 Mbps is reached when the 48k mode (2 channels × 48 kHz × 16 bits) or the 32k-2ch mode (4 channels × 32 kHz × 12 bits) is used. This leaves 105.6 kbps for audio auxiliary data (AAUX). The AAUX data describe the parameters of the recorded audio and consist of two data types (Tables 6 and 7).
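The data-rate figures above can be verified with a few multiplications. A minimal sketch using the nominal values from the text; the function names and default arguments are illustrative only:

```python
def audio_sector_rate(tracks_per_frame: int, frames_per_s: int,
                      blocks_per_track: int = 9,
                      bytes_per_block: int = 76) -> int:
    """Capacity of the audio sectors in bits per second."""
    return tracks_per_frame * blocks_per_track * bytes_per_block * 8 * frames_per_s


def pcm_rate(channels: int, fs_hz: int, bits: int) -> int:
    """Raw PCM audio rate in bits per second."""
    return channels * fs_hz * bits


pal_capacity = audio_sector_rate(12, 25)    # PAL: 12 tracks/frame, 25 frames/s
ntsc_capacity = audio_sector_rate(10, 30)   # NTSC: 10 tracks/frame, 30 frames/s
max_audio = pcm_rate(2, 48_000, 16)         # 48k mode
aaux_budget = pal_capacity - max_audio      # remainder available for AAUX
```

Both systems give the same sector capacity, and the 48k and 32k-2ch modes give the same peak audio rate, so the AAUX budget is identical in all four combinations.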
