Digital Audio--Principles and Concepts: Error Correction (part 1)




The advent of digital audio changed audio engineering forever and introduced entirely new techniques. Error correction was perhaps the most revolutionary new technique of them all. With analog audio, there is no opportunity for error correction. If the conveyed waveform is disrupted or distorted, then the waveform is usually irrevocably damaged. It is impossible to exactly reconstruct an infinitely variable waveform. With digital audio, the nature of binary data lends itself to recovery in the event of damage. When a bit is diagnosed as incorrect, it is easy to invert it. To permit this, when audio data is stored or transmitted, it can be specially coded and accompanied by redundancy. This enables the reproduced data to be checked for errors. When an error is detected, further processing can be performed to absolutely correct the error. In the worst case, errors can be concealed by synthesizing new data. In either case, error correction makes digital data transmission and storage much more robust and indeed makes the systems more efficient.

Strong error-correction techniques allow a channel's signal-to-noise ratio to be reduced, thus permitting, for example, lower power levels for digital broadcasting. Similarly, error correction relaxes the manufacturing tolerances for mass media such as CD, DVD, and Blu-ray.

However, error correction is more than a fortuitous opportunity; it is also an obligation. Because of high data densities in audio storage media, a minute defect in the manufactured media or a particle of dust can obliterate hundreds or thousands of bits. Compared to absolute numerical data stored digitally, where a bad bit might mean the difference between adding or subtracting figures from a bank account, digital audio data is relatively forgiving of errors; enjoyment of music is not necessarily ruined because of occasional flaws in the output waveform.

Even so, error correction is mandatory for a digital audio system because uncorrected audible errors can easily occur, given the relatively harsh environment to which most audio media are subjected.

Error correction for digital audio is thus an opportunity to preserve data integrity, an opportunity not available with analog audio, and it is absolutely necessary to ensure the success of digital audio storage, because errors surely occur. With proper design, digital audio systems such as CD, DVD, and Blu-ray can approach the computer industry standard, which specifies an error rate of 10^-12, that is, less than one uncorrectable error in 10^12 (one trillion) bits.

However, less stringent error performance is adequate for most audio applications. Without such protection, digital audio recording would not be viable. Indeed, the evolution of digital audio technology can be measured by the prerequisite advances in error correction.

Sources of Errors

Degradation can occur at every stage of the digital audio recording chain. Quantization error, converter nonlinearity, and jitter can all limit system performance. High-quality components, careful circuit design, and strict manufacturing procedures minimize these degradations. For example, a high-quality analog-to-digital (A/D) converter can exhibit satisfactory low-level linearity, and phase-locked loops can overcome jitter.

The science of error correction mainly attends to stored and transmitted data, because it is the errors presented by those environments that are most severe and least subject to control. For example, optical media can be affected by pit asymmetry, bubbles or defects in substrate, and coating defects. Likewise, for example, transmitted data is subject to errors caused by interfering signals and atmospheric conditions.

Barring design defects or a malfunctioning device, the most severe types of errors occur in the recording or transmission signal chains. Such errors result in corrupted data, and hence a defective audio signal. The most significant cause of errors in digital media is the dropout, essentially a defect in the medium that causes a momentary drop in signal strength. Dropouts can occur in any optical disc and can be traced to two causes: a manufactured defect in the medium or a defect introduced during use. Transmitted signals are vulnerable to disturbances from many types of human-made or natural interference. The effects of noise, multipath interference, crosstalk, weak signal strength, and fading are familiar to all cell phone users.

Defects in storage media or disruptions during transmission can cause signal transitions that are misrepresented as erroneous data. The severity of the error depends on the nature of the error. A single bad bit can be easily corrected, but a string of bad bits may be uncorrectable. An error in the least significant bit of a PCM word may pass unnoticed, but an error in the most significant bit may create a drastic change in amplitude.

Usually, a severe error results in a momentary increase in noise or distortion or a muted signal. In the worst case, a loss of data or invalid data can provoke a click or pop, as the digital-to-analog converter's output jumps to a new amplitude while accepting invalid data.

Errors occur in several modes; thus, classifications have been developed to better identify them. Errors that have no relation to each other are called random-bit errors. These random-bit errors occur singly and are generally easily corrected. A burst error is a large error, disrupting perhaps thousands of bits, and requires more sophisticated correction. An important characteristic of any error correction system is the burst length, that is, the maximum number of adjacent erroneous bits that can be corrected.

Both random and burst errors can occur within a single medium; therefore, an error-correction system must be designed to correct both error types simultaneously.

In addition, because the nature of errors depends on the medium, error-correction techniques must be optimized for the application at hand. Thus, storage-media errors, wired-transmission errors, and wireless-transmission errors must be considered separately. Finally, error-correction design is influenced by the kind of modulation code used to convey the data.

Optical discs can suffer from dropouts resulting from various manufacturing and handling problems. When a master disc is manufactured, an incorrect intensity of laser light, incorrect developing time, or a defect in the photoresist can create badly formed pits. Dust or scratches incurred during the plating or pressing processes, or pinholes or other defects in the disc's reflective coating, can all create dropouts. Stringent manufacturing conditions and quality control can prevent many of these dropouts from leaving the factory. Optical discs become dirty or damaged with use. Dust, dirt, and oil can be wiped clean; however, scratches might interfere with the pickup's ability to read data. Although physically small to bulky humans, foreign objects such as those shown in FIG. 1 are large when it is considered that the recorded bit size can be much smaller than 1 µm (micrometer). When cleaning optical media such as a Compact Disc, a clean, soft cloth should be used to wipe the disc radially, that is, across the width of the disc and not around its circumference. Any scratches that result will be perpendicular to the tracks and thus easier to correct, but a single scratch along one track could be impossible to correct because of the sustained loss of consecutive data.


FIG. 1 Small physical objects are large in the context of a 1-µm recorded bit size.

Hard-disk magnetic media are enclosed in a sealed, protective canister and are mainly immune to dust and dirt contamination. Moreover, bad sectors are marked and not used for recording; only minimal error correction is needed.

A number of factors conspire to degrade the quality of wired data channels, including limited bandwidth, intersymbol interference, attenuation, noise, ringing, reflection, and substandard cables and connectors. For example, as data travels down a length of cable, the cable acts as a lowpass filter, reducing bandwidth and adding group delay. A square-wave signal will show a gradual rise and fall time, and if the data rate is too high, the signal might not reach its full logical 1 and 0 levels. In other words, the level actually representing one bit can partly depend on the level of the previous bit; this is intersymbol interference.

Similarly, interference can be caused by signal reflection, echoes from the terminating circuits that can reinforce or cancel signal levels. Reflections can be caused by impedance mismatches between cables and terminating circuits. Other error sources include external low-frequency noise that can shift data transitions, causing jitter. Radio-frequency (RF) interference must be prevented with proper shielding and grounding. Fiber-optic cables are largely immune to radio-frequency interference and other problems that plague copper cables. The quality of a digital signal can be evaluated from its eye pattern, as discussed in Section 4.

Cellular telephones, digital radio, and other audio technologies using wireless transmission and reception are subject to numerous error conditions. These systems use radio waves traveling through air for communication, and these signals may be attenuated or otherwise unreliable, resulting in poor reception, dropped calls, and dead zones.

Many types of electronic devices emit unwanted electromagnetic signals which can interfere with wireless communications. Adjacent-channel interference is impairment at one frequency caused by a signal on a nearby channel frequency. Co-channel interference is created by other transmitters using the same channel frequency, for example, in nearby cells. Radio signals can be disrupted by multipath interference created when signals are reflected by a large surface such as a nearby building. Signal strength is affected by the distance and placement of antennas in cellular network architectures, and by low network density. Weather conditions as well as solar flares and sunspots can affect signal strength and quality. Signal strength can also be reduced by many kinds of obstructions; this commonly occurs in some buildings, subways, and tunnels, as well as on hilly terrain. Some kinds of obstructions are not intuitively obvious. Excessive foliage can block signals; for example, mobile systems operating in the 800-MHz band are not used in forested areas because pine needles can absorb and obstruct signals at that wavelength.

Quantifying Errors

A number of parameters have been devised to quantify the integrity of data. The bit-error rate (BER) is the number of bits received in error divided by the total number of bits received. For example, a BER of 10^-9 specifies one bit error for 10^9 (1 billion) bits transmitted. In other words, BER is an error count. An optical disc system, for example, may contain error-correction algorithms able to handle a BER of 10^-5 to 10^-4. The BER specifies the number of errors, but not their distribution. For example, a BER value might be the same for a single large error burst as for several smaller bursts. Thus the BER is not a good indicator for an otherwise reliable channel subject to burst errors. For example, over 80% of the errors in an optical disc might be burst errors.

Taking a different approach, the block-error rate (BLER) measures the number of blocks or frames of data that have at least one occurrence of uncorrected data. In many cases, BLER is measured as an error count per second, averaged over a 10-second interval. For example, the CD standard specifies that a CD can exhibit a maximum of 220 BLER errors per second, averaged over a 10-second interval. These values do not reflect the literal number of errors, but gauge their relative severity. BLER is discussed in more detail below, in the context of the Reed-Solomon error-correction code. The burst-error length (BERL) counts the number of consecutive blocks in error; it can be specified as a count per second.

Because burst errors can occur intermittently, they are difficult to quantify with a bit-error rate. In some cases, the errored second is measured; this tallies each second of data that contains an error. Errors can be described as the number of errored seconds over a period of time, or the time since the last errored second. As with BLER and BERL, this does not indicate how many bits are in error, but rather counts the occasions of error.

When designing error correction for an audio system, benchmarks such as raw BER and BLER must be accurately estimated because they are critical in determining the extent of error correction needed. If the estimates are too low, the system may suffer from uncorrected errors. On the other hand, if the estimates are too high, too much channel capacity will be given to error correction redundancy, leaving relatively less capacity for audio data, and thus overall audio fidelity may suffer. No matter how errors are addressed, it is the job of any system to detect and correct errors, keeping uncorrected errors to a minimum.

Objectives of Error Correction

Many types of error correction exist for different applications. Designers must judge the correctability of random and burst errors, redundancy overhead, probability of misdetection, maximum burst error lengths to be corrected or concealed, and the cost of an encoder and decoder. For digital audio, errors that cannot be corrected are concealed. However, sometimes a misdetected error cannot be concealed, and this can result in an audible click in the audio output. Other design objectives are set by the particular application; for example, optical discs are relatively tolerant of fingerprints because the transparent substrate places them out of focus to the optical pickup.

Because of the varied nature of errors, some predictable and some unpredictable, error-correction systems ultimately must use various techniques to guard against them.

Many storage or transmission channels will introduce errors in the stored or transmitted data; for accurate recovery, it is thus necessary for the data to include redundancy that overcomes the error. However, redundancy alone will not ensure accuracy of the recovered information; appropriate error-detection and correction coding must be used.

Although a perfect error-correction system is theoretically possible, in which every error is detected and corrected, such a system would create an unreasonably high data overhead because of the amount of redundant data required to accomplish it. Thus, an efficient audio error-correction system aims for a low audible error rate after correction and concealment, while minimizing the amount of redundant data and data processing required for successful operation. An error-correction system comprises three operations:

• Error detection uses redundancy to permit data to be checked for validity.

• Error correction uses redundancy to replace erroneous data with newly calculated valid data.

• In the event of large errors or insufficient data for correction, error concealment techniques substitute approximately correct data for invalid data.

When not even error concealment is possible, digital audio systems can mute the output signal rather than let the output circuitry attempt to decode severely incorrect data and produce severely incorrect sounds.

Error Detection

All error-detection and correction techniques are based on data redundancy. Data is said to be redundant when it is entirely derived from existing data, and thus conveys no additional information. In general, the greater the likelihood of errors, the greater the redundancy required. All information systems rely heavily on redundancy to achieve reliable communication; for example, spoken and written language contains redundancy. If the garbled lunar message "ONC GIAVT LE?P FOR MHNKIND" is received, the information could be recovered. In fact, Claude Shannon estimated that written English is about 50% redundant. Many messages incorporate both the original message as well as redundancy to help ensure that the message is properly understood.

Redundancy is required for reliable data communication. If a data value alone is generated, transmitted once, and received, there is no absolute way to check its validity at the receiving end. At best, a word that differs radically from its neighbors might be suspect. With digital audio, in which there is usually some correlation from one 48,000th of a second to the next, such an algorithm might be reasonable. However, we could not absolutely detect errors or begin to correct them. Clearly, additional information is required to reliably detect errors in received data. Since the additional information usually originates from the same source as the original data, it is subject to the same error-creating conditions as the data itself. The task of error detection is to properly code transmitted or stored information, so that when data is lost or made invalid, the presence of the error can be detected.

In an effort to detect errors, the original message could simply be repeated. For example, each data word could be transmitted twice. A conflict between repeated words would reveal that one is in error, but it would be impossible to identify the correct word. If each word was repeated three times, probability would suggest that the two in agreement were correct while the differing third was in error. Yet the opposite could be true, or all three words could agree and all be in error. Given enough repetition, the probability of correctly detecting an error would be high; however, the data overhead would be enormous. Also, the increased data load can itself introduce additional errors. More efficient methods are required.

Single-Bit Parity

Practical error detection uses techniques in which redundant data is coded so it can be used to efficiently check for errors. Parity is one such method. One early error-detection method was devised by Islamic mathematicians in the ninth century; it is known as "casting out 9s." In this technique, numbers are divided by 9, leaving a remainder or residue. Calculations can be checked for errors by comparing residues. For example, the residue of the sum (or product) of two numbers equals the sum (or product) of the residues. It is important to compare residues; sometimes the residue of the sum or product of the residues must itself be taken before comparison. If the residues are not equal, it indicates an error has occurred in the calculation, as shown in FIG. 2. An insider's trick makes the method even more efficient: the sum of digits in a number always has the same 9s residue as the number itself. The technique of casting out 9s can be used to cast out any number, and forms the basis for a binary error-detection method called parity.
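As a small illustration, the following Python sketch (with arbitrary example numbers) checks a multiplication by casting out 9s. Note that an error that happens to shift a result by a multiple of 9 would escape detection.

    def residue9(n):
        # Casting out 9s: summing the decimal digits gives the same
        # residue modulo 9 as the number itself.
        return sum(int(d) for d in str(n)) % 9

    a, b = 4738, 526
    product = a * b                       # the calculation to be checked

    # The residue of the product must equal the residue of the
    # product of the residues (taking the residue once more).
    check = (residue9(a) * residue9(b)) % 9
    print(residue9(product) == check)     # True: calculation is consistent

    corrupted = product + 100             # a corrupted result
    print(residue9(corrupted) == check)   # False: error detected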


FIG. 2 Casting out of 9s and 2s provides simple error detection.


FIG. 3 Parity can be formed through the modulo 2 addition of data bits.

Given a binary number, a residue bit can be formed by casting out 2s. This extra bit is formed when the word is transmitted or stored, and is carried along with the data word. This extra bit, known as a parity bit, permits error detection, but not correction. Rather than cast out 2s, a more efficient algorithm can be used. An even parity bit is formed with two simple rules: if the number of 1s in the data word is even (or zero), the parity bit is a 0; if the number of 1s in the word is odd, the parity bit is a 1. In other words, with even parity, there is always an even number of 1s. To accomplish this, the data bits are added together with modulo 2 addition, as shown in FIG. 3. Thus, an 8-bit data word, made into a 9-bit word with an even parity bit, will always have an even number of 1s (or none). This method results in even parity. Alternatively, by forcing an odd number of 1s, odd parity results. Both methods are functionally identical.

At playback, the validity of the received data word is tested using the parity bit; that is, the received data bits are added together to calculate the parity of the received data.

If the received parity bit and the calculated parity bit are in conflict, then an error has occurred. This is a single-bit detector with no correction ability. With even parity, for example, the function can determine when an odd number of errors (1, 3, 5, and so on) are present in the received data; an even number of errors will not be detected, as shown in FIG. 4. Probability dictates that the error is in the data word, rather than the parity bit itself. However, the reverse could be true.


FIG. 4 Examples of single-bit parity error detection.
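The parity operations described above are simple to express in code. The following Python sketch, using an arbitrary 8-bit word, forms an even parity bit by modulo 2 addition and then checks a received word against its parity bit.

    def even_parity_bit(bits):
        # Modulo 2 sum of the data bits: 0 if the count of 1s is even
        # (or zero), 1 if the count is odd.
        p = 0
        for b in bits:
            p ^= b                         # modulo 2 addition
        return p

    data = [1, 0, 1, 1, 0, 1, 1, 0]             # an 8-bit data word
    codeword = data + [even_parity_bit(data)]   # 9-bit word with even parity

    received = codeword.copy()
    received[3] ^= 1                       # a single-bit error in transmission
    error_detected = even_parity_bit(received[:-1]) != received[-1]
    print(error_detected)                  # True: an odd number of errors is caught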

In many cases, errors tend to occur as burst errors. Thus, many errors could occur within each word, and single-bit parity would not provide reliable detection. By itself, a single-bit parity check code is not suitable for error detection in many digital audio storage or transmission systems.

ISBN

For many applications, simple single-bit parity is not sufficiently robust. More sophisticated error-detection codes have been devised to make more efficient use of redundancy. One example of coded information is the International Standard Book Number (ISBN) code found on virtually every book published. No two books and no two editions of the same book have the same ISBN. Even softcover and hardcover editions of a book have different ISBNs.

An ISBN is more than just a series of digits. For example, consider the ISBN 0-14-044118-2 (the hyphens are extraneous). The first digit (0) is a country code; for example, 0 is used for the United States and some other English-speaking countries. The next two digits (14) are a publisher code. The next six digits (044118) are the book title code. The last digit (2) is particularly interesting; it is a check digit. It can be used to verify that the other digits are correct. The check digit is a modulo 11 weighted checksum of the previous digits. In other words, when the digits are added together in modulo 11, the weighted sum must equal the number's checksum. (To maintain a uniform length of 10 digits per ISBN, the Roman numeral X is used to represent the check digit 10.) Given this code, with its checksum, we can check the validity of any ISBN by adding together (modulo 11) the series of weighted digits, and comparing them to the last checksum digit.

Consider an example of the verification of an ISBN code. To form the weighted checksum of a 10-digit number abcdefghij, compute the weighted sum of the digits by multiplying each digit by a weight that starts at 10 for the leftmost digit and decreases by 1 for each subsequent digit:

10a + 9b + 8c + 7d + 6e + 5f + 4g + 3h + 2i + 1j

For the ISBN 0-14-044118-2, the weighted sum is:

10×0 + 9×1 + 8×4 + 7×0 + 6×4 + 5×4 + 4×1 + 3×1 + 2×8 + 1×2 = 110

The weighted checksum modulo 11 is found by taking the remainder after dividing by 11:

110/11 = 10, with a 0 remainder.

The 0 remainder suggests that the ISBN is correct. In this way, ISBNs can be accurately checked for errors.
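The same verification is easy to automate. The following Python sketch is a minimal ISBN-10 validity check built from the weighted checksum described above; the second test value is simply the example ISBN with a corrupted check digit.

    def isbn10_is_valid(isbn):
        # Weights run from 10 (leftmost digit) down to 1 (check digit);
        # the ISBN is valid if the weighted sum is divisible by 11.
        # 'X' stands for a check value of 10.
        digits = [10 if c.upper() == "X" else int(c)
                  for c in isbn if c not in "- "]
        if len(digits) != 10:
            return False
        weighted_sum = sum(w * d for w, d in zip(range(10, 0, -1), digits))
        return weighted_sum % 11 == 0

    print(isbn10_is_valid("0-14-044118-2"))   # True: weighted sum 110 = 11 x 10
    print(isbn10_is_valid("0-14-044118-3"))   # False: corrupted check digit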

Incidentally, this 10-digit ISBN system (which has the ability to assign 1 billion numbers) is being replaced by a 13-digit ISBN to increase the numbering capacity of the system; it adds a 978 prefix to existing ISBNs, a 979 prefix to new numbers, and retains a checksum. In any case, the use of a weighted checksum compared to the calculated remainder with modulo arithmetic is a powerful way to detect errors. In fact, error-detection codes in general are based on this principle.

Cyclic Redundancy Check Code

The cyclic redundancy check code (CRCC) is an error detection method used in some audio applications because of its ability to detect burst errors in the recording medium or transmission channel. The CRCC is a cyclic block code that generates a parity check word. The bits of a data word can be added together to form a sum of the bits; this forms a parity check word. For example, in 1011011010, the six binary 1s are added together to form binary 0110 (6 in base 10), and this check word is appended to the data word to form the codeword for transmission or storage. As with single-bit parity, any disagreement between the received checksum and that formed from the received data would indicate with high probability that an error has occurred.

The CRCC works similarly, but with a more sophisticated calculation. Simply stated, each data block is divided by an arbitrary and constant number. The remainder of the division is appended to the stored or transmitted data block. Upon reproduction, division is performed on the received word; a zero remainder is taken to indicate an error-free signal, as shown in FIG. 5. A more detailed examination of the encoding and decoding steps in the CRCC algorithm is shown in FIG. 6. A message m in a k-bit data block is operated upon to form an (n - k)-bit CRCC detection block, where n is the length of the complete block.

The original k-bit data block is multiplied by x^(n-k) to shift the data in preparation for appending the check bits. It is then divided by the generation polynomial g to form the quotient q and remainder r. The transmission polynomial v is formed from the original message m and the remainder r; it is thus a multiple of the generation polynomial g. The transmission polynomial v is then transmitted or stored. The received data u undergoes error detection by calculating a syndrome (the term is apt in the sense that an error is a sign of malfunction or disease). Specifically, the operation creates a syndrome c with modulo 2 addition of received parity bits and parity that is newly calculated from the received message. A zero syndrome shows an error-free condition. A nonzero syndrome denotes an error.


FIG. 5 CRCC in simplified form showing the generation of the remainder. A. Original block of data is divided to produce a remainder. The quotient is discarded. B. The remainder is appended to the data word and both words are transmitted or stored. C. The received data is again divided to produce a remainder, used to check for errors.



FIG. 6 CRCC encoding and decoding algorithms. A. CRCC encoding. B. CRCC decoding and syndrome calculation.

Error correction can be accomplished by forming an error pattern that is the difference between the received data and the original data to be recovered. This is mathematically possible because the error polynomial e divided by the original generation polynomial produces the syndrome as a remainder. Thus, the syndrome can be used to form the error pattern and hence recover the original data. It is important to select the generation polynomial g so that error patterns in the error polynomial e are not exactly divisible by g. CRCC will fail to detect an error if the error is exactly divisible by the generation polynomial.

In practice, CRCC and other detection and correction codewords are described in mathematical terms, where the data bits are treated as the coefficients of a binary polynomial. As we observed in Section 1, the binary number system is one of positional notation such that each place represents a power of 2. Thus, we can write binary numbers in a power of 2 notation. For example, the number 1001011, with MSB (most significant bit) leading, can be expressed as:

1×2^6 + 0×2^5 + 0×2^4 + 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0, or:

2^6 + 2^3 + 2^1 + 2^0

In general, the number can be expressed as:

x^6 + x^3 + x + 1

With LSB (least significant bit) leading, the notation would be reversed. This polynomial notation is the standard terminology in the field of error correction. In the same way that use of modulo arithmetic ensures that all possible numbers will fall in the modulus, polynomials are used to ensure that all values fall within a given range. In modulo arithmetic, all numbers are divided by the modulus and the remainder is used as the number. Similarly, all polynomials are divided by a modulus polynomial of degree n. The remainder has n coefficients, and is recognizable as a code symbol. Further, just as we desire a modulus that is a prime number (so that when a product is 0, at least one factor is 0), we desire a prime polynomial, one that cannot be represented as the product of two polynomials. The benefit is added structure to devised codes, resulting in more efficient implementation.
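As a small illustration of the notation, the following Python sketch (a helper written for this example only) prints a binary word, MSB leading, in the polynomial form used above.

    def bits_to_polynomial(bits):
        # Write a binary word (MSB leading) in polynomial notation.
        n = len(bits) - 1
        terms = []
        for i, b in enumerate(bits):
            if b == "1":
                power = n - i
                terms.append("1" if power == 0 else
                             "x" if power == 1 else "x^%d" % power)
        return " + ".join(terms) if terms else "0"

    print(bits_to_polynomial("1001011"))   # x^6 + x^3 + x + 1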

Using polynomial notation, we can see how CRCC works. An example of cyclic code encoding is shown in FIG. 7. The message 1001 is written as the polynomial x^3 + 1. So that three parity check bits can be appended, the message is multiplied by x^3 to shift the data to the left. The message is then divided by the generation polynomial x^3 + x^2 + 1. The remainder x + 1 is appended to create the polynomial x^6 + x^3 + x + 1.


FIG. 7 An example of cyclic code encoding, with message 1001 written as the polynomial x^3 + 1. The encoder outputs the original message and a parity word, x + 1.

A shift register implementation of this encoder is shown in FIG. 8. The register is initially loaded with 0s, and the four message bits are sequentially shifted into the register, and appear at the output. When the last message bit has been output, the encoder's switches are switched, 0s enter the register, and three parity bits are output. Modulo 2 adders are used. This example points up an advantage of this type of error-detection encoding: the implementation can be quite simple.
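The same encoding can also be sketched in software rather than shift-register hardware. The following Python example performs modulo 2 polynomial division on bit strings, using the message and generation polynomial of the worked example above.

    def mod2_div(dividend_bits, generator_bits):
        # Remainder of modulo 2 (XOR) polynomial long division.
        work = [int(b) for b in dividend_bits]
        gen = [int(b) for b in generator_bits]
        steps = len(work) - len(gen) + 1
        for i in range(steps):
            if work[i] == 1:                 # divide only where a leading 1 remains
                for j, g in enumerate(gen):
                    work[i + j] ^= g         # modulo 2 subtraction
        return "".join(str(b) for b in work[-(len(gen) - 1):])

    message, generator = "1001", "1101"              # x^3 + 1 and x^3 + x^2 + 1
    shifted = message + "0" * (len(generator) - 1)   # multiply by x^3
    parity = mod2_div(shifted, generator)
    print(parity)                            # '011', i.e., x + 1
    codeword = message + parity              # 1001011 = x^6 + x^3 + x + 1

    # Decoding: a zero remainder (syndrome) indicates no detected error.
    print(mod2_div(codeword, generator))     # '000': error-free codeword
    print(mod2_div("1011011", generator))    # nonzero: one bad bit is detected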

The larger the data block, the less redundancy results, yet mathematical analysis shows that the same error-detection capability remains. However, if random or short-duration burst errors tend to occur frequently, then the integrity of the detection is decreased and shorter blocks might be necessary. The extent of error detectability of a CRCC code can be summarized. Given a k-bit data word with m (where m = n - k) bits of CRCC, a codeword of n bits is formed, and the following are true:

• Burst errors less than or equal to m bits are always detectable.

• Detection probability of burst errors of m + 1 bits is 1 - 2^-(m-1).

• Detection probability of burst errors longer than m + 1 bits is 1 - 2^-m. (These first three items are not affected by the length n of the codeword.)

• Random errors up to three consecutive bits long can be detected.


FIG. 8 An implementation of a cyclic encoder using shift registers. Modulo 2 (XOR) adders are used.

CRCC is quite reliable. For example, if 16 parity check bits are generated, the probability of error detection is 1 - 2^-16, or about 99.99%. Ultimately, the medium determines the design of the CRCC and the rest of the error-correction system. The power of the error-correction processing following the CRCC also influences how accurate the CRCC detection must be. The CRCC is often used as an error pointer to identify the number and extent of errors prior to other error-correction processing. In some cases, to reduce codeword size, a shortened or punctured code is used. In this technique, some of the leading bits of the codeword are omitted.

The techniques used in the CRCC algorithm apply generally to cyclic error-detection and error-correction algorithms. They use codewords that are polynomials that normally result in a zero remainder when divided by a generation polynomial. To perform error checking on received data, division is performed; a zero remainder syndrome indicates that there is no error. A nonzero remainder syndrome indicates an error. The nonzero syndrome is the error polynomial divided by the check polynomial. The syndrome can be used to correct the error by multiplying the syndrome by the check polynomial to obtain the error polynomial. The error in the received data appears as the error polynomial added to the original codeword. The correct original codeword can be obtained by subtracting (adding in modulo 2) the error polynomial from the received polynomial.

Error-Correction Codes

With the help of redundant data, it is possible to correct errors that occur during the storage or transmission of digital audio data. In the simplest case, data is merely duplicated. For example, instead of writing only one data track, two tracks of identical data could be written. The first track would normally be used for playback, but if an error was detected through parity or other means, data could be taken from the second track. To alleviate the problem of simultaneously erroneous data, redundant samples could be displaced with respect to each other in time.

In addition, channel coding can be used beneficially. For example, 3-bit words can be coded as 7-bit words, selected from the 2^7 possible combinations to be as mutually different as possible. The receiver examines the 7-bit words and compares them to the eight allowed codewords. Errors could be detected, and the words changed to the nearest allowed codeword, before the codeword is decoded to the original 3-bit sequence. Four correction bits are required for every three data bits; the method can correct a single error in a 7-bit block. This minimum-distance concept is important in more sophisticated error-correction codes.
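A nearest-codeword decoder of this kind can be sketched in a few lines of Python. The eight 7-bit codewords below are only one illustrative choice (modulo 2 combinations of three basis words, giving a minimum distance of 4); the text does not specify a particular codeword set.

    from itertools import product

    # Three basis words; the 8 codewords are all modulo 2 combinations of them.
    basis = [0b0001111, 0b0110011, 0b1010101]

    codebook = {}
    for c2, c1, c0 in product((0, 1), repeat=3):
        word = (c2 * basis[0]) ^ (c1 * basis[1]) ^ (c0 * basis[2])
        codebook[word] = (c2, c1, c0)        # 7-bit codeword -> 3-bit message

    def decode(received):
        # Pick the allowed codeword nearest in Hamming distance.
        best = min(codebook, key=lambda w: bin(w ^ received).count("1"))
        return codebook[best]

    msg_bits = (1, 0, 1)
    sent = next(w for w, m in codebook.items() if m == msg_bits)
    received = sent ^ 0b0000100              # flip one bit in transit
    print(decode(received) == msg_bits)      # True: single error corrected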

Although such simple methods are workable, they are inefficient because of the data overhead that they require.

A more enlightened approach is that of error-correction codes, which can achieve more reliable results with less redundancy. In the same way that redundant data in the form of parity check bits is used for error detection, redundant data is used to form codes for error correction.

Digital audio data is encoded with related detection and correction algorithms. On playback, errors are identified and corrected by the detection and correction decoder.

Coded redundant data is the essence of all correction codes; however, there are many types of codes, different in their designs and functions.

The field of error-correction codes is a highly mathematical one. Many types of codes have been developed for different applications. In general, two approaches are used: block codes using algebraic methods, and convolutional codes using probabilistic methods. Block codes form a coded message based solely on the message parsed into a data block. In a convolutional code, the coded message is formed from the message present in the encoder at that time as well as previous message data. In some cases, algorithms use a block code in a convolutional structure known as a Cross Interleave Code. For example, such codes are used in the CD format.

Block Codes

Block error-correction encoders assemble a number of data words to form a block and, operating over that block, generate one or more parity words and append them to the block. During decoding, an algorithm forms a syndrome word that detects errors and, given sufficient redundancy, corrects them. Such algorithms are effective against errors encountered in digital audio applications. Error correction is enhanced by interleaving consecutive words. Block codes base their parity calculations on an entire block of information to form parity words. In addition, parity can be formed from individual words in the block, using single-bit parity or a cyclic code. In this way, greater redundancy is achieved and correction is improved. For example, CRCC could be used to detect an error, then block parity used to correct the error.

A block code can be conceived as a binary message consolidated into a block, with row and column parity. Any single-word error will cause one row and one column to be in error; thus, the erroneous data can be corrected. For example, a message might be grouped into four 8-bit words (called symbols). A parity bit is added to each row and a parity word added to each column, as shown in Fig. 9. At the decoder, the data is checked for correct parity, and any single-symbol error is corrected. In this example, bit parity shows that word three is in error, and word parity is used to correct the symbol. A double-word error can be detected, but not corrected. Larger numbers of errors might result in misdetection or miscorrection.


FIG. 9 An example of block parity with row parity bits and a column parity word.
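The row-and-column scheme of FIG. 9 can be sketched as follows in Python; the four 8-bit symbols and the error position are arbitrary illustrative values. Row parity (which flags an odd number of bit errors within a symbol) locates the bad word, and the column parity word supplies the correction.

    def bit_parity(word):
        # Even-parity bit for a symbol: the modulo 2 sum of its bits.
        return bin(word).count("1") % 2

    block = [0b10110010, 0b01101001, 0b11100111, 0b00011100]   # four 8-bit symbols
    row_parity = [bit_parity(w) for w in block]                # one bit per row
    column_parity = 0
    for w in block:
        column_parity ^= w                                     # bitwise parity word

    received = block.copy()
    received[2] ^= 0b00100000            # flip one bit of the third symbol

    # Row parity locates the bad symbol; the column parity word corrects it.
    bad_row = next(i for i, w in enumerate(received)
                   if bit_parity(w) != row_parity[i])
    correction = column_parity
    for i, w in enumerate(received):
        if i != bad_row:
            correction ^= w              # what the bad symbol should have been
    received[bad_row] = correction
    print(received == block)             # True: single-symbol error corrected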

Block correction codes use many methods to generate the transmitted codeword and its parity; however, they are fundamentally identical in that only information from the block itself is used to generate the code. The extent of the correction capabilities of block correction codes can be simply illustrated with decimal examples. Given a block of six data words, a seventh parity word can be calculated by adding the six data words. To check for an error, a syndrome is created by comparing (subtracting in the example) the parity (sum) of the received data with the received parity value. If the result is zero, then most probably no error has occurred, as shown in FIG. 10A. If one erroneous data word is detected and the word is set to zero, a condition called a single erasure, a nonzero syndrome indicates the error; furthermore, the erasure value can be obtained from the syndrome. If CRCC or single-bit parity is used, it points out the erroneous word, and the correct value can be calculated using the syndrome, as shown in FIG. 10B. Even if detection itself is in error and falsely creates an error pointer, the syndrome yields the correct result, as shown in FIG. 10C. Such a block correction code is capable of detecting a one-word error, or making one erasure correction, or correcting one error with a pointer. The correction ability depends on the detection ability of pointers. In this case, unless the error is identified with a pointer, erasure, or CRCC detection, the error cannot be corrected.



FIG. 10 Examples of single-parity block coding. A. Block correction code showing no error condition. B. Block correction code showing a correction with a pointer. C. Block correction code showing a false pointer.
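A decimal sketch in the spirit of FIG. 10 follows; the six data values and the pointer position are illustrative. With a pointer (or an erasure), the syndrome supplies the correction.

    data = [12, 7, 30, 5, 21, 9]          # six data words
    parity = sum(data)                    # seventh word: sum parity

    # A word is corrupted in transit, and a detection code (e.g., CRCC)
    # flags word 3 with an error pointer.
    received = data.copy()
    received[3] = 40                      # erroneous value
    pointer = 3

    # Syndrome: difference between transmitted parity and sum of received data.
    syndrome = parity - sum(received)     # equals (true value - received value)
    received[pointer] += syndrome         # corrected using the pointer
    print(received == data)               # True

    # Erasure: the flagged word is set to zero; the syndrome then equals
    # the missing value itself.
    received[pointer] = 0
    print(parity - sum(received))         # 5, the erased word's value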

For enhanced performance, two parity words can be formed to protect the data block. For example, one parity word might be the sum of the data and the second parity word the weighted sum as shown in FIG. 11A. If any two words are erroneous and marked with pointers, the code provides correction. Similarly, if any two words are marked with erasure, the code can use the two syndromes to correct the data. Unlike the single parity example, this double parity code can also correct any one-word error, even if it is not identified with a pointer, as shown in Fig. 11B. This type of error correction is well-suited for audio applications.


FIG. 11 Examples of double-parity block coding. A. Block correction code with double parity showing no error condition. B. Block correction code with double parity showing a single-error correction without a pointer.
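Extending the sketch to double parity, as in FIG. 11, the two syndromes locate and correct a single error even without a pointer. The values are again illustrative, and the weighted sum here simply uses the word positions as weights.

    data = [12, 7, 30, 5, 21, 9]
    P = sum(data)                                     # sum parity
    Q = sum((i + 1) * d for i, d in enumerate(data))  # weighted-sum parity

    received = data.copy()
    received[4] = 50                                  # one unknown error, no pointer

    S1 = sum(received) - P                            # error value
    S2 = sum((i + 1) * d for i, d in enumerate(received)) - Q
    if S1 != 0:
        position = S2 // S1 - 1                       # weighted syndrome / syndrome
        received[position] -= S1                      # subtract the error value
    print(received == data)                           # True: corrected without a pointer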

Hamming Codes

Cyclic codes such as CRCC are a subclass of linear block codes, which can be used for error correction. Special block codes, known as Hamming codes, create syndromes that point to the location of the error. Multiple parity bits are formed for each data word, with unique encoding. For example, three parity check bits (4, 5, and 6) might be added to a 4-bit data word (0, 1, 2, and 3); seven bits are then transmitted. For example, suppose that the three parity bits are uniquely defined as follows: parity bit 4 is formed from modulo 2 addition of data bits 1, 2, and 3; parity bit 5 is formed from data bits 0, 2, and 3; and parity bit 6 is formed from data bits 0, 1, and 3. Thus, the data word 1100, appended with parity bits 110, is transmitted as the 7-bit codeword 1100110. A table of data and parity bits is shown in FIG. 12A.

This algorithm for calculating parity bits is summarized in FIG. 12B. An error in a received data word can be located by examining which of the parity bits detects an error. The received data must be correctly decoded; therefore, parity check decoding equations must be written.

These equations are computationally represented as a parity check matrix H, as shown in FIG. 12C. Each row of H represents one of the original encoding equations. By testing the received data against the values in H, the location of the error can be identified. Specifically, a syndrome is calculated from the modulo 2 addition of the parity calculated from the received data and the received parity. An error generates a 1; otherwise a 0 is generated.

The resulting error pattern is matched in the H matrix to locate the erroneous bit. For example, if the codeword 1100110 is transmitted, but 1000110 is received, the syndromes will detect the error and generate a 101 error pattern. Matching this against the H matrix, we see that it corresponds to the second column; thus, bit 1 is in error, as shown in FIG. 12D. This algorithm is a single-error correcting code; therefore, it can correctly identify and correct any single-bit error.
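The (7,4) code described above is compact enough to implement directly. The following Python sketch encodes with the three stated parity equations and decodes by matching the syndrome against the columns of the parity-check matrix H, reproducing the 1100110/1000110 example.

    # Parity equations from the text: bit 4 = b1+b2+b3, bit 5 = b0+b2+b3,
    # bit 6 = b0+b1+b3 (all modulo 2).
    PARITY_EQS = [(1, 2, 3), (0, 2, 3), (0, 1, 3)]

    # Parity-check matrix H: one row per equation, covering data and parity bits.
    H = [[0, 1, 1, 1, 1, 0, 0],
         [1, 0, 1, 1, 0, 1, 0],
         [1, 1, 0, 1, 0, 0, 1]]

    def encode(data):                       # data = [b0, b1, b2, b3]
        return data + [data[i] ^ data[j] ^ data[k] for i, j, k in PARITY_EQS]

    def decode(received):
        # Syndrome: recomputed parity compared (modulo 2) with received parity,
        # then matched against a column of H to locate the bad bit.
        syndrome = [sum(h * r for h, r in zip(row, received)) % 2 for row in H]
        if any(syndrome):
            column = [[row[c] for row in H] for c in range(7)].index(syndrome)
            received[column] ^= 1           # correct the single-bit error
        return received[:4]

    codeword = encode([1, 1, 0, 0])
    print(codeword)                         # [1, 1, 0, 0, 1, 1, 0] -> 1100110
    received = codeword.copy()
    received[1] ^= 1                        # 1000110, as in the text
    print(decode(received))                 # [1, 1, 0, 0]: bit 1 corrected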

Returning to the design of this particular code, we can observe another of its interesting properties. Referring again to FIG. 12A, recall that the 7-bit data words each comprise four data bits and three parity bits. These seven bits provide 128 different encoding possibilities, but only 16 of them are used; thus, 112 patterns are clearly illegal, and their presence would denote an error. In this way, the patterns of bits themselves are useful. We also may ask the question: How many bits must change value for one word in the table to become another word in the table? For example, we can see that three bits in the word 0101010 must change for it to become 0110011. Similarly, we observe that any word in the table would require at least three bit changes to become any other word. This is important because this dissimilarity in data corresponds to the code's error-correction ability. The more words that are disallowed, the more robust the detection and correction.

For example, if we receive a word 1110101, which differs from a legal word 1010101 by one bit, the correct word can be determined. In other words, any single-bit error leaves the received word closer to the correct word than any other.

The number of bits that one legal word must change to become another legal word is known as the Hamming distance, or minimum distance. In this example, the Hamming distance is 3. Hamming distance thus defines the potential error correctability of a code. A distance of 1 determines simple uniqueness of a code. A distance of 2 provides single-error detectability. A distance of 3 provides single-error correctability or double-error detection. A distance of 4 provides both single-error correction and double-error detection, or triple-error detection alone. A distance of 5 provides double-error correction. As the distance increases, so does the correctability of the code.



FIG. 12 With Hamming codes, the syndrome points to the error location. A. In this example with four data bits and three parity bits, the code has a Hamming distance of 3. B. Parity bits are formed in the encoder. C. Parity-check matrix in the decoder. D. Single-error correction using a syndrome to point to the error location.

The greater the correctability required, the greater the distance the code must possess. In general, to detect t_d errors, the minimum distance required is greater than or equal to t_d + 1. In the case of an erasure e (when an error location is known and set to zero), the minimum distance is greater than or equal to e + 1. To correct all combinations of t_c or fewer errors, the minimum distance required is greater than or equal to 2t_c + 1. For a combination of error correction, the minimum distance is greater than or equal to t_d + e + 2t_c + 1. These hold true for both bit-oriented and word-oriented codes. If m parity blocks are included in a block code, the minimum distance is less than or equal to m + 1. For maximum-distance separable (MDS) codes such as B-adjacent and Reed-Solomon codes, the minimum distance is equal to m + 1.

Block codes are notated in terms of the input relative to the output data. Data is grouped in symbols; the smallest symbol is one bit long. A message of k symbols is used to generate a larger block of n symbols. Such a code is notated as (n, k). For example, if 12 symbols are input to an encoder and 20 symbols are output, the code would be (20,12). In other words, n - k = 8 parity symbols are generated.

The source rate R is defined as k/n; in this example, R = 12/20.

cont. to part 2 >>
