By KENNETH GUNDRY AND JOSEPH HULL
Kenneth Gundry, Dolby Laboratories' Principal Staff Engineer, has been with the company since 1971 and is currently in charge of developing a Dolby S-type encoder for use in production of prerecorded cassettes. He is also the inventor of the Dolby HX (headroom-extension) system and co-developer of the Dolby ADM (adaptive delta modulation) digital system.
Joseph Hull, resident wordsmith at Dolby Laboratories, originally joined the company in 1978, after working as Director of Communications at Advent. As a freelancer, he wrote manuals and product literature for Boston Acoustics, Cambridge Sound Works, Kloss Video, Lucasfilm, and Mitsubishi.
The authors are indebted to their colleagues Stan Cossette, who has spearheaded the development of Dolby S-type, and Ray Dolby, who conceived the principles upon which it is based.
Since the introduction of the Dolby SR (spectral recording) process in 1986, the professional music recording, broadcast, and cinema industries have equipped more than 35,000 analog audio channels worldwide with this recording enhancement system. Dolby S-type, a new noise-reduction system from Dolby Laboratories for improving the popular home audio cassette, uses several of Dolby SR's proven techniques to achieve similar ends: increased headroom, lowered distortion, and greatly reduced noise. Dolby S-type is less complex and less costly than Dolby SR.
Among other reasons, this is possible because home listening levels are lower than those in studios, and because the spectral content of cassette noise differs from that of open-reel tape. However, both systems share such objectives as freedom from audible side effects, and are based upon similar principles of operation.
Principle of Least Treatment
Complementary noise-reduction systems, such as those developed by Dolby Laboratories, boost the signal as it is recorded (compression) and then reduce the boosted signal by the same amount (expansion) in playback; tape noise is also reduced by the same amount. The original signal theoretically survives the complementary process unscathed, as opposed to playback-only NR systems, where the original signal is inevitably damaged in the attempt to remove noise retroactively.
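The complementary round trip can be illustrated numerically. The following is a minimal sketch, not Dolby's circuit: it assumes a single fixed boost at one signal level (a real compander varies its gain with level and frequency), and the function names and sample values are hypothetical.

```python
import math

def db_to_gain(db):
    """Convert decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)

def encode(sample, boost_db):
    """Record side: boost the low-level signal (compression)."""
    return sample * db_to_gain(boost_db)

def decode(sample, boost_db):
    """Playback side: cut by the same amount (expansion)."""
    return sample / db_to_gain(boost_db)

def round_trip(signal, tape_noise, boost_db):
    """Return (played-back sample, noise reduction in dB) for one sample."""
    on_tape = encode(signal, boost_db) + tape_noise  # noise is added by the tape
    played = decode(on_tape, boost_db)               # signal restored, noise cut
    residual_noise = played - signal
    return played, 20 * math.log10(tape_noise / residual_noise)

# Assumed illustrative values: a quiet signal, a noise floor, a 10-dB boost.
played, nr_db = round_trip(signal=0.01, tape_noise=0.001, boost_db=10.0)
print(f"noise reduced by {nr_db:.1f} dB")
```

In this idealized case the signal passes through unchanged and the tape noise drops by exactly the record boost, which is the sense in which the two halves are complementary.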
The companding action need not be the same at all frequencies. For example, in cassette recording, tape hiss so predominates that it is desirable to concentrate the noise-reduction action at higher frequencies. Yet whatever the spectral effect of the action, it must be confined to lower level signals to prevent overloading the medium (e.g., saturating magnetic tape) when high-level signals occur.
At first glance, designing a system to operate only at lower levels makes sense, because high-level signals are assumed to mask noise. Wide-band companders, which boost the entire frequency spectrum at low levels upon recording and lower it at playback, work on this assumption. Unfortunately, it is not entirely valid.
On quiet signals, noise reduction is indeed effective, because full recording boost is applied (Fig. 1A). However, because it is necessary to prevent overload by not boosting higher level signals as they are recorded, these systems allow the noise level to go up upon playback as signals get louder, an effect called noise modulation (Fig. 1B). The higher noise can be heard under certain circumstances, because it is only at and near a signal's frequency that masking occurs. If the music is loud in only part of the spectrum, noise will be heard in the other parts, where there is neither masking nor boost (i.e., no NR effect). The result is annoying changes in noise level concurrent with changes in signal level.
The ideal noise-reduction system, on the other hand, would act wherever signals fall below a certain threshold, even when there are loud signals elsewhere in the spectrum.
Ideally, with a loud rap on a bass drum, there would be no record boost on the drum itself, to prevent overloading the recording medium. But there would be full boost, and therefore effective noise reduction upon playback, over the rest of the spectrum.
We call the application of constant gain wherever there are no high-level signals, even in the presence of such signals elsewhere in the spectrum, "the principle of least treatment." Reductions in record boost to prevent overload should be confined just to those parts of the spectrum where loud signals, and thus natural masking, occur (Fig. 2A). This results in audibly consistent noise reduction. By contrast, in the presence of any loud signal, a wide-band compander reduces the record boost throughout the spectrum, resulting in audible changes in the NR effect (Fig. 2B).
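The bass-drum case can be put into numbers with a toy three-band model. This is a sketch under stated assumptions, not a description of any Dolby circuit: the noise floor, threshold, maximum boost, the linear taper of boost with level, and all names are hypothetical. Playback noise in each band is taken as the tape noise minus whatever boost was applied on recording.

```python
TAPE_NOISE_DB = -50.0  # assumed tape noise in every band, relative to peak level
MAX_BOOST_DB = 10.0    # assumed full record boost
THRESHOLD_DB = -40.0   # assumed level below which full boost applies

def boost_for_level(level_db):
    """Full boost below the threshold, tapering linearly to none at 0 dB."""
    if level_db <= THRESHOLD_DB:
        return MAX_BOOST_DB
    return MAX_BOOST_DB * max(0.0, -level_db) / -THRESHOLD_DB

def wide_band_noise(levels_db):
    """Wide-band compander: one gain for the whole spectrum, set by the loudest band."""
    boost = boost_for_level(max(levels_db.values()))
    return {band: TAPE_NOISE_DB - boost for band in levels_db}

def least_treatment_noise(levels_db):
    """Idealized least treatment: each band's boost depends only on that band's level."""
    return {band: TAPE_NOISE_DB - boost_for_level(level)
            for band, level in levels_db.items()}

# A loud bass drum with nothing much happening elsewhere in the spectrum.
bass_drum = {"low": 0.0, "mid": -60.0, "high": -60.0}
print("wide band:      ", wide_band_noise(bass_drum))
print("least treatment:", least_treatment_noise(bass_drum))
```

With the wide-band model, the drum removes the boost everywhere, so the mid and high bands lose their noise reduction exactly when nothing there masks the hiss. With per-band treatment, only the low band (where the drum masks the noise) gives up its boost.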
The Dolby A-type professional noise-reduction system strives for the ideal by splitting the spectrum into four bands with independently acting companders. Thus, with a loud bass drum, Dolby A-type NR does not operate in the low-frequency band where masking occurs, but does act in the higher frequency bands where there is no masking.
In Dolby B-type and C-type NR, a single companding band of frequencies slides up out of the way of the bass drum, keeping the NR effect active at higher frequencies where tape hiss is audible. In Dolby S-type and Dolby SR, a combination of fixed and sliding bands, along with other new developments, results in the closest adherence yet to the principle of least treatment.
Benefits of Least Treatment
The major benefit of adhering to the principle of least treatment is a better recording system, virtually free of such side effects as noise modulation. However, the fact that high-level signals have little effect on low-level signals has significant additional advantages in the real world, where encoded recordings are subject to decoding errors.
Decoding errors can be divided into two categories. The first we might call "inadvertent," that is, errors resulting from frequency response or level changes introduced between original encoding and ultimate decoding with the same complementary system.
These are the kind of errors that occur, for example, when using a tape formulation for which the original recorder was not optimized. The other category we might call "deliberate," that is, errors that would result, say, from playing back Dolby S-type cassettes on a machine that is equipped only with Dolby B-type NR.
As a result of its adherence to the least-treatment principle, Dolby S-type is robust. If a tape is made on a recorder with less than perfect response, the listener is unlikely to notice anything wrong beyond the original imperfection itself (if indeed it is audible). If a tape is recorded with S-type and played with B-type, a critical listener may notice a reduction in dynamic range, which may even be desirable in a noisy environment such as an automobile. However, there is virtually no distracting "pumping" or other dynamic artifacts. Minimizing the effect of high-level signals on low-level signals, which eliminates the principal mechanism by which the ear detects the use of level-sensitive processing, may well become a factor in the software industry's consideration of releasing Dolby S-type prerecorded cassettes.
"Action substitution," which was applied first in Dolby SR and is now being used in Dolby S-type, is a new development enabling closer adherence to the principle of least treatment. It results from combining both fixed-band and sliding-band techniques in a way that maintains their advantages while mitigating their disadvantages.
Figure 3A illustrates a sliding-band system. When high-level signals, if any, are relatively low in frequency, the band assumes the quiescent characteristic represented by the solid curve. If a higher frequency signal then comes along at a level high enough to require less boost, the band must slide up considerably, even to achieve only 2 dB less boost as shown (dashed curve). Thus, considerable noise reduction below the dominant signal's frequency is lost (as shown by the shaded area), a disadvantage. However, above that frequency, the NR effect is essentially unchanged, an advantage.
Figure 3B illustrates a fixed-band system having the same quiescent characteristic, again represented by the solid curve. When, as above, the dominant signal is loud enough to require less boost, there is an overall reduction of boost (dashed curve) and an equivalent loss of NR effect (shaded area). This means that unlike the sliding band, the fixed-band system causes a loss of NR effect above the dominant frequency, a disadvantage. However, there is significantly less loss of noise-reduction effect below the dominant frequency than with the sliding band, an advantage.
At higher frequencies, where tape hiss predominates, Dolby S-type combines both sliding and fixed bands in a way which results in what we call "action substitution."
Figure 3C illustrates an action-substitution system having the same quiescent characteristic as the individual systems discussed above (solid curve). When less boost is required (dashed curve), the action of the fixed band predominates below the dominant frequency, so that less NR effect is lost than when using a sliding band alone. Above the dominant frequency, the sliding band predominates, resulting in none of the NR loss which would occur with a fixed band alone.
Thus, with action substitution the boost of low-level signals is more constant, as is the NR effect, so that the system conforms more closely to the principle of least treatment.
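The comparison of Figs. 3A to 3C can be approximated with a toy model. This is only a sketch under loud assumptions: the boost curve shape, the 12-dB maximum, the 1-kHz quiescent corner, and the idea of modeling action substitution as "whichever band gives more boost at each frequency" are all illustrative choices, not the actual circuit. Both bands are adjusted so that the boost at the dominant frequency falls by the same 2 dB.

```python
import math

MAX_BOOST_DB = 12.0           # assumed maximum boost of the band
QUIESCENT_CORNER_HZ = 1000.0  # assumed corner of the quiescent characteristic

def band_boost(freq_hz, corner_hz, scale=1.0):
    """Assumed high-pass-shaped boost curve, in dB."""
    u = (freq_hz / corner_hz) ** 2
    return scale * MAX_BOOST_DB * u / (1 + u)

def fixed_band(freq_hz, dominant_hz, cut_db):
    """Same shape, scaled down so boost at the dominant frequency drops by cut_db."""
    quiescent = band_boost(dominant_hz, QUIESCENT_CORNER_HZ)
    scale = (quiescent - cut_db) / quiescent
    return band_boost(freq_hz, QUIESCENT_CORNER_HZ, scale)

def sliding_band(freq_hz, dominant_hz, cut_db):
    """Same height, corner slid upward until boost at the dominant frequency drops by cut_db."""
    target = band_boost(dominant_hz, QUIESCENT_CORNER_HZ) - cut_db
    u = target / (MAX_BOOST_DB - target)     # solve the curve for the new corner
    slid_corner = dominant_hz / math.sqrt(u)
    return band_boost(freq_hz, slid_corner)

def action_substitution(freq_hz, dominant_hz, cut_db):
    """Modeled (as an assumption) as the larger of the two boosts at each frequency."""
    return max(fixed_band(freq_hz, dominant_hz, cut_db),
               sliding_band(freq_hz, dominant_hz, cut_db))

DOMINANT_HZ, CUT_DB = 4000.0, 2.0
for f in (1000.0, 4000.0, 10000.0):
    print(f"{f:7.0f} Hz  quiescent {band_boost(f, QUIESCENT_CORNER_HZ):5.2f}  "
          f"fixed {fixed_band(f, DOMINANT_HZ, CUT_DB):5.2f}  "
          f"sliding {sliding_band(f, DOMINANT_HZ, CUT_DB):5.2f}  "
          f"combined {action_substitution(f, DOMINANT_HZ, CUT_DB):5.2f}")
```

In this model the combined curve follows the fixed band below the dominant frequency and the sliding band above it, so the boost given to low-level signals stays closer to the quiescent curve than either band manages alone.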
Action substitution has an additional benefit. With a complementary NR system, changes in level introduced after the signal is initially processed can cause the playback processor to mistrack, that is, it may not act as a precise mirror image of the record processor. With a sliding-band system, a relatively small (and otherwise innocuous) level change introduced by the recorder at the dominant frequency can cause disproportionate, and potentially audible, decoder mistracking at lower frequencies. With Dolby S-type, however, the fixed band predominates at frequencies below the dominant frequency, reducing the potential for audible mistracking.
In contrast to action substitution, modulation control, also developed originally for Dolby SR, deals with the effects of high-level signals outside the NR bands which need not, and should not, be boosted. With a sliding-band system, such high-level signals cause the band to slide up in frequency, out of their way. However, the higher the signal level at a given frequency, the further away the band slides. If left to its own devices, a sliding band can move so far away as to create a gap between its noise-reduction effect and the natural masking of the high-level signal, thereby causing a subtle form of noise modulation.
With a fixed-band system, dominant high-level signals nominally outside, but close to, the band can cause an undesirable reduction in the band's boost. This is because the filter used to create the bandpass cannot have an infinitely steep slope. If the dominant signal is strong enough, even quite far down the slope it will have the same effect as a lower-level dominant signal well within the bandpass. As with the sliding band, the high-level signal causes a reduction in NR effect where there should be none; that is, it causes noise modulation.
With Dolby S-type, a special technique called modulation control is applied to both the sliding and the fixed bands. It reduces the tendency of a sliding band to move further away from high-level signals than is necessary (Fig. 4A), and reduces a fixed band's tendency to react to high-level signals outside, but close to, the band (Fig. 4B). Thus, like action substitution, modulation control helps to keep all low-level signals in a more constantly boosted state in accordance with the principle of least treatment.
Spectral Skewing and Antisaturation
Spectral skewing and antisaturation techniques have been incorporated in Dolby S-type, as in Dolby SR. Spectral skewing consists of networks in the encoder which roll off the extreme low and high ends of the spectrum; complementary networks in the decoder restore flat response. The networks reduce the dependency of the system's action on signals at the extreme ends of the spectrum, thereby reducing decoder mistracking as the result of response errors introduced by the recorder in those regions. Such errors include those at low frequencies caused by head bumps, and those at high frequencies caused by variations among tape formulations even within the same nominal category, and by head-azimuth variations between the machine on which a tape is recorded and those on which it is played back. While spectral skewing results in some loss of NR effect at those extremes, the ear is so insensitive to noise at the extremes that the benefits far outweigh the theoretical NR loss.
Antisaturation consists of high-frequency shelving networks which operate at high signal levels; complementary networks restore flat response at playback. The shelving significantly reduces the high-frequency losses and distortion caused by tape saturation, thereby significantly extending headroom and further reducing the likelihood of decoder mistracking. Antisaturation reaches lower in frequency than spectral skewing, so its effects are limited to high levels to prevent any audible loss of NR effect.
Antisaturation effects are also contributed by both the low- and high-frequency spectral skewing networks. The low-frequency network, for example, virtually cancels out the low-frequency boost imparted by standard 3,180-µs cassette equalization, resulting in a notable reduction in distortion on strong low-frequency signals. This improvement is possible because Dolby S-type provides noise reduction at low frequencies, without which eliminating the standard pre-emphasis would increase low-frequency noise. The combined effects of spectral skewing and antisaturation techniques at both low and high frequencies can be seen in Fig. 5, which illustrates Dolby S-type's overall encode characteristics.
Multi-Level Staggered Action
The principle of least treatment calls for a fixed gain, determined by the amount of noise reduction desired, wherever in the spectrum signals fall below some threshold.
But there is also a need to fix the gain on signals which occur above a higher threshold, to reduce the effects of high-level overshoots. Overshoots are brief, exaggerated increases in level which occur during the time it takes for a compressor to react to a suddenly louder signal and start reducing the gain. At low signal levels, overshoots are of little concern: They are recorded onto the tape and then "undone" by the decoder at playback. But at high signal levels, overshoots from the encoder can get lost as a result of tape overload, resulting in distortion and decoder mistracking.
Therefore, changes in gain should occur only at intermediate levels, a characteristic we call bilinear compression and expansion (Fig. 6). To achieve this characteristic with Dolby S-type, as in all our systems, a dual-path circuit configuration is used whereby processing takes place only in a side chain, whose output is added to the main signal path for encoding and subtracted from the main signal for decoding (Fig. 7). At low input levels, the side chain's compressed output makes a significant contribution to the encoder's total output. Because the encoded signal is still comparatively low, the overshoots introduced are of little consequence, as described above. However, as the input signal level increases, the side chain's contribution lessens proportionally, so that the unprocessed main path predominates.
Eventually a level is reached above which the side chain might as well not be there; changes in gain, and therefore overshoots, virtually cease.
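The dual-path arrangement can be sketched as a static (steady-state) model. This is an idealization under stated assumptions, not the actual circuit: the side chain is treated as a hard limiter with an assumed threshold of -40 dB, attack and release behavior (and hence overshoot) is ignored, amplitudes are positive magnitudes, and the decoder is written as a closed-form inverse rather than as a circuit that subtracts a side-chain signal.

```python
import math

STAGE_BOOST_DB = 12.0
LOW_LEVEL_GAIN = 10 ** (STAGE_BOOST_DB / 20)    # gain applied to quiet signals
THRESHOLD = 10 ** (-40 / 20)                    # assumed side-chain limiting point
SIDE_LIMIT = (LOW_LEVEL_GAIN - 1) * THRESHOLD   # largest side-chain output

def side_chain(x):
    """Side-chain output: proportional for quiet inputs, limited for loud ones."""
    return min((LOW_LEVEL_GAIN - 1) * x, SIDE_LIMIT)

def encode(x):
    """Encoder: main path plus side chain."""
    return x + side_chain(x)

def decode(y):
    """Decoder: removes the side-chain contribution (static inverse of encode)."""
    if y <= LOW_LEVEL_GAIN * THRESHOLD:
        return y / LOW_LEVEL_GAIN
    return y - SIDE_LIMIT

def encoder_gain_db(x):
    """Net encoder gain at input amplitude x."""
    return 20 * math.log10(encode(x) / x)

for level_db in (-60, -40, -20, 0):
    x = 10 ** (level_db / 20)
    print(f"input {level_db:4d} dB -> encoder gain {encoder_gain_db(x):5.2f} dB")
```

The gain is fixed at the full boost for quiet inputs, changes only over the intermediate range, and approaches unity at high levels, where the limited side chain is negligible next to the main path.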
However, as more boost is designed into the compressor to achieve more noise reduction, the levels below and above which gain is fixed tend to move lower and higher, respectively. Preventing the one from going too low and the other from going too high runs the risk of introducing too high a compression ratio, which could magnify response and level errors in the recorder and thereby increase the potential for decoder mistracking. Therefore, at the high frequencies, where cassette noise predominates, Dolby S-type uses two 12-dB companding stages connected in series to provide what we call "multi-level staggered action," a technique developed originally for Dolby C-type NR and refined for Dolby SR (Fig. 8).
The use of two stages enables us to achieve more noise reduction and maintain the advantages of a bilinear characteristic, without introducing unduly high compression.
Each 12-dB stage has a bilinear compression/expansion characteristic. At the low signal levels where maximum noise reduction is desired, the boosts imparted by the two stages add to provide the desired 24 dB.
However, the thresholds of the stages are staggered: In what we call the low-level stage, the levels above and below which gain is fixed are lower than those for what we call the high-level stage. As a result, their compression ratios do not multiply, and the signal is never subjected to a higher ratio than that of an individual stage (Fig. 9).
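The effect of staggering can be explored with a crude static model of two 12-dB stages in series. Everything beyond the two 12-dB stages is an assumption made for illustration: the hard-limited side chain, the threshold values, the order of the stages (high-level stage first), and the "aligned" comparison case in which both stages start reducing gain at the same input level. In this hard-knee model the staggered cascade peaks only slightly above a single stage's ratio, while the aligned cascade approaches the product of the two; the article's stronger statement concerns the actual circuit, which this sketch does not reproduce.

```python
import math

STAGE_BOOST_DB = 12.0
G = 10 ** (STAGE_BOOST_DB / 20)  # low-level gain of one stage

def stage(x, threshold):
    """One static dual-path stage: main path plus a hard-limited side chain."""
    return x + (G - 1) * min(x, threshold)

def cascade(x, first_threshold, second_threshold):
    """Two stages in series; each threshold is measured at that stage's own input."""
    return stage(stage(x, first_threshold), second_threshold)

def low_level_boost_db(first_threshold, second_threshold, x=1e-5):
    """Total boost for a very quiet input (-100 dB by default)."""
    return 20 * math.log10(cascade(x, first_threshold, second_threshold) / x)

def max_compression_ratio(first_threshold, second_threshold, step_db=0.05):
    """Largest (input dB change) / (output dB change) over a -100..0 dB sweep."""
    worst = 1.0
    for i in range(round(100 / step_db)):
        x0 = 10 ** ((-100 + i * step_db) / 20)
        x1 = 10 ** ((-100 + (i + 1) * step_db) / 20)
        y0 = cascade(x0, first_threshold, second_threshold)
        y1 = cascade(x1, first_threshold, second_threshold)
        worst = max(worst, step_db / (20 * math.log10(y1 / y0)))
    return worst

HIGH_STAGE_THRESHOLD = 10 ** (-30 / 20)  # assumed
LOW_STAGE_THRESHOLD = 10 ** (-50 / 20)   # assumed, 20 dB lower
ALIGNED_THRESHOLD = G * HIGH_STAGE_THRESHOLD  # both knees at the same input level

print("low-level boost, staggered:",
      round(low_level_boost_db(HIGH_STAGE_THRESHOLD, LOW_STAGE_THRESHOLD), 2), "dB")
print("max ratio, staggered:",
      round(max_compression_ratio(HIGH_STAGE_THRESHOLD, LOW_STAGE_THRESHOLD), 2))
print("max ratio, aligned:  ",
      round(max_compression_ratio(HIGH_STAGE_THRESHOLD, ALIGNED_THRESHOLD), 2))
```

Both arrangements give the same 24 dB of low-level boost; what changes is how steep the combined characteristic becomes in the transition region.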
Multi-level staggered action has additional benefits. For example, the slopes of both stages' NR bands combine to provide steeper overall characteristics (Fig. 10), so dominant high-level signals can be that much closer in frequency to the bands without causing their gain, and thus the NR effect, to change. In addition, production tolerances for the individual stages of a multi-level configuration can be wider than for a single-level configuration with similar parameters, resulting in a system more readily mass-produced.
However, at low levels, the boosts of the two stages add (C) to provide more noise reduction.
DOLBY S-TYPE AT A GLANCE
New system for cassette recording derived from Dolby SR combines both fixed and sliding bands.
24 dB of noise reduction at higher frequencies, with 10 dB at lower frequencies.
Increased headroom, particularly at frequency extremes.
Minimal effect of high-level signals on low-level signals minimizes noise modulation as well as decoding errors.
Encoded signal free of dynamic artifacts.
Newly developed dedicated IC configuration for use in consumer products.
New higher performance standards for products licensed to incorporate Dolby S-type.
Figure 11 is a block diagram of a complete Dolby S-type encoder (the decoder is essentially a mirror image of this). There is one element on the diagram that we have not yet discussed: A single-stage fixed band providing 10 dB of low-frequency noise reduction, in addition to the high-frequency stages' 24 dB. Because of the spectral content of cassette noise, the ear's reduced sensitivity to low-frequency noise, and the high-frequency stages' relatively low reach, the low-frequency band has to provide only modest noise reduction, and only below 200 Hz. For these reasons it was also judged that providing both fixed and sliding low-frequency bands was not subjectively worth the added cost.
The low-frequency band also helps balance the encoded signal spectrally for playback without Dolby S-type decoding.
Another element in Dolby S-type is not on the block diagram at all: Dolby Laboratories' requirement that recorders with Dolby S-type meet new, higher performance standards.
These include extended high-frequency response, tighter overall response tolerances, a new standard ensuring head height accuracy, increased overload margin in the electronics, lower wow and flutter, and, for the first time in the cassette recorder industry, a head azimuth standard. These new standards will not only contribute to unprecedented cassette performance but will also help to ensure that tapes recorded on one machine, including prerecorded cassettes, will play back with unprecedented accuracy on any other.
The first tape decks with Dolby S-type will use a new, dedicated three-IC set developed with our cooperation by Sony's IC division, which will be making them available to all Dolby licensees. Later this year, Sony expects to complete the development of a single-chip version having identical performance, and other IC manufacturers have expressed interest in doing so as well. However, Dolby S-type is always likely to cost more than our current consumer systems.
That higher cost, combined with the higher overall performance required of the machine, means that cassette recorders with Dolby S-type may remain at the higher end of the price range. The first models, expected later this year, definitely will be so.
Dolby S-Type and the Future
We cannot predict how many home listeners might want better cassette performance, and how much more they will be willing to pay for it. However, the success of the CD indicates that higher quality sound is appealing to a significant market, and we have found that, at the highest playback levels likely to be encountered in the home, sophisticated listeners subjected to A/B comparisons of CDs and Dolby S-type cassettes are unable to identify which is which with any regularity. We are also unable to predict if the prerecorded cassette industry, unwilling to release titles in more than one format, will consider Dolby S-type cassettes sufficiently "compatible" with B-type playback to issue significant numbers in the new format. Be that as it may, the initial response to demonstrations we have conducted for the industry is generally favorable, and we are proceeding with the development of an appropriate professional encoder. Adding to these factors is the enormous investment in the cassette format by consumers, the music industry, and the audio industry (prerecorded cassettes significantly outsell CDs and LPs combined, and more than 280 million cassette machines with Dolby noise reduction alone have been sold). Therefore, there is a real possibility that Dolby S-type will extend and increase the returns on that investment, just as Dolby SR is already doing for professional analog formats.
A FIRST TEST: DOLBY S-TYPE
This short evaluation of Dolby Laboratories' new Dolby S NR system was made using a Teac V-10000 Esoteric cassette deck, which also has Dolby B and C NR. In addition, it has signal generators which produce a 400-Hz tone for level calibration and a 10-kHz tone for bias setting. I used Fuji FR Metal tape for all my tests.
Pink noise, band-limited at 15 Hz and 25 kHz, was the source for the first tests. After deck calibration, a third-octave RTA showed a rising response above 10 kHz at -20 dB, so I increased the bias slightly to get flatter overall responses. Figure B1 shows record/playback responses from +5 dB (relative to meter zero, which is at Dolby level) down to -25 dB, in 5-dB steps. The highest levels show some saturation effects, but the curves are very flat, in general, over the wide range in levels.
One of the more interesting features of Dolby S NR is the spectral skewing at both low and high frequencies. (Dolby C NR has it at high frequencies only.) Both Dolby C and S NR have high-frequency antisaturation circuits. I measured the saturation caused by increasing levels (from -20 to +10 dB) for Dolby B, C, and S NR and without NR. The great improvement across the entire band with Dolby S NR, compared to Dolby B or no NR, was immediately obvious. Resistance to saturation with Dolby C NR was close to that for Dolby S NR at the high frequencies but was clearly not as good as with Dolby S NR at the low frequencies. The low-frequency maximum output level (MOL) results with Dolby C NR were very slightly better than with Dolby B NR. With Dolby S NR, however, the MOL improvement over Dolby C NR was 0.7 dB at 1 kHz, increasing at lower frequencies to over 8 dB at 20 Hz, a significant change. The Dolby S saturation output level (SOL) results were better than those for Dolby C NR from 3 kHz to about 14 kHz, where they dropped just below those for Dolby C.
I recorded the band-limited pink noise at -20 dB with Dolby C NR and played it back using Dolby B (Fig. B2). There was some boost around 8 kHz and a roll-off above 10 kHz, but these were not bad, overall, for a change in mode at a level sensitive to errors. I also recorded with Dolby S and played this back with Dolby B NR. In this case, the changes in frequency response were more widespread and the level shifts were greater. I tried the same two combinations over a range of levels, and the basic results generally remained the same: For playback with Dolby B NR, the response deviations were less with the Dolby C recording than with the Dolby S recording.
I purposely misadjusted bias and level calibrations a few different ways and confirmed Dolby Laboratories' claim that Dolby S NR is more resistant than Dolby C NR to mistracking from calibration errors. Final conclusions awaited results from the listening tests.
Next, I ran signal-to-noise tests, referred to Dolby level, using all NR modes. With A-weighting, the ratios were 55.0, 63.4, 72.1, and 73.4 dBA for no NR, B, C, and S NR, respectively. With CCIR/ARM weighting, the figures were 52.0, 62.2, 71.8, and 71.8 dB, in the same order. Checking noise in third-octave bands, I confirmed that low-frequency noise (around 80 to 100 Hz) was 10 dB lower with Dolby S NR than with any other NR choice. The noise with Dolby S NR was slightly higher than with Dolby C NR from 2.5 to 5 kHz, but noise with C NR was noticeably higher than with S NR from 20 Hz to 2 kHz and from 10 to 20 kHz. From the RTA display, the maximum reduction in third-octave noise with Dolby S NR was 20.5 dB at 1 and 1.6 kHz. Referred to the 400-Hz MOLs, the signal-to-noise ratios were 72.6, 81.6, and 84.2 dBA for Dolby B, C, and S NR, respectively.
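The measured figures above can be tabulated to show each mode's improvement over no NR. The short script below is only an arithmetic aid using the numbers quoted in this test; the variable and function names are hypothetical. The last calculation, which subtracts the Dolby-level ratios from the MOL-referred ratios to estimate how far the 400-Hz MOL sits above Dolby level, is an inference that assumes both sets of A-weighted figures share the same noise measurement.

```python
# Signal-to-noise ratios quoted in the test, in dB.
SN_A_WEIGHTED = {"none": 55.0, "B": 63.4, "C": 72.1, "S": 73.4}  # re Dolby level
SN_CCIR_ARM = {"none": 52.0, "B": 62.2, "C": 71.8, "S": 71.8}    # re Dolby level
SN_RE_MOL_A_WEIGHTED = {"B": 72.6, "C": 81.6, "S": 84.2}         # re 400-Hz MOL

def improvement_over_no_nr(table):
    """Weighted noise reduction of each mode relative to the no-NR figure."""
    return {mode: round(value - table["none"], 1)
            for mode, value in table.items() if mode != "none"}

def implied_mol_above_dolby_level(re_mol, re_dolby_level):
    """Estimated 400-Hz MOL relative to Dolby level (assumes identical noise floors)."""
    return {mode: round(re_mol[mode] - re_dolby_level[mode], 1) for mode in re_mol}

print("A-weighted improvement:", improvement_over_no_nr(SN_A_WEIGHTED))
print("CCIR/ARM improvement:  ", improvement_over_no_nr(SN_CCIR_ARM))
print("Implied MOL re Dolby level:",
      implied_mol_above_dolby_level(SN_RE_MOL_A_WEIGHTED, SN_A_WEIGHTED))
```

On these figures the weighted improvement with Dolby S NR is roughly 18 to 20 dB, with Dolby C NR close behind, while the MOL-referred ratios give Dolby S NR a further edge of about a decibel or more.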
The first CD I tried recording was Bach: The Organs at First Congregational Church, Los Angeles, with Michael Murray (Telarc CD-80088). I was quickly convinced that at high recording levels, Dolby S NR yielded superior results on the low pedal notes. Between tracks, I detected no difference in noise level between the CD source and the tape playback with Dolby S NR; with Dolby C NR, I had heard a slight difference.
The next CD was Tchaikovsky's 1812 Overture (Telarc CD-80041), performed by Kunzel and the Cincinnati Symphony Orchestra. I concentrated on recording and playing back the last minute of the overture. Very quickly I demonstrated the challenge of recording this CD: I thought I had set the input pots conservatively, but the cannons caused levels way above +10 and the sound was badly distorted, so I reduced level a bit.
At that point, I got very acceptable results with Dolby S NR, even though momentary peaks still came up to +10. For the first time, I truly appreciated the cannon sound in the playback. Other NR modes were definitely not acceptable unless the level was lowered greatly.
Stravinsky's Firebird Suite (Telarc CD-80039), with Shaw and the Atlanta Symphony Orchestra, was very well recorded with Dolby S NR. The playback of the "Infernal Dance of King Kastchei" and the finale was low in distortion and well detailed from the bass drum to the cymbal crashes. I then tried recording other sections of this CD with Dolby S NR, switching back and forth in playback between this NR system and Dolby B or C NR. In the medium- and low-level passages, I could hear the response shifts which had shown up in the bench tests, and I certainly preferred the sound when playing this Dolby S-encoded tape through the Dolby S decoder.
Yet when listening to this same tape through Dolby B and C decoders, though the spectral balance was no longer accurate, it did not change when the level of the signal changed even though the levels varied over a wide range. I did not detect any disturbing pumping or obvious shifts in spectral characteristics. Overall, the sonic compatibility with Dolby B and C NR was definitely better than I thought it would be.
Dolby S NR has established itself as my preferred noise-reduction process for its resistance to overload across the band, good signal-to-noise ratio at all levels, low distortion, and resistance to calibration errors. Let's hope the chip makers can bring the cost down to facilitate including Dolby S NR in more than just the top-of-the-line cassette decks.
-Howard A. Roberson
(adapted from Audio magazine, Jun. 1990)