DSD and SACD -- Evolutionary or Revolutionary?


I can remember walking across Manhattan on a sparkling day in the late summer of 1996 to hear a then-new digital coding system called Direct Stream Digital (DSD). The venue was Sony Music’s West 57th Street studios; I had barely an inkling of what I was in for. Sitting in the control room, I could see a jazz trio in the next studio, everyone close-miked. On the other side of the console was a pair of Wilson Audio Watt/Puppies. The idea was to compare the sound of the console feed with the sound through a then-state-of-the-art 20-bit pulse-code modulation (PCM) chain and the sound through a DSD chain, and then to decide which of the latter was closer to the former. As I recall, we had a choice of 48- or 96-kHz sampling rate on the PCM system, and a Studer 15/30-ips deck was available to play analog masters.

Everyone seemed to agree that DSD was closer to the direct feed than 48-kHz PCM, but something stuck in my craw. Finally, the light dawned. In the next room, I could see a microphone positioned 3 inches in front of the trumpet and another swallowed up in the piano, but my ears heard a reasonably spacious sound with a modest degree of ambience. When I pushed the point, the engineers admitted to adding a bit of reverb and otherwise toying around to make the sound “more natural.” Presumably, this was done digitally. Whether this affected the experiment’s validity is moot, but, having run lots of demos in my 10 years at CBS Labs, I am a very suspicious guy. So I left Sony Music that day unconvinced that DSD was better than PCM. Besides, DSD proved to be a real bit-hog compared with PCM, and the scientist in me thinks that wasting resources is a sign of inelegant engineering.

We asked whether Sony intended DSD as a new consumer audio format and got the ol’ runaround: “DSD was developed to archive Sony Music’s vast collection of analog masters that are drying out and disintegrating this very moment!” (Help! Help!) Sony’s stated DSD objective was a digital recording system that would preserve every nuance recorded on those masters, was not limited by PCM’s bandwidth and dynamic range constraints, and was convertible to any PCM format presently used or likely to be used, without sound degradation. Phew! Such a lofty objective, and here I was quibbling about bit rate! For shame. Besides, videotape (the proposed archiving medium) can store wads of data, Sony pointed out. I repressed my urge to ask why one would want to replace one crumbling tape medium with another when optical storage is possible and left the studio.

If DSD is the perfect way to archive past recordings, ipso facto it must be the best way to record new programs. Soon after, Sony and co-developer Philips trotted the DSD system around the pro audio world to drum up interest.

Despite a lack of professional mixing and sound processing equipment capable of handling DSD (DSD signals can’t be mixed on a PCM board), Sony managed to convert a handful of top-notch recording engineers, who made some excellent recordings using the new DSD technology. I remained my curmudgeonly skeptical self until last January, when I heard these recordings in a private demo at the Consumer Electronics Show. They’d now become Super Audio Compact Discs (SACDs), and a new consumer audio format was born.

Sony and Philips have been accused of launching SACD simply to roil the waters for DVD-Audio, a statement easier to make than to prove. The way I see it—and I’m sure I’ll get lots of arguments on this—DVD-Audio extends present-day PCM technology into the surround sound realm and permits use of higher sampling rates and longer word lengths, whereas SACD takes a fresh look at what digital recording should be all about. DVD-Audio’s goals are admirable—but it is still a multibit, PCM system, whereas SACD uses the aforementioned Direct Stream Digital technology. Whether DSD proves sonically superior to 20-bit/192-kHz PCM remains to be seen, but I’d not rule out the possibility. (Yes, I know DVD-Audio claims to permit 24-bit resolution, but for the reasons given in my DVD-Audio article in the September issue, I think that’s just specsmanship.)

It’s not what DSD has, but what it doesn’t have, that makes the difference: a case where less may be more and where 1 bit may be better than 20. Nor did DSD emerge from the seashell as a fully formed Aphrodite. DSD is a byproduct of developments that led to the modern PCM converter. In fact, DSD analog-to-digital converters and most modern high-resolution PCM A/D converters start off the same way. Both use high-speed, oversampling 1-bit (or “low-bit”) delta-sigma modulation, and noise shaping, to transform analog signals into the digital domain. DSD samples incoming signals 2,822,400 times a second (64 times the 44.1-kHz CD rate), an oversampling rate typical of PCM converters as well. (For more about oversampling, 1-bit delta-sigma modulation, and noise shaping, see “A Compleat Digital Primer.”) Where DSD and PCM part company is in what they do with the 1-bit data stream.

To record PCM, the high-speed, 1-bit data are converted into multibit words at a lower sampling rate. In the conversion, data are decimated and digitally filtered to avoid aliasing (see “Primer”), because the sampling rate is being reduced. Although a brick-wall digital filter is easier to implement than its analog equivalent (it simply involves a series of mathematical calculations), its effects are not necessarily benign.

Digital filters overshoot and ring both before and after transients, and each mathematical calculation increases the word length. If the accumulators in the filter cannot accommodate the longer words, overshoots caused by the calculation can be clipped or the least significant bits (LSBs) of calculated data can be truncated. In a complex, multistage brick-wall decimation filter, internal word length can reach or exceed 100 bits! To constrain the word length, well-designed multistage decimation filters re-quantize (to reduce the word length) at each stage. In any case, the output data must be requantized to fit the system parameters—i.e., word length must be reduced to 16 bits for CD or to 16, 20, or 24 bits for DVD-Audio. Each requantization produces distortion if the data are not properly redithered, but in randomizing quantization distortion, dither raises the noise floor slightly. No free lunch!
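To see why word length balloons inside a digital filter, consider the worst-case bookkeeping for a single filter stage. The sketch below is only an illustration: the function names and the 20-bit/24-bit/256-tap figures are hypothetical, not taken from any particular converter.

```python
import math

def product_bits(sample_bits, coeff_bits):
    """Worst-case width of one signed sample-times-coefficient product."""
    return sample_bits + coeff_bits

def accumulator_bits(sample_bits, coeff_bits, taps):
    """Worst-case width needed to sum `taps` such products without overflow."""
    return product_bits(sample_bits, coeff_bits) + math.ceil(math.log2(taps))

# Hypothetical stage: 20-bit samples, 24-bit coefficients, 256 taps.
stage_width = accumulator_bits(20, 24, 256)
print(f"one stage needs up to {stage_width} bits before requantization")
```

Feed that output into a second stage without requantizing and the width grows again, which is why multistage filters trim the word length as they go.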

When played back through a delta-sigma D/A converter (the dominant method nowadays), multibit PCM words are converted back to a high-speed 1-bit (or low-bit) data stream by an interpolation filter and returned to analog by a delta-sigma modulator and reconstruction filter—hence more calculations, more overshoot, and more phase shift. Which perhaps is what has led some audiophiles to prefer the sound of gentle, low-order interpolation filters to that of brick-wall types, despite the large amounts of aliasing distortion that so-called soft filters permit! I’m not in that camp, but it does raise a question: Why use this approach? The simple answer is that delta-sigma modulators have been found to be the most practical and cost-effective way to convert digital information back to the analog domain. A 1-bit modulator is inherently linear and, properly implemented, can have wide dynamic range. Ladder DACs have problems achieving 16-bit accuracy, never mind 20 or 24 bits!

The beauty of a 1-bit delta-sigma-modulated data stream is that it contains the analog signal within itself. Although the delta-sigma stream is digital, in that it toggles between two fixed levels (“1” and “0”) rather than trying to describe a continuum of intermediate values, Fourier analysis of the data stream reveals the original analog signal embedded in the original band of frequencies where it began. All that is needed to restore it is an analog low-pass filter to dump the digital trash!
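The claim is easy to try on a computer. The sketch below runs a sine wave through a first-order delta-sigma modulator (the simplest kind, described in the primer) and then "plays it back" with nothing more than a crude moving-average low-pass filter. The function names, the 64-sample averaging window, and the test tone are assumptions chosen for illustration; real DSD encoders use much higher-order noise shaping, and a real player uses an analog filter.

```python
import math

def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: returns a stream of +1.0 / -1.0."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback                      # accumulate input minus last output
        feedback = 1.0 if integrator >= 0.0 else -1.0   # 1-bit quantizer
        bits.append(feedback)
    return bits

def moving_average(values, width):
    """Crude low-pass filter: mean of the most recent `width` values."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= width:
            running -= values[i - width]
        out.append(running / min(i + 1, width))
    return out

# Half-scale sine, one cycle per 1,024 oversampled samples.
signal = [0.5 * math.sin(2 * math.pi * n / 1024) for n in range(4096)]
stream = delta_sigma_1bit(signal)        # two-level "digital" stream
recovered = moving_average(stream, 64)   # low-pass filtering restores the waveform
```

Plot `recovered` against `signal` and the sine wave reappears, slightly delayed and a little noisy, from a stream that contains nothing but two levels.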

The salient difference between Direct Stream Digital and PCM is that DSD cuts out the middlemen. Instead of using a decimator and digital filter to get 1-bit data into multibit form for recording and then using more calculations in the player to turn PCM words into 1-bit equivalents so they can be linearly converted to analog, DSD records and plays the 1-bit stream directly (Fig. 1). In principle, all that is needed in the player is an analog low-pass filter to reconstruct the audio and reject the ultrasonic digital trash. Looked at this way, DSD is a simpler, more straightforward, more analog approach to digital recording, and the old audiophile adage that says “the less the better” is more often right than wrong.

DSD also offers a flexible trade-off between bandwidth and dynamic range. With PCM, bandwidth is sharply limited to half the sampling rate while maximum dynamic range is dictated by word length. Proponents argue that the dynamic range of a PCM system can be made arbitrarily large by increasing the word length, but in my experience, the practical end-to-end limit for a PCM system is about 20 bits, and there are good reasons for that.

With DSD, bandwidth and dynamic range are determined by the type and order of the noise shaping used in recording and, to a lesser extent, by the analog low-pass filter in the player, characteristics that are not cast in stone. The only constraint the SACD “Scarlet Book” of standards places on the encoder is that the total noise energy in the region below 100 kHz not exceed -20 dBFS. There is a recommendation that SACD players incorporate a 50-kHz low-pass filter “for use with conventional speakers and amplifiers,” but higher cutoff frequencies are permitted.

High-order noise shaping forces more of the quantization noise into the ultrasonic region and improves dynamic range within the audio band. But since high-order noise shapers are not stable when overloaded, they must be designed with controls that prevent overload and a way out should it occur. (The same applies to the noise shapers used in high-resolution PCM converters.) The cutoff frequency of the noise shaper enters the equation because it affects the bandwidth of the DSD system. You could, for instance, use a high cutoff frequency to maximize system bandwidth, in which case you’d have the choice of using either high-order noise shaping to maintain the same dynamic range over the wider bandwidth or a lower-order filter at the sacrifice of dynamic range. On the other hand, you also could lower the cutoff frequency of the noise shaper to improve dynamic range in the audio band (for a given filter order) if you thought that delivered better sound.

The point is that DSD is a flexible coding system that leaves the recording engineer (and A/D converter designer) in charge of sound quality. I find that appealing because it portends future quality improvements within the SACD format. By adopting 16-bit/44.1-kHz PCM, Sony and Philips locked the CD in a straitjacket. Apparently, they don’t intend to repeat that situation. The flip side of the flexibility coin is that it’s difficult to speak in concrete terms of a player’s (or the SACD format’s) bandwidth or dynamic range because the two are interrelated and affected by choices made in the recording studio.

Fig. 1—By eliminating the conversions from delta-sigma modulation to PCM—and back—that are typical of modern CD recording and reproduction, SACD greatly simplifies the entire process.

Suffice to say that SACD claims the equivalent of 20-bit PCM resolution within the audible range and a bandwidth of 100 kHz. That should encompass the range of human hearing quite nicely. Presumably the 20-bit/100-kHz claim is based on using reasonably high-order noise shaping in the A/D converter and a fairly steep low-pass filter in the player. There’s a limit to how far one can (or should) go in optimizing both factors at the same time. It is counterproductive to shoot for maximum bandwidth together with maximum dynamic range. Doing so would require a very high-order noise shaper with a very high cutoff frequency, which would force so much of the quantization noise into the ultrasonic region that a brick-wall filter would be required in the player to dump it. That's exactly what DSD was meant to avoid.

So much for DSD; what about SACD? I draw a distinction between DSD, the recording technology, and SACD, the consumer audio super-disc, because DSD may succeed for archival purposes whether or not SACD proves a consumer success. Physically, an SACD is very similar to a DVD. Although it can't be played on today's DVD players, I see no reason why it couldn't be played on future systems because the physical size (12 centimeters), storage capacity (4.7 gigabytes per layer), and optimum laser wavelength (650 nanometers) are the same. (In fact, Philips recently announced its intention to deliver a universal DVD-Video/DVD-Audio/SACD/CD player late next year.) Only the maximum transfer rate and data format differ.

SACD discs are single-sided and come in three varieties: a disc with one high-density (HD) layer that carries DSD information exclusively, a disc with two HD layers that provides 8.5 gigabytes of storage for longer playing time, and a hybrid that has an HD upper layer and a 44.1-kHz, 16-bit Red Book (CD-compatible) lower layer that existing CD players should recognize (Fig. 2). Hybrid discs solve the dual-inventory problem that was said to be so important to the music industry. At one point, a DVD-Audio spokesman told me that the same could be done with their format, but lately they seem to be backing off that position on the grounds that they've found many CD players that will not recognize the CD layer of a hybrid disc.

===

Fig. 2. A hybrid SACD has two layers. The high-density layer is derived from DVD optical technology and carries the high-resolution DSD data streams. The base layer is essentially a standard CD. The HD layer is only semi-reflective, to allow the longer wavelengths of a standard CD player's laser pickup to penetrate it and read the CD layer.

====

Four of the five discs I got with the Sony SCD-1 (reviewed elsewhere in this issue) were hybrids, so I thought I'd give them a go on the players I had around. All played fine on my reference Sony CDP-XA7ES player, on a Technics portable, and on an antediluvian Yamaha CD deck. My Sony DVP-S7700 DVD player had no difficulty recognizing the hybrid SACD as a CD, nor did my Philips laserdisc player. The only machine that refused to read the disc was a Matsushita-sourced single-laser DVD player that also burps on CD-Rs.

SACD supports 255 tracks, which eliminates any need for index numbers. It also supports text (song lyrics, liner notes, and the like), graphics, and limited video (still-frame, album covers, etc.), but it is not primarily a video system. The format allows for as many as six channels of DSD surround sound in addition to a DSD stereo track. SACD does not incorporate anything like DVD-Audio's Smart Contents automated-mix-down capability; so a separate stereo track is not merely an option if high-resolution stereo is desired, but a necessity. Sony and Philips contend (and most top-notch recording engineers would agree) that it is not possible to do justice to stereo by having a player generate it from a surround mix, even if the producer has as much control over the mix-down process as he does in Smart Contents. In the studio, stereo mixes are made from the original multitrack master, not from a surround mix thereof. Sometimes even the microphone setup is different for stereo than for surround sound.

SACD was designed to have the same 74-minute playing time as a normal CD (sensible in light of the Red Book layer on the hybrid disc), but with eight DSD channels (six surround plus studio-mixed stereo) on the HD layer, SACD devours storage capacity at a wild pace. The combined DSD data rate for all eight channels comes to 2.8224 megabytes/second, which chews up a 4.7-gigabyte HD layer in 27 minutes, 45 seconds. To boost this capacity to 74 minutes, the Philips half of the team concocted a lossless compression algorithm called Direct Stream Transfer (DST). DST uses a combination of linear prediction and Huffman coding to compress the data by a ratio greater than 2:1. (By my calculations, they'd need 2.67:1 packing to deliver 74 minutes of six surround plus two stereo channels on a 4.7-gigabyte layer.)
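For anyone who wants to check the arithmetic, here it is spelled out. The only assumption is that a "gigabyte" means one billion bytes, which is the reading that reproduces the 27-minute, 45-second figure.

```python
DSD_BITS_PER_SECOND = 64 * 44_100   # 2,822,400 bits/s per channel
CHANNELS = 8                        # six surround plus two stereo
LAYER_BYTES = 4.7e9                 # assumes decimal gigabytes
TARGET_MINUTES = 74

bytes_per_second = DSD_BITS_PER_SECOND * CHANNELS / 8
uncompressed_seconds = LAYER_BYTES / bytes_per_second
minutes, seconds = divmod(uncompressed_seconds, 60)
required_ratio = TARGET_MINUTES * 60 / uncompressed_seconds

print(f"{bytes_per_second / 1e6:.4f} megabytes/second")
print(f"{int(minutes)} min {int(seconds)} s without compression")
print(f"{required_ratio:.2f}:1 packing needed for {TARGET_MINUTES} minutes")
```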

DST divides the data into 13.3-millisecond frames. Within each frame, a linear filter predicts whether the next bit will be "1" or "0," based on the values of the preceding bits. The predicted value is compared with the true value. If the prediction is correct, the comparator outputs a "0"; if it's incorrect, the comparator outputs a "1." Because audio data are not random, the predictor is right more often than wrong. Therefore, the output of the comparator is likely to consist of relatively long strings of "0s" with occasional "1s" where the prediction was incorrect. And because it contains long strings of the same value, the predicted data is more redundant than the original. The redundancy is then removed by a Huffman code, which produces the actual compression.
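The predict-and-compare step can be mimicked in a few lines. This is a toy, not the DST algorithm: the predictor here simply guesses that each bit is the opposite of the one before it (a stand-in I chose because an idling 1-bit stream tends to alternate), and the 16-bit sample stream is made up.

```python
def predict(history):
    """Toy predictor (an assumption, not the DST filter): opposite of the last bit."""
    return 1 - history[-1] if history else 0

def encode(bits):
    """Comparator output: 0 where the prediction was right, 1 where it was wrong."""
    history, residual = [], []
    for b in bits:
        residual.append(b ^ predict(history))
        history.append(b)
    return residual

def decode(residual):
    """The player runs the same predictor and flips its guess wherever it sees a 1."""
    history = []
    for r in residual:
        history.append(r ^ predict(history))
    return history

stream = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1]
residual = encode(stream)
print(residual)                      # mostly zeros, with a 1 at each missed guess
print(decode(residual) == stream)    # the round trip is exact
```

The residual carries exactly as many bits as the original; all the prediction has done is make them lopsided enough for the entropy coder to exploit.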

Huffman coding replaces common patterns of data with simpler codes. For example, if four consecutive "0s" is the most frequent occurrence, "0000" could be replaced with a single "0," defined in the Huffman table as "0000." That yields 4:1 compression for the "0000" pattern, but now a different code is needed to replace a single "0." Because the single "0" code will be more than 1 bit long, some ground is lost. The point is to make the data as redundant as possible (via linear prediction) and then assign the shortest codes to the most common patterns and the longer codes to the least common patterns. A Huffman table can be fixed or computed for each data frame to optimize compression on a frame-by-frame basis. It's not clear which approach is taken, though I suspect Philips derives a new table for each frame based on the patterns found within it.
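Here is a minimal sketch of that idea, using the textbook Huffman construction on fixed 4-bit groups. The grouping into 4-bit symbols and the sample data are assumptions made for illustration; the article does not say how DST forms its symbols or tables.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code that gives the most frequent symbols the shortest codewords."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, {symbol: codeword-so-far})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_decode(bits, code):
    """Walk the bit string, emitting a symbol each time a codeword completes."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Made-up comparator output, split into 4-bit groups: mostly "0000".
groups = ["0000"] * 12 + ["0001"] * 2 + ["0100", "1000"]
code = huffman_code(groups)
encoded = "".join(code[g] for g in groups)
print(len(groups) * 4, "bits in,", len(encoded), "bits out")
```

Note that the code table itself is not counted in the output size, which is exactly the side-information cost that matters when a new table is sent with every frame.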

Because the table constantly changes in this case, modifications to it must be included with the data as side information, so the player knows how to decipher the block. Consequently, there's a delicate trade-off between the size of the frame and the amount of side information that must be transferred to optimize the compression ratio.

Assuming error-free transmission, this type of compression is lossless. That is, the original 1-bit data stream can be perfectly restored from the compressed information. Although I use the term "prediction error," there is no error overall. When the encoding predictor is wrong, that information is sent in the data to the player whose decoding predictor is told "whatever you think the value should be is wrong, so fix it!"

As I write this, the only SACD player and existing sample discs are stereo-only. One cannot help but conclude that in order to beat DVD-Audio to market, choices had to be made. My guess is that Philips' DST algorithm was not locked in silicon in time, but it's also possible that no multichannel DSD masters were available. As I said earlier, DSD cannot be mixed on a PCM board, and DSD mixers and signal processors are as rare as hens' teeth.

Sony has stated that the SACD launch will be supported by an initial release of approximately 40 titles from AudioQuest, Delos, dmp, Mobile Fidelity, Telarc, Water Lily Acoustics, and Sony Music. Sony Music has indicated it will release 10 titles per month thereafter. Time will tell. Philips can't help because it sold its record labels (PolyGram et al.) to Universal Music, which is aligned with the DVD-Audio group. At present, the DVD-Audio group has gotten more companies to sign on the dotted line than have Philips and Sony, possibly because it seems more amenable to the stronger copy-protection and anti-piracy schemes the record industry desires. But the protection issue remains a thorn in the sides of both camps.

As I see it, the music industry wants something it can't have without destroying the rationale for a superior audio carrier. From what I've been told, the record companies not only want to be able to protect against indiscriminate digital cloning but to control copying in the analog domain as well. It's one thing to scramble a digital signal or insert flags in the data stream that legal digital recorders recognize and refuse to record; it's another to have protection carry over into analog output signals. Digital flags can be recognized, acted upon, and stripped before conversion to analog, but it seems to me that anything that ends up recognizable by an analog recorder can't help but be audible. If the music industry couldn't get away with this on CDs (it tried, remember), how can it possibly expect to do so on a higher-quality medium? I think the best the record labels can hope for is to make indiscriminate digital cloning illegal and difficult and let it go at that. No one is seriously proposing military-style encryption, and anything short of that will quickly be hacked by commercial pirates. Commercial piracy and the Internet are the serious threats to the music industry, not audiophiles. We're the customers!

Actually, SACD seems to have certain advantages over DVD-Audio with respect to piracy, even though Sony and Philips won't agree to any tampering with the signal that possibly could be audible. SACD discs not only carry a digital watermark but can also carry a visible one. As with DVD-Audio, the digital watermark can be used to prevent a player from reproducing an illegal disc and to control how many copies can be made. The (optional) visible watermark assures the buyer that the disc is legal.

The visible watermark is created by Pit Signal Processing (PSP) while mastering the disc. PSP varies the pit width in a way that leaves a visible pattern on the disc (for example, the DSD logo). Reportedly, the technology needed to do this in a way that does not cause jitter is extremely complex, and the magic boxes will be made available only to licensed pressing plants. PSP offers a fighting chance of reducing commercial piracy if consumers refuse to buy unwatermarked discs. But that's a big if, when visible watermarking is optional.

I wish to acknowledge the assistance of David Kawakami of Sony and Dr. Andrew Demery of Philips in providing AES Preprints and other technical background material. Special thanks to Ed Meitner of EMM Labs for sharing his experience in designing the DSD professional equipment that made possible many of the recordings I heard.

=== A COMPLEAT DIGITAL PRIMER ===

Digital audio systems have two things in common: sampling the analog input signal at a regular rate and quantizing (assigning a value to) each sample. The sampling rate puts an upper limit on frequency response; quantization establishes the dynamic range. The theoretical upper limit of a sampling system is one-half the sampling rate. This is the so-called Nyquist criterion. Because Compact Discs use 44.1-kHz sampling, the theoretical upper limit on frequency response is 22.05 kHz.

Signals of higher frequency that enter the sampler cause aliases that cannot be distinguished from real signals. Aliasing is a form of intermodulation distortion, or beating between the signal and the carrier, that produces new signals at the sum and difference frequencies. For example, if a 43.1-kHz (or 45.1-kHz) signal is sampled at 44.1 kHz, a 1-kHz beat is produced that cannot be distinguished from a true 1-kHz signal.

Aliases at the sum frequencies (87.2 kHz and 89.2 kHz in this case) also occur, but, lying in the ultrasonic region, they are inaudible. The 1-kHz difference signal definitely is audible, so signals of frequencies greater than half the sampling rate must be prevented from entering a sampler. This is the job of the anti-aliasing filter.
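The difference-frequency rule reduces to a one-line folding calculation. The helper below is an illustrative name of my own; it works in whole hertz to avoid floating-point rounding.

```python
def alias_frequency(f_signal_hz, f_sample_hz):
    """Lowest frequency at which a sampled tone appears (folds about half the rate)."""
    f = f_signal_hz % f_sample_hz
    return min(f, f_sample_hz - f)

for tone in (10_000, 43_100, 45_100):
    print(tone, "Hz sampled at 44,100 Hz appears at", alias_frequency(tone, 44_100), "Hz")
```

A 10-kHz tone comes through unchanged, while the 43.1- and 45.1-kHz tones both land on 1 kHz, indistinguishable from the real thing.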

In digital audio's early days, anti-aliasing filters were complex analog circuits that attempted flat response to 20 kHz and high rejection at 22.05 kHz and above. Designing such a filter was (and is) a formidable task; building an affordable one was (and is) even more difficult. In-band response seldom was flat, out-of-band rejection rarely was as much as needed, and group delay wasn't uniform. This caused severe phase distortion at high frequencies.

Oversampling (sampling at a higher rate than the Nyquist criterion requires for the desired bandwidth) offers an out.

Instead of sampling at 44.1 kHz, suppose one samples at 176.4 kHz (4x oversampling) while maintaining the original 20-kHz system bandwidth. Signals above 88.2 kHz still alias, but the lowest frequency that creates a sub-20-kHz alias is 156.4 kHz (176.4 kHz minus 156.4 kHz equals 20 kHz). That provides more room for the analog input filter to work, greatly simplifies the design, and improves phase response.

When the digital signal is returned to the analog domain, it appears as a series of samples, and similar aliasing problems occur. If the D/A converter operates at the original sampling rate, another brick-wall filter is needed at the output. (That's what was done in the early days, and CDs got the rap for bad sound.) A solution is the use of another form of oversampling, a digital interpolation filter.

Interpolation filters use mathematical calculation to increase the sampling rate and move images out to higher frequencies, where they can be removed by a relatively simple analog filter. Oversampling is beneficial but has a major drawback: There's less time for each conversion, which is a real problem if one is shooting for 16-bit (or better) accuracy. This brings up the subject of quantization.

Just as prices are usually rounded up or down to the nearest penny because that's the smallest coin in our currency system, a digital representation must be rounded up or down to the nearest value permitted. This produces quantization error, which in the audio world is called quantization noise. PCM 16-bit binary words (CD, DAT, etc.) have 2^16 possible values. In decimal notation, that's 65,536 possible values, one of which is zero. Because the maximum quantization error is always ±½ of the least significant bit (LSB), increasing the word length provides finer resolution and better dynamic range. In a PCM system, theoretical signal-to-noise ratio is equal to 6.02 times the number of bits plus 1.76 dB, so a 16-bit system can theoretically handle a dynamic range of 98 dB. Achieving this in practice is easier said than done.
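The rule of thumb is simple enough to tabulate. The function name below is my own; the formula is the one just quoted.

```python
def theoretical_snr_db(bits):
    """Theoretical PCM signal-to-noise ratio: 6.02 dB per bit plus 1.76 dB."""
    return 6.02 * bits + 1.76

for bits in (16, 20, 24):
    print(f"{bits}-bit: {2 ** bits:,} levels, {theoretical_snr_db(bits):.2f} dB")
```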

Early analog-to-digital and digital-to-analog converters used a straightforward way of converting between domains. On the A/D side, the sample value was compared to a set of voltages established by a resistor network. The digital word was then generated by successively approximating the value until it got as close as possible. On the D/A side, another resistive ladder generated currents proportional to each bit of the digital word. The currents corresponding to the "1" bits were summed to produce the output. The concept is simple, the execution excruciating. The problem is one of tolerance; the MSB (most significant bit) of a PCM word accounts for half the total value, and it's nearly impossible to make it precise enough that the error doesn't swamp the less significant bits. Consequently, early 16-bit converters rarely achieved true 16-bit performance.

Most modern converters use high-order oversampling, 1-bit or low-bit delta-sigma modulation, and noise shaping to transform between the domains. Delta-sigma modulation is an offshoot of the delta modulator developed at Bell Labs decades ago. Delta modulators are rather simple circuits that compare the present value of the analog signal with the past and output a "1" if the value has increased and a "0" if it has decreased. Delta modulators do not encode the actual signal; instead, they mathematically encode its change (its delta, or derivative). By including an integrator (sigma) in the comparator loop, the delta modulator becomes a delta-sigma (probably more properly called a sigma-delta) modulator. Integration mathematically reverses the derivative so that a delta-sigma modulator encodes the signal itself.

In theory, a 1-bit delta-sigma modulator is perfectly linear, because all bits have identical value. The problem is quantization noise. With only 1 bit to describe data, the dynamic range is zilch! Oversampling and noise shaping came to the rescue. Digital quantization noise spreads from DC to half of the sampling rate. (See Fig. A1.) With 64- or 128-times oversampling, most noise is in the ultrasonic region and can be disregarded, but the raw quantization noise of a 1-bit converter is so great that oversampling by itself is not enough. Doubling the sampling rate spreads the noise over twice as much bandwidth, but that improves the dynamic range within the audio band by only 3 dB. Thus, 64-times oversampling results in an 18-dB gain and a 1-bit, 64-times oversampled converter has about 4 bits of resolution in the audio band. Hardly enough! Here's where noise shaping comes in. Noise shaping is a technique that effectively low-pass filters the desired signal and high-pass filters the quantization noise. This squeezes the noise out of the audio band into the ultrasonic region, much as air in a cylindrical balloon can be squeezed into one end by grabbing the other (Fig. A2). Actually, a 1-bit delta-sigma modulator is itself a first-order noise shaper. The integration provides the filtering action, and a 64-times oversampled delta-sigma converter has better than 4-bit resolution within the audio band.
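The 3-dB-per-doubling figure and the "about 4 bits" conclusion follow from a short calculation. The helper names are illustrative, and the conversion from decibels to bits uses the same 6.02-dB-per-bit rule given earlier for PCM; this covers plain oversampling only, with no noise shaping.

```python
import math

def oversampling_gain_db(ratio):
    """In-band noise reduction from spreading flat quantization noise over `ratio` times the bandwidth."""
    return 10 * math.log10(ratio)

def effective_bits(converter_bits, ratio):
    """Equivalent in-band resolution at 6.02 dB per bit (no noise shaping)."""
    return converter_bits + oversampling_gain_db(ratio) / 6.02

print(f"2x:  {oversampling_gain_db(2):.2f} dB")
print(f"64x: {oversampling_gain_db(64):.2f} dB, "
      f"about {effective_bits(1, 64):.1f} bits from a 1-bit converter")
```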

Combining several delta-sigma modulators in the same loop results in higher-order noise shaping that forces more of the noise into the ultrasonic region. Within reason, this technique can achieve arbitrarily good resolution within the audio band and perfect conversion linearity. The extremely high sampling rate also effectively eliminates aliasing concerns in the delta-sigma modulator, although aliasing can occur if the 1-bit stream is down-converted into multibit PCM at a slower rate. One of DSD's prime claims to fame is that it avoids that last step.

Fig. A1. A simple 1-bit PCM quantizer (a) produces very high quantization noise (the difference between the input and output waveforms) within the audio band. A 1-bit delta-sigma modulator (b) produces the same total amount of quantization noise, but it is spread over a much wider bandwidth, so less of it appears in the audible range.

Fig. A2. Comparison of noise spectra with and without noise shaping.
