Digital Audio for the Future by Bob Stuart (part 2) (April 1998)


<< cont. from part 1

by Bob Stuart

[J. Robert Stuart is chairman and technical director of Meridian Audio and chairman of the Acoustic Renaissance for Audio, an organization founded to campaign for establishment of a DVD-Audio standard that would assure the highest possible quality of sound reproduction.]

Last month, Part 1 of this article examined in detail what is required for digital audio coding that can be guaranteed transparent for all listeners with all types of signals. This month, we turn our attention to how most effectively to code such audio for distribution.

In its analysis of this question, the Acoustic Renaissance for Audio (ARA) concluded that uniform, linear, multibit pulse-code modulation (PCM) offers the following overwhelming benefits, against which other contenders should be judged:

(1) Its uniform sampling and quantization give the option of scalability. (2) Optimal dither offers effectively infinite time and amplitude resolution and is demonstrably linear, both mathematically and in practice. (3) Psychoacoustically based pre-emphasis schemes are easily incorporated. (4) Stationary, psychoacoustically based noise shaping can be applied as a straightforward optimization technique. (5) Transparent data compression is an option to save bandwidth or storage capacity.
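Benefit (2) can be illustrated numerically. The following is a minimal sketch, not taken from the ARA papers: it assumes a quantizer step of 1 LSB and triangular-PDF (TPDF) dither formed from two uniform random values, and it averages many quantized readings of a constant level of 0.3 LSB. The function names and the trial count are illustrative choices.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer step (1 LSB = 1.0)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng):
    """Triangular-PDF dither spanning +/-1 LSB: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def mean_quantized(x, trials, dithered, seed=1):
    """Average the quantizer output for a constant input level x (in LSBs)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        d = tpdf_dither(rng) if dithered else 0.0
        total += quantize(x + d)
    return total / trials

level = 0.3  # a signal level of 0.3 LSB, well below one quantizer step
undithered = mean_quantized(level, 100_000, dithered=False)
dithered = mean_quantized(level, 100_000, dithered=True)
print(f"undithered mean: {undithered:.3f}  dithered mean: {dithered:.3f}")
```

Without dither, the sub-LSB level is lost entirely. With dither, the average output converges on the input level, at the cost of added noise, which is the sense in which a dithered PCM channel is noise-limited rather than resolution-limited.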

A distribution channel need not carry raw PCM. In fact, the choices currently available include: (1) PCM at sampling rates between 32 and 192 kHz and with word lengths from 8 to 24 bits. (2) Pre-emphasized PCM with complementary de-emphasis on playback. (3) PCM using psychoacoustically optimized noise shaping to deliver higher resolution for a given word length. (4) PCM combining the techniques of options 2 and 3. (5) Losslessly compressed PCM (what the ARA has called "packed PCM"). (6) PCM with losslessly cascadable lossy encoding. (7) PCM compressed using a lossy method (psychoacoustically based variants of which include Dolby Digital, DTS, and MPEG). (8) Bitstream coding.

We in the ARA have contended that current technology can guarantee transparency in a channel only if lossy perceptual coding (option 7 above) is not used. Underlying this point is an extremely important observation: With the exception of bitstream coding (option 8 above), all the systems start and end with linear PCM. Linear PCM, when used correctly, provides an infinite-resolution (but noise-limited) representation of the output of a microphone. We in the ARA take the purist view that we want to convey as nearly as possible the acoustic waveform of the original performance. By coding that waveform, we can attempt to replay the audio by reversing the process, which is the traditional goal of high-fidelity sound reproduction. This does not exclude certain legitimate concessions to psychoacoustics: balanced limitation of bandwidth and dynamic range and enhancement of perceived resolution by means of pre-emphasis or psychoacoustically optimized noise shaping (processes that can be considered effectively linear).

Among the other coding methods, lossless compression, or packing, of PCM data is simply a method of delivering a bit-accurate output while reducing the quantity of data stored or the data rate transmitted in the channel. This is no different in concept from the well-known methods, such as Zip files, used in computers for storing data in less space than occupied by the original files, although the techniques for packing audio are quite different from those used to compress text and pictures.

Such lossless compression is an important tool in the quest to optimize resolution and deliverable sound quality.

The ARA has strongly supported the use of lossless packing, not only because it enables much more efficient data storage and transmission but also because such compression lowers the correlation between bit patterns and audio data. This, in turn, can lead to reduced levels of correlated jitter, which is a critical factor in high-resolution digital audio systems.

For sampling rates of 96 kHz and above, designers of both lossy and lossless compression schemes have considered reducing the fundamental word rate in the distribution channel, principally to enable easy transmission through existing carriers or interfaces. A lossless processor, for example, can offer at least 2:1 compression on most 96-kHz audio material, effectively allowing the distribution sample rate to be halved as one expression of the reduced data rate. In principle, this lowered rate can be treated in two ways: (1) as a single, compressed, high-rate (e.g., 96-kHz) signal or (2) as a combination of a half-bandwidth version suitable for reproduction at the channel sample rate (e.g., 48 kHz) and a high-frequency touch-up signal for highest-quality playback, either of which may use lossless or lossy encoding.


I have argued that if a well-executed PCM channel is to guarantee transparency to a human listener, it will require more than 16 bits and a sampling rate higher than 48 kHz. I have also pointed out that normal practice still does not exploit the full potential of the current 44.1-kHz, 16-bit Red Book CD channel.

If we were to change the parameters purely according to audio considerations, we might well propose 20 bits at 66.15 kHz.

Such a channel would require a data rate of 1.4 megabits per second, which is twice the rate required by the current 44.1-kHz, 16-bit channel. In fact, however, there are very strong practical reasons for maintaining 2:1 relationships with the Red Book release format, with the current archive, or with video-related formats (48-kHz sampling).

Realistically, therefore, the next useful sampling rates for pure audio are 88.2 and 96 kHz.

It should be clear that increasing the bandwidth in this way will double the required data rate. In conventional audio engineering terms, that could look like a bad deal, depending on one's views concerning the value of audio content above 20 kHz and the desirability of setting standards defensively (using more data to cover up bad implementations). With that in mind, let's examine the options for reducing the data requirements of channels that run rather too fast (at 96 kHz, for example).

Fig. 1--The current standard pre-emphasis curve for digital audio compared with a pre-emphasis characteristic optimized for high sampling rates, such as 96 kHz. Also shown are a noise-shaping curve matched to the proposed pre-emphasis as well as the spectrum of the noise floor produced by this noise shaper.

Fig. 2--Output noise spectrum and headroom after application of the proposed pre- and de-emphasis. The graph expresses dynamic range in bits. This example illustrates a capacity of 18.1 bits at 4 kHz for a 16-bit channel, a perceptual gain of 2.1 bits.


There are two linear and psychoacoustically correct coding methods for improving the performance of linear PCM channels, particularly if the distribution channel uses a word length shorter than that of the original. These methods are: (1) noise shaping during a word-length reduction process to maintain a high effective dynamic range in a channel of fewer bits and (2) pre- and de-emphasis to better match channel capacity to the energy spectrum of music and to human hearing. Noise shaping can also be combined very effectively with pre-emphasis/de-emphasis, particularly if the noise shaper is designed to exploit the pre-emphasis curve.

Because the use of a high sampling rate (such as 96 kHz) allows the bandwidth of the channel comfortably to exceed the high-frequency cutoff of human hearing, there are new options for noise shapers that have low coloration in the mid-band but also redistribute channel capacity in a more useful way. We have come up with examples of noise shapers that can provide perceptual gains of as much as 6 bits in a 96-kHz channel! The unique advantage of using noise shaping alone as a method of minimizing data rate or maximizing the perceptual performance of a channel is that it requires neither equipment changes for replay nor a decoder. It is fairly easy to design noise shapers that will provide the required dynamic range in a 96-kHz, 16-bit channel.
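The principle of noise shaping during word-length reduction can be sketched with the simplest possible error-feedback loop. This is an illustration only: it uses a plain first-order shaper (noise transfer function 1 - z^-1), not the psychoacoustically optimized shapers described in this article, and it omits dither for brevity. The function name, the 16-bit step size, and the 1-kHz test tone are assumptions made for the example.

```python
import math

def requantize_with_error_feedback(samples, step):
    """Reduce word length to multiples of `step`, feeding each
    quantization error back into the next sample (first-order shaping)."""
    out = []
    err = 0.0
    for x in samples:
        v = x - err                              # subtract the previous error
        y = step * math.floor(v / step + 0.5)    # round to the nearest step
        err = y - v                              # error made on this sample
        out.append(y)
    return out

step = 2.0 ** -15  # one LSB of a 16-bit channel with full scale of +/-1.0
signal = [0.5 * math.sin(2 * math.pi * 1000 * n / 96000) for n in range(960)]
shaped = requantize_with_error_feedback(signal, step)

# Each output error is the difference of two successive quantizer errors,
# so the errors sum to at most half a step however long the signal is.
total_error = sum(y - x for x, y in zip(signal, shaped))
print(f"accumulated error: {total_error / step:.3f} LSB")
```

Because the accumulated error stays bounded, the noise has almost no low-frequency content; it has been pushed toward the top of the band. A practical shaper uses a higher-order filter to place the noise where hearing is least sensitive.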

The use of pre- and de-emphasis to improve the perceived dynamic range of analog channels is quite familiar to audio engineers. The method has been used with particular success in systems where the analog noise level increases with frequency, such as with magnetic tape, shellac or vinyl records, and FM broadcast. In each case, a well-documented property of music and speech is exploited: When material of acoustic origin is microphoned at normal listening positions, the average and peak spectrum levels decline with frequency above a few kilohertz. It is therefore efficient to boost (pre-emphasize) high-frequency signals enough to make it more likely that they will fully occupy the channel's capacity. De-emphasis is applied on replay or reception, restoring correct frequency response while reducing both noise and distortion from the preceding chain.

Although pre-emphasis followed by de-emphasis was first implemented in analog systems, it can very usefully be applied to digital channels. When a link in the transmission chain requires a shorter word length (for example, when a 20-bit recording is transferred to CD), very real benefit can be obtained by performing pre-emphasis in the digital domain, quantizing with a noise shaper designed to exploit the pre-emphasis curve, and performing de-emphasis in the analog domain (or in the digital domain to a digital channel using a larger word size).

So far, all standards for digital audio have permitted the use of pre- and de-emphasis.

The universal characteristic is currently 50/15 microseconds, as shown in Fig. 1.

This pre-emphasis characteristic makes an increase in subjective dynamic range possible by boosting audio frequencies above 3 kHz in the transmission channel and attenuating them (and channel noise) on replay. It has not been overwhelmingly popular with the recording industry, mainly because some close-miked material does not offer in-band high-frequency headroom and because pre-emphasis brings on a mastering management issue, as its use has to be flagged. (Once again, a great potential squandered by poor practice.)
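The shape of the 50/15-microsecond characteristic follows directly from its two time constants. The sketch below assumes the usual analog prototype, a first-order shelf H(s) = (1 + s*tau1) / (1 + s*tau2) with tau1 = 50 microseconds and tau2 = 15 microseconds; digital implementations only approximate this curve near the top of the band. The function and constant names are illustrative.

```python
import math

TAU_BOOST = 50e-6  # seconds; sets the frequency where the boost begins
TAU_SHELF = 15e-6  # seconds; sets the frequency where the boost levels off

def preemphasis_gain_db(freq_hz, tau1=TAU_BOOST, tau2=TAU_SHELF):
    """Magnitude in dB of the analog shelf (1 + s*tau1) / (1 + s*tau2)."""
    w = 2 * math.pi * freq_hz
    mag = math.hypot(1.0, w * tau1) / math.hypot(1.0, w * tau2)
    return 20 * math.log10(mag)

corner_hz = 1 / (2 * math.pi * TAU_BOOST)  # roughly 3.18 kHz

for f in (100, 1000, 3183, 10000, 15000, 20000):
    print(f"{f:>6} Hz: {preemphasis_gain_db(f):5.2f} dB")
```

The lower corner falls near 3.18 kHz, which is where the boost above 3 kHz comes from, and the computed boost at 15 kHz is a little under 9 dB. Any signal peak at that frequency therefore loses the same amount of headroom in the channel.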

Fig. 3--Output noise spectrum and headroom for 16-, 20-, and 24-bit channels after application of the proposed pre- and de-emphasis and noise shaping. The graph expresses dynamic range in bits. The example illustrates a capacity of almost 23 bits at 4 kHz for a 16-bit channel, a perceptual gain of 7 bits.

The ARA committee has proposed a new pre- and de-emphasis scheme to the DVD Forum for material recorded at sampling rates higher than 88.2 kHz. It combines a very suitable pre-emphasis characteristic with a matched noise shaper (Fig. 1 again). Figure 2 shows the output noise spectrum after application of the proposed pre- and de-emphasis. This scheme has several attractions. It substantially improves high-audio-frequency headroom (reducing it by only 2 dB at 15 kHz, compared to 9 dB in the current standard). The pre-emphasis method involves a noise shaper that gives a 2.2-bit increase in overall dynamic range when used as a word-length-reduction device. In essence, channel capacity is moved from the extreme high-frequency range, where it is not required, to the mid-band, where it is incredibly useful. Because the noise shaper has the same shape as the pre-emphasis curve, the output (i.e., de-emphasized) noise spectrum is "white." And analog playback devices can match the de-emphasis curve very closely.

Figure 3 clarifies the way in which the suggested pre-emphasis combines with a noise shaper to provide increased dynamic range. The headroom curves at the top show de-emphasized response normalized for 16-, 20-, and 24-bit channels. The bottom curve represents the noise spectrum of the shaper used, after correction to allow for the 2.2-bit gain achieved by the pre-emphasis scheme. You can see from the figure how a 16-bit channel operating at a 96-kHz sampling rate can have an effective dynamic range of 23 bits in the critical 4-kHz region; note also that the channel is still offering 19-bit performance at 20 kHz. Table I summarizes the benefits noise shaping and pre-emphasis can offer.

Table I--Coding benefits of pre-emphasis, noise shaping, and a combination using both of the described schemes.

Fig. 4--Proposed lossless encoder and decoder for DVD-Audio, each consisting of the core algorithm followed in the encoder or preceded in the decoder by a smoothing buffer.


Any stream of digital data representing coded audio information is in principle compressible, for two basic reasons. First, the full capacity of a rectangular channel is not occupied continuously by audio that conveys meaning. This leaves room for simple techniques like noise shaping and pre-/de-emphasis to work successfully. Second, material of interest to human listeners contains some structure that can in part be predicted. It is therefore possible to design a coding and decoding scheme that reduces the quantity of data transmitted or stored.

Doubling the data rate from 48 to 96 kHz to convey any less than twice the information is inefficient. One elegant solution is to use lossless compression in the channel. There are many methods of implementing lossless compression; most are based on the use of prediction, which reduces the quantity of data to be conveyed.
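The role of prediction can be shown with a toy coder. The sketch below is not the scheme proposed for DVD-Audio (the references describe IIR prediction filters); it assumes a fixed second-order linear predictor on integer samples, transmits only the prediction residual, and lets the decoder run the same predictor to rebuild the input exactly. All names and the 1-kHz test tone are illustrative.

```python
import math

def encode(samples):
    """Return residuals after a fixed second-order linear prediction."""
    residuals = []
    p1 = p2 = 0  # the two previous samples
    for x in samples:
        predicted = 2 * p1 - p2  # extrapolate a straight line
        residuals.append(x - predicted)
        p2, p1 = p1, x
    return residuals

def decode(residuals):
    """Invert encode() exactly by running the same predictor."""
    out = []
    p1 = p2 = 0
    for r in residuals:
        x = r + 2 * p1 - p2
        out.append(x)
        p2, p1 = p1, x
    return out

# A 1-kHz tone sampled at 96 kHz, stored as integers with a peak of 10,000.
pcm = [round(10000 * math.sin(2 * math.pi * 1000 * n / 96000)) for n in range(960)]
residuals = encode(pcm)
restored = decode(residuals)

peak_signal = max(abs(x) for x in pcm)
peak_residual = max(abs(r) for r in residuals[2:])  # skip predictor start-up
print(peak_signal, peak_residual, restored == pcm)
```

Once the predictor has settled, the residuals are far smaller than the samples, so they need fewer bits per sample, yet the decoder recovers every sample exactly. The oversampled tone is highly predictable, which is also why packing becomes more effective at high sampling rates.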

An appropriate lossless compressor should: (1) return the original data intact, bit-for-bit; (2) be robust in dealing with errors in the channel; (3) be effective in reducing the data rate at high sampling rates (i.e., recognize ultrasonic content); and (4) control the peak data rate (a factor of importance in DVD replay).

The ARA has strongly recommended that high-quality audio channels be losslessly compressed (packed). Signal processing has advanced to such a level that the data-reduction benefits of this sort of coding are too good to pass by. Unlike perceptual coding or other forms of lossy data reduction, lossless compression does not alter the final, decoded signal in any way; it merely packs the audio data more efficiently into a lower data rate.

Existing lossless audio data-compression systems are optimized for reducing the average data rate but not for reducing the peak data rate or for obtaining good results at high sampling rates, such as 88.2 or 96 kHz. The process of packing PCM becomes more efficient as the sampling rate is increased. For example, packed 96-kHz audio does not double the data rate of packed 48-kHz audio, as you might expect; the increase is more like 30%.

Lossless compression can also allow the record producer to choose what trade-offs should be made between playing time, frequency range, number of active channels, and precision. The packed channel can convey this choice implicitly in its control data, and the system's operation will be transparent to the user. This arrangement has the following benefits: (1) A producer mastering at 48 kHz can control the incoming precision of each channel and trade playing time or channels for noise floor. (2) A producer mastering at 96 kHz can also trade bandwidth for playing time, active channels, and precision. Playing time or precision can be extended, for example, by pre-filtering information above 30 kHz or supplying only a two-, three-, or four-channel mix. (3) Lossless packing offers an opportunity to make a much better product, in that more precision and more channels can be provided.


It should be apparent that any lossless compression scheme will, of its nature, be more successful in some passages than in others, so that the compressor's output data rate will not be constant. Recently, however, a lossless compression scheme has been developed that is optimized for (but not exclusive to) DVD, in that it delivers a constant data rate in the packed domain. With this scheme, 16-bit, 96-kHz-sampled audio signals can almost always be losslessly compressed to 8 bits (and 16-bit, 48-kHz-sampled signals to 12 bits), with exact reconstruction of the original on playback.

The properties of the lossless coding scheme proposed for DVD audio are as follows: (1) output data filled out to a constant data rate to meet disc constraints; (2) output data rate typically lower than that of PCM input at 48 kHz; (3) output data rate significantly lower than that of PCM input at 96 kHz; (4) input word length continuously adjustable between 16 and 24 bits; (5) bandwidth continuously adjustable between 22 and 48 kHz, with efficient coding for these options; (6) good compression; (7) seamless transition from lossless to lossy operation (if necessary); (8) extremely simple decoder; and (9) auxiliary data stream exactly synchronized to the audio. This scheme, which uses a simple hardware or software decoder that takes instructions from the bit stream, enables great flexibility at the mastering stage. And the option remains open of substituting a more sophisticated encoder at some future time to achieve better compression.

Figure 4 shows the lossless encode/decode process. Low-frequency effects channels do not require special handling, as the encoder automatically makes bit-rate savings according to signal bandwidth. The encoder core produces a data rate that varies with the audio signal, being greatest during peaks of high treble energy. Because the peak data rate is a limiting factor in DVD, the complete encoder includes a buffer that smooths the peaks in the data rate. A corresponding buffer on the playback side allows peak data rates higher than the DVD can handle to be delivered to the decoder core. Table II shows the reductions in peak and average data rates achieved by the proposed lossless compression scheme. These levels of compression very comfortably exceed the tentative projections put forward by Gerzon in the original ARA proposal.
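The effect of a smoothing buffer can be modeled in a few lines. This is a simplified sketch rather than the buffer design used in the proposal: it assumes the encoder core emits a known number of bits per frame, that the channel accepts at most a fixed number of bits per frame, and that any excess waits in a first-in, first-out buffer. It does not pad the output to a constant rate or model decoder latency. The function name and frame sizes are hypothetical.

```python
def smooth_peak_rate(frame_bits, channel_bits_per_frame):
    """Pass variable-size frames through a FIFO that drains at a capped rate.

    Returns the bits sent in each frame period, the largest backlog the
    buffer had to hold, and the backlog remaining at the end.
    """
    occupancy = 0
    peak_occupancy = 0
    sent = []
    for produced in frame_bits:
        occupancy += produced                        # encoder core writes in
        out = min(occupancy, channel_bits_per_frame)
        occupancy -= out                             # channel drains the buffer
        sent.append(out)
        peak_occupancy = max(peak_occupancy, occupancy)
    return sent, peak_occupancy, occupancy

# Two loud, treble-heavy frames (30 units each) among quieter ones (10 units).
core_output = [10, 10, 30, 30, 10, 10]
sent, peak_backlog, leftover = smooth_peak_rate(core_output, channel_bits_per_frame=20)
print(sent, peak_backlog, leftover)
```

The channel never carries more than its cap in any frame period, and the peak backlog indicates how large the buffer must be. The playback side needs a matching buffer so that the decoder core can still receive the larger frames on time.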

In DVD applications, the peak data rate is the important parameter, whereas for hard-disc storage, the average rate would be the one to examine. At 44.1 or 48 kHz, the proposed scheme can almost always reduce the peak data rate by at least 4 bits/sample in lossless mode--i.e., 16-bit audio can be losslessly compressed so that it fits into a 12-bit channel. At 96 kHz, this scheme can reduce the peak data rate by 8 bits/sample in lossless mode, which means that 24-bit audio can be compressed to 16 bits and 16-bit, 96-kHz audio can be losslessly compressed so that it fits into an 8-bit channel.

These numbers indicate that lossless compression enables more audio channels to be transmitted on a given carrier. For example, the following intriguing arrangements are possible: (1) three channels of 16-bit, 44.1-kHz audio in a Red Book CD data stream (allowing, for instance, Ambisonic B format to be issued on CD); (2) two channels of 16-bit, 88.2-kHz audio in a Red Book CD data stream; (3) four channels of 24-bit, 96-kHz audio in a DVD stream of 6.144 megabits/second (currently only two channels fit); (4) 5.1 channels of 20-bit, 96-kHz audio in a DVD stream; and (5) eight channels of 20-bit, 48-kHz audio in a DVD audio stream.
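Several of these arrangements can be checked with simple arithmetic. The sketch below assumes the peak-rate savings quoted in the text (8 bits/sample at 88.2 or 96 kHz, 4 bits/sample at 44.1 or 48 kHz) and treats each stream as a fixed bit budget shared by identical channels. The helper names are illustrative, and overhead such as control data is ignored.

```python
DVD_STREAM_BPS = 6_144_000        # DVD audio stream quoted in the text
CD_STREAM_BPS = 2 * 16 * 44_100   # Red Book CD: two 16-bit channels at 44.1 kHz

def packed_bits(word_length, sample_rate_hz):
    """Peak bits/sample after packing, using the savings quoted in the text."""
    saving = 8 if sample_rate_hz >= 88_200 else 4
    return word_length - saving

def channels_that_fit(stream_bps, bits_per_sample, sample_rate_hz):
    """Whole channels of the given format that fit in the stream."""
    return stream_bps // (bits_per_sample * sample_rate_hz)

raw_24_96 = channels_that_fit(DVD_STREAM_BPS, 24, 96_000)
packed_24_96 = channels_that_fit(DVD_STREAM_BPS, packed_bits(24, 96_000), 96_000)
packed_20_96 = channels_that_fit(DVD_STREAM_BPS, packed_bits(20, 96_000), 96_000)
packed_20_48 = channels_that_fit(DVD_STREAM_BPS, packed_bits(20, 48_000), 48_000)
packed_16_88_cd = channels_that_fit(CD_STREAM_BPS, packed_bits(16, 88_200), 88_200)

print(raw_24_96, packed_24_96, packed_20_96, packed_20_48, packed_16_88_cd)
```

Under these assumptions the stream holds two raw 24-bit, 96-kHz channels but four packed ones, five full packed 20-bit, 96-kHz channels with capacity left over for a band-limited effects channel, and eight packed 20-bit, 48-kHz channels. A CD stream holds two packed 16-bit, 88.2-kHz channels.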


Lossy compression schemes attempt to evaluate the component of the microphone output that is irrelevant to a human listener (either because it falls below the hearing threshold or because it will be masked by adjacent content) and to convey the essence of the sound rather than the waveform. Now, perceptual coding is not a ridiculous idea: After all, that is exactly what happens in our hearing system, and such techniques have been shown to be capable of excellent performance.

However, no one in the ARA believes that any existing lossy coding system can be said absolutely to have passed the test of time with respect to complete sonic transparency. Furthermore, our current understanding of human psychoacoustics is such that it would take a very brave (or foolish) person to suggest we understand all we need to design a lossy compression coding scheme that meets all the ARA requirements.

This is not to say at all that lossy compression is wrong; it is very useful in appropriate contexts. However, if we are to be confident that we are conveying the original music event with complete transparency, then our current understanding can offer nothing beyond passing all the captured data, bit-accurate, to the playback system. That may well change in the future, but for now, I don't feel that lossy compression can be advocated for audio of the highest resolution.

Table II--Reduction in data rates, in bits per sample per channel, when the proposed lossless compression scheme is used.

Fig. 5--Block diagram of a delta-sigma A/D converter, showing both PCM and Bitstream outputs.

Table III--Data rates per channel, relative efficiencies, and jitter susceptibility of a number of coding options. Also shown are the number of whole channels that can fit into a DVD data stream of 6.144 megabits/second (Mb/S).


It has been suggested that a suitable distribution format might be the single-bit, perhaps 64-times-oversampled, data streams produced by many modern-day A/D converters (see Fig. 5) or even hybrid bit streams, such as eight-times oversampled 8-bit. The argument for the 1-bit scheme is that simple DACs complete the chain, allowing the stages of digital filtering in the A/D and D/A converters to be bypassed, and that the bitstream signal represents a superior archive. The data rate of such a channel is high (around 3.1 Mb/S), and even with lossless compression, a bitstream channel requires nearly three times the data required by a losslessly compressed PCM equivalent.

While it is not appropriate here to go too deeply into the arguments for and against bitstream coding, there are some very powerful negatives, beginning with the fact that we should be aiming substantially higher for the future than accepting a 1-bit, 64-times modulator. The best current-day converters use 4- or 8-bit modulators. Furthermore, most recordings are multibit and originate in a multibit DSP environment (for example, as a result of performing a mixing or editing function). So, if the capturing A/D converter uses a different modulator architecture (such as 4-bit, 128fs) or the recording is multibit as an original, it makes no sense to convert the signal to bitstream--especially as the conversion process is inherently lossy and nonlinear. I believe that it would be a very great mistake to try to standardize the archive format, particularly to anything of such questionable audio promise as 64fs, 1-bit code.

In addition, any attempt to introduce a low-bit distribution format would face a significant difficulty because the industry has no interfaces, DSP methods, or machinery that would permit the change to be effected gradually. In fact, the inherent simplicity of bitstream coding rapidly disappears when any subsequent operations on the data are required, such as in normal editing.

Bitstream coding might be appropriate to very simple two-channel systems, but its data-rate requirement becomes unacceptable when the needs of multichannel are taken into account. It is also difficult to guarantee perfect linearity when bitstream coders based on delta-sigma modulation are used. That is because, unlike a multilevel quantizer with dither, a two-level quantizer, even with dither, is not linear.

Linearity is improved by negative feedback, but performance cannot be guaranteed for all signals.
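For readers unfamiliar with the structure being discussed, the feedback loop of a 1-bit delta-sigma modulator can be sketched as follows. This is a textbook first-order, undithered loop chosen for brevity; real converters use higher-order loops, and the sketch is not meant to demonstrate the linearity concerns raised above. The function name and the test level are assumptions.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator with a two-level (+1/-1) quantizer.

    Input samples are expected to lie between -1.0 and +1.0.
    """
    integrator = 0.0
    feedback = 0.0  # the previous output bit, fed back negatively
    bits = []
    for x in samples:
        integrator += x - feedback
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits

# A constant input of 0.25, held for 10,000 oversampled periods.
bits = delta_sigma_1bit([0.25] * 10_000)
average = sum(bits) / len(bits)
print(f"average of the 1-bit stream: {average:.4f}")
```

Each output carries only one of two levels, yet negative feedback forces the running average of the stream to follow the input. The signal is recovered by low-pass filtering, which is why such a stream needs a very high sample rate to reach a useful resolution in the audio band.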


Thus far, we have reviewed a number of important features of the eight coding methods listed at the beginning of this article. Because we are interested here only in the highest-quality sound, I have set aside lossy compression schemes. All the remaining options on the list can be engineered to provide equivalent resolution. In the context of real applications, such as DVD, a crucial comparison is the quantity of data used by each method. Channel coding that requires more data to convey the same sound quality uses up bandwidth that could otherwise be used to convey more channels or higher-quality associated video information.

Table III offers a useful comparison. The base data rate is taken to be a 14-bit, 58-kHz channel with noise shaping, suggested last month in Part 1. If the sampling rates are limited to multiples of 48 kHz, then a simple rectangular PCM channel using 20 bits at 96 kHz (example 5 in the table) can meet the target performance.

When sampling is at 48 kHz, the perceptually equivalent 21-bit channel (example 2 in the table) uses 24% more data to convey less bandwidth than may be needed. Noise shaping with pre-emphasis (example 3) is close to 100% efficient, and its losslessly compressed version, at 106% efficiency, is very effective indeed, enabling eight nearly transparent channels to fit into a DVD audio stream.

Moving up to 96-kHz sampling guarantees adequate bandwidth. Examples 5 and 9 in Table III show that raw 20- and 24-bit channels use up to three times the base data rate and restrict a 6.144-Mb/S stream to two or three channels. Example 8 indicates how using the new pre-emphasis scheme alone increases efficiency, and when noise shaping is added (example 7), we see 60% efficiency, with four channels accommodated. The PCM options have medium jitter susceptibility.

When lossless compression is added (example 10 in the table), efficiency rises to 70% and five channels fit into the data stream. The highest efficiency in this group (88%) is achieved with 30-kHz band-limited lossless compression (example 11). The losslessly compressed examples exhibit low jitter sensitivity.

The bitstream options (examples 15 and 16) have the lowest efficiency of all, at 26%.

They are also highly susceptible to jitter and manage to fit only two channels into the data stream. With bitstream coding, it is very difficult to offer multichannel audio or high-quality associated video.
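Some of the efficiency figures quoted in this comparison can be reproduced from the base rate. The sketch below assumes that efficiency means the base data rate (14 bits at 58 kHz) divided by the data rate of the option under test, and that the 1-bit option runs at 64 times 48 kHz. That reading is an inference from the numbers in the text, not a definition taken from Table III, and the names are illustrative.

```python
BASE_BPS = 14 * 58_000      # base channel: 14 bits at 58 kHz = 812,000 bits/s
DVD_STREAM_BPS = 6_144_000  # DVD audio stream quoted in the text

def data_rate(bits_per_sample, sample_rate_hz):
    """Raw data rate of one channel in bits per second."""
    return bits_per_sample * sample_rate_hz

def efficiency(bits_per_sample, sample_rate_hz):
    """Assumed definition: base rate divided by the option's data rate."""
    return BASE_BPS / data_rate(bits_per_sample, sample_rate_hz)

options = {
    "21-bit, 48 kHz PCM": (21, 48_000),
    "20-bit, 96 kHz PCM": (20, 96_000),
    "24-bit, 96 kHz PCM": (24, 96_000),
    "1-bit, 64 x 48 kHz bitstream": (1, 64 * 48_000),
}

for name, (bits, rate) in options.items():
    eff = efficiency(bits, rate)
    whole_channels = DVD_STREAM_BPS // data_rate(bits, rate)
    print(f"{name:30s} efficiency {eff:5.1%}  channels {whole_channels}")
```

On this reading, the 21-bit, 48-kHz channel needs about 24% more data than the base channel, the raw 96-kHz channels need between two and three times the base rate, and the 1-bit stream comes out at about 26% efficiency with room for only two channels.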


In Part 1 of this article, I reviewed the issues surrounding the transmission of high-resolution digital audio. The conclusion was that audible transparency can be guaranteed by a channel equivalent to one that uses PCM coding at a 58-kHz sampling rate with either 14-bit representation and appropriate noise shaping or 20-bit representation and a flat noise floor (i.e., a "rectangular" channel). This implies that the current CD channel standard -- 44.1-kHz, 16-bit coding -- may not always be adequate (even with noise shaping to extend the resolution) and that raising the sampling rate to 48 kHz is still not quite enough.

Sampling at 88.2 or 96 kHz is too high, on the other hand, and therefore wasteful of storage capacity or transmission bandwidth. The use of sampling rates above 96 kHz to convey a wider audio bandwidth cannot currently be justified at all. But on the assumption that practical considerations of down-sampling will lead the industry to choose sampling rates for DVD-Audio that are integral multiples of 44.1 or 48 kHz (i.e., 88.2 and 96 kHz), we have looked in this installment at options for improving coding efficiency at these rates.

Noise shaping combined with a new pre-emphasis/de-emphasis characteristic for 96-kHz (or 88.2-kHz) applications can result in an effective increase of 2 to 7 bits' worth of dynamic range to the channel. At these sampling rates, in other words, a 16-bit channel should be sufficient. (Actually, a 14-bit channel can be made to yield a 21-bit dynamic range via these techniques, but the examples given are based on 16-bit channels because that is the smallest option in the current DVD standard.) This coding scheme compares very favorably with other methods of reducing the data rate, in that it offers a very low implementation cost, assures transparency, and is compatible with existing systems. I and the other members of the ARA strongly urge its adoption as a standard.

And though we consider bitstream coding and lossy compression schemes inappropriate for an audio system intended to deliver the highest possible resolution and transparency, we see considerable advantages to well-implemented lossless compression. Consequently, I have also presented here a lossless compression method that provides significant savings in peak data rate at both 48- and 96-kHz sampling rates.

The savings made in the high-rate channels are sufficient to allow more than five channels to be carried in a 6.144-Mb/S DVD audio stream or to leave room for video on a DVD-Audio carrier.


This article is based on papers I have presented to the Audio Engineering Society. Inevitably, it draws on the work of others. Many points emerged from discussions between members of the Technical Subcommittee of the Acoustic Renaissance for Audio (Tony Griffiths, Professor Malcolm Hawksford, David Meares, and me) and our advisors (Peter Craven, Michael Gerzon, Hiro Negishi, Francis Rumsey, and Chris Travis). The assistance of Takeo Yamomoto, Bike Suzuki, Malcolm Law, and Adrian Farmer has also been particularly valuable.

Those who wish to contact the Acoustic Renaissance for Audio (ARA) may write (c/o Meridian Audio, Stonehill, Stukeley Meadows, Huntingdon PE18 6ED, England), call (+44 1480 52144), send a fax (+44 1480 451587), send e-mail (, or visit the group's Web site (www.meridian



Acoustic Renaissance for Audio, "DVD: Application of Hierarchically Encoded Surround Sound-Including Ambisonics," private publication, November 1996; available for download at

Acoustic Renaissance for Audio, "High-Quality Audio Application of DVD (Draft 0.5)," private publication, November 1996; available for download at www.meridian

Acoustic Renaissance for Audio, "A Proposal for High-Quality Application of High-Density CD Carriers," private publication, April 1995; available for download at (reprinted in Japanese in the Journal of the Japan Audio Society, Vol. 35, October 1995).

Acoustic Renaissance for Audio, "DVD: Pre-Emphasis for Use at 96 kHz or 88.2 kHz," private publication, November 1996; available for download at www.meridian

Akune, M., R. M. Heddle, and K. Akagiri, "Super Bit Mapping: Psychoacoustically Optimized Digital Recording," 93rd Convention of the Audio Engineering Society (AES), San Francisco, 1992; AES Preprint No. 3371.

Craven, P. G. and M. A. Gerzon, "Compatible Improvement of 16-Bit Systems Using Subtractive Dither," 93rd Convention of the AES, San Francisco, 1992; AES Preprint No. 3356.

Craven, P. G. and M. A. Gerzon, "Lossless Coding for Audio Discs," Journal of the Audio Engineering Society (JAES), September 1996 (Vol. 44, No. 9; pp. 706-720).

Craven, P. G., M. J. Law, and J. R. Stuart, "Lossless Compression Using IIR Prediction Filters," JAES (Abstracts), March 1996 (Vol. 44, No. 3, pg. 404), and AES Preprint No. 4415.

Craven, P. G. and M. A. Gerzon, "Lossless Coding Method for Waveform Data," International Patent Application No. PCT/GB96/01164 (May 1996).

Dadson, R. S. and J. H. King, "A Determination of the Normal Threshold of Hearing and Its Relation to the Standardisation of Audiometers," Journal of Laryngology and Otology, Vol. 66 (1952; pp. 366-378).

Hawksford, M. O. J. and C. Dunn, "Is the AES/EBU/SPDIF Digital Audio Interface Flawed?", 93rd Convention of the AES, San Francisco, 1992; AES Preprint No. 3360.

Hawksford, M. O. J., "Bitstream vs. PCM Debate for High-Density Compact Disc," private publication, April 1995; available for download at

Gerzon, M. A. and P. G. Craven, "Optimal Noise Shaping and Dither of Digital Signals," 87th Convention of the AES, New York, 1989; AES Preprint No. 2822.

Gerzon, M. A., P. G. Craven, J. R. Stuart, and R. J. Wilson, "Psychoacoustic Noise Shaped Improvements in CD and Other Linear Digital Media," 94th Convention of the AES, Berlin, 1993; AES Preprint No. 3501.

Katz, B., "96-kHz Listening Test," thread on Internet news-group (July 1997).

Komamura, M., "Wideband and Wide Dynamic-Range Recording and Reproduction of Digital Audio," 96th Convention of the AES, Amsterdam, 1994; AES Preprint No. 3844.

Meridian Audio, Ltd., "Lossless Compression for DVD: Summary of Features," private publication, July 1997; available for download at

Meridian Audio, Ltd., "Lossless Compression for DVD: Technical Proposal," private publication, July 1997; available for download at

Ohashi, T., E. Nishina, N. Kawai, Y. Fuwamoto, and H. Imai, "High Frequency Sound Above the Audible Range Affects Brain Electrical Activity and Sound Perception," 91st Convention of the AES, New York, 1991; AES Preprint No. 3207.

Ohashi, T., E. Nishina, Y. Fuwamoto, and N. Kawai, "On the Mechanism of Hypersonic Effect," Proceedings of the International Computer Music Conference, Tokyo, 1993 (pp. 432-434).

Oomen, A. W. J., R. G. Groenwegen, R. G. van der Waal, and R. N. J. Veldhuis, "A Variable-Bit-Rate Buried-Data Channel for Compact Disc," JAES, January/February 1995 (Vol. 43, No. 1/2; pp. 23-28).

Robinson, D. W. and R. S. Dadson, in International Standards Organization ISO131-1959.

Robinson, D. W. and R. S. Dadson, "A Redetermination of the Equal-Loudness Relations for Pure Tones," British Journal of Applied Physics, May 1956 (Vol. 7; pp. 166-181).

Stuart, J. R. and R. J. Wilson, "A Search for Efficient Dither for DSP Applications," 92nd Convention of the AES, Vienna, 1992; AES Preprint No. 3334.

Stuart, J. R., "Noise: Methods for Estimating Detectability and Threshold," JAES, March 1994 (Vol. 42, No. 3; pp. 124-140).

Stuart, J. R. and R. J. Wilson, "Dynamic Range Enhancement Using Noise-Shaped Dither Applied to Signals With and Without Pre-Emphasis," 96th Convention of the AES, Amsterdam, 1994; AES Preprint No. 3871.

Stuart, J. R. and R. J. Wilson, "Dynamic Range Enhancement Using Noise-Shaped Dither at 44.1, 48 and 96 kHz," 100th Convention of the AES, Copenhagen, 1996.

Stuart, J. R., "Auditory Modeling Related to the Bit Budget," Proceedings of AES U.K. Conference: Managing the Bit Budget, 1994 (pp. 167-178).

Vanderkooy, J. and S. P. Lipshitz, "Digital Dither: Signal Processing with Resolution Far Below the Least Significant Bit," Proceedings of the AES 7th International Conference: Audio in Digital Times, Toronto, 1989 (pp. 87-96).



Adapted from 1998 Audio magazine article. Classic Audio and Audio Engineering magazine issues are available for free download at the Internet Archive (, aka The Wayback Machine)



Updated: Saturday, 2017-11-04 19:05 PST