Studio digital methods [Introducing Digital Audio]



Digital tape

Studio recording techniques now concentrate on digital tape methods rather than on analog, so reducing the problems of quality in tape mastering. The main problem, a very old one that has always dogged the fortunes of tape recording, is that the magnetizing of a tape is not a linear process. The graph of retained magnetism plotted against amplitude of magnetizing current is 'S' shaped, and at very low values of magnetizing current the tape is not magnetized at all. The traditional way of partially overcoming this has been to add ultrasonic bias to the recording head so that the audio signal is superimposed on the peaks of a high-frequency signal. By arranging for a suitable amplitude of bias, these peaks could be placed in a region of the tape characteristic that was reasonably linear. The linearity obtained in this way was just acceptable -- but only just.

The main advantage of digital recording for a tape medium is obvious. Since a signal consists only of 0 or 1 digits, there is no requirement for linearity, so removing the greatest problem of tape recording. Compared to this, all other considerations become minor ones, because all analog tape recording is a struggle to keep reasonable linearity while trying to record a good dynamic range of sound amplitude without overloading the tape. In the past, this has involved very elaborate companding circuits whose contribution to the fidelity of sound has been acceptable in some cases, dubious in others. Since digital reproduction requires no variation of amplitude of signals, the problems of tape overload and saturation also disappear.

Now that digital recording techniques are well established, it's difficult to remember just what immense problems tape always represented as an analog medium. While distortion figures of 0.1 % were commonplace for amplifiers, magnetic recording was striving to get below 1 % distortion, and even when this had been achieved, to stay there consistently was quite another matter. The situation was such that several of the smaller recording companies who could not make a colossal investment in modern techniques were reverting to direct disc cutting to avoid the problems of tape mastering and the temptation that all sound engineers have to 'doctor' the sound while it is on the tape.

Before we get too euphoric, though, it's as well to remember the difficulties that faced the pioneers of digital tape recording. The first (and main) problem is that a digital signal requires a much higher recording density than an analog one. If we stay with the CD standards of around 44 kHz sampling rate, recording 16 bits at each sample, plus an allowance for extra error-checking and correcting bits, then the number of 1 or 0 signals per second becomes very formidable, of the order of several million, so that the recorder has to handle frequencies of several megahertz. This takes us into video frequency rates, and digital audio would probably not have been developed at anything like the rate that was possible if video recording had not smoothed the path. Digital recording is, if anything, rather easier than video recording because the video signal is not digital. In order to record video signals we have to convert them to frequency-modulated signals of constant amplitude, making them rather like digital signals. Even allowing for the fact that the digital signal can be easily recorded, using, for example, one direction of magnetization for 0 and the opposite direction for 1, the rate of recording is still very high. The bandwidth of a digital recorder needs to be about 30 times as much as is needed for analog recording -- and in the days of analog tape mastering it was almost as much a struggle to achieve adequate bandwidth as it was to achieve acceptable linearity. This, however, is the only really serious problem, and all other considerations point to digital recording as a preferable method. There are, in addition, some useful bonuses, like the ability to interleave two or more stereo channels into one track, or to interleave audio (digital) signals with control signals.
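The arithmetic behind the bit rate can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, the CD figures used are 44.1 kHz, 16 bits and two channels, and the overhead factor of 2.0 is a purely hypothetical allowance for error-correction and synchronizing bits, not a figure taken from any recorder.

```python
def raw_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Audio data bits per second, before any error-correction or sync bits."""
    return sample_rate_hz * bits_per_sample * channels


def recorded_bit_rate(raw_rate: int, overhead_factor: float) -> float:
    """Bits per second actually laid on tape for an assumed overhead factor."""
    return raw_rate * overhead_factor


# CD-style stereo audio: 44.1 kHz sampling, 16 bits per sample, 2 channels.
cd_audio_rate = raw_bit_rate(44_100, 16, 2)

# Hypothetical overhead factor, chosen only to show the order of magnitude.
example_rate = recorded_bit_rate(cd_audio_rate, 2.0)

print(f"audio data alone: {cd_audio_rate / 1e6:.2f} Mbit/s")
print(f"with assumed overhead: {example_rate / 1e6:.2f} Mbit/s")
```

Even before any overhead is added, the audio data alone amounts to well over a million bits per second.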

There are quite a large number of methods of recording digital signals on tape (or other magnetic media) of which the NRZ (non-return to zero) system is the simplest. NRZ recording uses two directions of magnetization, referred to for convenience as positive and negative though there is no connection with voltage positive and negative, to represent digital signals with positive meaning 1 and negative meaning 0. The main problem of NRZ methods is that if the bits of a signal are unchanged for a long period the signal that is recorded will be the equivalent of a DC signal, and will suffer distortion, since magnetic heads used on replay will respond to the rate of change of magnetization on the tape, not the extent of magnetization.
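A minimal Python sketch of this NRZ scheme makes the difficulty visible. The function names are invented for illustration; the code simply maps each bit to a direction of magnetization and counts the reversals, which are all that the replay head responds to.

```python
from itertools import groupby


def nrz_levels(bits):
    """Map each bit to a direction of magnetization: +1 for a 1, -1 for a 0."""
    return [1 if bit else -1 for bit in bits]


def count_reversals(levels):
    """Count changes of direction between neighbouring bit cells."""
    return sum(1 for a, b in zip(levels, levels[1:]) if a != b)


def longest_run(bits):
    """Length of the longest string of identical bits."""
    return max(len(list(group)) for _, group in groupby(bits))


alternating = [1, 0, 1, 0, 1, 0, 1, 0]
constant = [1] * 8

print(count_reversals(nrz_levels(alternating)), longest_run(alternating))
print(count_reversals(nrz_levels(constant)), longest_run(constant))
```

A constant string of bits produces no reversals at all, so the replay head has nothing to respond to until the data changes again.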

NRZ is less of a problem when data has been coded as FM signals, so it is a method that is used for video recorders when they are working with digital signals that have been frequency modulated. For other purposes, NRZ is not favored, and it was replaced many years ago for digital signal purposes by more modern methods. There is a modification of NRZ which adds an extra clock bit, so that each data signal consists of a clock signal followed by a data signal, with the clock signal always in the opposite direction. This ensures that each data bit is represented by a change in magnetization, and so avoids the problems of long strings of identical bits. This type of code has been used for recording sound on the Video-8 system when the recorder is switched to sound-only; the bits are then frequency-modulated.

A much-used method for computing is known as MFM, modified frequency modulation. This is used extensively on computer magnetic discs, though it is steadily being replaced by other methods. Referring to Figure 5.1, each bit is taken as occupying a time called the bit cell, and changes between positive and negative directions of magnetization occur either in the middle of the cell or at the start.

Figure 5.1 Illustrating MFM recording in which each 1 bit causes a transition (0 to 1 or 1 to 0) in the middle of the bit cell and two adjacent 0's will cause a transition at the start of the bit cell.

Where there is a zero in a cell, there is no change in the direction of magnetization in the middle of the cell. Where there is a 1 in a cell, the direction of magnetization changes in the middle of the cell time. There is always a change of direction of magnetization at the start of any cell in which a second or subsequent zero occurs. Since the MFM coding is by change of direction rather than by absolute direction, there is no problem with long strings of 1's or 0's. On the other hand, the coding and decoding is more elaborate, and it is necessary to keep a precise clock rate in order to be able to distinguish a change of magnetization that occurs at the start of a cell from one that occurs in the middle. In addition, the use of MFM requires changes to occur at frequent intervals, as close as one bit cell apart and timed to within half a bit cell, and this limits the density with which the signals can be packed on to the magnetic tape or disc surface.
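The MFM rule can be sketched in Python as follows. This is an illustration rather than a description of any particular controller: the function name is invented, time is counted in half bit cells, and the bit that precedes the sequence is assumed to be a 1 unless stated otherwise.

```python
def mfm_transitions(bits, previous_bit=1):
    """Return the times, in half bit cells, at which the magnetization reverses.

    Cell i occupies half-cell times 2*i to 2*i + 2, so 2*i is the start of
    the cell and 2*i + 1 is its middle.
    """
    times = []
    for i, bit in enumerate(bits):
        if bit == 1:
            times.append(2 * i + 1)   # a 1: reversal in the middle of the cell
        elif previous_bit == 0:
            times.append(2 * i)       # a second or later 0: reversal at the start
        previous_bit = bit
    return times


data = [1, 0, 0, 0, 1, 1, 0, 1]
times = mfm_transitions(data)
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
print(times)
print(gaps)
```

For this example the spacing between reversals varies between two and four half-cells, which is why the replay clock has to resolve timing to half a bit cell.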

Other methods depend on code conversions, and of these the system of most interest is the EFM (eight-to-fourteen modulation) system as used for compact disc. Since the principles of so many other coding systems follow much the same pattern, only EFM will be explained here. The same principle, used for computer hard discs, is known as RLL (run-length limited) coding, and it allows tighter packing of data on to the disc area as compared with the older MFM. Using EFM, the data bits are dealt with not individually but in blocks of eight, corresponding to the byte unit used so much in computing. Each set of eight bits is then converted to a 14-bit pattern, using a conversion table (a ROM which will give the appropriate 14-bit output for an 8-bit input). A collection of 14 bits can represent 2^14 states of 1 and 0, a total of 16384 possible arrangements. Since 8 bits represent only 256 possible states, the conversion to 14 bits can be done in such a way that only patterns with at least two and at most ten 0s between successive 1s are used. This simultaneously deals with both requirements: no long sequences and no 1010 types of transitions. In run-length-limited notation this is a (2,10) code; the RLL(2,7) code used on computer hard discs applies the same idea with different limits. Once modulated, these 14-bit units can be recorded by ordinary NRZ methods to give tight packing and better error-avoidance. For the CD form of EFM, three extra bits are added to each 14-bit unit for synchronization and low-frequency suppression purposes.
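A short Python sketch shows why 14 bits are enough. It counts the 14-bit patterns that obey a run-length rule, assuming the limits usually quoted for the CD (at least two and at most ten 0s between successive 1s, with no run of 0s longer than ten anywhere in the word). The function names are invented, both limits are parameters, and the code says nothing about which pattern the real conversion table assigns to which byte.

```python
def satisfies_run_limits(word, width=14, min_zeros=2, max_zeros=10):
    """True if the word, written as `width` bits, obeys the run-length limits."""
    bits = format(word, f"0{width}b")
    # No run of 0s, including leading and trailing runs, may exceed max_zeros.
    if "0" * (max_zeros + 1) in bits:
        return False
    # Runs of 0s lying between two 1s must contain at least min_zeros 0s.
    interior_runs = bits.strip("0").split("1")[1:-1]
    return all(len(run) >= min_zeros for run in interior_runs)


def count_valid_words(width=14, min_zeros=2, max_zeros=10):
    """Count the words of the given width that obey the run-length limits."""
    return sum(
        satisfies_run_limits(word, width, min_zeros, max_zeros)
        for word in range(2 ** width)
    )


total_patterns = 2 ** 14
valid_patterns = count_valid_words()
print(total_patterns, valid_patterns)
```

Under these assumed limits the count comes to 267, a little more than the 256 patterns needed to represent every possible byte.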

The tape bandwidth problem

No matter what we do to code the digital signals, the problem of bandwidth can be solved in only one satisfactory way, which is to increase the rate at which the tape passes the recording head. This can be done by using a stationary head and moving the tape rapidly past the head, as used in early types of video and digital audio recorders. More useful systems use multitrack heads, the basis of the S-DAT system, or the more familiar video recorder technique of using rotating heads which cross the slow-moving tape at an angle. For studio recording, the use of fixed head machines is not such a disadvantage as it would be for domestic use (the requirement for very long tapes in large reels, for example), and fixed head machines were the first to be developed and used. One mitigating point about a fixed-head machine is that it readily allows more than one track to be recorded, since studios normally want at least 24 tracks. It also allows for easier cut-and-splice editing, though editing systems for other forms of recording are by now well established.

The critical quantities, as far as any tape recording system is concerned, are the minimum recorded wavelength and the maximum frequency. The minimum recorded wavelength is the minimum distance along the tape on which one complete wave can be recorded. This quantity depends on the gap in the recording head and will be in the range 1 to 5 µm (1 µm = one millionth of a meter, one thousandth of a millimeter, about 40 millionths of an inch). Suppose that the tape is moving over the head at 5 cm per second (taking a convenient figure rather than an actual one) and the minimum wavelength is 5 µm. With this figure of minimum wavelength, we can record 1,000,000/5 = 200,000 cycles in one meter of tape, and so get 200,000/20 = 10,000 cycles into 5 cm of tape. At a speed of 5 cm/s, therefore, we can record 10,000 cycles in a second, giving a bandwidth of 10 kHz.
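The relation used here is simply that the highest recordable frequency equals the tape speed divided by the minimum recorded wavelength. A small Python sketch, with invented function names and the convenient figures from the text, shows the calculation in both directions.

```python
def max_frequency_hz(tape_speed_um_per_s, min_wavelength_um):
    """Highest recordable frequency: cycles that fit on the tape passing per second."""
    return tape_speed_um_per_s / min_wavelength_um


def required_speed_m_per_s(frequency_hz, min_wavelength_um):
    """Tape-to-head speed needed to record a given frequency."""
    return frequency_hz * min_wavelength_um / 1_000_000


# 5 cm/s is 50,000 micrometres per second; minimum wavelength 5 micrometres.
bandwidth = max_frequency_hz(50_000, 5)

# The same relation turned round for a 4 MHz signal at the same wavelength.
speed_for_4_mhz = required_speed_m_per_s(4_000_000, 5)

print(bandwidth, speed_for_4_mhz)
```

The second figure shows why a fixed head is so awkward for signals in the megahertz range: the speed required rises in direct proportion to the frequency.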

For recording frequencies in the region of 4 MHz, then, with the minimum recorded wavelength of 5 µm, we are going to need tape speeds of around 20 meters per second. Even for studio use this is unacceptable, and studio digital tape recorders in the past have used simpler digital methods that allowed the bandwidth to be reduced to less than 1 MHz. By further 'trickery', such as splitting data between different channels on the tape, tape speeds as low as 38 cm/s could be used in fixed-head recorders. The use of rotating head recorders, though it brings problems of switching and synchronization, greatly relieves the problem of tape speed, and allows tape speeds comparable to the speeds of domestic tape cassettes to be used. In addition, it allows PCM systems, comparable to the CD system, to be used rather than other digital modulation systems. The acknowledged pioneer of rotating head recorders with PCM is Nippon Columbia of Tokyo, which in 1972 was able to demonstrate a PCM audio system operating on a professional video tape machine (domestic video recorders did not exist at that time). Since that time, many other forms of digital audio studio recorders have been developed -- and so also has the CD system. As you might expect, this has raised problems of compatibility.

One of the reasons for the compatibility problem has been that both video recording and digital audio recording have been developing very fast and in parallel with each other. Quite apart from digital methods, there has always been a lack of uniform standards in regard to recording throughout the world. As video recording developed, incompatible standards developed there also -- in the domestic VCR field we have seen the example of VHS, Betamax and the various Philips standards competing. It is hardly surprising, then, to find that there are at least five major incompatible digital recording systems in use at the present time for studio recording on reel-to-reel tape.

The enormous success of the collaboration of Philips with Sony in developing a single world-wide standard for compact disc has had some effect, however, and manufacturers are moving towards studio systems that have at least some compatibility with CD. Fixed head machines are not so common now, though they still have an advantage in allowing easy splicing of the razor-blade and block variety. This can be an important consideration for a small studio. In addition, a fixed head machine can employ another head for monitoring the recorded signal, something that is not generally possible with the rotating head type. The Sony PCM 1620 and 1630 machines were among the first rotary head machines used for studio recording, and are still widely used, though rotary head machines are now available from a number of other suppliers. Since the modern R-DAT (rotary head digital audio tape) cassette systems also use rotating heads and work with digital signals that are compatible with CD standards, we can expect to find that studio recorders will fall into line. The R-DAT cassettes have been available for some time, but their sales have been hindered by objections from the recording studios, who feel that a recording system that could make perfect tape copies from CDs would undermine the market for CDs. The answer, as the computer software business has shown, may be to sell CDs at a price that is competitive with the price of a blank R-DAT tape.

The problem is probably not so great as it might seem. The urge to transfer LPs to tape was for a large part motivated by the rapid deterioration of the LP as it was played several times over. Since a CD has virtually an unlimited life, there is little point in taping it except for use in a car fitted with conventional cassette equipment. One wonders if the objections would have been so strong if R-DAT had been developed in Europe and the major CD suppliers had been in Japan. As it happens, events have thrown a cloud of doubt over the whole future of R-DAT; we shall return to this in Section 7.

Rotary head techniques

We have made a lot of mention of rotating head recorders in this section, but to the reader who has taken no interest in video cassette techniques, rotating head recorders are as unusual as digital recording itself. The following section is an introduction to the ideas of rotary head tape recording for the benefit of the reader whose experience is almost entirely based on conventional analog audio equipment. If you have experience with video recording then most of what follows will be familiar, though the use of rotary head recording with digital audio signals is not identical in pattern to that used for videocassettes. If you need a more comprehensive treatment of rotating-head video recording then you will find a book by Eugene Trundle, 'Television and Video Engineers Pocket Book', very helpful (see Appendix 1). To start with, the reason for adopting rotary head recording for videocassettes was that a manageable cassette can contain only a limited length of tape, and with fixed head recording such a quantity of tape could provide only a few minutes of playing. To put it into perspective, the record/play speed for a fixed head recorder would need to be of the same order of speed as the fast wind of a modern videocassette recorder. The solution that has been adopted for videocassette recording, and also on some other videotape machines, is to use revolving tapeheads whose speed relative to the tape can be very high, allowing the tape itself to be moved at a comparatively slow pace, about 1.873 cm/s (0.737 inches/s) on the Betamax type of recorder.

The usual technique is to have two tapeheads revolving fast enough to give a speed between head and tape of between 5 and 7 m/s depending on the type of system; the higher speed is used by the Betamax system. The path of the tape is not in the plane of rotation of the heads, but at a small angle of about 5°, so that each head crosses the tape and lays down a diagonal track, shown in Figure 5.2 with the angle considerably exaggerated to make the tracks more obvious. The angle around which the tape is wrapped is about 186 degrees.

Figure 5.2 The principle of rotating-head video recording. The tape is wrapped around a guide so that a drum containing the two heads can pass across the tape at a small angle. This lays down a pattern of diagonal tracks separated by an unrecorded 'guard band'. The overlap between the heads is not used, because only synchronizing signals are transmitted in this time, so that the edges of the tape can be used for audio and for control signals.

The movement of the tape itself is such as to arrange for a small gap, the guard band, to exist between successive video tracks. Each guard band therefore separates tracks that have been recorded by alternate heads. The edges of the tape pass over stationary heads, one of which records the audio signals on one edge of the tape, with the other recording synchronizing and control signals on the other edge. In the older design of videocassette recorder, the sound quality is very poor because the audio signals are being recorded with a stationary head and a tape speed that is very low even by audio cassette standards. More modern units use an additional rotating audio head, following the same methods that have evolved into the R-DAT system. The video signals are separated into sections and frequency-modulated on to a carrier, with the color (chroma) signals at a lower frequency band than the luminance (brightness) signals. The use of frequency modulation minimizes the need for linear tape response and also allows for speed jitter by recording signals which, when retrieved, can be used to create a local clock signal. In addition, the two recording/replay heads are placed at different azimuth angles to reduce interference between adjacent tracks, and by storing and comparing signals, the effects of crosstalk can be minimized.

At first sight, you might think that this well-tried system could be adopted in a perfectly straightforward way to digital audio recording, because the digital audio signals are not quite so demanding. The problem is that audio signals are continuous, whereas video signals are not. A video signal consists of a set of waveforms which repeat at 20 ms intervals, the field time. Each field of a video signal corresponds to a set of lines being drawn on the screen of the receiver, and two fields make up a complete picture. The fields are interlaced, meaning that the odd-numbered lines of a picture are in one field and the even-numbered lines in the next field. The important point here is that there is a time gap between fields, large enough for the receiver cathode ray tube to move the scanning spot back to the top left hand of the picture. This time is the field synchronization period, and takes the time of twenty lines in each field.

This field synchronization period contains only synchronizing pulses, which carry timing information and no picture signal. If the rotation of the heads is suitably governed, then, and the amount of tape wrap is correct, the crossover from one head to another can be arranged to take place during the field synchronizing interval rather than at a time when picture information is being sent. One tape track from one head, in other words, records or contains the picture signal data of one complete field.

The field synchronization pulses occur at regular intervals and can be replaced by clock pulses from an oscillator.

This discontinuity of video signals makes the unavoidable problem of the changeover from one head to another rather insignificant when a TV signal is being recorded. There is, however, no such natural break in an audio signal, analog or digital, and if we are going to use a two-head system then we have to create a break. This is the basis of all systems that use standard videocassette recorders for digital audio, and also the R-DAT digital audio cassette system.

Creating a break is done by time-compression of the digital pulses that make up a 'frame' of signal. This signal 'frame' is entirely artificial, simply the number of digital signals that will fit into the time that a head requires to scan its path across a piece of tape. By time compression, I mean that the digital pulses for one frame are sent to the recording head (or received from it) in a time that is shorter than their natural period.

This allows for a gap in the interval when the scanning of the tape is handed over from one head to the other, after which another frame of signals can be sent or received.

Figure 5.3 Data compression used to make artificial gaps in data for use with rotating head audio recorders. The signals are placed into memory with one clock speed, and read with a higher clock speed, with a gate used to interrupt the reading clock at intervals. The illustration shows pulses grouped in twelves (each pulse represents a group of sound signals), but larger numbers are used in practice. At the receiver, the reverse process restores the original clock rate and the original pulse rate.

This time compression is achieved by the use of computer-style memory. The signals in digital form taken from the processing circuits are loaded into memory continuously, but the memory is read only at intervals that are separated by a gap, Figure 5.3. This needs more than a simple serial shift register to accomplish, because the bits of memory are read faster than they are written (on the recording side), so that the memory needs to accommodate a complete frame of signals plus the signals that arrive in the time between frames. The solution is the use of random-access memory (RAM) as employed in computers -- and if this is not a familiar system we shall have to explain it further here. Unlike memory based on shift registers, RAM uses a set of flip-flops in which any one of the set can be selected without the need to select any other in a sequence; we can, in other words, select at random, hence the name. In such a memory, selection is done by using a binary number, called an address, to locate each unit of memory, so that each unit has its own unique address number.

On the type of RAM used for computers, reading and writing never take place at the same time, so the memory can be switched to either function so as to allow one set of lines to carry data either to the memory or from it. For digital audio time compression we can use memory ICs that have separate input and output terminals. The time needed to write or read a bit in one of these memories is very short, of the order of 150 ns overall, so that it is possible to interleave reading and writing actions. There is no need to clear the memory at any time, because the action of writing a bit replaces whatever was in that unit of memory previously.
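The effect of this memory on the stream of signals can be modeled very simply in Python. The sketch below is only an illustration of the principle, not of any real circuit: the function names are invented, a list stands in for the RAM, `None` marks an idle tick of the fast reading clock, and the frame of twelve words with a gap of four ticks is a hypothetical choice loosely following the grouping shown in Figure 5.3.

```python
from fractions import Fraction


def time_compress(words, frame_len, gap):
    """Emit the words in bursts of frame_len, each followed by `gap` idle ticks.

    The returned timeline is counted in ticks of the faster reading clock.
    """
    timeline = []
    for start in range(0, len(words), frame_len):
        timeline.extend(words[start:start + frame_len])
        timeline.extend([None] * gap)   # head changeover: nothing is sent
    return timeline


def time_expand(timeline):
    """Replay side: discard the idle ticks to restore the steady stream."""
    return [word for word in timeline if word is not None]


def clock_ratio(frame_len, gap):
    """Reading clock divided by writing clock for the rates to balance."""
    return Fraction(frame_len + gap, frame_len)


samples = list(range(24))            # stand-ins for groups of sound signals
compressed = time_compress(samples, frame_len=12, gap=4)
restored = time_expand(compressed)
print(len(compressed), clock_ratio(12, 4), restored == samples)
```

With these figures the reading clock has to run at four-thirds of the writing clock, and the replay side recovers the original sequence unchanged.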

The use of memory in this way therefore creates a signal which can also have 'horizontal' synchronizing pulses (at more frequent intervals) added to it. By now this is to all intents and purposes a video signal which can be recorded on a video recorder of the rotating head type. The standard video recorder modulation system of NRZ (non-return to zero, meaning that the 0's and 1's are recorded unchanged) can be used along with the circuitry that is normal to every videocassette recorder. The recorded signal can be recovered and the processing reversed so that the digital signal is taken at a steady rate from memory while it is intermittently loaded from the tape circuits into the same memory.


Figure 5.4 Reading problems. If a waveform contains too many grouped 1's in particular, it can cause trouble because of the inevitable integration (a). This is particularly difficult on the NRZ type of signal. If the recording method uses a 1 to signal each transition from 0 to 1 or 1 to 0 (b), then the problem is greatly reduced.

The recording system for the videocassette recorder is of the NRZ type that was described earlier. This system is not well suited to a signal that consists of a long stream of the same bits. It is difficult to separate out exactly how many bits are in the stream, since such a stream would be recovered from the tape as a long pulse with sloping sides (Figure 5.4). The use of bits in this form, usually with one direction of magnetization to represent 1 and the opposite direction to represent 0, is the NRZ, non-return to zero, system. It is not, as we have commented, well suited to audio coded signals, but the conversion of an audio digital signal into a frequency-modulated form of video signal allows this type of modulation to be used.

This avoids the use of the rather less simple methods that are employed for computer disc recording, and makes the recording of digital audio identical with the recording of video. As we shall see later, though, if we do not use a pseudo-video type of signal we still have to tackle the problem of having sequences of too many 1's or 0’s, and this is done by altering the way that the numbers are carried in digital code, departing from the simple binary code system.

Consoles and editing

Studios that handle audio signals in digital form need to be able to carry out all the normal studio actions of fading, mixing and cutting on the digital signals, since it would be completely counter-productive to have to convert back to analog form for these actions. The splicing of tape is one action which greatly favors the use of stationary head recorders, and though machines exist to splice tape from rotary head equipment, the greater simplicity of fixed head methods can be attractive.

No such simplicity exists for fading and mixing, and a completely digital studio console is a very formidable piece of equipment. Whereas the reduction of signal amplitude for an analog signal can be carried out by using a potentiometer, on a digital signal it has to be done by the equivalent of subtracting from each sampled digital number. The amount that is subtracted cannot be a constant, otherwise the signal will be distorted, so the amount that is subtracted will depend on the amplitude of the signal -- in other words it is a constant fraction of the signal amplitude at any point. This makes the requirement clearer -- what we are doing is to multiply each digital number by a fraction which is equal to the potentiometer division ratio that would be used for an analog signal. Now multiplication by a fraction is, in computing terms, a slow process, mainly because it involves a large number of steps. The only way round this is to carry out the multiplication by using hardware: circuitry rather than operations on numbers. Small studios tend therefore to do all this type of action before the signal is converted to digital form.
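The idea of a fader as a multiplication can be sketched in Python. This is a minimal illustration under stated assumptions, not a description of how any console does it: samples are taken to be 16-bit signed integers, the fader setting is held as a whole number out of 32768 (a common fixed-point convention), and the function name is invented.

```python
def apply_gain(samples, gain_q15):
    """Scale 16-bit samples by gain_q15 / 32768, where 0 <= gain_q15 <= 32768.

    Each sample is multiplied by the same fraction, which is the digital
    equivalent of a potentiometer division ratio.
    """
    scaled_samples = []
    for sample in samples:
        scaled = (sample * gain_q15) >> 15          # divide by 32768, rounding down
        scaled_samples.append(max(-32768, min(32767, scaled)))  # stay within 16 bits
    return scaled_samples


half_level = apply_gain([1000, -1000, 32767], 16384)   # fader at one half
unchanged = apply_gain([1000, -1000, 32767], 32768)    # fader fully up
print(half_level, unchanged)
```

Every sample needs its own multiplication, and at tens of thousands of samples per second per channel the workload mounts up quickly, which is why the text points to dedicated hardware.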

Nevertheless, the completely digital studio can be created and several are in use. In the UK, the internationally respected audio engineers Neve produce all-digital consoles that are designed to be used in the production of CDs, and it cannot be long before totally digital equipment is almost universal in studios, particularly since the steadily decreasing prices of devices like memory chips have worked their way through the manufacturing system. The period in which the price of a computer for a given task fell from about £15,000 to nearer £500 has been the development period for digital sound, and we can now start to reap the benefits of the costly pioneering work.

Owners of Sony Video-8 recorders can make use of the 8-mm video tapes for recording and replaying of pulse-code modulated audio signals. This can be done either along with video signals or, in another mode, as a sound-only recorder. In this latter mode, a playing time of as much as 18 hours is possible. The methods that are used are of interest in showing how digital techniques have improved in the few years since the first compact discs appeared.

The A-D converter in the Video-8 machines is of 10-bit resolution, giving a quantization to 1024 levels. When the sound is being recorded along with video signals, this 10-bit signal is reduced to 8 bits and the signals are stored into memory to be released at intervals, so that the signals can be recorded at about seven times their normal rate, corresponding to about 2 Mbits per second. This audio track is recorded on the first part of each head sweep, so that the signals have to be switched at the head amplifiers. When the recorder is being used for sound only, the whole of the tape can be used for the compressed audio signals.

The signals contain parity bits and are, like the signals of CD, scattered in such a way as to ensure that a dropout will affect only a small part of any signal. The analog stages also contain a compressor for recording and an expander for replay, so that the signal range from a system of only 8 bits corresponds to that from an un-companded 13-bit system.

Tail piece

The following description is of a scheme for an older digital recording system which used a normal videocassette machine (US standards). This gives some idea of the signal coding methods that were used, and another point of interest is that this scheme was considered at one time for domestic audio recording. As Section 7 notes, however, the final DAT system bears little resemblance to this.

The system allows for a choice of sampling rates to be used, with the maximum frequency that can be recorded depending on the sampling rate that is used. The audio signals are passed through a low-pass filter whose range is DC to 20 kHz, ±2 dB, so that higher frequencies do not interfere with the sampling frequency. The block diagram is shown in Figure 5.5, and you can see that the low-pass filter is followed by the sample-and-hold circuit. The output from this is converted, with a time of about 5 µs available.

A sampling rate of 44.056 kHz is used.

The resulting digital signal is then changed into the form of an NTSC (the US color TV standard) TV signal. This allows any ordinary videocassette recorder to be used for recording and replay through the video input/output socket. To start with, an eight-to-fourteen modulation is used on the signal. This type of modulation, which gives a 14-bit signal for each eight bits of 8-4-2-1 number, is also used for CD, as noted in the following section.

The signal is then arranged into 'lines' and 'fields' so as to mimic the TV signal exactly. Each line takes a time equivalent to 128 bits of signal, which when the synchronizing pulse is added takes a time of 168 bits, as Figure 5.6 shows. These lines are assembled into fields of 245 lines each, which with sync pulses take the time of 262.5 lines. Figure 5.7 shows the difference between 'even' and 'odd' fields, which matches the interlace specification of a TV signal. In each frame, a data disposition signal is added, consisting of one line that contains a leveling signal of 1100 (to set data level controls), a contents identification signal, a 16-bit cyclic redundancy checking code error detecting signal, and address and control signals (intended to prevent digital dubbing). The signal levels of 0.4 V for 1 and 0.1 V for 0 are used, corresponding to TV pedestal level and black level respectively. A peak white signal is transmitted during flyback so as to correct the action of AGC circuits.
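These line and field figures tie in neatly with the 44.056 kHz sampling rate, as a short Python calculation suggests. The sketch rests on two assumptions that are not stated in the text: that the NTSC color line rate is 4.5 MHz divided by 286, and that each data line carries three samples for each audio channel. The constant names are invented for illustration.

```python
from fractions import Fraction

LINE_RATE_HZ = Fraction(4_500_000, 286)   # NTSC color line rate (assumed)
LINES_PER_FIELD = Fraction(525, 2)        # 262.5 line times per field
DATA_LINES_PER_FIELD = 245                # lines carrying audio data
BIT_TIMES_PER_LINE = 168                  # 128 data bits plus sync time
SAMPLES_PER_CHANNEL_PER_LINE = 3          # assumption, not stated in the text

field_rate_hz = LINE_RATE_HZ / LINES_PER_FIELD
sample_rate_hz = (
    field_rate_hz * DATA_LINES_PER_FIELD * SAMPLES_PER_CHANNEL_PER_LINE
)
bit_clock_hz = LINE_RATE_HZ * BIT_TIMES_PER_LINE

print(f"field rate  {float(field_rate_hz):.2f} Hz")
print(f"sample rate {float(sample_rate_hz):.1f} Hz")
print(f"bit clock   {float(bit_clock_hz) / 1e6:.3f} MHz")
```

Under these assumptions the sample rate works out to within a fraction of a hertz of 44.056 kHz, and the bit clock comes to a little over 2.6 MHz, comfortably inside the bandwidth of a videocassette recorder.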


Figure 5.5 The block diagram for a digital tape recording system using pulse code modulation. The replay steps are the reverse of the recording steps.

Figure 5.6 The line structure of an audio data signal created so as to simulate a color TV waveform and allow a videocassette recorder to be used. (Figure labels: Data, CRCC.)

Figure 5.7 The field structure of a pseudo-video type of digital signal. This is arranged with the same type of differences between odd and even numbered fields as is used in TV signals. (Panels: even numbered field, odd numbered field. All numbers refer to line equivalents of 168 bit times.)



Updated: Sunday, 2024-03-03 21:23 PST