Style and Sound [Television's Style: Image and Sound]


Up to this point our discussion of television style has dealt primarily with visual elements: mise-en-scene, the camera, and editing. But television is not solely a visual medium. Sound has always been a crucial component of television's style.

This is not surprising when one remembers that, in economic and technological terms, television's predecessor and closest relation is radio-not film, literature, or the theater. Economically, television networks replicated and often grew out of radio networks. Technologically, TV broadcasting has always relied on much of the same equipment as radio broadcasting (microphones, transmitters, and so on). With these close economic and technological ties to radio-a sound-only medium-it is almost inevitable that television's aesthetics would rely heavily on sound. The experience of watching television is equally an experience of listening to television.

Sound's importance to the medium becomes obvious if one performs a simple experiment. Turn the sound off and watch 15 minutes of a program. Then "watch" the next 15 minutes with the sound on, but do not look at the picture. Which 15-minute segment made more sense? Which communicated the most narrative (or other) information? Which had the greatest impact? Typically, sound without image is more self-sufficient than image without sound. Sound affects the viewer and conveys televisual meaning just as much as, and possibly more than, the image does. Indeed, so little is communicated in the visuals of some genres-talk shows, game shows, soap operas-that they would cease to exist without sound.

In approaching television sound, we need to understand:

• The different types of televisual sound.

• The functions that sound serves on television.

• Sound's basic acoustic properties and how they are rendered through televisual sound technology.

• The significance of sound to television's structuring of space, time, and narrative.


The types of sound that are heard on television can be divided into three main categories:

1. Speech

2. Music

3. Sound effects

In television's more expensive productions, each of these components is given to a separate sound technician to create and shape. That is, one person does speech, one does music, and one does the rest. Each sound category is separately edited on different tracks of a sound editor, or digital audio workstation (DAW). DAWs are the audio-only version of digital video editing and are configured much like the Media 100 editor discussed in section 7. Digital sound editors typically divide the audio into numerous discrete tracks-marked A1 through A8 in Figs. 7.3 and 7.4. In the Media 100, each sound element is labeled and the line beneath the label shows the volume (or level) of that element. A single sound source can be placed in each of these tracks and thereby is separated from the other sounds. Typically, certain tracks would only contain speech, while others would be limited to music and sound effects. This allows the sound editor to manipulate individual sound components before combining them into the finished, composite soundtrack. The use of multi-track technology and the assignment of labor to the sound technicians indicate how the industry categorizes sound. This will serve as our starting point.
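The idea of keeping speech, music, and effects on separate tracks and combining them only at the end can be sketched in a few lines of Python. This is a rough illustration under simplifying assumptions, not a description of how the Media 100 or any particular DAW works: the function name `mix_tracks`, the track labels, and the sample values are all hypothetical, and each sound is reduced to a short list of numbers between -1.0 and 1.0.

```python
def mix_tracks(tracks, levels):
    """Combine separate tracks into one composite soundtrack.

    tracks: dict mapping a track label to a list of samples (-1.0 to 1.0)
    levels: dict mapping a track label to a volume multiplier (its level)
    """
    length = max(len(samples) for samples in tracks.values())
    mix = [0.0] * length
    for label, samples in tracks.items():
        level = levels.get(label, 1.0)  # untouched tracks keep full level
        for i, sample in enumerate(samples):
            mix[i] += level * sample
    # Clamp the sum so the composite stays within the legal range.
    return [max(-1.0, min(1.0, value)) for value in mix]


# Hypothetical three-track session: each element can be adjusted on its own
# before the tracks are summed.
tracks = {
    "A1 speech": [0.5, 0.5, 0.0, 0.0],
    "A2 music": [0.2, 0.2, 0.2, 0.2],
    "A3 effects": [0.0, 0.0, 0.8, 0.0],
}
levels = {"A1 speech": 1.0, "A2 music": 0.5, "A3 effects": 1.0}
composite = mix_tracks(tracks, levels)
```

Lowering only the music level here leaves the speech and effects untouched, which is the practical point of keeping the three categories on separate tracks.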

Speech. Without doubt, talk is the most conspicuous aspect of television sound. Soap operas thrive on it, and talk shows are defined by it. Even sports programs, which one would think would provide enough visual interest to get along without commentary, rely heavily on discussion of the game. Once, during the 1980s, a network experimented with broadcasting a football game without announcers, providing only the sights and sounds of the game and on-screen statistics. Sports fans were not comfortable with this, and it hasn't been tried since. Apparently, television visuals are lost without speech. Sometimes it appears as if the images were superfluous, as if TV were, as one critic put it, a "lava lamp" with sound.

Speech in narrative television most commonly takes the form of dialogue among characters. Dialogue does not typically address viewers. It is as if they were eavesdropping on a conversation. In some comic situations, however, a character (e.g., George Burns, Dobie Gillis, Malcolm) will break this convention of the "fourth wall" and speak directly to the camera. Additionally, narration or voice-over, in which a character's or omniscient narrator's voice is heard over an image, can sometimes speak directly to the viewer, as when the adult Kevin Arnold talks to the viewer about his younger self in The Wonder Years (1988-93). (Note the difference between "narration," which refers to a voice speaking over an image, and "narrative," which we use more generally to refer to some sort of story or fiction.)

Speech in non-narrative television, in contrast, is often directly addressed to the viewer (see sections 4 and 12). News anchors look at and speak toward the viewer. David Letterman directs his monologue right at the camera. The announcers in advertisements cajole viewers directly, imploring them to try their products. Other programs are more ambiguous in the way they address the viewer. Game shows pose questions to the social actors on screen, but these questions are also meant for viewers so that they can play along. Needless to say, the way that speech is addressed can be quite complicated, and even contradictory.

In terms of standard production practice, speech is most often recorded live on the set, during the "production" phase, rather than during pre-production or post-production (see section 7). This means that speech is usually recorded at the same time that the image is, but not always. Post-production sound work can modify the dialogue or, indeed, can even add to it or replace it altogether-as occurs when sound is dubbed or dialogue is changed using Automatic Dialog Replacement (ADR, also known as looping). In dubbing and ADR, one voice is substituted for another, as is illustrated in the backstage film Singin' in the Rain (1952), where one woman's voice is dubbed in for another's in the movie within that movie. ADR is conventionally used in several instances in television. First, when an actor's reading of a line is not considered satisfactory, it may be replaced with an alternative reading by that same actor. Second, if an actor's voice is not considered appropriate to the character it may be replaced by a different actor's. For example, when Andie MacDowell played Jane, a British character, in Greystoke: The Legend of Tarzan, Lord of the Apes (1984), the producers felt that her natural Southern accent did not suit the role. Consequently, Glenn Close's voice was dubbed in for all of MacDowell's dialogue. Third, dubbing is used in puppetry (e.g., Alf) and animation, as when Nancy Cartwright's voice is used for Bart Simpson's. Fourth and finally, in the rare instances in which foreign-language films or television programs are shown on U.S. television they are frequently dubbed into English, although they can be subtitled instead of dubbed. (In subtitling, the English translation is printed on the bottom of the screen and the original dialogue is retained.)

Music. Music and speech go hand-in-hand on television. Customarily, dialogue will be accompanied by music throughout narrative programs. Indeed, it is the rare line of dialogue that has no music beneath it. And portions of a program-say, a car chase-that have no dialogue will almost always increase the presence of the music. Television is seldom devoid of both music and speech. It is not a quiet medium.

Television music comes in many different genres-from the rock soundtrack of That '70s Show (1998-), to the country tunes of The Dukes of Hazzard (1979-85), to the rap music of In Living Color (1990-94). Television absorbs a fairly broad spectrum of popular music, although it seldom presents avant-garde performances, and classical music appears infrequently (and is relegated to its own "highbrow" ghetto on PBS). Until relatively recently narrative programs did not use much music by well-known popular performers. If a scene required rock music, then studio musicians were used to create the rock sound, rather than using a well-known performer's work. When WKRP in Cincinnati premiered in 1978, it was thought to be groundbreaking because it featured music by original performers rather than sound-alikes.

Television's reluctance to use popular music is partially an economic decision and partially an aesthetic one. As far as economics goes, if one uses a song that has been copyrighted then one must pay royalties for its use. If there is no current copyright on a piece of music, then it is said to be in the public domain, and may be used without charge. This provides an obvious economic incentive to avoid copyrighted music, and encourages producers to either use public domain music or generate new, original music. Copyright issues have become particularly complicated in this digital age-where an MP3 copy of a Metallica song sounds identical to the original and may be easily distributed across the Internet. The U.S. Congress has attempted to pass laws protecting the corporations that own copyrights, but this often leads to the abridgement of free speech. In addition to facilitating copying, digital technology also expedites rap music's sampling of bits of older tunes, which has occasionally resulted in lawsuits over the ownership of this music. As you can see, television producers face numerous new challenges in the legal use of copyrighted music.

One other, principally aesthetic, reason that some TV genres have shied away from popular music in the past is because rock music during the 1950s and 1960s was associated with subversive or countercultural elements. Soap operas and sports programs, for instance, avoided rock music until the 1980s because it was perceived as too decadent for those historically conservative genres. The fact that both sports and soaps now regularly incorporate rock tunes indicates both a change in rock's position in U.S. culture (it has now become mainstream) and a change in these genres themselves, an attempt on their part to attract younger viewers.

Music fits into television's mode of production slightly differently than speech does. Unlike speech, very little music is recorded live on the set during the production phase-excepting, of course, videotapings/broadcasts of live musical performances (e.g., the musical segments of Saturday Night Live [1975-], the Boston Pops Orchestra broadcasts). Instead, most music is either prepared before the production or after. In music videos, the music is recorded ahead of time (with a few, very rare exceptions). The performers then mouth the words to the song while they are filmed or videotaped. This form of synchronization of image to music is known as lip sync (see section 10 for more on music videos). Aside from musical productions, however, it is more common to add music to the image later, in post-production, than before shooting begins. Most scenes are shot without music, even ones in which music is supposed to be in the background-for example, a nightclub or dance. Music is laid on later so that the sound technician can get a clear recording of the dialogue and the director can tightly control the music's impact.

Live-on-tape productions have a fairly unique approach to music. In both narrative (principally, soap operas) and non-narrative programs (talk shows and game shows) that are recorded live-on-tape, the music is inserted while the scene is being videotaped rather than during post-production. In narrative programs, sound technicians record the theme song and several generic musical themes ahead of time. When the cameras start to roll, they insert the appropriate music when cued by the director-much like a musical cue in the theater. Non-narrative programs follow the same procedure for their theme music.

Other non-narrative programs such as late-night talk shows include a live band (e.g., Paul Shaffer's on Late Show With David Letterman) and live performances by guests. The principle remains the same, however. The music is inserted while the program is being videotaped-the only difference being that the music is performed rather than played back on an audio device.

Sound Effects. All the elements of television's sound that are not speech or music fall into the catch-all category of sound effects. This includes gunshots, doorbell rings, footsteps on the pavement, the crunch of a fist into a jaw, and so on. It also includes the background sound of a particular room or other space-in other words, the room's ambient sound. In live-on-tape productions, most of these sound effects are whatever is picked up on the set or inserted by the sound editor during videotaping, but in programs that are edited in post-production, sound effects can be fabricated and manipulated in seemingly infinite ways.

During the actual videotaping/filming, sound technicians will record the background noise and various other sound effects elements, but they will try, as much as possible, to isolate those sounds from the dialogue. This gives them the greatest flexibility in post-production sound editing. Footsteps may be heightened to increase suspense, or the background sound of a jet chancing to pass by during videotaping/filming may be eliminated. Sound effects, like speech and music, are endlessly malleable-especially through the use of sound processing software.

Commonly, sound effects are created in post-production work using the Foley process. Foley artists view a segment of video/film in a sound studio, a Foley stage, that is equipped with different floor surfaces (rug, tile, wood, etc.), a variety of doors (car doors, screen doors, house doors, etc.), and many other sound effects contraptions. While the segment is projected on a screen, the Foley artists recreate the appropriate sounds. When a character walks up to a door, the Foley artist is recorded walking along the studio floor. When the character opens the door, the Foley artist is recorded opening a door in the studio, and so on. Some programs have only occasional Foley work in them, but others, especially complicated mini-series and MOWs (movies of the week), might create all of the sound effects in this manner.


Among the many purposes that sound serves on television, four will concern us here:

1. Capturing viewer attention.

2. Manipulating viewer understanding of the image.

3. Maintaining televisual flow.

4. Maintaining continuity within individual scenes.

Regardless of the production techniques used to create sound, these are the essential functions that it serves on television.

Capturing Viewer Attention. The first and perhaps most significant function of television sound is to snare the attention of the viewer. Television, unlike cinema and the theater, exists in an environment of competing distractions. Most people watch television in a brightly lit room, with the TV set positioned amid a variety of visual stimuli (unlike the darkened room of a theater). While the television is on, conversations continue, a phone rings, a tea kettle may start boiling, a cat may rub against a viewer's leg. In sum, television viewing is an inattentive pastime. The viewer's gaze may be riveted to the set for brief, intense intervals, but the overall experience is one of the distracted glance.

In this setting, visuals alone are not captivating enough to grab the viewer's attention. Sound is a much more effective stimulus in this regard. This is not just the case of the loud, abrasive commercial demanding your attention. It's also the sports announcer's excited comments and the cheers of the crowd that cause one to look up from folding laundry to see an instant replay; or the soap opera character posing the question, "So, April, are you ready to reveal the true father of your child?" that brings one running back from the kitchen. Sound invokes viewers' attention, cuing them to significant visual action or a major narrative twist. In other words, sound may be used to hail viewers-much as one hails a cab.

Manipulating Viewer Understanding. The second function sound serves is to shape our understanding of the image. The sound-image relationship is a complex one that we will return to several times in this section. In the most general terms, this relationship manifests itself in three ways:

1. Sound and image support one another.

2. Sound and image contradict one another.

3. Sound helps to emphasize select elements within the image.

Sound and image can support each other in a variety of fashions. In Fig. 7.8, from our discussion of editing in Northern Exposure, we see a medium shot of Maggie and Joel and a table laden with food. Her lips are moving. On the soundtrack is the sound of a woman's voice, coordinated to the moving lips, and classical music in the background. The viewer presumes that that voice originates from those lips; the acoustic properties (discussed later) of the voice help characterize Maggie and her attitude toward Joel: slightly flirtatious, inquisitive, probing. Classical music plays in the background, signifying a certain romantic potential in this context. In this simple example, sound supports and heightens the impact of the image.

One of the most blunt ways in which television sound underscores the image and directly attempts to affect viewer response is the laugh track. The laugh track fabricates an audience and inveigles the viewer into responding as the ersatz audience is responding. Television is one of the few, if not the only, media that includes its implied audience response within texts themselves. And, in this case, sound is the vehicle by which this response is presented.

Sound does not always reinforce the image, however. Contrasting sound and image would be exemplified if, in the scene between Joel and Maggie above, Joel's voice accompanied the image of Maggie's moving lips, or funereal music were played over the flirtatious dialogue. Obviously this stark contradiction between sound and image occurs infrequently on television. When sound does contrast with image it's normally to make some sort of narrative or editorial point. For example, obvious political commentary was made by contrasting the image of President Clinton hugging a beret-wearing Monica Lewinsky (during a White House lawn party in 1996) with the audio of his angry declaration: "I want to say one thing to the American people. I did not have sexual relations with that woman."

The sound-image relationship need not simply be one of either support or contrast. Often sound emphasizes part of the image while negating or de-emphasizing other parts. In Fig. 1, the first shot in a scene from The Wonder Years, Haley sits in a high school cafeteria, eating lunch. In the background are the program's central figure, Kevin, and two friends of his. Sound is used in this shot to draw our attention from her to Kevin's table at the back of the image, as we hear what they say about her. Although Kevin and the others are in the background, their voices are louder than the ambient sound. It might seem strange, but Haley does not hear their voices even though we do and we are closer to her than we are to them. That is, if we were standing where the camera was positioned then we could hear Kevin only if Haley could, too. (See the following discussion of sound perspective.)

Even though this use of sound is implausible it would likely not be noticed by most viewers. Why? Because this use of sound fits the narrative logic of the scene; it helps to tell the story of Haley's interaction with Kevin and his friends.

One could imagine other uses of sound in Fig. 1. The wall clock's ticking might be heard above everything else, suggesting the rushed nature of high school lunches. The sound of Haley eating might dominate the soundtrack, signifying that she is a glutton. Or, in contrast, an eerie foreboding could be represented by the lunchroom being totally, unnaturally silent. If the soundtrack were filled with sounds of wind howling and hail pelting the ground (off camera) it would direct our attention in another fashion, and spark other meanings: nice weather for werewolves. Each of these uses of sound and silence would move the story in a different direction. Each is an example of how, in subtle ways, the viewer's attention and comprehension may be channeled by the sounds accompanying the image.

Fig. 1

Maintaining Televisual Flow. The third function sound serves on television is the maintenance of television's pulsion, its forward drive. As is discussed in section 1, television pulls the viewer along in a flow of segments leading from one to the next. Sound plays a major role in this segment-to-segment flow.

Audio transitions between scenes parallel the visual transitions described in section 7. One may fade out or fade in sound just as one may fade out/in image-although the two fades are often not quite simultaneous. Image frequently fades out just a bit earlier than sound. Additionally, the sound equivalent of a dissolve is the cross-fade, in which one sound fades out while the other fades in and the two overlap briefly. Another term for the transition from one sound to another-especially one song to another in music video presentations-is segue (pronounced "segway"), which may be a cross-fade or a fade out/in.
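A cross-fade can be pictured as two volume controls moving in opposite directions during a brief overlap. The Python sketch below assumes the simplest possible case, a straight-line (linear) fade applied to two lists of sample values; the function name `cross_fade` and its `overlap` parameter are hypothetical, and real editing software may use differently shaped fade curves.

```python
def cross_fade(outgoing, incoming, overlap):
    """Join two sounds so that the last `overlap` samples of `outgoing`
    are blended with the first `overlap` samples of `incoming`."""
    if overlap < 1 or overlap > min(len(outgoing), len(incoming)):
        raise ValueError("overlap must fit inside both sounds")
    head = outgoing[:-overlap]   # outgoing sound heard alone
    tail = incoming[overlap:]    # incoming sound heard alone
    start = len(outgoing) - overlap
    blended = []
    for i in range(overlap):
        gain_in = (i + 1) / (overlap + 1)  # rises steadily toward 1
        gain_out = 1.0 - gain_in           # falls steadily toward 0
        blended.append(gain_out * outgoing[start + i] + gain_in * incoming[i])
    return head + blended + tail


# Hypothetical example: a steady full-level sound gives way to silence.
result = cross_fade([1.0] * 6, [0.0] * 6, overlap=3)
```

A plain fade out/in would correspond to fading the first sound to silence before the second begins, with no overlapping region at all.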

There are several ways in which sound aids televisual flow, working to keep the viewer watching. First, the speech of television announcers and the dialogue of characters are frequently used to pose questions and enigmas in order to lure the viewer into staying around to see what happens next. Station promotional announcements promise uncommon sights to come, and narrative dialogue frames questions that the viewer may hope to see answered. In either case, speech plays on our curiosity to pull us into the television flow.

But speech is not the only sound device that pulls the viewer into the flow. Music is another common hook. Within programs it is especially common that the music does not end at the same time as the scene. Rather, the music continues-if only for just a few seconds. This continuity of music helps to soften the disruption inherent in the transition from one scene to the next. It is seldom used, however, between one program and the next. Here it is more important for television to differentiate slightly between the shows, to signal that one show is ending but that another follows immediately.

Audience applause is one final aspect of sound that plays an important role in television transitions. Applause is commonly used as the marker of the end or beginning of a segment. However, in contrast to its traditional meaning as a sign of audience respect or appreciation or enjoyment, television applause more often simply means: "This is the beginning" or "This is the end." As everyone knows, studio audiences of sitcoms and talk shows are told when to applaud. Moreover, if the actual audience does not provide enough applause the sound editor can easily add more, in a process often called sweetening. This is particularly important toward the end of sitcoms' taping/filming sessions when, commonly, much of the audience has become bored and left the studio and, thus, the level of real applause is diminished. Also, sitcoms usually splice together alternative takes of scenes. Artificial applause and laughter help to conceal the transition between the shots.

Maintaining Continuity Within Scenes. A fourth and final function of sound is its use within each individual scene to help construct the continuity of space and time. As explained in section 7, each television scene is made up of a variety of shots that are strung together according to the continuity editing system. The main purpose of this continuity system is to smooth over the potential disruptions that are caused by cutting from one shot to another. In this way the space and time of a particular setting and scene are made to appear continuous, even though they may have been recorded out of order. Dialogue, music, and ambient sound all play parts in maintaining this continuity.

Dialogue scenes, especially in the single-camera mode of production, are edited so that the cuts do not coincide with vocal pauses or the ends of sentences. (This is less true in live-on-tape productions that are switched much more approximately.) Instead, the dialogue usually continues across a cut, helping to ease the transition from one shot to the next. In the scene from the single-camera production Northern Exposure analyzed in section 7 (see Figs. 7- 29), most cuts come in the midst of a phrase-creating, in a sense, a verbal match-on-action as the words continue across the cut. The phrasing serves as the glue holding the cut together.

Similarly, music and ambient sound unify the shots. The forward movement of a melody helps to propel the story onward. The temporal continuum of the music, its ability to flow through time, overrides the discontinuous time of the editing. Music helps to draw the viewer's attention from jump cuts, continuity errors, or other disruptions in the visuals. Ambient sound serves the same function, though even less noticeably. Ambient sound signifies a specific space and time to the viewer. A particular room, for example, has a particular sound associated with it at a particular time. Even slight shifts in that ambient sound can disrupt the viewer by making it appear that the space and/or time has changed. This is why sound technicians will record ambient or wild sound to lay down over shots that were originally done silent, or to make consistent the sound behind dialogue that was shot at different times or locations. Consistent background sound, in a sense, certifies that the action took place in the same location at the same time, even though the shots are from different angles and may have been taken hours or days or weeks apart.

Laugh tracks also function in the background to underscore the continuity of a scene. For example, The Andy Griffith Show (1960-68), an unusual, single-camera sitcom, incorporated a laugh track even though there was no studio audience. In each episode, the laughter continues across the cuts within a scene and thereby diminishes their disruptive potential. Viewers, in theory, don't notice the cut because they are too busy laughing along with the laugh track. In addition, multiple-camera programs such as Rowan & Martin's Laugh-In (1968-1973) are videotaped in short segments, with a laugh track tying all the segments together in post-production.


Sound on television appears deceptively simple. This is largely due to the fact that the sounds emanating from the TV speaker closely resemble the sounds that surround us in our everyday lives-unlike television's two-dimensional images, which are fundamentally dissimilar from our visual experience of the three-dimensional world. A person's voice on TV is not that different from a person's voice coming from someone sharing the living room with you. A person's image on TV is flat and two-dimensional compared to your 3D viewing companion.

The aesthetic techniques and digital/mechanical technology that are used to create sound are much less intrusive than are those used to create image. It sometimes seems as if television sound were merely an exact copy of the sounds of reality. This makes television's manipulation of sound even more difficult to detect than its manipulation of image. One aim of this section is to alert the reader to the ways that the makers of television shape our perception and our understanding by controlling acoustic properties and sound technology.

General Acoustic Properties

Even though we are mostly concerned here with the differences between television sound and real-life sound, it would be foolish to presume that there are not rudimentary similarities between the two. Any television sound shares three basic characteristics with the sounds we hear in reality:

1. Loudness, or volume

2. Pitch

3. Timbre (pronounced "tam-burr"), or tone

Loudness. How loud or soft a sound is plays an obvious role in our perception of it. The more amplified a sound is, the greater its impact. Loudness is used for more than just emphasis in television, however. It can also, among other things, signify distance. The louder a sound is, the closer we assume the person or thing causing the sound must be. Further, the variation of loudness can be used for different effects. A sudden loud noise after a quiet segment, needless to say, causes shock or surprise. In contrast, soft sounds after a loud segment can force viewers to focus their attention in order to hear what's going on.

Pitch. Pitch is how high or low a sound is. On television, pitch is especially important to the meanings that voices convey (see section 3). For example, higher pitched voices carry conventionalized connotations of femininity, and lower pitches of masculinity. Pitch is significant to the impact of television music as well as its speech. In narrative scenes, higher notes are often used to accompany suspenseful situations, while lower notes can imply an ominous presence.

These examples should not be taken prescriptively (high notes don't always mean suspense), but they do indicate how television conventionalizes pitch to signify meanings and establish atmosphere. As with all stylistic conventions, the meanings associated with pitch shift over time and from culture to culture.

Timbre. Timbre is a term borrowed from music theory. It signifies the particular harmonic mix that gives a note its "color" or tonal quality. A violin has a different tone than a cello even when they play the same note. A saxophone's tone can be distinguished from a piano's.

The human voice also has timbre, and that tonal quality can be used by actors and directors to convey meaning. A nasal timbre can make a character into an annoying toady. A throaty timbre in a woman can signify a certain androgynous sexuality. In particular contexts, timbre communicates particular meanings.

TV-Specific Acoustic Properties

The sounds that the viewer hears on television are altered as they journey from sound stage to living room. The technology of various audio machines affects those sounds and provides the sound technicians with opportunities to manipulate volume, pitch, and timbre. Their use of this technology is guided by aesthetic conventions, by "rules" regulating the function of sound on television.

Digital Versus Analog. Before digital technology changed our concept of sound recording in the 1980s, audio and video tapes were based on analog principles. The presence of analog sound (and image, too) is rapidly decreasing in the consumer marketplace. However, it is still important to understand how analog recording works and how digital recording differs from it because the remnants of analog technology will be with us for some time to come.

First, let's consider the basic difference between analog and digital phenomena. Anything labeled "digital" is rooted in digits or, put more simply, in numbers. An analog replica of something, in contrast, is a model that reproduces that thing in a different form from the original. The concept is a slippery one, but may become clearer if we consider the differences between analog and digital representations of temperature. An analog thermometer is one in which the mercury appears as a line within a tube. When the line gets up to a certain area it signifies "warm," when it goes farther, it signifies "hot." There are numbers calibrating the heat, but they aren't entirely necessary because the length of the line represents, in analog fashion, the amount of heat. The line's length is, in a sense, a model of the amount of heat. When it's long, it's hot. A digital thermometer, one that just displays numbers (e.g., 32 degrees), converts the amount of heat into digits. It doesn't tell us "warm" or "hot" or show us a model of the heat; it only gives us numbers. Further, as you can see in this example, all digital information is packaged in discrete units (e.g., a single degree), while analog models are unbroken continua (e.g., the continuous length of a mercury line in a tube).

Now, let's apply this principle to sound recording. Analog sound technology creates replicas of sound waves on audio tape (or, earlier, on vinyl records and wax cylinders). That is, the sound wave is converted into an electronic replica that is recorded on a piece of magnetic tape-a ribbon of plastic with a coating on it that is sensitive to magnetic impulses created by electricity. These magnetic impulses are modulated on the tape in a fashion that parallels the sound wave's modulation.

In contrast to analog recording, digital technology transforms the sound wave into numbers. The process is called sampling, but it's not the same as the rap-music sampling mentioned above. Digital recording takes a tiny snippet from a sound-a fraction of a second-and measures the characteristics of the sound at that very instant. The characteristics of this sample are then converted into a set of binary numbers-just strings of zeros and ones-and recorded on magnetic tape or a hard drive (i.e., a magnetic disk). Thousands of samples are taken each second and then combined to create a digital representation of the sound. Much like our digital thermometer, this digital recording contains no information other than groups of digits-lots and lots of zeros and ones.
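The sampling process described above can be sketched in a few lines of Python. This is only an illustration, not a description of how any particular recorder works: the 440 Hz tone, the rate of 8,000 samples per second, the 8-digit word length, and the function name `sample_tone` are all assumptions chosen to keep the example readable.

```python
import math

def sample_tone(freq_hz=440.0, sample_rate=8000, bits=8, n_samples=8):
    """Measure a sine wave at regular instants and store each measurement as binary digits.

    Every default value here is an illustrative assumption.
    """
    levels = 2 ** bits                      # how many discrete values are available
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                 # the instant at which this sample is taken
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # continuous value in [-1, 1]
        # Map the continuous value onto the whole numbers 0 .. levels - 1
        code = round((amplitude + 1) / 2 * (levels - 1))
        samples.append(format(code, f"0{bits}b"))          # a string of zeros and ones
    return samples

if __name__ == "__main__":
    for word in sample_tone():
        print(word)
```

Each printed line is one sample: a single measurement reduced to a string of zeros and ones. A real recording simply takes far more of them per second and uses more digits per sample.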

That, then, is the difference between analog and digital recording. But what is the significance of digital recording to television sound as it is played back in our living rooms? Currently, the sound technology in our television sets is analog, but this is quickly changing. For the home user, the digital audio revolution began in the 1980s with compact discs (CDs), which are little more than a collection of numbers that have been pressed into aluminum (or occasionally gold) and coated with plastic. DVDs, which debuted commercially in 1997, use a similar process to marry digital sound to a digital image. The FCC has required that U.S. broadcasters completely phase in digital television (DTV) by 2006, and other countries are taking similar initiatives to launch DTV. At that point, most sounds emanating from our stereos and TVs will be fully digital, excepting those old audio/video cassettes and vinyl LPs with which we refuse to part!

What does the difference between analog and digital really mean to the listener? If you compare the sound of a digital recording with that of a comparable analog one you'll notice three aspects of the digital recording: (1) less background noise (hiss and the like caused by analog recording), (2) a larger dynamic range (reproducing softer sounds without noise obscuring them and louder sounds without distortion), and (3) a greater frequency response (reproducing a wider range of low-to-high tones). Today's analog TVs lose many of these digital advantages, because the digitally recorded sound is still passing through analog technology. A certain additional amount of noise is added in the broadcast process as well. Consequently, much of the value of digital sound quality is lost on analog TV. DTV, in contrast, will not degrade the quality of the digitally recorded original sound, unless the DTV signal has been severely compressed.
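The dynamic range of a digital recording is tied to how many binary digits each sample uses. As a rough sketch, assuming an ideal converter and ignoring the noise added by real equipment and by broadcasting, each extra digit doubles the number of available loudness steps, which is worth about 6 decibels. The function name below is hypothetical.

```python
import math

def ideal_dynamic_range_db(bits):
    """Ratio, in decibels, between the largest and smallest step of an ideal quantizer.

    Assumes ideal conversion; real-world figures are lower.
    """
    return 20 * math.log10(2 ** bits)

if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(bits, "bits:", round(ideal_dynamic_range_db(bits), 1), "dB")
```

At 16 digits per sample, the word length used on audio CDs, the ideal figure comes to roughly 96 dB.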

Perhaps more significant than the digital recording process and its high quality are the abilities of digital technology to both process existing sounds and manufacture new sounds. A broad variety of sound effects are now achieved using DAWs (digital audio workstations), which may significantly alter the volume, pitch, and timbre of any recorded sound. There is virtually no way for the viewer to be able to tell when this sort of subtle manipulation has taken place. It is equally difficult to discern when sound, especially music, has been fabricated digitally. This manufactured music has become popular in live-on-tape productions where a variety of music is needed, particularly for narrative programs such as soap operas.

Just about any type of instrumentation, from lush orchestral sounds to jazz and rock quartets, can be digitally created, instantaneously and inexpensively. This has greatly changed the musical sound of many genres. Productions that previously could not afford a full orchestra may now synthesize that sound cheaply. Soap operas, for instance, always used to be accompanied by a lone organ. That organ sound was so identified with the genre that it was a prominent part of soap opera parodies such as "As the Stomach Turns" on The Carol Burnett Show (1967-79). Nowadays, however, the soaps have a wide-ranging variety of music, much of which is synthesized digitally. Economics and technology have worked to change television's aesthetics.

Sound Perspective and Directionality. The position of a microphone, like the position of a camera, sets up a relationship between the recording device and the person or object creating the sound. The point of view that this relationship implies is its sound perspective. Mike placement and the division of sound into stereo channels permit the manipulation of sound perspective-thus influencing the viewer's understanding of a scene. If a mike is placed close to someone's lips, then the sound recorded will be an intimate, "close-up" perspective. And if the mike is positioned far away, then the sound perspective will be distant, similar to a long shot. In a sense, then, mike position "frames" the sound for viewers, signaling to them how "close" they are to the sound-producing person or object.

In terms of distance from the mike to the recorded object or person, there are four conventional positions:

1. Overhead boom (which can also be beneath the actors)

2. Lavaliere

3. Hand-held

4. Close-miking

Fig. 2: Omnidirectional Microphone; Cardioid Microphone; Hypercardioid microphone.

These positions incorporate different types of microphone technology based largely on the direction in which the mike is capable of picking up sound. That is, some mikes pick up sound from all directions equally and are thus omnidirectional (Fig. 2). Other mikes are more sensitive to sound coming from certain directions. These unidirectional mikes usually have somewhat heart-shaped pickup patterns, which have come to be labeled cardioid and hypercardioid (Fig. 2). A cardioid mike's pickup pattern looks like an inverted heart, with most of its sensitivity aimed toward the front. Similarly, hypercardioid mikes emphasize sound from the front, but they also allow some sound from the rear to be recorded. The aesthetics of microphone positioning works with the technology of microphone directionality to determine how sound is picked up.
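The three pickup patterns named above can be compared numerically. The sketch below uses the standard first-order formula for microphone directionality, sensitivity = a + (1 - a) x cos(angle), with commonly quoted textbook coefficients. Those coefficients and the function name are assumptions made for illustration, not measurements of any particular microphone.

```python
import math

# First-order pickup patterns: sensitivity = a + (1 - a) * cos(angle).
# Coefficient values are common textbook ones, assumed here for illustration.
PATTERNS = {
    "omnidirectional": 1.0,
    "cardioid": 0.5,
    "hypercardioid": 0.25,
}

def sensitivity(pattern, angle_degrees):
    """Relative sensitivity (1.0 = full) to sound from angle_degrees; 0 is straight ahead."""
    a = PATTERNS[pattern]
    return abs(a + (1 - a) * math.cos(math.radians(angle_degrees)))

if __name__ == "__main__":
    for name in PATTERNS:
        row = [round(sensitivity(name, angle), 2) for angle in (0, 90, 180)]
        print(f"{name:16s} front/side/rear: {row}")
```

Under these assumptions the omnidirectional mike hears every direction equally, the cardioid rejects sound from directly behind, and the hypercardioid is narrower at the sides but admits some sound from the rear.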

The overhead boom mike is held on a long arm that enables the boom operator to position it above the actors' heads, just out of the view of the camera. (It may also be placed below the camera frame.) It uses a hypercardioid, shotgun mike so that the operator may aim it directly at a specific person and minimize the surrounding ambient sound. Since the mike is 3 or 4 feet away from the actors' mouths, the sound perspective is roughly equivalent to the sound one hears when standing near a group of people and engaging in conversation.

Boom miking helps position the viewer vis-a-vis the characters or performers. This particular position implies an objective point of view, of being slightly distanced from the characters-or, at least, of not hearing subjectively through a character's mind.

The boom mike position has become the conventionalized norm for most narrative programs, whether using single-camera or multiple-camera mode of production. Moreover, it is the principal way that multiple-camera sitcoms and soap operas are recorded. They are videotaped/filmed straight through and consequently the mikes must record several persons from one mike position.

Thus the economic imperative of shooting these programs live-on-tape results in the technological necessity to use boom mikes, causing the aesthetic consequence of a certain "objective" sound perspective.

The omnidirectional lavaliere mike is attached to actors' chests, clipped to their clothing under which the microphone wire is concealed. Lavaliere miking is the norm for news broadcasters in the studio, though not for those out in the field who use a more directional mike to filter competing, incidental sounds.

Although closer than boom miking, the lavaliere mike is still 1 or 2 feet from the broadcaster's mouth. The sound that it picks up is the audio equivalent of the medium close-up and close-up perspectives that typify framing in contemporary news practice.

The hand-held mike sounds much like the lavaliere mike because it is also positioned around chest high, although it may also be held higher than that.

Hand-held mikes are used in news and sports field production (e.g., in interviews with athletes) and in talk shows. These cardioid or hypercardioid mikes yield a sound perspective quite similar to the lavaliere mike, but, because they are directional microphones, the pickup may be aimed in one direction or another.

Hand-held mikes are never utilized in narrative programs. Unlike boom and lavaliere mikes, the hand-held mike is both visible and obvious to the viewer (the lavaliere mike is so small it can be overlooked or mistaken for a brooch or a tie-clip). To use it in narrative programs would make evident the technology involved in creating television; it would be like having a camera appear on-screen. This violates conventions of repressing television devices in narrative programs; to see a mike would make the viewer conscious of the whole production apparatus, which is taboo unless you are avant-garde playwright Bertolt Brecht or comedian Garry Shandling (in It's Garry Shandling's Show [1988-90] and The Larry Sanders Show [1992-98]). In news reporting, the hand-held mike is sometimes wielded like a club, intruding into the personal space of interviewees whether or not they wish to be spoken to. Thus, the hand-held mike has come to signify broadcast journalism in certain contexts. Occasionally, it means overly aggressive reporting.

In close-miking, the mike is positioned right next to a person's mouth-the "extreme close-up" of miking. This is how radio announcers are miked and it is also how television announcers-the ones that read promotional announcements and advertisements-are miked. Moreover, it is the miking technique used to record singers in a sound studio. This type of miking creates sound that has a full, rich timbre, a wide frequency response (often emphasizing bass pitch for male studio announcers), and very little ambient noise. Viewers have come to expect the close-miked sound in television announcements and music videos. For these elements of television, close-miking is the norm. However, close-miking can also prove to be disruptive when used in narrative programs.

Dubbing and other ADR in narrative programs are often recorded with close-miking. This can clash with the viewer's expectations for the sound perspective created with boom-miking. To cut from a boom-miked piece of dialogue to one that is close-miked makes it sound as if the characters were suddenly right on top of you. To avoid this, sound technicians position the mike away from the ADR actors.

Sound perspective is not limited to a sense of closeness, of near or far. The widespread acceptance of stereo-TV sets and programs in the 1990s afforded sound editors another tool for representing perspective. By altering the relative loudness of sounds in the right and left channels, they give us a sense of the lateral (i.e., sideways) position of a person or object. For example, a gun appears on the right side of the frame and when it is fired, the gunshot principally emanates from the right-hand speaker. This sound cue confirms our spatial sense of the position of the gun.
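Altering the relative loudness of the two channels can be sketched as a simple "pan" calculation. The example below uses a constant-power pan law, which is one common way of doing this. It is assumed here purely for illustration, and the function name and the position scale from -1.0 to 1.0 are likewise hypothetical.

```python
import math

def pan(sample, position):
    """Split a mono sample between the left and right channels.

    position runs from -1.0 (far left) through 0.0 (center) to 1.0 (far right).
    Uses a constant-power pan law, an assumed choice rather than the only one.
    """
    angle = (position + 1) * math.pi / 4    # maps -1..1 onto 0..pi/2
    left = sample * math.cos(angle)
    right = sample * math.sin(angle)
    return left, right

if __name__ == "__main__":
    left, right = pan(1.0, 0.8)             # a gunshot on the right side of the frame
    print("left:", round(left, 3), "right:", round(right, 3))
```

With the position set toward the right, most of the gunshot's level goes to the right-hand speaker while a little remains in the left, so the sound appears to sit on the right rather than jumping entirely to one side.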

Sound mixing in theatrical films, DVDs, and DTV has seen the number of channels multiply in recent years. Dolby Digital, for instance, was introduced in film theaters in 1992 and in DTV in 1998. It boasts 5.1 channels. The Dolby Web site explains their arrangement:

Dolby Digital programs can deliver surround sound with five discrete full-range channels-left, center, right, left surround, and right surround-plus a sixth channel for those powerful low-frequency effects (LFE) that are felt more than heard in movie theaters. As it needs only about one-tenth the bandwidth of the others, the LFE channel is referred to as a ".1" channel (and sometimes erroneously as the "subwoofer" channel). [1]

Thus, Dolby Digital 5.1 is actually created with six speakers. Four speakers for the left, right, center, and LFE channels are in front of the viewer. The two so-called "surround" channels emanate from speakers placed behind the viewer.

By literally surrounding viewers with six speakers, Dolby Digital creates a sound space in which sounds may come from behind and in front, and to the left and right-unlike the sound in original monaural TVs that only came from a single point in front of the viewer.

Dolby Digital and other multi-channel sound systems greatly enhance the potential for sound-perspective manipulation on TV, but the placement of sounds in particular channels is not without its "rules" and conventions. Almost all dialogue is placed in the center channel, even if actors are positioned to the far left or right of the frame. Left/right channels are reserved for music and sound effects only. The rear left/right ("surround") and the LFE channels contain only sound effects, no dialogue or music. Aside from the LFE channel, there is no technical reason for this assignment of channels to certain types of sound. Thus, the seemingly endless variety of sound perspective is constrained by aesthetic convention, with sound again being largely divided into speech, music, and effects.
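These conventions amount to a small routing table, which can be written out as a sketch. The channel names follow the 5.1 arrangement described earlier. The table itself, the type labels, and the function name are illustrative assumptions, and the table treats "almost all dialogue" as a strict rule for simplicity.

```python
# Six channels of a 5.1 mix, listed in the order given in the text.
CHANNELS = ("left", "center", "right", "left surround", "right surround", "LFE")

# Conventional placement by type of sound (a simplification of the conventions above).
ALLOWED = {
    "dialogue": {"center"},
    "music": {"left", "right"},
    "effects": {"left", "right", "left surround", "right surround", "LFE"},
}

def conventional_channels(sound_type):
    """Return the channels in which a type of sound conventionally appears."""
    return [channel for channel in CHANNELS if channel in ALLOWED[sound_type]]

if __name__ == "__main__":
    for sound_type in ALLOWED:
        print(sound_type, "->", conventional_channels(sound_type))
```

Nothing in the table is technically required except the LFE entry. It records aesthetic habit, which is why dialogue stays in the center even when the speaker stands at the edge of the frame.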

To this point, we have suggested ways in which sound perspective may be roughly equivalent to image perspective. But directors and sound editors, especially in narrative programs, need not rely on that equivalence. Indeed, they may try to subvert it for specific narrative effect. In the scene from The Wonder Years discussed above (Fig. 1), for instance, Kevin is shown in the background in long shot, too far away for the viewer to hear, but his voice is presented at "normal," boom-miked level. Sound perspective contrasts with image perspective in order to achieve a specific narrative effect-in this case, Kevin's opinion of Haley is presented without her knowledge of it. This is a major plot point in the narrative. Only later in the episode will she learn of his and his friends' opinions of her.


Much of what we hear on TV comes from a source that we can see on TV at the very same time. In other words, much TV sound originates in onscreen space and is synchronized with the time of the image. But this is not true of all sound on television. What, then, is the relationship of a sound to the space and the time of the image that it accompanies? And if it does not match them, then what effect does that disjuncture cause?

Sound and Space. In section 6 we discussed how the aesthetic/technological fact of the camera frame can be used by the director and videographer/cinematographer to achieve a variety of framing effects. The frame is also important to our consideration of sound. It forms the division between offscreen space and onscreen space, between what is within the frame and what is presumed to be outside it. Often the source of a sound will be situated offscreen.

This is quite common in non-narrative, live-on-tape productions when a voice is heard from an actor who is not currently onscreen-for example, Paul Shaffer's chortle following one of David Letterman's jokes. And, of course, the laughter and applause of the studio audience normally comes from offscreen, too.

Our commonsensical understanding of offscreen space is also used in narrative programs. A voice or sound from offscreen helps to create the illusion that life is going on all around the characters that we see onscreen. Offscreen space thus aids the construction of the continuity of space--that is, the sense that the onscreen space continues out beyond the camera frame. This can be as simple as the sound of traffic inserted in the background of a scene in an apartment, or it can involve the more complicated manipulation of sounds and framing that create the illusion of a killer following a victim in a shadowy alley. In short, sound draws the viewer's mind out past the frame into a fictional world that has been created for this narrative.

Sound and Time. The time of a sound, in relation to the image it accompanies, can be:

1. Earlier than the image.

2. Simultaneous with the image.

3. Later than the image.

Obviously, the vast majority of sound falls into the second category, but there are also many instances of sound being displaced from the time of the image.

In a sound flashback we hear speech, music, or sound effects from an earlier time than the image currently on the screen. This occurs frequently in narrative programs. A boy, for example, may be trying to make up his mind about whether or not to shoplift. As we see his face in close-up we might hear repeated the words of his mother about being honest. Those words come from a much earlier time in the story. The reverse-that is, sound later than the image-can also occur.

When a sound flash-forward is used, the viewer hears sound from a future part of the story. The time frame of a sound is similarly displaced when a character's voice in the "present" speaks over images of the past, as in The Wonder Years when we see an image from the 1970s and hear the voice of Kevin in the 1990s commenting on it.

Diegetic and Nondiegetic Sound. Recall that "diegesis" has been used in TV/film studies to refer to the story itself, the narrative action. The physical world in which this narrative action takes place is the diegetic space. In Seinfeld (1990-98), for example, this would be Jerry's apartment and the New York City locations the characters frequent (e.g., Monk's Cafe). Diegetic sound, then, consists of speech, music, and sound effects whose source is in the world of the story: the dialogue of Jerry, Elaine, George, and Kramer; the noises and ambient sound in the apartment; and so on.

Diegetic sound may be either objective or subjective. Objective diegetic sound originates in the external world of the narrative and would include, for example, Jerry and George's conversations. Subjective diegetic sound comes from inside a character's head and cannot be heard by other characters at the same location. When characters' voiceovers are used to signify their thoughts, then diegetic sound is being used subjectively. One strange example of this is Hennesy (1959-62), in which the thoughts of a dog are frequently presented in voiceover.

Not all of the sound on narrative TV programs, however, originates in the diegesis. Most notably, this nondiegetic sound includes the so-called "mood" music that accompanies each scene. The viewer hears it, but the characters do not because it is not part of their world. They also do not hear the narration of an omniscient announcer (one who is not a character). Nondiegetic music and narration are commonly used to guide the viewer's perception of the narrative.


The importance of sound to television is easy to overlook, because it is often difficult to detect how sound has been manipulated by the makers of television programs. When watching television, however, it is important to recognize how the different types of sound (speech, music, and sound effects) have been molded in order to achieve particular purposes. As always, these manipulations, these purposes, are ruled by television's aesthetics, economics, and technology.

The essential function of sound on TV is to hail the viewer to watch TV. This purpose cannot be overstated. The producers of commercials have long understood the significance of sound in capturing viewer interest. Once we have been hooked, sound channels our perception of an image by either reinforcing the meaning of that image or directing us toward select elements of the image.

In less common instances, it may subvert what the image seems to be saying.

Sound also functions to propel television forward. Within individual scenes, the illusion of continuity is preserved through the mix of music, speech, and sound effects. Sound, thus, becomes an integral part of the continuity system.

On a larger scale, sound also helps maintain the flow between one televisual segment and the next. Speech is especially significant in its construction of enigmas to pull the viewer into the televisual current.

Sound on television is in some ways identical to the sounds of life. In both, sound may be characterized in terms of its volume, pitch, and timbre, but it would be wrong to assume that TV sound is not manipulated in its transition from historical world to TV speaker. Digital and analog technologies present sound editors with a broad audial palette from which to choose. They may orchestrate pre-existing sounds or even create them, synthesize them, from scratch. One of the simplest components of sound technology is the positioning of the microphone and the effect that this has on sound perspective. Different types of microphone technology, in different locations, give the viewer an audial point-of-view from which to hear the action.

Most of the sound we hear on television is synchronized with the space and time of the images we are watching, but it need not always be so. Sounds can be offscreen as easily as they are onscreen. Offscreen sound draws the viewer out beyond the frame, further constructing spatial continuity. And the time of a sound may be displaced from that of the image. Sound of an earlier or later time can be laid over an image to various effect.

Thus television sound, which so often appears to be the "simple" recording of life's speech, music, and sound effects, is actually another manipulated and/or fabricated component of the television medium.


Sound style is discussed in many of the readings suggested at the end of section 5.

The critical study of television sound is just beginning. Rick Altman, "Television/Sound," in Studies in Entertainment: Critical Approaches to Mass Culture, ed. Tania Modleski (Bloomington: Indiana University Press, 1986), 39-54 builds on his work on sound in the neighboring medium of the cinema. Stephen Heath and Gillian Skirrow, "Television: A World in Action," Screen 18, no. 2 (Summer 1977): 7-59 is not wholly devoted to sound, but it does make some keen observations on the sound-image relationship. Also important for their considerations of sound's significance are the previously cited John Ellis, Visible Fictions and Herbert Zettl, Sight Sound Motion.

The principal essays on cinema sound are collected in Elisabeth Weis and John Belton, Film Sound: Theory and Practice (New York: Columbia University Press, 1985) and two journal issues on the topic: Yale French Studies 60 (1980) and Screen 25, no. 3 (May-June 1984).


1. "Frequently Asked Questions about Dolby Digital in the Home," Dolby Laboratories, Inc. 2000.


Updated: Sunday, 2021-08-22 9:35 PST