What is a vocoder

The vocoder was invented in the 1920s for telecommunications. However, it found its true calling in electronic music, where it became a key tool for creating robotic voices. Almost a hundred years after its appearance, the vocoder is still actively used in the music industry, but not everyone knows how this unique instrument works or how to use it. This article explains how the Second World War made speech synthesizers popular, how the vocoder works and how to use it correctly.

The development of the vocoder began in 1928 with the work of Homer Dudley, an engineer at Bell Labs. The work was completed by the end of the 1930s: Dudley received the first patent for his invention in November 1937 and a second in 1939. Dudley’s main idea was to recreate the human speech apparatus using electronics. With electronic components and effects, he sought to imitate the human speech organs as closely as possible, reproducing the sounds created as air passes from the lungs through the other parts of the vocal apparatus.

In 1939, Bell Labs presented a speech synthesis device called VODER (Voice Operating Demonstrator) to the public in a series of demonstrations in New York and San Francisco. The device used a pair of switchable oscillators and a noise generator as its sound sources. A dedicated vocal path made up of ten band filters was linked to a velocity-sensitive keyboard that controlled the intensity of the filtering. The pitch was changed with a foot pedal. Additional keys produced the sounds “P”, “D” and “J” (as in “jaw”), as well as “CH”.

VODER was a complex device, and learning to operate it took several months of specialized training. For the daily demonstrations, Bell Labs trained 20 people, who took turns presenting the new product to everyone interested. During the demonstration, VODER said the phrase “Good afternoon, radio audience!”

In 1949, the KO-6 voice converter was developed, which encoded speech at a rate of 1200 bits per second. In 1953, another vocoder appeared, the KY-9 THESEUS, which not only raised the rate to 1650 bits per second but also used different components. Thanks to the new components, the weight of the equipment fell from 55 tons for SIGSALY, the vocoder-based secure telephone system used during the Second World War, to 256 kilograms for the KY-9. Finally, in 1961, the HY-2 converter brought the weight down to 45 kilograms and raised the encoding rate to 2400 bits per second. The HY-2 was the last industrial vocoder used in secure communications systems; after that, the instrument lived on in the consumer sector.

In 1948, the German scientist Werner Meyer-Eppler, who had a special interest in voice synthesis, published a dissertation that treated speech synthesis and electronic music from the point of view of sound synthesis. His knowledge later played an important role in the creation of the West German Radio (WDR) Electronic Music Studio in Cologne in 1951.

The first use of a vocoder to create music occurred in 1959, also in Germany. Between 1956 and 1959, Siemens developed the Siemens Synthesizer, which could convert sound into speech. In 1968, Robert Moog, founder of the Moog company, developed one of the first vocoders designed specifically for use in the music industry. This vocoder was commissioned by the University at Buffalo.

From then on, the vocoder took on a life of its own and spread to all areas of audio and video production. The instrument became known to the general public thanks to the group Kraftwerk, which built its own vocoder for its experiments and has used it since the band was founded in 1970. The best-known example of vocoder use is the Kraftwerk album “Trans-Europe Express”, which we examined in detail in a review of unusual musical instruments by German electronic artists.

How does a vocoder work?

Two signals are better than one. The vocoder needs two sound sources to operate:

Operator (more commonly called the carrier): the initial sound signal;
Modulator: a second signal whose harmonic characteristics determine how the operator will sound.

The modulator passes through a special “filter bank” that divides it into frequency bands and measures the level of the signal in each band over time. Each band has its own band-pass filter whose center frequency sits in the middle of that band; however finely the spectrum is sliced, every band is filtered around its center.

The operator signal is then passed through a matching set of filters. The vocoder raises or lowers the level of each operator band to follow the level measured in the corresponding band of the modulator, so the harmonics and overtones of the modulator are imprinted on the operator.
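The signal flow can be sketched in a few lines of Python. This is a minimal illustration rather than the design of any particular product: it assumes that both signals go through identical second-order band-pass filters, that the level of each modulator band is tracked by a simple envelope follower, and that this level scales the matching operator band. All function names, the Q value and the smoothing frequency are illustrative choices.

```python
import math


def bandpass(signal, center_hz, q, sample_rate):
    """Second-order band-pass filter (standard biquad, unity gain at the centre)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for a band-pass
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(signal, sample_rate, smoothing_hz=30.0):
    """Envelope follower: rectify, then smooth with a one-pole low-pass."""
    coeff = math.exp(-2.0 * math.pi * smoothing_hz / sample_rate)
    level = 0.0
    out = []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out


def vocode(operator_sig, modulator_sig, centers_hz, sample_rate, q=6.0):
    """Shape the operator with the band-by-band loudness of the modulator."""
    n = min(len(operator_sig), len(modulator_sig))
    output = [0.0] * n
    for center in centers_hz:
        mod_band = bandpass(modulator_sig[:n], center, q, sample_rate)
        op_band = bandpass(operator_sig[:n], center, q, sample_rate)
        gain = envelope(mod_band, sample_rate)
        for i in range(n):
            output[i] += op_band[i] * gain[i]
    return output


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


SAMPLE_RATE = 8000
N = SAMPLE_RATE // 2  # half a second of audio
# Operator: a naive 100 Hz sawtooth, rich in overtones.
saw = [2.0 * ((100.0 * i / SAMPLE_RATE) % 1.0) - 1.0 for i in range(N)]
# Modulator: a 1 kHz tone standing in for a voice with energy near 1 kHz.
tone = [math.sin(2.0 * math.pi * 1000.0 * i / SAMPLE_RATE) for i in range(N)]

near = rms(vocode(saw, tone, [1000.0], SAMPLE_RATE))  # band where the modulator is loud
far = rms(vocode(saw, tone, [250.0], SAMPLE_RATE))    # band where it is almost silent
print(near, far)
```

With these test signals, the 1 kHz band of the operator should come through much louder than the 250 Hz band, because the modulator has energy only around 1 kHz. A real voice would open and close many bands at once, which is what makes the operator appear to “speak”.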

To understand how a vocoder works, we can draw an analogy with the human voice, which is also formed by an operator and a modulator. When we speak, a flow of air passes through the vocal cords and creates the original signal, the operator. At the same time, the other parts of the vocal apparatus, such as the mouth, tongue and lips, shape that sound and act as the modulator. Their characteristics directly affect the sound of the voice.

A vocoder works in a similar way: it modifies the original signal according to the characteristics of the additional signal.

Any audio signal can be an operator or a modulator. Producers often use synthesized sounds as operators and the voice as a modulator. An example of the use of a vocoder in music is the track “Trans-Europe Express” by Kraftwerk. The operator is the synthesizer signal, and the modulator is ordinary speech.

A more experimental use of the vocoder can be heard in the track “Nightcall” by Kavinsky. This effect can be recreated in iZotope VocalSynth with a patch that uses chords built from two waveforms plus white noise as the operator, modulated by the voice.

How to use a vocoder

For a vocoder to sound as impressive as it does on many commercial recordings, the operator signal must be rich in overtones. The richer and more varied the operator, the stronger the impact of the modulator.

It’s best to start experimenting with patches that use or are based on the sawtooth waveform. Sawtooth (ramp) waves are typically much richer in harmonics than triangle or sine waves. It is also good practice to compress or saturate the operator signal before feeding it into the vocoder. This will emphasize the effect of the signal passing through the filter bank.
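The difference in harmonic content is easy to measure. The sketch below compares the first five harmonics of a sine wave, a naive sawtooth and a sine pushed through a soft clipper. It uses only the Python standard library; the tanh soft clipper is just one assumed form of saturation, and the function names, the 100 Hz test frequency and the drive value are illustrative.

```python
import math

SAMPLE_RATE = 8000
FREQ = 100.0       # 80 samples per cycle, so harmonics land on exact analysis bins
NUM_SAMPLES = 800  # ten full cycles


def sine_wave():
    return [math.sin(2.0 * math.pi * FREQ * i / SAMPLE_RATE) for i in range(NUM_SAMPLES)]


def sawtooth_wave():
    # Naive (non-band-limited) ramp from -1 to 1.
    return [2.0 * ((FREQ * i / SAMPLE_RATE) % 1.0) - 1.0 for i in range(NUM_SAMPLES)]


def harmonic_level(signal, harmonic):
    """Amplitude of one harmonic of FREQ, found by correlating with sin and cos."""
    w = 2.0 * math.pi * FREQ * harmonic / SAMPLE_RATE
    re = sum(x * math.cos(w * i) for i, x in enumerate(signal))
    im = sum(x * math.sin(w * i) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)


def saturate(signal, drive=3.0):
    """Soft clipping with tanh, rescaled so a full-scale input still peaks at 1."""
    return [math.tanh(drive * x) / math.tanh(drive) for x in signal]


for name, wave in [("sine", sine_wave()),
                   ("sawtooth", sawtooth_wave()),
                   ("saturated sine", saturate(sine_wave()))]:
    levels = [round(harmonic_level(wave, k), 3) for k in range(1, 6)]
    print(name, levels)
```

The sine has energy only at its fundamental, so most vocoder bands would have nothing to work with. The sawtooth has energy at every harmonic, and saturation adds odd harmonics even to a plain sine, which is why both are useful ways to thicken the operator.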

The voice acting as the modulator requires special attention. When recording the words, speak very clearly and precisely, emphasizing each sound. Whatever type of voice you have, the articulation must be pronounced: it is this precision and clarity that creates the characteristic robotic vocoder effect. Notice how each word in Kavinsky’s “Nightcall” is pronounced clearly and slowly. Keep an eye on articulation throughout to avoid distorting the words.

Voice pitch matters much less when using a vocoder, because the pitch of the result comes from the operator. Focus on other characteristics of the voice: timbre, depth, clarity and definition. Instead of experimenting with range, it is better to work on expression and intonation.

What parameters control the operation of the vocoder?

Hardware and software (VST) vocoders usually offer a similar set of parameters. The names of the controls may vary from manufacturer to manufacturer, but what they do remains roughly the same.

Number of Bands

The Bands control determines how many frequency bands the modulator signal is divided into. Unlike software vocoders and plug-ins, older devices have a limit on the number of bands into which the signal can be divided. To create a traditional robotic sound in the style of Kraftwerk, set the Bands parameter to between 8 and 12.

Frequency Range

This parameter sets the range of frequencies used when processing the signal. The vocoder takes into account only the frequencies inside this interval and ignores the rest. To improve audio clarity, it is recommended to set the upper limit above 5 kHz.
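Together, the Bands and Frequency Range settings define a grid of filter bands. The sketch below shows one plausible way to build that grid. It assumes logarithmic spacing, which matches how pitch is perceived but is not necessarily what any given vocoder does; the function names and the default range of 80 Hz to 8 kHz are illustrative.

```python
import math


def band_edges(num_bands, low_hz=80.0, high_hz=8000.0):
    """Return num_bands + 1 edge frequencies, equally spaced on a log scale."""
    if num_bands < 1 or not 0.0 < low_hz < high_hz:
        raise ValueError("need at least one band and 0 < low_hz < high_hz")
    ratio = (high_hz / low_hz) ** (1.0 / num_bands)
    return [low_hz * ratio ** i for i in range(num_bands + 1)]


def band_centers(edges):
    """Centre of each band, taken as the geometric mean of its two edges."""
    return [math.sqrt(lo * hi) for lo, hi in zip(edges, edges[1:])]


edges = band_edges(10)
for lo, hi, centre in zip(edges, edges[1:], band_centers(edges)):
    print(f"{lo:7.1f} Hz to {hi:7.1f} Hz, centre {centre:7.1f} Hz")
```

Fewer bands give wider bands and a coarser, more robotic result. More bands follow the modulator’s spectrum more closely and make speech easier to understand.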

Formant

Some vocoder models have a formant adjustment feature, often labeled “Shift”. With this option, the user can move the filter bands applied to the operator up or down in frequency. Raising the formants makes the processed signal brighter, while lowering them makes it darker and deeper.

Formant adjustment is typically used to match the vocoder to female or male voices, with the shift making the robotic voice sound more feminine or more masculine. Instead of a formant control, some vocoder models have a “Gender” parameter that adjusts the perceived gender of the resulting voice.
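One common way to implement a formant shift, assumed here rather than taken from any specific product, is to move the center frequencies of the filters applied to the operator while leaving the bands analyzed in the modulator where they are. The function name and the choice of semitones as the unit are illustrative.

```python
def shift_formants(centers_hz, semitones):
    """Scale every band centre by the same musical interval (12 semitones = one octave)."""
    factor = 2.0 ** (semitones / 12.0)
    return [c * factor for c in centers_hz]


analysis_centers = [250.0, 500.0, 1000.0, 2000.0]   # bands measured in the modulator
brighter = shift_formants(analysis_centers, 4)      # operator bands moved up
darker = shift_formants(analysis_centers, -4)       # operator bands moved down
print(brighter)
print(darker)
```

Because the level measured in each modulator band is applied to an operator band at a higher or lower frequency, the resonances of the voice appear to move while the pitch of the operator stays the same.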

Unvoiced

Human speech in any language contains so-called plosive sounds. A plosive occurs when a stream of air is released through closed lips, as in the letters “P” and “B”. Plosives such as “P”, along with hissing sounds such as “S” and “F”, are produced without vibration of the vocal cords, which is why they are called unvoiced sounds.

Unvoiced sounds have no specific pitch: they are noise spread across the entire frequency range, and the vocoder ignores them. But there is no reason to be glad that this noise is excluded: imagine how familiar words sound without the letters “P” and “B” (“habitual” becomes “haitual”, “problem” becomes “rolem”).

To keep the vocoder from dropping plosives and “swallowing” letters in words, manufacturers add a special “Unvoiced” parameter to the settings. This control is tied to a noise generator that compensates for the shortcoming: the further the control is turned up, the stronger the correction. The noise generator supplies a signal that takes the place of the operator for these sounds. As a result, pitchless and transitional plosives stay in the signal, the letters in words are preserved, and the speech sounds correct after the vocoder.
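The idea behind an Unvoiced control can be sketched as follows. The example assumes a very simple detector: a frame of the modulator counts as unvoiced when its zero-crossing rate is high, which is typical of noisy sounds. Real vocoders may use other detectors. The function names, frame size and threshold are illustrative, and the `amount` argument stands in for the Unvoiced knob.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of neighbouring sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0.0) != (b < 0.0))
    return crossings / (len(frame) - 1)


def add_unvoiced(vocoded, modulator, amount, rng, frame_size=160, threshold=0.3):
    """Add noise to the vocoded signal wherever the modulator looks unvoiced.

    amount plays the role of the Unvoiced control (0.0 switches the noise off).
    """
    n = min(len(vocoded), len(modulator))
    out = list(vocoded[:n])
    for start in range(0, n - frame_size + 1, frame_size):
        frame = modulator[start:start + frame_size]
        if zero_crossing_rate(frame) > threshold:
            # Scale the noise to the loudness of the unvoiced frame.
            level = amount * math.sqrt(sum(x * x for x in frame) / frame_size)
            for i in range(start, start + frame_size):
                out[i] += level * rng.uniform(-1.0, 1.0)
    return out


rng = random.Random(0)
vowel = [math.sin(2.0 * math.pi * 100.0 * i / 8000) for i in range(160)]  # pitched, like a vowel
hiss = [rng.uniform(-1.0, 1.0) for _ in range(160)]                       # noisy, like "S"
print(zero_crossing_rate(vowel), zero_crossing_rate(hiss))
```

The pitched frame crosses zero only a few times, while the noisy frame crosses it on roughly every other sample, so only the noisy frame triggers the added noise.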
