
The History of Conversational AI: From Nine Digits and Sixteen Words to Human-like Virtual Agents

Voice technology can work miracles: today’s digital assistants can be as small as teacups, yet the first voice recognition devices were almost the size of an entire room.

Ever wonder how – and when – it all started? Today we’ll trace the path from the first voice recognition technologies to modern AI that can communicate in a human-like manner.

1939, The Voder: the birth of voice technology

Researchers and historians agree that the Voder was the first real voice technology project. Introduced in 1939 at the World’s Fair in New York City, the system could synthesize human speech by imitating the effect of the human vocal cords. The operator of the machine could select one of two basic sounds using a pedal bar.

Bell Telephone Laboratories developed the Voder and trained twenty specialists to operate it. At the New York fair, the Voder said the words “Good afternoon, radio audience.”

1952, Audrey: speech recognition via numbers

Bell Labs was at the forefront of voice recognition research in the middle of the 20th century. Thirteen years after the Voder, a team at the company designed Audrey (the Automatic Digit Recognizer), a machine capable of understanding spoken digits.

Audrey could recognize a single voice pronouncing the digits zero through nine. The machine recognized digits spoken by its creator, H. K. Davis, with 90% accuracy and achieved 70-80% accuracy for a few other speakers. Accuracy dropped significantly for voices Audrey was not familiar with.

The machine was quite big: it occupied a six-foot-high relay rack, consumed substantial power, and used dozens of meters of cable.

According to Larry O’Gorman of Nokia Bell Labs, the main goal at Bell Labs was the transition to machine-only telephony. The company needed to reduce bandwidth, and recognizing speech on the line would reduce the data traveling over the wires. Even when telephone switches became digital in the 1970s, call routing accelerated but still depended on an operator identifying a person’s request to dial a number. So the company needed a machine that could recognize the digits zero through nine and the words ‘yes’ and ‘no.’

Bell Labs was not the only organization conducting voice recognition research: scientists and engineers from the Radio Research Lab in Tokyo built a vowel recognizer.

1962, Shoebox: 16 spoken words

Audrey was a breakthrough; however, it could only deal with numbers and specific voices. Ten years later, at the 1962 Seattle World’s Fair, IBM demonstrated the Shoebox machine, which could understand up to 16 spoken words in English. The machine accepted input via a microphone and analyzed the electrical impulses into which the sound was converted.

1971, Automatic Call Identification System

Another significant shift in voice technology happened in 1971, when IBM rolled out the Automatic Call Identification System, a machine that allowed people to speak to a computer and receive spoken answers.

In the same year, the US Department of Defense’s research agency DARPA launched its Speech Understanding Research program. The agency wanted a voice recognition technology that would recognize at least 1,000 words. Participants in the program included IBM, Carnegie Mellon University (CMU), and Stanford Research Institute.

1976, Harpy: 1,011 words

The program delivered results in 1976, when researchers at Carnegie Mellon University introduced Harpy, a system able to understand 1,011 words (roughly the vocabulary of a three-year-old child). Unlike the previous machines, Harpy could recognize entire sentences: it searched a network of possible word sequences for the one that best matched the spoken input.
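
To give a feel for the kind of network search Harpy pioneered, here is a minimal beam-search sketch over a word-transition graph. The graph, the words, and the probabilities below are invented for illustration; the real system searched far larger phonetic networks. At each step the search keeps only the few most probable partial sentences, which is what keeps it tractable.

```python
import heapq
import math

# A toy word-transition graph: each word maps to possible successors
# with transition probabilities. All values here are invented.
GRAPH = {
    "<s>": {"please": 0.6, "show": 0.4},
    "please": {"show": 1.0},
    "show": {"status": 0.7, "time": 0.3},
    "status": {},
    "time": {},
}

def beam_search(start: str, steps: int, beam_width: int = 2):
    """Keep only the `beam_width` most probable partial paths per step."""
    beam = [(0.0, [start])]                      # (log-probability, path)
    for _ in range(steps):
        candidates = []
        for score, path in beam:
            successors = GRAPH[path[-1]]
            if not successors:                   # path ended: keep it as-is
                candidates.append((score, path))
            for word, prob in successors.items():
                candidates.append((score + math.log(prob), path + [word]))
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beam

for score, path in beam_search("<s>", steps=3):
    print(" ".join(path[1:]), f"(p={math.exp(score):.2f})")
# -> please show status (p=0.42)
#    show status (p=0.28)
```

Pruning unlikely paths early is what made a 1,011-word vocabulary searchable on 1970s hardware.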

The 1980s: the first real-world applications

In the 1980s, there were more new releases in the voice recognition sphere. First, IBM presented Tangora, a voice-activated typewriter capable of handling a 20,000-word vocabulary. The tool relied heavily on a hidden Markov model to predict the most likely phonemes to follow a given phoneme.
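
As a rough illustration of that idea, here is a minimal Markov-chain sketch: given transition probabilities estimated from recorded speech, the most likely next phoneme is the one with the highest conditional probability. The phoneme symbols and probabilities below are invented, and a full hidden Markov model additionally models hidden states and acoustic observations rather than just symbol-to-symbol transitions.

```python
# Invented transition probabilities between phonemes (ARPAbet-style symbols).
TRANSITIONS = {
    "T":  {"AH": 0.50, "IY": 0.30, "R": 0.20},
    "AH": {"N": 0.60, "T": 0.25, "L": 0.15},
    "N":  {"T": 0.40, "D": 0.35, "AH": 0.25},
}

def most_likely_next(phoneme: str) -> str:
    """Return the phoneme with the highest probability of following `phoneme`."""
    successors = TRANSITIONS[phoneme]
    return max(successors, key=successors.get)

print(most_likely_next("T"))   # -> AH
```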

By that time, voice recognition tech had reached a level acceptable for its first real-world applications. The debut came in toys: Teddy Ruxpin, Pamela the Living Doll, Talking Mickey Mouse, and other products brought voice recognition into people’s homes.

The 1990s: voice recognition comes to PCs and Macs

IBM was not the only company actively researching voice recognition. Dragon Systems, one of its biggest competitors, developed its own approach. In 1990 the company released the Dragon Dictate app for PC: for the enormous price of $9,000, the software could understand voice and turn it into text. In 1997 Dragon Systems released Dragon NaturallySpeaking, which recognized up to 100 words per minute. Professionals across the globe still use this application; doctors, for example, use it to document medical records.

Apple, the manufacturer of Mac computers, released its own speech recognition product, Speakable Items, in 1993. Three years later, IBM launched MedSpeak, the first commercial product capable of recognizing continuous speech.

The 2000s: Productivity plateau

By 2001, voice recognition technology had reached about 80% accuracy, and it became hard to sustain the earlier pace of progress. The 2000s thus became a period of thorough research work with few notable public releases. Among the most significant changes was the integration of speech recognition into Microsoft Office products.

The 2010s: Smart voice-powered assistants

The breakthrough came in the early 2010s, when Google released its Voice Search app. Millions of people worldwide got the opportunity to use voice recognition on their phones. At the same time, it allowed Google to analyze billions of voice searches to improve recognition quality and speech pattern prediction.

Apple jumped on the conversational voice technology bandwagon in 2011 with its smart assistant Siri, and Microsoft later rolled out its Cortana AI.

The second half of the decade saw an explosion of voice assistants capable of recognizing human speech: Amazon Alexa, Google Home, and other products started helping millions of people with their everyday tasks.

The 2020s: Conversational AI

Nearly 80 years of constant development in voice recognition and conversational AI has given us outstanding results. Today, technologies from Google, IBM, and Microsoft achieve word error rates ranging from 4.9 to 6.9 percent.
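
For readers unfamiliar with the metric: word error rate is the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the recognizer’s output, divided by the number of words in the reference. Here is a minimal sketch, with invented example sentences:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1,   # insertion
                           substitution)
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four: WER = 0.25
print(wer("good afternoon radio audience", "good afternoon radios audience"))
```

A 4.9 percent word error rate means roughly one word in twenty is transcribed incorrectly.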

Companies like Neuro.net develop virtual agents for call centers that are nearly indistinguishable from a human: only about 1% of customers realize they are talking to a bot, while the other 99% believe they are having a conversation with a human agent.

This example, together with consumer virtual assistants like Alexa and Google Home, shows that people are growing more comfortable talking to machines, and there are no signs that this trend will slow down.

Ready to get started?

Discover how your business can benefit from virtual agents designed to create a better CX, boost operational efficiency, and achieve greater results.