Voice User Interface Projects

By: Henry Lee

Overview of this book

From touchscreens and mouse clicks, we are moving to voice- and conversation-based user interfaces. By adopting Voice User Interfaces (VUIs), you can create a more compelling and engaging experience for your users. Voice User Interface Projects teaches you how to develop voice-enabled applications for desktop, mobile, and Internet of Things (IoT) devices. This book explains in detail what VUIs are and why they matter, the basic design principles of VUIs, the fundamentals of conversation, and the different voice-enabled applications available in the market. You will learn how to build your first voice-enabled application by utilizing the natural language processing (NLP) platforms of DialogFlow and Alexa. Once you are comfortable with building voice-enabled applications, you will learn how to dynamically process and respond to questions by using a NodeJS server deployed to the cloud. You will then move on to securing the NodeJS RESTful API for DialogFlow and Alexa webhooks, creating unit tests, and building voice-enabled podcasts for cars. Last but not least, you will discover advanced topics such as handling sessions, creating custom intents, and extending built-in intents in order to build conversational VUIs that help engage users. By the end of the book, you will have gained a thorough knowledge of how to design and develop interactive VUIs.

Technological advancement of VUIs

In 1952, at Bell Labs, the engineers Davis, Biddulph, and Balashek built the Automatic Digit Recognizer (Audrey), a rudimentary voice recognition system. Audrey was limited by the technology of the time but was able to recognize the numbers 0 to 9. The Audrey system, which processed the 10 digits through voice recognition, was 6 feet tall and covered the walls of Bell Labs, containing large numbers of analog circuits with capacitors, amplifiers, and filters. Audrey did the following three things:

  • The Audrey system took the user's voice as input and put the voice into the machine's memory. The voice input was classified and pattern matching was performed against the predefined classes of voices for the numbers 0 to 9. Finally, the identified number was stored in memory.
  • It flashed a light that represented the matching number.
  • It was also able to communicate selected digits over the telephone.
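
The three steps above can be illustrated with a small modern sketch. This is not how Audrey's analog circuits worked internally; it is only a loose digital analogy for "match the input against predefined classes for 0 to 9, then light the matching lamp." The template values, the feature vectors, and the names `TEMPLATES`, `recognize_digit`, and `light_panel` are all invented for illustration.

```python
import math

# Hypothetical reference patterns: one short feature vector per digit.
# The numbers are made up and are NOT data from the real Audrey system.
TEMPLATES = {
    digit: [math.sin(digit + k) for k in range(4)]
    for digit in range(10)
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_digit(features):
    """Return the digit whose stored template is closest to the input."""
    return min(TEMPLATES, key=lambda d: distance(TEMPLATES[d], features))


def light_panel(digit):
    """Mimic a row of ten lamps: '*' for the matched digit, '.' otherwise."""
    return "".join("*" if d == digit else "." for d in range(10))


if __name__ == "__main__":
    # A slightly noisy version of the stored pattern for the digit 7.
    spoken = [value + 0.01 for value in TEMPLATES[7]]
    match = recognize_digit(spoken)
    print(match, light_panel(match))
```

Nearest-template matching is used here only because it is the simplest way to show classification against a fixed set of ten classes.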

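In today's terms, Audrey performed a rudimentary form of the speech recognition that is now handled by NLP systems built on machine learning (ML) and artificial intelligence (AI).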

Although Audrey recognized voice input with an accuracy of 97% to 99%, it was very expensive and large, and its complex electronics were extremely difficult to maintain. Thus, Audrey could not be commercialized. However, since the inception of Audrey, voice technology and research have continued to leap forward.

First-generation VUIs

The big break came in 1984, when SpeechWorks and Nuance introduced interactive voice response (IVR) systems. IVR systems were able to recognize human voices over the telephone and carry out the tasks given to them (Roberto Pieraccini and Lawrence Rabiner 2012, The Voice in the Machine: Building Computers That Understand Speech). You will recognize IVR systems today when you call major companies for support. For example, when you call to make a hotel reservation, you will be familiar with "Press 1 or say reservation, Press 2 or say check reservation, Press 3 or say cancel reservation, Press # or say main menu." In the '90s, I remember working on my first VUIs in an IVR system. To develop the IVR system, I had to work with the Microsoft Speech API (SAPI). With SAPI, I was able to perform speech recognition, where the voice received from the user was translated into text in order to evaluate the user's intent. Then, after the user's intent was evaluated, a text message was created and converted back to voice through text to speech (TTS) to relay the message to the user on the telephone.
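
The hotel menu quoted above can be sketched as a simple lookup from keypad presses or spoken phrases to actions. This minimal sketch assumes that speech recognition has already turned the caller's voice into text and that a TTS engine would speak the returned prompt; neither step is implemented here, and it does not use SAPI. The action names and prompt wording are hypothetical.

```python
# Keys and phrases come from the hotel example; action names are hypothetical.
MENU = {
    "1": "make_reservation",
    "reservation": "make_reservation",
    "2": "check_reservation",
    "check reservation": "check_reservation",
    "3": "cancel_reservation",
    "cancel reservation": "cancel_reservation",
    "#": "main_menu",
    "main menu": "main_menu",
}

# Text that a TTS engine would read back to the caller (wording is invented).
PROMPTS = {
    "make_reservation": "Let's make a new reservation.",
    "check_reservation": "Let's look up your reservation.",
    "cancel_reservation": "Let's cancel your reservation.",
    "main_menu": (
        "Press 1 or say reservation, Press 2 or say check reservation, "
        "Press 3 or say cancel reservation, Press # or say main menu."
    ),
}


def route(user_input):
    """Map a key press or recognized phrase to an action name."""
    # Normalize case and whitespace so "  Cancel   Reservation " still matches.
    key = " ".join(user_input.lower().split())
    # Anything unrecognized falls back to the main menu.
    return MENU.get(key, "main_menu")


def respond(user_input):
    """Return the text reply for the caller's input."""
    return PROMPTS[route(user_input)]


if __name__ == "__main__":
    for heard in ["1", "check reservation", "pizza"]:
        print(repr(heard), "->", route(heard))
```

Falling back to the main menu on unrecognized input is one possible design choice; a real system might instead ask the caller to repeat the request.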

Boom of VUIs

To really appreciate the start of the emerging voice technology, let's first look at the year 2005. In 2005, Web 2.0 contributed to an increase in the volume of data. This increase brought about the creation of Hadoop and big data in order to meet the demand for storing, processing, and understanding data. Big data helps to advance data analytics, ML, and AI in order to identify patterns in data in business contexts. The same techniques used for big data, such as ML and AI, have helped advance NLP in recognizing speech patterns, and have therefore advanced VUIs. The Web 2.0 big data boom kick-started the boom in the use of VUIs on smartphones, in the home, and in automobiles.

History of VUIs on mobile devices

In 2006, Apple introduced the concept of Siri, which allows users to interact with machines using their voice. In 2007, Google followed Apple and introduced voice search. In 2011, Apple finally turned the Siri concept into reality by integrating Siri into iOS on the iPhone. Unfortunately, with Steve Jobs' death that same year, voice innovation at Apple slowed down, allowing others, such as Google and Amazon, to catch up. In 2015, Microsoft introduced Cortana for the Windows 10 operating system and smartphones (refer to the following screenshot). In 2016, Google introduced Google Assistant (refer to the following screenshot) to mobile devices. Later, from Chapter 3, Building a Fortune Cookie Application, to Chapter 5, Deploying the Fortune Cookie App to Google Home, you will learn how to create voice assistant applications for mobile devices. One of the major advantages of writing applications for Google Assistant is that the same applications can also be deployed to Google Home.

The following illustration depicts screenshots of the mobile voice assistants Cortana, Siri, and Google Assistant:

Mobile voice assistants—Cortana, Siri, and Google Assistant

History of VUIs for Google Home

In 2014, Amazon introduced Amazon Echo (refer to the following photo), the first VUI device designed for consumers' homes. In 2016, Google released Google Home (refer to the following photo). In 2017, Amazon and Google continued to compete against each other in the consumer marketplace with the Amazon Echo and Google Home devices. The competition between Amazon and Google over these home devices resembles the competition between Apple's iPhone and Google's Android. Currently, these home devices lack third-party applications that consumers can use and, as such, huge start-up and entrepreneurial opportunities exist. Remember Angry Birds for iPhone and Android? What could be the next big hit in this untapped marketplace? Later, from Chapter 3, Building a Fortune Cookie Application, through Chapter 8, Migrating the Alexa Cooking Application to Google Home, you will learn how to develop applications for Amazon Echo and Google Home devices.

The following photo shows Amazon Echo:

Amazon Echo

The following is a photo of Google Home:

Google Home

History of VUIs in cars

In 2007, Microsoft partnered with Ford and integrated Microsoft Sync Framework, giving drivers hands-free interaction with their car's features. In 2013, Apple introduced CarPlay for cars, but only a limited number of car manufacturers were willing to adopt CarPlay. On the other hand, in 2018, major car manufacturers adopted Google Auto because Google Auto is based on the Android operating system and already has a huge developer ecosystem in the Android marketplace. Later, in Chapter 9, Building a Voice Enabled Podcast for the Car, and Chapter 10, Hosting and Enhancing the Android Auto Podcast, you will learn how to create your own podcast and stream your own content to cars through car dashboards that support Google Auto.

The following photo shows the voice assistant from Apple's CarPlay:

Apple CarPlay

The following screenshot shows Google Auto:

Google Auto