The idea of being able to talk with a computer has fascinated many people for a long time. However, until recently, this has seemed to be the stuff of science fiction. Now things have changed so that people who own a smartphone or tablet can perform many tasks on their device using voice—you can send a text message, update your calendar, set an alarm, and ask the sorts of queries that you would previously have typed into your search box. Often voice input is more convenient, especially on small devices where physical limitations make typing and tapping more difficult.
This book provides a practical guide to the development of voice apps for Android devices, using the Google Speech APIs for text-to-speech (TTS) and automated speech recognition (ASR) as well as other open source software. Although there are many books that cover Android programming in general, there is no single source that deals comprehensively with the development of voice-based applications for Android.
Developing for a voice user interface shares many of the characteristics of developing for more traditional interfaces, but there are also ways in which voice application development has its own specific requirements and it is important that developers coming to this area are aware of common pitfalls and difficulties. This book provides some introductory material to cover those aspects that may not be familiar to professionals from a mainstream computing background. It then goes on to show in detail how to put together complete apps, beginning with simple programs and progressing to more sophisticated applications. By building on the examples in the book and experimenting with the techniques described, you will be able to bring the power of voice to your Android apps, making them smarter and more intuitive, and boosting your users' mobile experience.
Chapter 1, Speech on Android Devices, discusses how speech can be used on Android devices and outlines the technologies involved.
Chapter 2, Text-to-Speech Synthesis, covers the technology of text-to-speech synthesis and how to use the Google TTS engine.
Chapter 3, Speech Recognition, provides an overview of the technology of speech recognition and how to use the Google Speech to Text engine.
Chapter 4, Simple Voice Interactions, shows how to build simple interactions in which the user and app can talk to each other to retrieve some information or perform an action.
Chapter 5, Form-filling Dialogs, illustrates how to create voice-enabled dialogs that are similar to form-filling in a traditional web application.
Chapter 6, Grammars for Dialog, introduces the use of grammars to interpret inputs from the user that go beyond single words and phrases.
Chapter 7, Multilingual and Multimodal Dialogs, looks at how to build apps that use different languages and modalities.
Chapter 8, Dialogs with Virtual Personal Assistants, shows how to build a speech-enabled personal assistant.
Chapter 9, Taking it Further, shows how to develop a more advanced Virtual Personal Assistant.
To run the code examples and develop your own apps, you will need to install the Android SDK and platform tools. A complete bundle that includes the essential Android SDK component and a version of the Eclipse IDE with built-in ADT (Android Developer Tools) along with tutorials is available for download at http://developer.android.com/sdk/.
You will also need an Android device to build and test the examples as Android ASR (speech recognition) does not work on virtual devices (emulators).
This book is intended for all those who are interested in speech application development, including students of speech technology and mobile computing. We assume some background of programming in general, particularly in Java. We also assume some familiarity with Android programming.
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The following lines of code create a TextToSpeech
object that implements the onInit
method of the onInitListener
interface."
A block of code is set as follows:
TextToSpeech tts = new TextToSpeech(this, new OnInitListener(){ public void onInit(int status){ if (status == TextToSpeech.SUCCESS) speak("Hello world", TextToSpeech.QUEUE_ADD, null); } }
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
Interpret field i: Play prompt of field i Listen for ASR result Process ASR result: If the recognition was successful, then save recognized keyword as value for the field i and move to the next field If there was a no match or no input, then interpret field i If there is any other error, then stop interpreting Move to the next field: If the next field has already a value assigned, then move to the next one If the last field in the form is reached,thenendOfDialogue=true
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Please say a word of the album title."
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>
, and mention the book title via the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
There is a web page for the book at http://lsi.ugr.es/zoraida/androidspeechbook, with additional resources, including ideas for exercises and projects, suggestions for further reading, and links to useful web pages.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]>
with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
You can contact us at <[email protected]>
if you are having a problem with any aspect of the book, and we will do our best to address it.