Learning Microsoft Cognitive Services - Second Edition

By: Leif Larsen

Overview of this book

Microsoft has revamped its Project Oxford to launch the all-new Cognitive Services platform, a set of 30 APIs for adding speech, vision, language, and knowledge capabilities to apps. This book introduces you to 24 of the APIs released as part of the Cognitive Services platform and shows you how to leverage their capabilities. More importantly, you'll see how these APIs can be combined to build real-world apps that have cognitive capabilities. The book is split into three sections: computer vision, speech recognition and language processing, and knowledge and search. It starts with the vision APIs, as they are highly visual and not too complex. The next part covers speech and language, which are somewhat connected. The last part is about adding real-world intelligence to apps by connecting them to the Knowledge and Search APIs. By the end of this book, you will understand what Microsoft Cognitive Services can offer and how to use the different APIs.

Chapter 5. Speaking with Your Application

In the previous chapter, we learned to discover and understand a user's intent based on utterances. In this chapter, we will learn how to add audio capabilities to our applications: converting text to speech and speech to text, identifying the person speaking, and using spoken audio to verify a person's identity. Finally, we will touch briefly on how to customize speech recognition to suit your application's usage.

By the end of this chapter, we will have covered the following topics:

  • Converting spoken audio to text and text to spoken audio
  • Recognizing intent from spoken audio, utilizing LUIS
  • Verifying that the speaker is who they claim to be
  • Identifying the speaker
  • Tailoring the recognition API to recognize custom speaking styles and environments