Learning Microsoft Cognitive Services - Second Edition

By: Leif Larsen

Overview of this book

Microsoft has revamped its Project Oxford to launch the all-new Cognitive Services platform, a set of 30 APIs that add speech, vision, language, and knowledge capabilities to apps. This book introduces you to 24 of the APIs released as part of the Cognitive Services platform and shows you how to leverage their capabilities. More importantly, you'll see how these APIs can be combined to build real-world apps with cognitive capabilities. The book is split into three sections: computer vision, speech recognition and language processing, and knowledge and search. You will start with the vision APIs, as they are very visual and not too complex. The next part revolves around speech and language, which are somewhat connected. The last part is about adding real-world intelligence to apps by connecting them to the Knowledge and Search APIs. By the end of this book, you will understand what Microsoft Cognitive Services can offer and how to use the different APIs.

Customizing speech recognition


At the time of writing, the Custom Recognition Intelligent Service (CRIS) is still in private beta. As such, we will not spend a lot of time on it, other than going through some key concepts.

When using speech-recognition systems, there are several components working together. Two of the more important components are the acoustic model and the language model. The acoustic model labels short fragments of audio as sound units. The language model helps the system decide between candidate words, based on the likelihood of a given word appearing in a certain sequence.
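
To make the role of the language model more concrete, the following is a minimal sketch of a toy bigram model that chooses between two candidate transcriptions that sound alike. It is not how CRIS or Microsoft's models are implemented. The corpus, the candidate phrases, and the function names (`train_bigrams`, `log_score`, `pick_best`) are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical domain text; a real language model is trained on far more data.
CORPUS = [
    "stop the conveyor belt",
    "start the conveyor belt",
    "the belt is stuck",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_score(candidate, unigrams, bigrams):
    """Log-probability of a word sequence using add-one smoothing."""
    vocab_size = len(unigrams)
    words = ["<s>"] + candidate.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Unseen pairs get a small, non-zero probability instead of zero.
        probability = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(probability)
    return total


def pick_best(candidates, unigrams, bigrams):
    """Return the candidate the language model considers most likely."""
    return max(candidates, key=lambda c: log_score(c, unigrams, bigrams))


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)

# Two transcriptions that an acoustic model might find hard to tell apart.
CANDIDATES = ["start the conveyor belt", "start the conveyor bell"]
BEST = pick_best(CANDIDATES, UNIGRAMS, BIGRAMS)
print(BEST)
```

Because "conveyor belt" appears in the sample text and "conveyor bell" does not, the model prefers the first candidate. Customizing a language model follows the same idea at a much larger scale: supplying text from your own domain shifts which word sequences the recognizer treats as likely.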

Although Microsoft has done a great job creating comprehensive acoustic and language models, there may still be times when you need to customize these models.

Imagine you have an application that is supposed to be used in a factory environment. Using speech recognition there will require acoustic training for that environment, so that the recognizer can separate speech from the usual factory noises.

Another example is if your application is used by a specific...