
Learning Microsoft Cognitive Services - Second Edition

By: Leif Larsen

Overview of this book

Microsoft has revamped its Project Oxford to launch the all-new Cognitive Services platform: a set of 30 APIs that add speech, vision, language, and knowledge capabilities to apps. This book will introduce you to 24 of the APIs released as part of the Cognitive Services platform and show you how to leverage their capabilities. More importantly, you'll see how the power of these APIs can be combined to build real-world apps that have cognitive capabilities. The book is split into three sections: computer vision, speech recognition and language processing, and knowledge and search. You will be taken through the vision APIs first, as they are very visual and not too complex. The next part revolves around speech and language, which are somewhat connected. The last part is about adding real-world intelligence to apps by connecting them to the Knowledge and Search APIs. By the end of this book, you will understand what Microsoft Cognitive Services can offer and how to use the different APIs.

An overview of what we are dealing with


Now that you have seen a basic example of how to detect faces, it is time to learn a bit about what else Cognitive Services can do for you. When using Cognitive Services, you have 21 different APIs at hand. These are separated into five top-level domains according to what they do: vision, speech, language, knowledge, and search. Let's learn more about them in the following sections.

Vision

APIs under the Vision flag allow your apps to understand image and video content. They allow you to retrieve information about faces, feelings, and other visual content. You can stabilize videos and recognize celebrities. You can read text in images and generate thumbnails from videos and images.

The Vision area contains the following seven APIs, which we will look at now.

Computer Vision

Using the Computer Vision API, you can retrieve actionable information from images. This means that you can identify content (such as image format, image size, colors, faces, and more). You can detect whether or not an image contains adult or racy content. This API can recognize text in images and extract it as machine-readable text. It can recognize celebrities from a variety of fields. Lastly, it can generate storage-efficient thumbnails with smart cropping functionality.

We will look into Computer Vision in Chapter 2, Analyzing Images to Recognize a Face.
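To make the request model a little more concrete, here is a minimal Python sketch that only builds, and does not send, an image analysis request. It assumes the v1.0-era REST endpoint shape (https://{region}.api.cognitive.microsoft.com/vision/v1.0/analyze), the visualFeatures query parameter, and the Ocp-Apim-Subscription-Key header. The region, key, and image URL are placeholders, and newer API versions use different paths, so check the current reference before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Placeholders (assumptions): substitute your own region and subscription key.
REGION = "westus"
SUBSCRIPTION_KEY = "<your-computer-vision-key>"


def build_analyze_request(image_url, features=("Categories", "Description", "Color", "Adult")):
    """Build (but do not send) a POST request to the assumed v1.0 analyze endpoint."""
    # visualFeatures selects which kinds of information the service should return.
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    endpoint = f"https://{REGION}.api.cognitive.microsoft.com/vision/v1.0/analyze?{query}"
    # The image is passed by URL in a small JSON body.
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        },
        method="POST",
    )


request = build_analyze_request("https://example.com/photo.jpg")
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it; the service answers with a JSON document describing the requested features.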

Emotion

The Emotion API allows you to recognize emotions in both images and videos, which allows for more personalized experiences in applications. The emotions it detects are recognized across cultures: anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise.

We will cover the Emotion API across two chapters: Chapter 2, Analyzing Images to Recognize a Face, for image-based emotions, and Chapter 3, Analyzing Videos, for video-based emotions.
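Because the API returns a score for each of the eight emotions rather than a single answer, a typical first step is to pick the highest-scoring emotion per face. The following Python sketch does that on a hypothetical response. The JSON shape (a faceRectangle plus a scores object per face) reflects our understanding of the image-based recognize output, and the numbers are made up.

```python
import json

# Hypothetical response for one face, shaped like the Emotion API's image output.
# The scores are invented for illustration.
SAMPLE_RESPONSE = """
[{"faceRectangle": {"left": 68, "top": 97, "width": 64, "height": 97},
  "scores": {"anger": 0.0001, "contempt": 0.0002, "disgust": 0.0001, "fear": 0.0000,
             "happiness": 0.9731, "neutral": 0.0249, "sadness": 0.0010, "surprise": 0.0006}}]
"""


def dominant_emotions(response_text):
    """Return (emotion, score) for the highest-scoring emotion of each detected face."""
    faces = json.loads(response_text)
    result = []
    for face in faces:
        scores = face["scores"]
        # max() over the keys, ranked by their score values.
        emotion = max(scores, key=scores.get)
        result.append((emotion, scores[emotion]))
    return result


print(dominant_emotions(SAMPLE_RESPONSE))  # [('happiness', 0.9731)]
```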

Face

We have already seen a very basic example of what the Face API can do. The rest of the API revolves around detecting, identifying, organizing, and tagging faces in photos. Apart from face detection, you can check how likely it is that two faces belong to the same person. You can identify faces and also find similar-looking faces.

We will dive further into the Face API in Chapter 2, Analyzing Images to Recognize a Face.
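Checking whether two faces belong to the same person is a two-step flow. First, detect faces to obtain a temporary faceId for each one. Then pass two faceIds to a verify call. The Python sketch below only builds the two requests. It assumes the v1.0 paths detect?returnFaceId=true and verify, the faceId1/faceId2 body fields, and the Ocp-Apim-Subscription-Key header. The region, key, image URL, and faceId values are placeholders.

```python
import json
import urllib.request

# Placeholders (assumptions): substitute your own region and key.
REGION = "westus"
SUBSCRIPTION_KEY = "<your-face-api-key>"
BASE_URL = f"https://{REGION}.api.cognitive.microsoft.com/face/v1.0"


def build_request(path, payload):
    """Build (but do not send) a JSON POST request against the assumed Face API base URL."""
    return urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
        method="POST",
    )


# Step 1: detect faces in an image; each detected face should come back with a faceId.
detect = build_request("detect?returnFaceId=true", {"url": "https://example.com/a.jpg"})

# Step 2: ask how likely it is that two detected faces belong to the same person.
# The faceId strings here are placeholders for values returned by step 1.
verify = build_request("verify", {"faceId1": "<faceId-from-image-1>",
                                  "faceId2": "<faceId-from-image-2>"})
print(verify.full_url)
```

As we understand it, the verify response carries an isIdentical flag and a confidence value, so your application can apply its own threshold if it needs a stricter match.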

Video

The Video API is about analyzing, editing, and processing videos in your app. If you have a shaky video, the API allows you to stabilize it. You can detect and track faces in videos. If a video has a stationary background, you can detect motion. The API also lets you generate thumbnail summaries for videos, which allow users to see previews or snapshots quickly.

Video will be covered in Chapter 3, Analyzing Videos.
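The Video API performs motion detection for you, and this chapter does not describe how it works internally. To illustrate why a stationary background matters, the following sketch uses simple frame differencing, a generic technique chosen for explanation only. When the background does not move, pixels that change noticeably between two frames suggest motion. Frames are plain lists of grayscale values, and both thresholds are arbitrary assumptions.

```python
def changed_fraction(prev_frame, next_frame, threshold=25):
    """Fraction of pixels whose grayscale value changed by more than `threshold`."""
    changed = total = 0
    for prev_row, next_row in zip(prev_frame, next_frame):
        for a, b in zip(prev_row, next_row):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total if total else 0.0


def has_motion(prev_frame, next_frame, min_fraction=0.01, threshold=25):
    """Report motion when enough pixels changed (an illustrative rule, not the API's)."""
    return changed_fraction(prev_frame, next_frame, threshold) >= min_fraction


# An 8x8 static background, then the same frame with one bright pixel added.
background = [[10] * 8 for _ in range(8)]
moved = [row[:] for row in background]
moved[3][4] = 200
print(changed_fraction(background, moved))  # 0.015625
print(has_motion(background, moved))        # True
```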

Video Indexer

Using the Video Indexer API, you can start indexing videos immediately upon upload. This means you can get video insights without relying on experts or custom code. The artificial intelligence behind this API can also improve content discovery, making your content easier to find.

Video Indexer will be covered in Chapter 3, Analyzing Videos.

Content Moderator

The Content Moderator API uses machine learning to automatically moderate content. It can detect potentially offensive and unwanted images, videos, and text in over 100 languages. In addition, it allows you to review detected material to improve the service.

Content Moderator will be covered in Chapter 2, Analyzing Images to Recognize a Face.

Custom Vision Service

Custom Vision Service allows you to upload your own labeled images to a vision service. This means that you can add images that are specific to your domain to allow recognition using the Computer Vision API.

Custom Vision Service is not covered in this book.

Speech

Adding one of the Speech APIs allows your application to hear and speak to your users. The APIs can filter noise and identify speakers. Based on the recognized intent, they can drive further actions in your application.

Speech contains the following four APIs.

Bing Speech

Adding the Bing Speech API to your application allows you to convert speech to text and vice versa. You can convert spoken audio to text either in real time, from a microphone or another source, or by converting audio from files. The API also offers speech intent recognition, which uses models trained with the Language Understanding Intelligent Service (LUIS) to understand the user's intent.

Speaker Recognition

The Speaker Recognition API gives your application the ability to know who is talking. By using this API, you can verify that the person speaking is who they claim to be. You can also identify an unknown speaker from a group of selected speakers.

Custom Recognition

To improve speech recognition, you can use the Custom Recognition API. This allows you to fine-tune speech recognition operations for anyone, anywhere. By using this API, the speech recognition model can be tailored to the vocabulary and speaking style of the user. In addition, the model can be customized to match the expected environment of the application.

Translator Speech API

The Translator Speech API is a cloud-based automatic translation service for spoken audio. Using this API, you can add end-to-end translation to web apps, mobile apps, and desktop applications. Depending on your use case, it can provide partial translations, full translations, and transcripts of the translations.

We will cover all speech-related APIs in Chapter 5, Speak with Your Application.

Language

APIs related to language allow your application to process natural language and learn how to recognize what users want. You can add textual and linguistic analysis to your application, as well as natural language understanding.

The following six APIs can be found in the Language area.

Bing Spell Check

The Bing Spell Check API allows you to add advanced spell checking to your application.

This API will be covered in Chapter 6, Understanding Text.

Language Understanding Intelligent Service (LUIS)

LUIS is an API that can help your application understand commands from your users. Using this API, you can create language models that understand intents. By using models from Bing and Cortana, your models can recognize common requests and entities (such as places, times, and numbers). This lets you add conversational intelligence to your applications.

LUIS will be covered in Chapter 4, Let Applications Understand Commands.
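Once LUIS has interpreted a command, your application typically routes on the top-scoring intent and reads out the entities it found. The Python sketch below does that on a hypothetical response. The JSON follows our understanding of the LUIS v2 endpoint response (query, topScoringIntent, entities). The BookFlight intent and Destination entity come from an imagined app, and the min_score cut-off is an arbitrary choice.

```python
import json

# Hypothetical LUIS-style response; intent and entity names are invented for illustration.
SAMPLE_LUIS_RESPONSE = """
{
  "query": "book 2 flights to oslo",
  "topScoringIntent": {"intent": "BookFlight", "score": 0.96},
  "entities": [
    {"entity": "2", "type": "builtin.number", "startIndex": 5, "endIndex": 5},
    {"entity": "oslo", "type": "Destination", "startIndex": 18, "endIndex": 21, "score": 0.91}
  ]
}
"""


def route_command(response_text, min_score=0.5):
    """Return the top intent (or 'None' if it is too uncertain) and entities keyed by type."""
    data = json.loads(response_text)
    top = data["topScoringIntent"]
    # Fall back to the conventional "None" intent when confidence is low.
    intent = top["intent"] if top["score"] >= min_score else "None"
    entities = {e["type"]: e["entity"] for e in data.get("entities", [])}
    return intent, entities


print(route_command(SAMPLE_LUIS_RESPONSE))
# ('BookFlight', {'builtin.number': '2', 'Destination': 'oslo'})
```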

Linguistic Analysis

The Linguistic Analysis API lets you parse complex text to explore its structure. By using this API, you can find nouns, verbs, and more in text, which allows your application to understand who is doing what to whom.

We will see more of Linguistic Analysis in Chapter 6, Understanding Text.

Text Analysis

The Text Analysis API helps you extract information from text. You can find the sentiment of a text (whether it is positive or negative), and you can detect the language, topics, and key phrases used throughout the text.

We will also cover Text Analysis in Chapter 6, Understanding Text.
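For sentiment, the service works on a batch of documents rather than a single string. The sketch below builds such a batch and turns scores into coarse labels. It assumes the v2.0 sentiment endpoint (.../text/analytics/v2.0/sentiment), which, as we understand it, takes a documents array of id/language/text objects and returns a score between 0 and 1 per document, with values closer to 1 being more positive. The response and the label thresholds are hypothetical.

```python
import json


def build_sentiment_body(texts, language="en"):
    """Wrap raw strings in the assumed documents/id/language/text request shape."""
    return {"documents": [{"id": str(i), "language": language, "text": t}
                          for i, t in enumerate(texts, start=1)]}


def label_sentiment(response, positive_above=0.6, negative_below=0.4):
    """Map each document's 0..1 score to a coarse label; the thresholds are arbitrary."""
    labels = {}
    for doc in response["documents"]:
        score = doc["score"]
        if score >= positive_above:
            labels[doc["id"]] = "positive"
        elif score <= negative_below:
            labels[doc["id"]] = "negative"
        else:
            labels[doc["id"]] = "neutral"
    return labels


body = build_sentiment_body(["I love this app!", "The update broke everything."])
print(json.dumps(body))

# Hypothetical response with made-up scores, shaped like the assumed v2.0 output.
fake_response = {"documents": [{"id": "1", "score": 0.97}, {"id": "2", "score": 0.08}],
                 "errors": []}
print(label_sentiment(fake_response))  # {'1': 'positive', '2': 'negative'}
```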

Web Language Model

By using the Web Language Model (WebLM) API, you can leverage the power of language models trained on web-scale data. You can use this API to predict which words or word sequences are most likely to follow a given word or sequence.

The Web Language Model API will be covered in Chapter 6, Understanding Text.
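WebLM's models are trained on web-scale data. The toy bigram counter below is trained on four made-up sentences and serves only as a stand-in to show what "predict which word follows" means in practice. It is not the API's method. It counts how often each word follows another and turns those counts into relative probabilities.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count how often each word follows each other word (a toy stand-in for WebLM)."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def predict_next(follows, word, k=2):
    """Return up to k most likely next words with their relative probabilities."""
    counts = follows.get(word.lower())  # .get avoids creating empty entries
    if not counts:
        return []
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(k)]


corpus = ["this is a test", "this is a book", "this is the end", "that is a test"]
model = train_bigrams(corpus)
print(predict_next(model, "is"))  # [('a', 0.75), ('the', 0.25)]
```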

Translator Text API

By adding the Translator Text API, you can get textual translations for over 60 languages. It can detect languages automatically, and you can customize the API to your needs. In addition, you can improve translations by creating user groups, utilizing the power of crowd-sourcing.

Translator Text API will not be covered in this book.

Knowledge

When talking about Knowledge APIs, we are talking about APIs that allow you to tap into rich knowledge. This knowledge may come from the web, from academia, or from your own data. Using these APIs, you will be able to explore different nuances of knowledge.

The following six APIs are contained in the Knowledge API area.

Academic

Using the Academic API, you can explore relationships among academic papers, journals, and authors. This API can interpret natural language query strings from users, which allows your application to anticipate what the user is typing. It then evaluates the resulting query expression and returns academic knowledge entities.

This API will be covered in more detail in Chapter 8, Query Structured Data in a Natural Way.

Entity Linking

Entity Linking is the API you would use to extend knowledge of people, places, and events based on the context. As you may know, a single word may be used differently based on the context. Using this API allows you to recognize and identify each separate entity within a paragraph, based on the context.

We will go through Entity Linking API in Chapter 7, Extending Knowledge Based on Context.

Knowledge Exploration

The Knowledge Exploration API lets you add interactive search over structured data to your projects. It interprets natural language queries and offers auto-completions to minimize user effort. Based on the query expression it receives, it retrieves detailed information about matching objects.

Details on this API will be covered in Chapter 8, Query Structured Data in a Natural Way.

Recommendations

The Recommendations API allows you to provide personalized product recommendations for your customers. You can use this API to add "frequently bought together" functionality to your application. Another feature you can add is item-to-item recommendations, which let customers see what other customers like. This API also allows you to add recommendations based on the prior activity of the customer.

We will go through this API in Chapter 7, Extending Knowledge Based on Context.
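The Recommendations API builds its models for you, and its algorithm is not described here. To illustrate the idea behind "frequently bought together", the sketch below uses plain co-occurrence counting, a simple generic technique: two items that often appear in the same order are good candidates to recommend together. The order data is hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(orders):
    """Count, for each ordered pair of distinct items, how many orders contain both."""
    pairs = Counter()
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs


def bought_together(pairs, item, k=2):
    """Return the k items most often purchased in the same order as `item`."""
    scores = Counter({other: n for (first, other), n in pairs.items() if first == item})
    return [other for other, _ in scores.most_common(k)]


# Hypothetical order history.
orders = [
    ["camera", "sd-card", "tripod"],
    ["camera", "sd-card"],
    ["camera", "sd-card", "bag"],
    ["camera", "tripod"],
]
pairs = co_occurrence(orders)
print(bought_together(pairs, "camera"))  # ['sd-card', 'tripod']
```

A production recommender would also normalize for item popularity, since very common items co-occur with almost everything.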

QnA Maker

QnA Maker is a service that distills information from Frequently Asked Questions (FAQs). Using existing FAQs, either online or in documents, you can create question-and-answer pairs. Pairs can be edited and removed, and you can add several similar questions to match a given pair.

We will cover QnA Maker in Chapter 8, Query Structured Data in a Natural Way.

Custom Decision Service

Custom Decision Service is a service that uses reinforcement learning to personalize content. The service takes the surrounding context into account and can provide context-based content.

This book does not cover Custom Decision Service.

Search

Search APIs give you the ability to make your applications more intelligent with the power of Bing. Using these APIs, you can use a single call to access data from billions of web pages, images, videos, and news articles.

The following six APIs are in the Search domain.

Bing Web Search

With Bing Web Search, you can search for details in billions of web documents indexed by Bing. All the results can be arranged and ordered according to a layout you specify, and the results are customized to the location of the end user.
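A Bing Web Search call is a single GET request. The Python sketch below builds one and extracts page titles from a response. It assumes the v7 endpoint (https://api.cognitive.microsoft.com/bing/v7.0/search), the q, mkt, and count query parameters, and a webPages.value array of results with name and url fields. The mkt market code is what tailors results to a location. The key is a placeholder, and the sample response in the test is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumptions: v7 endpoint and a placeholder subscription key.
SUBSCRIPTION_KEY = "<your-bing-search-key>"
ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/search"


def build_search_request(query, market="en-US", count=10):
    """Build (but do not send) a GET request to Bing Web Search."""
    params = urllib.parse.urlencode({"q": query, "mkt": market, "count": count})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},
    )


def result_titles(response_json):
    """Extract (name, url) pairs from the webPages section of a parsed response."""
    pages = response_json.get("webPages", {}).get("value", [])
    return [(page["name"], page["url"]) for page in pages]


req = build_search_request("microsoft cognitive services")
print(req.full_url)
```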

Bing Image Search

Using the Bing Image Search API, you can add advanced image and metadata search to your application. Results include URLs to images, thumbnails, and metadata. You will also be able to get machine-generated captions, similar images, and more. This API allows you to filter the results based on image type, layout, freshness (how new the image is), and license.

Bing Video Search

Bing Video Search allows you to search for videos and return rich results. The results contain metadata from the videos, static or motion-based thumbnails, and the video itself. You can filter the results based on freshness, video length, resolution, and price.

Bing News Search

If you add Bing News Search to your application, you can search for news articles. Results can include authoritative images, related news and categories, information on the provider, URLs, and more. To narrow the results, you can filter news by topic.

Bing Autosuggest

The Bing Autosuggest API is a small but powerful one. It lets your users search faster with search suggestions, allowing you to connect a powerful search to your apps.

All of the preceding Search APIs will be covered in Chapter 9, Adding Specialized Search.

Bing Entity Search

Using the Bing Entity Search API, you can enhance your searches. The API will find the most relevant entity based on your search terms. It will find entities such as famous people, places, movies, and more.

We will not cover Bing Entity Search in this book.