Learning Microsoft Cognitive Services - Third Edition

By: Leif Larsen

Overview of this book

Microsoft Cognitive Services is a set of APIs for integrating artificial intelligence in your applications to solve logical business problems. If you’re new to developing applications with AI, Learning Microsoft Cognitive Services will give you a comprehensive introduction to Microsoft’s AI stack and get you up-to-speed in no time. The book introduces you to 24 APIs, including Emotion, Language, Vision, Speech, Knowledge, and Search. Using Visual Studio, you can develop applications with enhanced capabilities for image processing, speech recognition, text processing, and much more. Moving forward, you will work with datasets that enable your applications to process various data in the form of image, video, or text. By the end of the book, you’ll be able to confidently explore Cognitive Services APIs for building intelligent applications that can be deployed for real-world business uses.

An overview of different APIs


Now that you have seen a basic example of how to detect faces, it is time to learn a bit about what else Cognitive Services can do for you. When using Cognitive Services, you have 21 different APIs to hand. These are, in turn, separated into five top-level domains depending on what they do. These domains are vision, speech, language, knowledge, and search. We will learn more about them in the following sections.

Vision

APIs under the vision flag allow your apps to understand image and video content. They allow you to retrieve information about faces, feelings, and other visual content. You can stabilize videos and recognize celebrities. You can read text in images and generate thumbnails from videos and images.

There are five APIs contained in the vision domain, which we will look at now.

Computer vision

Using the computer vision API, you can retrieve actionable information from images. This means that you can identify content (such as image format, image size, colors, faces, and more). You can detect whether or not an image is adult/racy. This API can recognize text in images and extract it to machine-readable words. It can detect celebrities from a variety of areas. Lastly, it can generate storage-efficient thumbnails with smart-cropping functionality.

We will look into computer vision in Chapter 2, Analyzing Images to Recognize a Face.

Face

We have already seen a very basic example of what the Face API can do. The rest of the API revolves around the detection, identification, organization, and tagging of faces in photos. As well as face detection, you can also see how likely it is that two faces belong to the same person. You can identify faces and also find similar-looking faces. We can also use the API to recognize emotions in images.
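
To make the idea of "how likely it is that two faces belong to the same person" more concrete, here is a minimal sketch of how an application might interpret a verification result. It makes no network call. The field names `isIdentical` and `confidence`, the `VerifyResult` class, and the `0.8` threshold are assumptions chosen for illustration and should be checked against the actual service response.

```python
from dataclasses import dataclass


@dataclass
class VerifyResult:
    """Hypothetical container for a face-verification response."""
    is_identical: bool
    confidence: float  # assumed to lie between 0 and 1


def parse_verify_response(payload: dict) -> VerifyResult:
    # Field names are assumptions; adjust them to the real response shape.
    return VerifyResult(
        is_identical=bool(payload["isIdentical"]),
        confidence=float(payload["confidence"]),
    )


def describe(result: VerifyResult, strict_threshold: float = 0.8) -> str:
    """Turn the raw result into a coarse, human-readable verdict."""
    if result.is_identical and result.confidence >= strict_threshold:
        return "same person (high confidence)"
    if result.is_identical:
        return "probably the same person"
    return "different people"


# Made-up sample values, not real service output.
sample = {"isIdentical": True, "confidence": 0.73}
print(describe(parse_verify_response(sample)))
```

Keeping the threshold in the application rather than relying only on the Boolean flag lets you be stricter in sensitive scenarios, such as unlocking an account.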

We will dive further into the Face API in Chapter 2, Analyzing Images to Recognize a Face.

Video indexer

Using the video indexer API, you can start indexing videos immediately upon upload. This means that you can get video insights without using experts or custom code. The artificial intelligence behind this API also improves content discovery, making your content easier to find.

The video indexer API will be covered in greater detail in Chapter 3, Analyzing Videos.

Content moderator

The content moderator API utilizes machine learning to automatically moderate content. It can detect potentially offensive and unwanted images, videos, and text in over 100 languages. In addition, it allows you to review detected material to improve the service.

The content moderator will be covered in Chapter 2, Analyzing Images to Recognize a Face.

Custom vision service

The custom vision service allows you to upload your own labeled images to a vision service. This means that you can add images that are specific to your domain to allow recognition using the computer vision API.

The custom vision service will be covered in more detail in Chapter 2, Analyzing Images to Recognize a Face.

Speech

Adding one of the Speech APIs allows your application to hear and speak to your users. The APIs can filter noise and identify speakers. Based on the recognized intent, they can drive further actions in your application.

The speech domain contains three APIs that are outlined in the following sections.

Bing Speech

Adding the Bing Speech API to your application allows you to convert speech to text and vice versa. You can convert spoken audio to text either by utilizing a microphone or other sources in real time or by converting audio from files. The API also offers speech intent recognition, which is trained by the Language Understanding Intelligent Service (LUIS) to understand the intent.

Speaker recognition

The speaker recognition API gives your application the ability to know who is talking. By using this API, you can verify that the person that is speaking is who they claim to be. You can also determine who an unknown speaker is based on a group of selected speakers.

Translator speech API

The translator speech API is a cloud-based automatic translation service for spoken audio. Using this API, you can add end-to-end translation across web apps, mobile apps, and desktop applications. Depending on your use cases, it can provide you with partial translations, full translations, and transcripts of the translations.

We will cover all speech-related APIs in Chapter 5, Speak with Your Application.

Language

APIs that are related to the language domain allow your application to process natural language and learn how to recognize what users want. You can add textual and linguistic analysis to your application, as well as natural language understanding.

The following four APIs can be found in the language domain.

Bing Spell Check

The Bing Spell Check API allows you to add advanced spell checking to your application.

This API will be covered in Chapter 6, Understanding Text.

Language Understanding Intelligent Service (LUIS)

LUIS is an API that can help your application understand commands from your users. Using this API, you can create language models that understand intents. By using models from Bing and Cortana, you can make these models recognize common requests and entities (such as places, times, and numbers). You can add conversational intelligence to your applications.

LUIS will be covered in Chapter 4, Let Applications Understand Commands.

Text analytics

The text analytics API will help you extract information from text. You can use it to find the sentiment of a text (whether the text is positive or negative), and you will also be able to detect the language, topic, key phrases, and entities that are used throughout the text.

We will also cover the text analytics API in Chapter 6, Understanding Text.
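
As an illustration of working with a sentiment result, the following sketch maps a numeric score to a label. It assumes the score lies between 0 and 1, with higher values meaning more positive text. The function name `label_sentiment` and the width of the neutral band are hypothetical choices, not part of the API.

```python
def label_sentiment(score: float, neutral_band: float = 0.1) -> str:
    """Map a sentiment score in [0, 1] to a coarse label.

    Scores within `neutral_band` of 0.5 are treated as neutral.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > 0.5 + neutral_band:
        return "positive"
    if score < 0.5 - neutral_band:
        return "negative"
    return "neutral"


# Made-up scores for illustration only.
for text, score in [("Great service!", 0.93), ("It was fine.", 0.52), ("Never again.", 0.08)]:
    print(f"{text!r}: {label_sentiment(score)}")
```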

Translator Text API

By adding the translator text API, you can get textual translations for over 60 languages. It can detect languages automatically, and you can customize the API to your needs. In addition, you can improve translations by creating user groups, utilizing the power of crowdsourcing.

The translator text API will not be covered in this book.

Knowledge

When we talk about knowledge APIs, we are talking about APIs that allow you to tap into rich knowledge. This may be knowledge from the web or from academia, or it may be your own data. Using these APIs, you will be able to explore the different nuances of knowledge.

The following five APIs are contained in the knowledge API domain.

Project Academic Knowledge

Using the Project Academic Knowledge API, you can explore relationships among academic papers, journals, and authors. This API allows you to interpret natural language user query strings, which allows your application to anticipate what the user is typing. It will evaluate what is being typed and return academic knowledge entities.

This API will be covered in more detail in Chapter 8, Query Structured Data in a Natural Way.

Knowledge exploration

The knowledge exploration API will let you add the possibility of using interactive searches for structured data in your projects. It interprets natural language queries and offers autocompletions to minimize user effort. Based on the query expression received, it will retrieve detailed information about matching objects.
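
To show what offering autocompletions means in practice, here is a deliberately simple sketch that uses plain prefix matching over a list of known queries. This is not how the knowledge exploration API works internally; the function `autocomplete` and the sample queries are hypothetical.

```python
from typing import List


def autocomplete(prefix: str, known_queries: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` known queries that start with `prefix`.

    Matching ignores case; results are sorted alphabetically.
    """
    needle = prefix.casefold().strip()
    if not needle:
        return []
    matches = [q for q in known_queries if q.casefold().startswith(needle)]
    return sorted(matches)[:limit]


queries = ["papers by author", "papers about vision", "journals in 2017"]
print(autocomplete("pap", queries))
```

A real service would rank completions by how well they fit the underlying structured data rather than alphabetically.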

Details on this API will be covered in Chapter 8, Query Structured Data in a Natural Way.

Recommendations solution

The recommendations solution API allows you to provide personalized product recommendations for your customers. You can use this API to add a frequently-bought-together functionality to your application. Another feature that you can add is item-to-item recommendations, which allows customers to see what other customers like. This API will also allow you to add recommendations based on the prior activity of the customer.
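
The idea behind a frequently-bought-together feature can be sketched with a naive co-occurrence count over past orders. This is only a conceptual illustration, not the algorithm used by the service; the function name and the sample orders are made up.

```python
from collections import Counter
from typing import Iterable, List


def frequently_bought_together(
    orders: Iterable[Iterable[str]], item: str, top_n: int = 3
) -> List[str]:
    """Return the items that most often appear in the same order as `item`."""
    counts: Counter = Counter()
    for order in orders:
        basket = set(order)  # ignore duplicates within a single order
        if item in basket:
            counts.update(basket - {item})
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


orders = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["laptop", "bag", "mouse"],
    ["phone", "case"],
    ["laptop", "dock"],
]
print(frequently_bought_together(orders, "laptop", top_n=2))
```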

We will go through this API in Chapter 7, Building Recommendation Systems for Businesses.

QnA Maker

The QnA Maker is a service to distill information for frequently asked questions (FAQ). Using existing FAQs, either online or in a document, you can create question and answer pairs. Pairs can be edited, removed, and modified, and you can add several similar questions to match a given pair.
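
The notion of a question and answer pair with several similar questions attached can be modeled as a small data structure. The sketch below uses exact matching after light normalization, which is far simpler than what a real service would do. The class `QnAPair` and the helper functions are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QnAPair:
    """One answer together with every question phrasing that maps to it."""
    answer: str
    questions: List[str] = field(default_factory=list)


def normalize(text: str) -> str:
    # Lowercase, trim surrounding punctuation, and collapse whitespace.
    return " ".join(text.lower().strip(" ?!.").split())


def find_answer(pairs: List[QnAPair], user_question: str) -> Optional[str]:
    """Return the answer whose question list contains `user_question`."""
    wanted = normalize(user_question)
    for pair in pairs:
        if any(normalize(q) == wanted for q in pair.questions):
            return pair.answer
    return None


faq = [
    QnAPair(
        answer="We are open 9 to 5 on weekdays.",
        questions=["What are your opening hours?", "When are you open?"],
    ),
]
print(find_answer(faq, "when are you open"))
```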

We will cover QnA Maker in Chapter 8, Query Structured Data in a Natural Way.

Project Custom Decision Service

Project Custom Decision Service is a service designed to use reinforcement learning to personalize content. It takes the surrounding context into account so that it can provide context-based content.

This book does not cover Project Custom Decision Service.

Search

Search APIs give you the ability to make your applications more intelligent with the power of Bing. Using these APIs, you can use a single call to access data from billions of web pages, images, videos, and news articles.

The search domain contains the following APIs.

Bing Web Search

With Bing Web Search, you can search for details in billions of web documents that are indexed by Bing. All the results can be arranged and ordered according to a layout that you specify, and the results are customized to the location of the end user.

Bing Web Search will be covered in Chapter 9, Adding Specialized Search.

Bing Image Search

Using the Bing Image Search API, you can add an advanced image and metadata search to your application. Results include URLs to images, thumbnails, and metadata. You will also be able to get machine-generated captions, similar images, and more. This API allows you to filter the results based on image type, layout, freshness (how new the image is), and license.

Bing Image Search will be covered in Chapter 9, Adding Specialized Search.
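
The filters described for image search typically end up as query-string parameters. The following sketch only builds a request URL and sends nothing. The base URL is a placeholder, and the parameter names `imageType`, `aspect`, `freshness`, and `license` are assumptions meant to mirror the filters named in the text; verify them against the API reference before use.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoint for illustration; replace it with the real service URL.
ENDPOINT = "https://example.invalid/images/search"


def build_image_search_url(
    query: str,
    image_type: Optional[str] = None,
    aspect: Optional[str] = None,
    freshness: Optional[str] = None,
    license_filter: Optional[str] = None,
) -> str:
    """Build a search URL, including only the filters that were provided."""
    params = {"q": query}
    optional = {
        "imageType": image_type,    # assumed parameter name
        "aspect": aspect,           # assumed name for the layout filter
        "freshness": freshness,     # assumed parameter name
        "license": license_filter,  # assumed parameter name
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{ENDPOINT}?{urlencode(params)}"


print(build_image_search_url("red pandas", image_type="Photo", freshness="Week"))
```

Leaving unset filters out of the query string keeps the request minimal and lets the service apply its defaults.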

Bing Video Search

Bing Video Search will allow you to search for videos and return rich results. The results could contain metadata from the videos, static or motion-based thumbnails, and the video itself. You can add filters to the results based on freshness, video length, resolution, and price.

Bing Video Search will be covered in Chapter 9, Adding Specialized Search.

Bing News Search

If you add Bing News Search to your application, you can search for news articles. Results can include authoritative images, related news and categories, information on the provider, URLs, and more. To be more specific, you can filter news based on topics.

Bing News Search will be covered in Chapter 9, Adding Specialized Search.

Bing Autosuggest

The Bing Autosuggest API is a small but powerful one. It offers search suggestions as your users type, which allows them to search faster and lets you connect powerful search functionality to your apps.

Bing Autosuggest will be covered in Chapter 9, Adding Specialized Search.

Bing Visual Search

Using the Bing Visual Search API, you can identify and classify images. You can also acquire knowledge about images.

Bing Visual Search will be covered in Chapter 9, Adding Specialized Search.

Bing Custom Search

By utilizing the Bing Custom Search API, you can create a powerful, customized search that fits your needs. It is an ad-free commercial tool that allows you to deliver the search results you want.

Bing Custom Search will be covered in Chapter 9, Adding Specialized Search.

Bing Entity Search

Using the Bing Entity Search API, you can enhance your searches. The API will find the most relevant entity based on your search terms. It will find entities such as famous people, places, movies, and more.

We will not cover Bing Entity Search in this book.