Deep Learning with Microsoft Cognitive Toolkit Quick Start Guide

By: Willem Meints

Overview of this book

The Microsoft Cognitive Toolkit is a popular, recently open-sourced deep learning framework from Microsoft, used to train fast and effective deep learning models. This book is a quick, no-nonsense introduction to the library: it helps you understand the basics of deep learning, teaches you how to train and validate different types of neural networks, such as convolutional neural networks, recurrent neural networks, and autoencoders, and shows what makes this framework unique so that you know when to use it. We then look at two scenarios in which deep learning can be used to enhance human capabilities. The book also demonstrates how to evaluate your models' performance to ensure they train and run smoothly and give you the most accurate results. Finally, you will get a short overview of how Cognitive Toolkit fits into a DevOps environment.

The relationship between AI, machine learning, and deep learning

In order to understand what deep learning is, we have to explore what Artificial Intelligence (AI) is and how it relates to machine learning and deep learning. Conceptually, deep learning is a form of machine learning, whilst machine learning is a form of AI.

In computer science, artificial intelligence is a form of intelligence demonstrated by machines. The term AI was coined in the 1950s by scientists doing research in computer science. AI encompasses a large set of algorithms that show behavior that is more intelligent than the standard software we build for our computers.

Some algorithms demonstrate intelligent behavior but aren't capable of improving themselves. One group of algorithms, called machine learning algorithms, can learn from sample data that you show them and generate models that you then use on similar data to make predictions.

Within the group of machine learning algorithms there's the sub-category of deep learning algorithms. This group of algorithms uses models that are inspired by the structure and function of the biological brain found in humans and animals.

Both machine learning and deep learning learn from sample data that you provide. When we build regular programs, we write business rules using different language constructs, such as if-statements, loops, and functions. The rules are fixed. In machine learning, we feed samples and an expected answer into an algorithm that then learns the rules that connect the samples to the expected answers, as the sketch below illustrates.
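As a minimal, hypothetical sketch (the transactions, labels, and the simplistic midpoint "training" are invented for illustration), compare a hand-written rule with a rule learned from labeled samples:

    import numpy as np

    # Rule-based: the threshold is fixed by the programmer.
    def is_fraud_rule_based(amount):
        return amount > 1000  # hand-picked business rule

    # Machine learning: the threshold is derived from labeled sample data.
    amounts = np.array([20, 50, 5000, 30, 7000, 15])  # sample transactions
    labels = np.array([0, 0, 1, 0, 1, 0])             # expected answers, 1 = fraud

    # "Training": place the threshold halfway between the two classes.
    threshold = (amounts[labels == 0].max() + amounts[labels == 1].min()) / 2

    def is_fraud_learned(amount):
        return amount > threshold  # a rule learned from the samples

    print(is_fraud_learned(3000))  # True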

There are two major components in machine learning: machine learning models and machine learning algorithms.

When you use machine learning to build a program, you first choose a machine learning model. A machine learning model is a mathematical equation containing trainable parameters that transforms input into a predicted answer. This model shapes the rules that the computer will learn. For example: predicting the miles per gallon for a car requires that you model reality in a certain way. Classifying whether a credit card transaction is fraudulent requires a different model.

The representation of the input could be the properties of a car turned into a vector. The output of the model could be the miles per gallon for a car. In the case of credit card fraud, the input could be the properties of the user account and the transaction that was done. The output representation could be a score between 0 and 1 where a value close to 1 means that the transaction should be rejected.
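A minimal sketch of what these representations might look like in code (the feature names and values are invented for illustration):

    import numpy as np

    # Input representation: the properties of a car turned into a vector.
    car = {"weight_kg": 1200.0, "horsepower": 90.0, "cylinders": 4.0}
    x = np.array([car["weight_kg"], car["horsepower"], car["cylinders"]])

    # Output representation for the fraud case: a score between 0 and 1,
    # where a value close to 1 means the transaction should be rejected.
    fraud_score = 0.97
    reject_transaction = fraud_score > 0.5

    print(x, reject_transaction)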

The mathematical transformation in the machine learning model is controlled by a set of parameters that need to be trained for the transformation to produce the correct output representation.

This is where the second part, the machine learning algorithm, comes into play. To find the best values for the parameters in the machine learning model, we need to perform a multi-step process:

  1. Initially, the computer will choose a random value for each of the unknown parameters in your model
  2. It will then use sample data to make an initial prediction
  3. This prediction is fed into a loss function together with the expected output to get feedback regarding how well the model is performing
  4. This feedback is then used by the machine learning algorithm to find better values for the parameters in the model

These steps are repeated many times to find the best possible values for the parameters in the model. If all goes well, you end up with a model that is capable of making accurate predictions for many complicated situations. The sketch below shows this loop in miniature.
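Here is a hand-rolled sketch of those four steps, fitting a one-parameter linear model with gradient descent on a squared-error loss. This is plain NumPy for illustration, not Cognitive Toolkit code, and the horsepower and mileage numbers are invented:

    import numpy as np

    rng = np.random.default_rng(42)

    # Sample data: engine horsepower -> miles per gallon (invented numbers).
    hp = np.array([60.0, 90.0, 130.0, 200.0])
    mpg = np.array([38.0, 30.0, 24.0, 15.0])

    # Scale the input so a single learning rate works for both parameters.
    x = (hp - hp.mean()) / hp.std()

    # Step 1: start with random values for the trainable parameters.
    w, b = rng.normal(), rng.normal()
    learning_rate = 0.1

    for step in range(500):
        # Step 2: use the sample data to make a prediction.
        y_pred = w * x + b
        # Step 3: the loss function compares the prediction to the expected output.
        loss = np.mean((y_pred - mpg) ** 2)
        # Step 4: use that feedback to find better values for the parameters.
        grad_w = np.mean(2 * (y_pred - mpg) * x)
        grad_b = np.mean(2 * (y_pred - mpg))
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

    print(f"w={w:.2f}, b={b:.2f}, final loss={loss:.2f}")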

The fact that we can learn rules from examples is a useful concept. There are many situations where we can't use simple rules to solve a particular problem. For example: credit card fraud cases come in many shapes and sizes. Sometimes a hacker slowly breaks into the system, injecting smaller hacks over time before stealing the money. Other times, hackers simply try to steal a lot of money in one attempt. A rule-based program would become too hard to maintain, because it would need a lot of code to handle all the different fraud cases. Machine learning is an elegant way to solve this problem: a trained model can handle different kinds of credit card fraud without a lot of code, and it is also capable of making a judgment on cases it hasn't seen before, within reasonable boundaries.

Limitations of machine learning

Machine learning models are very powerful. You can use them in many cases where rule-based programs fall short. Machine learning is a good first alternative whenever you find a problem that can't be solved with a regular rule-based program. Machine learning models do, however, come with their limitations.

The mathematical transformation in machine learning models is very basic. For example: when you want to classify whether a credit card transaction should be marked as fraud, you can use a linear model. A logistic regression model is a great fit for this kind of use case; it creates a decision boundary function that separates fraud cases from non-fraud cases. Most of the fraud cases will fall on the correct side of the boundary and be marked as such. But no machine learning model is perfect, and some cases will end up on the wrong side of the boundary and be misclassified.
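As an illustrative sketch (not from the book, which uses Cognitive Toolkit), here is a logistic regression model fit with scikit-learn on invented transaction data; the learned weights define the straight-line decision boundary described above:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Invented features: [transaction amount, distance from home address].
    legit = rng.normal(loc=[50, 5], scale=[20, 3], size=(100, 2))
    fraud = rng.normal(loc=[400, 80], scale=[100, 20], size=(100, 2))
    X = np.vstack([legit, fraud])
    y = np.array([0] * 100 + [1] * 100)  # 1 = fraud

    model = LogisticRegression(max_iter=1000).fit(X, y)

    # The learned weights and intercept define a straight-line boundary;
    # points on one side are scored closer to 1 (fraud).
    print(model.coef_, model.intercept_)
    print(model.predict_proba([[500, 90]])[0, 1])  # fraud probability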

If your data happens to be perfectly linearly separable, all cases will be correctly classified by the model. But when we have to deal with more complex types of data, these basic machine learning models fall short. And there are more reasons why machine learning is limited in what it can do:

  • Many algorithms assume that there's no interaction between features in the input
  • Machine learning models are, in many cases, based on linear algorithms, which don't handle non-linearity very well (see the sketch after this list)
  • Often, you are dealing with a lot of features; classic machine learning algorithms have a harder time dealing with high dimensionality in the input data
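To make the non-linearity limitation concrete, here is a small sketch on invented XOR-style data, where a linear model cannot do better than chance:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # XOR-style data: the label is 1 only when exactly one feature is 1.
    # No straight line can separate the two classes.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 25, dtype=float)
    y = X[:, 0].astype(int) ^ X[:, 1].astype(int)

    model = LogisticRegression().fit(X, y)
    print(model.score(X, y))  # chance-level accuracy: the linear model falls short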