Hands-On Neural Network Programming with C#

By: Matt Cole

Overview of this book

Neural networks have made a surprise comeback in the last few years and have brought tremendous innovation to the world of artificial intelligence. The goal of this book is to provide C# programmers with practical guidance in solving complex computational challenges using neural networks and C# libraries such as CNTK and TensorFlowSharp. This book will take you on a step-by-step practical journey, covering everything from the mathematical and theoretical aspects of neural networks to building your own deep neural networks into your applications with the C# and .NET frameworks. This book begins by giving you a quick refresher on neural networks. You will learn how to build a neural network from scratch using packages such as Encog, AForge, and Accord. You will learn about various concepts and techniques, such as deep networks, perceptrons, optimization algorithms, convolutional networks, and autoencoders. You will learn ways to add intelligent features to your .NET apps, such as facial and motion detection, object detection and labeling, language understanding, knowledge, and intelligent search. Throughout this book, you will be working on interesting demonstrations that will make it easier to implement complex neural networks in your enterprise applications.

Understanding back propagation

Back propagation, short for the backward propagation of errors, is an algorithm for supervised learning of neural networks using gradient descent. It calculates the gradient of the error function with respect to the network's weights. It generalizes the delta rule for perceptrons to multi-layer feed-forward neural networks.
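
As a rough sketch of what this means in symbols, the block below writes out one gradient descent step and the delta rule for a single linear unit with squared error. The notation is assumed here for illustration and does not come from the text: $E$ is the error function, $w_i$ a weight, $\eta$ a learning rate, $x_i$ an input, $y$ the unit's output, and $t$ the target.

```latex
% Gradient descent step on one weight (assumed notation: \eta is the learning rate)
w_i \leftarrow w_i - \eta \, \frac{\partial E}{\partial w_i}

% Delta rule: a single linear unit y = \sum_i w_i x_i with squared error
E = \tfrac{1}{2}(t - y)^2
\quad\Rightarrow\quad
\frac{\partial E}{\partial w_i} = -(t - y)\, x_i
\quad\Rightarrow\quad
\Delta w_i = -\eta \, \frac{\partial E}{\partial w_i} = \eta \,(t - y)\, x_i
```

Back propagation applies the same chain-rule reasoning layer by layer, so that weights in hidden layers, which have no target of their own, also receive a gradient.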

Unlike forward propagation, back-prop calculates the gradients by moving backwards through the network: the gradient of the final layer of weights is calculated first, and the gradient of the first layer is calculated last. This order lets the partial computations for one layer be reused when computing the gradient for the layer before it, which is what makes the algorithm efficient. With the recent popularity of deep learning for image and speech recognition, back-prop has once again taken the spotlight, and today's implementations use GPUs to further improve performance.

Lastly, the computations for back-prop depend on the activations and outputs from the forward phase (the non-error terms for all layers, including hidden ones), so all of these values must be computed before the backward phase begins. The forward phase must therefore precede the backward phase in every iteration of gradient descent.
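
The ordering described above can be sketched on a very small network. This is a minimal illustration in plain Python (the arithmetic ports directly to C#), not a definitive implementation. Everything in it is an assumption made for the example: two inputs, two sigmoid hidden neurons, one sigmoid output, no bias terms, a squared-error function, and the names `forward`, `backward`, and `error`. Note that `backward` needs the `hidden` and `output` values produced by `forward`, and that it computes the output-layer gradient before the hidden-layer gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward phase: compute and keep every activation the backward phase needs."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def backward(x, target, w_out, hidden, output):
    """Backward phase: output-layer gradient first, hidden-layer gradient last."""
    # Error term at the output for E = 0.5 * (output - target)^2.
    delta_out = (output - target) * output * (1.0 - output)
    grad_out = [delta_out * h for h in hidden]
    # Each hidden error term reuses delta_out instead of recomputing it.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out


def error(x, target, w_hidden, w_out):
    """Squared error of the network on a single example."""
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2


# Hypothetical input, target, and starting weights.
x, target = [0.5, -1.0], 1.0
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.7, -0.5]

hidden, output = forward(x, w_hidden, w_out)  # must run first
grad_hidden, grad_out = backward(x, target, w_out, hidden, output)
```

Keeping the forward-phase values and passing them in explicitly makes the dependency visible: calling `backward` without first running `forward` is simply not possible here.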

Forward and back propagation differences

Let's take a moment to clarify the difference between forward propagation and back propagation. Once you understand this, it becomes much easier to visualize how data and errors flow through the entire neural network.

In neural networks, you forward-propagate data to get the output and then compare it with the real intended value to get the error, which is the difference between what the data is supposed to be and what your machine-learning algorithm actually thinks it is. To minimize that error, you then propagate backward by finding the derivative of the error with respect to each weight and subtracting this value, usually scaled by a learning rate, from the weight itself.
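
A single update of this kind can be sketched for the smallest possible case: one linear neuron with one weight. This is an illustrative sketch in plain Python, not a library API. The function names, the squared-error function, the starting values, and the learning rate of 0.1 that scales the subtracted derivative are all assumptions made for the example.

```python
def forward(weight, x):
    """Forward-propagate: a single linear neuron with one input and no bias."""
    return weight * x


def gradient(weight, x, target):
    """Derivative of E = 0.5 * (output - target)^2 with respect to the weight."""
    return (forward(weight, x) - target) * x


def update(weight, x, target, learning_rate):
    """Subtract the (scaled) derivative of the error from the weight."""
    return weight - learning_rate * gradient(weight, x, target)


# Hypothetical input, intended value, and starting weight.
x, target = 2.0, 1.0
weight = 0.2

error_before = forward(weight, x) - target
new_weight = update(weight, x, target, learning_rate=0.1)
error_after = forward(new_weight, x) - target
```

With these values, the output moves closer to the target after one step, so the magnitude of `error_after` is smaller than that of `error_before`.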

The basic learning that takes place in a neural network is training neurons when to activate: when to fire, and when to be on or off. Each neuron should activate only for certain types of inputs, not all of them. By propagating forward, you see how well your neural network is behaving and find the error(s). Once you know the network's error, you back-propagate and use a form of gradient descent to update the weights. You then forward-propagate your data again to see how well the new weights perform, and backward-propagate the error to update the weights once more. This goes on until the error reaches a minimum (hopefully the global minimum and not a local one). Again, lather, rinse, repeat!
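
The forward/backward cycle can be sketched as a loop. The example below is a minimal sketch in plain Python under stated assumptions, not the method used by any particular library: it trains a single sigmoid neuron on the logical OR function, updates the weights after every sample, and stops when the summed squared error drops below a tolerance or a maximum number of epochs is reached. The names `train` and `OR_SAMPLES`, the learning rate, the tolerance, and the epoch limit are all hypothetical choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, learning_rate=0.5, tolerance=0.01, max_epochs=5000):
    """Repeat forward and backward passes until the error is small enough."""
    weights, bias = [0.0, 0.0], 0.0
    total_error = float("inf")
    epochs = 0
    while total_error > tolerance and epochs < max_epochs:
        total_error = 0.0
        for inputs, target in samples:
            # Forward-propagate to see how the current weights perform.
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            total_error += 0.5 * (output - target) ** 2
            # Back-propagate the error and take a gradient descent step.
            delta = (output - target) * output * (1.0 - output)
            weights = [w - learning_rate * delta * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
        epochs += 1
    return weights, bias, epochs, total_error


# Truth table for logical OR: (inputs, target).
OR_SAMPLES = [
    ([0.0, 0.0], 0.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 0.0], 1.0),
    ([1.0, 1.0], 1.0),
]

weights, bias, epochs, final_error = train(OR_SAMPLES)
```

The epoch limit is there because gradient descent gives no guarantee of reaching the tolerance; in a real network it may also settle in a local minimum rather than the global one.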