An artificial neural network is a network of computing units that can perform various tasks, such as regression, classification, clustering, and feature extraction. Such networks are inspired by the biological neural networks of the human brain. The most fundamental unit of a neural network is called a neuron (or perceptron). A neuron is a simple computing unit that takes in a set of inputs and applies a function to them to produce an output.

The following diagram shows a simple neuron:

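In 1957, Frank Rosenblatt proposed the classical perceptron model, in which a weight is associated with each input. He also proposed a method for learning these weights. A perceptron is a simple computing unit with a threshold, $\theta$, which can be defined by the following equation:

$$
y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i \geq \theta \\ 0 & \text{otherwise} \end{cases}
$$

Here, $x_i$ are the inputs and $w_i$ are the corresponding weights.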
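
The thresholding rule can also be sketched in a few lines of plain Python. This is only an illustration: it assumes the unit outputs 1 when the weighted sum is at least the threshold, and the function name, weights, and threshold are hypothetical values chosen so that the perceptron behaves like a logical AND.

```python
def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical weights and threshold: only the input (1, 1) gives a
# weighted sum (2.0) that reaches the threshold of 1.5.
AND_WEIGHTS = [1.0, 1.0]
AND_THRESHOLD = 1.5

# Evaluate the perceptron on every combination of two binary inputs.
and_table = {
    (x1, x2): perceptron_output([x1, x2], AND_WEIGHTS, AND_THRESHOLD)
    for x1 in (0, 1)
    for x2 in (0, 1)
}
print(and_table)
```

Changing the threshold to 0.5 with the same weights would turn the unit into a logical OR, which shows how the weights and threshold together determine the decision boundary.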

The following diagram represents a perceptron:

Perceptrons can only handle linearly separable cases. The neural networks that we use today rely on activation functions rather than the hard threshold used in perceptrons. Unlike perceptrons, neural networks with non-linear activation functions can learn complex non-linear mappings between inputs and outputs, which makes them suitable for more complicated applications such as image recognition, language translation, and speech recognition. The most popular activation functions are sigmoid, tanh, ReLU, and softmax.
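
To make the contrast with a hard threshold concrete, here is a minimal Python sketch of three of these activation functions next to a step function. The function names are illustrative, and the code is not numerically hardened (for example, `sigmoid` can overflow for very large negative inputs). Softmax is left out because it operates on a whole vector of scores rather than on a single value.

```python
import math


def step(z, threshold=0.0):
    """Hard threshold as used in a perceptron: jumps from 0 to 1."""
    return 1.0 if z >= threshold else 0.0


def sigmoid(z):
    """Smoothly squashes z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Smoothly squashes z into the open interval (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Passes positive values through unchanged and clips negatives to 0."""
    return max(0.0, z)


# Compare the functions on a few sample inputs.
for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(sigmoid(z), 4), round(tanh(z), 4), relu(z))
```

Unlike the step function, sigmoid and tanh change gradually with their input, which is what allows gradient-based training to adjust the weights in small steps.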

We can implement various machine learning algorithms, such as simple linear regression and logistic regression, using neural networks. For example, we can think of logistic regression as a single-layer neural network. A logistic regression neural network uses a sigmoid ($\sigma$) activation function. The following diagram shows a logistic regression neural network:

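The output of the network is given as follows:

$$
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}
$$

*where $z$ is equal to $\sum_{i=1}^{n} w_i x_i + b$, with $b$ denoting the bias term*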
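
As a rough sketch, the forward pass of such a single-layer network can be written in plain Python as follows. It assumes that $z$ is the weighted sum of the inputs plus a bias term; the weights, bias, inputs, and the 0.5 decision cutoff are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_forward(inputs, weights, bias):
    """Compute z as the weighted sum plus bias, then apply the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical parameters for a network with two inputs.
weights = [0.8, -0.4]
bias = 0.1

# The output can be read as the estimated probability of the positive class.
prob = logistic_forward([1.0, 2.0], weights, bias)

# Assumed decision rule: predict class 1 when the probability is at least 0.5.
predicted_class = 1 if prob >= 0.5 else 0
print(prob, predicted_class)
```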

When implementing multinomial logistic regression with a neural network, we place a softmax activation function in the output layer. The following equation shows the output of a multinomial logistic regression neural network:

$$
P(y = j \mid x) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}
$$

*where $z_j$ is the weighted sum of inputs for the $j^{th}$ class and $K$ is the number of classes*
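
The softmax output layer can be sketched in plain Python as shown below. The sketch assumes one row of weights and one bias per class; the helper names and all numeric values are hypothetical. Subtracting the maximum score before exponentiating is a common numerical-stability trick that does not change the result.

```python
import math


def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    max_score = max(scores)  # shift scores so the largest exponent is 0
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(inputs, weight_rows, biases):
    """Compute one weighted sum (plus bias) per class."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weight_rows, biases)
    ]


# Hypothetical parameters: three classes, two input features.
weight_rows = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
biases = [0.0, 0.1, -0.1]

probs = softmax(class_scores([1.0, 2.0], weight_rows, biases))

# The predicted class is the index with the highest probability.
predicted = probs.index(max(probs))
print(probs, predicted)
```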

In neural networks, the network error is calculated by comparing the model's output to the desired output. This error term is used to guide the training of the network. After each training iteration, the error is propagated backward through the network and the weights are updated in order to reduce it. This process is called **backpropagation**. In this recipe, we will build a multi-class classification neural network using the `keras` library in R.
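
The error-driven weight update can be illustrated on the smallest possible case, a single sigmoid neuron, where the backward pass reduces to one chain-rule step. The plain Python sketch below is a toy illustration, not the `keras` workflow: it assumes a cross-entropy loss and per-sample gradient descent (neither is specified in the text), and the learning rate, epoch count, and OR-gate data are hypothetical choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Forward pass: weighted sum plus bias, followed by the sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, targets, learning_rate=0.5, epochs=1000):
    """Fit a single sigmoid neuron with per-sample gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            # Forward pass: compute the model's output.
            y_hat = predict(x, weights, bias)
            # Backward pass: with cross-entropy loss and a sigmoid output,
            # the gradient of the loss with respect to z is (y_hat - y).
            error = y_hat - y
            # Move each parameter against its gradient to reduce the error.
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Toy data: the logical OR of two binary inputs (linearly separable).
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
targets = [0, 1, 1, 1]

weights, bias = train_neuron(samples, targets)
predictions = [1 if predict(x, weights, bias) >= 0.5 else 0 for x in samples]
print(predictions)
```

In a multi-layer network, the same idea is applied layer by layer, with each layer's gradient computed from the gradient of the layer after it.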