Neural Network Projects with Python

By: James Loy

Overview of this book

Neural networks are at the core of recent AI advances, providing some of the best solutions to many real-world problems, including image recognition, medical diagnosis, and text analysis. This book covers basic neural network and deep learning concepts, as well as some popular Python libraries for implementing them. It contains practical demonstrations of neural networks in domains such as fare prediction, image classification, and sentiment analysis. In each case, the book provides a problem statement, the specific neural network architecture required to tackle that problem, the reasoning behind the algorithm used, and the associated Python code to implement the solution from scratch. In the process, you will gain hands-on experience with popular Python libraries such as Keras to build and train your own neural networks. By the end of this book, you will have mastered the different neural network architectures and created cutting-edge AI projects in Python that will immediately strengthen your machine learning portfolio.

What is machine learning?

Although the terms machine learning and AI are often used interchangeably, there are subtle differences that set them apart. The term AI was first coined in the 1950s, and it refers to the capability of a machine to imitate intelligent human behavior. To that end, researchers and computer scientists have pursued several approaches. Early efforts in AI were centered around an approach known as symbolic AI. Symbolic AI attempts to express human knowledge in a declarative form that computers can process. Symbolic AI reached its height with the expert system, a computer system that emulates human decision making.

However, one major drawback of symbolic AI was that it relied on the domain knowledge of human experts, and required those rules and that knowledge to be hardcoded for problem-solving. AI as a scientific field then went through a period of drought (known as the AI winter), when scientists became increasingly disillusioned by the limitations of AI.

While symbolic AI took center stage in the 1950s, a subfield of AI known as machine learning was quietly bubbling in the background.

Machine learning refers to algorithms that computers use to learn from data, allowing them to make predictions on future, unseen data.

However, early AI researchers did not pay much attention to machine learning, as computers back then were neither powerful enough nor capable of storing the huge amount of data that machine learning algorithms require. As it turns out, machine learning would not be left in the cold for long. In the late 2000s, AI enjoyed a resurgence, with machine learning largely propelling its growth. The key reason for this resurgence was the maturation of computer systems that could collect and store a massive amount of data (big data), along with processors that were fast enough to run the machine learning algorithms. Thus, the AI summer began.

Machine learning algorithms

Now that we have talked about what machine learning is, we need to understand how machine learning algorithms work. Machine learning algorithms can be broadly classified into two categories:

  • Supervised learning: Using labeled training data, the algorithm learns the rule for mapping the input variables into the target variable. For example, a supervised learning algorithm learns to predict whether there will be rain (the target variable) from input variables such as the temperature, time, season, atmospheric pressure, and so on.
  • Unsupervised learning: Using unlabeled training data, the algorithm learns associative rules for the data. The most common use case for unsupervised learning algorithms is in clustering analysis, where the algorithm learns hidden patterns and groups in data that are not explicitly labeled.

In this book, we will focus on supervised learning algorithms. As a concrete example of a supervised learning algorithm, let's consider the following problem. You are an animal lover and a machine learning enthusiast and you wish to build a machine learning algorithm using supervised learning to predict whether an animal is a friend (a friendly puppy) or a foe (a dangerous bear). For simplicity, let's assume that you have collected two measurements from different breeds of dogs and bears—their Weight and their Speed. After collecting the data (known as the training dataset), you plot them out on a graph, along with their labels (Friend or Foe):

Immediately, we can see that dogs tend to weigh less, and are generally faster, while bears are heavier and generally slower. If we draw a line (known as a decision boundary) between the dogs and the bears, we can use that line to make future predictions. Whenever we receive the measurements for a new animal, we can just see if it falls to the left or to the right of the line. Friends are to the left, and foes are to the right.

But this is a trivial dataset. What if we collect hundreds of different measurements? Then the graph would be more than 100-dimensional, and it would be impossible for a human being to draw a dividing line. However, such a task is not a problem for machine learning.

In this example, the task of the machine learning algorithm is to learn the optimal decision boundary between the two classes. Ideally, we want the algorithm to produce a Decision Boundary that completely separates the two classes of data (although this is not always possible, depending on the dataset):

With this Decision Boundary, we can then make predictions on future, unseen data. If the New Instance lies to the left of the Decision Boundary, then we classify it as a friend. Conversely, if the new instance lies to the right of the Decision Boundary, then we classify it as a foe.
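
To make the idea of learning a decision boundary concrete, here is a minimal sketch in plain Python. It is not a neural network: it uses a nearest-centroid rule as a simple stand-in, which implicitly places a straight-line boundary halfway between the average friend and the average foe. The measurements are made up for illustration, and the helper names fit_centroids and predict are hypothetical.

```python
from math import dist

# Made-up (weight in kg, speed in km/h) measurements, for illustration only.
training_data = [
    ((10.0, 40.0), "friend"),
    ((20.0, 45.0), "friend"),
    ((30.0, 50.0), "friend"),
    ((25.0, 38.0), "friend"),
    ((150.0, 30.0), "foe"),
    ((200.0, 25.0), "foe"),
    ((300.0, 35.0), "foe"),
    ((250.0, 28.0), "foe"),
]


def fit_centroids(data):
    """Return the mean (weight, speed) point of each class."""
    sums = {}
    counts = {}
    for (weight, speed), label in data:
        total_w, total_s = sums.get(label, (0.0, 0.0))
        sums[label] = (total_w + weight, total_s + speed)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (w / counts[label], s / counts[label])
        for label, (w, s) in sums.items()
    }


def predict(centroids, point):
    """Classify a new instance by its nearest class centroid."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


centroids = fit_centroids(training_data)
print(predict(centroids, (35.0, 42.0)))   # friend
print(predict(centroids, (180.0, 30.0)))  # foe
```

The "learning" here is just computing two averages from the labeled training data. More capable algorithms, including neural networks, learn more flexible boundaries, but the prediction step follows the same pattern: check which side of the boundary the new instance falls on.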

In this trivial example, we have used only two input variables and two classes. However, we can generalize the problem to include multiple input variables with multiple classes.

Naturally, our choice of machine learning algorithm affects the kind of decision boundary produced. Some of the more popular supervised machine learning algorithms are as follows:

  • Neural networks
  • Linear regression
  • Logistic regression
  • Support vector machines (SVMs)
  • Decision trees

The nature of the dataset (such as an image dataset or a numerical dataset) and the underlying problem that we are trying to solve should dictate the machine learning algorithm used. In this book, we will focus on neural networks.

The machine learning workflow

We have discussed what machine learning is. But how exactly do you do machine learning? At a high level, machine learning projects are all about taking in raw data as input and churning out Predictions as Output. To do that, there are several important intermediate steps that must be accomplished. This machine learning workflow can be summarized by the following diagram:

The Input to our machine learning workflow will always be data. Data can come from different sources, with different data formats. For example, if we are working on a computer vision-based project, then our data will likely be images. For most other machine learning projects, the data will be presented in a tabular form, similar to spreadsheets. In some machine learning projects, data collection will be a significant first step. In this book, we will assume that the data will be provided to us, allowing us to focus on the machine learning aspect.

The next step is to preprocess the data. Raw data is often messy, error-prone, and unsuitable for machine learning algorithms. Hence, we need to preprocess the data before we feed it to our models. In cases where data is provided from multiple sources, we need to merge the data into a single dataset. Machine learning models also require a numeric dataset for training purposes. If there are any categorical variables in the raw dataset (for example, gender, country, or day of week), we need to encode those variables as numeric variables. We will see how we can do so later on in the chapter. Data scaling and normalization are also required for certain machine learning algorithms. The intuition behind this is that if the magnitude of certain variables is much greater than that of other variables, then certain machine learning algorithms will mistakenly place more emphasis on those dominating variables.
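
As a rough illustration of these two preprocessing steps, the following sketch one-hot encodes a categorical column and rescales a numeric column to the range 0 to 1. It uses plain Python to stay dependency-free, whereas in practice a data library would typically handle this. The helper names one_hot_encode and min_max_scale and the sample values are assumptions made for this example.

```python
def one_hot_encode(values):
    """Turn each categorical value into a list of 0/1 flags, one per category."""
    categories = sorted(set(values))
    encoded = [[1 if value == c else 0 for c in categories] for value in values]
    return categories, encoded


def min_max_scale(values):
    """Rescale numeric values linearly so that they fall between 0 and 1."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Made-up sample columns, for illustration only.
days = ["Mon", "Tue", "Mon", "Sun"]
fares = [5.0, 20.0, 12.5, 50.0]

categories, encoded_days = one_hot_encode(days)
scaled_fares = min_max_scale(fares)

print(categories)    # ['Mon', 'Sun', 'Tue']
print(encoded_days)  # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(scaled_fares)
```

After scaling, the largest fare no longer dominates purely because of its magnitude, which is the concern described above.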

Real-world datasets are often incomplete, with values missing in several rows and columns. There are several ways to deal with missing data, each with its own advantages and disadvantages. The easiest way is to simply discard rows and columns with missing data. However, this may not be practical, as we may end up discarding a significant percentage of our data. We can also replace the missing values of a numeric variable with the mean of that variable. This approach is often preferable to discarding data, as it preserves the size of our dataset. However, replacing missing values with the mean tends to distort the distribution of the data, which may negatively impact our machine learning models. Another method is to predict the missing values based on other values that are present. However, we have to be careful, as doing this may introduce significant bias into our dataset.
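
The first two strategies can be sketched in a few lines of plain Python. Here, missing values are represented by None, the rows are made up, and the helper names drop_incomplete and impute_mean are hypothetical.

```python
from statistics import mean

# Made-up rows; None marks a missing value.
rows = [
    {"weight": 10.0, "speed": 40.0},
    {"weight": None, "speed": 45.0},
    {"weight": 30.0, "speed": None},
    {"weight": 20.0, "speed": 35.0},
]


def drop_incomplete(rows):
    """Strategy 1: keep only the rows that have no missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute_mean(rows, column):
    """Strategy 2: replace missing values in a column with the column mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


complete_rows = drop_incomplete(rows)
imputed_rows = impute_mean(impute_mean(rows, "weight"), "speed")

print(len(complete_rows))  # 2
print(imputed_rows[1])     # {'weight': 20.0, 'speed': 45.0}
```

Dropping rows halves this tiny dataset, while mean imputation keeps all four rows at the cost of pulling the filled-in values toward the average.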

Lastly, in Data Preprocessing, we need to split the dataset into a training set and a testing set. Our machine learning models will be trained and fitted only on the training set. Once we are satisfied with the performance of our model, we will then evaluate our model using the testing set. Note that our model should never be trained on the testing set. This ensures that the evaluation of model performance is unbiased, and will reflect its real-world performance.
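
A minimal sketch of such a split is shown below, assuming a simple random shuffle and an 80/20 ratio. Both the ratio and the helper name split_dataset are illustrative choices rather than anything prescribed here.

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out the last part for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    split_at = len(shuffled) - n_test
    return shuffled[:split_at], shuffled[split_at:]


dataset = list(range(10))  # stand-in for ten rows of data
train_set, test_set = split_dataset(dataset)

print(len(train_set), len(test_set))  # 8 2
```

Because every row ends up in exactly one of the two sets, the model never sees the testing rows during training.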

Once Data Preprocessing has been completed, we will move on to Exploratory Data Analysis (EDA). EDA is the process of uncovering insights from your data using data visualization. EDA allows us to construct new features (known as feature engineering) and inject domain knowledge into our machine learning models.

Finally, we get to the heart of machine learning. After Data Preprocessing and EDA have been completed, we move on to Model Building. As mentioned in the earlier section, there are several machine learning algorithms at our disposal, and the nature of the problem should dictate the type of machine learning algorithm used. In this book, we will focus on neural networks. In Model Building, Hyperparameter Tuning is an essential step, and the right hyperparameters can drastically improve the performance of our model. In a later section, we will look at some of the hyperparameters in a neural network. Once the model has been trained, we are finally ready to evaluate our model using the testing set.
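
To illustrate what hyperparameter tuning involves, the sketch below tries several values of a single hyperparameter and keeps the one that scores best on held-out data. As assumptions for this example, it uses a k-nearest neighbors classifier as a stand-in for a neural network (its only hyperparameter is k, the number of neighbors that vote), a separate validation set, and made-up measurements that include one unusually small foe.

```python
from collections import Counter
from math import dist

# Made-up (weight, speed) training data; note the unusually small foe.
train = [
    ((10.0, 40.0), "friend"), ((20.0, 45.0), "friend"),
    ((30.0, 50.0), "friend"), ((25.0, 38.0), "friend"),
    ((150.0, 30.0), "foe"), ((200.0, 25.0), "foe"),
    ((300.0, 35.0), "foe"), ((250.0, 28.0), "foe"),
    ((40.0, 41.0), "foe"),
]

# Held-out validation data used only to compare hyperparameter values.
validation = [
    ((38.0, 42.0), "friend"), ((220.0, 27.0), "foe"),
    ((15.0, 44.0), "friend"), ((160.0, 31.0), "foe"),
]


def knn_predict(train, point, k):
    """Predict the majority label among the k training points nearest to point."""
    nearest = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, evaluation, k):
    """Fraction of evaluation points that are classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return correct / len(evaluation)


# Try each candidate value of k and keep the best-scoring one.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

print(scores)  # {1: 0.75, 3: 1.0, 5: 1.0}
print(best_k)  # 3
```

With k set to 1, the single small foe misleads the prediction for a nearby friend. Letting more neighbors vote fixes that, which shows how one setting can change model performance.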

As we can see, the machine learning workflow consists of many intermediate steps, each of which is crucial to the overall performance of our model. The major advantage of using Python for machine learning is that the entire machine learning workflow can be executed end-to-end in Python, using just a handful of open source libraries. In this book, you will gain experience using Python in each step of the machine learning workflow, as you create sophisticated neural network projects from scratch.