Hands-On Neural Networks

By: Leonardo De Marchi, Laura Mitchell

Overview of this book

Neural networks play a very important role in deep learning and artificial intelligence (AI), with applications in a wide variety of domains, from medical diagnosis and financial forecasting to machine diagnostics. Hands-On Neural Networks is designed to guide you through learning about neural networks in a practical way. The book will get you started by giving you a brief introduction to perceptron networks. You will then gain insights into machine learning and also understand what the future of AI could look like. Next, you will study how embeddings can be used to process textual data and the role of long short-term memory networks (LSTMs) in helping you solve common natural language processing (NLP) problems. The later chapters will demonstrate how you can implement advanced concepts including transfer learning, generative adversarial networks (GANs), autoencoders, and reinforcement learning. Finally, you can look forward to further content on the latest advancements in the field of neural networks. By the end of this book, you will have the skills you need to build, train, and optimize your own neural network model that can be used to provide predictable solutions.
Table of Contents (16 chapters)

  • Section 1: Getting Started
  • Section 2: Deep Learning Applications
  • Section 3: Advanced Applications

Supervised learning in practice with Python

As we said earlier, supervised learning algorithms learn to approximate a function that maps inputs to outputs, creating a model that is able to predict outputs for unseen inputs.

It's conventional to denote inputs as x and outputs as y; both can be numerical or categorical.

We can distinguish between two different types of supervised learning:

  • Classification
  • Regression

Classification is a task where the output variable can assume a finite number of values, called categories. An example of classification would be classifying different types of flowers (output) given the sepal length (input). Classification can be further divided into the following subtypes (the sketch after this list illustrates all three):

  • Binary classification: The task of predicting whether an instance belongs to one of two classes
  • Multiclass classification: The task (also known as multinomial) of predicting the most probable label (class) for each single instance
  • Multilabel classification: When multiple labels can be assigned to each input
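
To make these three flavors concrete, here is a minimal sketch using scikit-learn on small synthetic datasets; the data generators and the choice of logistic regression are illustrative assumptions, not examples taken from the book.

# Sketch: binary, multiclass, and multilabel classification on synthetic data
from sklearn.datasets import make_classification, make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Binary classification: each instance belongs to one of two classes
X_bin, y_bin = make_classification(n_samples=200, n_classes=2, random_state=0)
binary_clf = LogisticRegression().fit(X_bin, y_bin)

# Multiclass classification: exactly one label out of several possible classes
X_multi, y_multi = make_classification(n_samples=200, n_classes=3, n_informative=4, random_state=0)
multiclass_clf = LogisticRegression().fit(X_multi, y_multi)

# Multilabel classification: several labels can be active for the same instance
X_ml, Y_ml = make_multilabel_classification(n_samples=200, n_classes=4, random_state=0)
multilabel_clf = OneVsRestClassifier(LogisticRegression()).fit(X_ml, Y_ml)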

Regression is a task where the output variable is continuous. Here are some common regression algorithms (a short sketch of both follows the list):

  • Linear regression: This finds linear relationships between inputs and outputs
  • Logistic regression: This estimates the probability of a binary outcome; despite its name, it is typically used for classification
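
As a minimal sketch of both, the following toy example fits each model on a handful of synthetic points; the data is invented purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10).reshape(-1, 1)

# Linear regression: fit a line y = 2x + 1 to continuous outputs
y_continuous = 2.0 * X.ravel() + 1.0
linear_model = LinearRegression().fit(X, y_continuous)
print(linear_model.coef_, linear_model.intercept_)  # close to 2.0 and 1.0

# Logistic regression: estimate the probability of a binary output (here, x > 4)
y_binary = (X.ravel() > 4).astype(int)
logistic_model = LogisticRegression().fit(X, y_binary)
print(logistic_model.predict_proba([[7]]))  # probabilities for class 0 and class 1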

In general, the supervised learning problem is solved in a standard way by performing the following steps (a minimal end-to-end sketch follows the list):

  1. Performing data cleaning to make sure the data we are using is as accurate and descriptive as possible.
  2. Executing the feature engineering process, which involves the creation of new features out of the existing ones for improving the algorithm's performance.
  3. Transforming input data into something that our algorithm can understand, which is known as data transformation. Some algorithms, such as neural networks, don't work well with data that is not scaled as they would naturally give more importance to inputs with a larger magnitude.
  4. Choosing an appropriate model (or a few of them) for the problem.
  5. Choosing an appropriate metric to measure the effectiveness of our algorithm.
  6. Training the model using a subset of the available data, called the training set. The data transformations are also calibrated on this training set only.
  7. Testing the model.
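
To make these steps concrete, here is a minimal sketch of the whole workflow on scikit-learn's Iris dataset; the choice of model (logistic regression) and metric (accuracy) are assumptions made for illustration, not the book's prescribed setup.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a small, clean dataset (steps 1 and 2 are already done for us here)
X, y = load_iris(return_X_y=True)

# Step 6: hold out a test set; transformations are calibrated on the training set only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 3: data transformation (scaling), fitted on the training data only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Steps 4 and 6: choose a model and train it on the training set
model = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)

# Steps 5 and 7: choose a metric (accuracy) and test the model on unseen data
print(accuracy_score(y_test, model.predict(X_test_scaled)))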

Data cleaning

Data cleaning is a fundamental process to make sure we are able to produce good results at the end. It is task-specific: the cleaning you will have to perform on audio data will be different from what is needed for images, text, or time series data.

We will need to check for missing data and decide how to deal with it. If an instance is missing a few variables, it's possible to fill them with the average for that variable, to fill them with a value that the input cannot normally assume (such as -1 if the variable lies between 0 and 1), or to disregard the instance altogether if we have a lot of data.
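
Here is a minimal sketch of these three options using pandas, on a hypothetical humidity column invented for illustration:

import numpy as np
import pandas as pd

df = pd.DataFrame({"humidity": [0.2, np.nan, 0.7, 0.5]})  # one missing value

filled_with_mean = df["humidity"].fillna(df["humidity"].mean())  # fill with the average
filled_with_flag = df["humidity"].fillna(-1)  # fill with a value the input cannot assume
dropped = df.dropna()  # disregard the incomplete instance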

Also, it's good to check whether the data respects the limitations of the values we are measuring. For example, a temperature in Celsius cannot be lower than -273.15 degrees (absolute zero); if we find such a value, we know straight away that the data point is unreliable.

Other checks include the format, the data types, and the variance in the dataset.
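
A minimal sketch of such sanity checks, again on a hypothetical temperature column invented for illustration:

import pandas as pd

df = pd.DataFrame({"temperature_celsius": [21.5, 19.0, -500.0, 23.2]})

# Physical range check: nothing can be colder than absolute zero (-273.15 °C)
print(df[df["temperature_celsius"] < -273.15])  # flags the -500.0 reading as unreliable

# Format, data type, and variance checks
print(df.dtypes)  # confirm the column is numeric
print(df["temperature_celsius"].var())  # a near-zero variance would be suspicious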

It's possible to load some clean data directly from scikit-learn. There are a lot of datasets for all sorts of tasks. For example, if we want to load some image data, we can use the following Python code:

# Download (on first use) and load the Labeled Faces in the Wild dataset,
# keeping only people with at least 70 pictures, each resized to 40% of the original size
from sklearn.datasets import fetch_lfw_people
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

This data is known as Labeled Faces in the Wild, a dataset for face recognition.
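
The call returns a scikit-learn Bunch object; a quick, illustrative way to inspect what was downloaded:

print(lfw_people.images.shape)  # (n_samples, height, width) grayscale face images
print(lfw_people.data.shape)    # the same images flattened into feature vectors
print(lfw_people.target_names)  # names of the people with at least 70 pictures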