
Normalization

Data normalization is used when you want to adjust the values in the feature vector so that they can be measured on a common scale. One of the most common forms of normalization used in machine learning adjusts the values of a feature vector so that their absolute values sum up to 1.

Getting ready

To normalize data, the preprocessing.normalize() function can be used. This function scales input vectors individually to a unit norm (vector length). Three types of norm are provided: l1, l2, and max. If x is the vector of covariates of length n, the normalized vector is y = x/z, where z is defined for each norm as follows:

l1: $z = \sum_{j=1}^{n} \lvert x_j \rvert$

l2: $z = \sqrt{\sum_{j=1}^{n} x_j^2}$

max: $z = \max_{j} \lvert x_j \rvert$

A norm is a function that assigns a strictly positive length to every vector in a vector space, except for the zero vector.
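
As a quick check of these definitions, the three values of z can be computed directly with NumPy; the vector used here is purely illustrative:

>> import numpy as np
>> x = np.array([3.0, 0.0, 1.0])
>> z_l1 = np.abs(x).sum()          # l1: sum of absolute values -> 4.0
>> z_l2 = np.sqrt((x ** 2).sum())  # l2: Euclidean length -> approximately 3.1623
>> z_max = np.abs(x).max()         # max: largest absolute value -> 3.0
>> print(x / z_l1)    # [0.75 0.   0.25]
>> print(x / z_l2)    # approximately [0.949 0.    0.316]
>> print(x / z_max)   # approximately [1.    0.    0.333]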

How to do it...

Let's see how to normalize data in Python:

  1. As we said, to normalize data, the preprocessing.normalize() function can be used as follows (we will use the same data as in the previous recipe):
>> data_normalized = preprocessing.normalize(data, norm='l1', axis=0)
  2. To display the normalized array, we will use the following code:
>> print(data_normalized)

The following output is returned:

[[ 0.75       -0.17045455  0.47619048 -0.45762712]
 [ 0.          0.45454545 -0.07142857  0.1779661 ]
 [ 0.25        0.375      -0.45238095 -0.36440678]]

This technique is often used to ensure that features measured on very different scales do not artificially dominate the dataset.

  3. As already mentioned, with l1 normalization along the columns (features), the absolute values in each column must sum to 1. Let's check this for each column:
>> data_norm_abs = np.abs(data_normalized)
>> print(data_norm_abs.sum(axis=0))

In the first line of code, we used the np.abs() function to evaluate the absolute value of each element in the array. In the second line, we used the sum() function to calculate the sum of each column (axis=0). The following results are returned:

[1. 1. 1. 1.]

Therefore, the sum of the absolute values of the elements of each column is equal to 1, so the data is normalized.
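
If you want to run this recipe on its own, here is a self-contained version of the preceding steps; the data array is reproduced from the previous recipe and yields the output shown above:

>> import numpy as np
>> from sklearn import preprocessing
>> data = np.array([[3, -1.5, 2, -5.4],
                    [0, 4, -0.3, 2.1],
                    [1, 3.3, -1.9, -4.3]])
>> data_normalized = preprocessing.normalize(data, norm='l1', axis=0)
>> print(data_normalized)                      # the normalized array shown above
>> print(np.abs(data_normalized).sum(axis=0))  # [1. 1. 1. 1.]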

How it works...

In this recipe, we normalized the data at our disposal to a unit norm. Because we passed axis=0, each feature (column) with at least one non-zero component was rescaled independently of the others so that its norm was equal to 1.

There's more...

Scaling inputs to a unit norm is a very common task in text classification and clustering problems.
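
As an illustrative sketch of that use case (the arrays here are made up for demonstration): with scikit-learn's defaults, preprocessing.normalize() uses norm='l2' and axis=1, so each row, for example, a document's term-count vector, is rescaled to unit Euclidean length. For unit vectors, the cosine similarity between two documents reduces to a plain dot product:

>> import numpy as np
>> from sklearn import preprocessing
>> counts = np.array([[3.0, 0.0, 1.0],   # hypothetical term counts for document 1
                      [2.0, 2.0, 0.0]])  # hypothetical term counts for document 2
>> docs_unit = preprocessing.normalize(counts)  # defaults: norm='l2', axis=1
>> print(np.linalg.norm(docs_unit, axis=1))     # [1. 1.] -> each row has unit length
>> print(docs_unit[0] @ docs_unit[1])           # cosine similarity between the two documents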

See also