Python Machine Learning (Wiley)

By: Wei-Meng Lee

Overview of this book

With computing power increasing exponentially and costs decreasing at the same time, this is the best time to learn machine learning using Python. Machine learning tasks that once required enormous processing power are now possible on desktop machines. Python Machine Learning begins by covering some fundamental libraries used in Python that make machine learning possible. You'll learn how to manipulate arrays of numbers with NumPy and use pandas to deal with tabular data. Once you have a firm foundation in the basics, you'll explore machine learning using Python and the scikit-learn library. You'll learn how to visualize data by plotting different types of charts and graphs using the matplotlib library. You'll gain a solid understanding of how the various machine learning algorithms work behind the scenes. The later chapters explore the common machine learning algorithms, such as regression, clustering, and classification, and discuss how to deploy the models that you have built so that they can be used by client applications running on mobile and desktop devices. By the end of the book, you'll have all the knowledge you need to begin machine learning using Python.
Table of Contents (16 chapters)
1. Cover
2. Introduction
11. CHAPTER 9: Supervised Learning—Classification Using K‐Nearest Neighbors (KNN)
15. Index
16. End User License Agreement

Data Cleansing

In machine learning, one of the first tasks that you need to perform is data cleansing. You will seldom have a dataset that you can use straightaway to train your model. Instead, you have to examine the data carefully for missing values and either remove them or replace them with valid values, or normalize the columns if their values span wildly different ranges. The following sections show some of the common tasks you need to perform when cleaning your data.
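As a rough illustration of normalizing columns whose values differ wildly in scale, the following sketch applies min-max scaling with pandas so that every column ends up in the range 0 to 1. The column names and values are made up for illustration only; scikit-learn's MinMaxScaler (in sklearn.preprocessing) is an alternative that performs the same rescaling.

```python
import pandas as pd

# Hypothetical data: two columns on very different scales
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000, 85000, 120000, 62000],
})

# Min-max normalization: (value - column min) / (column max - column min)
# rescales each column independently to the range [0, 1]
normalized = (df - df.min()) / (df.max() - df.min())
print(normalized)
```

After scaling, a difference of 0.1 means roughly the same relative change in either column, so a distance-based algorithm no longer lets the income column dominate simply because its raw numbers are larger.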

Cleaning Rows with NaNs

Consider a CSV file named NaNDataset.csv with the following content:

A,B,C
1,2,3
4,,6
7,,9
10,11,12
13,14,15
16,17,18 

Visually, you can spot that a few rows have empty fields. Specifically, the second and third rows have missing values in the second column (B). For a small dataset, this is easy to spot, but in a large dataset it becomes almost impossible to detect by eye. An effective way to detect empty fields is to load the dataset into a pandas dataframe and...
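A minimal sketch of one way to find and handle the empty fields with pandas, assuming the CSV content shown above. It reads the data from an in-memory string via io.StringIO so that it runs without the file; with the file on disk you would pass "NaNDataset.csv" to pd.read_csv instead. By default, read_csv turns empty fields into NaN. Dropping rows and filling with the column mean are shown as two common options, not necessarily the approach the book goes on to use.

```python
import io
import pandas as pd

# Same content as NaNDataset.csv; StringIO stands in for the file on disk
csv_text = """A,B,C
1,2,3
4,,6
7,,9
10,11,12
13,14,15
16,17,18
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count the missing (NaN) values in each column
print(df.isnull().sum())

# Show only the rows that contain at least one NaN
print(df[df.isnull().any(axis=1)])

# Option 1: drop every row that has a NaN
dropped = df.dropna()

# Option 2: replace the NaNs in column B with that column's mean
filled = df.fillna({"B": df["B"].mean()})
print(filled)
```

Because isnull() returns a Boolean dataframe of the same shape as df, summing it counts the missing values per column, and any(axis=1) flags each row that has at least one missing value.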