The Applied Artificial Intelligence Workshop
Before building a classifier, we need to format our data: keep the relevant data in the format best suited to classification, and remove any data that we are not interested in.
The following points describe common ways to achieve this:
For instance, if there are N/A (or NA) values in the dataset, we may be better off substituting these values with a numeric value we can handle. Recall from the previous chapter that NA stands for Not Available and that it represents a missing value. We may choose to ignore rows with NA values or replace them with an outlier value.
Note
An outlier value is a value such as -1,000,000 that clearly stands out from regular values in the dataset.
The fillna() method of a DataFrame does this type of replacement. The replacement of NA values with an outlier looks as follows:
df.fillna(-1000000, inplace=True)
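The two options mentioned earlier, ignoring rows with NA values or replacing them with an outlier, can be compared in a minimal sketch. The DataFrame and its column names (age, income) are hypothetical and only serve as an illustration; dropna() and fillna() are standard pandas DataFrame methods.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 40],
    "income": [50000.0, 62000.0, np.nan],
})

# Option 1: ignore (drop) every row that contains at least one NA value.
dropped = df.dropna()

# Option 2: replace every NA value with an outlier value.
# Without inplace=True, fillna() returns a new DataFrame and leaves df unchanged.
filled = df.fillna(-1000000)

print(dropped)
print(filled)
```

Dropping rows keeps only complete records but may discard a lot of data, while filling keeps every row at the cost of introducing an artificial value.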
Here, the fillna() method replaces every NA value in the DataFrame with the specified numeric value.
This numeric value...