Dealing with missing values

Dealing with missing values is a common problem we will typically face when analyzing data. A missing value is a value within a field or variable that is not present, even though it is expected to be. There are several reasons why this could have happened, but a common reason is that the data value wasn’t provided at the point of data collection. As we explore and analyze data, missing values can easily lead to inaccurate or biased conclusions; therefore, they need to be taken care of. Missing values are typically represented by blank spaces, but in pandas, they are represented by NaN.

Several techniques can be used to deal with missing values. In this recipe, we will focus on dropping missing values using the dropna method in pandas.

Getting ready

We will work with the full Marketing Campaign data in this recipe.

How to do it…

We will drop rows and columns using the pandas library:

Import the pandas library:
```
import pandas as pd
```

Load the .csv into a dataframe using read_csv. Then, subset the dataframe to include only relevant columns:

marketing_data = pd.read_csv("data/marketing_campaign.csv")

marketing_data = marketing_data[['ID', 'Year_Birth', 'Education','Income']]

Inspect the data. Check the first few rows, and check the number of columns and rows:

marketing_data.head()

    ID    Year_Birth        Education    Income

0    5524    1957            Graduation    58138.0

1    2174    1954            Graduation    46344.0

2    4141    1965            Graduation    71613.0

3    6182    1984            Graduation    26646.0

4    5324    1981            PhD         58293.0

marketing_data.shape

(2240, 4)

Check for missing values using the isnull and sum methods:

marketing_data.isnull().sum()

ID             0

Year_Birth     0

Education      0

Income        24

Drop missing values using the dropna method:

marketing_data_withoutna = marketing_data.dropna(how = 'any')

marketing_data_withoutna.shape

(2216, 4)

Good job! We have dropped missing values from our dataset.

How it works...

In step 1, we import pandas and refer to it as pd. In step 2, we use read_csv to load the .csv file into a pandas dataframe and call it marketing_data. We also subset the dataframe to include only four relevant columns. In step 3, we inspect the dataset using the head and shape methods.

In step 4, we use the isnull and sum methods to check for missing values. These methods give us the number of rows with missing values within each column in our dataset. Columns with zero have no rows with missing values.

In step 5, we use the dropna method to drop missing values. For the how parameter, we supply 'any' as the value to indicate that we want to drop rows that have missing values in any of the columns. An alternative value to use is 'all', which ensures all the columns of a row have missing values before dropping the row. We then check the number of rows and columns using the shape method. We can see that the final dataset has 24 fewer rows.

There’s more...

As highlighted previously, there are several reasons why a value may be missing in our dataset. Understanding the reason can point us to optimal solutions to resolve this problem. Missing values shouldn’t be addressed with a one-size-fits-all approach. Chapter 9 dives into the details of how to optimally deal with missing values and outliers by providing several techniques.

Exploratory Data Analysis with Python Cookbook

By : Ayodele Oluleye

Exploratory Data Analysis with Python Cookbook

By: Ayodele Oluleye

Overview of this book

Dealing with missing values

Getting ready

How to do it…

How it works...

There’s more...

See also

Exploratory Data Analysis with Python Cookbook

By : Ayodele Oluleye

Exploratory Data Analysis with Python Cookbook

By: Ayodele Oluleye

Overview of this book

Dealing with missing values

Getting ready

How to do it…

How it works...

There’s more...

See also

Confirmation

Buy this book with your credits?

Submit Your Feedback

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access