Our first method of identifying missing values will give us a better understanding of how to work with real-world data. Data can have missing values for a variety of reasons; with survey data, for example, some observations may not have been recorded. It is important to analyze our data and get a sense of what the missing values are, so that we can decide how to handle them in our machine learning work. To start, let's dive into the dataset that we will use for the duration of this chapter, the Pima Indian Diabetes Prediction dataset.
This dataset is available on the UCI Machine Learning Repository at:
https://archive.ics.uci.edu/ml/datasets/pima+indians+diabetes.
From the main website, we can learn a few things about this publicly available dataset. We have nine columns and 768 instances (rows). The dataset is primarily used for predicting the onset of diabetes within five years...
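Before handling missing values, it helps to confirm the dataset's shape and count how many cells in each column look empty. The following is a minimal sketch of that kind of first check, not the chapter's own code. It uses only Python's standard library and a tiny in-memory CSV. The rows, the column names, the `summarize_missing` helper, and the set of markers treated as "missing" are all assumptions made for illustration; none of them come from the real dataset.

```python
import csv
import io

# Toy rows invented for illustration; they are NOT taken from the real dataset.
# The column names are hypothetical as well.
SAMPLE_CSV = """glucose,bmi,age,onset_diabetes
148,33.6,50,1
85,,31,0
,23.3,32,1
89,28.1,,0
"""


def summarize_missing(csv_text, missing_markers=("", "NA", "?")):
    """Return (n_rows, n_columns, {column: count of missing cells}).

    A cell counts as missing if it is absent or if its stripped text is one
    of `missing_markers`. Which markers to use is an assumption that should
    be checked against each dataset's documentation.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    counts = {name: 0 for name in columns}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in columns:
            value = row[name]
            if value is None or value.strip() in missing_markers:
                counts[name] += 1
    return n_rows, len(columns), counts


n_rows, n_columns, missing = summarize_missing(SAMPLE_CSV)
print(f"{n_rows} rows x {n_columns} columns")
for name, count in missing.items():
    print(f"{name}: {count} missing")
```

On the real data, the same check would be run on the downloaded file's contents, and the reported shape compared with the 768 rows and nine columns listed on the website. If the data is loaded into a pandas DataFrame `df` instead, `df.shape` and `df.isnull().sum()` give the equivalent summary.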