
Hands-On Data Analysis with NumPy and Pandas

By : Curtis Miller

Overview of this book

Python, a multi-paradigm programming language, has become the language of choice for data scientists for visualization, data analysis, and machine learning. Hands-On Data Analysis with NumPy and Pandas starts by guiding you through setting up the right environment for data analysis with Python, including installing the correct Python distribution. In addition, you will work with the Jupyter Notebook and set up a database. Once you have covered Jupyter, you will dig deep into Python's NumPy package, a powerful extension with advanced mathematical functions. You will then move on to creating NumPy arrays and employing different array methods and functions. Next, you will explore Python's pandas package, which will help you get to grips with data mining and learn to subset your data. Last but not least, you will learn how to manage your datasets by sorting and ranking them. By the end of this book, you will have learned to index and group your data for sophisticated data analysis and manipulation.
Table of Contents (12 chapters)

Handling missing data in a pandas DataFrame


In this section, we will look at how to handle missing data in a pandas DataFrame. There are a few ways of detecting missing data that work for both Series and DataFrames: we could use NumPy's isnan function, or we could use the isnull and notnull methods supplied with Series and DataFrames. Detecting NaN values is useful when building custom approaches for handling missing information.
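The three detection approaches mentioned above can be compared side by side. This is a minimal sketch using small made-up values (not the book's own data): each call returns a Boolean object of the same shape, marking where values are missing (or, for notnull, present).

```python
import numpy as np
import pandas as pd

# A small Series with one missing entry, encoded as NaN
s = pd.Series([1.0, np.nan, 3.0])

# NumPy's isnan works elementwise on the Series' numeric values
print(np.isnan(s))

# pandas' own methods: isnull() flags missing entries, notnull() flags present ones
print(s.isnull())
print(s.notnull())

# The same methods work on a DataFrame and return a Boolean DataFrame;
# isnull() also treats None in a non-numeric column as missing
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})
print(df.isnull())
```

One practical difference: np.isnan only accepts numeric data and raises a TypeError on an object column such as "b" above, whereas isnull and notnull work on any column type.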

In this notebook, we're going to look at ways of managing missing information. First, we generate a DataFrame containing missing data, as illustrated in the following screenshot:
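Since the screenshot's contents are not reproduced here, the following is a hypothetical stand-in: a small DataFrame with a few NaN entries, followed by two common pandas ways of managing them, dropna and fillna. The column names and values are invented for illustration and are not the notebook's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: np.nan marks the missing entries
df = pd.DataFrame({
    "w": [1.0, 2.0, np.nan, 4.0],
    "x": [np.nan, 0.5, 1.5, 2.5],
    "y": [10, 20, 30, 40],
})
print(df)

# Count missing values per column (True counts as 1 when summed)
print(df.isnull().sum())

# Option 1: drop any row that contains at least one NaN
print(df.dropna())

# Option 2: keep every row, replacing each NaN with a fixed value
print(df.fillna(0))
```

Dropping rows discards information, while filling with a constant can distort statistics, so which option suits depends on the analysis.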

As mentioned before, in pandas missing information is encoded by NumPy's NaN. This is not necessarily how missing information is encoded everywhere, however. For example, in some surveys, missing data is encoded by an impossible numeric value: say the number of children a mother has is recorded as 999, which is obviously not a real count. This is an example of using a sentinel value to indicate...
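The sentinel idea described above can be sketched as follows, assuming a hypothetical survey column in which 999 means "no answer". Replacing the sentinel with NaN lets pandas treat those entries as missing, so summary statistics are no longer skewed by the impossible value.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses: 999 is a sentinel meaning "no answer"
survey = pd.DataFrame({"children": [2, 999, 0, 3, 999]})

# Replace the sentinel with NaN so pandas recognizes those entries as missing
clean = survey.replace(999, np.nan)
print(clean["children"].isnull().sum())

# mean() skips NaN by default, so only the real answers (2, 0, 3) are averaged
print(clean["children"].mean())
```

When the data comes from a file, pd.read_csv also accepts an na_values argument (for example, na_values=[999]) that performs the same conversion at load time.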