Learning pandas - Second Edition

By: Michael Heydt

Overview of this book

You will learn how to use pandas to perform data analysis in Python. You will start with an overview of data analysis and iteratively progress from modeling data to accessing data from remote sources, performing numeric and statistical analysis, indexing and performing aggregate analysis, and finally visualizing statistical data and applying pandas to finance. With the knowledge you gain from this book, you will quickly learn pandas and how it can empower you in the exciting world of data manipulation, analysis, and science.
Table of Contents (16 chapters)

How to work with missing data

Data is missing in pandas when it has a value of NaN (also seen as np.nan, the form from NumPy). A NaN value means that no value is specified for that particular index label in a particular Series.
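A minimal sketch of what this looks like in practice: a Series with one label set to np.nan, checked with `isnull()` and `count()`. The labels and values are illustrative only.

```python
import numpy as np
import pandas as pd

# A Series where label "b" has no value; np.nan marks it as missing
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

print(s.isnull())   # True only for label "b"
print(s.count())    # count() skips missing values, so this prints 2
```

Note that `count()` reports only the non-missing values, while `len(s)` still includes the label that holds NaN.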

How can data be missing? There are a number of reasons why a value can be NaN:

  • A join of two sets of data does not have matching values
  • Data that you retrieved from an external source is incomplete
  • The value is not known at a given point in time and will be filled in later
  • There is a data collection error retrieving a value, but the event must still be recorded in the index
  • Reindexing of data has resulted in index labels that do not have a value
  • The shape of data has changed, adding rows or columns whose values could not be determined at the time of reshaping
  • There are likely more reasons, but the general point is that these situations...
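Two of the causes listed above, unmatched values in a join and reindexing, can be reproduced in a few lines. This is an illustrative sketch with made-up labels: label-aligned arithmetic and an outer `DataFrame.join` both leave NaN where one side has no matching label, and `Series.reindex` inserts NaN for a label that did not previously exist.

```python
import pandas as pd

# Two Series whose index labels only partially overlap
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20], index=["b", "c"])

# Arithmetic aligns on labels; "a" has no match in s2, so its result is NaN
aligned_sum = s1 + s2

# An outer join of the two as DataFrames also leaves the unmatched cell as NaN
joined = pd.DataFrame({"left": s1}).join(pd.DataFrame({"right": s2}), how="outer")

# Reindexing to include a new label ("d") introduces NaN for that label
reindexed = s1.reindex(["a", "b", "c", "d"])

print(aligned_sum)
print(joined)
print(reindexed)
```

In each case pandas keeps the row for the label and marks the value as missing rather than dropping it, which is why these situations show up as NaN.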