Learning Pandas

By: Michael Heydt

Working with missing data


Data is "missing" in pandas when it has a value of NaN (also written np.nan, the form from NumPy). A NaN value indicates that, in a particular Series, no value is specified for a given index label.
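
As a minimal sketch of what this looks like in practice, the following builds a small Series with one NaN and asks pandas which labels are missing. The labels and values are invented for illustration, and the example assumes a pandas version that provides isna() (older releases expose the same check as isnull()).

```python
import numpy as np
import pandas as pd

# Hypothetical Series: label "b" has no value specified, so it holds NaN.
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

# isna() returns a boolean Series that is True wherever the value is missing.
missing_mask = s.isna()

# Summing the boolean mask counts the missing entries.
n_missing = int(missing_mask.sum())

print(missing_mask)
print("missing values:", n_missing)
```

Note that NaN never compares equal to itself, so a check such as s == np.nan would not find the missing entry; that is why the dedicated isna() method is used here.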

In pandas, there are a number of reasons why a value can be NaN:

  • A join of two sets of data does not have matched values

  • Data that you retrieved from an external source is incomplete

  • The value is not known at a given point in time and will be filled in later

  • There is a data collection error retrieving a value, but the event must still be recorded in the index

  • Reindexing of data has resulted in an index label that does not have a value

  • The shape of data has changed and there are now additional rows or columns whose values could not be determined at the time of reshaping
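
Two of the causes listed above, reindexing and a join with unmatched values, can be reproduced in a few lines. This is only an illustrative sketch: the Series, the DataFrames, and the column names key, x, and y are all hypothetical.

```python
import pandas as pd

# Reindexing: label "c" does not exist in the original Series,
# so the reindexed result holds NaN at that label.
s = pd.Series([10, 20], index=["a", "b"])
reindexed = s.reindex(["a", "b", "c"])

# Join: a left merge keeps every row of `left`; key 1 has no match
# in `right`, so its "y" column is filled with NaN.
left = pd.DataFrame({"key": [1, 2], "x": ["p", "q"]})
right = pd.DataFrame({"key": [2, 3], "y": [0.5, 0.7]})
joined = left.merge(right, on="key", how="left")

print(reindexed)
print(joined)
```

One side effect worth noticing is that the integer Series becomes floating point after reindexing, because NaN is itself a floating-point value.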

There are likely more reasons, but the general point is that missing values do occur, and you, as a pandas programmer, will need to handle them effectively to perform correct data analysis. Fortunately...