Data is "missing" in pandas when it has a value of NaN (also seen as np.nan, the form from NumPy). The NaN value indicates that, in a particular Series, no value is specified for a particular index label.
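As a minimal sketch of what this looks like in practice, the following builds a small Series in which one index label has no value and then asks pandas which labels are missing. The data and the variable names (s, missing_mask, missing_labels) are made up for illustration.

```python
import numpy as np
import pandas as pd

# A small Series in which the label "b" has no value specified
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

# isna() returns a boolean Series that is True wherever the value is NaN
missing_mask = s.isna()

# Use the mask to pick out the index labels that have no value
missing_labels = list(s.index[missing_mask])
print(missing_labels)
```

Note that NaN does not compare equal to itself, so a test such as s == np.nan will not find the missing entries; isna() is the usual way to locate them.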
In pandas, there are a number of reasons why a value can be NaN:
A join of two sets of data does not have matched values
Data that you retrieved from an external source is incomplete
The value is not known at a given point in time and will be filled in later
There is a data collection error retrieving a value, but the event must still be recorded in the index
Reindexing of data has resulted in an index that does not have a value
The shape of data has changed and there are now additional rows or columns whose values could not be determined at the time of reshaping
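Two of the causes listed above, reindexing and a join without matched values, are easy to reproduce. The sketch below is illustrative only: the data, the column names (key, left_val, right_val), and the variable names are assumptions made for the example.

```python
import pandas as pd

# Reindexing: the label "d" does not exist in the original Series,
# so pandas has no value for it and fills the slot with NaN
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
reindexed = s.reindex(["a", "b", "d"])
print(reindexed)

# Join: the key "x" appears only in the left table, so a left join
# leaves its right_val entry as NaN
left = pd.DataFrame({"key": ["x", "y"], "left_val": [1, 2]})
right = pd.DataFrame({"key": ["y", "z"], "right_val": [3, 4]})
joined = left.merge(right, on="key", how="left")
print(joined)
```

In both cases the integer data is converted to floating point, because NaN is a floating-point value and cannot be stored in a plain NumPy integer column.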
There are likely more reasons, but the general point is that missing values occur, and you, as a pandas programmer, will need to work with them effectively to perform correct data analysis. Fortunately...