pandas mathematical operators and functions handle NaN in a special manner (compared to NumPy) that does not break the computations. pandas is lenient with missing data, on the assumption that it is a common situation.
To demonstrate the difference, we can examine the following code, which calculates the mean of a NumPy array:
In [54]: # mean of numpy array values
         nda = np.array([1, 2, 3, 4, 5])
         nda.mean()
Out[54]: 3.0
The result is as expected. The following code replaces one value with a NaN value:
In [55]: # mean of numpy array values with a NaN
         nda = np.array([1, 2, 3, 4, np.nan])
         nda.mean()
Out[55]: nan
When encountering a NaN value, NumPy simply returns NaN. pandas changes this, so that NaN values are ignored:
In [56]: # ignores NaN values
         s = pd.Series(nda)
         s.mean()
Out[56]: 2.5
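The NaN-skipping behavior is controlled by the `skipna` parameter of `Series.mean`, which defaults to `True`. The following is a minimal sketch, not part of the original session, that rebuilds the same values (the variable names are illustrative) and compares the default with `skipna=False`, which propagates NaN the way the NumPy call does:

```python
import numpy as np
import pandas as pd

# Same values as the example above, rebuilt so the sketch is self-contained
s = pd.Series([1, 2, 3, 4, np.nan])

# Default behavior (skipna=True): the NaN is left out of the calculation
mean_skipping_nan = s.mean()

# skipna=False: the NaN propagates into the result
mean_propagating_nan = s.mean(skipna=False)

print(mean_skipping_nan)     # 2.5
print(mean_propagating_nan)  # nan
```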
In this case, pandas overrides the mean function of the Series object so that NaN values are simply ignored. They are not counted as a 0 value...
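To make the distinction between ignoring a NaN and treating it as 0 concrete, here is a short sketch (the variable names are illustrative, not from the original text). Ignoring the NaN divides the sum by the four non-missing values, whereas filling it with 0 first divides by all five:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, np.nan])

non_missing = s.count()            # number of non-NaN values: 4
total = s.sum()                    # sum of non-NaN values: 10.0

mean_ignoring_nan = s.mean()       # 10.0 / 4 = 2.5
mean_nan_as_zero = s.fillna(0).mean()  # 10.0 / 5 = 2.0

print(non_missing, total, mean_ignoring_nan, mean_nan_as_zero)
```

The two means differ because the NaN is excluded from both the sum and the count, rather than contributing a zero to the sum while still being counted.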