Outliers are a problem because they can distort summary statistics and, with them, our understanding of the data. In this recipe, we define an outlier as a value that lies below the first quartile or above the third quartile by at least 1.5 times the interquartile range. The interquartile range is the distance between the first and third quartiles. Let's count the outliers for each month of the year. The complete code is in the extreme.py file in this book's code bundle:
import numpy as np
import matplotlib.pyplot as plt
import calendar as cal

data = np.load('cbk12.npy')

# Multiply to get hPa values
meanp = .1 * data[:,1]

# Filter out 0 values
meanp = np.ma.array(meanp, mask = meanp == 0)

# Calculate quartiles and irq
# np.percentile ignores the mask, so pass only the unmasked values
q1 = np.percentile(meanp.compressed(), 25)
median = np.percentile(meanp.compressed(), 50)
q3 = np.percentile(meanp.compressed(), 75)
irq = q3 - q1

# Get months (dates are stored as YYYYMMDD numbers)
# Floor division keeps whole month numbers under Python 3
dates = data[:,0]
months = (dates % 10000)//100

m_low = np.zeros(12)
m_high = np.zeros(12)
month_range = np.arange(1, 13)

for month in month_range:
    indices ...
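The outlier rule described in this recipe can be tried on a handful of made-up readings. The following is a minimal sketch, not the extreme.py listing from the code bundle: the helper names iqr_fences and count_outliers_by_month and the toy dates and values are hypothetical, and the only assumption carried over from the recipe is that dates are stored as YYYYMMDD numbers.

```python
import numpy as np


def iqr_fences(values):
    """Return the (lower, upper) limits Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def count_outliers_by_month(dates, values):
    """Count low and high outliers for each calendar month.

    dates are numbers in YYYYMMDD form (assumed); values are the readings.
    The limits are computed once from all values, then applied per month.
    """
    dates = np.asarray(dates)
    values = np.asarray(values, dtype=float)
    low, high = iqr_fences(values)

    # YYYYMMDD % 10000 gives MMDD; floor division by 100 gives MM
    months = (dates % 10000) // 100

    m_low = np.zeros(12, dtype=int)
    m_high = np.zeros(12, dtype=int)
    for month in range(1, 13):
        in_month = values[months == month]
        # "At least 1.5 times the IQR away" means the limits are inclusive
        m_low[month - 1] = np.sum(in_month <= low)
        m_high[month - 1] = np.sum(in_month >= high)
    return m_low, m_high


# Hypothetical toy data: eight pressure readings in hPa over three months
dates = np.array([20130105, 20130117, 20130203, 20130214,
                  20130225, 20130308, 20130319, 20130322])
values = np.array([1010.0, 1012.0, 1011.0, 950.0,
                   1013.0, 1009.0, 1080.0, 1012.0])

low, high = iqr_fences(values)
m_low, m_high = count_outliers_by_month(dates, values)
print(low, high)   # 1006.0 1016.0
print(m_low)       # one low outlier, in February
print(m_high)      # one high outlier, in March
```

For this toy data the quartiles are 1009.75 and 1012.25, so the interquartile range is 2.5 and the limits are 1006 and 1016. The reading of 950 in February counts as a low outlier and the reading of 1080 in March as a high outlier.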