Data that doesn't follow a known distribution, such as the normal distribution, is often difficult to analyze, because many statistical techniques assume normality. A popular strategy for bringing such data closer to a normal distribution is to apply the Box-Cox transformation. It is given by the following equation:
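In its standard one-parameter form, for a positive value $y$ and a transformation parameter $\lambda$ (both symbols are chosen here for illustration), the Box-Cox transformation can be written as:

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1ex]
\ln y, & \lambda = 0
\end{cases}
```

The logarithm is the limit of the first branch as $\lambda \to 0$, so the transformation is continuous in $\lambda$.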
The scipy.stats.boxcox() function can apply the transformation for positive data. We will use the same data as in the Clipping and filtering outliers recipe. With Q-Q plots, we will show that the Box-Cox transformation does indeed make the data appear more normal.
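As a minimal sketch of the idea, the following example applies scipy.stats.boxcox() to a synthetic, right-skewed sample and compares how closely the Q-Q points follow a straight line before and after the transformation. The lognormal sample, the seed, and the variable names are assumptions made for illustration; the recipe itself works with the starsCYG data.

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive, right-skewed sample (an assumption for
# illustration; it is not the dataset used in the recipe).
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# With lmbda left unset, boxcox() returns the transformed values together
# with the lambda that maximizes the log-likelihood.
transformed, fitted_lambda = stats.boxcox(data)

# probplot() computes the Q-Q points against the normal distribution and,
# by default, also fits a least-squares line, returning (slope, intercept, r).
# No figure is drawn because the plot argument is left at None.
_, (_, _, r_before) = stats.probplot(data, dist="norm")
_, (_, _, r_after) = stats.probplot(transformed, dist="norm")

skew_before = stats.skew(data)
skew_after = stats.skew(transformed)

print("fitted lambda:", fitted_lambda)
print("Q-Q correlation before/after:", r_before, r_after)
print("skewness before/after:", skew_before, skew_after)
```

A Q-Q correlation closer to 1 and a skewness closer to 0 after the transformation both indicate that the transformed sample looks more normal. For lognormal input, the fitted lambda is expected to lie near 0, which corresponds to the logarithmic branch of the transformation.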
The following steps show how to normalize data with the Box-Cox transformation:
The imports are as follows:
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import boxcox
import seaborn as sns
import dautil as dl
from IPython.display import HTML
Load the data and transform it as follows:
context = dl.nb.Context('normalizing_boxcox')
starsCYG = sm.datasets.get_rdataset("starsCYG", "robustbase", cache=True).data
var...