Winsorizing is another technique for dealing with outliers; it is named after Charles Winsor. In effect, Winsorization clips outliers to given percentiles in a symmetric fashion. For instance, we can clip to the 5th and 95th percentiles. SciPy has a winsorize() function, which performs this procedure. The data for this recipe is the same as that for the Clipping and filtering outliers recipe.
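To make the clipping behavior concrete, here is a minimal sketch on a small made-up array (the values are illustrative only and are not the recipe's data). It assumes a limit of 10% on each tail, which for ten values means one value per tail gets replaced by its nearest remaining neighbor:

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Toy data (hypothetical): one extreme value at each end.
data = np.array([-50.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 120.0])

# A scalar limit is applied to both tails, so 10% of the values
# are winsorized at the bottom and 10% at the top.
clipped = winsorize(data, limits=0.1)

# winsorize() returns a masked array; convert it for plain printing.
result = np.asarray(clipped)
print(result)
```

Unlike trimming, the extreme values are not dropped: the array keeps its length, and the outliers are replaced by the closest values that fall inside the limits.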
Winsorize the data with the following procedure:
The imports are as follows:
from scipy.stats.mstats import winsorize
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
import dautil as dl
from IPython.display import HTML
Load and winsorize the data for the effective temperature (limit is set to 15%):
starsCYG = sm.datasets.get_rdataset("starsCYG", "robustbase", cache=True).data
limit = 0.15
winsorized_x = starsCYG.copy()
winsorized_x['log.Te'] = winsorize(starsCYG['log.Te'], limits=limit)
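The same column-wise pattern can be tried without downloading the dataset. The following is only a sketch: the toy frame is a hypothetical stand-in for starsCYG (evenly spaced values, not real measurements), and the to_numpy() and np.asarray() conversions are an added precaution rather than part of the recipe:

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Hypothetical stand-in for the 'log.Te' column (not the real starsCYG data).
toy = pd.DataFrame({'log.Te': np.linspace(3.5, 4.6, 20)})

limit = 0.15
winsorized_toy = toy.copy()  # work on a copy so the original frame is untouched

# Convert to a plain array before and after winsorizing; this sidesteps
# any masked-array handling when assigning back to the column.
winsorized_toy['log.Te'] = np.asarray(
    winsorize(toy['log.Te'].to_numpy(), limits=limit))

# Count how many entries were pulled in toward the limits.
changed = int((winsorized_toy['log.Te'] != toy['log.Te']).sum())
print(changed, 'of', len(toy), 'values changed')
print('max before:', toy['log.Te'].max(),
      'max after:', winsorized_toy['log.Te'].max())
```

Because the scalar limit applies to both tails, values are replaced at the low end and at the high end, while the original frame keeps its extremes for later comparison.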
Winsorize the light intensity as follows:
winsorized_y = starsCYG...