Data Smart
Outliers are the odd points in a dataset, the ones that don't fit somehow. Historically, that has meant extreme values: quantities either too large or too small to have come naturally from the same process as the other observations in the dataset.
The only reason people used to care about outliers was that they wanted to get rid of them. Statisticians a hundred years ago had a lot in common with the Borg: a data point needed to assimilate or die. The statisticians, at least, had good reason: outliers can move averages and mess with spread measurements in the data. A good example of outlier removal is in gymnastics, where the highest and lowest judges' scores are always trimmed from the data before taking the average score.
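To make the gymnastics example concrete, here is a minimal Python sketch of that kind of trimming. The function name `trimmed_mean` and the judges' scores are made up for illustration, and the sketch assumes exactly one score is dropped from each end.

```python
from statistics import mean, stdev


def trimmed_mean(scores):
    """Average the scores after dropping one highest and one lowest value."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to trim both ends")
    ordered = sorted(scores)
    return mean(ordered[1:-1])


# Hypothetical judges' scores; the 6.0 is the odd one out.
scores = [9.1, 9.3, 9.2, 9.4, 6.0]

plain = mean(scores)             # pulled down by the 6.0
trimmed = trimmed_mean(scores)   # averages only the middle three scores

spread_all = stdev(scores)                  # inflated by the 6.0
spread_middle = stdev(sorted(scores)[1:-1]) # spread of the middle three

print(f"plain mean:   {plain:.2f}")
print(f"trimmed mean: {trimmed:.2f}")
print(f"spread with and without the extremes: {spread_all:.2f} vs {spread_middle:.2f}")
```

With these made-up numbers, a single low score drags the plain average well below what four of the five judges awarded, while the trimmed average stays close to their consensus. The standard deviation shows the same effect on spread.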
Outliers have a knack for messing up machine learning models. For example, in Chapters 6 and...