Book Image

Mastering Data analysis with R

By : Gergely Daróczi
Book Image

Mastering Data analysis with R

By: Gergely Daróczi

Overview of this book

Table of Contents (19 chapters)
Mastering Data Analysis with R
Credits
www.PacktPub.com
Preface

Extreme values and outliers


An outlier or extreme value is defined as a data point that deviates so far from the other observations, that it becomes suspicious to be generated by a totally different mechanism or simply by error. Identifying outliers is important because those extreme values can:

  • Increase error variance

  • Influence estimates

  • Decrease normality

Or in other words, let's say your raw dataset is a piece of rounded stone to be used as a perfect ball in some game, which has to be cleaned and polished before actually using it. The stone has some small holes on its surface, like missing values in the data, which should be filled – with data imputation.

On the other hand, the stone does not only has holes on its surface, but some mud also covers some parts of the item, which is to be removed. But how can we distinguish mud from the real stone? In this section, we will focus on what the outliers package and some related methods have to offer for identifying extreme values.

As this package...