Learning Data Mining with R

By: Bater Makhabel

Overview of this book

Dealing with the array of problems that you may encounter during complex statistical projects can be difficult. If you have only a basic knowledge of R, this book will provide you with the skills and knowledge to create and customize the most popular data mining algorithms to overcome these difficulties.

You will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, associations, and correlations while working with R programs. Discover how to write code for various prediction models, stream data, and time-series data. You will also be introduced to solutions written in R based on RHadoop projects. You will finish this book feeling confident in your ability to know which data mining algorithm to apply in any situation.

Credit card fraud detection and statistical methods


One major approach to outlier detection is the model-based, or statistical, method. An outlier is defined as an object that does not belong to the model used to represent the original dataset. In other words, the model is very unlikely to have generated that object.
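The definition above can be illustrated with a minimal sketch, assuming a single Gaussian model. The data vector `amounts`, the robust estimates (median and MAD), and the three-standard-deviation cutoff are illustrative assumptions, not taken from the book's code bundle.

```r
# Hypothetical data: nine typical transaction amounts and one extreme value.
amounts <- c(52, 48, 50, 51, 49, 47, 53, 50, 49, 500)

# Fit the Gaussian with robust estimates (an assumption of this sketch) so
# that the extreme value does not distort the model it is tested against.
center <- median(amounts)
spread <- mad(amounts)  # scaled median absolute deviation

# Log-density of every point under the fitted model.
log_density <- dnorm(amounts, mean = center, sd = spread, log = TRUE)

# Flag points whose log-density is lower than that of a point lying
# three standard deviations from the center.
cutoff <- dnorm(center + 3 * spread, mean = center, sd = spread, log = TRUE)
outliers <- which(log_density < cutoff)
print(outliers)
```

Only the position of the value 500 is flagged here, because the fitted model assigns it a far lower density than any of the other amounts.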

Many models can be adopted for a specific dataset, such as the Gaussian and Poisson distributions, and the choice matters: if the wrong model is used to detect outliers, a normal data point may wrongly be recognized as an outlier. Besides a single distribution model, a mixture of distribution models is practical too.

The parameters of a model are estimated by maximizing the log-likelihood function.
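For reference, the general form of the log-likelihood can be written as follows, assuming independent observations x_1, ..., x_n drawn from a density f with parameter θ. The Gaussian case is included as a worked instance; this is the standard textbook form and may differ in notation from the equation in the book.

```latex
\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \, \ell(\theta)

% Gaussian case, \theta = (\mu, \sigma^2):
\ell(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2,
\qquad
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\quad
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2
```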

The likelihood-based outlier detection algorithm

The summarized pseudocode of the likelihood-based outlier detection algorithm is as follows:
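One common textbook formulation treats the data as a mixture of a majority distribution M and an anomalous distribution A, moves each object from M to A in turn, and keeps the move when the log-likelihood of the data rises by more than a threshold. The following is a minimal sketch of that idea, not the contents of the book's R file. The Gaussian choice for M, the uniform choice for A, the function names, `lambda` (the assumed outlier fraction), and `threshold` are all assumptions made for illustration.

```r
# Log-likelihood of the data when the objects marked in `is_anomaly`
# belong to A and the rest belong to M.
mixture_loglik <- function(x, is_anomaly, lambda) {
  m <- x[!is_anomaly]
  n_a <- sum(is_anomaly)
  # M: Gaussian with maximum-likelihood parameters estimated from M itself.
  mu <- mean(m)
  sigma <- sqrt(mean((m - mu)^2))
  ll_m <- length(m) * log(1 - lambda) + sum(dnorm(m, mu, sigma, log = TRUE))
  # A: uniform over the observed range of the full dataset (an assumption).
  ll_a <- n_a * (log(lambda) - log(diff(range(x))))
  ll_m + ll_a
}

# Single greedy pass: tentatively move each object from M to A and keep
# the move only if the log-likelihood increases by more than `threshold`.
detect_outliers <- function(x, lambda = 0.05, threshold = 5) {
  is_anomaly <- rep(FALSE, length(x))
  ll_current <- mixture_loglik(x, is_anomaly, lambda)
  for (i in seq_along(x)) {
    trial <- is_anomaly
    trial[i] <- TRUE
    ll_trial <- mixture_loglik(x, trial, lambda)
    if (ll_trial - ll_current > threshold) {
      is_anomaly <- trial
      ll_current <- ll_trial
    }
  }
  which(is_anomaly)
}

# Hypothetical data: nine typical amounts and one extreme value.
amounts <- c(52, 48, 50, 51, 49, 47, 53, 50, 49, 500)
flagged <- detect_outliers(amounts)
print(flagged)
```

With this data, moving any of the typical amounts into A lowers the log-likelihood, while moving the value 500 raises it sharply, so only that object is flagged. The result depends on `lambda` and `threshold`, which would need tuning for real data.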

The R implementation

Look up the R code file, ch_07_lboutlier_detection.R, in the bundle of R code for the previously...