Learning Data Mining with R

By: Bater Makhabel

Overview of this book

Dealing with the array of problems that you may encounter during complex statistical projects can be difficult. If you have only a basic knowledge of R, this book will provide you with the skills and knowledge to successfully create and customize the most popular data mining algorithms to overcome these difficulties.

You will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, associations, and correlations while working with R programs. Discover how to write code for various prediction models, stream data, and time-series data. You will also be introduced to solutions written in R based on RHadoop projects. You will finish this book feeling confident in your ability to know which data mining algorithm to apply in any situation.

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Warming Up

Big data

Data source

Data mining

Social network mining

Why R?

Data attributes and description

Data cleaning

Data integration

Data dimension reduction

Data transformation and discretization

Visualization of results

Time for action

Summary

Mining Frequent Patterns, Associations, and Correlations

An overview of associations and patterns

Market basket analysis

Hybrid association rules mining

Mining sequence dataset

The R implementation

High-performance algorithms

Time for action

Summary

Classification

Generic decision tree induction

High-value credit card customers classification using ID3

Web spam detection using C4.5

Web key resource page judgment using CART

Trojan traffic identification method and Bayes classification

Identify spam e-mail and Naïve Bayes classification

Rule-based classification of player types in computer games and rule-based classification

Time for action

Summary

Advanced Classification

Ensemble (EM) methods

Biological traits and the Bayesian belief network

Protein classification and the k-Nearest Neighbors algorithm

Document retrieval and Support Vector Machine

Classification using frequent patterns

Classification using the backpropagation algorithm

Time for action

Summary

Cluster Analysis

Search engines and the k-means algorithm

Automatic abstraction of document texts and the k-medoids algorithm

The CLARA algorithm

CLARANS

Unsupervised image categorization and affinity propagation clustering

News categorization and hierarchical clustering

Time for action

Summary

Advanced Cluster Analysis

Customer categorization analysis of e-commerce and DBSCAN

Clustering web pages and OPTICS

Visitor analysis in the browser cache and DENCLUE

Recommendation system and STING

Web sentiment analysis and CLIQUE

Opinion mining and WAVE clustering

User search intent and the EM algorithm

Customer purchase data analysis and clustering high-dimensional data

SNS and clustering graph and network data

Time for action

Summary

Outlier Detection

Credit card fraud detection and statistical methods

Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods

Intrusion detection and density-based methods

Intrusion detection and clustering-based methods

Monitoring the performance of the web server and classification-based methods

Detecting novelty in text, topic detection, and mining contextual outliers

Collective outliers on spatial data

Outlier detection in high-dimensional data

Time for action

Summary

Mining Stream, Time-series, and Sequence Data

The credit card transaction flow and STREAM algorithm

Predicting future prices and time-series analysis

Stock market data and time-series clustering and classification

Web click streams and mining symbolic sequences

Mining sequence patterns in transactional databases

Time for action

Summary

Graph Mining and Network Analysis

Graph mining

Mining frequent subgraph patterns

Social network mining

Time for action

Summary

Mining Text and Web Data

Text mining and TM packages

Text summarization

The question answering system

Genre categorization of web pages

Categorizing newspaper articles and newswires into topics

Web usage mining with web logs

Time for action

Summary

Algorithms and Data Structures

Index

User search intent and the EM algorithm

The Expectation-Maximization (EM) algorithm is a probabilistic model-based clustering algorithm built on the mixture model, in which the data is modeled as a mixture of simple models. The parameters of these models are estimated by Maximum Likelihood Estimation (MLE).

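Mixture models assume that the data results from a combination of several simple probability distributions. Given K distributions, where the jth distribution has the parameter $\theta_j$ and the mixing weight $w_j$, and $\Theta = \{\theta_1, \ldots, \theta_K\}$ is the set of parameters of all the distributions, the probability of a data object $x$ is:

$$p(x \mid \Theta) = \sum_{j=1}^{K} w_j \, p_j(x \mid \theta_j), \qquad \sum_{j=1}^{K} w_j = 1$$

The EM algorithm proceeds as follows. In the first step, an initial set of model parameters is selected. The second step, the expectation step, calculates the probability:

$$P(\text{distribution } j \mid x_i, \Theta) = \frac{w_j \, p_j(x_i \mid \theta_j)}{\sum_{l=1}^{K} w_l \, p_l(x_i \mid \theta_l)}$$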

The previous equation represents the probability that each data object belongs to each distribution. The third step is the maximization step: using the result of the expectation step, update the parameter estimates with the ones that...
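The three steps described above can be sketched for one concrete case. The following is a minimal illustration in Python (not the book's R code), assuming the simple distributions are one-dimensional Gaussians. The function names `gaussian_pdf` and `em_gaussian_mixture`, the quantile-based initialization, the fixed iteration count, and the variance floor are all assumptions made for this example; the maximization step uses the standard weighted maximum-likelihood updates for Gaussians, which may differ in detail from the book's formulation.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, k, iterations=100):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n

    # Step 1: select initial parameters. Assumption for this sketch: evenly
    # spaced quantiles as means, the overall variance, and equal weights.
    ordered = sorted(data)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    resp = []
    for _ in range(iterations):
        # Step 2 (expectation): probability that each data object belongs
        # to each distribution, given the current parameters.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([p / total for p in joint])

        # Step 3 (maximization): re-estimate weight, mean, and variance of
        # each distribution from the membership probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            # The floor keeps a component from collapsing to zero variance.
            variances[j] = max(spread, 1e-6)

    return weights, means, variances, resp
```

Calling `em_gaussian_mixture(data, 2)` on a list of numbers drawn from two well-separated groups should return two weights that sum to 1, two means near the group centers, and one row of membership probabilities per data object. A fixed iteration count is used here for brevity; a practical implementation would more likely stop when the log-likelihood stops improving.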
