Learning Data Mining with R

By: Bater Makhabel

Overview of this book

Being able to deal with the array of problems that you may encounter during complex statistical projects can be difficult. If you have only a basic knowledge of R, this book will provide you with the skills and knowledge to successfully create and customize the most popular data mining algorithms to overcome these difficulties.

You will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, associations, and correlations while working with R programs. Discover how to write code for various prediction models, stream data, and time-series data. You will also be introduced to solutions written in R based on RHadoop projects. You will finish this book feeling confident in your ability to know which data mining algorithm to apply in any situation.

Search engines and the k-means algorithm

The general process of partition-based clustering is iterative. The first step chooses a predefined number of cluster representatives. Each later iteration updates the representatives if doing so improves the measure of clustering quality. The following diagram shows the typical process, that is, the partition of the given dataset into disjoint clusters:

The characteristics of partition-based clustering methods are as follows:

The resulting clusters are exclusive in most circumstances
The clusters tend to be spherical in shape, because most of the measures adopted are distance-based
The representative of each cluster is usually the mean or medoid of the corresponding group (cluster) of points
A partition represents a cluster
These methods are suitable for small-to-medium datasets
The algorithm converges under certain objective functions, and the resulting clusters are often only a local optimum
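
The iterative process and the characteristics listed above can be sketched in a few lines of R. This is a minimal illustration, not the book's implementation: the function name `partition_cluster`, its arguments, and the choice of the within-cluster sum of squared Euclidean distances as the quality measure are all assumptions made for the example. It uses the mean as the representative of each cluster and stops as soon as the quality measure no longer improves.

```r
# Minimal sketch of partition-based clustering with mean representatives.
# partition_cluster, k, and max_iter are hypothetical names for illustration.
partition_cluster <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  n <- nrow(x)

  # Step 1: choose k initial representatives (here, k distinct random rows).
  centers <- x[sample(n, k), , drop = FALSE]
  prev_cost <- Inf

  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every point to every representative;
    # the result is an n x k matrix.
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))

    # Exclusive assignment: each point joins its nearest representative.
    assignment <- max.col(-d, ties.method = "first")

    # Quality measure (assumed): within-cluster sum of squared distances.
    cost <- sum(d[cbind(seq_len(n), assignment)])

    # Stop when the quality measure has not improved.
    if (cost >= prev_cost) break
    prev_cost <- cost

    # Update each representative to the mean of its current members.
    # An empty cluster keeps its previous representative.
    for (j in seq_len(k)) {
      members <- x[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }

  list(cluster = assignment, centers = centers, cost = cost)
}

# Usage on synthetic data: two well-separated groups of 20 points each.
set.seed(42)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))
fit <- partition_cluster(pts, k = 2)
table(fit$cluster)
```

Because the result depends on the random initial representatives, a single run may settle in a local optimum. Base R's `kmeans()` in the `stats` package offers the `nstart` argument to try several random starts and keep the best one.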

The k-means clustering...
