Learning Data Mining with R

By: Bater Makhabel

Overview of this book

Dealing with the array of problems that you may encounter during complex statistical projects can be difficult. If you have only a basic knowledge of R, this book will give you the skills and knowledge to create and customize the most popular data mining algorithms and overcome these difficulties.

You will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, associations, and correlations while working with R programs. Discover how to write code for various prediction models, stream data, and time-series data. You will also be introduced to solutions written in R based on RHadoop projects. You will finish this book feeling confident in your ability to choose which data mining algorithm to apply in a given situation.

Learning Data Mining with R

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Warming Up

Big data

Data source

Data mining

Social network mining

Why R?

Data attributes and description

Data cleaning

Data integration

Data dimension reduction

Data transformation and discretization

Visualization of results

Time for action

Summary

Mining Frequent Patterns, Associations, and Correlations

An overview of associations and patterns

Market basket analysis

Hybrid association rules mining

Mining sequence dataset

The R implementation

High-performance algorithms

Time for action

Summary

Classification

Generic decision tree induction

High-value credit card customers classification using ID3

Web spam detection using C4.5

Web key resource page judgment using CART

Trojan traffic identification method and Bayes classification

Identify spam e-mail and Naïve Bayes classification

Rule-based classification of player types in computer games and rule-based classification

Time for action

Summary

Advanced Classification

Ensemble (EM) methods

Biological traits and the Bayesian belief network

Protein classification and the k-Nearest Neighbors algorithm

Document retrieval and Support Vector Machine

Classification using frequent patterns

Classification using the backpropagation algorithm

Time for action

Summary

Cluster Analysis

Search engines and the k-means algorithm

Automatic abstraction of document texts and the k-medoids algorithm

The CLARA algorithm

CLARANS

Unsupervised image categorization and affinity propagation clustering

News categorization and hierarchical clustering

Time for action

Summary

Advanced Cluster Analysis

Customer categorization analysis of e-commerce and DBSCAN

Clustering web pages and OPTICS

Visitor analysis in the browser cache and DENCLUE

Recommendation system and STING

Web sentiment analysis and CLIQUE

Opinion mining and WAVE clustering

User search intent and the EM algorithm

Customer purchase data analysis and clustering high-dimensional data

SNS and clustering graph and network data

Time for action

Summary

Outlier Detection

Credit card fraud detection and statistical methods

Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods

Intrusion detection and density-based methods

Intrusion detection and clustering-based methods

Monitoring the performance of the web server and classification-based methods

Detecting novelty in text, topic detection, and mining contextual outliers

Collective outliers on spatial data

Outlier detection in high-dimensional data

Time for action

Summary

Mining Stream, Time-series, and Sequence Data

The credit card transaction flow and STREAM algorithm

Predicting future prices and time-series analysis

Stock market data and time-series clustering and classification

Web click streams and mining symbolic sequences

Mining sequence patterns in transactional databases

Time for action

Summary

Graph Mining and Network Analysis

Graph mining

Mining frequent subgraph patterns

Social network mining

Time for action

Summary

Mining Text and Web Data

Text mining and TM packages

Text summarization

The question answering system

Genre categorization of web pages

Categorizing newspaper articles and newswires into topics

Web usage mining with web logs

Time for action

Summary

Algorithms and Data Structures

Index

Index

A

A-Priori algorithm
- about / A-Priori algorithms
- input data characteristics / Input data characteristics and data structure
- data structure / Input data characteristics and data structure
- join action / The A-Priori algorithm
- prune action / The A-Priori algorithm
- R implementation / The R implementation
- variants / A-Priori algorithm variants
AdaBoost algorithm
- about / The boosting and AdaBoost algorithms
affinity propagation (AP) clustering
- about / Unsupervised image categorization and affinity propagation clustering
- R implementation / The R implementation
- used, for unsupervised image categorization / Unsupervised image categorization
- spectral clustering algorithm / The spectral clustering algorithm
agglomerative clustering
- about / News categorization and hierarchical clustering
- pseudocode / Agglomerative hierarchical clustering
algorithm, for association rule generation
- R implementation / The R implementation
association rules
- about / Association rules
- generating, with algorithm / The algorithm to generate association rules
associations
- about / An overview of associations and patterns
associative classification
- about / The associative classification
- Classification Based on Association (CBA) / The associative classification
- Classification Based on Multiple Association Rules (CMAR) / The associative classification
attribute
- about / Data attributes and description
Auto Regressive Integrated Moving Average (ARIMA) algorithm
- about / Predicting future prices and time-series analysis, The ARIMA algorithm
- future prices, predicting / Predicting future prices

B

bagging algorithm
- about / The bagging algorithm
- input parameters / The bagging algorithm
basket
- about / The frequent itemset
Bayes classification
- about / Trojan traffic identification method and Bayes classification
- used, for Trojan traffic identification / Trojan traffic identification method and Bayes classification, Trojan traffic identification method
- prior probability estimation / Prior probability estimation
- likelihood estimation / Likelihood estimation
- pseudocode / The Bayes classification
- R implementation / The R implementation
Bayesian hierarchical clustering algorithm
- about / The Bayesian hierarchical clustering algorithm
- pseudocode / The Bayesian hierarchical clustering algorithm
BBN algorithm
- about / Biological traits and the Bayesian belief network, The Bayesian belief network (BBN) algorithm
- R implementation / The R implementation
- biological traits / Biological traits
big data
- about / Big data
- data types / Big data
- scalability / Scalability and efficiency
- efficiency / Scalability and efficiency
binning
- about / Junk, noisy data, or outlier
BIRCH algorithm
- about / The BIRCH algorithm
- CF-Tree rebuilding / The BIRCH algorithm
- CF-Tree insertion / The BIRCH algorithm
- pseudocode / The BIRCH algorithm
Bonferroni's Principle
- about / The limitations of statistics on data mining
Bonferroni correction
- about / The limitations of statistics on data mining
boosting algorithm
- about / The boosting and AdaBoost algorithms
BP algorithm
- about / Classification using the backpropagation algorithm
- input parameters / The BP algorithm
- pseudocode / The BP algorithm
- R implementation / The R implementation
- parallel version, with MapReduce / Parallel version with MapReduce
brute-force algorithm
- about / The brute-force algorithm

C

C4.5 algorithm
- used, for web spam detection / Web spam detection using C4.5, Web spam detection
- characteristics / Web spam detection using C4.5
- pseudocode / The C4.5 algorithm
- R implementation / The R implementation
- parallel version, MapReduce / A parallel version with MapReduce
CART algorithm
- used, for web key resource page judgment / Web key resource page judgment using CART, Web key resource page judgment
- about / Web key resource page judgment using CART
- characteristics / Web key resource page judgment using CART
- pseudocode / The CART algorithm
- R implementation / The R implementation
categorical attributes
- about / Categorical attributes
- nominal / Categorical attributes
- ordinal / Categorical attributes
CF-Tree
- about / The BIRCH algorithm
chameleon algorithm
- about / The chameleon algorithm
- sparsification / The chameleon algorithm
- graph partitioning / The chameleon algorithm
- agglomerative hierarchical clustering / The chameleon algorithm
Charm algorithm
- about / The Charm algorithm with closed frequent itemsets
- R implementation / The R implementation
CLARA algorithm
- about / The CLARA algorithm
- pseudocode / The CLARA algorithm
- R implementation / The R implementation
CLARANS algorithm
- about / CLARANS
- input parameters / The CLARANS algorithm
- pseudocode / The CLARANS algorithm
- R implementation / The R implementation
classification
- about / Classification
- training (supervised learning) / Classification
- validation / Classification
classification, with frequent patterns
- about / Classification using frequent patterns
- associative classification / The associative classification
- discriminative frequent pattern-based classification / Discriminative frequent pattern-based classification
- R implementation / The R implementation
- sentential frequent itemsets / Text classification using sentential frequent itemsets
classification-based methods
- about / Monitoring the performance of the web server and classification-based methods
- OCSVM (One Class SVM) algorithm / The OCSVM algorithm
- one-class nearest neighbor algorithm / The one-class nearest neighbor algorithm
- R implementation / The R implementation
- web server performance, monitoring / Monitoring the performance of the web server
Classification Based on Association (CBA)
- about / CBA
- pseudocode / CBA
Classification Based on Multiple Association Rules (CMAR)
- about / The associative classification
CLIQUE algorithm
- about / Web sentiment analysis and CLIQUE
- characteristics / Web sentiment analysis and CLIQUE
- pseudocode / The CLIQUE algorithm
- R implementation / The R implementation
- web sentiment analysis / Web sentiment analysis
closed frequent itemsets
- mining, with Charm algorithm / The Charm algorithm with closed frequent itemsets
clustering-based methods
- about / Intrusion detection and clustering-based methods
- hierarchical clustering algorithm / Hierarchical clustering to detect outliers
- k-means algorithm / The k-means-based algorithm
- ODIN algorithm / The ODIN algorithm
- R implementation / The R implementation
collective outliers
- about / Collective outliers on spatial data
- route outlier detection (ROD) algorithm / The route outlier detection (ROD) algorithm
- characteristics / Characteristics of collective outliers
Comprehensive R Archive Network (CRAN)
- about / What are the disadvantages of R?
conditional anomaly detection (CAD) algorithm
- about / The conditional anomaly detection (CAD) algorithm
- R implementation / The R implementation
conditional probability tables (CPT)
- about / Biological traits and the Bayesian belief network
constraint-based frequent pattern mining
- about / Hybrid association rules mining, Constraint-based frequent pattern mining
contextual outliers
- mining / Detecting novelty in text, topic detection, and mining contextual outliers
- conditional anomaly detection (CAD) algorithm / The conditional anomaly detection (CAD) algorithm
continuous, numeric attributes
- about / Numeric attributes
correlation rules
- about / Correlation rules
credit card fraud detection
- about / Credit card fraud detection
credit card transaction flow
- mining / The credit card transaction flow
CRISP-DM
- about / The data mining process
- business understanding / CRISP-DM
- data understanding / CRISP-DM
- data preparation / CRISP-DM
- modeling / CRISP-DM
- evaluation / CRISP-DM
- deployment / CRISP-DM
CRM (Customer Relationship Management)
- about / Web click streams
Cubic Clustering Criterion
- about / The k-means-based algorithm
CUR decomposition
- about / CUR decomposition
customer purchase data analysis
- about / Customer purchase data analysis

D

DASL
- about / Data source
- URL / Data source
data attributes
- about / Data attributes and description
- numeric attributes / Numeric attributes
- categorical attributes / Categorical attributes
data attributes, views
- algebraic or geometric view / Data attributes and description
- probability view / Data attributes and description
data classification
- linearly separable / Document retrieval and Support Vector Machine
- nonlinearly separable / Document retrieval and Support Vector Machine
data cleaning
- about / Data cleaning
- missing values, avoiding / Missing values
- junk / Junk, noisy data, or outlier
- noisy data / Junk, noisy data, or outlier
- outlier / Junk, noisy data, or outlier
data description
- about / Data attributes and description, Data description
- measures of central tendency / Data description
- measures of data dispersion / Data description
data dimension reduction
- about / Data dimension reduction
- eigenvalues / Eigenvalues and Eigenvectors
- eigenvectors / Eigenvalues and Eigenvectors
- PCA / Principal-Component Analysis
- SVD / Singular-value decomposition
- CUR decomposition / CUR decomposition
data discretization
- about / Data transformation and discretization, Data discretization
- by binning / Data discretization
- by histogram analysis / Data discretization
- by cluster analysis / Data discretization
- by decision tree analysis / Data discretization
- by correlation analysis / Data discretization
data integration
- about / Data integration
- issues / Data integration
data measuring
- about / Data measuring
data mining
- about / Data mining
- feature extraction / Feature extraction
- summarization / Summarization
- process / The data mining process
- statistics / Statistics and data mining
Data Quality (DQ)
- about / Data cleaning
dataset
- link-based features / Web spam detection
- content-based features / Web spam detection
data smoothing
- binning / Junk, noisy data, or outlier
- regression / Junk, noisy data, or outlier
- classification / Junk, noisy data, or outlier
- outlier / Junk, noisy data, or outlier
data source
- about / Data source
- online resources / Data source
data transformation
- about / Data transformation and discretization, Data transformation
- smoothing / Data transformation
- attribute construction / Data transformation
- aggregation / Data transformation
- normalization / Data transformation
- discretization / Data transformation
- concept hierarchy generation, for nominal data / Data transformation
- normalization methods / Normalization data transformation methods
DBSCAN algorithm
- about / Customer categorization analysis of e-commerce and DBSCAN
- characteristics / Customer categorization analysis of e-commerce and DBSCAN
- pseudocode / The DBSCAN algorithm
- customer categorization analysis, of e-commerce / Customer categorization analysis of e-commerce
decision tree
- about / Generic decision tree induction
decision tree induction
- about / Generic decision tree induction
- characteristics / Generic decision tree induction
- attribute selection measures / Attribute selection measures
- tree pruning / Tree pruning
- algorithm, pseudocode / General algorithm for the decision tree generation
- R implementation / The R implementation
decision tree induction, attribute selection measures
- Entropy / Attribute selection measures
- Gain / Attribute selection measures
- Gain Ratio / Attribute selection measures
- Information Gain / Attribute selection measures
- Gini Index / Attribute selection measures
- Split Info / Attribute selection measures
DENCLUE algorithm
- about / Visitor analysis in the browser cache and DENCLUE
- density attractor / Visitor analysis in the browser cache and DENCLUE
- influence function / Visitor analysis in the browser cache and DENCLUE
- density function / Visitor analysis in the browser cache and DENCLUE
- gradient / Visitor analysis in the browser cache and DENCLUE
- pseudocode / The DENCLUE algorithm
- R implementation / The R implementation
- visitor analysis, in browser cache / Visitor analysis in the browser cache
density-based cluster
- about / Customer categorization analysis of e-commerce and DBSCAN
density-based methods
- about / Intrusion detection and density-based methods
- OPTICS-OF algorithm / The OPTICS-OF algorithm
- High Contrast Subspace (HiCS) algorithm / The High Contrast Subspace algorithm
- R implementation / The R implementation
- intrusion detection / Intrusion detection
directed graphs
- about / Graph
discrete, numeric attributes
- about / Numeric attributes
discriminative frequent pattern-based classification
- about / Discriminative frequent pattern-based classification
- pseudocode / Discriminative frequent pattern-based classification
disjunctive normal form (DNF)
- about / Web sentiment analysis and CLIQUE
distance-based outlier detection algorithm
- about / The distance-based algorithm
divisive clustering
- about / News categorization and hierarchical clustering
document retrieval
- with SVM algorithm / Document retrieval
document text
- automatic abstraction, k-medoids algorithm used / Automatic abstraction and summarization of document text
Dolphin algorithm
- about / The Dolphin algorithm

E

e-commerce
- customer categorization analysis / Customer categorization analysis of e-commerce
Eclat algorithm
- about / The Eclat algorithm
- R implementation / The R implementation
eigenvalues
- about / Eigenvalues and Eigenvectors
eigenvectors
- about / Eigenvalues and Eigenvectors
EM methods
- about / Ensemble (EM) methods
- structure / Ensemble (EM) methods
- bagging algorithm / The bagging algorithm
- AdaBoost algorithm / The boosting and AdaBoost algorithms
- boosting algorithm / The boosting and AdaBoost algorithms
- Random forests algorithm / The Random forests algorithm
- R implementation / The R implementation
- parallel version, with MapReduce / Parallel version with MapReduce
description length (DL)
- about / The RIPPER algorithm
Expectation Maximization (EM) algorithm
- about / User search intent and the EM algorithm
- pseudocode / The EM algorithm
- R implementation / The R implementation
- user search intent, determining / The user search intent

F

FCA-based association rule mining algorithm
- used, for web usage mining / The FCA-based association rule mining algorithm
- R implementation / The R implementation
feature extraction, examples
- frequent itemsets / Feature extraction
- similar items / Feature extraction
FindAllOutsD algorithm
- about / The FindAllOutsD algorithm
FindAllOutsM algorithm
- about / The FindAllOutsM algorithm
FP-growth algorithm
- about / The FP-growth algorithm
- input data characteristics / Input data characteristics and data structure
- data structure / Input data characteristics and data structure
- pseudocode / The FP-growth algorithm
- R implementation / The R implementation
frequent itemset
- about / The frequent itemset
Frequent Itemset Mining Dataset Repository
- URL / Data source
- about / Data source
frequent patterns
- about / Patterns and pattern discovery
- frequent itemset / Patterns and pattern discovery, The frequent itemset
- frequent substructures / Patterns and pattern discovery, The frequent substructures
- frequent subsequence / Patterns and pattern discovery, The frequent subsequence
frequent subgraph patterns mining algorithm
- about / Mining frequent subgraph patterns
- gPLS algorithm / Mining frequent subgraph patterns
- GraphSig algorithm / The gSpan algorithm
- gSpan algorithm / The gSpan algorithm
- R implementation / The R implementation
frequent subsequence
- about / The frequent subsequence
- examples / The frequent subsequence
frequent substructures
- about / The frequent substructures
- examples / The frequent substructures
future prices
- predicting / Predicting future prices

G

GenMax algorithm
- about / The GenMax algorithm with maximal frequent itemsets
- R implementation / The R implementation
genre categorization
- of web pages / Genre categorization of web pages
graph
- about / Graph
- directed graphs / Graph
- undirected graphs / Graph
Graph-Based Sub-topic Partition Algorithm (GSPSummary) algorithm
- about / The multidocument summarization algorithm
graph and network data
- clustering / SNS and clustering graph and network data
graph mining
- about / Graph mining
- algorithms / Graph mining algorithms
GSP algorithm
- sequence dataset, mining / The GSP algorithm
- features / The GSP algorithm
- R implementation / The R implementation

H

hError algorithm
- about / The hError algorithm
- R implementation / The R implementation
hierarchical clustering
- about / News categorization and hierarchical clustering
- agglomerative clustering / News categorization and hierarchical clustering
- divisive clustering / News categorization and hierarchical clustering
- characteristics / News categorization and hierarchical clustering
- BIRCH algorithm / The BIRCH algorithm
- chameleon algorithm / The chameleon algorithm
- Bayesian hierarchical clustering algorithm / The Bayesian hierarchical clustering algorithm
- probabilistic hierarchical clustering algorithm / The probabilistic hierarchical clustering algorithm
- R implementation / The R implementation
- used, for news categorization / News categorization
hierarchical clustering algorithm
- about / Hierarchical clustering to detect outliers
high-dimensional data
- clustering / Customer purchase data analysis and clustering high-dimensional data
high-performance algorithms
- about / High-performance algorithms
high-value credit card customers
- classifying, ID3 algorithm used / High-value credit card customers classification using ID3, High-value credit card customers classification
High Contrast Subspace (HiCS) algorithm
- about / The High Contrast Subspace algorithm
HilOut algorithm
- about / The HilOut algorithm
- R implementation / The R implementation
horizontal format
- about / Input data characteristics and data structure
hybrid association rules mining
- about / Hybrid association rules mining
- multilevel and multidimensional association rules mining / Hybrid association rules mining, Mining multilevel and multidimensional association rules
- constraint-based frequent pattern mining / Hybrid association rules mining, Constraint-based frequent pattern mining

I

ID3 algorithm
- about / High-value credit card customers classification using ID3
- used, for classifying high-value credit card customers / High-value credit card customers classification using ID3, High-value credit card customers classification
- input parameters / The ID3 algorithm
- output parameter / The ID3 algorithm
- pseudocode / The ID3 algorithm
- R implementation / The R implementation
- used, for web attack detection / Web attack detection
interval-scaled
- dissimilarity / Data measuring
intrusion detection
- about / Intrusion detection
Intrusion Detection System (IDS)
- about / Web attack detection
IR
- about / Information retrieval and text mining
iterative classification algorithms
- about / The node classification and iterative classification algorithms

K

k-itemset
- about / The frequent itemset
k-means algorithm
- about / Search engines and the k-means algorithm, The k-means-based algorithm
- search engine / Search engines and the k-means algorithm, Search engine and web page clustering
- shortcomings / Search engines and the k-means algorithm
- guidelines / Search engines and the k-means algorithm
- pseudocode / The k-means clustering algorithm
- kernel k-means algorithm, pseudocode / The kernel k-means algorithm
- k-modes algorithm / The k-modes algorithm
- R implementation / The R implementation
- parallel version, with MapReduce / Parallel version with MapReduce
k-medoids algorithm
- about / Automatic abstraction of document texts and the k-medoids algorithm
- case considerations / Automatic abstraction of document texts and the k-medoids algorithm
- PAM algorithm / The PAM algorithm
- R implementation / The R implementation
- used, for automatic abstraction of document text / Automatic abstraction and summarization of document text
kNN algorithm
- about / Protein classification and the k-Nearest Neighbors algorithm, The kNN algorithm
- used, for protein classification / Protein classification and the k-Nearest Neighbors algorithm
- pseudocode / The kNN algorithm
- R implementation / The R implementation

L

likelihood-based outlier detection algorithm
- about / The likelihood-based outlier detection algorithm
- R implementation / The R implementation
Local Outlier Factor (LOF)
- about / Intrusion detection and density-based methods
Local Reachability Density (LRD)
- about / Intrusion detection and density-based methods

M

machine learning
- statistics / Statistics and machine learning
machine learning (ML)
- about / Web data mining, Machine learning
- architecture / Machine learning architecture
- training and testing / Machine learning architecture
- batch versus online learning / Machine learning architecture
- feature selection / Machine learning architecture
- training set, creating / Machine learning architecture
machine learning (ML), classes
- decision tree / Approaches to machine learning
- perceptron / Approaches to machine learning
- neural nets / Approaches to machine learning
- instance-based learning / Approaches to machine learning
- support-vector machines / Approaches to machine learning
MAFIA algorithm
- about / Customer purchase data analysis and clustering high-dimensional data
- pseudocode / The MAFIA algorithm
- customer purchase data analysis / Customer purchase data analysis
MapReduce
- C4.5 algorithm, parallel version / A parallel version with MapReduce
- EM methods, parallel version / Parallel version with MapReduce
- SVM algorithm, parallel version / Parallel version with MapReduce
- BP algorithm, parallel version / Parallel version with MapReduce
- k-means algorithm, parallel version / Parallel version with MapReduce
market basket analysis
- about / Market basket analysis
- market basket model / The market basket model
- A-Priori algorithm / A-Priori algorithms
- Eclat algorithm / The Eclat algorithm
- FP-growth algorithm / The FP-growth algorithm
- GenMax algorithm / The GenMax algorithm with maximal frequent itemsets
- Charm algorithm / The Charm algorithm with closed frequent itemsets
- association rules, generating / The algorithm to generate association rules
market basket model
- about / The market basket model
maximal frequent itemset (MFI)
- about / The GenMax algorithm with maximal frequent itemsets
- mining, with GenMax algorithm / The GenMax algorithm with maximal frequent itemsets
Maximal Marginal Relevance (MMR) algorithm
- about / The Maximal Marginal Relevance algorithm
- R implementation / The R implementation
Maximum Likelihood Estimation (MLE)
- about / User search intent and the EM algorithm
missing values
- avoiding / Missing values
- considerations / Missing values
mobile fraud detection
- about / Activity monitoring and the detection of mobile fraud
multidocument summarization algorithm
- about / The multidocument summarization algorithm
multilevel and multidimensional association rules mining
- about / Mining multilevel and multidimensional association rules

N

1NN classifier algorithm
- about / Time-series classification with the 1NN classifier
N-gram-based text-categorization algorithm
- used, for categorizing newspaper articles / Categorizing newspaper articles and newswires into topics
- used, for categorizing newswires / Categorizing newspaper articles and newswires into topics
- about / The N-gram-based text categorization
- pseudocode / The N-gram-based text categorization
- R implementation / The R implementation
Naïve Bayes classification
- used, for identifying spam e-mail / Identify spam e-mail and Naïve Bayes classification, Identify spam e-mail
- characteristics / Identify spam e-mail and Naïve Bayes classification
- pseudocode / The Naïve Bayes classification
- R implementation / The R implementation
news categorization
- with hierarchical clustering / News categorization
NL algorithm
- about / The NL algorithm
nominal attributes
- dissimilarity / Data measuring
normalization methods, data transformation
- min-max normalization / Normalization data transformation methods
- z-score normalization / Normalization data transformation methods
- normalization by decimal scaling / Normalization data transformation methods
numeric attributes
- about / Numeric attributes
numeric attributes, types
- interval-scaled / Numeric attributes
- ratio-scaled / Numeric attributes

O

OCSVM (One Class SVM) algorithm
- about / The OCSVM algorithm
ODIN algorithm
- about / The ODIN algorithm
one-class nearest neighbor algorithm
- about / The one-class nearest neighbor algorithm
opinion-orientation algorithm
- about / Opinion mining
opinion mining
- about / Opinion mining
OPTICS-OF algorithm
- about / The OPTICS-OF algorithm
OPTICS algorithm
- about / Clustering web pages and OPTICS, The OPTICS algorithm
- core-distance of object / Clustering web pages and OPTICS
- reachability-distance of object / Clustering web pages and OPTICS
- pseudocode / The OPTICS algorithm
- R implementation / The R implementation
- web pages, clustering / Clustering web pages
ordinal attributes
- dissimilarity / Data measuring
outlier detection
- with statistical method / Credit card fraud detection and statistical methods
- proximity-based methods / Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods
- density-based methods / Intrusion detection and density-based methods
- clustering-based methods / Intrusion detection and clustering-based methods
- classification-based methods / Monitoring the performance of the web server and classification-based methods
- topic detection / Detecting novelty in text and topic detection
- novelty, detecting in text / Detecting novelty in text and topic detection
- in high-dimensional data / Outlier detection in high-dimensional data
- brute-force algorithm / The brute-force algorithm
- HilOut algorithm / The HilOut algorithm

P

PAM algorithm
- about / The PAM algorithm
partition-based clustering
- about / Search engines and the k-means algorithm
- characteristics / Search engines and the k-means algorithm
patterns
- about / An overview of associations and patterns
- frequent patterns / Patterns and pattern discovery
PCA
- about / Principal-Component Analysis
PrefixSpan algorithm
- about / The PrefixSpan algorithm
- R implementation / The R implementation
probabilistic hierarchical clustering algorithm
- about / The probabilistic hierarchical clustering algorithm
process, data mining
- CRISP-DM / The data mining process, CRISP-DM
- SEMMA / The data mining process, SEMMA
proximity-based methods
- about / Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods
- density-based outlier detection algorithm / Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods
- distance-based outlier detection algorithm / Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods, The distance-based algorithm
- NL algorithm / The NL algorithm
- FindAllOutsM algorithm / The FindAllOutsM algorithm
- FindAllOutsD algorithm / The FindAllOutsD algorithm
- Dolphin algorithm / The Dolphin algorithm
- R implementation / The R implementation
- activity monitoring / Activity monitoring and the detection of mobile fraud
- mobile fraud detection / Activity monitoring and the detection of mobile fraud

Q

queries
- keyword query / Information retrieval and text mining
- Boolean query / Information retrieval and text mining
- phrase query / Information retrieval and text mining
- proximity query / Information retrieval and text mining
- full document query / Information retrieval and text mining
- natural language questions / Information retrieval and text mining
question answering (QA) system
- about / The question answering system

R

R
- about / Why R?
- advantage / Why R?
- disadvantage / What are the disadvantages of R?
- statistics / Statistics and R
- visualization / Visualization with R
Random forests algorithm
- about / The Random forests algorithm
recommendation systems
- about / Recommendation systems
Relative Closeness (RC), chameleon algorithm
- about / The chameleon algorithm
Relative Interconnectivity (RI), chameleon algorithm
- about / The chameleon algorithm
RHadoop
- about / Big data
RIPPER algorithm
- about / The RIPPER algorithm
- pseudocode / The RIPPER algorithm
route outlier detection (ROD) algorithm
- about / The route outlier detection (ROD) algorithm
- R implementation / The R implementation
rule-based classification
- about / Rule-based classification of player types in computer games and rule-based classification, Rule-based classification
- decision tree, transforming into decision rules / Transformation from decision tree to decision rules
- sequential covering algorithm / Sequential covering algorithm
- RIPPER algorithm / The RIPPER algorithm
- R implementation / The R implementation
- player types, classifying in computer games / Rule-based classification of player types in computer games
rules
- association rules / Relationship or rules discovery, Association rules
- correlation rules / Relationship or rules discovery, Correlation rules
- generating, from sequential patterns / Rule generation from sequential patterns

S

search engine
- web page clustering / Search engine and web page clustering
SEMMA
- about / The data mining process, SEMMA
- sample / SEMMA
- explore / SEMMA
- modify / SEMMA
- model / SEMMA
- assess / SEMMA
sentential frequent itemsets
- used, for text classification / Text classification using sentential frequent itemsets
sequence dataset
- mining / Mining sequence dataset
- about / Sequence dataset
- mining, with GSP algorithm / The GSP algorithm
sequence patterns
- mining / Mining sequence patterns in transactional databases
- PrefixSpan algorithm / The PrefixSpan algorithm
sequential covering algorithm
- about / Sequential covering algorithm
- pseudocode / Sequential covering algorithm
sequential patterns
- rules, generating / Rule generation from sequential patterns
shingling algorithm
- about / Social network mining
single-pass-any-time clustering algorithm
- about / The single-pass-any-time clustering algorithm
social network
- mining / Social network mining
- characteristics / Social network
- telephone networks / Social network
- e-mail networks / Social network
- collaboration networks / Social network
- example / Social network
social networking service (SNS)
- about / Social networking service (SNS)
social network mining
- about / Social network mining
- community detection / Social network mining
- shingling algorithm / Social network mining
- node classification / The node classification and iterative classification algorithms
- iterative classification algorithms / The node classification and iterative classification algorithms
- R implementation / The R implementation
SPADE algorithm
- about / The SPADE algorithm
- features / The SPADE algorithm
- R implementation / The R implementation
spam e-mail
- identifying, Naïve Bayes classification used / Identify spam e-mail and Naïve Bayes classification, Identify spam e-mail
spectral clustering algorithm
- about / The spectral clustering algorithm
- pseudocode / The spectral clustering algorithm
- R implementation / The R implementation
squared error-based clustering algorithm
- about / Search engines and the k-means algorithm
statistical method
- about / Credit card fraud detection and statistical methods
- likelihood-based outlier detection algorithm / The likelihood-based outlier detection algorithm
- credit card fraud detection / Credit card fraud detection
statistics
- about / Statistics
- data mining / Statistics and data mining
- machine learning / Statistics and machine learning
- and R / Statistics and R
- limitations, on data mining / The limitations of statistics on data mining
STING algorithm
- about / Recommendation system and STING
- characteristics / Recommendation system and STING
- pseudocode / The STING algorithm
- R implementation / The R implementation
- recommendation systems / Recommendation systems
stock market data
- about / Stock market data
STREAM algorithm
- about / The credit card transaction flow and STREAM algorithm
- pseudocode / The STREAM algorithm
- R implementation / The R implementation
- credit card transaction flow / The credit card transaction flow
stream data
- mining / The credit card transaction flow and STREAM algorithm
Structural Clustering Algorithm for Network (SCAN) algorithm
- about / SNS and clustering graph and network data
- pseudocode / The SCAN algorithm
- R implementation / The R implementation
- social networking service (SNS) / Social networking service (SNS)
summarization
- about / Summarization
SURFING algorithm
- pseudocode / The SURFING algorithm
- about / The SURFING algorithm
- R implementation / The R implementation
SVD
- about / Singular-value decomposition
SVM algorithm
- about / Document retrieval and Support Vector Machine
- pseudocode / The SVM algorithm
- R implementation / The R implementation
- parallel version, with MapReduce / Parallel version with MapReduce
- used, for document retrieval / Document retrieval
symbolic sequences
- mining / Web click streams and mining symbolic sequences

T

Term Frequency-Inverse Document Frequency (TF-IDF)
- about / Search engine and web page clustering
text classification
- with sentential frequent itemsets / Text classification using sentential frequent itemsets
text mining
- about / Text mining, Text mining and TM packages
- IR / Information retrieval and text mining
- for prediction / Mining text for prediction
Text Retrieval Conference (TREC)
- about / Identify spam e-mail
text summarization
- about / Text summarization
- topic representation / Topic representation
- multidocument summarization algorithm / The multidocument summarization algorithm
- Maximal Marginal Relevance (MMR) algorithm / The Maximal Marginal Relevance algorithm
time-series data
- mining / Predicting future prices and time-series analysis
- clustering / Stock market data and time-series clustering and classification
- clustering, with hError algorithm / The hError algorithm
- classification, with 1NN classifier algorithm / Time-series classification with the 1NN classifier
- stock market data / Stock market data
Time To Live (TTL)
- about / Trojan traffic identification method
topic detection
- about / Detecting novelty in text and topic detection
topic representation
- about / Topic representation
topic signature
- about / Topic representation
Tracking Evolving Clusters in NOisy Streams (TECNO-STREAMS) algorithm
- about / Web click streams and mining symbolic sequences, The TECNO-STREAMS algorithm
- R implementation / The R implementation
- used, for mining web click streams / Web click streams
tree pruning
- about / Tree pruning
- post-pruning / Tree pruning
- pre-pruning / Tree pruning
Trojan horse
- about / Trojan traffic identification method
Trojan traffic identification
- with Bayes classification / Trojan traffic identification method and Bayes classification, Trojan traffic identification method

U

UCI Machine Learning Repository
- about / Data source
- URL / Data source
undirected graphs
- about / Graph
unsupervised image categorization
- with affinity propagation (AP) clustering / Unsupervised image categorization
user search intent
- determining / The user search intent

V

vector-space model
- about / Search engine and web page clustering
vertical format
- about / Input data characteristics and data structure
visitor analysis, in browser cache
- hit / Visitor analysis in the browser cache
- unique visitors / Visitor analysis in the browser cache
- new/return visitors / Visitor analysis in the browser cache
- page views / Visitor analysis in the browser cache
- page views per visitor / Visitor analysis in the browser cache
- IP address / Visitor analysis in the browser cache
- visitor location / Visitor analysis in the browser cache
- visitor language / Visitor analysis in the browser cache
- referring pages/sites (URLs) / Visitor analysis in the browser cache
- keywords / Visitor analysis in the browser cache
- browser type / Visitor analysis in the browser cache
- operating system version / Visitor analysis in the browser cache
- screen resolution / Visitor analysis in the browser cache
- Java or Flash-enabled / Visitor analysis in the browser cache
- connection speed / Visitor analysis in the browser cache
- errors / Visitor analysis in the browser cache
- visit duration / Visitor analysis in the browser cache
- visitor paths/navigation / Visitor analysis in the browser cache
- bounce rate / Visitor analysis in the browser cache
visualization
- about / Visualization of results
- with R / Visualization with R
visualization, features
- novel / Visualization of results
- informative / Visualization of results
- efficient / Visualization of results
- aesthetic / Visualization of results

W

WAVE clustering algorithm
- about / Opinion mining and WAVE clustering
- characteristics / Opinion mining and WAVE clustering
- pseudocode / The WAVE cluster algorithm
- R implementation / The R implementation
- opinion mining / Opinion mining
web attack
- detecting, ID3 algorithm used / Web attack detection
- DOS / Web attack detection
- R2L / Web attack detection
- U2R / Web attack detection
- probing / Web attack detection
web click streams
- mining / Web click streams and mining symbolic sequences, Web click streams
web data mining
- about / Web data mining
- web structure mining / Web data mining
- web content mining / Web data mining
- web usage mining / Web data mining
web data mining, tasks
- information extraction (IE) / Web data mining
- natural language processing (NLP) / Web data mining
- question answering / Web data mining
- resource discovery / Web data mining
web key resource page judgment
- with CART algorithm / Web key resource page judgment using CART, Web key resource page judgment
- attributes / Web key resource page judgment
web logs
- used, for web usage mining / Web usage mining with web logs
web page clustering
- about / Search engine and web page clustering
web pages
- clustering / Clustering web pages
- genre categorization / Genre categorization of web pages
web sentiment analysis
- about / Web sentiment analysis
web server
- performance, monitoring / Monitoring the performance of the web server
web spam
- detecting, C4.5 algorithm used / Web spam detection using C4.5, Web spam detection
- link spam / Web spam detection
- content spam / Web spam detection
- cloaking / Web spam detection
web usage mining
- with web logs / Web usage mining with web logs
- with FCA-based association rule mining algorithm / The FCA-based association rule mining algorithm
WordNet
- URL / Data source
