R: Mining spatial, text, web, and social media data

By : Nathan H. Danneman, Richard Heimann, Pradeepta Mishra, Bater Makhabel

Overview of this book

Data mining is the first step to understanding data and making sense of heaps of it. Properly mined data forms the basis of all the analysis and computing performed on it. This learning path takes you from the very basics of data mining to advanced techniques, ending with a specialized branch of data mining: social media mining. You will learn how to manipulate data with R using code snippets and how to mine frequent patterns, associations, and correlations while working with R programs. You will discover how to write code for various prediction models, stream data, and time-series data, and you will be introduced to solutions written in R based on R Hadoop projects. Once you are comfortable with data mining in R, you will move on to implementing your knowledge in end-to-end data mining projects, applying different mining concepts to statistical and data applications in a wide range of fields. At that stage, you will be able to complete complex data mining cases and handle any issues you might encounter during projects. After this, you will gain hands-on experience of generating insights from social media data. You will get detailed instructions on how to obtain, process, and analyze a variety of socially generated data, along with the theoretical background needed to interpret your findings accurately. You will be shown R code and example data that can be used as a springboard as you undertake your own analyses of business, social, or political data. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:

  • Learning Data Mining with R by Bater Makhabel
  • R Data Mining Blueprints by Pradeepta Mishra
  • Social Media Mining with R by Nathan Danneman and Richard Heimann

Chapter 4. Advanced Classification

In this chapter, you will learn about the top classification algorithms written in the R language. You will also learn ways to improve these classifiers.

We will cover the following topics:

  • Ensemble methods
  • Biological traits and Bayesian belief network
  • Protein classification and the k-Nearest Neighbors algorithm
  • Document retrieval and Support Vector Machine
  • Text classification using sentential frequent itemsets and classification using frequent patterns
  • Classification using the backpropagation algorithm

Ensemble (EM) methods

Ensemble (EM) methods are developed to improve the accuracy of classification. An ensemble is often markedly more accurate than its base classifiers, because it misclassifies a tuple only when at least half of its base classifiers are wrong.

The concept structure of EM methods is illustrated in the following diagram:

[Diagram: the concept structure of ensemble methods]

The label for a new data tuple is the result of voting by a group of base classifiers. A combined classifier is created from several base classifiers, each trained on a different training set obtained by resampling the original training dataset with replacement.

Three popular EM methods are discussed in the successive sections:

  • Bagging
  • Boosting
  • Random forests

The bagging algorithm

Here is a concise description of the bagging algorithm (also known as bootstrap aggregation), followed by the summarized pseudocode. For iteration i (i = 1, 2, ..., k), a training set, D_i, of d tuples is sampled with replacement from the original set of tuples, D. Each training set is a bootstrap sample and is used to learn a classifier model, M_i. To classify an unknown or test tuple, X, each classifier, M_i, returns its class prediction, which counts as one vote. The number of classifiers that predict the same class, c_j, for the test tuple X is as follows:

count(c_j) = sum_{i=1}^{k} I(M_i(X) = c_j), where I(·) equals 1 if its argument is true and 0 otherwise

The bagged classifier, M*, counts the votes and assigns the class with the most votes to X. Each vote carries the same weight in the equation:

M*(X) = argmax_{c_j} sum_{i=1}^{k} I(M_i(X) = c_j)

For the prediction of continuous values, the average of the individual predictions for a given test tuple is used as the result. The algorithm reduces the variance and usually gives a more accurate result than a single base classifier.

The input parameters for the bagging algorithm are as follows:

  • D: This is the training tuples dataset
  • K: This is the number of classifiers combined
  • S: This is a classification learning algorithm or scheme used to learn the base classifiers
  • M*: This is the ensemble classifier, which is the output of the algorithm

The summarized pseudocode for the bagging algorithm is as follows:

[Pseudocode: the bagging algorithm]
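
To make the idea concrete, here is a minimal sketch of bagging built on rpart decision trees; the helper names bagging_fit and bagging_predict are ours for illustration and this is not the bundled ch_04_bagging.R code.

# A minimal bagging sketch with rpart trees on the iris data
library(rpart)

bagging_fit <- function(formula, data, k = 25) {
  lapply(seq_len(k), function(i) {
    # bootstrap sample: d tuples drawn with replacement from the original set D
    boot <- data[sample(nrow(data), replace = TRUE), ]
    rpart(formula, data = boot, method = "class")
  })
}

bagging_predict <- function(models, newdata) {
  # each base classifier casts one vote; the majority class wins
  votes <- sapply(models, function(m) as.character(predict(m, newdata, type = "class")))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

models <- bagging_fit(Species ~ ., iris, k = 25)
head(bagging_predict(models, iris))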

The boosting and AdaBoost algorithms

As opposed to the bagging algorithm, boosting uses weighted voting and a weighted training tuple dataset for each base classifier. The base classifiers are learned iteratively. Once a classifier is learned, the tuple weights are updated according to a certain rule before the next base classifier is learned, so each successive model emphasizes the tuples misclassified by the previous classifier. As a direct result, the accuracy of each classifier determines the weight of its vote for the final label once an unknown test tuple is given to the combined classifier.

Adaptive Boosting, or AdaBoost, is one of the boosting algorithms. If the ensemble contains K base classifiers, AdaBoost is performed in K passes. Given the tuple dataset D and its classifier M_i, the error rate error(M_i) of classifier M_i is defined as follows:

error(M_i) = sum_{j=1}^{d} w_j × err(X_j)

The new classifier M_i is discarded once its error rate is bigger than 0.5. In that case, the training tuple set is resampled for this classifier and its training is performed again from scratch.

For a tuple X_j, the error function is as follows:

err(X_j) = 1 if X_j is misclassified by M_i, and 0 otherwise

All the weights of the tuples in the training dataset D are initialized to 1/d. When the classifier M_i has been learned from the training set D_i, the weight of each correctly classified tuple is multiplied by error(M_i) / (1 - error(M_i)). After the updates, the weights of all tuples are normalized, so the relative weight of the misclassified tuples increases while that of the correctly classified tuples decreases.

The weight of the vote of classifier M_i is as follows:

w_i = log((1 - error(M_i)) / error(M_i))

We define W(c) as the weight of the vote for class c over the K classifiers, with w_i representing the weight of the ith classifier:

W(c) = sum_{i=1}^{K} w_i × I(M_i(X) = c)

The AdaBoost combined classifier, M*, counts the votes, each multiplied by the weight of the classifier that cast it, and assigns the class with the highest total weight to X:

M*(X) = argmax_c W(c) = argmax_c sum_{i=1}^{K} w_i × I(M_i(X) = c)

The input parameters for the AdaBoost algorithm are as follows:

  • D, which denotes a set of training tuples
  • k, which is the number of rounds
  • A classification learning algorithm

The output of the algorithm is a composite model. The pseudocode of AdaBoost is listed here:

[Pseudocode: the AdaBoost algorithm]
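
As a rough illustration of the weight updates described above, here is a hedged sketch of AdaBoost with rpart stumps on a two-class problem; the function names are ours and this is not the bundled ch_04_adaboost.R code.

# A minimal AdaBoost.M1 sketch with rpart decision stumps
library(rpart)

adaboost_fit <- function(formula, data, k = 10) {
  environment(formula) <- environment()              # let rpart find the local weight vector
  d <- nrow(data)
  w <- rep(1 / d, d)                                 # initial tuple weights, 1/d each
  y <- model.response(model.frame(formula, data))
  models <- list(); alphas <- numeric(0)
  for (i in seq_len(k)) {
    m    <- rpart(formula, data = data, weights = w, method = "class",
                  control = rpart.control(maxdepth = 1))
    pred <- predict(m, data, type = "class")
    err  <- max(sum(w * (pred != y)), 1e-10)         # weighted error rate
    if (err > 0.5) { w <- rep(1 / d, d); next }      # discard this classifier, reset weights
    w[pred == y] <- w[pred == y] * err / (1 - err)   # shrink weights of correctly classified tuples
    w <- w / sum(w)                                  # normalize
    models[[length(models) + 1]] <- m
    alphas <- c(alphas, log((1 - err) / err))        # vote weight of this classifier
  }
  list(models = models, alphas = alphas)
}

adaboost_predict <- function(fit, newdata) {
  classes <- levels(predict(fit$models[[1]], newdata, type = "class"))
  votes <- sapply(classes, function(cl) {
    rowSums(sapply(seq_along(fit$models), function(i) {
      fit$alphas[i] * (predict(fit$models[[i]], newdata, type = "class") == cl)
    }))
  })
  classes[max.col(votes)]                            # class with the highest weighted vote
}

dat <- droplevels(iris[iris$Species != "setosa", ])  # a two-class toy problem
fit <- adaboost_fit(Species ~ ., dat, k = 10)
mean(adaboost_predict(fit, dat) == dat$Species)      # training accuracy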

The Random forests algorithm

The random forests algorithm is an ensemble method that combines a group of decision trees, each grown by applying a random selection of attributes at every node to be split. Given an unknown tuple, each tree votes, and the most popular class decides the final result. The pseudocode for the ForestRI algorithm, which generates such a forest, is as follows:

[Pseudocode: the ForestRI algorithm]

In line 2, T denotes a total order of the variables. In line 5, π(X_i) denotes the set of variables preceding X_i. Prior knowledge is required for line 6.

For another algorithm, ForestRC, instead of a random selection of attributes at each split, a random linear combination of the existing attributes is used. New attributes are built as random linear combinations of the original attribute set, and with a couple of new attributes added, the best split is searched over the updated attribute set containing both the new and the original attributes.
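
As a quick illustration (not the bundled ForestRI/ForestRC code), the randomForest package grows each tree on a bootstrap sample and selects mtry attributes at random at every split:

# Random forest on iris: 500 trees, 2 randomly chosen attributes per split
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2)
print(rf)                    # out-of-bag error estimate and confusion matrix
head(predict(rf, iris))      # majority vote of the 500 trees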

The R implementation

Here we provide R implementations of bagging, AdaBoost, and random forests. Please look up the R code files ch_04_bagging.R, ch_04_adaboost.R, ch_04_forestrc.R, and ch_04_forestri.R from the code bundle for the previously mentioned algorithms. The code can be tested with the following commands:

> source("ch_04_bagging.R")
> source("ch_04_adaboost.R")
> source("ch_04_forestrc.R")
> source("ch_04_forestri.R")

Parallel version with MapReduce

The following algorithm is the parallelized AdaBoost algorithm, which relies on a number of workers to construct the boosting classifiers. The dataset for the pth worker is defined using the following formula, where d_p denotes its size:

[Equation: the dataset assigned to the pth worker]

The classifier produced by each worker is defined in the following format, together with its weight:

[Equation: the per-worker classifier and its weight]

The input is the training dataset distributed across the M workers; the output is the final combined classifier:

[Pseudocode: the parallel AdaBoost algorithm]

Biological traits and the Bayesian belief network

The Bayesian belief network, once trained, can be used for classification. It is based on Bayes' theorem, which is defined in The Bayes classification section of Chapter 3, Classification. The network is defined by two parts: a directed acyclic graph and a conditional probability table (CPT) for each variable, where each variable is represented by one node in the graph. The network models uncertainty by graphically representing the conditional dependencies between distinct components. The arcs in the graph represent causal knowledge, and the interaction among the diverse sources of uncertainty is also illustrated graphically.

The uncertainty comes from various sources:

  • The way to associate the knowledge by the expert
  • The domain intrinsic uncertainty
  • The requirement of the knowledge to be translated
  • The accuracy and availability of knowledge

Here is an example of a Bayesian belief network with four Boolean variables and the corresponding arcs. Whether the grass is wet is influenced by whether the sprinkler was on and whether it has just rained, and so on. Each arc carries a conditional probability.

Let us have a look at the CPT representation of this network:

[Diagram: the example Bayesian belief network with its CPTs]

In the network, each variable is conditionally independent of its non-descendants given its parents. Here is the definition of the joint probability distribution:

P(x_1, ..., x_n) = prod_{i=1}^{n} P(x_i | Parents(X_i))
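
As a small worked example of this factorization, the following base R snippet evaluates one joint probability for a three-variable rain/sprinkler/wet-grass network; the CPT numbers are illustrative and not taken from the text.

# Illustrative CPTs for a rain -> sprinkler -> wet-grass network
p_rain      <- c(yes = 0.2, no = 0.8)
p_sprinkler <- list(rain_yes = c(yes = 0.01, no = 0.99),   # P(Sprinkler | Rain)
                    rain_no  = c(yes = 0.40, no = 0.60))
p_wet <- list(s_yes_r_yes = 0.99, s_yes_r_no = 0.90,       # P(WetGrass = yes | Sprinkler, Rain)
              s_no_r_yes  = 0.80, s_no_r_no  = 0.00)

# P(Rain = yes, Sprinkler = no, WetGrass = yes)
#   = P(Rain = yes) * P(Sprinkler = no | Rain = yes) * P(WetGrass = yes | Sprinkler = no, Rain = yes)
p_joint <- p_rain["yes"] * p_sprinkler$rain_yes["no"] * p_wet$s_no_r_yes
p_joint   # 0.2 * 0.99 * 0.80 = 0.1584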

The Bayesian belief network (BBN) algorithm

Before applying the BBN algorithm to classification, we need to train the network first. Expert knowledge, that is, prior knowledge, can be used during training to help design the network. For the variables that participate in a direct dependency, experts must specify their conditional probabilities. Many algorithms exist to learn the network from the training dataset; here we introduce an adaptive probabilistic networks algorithm.

The input parameters for the BBN algorithm are as follows:

  • T, which denotes a total order of the variables
  • CPT

The output of the algorithm is the topology structure of BBN, which is as follows:

[Pseudocode: learning the BBN structure]

In line 2, T denotes a total order of the variables. In line 5, π(X_i) denotes the set of variables preceding X_i.

The R implementation

Please look up the R code file ch_04_bnn.R from the code bundle for the previously mentioned algorithm. The code can be tested with the following command:

> source("ch_04_bnn.R")

Biological traits

The analysis of biological traits is one of the important applications of the BBN algorithm.

Protein classification and the k-Nearest Neighbors algorithm

The k-Nearest Neighbors (kNN) algorithm is one of the lazy learners, which postpone learning until a test tuple or test instance is provided.

A single training tuple is represented by a point in an n-dimensional space; in other words, the combination of its n attribute values locates the specific training tuple. No training takes place before the arrival of a test tuple that needs to be classified. Some preprocessing steps may be needed, such as normalization of attributes whose values are large compared to those of the other attributes. The data normalization approaches described for data transformation can be applied here as preprocessing.

When a test tuple is given, the k nearest training tuples are found in the training tuple space using a specific measure of the distance between the test tuple and a training tuple. These k nearest training tuples are the kNN of the test tuple. One popular measure is the Euclidean distance in real space, shown in the following equation; it is applicable only to numeric attributes:

dist(X_1, X_2) = sqrt( sum_{i=1}^{n} (x_{1i} - x_{2i})^2 )

For nominal attributes, one solution is to define the difference between two attribute values as 0 if they are identical and 1 otherwise. Many approaches exist to deal with missing values in the attributes. The value of k can be selected by starting with a small value and choosing, among the candidates, the one that gives the lowest error rate on the training tuples.

The class label of the test tuple is defined by the voting of the most common class in the kNN.

The kNN algorithm

The input parameters for kNN algorithm are as follows:

  • D, the set of training objects
  • z, the test object, which is a vector of attribute values
  • L, the set of classes used to label the objects

The output of the algorithm is the class of z, represented as c_z.

The pseudocode snippet for kNN is illustrated here:

[Pseudocode: the kNN algorithm]

The I function in line 6 denotes an indicator function that returns the value 1 if its argument is true and 0 otherwise.
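
Here is a minimal base R sketch of the procedure, with attribute normalization, Euclidean distance, and majority voting; the bundled ch_04_knn.R may differ, and the class package's knn() function provides a ready-made alternative.

# A small kNN classifier: normalize, compute distances, vote among the k nearest
knn_classify <- function(train_x, train_y, test_x, k = 3) {
  train_x <- scale(train_x)                          # normalize the attributes
  test_x  <- scale(test_x,
                   center = attr(train_x, "scaled:center"),
                   scale  = attr(train_x, "scaled:scale"))
  apply(test_x, 1, function(z) {
    d  <- sqrt(rowSums((train_x - matrix(z, nrow(train_x), ncol(train_x), byrow = TRUE))^2))
    nn <- order(d)[1:k]                              # indices of the k nearest training tuples
    names(which.max(table(train_y[nn])))             # most common class among the neighbors
  })
}

set.seed(7)
idx  <- sample(nrow(iris), 100)
pred <- knn_classify(iris[idx, 1:4], iris$Species[idx], iris[-idx, 1:4], k = 5)
mean(pred == iris$Species[-idx])                     # hold-out accuracy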

The R implementation

Please look up the R code file ch_04_knn.R from the code bundle for the previously mentioned algorithm. The code can be tested with the following command:

> source("ch_04_knn.R")

Document retrieval and Support Vector Machine

Support Vector Machine (SVM) is a classification algorithm applicable to both linear and nonlinear data. It is based on the following idea: if two classes of data cannot be divided by a hyper-plane in the original space, then after mapping the source dataset into a space of sufficiently higher dimension, an optimal separating hyper-plane will exist.

Here are two concepts that need to be clearly defined:

  • Linearly separable: This means that the dataset can be divided into the target classes by a linear equation over the attributes of a training tuple
  • Nonlinearly separable: This means that no such linear equation exists in the space with the same dimensionality as the training tuples

The linear hyper-plane can be represented as a linear discriminant equation, given the weight vector w and the training tuple x:

w · x + b = 0 (the separating hyper-plane)
w · x + b >= +1 for tuples of one class
w · x + b <= -1 for tuples of the other class

With the preceding equations, we have the following image to illustrate a hyper-plane:

[Diagram: a separating hyper-plane]

The target of SVM is to find the optimal hyper-plane, that is, the one that maximizes the margin between the data points belonging to different classes.

There are two hyper-planes, parallel to the separating hyper-plane and at an equal distance from it. They are the boundary hyper-planes, and all support vectors lie on them. This is illustrated in the following diagram:

[Diagram: the boundary hyper-planes and the support vectors]

The following diagram illustrates a nonlinearly separable case:

[Diagram: a nonlinearly separable case]

As shown in the following diagram, after mapping the vectors from a low-dimensional space to a higher-dimensional space, the nonlinearly separable case is transformed into a linearly separable one:

[Diagram: mapping to a higher-dimensional space makes the data linearly separable]
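
As a quick, hedged illustration with the e1071 package (an assumption; the bundled ch_04_svm.R may use a different implementation), a linear kernel handles the linearly separable case and a radial (RBF) kernel the nonlinear one:

# SVM on iris with linear and radial kernels
library(e1071)

linear_fit <- svm(Species ~ ., data = iris, kernel = "linear", cost = 1)
radial_fit <- svm(Species ~ ., data = iris, kernel = "radial", cost = 1, gamma = 0.25)

sum(predict(linear_fit, iris) != iris$Species)   # training errors with the linear kernel
sum(predict(radial_fit, iris) != iris$Species)   # training errors with the radial kernel
radial_fit$index                                 # row indices of the support vectors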

The SVM algorithm

The input parameters for a dual SVM algorithm are as follows:

  • D, the set of training objects
  • K, the kernel function
  • C, the cost (regularization) parameter
  • eps, a convergence tolerance

The output of the algorithm is the SVM model. The pseudocode snippet for this algorithm is illustrated here:

[Pseudocode: the dual SVM algorithm]

Here is the pseudocode for another version of the SVM algorithm, which is the primal kernel SVM algorithm. The input parameters for the primal kernel SVM algorithm are as follows:

  • D, the set of training objects
  • K, the kernel function
  • C, the cost (regularization) parameter
  • eps, a convergence tolerance

The output of the algorithm is the SVM model. The pseudocode snippet for this algorithm is illustrated here:

[Pseudocode: the primal kernel SVM algorithm]

The R implementation

Please look up the R code file ch_04_svm.R from the code bundle for the previously mentioned algorithm. The code can be tested with the following command:

> source("ch_04_svm.R")

Parallel version with MapReduce

With the rise of online computation requests, such as mobile cloud applications for classification, there is wide demand for a high-performance, robust, and high-accuracy classification platform or algorithm. The parallelized SVM distributes the optimization computation with the MapReduce technique so that it can work on large datasets and achieve high performance at the same time. There are many implementations of parallelized SVM; one of them is shown here.

Here t denotes the iteration number, l the number of MapReduce functions (nodes), h_t the best hypothesis at iteration t, D_l the sub-dataset for node l, SV_l the support vectors at node l, and GSV the global support vectors:

[Pseudocode: the parallel SVM driver algorithm]

Next, the mapper algorithm is designed; the for loop immediately follows the while loop and iterates over each subset:

[Pseudocode: the parallel SVM mapper algorithm]

Finally, the reducer algorithm is designed. On line 3, the code inside the for loop immediately follows the while loop; it performs training on the merged datasets to obtain the support vectors and the binary-class hypothesis:

[Pseudocode: the parallel SVM reducer algorithm]
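
The following single-machine sketch only mimics this idea in plain R with the e1071 package: the "mappers" train an SVM per data chunk and emit their support vectors, and the "reducer" retrains on the merged support vectors. It is an approximation for illustration, not the MapReduce implementation.

# Simulating the map/reduce support-vector exchange on one machine
library(e1071)

chunks <- split(iris, rep(1:3, length.out = nrow(iris)))        # three simulated workers

map_stage <- lapply(chunks, function(d) {
  fit <- svm(Species ~ ., data = d, kernel = "radial", cost = 1)
  d[fit$index, ]                                                # keep this node's support vectors
})

merged    <- do.call(rbind, map_stage)                          # global support-vector candidates
final_fit <- svm(Species ~ ., data = merged, kernel = "radial", cost = 1)
table(predict(final_fit, iris), iris$Species)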

Document retrieval

One of the important applications of SVM is document retrieval, which works with a static information source; the task is to fetch a ranking of documents in response to a user request. The vector model is a widely implemented document-retrieval or information-retrieval model.

Classification using frequent patterns

There are two types of classification using frequent patterns:

  • Associative classification, in which association rules are generated from frequent patterns and used for classification
  • Discriminative frequent pattern-based classification

The associative classification

The generic associative classification algorithm is defined here. The input parameters for the algorithm are as follows:

  • D, which is a set of training objects
  • F, which is the itemset
  • MIN_SUP, which is the minimal support
  • MIN_CONF, which is the minimal confidence

The output of the algorithm is a rule-based classifier and is shown as follows:

[Pseudocode: the generic associative classification algorithm]

Two popular algorithms are illustrated in the successive sections, one is Classification Based on Association (CBA), and the other is Classification Based on Multiple Association Rules (CMAR).

CBA

Here is the pseudocode for CBA:

[Pseudocode: the CBA algorithm]
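
As a hedged sketch of the first step, class association rules can be mined with the arules package by restricting the rule consequents to the class labels and ranking the rules by confidence; the bundled ch_04_cba.R may implement CBA differently.

# Mining class association rules on the Adult data shipped with arules
library(arules)

data(AdultUCI)
adult <- AdultUCI[!is.na(AdultUCI$income),
                  c("workclass", "education", "sex", "income")]  # keep labeled rows only
trans <- as(adult, "transactions")

rules <- apriori(trans,
                 parameter  = list(support = 0.05, confidence = 0.6),
                 appearance = list(rhs = c("income=small", "income=large"),
                                   default = "lhs"))             # class labels on the right-hand side
inspect(head(sort(rules, by = "confidence"), 5))                 # the strongest candidate rules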

Discriminative frequent pattern-based classification

Here is the pseudocode for discriminative frequent pattern-based classification:

[Pseudocode: discriminative frequent pattern-based classification]

The R implementation

Please look up the R code files ch_04_associative_classification.R, ch_04_cba.R, and ch_04_frequent_pattern_based_classification.R from the code bundle for the previously mentioned algorithms. The code can be tested with the following commands:

> source("ch_04_associative_classification.R")
> source("ch_04_cba.R")
> source("ch_04_frequent_pattern_based_classification.R")

Text classification using sentential frequent itemsets

One application of CBA is text classification. The key here is to build a matrix of documents, text terms, and labels. Once the matrix is built, any classification algorithm can be applied to it. An example of a document matrix is illustrated here; a term might be a character, word, phrase, concept, and so on.

[Table: an example document-term matrix with class labels]
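
A minimal sketch of this idea with the tm package is shown below; the toy texts and labels are made up for illustration, and any classifier can be trained on the resulting term matrix.

# Build a document-term matrix and train a classifier on it
library(tm)

texts  <- c("cheap loans apply now", "project meeting moved to monday",
            "win a cheap prize now", "meeting notes attached about the project")
labels <- factor(c("spam", "ham", "spam", "ham"))

corpus <- VCorpus(VectorSource(texts))
dtm    <- DocumentTermMatrix(corpus)               # documents as rows, terms as columns
m      <- as.data.frame(as.matrix(dtm))
m$label <- labels

library(rpart)
fit <- rpart(label ~ ., data = m, method = "class",
             control = rpart.control(minsplit = 2))
predict(fit, m, type = "class")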

Classification using the backpropagation algorithm

The backpropagation (BP) algorithm learns the classification model by training a multilayer feed-forward neural network. The generic architecture of the network is shown in the following diagram, with one input layer, some hidden layers, and one output layer. Each layer contains some units, or perceptrons, and each unit may be linked to others by weighted connections. The values of the weights are initialized before training. The number of units in each layer, the number of hidden layers, and the connections are defined empirically at the very start.

[Diagram: a multilayer feed-forward neural network]

The training tuples are fed to the input layer. Each unit in the input layer computes a result from the input attributes of the training tuple using a certain function, and this output serves as the input to the next layer; the calculation proceeds layer by layer. Consequently, the output of the network is the set of outputs of the units in the output layer. Training is performed iteratively by updating the weights of the connections based on the feedback of the errors, that is, by propagating the errors backwards.

The prototype of a unit in the hidden/output layer is shown here; a unit in the input layer has only one input and a few other minor differences. w_pq denotes the weight of the link or connection from unit p to unit q. Each unit q has a bias, θ_q, and a threshold or activation function, f.

[Diagram: a single unit (perceptron) in the hidden/output layer]

For a given unit or perceptron q in a hidden or output layer, the net input is a linear combination of its inputs, that is, of the outputs O_p of the units in the previous layer. Let k denote the number of input connections of the unit q:

I_q = sum_{p=1}^{k} w_{pq} × O_p + θ_q

The output of this unit q is O_q:

O_q = 1 / (1 + e^{-I_q})

If the unit q is in the output layer and T_q denotes the expected or known output value in the training tuple, then its error Err_q can be calculated as follows:

Err_q = O_q × (1 - O_q) × (T_q - O_q)

If the unit q is in a hidden layer, then let w_{qm} denote the weight of the connection from unit q to a unit m in the next layer, and Err_m the error of that unit. We can then calculate the error of q as follows. Let M denote the number of output connections of the unit q:

Err_q = O_q × (1 - O_q) × sum_{m=1}^{M} Err_m × w_{qm}

After this preparation, the weights and biases can be updated accordingly with the backpropagation strategy. Here l denotes the learning rate; it is an empirical parameter in the interval (0, 1):

Δw_{pq} = l × Err_q × O_p
w_{pq} = w_{pq} + Δw_{pq}
Δθ_q = l × Err_q
θ_q = θ_q + Δθ_q

The weights and biases are updated per tuple of the training tuple dataset.
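
The following compact base R sketch applies exactly these update rules to a tiny network with one hidden layer and sigmoid units, trained on the XOR toy problem; it mirrors the equations above rather than the bundled ch_04_bp.R.

# Per-tuple backpropagation for a 2-3-1 sigmoid network on XOR
sigmoid <- function(x) 1 / (1 + exp(-x))

X     <- matrix(c(0,0, 0,1, 1,0, 1,1), ncol = 2, byrow = TRUE)
T_out <- c(0, 1, 1, 0)                               # expected outputs T_q
set.seed(1)
W1 <- matrix(runif(2 * 3, -1, 1), 2, 3); b1 <- runif(3, -1, 1)   # input  -> hidden weights and biases
W2 <- runif(3, -1, 1);                   b2 <- runif(1, -1, 1)   # hidden -> output weights and bias
l  <- 0.5                                            # learning rate in (0, 1)

for (epoch in 1:5000) {
  for (j in 1:nrow(X)) {
    O1 <- sigmoid(X[j, ] %*% W1 + b1)                # hidden outputs O_q
    O2 <- sigmoid(sum(O1 * W2) + b2)                 # network output
    err2 <- O2 * (1 - O2) * (T_out[j] - O2)          # output-layer error
    err1 <- O1 * (1 - O1) * (err2 * W2)              # hidden-layer errors
    W2 <- W2 + l * err2 * as.vector(O1); b2 <- b2 + l * err2
    W1 <- W1 + l * outer(X[j, ], as.vector(err1));   b1 <- b1 + l * as.vector(err1)
  }
}
# the trained outputs should approach 0, 1, 1, 0
round(sapply(1:4, function(j) sigmoid(sum(sigmoid(X[j, ] %*% W1 + b1) * W2) + b2)), 2)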

The BP algorithm

The input parameters for the BP algorithm are listed here; the topology of the neural network, the number of hidden layers, and the connections are defined before training starts:

  • D, which is the training tuple set
  • W, which holds the initial values of all the weights
  • θ, which is the bias of each unit
  • l, which is the learning rate

The output of the algorithm consists of the following:

  • BPNN, which is the trained BP neural network
  • W, which is a set of weights of the connections in the neural network

Here is the pseudocode for the training of the backpropagation network:

[Pseudocode: training the backpropagation network]

The R implementation

Please look up the R code file ch_04_bp.R from the code bundle for the BP algorithm. The code can be tested with the following command:

> source("ch_04_bp.R")

Parallel version with MapReduce

Massive data makes the training process of BP dramatically slow. Parallelized versions of BP can accelerate training considerably, and many parallelized BPNN algorithms have been implemented on the MapReduce architecture. Here is one of them, the MapReduce-based Backpropagation Neural Network (MBNN) algorithm.

Given the training dataset, each mapper is fed a single training item. During the mapper stage, new values for the weights are calculated. Then, in the reducer stage, the new values proposed for each weight are collected and averaged to produce the output value for that weight. Once the new values are available, all the weights are updated batch-wise. These steps are executed iteratively until the termination condition is met.

The backpropagation main algorithm is listed as follows:

[Pseudocode: the MBNN main algorithm]

Four steps compose the backpropagation mapper algorithm. They are listed here:

[Pseudocode: the MBNN mapper algorithm]

Five steps compose the backpropagation reducer algorithm. They are listed here:

[Pseudocode: the MBNN reducer algorithm]
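
The following plain R snippet only simulates the map/average/update cycle for a single sigmoid unit; it is an illustration of the averaging scheme, not actual MapReduce code.

# Simulating MBNN: one proposal per training item, averaged by the "reducer"
sigmoid <- function(x) 1 / (1 + exp(-x))
X <- matrix(c(0,0, 0,1, 1,0, 1,1), ncol = 2, byrow = TRUE)
y <- c(0, 0, 0, 1)                                   # a linearly separable target (logical AND)
w <- c(0, 0); b <- 0; l <- 0.5

for (epoch in 1:5000) {
  # map stage: each "mapper" proposes new values (w, b) from its single training item
  proposals <- lapply(seq_len(nrow(X)), function(j) {
    o   <- sigmoid(sum(w * X[j, ]) + b)
    err <- o * (1 - o) * (y[j] - o)
    c(w + l * err * X[j, ], b + l * err)
  })
  # reduce stage: average the proposed values, then update the weights batch-wise
  avg <- Reduce(`+`, proposals) / length(proposals)
  w <- avg[1:2]; b <- avg[3]
}
round(sigmoid(X %*% w + b), 2)                       # moves toward 0, 0, 0, 1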

Time for action

Here are some practice questions for you to check your understanding of the concepts covered:

  • Find one application of the classification using frequent patterns
  • What is the BBN algorithm?
  • What is the kNN algorithm?
  • What is the backpropagation algorithm?
  • What is SVM?

Summary

In this chapter, we looked at the following topics:

  • Ensemble methods
  • Bagging
  • AdaBoost
  • Random forests
  • BBN, which provides a topology model of causal relationships; it is an eager learner.

    Tip

    An eager learner is one that behaves in an opposite way to the lazy learner. The lazy learner postpones the learning until the test tuple or test instance is provided.

  • The kNN algorithm, which is a lazy learner
  • SVM, by which the original data is transformed to a higher dimension, and is separated with a hyper-plane; it is an eager learner
  • Classification using frequent patterns; this is an eager learner
  • Classification using the BP algorithm, which trains a neural network with gradient descent

In the next chapter, we will learn about clustering algorithms, a kind of unsupervised classification with no predefined labels.