In this chapter, you will learn about the top classification algorithms written in the R language. You will also learn the ways to improve the classifier.
We will cover the following topics:
- Ensemble methods
- Biological traits and Bayesian belief network
- Protein classification and the k-Nearest Neighbors algorithm
- Document retrieval and Support Vector Machine
- Text classification using sentential frequent itemsets and classification using frequent patterns
- Classification using the backpropagation algorithm
To improve the accuracy of classification, EM methods are developed. The accuracy is dramatically improved by at least one grade compared to its base classifiers, because the EM methods make mistakes only when at least half of the result of the base classifiers are wrong.
The concept structure of EM methods is illustrated in the following diagram:
The label for the new data tuple is the result of the voting of a group of base classifiers. A combined classifier is created based on several base classifiers. Each classifier is trained with a different dataset or training set re-sampled with the replacement of the original training dataset.
Three popular EM methods are discussed in the successive sections:
- Bagging
- Boosting
- Random forests
Here is a concise description of the bagging algorithm (noted as the bootstrap aggregation), followed by the summarized pseudocode. For iteration i (), a training set, , of d tuples is sampled with replacement from the original set of tuples, D. Any training set is sampled by employing bootstrap sampling (with replacement) for it, which in turn is used to learn a classifier model, . To classify an unknown or test tuple, X, each classifier, , returns its class prediction, which counts as one vote. Assume the number of classifiers as follows, which predicts the same class, , given the test tuple X:
The bagged classifier, , counts the votes and assigns the class with the most votes to X. Each vote has the same weight in the equation;
About the prediction of continuous values, the average value of each prediction for a given test tuple is used as the result. The algorithm reduces the variance, given a more correct result than the base classifiers.
The input parameters for bagging algorithm are as follows:
- D: This is the training tuples dataset
- K: This is the number of classifiers combined
- S: This is a classification learning algorithm or scheme to learning base classifier
- : This is the ensemble classifier, which is the output of the algorithm
The summarized pseudocode for the bagging algorithm is as follows:
As opposed to an ensemble algorithm, the bagging algorithm is the weighted voting and weighted sample training tuple dataset for each base classifier. The base classifiers are learned iteratively. Once a classifier is learned, the relative weights are updated with a certain algorithm for the next learning of the base classifiers. The successive model learning will emphasize the tuples misclassified by the former classifier. As a direct result, the accuracy of a certain classifier will play an important role in the voting of the final label once an unknown test tuple is provided for the combined classifier.
Adaptive Boosting or AdaBoost is one of the boosting algorithms. If it contains K base classifiers, then the AdaBoost will be performed as K passes. Given the tuple dataset and its corresponding classifier , the error rate of classifier is defined as follows:
The new classifier will be discarded once the error () is bigger than 0.5. The training tuples set will be resampled for this classifier and perform the training of this classifier from scratch.
For tuple , the error function is as follows:
All the weights of the tuples in the training tuples dataset are initialized with . When the classifier is learned from the training tuples set , the weight for the tuple, which is correctly classified, is multiplied by . After the updates, the weights for all tuples are normalized, which means the weight of the classified tuple increases and the weight of the others decreases.
The weight of the vote of classifier is as follows:
We defined as the weight of the vote for class upon the K classifiers, representing the weight of the ith classifier:
The AdaBoost combined classifier, , counts the votes with their respective weights multiplied and assigns the class with the most votes to X
. Each vote has the same weight in the equation:
The input parameters for the AdaBoost algorithm are as follows:
- D, which denotes a set of training tuples
- k, which is the number of rounds
- A classification learning algorithm
The output of the algorithm is a composite model. The pseudocode of AdaBoost is listed here:
Random forests algorithm is an ensemble method to combine a group of decision trees, which is generated by a strategy of applying a random selection of attributes at each node that will be split. Given an unknown tuple, each classifier votes, and the most popular one decides the final result. The pseudocode for the ForestRI algorithm to generate a forest is as follows:
T
denotes a total order of the variables in line 2. In line 5, denotes the set of variables preceding . Prior knowledge is required for line 6.
Instead of random selection of attributes on splitting the node, for another algorithm, ForestRC, the random linear combination strategy of the existing attributes is used to split the task. New attributes are built by the random linear combination of the original attributes set. With a couple of new attributes added, the best split is searched over the updated attributes set including the new and original attributes.
Here we provide three R implementations, bagging, AdaBoost, and Random forests. Please look up the R codes file ch_04_bagging.R
, ch_04_adaboost.R
, ch_04_forestrc.R
, and ch_04_forestri.R
from the bundle of R codes for the previously mentioned algorithms. The codes can be tested with the following commands:
> source("ch_04_bagging.R") > source("ch_04_adaboost.R") > source("ch_04_forestrc.R") > source("ch_04_forestri.R")
The following algorithm is the parallelized AdaBoost algorithm, which depends on a couple of workers to construct boosting classifiers. The dataset for the pth worker is defined using the following formula, where, denoting its size is :
The classifier is defined in the following format, with as the weight:
The output is the final classifier. The input is the training dataset of M workers .
The Bayesian belief network, once trained, can be used for classification. Based on the Bayes' theorem, which is defined in the The Bayes classification section of Chapter 3, Classification, it is defined with two parts, one directed acyclic graph and conditional probability tables (CPT) for each variable; this is in turn represented by one node in the graph and models the uncertainty by graphically representing the conditional dependencies between distinct components. The arcs in the image give a representation of causal knowledge. The interaction among the diverse sources of uncertainty is also graphically illustrated.
The uncertainty comes from various sources:
- The way to associate the knowledge by the expert
- The domain intrinsic uncertainty
- The requirement of the knowledge to be translated
- The accuracy and availability of knowledge
Here is an example of the Bayesian belief network with four Boolean variables and the corresponding arcs. Whether the grass is wet is influenced by the work results of sprinkler and whether it has just rained, and so on. Each arc has a certain probability.
Let us have a look at the CPT representation of :
In the network, each variable is conditionally independent of its non-descendants. Here is the definition of the joint probability distribution:
Before the application of the BBN algorithm to classification, we need to train it first. In the process of training the network, the expert knowledge, that is, the prior knowledge, can be used in the training process to help the design of the network. For the variables that participated in direct dependency, experts must specify their conditional probability. There are many algorithms to learn the network from the training dataset; we will introduce an adaptive probabilistic networks algorithm.
The input parameters for the BBN algorithm are as follows:
- T, denotes a total order of the variables
- CPT
The output of the algorithm is the topology structure of BBN, which is as follows:
T denotes a total order of the variables in line 2. In line 5, denotes the set of variables preceding .
The Bayesian belief network (BBN) algorithm
Before the application of the BBN algorithm to classification, we need to train it first. In the process of training the network, the expert knowledge, that is, the prior knowledge, can be used in the training process to help the design of the network. For the variables that participated in direct dependency, experts must specify their conditional probability. There are many algorithms to learn the network from the training dataset; we will introduce an adaptive probabilistic networks algorithm.
The input parameters for the BBN algorithm are as follows:
- T, denotes a total order of the variables
- CPT
The output of the algorithm is the topology structure of BBN, which is as follows:
T denotes a total order of the variables in line 2. In line 5, denotes the set of variables preceding .
The k-Nearest Neighbors (kNN) algorithm is one of the lazy learners that postpones the learning until the test tuple or test instance is provided.
A single training tuple is represented by a point in an n-dimensional space. In other words, n attributes' combinations are used to represent the specific training tuple. There is no specific training before the arrival of the test tuple that needs to be classified. Some preprocessing steps are needed, such as normalization for some attributes with large values compared to other attributes' values. Data normalization approaches in the data transformation can be applied here for preprocessing.
When a test tuple is given, the k-nearest training tuples are found from the training tuples space by a specific measure to calculate the distance between test tuple and the training tuple. The k-nearest training tuples are also known as the kNN. One popular solution is the Euclidean distance in real space, illustrated in the following equation. This method is only applicable to numeric attributes:
For nominal attributes, one solution is that the difference between two attribute values is defined as 1, or as 0. We already know that many approaches deal with missing values in the attributes. With a predefined threshold, the value of k is selected with the number of tuples with the lowest error-rate among all the training tuples.
The class label of the test tuple is defined by the voting of the most common class in the kNN.
The input parameters for kNN algorithm are as follows:
- D, the set of training objects
- z, the test object, which is a vector of attribute values
- L, the set of classes used to label the objects
The output of the algorithm is the class of z, represented as .
The pseudocode snippet for kNN is illustrated here:
The I function in line 6 denotes an indicator function that returns the value 1
if its argument is true and 0
otherwise.
The kNN algorithm
The input parameters for kNN algorithm are as follows:
- D, the set of training objects
- z, the test object, which is a vector of attribute values
- L, the set of classes used to label the objects
The output of the algorithm is the class of z, represented as .
The pseudocode snippet for kNN is illustrated here:
The I function in line 6 denotes an indicator function that returns the value 1
if its argument is true and 0
otherwise.
Support Vector Machine (SVM) is a classification algorithm applicable to both linear and nonlinear data classification. It is based on an assumption: if two classes of data cannot be divided by a hyper-plane, then after mapping the source dataset to sufficient higher dimension spaces, the optimal separating hyper-plane must exist.
Here are two concepts that need to be clearly defined:
The linear hyper-plane can be represented as the linear discriminant equation, given the weight vector w
and the training tuple x
,
With the preceding equation, we have the following image to illustrate a hyper-plane:
The target of SVM is to find the optimal hyper-plane, by which the margin between data points belonging to different classes are maximized.
There are two hyper-planes with equal distance and that are parallel to the hyper-plane. They are boundary hyper-planes and all support vectors are on them. This is illustrated in the following diagram:
In the following diagram, the case of a nonlinearly separable case is illustrated:
In the following diagram, after mapping the vector from a low-dimensional space to high-dimensional space, the nonlinearly separable case will be transformed into to a linearly separable case:
The input parameters for a dual SVM algorithm are as follows:
- D, the set of training objects
- K
- C
The output of the algorithm is the SVM algorithm. The pseudocode snippet for this algorithm is illustrated here:
Here is the pseudocode for another version of the SVM algorithm, which is the primal kernel SVM algorithm. The input parameters for the primal kernel SVM algorithm are as follows:
- D, the set of training objects
- K
- C
The output of the algorithm is the SVM model. The pseudocode snippet for this algorithm is illustrated here:
Please look up the R codes file ch_04_svm.R
from the bundle of R codes for the previously mentioned algorithm. The codes can be tested with the following command:
> source("ch_04_svm.R")
Along with the online computation requests, such as mobile cloud applications for classification, a high-performance, robust, and high-accuracy classification platform or algorithm is widely required. The parallelized SVM distributes the computing of optimization by the MapReduce technique to achieve working on a large scale of datasets and high performance at the same time. There are many implementations of various versions of parallelized SVM. One of them is shown here.
The t
denotes for iteration number, l
for MapReduce function size, for the best hypothesis at iteration t
, for sub-dataset for node l
, for support vectors at node l
and for global support vector:
Next, the mapper algorithm is designed and the for loop
is immediately behind the while loop
, which is looped for each subset:
Finally, the reducer algorithm is designed. On line 3, the code inside the for loop
immediately follows the while loop
. This is for training by merging the datasets to obtain support vectors and binary-class hypothesis:
The SVM algorithm
The input parameters for a dual SVM algorithm are as follows:
- D, the set of training objects
- K
- C
The output of the algorithm is the SVM algorithm. The pseudocode snippet for this algorithm is illustrated here:
Here is the pseudocode for another version of the SVM algorithm, which is the primal kernel SVM algorithm. The input parameters for the primal kernel SVM algorithm are as follows:
- D, the set of training objects
- K
- C
The output of the algorithm is the SVM model. The pseudocode snippet for this algorithm is illustrated here:
Please look up the R codes file ch_04_svm.R
from the bundle of R codes for the previously mentioned algorithm. The codes can be tested with the following command:
> source("ch_04_svm.R")
Along with the online computation requests, such as mobile cloud applications for classification, a high-performance, robust, and high-accuracy classification platform or algorithm is widely required. The parallelized SVM distributes the computing of optimization by the MapReduce technique to achieve working on a large scale of datasets and high performance at the same time. There are many implementations of various versions of parallelized SVM. One of them is shown here.
The t
denotes for iteration number, l
for MapReduce function size, for the best hypothesis at iteration t
, for sub-dataset for node l
, for support vectors at node l
and for global support vector:
Next, the mapper algorithm is designed and the for loop
is immediately behind the while loop
, which is looped for each subset:
Finally, the reducer algorithm is designed. On line 3, the code inside the for loop
immediately follows the while loop
. This is for training by merging the datasets to obtain support vectors and binary-class hypothesis:
The R implementation
Please look up the R codes file ch_04_svm.R
from the bundle of R codes for the previously mentioned algorithm. The codes can be tested with the following command:
> source("ch_04_svm.R")
Along with the online computation requests, such as mobile cloud applications for classification, a high-performance, robust, and high-accuracy classification platform or algorithm is widely required. The parallelized SVM distributes the computing of optimization by the MapReduce technique to achieve working on a large scale of datasets and high performance at the same time. There are many implementations of various versions of parallelized SVM. One of them is shown here.
The t
denotes for iteration number, l
for MapReduce function size, for the best hypothesis at iteration t
, for sub-dataset for node l
, for support vectors at node l
and for global support vector:
Next, the mapper algorithm is designed and the for loop
is immediately behind the while loop
, which is looped for each subset:
Finally, the reducer algorithm is designed. On line 3, the code inside the for loop
immediately follows the while loop
. This is for training by merging the datasets to obtain support vectors and binary-class hypothesis:
Parallel version with MapReduce
Along with the online computation requests, such as mobile cloud applications for classification, a high-performance, robust, and high-accuracy classification platform or algorithm is widely required. The parallelized SVM distributes the computing of optimization by the MapReduce technique to achieve working on a large scale of datasets and high performance at the same time. There are many implementations of various versions of parallelized SVM. One of them is shown here.
The t
denotes for iteration number, l
for MapReduce function size, for the best hypothesis at iteration t
, for sub-dataset for node l
, for support vectors at node l
and for global support vector:
Next, the mapper algorithm is designed and the for loop
is immediately behind the while loop
, which is looped for each subset:
Finally, the reducer algorithm is designed. On line 3, the code inside the for loop
immediately follows the while loop
. This is for training by merging the datasets to obtain support vectors and binary-class hypothesis:
There are two types of classification using frequent patterns:
- Associative classification model as well as association rules, which are generated from frequent patterns and used for classifications
- Discriminative frequent pattern-based classification
The generic association classification algorithm is defined here. The input parameters for the kNN algorithm are as follows:
- D, which is a set of training objects
- F, which is the itemset
- MIN_SUP, which is the minimal support
- MIN_CONF, which is the minimal confidence
The output of the algorithm is a rule-based classifier and is shown as follows:
Two popular algorithms are illustrated in the successive sections, one is Classification Based on Association (CBA), and the other is Classification Based on Multiple Association Rules (CMAR).
Please look up the R codes files ch_04_associative_classification.R
, ch_04_cba.R
, and ch_04_frequent_pattern_based_classification.R
from the bundle of R codes for the previously mentioned algorithms. The codes can be tested with the following commands:
> source("ch_04_associative_classification.R") > source("ch_04_cba.R") > source("ch_04_frequent_pattern_based_classification.R")
One application of CBA is text classification. The key here is to build a matrix with a document or text term and labels. With the matrix built, any classification algorithm can be applied to it. One example of document matrix is illustrated here. The term might include a character, word, phrase, and concept and so on.
The associative classification
The generic association classification algorithm is defined here. The input parameters for the kNN algorithm are as follows:
- D, which is a set of training objects
- F, which is the itemset
- MIN_SUP, which is the minimal support
- MIN_CONF, which is the minimal confidence
The output of the algorithm is a rule-based classifier and is shown as follows:
Two popular algorithms are illustrated in the successive sections, one is Classification Based on Association (CBA), and the other is Classification Based on Multiple Association Rules (CMAR).
Please look up the R codes files ch_04_associative_classification.R
, ch_04_cba.R
, and ch_04_frequent_pattern_based_classification.R
from the bundle of R codes for the previously mentioned algorithms. The codes can be tested with the following commands:
> source("ch_04_associative_classification.R") > source("ch_04_cba.R") > source("ch_04_frequent_pattern_based_classification.R")
One application of CBA is text classification. The key here is to build a matrix with a document or text term and labels. With the matrix built, any classification algorithm can be applied to it. One example of document matrix is illustrated here. The term might include a character, word, phrase, and concept and so on.
Please look up the R codes files ch_04_associative_classification.R
, ch_04_cba.R
, and ch_04_frequent_pattern_based_classification.R
from the bundle of R codes for the previously mentioned algorithms. The codes can be tested with the following commands:
> source("ch_04_associative_classification.R") > source("ch_04_cba.R") > source("ch_04_frequent_pattern_based_classification.R")
One application of CBA is text classification. The key here is to build a matrix with a document or text term and labels. With the matrix built, any classification algorithm can be applied to it. One example of document matrix is illustrated here. The term might include a character, word, phrase, and concept and so on.
Discriminative frequent pattern-based classification
Here is the pseudocode for discriminative frequent pattern-based classification:
Please look up the R codes files ch_04_associative_classification.R
, ch_04_cba.R
, and ch_04_frequent_pattern_based_classification.R
from the bundle of R codes for the previously mentioned algorithms. The codes can be tested with the following commands:
> source("ch_04_associative_classification.R") > source("ch_04_cba.R") > source("ch_04_frequent_pattern_based_classification.R")
One application of CBA is text classification. The key here is to build a matrix with a document or text term and labels. With the matrix built, any classification algorithm can be applied to it. One example of document matrix is illustrated here. The term might include a character, word, phrase, and concept and so on.
The R implementation
Please look up the R codes files ch_04_associative_classification.R
, ch_04_cba.R
, and ch_04_frequent_pattern_based_classification.R
from the bundle of R codes for the previously mentioned algorithms. The codes can be tested with the following commands:
> source("ch_04_associative_classification.R") > source("ch_04_cba.R") > source("ch_04_frequent_pattern_based_classification.R")
One application of CBA is text classification. The key here is to build a matrix with a document or text term and labels. With the matrix built, any classification algorithm can be applied to it. One example of document matrix is illustrated here. The term might include a character, word, phrase, and concept and so on.
Text classification using sentential frequent itemsets
One application of CBA is text classification. The key here is to build a matrix with a document or text term and labels. With the matrix built, any classification algorithm can be applied to it. One example of document matrix is illustrated here. The term might include a character, word, phrase, and concept and so on.
The backpropagation (BP) algorithm learns the classification model by training a multilayer feed-forward neural network. The generic architecture of the neural network for BP is shown in the following diagrams, with one input layer, some hidden layers, and one output layer. Each layer contains some units or perceptron. Each unit might be linked to others by weighted connections. The values of the weights are initialized before the training. The number of units in each layer, number of hidden layers, and the connections will be empirically defined at the very start.
The training tuples are assigned to the input layer; each unit in the input layer calculates the result with certain functions and the input attributes from the training tuple, and then the output is served as the input parameter for the hidden layer; the calculation here happened layer by layer. As a consequence, the output of the network contains all the output of each unit in the output layer. The training is performed iteratively by updating the weights of the connections with the feedback of errors, that is, back propagation of the errors.
The prototype for the unit in the hidden/output layer is shown here, the one in the input layer with only one input and other minor differences compared to this. denotes the weight related to that link or connection. Each unit is bound with a bias, which is . The threshold or activation function bound with each unit is .
For a certain unit or perceptron in the hidden or output layer, the net input is a combination of the linear combination of each of its input from the previous unit, that is, the output of it, which is . Let k denote the number of input connections of the unit q:
The output of this unit q is .
If the unit q is in the output layer and denotes the expected or known output value in the training tuple, then its error can be calculated as follows:
If the unit q is in the hidden layer, then let denote the weight of the connection from unit q to one unit with the error in the next layer or the known output value in the training tuple. We can then calculate its error as . Let M denote the number of output connections of the unit q:
After this preprocessing or preparation, the weights and biases can be updated accordingly with the backpropagation strategy. denotes the learning rate; it is an empirical parameter, which belongs to (0,1)
:
The weights and biases are updated per tuple of the training tuple dataset.
The input parameters for the BP algorithm, the topology of the neural network, the number of hidden layers, and the connections, are defined before the start of the training:
- D, which is the training tuples set
- W, which is the initial values for all the weights.
- , which is the bias for each units
- I, which is the learning rate
The output of the algorithm is the topology structure of BP, which consists of:
- BPNN, which is the trained BP neural network
- W, which is a set of weights of the connections in the neural network
Here is the pseudocode for the training of the backpropagation network:
Please look up the R codes file ch_04_bp.R
from the bundle of R codes for the BP algorithm. The codes can be tested with the following command:
> source("ch_04_bp.R")
Massive data makes the training process of BP dramatically slow. The parallelized version of BP is proven to accelerate the speed in an amazing way. There are many versions of parallelized BPNN algorithms implemented on the MapReduce architecture. Here is one implementation, the MapReduce-based Backpropagation Neural Network (MBNN) algorithm.
Given the training dataset, each mapper is fed with one single training item. During the mapper tasks stage, new values for weights are calculated. Then, within the reducer tasks stage, new values for one weight are collected, and this results in an average value of these values to output for the weights. After the new values are provided, all the weights are updated, batch-wise. These steps are executed iteratively until the termination condition is matched.
The backpropagation main algorithm is listed as follows:
Four steps compose the backpropagation mapper algorithm. They are listed here:
Five steps compose the backpropagation reducer algorithm. They are listed here:
The BP algorithm
The input parameters for the BP algorithm, the topology of the neural network, the number of hidden layers, and the connections, are defined before the start of the training:
- D, which is the training tuples set
- W, which is the initial values for all the weights.
- , which is the bias for each units
- I, which is the learning rate
The output of the algorithm is the topology structure of BP, which consists of:
- BPNN, which is the trained BP neural network
- W, which is a set of weights of the connections in the neural network
Here is the pseudocode for the training of the backpropagation network:
Please look up the R codes file ch_04_bp.R
from the bundle of R codes for the BP algorithm. The codes can be tested with the following command:
> source("ch_04_bp.R")
Massive data makes the training process of BP dramatically slow. The parallelized version of BP is proven to accelerate the speed in an amazing way. There are many versions of parallelized BPNN algorithms implemented on the MapReduce architecture. Here is one implementation, the MapReduce-based Backpropagation Neural Network (MBNN) algorithm.
Given the training dataset, each mapper is fed with one single training item. During the mapper tasks stage, new values for weights are calculated. Then, within the reducer tasks stage, new values for one weight are collected, and this results in an average value of these values to output for the weights. After the new values are provided, all the weights are updated, batch-wise. These steps are executed iteratively until the termination condition is matched.
The backpropagation main algorithm is listed as follows:
Four steps compose the backpropagation mapper algorithm. They are listed here:
Five steps compose the backpropagation reducer algorithm. They are listed here:
The R implementation
Please look up the R codes file ch_04_bp.R
from the bundle of R codes for the BP algorithm. The codes can be tested with the following command:
> source("ch_04_bp.R")
Massive data makes the training process of BP dramatically slow. The parallelized version of BP is proven to accelerate the speed in an amazing way. There are many versions of parallelized BPNN algorithms implemented on the MapReduce architecture. Here is one implementation, the MapReduce-based Backpropagation Neural Network (MBNN) algorithm.
Given the training dataset, each mapper is fed with one single training item. During the mapper tasks stage, new values for weights are calculated. Then, within the reducer tasks stage, new values for one weight are collected, and this results in an average value of these values to output for the weights. After the new values are provided, all the weights are updated, batch-wise. These steps are executed iteratively until the termination condition is matched.
The backpropagation main algorithm is listed as follows:
Four steps compose the backpropagation mapper algorithm. They are listed here:
Five steps compose the backpropagation reducer algorithm. They are listed here:
Parallel version with MapReduce
Massive data makes the training process of BP dramatically slow. The parallelized version of BP is proven to accelerate the speed in an amazing way. There are many versions of parallelized BPNN algorithms implemented on the MapReduce architecture. Here is one implementation, the MapReduce-based Backpropagation Neural Network (MBNN) algorithm.
Given the training dataset, each mapper is fed with one single training item. During the mapper tasks stage, new values for weights are calculated. Then, within the reducer tasks stage, new values for one weight are collected, and this results in an average value of these values to output for the weights. After the new values are provided, all the weights are updated, batch-wise. These steps are executed iteratively until the termination condition is matched.
The backpropagation main algorithm is listed as follows:
Four steps compose the backpropagation mapper algorithm. They are listed here:
Five steps compose the backpropagation reducer algorithm. They are listed here:
Here are some practice questions for you to check your understanding about the concepts covered:
- Find one application of the classification using frequent patterns
- What is the BBN algorithm?
- What is the kNN algorithm?
- What is the backpropagation algorithm?
- What is SVM?
In this chapter, we looked at the following topics:
- Ensemble methods
- Bagging
- AdaBoost
- Random forests
- BBN, which provides a topology model of causal relationship; it is an eager learner.
- The kNN algorithm, which is a lazy learner
- SVM, by which the original data is transformed to a higher dimension, and is separated with a hyper-plane; it is an eager learner
- Classification using frequent patterns; this is an eager learner
- Classification using the BP algorithm, which is a neural network trained with a descent gradient
In the next chapter, we will learn the clustering algorithm, which is also a kind of unsupervised classification with no predefined labels.