Machine Learning for Imbalanced Data
Now, let’s look at some of the SMOTE variants, such as Borderline-SMOTE, SMOTE-NC, and SMOTEN. Each of these variants applies the SMOTE algorithm only to samples of a certain kind, so a given variant may not be applicable to every dataset.
Borderline-SMOTE [4] is a variation of SMOTE that generates synthetic samples from the minority class samples that are near the classification boundary, which divides the majority class from the minority class.
The idea is that examples near the classification boundary are more prone to misclassification than those far away from it. Producing more minority samples along the boundary helps the model learn the minority class better. Conversely, points far from the classification boundary are unlikely to make the model a better classifier, so there is little to gain from oversampling them.
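To make "near the classification boundary" concrete, here is a minimal sketch of one way to flag borderline minority samples. It assumes the rule commonly used to describe Borderline-SMOTE: a minority sample counts as borderline when at least half, but not all, of its k nearest neighbors belong to the majority class. The function name `borderline_minority`, the toy data, and the choice of `k=3` are illustrative assumptions, not part of any library.

```python
import math


def borderline_minority(X, y, minority_label=1, k=3):
    """Return indices of minority samples that look borderline.

    Assumed rule: a minority sample is borderline when at least half,
    but not all, of its k nearest neighbors are majority samples.
    """
    borderline = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Euclidean distance from sample i to every other sample.
        others = [(math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i]
        neighbors = [j for _, j in sorted(others)[:k]]
        n_majority = sum(1 for j in neighbors if y[j] != minority_label)
        # All-majority neighborhoods are treated as noise and skipped;
        # mostly-minority neighborhoods are treated as safe.
        if k / 2 <= n_majority < k:
            borderline.append(i)
    return borderline


# Toy data: a minority cluster far from the majority (indices 0-3),
# two minority points next to the majority (indices 4-5),
# and four majority points (indices 6-9).
X = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
     (1.8, 0.0), (2.0, 0.0),
     (2.2, 0.0), (2.0, 0.3), (3.0, 0.0), (3.0, 0.3)]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

print(borderline_minority(X, y))  # [4, 5]
```

Only the two minority points that sit beside the majority cluster are flagged. The isolated minority cluster is left alone, which matches the intuition that samples far from the boundary add little.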
Here’s a step-by-step algorithm for Borderline-SMOTE: