Machine Learning for Imbalanced Data

By: Kumar Abhishek, Dr. Mounir Abdelaziz

Overview of this book

As machine learning practitioners, we often encounter imbalanced datasets in which one class has considerably fewer instances than the other. Many machine learning algorithms assume an equilibrium between majority and minority classes, leading to suboptimal performance on imbalanced data. This comprehensive guide helps you address this class imbalance to significantly improve model performance. Machine Learning for Imbalanced Data begins by introducing you to the challenges posed by imbalanced datasets and the importance of addressing these issues. It then guides you through techniques that enhance the performance of classical machine learning models when using imbalanced data, including various sampling and cost-sensitive learning methods. As you progress, you’ll delve into similar and more advanced techniques for deep learning models, employing PyTorch as the primary framework. Throughout the book, hands-on examples will provide working and reproducible code that’ll demonstrate the practical implementation of each technique. By the end of this book, you’ll be adept at identifying and addressing class imbalances and confidently applying various techniques, including sampling, cost-sensitive techniques, and threshold adjustment, while using traditional machine learning or deep learning models.

Questions

  1. Apply the CSL technique to the SVM model from scikit-learn, using the dataset from this chapter. Use the class_weight and sample_weight parameters, as we did for the other models in this chapter, and compare this model's performance with that of the models we have already encountered. (A starter sketch follows this list.)
  2. LightGBM is another gradient-boosting framework, similar to XGBoost. Apply the cost-sensitive learning technique to a LightGBM model using the same dataset, again via the class_weight and sample_weight parameters, and compare its performance with the models covered in this chapter. (See the second sketch after this list.)
  3. AdaCost [10] is a variant of AdaBoost that combines boosting with CSL: it updates the training distribution for successive boosting rounds using the misclassification cost. Extend AdaBoostClassifier from scikit-learn to implement the AdaCost algorithm... (A standalone sketch of the AdaCost weight update follows this list.)
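For question 1, here is a minimal sketch of cost-sensitive SVM training. The chapter's dataset isn't reproduced here, so a synthetic imbalanced dataset from make_classification stands in for it, and the 19:1 cost ratio is only an illustrative match for the synthetic 95/5 class split; substitute the chapter's data and costs:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import classification_report

    # Stand-in for the chapter's dataset: ~95% majority, ~5% minority.
    X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

    # Variant A: class_weight assigns a per-class misclassification cost.
    svm_cw = SVC(class_weight={0: 1, 1: 19})
    svm_cw.fit(X_train, y_train)
    print(classification_report(y_test, svm_cw.predict(X_test)))

    # Variant B: sample_weight assigns the same costs per sample at fit time.
    weights = np.where(y_train == 1, 19.0, 1.0)
    svm_sw = SVC()
    svm_sw.fit(X_train, y_train, sample_weight=weights)
    print(classification_report(y_test, svm_sw.predict(X_test)))

With uniform per-sample weights like these, the two variants are equivalent; sample_weight becomes more useful when costs vary within a class.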
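For question 2, a parallel sketch with LightGBM, again on the synthetic stand-in dataset. class_weight='balanced' reweights classes inversely to their frequency, while the sample_weight variant expresses the same idea per sample:

    import numpy as np
    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

    # class_weight='balanced' scales each class inversely to its frequency.
    lgbm_cw = LGBMClassifier(class_weight='balanced', random_state=42)
    lgbm_cw.fit(X_train, y_train)
    print(classification_report(y_test, lgbm_cw.predict(X_test)))

    # Equivalent per-sample formulation via sample_weight.
    weights = np.where(y_train == 1, 19.0, 1.0)
    lgbm_sw = LGBMClassifier(random_state=42)
    lgbm_sw.fit(X_train, y_train, sample_weight=weights)
    print(classification_report(y_test, lgbm_sw.predict(X_test)))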
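For question 3, the following is a minimal, self-contained sketch of the AdaCost weight update, written as a standalone boosting loop over decision stumps rather than by overriding AdaBoostClassifier's private _boost() method, whose signature varies across scikit-learn versions. The cost-adjustment function beta follows a common choice from the AdaCost paper [10]; labels are assumed to be in {-1, +1}, and the costs array and helper names here are illustrative, not part of any library:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adacost_fit(X, y, costs, n_rounds=50):
        """y in {-1, +1}; costs in [0, 1], higher = costlier to misclassify."""
        n = len(y)
        D = np.full(n, 1.0 / n)  # training distribution over examples
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=D)
            pred = stump.predict(X)
            correct = pred == y
            # Cost-adjustment function: decrease the weight of costly examples
            # less when correct, increase it more when misclassified.
            beta = np.where(correct, -0.5 * costs + 0.5, 0.5 * costs + 0.5)
            u = y * pred * beta  # positive if correct, negative if wrong
            r = np.clip(np.sum(D * u), -1 + 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 + r) / (1 - r))
            D = D * np.exp(-alpha * u)
            D /= D.sum()  # renormalize the distribution
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def adacost_predict(X, stumps, alphas):
        score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
        return np.sign(score)

    # Hypothetical usage with 0/1 labels mapped to -1/+1 and a higher
    # cost on the minority class:
    # y_pm = 2 * y_train - 1
    # costs = np.where(y_train == 1, 0.9, 0.1)
    # stumps, alphas = adacost_fit(X_train, y_pm, costs)

The asymmetric beta is the whole difference from plain AdaBoost: costly examples retain more weight when classified correctly and gain more weight when misclassified, steering later rounds toward them.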