Machine learning algorithms tend to perform poorly on imbalanced classification datasets because they see too few examples of the minority class to learn to predict it accurately. Imbalanced classification refers to a supervised learning problem in which one class outnumbers another by a large proportion.
Luckily, there are some useful techniques for treating an imbalanced dataset before training an ML model on it:
- Undersampling: This approach removes observations from the majority class until the dataset is balanced. It is well suited to large datasets, where discarding some majority-class training examples still leaves enough data to learn from.
- Oversampling: This approach randomly replicates observations from the minority class to balance the data. It is also known as upsampling.
- Synthetic Minority Oversampling Technique (SMOTE): This...
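
To make the first two techniques concrete, here is a minimal sketch of random undersampling and random oversampling using only the Python standard library. The helper names (`random_undersample`, `random_oversample`, `_indices_by_class`) and the 90/10 toy dataset are assumptions made for illustration, not part of any library, and SMOTE is not covered here.

```python
import random
from collections import Counter


def _indices_by_class(y):
    """Group row indices by class label (hypothetical helper)."""
    groups = {}
    for idx, label in enumerate(y):
        groups.setdefault(label, []).append(idx)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class is as small as the smallest class."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = min(len(ix) for ix in groups.values())
    keep = []
    for ix in groups.values():
        keep.extend(rng.sample(ix, target))  # sampling without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows until every class is as large as the largest class."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    target = max(len(ix) for ix in groups.values())
    keep = []
    for ix in groups.values():
        keep.extend(ix)  # keep every original row
        keep.extend(rng.choices(ix, k=target - len(ix)))  # duplicates, with replacement
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy dataset (assumed for illustration): 90 majority rows, 10 minority rows.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print(Counter(y))        # class counts before resampling
print(Counter(y_under))  # class counts after undersampling
print(Counter(y_over))   # class counts after oversampling
```

In practice, the imbalanced-learn package provides `RandomUnderSampler` and `RandomOverSampler` classes that serve the same purpose; the sketch above is only meant to show what the resampling step does to the class counts.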