Balancing data and handling anomalous data are often treated as the same process. In our case, data balancing means understanding the techniques used to spread anomalous data without disrupting the underlying data distribution. In this recipe, we will discuss the core concepts of data balancing.
Generative modeling attempts to build a model that represents the entire data distribution. To learn this underlying distribution, the dataset must represent it in a verbose but compact form. That is, we want to ensure that each of the traits or features we are attempting to learn is represented in quantities similar to those in which they would be generated.
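To make the idea of "similar quantities" concrete, the following is a minimal sketch that counts how often each class appears and then balances the classes by random oversampling. Random oversampling is only one common balancing technique, chosen here for illustration; it is not necessarily the method used later in this recipe. The function name `oversample_to_balance`, the toy `samples` and `labels`, and the `seed` parameter are all hypothetical.

```python
import random
from collections import Counter


def oversample_to_balance(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class
    has as many samples as the largest class (illustrative only)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    balanced_samples, balanced_labels = [], []
    for label, items in by_class.items():
        # Draw duplicates, with replacement, from the existing samples.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


# Toy data (assumed): 8 "common" samples and 2 "rare" samples.
samples = list(range(10))
labels = ["common"] * 8 + ["rare"] * 2

balanced_samples, balanced_labels = oversample_to_balance(samples, labels)
print("before:", dict(Counter(labels)))
print("after: ", dict(Counter(balanced_labels)))
```

Note that oversampling only repeats existing rare samples, so it evens out the class counts without adding new information about the rare class.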