We have already seen the importance of data representation and distribution in tackling the problem of bias and variance. A related problem is the unequal distribution of data among the classes in a classification task. This is called data imbalance. For example, if we have a binary classification problem in which one class has 50,000 images and the other has only 1,000, the trained model can perform very poorly: it can reach roughly 98% accuracy simply by always predicting the larger class, while learning almost nothing about the smaller one. We have to tackle this problem of imbalanced data by:
Yes, it is always better to make the class distribution equal. Gather as much data as possible and populate the class with fewer samples. For this purpose, you can search the internet for datasets that are similar to your problem and include them. Simple web searches can also turn up many images uploaded by various sources. Sometimes you will see that the model performance does not improve with more data. This is an...
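Before gathering more data, it can help to measure how imbalanced the classes actually are. The following is a minimal sketch, not a prescribed method: the function names `class_distribution` and `samples_needed_to_balance` and the label strings are hypothetical, and the label list simply mirrors the binary example from the introduction.

```python
from collections import Counter


def class_distribution(labels):
    """Return per-class counts and the majority/minority imbalance ratio."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    majority = max(counts.values())
    minority = min(counts.values())
    return dict(counts), majority / minority


def samples_needed_to_balance(labels):
    """Return how many extra samples each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical labels mirroring the example: 50000 images in one class,
# 1000 images in the other.
labels = ["majority"] * 50000 + ["minority"] * 1000

counts, ratio = class_distribution(labels)
print(counts)                             # per-class sample counts
print(ratio)                              # imbalance ratio, majority / minority
print(samples_needed_to_balance(labels))  # extra samples needed per class
```

A ratio close to 1 suggests the classes are roughly balanced. A large ratio indicates how much additional data the smaller class would need before the two classes are comparable in size.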