Evaluating balancing with Auto Classifier
Two traps to avoid in data mining are assuming that one should always balance and assuming that there is only one way to balance. Like most questions asked during a data mining project, the question of whether to balance should be answered empirically. This recipe shows how three common kinds of balancing can be compared easily using the Auto Classifier node. The resulting models are not meant to be final models; rather, this is an early test for deciding whether or not to balance. One of the kinds of balancing suggested here is to not balance at all. Another is to double the numbers in a fully reduced Balance node.
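To make the options concrete, the following Python sketch works through the arithmetic outside of Modeler. It rests on assumptions that are not stated in the recipe: that a Balance node applies one multiplicative factor per class, that a fully reduced node shrinks every class to the size of the rarest one, and that "doubling the numbers" means multiplying each of those factors by two. The class labels, counts, and function names are hypothetical and chosen only for illustration.

```python
def reduce_factors(counts):
    """Per-class factors that shrink every class to the rarest class's size."""
    smallest = min(counts.values())
    return {label: smallest / n for label, n in counts.items()}


def scale_factors(factors, multiplier):
    """Multiply every factor by the same number (assumed meaning of 'double')."""
    return {label: f * multiplier for label, f in factors.items()}


def expected_counts(counts, factors):
    """Approximate number of records per class after applying the factors."""
    return {label: counts[label] * factors[label] for label in counts}


# Hypothetical class counts, not taken from the recipe's data.
counts = {"T": 1000, "F": 9000}

no_balance = {label: 1.0 for label in counts}   # leave the data as it is
fully_reduced = reduce_factors(counts)          # shrink the common class
doubled = scale_factors(fully_reduced, 2.0)     # same ratio, twice the records

for name, factors in [("none", no_balance),
                      ("fully reduced", fully_reduced),
                      ("doubled", doubled)]:
    print(name, factors, expected_counts(counts, factors))
```

Under these assumptions, the doubled variant keeps the classes at equal size but retains twice as many records of the common class, at the cost of duplicating records of the rare class. Whether that trade-off helps is the kind of question the Auto Classifier comparison is meant to answer empirically.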
Getting ready
We will start with the Choose Balance.str stream.
How to do it...
To compare three common kinds of balancing using the Auto Classifier node, perform the following steps:
Open the starting stream.
Edit the Balance node labeled Fully Reduce. This node was automatically generated by...