How (and why) to validate as well as test
In this recipe, we explore the importance of validation. The test data set sometimes carries a great burden. During the modeling phase it is not unusual to produce dozens of models, and for some data miners accuracy on the test data set becomes the sole criterion for ranking those modeling attempts. That would certainly seem to violate the Value Law quoted in the chapter introduction.
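The burden placed on the test data set can be illustrated with a small simulation. This is only a sketch under stated assumptions, not part of the recipe itself: every model is given the same true accuracy (0.70), each partition holds 200 records, and 30 models are ranked per trial. These numbers, and the function names `simulated_accuracy` and `run_trials`, are hypothetical and chosen purely for illustration.

```python
import random
from statistics import mean


def simulated_accuracy(rng, true_accuracy, n_records):
    """Observed accuracy of a model with the given true accuracy on n_records cases."""
    hits = sum(rng.random() < true_accuracy for _ in range(n_records))
    return hits / n_records


def run_trials(n_trials=200, n_models=30, n_records=200, true_accuracy=0.70, seed=0):
    """Return (mean test accuracy of the winner, mean validation accuracy of the winner).

    All figures are illustrative assumptions, not values from the recipe.
    """
    rng = random.Random(seed)
    winner_on_test = []
    winner_on_validation = []
    for _ in range(n_trials):
        # Score every candidate model on the same test partition.
        test_scores = [
            simulated_accuracy(rng, true_accuracy, n_records)
            for _ in range(n_models)
        ]
        # Rank on test accuracy alone and keep the top model's score.
        winner_on_test.append(max(test_scores))
        # The winner is no better than its rivals, so a fresh draw stands in
        # for its score on an untouched validation partition.
        winner_on_validation.append(
            simulated_accuracy(rng, true_accuracy, n_records)
        )
    return mean(winner_on_test), mean(winner_on_validation)


if __name__ == "__main__":
    test_mean, validation_mean = run_trials()
    print(f"winner on test data:       {test_mean:.3f}")
    print(f"winner on validation data: {validation_mean:.3f}")
```

Because the winner is picked for its test score, that score tends to sit above the true accuracy even though no model here is actually better than another. The same model's accuracy on records that played no part in the ranking stays close to the true figure.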
One can argue that the issue of value and the issue of validation are not identical, but they are related. Even if one applies the recommended broader definition of value, a data miner who in practice chooses the semi-finalist models during the modeling phase by checking stability and accuracy in the Analysis node runs the risk of putting too much emphasis on a single source of information. After all, even if one wisely chooses the best model on a variety of criteria, the selection of the top 3 or top...