IBM SPSS Modeler Cookbook

Overview of this book

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data, not hunches or guesswork. IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art. Follow the industry-standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace. Go beyond the basics and get the full power of your data mining workbench with this practical guide.

How (and why) to validate as well as test


In this recipe we explore the importance of validation. The test data set sometimes carries a great burden. During the modeling phase it is not unusual to produce dozens of models. During that process, for some data miners, accuracy of the model on the test data set becomes the sole criterion for the ranking of the modeling attempts. That would certainly seem to be a violation of the Value Law quoted in the chapter introduction.
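The concern above can be made concrete with a small sketch in Python. This is not Modeler code and does not use any Modeler API; the function names, the toy data, and the 60/20/20 split are all assumptions chosen for illustration. The idea is that candidate models are ranked on the test partition, while a third partition, untouched during ranking, gives the semi-finalists an independent check.

```python
import random


def three_way_split(rows, train_pct=60, test_pct=20, seed=0):
    """Shuffle rows, then cut them into train, test, and validation partitions."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],  # the remainder is held back for validation
    )


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def select_and_validate(models, test_rows, validation_rows, top_k=3):
    """Rank models by test accuracy, then score the top_k on validation data."""
    ranked = sorted(
        models,
        key=lambda name: accuracy(models[name], test_rows),
        reverse=True,
    )
    return [
        (
            name,
            accuracy(models[name], test_rows),
            accuracy(models[name], validation_rows),
        )
        for name in ranked[:top_k]
    ]


# Toy data (hypothetical): the label is True when x is 50 or more.
rows = [(x, x >= 50) for x in range(100)]
train_rows, test_rows, validation_rows = three_way_split(rows)

# Hypothetical candidate models: simple cutoffs standing in for modeling attempts.
models = {f"cut_{c}": (lambda x, c=c: x >= c) for c in (30, 40, 50, 60, 70)}

report = select_and_validate(models, test_rows, validation_rows)
for name, test_acc, validation_acc in report:
    print(f"{name}: test={test_acc:.2f} validation={validation_acc:.2f}")
```

A large gap between a model's test and validation columns suggests that its ranking owed something to the particular rows that landed in the test partition. In practice the candidates would be fitted on `train_rows`; that step is omitted here to keep the sketch short.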

One can argue that the issue of value and the issue of validation are not identical, but they are related. Even if one applies the recommended broader definition of value, if the actual practice when choosing the semi-finalist models during the modeling phase is to check for stability and accuracy in the Analysis node, then one runs the risk of putting too much emphasis on a single source of information. After all, even if one wisely chooses the best model on a variety of criteria, the selection of the top 3 or top...