IBM SPSS Modeler Cookbook

Overview of this book

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data rather than hunches or guesswork. IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, timesavers, and workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these experts, gurus who, through their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art. Follow the industry-standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Learn the most efficient ways to extract data from your own sources and prepare it for exploration and modeling. Master the best methods for building models that will perform well in the workplace. Go beyond the basics and get the full power of your data mining workbench with this practical guide.
Table of Contents (11 chapters)

Building models with and without outliers


The Anomaly Modeling node can automatically identify and remove outliers. So why not always remove them? Even when the data is examined closely, it can be difficult to decide whether any cases should be regarded as outliers and, if so, which ones. And even when the data miner is confident about this, the internal or external client may not agree.
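Because the decision is uncertain, one practical option is to flag suspected outliers and keep both versions of the data, so that a model can be built with them and a second model without them. The sketch below shows that idea outside Modeler in plain Python. It uses Tukey's interquartile-range rule purely as a simple stand-in flagging rule; it is not the algorithm the Anomaly node uses, and the function names `tukey_fences` and `split_outliers` and the example data are hypothetical.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (low, high) bounds from Tukey's IQR rule (a stand-in, not the Anomaly node's method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(records, field, k=1.5):
    """Split records into (kept, flagged) on one numeric field.

    Both lists are returned so that one model can be built on all records
    and a second model on `kept` only, instead of discarding cases silently.
    """
    low, high = tukey_fences([r[field] for r in records], k)
    kept = [r for r in records if low <= r[field] <= high]
    flagged = [r for r in records if not (low <= r[field] <= high)]
    return kept, flagged

# Hypothetical example data: eight customers, one with unusually high spend
records = [{"id": i, "spend": v} for i, v in enumerate([12, 15, 14, 13, 16, 15, 14, 250])]
kept, flagged = split_outliers(records, "spend")
```

Here the fences work out to 11.5 and 17.5, so only the record with a spend of 250 is flagged. Returning both lists keeps the flagged cases available, which matters when the client disagrees with the choice.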

Some types of analysis, such as the calculation of a median, are not affected much by outliers. But many widely used modeling methods can be strongly influenced by them: a single outlier in the data can shift a linear regression model significantly.
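The contrast between a robust statistic and a sensitive one can be seen with a few lines of arithmetic. This sketch assumes made-up data lying exactly on the line y = 1 + 2x, fits ordinary least squares by hand (a hypothetical `ols_fit` helper, not a Modeler node), and then replaces one value with an extreme one.

```python
import statistics

def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.0 * x + 1.0 for x in xs]       # clean, made-up data: exactly y = 1 + 2x
ys_outlier = ys[:-1] + [100.0]         # one extreme value replaces the last point (true value 21)

clean_fit = ols_fit(xs, ys)
outlier_fit = ols_fit(xs, ys_outlier)

print("median:", statistics.median(ys), "->", statistics.median(ys_outlier))
print("mean:  ", statistics.fmean(ys), "->", statistics.fmean(ys_outlier))
print("slope: ", clean_fit[1], "->", outlier_fit[1])
```

With the clean data the fitted slope is 2.0 and both the mean and the median are 12. Replacing the last value with 100 leaves the median at 12, moves the mean to 19.9, and raises the slope to about 6.31, more than three times its true value.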

What are the risks? A model that is affected by an outlier may frequently predict values that are too high or too low, and the uncertainty in its estimated values will increase. When the predicted values are plotted against actual outcomes, viewers will likely sense that the graph looks or feels wrong, and the model does not fit...
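Both risks can be illustrated numerically. The sketch below assumes small, made-up noise around y = 1 + 2x and one hypothetical outlier at x = 10; it fits ordinary least squares in plain Python (not a Modeler node) and compares the residual standard error, a common measure of a regression's uncertainty, with and without the outlier. It also records the prediction errors the outlier-influenced model makes on the nine ordinary cases.

```python
import math
import statistics

def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def residual_std_error(xs, ys, intercept, slope):
    """sqrt(SSE / (n - 2)): the usual spread estimate for a simple linear regression."""
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

xs = list(range(1, 11))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]  # small, made-up noise
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]
ys_outlier = ys[:-1] + [100.0]  # hypothetical single outlier at x = 10

a_clean, b_clean = ols_fit(xs, ys)
a_out, b_out = ols_fit(xs, ys_outlier)

rse_clean = residual_std_error(xs, ys, a_clean, b_clean)
rse_out = residual_std_error(xs, ys_outlier, a_out, b_out)

# Prediction minus actual for the nine ordinary cases under the outlier-influenced fit
errors_on_typical = [(a_out + b_out * x) - y for x, y in zip(xs[:-1], ys[:-1])]

print(f"residual std error: {rse_clean:.2f} without outlier, {rse_out:.2f} with it")
print("errors on ordinary cases:", [round(e, 1) for e in errors_on_typical])
```

With the outlier included, the steeper fitted line predicts values that are too low for small x and too high for large x, and the residual standard error grows by far more than an order of magnitude. This is the pattern that makes a predicted-versus-actual plot look wrong.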