IBM SPSS Modeler Cookbook

Overview of this book

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data rather than hunches or guesswork. IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents: gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art. Follow the industry-standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace. Go beyond the basics and get the full power of your data mining workbench with this practical guide.

Introduction


This chapter will focus on the Construct subtask of CRISP-DM's data preparation phase. The CRISP-DM document describes it as follows:

This task includes constructive data preparation operations such as the production of derived attributes, entire new records, or transformed values for existing attributes.

Of all the subtasks in CRISP-DM, the Construct subtask is a good candidate for the one that novices most often fail to plan enough time for. Everyone knows that the data must be cleaned, and everyone is braced for that task to take a long time. "What needs to be constructed?", one might ask. The example that frequently inspires the Aha! experience is dates. Dates, quite simply, are nearly useless in the modeling phase. They are stored merely as points in time. The modeling algorithms have to work awfully hard to spot an interesting date, perhaps spotting a difference between big dates and little dates. One needs to give the algorithms a major helping hand. But what is interesting are the distances...
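
To make the point about dates concrete, here is a minimal sketch of constructing distance-style attributes from raw dates. It is written in plain Python rather than in Modeler's own expression language, and the record, its field names (signup_date, last_purchase_date), and the reference date are all hypothetical, chosen only for illustration.

```python
from datetime import date


def days_between(earlier: date, later: date) -> int:
    """Return the number of whole days from `earlier` to `later`."""
    return (later - earlier).days


# Hypothetical customer record: raw dates are just points in time.
record = {
    "signup_date": date(2012, 1, 15),
    "last_purchase_date": date(2013, 3, 1),
}

# Hypothetical reference point, e.g. the date the data was extracted.
reference_date = date(2013, 6, 30)

# Constructed attributes: distances between dates, which a modeling
# algorithm can use directly as ordinary numeric inputs.
derived = {
    "tenure_days": days_between(record["signup_date"], reference_date),
    "days_since_last_purchase": days_between(
        record["last_purchase_date"], reference_date
    ),
}

print(derived)
```

The derived fields turn two opaque timestamps into quantities with an obvious interpretation (how long someone has been a customer, how recently they bought), which is the kind of helping hand the algorithms need.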