IBM SPSS Modeler Cookbook

By: Keith McCormick, Abbott

Overview of this book

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data, not hunches or guesswork. IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art. Follow the industry standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace. Go beyond the basics and get the full power of your data mining workbench with this practical guide.

Historical introduction to scripting

By the time Clementine Version 4 was released in 1997, the workbench had gained substantial market traction. Its revolutionary visual programming interface had enabled a more business-focused approach to analytics than ever before: all the major families of algorithms were represented in an easy-to-use form, ODBC had enabled integration with a comprehensive range of data, and commercial partners were busy rebadging Clementine to reach a wider audience through new market channels.

The workbench lacked one major kind of functionality: automation, which would enable the embedding of data mining within other applications. It was therefore decided that automation would form the centrepiece of Version 5, and that it would be provided by two major features: batch mode and scripting. Batch mode enabled running the workbench without the user interface so that streams could be run in the background, could be scheduled to run at a given time or at regular intervals, and could be run as part of a larger application. Scripting enabled the user to gain automated control of stream execution, even without the user being present; this was also a prerequisite for any complex operation executed in batch mode.

The motivation behind scripting was to provide a number of capabilities:

  • Gain control of the order of stream execution where this matters, that is, when using the Set Globals node
  • Automate repetitive processes, for example, cross-validation or the exploration of many different sets of fields or options
  • Remove the need for user intervention so that streams could run in the background
  • Manipulate complex streams, for example, if the need arose to create 1000 different Derive nodes

These motives led to an underlying philosophy of scripting: scripts replace the user, not the stream. This means that the operations of scripting should be at the same level as the actions of the user, that is, they would create nodes and link them, control their settings, execute streams, and save streams and models. Scripts would not be used to implement data manipulation or algorithms directly; these would remain in the domain of the stream itself. This reflects a fundamental fact about technologies: they are defined as much by what they cannot do as by what they can. These principles are not inflexible; for example, cross-validation might be considered part of an algorithm, yet it was one of the first scripts to be written. Nevertheless, they guided the design of the scripting language. A consequence of this philosophy was that there could be no interaction between script and data; this restriction was lifted only later, with the introduction of access to output objects.

A number of factors influenced the design of the scripting language in addition to the above philosophy:

  • In line with the orientation towards nontechnical users, the language should be simple
  • The timescale for implementation was short, so the language should be easy to implement
  • The language should be familiar, and so should use existing programming concepts and constructs, and not attempt to introduce new ones

These philosophical and practical constraints led to a programming language influenced by BASIC, with structured features taken from POP-11 and an object-oriented approach to nodes taken from Smalltalk and its descendants.