IBM SPSS Modeler Cookbook

Overview of this book

IBM SPSS Modeler is a data mining workbench that enables you to explore data, identify important relationships that you can leverage, and build predictive models quickly, allowing your organization to base its decisions on hard data, not hunches or guesswork. IBM SPSS Modeler Cookbook takes you beyond the basics and shares the tips, the timesavers, and the workarounds that experts use to increase productivity and extract maximum value from data. The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art. Follow the industry standard data mining process, gaining new skills at each stage, from loading data to integrating results into everyday business practices. Get a handle on the most efficient ways of extracting data from your own sources and preparing it for exploration and modeling. Master the best methods for building models that will perform well in the workplace. Go beyond the basics and get the full power of your data mining workbench with this practical guide.

Assessing the situation by Meta Brown


Modern computers and software make it easy to dive into data and explore, so why delay the action with assessment and planning? Why not get right down to business and see what develops? Your organization, be it a business, a government agency, or a nonprofit, has a mission. Your role as a data miner is to provide relevant information in support of that mission. Assessment and planning early in the data mining process align your efforts with management goals and maximize your chances of developing information that is actionable, rather than merely interesting.

Time is your most precious resource, so ensure that you use it to meet the expectations set out for you. There's a simple survival motivation—if you don't deliver what your manager requires, and on time, it won't be good for you. Yet you want more than mere survival. It feels good to uncover really useful information, to find something that was not obvious to others, or to give factual support to what was once just a hunch. Perhaps you have some personal theories you'd like to explore; you will be able to do that if you get the requirements covered first.

Does any project go so smoothly that no roadblocks are encountered along the way? Perhaps you will face unexpected resistance when you try to access the data you require. The tech support contact who has always been so helpful may be replaced by someone less cooperative. The subject matter expert whose help you require may not consider your project a high priority. Early preparation readies you to address problems such as these effectively so you can get on with your work.

As you go through the assessment and planning steps, understand that you are about to do much more than thinking and chatting. Each item must be documented in writing. Sponsoring managers should review these documents and revise them if necessary. Documents must be easily available to the data miners as the project progresses. These documents provide guidance as you go about your everyday work, support when challenges arise, and verify that the information you deliver is consistent with the goals set at the start.

Taking inventory of resources

Gather all the documents that mention the resources to be used in your project. Think broadly when considering resources; these may include intangibles such as executive sponsorship and approvals as well as direct resources such as people participating in the project, budgets, and hardware. Any informal notes and verbal understandings should now be properly documented.

Some documents may contain private or sensitive information that is not appropriate to include in the project file. For each document of this type, create a simple document outlining the nature of the resource mentioned and the information that data miners working on the project will require. In some instances, these replacement documents may be as simple as the originals with sensitive information such as passwords omitted and replaced with the name of the person who has access and knows the password. Where the original document contains a lot of sensitive or irrelevant information, the replacement document may instead be a summary of the sections relevant to data mining project resources, again with a reference to the original source document and appropriate contacts.

Create an outline listing the major resource types for your project. These will include items such as the project description, personnel, data sources, and other relevant categories. Using the information in the documents that you have saved, prepare a summary for each heading in your outline. For example, under personnel, list the name and role of each person (data miners assigned to the project, a business expert or subject matter expert, and so on), along with information about skills and experience and other details. Under data sources, include an explanation of the general purpose of each data source, how it may be accessed, data dictionaries (detailed descriptions of fields and coding within the data source), and so on. The project description may be the most difficult section to complete. In most instances, you will find gaps in your understanding, in project resources, or in documentation.

Take action now to obtain additional documentation for any areas where your understanding about resources is still informal. In some instances, this may require only an e-mail to confirm that a certain resource is at your disposal. Other items may be far more challenging, requiring meetings and considerable discussion. Tackle these now. Most important are the elements of the project description itself.

By establishing a clear, documented explanation of the work to be done and the resources available, you will save time and other resources while the work is underway. Revise and complete your summaries of each resource type in the outline. Circulate the document to project participants for final review and make any necessary corrections.

Doubts or disagreements about the direction of your project can be resolved by reviewing the project description and its evaluation criteria. Resistance from data gatekeepers can be addressed by referring to the original correspondence assuring access. In most instances, you will be able to resolve questions and conflicts without further involvement from management, and when that involvement is required, your preparation will smooth the path.

Reviewing requirements, assumptions, and constraints

Prepare summaries of your understanding of project requirements, assumptions, and constraints. The more thoroughly you determine business objectives, the easier it will be to prepare these summaries. In most instances, though, you will discover some gaps in your understanding. The requirements section should refer to the project description and also include information regarding executive sponsorship and success criteria. You must establish a clear understanding of expectations, especially of how results will be evaluated. Assumptions may be verifiable (such as the distribution of a particular variable in the dataset) or not (such as the future level of growth in GDP). State whether each assumption is verifiable and, if so, how. Constraints may include deadlines, resource and technological limitations, boundaries related to privacy and legal obligations, and others.
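The assumptions summary can also be kept as a small structured log that records, for each assumption, whether it is verifiable and how it will be checked. The Python sketch below is illustrative only: the `Assumption` class, the `verify_all` helper, the 40 to 60 dollar range, and the sample order values are all hypothetical and are not part of IBM SPSS Modeler. A verifiable assumption carries a check against the data; an unverifiable one is simply flagged as such.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class Assumption:
    description: str
    verifiable: bool
    # How the assumption will be checked; None when it cannot be verified.
    check: Optional[Callable[[], bool]] = None


def verify_all(assumptions: list[Assumption]) -> dict[str, str]:
    """Label each assumption 'holds', 'violated', or 'unverifiable'."""
    status: dict[str, str] = {}
    for a in assumptions:
        if not a.verifiable or a.check is None:
            status[a.description] = "unverifiable"
        elif a.check():
            status[a.description] = "holds"
        else:
            status[a.description] = "violated"
    return status


# Hypothetical sample of order values pulled from the dataset.
order_values = [42.0, 55.5, 48.0, 61.0, 39.5]

assumptions = [
    # Verifiable: a property of the distribution of a variable in the data.
    Assumption("Average order value is between 40 and 60 dollars", True,
               lambda: 40 <= mean(order_values) <= 60),
    # Not verifiable now: it depends on future economic conditions.
    Assumption("GDP growth next year will exceed 2 percent", False),
]

for description, result in verify_all(assumptions).items():
    print(f"{result:>12}: {description}")
```

Writing the check next to the assumption documents the "if so, how" in a form that can be rerun whenever the data is refreshed.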

A well-defined understanding of management expectations is the most valuable thing a data miner can have. Establishing this from the start maximizes the chances of producing results that will motivate an executive to take action. Remedy any obvious gap in information through additional research and discussions to complete the summary of requirements, assumptions, and constraints.

When your project is completed, you will report the results to management in writing, as a presentation, or both. By opening your report with a reminder of the goals and success criteria that management set for you at the start, you establish that the results you are about to show are exactly what was requested. This means that the results must be evaluated against the criteria set at the start and that, if they meet those criteria, the executive must follow through with the next steps.

Identifying risks and defining contingencies

Using the documents that you have saved, along with some dark and creative thinking, list all the risks that could delay or halt your work, and organize them into categories. For example, your work depends on computing resources. What could threaten your access to computing resources? Hardware failure, network failure, and competing demands for use of equipment are among the possibilities. For each of these, create a contingency plan with one or more alternatives. Each contingency plan should first include preventive measures. Often, a bit of maintenance or negotiation now avoids aggravation and delays later. If you identify any risks for which no satisfactory contingency is available, raise this concern with management. An influential advocate may be able to open up alternatives or assure a greater level of security. If any of the contingencies that you have planned require the assistance of others or access to resources not normally under your control, contact the parties involved and verify that the contingency is realistic.
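A risk register that follows these steps can be sketched as a simple data structure. In the Python sketch below, the `Risk` class and the helper functions are hypothetical, and the entries for the computing resources example (the backup workstation, IT support, the local data extract) are invented for illustration. The helpers pick out the three follow-up actions described above: risks with no preventive measures, risks with no alternative (to raise with management), and contingencies that depend on other people (to confirm with them).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Risk:
    category: str
    description: str
    preventive_measures: list[str] = field(default_factory=list)
    alternatives: list[str] = field(default_factory=list)
    # Parties outside your control whose help a contingency depends on.
    depends_on: list[str] = field(default_factory=list)


def by_category(risks: list[Risk]) -> dict[str, list[Risk]]:
    """Group risks by category, preserving the order they were entered."""
    groups: dict[str, list[Risk]] = defaultdict(list)
    for r in risks:
        groups[r.category].append(r)
    return dict(groups)


def lacks_prevention(risks: list[Risk]) -> list[Risk]:
    """Risks whose contingency plan has no preventive measures yet."""
    return [r for r in risks if not r.preventive_measures]


def needs_escalation(risks: list[Risk]) -> list[Risk]:
    """Risks with no alternative: raise these with management."""
    return [r for r in risks if not r.alternatives]


def needs_confirmation(risks: list[Risk]) -> list[Risk]:
    """Contingencies that rely on others: verify with them that they are realistic."""
    return [r for r in risks if r.alternatives and r.depends_on]


# Hypothetical entries for the computing resources example.
register = [
    Risk("Computing resources", "Hardware failure",
         preventive_measures=["Schedule maintenance before the project starts"],
         alternatives=["Use a backup workstation"],
         depends_on=["IT support"]),
    Risk("Computing resources", "Network failure",
         alternatives=["Work from a local extract of the data"]),
    Risk("Computing resources", "Competing demands for use of equipment"),
]

for risk in needs_escalation(register):
    print("Escalate:", risk.description)
for risk in needs_confirmation(register):
    print("Confirm with", ", ".join(risk.depends_on) + ":", risk.description)
```

A spreadsheet with the same columns serves equally well; the point is that every risk has an explicit place for prevention, alternatives, and dependencies, so gaps are easy to spot.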

When your project is threatened, you will be able to respond quickly and effectively if you have a clear contingency planned in advance. No executive is sympathetic to excuses about project delays, no matter how valid they are. Stand out from the crowd by making productive use of all your time, even when conditions make that difficult, and completing your work on time, even when others do not.

Defining terminology

Review the documents that you have created and saved earlier. Cull from them a list of abbreviations, acronyms, and terminology that may not be immediately understood by all the stakeholders. As you review the materials, imagine that they will be read long after the project is completed by someone who has a good understanding of business in general, does not work in your organization, is not very familiar with your field, and is not a data miner. What terms would be less than clear if the reader were an outside consultant, a new employee, or a manager coming from a different department or industry?

Organize the terms into three categories: general business terms (those used in many organizations), organization-specific terms, and data mining terms. Pay particular attention to organization-specific terms, as these are often problematic for outsiders or for those reviewing older projects after internal changes have altered the terminology. If you have not documented those terms, important points in your work may become incomprehensible over time. Define all the terms, explaining how each is used within your organization. Illustrate your definitions with examples that are relevant to the project.

A glossary of terminology is a resource that helps all the stakeholders to clearly understand one another.

As you proceed with your work, refer to the glossary occasionally, add new terms as they arise, and improve the definitions you have created. Your definitions may refer to specific steps and results in the data mining project.

Tip

Never consider the glossary as completely finished; treat it as a living document.

When the time comes to prepare your final report, the glossary will remind you which terms management can clearly understand, and maintaining it will have refined your own understanding of those terms. You may choose to include the glossary as an appendix to written reports. In any event, be sure that the glossary is archived with other project resources. It is an important resource, even for your own use, when reviewing projects at a later time.

Evaluating costs and benefits

Review the material that you prepared earlier in the project assessment, particularly materials relating to project requirements and their success criteria. Extract goals and success metrics. If these are not already stated in monetary terms, they must be converted. For example, perhaps the success metric is the conversion rate for a marketing campaign, and the success criterion calls for action if the conversion rate improves by 5 percent for a particular intervention.

How much money will that bring in for the company? You may have to assemble several facts to determine the answer. If the current campaign results in 100,000 sales at an average of 50 dollars per sale, meeting the success criterion implies an additional 5,000 sales (5 percent of 100,000) at 50 dollars each, or at least 250,000 dollars in increased revenue. In the same manner, identify any costs associated with the alternatives you will investigate. Include these in your summary side by side with the benefits.
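The revenue arithmetic can be written out as a short calculation so that benefits and costs sit side by side. This Python sketch assumes that a 5 percent improvement in conversion rate translates directly into 5 percent more sales (that is, the number of prospects stays the same). The `incremental_revenue` function name and the two cost items are hypothetical placeholders, not figures from the text.

```python
def incremental_revenue(current_sales: int, avg_sale_value: float,
                        lift_percent: float) -> float:
    """Extra revenue if sales grow by lift_percent (5 means 5 percent)."""
    additional_sales = current_sales * lift_percent / 100
    return additional_sales * avg_sale_value


# Figures from the example: 100,000 sales at 50 dollars each, 5 percent lift.
benefit = incremental_revenue(100_000, 50.0, 5)

# Hypothetical costs of the intervention being investigated.
costs = {"Analyst time": 30_000.0, "Campaign changes": 20_000.0}
total_cost = sum(costs.values())
net_benefit = benefit - total_cost

# Side-by-side summary: benefits positive, costs negative.
print(f"{'Benefit: additional revenue':<30}{benefit:>12,.0f}")
for item, amount in costs.items():
    print(f"{'Cost: ' + item:<30}{-amount:>12,.0f}")
print(f"{'Net benefit':<30}{net_benefit:>12,.0f}")
```

Keeping the lift as a parameter makes it easy to see how the benefit changes if the actual improvement differs from the success criterion, while keeping the analysis as simple as the next paragraph recommends.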

The cost/benefit analysis is a reality check for both the data miner and the business. Data miners must be reminded of the difference between an interesting model and solid return for the business; business managers cannot dismiss analytics when the financial impact is made clear.

Keep in mind that you are a data miner and not an accountant, so keep the analysis simple. It must be reasonable but not perfect. If your project absolutely demands a sophisticated cost/benefit analysis, it may be worthwhile to enlist the aid of an appropriate expert in finance.

No part of your final report will be more important or compelling than the information that is expressed in terms of cold, hard dollars. Indeed, this varies little even in organizations that are not profit-making businesses. If you have performed a cost/benefit analysis at the start of your project, you have a good motivator for everyone involved. You can be certain that your effort is worthwhile and that executives will understand the significance of your findings, not in the statistical sense but in terms of financial impact to the organization.