-
Book Overview & Buying
-
Table Of Contents
IBM SPSS Modeler Essentials
By :
Datasets may contain duplicate records that often must be removed before data mining can begin. For example, the same individual may appear multiple times in a dataset with different addresses. The Distinct node finds or removes duplicate records in a dataset. The Distinct node, located in the Record Ops palette, checks for duplicate records and identifies the cases that appear more than once in a file so they can be reviewed and/or removed.
A duplicate case is defined by having identical data values on one or more fields that are specified. Any number or combination of fields may be used to specify a duplicate:
Distinct node from the Record Ops palette onto the canvas.Sort node to the Distinct node.Distinct node.The Distinct node can be a bit tricky to use; this is why we will run this node a couple of times, and hopefully in this way its options will become well-defined. The Mode option controls how the Distinct node is...