3.8 IDENTIFYING OUTLIERS
Once the numeric fields are standardized, one may use the z‐values to identify outliers, which are records with extreme values along a particular dimension or dimensions. For example, consider the field number_of_contacts, which represents the number of customer contacts made over the course of the marketing campaign. The mean number of contacts per customer is 2.6, with a standard deviation of 2.7 (allowing for rounding). So, we obtain the standardized field as follows:
A rough rule of thumb is that a data value is an outlier if its z‐value is either greater than 3, or less than −3. For instance, a customer who had been contacted 10 times (which seems like a lot) would have standardized value,
Thus, 10 contacts, while a lot, is not identified as an outlier using this method, since 2.7 < 3.
The...