Binning scale variables to address missing data
This recipe tackles null values that are not applicable rather than unknown. When transactions are prepared for modeling, some cases will invariably have no transactions of a given type. In this recipe, our cases are customers. Take a straightforward instance: a customer, Bill Johnson, did not rent a horror movie within the last 12 months, so any variable calculated from his most recent horror rental has no applicable value. The Using an @NULL multiple Derive to explore missing data recipe in Chapter 1, Data Understanding, helps determine whether the presence or absence of such a value is predictive of the target. This recipe prepares the original variable for modeling. The issue is virtually guaranteed to occur when preparing transaction dates, which is the focus of this recipe. However, its application is not limited to date arithmetic on transactions. It can be used on any scale variable that has the possibility of a true null...
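To make the idea concrete outside of Modeler, here is a minimal Python sketch of binning a scale variable so that a true null becomes its own category. It is an illustration only, not the recipe's stream: the variable (days since the last horror rental), the bin edges, the labels, and the customers other than Bill Johnson are all hypothetical.

```python
from typing import Optional

# Hypothetical bin upper bounds (in days) and labels; not taken from the recipe.
BINS = [(30, "0-30 days"), (90, "31-90 days"), (365, "91-365 days")]
NOT_APPLICABLE = "No rental"  # category reserved for true nulls


def bin_days_since_rental(days: Optional[float]) -> str:
    """Map a scale value to a bin label, keeping a true null as its own bin."""
    if days is None:
        # The value is not applicable (no rental), not unknown,
        # so it gets a meaningful category instead of being imputed or dropped.
        return NOT_APPLICABLE
    for upper, label in BINS:
        if days <= upper:
            return label
    return "Over 365 days"


# Hypothetical data: None means the customer never rented a horror movie.
customers = {"Bill Johnson": None, "Customer A": 12, "Customer B": 200}
binned = {name: bin_days_since_rental(d) for name, d in customers.items()}
print(binned)
```

Because the null is mapped to a category rather than filled with a number, the binned variable can be used by a model without inventing a rental date for customers who never rented.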