Creating time-aligned cohorts
In this recipe we will create a table that combines customer information, monthly statements, and churner identifiers conditioned by cohort information.
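As a rough illustration of the kind of table we are after, here is a minimal Python sketch. It is not the recipe's own implementation. Every field name, the sample rows, the signup dates, and the choice to define a cohort as region plus signup month are assumptions made for this example. Each statement row is joined to its customer, tagged with a cohort, given a tenure in months since signup (the time alignment), and flagged as belonging to a churner or not.

```python
from datetime import date

# Hypothetical sample data; field names and values are assumptions for illustration.
customers = [
    {"customer_id": 1, "name": "John", "region": "North", "signup": date(2011, 7, 1)},
    {"customer_id": 2, "name": "Sally", "region": "North", "signup": date(2011, 7, 1)},
]
statements = [
    {"customer_id": 1, "month": date(2011, 12, 1), "amount": 40.0},
    {"customer_id": 1, "month": date(2012, 1, 1), "amount": 12.5},
    {"customer_id": 2, "month": date(2011, 12, 1), "amount": 38.0},
    {"customer_id": 2, "month": date(2012, 1, 1), "amount": 41.0},
]
# customer_id -> month in which the customer churned (absent means not churned)
churn_month = {1: date(2012, 1, 1)}


def months_between(start, end):
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def build_cohort_table(customers, statements, churn_month):
    """Join statements to customers and align them by tenure within a cohort."""
    by_id = {c["customer_id"]: c for c in customers}
    rows = []
    for s in statements:
        c = by_id.get(s["customer_id"])
        if c is None:
            continue  # skip statements with no matching customer record
        rows.append({
            "customer_id": c["customer_id"],
            # Assumed cohort definition: same region and same signup month.
            "cohort": (c["region"], c["signup"].strftime("%Y-%m")),
            # Time alignment: months since signup rather than calendar month.
            "tenure_month": months_between(c["signup"], s["month"]),
            "amount": s["amount"],
            "churned": c["customer_id"] in churn_month,
        })
    return rows


cohort_table = build_cohort_table(customers, statements, churn_month)
```

Because the rows are keyed by cohort and tenure rather than by calendar date, customers who joined under similar circumstances can be compared at the same point in their history, whatever their eventual outcome.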
The reason for doing this is best explained with an example. Suppose we wish to identify the best predictors of whether a customer is going to churn. We might be tempted to throw every customer into one pot of data and see which algorithm best separates the churners from the non-churners. There are two immediate problems with this. First, the results would be skewed, because far more non-churners than churners would go into the analysis. Second, the process would be insensitive to differences among customers who share similar traits. After all, while John churned in January 2012, Sally (who came from the same region) has not churned. Wouldn't it make more sense to fine-tune the analysis so that we are comparing customers with similar experiences but different outcomes? That way we get the...