Introduction
This chapter will focus on the Construct subtask of CRISP-DM's data preparation phase. The CRISP-DM document describes it as follows:
This task includes constructive data preparation operations such as the production of derived attributes, entire new records, or transformed values for existing attributes.
Of all the subtasks in CRISP-DM, the Construct subtask is a good candidate for the one that novices most often fail to plan enough time for. Everyone knows that the data must be cleaned, and everyone is braced for that task to take a long time. "What needs to be constructed?", one might ask. The example that most frequently inspires the Aha! experience is dates. Dates, quite simply, are nearly useless in the modeling phase. They are stored merely as points in time. The modeling algorithms would have to work awfully hard to spot an interesting date, perhaps detecting a difference between big dates and little dates. One needs to give the algorithms a major helping hand. But, what is interesting are the distances...
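To make the point about dates concrete, here is a minimal Python sketch of constructing distance attributes from raw dates. The customer records, the field names (signup, last_purchase), the as_of reference date, and the helper add_date_distances are all hypothetical, invented purely for illustration; real data and the distances worth deriving will differ from project to project.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative assumptions.
customers = [
    {"id": 1, "signup": date(2020, 1, 15), "last_purchase": date(2020, 3, 1)},
    {"id": 2, "signup": date(2019, 6, 30), "last_purchase": date(2020, 2, 20)},
]

# Assumed reference date for "how long ago" calculations.
as_of = date(2020, 3, 31)


def add_date_distances(record, as_of):
    """Return a copy of the record with derived day-count attributes."""
    derived = dict(record)
    # Distance between two events in the same record.
    derived["days_signup_to_last_purchase"] = (
        record["last_purchase"] - record["signup"]
    ).days
    # Distance from an event to the reference date (recency).
    derived["days_since_last_purchase"] = (as_of - record["last_purchase"]).days
    return derived


constructed = [add_date_distances(c, as_of) for c in customers]

for row in constructed:
    print(
        row["id"],
        row["days_signup_to_last_purchase"],
        row["days_since_last_purchase"],
    )
```

The raw dates remain in each record, but a modeling algorithm would be given the two day counts, which are ordinary numeric attributes that it can compare and split on directly.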