Chapter 3. How to Clean Texts, Numbers, and Dates
The first two chapters dealt with the first stage of the data analysis life cycle, the data gathering stage. In this chapter, we will get our hands dirty with steps that some people would rather avoid altogether. This stage will challenge you creatively and intellectually, and it will test your patience. Getting the data and then preparing it for a report or an analysis can be very time-consuming. I remember working on a project that I was very excited about; I had just read a book about data mining and was eager to apply some of the knowledge that was still fresh in my mind. At that time, I was working for a book wholesaler, a company that purchased and sold college textbooks. My goal was to gather historical prices of books and predict when a book's price would go up or down. Just imagine: if this were possible, I could tell my boss which books to buy today and which to buy later because I could "predict" the future price of each...