This was quite a long chapter, yet it only scratched the surface of the issues you can encounter in data preparation. Remember that in a real-life project, you spend the majority of your time on the not-so-fun parts: the data preparation and data overview.
In this chapter, you learned the basics of data preparation, including handling missing values, dealing with nominal variables, different ways of discretizing continuous variables, and how to measure the entropy of a discrete variable. At the end, you learned about some efficient ways of performing data manipulation in T-SQL, Python, and R.
The last three chapters of this book introduce real analysis of the data. I will start with intermediate-level statistical methods for exploring the associations between variables, and with more advanced graphs.