In this chapter, we will cover the following recipes:
Cleaning data with regular expressions
Maintaining consistency with synonym maps
Identifying and removing duplicate data
Regularizing numbers
Calculating relative values
Parsing dates and times
Lazily processing very large data sets
Sampling from very large data sets
Fixing spelling errors
Parsing custom data formats
Validating data with Valip