This chapter will cover the following recipes:
Trimming excess whitespace
Ignoring punctuation and specific characters
Coping with unexpected or missing input
Validating records by matching regular expressions
Lexing and parsing an e-mail address
Deduplication of nonconflicting data items
Deduplication of conflicting data items
Implementing a frequency table using Data.List
Implementing a frequency table using Data.MultiSet
Computing the Manhattan distance
Computing the Euclidean distance
Comparing scaled data using the Pearson correlation coefficient
Comparing sparse data using cosine similarity