As we've discussed before, a dataset can be messy in countless ways, and there are many messy situations and solutions that we can't cover at length here. So that you, dear reader, are not left in the dark regarding other custodial solutions, here are some other tools that you may find helpful on your analytics journey.
Though OpenRefine (formerly Google Refine) doesn't have anything to do with R per se, it is a sophisticated tool for working with and cleaning up messy data. Among its many capabilities is the ability to auto-detect misspelled or misspecified categories and fix them with the click of a button.
In order to prepare for record linkage on book titles, we normalized the strings by performing a few operations on them. An alternative is to use what is referred to as fuzzy matching.
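To make the contrast concrete, here is a minimal sketch of the two approaches using only base R. The two titles are made-up illustrative data, and the `normalize` helper is a hypothetical stand-in: it assumes lowercasing, punctuation removal, and whitespace trimming, which may not be the exact operations used earlier. The fuzzy side uses `adist`, which computes a generalized Levenshtein (edit) distance; that is only one of several ways to do fuzzy matching.

```r
# Two hypothetical spellings of the same title (illustrative data only)
a <- "The Great Gatsby"
b <- "the great gatsby."

# Normalization approach (assumed operations, not necessarily the ones
# used earlier): lowercase, strip punctuation, trim surrounding whitespace
normalize <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)
  trimws(x)
}

# After normalization, an exact comparison succeeds
exact_after_normalizing <- normalize(a) == normalize(b)

# Fuzzy approach: measure how many single-character edits separate
# the raw strings, with and without case sensitivity
dist_raw        <- adist(a, b)[1, 1]
dist_ignorecase <- adist(a, b, ignore.case = TRUE)[1, 1]

print(exact_after_normalizing)
print(dist_raw)
print(dist_ignorecase)
```

With a fuzzy approach, two titles would be treated as a match when their distance falls below some threshold; choosing that threshold is a judgment call that depends on the data.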
An exact match between two strings requires that the strings be... well... exactly the same. For example, Finnegans...