One common problem with data is inconsistency. Sometimes a value is capitalized and sometimes not, sometimes abbreviated and sometimes in full, and sometimes it is misspelled.
In an open domain, such as words in a free-text field, the problem can be quite difficult. However, when the data comes from a limited vocabulary (US state names, for our example here), there's a simple trick that can help: build a mapping from common variants and mistakes to a normalized form, then use it to replace each variant in the field.
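As a minimal sketch of that mapping idea, the Python snippet below looks up each raw value in a dictionary of known variants. The dictionary contents and the names `STATE_VARIANTS` and `normalize_state` are illustrative assumptions, not a complete or authoritative list; in practice the mapping would be built from the variants actually seen in the data.

```python
# Hypothetical mapping from observed variants (lowercased) to a canonical form.
# The entries are examples only; extend them from what appears in your data.
STATE_VARIANTS = {
    "ca": "California",
    "calif": "California",
    "calif.": "California",
    "california": "California",
    "califronia": "California",  # misspelling
    "ny": "New York",
    "n.y.": "New York",
    "new york": "New York",
    "tx": "Texas",
    "tex": "Texas",
    "texas": "Texas",
}


def normalize_state(raw):
    """Return the canonical state name for a raw value, if it is known."""
    # Trim, lowercase, and collapse runs of whitespace before the lookup,
    # so differences in case and spacing do not need their own entries.
    key = " ".join(raw.strip().lower().split())
    # Unknown values pass through unchanged so they can be reviewed later.
    return STATE_VARIANTS.get(key, raw)


raw_values = ["CA", " calif. ", "Califronia", "new   york", "Ontario"]
cleaned = [normalize_state(value) for value in raw_values]
```

Returning unrecognized values unchanged is a deliberate choice in this sketch: it keeps the cleanup non-destructive and makes it easy to list the leftovers and decide which ones deserve a new entry in the mapping.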