Collecting data from the real world is fraught with challenges. Raw data usually arrives with enough problems that it has to be cleaned before it is suitable for use in further studies.
Raw data collected from the field is riddled with human error, and data entry is a major source of it. Technological methods of collecting data are not spared either. Inaccurate device readings, faulty equipment, and changes in environmental conditions can all introduce significant margins of error as data is collected.
The data collected may also be inconsistent with other records collected over time. Duplicate entries and incomplete records mean that the data must be treated before the useful information buried in it can be brought out. The records that matter may also be lost in a sea of irrelevant data.
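As a rough illustration of two of the problems just described, the sketch below flags duplicate entries and incomplete records in a small list of readings. The field names (`site`, `date`, `temp_c`), the sample values, and the helper functions `find_duplicates` and `find_incomplete` are all made up for this example, and it assumes a missing value is stored as `None`; real data sets will need their own rules.

```python
from collections import Counter

# Hypothetical field readings; None marks a value that was never recorded.
records = [
    {"site": "A", "date": "2021-03-01", "temp_c": 21.5},
    {"site": "A", "date": "2021-03-01", "temp_c": 21.5},  # duplicate entry
    {"site": "B", "date": "2021-03-01", "temp_c": None},  # incomplete record
    {"site": "C", "date": "2021-03-02", "temp_c": 19.0},
]


def find_duplicates(rows):
    """Return each row that appears more than once, reported a single time."""
    # A dict is not hashable, so each row is turned into a sorted tuple of
    # (field, value) pairs before counting.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return [dict(key) for key, n in counts.items() if n > 1]


def find_incomplete(rows):
    """Return rows in which at least one field is missing (None)."""
    return [row for row in rows if any(value is None for value in row.values())]


duplicates = find_duplicates(records)
incomplete = find_incomplete(records)
print(len(duplicates), "duplicated record(s)")
print(len(incomplete), "incomplete record(s)")
```

This only finds exact duplicates. Entries that differ by a typo or by formatting would need a looser comparison, which is outside the scope of this sketch.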
To clean the data up, we can discard irrelevant data, better known as noise, altogether. Data with...