Cleaning and preparing data
Data preparation is a crucial step in machine learning because the quality, relevance, and suitability of the training data directly affect how accurate, reliable, and effective the resulting models are. Typical data preparation steps include the following:
- Removing null values
- Removing columns that are not needed
- Encoding categorical features (for example, the one-hot encoding that we used in some of the examples in Chapter 2)
- Feature scaling
- Splitting into test and training datasets
- Setting correct data types
- Removing duplicates
- Correcting data errors
- Removing outliers
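The following is a minimal sketch, assuming pandas and scikit-learn, of how several of the steps listed above might be chained together. The toy DataFrame and its column names (`id`, `age`, `income`, `city`, `target`) are hypothetical and made up for illustration, and the 1.5 * IQR outlier cutoff is one common convention rather than a fixed rule.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy dataset, made up for illustration only.
raw = pd.DataFrame({
    "id":     [1, 2, 2, 3, 4, 5, 6, 7, 8, 9],
    "age":    ["25", "32", "32", "47", None, "51", "38", "29", "44", "36"],
    "income": [40_000, 52_000, 52_000, 61_000, 58_000, 1_000_000,
               49_000, 45_000, 57_000, 50_000],
    "city":   ["Paris", "London", "london", "Paris", "Berlin", "London",
               "Berlin", "Paris", "London", "Berlin"],
    "target": [0, 1, 1, 1, 0, 1, 0, 0, 1, 0],
})

df = raw.copy()

# Correct data errors: normalize inconsistent spellings such as "london".
df["city"] = df["city"].str.strip().str.title()

# Set correct data types: ages arrive as strings; None becomes NaN.
df["age"] = pd.to_numeric(df["age"])

# Remove duplicates (the two id=2 rows are identical once "city" is fixed).
df = df.drop_duplicates()

# Remove null values (drops the row with a missing age).
df = df.dropna()

# Remove columns that are not needed for modeling.
df = df.drop(columns=["id"])

# Remove outliers in "income" using the 1.5 * IQR rule (an assumed convention).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Encoding: one-hot encode the categorical "city" column as 0/1 integers.
df = pd.get_dummies(df, columns=["city"], dtype=int)

# Split into training and test datasets.
X = df.drop(columns=["target"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Feature scaling: fit on the training split only, then apply to both splits.
num_cols = ["age", "income"]
X_train, X_test = X_train.copy(), X_test.copy()
scaler = StandardScaler()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

print(X_train.shape, X_test.shape)  # (4, 5) (3, 5)
```

The order of the steps is a design choice: fixing the inconsistent `city` spelling before calling `drop_duplicates()` is what lets the repeated `id=2` row be detected, and the scaler is fitted on the training split only so that statistics from the test set do not leak into training.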
Let’s take a closer look at some of these steps using examples.