DATA SCRUBBING
Like a well-designed Swiss or Japanese watch, a good machine learning model should run smoothly and contain no extra parts. In practice, this means fixing syntax and other errors that prevent the code from executing, and removing redundant variables that might clog up the model’s decision path.
This inclination towards simplicity is especially useful for beginners coding their first model. When working with a new algorithm, it helps to start with a minimal viable model. If you find yourself at an impasse, look at the troublesome element and ask, “Do I need it?” If the model can’t handle missing values or multiple variable types, the quickest cure is to remove the variables that cause the problem. This should help the afflicted model spring to life and breathe normally. Once the model is working, you can go back and add complexity to your code.
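As a rough illustration of this quick cure, the sketch below uses pandas to keep only the numeric variables that have no missing values. The dataset, its column names, and the helper function minimal_viable_frame are all hypothetical and invented for this example; treat it as one possible starting point, not a prescribed method.

```python
import pandas as pd

# Hypothetical toy dataset: the column names and values are invented for illustration.
df = pd.DataFrame({
    "bedrooms": [2, 3, 4, 3],
    "land_size": [120.0, None, 300.0, 250.0],      # contains a missing value
    "suburb": ["North", "South", "East", "West"],  # non-numeric (text) variable
    "price": [450000, 620000, 810000, 700000],
})


def minimal_viable_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only numeric columns that contain no missing values."""
    # Drop text and other non-numeric variables.
    numeric = frame.select_dtypes(include="number")
    # Drop any remaining column that has at least one missing value.
    return numeric.dropna(axis="columns")


scrubbed = minimal_viable_frame(df)
print(list(scrubbed.columns))
```

Dropping whole variables is a blunt tool, since it discards information the model might later benefit from. Once the minimal model runs, the removed variables can be reintroduced one at a time using gentler techniques.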
Let’s now take a look at specific data scrubbing techniques to prepare, streamline,...