Monitoring and anticipating drifts
In the previous section, we understood how a thorough data analysis and data profiling approach can help us to identify data issues related to volume, consistency, and purity. Usually, during the initial data exploration process, most data scientists try to inspect issues in the dataset in terms of volume and purity and perform necessary preprocessing and feature engineering steps to handle these issues.
But the detection of data consistency for real-time systems and production systems is a challenging problem for almost all ML systems. Additionally, issues relating to data consistency are often overlooked and are quite unpredictable as they can happen at any point in time in production systems. Some of the cases where data consistency issues can occur are listed as follows:
- They can occur due to natural reasons such as changes in external environmental conditions or due to the natural wear and tear of sensors or systems capturing the...