Defining a retention policy
Working with large volumes of data can become expensive over time: as your data volume grows, so do your storage costs. Additionally, in some cases, working with aging data may produce undesired results. With machine-generated data, especially from application logs and IoT devices, data volumes grow quite quickly. This raises the question: how long do you need to keep your data?
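To make the cost argument concrete, here is a minimal sketch that compares storage held when data is kept forever with storage held under a retention window. It assumes a constant daily ingest volume and a simple rule that deletes anything older than the window. The function name `stored_volume_gb`, the 50 GB/day ingest rate, and the 90-day window are hypothetical values chosen for illustration, not figures from any real system.

```python
from typing import Optional


def stored_volume_gb(day: int, daily_ingest_gb: float,
                     retention_days: Optional[int]) -> float:
    """Return the volume held in storage at the end of `day` (1-indexed).

    Hypothetical model: ingest is constant every day, and data older than
    `retention_days` is deleted. `None` means data is never deleted.
    """
    if retention_days is None:
        days_kept = day  # every day's data is still stored
    else:
        days_kept = min(day, retention_days)  # only the newest window survives
    return days_kept * daily_ingest_gb


DAILY_INGEST_GB = 50.0  # hypothetical ingest rate, for illustration only
RETENTION_DAYS = 90     # hypothetical retention window

for day in (30, 90, 365, 730):
    forever = stored_volume_gb(day, DAILY_INGEST_GB, None)
    capped = stored_volume_gb(day, DAILY_INGEST_GB, RETENTION_DAYS)
    print(f"day {day:>3}: keep forever = {forever:>8.0f} GB, "
          f"{RETENTION_DAYS}-day retention = {capped:>6.0f} GB")
```

Under these assumptions, keeping everything grows storage linearly with time (36,500 GB after two years), while the retention window caps it at 90 × 50 = 4,500 GB once the window fills.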
In machine learning, the more data you have to train new models, the better. This is based on the law of large numbers, which states that the average of the results obtained from a large number of trials should be close to the expected value, and tends to get closer to it as more trials are performed. Translated to data, the more samples you have of a certain measurement, the closer their average gets to the expected value of that measurement, and the better you can predict what that measurement should be in the future. Some researchers, however, such as Dr. Andrew Ng, one of the...