Defining dataset threats
ML depends heavily on clean data. Dataset threats are especially problematic because ML techniques require huge datasets that aren’t easily monitored. The following sections help you categorize dataset threats to make them easier to understand.
Security and data in ML
Many of the issues addressed in this chapter also apply to data management best practices in general, but they take on special meaning for ML because ML relies on huge amounts of automatically collected data. Someone can add, remove, or modify data without anyone noticing, because it isn't possible to check every piece of data by hand, and even automated checks can't verify it with absolute certainty. Consequently, an ML system can have a security issue that goes undetected unless you exercise due diligence to eliminate as many sources of data threats as you can.
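One partial form of automated checking is to fingerprint each record while the data is still trusted and compare later copies against those fingerprints. The following is a minimal sketch of that idea, not a complete defense: the function names (record_digest, build_manifest, diff_manifest) and the record layout (a dictionary of record IDs mapped to field dictionaries) are assumptions made for illustration. It reports only records that were added, removed, or modified after the trusted snapshot was taken, so it can't tell you whether the data was already tainted when it was collected.

```python
import hashlib
import json


def record_digest(record: dict) -> str:
    """Return a SHA-256 fingerprint of one record.

    The record is serialized with sorted keys so that the same content
    always produces the same digest, regardless of key order.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(records: dict) -> dict:
    """Map each record ID to the digest of its content."""
    return {rid: record_digest(rec) for rid, rec in records.items()}


def diff_manifest(trusted: dict, current_records: dict) -> dict:
    """Compare current records against a trusted manifest.

    Returns the IDs of records that were added, removed, or modified
    since the trusted manifest was built.
    """
    current = build_manifest(current_records)
    added = sorted(current.keys() - trusted.keys())
    removed = sorted(trusted.keys() - current.keys())
    modified = sorted(
        rid for rid in trusted.keys() & current.keys()
        if trusted[rid] != current[rid]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

In use, you would call build_manifest() on the dataset at a point when you trust it, store the result somewhere an attacker can't reach, and call diff_manifest() before each training run. Any nonempty list in the result is a reason to investigate before training.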
Learning about the kinds of dataset threats
Dataset modification...