Using Isolation Forest to find anomalies
Isolation Forest is a relatively new machine learning technique for identifying anomalies. It has quickly become popular, partly because its algorithm is designed to isolate anomalies directly rather than to first model what normal values look like. It finds outliers by repeatedly partitioning the data, splitting on a randomly chosen feature at a random value, until a data point has been isolated. Points that require fewer partitions to be isolated receive higher anomaly scores, because unusual values get separated from the rest of the data early. This process turns out to be fairly light on memory and computation. In this recipe, we demonstrate how to use it to detect outlying COVID-19 cases and deaths.
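To make the partitioning idea concrete, here is a minimal pure-Python sketch (not the scikit-learn implementation used later in this recipe) that counts how many random splits it takes to isolate a single point. The helper names isolation_depth and average_depth, the toy grid data, and the number of trials are hypothetical choices for illustration only. Each split picks a feature that still varies within the current partition and a random threshold between that feature's minimum and maximum, then keeps only the side that contains the point.

```python
import random

def isolation_depth(point, data, rng, max_depth=50):
    """Count the random splits needed to separate `point` from the other rows in `data`."""
    subset = list(data)
    depth = 0
    while len(subset) > 1 and depth < max_depth:
        # Only features that still vary within the current partition can split it.
        varying = [f for f in range(len(point))
                   if min(row[f] for row in subset) < max(row[f] for row in subset)]
        if not varying:
            break  # the remaining rows are identical and cannot be split further
        feature = rng.choice(varying)
        values = [row[feature] for row in subset]
        threshold = rng.uniform(min(values), max(values))
        # Keep only the side of the split that contains `point`.
        if point[feature] < threshold:
            subset = [row for row in subset if row[feature] < threshold]
        else:
            subset = [row for row in subset if row[feature] >= threshold]
        depth += 1
    return depth

def average_depth(point, data, n_trials=200, seed=0):
    """Average isolation depth over many independent random partitionings."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trials)) / n_trials

# Hypothetical toy data: a tight 10 x 10 grid of points plus one distant point.
cluster = [(x * 0.1, y * 0.1) for x in range(10) for y in range(10)]
outlier = (10.0, 10.0)
data = cluster + [outlier]

print(f"outlier at (10, 10): {average_depth(outlier, data):.2f} splits on average")
print(f"grid point at (0.5, 0.5): {average_depth((0.5, 0.5), data):.2f} splits on average")
```

The distant point is usually cut off by the first split or two, while a point inside the grid typically needs several times as many. scikit-learn's IsolationForest builds an ensemble of such random trees on subsamples of the data and converts the average path length into a normalized anomaly score, rather than re-partitioning around each point as this sketch does.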
You will need scikit-learn and Matplotlib to run the code in this recipe. You can install them by entering pip install scikit-learn and pip install matplotlib in the terminal, or in PowerShell on Windows.
How to do it...
We will use Isolation Forest to find the countries whose attributes indicate that they are most anomalous:
matplotlib, and the