Python Data Cleaning Cookbook
Isolation Forest is a relatively new machine learning technique for identifying anomalies. It has quickly become popular, partly because its algorithm is optimized to find anomalies, rather than normal values. It finds outliers by successive partitioning of the data until a data point has been isolated. Points that require fewer partitions to be isolated receive higher anomaly scores. This process turns out to be fairly easy on system resources. In this recipe, we demonstrate how to use it to detect outlier COVID-19 cases and deaths.
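The partitioning idea can be illustrated with a deliberately simplified, one-dimensional sketch in plain Python. This is not scikit-learn's implementation: the real algorithm also picks a random feature at each split, builds each tree on a subsample, and converts average path length into a normalized score. The function names isolation_depth and average_depth and the sample values are made up for this illustration.

```python
import random


def isolation_depth(data, target, rng, max_depth=50):
    """Count the random splits needed before `target` sits alone in its partition."""
    points = list(data)
    depth = 0
    while len(points) > 1 and depth < max_depth:
        lo, hi = min(points), max(points)
        if lo == hi:  # remaining points are identical, so no further split is possible
            break
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the target.
        if target < split:
            points = [p for p in points if p < split]
        else:
            points = [p for p in points if p >= split]
        depth += 1
    return depth


def average_depth(data, target, n_trees=200, seed=0):
    """Average the isolation depth over many random partitionings."""
    rng = random.Random(seed)
    total = sum(isolation_depth(data, target, rng) for _ in range(n_trees))
    return total / n_trees


# Hypothetical values: nine clustered points and one obvious outlier.
values = [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]
outlier_depth = average_depth(values, 100)
typical_depth = average_depth(values, 14)
print(outlier_depth, typical_depth)
```

The outlier is usually cut off by the very first split, while a value in the middle of the cluster needs several, which is why a shorter average path corresponds to a higher anomaly score.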
You will need scikit-learn and Matplotlib to run the code in this recipe. You can install them by entering pip install scikit-learn and pip install matplotlib in the terminal or PowerShell (in Windows). Note that the package is installed as scikit-learn but imported as sklearn.
We will use Isolation Forest to find the countries whose attributes indicate that they are most anomalous:
pandas, matplotlib, and the StandardScaler and IsolationForest...
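As a rough orientation for how StandardScaler and IsolationForest fit together, here is a minimal sketch on synthetic data. It does not reproduce the recipe's COVID-19 data or its steps: the random values, the single injected extreme row, and the contamination setting of 0.01 are all assumptions chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the recipe's data): 200 typical rows
# plus one deliberately extreme row appended at index 200.
rng = np.random.default_rng(0)
typical = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
extreme = np.array([[8.0, 8.0]])
X = np.vstack([typical, extreme])

# Standardize each feature to mean 0 and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# contamination is the assumed share of outliers; 0.01 is an arbitrary choice here.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(X_scaled)

scores = model.decision_function(X_scaled)  # lower means more anomalous
labels = model.predict(X_scaled)            # -1 = outlier, 1 = inlier

most_anomalous = int(np.argmin(scores))
print(most_anomalous, labels[most_anomalous])
```

In scikit-learn, decision_function returns lower values for more anomalous rows and predict marks outliers with -1, so sorting by the score gives a ranking from most to least anomalous.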