scikit-learn Cookbook - Third Edition
Once outliers have been identified, we face an important decision: how should we handle them? The appropriate strategy depends on the context of the problem and the nature of the data. Outliers can be informative (e.g., fraud cases) or disruptive (e.g., sensor glitches), and how we treat them affects both model performance and interpretability.
This recipe outlines common strategies for handling outliers, including removal, transformation, imputation, and retaining them for specialized modeling. We’ll walk through practical code examples to demonstrate each approach.
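To give a rough feel for the four strategies before the full walkthrough, here is a minimal sketch on a small hand-made array. The values and the outlier mask are hypothetical; in the recipe the mask would come from a detector such as Isolation Forest. Capping (winsorizing) to the inlier range stands in for "transformation" here. It is one common choice, not necessarily the one the recipe uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy feature: readings near 10 plus two extreme values
values = np.array([9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 55.0, 9.7, -30.0, 10.3])

# Hypothetical outlier mask; in practice a detector would produce this
is_outlier = np.zeros(len(values), dtype=bool)
is_outlier[[6, 8]] = True

# 1. Removal: keep only the points not flagged as outliers
removed = values[~is_outlier]

# 2. Transformation (here: capping/winsorizing): clip every value to the inlier range
low, high = removed.min(), removed.max()
capped = np.clip(values, low, high)

# 3. Imputation: replace flagged values with the median of the inliers
imputed = np.where(is_outlier, np.median(removed), values)

# 4. Retention: keep every value and add a flag that a downstream model can use
df = pd.DataFrame({"value": values, "is_outlier": is_outlier})

print("removed:", removed)
print("capped: ", capped)
print("imputed:", imputed)
print(df[df["is_outlier"]])
```

Removal shrinks the dataset, capping and imputation keep its size but alter the extreme values, and retention leaves the data untouched while exposing the flag as an extra feature.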
We’ll use a dataset that includes outliers detected via the Isolation Forest method.
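As a brief refresher on the detection step, here is a minimal sketch of how scikit-learn's `IsolationForest` labels points: `fit_predict` returns `1` for inliers and `-1` for outliers. The uniformly scattered "outlier" points, `contamination=0.05`, and the seeds are assumptions chosen for this sketch, not the recipe's own settings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest

# Dense inlier cluster around the origin (seed chosen arbitrarily for this sketch)
X_in, _ = make_blobs(n_samples=300, centers=[[0, 0]], cluster_std=0.6, random_state=0)

# Hypothetical outliers: a few points scattered uniformly over a wider square
rng = np.random.default_rng(0)
X_out = rng.uniform(low=-6, high=6, size=(20, 2))
X = np.vstack([X_in, X_out])

# contamination = expected share of outliers (an assumed value here)
iso = IsolationForest(contamination=0.05, random_state=0)
labels = iso.fit_predict(X)  # 1 = inlier, -1 = outlier
outlier_mask = labels == -1

print(f"Flagged {outlier_mask.sum()} of {len(X)} points as outliers")
```

The resulting boolean `outlier_mask` is the kind of input the handling strategies in this recipe operate on.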
Load the libraries:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.datasets import make_blobs
Generate the dataset:
X_inliers, _ = make_blobs(n_samples=300, centers=[[0, 0]], cluster_std=0.6, random_state...