The statsmodels library lets us assign an arbitrary weight to each data point in a regression. Outliers are sometimes easy to spot with simple rules of thumb. One such rule is based on the interquartile range (IQR), which is the difference between the third and first quartiles of the data. With the interquartile range, we can define weights for weighted least squares regression.
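As a rough illustration of the interquartile rule of thumb, the following sketch flags points that fall outside fences placed a multiple of the IQR beyond the quartiles and turns that flag into weights. The function name `iqr_weights`, the `factor` of 1.5, and the sample values are assumptions made for this example, not part of the recipe. Note that an inverse-IQR weight is only lower than 1 when the IQR itself is greater than 1.

```python
import numpy as np


def iqr_weights(values, factor=1.5):
    """Hypothetical helper: weight 1 inside the IQR fences, 1/IQR outside."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1

    # With a zero IQR the fences collapse, so fall back to equal weights.
    if iqr == 0:
        return np.ones_like(values)

    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    is_outlier = (values < lower) | (values > upper)
    return np.where(is_outlier, 1.0 / iqr, 1.0)


# Made-up sample with one obvious outlier at the end.
data = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 100.0])
weights = iqr_weights(data)
print(weights)
```

Only the last point lies outside the fences here, so it is the only one that receives the reduced weight.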
We will use the data and model from Fitting a robust linear model, but with arbitrary weights. The points we suspect are outliers will get a lower weight, which is the inverse of the interquartile range just mentioned.
Fit the data with weighted least squares using the following method:
The imports are as follows:
import dautil as dl
import matplotlib.pyplot as plt
import statsmodels.api as sm
import numpy as np
from IPython.display import HTML
Load the data and add an outlier:
temp = dl.data.Weather.load()['TEMP'].dropna()
temp = dl.ts...