Building models with and without outliers
The Anomaly Modeling node can automatically identify and remove outliers. Why not always remove outliers? Even when the data is examined closely, it can be difficult to decide whether any cases should be regarded as outliers and, if so, which ones. And even when the data miner feels confident about the decision, the internal or external client may not agree.
Some types of analysis, such as the calculation of a median, are barely affected by outliers. Many widely used modeling methods, however, can be strongly influenced by them. For example, a single outlier in the data can shift a linear regression model significantly.
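The contrast between a robust statistic and a sensitive model can be illustrated with a minimal sketch. It uses Python with NumPy rather than the Anomaly Modeling node, and all data values and variable names are hypothetical, chosen only for illustration. It compares the median and mean of a small sample with and without one extreme value, then fits an ordinary least-squares line to the same points with and without a single outlier.

```python
import numpy as np

# Hypothetical sample values (illustration only, not taken from any real dataset).
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
values_outlier = np.array([1.0, 2.0, 3.0, 4.0, 500.0])  # last case replaced by an outlier

# The median ignores how extreme the largest value is; the mean does not.
median_clean = np.median(values)
median_outlier = np.median(values_outlier)
mean_clean = values.mean()
mean_outlier = values_outlier.mean()

# Hypothetical regression data: ten points lying exactly on y = 2x + 1.
x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0

# Same data, but the last observation is replaced by a single outlier.
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0

# Ordinary least-squares straight-line fits (degree-1 polynomial).
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, 1)

print("median:", median_clean, "->", median_outlier)
print("mean:  ", mean_clean, "->", mean_outlier)
print("slope: ", round(slope_clean, 3), "->", round(slope_outlier, 3))
```

In this sketch the median is unchanged by the extreme value, while the mean and the fitted slope both move substantially because one case dominates the calculation.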
What are the risks? A model that is affected by an outlier may frequently predict values that are too high or too low, and the level of uncertainty in estimated values will increase. When the predicted values are plotted against actual outcomes, viewers will likely sense that the graph looks or feels wrong, and the model does not fit...