Anomalies are a kind of information that is distinct from, but related to, trends. While trend analysis aims to discover what is normal about a data stream, recognizing anomalies is about finding which events represented in the data stream are clearly abnormal. To recognize anomalies, one must already have an idea of what is normal. Recognizing anomalies also requires choosing a threshold: how far from normal a data point may be before it is labeled anomalous.
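To make the two ingredients concrete (a notion of normal, and a threshold), here is a minimal sketch that treats the mean as "normal" and measures distance from it in standard deviations, which is a z-score. The function name `zscore_anomalies`, the sample `readings`, and the threshold of 2.5 are assumptions chosen for illustration, not values taken from this chapter.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return (index, value) pairs whose |z-score| exceeds the threshold.

    Hypothetical helper for illustration: "normal" is the mean of the
    values, and distance from normal is counted in standard deviations.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # Every value is identical, so nothing can be far from normal.
        return []
    return [
        (i, x)
        for i, x in enumerate(values)
        if abs(x - mu) / sigma > threshold
    ]


# Illustrative data: nine similar readings and one extreme reading.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
anomalies = zscore_anomalies(readings, threshold=2.5)
print(anomalies)
```

The threshold matters in practice. An extreme value also pulls the mean and inflates the standard deviation it is compared against, so in a sample this small the reading of 50 has a z-score just under 3 and would go unflagged at the common cutoff of 3.0.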
We will look at four techniques for recognizing anomalies. First, we will devise two ways to use z-scores to identify data points that are significantly different from the average data point. Then we will look at a variation of principal component analysis, a matrix decomposition technique similar to the singular value decomposition from Chapter 4, A Blueprint for Recommending Products and Services, that separates normal data from anomalous or extreme events and from noise. Finally, we will use a cosine similarity...