Various statistical techniques can be used to identify spam. These generally involve a training phase, in which a corpus of known spam and ham emails is passed through the filter so that it can learn the typical characteristics of each. Future emails can then be classified based on what was learned from past emails. The techniques differ in their choice of tokens and in the algorithms they use to predict whether an email is spam or ham. The tokens used are normally words, but can also include email headers, HTML markup within emails, and other characters such as punctuation marks.
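To make the idea of tokens and a training phase concrete, here is a minimal sketch in Python. It is an illustration only and not how any particular filter is implemented: the `tokenize` and `train` functions, the `html:` prefix, and the token pattern are all hypothetical choices made for this example. The sketch handles body text only; header tokens are left out for brevity.

```python
import re
from collections import Counter

# Hypothetical token pattern: an HTML tag, a word-like run, or a run of ! / ?
TOKEN_RE = re.compile(r"</?[a-zA-Z][^>]*>|[A-Za-z0-9'$-]+|[!?]+")
TAG_NAME_RE = re.compile(r"</?([a-z][a-z0-9]*)")


def tokenize(text):
    """Split raw email text into lowercase tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group(0).lower()
        if tok.startswith("<"):
            # Reduce an HTML tag to its name so <b> and </b> give the same token.
            tok = "html:" + TAG_NAME_RE.match(tok).group(1)
        tokens.append(tok)
    return tokens


def train(messages):
    """Count, for each token, how many spam and ham messages contain it.

    `messages` is an iterable of (label, text) pairs, label being "spam" or "ham".
    """
    counts = {"spam": Counter(), "ham": Counter()}
    for label, text in messages:
        # set() so that a token is counted once per message, not once per occurrence
        counts[label].update(set(tokenize(text)))
    return counts


counts = train([
    ("spam", "<b>Buy now!!!</b>"),
    ("ham", "Meeting at noon"),
])
```

After training on these two messages, `counts["spam"]` records that `buy`, `now`, `!!!` and `html:b` each appeared in one spam message, while `counts["ham"]` holds the words from the ham message. A real filter would be trained on thousands of messages of each kind.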
Statistical filters rely on regular training. They use the knowledge gained in training to estimate the probability that a new email is spam. As spam changes, the filter must adapt in order to continue detecting it.
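The following Python sketch shows one way such a probability estimate could be computed from training counts. It uses a simplified naive Bayes calculation with add-one smoothing, chosen here purely for illustration; it is an assumption of this example and not a description of the algorithm used by SpamAssassin or any other filter. The function names and the whitespace tokenizer are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """Return per-class token counts and per-class message totals."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for label, text in messages:
        totals[label] += 1
        # Count each token once per message (simple whitespace tokenizer).
        counts[label].update(set(text.lower().split()))
    return counts, totals


def spam_probability(text, counts, totals):
    """Estimate P(spam | tokens present in text) with simplified naive Bayes."""
    # Smoothed prior odds of spam versus ham.
    log_odds = math.log((totals["spam"] + 1) / (totals["ham"] + 1))
    for tok in set(text.lower().split()):
        # Add-one smoothing so unseen tokens never give a zero probability.
        p_tok_spam = (counts["spam"][tok] + 1) / (totals["spam"] + 2)
        p_tok_ham = (counts["ham"][tok] + 1) / (totals["ham"] + 2)
        log_odds += math.log(p_tok_spam / p_tok_ham)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))


counts, totals = train([
    ("spam", "cheap pills buy now"),
    ("spam", "buy cheap watches"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
])

p_spammy = spam_probability("buy cheap pills", counts, totals)
p_hammy = spam_probability("monday meeting", counts, totals)
```

With this tiny training set, `p_spammy` comes out well above 0.5 and `p_hammy` well below it. If spammers started using words the filter had only seen in ham, the estimates would drift until new examples were added to the training data, which is why regular training matters.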
SpamAssassin contains a statistical filter based on Bayesian analysis. This is enabled by default and, if trained properly, aids in the correct recognition of...