Using Bayes Rule to Create an AI Model
All right, it's time to leave my music taste behind and think on this Mandrill tweet problem. You're going to treat each tweet as a bag of words, meaning you'll break each tweet up into words (often called tokens) at spaces and punctuation. There are two classes of tweets, called app for the Mandrill.com tweets and other for everything else.
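To make the bag-of-words idea concrete, here's a minimal tokenizer sketch in Python. The regular expression and the example tweet are just illustrations, not anything the model prescribes:

```python
import re

def tokenize(tweet):
    """Break a tweet into lowercase word tokens at spaces and punctuation."""
    return [tok for tok in re.split(r"[^\w']+", tweet.lower()) if tok]

# A made-up tweet about the Mandrill app
print(tokenize("Just switched to Mandrill's API for transactional email. Love it!"))
# -> ['just', 'switched', 'to', "mandrill's", 'api', 'for', 'transactional', 'email', 'love', 'it']
```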
You care about these two probabilities:
- p(app | word1, word2, word3, …)
- p(other | word1, word2, word3, …)
These are the probabilities that a tweet is about the app or about something else, given that you see the words “word1,” “word2,” “word3,” and so on.
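Bayes rule is what turns these into something you can actually compute. Applied to the first probability (and identically to the second), it reads:

- p(app | word1, word2, word3, …) = p(word1, word2, word3, … | app) × p(app) / p(word1, word2, word3, …)

Because the denominator p(word1, word2, word3, …) is the same whichever class you're scoring, you can drop it when all you care about is which class wins the comparison.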
The standard implementation of a naïve Bayes model classifies a new document based on which of these two classes is most likely given the words. In other words, if:
- p(app | word1, word2, word3, …) > p(other | word1, word2, word3, …)
then you have a tweet about the app; otherwise, you call it other.
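To see the whole decision rule in one place, here's a minimal Python sketch of that comparison. It assumes equal priors for the two classes, conditional independence of the words given the class (the "naïve" part), and add-one smoothing so an unseen word doesn't zero everything out; the toy training tweets are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(tweet):
    """Break a tweet into lowercase word tokens at spaces and punctuation."""
    return [t for t in re.split(r"[^\w']+", tweet.lower()) if t]

def train(tweets, labels):
    """Count token frequencies separately for 'app' and 'other' tweets."""
    counts = {"app": Counter(), "other": Counter()}
    for tweet, label in zip(tweets, labels):
        counts[label].update(tokenize(tweet))
    return counts

def classify(tweet, counts, smoothing=1.0):
    """Return whichever class has the higher p(class | words).

    Equal priors are assumed, so comparing p(words | class) is enough,
    and the comparison is done in log space to avoid underflow.
    """
    vocab = {word for word_counts in counts.values() for word in word_counts}
    scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        score = 0.0
        for word in tokenize(tweet):
            # add-one smoothing keeps unseen words from zeroing out the product
            score += math.log((word_counts[word] + smoothing) /
                              (total + smoothing * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy example (these training tweets are made up)
counts = train(
    ["Mandrill API is sending my transactional email perfectly",
     "Saw a mandrill at the zoo today"],
    ["app", "other"],
)
print(classify("the mandrill api is sending email", counts))  # -> app
```

Working in log space is just a practical touch: multiplying lots of small probabilities together quickly produces numbers too tiny for floating point, while summing their logs gives the same ordering without the underflow.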