Thresholds and local thresholds
The next option to explore is the use of thresholds. As we have seen, most of our classifiers provide a score for every option for each tweet, and the default setting is to choose the option with the highest score. In Chapter 6, Naive Bayes, we saw that assuming the classifier will assign exactly one label to each tweet puts quite a tight upper bound on how well it can perform. Instead, we can set a threshold and accept every option whose score exceeds it as a label.
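The difference between the two strategies can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by the classifiers in this book: the function names `argmax_label` and `labels_above_threshold`, the score values, and the threshold of 0.5 are all assumptions made up for the example.

```python
def argmax_label(scores):
    """Default behaviour: return only the single highest-scoring label."""
    return [max(scores, key=scores.get)]


def labels_above_threshold(scores, threshold):
    """Accept every label whose score exceeds the threshold.

    The result may contain several labels, exactly one, or none at all.
    """
    return [label for label, score in scores.items() if score > threshold]


# Hypothetical classifier scores for one tweet (invented for illustration).
scores = {
    "anger": 0.05,
    "anticipation": 0.62,
    "disgust": 0.03,
    "fear": 0.04,
    "joy": 0.71,
    "love": 0.30,
    "optimism": 0.45,
}

print(argmax_label(scores))                 # one label only
print(labels_above_threshold(scores, 0.5))  # every label scoring above 0.5
```

With these made-up scores, the default strategy returns a single label, whereas the threshold of 0.5 lets two labels through. Raising the threshold high enough returns an empty list, so a thresholded classifier can also decline to assign any label.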
Consider the following tweet: “Hi guys ! I now do lessons via Skype ! Contact me for more info . # skype # lesson # basslessons # teacher # free lesson # music # groove # rock # blues.”
The Gold Standard assigns this the scores (‘anger’, 0), (‘anticipation’, 1), (‘disgust’, 0), (‘fear’, 0), (‘joy’, 1), (‘love’, 0), (‘optimism...