Text classification for question tags
This section is about supervised learning. We frame the task of assigning tags to a question as a text classification problem, and we apply this formulation to a dataset of questions from Stack Exchange.
Before introducing the details of text classification, let's consider the following question from the Movies & TV Stack Exchange website (title and body of the question have been merged):
"What's the House MD episode where he hired a woman to fake dead to fool the team? I remember a (supposedly dead) woman waking up and giving a high-five to House. Which episode was this from?"
The preceding question asks for details about a particular episode of the popular TV series House, M.D. As described earlier, questions on Stack Exchange are labeled with tags so that the topic of a question can be identified quickly. The tags assigned by the user to this question are house and identify-this-episode, the first one being a reference to the TV series itself, while...