Understanding text classification
Text classification, also called document classification, is the process of assigning one or more labels (often called classes) to a piece of text (often called a document). A text classifier is a machine learning model that receives some text as input and computes a probability distribution over a set of classes.
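To make the idea of "a probability distribution over a set of classes" concrete, here is a minimal sketch in Python. It is not a trained model: the class names and the keyword weights in `WEIGHTS` are hypothetical values picked by hand for illustration, and the `classify` and `softmax` helpers are assumptions of this example, not part of any library. A real classifier would learn its weights from labeled data.

```python
import math
from collections import Counter

# Hypothetical per-class keyword weights, hand-picked for illustration only.
# A real text classifier would learn such parameters from labeled documents.
WEIGHTS = {
    "sports": {"match": 2.0, "team": 1.5, "goal": 1.5},
    "politics": {"election": 2.0, "vote": 1.5, "senate": 1.5},
    "technology": {"software": 2.0, "chip": 1.5, "model": 1.0},
}


def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def classify(text):
    """Return a probability distribution over the classes in WEIGHTS."""
    # Naive tokenization: lowercase and split on whitespace.
    counts = Counter(text.lower().split())
    # Score each class by the weighted count of its keywords in the text.
    scores = {
        label: sum(w * counts[token] for token, w in weights.items())
        for label, weights in WEIGHTS.items()
    }
    return softmax(scores)


probs = classify("The team scored a late goal to win the match")
predicted = max(probs, key=probs.get)  # the most probable class
for label, p in probs.items():
    print(f"{label}: {p:.3f}")
print("predicted:", predicted)
```

Picking the most probable class, as done here, yields a single label per document. When a document may carry more than one label, a common alternative is to score each class independently and keep every class whose score passes a threshold.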
Text classification is actively used to solve many real-world problems, including the following:
- Topic classification – the process of assigning a topic to a document
- Spam detection – the process of detecting unwanted emails
- Sentiment analysis – the process of classifying the sentiment of a text, for example as positive or negative
- Hate speech detection – identifying hate speech in text
- Language identification – figuring out what language a document is written in
Here's an example of a real-world use of text classification:
In relation to how documents are...