In this recipe, we'll tackle a relatively simple supervised task: given the text of an article, we'll train a model to determine which of a set of topics the article is about. This is a common task in NLP, and we'll try to give an overview of different ways to approach it.
You might also want to compare this with the Battling algorithmic bias recipe in Chapter 2, Advanced Topics in Supervised Machine Learning, which shows how to approach this problem with a bag-of-words representation (CountVectorizer in scikit-learn). In this recipe, we'll instead use word embeddings, both on their own and as input to deep learning models.
In this recipe, we'll be using scikit-learn and TensorFlow (Keras), as in many other recipes in this book. Additionally, we'll use word embeddings that we'll have to download, and we'll use utility functions from the Gensim library to apply them in our machine learning pipeline:
!pip install gensim
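To give a feel for how word embeddings can feed a machine learning pipeline, here's a minimal sketch that turns a tokenized text into a single feature vector by averaging the vectors of its words. This is an illustration, not necessarily the approach taken later in the recipe: the `average_embedding` helper and the three-dimensional `toy_vectors` dictionary are hypothetical stand-ins we made up so that the example runs without any download.

```python
def average_embedding(tokens, vectors, dim):
    """Average the vectors of all known tokens.

    `vectors` can be any mapping that supports `word in vectors` and
    `vectors[word]`. Returns a zero vector if no token is known.
    """
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks through the vectors one dimension at a time
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical toy embeddings; real ones typically have 50 to 300 dimensions
toy_vectors = {
    "goal": [1.0, 0.0, 0.0],
    "match": [0.0, 1.0, 0.0],
    "election": [0.0, 0.0, 1.0],
}

doc = "the match ended with a late goal".split()
doc_vector = average_embedding(doc, toy_vectors, dim=3)
print(doc_vector)  # [0.5, 0.5, 0.0]
```

Words without a vector ("the", "ended", and so on) are simply skipped. Gensim's `KeyedVectors` objects support the same `word in kv` and `kv[word]` lookups, so downloaded embeddings could be passed in place of the toy dictionary; in that case, you'd probably want to average with NumPy rather than plain Python lists.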