
fastText Quick Start Guide

By: Joydeep Bhattacharjee

Overview of this book

Facebook's fastText library handles text representation and text classification, two core tasks in Natural Language Processing (NLP). Most organizations deal with enormous amounts of text data every day, and extracting insights from it efficiently requires powerful NLP tools such as fastText. This book is your ideal introduction to fastText. You will learn how to create fastText models from the command line, without writing complicated code. You will explore the algorithms that fastText is built on and how to use them for word representation and text classification. Next, you will use fastText alongside other popular libraries and frameworks such as Keras, TensorFlow, and PyTorch. Finally, you will deploy fastText models to mobile devices. By the end of this book, you will have the knowledge required to use fastText in your own applications, whether at work or in other projects.
Table of Contents (14 chapters)
First Steps
The FastText Model
Using FastText in Your Own Models

Gensim fastText parameters

Gensim supports the same hyperparameters as the native fastText implementation. You can set them as follows:

  • sentences: This can be a list of lists of tokens. In general, a stream of tokens is recommended, such as LineSentence from the word2vec module, as you have seen earlier. In the Facebook fastText library, the equivalent is the path to the input file, passed with the -input parameter.
  • sg: Either 1 or 0. 1 trains a skip-gram model and 0 trains a CBOW model. In the Facebook fastText library, the equivalent is passing skipgram or cbow as the first argument.
  • size: The dimensionality of the word vectors, so it must be an integer. In line with the original implementation, the default is 100. This corresponds to the -dim argument in the Facebook fastText implementation.
  • window: The window size that is considered...
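To make the correspondence between the parameters above concrete, here is a minimal sketch that turns Gensim-style sentences, sg, and size settings into the equivalent native fastText command line. The helper name gensim_to_fasttext_cli, the output_prefix argument, and the assumption that the fastText binary is on the PATH as fasttext are hypothetical illustrations, not part of Gensim or fastText. The Gensim call shown in the comment uses the pre-4.0 parameter name size; Gensim 4.0 renamed it to vector_size.

```python
# Hypothetical helper (not part of Gensim or fastText): maps the Gensim
# parameters sentences, sg and size onto the native fastText CLI.
#
# The Gensim side would look roughly like this (Gensim 3.x naming;
# Gensim 4.0+ uses vector_size instead of size):
#   from gensim.models.fasttext import FastText
#   from gensim.models.word2vec import LineSentence
#   model = FastText(sentences=LineSentence("data.txt"), sg=1, size=100)


def gensim_to_fasttext_cli(input_path, output_prefix, sg=0, size=100):
    """Return the argument list for an equivalent native fastText run.

    input_path    -- the text file Gensim would stream via LineSentence;
                     fastText reads it directly through -input
    output_prefix -- prefix for the saved model files (-output)
    sg            -- 1 selects skipgram, 0 selects cbow
    size          -- vector dimensionality, passed as -dim
    """
    if sg not in (0, 1):
        raise ValueError("sg must be 0 (CBOW) or 1 (skip-gram)")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
        raise ValueError("size must be a positive integer")

    mode = "skipgram" if sg == 1 else "cbow"
    return [
        "fasttext", mode,  # assumes the fastText binary is on the PATH
        "-input", input_path,
        "-output", output_prefix,
        "-dim", str(size),
    ]


if __name__ == "__main__":
    cmd = gensim_to_fasttext_cli("data.txt", "model", sg=1, size=100)
    print(" ".join(cmd))
    # prints: fasttext skipgram -input data.txt -output model -dim 100
```

The asymmetry in the first parameter is deliberate: native fastText reads the file path itself, whereas Gensim expects an iterable of token lists, which is why LineSentence sits between the file and the model on the Gensim side.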