fastText Quick Start Guide

By: Joydeep Bhattacharjee
Overview of this book

Facebook's fastText library handles text representation and classification, two core tasks in Natural Language Processing (NLP). Most organizations deal with enormous amounts of text data on a daily basis, and extracting insights from that data efficiently requires powerful NLP tools such as fastText. This book is an introduction to fastText. You will learn how to create fastText models from the command line, without the need for complicated code. You will explore the algorithms that fastText is built on and how to use them for word representation and text classification. Next, you will use fastText in conjunction with other popular libraries and frameworks such as Keras, TensorFlow, and PyTorch. Finally, you will deploy fastText models to mobile devices. By the end of this book, you will have the knowledge required to use fastText in your own applications at work or in projects.

fastText model quantization

Thanks to the efforts of the Facebook AI Research team, there is a way to get vastly smaller models (in terms of the space they take up on disk), as you saw in the Model quantization section in Chapter 2, Creating Models Using FastText Command Line. Models that take up hundreds of MB can be quantized down to only a couple of MB. For example, look at the DBpedia model released by Facebook, which is available at https://fasttext.cc/docs/en/supervised-models.html: the regular model (the BIN file) is 427 MB, while the smaller model (the FTZ file) is only 1.7 MB.
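To put those two figures in perspective, the following minimal sketch works out the reduction. The only inputs are the DBpedia sizes quoted above; the helper name `compression_ratio` is made up for this illustration and is not part of fastText.

```python
def compression_ratio(original_mb: float, quantized_mb: float) -> float:
    """Return how many times smaller the quantized model is."""
    if original_mb <= 0 or quantized_mb <= 0:
        raise ValueError("sizes must be positive")
    return original_mb / quantized_mb


# Sizes quoted in the text for the DBpedia model.
BIN_MB = 427.0  # regular model (BIN file)
FTZ_MB = 1.7    # quantized model (FTZ file)

ratio = compression_ratio(BIN_MB, FTZ_MB)
saved_pct = 100.0 * (1.0 - FTZ_MB / BIN_MB)

print(f"{ratio:.0f}x smaller, {saved_pct:.1f}% of the space saved")
# prints: 251x smaller, 99.6% of the space saved
```

In other words, the FTZ file is roughly 250 times smaller than the BIN file, which is the difference between a model that is awkward to ship and one that fits comfortably on a mobile device.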

This reduction in size is achieved by discarding some of the information encoded in the BIN file (the bigger model). The problem that needs to be solved is how to keep the information that is important and how to identify information...
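The general idea of trading a little precision for a lot of space can be seen in a toy example. The sketch below is not fastText's actual quantization scheme; it is a plain 8-bit scalar quantizer, written with hypothetical helper names (`quantize`, `dequantize`) and made-up weights, to show what "throwing out information" looks like in practice: each float is replaced by a one-byte code, and the restored values are close to, but not exactly, the originals.

```python
def quantize(values, levels=256):
    """Map each float to an integer code in [0, levels - 1]."""
    lo, hi = min(values), max(values)
    # Width of one quantization bucket; fall back to 1.0 if all values are equal.
    step = (hi - lo) / (levels - 1) or 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Rebuild approximate floats from the integer codes."""
    return [lo + c * step for c in codes]


# Made-up weights, standing in for a few entries of a model's parameters.
weights = [0.12, -0.53, 0.98, -1.0, 0.0, 0.47]

codes, lo, step = quantize(weights)
restored = dequantize(codes, lo, step)

# The rounding error is the information that was thrown away.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("codes:    ", codes)
print("restored: ", [round(r, 3) for r in restored])
print("max error:", max_error)
```

Each code fits in a single byte instead of the four or eight bytes a float normally needs, and the worst-case error is bounded by half a bucket width (`step / 2`). A real scheme has to be more careful about which information it discards, which is exactly the problem described above.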