In this chapter, the word-augmenting techniques are similar to the methods from Chapter 5, which used the nlpaug library. The difference is that, rather than simple rule-based transformations, the wrapper functions use powerful ML models to achieve remarkable results. Sometimes the rewritten text reads as if a human wrote it.
In particular, you will learn four new techniques, each with two variants. Let's start with Word2Vec:
- The Word2Vec method uses the neural network NLP Word2Vec algorithm and the GoogleNews-vectors-negative300 pre-trained model. Google trained it on a news corpus of about 100 billion words, representing each word as a 300-dimensional vector. Substitute and insert are the two mode variants.
- The BERT method uses Google's transformer-based BERT pre-trained model. Substitute and insert are the two mode variants.
- The RoBERTa method is a variation of the BERT model, retrained by Facebook AI with a more robust optimization procedure. Substitute and insert are the two mode variants.
- The last word augmenting technique...
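Before using the real models, it helps to see what the two mode variants actually do. The following is a minimal sketch, not any library's implementation: the `SIMILAR` dictionary is an invented stand-in for the nearest-neighbor lookups that a pre-trained model such as GoogleNews-vectors-negative300 would provide, and the `augment` function name is hypothetical.

```python
import random

# Toy stand-in vocabulary. Real augmenters query a pre-trained model for
# similar words; this dictionary is invented purely for illustration.
SIMILAR = {
    "quick": ["fast", "rapid"],
    "brown": ["dark", "tan"],
    "jumps": ["leaps", "hops"],
}

def augment(text, action="substitute", rng=None):
    """Return one augmented copy of `text`.

    action="substitute" replaces a known word with a similar word;
    action="insert" keeps the original word and adds a similar word
    right after it.
    """
    rng = rng or random.Random(0)
    words = text.split()
    # Only words the "model" knows about can be augmented.
    candidates = [i for i, w in enumerate(words) if w in SIMILAR]
    if not candidates:
        return text
    i = rng.choice(candidates)
    new_word = rng.choice(SIMILAR[words[i]])
    if action == "substitute":
        words[i] = new_word            # replace the original word
    else:
        words.insert(i + 1, new_word)  # insert next to the original word
    return " ".join(words)

text = "the quick brown fox jumps"
print(augment(text, "substitute"))  # same length, one word swapped
print(augment(text, "insert"))      # one word longer
```

The substitute mode preserves sentence length while changing wording, whereas the insert mode grows the sentence; the techniques in this chapter differ mainly in how the replacement candidates are chosen, not in this mechanic.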