Pennington et al. at Stanford University developed GloVe (Global Vectors for Word Representation), an extension of the word2vec method for efficiently learning word vectors.
GloVe combines the global statistics of matrix-factorization techniques, such as LSA, with the local, context-based learning used in word2vec. Unlike word2vec, which defines local context with a sliding window, GloVe constructs an explicit word co-occurrence matrix from statistics accumulated across the whole text corpus. As a result, the learning model generally yields better word embeddings.
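To make the idea of a global co-occurrence matrix concrete, here is a minimal sketch (in Python, for illustration only; it is not text2vec's or GloVe's actual implementation) that accumulates weighted co-occurrence counts over an entire toy corpus. GloVe weights each word pair by the inverse of the distance between the two words, and these global counts are what the model then factorizes into word vectors.

```python
from collections import defaultdict

def cooccurrence_counts(corpus, window=2):
    """Accumulate symmetric, distance-weighted word co-occurrence counts
    over a whole corpus -- the global statistics that GloVe factorizes."""
    counts = defaultdict(float)
    for sentence in corpus:
        tokens = sentence.split()
        for i, center in enumerate(tokens):
            # Look back up to `window` positions; symmetry covers the forward side.
            for j in range(max(0, i - window), i):
                weight = 1.0 / (i - j)  # GloVe weights a pair by 1/distance
                counts[(center, tokens[j])] += weight
                counts[(tokens[j], center)] += weight
    return counts

corpus = ["the cat sat on the mat", "the dog sat on the log"]
counts = cooccurrence_counts(corpus, window=2)
print(counts[("sat", "on")])  # "sat" and "on" are adjacent in both sentences
```

Unlike word2vec, which streams windows past the model one at a time, these counts are computed once over the full corpus before any training begins.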
The text2vec library in R includes a GloVe implementation that we can use to train word embeddings on our own corpus. Alternatively, pretrained GloVe word embeddings can be downloaded and reused, just as we reused pretrained word2vec embeddings in the project covered in the previous section.
The following code block...