Understanding word embeddings
Word embeddings are machine-interpretable representations of words, constructed so that words with similar meanings have similar embeddings and words with dissimilar meanings have very different ones.
In the first chapter, where we covered the basics of embeddings, we loosely defined an embedding as a vector representation of a particular character, word, sentence, paragraph, or text document. These vectors are often made up of hundreds or thousands of real numbers, and each position in the vector is referred to as a dimension.
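To make this concrete, here is a minimal sketch in Python (using NumPy; the word and the values are invented for illustration, not taken from any trained model) of what a single word embedding looks like:

import numpy as np

# A toy 8-dimensional embedding for the word "king". Real models produce
# vectors with hundreds or thousands of dimensions; these values are
# made up purely for illustration.
embedding_king = np.array([0.12, -0.43, 0.88, 0.05, -0.77, 0.31, 0.09, -0.25])

# Each position in the array is one dimension of the embedding.
print(embedding_king.shape)  # (8,)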
By now, you may be wondering how we can tell whether two word embeddings are similar to or different from each other. Several metrics, such as cosine similarity, Euclidean distance, and Jaccard distance, quantify this, and cosine similarity is the most commonly used of them.
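As a rough sketch of how such comparisons look in code (the vectors below are invented, and SciPy's distance helpers are just one convenient way to compute them), consider the following:

import numpy as np
from scipy.spatial.distance import cosine, euclidean

# Two toy embeddings with made-up values; similar vectors should score
# high on cosine similarity and low on Euclidean distance.
a = np.array([0.20, 0.80, -0.40, 0.50])
b = np.array([0.25, 0.75, -0.35, 0.45])

# SciPy's `cosine` returns a cosine *distance*, i.e. 1 - cosine similarity.
print("Cosine similarity:", 1 - cosine(a, b))
print("Euclidean distance:", euclidean(a, b))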
Cosine similarity
Given two embedding vectors, A and B, cosine similarity is defined as follows:
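In standard notation, this is the dot product of the two vectors divided by the product of their magnitudes:

\[
\text{cosine\_similarity}(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}
\]

A value close to 1 means the two embeddings point in nearly the same direction, a value near 0 means they are roughly orthogonal, and a value close to -1 means they point in opposite directions.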