Evaluating word embeddings
In the previous section, we covered the design of Flair embeddings that use language models. The process of training these language models isn't much different from any other type of deep learning training. But well-performing language models don't necessarily mean good embeddings that yield excellent results on downstream tasks.
Instead, we typically need to rely on the following two approaches to evaluating word embeddings:
- Intrinsic evaluation aims to test the quality of word embedding representations independently of any natural language processing task. This is done by measuring semantic relationships between words. The simplest type of intrinsic evaluation is word similarity. It uses a similarity metric, such as cosine similarity, to measure how close two word embeddings are and compares the result to the human-perceived semantic similarity. For example, the words begin and start are semantically very similar. If...
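The word similarity check described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular library implements it: the `cosine_similarity` helper and the three-dimensional vectors in `toy_embeddings` are hypothetical and were invented for this example, whereas real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, made up for illustration only.
toy_embeddings = {
    "begin": [0.9, 0.1, 0.3],
    "start": [0.8, 0.2, 0.3],
    "banana": [0.1, 0.9, 0.0],
}

# Similarity of a near-synonym pair versus an unrelated pair.
sim_close = cosine_similarity(toy_embeddings["begin"], toy_embeddings["start"])
sim_far = cosine_similarity(toy_embeddings["begin"], toy_embeddings["banana"])

print(f"begin vs. start:  {sim_close:.3f}")
print(f"begin vs. banana: {sim_far:.3f}")
```

With these made-up vectors, the near-synonym pair scores higher than the unrelated pair. In an actual intrinsic evaluation, such scores would be computed for many word pairs and compared against human similarity ratings for the same pairs.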