It is not enough just to produce text; we also need a way to measure the quality of the produced text. One such way is to measure how surprised or perplexed the RNN was to see the output given the input. That is, if the cross-entropy loss for an input $x_i$ and its corresponding output $y_i$ is $\mathcal{L}(x_i, y_i)$, then the perplexity would be as follows:

$$\mathrm{perplexity}(x_i, y_i) = e^{\mathcal{L}(x_i, y_i)}$$
Using this, we can compute the average perplexity for a training dataset of size $N$ with the following:

$$\mathrm{perplexity}_{\mathrm{avg}} = \frac{1}{N}\sum_{i=1}^{N} e^{\mathcal{L}(x_i, y_i)}$$
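The two formulas above can be sketched in plain Python. This is a minimal illustration, not the book's implementation; it assumes the cross-entropy loss is computed with the natural logarithm, so perplexity is simply the exponential of the loss:

```python
import math

def perplexity(loss: float) -> float:
    # Perplexity is the exponential of the cross-entropy loss
    # (assuming the loss uses the natural logarithm).
    return math.exp(loss)

def average_perplexity(losses) -> float:
    # Average the per-example perplexities over a dataset of size N.
    return sum(math.exp(l) for l in losses) / len(losses)

# A loss of 0 means the model was certain and correct: perplexity 1.
print(perplexity(0.0))  # 1.0

# Guessing uniformly over a vocabulary of size V gives a loss of
# ln(V) and hence a perplexity of V (here V = 1000).
print(perplexity(math.log(1000)))
```

A useful intuition: a perplexity of $k$ means the model is, on average, as uncertain as if it were choosing uniformly among $k$ words at each step, which is why lower perplexity indicates a better language model.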
In Figure 6.12, we show the behavior of the training and validation perplexities over time. We can see that the training perplexity decreases steadily over time, whereas the validation perplexity fluctuates significantly. This is expected because what we are essentially evaluating with the validation perplexity is our RNN's ability to predict unseen text based on what it has learned from the training data. Since language can be quite difficult to model, this is a very difficult task, and these fluctuations are natural.