Summary
In this chapter, we covered Flair's base types, such as the Sentence and Token objects, explained how to initialize and use them, and tried out some of their basic helper methods. This should allow us to handle, transform, and understand data in Flair more easily as we move toward more complex topics. We also covered using custom tokenizers in Flair and implemented our own character-based tokenizer. Finally, we scratched the surface of what Flair's datasets and Corpus objects can do. We learned how to load corpora and datasets, assess their size, extract and read individual sentences, and downsample entire datasets.
We are now familiar enough with the syntax, basic objects, and helper methods to move on to Flair's most powerful NLP technique: sequence tagging. We will cover this in the next chapter.