Loading a corpus reader can be an expensive operation due to the number of files, file sizes, and various initialization tasks. And while you'll often want to specify a corpus reader in a common module, you don't always need to access it right away. To speed up module import time when a corpus reader is defined, NLTK provides a LazyCorpusLoader class that can transform itself into your actual corpus reader as soon as you need it. This way, you can define a corpus reader in a common module without it slowing down module loading.
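To make the idea of a loader that "transforms itself" concrete, here is a minimal sketch of the lazy-loading pattern in plain Python. It is a simplified illustration, not NLTK's implementation: the LazyLoader and ExpensiveReader classes and their attributes are hypothetical names made up for this example.

```python
class LazyLoader:
    """Hypothetical stand-in that becomes the real object on first use."""

    def __init__(self, factory, *args, **kwargs):
        # Store only what is needed to build the real object later.
        self._factory = factory
        self._args = args
        self._kwargs = kwargs

    def _load(self):
        # Build the real object now that it is actually needed.
        real = self._factory(*self._args, **self._kwargs)
        # Take on the real object's state and type, so this instance
        # behaves exactly like it from here on.
        self.__dict__ = real.__dict__
        self.__class__ = real.__class__

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, which is the
        # case for any reader attribute before the real object is loaded.
        self._load()
        return getattr(self, name)


class ExpensiveReader:
    """Hypothetical reader whose construction we want to postpone."""

    def __init__(self, root):
        self.root = root
        self.words = ["nltk", "corpus"]  # pretend this was slow to load

    def count(self):
        return len(self.words)


reader = LazyLoader(ExpensiveReader, "cookbook")
print(type(reader).__name__)  # LazyLoader: nothing has been loaded yet
print(reader.count())         # first real use triggers the load
print(type(reader).__name__)  # ExpensiveReader
```

Creating the LazyLoader is cheap because the reader's constructor does not run until the first attribute access. After that access, the object is indistinguishable from an ordinary ExpensiveReader instance.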
The LazyCorpusLoader class requires two arguments: the name of the corpus and the corpus reader class, plus any other arguments needed to initialize the corpus reader class.
The name argument specifies the root directory name of the corpus, which must be within a corpora subdirectory of one of the paths in nltk.data.path. See the Setting up a custom corpus recipe of this chapter for more details on nltk.data.path.
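The directory rule above can be sketched with a small lookup helper. This is only an illustration of the rule as stated: find_corpus_root is a hypothetical function, the search paths are made up, and NLTK's own lookup is more involved (this sketch ignores details such as zipped corpora).

```python
import os
import tempfile


def find_corpus_root(name, search_paths):
    """Return the first existing <path>/corpora/<name> directory, or None.

    Hypothetical helper: it mirrors the rule that a corpus must live in a
    corpora subdirectory of one of the data paths.
    """
    for base in search_paths:
        candidate = os.path.join(base, "corpora", name)
        if os.path.isdir(candidate):
            return candidate
    return None


# Build a throwaway data directory containing corpora/cookbook.
with tempfile.TemporaryDirectory() as data_dir:
    os.makedirs(os.path.join(data_dir, "corpora", "cookbook"))
    # Stand-in for nltk.data.path; the first entry does not exist.
    search_paths = ["/nonexistent/nltk_data", data_dir]
    print(find_corpus_root("cookbook", search_paths))  # path under data_dir
    print(find_corpus_root("missing", search_paths))   # None
```

Paths are tried in order, so a corpus found in an earlier data path takes precedence over one with the same name in a later path.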
For example, if you...