In this recipe, we will describe a tokenizer that modifies the tokens in the token stream. We will extend the ModifyTokenTokenizerFactory class to return text that is rotated by 13 places in the English alphabet, also known as rot-13. Rot-13 is a very simple substitution cipher that replaces each letter with the letter 13 places after it, wrapping around at the end of the alphabet. For example, the letter a will be replaced by the letter n, and the letter z will be replaced by the letter m. This is a reciprocal cipher: because the alphabet has 26 letters, applying the same cipher twice recovers the original text.
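The rotation itself can be sketched in plain Java, independent of LingPipe. The class name Rot13Sketch and the method rot13 below are hypothetical names chosen for illustration, not part of the recipe's source, and the sketch assumes that only ASCII letters are rotated while every other character passes through unchanged:

```java
// Hypothetical helper class for illustration; not part of LingPipe or the cookbook code.
class Rot13Sketch {

    // Rotate ASCII letters by 13 places, wrapping around the alphabet.
    // Assumption: non-letter characters are left untouched.
    static String rot13(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sb.append((char) ('a' + (c - 'a' + 13) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                sb.append((char) ('A' + (c - 'A' + 13) % 26));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String encoded = rot13("nothing");
        System.out.println(encoded);         // abguvat
        System.out.println(rot13(encoded));  // nothing
    }
}
```

Because 13 is half of 26, the same method serves as both encoder and decoder, which is the reciprocal property described above.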
We will invoke the Rot13TokenizerFactory class from the command line:
java -cp "lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar" com.lingpipe.cookbook.chapter2.Rot13TokenizerFactory

type a sentence below to see the tokens and white spaces:
Move along, nothing to see here.
Token:'zbir'
Token:'nybat'
Token:','
Token:'abguvat'
Token:'gb'
Token:'frr'
Token:'urer'
Token:'.'
Modified Output: zbir...