In this chapter, we discussed many of the issues that make sentence-detection a difficult task, such as problems that result from periods being used for numbers and abbreviations. The use of ellipses and embedded quotes can also be problematic.
Java provides a couple of techniques to detect the end of a sentence. We saw how regular expressions and the BreakIterator
class can be used. These techniques are useful for simple sentences, but they do not work that well for more complicated sentences.
The use of various NLP APIs was also illustrated. Some of these process the text based on rules, while others use models. We also demonstrated how models can be trained and evaluated.
In the next chapter, Chapter 4, Finding People and Things, you will learn how to find people and things using text.