In this recipe, we will see how to detect sentences so that we can use them for further analysis. Sentences are an important unit of text for data scientists experimenting with routine exercises such as classification. To detect sentences in text, we will use Java's BreakIterator class.
Go to https://docs.oracle.com/javase/7/docs/api/java/text/BreakIterator.html and look at the examples. They will give you an idea of how a break iterator is used.
As a test for this recipe's code, we will use two sentences that confuse many regular-expression-based solutions. The two test sentences are: "My name is Rushdi Shams. You can use Dr. before my name as I have a PhD. but I am a bit shy to use it." Interestingly, we will see that Java's BreakIterator class handles them well.
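Before building the method step by step, it may help to see the general shape of sentence iteration with BreakIterator. The following is only a minimal sketch that follows the boundary-walking pattern from the class's Javadoc; the class name SentenceDetectionSketch, the helper detectSentences, and the choice of Locale.US are assumptions made for illustration and are not part of this recipe's code.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical class name, used only for this sketch.
class SentenceDetectionSketch {

    // Hypothetical helper: returns every segment that the sentence
    // BreakIterator reports, in order, as substrings of the source.
    static List<String> detectSentences(String source) {
        // Locale.US is an assumption; choose the locale that matches your text.
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(source);

        List<String> sentences = new ArrayList<>();
        int start = iterator.first();
        // Walk from boundary to boundary until the iterator reports DONE.
        for (int end = iterator.next();
                end != BreakIterator.DONE;
                start = end, end = iterator.next()) {
            sentences.add(source.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "My name is Rushdi Shams. You can use Dr. before my name "
                + "as I have a PhD. but I am a bit shy to use it.";
        // Print one detected segment per line; trim() drops trailing spaces.
        for (String sentence : detectSentences(text)) {
            System.out.println(sentence.trim());
        }
    }
}
```

Each segment keeps any trailing whitespace that belongs to it, so joining the segments reproduces the original string. That is why the sketch trims only when printing.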
Create a method that takes the test string as an argument.
public void useSentenceIterator(String source){
Create a sentenceiterator...