We will demonstrate POS tagging using OpenNLP, the Stanford API, and LingPipe. Each of the examples will use the following sentence. It is the first sentence of Chapter 5, At A Venture, of Twenty Thousand Leagues Under the Sea, by Jules Verne:
private String[] sentence = {"The", "voyage", "of", "the", "Abraham", "Lincoln", "was", "for", "a", "long", "time", "marked", "by", "no", "special", "incident."};
The text to be processed may not always be defined in this fashion. Sometimes, the sentence will be available as a single string:
String theSentence = "The voyage of the Abraham Lincoln was for a " + "long time marked by no special incident.";
We might need to convert such a string to an array of words, and there are numerous techniques for doing so. The following tokenizeSentence method performs this operation by splitting the string on whitespace:
public String[] tokenizeSentence(String sentence) {
    // "\\s+" is the regular expression for one or more whitespace characters
    String[] words = sentence.split("\\s+");
    return words;
...
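To see the whitespace-splitting approach in isolation, the following is a minimal, self-contained sketch. The class name TokenizeSketch, the static modifier, and the call to trim are our own additions for illustration and are not part of the examples that follow. The trim call is there because split would otherwise return an empty first element when the string starts with whitespace.

```java
import java.util.Arrays;

public class TokenizeSketch {

    // Hypothetical standalone variant of tokenizeSentence.
    // Splits on runs of whitespace; trim() removes leading and trailing
    // whitespace so that no empty first token is produced.
    public static String[] tokenizeSentence(String sentence) {
        return sentence.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String theSentence = "The voyage of the Abraham Lincoln was for a "
                + "long time marked by no special incident.";
        String[] words = tokenizeSentence(theSentence);

        // The sentence contains 16 whitespace-separated tokens
        System.out.println(words.length); // prints 16
        System.out.println(Arrays.toString(words));
    }
}
```

Note that this simple technique leaves punctuation attached to the neighboring word, so the last token is "incident." with the period included, which matches the array defined earlier.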