Index
A
- abbreviations / Finding parts of text, What makes SBD difficult?
- acronyms / Finding parts of text
- Aerosolve
- reference / Deep learning for Java
- AI chatbot / Chatbot architecture
- American National Corpus
- reference / The tagging process
- annotators / Extracting relationships
- answering queries / Why use NLP?
- Apache Lucene Core
- about / Apache Lucene Core
- references / Apache Lucene Core
- Apache OpenNLP
- about / Apache OpenNLP
- references / Apache OpenNLP
- Apache PDFBox for PDF
- reference / Preparing data
- Apache POI for Word
- reference / Preparing data
- Apache POI project
- reference / Using POI to extract text from Word documents
- Apache Tika
- download link / Using Apache Tika for content analysis and extraction
- using, for content analysis / Using Apache Tika for content analysis and extraction
- using, for text extraction / Using Apache Tika for content analysis and extraction
- approaches, for POS identification (tagging)
- rule-based taggers / The tagging process
- stochastic taggers / The tagging process
- Artificial Intelligence (AI) / What is NLP?
- Artificial Intelligence Markup Language (AIML)
- about / Chatbot architecture, Understanding AIML
- chatbots, developing / Developing a chatbot using ALICE and AIML
- Artificial Linguistic Internet Computer Entity (ALICE)
- about / Artificial Linguistic Internet Computer Entity
- chatbots, developing / Developing a chatbot using ALICE and AIML
B
- boilerpipe
- used, for extracting text from HTML / Using boilerpipe to extract text from HTML
- boilerpipe for HTML
- reference / Preparing data
- Boolean retrieval / Boolean retrieval
- brat
- BreakIterator class
- using / Using the BreakIterator class
- British National Corpus
- reference / The tagging process
- Brown Corpus
- reference / The tagging process
C
- case / What is tokenization?
- chatbots
- architecture / Chatbot architecture
- simple chatbot / Chatbot architecture
- conversational chatbot / Chatbot architecture
- AI chatbot / Chatbot architecture
- aspect / Chatbot architecture
- developing, ALICE used / Developing a chatbot using ALICE and AIML
- developing, AIML used / Developing a chatbot using ALICE and AIML
- chunking / Techniques for name recognition
- classification
- about / Classifying text and documents
- need for / How classification is used
- ColumnDataClassifier class, using for / Using the ColumnDataClassifier class for classification
- Classified class
- used, for training text / Training text using the Classified class
- classifiers / Classifying text and documents
- clustering / Classifying text and documents
- collection frequency (cf) / Inverse document frequency
- ColumnDataClassifier class
- using, for classification / Using the ColumnDataClassifier class for classification
- conditional random field (CRF) / Using the Stanford API for NER
- continuous bag of words (CBOW) / Word embedding
- contractions / Finding parts of text
- conversational chatbot / Chatbot architecture
- coreference resolution / Why is NLP so hard?
- coreference resolution entities
- finding / Finding coreference resolution entities
- corpus / Building and training the model
D
- data
- preparing / Preparing data, Preparing data
- dataset
- building, with NER annotation tool / Building a new dataset with the NER annotation tool
- DBpedia
- reference / Using extracted relationships
- deep learning
- for Java / Deep learning for Java
- tools / Deep learning for Java
- Deeplearning4J
- reference / Deep learning for Java
- delimiters / Finding parts of text
- dictionaries / Dictionaries and tolerant retrieval
- dimensionality reduction / Dimensionality reduction
- distributed stochastic neighbor embedding / Distributed stochastic neighbor embedding
- DocumentCategorizerME
- text, classifying / Using DocumentCategorizerME to classify text
- document frequency (df) / Inverse document frequency
- DocumentPreprocessor class
E
- en-pos-maxent.bin model
- encoding scheme / Why is NLP so hard?
- endophora / Finding coreference resolution entities
- EnglishStopTokenizerFactory class
- reference / Using LingPipe to remove stopwords
- entities
- finding, Java's regular expressions used / Using Java's regular expressions to find entities
- Environment for Developing KDD-Applications Supported by Index Structures (ELKI)
- reference / Deep learning for Java
- ExactDictionaryChunker class
- extracted relationships
- using / Using extracted relationships
F
- feature-engineering / Feature-engineering
- Freebase
- reference / Relationship types
G
- General Architecture for Text Engineering (GATE) / Survey of NLP tools, Using the MaxentTagger class to tag textese
- Global Vectors for Word Representation (GloVe)
- GrammarScope
- reference / Understanding parse trees
- GrammaticalStructure class
- word dependencies, finding / Finding word dependencies using the GrammaticalStructure class
H
- hash tables / Dictionaries and tolerant retrieval
- Hidden Markov Models (HMM) / The tagging process
- HmmDecoder class
- using, with Best_First tags / Using the HmmDecoder class with Best_First tags
- using, with NBest tags / Using the HmmDecoder class with NBest tags
- tag confidence, determining / Determining tag confidence with the HmmDecoder class
I
- IndoEuropeanSentenceModel class
- information extraction / Using extracted relationships
- information grouping / Why use NLP?
- information retrieval systems
- evaluation / Evaluation of information retrieval systems
- inverse document frequency / Inverse document frequency
- inverse document frequency (IDF) / TF-IDF weighting
- inverted index / Finding people and things
J
- Java's regular expressions
- used, for finding entities / Using Java's regular expressions to find entities
- Java core tokenization
- performance considerations / Performance considerations with Java core tokenization
- reference / Performance considerations with Java core tokenization
- Java patterns
- reference / Specifying the delimiter
- Java tokenizers
- simple Java tokenizers / Simple Java tokenizers
- about / Simple Java tokenizers
- Scanner class, using / Using the Scanner class
- split method, using / Using the split method
- BreakIterator class, using / Using the BreakIterator class
- StreamTokenizer class, using / Using the StreamTokenizer class
- StringTokenizer class, using / Using the StringTokenizer class
L
- language / What is tokenization?
- language identification
- with LingPipe / Language identification using LingPipe
- Latent Dirichlet Allocation (LDA)
- basics / The basics of LDA
- reference / The basics of LDA
- Leipzig Corpora Collection
- reference / Language identification using LingPipe
- lemma / Why is NLP so hard?
- lemmatization
- about / Why is NLP so hard?, What is tokenization?
- using / Using lemmatization
- StanfordLemmatizer class, using / Using the StanfordLemmatizer class
- using, in OpenNLP / Using lemmatization in OpenNLP
- LexicalizedParser class
- LingPipe
- about / LingPipe
- references / LingPipe
- used, for removing stopwords / Using LingPipe to remove stopwords
- stemming with / Stemming with LingPipe
- HeuristicSentenceModel class, SBD rules / Understanding the SBD rules of LingPipe's HeuristicSentenceModel class
- using / Using LingPipe
- IndoEuropeanSentenceModel class / Using the IndoEuropeanSentenceModel class
- SentenceChunker class / Using the SentenceChunker class
- MedlineSentenceModel class / Using the MedlineSentenceModel class
- RegExChunker class, using of / Using the RegExChunker class of LingPipe
- used, for classifying text / Using LingPipe to classify text
- text, classifying / Classifying text using LingPipe
- sentiment analysis / Sentiment analysis using LingPipe
- language identification / Language identification using LingPipe
- LingPipe for NER
- using / Using LingPipe for NER
- named entity models / Using LingPipe's named entity models
- ExactDictionaryChunker class / Using the ExactDictionaryChunker class
- LingPipe POS taggers
- using / Using LingPipe POS taggers
- HmmDecoder class / Using the HmmDecoder class with Best_First tags
- LingPipe tokenizers
- using / Using LingPipe tokenizers
M
- machine translation / Why use NLP?
- MALLET
- about / Topic modeling with MALLET
- download link / Topic modeling with MALLET
- Massive Online Analysis (MOA)
- reference / Deep learning for Java
- MaxentTagger class
- using / Using Stanford MaxentTagger
- used, for tagging textese / Using the MaxentTagger class to tag textese
- MedlineSentenceModel class
- model
- training / Training a model
- evaluating / Evaluating a model
- morpheme / Why is NLP so hard?, Finding parts of text
- morphology / Finding parts of text
- MPQA Subjectivity Cues Lexicon
- reference / Understanding sentiment analysis
- multiple cores
- using, with Stanford pipeline / Using multiple cores with the Stanford pipeline
- Multipurpose Internet Mail Extensions (MIME) / Preparing data
N
- n-grams / N-grams
- Named Entity Recognition (NER)
- about / Why use NLP?, Deep learning for Java, Relationship types
- challenges / Why is NER difficult?
- techniques / Techniques for name recognition
- Natural Language Generation (NLG) / Why use NLP?
- Natural Language Processing (NLP)
- about / What is NLP?
- need for / Why use NLP?
- significant problem areas / Why is NLP so hard?
- Natural Language Understanding (NLU) / Chatbot architecture
- NBest tags
- HmmDecoder class, using with / Using the HmmDecoder class with NBest tags
- NER annotation tool
- dataset, building / Building a new dataset with the NER annotation tool
- Neuroph
- reference / Deep learning for Java
- NLP APIs
- using / Using NLP APIs, Using NLP APIs, Using the NLP APIs, Using NLP APIs
- OpenNLP / Using OpenNLP, Using OpenNLP
- Stanford API / Using the Stanford API, Using the Stanford API
- LingPipe / Using LingPipe
- NLP models
- about / Understanding NLP models
- task, identifying / Identifying the task
- selecting / Selecting a model
- building / Building and training the model
- training / Building and training the model
- verifying / Verifying the model
- using / Using the model
- NLP tokenizer APIs
- about / NLP tokenizer APIs
- OpenNLPTokenizer class / Using the OpenNLPTokenizer class
- Stanford tokenizer / Using the Stanford tokenizer
- NLP tools
- survey / Survey of NLP tools
- Apache OpenNLP / Apache OpenNLP
- Stanford NLP / Stanford NLP
- LingPipe / LingPipe
- GATE / GATE
- Unstructured Information Management Architecture (UIMA) / UIMA
- Apache Lucene Core / Apache Lucene Core
- normalization
- about / Understanding normalization
- text, converting to lowercase / Converting to lowercase
- stopwords, removing / Removing stopwords
- stemming, using / Using stemming
- lemmatization / Using lemmatization
- with pipeline / Normalizing using a pipeline
- numbers / Finding parts of text
O
- OpenNLP
- lemmatization, using / Using lemmatization in OpenNLP
- using / Using OpenNLP, Using OpenNLP
- SentenceDetectorME class / Using the SentenceDetectorME class
- sentPosDetect method / Using the sentPosDetect method
- OpenNLP, for NER
- about / Using OpenNLP for NER
- accuracy of entity, determining / Determining the accuracy of the entity
- entity types, using / Using other entity types
- multiple entity types, processing / Processing multiple entity types
- OpenNLP APIs
- used, for classifying text / Using APIs to classify text
- OpenNLP chunking
- using / Using OpenNLP chunking
- OpenNLP classification model
- training / Training an OpenNLP classification model
- OpenNLP POSModel
- training / Training the OpenNLP POSModel
- OpenNLP POSTaggerME class
- using, for POS taggers / Using the OpenNLP POSTaggerME class for POS taggers
- OpenNLP POS taggers
- using / Using OpenNLP POS taggers
- POSTaggerME class / Using the OpenNLP POSTaggerME class for POS taggers
- POSDictionary class / Using the POSDictionary class
- OpenNLPTokenizer class
- using / Using the OpenNLPTokenizer class
- SimpleTokenizer class / Using the SimpleTokenizer class
- WhitespaceTokenizer class / Using the WhitespaceTokenizer class
- TokenizerME class / Using the TokenizerME class
- open source APIs
- references / Survey of NLP tools
- Organization for the Advancement of Structured Information Standards (OASIS) / UIMA
P
- parse tree / Understanding parse trees
- parsing
- dependency / Relationship types
- phrase structure / Relationship types
- part-of-speech (POS) tagging / Why use NLP?, Why is NLP so hard?
- parts of speech
- in English / The tagging process
- parts of text / Understanding the parts of text
- PDFBox
- reference / Using PDFBox to extract text from PDF documents
- used, for extracting text from PDF documents / Using PDFBox to extract text from PDF documents
- Penn Treebank
- reference / The tagging process
- Penn Treebank 3 (PTB) tokenizer
- reference / Using the PTBTokenizer class
- periods / What makes SBD difficult?
- pipeline
- about / Survey of NLP tools, Pipelines
- creating, for text search / Creating a pipeline to search text
- POI
- used, for extracting text from Word documents / Using POI to extract text from Word documents
- Porter Stemmer
- reference / Using the Porter Stemmer
- using / Using the Porter Stemmer
- POSDictionary class
- using / Using the POSDictionary class
- tag dictionary, obtaining for tagger / Obtaining the tag dictionary for a tagger
- word's tags, determining / Determining a word's tags
- word's tags, modifying / Changing a word's tags
- new tag dictionary, adding / Adding a new tag dictionary
- dictionary, creating from file / Creating a dictionary from a file
- POS taggers
- significance / The importance of POS taggers
- OpenNLP POSTaggerME class, using for / Using the OpenNLP POSTaggerME class for POS taggers
- POS tagging
- limitations / What makes POS difficult?
- prefix / Finding parts of text
- principal component analysis (PCA) / Dimensionality reduction, Principal component analysis
- PTBTokenizer class
- using / Using the PTBTokenizer class, Using the PTBTokenizer class
- reference / Using the PTBTokenizer class
- punctuation ambiguity / What makes SBD difficult?
R
- RegExChunker class
- using, of LingPipe / Using the RegExChunker class of LingPipe
- regular expressions
- about / Survey of NLP tools
- using / Using regular expressions
- using, for NER / Using regular expressions for NER
- relationships
- types / Relationship types
- extracting / Extracting relationships
- relationships, extracting for question-answer system
- about / Extracting relationships for a question-answer system
- word dependencies, finding / Finding the word dependencies
- question type, determining / Determining the question type
- answer, searching / Searching for the answer
- Resource Description Framework (RDF)
- reference / Using extracted relationships
- retrieval-based model / Chatbot architecture
- rule-based taggers / The tagging process
S
- SBD process
- about / The SBD process
- difficulty, reasons / What makes SBD difficult?
- SBD rules
- of LingPipe's HeuristicSentenceModel class / Understanding the SBD rules of LingPipe's HeuristicSentenceModel class
- Scanner class
- using / Using the Scanner class
- delimiter, specifying / Specifying the delimiter
- reference / Specifying the delimiter
- scoring / Scoring and term weighting
- searching / Why use NLP?
- semantics / What is NLP?
- sentence-detector model
- training / Training a sentence-detector model
- Trained model, using / Using the Trained model
- sentence boundary disambiguation (SBD) / Finding sentences
- SentenceChunker class
- using / Using the SentenceChunker class
- SentenceDetectorEvaluator class
- model, evaluating / Evaluating the model using the SentenceDetectorEvaluator class
- SentenceDetectorME class
- sentiment analysis
- about / Why use NLP?, How classification is used, Understanding sentiment analysis
- performing, Stanford pipeline used / Using the Stanford pipeline to perform sentiment analysis
- with LingPipe / Sentiment analysis using LingPipe
- sentPosDetect method
- using / Using the sentPosDetect method
- simple chatbot / Chatbot architecture
- simple Java SBDs
- about / Simple Java SBDs
- regular expressions, using / Using regular expressions
- BreakIterator class, using / Using the BreakIterator class
- SimpleTokenizer class
- using / Using the SimpleTokenizer class
- simple words / Finding parts of text
- Soundex / Soundex
- spamming / How classification is used
- speech recognition / Why use NLP?
- spelling correction / Spelling correction
- split method
- using / Using the split method
- Stanford API
- using / Using the Stanford API, Using the Stanford API
- PTBTokenizer class / Using the PTBTokenizer class
- DocumentPreprocessor class / Using the DocumentPreprocessor class
- StanfordCoreNLP class / Using the StanfordCoreNLP class
- using, for classification / Using the Stanford API
- LexicalizedParser class / Using the LexicalizedParser class
- TreePrint class / Using the TreePrint class
- Stanford API, for NER
- using / Using the Stanford API for NER
- StanfordCoreNLP class
- using / Using the StanfordCoreNLP class
- Stanford NLP
- about / Stanford NLP
- references / Stanford NLP
- Stanford pipeline
- used, for performing tagging / Using the Stanford pipeline to perform tagging
- sentiment analysis, performing / Using the Stanford pipeline to perform sentiment analysis
- using / Using the Stanford pipeline
- multiple cores, using with / Using multiple cores with the Stanford pipeline
- Stanford POS taggers
- using / Using Stanford POS taggers
- MaxentTagger / Using Stanford MaxentTagger
- Stanford tokenizer
- using / Using the Stanford tokenizer
- PTBTokenizer class / Using the PTBTokenizer class
- DocumentPreprocessor class / Using the DocumentPreprocessor class
- pipeline, using / Using a pipeline
- LingPipe tokenizers / Using LingPipe tokenizers
- stemmer
- about / Using stemming
- Porter Stemmer / Using the Porter Stemmer
- stemming
- about / Why is NLP so hard?, What is tokenization?
- using / Using stemming
- with LingPipe / Stemming with LingPipe
- stochastic gradient descent (SGD) / Word2vec
- stochastic taggers / The tagging process
- stopwords
- about / What is tokenization?
- reference / What is tokenization?
- removing / Removing stopwords
- removing, LingPipe used / Using LingPipe to remove stopwords
- StopWords class
- creating / Creating a StopWords class
- StreamTokenizer class
- using / Using the StreamTokenizer class
- StringTokenizer class
- using / Using the StringTokenizer class
- suffix / Finding parts of text
- summarization / Why is NLP so hard?, Why use NLP?
- supervised machine learning (SML) / Text-classifying techniques
- support vector machine (SVM) / Text-classifying techniques, Extracting relationships
- synonyms / Finding parts of text
- syntax / What is NLP?
T
- t-distributed Stochastic Neighbor Embedding (t-SNE) / Distributed stochastic neighbor embedding
- tag / The tagging process
- tag cloud
- example / How classification is used
- tag confidence
- determining, with HmmDecoder class / Determining tag confidence with the HmmDecoder class
- tagging
- process / The tagging process
- performing, Stanford pipeline used / Using the Stanford pipeline to perform tagging
- tag set / The tagging process
- techniques, Named Entity Recognition (NER)
- lists / Lists and regular expressions
- regular expressions / Lists and regular expressions, Using regular expressions for NER
- statistical classifiers / Statistical classifiers
- term frequency (TF) / TF-IDF weighting
- term weighting / Scoring and term weighting
- text
- converting, to lowercase / Converting to lowercase
- classifying, OpenNLP APIs used / Using OpenNLP
- classifying, DocumentCategorizerME used / Using DocumentCategorizerME to classify text
- classifying, LingPipe used / Using LingPipe to classify text, Classifying text using LingPipe
- training, Classified class used / Training text using the Classified class
- text-classifying techniques / Text-classifying techniques
- text-expansion / What is tokenization?
- text-processing tasks
- overview / Overview of text-processing tasks
- parts of text, finding / Finding parts of text
- sentences, finding / Finding sentences
- feature-engineering / Feature-engineering
- people, finding / Finding people and things
- things, finding / Finding people and things
- parts of speech, detecting / Detecting parts of speech
- text, classifying / Classifying text and documents
- documents, classifying / Classifying text and documents
- relationships, extracting / Extracting relationships
- combined approaches, using / Using combined approaches
- text analytics / Extracting relationships
- textese
- tagging, MaxentTagger class used / Using the MaxentTagger class to tag textese
- text extraction / Preparing data
- text format / What is tokenization?
- Text REtrieval Conference (TREC) / Evaluation of information retrieval systems
- TF-IDF vectors / Word embedding
- TF-IDF weighting / TF-IDF weighting
- tokenization / Why is NLP so hard?, Finding parts of text, What is tokenization?
- tokenization process
- language / What is tokenization?
- text format / What is tokenization?
- stopwords / What is tokenization?
- text-expansion / What is tokenization?
- case / What is tokenization?
- stemming / What is tokenization?
- lemmatization / What is tokenization?
- TokenizerME class
- using / Using the TokenizerME class
- tokenizers
- uses / Uses of tokenizers
- simple Java tokenizers / Simple Java tokenizers
- training, to find parts of text / Training a tokenizer to find parts of text
- comparing / Comparing tokenizers
- tokens / Why is NLP so hard?, What makes POS difficult?
- tolerant retrieval / Dictionaries and tolerant retrieval
- tools, deep learning
- Deeplearning4J / Deep learning for Java
- Weka / Deep learning for Java
- Massive Online Analysis (MOA) / Deep learning for Java
- Environment for Developing KDD-Applications Supported by Index Structures (ELKI) / Deep learning for Java
- Neuroph / Deep learning for Java
- Aerosolve / Deep learning for Java
- topic modeling / What is topic modeling?
- topic modeling, with MALLET
- about / Topic modeling with MALLET
- training / Training
- evaluation / Evaluation
- training categories / Using other training categories
- Treebank / Finding word dependencies using the GrammaticalStructure class
- TreePrint class
- using / Using the TreePrint class
- trees / Dictionaries and tolerant retrieval
- TwitIE
U
- Unstructured Information Management Architecture (UIMA) / UIMA
V
- vector space model / Vector space model
W
- Weka
- reference / Deep learning for Java
- whitespace / What is tokenization?
- WhitespaceTokenizer class
- wildcard queries / Wildcard queries
- word-sense disambiguation (WSD) / Why is NLP so hard?
- word2vec
- word dependencies
- finding, GrammaticalStructure class used / Finding word dependencies using the GrammaticalStructure class
- word embedding / Word embedding
- WordNet thesaurus
- reference / Relationship types
X
- XMLBeans
- reference / Using POI to extract text from Word documents