Index
A
- active learning
- addInterceptFeature Boolean / Getting ready
- AGPL
- AnnealingSchedule / Getting ready
- annotated data
- parsing / Parsing annotated data
- annotation
- about / Annotation, How to do it...
- working / How it works…
- AutoCompleter class
- URL / See also
- automatic phrase completion
- about / Automatic phrase completion
- working / How to do it..., How it works...
B
- background model
- running, on tweets / How to do it..., How it works...
- BaseClassifier<E> classifier interface / How it works...
- baseline
- with cross validation, establishing / Establishing a baseline with cross validation and metrics
- with metrics, establishing / Establishing a baseline with cross validation and metrics
- batch process life cycle
- about / The batch process life cycle
- entity universe file, setting up / Setting up the entity universe
- processDocument() method / ProcessDocuments() and ProcessDocument()
- processDocuments() method / ProcessDocuments() and ProcessDocument()
- XDoc, computing / Computing XDoc
- promote() method / The promote() method
- createEntitySpeculative() method / The createEntitySpeculative() method
- XDocCoref.addMentionChainToEntity() entity / The XDocCoref.addMentionChainToEntity() entity
- XDocCoref.resolveMentionChain() entity / The XDocCoref.resolveMentionChain() entity
- resolveCandidates() method / The resolveCandidates() method
- Begin, In, and Out (BIO) tags / Translating between word tagging and chunks – BIO codec
- books on similar topics
- about / Getting ready
- background model / Getting ready
- foreground model / Getting ready
- interesting phrases / Getting ready
- Brown Corpus
- URL / How to do it...
C
- .csv file
- classifier, applying to / Applying a classifier to a .csv file, How it works…
- case-restoring spell corrector
- about / The case restoring corrector
- working / How it works...
- ChainCrfFeatureExtractor interface / SimpleCrfFeatureExtractor
- character stream
- CharacterTokenizerFactory
- CharLmRescoringChunker
- URL / See also
- checkTokens() method / How to do it...
- checkTokensAndWhiteSpaces() method / How to do it...
- Chinese word segmentation tutorial
- URL / See also
- classifier
- deserializing / Deserializing and running a classifier, How to do it..., How it works...
- running / Deserializing and running a classifier, How to do it..., How it works...
- confidence estimates, obtaining from / Getting confidence estimates from a classifier, Getting ready, How to do it…, How it works…
- applying, to .csv file / Applying a classifier to a .csv file, How it works…
- evaluating / Evaluation of classifiers – the confusion matrix, How to do it..., How it works...
- training, with cross validation / How to train and evaluate with cross validation, How it works…, There's more…
- evaluating, with cross validation / How to train and evaluate with cross validation, How it works…, There's more…
- about / A simple classifier
- working / How it works...
- thresholding / Thresholding classifiers, How to do it..., How it works…
- classifier-building life cycle
- about / Classifier-building life cycle, How to do it…
- training data, testing on / Sanity check – test on training data
- baseline with cross validation, establishing / Establishing a baseline with cross validation and metrics
- baseline with metrics, establishing / Establishing a baseline with cross validation and metrics
- single metric, selecting / Picking a single metric to optimize against
- evaluation metric, implementing / Implementing the evaluation metric
- classifier.bestCategory() method / Thresholding classifiers
- ClearTK
- about / Projects similar to LingPipe
- cluster
- URL / Getting ready
- clusterer
- clustering
- about / Single-link and complete-link clustering using edit distance
- URL, for tutorial / Getting ready
- CogNIAC
- about / See also
- competitors, LingPipe
- NLTK / Projects similar to LingPipe
- OpenNLP / Projects similar to LingPipe
- JavaNLP / Projects similar to LingPipe
- ClearTK / Projects similar to LingPipe
- DkPro / Projects similar to LingPipe
- GATE / Projects similar to LingPipe
- Learning Based Java (LBJ) / Projects similar to LingPipe
- Mallet / Projects similar to LingPipe
- Vowpal Wabbit / Projects similar to LingPipe
- SVM / Projects similar to LingPipe
- CompiledSpellChecker / There's more...
- complete-link clustering
- with edit distance / Single-link and complete-link clustering using edit distance, How to do it…
- CompiledSpellChecker class / How it works...
- Computational Natural Language Learning (CoNLL) / Getting ready
- confidence-based tagging
- about / Confidence-based tagging, How it works…
- confidence estimates
- obtaining, from classifier / Getting confidence estimates from a classifier, Getting ready, How to do it…, How it works…
- confusion matrix
- about / How to do it...
- coreference
- about / Introduction
- pronouns, adding to / Adding pronouns to coreference, How to do it…, How it works…
- CRFs
- about / Conditional random fields (CRF) for word/token tagging
- used, for word/token tagging / Conditional random fields (CRF) for word/token tagging, How to do it..., How it works…
- URL / Conditional random fields (CRF) for word/token tagging, CRFs for chunking
- modifying / Modifying CRFs, How it works…
- candidate-edge features / Candidate-edge features
- node features / Node features
- used, for named entity recognition / NER using CRFs with better features, How it works…
- CRFs, for chunking / CRFs for chunking, Getting ready, How to do it..., How it works…
- cross-document coreference (XDoc)
- overview / Cross-document coreference, How to do it..., How it works…
- cross validation
- classifier, training with / How to train and evaluate with cross validation, How it works…, There's more…
- classifier, evaluating with / How to train and evaluate with cross validation, How it works…, There's more…
- about / How to train and evaluate with cross validation
D
- Damerau-Levenshtein distance
- data
- obtaining, from Twitter API / Getting data from the Twitter API, How to do it..., How it works...
- database (DB) / Cross-document coreference
- dictionary-based chunking, NER / Dictionary-based chunking for NER, How it works…
- distance
- URL / See also
- distance metrics
- DkPro
- about / Projects similar to LingPipe
E
- early stopping / Annealing schedule and epochs
- edit operations
- Deletion / Distance and proximity – simple edit distance
- Insertion / Distance and proximity – simple edit distance
- Substitution / Distance and proximity – simple edit distance
- Transposition / Distance and proximity – simple edit distance
- edit distance
- using, for spelling correction / Using edit distance and language models for spelling correction, How it works...
- used, for complete-link clustering / Single-link and complete-link clustering using edit distance, How to do it…
- used, for single-link clustering / Single-link and complete-link clustering using edit distance, How to do it…
- EditDistance class
- about / How it works...
- embedded chunks
- marking, in string / Marking embedded chunks in a string – sentence chunk example, How to do it...
- entity-based summarization
- error categories
- evaluation metric
- implementing / Implementing the evaluation metric
- evaluations
- URL, for tutorial / See also…
- eXtensible Business Reporting Language (XBRL) / Topic detection
- Externalizable / How it works...
F
- feature extraction
- tuning / Tuning feature extraction
- customizing / Customizing feature extraction, How to do it…, There's more…
- feature extractors
- about / Feature extractors, How to do it..., How it works…
- combining / Combining feature extractors, There's more…
- filtered tokenizers, Javadoc page
- URL / See also
- foreground model
- about / Foreground- or background-driven interesting phrase detection
- running, on tweets / How to do it..., How it works...
G
- GATE
- about / Projects similar to LingPipe
- gold standard data / Getting ready
H
- handle() method / There's more..., How it works...
- hidden Markov models (HMM) / How to serialize a LingPipe object – classifier example
- Hidden Markov Models (HMM)
- hierarchical clusterer
- HMM-based NER
- overview / HMM-based NER, Getting ready, How to do it…, How it works…
- HmmChunker / HMM-based NER
- URL / See also
I
- incrementToken() method / How it works...
- IndoEuropeanTokenizerFactory
- Infrequently Asked Questions (IAQs)
- about / Question answering
- installation, LingPipe / LingPipe and its installation
- interesting phrases
- detecting, from small dataset / Interesting phrase detection, How to do it..., How it works...
- interfaces
- BaseClassifier<E> / Getting confidence estimates from a classifier
- RankedClassifier<E> extends BaseClassifier<E> / Getting confidence estimates from a classifier
- ScoredClassifier<E> extends RankedClassifier<E> / Getting confidence estimates from a classifier
- ConditionalClassifier<E> extends RankedClassifier<E> / Getting confidence estimates from a classifier
- JointClassifier<E> extends ConditionalClassifier<E> / Getting confidence estimates from a classifier
- inverse document frequency (IDF) / The Tf-Idf distance
- issues, as classification problem
- about / Common problems as a classification problem
- topic detection / Topic detection
- question answering / Question answering
- degree of sentiment / Degree of sentiment
- non-exclusive category classification / Non-exclusive category classification
- person/company/location detection / Person/company/location detection
J
- Jaccard distance
- near duplicates, eliminating with / Eliminate near duplicates with the Jaccard distance, How it works…
- about / The Jaccard distance
- JaccardDistance example
- running / How to do it..., How it works...
- Japanese classifier
- URL / How to do it...
- Javadoc, for CompiledSpellChecker
- URL / See also
- JavaNLP
- about / Projects similar to LingPipe
- John Smith problem
- overview / The John Smith problem, Getting ready, How to do it...
K
- K-means (++) clustering
- about / There's more…
- K-means clustering
- about / There's more…
L
- language model (LM)
- properties / Getting ready
- URL / See also
- language model classifier
- training / Getting ready, How it works...
- with tokens / Language model classifier with tokens, There's more...
- language models
- using, for spelling correction / Using edit distance and language models for spelling correction, How it works...
- Latent Dirichlet allocation (LDA)
- about / Latent Dirichlet allocation (LDA) for multitopic clustering
- for multitopic clustering / Latent Dirichlet allocation (LDA) for multitopic clustering, How to do it…
- LingPipe
- installing / LingPipe and its installation
- about / Projects similar to LingPipe
- advantages / So, why use LingPipe?
- book code, downloading / Downloading the book code and data
- data, downloading / Downloading the book code and data
- downloading / Downloading LingPipe
- URL, for downloading / Downloading LingPipe
- Lucene tokenizers, using with / Using Lucene/Solr tokenizers with LingPipe, How to do it..., How it works...
- Solr tokenizers, using with / Using Lucene/Solr tokenizers with LingPipe, How to do it..., How it works...
- LingPipe 1.0
- about / LingPipe and its installation
- LingPipe Javadoc, EditDistance
- URL / See also
- LingPipe object
- linguistic tuning
- about / Linguistic tuning, How to do it…
- LMClassifier / Getting ready
- logistic regression
- about / Logistic regression, Getting ready, How to do it...
- URL / Logistic regression
- working / How logistic regression works, How it works…
- parameters, tuning in / Tuning parameters in logistic regression, How to do it...
- feature extraction, tuning / Tuning feature extraction
- priors / Priors
- annealing schedule and epochs / Annealing schedule and epochs
- lowercase tokenizer / Combining tokenizers – lowercase tokenizer, How to do it..., How it works...
- LowerCaseTokenizerFactory
- about / How it works...
- Lucene
- URL / Getting ready
- Lucene tokenizers
- using / Using Lucene/Solr tokenizers, How it works...
- using, with LingPipe / Using Lucene/Solr tokenizers with LingPipe, How to do it..., How it works...
M
- Mallet
- about / Projects similar to LingPipe
- MarginalTaggerEvaluator
- about / There's more…
- maximum entropy / Logistic regression
- MedlineSentenceModel / There's more...
- minImprovement parameter / Getting ready
- MUC-6 / Introduction to tokenizer factories – finding words in a character stream
- multithreaded cross validation
- about / Multithreaded cross validation
- working / How it works…
- MultivariateDistribution / Getting ready
N
- N-best word tagging
- about / N-best word tagging, How it works...
- Naive Bayes
- about / Naïve Bayes, Getting ready
- features / Naïve Bayes
- URL / See also
- expectation maximization tutorial, URL / See also
- named entity coreference, document / Named entity coreference with a document, How it works…
- named entity recognition
- with CRFs / NER using CRFs with better features, How it works…
- NBestTaggerEvaluator
- about / There's more…
- near duplicates
- eliminating, with Jaccard distance / Eliminate near duplicates with the Jaccard distance, How it works…
- NER
- about / Regular expression-based chunking for NER
- regular expression-based chunking / How to do it…, How it works…
- dictionary-based chunking / Dictionary-based chunking for NER, How it works…
- NER sources
- mixing / Mixing the NER sources, How it works…
- nested sentences
- about / Nested sentences
- nextToken() method / How it works...
- nextWhitespace() method / How it works...
- NLP
- about / Introduction
- NLTK
- about / Projects similar to LingPipe
- noun phrases (NP)
- about / Simple noun phrases and verb phrases
- finding / How it works…
O
- OpenNLP
- about / Projects similar to LingPipe
P
- paragraph detection
- overview / Paragraph detection, How to do it...
- parameters
- tuning, in logistic regression / Tuning parameters in logistic regression, How to do it...
- part-of-speech (POS) / Hidden Markov Models (HMM) – part-of-speech
- precision
- about / Understanding precision and recall
- scenarios / Understanding precision and recall
- priors
- about / Priors
- Project Gutenberg
- URL / Getting ready
- pronouns
- adding, to coreference / Adding pronouns to coreference, How to do it…, How it works…
- proximity
- URL / See also
R
- recall
- about / Understanding precision and recall
- scenarios / Understanding precision and recall
- regular expression, e-mail address match
- URL / See also
- regular expression-based chunking, NER / How to do it…, How it works…
S
- sentence chunk
- sentence detection
- overview / Sentence detection, How it works...
- evaluating / Evaluation of sentence detection, How it works...
- tuning / Tuning sentence detection, There's more...
- sentence detector
- about / How it works...
- sentiment
- classifying / How to classify sentiment – simple version, How to do it…, There's more…
- Serializable / How it works...
- short circuits / Computing XDoc
- SimpleCrfFeatureExtractor
- about / SimpleCrfFeatureExtractor
- simple edit distance
- about / Distance and proximity – simple edit distance
- example / How to do it..., How it works...
- single-link clustering
- with edit distance / Single-link and complete-link clustering using edit distance, How to do it…
- single metric
- selecting / Picking a single metric to optimize against
- smoothing / Getting ready
- Solr tokenizers
- using / Using Lucene/Solr tokenizers, How it works...
- using, with LingPipe / Using Lucene/Solr tokenizers with LingPipe, How to do it..., How it works...
- SortedSet<ScoredObject<String[]>> collocation / How it works...
- spell checking
- running / How to do it..., How it works...
- spelling-correction tutorial
- URL / See also
- spelling correction
- about / Using edit distance and language models for spelling correction
- edit distance, using for / Using edit distance and language models for spelling correction, How it works...
- language models, using for / Using edit distance and language models for spelling correction, How it works...
- statistically improbable phrases (SIP)
- StopTokenizerFactory filter
- about / How it works...
- stop word tokenizers / Combining tokenizers – stop word tokenizers, How it works...
- string
- embedded chunks, marking in / Marking embedded chunks in a string – sentence chunk example, How to do it...
- string comparison
- supervised trainings
- versus unsupervised trainings / Difference between supervised and unsupervised trainings
- SVM
- about / Projects similar to LingPipe
T
- T&T specification, edit distance
- URL / See also
- tag clouds
- tagging evaluation
- URL / Word-tagging evaluation
- term frequency (TF) / The Tf-Idf distance
- Tf-Idf distance
- about / The Tf-Idf distance
- working / How it works...
- time-separated Twitter data
- about / Getting ready
- background model / Getting ready
- foreground model / Getting ready
- interesting phrases / Getting ready
- token-based language model
- issues / There's more...
- tokenization
- about / There's more…
- tokenize() method / Introduction to tokenizer factories – finding words in a character stream
- tokenized language model
- about / How it works...
- URL / How it works...
- tokenizer() method / How it works...
- tokenizer.nextToken() method / How it works...
- tokenizer factories
- TokenizerFactory instance / Introduction to tokenizer factories – finding words in a character stream
- TokenizerFactory interface / Introduction to tokenizer factories – finding words in a character stream
- Tokenizer object / Introduction to tokenizer factories – finding words in a character stream
- tokenizers
- customizing / Combining tokenizers – lowercase tokenizer, How it works...
- combining / Combining tokenizers – stop word tokenizers, How it works...
- evaluating, with unit tests / Evaluating tokenizers with unit tests, How to do it...
- tokens
- language model classifier with / Language model classifier with tokens, There's more...
- possible stops / There's more...
- impossible penultimates / There's more...
- impossible starts / There's more...
- topic-separated Twitter data
- background model / Getting ready
- foreground model / Getting ready
- interesting phrases / Getting ready
- topic pages
- toString() method / There's more...
- train() method / There's more...
- truecasing
- reference link / See also
- tuning
- about / There's more...
- tuning feature extraction
- about / Tuning feature extraction
- Twitter
- URL / Getting ready
- twitter4j
- URL, for documentation / See also
- Twitter API
- data, obtaining from / Getting data from the Twitter API, How to do it..., How it works...
- URL, for documentation / See also
U
- unit tests
- tokenizers, evaluating with / Evaluating tokenizers with unit tests, How to do it...
- unsupervised trainings
- versus supervised trainings / Difference between supervised and unsupervised trainings
V
- verb phrases (VP)
- about / Simple noun phrases and verb phrases
- finding / How it works…
- Vowpal Wabbit
- about / Projects similar to LingPipe
W
- weighted edit distance
- about / Weighted edit distance
- working / How to do it..., How it works...
- WeightedEditDistance class / How it works...
- URL / Weighted edit distance, See also
- word-tagging evaluation
- words
- finding, in character stream / Introduction to tokenizer factories – finding words in a character stream, How to do it..., How it works...
- words, for languages
- finding, without white spaces / Finding words for languages without white spaces, How to do it..., How it works...
- word tagging
- training / Training word tagging, How to do it..., How it works…, There's more…
- word tagging, and chunks
- translating between / Translating between word tagging and chunks – BIO codec, How to do it…, How it works…