Natural Language Processing with Java

Natural Language Processing with Java - Second Edition

By : Richard M. Reese

Buy this Book

Natural Language Processing with Java - Second Edition

By: Richard M. Reese

Buy this Book

Overview of this book

Natural Language Processing (NLP) allows you to take any sentence and identify patterns, special names, company names, and more. The second edition of Natural Language Processing with Java teaches you how to perform language analysis with the help of Java libraries, while constantly gaining insights from the outcomes. You’ll start by understanding how NLP and its various concepts work. Having got to grips with the basics, you’ll explore important tools and libraries in Java for NLP, such as CoreNLP, OpenNLP, Neuroph, and Mallet. You’ll then start performing NLP on different inputs and tasks, such as tokenization, model training, parts-of-speech and parsing trees. You’ll learn about statistical machine translation, summarization, dialog systems, complex searches, supervised and unsupervised NLP, and more. By the end of this book, you’ll have learned more about NLP, neural networks, and various other trained models in Java for enhancing the performance of NLP applications.

Title Page

Dedication

Packt Upsell

Contributors

Preface

Free Chapter

Introduction to NLP

Deep learning for Java

Overview of text-processing tasks

Understanding NLP models

Preparing data

Summary

Finding Parts of Text

Understanding the parts of text

What is tokenization?

Simple Java tokenizers

NLP tokenizer APIs

Understanding normalization

Summary

Finding Sentences

The SBD process

What makes SBD difficult?

Understanding the SBD rules of LingPipe's HeuristicSentenceModel class

Simple Java SBDs

Using NLP APIs

Training a sentence-detector model

Summary

Finding People and Things

Why is NER difficult?

Techniques for name recognition

Using regular expressions for NER

Using NLP APIs

Building a new dataset with the NER annotation tool

Training a model

Summary

Detecting Part of Speech

The tagging process

Using the NLP APIs

Summary

Representing Text with Features

N-grams

Word embedding

GloVe

Word2vec

Dimensionality reduction

Principle component analysis

Distributed stochastic neighbor embedding

Summary

Information Retrieval

Boolean retrieval

Dictionaries and tolerant retrieval

Vector space model

Scoring and term weighting

Inverse document frequency

TF-IDF weighting

Evaluation of information retrieval systems

Summary

Classifying Texts and Documents

How classification is used

Understanding sentiment analysis

Text-classifying techniques

Using APIs to classify text

Summary

Topic Modeling

What is topic modeling?

The basics of LDA

Topic modeling with MALLET

Summary

Using Parsers to Extract Relationships

Relationship types

Understanding parse trees

Using extracted relationships

Extracting relationships

Using NLP APIs

Extracting relationships for a question-answer system

Summary

Combined Pipeline

Preparing data

Using boilerpipe to extract text from HTML

Using POI to extract text from Word documents

Using PDFBox to extract text from PDF documents

Using Apache Tika for content analysis and extraction

Pipelines

Using the Stanford pipeline

Using multiple cores with the Stanford pipeline

Creating a pipeline to search text

Summary

Creating a Chatbot

Chatbot architecture

Artificial Linguistic Internet Computer Entity

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Preparing data

Text extraction is the primary phase for any NLP tasks you want to undertake. If given a blog post, we want to extract the content of the blog and want to find the title of the post, author of the post, date when the post is published, text or content of the post, media-like images, videos in the post, and links to other posts, if any. Text extraction includes the following:

Structuring so as to identify different fields, blocks of contents, and so on
Determining the language of the document
Finding the sentences, paragraphs, phrases, and quotes
Breaking the text in tokens so as to process it further
Normalization and tagging
Lemmatization and stemming so as to reduce the variations and come close to root words

It also helps in topic modeling, which we have covered in Chapter 9, Topic Modeling. Here, we will quickly cover how text extraction can be performed for HTML, Word, and PDF documents. Although there are several APIs that support these tasks, we will use the following:

Boilerpipe...

Natural Language Processing with Java - Second Edition

By : Richard M. Reese

Natural Language Processing with Java - Second Edition

By: Richard M. Reese

Overview of this book

Related Content you might be interested in

Current Title:

Natural Language Processing with Java - Second Edition

Natural Language Processing with Java Cookbook

Java for Data Science

Java Data Science Cookbook

Preparing data