Book Image

Natural Language Processing with Java - Second Edition

By : Richard M. Reese
Book Image

Natural Language Processing with Java - Second Edition

By: Richard M. Reese

Overview of this book

Natural Language Processing (NLP) allows you to take any sentence and identify patterns, special names, company names, and more. The second edition of Natural Language Processing with Java teaches you how to perform language analysis with the help of Java libraries, while constantly gaining insights from the outcomes. You’ll start by understanding how NLP and its various concepts work. Having got to grips with the basics, you’ll explore important tools and libraries in Java for NLP, such as CoreNLP, OpenNLP, Neuroph, and Mallet. You’ll then start performing NLP on different inputs and tasks, such as tokenization, model training, parts-of-speech and parsing trees. You’ll learn about statistical machine translation, summarization, dialog systems, complex searches, supervised and unsupervised NLP, and more. By the end of this book, you’ll have learned more about NLP, neural networks, and various other trained models in Java for enhancing the performance of NLP applications.
Table of Contents (19 chapters)
Title Page
Dedication
Packt Upsell
Contributors
Preface
Index

Chapter 1. Introduction to NLP

Natural Language Processing (NLP) is a broad topic focused on the use of computers to analyze natural languages. It addresses areas such as speech processing, relationship extraction, document categorization, and summation of text. However, these types of analyses are based on a set of fundamental techniques, such as tokenization, sentence detection, classification, and extracting relationships. These basic techniques are the focus of this book. We will start with a detailed discussion of NLP, investigate why it is important, and identify application areas.

There are many tools available that support NLP tasks. We will focus on the Java language and how various Java Application Programmer Interfaces (APIs) support NLP. In this chapter, we will briefly identify the major APIs, including Apache's OpenNLP, Stanford NLP libraries, LingPipe, and GATE.

This is followed by a discussion of the basic NLP techniques illustrated in this book. The nature and use of these techniques is presented and illustrated using one of the NLP APIs. Many of these techniques will use models. Models are similar to a set of rules that are used to perform a task such as tokenizing text. They are typically represented by a class that is instantiated from a file. We'll round off the chapter with a brief discussion on how data can be prepared to support NLP tasks.

NLP is not easy. While some problems can be solved relatively easily, there are many others that require the use of sophisticated techniques. We will strive to provide a foundation for NLP-processing so that you will be able to better understand which techniques are available for and applicable to a given problem.

NLP is a large and complex field. In this book, we will only be able to address a small part of it. We will focus on core NLP tasks that can be implemented using Java. Throughout this book, we will demonstrate a number of NLP techniques using both the Java SE SDK and other libraries, such as OpenNLP and Stanford NLP. To use these libraries, there are specific API JAR files that need to be associated with the project in which they are being used. A discussion of these libraries is found in the Survey of NLP tools section and contains download links to the libraries. The examples in this book were developed using NetBeans 8.0.2. These projects require the API JAR files to be added to the Libraries category of the Projects Properties dialog box.

In this chapter, we will learn about the following topics:

  • What is NLP?
  • Why use NLP?
  • Why is NLP so hard?
  • Survey of NLP tools
  • Deep learning for Java
  • Overview of text-processing tasks
  • Understanding NLP models
  • Preparing data