Natural Language Processing with Java Cookbook

By: Richard M. Reese

Overview of this book

Natural Language Processing (NLP) has become one of the leading technologies for processing very large amounts of unstructured data from disparate information sources. This book includes a wide set of recipes and quick methods that solve challenges in text syntax, semantics, and speech tasks. At the beginning of the book, you'll learn important NLP techniques, such as identifying parts of speech, tagging words, and analyzing word semantics. You will learn how to perform lexical analysis and use machine learning techniques to speed up NLP operations. With independent recipes, you will explore techniques for customizing existing NLP engines and models using Java libraries such as OpenNLP and the Stanford NLP library. You will also learn how to use NLP features from cloud-based sources, including Google and Amazon Web Services (AWS). You will master core tasks, such as stemming, lemmatization, part-of-speech tagging, and named entity recognition. You will also learn about sentiment analysis, semantic text similarity, language identification, machine translation, and text summarization. By the end of this book, you will be ready to work as a professional NLP practitioner, using a problem-solution approach to analyze any sort of text, from individual words and their meanings to whole sentences.

To get the most out of this book

The reader should be proficient in Java in order to understand and use many of the APIs covered in this book. The recipes are presented as Eclipse projects. Familiarity with Eclipse is not an absolute requirement, but it will speed up the learning process. Another IDE can be used, but the instructions assume Eclipse.

Most, but not all, of the recipes use Maven to import the necessary API libraries, so a basic understanding of how to use a POM file is useful. When a library is not available in a Maven repository, we will import its JAR files directly into the project, and instructions for doing this in Eclipse will be provided.

In the last chapter, Chapter 11, Creating a Chatbot, we will be using the AWS Toolkit for Eclipse; AWS offers comparable toolkits that are easy to install in most other popular IDEs. A few chapters use Google Cloud Platform (GCP) and various AWS libraries. The reader will need to establish accounts on these platforms, which can be used free of charge as long as certain usage quotas are not exceeded.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Natural-Language-Processing-with-Java-Cookbook. Any updates to the code will be made in that existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Start by adding the following import statement to your project's class."

A block of code is set as follows:

while (scanner.hasNext()) {
    String token = scanner.next();
    list.add(token);
}
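The fragment above assumes that a java.util.Scanner named scanner and a list named list already exist. As a hedged, self-contained sketch of the same loop (the class name ScannerTokenizer and the method tokenize are hypothetical and are not taken from the book's code bundle), it might be wrapped like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerTokenizer {

    // Hypothetical helper: splits text into whitespace-delimited tokens
    // using the same while loop as the conventions example.
    public static List<String> tokenize(String text) {
        List<String> list = new ArrayList<>();
        // try-with-resources closes the Scanner once the loop finishes
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                list.add(token);
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Let's pause, and then reflect.");
        // Scanner's default delimiter is whitespace, so punctuation stays attached
        System.out.println(tokens); // prints [Let's, pause,, and, then, reflect.]
    }
}
```

Because Scanner splits only on whitespace by default, punctuation remains part of the neighboring token; that is one reason dedicated NLP tokenizers are usually preferred for real text.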

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in bold in the text. Here is an example: "Select the Create Intent button."

Warnings or important notes appear like this.
Tips and tricks appear like this.