Book Image

Natural Language Processing with Java - Second Edition

By : Richard M. Reese
Book Image

Natural Language Processing with Java - Second Edition

By: Richard M. Reese

Overview of this book

Natural Language Processing (NLP) allows you to take any sentence and identify patterns, special names, company names, and more. The second edition of Natural Language Processing with Java teaches you how to perform language analysis with the help of Java libraries, while constantly gaining insights from the outcomes. You’ll start by understanding how NLP and its various concepts work. Having got to grips with the basics, you’ll explore important tools and libraries in Java for NLP, such as CoreNLP, OpenNLP, Neuroph, and Mallet. You’ll then start performing NLP on different inputs and tasks, such as tokenization, model training, parts-of-speech and parsing trees. You’ll learn about statistical machine translation, summarization, dialog systems, complex searches, supervised and unsupervised NLP, and more. By the end of this book, you’ll have learned more about NLP, neural networks, and various other trained models in Java for enhancing the performance of NLP applications.
Table of Contents (19 chapters)
Title Page
Dedication
Packt Upsell
Contributors
Preface
Index

Preface

Natural Language Processing (NLP) allows you to take any sentence and identify patterns, special names, company names, and more. The second edition of Natural Language Processing with Java teaches you how to perform language analysis with the help of Java libraries, while constantly gaining insights from the outcomes.

You'll start by understanding how NLP and its various concepts work. Having got to grips with the basics, you'll explore important tools and libraries in Java for NLP, such as CoreNLP, OpenNLP, Neuroph, Mallet, and more. You'll then start performing NLP on different inputs and tasks, such as tokenization, model training, parts of speech, parsing trees, and more. You'll learn about statistical machine translation, summarization, dialog systems, complex searches, supervised and unsupervised NLP, and other things. By the end of this book, you'll have learned more about NLP, neural networks, and various other trained models in Java for enhancing the performance of NLP applications.

Who this book is for

Natural Language Processing with Java is for you if you are a data analyst, data scientist, or machine learning engineer who wants to extract information from a language using Java. Knowledge of Java programming is needed, while a basic understanding of statistics will be useful, but is not mandatory.

What this book covers

Chapter 1, Introduction to NLP, explains the importance and uses of NLP. The NLP techniques used in this chapter are explained with simple examples illustrating their use.

Chapter 2, Finding Parts of Text, focuses primarily on tokenization. This is the first step in more advanced NLP tasks. Both core Java and Java NLP tokenization APIs are illustrated.

Chapter 3, Finding Sentences, proves that sentence boundary disambiguation is an important NLP task. This step is a precursor for many other downstream NLP tasks in which text elements should not be split across sentence boundaries. This includes ensuring that all phrases are in one sentence and supporting Parts-of-Speech analysis.

Chapter 4, Finding People and Things, covers what is commonly referred to as Named Entity Recognition (NER). This task is concerned with identifying people, places, and similar entities in text. This technique is a preliminary step for processing queries and searches.

Chapter 5, Detecting Parts of Speech, shows you how to detect Parts-of -Speech, which are grammatical elements of text, such as nouns and verbs. Identifying these elements is a significant step in determining the meaning of text and detecting relationships within text.

Chapter 6, Representing Text with Features, explains how text is presented using N-grams and outlines role they play in revealing the context.

Chapter 7, Information Retrieval, deals with processing the huge amount of data uncovered in information retrieval and finding the relevant information using various approaches, such as Boolean retrieval, dictionaries, and tolerant retrieval.

Chapter 8, Classifying Texts and Documents, proves that classifying text is useful for tasks such as spam detection and sentiment analysis. The NLP techniques that support this process are investigated and illustrated.

Chapter 9, Topic Modeling, discusses the basics of topic modeling using a document that contains some text.

Chapter 10, Using Parsers to Extract Relationships, demonstrates parse trees. A parse tree is used for many purposes, including information extraction. It holds information regarding the relationships between these elements. An example implementing a simple query is presented to illustrate this process.

Chapter 11, Combined Pipeline, addresses several issues surrounding the use of combinations of techniques that solve NLP problems.

Chapter 12, Creating a ChatBot, looks at different types of chatbot, and we will be developing a simple appointment-booking chatbot too.

To get the most out of this book

Java SDK 8 is used to illustrate the NLP techniques. Various NLP APIs are needed and can be readily downloaded. An IDE is not required but is desirable.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packtpub.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Natural-Language-Processing-with-Java-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/NaturalLanguageProcessingwithJavaSecondEdition_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "To process the text, we will use the theSentence variable as input to Annotator."

A block of code is set as follows:

System.out.println(tagger.tagString("AFAIK she H8 cth!")); 
System.out.println(tagger.tagString( 
    "BTW had a GR8 tym at the party BBIAM."));

Any command-line input or output is written as follows:

mallet-2.0.6$ bin/mallet import-dir --input sample-data/web/en --output tutorial.mallet --keep-sequence --remove-stopwords

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear like this.

Note

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.