Mastering Concurrency Programming with Java 9 - Second Edition

By: Javier Fernández González

Overview of this book

Concurrent programming allows large tasks to be divided into smaller sub-tasks, which are then processed as individual tasks that run in parallel. Java 9 includes a comprehensive API with many ready-to-use components for implementing powerful concurrent applications, and it is flexible enough that you can adapt these components to your needs. The book starts with a full description of the design principles of concurrent applications and explains how to parallelize a sequential algorithm. You will then be introduced to Threads and Runnables, which are an integral part of Java 9's concurrency API. You will see how to use all the components of the Java concurrency API, from the basics to the most advanced techniques, and will implement them in real-world concurrent applications. The book ends with a detailed description of the tools and techniques you can use to test a concurrent Java application, along with a brief look at other concurrency mechanisms on the JVM.

First example - a keyword extraction algorithm

In this section, you are going to use a phaser to implement a keyword extraction algorithm. The main purpose of this kind of algorithm is to extract, from a text document or from each document in a collection, the words that best characterize that document. These terms can be used to summarize the documents, to cluster them, or to improve the information search process.

The most basic algorithm for extracting the keywords of the documents in a collection, which is still commonly used today, is based on the TF-IDF measure, where:

  • Term Frequency (TF) is the number of times that a word appears in a document.
  • Document Frequency (DF) is the number of documents that contain a word. The Inverse Document Frequency (IDF) measures the information that a word provides to distinguish a document from the others. If a word is very common, its IDF will be low, but if the word appears in only a few documents, its IDF will be high.
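
To make these two measures concrete, here is a minimal, single-threaded sketch of how they might be computed for a tiny in-memory collection. It is only an illustration, not the phaser-based implementation this section goes on to build. The class name TfIdfSketch, its method names, and the sample documents are hypothetical, and the weighting TF × log(N / DF), with N the number of documents in the collection, is the commonly used formulation assumed here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, used only to illustrate the TF and DF definitions.
public class TfIdfSketch {

    // TF: number of times each word appears in a single document.
    static Map<String, Integer> termFrequency(List<String> words) {
        Map<String, Integer> tf = new HashMap<>();
        for (String word : words) {
            tf.merge(word, 1, Integer::sum);
        }
        return tf;
    }

    // DF: number of documents that contain each word at least once.
    static Map<String, Integer> documentFrequency(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // A set ensures a word is counted once per document.
            for (String word : new HashSet<>(doc)) {
                df.merge(word, 1, Integer::sum);
            }
        }
        return df;
    }

    // Assumed weighting: TF * log(N / DF), where N is the collection size.
    static double tfIdf(int tf, int df, int totalDocs) {
        return tf * Math.log((double) totalDocs / df);
    }

    public static void main(String[] args) {
        // Sample collection: each document is already split into words.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("java", "phaser", "java"),
                Arrays.asList("java", "thread"),
                Arrays.asList("java", "lock"));

        Map<String, Integer> df = documentFrequency(docs);
        Map<String, Integer> tf = termFrequency(docs.get(0));

        // Print the score of every word of the first document.
        for (Map.Entry<String, Integer> entry : tf.entrySet()) {
            double score = tfIdf(entry.getValue(), df.get(entry.getKey()), docs.size());
            System.out.println(entry.getKey() + " -> " + score);
        }
    }
}
```

In this sample, "java" appears in all three documents, so its score is 0 even though it is the most frequent word in the first document, while "phaser" appears in only one document and scores log(3), roughly 1.099. This matches the intuition above: very common words have a low IDF, and rare words have a high one.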

The TF-IDF of the...