Machine Learning in Java - Second Edition

By: AshishSingh Bhatia, Bostjan Kaluza

Overview of this book

As the amount of data in the world continues to grow at an almost incomprehensible rate, being able to understand and process data is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, spam detection, document search, and trading strategies to speech recognition. This makes machine learning well suited to the present-day era of big data and data science. The main challenge is how to transform data into actionable knowledge, and Machine Learning in Java will provide you with the techniques and tools you need. You will start by learning how to apply machine learning methods to a variety of common tasks, including classification, prediction, forecasting, market basket analysis, and clustering. Moving on, you will discover how to detect anomalies and fraud, and how to perform activity recognition, image recognition, and text analysis. By the end of the book, you will have explored related web resources and technologies that will help you take your learning to the next level. By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data. The code in this book works with JDK 8 and above and has been tested on JDK 11.
Table of Contents (13 chapters)

What this book covers

Chapter 1, Applied Machine Learning Quick Start, introduces the field of machine learning. The basic concepts and techniques that support machine learning are discussed, and the building, validation, and use of models are presented from a conceptual perspective.

Chapter 2, Java Libraries and Platforms for Machine Learning, surveys the Java libraries and platforms available for machine learning, including Apache Mahout, Deeplearning4j, and Mallet, which are used in later chapters. For each one, we look at what it offers and what kinds of problems it can solve.

Chapter 3, Basic Algorithms – Classification, Regression, and Clustering, covers three basic machine learning tasks: classification, regression, and clustering. The key algorithms for each task are introduced and demonstrated with a variety of examples.

Chapter 4, Customer Relationship Prediction with Ensembles, covers predicting customer behavior, such as which customers are likely to leave (churn), from real-world customer data. The problem is tackled with ensemble methods, which combine multiple models to improve predictive performance.

Chapter 5, Affinity Analysis, covers how to discover co-occurrence relationships between items using association rule mining. Market basket analysis is used to understand the purchasing behavior of customers.

Chapter 6, Recommendation Engine with Apache Mahout, explains the basic concepts behind recommendation engines and shows how to build recommenders with Apache Mahout.

Chapter 7, Fraud and Anomaly Detection, covers the detection of anomalous and suspicious patterns in data, with practical applications such as detecting fraud.

Chapter 8, Image Recognition with Deeplearning4j, introduces image recognition with neural networks. It reviews fundamental neural network architectures and shows how to implement deep learning models with the Deeplearning4j library to recognize images.

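Chapter 9, Activity Recognition with Mobile Phone Sensors, tackles the problem of recognizing patterns in sensor data. It shows how data collected by a mobile phone's sensors can be used to build a classification model that recognizes a user's activities.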

Chapter 10, Text Mining with Mallet – Topic Modeling and Spam Detection, explains the basics of text mining with the Mallet library and applies them to two problems: topic modeling, which discovers the topics in a set of documents, and spam detection, which classifies messages as spam or legitimate.

Chapter 11, What is Next?, concludes the book by pointing to related web resources and technologies that will help you take your machine learning skills to the next level.