Mastering Java Machine Learning

By: Uday Kamath, Krishna Choppella

Overview of this book

Java is one of the main languages used by practicing data scientists; much of the Hadoop ecosystem is Java-based, and it is the language in which most production data science systems are written. If you know Java, Mastering Java Machine Learning is your next step on the path to becoming an advanced practitioner in data science. This book introduces you to an array of advanced techniques in machine learning, including classification, clustering, anomaly detection, stream learning, active learning, semi-supervised learning, probabilistic graph modeling, text mining, deep learning, and big data batch and stream machine learning. Accompanying each chapter are illustrative examples and real-world case studies that show how to apply the newly learned techniques using sound methodologies and the best Java-based tools available today. On completing this book, you will have an understanding of the tools and techniques for building powerful machine learning models to solve data science problems in just about any domain.

Assumptions and mathematical notations


Many stream machine learning techniques make some key assumptions, which we state explicitly here:

  • The number of features in the data is fixed.

  • The data has small to medium dimensionality; that is, the number of features is typically in the hundreds.

  • The number of training examples can be very large or effectively unbounded, typically in the millions or billions.

  • The number of class labels in supervised learning, or of clusters, is small and finite, typically less than 10.

  • Normally, there is an upper bound on memory; that is, we cannot fit all the data in memory, so learning algorithms must take this into account, especially lazy learners such as k-nearest neighbors.

  • Normally, there is an upper bound on the time taken to process each event or data instance, typically a few milliseconds.

  • The patterns or distributions in the data may evolve over time, a phenomenon known as concept drift.

  • Learning algorithms must converge to a solution in finite time.
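These assumptions can be made concrete with a minimal sketch of an online learner: a perceptron that sees each example exactly once, updates a fixed-size weight vector in constant time, and then discards the example, so memory stays bounded no matter how long the stream runs. This is a hypothetical illustration written for this discussion, not the API of any particular stream learning library.

```java
import java.util.Random;

/**
 * Minimal sketch of a streaming learner that honors the assumptions above:
 * a fixed number of features, one pass over the data, O(#features) memory,
 * and a constant-time update per incoming event.
 * (Hypothetical illustration; not tied to any specific library.)
 */
public class OnlinePerceptron {
    private final double[] w;    // fixed-size weights (last slot is the bias)
    private final double rate;   // learning rate

    public OnlinePerceptron(int numFeatures, double learningRate) {
        this.w = new double[numFeatures + 1];
        this.rate = learningRate;
    }

    /** Predict the label (+1 or -1) for one incoming example. */
    public int predict(double[] x) {
        double s = w[w.length - 1];                 // bias term
        for (int i = 0; i < x.length; i++) s += w[i] * x[i];
        return s >= 0 ? 1 : -1;
    }

    /** Update from a single (x, y) event, then forget it: constant time and memory. */
    public void update(double[] x, int y) {
        if (predict(x) != y) {                      // mistake-driven update
            for (int i = 0; i < x.length; i++) w[i] += rate * y * x[i];
            w[w.length - 1] += rate * y;
        }
    }

    public static void main(String[] args) {
        // Simulate an unbounded stream of linearly separable events:
        // the label is +1 when x0 + x1 > 1, else -1.
        OnlinePerceptron model = new OnlinePerceptron(2, 0.1);
        Random rnd = new Random(42);
        for (int t = 0; t < 10_000; t++) {
            double[] x = { rnd.nextDouble(), rnd.nextDouble() };
            int y = (x[0] + x[1] > 1.0) ? 1 : -1;
            model.update(x, y);                     // learn, then discard the event
        }
        int correct = 0;
        for (int t = 0; t < 1_000; t++) {
            double[] x = { rnd.nextDouble(), rnd.nextDouble() };
            int y = (x[0] + x[1] > 1.0) ? 1 : -1;
            if (model.predict(x) == y) correct++;
        }
        System.out.println("held-out accuracy: " + correct / 1000.0);
    }
}
```

Note that the memory footprint depends only on the number of features, never on the number of events seen, which is exactly what the bounded-memory and bounded-time assumptions require.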

Let Dt = {(xi, yi) : yi = f(xi)} be the given data available...