Machine Learning in Java - Second Edition

By: AshishSingh Bhatia, Bostjan Kaluza

Overview of this book

As the amount of data in the world continues to grow at an almost incomprehensible rate, being able to understand and process data is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, spam detection, document search, and trading strategies to speech recognition. This makes machine learning well suited to the present-day era of big data and data science. The main challenge is how to transform data into actionable knowledge. Machine Learning in Java will provide you with the techniques and tools you need. You will start by learning how to apply machine learning methods to a variety of common tasks, including classification, prediction, forecasting, market basket analysis, and clustering. The code in this book works with JDK 8 and above and has been tested on JDK 11. Moving on, you will discover how to detect anomalies and fraud, and how to perform activity recognition, image recognition, and text analysis. By the end of the book, you will have explored related web resources and technologies that will help you take your learning to the next level. By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data.
Table of Contents (13 chapters)

Outlier detection using ELKI

ELKI stands for Environment for Developing KDD-Applications Supported by Index-Structures, where KDD stands for Knowledge Discovery in Databases. It is open source software used mainly for data mining, with an emphasis on unsupervised learning. It supports various algorithms for cluster analysis and outlier detection. The following are some of its outlier detection algorithms:

  • Distance-based outlier detection: This takes two parameters, a fraction p and a distance d. An object c is flagged as an outlier if at least a fraction p of all the data objects lie at a distance greater than d from c. There are many algorithms in this group, such as DBOutlierDetection, DBOutlierScore, KNNOutlier, KNNWeightOutlier, ParallelKNNOutlier, ParallelKNNWeightOutlier, ReferenceBasedOutlierDetection, and so on.
  • LOF family methods: These compute density-based local outlier factors, controlled by specific parameters. They include algorithms such as LOF, ParallelLOF...
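
The distance-based rule described in the first bullet can be illustrated with a minimal plain-Java sketch. It does not use the ELKI API. The class name DistanceBasedOutlierSketch, the method findOutliers, the sample data, and the thresholds are all hypothetical and chosen only for illustration. The sketch also assumes Euclidean distance and excludes the object itself when computing the fraction, which is one possible reading of the rule.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of a distance-based outlier test; it does not call ELKI. */
final class DistanceBasedOutlierSketch {

    /** Euclidean distance between two points of equal dimension (assumed metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the indices of objects c for which at least a fraction p
     * of the other objects lie farther than d from c.
     */
    static List<Integer> findOutliers(double[][] data, double p, double d) {
        List<Integer> outliers = new ArrayList<>();
        int n = data.length;
        for (int c = 0; c < n; c++) {
            int far = 0; // number of other objects farther than d from c
            for (int j = 0; j < n; j++) {
                if (j != c && distance(data[c], data[j]) > d) {
                    far++;
                }
            }
            // Assumption: the object itself is excluded from the fraction.
            if (n > 1 && (double) far / (n - 1) >= p) {
                outliers.add(c);
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        // Hypothetical data: four points close together and one far away.
        double[][] data = {
            {1.0, 1.0}, {1.2, 0.9}, {0.8, 1.1}, {1.1, 1.3}, {8.0, 8.0}
        };
        // Flag objects for which at least 90% of the others are more than 3.0 away.
        System.out.println(findOutliers(data, 0.9, 3.0)); // prints [4]
    }
}
```

This brute-force version compares every pair of objects, so it runs in quadratic time. Index structures, which ELKI emphasizes, are one way to avoid scanning all pairs on larger datasets.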