Scala for Machine Learning - Second Edition

Overview of this book

The discovery of information through data clustering and classification is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, engineering design, logistics, manufacturing, and trading strategies to the detection of genetic anomalies. This book is a one-stop guide to the functional capabilities of the Scala programming language, such as dependency injection and implicits, that are critical to the creation of machine learning algorithms. You start by learning data preprocessing and filtering techniques. You then move on to unsupervised learning techniques such as clustering and dimension reduction, followed by probabilistic graphical models such as Naïve Bayes, hidden Markov models, and Monte Carlo inference. The book then covers discriminative algorithms such as linear and logistic regression with regularization, kernelization, support vector machines, neural networks, and deep learning. From there, you'll move on to evolutionary computing, multi-armed bandit algorithms, and reinforcement learning. Finally, the book includes a comprehensive overview of parallel computing in Scala and Akka, followed by a description of Apache Spark and its ML library. With code updated for the latest version of Scala and comprehensive examples, this book will ensure that you have more than just a solid foundation in machine learning with Scala.

Chapter 1. Getting Started

It is critical for any computer scientist to understand the different classes of machine learning algorithms and to be able to select the ones that are relevant to their domain of expertise and dataset. However, applying these algorithms represents only a small fraction of the overall effort needed to extract an accurate, well-performing model from input data. A common data mining workflow consists of the following sequential steps:

  1. Defining the problem to solve.

  2. Loading the data.

  3. Cleaning the data.

  4. Discovering patterns, affinities, clusters, and classes, if needed.

  5. Selecting the model features and the appropriate machine learning algorithm(s).

  6. Refining and validating the model.

  7. Improving the computational performance of the implementation.

As we will emphasize throughout this book, each stage of the process is critical for building a model appropriate for the problem.
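
As a rough illustration of how the loading, cleaning, training, and validation steps of this workflow can be chained, consider the following minimal Scala sketch. It is not taken from the book's source code: the names `WorkflowSketch`, `Obs`, `Model`, `load`, `clean`, `train`, and `validate` are hypothetical, the input is assumed to be comma-separated text with one `x,y` pair per line, and a one-variable least-squares fit stands in for whichever algorithm would actually be selected.

```scala
import scala.util.Try

// Hypothetical sketch of the workflow steps; all names are illustrative assumptions.
object WorkflowSketch {
  // A single observation: one feature x and one expected value y
  final case class Obs(x: Double, y: Double)

  // Stand-in model: a line fitted by ordinary least squares
  final case class Model(slope: Double, intercept: Double) {
    def predict(x: Double): Double = slope * x + intercept
  }

  // Loading the data: split each comma-separated line into trimmed fields
  def load(lines: Seq[String]): Seq[Vector[String]] =
    lines.map(_.split(",").toVector.map(_.trim))

  // Cleaning the data: keep only rows with exactly two numeric fields
  def clean(rows: Seq[Vector[String]]): Seq[Obs] =
    rows.flatMap {
      case Vector(a, b) =>
        (for { x <- Try(a.toDouble); y <- Try(b.toDouble) } yield Obs(x, y)).toOption.toList
      case _ => Nil
    }

  // Training the selected model: closed-form least squares on one feature
  def train(data: Seq[Obs]): Model = {
    require(data.nonEmpty, "Cannot train on an empty dataset")
    val n = data.size.toDouble
    val meanX = data.map(_.x).sum / n
    val meanY = data.map(_.y).sum / n
    val cov = data.map(o => (o.x - meanX) * (o.y - meanY)).sum
    val varX = data.map(o => (o.x - meanX) * (o.x - meanX)).sum
    val slope = if (varX == 0.0) 0.0 else cov / varX
    Model(slope, meanY - slope * meanX)
  }

  // Validating the model: mean squared error on a set of observations
  def validate(model: Model, data: Seq[Obs]): Double = {
    require(data.nonEmpty, "Cannot validate on an empty dataset")
    data.map(o => math.pow(model.predict(o.x) - o.y, 2)).sum / data.size
  }
}
```

A call such as `WorkflowSketch.train(WorkflowSketch.clean(WorkflowSketch.load(lines)))` runs the stages in order. Keeping each stage a separate function with an explicit input and output type makes it easier to test, replace, or optimize one stage without touching the others.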

It is impossible to describe every detail of the key machine learning algorithms and their implementation in a single book. The sheer quantity of information and Scala code would overwhelm even the most dedicated readers. Each chapter therefore focuses on the mathematics and code that are essential to understanding the topic. Developers are encouraged to consult the following resources:

  • Scala coding conventions and standards used in the book, described in the Appendix

  • Scaladoc API documentation

  • Fully documented source code, available online

This first chapter introduces the following elements:

  • Basic concepts of machine learning

  • Taxonomy of machine learning algorithms

  • Language, tools, frameworks, and libraries used throughout the book

  • A typical workflow of model training and prediction

  • A simple concrete application using binomial logistic regression
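
To give a first feel for the last item, here is a minimal sketch of binomial logistic regression trained by batch gradient descent. It is an assumption-laden illustration rather than the implementation developed in the chapter: the object `LogisticSketch`, its functions, and the default learning rate `eta` and iteration count `epochs` are all hypothetical choices made for this example.

```scala
// Hypothetical sketch of binomial logistic regression; names and defaults are assumptions.
object LogisticSketch {
  // Logistic (sigmoid) function, mapping any real value into (0, 1)
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Linear combination w0 + w1*x1 + ... + wn*xn, where weights.head is the intercept
  def margin(weights: Vector[Double], x: Vector[Double]): Double =
    weights.head + weights.tail.zip(x).map { case (w, xi) => w * xi }.sum

  // Batch gradient descent on the mean log-loss.
  // xs: feature vectors, ys: labels in {0, 1}; eta and epochs are illustrative values.
  def train(
      xs: Seq[Vector[Double]],
      ys: Seq[Int],
      eta: Double = 0.1,
      epochs: Int = 2000): Vector[Double] = {
    require(xs.nonEmpty && xs.size == ys.size, "Features and labels must align")
    val dim = xs.head.size
    val zeros = Vector.fill(dim + 1)(0.0)
    (0 until epochs).foldLeft(zeros) { (w, _) =>
      // Accumulate the gradient over all observations; 1.0 is the intercept input
      val grad = xs.zip(ys).foldLeft(zeros) { case (g, (x, y)) =>
        val err = sigmoid(margin(w, x)) - y
        g.zip(1.0 +: x).map { case (gi, xi) => gi + err * xi }
      }
      // Step against the averaged gradient
      w.zip(grad).map { case (wi, gi) => wi - eta * gi / xs.size }
    }
  }

  // Predict class 1 when the estimated probability is at least 0.5
  def classify(weights: Vector[Double], x: Vector[Double]): Int =
    if (sigmoid(margin(weights, x)) >= 0.5) 1 else 0
}
```

Given a handful of one-dimensional observations labeled 0 for small values and 1 for large values, `LogisticSketch.train` returns an intercept and a slope, and `LogisticSketch.classify` applies the resulting decision boundary to new observations.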