Scala for Machine Learning


By : Patrick R. Nicolas




Performance considerations

The three unsupervised learning techniques share the same limitation: a high computational complexity.

K-means

K-means has a computational complexity of O(iKnm), where i is the number of iterations (or recursions), K is the number of clusters, n is the number of observations, and m is the number of features. Here are some remedies for the poor performance of the K-means algorithm:

Reducing the average number of iterations by seeding the centroids with a technique such as initialization by ranking the variance of the initial clusters, as described at the beginning of this chapter
Using a parallel implementation of K-means and leveraging a large-scale framework such as Hadoop or Spark
Reducing the number of outliers and features by filtering out the noise with a smoothing algorithm such as a discrete Fourier transform or a Kalman filter
Decreasing the dimensions of the model by following a two-step process:
1. Execute a first pass with a smaller number of clusters K and...