Scala for Machine Learning

By: R. Nicolas

Overview of this book

Are you curious about AI? All you need is a good understanding of the Scala programming language, a basic knowledge of statistics, a keen interest in Big Data processing, and this book!

Performance considerations

The three unsupervised learning techniques share the same limitation: high computational complexity.

K-means

K-means has a computational complexity of O(iKnm), where i is the number of iterations, K the number of clusters, n the number of observations, and m the number of features. The performance of the algorithm can be improved using the following techniques:

  • Reducing the average number of iterations by seeding the centroids with an initialization algorithm, such as ranking the variance of the initial clusters, as described at the beginning of this chapter.
  • Using a parallel implementation of K-means and leveraging a large-scale framework such as Hadoop or Spark.
  • Reducing the number of outliers and possible features by filtering out the noise with a smoothing algorithm such as a discrete Fourier transform or a Kalman filter.
  • Decreasing the dimensions of the model by following a two-step process: a first pass with a smaller number of...
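
As a rough illustration of the O(iKnm) estimate, the following sketch counts the elementary operations implied by the formula and compares a baseline configuration with two of the optimizations listed above. The `KMeansCost` class, the `speedup` helper, and all the numeric values are hypothetical, chosen only for illustration; they are not part of the book's library, and the count ignores constant factors.

```scala
// Hypothetical cost model (not from the book): number of elementary
// operations implied by the O(iKnm) complexity of K-means.
final case class KMeansCost(
    iterations: Int,    // i
    clusters: Int,      // K
    observations: Long, // n
    features: Int       // m
) {
  // BigInt avoids overflow for large data sets
  def operations: BigInt =
    BigInt(iterations) * BigInt(clusters) * BigInt(observations) * BigInt(features)
}

object KMeansCostExample {
  // Assumed baseline: 50 iterations, 10 clusters, 1M observations, 40 features
  val baseline: KMeansCost =
    KMeansCost(iterations = 50, clusters = 10, observations = 1000000L, features = 40)

  // Assumption: better centroid seeding cuts the iterations from 50 to 20
  val betterSeeding: KMeansCost = baseline.copy(iterations = 20)

  // Assumption: dimension reduction keeps 10 of the 40 features
  val fewerFeatures: KMeansCost = baseline.copy(features = 10)

  // Ratio of operation counts between two configurations
  def speedup(from: KMeansCost, to: KMeansCost): Double =
    (BigDecimal(from.operations) / BigDecimal(to.operations)).toDouble
}
```

Because the cost is a product of the four terms, each technique reduces it in proportion to the factor it targets: under these assumed figures, `speedup(baseline, betterSeeding)` is 2.5 and `speedup(baseline, fewerFeatures)` is 4.0.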