Scala for Machine Learning

By: Patrick R. Nicolas

Source code


The Scala programming language is used to implement and evaluate the machine learning techniques covered in Scala for Machine Learning. However, the source code snippets are reduced to the strict minimum essential to the understanding of machine learning algorithms discussed throughout the book. The formal implementation of these algorithms is available on the website of Packt Publishing (http://www.packtpub.com).

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Context versus view bounds

Most Scala classes discussed in the book are parameterized with the type associated with the discrete/categorical value (Int) or continuous value (Double). Upper type bounds would require that any type used by the client code is a subtype of Int or Double:

class A[T <: Int](param: Param)
class B[T <: Double](param: Param)

Such a design forces the client to inherit from simple types and to deal with covariance and contravariance for container types [1:9].

For this book, view bounds are used instead of upper type bounds because they only require an implicit conversion from the parameterized type to be defined:

class A[T <: AnyVal](param: Param)(implicit f: T => Int)
class C[T <: AnyVal](param: Param)(implicit f: T => Float)
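
As a minimal sketch of this pattern, the hypothetical Stats class below (not taken from the book) accepts any value type for which a conversion to Double is available. The names Stats, StatsDemo, and int2Dbl are assumptions made for illustration only.

```scala
// Hypothetical class (not from the book): the constructor accepts any
// value type T for which a conversion T => Double is available.
class Stats[T <: AnyVal](values: Vector[T])(implicit f: T => Double) {
  // Each element is converted to Double before averaging
  def mean: Double = values.map(f).sum / values.size
}

object StatsDemo {
  // Assumed conversion, declared explicitly so the sketch does not rely
  // on conversions predefined by the standard library.
  implicit val int2Dbl: Int => Double = (n: Int) => n.toDouble

  // The conversion Int => Double is resolved implicitly
  val fromInts = new Stats[Int](Vector(1, 2, 3, 4))

  // The conversion can also be passed explicitly
  val fromDoubles = new Stats[Double](Vector(1.5, 2.5))(identity)
}
```

The client code only has to supply a conversion; it does not need to extend Int or Double.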

Note

View bound deprecation

The notation for the view bound, T <% Double, is being deprecated in Scala 2.11 and higher. The class A[T <% Float] declaration is the short notation for class A[T](implicit f: T => Float).

Presentation

To keep the implementation of algorithms readable, all nonessential code, such as error checking, comments, exceptions, or imports, is omitted. The following code elements are omitted in the code snippets presented in the book:

  • Code documentation:

    // …..
    /* … */
  • Validation of class parameters and method arguments:

    require( Math.abs(x) < EPS, " …")
  • Class qualifiers and scope declaration:

    final protected class SVM { … }
    private[this] val lsError = …
  • Method qualifiers:

    final protected def dot: = …
  • Exceptions:

    try {
       correlate …
    } catch {
       case e: MathException => ….
    }
    Try { .. } match {
      case Success(res) =>
      case Failure(e) => ..
    }
  • Logging and debugging code:

    private val logger = Logger.getLogger("..")
    logger.info( … )
  • Nonessential annotation:

    @inline def main = ….
    @throws(classOf[IllegalStateException])
  • Nonessential methods

The complete list of Scala code elements omitted in the code snippets in this book can be found in the Code snippets format section in Appendix A, Basic Concepts.
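
To make the convention concrete, here is a sketch of what a snippet might look like with some of the omitted elements restored. The DotProduct class and its safeDot method are hypothetical and are not part of the book's library; in the book's style, only the one-line computation inside dot would be shown.

```scala
import scala.util.Try

// Hypothetical class, shown with elements the book's snippets leave out.
final class DotProduct(x: Array[Double]) {
  // Validation of class parameters
  require(x.nonEmpty, "DotProduct: the vector is empty")

  // Annotation and validation of method arguments
  @throws(classOf[IllegalArgumentException])
  def dot(y: Array[Double]): Double = {
    require(x.length == y.length, "DotProduct.dot: vectors differ in length")
    // The only line a book-style snippet would keep
    x.zip(y).map { case (a, b) => a * b }.sum
  }

  // Exception handling: a failed validation is captured as a Failure
  def safeDot(y: Array[Double]): Try[Double] = Try(dot(y))
}
```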

Primitives and implicits

The algorithms presented in this book share the same primitive types, generic operators, and implicit conversions.

Primitive types

For the sake of readability of the code, the following primitive types will be used:

type DblPair = (Double, Double)
type DblArray = Array[Double]
type DblMatrix = Array[DblArray]
type DblVector = Vector[Double]
type XSeries[T] = Vector[T]         // One dimensional vector
type XVSeries[T] = Vector[Array[T]] // multi-dimensional vector

The time series introduced in the Time series in Scala section in Chapter 3, Data Preprocessing, is implemented as XSeries[T] or XVSeries[T] of a parameterized T type.
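
The following sketch shows how these aliases might be used together. The enclosing TypesDemo object, the sample values, and the columnMeans helper are assumptions introduced for illustration; only the type aliases come from the text.

```scala
object TypesDemo {
  // Aliases as listed in the text
  type DblArray = Array[Double]
  type XSeries[T] = Vector[T]
  type XVSeries[T] = Vector[Array[T]]

  // A single-variable time series (illustrative values)
  val prices: XSeries[Double] = Vector(10.0, 10.5, 11.0)

  // A multi-variable time series: one array of features per observation
  val obs: XVSeries[Double] = Vector(Array(1.0, 2.0), Array(3.0, 4.0))

  // Hypothetical helper: mean of each feature across all observations.
  // Assumes a non-empty series whose observations share the same length.
  def columnMeans(xv: XVSeries[Double]): DblArray =
    Array.tabulate(xv.head.length)(j => xv.map(_(j)).sum / xv.size)
}
```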

Note

Make a note of these six types; they are used throughout the book.

Type conversions

Implicit conversion is an important feature of the Scala programming language. It allows developers to specify a type conversion for an entire library in a single place. Here are a few of the implicit type conversions that are used throughout the book:

object Types {
  object ScalaMl {
   implicit def double2Array(x: Double): DblArray =
      Array[Double](x)
   // A pair of doubles is converted into a two-element vector of doubles
   implicit def dblPair2Vector(x: DblPair): DblVector =
      Vector[Double](x._1, x._2)
   ...
  }
}
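
As a sketch of how such a conversion is applied at a call site, the example below passes a Double where a DblArray is expected. The ConversionDemo object and the norm function are hypothetical; double2Array follows the definition given in the text.

```scala
import scala.language.implicitConversions

object ConversionDemo {
  type DblArray = Array[Double]

  // Conversion as defined in the text
  implicit def double2Array(x: Double): DblArray = Array[Double](x)

  // Hypothetical function that expects an array of doubles
  def norm(x: DblArray): Double = math.sqrt(x.map(v => v * v).sum)

  // The compiler rewrites norm(-3.0) as norm(double2Array(-3.0))
  val n: Double = norm(-3.0)
}
```

Because the conversion is declared once, every function taking a DblArray accepts a single Double wherever the conversion is in scope.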

Note

Library-specific conversion

The conversions between the primitive types listed here and the types introduced in a particular library (such as the Apache Commons Math library) are described in the relevant chapters.

Immutability

It is usually a good idea to reduce the number of states of an object. A method invocation transitions an object from one state to another. The larger the number of methods or states, the more cumbersome the testing process becomes.

There is no point in creating a model that is not defined (trained). Therefore, it makes sense to train a model in the constructor of the class that implements it. As a consequence, the only public methods of a machine learning algorithm are as follows:

  • Classification or prediction

  • Validation

  • Retrieval of model parameters (weights, latent variables, hidden states, and so on), if needed
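
The design described above can be sketched as follows. SimpleRegression is a hypothetical one-variable least-squares model, not a class from the book; it is trained entirely in its constructor and exposes only prediction and retrieval of its parameters (validation is left out for brevity).

```scala
// Hypothetical model: y = slope * x + intercept, fitted by least squares.
final class SimpleRegression(xy: Vector[(Double, Double)]) {
  require(xy.size > 1, "SimpleRegression: at least two observations needed")

  // Training happens once, at construction; the object has no mutable state.
  // Assumes the x values are not all identical (non-zero variance).
  private val (slope, intercept) = {
    val meanX = xy.map(_._1).sum / xy.size
    val meanY = xy.map(_._2).sum / xy.size
    val cov = xy.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = xy.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    val s = cov / varX
    (s, meanY - s * meanX)
  }

  // Prediction
  def predict(x: Double): Double = slope * x + intercept

  // Retrieval of model parameters
  def weights: (Double, Double) = (slope, intercept)
}
```

An instance cannot exist in an untrained state, so there is no need to test calls to predict made before training.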

Performance of Scala iterators

The evaluation of the performance of Scala higher-order iterative methods is beyond the scope of this book. However, it is important to be aware of the trade-offs of each method.

The for construct is to be avoided as a counting iterator. It is designed to implement for-comprehensions over monadic types (map and flatMap). The source code presented in this book uses the higher-order foreach method instead.
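
The two usages can be contrasted in a short sketch. The IterationDemo object and its members are hypothetical and serve only to illustrate the distinction drawn above; no performance claim is made.

```scala
object IterationDemo {
  // Counting iteration written with the higher-order foreach method
  def sumSquares(x: Array[Double]): Double = {
    var acc = 0.0
    x.indices.foreach(i => acc += x(i) * x(i))
    acc
  }

  // A for-comprehension with yield ...
  val products: Vector[Int] =
    for {
      i <- Vector(1, 2)
      j <- Vector(10, 20)
    } yield i * j

  // ... is equivalent to this chain of flatMap and map
  val sameProducts: Vector[Int] =
    Vector(1, 2).flatMap(i => Vector(10, 20).map(j => i * j))
}
```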