Scala for Machine Learning

By: Patrick R. Nicolas

Tools and frameworks


Before getting your hands dirty, you need to download and deploy a minimum set of tools and libraries; there is no need to reinvent the wheel after all. A few key components have to be installed in order to compile and run the source code described throughout the book. We focus on open source and commonly available libraries, although you are invited to experiment with equivalent tools of your choice. The learning curve for the frameworks described here is minimal.

Java

The code described in this book has been tested with JDK 1.7.0_45 and JDK 1.8.0_25 on Windows x64 and Mac OS X x64. Install the Java Development Kit if you have not already done so, then update the JAVA_HOME, PATH, and CLASSPATH environment variables accordingly.
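To confirm which JDK and environment variables the JVM actually sees, a few lines of Scala are enough. The following is a minimal sketch and is not part of the book's source code; the object name EnvCheck is hypothetical.

```scala
// Hypothetical helper for illustration; not from the book's source code.
object EnvCheck {
  // Environment variables that the text asks you to set.
  val required: Seq[String] = Seq("JAVA_HOME", "PATH", "CLASSPATH")

  // Version of the JVM running this code.
  def javaVersion: String = System.getProperty("java.version")

  // Pairs each variable name with its value, or None if it is not set.
  def report: Seq[(String, Option[String])] =
    required.map(name => name -> sys.env.get(name))

  def main(args: Array[String]): Unit = {
    println(s"java.version = $javaVersion")
    report.foreach {
      case (name, Some(value)) => println(s"$name = $value")
      case (name, None)        => println(s"$name is not set")
    }
  }
}
```

A variable reported as not set usually means that it was exported in a different shell session or that the IDE was started before the variable was defined.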

Scala

The code has been tested with Scala 2.10.4 and 2.11.4. We recommend that you use Scala version 2.10.4 or higher with SBT 0.13 or higher. We assume that the Scala runtime (REPL) and libraries have been properly installed and that the SCALA_HOME and PATH environment variables have been updated.

The description and installation instructions of the Scala plugin for Eclipse (version 4.0 or higher) are available at http://scala-ide.org/docs/user/gettingstarted.html. You can also download the Scala plugin for IntelliJ IDEA (version 13 or higher) from the JetBrains website at http://confluence.jetbrains.com/display/SCA/.

The ubiquitous Simple Build Tool (SBT) will be our primary build engine. The build file, sbt/build.sbt, conforms to the syntax of version 0.13 and is used to compile and assemble the source code presented throughout the book. SBT can be downloaded as part of Typesafe Activator or directly from http://www.scala-sbt.org/download.html.
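For orientation, a build file in the 0.13 syntax could look like the following. This is a minimal sketch, not the book's actual sbt/build.sbt: the project name and version are hypothetical, and the two dependencies are the Maven Central coordinates of the Java libraries introduced later in this section.

```scala
// build.sbt: a minimal sketch for SBT 0.13, not the book's actual build file.
name := "scala-ml-examples"   // hypothetical project name

version := "0.1"              // hypothetical project version

scalaVersion := "2.11.4"      // one of the two Scala versions the code was tested with

// Library versions match the ones mentioned in this section.
libraryDependencies ++= Seq(
  "org.apache.commons" % "commons-math3" % "3.5",
  "org.jfree" % "jfreechart" % "1.0.17"
)
```

When dependencies are declared this way, SBT downloads the .jar files and places them on the classpath of the project, so the manual classpath changes described for each library are only needed when you compile or run outside SBT.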

Apache Commons Math

Apache Commons Math is a Java library used for numerical processing, algebra, statistics, and optimization [1:6].

Description

This is a lightweight library that provides developers with a foundation of small, ready-to-use Java classes that can be easily woven into a machine learning problem. The examples used throughout the book require version 3.5 or higher.

The math library supports the following:

  • Functions, differentiation, integration, and ordinary differential equations

  • Statistical distributions

  • Linear and nonlinear optimization

  • Dense and sparse vectors and matrices

  • Curve fitting, correlation, and regression

For more information, visit http://commons.apache.org/proper/commons-math.
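To give a feel for how these Java classes are called from Scala, the following sketch computes summary statistics and fits a straight line. It is only an illustration under stated assumptions, not code from the book: the object and method names (CommonsMathSketch, summary, fitLine) are hypothetical, and commons-math3 is assumed to be on the classpath. DescriptiveStatistics and SimpleRegression are classes of the library itself.

```scala
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics
import org.apache.commons.math3.stat.regression.SimpleRegression

// Hypothetical example object; not from the book's source code.
object CommonsMathSketch {
  // Mean and sample standard deviation of a series of observations.
  def summary(values: Seq[Double]): (Double, Double) = {
    val stats = new DescriptiveStatistics
    values.foreach(v => stats.addValue(v))
    (stats.getMean, stats.getStandardDeviation)
  }

  // Ordinary least squares fit of y = intercept + slope * x.
  def fitLine(points: Seq[(Double, Double)]): (Double, Double) = {
    val regression = new SimpleRegression
    points.foreach { case (x, y) => regression.addData(x, y) }
    (regression.getIntercept, regression.getSlope)
  }

  def main(args: Array[String]): Unit = {
    val (mean, stdDev) = summary(Seq(1.0, 2.0, 3.0, 4.0))
    println(s"mean = $mean, stdDev = $stdDev")

    // These three points lie exactly on the line y = 1 + 2x.
    val (intercept, slope) = fitLine(Seq((0.0, 1.0), (1.0, 3.0), (2.0, 5.0)))
    println(s"intercept = $intercept, slope = $slope")
  }
}
```

The thin Scala wrappers return tuples instead of exposing the mutable Java objects, which keeps the calling code free of side effects.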

Licensing

The library is distributed under the Apache License, Version 2.0; the terms are available at http://www.apache.org/licenses/LICENSE-2.0.

Installation

The installation and deployment of the Apache Commons Math library are quite simple. The steps are as follows:

  1. Go to the download page at http://commons.apache.org/proper/commons-math/download_math.cgi.

  2. Download the latest .jar files from the Binaries section (for instance, commons-math3-3.5-bin.zip for version 3.5).

  3. Unzip and install the .jar file.

  4. Add commons-math3-3.5.jar to the classpath as follows:

    • For Mac OS X: export CLASSPATH=$CLASSPATH:/Commons_Math_path/commons-math3-3.5.jar

    • For Windows: Go to system Properties | Advanced system settings | Advanced | Environment Variables, then edit the CLASSPATH variable

  5. Add the commons-math3-3.5.jar file to your IDE environment if needed (for example, in Eclipse, go to Project | Properties | Java Build Path | Libraries | Add External JARs; in IntelliJ IDEA, go to File | Project Structure | Project Settings | Libraries).

You can also download commons-math3-3.5-src.zip from the Source section.

JFreeChart

JFreeChart is an open source chart and plotting Java library, widely used in the Java programmer community. It was originally created by David Gilbert [1:7].

Description

The library supports a variety of configurable plots and charts (scatter, dial, pie, area, bar, box and whisker, stacked, and 3D). We use JFreeChart to display the output of data processing and algorithms throughout the book, but you are encouraged to explore this great library on your own, as time permits.
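As an illustration of the kind of call involved, the following sketch builds a scatter plot and writes it to a PNG file. It is not the plotting code used in the book: the object name, method name, series name, and output file name are hypothetical, and a JFreeChart 1.0.x .jar is assumed to be on the classpath. In JFreeChart 1.5 and later, the ChartUtilities class is named ChartUtils.

```scala
import java.io.File
import org.jfree.chart.{ChartFactory, ChartUtilities, JFreeChart}
import org.jfree.chart.plot.PlotOrientation
import org.jfree.data.xy.{XYSeries, XYSeriesCollection}

// Hypothetical example object; not from the book's source code.
object ScatterPlotSketch {
  // Builds a scatter plot from a sequence of (x, y) pairs.
  def scatter(title: String, points: Seq[(Double, Double)]): JFreeChart = {
    val series = new XYSeries("observations")
    points.foreach { case (x, y) => series.add(x, y) }

    val dataset = new XYSeriesCollection
    dataset.addSeries(series)

    // Arguments: title, x label, y label, data, orientation,
    // then flags for legend, tooltips, and URLs.
    ChartFactory.createScatterPlot(
      title, "x", "y", dataset, PlotOrientation.VERTICAL, true, false, false)
  }

  def main(args: Array[String]): Unit = {
    val chart = scatter("Sample data", Seq((1.0, 2.0), (2.0, 3.5), (3.0, 5.1)))
    // Writes a 640 x 480 image to the working directory.
    ChartUtilities.saveChartAsPNG(new File("scatter.png"), chart, 640, 480)
  }
}
```

Saving to a file rather than opening a window keeps the sketch usable on machines without a display; the same JFreeChart object can be embedded in a Swing ChartPanel for interactive use.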

Licensing

It is distributed under the terms of the GNU Lesser General Public License (LGPL), which permits its use in proprietary applications.

Installation

To install and deploy JFreeChart, perform the following steps:

  1. Visit http://www.jfree.org/jfreechart/.

  2. Download the latest version from SourceForge at http://sourceforge.net/projects/jfreechart/files.

  3. Unzip and deploy the .jar file.

  4. Add jfreechart-1.0.17.jar (for Version 1.0.17) to the classpath as follows:

    • For Mac OS X: export CLASSPATH=$CLASSPATH:/JFreeChart_path/jfreechart-1.0.17.jar

    • For Windows: Go to system Properties | Advanced system settings | Advanced | Environment Variables, then edit the CLASSPATH variable

  5. Add the jfreechart-1.0.17.jar file to your IDE environment, if needed
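Once the classpath has been updated, a short check can tell you whether the Apache Commons Math and JFreeChart .jar files are actually visible to the JVM. The following is a minimal sketch with a hypothetical object name; it simply tries to load one well-known class from each library.

```scala
import scala.util.Try

// Hypothetical helper for illustration; not from the book's source code.
object ClasspathCheck {
  // One representative class per library described in this section.
  val probes: Seq[(String, String)] = Seq(
    "Apache Commons Math" -> "org.apache.commons.math3.util.FastMath",
    "JFreeChart"          -> "org.jfree.chart.JFreeChart"
  )

  // True if the named class can be located on the current classpath.
  // The class is looked up without being initialized.
  def isAvailable(className: String): Boolean =
    Try(Class.forName(className, false, getClass.getClassLoader)).isSuccess

  def main(args: Array[String]): Unit =
    probes.foreach { case (library, className) =>
      val status = if (isAvailable(className)) "found" else "NOT found"
      println(s"$library: $status")
    }
}
```

A library reported as not found usually points to a typo in the path added to CLASSPATH or to an IDE project that has not been refreshed after the .jar file was added.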

Other libraries and frameworks

Libraries and tools that are specific to a single chapter are introduced along with the topic. Scalable frameworks are presented in the last chapter, along with instructions to download them. Libraries related to conditional random fields and support vector machines are described in their respective chapters.

Note

Why not use the Scala algebra and numerical libraries?

Libraries such as Breeze, ScalaNLP, and Algebird are interesting Scala frameworks for linear algebra, numerical analysis, and machine learning. They provide even the most seasoned Scala programmer with a high-quality layer of abstraction. However, this book is designed as a tutorial that allows developers to write algorithms from the ground up using existing or legacy Java libraries [1:8].