Scientific Computing with Scala

By: Vytautas Jancauskas

Overview of this book

Scala is a statically typed, Java Virtual Machine (JVM)-based language with strong support for functional programming. There exist libraries for Scala that cover a range of common scientific computing tasks, from linear algebra and numerical algorithms to convenient and safe parallelization to powerful plotting facilities. Learning to use these to perform common scientific tasks will allow you to write programs that are both fast and easy to write and maintain. We will start by discussing the advantages of using Scala over other scientific computing platforms. You will discover Scala packages that provide the functionality you have come to expect when writing scientific software. We will explore using Scala's Breeze library for linear algebra, optimization, and signal processing. We will then proceed to the Saddle library for data analysis. If you have experience in R or with Python's popular pandas library, you will learn how to translate those skills to Saddle. If you are new to data analysis, you will learn basic concepts of Saddle as well. We will explore the numerical computing environment called ScalaLab. It comes bundled with a lot of scientific software readily available. We will use it for interactive computing, data analysis, and visualization. In the following chapters, we will explore using Scala's powerful parallel collections for safe and convenient parallel programming. Topics such as the Akka concurrency framework will be covered. Finally, you will learn about multivariate data visualization and how to produce professional-looking plots in Scala easily. After reading the book, you should have more than enough information on how to start using Scala as your scientific computing platform.

Building, testing, and distributing your Scala software


In this book, we assume that you are familiar with the Scala programming language. Therefore, language concepts are not introduced. We want to present a convenient way of building, testing, and distributing your software. Even if you have already read several books on Scala and implemented some basic (or not so basic) programs, you may still benefit from this.

In this section, we will discuss how to build, test, and distribute your software using the SBT tool. You can get the tool from the SBT website, www.scala-sbt.org. There, you will find instructions for how to set it up on whichever operating system you are using. Here, we will only consider how to use SBT to build and test your software. If you also want to use version control (and you should), then you should consult the documentation and tutorials for tools such as Git.

SBT is an open source build tool similar to Java's Maven or Ant that lets you compile your Scala project, integrates with many popular Scala test frameworks, has dependency management functionality, integrates with the Scala interpreter, supports mixed Scala/Java projects, and much more.

For this tutorial, we will consider a simple interval arithmetic software package. It is a small library implementing an Interval class and the operations that correspond to interval arithmetic for that class. It will serve to showcase the capabilities of SBT and to illustrate the principles of building and testing software with it, and it will also let you try your hand at creating a full (albeit simple) Scala library ready for distribution for other people's benefit.

Interval arithmetic is a generalization of standard arithmetic rules designed to operate on intervals of values. It has many applications in science and engineering, for example, in measuring rounding errors or in global optimization methods. While it is full of complex intricacies, at the base of it are some very simple ideas. An interval [a, b] is a range of values between a and b including a and b. Now what is the sum of two intervals [a, b] and [c, d]? We will define the sum of two intervals as the interval that the result of adding any number from the interval [a, b] to any number from the interval [c, d] will fall into. This is simply the interval [a + c, b + d]. It is not difficult to convince oneself that this is so by considering that a and c are the smallest numbers from their respective intervals; thus, their sum is the smallest number in the resulting interval. The same goes for the upper bounds b and d.
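To make the sum rule concrete, here is a minimal sketch that models an interval as a plain (lower, upper) pair. The names SumRuleSketch, addIntervals, samples, and sumsStayInside are hypothetical and are not part of the library developed below. The sketch adds two intervals and then checks, on a handful of sample points, that every pairwise sum lands inside the result:

```scala
object SumRuleSketch {
  // An interval is modelled here as a (lower, upper) pair of Doubles.
  type Iv = (Double, Double)

  // [a, b] + [c, d] = [a + c, b + d]
  def addIntervals(x: Iv, y: Iv): Iv = (x._1 + y._1, x._2 + y._2)

  // Evenly spaced sample points from an interval, endpoints included.
  def samples(x: Iv, n: Int = 5): Seq[Double] =
    (0 to n).map(i => x._1 + (x._2 - x._1) * i / n)

  // True if every pairwise sum of sample points falls inside x + y.
  def sumsStayInside(x: Iv, y: Iv): Boolean = {
    val (lo, hi) = addIntervals(x, y)
    samples(x).forall(p => samples(y).forall(q => p + q >= lo && p + q <= hi))
  }
}
```

For example, SumRuleSketch.addIntervals((1.0, 2.0), (3.0, 5.0)) yields (4.0, 7.0), and sumsStayInside returns true for the same pair of intervals.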

Similarly for subtraction, we get that [a, b] minus [c, d] is equal to [a - d, b - c]; for multiplication, we get that [a, b] times [c, d] is equal to the interval [min(ac, ad, bc, bd), max(ac, ad, bc, bd)]; and for division, provided that [c, d] does not contain zero, we get that [a, b] divided by [c, d] is equal to [min(a/c, a/d, b/c, b/d), max(a/c, a/d, b/c, b/d)]. Finally, we can define the relational operators as follows: [a, b] < [c, d] if and only if b < c, and similarly [a, b] > [c, d] if and only if a > d. Two intervals [a, b] and [c, d] are considered equal if (and only if) a = c and b = d. Using this information lets us define a Scala class called Interval and define operations with the semantics that we discussed here. You should put all of the following code into a file called Interval.scala:

package org.intervals.intervals

import java.lang.ArithmeticException
class Interval(ac: Double, bc: Double) {
    var a: Double = ac
    var b: Double = bc
    if (a > b) {
        val tmp = a
        a = b
        b = tmp
    }

    def contains(x: Double): Boolean =
        x >= this.a && x <= this.b

    def contains(x: Interval): Boolean = 
        x.a >= this.a && x.b <= this.b

    def +(that: Interval): Interval =
        new Interval(this.a + that.a, this.b + that.b)

    def -(that: Interval): Interval =
        new Interval(this.a - that.b, this.b - that.a)

    def *(that: Interval) : Interval = {
        val all = List(this.a * that.a, this.a * that.b,
                       this.b * that.a, this.b * that.b)
        new Interval(all.min, all.max)
    }

    def /(that: Interval) : Interval = {
        if (that.contains(0.0)) {
            throw new ArithmeticException("Division by an interval containing zero")
        }
        val all = List(this.a / that.a, this.a / that.b,
                       this.b / that.a, this.b / that.b)
        new Interval(all.min, all.max)
    }

    def ==(that: Interval): Boolean =
        this.a == that.a && this.b == that.b

    def <(that: Interval): Boolean =
        this.b < that.a

    def >(that: Interval): Boolean =
        this.a > that.b
     
    override def toString(): String =
        "[" + this.a + ", " + this.b + "]"
}

This will create a new class called Interval, which can be constructed by specifying the interval limits. You can then add, subtract, multiply, and divide intervals using standard Scala syntax. We made sure that an exception is thrown if the user tries to divide by an interval containing zero. This is because division by zero is undefined, and it is not immediately clear what to do when the interval you divide by contains zero. To use the class, you would use Scala statements such as these:

val interval1 = new Interval(-0.5, 0.8)
val interval2 = new Interval(0.3, 0.5)
val interval3 = (interval1 + interval2) * (interval1 - interval2) / interval2

Obviously, to start doing this, you have to make sure the program using it is able to find it. Let's explore how this can be achieved next.

Directory structure

The SBT tool expects you to follow a certain directory structure that is given here. If you put the appropriate files into specific directories, SBT will be able to automatically build and test your software without having to specify many details in the configuration files:

project/
    src/
        main/
            resources/
            scala/
            java/
        test/
            resources/
            scala/
            java/

For example, for our project, we will want to create a directory called intervals in which we then create the whole directory tree starting with src. Naturally, we will want to put our Interval.scala file inside the src/main/scala directory. There is, however, another thing to consider concerning the directory structure. You can follow the Java convention of structuring directories according to the package name. While this is mandatory in Java, it is only optional in Scala, but we will do it anyway. Because of that, our Interval.scala file ends up inside the src/main/scala/org/intervals/intervals directory.
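If you would rather create this tree programmatically than by hand, the following is a small sketch using the standard java.nio.file API. The object name ScaffoldSketch and its members are hypothetical helpers introduced only for illustration; SBT itself does not require them:

```scala
import java.nio.file.{Files, Path}

object ScaffoldSketch {
  // Sub-directories that SBT looks for by convention, relative to the project root.
  val layout: Seq[String] = Seq(
    "src/main/resources", "src/main/scala", "src/main/java",
    "src/test/resources", "src/test/scala", "src/test/java",
    // Optional: mirrors the package name org.intervals.intervals.
    "src/main/scala/org/intervals/intervals"
  )

  // Creates every directory in `layout` under `root` (including missing
  // parents) and returns the resulting paths.
  def scaffold(root: Path): Seq[Path] =
    layout.map(rel => Files.createDirectories(root.resolve(rel)))
}
```

Calling ScaffoldSketch.scaffold(java.nio.file.Paths.get("intervals")) from the parent directory would create the intervals project tree described above.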

We now need to tell SBT some basic things about our project. These include various bits of metadata such as the project name, version number, and the version of Scala we want to use. One nice thing about SBT is that it will download the Scala version you need for your project, regardless of which version you may already have installed on your system. Also, it has to know the root directory of our project. Let's now add the build.sbt file to the project. You need to put that file in the root directory of the project file tree (the directory shown as project/ in the preceding listing). In our case, we called that directory intervals. For now, fill in the file with the following information:

lazy val commonSettings = Seq(
    organization := "org.intervals",
    name := "intervals",
    version := "0.0.1",
    scalaVersion := "2.11.4"
)

lazy val root = (project in file(".")).
    settings(commonSettings: _*)

Now, if we want to build the project using SBT, believe it or not, nothing remains to be done. SBT will take advantage of the conventional folder structure and look for files in the expected places. Simply go to the project directory and issue the following command from the terminal:

$ sbt compile console

The preceding command will first compile the Scala code and then put us into the Scala REPL, where you can start using your new class immediately. Alternatively, you can run SBT first and then type the compile and console commands into its command interpreter. Let's try it out.

We need to import our new library first:

scala> import org.intervals.intervals.Interval
import org.intervals.intervals.Interval

Now, let's create a couple of Interval objects:

scala> val ab = new Interval(-3.0, 2.0)
ab: org.intervals.intervals.Interval = [-3.0, 2.0]

scala> val cd = new Interval(4.0, 7.0)
cd: org.intervals.intervals.Interval = [4.0, 7.0]

Now, let's test whether our newly defined interval arithmetic operations work as expected:

scala> ab + cd
res0: org.intervals.intervals.Interval = [1.0, 9.0]

scala> ab - cd
res1: org.intervals.intervals.Interval = [-10.0, -2.0]

scala> ab * cd
res2: org.intervals.intervals.Interval = [-21.0, 14.0]

scala> ab / cd
res3: org.intervals.intervals.Interval = [-0.75, 0.5]

And finally, let's test the relational operators. Again, these will test that our implementation follows the rules we described for partially ordering intervals:

scala> ab == cd
res4: Boolean = false

scala> ab < cd
res5: Boolean = true

scala> ab > cd
res6: Boolean = false

scala> ab contains 0.0
res7: Boolean = true

It seems that SBT successfully built and loaded our newly created software package. Now, if only there were some way to see if the software works correctly without having to type all that stuff into the Scala console all the time!

Testing Scala code with the help of SBT

Testing code when you use SBT to build your Scala software is very easy. All you need to do is make sure SBT knows you need the testing framework and then type sbt compile test at the command line. To make sure SBT downloads and installs the testing framework of your choice, you need to add it to the build.sbt file that we discussed earlier. We recommend using ScalaTest, since it allows very simple testing, which suits the medium-sized software that most scientific computing packages are. It also has more advanced capabilities if you need them. To use ScalaTest, add the following line to the end of your build.sbt file:

libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.0" % "test"
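For reference, assuming the build.sbt from the previous section is otherwise unchanged, the complete file would now read roughly as follows. This is only a sketch assembled from the two fragments in the text; no new settings are introduced:

```scala
// build.sbt in the root of the intervals project
lazy val commonSettings = Seq(
    organization := "org.intervals",
    name := "intervals",
    version := "0.0.1",
    scalaVersion := "2.11.4"
)

lazy val root = (project in file(".")).
    settings(commonSettings: _*)

// The "test" configuration keeps ScalaTest out of the packaged library.
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.0" % "test"
```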

Use a higher version number than 2.2.0 if needed. This will pull in the testing classes as needed. Now, we will need to write the actual test code and put it into our src/test/scala/ directory. We will be using the appropriately named FunSuite class for our tests. Let's call this file IntervalSuite.scala and put in the tests that follow. First, we want to import both the FunSuite class and the Interval class, which we will be testing:

import org.scalatest.FunSuite
import org.intervals.intervals.Interval

class IntervalSuite extends FunSuite {

Testing with FunSuite is really simple. Just use test followed by a description of the test, and in the body of the test use assert, which will fail if our program exhibits undesired behavior. In the following cases, we want to test if our newly defined interval arithmetic operations work according to interval arithmetic rules:

  test("interval addition should work according to interval arithmetic") {
    val interval1 = new Interval(0.1, 0.2)
    val interval2 = new Interval(1.0, 3.0)
    val sum = interval1 + interval2
    assert(sum.a == 1.1)
    assert(sum.b == 3.2)
  }

  test("interval subtraction should work according to interval arithmetic") {
    val interval1 = new Interval(0.1, 0.2)
    val interval2 = new Interval(1.0, 3.0)
    val sub = interval1 - interval2
    assert(sub.a == -2.9)
    assert(sub.b == -0.8)
  }

  test("inclusion should return true if a Double falls within the interval bounds") {
    val interval = new Interval(-1.0, 1.0)
    assert(interval.contains(0.0))
    assert(!interval.contains(2.0))
    assert(!interval.contains(-2.0))
  }

  test("interval multiplication should work according to interval arithmetic") {
    val interval1 = new Interval(-2.0, 4.0)
    val interval2 = new Interval(-3.0, -1.0)
    val mul = interval1 * interval2
    assert(mul.a == -12.0)
    assert(mul.b == 6.0)
  }

In the following test, we want to test if division works as expected. Division by an interval that contains zero is undefined for our simplified interval arithmetic system. As such, we want the division to throw an exception if the divisor interval contains zero. To check this, we employ the intercept construct. We specify there that we expect dividing interval2 by interval1 to throw an ArithmeticException, which according to our implementation it should:

  test("interval division should work according to interval arithmetic") {
    val interval1 = new Interval(-2.0, 4.0)
    val interval2 = new Interval(-3.0, -1.0)
    intercept[ArithmeticException] {
      interval2 / interval1
    }
    val div = interval1 / interval2
    assert(div.a == -4.0)
    assert(div.b == 2.0)
  }

  test("equality operator should work according to interval arithmetic") {
    val interval1 = new Interval(-2.0, 4.0)
    assert(interval1 == interval1)
  }

  test("inequality operators should work according to interval arithmetic") {
    val interval1 = new Interval(-2.0, 4.0)
    val interval2 = new Interval(5.0, 6.0)
    assert(interval1 < interval2)
    assert(interval2 > interval1)
    assert(interval1 != interval2)
  }

Finally, we add one more test to be completely sure. All basic interval arithmetic operations are inclusion-isotonic. This means that, given intervals i1, i2, i3, and i4, if i1 is fully contained within i3 and i2 is fully contained within i4, then the result of i1 op i2 is contained within the interval i3 op i4. Here, op is one of +, -, *, or / defined according to interval arithmetic rules:

  test("all basic interval arithmetic operations should be inclusion isotonic") {
    val interval1 = new Interval(2.0, 4.0)
    val interval2 = new Interval(2.5, 3.5)
    val interval3 = new Interval(1.0, 3.0)
    val interval4 = new Interval(1.5, 2.5)
    // interval2 lies within interval1 and interval4 lies within interval3
    assert((interval1 + interval3).contains(interval2 + interval4))
    assert((interval1 - interval3).contains(interval2 - interval4))
    assert((interval1 * interval3).contains(interval2 * interval4))
    assert((interval1 / interval3).contains(interval2 / interval4))
  }
}
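The assertions in this suite compare Double values with ==. Exact comparison of floating-point results can be brittle in general, because rounding may make two mathematically equal values differ in their last bits (for instance, 0.1 + 0.2 is not exactly 0.3). As a hedged sketch, a small tolerance helper could be used in such assertions instead. The names ApproxSketch and approxEqual and the default tolerance are assumptions made for illustration; they are not part of ScalaTest or of our library:

```scala
object ApproxSketch {
  // True if x and y differ by at most eps (an absolute tolerance).
  def approxEqual(x: Double, y: Double, eps: Double = 1e-9): Boolean =
    math.abs(x - y) <= eps
}
```

Inside a test, you could then write assert(ApproxSketch.approxEqual(sum.a, 1.1)) rather than assert(sum.a == 1.1). ScalaTest also provides its own tolerance support; consult its documentation for the version you are using.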

With the IntervalSuite.scala file put in the src/test/scala directory, testing our library is simple. Just type sbt compile test into the console window. The result will show which tests passed and which failed, along with the reasons for any failures. Testing your scientific software becomes simple this way: just a matter of writing the tests and using SBT to run them.
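One caveat about the equality checks: our Interval class defines ==(that: Interval) as an additional overload and does not override equals. As a consequence, != and collection operations that rely on equals still use the default reference equality, so interval1 != interval2 would also hold for two distinct Interval objects with identical bounds. A minimal sketch of one way around this is an immutable case class, for which the compiler generates structural equals and hashCode. The name ClosedInterval is hypothetical, and unlike Interval this variant rejects reversed bounds instead of swapping them:

```scala
// Hypothetical immutable variant; not the Interval class from the text.
final case class ClosedInterval(lo: Double, hi: Double) {
  require(lo <= hi, "lower bound must not exceed upper bound")

  // [a, b] + [c, d] = [a + c, b + d]
  def +(that: ClosedInterval): ClosedInterval =
    ClosedInterval(lo + that.lo, hi + that.hi)
}
```

With this definition, ClosedInterval(1.0, 2.0) == ClosedInterval(1.0, 2.0) is true, != gives the opposite answer, and a Set containing both values has a single element.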

ENSIME and SBT integration

You can take advantage of the SBT integration if you use the ENSIME mode for Emacs. To begin using it, you need a .ensime file in your project folder. To be able to generate it, first add the following line to your ~/.sbt/0.13/plugins/plugins.sbt file:

addSbtPlugin("org.ensime" % "ensime-sbt" % "0.2.0")

Now, you can just go to the root of your project folder and issue the sbt gen-ensime command. This will create the .ensime file using the information gathered by SBT about your project. After that, you can start using ENSIME to develop your project. Just load the newly created .ensime file before starting ENSIME.

Distributing your software

After you have written your oh-so-useful Scala library, you will probably want other people to be able to use it. Ideally, they would just append the library name to the library dependencies list in their build.sbt file and have the package downloaded automatically whenever needed. The process for publishing software this way is not currently very simple. There is, however, a simpler way of publishing your software, and that is as an unmanaged dependency. With an unmanaged dependency, the user has to download a .jar file containing your library and place it under the lib directory in their project file tree. To create a .jar file for your project, all you have to do is use the sbt package command (sbt publish goes a step further and tries to upload the artifact to a repository, which requires additional configuration). Simply type sbt package at the console and your Scala package will be compiled and put in target/scala-2.11/intervals_2.11-0.0.1.jar. Now, it is a simple matter of putting that .jar file in the lib/ directory of the project you want to use it in. Alternatively, you can put it up online for people to download. One thing to watch out for, though, is that if your library has dependencies, then the user of that library will have to make sure they also end up in their lib/ folder.

Now, let's test this with our intervals library. First, package it using the sbt package command. Then, create a new project. It is actually very simple. Instead of creating a full project tree, you can simply create a directory for the project and put your source code directly in it. SBT is clever enough to figure out what is going on in these cases too.

Let's say we create a new project directory called intervals_user; inside this directory, create a new directory called lib. Now, copy the result of the sbt package command, which will be called intervals_2.11-0.0.1.jar and will reside in the target/scala-2.11 subdirectory of our intervals project, to this new lib directory. From here on, SBT will let you use this library in your new project. Create a new file called IntervalUser.scala in the intervals_user directory and put the following code there:

import org.intervals.intervals.Interval

object IntervalUser {
  def main(args: Array[String]) = {
    val interval1 = new Interval(-0.5, 0.5)
    val interval2 = new Interval(0.2, 0.8)
    println(interval1 + interval2)
    println(interval1 - interval2)
    println(interval1 * interval2)
    println(interval1 / interval2)
  }
}

It is now simple to run this program. You can merely issue the sbt run command in the intervals_user folder that we created for this project. If you have done everything right, you should see the following lines as part of the output of this program:

[-0.3, 1.3]
[-1.3, 0.3]
[-0.4, 0.4]
[-2.5, 2.5]

Another method for distributing software is more involved. SBT uses Apache Ivy, which in turn looks for packages on the central Maven repository by default. When you add a dependency to the library dependencies list in your build.sbt file, the information there is used to locate the appropriate files on the Maven repository; the files are then downloaded to your computer. The process of publishing your library to such a repository is complicated and will not be discussed here, since it would be a large detour for a book about writing scientific software with Scala. For now, you can simply ask people to download your package to their lib folder. After you have worked more on your library and want it known and widely used, you can look up the process for publishing software to Maven Central online.
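If the library were published to such a repository, a user's build.sbt would only need a single line like the following. This is a hypothetical sketch: the coordinates are taken from the organization, name, and version settings in our own build.sbt, and the line will not resolve until the artifact has actually been published under them:

```scala
// Hypothetical managed dependency on the intervals library.
// %% appends the Scala binary version, giving the artifact name intervals_2.11.
libraryDependencies += "org.intervals" %% "intervals" % "0.0.1"
```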