Scala for Java Developers

By: Thomas Alexandre

Operations on collections


In this section, we are going to illustrate how the manipulation of collections in Scala can be expressed in a concise and expressive way.

Transforming collections containing primitive types

The REPL is a great tool to try out the powerful operations that we can apply to the collection elements. Let's go back to our interpreter prompt:

scala> val numbers = List(1,2,3,4,5,6)
numbers: List[Int] = List(1, 2, 3, 4, 5, 6)
scala> val reversedList = numbers.reverse
reversedList: List[Int] = List(6, 5, 4, 3, 2, 1)
scala> val onlyAFew = numbers drop 2 take 3
onlyAFew: List[Int] = List(3, 4, 5)

The drop method indicates that we get rid of the first two elements of the list, and the take method indicates that we keep only three elements from the result obtained after the drop method.

This last command is interesting for two reasons:

  • Since every method call is an expression that evaluates to a value, we can chain several method calls at once (here, take is invoked on the result of drop)

  • As already stated before, the syntactic sugar added to the Scala syntax makes it equivalent to write numbers drop 2 instead of the more traditional Java numbers.drop(2)
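To make the two points above concrete, here is a minimal sketch that can be pasted into the REPL. It writes the same chain once in infix notation and once with explicit dots and parentheses; the names infixStyle, dotStyle, and tooMany are ours, chosen only for illustration.

```scala
val numbers = List(1, 2, 3, 4, 5, 6)

// Infix notation: take is invoked on the result of drop.
val infixStyle = numbers drop 2 take 3

// The same chain written in the more traditional Java-like style.
val dotStyle = numbers.drop(2).take(3)

// Both styles produce List(3, 4, 5).
val sameResult = infixStyle == dotStyle

// Dropping more elements than the list contains simply yields an empty list.
val tooMany = numbers drop 10
```

Which style to prefer is largely a matter of readability; the dotted form tends to be clearer once a chain grows beyond two or three calls.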

Another way of constructing a list from its elements is by using the :: method, generally referred to in Scala documentation as the "cons operator". This alternative syntax looks like the following expression:

scala> val numbers = 1 :: 2 :: 3 :: 4 :: 5 :: 6 :: Nil
numbers: List[Int] = List(1, 2, 3, 4, 5, 6)

If you are wondering why there is a Nil value at the end of this expression, this is because there is a simple rule in Scala that says that a method whose last character is : (that is, a colon) is applied on its right side rather than the left side (such a method is called right-associative). So, the evaluation of 6 :: Nil is not equivalent to 6.::(Nil) in that case, but rather to Nil.::(6). We can show this in the REPL as follows:

scala> val simpleList = Nil.::(6)
simpleList: List[Int] = List(6)

The evaluation of 5 :: 6 :: Nil is therefore done by applying the :: method on the simpleList that we saw earlier, which is List(6):

scala> val twoElementsList = List(6).::(5)
twoElementsList: List[Int] = List(5, 6)

In this case, 5 was prepended to the list, ending up before 6. Repeating this operation several times will give you the final List(1,2,3,4,5,6).
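The right-associativity rule can be checked with a short sketch that builds the same list both ways. The value names are ours; the last line additionally assumes that the +: method of the standard library, which also ends with a colon, follows the same rule.

```scala
// The sugared form, read from right to left: 3 is prepended to Nil first.
val sugared = 1 :: 2 :: 3 :: Nil

// The same construction with the method calls spelled out.
val desugared = Nil.::(3).::(2).::(1)

// Both expressions build List(1, 2, 3).
val identical = sugared == desugared

// +: also ends with a colon, so 0 +: sugared means sugared.+:(0).
val prepended = 0 +: sugared
```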

This convenient way of expressing lists is not just for simple values such as integers but can be applied to any type. Moreover, we can concatenate two List instances by using the ::: method in a similar way:

scala> val concatenatedList = simpleList ::: twoElementsList
concatenatedList: List[Int] = List(6, 5, 6)

We can even mix elements of various types in the same List, for example, integers and Booleans, as shown in the following code snippet:

scala> val things = List(0,1,true)
things: List[AnyVal] = List(0, 1, true) 

However, as you probably noticed, the result type AnyVal chosen by the compiler in that case is the nearest common supertype of integers and Booleans in the type hierarchy. For instance, retrieving only the Boolean element (at index two in the list) will return an element of type AnyVal rather than a Boolean value:

scala> things(2)
res6: AnyVal = true

Now, if we put an element of type String within the list as well, we will get a different common type:

scala> val things = List(0,1,true,"false")
things: List[Any] = List(0, 1, true, false)

The reason for that can be directly visualized by looking at the hierarchy of Scala types. Classes representing primitive types such as Int, Byte, Boolean, or Char are value types that extend scala.AnyVal, whereas String, Vector, List, or Set are reference types that extend scala.AnyRef, both being subclasses of the common type Any, as shown in the following diagram:

[Diagram: Any at the root, with AnyVal and AnyRef as its two direct subclasses]

The full hierarchy of Scala types is given in the official Scala documentation at http://docs.scala-lang.org/tutorials/tour/unified-types.html.
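Once elements are typed as Any, their specific type is not lost at runtime; it is only hidden from the compiler. As a hedged sketch, the following describe function (a name of our own) recovers it with typed patterns, using the match syntax that is introduced a little later in this section:

```scala
val things: List[Any] = List(0, 1, true, "false")

// Each case tests the runtime type of the element and binds it to a typed name.
def describe(x: Any): String = x match {
  case i: Int     => "Int " + i
  case b: Boolean => "Boolean " + b
  case s: String  => "String " + s
  case other      => "Something else: " + other
}

// List(Int 0, Int 1, Boolean true, String false)
val descriptions = things.map(describe)
```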

Collections of more complex objects

Let's manipulate objects that are more complex than integers. We can, for instance, create some collections of Money instances that we made earlier and experiment with them:

scala> val amounts = List(Money(10,"USD"),Money(2,"EUR"),Money(20,"GBP"),Money(75,"EUR"),Money(100,"USD"),Money(50,"USD"))
amounts: List[Money] = List(Money(10,USD), Money(2,EUR), Money(20,GBP), Money(75,EUR), Money(100,USD), Money(50,USD))
scala> val first = amounts.head
first: Money = Money(10,USD)
scala> val amountsWithoutFirst = amounts.tail
amountsWithoutFirst: List[Money] = List(Money(2,EUR), Money(20,GBP), Money(75,EUR), Money(100,USD), Money(50,USD))

Filter and partition

Filtering elements of a collection is one of the most common operations and can be written as follows:

scala> val euros = amounts.filter(money => money.currency=="EUR")
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))

The parameter given to the filter method is a function that takes a Money item as the input and returns a Boolean value (that is, a predicate), which is the result of evaluating money.currency=="EUR".

The filter method iterates over the collection items and applies the function to each element, keeping only the elements for which the function returns true. Lambda expressions are also referred to as anonymous functions because the function itself is never given a name. The name of its input argument is arbitrary too: we could use, for example, x instead of the money used previously, and still get the same output:

scala> val euros = amounts.filter(x => x.currency=="EUR")
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))

A slightly shorter way of writing this one-liner is to use the _ sign, a character that one encounters often when reading Scala code and that might seem awkward to a Java developer at first sight. It simply means "that thing", or "the current element". It can be thought of as the blank space or gap to be filled in on paper-based inquiries or passport registration forms, in the olden days. Other languages that deal with anonymous functions reserve a keyword for this purpose, such as it in Groovy. The previous lambda example can be rewritten with the short underscore notation as follows:

scala> val euros = amounts.filter(_.currency=="EUR")
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))

A filterNot method also exists to keep the elements for which the function returns false. Moreover, a partition method is available to combine both the filter and filterNot methods into one single call that returns two collections, one containing the elements for which the predicate is true and the other containing all the remaining elements, as shown in the following code snippet:

scala> val allAmounts = amounts.partition(amt =>
     |   amt.currency=="EUR")
allAmounts: (List[Money], List[Money]) = (List(Money(2,EUR), Money(75,EUR)),List(Money(10,USD), Money(20,GBP), Money(100,USD), Money(50,USD)))
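The relationship between the three methods can be verified with a small sketch. The Money class below is only a minimal stand-in for the one made earlier, assumed here to carry an Int amount and a String currency; isEuro and the other value names are ours.

```scala
// Assumed stand-in for the Money case class made earlier.
case class Money(amount: Int, currency: String)

val amounts = List(Money(10, "USD"), Money(2, "EUR"), Money(20, "GBP"), Money(75, "EUR"))

// The predicate is stored in a value so that it can be reused.
val isEuro = (m: Money) => m.currency == "EUR"

val euros = amounts.filter(isEuro)
val others = amounts.filterNot(isEuro)

// partition computes both collections in a single call.
val both = amounts.partition(isEuro)

// The two halves of the pair match the results of filter and filterNot.
val consistent = both._1 == euros && both._2 == others
```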

Dealing with tuples

Notice the return type of the partition result, (List[Money],List[Money]). Scala supports the concept of tuples. The preceding notation in parentheses denotes a Tuple type, which is a part of the standard Scala library and is useful for manipulating several elements at once without having to create a more complex type for encapsulating them. In our case, allAmounts is a Tuple2 pair containing two lists of Money. To access only one of the two collections, we just need to type the following expressions:

scala> val euros = allAmounts._1
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))
scala> val everythingButEuros = allAmounts._2
everythingButEuros: List[Money] = List(Money(10,USD), Money(20,GBP), Money(100,USD), Money(50,USD))

A cleaner and more natural way to achieve this as a one-liner is to bind the two elements of the result of partition directly to two names, without referring to ._1 and ._2, as shown in the following code snippet:

scala> val (euros,everythingButEuros) = amounts.partition(amt =>
     |   amt.currency=="EUR")
euros: List[Money] = List(Money(2,EUR), Money(75,EUR))
everythingButEuros: List[Money] = List(Money(10,USD), Money(20,GBP), Money(100,USD), Money(50,USD))

This time, as a result, we get two variables, euros and everythingButEuros, which we can reuse individually:

scala> euros
res2: List[Money] = List(Money(2,EUR), Money(75,EUR))
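Tuples are not limited to pairs, and the same extraction syntax works for any of them. The following sketch uses illustrative values of our own rather than the Money amounts:

```scala
// A pair (Tuple2) and its positional accessors.
val pair = ("USD", 10)
val firstElement = pair._1
val secondElement = pair._2

// The same pair taken apart in one statement.
val (currency, amount) = pair

// A Tuple3 mixing three different types works the same way.
val triple = ("EUR", 2, true)
val (code, quantity, inWallet) = triple
```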

Introducing Map

Another elegant usage of tuples is related to the definition of a Map collection, another structure that is part of the Scala collections. Similar to Java, the Map collection stores key-value pairs. In Java, a trivial HashMap definition that populates and retrieves elements of a Map collection with a couple of values can be written with a few lines of code:

import java.util.HashMap;
import java.util.Map;

public class MapSample {
    public static void main(String[] args) {
        // Parameterize the declared type too, so that no cast is needed on retrieval
        Map<String, Integer> amounts = new HashMap<>();
        amounts.put("USD", 10);
        amounts.put("EUR", 2);

        Integer euros = amounts.get("EUR");
        // There is no GBP entry, so get returns null
        Integer pounds = amounts.get("GBP");

        System.out.println("Euros: " + euros);
        System.out.println("Pounds: " + pounds);
    }
}

Since no amount of GBP currency has been inserted into the Map collection, running this sample will print a null value for the pounds variable:

Euros: 2
Pounds: null

Populating a Map collection in Scala can be elegantly written as follows:

scala> val wallet = Map( "USD" -> 10, "EUR" -> 2 )
wallet: scala.collection.immutable.Map[String,Int] = Map(USD -> 10, EUR -> 2)

The "USD" -> 10 expression is a convenient way of specifying a key-value pair and is equivalent to the definition of a Tuple2[String,Int] object in this case, as illustrated directly in the REPL (which could infer the type automatically):

scala> val tenDollars = "USD" -> 10
tenDollars: (String, Int) = (USD,10)
scala> val tenDollars = ("USD",10)
tenDollars: (String, Int) = (USD,10)

Adding and retrieving an element is very straightforward. Since this Map is immutable, the + operator returns a new Map and leaves wallet unchanged:

scala> val updatedWallet = wallet + ("GBP" -> 20)
updatedWallet: scala.collection.immutable.Map[String,Int] = Map(USD -> 10, EUR -> 2, GBP -> 20)
scala> val someEuros = wallet("EUR")
someEuros: Int = 2

However, accessing a key that is not included in the Map collection (wallet itself still has no GBP entry) will throw an exception, as follows:

scala> val somePounds = wallet("GBP")
java.util.NoSuchElementException: key not found: GBP  (followed by a full stacktrace)

Introducing the Option construct

A safer way to retrieve an element from the Map collection that was introduced in the previous section is to invoke its get method, which returns an object of type Option instead of throwing an exception. Java had no standard equivalent until java.util.Optional arrived in Java 8. Basically, an Option wraps a possibly missing value: it is either None if there is no value (here, when the key is absent), or Some(value) otherwise. Let's enter this in the REPL:

scala> val mayBeSomeEuros = wallet.get("EUR")
mayBeSomeEuros: Option[Int] = Some(2)
scala> val mayBeSomePounds = wallet.get("GBP")
mayBeSomePounds: Option[Int] = None
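An Option can often be used without ever testing whether it holds a value. The following sketch shows a few common ways of doing so; it relies on the getOrElse and map methods of Option and on the getOrElse method of Map, and the value names are ours.

```scala
val wallet = Map("USD" -> 10, "EUR" -> 2)

// Fall back to a default value when the key is absent.
val pounds = wallet.get("GBP").getOrElse(0)

// Map offers the same fallback in a single call.
val poundsDirect = wallet.getOrElse("GBP", 0)

// map transforms the wrapped value if there is one: Some(4) here.
val doubledEuros = wallet.get("EUR").map(_ * 2)

// On None, map does nothing and the result is still None.
val doubledPounds = wallet.get("GBP").map(_ * 2)
```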

A glimpse at pattern matching

Not throwing an exception makes it convenient to continue handling the flow of an algorithm as an evaluated expression. It not only gives the programmer the freedom to chain operations on Option values without having to check for the existence of a value, but also enables one to handle the two different cases via pattern matching:

scala> val status = mayBeSomeEuros match {
     |   case None => "Nothing of that currency"
     |   case Some(value) => "I have "+value+" Euros"
     | }
status: String = I have 2 Euros

Pattern matching is an essential and powerful feature of the Scala language. We will look at more examples of it later on.
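As a slightly richer sketch of the same idea, the match expression can be wrapped in a method and one of its cases refined with a guard (an if condition attached to a pattern). The status method, its threshold of 10, and its messages are our own illustration.

```scala
val wallet = Map("USD" -> 10, "EUR" -> 2)

// Cases are tried from top to bottom; the first one that matches wins.
def status(currency: String): String = wallet.get(currency) match {
  case None                       => "Nothing of that currency"
  case Some(value) if value >= 10 => "Plenty of " + currency + ": " + value
  case Some(value)                => "Only " + value + " " + currency
}

val usdStatus = status("USD") // Plenty of USD: 10
val eurStatus = status("EUR") // Only 2 EUR
val gbpStatus = status("GBP") // Nothing of that currency
```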

The filter and partition methods were just two examples of the so-called "higher-order" functions on lists, that is, functions that take other functions as arguments (here, the predicate applied to each element). They are available on the other collection types (such as sets, and so on) as well.

The map method

Among the collections' methods that cannot be overlooked lies the map method (not to be confused with the Map object). Basically, it applies a function to every element of a collection, but instead of returning Unit as the foreach method does, it returns a collection of a similar container type (for example, a List will return a List of the same size) that contains the result of transforming each element through the function. A very simple example is shown in the following code snippet:

scala> List(1,2,3,4).map(x => x+1)
res6: List[Int] = List(2, 3, 4, 5)

In Scala, you may define standalone functions as follows:

scala> def increment = (x:Int) => x + 1
increment: Int => Int

We have declared an increment function that takes an Int value as the input (denoted by x) and returns another Int value (x+1).

The previous List transformation can be rewritten in a slightly different manner, as shown in the following code snippet:

scala> List(1,2,3,4).map(increment)
res7: List[Int] = List(2, 3, 4, 5)

Using a bit of syntactic sugar, the . sign in the method call, as well as the parentheses around the function parameter, can be omitted for readability, which leads to the following concise one-liner:

scala> List(1,2,3,4) map increment
res8: List[Int] = List(2, 3, 4, 5)
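The function passed to map can come from several places. The sketch below compares three of them: a function stored in a value, an ordinary method (which the compiler converts into a function when it is passed to map), and the underscore shorthand. The names incrementValue and incrementMethod are ours.

```scala
// A function stored in a value, with its type written out.
val incrementValue: Int => Int = x => x + 1

// An ordinary method with one parameter.
def incrementMethod(x: Int): Int = x + 1

val viaValue = List(1, 2, 3, 4).map(incrementValue)

// The method is converted into a function automatically.
val viaMethod = List(1, 2, 3, 4).map(incrementMethod)

// The underscore shorthand avoids naming the argument at all.
val viaUnderscore = List(1, 2, 3, 4).map(_ + 1)
```

All three calls produce List(2, 3, 4, 5); which form reads best usually depends on whether the function is reused elsewhere.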

Going back to our initial list of the Money amounts, we can, for example, transform them into strings as follows:

scala> val printedAmounts =
     |   amounts map(m=> ""+  m.amount + " " + m.currency)
printedAmounts: List[String] = List(10 USD, 2 EUR, 20 GBP, 75 EUR, 100 USD, 50 USD)

Looking at String Interpolation

In Java, concatenating strings using a + operator, as we did in the previous line, is a very common operation. In Scala, a more elegant and efficient way to deal with the presentation of strings is a feature named String Interpolation. Available since Scala Version 2.10, the new syntax involves prepending an s character to the string literal as shown in the following code snippet:

scala> val many = 10000.2345
many: Double = 10000.2345
scala> val amount = s"$many euros"
amount: String = 10000.2345 euros 

Any variable in scope can be processed and embedded in a string. Formatting can be made more precise by using the f interpolator instead of s. In that case, the syntax follows the same style as that of the printf method of other languages, where, for instance, %4d pads an integer to a minimum width of four characters and %12.2f formats a floating-point number with two digits after the decimal point, padded to a minimum total width of twelve characters:

scala> val amount = f"$many%12.2f euros"
amount: String = "    10000.23 euros"

Moreover, the String Interpolation syntax enables us to embed the full evaluation of an expression, that is, a full block of code performing a calculation. The following is an example, where we want to display twice the value of our many variable:

scala> val amount = s"${many*2} euros"
amount: String = 20000.469 euros

The preceding block of code obeys the same rules as any method or function evaluation, meaning that the last statement in the block is the result. Although here we have a very simple computation, it is perfectly valid to include a multiline algorithm if needed.
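The three forms of interpolation can be put side by side in a short sketch. The count variable and the value names are ours, and the embedded block on the last line is deliberately trivial; it only illustrates that the last expression of the block is the one that gets inserted.

```scala
val many = 10000.2345
val count = 42

// s interpolator: the variable is inserted as is.
val simple = s"$count items"

// f interpolator: the integer is right-aligned in a field of four characters.
val padded = f"$count%4d"

// The number takes twelve characters, including the leading spaces.
val wide = f"$many%12.2f euros"

// A block with two statements: the value of the last one is inserted.
val computed = s"${ val doubled = count * 2; doubled + 1 } items"
```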

Knowing the interpolation syntax, we can rewrite our previous amounts as follows:

scala> val printedAmounts =
     |   amounts map(m=> s"${m.amount} ${m.currency}")
printedAmounts: List[String] = List(10 USD, 2 EUR, 20 GBP, 75 EUR, 100 USD, 50 USD)

The groupBy method

Another convenient operation is the groupBy method that transforms a collection into a Map collection:

scala> val sortedAmounts = amounts groupBy(_.currency)
sortedAmounts: scala.collection.immutable.Map[String,List[Money]] = Map(EUR -> List(Money(2,EUR), Money(75,EUR)), GBP -> List(Money(20,GBP)), USD -> List(Money(10,USD), Money(100,USD), Money(50,USD)))
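A grouped Map is often only an intermediate step. As a sketch, the following code continues from the grouping to compute a total per currency by applying map to each key-value pair. The Money class is again a minimal stand-in for the one made earlier, assumed to carry an Int amount and a String currency; byCurrency and totals are names of our own.

```scala
// Assumed stand-in for the Money case class made earlier.
case class Money(amount: Int, currency: String)

val amounts = List(Money(10, "USD"), Money(2, "EUR"), Money(20, "GBP"),
  Money(75, "EUR"), Money(100, "USD"), Money(50, "USD"))

// Each key is a currency; each value is the list of amounts in that currency.
val byCurrency: Map[String, List[Money]] = amounts.groupBy(_.currency)

// For every (currency, group) pair, keep the currency and sum the group.
val totals: Map[String, Int] = byCurrency.map {
  case (currency, group) => currency -> group.map(_.amount).sum
}
```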

The foldLeft method

One last method that we would like to introduce here is the foldLeft method, which propagates some state from one element to the next. For instance, to sum elements in a list, you need to accumulate them and keep track of the intermediate counter from one element to the next:

scala> val sumOfNumbers = numbers.foldLeft(0) { (total,element) =>
     |   total + element
     | }
sumOfNumbers: Int = 21

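The value 0 given as the first argument to foldLeft is the initial value (which means total=0 when applying the function for the first List element). The (total,element) notation is the parameter list of a two-argument anonymous function: total receives the value accumulated so far and element receives the current list item. Despite the parentheses, it is not a Tuple2 pair. Note, however, that for summation, the Scala API provides a sum method, so the last statement could have been written as follows: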

scala> val sumOfNumbers = numbers.sum
sumOfNumbers: Int = 21
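The state propagated by foldLeft does not have to be a number of the same kind as the elements. The following sketch, with names of our own, uses it to find the largest element, to reverse the list, and to record every intermediate total of the summation:

```scala
val numbers = List(1, 2, 3, 4, 5, 6)

// The state is the largest element seen so far.
val largest = numbers.foldLeft(Int.MinValue) { (best, element) =>
  if (element > best) element else best
}

// The state is a list: prepending each element reverses the order.
val reversed = numbers.foldLeft(List.empty[Int]) { (acc, element) =>
  element :: acc
}

// The state is the list of totals so far, most recent first.
val totalsNewestFirst = numbers.foldLeft(List(0)) { (totals, element) =>
  (totals.head + element) :: totals
}

// List(0, 1, 3, 6, 10, 15, 21): the last entry is the sum computed above.
val runningTotals = totalsNewestFirst.reverse
```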