Book Image

Learn Scala Programming

By : Slava Schmidt
Book Image

Learn Scala Programming

By: Slava Schmidt

Overview of this book

The second version of Scala has undergone multiple changes to support features and library implementations. Scala 2.13, with its main focus on modularizing the standard library and simplifying collections, brings with it a host of updates. Learn Scala Programming addresses both technical and architectural changes to the redesigned standard library and collections, along with covering in-depth type systems and first-level support for functions. You will discover how to leverage implicits as a primary mechanism for building type classes and look at different ways to test Scala code. You will also learn about abstract building blocks used in functional programming, giving you sufficient understanding to pick and use any existing functional programming library out there. In the concluding chapters, you will explore reactive programming by covering the Akka framework and reactive streams. By the end of this book, you will have built microservices and learned to implement them with the Scala and Lagom framework.
Table of Contents (19 chapters)

The Scala 2.13 Collection Library

Scala 2.13 delivers a new collection library, for historical reasons it is also known as "collection - strawman". The refactoring of the library pursued a few main goals, such as fixing common gotchas of the previous version, simplifying its implementation and internal structure, as well as usage and backward-compatibility, achieving better integration with lazy collections and java streams and cleaner API separation between mutable and immutable collections, improving performance, and, last but not least, minimizing the migration effort from Scala 2.12's collections.

As a result, we have a library that is mostly source-compatible with the previous version, has many old methods and types (such as Traversable, TraversableOnce, and Stream) deprecated, and has a simpler internal hierarchy.

This book assumes that the reader has a rudimentary understanding of Scala collections. With this assumption, the next section will take a holistic approach and focus on giving a consistent overview of the new collection framework.

The next diagram represents the top-level hierarchy of the collection library:

Here and further, we will pretend to always have import scala.collections._ in the scope and use the following colour encoding in our diagrams:

Each of the traits describes the structure, the essence of the collection. As the name suggests, IterableOnce can be iterated over only one time. Iterable softens this constraint so that it is possible to iterate over the collection multiple times. Seq adds a notion of succession to the elements of the collection, Set adds a constraint of the uniqueness of its elements, and Map changes the type of the collection from a single element, A, to a pair of key, K, and value, V.

As mentioned, in the spirit of the separation of concerns, these traits cover only the structural characteristics. The operations defined for the specific type are placed in the helper traits carrying the Ops suffix in the name. These traits form a hierarchical structure similar to the previous one, as shown here:

Where the "normal" traits had only one type parameter, the type of the element, the Ops have three of them. In addition to the type of the element, A, the C type describes the specific representation type of the collection this trait is mixed into and thus to the return type of the first-order methods defined on this collection. The CC type refers to the representation type that can be returned by the higher-order methods, or the type constructor. We will see later in this chapter how this works in practice.

Because the inheritance tree is structured as it is, IterableOps and IterableOnceOps are effectively mixed into every collection implementation in the library. Three traits on the bottom just add some more methods, unique to the specific collection type, and override some of the definitions for efficiency. Both Iterable*Ops traits define more than a hundred of methods and they are the reason the Scala collection library is very consistent and homogenous.

Because of the importance of IterableOnceOps and IterableOps, we will take a detailed look at them in the next section. After that, we will explore the unique features of the specialized collections.

IterableOnceOps

IterableOnceOps represents a blueprint for the collections that can be traversed one or multiple times. It defines a few abstract methods that must be implemented by every collection and a number of concrete methods implemented in terms of an iterator available from IterableOnce. The concrete methods provide default, if possible, lazy, implementations and fall into one of the following categories:

  • Size operations: isEmpty, nonEmpty, size, knownSize, and isTraversableAgain check the collection for (non) emptiness or return its size. knownSize is an optimization that returns -1 if the size cannot be determined without iterating over the collection. isTraversableAgain returns false for IterableOnce.
  • Element tests: forall, exists, and count check whether all, at least one, or some number of elements satisfy the given predicate.
  • String operations: mkString and addString. These methods with different argument sets provide a possibility to build alternative string representation.
  • Conversions to another collections: copyToArray, toList, toMap, to, toSet, toSeq, toIndexedSeq, toBuffer, and toArray. These methods copy or convert an Iterable into another collection. The to method is special in this list because it allows us to return any type of the collection that has a Factory available. We will look at it in more detail soon.
  • Fold and reduce: foldLeft, foldRight, reduce, reduceLeft, reduceRight, reduceOption, reduceLeftOption, and reduceRightOption apply a binary operation to the elements of the collection. The reduce*Option methods handle the case of an empty collection gracefully by returning None.
  • Numeric combinations: sum and product calculate the sum or product of the elements if there is an implicit Numeric[B] such that B >: A is available.
  • Ordering combinations: min, minOption, max, maxOption, maxBy, maxByOption, minBy, and minByOption find an element of the collection that satisfies giving predicate if there is an implicit Ordering[B] with B >: A available. The *Option methods return None for an empty collection instead of throwing an exception.
  • Element retrieval: collectFirst and find. Choose an element that satisfies a given condition.
  • Equality: corresponds is an alternative way to compare collections. Satisfied if every element of this collection relates to matching element of another collection by given predicate.

Abstract methods fall into one of the following categories:

  • Subcollection retrieval: filter, filterNot, take, takeWhile, drop, dropWhile, slice, and span. Take or discard elements that satisfy the given predicate or range, from the whole collection or beginning of it.
  • Mapping: map, flatMap, collect, and scanLeft. Transforms elements of the collection by applying some function and possibly filtering the results.
  • Zippers: zipWithIndex adds an index to all elements of the collection.

IterableOps

IterableOps extends IterableOnceOps and contains methods that is impossible to implement without a possibility to iterate over the collection multiple times.

They fall into the following categories:

  • Element retrieval: head, headOption, last, and lastOption return the first or last element of the collection throwing NoSuchElementException or returning None for an empty collection.
  • Size: sizeCompare is an optimization that allows us to efficiently compare the size of the collection with given value.
  • Subcollection retrieval: partition, partitionWith, splitAt, takeRight, dropRight, grouped, sliding, tail, init, groupBy, groupMap, groupMapReduce, tails, and inits. These split the collection as defined by some predicate or index, take or drop elements from the end, group elements by some criteria or predicate possibly applying transformative function, and discard first or last elements.
  • Mapping: scanRight produces a collection containing the cumulative results of applying the giving function starting from the end of the collection.
  • Addition: concat, ++ returns another collection containing all elements of this collection and a collection provided as an argument.
  • Zippers: zip, zipAll, unzip, and unzip3 combine the elements of the collection with the elements of another collection into a product, or split them into separate collections.
  • Transformation: transpose transforms the collection of collections by turning rows into columns and vice versa.

The following methods that defined the abstract in IterableOnceOps got a concrete default implementation in IterableOps: filter, filterNot, take, takeWhile, span, drop, dropWile, slice, scanLeft, map, flatMap, flatten, collect, and zipWithIndex. isTraversableAgain is overriden to return true.

It's worth noting that Iterable and IterableOnce do not support a general-equality operation, it is defined on specific collection subtypes. Because of this, it is impossible to compare these types directly using the equality operation, as the following example suggests:

scala> Set(1,2,3) == Seq(1,2,3)
res4: Boolean = false

Also there are three special methods that deserve our additional attention because they introduce types we haven't met yet:

  • def withFilter(p: A => Boolean): collection.WithFilter[A, CC]
  • def iterableFactory: IterableFactory[CC]
  • def view: View[A]

Let's discuss them quickly before moving on to more specific collection types.

WithFilter

WithFilter is a template class that contains the map, flatMap, foreach, and withFilter methods of Iterable. It allows us to specialize mapping and filtering operations for distinguished collections.

Because of its technical nature, we won't go into further details here.

IterableFactory

trait IterableFactory[+CC[_]] is a base trait for companion objects for specific collections which provides a number of operations to create a specific collection with the type specified by the CC type constructor; this is sometimes called target-type driven building because the type of the source collection is ignored. Most of the companion objects in the collection library extend this trait, which makes it possible to use them in places where IterableFactory is expected.

As this is the main abstraction that allows to build a collection from scratch, it is useful to know which methods it supplies. All of them return CC[A]. The following table contains a short summary:

def from[A](source: IterableOnce[A]) Creates a target collection from existing IterableOnce.
def empty[A]: CC[A] An empty collection, often defined as an object.
def apply[A](elems: A*): CC[A] Creates a collection from the elems given as var-arg.
def iterate[A](start: A, len: Int)(f: A => A): CC[A] Fills the collection with values taken from the result of the application of f on start, then on produced values and so on, len number of times.
def range[A : Integral](start: A, end: A, step: A): CC[A] Collection containing increasing integers [start, end-1] with difference between successive numbers of step. There is also a version of this method with default value of step = 1.
def fill[A](n: Int)(elem: => A): CC[A] Fills the collection with n evaluations of elem. There are variations of this function that go up to five dimensions.
def tabulate[A](n: Int)(f: Int => A): CC[A] The same as fill, but using index as an argument for the evaluation. Similarly, there are variations of this function that go up to five dimensions.
def concat[A](xss: Iterable[A]*): CC[A] Concatenates all argument collections into a single collection.
def unfold[A, S](init: S)(f: S => Option[(A, S)]): CC[A] Calls f to produce the element of the collection using and modifying the internal state starting with the init state.

Arguably, IterableFactory provides a lot of different possibilities to create a collection of a desired type.

View

View has been reimplemented in the new version of the library. Now it represents a reified Iterator operations.

Reification is the process by which an abstract idea about a computer program is turned into an explicit data model or other object created in a programming language (https://en.wikipedia.org/wiki/Reification_(computer_science)).

This means that the Iterator methods are represented as a subclasses of View and encapsulate transformations to apply. The evaluation happens at the moment the view is converted to the strict collection type, or traversed, for example using the foreach method. Views don't remember the type of the source collection. This can be demonstrated by the following example. First, we define a generic transformation that might be strict or lazy, depending on the type of the collection given as an argument:

def transform[C <: Iterable[Char]](i: C): Iterable[Char] = i 
map { c => print(s"-$c-"); c.toUpper }
take { println("\ntake"); 6 }

Next, for each transformation step, we print out its result in the console at the moment the step happens. Now we can compare lazy and strict collection behaviors:

val str = "Scala 2.13"
val view: StringView = StringView(str)
val transformed = transform(view) // A
val strict = transform(str.toList) // B
print("Lazy view constructed: ")
transformed.foreach(print) // C
print("\nLazy view forced: ")
println(transformed.to(List)) // D
println(s"Strict: $strict") // E

This snippet produces the following output in the REPL:

take
-S--c--a--l--a-- --2--.--1--3-
take
Lazy view constructed: -S-S-c-C-a-A-l-L-a-A- -
Lazy view forced: -S--c--a--l--a-- -List(S, C, A, L, A, )
Strict: List(S, C, A, L, A, )

In the first line, we can see that the take method is always evaluated strictly regardless of the underlying collection typethis is commented as A in the preceding code. The second and third lines show the strict evaluation for List[Char], line B in the code. Lines 4 and 5 demonstrate that View[Char] is then evaluated twice, each time at the moment it is forced, once by calling foreach (line C) and once by converting it to the List (line D). Also interesting is that map is only applied to the results of the take method even given the fact that map is the first transformation step in the chain.

Set

Set is a base trait for collections that have a notion of uniques of the elements. It is defined as trait Set[A] extends Iterable[A] with SetOps[A, Set, Set[A]] with Equals. We can see that it is effectively an Iterable that has some additional operations defined in SetOps and adds a notion of equality among sets. The sub-hierarchy of Set is represented in the following diagram:

The previously mentioned SetOps adds a few methods on top of IterableOps. These methods are shown here:

  • Element retrieval: contains and apply return true if this set contains a given element.
  • Equality: subsetOf and subsets check whether this set is a subset of another set, or return all subsets of this set, possibly with a given size.
  • Combinations with another set: intersect, &, diff, &~, concat, ++, union, and |. These methods compute the intersection, difference, or union of this and another set.

There are few classes in the hierarchy that augment Set with further properties:

  • SortedSet extends Set with SortedOps[A, +C] and has two immutable and two mutable implementations—two TreeSets and two BitSets. SortedOps embodies following methods that depend on the notion of Ordering:
  • Key retrieval: firstKey and lastKey return the first or last element of this collection.
  • Subcollection retrieval: range, rangeFrom, rangeUntil, and rangeTo create a ranged projection of this collection, satisfying given criteria.

Because of the overloading, SortedSet has many overloaded methods defined twice, with and without ordering. If an operation is intended to be applied to the underlying unsorted Set, the type has to be coerced:

scala> import scala.collection.SortedSet
import scala.collection.SortedSet
scala> val set = SortedSet(1,2,3)
set: scala.collection.SortedSet[Int] = TreeSet(1, 2, 3)
scala> val ordered = set.map(math.abs)
ordered: scala.collection.SortedSet[Int] = TreeSet(1, 2, 3)
scala> val unordered = set.to(Set).map(math.abs)
unordered: scala.collection.immutable.Set[Int] = Set(1, 2, 3)

Please note that direct type-ascription will not work in the case of Set because its definition is invariant:

scala> val set1: Set[Int] = SortedSet(1,2,3)
^
error: type mismatch;
found : scala.collection.SortedSet[Int]
required: Set[Int]

The invariance of Set is related to the fact that Set[A] extends a function, A => Boolean, that returns true if the set contains a given element. Thus, sets can be used in places where such one argument function is expected:

scala> ordered.forall(set)
res3: Boolean = true

There are four more specific set implementations in addition to TreeSet and BitSet:

  • ListSet implements immutable sets using a list-based data structure.
  • The immutable HashSet realizes immutable sets using a Compressed Hash-Array Mapped Prefix-tree (CHAMP)
  • The mutable HashSet and LinkedHashSet implement mutable sets using a hash table, storing the data unordered and ordered, respectively.

The sets are closely related to Map, which represents a collection with elements represented as a pair of keys and values.

Map

Map is defined as trait Map[K, +V] extends Iterable[(K, V)] with MapOps[K, V, Map, Map[K, V]] with Equals which makes it an Iterable over pairs of key, K, and value, V. It also defines a notion of equality among maps. The class hierarchy for Map is represented on the following diagram:

There are three different MapOps, one general for both mutable and immutable, and one specific for each of these forms.

MapOps extends IterableOps with the following specific operations:

  • Element retrieval: get, getOrElse, apply, applyOrElse, default, contains, and isDefinedAt. These methods allow us to retrieve a value or check whether a value is present by a given key, optionally returning a default value or throwing an exception if the key can't be found.
  • Subcollection retrieval: keySet, keys, values, keysIterator, and valuesIterator allow us to get a keys or values in different forms.
  • Mapping: map, flatMap, and collect are transforming and optionally filtering the pairs of keys and values.
  • Addition: concat returns a new collection with elements of both maps combined.

immutable.MapOps adds the following methods on top of MapOps:

  • Element removal: remove, removeAll, -- removes one or all given elements from the map returning new map.
  • Element updates: updated and + update an element with the given key returning new map.
  • Mapping: transform applies a given function to all elements, producing a new map with the returned results as values.

mutable.MapOps has a different set of methods as compared to the mutable one:

  • Element addition: put adds a new value or updates existing one.
  • Element update: updated, +, and getOrElseUpdate updates a value in place.
  • Element removal: remove and clear remove one or all elements of the map.
  • Filtering: filterInPlace retains only mappings that satisfy the predicate.
  • Mapping: mapValuesInPlace applies a transformation to the values of the map, storing returned results as values.

The general Map definition has quite a few specialized subtypes, as shown in the preceding diagram. We will take a quick look at them now.

SortedMap

SortedMap is similar to SortedSet. It has a two implementations, a mutable and immutable TreeMap, and provides a few methods defined in terms of SortedOps such as:

  • Subcollection retrieval: iteratorFrom, keysIteratorFrom, valuesIteratorFrom, and rangeTo give us a way to get elements of the map as an iterator.
  • Element retrieval: firstKey, lastKey, minAfter, and maxBefore allow us to retrieve an element that satisfies some ordering condition.

HashMap

HashMap is also available in two flavors—immutable and mutable.

The immutable HashMap is implemented using a CHAMP tree.

The mutable HashMap implements mutable maps using a hashtable. A hash table stores its elements in an array. The hash code of the item is used to calculate the position in the array.

MultiMap

MultiMap is a trait for mutable maps that have multiple values assigned to a key.

It defines the addBinding, removeBinding and entryExists methods, which can be used to query or manipulate entries for a key.

SeqMap

SeqMap is a generic abstraction for ordered immutable maps. SeqMap itself exists in mutable and immutable forms. These forms have few different implementations:

  • The immutable ListMap implements immutable maps using a list-based data structure. The methods traversing ListMap visit its elements in the order they were inserted.
  • The mutable ListMap is a simple mutable map backed by a list. It preserves insertion order as its immutable sibling does.
  • VectorMap exists only in immutable form. It is implemented using a vector/map-based data structure and preserves insertion order. It has constant lookup but slower other operations.
  • LinkedHashMap is a mutable map whose implementation is based on a hashtable and preserves the insertion order if iterated over.

Seq

Seq is probably the most ubiquitous collection in the library. Like Map, it has a notion of succession of elements and the elements have indices. It is defined as trait Seq[+A] extends Iterable[A] with PartialFunction[Int, A] with SeqOps[A, Seq, Seq[A]] with Equals. Also similar to map, Seq specifies support for the equality relation and also extends PartialFunction, which accepts an index of the element as a parameter. As there are a lots of classes implementing Seq, we will take a gradual approach and look at them level by level. The direct children of Seq are shown in the following diagram:

scala.Seq, known from previous Scala versions, is now replaced by scala.collection.immutable.Seq.

As with other collections, SeqOps extend IterableOps by adding quite a few methods:

  • Element retrieval: apply retrieves an element with a given index.
  • Indexing and search: segmentLength, isDefinedAt, indexWhere, indexOf, lastIndexOf, lastIndexWhere, indexOfSlice, lastIndexOfSlice, containsSlice, contains, startsWith, endsWith, indices, and search. These methods allow us to retrieve information about the presence or indexes of elements or subsequences given some predicate or element value.
  • Size: length and lengthCompare provide efficient operations to retrieve the length of the collection.
  • Addition: prepend, +, appended, :+, prependAll, ++, appendedAll, :++, concat, and union. Can be used to append or prepend one or multiple elements to the collection.
  • Filtering: distinct and distinctBy remove duplicates, possibly given some predicate.
  • Reversal: reverse and reverseIterator return a new collection with elements in the reversed order.
  • Sorting: sorted, sortWith, and sortBy sort this collection by some implicit Ordering, or by a given function, or both.
  • Equality: sameElements and corresponds check whether this collection contains the same elements in the same order as given, using equality-checking or the provided comparison function.
  • Subcollection retrieval: permutations and combinations. These methods allow us to retrieve a subcollection(s) that satisfies given conditions.
  • Updates: diff, intersect, patch, updated, and update (mutable). Modify elements of this collection using another collection or element and returning another collection (except the last method defined on mutable.Seq, which update elements in place).

Each of the Seq direct descendants has its own specific properties and a subtree of implementations. We'll breeze through them now.

IndexedSeq

IndexedSeq does not introduce new operations, at least not in its immutable incarnation, but overrides a lots of methods defined in SeqOps to provide more efficient implementations. There are four classes implementing it:

The mutable. IndexedSeq adds the following mutation options: mapInPlace, sortInPlace, sortInPlaceWith, and sortInPlaceBy.

The mutable. ArraySeq is a collection representing an Array. It defines an array method returning an underlying array, and an elemTag method that returns the tag of the element type needed to properly support different kinds of arrays as required by the JVM. Because of this requirement, it has separate implementations for all primitive types including numeric types (in addition to ofByte, there are implementations for all other numeric primitives, not shown on the diagram) and Boolean, AnyRef and Unit.

The immutable. ArraySeq was added in the version 2.13. It is an effectively an immutable sequence that wraps an array and is used to pass varargs parameters. It has the same descendants as its mutable cousin.

Range is an immutable structure that contains integer values. It is defined by start, end, and a step. There are two additional methods available: isInclusive, which is true for Range.Inclusive and false for Range.Exclusive, and by, which creates the new range with a different step but the same start and end.

Vector is an immutable structure that provides constant-time access and updates and fast append and prepend. Because of this, Vector is the default implementation for IndexedSeq, as demonstrated in the following snippet:

scala> IndexedSeq.fill(2)("A")
res6: IndexedSeq[String] = Vector(A, A)

WrappedString is an immutable wrapper over some String. It extends strings with all of the operations defined in IndexedSeqOps.

LinearSeq

Linear sequences have the notion of a head and a tail. The definition looks like trait LinearSeq[+A] extends Seq[A] with LinearSeqOps[A, LinearSeq, LinearSeq[A]] and the class diagram is shown here:

There are three representatives of LinearSeq, all are immutable:

  • List defines three symbolic methods which provide a nice syntax for pattern-matching and building lists. :: prepends element to the list, ::: prepends all elements of a given list, and reverse_::: prepends all elements of a given list but in the reverse order.
  • LazyList is a new immutable collection available since Scala 2.13. It implements a list with a head and a tail, which are not evaluated until needed. As it is superior to Stream, which is only lazy in the tail, Stream is now deprecated. LazyList has two additional methods, force which evaluates it, and lazyAppendAll which lazily appends a given collection to this list.
  • Queue in this hierarchy is also immutable. It allows a first-in-first-out (FIFO) insertion and retrieval of the elements. For this functionality, it defines the enqueue, enqueueAll, dequeue, dequeueOption, and front methods.

Buffers

Buffers conclude our rush through the collection library. In essence, Buffer is just a Seq that can grow and shrink. This sub-hierarchy exists only in immutable form, though IndexedBuffer inherits from both Buffer and IndexedSeq, as shown by the next diagram:


Let's take a look at the methods that these collections define, in addition to the definitions inherited from SeqOps:

  • Buffer defines methods to add or remove one or more elements in place or returning new buffer: prepend, append, appendAll, +=:, prependAll, ++=:, insert, insertAll, remove, subtractOne, trimStart, trimEnd, patchInPlace, dropInPlace, dropRightInPlace, takeRightInPlace, sliceInPlace, dropWhileInPlace, takeWhileInPlace, and padToInPlace.
  • ListBuffer is a concrete buffer implementation baked by a List. In addition to other discussed methods, it provides prependToList, which allows us to prepend this collection to another list and a triple of mapInPlace, flatMapInPlace, and filterInPlace, giving us a possibility to modify elements in place.
  • Take a Buffer, add an IndexedSeq, and you'll get an IndexedBuffer. Similar to ListBuffer, it provides the flatMapInPlace and filterInPlace methods.
  • ArrayBuffer is a concrete implementation of IndexedBuffer that uses an array to store its elements and has a constant time for append, update, and random access. It has a sizeHint method, which can be used to enlarge the underlying array. It is a default implementation instantiated if mutable.Seq is created.
  • ArrayDeque is another efficient collection that emerged in the 2.13 version. It implements a double-ended queue that internally uses a resizable circular buffer. This allows to have a constant time for append, prepend, remove first, remove last, and random-access operations. There are lots of additional methods available on this collection, mostly because of the notion of the second-end: removeHeadOption, removeHead, removeLastOption, removeLast, removeAll, removeAllReverse, removeHeadWhile, removeLastWhile, removeFirst, removeAll, clearAndShrink, copySliceToArray, and trimToSize.
  • The Queue in this hierarchy is mutable. It is based on ArrayDeque and allows us to insert and retrieve elements in the FIFO manner. The following methods are available for that: enqueue, enqueueAll, dequeue, dequeueFirst, dequeueAll, and dequeueWhile.
  • The Stack is similar to Queue, but it implements in last-in-first-out (LIFO) order instead of FIFO. The methods it defines are formulated in the corresponding terms: push, pushAll, pop, popAll, popWhile, and top.

Scala Collection Contrib library

Needless to say, the standard Scala collection library is very rich and provides collections for most of the common use cases. But of course, there are some other structures that might be useful in a number of specific situations. The Scala Collection Contrib module is Scala's way of having both a stable standard library and some extra features. In a sense, this module is an incubator for the new collection types; types that prove to be useful for a broad audience will presumably be incorporated into the standard library in further Scala versions.

Currently, there are four collection types available in the module, each both mutable and immutable:

  • MultiDict
  • SortedMultiDict
  • MultiSet
  • SortedMultiSet

Also this library provides a possibility to define additional operations on existing collections via an implicit enrichment. The following import is required to make it available:

import scala.collection.decorators._

And it delivers these methods:

  • On Seq: intersperse and replaced
  • On Map: zipByKey, join, zipByKeyWith, mergeByKey, mergeByKeyWith, fullOuterJoin, leftOuterJoin, and rightOUterJoing

Please consult the module documentation at https://github.com/scala/scala-collection-contrib for further details.