Data Engineering with Scala and Spark

By: Eric Tome, Rupam Bhattacharjee, David Radford
Overview of this book

Most data engineers know that performance problems in a distributed computing environment can quickly undermine the efficiency and effectiveness of data engineering tasks. While Python remains a popular choice for data engineering due to its ease of use, Scala shines in scenarios where the performance of distributed data processing is paramount. This book will teach you how to leverage the Scala programming language on the Spark framework and use the latest cloud technologies to build continuous and triggered data pipelines. You’ll do this by setting up a data engineering environment for local development and scalable distributed cloud deployments using data engineering best practices, test-driven development, and CI/CD. You’ll also get to grips with the DataFrame API, the Dataset API, and the Spark SQL API, and learn how to use them. Data profiling and quality in Scala will also be covered, alongside techniques for orchestrating and performance tuning your end-to-end pipelines to deliver data to your end users. By the end of this book, you will be able to build streaming and batch data pipelines using Scala while following software engineering best practices.
Table of Contents (21 chapters)

  • Part 1 – Introduction to Data Engineering, Scala, and an Environment Setup
  • Part 2 – Data Ingestion, Transformation, Cleansing, and Profiling Using Scala and Spark
  • Part 3 – Software Engineering Best Practices for Data Engineering in Scala
  • Part 4 – Productionalizing Data Engineering Pipelines – Orchestration and Tuning
  • Part 5 – End-to-End Data Pipelines

Variance

As mentioned earlier, functions are first-class objects in Scala. Scala automatically converts function literals into objects of the FunctionN type (N = 0 to 22). For example, consider the following anonymous function:

val f: Int => Any = (x: Int) => x

Example 1.45

This function will be converted automatically to the following:

val f = new Function1[Int, Any] {def apply(x: Int) = x}

Example 1.46

Please note that the preceding syntax represents an object of an anonymous class that extends Function1[Int, Any] and implements its abstract apply method. In other words, it is equivalent to the following:

class AnonymousClass extends Function1[Int, Any] {
  def apply(x: Int): Any = x
}
val f = new AnonymousClass

Example 1.47
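
The equivalence described above can be checked with a small sketch. The object name FunctionLiteralSketch and the value names are hypothetical and chosen only for illustration; the sketch assumes nothing beyond the standard library's Function1 trait.

```scala
object FunctionLiteralSketch {
  // A function literal, as in Example 1.45.
  val literal: Int => Any = (x: Int) => x

  // The same function written as an anonymous Function1 instance,
  // with the return type of apply spelled out.
  val explicit: Function1[Int, Any] = new Function1[Int, Any] {
    def apply(x: Int): Any = x
  }

  // Int => Any is syntactic sugar for Function1[Int, Any],
  // so this assignment needs no conversion.
  val sameType: Function1[Int, Any] = literal
}
```

Calling literal(3) and explicit.apply(3) should give the same result, since literal(3) is itself shorthand for literal.apply(3).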

If we refer to the type signature of the Function1 trait, we would see the following:

Function1[-T1, +R]

Example 1.48

T1 represents the argument type and R represents the return type. Function1 is contravariant in T1 and covariant in R. In general, covariance, denoted by +, means that if a class or trait is covariant in its type parameter T, that is, C[+T], then C[A] and C[B] follow the same subtyping relationship as A and B. For example, since Any is a supertype of Int, C[Any] will be a supertype of C[Int].

The order is reversed for contravariance. So, if we have C[-T], then C[Int] will be a supertype of C[Any].
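
As a rough illustration of both rules, the following sketch defines one covariant and one contravariant type. Box and Printer are hypothetical types invented for this example and are not part of the Scala library; the assignments compile only because of the variance annotations.

```scala
object VarianceSketch {
  // Hypothetical covariant container: it only produces values of type T.
  final case class Box[+T](value: T)

  // Hypothetical contravariant consumer: it only accepts values of type T.
  trait Printer[-T] {
    def show(x: T): String
  }

  val intBox: Box[Int] = Box(42)
  // Covariance: Box[Int] is a subtype of Box[Any], just as Int is a subtype of Any.
  val anyBox: Box[Any] = intBox

  val anyPrinter: Printer[Any] = new Printer[Any] {
    def show(x: Any): String = s"value: $x"
  }
  // Contravariance: the order is reversed, so Printer[Any] is a subtype of Printer[Int].
  val intPrinter: Printer[Int] = anyPrinter
}
```

Removing the + or - annotation from either definition should make the corresponding assignment fail to compile, because type parameters are invariant by default.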

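Since we have Function1[-T1, +R], the type Function1[Int, Any] is a supertype of, say, Function1[Any, String]: the parameter type widens from Int to Any (contravariance), while the return type narrows from Any to String (covariance).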
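
One way to convince yourself of this relationship is a plain assignment, sketched below. The names FunctionSubtypingSketch, describe, and widened are hypothetical.

```scala
object FunctionSubtypingSketch {
  // A function that accepts anything and returns a String.
  val describe: Any => String = (x: Any) => s"<$x>"

  // Compiles without a cast: Any => String is a subtype of Int => Any,
  // so it can be used wherever an Int => Any is expected.
  val widened: Int => Any = describe

  // The reverse direction is rejected by the compiler (type mismatch):
  // val narrowed: Any => String = (x: Int) => x
}
```

The assignment is safe because any Int passed to widened is also an Any that describe can accept, and the String it returns is also an Any.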

To see it in action, let’s define a method that takes a function of type Int => Any and returns Unit:

def caller(op: Int => Any): Unit = List
  .tabulate(5)(i => i + 1)
  .foreach(i => print(s"${op(i)} "))

Example 1.49

Let’s now define two functions:

scala> val f1: Int => Any = (x: Int) => x
f1: Int => Any = $Lambda$9151/1234201645@34f561c8
scala> val f2 : Any => String = (x: Any) => x.toString
f2: Any => String = $Lambda$9152/1734317897@699fe6f6

Example 1.50

A function (or method) with a parameter of type T can be invoked with an argument that is either of type T or one of its subtypes. Since Int => Any is a supertype of Any => String, both functions are valid arguments to caller: f1 matches the parameter type exactly, and f2 is a subtype of it. As can be seen, both of them indeed work:

scala> caller(f1)
1 2 3 4 5
scala> caller(f2)
1 2 3 4 5

Example 1.51
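
To make the effect of each argument observable, here is a minimal sketch of a variant that returns the results instead of printing them. The names CallerSketch, collect, twice, and label are hypothetical and do not appear in the book.

```scala
object CallerSketch {
  // Hypothetical variant of caller: applies op to 1, 2, 3 and returns the results.
  def collect(op: Int => Any): List[Any] =
    List.tabulate(3)(i => i + 1).map(op)

  // Same parameter type, narrower return type (Int instead of Any).
  val twice: Int => Int = (x: Int) => x * 2

  // Wider parameter type (Any) and narrower return type (String).
  val label: Any => String = (x: Any) => s"item-$x"

  // Both are subtypes of Int => Any, so both calls compile.
  val doubled: List[Any] = collect(twice)
  val labelled: List[Any] = collect(label)
}
```

Under these assumptions, doubled should hold 2, 4, 6 and labelled should hold item-1, item-2, item-3.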
