Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Data Engineering with Scala and Spark
  • Table Of Contents Toc
Data Engineering with Scala and Spark

Data Engineering with Scala and Spark

By : Eric Tome, Rupam Bhattacharjee, David Radford
4.2 (5)
close
close
Data Engineering with Scala and Spark

Data Engineering with Scala and Spark

4.2 (5)
By: Eric Tome, Rupam Bhattacharjee, David Radford

Overview of this book

Most data engineers know that performance issues in a distributed computing environment can easily lead to issues impacting the overall efficiency and effectiveness of data engineering tasks. While Python remains a popular choice for data engineering due to its ease of use, Scala shines in scenarios where the performance of distributed data processing is paramount. This book will teach you how to leverage the Scala programming language on the Spark framework and use the latest cloud technologies to build continuous and triggered data pipelines. You’ll do this by setting up a data engineering environment for local development and scalable distributed cloud deployments using data engineering best practices, test-driven development, and CI/CD. You’ll also get to grips with DataFrame API, Dataset API, and Spark SQL API and its use. Data profiling and quality in Scala will also be covered, alongside techniques for orchestrating and performance tuning your end-to-end pipelines to deliver data to your end users. By the end of this book, you will be able to build streaming and batch data pipelines using Scala while following software engineering best practices.
Table of Contents (21 chapters)
close
close
1
Part 1 – Introduction to Data Engineering, Scala, and an Environment Setup
4
Part 2 – Data Ingestion, Transformation, Cleansing, and Profiling Using Scala and Spark
10
Part 3 – Software Engineering Best Practices for Data Engineering in Scala
13
Part 4 – Productionalizing Data Engineering Pipelines – Orchestration and Tuning
16
Part 5 – End-to-End Data Pipelines

Understanding functional programming

Functional programming is based on the principle that programs are constructed using only pure functions. A pure function does not have any side effects and only returns a result. Some examples of side effects are modifying a variable, modifying a data structure in place, and performing I/O. We can think of a pure function as just like a regular algebraic function.

An example of a pure function is the length function on a string object. It only returns the length of the string and does nothing else, such as mutating a variable. Similarly, an integer addition function that takes two integers and returns an integer is a pure function.

Two important aspects of functional programming are referential transparency (RT) and the substitution model. An expression is referentially transparent if all of its occurrences can be substituted by the result of the expression without altering the meaning of the program.

In the following example, Example 1.1, we set x and then use it to set r1 and r2, both of which have the same value:

scala> val x: String = "hello"
x: String = hello
scala> val r1 = x + " world!"
r1: String = hello world!
scala> val r2 = x + " world!"
r2: String = hello world!

Example 1.1

Now, if we replace x with the expression referenced by x, r1 and r2 will be the same. In other words, the expression hello is referentially transparent.

Example 1.2 shows the output from a Scala interpreter:

scala> val r1 = "hello" + " world!"
r1: String = hello world!
scala> val r2 = "hello" + " world!"
r2: String = hello world!

Example 1.2

Let’s now look at the following example, Example 1.3, where x is an instance of StringBuilder instead of String:

scala> val x = new StringBuilder("who")
x: StringBuilder = who
scala> val y = x.append(" am i?")
y: StringBuilder = who am i?
scala> val r1 = y.toString
r1: String = who am i?
scala> val r2 = y.toString
r2: String = who am i?

Example 1.3

If we substitute y with the expression it refers to (val y = x.append(" am i?")), r1 and r2 will no longer be equal:

scala> val x = new StringBuilder("who")
x: StringBuilder = who
scala> val r1 = x.append(" am i?").toString
r1: String = who am i?
scala> val r2 = x.append(" am i?").toString
r2: String = who am i? am i?

Example 1.4

So, the expression x.append(" am i?") is not referentially transparent.

One of the advantages of the functional programming style is it allows you to apply local reasoning without having to worry about whether it updates any globally accessible mutable state. Also, since no variable in the global scope is updated, it considerably simplifies building a multi-threaded application.

Another advantage is pure functions are also easier to test as they do not depend on any state apart from the inputs supplied, and they generate the same output for the same input values.

We won’t delve deep into functional programming as it is outside of the scope of this book. Please refer to the Further reading section for additional material on functional programming. In the rest of this chapter, we will provide a high-level tour of some of the important language features that the subsequent chapters build upon.

In this section, we looked at a very high-level introduction to functional programming. Starting with the next section, we will look at Scala language features that enable both functional and object-oriented programming styles.

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Data Engineering with Scala and Spark
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon