Building Big Data Pipelines with Apache Beam

By: Jan Lukavský

Overview of this book

Apache Beam is an open source unified programming model for implementing and executing data processing pipelines, including Extract, Transform, and Load (ETL), batch, and stream processing. This book will help you to confidently build data processing pipelines with Apache Beam. You’ll start with an overview of Apache Beam and understand how to use it to implement basic pipelines. You’ll also learn how to test and run the pipelines efficiently. As you progress, you’ll explore how to structure your code for reusability and also use various Domain Specific Languages (DSLs). Later chapters will show you how to use schemas and query your data using (streaming) SQL. Finally, you’ll understand advanced Apache Beam concepts, such as implementing your own I/O connectors. By the end of this book, you’ll have gained a deep understanding of the Apache Beam model and be able to apply it to solve problems.
Table of Contents (13 chapters)
Section 1 Apache Beam: Essentials
Section 2 Apache Beam: Toward Improving Usability
Section 3 Apache Beam: Advanced Concepts

Explaining the differences between classic and portable runners

The description in the previous section – Describing the anatomy of an Apache Beam runner – applies to both classic and portable runners. However, there are some important differences between the two.

A classic runner is implemented in the same programming language as the Beam SDK it serves, and it can run pipelines written with that SDK only. An example is the classic FlinkRunner: it runs on Apache Flink, whose native API is implemented in Java, and it can execute Beam pipelines written with the Java SDK. We used this runner throughout the first five chapters of this book.

A portable runner is implemented using the portability layer and as a result, it can be used to execute pipelines written in any SDK that is supported by Beam. However, this flexibility comes at a price – a portable runner implemented in Java and running a pipeline implemented...
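The compatibility rule that separates the two kinds of runner can be sketched as a toy model in plain Java. This is only an illustration under stated assumptions: the `Sdk` enum, the `Runner` interface, and the `classicRunner` and `portableRunner` helpers are hypothetical names invented for this sketch and are not part of the Apache Beam API. The sketch also deliberately leaves out the cost of portability mentioned above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Toy model only: none of these types exist in Apache Beam. */
public class RunnerCompatibilitySketch {

    /** Hypothetical stand-in for the SDK a pipeline is written in. */
    public enum Sdk { JAVA, PYTHON, GO }

    /** Hypothetical stand-in for a runner; real Beam runners expose no such method. */
    public interface Runner {
        boolean canExecute(Sdk pipelineSdk);
    }

    /** A classic runner executes only pipelines written in its own SDK. */
    public static Runner classicRunner(Sdk runnerSdk) {
        return pipelineSdk -> pipelineSdk == runnerSdk;
    }

    /** A portable runner executes pipelines from any SDK in the supported set. */
    public static Runner portableRunner(Set<Sdk> supportedSdks) {
        return supportedSdks::contains;
    }

    public static void main(String[] args) {
        // Models the classic FlinkRunner, which is tied to the Java SDK.
        Runner classicFlink = classicRunner(Sdk.JAVA);
        // Models a portable runner that accepts every SDK in this toy enum.
        Runner portable = portableRunner(EnumSet.allOf(Sdk.class));

        for (Sdk sdk : Sdk.values()) {
            System.out.println(sdk
                + ": classic=" + classicFlink.canExecute(sdk)
                + ", portable=" + portable.canExecute(sdk));
        }
    }
}
```

In this model, the classic runner accepts only the Java pipeline, while the portable runner accepts all three. In a real deployment, the choice of runner is made through pipeline options rather than through types like these.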