Spark Cookbook

By: Rishi Yadav

Understanding the Catalyst optimizer


Most of the power of Spark SQL comes from the Catalyst optimizer, so it is worth spending some time understanding it.

How it works…

The Catalyst optimizer primarily leverages functional programming constructs of Scala, such as pattern matching. It offers a general framework for transforming trees, which Spark SQL uses to perform analysis, optimization, planning, and runtime code generation.
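To make the idea of transforming trees with pattern matching concrete, here is a minimal sketch in plain Scala. It does not use Spark at all: the `Expr`, `Literal`, `Attribute`, and `Add` types and the `transform` method are hypothetical stand-ins invented for this illustration, only loosely modeled on the kind of tree nodes Catalyst works with. The rule shown is a simple constant-folding rule.

```scala
object TreeSketch {
  // Toy expression tree. These types are illustrative assumptions,
  // not Catalyst's actual classes.
  sealed trait Expr {
    // Rebuild the tree bottom-up, applying `rule` wherever it is defined.
    def transform(rule: PartialFunction[Expr, Expr]): Expr = {
      val withNewChildren = this match {
        case Add(l, r) => Add(l.transform(rule), r.transform(rule))
        case leaf      => leaf
      }
      // Nodes the rule does not match are returned unchanged.
      rule.applyOrElse(withNewChildren, (e: Expr) => e)
    }
  }
  case class Literal(value: Int) extends Expr
  case class Attribute(name: String) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr

  // Constant folding expressed as a pattern match on the tree shape.
  val constantFold: PartialFunction[Expr, Expr] = {
    case Add(Literal(a), Literal(b)) => Literal(a + b)
  }

  // Represents the expression x + (1 + 2).
  val tree: Expr = Add(Attribute("x"), Add(Literal(1), Literal(2)))
  val folded: Expr = tree.transform(constantFold)

  def main(args: Array[String]): Unit =
    println(folded) // Add(Attribute(x),Literal(3))
}
```

The rule itself is a single `case` clause. It only describes the sub-tree shape it cares about, and the generic `transform` method takes care of walking the rest of the tree.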

The Catalyst optimizer has two primary goals:

  • Make adding new optimization techniques easy

  • Enable external developers to extend the optimizer
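Both goals can be illustrated with a small, Spark-free sketch in which an optimization is just a partial function over the tree, and extending the optimizer amounts to appending one more rule to a list. Everything here (`Rule`, `transformUp`, `optimize`, `builtIn`, `multiplyByOne`) is a hypothetical name chosen for this illustration and is not Catalyst's API. The idea of re-running the rules until the tree stops changing is only similar in spirit to how a rule-based optimizer reaches a fixed point.

```scala
object RuleSketch {
  // Toy expression tree (illustrative assumption, not Catalyst's classes).
  sealed trait Expr
  case class Literal(value: Int) extends Expr
  case class Attribute(name: String) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr
  case class Multiply(left: Expr, right: Expr) extends Expr

  // A rule is a partial function from a sub-tree to its replacement.
  type Rule = PartialFunction[Expr, Expr]

  // Apply one rule bottom-up to every node of the tree.
  def transformUp(e: Expr, rule: Rule): Expr = {
    val rebuilt = e match {
      case Add(l, r)      => Add(transformUp(l, rule), transformUp(r, rule))
      case Multiply(l, r) => Multiply(transformUp(l, rule), transformUp(r, rule))
      case leaf           => leaf
    }
    rule.applyOrElse(rebuilt, (x: Expr) => x)
  }

  // Run all rules in order, repeating until the tree stops changing
  // or the iteration budget is exhausted.
  def optimize(e: Expr, rules: Seq[Rule], maxIterations: Int = 100): Expr = {
    val next = rules.foldLeft(e)((tree, rule) => transformUp(tree, rule))
    if (next == e || maxIterations <= 1) next
    else optimize(next, rules, maxIterations - 1)
  }

  // A "built-in" rule shipped with the toy optimizer.
  val foldConstants: Rule = { case Add(Literal(a), Literal(b)) => Literal(a + b) }
  val builtIn: Seq[Rule] = Seq(foldConstants)

  // A rule written by an "external developer".
  val multiplyByOne: Rule = {
    case Multiply(x, Literal(1)) => x
    case Multiply(Literal(1), x) => x
  }

  def main(args: Array[String]): Unit = {
    // Represents the expression (x * 1) + (2 + 3).
    val plan = Add(Multiply(Attribute("x"), Literal(1)), Add(Literal(2), Literal(3)))
    println(optimize(plan, builtIn))                  // Add(Multiply(Attribute(x),Literal(1)),Literal(5))
    println(optimize(plan, builtIn :+ multiplyByOne)) // Add(Attribute(x),Literal(5))
  }
}
```

Adding the new optimization required no change to `optimize` or to the tree classes. The extra rule was simply appended to the existing sequence.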

Spark SQL uses Catalyst's transformation framework in four phases:

  • Analyzing a logical plan to resolve references

  • Logical plan optimization

  • Physical planning

  • Code generation to compile parts of the query to Java bytecode

Analysis

The analysis phase involves looking at a SQL query or a DataFrame and creating a logical plan from it. This plan is still unresolved (the columns it refers to may not exist or may be of the wrong data type), and it is then resolved using...