Much of the power of Spark SQL comes from the Catalyst optimizer, so it is worth spending some time understanding it.
The Catalyst optimizer primarily leverages functional programming constructs of Scala, such as pattern matching. It offers a general framework for transforming trees, which Spark SQL uses to perform analysis, optimization, physical planning, and runtime code generation.
The Catalyst optimizer has two primary goals:
Make adding new optimization techniques easy
Enable external developers to extend the optimizer
Spark SQL uses Catalyst's transformation framework in four phases:
Analyzing a logical plan to resolve references
Logical plan optimization
Physical planning
Code generation to compile the parts of the query to Java bytecode