PySpark API
We have been using the PySpark API throughout the previous sections when describing the features of Azure Databricks, without discussing its functionality in much depth or how we can leverage it to build reliable ETL operations on big data. PySpark is the Python API for Apache Spark, the cluster-computing framework at the heart of Azure Databricks.
Main functionalities of PySpark
PySpark allows you to harness the power of distributed computing with the ease of use of Python, and it is the default way in which we express our computations throughout this book unless stated otherwise.
The fundamentals of PySpark lie in the functionality of its subpackages, of which the most central are the following: