Chapter 6. Using Spark SQL in Machine Learning Applications
In this chapter, we will present typical use cases for Spark SQL in machine learning applications. We will focus on the Spark machine learning API called spark.ml, which is the recommended solution for implementing ML workflows. The spark.ml API is built on DataFrames and provides many ready-to-use packages, including feature extractors, Transformers, selectors, and machine learning algorithms such as classification, regression, and clustering algorithms. We will also use Apache Spark to perform exploratory data analysis (EDA), data pre-processing, and feature engineering, and to develop machine learning pipelines using spark.ml APIs and algorithms.
More specifically, in this chapter, you will learn about the following topics:
- Machine learning applications
- Key components of Spark ML pipelines
- Understanding feature engineering
- Implementing machine learning pipelines/applications
- Code examples using Spark MLlib