In this chapter, we will present several examples of developing applications with Spark SQL. We will primarily focus on text analysis applications, including preprocessing pipelines, bag-of-words techniques, computing readability metrics for financial documents, identifying themes in document corpora, and using Naive Bayes classifiers. Finally, we will describe the implementation of a machine learning application.
More specifically, you will learn about the following in this chapter:
- Developing Spark SQL-based applications
- Preprocessing textual data
- Building preprocessing data pipelines
- Identifying themes in document corpora
- Using Naive Bayes classifiers
- Developing a machine learning application