This is the last chapter of the book. So far, we have covered Spark from its architecture down to the details of its APIs, including RDDs, DataFrames, and the machine learning and GraphX frameworks. In the previous chapter, we covered a recommendation engine use case, where we primarily looked at the Scala API. Until now, we have mostly worked in the Scala, Python, or R shells. In this chapter, we will use the Jupyter notebook with the PySpark interpreter to look at the churn prediction use case.
The chapter covers:
- Overview of customer churn
- Importance of churn prediction
- Understanding the dataset
- Exploring data
- Building a machine learning pipeline
- Predicting churn
This chapter should give you a good introduction to churn prediction systems, which you can use as a baseline for other prediction tasks.
Let's get started.