This section creates charts and plots that visually represent aspects of the MovieLens 100K Dataset related to the use cases described in the preceding section. The charting and plotting process used throughout this chapter follows a pattern. The important steps in that pattern are:
Read data from the data file using Spark.
Make the data available in a Spark DataFrame.
Apply the necessary data processing using the DataFrame API. The main purpose of this processing is to reduce the data to the minimum required for charting and plotting.
Transfer the processed data from the Spark DataFrame to a local Python collection object in the Spark driver program.
Use the charting and plotting libraries to generate the figures from the data available in the Python collection objects.
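The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the chapter's actual code: the file path `ml-100k/u.data`, the column names, the application name, the function names, and the choice of a rating-count bar chart are all hypothetical, and matplotlib is assumed to be the plotting library. The only facts taken from the dataset itself are that the ratings file is tab-separated with four fields per line (user ID, item ID, rating, timestamp).

```python
def load_ratings(spark, path):
    """Steps 1 and 2: read the tab-separated ratings file into a DataFrame.

    The column names are assumptions chosen for readability.
    """
    return (spark.read
            .csv(path, sep="\t", inferSchema=True)
            .toDF("user_id", "item_id", "rating", "timestamp"))


def rating_counts(ratings_df):
    """Step 3: reduce the data to the minimum the chart needs.

    Here that is one row per rating value with its count.
    """
    return ratings_df.groupBy("rating").count().orderBy("rating")


def to_axes(rows):
    """Step 4 helper: split collected (x, y) pairs into two plain lists.

    Works on Spark Row objects as well as ordinary tuples.
    """
    xs = [row[0] for row in rows]
    ys = [row[1] for row in rows]
    return xs, ys


def main(path="ml-100k/u.data"):  # hypothetical location of the data file
    # Imported here so the helpers above stay usable without Spark installed.
    from pyspark.sql import SparkSession
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt

    spark = SparkSession.builder.appName("ChartingSketch").getOrCreate()
    try:
        # Step 4: collect() moves the small, aggregated result to the driver.
        rows = rating_counts(load_ratings(spark, path)).collect()
    finally:
        spark.stop()

    # Step 5: plot from the local Python lists, not from the DataFrame.
    ratings, counts = to_axes(rows)
    plt.bar(ratings, counts)
    plt.xlabel("Rating")
    plt.ylabel("Number of ratings")
    plt.title("Ratings per rating value (illustrative)")
    plt.savefig("rating_counts.png")
```

Calling `main()` with the path to a local copy of the ratings file runs all five steps. The aggregation happens in Spark before `collect()`, so only a handful of rows reach the driver, which is the point of the processing step.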