Many data scientists use R for exploratory data analysis, data visualization, data munging, data processing, and machine learning. SparkR is an R package that enables practitioners to work with data by leveraging Apache Spark's distributed processing capabilities. In this chapter, we will cover SparkR, an R frontend to Spark, and show how it uses Spark's engine to perform data analysis at scale. We will also describe the key elements of SparkR's design and implementation.
More specifically, this chapter covers the following topics:
- What is SparkR?
- Understanding the SparkR architecture
- Understanding SparkR DataFrames
- Using SparkR for Exploratory Data Analysis (EDA) and data munging tasks
- Using SparkR for data visualization
- Using SparkR for machine learning