In this chapter, you will learn the following recipes:
Working with Pandas
Variable identification
Sampling data
Summary and descriptive statistics
Generating frequency tables
Installing Pandas on Linux
Installing Pandas from source
Using IPython with PySpark
Creating Pandas DataFrames over Spark
Splitting, slicing, sorting, filtering, and grouping DataFrames over Spark
Implementing covariance and correlation using DataFrames over Spark
Concatenating and merging operations over DataFrames
Complex operations over DataFrames
Sparkling Pandas