In the previous recipe, we saw how to create a DataFrame. The natural next step is to work with the data inside it. Beyond the many functions for transforming data, DataFrames also offer convenience functions for sampling data, printing the schema, and more. We'll take a look at them one by one in this recipe.
Note
The code and the sample file for this recipe can be found at https://github.com/arunma/ScalaDataAnalysisCookbook/blob/master/chapter1-spark-csv/src/main/scala/com/packt/scaladata/spark/csv/DataFrameCSV.scala.
Now, let's see how we can manipulate DataFrames using the following subrecipes:
Printing the schema of the DataFrame
Sampling data in the DataFrame
Selecting specific columns in the DataFrame
Filtering data by condition
Sorting data in the DataFrame
Renaming columns
Treating the DataFrame as a relational table to execute SQL queries
Saving the DataFrame as a file
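Before diving into each subrecipe, here is a minimal sketch that strings the operations above together. It assumes Spark 1.x with the `spark-csv` package on the classpath, and a hypothetical `StudentData.csv` file with `id` and `email` columns; the file name, column names, and output path are illustrative, not taken from the recipe itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DataFrameOpsSketch extends App {
  val conf = new SparkConf().setAppName("DataFrameOps").setMaster("local[2]")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)

  // Load a CSV into a DataFrame (file and columns are assumed for illustration)
  val students = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .load("StudentData.csv")

  students.printSchema()                     // print the schema of the DataFrame
  students.sample(false, 0.2).show()         // sample ~20% of rows, without replacement
  students.select("id", "email").show()      // select specific columns
  students.filter("id > 5").show()           // filter rows by a condition
  students.sort(students("id").desc).show()  // sort by a column, descending
  students.withColumnRenamed("email", "emailAddress").show()  // rename a column

  // Treat the DataFrame as a relational table and run SQL against it
  students.registerTempTable("students")
  sqlContext.sql("SELECT * FROM students WHERE id > 5").show()

  // Save the DataFrame back out as a CSV file
  students.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("StudentDataOut")
}
```

Each of these calls is covered in its own subrecipe below; note that most of them return a new DataFrame rather than modifying the original, so they can be chained freely.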