In the previous recipe, we learned how to create or replace temporary views.
In this recipe, we will learn how to play with the data within a DataFrame using SQL queries.
To execute this recipe, you need a working Spark 2.3 environment. You should have gone through the Specifying the schema programmatically recipe, as we will be using the sample_data_schema DataFrame we created there.
There are no other requirements.
In this example, we will extend our original data with the form factor for each model of Apple's computers:
models_df = sc.parallelize([
    ('MacBook Pro', 'Laptop')
    , ('MacBook', 'Laptop')
    , ('MacBook Air', 'Laptop')
    , ('iMac', 'Desktop')
]).toDF(['Model', 'FormFactor'])

models_df.createOrReplaceTempView('models')
sample_data_schema.createOrReplaceTempView('sample_data_view')

spark.sql('''
    SELECT a.*
        , b.FormFactor
    FROM sample_data_view AS a
    LEFT JOIN models AS b
    ...
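To see what a LEFT JOIN of this shape returns without starting a Spark session, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Spark temporary views. It is not the recipe's own code: the rows in sample_data_view, its reduced Id and Model columns, the 'Mac mini' entry, and the choice of Model as the join key are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Spark temporary views.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_data_view (Id INTEGER, Model TEXT)")
conn.execute("CREATE TABLE models (Model TEXT, FormFactor TEXT)")

# Hypothetical rows; the real sample_data_schema DataFrame has other columns.
conn.executemany(
    "INSERT INTO sample_data_view VALUES (?, ?)",
    [(1, "MacBook Pro"), (2, "iMac"), (3, "Mac mini")],
)

# Lookup table with the same values as models_df in the recipe.
conn.executemany(
    "INSERT INTO models VALUES (?, ?)",
    [
        ("MacBook Pro", "Laptop"),
        ("MacBook", "Laptop"),
        ("MacBook Air", "Laptop"),
        ("iMac", "Desktop"),
    ],
)

# Assumption: the two tables are joined on the Model column.
query = """
    SELECT a.*, b.FormFactor
    FROM sample_data_view AS a
    LEFT JOIN models AS b
        ON a.Model = b.Model
    ORDER BY a.Id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every row from the left table is kept. A model with no match in the lookup table (the hypothetical 'Mac mini' here) still appears, with a missing FormFactor, which sqlite3 reports as None.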