Before any modeling or prediction is performed on the data, it is important to first explore and visualize it for patterns that are not obvious at first glance.
We will perform transformations and visualizations on the dataframe in this section. This will require importing the following libraries in Python:
pyspark.sql.functions
matplotlib
The following section walks through the steps to explore and visualize the stock market data.
- Transform the Date column in the dataframe by removing the timestamp using the following script:
import pyspark.sql.functions as f
df = df.withColumn('date', f.to_date('Date'))
- Create a for loop to add three additional columns to the dataframe. The loop breaks apart the date field into year, month, and day, as seen in the following script:
date_breakdown = ['year', 'month', 'day']
for i in enumerate(date_breakdown):
    index = i[0]
    name = i[1]
    df = df.withColumn(name, f.split('date', '-')[index...
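To make the logic of the two steps above easier to follow, here is a minimal sketch in plain Python that applies the same idea to a couple of rows without a Spark session. It is an illustration only, not the Spark script itself: the sample rows, the timestamp format, and the helper name add_date_parts are assumptions made for this example.

```python
from datetime import datetime

# Hypothetical sample rows standing in for the stock dataframe; the real
# data and its exact timestamp format are assumptions for this sketch.
rows = [
    {"Date": "2017-03-14 00:00:00", "Close": 100.0},
    {"Date": "2017-11-02 00:00:00", "Close": 101.5},
]

date_breakdown = ["year", "month", "day"]


def add_date_parts(row):
    """Return a copy of row with date, year, month, and day fields added."""
    out = dict(row)
    # Step 1: drop the time portion, keeping a 'YYYY-MM-DD' string.
    parsed = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S")
    out["date"] = parsed.date().isoformat()
    # Step 2: split on '-' and assign each piece to its name by position.
    parts = out["date"].split("-")
    for index, name in enumerate(date_breakdown):
        out[name] = parts[index]
    return out


expanded = [add_date_parts(r) for r in rows]
for r in expanded:
    print(r["date"], r["year"], r["month"], r["day"])
```

Note that splitting a date string produces string pieces, so the month of the first row is '03' rather than the number 3. Splitting in Spark likewise yields strings, so a cast may be needed if numeric values are wanted. PySpark also provides the built-in functions year, month, and dayofmonth in pyspark.sql.functions, which return integers directly and could be used instead of splitting.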