We are almost ready to build a prediction algorithm for Apple's stock performance. The remaining task is to prepare the data in a way that gives the model the best chance of predicting well.
We will perform transformations and visualizations on the dataframe in this section. This requires importing the following in Python:
- numpy
- MinMaxScaler()
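As a rough illustration of what a min-max scaler does with its default settings, the sketch below rescales a list of values to the range 0 to 1 in plain Python. The function name min_max_scale and the sample prices are made up for this example and are not part of the recipe; in practice the library's MinMaxScaler() is the better choice.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to all zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical adjusted closing prices, for illustration only.
sample_prices = [100.0, 150.0, 125.0, 200.0]
scaled = min_max_scale(sample_prices)
print(scaled)
```

Scaling matters here because stock prices drift over a wide range across years, and many models train more reliably when inputs share a common, bounded scale.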
This section walks through the steps for preparing the stock market data for our model.
- Execute the following script to group the dataframe by year and count the Adj Close values in each year:
df.groupBy(['year']).agg({'Adj Close':'count'})\
    .withColumnRenamed('count(Adj Close)', 'Row Count')\
    .orderBy(["year"],ascending=False)\
    .show()
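To make the expected result concrete, here is a minimal plain-Python sketch of the same aggregation: count the non-null Adj Close values per year and list the years in descending order. The rows are invented sample data, not values from the Apple dataset.

```python
from collections import Counter

# Hypothetical (year, adj_close) rows; None stands in for a missing value.
rows = [(2016, 105.2), (2016, 106.1), (2017, 116.0), (2017, None), (2018, 172.3)]

# Like Spark's count aggregate, only non-null values are counted.
counts = Counter(year for year, adj_close in rows if adj_close is not None)

# Sort by year, newest first, mirroring orderBy(..., ascending=False).
row_counts = sorted(counts.items(), key=lambda item: item[0], reverse=True)

for year, n in row_counts:
    print(year, n)
```

A per-year count like this is a quick way to spot years with few or missing rows before deciding where to split the data.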
- Execute the following script to create two new dataframes for training and testing purposes:
trainDF = df[df.year < 2017]
testDF = df[df.year > 2016]
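The two filters are meant to be complementary: assuming year holds whole numbers, year < 2017 and year > 2016 together place every row in exactly one dataframe. A quick plain-Python sketch of that check, using made-up rows rather than the real data:

```python
# Hypothetical rows; only the 'year' field matters for the split.
records = [
    {"year": year, "adj_close": price}
    for year, price in [(2015, 98.0), (2016, 105.2), (2017, 116.0), (2018, 172.3)]
]

train = [r for r in records if r["year"] < 2017]  # 2016 and earlier
test = [r for r in records if r["year"] > 2016]   # 2017 and later

# Every row lands in exactly one of the two sets.
assert len(train) + len(test) == len(records)
print(len(train), len(test))
```

Splitting by date rather than at random keeps the test set strictly later than the training set, so the model is evaluated on data it could not have seen.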
- Convert the two new dataframes to pandas dataframes to get row and column counts...