Follow these steps:
- First, import the required libraries. We import pandas, pyspark.sql, and numpy for data manipulation, keras for building the machine learning model, and sklearn for evaluating it. Once the model has been evaluated, we use io, pickle, and mlflow to save the model and its results so that they can be compared against other models (a rough sketch of that save step follows the imports below):
# Data manipulation
from pyspark.sql.functions import *
from pyspark.sql.window import Window
import pandas as pd
import numpy as np

# Model building
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, LeakyReLU, Dropout

# Preprocessing, train/test splitting, and evaluation
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

# Saving the model and results
import io
import pickle
import mlflow
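The save step itself happens later in the recipe, after training and evaluation. As a rough sketch of the pattern, assuming a trained Keras model, a fitted scaler, and a computed precision score (the run name, file name, and the stand-in objects below are hypothetical, not part of this recipe):

import pickle
import numpy as np
import mlflow
import mlflow.keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-ins so the sketch runs on its own; the recipe
# trains its real model and fits its real scaler before this step.
model = Sequential([Dense(1, input_dim=3, activation="sigmoid")])
model.compile(loss="binary_crossentropy", optimizer="adam")
scaler = MinMaxScaler().fit(np.random.rand(10, 3))

with mlflow.start_run(run_name="keras_model"):   # hypothetical run name
    mlflow.log_metric("precision", 0.0)          # in practice, the precision_score value
    mlflow.keras.log_model(model, "model")       # persist the network with MLflow
    with open("scaler.pkl", "wb") as f:
        pickle.dump(scaler, f)                   # pickle the fitted scaler
    mlflow.log_artifact("scaler.pkl")            # attach it to the same run

Logging the scaler alongside the model means any later run that loads the model can apply exactly the same feature scaling.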
- Next, we load the training and testing data. Our training data will be used to train our models, and our testing data will be used to evaluate them:
X_train = spark.sql("select rolling_average_s2, rolling_average_s3,
...
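The query above is truncated in this excerpt. As a rough sketch of the loading pattern, the feature columns can be pulled into pandas, scaled with the MinMaxScaler imported earlier, and split with train_test_split; the sensor_features table name, the label column, and the 80/20 split below are assumptions, not details from the recipe:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# `spark` is the ambient SparkSession available in a Databricks notebook.
# The `sensor_features` table and `label` column are hypothetical names.
df = spark.sql(
    "select rolling_average_s2, rolling_average_s3, label "
    "from sensor_features"
).toPandas()

X = df[["rolling_average_s2", "rolling_average_s3"]].values
y = df["label"].values

scaler = MinMaxScaler()                  # scale each feature to [0, 1]
X_scaled = scaler.fit_transform(X)

# Hold out 20% of the rows to evaluate the trained models
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)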