Hands-On Predictive Analytics with Python

By: Alvaro Fuentes

Overview of this book

Predictive analytics is an applied field that uses a variety of quantitative methods to make predictions from data. It involves much more than just feeding data to a computer to build a model. This book provides practical coverage of the most important concepts of predictive analytics. Using practical, step-by-step examples, we build predictive analytics solutions with cutting-edge Python tools and packages. The approach starts by defining the problem and moves on to identifying relevant data. We then prepare the data, explore and visualize relationships, and build, tune, evaluate, and deploy models. Each stage comes with relevant practical examples and efficient Python code. You will work with models such as KNN, Random Forests, and neural networks using the most important libraries in Python's data science stack: NumPy, Pandas, Matplotlib, Seaborn, Keras, Dash, and so on. In addition to hands-on code examples, you will find intuitive explanations of the inner workings of the main techniques and algorithms used in predictive analytics. By the end of this book, you will be ready to build high-performance predictive analytics solutions in Python.
Table of Contents (11 chapters)

The k-fold cross-validation

So far, we have been evaluating our models on the test set. By now, it is clear why we do this; however, there is one point we have not discussed yet. Let's go back to the diamond prices problem. In this chapter, we built a simple multiple linear regression model and calculated some metrics on the test set. Let's say that we use the MAE to evaluate the model. When we calculated this metric, we got 733.67. Now let's repeat the same model-building steps:

  • Train-test split
  • Standardize the numeric features
  • Model training
  • Get predictions
  • Evaluate the model using the same metric
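The five steps above can be wrapped in a single function, which makes it easy to see how the result of a single train-test split depends on which rows end up in the test set. The following is a minimal sketch, not the book's code: it uses a small synthetic dataset in place of the diamonds data, and the names `holdout_mae`, `maes`, and `fold_maes` are hypothetical. The last part shows one possible way to run a k-fold cross-validation with scikit-learn's `cross_val_score`, which may differ from how the chapter proceeds.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diamonds data (an assumption, not the real dataset)
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "carat": rng.uniform(0.2, 2.5, n),
    "depth": rng.normal(61.0, 1.5, n),
})
y = 4000 * X["carat"] - 20 * X["depth"] + rng.normal(0, 300, n)
numerical_features = ["carat", "depth"]


def holdout_mae(X, y, numerical_features, random_state):
    """Run the five steps once and return the MAE on the test set."""
    # 1. Train-test split (copies avoid modifying the caller's DataFrame)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=random_state
    )
    X_train, X_test = X_train.copy(), X_test.copy()

    # 2. Standardize the numeric features, fitting the scaler on training data only
    scaler = StandardScaler()
    scaler.fit(X_train[numerical_features])
    X_train[numerical_features] = scaler.transform(X_train[numerical_features])
    X_test[numerical_features] = scaler.transform(X_test[numerical_features])

    # 3. Model training
    model = LinearRegression()
    model.fit(X_train, y_train)

    # 4. Get predictions
    y_pred = model.predict(X_test)

    # 5. Evaluate the model using the same metric
    return mean_absolute_error(y_test, y_pred)


# The same steps with different splits give different MAE values
maes = {seed: holdout_mae(X, y, numerical_features, seed) for seed in (1, 2, 3)}
for seed, mae in maes.items():
    print(f"random_state={seed}: MAE={mae:.2f}")

# One possible k-fold version: the pipeline refits the scaler inside each fold
pipeline = make_pipeline(StandardScaler(), LinearRegression())
cv = KFold(n_splits=10, shuffle=True, random_state=0)
fold_maes = -cross_val_score(
    pipeline, X, y, cv=cv, scoring="neg_mean_absolute_error"
)
print(f"10-fold MAE: mean={fold_maes.mean():.2f}, std={fold_maes.std():.2f}")
```

Putting the scaler inside the pipeline means it is fitted only on the training portion of each fold, which mirrors step 2 above. Averaging over the folds gives an estimate that depends less on one particular split.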

Here we have the code again:

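## Imports needed by this snippet
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

## Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=2)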

## Standardize the numeric features
scaler = StandardScaler()
scaler.fit(X_train[numerical_features])
X_train...