Building Statistical Models in Python

By: Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Overview of this book

The ability to proficiently perform statistical modeling is a fundamental skill for data scientists and essential for businesses reliant on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation. This book not only equips you with skills to navigate the complexities of statistical modeling, but also provides practical guidance for immediate implementation through illustrative examples. Through emphasis on application and code examples, you’ll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical models, including hypothesis testing, regression, time series analysis, classification, and more. By the end of this book, you’ll gain fluency in statistical modeling while harnessing the full potential of Python's rich ecosystem for data analysis.
Table of Contents (22 chapters)
Part 1: Introduction to Statistics
Part 2: Regression Models
Part 3: Classification Models
Part 4: Time Series Models
Part 5: Survival Analysis

More on model evaluation

Earlier sections covered methods for preparing data and for testing and validating models. This section turns to validating time series models specifically, using four evaluation methods: resampling, shifting, optimized persistence forecasting, and rolling window forecasting.
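As a rough orientation, the four methods named above can be sketched with pandas on a synthetic series. This is a minimal sketch under stated assumptions, not the book's implementation: the random-walk data, the helper `persistence_rmse`, the lag range, and the window size are all hypothetical, and "optimized persistence" is read here as choosing the lag with the lowest error, while "rolling window forecasting" is read as predicting from the mean of the preceding window.

```python
import numpy as np
import pandas as pd

# Synthetic daily "price" series (a random walk); NOT the real stock data.
rng = np.random.default_rng(0)
index = pd.date_range("2021-01-01", periods=120, freq="D")
steps = rng.normal(0, 1, size=len(index))
prices = pd.Series(50 + steps.cumsum(), index=index, name="Close")

# Resampling: aggregate daily values to a coarser frequency (weekly mean).
weekly = prices.resample("W").mean()

# Shifting: lag the series by one step, so each row holds the previous value.
lag_1 = prices.shift(1)


def persistence_rmse(series, lag):
    """RMSE of a persistence forecast that reuses the value from `lag` steps ago."""
    errors = (series - series.shift(lag)).dropna()
    return float(np.sqrt((errors ** 2).mean()))


# Optimized persistence (one interpretation): try several lags, keep the best.
rmse_by_lag = {lag: persistence_rmse(prices, lag) for lag in range(1, 8)}
best_lag = min(rmse_by_lag, key=rmse_by_lag.get)

# Rolling window forecast (one interpretation): predict each point as the
# mean of the previous `window` observations; shift(1) avoids using the
# current value in its own forecast.
window = 5
rolling_forecast = prices.rolling(window).mean().shift(1)
rolling_errors = (prices - rolling_forecast).dropna()
rolling_rmse = float(np.sqrt((rolling_errors ** 2).mean()))

print("best persistence lag:", best_lag)
print("persistence RMSE:", rmse_by_lag[best_lag])
print("rolling window RMSE:", rolling_rmse)
```

In every case the forecast for a given date uses only earlier observations, which is the property that distinguishes time series validation from a random train/test split.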

The real-world dataset used in this section is Coca-Cola stock data collected from Yahoo Finance, covering 01/19/1962 to 12/19/2021. The goal of this time series analysis is to forecast the future value of the stock. The reader can download the dataset from the Kaggle platform. To motivate the study, we first explore the Coca-Cola stock dataset:

import pandas as pd
data = pd.read_csv("COCO COLA.csv", parse_dates=["Date"], index_col="Date")

Figure 11.26 – Coca-Cola dataset
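Before modeling, it can help to run a few quick checks on a frame loaded this way. The sketch below uses a tiny in-memory CSV instead of the real file so that it runs on its own; the column names follow the usual Yahoo Finance export layout, which is an assumption about this dataset, and the values are made-up placeholders rather than actual prices.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the CSV file. Column names are assumed to match
# a typical Yahoo Finance export; the numbers are placeholders, not real data.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2021-12-15,57.0,58.0,56.5,57.8,57.8,1000
2021-12-16,57.8,58.6,57.4,58.3,58.3,1200
2021-12-17,58.3,58.9,57.6,57.7,57.7,1500
"""

data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(data.head())        # first rows of the frame
print(data.dtypes)        # column types after parsing
print(data.isna().sum())  # missing values per column

# Time series methods generally expect a sorted, duplicate-free datetime index.
is_sorted = data.index.is_monotonic_increasing
is_unique = data.index.is_unique
span = (data.index.min(), data.index.max())
print("sorted:", is_sorted, "unique:", is_unique, "span:", span)
```

With the real file, the same checks apply after replacing `io.StringIO(csv_text)` with the path used in the text.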

The...