
Building Statistical Models in Python

By: Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Overview of this book

The ability to perform statistical modeling proficiently is a fundamental skill for data scientists and essential for businesses that rely on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation. This book not only equips you with the skills to navigate the complexities of statistical modeling, but also provides practical guidance for immediate implementation through illustrative examples. Through its emphasis on application and code examples, you’ll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical models, including hypothesis testing, regression, time series analysis, classification, and more. By the end of this book, you’ll have gained fluency in statistical modeling while harnessing the full potential of Python's rich ecosystem for data analysis.
Table of Contents (22 chapters)
Part 1: Introduction to Statistics
Part 2: Regression Models
Part 3: Classification Models
Part 4: Time Series Models
Part 5: Survival Analysis

Dimension reduction

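In this section, we will use a specific technique, principal component regression (PCR), in the context of multiple linear regression (MLR). This technique is useful when the data suffer from multicollinearity. Multicollinearity occurs when an independent variable is highly correlated with another independent variable or, more generally, when one independent variable can be predicted from the other independent variables in a regression model. Such high correlation makes the fitted model unreliable: the coefficient estimates become unstable and their standard errors are inflated, so small changes in the data can produce large changes in the estimated coefficients.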
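To make the effect of multicollinearity concrete, here is a minimal sketch using only NumPy and synthetic data generated purely for illustration. It computes the variance inflation factor (VIF), a standard multicollinearity diagnostic that this passage does not itself introduce: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared obtained by regressing predictor j on all the other predictors. The helper name `variance_inflation_factors` and the simulated variables are our own assumptions, not code from the book.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least squares
    regression of column j on the remaining columns plus an intercept.
    (Hypothetical helper written for illustration.)
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Design matrix: intercept column followed by the other predictors
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1, so x1 and x2 are collinear
x3 = rng.normal(size=n)                  # generated independently of x1 and x2
X = np.column_stack([x1, x2, x3])

print(np.corrcoef(X, rowvar=False).round(2))  # pairwise correlations of the predictors
vif = variance_inflation_factors(X)
print(vif.round(1))
```

Because x2 is built as a noisy copy of x1, the VIFs of the first two columns should come out large (a common rule of thumb flags values above 5 or 10), while the VIF of x3 should stay close to 1. If statsmodels is available, `statsmodels.stats.outliers_influence.variance_inflation_factor` computes the same quantity one column at a time; it uses the design matrix exactly as passed, so a constant column has to be added explicitly.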

The PCR technique builds on principal component analysis (PCA), which is used in unsupervised machine learning for data compression and exploratory analysis. The idea is to apply PCA, a dimension reduction technique, to the original independent variables to create new, uncorrelated variables (the principal components). These new variables summarize the information in the original ones and help us understand the relationships among them; we then fit the MLR model on the new variables instead of the original ones, usually keeping only the first few components. The PCA technique can also be used in classification problems, which we will discuss in the next chapter.
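The two steps described above, PCA on the predictors followed by MLR on the resulting components, can be sketched as follows. This is a minimal NumPy-only illustration under our own assumptions: the predictors are standardized before PCA, PCA is computed with a singular value decomposition, the number of retained components is chosen by hand, and the helper names `pcr_fit` and `pcr_predict` are hypothetical rather than the book's code.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression (hypothetical helper for illustration).

    1. Standardize each predictor.
    2. Run PCA via SVD and keep the first `n_components` directions.
    3. Fit ordinary least squares (MLR) of y on the component scores.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    # Rows of Vt are the principal directions, ordered by explained variance
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components].T                     # p x k loading matrix
    T = Z @ W                                   # n x k component scores (uncorrelated)
    A = np.column_stack([np.ones(len(y)), T])   # intercept + component scores
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "W": W, "coef": coef, "scores": T}


def pcr_predict(model, X):
    """Project new data onto the stored components and apply the fitted MLR."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    T = Z @ model["W"]
    return model["coef"][0] + T @ model["coef"][1:]


# Synthetic, deliberately collinear data (an assumption for illustration only)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 2 * x2 - x3 + rng.normal(scale=0.5, size=n)

model = pcr_fit(X, y, n_components=2)
T = model["scores"]
print(np.corrcoef(T, rowvar=False).round(6))  # off-diagonal entries are ~0
y_hat = pcr_predict(model, X)
r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(round(r2, 3))
```

Because the scores come from centered data via the SVD, their in-sample correlations are zero up to rounding, which is the property that removes the multicollinearity before the regression step. Keeping two of the three components here discards the direction along which x1 and x2 differ, which carries very little variance. In practice, a scikit-learn pipeline such as `make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())` performs the same steps, and the number of components is usually chosen by cross-validation.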

PCA –...