Building Statistical Models in Python

By: Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Overview of this book

The ability to perform statistical modeling proficiently is a fundamental skill for data scientists and essential for businesses reliant on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation. This book not only equips you with skills to navigate the complexities of statistical modeling, but also provides practical guidance for immediate implementation through illustrative examples. Through emphasis on application and code examples, you’ll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical models, including hypothesis testing, regression, time series analysis, classification, and more. By the end of this book, you’ll gain fluency in statistical modeling while harnessing the full potential of Python's rich ecosystem for data analysis.
Table of Contents (22 chapters)
Part 1: Introduction to Statistics
Part 2: Regression Models
Part 3: Classification Models
Part 4: Time Series Models
Part 5: Survival Analysis

Cox Proportional Hazards regression model

Survival analysis, also called time-to-event (TTE) analysis, as we discussed in Chapter 13, Time-to-Event Variables, is an analytical approach that uses probability to estimate the time remaining before an event occurs, based on previous observations. We have seen that, when appropriate covariates are included, this is useful in applications such as estimating life expectancy, mechanical failure, and customer churn, which can help with prioritizing needs and allocating resources more efficiently. As we discussed in depth in Chapter 13, censoring is the aspect that distinguishes survival analysis from other statistical questions that can be solved using techniques such as regression. Dropping an observation due to censoring will almost certainly mislead our model and provide results we cannot trust. Consequently, we insert what is known as an event status indicator to help account for whether an event will occur or fail to occur prior to estimating...
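
To make the event status indicator concrete, here is a minimal sketch of how it might be derived from raw observations. It is not taken from the book: the records, the `STUDY_END` date, and the `to_survival_row` helper are all hypothetical, and it assumes right-censoring only, with a subject counting as censored when no event has been recorded by the end of the observation window.

```python
from datetime import date

# Hypothetical end of the observation window (an assumption for illustration).
STUDY_END = date(2023, 12, 31)

# Hypothetical records: (subject, start date, event date or None if no event seen).
records = [
    ("A", date(2023, 1, 1), date(2023, 3, 2)),
    ("B", date(2023, 2, 1), None),
    ("C", date(2023, 6, 15), date(2023, 9, 13)),
    ("D", date(2023, 10, 1), None),
]


def to_survival_row(subject, start, event_date, study_end=STUDY_END):
    """Return (subject, duration in days, event status) for one record.

    Event status is 1 when the event was observed within the window and
    0 when the observation is right-censored at the end of the window.
    """
    observed = event_date is not None and event_date <= study_end
    end = event_date if observed else study_end
    return subject, (end - start).days, int(observed)


# Censored subjects are kept, not dropped: their duration is a lower bound
# on the true time to event, and the 0 flag tells the model so.
rows = [to_survival_row(*record) for record in records]

for subject, duration, event in rows:
    print(f"{subject}: duration={duration} days, event={event}")
```

Each row keeps both the observed duration and the status flag, so subjects B and D still contribute the information that they went at least that long without the event. Survival libraries typically take these two columns as inputs; in `lifelines`, for example, `CoxPHFitter.fit` accepts them through its `duration_col` and `event_col` arguments.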