Building Statistical Models in Python

By: Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Overview of this book

The ability to proficiently perform statistical modeling is a fundamental skill for data scientists and essential for businesses reliant on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation. This book not only equips you with skills to navigate the complexities of statistical modeling, but also provides practical guidance for immediate implementation through illustrative examples. Through emphasis on application and code examples, you’ll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical models, including hypothesis testing, regression, time series analysis, classification, and more. By the end of this book, you’ll gain fluency in statistical modeling while harnessing the full potential of Python’s rich ecosystem for data analysis.
Table of Contents (22 chapters)
Part 1: Introduction to Statistics
Part 2: Regression Models
Part 3: Classification Models
Part 4: Time Series Models
Part 5: Survival Analysis

Shrinkage methods

The bias-variance trade-off is something every statistics and machine learning practitioner must balance when modeling. Too much of either bias or variance renders results useless. To catch these problems when they arise, we look at test results and the residuals. For example, assuming a useful set of features and an appropriate model have been selected, a model that performs well on validation but poorly on a test set may have too much variance; conversely, a model that fails to perform well at all may have too much bias. In both cases, the model fails to generalize well. However, while bias can be identified from poor model performance at the outset, high variance can be notoriously deceptive, because a high-variance model can perform very well during training and even during validation, depending on the data. High-variance models frequently use coefficient values that are unnecessarily large when very similar results can be obtained from...
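
The point about unnecessarily large coefficients can be illustrated with a minimal sketch. It assumes a ridge (L2) penalty as one common example of a shrinkage method. The synthetic data, the helper names `fit_ridge_2d` and `l2_norm`, and the penalty value `lam=1.0` are illustrative assumptions and are not taken from the book. Two nearly identical features are fitted with and without the penalty, and the size of the resulting coefficient vectors is compared.

```python
import math
import random


def fit_ridge_2d(x1, x2, y, lam):
    """Solve (X'X + lam*I) beta = X'y for two features and no intercept.

    lam = 0 gives ordinary least squares; lam > 0 adds an L2 (ridge) penalty.
    Hypothetical helper written for this illustration only.
    """
    a = sum(u * u for u in x1) + lam          # X'X[0][0] + lam
    b = sum(u * v for u, v in zip(x1, x2))    # X'X[0][1]
    d = sum(v * v for v in x2) + lam          # X'X[1][1] + lam
    r1 = sum(u * t for u, t in zip(x1, y))    # X'y[0]
    r2 = sum(v * t for v, t in zip(x2, y))    # X'y[1]
    det = a * d - b * b
    # Cramer's rule for the 2x2 linear system.
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)


def l2_norm(beta):
    """Euclidean length of a coefficient vector."""
    return math.sqrt(sum(c * c for c in beta))


# Assumed synthetic data: x2 is almost a copy of x1, and y = x1 + x2 + noise.
rng = random.Random(0)
n = 30
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [u + rng.gauss(0, 0.01) for u in x1]
y = [u + v + rng.gauss(0, 0.5) for u, v in zip(x1, x2)]

ols = fit_ridge_2d(x1, x2, y, lam=0.0)    # no shrinkage
ridge = fit_ridge_2d(x1, x2, y, lam=1.0)  # with shrinkage

print("OLS coefficients:  ", ols, "norm:", l2_norm(ols))
print("Ridge coefficients:", ridge, "norm:", l2_norm(ridge))
```

For any positive penalty, the penalized coefficient vector has a smaller norm than the unpenalized one. In practice a library implementation such as scikit-learn's `Ridge` would typically be used instead of a hand-written solver.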