Building Statistical Models in Python

By: Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Overview of this book

The ability to proficiently perform statistical modeling is a fundamental skill for data scientists and essential for businesses reliant on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation. This book not only equips you with the skills to navigate the complexities of statistical modeling but also provides practical guidance for immediate implementation through illustrative examples. Through its emphasis on application and code examples, you'll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you'll explore key statistical models, including hypothesis testing, regression, time series analysis, classification, and more. By the end of this book, you'll be fluent in statistical modeling and able to harness the full potential of Python's rich ecosystem for data analysis.
Table of Contents (22 chapters)

Part 1: Introduction to Statistics
Part 2: Regression Models
Part 3: Classification Models
Part 4: Time Series Models
Part 5: Survival Analysis

Summary

This chapter covered parametric tests. Starting with the assumptions of parametric tests, we identified and applied methods for detecting violations of these assumptions and discussed scenarios where robustness can be assumed even when the required assumptions are not met. We then looked at one of the most popular alternatives to the z-test, the t-test, and worked through several of its applications, covering the one-sample version as well as the pooled, paired, and Welch's (non-pooled) variants of the two-sample analysis. Next, we explored ANOVA techniques, using data from multiple groups to identify statistically significant differences between them. This included one of the most popular p-value adjustments for situations with many groups, the Bonferroni correction, which helps prevent inflation of the Type I error rate when performing multiple tests. We then looked at performing correlation analysis...
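
For reference, the following is a minimal sketch of how the tests recapped above can be run with SciPy's stats module. The sample arrays, group means, and the 0.05 significance level are hypothetical and chosen only for illustration; they are not examples from the chapter itself.

import numpy as np
from scipy import stats

# Hypothetical samples drawn from normal distributions for illustration only
rng = np.random.default_rng(42)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)
group_b = rng.normal(loc=11.0, scale=2.5, size=30)
group_c = rng.normal(loc=10.5, scale=2.0, size=30)

# One-sample t-test: is the mean of group_a different from a hypothesized value of 10?
t_one, p_one = stats.ttest_1samp(group_a, popmean=10.0)

# Two-sample pooled t-test (equal variances assumed)
t_pooled, p_pooled = stats.ttest_ind(group_a, group_b, equal_var=True)

# Welch's t-test (variances not pooled)
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# Paired t-test (e.g., before/after measurements on the same subjects)
t_paired, p_paired = stats.ttest_rel(group_a, group_b)

# One-way ANOVA across the three groups
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

# Bonferroni correction: divide alpha by the number of pairwise comparisons
pairwise_p = [
    stats.ttest_ind(group_a, group_b, equal_var=False).pvalue,
    stats.ttest_ind(group_a, group_c, equal_var=False).pvalue,
    stats.ttest_ind(group_b, group_c, equal_var=False).pvalue,
]
alpha = 0.05
bonferroni_alpha = alpha / len(pairwise_p)  # compare each pairwise p-value to this threshold

# Pearson correlation between two samples
r, p_corr = stats.pearsonr(group_a, group_b)

print(f"One-sample t-test: t={t_one:.3f}, p={p_one:.3f}")
print(f"Pooled t-test:     t={t_pooled:.3f}, p={p_pooled:.3f}")
print(f"Welch's t-test:    t={t_welch:.3f}, p={p_welch:.3f}")
print(f"Paired t-test:     t={t_paired:.3f}, p={p_paired:.3f}")
print(f"One-way ANOVA:     F={f_stat:.3f}, p={p_anova:.3f}")
print(f"Bonferroni alpha:  {bonferroni_alpha:.4f}")
print(f"Pearson r:         r={r:.3f}, p={p_corr:.3f}")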