Building Statistical Models in Python

By: Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Overview of this book

The ability to proficiently perform statistical modeling is a fundamental skill for data scientists and essential for businesses reliant on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation. This book not only equips you with skills to navigate the complexities of statistical modeling, but also provides practical guidance for immediate implementation through illustrative examples. Through emphasis on application and code examples, you’ll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical models, including hypothesis testing, regression, time series analysis, classification, and more. By the end of this book, you’ll gain fluency in statistical modeling while harnessing the full potential of Python's rich ecosystem for data analysis.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share Your Thoughts

Download a free PDF copy of this book

Part 1: Introduction to Statistics

Chapter 1: Sampling and Generalization

Software and environment setup

Population versus sample

Population inference from samples

Sampling strategies – random, systematic, stratified, and clustering

Summary

Chapter 2: Distributions of Data

Technical requirements

Understanding data types

Measuring and describing distributions

The normal distribution and central limit theorem

Summary

Chapter 3: Hypothesis Testing

The goal of hypothesis testing

Type I and Type II errors

Basics of the z-test – the z-score, z-statistic, critical values, and p-values

Summary

Chapter 4: Parametric Tests

Assumptions of parametric tests

T-test – a parametric hypothesis test

Tests with more than two groups and ANOVA

Summary

References

Chapter 5: Non-Parametric Tests

When parametric test assumptions are violated

The Rank-Sum test

The Signed-Rank test

The Kruskal-Wallis test

Chi-square distribution

Chi-square goodness-of-fit

Chi-square test of independence

Chi-square goodness-of-fit test power analysis

Spearman’s rank correlation coefficient

Summary

Part 2: Regression Models

Chapter 6: Simple Linear Regression

Simple linear regression using OLS

Coefficients of correlation and determination

Required model assumptions

Testing for significance and validating models

Summary

Chapter 7: Multiple Linear Regression

Multiple linear regression

Feature selection

Shrinkage methods

Dimension reduction

Summary

Part 3: Classification Models

Chapter 8: Discrete Models

Probit and logit models

Multinomial logit model

Poisson model

The negative binomial regression model

Summary

Chapter 9: Discriminant Analysis

Bayes’ theorem

Linear Discriminant Analysis

Quadratic Discriminant Analysis

Summary

Part 4: Time Series Models

Chapter 10: Introduction to Time Series

What is a time series?

Goals of time series analysis

Statistical measurements

The white-noise model

Stationarity

Summary

References

Chapter 11: ARIMA Models

Technical requirements

Models for stationary time series

Models for non-stationary time series

Seasonal ARIMA models

The Rank-Sum test

When the assumptions of the t-test are not met, the Rank-Sum test is often a good non-parametric alternative. While the t-test tests for a difference between the means of two distributions, the Rank-Sum test tests for a difference between their locations. The two tests support different conclusions because the Rank-Sum test makes no parametric assumptions. Its null hypothesis is that the distribution underlying the first sample is the same as the distribution underlying the second sample. If the two sample distributions appear similar in shape, rejecting the null hypothesis can be read as evidence of a difference in location between the two samples. As stated, the Rank-Sum test cannot be used specifically to test for a difference between means, because it makes no assumptions about the form of the sample distributions.
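As a minimal sketch of how such a test might be run in Python, the snippet below uses `scipy.stats.ranksums` on two synthetic samples. The data, the seed, and the names `sample_a` and `sample_b` are assumptions made for illustration and do not come from the book; the samples are drawn with the same skewed shape and differ only by a shift, which is the situation in which a location interpretation is reasonable.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two right-skewed samples with the same shape,
# where the second sample is shifted upward by 1.0.
rng = np.random.default_rng(42)
sample_a = rng.exponential(scale=1.0, size=40)
sample_b = rng.exponential(scale=1.0, size=40) + 1.0

# Wilcoxon rank-sum test; SciPy uses a normal approximation and a
# two-sided alternative by default.
statistic, p_value = stats.ranksums(sample_a, sample_b)

print(f"z statistic: {statistic:.3f}")
print(f"p-value:     {p_value:.4f}")
```

A negative statistic indicates that the first sample tends to have lower ranks than the second. Note that `ranksums` does not correct for ties or apply a continuity correction; `scipy.stats.mannwhitneyu` is a closely related alternative that does.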

The test statistic procedure

The test procedure is straightforward. The process is outlined here and an example...
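The sketch below computes the statistic by hand using the standard textbook formulation with a large-sample normal approximation; it is not necessarily the exact sequence of steps the book presents. The function name `rank_sum_test` and the two small samples are hypothetical. The steps are: pool the two samples, rank the pooled values, sum the ranks belonging to the first sample, and standardize that sum using its mean and variance under the null hypothesis.

```python
import numpy as np
from scipy import stats

def rank_sum_test(x, y):
    """Two-sided rank-sum test via the normal approximation (no tie correction)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)

    # Rank the pooled observations (ties would receive average ranks).
    ranks = stats.rankdata(np.concatenate([x, y]))

    # Test statistic: sum of the ranks that belong to the first sample.
    w = ranks[:n1].sum()

    # Mean and standard deviation of the rank sum under the null hypothesis.
    expected = n1 * (n1 + n2 + 1) / 2.0
    std = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)

    z = (w - expected) / std
    p = 2.0 * stats.norm.sf(abs(z))
    return w, z, p

# Hypothetical samples for illustration only.
a = [1.1, 2.3, 0.7, 3.4, 2.9]
b = [4.2, 3.8, 5.1, 2.5, 6.0]

w, z, p = rank_sum_test(a, b)
print(f"rank sum: {w}, z: {z:.3f}, p-value: {p:.4f}")
```

For samples this small the normal approximation is rough, so the p-value should be treated as indicative only. The z value here should agree with the statistic returned by `scipy.stats.ranksums(a, b)`, which uses the same approximation.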
