Book Image

Building Statistical Models in Python

By : Huy Hoang Nguyen, Paul N Adams, Stuart J Miller
Book Image

Building Statistical Models in Python

By: Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Overview of this book

The ability to proficiently perform statistical modeling is a fundamental skill for data scientists and essential for businesses reliant on data insights. Building Statistical Models with Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation. This book not only equips you with skills to navigate the complexities of statistical modeling, but also provides practical guidance for immediate implementation through illustrative examples. Through emphasis on application and code examples, you’ll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical models, including hypothesis testing, regression, time series analysis, classification, and more. By the end of this book, you’ll gain fluency in statistical modeling while harnessing the full potential of Python's rich ecosystem for data analysis.
Table of Contents (22 chapters)
1
Part 1:Introduction to Statistics
7
Part 2:Regression Models
10
Part 3:Classification Models
13
Part 4:Time Series Models
17
Part 5:Survival Analysis

Statistical measurements

When using time-series models to work with serially correlated data sets, we need to understand mean and variance – within the context of time – in addition to autocorrelation and cross-correlation. Understanding these variables helps build an intuition about how time-series models work and when they are more useful than models that do not account for time.

Mean

In time-series analysis, the sample mean of a series is the sum of all values across each point in time in the series divided by the count of values. Where t represents each discrete time step and n is the total number of time steps, we can calculate the sample mean of a time series as follows:

 _ X  =  1 _ n   t=1 n x t

There are two types of processes generating time series; one is an ergodic process and the other is non-ergodic. An ergodic process has consistent output independent of time, whereas a non-ergodic...