Building Statistical Models in Python

By: Huy Hoang Nguyen, Paul N Adams, Stuart J Miller

Overview of this book

The ability to proficiently perform statistical modeling is a fundamental skill for data scientists and essential for businesses reliant on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles in data assessment, understanding, and inference generation. This book not only equips you with skills to navigate the complexities of statistical modeling, but also provides practical guidance for immediate implementation through illustrative examples. Through emphasis on application and code examples, you’ll understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical methods and models, including hypothesis testing, regression, time series analysis, classification, and more. By the end of this book, you’ll gain fluency in statistical modeling while harnessing the full potential of Python's rich ecosystem for data analysis.
Table of Contents (22 chapters)
Part 1: Introduction to Statistics
Part 2: Regression Models
Part 3: Classification Models
Part 4: Time Series Models
Part 5: Survival Analysis

What is a time series?

In this chapter and the next few chapters, we will work with a type of data called time-series data. Up until this point, we have worked with independent data, that is, data consisting of samples that are not related to one another. A time series is typically a sequence of measurements of the same subject taken over time, which makes the samples in this type of data related. Time series are all around us. A few common examples are daily temperature measurements, stock price ticks, and the heights of ocean tides. While a time series does not need to be measured at fixed intervals, in this book we will primarily be concerned with measurements taken at fixed intervals, such as every day or every second.
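To make the idea of a fixed-interval time series concrete, here is a minimal sketch using pandas and NumPy. The temperature values are simulated for illustration only, and names such as `temps`, `series`, and `is_fixed_interval` are hypothetical choices, not taken from the book. The values are generated as a random walk, which is one simple way to produce samples that depend on the samples before them.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 simulated daily temperature readings (degrees Celsius).
rng = np.random.default_rng(seed=0)

# Timestamps spaced exactly one day apart (a fixed sampling interval).
dates = pd.date_range(start="2023-01-01", periods=10, freq="D")

# A random walk: each value is the previous value plus a small random step,
# so consecutive samples are related rather than independent.
temps = 15.0 + np.cumsum(rng.normal(loc=0.0, scale=1.0, size=10))

series = pd.Series(temps, index=dates, name="temperature_c")

# Check that every gap between consecutive timestamps is exactly one day.
gaps = series.index.to_series().diff().dropna()
is_fixed_interval = bool((gaps == pd.Timedelta(days=1)).all())

print(series.head())
print("fixed interval:", is_fixed_interval)
```

Indexing the values by timestamp, rather than by row number, keeps the ordering and spacing of the samples explicit, which is the property that distinguishes a time series from the independent samples used in earlier chapters.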

Let’s look at some notation. In the following equation, we have a variable x that is repeatedly sampled over time. The subscripts enumerate the sample points (sample 1 through sample t), and the whole series of samples is denoted X. The subscript...