Chapter 5: Probabilistic Modeling | 15 Math Concepts Every Data Scientist Should Know

15 Math Concepts Every Data Scientist Should Know

By: David Hoyle

Overview of this book

Data science combines the power of data with the rigor of scientific methodology, with mathematics providing the tools and frameworks for analysis, algorithm development, and deriving insights. As machine learning algorithms become increasingly complex, a solid grounding in math is crucial for data scientists. David Hoyle, with over 30 years of experience in statistical and mathematical modeling, brings unparalleled industrial expertise to this book, drawing from his work in building predictive models for the world's largest retailers. Encompassing 15 crucial concepts, this book covers a spectrum of mathematical techniques to help you understand a vast range of data science algorithms and applications. Starting with essential foundational concepts, such as random variables and probability distributions, you’ll learn why data varies, and explore matrices and linear algebra to transform that data. Building upon this foundation, the book spans general intermediate concepts, such as model complexity and network analysis, as well as advanced concepts, such as kernel-based learning and information theory. Each concept is illustrated with Python code snippets that demonstrate its practical application to solving problems. By the end of the book, you’ll have the confidence to apply key mathematical concepts to your data science challenges.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Share your thoughts

Download a free PDF copy of this book

Part 1: Essential Concepts

Chapter 1: Recap of Mathematical Notation and Terminology

Technical requirements

Number systems

Linear algebra

Sums, products, and logarithms

Differential and integral calculus

Analysis

Combinatorics

Summary

Notes and further reading

Chapter 2: Random Variables and Probability Distributions

Technical requirements

All data is random

Random variables and probability distributions

Sampling from distributions

Understanding statistical estimators

The Central Limit Theorem

Summary

Exercises

Chapter 3: Matrices and Linear Algebra

Technical requirements

Inner and outer products of vectors

Matrices as transformations

Matrix decompositions

Matrix properties

Matrix factorization and dimensionality reduction

Summary

Exercises

Notes and further reading

Chapter 4: Loss Functions and Optimization

Technical requirements

Loss functions – what are they?

Least Squares

Linear models

Gradient descent

Summary

Exercises

Chapter 5: Probabilistic Modeling

Technical requirements

Likelihood

Bayes’ theorem

Bayesian modeling

Bayesian modeling in practice

Summary

Exercises

Part 2: Intermediate Concepts

Chapter 6: Time Series and Forecasting

Technical requirements

What is time series data?

ARIMA models

ARIMA modeling in practice

Machine learning approaches to time series analysis

Summary

Exercises

Notes and further reading

Chapter 7: Hypothesis Testing

Technical requirements

What is a hypothesis test?

Confidence intervals

Type I and Type II errors, and power

Summary

Exercises

Notes and further reading

Chapter 8: Model Complexity

Technical requirements

Generalization, overfitting, and the role of model complexity

The bias-variance trade-off

Model complexity measures for model selection

Summary

Notes and further reading

Chapter 9: Function Decomposition

Technical requirements

Why do we want to decompose a function?

Expanding a function in terms of basis functions

Fourier series

Fourier transforms

The discrete Fourier transform

Summary

Exercises

Chapter 10: Network Analysis

Technical requirements

Graphs and network data

Basic characteristics of graphs

Different types of graphs

Community detection and decomposing graphs

Summary

Exercises

Notes and further reading

Part 3: Selected Advanced Concepts

Chapter 11: Dynamical Systems

Technical requirements

What is a dynamical system and what is an evolution equation?

First-order discrete Markov processes

Higher-order discrete Markov processes

Hidden Markov Models

Summary

Exercises

Notes and further reading

Chapter 12: Kernel Methods

Technical requirements

The role of inner products in common learning algorithms

The kernel trick

An example of a kernelized learning algorithm

Summary

Exercises

Chapter 13: Information Theory

Technical requirements

What is information and why is it useful?

Entropy as expected information

Mutual information

The Kullback-Leibler divergence

Summary

Exercises

Notes and further reading

Chapter 14: Non-Parametric Bayesian Methods

Technical requirements

What are non-parametric Bayesian methods?

Gaussian processes

Dirichlet processes

Summary

Exercises

Chapter 15: Random Matrices

Technical requirements

What is a random matrix?

Using random matrices to represent interactions in large-scale systems

Universal behavior of large random matrices

Random matrices and high-dimensional covariance matrices

Summary

Exercises

Notes and further reading

Index

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share your thoughts

Download a free PDF copy of this book

Likelihood

A simple probabilistic model
