Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying 15 Math Concepts Every Data Scientist Should Know
  • Table Of Contents Toc
  • Feedback & Rating feedback
15 Math Concepts Every Data Scientist Should Know

15 Math Concepts Every Data Scientist Should Know

By : David Hoyle
4.3 (6)
close
close
15 Math Concepts Every Data Scientist Should Know

15 Math Concepts Every Data Scientist Should Know

4.3 (6)
By: David Hoyle

Overview of this book

Data science combines the power of data with the rigor of scientific methodology, with mathematics providing the tools and frameworks for analysis, algorithm development, and deriving insights. As machine learning algorithms become increasingly complex, a solid grounding in math is crucial for data scientists. David Hoyle, with over 30 years of experience in statistical and mathematical modeling, brings unparalleled industrial expertise to this book, drawing from his work in building predictive models for the world's largest retailers. Encompassing 15 crucial concepts, this book covers a spectrum of mathematical techniques to help you understand a vast range of data science algorithms and applications. Starting with essential foundational concepts, such as random variables and probability distributions, you’ll learn why data varies, and explore matrices and linear algebra to transform that data. Building upon this foundation, the book spans general intermediate concepts, such as model complexity and network analysis, as well as advanced concepts such as kernel-based learning and information theory. Each concept is illustrated with Python code snippets demonstrating their practical application to solve problems. By the end of the book, you’ll have the confidence to apply key mathematical concepts to your data science challenges.
Table of Contents (21 chapters)
close
close
Lock Free Chapter
1
Part 1: Essential Concepts
7
Part 2: Intermediate Concepts
13
Part 3: Selected Advanced Concepts

Random matrices and high-dimensional covariance matrices

The examples of large random matrices in the previous section were all square matrices. However, in real-world data science, not all matrices are square. Take the <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:munder underaccent="false"><mml:mrow><mml:munder underaccent="false"><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:math> data matrix that we encountered in Chapter 3 when doing Principal Component Analysis (PCA). It is an <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:math> matrix, where <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:mi>N</mml:mi></mml:math> is the number of data points and <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:mi>d</mml:mi></mml:math> is the number of features. We will assume, for this section, that the data has already been mean-centered, so that the sum of each column of <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:munder underaccent="false"><mml:mrow><mml:munder underaccent="false"><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:math> is 0.

The <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:munder underaccent="false"><mml:mrow><mml:munder underaccent="false"><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:math> matrix is what we use to do PCA. It is also the design matrix that we use when building statistical models. So, the <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:munder underaccent="false"><mml:mrow><mml:munder underaccent="false"><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:math> matrix is non-square (unless <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mi>d</mml:mi><mml:mo>)</mml:mo></mml:math>. However, in practice, we usually derive a square matrix from <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:munder underaccent="false"><mml:mrow><mml:munder underaccent="false"><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:math>. For example, when doing PCA, we would calculate the sample covariance matrix <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:mover accent="true"><mml:mrow><mml:munder underaccent="false"><mml:mrow><mml:munder underaccent="false"><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math>, which is defined as follows:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mrow><mover><munder><munder><mi>C</mi><mo stretchy="true">_</mo></munder><mo stretchy="true">_</mo></munder><mo stretchy="true">ˆ</mo></mover><mo>=</mo><mfrac><mn>1</mn><mrow><mi>N</mi><mo>−</mo><mn>1</mn></mrow></mfrac><msup><munder><munder><mi>X</mi><mo stretchy="true">_</mo></munder><mo stretchy="true">_</mo></munder><mi mathvariant="normal">⊤</mi></msup><munder><munder><mi>X</mi><mo stretchy="true">_</mo></munder><mo stretchy="true">_</mo></munder></mrow></mrow></math>

Eq.10

The <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:mover accent="true"><mml:mrow><mml:munder underaccent="false"><mml:mrow><mml:munder underaccent="false"><mml:mrow><mml:mi>C</mml:mi></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:mrow><mml:mo>_</mml:mo></mml:munder></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:math> matrix in Eq.10 is <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:math> and symmetric. If we had many features, it would be a large matrix. Since <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mover><munder><munder><mi>C</mi><mo stretchy="true">_</mo></munder><mo stretchy="true">_</mo></munder><mo stretchy="true">ˆ</mo></mover></mrow></math>is derived from our data, which contains...

Visually different images
CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
15 Math Concepts Every Data Scientist Should Know
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon