15 Math Concepts Every Data Scientist Should Know
So far, we’ve learned a lot about random variables and probability distributions: how to calculate key characteristics of a distribution, such as its mean and variance, and what some commonly occurring distributions look like. But it doesn’t yet feel like we’ve learned much about data. We’ll now change that.
We said at the beginning of this chapter that all data is random. This means that when data is captured or generated, we are drawing, or sampling, values from some underlying probability distribution. This is illustrated schematically in Figure 2.10:
Figure 2.10: Diagram illustrating how real data is generated as samples from a population
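To make the idea of "data as samples from a distribution" concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the underlying population is a normal distribution with mean 50 and standard deviation 10 (say, minutes a visitor spends on a website); the function name `draw_sample` and these parameter values are hypothetical and do not come from the text. Each call produces a finite sample, and that sample's summary statistics only approximate the population's.

```python
import random
import statistics

# Hypothetical population used only for illustration: a normal distribution
# with mean 50 and standard deviation 10.
POP_MEAN = 50.0
POP_SD = 10.0


def draw_sample(n, seed=None):
    """Draw n values from the assumed underlying distribution (a finite sample)."""
    rng = random.Random(seed)
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]


# Each dataset we "collect" is a finite sample; its summaries vary from the
# population values, and typically get closer to them as n grows.
for n in (10, 100, 10_000):
    sample = draw_sample(n, seed=42)
    print(f"n={n:>6}: sample mean={statistics.mean(sample):6.2f}, "
          f"sample sd={statistics.stdev(sample):5.2f}")
```

Larger samples tend to give a sample mean closer to the assumed population mean of 50, but any finite sample will generally differ from it somewhat.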
A sample is finite. It represents a snapshot or subset of the entirety of possible outcomes; for example, a subset of all users who might visit a website. But from...