15 Math Concepts Every Data Scientist Should Know
To understand the probability distribution that the data follows, we’ll look at an explicit example of how a random component is incorporated into data.
We’ll start with the simplest way in which we can introduce a random component into our observations of the response (target) variable $y$, namely by adding noise to a deterministic quantity. In fact, we’ll just consider the observations $y_i$ in our dataset to be noise-corrupted versions of a model output $\hat{y}_i$. So, we have this relationship:

$$y_i = \hat{y}_i + \varepsilon_i \qquad \text{(Eq. 1)}$$

Here, $\varepsilon_i$ is the noise value that has been added to the model output $\hat{y}_i$ to get the observation $y_i$ for the $i^{\text{th}}$ datapoint. The value $\varepsilon_i$ is a random variable. Without loss of generality, we can assume its expectation value is zero, so we have $\mathbb{E}(\varepsilon_i) = 0$. We can make this assumption because if the expectation of $\varepsilon_i$ was non-zero, it would mean we have a non-zero deterministic average contribution from $\varepsilon_i$ that we could just absorb into the definition of $\hat{y}_i$
...
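The additive-noise relationship in Eq. 1 can be illustrated with a short simulation. This is only a sketch under stated assumptions: the straight-line `model_output` function, the Gaussian shape of the noise, the noise scale of 0.5 and the offset `mu` are all hypothetical choices made for illustration, not something the text specifies. Here `y` plays the role of the observations, `y_hat` the model output and `eps` the noise. The second half shows the "absorb it into the model output" argument: noise with a non-zero mean behaves like zero-mean noise around a shifted model output.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n = 10_000
x = rng.uniform(0.0, 1.0, size=n)


def model_output(x, intercept=1.0, slope=2.0):
    """Hypothetical deterministic model output (a straight line)."""
    return intercept + slope * x


y_hat = model_output(x)

# Zero-mean noise. The Gaussian shape and the scale 0.5 are assumptions.
eps = rng.normal(loc=0.0, scale=0.5, size=n)

# Eq. 1: observation = model output + noise
y = y_hat + eps

# Now suppose the noise had a non-zero expectation mu instead.
mu = 0.3
eps_shifted = rng.normal(loc=mu, scale=0.5, size=n)
y_shifted = y_hat + eps_shifted

# The constant mu can be absorbed into the model output (here, its intercept),
# leaving a noise term whose sample mean is again close to zero.
y_hat_absorbed = model_output(x, intercept=1.0 + mu)
eps_centred = y_shifted - y_hat_absorbed

print("mean of zero-mean noise:     ", eps.mean())
print("mean of shifted noise:       ", eps_shifted.mean())
print("mean after absorbing offset: ", eps_centred.mean())
```

The sample means of `eps` and `eps_centred` should both be close to zero, while `eps_shifted` averages close to `mu`. The observations `y_shifted` are the same in either description; only the split between the deterministic part and the noise changes.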