## Linear regression

We start by looking at the simplest and most widely used model in statistics, which consists of fitting a straight line to a dataset. We assume we have a dataset of pairs *(x_i, y_i)* that are *i.i.d.*, and we want to fit a model such that:

*y = βx + β_0 + ϵ*

Here, *ϵ* is Gaussian noise. If we assume that *x_i ∈ ℝ^n*, then the expected value can also be written as:

*ŷ = β^T x + β_0*

Or, in matrix notation, we can also include the intercept *β_0* in the vector of parameters and add a column of 1s to *X*, such that *X = (1, x_1, …, x_n)*, to finally obtain:

*ŷ = X^T β*
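To make the matrix notation concrete, here is a small sketch in R (the values and variable names are illustrative, not taken from the text) that builds a design matrix with a column of 1s and computes the predictions as a matrix product:

```r
# Illustrative sketch: build a design matrix with an intercept column
# and compute the fitted values as a matrix product.
x <- c(1.0, 2.0, 3.0)        # three one-dimensional inputs (arbitrary)
X <- cbind(1, x)             # prepend a column of 1s for the intercept
beta <- c(0.5, 2.0)          # (beta_0, beta_1), chosen arbitrarily
y_hat <- X %*% beta          # each prediction is 0.5 + 2 * x_i
```

With the intercept folded into `beta`, a single matrix product replaces the separate slope and intercept terms.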

The following figure shows an example (in one dimension) of a data set with its corresponding regression line:

In R, fitting a linear model is an easy task, as we will see now. Here, we produce a small dataset with artificially generated data, in order to reproduce the previous figure. In R, the function to fit a linear model is `lm()`, and it is the workhorse of this language in many situations. Of course, later in this chapter we will see more advanced algorithms:

`N = 30` …
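The listing is truncated here; a minimal sketch of what such an example could look like (the seed, noise level, and input range are assumptions, not the original code) is:

```r
# A minimal sketch (assumed values, not the original listing):
# simulate noisy linear data and fit a straight line with lm().
set.seed(42)                        # for reproducibility (assumption)
N <- 30
x <- runif(N, 0, 10)                # inputs drawn uniformly (assumption)
y <- 2 * x + 1 + rnorm(N, sd = 1)   # y = beta * x + beta_0 + Gaussian noise
fit <- lm(y ~ x)                    # lm() includes an intercept by default
coef(fit)                           # estimated (Intercept) and slope
```

The estimated coefficients should land close to the true values (intercept near 1, slope near 2), with the gap shrinking as `N` grows or the noise standard deviation decreases.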