Linear regression is a statistical model identifying a relationship between numeric variables. Given a set of objects described by the y attribute and the x1, …,
and xn
features, the model defines a relationship between the features and the attribute. The relationship is described by the linear function y = a0 + a1 * x1 + … + an * xn, and a0, …,
and an
are parameters defined by the method in such a way that the relationship is as close as possible to the data.
In the case of machine learning, linear regression can be used to predict a numeric attribute. The algorithm learns from the training dataset to determine the parameters. Then, given a new object, the model inserts its features into the linear function to estimate the attribute.
In our example, we want to estimate the population of a country starting from its area. First, let's visualize the data about the area (in thousand km2) and the population (in millions), as shown in the following figure:
Most of the countries...