At the core of linear regression is the search for the equation of a line that minimizes the sum of squared errors, that is, the squared differences between the line's y values and the original ones. As a reminder, let's say our regression function is called h, and its predictions are h(X), as in this formulation:
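One common way to write such a formulation is sketched here. The weight vector $w$ is an assumed symbol, as is the convention that $X$ carries a leading column of ones so that the intercept is part of $w$:

$$h(X) = Xw$$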
Consequently, our cost function to be minimized is as follows:
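A cost function of this kind is usually written along the following lines. This is a sketch under stated assumptions: $n$ is the number of observations, $w$ is the vector of coefficients behind $h$, and the $\frac{1}{2n}$ factor is a conventional scaling that does not change where the minimum lies:

$$J(w) = \frac{1}{2n} \sum_{i=1}^{n} \left( h(x_i) - y_i \right)^2$$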
There are quite a few methods to minimize it, some performing better than others in the presence of large quantities of data. Among the better performers, the most important ones are the pseudoinverse (you can find this in books on statistics), QR factorization, and gradient descent.
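To make the three approaches concrete, here is a minimal NumPy sketch that fits the same small problem with each of them. Everything in it is an illustrative assumption rather than part of the original text: the synthetic data, the names `w_pinv`, `w_qr`, `w_gd`, and `gradient_descent`, the learning rate `alpha`, the iteration count, and the choice to scale the squared-error cost by 1/(2n).

```python
import numpy as np

# Synthetic data (illustrative only): y = 2 + 3*x plus a little noise.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), x])

# 1) Pseudoinverse: w = pinv(X) @ y
w_pinv = np.linalg.pinv(X) @ y

# 2) QR factorization: X = Q R, then solve R w = Q^T y
Q, R = np.linalg.qr(X)  # default "reduced" mode, so R is square here
w_qr = np.linalg.solve(R, Q.T @ y)

# 3) Batch gradient descent on J(w) = 1/(2n) * sum((X w - y)^2)
def gradient_descent(X, y, alpha=0.1, n_iter=5000):
    """Hypothetical helper: plain batch gradient descent with a fixed step."""
    n_obs = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n_obs  # gradient of J with respect to w
        w -= alpha * grad
    return w

w_gd = gradient_descent(X, y)

print("pseudoinverse:   ", w_pinv)
print("QR factorization:", w_qr)
print("gradient descent:", w_gd)
```

On a well-conditioned problem like this one, all three should agree closely. The two direct methods return the least-squares solution in one step, while gradient descent approaches it iteratively, and its step size and number of iterations generally need tuning for the data at hand.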