Regularization: Ridge and Lasso
Regularization is an important concept in machine learning; it's used to counteract overfitting. In the world of big data, it's easy to overfit a model to the training set. When this happens, the model will often perform badly on the test set, as indicated by mean_squared_error or some other error metric.
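To make this concrete, here is a minimal sketch, assuming scikit-learn is available and using a synthetic dataset in place of real data, of how a model can be fit on a training set and then scored on a held-out test set with mean_squared_error. The dataset, model choice, and parameter values here are illustrative assumptions, not part of the original text:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic regression data stands in for any real dataset.
X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)

# Hold out a portion of the data as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Compare the error on data the model has seen versus data it has not.
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Train MSE: {train_mse:.2f}")
print(f"Test MSE:  {test_mse:.2f}")  # often noticeably higher when the model overfits
```

A large gap between the training error and the test error is the usual warning sign of overfitting, and it is exactly what regularization methods such as Ridge and Lasso are designed to reduce.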
You may wonder why a test set is kept aside at all. Wouldn't the most accurate machine learning model come from fitting the algorithm on all the data?
The answer, generally accepted by the machine learning community after years of research and experimentation, is probably not.
There are two main problems with fitting a machine learning model on all the data:
- There is no way to test the model on unseen data. Machine learning models are powerful when they make good predictions on new data. Models are trained on known results, but they perform in the real world on data that has never been seen before. It's not vital to see...