Description
Chapter number: 4.5
Page number: 257
Cell number in the notebook: No response
Environment used to run the notebook: None
Question
Hello,
A little question about regularization.
Non-parametric models, like random forests, make no assumptions about the distribution of the data and can adapt to almost any shape. The downside is that they can also fit the noise and local errors, which leads to overfitting. In this context, I understand perfectly why regularization is needed: we constrain the model so it does not adapt too closely to the training data, which prevents overfitting.
But in the case of a parametric model with few parameters, like a linear model, I don't understand the point of Lasso, Ridge, Elastic Net regression, etc. The Gauss-Markov theorem shows that minimizing the sum of squared errors gives the best linear unbiased estimator of the parameters, i.e. the one with minimum variance among all unbiased estimators. Is there a mathematical justification for adding a penalty to this natural loss function? Or, if it is purely empirical, why is there this difference between theory and practice?
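To make the question concrete, here is a small toy sketch of my own (not from the notebook, and the dataset and alpha value are arbitrary): the features are strongly correlated, and in such cases Ridge often gives a lower cross-validated error than plain least squares, even though the least-squares estimator is unbiased. Is this kind of variance reduction the whole justification?

```python
# Hypothetical toy comparison (my own example): plain least squares vs. Ridge
# on a small dataset with strongly correlated features, where the unbiased OLS
# solution tends to have high variance.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(42)
n_samples, n_features = 30, 10

# Each feature is a noisy copy of the same underlying signal (high collinearity).
base = rng.randn(n_samples, 1)
X = base + 0.1 * rng.randn(n_samples, n_features)
y = base.ravel() + 0.5 * rng.randn(n_samples)

for name, model in [("OLS", LinearRegression()), ("Ridge", Ridge(alpha=1.0))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"{name}: mean cross-validated MSE = {-scores.mean():.3f}")
```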
Thanks