L2 Regularization, also known as Ridge Regularization, adds a penalty term to the loss function that is proportional to the sum of the squared weights.
This technique makes linear regression (and logistic regression) models more robust by penalizing large coefficients. It encourages smaller weights overall and spreads weight more evenly across features.
Key Points
- Overfitting Mitigation: Ridge helps reduce overfitting, especially in high-dimensional datasets (many predictors relative to observations), and is effective at managing multicollinearity among predictors.
- Coefficient Shrinkage: Unlike Lasso regularization (L1), which can eliminate some features entirely by driving their coefficients to zero, Ridge reduces the magnitudes of coefficients but retains all features.
- Multicollinearity Handling: Particularly useful when predictors are highly correlated, as it stabilizes estimates by shrinking the coefficients of correlated features.
- Feature Retention: Ridge keeps every feature in the model, unlike Lasso, which can perform feature selection by setting some coefficients exactly to zero.
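To illustrate the multicollinearity point, here is a minimal NumPy sketch on synthetic data. It assumes no intercept. The helper `ridge_fit` is hypothetical, written here for illustration rather than taken from a library, and uses the closed-form ridge solution \( (X^\top X + \lambda I)^{-1} X^\top y \). The two predictors are nearly identical, so OLS (λ = 0) can return large, unstable coefficients. Ridge tends to split the weight roughly evenly between the two predictors.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Hypothetical helper: closed-form ridge solution (X^T X + lam*I)^(-1) X^T y, no intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)      # x2 is almost a copy of x1 (strong multicollinearity)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)  # the true signal depends only on x1

beta_ols = ridge_fit(X, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, lam=1.0)  # a modest penalty stabilizes the estimates

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

Rerunning with a different seed typically changes the OLS coefficients far more than the ridge coefficients.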
Understanding Ridge Regularization
1. Purpose of Ridge Regularization
- Penalty Addition: Adds a penalty term to the loss function, proportional to the square of the coefficients (weights), discouraging overly complex models by shrinking the coefficients.
2. Mathematical Formulation
- The loss function for Ridge regression can be expressed as:
  $$\text{Loss} = \text{SSE} + \lambda \sum_{j=1}^{p} \beta_j^2$$
- Where:
  - SSE (Sum of Squared Errors) is the original loss function for linear regression.
  - \( \lambda \) is the regularization parameter that controls the strength of the penalty.
  - \( \beta_j \) are the coefficients of the model.
  - \( p \) is the number of predictors.
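As a quick worked check of the formula, the sketch below computes the ridge loss directly with NumPy. The function name `ridge_loss` and the toy numbers are illustrative assumptions, and no intercept is included. The coefficients are chosen so that the fit is perfect (SSE = 0). The loss is therefore entirely the penalty term, \( 0.5 \times (1^2 + 2^2) = 2.5 \), which shows that ridge still prefers smaller coefficients even when the data are fit exactly.

```python
import numpy as np

def ridge_loss(X, y, beta, lam):
    """Illustrative ridge objective: SSE + lam * sum_j beta_j^2 (no intercept)."""
    residuals = y - X @ beta
    sse = np.sum(residuals ** 2)       # original least-squares loss
    penalty = lam * np.sum(beta ** 2)  # L2 penalty over the p coefficients
    return sse + penalty

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 0.0]])
y = np.array([5.0, 4.0, 3.0])
beta = np.array([1.0, 2.0])  # X @ beta equals y exactly, so SSE = 0

print(ridge_loss(X, y, beta, lam=0.0))  # 0.0 (perfect fit, no penalty)
print(ridge_loss(X, y, beta, lam=0.5))  # 2.5 = 0.5 * (1^2 + 2^2)
```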
3. Effect of the Regularization Parameter (λ)
- Range: λ can take any non-negative value, from 0 upward without bound.
- Impact:
  - A small λ (close to 0) makes the model behave much like ordinary least squares (OLS) regression, with minimal regularization. At λ = 0 the two are identical.
  - A large λ increases the penalty and shrinks the coefficients toward zero. The result is a simpler model with lower variance but higher bias.
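The effect of λ can be seen with the closed-form ridge solution \( \hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y \), which minimizes the loss above when there is no intercept. In this NumPy sketch on synthetic data, the helper `ridge_fit`, the data, and the λ grid are all illustrative assumptions. It shows the coefficient norm falling as λ grows and confirms that λ = 0 reproduces the OLS solution.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Hypothetical helper: closed-form ridge coefficients, no intercept."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
true_beta = np.array([4.0, -2.0, 0.0, 1.0, 3.0])
y = X @ true_beta + rng.normal(scale=1.0, size=100)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = []
for lam in lambdas:
    beta = ridge_fit(X, y, lam)
    norms.append(np.linalg.norm(beta))  # overall coefficient size shrinks as lam grows
    print(f"lambda={lam:>7}: ||beta|| = {norms[-1]:.3f}")

# lam = 0 reproduces ordinary least squares
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(ridge_fit(X, y, 0.0), beta_ols))
```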
4. Finding the Best λ
- Use techniques such as cross-validation to choose λ. Fit the model for a range of candidate values, evaluate each one on held-out data, and select the value that minimizes the validation prediction error (for example, mean squared error averaged across folds).
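One way to implement this selection is plain k-fold cross-validation, sketched below in NumPy. The helpers `ridge_fit` and `cv_mse`, the synthetic data, and the candidate grid are hypothetical choices for illustration. In practice a library routine, such as scikit-learn's `RidgeCV` or `GridSearchCV`, would usually be used instead. The same folds are reused for every λ so that the scores are comparable.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Hypothetical helper: closed-form ridge coefficients, no intercept."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    """Hypothetical helper: mean validation MSE of ridge over k folds."""
    rng = np.random.default_rng(seed)  # fixed seed -> identical folds for every lam
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[val] - X[val] @ beta) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
true_beta = np.zeros(20)
true_beta[:3] = [3.0, -2.0, 1.5]  # only three predictors actually matter
y = X @ true_beta + rng.normal(scale=2.0, size=60)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(X, y, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)  # lam with the lowest validation error
print(scores)
print("best lambda:", best_lam)
```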
Example Code
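The closed form is not the only way to fit ridge regression. The sketch below minimizes the same objective by batch gradient descent, using the gradient \( -2X^\top(y - X\beta) + 2\lambda\beta \), and checks that it converges to the closed-form solution. The function `ridge_gradient_descent` and its learning rate and iteration count are illustrative assumptions tuned for this synthetic data (with no intercept). They are not general-purpose defaults.

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, lr=0.001, n_iters=2000):
    """Hypothetical helper: minimize SSE + lam * ||beta||^2 by batch gradient descent."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # gradient of SSE is -2 X^T (y - X beta); gradient of the penalty is 2 * lam * beta
        grad = -2.0 * X.T @ (y - X @ beta) + 2.0 * lam * beta
        beta -= lr * grad
    return beta

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=200)

lam = 5.0
beta_gd = ridge_gradient_descent(X, y, lam)
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)  # closed-form reference
print("gradient descent:", beta_gd)
print("closed form:     ", beta_closed)
```

The penalty's contribution to the gradient, \( 2\lambda\beta \), pulls every coefficient toward zero on each step.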
Understanding the Content
- L2 Regularization (Ridge): This technique improves model generalization by penalizing large coefficients, which helps reduce overfitting and handle multicollinearity. The regularization parameter λ controls the trade-off between fitting the training data well and keeping the model coefficients small.
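L2 Regularization (Ridge) for Neural Networks
Adds a penalty term to the loss: \( L_{\text{regularized}} = L + \lambda \lVert W \rVert^2 \), where \( \lVert W \rVert^2 \) is the sum of the squares of all network weights. This discourages overly complex models by penalizing large weights. In neural-network training the same idea is often referred to as weight decay.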
Example:
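Here is a minimal NumPy sketch of the formula above. It assumes the network's weights are given as a list of matrices. The helper names `l2_regularized_loss` and `sgd_step` and all values are hypothetical. Biases are commonly left out of the penalty, and they are omitted here. Because the gradient of \( \lambda \lVert W \rVert^2 \) is \( 2\lambda W \), each SGD step with learning rate \( \eta \) scales the weights by \( (1 - 2\eta\lambda) \) before applying the data gradient.

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, lam):
    """Hypothetical helper: L + lam * (sum of squared entries of every weight matrix)."""
    penalty = sum(np.sum(W ** 2) for W in weights)
    return data_loss + lam * penalty

def sgd_step(W, grad_data, lam, lr):
    """Hypothetical helper: one SGD update on the regularized loss.

    d/dW [lam * ||W||^2] = 2 * lam * W, so each step also shrinks W toward zero.
    """
    return W - lr * (grad_data + 2.0 * lam * W)

W1 = np.array([[0.5, -1.0], [2.0, 0.0]])
W2 = np.array([[1.0], [-3.0]])
print(l2_regularized_loss(data_loss=0.8, weights=[W1, W2], lam=0.01))  # ≈ 0.9525 = 0.8 + 0.01 * 15.25

# With a zero data gradient, the update is pure decay: W1 is scaled by (1 - 2 * lr * lam)
W_next = sgd_step(W1, grad_data=np.zeros_like(W1), lam=0.01, lr=0.1)
print(np.allclose(W_next, (1 - 2 * 0.1 * 0.01) * W1))  # True
```

Deep learning frameworks often expose this through an optimizer `weight_decay` argument (for example, PyTorch's `torch.optim.SGD`). Check the framework's documentation for how its scaling relates to λ.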