Summary

Regularization is a technique in machine learning that reduces the risk of overfitting by adding a penalty term to the loss function during training. The penalty restricts the magnitude of the model's parameters, controlling the model's complexity and reducing the chance of fitting noise in the training data.

Common forms of regularization include L1 regularization (Lasso) and L2 regularization (Ridge).

Regularization is used to improve model performance by preventing overfitting. It is especially useful in linear models but can also be applied to more complex models such as neural networks.

Lasso results in sparse models by setting less important feature weights to zero, while Ridge shrinks all feature coefficients, reducing their impact without eliminating any features.
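
To make the contrast concrete, here is a minimal scikit-learn sketch; the synthetic data and alpha values are illustrative assumptions, not values from the text. Lasso tends to zero out uninformative coefficients, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 100 samples, 20 features, but only 5 carry signal.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# Lasso typically sets uninformative coefficients exactly to zero;
# Ridge shrinks them but rarely makes any exactly zero.
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
```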

Dropout is another regularization technique specifically used in neural networks, randomly dropping units during training to encourage robust feature learning.
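
As a rough sketch of how dropout is wired into a network (assuming PyTorch; the layer sizes and rate p=0.5 are illustrative choices, not values from the text):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations during training
    nn.Linear(64, 1),
)

x = torch.randn(8, 100)

model.train()            # dropout is active in training mode
out_train = model(x)

model.eval()             # dropout is disabled at evaluation time
out_eval = model(x)
```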

Breakdown

Key Components:

  • L1 regularization (Lasso): Adds the absolute values of the coefficients to the loss function, encouraging sparsity.
  • L2 regularization (Ridge): Adds the squares of the coefficients to the loss function, shrinking them toward zero.
  • Elastic Net: Combines the L1 and L2 penalties (see the sketch after this list).
  • Dropout: A neural network regularization method that drops units at random during training to prevent over-reliance on specific neurons.
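
For Elastic Net in particular, a minimal sketch using scikit-learn follows; alpha sets the overall penalty strength and l1_ratio balances the L1 and L2 terms (the values shown are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# l1_ratio=0.5 weights the L1 and L2 penalties equally.
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print("Nonzero coefficients:", int((enet.coef_ != 0).sum()))
```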

Important

  • Regularization adds a penalty term to the loss function to avoid overfitting.
  • L1 encourages feature sparsity, while L2 reduces coefficient magnitudes.
  • Dropout enhances generalization by preventing unit co-adaptation in neural networks.

Attention

  • Over-penalizing parameters can lead to underfitting, where the model becomes too simplistic.
  • Choosing the right penalty strength (i.e., the value of $\lambda$) is crucial for balancing bias and variance.

Example

Consider a linear regression model with L2 regularization (Ridge). The loss function would be:

$$\text{Loss} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

Here, $\lambda$ controls the strength of the regularization. Higher values of $\lambda$ shrink the coefficients more.
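
As a sanity check on the formula, here is a small NumPy sketch that evaluates this loss directly; the data, coefficients, and lambda are made up for illustration (the intercept is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

beta = np.array([1.8, -0.9, 0.4])            # candidate coefficients
lam = 0.5                                    # regularization strength lambda

squared_error = np.sum((y - X @ beta) ** 2)  # sum of squared residuals
penalty = lam * np.sum(beta ** 2)            # L2 penalty term
loss = squared_error + penalty
print(f"ridge loss: {loss:.3f}")
```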

Models that Use Regularization

Regularization is widely used in linear models but is also applied in other machine learning models, particularly those prone to overfitting:

  1. Linear Models:

    • Linear regression, logistic regression, and support vector machines (SVM) can use L1 (Lasso) and L2 (Ridge) regularization to avoid overfitting by controlling the size of the model coefficients.
  2. Neural Networks:

    • Neural networks are highly flexible and can overfit when trained on complex datasets. L2 regularization (also called weight decay) is commonly used to penalize large weights; a minimal sketch follows this list.
    • Dropout is another form of regularization for neural networks, where randomly selected neurons are ignored (dropped out) during training to reduce overfitting.
  3. Tree-Based Models:

    • Tree-based models, such as Random Forests and Gradient Boosting, can also be regularized, although they do not use L1 or L2 penalties directly. Instead, they are regularized through hyperparameters such as max depth, min samples split, and learning rate, which control the complexity of the trees (see the sketch below).
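
As referenced above, here is a minimal weight-decay sketch, assuming PyTorch; the model, learning rate, and decay factor are illustrative. In PyTorch, the L2 penalty is applied through the optimizer's weight_decay argument:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
# weight_decay adds an L2 penalty on the parameters at each update step.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

x = torch.randn(32, 10)
y = torch.randn(32, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```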
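
And a sketch of regularizing a tree ensemble through its hyperparameters, using scikit-learn's GradientBoostingRegressor; the values shown are illustrative, not tuned:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

gbr = GradientBoostingRegressor(
    max_depth=3,           # cap tree depth to limit complexity
    min_samples_split=10,  # require enough samples before splitting a node
    learning_rate=0.05,    # shrink each tree's contribution
    n_estimators=200,
    random_state=0,
).fit(X, y)

print("Training R^2:", round(gbr.score(X, y), 3))
```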