L1 regularization adds a penalty proportional to the absolute values of the model coefficients to the loss function. This penalty encourages sparsity: some coefficients become exactly zero, making it useful for feature selection.
Recall the geometry of the L1 and L2 unit balls: the L1 ball is a diamond whose corners lie on the axes, which is why L1-constrained solutions tend to land on axes (i.e., with some coordinates exactly zero), while the round L2 ball merely shrinks coefficients.
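As a quick numeric illustration of the two norms (the vector here is made up for the example): a point on the L2 unit circle generally has an L1 norm greater than 1, so it sits outside the L1 diamond except at the axis corners.

```python
import numpy as np

v = np.array([0.6, -0.8])

l1 = np.sum(np.abs(v))        # L1 norm: sum of absolute values -> 1.4
l2 = np.sqrt(np.sum(v ** 2))  # L2 norm: Euclidean length -> 1.0 (on the L2 unit circle)
```

Only at the corners (e.g., `[1, 0]` or `[0, -1]`) do the two unit balls touch, which is exactly where a coordinate is zero.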
Loss Function:

    Loss = MSE + λ Σ_i |w_i|

where:
- MSE = Mean Squared Error
- λ = Regularization strength
- w_i = Model weights
Key Properties:
- Adds penalty based on absolute value of coefficients.
- Drives some coefficients exactly to zero, removing less relevant features.
- Produces a sparse model (subset of important features retained).
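The sparsity property can be seen by sweeping the regularization strength on synthetic data (a rough sketch; the exact zero counts depend on the data and random seed):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 20 features, only 5 of which carry signal
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

counts = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=50_000).fit(X, y)
    counts[alpha] = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha:>5}: {counts[alpha]} of 20 coefficients are exactly zero")
```

Stronger regularization (larger alpha) generally zeroes out more coefficients, leaving a sparser model.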
Example (Lasso in scikit-learn):
from sklearn.linear_model import Lasso
# Initialize and fit Lasso model (X_train, y_train assumed already defined)
model = Lasso(alpha=0.1)  # alpha controls regularization strength
model.fit(X_train, y_train)
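After fitting, the sparse coefficient vector doubles as a feature selector. A self-contained sketch (make_regression here stands in for your own X_train/y_train):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in data: 10 features, only 3 informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=0.5, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)

# Indices of features the model kept (nonzero coefficients)
selected = np.flatnonzero(lasso.coef_)
print("selected feature indices:", selected)
```

The `selected` indices can then be used to subset the feature matrix (e.g., `X[:, selected]`) before training a downstream model.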
Use Case:
- Ideal for feature selection when dealing with many predictors.