L1 regularization adds a penalty proportional to the absolute values of the model coefficients to the loss function. This penalty encourages sparsity-some coefficients become exactly zero-making it useful for feature selection.

Recall the geometry of the L1 and L2 unit balls: the L1 ball has corners on the coordinate axes, which is why an L1 penalty tends to push solutions onto those axes, i.e., toward coefficients that are exactly zero.

Loss Function:

  Loss = MSE + λ · Σᵢ |wᵢ|

where:

  • MSE = Mean Squared Error
  • λ = Regularization strength
  • wᵢ = Model weights
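As a minimal sketch, the loss above can be computed directly with NumPy (the function name `lasso_loss` and the tiny dataset are illustrative, not part of any library API):

```python
import numpy as np

def lasso_loss(X, y, w, lam):
    """Mean squared error plus an L1 penalty on the weights."""
    residuals = y - X @ w
    mse = np.mean(residuals ** 2)          # MSE term
    l1_penalty = lam * np.sum(np.abs(w))   # λ · Σ|wᵢ|
    return mse + l1_penalty

# Tiny example: 3 samples, 2 features, and weights that fit y exactly,
# so the loss reduces to the penalty term alone: 0.1 * (1 + 2)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 2.0])

print(lasso_loss(X, y, w, lam=0.1))
```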

Key Properties:

  • Adds penalty based on absolute value of coefficients.
  • Drives some coefficients to exactly zero, effectively removing less relevant features.
  • Produces a sparse model (subset of important features retained).
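The sparsity effect can be sketched on synthetic data (the feature count, noise level, and alpha here are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Target depends only on the first two features; the other eight are noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1)
model.fit(X, y)

# Most entries of coef_ are exactly 0; the informative features survive
print(model.coef_)
print(np.count_nonzero(model.coef_))
```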

Example (Lasso in scikit-learn):

from sklearn.linear_model import Lasso

# Initialize and fit Lasso model; alpha controls regularization strength
model = Lasso(alpha=0.1)
model.fit(X_train, y_train)  # X_train, y_train: your training features and targets

Use Case:

  • Ideal for feature selection when dealing with many predictors.
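One way to apply this for feature selection is scikit-learn's SelectFromModel wrapper, which drops columns whose Lasso coefficients are (near) zero. A sketch on synthetic data (dataset shape and alpha are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
# Only feature 3 drives the target; the rest are noise
y = 5.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

selector = SelectFromModel(Lasso(alpha=0.1)).fit(X, y)
X_selected = selector.transform(X)  # keeps only columns with nonzero coefficients
print(X_selected.shape)
```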
