f_regression is a univariate feature-scoring function provided by sklearn.feature_selection. It evaluates the strength of the linear relationship between each feature in X and a continuous target variable y, using the F-statistic from a simple (one-feature) linear regression.

Specifically:

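  • For each feature, f_regression performs the equivalent of a simple linear regression of y on that feature alone (i.e. one feature at a time). In practice it computes the feature's correlation with y and converts it to an F-statistic.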
  • It computes:
    • The F-statistic, which tests whether there is a linear relationship between the feature and the target.
    • The corresponding p-value, which helps assess the statistical significance of that relationship.
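
The link between correlation and the F-statistic can be checked numerically. For a single feature with Pearson correlation r against the target and n samples, F = r^2 / (1 - r^2) * (n - 2), and the p-value comes from an F distribution with 1 and n - 2 degrees of freedom. The sketch below compares that formula with f_regression on synthetic data. The data, the seed, and the variable names are illustrative assumptions, and the check assumes the default center=True.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression

# Synthetic data (assumption for illustration): 3 features, y depends on the first two.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

f_stats, p_values = f_regression(X, y)

# Recompute the same quantities from the per-feature Pearson correlation.
dof = n - 2
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
f_manual = r**2 / (1.0 - r**2) * dof
p_manual = stats.f.sf(f_manual, 1, dof)

print(np.allclose(f_stats, f_manual), np.allclose(p_values, p_manual))
```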

Statistical Assumptions

  • The relationship between each feature and the target is assumed to be linear.
  • For the p-values to be reliable, errors are assumed to be normally distributed with constant variance.
  • Features are assessed independently — mutual influence or multicollinearity is ignored.
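
A small sketch of what "assessed independently" means in practice: if one column is an exact copy of another, both receive the same score, because each is tested against y on its own. The data and names here are made up for illustration.

```python
import numpy as np
from sklearn.feature_selection import f_regression

# Synthetic data (assumption for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 3.0 * x + rng.normal(size=300)

# Column 1 is an exact copy of column 0, so the two are perfectly collinear.
X = np.column_stack([x, x])

f_stats, p_values = f_regression(X, y)

# Both columns get the same F-statistic; the redundancy is not penalized.
print(f_stats)
print(p_values)
```

A filter based on these scores would keep both columns, even though the second adds no information.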

Limitations

  • f_regression does not account for feature interactions or joint effects.
  • It cannot capture non-linear dependencies.
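  • Its scores reflect linear relevance only, so they can be a poor guide when the downstream model is not linear (e.g., tree-based models or SVMs with non-linear kernels), which may exploit structure that f_regression does not see.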
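
The non-linearity limitation can be illustrated with a target that is fully determined by a feature but not linearly related to it. In this sketch (synthetic, illustrative data) y = x^2 on a grid symmetric around zero, so the correlation between x and y is essentially zero. A second column, |x|, is included only as a contrast because it is close to linear in y.

```python
import numpy as np
from sklearn.feature_selection import f_regression

# x is symmetric around 0, so x and x**2 are uncorrelated.
x = np.linspace(-1.0, 1.0, 201)
y = x**2

# Column 0: the raw feature. Column 1: |x|, which is nearly linear in y.
X = np.column_stack([x, np.abs(x)])

f_stats, p_values = f_regression(X, y)

# Column 0 gets an F-statistic near 0 (p-value near 1) even though
# y is a deterministic function of it; column 1 scores highly.
print(f_stats)
print(p_values)
```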

When to Use f_regression

Use f_regression when:

  • You’re performing linear regression and want to evaluate individual feature relevance.
  • You want a fast, interpretable filter method for selecting features before training.
  • You assume no or limited multicollinearity between features.
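
As a filter step before training, f_regression is commonly passed as the score function of SelectKBest. The following is a minimal sketch under assumed settings: the synthetic dataset from make_regression, k=2, and the choice of LinearRegression are all illustrative rather than recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic regression problem (assumption): 10 features, 2 informative.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=2, noise=5.0, random_state=0
)

# Keep the 2 features with the highest F-statistics, then fit a linear model.
model = make_pipeline(SelectKBest(score_func=f_regression, k=2), LinearRegression())
model.fit(X, y)

# Boolean mask over the original columns showing which ones were kept.
mask = model.named_steps["selectkbest"].get_support()
print("Selected feature indices:", mask.nonzero()[0])
```

Placing the selector inside the pipeline means the scores are recomputed on each training fold during cross-validation, which avoids leaking information from held-out data.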

Do not use f_regression when:

  • Your model is non-linear or non-parametric.
  • Feature interactions are essential to the model’s behavior.
  • You’re working with classification tasks — instead, use f_classif.
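
For a classification target, f_classif has the same calling convention and also returns a pair of arrays (F-statistics, p-values). A brief sketch on a synthetic dataset, where the dataset parameters are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

# Synthetic binary classification problem (assumption for illustration).
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=2, n_redundant=0, random_state=0
)

# One ANOVA F-statistic and one p-value per feature.
f_stats, p_values = f_classif(X, y)
print(f_stats)
print(p_values)
```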

Example: Computing P-values with f_regression

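from sklearn.feature_selection import f_regression

# X: feature matrix of shape (n_samples, n_features); y: continuous target.
# Both are assumed to be defined already.
f_stats, p_values = f_regression(X, y)

# Print both arrays as a tuple: (F-statistics, p-values)
print((f_stats, p_values))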

Example Output:

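(array([56.04804786, 0.17558437]), array([7.19951844e-11, 6.76291372e-01]))

The first array holds the F-statistics and the second holds the p-values, with one entry per feature:

  • Feature 1 is statistically significant (F ≈ 56.0, p ≈ 7.2e-11).
  • Feature 2 is not statistically significant (F ≈ 0.18, p ≈ 0.68).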
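
The same reading can be done programmatically by comparing the p-values with a significance threshold. The sketch below reuses the p-values from the output above; the 0.05 threshold is a conventional choice assumed here, not something f_regression prescribes.

```python
import numpy as np

# p-values copied from the example output above.
p_values = np.array([7.19951844e-11, 6.76291372e-01])

alpha = 0.05  # assumed significance level
significant = p_values < alpha

# Zero-based indices of features whose p-value falls below alpha.
selected = np.flatnonzero(significant)
print(significant)  # [ True False]
print(selected)     # [0]
```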

Documentation: scikit-learn f_regression