A lag value is simply the value of a series at a previous time step, used as a feature for the current prediction. For example, at time $t$, the feature $y_{t-1}$ is the lagged value from the previous step.
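Lags capture short-term dependencies in a series and are especially useful for models such as Random Forest for Time Series, which have no built-in notion of temporal order and only see the past through their input features.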
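As a minimal sketch of how lag features turn a series into a supervised learning table, the snippet below builds one row per time step from the previous values. The helper name `make_lag_features` and the sample series are hypothetical, chosen only for illustration.

```python
def make_lag_features(series, n_lags):
    """Return (X, y) where each row of X holds the n_lags previous values.

    Row i of X is [y_{t-1}, y_{t-2}, ..., y_{t-n_lags}] for the target y_t.
    The first n_lags observations are dropped because their lags are unknown.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append([series[t - k] for k in range(1, n_lags + 1)])
        y.append(series[t])
    return X, y


series = [3, 5, 4, 6, 7, 6]  # made-up observations
X, y = make_lag_features(series, n_lags=2)
for features, target in zip(X, y):
    print(features, "->", target)
```

In pandas, the same columns are commonly built with `Series.shift(k)`, followed by dropping the rows that contain missing lags.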
Why use lags?
- They help the model learn relationships from recent history.
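- They are useful when modeling Seasonality in Time Series, since repeating patterns often depend on past values (for example, a lag of 7 captures a weekly pattern in daily data).
- Their contribution can be measured by comparing Evaluation Metrics for the same model with and without lag features.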
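The comparison in the last point can be sketched with two very simple stand-ins for a fitted model: a forecast that ignores history (the training mean) and a forecast that uses a seasonal lag of 7. The data are made up, and mean absolute error is only one possible metric; a real comparison would train the same model twice, with and without the lag columns.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily series with a weekly pattern:
# two weeks of training data and one held-out week.
train = [10, 12, 11, 13, 12, 20, 22] * 2
test = [11, 12, 12, 13, 13, 21, 22]

# Without lags: predict the training mean for every test point.
mean_value = sum(train) / len(train)
pred_no_lag = [mean_value] * len(test)

# With a seasonal lag of 7: predict the value observed one week earlier.
history = train + test
pred_lag7 = [history[len(train) + i - 7] for i in range(len(test))]

mae_no_lag = mean_absolute_error(test, pred_no_lag)
mae_lag7 = mean_absolute_error(test, pred_lag7)
print(f"MAE without lags: {mae_no_lag:.2f}")
print(f"MAE with lag 7:   {mae_lag7:.2f}")
```

On this toy series the lag-based forecast has the lower error, because the weekend peak repeats every seven days and the mean cannot represent it.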
Limitations of Lags
While lags can boost short-horizon forecasts, they can cause problems in longer-term forecasting:
- Once the forecast horizon extends past the last observed value, the model no longer has real lag values to use as inputs and must rely on its own predictions instead.
- This creates error propagation: each prediction carries uncertainty, which compounds with each additional step.
Example:
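- At time $t$, given the observed value $y_{t-1}$, the model predicts $\hat{y}_t = 6$.
- At $t+1$, the model must use 6 (a prediction) as input, leading to more uncertainty.
- If it predicts $\hat{y}_{t+1}$ at $t+1$, the forecast for $t+2$ is based on $\hat{y}_{t+1}$ (itself dependent on $\hat{y}_t$), compounding the error.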
This recursive process means performance usually degrades with horizon length.
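The recursive process can be sketched as follows. The "model" here is a hypothetical one-step rule that slightly underestimates the true trend, and the series is synthetic, so the numbers only illustrate the mechanism: with real lags the error stays constant, while feeding predictions back in makes it grow with the horizon.

```python
def one_step_model(prev_value):
    """Hypothetical fitted model: predicts the next value from lag 1.

    The true series grows by 2.0 per step, so this model is off by 0.5
    on every single step.
    """
    return prev_value + 1.5


def recursive_forecast(last_observed, horizon):
    """Forecast `horizon` steps ahead, feeding each prediction back in as the lag."""
    forecasts = []
    lag = last_observed
    for _ in range(horizon):
        pred = one_step_model(lag)
        forecasts.append(pred)
        lag = pred  # the prediction becomes the next lag input
    return forecasts


actual = [10.0 + 2.0 * i for i in range(6)]  # synthetic series: 10, 12, ..., 20

# One-step-ahead forecasts: every input is a real observed lag.
one_step_errors = [
    abs(one_step_model(prev) - nxt) for prev, nxt in zip(actual[:-1], actual[1:])
]

# Recursive forecasts: only the first input is a real observation.
forecasts = recursive_forecast(actual[0], horizon=5)
recursive_errors = [abs(f - a) for f, a in zip(forecasts, actual[1:])]

print("one-step errors: ", one_step_errors)
print("recursive errors:", recursive_errors)
```

The one-step errors stay at 0.5 throughout, whereas the recursive errors grow by 0.5 with every additional step, since each forecast inherits the error of the one before it.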
Beyond Lags
To reduce over-reliance on lagged features, it’s often useful to add other numerical or categorical predictors, such as:
- Calendar features (holiday flags, weekday vs weekend, month/season indicators).
- External variables (e.g., temperature, promotions, economic indicators).
These features can help the model generalize better and reduce error accumulation over time.
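Calendar features are a convenient example because they can be computed for any future date without relying on earlier predictions. The sketch below derives a few of them with the standard library; the `HOLIDAYS` set and the feature names are hypothetical placeholders, not a recommended calendar.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar; a real project would load one for its region.
HOLIDAYS = {date(2024, 12, 25), date(2025, 1, 1)}


def calendar_features(day):
    """Return features that are known in advance for any date."""
    return {
        "weekday": day.weekday(),          # Monday = 0, Sunday = 6
        "is_weekend": day.weekday() >= 5,
        "month": day.month,
        "is_holiday": day in HOLIDAYS,
    }


# Build features for the next five days of a forecast horizon.
start = date(2024, 12, 24)
future_days = [start + timedelta(days=i) for i in range(5)]
rows = [calendar_features(d) for d in future_days]

for d, row in zip(future_days, rows):
    print(d.isoformat(), row)
```

External variables such as temperature differ in one respect: their future values must themselves be known or forecast before they can be used as inputs.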