In machine learning, validation refers to the process of evaluating a model on data that was not used during training, in order to guide model development.
More precisely:
- You split your data into at least two parts:
  - Training set: used to fit the model parameters
  - Validation set: used to assess model performance during development
The validation set is used to:
- Tune hyperparameters: for example, choosing
  - regularisation strength
  - tree depth in a random forest
  - learning rate in gradient methods

  You train multiple models with different settings and select the one with the best validation performance.
- Select models: compare different model classes (e.g., linear regression vs. gradient boosting) using the same validation set.
- Detect overfitting: if training performance improves while validation performance degrades, the model is fitting noise rather than signal.
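The tuning workflow above can be sketched in a few lines. This is a minimal illustration, not a recipe: the data, the closed-form ridge-regression "model", and the candidate regularisation strengths are all invented for the example.

```python
# Sketch: tune a hyperparameter (ridge regularisation strength) on a validation set.
# All data and candidate values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

# Split: 80 examples for training, 20 held out for validation
X_tr, X_val = X[:80], X[80:]
y_tr, y_val = y[:80], y[80:]

def fit_ridge(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam * I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(w, X, y):
    # Mean squared error of weights w on (X, y)
    return float(np.mean((X @ w - y) ** 2))

# Train one model per candidate setting, score each on the validation set
candidates = [0.001, 0.01, 0.1, 1.0, 10.0]
scores = {lam: mse(fit_ridge(X_tr, y_tr, lam), X_val, y_val) for lam in candidates}

# Select the setting with the best validation performance
best_lam = min(scores, key=scores.get)
print(best_lam, scores[best_lam])
```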
Formal view
Let:
- $D = \{(x_i, y_i)\}_{i=1}^{n}$ be the dataset
- Split $D$ into $D_{\text{train}}$ and $D_{\text{val}}$
- $\ell(\hat{y}, y)$ be the loss function: the error between prediction and truth

Train a model:
$$\hat{f} = \arg\min_{f} \sum_{(x, y) \in D_{\text{train}}} \ell(f(x), y)$$
Evaluate on validation:
$$\text{Err}_{\text{val}} = \frac{1}{|D_{\text{val}}|} \sum_{(x, y) \in D_{\text{val}}} \ell(\hat{f}(x), y)$$
This approximates how well the model generalises to unseen data.
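The validation-error formula translates almost directly into code. In this sketch, squared error stands in for the loss $\ell$, and the "trained model" is a hypothetical stand-in function; both are assumptions for illustration.

```python
# Sketch: computing the average validation error of a trained model.
# The loss, model, and validation data are illustrative assumptions.

def loss(y_hat, y):
    # l(y_hat, y): error between prediction and truth (squared error here)
    return (y_hat - y) ** 2

def validation_error(model, D_val):
    # (1 / |D_val|) * sum of l(f_hat(x), y) over the validation set
    return sum(loss(model(x), y) for x, y in D_val) / len(D_val)

model = lambda x: 2 * x              # stand-in for a trained model f_hat
D_val = [(1, 2.1), (2, 3.9), (3, 6.2)]
print(validation_error(model, D_val))
```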
Distinction from test data
- Validation set: used repeatedly during development
- Test set: used once at the end for an unbiased estimate
Repeated use of the validation data introduces selection bias, which is why a separate test set is needed.
Common extensions
- Cross-validation: instead of a single split, partition the data into $k$ folds and rotate which fold serves as the validation set.
- Time-series validation: use ordered splits (e.g., train on the past, validate on the future) rather than random splits.
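The fold-rotation idea behind cross-validation can be sketched without any libraries. The choice of 5 folds and a mean-predictor "model" below are assumptions made purely for illustration.

```python
# Sketch: k-fold cross-validation with a rotating validation fold.
# The data and the constant-mean "model" are illustrative assumptions.

def k_fold_splits(n, k):
    # Yield (train_indices, val_indices) pairs, rotating the validation fold
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, val

data = [float(v) for v in range(20)]
scores = []
for train_idx, val_idx in k_fold_splits(len(data), k=5):
    # "Train": fit a constant predictor (the training mean)
    prediction = sum(data[i] for i in train_idx) / len(train_idx)
    # "Validate": mean squared error on the held-out fold
    fold_mse = sum((data[i] - prediction) ** 2 for i in val_idx) / len(val_idx)
    scores.append(fold_mse)

# Average validation error across folds estimates generalisation performance
print(sum(scores) / len(scores))
```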
Simple example
- Train a model on 80% of the data
- Evaluate accuracy on the remaining 20%
- Adjust hyperparameters and repeat until performance stabilises
Future questions:
- how validation interacts with model selection bias
Related: