Model training is the numerical optimisation procedure that adjusts a model’s parameters so that its predictions approximate a target function, as judged on observed data.

More formally:

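Given:

  • A dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$
  • A parametric model $f_\theta$ with parameters $\theta$
  • A loss function $\ell(\hat{y}, y)$

Training solves (explicitly or approximately):

$$\hat{\theta} = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i), y_i\big)$$

The outcome is an estimate $\hat{\theta}$ that minimises the empirical risk, i.e. the average loss over the training data.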
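The empirical risk can be written directly as a function. This is a minimal Python sketch, assuming scalar inputs and targets; `empirical_risk`, `squared_loss`, `linear_model` and the toy dataset are illustrative names and values, not taken from any library. Training would then search over the parameters of the model to make this value as small as possible.

```python
from typing import Callable, Sequence, Tuple


def empirical_risk(
    model: Callable[[float], float],
    loss: Callable[[float, float], float],
    data: Sequence[Tuple[float, float]],
) -> float:
    """Average loss (1/n) * sum_i loss(model(x_i), y_i) over the dataset."""
    if not data:
        raise ValueError("dataset must be non-empty")
    return sum(loss(model(x), y) for x, y in data) / len(data)


def squared_loss(y_hat: float, y: float) -> float:
    return (y_hat - y) ** 2


def linear_model(x: float) -> float:
    # A fixed, hypothetical parameter setting: f(x) = 2x + 1
    return 2.0 * x + 1.0


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 4.0)]
print(empirical_risk(linear_model, squared_loss, data))  # → 0.3333333333333333
```

Only the third point is mispredicted (5 instead of 4), so the risk is 1/3.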


What Actually Changes During Training

Training modifies model parameters:

  • Linear regression: weights and bias
  • Neural networks: weight matrices and bias vectors
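  • Tree-based models: split features, split thresholds and leaf values (the tree structure itself is grown during fitting)
  • SVM: weight vector and bias (in the kernel case, dual coefficients, which are non-zero only for the support vectors)

For most parametric models the architecture is fixed beforehand: training does not change the model class, only its parameter values. Tree-based models are a partial exception, since fitting also determines the tree’s structure, although the model class (for example, trees up to a chosen maximum depth) is still fixed in advance.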


Core Components

1. Forward Pass

Compute predictions:

$$\hat{y}_i = f_\theta(x_i)$$
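For the neural-network case listed earlier, the forward pass is a chain of affine maps and non-linearities. The sketch below is a hypothetical two-layer network in plain Python, assuming lists instead of arrays and ReLU as the activation; the parameter values are arbitrary and untrained.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # list of rows
Params = Tuple[Matrix, Vector, Matrix, Vector]


def dense(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine layer: returns W x + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def forward(params: Params, x: Vector) -> Vector:
    """Forward pass: y_hat = W2 relu(W1 x + b1) + b2."""
    W1, b1, W2, b2 = params
    return dense(W2, b2, relu(dense(W1, b1, x)))


# Illustrative (untrained) parameters: W1 is 2x2, W2 is 1x2
params: Params = (
    [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0],
    [[1.0, 2.0]], [0.5],
)
print(forward(params, [2.0, 1.0]))  # → [4.5]
```

Training would adjust the four entries of `params`; the function `forward` itself stays fixed.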

2. Loss Computation

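Measure the discrepancy between prediction and target:

  • Regression: mean squared error, $\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$
  • Classification: cross-entropy
  • Others: hinge loss (classification), mean absolute error (MAE, regression), etc.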
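The losses named above can be sketched as plain Python functions. These are illustrative implementations, assuming predicted class probabilities with integer class labels for cross-entropy, and raw scores with labels in {-1, +1} for the hinge loss; the `eps` clipping value is an arbitrary choice to avoid `log(0)`.

```python
import math
from typing import Sequence


def mse(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error (regression)."""
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)


def mae(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """Mean absolute error (regression)."""
    return sum(abs(p - t) for p, t in zip(y_hat, y)) / len(y)


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int],
                  eps: float = 1e-12) -> float:
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[k], eps)) for p, k in zip(probs, labels)) / len(labels)


def hinge(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean hinge loss max(0, 1 - t*s) for labels t in {-1, +1}."""
    return sum(max(0.0, 1.0 - t * s) for s, t in zip(scores, labels)) / len(labels)


print(mse([2.5, 0.0], [3.0, -0.5]))    # → 0.25
print(mae([2.5, 0.0], [3.0, -0.5]))    # → 0.5
print(hinge([2.0, -0.5], [1, 1]))      # → 0.75
print(cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1]))
```

Squared error penalises large residuals more heavily than absolute error, which is why MSE is more sensitive to outliers.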

3. Optimisation Step

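Update parameters using an optimisation algorithm:

$$\theta \leftarrow \theta - \eta \, \nabla_\theta L(\theta)$$

Where:

  • $\eta$ = learning rate
  • $\nabla_\theta L(\theta)$ = gradient of the training loss $L(\theta)$ with respect to the parameters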

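In practice the gradient is usually estimated on a small random mini-batch rather than the full dataset, which gives stochastic gradient descent (SGD), or an adaptive variant such as Adam or RMSProp. Gradient-based updates apply to differentiable models; tree-based models are instead fitted by greedy split search.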
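The three steps (forward pass, loss, update) combine into a training loop. Below is a minimal mini-batch SGD sketch for a one-feature linear model under squared error, assuming a noise-free toy dataset generated from y = 3x + 2; the learning rate, epoch count and batch size are arbitrary illustrative choices, not recommendations.

```python
import random
from typing import List, Tuple


def sgd_linear_regression(
    samples: List[Tuple[float, float]],
    lr: float = 0.05,
    epochs: int = 200,
    batch_size: int = 2,
    seed: int = 0,
) -> Tuple[float, float]:
    """Fit y ≈ w*x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                 # parameters theta = (w, b)
    samples = list(samples)         # copy so shuffling does not mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(samples)        # new random mini-batches each epoch
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            m = len(batch)
            # 1. Forward pass, giving residuals r_i = y_hat_i - y_i
            residuals = [(w * x + b) - y for x, y in batch]
            # 2. Gradient of the batch loss (1/m) * sum r_i^2
            grad_w = 2.0 / m * sum(r * x for r, (x, _) in zip(residuals, batch))
            grad_b = 2.0 / m * sum(residuals)
            # 3. Update step: theta <- theta - eta * gradient
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


toy_data = [(x, 3.0 * x + 2.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]]
w, b = sgd_linear_regression(toy_data)
print(round(w, 3), round(b, 3))  # expected to be close to 3.0 and 2.0
```

Swapping in Adam or RMSProp would replace the two update lines with a rule that also tracks running statistics of past gradients.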


What Training Achieves

Training attempts to:

  • Learn the mapping from inputs to targets, $x \mapsto y$
  • Reduce empirical error
  • Capture statistical structure in the data
  • Generalise to unseen data

The final model encodes patterns from the training data into parameter values.


Conceptual View

Training is:

  • An optimisation problem (minimise loss)
  • A statistical estimation problem (estimate parameters)
  • A numerical procedure (iterative updates)
  • A representation learning process (in deep learning)
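To make the statistical-estimation view concrete: under the added assumption that targets are generated as $y_i = f_\theta(x_i) + \varepsilon_i$ with independent Gaussian noise $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ of fixed variance, the negative log-likelihood of the data is

$$-\log p(y_1, \dots, y_n \mid x_1, \dots, x_n; \theta) = \frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\big(y_i - f_\theta(x_i)\big)^2$$

The first term does not depend on $\theta$, so maximising the likelihood is equivalent to minimising the squared-error loss. This correspondence depends on the Gaussian-noise assumption; absolute error, for example, corresponds in the same way to Laplace-distributed noise.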

Important Distinctions

Training is not:

  • Evaluation (which happens on validation/test sets)
  • Feature engineering (done before training)
  • Hyperparameter tuning (an outer-loop optimisation that wraps repeated training runs)
  • Inference (using the trained model for prediction)

Example

Linear regression:

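Model:

$$\hat{y} = w^\top x + b$$

Loss:

$$L(w, b) = \frac{1}{n} \sum_{i=1}^{n} \big(w^\top x_i + b - y_i\big)^2$$

Training computes the optimal $\hat{w}, \hat{b}$ that minimise the squared error across the dataset, either in closed form (via the normal equations) or iteratively with gradient descent.

After training, prediction is just:

$$\hat{y} = \hat{w}^\top x + \hat{b}$$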
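For a single feature, the least-squares minimiser has a closed form, so no iterative optimisation is needed. This is a minimal sketch, assuming one scalar feature (the general case solves the normal equations $X^\top X \beta = X^\top y$ for the stacked parameters); `fit_linear_regression` and `predict` are illustrative names, and the data lie exactly on y = 2x + 1.

```python
from typing import List, Tuple


def fit_linear_regression(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for one feature: minimises sum (w*x_i + b - y_i)^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    w = sxy / sxx                   # slope
    b = mean_y - w * mean_x         # intercept: the fitted line passes through the means
    return w, b


def predict(w: float, b: float, x: float) -> float:
    # Inference: apply the trained parameters, no further optimisation
    return w * x + b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(w, b, predict(w, b, 5.0))  # → 2.0 1.0 11.0
```

Training happens once in `fit_linear_regression`; afterwards `predict` is a cheap evaluation of the learned formula.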