Model training is the numerical optimisation procedure that adjusts a model’s parameters so that its predictions approximate a target function with respect to observed data.
More formally, given:
- A dataset D = {(x_i, y_i)}, i = 1, …, N
- A parametric model f_θ with parameters θ
- A loss function L(ŷ, y)

training solves (explicitly or approximately):

θ* = argmin_θ (1/N) Σ_{i=1}^{N} L(f_θ(x_i), y_i)

The outcome is an estimate θ* that minimises the empirical risk.
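The empirical-risk objective above can be sketched in a few lines of Python; the dataset, single-weight model, and squared-error loss here are illustrative choices, not part of the source:

```python
# Toy dataset: pairs (x, y) generated by y = 2x (noiseless for clarity).
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]

def model(theta, x):
    # Parametric model f_theta(x); here a single weight theta.
    return theta * x

def loss(y_hat, y):
    # Squared-error loss L(y_hat, y).
    return (y_hat - y) ** 2

def empirical_risk(theta):
    # Average loss over the dataset: (1/N) * sum_i L(f_theta(x_i), y_i).
    return sum(loss(model(theta, x), y) for x, y in data) / len(data)

risk_at_optimum = empirical_risk(2.0)  # zero on this noiseless toy data
```

Training is then a search over `theta` for the value that minimises `empirical_risk` — here trivially θ = 2.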
What Actually Changes During Training
Training modifies model parameters:
- Linear regression: weights and bias
- Neural networks: weight matrices and bias vectors
- Tree-based models: split thresholds and structure
- SVM: the weight vector and bias that define the margin (equivalently, the dual coefficients of the support vectors)
The structure of the model is fixed beforehand. Training does not change the model class, only its parameter values.
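To make the distinction concrete, here is a minimal sketch (the data and fitting method are illustrative): the model class — a line — is fixed in code, and training only moves the values of w and b:

```python
# The model class is fixed: always a line. Training never changes this form.
def predict(w, b, x):
    return w * x + b

# Synthetic data generated by y = 3x + 1.
data = [(float(x), 3.0 * x + 1.0) for x in range(5)]

# Closed-form least-squares fit (simple enough to write by hand):
n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
    sum((x - mean_x) ** 2 for x, _ in data)
b = mean_y - w * mean_x

# Only the parameter values changed; predict() is still the same line.
```

After fitting, `w` and `b` recover 3 and 1, but `predict` itself is untouched — training selected a member of the model class, not a new class.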
Core Components
1. Forward Pass
Compute predictions: ŷ = f_θ(x)
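As a sketch, the forward pass of a tiny one-unit network (the weights, bias, and sigmoid nonlinearity are illustrative assumptions, not from the source):

```python
import math

def sigmoid(z):
    # Logistic nonlinearity mapping a raw score to (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, bias, x):
    # y_hat = sigmoid(w . x + b): dot product plus bias, through the nonlinearity.
    z = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
    return sigmoid(z)

y_hat = forward([0.5, -0.25], 0.1, [2.0, 4.0])  # z = 1.0 - 1.0 + 0.1 = 0.1
```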
2. Loss Computation
Measure the discrepancy between prediction and target:
- Regression: mean squared error, L = (1/N) Σ_i (ŷ_i − y_i)²
- Classification: cross-entropy
- Others: hinge loss, MAE, etc.
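The losses listed above can each be written in a few lines; these are illustrative reference implementations over lists of predictions and targets:

```python
import math

def mse(y_hat, y):
    # Mean squared error (regression).
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)

def mae(y_hat, y):
    # Mean absolute error (regression; less sensitive to outliers).
    return sum(abs(p - t) for p, t in zip(y_hat, y)) / len(y)

def binary_cross_entropy(y_hat, y):
    # Cross-entropy for binary classification; y_hat are probabilities in (0, 1).
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(y_hat, y)) / len(y)

def hinge(y_hat, y):
    # Hinge loss (SVM-style); labels are +1 / -1, y_hat are raw scores.
    return sum(max(0.0, 1.0 - t * p) for p, t in zip(y_hat, y)) / len(y)
```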
3. Optimisation Step
Update parameters using an optimisation algorithm:

θ ← θ − η · ∇_θ L

Where:
- η = learning rate
- ∇_θ L = gradient of the loss with respect to the parameters
This is typically stochastic gradient descent (SGD) or a variant (Adam, RMSProp).
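The update rule above, applied repeatedly, is the whole training loop. A minimal sketch for the toy model ŷ = θ·x with squared-error loss (values are illustrative):

```python
# One gradient step on a single example for y_hat = theta * x.
# Loss L = (theta * x - y)^2, so dL/dtheta = 2 * (theta * x - y) * x.
def sgd_step(theta, x, y, lr):
    grad = 2.0 * (theta * x - y) * x  # gradient of the loss w.r.t. theta
    return theta - lr * grad          # theta <- theta - eta * grad

theta = 0.0
for _ in range(100):
    theta = sgd_step(theta, x=1.0, y=2.0, lr=0.1)
# theta converges toward 2.0, the value that zeroes the loss.
```

Adam and RMSProp follow the same template but rescale the gradient using running statistics before applying the step.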
What Training Achieves
Training attempts to:
- Learn the mapping x ↦ y
- Reduce empirical error
- Capture statistical structure in the data
- Generalise to unseen data
The final model encodes patterns from the training data into parameter values.
Conceptual View
Training is:
- An optimisation problem (minimise loss)
- A statistical estimation problem (estimate parameters)
- A numerical procedure (iterative updates)
- A representation learning process (in deep learning)
Important Distinctions
Training is not:
- Evaluation (that happens on validation/test sets)
- Feature engineering (done before training)
- Hyperparameter tuning (outer loop optimisation)
- Inference (using the trained model for prediction)
Example
Linear regression:
Model: ŷ = wx + b
Loss: mean squared error, L = (1/N) Σ_i (ŷ_i − y_i)²
Training computes the optimal w and b that minimise the squared error across the dataset.
After training, prediction is just evaluating ŷ = wx + b on new inputs.
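The linear-regression example above can be run end to end with full-batch gradient descent on MSE; the synthetic data (y = 2x + 1), learning rate, and iteration count are illustrative choices:

```python
# Synthetic dataset generated by y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

w, b, lr = 0.0, 0.0, 0.05
n = len(data)
for _ in range(2000):
    # Gradients of MSE: dL/dw = (2/n) * sum((w*x + b - y) * x),
    #                   dL/db = (2/n) * sum(w*x + b - y)
    grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in data)
    grad_b = 2.0 / n * sum((w * x + b - y) for x, y in data)
    w -= lr * grad_w
    b -= lr * grad_b

prediction = w * 3.0 + b  # inference: just evaluate the fitted line
```

Training recovers w ≈ 2 and b ≈ 1; inference afterwards involves no optimisation at all, only evaluating the fitted line.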