Stochastic Gradient Descent (SGD) is an optimization algorithm that updates Model Parameters using the gradient computed from a single randomly selected training example at each iteration, rather than from the entire dataset as in Gradient Descent.
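
Concretely, the per-step update can be written as follows (a standard formulation, assuming parameters $\theta$, a Learning Rate $\eta$, and the Loss function $L$ evaluated on one randomly drawn example $(x_i, y_i)$):

$$
\theta \leftarrow \theta - \eta \,\nabla_\theta L(\theta;\, x_i, y_i)
$$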

Why do we use SGD?

  • It allows efficient Optimisation when working with large datasets, since computing the gradient over the entire dataset at every step is expensive.
  • It introduces randomness into the updates, which can help the optimizer escape local minima.
  • It supports online learning: parameters can be updated incrementally as new examples arrive, for instance on an already deployed system.

Key Characteristics

  • Update Rule: Parameters are updated for each sample using that sample's gradient contribution (see the sketch after this list).

  • Objective: Minimize the Loss function efficiently without processing the full dataset at every step.
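
A minimal sketch of this update rule, using NumPy on a small synthetic least-squares problem (the dataset, learning rate, and variable names are illustrative assumptions, not from the source):

```python
import numpy as np

# Illustrative SGD sketch: least-squares linear regression on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)  # noisy linear targets

w = np.zeros(3)   # model parameters
lr = 0.01         # learning rate

for epoch in range(20):
    for i in rng.permutation(len(X)):        # visit samples in random order
        error = X[i] @ w - y[i]              # prediction error on this single example
        grad = 2 * error * X[i]              # gradient of (x_i·w - y_i)^2 w.r.t. w
        w -= lr * grad                       # SGD update: w ← w − lr · ∇L_i(w)

print(w)  # should end up close to true_w
```

Each update touches only one example's gradient; the random visiting order is what makes the method "stochastic".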

Pros:

  • Fast parameter updates.
  • Handles large-scale and streaming data well.

Cons:

  • Noisy single-sample updates give high-Variance gradient estimates, so the Loss function fluctuates rather than decreasing smoothly.
  • Requires techniques like Learning Rate scheduling or Momentum for stable convergence; a sketch combining both follows below.
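
As a hedged illustration of those two stabilizers, the sketch below adds a momentum term and a simple step-decay Learning Rate schedule to the same toy problem (the decay factor, momentum coefficient, and variable names are assumptions for illustration):

```python
import numpy as np

# Illustrative SGD with momentum and step-decay learning-rate scheduling.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
velocity = np.zeros(3)      # running accumulation of past update directions
lr, momentum = 0.01, 0.9    # illustrative hyperparameters

for epoch in range(20):
    if epoch > 0 and epoch % 10 == 0:
        lr *= 0.5                               # step decay: halve the learning rate
    for i in rng.permutation(len(X)):
        grad = 2 * (X[i] @ w - y[i]) * X[i]     # single-sample gradient
        velocity = momentum * velocity - lr * grad
        w += velocity                           # momentum smooths the noisy updates

print(w)  # should be close to [2.0, -1.0, 0.5]
```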