A weight initialization method sets the initial values of a neural network’s weights before training begins. This choice significantly affects how well and how fast the network learns.
Why Initialization Matters
During training, weights are updated via gradient descent. If weights start too small or too large, gradients can (as the sketch after this list illustrates):
- Vanish (approach zero), halting learning in earlier layers.
- Explode (grow very large), causing unstable updates.
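To make this concrete, here is a minimal sketch. It assumes a hypothetical 50-layer stack of 256-unit linear layers (nonlinearities omitted to isolate the scaling effect), with weights drawn i.i.d. from a zero-mean Gaussian. A standard deviation that is slightly too small or too large shrinks or inflates the signal exponentially with depth:

```python
# Sketch: how a poorly scaled Gaussian init makes the forward signal
# vanish or explode in a deep stack of linear layers.
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 256
x = rng.standard_normal(width)          # input vector with unit-scale entries

for sigma in (0.01, 0.10):              # both are "wrong" for width 256
    h = x
    for _ in range(depth):
        W = rng.normal(0.0, sigma, size=(width, width))
        h = W @ h                        # one linear layer
    print(f"sigma={sigma}: final signal std ~ {h.std():.1e}")

# Typical output: sigma=0.01 collapses the std to roughly 1e-40 (vanishing),
# while sigma=0.10 inflates it to roughly 1e+10 (exploding). Gradients scale
# the same way on the backward pass, since they pass through the same weights.
```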
Proper initialization ensures that (see the comparison sketch after this list):
- The input signal maintains a consistent scale as it propagates forward.
- The gradient signal maintains a consistent scale as it propagates backward.
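As a rough illustration, the sketch below uses an assumed setup of 30 tanh layers of width 512 and compares an arbitrary fixed weight scale against Glorot/Xavier scaling, printing the activation standard deviation at several depths:

```python
# Sketch: activation scale across depth for a fixed weight scale vs. Xavier.
import numpy as np

rng = np.random.default_rng(1)
depth, width, batch = 30, 512, 1024

def forward_stds(weight_std):
    """Run a batch through `depth` tanh layers and record activation stds."""
    h = rng.standard_normal((batch, width))
    stds = []
    for _ in range(depth):
        W = rng.normal(0.0, weight_std, size=(width, width))
        h = np.tanh(h @ W)
        stds.append(h.std())
    return stds

fixed = forward_stds(0.01)                             # arbitrary fixed scale
xavier = forward_stds(np.sqrt(2.0 / (width + width)))  # Var(W) = 2/(fan_in + fan_out)

for layer in (0, 9, 19, 29):
    print(f"layer {layer + 1:2d}: fixed std={fixed[layer]:.1e}  "
          f"xavier std={xavier[layer]:.1e}")

# The Xavier run stays within roughly one order of magnitude of the input
# scale across all 30 layers, while the fixed 0.01 scale shrinks the signal
# by many orders of magnitude.
```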
Main Goals of Initialization
- Prevent vanishing/exploding gradients.
- Maintain similar variance of activations and gradients across layers.
- Speed up convergence.
Common Initialization Methods
- Xavier (Glorot): Draws each weight with variance 2/(fan_in + fan_out), balancing the signal scale of the forward and backward passes; a common default for tanh and sigmoid activations.
- He: Designed for ReLU activations; scales by the input side only, using variance 2/fan_in to compensate for ReLU zeroing roughly half of the activations (both fan-based rules are sketched below).
- Uniform / Normal with a fixed, hand-picked scale: Simple, but prone to vanishing or exploding signals in deep networks because the scale ignores layer width.
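The following sketch shows the two fan-based rules in plain NumPy; the names glorot_normal and he_normal are illustrative, not a particular library's API:

```python
# Sketch: Glorot/Xavier and He initialization.
# Var(W) = 2/(fan_in + fan_out) for Glorot, Var(W) = 2/fan_in for He.
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    """Xavier/Glorot: balances forward and backward signal scale."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    """He: fan-in scaling with a factor of 2 to compensate for ReLU
    zeroing roughly half of the pre-activations."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(42)
W1 = glorot_normal(784, 256, rng)   # e.g. a tanh/sigmoid hidden layer
W2 = he_normal(256, 128, rng)       # e.g. a ReLU hidden layer
print(W1.std(), W2.std())           # roughly 0.044 and 0.088
```

Deep learning frameworks expose the same formulas directly (for example, PyTorch's nn.init.xavier_normal_ and nn.init.kaiming_normal_), along with uniform variants and gain factors for other activations.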