Batch Gradient Descent computes the gradient of the cost function using the entire training dataset at each iteration before updating the model parameters.

Key Characteristics

  • Update Rule: Parameters are updated once per epoch, after processing the entire dataset.
  • Objective: Achieve accurate and stable updates, as the gradient is computed over all training examples.
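This update rule can be sketched as follows. The example below is a minimal, illustrative implementation for linear regression with mean squared error; the function name, learning rate, and epoch count are assumptions, not part of the original text.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, epochs=100):
    """Illustrative batch gradient descent for linear regression (MSE loss)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # The gradient is computed over the ENTIRE dataset
        # before a single parameter update is applied.
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Usage: fit y = 2x on a toy dataset
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
w = batch_gradient_descent(X, y)  # w[0] converges toward 2.0
```

Note that `w` changes only once per pass over the data, which is what makes the convergence path stable but each step expensive.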

Pros:

  • Produces a stable convergence path.
  • Provides an accurate estimate of the gradient.

Cons:

  • Computationally expensive for large datasets.
  • Requires the entire dataset to fit in memory, which may not be feasible for big data.
  • Slower to start learning than SGD or Mini-Batch methods, since no parameter update occurs until the full dataset has been processed.