Decision Trees are considered fragile because small changes in the training data can lead to large changes in the tree structure and predictions.

Why this happens

  1. Greedy splitting

    • Trees are built by choosing the locally optimal split at each node, with no lookahead.
    • A slight change in the data (e.g., adding or removing a single sample, or a little noise) can change which split is chosen, altering the entire subtree downstream.
  2. High variance

    • Decision trees have low bias but high variance.
    • They overfit easily to noise or outliers unless pruned or regularized.
  3. Hierarchical structure

    • Early splits strongly influence the rest of the tree.
    • A different root split due to minor changes cascades into a completely different structure.
  4. Sensitivity to outliers

    • Extreme values can dominate split selection, further increasing instability.
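The instability described above is easy to demonstrate. The sketch below (a minimal illustration using scikit-learn, with arbitrarily chosen dataset parameters) fits one tree on a toy dataset and another on the same dataset with a single sample removed, then compares their structures and predictions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy dataset; sizes and random_state are illustrative choices.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Fit one tree on the full data, and one with a single sample dropped.
tree_full = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_minus_one = DecisionTreeClassifier(random_state=0).fit(X[1:], y[1:])

# Inspect the top of each tree: even a one-sample change can shift
# which splits are chosen near the root.
print(export_text(tree_full, max_depth=2))
print(export_text(tree_minus_one, max_depth=2))

# Fraction of points on which the two trees disagree.
disagree = np.mean(tree_full.predict(X) != tree_minus_one.predict(X))
print(f"Prediction disagreement: {disagree:.1%}")
```

Because splits are greedy and hierarchical, a different threshold near the root cascades into different subtrees everywhere below it, which is exactly why the printed structures can diverge.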

Consequence

  • Different training sets produce very different trees → poor generalization.
  • This is why ensemble methods like Random Forest and Gradient Boosted Trees are widely used; they reduce variance by combining multiple trees.
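The variance reduction from averaging can also be measured directly. The sketch below (assumptions: scikit-learn available; dataset sizes, 10 bootstrap runs, and 50 estimators are arbitrary choices) retrains a single tree and a Random Forest on bootstrap resamples of the training set and compares how much their test-set predictions fluctuate across runs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)

def prediction_variance(make_model, n_runs=10):
    """Average per-point variance of predictions across bootstrap refits."""
    preds = []
    for _ in range(n_runs):
        idx = rng.integers(0, len(X_train), len(X_train))  # bootstrap resample
        model = make_model().fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    return np.mean(np.var(np.array(preds), axis=0))

tree_var = prediction_variance(lambda: DecisionTreeClassifier(random_state=0))
forest_var = prediction_variance(
    lambda: RandomForestClassifier(n_estimators=50, random_state=0))
print(f"single tree variance: {tree_var:.4f}")
print(f"random forest variance: {forest_var:.4f}")
```

On a run like this, the forest's predictions typically fluctuate much less across resamples than the single tree's, which is the variance reduction ensembles are designed to provide.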